Parsing complicated strings

Hi. I'm still a newbie, though I'm learning more every day. I'm working on an app for macOS, and I have a couple of questions about parsing and working with big, complicated strings of text.

I have three processes that are going to be running from command-line apps that will be located in my app's main resource bundle.

The first will simply analyse the input file and return two pieces of data I need to build arguments that will be passed into the processes for the other two command-line apps.

I have a wall o' text that is stderr and stdout piped from that process and displayed in a textfield.

I suppose my first question is...do I need to display this in the textfield to work with it?

Ideally, all the user needs to see displayed in that textfield is the output of the final process, not the intermediate stuff. But I'm not sure how to parse console output, or whether console output is even parse-able once there's no more console (i.e. once the app has been fully compiled and packaged and isn't being run from Xcode any longer).

When I was trying to figure this out, I saw lots of how-tos about converting stdout and stderr to a file, and I managed to extrapolate that to display it in the text field. But do I need to do this to work with it?

Second question: One of the pieces of data I need to work with is buried in a fairly intricate line of text, and I'm unsure of the most efficient way of extracting it (and from there, isolating the first 8 characters of it, which is all I need).

Right now, what I have is this:

func parseOutput() {
	let outputLog = outputDisplay.string
	let outputLineArray = outputLog.split { $0.isNewline }
	// filter by a keyword, which is a unique word that won't be
	// found anywhere else in the log, preceding the data I actually need
	// data I need will be Keyword == DataINeed
	let outputLineElement = outputLineArray.filter { $0.contains("keyword") }
	// convert the single array element that is the entire line into a string
	let outputLineString = outputLineElement.joined()
	// split the string into a new array separated by whitespaces
	let singleLineArray = outputLineString.split(separator: " ")
	// get the last part of that array, which is the data I actually need
	let dataINeed = singleLineArray.last
	// get the first 8 characters of that data and convert the substring to a string
	dataFirst8 = String(dataINeed!.prefix(8))
}

Surely there has to be a less clumsy and repetitive way of doing this?

Sounds like a job for regular expressions!
From your code it seems like you want your regex to be ^keyword.* (.{8}).*$
You can copy-paste it into https://regex101.com/, write some example lines, and check whether it does what you want. Hovering over the regex there shows what each character does. Then you can use the NSRegularExpression class from Foundation to extract the capturing group.


To answer your first question, no, you don’t need a TextField to work with the data that the process outputs. TextField is a ‘view’ whose job it is to present underlying data ‘models’ to the user. You can work with a model of your data without any views.

May I recommend that you Google for “Model View Controller” or “MVC”.
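
For what it's worth, here is a minimal sketch of reading a command-line tool's output straight into a String with no view involved. It assumes Foundation's Process and Pipe; the helper name captureOutput and the /bin/echo stand-in are made up for illustration, and in the real app the URL would point at the tool inside the bundle instead.

```swift
import Foundation

// Hypothetical helper: runs a tool and returns everything it wrote to
// stdout and stderr as a single String. No text field is involved.
func captureOutput(of executableURL: URL, arguments: [String]) throws -> String {
    let process = Process()
    process.executableURL = executableURL
    process.arguments = arguments

    // Send both streams into the same pipe.
    let pipe = Pipe()
    process.standardOutput = pipe
    process.standardError = pipe

    try process.run()
    // Read until the tool closes its end of the pipe, then wait for exit.
    // Reading first avoids stalling if the tool fills the pipe buffer.
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()
    return String(decoding: data, as: UTF8.self)
}

// /bin/echo is only a stand-in for the real analysis tool.
let log = try captureOutput(
    of: URL(fileURLWithPath: "/bin/echo"),
    arguments: ["keyword == abcdefghijkl"])
print(log)
```

The returned String can be handed to the parsing code directly, so only the final process's output ever needs to be assigned to the textfield.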


Thank you, @cukr and @Diggory! Those are definitely the answers I needed!


... I may have spoken just a little too soon.

I might be missing something obvious, but I've been looking all night and I haven't found the answer yet.

@cukr the regex tester you linked for me gave me the regex I needed (which is "keyword.* (.{8}).*$" -- it didn't match when prefaced by the ^).

It gives me two results: a Full Match, which is keyword == everythingthatfollows and Group 1, which is the first 8 characters of everythingthatfollows. Which is what I need.

What I need my method to do is return Group 1 as a string (Group1.stringValue, if you will) that I can then use to construct an argument for a process.

But while I've found plenty of how-tos on running comparisons, working with ranges, replacing matches, and getting bool results if there's a match, the closest I've been able to come to what I need is firstMatch which still returns a type of NSTextCheckingResult, and I can't figure out how to get a string from it.

So, experimenting in playground for now, I have:

import Cocoa
import Foundation

let regexPattern = "keyword.* (.{8}).*$"
let outputString = "Big Long String of Stuff that I really don't need except except that one part following keyword == whichionlyneedthefirst8charactersof"

func parseOutput() {
	let regex = try! NSRegularExpression(pattern: regexPattern, options: [])
	let match = regex.firstMatch(in: outputString, options: [], range: NSRange(location: 0, length: outputString.count))
	print(match!)
}
parseOutput()

Which prints to the console: <NSSimpleRegularExpressionCheckingResult: 0x7fa1b1e48bb0>{88, 46}{<NSRegularExpression: 0x7fa1b1d0de60> keyword.* (.{8}).*$ 0x0}

This is the only article I've found that actually discusses getting a string from the results of a regex match, but whenever I tried to pick my way through it and reframe it so that I was only getting the firstMatch, I either broke it or ended up back at NSTextCheckingResult as my output.

You were pretty close, and using NSRegularExpression isn't obvious at all! (I blame obj-c)

func parseOutput() {
    let regex = try! NSRegularExpression(pattern: regexPattern, options: [])
    let match = regex.firstMatch(in: outputString, options: [], range: NSRange(location: 0, length: outputString.count))
    let matchRange = match!.range(at: 1) // range(at: 0) is the full match, 1 is the first capturing group
    // You can also use `range(withName:)` if your regex is more complex
    let rangeFromNSRange = Range(matchRange, in: outputString)! // convert obj-c NSRange into swift String range
    let first8CharactersOfThatOnePart = String(outputString[rangeFromNSRange])
    print(first8CharactersOfThatOnePart)
}
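
Since range(withName:) is mentioned in the comment above, here is a rough sketch of what a named capturing group could look like. The group name firstEight, the function name, and the shortened sample string are all invented for this example; range(withName:) itself needs macOS 10.13 or later. The sketch also returns nil instead of force unwrapping when nothing matches, and measures the search range in UTF-16 code units.

```swift
import Foundation

let sample = "stuff I don't need keyword == whichionlyneedthefirst8charactersof"
// "firstEight" is a hypothetical group name chosen for this example.
let namedPattern = "keyword.* (?<firstEight>.{8}).*$"

func firstEight(in text: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: namedPattern) else { return nil }
    // NSRange lengths are counted in UTF-16 code units.
    let searchRange = NSRange(location: 0, length: text.utf16.count)
    guard let match = regex.firstMatch(in: text, options: [], range: searchRange) else { return nil }
    // Look the group up by name instead of by position.
    let groupRange = match.range(withName: "firstEight")
    guard groupRange.location != NSNotFound,
          let swiftRange = Range(groupRange, in: text) else { return nil }
    return String(text[swiftRange])
}

print(firstEight(in: sample) ?? "no match")
// prints "whichion"
```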

Thank you! After spending all night researching this, if I never see the word "string" again (especially in an ambiguous context where I can't tell if it means the variable type String, or the regex String, or the string that I'm trying to parse) it will be too soon.

I'm going to have nightmares about strings.

I'm so traumatized I'll be throwing out my yarn collection later today.

I set out to learn Swift and maybe get more familiar with ffmpeg along the way. Little did I know I'd also be picking up how to work with regex and obj-c. If this is what the first six weeks is like, I can't imagine where I'll be in a year. :dizzy_face:

Anyway. Thank you.

shambles away muttering, "learn Swift, they said. It's straightforward, they said..."


Oh no! I hope you will give your yarn to an animal shelter, so that it can find a new home.
The Swift stdlib needs a good regex library :frowning:
Good luck with learning!


Sorry to make things more complicated, but reaching for NSRegularExpression and NSRange brings another complication with it that has been overlooked: UTF‐16.

To be honest, the code in your original post at the very top is probably more‐or‐less what I would have gone with after as many years working with Swift as Swift has existed.

If you are feeling discouraged, then just revert to your original code and ignore what follows.
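
If you do stay with the original approach, it can be tightened a little without changing the idea. This is only a sketch under a couple of assumptions: the function name is invented, the log is passed in as a parameter instead of being read from the text view, and the result is returned as an optional so that a missing keyword does not crash.

```swift
import Foundation

// Hypothetical helper; "keyword" stands in for the real marker word.
func firstEightCharacters(after keyword: String, in log: String) -> String? {
    let lines = log.split { $0.isNewline }
    // Take the first line containing the keyword, then its last
    // whitespace-separated piece, which is the data that is needed.
    guard let line = lines.first(where: { $0.contains(keyword) }),
          let value = line.split(separator: " ").last
    else { return nil }
    // prefix(8) counts Characters, as in the original code.
    return String(value.prefix(8))
}

let log = "some noise\nmore noise keyword == whichionlyneedthefirst8charactersof\ntrailing noise"
print(firstEightCharacters(after: "keyword", in: log) ?? "not found")
// prints "whichion"
```

Using first(where:) in place of filter plus joined() skips the round trip from array to string and back.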


However, if you do want to use regular expressions—and you are willing to keep learning—then you need to make some adjustments.

The Search Range

NSRange(location: 0, length: outputString.count)

You are lucky in that this cannot trigger a fatal error, but it could still lop off the tail end of the string. count is the number of extended grapheme clusters, each of which Swift calls a Character. The length parameter, on the other hand, is a number of UTF‐16 code units. So you need to pass utf16.count to length instead.

// This is an “x” with 20 macrons (◌̄) stacked on it.
let complex = "x̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄"
print(complex.count) // 1
print(complex.utf16.count) // 21

let string = "\(complex) keyword == theEight"

let mixedUpRange = NSRange(location: 0, length: string.count)
print((string as NSString).substring(with: mixedUpRange))
// x̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄

let correctRange = NSRange(location: 0, length: string.utf16.count)
print((string as NSString).substring(with: correctRange))
// x̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄ keyword == theEight

So if you pass the mixed‐up range as your search range, the thing you are looking for may be beyond the end of where you are searching.
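
As an alternative to writing utf16.count by hand, Foundation can build the NSRange from String indices. A small sketch, with a shortened made-up test string (an "x" carrying two combining macrons):

```swift
import Foundation

// "x" followed by two combining macrons, then the part being searched for.
let text = "x\u{0304}\u{0304} keyword == theEight"

// Build the search range from String indices; Foundation converts
// them to UTF-16 offsets.
let wholeRange = NSRange(text.startIndex..<text.endIndex, in: text)

print(wholeRange.length == text.utf16.count) // true
print(wholeRange.length == text.count)       // false
```

Either spelling gives a range covering the whole string, so which one to use is mostly a matter of taste.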

It is worth noting that you do not even need to leave ASCII for the two to diverge. Even a carriage return and line feed combination will cause a difference:

let complex = "\r\n"
print(complex.count) // 1
print(complex.utf16.count) // 2

Extracting the Match

Here you need to decide whether you want 8 Characters (grapheme clusters) or 8 UTF16.CodeUnits.

let outputString = "... keyword == simple+x̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄"

The original version pulled 8 Characters:

dataFirst8 = String(dataINeed!.prefix(8))
// simple+x̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄̄

But the new version attempts to pull 8 UTF16.CodeUnits:

let regexPattern = "keyword.* (.{8}).*$"
// ...
let matchRange = match!.range(at: 1)
// simple+x

However, the new version can contain indices that are invalid for String, so the conversion back to String.Index can fail even when there is a valid match:

let matchRange = match!.range(at: 1)
// simple+x

let rangeFromNSRange = Range(matchRange, in: outputString)
// nil
// (Or trap, if you don’t remove the force unwrap (!)
// that was originally at the end of this line.)

So instead you’ll need to do the conversion yourself in a way that preserves intra‐Character indices:

let matchRange = match!.range(at: 1)
// simple+x

let utf16 = outputString.utf16
let lowerBound = utf16.index(
    utf16.startIndex,
    offsetBy: matchRange.location)
let upperBound = utf16.index(
    utf16.startIndex,
    offsetBy: matchRange.location + matchRange.length)
let first8CharactersOfThatOnePart
    = String(outputString.utf16[lowerBound ..< upperBound])
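
Putting the two adjustments together might look something like the following. It is only a sketch: the function name is invented, the test string uses three macrons instead of twenty, and it swaps the manual index conversion for NSString's substring(with:) (the same call used in the search-range example earlier), which stays in UTF-16 units the whole way.

```swift
import Foundation

// Hypothetical helper: returns the 8 units captured after the keyword,
// or nil when the pattern does not match.
func firstEightUnits(afterKeywordIn text: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: "keyword.* (.{8}).*$") else { return nil }
    // Search range measured in UTF-16 code units, not Characters.
    let searchRange = NSRange(location: 0, length: text.utf16.count)
    guard let match = regex.firstMatch(in: text, options: [], range: searchRange) else { return nil }
    let groupRange = match.range(at: 1)
    guard groupRange.location != NSNotFound else { return nil }
    // NSString works in UTF-16 offsets, so no index conversion is needed.
    return (text as NSString).substring(with: groupRange)
}

let tricky = "... keyword == simple+x\u{0304}\u{0304}\u{0304}"
let result = firstEightUnits(afterKeywordIn: tricky)
print(result ?? "no match")
// prints "simple+x"
```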

It is...surprisingly possible that this is what I've been bumping into all day.

I've been getting a fatal error because of a nil return at let matchRange = match!.range(at: 1) and it's been making me crazy.

I had assumed it was a problem with the way I was deriving the outputString variable (that either I was using it wrong so that it wasn't being defined in the function where it was supposed to be, or that I wasn't properly piping string output from the process now that I'm no longer parsing it from a textview) but hmm. Yeah. It's possible I just spent the entire night and day making something work that wasn't going to work no matter what I tried.

I will read over your message again and see if I can make sense of it, or if it's worth it to me to try to do it that way instead of the rather blunt-instrument approach I was taking in the beginning.
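
A crash at match!.range(at: 1) means firstMatch returned nil, that is, the pattern did not match at all. One possible cause, and this is only a guess since the real log isn't shown here: by default $ matches only at the very end of the input and . never crosses a line break, so on a multi-line log the pattern can only match when the keyword sits on the last line. The sketch below (function name and sample log invented) avoids the force unwrap and shows the effect of the .anchorsMatchLines option.

```swift
import Foundation

// Hypothetical helper: returns nil instead of crashing when nothing matches.
func firstEight(in text: String, options: NSRegularExpression.Options) -> String? {
    guard let regex = try? NSRegularExpression(pattern: "keyword.* (.{8}).*$", options: options) else { return nil }
    let searchRange = NSRange(location: 0, length: text.utf16.count)
    guard let match = regex.firstMatch(in: text, options: [], range: searchRange) else { return nil }
    return (text as NSString).substring(with: match.range(at: 1))
}

// The keyword is on the middle line, not the last one.
let multiLineLog = "first line\nnoise keyword == abcdefghijkl\nlast line"

print(firstEight(in: multiLineLog, options: []) ?? "no match")
// prints "no match"
print(firstEight(in: multiLineLog, options: [.anchorsMatchLines]) ?? "no match")
// prints "abcdefgh"
```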