Str.range result changes in macOS 14.0 Sonoma

Changes in macOS 14.0 Sonoma introduced data-loss bugs to the Pacific Tech Graphing Calculator, which I tracked down to a change in the result of str.range in:

func lineTermination() -> String.Index { str.range(of: #";[\r\n]"#, options:[.regularExpression], range:pos..<str.endIndex)?.upperBound ?? str.endIndex }

Previously, applying that to the string y=sin(x)+sin(2x)+sin(3x);\rText "This is a function of x.";\r would find the position of the first ;\r. Now, in Sonoma, it returns the end of the entire input string, causing the rest of my document parser to fail quietly and discard the rest of the document (after which auto-save would result in data loss).

Was I relying on unspecified behavior of str.range? Was the change in Sonoma documented anywhere? Any advice for a simple fix?

My best guess is that they might have made the Foundation regular expression routines call into the Swift ones. The Foundation regex routines probably considered the \r and \n to be separate characters, but if they're calling into the Swift routines then the \r\n would be treated as a single character (in line with Unicode rules and the normal behavior of \r\n in Swift strings).

You could probably replace the [\r\n] with (\r|\n) to get consistent behavior with both implementations. Or you could just switch the order and use [\n\r] as I think they'd always be interpreted as separate characters in that order.

But I'd be tempted to switch to a solution that accepts any of the possible line endings, rather than only the classic Mac (\r) and UNIX (\n) line endings. (Handling Windows-style \r\n is usually a good idea.)

str.firstIndex {$0.isNewline} is what pops to mind.

Furthermore, if possible I'd avoid all of the old NSString methods, as there are a lot of pitfalls to be found there.
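
For what it's worth, here is a minimal sketch of a Foundation-free scan along those lines. It walks the Unicode scalars rather than Characters, so \r and \n are always seen separately, and it also consumes a Windows-style \r\n as one terminator. The function name and its explicit parameters are made up for illustration (the original closure captured str and pos), so treat this as an assumption-laden sketch rather than the code from the app.

```swift
// Hypothetical helper: returns the index just past the first ";" + line ending
// at or after `pos`, or `str.endIndex` if there is none.
func lineTerminationByScalars(in str: String, from pos: String.Index) -> String.Index {
    let scalars = str.unicodeScalars
    var i = pos
    while i < scalars.endIndex {
        let next = scalars.index(after: i)
        if scalars[i] == ";", next < scalars.endIndex,
           scalars[next] == "\r" || scalars[next] == "\n" {
            let afterBreak = scalars.index(after: next)
            // Treat "\r\n" as one terminator so the result never lands
            // in the middle of a CRLF pair.
            if scalars[next] == "\r", afterBreak < scalars.endIndex,
               scalars[afterBreak] == "\n" {
                return scalars.index(after: afterBreak)
            }
            return afterBreak
        }
        i = next
    }
    return str.endIndex
}

let doc = "y=sin(x)+sin(2x)+sin(3x);\rText \"This is a function of x.\";\r"
let firstEnd = lineTerminationByScalars(in: doc, from: doc.startIndex)
print(doc[..<firstEnd] == "y=sin(x)+sin(2x)+sin(3x);\r")
```

Because it never goes through the regex machinery, this should behave the same before and after Sonoma.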

Oddly, str.firstIndex { $0.isNewline } returns nil for that string (Xcode 15 on Sonoma). However, str.ranges(of: #";\r"#) works fine, but I can't find a regex that works for matching a ; plus any newline.

Please do file a bug on this using Feedback Assistant; it sounds like a behavior change that should be link-checked.

Today I learned (thank you, @tim1724) that \r\n is a single grapheme cluster.

This is, in fact, the source of the behavior that you're seeing. In macOS 14, the String methods in Foundation began using the Swift standard library, including the Swift Regex type. Since the Regex type operates on grapheme clusters by default, the custom character class in your regex gets interpreted as having a single member ("\r\n") instead of two separate ones that can match individually (i.e. ["\r", "\n"]).
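
The grapheme-cluster point is easy to check with the standard library alone. A small sketch (the constant names are just for illustration):

```swift
let crlf = "\r\n"
let lfcr = "\n\r"

print(crlf.count)                 // 1: CR followed by LF is one Character
print(crlf.unicodeScalars.count)  // 2: but still two Unicode scalars
print(lfcr.count)                 // 2: LF followed by CR does not coalesce
```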

There are a few options to work around this change:

  1. Using the \v character class will match any "vertical whitespace" character
  2. Reversing the order of \r and \n resolves the issue, since "\n\r" doesn't coalesce into a single grapheme cluster
  3. Using the regex builder syntax instead of a regex literal, with CharacterClass.newlineSequence
  4. Explicitly enabling Unicode scalar matching mode

// 1. Using '\v'
str.range(of: #";\v"#, ...)
// 2. Reversing the order
str.range(of: #";[\n\r]"#, ...)
// 3. Using builder syntax
str[pos...].firstRange(of: Regex { ";"; CharacterClass.newlineSequence })
// 4. Using Unicode scalar matching
str[pos...].firstRange(of: /;[\r\n]/.matchingSemantics(.unicodeScalar))
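
As a rough sketch of how option 4 could be wrapped into a replacement for the original closure: the function name and its explicit parameters are hypothetical, and this assumes Swift 5.7 or later with a deployment target of macOS 13 / iOS 16 or later, since the Regex type is not available before that. The extended #/.../# delimiters are used so the literal compiles without enabling bare-slash regex syntax.

```swift
// Hypothetical helper: index just past the first ";" followed by \r or \n
// at or after `pos`, or `str.endIndex` if no such sequence exists.
func lineTerminationWithRegex(in str: String, from pos: String.Index) -> String.Index {
    // Unicode scalar semantics make [\r\n] a two-member class again.
    let pattern = #/;[\r\n]/#.matchingSemantics(.unicodeScalar)
    return str[pos...].firstRange(of: pattern)?.upperBound ?? str.endIndex
}

let input = "y=sin(x)+sin(2x)+sin(3x);\rText \"This is a function of x.\";\r"
let end = lineTerminationWithRegex(in: input, from: input.startIndex)
print(input[..<end] == "y=sin(x)+sin(2x)+sin(3x);\r")
```

One thing to watch for: with scalar semantics, a ;\r\n in the input would presumably match only ;\r, so the returned index may fall between the \r and the \n.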

We'll look at how to address this — thanks for bringing it up!

FB13286132 (in case any Apple folks need that)

Thank you! It is such a pleasure to be back on Swift Forums after a year away. It is a privilege to receive such friendly, reliable, fast and authoritative assistance on technical matters. The signal-to-noise ratio here is astoundingly high.

(Particularly in the category of asking questions on the internet where my general expectation is that of shouting uselessly into the void.)
