[Pitch] Unicode for String Processing

benlings · April 25, 2022, 7:42am

I'm not sure which Regex-related proposal this fits under, but this behaviour was a little unexpected at first (running under the current nightly docker image):

let r = try! Regex(compiling: #"\n"#)
let c = "line1\r\nline2".contains(r)
print("contains? \(c)")

From here, I understand that \r\n is a grapheme cluster and therefore a single Character so it makes sense once I had made that link. However, with the introduction of Regex, it's going to be easier to do text processing in Swift and I think this will catch others out.

It might be worth mentioning this particular case when talking about the differences between Unicode and code-point-level text processing? I had naïvely thought that it would only make a difference to things like emoji and accented characters.