I find it convenient sometimes to treat a string as a collection of characters e.g.
let v = Character("a")
let c = Character("x")
let b1 = "aeiou".contains(v) // true
let b2 = "aeiou".contains(c) // false
However, I have a piece of code that does this to look for a character that might be the start of a line ending. It looks for \r and \n on the grounds that most reasonable line endings start with one of those two characters. The test I had didn't work because, for some reason, "\r\n" behaves strangely in a collection context. The following code illustrates the weirdness.
let s1 = "\r\n"
let s2 = "\r \n"
let lf = Character("\n")
print(s1.contains(lf))
print(s2.contains(lf))
print(Array(s1))
print(Array(s2))
This is the output
false
true
["\r\n"]
["\r", " ", "\n"]
The string "\r\n" does not contain a line feed but the string "\r \n" does. Not only that, but when converted to an array, the former contains only a single element consisting of some mashup of the two characters, but the latter contains the three characters (correctly IMO).
This looks like a bug but it has occurred to me that there might be some Unicode standard that says CR LF can be considered a single character in decomposed form (or whatever the terminology).
You're probably running into trouble because "\r\n" is a single Unicode character (an extended grapheme cluster) composed of two code points. Since String operates over characters, not code points, it treats those two code points as a single character. Surprisingly, you'll find that "\r\n".count == 1!
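To see the difference between the views, here is a minimal sketch that counts the same string as characters, as Unicode scalars, and as UTF-8 bytes. The variable name `crlfString` is only for illustration.

```swift
// One grapheme cluster, but two code points underneath.
let crlfString = "\r\n"

print(crlfString.count)                 // 1
print(crlfString.unicodeScalars.count)  // 2
print(crlfString.utf8.count)            // 2

// Searching at the scalar level does find the line feed.
let hasLineFeedScalar = crlfString.unicodeScalars.contains("\n")
print(hasLineFeedScalar)                // true
```

So a check that must see the individual code points can go through `unicodeScalars` rather than the default character view.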
For the specific problem you're encountering, I'd recommend using an array of characters rather than a string:
let s: [Character] = ["\r", "\n"]
let lf = Character("\n")
print(s.contains(lf))
@David_Smith, I vaguely recall you and I discussed this exact scenario way back in the day, but I can't find a record of the discussion. I know you know more than I do; can you elaborate at all?
Yes, but I'm lazy and it's a drag having to type all those commas and quotes and have to put the type annotation on so you don't get an array of strings.
I was annoyed by this, but it actually makes my problem a bit easier because I can test for all types of line ending assuming the line ending is a single character instead of sometimes being two.
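As a rough sketch of that idea, the three classic line endings can each be treated as one Character and looked up in a set. The names `lineEndings` and `firstLineEnding(in:)` are hypothetical and not from the code being ported.

```swift
// The three classic line endings, each a single Character in Swift.
let lineEndings: Set<Character> = ["\r", "\n", "\r\n"]

// Hypothetical helper: returns the first line-ending character, if any.
func firstLineEnding(in text: String) -> Character? {
    return text.first(where: { lineEndings.contains($0) })
}

if let ending = firstLineEnding(in: "first\r\nsecond") {
    // The whole CRLF pair comes back as one Character.
    print(ending == "\r\n")  // true
}
```

Because CRLF is already a single element, there is no need for a separate "peek at the next character" step after seeing a carriage return.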
Indeed, as others have commented, you have stumbled on this edict of the Unicode standard that CRLF is a single Unicode extended grapheme cluster, which is known in Swift as a character. Hurray!
If you're interested (and how could anybody not be?) there is a whole specification on Unicode line-breaking algorithms - UAX#14.
Most of it relates to wrapping (finding opportunities to break based on how the text is presented), but it also defines mandatory breaks, which I think is what your code is looking for - elements of the text such as \n, which force a break.
Long story short, you can handle all of those mandatory breaks using Character.isNewline as @grynspan suggested.
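For illustration, a minimal sketch of splitting on mandatory breaks with `Character.isNewline`. The sample text and variable names are made up for the example.

```swift
// Mixed line endings: CRLF, LF, and a bare CR.
let sample = "one\r\ntwo\nthree\rfour"

// isNewline is true for "\n", "\r", "\r\n" and the other Unicode newlines,
// so a single predicate covers all of them.
let lines = sample
    .split(omittingEmptySubsequences: false, whereSeparator: { $0.isNewline })
    .map { String($0) }

print(lines.count)  // 4
```

Passing `omittingEmptySubsequences: false` keeps blank lines, which is usually what a line-oriented parser wants.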
Otherwise, just to reiterate what others have said, if you care about this specific kind of break only, CRLF is a single character and you can indeed create a Swift Character with that value.
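A small sketch of what that could look like, reusing the two strings from the original question. The name `crlfChar` is only for illustration.

```swift
// CRLF as a single Character value.
let crlfChar: Character = "\r\n"

let foundInCRLF = "\r\n".contains(crlfChar)
let foundInSpaced = "\r \n".contains(crlfChar)

print(foundInCRLF)    // true
print(foundInSpaced)  // false: here CR and LF are separate characters
```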
Yes, I am aware of this, but I'm porting a C library to Swift. I thought I'd get a straightish translation working before I started refactoring it and making it more Swifty.
If you want to process text like C, it may be better to work with the UTF-8 view and forgo high-level Unicode processing for now. You can replace operations with Unicode-aware variants later, where it makes sense to do so (you may even need to ignore some Unicode behaviour for compatibility reasons).
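As a sketch of the byte-level approach, the `utf8` view exposes the raw code units, so the original test behaves the way C code would expect. The variable names are illustrative.

```swift
let input = "\r\n"
let lfByte = UInt8(ascii: "\n")

// Byte-level search, as C code would do it.
let byteLevelHit = input.utf8.contains(lfByte)

// Character-level search sees one CRLF grapheme, so no lone line feed.
let characterLevelHit = input.contains("\n" as Character)

print(byteLevelHit)       // true
print(characterLevelHit)  // false
```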
That's how I started, but it turned out to be extremely painful. The library reads the text into a raw buffer undecoded, converts it from whatever character set it is in (only UTF-8 and UTF-16 are supported) into what are effectively Unicode code points, and then converts those into UTF-8 (or back into UTF-8). Then it has more code to treat multibyte UTF-8 sequences as single characters.
I've saved several hundred lines by just converting the raw input into a sequence of Character instead of UTF-8. But, in writing this comment, I have realised my previous comment does not hold water as an excuse not to do it properly. I'm in a sort of halfway house in which I'm duplicating some of the Unicode work that Apple has already done, except with added bugs.
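For anyone following along, a minimal sketch of that conversion, assuming the raw input is already available as UTF-8 bytes or UTF-16 code units. The sample data and names are made up, and byte-order handling for UTF-16 is left out.

```swift
// "a\r\nb" as UTF-8 bytes and as UTF-16 code units (illustrative data).
let utf8Bytes: [UInt8] = [0x61, 0x0D, 0x0A, 0x62]
let utf16Units: [UInt16] = [0x0061, 0x000D, 0x000A, 0x0062]

// String(decoding:as:) decodes the code units and repairs invalid
// sequences; Array(...) then yields one element per Character.
let fromUTF8 = Array(String(decoding: utf8Bytes, as: UTF8.self))
let fromUTF16 = Array(String(decoding: utf16Units, as: UTF16.self))

print(fromUTF8.count)         // 3: "a", "\r\n", "b"
print(fromUTF8 == fromUTF16)  // true
```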