In the docs for `asciiValue` it mentions:

/// A character with the value "\r\n" (CR-LF) is normalized to "\n" (LF) and
/// has an `asciiValue` property equal to 10.

My understanding of Unicode normalization was that this implies equality, but the two characters don't compare equal at the moment?
That (CR + LF → LF) is not Unicode normalization, and the two are not canonically equivalent by Unicode's standards. Hence they do not compare equal.

The "normalization" we are talking about here (in the basic dictionary sense, not the Unicode technical term) is done simply because CR + LF is one `Character` (extended grapheme cluster in Unicode parlance), but two ASCII values. For the `Character` instance to produce a single `UInt8`, it has to somehow handle the two as one. To do that, it was decided to convert the pair to the equivalent UNIX line ending when it needs to be expressed as a single ASCII byte. The alternative design choice would have been to return `nil`, as is done for `≠` or any other Unicode-only character, but that design seems even less intuitive.
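The behavior described above can be checked directly (a minimal sketch; the output comments follow the documented behavior of `Character.asciiValue`):

```swift
// CR-LF is a single Character: one extended grapheme cluster, two ASCII scalars.
let crlf: Character = "\r\n"
let lf: Character = "\n"

// The pair is converted to the UNIX line ending when expressed as one byte.
print(crlf.asciiValue as Any)  // Optional(10)
print(lf.asciiValue as Any)    // Optional(10)

// But the two characters are not canonically equivalent, so they don't compare equal.
print(crlf == lf)              // false

// A Unicode-only character has no single ASCII byte at all.
let notEqualSign: Character = "≠"
print(notEqualSign.asciiValue as Any)  // nil
```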
That makes sense then, thank you. Yeah, with the alternative approach a string containing only ASCII Unicode scalars might still have characters whose `asciiValue` returns `nil`.
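That edge case is easy to demonstrate: every Unicode scalar in the string below is ASCII, yet one of its characters is the CR-LF pair, which under the alternative design would have produced `nil` (a minimal sketch):

```swift
let s = "a\r\nb"

// Every Unicode scalar in the string is ASCII...
print(s.unicodeScalars.allSatisfy { $0.isASCII })  // true

// ...yet the string has only three Characters, because "\r\n" is one grapheme cluster.
print(s.count)  // 3

// With the actual design, every Character here still yields an ASCII byte.
print(s.compactMap { $0.asciiValue })  // [97, 10, 98]
```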
Is this the right forum category for this discussion? Just checking.
The thread began as essentially, “The standard library seems to be doing this wrong. Should we fix it or not?” For that sentiment, I think it was posted in the right place.
(JIRA is another reasonable place for bringing this sort of thing up.)