I wouldn't say that it is incorrect for Swift. I would say that it seems inconsistent with a view of String as a collection of graphemes. These semantics may be perfect for, e.g., String.UnicodeScalarView.trimmed().
Degenerate graphemes are a reality that we have to deal with, and we (unfortunately) cannot have a nice algebraic String because of them. But, it does seem odd for String to have a common-use API that produces them.
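To make the "not algebraic" point concrete, here is a minimal sketch using only standard library calls. A lone combining mark is a degenerate grapheme that counts as one Character, yet appending it to a base letter does not increase the count, so count is not additive under concatenation. The constant names are just for illustration.

```swift
// A lone combining acute accent is a degenerate grapheme: it still counts
// as one Character when it has nothing to attach to.
let base = "e"
let mark = "\u{301}"
let joined = base + mark  // the mark attaches to "e", forming one grapheme

print(base.count, mark.count, joined.count)  // 1 1 1
// So joined.count != base.count + mark.count.
```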
I don't think I communicated my point well. My example was meant to demonstrate some corner cases that we need to have a strategy for, even if a random package's implementation doesn't care about the details. I see (at least) 3 potential results, each of which has some upsides and downsides. The question is what the result should be.
I’ll use \u{020} to represent a space explicitly.
"\u{020}\u{301}abc\n\u{301}".trimmed() could yield:
1. "abc", under the view that String APIs operate on graphemes and a grapheme leading with whitespace is considered whitespace.
2. "\u{020}\u{301}abc\n\u{301}", under the view that String APIs operate on graphemes and a grapheme with a combining character is not whitespace.
3. "\u{301}abc\n\u{301}", which results in a degenerate leading grapheme.
I feel like #3 makes sense for a trimmed() on the UnicodeScalarView or the code unit views. For String, we need to figure out which of these 3 (or perhaps an unknown 4th option) is the least harmful behavior.
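Here is a rough sketch of how the three behaviors could be expressed, just to make the trade-off easier to poke at. Everything here is hypothetical: trimmedGraphemes, trimmedScalars, and the two predicates are names I made up, not standard library API, and this is not a proposal for how trimmed() should be implemented. I also swapped the trailing \n for a space in the sample, because as far as I can tell grapheme breaking always splits after a line feed, so "\n\u{301}" would already be two graphemes and the trailing end would not exercise the same case as the leading end.

```swift
// Hypothetical helper: trims whole graphemes from both ends for as long as
// `isTrimmable` says the grapheme counts as whitespace.
func trimmedGraphemes(_ s: String, where isTrimmable: (Character) -> Bool) -> String {
    guard let start = s.firstIndex(where: { !isTrimmable($0) }),
          let end = s.lastIndex(where: { !isTrimmable($0) }) else { return "" }
    return String(s[start...end])
}

// Hypothetical helper: trims whitespace scalars from both ends, ignoring
// grapheme boundaries entirely.
func trimmedScalars(_ s: String) -> String {
    let scalars = s.unicodeScalars
    guard let start = scalars.firstIndex(where: { !$0.properties.isWhitespace }),
          let end = scalars.lastIndex(where: { !$0.properties.isWhitespace }) else { return "" }
    return String(String.UnicodeScalarView(scalars[start...end]))
}

// View 1: a grapheme is whitespace if its first scalar is whitespace.
func leadsWithWhitespace(_ ch: Character) -> Bool {
    return ch.unicodeScalars.first?.properties.isWhitespace ?? false
}

// View 2: a grapheme is whitespace only if every scalar in it is whitespace.
func isAllWhitespace(_ ch: Character) -> Bool {
    return ch.unicodeScalars.allSatisfy { $0.properties.isWhitespace }
}

// Assumed sample: space + combining acute on both ends of "abc".
let sample = "\u{020}\u{301}abc\u{020}\u{301}"

let byLeadingScalar = trimmedGraphemes(sample, where: leadsWithWhitespace)
let byAllScalars = trimmedGraphemes(sample, where: isAllWhitespace)
let byScalarView = trimmedScalars(sample)

// Print the scalar values so the combining marks stay visible.
for result in [byLeadingScalar, byAllScalars, byScalarView] {
    print(result.unicodeScalars.map { String($0.value, radix: 16) })
}
```

With this sample, the first call leaves just "abc", the second leaves the string untouched, and the third drops only the leading space, so the result starts with a bare U+0301. The trailing space survives in the third case because the last scalar is the combining mark, not whitespace.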