String hygiene

If you implemented this using trimmingCharacters, then it's actually what NSString returned. Since we'll want this for the standard library itself, we'll need an implementation that doesn't depend on Foundation (edit: this is an implementation detail, not important to this discussion). We'll need to define the semantics in these situations.

More precisely:

Array(" \u{301}abc\n\u{301}".trimmed().unicodeScalars)
// => Array<Unicode.Scalar> = ["\u{0301}", "a", "b", "c", "\n", "\u{0301}"]

So it did drop the leading space, leaving the combining scalar U+0301 that followed it without a base. This means that:

let str = " \u{301}abc\n\u{301}"
str.count // => 6
str.trimmed().count // => 6
(str + str).count // => 12
(str + str.trimmed()).count // => 11

This might not be intuitive: trimming leaves an isolated combining scalar at the start of the result, and when that result is appended to str it merges with str's trailing combining scalar into a single Character (hence 11 rather than 12). This is further complicated by the fact that Unicode before 4.1 recommended a space followed by a combining scalar as the technique for displaying isolated combining scalars. If someone was doing this, they certainly don't want it trimmed! But this was a bad idea (because computers use spaces as separators), and they dropped the suggestion in 4.1 and later.

I'm not arguing either way; I'm just curious what the proper semantics should be. There might not be a good answer, in which case we'd have to figure out what the least bad answer is.
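
To make the question concrete, here is a rough sketch of two possible Foundation-free semantics. The names trimmedByScalar, trimmedByCharacter and isEntirelyWhitespace are made up for illustration and are not existing or proposed API; the only standard library pieces assumed are Unicode.Scalar.Properties.isWhitespace, Character.unicodeScalars and ordinary collection indexing. The first version trims scalar by scalar, which matches the result shown above. The second never splits a grapheme cluster, so a space carrying a combining scalar is left alone.

```swift
extension Character {
    // Hypothetical helper: true only when every scalar in this grapheme
    // cluster has the Unicode White_Space property.
    var isEntirelyWhitespace: Bool {
        unicodeScalars.allSatisfy { $0.properties.isWhitespace }
    }
}

extension String {
    // Hypothetical scalar-level trim: drops leading and trailing whitespace
    // scalars, even when that splits a grapheme cluster.
    func trimmedByScalar() -> String {
        let scalars = unicodeScalars
        var start = scalars.startIndex
        var end = scalars.endIndex
        while start < end, scalars[start].properties.isWhitespace {
            scalars.formIndex(after: &start)
        }
        while start < end {
            let previous = scalars.index(before: end)
            guard scalars[previous].properties.isWhitespace else { break }
            end = previous
        }
        // Rebuild a String from the surviving scalars.
        var kept = String.UnicodeScalarView()
        kept.append(contentsOf: scalars[start..<end])
        return String(kept)
    }

    // Hypothetical Character-level trim: drops only leading and trailing
    // Characters that consist entirely of whitespace scalars.
    func trimmedByCharacter() -> String {
        var start = startIndex
        var end = endIndex
        while start < end, self[start].isEntirelyWhitespace {
            formIndex(after: &start)
        }
        while start < end {
            let previous = index(before: end)
            guard self[previous].isEntirelyWhitespace else { break }
            end = previous
        }
        return String(self[start..<end])
    }
}
```

With str as defined above, the scalar-level version reproduces the surprising (str + str.trimmedByScalar()).count of 11, while the Character-level version returns str unchanged, so the concatenated count stays at 12. Neither is obviously right: the second keeps counts additive but leaves a visible leading space in place.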
