I think CharacterSet is misleading, confusing and probably dangerous: its name and its documentation do not match its semantics.
Despite its name, it doesn't correctly handle Characters, and it doesn't correctly handle UnicodeScalars either.
A BMP scalar (one in the Basic Multilingual Plane, whose value fits in 16 bits) is not the same thing as a UnicodeScalar or Unicode.Scalar, as can be seen in the above link and as this little program demonstrates:
let str = "😀A"
for scalar in str.unicodeScalars {
    let hexValue = String(scalar.value, radix: 16)
    print("UnicodeScalar:", scalar)
    print(" hexValue:", hexValue)
}
which prints:
UnicodeScalar: 😀
hexValue: 1f600
UnicodeScalar: A
hexValue: 41
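To make the 16-bit limit concrete, here is a minimal sketch that tests each scalar against it. The helper `isInBMP` is a hypothetical name of my own, not something Foundation or the standard library provides; it only assumes that the BMP covers values up to 0xFFFF.

```swift
// Hypothetical helper (not part of Foundation): true when the scalar lies in
// the Basic Multilingual Plane, i.e. its value fits in 16 bits.
func isInBMP(_ scalar: Unicode.Scalar) -> Bool {
    return scalar.value <= 0xFFFF
}

for scalar in "😀A".unicodeScalars {
    // Scalars outside the BMP need two UTF-16 code units (a surrogate pair).
    let utf16Units = String(scalar).utf16.count
    print(scalar, "inBMP:", isInBMP(scalar), "utf16 units:", utf16Units)
}
// 😀 inBMP: false utf16 units: 2
// A inBMP: true utf16 units: 1
```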
Note that 1f600 does not fit in 16 bits, so "😀" is clearly not representable as a BMP scalar (although it is representable as a UnicodeScalar).
The danger of CharacterSet is that it will happily accept input that it cannot handle correctly, and when doing so, it shows strange behavior like the following:
import Foundation
let cs1 = CharacterSet(charactersIn: "🙂🙁")
print(cs1.contains("🙂")) // true
print(cs1.contains("🙁")) // true
let cs2 = CharacterSet(charactersIn: "🙂🙁A")
print(cs2.contains("🙂")) // false (!)
print(cs2.contains("🙁")) // false (!)
print(cs2.contains("A")) // true
And another example:
import Foundation
var cs1 = CharacterSet()
cs1.insert("\u{1F600}")
print(cs1.contains("\u{1F600}")) // true
var cs2 = CharacterSet(charactersIn: "a")
cs2.insert("\u{1F600}")
print(cs2.contains("\u{1F600}")) // false (!)
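For contrast, here is a sketch of the same membership checks against plain standard-library sets. This is only a comparison of my own, not something the CharacterSet documentation suggests, and it covers only membership testing, not the Foundation APIs that take a CharacterSet.

```swift
// A set of Unicode scalars built from the same string as cs2 in the first example.
let scalars = Set("🙂🙁A".unicodeScalars)
print(scalars.contains("🙂")) // true
print(scalars.contains("🙁")) // true
print(scalars.contains("A"))  // true

// Inserting a non-BMP scalar into a set that already holds a BMP scalar.
var more: Set<Unicode.Scalar> = ["a"]
more.insert("\u{1F600}")
print(more.contains("\u{1F600}")) // true

// A set of Characters (grapheme clusters) built from the same string.
let characters = Set("🙂🙁A")
print(characters.contains("🙂")) // true
```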
The true semantics of CharacterSet are ... mysterious, or at least not at all what its name suggests or what its current documentation says:
A CharacterSet represents a set of Unicode-compliant characters. Foundation types use CharacterSet to group characters together for searching operations, so that they can find any of a particular set of characters during a search.
This type provides “copy-on-write” behavior, and is also bridged to the Objective-C NSCharacterSet class.