It is surprisingly hard to find the answer to this question! From reading the source code of the standard library, it seems String.lowercased calls Unicode.Scalar.Properties.lowercaseMapping. Is this the same as the en_US_POSIX locale?
It doesn't use any locale. It uses the lowercase case mapping defined by the Unicode standard, which is locale-independent.
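To illustrate: a small sketch that rebuilds lowercasing one scalar at a time from Unicode.Scalar.Properties.lowercaseMapping and compares it with String.lowercased() on a few sample strings. The helper name lowercasedViaScalarMappings is hypothetical, and the per-scalar equivalence is only assumed from the source reading in the question and checked on these samples, not guaranteed for every string or Swift version. No Locale appears anywhere, and U+0130 (İ) shows the Unicode default mapping being applied even though Turkish rules would map it to a plain "i".

```swift
// Hypothetical helper: concatenates each scalar's Unicode default lowercase
// mapping. No Locale is consulted anywhere in this function.
func lowercasedViaScalarMappings(_ s: String) -> String {
    var result = ""
    for scalar in s.unicodeScalars {
        // lowercaseMapping is a String because some scalars lowercase to
        // several scalars (e.g. U+0130 becomes U+0069 U+0307).
        result += scalar.properties.lowercaseMapping
    }
    return result
}

let samples = ["Hello, World", "ÀÉÎ", "İstanbul", "Straße"]
for s in samples {
    let viaStdlib = s.lowercased()
    let viaScalars = lowercasedViaScalarMappings(s)
    print(s, "->", viaStdlib, viaStdlib == viaScalars)
}

// Unicode default mapping of U+0130: "i" followed by U+0307 COMBINING DOT ABOVE,
// regardless of the user's locale.
let dottedCapitalI = "\u{0130}".lowercased()
print(dottedCapitalI.unicodeScalars.map { String($0.value, radix: 16) })
```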
If I have a database with a unique index on a string field, and I want to validate from Swift that all keys are unique under String.lowercased before performing an insert, what collation should I select when creating the index?
The closest thing I could think of is en_US_POSIX with the secondary ICU comparison level, but I have no idea whether this is actually correct.
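A minimal sketch of the Swift-side check, assuming the keys fit in memory. It groups keys by String.lowercased() and reports every group with more than one member. The helper name caseInsensitiveCollisions is hypothetical, and it only tells you whether keys are unique under lowercased(); it says nothing about what a particular database collation will decide.

```swift
// Hypothetical helper: groups keys by their lowercased() form and returns
// only the groups that contain more than one original key.
func caseInsensitiveCollisions(in keys: [String]) -> [String: [String]] {
    var groups: [String: [String]] = [:]
    for key in keys {
        groups[key.lowercased(), default: []].append(key)
    }
    return groups.filter { $0.value.count > 1 }
}

let keys = ["Swift", "swift", "SWIFT", "Rust", "Go"]
let collisions = caseInsensitiveCollisions(in: keys)
if collisions.isEmpty {
    print("all keys are unique under lowercased()")
} else {
    for (lowered, originals) in collisions {
        print("\(lowered): \(originals)")
    }
}
```

To validate a single insert, the same helper could be run over the existing keys plus the new one.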
Just be careful: Swift and your database engine may disagree about whether a key is unique or a duplicate, because their Unicode data tables may come from different versions. Each may also change its answer as it is updated with newer tables.
See the recent pitch thread for Unicode normalisation APIs, and in particular the discussion around stable normalisations.
I suppose this is probably unavoidable. But if I wanted to minimize the probability of disagreement (e.g., to minimize the number of HTTP redirect cycles in a URL router), what is the best way to align both sides?
For example, is there any advantage to using String.lowercased(with:) with the language set to en?
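One empirical way to explore this for your own keys is to compare the two functions directly. This sketch assumes Foundation is available (lowercased(with:) is a Foundation method on StringProtocol, not part of the standard library); the helper lowercasingDiffers and the sample strings are hypothetical. Turkish is included because its tailoring lowercases "I" to dotless "ı", which the locale-independent lowercased() never produces. Results for the other locales may vary by platform and Foundation version, so no output is claimed here.

```swift
import Foundation

// Hypothetical helper: true when Foundation's locale-aware lowercasing
// disagrees with the standard library's locale-independent lowercased().
func lowercasingDiffers(_ s: String, locale: Locale) -> Bool {
    return s.lowercased(with: locale) != s.lowercased()
}

let candidates = ["Hello", "URL-Router", "İstanbul", "IRMAK"]
for identifier in ["en", "en_US_POSIX", "tr"] {
    let locale = Locale(identifier: identifier)
    // Collect the candidates whose two lowercasings disagree for this locale.
    let differing = candidates.filter { lowercasingDiffers($0, locale: locale) }
    print(identifier, differing)
}
```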
I would check from the Swift side whether the string contains unassigned code points.
If it does, Swift can't say for certain whether the strings are unique or duplicates. The uncertainty is that strings which are considered unique today may be duplicates tomorrow. No strings ever go from duplicate to unique.
If it does not, Swift can definitely say whether the strings are unique or duplicates. So you can put them in the DB knowing that nobody will ever think those keys are duplicates.
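A sketch of that check using Unicode.Scalar.Properties.generalCategory, where .unassigned corresponds to the Cn category. The answer depends on the Unicode version built into the Swift runtime executing the code. The names containsUnassignedScalars, compareKeys and UniquenessVerdict are hypothetical; compareKeys simply encodes the reasoning above, treating a duplicate verdict as final and a unique verdict as final only when no unassigned scalars are involved.

```swift
// Hypothetical helper: true if any scalar is unassigned (general category Cn)
// according to the Unicode tables of the running Swift runtime.
func containsUnassignedScalars(_ s: String) -> Bool {
    return s.unicodeScalars.contains { $0.properties.generalCategory == .unassigned }
}

enum UniquenessVerdict {
    case duplicate      // equal under lowercased(); per the post, stays a duplicate
    case unique         // no unassigned scalars, so the verdict is treated as stable
    case uniqueForNow   // might become a duplicate once the scalars are assigned
}

func compareKeys(_ a: String, _ b: String) -> UniquenessVerdict {
    if a.lowercased() == b.lowercased() { return .duplicate }
    if containsUnassignedScalars(a) || containsUnassignedScalars(b) { return .uniqueForNow }
    return .unique
}

print(compareKeys("Swift", "swift"))
print(compareKeys("Swift", "Rust"))
// U+0378 is unassigned in current Unicode versions.
print(compareKeys("key\u{0378}", "key"))
```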
As for the locale-related aspect, I don't know. But the normalisation stability issue remains regardless, so I figured it was worth mentioning.