This is a very important type - our new broadest text currency type.
I need more time to digest it, but with regard to normalisation checks, and quick-check in particular, I can share some of the things I considered when omitting it from the normalisation proposal.
Basically, when you perform a quick check for normalisation, you can end up with one of three answers:
- YES - definitely is normalised
- NO - definitely is not normalised
- MAYBE - could be normalised, but a more comprehensive (non-"quick") check is needed to say for sure
If we return a Bool, we would need to collapse NO and MAYBE to the same state, losing important information ("do I need to bother to run the comprehensive check?").
So you might think the ideal result looks something like this:
enum QuickCheckResult {
    case yes
    case no
    case maybe
}
Except that's not quite ideal, either. You see, if we reached a MAYBE character, it means there is some prefix of the String which was entirely YES (otherwise we would have exited early). If we wanted to resolve that MAYBE condition, we'd really want to preserve that information. So maybe it would instead look something like this:
enum QuickCheckResult<C: Collection> {
    case yes
    case no
    case maybe(requiresCheckFrom: C.Index)
}
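To make that concrete, here is a minimal sketch of a scanner that produces such a result. Everything in it is an assumption for illustration: `quickCheck`, `QuickCheckProperty`, and `toyNFCQuickCheck` are hypothetical names, and the three-entry table stands in for the real NFC_Quick_Check data, which the standard library does not expose publicly. It exits at the first MAYBE and reports the most recent starter (combining class 0) that was quick-check YES as the place to resume from.

```swift
// Hypothetical sketch: none of these names come from the standard library
// or from the normalisation proposal.
enum QuickCheckResult<C: Collection> {
    case yes
    case no
    case maybe(requiresCheckFrom: C.Index)
}

enum QuickCheckProperty { case yes, no, maybe }

// Toy stand-in for the NFC_Quick_Check property. Only a handful of scalars
// are covered; a real implementation would use the full Unicode data.
func toyNFCQuickCheck(_ scalar: Unicode.Scalar) -> QuickCheckProperty {
    switch scalar.value {
    case 0x0340, 0x0341, 0x212B: return .no      // NFC_QC=No
    case 0x0300...0x0304:        return .maybe   // NFC_QC=Maybe
    default:                     return .yes     // assumption: everything else is YES
    }
}

func quickCheck<C: Collection>(_ scalars: C) -> QuickCheckResult<C>
where C.Element == Unicode.Scalar {
    var lastCCC: UInt8 = 0
    // Start of the segment that would need a full check if we hit a MAYBE.
    var segmentStart = scalars.startIndex
    var i = scalars.startIndex
    while i < scalars.endIndex {
        let scalar = scalars[i]
        let ccc = scalar.properties.canonicalCombiningClass.rawValue
        // Combining marks out of canonical order are never normalised.
        if ccc != 0 && lastCCC > ccc { return .no }
        switch toyNFCQuickCheck(scalar) {
        case .no:
            return .no
        case .maybe:
            // Everything before segmentStart was entirely YES.
            return .maybe(requiresCheckFrom: segmentStart)
        case .yes:
            if ccc == 0 { segmentStart = i }
        }
        lastCCC = ccc
        scalars.formIndex(after: &i)
    }
    return .yes
}
```

With this toy table, `quickCheck("cafe\u{301}".unicodeScalars)` stops at U+0301 and reports the index of the preceding "e", so only that last segment needs the comprehensive check.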
And the way you want to use it would be something like:
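var i = text.startIndex
while i < text.endIndex {
    // Quick-check only the part of the text that has not been verified yet.
    let result = text[i...].isNormalized_QuickCheck(.nfc)
    switch result {
    case .yes:
        return true
    case .no:
        return false
    case .maybe(requiresCheckFrom: let startOfRemainder):
        // Slow path: fully check just the segment containing the MAYBE character.
        let (isNormalized, resumeFrom) = text.isNormalized_resolveQuickCheckMaybe(.nfc, from: startOfRemainder)
        guard isNormalized else {
            return false
        }
        i = resumeFrom
    }
}
// Every MAYBE segment was resolved and we reached the end: the text is normalised.
return true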
This is very, very rough and we can probably clean it up a lot. The reason I omitted this from the normalisation proposal was so I could avoid thinking about the design of it too much, so these mental sketches are as much as I've got (normalisation is a big enough proposal as it is).
But essentially what you want is to keep using the quick-check algorithm as much as possible, and only fall back to the slow path when you really, truly have no other choice.
This algorithm is already implemented in the isNormalized function I wrote for the normalisation proposal.
(Okay, admittedly it's structured slightly differently from the above because it tracks when a segment containing a MAYBE character ends, not when it begins, but conceptually it's the same thing.)
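For the slow-path half of that idea, here is a rough sketch of what resolving a single MAYBE segment could look like. It is not the proposal's implementation: `resolveQuickCheckMaybe` is a hypothetical name, it leans on Foundation's `precomposedStringWithCanonicalMapping` to produce NFC, and it uses a simplified rule for where the segment ends (the next scalar with combining class 0).

```swift
import Foundation

// Hypothetical helper: the name, signature, and boundary rule are assumptions
// made for illustration, not the proposal's API.
func resolveQuickCheckMaybe(
    _ scalars: String.UnicodeScalarView,
    from start: String.UnicodeScalarView.Index
) -> (isNormalized: Bool, resumeFrom: String.UnicodeScalarView.Index) {
    precondition(start < scalars.endIndex, "start must point at a scalar")

    // Simplification: treat the next scalar with combining class 0 as the end
    // of the segment. A real implementation would also require that scalar to
    // be NFC_Quick_Check=Yes, since some starters combine with what precedes them.
    var end = scalars.index(after: start)
    while end < scalars.endIndex,
          scalars[end].properties.canonicalCombiningClass.rawValue != 0 {
        scalars.formIndex(after: &end)
    }

    // Slow path: normalise just this segment and compare scalar by scalar.
    let segment = String(String.UnicodeScalarView(scalars[start..<end]))
    let nfc = segment.precomposedStringWithCanonicalMapping
    let isNormalized = nfc.unicodeScalars.elementsEqual(segment.unicodeScalars)
    return (isNormalized, end)
}
```

The point of returning `resumeFrom` is that the caller can go straight back to the quick check from there, so the expensive comparison only ever touches the short segment around the MAYBE character.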
Another thing to consider: in a language like Swift, where Strings are compared with canonical equivalence by default, you very rarely need to manually normalise text. Anybody who is even going down this road, manually normalising or checking for normalisation, is already a bit of an advanced user. If they're reaching for a normalisation quick-check, no less, I think it's safe to say they are the kind of developer who would appreciate us returning the full fidelity of information captured by the quick-check algorithm.