I did not want to put this in the ongoing review topic of the proposal. I'm leaving it here because I don't know of a better place.
Problem: Some custom character classes containing digits match number-like grapheme clusters.
// this matches:
try /[1-2]/.wholeMatch(in: "1️⃣")
// still matches:
try /[1-2]/.asciiOnlyDigits().wholeMatch(in: "1️⃣")
// does not match:
try /[12]/.wholeMatch(in: "1️⃣")
The behavior described above seems inconsistent and difficult to predict. Shouldn't [1-2] and [12] be identical? Should either of them match anything outside of ASCII?
Note:
1️⃣ is U+0031 (ASCII digit 1) U+FE0F (VARIATION SELECTOR-16) U+20E3 (COMBINING ENCLOSING KEYCAP)
The same is true for 1︎⃣: U+0031 (ASCII digit 1) U+FE0E (VARIATION SELECTOR-15) U+20E3 (COMBINING ENCLOSING KEYCAP)
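For anyone who wants to check the composition of these clusters themselves, here is a minimal sketch using only the standard library. The helper `scalarCodePoints(of:)` and the constant names are hypothetical, made up for this illustration. It only inspects the scalars and makes no claim about how the regex engine treats them.

```swift
// Hypothetical helper (not part of any proposal): lists the code points
// of the Unicode scalars that make up a single Character.
func scalarCodePoints(of character: Character) -> [String] {
    character.unicodeScalars.map { scalar in
        let hex = String(scalar.value, radix: 16, uppercase: true)
        // Left-pad to at least four hex digits, e.g. "31" becomes "0031".
        let padding = String(repeating: "0", count: max(0, 4 - hex.count))
        return "U+" + padding + hex
    }
}

// Both keycap forms are a single grapheme cluster built on ASCII "1".
let emojiKeycap: Character = "1\u{FE0F}\u{20E3}"  // with VARIATION SELECTOR-16
let textKeycap: Character = "1\u{FE0E}\u{20E3}"   // with VARIATION SELECTOR-15

for keycap in [emojiKeycap, textKeycap] {
    // Print the code points on one line, then each scalar's Unicode name.
    print(scalarCodePoints(of: keycap).joined(separator: " "))
    for scalar in keycap.unicodeScalars {
        print("  ", scalar.properties.name ?? "(unnamed)")
    }
}
```

Since each value is one `Character` whose first scalar is U+0031, the question above comes down to whether a range like [1-2] should compare whole characters or only that leading scalar.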
scanon
(Steve Canon)
Note: if this isn't a bug and is instead expected but debatable behavior, the right thread to post it in would be the other review thread, SE-0355: Regex Syntax and Runtime Construction, rather than the literals mega-thread. The issue isn't specific to literals; it concerns the behavior of regex syntax.
More specifically, it would be [Pitch] Unicode for String Processing which proposes semantics, including Character ranges and asciiOnlyDigits.