Bad Digit Matching: Bug report regarding SE-0354 Regex Literals

I did not want to put this in the ongoing review topic of the proposal. I'm leaving it here because I don't know of a better place.

Problem: Some digit character groups match number-like grapheme clusters.

// this matches:
try /[1-2]/.wholeMatch(in: "1️⃣")

// still matches:
try /[1-2]/.asciiOnlyDigits().wholeMatch(in: "1️⃣")

// does not match:
try /[12]/.wholeMatch(in: "1️⃣")

The behavior described above seems inconsistent and difficult to predict. Shouldn't [1-2] and [12] be identical? Should they match anything outside of ASCII?

Note: 1️⃣ is U+0031 (ASCII digit 1) U+FE0F (VARIATION SELECTOR-16) U+20E3 (COMBINING ENCLOSING KEYCAP)

The same is true for 1︎⃣: U+0031 (ASCII digit 1) U+FE0E (VARIATION SELECTOR-15) U+20E3 (COMBINING ENCLOSING KEYCAP)
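To make the scalar breakdown concrete, here is a minimal sketch that needs only the standard library. It lists the scalars of the keycap character and then checks how it compares to the plain digits as a `Character`. The names `keycap` and `digitRange` are mine, for illustration only. Whether the regex engine evaluates `[1-2]` with this same comparison is an assumption on my part and not something established here.

```swift
// The keycap emoji from the post, spelled out scalar by scalar.
let keycap: Character = "1\u{FE0F}\u{20E3}"

// A single Character (grapheme cluster) built from three Unicode scalars.
let scalars = Array(keycap.unicodeScalars)
for scalar in scalars {
    let hex = String(scalar.value, radix: 16, uppercase: true)
    print("U+\(hex)", scalar.properties.name ?? "<unnamed>")
}

// Compare the keycap against the plain digits as Characters.
// A ClosedRange<Character> is only an analogy for the regex range [1-2];
// that the engine uses the same ordering is an assumption.
let digitRange: ClosedRange<Character> = "1"..."2"
let inRange = digitRange.contains(keycap)
let equalsOne = keycap == "1"
let equalsTwo = keycap == "2"
print(inRange, equalsOne, equalsTwo)
```

If ranges are evaluated by ordering while `[12]` is evaluated by equality against each member, that would be one way to get the difference reported above: the keycap sorts between "1" and "2" but equals neither.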


Bugs are best reported via GitHub issues on either the swift repo or the string-processing repo.

I went ahead and did this for you: "Digit matching behaving as intended?" (Issue #401, apple/swift-experimental-string-processing on GitHub)


Note: if this isn't a bug but expected (if debatable) behavior, the right thread to post it in would be the other review thread, SE-0355: Regex Syntax and Runtime Construction, rather than the literals mega-thread, since the issue isn't specific to literals so much as to the behavior of regex syntax.


More specifically, it would be [Pitch] Unicode for String Processing, which proposes the semantics, including those of Character ranges and asciiOnlyDigits.
