Declarative String Processing Overview

Michael_Ilseman · October 6, 2021, 9:58pm

When applied with grapheme-cluster semantics (the default when applied to a String), it would match grapheme cluster by grapheme cluster, and comparisons would obey canonical equivalence. Some features might not be supported, e.g. generalizing certain scalar properties to grapheme clusters. Resulting indices would be grapheme-cluster aligned.
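The standard library's existing String and Character comparisons already behave this way, which gives a feel for what grapheme-cluster matching would mean. The sketch below does not use the proposed matching API (its shape is not settled here). `firstGraphemeMatch(of:in:)` is a hypothetical helper that performs a naive substring search one Character at a time, relying only on `Character`'s built-in canonical-equivalence comparison.

```swift
// Two spellings of "café": precomposed U+00E9 vs. "e" followed by U+0301 COMBINING ACUTE ACCENT.
let precomposed = "caf\u{E9}"
let decomposed = "cafe\u{301}"

// String and Character comparison obey canonical equivalence.
print(precomposed == decomposed)            // true
print(precomposed.count, decomposed.count)  // 4 4

/// Hypothetical helper: finds the first occurrence of `needle` in `haystack`,
/// comparing one Character (grapheme cluster) at a time. Any range it returns
/// is Character-aligned, because it only ever steps by whole Characters.
func firstGraphemeMatch(of needle: String, in haystack: String) -> Range<String.Index>? {
    var start = haystack.startIndex
    while true {
        var h = start
        var n = needle.startIndex
        // Advance while the Characters are canonically equivalent.
        while n < needle.endIndex, h < haystack.endIndex, haystack[h] == needle[n] {
            h = haystack.index(after: h)
            n = needle.index(after: n)
        }
        if n == needle.endIndex { return start..<h }  // whole needle matched
        if start == haystack.endIndex { return nil }  // no start positions left
        start = haystack.index(after: start)
    }
}

// "é" matches the decomposed "e\u{301}" as a single Character...
print(firstGraphemeMatch(of: "\u{E9}", in: decomposed) != nil)  // true
// ...but a bare "e" does not match the first half of that cluster.
print(firstGraphemeMatch(of: "e", in: decomposed) == nil)       // true
```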

When applied with scalar semantics (the default when applied to String.UnicodeScalarView), it would match scalar by scalar with binary semantics. Resulting indices would be scalar aligned.
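For contrast, here is a scalar-by-scalar version of the same naive search. `firstScalarMatch(of:in:)` is again a hypothetical helper, not a proposed API; it compares `Unicode.Scalar` values exactly over `String.UnicodeScalarView`, so there is no canonical equivalence and a match can end partway through a grapheme cluster.

```swift
let precomposed = "caf\u{E9}"
let decomposed = "cafe\u{301}"

// Scalar-by-scalar comparison is binary: U+00E9 is not U+0065 U+0301.
print(precomposed.unicodeScalars.elementsEqual(decomposed.unicodeScalars))  // false
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count)    // 4 5

/// Hypothetical helper: finds the first occurrence of `needle`'s scalars in
/// `haystack`'s scalars, comparing scalar values exactly. Returned ranges are
/// scalar-aligned but not necessarily Character-aligned.
func firstScalarMatch(of needle: String, in haystack: String) -> Range<String.Index>? {
    let hay = haystack.unicodeScalars
    let pat = needle.unicodeScalars
    var start = hay.startIndex
    while true {
        var h = start
        var n = pat.startIndex
        // Advance while the scalar values are identical.
        while n < pat.endIndex, h < hay.endIndex, hay[h] == pat[n] {
            h = hay.index(after: h)
            n = pat.index(after: n)
        }
        if n == pat.endIndex { return start..<h }  // whole needle matched
        if start == hay.endIndex { return nil }    // no start positions left
        start = hay.index(after: start)
    }
}

// A bare "e" now matches the first scalar of the "e\u{301}" cluster...
if let r = firstScalarMatch(of: "e", in: decomposed) {
    // ...and the match ends in the middle of a Character.
    print(r.upperBound.samePosition(in: decomposed) == nil)  // true
}
```

`String.Index.samePosition(in:)` returns `nil` when an index does not fall on a Character boundary, which is exactly the "scalar-aligned but not grapheme-cluster aligned" case described above.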

Application to one of the encoded views is TBD, but it will probably adhere closely to scalar semantics.

The two modes would also likely have different character classes: one set that maps to Character properties (and we should add new ones as part of this effort), and one that follows the usual ands and ors over Unicode scalar properties. Character classes would likely be customizable (e.g. a POSIX mode, or even a user-supplied class); the mechanism is TBD.
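One way to picture the two families of classes, assuming classes are modeled as plain predicates: the `CharacterClass`/`ScalarClass` typealiases and the `union`/`intersection` combinators below are hypothetical, since the mechanism is TBD. It shows a grapheme-mode digit class built on the existing `Character.isWholeNumber` property, a scalar-mode one built on `Unicode.Scalar.Properties.generalCategory`, and a POSIX-style ASCII-only customization.

```swift
// Hypothetical class representations: plain predicates.
typealias CharacterClass = (Character) -> Bool
typealias ScalarClass = (Unicode.Scalar) -> Bool

// Grapheme-cluster mode: a class backed by a Character property.
let graphemeDigit: CharacterClass = { $0.isWholeNumber }

// Scalar mode: a class backed by a Unicode scalar property (General_Category = Nd).
let scalarDigit: ScalarClass = { $0.properties.generalCategory == .decimalNumber }

// A POSIX-style customization: ASCII 0-9 only.
let posixDigit: ScalarClass = { (0x30...0x39).contains($0.value) }

// Scalar classes compose with the usual ands and ors.
func union(_ a: @escaping ScalarClass, _ b: @escaping ScalarClass) -> ScalarClass {
    return { a($0) || b($0) }
}
func intersection(_ a: @escaping ScalarClass, _ b: @escaping ScalarClass) -> ScalarClass {
    return { a($0) && b($0) }
}

// ASCII letters or digits: (Alphabetic OR Nd) AND ASCII.
let asciiAlnum = intersection(union({ $0.properties.isAlphabetic }, scalarDigit),
                              { $0.isASCII })

// "7", ARABIC-INDIC DIGIT THREE (U+0663), CIRCLED DIGIT ONE (U+2460).
let sample = "7\u{663}\u{2460}"
print(sample.filter(graphemeDigit).count)               // 3
print(sample.unicodeScalars.filter(scalarDigit).count)  // 2
print(sample.unicodeScalars.filter(posixDigit).count)   // 1
```

`Character.isWholeNumber` accepts CIRCLED DIGIT ONE (its numeric value is 1), while General_Category = Nd does not, so the two modes can disagree even on single-scalar characters.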