SE-0357: Regex String Processing Algorithms

Michael_Ilseman · May 19, 2022, 8:04pm

That would be the semantic definition of the any character class, which would probably be formalized as part of [Pitch] Unicode for String Processing.

Minor process note: we should probably treat the definitions in the syntax description as temporary and update them after the Unicode pitch happens. I do think it's clearer to have them there, but they're not normative.

That being said, could we discuss the departure from other languages more? Perl, Python, ICU/NSRegularExpression, Rust, JavaScript, C#, ... all anchor to the beginning/end of input unless multi-line is specified. I couldn't quickly figure out Ruby, but that might be the language that diverges from the pack here.
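
For concreteness, here is a minimal sketch of that default in Python, one of the languages listed above. It assumes the behavior in question is that of the `^` and `$` anchors; the sample text and variable names are made up for illustration.

```python
import re

# Hypothetical three-line input, for illustration only.
text = "first\nsecond\nthird"

# Default: ^ and $ anchor to the start and end of the whole input
# (Python's $ also matches just before a single trailing newline).
default_matches = re.findall(r"^\w+$", text)

# With re.MULTILINE, ^ and $ also match at every line boundary.
multiline_matches = re.findall(r"^\w+$", text, flags=re.MULTILINE)

print(default_matches)    # []
print(multiline_matches)  # ['first', 'second', 'third']
```

Without the flag, the pattern has to span the entire input, so nothing matches; with it, each line matches on its own.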

It might end up being the case that we argue for multi-line as the default.