[Pitch] Regular Expression Literals

Michael_Ilseman · October 15, 2021, 6:33pm

The idea is for different libraries to provide different semantics using the same regex syntax.

For example, a higher-level framework that knows the reader's current locale and/or the application domain might want to provide more linguistically sophisticated matching. Examples:

Digraphs, such as "ch" in Czech, could be treated as a single distinct letter for matching purposes, if applicable to the user's current language.
Ligatures, such as "ﬁ", might or might not compare equal to their expanded form "fi", depending on the application.
Word boundaries, such as \b, could incorporate large language dictionaries to better locate word boundaries in languages that don't separate words with whitespace (e.g. Chinese).
Fuzzy matching, such as matching the same word whether it is typed as a compound, properly hyphenated, or as two separate words (windswept, wind-swept, wind swept, wind-\n\s*swept, etc.).
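To make "same syntax, different semantics" concrete, here is a minimal Swift sketch, assuming a library could plug its own comparison strategy into matching. The protocol `MatchingSemantics` and the names `ExactSemantics`, `LinguisticSemantics`, and `containsWord(_:in:using:)` are hypothetical illustrations, not part of the pitch or any existing Swift API, and they match plain words rather than full regex syntax. The same word is matched under two semantics: one compares characters exactly, the other expands a couple of ligatures, treats "ch" as a single letter, and ignores hyphens and whitespace so compound spellings match. Dictionary-based word boundaries are left out.

```swift
// Hypothetical: a library-supplied strategy deciding which units of text
// get compared. This is NOT a real Swift or pitch API.
protocol MatchingSemantics {
    func canonicalUnits(_ text: String) -> [String]
}

// Default semantics: compare Character by Character, case-sensitively.
struct ExactSemantics: MatchingSemantics {
    func canonicalUnits(_ text: String) -> [String] {
        text.map { String($0) }
    }
}

// Hypothetical locale-aware semantics (assumed tables, for illustration only).
struct LinguisticSemantics: MatchingSemantics {
    var ligatures: [Character: String] = ["ﬁ": "fi", "ﬂ": "fl"]
    var digraphs: Set<String> = ["ch"]  // e.g. Czech "ch" as one letter

    func canonicalUnits(_ text: String) -> [String] {
        // Expand ligatures, lowercase, and drop hyphens/whitespace so that
        // "windswept", "wind-swept", and "wind swept" all look alike.
        var letters: [Character] = []
        for ch in text {
            if ch == "-" || ch.isWhitespace { continue }
            if let expansion = ligatures[ch] {
                letters.append(contentsOf: expansion)
            } else {
                letters.append(contentsOf: String(ch).lowercased())
            }
        }
        // Group configured digraphs into a single comparison unit.
        var units: [String] = []
        var i = 0
        while i < letters.count {
            if i + 1 < letters.count {
                let pair = String(letters[i]) + String(letters[i + 1])
                if digraphs.contains(pair) {
                    units.append(pair)
                    i += 2
                    continue
                }
            }
            units.append(String(letters[i]))
            i += 1
        }
        return units
    }
}

// True if `word` occurs as a contiguous run of units in `text`
// under the given semantics.
func containsWord<S: MatchingSemantics>(_ word: String, in text: String, using semantics: S) -> Bool {
    let needle = semantics.canonicalUnits(word)
    let haystack = semantics.canonicalUnits(text)
    if needle.isEmpty { return true }
    if needle.count > haystack.count { return false }
    for start in 0...(haystack.count - needle.count) {
        if haystack[start..<(start + needle.count)].elementsEqual(needle) {
            return true
        }
    }
    return false
}

let exact = ExactSemantics()
let linguistic = LinguisticSemantics()
print(containsWord("fine", in: "a ﬁne day", using: exact))               // false
print(containsWord("fine", in: "a ﬁne day", using: linguistic))          // true
print(containsWord("windswept", in: "wind-\n  swept", using: linguistic)) // true
print(containsWord("c", in: "chata", using: exact))                      // true
print(containsWord("c", in: "chata", using: linguistic))                 // false
```

The pattern text never changes between calls; only the semantics value does, which is the separation the post describes. A real design would need to decide how such a strategy interacts with every regex construct, not just literal words.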