[Pitch] Regular Expression Literals

The idea is for different libraries to provide different semantics using the same regex syntax.

For example, a higher-level framework that knows the reader's current locale and/or the application domain might want to provide more linguistically sophisticated matching. Examples:

  • Digraphs, such as "ch" in Czech, could be treated as a single distinct letter for matching purposes, if applicable to the user's current language
  • Ligatures, such as ﬁ, might be comparable with their expanded form fi, or not, depending on application
  • Word boundaries, such as \b, could incorporate large language dictionaries to better understand where boundaries are inside languages that don't separate words by whitespace (e.g. Chinese).
  • Fuzzy matching, such as matching the same word whether it is typed as a compound, properly hyphenated, or as two separate words (windswept, wind-swept, wind swept, wind-\n\s*swept, etc.).
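
To make two of the bullets above concrete, here is a rough sketch of "same pattern, different semantics". It is written in Python purely for illustration, and the names `match_strict`, `match_folded`, and `COMPOUND` are hypothetical, not part of any proposed API. It assumes that "comparable with the expanded form" can be approximated by Unicode NFKC normalization, which is only one possible interpretation.

```python
import re
import unicodedata


def match_strict(pattern: str, text: str) -> bool:
    """Match code points exactly as they appear in the text."""
    return re.search(pattern, text) is not None


def match_folded(pattern: str, text: str) -> bool:
    """Hypothetical 'linguistic' semantics: expand compatibility
    characters (such as ligatures) with NFKC before matching."""
    normalized = unicodedata.normalize("NFKC", text)
    return re.search(pattern, normalized) is not None


# "\ufb01" is the single-character ligature for "fi".
text = "a \ufb01ne day"

print(match_strict(r"fine", text))  # the ligature is not "f" + "i" here
print(match_folded(r"fine", text))  # the ligature is expanded first

# Fuzzy compound matching spelled out by hand: no separator, a hyphen
# or single whitespace character, or a hyphen followed by a line break
# and optional indentation.
COMPOUND = re.compile(r"wind(?:-\n\s*|[-\s])?swept")

variants = ["windswept", "wind-swept", "wind swept", "wind-\n  swept"]
print([bool(COMPOUND.fullmatch(v)) for v in variants])
```

The pattern `fine` never changes; only the matcher decides whether the ligature counts as "fi". A library offering fuzzy matching would presumably derive something like `COMPOUND` from the plain word rather than require the author to write it out.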