The idea is for different libraries to provide different semantics using the same regex syntax.
For example, a higher-level framework that knows the reader's current locale and/or the application domain might want to provide more linguistically sophisticated matching. Examples:
- Digraphs, such as "ch" in Czech, could be treated as a single distinct letter for matching purposes, if applicable to the user's current language
- Ligatures, such as `ﬁ`, might be comparable with their expanded form `fi`, or not, depending on application
- Word boundaries, such as `\b`, could incorporate large language dictionaries to better understand where boundaries are inside languages that don't separate words by whitespace (e.g. Chinese).
- Fuzzy matching, such as matching the same word whether it is typed as a compound, properly hyphenated, or as two separate words (`windswept`, `wind-swept`, `wind swept`, `wind-\n\s*swept`, etc).
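
To make a few of these ideas concrete, here is a minimal sketch in Python using only the standard `re` and `unicodedata` modules. It is not the mechanism proposed here: the helper names (`letters_cs`, `expand_ligatures`, `compound_pattern`) are hypothetical, and each one approximates by hand what a locale-aware library could do behind unchanged regex syntax. Dictionary-based word boundaries are left out, since they would require external language data.

```python
import re
import unicodedata


def letters_cs(word: str) -> list[str]:
    """Hypothetical helper: split a word into letters, treating the
    Czech digraph "ch" as one letter (alternation tries "ch" first)."""
    return re.findall(r"ch|.", word, flags=re.IGNORECASE | re.DOTALL)


def expand_ligatures(text: str) -> str:
    """Hypothetical helper: replace compatibility characters such as
    U+FB01 (the "fi" ligature) with their expanded form via NFKC."""
    return unicodedata.normalize("NFKC", text)


def compound_pattern(first: str, second: str) -> re.Pattern:
    """Hypothetical helper: match a two-part word written as a compound,
    hyphenated, hyphenated across a line break, or as two words."""
    # Between the halves allow: nothing, a hyphen followed by optional
    # whitespace (which covers a line break), or whitespace alone.
    joiner = r"(?:-\s*|\s+)?"
    return re.compile(re.escape(first) + joiner + re.escape(second))


# "ch" counts as a single letter, so "chata" has four letters.
assert letters_cs("chata") == ["ch", "a", "t", "a"]

# A ligature only compares equal to its expanded form after normalization.
assert "\ufb01nd" != "find"
assert expand_ligatures("\ufb01nd") == "find"

# All spellings from the list above match one pattern.
windswept = compound_pattern("wind", "swept")
for spelling in ["windswept", "wind-swept", "wind swept", "wind-\n  swept"]:
    assert windswept.fullmatch(spelling)
```

Note that the joiner in `compound_pattern` is slightly looser than the spellings listed above (it also accepts `wind- swept`, for instance), and that NFKC folds many other compatibility characters besides ligatures. Whether such looseness is acceptable is exactly the kind of application-dependent choice a higher-level library would make.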