Swift Regex: lookbehind

Notably missing from Swift's new regex features is a way to perform a lookbehind. Is this an intentional omission or something that will eventually arrive?

It is intended to be supported eventually. Can you share your use case? There are many different kinds of lookbehind, and many engines do not support the fully general case, as it can add extra algorithmic factors that aren't present with lookahead.

For example, there's lookbehind of fixed-length verbatim content, lookbehind of an alternation of fixed-length verbatim content, and lookbehind of simple and reversible regexes. These can be supported with different levels of efficiency without really changing the regex's algorithmic complexity.

But in the most general form, a lookbehind regex component would need to attempt the regex from every starting position in the input prior to the current position. This is very different from a lookahead regex, which only tries from the current position.
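To make the cost concrete, here is a minimal sketch of the naive strategy described above, written as ordinary Swift rather than as anything the regex engine actually does. The function name `matchesBehind` is hypothetical. It tries the pattern against every substring that ends at the current position, which is where the extra factor comes from.

```swift
// Naive emulation of a general lookbehind (illustrative only; this is not
// how the Swift regex engine works, and `matchesBehind` is a made-up name).
// Returns true if `pattern` matches some substring of `input` that ends
// exactly at `position`.
func matchesBehind<R: RegexComponent>(
    _ pattern: R,
    in input: String,
    endingAt position: String.Index
) -> Bool {
    var start = position
    while true {
        // Attempt a whole match of input[start..<position].
        if input[start..<position].wholeMatch(of: pattern) != nil {
            return true
        }
        // Every earlier starting position has to be tried in turn.
        if start == input.startIndex { return false }
        start = input.index(before: start)
    }
}
```

Each call can perform up to one match attempt per preceding character, whereas a lookahead needs only a single attempt starting at the current position.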

A place where I just wanted to use a lookbehind was in implementing a hashtag detector that follows the Unicode specification for Hashtag Identifiers. (Who knew such a thing existed?!)

The spec includes the following:

<Hashtag-Identifier> := <Start> <Continue>* (<Medial> <Continue>+)*

When parsing hashtags in flowing text, it is recommended that an extended Hashtag only be recognized when there is no Continue character before a Start character. For example, in “abc#def” there would be no hashtag, while there would be in “abc #def” or “abc.#def”.

One natural way to implement this would be with a negative lookbehind, verifying that there is not a <Continue> character before the <Start> character (which is generally '#').

So, this would be a single character lookbehind.
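Until lookbehind is available, a single-character negative lookbehind like this can be approximated by inspecting the character just before each match. The sketch below is only an illustration: `isContinue` is a simplified stand-in for the spec's <Continue> class, `#` followed by word characters stands in for the full grammar, and both function names are hypothetical.

```swift
// Simplified stand-in for the spec's <Continue> class (an assumption;
// the real class is defined in terms of Unicode properties).
func isContinue(_ c: Character) -> Bool {
    c.isLetter || c.isNumber || c == "_"
}

// Finds "#word" candidates and drops those preceded by a Continue character.
func hashtags(in text: String) -> [String] {
    // "#" as <Start>, then one or more word characters (simplified grammar).
    let candidate = try! Regex("#\\w+")
    return text.matches(of: candidate).compactMap { match -> String? in
        let start = match.range.lowerBound
        // Emulated negative lookbehind: look at the one character before "#".
        if start > text.startIndex,
           isContinue(text[text.index(before: start)]) {
            return nil
        }
        return String(text[match.range])
    }
}
```

With this sketch, "abc#def" yields no hashtags, while "abc #def" and "abc.#def" each yield "#def", matching the examples in the spec text.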

I have regexes built from strings with NSRegularExpression that I'm looking to convert to Regex Builder, and they require LookBehinds and NegativeLookBehinds. As a simple example, I'm trying to extract valid comparison operators. Without NegativeLookBehinds I can't filter out repeating occurrences, e.g. ===, << or >>= should not match, but ==, =, <= etc. should.

This is always evaluated on short strings, so for me the performance argument doesn't have much weight. For any API there is potential for bad performance to creep in if you use it incorrectly. NSRegularExpression supports this, so it seems a bit arbitrary to exclude it.
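For the comparison-operator case, one possible workaround without lookbehind is to match each maximal run of operator characters and then keep only the runs that are valid operators. This is a sketch under assumptions: the `validOperators` set and the function name are hypothetical and would need adjusting to the real grammar.

```swift
// Hypothetical set of operators considered valid for this illustration.
let validOperators: Set<String> = ["=", "==", "!=", "<", ">", "<=", ">="]

// Extracts comparison operators by matching whole runs of operator
// characters and discarding runs such as "===", "<<" or ">>=".
func comparisonOperators(in source: String) -> [String] {
    // One or more consecutive operator characters form a single candidate.
    let run = try! Regex("[=<>!]+")
    return source.matches(of: run)
        .map { String(source[$0.range]) }
        .filter { validOperators.contains($0) }
}
```

Because the whole run is matched at once, "===" is seen as a single candidate and rejected, rather than being split into "==" and "=".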

+1 for look-behind support

RegExp lookbehind assertions would be useful to me, especially with Safari support in iOS 16.4.

My use case is wanting to match on a full word (not just part of a word), so the preceding character must be the start of the string, a whitespace or newline, or a period or comma. However, preferably I would not capture what precedes the first word, as this is not important to me.

// EX 1: Colder with a low of 5.
// EX 2: a low of 5 below zero.
// EX 3: high of 5. low of 6.
let lowRegex = /(?<=^|\s|\.|\,)low of (\d+|zero)( below zero)?/

Without the lookbehind assertion I would capture low zero in below zero from example 2, which was not what I was intending. Alternatively, if I just decide to capture the first character, then my matches would often be prefixed by a whitespace/newline or period/comma, which I'd rather ignore, especially if I intend on replacing the match in the original string.
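One workaround that avoids lookbehind is to match the prefix in a non-capturing group and wrap the interesting part in its own capture, then read that capture instead of the whole match. A minimal sketch, where `lowPattern` and `lowPhrase` are hypothetical names:

```swift
// The prefix (start of string, whitespace, "." or ",") is matched but not
// captured; capture 1 holds the phrase that should count as the result.
let lowPattern = try! Regex(#"(?:^|[\s.,])(low of (\d+|zero)( below zero)?)"#)

// Returns the "low of ..." phrase without the character that precedes it.
func lowPhrase(in text: String) -> String? {
    guard let match = text.firstMatch(of: lowPattern) else { return nil }
    // output[0] is the whole match (prefix included); output[1] is the phrase.
    return match.output[1].substring.map { String($0) }
}
```

For example 2, this returns "low of 5 below zero" without the leading space. The same capture's range can be used instead of the whole match's range when replacing text in the original string.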

Perhaps a more general version of this would be something like PCRE's \K:

The escape sequence \K causes any previously matched characters not to be included in the final matched sequence. For example, the pattern:

foo\Kbar

matches "foobar", but reports that it has matched "bar".

The documentation says that \K can be used to work around the fact that lookbehinds must be fixed-length. I wonder if \K and variable-length lookbehinds have different performance implications.