I fully agree, @calebkleveter. I think that compile-time checking for regular expressions and the concept of RegEx literals would be extremely useful. If it were checked at compile time, a hypothetical RegEx literal could also be syntax-highlighted accordingly, which would make such literals much easier to read. Because of this, a unique syntax is probably in order to differentiate it from a string literal (maybe using / instead of ").
NSRegularExpression also has other problems that make it harder to use. Firstly, it is not geared for use in Swift, as it is fundamentally tied to Objective-C and makes use of types like NSString, NSRange, NSMutableString, etc., even using Objective-C in its documentation. Moreover, it makes use of pointers, which makes it even less user-friendly. In addition, NSRegularExpression's algorithms are not generic over StringProtocol, making working with Substring more of a hassle.
There are also a number of algorithms missing for regular expressions, such as splitting with a RegEx as the delimiter, matching or replacing a RegEx in a string up to a specified maximum number of times, removing matches of a RegEx in a string (as opposed to replacing occurrences with an empty string, as I currently do), lazy iteration through matches of a pattern in a string as an alternative to NSRegularExpression's enumerateMatches(in:options:range:using:), etc. Furthermore, if this were added to the standard library, maybe Foundation could provide extensions that allow it to easily work with files.
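To make the first two missing algorithms concrete, here is a rough sketch of what has to be written by hand today to split on a RegEx with an optional cap on the number of splits. The method name split(byRegex:maxSplits:) is made up for illustration and is not an existing Foundation or standard library API; it only wraps NSRegularExpression.

```swift
import Foundation

extension String {
    /// Hypothetical helper (not an existing API): splits the string wherever
    /// `pattern` matches, consuming at most `maxSplits` delimiters.
    /// Throws if `pattern` is not a valid regular expression.
    func split(byRegex pattern: String, maxSplits: Int = .max) throws -> [Substring] {
        let regex = try NSRegularExpression(pattern: pattern)
        let fullRange = NSRange(startIndex..<endIndex, in: self)
        var pieces: [Substring] = []
        var cursor = startIndex
        var splits = 0

        for match in regex.matches(in: self, options: [], range: fullRange) {
            guard splits < maxSplits else { break }
            // Convert the UTF-16 based NSRange back into String indices.
            guard let range = Range(match.range, in: self) else { continue }
            // Ignore empty matches so they do not act as delimiters.
            guard !range.isEmpty else { continue }
            pieces.append(self[cursor..<range.lowerBound])
            cursor = range.upperBound
            splits += 1
        }
        // Whatever follows the last consumed delimiter is the final piece.
        pieces.append(self[cursor...])
        return pieces
    }
}

do {
    print(try "a1b22c333d".split(byRegex: "[0-9]+"))               // ["a", "b", "c", "d"]
    print(try "a1b22c333d".split(byRegex: "[0-9]+", maxSplits: 1)) // ["a", "b22c333d"]
} catch {
    print("Invalid pattern: \(error)")
}
```

Note how much of the body is spent converting between NSRange and Range<String.Index>, which is exactly the kind of friction a native API could remove.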
Having a nice, succinct, swifty API for regular expressions with first-class support would make a big difference in terms of readability. For example, right now it is not as easy as it probably should be to check if a string matches a RegEx pattern:
// Now:
let somePattern = #"..."#
let matchesRegex = someString.range(of: somePattern, options: .regularExpression, range: nil, locale: nil) != nil
This has a problem: matchesRegex may be false either because the pattern couldn't be matched or because the pattern is an invalid RegEx, and we can't tell which. To distinguish the two cases, one would need NSRegularExpression.
guard let _ = try? NSRegularExpression(pattern: somePattern) else {
    fatalError("Regular expression pattern is invalid.")
}
// check for match ...
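Putting the validation and the match check together, a minimal sketch of a helper that keeps the two failure cases apart might look like the following. The name containsMatch(ofRegex:) is hypothetical; an invalid pattern surfaces as a thrown error, while "no match" is an ordinary false.

```swift
import Foundation

extension String {
    /// Hypothetical helper (not an existing API): returns whether `pattern`
    /// matches anywhere in the string, and throws if `pattern` is invalid.
    func containsMatch(ofRegex pattern: String) throws -> Bool {
        let regex = try NSRegularExpression(pattern: pattern)
        let fullRange = NSRange(startIndex..<endIndex, in: self)
        return regex.firstMatch(in: self, options: [], range: fullRange) != nil
    }
}

do {
    print(try "swift-5".containsMatch(ofRegex: "[0-9]+")) // true
    print(try "swift".containsMatch(ofRegex: "[0-9]+"))   // false
    // Unterminated character class: the initializer throws here.
    _ = try "swift".containsMatch(ofRegex: "[0-9")
} catch {
    print("Invalid pattern: \(error.localizedDescription)")
}
```

This still only reports an invalid pattern at run time, which is the gap a compile-time checked literal would close.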
In a hypothetical implementation, it could be as easy as the following:
let matchesRegex = someString.matches(/.../)
// Compile-time error is thrown if RegEx literal is invalid
Lastly, a native Swift implementation has the potential to be quite powerful as it could leverage Swift's behaviour around characters and grapheme clusters.
import Foundation // provides range(of:options:range:locale:)

extension StringProtocol {
    func matches<T>(_ pattern: T) -> Bool where T: StringProtocol {
        return self.range(of: pattern, options: .regularExpression, range: nil, locale: nil) != nil
    }
}
let str = "\u{D55C}" // 한
let pattern = "\u{1112}\u{1161}\u{11AB}" // ᄒ, ᅡ, ᆫ
print(str) // 한
print(pattern) // 한
print(str.unicodeScalars.elementsEqual(pattern.unicodeScalars)) // false
print(str == pattern) // true
print(str.matches(pattern)) // false
In Swift, string equality allows for the same characters composed in different ways to be considered equal (while their respective Unicode scalars are not necessarily equal). Because NSRegularExpression does not leverage this type of equality, checking if str matches pattern returns false, even though String's default semantics dictate that the two are in fact equal. A Swift implementation could allow for the use of such semantics with the Unicode equality available as an explicit option.
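As a rough illustration of what canonical-equivalence-aware matching could mean, one workaround available today is to normalize both the subject and the pattern to the same Unicode normalization form (NFC here) before matching. The name matchesNormalized(_:) is hypothetical, and this is only a sketch: normalizing a pattern string is safe for a literal pattern like the one above, but it is an assumption that it preserves the meaning of arbitrary patterns.

```swift
import Foundation

extension String {
    /// Hypothetical helper (not an existing API): regex match performed after
    /// normalizing both the subject and the pattern to NFC.
    /// Assumption: the pattern's meaning survives normalization, which holds
    /// for plain literal patterns but is not guaranteed in general.
    func matchesNormalized(_ pattern: String) -> Bool {
        let subject = self.precomposedStringWithCanonicalMapping
        let normalizedPattern = pattern.precomposedStringWithCanonicalMapping
        return subject.range(of: normalizedPattern, options: .regularExpression) != nil
    }
}

let precomposed = "\u{D55C}"                 // 한 as a single scalar
let decomposed = "\u{1112}\u{1161}\u{11AB}"  // 한 as three conjoining jamo

print(precomposed == decomposed)                   // true
print(precomposed.matchesNormalized(decomposed))   // true
```

A native implementation would not need this extra pass, since it could compare by grapheme cluster directly, as String's == already does.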
Swift's API for working with strings and extracting information is lacking and regular expressions would be a very good step in the right direction.
A few questions about regular expressions:
- Do you think that regular expressions should have their own literal syntax? If so, how should it look?
- Should regular expressions be incorporated into the standard library or available as a standalone module (potentially with compiler support)?
- What algorithms should be available to work with regular expressions?