The Swift 5.7 string processing features (regex builder DSL + typed matches in regex) are neat. Inspired by that work, I have an idea for the next chapter in the Swift string processing story: type-safe data detectors.
The problem with regex
The problem with using regex to parse data is that it forces you to think at the wrong level of abstraction (i.e. in terms of character sets, rather than data types). To construct a regex, you first need to examine carefully how the data is formatted in the text, and also take into account its "context" (the other characters in the string that precede it).
Writing a regex to extract data (even with the builder DSL) is tedious when we "just want our data!". And the final regex (whether in concise literal form or long builder form) obscures what it is you're actually trying to extract from the string.
Proposed solution: type-safe data detecting in String
Here's an example of how we'd parse the data out of a test suite log:
let (testCount, failureCount, timeTaken) = "Executed 4 tests, with 1 failure in 0.009 seconds".find(.number, .number, .time)!
testCount    // 4
failureCount // 1
timeTaken    // 0.009 seconds
The find method on String returns an optional tuple (hence the force unwrap) whose elements have the requested types, in the order they were requested.
let successCount = testCount - failureCount // 3
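To make the type-safety point concrete, here is a minimal sketch of how such an API could be shaped in plain Swift, assuming one generic overload of find per arity (the post mentions up to 6 per call). Detector, Detect, scan and the two detectors are hypothetical names for illustration, not SoulverCore's API, and real detectors would be far smarter than this word-by-word scan:

```swift
import Foundation

// Hypothetical detector type (not SoulverCore's API): wraps a function that
// tries to parse one value of type Output out of a single word.
struct Detector<Output> {
    let parse: (String) -> Output?
}

// Hypothetical built-in detectors, written as Detect.integer rather than the
// leading-dot shorthand to keep the sketch simple.
enum Detect {
    // Punctuation stripped from both ends of a word before parsing.
    static let trim = CharacterSet(charactersIn: ",;:()")

    // Whole numbers such as "4".
    static let integer = Detector<Int> { word in
        Int(word.trimmingCharacters(in: Detect.trim))
    }

    // Decimal numbers such as "0.009".
    static let number = Detector<Double> { word in
        Double(word.trimmingCharacters(in: Detect.trim))
    }
}

extension String {
    // Returns the first value `detector` parses at or after word `start`,
    // together with the index of the word that follows it.
    private func scan<T>(_ detector: Detector<T>, _ words: [String], from start: Int) -> (value: T, next: Int)? {
        for i in words.indices where i >= start {
            if let value = detector.parse(words[i]) { return (value, i + 1) }
        }
        return nil
    }

    // One overload per arity; the result tuple is typed by the detectors passed in.
    func find<A, B>(_ a: Detector<A>, _ b: Detector<B>) -> (A, B)? {
        let words = split(separator: " ").map(String.init)
        guard let ra = scan(a, words, from: 0),
              let rb = scan(b, words, from: ra.next) else { return nil }
        return (ra.value, rb.value)
    }

    func find<A, B, C>(_ a: Detector<A>, _ b: Detector<B>, _ c: Detector<C>) -> (A, B, C)? {
        let words = split(separator: " ").map(String.init)
        guard let ra = scan(a, words, from: 0),
              let rb = scan(b, words, from: ra.next),
              let rc = scan(c, words, from: rb.next) else { return nil }
        return (ra.value, rb.value, rc.value)
    }
}

// Usage, mirroring the test-log example:
let testLog = "Executed 4 tests, with 1 failure in 0.009 seconds"
if case let (tests, failures, seconds)? = testLog.find(Detect.integer, Detect.integer, Detect.number) {
    // tests and failures are Int, seconds is Double.
    print(tests - failures, seconds) // prints "3 0.009"
}
```

Because each overload's return type is built from its arguments' Output types, the first two values come back as Int and the subtraction type-checks without any casts.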
Another couple of examples:
let (date, temperature, humidity) = "On August 23, 2022 the temperature in Chicago was 68.3 °F (with a humidity of 74%)".find(.date, .temperature, .percentage)!
date        // August 23, 2022
temperature // 68.3 °F
humidity    // 74%
let (earnings, fileSize, url) = "Total Earnings From PDF: $12.2k (3.25 MB, at https://lifeadvice.co.uk/pdfs/download?id=guide)".find(.currency, .fileSize, .url)!
earnings // 12,200 USD
fileSize // 3.25 MB
url      // https://lifeadvice.co.uk/pdfs/download?id=guide
Dates & numbers come in lots of different formats, but by working at the level of data types, you're able to ignore the specifics of how the data happens to be formatted in the particular string you're working with.
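As an illustration of that point, here's a minimal sketch (hypothetical, not how SoulverCore works internally) of a number parser that maps a few different textual formats of the same quantity onto a single Double, so the caller only ever sees the value:

```swift
import Foundation

// Hypothetical normalizing parser (not SoulverCore's): maps several textual
// formats of the same quantity onto one Double, so callers never see the format.
func parseNumber(_ word: String) -> Double? {
    // Strip commas, semicolons, colons, parentheses and "$" from both ends.
    var text = word.trimmingCharacters(in: CharacterSet(charactersIn: ",;:()$"))
    // Remove thousands separators, e.g. "12,200" -> "12200".
    text = text.replacingOccurrences(of: ",", with: "")
    // Expand a magnitude suffix, e.g. "12.2k" -> 12.2 * 1_000.
    let multipliers: [Character: Double] = ["k": 1_000, "K": 1_000, "M": 1_000_000]
    var scale = 1.0
    if let suffix = text.last, let multiplier = multipliers[suffix] {
        scale = multiplier
        text.removeLast()
    }
    guard let value = Double(text) else { return nil }
    return value * scale
}

for word in ["12200", "12,200", "$12.2k"] {
    print(word, "->", parseNumber(word) ?? .nan)
}
```

With this, "12200", "12,200" and "$12.2k" all yield 12200 (up to floating-point rounding). A real detector would also have to deal with locale-specific separators, which this sketch ignores.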
I have a working implementation of this string parsing syntax available here.
I have included around 30 common data types out-of-the-box (including dates, URLs, email addresses, percentages, units of different kinds, etc.) and up to 6 data points per call (until we get variadic generics in Swift 6). You can also easily extend the system with parsers for your own custom types. I also have examples of how this approach can be used for data transformation.
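One way a custom parser could plug into such a system is through a protocol. The sketch below assumes a hypothetical Detectable protocol and firstValue(of:) helper (neither is part of SoulverCore or Foundation), and adds a user-defined SemanticVersion type:

```swift
import Foundation

// Hypothetical extension point (not SoulverCore's API): a type that can
// recognize itself in a single whitespace-separated word.
protocol Detectable {
    static func detect(_ word: String) -> Self?
}

// A user-defined type plugged into the system: a version such as "5.7.1".
struct SemanticVersion: Detectable, Equatable {
    let major: Int, minor: Int, patch: Int

    static func detect(_ word: String) -> SemanticVersion? {
        // Split "5.7.1" into ["5", "7", "1"] and parse each component.
        let parts = word
            .trimmingCharacters(in: CharacterSet(charactersIn: ",;:()"))
            .split(separator: ".", omittingEmptySubsequences: false)
            .map { Int($0) }
        guard parts.count == 3,
              let major = parts[0], let minor = parts[1], let patch = parts[2]
        else { return nil }
        return SemanticVersion(major: major, minor: minor, patch: patch)
    }
}

extension String {
    // Hypothetical helper: returns the first word that T recognizes, typed as T.
    func firstValue<T: Detectable>(of type: T.Type) -> T? {
        for word in split(separator: " ") {
            if let value = T.detect(String(word)) { return value }
        }
        return nil
    }
}

if let version = "Upgraded the toolchain to 5.7.1 yesterday".firstValue(of: SemanticVersion.self) {
    print(version.major, version.minor, version.patch) // prints "5 7 1"
}
```

Keeping recognition logic on the type itself means the string-scanning code never needs to know about new types; conforming is the whole extension story in this sketch.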
My implementation uses the SoulverCore math engine for parsing. SoulverCore is closed source, but it's written in 100% Swift and works on Linux & Windows, so it's a good proof of concept for what could become part of platform-independent Foundation to support this feature in a future version of Swift.
Performance is acceptable (though significantly slower than regex): SoulverCore can do around 6k parse operations/second on my Intel i9 MacBook Pro, and 10k+ parse operations/second on my friend's M1.
I've been using regular expressions for years, but I've never loved using them (certainly not in the way I love using Swift).
Most regexes are just trying to get you some data, so why can't computers be smart enough to just give you the data?
My proposal is Swifty, in the sense that the syntax is concise and clear, returned values are type-safe, and my proof of concept demonstrates this can be done in a performant manner.
This is my first post here, so please be gentle, but I'm very open to feedback, suggestions and criticisms. I'm also just happy to be contributing to the discussion on how to make Swift "better at string processing than Perl". Cheers.