Speeding up String's UTF8 parsing?

I recently came across Daniel Lemire's work on fast UTF8 validation. Given that Swift needs to UTF8-validate nearly every string that comes from an external source (for example, text read from a SQLite database), I was wondering whether it would be possible and worthwhile to use Lemire's work to speed up Swift's current UTF8 validation code.

I am not an expert in UTF8 parsing and validation, but I thought it would at least be worth discussing.

I think that’s great work! I’m not familiar with UTF8 myself, but I believe we should first evaluate the algorithm to confirm that it fits Swift’s use case and that its license doesn’t conflict with Swift’s Apache 2.0 license. You could also try contacting Daniel Lemire and inviting him to submit the PR himself.

Vectorizing UTF8 handling has been on our list for a while. I expect at some point @scanon will get a moment in between numerics projects and do something amazing there 🙂
