Additional String Processing APIs

Two major issues come to mind:

  • In-place substitution only exists on NSMutableString (NSMS.replaceOccurrences(of: String, with: String, options: NSString.CompareOptions, range: NSRange)). If you start with a Swift String, this means bridging to NSMS, performing the replacements, then bridging back. This is a pretty basic feature, and we should do better.

    I understand that there are designs (and prototypes, even) for generic Collection-based pattern matching and substitution. I think we should make it a priority.
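
    For reference, the round trip I'm describing looks roughly like this. It's a minimal sketch: replacingAll is a made-up helper name, and the `as String` bridge assumes an Apple platform where Foundation bridging is available.

    ```swift
    import Foundation

    // Hypothetical helper (not a stdlib or Foundation API): replaces every
    // occurrence of `target` by bridging through NSMutableString.
    func replacingAll(in source: String, of target: String, with replacement: String) -> String {
        // Bridge the Swift String into an NSMutableString.
        let mutable = NSMutableString(string: source)
        // Perform the in-place substitution over the whole string.
        // NSRange is measured in UTF-16 code units, hence `mutable.length`.
        mutable.replaceOccurrences(
            of: target,
            with: replacement,
            options: [],
            range: NSRange(location: 0, length: mutable.length)
        )
        // Bridge back to a Swift String (assumes Apple-platform bridging).
        return mutable as String
    }
    ```

    So that's two conversions wrapped around what is conceptually a single mutation.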

  • ASCII string processing. Currently I'm working on parsing data formats that only accept ASCII strings, and the overhead of the Unicode model is a significant performance drain. We pay for every call to index(after:), even though we check every character and will ultimately fail if any of them are non-ASCII!

    I would really like a way to query in advance whether a string is ASCII (so we can fail early if it isn't). String already sets an internal flag as part of its UTF-8 validation, but doesn't expose that information as API. That means we need to check it ourselves, which is an O(n) operation. It seems like it would be simple to expose an isASCII property (similar to isContiguousUTF8) to make this easier.

    I've been thinking about making my own ASCIIString type to eliminate the indexing overheads. Ideally, this would work like String's existing views (.utf8/.utf16/etc), but I don't believe String provides convenient APIs to work at that level (the best I can think of is using String.utf8.withContiguousStorageIfAvailable { ... } to access the storage without Unicode getting in the way, but then my .ascii view is bound to that scope). I also don't believe it's possible to support efficient mutations with a wrapper view, whether using the buffer pointer or some stdlib-provided view (see the recent discussion about slices).

    So yeah, better support for ASCII strings is something that I'd appreciate.
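
To make the ASCII point concrete, here is a rough sketch of the two workarounds available today: the O(n) scan, and scoped access to the raw bytes. Both isASCIIByScanning and parseASCIIDecimal are names I made up for illustration, not stdlib APIs, and I'm using String.withUTF8 (which is mutating, hence the local copy) as one way to get at the bytes.

```swift
extension String {
    /// Hypothetical helper, not a standard library API: decides whether the
    /// string is ASCII by scanning every UTF-8 code unit, which is O(n).
    var isASCIIByScanning: Bool {
        utf8.allSatisfy { $0 < 0x80 }
    }
}

/// Hypothetical parser: reads an unsigned decimal integer made of ASCII
/// digits only, working directly on the UTF-8 bytes instead of Characters.
func parseASCIIDecimal(_ text: String) -> Int? {
    // `withUTF8` is mutating, so work on a local copy.
    var copy = text
    return copy.withUTF8 { bytes -> Int? in
        guard !bytes.isEmpty else { return nil }
        var value = 0
        for byte in bytes {
            // Accept only "0"..."9" (0x30...0x39); any non-ASCII byte fails here too.
            guard byte >= 0x30 && byte <= 0x39 else { return nil }
            // Reject values that do not fit in Int instead of trapping.
            let (scaled, overflowedMul) = value.multipliedReportingOverflow(by: 10)
            let (next, overflowedAdd) = scaled.addingReportingOverflow(Int(byte - 0x30))
            guard !overflowedMul && !overflowedAdd else { return nil }
            value = next
        }
        return value
    }
}
```

The byte-level loop avoids index(after:) entirely, but everything has to happen inside the closure, which is exactly the scoping limitation I mentioned.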
