Are String views cached?

Hi all,
I need to sort a collection of Strings by their literal UTF-16 representation, similar to this Objective-C counterpart:

    return [a compare:b options:NSLiteralSearch];

The difference is in the details:
UTF-8 preserves Unicode code point ordering, whereas UTF-16, because of its encoding scheme, sorts code points that encode as surrogate pairs (U+10000 to U+10FFFF) before U+E000 to U+FFFF.
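
To make the ordering difference concrete, here is a minimal sketch comparing two single-character strings under each view. The choice of U+FFFD and U+1F600 and the variable names are just for illustration:

```swift
// U+FFFD encodes as a single UTF-16 code unit (0xFFFD).
// U+1F600 encodes as a surrogate pair whose first unit is 0xD83D.
let bmp = "\u{FFFD}"
let astral = "\u{1F600}"

// Code point order and UTF-8 byte order agree: U+FFFD comes first.
let scalarOrder = bmp.unicodeScalars.lexicographicallyPrecedes(astral.unicodeScalars)
let utf8Order = bmp.utf8.lexicographicallyPrecedes(astral.utf8)

// UTF-16 code unit order disagrees: 0xD83D < 0xFFFD, so the
// surrogate-pair character sorts first.
let utf16Order = bmp.utf16.lexicographicallyPrecedes(astral.utf16)

print(scalarOrder, utf8Order, utf16Order) // prints "true true false"
```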

My question is: Will using the UTF-16 view on a String cache the view so that it's not calculated again and again?
In other words: Does it make sense to do:

array.sort { a, b in
    a.utf16.lexicographicallyPrecedes(b.utf16)
}

Thanks! :-)


Sort of. String does calculate "breadcrumbs" when you access the UTF-16 view, which helps speed up later indexing operations. As the name suggests, this consists of a few indexes and their UTF-8/UTF-16 offsets, so when you ask for the UTF-16 version of a particular index, it can start calculating from the nearest breadcrumb.

Breadcrumbs won't help with this, but I think it's fine: lexicographicallyPrecedes can exit early at the first code unit where the strings differ, and the transcoding can be trivial (e.g. if the strings are mostly ASCII), so performance really depends on the contents of the strings. It may also be faster to eagerly copy to (String, Array<UInt16>) tuples and sort those, but the only way to know is to try it.
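
For reference, the eager-copy variant could look roughly like the sketch below. The sample data and the tuple labels `string` and `units` are made up for illustration, and whether this beats sorting on the views directly would need to be measured:

```swift
// Hypothetical sample input.
let strings = ["b", "\u{FFFD}", "a", "\u{1F600}"]

// Transcode each string to UTF-16 once, up front, and keep the
// original string next to its code units.
let keyed: [(string: String, units: [UInt16])] = strings.map {
    (string: $0, units: Array($0.utf16))
}

// Sort by the precomputed code unit arrays, then drop the keys.
let sorted = keyed
    .sorted { $0.units.lexicographicallyPrecedes($1.units) }
    .map { $0.string }

print(sorted)
```

This trades extra memory (one [UInt16] per string) for doing the transcoding only once per element instead of once per comparison.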

They're both valid approaches.


Thank you very much, @Karl.
It's great to have my memory of 'breadcrumbs' refreshed; I had forgotten the details. Thanks also for the analysis that they won't actually help in this situation.
I'll give both strategies a shot. :-)
