Is there a way to generate collation keys for a Swift String?

Is there any way in Swift to generate collation keys for a given String?

Consider the differences in ordering between these two calls:

import Foundation

let values = ["30", "10", "2", "1", "20", "3"]

// Prints: ["1", "10", "2", "20", "3", "30"]
print(values.sorted())

// Prints: ["1", "2", "3", "10", "20", "30"]
print(values.sorted { $0.localizedStandardCompare($1) == .orderedAscending })

localizedStandardCompare comes from NSString (bridged through Foundation) and is significantly slower than comparing two Strings with the < operator, though it produces a more natural ordering for some use cases.

There's an ancient set of APIs in CarbonCore that can be used to create a collation key for a given collator. These APIs are deprecated and not available on iOS.

Apple Developer Documentation: Unicode Utilities

In the latest beta builds, there is a new Locale.Collation type that looks promising but doesn't really appear to offer any configuration or way to turn a String into a collation key:

Apple Developer Documentation: Locale.Collation

Being able to precompute collation keys can significantly speed up sorting of large data sets. Is there any way to do that with what currently exists in Swift on Apple platforms?
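To illustrate what I mean by precomputing keys, here is a minimal sketch of the decorate-sort-undecorate pattern. Note that naiveNumericKey is a hypothetical stand-in I made up for this post, not a platform API: it only handles runs of ASCII digits and ignores locale, case, and diacritics, so it is nowhere near a real collation key. The point is only that each key is built once and the sort then compares plain bytes.

```swift
// Hypothetical helper (NOT a Foundation API): a crude, locale-unaware sort key.
// Each run of ASCII digits is emitted as a length byte followed by the digits
// (leading zeros stripped), so a bytewise comparison orders numbers by value.
// Every other byte is copied through unchanged.
func naiveNumericKey(_ string: String) -> [UInt8] {
    var key: [UInt8] = []
    var run: [UInt8] = []  // ASCII digits collected so far

    // Writes the pending digit run into the key, if there is one.
    func flushRun() {
        guard !run.isEmpty else { return }
        let significant = Array(run.drop(while: { $0 == 0x30 }))  // drop leading "0"s
        key.append(UInt8(min(significant.count, 255)))
        key.append(contentsOf: significant)
        run.removeAll()
    }

    for byte in string.utf8 {
        if byte >= 0x30 && byte <= 0x39 {  // "0"..."9"
            run.append(byte)
        } else {
            flushRun()
            key.append(byte)
        }
    }
    flushRun()
    return key
}

let values = ["30", "10", "2", "1", "20", "3"]

// Decorate: compute each key exactly once, up front.
let decorated = values.map { (key: naiveNumericKey($0), value: $0) }

// Sort on the precomputed keys with a cheap bytewise comparison, then undecorate.
let ordered = decorated
    .sorted { $0.key.lexicographicallyPrecedes($1.key) }
    .map { $0.value }

// Prints: ["1", "2", "3", "10", "20", "30"]
print(ordered)
```

With a real collation key in place of naiveNumericKey, the comparator would stay this simple while the ordering would match what the collator produces.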

I am not aware of anything provided by Swift or the platform. Both mostly just overlay a hidden copy of ICU that you are not supposed to touch, which leaves us with no legitimate way to get the collation data out from underneath.

You could try a package I have that reimplements collation and is capable of this. (The relevant documentation is here, although if the link dies soon from the switch to DocC, this source link should still be stable.) Its purpose was more to enable custom tailorings than to speed anything up, so it may not actually be any faster. If additional optimizations or custom hooks would be helpful, you are welcome to bring them up in an issue. You are also welcome to just copy and paste whatever portions might be useful to you.
