String comparison improvements

Hey Swift-Dev,

The swift standard library team have been working on a new implementation for comparing Swift strings for Swift 5. Michael touched on the motivations in the State of String email but I’ll summarize here:

The Swift String comparison implementations on Apple platforms and Linux differ in results and performance. Apple platforms use CFStringCompare with no locale, while Linux uses ICU libraries. Unifying the algorithms that Swift strings use for comparison is reason alone for doing a new implementation.
We've come up with some great common fast paths that speed up comparisons for a lot of common cases. Our microbenchmarks show up to a 6.8x increase in performance and there is still some low hanging fruit in our implementation that would bring further speedups.

Bare in mind this is not intended to be a replacement for sorting strings that will be presented to users, for that developers should stick to NSLocalizedString APIs.

Hi Lance,

I read Michael’s emails but I don’t remember at the moment — what is the new string comparison implementation going to be based on?
Also, how will this affect bridged strings? If I compare two `NSString`s, I may get a different result than if I compare the same two strings as bridged through `String`, correct?

— Itai

···

On 17 Jan 2018, at 13:19, Lance Parker via swift-dev wrote:

Hey Swift-Dev,

The swift standard library team have been working on a new implementation for comparing Swift strings for Swift 5. Michael touched on the motivations in the State of String email but I’ll summarize here:

The Swift String comparison implementations on Apple platforms and Linux differ in results and performance. Apple platforms use CFStringCompare with no locale, while Linux uses ICU libraries. Unifying the algorithms that Swift strings use for comparison is reason alone for doing a new implementation.
We've come up with some great common fast paths that speed up comparisons for a lot of common cases. Our microbenchmarks show up to a 6.8x increase in performance and there is still some low hanging fruit in our implementation that would bring further speedups.

Bare in mind this is not intended to be a replacement for sorting strings that will be presented to users, for that developers should stick to NSLocalizedString APIs.

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

Comments inline below

Hi Lance,

I read Michael’s emails but I don’t remember at the moment — what is the new string comparison implementation going to be based on?

The new approach uses the lexicographical ordering of NFC-normalized UTF-16 code units. For two known ASCII strings, we just use memcmp.

Also, how will this affect bridged strings? If I compare two NSStrings, I may get a different result than if I compare the same two strings as bridged through String, correct?

If I understand correctly, you’re asking what will happen if you have two strings explicitly typed as NSString in swift and you compare them. I believe they’ll still use whatever NSString does for comparison today, so CFStringCompare. For Swift strings backed by a bridged NSString, this new comparison method will be used.

It might make sense for explicit NSStrings in Swift to use the new method as well. What do you think?

···

On Jan 17, 2018, at 1:46 PM, Itai Ferber <iferber@apple.com> wrote:
— Itai

On 17 Jan 2018, at 13:19, Lance Parker via swift-dev wrote:

Hey Swift-Dev,

The swift standard library team have been working on a new implementation for comparing Swift strings for Swift 5. Michael touched on the motivations in the State of String email but I’ll summarize here:

The Swift String comparison implementations on Apple platforms and Linux differ in results and performance. Apple platforms use CFStringCompare with no locale, while Linux uses ICU libraries. Unifying the algorithms that Swift strings use for comparison is reason alone for doing a new implementation.
We've come up with some great common fast paths that speed up comparisons for a lot of common cases. Our microbenchmarks show up to a 6.8x increase in performance and there is still some low hanging fruit in our implementation that would bring further speedups.

Bare in mind this is not intended to be a replacement for sorting strings that will be presented to users, for that developers should stick to NSLocalizedString APIs.
_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

Hi Lance,

I read Michael’s emails but I don’t remember at the moment — what is the new string comparison implementation going to be based on?

It is the lexicographical ordering of NFC-normalized UTF-16 code units. We use ICU to normalize strings, and there are a number of fast paths for scenarios where portions of strings are already in this canonical form.

Also, how will this affect bridged strings? If I compare two NSStrings, I may get a different result than if I compare the same two strings as bridged through String, correct?

If they have type String (i.e. they were bridged), they will get the new ordering semantics. I’m not familiar with how the type NSString is ordered in corelibs-foundation, do you know? Either way, as a different type, NSString could independently choose different ordering semantics.

···

On Jan 17, 2018, at 1:46 PM, Itai Ferber via swift-dev <swift-dev@swift.org> wrote:

— Itai

On 17 Jan 2018, at 13:19, Lance Parker via swift-dev wrote:

Hey Swift-Dev,

The swift standard library team have been working on a new implementation for comparing Swift strings for Swift 5. Michael touched on the motivations in the State of String email but I’ll summarize here:

The Swift String comparison implementations on Apple platforms and Linux differ in results and performance. Apple platforms use CFStringCompare with no locale, while Linux uses ICU libraries. Unifying the algorithms that Swift strings use for comparison is reason alone for doing a new implementation.
We've come up with some great common fast paths that speed up comparisons for a lot of common cases. Our microbenchmarks show up to a 6.8x increase in performance and there is still some low hanging fruit in our implementation that would bring further speedups.

Bare in mind this is not intended to be a replacement for sorting strings that will be presented to users, for that developers should stick to NSLocalizedString APIs.
_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev
_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

Hi Lance (and Michael — I’ll keep the conversation in one thread),

Sounds reasonable. My one concern is that behavior around implicitly bridged strings is going to change, in a potentially problematic and "magic" way:

let s1: String = …
let s2: String = …

let mySet = NSMutableSet()
mySet.add(s1)
mySet.add(s2)

// s1 != s2 by Swift ordering rules, but CFStringCompare(s1, s2, …) == kCFCompareEqualTo
print(mySet.count) // => 1 ????

or alternatively:

let s1: String = …
let s2: String = …

let mySet = NSMutableSet()
mySet.add(s1)

// s1 == s2 by Swift ordering rules, but CFStringCompare(s1, s2, …) != kCFCompareEqualTo
print(mySet.contains(s2)) // => false ????

I don’t think there’s much we can do about this, but this is something we’re going to have to watch out for — this will be a relatively rare and very subtle behavior change.

— Itai

···

On 17 Jan 2018, at 13:56, Lance Parker wrote:

Comments inline below

On Jan 17, 2018, at 1:46 PM, Itai Ferber <iferber@apple.com> wrote:

Hi Lance,

I read Michael’s emails but I don’t remember at the moment — what is the new string comparison implementation going to be based on?

The new approach uses the lexicographical ordering of NFC-normalized UTF-16 code units. For two known ASCII strings, we just use memcmp.

Also, how will this affect bridged strings? If I compare two NSStrings, I may get a different result than if I compare the same two strings as bridged through String, correct?

If I understand correctly, you’re asking what will happen if you have two strings explicitly typed as NSString in swift and you compare them. I believe they’ll still use whatever NSString does for comparison today, so CFStringCompare. For Swift strings backed by a bridged NSString, this new comparison method will be used.

It might make sense for explicit NSStrings in Swift to use the new method as well. What do you think?

— Itai

On 17 Jan 2018, at 13:19, Lance Parker via swift-dev wrote:

Hey Swift-Dev,

The swift standard library team have been working on a new implementation for comparing Swift strings for Swift 5. Michael touched on the motivations in the State of String email but I’ll summarize here:

The Swift String comparison implementations on Apple platforms and Linux differ in results and performance. Apple platforms use CFStringCompare with no locale, while Linux uses ICU libraries. Unifying the algorithms that Swift strings use for comparison is reason alone for doing a new implementation.
We've come up with some great common fast paths that speed up comparisons for a lot of common cases. Our microbenchmarks show up to a 6.8x increase in performance and there is still some low hanging fruit in our implementation that would bring further speedups.

Bare in mind this is not intended to be a replacement for sorting strings that will be presented to users, for that developers should stick to NSLocalizedString APIs.
_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

Hi Lance (and Michael — I’ll keep the conversation in one thread),

Sounds reasonable. My one concern is that behavior around implicitly bridged strings is going to change, in a potentially problematic and "magic" way:

let s1: String = …
let s2: String = …

let mySet = NSMutableSet()
mySet.add(s1)
mySet.add(s2)

// s1 != s2 by Swift ordering rules, but CFStringCompare(s1, s2, …) == kCFCompareEqualTo
print(mySet.count) // => 1 ???
or alternatively:

let s1: String = …
let s2: String = …

let mySet = NSMutableSet()
mySet.add(s1)

// s1 == s2 by Swift ordering rules, but CFStringCompare(s1, s2, …) != kCFCompareEqualTo
print(mySet.contains(s2)) // => false ???
I don’t think there’s much we can do about this, but this is something we’re going to have to watch out for — this will be a relatively rare and very subtle behavior change.

My understanding is that CFStringCompare, when given kCFCompareNonliteral, follows canonical equivalence and thus equality would not be affected. However, ordering, i.e. `<`, could change.

···

On Jan 17, 2018, at 2:36 PM, Itai Ferber <iferber@apple.com> wrote:

— Itai

On 17 Jan 2018, at 13:56, Lance Parker wrote:

Comments inline below

On Jan 17, 2018, at 1:46 PM, Itai Ferber <iferber@apple.com <mailto:iferber@apple.com>> wrote:

Hi Lance,

I read Michael’s emails but I don’t remember at the moment — what is the new string comparison implementation going to be based on?

The new approach uses the lexicographical ordering of NFC-normalized UTF-16 code units. For two known ASCII strings, we just use memcmp.

Also, how will this affect bridged strings? If I compare two NSStrings, I may get a different result than if I compare the same two strings as bridged through String, correct?

If I understand correctly, you’re asking what will happen if you have two strings explicitly typed as NSString in swift and you compare them. I believe they’ll still use whatever NSString does for comparison today, so CFStringCompare. For Swift strings backed by a bridged NSString, this new comparison method will be used.

It might make sense for explicit NSStrings in Swift to use the new method as well. What do you think?

— Itai

On 17 Jan 2018, at 13:19, Lance Parker via swift-dev wrote:

Hey Swift-Dev,

The swift standard library team have been working on a new implementation for comparing Swift strings for Swift 5. Michael touched on the motivations in the State of String email but I’ll summarize here:

The Swift String comparison implementations on Apple platforms and Linux differ in results and performance. Apple platforms use CFStringCompare with no locale, while Linux uses ICU libraries. Unifying the algorithms that Swift strings use for comparison is reason alone for doing a new implementation.
We've come up with some great common fast paths that speed up comparisons for a lot of common cases. Our microbenchmarks show up to a 6.8x increase in performance and there is still some low hanging fruit in our implementation that would bring further speedups.

Bare in mind this is not intended to be a replacement for sorting strings that will be presented to users, for that developers should stick to NSLocalizedString APIs.
_______________________________________________
swift-dev mailing list
swift-dev@swift.org <mailto:swift-dev@swift.org>
https://lists.swift.org/mailman/listinfo/swift-dev