Array.sorted() results differ between macOS and Ubuntu

I know that sorted { $0.compare($1) == .orderedAscending } works, but why are the default results inconsistent?

e.g.

$ cat test.swift
import Foundation

print(["Apple", "apple", "Banana", "banana"].sorted())

# macOS:10.13.4 swift: 4.1
$ swift test.swift
["Apple", "Banana", "apple", "banana"]

# Ubuntu:16.04 swift: 4.1
$ swift test.swift
["apple", "Apple", "banana", "Banana"]

Updated on May 26:

# Ubuntu:16.04 swift: 4.2-DEVELOPMENT-SNAPSHOT-2018-05-23-a
$ swift test.swift
["Apple", "Banana", "apple", "banana"]

This seems to be SR-982
String sort order and comparisons differ between Darwin and Linux


None of those strings compare equal, so I'd expect there to be a single "correct" ordering…

I think this is the root issue. Comparison of Strings should not be platform dependent; I don't know of any other languages that do this.


In actuality, Swift does (technically) guarantee Unicode-correct string ordering, with one exception: on Apple platforms, when both strings are purely ASCII, the comparison uses plain ASCII ordering instead, which is faster.

So, probably unlike all other text bugs, this bug only appears with ASCII strings instead of emoji strings.
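
To make the ASCII ordering concrete, here is a minimal sketch that sorts the strings from the original post by their raw UTF-8 bytes. The explicit byte-wise comparator is my own illustration, not what the standard library does internally; it only shows why uppercase letters land before lowercase ones under plain ASCII ordering.

```swift
let words = ["Apple", "apple", "Banana", "banana"]

// Compare the raw UTF-8 bytes lexicographically. For pure-ASCII input this is
// plain ASCII order: "A" (65) and "B" (66) sort before "a" (97) and "b" (98).
let byBytes = words.sorted { $0.utf8.lexicographicallyPrecedes($1.utf8) }

print(byBytes) // ["Apple", "Banana", "apple", "banana"]
```

Because the comparator is spelled out, this gives the same result on any platform and Swift version, which can be handy if you need a stable order for machine-readable data.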

According to the latest comment in SR-982
String sort order and comparisons differ between Darwin and Linux
this “should've been fixed”, but I don't know which behavior has been decided to be “correct.” The latest developer snapshot for Xcode still sorts as ["Apple", "Banana", "apple", "banana"].

This paragraph in the String Manifesto could also be relevant:

Finally, once it is agreed that the default role for String is to handle machine-generated and machine-readable text, the default ordering of Strings need no longer use the UCA at all. It is sufficient to order them in any way that's consistent with equality, so String ordering can simply be a lexicographical comparison of normalized forms, ...
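
As a rough illustration of what "consistent with equality" means here, the sketch below compares two canonically equivalent spellings of "café". The helpers `nfcScalars` and `precedes` are hypothetical names I made up for this example, and I am assuming Foundation's `precomposedStringWithCanonicalMapping` as the way to get an NFC form; this is not necessarily how the standard library implements it.

```swift
import Foundation

let precomposed = "caf\u{E9}"   // "é" as the single scalar U+00E9
let decomposed = "cafe\u{301}"  // "e" followed by U+0301 COMBINING ACUTE ACCENT

// Swift's == treats canonically equivalent strings as equal.
print(precomposed == decomposed) // true

// Hypothetical helper: the Unicode scalars of the NFC-normalized string.
func nfcScalars(_ s: String) -> [Unicode.Scalar] {
    return Array(s.precomposedStringWithCanonicalMapping.unicodeScalars)
}

// Hypothetical helper: lexicographical comparison of the normalized forms.
func precedes(_ a: String, _ b: String) -> Bool {
    return nfcScalars(a).lexicographicallyPrecedes(nfcScalars(b))
}

// Neither equal string sorts before the other, so the ordering agrees with ==.
print(precedes(precomposed, decomposed)) // false
print(precedes(decomposed, precomposed)) // false

// Strings that are not equal still get a definite order: "e" (U+0065) < "é" (U+00E9).
print(precedes("cafe", precomposed))     // true
```

Comparing the raw, unnormalized scalars would not have this property, since the two spellings have different scalar sequences even though they are equal as strings.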

I have updated the category.

This is the old behavior. Prior to Swift 4.2, Linux used UCA with DUCET ordering for comparisons. To improve performance on both Darwin and Linux (especially on older Ubuntu releases), we wrote a new comparison implementation for Swift 4.2. This new implementation is the same across Linux and Darwin, so both platforms get the same answer.

Have you tried with a recent 4.2 or master snapshot?


Is it correct that string comparison is a lexicographical comparison of the Unicode scalars in the decomposed form? Is this documented somewhere?

There is no specific formal guarantee, and the ordering is not guaranteed to be stable even across executions of the same program.

Right now it is the lexicographical order of the NFC-normalized UTF-16 code units, not the scalar values. It's likely that we may switch it to the scalar values in the future. The only place where this would cause an ordering difference would be BMP scalars beyond the surrogates (U+E000 through U+FFFF) when compared against scalars outside the BMP.
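
Here is a small sketch of that one difference, using explicit comparisons of the `utf16` and `unicodeScalars` views rather than `<` itself (what `<` returns depends on the Swift version, as noted above). The two sample characters are just ones I picked: both are unchanged by NFC, one sits in the BMP above the surrogate range and the other is outside the BMP.

```swift
let bmp = "\u{FF21}"      // FULLWIDTH LATIN CAPITAL LETTER A, one UTF-16 unit: 0xFF21
let astral = "\u{1F600}"  // outside the BMP, UTF-16 surrogate pair: 0xD83D 0xDE00

// UTF-16 code unit order: 0xD83D < 0xFF21, so the non-BMP string comes first.
let utf16SaysAstralFirst = astral.utf16.lexicographicallyPrecedes(bmp.utf16)

// Scalar value order: 0xFF21 < 0x1F600, so the BMP string comes first.
let scalarSaysBmpFirst = bmp.unicodeScalars.lexicographicallyPrecedes(astral.unicodeScalars)

print(utf16SaysAstralFirst) // true
print(scalarSaysBmpFirst)   // true
```

For scalars below the surrogate range (U+0000 through U+D7FF) the two orderings agree, which is why this only shows up for U+E000 and above.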


Oh, I tried the Ubuntu 4.2-DEVELOPMENT-SNAPSHOT-2018-05-23-a, and the results were consistent.

Thank you so much.
