Array.sorted() results are inconsistent between macOS and Ubuntu

I know that sorted { $0.compare($1) == .orderedAscending } works, but why are the default results inconsistent?

e.g.

$ cat test.swift
import Foundation

print(["Apple", "apple", "Banana", "banana"].sorted())

# macOS:10.13.4 swift: 4.1
$ swift test.swift
["Apple", "Banana", "apple", "banana"]

# Ubuntu:16.04 swift: 4.1
$ swift test.swift
["apple", "Apple", "banana", "Banana"]

Updated on May 26:

# Ubuntu:16.04 swift: 4.2-DEVELOPMENT-SNAPSHOT-2018-05-23-a
$ swift test.swift
["Apple", "Banana", "apple", "banana"]

This seems to be SR-982 (String sort order and comparisons differ between Darwin and Linux).

This has nothing to do with "Server" and the general Swift forum will likely give you a better explanation.

I'd say it's because there are two (different) implementations of the stdlib for Linux and macOS. As far as I can see, the semantics of func <=(String, String) are not documented or fixed (the language guide doesn't mention them either), so they probably chose to implement it differently (my guess: macOS just uses Foundation.String.compare and tuxOS uses ICU or maybe plain strcmp; ask GitHub ;-) ).

The more interesting question is why the compare-based sort works the same on both (compare is not in the stdlib, but in Foundation). This is probably because the Foundation implementation is the same on both. Pure luck, I'd say; do not rely on it :)

But most importantly: why do you assume this is stable in the first place? The String <= String operation is ambiguous to begin with. If you need a specific ordering, you need to be more precise.
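As a sketch of "being more precise", the comparator below spells out one explicit ordering instead of relying on the default <. The helper name caseInsensitiveThenScalar is hypothetical, and the ordering it picks (lowercased scalars first, raw scalars as a tie-break) is only one possible choice. It compares Unicode scalars without normalizing, so canonically equivalent spellings such as a precomposed and a decomposed "é" are not treated as equal.

```swift
// Hypothetical helper: one explicit ordering that does not use String's default <.
// Primary key: the Unicode scalars of the lowercased string.
// Tie-break: the raw Unicode scalars, so "Apple" (U+0041 ...) sorts before "apple" (U+0061 ...).
// No normalization is done, so canonically equivalent strings may still be ordered apart.
func caseInsensitiveThenScalar(_ lhs: String, _ rhs: String) -> Bool {
    let lowerL = Array(lhs.lowercased().unicodeScalars)
    let lowerR = Array(rhs.lowercased().unicodeScalars)
    if lowerL != lowerR {
        return lowerL.lexicographicallyPrecedes(lowerR)
    }
    return lhs.unicodeScalars.lexicographicallyPrecedes(rhs.unicodeScalars)
}

let words = ["Apple", "apple", "Banana", "banana"]
print(words.sorted(by: caseInsensitiveThenScalar))
// ["Apple", "apple", "Banana", "banana"]
```

For ASCII input like this, the result depends only on the scalar values, not on how the platform's default String comparison happens to be implemented.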

Maybe it is worth filing a bug asking for func <=(String, String) to be locked down to a specific sorting behavior, but my guess is that it is intentionally vague to support different implementations.

None of those strings compare equal, so I'd expect there to be a single "correct" ordering…

Why, and for what use? Since the specific comparison approach is unspecified, I would always expect the "fastest" stable sorting the implementation can provide (which may differ between platforms).

If stable sorting within an implementation is not enough for you (and that seems to be the only given guarantee), you can always provide a specific one.

I think this is the root issue. Comparison of Strings should not be platform-dependent; I don't know of any other language that does this.

In actuality, Swift does (technically) guarantee Unicode-correct string ordering, with one exception: Apple platforms don't use Unicode-correct ordering when comparing two ASCII strings. When both strings are purely ASCII, Apple uses plain ASCII ordering instead, which is faster.

So, probably unlike all other text bugs, this bug only appears with ASCII strings instead of emoji strings.
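To illustrate why an ASCII-only fast path puts every capitalized word first, here is a sketch that orders pure-ASCII strings by their UTF-8 bytes. It is not the standard library's actual implementation, only a byte-order comparison, and the helpers isPureASCII and asciiPrecedes are hypothetical. It reproduces the macOS output from the original question.

```swift
// Hypothetical helper: true when every UTF-8 byte is below 0x80 (pure ASCII).
func isPureASCII(_ s: String) -> Bool {
    return s.utf8.allSatisfy { $0 < 0x80 }
}

// Hypothetical helper: byte-order comparison, valid only for pure-ASCII input.
// Uppercase letters are 0x41...0x5A and lowercase letters are 0x61...0x7A,
// so every uppercase letter sorts before every lowercase one.
func asciiPrecedes(_ lhs: String, _ rhs: String) -> Bool {
    precondition(isPureASCII(lhs) && isPureASCII(rhs), "this sketch only handles ASCII")
    return lhs.utf8.lexicographicallyPrecedes(rhs.utf8)
}

let words = ["Apple", "apple", "Banana", "banana"]
print(words.sorted(by: asciiPrecedes))
// ["Apple", "Banana", "apple", "banana"]
```

The same rule explains why "Zebra" sorts before "apple" under plain ASCII ordering, while a UCA-based comparison (as on Linux with Swift 4.1) groups words by letter first.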

According to the latest comment in SR-982 (String sort order and comparisons differ between Darwin and Linux), "This should've been fixed", but I don't know which version has been decided to be "correct." The latest developer snapshot for Xcode still sorts as ["Apple", "Banana", "apple", "banana"].

This paragraph in the String Manifesto could also be relevant:

Finally, once it is agreed that the default role for String is to handle machine-generated and machine-readable text, the default ordering of Strings need no longer use the UCA at all. It is sufficient to order them in any way that's consistent with equality, so String ordering can simply be a lexicographical comparison of normalized forms, ...
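The "consistent with equality" requirement matters because Swift's String == uses canonical equivalence. A minimal sketch with a precomposed and a decomposed "é" shows that the two compare equal, so neither may sort before the other, while a naive comparison of their raw Unicode scalars would still order them. This is why the quoted paragraph speaks of comparing normalized forms rather than raw scalars.

```swift
let precomposed = "\u{E9}"    // "é" as a single scalar, U+00E9
let decomposed = "e\u{301}"   // "e" followed by U+0301 COMBINING ACUTE ACCENT

// String equality is canonical equivalence, so these compare equal...
print(precomposed == decomposed)                          // true
// ...and an ordering consistent with equality must not put either one first.
print(precomposed < decomposed, decomposed < precomposed) // false false

// A comparison of raw scalars would disagree with ==.
print(precomposed.unicodeScalars.elementsEqual(decomposed.unicodeScalars))             // false
print(decomposed.unicodeScalars.lexicographicallyPrecedes(precomposed.unicodeScalars)) // true
```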

I have updated the category.

This is the old behavior. Prior to Swift 4.2, Linux used UCA with DUCET ordering and comparisons. To improve performance on both Darwin and Linux (especially older Ubuntu versions), we wrote a new comparison implementation for Swift 4.2. This new implementation is the same on Linux and Darwin, so both platforms get the same answer.

Have you tried with a recent 4.2 or master snapshot?

Is it correct that string comparison is a lexicographical comparison of the Unicode scalars in the decomposed form? Is this documented somewhere?

There is no specific formal guarantee, and the ordering is not guaranteed to be stable even across executions of the same program.

Right now it is the lexicographical order of the NFC-normalized UTF-16 code units, not the scalar values. We will likely switch to comparing scalar values in the future. The only case where this would change the ordering is BMP scalars above the surrogate range (U+E000 through U+FFFF) compared against scalars outside the BMP.
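The difference between ordering by UTF-16 code units and by scalar values can be seen with a small sketch that compares a BMP scalar above the surrogate range (U+E000, a private-use code point) against one outside the BMP (U+1F600), whose UTF-16 form starts with the high surrogate 0xD83D. It calls lexicographicallyPrecedes on the utf16 and unicodeScalars views directly rather than String's own <, since the rule behind < is not formally guaranteed.

```swift
let bmp = "\u{E000}"      // BMP scalar above the surrogate range; UTF-16: E000
let astral = "\u{1F600}"  // outside the BMP; UTF-16 surrogate pair: D83D DE00

// By UTF-16 code units: 0xE000 > 0xD83D, so bmp sorts after astral.
let byUTF16 = bmp.utf16.lexicographicallyPrecedes(astral.utf16)
// By scalar values: 0xE000 < 0x1F600, so bmp sorts before astral.
let byScalar = bmp.unicodeScalars.lexicographicallyPrecedes(astral.unicodeScalars)
print(byUTF16, byScalar) // false true

// Below the surrogate range both rules agree, e.g. for "a" (U+0061).
let a = "a"
print(a.utf16.lexicographicallyPrecedes(astral.utf16),
      a.unicodeScalars.lexicographicallyPrecedes(astral.unicodeScalars)) // true true
```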

Oh, I tried running the Ubuntu 4.2-DEVELOPMENT-SNAPSHOT-2018-05-23-a snapshot, and the results were consistent.

Thank you so much.
