[Starter Proposal] Add Stable Sort to StdLib

reference* types

1 Like

My line of reasoning is the following:

  • The Equatable protocol defines a notion of substitutability that is wholly independent of identity.
  • Whether a type is a value type or a reference type is irrelevant here—if it conforms to Equatable, then by that protocol's documented definition, two values that compare equal are interchangeable regardless of whether they are two separate objects with different identities or not.
  • Therefore, it doesn't matter whether you choose a stable or unstable sort, because if the values are truly interchangeable, then the order of equal elements shouldn't matter.
  • So, a sort of Equatable elements that does not also provide a custom comparison predicate can choose a more performant unstable sort algorithm and still provide results that are entirely consistent with the substitutability property of the protocol.

The documentation for Equatable says this:

Equality implies substitutability—any two instances that compare equally can be used interchangeably in any code that depends on their values.

Your argument seems to be that even if two reference types are Equatable, you can still distinguish them based on their identity. Yes, that's true, but if you say that distinction matters for a sort, then it clearly violates the interchangeability requirement of its Equatable conformance.

Interchangeable in value, yes. But stability upon sorting an instance of T : Collection where Element : class has to do with identity.

2 Likes

That's my point: there will be code that doesn't depend on their values (and people are free to write this code, since they're not value types), and you're making that code at lot harder to write.

What, then, is an unstable sort if you use that definition?

3 Likes

Let's assume that we have an array with reference types [B2, A1, C1, B1], which conform to Comparable (and thus Equatable) based on the letter alone.

If I sort those by using the Comparable conformance, my claim is that substitutability means that I shouldn't be caring if I get back [A1, B2, B1, C1] or [A1, B1, B2, C1]. Either should be just as good, because my conformance says those two Bs are interchangeable, so a more performant sort algorithm that does not guarantee order stability is fine.

If I sort those instead by using a version of sort that takes a custom comparator, then I'm asking it to ignore the conformance and instead use that comparator, which only answers the question "is X ordered before Y" and makes no claim about interchangeability. So, I would expect the sort to be the safest possible result in that case, which would be stable—[A1, B2, B1, C1].

2 Likes

That's my reasoning, FWIW. I also think that in Swift's user community there are likely to be many people who:

  • Don't care about the performance difference
  • Naturally assume that sorting is stable because of effects they experience in, e.g. table UIs

I advocate using an adaptive algorithm that decides how much/whether extra memory is used, and falls back to something like this to get O(1) memory use at, potentially, some cost to performance. I'm not saying O(1) memory use overall is a critical goal, but it can easily be achieved by falling back when the number of elements exceeds some constant.

Personally I don't think I'll be thoroughly convinced we need to have an unstable sort in the standard library until someone can demonstrate performance gains over the best adaptive stable sort we come up with.

Let me also add that I've seen real performance improvements over the standard library's sort by using insertion sort (which is stable) when the number of elements is known to be small. I do realize that the standard library's sort uses insertion sort (dynamically) for some of those cases, but the way it makes the choice to use insertion sort is clearly not always optimal.

2 Likes

Those two Bs are interchangeable in value, sure. And yes, I might not care whether I get back [A1, B2, B1, C1] or [A1, B1, B2, C1] after one sort precisely because they are interchangeable in value. But it's still legitimate for me to care if the second item in the array is going to be the same element (i.e., identity) if I sort it again.

I see your point, and my counter would be "if you care about identity, then use a custom comparator to signal that to the sort algorithm because it removes the substitutability property from the equation." I can see how some might say that's too subtle, though.

That being said, I'd be perfectly fine with stable sort being the only option provided by stdlib, which makes this a non-issue anyway.

1 Like

I'm sorry, was this supposed to link to something?

Could get away with an unstable sort on value types? Since there's no concept of identity here a faster/more memory efficient algorithm could be employed.

We could (this is an implementation detail, but important anyway) make the algorithm pick one of three:

  1. Insertion sort (for very small N)
  2. O(N) space Timsort or similar (for normal N)
  3. O(1) space in-place Mergesort or similar (for huge N)

The threshold between 1 and 2 can maybe be hardcoded after doing benchmarks over N.

The threshold between 2 and 3 could be picked from the size of the elements to be sorted and the maximum space admissible, determined either dynamically (asking the runtime for what is the memory situation) or statically (from a good reasoning on available resources).

Actually, Timsort does this already, so you only really have two cases here.

1 Like

Hmm. Maybe?
For custom comparators (think comparing vectors by X, Y or Z) the value-equality isn’t present in the comparison. There is equality, but only under a certain criteria. There is no full equality under those comparisons.

Under Comparable conformance comparators though, I’m not sure what the answer might be.

edit: added quote. My gosh, I was ambiguous w/o it

Fixed; refresh and try again.

This is the key thing for me in favor of it. Because most languages use a stable sort, users are used to that and we would encourage bugs where people make the wrong assumption if Swift zigs where other languages zag.

2 Likes

Value types can have identity or reference semantics (UnsafeBufferPointer is a value type).

We could get away with types with value semantics having an unstable sort. But there's no way for the type system to detect value semantics (many threads out there on why we don't have such a feature).

We could potentially specialize sorting of known value types like Int or String, perhaps, if it can be shown to be worth it. But hopefully, like Dave says, a good enough stable sort would obviate this need.

1 Like

Actually, I don't think this is true. Most of the languages I've used offer an unstable sort by default (i.e., the "sort" function), with an option to enable stable sorting. This includes C, C++, Objective-C, Rust, JavaScript (actually, EMCA doesn't define a particular sort, so you have to assume it may not be stable), Java (for primitive types), and Lua (which provides no stable sort). Only Python does stable sorting by default. I'm not arguing against the fact that stable sort should be the default, but just wanted to weigh in with my experience.

The rust sort appears to be stable.

The Java sort for primitive types is indeed unstable. I'm comfortable with my instances of the number 42 being reordered :)

1 Like

Oops, I guess I should read the docs more closely
looks like stable sorts aren't as rare as I thought they were.