*reference* types
My line of reasoning is the following:
- The Equatable protocol defines a notion of substitutability that is wholly independent of identity.
- Whether a type is a value type or a reference type is irrelevant here: if it conforms to Equatable, then by that protocol's documented definition, two values that compare equal are interchangeable regardless of whether they are two separate objects with different identities or not.
- Therefore, it doesn't matter whether you choose a stable or unstable sort, because if the values are truly interchangeable, then the order of equal elements shouldn't matter.
- So, a sort of Equatable elements that does not also provide a custom comparison predicate can choose a more performant unstable sort algorithm and still provide results that are entirely consistent with the substitutability property of the protocol.
The documentation for Equatable says this:
Equality implies substitutability: any two instances that compare equally can be used interchangeably in any code that depends on their values.
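To make the identity/value distinction concrete, here's a minimal sketch (the `Label` type is hypothetical): two instances that compare equal under `==` even though `===` can still tell them apart.

```swift
// A hypothetical Equatable reference type: == compares values,
// while === (identity) can still distinguish separate instances.
final class Label: Equatable {
    let text: String
    init(_ text: String) { self.text = text }
    static func == (lhs: Label, rhs: Label) -> Bool { return lhs.text == rhs.text }
}

let a = Label("B")
let b = Label("B")
print(a == b)   // true: equal, hence interchangeable per Equatable's docs
print(a === b)  // false: two separate objects with distinct identities
```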
Your argument seems to be that even if two reference types are Equatable, you can still distinguish them based on their identity. Yes, that's true, but if you say that distinction matters for a sort, then it clearly violates the interchangeability requirement of its Equatable conformance.
Interchangeable in value, yes. But stability when sorting a T: Collection whose Element is a class type has to do with identity.
That's my point: there will be code that doesn't depend on their values (and people are free to write this code, since they're not value types), and you're making that code a lot harder to write.
What, then, is an unstable sort if you use that definition?
Let's assume that we have an array with reference types [B2, A1, C1, B1], which conform to Comparable (and thus Equatable) based on the letter alone.
If I sort those by using the Comparable conformance, my claim is that substitutability means that I shouldn't be caring if I get back [A1, B2, B1, C1] or [A1, B1, B2, C1]. Either should be just as good, because my conformance says those two Bs are interchangeable, so a more performant sort algorithm that does not guarantee order stability is fine.
If I sort those instead by using a version of sort that takes a custom comparator, then I'm asking it to ignore the conformance and instead use that comparator, which only answers the question "is X ordered before Y" and makes no claim about interchangeability. So, I would expect the sort to produce the safest possible result in that case, which would be stable: [A1, B2, B1, C1].
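For concreteness, a minimal sketch of this example (the `Item` class and its letter-only conformance are hypothetical, and Swift's `sorted()` makes no documented stability guarantee here):

```swift
// Hypothetical element type: Comparable/Equatable on the letter alone,
// so B1 and B2 compare equal while remaining distinct objects.
final class Item: Comparable, CustomStringConvertible {
    let letter: String
    let id: Int
    init(_ letter: String, _ id: Int) { self.letter = letter; self.id = id }
    var description: String { return letter + String(id) }
    static func == (lhs: Item, rhs: Item) -> Bool { return lhs.letter == rhs.letter }
    static func < (lhs: Item, rhs: Item) -> Bool { return lhs.letter < rhs.letter }
}

let items = [Item("B", 2), Item("A", 1), Item("C", 1), Item("B", 1)]
// Under the substitutability argument, either ordering of the Bs is fine:
print(items.sorted())  // [A1, B2, B1, C1] or [A1, B1, B2, C1]
```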
That's my reasoning, FWIW. I also think that in Swift's user community there are likely to be many people who:
- Don't care about the performance difference
- Naturally assume that sorting is stable because of effects they experience in, e.g. table UIs
I advocate using an adaptive algorithm that decides how much/whether extra memory is used, and falls back to something like this to get O(1) memory use at, potentially, some cost to performance. I'm not saying O(1) memory use overall is a critical goal, but it can easily be achieved by falling back when the number of elements exceeds some constant.
Personally I don't think I'll be thoroughly convinced we need to have an unstable sort in the standard library until someone can demonstrate performance gains over the best adaptive stable sort we come up with.
Let me also add that I've seen real performance improvements over the standard library's sort by using insertion sort (which is stable) when the number of elements is known to be small. I do realize that the standard library's sort uses insertion sort (dynamically) for some of those cases, but the way it makes the choice to use insertion sort is clearly not always optimal.
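For reference, a sketch of the classic stable insertion sort being described (not the standard library's actual implementation):

```swift
// A stable insertion sort: using > rather than >= when shifting keeps
// equal elements in their original order, which is what stability means.
func insertionSort<T: Comparable>(_ a: inout [T]) {
    guard a.count > 1 else { return }
    for i in 1..<a.count {
        let key = a[i]
        var j = i - 1
        while j >= 0 && a[j] > key {
            a[j + 1] = a[j]  // shift strictly greater elements right
            j -= 1
        }
        a[j + 1] = key
    }
}
```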
If I sort those by using the Comparable conformance, my claim is that substitutability means that I shouldn't be caring if I get back [A1, B2, B1, C1] or [A1, B1, B2, C1]. Either should be just as good, because my conformance says those two Bs are interchangeable, so a more performant sort algorithm that does not guarantee order stability is fine.
Those two Bs are interchangeable in value, sure. And yes, I might not care whether I get back [A1, B2, B1, C1] or [A1, B1, B2, C1] after one sort precisely because they are interchangeable in value. But it's still legitimate for me to care if the second item in the array is going to be the same element (i.e., identity) if I sort it again.
Those two Bs are interchangeable in value, sure. And yes, I might not care whether I get back [A1, B2, B1, C1] or [A1, B1, B2, C1] after one sort precisely because they are interchangeable in value. But it's still legitimate for me to care if the second item in the array is going to be the same element (i.e., identity) if I sort it again.
I see your point, and my counter would be "if you care about identity, then use a custom comparator to signal that to the sort algorithm because it removes the substitutability property from the equation." I can see how some might say that's too subtle, though.
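For example, a sketch of that signal, reusing the hypothetical Item/items from the earlier example: a custom comparator that tie-breaks on identity, making repeated sorts deterministic within a process.

```swift
// Reuses the hypothetical Item class and items array from the sketch above.
// Ties between value-equal objects are broken by identity, so the result
// is deterministic across repeated sorts within the same process.
let deterministic = items.sorted { lhs, rhs in
    if lhs.letter != rhs.letter { return lhs.letter < rhs.letter }
    return ObjectIdentifier(lhs) < ObjectIdentifier(rhs)
}
print(deterministic)
```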
That being said, I'd be perfectly fine with stable sort being the only option provided by stdlib, which makes this a non-issue anyway.
I advocate using an adaptive algorithm that decides how much/whether extra memory is used
I'm sorry, was this supposed to link to something?
Could we get away with an unstable sort on value types? Since there's no concept of identity here, a faster/more memory-efficient algorithm could be employed.
Let me also add that I've seen real performance improvements over the standard library's sort by using insertion sort (which is stable) when the number of elements is known to be small. I do realize that the standard library's sort uses insertion sort (dynamically) for some of those cases, but the way it makes the choice to use insertion sort is clearly not always optimal.
We could (this is an implementation detail, but important anyway) make the algorithm pick one of three:
- Insertion sort (for very small N)
- O(N) space Timsort or similar (for normal N)
- O(1) space in-place Mergesort or similar (for huge N)
The threshold between cases 1 and 2 can maybe be hardcoded after doing benchmarks over N.
The threshold between cases 2 and 3 could be picked from the size of the elements to be sorted and the maximum space admissible, determined either dynamically (asking the runtime about the memory situation) or statically (from good reasoning about available resources).
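A sketch of that three-way dispatch (the thresholds, the memory-budget parameter, and the stand-in sorts are all hypothetical; insertionSort is the sketch from earlier):

```swift
// Hypothetical selection logic only; the O(N)-space and O(1)-space stable
// sorts are stubbed with Array.sort() as placeholders.
func adaptiveSort<T: Comparable>(_ a: inout [T], memoryBudgetBytes: Int) {
    let smallThreshold = 32  // case 1/2 boundary, to be tuned by benchmarks
    let scratchBytes = a.count * MemoryLayout<T>.stride
    if a.count <= smallThreshold {
        insertionSort(&a)    // case 1: O(1) space, fast for tiny N
    } else if scratchBytes <= memoryBudgetBytes {
        a.sort()             // case 2: stand-in for an O(N)-space Timsort
    } else {
        a.sort()             // case 3: stand-in for an O(1)-space in-place mergesort
    }
}
```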
- Insertion sort (for very small N)
- O(N) space Timsort or similar (for normal N)
Actually, Timsort does this already, so you only really have two cases here.
Could we get away with an unstable sort on value types? Since there's no concept of identity here, a faster/more memory-efficient algorithm could be employed.
Hmm. Maybe?
For custom comparators (think comparing vectors by X, Y or Z) the value-equality isn't present in the comparison. There is equality, but only under a certain criterion. There is no full equality under those comparisons.
Under comparators that come from a Comparable conformance, though, I'm not sure what the answer might be.
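To illustrate the vector case, a small sketch (the `Vector3` type is hypothetical): the comparator orders by x alone, so two vectors can tie without being equal values.

```swift
// Equality "under a certain criterion" only: the two x == 1 vectors tie
// under this comparator yet are not interchangeable values, so the relative
// order a sort gives them is observable.
struct Vector3 { var x, y, z: Double }

let vectors = [Vector3(x: 1, y: 5, z: 0),
               Vector3(x: 1, y: 2, z: 9),
               Vector3(x: 0, y: 7, z: 3)]
let byX = vectors.sorted { $0.x < $1.x }
print(byX)
```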
edit: added quote. My gosh, I was ambiguous w/o it
I'm sorry, was this supposed to link to something?
Fixed; refresh and try again.
many people [...] naturally assume that sorting is stable because of effects they experience in, e.g. table UIs
This is the key thing for me in favor of it. Because most languages use a stable sort, users are used to that and we would encourage bugs where people make the wrong assumption if Swift zigs where other languages zag.
Could we get away with an unstable sort on value types? Since there's no concept of identity here, a faster/more memory-efficient algorithm could be employed.
Value types can have identity or reference semantics (UnsafeBufferPointer is a value type).
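A quick sketch of that point, using the mutable variant so the shared storage is observable:

```swift
// UnsafeMutableBufferPointer is a struct (a value type), but copying it
// copies the pointer, not the elements: reference semantics in practice.
let storage = UnsafeMutablePointer<Int>.allocate(capacity: 3)
storage.initialize(repeating: 0, count: 3)
let a = UnsafeMutableBufferPointer(start: storage, count: 3)
let b = a          // b and a share the same underlying memory
b[0] = 42
print(a[0])        // 42: mutation through one copy is visible through the other
storage.deinitialize(count: 3)
storage.deallocate()
```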
We could get away with types with value semantics having an unstable sort. But there's no way for the type system to detect value semantics (many threads out there on why we don't have such a feature).
We could potentially specialize sorting of known value types like Int or String, perhaps, if it can be shown to be worth it. But hopefully, like Dave says, a good enough stable sort would obviate this need.
Because most languages use a stable sort
Actually, I don't think this is true. Most of the languages I've used offer an unstable sort by default (i.e., the "sort" function), with an option to enable stable sorting. This includes C, C++, Objective-C, Rust, JavaScript (actually, ECMA doesn't define a particular sort, so you have to assume it may not be stable), Java (for primitive types), and Lua (which provides no stable sort). Only Python does stable sorting by default. I'm not arguing against stable sort being the default; I just wanted to weigh in with my experience.
The Rust sort appears to be stable.
The Java sort for primitive types is indeed unstable. I'm comfortable with my instances of the number 42 being reordered :)
Oops, I guess I should read the docs more closely… looks like stable sorts aren't as rare as I thought they were.