Revisiting the choice of sort algorithm

Swift.sort is currently an introsort. Introsort is unstable, and there's been discussion over on evolution of adding a stable sort, either instead of or as well as the current sort.

The current sort is problematic because it uses recursion heavily, which defeats a number of optimizations relating to ARC and elimination of the overhead of passing in a closure for the comparative for the comparator. This is why we use gyb to stamp out two near identical versions rather than implementing the Equatable version in terms of the closure-taking one.

So we might find that a stable sort that didn’t use recursion did better in some of the benchmarks and allowed us to reduce the std lib binary size. If anyone is interested in having a go at an implementation that’d be great.

I say might, because the tricky part is going to be efficiently moving the elements around when they're non-trivial. The current algo uses swap because the sorting is happening in-place, which is great because swapping shouldn’t need any ARC. But a sorting algorithm that used additional storage, like a simple merge sort, can’t use swap like that. Instead it'd need to resort to unsafe techniques, or find a way of doing a stable sort in-place (a lot harder to do).

5 Likes

This is probably a stupid question, but what is the advantage of a stable sort algorithm? Whenever I try to work this out, I only find people using it to sort on two different properties by sorting twice, e.g. the classic last name, first name example. This seems inefficient (two sorts) and potentially confusing (you have to sort in the opposite order to the natural one) and would probably be better handled by passing a single comparison closure that checked both properties, or by a sort function that took multiple closures and did the “call the next closure if this one compares equal” part for you. Is there another common use case for stable sorting that I'm missing?

While the example you've provided is probably the most common use case, another one that I can think of is when you're sorting something that you're going to display to users. If you're performing repeated sorts (say after adding items to a list), users would expect elements to stay in the same relative order rather than jump around all over the place.

That makes sense, i.e. when you don't want to specify a total order on the elements of your list and are just appending and sorting every time.

Also, while you can do this kind of sort in one shot with a single closure in code, users also want/expect the ability to do the same thing by e.g. clicking on one column header after another for a list view, which requires the re-sorting you do when they do that to be stable.