Pitch: Document Sorting as Stable

nnnnnnnn · August 25, 2022, 4:16am

A quick pitch to add a guarantee that the stdlib's sort is stable! This has been true for a long time, so we should promise that it will stay that way.

Introduction

Swift's sorting algorithm was changed to be stable before Swift 5, but we've never updated the documentation to provide that guarantee. Let's commit to the sorting algorithm being stable so that people can rely on that behavior.

Motivation

A stable sort is a sort that keeps the original relative order for any elements that compare as equal or unordered. For example, given this list of players that are already sorted by last name, a sort by first name preserves the original order of the two players named "Ashley":

var roster = [
   Player(first: "Sam", last: "Coffey"),
   Player(first: "Ashley", last: "Hatch"),
   Player(first: "Kristie", last: "Mewis"),
   Player(first: "Ashley", last: "Sanchez"),
   Player(first: "Sophia", last: "Smith"),
]

roster.sort(by: { $0.first < $1.first })
// roster == [
//    Player(first: "Ashley", last: "Hatch"),
//    Player(first: "Ashley", last: "Sanchez"),
//    Player(first: "Kristie", last: "Mewis"),
//    Player(first: "Sam", last: "Coffey"),
//    Player(first: "Sophia", last: "Smith"),
// ]

For users who are unaware that many sorting algorithms aren't stable, an unstable sort can be surprising. Preserving relative order is an expectation set by software like spreadsheets, where sorting by one column, and then another, is a way to complete a sort based on multiple properties.

Sort stability isn't always observable. When a collection is sorted based on the elements' Comparable conformance, like sorting an array of integers, "unordered" elements are typically indistinguishable. In general, sort stability is important when elements are sorted based on a subset of their properties.

Proposed solution

The standard library's implementation of sort() and sorted() has been stable since before ABI stability was declared in Swift 5, but the documentation explicitly doesn't make this guarantee:

The sorting algorithm is not guaranteed to be stable. A stable sort preserves the relative order of elements that compare as equal.

Let's change that! Since all current versions of the Swift runtime include a stable sort, this change can be made to the standard library documentation only:

- /// The sorting algorithm is not guaranteed to be stable. A stable sort
+ /// The sorting algorithm is guaranteed to be stable. A stable sort
  /// preserves the relative order of elements that compare as equal.

Source compatibility

This change codifies the existing standard library behavior, so it is compatible with all existing source code.

Effect on ABI stability

The change to make sorting stable was implemented before ABI stability, so all ABI-stable versions of Swift already provide this behavior.

Effect on API resilience

Making this guarantee explicit requires that any changes to the sort algorithm maintain stability.

Alternatives considered

There are a variety of other sorting-related improvements that could be interesting to pursue, including key-path or function-based sorting, sorted collection types or protocols, sort descriptors, and more. These ideas can be explored in future pitches and proposals.

David_Smith · August 25, 2022, 6:24am

Seems good to me. Presumably if we ever had reason to want to use an unstable sorting algorithm, we could introduce new API for it.

ksluder · August 25, 2022, 7:45am

To steel-man: the pitch would be stronger if it addressed the alternative of continuing to explicitly document the sorting algorithm as not guaranteed to be stable. What are the upsides of the current spelling? This seems to be the most obvious counter-argument to me, and worthy of spelling out the reasoning against.

In fact, the “alternatives considered” section is really just a list of “potential things to sort”.

stevapple · August 25, 2022, 9:05am

True alternative: to introduce a new stableSort and stableSorted API, and to seek for updating sort and sorted to faster, potentially unstable implementation.

Rust has sort, sort_stable and sort_unstable, using three different implementations. I don’t think we actually need a dedicated unstable sort API because the users only care if it is guaranteed to be stable or not. Swift’s sort and stableSort is also semantically identical to C++’s sort and stable_sort, while theoretically being much more efficient.

jrose · August 25, 2022, 5:29pm

To spell it out explicitly, Swift could have sort and stableSort, or it could have sort and unstableSort. I agree with Nate and David that the latter is more desirable, in the spirit of “make the simple thing easy and correct, but the performant thing possible”.

George · August 25, 2022, 5:44pm

Another option would be sort(stable: Bool = true)

hisekaldma · August 25, 2022, 7:02pm

Changing sort to be unstable would also lead to potentially subtle bugs in projects out there that depend on the current behavior, Hyrum’s law and all.

Jumhyn · August 25, 2022, 7:15pm

Yeah, I think this is a determinative question. Do we feel 'locked in' by potential clients which have (incorrectly) relied on the stability of today's sort? If so, then most of the potential benefits of either sort-stableSort or sort-unstableSort pair are somewhat moot.

jim · August 25, 2022, 7:29pm

Would people then expect stable = false to imply changes must happen when sorting an already ordered list?

hisekaldma · August 25, 2022, 7:37pm

I definitely think we should feel locked in. We can’t assume that clients have been supplying their own stable sorting algorithms all along, even though the std library one is stable. If the std library was actually serious about the warning that sort isn’t stable, sort should never have been made stable in the first place. (Remember that Set and Dictionary were changed to have more unstable order so that clients wouldn’t inadvertently rely on the order being stable.)

stevapple · August 25, 2022, 7:43pm

I’d like to explain a bit on why I don’t like the idea of sort+unstableSort by design.

First of all, let’s keep in mind the line from Swift API Design Guidelines, which suggest API to include all the words needed to avoid ambiguity.

sort, by its name, doesn’t imply whether it’s stable or not. The word stable isn’t “needless” and shouldn’t be omitted. Instead, by using an API called sort, the caller expects any valid sorting algorithm. Stability isn’t implied and thus shouldn’t be expected.

stableSort and unstableSort, by their names, are specific variants of sort that are guaranteed to be stable or unstable. A stable sorting algorithm is useful, because it provides extra functionality than sort. That’s desired, and it perfectly implies that the key difference between sort and stableSort is stability — not performance or anything else. This is the exact goal of having a stable variant of sort.

On the other hand, a dedicated unstable sorting function is almost useless. Although unstable algorithms are usually regarded to be more performant than stable ones, this is not always the case. It’s by implementation and design. It’s possible to have performant stable algorithm and bad unstable algorithm, and the name unstableSort doesn’t imply anything on performance either.

Even worse, this will add unnecessary restriction on the implementation because by using the name we can no longer implement this API with a stable sorting algorithm, even if it’s super performant. I don’t think it is reasonable, therefore I’m strongly against the name unstableSort.

Documentation isn’t everything. API design matters more.

Avi · August 25, 2022, 7:47pm

I don't think it's reasonable to suggest that a method called unstableSort must be unstable, in that sorting the same source twice must yield different results. If you know enough about sorting algorithms to understand what unstable means in context, you'd expect that the algorithm is not necessarily giving stable results, not that it must not give stable results.

stevapple · August 25, 2022, 7:58pm

Then why’re we choosing this name if it’s not using an unstable algorithm? What does the name really want to say?

An "unstable sorting algorithm" is a long established term and there’s a clear line between unstable and stable algorithms. Using a stable algorithm for unstableSort is semantically against its name — instead of breaking its behavior I would say. If you really like one, you should give it a descriptive name, eg. performantSort.

stevapple · August 25, 2022, 8:09pm

If we really think this is worth, we can easily “rename” an existing API with @_silgen_name, which is already used to implement SE-0370.

That is to say, we’re not actually locked down by ABI. At source level such misuses should be regarded as programmatic error, which can be rather easily fixed after the proposal.

xwu · August 25, 2022, 8:24pm

No, but I would expect that there would be some input for which the method would produce a sorted result that doesn't preserve the relative order of equivalent elements.

Put another way, I should be able to illustrate (for teaching purposes, say) with a well-chosen argument what it means for a sorting algorithm to be stable versus unstable by means of calling unstableSort and contrasting the result with the output of a stable sorting algorithm.

David_Smith · August 25, 2022, 8:49pm

There's a long long tradition of languages and libraries not always fully exercising the possibility space that their API naming and documentation allow. I think saying that a hypothetical unstableSort could in practice be implemented with a stable sorting algorithm is just fine: any unstable sort may, coincidentally, return elements in the same order as before. This (hypothetical) one merely does that more often* than most.

*always

phoneyDev · August 25, 2022, 8:52pm

@nnnnnnnn +1 I, for one, did have my own implementation of a stable sort in one project, probably cribbed from SO. I'm sure plenty of others do as well. Documenting that sort is guaranteed to be stable, will be beneficial. The current mealy-mouthed comment. 'is not guaranteed to be stable' serves no one at this point. I guess it did when sort wasn't stable but being stable was a goal.
I don't see the need for stableSort or unstableSort apis. Certainly not as part of this pitch.
If additional sort algorithms need to be added to the Algorithms project I would expect more specific names for them.

jjrscott · August 25, 2022, 9:48pm

This might seem a bit off-topic, but it is relatively easy to implement stable sort based on top of unstable sort by using indexes rather than the values at those indexes as the input to the comparator. If anything I think our issue is that the current sort uses Bool rather than ComparisonResult which can be chained like Perl does (I miss <=>).

For reference, here’s a post I wrote a while back wrt search, but the same applies to sort: Index based binary search · jjrscott

tomkeith · August 25, 2022, 10:05pm

A simple change in documentation. +1 from me.

nnnnnnnn · August 25, 2022, 10:52pm

This is definitely my thinking — sort instability likely isn't even on the radar for a lot of developers, so there's great value in providing stability with the default API.

As far as alternatives and naming, I don't think anyone is specifically looking for an unstableSort(). People might want a sort that has different tradeoffs from the default. For example, the current sort allocates a buffer to perform its merges, so it might be valuable to provide a sort that performs its work entirely in place. That sort would probably need to be unstable, but instability wouldn't be the intent behind the sort's authors or users, and I wouldn't really expect it would be salient enough to include in the API name. I'll add a note about this to the alternatives.