Sorting Collections with `map` closures and SortDescriptors

cal · September 18, 2018, 7:41pm

Hey everyone, I'd like to continue a discussion about sorting ergonomics that I pitched in July. (This is a refined iteration of that pitch, based on great feedback from the original thread).

Today, to sort a Collection of non- Comparable elements, we write a closure that returns whether or not two elements areInIncreasingOrder :

struct Person {
    let name: String
    let age: Int
}

let people = [
    Person(name: "Bob", age: 34),
    Person(name: "Alice", age: 34),
    Person(name: "Eve", age: 30)]


people.sorted(by: { $0.age < $1.age })
// returns [Eve (30), Bob (34), Alice, (34)]

This example has a few features that you could consider "pain points":

The call site doesn't form a grammatical english phrase. [1]
It's relatively easy to mix up < and > , and it isn't immediately obvious from the call site which operator results in which ordering.
Repeating the property for both $0 and $1 is a touch verbose.

To specify further ordering (for example, sorting by age and then by name), we must write a more complex closure:

people.sorted(by: { lhs, rhs in
    if lhs.age != rhs.age {
        return lhs.age < rhs.age
    } else {
        return lhs.name < lhs.name
    }
})

// returns [Eve (30), Alice (34), Bob (34)]

This example also has a few pain points:

The closure is more "imperative" than "declarative". It has to be read closely to grok exactly what it's doing.
There are several other ways to write such logic, so there is not one singular canonical version.
It continues to be rather verbose, with six property accessors, and scales poorly as you add more fallbacks.

Proposed Solution

To improve the ergonomics and clarity of these call-sites, I'd like to propose two new sorting APIs:

Sorting with `map` closures

If we had a sort variant that took a closure similar to map, we could instead write:

people.sorted(by: { $0.age })

// instead of
people.sorted(by: { $0.age < $1.age })

This API is more declarative, less verbose, and has less room for error. The call-site also reads much more like an actual english phrase.

Sorting with `SortDescriptor`

A simple map-like sort doesn't help us in the more complex case where we'd like to use additional properties as fallbacks.

NSSortDescriptor is a great Foundation API for declaratively describing how to sort a collection. If our Person type was instead an @objc class instead of a struct, we could write:

import Foundation

(people as NSArray).sortedArray(using: [
    NSSortDescriptor(keyPath: \Person.name, ascending: true),
    NSSortDescriptor(keyPath: \Person.age, ascending: true)
]) as! [Person]

This is declaratively an improvement, but isn't an acceptable solution. It shackles us to the Objective-C runtime, can't be used with structs, and is unavailable on non-Apple platforms.

If we had a first-class SortDescriptor type, we could write:

people.sorted(by: [
    SortDescriptor({ $0.age }),
    SortDescriptor({ $0.name })])

//instead of
people.sorted(by: { lhs, rhs in
    if lhs.age != rhs.age {
        return lhs.age < rhs.age
    } else {
        return lhs.name < lhs.name
    }
})

This declarative SortDescriptor approach is a huge improvement over the imperative approach that is currently available. It's less verbose, less complicated, and would scale better as you added additional fallbacks.

Detailed Design

These improvement would be accomplished by introducing two new sort variants, and a new SortDescriptor type:

public extension Collection {
    
    public func sorted<T: Comparable>(by value: (Element) throws -> T) rethrows -> [Element]
    
    public func sorted(by sortDescriptors: [SortDescriptor<Element>]) -> [Element]
    
}

// an equivalent set of mutating `sort(by:)` methods would also be made available.

public struct SortDescriptor<Element> {

    public init<Value: Comparable>(_ value: @escaping (Element) -> Value)

    public func orders(_ lhs: Element, before rhs: Element) -> Bool

    /// Returns a `SortDescriptor` based on the same property, but with a reversed sort order.
    public var reversed: SortDescriptor<Element>
    
}

A playground with concrete implementation details is available here.

Alternatives considered

Sorting using `KeyPath`

My original pitch in this area focused on KeyPaths rather than map closures, and yielded call-sites like:

people.sorted(by: \.age)

people.sorted(by: [
    SortDescriptor(\.age),
    SortDescriptor(\.name)])

The consensus was that using KeyPaths in this way is unprecedented in the Standard Library. It would be unusual, for example, if we allowed people.sorted(by: \.age) but not people.map(\.age).

My take is that KeyPath<Element, Value> could become interchangeable with (Element) -> Value, since they have the same semantic meaning. That could be implemented as either by blessing KeyPath with compiler magic, or through some sophisticated new API. That would provide implicit KeyPath support for most of the Standard Library, but is far out of scope for this proposal.

Schwartzian transform

In the original pitch thread, several people suggested using the Schwartzian transform as part of the implementation, or perhaps offering it as an option.

I experimented with an implementation that uses the Schwartzian transform, and found these results using this implementation:

Simple Accessor ($0.name)
=========================
Standard Library Sort:  0.0113s
Closure-Based Sort:     0.0154s
Schwartzian Sort:	    0.0307s
    
Complex Calculation ($0.arbitraryComplexCalculation)
=========================
Standard Library Sort:  1.074s
Closure-Based Sort:     1.0514s
Schwartzian Sort:	    0.0578s

While the Schwartzian sort performance was much better for a complex computed property, it was three-times worse for a simple accessor.

The Swift API Design Guidelines suggests that we should "Document the complexity of any computed property that is not O(1)" and notes that "people often assume that property access involves no significant computation".

I think that, since O(1) computed properties are the expected default, we should focus on that case as the default. We could potentially include a opt-in parameter on the sort(by:) map closure implementation, but I didn't include that in this pitch for a few reasons

That approach wouldn't translate very well to the SortDescriptor implementation.
It isn't clear how to present that option in a reasonable way. The term schwartzian transform is jargon and should be avoided.
It isn't the primary focus of this pitch.

Ponyboy47 · September 18, 2018, 7:50pm

This pitch was specifically about mapping with a KeyPath, but I personally agree that KeyPaths should be interchangeable with Elements since that's really all a KeyPath is doing. This should probably be part of its own pitch/proposal though...

Ponyboy47 · September 18, 2018, 8:03pm

I'm +1 for the proposal.

Swift claims to be programming style agnostic so this is a valid point

I was literally writing a sort(by: ...) closure yesterday and had to look at the documentation to see how I should properly implement it.

The aforementioned closure I was writing had 4 or 5 fallbacks and was definitely unwieldy to look at.

NOTE: Your example under "Sorting with map closures" has a typo. You said $0.age < $0.age and it should probably be $0.age < $1.age

I do have a question about whether or not this would be allowed:

array.sorted(by: { $1.age })

// I'd assume it would mean one of these, which should result in the same sorting
array.sorted(by: { $1.age < $0.age })
array.sorted(by: { $0.age > $1.age })

cal · September 18, 2018, 8:09pm

The new declaration I'm proposing takes an (Element) -> T closure:

public func sorted<Comparable>(by value: (Element) -> T) -> [Element]

so array.sorted(by: { $1.age }) would be a compiler error. If you wanted to sort based on age in descending order, you could use the SortDescriptor variant:

array.sorted(by: [SortDescriptor({ $0.age }).reversed])
// returns [Bob (34), Alice, (34), Eve (30)]

beccadax · September 18, 2018, 10:57pm

I like the direction this is going, but how much performance are we giving up by burying the comparison functions in a closure inside a struct inside an array? My gut says it's probably not too bad, but in the final proposal I'd hope to see benchmarks comparing sorts with SortDescriptor to sorts with equivalent handwritten comparison functions (in optimized builds).

Ben_Cohen · September 19, 2018, 12:30am

I'm fairly skeptical of introducing a new SortDescriptor type for this purpose. The beauty of sort taking a closure is that closures are so flexible. They can cover any comparison – less than, greater than, caseless (localized or not), stringly or numerically, partial comparisons like ignoring prefixes. All by taking a single argument, a function from two elements to bool.

It would be nice to stick with this flexibility, and one way to do that is just to take a sequence of predicates:

extension MutableCollection where Self: RandomAccessCollection {
  // an actual proposal should be generic over types other than just  an Array of comparators
  mutating func sort<Comparisons: Sequence>(by comparators: Comparisons)
  where Comparisons.Element == (Element,Element) -> Bool
}

The usage:

people.sort(by: [
  { $0.age < $1.age },
  { $0.name < $1.name },
])

doesn't seem significantly different to the proposed:

people.sorted(by: [
  SortDescriptor({ $0.age }),
  SortDescriptor({ $0.name })
])

In fact, it looks a bit more lightweight to me. But without the need for a new top-level type, and without giving up as much flexibility.

If you're keen to add more sugar, this can still be done without more types, with the addition of functions that transform a key path into a predicate:

func lesser<T,U: Comparable>(of path: KeyPath<T,U>) -> (T,T)->Bool {
  return { $0[keyPath: path] < $1[keyPath: path] }
}

func greater<T,U: Comparable>(of path: KeyPath<T,U>) -> (T,T)->Bool {
  return { $0[keyPath: path] > $1[keyPath: path] }
}

people.sort(by: [lesser(of: \.age), greater(of: \.name)])

cal · September 19, 2018, 1:21am

That's a great suggestion that I hadn't thought of before! I completely prefer that design over the design I proposed initially. It fits in much better with the existing standard library, and it avoids having to add a new type.

My dream syntax would be:

people.sorted(by: [
    { $0.name },
    { $0.age }
])

but we can't express that in the type system without variadic generics. The main reason I've been using a SortDescriptor type is to type-erase the map closures.

I'm still very keen on the map-closure design, though. For symmetry, that could potentially give us four designs:

// (1) The current standard library, sort(by: (Element, Element) -> Bool)
people.sort(by: { $0.age < $1.age })

// (2) A map-closure design, sort<T: Comparable>(by: (Element) -> T)
people.sort(by: { $0.age })

// (3) An improved "Sort Descriptor" design: sort(by: [(Element, Element) -> Bool])
people.sort(by: [
    { $0.age < $1.age },
    { $0.name < $1.name }
])

// (4) An additional design for symmetry, that would require Variadic Generics.
//     Not currently possible.
people.sort(by: [
    { $0.age },
    { $0.name }
])

So that being said, there are a few open questions:

Is it acceptable to have (2) and (3) but not (4)? That would feel a bit asymetric to me.
If this was packaged together as a comprehensive "sort ergonomics" proposal, would it make sense to also conditionally propose (4) (the design that requires variadic generics) on the precondition that it would be implemented if/when it became possible?
Is it even desirable to have four overloads of sort(by:)? If not, which one do we cut? I think (3) is filling a bigger gap / solving a more important problem, while (2) leans closer towards "sugar". That's probably a strong indicator, but I would also be sad to miss out on (2) if we cut it out.

Of all of those options, I think the most sensible choice may potentially be to slim the proposal to just include sort(by: [(Element, Element) -> Bool]).

AlexanderM · September 19, 2018, 1:30am

This was a problem I tried to solve previously. I was hoping to be able to take an array of comparator closures ([(T) -> Comparable]), and sort by each criteria, breaking ties with the next criteria.

Because Comparable is a protocol with a Self constraint, you cannot use it as a nominal type (I think that's the correct term, but I might be mistaken), so you cannot have an array of (T) -> Comparable. Thus, you need a wrapper type like SortDescriptor, which acts like a type eraser to erase the concrete types of the return values (which can be multiple. E.g. first sort by a String name, then an Int age). Conventionally, such a type eraser would be called AnyComparable.

I can't wait for generalized existentials. This is 1 of a million similar problems that we would be able to so elegantly solve, without type erasers.

AlexanderM · September 19, 2018, 1:35am

I envision only 2 necessary overloads:

The current sort(by: (Element, Element) -> Bool)
A future possibility: sort(by: ((Element) -> Comparable)...). This would rely on generalized existential, which would allow (E) -> T1, (E) -> T2, etc. to all be homogeneously generalized to (E) -> Comparable.

The "single sort descriptor" design emerges as a consequence of using a var args call with only 1 arg. For performance improvements, an overload might need to be added that explicitly only take 1 args, to spare the array allocation. I suspect the compiler would be able to statically optimize this, so it might not even be necessary.

jawbroken · September 19, 2018, 3:40am

cal:

To specify further ordering (for example, sorting by age and then by name), we must write a more complex closure:
people.sorted(by: { lhs, rhs in
    if lhs.age != rhs.age {
        return lhs.age < rhs.age
    } else {
        return lhs.name < lhs.name
    }
})

I think in an ideal world this would be represented as

people.sorted(by: { ($0.age, $0.name) })

rather than the proposed collection-based versions, but there are implementation issues here because tuples don't currently conform to Comparable. That limitation is probably easier to lift than implementing variadic generics, though.

The original pitch combined two things: sorting by mapping to something Comparable, and sorting on multiple criteria. It seems like @cal is leaning towards shifting the focus solely to the second thing, but I still think the first one would be nice to address. I think it makes for code that reads naturally, particularly in tutorials for people learning the language, and it would be a powerful tool if tuples were made Comparable some day.

beccadax · September 19, 2018, 9:36am

jawbroken:

I think in an ideal world this would be represented as
people.sorted(by: { ($0.age, $0.name) })
rather than the proposed collection-based versions, but there are implementation issues here because tuples don't currently conform to Comparable . That limitation is probably easier to lift than implementing variadic generics, though.

Wow, that's super-interesting.

I think I've heard in the past that tuple conformances are nontrivial because it's difficult to handle differing arities, and the problem is very similar to variadic generics (unless we generate specialized versions for each arity we might encounter). Having said that, this solution definitely has the right feel—if all the features for this are in place by Swift 20, we'll wish this is how sorting worked.

I really think the best way to describe a sort is, in most cases, to simply list the properties you want to sort on, not to provide a comparator function. The comparator is more flexible, but it's also more repetitive and error-prone. We should not be afraid to include both (Element, Element) -> Bool and (Element) -> T variants of sort(by:) and sorted(by:); what's exciting about Comparable tuples here is that those two might be enough.

I wonder if we might want to support a syntax like \(.age, .name) for a key path which extracts age and name and forms a tuple containing them. If so, this would harmonize well with the "treat keypath literals as potentially forming a closure" design.

If this is where we want to be eventually, we honestly might as well add the sort(by sortKey: (Element) -> SortKey) and equivalent sorted(by:) methods now; these methods are trivial and will be useful even without any further enhancements. Then we can add keypath-literal closures soon; that should be relatively small, but might be large enough to put off until the next release. Comparable tuples are definitely large enough that they need to be put off.

The tuple-keypath feature is trickier, because keypath buffers are ABI. But we're not actually talking about using keypaths in this specific use case—we're talking about using keypath literals to generate a closure. That wouldn't be a problem. If we were willing to have actual, runtime-looked-up tuple keypaths only work on new standard libraries, it's otherwise an additive change.

cal · September 19, 2018, 1:51pm

One potential issue with this design is that it's no longer lazy. One benefit you get from a collection is that you don't have to evaluate / compute the fallback sort keys unless it's actually necessary. I do very much like the design, though, because its rather powerful while only requiring one new overload.

I agree with @beccadax:

Ben_Cohen · September 19, 2018, 3:19pm

They are indeed trivial and so are ruled out by the moratorium on trivially-composable sugar.

It really would be a negative user experience to have many different ways of sorting, just to sugar the experience slightly in common cases. Look at NSArray for what can happen. There are so many choices! Using comparators, using descriptors, using selectors, with hints, with options, with comparison results instead of booleans. It shouldn't take a blog post to explain how to sort stuff.

By contrast the Swift story is still clean and we need to work hard to keep it that way. There is sort and sorted for in- and out-of-place, and you can either go with the default Comparable ordering or supply a predicate. That's it.

Now, I do think there is a need to help developers who want to sort first by one property, then another, then another. So a second sorting method that allows for that, instead of having to write an if/else closure that's tricky to figure out, is warranted under the criteria set out for additions to the standard library. But if possible, that should follow the existing patterns i.e. be closure-based. That's why I think a array of predicates makes sense.

cal · September 19, 2018, 3:30pm

This is what I was getting at up-thread — the sort(by: [(Element, Element) -> Bool]) overload offers a solution to a real issue, rather than sugar an existing call-site.

The improvement becomes even more pronounced as you add more cases / fallbacks:

people.sorted(by: [
    { $0.name < $1.name },
    { $0.age < $1.age },
    { $0.height < $1.height },
    { $0.salary < $1.salary }
])
    
people.sorted(by: { lhs, rhs in
    if lhs.name != rhs.name {
        return lhs.name < rhs.name
    } else if lhs.age != rhs.age {
        return lhs.age < rhs.age
    } else if lhs.height != rhs.height {
        return lhs.height < rhs.height
    } else if lhs.salary != rhs.salary {
        return lhs.salary < rhs.salary
    } else {
        return false
    }
})

I think this has the strongest case to make a good proposal, since it (1) fills in important missing functionality, (2) isn't just trivial sugar, and (3) mirrors the existing standard library very closely.

jawbroken · September 20, 2018, 12:27am

Yeah, doing this properly as a general language feature is similarly complicated, but just making tuples automatically conform to a few useful standard library protocols as a stopgap (e.g. Equatable, Comparable, Hashable) would be much easier and has been previously discussed.

Sure, but this goes back to the Schwartzian transform question where you concluded that this overhead was negligible for most cases because computed properties are generally expected to be O(1). Inlining might make this lazy in the usual case, anyway.

It's still an unfortunately long way from what I think it should be:

people.sorted(by: { ($0.name, $0.age, $0.height, $0.salary) })

I don't see the Element -> Comparable version as leading to an explosion in the ways of sorting like @Ben_Cohen does, and I don't see it as a negative user experience to provide a method that handles the most common use cases for providing a sorting predicate, while also avoiding repetition and not requiring them to think about (or look up) whether the comparison in the predicate should be <, <=, >, or >=.

mpangburn · September 20, 2018, 3:37am

In describing a sort by a property (a la people.sorted(by: { $0.name })), the layer of convenience performs one task: the transformation of a property access of type (Element) -> SortKey to a comparison function of type (Element, Element) -> Bool.

Here's another way of looking at the same transformation: given a function of type (SortKey, SortKey) -> Bool (e.g. <) and a function of type (Element) -> SortKey (e.g. { $0.name }), we can use the latter to lift the former into a function of type (Element, Element) -> Bool (in this case, { $0.name < $1.name }).

We can write a function, which I'll call their for readability purposes made clear in a moment, which performs this lift:

func their<Input, Intermediate, Output>(
    _ extractIntermediate: @escaping (Input) -> Intermediate,
    _ combine: @escaping (Intermediate, Intermediate) -> Output
) -> (Input, Input) -> Output {
    return { input1, input2 in
        combine(extractIntermediate(input1), extractIntermediate(input2))
    }
}

Then, using only the existing sorting API, we can write the following:

people.sorted(by: their({ $0.name }, <)) // equivalent to `{ $0.name < $1.name }`
people.sorted(by: their({ $0.name }, >)) // equivalent to `{ $0.name > $1.name }`

You can picture how much prettier the above syntax would become with the appropriate subtyping relationship between KeyPath<T, U> and (T) -> U:

people.sorted(by: their(\.name, <))

There are additional gains to writing the function in this way, however. Notice that their isn't restricted to Comparable SortKey types, which means we can make use of tuple comparisons even without tuple protocol conformances:

people.sorted(by: their({ ($0.name, $0.age, $0.height, $0.salary) }, <))

Tuple comparison only covers the case when all desired properties are sorted by the same comparison function—Ben's proposed API utilizing a Sequence of comparison functions fills the gap here, and it works with their, too:

people.sorted(by: [
    their({ $0.name }, <),
    their({ $0.age }), >)
])

Finally, I'll note that this function generalizes beyond sorting. Here a couple additional examples:

let oldestPerson = people.max(by: their({ $0.age }, <))
// `their` isn't tied to functions returning `Bool`, either:
let sequentialRunTimeDeltas = zip(firstRuns, secondRuns).map(their({ $0.duration }, -))

Whether their is appropriate for this particular proposal, I'm not sure—but I would be remiss not to mention it for its utility in creating comparison functions without significantly expanding API surface.

A quick note on prior art

Haskell's Data.Function package defines on, which you can see here. on is basically an infix version of their which better lends to readability given the names of functions with which it's commonly used in Haskell.
I'd imagine other functional languages define a similar function, though I haven't done much digging here.

jawbroken · September 20, 2018, 5:27am

That's all true. By the same token, though, we don't need the sequence version of the sort functions because we could just pass an sequence of sort predicates to a function that combines them into a single sort predicate.

Edit: e.g. roughly like

func combineComparisons<S: Sequence, T>(_ comparisons: S) -> (T, T) -> Bool where S.Element == (T, T) -> Bool {
  return { t1, t2 in
    for c in comparisons {
      if c(t1, t2) { return true }
      if c(t2, t1) { return false }
    }
    return false
  }
}

That's probably not my position though, because I think it's nice to provide a few conveniences that don't require you to know about a bunch of free functions.

cal · September 21, 2018, 11:45am

mpangburn:

people.sorted(by: [
    their({ $0.name }, <),
    their({ $0.age }), >)
])
Finally, I'll note that this function generalizes beyond sorting. Here's [an] example:
let oldestPerson = people.max(by: their({ $0.age }, <))

You do have a great point here, with respect to generalizing to use-cases other than just sort and sorted. It's nice that max and min get the benefits of a design like this effectively for-free.

I have some thoughts, though:

I would want to see an overload like func their<Input, SortKey: Comparable>(_ sortKey: SortKey) -> (Input, Input) -> Bool that provides a sensible default for Comparable sort keys.
There's precedent for this approach in Haskell, but that's a very composition-heavy language. Are there any examples of a composition-approach like this in the Standard Library?
I'm not a huge fan of free-floating global functions — they're tough to find if you don't already know about them.
their({ $0.age }) isn't that much shorter than { $0.age < $1.age }

jrose · September 21, 2018, 4:33pm

One problem with the $0.age < $1.age syntax is that it repeats the "age", which leads to copy-and-paste bugs. It's also kind of a pain for longer property names.

To take a wild guess, I think a series of key paths / key functions would handle 90% of the custom sorts that are out there today, i.e. only 1 in every 10 custom sorts would need to fall back to the general comparator closure form. Having the sort-by-key form, even a single key, is something I always appreciated in Python and Ruby. We might not have the language tools to do it yet today, though, and I wouldn't want to jam something in to work around that.

mpangburn · September 21, 2018, 5:12pm

I would want to see an overload like func their<Input, SortKey: Comparable>(_ sortKey: SortKey) -> (Input, Input) -> Bool that provides a sensible default for Comparable sort keys.

The challenge with providing an overload with a default second argument is the conflation of their with comparison functions specifically. Lifting a function of one argument onto a function of two arguments is a general operation without ties to concrete types. You can imagine something like the following for a single-argument their function:

sequenceOfUserTuples.map(their { $0.age })

I would expect an array of tuples containing the age corresponding to each user in each tuple of the previous sequence rather than an array of Bool values. This intuition would (to me) suggest a default argument of the identity function for the second parameter—and at that point we've just defined map on the homogeneous two-tuple.

I think a better spelling for a single-argument version of their specialized for comparison functions would be the lesser(of:) and greater(of:) functions that Ben described above.

There's precedent for this approach in Haskell, but that's a very composition-heavy language. Are there any examples of a composition-approach like this in the Standard Library?

I'm not a huge fan of free-floating global functions — they're tough to find if you don't already know about them.

I think their would be a great fit for a non-standard library geared toward functional utilities, but the non-standard library movement hasn't gained much traction. You're right that it doesn't align well with the Standard Library, which (at least thus far) hasn't supported abstractions at this level, which really only make sense as free functions—but I felt it was worth a mention in this thread for its applicability in this domain.

their({ $0.age }) isn't that much shorter than { $0.age < $1.age }

Jordan just touched on this, but the goal here isn't to reduce the number of characters; it's to reduce repetition. The arguments here aren't specific to their and can be made analagously to those proposed with Bool.toggle() with respect to the value of removing repetition on opposite sides of an operator.