Sort `Collection` using `KeyPath`

cal · July 18, 2018, 3:31am

Hey everybody — I'd like to pitch and socialize a new method for sorting Collections using a KeyPath.

Today, to sort a Collection of non-Comparable elements, we write a closure that returns whether or not two elements areInIncreasingOrder:

[[1, 2, 3], [1], [1, 2]].sorted(by: { $0.count < $1.count })
// returns [[1], [1, 2], [1, 2, 3]]

This has a few features that you could consider "pain points":

The call site doesn't form a grammatical english phrase. [1]
It's relatively easy to mix up < and >, and it isn't immediately obvious from the call site which operator results in which ordering.
Repeating the property for both $0 and $1 is a touch verbose.

Key Paths could come in handy here. If we had a sort variant that accepted a KeyPath, we could instead write:

[[1, 2, 3], [1], [1, 2]].sorted(by: \.count)
// returns [[1], [1, 2], [1, 2, 3]]

This is a big improvement in terms of expressivity, clarity, and brevity!

A less-arbitrary example could look something like this:

struct OS: Equatable, Hashable {
    let name: String
    let yearReleased: Int
    
    static let supported = Set([
        OS(name: "FreeBSD", yearReleased: 1993),
        OS(name: "iOS", yearReleased: 2007),
        OS(name: "macOS", yearReleased: 2001), ...])
}


OS.supported.sorted(by: \.name)
OS.supported.sorted(by: \.yearReleased)

// instead of
OS.supported.sorted(by: { $0.name < $1.name })
OS.supported.sorted(by: { $0.yearReleased < $1.yearReleased })

Implementation

Here's one potential implementation:

extension Sequence {
    
    public func sorted<Value>(
        by keyPath: KeyPath<Self.Element, Value>,
        using valuesAreInIncreasingOrder: (Value, Value) throws -> Bool)
        rethrows -> [Self.Element]
    {
        return try self.sorted(by: {
            try valuesAreInIncreasingOrder($0[keyPath: keyPath], $1[keyPath: keyPath])
        })
    }
    
    public func sorted<Value: Comparable>(
        by keyPath: KeyPath<Self.Element, Value>)
        -> [Self.Element]
    {
        return self.sorted(by: keyPath, using: <)
    }
    
}

extension MutableCollection where Self: RandomAccessCollection {
    // an equivalent set of methods, but `public mutating func sort(by:...) { ... }`
}

I'd love to hear other people's thoughts on this! I'm aware there's a high bar for accepting new methods or method-variants into the Standard Library, but this seems like a great application of the (relatively new) KeyPath API.

Additional Directions

We can also use Key Paths to drive a new SortDescriptor API:

OS.supported.sorted(by: [
    SortDescriptor(\.yearReleased),
    SortDescriptor(\.name, using: String.caseInsensitiveCompare)])

// instead of
OS.supported.sorted { lhs, rhs in
    if lhs.yearReleased == rhs.yearReleased {
        return lhs.name.caseInsensitiveCompare(rhs.name) == .orderedAscending
    } else {
        return lhs.yearReleased < rhs.yearReleased
    }
}

I described one potential implementation for an API like that a bit down-thread.

Alternatives

If we want symmetry with methods like map and filter, we could potentially use (Self.Element) -> Comparable instead so that it would read like OS.supported.sorted(by: { $0.yearReleased }) and SortDescriptor({ $0.name }, using: String.caseInsensitiveCompare).

jawbroken · July 18, 2018, 3:56am

There have been a few requests for a way to convert key paths to functions (e.g. your \.name to a (OS) -> String function), either implicitly or using a prefix operator. Key paths could then be used naturally with higher-order functions like map, filter, etc without having to create a lot of duplicate APIs. So I would suggest that the fundamental form of this proposal would be the (Self.Element) -> Comparable version that you mention in alternatives, which also has the advantage of being more flexible/general. I think that version would be a great addition to the standard library.

Jon_Hull · July 18, 2018, 4:52pm

I really like this!

As a nitpick, I think the implementation should be:

public func sorted<Value: Comparable>(
    by keyPath: KeyPath<Self.Element, Value>,
    order: Order = .increasing) -> [Self.Element]
{
    switch order {
    case .increasing: 
        return self.sorted(by: { $0[keyPath: keyPath]  <  $1[keyPath: keyPath] })
    case .decreasing: 
        return self.sorted(by: { $0[keyPath: keyPath]  >  $1[keyPath: keyPath] })
    }
}

That way we are only performing the switch once, as opposed to once-per-comparison.

Also, while we are here, it might be nice to have a version which takes a series of key paths to do a more complex sort (where the next keypath in line is used when the current one evaluates equal).

cal · July 18, 2018, 5:15pm

Great point on the implementation! I'll update it.

Regarding multiple keypaths —This is something I looked in to:

func sorted<Value: Comparable>(by keyPaths: [KeyPath<Element, Value>]) -> [Element] { ... }

//or even
struct SortDescriptor<Element, Value: Comparable> { let keyPath: KeyPath<Element, Value>; let order: Order; }

func sorted<Value>(by descriptors: [SortDescriptor<Element, Value>]) -> [Element] { ... }

The biggest issue that is that all of the key paths would have to result in the same Value type, so variadic generics would be a real prerequisite to implementing it correctly.

Ben_Cohen · July 18, 2018, 5:19pm

I think there is a lot of room for ergonomic improvements to sorting, so it's great to see this!

I would suggest, though, taking a step back and thinking about all the different aspects of this as a a whole. Individual targeted fixes are nice, but risk API sprawl when they handle some cases not others, leading to other targeted fixes.

Some topics to consider:

key paths vs closures
- note, localized/caseless sorting is also painful, but at least doable, with closure-based arguments
- including weighing the pros/cons of parametrizing with ascending/descending choice (not a technique that sits well with the existing idioms used in the std lib)
sorting by a sequence of comparators, lexicographically
partial sorting/partitioning

A great place to start would be a comparison with the existing APIs to be found in other places i.e. Java, Ruby, Cocoa (which has all sorts of different ways to sort, sortUsingDescriptors being particularly of interest).

The multi-field sort problem is also linked to the question of switching Swift's sort algorithm to be stable by default, though that's also a big topic so doesn't have to be tied to this, just something to consider.

AaronKreipe · July 18, 2018, 8:11pm

This is a great idea! I have been using something very similar to this, except with a closure that takes a comparator instead of just a value.

extension Collection{
    func sorted<Value: Comparable>(
        by keyPath: KeyPath<Element, Value>,
        _ comparator: (_ lhs: Value, _ rhs: Value)->Bool)->[Element]{
        return self.sorted{
            comparator($0[keyPath: keyPath], $1[keyPath: keyPath])
        }
    }
}

https://pbs.twimg.com/media/DWDX0hUV4AAZc2W?format=jpg&name=small

Nobody1707 · July 18, 2018, 9:28pm

I was just going to suggest this, but with comparator defaulted to <. Which requires parentheses, in case you were also having trouble figuring out how to reference < as a closure.

extension Collection {
    func sorted<Value: Comparable>(
        by keyPath: KeyPath<Element, Value>,
        _ comparator: (Value, Value) -> Bool = (<)) -> [Element] {
        return sorted {
            comparator($0[keyPath: keyPath], $1[keyPath: keyPath])
        }
    }
}

cal · July 18, 2018, 9:38pm

_ comparator: (Value, Value) -> Bool = (<)

I think that approach is a lot better than using an enum. It's more consistent with idioms that currently exist in the standard library, it's more powerful, and it doesn't require declaring a new type. I'll update the pitch!

Nobody1707 · July 18, 2018, 9:50pm

Making it rethrowing caused these errors in particular that confused me:

./KeyPathSort.swift:35:7: error: call can throw but is not marked with 'try'
print(things.sorted(by: \.level))
      ^~~~~~~~~~~~~~~~~~~~~~~~~~
./KeyPathSort.swift:35:7: note: call is to 'rethrows' function, but a defaulted argument function can throw
print(things.sorted(by: \.level))

< doesn't throw, so why does the compiler think the default implementation throws?

Tino · July 18, 2018, 9:52pm

You don't need to restrict Value to be Comparable — that shortcut can go into an extension that builds on top of the general case.

cal · July 18, 2018, 10:00pm

You're right — this may make more sense:

extension Sequence {
    
    public func sorted<Value>(
        by keyPath: KeyPath<Self.Element, Value>,
        using valuesAreInIncreasingOrder: (Value, Value) throws -> Bool)
        rethrows -> [Self.Element]
    {
        return try self.sorted(by: {
            try valuesAreInIncreasingOrder($0[keyPath: keyPath], $1[keyPath: keyPath])
        })
    }
    
    public func sorted<Value: Comparable>(
        by keyPath: KeyPath<Self.Element, Value>)
        -> [Self.Element]
    {
        return self.sorted(by: keyPath, using: <)
    }
    
}

Nobody1707 · July 18, 2018, 10:05pm

Yeah, that makes it work properly while still being rethrowing.

cal · July 18, 2018, 10:22pm

Within the context of this pitch, my ideal call-site syntax for doing advanced String comparisons would look like this:

OS.supported.sorted(by: \.name, using: .caseInsensitiveCompare)
OS.supported.sorted(by: \.name, using: .localizedCaseInsensitveCompare)

The trouble is those methods are currently (String) -> (String) -> ComparisonResult, so it's not immediately compatible with (String, String) -> Bool.

You'd currently have to write something like this:

OS.supported.sorted(by: \.name, using: { $0.caseInsensitiveCompare($1) == .orderedAscending })

[SE-0042] Flattening the function type of unapplied method references was accepted in March 2016 (!!), but it looks like we still don't have an actual implementation that's made it into the language. If that made it across the finish line, it would partially solve the problem.

It's also unclear how ComparisonResult would fit in this picture. Is .orderedSame truthy or falsey? That seems to play in to the bigger unanswered question about stable sorting.

Jon_Hull · July 19, 2018, 2:12am

cal:

Within the context of this pitch, my ideal call-site syntax for doing advanced String comparisons would look like this:
OS.supported.sorted(by: \.name, using: .caseInsensitiveCompare)
OS.supported.sorted(by: \.name, using: .localizedCaseInsensitveCompare)
The trouble is those methods are currently (String) -> (String) -> ComparisonResult , so it's not immediately compatible with (String, String) -> Bool .

You'd currently have to write something like this:
OS.supported.sorted(by: \.name, using: { $0.caseInsensitiveCompare($1) == .orderedAscending })

Hmm... maybe the best way to spell this is to put a localization normalization method on string:

OS.supported.sorted(by: \name.lowercased)
OS.supported.sorted(by: \name.normalized)

cal · July 19, 2018, 2:51am

That doesn't work out very well using an array of Key Paths, but it does work if we use a type-erased SortDescriptor struct!

We'd get a call site that reads very nicely:

OS.supported.sorted(by: [
    SortDescriptor(\.yearReleased),
    SortDescriptor(\.name, using: String.caseInsensitiveCompare)])

compared to how you have have to express that today:

OS.supported.sorted { lhs, rhs in
    if (lhs.yearReleased == rhs.yearReleased) {
        return lhs.name.caseInsensitiveCompare(rhs.name) == .orderedAscending
    } else {
        return lhs.yearReleased < rhs.yearReleased
    }
}

It uses an implementation that looks like this.

mpangburn · July 19, 2018, 3:04am

I agree that Swift's sorting API is currently underdeveloped, and sorting on a particular key would be a notable improvement.

Java's Comparator API has a few features that translate well to Swift, most notably the idea of sorting potentially null values. In Swift, it would be nice to sort a Sequence on an Optional key while allowing preference as to whether nil values should come before or after non-nil values.

I have a sketch of a comparator (sort descriptor) API in Swift here, though it's a little messy since it involves workarounds for lack of parameterized extensions.

beccadax · July 19, 2018, 6:51am

Something like this, perhaps?

So far, the biggest irritation I've noticed is that a design as generic as this one struggles to provide context for a literal syntax, key path root type inference, leading-dot syntax, etc. That makes a lot of things we'd like to do much clumsier than I'd like. Maybe we could resolve those issues with a little more work, though.

cal · July 19, 2018, 7:06am

That implementation is really compelling!! I like the notion that a KeyPath can *be* a SortDescriptor in its own right.

I do think it's really important for a SortDescriptor's compare method to return a ComparisonResult rather than a Bool, though. If were were to add a "sort by descriptors" method, the .orderedSame case is key for knowing when to disambiguate with the next descriptor.

beccadax · July 19, 2018, 9:31pm

A little more work on that, exposing methods more carefully and adding - and | operators to allow negation and fallbacks.

I think it should return whatever Comparable returns; I don't really care whether that's a boolean or an enum, I just care that it's consistent. Note that ComparisonResult is currently part of Foundation; we'd have to bring it down into the standard library to use it in this.

Nevin · July 19, 2018, 10:12pm

There was discussion last year about introducing a 4-case enum for comparison results: greater than, less than, equal to, and incomparable. The last one being important for IEEE floats.