Sort `Collection` using `KeyPath`

Hey everybody β€” I'd like to pitch and socialize a new method for sorting Collections using a KeyPath.

Today, to sort a Collection of non-Comparable elements, we write a closure that returns whether or not two elements areInIncreasingOrder:

[[1, 2, 3], [1], [1, 2]].sorted(by: { $0.count < $1.count })
// returns [[1], [1, 2], [1, 2, 3]]

This has a few features that you could consider "pain points":

  1. The call site doesn't form a grammatical english phrase. [1]
  2. It's relatively easy to mix up < and >, and it isn't immediately obvious from the call site which operator results in which ordering.
  3. Repeating the property for both $0 and $1 is a touch verbose.

Key Paths could come in handy here. If we had a sort variant that accepted a KeyPath, we could instead write:

[[1, 2, 3], [1], [1, 2]].sorted(by: \.count)
// returns [[1], [1, 2], [1, 2, 3]]

This is a big improvement in terms of expressivity, clarity, and brevity!

A less-arbitrary example could look something like this:

struct OS: Equatable, Hashable {
    let name: String
    let yearReleased: Int
    
    static let supported = Set([
        OS(name: "FreeBSD", yearReleased: 1993),
        OS(name: "iOS", yearReleased: 2007),
        OS(name: "macOS", yearReleased: 2001), ...])
}


OS.supported.sorted(by: \.name)
OS.supported.sorted(by: \.yearReleased)

// instead of
OS.supported.sorted(by: { $0.name < $1.name })
OS.supported.sorted(by: { $0.yearReleased < $1.yearReleased })

Implementation

Here's one potential implementation:

extension Sequence {
    
    public func sorted<Value>(
        by keyPath: KeyPath<Self.Element, Value>,
        using valuesAreInIncreasingOrder: (Value, Value) throws -> Bool)
        rethrows -> [Self.Element]
    {
        return try self.sorted(by: {
            try valuesAreInIncreasingOrder($0[keyPath: keyPath], $1[keyPath: keyPath])
        })
    }
    
    public func sorted<Value: Comparable>(
        by keyPath: KeyPath<Self.Element, Value>)
        -> [Self.Element]
    {
        return self.sorted(by: keyPath, using: <)
    }
    
}

extension MutableCollection where Self: RandomAccessCollection {
    // an equivalent set of methods, but `public mutating func sort(by:...) { ... }`
}

I'd love to hear other people's thoughts on this! I'm aware there's a high bar for accepting new methods or method-variants into the Standard Library, but this seems like a great application of the (relatively new) KeyPath API.

Additional Directions

We can also use Key Paths to drive a new SortDescriptor API:

OS.supported.sorted(by: [
    SortDescriptor(\.yearReleased),
    SortDescriptor(\.name, using: String.caseInsensitiveCompare)])

// instead of
OS.supported.sorted { lhs, rhs in
    if lhs.yearReleased == rhs.yearReleased {
        return lhs.name.caseInsensitiveCompare(rhs.name) == .orderedAscending
    } else {
        return lhs.yearReleased < rhs.yearReleased
    }
}

I described one potential implementation for an API like that a bit down-thread.

Alternatives

If we want symmetry with methods like map and filter, we could potentially use (Self.Element) -> Comparable instead so that it would read like OS.supported.sorted(by: { $0.yearReleased }) and SortDescriptor({ $0.name }, using: String.caseInsensitiveCompare).

18 Likes

There have been a few requests for a way to convert key paths to functions (e.g. your \.name to a (OS) -> String function), either implicitly or using a prefix operator. Key paths could then be used naturally with higher-order functions like map, filter, etc without having to create a lot of duplicate APIs. So I would suggest that the fundamental form of this proposal would be the (Self.Element) -> Comparable version that you mention in alternatives, which also has the advantage of being more flexible/general. I think that version would be a great addition to the standard library.

13 Likes

I really like this!

As a nitpick, I think the implementation should be:

public func sorted<Value: Comparable>(
    by keyPath: KeyPath<Self.Element, Value>,
    order: Order = .increasing) -> [Self.Element]
{
    switch order {
    case .increasing: 
        return self.sorted(by: { $0[keyPath: keyPath]  <  $1[keyPath: keyPath] })
    case .decreasing: 
        return self.sorted(by: { $0[keyPath: keyPath]  >  $1[keyPath: keyPath] })
    }
}

That way we are only performing the switch once, as opposed to once-per-comparison.

Also, while we are here, it might be nice to have a version which takes a series of key paths to do a more complex sort (where the next keypath in line is used when the current one evaluates equal).

4 Likes

Great point on the implementation! I'll update it.

Regarding multiple keypaths β€”This is something I looked in to:

func sorted<Value: Comparable>(by keyPaths: [KeyPath<Element, Value>]) -> [Element] { ... }

//or even
struct SortDescriptor<Element, Value: Comparable> { let keyPath: KeyPath<Element, Value>; let order: Order; }

func sorted<Value>(by descriptors: [SortDescriptor<Element, Value>]) -> [Element] { ... }

The biggest issue that is that all of the key paths would have to result in the same Value type, so variadic generics would be a real prerequisite to implementing it correctly.

I think there is a lot of room for ergonomic improvements to sorting, so it's great to see this!

I would suggest, though, taking a step back and thinking about all the different aspects of this as a a whole. Individual targeted fixes are nice, but risk API sprawl when they handle some cases not others, leading to other targeted fixes.

Some topics to consider:

  • key paths vs closures
    • note, localized/caseless sorting is also painful, but at least doable, with closure-based arguments
    • including weighing the pros/cons of parametrizing with ascending/descending choice (not a technique that sits well with the existing idioms used in the std lib)
  • sorting by a sequence of comparators, lexicographically
  • partial sorting/partitioning

A great place to start would be a comparison with the existing APIs to be found in other places i.e. Java, Ruby, Cocoa (which has all sorts of different ways to sort, sortUsingDescriptors being particularly of interest).

The multi-field sort problem is also linked to the question of switching Swift's sort algorithm to be stable by default, though that's also a big topic so doesn't have to be tied to this, just something to consider.

3 Likes

This is a great idea! I have been using something very similar to this, except with a closure that takes a comparator instead of just a value.

extension Collection{
    func sorted<Value: Comparable>(
        by keyPath: KeyPath<Element, Value>,
        _ comparator: (_ lhs: Value, _ rhs: Value)->Bool)->[Element]{
        return self.sorted{
            comparator($0[keyPath: keyPath], $1[keyPath: keyPath])
        }
    }
}

https://pbs.twimg.com/media/DWDX0hUV4AAZc2W?format=jpg&name=small

5 Likes

I was just going to suggest this, but with comparator defaulted to <. Which requires parentheses, in case you were also having trouble figuring out how to reference < as a closure.

extension Collection {
    func sorted<Value: Comparable>(
        by keyPath: KeyPath<Element, Value>,
        _ comparator: (Value, Value) -> Bool = (<)) -> [Element] {
        return sorted {
            comparator($0[keyPath: keyPath], $1[keyPath: keyPath])
        }
    }
}
7 Likes

_ comparator: (Value, Value) -> Bool = (<)

I think that approach is a lot better than using an enum. It's more consistent with idioms that currently exist in the standard library, it's more powerful, and it doesn't require declaring a new type. I'll update the pitch!

1 Like

Making it rethrowing caused these errors in particular that confused me:

./KeyPathSort.swift:35:7: error: call can throw but is not marked with 'try'
print(things.sorted(by: \.level))
      ^~~~~~~~~~~~~~~~~~~~~~~~~~
./KeyPathSort.swift:35:7: note: call is to 'rethrows' function, but a defaulted argument function can throw
print(things.sorted(by: \.level))

< doesn't throw, so why does the compiler think the default implementation throws?

You don't need to restrict Value to be Comparable β€” that shortcut can go into an extension that builds on top of the general case.

You're right β€” this may make more sense:

extension Sequence {
    
    public func sorted<Value>(
        by keyPath: KeyPath<Self.Element, Value>,
        using valuesAreInIncreasingOrder: (Value, Value) throws -> Bool)
        rethrows -> [Self.Element]
    {
        return try self.sorted(by: {
            try valuesAreInIncreasingOrder($0[keyPath: keyPath], $1[keyPath: keyPath])
        })
    }
    
    public func sorted<Value: Comparable>(
        by keyPath: KeyPath<Self.Element, Value>)
        -> [Self.Element]
    {
        return self.sorted(by: keyPath, using: <)
    }
    
}
4 Likes

Yeah, that makes it work properly while still being rethrowing.

Within the context of this pitch, my ideal call-site syntax for doing advanced String comparisons would look like this:

OS.supported.sorted(by: \.name, using: .caseInsensitiveCompare)
OS.supported.sorted(by: \.name, using: .localizedCaseInsensitveCompare)

The trouble is those methods are currently (String) -> (String) -> ComparisonResult, so it's not immediately compatible with (String, String) -> Bool.

You'd currently have to write something like this:

OS.supported.sorted(by: \.name, using: { $0.caseInsensitiveCompare($1) == .orderedAscending })

[SE-0042] Flattening the function type of unapplied method references was accepted in March 2016 (!!), but it looks like we still don't have an actual implementation that's made it into the language. If that made it across the finish line, it would partially solve the problem.

It's also unclear how ComparisonResult would fit in this picture. Is .orderedSame truthy or falsey? That seems to play in to the bigger unanswered question about stable sorting.

Hmm... maybe the best way to spell this is to put a localization normalization method on string:

OS.supported.sorted(by: \name.lowercased)
OS.supported.sorted(by: \name.normalized)

That doesn't work out very well using an array of Key Paths, but it does work if we use a type-erased SortDescriptor struct!

We'd get a call site that reads very nicely:

OS.supported.sorted(by: [
    SortDescriptor(\.yearReleased),
    SortDescriptor(\.name, using: String.caseInsensitiveCompare)])

compared to how you have have to express that today:

OS.supported.sorted { lhs, rhs in
    if (lhs.yearReleased == rhs.yearReleased) {
        return lhs.name.caseInsensitiveCompare(rhs.name) == .orderedAscending
    } else {
        return lhs.yearReleased < rhs.yearReleased
    }
}

It uses an implementation that looks like this.

1 Like

I agree that Swift's sorting API is currently underdeveloped, and sorting on a particular key would be a notable improvement.

Java's Comparator API has a few features that translate well to Swift, most notably the idea of sorting potentially null values. In Swift, it would be nice to sort a Sequence on an Optional key while allowing preference as to whether nil values should come before or after non-nil values.

I have a sketch of a comparator (sort descriptor) API in Swift here, though it's a little messy since it involves workarounds for lack of parameterized extensions.

Something like this, perhaps?

So far, the biggest irritation I've noticed is that a design as generic as this one struggles to provide context for a literal syntax, key path root type inference, leading-dot syntax, etc. That makes a lot of things we'd like to do much clumsier than I'd like. Maybe we could resolve those issues with a little more work, though.

2 Likes

That implementation is really compelling!! I like the notion that a KeyPath can *be* a SortDescriptor in its own right.

I do think it's really important for a SortDescriptor's compare method to return a ComparisonResult rather than a Bool, though. If were were to add a "sort by descriptors" method, the .orderedSame case is key for knowing when to disambiguate with the next descriptor.

A little more work on that, exposing methods more carefully and adding - and | operators to allow negation and fallbacks.

I think it should return whatever Comparable returns; I don't really care whether that's a boolean or an enum, I just care that it's consistent. Note that ComparisonResult is currently part of Foundation; we'd have to bring it down into the standard library to use it in this.

1 Like

There was discussion last year about introducing a 4-case enum for comparison results: greater than, less than, equal to, and incomparable. The last one being important for IEEE floats.

1 Like