Sort `Collection` using `KeyPath`

You don't need to restrict Value to be Comparable — that shortcut can go into an extension that builds on top of the general case.

You're right — this may make more sense:

extension Sequence {
    
    public func sorted<Value>(
        by keyPath: KeyPath<Self.Element, Value>,
        using valuesAreInIncreasingOrder: (Value, Value) throws -> Bool)
        rethrows -> [Self.Element]
    {
        return try self.sorted(by: {
            try valuesAreInIncreasingOrder($0[keyPath: keyPath], $1[keyPath: keyPath])
        })
    }
    
    public func sorted<Value: Comparable>(
        by keyPath: KeyPath<Self.Element, Value>)
        -> [Self.Element]
    {
        return self.sorted(by: keyPath, using: <)
    }
    
}
4 Likes

Yeah, that makes it work properly while still being rethrowing.

Within the context of this pitch, my ideal call-site syntax for doing advanced String comparisons would look like this:

OS.supported.sorted(by: \.name, using: .caseInsensitiveCompare)
OS.supported.sorted(by: \.name, using: .localizedCaseInsensitveCompare)

The trouble is those methods are currently (String) -> (String) -> ComparisonResult, so it's not immediately compatible with (String, String) -> Bool.

You'd currently have to write something like this:

OS.supported.sorted(by: \.name, using: { $0.caseInsensitiveCompare($1) == .orderedAscending })

[SE-0042] Flattening the function type of unapplied method references was accepted in March 2016 (!!), but it looks like we still don't have an actual implementation that's made it into the language. If that made it across the finish line, it would partially solve the problem.

It's also unclear how ComparisonResult would fit in this picture. Is .orderedSame truthy or falsey? That seems to play in to the bigger unanswered question about stable sorting.

Hmm... maybe the best way to spell this is to put a localization normalization method on string:

OS.supported.sorted(by: \name.lowercased)
OS.supported.sorted(by: \name.normalized)

That doesn't work out very well using an array of Key Paths, but it does work if we use a type-erased SortDescriptor struct!

We'd get a call site that reads very nicely:

OS.supported.sorted(by: [
    SortDescriptor(\.yearReleased),
    SortDescriptor(\.name, using: String.caseInsensitiveCompare)])

compared to how you have have to express that today:

OS.supported.sorted { lhs, rhs in
    if (lhs.yearReleased == rhs.yearReleased) {
        return lhs.name.caseInsensitiveCompare(rhs.name) == .orderedAscending
    } else {
        return lhs.yearReleased < rhs.yearReleased
    }
}

It uses an implementation that looks like this.

1 Like

I agree that Swift's sorting API is currently underdeveloped, and sorting on a particular key would be a notable improvement.

Java's Comparator API has a few features that translate well to Swift, most notably the idea of sorting potentially null values. In Swift, it would be nice to sort a Sequence on an Optional key while allowing preference as to whether nil values should come before or after non-nil values.

I have a sketch of a comparator (sort descriptor) API in Swift here, though it's a little messy since it involves workarounds for lack of parameterized extensions.

Something like this, perhaps?

So far, the biggest irritation I've noticed is that a design as generic as this one struggles to provide context for a literal syntax, key path root type inference, leading-dot syntax, etc. That makes a lot of things we'd like to do much clumsier than I'd like. Maybe we could resolve those issues with a little more work, though.

2 Likes

That implementation is really compelling!! I like the notion that a KeyPath can *be* a SortDescriptor in its own right.

I do think it's really important for a SortDescriptor's compare method to return a ComparisonResult rather than a Bool, though. If were were to add a "sort by descriptors" method, the .orderedSame case is key for knowing when to disambiguate with the next descriptor.

A little more work on that, exposing methods more carefully and adding - and | operators to allow negation and fallbacks.

I think it should return whatever Comparable returns; I don't really care whether that's a boolean or an enum, I just care that it's consistent. Note that ComparisonResult is currently part of Foundation; we'd have to bring it down into the standard library to use it in this.

1 Like

There was discussion last year about introducing a 4-case enum for comparison results: greater than, less than, equal to, and incomparable. The last one being important for IEEE floats.

1 Like

Looking at your implementation, I think you're right about this. I didn't consider that you can still derive == using only orders(_:before:) or <. In that case, it seems best to model Comparable as closely as possible.

public func orders(_ a: First.Root, before b: First.Root) -> Bool {
    if first.orders(a, before: b) {
        return true
    }
    else if first.orders(b, before: a) {
        return false
    }
    else {
         // this is the .orderedSame case 👍
        return second.orders(a, before: b)
    }
}

You also worked in some cool operators:

people.sorted(by: -\Person.name) // reversed the sort
people.sorted(by: \Person.name | \Person.age) // uses \.age to disambiguate

I like how the call-site terse and yet expressive. I also really like the notion of negating / reversing a SortDescriptor after creating it. I don't think operators are a good direction to move towards, though:

  • Negating and composing Sort Descriptors is totally reasonable, but the current spelling makes it read like it's negating and composing Key Paths (which is less meaningful). Although a KeyPath is a SortDescriptor, that specific conformance seems more like an implementation detail than an intrinsic "feature" of a KeyPath. I think we should avoid overloading behavior onto KeyPath itself.

  • There isn't much precedent for a composition API like that in the Standard Library, so I think working that in to the proposal would cause more pushback than an API that took a [SortDescriptor].

I'm also not convinced we want to use a SortDescriptorProtocol. While this would let us treat a KeyPath as SortDescriptor without any extra decoration, I think there are a few issues we'd have to contend with:

  • The autocomplete suggestions on public func sorted(by: SortDescriptorProtocol) become less useful for people unfamiliar with the API. It's not clear that you can use a KeyPath in that way unless you additionally know that KeyPath: SortDescriptorProtocol.

  • The KeyPath literals are pretty fragile when the whole thing is abstracted behind a protocol, so you have to use \Person.name instead of \.name. (Is this a compiler bug related to KeyPaths, or an unavoidable aspect of using a protocol in that way? I'm not sure.)

I think the best mixture of these two approaches would look something like this:

people.sorted(by: \.name)

people.sorted(by: SortDescriptor(\.age).reversed)

people.sorted(by: [
    SortDescriptor(\.age).reversed,
    SortDescriptor(\.name, using: String.caseInsensitiveCompare)])

with an API like this (full implementation):

public struct SortDescriptor<Element> {
    public func orders(_ a: Element, before b: Element) -> Bool { .... }
    public var reversed: SortDescriptor { ... }
}

extension Collection {
    
    public func sorted<Value: Comparable>(by keyPath: KeyPath<Self.Element, Value>) -> [Self.Element] {
        return self.sorted(by: SortDescriptor(keyPath))
    }
    
    public func sorted(by sortDescriptor: SortDescriptor<Self.Element>) -> [Self.Element] {
        return self.sorted(by: [sortDescriptor])
    }
    
    public func sorted(by descriptors: [SortDescriptor<Self.Element>]) -> [Self.Element] {
        ...
    }
    
}

and an autocomplete that looks like this:

1 Like

Going back to the point I was trying to make in my original reply, one way forward would be to separate this into a few proposals:

  • A proposal for some version of sort/sorted that takes (Self.Element) -> Comparable. This seems to be straightforwardly useful and uncontroversial, with mostly implementation details that still need to be discussed (e.g. should it be implemented using a Schwartzian transform or is the overhead not worthwhile for common use cases).
  • Ways to convert key paths, collections of key paths, etc. into closures (or types like SortDescriptor) that are useful for higher-order functions generally. For example, you might want convert a collection of key paths into an (Element) -> (T, U, …) closure to use in a map (and if tuples could conform to Comparable this would be directly usable for sorting, also). This is a large design space that will take time to explore and is inherently more controversial because it has implications for the long term design of the standard library.
  • Possibly separately, or part of the previous point, consider if there should be a bunch of convenience overloads (like the ones for sorted(by:) in @cal's picture above) for every higher order function to remove the need to use a prefix operator, or call/refer to an instance method, to convert from, say, a SortDescriptor into its underlying (T, T) -> Bool ordering function.

All these considerations are just as useful for things like partitioning, grouping elements of collections in a Dictionary by some derived key, etc, as @Ben_Cohen pointed out.

2 Likes

The API is nice, but looking at the implementation I am completely lost. I don't want to pick on your code, the standard library is even worse. I find it a failure of the language.

Basically abstractions to the point of incomprehension. It is like haskell, but without the elegance.

One other thing I want to throw in here is that with this form of the API, we can use the Schwartzian transform to only calculate the value at the keypath once per element instead of O(nlogn) times. It could be an option on the function. It would result in much faster sorts when the value at the keypath is expensive to retrieve.

6 Likes

It seems very reasonable that KeyPath<T, U> would be implicitly interchangeable with (T) -> U.

By the same system, some ComparatorProtocol or SortDescriptorProtocol could potentially be interchangeable with (T, T) -> Bool.

There are a few approaches I can think of:

1) Closure types as a Protocol

We could treat any generic closure type as a protocol that can be implemented. That could give us spellings like this:

extension KeyPath: (Root) -> Value {
    func call(with root: Root) -> Value { ... }
}

extension SortDescriptor: (Element, Element) -> Bool {
    func call(with a: Element, _ b: Element) -> Value { ... }
}

2) @callable

Another possible design could mirror the design of SE-216 Dynamic Callable's @dynamicCallable more closely. We could have a @callable attribute. Some possible spellings could inclue:

2.1) @callable(type)

@callable((Root) -> Value)
struct KeyPath<Root, Value> {
    func perform(_ root: Root) -> Value { ... }
}

@callable((Element, Element) -> Bool)
struct SortDescriptor<Element> {
    func perform(_ a: Element, _ b: Element) -> Bool { ... }
}

2.2) @callable(method)

@callable(map(_:))
struct KeyPath<Root, Value> {
    func map(_ root: Root) -> Value { ... }
}

@callable(orders(_:before:))
struct SortDescriptor<Element> {
    func orders(_ a: Element, before b: Element) -> Bool { ... }
}

At the call site:

For all of these examples, the call-site behavior of types expressing this behavior seems fairly reasonable:

\String.count("value")
// interpreted as \String.count.call(with: "value")

["a", "bb", "ccc"].map(\.count)
// interpreted as ["a", "bb", "ccc"].map(\.count.call(with:))

["a", "bb", "ccc"].sorted(by: SortDescriptor(...))
// interpreted as ["a", "bb", "ccc"].sorted(by: SortDescriptor(...).call(with:_:))

Of those, I personally like @callable(method) the most, since it very explicitly describes the transform that gets applied to the call-site.

1 Like

cc'ing @Chris_Lattner3 since he wrote the @dyanmicCallable proposal — is a related @callable attribute something you've considered?

I have had some thoughts about this after seeing it discussed during the @dynamicCallable proposal.

The version I have been thinking about (and actually have a draft pitch of here) is to permit instance methods named _ and have a type be a subtype of the types of all of the types of the instance methods it has named _.

Edit: I just noticed a thread explicitly about this appear, so I am reposting this here.

1 Like

It's directly copied out of a playground that's still a work in progress, so it's basically optimized for writing instead of reading. Good documentation and some carefully-chosen names would go a long way.

Given that KeyPaths can now automatically convert from a closure, it makes sense to offer the API as a closure instead so that consumers of this API can use both, since there are performance characteristics of using KeyPaths which can be important in some algorithms

by value: (Element) -> Value,
4 Likes