Sorting Collections with `map` closures and SortDescriptors

Ben_Cohen · September 21, 2018, 5:26pm

I agree – but the case was much stronger with toggle, because it's common in that case to have long chains to get to the boolean you want to toggle. I don't think it's quite the same here.

And unlike toggle, there is a loss of expressivity with a solution that just references the path to the element, not the comparison: you are now locked into only one comparison (less than), which then leads to the need for a descriptor type that can have a reversed or caseless property.

Maybe there are still neater solutions out there to be discovered that get the best of both worlds.

sveinhal · September 23, 2018, 12:04pm

I’ve used this operator in some projects. I’m not thrilled about it though.

prefix operator ^
prefix func ^<A, B> (operand: KeyPath<A, B>) -> (A) -> B {
  return { $0[keyPath: operand] }
}

let keyPath = \Person.name // KeyPath<Person, String>
let closure = ^keyPath     // (Person) -> String

let names = people.map(\.name)  // does not compile
let names = people.map(^\.name) // ["Ben", "Jerry", "Adam"]

AliSoftware · February 25, 2020, 7:13pm

Hello fellow Swifters!

What do you think about reviving that discussion?

I think that it would be valuable to introduce this in the stdlib now that we have KeyPaths, that SE-249 has landed, and that all of this would allow providing a nice KeyPath-friendly API for sorting

I've tried my take at what the API would like in modern Swift 5.2 here (gist here) if you want to take a look.

Having this in the stdlib would allow us to write things as beautiful as these:

// Sort by name, ascending
users.sorted(by: \.name)

// Sort by age, descending
users.sorted(by: \.age, .descending)
users.sorted(by: .descending(\.age))

// Sort by name ascending, then by age descending
users.sorted(by: .ascending(\.name), .descending(\.age))

What I like with this design is that it both allows a nice short call if we need the default ascending order and just a KeyPath, a more descriptive call site when we want to mix ascending and descending orders for various keys, and all that without polluting the global space with free functions.

Bonus: in my implementation I even took advantage of SE-0253 (callAsFunction) to also allow things like:

let sortByNameAndAge = SortDescriptor<User>(.ascending(\.name), .descending(\.age))
let sortedUsers = sortByNameAndAge(users)

cal · February 25, 2020, 7:16pm

I think the up-thread feedback from @Ben_Cohen and @jrose still applies here:

I think the ideal long-term solution would use varadic generics instead of having to introduce a type-erased SortDescriptor wrapper:

AliSoftware · February 25, 2020, 7:20pm

Sure, though correct me if I'm wrong, but from what I understood from the reactions to @Ben_Cohen 's feedback, we were still missing the ability to avoid repeating the property key.

In other words, the main concern we want to address here is to avoid having to repeat the property name in { $0.firstname < $1.firstname } closures that you currently have to pass to sorted, but instead just provide a KeyPath like \.name and avoid repeating the property, thus avoiding the potential for bugs.

I'd already be happy if we at least had a sort(KeyPath) and not just a sort((Element, Element) -> Bool)

suyashsrijan · February 25, 2020, 7:20pm

I think it might also be nice to be able to do something like this:

users.sorted(<, by: \.name)

AliSoftware · February 25, 2020, 7:24pm

I think the ideal long-term solution would use varadic generics instead of having to introduce a type-erased SortDescriptor wrapper:

@cal Also, note that in my implementation, the SortDescriptor type doesn't have to be type-erased, as it's only generic over the Collection.Element, not over the Part of the KeyPath<Whole, Part> used to build it.

Which is why I'm able –even in today's Swift without variadic generics– to pass an array of SortDescriptor<User> in my examples even if each one extract a different property type from the Element

cal · February 25, 2020, 7:26pm

We pitched this design in May 2019 and the core team decided not to move forward with it.

cal · February 25, 2020, 7:27pm

This is the definition of type-erasure (you're erasing <Whole, Part> to just <Whole>)

nnnnnnnn · February 25, 2020, 8:16pm

Thanks for bringing this up again, @AliSoftware! It would be fantastic to expand the standard library's capabilities in this area. There are several moving parts to consider in this area:

What does a SortDescriptor type provide that improves on using a simple (Element) -> T closure?
How does this expanded notion of sorting interact with the design of a potential SortedCollection protocol and future concrete sorted collections?
What kinds of optimizations are enabled by a proposed solution?
What about persistence? For example, if I'm displaying a spreadsheet, I might want to sort on multiple columns, but have that be independent of the way my data is stored.
What other sorting algorithms are we waiting for in the standard library? A partial sort, at minimum, would be valuable for finding the smallest or largest n items in a collection.
Are there language features that would improve the experience of a particular solution? We can use key-path expressions in place of closures now; a ExpressibleByKeyPathExpression protocol could make it possible to use those expressions to form SortDescriptors as well.
Finally, there's also some research to do, in looking at the way other libraries have solved this problem (like NSSortDescriptor, of course).

I think we'll need to explore each of these questions to know when we have a stdlib addition that hits the right notes. Much of that exploration has already taken place on these forums, so part of the work is bringing that together. Would you like to take a look at that?

AliSoftware · February 27, 2020, 5:12pm

Thanks for the answer @nnnnnnnn

"What does SortDescriptor provide that a (Element) -> T closure wouldn't

Mostly:

ability to reverse the sort (because T can be anything so unlike a Bool that you can just invert to revert the sort, if you pass a KeyPath (which would be turned into a closure as per SE-249) it's not enough to be able to revert the sort
ability to pass multiple sorting keys (KeyPaths) even if they are KeyPath to properties of different types

But tbh I would already be happy with the stdlib just having a sorted(by: (Element) -> T), using comparator: (T, T) -> Bool with comparator defaulting to < when T: Comparable. That would not support multiple-keys sorting, but that's a rather rare use case anyway so I'd be ok with this.

How does this expanded notion of sorting interact with the design of a potential SortedCollection protocol and future concrete sorted collections?

I missed that this was a thing in the works. Good point, if we consider having SortedCollection or other similar concepts in the future, let's wait for those to settle. Is there already some kind of Manifesto started about that?

What kinds of optimizations are enabled by a proposed solution?

I haven't done any in-depth analysis of performance optimization but since SortDescriptor ends up being just a wrapper to construct a type-erased closure from a KeyPath, I'm assuming that with optimization enabled on the compiler it wouldn't be less performant that building the closure yourself.

The main point of the proposal is not about improving performance of the sorting algorithm but to improve the API of Collection by avoiding useless typing in the most common case where you want to sort by a property of Element, while avoid the risk of error that arise when repeating that property twice in the closure for the current statu-quo

What about persistence? For example, if I'm displaying a spreadsheet, I might want to sort on multiple columns, but have that be independent of the way my data is stored.

Sorry but I don't see how that question fits with the original topic proposed here? This is a different problem and question, that actually already applies to using closures with the current sort(by:) API and is not specific to the new API with KeyPath or SortDescriptor, is it?

What other sorting algorithms are we waiting for in the standard library? A partial sort, at minimum, would be valuable for finding the smallest or largest n items in a collection.

I see your point, but I'm not sure this is completely relevant to the problem we're trying to solve with this sort(by: [SortDescriptor]) API here. Again, the point of the suggested new sort(by: [SortDescriptor]) API is to be a convenience entry point for the existing sort(by: (Element, Element) -> Bool) that allows to construct that closure more easily by passing one or multiple KeyPath (wrapped in a SortDescriptor… or not if we abandon the SortDescriptor idea to support multiple-keys sorting and go straight to KeyPath with single-property sorts)

In the end, I'd agree to drop the SortDescriptor idea given how it's controversial to add that new type in the stdlib. But would it be acceptable to at least add the one with KeyPath to the stdlib?

extension Collection {
  func sorted<T>(by keyPath: KeyPath<Element, T>, using: (T, T) -> Bool) -> [Element] {
    sort(by: { using($0[keyPath: keyPath], $1[keyPath: keyPath]) })
  }
  func sort<T: Comparable>(by keyPath: KeyPath<Element, T>) -> Element {
    sort(by: keyPath, using: <)
  }
}

jenox · February 27, 2020, 5:42pm

Personally I would prefer the closure version func sorted<T>(by transform: (Element) -> T, using areInIncreasingOrder: (T, T) -> Bool) you mentioned above instead of restricting ourselves to key paths, especially since key paths can be used as closures in Swift 5.2.

AliSoftware · February 27, 2020, 7:17pm

I like that too

Nevin · February 27, 2020, 8:12pm

For sorting with multiple predicates (sort by this, then by that, etc.), one option is a SortKey type with static .by() methods to enable code like this:

let list = people.sorted(
  .by(\.age, >),
  .by(\.name),
  .by(\.favoriteNumber, { abs($0 - .pi) < abs($1 - .pi) })
)

The implementation is pretty straightforward:

Basic SortKey type

struct SortKey<T> {
  var comparison: (T,T)->Bool
}

Static .by() methods

extension SortKey {
  // .by(\.key, >)
  static func by<U>(_ transform: @escaping (T)->U, _ areInIncreasingOrder: @escaping (U,U)->Bool) -> SortKey {
    return SortKey{ areInIncreasingOrder(transform($0), transform($1)) }
  }

  // .by(\.key)
  static func by<U: Comparable>(_ transform: @escaping (T)->U) -> SortKey {
    return SortKey.by(transform, <)
  }
  
  // .by(customComparison)
  static func by(_ areInIncreasingOrder: @escaping (T,T)->Bool) -> SortKey {
    return SortKey(comparison: areInIncreasingOrder)
  }
}

KeyPath versions for those not yet on Swift 5.2

extension SortKey {
  static func by<U>(_ path: KeyPath<T, U>, _ areInIncreasingOrder: @escaping (U,U)->Bool) -> SortKey {
    return SortKey.by({ $0[keyPath: path] }, areInIncreasingOrder)
  }
  
  static func by<U: Comparable>(_ path: KeyPath<T, U>) -> SortKey {
    return SortKey.by(path, <)
  }
}

Combining multiple SortKeys

extension SortKey {
  static func merging(_ keys: [SortKey]) -> SortKey {
    return SortKey {
      for key in keys {
        if key.comparison($0, $1) { return true }
        if key.comparison($1, $0) { return false }
      }
      return false
    }
  }
}

Sorting

extension Sequence {
  func sorted(byKeys keys: [SortKey<Element>]) -> [Element] {
    return self.sorted(by: SortKey.merging(keys).comparison)
  }
  
  func sorted(_ keys: SortKey<Element>...) -> [Element] {
    return self.sorted(byKeys: keys)
  }
}

extension MutableCollection where Self: RandomAccessCollection {
  mutating func sort(byKeys keys: [SortKey<Element>]) {
    sort(by: SortKey.merging(keys).comparison)
  }
  
  mutating func sort(_ keys: SortKey<Element>...) {
    sort(byKeys: keys)
  }
}

This is all possible today without any new language features (indeed the code here already works). It’s also possible to support throws as well, but not rethrows, so that’s a hypothetical future enhancement.

Nevin · February 28, 2020, 10:25pm

For single predicates, I think it’s important to be able to specify the order (if desired). That is, if we have dedicated functions for the simple case, then all of the following should work:

people.sorted(by: \.age)              // Youngest first
people.sorted(>, by: \.age)           // Oldest first
people.sorted{ abs($0.age - 30) }     // Closest to 30 first
people.sorted(>){ abs($0.age - 30) }  // Furthest from 30 first

Note that the sort order is the first argument, which both aligns with the existing sort functions, and also leaves the trailing closure position for the more-likely-to-be-customized transform.

Implementation

extension Sequence {
  func sorted<T>(_ areInIncreasingOrder: (T,T)throws->Bool, by transform: (Element)throws->T) rethrows -> [Element] {
    return try self.sorted{ try areInIncreasingOrder(transform($0), transform($1)) }
  }
  
  func sorted<T: Comparable>(by transform: (Element)throws->T) rethrows -> [Element] {
    return try self.sorted(<, by: transform)
  }
}

extension MutableCollection where Self: RandomAccessCollection {
  mutating func sort<T>(_ areInIncreasingOrder: (T,T)throws->Bool, by transform: (Element)throws->T) rethrows {
    try self.sort{ try areInIncreasingOrder(transform($0), transform($1)) }
  }
  
  mutating func sort<T: Comparable>(by transform: (Element)throws->T) rethrows {
    try self.sort(<, by: transform)
  }
}

KeyPath versions

extension Sequence {
  func sorted<T>(_ areInIncreasingOrder: (T,T)throws->Bool, by path: KeyPath<Element,T>) rethrows -> [Element] {
    return try self.sorted{ try areInIncreasingOrder($0[keyPath: path], $1[keyPath: path]) }
  }
  
  func sorted<T: Comparable>(by path: KeyPath<Element,T>) -> [Element] {
    return self.sorted(<, by: path)
  }
}

extension MutableCollection where Self: RandomAccessCollection {
  mutating func sort<T>(_ areInIncreasingOrder: (T,T)throws->Bool, by path: KeyPath<Element,T>) rethrows {
    try self.sort{ try areInIncreasingOrder($0[keyPath: path], $1[keyPath: path]) }
  }
  
  mutating func sort<T: Comparable>(by path: KeyPath<Element,T>) {
    self.sort(<, by: path)
  }
}