Sorting Collections with `map` closures and SortDescriptors

Yeah, doing this properly as a general language feature is similarly complicated, but just making tuples automatically conform to a few useful standard library protocols as a stopgap (e.g. Equatable, Comparable, Hashable) would be much easier and has been previously discussed.

Sure, but this goes back to the Schwartzian transform question where you concluded that this overhead was negligible for most cases because computed properties are generally expected to be O(1). Inlining might make this lazy in the usual case, anyway.

It's still an unfortunately long way from what I think it should be:

people.sorted(by: { ($0.name, $0.age, $0.height, $0.salary) })

I don't see the Element -> Comparable version as leading to an explosion in the ways of sorting like @Ben_Cohen does, and I don't see it as a negative user experience to provide a method that handles the most common use cases for providing a sorting predicate, while also avoiding repetition and not requiring them to think about (or look up) whether the comparison in the predicate should be <, <=, >, or >=.

In describing a sort by a property (a la people.sorted(by: { $0.name })), the layer of convenience performs one task: the transformation of a property access of type (Element) -> SortKey to a comparison function of type (Element, Element) -> Bool.

Here's another way of looking at the same transformation: given a function of type (SortKey, SortKey) -> Bool (e.g. <) and a function of type (Element) -> SortKey (e.g. { $0.name }), we can use the latter to lift the former into a function of type (Element, Element) -> Bool (in this case, { $0.name < $1.name }).

We can write a function, which I'll call their for readability purposes made clear in a moment, which performs this lift:

func their<Input, Intermediate, Output>(
    _ extractIntermediate: @escaping (Input) -> Intermediate,
    _ combine: @escaping (Intermediate, Intermediate) -> Output
) -> (Input, Input) -> Output {
    return { input1, input2 in
        combine(extractIntermediate(input1), extractIntermediate(input2))
    }
}

Then, using only the existing sorting API, we can write the following:

people.sorted(by: their({ $0.name }, <)) // equivalent to `{ $0.name < $1.name }`
people.sorted(by: their({ $0.name }, >)) // equivalent to `{ $0.name > $1.name }`

You can picture how much prettier the above syntax would become with the appropriate subtyping relationship between KeyPath<T, U> and (T) -> U:

people.sorted(by: their(\.name, <))

There are additional gains to writing the function in this way, however. Notice that their isn't restricted to Comparable SortKey types, which means we can make use of tuple comparisons even without tuple protocol conformances:

people.sorted(by: their({ ($0.name, $0.age, $0.height, $0.salary) }, <))

Tuple comparison only covers the case when all desired properties are sorted by the same comparison function—Ben's proposed API utilizing a Sequence of comparison functions fills the gap here, and it works with their, too:

people.sorted(by: [
    their({ $0.name }, <),
    their({ $0.age }), >)
])

Finally, I'll note that this function generalizes beyond sorting. Here a couple additional examples:

let oldestPerson = people.max(by: their({ $0.age }, <))
// `their` isn't tied to functions returning `Bool`, either:
let sequentialRunTimeDeltas = zip(firstRuns, secondRuns).map(their({ $0.duration }, -)) 

Whether their is appropriate for this particular proposal, I'm not sure—but I would be remiss not to mention it for its utility in creating comparison functions without significantly expanding API surface.

A quick note on prior art

Haskell's Data.Function package defines on, which you can see here. on is basically an infix version of their which better lends to readability given the names of functions with which it's commonly used in Haskell.
I'd imagine other functional languages define a similar function, though I haven't done much digging here.

6 Likes

That's all true. By the same token, though, we don't need the sequence version of the sort functions because we could just pass an sequence of sort predicates to a function that combines them into a single sort predicate.

Edit: e.g. roughly like

func combineComparisons<S: Sequence, T>(_ comparisons: S) -> (T, T) -> Bool where S.Element == (T, T) -> Bool {
  return { t1, t2 in
    for c in comparisons {
      if c(t1, t2) { return true }
      if c(t2, t1) { return false }
    }
    return false
  }
}

That's probably not my position though, because I think it's nice to provide a few conveniences that don't require you to know about a bunch of free functions.

1 Like

You do have a great point here, with respect to generalizing to use-cases other than just sort and sorted. It's nice that max and min get the benefits of a design like this effectively for-free.

I have some thoughts, though:

  • I would want to see an overload like func their<Input, SortKey: Comparable>(_ sortKey: SortKey) -> (Input, Input) -> Bool that provides a sensible default for Comparable sort keys.

  • There's precedent for this approach in Haskell, but that's a very composition-heavy language. Are there any examples of a composition-approach like this in the Standard Library?

  • I'm not a huge fan of free-floating global functions — they're tough to find if you don't already know about them.

  • their({ $0.age }) isn't that much shorter than { $0.age < $1.age }

1 Like

One problem with the $0.age < $1.age syntax is that it repeats the "age", which leads to copy-and-paste bugs. It's also kind of a pain for longer property names.

To take a wild guess, I think a series of key paths / key functions would handle 90% of the custom sorts that are out there today, i.e. only 1 in every 10 custom sorts would need to fall back to the general comparator closure form. Having the sort-by-key form, even a single key, is something I always appreciated in Python and Ruby. We might not have the language tools to do it yet today, though, and I wouldn't want to jam something in to work around that.

6 Likes
  • I would want to see an overload like func their<Input, SortKey: Comparable>(_ sortKey: SortKey) -> (Input, Input) -> Bool that provides a sensible default for Comparable sort keys.

The challenge with providing an overload with a default second argument is the conflation of their with comparison functions specifically. Lifting a function of one argument onto a function of two arguments is a general operation without ties to concrete types. You can imagine something like the following for a single-argument their function:

sequenceOfUserTuples.map(their { $0.age })

I would expect an array of tuples containing the age corresponding to each user in each tuple of the previous sequence rather than an array of Bool values. This intuition would (to me) suggest a default argument of the identity function for the second parameter—and at that point we've just defined map on the homogeneous two-tuple.

I think a better spelling for a single-argument version of their specialized for comparison functions would be the lesser(of:) and greater(of:) functions that Ben described above.

  • There's precedent for this approach in Haskell, but that's a very composition-heavy language. Are there any examples of a composition-approach like this in the Standard Library?
  • I'm not a huge fan of free-floating global functions — they're tough to find if you don't already know about them.

I think their would be a great fit for a non-standard library geared toward functional utilities, but the non-standard library movement hasn't gained much traction. You're right that it doesn't align well with the Standard Library, which (at least thus far) hasn't supported abstractions at this level, which really only make sense as free functions—but I felt it was worth a mention in this thread for its applicability in this domain.

  • their({ $0.age }) isn't that much shorter than { $0.age < $1.age }

Jordan just touched on this, but the goal here isn't to reduce the number of characters; it's to reduce repetition. The arguments here aren't specific to their and can be made analagously to those proposed with Bool.toggle() with respect to the value of removing repetition on opposite sides of an operator.

I agree – but the case was much stronger with toggle, because it's common in that case to have long chains to get to the boolean you want to toggle. I don't think it's quite the same here.

And unlike toggle, there is a loss of expressivity with a solution that just references the path to the element, not the comparison: you are now locked into only one comparison (less than), which then leads to the need for a descriptor type that can have a reversed or caseless property.

Maybe there are still neater solutions out there to be discovered that get the best of both worlds.

3 Likes

I’ve used this operator in some projects. I’m not thrilled about it though.

prefix operator ^
prefix func ^<A, B> (operand: KeyPath<A, B>) -> (A) -> B {
  return { $0[keyPath: operand] }
}

let keyPath = \Person.name // KeyPath<Person, String>
let closure = ^keyPath     // (Person) -> String

let names = people.map(\.name)  // does not compile
let names = people.map(^\.name) // ["Ben", "Jerry", "Adam"]
2 Likes

Hello fellow Swifters!

What do you think about reviving that discussion?

I think that it would be valuable to introduce this in the stdlib now that we have KeyPaths, that SE-249 has landed, and that all of this would allow providing a nice KeyPath-friendly API for sorting

I've tried my take at what the API would like in modern Swift 5.2 here (gist here) if you want to take a look.


Having this in the stdlib would allow us to write things as beautiful as these:

// Sort by name, ascending
users.sorted(by: \.name)

// Sort by age, descending
users.sorted(by: \.age, .descending)
users.sorted(by: .descending(\.age))

// Sort by name ascending, then by age descending
users.sorted(by: .ascending(\.name), .descending(\.age))

What I like with this design is that it both allows a nice short call if we need the default ascending order and just a KeyPath, a more descriptive call site when we want to mix ascending and descending orders for various keys, and all that without polluting the global space with free functions.

Bonus: in my implementation I even took advantage of SE-0253 (callAsFunction) to also allow things like:

let sortByNameAndAge = SortDescriptor<User>(.ascending(\.name), .descending(\.age))
let sortedUsers = sortByNameAndAge(users)
2 Likes

I think the up-thread feedback from @Ben_Cohen and @jrose still applies here:

I think the ideal long-term solution would use varadic generics instead of having to introduce a type-erased SortDescriptor wrapper:

Sure, though correct me if I'm wrong, but from what I understood from the reactions to @Ben_Cohen 's feedback, we were still missing the ability to avoid repeating the property key.

In other words, the main concern we want to address here is to avoid having to repeat the property name in { $0.firstname < $1.firstname } closures that you currently have to pass to sorted, but instead just provide a KeyPath like \.name and avoid repeating the property, thus avoiding the potential for bugs.

I'd already be happy if we at least had a sort(KeyPath) and not just a sort((Element, Element) -> Bool)

I think it might also be nice to be able to do something like this:

users.sorted(<, by: \.name)
1 Like

I think the ideal long-term solution would use varadic generics instead of having to introduce a type-erased SortDescriptor wrapper:

@cal Also, note that in my implementation, the SortDescriptor type doesn't have to be type-erased, as it's only generic over the Collection.Element, not over the Part of the KeyPath<Whole, Part> used to build it.

Which is why I'm able –even in today's Swift without variadic generics– to pass an array of SortDescriptor<User> in my examples even if each one extract a different property type from the Element :slight_smile:

We pitched this design in May 2019 and the core team decided not to move forward with it.

2 Likes

This is the definition of type-erasure (you're erasing <Whole, Part> to just <Whole>)

Thanks for bringing this up again, @AliSoftware! It would be fantastic to expand the standard library's capabilities in this area. There are several moving parts to consider in this area:

  • What does a SortDescriptor type provide that improves on using a simple (Element) -> T closure?
  • How does this expanded notion of sorting interact with the design of a potential SortedCollection protocol and future concrete sorted collections?
  • What kinds of optimizations are enabled by a proposed solution?
  • What about persistence? For example, if I'm displaying a spreadsheet, I might want to sort on multiple columns, but have that be independent of the way my data is stored.
  • What other sorting algorithms are we waiting for in the standard library? A partial sort, at minimum, would be valuable for finding the smallest or largest n items in a collection.
  • Are there language features that would improve the experience of a particular solution? We can use key-path expressions in place of closures now; a ExpressibleByKeyPathExpression protocol could make it possible to use those expressions to form SortDescriptors as well.
  • Finally, there's also some research to do, in looking at the way other libraries have solved this problem (like NSSortDescriptor, of course).

I think we'll need to explore each of these questions to know when we have a stdlib addition that hits the right notes. Much of that exploration has already taken place on these forums, so part of the work is bringing that together. Would you like to take a look at that?

2 Likes

Thanks for the answer @nnnnnnnn

"What does SortDescriptor provide that a (Element) -> T closure wouldn't

Mostly:

  1. ability to reverse the sort (because T can be anything so unlike a Bool that you can just invert to revert the sort, if you pass a KeyPath (which would be turned into a closure as per SE-249) it's not enough to be able to revert the sort
  2. ability to pass multiple sorting keys (KeyPaths) even if they are KeyPath to properties of different types

But tbh I would already be happy with the stdlib just having a sorted(by: (Element) -> T), using comparator: (T, T) -> Bool with comparator defaulting to < when T: Comparable. That would not support multiple-keys sorting, but that's a rather rare use case anyway so I'd be ok with this.

  • How does this expanded notion of sorting interact with the design of a potential SortedCollection protocol and future concrete sorted collections?

I missed that this was a thing in the works. Good point, if we consider having SortedCollection or other similar concepts in the future, let's wait for those to settle. Is there already some kind of Manifesto started about that?

What kinds of optimizations are enabled by a proposed solution?

I haven't done any in-depth analysis of performance optimization but since SortDescriptor ends up being just a wrapper to construct a type-erased closure from a KeyPath, I'm assuming that with optimization enabled on the compiler it wouldn't be less performant that building the closure yourself.

The main point of the proposal is not about improving performance of the sorting algorithm but to improve the API of Collection by avoiding useless typing in the most common case where you want to sort by a property of Element, while avoid the risk of error that arise when repeating that property twice in the closure for the current statu-quo

  • What about persistence? For example, if I'm displaying a spreadsheet, I might want to sort on multiple columns, but have that be independent of the way my data is stored.

Sorry but I don't see how that question fits with the original topic proposed here? This is a different problem and question, that actually already applies to using closures with the current sort(by:) API and is not specific to the new API with KeyPath or SortDescriptor, is it?

  • What other sorting algorithms are we waiting for in the standard library? A partial sort, at minimum, would be valuable for finding the smallest or largest n items in a collection.

I see your point, but I'm not sure this is completely relevant to the problem we're trying to solve with this sort(by: [SortDescriptor]) API here. Again, the point of the suggested new sort(by: [SortDescriptor]) API is to be a convenience entry point for the existing sort(by: (Element, Element) -> Bool) that allows to construct that closure more easily by passing one or multiple KeyPath (wrapped in a SortDescriptor… or not if we abandon the SortDescriptor idea to support multiple-keys sorting and go straight to KeyPath with single-property sorts)


In the end, I'd agree to drop the SortDescriptor idea given how it's controversial to add that new type in the stdlib. But would it be acceptable to at least add the one with KeyPath to the stdlib?

extension Collection {
  func sorted<T>(by keyPath: KeyPath<Element, T>, using: (T, T) -> Bool) -> [Element] {
    sort(by: { using($0[keyPath: keyPath], $1[keyPath: keyPath]) })
  }
  func sort<T: Comparable>(by keyPath: KeyPath<Element, T>) -> Element {
    sort(by: keyPath, using: <)
  }
}

Personally I would prefer the closure version func sorted<T>(by transform: (Element) -> T, using areInIncreasingOrder: (T, T) -> Bool) you mentioned above instead of restricting ourselves to key paths, especially since key paths can be used as closures in Swift 5.2.

I like that too :+1:

For sorting with multiple predicates (sort by this, then by that, etc.), one option is a SortKey type with static .by() methods to enable code like this:

let list = people.sorted(
  .by(\.age, >),
  .by(\.name),
  .by(\.favoriteNumber, { abs($0 - .pi) < abs($1 - .pi) })
)

The implementation is pretty straightforward:

Basic SortKey type
struct SortKey<T> {
  var comparison: (T,T)->Bool
}
Static .by() methods
extension SortKey {
  // .by(\.key, >)
  static func by<U>(_ transform: @escaping (T)->U, _ areInIncreasingOrder: @escaping (U,U)->Bool) -> SortKey {
    return SortKey{ areInIncreasingOrder(transform($0), transform($1)) }
  }

  // .by(\.key)
  static func by<U: Comparable>(_ transform: @escaping (T)->U) -> SortKey {
    return SortKey.by(transform, <)
  }
  
  // .by(customComparison)
  static func by(_ areInIncreasingOrder: @escaping (T,T)->Bool) -> SortKey {
    return SortKey(comparison: areInIncreasingOrder)
  }
}
KeyPath versions for those not yet on Swift 5.2
extension SortKey {
  static func by<U>(_ path: KeyPath<T, U>, _ areInIncreasingOrder: @escaping (U,U)->Bool) -> SortKey {
    return SortKey.by({ $0[keyPath: path] }, areInIncreasingOrder)
  }
  
  static func by<U: Comparable>(_ path: KeyPath<T, U>) -> SortKey {
    return SortKey.by(path, <)
  }
}
Combining multiple SortKeys
extension SortKey {
  static func merging(_ keys: [SortKey]) -> SortKey {
    return SortKey {
      for key in keys {
        if key.comparison($0, $1) { return true }
        if key.comparison($1, $0) { return false }
      }
      return false
    }
  }
}
Sorting
extension Sequence {
  func sorted(byKeys keys: [SortKey<Element>]) -> [Element] {
    return self.sorted(by: SortKey.merging(keys).comparison)
  }
  
  func sorted(_ keys: SortKey<Element>...) -> [Element] {
    return self.sorted(byKeys: keys)
  }
}

extension MutableCollection where Self: RandomAccessCollection {
  mutating func sort(byKeys keys: [SortKey<Element>]) {
    sort(by: SortKey.merging(keys).comparison)
  }
  
  mutating func sort(_ keys: SortKey<Element>...) {
    sort(byKeys: keys)
  }
}

This is all possible today without any new language features (indeed the code here already works). It’s also possible to support throws as well, but not rethrows, so that’s a hypothetical future enhancement.

4 Likes
Terms of Service

Privacy Policy

Cookie Policy