Map Sorting

anthonylatsis · March 9, 2019, 7:32pm

Map Sorting

Authors: Anthony Latsis, Cal Stephens
Status: Awaiting Review
Implementation: apple/swift#23156

Introduction

This proposal presents an addition to the Standard Library that makes it easy to sort a collection over some set of mapped values, provided via a transform closure or KeyPath, in a way that is ergonomic, idiomatic, and performant.

Motivation

To sort a Collection, you're currently required to specify an areInIncreasingOrder predicate of the form (Element, Element) -> Bool. If you're sorting on the Element itself, you can use simple comparators like < and >.

Sorting a Collection on some property or value derived from the Element is more complex, and currently requires specifying a closure that welds together both the accessing and comparison of values.

struct Person {
  ...
  var age: Int
}

struct Chat {
  ...
  var lastMessage: Message
}

var people: [Person] = ...
var chats: [Chat] = ...

people.sort { $0.age < $1.age }
chats.sort { $0.lastMessage.date > $1.lastMessage.date }

In many circumstances, this approach can cause issues:

The $0.property < $1.property syntax often leads to copy-and-paste bugs.
For long property names or complicated multi-line predicates, this syntax can be especially verbose since it requires duplicating the logic for retrieving the values.
When the values are expensive to retrieve or calculate, this type of predicate becomes an obstacle for optimizations. It may be desirable to trade memory for speed such that each value is computed once per element rather than O(n logn) times. For an ϴ(n) operation, this optimization can theoretically speed up sorting by a factor of 10 for an array of 1000 elements. This is called the Schwartzian Transform.

Thereby, the goal of this proposal is to introduce an API that decouples the comparison of values from their retrieval, bringing ergonomic benefits for an ample range of cases, and opening the door for new optimizations.

Proposed solution

The authors propose to add an overload for both the nonmutating sorted and in-place sort methods on Sequence and MutableCollection respectively. A mapping closure (Element) -> Value will lead the argument list, followed by the well known areInIncreasingOrder predicate of the type (Value, Value) -> Bool. Additionally, we propose a flag for opting in to the Schwartzian Transform optimization.

Here are some example usages:

people.sort(on: { $0.age }, by: <)
chats.sort(on: { $0.lastMessage.date }, by: >)

fileUnits.sort(
  on: { $0.raw.count(of: "func") },
  by: <,
  isExpensiveTransform: true
)

Following the acceptance of SE-0249, the examples above can also be written with a key-path:

people.sort(on: \.age, by: <)
chats.sort(on: \.lastMessage.date, by: >)

Implementation

extension Sequence {

  @inlinable
  public func sorted<Value>(
    on transform: (Element) throws -> Value,
    by areInIncreasingOrder: (Value, Value) throws -> Bool,
    isExpensiveTransform: Bool = false
  ) rethrows -> [Element] {
    guard isExpensiveTransform else {
      return try sorted {
        try areInIncreasingOrder(transform($0), transform($1))
      }
    }
    var pairs = try map {
      try (element: $0, value: transform($0))
    }
    try pairs.sort {
      try areInIncreasingOrder($0.value, $1.value)
    }

    return pairs.map { $0.element }
  }
}

extension MutableCollection where Self: RandomAccessCollection {

  @inlinable
  public mutating func sort<Value>(
    on transform: (Element) throws -> Value,
    by areInIncreasingOrder: (Value, Value) throws -> Bool,
    isExpensiveTransform: Bool = false
  ) rethrows {
    guard isExpensiveTransform else {
      return try sort {
        try areInIncreasingOrder(transform($0), transform($1))
      }
    }
    var pairs = try map {
      try (element: $0, value: transform($0))
    }
    try pairs.sort {
      try areInIncreasingOrder($0.value, $1.value)
    }

    for (i, j) in zip(indices, pairs.indices) {
      self[i] = pairs[j].element
    }
  }
}

Source compatibility & ABI stability

This is an ABI-compatible addition with no impact on source compatibility.

Considerations

Provide `<` as the default predicate when `Value: Comparable`

A practically useful complement to the API in question would be a version of sort(on:) and sorted(on:) where < is provided as the default sorting predicate for Comparable values:

people.sort(on: { $0.age })
people.sort(on: \.age)

This follows the precedent set by the existing sort(by:) and sorted(by:) methods. We chose not to include these additional overloads in the main proposal, but expect them to be considered during review.

Alternatives considered

Argument labels

`people.sort(over: { $0.age }, by: <)`

over has more of a mathematical flavor to it, where the transform argument is read as a set of values with an injective (one-to-one) relation to the elements of the sequence. For that matter, «map sorting» can be thought of as sorting the set of transformed values and permuting the elements of the original sequence accordingly. While this variant emphasizes the strong correlation of mathematics and computer science, Swift as a multi-paradigm language should strive to settle with generally understandable names that, ideally, can be easily recognized regardless of the user's background.

`people.sort(by: { $0.age }, using: <)`

The by label is a perfect candidate to describe the metric used to sort the sequence. using, on its turn, is just as fitting for a predicate or an operator. The pair in question is perhaps the only one that always boils down to a proper
sentence - «Sort(ed) by a metric using a predicate». Nevertheless, the author is convinced in the superior importance of preserving API uniformity and consistency with existing API the Standard Library developers have worked so hard to keep. With ABI Stability kicking in, we no longer have the opportunity for amendments to public API signatures and must be especially careful in this regard.

Argument order

Before adding the isExpensiveTransform flag, it was discussed that one could rearrange the arguments such that the transform closure follows the areInInreasingOrder predicate. That would have allowed the caller to make use of trailing-closure syntax:

people.sort(by: <) { $0.age }

The authors concluded that transform should be positioned before the areInIncreasingOrder predicate. This better mirrors the flow of data, where the transform closure is always called before the areInIncreasingOrder predicate (Element) -> (Value), (Value, Value) -> (Bool).

`isExpensiveTransform`

The isExpensiveTransform is an optional flag included in the proposed method so that the caller may opt-in to using the Schwartzian Transform. The authors think that this optimization is useful enough to be worth including as a part of the proposal. Since it is an optional parameter (defaulting to false), it could alternatively be excluded. Callers seeking this sort of optimization would otherwise have to use a more complex pattern that is both easy to get wrong and generally less efficient than the implementation provided in this proposal.

array.map { ($0, $0.count) }
    .sorted(by: { $0.1 })
    .map { $0.0 }

cal · March 10, 2019, 12:24am

This looks great! Glad to see somebody working in this area again. I've been hoping to come back and take another pass at this topic for a while. Overall I think this is a well-scoped proposal! It's definitely a welcome addition to the Standard Library in my book.

Some feedback about the proposed changes:

I find it a bit unusual for the (Value, Value) -> Bool comparator to come before the (Element) -> Value closure. In the logical flow of the algorithm itself, the transformation from Element to Value comes before comparing the Values, so I think it should follow that the (Element) -> Value closure should come before the (Value, Value) -> Element comparator in the parameter list.
I think that, for consistency with the existing sort(by: (Element, Element) -> Bool) spelling, that this new overload should be spelled sort(using: (Element) -> Value, by: (Value, Value) -> Bool). It seems desirable to make sure that "by:" refers to the same type of comparator closure in both of the two sort overloads.
I think it would be great to include the sort<Value: Comparable>(by: (Element) -> Value) overload in this proposal as well. There's already precedent in the Standard Library for defaulting to sort(by: <) when Element : Comparable (example: [3, 2, 7].sort() -> [2, 3, 7]).

Some meta-feedback on the proposal document itself:

As written, the proposal feels a bit too centered around the potential future KeyPath-based syntax. For example, you show the KeyPath syntax three times, and only show the actual proposed closure syntax once. Additionally, the KeyPath syntax is shown before the closure syntax that is actually being proposed in this document. I think sorting by a KeyPath is a desirable end-goal, but it probably belongs on the Future Works section. (The ideal future would probably be a world where KeyPath<Root, Value> can serve as a stand-in replacement for (Root) -> Value. Then we would get KeyPath support in map, filter, and sort all at once.)
It also may good to add a mention the Schwartzian transform in the Alternatives Considered section. I don't think we should be using that transform by default in this implementation, since I assume most people would be sorting over normal property accessors (it seems unwise to sort over an expensive computed property). The API could potentially include something like sort(using:...by:...precomputeKeys:Bool) or even sort(using:...by:...usingSchwatzianTransform:Bool), but I think that it should be omitted for the sake of simplicity.

I'd be happy to do whatever I can to help get this proposal review-ready! I've spent a lot of time thinking about this topic, and I'm very interested in seeing it get across the finish line

anthonylatsis · March 10, 2019, 1:26am

Hi Cal!

In Swift, it is a general style convention that a closure in an argument list should go last. This way trailing closure syntax is left to the user's discretion. I mentioned this decision was based on how often we need an actual braced closure for each argument. Unless we have key-path conversions, the answer for the transform closure is - always. The predicate, however, is predominantly < or >. Which one do you prefer?

chats.sort(by: { $0.lastMsg.date }, using: <)
chats.sort(using: <) { $0.lastMsg.date }

Regardless, the crucial difference is that the first example happens to enforce style. If that were not the case though, I mostly agree. On the other hand, arguments, mathematically and like words in a sentence, can almost always be freely permuted without distorting the semantics of a function call. In fact, some languages let the user determine the order of arguments at call site for themselves; something Swift might have to consider at some point.

It's worth noting that the existing sort takes a single argument, that is, there is no question about which argument a particular label suits best. To me, the first variant makes the most sense, but these design subtleties are of course open to discussion.

chats.sort(using: <, by: $0.lastMsg.date })
chats.sort(by: <, using: { $0.lastMsg.date })

xwu · March 10, 2019, 2:25am

I'll throw in one more option:

sort(on: \.age, by: <)

Then, the existing overload functions almost like a default value:

// Equivalent:
sort(on: \.self, by: <)
sort(by: <)

And as you mention we can default the other one too for 'Comparable' types:

// Equivalent:
sort(on: \.age, by: <)
sort(on: \.age)

// Equivalent:
sort(on: \.self, by: <)
sort(by: <)
sort(on: \.self)
sort()

aemino · March 10, 2019, 2:56am

This is a cool idea! One nit, though: areInIncreasingOrder seems like an odd name for the comparator function.

Also...

I'm not sure how much sense it makes to pick a specific operator as a default value for sorting; in the case of numbers, why should either ascending/descending order be the default over the other?

xwu · March 10, 2019, 3:45am

The naming of the comparator function is not part of this pitch. That issue was decided in 2016 as part of SE-0118, and is already part of the standard library (since Swift 3).

That issue is also not part of this pitch. The default sort order is already part of the standard library (and has been since before Swift 3).

aemino · March 10, 2019, 3:46am

Huh, I find that a bit unfortunate -- but good to know, thanks!

DevAndArtist · March 10, 2019, 6:34am

Would you say that the Swift‘s stdlib has to follow it‘s mistakes in its further evolution and use the same strange combination for all new yet similar algorithms? What me concens the most is not the name itself, but that it‘s prefixed with by which in conjunction reads aweful from a non-native English speaker perspective. Sure that the language should not read like a book, but to me this particular case seems to be broken and could be improved at least in new API.

Morten_Bek_Ditlevsen · March 10, 2019, 6:36am

Could we imagine overloads for secondary and tertiary keypath/sort direction pairs?
I can’t really think of a nice api for the overloads, but I could definitely imagine it being useful.

Morten_Bek_Ditlevsen · March 10, 2019, 7:07am

Perhaps an optional final parameter that is a variadic list of pairs of ‘by’ and ‘using’?

hartbit · March 10, 2019, 8:08am

cal:

I find it a bit unusual for the (Value, Value) -> Bool comparator to come before the (Element) -> Value closure. In the logical flow of the algorithm itself, the transformation from Element to Value comes before comparing the Value s, so I think it should follow that the (Element) -> Value closure should come before the (Value, Value) -> Element comparator in the parameter list.

I think that, for consistency with the existing sort(by: (Element, Element) -> Bool) spelling, that this new overload should be spelled sort(using: (Element) -> Value, by: (Value, Value) -> Bool) . It seems desirable to make sure that " by: " refers to the same type of comparator closure in both of the two sort overloads.

I think it would be great to include the sort<Value: Comparable>(by: (Element) -> Value) overload in this proposal as well. There's already precedent in the Standard Library for defaulting to sort(by: <) when Element : Comparable (example: [3, 2, 7].sort() -> [2, 3, 7] ).

I wholeheartedly agree with all of @cal's points. Otherwise, great proposal!

cal · March 10, 2019, 11:58am

There was an extensive discussion about this type of sorting in this thread from September. It's a pretty big design space, made even more complicated by the fact that we don't have variadic generics yet. It's on the roadmap, but I haven't seen any indication that it's coming any time soon.

But that being said, it may be wise to design this proposal in such a way that the (Element) -> Comparable transformation may one day be replaced with a ((Element) -> some variadic Comparable)... (that is, a variadic parameter of transformations).

This isn't strictly possible with the sort(using: (Element) -> Value, by: (Value, Value) -> Bool), because you need both the transformation and areInIncreasingOrder closures in order to complete each step in the sort process. While we could instead have the function take a tuple of those two closures (which may then be made variadic in the distance future), I think it's unwise to make the simple case more complex for the sake of some future possibility that may never come.

For that reason, it may make sense to propose func sort<Value: Comparable>(by transformation: (Element) -> Value) so that it may one day be swapped out for the variadic equivalent func sort<variadic Value: Comparable>(by transformation: ((Element) -> variadic Value)...)

I like a comment @jrose made about this in the previous thread:

Since this is proposal is ultimately about simplifying sort ergonomics, we may be better off keeping the new overload as simple as possible (i.e. just func sort<Value: Comparable>(by transformation: (Element) -> Value) for now) rather than try and build in a more complex system that tries to handle all of the less-common methods of sorting. If we make sorting easier for 90% of use cases, and leave the status-quo untouched for the other 10% of cases, then I think we've still done our job.

cal · March 10, 2019, 11:59am

Another thought worth considering is an overload like this variant from the previous thread is already possible to write today, without variadic generics:

func sort(by areInIncreasingOrder: ((Element, Element) -> Bool)...) { ... } 

people.sort(by:
    { $0.name < $1.name },
    { $0.age < $1.age },
    { $0.height < $1.height },
    { $0.salary < $1.salary })

I'm not sure where that fits in to the big picture for this proposal, but I do think it's a very noteworthy potential syntax for this topic:

anthonylatsis · March 10, 2019, 12:07pm

Swift conventions apart for a moment, note that developers will not be able to opt-in for trailing closure syntax if the transform argument ends up first:

collection.sort(arg1: { $0.metric }, arg2: <)

people.sort(arg1: {
  let calendar = Calendar.current
  return calendar.component(.day, for: $0.date)
  // Or some other more complex calculation that requires you
  // to scroll down to find out >. A user might not expect it to be
  // there as well.
}, arg2: >)

Morten_Bek_Ditlevsen · March 10, 2019, 12:13pm

Ah, of course. I didn’t think this through enough to realize that variadic generics is a prerequisite. Good to see that this topic has renewed interest!

And I definitely agree that this first step is a very welcome addition.

cal · March 10, 2019, 12:31pm

Both of the arguments are closures. If you were to use a more complex areInIncreasingOrder predicate, you would get

users.sorted(using: {
    $0.localizedCaseInsensitiveCompare($1) == .orderedAscending
}, by: { $0.name })

or

users.sorted(by: { $0.name }) {
    $0.localizedCaseInsensitiveCompare($1) == .orderedAscending
}

Since both of the parameters are closures and can be arbitrarily complex, I personally don't believe one takes precedent over the other with respect to trailing closure syntax. I recognize that < would be by far the most common value for the areInIncreasingOrder closure, but if we provided it as the default value when Value is Comparable then this issue would resolve itself for the most part. You could omit the by: parameter, meaning you'd be able to use the trailing closure syntax for (Element) -> Value if you desired.

Outside of trailing-closure considerations, It's actually quite difficult to write the (Value, Value) -> Bool closure before writing the (Element) -> Value closure. Since you haven't yet told the compiler the type of Value, you don't get any autocomplete or type checking at all in the (Value, Value) -> Bool closure until after you fill out the (Element) -> Value.

users.sorted(using: {
	$0... // At this moment, $0 doesn't have a type yet. Neither autocomplete nor type checking are available. 
}, by: <#T##(User) -> Value#>)

where <#T##(User) -> Value#> is the text representation of Xcode's editor placeholder.

In that sense the type of the (Value, Value) -> Bool closure is directly dependent on the final type of the (Element) -> Value closure (which is usually inferred indirectly by the compiler). Since one parameter has a clear Type Dependency on the other, the parameters should be ordered accordingly.

anthonylatsis · March 10, 2019, 12:58pm

Good call. > (not <) is still far more common than a custom predicate, so this makes it a double-edged sword; argument permutability would be very useful here. I'm not sure how I feel about having the arguments the other way around, since a decent part of my motivation was driven by convenience.

anthonylatsis · March 10, 2019, 1:03pm

If we do look forward to key-paths though, a third convenience method pretty much solves the issue.

cal · March 10, 2019, 2:27pm

I think it may be useful to have real-world statistics on how often < , > , and other more complicated sorting constructs ( localizedCaseInsensitiveCompare == .orderedAscending is my go-to "complex" example) actually get used in the wild.

In the Swift Source Compatibility Suite

I identified 243 instances of sort / sorted in the swift-source-compat-suite. Of these 243, 80% (194) used < either as their comparator or as the primary operation in their comparator closure. 12% (30) used >. The remaining 8% of sorts (19) mostly checked for == .orderedAscending. Only 1-2% of the sorts in the compatibility suite used some comparison operator other than <, > or == .orderedAscending.

54% (132) of the sorts provided a custom areInIncreasingOrder closure. Of those 132 closures, 53% (71) took the form { $0.someProperty < $1.someProperty } and 10% (14) took the form { $0.someProperty > $1.someProperty }.

Potential Impact of this Proposal

In all, 30% of the sorts I identified in the Source Compatibility Suite took the form { $0.someProperty < $1.someProperty }. Over half of the sorts that provided a custom areInIncreasingOrder closure took that form as well.

With these overloads below, 30% of all sorts in the compatibility suite (and the majority of sorts that provided a areInIncreasingOrder closure at all) could be improved from sorted { $0.someProperty < $1.someProperty } to sorted { $0.someProperty }.

5% could be improved from sorted { $0.someProperty > $1.someProperty } to sorted(using: { $0.someProperty }, by: >)

extension Sequence {
    func sorted<Value>(using transform: (Element) -> Value, by: (Value, Value) -> Bool) -> [Element] {
        ...
    }
    
    func sorted<Value: Comparable>(using transform: (Element) -> Value) -> [Element] {
        return self.sorted(using: transform, by: <)
    }
}

// + corresponding `sort` overloads

My raw notes:

.sorted(): 62
.sort(): 7
.sorted(by: <): 30
.sort(by: <): 0

.sorted(by: >): 12
.sort(by: >): 0

.sorted {
total: 87
block with <: 67
block with >: 8
{ $0.someProperty < $1.someProperty }: 48
{ $0.someProperty > $1.someProperty }: 6

.sorted(by: {
total: 36
block with <: 24
block with >: 8
{ $0.someProperty < $1.someProperty }: 19
{ $0.someProperty > $1.someProperty }: 6

.sort {
total: 8
block with <: 3
block with >: 2
{ $0.someProperty < $1.someProperty }: 3
{ $0.someProperty > $1.someProperty }: 2

.sort(by: {
total: 1
block with <: 1
block with >: 0
{ $0.someProperty < $1.someProperty }: 1
{ $0.someProperty > $1.someProperty }: 0

anthonylatsis · March 10, 2019, 5:21pm

Thank you for taking your time to browse the Compat suite!

To summarize

So to be precise, 34% of the sorts fall under the proposal, with 5% of them using descending order (>) and 1-2% falling back to custom predicates, some of which might be operating on transformed values. Indeed, the relative numbers on custom predicates and > are not showing any compelling reason to stick to the suggested argument order, but the overall statistics, as expected, show a very supportive tendency.

Map Sorting

Map Sorting

Introduction

Motivation

Proposed solution

Implementation

Source compatibility & ABI stability

Considerations

Provide < as the default predicate when Value: Comparable

Alternatives considered

Argument labels

people.sort(over: { $0.age }, by: <)

people.sort(by: { $0.age }, using: <)

Argument order

isExpensiveTransform

In the Swift Source Compatibility Suite

Potential Impact of this Proposal

My raw notes:

Provide `<` as the default predicate when `Value: Comparable`

`people.sort(over: { $0.age }, by: <)`

`people.sort(by: { $0.age }, using: <)`

`isExpensiveTransform`