Map Sorting


(Anthony Latsis) #1

Map Sorting

Prologue

Hello to everyone! I am aware there were similar discussions before, specifically #1 and #2. Although this pitch was initially formed independently, I was lucky enough to discover these along the way;
I've read both and considered all the feedback, thanks!

Introduction

This proposal presents an addition to the Standard Library in an effort to make map sorting
and, eventually, closureless key-path sorting a functional prerequisite for Swift.

Swift-evolution thread: Map Sorting

Motivation

The straightforward way to sort a collection over a Comparable property of its Element type is to use a regular predicate, welding together the accessing and comparison of values.

struct Person {
    ...
    var age: Int
}

struct Chat {
    ...
    var lastMsg: Message
}

var people: [Person] = ...
var chats: [Chat] = ...

people.sort { $0.age < $1.age }
chats.sort { $0.lastMsg.date > $1.lastMsg.date }

Most often, however, all we need to determine sorting is a key-path on an element. Other times, a comparison closure on its own happens to be inconvenient or inefficient.

  • The $0.property < $1.property syntax often leads to copy-and-paste bugs.
  • The base metric might be obtained through non-trivial computation, where a predicate alone instigates code duplication and makes it harder to spot the sorting order.
  • When the values are expensive to retrieve, the predicate obscuring the computations also becomes a major obstacle for optimizations, such as trading memory for speed to warrant that each value is computed once per element rather than O(n logn) times. For clarity, applying the latter to a ϴ(n) operation theoretically speeds up sorting by a factor of 10 for an array of 1000 elements.

Thereby, the goal is to introduce an API that decouples the comparison of values from their calculations, favoring optimizations and bringing ergonomic advantages for an ample range of cases.

Proposed solution

Add an overload for both the nonmutating sorted and in-place sort methods on Sequence and MutableCollection respectively. A mapping closure on Element will lead the argument list, followed by the well known areInIncreasingOrder predicate and, finally, a flag that triggers the already mentioned Schwartzian Transform optimization. transform is positioned before the predicate specifically because the latter is type-dependent on and logically precedes the former, meaning, above all, fundamental type-checker and autocompletion support. Here are some example usages:

chats.sort(on: { $0.lastMsg.date }, by: >)

fileUnits.sort(
  on: { $0.raw.count(of: "func") },
  by: <,
  isExpensiveTransform: true
)

Implementation

extension Sequence {

  @inlinable
  public func sorted<Value>(
    on transform: (Element) throws -> Value,
    by areInIncreasingOrder: (Value, Value) throws -> Bool,
    isExpensiveTransform: Bool = false
  ) rethrows -> [Element] {
    guard isExpensiveTransform else {
      return try sorted {
        try areInIncreasingOrder(transform($0), transform($1))
      }
    }
    var pairs = try map {
      try (element: $0, value: transform($0))
    }
    try pairs.sort {
      try areInIncreasingOrder($0.value, $1.value)
    }

    return pairs.map { $0.element }
  }
}

extension MutableCollection where Self: RandomAccessCollection {

  @inlinable
  public mutating func sort<Value>(
    on transform: (Element) throws -> Value,
    by areInIncreasingOrder: (Value, Value) throws -> Bool,
    isExpensiveTransform: Bool = false
  ) rethrows {
    guard isExpensiveTransform else {
      return try sort {
        try areInIncreasingOrder(transform($0), transform($1))
      }
    }
    var pairs = try map {
      try (element: $0, value: transform($0))
    }
    try pairs.sort {
      try areInIncreasingOrder($0.value, $1.value)
    }

    for (i, j) in zip(indices, pairs.indices) {
      self[i] = pairs[j].element
    }
  }
}

Source compatibility & ABI stability

This is an ABI-compatible addition with no impact on source compatibility.

Alternatives considered

Why not key-paths?

Swift 4 key-path expressions enable us to get rid of the closure and hence retain the argument label, giving the call a surprising resemblance to actual human language:

people.sort(by: \.age)
chats.sort(by: \.lastMsg.date, >)

Like sort()
and sorted(), key-path sorting is a highly common practice and a fundamental case that deserves some out-of-the-box convenience. The only issue with focusing solely on key-paths in an ABI-stable world is that we risk API sprawl by neglecting custom mappings, whereas the closure-based approach is both flexible and provident in leaving space for implicit key-path-to-function conversions. Ideally, the choice between a key-path and a closure will merely be a matter of style:

chats.sort(on: { $0.lastMsg.date }, by: >)
chats.sort(on: \.lastMsg.date, by: >)

Argument label naming

people.sort(over: { $0.age }, by: <)

over has more of a mathematical flavor to it, where the transform argument is read as a set of values with an injective (one-to-one) relation to the elements of the sequence. For that matter, «map sorting» can be thought of as sorting the set of transformed values and permuting the elements of the original sequence accordingly. While this variant emphasizes the strong correlation of mathematics and computer science, Swift as a multi-paradigm language should strive to settle with generally understandable names that are recognizable, ideally, regardless of the user's background.

people.sort(by: { $0.age }, using: <)

The by label is a perfect candidate to describe the yielded metric used to sort the sequence. using, on its turn, is just as fitting for a predicate or an operator. The pair in question is perhaps the only one that always boils down to a proper sentence - «Sort-ed by a metric using a predicate». Nevertheless, the author is convinced in the superior importance of preserving API uniformity and consistency with existing API the Standard Library developers have worked so hard to keep. With ABI Stability kicking in, we no longer have an opportunity for amendments to public API signatures and must be especially careful in this regard.


(Cal Stephens) #2

This looks great! Glad to see somebody working in this area again. I've been hoping to come back and take another pass at this topic for a while. Overall I think this is a well-scoped proposal! It's definitely a welcome addition to the Standard Library in my book.

Some feedback about the proposed changes:

  • I find it a bit unusual for the (Value, Value) -> Bool comparator to come before the (Element) -> Value closure. In the logical flow of the algorithm itself, the transformation from Element to Value comes before comparing the Values, so I think it should follow that the (Element) -> Value closure should come before the (Value, Value) -> Element comparator in the parameter list.

  • I think that, for consistency with the existing sort(by: (Element, Element) -> Bool) spelling, that this new overload should be spelled sort(using: (Element) -> Value, by: (Value, Value) -> Bool). It seems desirable to make sure that "by:" refers to the same type of comparator closure in both of the two sort overloads.

  • I think it would be great to include the sort<Value: Comparable>(by: (Element) -> Value) overload in this proposal as well. There's already precedent in the Standard Library for defaulting to sort(by: <) when Element : Comparable (example: [3, 2, 7].sort() -> [2, 3, 7]).

Some meta-feedback on the proposal document itself:

  • As written, the proposal feels a bit too centered around the potential future KeyPath-based syntax. For example, you show the KeyPath syntax three times, and only show the actual proposed closure syntax once. Additionally, the KeyPath syntax is shown before the closure syntax that is actually being proposed in this document. I think sorting by a KeyPath is a desirable end-goal, but it probably belongs on the Future Works section. (The ideal future would probably be a world where KeyPath<Root, Value> can serve as a stand-in replacement for (Root) -> Value. Then we would get KeyPath support in map, filter, and sort all at once.)

  • It also may good to add a mention the Schwartzian transform in the Alternatives Considered section. I don't think we should be using that transform by default in this implementation, since I assume most people would be sorting over normal property accessors (it seems unwise to sort over an expensive computed property). The API could potentially include something like sort(using:...by:...precomputeKeys:Bool) or even sort(using:...by:...usingSchwatzianTransform:Bool), but I think that it should be omitted for the sake of simplicity.

I'd be happy to do whatever I can to help get this proposal review-ready! I've spent a lot of time thinking about this topic, and I'm very interested in seeing it get across the finish line :smile:


(Anthony Latsis) #3

Hi Cal!

In Swift, it is a general style convention that a closure in an argument list should go last. This way trailing closure syntax is left to the user's discretion. I mentioned this decision was based on how often we need an actual braced closure for each argument. Unless we have key-path conversions, the answer for the transform closure is - always. The predicate, however, is predominantly < or >. Which one do you prefer?

  • chats.sort(by: { $0.lastMsg.date }, using: <)

  • chats.sort(using: <) { $0.lastMsg.date }

Regardless, the crucial difference is that the first example happens to enforce style. If that were not the case though, I mostly agree. On the other hand, arguments, mathematically and like words in a sentence, can almost always be freely permuted without distorting the semantics of a function call. In fact, some languages let the user determine the order of arguments at call site for themselves; something Swift might have to consider at some point.

It's worth noting that the existing sort takes a single argument, that is, there is no question about which argument a particular label suits best. To me, the first variant makes the most sense, but these design subtleties are of course open to discussion.

  • chats.sort(using: <, by: $0.lastMsg.date })

  • chats.sort(by: <, using: { $0.lastMsg.date })


(Xiaodi Wu) #4

I'll throw in one more option:

sort(on: \.age, by: <)

Then, the existing overload functions almost like a default value:

// Equivalent:
sort(on: \.self, by: <)
sort(by: <)

And as you mention we can default the other one too for 'Comparable' types:

// Equivalent:
sort(on: \.age, by: <)
sort(on: \.age)

// Equivalent:
sort(on: \.self, by: <)
sort(by: <)
sort(on: \.self)
sort()

#5

This is a cool idea! One nit, though: areInIncreasingOrder seems like an odd name for the comparator function.

Also...

I'm not sure how much sense it makes to pick a specific operator as a default value for sorting; in the case of numbers, why should either ascending/descending order be the default over the other?


(Xiaodi Wu) #6

The naming of the comparator function is not part of this pitch. That issue was decided in 2016 as part of SE-0118, and is already part of the standard library (since Swift 3).

That issue is also not part of this pitch. The default sort order is already part of the standard library (and has been since before Swift 3).


#7

Huh, I find that a bit unfortunate -- but good to know, thanks!


(Adrian Zubarev) #8

Would you say that the Swift‘s stdlib has to follow it‘s mistakes in its further evolution and use the same strange combination for all new yet similar algorithms? What me concens the most is not the name itself, but that it‘s prefixed with by which in conjunction reads aweful from a non-native English speaker perspective. Sure that the language should not read like a book, but to me this particular case seems to be broken and could be improved at least in new API.


(Morten Bek Ditlevsen) #9

Could we imagine overloads for secondary and tertiary keypath/sort direction pairs?
I can’t really think of a nice api for the overloads, but I could definitely imagine it being useful. :slight_smile:


(Morten Bek Ditlevsen) #10

Perhaps an optional final parameter that is a variadic list of pairs of ‘by’ and ‘using’?


(David Hart) #11

I wholeheartedly agree with all of @cal's points. Otherwise, great proposal!


(Cal Stephens) #12

There was an extensive discussion about this type of sorting in this thread from September. It's a pretty big design space, made even more complicated by the fact that we don't have variadic generics yet. It's on the roadmap, but I haven't seen any indication that it's coming any time soon.

But that being said, it may be wise to design this proposal in such a way that the (Element) -> Comparable transformation may one day be replaced with a ((Element) -> some variadic Comparable)... (that is, a variadic parameter of transformations).

This isn't strictly possible with the sort(using: (Element) -> Value, by: (Value, Value) -> Bool), because you need both the transformation and areInIncreasingOrder closures in order to complete each step in the sort process. While we could instead have the function take a tuple of those two closures (which may then be made variadic in the distance future), I think it's unwise to make the simple case more complex for the sake of some future possibility that may never come.

For that reason, it may make sense to propose func sort<Value: Comparable>(by transformation: (Element) -> Value) so that it may one day be swapped out for the variadic equivalent func sort<variadic Value: Comparable>(by transformation: ((Element) -> variadic Value)...)

I like a comment @jrose made about this in the previous thread:

Since this is proposal is ultimately about simplifying sort ergonomics, we may be better off keeping the new overload as simple as possible (i.e. just func sort<Value: Comparable>(by transformation: (Element) -> Value) for now) rather than try and build in a more complex system that tries to handle all of the less-common methods of sorting. If we make sorting easier for 90% of use cases, and leave the status-quo untouched for the other 10% of cases, then I think we've still done our job.


(Cal Stephens) #13

Another thought worth considering is an overload like this variant from the previous thread is already possible to write today, without variadic generics:

func sort(by areInIncreasingOrder: ((Element, Element) -> Bool)...) { ... } 

people.sort(by:
    { $0.name < $1.name },
    { $0.age < $1.age },
    { $0.height < $1.height },
    { $0.salary < $1.salary })

I'm not sure where that fits in to the big picture for this proposal, but I do think it's a very noteworthy potential syntax for this topic:


(Anthony Latsis) #14

Swift conventions apart for a moment, note that developers will not be able to opt-in for trailing closure syntax if the transform argument ends up first:

collection.sort(arg1: { $0.metric }, arg2: <)

people.sort(arg1: {
  let calendar = Calendar.current
  return calendar.component(.day, for: $0.date)
  // Or some other more complex calculation that requires you
  // to scroll down to find out >. A user might not expect it to be
  // there as well.
}, arg2: >)

(Morten Bek Ditlevsen) #15

Ah, of course. I didn’t think this through enough to realize that variadic generics is a prerequisite. Good to see that this topic has renewed interest! :slight_smile:

And I definitely agree that this first step is a very welcome addition.


(Cal Stephens) #16

Both of the arguments are closures. If you were to use a more complex areInIncreasingOrder predicate, you would get

users.sorted(using: {
    $0.localizedCaseInsensitiveCompare($1) == .orderedAscending
}, by: { $0.name })

or

users.sorted(by: { $0.name }) {
    $0.localizedCaseInsensitiveCompare($1) == .orderedAscending
}

Since both of the parameters are closures and can be arbitrarily complex, I personally don't believe one takes precedent over the other with respect to trailing closure syntax. I recognize that < would be by far the most common value for the areInIncreasingOrder closure, but if we provided it as the default value when Value is Comparable then this issue would resolve itself for the most part. You could omit the by: parameter, meaning you'd be able to use the trailing closure syntax for (Element) -> Value if you desired.


Outside of trailing-closure considerations, It's actually quite difficult to write the (Value, Value) -> Bool closure before writing the (Element) -> Value closure. Since you haven't yet told the compiler the type of Value, you don't get any autocomplete or type checking at all in the (Value, Value) -> Bool closure until after you fill out the (Element) -> Value.

users.sorted(using: {
	$0... // At this moment, $0 doesn't have a type yet. Neither autocomplete nor type checking are available. 
}, by: <#T##(User) -> Value#>)

where <#T##(User) -> Value#> is the text representation of Xcode's editor placeholder.


In that sense the type of the (Value, Value) -> Bool closure is directly dependent on the final type of the (Element) -> Value closure (which is usually inferred indirectly by the compiler). Since one parameter has a clear Type Dependency on the other, the parameters should be ordered accordingly.


(Anthony Latsis) #17

Good call. > (not <) is still far more common than a custom predicate, so this makes it a double-edged sword; argument permutability would be very useful here. I'm not sure how I feel about having the arguments the other way around, since a decent part of my motivation was driven by convenience.


(Anthony Latsis) #18

If we do look forward to key-paths though, a third convenience method pretty much solves the issue.


(Cal Stephens) #20

I think it may be useful to have real-world statistics on how often < , > , and other more complicated sorting constructs ( localizedCaseInsensitiveCompare == .orderedAscending is my go-to "complex" example) actually get used in the wild.

In the Swift Source Compatibility Suite

I identified 243 instances of sort / sorted in the swift-source-compat-suite. Of these 243, 80% (194) used < either as their comparator or as the primary operation in their comparator closure. 12% (30) used >. The remaining 8% of sorts (19) mostly checked for == .orderedAscending. Only 1-2% of the sorts in the compatibility suite used some comparison operator other than <, > or == .orderedAscending.

54% (132) of the sorts provided a custom areInIncreasingOrder closure. Of those 132 closures, 53% (71) took the form { $0.someProperty < $1.someProperty } and 10% (14) took the form { $0.someProperty > $1.someProperty }.

Potential Impact of this Proposal

In all, 30% of the sorts I identified in the Source Compatibility Suite took the form { $0.someProperty < $1.someProperty }. Over half of the sorts that provided a custom areInIncreasingOrder closure took that form as well.

With these overloads below, 30% of all sorts in the compatibility suite (and the majority of sorts that provided a areInIncreasingOrder closure at all) could be improved from sorted { $0.someProperty < $1.someProperty } to sorted { $0.someProperty }.

5% could be improved from sorted { $0.someProperty > $1.someProperty } to sorted(using: { $0.someProperty }, by: >)

extension Sequence {
    func sorted<Value>(using transform: (Element) -> Value, by: (Value, Value) -> Bool) -> [Element] {
        ...
    }
    
    func sorted<Value: Comparable>(using transform: (Element) -> Value) -> [Element] {
        return self.sorted(using: transform, by: <)
    }
}

// + corresponding `sort` overloads

My raw notes:

.sorted(): 62
.sort(): 7
.sorted(by: <): 30
.sort(by: <): 0

.sorted(by: >): 12
.sort(by: >): 0

.sorted {
total: 87
block with <: 67
block with >: 8
{ $0.someProperty < $1.someProperty }: 48
{ $0.someProperty > $1.someProperty }: 6

.sorted(by: {
total: 36
block with <: 24
block with >: 8
{ $0.someProperty < $1.someProperty }: 19
{ $0.someProperty > $1.someProperty }: 6

.sort {
total: 8
block with <: 3
block with >: 2
{ $0.someProperty < $1.someProperty }: 3
{ $0.someProperty > $1.someProperty }: 2

.sort(by: {
total: 1
block with <: 1
block with >: 0
{ $0.someProperty < $1.someProperty }: 1
{ $0.someProperty > $1.someProperty }: 0


(Anthony Latsis) #21

Thank you for taking your time to browse the Compat suite!

To summarize

So to be precise, 34% of the sorts fall under the proposal, with 5% of them using descending order (>) and 1-2% falling back to custom predicates, some of which might be operating on transformed values. Indeed, the relative numbers on custom predicates and > are not showing any compelling reason to stick to the suggested argument order, but the overall statistics, as expected, show a very supportive tendency.