Map Sorting

Swift conventions apart for a moment, note that developers will not be able to opt-in for trailing closure syntax if the transform argument ends up first:

collection.sort(arg1: { $0.metric }, arg2: <)

people.sort(arg1: {
  let calendar = Calendar.current
  return calendar.component(.day, for: $0.date)
  // Or some other more complex calculation that requires you
  // to scroll down to find out >. A user might not expect it to be
  // there as well.
}, arg2: >)
1 Like

Ah, of course. I didn’t think this through enough to realize that variadic generics is a prerequisite. Good to see that this topic has renewed interest! :slight_smile:

And I definitely agree that this first step is a very welcome addition.

Both of the arguments are closures. If you were to use a more complex areInIncreasingOrder predicate, you would get

users.sorted(using: {
    $0.localizedCaseInsensitiveCompare($1) == .orderedAscending
}, by: { $0.name })

or

users.sorted(by: { $0.name }) {
    $0.localizedCaseInsensitiveCompare($1) == .orderedAscending
}

Since both of the parameters are closures and can be arbitrarily complex, I personally don't believe one takes precedent over the other with respect to trailing closure syntax. I recognize that < would be by far the most common value for the areInIncreasingOrder closure, but if we provided it as the default value when Value is Comparable then this issue would resolve itself for the most part. You could omit the by: parameter, meaning you'd be able to use the trailing closure syntax for (Element) -> Value if you desired.


Outside of trailing-closure considerations, It's actually quite difficult to write the (Value, Value) -> Bool closure before writing the (Element) -> Value closure. Since you haven't yet told the compiler the type of Value, you don't get any autocomplete or type checking at all in the (Value, Value) -> Bool closure until after you fill out the (Element) -> Value.

users.sorted(using: {
	$0... // At this moment, $0 doesn't have a type yet. Neither autocomplete nor type checking are available. 
}, by: <#T##(User) -> Value#>)

where <#T##(User) -> Value#> is the text representation of Xcode's editor placeholder.


In that sense the type of the (Value, Value) -> Bool closure is directly dependent on the final type of the (Element) -> Value closure (which is usually inferred indirectly by the compiler). Since one parameter has a clear Type Dependency on the other, the parameters should be ordered accordingly.

1 Like

Good call. > (not <) is still far more common than a custom predicate, so this makes it a double-edged sword; argument permutability would be very useful here. I'm not sure how I feel about having the arguments the other way around, since a decent part of my motivation was driven by convenience.

If we do look forward to key-paths though, a third convenience method pretty much solves the issue.

I think it may be useful to have real-world statistics on how often < , > , and other more complicated sorting constructs ( localizedCaseInsensitiveCompare == .orderedAscending is my go-to "complex" example) actually get used in the wild.

In the Swift Source Compatibility Suite

I identified 243 instances of sort / sorted in the swift-source-compat-suite. Of these 243, 80% (194) used < either as their comparator or as the primary operation in their comparator closure. 12% (30) used >. The remaining 8% of sorts (19) mostly checked for == .orderedAscending. Only 1-2% of the sorts in the compatibility suite used some comparison operator other than <, > or == .orderedAscending.

54% (132) of the sorts provided a custom areInIncreasingOrder closure. Of those 132 closures, 53% (71) took the form { $0.someProperty < $1.someProperty } and 10% (14) took the form { $0.someProperty > $1.someProperty }.

Potential Impact of this Proposal

In all, 30% of the sorts I identified in the Source Compatibility Suite took the form { $0.someProperty < $1.someProperty }. Over half of the sorts that provided a custom areInIncreasingOrder closure took that form as well.

With these overloads below, 30% of all sorts in the compatibility suite (and the majority of sorts that provided a areInIncreasingOrder closure at all) could be improved from sorted { $0.someProperty < $1.someProperty } to sorted { $0.someProperty }.

5% could be improved from sorted { $0.someProperty > $1.someProperty } to sorted(using: { $0.someProperty }, by: >)

extension Sequence {
    func sorted<Value>(using transform: (Element) -> Value, by: (Value, Value) -> Bool) -> [Element] {
        ...
    }
    
    func sorted<Value: Comparable>(using transform: (Element) -> Value) -> [Element] {
        return self.sorted(using: transform, by: <)
    }
}

// + corresponding `sort` overloads

My raw notes:

.sorted(): 62
.sort(): 7
.sorted(by: <): 30
.sort(by: <): 0

.sorted(by: >): 12
.sort(by: >): 0

.sorted {
total: 87
block with <: 67
block with >: 8
{ $0.someProperty < $1.someProperty }: 48
{ $0.someProperty > $1.someProperty }: 6

.sorted(by: {
total: 36
block with <: 24
block with >: 8
{ $0.someProperty < $1.someProperty }: 19
{ $0.someProperty > $1.someProperty }: 6

.sort {
total: 8
block with <: 3
block with >: 2
{ $0.someProperty < $1.someProperty }: 3
{ $0.someProperty > $1.someProperty }: 2

.sort(by: {
total: 1
block with <: 1
block with >: 0
{ $0.someProperty < $1.someProperty }: 1
{ $0.someProperty > $1.someProperty }: 0

3 Likes

Thank you for taking your time to browse the Compat suite!

To summarize

So to be precise, 34% of the sorts fall under the proposal, with 5% of them using descending order (>) and 1-2% falling back to custom predicates, some of which might be operating on transformed values. Indeed, the relative numbers on custom predicates and > are not showing any compelling reason to stick to the suggested argument order, but the overall statistics, as expected, show a very supportive tendency.

1 Like

I like this. The biggest issue I see with the current sorted(by:) function and its mutable variant is that you have to make sure your areInIncreasingOrder function satisfies the requirements for a strict weak order (see the documentation), otherwise the result may be quite wrong. For example, it wouldn't be too hard to make this mistake:

["Tom", "Vladimir", "Clyde"].sorted(by: { a, b in a.count < a.count })

which could potentially behave unpredictably, making it hard to catch the bug.

Exposing an easier way for the common use case that can guarantee that the closure is always valid would make it easier to avoid such mistakes (and the closure variant still exists for people who know what they're doing).

1 Like

Bar naming and the question of whether this should be in the stdlib, this is quite easy to implement and I coincidentally did so today (although using an array instead of varargs):

extension Sequence {
    public func sorted(byMultiple predicates: [(Element, Element) -> Bool]) -> [Self.Element] {
        let combinedPredicate = { (e1: Element, e2: Element) -> Bool in
            for predicate in predicates {
                if predicate(e1, e2) { return true }
                if predicate(e2, e1) { return false }
            }
            return false
        }

        return sorted(by: combinedPredicate)
    }
}

The only non-trivial thing is actually convincing yourself that this sort is well-behaved whenever the individual predicates are (and by well-behaved, I mean that they satisify the requirements for a strict weak ordering). It's possible to prove this formally, though, which means that it could make sense to add to the stdlib since that would take the burden of proof away from individual developers.

1 Like

This is interesting — the way I've handled this in my own code is to implement a couple of free functions that convert a closure that yields a Comparable element into a comparison closure. That is:

func ascending<T, U: Comparable>(_ getComparable: @escaping (T) -> U) -> (T, T) -> Bool {
    return { a, b in getComparable(a) < getComparable(b) }
}

func descending<T, U: Comparable>(_ getComparable: @escaping (T) -> U) -> (T, T) -> Bool {
    return { a, b in getComparable(a) > getComparable(b) }
}

Then I can use them with the existing sort methods, plus anything else that takes a binary closure:

let names = ["Alex", "Tabitha", "Ian", "Chrysta"]
let shortToLong = names.sorted(by: ascending({ $0.count }))
let longToShort = names.sorted(by: descending({ $0.count }))
let longestName = names.max(by: ascending(\.count))

The last line uses the proposed key-paths as closures syntax. I have a prefix operator defined that does that conversion, so that's the way I normally write these.

9 Likes

@Ben_Cohen Is the addition of an argument with a default value ABI compatible?

No, it is not. But the addition of an overload with an extra argument is ABI-compatible, and if the original function is not inlinable or transparent, it can call through to the new one.

Edit: On second thought, the better solution is probably to implement this basically as described in the proposal, with the two-argument overload calling through to the existing one-argument form, and use @_alwaysEmitIntoClient to make it backwards-deployable.

3 Likes

or even if it is inlinable – you'd just need to recompile

(and remember what the old function did to make sure any invariants that code relied on are preserved forever...)

2 Likes

Actually, I was weighing how reasonable would it be to add the Schwarzian transform option to each overload. So I was curious whether we could add it later without breaking ABI.

The proposal has been updated to reflect what has been discussed so far. Because optimizations play an important role in the motivation, I decided to include the Schwartzian transform as an option on the proposed solution. For that matter, the corresponding argument label (currently isExpensiveTransform) is open for debate.

This looks great! Good improvements overall.

I'd still personally like to see an overload that defaults to by: < where Element: Comparable.

  • There's prior art for this (sort() is just sort(by: <))
  • < is by far the most common areInIncreasingOrder predicate in the wild by a very large factor:
    • 46% of sorts in the compatibility suite are just sort()/sorted(), and over half of the remaining 54% use < as their predicate — so < is used ~75% of the time.

It was when I tried to actually write the overload and use it that I realized the compiler cannot infer an oveload based on how many shorthand parameters are used in a closure, meaning trailing closure syntax is still out. Given that, I'm not convinced being able to write sorted(on: {$0.age}) instead of sorted(on: {$0.age}, by: <) is worth two additional methods with two default parameters each. Regarding sort(), I feel like the difference between 2 and 1 arguments is less tangible than the difference between 1 and 0 arguments. Though, at the same time, I wouldn't say I am against.

@Ben_Cohen Hello Ben,

The Schwartzian transform for in-place sorting currently works trivially by mapping the receiver to a temporary array of pairs (elt, transform(elt)), sorting that array in place relative to the transformed values and finally applying the resulting element order to the receiver using a loop:

var pairs = map { (element: $0, value: transform($0)) }

pairs.sort {
  areInIncreasingOrder($0.value, $1.value)
}

for (i, j) in zip(indices, pairs.indices) {
  self[i] = pairs[j].element
}

Memory-wise, the algorithm can be optimized by "zipping" the receiver with the transformed values and then sorting only the transformed values, simultaneously applying the same permutations to the zipped receiver. This would save us from allocating an array of pairs and the final step of copying the results to the receiver.

Am I correct in assuming this cannot be adapted without uprooting the existing sort algorithms, since we would have to work with 2 buffer pointers at the same time?

Right, this would effectively be a reimplementation that compares on one buffer and then performs moves in both (and allocates two additional buffers for merging instead of one).

I don't know whether it would be an improvement to write something like (probably not):

let mapped = self.map(transform)
let sortedIndices = mapped.indices.sorted(by: { mapped[$0] < mapped[$1] })
return Array(unsafeUninitializedCapacity: self.count) { buffer, initCount in
    for i in 0..<self.count {
        buffer[i] = self[sortedIndices[i]]
    }
    initCount = self.count
}
1 Like

Sequence doesn't have count nor indices, but we still can't apply this to in-place collection sorting either because the receiver's Index is opaque unlike Array's.

EDIT

To be exact, we can do self[index(startIndex, offsetBy: sortedIndices[i])], but for in-place sorting rearranging the elements using indices would mess up the order.