Map Sorting


(Pierpaolo Frasa) #22

I like this. The biggest issue I see with the current sorted(by:) function and its mutable variant is that you have to make sure your areInIncreasingOrder function satisfies the requirements for a strict weak order (see the documentation), otherwise the result may be quite wrong. For example, it wouldn't be too hard to make this mistake:

["Tom", "Vladimir", "Clyde"].sorted(by: { a, b in a.count < a.count })

which could potentially behave unpredictably, making it hard to catch the bug.

Exposing an easier way for the common use case that can guarantee that the closure is always valid would make it easier to avoid such mistakes (and the closure variant still exists for people who know what they're doing).


(Pierpaolo Frasa) #23

Bar naming and the question of whether this should be in the stdlib, this is quite easy to implement and I coincidentally did so today (although using an array instead of varargs):

extension Sequence {
    public func sorted(byMultiple predicates: [(Element, Element) -> Bool]) -> [Self.Element] {
        let combinedPredicate = { (e1: Element, e2: Element) -> Bool in
            for predicate in predicates {
                if predicate(e1, e2) { return true }
                if predicate(e2, e1) { return false }
            }
            return false
        }

        return sorted(by: combinedPredicate)
    }
}

The only non-trivial thing is actually convincing yourself that this sort is well-behaved whenever the individual predicates are (and by well-behaved, I mean that they satisify the requirements for a strict weak ordering). It's possible to prove this formally, though, which means that it could make sense to add to the stdlib since that would take the burden of proof away from individual developers.


(Nate Cook) #24

This is interesting — the way I've handled this in my own code is to implement a couple of free functions that convert a closure that yields a Comparable element into a comparison closure. That is:

func ascending<T, U: Comparable>(_ getComparable: @escaping (T) -> U) -> (T, T) -> Bool {
    return { a, b in getComparable(a) < getComparable(b) }
}

func descending<T, U: Comparable>(_ getComparable: @escaping (T) -> U) -> (T, T) -> Bool {
    return { a, b in getComparable(a) > getComparable(b) }
}

Then I can use them with the existing sort methods, plus anything else that takes a binary closure:

let names = ["Alex", "Tabitha", "Ian", "Chrysta"]
let shortToLong = names.sorted(by: ascending({ $0.count }))
let longToShort = names.sorted(by: descending({ $0.count }))
let longestName = names.max(by: ascending(\.count))

The last line uses the proposed key-paths as closures syntax. I have a prefix operator defined that does that conversion, so that's the way I normally write these.


(Anthony Latsis) #25

@Ben_Cohen Is the addition of an argument with a default value ABI compatible?


(Brent Royal-Gordon) #26

No, it is not. But the addition of an overload with an extra argument is ABI-compatible, and if the original function is not inlinable or transparent, it can call through to the new one.

Edit: On second thought, the better solution is probably to implement this basically as described in the proposal, with the two-argument overload calling through to the existing one-argument form, and use @_alwaysEmitIntoClient to make it backwards-deployable.


(Ben Cohen) #27

or even if it is inlinable – you'd just need to recompile

(and remember what the old function did to make sure any invariants that code relied on are preserved forever...)


(Anthony Latsis) #28

Actually, I was weighing how reasonable would it be to add the Schwarzian transform option to each overload. So I was curious whether we could add it later without breaking ABI.


(Anthony Latsis) #29

The proposal has been updated to reflect what has been discussed so far. Because optimizations play an important role in the motivation, I decided to include the Schwartzian transform as an option on the proposed solution. For that matter, the corresponding argument label (currently isExpensiveTransform) is open for debate.


(Cal Stephens) #30

This looks great! Good improvements overall.

I'd still personally like to see an overload that defaults to by: < where Element: Comparable.

  • There's prior art for this (sort() is just sort(by: <))
  • < is by far the most common areInIncreasingOrder predicate in the wild by a very large factor:
    • 46% of sorts in the compatibility suite are just sort()/sorted(), and over half of the remaining 54% use < as their predicate — so < is used ~75% of the time.

(Anthony Latsis) #31

It was when I tried to actually write the overload and use it that I realized the compiler cannot infer an oveload based on how many shorthand parameters are used in a closure, meaning trailing closure syntax is still out. Given that, I'm not convinced being able to write sorted(on: {$0.age}) instead of sorted(on: {$0.age}, by: <) is worth two additional methods with two default parameters each. Regarding sort(), I feel like the difference between 2 and 1 arguments is less tangible than the difference between 1 and 0 arguments. Though, at the same time, I wouldn't say I am against.


(Anthony Latsis) #32

@Ben_Cohen Hello Ben,

The Schwartzian transform for in-place sorting currently works trivially by mapping the receiver to a temporary array of pairs (elt, transform(elt)), sorting that array in place relative to the transformed values and finally applying the resulting element order to the receiver using a loop:

var pairs = map { (element: $0, value: transform($0)) }

pairs.sort {
  areInIncreasingOrder($0.value, $1.value)
}

for (i, j) in zip(indices, pairs.indices) {
  self[i] = pairs[j].element
}

Memory-wise, the algorithm can be optimized by "zipping" the receiver with the transformed values and then sorting only the transformed values, simultaneously applying the same permutations to the zipped receiver. This would save us from allocating an array of pairs and the final step of copying the results to the receiver.

Am I correct in assuming this cannot be adapted without uprooting the existing sort algorithms, since we would have to work with 2 buffer pointers at the same time?


(Nate Cook) #33

Right, this would effectively be a reimplementation that compares on one buffer and then performs moves in both (and allocates two additional buffers for merging instead of one).

I don't know whether it would be an improvement to write something like (probably not):

let mapped = self.map(transform)
let sortedIndices = mapped.indices.sorted(by: { mapped[$0] < mapped[$1] })
return Array(unsafeUninitializedCapacity: self.count) { buffer, initCount in
    for i in 0..<self.count {
        buffer[i] = self[sortedIndices[i]]
    }
    initCount = self.count
}

(Anthony Latsis) #34

Sequence doesn't have count nor indices, but we still can't apply this to in-place collection sorting either because the receiver's Index is opaque unlike Array's.

EDIT

To be exact, we can do self[index(startIndex, offsetBy: sortedIndices[i])], but for in-place sorting rearranging the elements using indices would mess up the order.


(Anthony Latsis) #35

@cal I apologize for being inconsiderate, I will of course extend the «Alternatives Considered» section to mention convenience overloads.