Map Sorting

Huh, I find that a bit unfortunate -- but good to know, thanks!

Would you say that the Swift‘s stdlib has to follow it‘s mistakes in its further evolution and use the same strange combination for all new yet similar algorithms? What me concens the most is not the name itself, but that it‘s prefixed with by which in conjunction reads aweful from a non-native English speaker perspective. Sure that the language should not read like a book, but to me this particular case seems to be broken and could be improved at least in new API.

Could we imagine overloads for secondary and tertiary keypath/sort direction pairs?
I can’t really think of a nice api for the overloads, but I could definitely imagine it being useful. :slight_smile:

Perhaps an optional final parameter that is a variadic list of pairs of ‘by’ and ‘using’?

I wholeheartedly agree with all of @cal's points. Otherwise, great proposal!

1 Like

There was an extensive discussion about this type of sorting in this thread from September. It's a pretty big design space, made even more complicated by the fact that we don't have variadic generics yet. It's on the roadmap, but I haven't seen any indication that it's coming any time soon.

But that being said, it may be wise to design this proposal in such a way that the (Element) -> Comparable transformation may one day be replaced with a ((Element) -> some variadic Comparable)... (that is, a variadic parameter of transformations).

This isn't strictly possible with the sort(using: (Element) -> Value, by: (Value, Value) -> Bool), because you need both the transformation and areInIncreasingOrder closures in order to complete each step in the sort process. While we could instead have the function take a tuple of those two closures (which may then be made variadic in the distance future), I think it's unwise to make the simple case more complex for the sake of some future possibility that may never come.

For that reason, it may make sense to propose func sort<Value: Comparable>(by transformation: (Element) -> Value) so that it may one day be swapped out for the variadic equivalent func sort<variadic Value: Comparable>(by transformation: ((Element) -> variadic Value)...)

I like a comment @jrose made about this in the previous thread:

Since this is proposal is ultimately about simplifying sort ergonomics, we may be better off keeping the new overload as simple as possible (i.e. just func sort<Value: Comparable>(by transformation: (Element) -> Value) for now) rather than try and build in a more complex system that tries to handle all of the less-common methods of sorting. If we make sorting easier for 90% of use cases, and leave the status-quo untouched for the other 10% of cases, then I think we've still done our job.

1 Like

Another thought worth considering is an overload like this variant from the previous thread is already possible to write today, without variadic generics:

func sort(by areInIncreasingOrder: ((Element, Element) -> Bool)...) { ... } 

people.sort(by:
    { $0.name < $1.name },
    { $0.age < $1.age },
    { $0.height < $1.height },
    { $0.salary < $1.salary })

I'm not sure where that fits in to the big picture for this proposal, but I do think it's a very noteworthy potential syntax for this topic:

2 Likes

Swift conventions apart for a moment, note that developers will not be able to opt-in for trailing closure syntax if the transform argument ends up first:

collection.sort(arg1: { $0.metric }, arg2: <)

people.sort(arg1: {
  let calendar = Calendar.current
  return calendar.component(.day, for: $0.date)
  // Or some other more complex calculation that requires you
  // to scroll down to find out >. A user might not expect it to be
  // there as well.
}, arg2: >)
1 Like

Ah, of course. I didn’t think this through enough to realize that variadic generics is a prerequisite. Good to see that this topic has renewed interest! :slight_smile:

And I definitely agree that this first step is a very welcome addition.

Both of the arguments are closures. If you were to use a more complex areInIncreasingOrder predicate, you would get

users.sorted(using: {
    $0.localizedCaseInsensitiveCompare($1) == .orderedAscending
}, by: { $0.name })

or

users.sorted(by: { $0.name }) {
    $0.localizedCaseInsensitiveCompare($1) == .orderedAscending
}

Since both of the parameters are closures and can be arbitrarily complex, I personally don't believe one takes precedent over the other with respect to trailing closure syntax. I recognize that < would be by far the most common value for the areInIncreasingOrder closure, but if we provided it as the default value when Value is Comparable then this issue would resolve itself for the most part. You could omit the by: parameter, meaning you'd be able to use the trailing closure syntax for (Element) -> Value if you desired.


Outside of trailing-closure considerations, It's actually quite difficult to write the (Value, Value) -> Bool closure before writing the (Element) -> Value closure. Since you haven't yet told the compiler the type of Value, you don't get any autocomplete or type checking at all in the (Value, Value) -> Bool closure until after you fill out the (Element) -> Value.

users.sorted(using: {
	$0... // At this moment, $0 doesn't have a type yet. Neither autocomplete nor type checking are available. 
}, by: <#T##(User) -> Value#>)

where <#T##(User) -> Value#> is the text representation of Xcode's editor placeholder.


In that sense the type of the (Value, Value) -> Bool closure is directly dependent on the final type of the (Element) -> Value closure (which is usually inferred indirectly by the compiler). Since one parameter has a clear Type Dependency on the other, the parameters should be ordered accordingly.

1 Like

Good call. > (not <) is still far more common than a custom predicate, so this makes it a double-edged sword; argument permutability would be very useful here. I'm not sure how I feel about having the arguments the other way around, since a decent part of my motivation was driven by convenience.

If we do look forward to key-paths though, a third convenience method pretty much solves the issue.

I think it may be useful to have real-world statistics on how often < , > , and other more complicated sorting constructs ( localizedCaseInsensitiveCompare == .orderedAscending is my go-to "complex" example) actually get used in the wild.

In the Swift Source Compatibility Suite

I identified 243 instances of sort / sorted in the swift-source-compat-suite. Of these 243, 80% (194) used < either as their comparator or as the primary operation in their comparator closure. 12% (30) used >. The remaining 8% of sorts (19) mostly checked for == .orderedAscending. Only 1-2% of the sorts in the compatibility suite used some comparison operator other than <, > or == .orderedAscending.

54% (132) of the sorts provided a custom areInIncreasingOrder closure. Of those 132 closures, 53% (71) took the form { $0.someProperty < $1.someProperty } and 10% (14) took the form { $0.someProperty > $1.someProperty }.

Potential Impact of this Proposal

In all, 30% of the sorts I identified in the Source Compatibility Suite took the form { $0.someProperty < $1.someProperty }. Over half of the sorts that provided a custom areInIncreasingOrder closure took that form as well.

With these overloads below, 30% of all sorts in the compatibility suite (and the majority of sorts that provided a areInIncreasingOrder closure at all) could be improved from sorted { $0.someProperty < $1.someProperty } to sorted { $0.someProperty }.

5% could be improved from sorted { $0.someProperty > $1.someProperty } to sorted(using: { $0.someProperty }, by: >)

extension Sequence {
    func sorted<Value>(using transform: (Element) -> Value, by: (Value, Value) -> Bool) -> [Element] {
        ...
    }
    
    func sorted<Value: Comparable>(using transform: (Element) -> Value) -> [Element] {
        return self.sorted(using: transform, by: <)
    }
}

// + corresponding `sort` overloads

My raw notes:

.sorted(): 62
.sort(): 7
.sorted(by: <): 30
.sort(by: <): 0

.sorted(by: >): 12
.sort(by: >): 0

.sorted {
total: 87
block with <: 67
block with >: 8
{ $0.someProperty < $1.someProperty }: 48
{ $0.someProperty > $1.someProperty }: 6

.sorted(by: {
total: 36
block with <: 24
block with >: 8
{ $0.someProperty < $1.someProperty }: 19
{ $0.someProperty > $1.someProperty }: 6

.sort {
total: 8
block with <: 3
block with >: 2
{ $0.someProperty < $1.someProperty }: 3
{ $0.someProperty > $1.someProperty }: 2

.sort(by: {
total: 1
block with <: 1
block with >: 0
{ $0.someProperty < $1.someProperty }: 1
{ $0.someProperty > $1.someProperty }: 0

3 Likes

Thank you for taking your time to browse the Compat suite!

To summarize

So to be precise, 34% of the sorts fall under the proposal, with 5% of them using descending order (>) and 1-2% falling back to custom predicates, some of which might be operating on transformed values. Indeed, the relative numbers on custom predicates and > are not showing any compelling reason to stick to the suggested argument order, but the overall statistics, as expected, show a very supportive tendency.

1 Like

I like this. The biggest issue I see with the current sorted(by:) function and its mutable variant is that you have to make sure your areInIncreasingOrder function satisfies the requirements for a strict weak order (see the documentation), otherwise the result may be quite wrong. For example, it wouldn't be too hard to make this mistake:

["Tom", "Vladimir", "Clyde"].sorted(by: { a, b in a.count < a.count })

which could potentially behave unpredictably, making it hard to catch the bug.

Exposing an easier way for the common use case that can guarantee that the closure is always valid would make it easier to avoid such mistakes (and the closure variant still exists for people who know what they're doing).

1 Like

Bar naming and the question of whether this should be in the stdlib, this is quite easy to implement and I coincidentally did so today (although using an array instead of varargs):

extension Sequence {
    public func sorted(byMultiple predicates: [(Element, Element) -> Bool]) -> [Self.Element] {
        let combinedPredicate = { (e1: Element, e2: Element) -> Bool in
            for predicate in predicates {
                if predicate(e1, e2) { return true }
                if predicate(e2, e1) { return false }
            }
            return false
        }

        return sorted(by: combinedPredicate)
    }
}

The only non-trivial thing is actually convincing yourself that this sort is well-behaved whenever the individual predicates are (and by well-behaved, I mean that they satisify the requirements for a strict weak ordering). It's possible to prove this formally, though, which means that it could make sense to add to the stdlib since that would take the burden of proof away from individual developers.

1 Like

This is interesting — the way I've handled this in my own code is to implement a couple of free functions that convert a closure that yields a Comparable element into a comparison closure. That is:

func ascending<T, U: Comparable>(_ getComparable: @escaping (T) -> U) -> (T, T) -> Bool {
    return { a, b in getComparable(a) < getComparable(b) }
}

func descending<T, U: Comparable>(_ getComparable: @escaping (T) -> U) -> (T, T) -> Bool {
    return { a, b in getComparable(a) > getComparable(b) }
}

Then I can use them with the existing sort methods, plus anything else that takes a binary closure:

let names = ["Alex", "Tabitha", "Ian", "Chrysta"]
let shortToLong = names.sorted(by: ascending({ $0.count }))
let longToShort = names.sorted(by: descending({ $0.count }))
let longestName = names.max(by: ascending(\.count))

The last line uses the proposed key-paths as closures syntax. I have a prefix operator defined that does that conversion, so that's the way I normally write these.

11 Likes

@Ben_Cohen Is the addition of an argument with a default value ABI compatible?

No, it is not. But the addition of an overload with an extra argument is ABI-compatible, and if the original function is not inlinable or transparent, it can call through to the new one.

Edit: On second thought, the better solution is probably to implement this basically as described in the proposal, with the two-argument overload calling through to the existing one-argument form, and use @_alwaysEmitIntoClient to make it backwards-deployable.

3 Likes

or even if it is inlinable – you'd just need to recompile

(and remember what the old function did to make sure any invariants that code relied on are preserved forever...)

2 Likes