Pitch [stdlib]: making sorting algorithm selectable

stdlib
sorting
(Giuseppe Lanza) #1

It's well known that currently, it is possible to sort a collection by using the sort(ed) function(s).

It's currently unclear without peeking under the hood of swift implementation is the algorithm used for sorting, changes to this algorithm (recently it passed from introsort to timsort) and performances tradeoffs.

Different sorting algorithms might be preferred depending on the problem that needs to be solved. In certain cases it's important to know that the sorting is stable, in other cases might be important to be sure that the algorithm is in place.

With the change from swift 4 to swift 5, the sorting algorithm changed from being in place (introsort) to be stable (timsort) with undocumented time complexity (you can find it out just by looking at the swift source code)

Developers choices in sorting their collections might be affected from their current knowledge of the current implementation of the swift sorting algorithm. A change in this implementation might result in breaking backward compatibility.

Proposal
sort function should have a parameter that accepts the sorting algorithm to be used to sort the collection.

This could be achieved by introducing a new protocol SortImplementation

public protocol SortImplementation {
    static func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows
}

it will be possible at this point to define sorting algorythms as structs

struct TimSort: SortImplementation {
    static func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        print("TimSort")
    }
}

struct HeapSort: SortImplementation {
    static func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        print("HeapSort")
    }
}

struct IntroSort: SortImplementation {
    static func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        print("IntroSort")
    }
}

A SortAlgorithm struct will group and hide actual implementations of stdlib sorting algorithms

public struct SortAlgorithm {
    public static var heapSort: SortAlgorithm { return SortAlgorithm(with: HeapSort.self) }
    public static var timSort: SortAlgorithm { return SortAlgorithm(with: TimSort.self) }
    
    let implementation: SortImplementation.Type
    public init<T: SortImplementation>(with implementation: T.Type) {
        self.implementation = implementation
    }
}

the sort function would then become something similar to this

extension Array {
    mutating func sort(using implementation: SortImplementation.Type, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        try implementation.sort(array: &self, by: areInIncreasingOrder)
    }
    
    mutating func sort(using algorithm: SortAlgorithm = .timSort, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        try algorithm.implementation.sort(array: &self, by: areInIncreasingOrder)
    }
}

This way the usage of sort would become:

array.sort()
array.sort(using: .heapSort)
array.sort(using: .insertionSort)
array.sort(using: .quickSort)
array.sort(using: .introSort)
//and so on...

The default sort would still be accessible in the usual way and the existing code wouldn't change. The benefits for those users that decide to use a specific sorting algorithm are that implementation details are well known, performances are predictable, changes to the swift default sorting algorithm won't affect existing code performances or memory usage.

Developers will also be able to create custom Sorting implementations just by creating a struct conformant to SortImplementation

struct CustomSort: SortImplementation {
    static func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        print("custom sort")
    }
}

and use it this way

array.sort(using: CustomSort.self)

I would include introsort, quicksort, heapsort, insertionsort and timsort.

Alternatives
Currently, special needs are addressed by developers by writing themselves the implementation of their needed sorting algorithms, risking to duplicate a sorting algorithm already implemented by the standard library, but not exposed publicly.

It's my first attempt in contributing to the swift community, so if i made mistakes in presenting the pitch, please let me know and thank you for your time.

2 Likes
(Tino) #2

Should SortImplementation be an enum, and .heapSort etc cases (wouldn't like that…)? Or something else?

#3

How about telling the stdlib what properties you need, and stdlib choosing the fastest implementation for you?

array.sort(properties: [.stable, .inPlace])

Some combinations would be impossible, but that could be checked if/when we have static assertions Compile-Time Constant Expressions for Swift

5 Likes
(Jacob Williams) #4

I like the idea of making the sorting algorithm selectable, but I don't think it should be an enum. As more and more sorting algorithms become available that would just complicate the sort function and it also doesn't allow for developers to write their own sorting implementations that can be used either in personal/professional projects nor be shared with the community.

I think it would be best if there were a SortImplementation protocol with static sort functions that operate on comparables. Then sort would be called by passing the type of the sort implementation like:

array.sort()
array.sort(using: HeapSort.self)
array.sort(using: InsertionSort.self)
array.sort(using: MyCustomSort.self)
3 Likes
(Giuseppe Lanza) #5

I was thinking of SortImplementation as a protocol.

public protocol SortImplementation { ... }
struct HeapSort: SortImplementation { ... }
public extension SortImplementation {
    static var heapSort: SortImplementation {
        return HeapSort()
   }
}

This way developers can extend sort functions to adopt their own implementation.

The syntax would remain the one described in the first post

2 Likes
(Tim Vermeulen) #6

You can't add static vars to protocols like that – it makes heapSort available on types implementing the protocol, not on the protocol itself. But you can achieve that syntax when SortImplementation is a struct with static functions, while still allowing people to extend it to add their own sorting algorithms.

3 Likes
(Giuseppe Lanza) #7

You are right! :D yes, that would be the idea.

(Giuseppe Lanza) #8

this would be a super simplified version of my idea and which would be the real iplementation:

public protocol SortImplementation {
    func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows
}

struct TimSort: SortImplementation {
    func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        ...
    }
}


struct HeapSort: SortImplementation {
    func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
         ...
    }
}

public struct SortAlgorithm {
    static var heapSort: SortImplementation { return HeapSort() }
    static var timSort: SortImplementation { return TimSort() }
}

extension Array {
    mutating func sort(using algorithm: SortImplementation = SortAlgorithm.timSort, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        try algorithm.sort(array: &self, by: areInIncreasingOrder)
    }
}

var array = [1, 2, 3, 4]
array.sort(using: SortAlgorithm.heapSort, by: <)

What do you think?

#9

Why use structs that conform to protocols instead of just functions? heapSort(array) doesn't introduce any new protocols, uses functions to do what functions are for, and it's painfully obvious how to add a new algorithm.

4 Likes
(Giuseppe Lanza) #10

then the areInIncreasingOrder function would need to be moved in this algorithm function, or duplicated (for backward compatibility)... i'm not sure i like the idea, or i didn't understand what you are proposing.

(Giuseppe Lanza) #11

adding custom sorting algorithm might be as easy as creating a new SortImplementation

struct IntroSort: SortImplementation {
    func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        ...
    }
}

array.sort(using: IntroSort(), by: <)

//This would be optional
extension SortAlgorithm {
    static var introSort: SortImplementation { return IntroSort() }
}

array.sort(using: SortAlgorithm.introSort, by: <)
#12

You still can optionally pass areInIncreasingOrder by closure if you want

func heapSort<T>(_ array: [T], by areInIncreasingOrder: (T, T) -> Bool) -> [T] {
// implement heap sort here
	... 

}

func heapSort<T>(_ array: [T]) -> [T] where T: Comparable {
	return heapSort(array, by: <)
}

func quickSort<T>(_ array: [T], by areInIncreasingOrder: (T, T) -> Bool) -> [T] {
// implement quick sort here
    ...
}

func quickSort<T>(_ array: [T]) -> [T] where T: Comparable {
	return quickSort(array, by: <)
}
(Giuseppe Lanza) #13

but how the existing sort function would change? Are you proposing to leave it the way it is now, and just expose new sort functions?

#14

Yeah :)

(Jacob Williams) #15

I don't really see the benefit of having the SortAlgorithm struct. You end up with something more verbose (SortAlgorithm.HeapSort vs just HeapSort()) and I also don't like the idea of maintaining the SortAlgorithm struct with all the possible sorting algorithms that may be added or the idea of how many people will end up extending it with their own versions of the exact same algorithm.

I think that a protocol that says SortImplementation is painfully obvious too. Especially if it's a part of the default sort function on Array. Anyone looking at the docs would see it.

I don't like the idea of each possible sorting algorithm getting its own function. There are 43 different sorting algorithms listed on Wikipedia and I'm sure there are more out there and more will be created in the future. It would be a dangerous precedent to set by saying that any new sorting algorithm added to the standard library should get its own special function. That sounds like a nonstarter right there.

Having a sort protocol gives a standard way for people to create their own sorting algorithms that would eventually be added to the stdlib. The stdlib begins with just the trimsort or introsort. How would using functions affect the ABI? Does adding a new function to the Array struct affect its ABI? (Genuinely asking because I can never remember what things affect ABI)

#16

I think that a protocol that says SortImplementation is painfully obvious too. Especially if it's a part of the default sort function on Array. Anyone looking at the docs would see it.

While easy to understand, I don't think it's obvious. If you make a poll on twitter asking to finish the sentence "I don't like the stdlib sort algorithm, so I have to write...", a lot of answers will be "my own sort function", and not many "my own struct satisfying a protocol".

Is adding 43 structs all being a wrapper around a single function better than adding 43 functions from ABI/API standpoint? If so I retract my suggestion.

For me a sorting algorithm is just a transformation of one array into another. You give it a value, and comparator, and it spits back transformed value. It sounds like a function, I think it should be a function.

This is what you're proposing

struct FooSort: SortImplementation {
  func sort<T>(array: inout [T], by areInIncreasingOrder: (T, T) -> Bool) {
    ...
  }
}
[1, 3, 2].sort(using: FooSort.self)

I don't like how the struct is just a wrapper around a function. It would annoy me just as much as this hypotetical comparator design:

struct FooComparison: Comparator {
  func compare<T>(lhs: T, rhs: T) -> Bool {
    ...
  }
}
[1, 3, 2].sort(by: FooComparison.self)

So I would like for sorters to be functions instead of wrappers around a function, just like comparators are

  func fooSort<T>(array: inout [T], by areInIncreasingOrder: (T, T) -> Bool) {
    ...
  }
  [1, 3, 2].sort(using: fooSort)

but at this point, is there a reason to not just use them directly?

(Giuseppe Lanza) #17

I see your point.

I don't think it would affect ABI at all.
Just like you i don't believe that having a function for each possible sort would be nice.

As of which sorting algorith to include i would go for timsort, heapsort, quicksort, introsort, insertionsort in standard library.

I still see the benefit of having a protocol to guide developers in the implementation of their sorting algorithms.

With this in mind, api might look more like what you were proposing

public protocol SortImplementation {
    static func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows
}

struct TimSort: SortImplementation {
    static func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        ...
    }
}


struct HeapSort: SortImplementation {
    static func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        ...
    }
}

extension Array {
    mutating func sort(by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        try sort(using: TimSort.self, by: areInIncreasingOrder)
    }
    
    mutating func sort<T: SortImplementation>(using algorithm: T.Type, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        try algorithm.sort(array: &self, by: areInIncreasingOrder)
    }
}

var array = [1, 2, 3, 4]
array.sort(using: HeapSort.self, by: <)

extending it would mean just to write your own sort implementation

struct IntroSort: SortImplementation {
    static func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        ...
    }
}

array.sort(using: IntroSort.self, by: <)
(Giuseppe Lanza) #18

but at this point, is there a reason to not just use them directly?

That's a very good point, and basically the main reason i had SortingAlgorithm struct in place in my example: To hide implementations in stdlib

(Jacob Williams) #19

Since the SortImplementation doesn't have any associatedtype or Self requirements you should only need the one sort function on Array:

extension Array {
    mutating func sort(using algorithm: SortImplementation.Type = TimSort.self, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        try algorithm.sort(array: &self, by: areInIncreasingOrder)
    }
}
(Giuseppe Lanza) #20

How about this?


public protocol SortImplementation {
    func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows
}

struct TimSort: SortImplementation {
    func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        print("TimSort")
    }
}


struct HeapSort: SortImplementation {
    func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        print("HeapSort")
    }
}

//This hides implementations
public struct SortAlgorithm {
    public static var heapSort: SortAlgorithm { return SortAlgorithm(with: HeapSort()) }
    public static var timSort: SortAlgorithm { return SortAlgorithm(with: TimSort()) }
    
    let implementation: SortImplementation
    public init(with implementation: SortImplementation) {
        self.implementation = implementation
    }
}

extension Array {
    mutating func sort(using implementation: SortImplementation, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        try implementation.sort(array: &self, by: areInIncreasingOrder)
    }
    
    mutating func sort(using algorithm: SortAlgorithm = .timSort, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        try algorithm.implementation.sort(array: &self, by: areInIncreasingOrder)
    }
}

var array = [1, 2, 3, 4]
array.sort(using: .heapSort, by: <)

extending would be

struct IntroSort: SortImplementation {
    func sort<Element>(array: inout Array<Element>, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        print("IntroSort")
    }
}
//optionally
extension SortAlgorithm {
    static var introSort: SortAlgorithm { return SortAlgorithm(with: IntroSort()) }
}

array.sort(using: .introSort, by: <)
array.sort(using: IntroSort(), by: <)