[PITCH] SortedCollection

collection
stdlib
sorting
(Giuseppe Lanza) #1

Following the topic about SortedCollection I've decided to give some thought to the matter and possible implementation.

Like mentioned by @nnnnnnnn here, possible design for a SortedColelction protocol should include the comparison closure that defines the order of elements. And I didn't really identify any other mandatory requirements so far:

public protocol SortedCollection: Collection {
    var areInIncreasingOrder: (Element, Element) -> Bool { get }
}

Thinking of the mutable version of this protocol, i believe that the insert should return the insertion index for easy following accesses.

//Mutable version
public protocol MutableSortedCollection: SortedCollection {
    @discardableResult
    mutating func insert(_ element: Element) -> Index
    
    @discardableResult
    mutating func remove(_ element: Element) -> Element
}

Slices of a Sorted collection should be SortedColelctions themselves:

extension Slice: SortedCollection where Base: SortedCollection {
    public var areInIncreasingOrder: (Base.Element, Base.Element) -> Bool {
        return base.areInIncreasingOrder
    }
}

finally we can have extensions to search for elements indexes taking advantage of the fact that the collection is actually sorted


extension SortedCollection {
    func _firstIndex(of element: Element) -> Index {
        var lo = startIndex
        var hi = endIndex
        while lo < hi {
            let half = distance(from: lo, to: hi) / 2
            let mid = index(lo, offsetBy: half)
            
            let midEl = self[mid]
            if areInIncreasingOrder(midEl, element) {
                lo = index(after: mid)
            } else {
                hi = mid
            }
        }
        return lo
    }
    
    func _lastIndex(of element: Element) -> Index {
        var lo = startIndex
        var hi = endIndex
        while lo < hi {
            let half = distance(from: lo, to: hi) / 2
            let mid = index(lo, offsetBy: half)
            
            let midEl = self[mid]
            if areInIncreasingOrder(element, midEl) {
                hi = mid
            } else {
                lo = index(after: mid)
            }
        }
        return lo
    }
}

public extension SortedCollection where Element: Equatable {
    func firstIndex(of element: Element) -> Index? {
        guard !isEmpty else { return nil }
        let index = _firstIndex(of: element)
        return self[index] == element ? index : nil
    }
    
    @inlinable func contains(_ element: Element) -> Bool {
        return firstIndex(of: element) != nil
    }
}

public extension SortedCollection where Self: BidirectionalCollection, Element: Equatable {
    func lastIndex(of element: Element) -> Index? {
        guard !isEmpty else { return nil }
        var index = _lastIndex(of: element)
        guard index > startIndex else { return nil }
        formIndex(before: &index)
        return self[index] == element ? index : nil
    }
}

Having this setup we can write a SortedArray struct

public struct SortedArray<Element>: RandomAccessCollection, SortedCollection {
    public typealias Index = Int
    private var storage: [Element]
    public let areInIncreasingOrder: (Element, Element) -> Bool
    
    public var startIndex: Int { return storage.startIndex }
    public var endIndex: Int { return storage.endIndex }

    public subscript(position: Int) -> Element {
        get { return storage[position] }
    }
    
    public func index(after i: Int) -> Int {
        return storage.index(after: i)
    }
}


//Initializer
extension SortedArray {
    public init<S: Sequence>(_ sequence: S, by areInIncreasingOrder: @escaping (Element, Element) -> Bool) where S.Element == Element {
        self.areInIncreasingOrder = areInIncreasingOrder
        self.storage = sequence.sorted(by: areInIncreasingOrder)
    }
    
    public init(by areInIncreasingOrder: @escaping (Element, Element) -> Bool) {
        self.areInIncreasingOrder = areInIncreasingOrder
        self.storage = [Element]()
    }
}

extension SortedArray where Element: Comparable {
    public init<S: Sequence>(_ sequence: S) where S.Element == Element {
        self.areInIncreasingOrder = ( < )
        self.storage = sequence.sorted(by: <)
    }
    
    public init() {
        self.areInIncreasingOrder = ( < )
        self.storage = [Element]()
    }
}

extension SortedArray: CustomStringConvertible {
    public var description: String { return storage.description }
}

Just brainstorming about this implementation. I would be interested in which are your thoughts and where we should go from here to get a proposal.

(David Waite) #2

One useful function is to create subsets based on the ordering - for instance, all entries that are below a certain value, larger than a certain value, or between two values. You could also have a method to bisect. The values don't need to be within the set.

Likewise, finding the value in a collection nearest the current value - closest below, above, etc.

(Gal Cohen) #3

In case it hasn't already been mentioned, there's a really nice reference implementation of a sorted array described by @ole here

(Giuseppe Lanza) #4

Thank you, i didn't see this one

1 Like
(Lance Parker) #5

"Increasing" should probably be spelled "ascending"

1 Like
(Giuseppe Lanza) #6

I adopted the same naming convention used by the swift sort function. I believe devs are already familiar with this one, but I'm not against changing it.

1 Like
(Giuseppe Lanza) #7

I was thinking of a min and max implementation but here is opened the dilemma "min according to what?" A SortedArray designed this way might have a custom comparator for Comparable elements. In this case what min should return? The minimum according to the SortedArray's areInIncreasingOrder closure, or according to Comparable?

(Riley Testut) #8

Is there a reason Element shouldn’t be required to be Comparable? IMO that follows precedent set by Set for Hashable elements and would just simplify things (unless I’m missing something obvious).

(Tino) #9

There are plenty of different ways for sorting objects, and without a custom comparator, the feature will be much less useful.
Just imagine a list of people: You surely could conform to Comparable, but that would limit yourself to a single order — whereas you might want to sort by name in one situation, and by date of birth in another.

We can still add a default comparator for comparable elements, so I see no real benefit to make this a requirement.

3 Likes
(Rod Brown) #10

Agreed. This is also consistent with other APIs within the standard library where you can use a custom comparator closure, and then adding conditional methods that apply only when the items conform to a certain protocol.

A good example of this would be firstIndex(where:) which takes a closure, and firstIndex(of:) which takes an element type only when the element type is Equatable. The "where" version makes no assumptions that the standard definition of "equability" (because the elements may not even be equatable) and allows the programmer to specify something that indicates a "match", besides equatability of the whole type.

This is extremely consistent. Another example for sorting would be an account of transactions where you sort the transactions by name, alphabetically, or by the amount being paid. Definitely makes a whole lot of sense to avoid comparable as the only way of doing this.

2 Likes
(Giuseppe Lanza) #11

I add to what has been said already, that there are also extensions to make regular use of the data structure when elements are equatable and/or comparable.

(Giuseppe Lanza) #12

I've build these two features. The question is: should they be in standard library? Do they cover a generic enough use case?

Filter can do the job, but it does it in O(n) and returns an array. These two functions can do the job in O(log n) and return a slice, so there are benefits. Which are your thoughts?