Sorted Collection Protocol

Letan · April 6, 2018, 10:51pm

Here's a draft proposal of introducing a SortedCollection protocol. The proposal doesn't include a concrete type that implements this. However, I wouldn't mind any suggestions for a concrete type, but would valid reasons over any other types.

Sorted Collections

Proposal: SE-NNNN
Authors: Letanyan Arumugam
Review Manager: TBD
Status: Awaiting implementation

During the review process, add the following fields as needed:

Implementation: apple/swift#NNNNN
Decision Notes: Rationale, Additional Commentary
Bugs: SR-NNNN, SR-MMMM
Previous Revision: 1
Previous Proposal: SE-XXXX

Introduction

There are algorithms, and data structures rely on having sorted guarantees. However,
currently, the Swift Standard Library does not provide a way to allow types to
offer this semantic guarantee.

Swift-evolution thread: Discussion thread topic for that proposal

Motivation

Having a common entry point to write algorithms that rely on sorted order guarantees
would be desirable for allowing communication between multiple libraries and
code bases. The Standard Library may also want to provide types that would
require the semantics of this proposal later.

However, the commonality is only useful if the use cases that use the standard entry point are
useful on their own.

Performance

A typical way of improving search performance is to use a binary search which
relies on having a random access collection of sorted elements. Other structures
have structural qualities that inherently allow similar O(log n) performance.

A collection that is updated to be sorted at each entry can be more efficient
than calling sort every time you need to use the collection.

Semantics

It is useful to have a way to represent collections with an inherit sorted order.
Structures such as priority queues, sorted dictionaries and sets have useful
features. Namely displaying results or prioritising information.

Proposed solution

Introducing a protocol that provides the semantic guarantees of having a
collection that is always sorted.

protocol SortedCollection : Collection {
    mutating func insert(_ element: Element)
}

Note that there is no requirement for an Element to be Comparable. Not
having this requirement allows any conformer to be able to use any way of
providing sorted guarantees.

The proposed solution will provide types to have their elements declare sorting
by their Comparable conformance or through a closure.

Detailed design

/// A collection that guarantees that the conforming type always has its
/// elements in a given  sorted order.
///
/// For example, when elements are inserted into the collection the collection
/// will have them inserted in a order determined by some comparison.
///
///     var xs = SortedArray()
///     xs.insert(5)
///     xs.insert(10)
///     xs.insert(0)
///
/// The order that xs has its elements would be as follows, 0, 5, 10.
protocol SortedCollection : Collection {
    /// Inserts an element into correctly corresponding comparative position.
    ///
    /// This method must ensure that any element passed in must be placed in
    /// its correct position within the collection. This position can be based
    /// on the elements relation using its conformance to Comparable, if such 
    /// a conformance is available. Or through a closure or some other method
    /// that determines the relational order between two elements.
    mutating func insert(_ element: Element)
}

Source compatibility

This is purely additive.

Effect on ABI stability

No effect on ABI.

Effect on API resilience

Some types in the Standard Library may adopt this protocol and as such removal
of this would cause specific algorithms that rely on using a SortedCollection
constraint to be modified.

Alternatives considered

Constrain Element to Comparable

This would force collections to force elements to conform to Comparable if
they weren't. Collections may also want to have custom sorting that only makes
sense for specific scenarios. Take for example a table row, the sorting method is
not an inherent part of any row item, but somewhat situational. Types should be allowed to
provide this flexibility without being restrained.

Require a Sorting Closure

On the other side of requiring Element to conform to Comparable is a general
closure. Forcing conformers to fulfil is also undesirable when not
needed, as it forces the conformer to include overhead to a type that
may not require it.

CTMacUser · April 7, 2018, 6:44pm

So, the protocol assumes that there's no uniqueness requirement; that any arbitrary value may be inserted, even if there's an existing equivalent one?

For example, Set does have a uniqueness requirement, so a version of insert for it would require some way to flag a conflict/failure.

Letan · April 7, 2018, 7:15pm

Good point. I think we could make insert(_:) failable by having it return a Bool to indicate insertion failure. We could also possibly make it the same as the Standard Libraries Set.insert and return a (inserted: Bool, memberAfterInsert: Element)?

It may be weird for some structures that are okay with containing elements that are equal, but a method that captures all possible use cases is surely more necessary.

CTMacUser · April 7, 2018, 7:36pm

Hmm, maybe have a UniquelySortedCollection with an insert with a conflict-information return and a refined SortedCollection with a Void-returning insert. The later protocol's default implementation for the former's insert will call the later's insert with a no-conflict flag.

...

Oh, wait; you want both sorted-collection protocols to return the concerned element's index, so you can reference it again. But the USC version would return the index and a conflict flag.

David_Sweeris · April 7, 2018, 7:41pm

Why not only require a sorting closure if Element doesn't conform to Comparable?

(I'm not actually sure the language supports this, other than providing a default implementation that passes along < as the closure).

Letan · April 7, 2018, 8:18pm

Something like this might work well.

protocol SortedCollection : Collection {
  mutating func insert(_ element: Element) -> Index
}

protocol UniquelySortedCollection : SortedCollection {
  mutating func insert(unique element: Element) -> (Bool, Index)
}

extension UniquelySortedCollection {
  mutating func insert(_ element: Element) -> Index {
    return insert(unique: element).1
  }
}

The way I understood your comment it sounded like you wanted to represent the relationship the other way around. I don't think to do it that way makes much sense, since uniqueness is a refinement. Doing this the other way around would be to have SC erase semantics of USC.

nnnnnnnn · April 8, 2018, 12:14am

Thanks for starting up this topic, @Letan! This is a bit of a hole in the standard library, and we haven't had a good answer yet. There was a proposal that was rejected about two years ago—the response from the standard library team at the time is particularly useful to read, and the rest of the thread also has a good discussion that would be helpful to catch up on: [Review] SE-0074: Implementation of Binary Search functions - #5 by dabrahams

In my exploration of the topic after that proposal, I've found that it helps if the protocol you start with only has a requirement that lets you maintain the order of the elements. That is, SortedCollection might look like this:

protocol SortedCollection: Collection {
  /// An ordering for all elements in the collection.
  var areInIncreasingOrder: (Element, Element) -> Bool { get }
}

This way Range and ClosedRange can conform right out of the gate, and ReversedCollection can even conditionally conform.

extension Range: SortedCollection 
    where Bound: Strideable, Bound.Stride: SignedInteger {
  var areInIncreasingOrder: (Element, Element) -> Bool { 
    return (<)
  }
}

Then we want to fill in what operations make sense to do on an immutable sorted collection, before we move on to sorted collections that can have elements inserted and removed. A binary search could be provided as an implementation detail, and maybe an insertionPoint(for element: Element) -> Index method, as a way of finding where an element would go if it were to be added.

On mutation, it does seem like there's room for a RangeRemoveableCollection protocol for types that can remove groups of elements. Along with sorted collections, Set and Dictionary could even conform to that protocol, which would go in between Collection and RangeReplaceableCollection.

In general, it's difficult to design the protocols for something like this without implementing the concrete types that would conform to them. I'd strongly suggest starting there, and then finding the abstractions that make sense.

CTMacUser · April 8, 2018, 5:05am

Now looking at my post, I'm not sure which way I meant it.

But with your way, with UniquelySortedCollection refining SortedCollection, USC's version of SC's method means that a function taking a reference to a SC could get a failed Index as a return value and not know it may be bad, since you drop the conflict flag.

If you do it the other way, then when using a SC through a reference to a USC, the caller will receive a (Bool, Index) pair with the Bool always indicating success. This direction is safer.

Letan · April 8, 2018, 7:41pm

Immutable collections completely slipped my mind. But yes this would work well.

Yes, that was my thinking. We don't need to explicitly provide a binarySearch method. It could just be used in methods like find and contains.

Should this not be a separate proposal? It certainly would make sense to separate the two though.
It's the cleanest way to model this.

I've made implementations of Sorted(Set|Array|Dictionary). Do you have others that might be useful to design against?

After further usage, I agree with this. Thanks to @nnnnnnnn suggestion most of the danger would go away, I assume, since most people would use the base immutable protocol to constrain against.

Letan · April 8, 2018, 7:48pm

After further experimentation and design. I found the following hierarchy to be useful.
Please excuse the names of these for now.

protocol SortedCollection : Collection {
  associatedtype SortingElement
  typealias Comparator = (SortingElement, SortingElement) -> Bool
  typealias InsertionPoint = (containsExistingElement: Bool, index: Index)
  
  var areInIncreasingOrder: Comparator { get }
  func sortingElement(for element: Element) -> SortingElement
  func insertionPoint(for element: SortingElement) -> InsertionPoint
}

protocol SortedUniqueInsertableCollection : SortedCollection where SortingElement: Equatable {
  mutating func insert(unique element: Element) -> (Bool, Index)
}

protocol SortedInsertableCollection : SortedUniqueInsertableCollection {
  mutating func insert(_ element: Element) -> Index
}

extension SortedInsertableCollection {
  mutating func insert(unique element: Element) -> (Bool, Index) {
    let idx = insert(element)
    return (true, idx)
  }
}

The only thing I think that's not obvious about this is the SortingElement. Most of the time, such as in a Set or Array, SortingElement == Element. However, for the case of a Dictionary, we would want to sort based on only Key's. But, we would want a (Key, Value) as our Element for Sequence and other Collection methods.

kareman · May 3, 2018, 10:16pm

I implemented something like this here: GitHub - kareman/Sorted: A prototype of a Sorted protocol in Swift..

nickmoore · May 4, 2018, 7:26am

I really like this outline, however having SortedInsertableCollection inherit from SortedUniqueInsertableCollection seems strange to me.

It restricts SortedInsertableCollection to Equatable sorting elements, which seems unnecessary, and adds the curious requirement for an insert(unique:) function onto the non-unique collection.

Why this rather than:

protocol SortedInsertableCollection : SortedCollection { /*...*/ }
protocol SortedUniqueInsertableCollection : SortedCollection where SortingElement: Equatable { /*...*/ }

What is the value of having an (arguably misleading) insert(unique:) shim function on non-unique collections? Shouldn't such a function come with an expectation of unique insertion?

Letan · May 4, 2018, 8:59am

This is a good point! I only caught this after reviewing the implementations I wrote, using these protocols, but I've just not had the time to further explore the design space.

nickmoore:

Why this rather than:

protocol SortedInsertableCollection : SortedCollection { /*...*/ }
protocol SortedUniqueInsertableCollection : SortedCollection where SortingElement: Equatable { /*...*/ }

Thank you for this suggestion. I don't know why I felt the need for there to be a direct relationship between the two of them.

nnnnnnnn · May 6, 2018, 9:48pm

Kare, this is a great start! Thanks for keeping this ball rolling. Just a few questions / bits of feedback:

It looks like only random-access collections are using a binary search for finding an element, with forward and bidirectional collections still using a linear search. I think you could try a binary search with these too, since while the index advancement will still be O(n), you only need to perform O(log n) comparisons, which can be a big savings.
I don't think this extension is necessary:
```
extension SortedCollection where Element: Comparable {
    public var areInIncreasingOrder: (Element, Element) -> Bool {
        return (<)
    }
}
```
That would prevent Comparable elements from being stored in reverse order, or in an order other than their comparable-ness. Your simplified initializers for SortedArray, though, are great, since they let a user easily create a sorted array with the default order.
I think you want to specify that the SubSequence of a SortedCollection is also a SortedCollection—that would let you do your recursive searches on slices, rather than needs to pass ranges as you recurse into narrower chunks of the collection. You'll probably need to extend Slice to conditionally conform to SortedCollection, too.
If you have a sorted collection type that can be inserted into, you can generally also remove elements, right? Would removal need to be a different protocol, or could those operations join insertion? Finally, if you can remove all the elements from a collection and add more, that collection probably supports initializing an empty version of itself — I think your SortedArray type would support all these operations.
In SortedArray, you can cut out the index-moving methods if you declare RandomAccessCollection in the initial declaration and add typealias Indices = Range<Int>. Then you just need startIndex, endIndex, and subscript and you'll get the performance you're looking for.

kareman · May 7, 2018, 9:41pm

Nice, I implemented your suggestions here and it made the code a lot simpler.

1

I did this, but my implementation recalculates count for every slice, which will be expensive for non-random-access collections; I think there are ways around this though. We should probably run some benchmarks to see which implementation is most efficient. But we need an ordered non-random-access collection type, ~~and the only ones I can think of in the Standard Library are String and it's views, and it doesn't make much sense to sort those~~ like DefaultIndices<String>, though it doesn’t support duplicate elements.

2

nnnnnnnn:

I don’t think this extension is necessary:
extension SortedCollection where Element: Comparable {
    public var areInIncreasingOrder: (Element, Element) -&gt; Bool {
        return (&lt;)
    }
}
That would prevent Comparable elements from being stored in reverse order, or in an order other than their comparable-ness. Your simplified initializers for SortedArray, though, are great, since they let a user easily create a sorted array with the default order.

But a conforming type can override this, like SortedArray does. Or maybe it is better to require conforming types to explicitly implement this, instead of doing it automatically?

4

I added remove methods to SortedArray (and it has an empty initialiser), but I think only methods that can take advantage of the fact that the elements are sorted belong in the SortedCollection and MutableSortedCollection protocols. Like remove(Element) and remove(all: Element).

Note: MutableSortedCollection probably needs a better name, as it is nothing like MutableCollection.

anandabits · May 8, 2018, 9:36pm

Sorry I'm late to this thread. What is the motivation for using a areInIncreasingOrder property that returns a closure instead of using a method for comparison? Is there a need to be able to use the comparator independently of the collection instance? Won't using a closure reduce performance in some cases?

However, for the case of a Dictionary, we would want to sort based on only Key's. But, we would want a (Key, Value) as our Element for Sequence and other Collection methods.

This is an interesting observation. Something similar would be necessary to accommodate Dictionary if Swift ever gets HKT and a Functor protocol - we would only want to map Value in that case.

nnnnnnnn · May 9, 2018, 1:53am

I think you would need to pass the count as an argument to avoid the repeated calculation if you keep the recursive version of this. An iterative version wouldn't have that issue, but you'd need to track the index movements another way.

That's true — as long as collections can provide their own, it's probably fine, and simplifies conformance for things like Range.

Yeah, Mutable needs to mean mutable-at-this-index, which sorted collections can't be. Not a lot of great adjectives left for this…

nnnnnnnn · May 9, 2018, 1:57am

Is there a performance difference between those? For the general case of a sorted collection with an arbitrary sorting mechanism, the user will need to supply a comparison closure.

Letan:

protocol SortedCollection : Collection {
  associatedtype SortingElement
  typealias Comparator = (SortingElement, SortingElement) -&gt; Bool
  typealias InsertionPoint = (containsExistingElement: Bool, index: Index)

  var areInIncreasingOrder: Comparator { get }
  func sortingElement(for element: Element) -&gt; SortingElement
  func insertionPoint(for element: SortingElement) -&gt; InsertionPoint
}

Could you say a bit about this design? What motivates having a separate SortingElement associated type rather than handling that transformation in an (Element, Element) -> Bool comparator? A dictionary's comparator could ignore the value parts of the element tuple, for example.

anandabits · May 9, 2018, 2:11am

I think there would be as a difference - a closure would defeat specialization and inlining, wouldn't it? Using a closure may still be useful in many cases but I would want to see pretty strong motivation to support requiring that in the definition of the protocol.

Letan · May 9, 2018, 5:45am

It’s been a while, but iirc there was a problem getting a default comparison function for the SortedDictionary. Because tuples can’t conform (yet?) to protocols. And putting the key value pair in a struct would mean that we couldn’t use the pattern (key, value) in a for loop.