Why must collection indices be comparable?

I received this exception, and learned that the Index type on every Collection must be Comparable:

Only BidirectionalCollections can have end come before start

This requirement seems overly restrictive, and at odds with the design of Collection. The purpose of Collection, as opposed to RandomAccessCollection, is to describe containers that don't have immediate ordered access to their elements. For a Collection, even computing the distance between two indices is allowed to take linear time. If I don't have random access or cheap distance, how can I implement comparison, except by linearly iterating over the data structure?

Here are two examples of data structures that I could see conforming to Collection, were it not for this requirement:

  1. Linked lists. The index type could be the node itself, and index(after:) would return the next node (see the sketch after this list).
  2. Node-based data structures like trees, which may pool and reuse nodes. The index type could be an index into a node pool, so its ordering would have nothing to do with the iteration order.
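
For concreteness, here is a rough sketch of the node-as-index design I have in mind for the first case (all names made up). Equality by identity is trivial; it's < that I don't see how to write without walking the list:

final class Node<Element> {
    var value: Element
    var next: Node? = nil
    init(_ value: Element) { self.value = value }
}

struct NodeIndex<Element>: Equatable {
    var node: Node<Element>?   // nil plays the role of endIndex

    static func == (lhs: Self, rhs: Self) -> Bool {
        return lhs.node === rhs.node
    }

    // static func < (lhs: Self, rhs: Self) -> Bool { ??? }
    // Without a stored offset, the only way to order two nodes is to
    // traverse from one of them looking for the other.
}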

Can anyone point out why Equatable is insufficient, or how either of these data structures could be implemented under such a restriction? (I can provide further detail if desired.)

2 Likes

I've run into this issue before. It isn't ideal, but the way I solved it for a linked list was to have the index be an (offset, element) pair, with the offset only used for comparison (and nil signifying the end). This only works because Index has semantics where it is invalidated on mutation, and it is not necessarily sensible to compare indices from different collections.

2 Likes

Thanks for sharing that. I also just saw a toy LinkedList that actually iterated over the entire list in its comparison implementation.

Here’s the rationale from SE-0065: A New Model for Collections and Indices:

The Comparable Requirement on Indices

In this model indices store the minimal amount of information required to describe an element's position. Usually an index can be represented with one or two Ints that efficiently encode the path to the element from the root of a data structure. Since one is free to choose the encoding of the “path”, we think it is possible to choose it in such a way that indices are cheaply comparable. That has been the case for all of the indices required to implement the standard library, and a few others we investigated while researching this change.

It's worth noting that this requirement isn't strictly necessary. Without it, though, indices would have no requirements beyond Equatable, and creation of a Range<T> would have to be allowed for any T conforming to Equatable. As a consequence, most interesting range operations, such as containment checks, would be unavailable unless T were also Comparable, and we'd be unable to provide bounds-checking in the general case.

That said, the requirement has real benefits. For example, it allows us to support distance measurement between arbitrary indices, even in collections without random access traversal. In the old model, x.distance(to: y) for these collections had the undetectable precondition that x precede y, with unpredictable consequences for violation in the general case.

4 Likes

There it is! Thank you very much.

My interpretation of their rationale is that it's easy to keep track of where you are in the path, even though you may not have the entire path's ordering immediately accessible. The only indices that need to be correctly ordered at all times are startIndex and endIndex; all intermediate indices can be constructed ad hoc in index(after:). We should not be doing complex calculations to compare.

I have written a sample linked list based on this interpretation; it uses the idea Greg mentioned. Do we think this implementation is canonical in terms of that proposal?

struct LinkedList<Element>: MutableCollection {
    subscript(i: Index) -> Element {
        get {
            return i.node!.value
        }
        set {
            i.node!.value = newValue
        }
    }
    
    func index(after i: Index) -> Index {
        return Index(
            node: i.node!.next,
            pathIndex: i.pathIndex + 1
        )
    }
    
    class Node {
        var value: Element
        var next: Node? = nil
        
        init(_ x: Element) {
            value = x
        }
    }
    
    var head: Node? = nil
    var count = 0
    
    var startIndex: Index {
        return Index(
            node: head,
            pathIndex: 0
        )
    }
    
    var endIndex: Index {
        return Index(
            node: nil,
            pathIndex: count
        )
    }
    
    struct Index: Comparable {
        // pathIndex caches the node's offset from the head; it exists only to
        // make indices cheaply Comparable and goes stale on mutation.
        var node: Node?
        var pathIndex: Int
        
        static func < (lhs: LinkedList<Element>.Index, rhs: LinkedList<Element>.Index) -> Bool {
            if lhs.node === rhs.node {
                assert(lhs.pathIndex == rhs.pathIndex)
            }
            return lhs.pathIndex < rhs.pathIndex
        }
        
        static func == (lhs: LinkedList<Element>.Index, rhs: LinkedList<Element>.Index) -> Bool {
            if lhs.node === rhs.node {
                assert(lhs.pathIndex == rhs.pathIndex)
                return true
            }
            return false
        }
    }
    
    init<S: Sequence>(_ s: S) where S.Element == Element {
        // insert(_:) prepends, so reverse afterwards to preserve source order.
        for x in s {
            insert(x)
        }
        reverse()
    }
    
    // Mutating functions invalidate indices
    mutating func insert(_ x: Element) {
        let n = Node(x)
        n.next = head
        head = n
        count += 1
    }
    
    mutating func insert(_ x: Element, after: Index) {
        let n = Node(x)
        n.next = after.node!.next
        after.node!.next = n
        count += 1
    }
    
    mutating func reverse() {
        var previous: Node? = nil
        
        var node = head
        while node != nil {
            let next = node!.next
            node!.next = previous
            previous = node
            node = next
        }
        
        head = previous
    }
}
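
For example, exercising it like this (a quick, untested usage sketch):

var list = LinkedList([1, 2, 3])
let i = list.startIndex
let j = list.index(after: i)
assert(i < j)                                           // ordered via pathIndex
assert(list.distance(from: i, to: list.endIndex) == 3)
list[j] = 20
assert(Array(list) == [1, 20, 3])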

These same goals seem like they can be achieved with conditional conformance, though removing the requirement would be source-breaking and might not be worth the trouble.

1 Like

I'm not sure I understand that part, but yes, your implementation looks fine to me. See also Dropping Comparable requirement for indices.
FWIW, to this day, I still turn this design choice over in my head wondering if we got it right.
Note also that you could invalidate fewer indices by counting down instead of up :wink:

2 Likes

Even if the requirement needed to be left in place for compatibility, the runtime checks could be removed so that a bogus implementation would still work. But I agree, it looks like this was settled a long time ago.

I haven't thought much about Range, so I don't have an informed opinion on how it is affected.

These same goals seem like they can be achieved with conditional conformance

The 3rd bullet here explains why that might not be a good idea.

2 Likes

Thanks for the thread, and I'll play with "counting down".
To explain what I meant in other words: startIndex and endIndex have to be constructible at any time, in (hopefully) constant time, while the intermediate indices can be constructed as they are asked for, so the Comparable requirement is looser than it immediately appears.
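
If I understand the "counting down" suggestion correctly, the idea would be something like this (a rough, untested sketch adapting the Index from my list above, with invented names):

struct Index: Comparable {
    var node: Node?
    var remaining: Int   // steps left until endIndex, instead of offset from startIndex

    static func < (lhs: Index, rhs: Index) -> Bool {
        // Earlier positions have *more* steps remaining.
        return lhs.remaining > rhs.remaining
    }
    static func == (lhs: Index, rhs: Index) -> Bool {
        return lhs.node === rhs.node
    }
}

// startIndex:       Index(node: head, remaining: count)
// endIndex:         Index(node: nil,  remaining: 0)
// index(after: i):  Index(node: i.node!.next, remaining: i.remaining - 1)
//
// With this layout, insert(_:) at the head only changes startIndex; indices
// already pointing at existing nodes keep the correct `remaining` value, so
// they stay valid, and endIndex never changes at all.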

I agree plain old linked lists aren't terribly useful. But I think there are variants that are. What I have in mind is Lisp-style lists implemented with a pool, like https://raw.githubusercontent.com/rjernst/stepanov-components-course/master/code/lecture12/list_pool.h. I can iterate them in the way we just discussed, but it's unfortunate that I can't "set-cdr!" without invalidating everything. Alex suggests that something like LinkedIterators is a perfectly valid iterator concept, and I don't see how a Collection variant could model a similar concept with this constraint. But this is obviously what you are discussing in that thread.

All true. That's how indices work in general.

so the Comparable requirement is looser than it immediately appears.

I guess whether that's true depends on how it immediately appears to you. To me it looks exactly as loose as it actually is :wink:

Salutations! It's always great to meet another Stepanovian.
Nit: to do set-cdr! you don't have to invalidate everything; just those indices on one side of the insertion point.

Yes, you and Alex are right; this is a perfectly valid use case for which structural mutations may needlessly (by some measure) invalidate indices. I'm not sure this is a problem for any generic code, since in an arbitrary collection, structural mutations already invalidate arbitrary indices (or invalidate their correspondence with element values — see Array) and thus their usefulness as a persistent general position indicator in a generic context.

Ah, you mean this Linked Iterator. I note that Alex doesn't say anything about invalidation in that text, but I suppose you can draw some conclusions from the algorithms he discusses. There are various ways to attack those problems in Swift, but admittedly none of them will integrate beautifully with the existing protocols.

There's another factor in the decision to require comparability that I haven't mentioned yet. We were betting that locality of reference and minimizing allocations are so important, that basically every application is most-efficiently served by a data structure with ”contiguous elements at the leaves,” e.g. you don't want a binary tree, you want a B-tree; you don't want a linked list, you want a deque or a rope or a linked-list of arrays, etc. All such data structures, as far as we could tell, had the same index invalidation properties as arrays, and so would have the same problem.

So the decision was definitely a departure from the purest principles of generic programming, made for practical reasons. Of course we might have bet wrong…

1 Like

That is not the purpose of these types. The purpose of these types is:

  • Sequence models data types whose elements can be (synchronously) iterated using a for loop.
  • Collection models Sequences which allow you to go back to a previous element (by using its index) and resume the iteration from there.
  • BidirectionalCollection models Collections which allow you to iterate backwards as well as forwards.
  • RandomAccessCollection models BidirectionalCollections where moving N elements from a given index is just as fast as moving 1 element from it.

Note that none of these types model “unordered access”. When you iterate over a sequence, the elements always come out in some order—the order that next() returns them in—and when you iterate over a Collection instance several times without mutating it, the elements will always come out in the same order. The order might be arbitrary, as it is for Dictionary and Set, but that just means the type makes no promises about whether two “equal” instances will iterate their elements in the same order or how any given mutation will affect the order of iteration. It is still the case that, if you write two for loops over the same Set variable without mutating the set between them, they will receive the elements in the same order.

Moreover, the order of iteration always matches the order implied by the index’s Comparable conformance. This makes it possible to determine which of two indices comes before the other in the collection. It also makes it possible to bounds-check—the precondition that “only index i where startIndex <= i < endIndex is valid” is impossible to state if you cannot compare indices.
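
For instance, with Comparable indices a generic bounds check is a one-liner (a hypothetical helper, not from the standard library):

func checkedElement<C: Collection>(_ c: C, at i: C.Index) -> C.Element {
    precondition(i >= c.startIndex && i < c.endIndex, "index out of bounds")
    return c[i]
}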

6 Likes

Well, without knowing that it also models Collection, you aren't allowed to iterate the same sequence multiple times, so an arbitrary sequence has no observable, stable order.

2 Likes

Note that none of these types model “unordered access”

I never said anything about unordered access. In the same way that you don't have constant-time access to every value in a linked list, you don't have constant-time access to the index ordering; you have to iterate to find it. Using an index pair just caches that lookup.

All of what you said is compatible with indices only being Equatable.

precondition that “only index i where startIndex <= i < endIndex is valid” is impossible to state if you cannot compare indices.

Yes, you can. Here is one such definition: if you increment startIndex, it must reach endIndex in finitely many steps. Those intermediate indices define an ordered collection of elements, and i must be one of those. You're right that this statement is hard to check efficiently, which is also why it's tricky to write comparable indices for every data structure, but it's not impossible to state.
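
Something like this, say (a sketch; linear, but perfectly well-defined, and it needs only Equatable):

func isReachableIndex<C: Collection>(_ i: C.Index, in c: C) -> Bool {
    var j = c.startIndex
    while j != c.endIndex {
        if j == i { return true }
        j = c.index(after: j)
    }
    return false   // endIndex itself is not a valid subscripting index
}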

It's always great to meet another Stepanovian.

Same to you! Big fan of EOP as well.

to do set-cdr! you don't have to invalidate everything; just those indices on one side of the insertion point.

That's true. I'm going to take a stab at list_pool in Swift and see if I can get a better understanding of which algorithms work or not.
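
Roughly what I have in mind, following the shape of the C++ version from memory (names and details are my guesses, not a finished design):

struct ListPool<T> {
    typealias NodeRef = Int                 // 0 plays the role of nil ("limit")

    private var values: [T] = []
    private var nexts: [NodeRef] = []
    private var freeList: NodeRef = 0

    var limit: NodeRef { return 0 }         // the "past the end" reference

    func value(_ x: NodeRef) -> T { return values[x - 1] }
    func next(_ x: NodeRef) -> NodeRef { return nexts[x - 1] }

    // set-cdr!: relink a node without touching any element values.
    mutating func setNext(_ x: NodeRef, to y: NodeRef) { nexts[x - 1] = y }

    // Allocate a node holding `v` whose cdr is `tail`, reusing freed slots.
    mutating func allocate(_ v: T, next tail: NodeRef) -> NodeRef {
        if freeList == limit {
            values.append(v)
            nexts.append(tail)
            return values.count             // node refs are 1-based
        }
        let node = freeList
        freeList = nexts[node - 1]
        values[node - 1] = v
        nexts[node - 1] = tail
        return node
    }

    // Return a node to the free list; its slot will be reused by allocate.
    mutating func free(_ x: NodeRef) {
        nexts[x - 1] = freeList
        freeList = x
    }
}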

There's another factor in the decision to require comparability

Those are good points. Note that pooling strategies such as this do preserve a lot of locality.

Of course we might have bet wrong…

Maybe there could be a looser concept to accommodate this, but I don't understand how collections dovetail with slices and ranges well enough to say definitively. At the very least, for me this limit falls primarily on data structures; algorithms can continue to work happily in the most abstract way, which is probably what I care about most.

Thanks for the detailed response.

1 Like