An alternative resolution for Sequence vs. Collection

Continuing the discussion from [Pitch] Remove destructive consumption from Sequence:

As the kids say nowadays: hold my :beer:

I was just reading random blogs and saw this type:

enum List<A> {
    case empty
    indirect case cons(A, List<A>)
}

and thought it would be interesting to try out. I got string convertibility (regular and debug), (conditional) equality, and nil and array literal initialization. I also got Sequence conformance.
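
For reference, the Sequence conformance only takes a handful of lines; here's a minimal sketch of the shape (illustrative, not necessarily letter-for-letter what I wrote):

extension List: Sequence {
    struct Iterator: IteratorProtocol {
        var remaining: List<A>

        mutating func next() -> A? {
            switch remaining {
            case .empty:
                return nil
            case .cons(let head, let tail):
                remaining = tail
                return head
            }
        }
    }

    func makeIterator() -> Iterator {
        return Iterator(remaining: self)
    }
}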

The type seems like a reasonable container, so maybe I could add Collection conformance too. I thought I could use a simple integer for the Index. But I found problems:

  • You can't get from an offset index to its corresponding element in O(1) time; you always have to walk the list in O(n) time (see the sketch just after this list).
  • When making a sub-collection, that sub-collection has to keep the same index values its elements have in the original collection. That requires an extra offset in my case, and there is nowhere to put it without the type no longer being a pure enum.
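
To make the first point concrete: with an Int index, every element access has to re-walk the chain. A sketch (illustrative only):

extension List {
    // Illustrative only: element access through an integer offset is O(n).
    subscript(offset: Int) -> A {
        var node = self
        var remaining = offset
        while case .cons(let head, let tail) = node {
            if remaining == 0 { return head }
            remaining -= 1
            node = tail
        }
        fatalError("Index out of range")
    }
}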

I have a classic computer-science list here; I should be able to do container-like things with it. But I can't, since it isn't really Collection-able.

Looking at the recent debates over sequences vs. collections, maybe the muddiness in their responsibilities, especially relative to the actual types in use, indicates we don't have all the abstractions right. For the classic CS list I have here, maybe instead of tying multi-pass persistence of sequences to Collection, we could make an intermediate marker protocol with that semantic. (We already have RandomAccessCollection as a marker protocol: no new members relative to BidirectionalCollection, just stronger guarantees.)

The PersistentSequence protocol refines Sequence by requiring that a separate call to makeIterator() not invalidate other iterators already in play for that sequence. The new hierarchy could look like this (rough declarations are sketched after the list):

  • Sequence, pretty much the same.
  • PersistentSequence: Sequence, adds multi-pass guarantee.
  • MutableSequence: PersistentSequence, adds mutableForEach(_).
  • ReplaceableSequence: PersistentSequence, adds insertAndOrReplaceBeforeEach(_), append(_), and possiblyRemoveEach(_), etc.
  • UnorderedCollection: Index is only Equatable. for-in is handled internally by calling indices, which is the only part of an unordered collection that must be a Sequence (PersistentSequence? Collection?). Element, isEmpty, and subscripting stay the same, but there is no Iterator, SubSequence, or anything else from Sequence.
  • Collection: UnorderedCollection & PersistentSequence, pretty much the same as now.
  • MutableCollection now includes MutableSequence.
  • RangeReplaceableCollection now includes ReplaceableSequence.
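
Rough declarations for the two genuinely new pieces, just to make the shapes concrete (the associated types and members here are placeholders, not a worked-out design):

protocol PersistentSequence: Sequence {
    // Purely semantic requirement: makeIterator() may be called any number of
    // times, and doing so must not invalidate iterators already in play.
}

protocol UnorderedCollection {
    associatedtype Element
    associatedtype Index: Equatable
    associatedtype Indices: Sequence where Indices.Element == Index

    var indices: Indices { get }
    var isEmpty: Bool { get }
    subscript(position: Index) -> Element { get }
}

// The existing Collection would then sit on top of both halves, roughly:
// protocol Collection: UnorderedCollection, PersistentSequence { ... }
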
2 Likes

How about this as a rough start? I wrapped the enum in a class, since there didn't appear to be any other way to mutate the value inside an existing linked-list node.

/// A reference-type index node: wrapping the enum in a class allows the
/// payload of an existing node to be mutated in place.
public class Node<T> : Comparable {
    public init() {
        state = .empty
    }

    fileprivate enum State {
        case empty
        case cons(T, Node<T>)
    }
    fileprivate var state: State

    /// Two indices are equal only when they are the same node (identity).
    public static func == (lhs: Node<T>, rhs: Node<T>) -> Bool {
        return lhs === rhs
    }

    /// Ordering walks the chain: `lhs < rhs` when `rhs` is reachable from
    /// `lhs` (or `rhs` is an end node), so comparison is O(n) in the distance
    /// between the two nodes.
    public static func < (lhs: Node<T>, rhs: Node<T>) -> Bool {
        guard lhs !== rhs else {
            return false
        }
        switch (lhs.state, rhs.state) {
        case (.empty, .empty):
            return false
        case (.cons(_, _), .empty):
            return true
        case (.empty, .cons(_, _)):
            return false
        case (.cons(_, let lnode), .cons(_, _)):
            return lnode === rhs || lnode < rhs
        }
    }
}

public struct LinkedList<T> : MutableCollection {

    public typealias Element = T
    public typealias Index = Node<T>

    public typealias SubSequence = LinkedList<T>

    // Both bounds are stored: index equality is node identity, so `endIndex`
    // must always return the same terminating `.empty` node.
    public var startIndex: Node<T>
    public var endIndex: Node<T>

    /// Creates an empty list whose start and end are the same sentinel node.
    public init() {
        let sentinel = Node<T>()
        startIndex = sentinel
        endIndex = sentinel
    }

    fileprivate init(startIndex: Node<T>, endIndex: Node<T>) {
        self.startIndex = startIndex
        self.endIndex = endIndex
    }

    public func index(after i: Node<T>) -> Node<T> {
        switch i.state {
        case .empty:
            fatalError("Cannot advance past endIndex")
        case .cons(_, let node):
            return node
        }
    }

    public subscript(_ index: Node<T>) -> T {
        get {
            switch index.state {
            case .empty:
                fatalError("Index out of range")
            case .cons(let value, _):
                return value
            }
        }
        set {
            switch index.state {
            case .empty:
                // Writing at the sentinel appends: the sentinel becomes a
                // cons node and a fresh node becomes the new endIndex.
                let newEnd = Node<T>()
                index.state = .cons(newValue, newEnd)
                if index === endIndex {
                    endIndex = newEnd
                }
            case .cons(_, let lnode):
                index.state = .cons(newValue, lnode)
            }
        }
    }

    // Slices share nodes with the original, so indices keep their identity.
    public subscript(bounds: Range<Node<T>>) -> LinkedList<T> {
        get {
            return LinkedList(startIndex: bounds.lowerBound, endIndex: bounds.upperBound)
        }
        set {
            var position = bounds.lowerBound
            for element in newValue {
                self[position] = element
                position = index(after: position)
            }
        }
    }
}

I'm not sure exactly what problem you're trying to solve with the complex hierarchy here but, as @dwaite points out, this isn't the only possible Index type, so you shouldn't conclude that it “isn't really Collection-able”. See the thread about making Sequence just a convenient way to define a Collection, and the draft implementation, for a clever way to do this automatically for any Sequence type.

1 Like

Doesn't that make index comparison an O(N) operation? I don't know if there is a formal requirement, but SE-0065 states that indices should be “cheaply comparable”:

In this model indices store the minimal amount of information required to describe an element's position. Usually an index can be represented with one or two Ints that efficiently encode the path to the element from the root of a data structure. Since one is free to choose the encoding of the “path”, we think it is possible to choose it in such a way that indices are cheaply comparable. That has been the case for all of the indices required to implement the standard library, and a few others we investigated while researching this change.

That's why the draft implementation linked in my post above has

/// The associated values are:
/// - The zero-based position in the collection, for `Comparable` purposes.
/// - The element itself, so that it only needs to be computed once.
/// - The state, immediately after generating the element at this index.
case element(Int, Base.Element, Base.Iterator)
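
The stored Int is what keeps comparison cheap: Comparable only ever looks at the position, never at the cached element or iterator. Roughly the shape of the idea (my own sketch with made-up names, not the draft's actual code):

enum IteratorBackedIndex<Base: Sequence>: Comparable {
    // Position, cached element, and iterator state, as in the quoted comment.
    case element(Int, Base.Element, Base.Iterator)
    case end

    static func == (lhs: IteratorBackedIndex, rhs: IteratorBackedIndex) -> Bool {
        switch (lhs, rhs) {
        case (.end, .end):
            return true
        case (.element(let left, _, _), .element(let right, _, _)):
            return left == right
        default:
            return false
        }
    }

    static func < (lhs: IteratorBackedIndex, rhs: IteratorBackedIndex) -> Bool {
        switch (lhs, rhs) {
        case (.element(let left, _, _), .element(let right, _, _)):
            return left < right
        case (.element, .end):
            return true
        case (.end, _):
            return false
        }
    }
}
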
1 Like

One problem that hierarchy fixes is specifying a re-visitable Sequence without having to jump all the way to Collection. The various "Mutable" and "Replaceable" protocols aren't immediately important; what is important is adding a protocol above Sequence, adding one below Collection, and properly connecting them.

The crux of the idea is that forcing Sequence into Collection is just as problematic as the current stuffing of Collection into Sequence. We go from having collections that are inappropriately ordered to sequences that could require O(n)-space caching. Collections and sequences represent concepts that are unrelated in the abstract, so the resolution has to break our code somewhere else: at the point where we say that Sequence is the One True Way to handle a for-in construct. One path becomes two: one for Sequence conformers and one for conformers to the new UnorderedCollection (sketched below).
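
To make the two-path idea concrete, traversal on the unordered side could be driven entirely by indices, using the placeholder UnorderedCollection sketched earlier (the method name is made up):

extension UnorderedCollection {
    // Traversal goes through `indices`, the only Sequence an unordered
    // collection needs; the visit order carries no meaning.
    func forEachElement(_ body: (Element) throws -> Void) rethrows {
        for index in indices {
            try body(self[index])
        }
    }
}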

2 Likes