Singly and Doubly Linked List collections in standard library

tayyab · April 4, 2018, 7:09am

This design is much better and simpler but I think InvertedRange is deviated from the standard Range design. I mean reading it the first time creates a sense that it might be some kind of range that forwards from end to start. Furthermore there are already methods like prepend, truncate, etc in the protocol, so giving similar literals in the enum kinds of duplicates things. It is also harder to handle these many cases.

There are already different ranges in the standard library such as ClosedRange, PartialRangeFrom, PartialRangeThrough and PartialRangeUpTo. But if we can introduce an OpenRange with a new operator >.< that excludes both start and end elements, that would simplify the interface further. We'll operate on OpenRange so that indices are not invalidated and we won't have to provide new ranges in trade methods. For this to work, additional properties are required for accessing before start named like head and accessing after end named like tail.

I haven't yet thought in depth about what additional complexities it would introduce though. But it might be something to consider.

tayyab · April 4, 2018, 9:25am

Just realized endIndex is already “past the end” position. So there is no need for tailIndex. headIndex or a similar naming will suffice.

CTMacUser · April 5, 2018, 3:46am

Since it's one extra state past what Index needs, what about an optional? That what I got so far:

public struct OpenRange<Bound: Comparable>: Equatable {
    public let lowerBound: Bound?
    public let upperBound: Bound

    public init(uncheckedBounds bounds: (lower: Bound, upper: Bound)) { /*...*/ }
    public init(_ range: PartialRangeUpTo<Bound>) { /*...*/ }
}
extension OpenRange: RangeExpression { /*...*/ }

But now, looking at this post again, I see that I can't use it. The point of using a half-closed range is that the second bound is where I break the linked list. I can't use an open range because I can't backtrack (just once) to the severing point when I only have a singly-linked list.

Oh, I'm using ">*<" for the open-range operator. The period (".") is legal as an operator character only when the entire operator string starts with a period. So I would need something like ".>.<" otherwise.

tayyab · April 5, 2018, 8:16am

Sounds fair enough. How about this one.

infix operator >**

/// A half-open interval, from a lower bound (excluding), to an upper bound (including).
public struct LeftOpenRange<Bound: Comparable>: Equatable {
    public let lowerBound: Bound
    public let upperBound: Bound

    public init(uncheckedBounds bounds: (lower: Bound, upper: Bound)) { /*...*/ }
}
extension LeftOpenRange: RangeExpression { /*...*/ }

There should be no need to make lowerBound optional as there will be a headIndex pointing to a before start index but no element would exist here.

And with that, moved ranges are not intact anymore. How about making firstRange and secondRange inout as well? It would increase the complexity a little bit but I think it would still be better as trade methods will be rarely used considering the fact that there are already flexible options such as segregate, truncate, prepend, etc.

CTMacUser · April 5, 2018, 4:13pm

I had the same idea before checking up on this thread.

On the headIndex idea: that would have to be on Collection somehow. I think for backwards-compatibility, it needs to be a new protocol:

/**
 A collection that supports a "past the start" sentinel index value.

 Before-the-start index values ensure that extraction operations on the prefixes of instances of types conforming to `PoorlyTransferableCollection` (*i.e.* singly-linked list types) can use the same methods that take a general (open) range, instead of custom variant methods targeting the prefix of a collection.

 Conforming to the PreflightedCollection Protocol
 ================================================

 To add PreflightedCollection conformance to your custom types, implement the `beforeStartIndex` property in addition to the requirements of the `Collection` protocol.

 If your type also conforms to `BidirectionalCollection`, now `c.index(after: c.index(before: c.startIndex)) == c.startIndex` is valid (and `true`) for a given bi-directional collection `c`.
 */
public protocol PreflightedCollection: Collection {

    /**
     The collection's "past the start" position—that is, the position one less than the first valid subscript argument.

     Note that "past" from `startIndex` to `beforeStartIndex` is the opposite direction as going from the last valid subscript argument to `endIndex`.

     When you need an (at least) left-open range that includes the first element of a collection, use the fully-open range operator (`>*<`) or left-open range operator (`>**`) with `beforeStartIndex`.  Either operator creates a range that doesn’t include the lower bound, so it’s always safe to use with `beforeStartIndex`. For example:

         var numbers: MyDoublyLinkedList = [10, 20, 30, 40, 50]
         if let index = numbers.index(of: 30) {
             let lowNumbers = numbers.segregate(out: numbers.beforeStartIndex >** index)
             print(lowNumbers)
         }
         print(numbers)
         /*
             Prints: """
                [10, 20, 30]
                [40, 50]
                """
          */

     As the "past the start" sentinel,

         c.index(after: c.beforeStartIndex) == c.startIndex
         b.index(before: b.startIndex) == b.beforeStartIndex

     for given non-empty collections `c` and `b`, where the type of `b` also conforms to `BidirectionalCollection`.  Like `endIndex`, `beforeStartIndex` is not dereferencable relative to `self` because there is no corresponding element.  And it is not proper to backwards-iterate to before `beforeStartIndex`.

     If the collection is empty, `beforeStartIndex` is equal to `startIndex` (and `endIndex`).
     */
    var beforeStartIndex: Index { get }

}

tayyab · April 6, 2018, 7:31am

If the collection is empty, beforeStartIndex is equal to startIndex (and endIndex).

Equality of startIndex and endIndex for an empty collection makes sense but beforeStartIndex should never be equal to startIndex. Otherwise, it would be against the range semantic.

tayyab · April 6, 2018, 7:59am

OK. So currently we have two options to tackle ranges (for singly linked list). The first one would be to introduce a LeftOpenRange and operate trade methods over it. The pros of this approach are following:

The index can be made to point to the actual node. So there is no misleading regarding element index.

The cons are following:

We would have to introduce a new range type.
It requires a new collection type protocol that exposes a property for accessing before start index.
In replaceSubrange method, we would have to traverse from the start of the list to the previous node first.

The other option is to operate on existing Range type and have the index contain the previous node. The pros of this approach are following:

There would be no need to introduce another range type or a collection type that exposes before start index.
The replaceSubrange method can be implemented without traversing to previous node first.

The cons are following:

The index would contain the previous node of expected element which would be misleading.
There would be another dereference for accessing next node’s element in subscript.

CTMacUser · April 7, 2018, 7:06am

If there was no endIndex, then we would have startIndex and beforeStartIndex equal to represent empty collections, right? Why would the addition of endIndex change that?

Also, why would having users need to design two junk sentinel index values for empty lists be better?

CTMacUser · April 7, 2018, 7:20am

Right now, I got OpenRange and LeftOpenRange range value types and a PreflightedCollection protocol to support beforeStartIndex (which the two range types support).

I'm reminded of a different thread, where they're discussions of whether Dictionary and Set are worthy of being Sequence/Collection types. They are plain Collection types, although they do have element manipulation. Dictionary can partially change its element's values, and it and Set can add or remove elements. However, those capabilities are restricted; they cannot do arbitrary general changes, and that's why they don't conform to MutableCollection (for Dictionary) or RangeReplaceableCollection.

We may have to do the same here. PoorlyTransferableCollection will have to branch off of Collection directly, instead of RangeReplaceableCollection, because PTC can't do most of RRC's methods without a O(n) search time penalty. But the bi-directional requirement in TransferableCollection means that TC will refine both PTC and RRC.

If we want to be able to add/remove elements without affecting the validity of indexes of uninvolved elements, then an index cannot use its predecessor for referral. (What happens if that neighbor is deleted? How could already existing Index values get their dangling reference updated?)

Michael_Ilseman · April 7, 2018, 11:49pm

(Big fan of LLVM ADT)

LLVM's ilists really shine because they are capable of storing polymorphic elements which have a natural container they belong to.

For monomorphic elements, or for storing polymorphic elements in an auxiliary container, @lorentey's BTree package has a List that could be an interesting start.

John_McCall · April 8, 2018, 4:11am

Linked lists really excel when you can build references directly to the list nodes and use those directly as insert/remove/split/merge points.

Such references are incompatible with value semantics, or at least I don't see how they couldn't be. And it's hard to imagine that we'd add a general purpose collection with reference semantics to the standard library.

If you can't directly reference nodes, then you'll never get the full benefits of a linked list, and any interior insert/remove/split/merge will require a linear scan. That might seem better than a linear copy in principle, but in practice the memory-system impact of separate allocation means it probably isn't. In such cases, you should really be using either a deque (if your collection can get large and you really do need to do insert/remove/split/merge at an arbitrary index) or just a dynamic array that isn't rooted to the start of its buffer (if all you need is efficient insert/remove at the beginning as well as the end).

IIRC, the classic argument against deques is that they're very, very rarely used in any way that actually justifies the additional complexity and overhead. The data structure really does need to get quite large, and you need to be making a lot of interior changes that have to be observed immediately, before they make sense.

I think it's fair to complain that Swift doesn't have a data structure that allows efficient insert/remove at the front.

saagarjha · April 8, 2018, 5:08am

Could we possibly allow mutation through an index? Possibly, something like remove(before:)/remove(after:), insert(before:)/insert(after:)?

I think CFArray is actually a deque, so this is technically untrue ;) That being said, I still think that we should still look into adding a Deque data type, possibly based on a circular buffer or similar.

John_McCall · April 8, 2018, 5:39am

Well, but what's an index? If an index just holds a direct reference to a node, then the indices are only valid as long as the list uses that exact node — that is, it will be invalidated if the list needs to copy its nodes. If the nodes are shared between values (copy on write), that will be required for any mutation of the list, and I don't think non-structural mutations (e.g. list[idx] = 5) are allowed to invalidate indices like that. If the nodes aren't shared between values, then slices are either no longer independent values or they're required to use different nodes (and hence indices) from their parent collection.

More fundamentally, the concept only really works if other things hold long-standing direct references to nodes in the list. Even if those references are abstracted behind indices, they're effectively interior pointers into a specific list variable. That's definitely not pure value semantics.

You could have a move-only linked list type, though.

taylorswift · April 8, 2018, 5:47am

i literally write this type manually for every project. it’s gotten to the point where it lives in a gist i just copy and paste everywhere. please, dequeue in the standard library, please

saagarjha · April 8, 2018, 6:00am

That's what I was thinking, yes.

But we control when a copy occurs, right? Or am I not understanding what you mean by copy? Ignoring COW for now by making a node be a class (does a struct have to be COW? Can I opt out, short of making a class?), this should still work out. list[idx] = 5 would not change the identity of the node pointed at by idx, it would just peek into the node and change it's value there.

I don't think I understand you here. Each node is unique, irrespective of the value it caries. A slice is just something like "2nd node to 5th node" from the parent linked list. Indices are preserved from the parent collection, because the index "2nd node" from the parent linked list now points to the first element in this slice.

What exactly do you mean here by pure value semantics? Are you saying that it's not possible to satisfy this:

I'd like a copy of the linked list to behave like any other value type does: copying it copies its internal nodes as well. In doing so, indices from the original list wouldn't work on the new one, but that's not a requirement, right?

John_McCall · April 8, 2018, 3:01pm

If copying a list copies all the nodes (which is not something we currently expose any ability to do), then yes, indices will change, and that’s arguably permitted. But the current language model around slices is that they are also independent values, a sort of copy of a subset of the elements; and slices are required to share indices.

Suppose my list asserts that it uniquely references its nodes, which it can do while remaining copyable by using some future copy-constructor feature. This means that subscript-set can just directly overwrite the node’s contents without doing a uniqueness test, which is great.

Now, suppose I create a slice:
var slice = list[n1..<n2]

Either this slice has its own independent nodes, which means the indices are different, or I can permanently use this slice to modify the contents of list from afar, which is not value semantics.

You really need at least slices to be non-copyable, making the above ill-formed and forcing you instead to write (using a future scoped-acces feature from the ownership manifesto):

inout slice = &list[n1..<n2]

saagarjha · April 8, 2018, 8:17pm

Wait, this isn't exposed? I'm not aware of how other standard library data structures are designed, but wouldn't something like Array need to copy its internal buffer if it is copied?

Ooh, I didn't realize that; I always thought of slices as a mutable view into a collection. Just so I'm on the same page, would you mind answering these?

I take a slice and and then mutate the parent collection in a way that invalidates all of its indices. Do indices I stored from the parent collection before the mutation still work on the slice, or can I call them invalid? Conversely, the new indices from the parent don't work on the slice, right?
Do I have to allow for every linked list operation on the slice? Can I get away with just supporting assignment through an index (e.g. slice[index] = 42) and not support things that invalidate indices like slice.insert(42, after: index)? Do slices even have to be mutable?
Is there a time complexity requirement on creating a slice?

I can think of a couple horrible solutions that might work, such as a slice consulting its parent collection or nodes refcounting how many indices point to them to stay alive, but I'd like to know what operations I need to support when slicing.

hlovatt · April 9, 2018, 12:59am

Linked lists are not that great a data structure in terms of performance, as others in this forum have pointed out; also it turns out that they are not a good fit for the design of the Collection's protocols; and they don't play well with reference counting.

If instead of a linked list a segmented array was added, would that work better?

Segmented arrays are used in Haskell and other languages to give better performance over linked lists. The basic data structure for a segmented array is [[T]], i.e. an array of arrays. Each of the segments (sub-arrays) is kept within size limits, often less than 128 entries, by adding a new segment when the current segment reaches its size limit. The reverse operation of merging segments is performed if two small segments are next to one another.

John_McCall · April 9, 2018, 7:24am

Array, like most Swift data structures, uses copy-on-write: a primitive value copy just increments a reference count, and mutating methods explicitly check the reference count and clone the underlying structure if the reference is not unique. There are good reasons to use such a design anyway, but it also means that we have not needed to add C++-style rule-of-five resource management.

John_McCall · April 9, 2018, 7:30am

As a general rule, indices that work in a particular collection value should continue to work in it regardless of what you do to other collections. I don’t know what other guarantees we make for all collections.