Singly and Doubly Linked List collections in standard library

CTMacUser · April 9, 2018, 9:24am

I put up a sample as a Gist. I haven't tested any of it; either the range types or the protocols.

Especially the protocols, since I haven't implemented a conforming type yet. If someone wants to try it to see if my philosophy on the core methods, default implementations, and/or extended methods work, go ahead.

Oh, anyone have any ideas on how to make Slice work for PoorlyTransferableCollection? I tried once, but got stymied by the members being restricted when you're not on the Standard Library team. (The "base" is read-only for outsiders and read-write only for internal.)

tayyab · April 9, 2018, 2:07pm

Linked lists really excel when you can build references directly to the list nodes and use those directly as insert/remove/split/merge points.

I have looked into the idea of intrusive list but I see following drawbacks.

There would be some sort of protocols like ForwardLink and DualLink with next and previous properties. The conformance of these protocols will introduce the overhead of virtual calls.
Memory management will be very difficult as swift is neither fully MRC nor GC. And I have already mentioned recursive dealloc issue of ARC. Plus there will be overhead of retain and release calls of Node classes.
The element will be locked to a Node type which is kind of weird for a collection.

tayyab · April 10, 2018, 4:50am

Nope, beforeStartIndex and startIndex still wouldn't be equal. I mean, in a general conversation we can say that the start and end of a thing is equal. But it would be wrong to say that the start and before start is equal.

hlovatt · April 10, 2018, 5:39am

A Segmented Array would work with the current collection design including startIndex and endIndex and would have the benefits of a linked list. A 'minimal' SegmentedArray is below:

struct SegmentedArray<T>: CustomStringConvertible, ExpressibleByArrayLiteral {
    private let maxSegmentSize = 128 // Should be static but can't do static in generic types in Swift 4.
    
    private var segments = /*Contiguous*/Array</*Contiguous*/Array<T>>() // It is easier to debug with Array because Xcode won't display ContinuousArray.
    
    private(set) var startIndex = 0
    
    private(set) var endIndex = 0
    
    init() {}
    
    init<S>(_ s: S) where S: Sequence, S.Element == T {
        for element in s {
            append(element)
        }
    }
    
    init(arrayLiteral elements: T...) {
        for element in elements {
            append(element)
        }
    }
    
    init(repeating repeatedValue: T, count: Int) {
        for _ in 0 ..< count {
            append(repeatedValue)
        }
    }
    
    mutating func append(_ element: T) {
        insert(element, at: endIndex)
    }
    
    mutating func insert(_ element: T, at index: Int) {
        defer {
            endIndex += 1
        }
        let (segmentIndex, subIndex) = index == endIndex ? appendSegmentAndSubIndices() : segmentAndSubIndices(for: index)
        guard segments[segmentIndex].count < maxSegmentSize else {
            var newSegment = /*Contiguous*/Array<T>()
            newSegment.append(element)
            segments.append(newSegment)
            return
        }
        segments[segmentIndex].insert(element, at: subIndex)
    }
    
    @discardableResult mutating func remove(at index: Int) -> T {
        let (segmentIndex, subIndex) = segmentAndSubIndices(for: index)
        let result = segments[segmentIndex].remove(at: subIndex)
        let isMerged = tryToMergeWithPreviousSegment(currentIndex: segmentIndex)
        tryToMergeWithPreviousSegment(currentIndex: (isMerged ? segmentIndex : segmentIndex + 1))
        endIndex -= 1
        return result
    }
    
    subscript(index: Int) -> T {
        get {
            let (segmentIndex, subIndex) = segmentAndSubIndices(for: index)
            return segments[segmentIndex][subIndex]
        }
        set {
            let (segmentIndex, subIndex) = segmentAndSubIndices(for: index)
            segments[segmentIndex][subIndex] = newValue
        }
    }
    
    var description: String {
        return segments.description
    }
    
    private func segmentAndSubIndices(for index: Int) -> (Int, Int) {
        guard index >= startIndex else {
            fatalError("Index, \(index), < startIndex, \(startIndex)")
        }
        var subIndex = index - startIndex
        for (segmentIndex, segment) in segments.enumerated() {
            let count = segment.count
            guard subIndex >= count else {
                return (segmentIndex, subIndex)
            }
            subIndex -= count
        }
        fatalError("Index, \(index), >= endIndex, \(endIndex)")
    }
    
    private mutating func appendSegmentAndSubIndices() -> (Int, Int) {
        if segments.isEmpty {
            segments.append(/*Contiguous*/Array<T>())
        }
        let segmentsLastIndex = segments.endIndex - 1
        return (segmentsLastIndex, segments[segmentsLastIndex].endIndex)
    }
    
    @discardableResult private mutating func tryToMergeWithPreviousSegment(currentIndex: Int) -> Bool {
        guard currentIndex > segments.startIndex && currentIndex < segments.endIndex else {
            return false
        }
        let previousIndex = currentIndex - 1
        guard segments[currentIndex].count + segments[previousIndex].count <= maxSegmentSize else {
            return false
        }
        segments[previousIndex].append(contentsOf: segments[currentIndex])
        segments.remove(at: currentIndex)
        return true
    }
}

var array = SegmentedArray(0 ..< 499)
print(array)
print(array[array.startIndex]) // 0
print(array[131]) // 131
print(array[array.endIndex - 1]) // 498

for _ in 0 ..< 211 {
    array.remove(at: 131)
}
print(array)
print(array[array.startIndex]) // 0
print(array[131]) // 342
print(array[array.endIndex - 1]) // 498

CTMacUser · April 16, 2018, 2:16am

I've updated the Gist with a sample singly-linked list type. I still haven't really tested it, though.

There's one thing I found out:

public subscript(bounds: Range<Index>) -> SinglyLinkedList
public mutating func trade(_ subrange: inout LeftOpenRange<Index>, with other: inout SinglyLinkedList, along otherSubrange: inout LeftOpenRange<Index>)

at most one of these can be O(1).

The original plan was trade(_: with: along:) being O(1). This means that all the nodes need to be directly visible when doing the trading. But this means that I have to layout all the nodes when constructing a sub-sequence; so a subscript would be O(n).

But the Standard Library prefers subscripting to be O(1). This means storing only the class pointer to the storage and the end points. So I can only get all of the specified nodes during trade(_: with: along:) by walking the list, making that O(n) instead.

CTMacUser · April 18, 2018, 8:44am

I tried a less minimal version at a Gist I made. Not really tested, though.

taylorswift · August 8, 2018, 12:01am

any update on this? i’m really needing some doubly linked lists rn

CTMacUser · August 11, 2018, 5:10pm

What do you want to do? Your adaptation of @saagarjha and @John_McCall's deque idea? My adaptation of @hlovatt's segmented-array idea? Something else? Remember that, without adding methods to trade nodes between lists, a doubly-linked list is externally an Array crippled to be a BidirectionalCollection.

taylorswift · August 11, 2018, 6:08pm

basically just stable indices that don’t change with insertions, constant time insertions and deletions, and a way to iterate through all elements. Arrays can’t have stable indices and constant insertions/removals, and linear traversal at the same time. A dictionary can do it but you have to accept monotonically increasing index values (meaning once you use an index you can’t reuse it even if the element is deleted).

even without trading nodes, linked lists have the benefit of 1) elements don’t move around when the list is edited, so you can keep references to them, 2) you can iterate over all elements in linear time, and 3) you can insert and delete in constant time

John_McCall · August 11, 2018, 6:26pm

Well, part of the thing about linked lists is precisely that people often think they're going to be useful in the abstract and then realize that there's a better data structure that works just fine in their use case.

taylorswift · August 11, 2018, 6:29pm

yeah i’ve noticed that but it turns out in this case unordered array doesn’t really work for me since the element that would replace the deleted one would still be moved so everything that references it would have to be updated with its new index. so,, linked list it is

CTMacUser · August 11, 2018, 11:00pm

I was going to show you a sample I already had in my Gist, but I never published it! I only have a singly-linked list implementation there. The doubly-linked version is still on a playground on my hard drive.

Maybe I should put the DLL up there, but it has to be really cleaned up. I have a node-trading refinement of Collection and left- and fully-open range implementations mixed in.

Torust · August 11, 2018, 11:30pm

I’ve also got a basic implementation of a doubly-linked list you can adapt at https://github.com/troughton/SwiftFrameGraph/blob/master/Sources/Utilities/LinkedList.swift. License on the repository is MIT but you can treat that file as public domain.

hlovatt · August 12, 2018, 4:13am

I have certainly used segmented arrays (see my post above) for arrays that require elements randomly inserted, which I think is a commonly cited use case for doubly linked lists (which I found too slow compared to segmented arrays). Another data structure that I have used in Java quite a bit is a sorted dictionary, which I think would do what @taylorswift wants.

So probably saying that there is room for other data structures in Foundation, but agreeing with you that linked list probably isn't worth it.

tayyab · August 12, 2018, 7:49am

It is true that linked lists can be easily created, but the ARC nature of swift introduces a stack overflow on recursive deallocation of nodes.

When I proposed linked lists, I provided an implementation with following use cases in mind:

Constant access/insertion/deletion/removal time
Fixed above mentioned issue
No overhead of retain/release calls
Sharing of nodes with slices and copy on write
Giving the user visibility of node class

Having said that, I certainly like the move semantics suggested by @CTMacUser. But it is quite difficult to make those work in constant time on both types of linked lists.

taylorswift · August 12, 2018, 9:22pm

sorted is actually unnecessary, all I need is for the indices to be stable. the problem with a dictionary is there is no way to find a free key for insertion in constant time, you have to keep trying until you hit dict[test] == nil.

(actually, sorted could solve this probem by taking the highest key and lowest key and incrementing/decrementing it. but keeping a sorted data structure seems like overkill for that)

Torust · August 12, 2018, 9:46pm

If I understand correctly, one possible solution is an array plus a stack. When adding to the array, you either pop the last free index off the stack or you append to the end; when removing, you push the index of the removed object onto the stack. That way, insertions and deletions are constant time, traversal is linear-ish time (just need to check if elements are nil), and indices are stable.

For storing the elements, you can either store them tightly packed in an UnsafeMutablePointer<T> and then initialise/deininitalise individual elements, or else you can just use Optionals (e.g. if you’re storing reference types). If you go for the first option, you probably want another Boolean array indicating whether each element is initialised, since otherwise you need to check through the freeIndices list at each step of iteration.

taylorswift · August 13, 2018, 3:29am

that part is the problem, traversal can become slow if there are many “holes” in the array, and there is no bound on the ratio of holes to occupied elements.

Torust · August 13, 2018, 3:35am

There's no bound on the ratio, but there's a bound on the time to iterate through the full list: it's the maximum number of elements that have been in the list at any one time. Sure, you might run into problems if you insert a million elements and then delete 990,000, but if that's the case it seems like there are other issues with the design.

Have you profiled this method? I'm fairly certain that it would take a very large ratio of holes to filled before a linked list became more performant; iterating through a linear array of data is generally very quick.

taylorswift · August 13, 2018, 3:41am

at this point the problem really becomes more similar to memory defragmentation and allocation than anything else.