Singly and Doubly Linked List collections in standard library

taylorswift · April 2, 2018, 1:28am

yeah you’re probably right. you’d still need some way of using the Int parameter to generate the inline storage though. Without it it’s basically a macro that changes a static variable inside the type and we can already do that in the current protocol system

felix91gr · April 2, 2018, 1:44am

Constant as in, values instead of types?

Like Tuple<n: UInt>?

Because... that would be awesome!

Chris_Lattner3 · April 2, 2018, 2:59am

Yes

felix91gr · April 2, 2018, 3:22am

Omg. So much type-level fun to have! Whenever that arrives, I’m gonna be a happy man :)

tayyab · April 2, 2018, 9:33am

The only reason I gave visibility of nodes was to let the people know what kind of type is being used for storing an element behind the scenes. Of course they would only be able to manipulate it from the collection itself.

By the way, "move semantics" make much more sense. I have taken a look at your design and it seems good. The naming of mutating func trade(my subrange: Range<Index>, forNodesOf other: inout Self, along otherSubrange: Range<Index>) -> (myNewNodes: Range<Index>, otherNewNodes: Range<Index>) seems a little complex though. Also I think the protocol should be named something like TransferableCollection if using this approach.

CTMacUser · April 2, 2018, 8:11pm

Which is a form of juggling nodes around:

extension LinkedNodeCollection {

    /**
     Detaches and returns a suffix, breaking where the prefix would no longer match a running sort.

     The first two elements are checked if they are sorted in respect to the given predicate.  If so, the check continues with the second and third elements, then the third and fourth elements, and so on.  At the first pair where the later element isn't properly ordereed with respect to the earlier element, the suffix starting with that later element is detached and returned.

     - Parameter areInIncreasingOrder: A predicate that returns `true` if its first argument should be ordered before its second argument; otherwise, `false`.

     - Returns: The suffix of `self`, or `nil` if the entirety of `self` is sorted with respect to `areInIncreasingOrder`.  (An instance with less than two elements always returns `nil`.)

     - Postcondition: `self` is sorted with respected to `areInIncreasingOrder`, without any "equivalent" elements.  Further calls with the same predicate will return `nil`.  In other words, `zip(self.dropLast(), self.dropFirst()).map { areInIncreasingOrder($0.0, $0.1) }.elementsEqual(repeatElement(true, count: max(0, self.count - 1)))`.
     */
    public mutating func purgeSuffixMissorted(by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows -> Self? {
        var lastIncluded = startIndex
        for i in indices.dropFirst() {
            if try !areInIncreasingOrder(self[lastIncluded], self[i]) {
                return truncate(after: lastIncluded)
            }
            lastIncluded = i
        }
        return nil
    }

    /**
     Sorts the collection in place via storage-shuffling, using the given predicate as the comparison between elements.

     The sort is stable.

     - Parameter areInIncreasingOrder: A predicate that returns `true` if its first argument should be ordered before its second argument; otherwise, `false`.

     - Postcondition: `self` is sorted with respected to `areInIncreasingOrder`.
     */
    public mutating func mergeSort(by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows {
        // Check if this instance is already sorted.
        guard var missortedSuffix = try purgeSuffixMissorted(by: areInIncreasingOrder) else { return }

        // Recursively sort.
        try missortedSuffix.mergeSort(by: areInIncreasingOrder)

        // Merge.  Using the suffix collection as the first argument should make equivalent operands choose the element from this instance first.
        var sorted = try Self(merging: &missortedSuffix, and: &self, admitFirstOverSecond: areInIncreasingOrder)
        tradeShifted(my: (nil, endIndex), forNodesOf: &sorted, along: (nil, sorted.endIndex))
    }

}

extension LinkedNodeCollection where Element: Comparable {

    /**
     Detaches and returns the suffix where the sequence first stops increasing.

     - Returns: The suffix of `self`, or `nil` if the entirety of `self` is already strictly increasing.

     - Postcondition: `self` is a strictly increasing sequence.
     */
    public mutating func purgeNonincreasingSuffix() -> Self? {
        return purgeSuffixMissorted(by: <)
    }

    /**
     Sorts the collection in place, shuffling the elements' storage instead of mutating said elements.

     You can sort any element-transferable collection of elements that conform to the `Comparable` protocol by calling this method.  Elements are sorted in ascending order.  The sorting is stable; equal elements keep their relative order in this collection.

     To sort the elements of your collection in descending order, pass the greater-than operator (`>`) to the `mergeSort(by:)` method.

     - Postcondition: `self` is sorted in non-decreasing order.
     */
    public mutating func mergeSort() {
        mergeSort(by: <)
    }

}

I think that counts as O(1) memory management; I don't create any new nodes. I hope using a missorted suffix, which acts as a natural merge sort, counts as a smart scan.

Is it OK that the Comparable version of missorted splitting uses a different name?

I think the sort is stable, by choosing the suffix nodes to go first. But I wouldn't mind anyone double-checking.

CTMacUser · April 2, 2018, 8:36pm

What do you mean by iterator's invalidation behavior? Is it that moving nodes around shouldn't invalidate indexes/references to nodes unaffected by the move. (For example, if I remove a range out of a list, but keep index values for the elements corresponding to the nodes immediately before and after the range, those index values should still work after the removal.) I've implicitly been expecting that, but I don't know if I should insist on it in the protocol (description).

For the "more primitive" concept, do you mean like a protocol? Like the sample protocol from my previous message? Or are you talking about the techniques that would be used within the implementation of trade and/or tradeShifted?

CTMacUser · April 2, 2018, 9:03pm

For "SmallVector," you mean the one for LLVM? I guess that could be:

enum SmallVector<Element, let Threshold: Int> {
    case small(elements: [Threshold; Element?], useCount: Int)
    case large(elements: [Element])

    init<S>(_ elements: S) where S: Sequence, S.Element == Element {
        let array = Array(elements)
        if array.count > Threshold {
            self = .large(elements: array)
        } else {
            self = .small(elements: createArray {
                $0.0 < array.count ? array[$0.0] : nil
            }, useCount: array.count)
        }
    }
}

// Insert extension for whatever combination of MutableCollection, RandomAccessCollection, and/or RangeReplaceableCollection you want.

Where createArray is a Standard Library global function:

func createArray<Element, let Extents: Int pack>(_ initializer: ([#count(Extents); Int]) throws -> Element) rethrows -> [#explode(Extents); Element]

Ben_Cohen · April 2, 2018, 9:06pm

To be clear: there is no rule anywhere that data structures more specialized than the current Array/Dictionary/Set are too specific for the standard library. I even raised a jira to add sorted collections to the standard library, and would love for someone to pick that up and run with it.

Whether a linked list, however, is the right mix of generally useful vs difficult to implement in order to qualify for std lib inclusion is debatable (and being ably debated in this very thread :)

Letan · April 2, 2018, 9:14pm

This sounds great! I was thinking about bringing back up 0074, but solving it by using a more general sorted collection protocol. I mulled over it a while, but decided not to since I wasn’t sure their would be any benefits for it over say a sorted/ordered set performance wise. Am I missing a use case for sorted collection over said sets?

Tino · April 2, 2018, 9:18pm

Wait a moment... does that mean somebody could just write a simple SortedArray, and get his work into the stdlib?
My impression is that there many eager contributors who just don't have the time or knowledge to cope with C++ and llvm, so finding a volunteer for a pure Swift problem should be rather easy ;-)

Ben_Cohen · April 2, 2018, 9:30pm

It's not quite that simple :) A first step would be gathering justifications, putting together a good candidate implementation, and pitching that to the community. And it would probably need to be part of a wholistic consideration of sorted containers and algorithms on sorted input. That task shouldn't be taken lightly.

But it's definitely not the case that such a proposal would be ruled out-of-scope for the std lib from the very outset.

taylorswift · April 2, 2018, 9:33pm

the coding is the easy part, the bureaucracy is the hard part

Tino · April 2, 2018, 9:38pm

just to clarify: My "Like" just means I agree with the statement - I don't actually like that getting non-trivial changes through evolution is as tedious as hand coding the implementation in assembler ;-)

taylorswift · April 2, 2018, 9:39pm

it’s a fact of life that no proposal shall pass the swift evolutions without extensive bikeshedding over support for arbitrary-precision booleans

felix91gr · April 2, 2018, 9:51pm

How I wish we could have Maybe, Probably, Surely booleans!

But the arbitrary precision really comes from how you compose them. Like when things are ocurring Surely x Surely - Maybe so often <3

taylorswift · April 2, 2018, 9:56pm

what we really need is something that can represent arbitrarily fine-grained quantities between 0 (false) and 1(true). such a thing could be immensely useful for example, in probability applications, where we need some way of expressing uncertainty… if only there was a type you could use to express fractional quantities,, but with the binary point fixed after the 0.… like some sort of FixedFloatingPoint…

of course, we’d still need to work out how to support the literal initializers. currently ExpressibleByBooleanLiteral only takes a a standard 1-bit Bool. I suspect this will only be possible after Swift supports variadic generics though

CTMacUser · April 3, 2018, 3:14am

What about starting with a protocol:

protocol IntrusiveListNode: AnyObject {

    var previousNode: IntrusiveListNode? { get set }
    var nextNode: IntrusiveListNode? { get set }

}

But don't see any way to stop anyone who has access to a node's payload from also doing arbitrary link surgery. Are you expecting the users to just have enough discipline to not do that?

I also got a warning that adding "weak" or "unsafe" is going away. I don't see how to prevent reference counting and retains. Unless we ask the node to store OpaquePointer or similar, and take care of retention counts in the list type.

Chris_Lattner3 · April 3, 2018, 3:16am

I don't have a good sense for what will work here. I think the best approach here is to go off and build something, solve for various specific domain problems, and come up with something that works well across multiple common problems. Once you're feeling good about a design and the tradeoffs it encompasses, it would make sense to summarize what you've learned and then potentially put together a proposal.

-Chris

CTMacUser · April 3, 2018, 10:12pm

What about these names and interface:

public protocol PoorlyTransferableCollection: RangeReplaceableCollection where Self.SubSequence: PoorlyTransferableCollection {

    mutating func tradeOver(_ firstRange: InvertedRange<Index>, _ secondRange: InvertedRange<Index>) -> (newFirstRange: InvertedRange<Index>, newSecondRange: InvertedRange<Index>)
    mutating func trade(_ subrange: InvertedRange<Index>, with other: inout Self, along otherSubrange: InvertedRange<Index>) -> (myNewRange: InvertedRange<Index>, otherNewRange: InvertedRange<Index>)

}

extension PoorlyTransferableCollection {

    public mutating func tradeOver(_ firstRange: InvertedRange<Index>, _ secondRange: InvertedRange<Index>) -> (newFirstRange: InvertedRange<Index>, newSecondRange: InvertedRange<Index>) {
        var intermediate = Self()
        let intermediateFirstRange = trade(firstRange, with: &intermediate, along: .all).myNewRange
        let finalSecondRange = trade(secondRange, with: &intermediate, along: .all).myNewRange
        let finalFirstRange = trade(intermediateFirstRange, with: &intermediate, along: .all).myNewRange
        return (finalFirstRange, finalSecondRange)
    }

}

extension PoorlyTransferableCollection {

    public mutating func segregate(out subrange: InvertedRange<Index>) -> Self
    public mutating func truncate(upThrough i: Index) -> Self
    public mutating func prepend(stealingFrom other: inout Self)
    public mutating func truncate(after i: Index) -> Self
    public mutating func insertNodesAfter(i: Index, stealingFrom other: inout Self)
    public mutating func append(stealingFrom other: inout Self)

    public mutating func filterOut(by willBeRemoved: (Element) throws -> Bool) rethrows -> Self
    public mutating func arrangePartitioning(by belongsInSecondPartition: (Element) throws -> Bool) rethrows -> Index

    public init(merging first: inout Self, and second: inout Self, admitFirstOverSecond body: (Element, Element) throws -> Bool) rethrows
    public mutating func purgeSuffixMissorted(by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows -> Self?
    public mutating func mergeSort(by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows

}

extension PoorlyTransferableCollection where Element: Comparable {

    public mutating func purgeNonincreasingSuffix() -> Self?
    public mutating func mergeSort()

}

protocol TransferableCollection: PoorlyTransferableCollection where Self: BidirectionalCollection, Self.SubSequence: TransferableCollection { /*...*/ }

tradeOver(_:_:) exists for the same reason that MutableCollection.swapAt(_:_:) does; you can't call a function that mutates an object twice, so putting self in trade(_:with:along:) would violate mutation safety. Is there still a problem with the interface complexity? I'm going to insist that conforming types not invalidate the indexes of uninvolved elements (in tradeOver, trade, and replaceSubrange), but you still need to know the indexes of the newly moved elements somehow, and I don't want to force the user to instead track the neighboring elements to infer the new bounds.

A half-closed range is defined as:

public enum InvertedRange<Index: Comparable> {
    case beforeStart
    case prefix(upThrough: Index)
    case range(after: Index, upThrough: Index)
    case suffix(after: Index)
    case afterEnd
    case all
}