[Pre-pitch] removeAll(at:)

Myself and Zach Wolfe started putting a pitch together a few months ago for adding a new method to RangeReplaceableCollection for removing elements at particular indices. Both of us ran short of time to make this public and push it forwards, but in an effort to not let it die I've decided to publish what we have and give someone else the opportunity to see it to the finish. We both believe it would be a great addition to the standard library and at least deserves to be discussed out in the open.

Draft proposal: Remove Elements At Indices · GitHub

The latest version of the implementation:

extension Collection where Element: Comparable {
    @inlinable
    internal var _isSorted: Bool {
        return _isSorted(by: <)
    }

    @inlinable
    internal func _isSorted(by areInIncreasingOrder: (Element, Element) throws -> Bool) -> Bool {
        do {
            var iterator = makeIterator()
            guard var prevElement = iterator.next() else { return true }

            while let element = iterator.next() {
                if try areInIncreasingOrder(element, prevElement) { return false }
                prevElement = element
            }
        } catch {
            return false
        }
        return true
    }
}

extension RangeReplaceableCollection {
    @inlinable
    internal mutating func _removeAll<C>(atSorted indicesToRemove: C) where
        C: BidirectionalCollection,
        C.Element == Index
    {
        var iterator = indicesToRemove.reversed().makeIterator()
        guard let lastIndex = iterator.next() else { return }
        var range = lastIndex..<endIndex
        while let index = iterator.next() {
            precondition(index != range.lowerBound, "Duplicate indices")
            precondition(index < range.lowerBound, "Unsorted indices")
            let nextIndex = self.index(after: index)
            if nextIndex == range.lowerBound {
                range = index..<range.upperBound
            } else {
                removeSubrange(range)
                range = index..<nextIndex
            }
        }
        removeSubrange(range)
    }

    @_transparent
    public mutating func removeAll<S>(at indicesToRemove: S, sorted: Bool = false) where
        S: Sequence,
        S.Element == Index
    {
        if sorted {
            _removeAll(atSorted: Array(indicesToRemove))
        } else {
            _removeAll(atSorted: indicesToRemove.sorted())
        }
    }

    @_transparent
    public mutating func removeAll<C>(at indicesToRemove: C, sorted: Bool = false) where
        C: BidirectionalCollection,
        C.Element == Index
    {
        if sorted || indicesToRemove._isSorted {
            _removeAll(atSorted: indicesToRemove)
        } else {
            _removeAll(atSorted: indicesToRemove.sorted())
        }
    }
}

extension RangeReplaceableCollection where Self: MutableCollection {
    @inlinable
    mutating func _removeAll<S>(atSorted indicesToRemove: S) where
        S: Sequence,
        S.Element == Index
    {
        var indicesToRemove = indicesToRemove.makeIterator()
        // Shift the elements we want to keep to the left.
        guard var destIndex = indicesToRemove.next() else { return }
        precondition(indices.contains(destIndex), "Index out of range")

        var srcIndex = index(after: destIndex)
        var previousRemovalIndex = destIndex
        func shiftLeft(untilIndex index: Index) {
            precondition(index != previousRemovalIndex, "Duplicate indices")
            precondition(previousRemovalIndex < index, "Unsorted indices")
            while srcIndex < index {
                swapAt(destIndex, srcIndex)
                formIndex(after: &destIndex)
                formIndex(after: &srcIndex)
            }
            formIndex(after: &srcIndex)
        }
        while let removeIndex = indicesToRemove.next() {
            precondition(indices.contains(removeIndex), "Index out of range")
            shiftLeft(untilIndex: removeIndex)
        }
        shiftLeft(untilIndex: endIndex)

        // Remove the extra elements from the end of the collection.
        removeSubrange(destIndex..<endIndex)
    }

    @_transparent
    public mutating func removeAll<C>(at indicesToRemove: C, sorted: Bool = false) where
        C: Collection,
        C.Element == Index
    {
        if sorted || indicesToRemove._isSorted {
            _removeAll(atSorted: indicesToRemove)
        } else {
            _removeAll(atSorted: indicesToRemove.sorted())
        }
    }

    @_transparent
    public mutating func removeAll<S>(at indicesToRemove: S, sorted: Bool = false) where
        S: Sequence,
        S.Element == Index
    {
        if sorted {
            _removeAll(atSorted: indicesToRemove)
        } else {
            _removeAll(atSorted: indicesToRemove.sorted())
        }
    }
}

It's also worth noting that it might also be useful to expose the isSorted methods that are used within our solution, but since thats not the reason for this pitch it isn't mentioned anywhere else in the draft proposal and should probably have it's own thread.

2 Likes

If I understand correctly, your implementation calls removeSubrange multiple times, but it may invalidate the indices you're using.

I'm in favor of adding that method :) I often need it, and usually solve that by converting to a Set, but that doesn't preserve the order and works only for Hashable elements.

1 Like

I think it would be useful to include a version of partition which worked on Indexes.
Then, for MutableCollections, this would just be a partition + chop.

1 Like

I understand the issue it tries to resolve but I’m not sure how a generic implementation can achieve this. Index invalidation semantics are unspecified in the collection protocols — all of them, as far as I know. Even replaceSubrange(_:with:) on RangeReplaceableCollection doesn’t guarantee anything:

Calling this method may invalidate any existing indices for use with this collection.

Making it a non-defaulted requirement instead would be more plausible since a concrete implementation knows how to remove elements given their (pre-deletion) indices. An array or array slice can sort the indices in descending order and simply invoke remove(at:) with those indices (or more efficiently, move non-removed elements forward). A set can determine the elements to remove, then repeatedly invoke remove(_:) with those elements. However, adding a non-defaulted requirement to a stdlib protocol is a source-breaking change and I don’t see this happening any time soon.

Adding this to Array(Slice) instead isn’t a breaking change but I’m worried about the cost of sorting the indices not being visible in removeAll(at:) call sites.

Also, (irrelevant) nitpick:

This results in the indices pointing to different elements each time the loop runs and incorrect elements getting removed. In the worst case this will result in an out of bounds error.

I’d argue an out of bounds fatal error is the second-best case scenario, second only to it working as expected. Incorrect behaviour is hard to debug, a crash near the erroneous code is easy! :stuck_out_tongue:

2 Likes

Maybe "All" in "removeAll" is a bit confusing/redundant? remove(at: [5, 7]) seems clear enough.