Removing elements at more than one index from an array?

The method is called remove(elementsAtIndices:) and changes the number of elements (and therefore invalidates all outstanding indexes), so it's a RangeReplaceableCollection method. That is, make it generic over RangeReplaceableCollection where Self: MutableCollection.

I was thinking of a variant that doesn't depend on MutableCollection, which has to selectively copy into a new instance and then replace.

...

Looking through "RangeReplaceableCollection.swift," this philosophy is used by removeAll(where:), which has implementations that do and don't depend on MutableCollection. (Remember that a RRC+MC implementation can't be used by String!)

Hmm, here's some cool potential docs:

extension RangeReplaceableCollection {

    /// Removes the elements from the collection at the indices in the specified sequence.
    ///
    ///     var bugs = ["Aphid", "Bumblebee", "Cicada", "Damselfly", "Earwig"]
    ///     bugs.removeOnly(thoseAt: [1, 4])
    ///     print(bugs)
    ///     // Prints "["Aphid", "Cicada", Damselfly"]"
    ///
    /// Calling this method may invalidate any existing indices for use with this
    /// collection.
    ///
    /// - Parameter list: The positions of the elements to remove. Each index must
    ///   be a valid index of the collection that is not equal to the collection's
    ///   end index.
    ///
    /// - Complexity: O(*n*), where *n* is the length of the collection.
    public mutating func removeOnly<S: Sequence>(thoseAt list: S) where S.Element == Index {
        // Put one of those awesome algorithms here.
    }

    /// Removes the elements from the collection at the specified indices.
    ///
    ///     var bugs = ["Aphid", "Bumblebee", "Cicada", "Damselfly", "Earwig"]
    ///     bugs.removeOnly(elementsAt: 1, 4)
    ///     print(bugs)
    ///     // Prints "["Aphid", "Cicada", Damselfly"]"
    ///
    /// Calling this method may invalidate any existing indices for use with this
    /// collection.
    ///
    /// - Parameter list: The positions of the elements to remove. Each index must
    ///   be a valid index of the collection that is not equal to the collection's
    ///   end index.
    ///
    /// - Complexity: O(*n*), where *n* is the length of the collection.
    public mutating func removeOnly(elementsAt list: Index...) {
        removeOnly(thoseAt: list)
    }

}

For removeOnly(thoseAt:), I assumed that sorting the list and/or needing repeated access is an internal detail, not needed for the user to worry about. Now the second version should probably stay an extension method, but should the first one move to being a requirement? Its philosophy violates how replaceSubrange(_: with:) works since the targeted elements may be discontiguous, so optimized implementations may be desired as needed, but can still have a reasonable default implementation using replaceSubrange(_: with:). Having the first version as a requirement may also ease using the RRC-only vs. RRC+MC split default implementations that removeAll(where:) does.

I've been writing up a similar idea, based on adaptations of the C++ functions adjacent_find, unique, and unique_copy. I couldn't figure out a good names for some of these, but right now I got:

  • For std::adjacent_find( ForwardIt first, ForwardIt last, BinaryPredicate p )
    • Collection.firstAdjacentRange(by:)
    • BidirectionalCollection.lastAdjacentRange(by:)
  • For std::unique( ForwardIt first, ForwardIt last, BinaryPredicate p )
    • MutableCollection.partitionAwayAdjacentValues(by:)
    • RangeReplaceableCollection.removeAdjacentValues(by:), which also needs MC
  • For std::unique_copy( InputIt first, InputIt last, OutputIt d_first, BinaryPredicate p )
    • Sequence.filterUnique(storingAs: by:)
    • LazySequenceProtocol.filterUnique(by:)

where the last one's return type of LazyFilterUniqueSequence follows Collection or BidirectionalCollection if the wrapped sequence does.

Does the distinct(by:) method grabs all subsequent copies of a value, or only the ones that are contiguous (like C++ and my methods)? The difference doesn't matter if the collection is sorted first, like the implementations of remove in this thread probably would.

Calling removeSubrange() repeatedly might be marginally faster than remove(at:) in specific circumstances (where you have lots of contiguous indices), but it's still quadratic. My implementation could be adapted to use IndexSet where Index == Int though. You wouldn't have to worry about whether or not the indices are sorted then.

At least for arrays, I think if we get the unsafe access to uninitialized contents proposal it will make it fairly easy to implement one that both avoids unnecessary reference ops and copies like this (note: This is the version that doesn't save the elements it's about to remove)

extension Array {
	mutating func remove(elementsAtSortedIndices indicesToRemove: [Int]) {
		withUnsafeMutableBufferPointerToStorage(capacity: count) { (buffer, initializedCount) -> () in
			// `buffer.baseAddress` can only be nil if the array is already empty
			guard let basePtr = buffer.baseAddress else { return }
			var removalIndexIterator = indicesToRemove.makeIterator()

			var nextRemoval = removalIndexIterator.next()
			// Guarantees that the first loop will hit the removal condition
			// This way we're guaranteed to not attempt to moveInitialize a value from itself, since I have no idea if that's allowed
			guard var writeIndex = nextRemoval else { return }
			for readIndex in writeIndex..<initializedCount {
				if readIndex == nextRemoval {
					(basePtr + readIndex).deinitialize(count: 1)
					nextRemoval = removalIndexIterator.next()
				}
				else {
					(basePtr + writeIndex).moveInitialize(from: basePtr + readIndex, count: 1)
					writeIndex += 1
				}
			}

			initializedCount = writeIndex
		}
	}
}
2 Likes

Yup your algorithm is going to be more efficient overall, although it could probably still use an IndexSet version working on contiguous ranges so the library can block-move stuff around.

Noting too that if we're not mutating in place then it becomes a variant of an IndexSet subscript (just use the inverse of the removal index set over the full initial array's range), which is an easy enough algorithm to implement efficiently.

Terms of Service

Privacy Policy

Cookie Policy