Removing elements at more than one index from an array?

I guess I might be missing something obvious here, but what is the recommended way (if any) to remove the elements at a specified list of indices from an array? Something like this:

var a = ["a", "b", "c", "d"]
let b = a.remove(elementsAtIndices: [1, 2])
print(a) // ["a", "d"]
print(b) // ["b", "c"]

You can use removeSubrange if the indices are continuous. If not, then I guess you can write your own extension function on Array to remove elements at a specified list of indices.

1 Like

Thanks, I should have made it clear that I meant removing a non-continuous sequence (in specific order). So the example would have been better as eg:

var a = ["a", "b", "c", "d"]
let b = a.remove(elementsAtIndices: [3, 1])
print(a) // ["a", "c"]
print(b) // ["d", "b"]

The simplest way would be to simply loop over the indices and call remove(at:), but it's not very efficient. You could probably use stable partition or half-stable partition to do this more efficiently. There's an implementation of stable partition you could look at: swift/Algorithms.swift at main · apple/swift · GitHub

You'll have to adjust it to make it work with your use case.

1 Like

How’s this?

extension Array {
    mutating func remove(elementsAtIndices indicesToRemove: [Int]) -> [Element] {
        guard !indicesToRemove.isEmpty else {
            return []
        }
        
        // Copy the removed elements in the specified order.
        let removedElements = indicesToRemove.map { self[$0] }
        
        // Sort the indices to remove.
        let indicesToRemove = indicesToRemove.sorted()
        
        // Shift the elements we want to keep to the left.
        var destIndex = indicesToRemove.first!
        var srcIndex = destIndex + 1
        func shiftLeft(untilIndex index: Int) {
            while srcIndex < index {
                self[destIndex] = self[srcIndex]
                destIndex += 1
                srcIndex += 1
            }
            srcIndex += 1
        }
        for removeIndex in indicesToRemove[1...] {
            shiftLeft(untilIndex: removeIndex)
        }
        shiftLeft(untilIndex: self.endIndex)
        
        // Remove the extra elements from the end of the array.
        self.removeLast(indicesToRemove.count)
        
        return removedElements
    }
}
4 Likes

Thanks! That's way more efficient than the simple solution I used:

extension Array {
    mutating func remove(elementsAtIndices indicesToRemove: [Int]) -> [Element] {
        let removedElements = indicesToRemove.map { self[$0] }
        for indexToRemove in indicesToRemove.sorted(by: >) {
            remove(at: indexToRemove)
        }
        return removedElements
    }
}

It would be interesting to see how this method would be properly (including more generically) implemented, for example I guess that it should trap if indicesToRemove contains duplicated indices.

Thank you for the small challenge, I hope I did not forget any corner case.

extension RangeReplaceableCollection where Index: Hashable {
  public mutating func removeAll<C>(at collection: C) -> Self where
    C: Collection,
    C.Element == Index
  {
    let indices = Set(collection)
    // Trap if number of elements in the set is different from the collection.
    // Trap if an index is out of range.
    precondition(
      indices.count == collection.count &&
      indices.allSatisfy(self.indices.contains)
    )
    // Create a new instance and reserve enough capacity to store all removed
    // elements.
    var result = Self.init()
    result.reserveCapacity(indices.count)
    // Remove the elements and fill the new collection.
    indices
      .lazy
      .sorted()
      .enumerated()
      .forEach { (offset, index) in
        let shiftedIndex = self.index(index, offsetBy: -offset)
        let element = remove(at: shiftedIndex)
        result.append(element) 
      }
    // Return the result collection.
    return result
  }
}

The whole trick is, you need to fix the indices for each iteration, which is quite easy since we know that each removal will decrease the count by one. With that knowledge we can simply offset every index by its negative enumerated offset. The indices must also be sorted before we do that.

1 Like

Checking sequence.underestimatedCount might be a not so good idea, a sequence is free to return zero (or some other value less than the actual number of elements).

var a = Array(0..<10)
let toRemove = a.indices.lazy.filter { $0 % 3 == 0 }

print(toRemove.underestimatedCount) // 0
let b = a.removeAll(at: toRemove) // 💣
2 Likes

Thank you, I updated the method to operate on collections instead.

Can someone answer the question if inout means the same as in-place if we would restructure the above example like this?!

extension RangeReplaceableCollection where Index: Hashable {
  public mutating func removeAll<C>(at collection: C) -> Self where
    C: Collection,
    C.Element == Index
  {
    let indices = Set(collection)
    // Trap if number of elements in the set is different from the collection.
    // Trap if an index is out of range.
    precondition(
      indices.count == collection.count &&
      indices.allSatisfy(self.indices.contains)
    )
    return indices
      .lazy
      .sorted()
      .enumerated()
      .reduce(into: .init()) { result, value in
        let (offset, index) = value
        if offset == 0 {
          result.reserveCapacity(indices.count)
        }
        let shiftedIndex = self.index(index, offsetBy: -offset)
        let element = remove(at: shiftedIndex)
        result.append(element)
      }
  }
}

Nice, but it seems to be even less efficient than my naive solution above.

Here's the time it takes to remove (a random list of) half the elements from an Array of Ints (containing 1024 * 256 elements):

0.013 seconds (@anon31136981's here)

2.9 seconds (mine here)

5.9 seconds (@DevAndArtist's here)

Shifting elements in order to be able to remove all elements in one go using removeLast(_:) (like @anon31136981's method does) is probably much faster than using remove(at:) for each element.

I used a command line program, release build, times are averages from 10 repeated runs in a loop, code was written to prevent any possible dead code elimination, MBP late 2013, 2 GHz i7.

Can you share your code that you used to benchmark the algorithms please?

This should be sufficient for trapping on duplicated indices, because shiftLeft() will be called with the same index twice in a row iff it was included at least twice in the array of indices to remove.

extension Array {
    mutating func remove(elementsAtIndices indicesToRemove: [Int]) -> [Element] {
        guard !indicesToRemove.isEmpty else {
            return []
        }
        
        // Copy the removed elements in the specified order.
        let removedElements = indicesToRemove.map { self[$0] }
        
        // Sort the indices to remove.
        let indicesToRemove = indicesToRemove.sorted()
        
        // Shift the elements we want to keep to the left.
        var destIndex = indicesToRemove.first!
        var srcIndex = destIndex + 1
        var prevRemovedIndex = destIndex
        func shiftLeft(untilIndex index: Int) {
            precondition(index != prevRemovedIndex)
            prevRemovedIndex = index
            while srcIndex < index {
                self[destIndex] = self[srcIndex]
                destIndex += 1
                srcIndex += 1
            }
            srcIndex += 1
        }
        for removeIndex in indicesToRemove[1...] {
            shiftLeft(untilIndex: removeIndex)
        }
        shiftLeft(untilIndex: self.endIndex)
        
        // Remove the extra elements from the end of the array.
        self.removeLast(indicesToRemove.count)
        
        return removedElements
    }
}

I’ll try to make it more generic.

1 Like

Here's the microbenchmark program I used, and it now uses Zach's updated version, which didn't change the time (still 0.013 s on my machine):

import QuartzCore // Only for using CACurrentMediaTime for timing.

extension Array {
    mutating func zachRemove(elementsAtIndices indicesToRemove: [Int]) -> [Element] {
        guard !indicesToRemove.isEmpty else {
            return []
        }
        
        // Copy the removed elements in the specified order.
        let removedElements = indicesToRemove.map { self[$0] }
        
        // Sort the indices to remove.
        let indicesToRemove = indicesToRemove.sorted()
        
        // Shift the elements we want to keep to the left.
        var destIndex = indicesToRemove.first!
        var srcIndex = destIndex + 1
        var prevRemovedIndex = destIndex
        func shiftLeft(untilIndex index: Int) {
            precondition(index != prevRemovedIndex)
            prevRemovedIndex = index
            while srcIndex < index {
                self[destIndex] = self[srcIndex]
                destIndex += 1
                srcIndex += 1
            }
            srcIndex += 1
        }
        for removeIndex in indicesToRemove[1...] {
            shiftLeft(untilIndex: removeIndex)
        }
        shiftLeft(untilIndex: self.endIndex)
        
        // Remove the extra elements from the end of the array.
        self.removeLast(indicesToRemove.count)
        
        return removedElements
    }
    
    mutating func jensRemove(elementsAtIndices indicesToRemove: [Int]) -> [Element] {
        let removedElements = indicesToRemove.map { self[$0] }
        for indexToRemove in indicesToRemove.sorted(by: >) {
            remove(at: indexToRemove)
        }
        return removedElements
    }
}

extension RangeReplaceableCollection where Index: Hashable {
    public mutating func removeAll<C>(at collection: C) -> Self where
        C: Collection,
        C.Element == Index
    {
        let indices = Set(collection)
        // Trap if number of elements in the set is different from the collection.
        // Trap if an index is out of range.
        precondition(
            indices.count == collection.count &&
                indices.allSatisfy(self.indices.contains)
        )
        // Create a new instance and reserve enough capacity to store all removed
        // elements.
        var result = Self.init()
        result.reserveCapacity(indices.count)
        // Remove the elements and fill the new collection.
        indices
            .lazy
            .sorted()
            .enumerated()
            .forEach { (offset, element) in
                let index = self.index(element, offsetBy: -offset)
                let element = remove(at: index)
                result.append(element)
        }
        // Return the result collection.
        return result
    }
}


func test() {
    let count = 1024*256
    var a = (0 ..< count).map({ $0 }).shuffled()
    let indicesToRemove = Array(a.indices.shuffled().prefix(count/2))
    
    let t0 = CACurrentMediaTime()
    let b = a.zachRemove(elementsAtIndices: indicesToRemove)
    //let b = a.jensRemove(elementsAtIndices: indicesToRemove)
    //let b = a.removeAll(at: indicesToRemove)
    let t1 = CACurrentMediaTime()

    print("  ", t1 - t0, "seconds",
        "(checksum:", a.reduce(1, &+) &+ b.reduce(1, &+), ")")
}

for _ in 0 ..< 7 { test() }

Note that it's very rudimentary, and you have to comment out / in the line using the method you want to test.

This is a nice implementation! It would make a good addition to the std lib. Some suggestions on tweaks:

  • Making it generic over MutableCollection where Self: RangeReplaceableCollection is pretty straightforward. The compiler will tell you what you need to change. Int becomes Index, += 1 becomes formIndex(after:)
  • The input indices can trivially be converted to a generic Sequence
  • If shiftLeft returns destIndex, you can use that to trim the end using replaceSubrange(lastRemaining..<endIndex) instead of using the count, which wouldn't be valid with a non-Int or non-zero-based index.
  • It's a shame, with an efficient in-place operation like this, to have to still allocate memory for the result and for the sorted indices
    • For the indices: if you didn't want to require the indices come pre-sorted, you could have a variant that checks each index is increasing, and when it isn't, falls back to sorting the remainders. I suspect the vast majority of use cases the input would be sorted
    • For returning the elements: while the user could discard the result, they'll still pay the price of allocating it unless the optimizer is really on their game. It may be better to expect users to save down the to-be-removed elements externally first. As you show, it's a simple one-liner
  • I'd recommend using MutableCollection.swapAt(_:_) rather than sliding elements over each other. It can be more efficient for non-trivial types as it avoids unnecessary reference count ops. It's slightly less efficient for trivial types (since you're copying the to-be-discarded bits) but trivial types are so fast to copy that's a trade-off worth making. It also means it'll still work in a future of collections of move-only types.
18 Likes

Thanks for the suggestions, Ben! Here’s my more generic version, with no memory allocations in the case where indicesToRemove is already sorted. indicesToRemove is a generic Collection rather than a Sequence because we need to iterate through it twice (once to check if it’s sorted, once to perform the removals) and as I understand it, Sequences don’t guarantee that they return the same elements each time.

extension MutableCollection where Self: RangeReplaceableCollection {
    private mutating func remove<C>(elementsAtSortedIndices indicesToRemove: C) where
        C: Collection,
        C.Element == Index
    {
        // Shift the elements we want to keep to the left.
        var destIndex = indicesToRemove.first!
        precondition(indices.contains(destIndex), "Index out of range")
        
        var srcIndex = index(after: destIndex)
        var previousRemovalIndex = destIndex
        func shiftLeft(untilIndex index: Index) {
            precondition(index != previousRemovalIndex, "Duplicate indices")
            while srcIndex < index {
                swapAt(destIndex, srcIndex)
                formIndex(after: &destIndex)
                formIndex(after: &srcIndex)
            }
            formIndex(after: &srcIndex)
        }
        let secondIndex = indicesToRemove.index(after: indicesToRemove.startIndex)
        for removeIndex in indicesToRemove[secondIndex...] {
            precondition(indices.contains(removeIndex), "Index out of range")
            shiftLeft(untilIndex: removeIndex)
        }
        shiftLeft(untilIndex: endIndex)
        
        // Remove the extra elements from the end of the collection.
        removeSubrange(destIndex..<endIndex)
    }
    mutating func remove<C>(elementsAtIndices indicesToRemove: C) where
        C: Collection,
        C.Element == Index
    {
        guard !indicesToRemove.isEmpty else {
            return
        }
        
        // Check if the indices are sorted.
        var isSorted = true
        var prevIndex = indicesToRemove.first!
        let secondIndex = indicesToRemove.index(after: indicesToRemove.startIndex)
        for index in indicesToRemove[secondIndex...] {
            if index < prevIndex {
                isSorted = false
                break
            }
            prevIndex = index
        }
        
        if isSorted {
            remove(elementsAtSortedIndices: indicesToRemove)
        } else {
            remove(elementsAtSortedIndices: indicesToRemove.sorted())
        }
    }
}
1 Like

It seems to me that isSorted might be useful as a property on Collection.

Also, I'm still figuring out this ownership thing, but I was wondering whether it makes sense to annotate the indicesToRemove argument as __owned and sort the indices in place where C: RandomAccessCollection & MutableCollection.

1 Like

I tried to ask @Ben_Cohen about the application he used to benchmark algorithms in his recent talk but it always resulted in radio silence. Luckily I just accidentally stumbled around the project on GitHub. :hugs: So if you want to benchmark the improvements of the algorithms in this thread feel free to try it with that tool and share your result.

In general I think we should use that tool more often in the evolution process.

3 Likes

This is pretty interesting and would actually fit in quite well with a proposal I was thinking of writing for the addition of distinct() and distinct(by:) to the stdlib (for the removal of duplicate elements from a collection).

There might be a few proposals that are worth chaining together here for some convenient collection additions.

You can probably get even more efficient if you use IndexSet and do a reverse enumeration on its contiguous subranges (using its rangeView). Building up an IndexSet from a collection of indices is by itself pretty efficient.

Something like (not tested in compiler):
mutating func removeElements(atIndexes indexes: IndexSet) { for subrange in indexes.rangeView.reversed() { removeSubrange(subrange) } }

Apologies for the formatting.