Implementing a Collection.removingIndices(_:)

The removeAll(where:) in the standard library is pretty awesome but I find wanting to remove a set of indices quite often. (Like with UICollectionView). The best quick thing I could come up with is this:

extension Collection where Index: Hashable {
    func removingIndices<S: Sequence>(_ indiciesToRemove: S) -> [Element] where S.Element == Index {
        let removeSet = Set(indiciesToRemove)
        var predicateCollection = zip(indices, self).map { (removeSet.contains($0.0), $0.1) }
        predicateCollection.removeAll { $0.0 }
        return predicateCollection.map { $0.1 }
    }
}

let array = Array(0...6)
array.removingIndices([0,2])  // [1, 3, 4, 5, 6]
array.removingIndices([])     // [0, 1, 2, 3, 4, 5, 6]
array.removingIndices(array.indices) // []

It makes a copy so it can use O(n) removeAll and then makes another copy to return.

Maybe I am missing something easier? Is this something (perhaps with a better implementation) that could be pitched to a future standard library?

Thanks for any feedback,
Ray

2 Likes

Aside from the obvious imperative loop, one single-pass alternative is:

extension Collection where Index: Hashable {
  func removingIndices<S: Sequence>(_ indiciesToRemove: S) -> [Element] where S.Element == Index {
    let removeSet = Set(indiciesToRemove)
    return zip(indices, self).reduce(into: []) { result, item in
      if !removeSet.contains(item.0) { result += [item.1] }
    }
  }
}
1 Like

Very nice! I suppose we could get super optimized by reserving the capacity of the into array by count - removeSet.count. (Assuming removeSet contains all hits.)

Actually, at that point you might as well not zip at all, which cleans it up a little

extension Collection where Index: Hashable {
  func removingIndices<S: Sequence>(_ indiciesToRemove: S) -> [Element] where S.Element == Index {
    let removeSet = Set(indiciesToRemove)
    return indices.reduce(into: []) { result, index in
      if !removeSet.contains(index) { result.append(self[index]) }
    }
  }
}
1 Like

even cleaner?

extension Collection {
    func removingAllIndices<Indices: Sequence>(
        _ indicesToRemove: Indices
        ) -> [Element]
        where Indices.Element == Index {

        return zip(self.indices, self).lazy
            .filter { !indicesToRemove.contains($0.0) }
            .map { $0.1 }
    }
}
1 Like

This is a great solution too. My first attempt looked (a little) like this but couldn't figure out the lazy trick. By dropping the requirement on Hashable you move the responsibility of performance to the caller and it does save an allocation.

I am interested in the rationale of why you changed the name from removingIndices to removingAllIndicies. I struggle a bit with the names so curious to know what your thinking was here.

It's a spelling fix: index - Wiktionary, the free dictionary

If a consistency argument has greater weight:
https://developer.apple.com/documentation/swift/collection/1641719-indices

1 Like

An alternative approach would be to start with sorted indices (either by sorting the parameter or specifying the order as a precondition). The algorithm can then copy chunks of the source collection into the result without checking every index, which might be more efficient:

extension Collection {
    func removingIndices<S: Sequence>(_ indicesToRemove: S) -> [Element]
        where S.Element == Index
    {
        var result: [Element] = []
        var low = startIndex
        for high in indicesToRemove {
            result.append(contentsOf: self[low..<high])
            low = index(after: high)
        }
        result.append(contentsOf: self[low...])
        return result
    }
}
6 Likes

@jawbroken and @TizianoCoroneo your implementations work, but they're semantically distinct from removeAll(where:), whose important distinction from filter(_:) is that it operates in-place.

I was trying to make a mutating function like removeAll(where:) already present in the standard library, but I couldn't figure it out :slight_smile: so I left the All word there just for consistency (and Indices instead of Indicies is indeed a spelling fix from Latin - it is the nominative plural of "index"). Thinking a little more about the name, I would probably go for this:

extension Collection {
    func removingAll<Indices: Sequence>(
        indices indicesToRemove: Indices
        ) -> [Element]
        where Indices.Element == Index {
            return zip(self.indices, self).lazy
                .filter { !indicesToRemove.contains($0.0) }
                .map { $0.1 }
    }
}

@AlexanderM That's why we are calling these functions removing instead of remove, since we return the result instead of mutating in place.

I tried to write an implementation for the mutating function, but my current results have embarrassing performance :grimacing:

extension RangeReplaceableCollection {
    mutating func removeAll<Indices: Sequence>(
        indices indicesToRemove: Indices
        )
        where Indices.Element == Index {
// Not sure what to do here
    }
}
1 Like

I was testing the performance of various different ways of implementing this function, and I found out that the fastest implementation is this one... by a lot:

extension RangeReplaceableCollection {
    // The implementation defined in `Collection` in my previous post takes 0.111 sec. This takes 0.001 sec! 🤯
    mutating func removeAll<Indices: Sequence>(
        indices indicesToRemove: Indices
        )
        where Indices.Element == Index {
            let sortedIndices = indicesToRemove.sorted()

            sortedIndices
                .lazy
                .reversed()
                .forEach { self.remove(at: $0) }
    }
}

EDIT: I just added @nnnnnnnn iterative version to the test benchmark and it is an order of magnitude faster than this last implementation. Nice job!

Full test code
//
//  BenchMarkTests.swift
//  BenchMarkTests
//
//  Created by Tiziano Coroneo on 07/08/2019.
//  Copyright © 2019 Tiziano Coroneo. All rights reserved.
//

import XCTest

extension Collection {
    // My original zip solution takes 0.111 sec
    func removingAllLazy<Indices: Sequence>(
        indices indicesToRemove: Indices
        ) -> [Element]
        where Indices.Element == Index {
            return zip(self.indices, self).lazy
                .filter { !indicesToRemove.contains($0.0) }
                .map { $0.1 }
    }

    // The same solution without `.lazy` takes 0.117 sec
    func removingAllStrict<Indices: Sequence>(
        indices indicesToRemove: Indices
        ) -> [Element]
        where Indices.Element == Index {
            return zip(self.indices, self)
                .filter { !indicesToRemove.contains($0.0) }
                .map { $0.1 }
    }
}

extension RangeReplaceableCollection {
    // This takes 0.001 sec! 🤯
    mutating func removeAll<Indices: Sequence>(
        indices indicesToRemove: Indices
        )
        where Indices.Element == Index {
            let sortedIndices = indicesToRemove.sorted()

            sortedIndices
                .lazy
                .reversed()
                .forEach { self.remove(at: $0) }
    }

    // This takes 0.113 sec
    mutating func removeAll2<Indices: Sequence>(
        indices indicesToRemove: Indices
        )
        where Element: Equatable, Indices.Element == Index {
            let result = self.removingAllLazy(indices: indicesToRemove)
            self.removeAll(keepingCapacity: true)
            self.append(contentsOf: result)
    }
}

extension Array {
    // This takes 0.116 sec
    mutating func removeAll3<Indices: Sequence>(
        indices indicesToRemove: Indices
        )
        where Element: Equatable, Indices.Element == Index {
            self = self.removingAllLazy(indices: indicesToRemove)
    }
}


class BenchMarkTests: XCTestCase {

    let dimension: Int = 1_000

    var numbers: [Int] = []
    var indices: [Int] = []

    override func setUp() {
        self.numbers = .init(repeating: 1234, count: dimension)
        self.indices = (0..<(dimension / 2)).map { $0 * 2 }
    }

    override func tearDown() {
        self.numbers = []
    }

    func testRemovingAllStrict() {
        self.measureMetrics([
            .wallClockTime
        ], automaticallyStartMeasuring: false) {
            self.setUp()
            self.startMeasuring()
            _ = numbers.removingAllStrict(indices: indices)
            self.stopMeasuring()
            self.tearDown()
        }
    }

    func testRemovingAllLazy() {
        self.measureMetrics([
            .wallClockTime
        ], automaticallyStartMeasuring: false) {
            self.setUp()
            self.startMeasuring()
            _ = numbers.removingAllLazy(indices: indices)
            self.stopMeasuring()
            self.tearDown()
        }
    }

    func testRemoveAll() {
        self.measureMetrics([
            .wallClockTime
        ], automaticallyStartMeasuring: false) {
            self.setUp()
            self.startMeasuring()
            numbers.removeAll(indices: indices)
            self.stopMeasuring()
            self.tearDown()
        }
    }

    func testRemoveAllFromRangeRepleaceableCollection() {
        self.measureMetrics([
            .wallClockTime
        ], automaticallyStartMeasuring: false) {
            self.setUp()
            self.startMeasuring()
            numbers.removeAll2(indices: indices)
            self.stopMeasuring()
            self.tearDown()
        }
    }

    func testRemoveAllFromArray() {
        self.measureMetrics([
            .wallClockTime
        ], automaticallyStartMeasuring: false) {
            self.setUp()
            self.startMeasuring()
            numbers.removeAll3(indices: indices)
            self.stopMeasuring()
            self.tearDown()
        }
    }

}
1 Like

It's worth mentioning that indices expire when Collection mutates (even indices before the mutating index). It should work fine for most cases, but that should also be documented.

4 Likes

For Array specifically, you can reverse-sort the indices-to-remove, then rotate each in turn to the end (ie. rotate array[i...] to the left by 1), and finally drop the last n elements.

You’d have to implement rotate, but you can copy-and-paste it from the standard library algorithm prototypes file.

Could also do sorted(by: >) instead if reverse-sorted. Not sure if it's faster tho.

Be careful as well that the algorithm performs in quadratic time.
For larger collection, it'd probably become slow very quickly.

Here's another one.

extension RangeReplaceableCollection where Self: MutableCollection {
  /// - Requires: `swapAt` does NOT invalidate indices.
  mutating func removeAll<Indices: Sequence>(indices: Indices) where Indices.Element == Index {
    var currentIndex = startIndex

    // Uses `Set` if that proves faster.
    var toSkip = indices.sorted().makeIterator()
    var skipping = toSkip.next()

    for index in self.indices {
      if index == skipping {
        skipping = toSkip.next()
      } else {
        swapAt(currentIndex, index)
        formIndex(after: &currentIndex)
      }
    }
    removeSubrange(currentIndex..<endIndex)
  }
}
1 Like

Sure, but you don't have any other option for an implementation on Collection (or any of the standard libraries that inherit from Collection, like RangeReplaceableCollection) because any attempt at an in-place version will invalidate all the indices on the first modification.

1 Like

I was reading the iOS beta 6 change log, and I noticed that they added exactly the method we’re looking for :partying_face::

mutating func remove(atOffsets offsets: IndexSet)

On RangeReplaceableCollection

1 Like

This is a O(n) solution using the half stable partition algorithm provided in WWDC 2018 Session 223 – Embracing Algorithms.

The indices are passed as IndexSet

extension RangeReplaceableCollection where Self: MutableCollection, Index == Int {

    mutating func remove(at indexes : IndexSet) {
        guard var i = indexes.first, i < count else { return }
        var j = index(after: i)
        var k = indexes.integerGreaterThan(i) ?? endIndex
        while j != endIndex {
            if k != j { swapAt(i, j); formIndex(after: &i) }
            else { k = indexes.integerGreaterThan(k) ?? endIndex }
            formIndex(after: &j)
        }
        removeSubrange(i...)
    }
}
1 Like

Be careful that if the Collection indices is not 0..<count, the operation might not mean the same thing.

1 Like

In Foundation. There’s no reason I can perceive it shouldn’t be in the standard library — apart from the evolution slog, that is.