Recommended idiom for conditionally removing elements while iterating a collection?

Suppose you have an array, and you want to do something with the elements one by one. As you proceed, sometimes you will notice that the current element should actually be removed from the array.

What’s the best way to achieve this?


I noticed on Apple’s SpriteKit documentation page for Disconnecting Bodies from Joints, there is a code example which reads:

SKPhysicsJoint code example
var springs: [SKPhysicsJointSpring] = []
let maxSpringDistance: CGFloat = 31

// Each `scene.physicsWorld.add(spring)` is accompanied by `springs.append(spring)`

func update(_ currentTime: TimeInterval, for scene: SKScene) {
	for spring in springs {
		if let bodyAPosition = spring.bodyA.node?.position,
		let bodyBPosition = spring.bodyB.node?.position {
		let distance = hypot(bodyAPosition.x - bodyBPosition.x,
			bodyAPosition.y - bodyBPosition.y)
		if distance > maxSpringDistance {
			scene.physicsWorld.remove(spring)
			if let index = springs.index(of: spring) {
				springs.remove(at: index)
			}
		}
	}
}}

If we simplify that down, we get a smaller example like so:

var list: [Foo] = ...

func bar() {
  for x in list {
    doSomething(x)
    
    if shouldRemove(x) {
      if let i = list.firstIndex(of: x) {
        list.remove(at: i)
      }
    }
  }
}

Now there are a couple of problems with that code. First, the use of list.firstIndex makes the entire operation accidentally quadratic. (Suppose the first half of the list stays and the last half goes. Then there will be n/2 calls to firstIndex, which each walk n/2 steps to find x.)

Second, and perhaps less obvious, CoW triggers a full copy of the entire list array, the first time remove is called inside the for loop.


So my question is, what’s the best way to write this type of code, so it will run in linear time without triggering CoW? Let’s also say we want the elements to be processed in list order.

The best I could come up with seems…unidiomatic:

var list: [Foo] = ...

func bar() {
  list.removeAll {
    doSomething($0)
    return shouldRemove($0)
  }
}

Using removeAll(where:) to perform side effects feels weird, especially since it relies on the closure being called exactly once for each element in iteration order. That is an implementation detail of the standard library which happens to hold today, but is not documented so it shouldn’t be relied upon.


In order to avoid relying on such undocumented details, we can instead iterate an index manually:

var list: [Foo] = ...

func bar() {
  var i = list.startIndex
  
  while i < list.endIndex {
    let x = list[i]
    doSomething(x)
    
    if shouldRemove(x) {
      list.remove(at: i)
    } else {
      list.formIndex(after: &i)
    }
  }
}

That works correctly for Array (albeit in quadratic time), but if instead we had a generic RangeReplaceableCollection then remove(at:) is documented as potentially invalidating “any existing indices”, meaning in particular i could become invalid.

Is there a better solution?


?
Afair there is also a WWDC video which propagates the use of partition.

I’m not sure what you’re suggesting. The primary purpose of the loop is to perform side effects. Removing elements is expected to be rare.

It seems just as unidiomatic to call partition with the intent of performing side effects as it does to call removeAll for the same purpose.

I feel like the reason this doesn't work is because with Swift's value semantics you have to mutate the array and the iterator to keep the state consistent. As you note with an array specifically you can do this manually because the indices are 0-based integers (remove if needed, increment index if not), but even then you're doing unnecessary shifting. The best answer really might be to build an IndexSet during the iteration and then use remove(atOffsets:) once you're done.

EDIT: I got Foundation.IndexSet and SE-0270's RangeSet mixed up. It's RangeSet that would do this generically, though it's still in the "preview" phase for standard library proposals.

6 Likes

Not sure if you want to add functionality to Array's to self remove the object by using extension. If you simply want to handle the scenario where you are looping through the array and then remove the element if a condition is met then the below code does that linerally.

struct Student{

var name: String

var swiftScore: Int

var kotlinScore: Int

var hasLeft: Bool

}

var sam = Student.init(name: "Sam", swiftScore: 100, kotlinScore: 25, hasLeft: false )

var roxy = Student.init(name: "Roxy", swiftScore: 25, kotlinScore: 100, hasLeft: true )

var mucho = Student.init(name: "Mucho", swiftScore: 60, kotlinScore: 60, hasLeft: true )

var alexa = Student.init(name: "Alexa", swiftScore: 87, kotlinScore: 91, hasLeft: false )

var studentArr = [sam, roxy, mucho, alexa]

print (studentArr.count)

var iCounter = 0

for student in studentArr {

print ("Looping student: (student) at counter:(iCounter)")

if student.hasLeft {

studentArr.remove(at: iCounter)

} else {

iCounter += 1

}

}

studentArr.count

for student in studentArr {

print ("Looping student: (student)")

}

More, documentation on Arrays is available at:

Dictionaries are much simpler as you as simply assign the key as nil
var studentDict = ["YUP01": ["Frank": "Mohatara"], "YUP02" : ["Chris": "Rockstar"]]

studentDict["YUP02"] = nil

Detailed at the below link:

Actually, I'd suggest the best way to do this is to use compactMap, if the code for each element is independent of code for previously-examined elements.

If the code for each element needs context from earlier iterations, I'd suggest making a new empty mutable array and adding "surviving" modified elements from the old array to it, using a regular ol' for loop.

The point is, if you count the number of element copies being done, the "inelegant" approach of creating a whole new array may end up making fewer copies, and is easier to read, I think.

The nice thing about this is, if you need to actually modify the array you started with, COW ensures that replacing the whole array with the new one is cheap.

1 Like

You beat me to it:

func bar() {
    list = list.compactMap { x in
        doSomething(x)
        
        return shouldRemove(x) ? nil : x
    }
}

Although I'm not sure this is the best way to do it... Does the implicit [Foo?] get stored somewhere, or is it only created if it's actually needed?

1 Like

The compactMap approach is as icky as the removeAll(where:) approach; I don't see where the improvement lies. How is either of those better than the RangeSet approach?

The algorithms are tightly related: for RangeReplaceableCollection & MutableCollection, removeAll is generically implemented by partitioning and removing the tail partition.

Technically, any mutation of the collection should be considered a side-effect, but I think you're distinguishing “external” effects—ones other than those on the inout arguments and mutated self of the algorithm.

Both approaches may be non-idiomatic, but if there's significant common work involved in performing the external effects and determining whether or not to remove the element, there's really no better way to do it. The one problem is that documentation for these algorithms doesn't guarantee that the predicate will be evaluated only once for each element (and in order!), despite the fact that even stable partitioning has that property.

If there's not significant common work, you can handle the external effects in one pass and then call removeAll to update the collection for clarity, without hurting big-O complexity, but at the cost of data locality.

Personally, I would file a bug about the documentation and use the combined method, leaving a nice comment in the code to clarify the role of external side-effects.

That algorithm is not linear, but quadratic in the length of the collection because removeAt is O(N) and you're doing it up to N times.

6 Likes

compactMap removes any guesswork or bugs related to removing an element from an array at an index while you’re iterating through the array.

RangeSet has the same advantages and I suspect between the two, that’s the better option. I just coded up the compactMap example first.

You can do a bit better for RandomAccessCollections. For them you can do this:

func bar() {
    var i = list.startIndex
    var counter = 0

    while i < list.endIndex {
        let x = list[i]
        doSomething(x)

        if shouldRemove(x) {
            list.remove(at: i)
            i = list.index(list.startIndex, offsetBy: counter)
        } else {
            list.formIndex(after: &i)
            counter += 1
        }
    }
}

I can't tell what Nevin's requirements are, but in the case of Array here is a non-quadratic solution:

extension RangeReplaceableCollection where Self: MutableCollection {
  mutating func bar() {
    var r = startIndex // index for reading
    var w = startIndex // index for writing

    while r < endIndex {
      let x = self[r]
      doSomething(x)

      if !shouldRemove(x) {
        if w < r {
          self[w] = x
        }
        formIndex(after: &w)
      }
      formIndex(after: &r)
    }
    removeSubrange(w...)
  }
}

So, the "do something" applies to all the elements, both the future-kept and future-rejected ones, right? And you can only test during the "do something," and not in an independent pass either before or after the D.S., right?

  • Declare var keptIndices: [DoSomething.Index]. Set its capacity to that of doSomething.underestimatedCount.
  • In your "do something" pass, append the index of the current element if it will be one you'll keep to keptIndices.
  • After the loop, make a new array of keptIndices.map { doSomething[$0] }.
  • Splat-assign that new array over doSomething.

Don't overthink it.

An issue was how to do this without triggering a copy-on-write. Given that, we might also be interested in avoiding copy-then-overwrite behaviour.

I won’t try writing the fastest solution, but the most idiomatic is definitely:

let list: [Foo] = ...
let filteredList = list.filter { shouldRemove($0) }

In the Apple example, where you have to do two removes, I’d write:

var springs: Set<SKPhysicsJointSpring> = []

func update(_ currentTime: TimeInterval, for scene: SKScene) {
    let invalidSprings = Set( springs.filter { $0.isStretchedTooFar } )
    springs.subtract(invalidSprings)
    invalidSprings.forEach { scene.physicsWorld.remove($0) }
}

[...]

internal extension SKPhysicsJointSpring {
    let maxSpringDistance: CGFloat = 31
    var isStretchedTooFar: Bool {
        guard let bodyAPosition = spring.bodyA.node?.position,
                let bodyBPosition = spring.bodyB.node?.position else { return true }
        let distance = hypot(bodyAPosition.x - bodyBPosition.x,
                               bodyAPosition.y - bodyBPosition.y)
        return distance > maxSpringDistance
    }
}

Note that keeping a Set of springs instead of an Array makes them a lot easier to work with as, well, sets, which is what they really are being used as.

Again, some might complain this isn’t the fastest, but to them I’d say: “It won’t be a problem when you use it.” In this case, how many springs is your scene going to have? 100? 1000? The actual looping code would still run in nanoseconds. So your goal in this case should be to use the clearest code, not the fastest.

-Wil

That one is still quadratic.

That one is just an open-coded version removeAll(where:) but it will be inferior than an implementation that uses swap on elements containing references because of extra ARC traffic.

That one doesn't mutate the list in place.

Y'all: if there's an existing algorithm that does what you need, don't try to code it yourself. Or as Sean Parent says: NO RAW LOOPS.

6 Likes

So my question is, what’s the best way to write this type of code, so it will run in linear time without triggering CoW?

I think I fulfilled the original requirements: Does this trigger CoW? Is it linear? Is it good code?

Note that I was trying to deal with the original Apple SpriteKit code that was listed, which can’t use removeAll because the objects need to be removed from the SpriteKit physicsWorld one by one.

-Wil

Of course it can:

springs.removeAll { spring in
  doSomething(spring)
  let distance = ...
  
  if distance > threshold {
    scene.physicsWorld.remove(spring)
    return true
  }
  return false
}

Now, the more general scenario is when you want to process every element, mutate some of them, and remove others.

That would require a new algorithm, effectively a version of removeAll(where:) whose predicate takes its argument inout.

Sorry, I don’t think I was clear, I meant you can’t call removeAll() on physicsWorld directly, so other approaches had to be taken, of which yours is one.

I don’t personally like the idea of having side-effects (eg, also removing from physicsWorld) inside the call to removeAll() but I acknowledge it’d work!

Whether something “triggers CoW” or not is often not a local phenomenon, because it depends on which things you may be persisting a copy of. For example, if you are storing a second copy of the value of springs somewhere, and it's a type implemented with CoW, that first subtract operation, assuming it's a mutation, is going to create new storage. There's nothing inherently CoW-triggering in that code, though.

That depends on the complexity of springs.subtract and physicsWorld.remove(). If the former is linear and the latter is O(1), the whole thing is linear.

Is it good code?

Hard to say. forEach is a little bit worse for readability than a raw for loop, so if those are your only options, I'd go with the latter. However, NO RAW LOOPS says to factor that loop into an algorithm (extension method on PhysicsWorld or whatever type that is) that removes the elements of a collection from a physics world. Even if you can't improve the performance, doing that would make the code better.

Terms of Service

Privacy Policy

Cookie Policy