Inconsistent behavior with mutating arrays during for-in loops

I noticed inconsistent behavior that I don't understand when it comes to Swift's copy-on-write mechanisms in loops.

This code works:

var myColorGroups = [["red", "blue", "green"], ["purple", "orange", "cyan"], ["pink", "brown", "mint"]]
for (index, group) in myColorGroups.enumerated() {
    for (index2, color) in group.enumerated() {
        if color == "red" {
            myColorGroups.remove(at: index)
        }
    }
}
print(myColorGroups)

However, the exact same operations with Ints doesn't:

var myNumberGroups = [[1, 5, 7], [76, 23, 72], [642, 5345, 1]]
for (index, group) in myNumberGroups.enumerated() {
    for (index2, number) in group.enumerated() {
        if number == 1 {
            myNumberGroups.remove(at: index)
        }
    }
}
print(myNumberGroups)

The error "Index out of range" is produced in the second example.

I don't understand why the first example is allowed, while the second is not. I also don't understand what the best conventions are for performing these kind of operations if the second example is intended behavior in Swift.

If you add "red" to the last group (as with "1"), it will also crash.

1 Like

The best approach is to use existing algorithms:

myColorGroups.removeAll{ $0.contains("red") }

If for some reason you need to write your own such method, it is important to be fastidiously careful that you only subscript with valid indices. There are particular caveats when writing a generic algorithm against, say, the RangeReplaceableCollection protocol, but if you only need to work with Array then you can use 0-based integer indices.

Even with Array, though, removing elements will change the length.

2 Likes

I do want to write my own methods and algorithms, mostly for educational and practice purposes. But I keep running into this sort of problem, and I'm struggling to find the right implementation pattern.

Could it be that Swift is not the right language to use in the context of creating algorithms without incorporating existing ones like removeAll()?

It's not that you can't write algorithms without calling into pre-existing ones, it's just that you need to know what operations invalidate indices for the collection you're using.

enumerated lets you iterate over a read-only copy of your original data (the copy usually gets optimized out), and the indices it gives you are just successive integers starting from zero. So, no changes you make to the original collection effect what you're iterating over when you use enumerate.

You really ought to work with indices directly if you're going write this kind of thing by hand. Array indices are always integers in the range 0..<count, so that should simplify writing the code a bit, since any indices preceding the item you removed are still valid.

1 Like

If you want to write an algorithm yourself, then you need to understand what a correct implementation of that algorithm looks like, how it works, and why it is correct.

The standard-library implementation of removeAll(where:) calls a non-public helper method _halfStablePartition whose implementation, while not particularly complex, is also not exactly obvious if you haven’t seen it before.

There’s a reason it’s written that way, and it has everything to do with the algorithm itself, not the language it’s written in.

Note that I linked the version of removeAll(where:) that’s defined on RangeReplaceableCollection & MutableCollection. There is another version without the MutableCollection constraint, which has a much simpler implementation but worse memory characteristics.

2 Likes

It’s also worth noting, if you did indeed want to achieve myColorGroups.removeAll{ $0.contains("red") }, that your original code has additional problems even aside from the potential to index out of bounds.

With this input:

var myColorGroups = [["red", "red"], ["blue"]]

Your original code will delete both sub-arrays, leaving myColorGroups completely empty.

So, again, I’ll reiterate that one should try to use existing algorithms wherever applicable, because that way each algorithm only needs to be implemented once, and the correct implementation may be tricky to get right.

Advice from Crusty: "No Raw Loops”:

Could be helpful to watch this and proceed with caution...and look into the Algorithms Package for inspiration in writing you own.

2 Likes

Thanks for the advice! I think I now understand things a lot better. I've come up with this solution.

var myNumberGroups = [[1, 5, 7], [76, 23, 72], [642, 5345, 1]]
var index = 0
while index < myNumberGroups.count {
    var indexChange = 0
    for index2 in 0..<myNumberGroups[index].count {
        if myNumberGroups.count > 0 && myNumberGroups[index][index2] == 1 {
            myNumberGroups.remove(at: index)
            indexChange -= 1
            break
        }
    }
    index += 1 + indexChange
}
print(myNumberGroups)

I've run custom and random test cases against it, and I think the general idea works.

Alternatively if I were using existing algorithms, this is probably what I'd do:

var myNumberGroups = [[1, 5, 7], [76, 23, 72], [642, 5345, 1]]
myNumberGroups.removeAll {
    if $0.contains(1) {
        return true
    }
    return false
}
print(myNumberGroups)

My first solution's runtime is not as good as the second, but I'm not trying to reinvent the wheel, I'm just preparing for coding interview situations where I'm told I can't use .removeAll().

1 Like

whenever you are going to REMOVE an item from an array... DO NOT interate in a forward direction, because at some point your index pointer will exceed the bounds of the array.

Iterate in REVERSE, and you will not have an issue

Thank you so much for that video. I just finished watching it and it answered all my questions! I'm going to look into the standard library algorithms more.

2 Likes

Thank @dabrahams

:star_struck:

you mean @crusty ?

5 Likes