Performance of array access vs iterator

I'm currently in the early stages of writing an iOS game in Swift, and last night noticed that it was occasionally dropping frames. A bunch of profiling later, it turned out that what was taking a massive proportion of the frame time was code that iterated over an array and looked like this...

for y in 0..<height {
    for x in 0..<width {
        someArray[x + y * width].someProperty = false
    }
}

The code originally used x and y for other stuff, but this is no longer necessary. Hurray, I thought, I can change it to this and it will be much faster...

for element in someArray { element.someProperty = false }

Alas the code took even longer to execute, about 3x as long, ~20-24ms vs ~8ms. I thought that iteration like this was extremely efficient, and that array indexing would be comparatively inefficient as there would be pointer math and bounds checking etc.

Why is the foreach implementation so much slower? What would be the most efficient way to iterate over the array, short of re-arranging the memory layout and doing memset type shenanigans?

For reference, this was all in debug mode, and the array is a collection of classes, not structs.

I'd suggest you first of all measure again in release mode - talking about performance in debug builds is like measuring an athlete's performance before the race, when they're just warming up :grinning:

3 Likes

Yeah I understand a release build will give some very different performance characteristics, but I want to understand why the behavior in debug is so different from what I expected. When performance is bad in debug mode I hate relying on release optimizations to fix it, even if profiling the release build shows everything is fine!

1 Like

That's understandable. But I suspect that people who actually know what they're talking about (unlike me :grinning:) will suggest the same...

Try to look up the implementation of forEach in the stdlib (I guess you'd find it in stdlib/public/core, but that's just a guess) - maybe you'll find some clues there.
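
If memory serves, it's roughly just this (a simplified sketch from memory, not the exact stdlib source):

extension Sequence {
    func forEach(_ body: (Element) throws -> Void) rethrows {
        // Just a plain for-in loop over self, calling your closure for each element.
        for element in self {
            try body(element)
        }
    }
}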

I don't think I'm educated enough to help you more than that, though. Sorry :confused:

2 Likes

Because in debug mode Swift's zero-cost abstractions are not zero-cost.

It is common in Swift to add nice API conveniences through high level functions and properties that express the intent of our code. They let us write the code in a way that explains what we're trying to do, rather than focusing too much on making the computer do what we want. These are powerful abstractions.

However, we don't want to have to pay for them. To ensure that we don't, the optimiser is able to see through these abstractions and make them go away at runtime. This allows our high-level, expressive code to run just as fast as the equivalent low-level code we could have written by hand.
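
A concrete example that's relevant to this thread: Sequence's forEach is just a method that calls your closure once per element. In a release build the optimiser can inline both the method and the closure, so the two forms below end up costing roughly the same; in a debug build neither is inlined, so the closure-based form pays an extra function call per element. (This reuses the someArray and someProperty names from the original post.)

// High-level: expresses the intent directly.
someArray.forEach { $0.someProperty = false }

// The lower-level equivalent the optimiser can reduce it to.
for element in someArray {
    element.someProperty = false
}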

So why doesn't this work in debug mode? Well, the short answer is because debug mode supports debugging. When I attach a debugger to my program, I'd really like to be able to ask my debugger questions like:

  • What's the value of this variable?
  • What line of the function have I reached?
  • What arguments were passed to this function?

In release mode it's totally possible that none of these questions can be answered. Code can be rearranged, variables may be optimised out of existence, function parameters may have long since been written over. This is great for performance, but not good for debugging.

This cannot be stressed too strongly: the performance of your program in debug mode is often entirely unrelated to the performance of your program in release mode. The only reason to profile in debug mode is when your goal is "write code that runs fast in debug mode", and that goal is often in conflict with the goal of "write good code".

As to why your code specifically is slower, the first problem is that foreach is not a keyword in Swift, so your second snippet is not valid.

The most efficient way to iterate the array is for element in array. This will optimise to the ideal outcome in Swift.

9 Likes

My bad, I typed the code from memory at work, and had C# on the brain. I've edited it to reflect the actual code. It is actually a standard for x in y loop, which is why I was surprised it was so slow - I expected the abstraction costs around pointer arithmetic/array indexing and bounds checking to be higher than the abstraction cost of iteration, even in debug code.

To be clear, debug performance isn't my goal, but when something surprising (to me, anyway) like this happens I prefer to understand exactly why it happens before relying on release optimizations to make everything ok.

Ah, ok. This is the core source of the misunderstanding. Your mental model of Swift is that for...in loops do not incur the abstraction of pointer arithmetic and array indexing. That's not accurate. Array iterating is pointer arithmetic and bounds checking. The way iterating works in Swift is that for x in y is translated to this:

var iterY = y.makeIterator()
while let x = iterY.next() {
    // Loop body
}
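
Applied to your loop, that means the compiler effectively generates something like this (reusing the names from your snippet):

var iterator = someArray.makeIterator()
while let element = iterator.next() {
    element.someProperty = false
}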

The type returned from Array's makeIterator() call is IndexingIterator. This type should give you a clue about how it works, which is roughly like this (I've simplified to remove some generics and some other helpers):

struct IndexingIterator<T> {
    var base: Array<T>
    var currentIndex: Int

    init(base: Array<T>) {
        self.base = base
        self.currentIndex = base.startIndex
    }

    mutating func next() -> T? {
        guard self.currentIndex < self.base.endIndex else { return nil }
        let element = self.base[self.currentIndex]  // calls Array's subscript
        self.currentIndex += 1
        return element
    }
}

Notice that each call to next calls the Array subscript! This Array subscript call has to incur bounds checking and pointer arithmetic. Worse, because it's unoptimised debug mode generic code the pointer arithmetic is quite slow, as the implementation has to ask the runtime how big MyClass is to do the pointer math. And because the optimiser is turned off the body of the next() call can't be inlined, so the compiler can't see that a bunch of these checks aren't necessary.
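
To make that concrete, here's a rough sketch of the work that subscript call implies (my own simplification for illustration, not the actual stdlib source):

struct SimplifiedArray<Element> {
    var buffer: UnsafeMutablePointer<Element>
    var count: Int

    subscript(index: Int) -> Element {
        // Bounds check: trap if the index is out of range.
        precondition(index >= 0 && index < count, "Index out of range")
        // Pointer arithmetic: in unspecialised generic code, stepping to the
        // right element also means asking the runtime for the size of Element.
        return buffer[index]
    }
}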

Why is Array so fast in release mode? Because the Swift compiler does a great job of optimising through those abstractions. In release mode, it can see that next() always increments the index by 1, that it always does the subscript, that the subscript is actually pointer arithmetic and bounds checking, that it can hoist the bounds check, and then all of a sudden what you have is fast code.

Things like this are why we constantly remind people not to benchmark debug mode Swift code. It's entirely unlike the code you'll get in release mode.
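
If you want to see the gap for yourself, a rough self-contained harness like the one below can be compiled once with -Onone and once with -O. (The Cell class and the dimensions are invented stand-ins for the types and sizes in the original post.)

import Dispatch

// Invented stand-in for the class whose property is being reset each frame.
final class Cell {
    var someProperty = true
}

let width = 512
let height = 512
let someArray = (0..<(width * height)).map { _ in Cell() }

let start = DispatchTime.now()
for element in someArray {
    element.someProperty = false
}
let elapsedNs = DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds
print("Elapsed: \(Double(elapsedNs) / 1_000_000) ms")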

13 Likes

That's an awesome answer, thank you!

1 Like

While @lukasa's answer is spot on as far as the reasons why performance in debug is slow, it is a little unfortunate that the iterator for Array is quite so generic in its implementation, and I'd like it to be faster in debug. Iterating over an array even in debug ought to just be a question of bumping a pointer until it reaches the end, with no bounds checks in between. This couldn't be changed easily for various reasons, but my hope is that when we introduce generators, they will be a lot faster.
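
For what it's worth, that pointer-bumping loop is something you can spell out by hand today via withUnsafeBufferPointer. This is only a sketch reusing the names from the original post, and whether it actually beats the plain for-in loop in a debug build is something to measure rather than assume:

someArray.withUnsafeBufferPointer { buffer in
    guard var pointer = buffer.baseAddress else { return }
    let end = pointer + buffer.count
    while pointer < end {
        // The element is a class instance, so we can mutate it through the pointer.
        pointer.pointee.someProperty = false
        pointer += 1
    }
}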

11 Likes

We also inherited the "debug at -O0" vs "release at -O2" model from what's commonly done in C-family languages, but it arguably isn't a good model for languages like Swift or even C++ that delegate so much of their functionality to their standard libraries. I hope that in time, we'll have a more sophisticated debug story, something more akin to a JavaScript tracing JIT that can optimize hot paths but still deoptimize when breakpoints or tracing are desired.

8 Likes