rswift
I'm building a game using Swift. I've discovered that by separating the inner part of a pair of nested loops iterating over an array, the performance increases by nearly a whopping 10x. Why is this? Is it easier for the optimizer when it's separated in this way? Am I not understanding the reference counting overhead correctly here?
I'm on Xcode 12.5 and have also tested on Xcode 12.4, targeting a macOS (Intel) release build.
I've recreated some sample code shown below. Test1 and Test2 classes show the 2 implementations, with Test2 performing nearly 10x faster.
Test1 outputs: 0.006973981857299805
Test2 outputs: 0.0007460117340087891
import Foundation

final class Body {
    let value1: Int
    let value2: Int
    let value3: Int
    let value4: Int

    init(value1: Int, value2: Int, value3: Int, value4: Int) {
        self.value1 = value1
        self.value2 = value2
        self.value3 = value3
        self.value4 = value4
    }

    static func createBodies() -> [Body] {
        var bodies: [Body] = []
        for i in 1 ... 500 {
            // Fill with some arbitrary values.
            bodies.append(Body(value1: Int(i), value2: Int(i*10), value3: Int(i*100), value4: Int(i*1000)))
        }
        return bodies
    }
}

final class Test1 {
    var bodies: [Body]

    init() {
        self.bodies = Body.createBodies()
    }

    func run() -> Int {
        var total: Int = 0
        for body1 in bodies {
            var innerTotal = 0
            for body2 in bodies {
                // some random computation
                innerTotal += body1.value1*body2.value1+body1.value2*body2.value2+body1.value3*body2.value3+body1.value4*body2.value4
            }
            total += innerTotal
        }
        return total
    }
}

final class Test2 {
    var bodies: [Body]

    init() {
        self.bodies = Body.createBodies()
    }

    func helper(body1: Body, bodies: [Body]) -> Int {
        var innerTotal: Int = 0
        for body2 in bodies {
            // some random computation
            innerTotal += body1.value1*body2.value1+body1.value2*body2.value2+body1.value3*body2.value3+body1.value4*body2.value4
        }
        return innerTotal
    }

    func run() -> Int {
        var total: Int = 0
        for body1 in bodies {
            total += helper(body1: body1, bodies: bodies)
        }
        return total
    }
}

final class Main {
    static func main() {
        let test = Test1() // Change this to Test2 for roughly 10x more performance.
        let startTime = CFAbsoluteTimeGetCurrent()
        let total = test.run()
        let elapsedTime = CFAbsoluteTimeGetCurrent() - startTime
        print("elapsedTime: \(elapsedTime)")
        print("total: \(total)")
    }
}

Main.main()
From a quick look, the problem seems to be that in Test1 the inner loop (for body2 in bodies) reads the class's mutable property bodies through Test1.bodies.getter on every pass of the outer loop, and that access is what makes it slow. When you move the inner loop into a function, it iterates over the non-mutating bodies parameter instead. That access doesn't have to account for mutability, so it is faster.
So the solution to your performance issue is to make bodies a let:
final class Test1 {
    let bodies: [Body] // Make it a let instead of a var
    init() {
        self.bodies = Body.createBodies()
    }
Here is why the getter for your let (immutable) property is faster.
Emitted code for the let bodies: [Body] getter:
output.Test1.bodies.getter : [output.Body]:
        mov     rdi, qword ptr [r13 + 16]
        jmp     swift_retain@PLT
Emitted code for the var bodies: [Body] getter:
output.Test1.bodies.getter : [output.Body]:
        push    rbx
        sub     rsp, 32
        lea     rdi, [r13 + 16]
        lea     rsi, [rsp + 8]
        xor     edx, edx
        xor     ecx, ecx
        call    swift_beginAccess@PLT
        mov     rbx, qword ptr [r13 + 16]
        mov     rdi, rbx
        call    swift_retain@PLT
        mov     rax, rbx
        add     rsp, 32
        pop     rbx
        ret
Note that with bodies declared as a var, the getter has to emit an extra call to swift_beginAccess, which I believe is there to enforce exclusivity and possibly other mutation guarantees that I'm not aware of. In the end, that is what may be causing the performance issue.
So in short, the solution is just to change the bodies property of Test1 from var to let =]
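If bodies has to stay a var, another option worth trying is to read the property once into a local constant before the loops, so that both loops iterate over the local instead of going back through the getter. This is only a minimal sketch under assumptions: Test3 and snapshot are hypothetical names, Body is trimmed to two fields to keep it short, and its effect on the timings above has not been measured here.

```swift
// Trimmed-down Body (two fields instead of four) for illustration only.
final class Body {
    let value1: Int
    let value2: Int

    init(value1: Int, value2: Int) {
        self.value1 = value1
        self.value2 = value2
    }
}

// Test3 is a hypothetical variant, not one of the classes from the thread.
final class Test3 {
    var bodies: [Body] // Still mutable.

    init(bodies: [Body]) {
        self.bodies = bodies
    }

    func run() -> Int {
        // Read the mutable property a single time; the loops below only
        // touch this local constant.
        let snapshot = bodies
        var total = 0
        for body1 in snapshot {
            var innerTotal = 0
            for body2 in snapshot {
                innerTotal += body1.value1 * body2.value1 + body1.value2 * body2.value2
            }
            total += innerTotal
        }
        return total
    }
}

let test = Test3(bodies: (1...3).map { Body(value1: $0, value2: $0 * 10) })
print(test.run()) // prints 3636
```

This is essentially what Test2's helper does by taking bodies as a parameter, just without the extra function.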
Hope that helps :)
Thanks for the feedback. While the reasoning sounds good, it unfortunately didn't appear to make a significant difference. If you can try running this code yourself, I'd be interested in seeing whether you can reproduce this performance issue. I've run this on two different Macs so far, with different Xcode versions, and I can consistently reproduce it.
I've also tried disabling the exclusive access to memory checks, and the performance issue still persists. I'm very confused by this.
That is strange. I could indeed reproduce the issue and see the difference when changing ... are you running that in release mode (-O)?
let bodies: 0.0009009838104248047
var bodies: 0.004495978355407715
Both on the Test1 class :)
I would post the whole code I just ran, but the change is literally just:
final class Test1 {
- var bodies: [Body]
+ let bodies: [Body]
init() {
self.bodies = Body.createBodies()
}
...
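Since single-shot timings like the ones above can vary from run to run, a small harness that repeats the measurement may make the let/var comparison easier to trust. The following is a rough sketch only: bestTime is a hypothetical helper (not from the thread), it uses Date rather than CFAbsoluteTimeGetCurrent, and the closure passed in is a stand-in for a call such as test.run().

```swift
import Foundation

// Hypothetical helper: runs `work` several times and reports the fastest
// wall-clock time together with the last result.
func bestTime(runs: Int, _ work: () -> Int) -> (seconds: Double, result: Int) {
    precondition(runs > 0, "need at least one run")
    var best = Double.infinity
    var result = 0
    for _ in 0..<runs {
        let start = Date()
        result = work()
        let elapsed = Date().timeIntervalSince(start)
        best = min(best, elapsed)
    }
    return (best, result)
}

// Stand-in workload; replace the closure body with the code under test.
let sample = bestTime(runs: 5) { (1...1000).reduce(0, +) }
print("best: \(sample.seconds) s, result: \(sample.result)")
```

Printing the result alongside the time also keeps the optimizer from discarding the work being measured.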
Perhaps I spoke too soon; you are correct. It turns out this fixes it on Xcode 12.5 but NOT on 12.4. Thanks again!
I never realized how much overhead can be created by accessing the getter of a mutable array. I usually try to stick with structs for performance-critical code, but sometimes I end up needing to work with classes, and that's usually when I run into performance issues. I'll be sure to keep this in mind now when accessing the getter of a mutable member inside a performance-critical loop.
No problem, happy to help =]