First off, props to the folks in the Swift Discord channel, especially the folks in #troubleshooting.
I was having trouble getting optimal performance out of a breadth-first search algorithm. It uses a bitvector to keep track of visited vertices (using a bitvector in this way tends to provide better cache coherency for large graphs).
Anyway, I borrowed code from Buckets-Swift/BitArray.swift at master · mauriciosantos/Buckets-Swift · GitHub and went to work.
Notice Line 201:
static let IntSize = MemoryLayout<Int>.size * 8
If we change this to
static let IntSize = Int.bitWidth
the BFS code speeds up by 40%. (This is with -O)
I don't understand why this should be the case. It seems a bit too magical to me, and I wonder where else I'm unintentionally giving up performance.
(After this change the Swift BFS is within 1.5x the Julia BFS, which itself is within 1.2x of C++.)
Could someone help me understand this?
2 Likes
If you'd like to test this yourself, the code is at GitHub - sbromberger/SwiftGraphs: Graphs in Swift
and the data file can be downloaded from Dropbox - File Deleted (Warning: ~250 MB gzipped, ~495 MB uncompressed).
DaveZ
(David Zarzycki)
3
By default, Swift globals are lazily initialized.
In this case, the compiler proved that Int.bitWidth was constant and avoid the lazy initialization overhead. In contrast, the compiler failed to prove that MemoryLayout<Int>.size * 8 is constant and therefore lazy initialization overhead was used.
1 Like
Could it ever be variable? If not, is this a candidate for optimization in the compiler itself?
DaveZ
(David Zarzycki)
5
The result is "obviously" constant, but not to the compiler apparently. Please file a bug and we'll see what the optimization experts have to say. Thanks!
If you want to look at this... warning nerd snipe.
I think this is most likely a global opt property but my memory might be wrong. But it could also be due to a phase ordering issue. I would run swiftc with the flag -Xllvm -sil-print-all. That will print out the result of each optimization pass on each function. I would look and see what pass gets rid of a, but not b after it runs. We have similar tests for stuff like this here:
And GlobalOpt is here:
2 Likes