One line gives me a 40% speedup. Why?

First off, props to the folks in the Swift Discord channel, especially the folks in #troubleshooting.

I was having trouble getting optimal performance out of a breadth-first search algorithm. It uses a bitvector to keep track of visited vertices (using a bitvector in this way tends to provide better cache coherency for large graphs).

Anyway, I borrowed code from https://github.com/mauriciosantos/Buckets-Swift/blob/master/Source/BitArray.swift and went to work.

Notice Line 201:

static let IntSize = MemoryLayout<Int>.size * 8

If we change this to

static let IntSize = Int.bitWidth

the BFS code speeds up by 40%. (This is with -O)

I don’t understand why this should be the case. It seems a bit too magical to me, and I wonder where else I’m unintentionally giving up performance.

(After this change the Swift BFS is within 1.5x the Julia BFS, which itself is within 1.2x of C++.)

Could someone help me understand this?

2 Likes

If you’d like to test this yourself, the code is at https://github.com/sbromberger/SwiftGraphs

and the data file can be downloaded from https://www.dropbox.com/s/6i362j05y92m57l/indptrvecs-4m-30m.0based.txt.gz?dl=0 (Warning: ~250 MB gzipped, ~495 MB uncompressed).

By default, Swift globals are lazily initialized.

In this case, the compiler proved that Int.bitWidth was constant and avoid the lazy initialization overhead. In contrast, the compiler failed to prove that MemoryLayout<Int>.size * 8 is constant and therefore lazy initialization overhead was used.

1 Like

Could it ever be variable? If not, is this a candidate for optimization in the compiler itself?

The result is “obviously” constant, but not to the compiler apparently. Please file a bug and we’ll see what the optimization experts have to say. Thanks!

Filed on @sbromberger’s behalf, per request. https://bugs.swift.org/browse/SR-7613

3 Likes

If you want to look at this… warning nerd snipe.

I think this is most likely a global opt property but my memory might be wrong. But it could also be due to a phase ordering issue. I would run swiftc with the flag -Xllvm -sil-print-all. That will print out the result of each optimization pass on each function. I would look and see what pass gets rid of a, but not b after it runs. We have similar tests for stuff like this here:

And GlobalOpt is here:

2 Likes