# Why does adding conformance to RandomAccessCollection cause a performance regression?

I have a struct called `BitVector` that could be `: RandomAccessCollection` (RAC). When I add RAC to the definition, I get a horrible performance regression in functions that use this struct (but I'm not using anything that RAC requires, like `startIndex` or `endIndex`). This confuses me: why would declaring that my struct implements the property cause a slowdown?

I’m seeing this in another, completely unrelated struct as well, though the regression isn’t the 2x it was with `BitVector`; in this latest case, it’s “only” about 20%.

Can you post your timing code that exposes the regression, so we can see exactly what you’re measuring?

It’s here (that is, this is an update to main.swift in the current master branch, which has the 40-run version of this code):

```swift
let h = Graph<UInt32>(fromBinaryFile: "/Users/me/dev/swift/SwiftGraphs/data/indptrvecs-4m-30m.0based.bin")

for i in 1 ... 100_000 {
    start = DispatchTime.now()
    let bfs1 = h.BFS(from: 0)
    end = DispatchTime.now()
    let timediff = Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / ms
    times.append(timediff)
    let timeAvg = times.reduce(0, +) / Double(times.count)
    let sumOfSquaredAvgDiff = times.map { pow($0 - timeAvg, 2.0) }.reduce(0, +)
    let timeStd = sqrt(sumOfSquaredAvgDiff / Double(times.count - 1))
    print("\(i): curr: \(timediff.truncate(places: 3)), min: \(times.min()!.truncate(places: 3)), max: \(times.max()!.truncate(places: 3)), avg: \(timeAvg.truncate(places: 3)), std: \(timeStd.truncate(places: 3))")
}
```

Without the property, I’m in the 600-700ms range for a single BFS. With the property, I’m in the 1400-1500ms range.

If you build and run the current master, you will get timings that look like this*:

```text
Times: min: 681.877303, max: 769.144946, avg: 703.053950575, std: 22.2595559447778
```

If you make a single change and add `: RandomAccessCollection` to line 9 of `BitVector.swift` so that it looks like this:

```swift
public struct BitVector: RandomAccessCollection {
```

you should see times like this*:

```text
Times: min: 1209.338157, max: 1727.203483, avg: 1333.903255475, std: 125.676044887502
```

(* exact timings may differ based on machine, of course, but what’s important is the regression.)

In cases like this, I always try to reduce it to a minimal, self-contained (command line) program that clearly demonstrates the issue.

Your current test depends on a file that readers of this forum don’t have: `"/Users/me/dev/swift/SwiftGraphs/data/indptrvecs-4m-30m.0based.bin"`. It would be better if the test just produced some random data instead, which would probably also mean that `LineReader` could be skipped.

Having all the code in a single post, so that it can simply be copy-pasted to reproduce the issue, makes it much easier for people who wish to look into it.

(Also, by “adding a property” I assume you mean “adding conformance to a protocol (`RandomAccessCollection`)”?)
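For reference, such a conformance declaration really can be a bare one-liner: once an integer-indexed type has `startIndex`, `endIndex`, and a subscript, every other `RandomAccessCollection` requirement is satisfied by default implementations. A minimal sketch (using a hypothetical `Bits` type, not the code under discussion):

```swift
// Minimal sketch: an integer-indexed type picks up the whole
// RandomAccessCollection API (iteration, count, prefix, map, ...)
// from protocol extensions; the conformance extension adds no members.
struct Bits {
    private var words: [UInt] = [0b1011]
    var startIndex: Int { return 0 }
    var endIndex: Int { return words.count * UInt.bitWidth }
    subscript(i: Int) -> Bool {
        return words[i / UInt.bitWidth] & (1 << (i % UInt.bitWidth)) != 0
    }
}
extension Bits: RandomAccessCollection {}

let b = Bits()
// Protocol-provided members now work:
print(b.count, b.prefix(4).map { $0 ? 1 : 0 })
```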

Below is such a test program, measuring the performance of your `BitVector`'s `testAndSet()` method (which calls its set and get subscript internally).

```swift
import AppKit

public struct BitVector {
    private var bits: Array<Int>

    private let blockSize = Int.bitWidth // NOTE: Should probably be static

    public init(repeating: Bool, count: Int) {
        let fillInt = repeating ? ~0 : 0
        let nBlocks: Int = (count / blockSize) + 1
        bits = [Int](repeating: fillInt, count: nBlocks)
    }

    // Added this for testing:
    init(randomBitBlocks blockCount: Int) {
        let byteCount = (Int.bitWidth / 8) * blockCount
        bits = [Int](repeating: 0, count: blockCount)
        let rc = SecRandomCopyBytes(nil, byteCount, &bits)
        precondition(rc == errSecSuccess)
    }

    private func getBlockAndOffset(of bitIndex: Int) -> (Int, Int) {
        return (bitIndex / blockSize, bitIndex % blockSize)
    }

    public var startIndex: Int { return 0 }
    public var endIndex: Int { return bits.count * blockSize }

    public subscript(_ bitIndex: Int) -> Bool {
        get {
            let (block, offset) = getBlockAndOffset(of: bitIndex)
            let mask = 1 << offset
            return mask & bits[block] != 0
        }
        set {
            let (block, offset) = getBlockAndOffset(of: bitIndex)
            let mask = 1 << offset
            if newValue {
                bits[block] |= mask
            } else {
                bits[block] &= ~mask
            }
        }
    }

    public mutating func testAndSet(_ bitIndex: Int) -> Bool {
        let (block, offset) = getBlockAndOffset(of: bitIndex)
        let mask = 1 << offset
        let oldval = mask & bits[block] != 0
        if !oldval {
            bits[block] |= mask
        }
        return oldval
    }
}

// Comment/uncomment to see if it makes any difference in timings of test below:
// extension BitVector : RandomAccessCollection {}

func test() {
    for _ in 0 ..< 5 {
        var bv = BitVector(randomBitBlocks: 1_000_000 / Int.bitWidth)
        var setCount = 0
        let t0 = CACurrentMediaTime()
        for i in bv.startIndex ..< bv.endIndex {
            setCount += bv.testAndSet(i) ? 1 : 0
        }
        let t1 = CACurrentMediaTime()
        print("time:", t1 - t0, "seconds. (\(setCount))")
    }
}
test()
```

Using Xcode 9.3 and compiling with `-O` will result in output like the following on my MBP:

```text
time: 0.00946520201978274 seconds. (499910)
time: 0.00749993498902768 seconds. (500297)
time: 0.00683424001908861 seconds. (500176)
time: 0.00753337901551276 seconds. (500429)
time: 0.00689123698975891 seconds. (498780)
```

And the timings are the same whether I comment or uncomment this line:

```swift
// extension BitVector : RandomAccessCollection {}
```

So unfortunately this didn’t reproduce the slowdown that you’re seeing, but perhaps you can devise a similar little (command line) program that does?


I tried most of the weekend to get it down to a manageable example, but I had no luck. The graph file is on Dropbox, but please be aware it’s ~244MB.

Sorry this isn’t an easier test, but with that file in the correct directory, you should have my environment fully replicated.

Also: `testAndSet` is not the cause of the regression. If I merely use the getter followed by the setter inside the conditional, I get the same performance.

Inside what conditional?

Also, just to rule out any unrelated causes, you might try what happens if you use e.g. `CACurrentMediaTime()` instead of `DispatchTime.now()`.
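A minimal sketch of the two timing APIs, in case that helps rule the timer out (the busy-work loop is just a placeholder for the code being measured):

```swift
import Dispatch
import QuartzCore // CACurrentMediaTime (Apple platforms only)

func busyWork() -> Int {
    return (0 ..< 1_000_000).reduce(0, &+)
}

// DispatchTime: monotonic clock, integer nanosecond ticks.
let d0 = DispatchTime.now()
_ = busyWork()
let d1 = DispatchTime.now()
print("DispatchTime: \(Double(d1.uptimeNanoseconds - d0.uptimeNanoseconds) / 1_000_000) ms")

// CACurrentMediaTime: mach_absolute_time-based, seconds as a Double.
let c0 = CACurrentMediaTime()
_ = busyWork()
let c1 = CACurrentMediaTime()
print("CACurrentMediaTime: \((c1 - c0) * 1000) ms")
```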

`testAndSet` does two things: it gets a value at a given index and sets it to `true` if it’s not set already. This is functionally equivalent to

```swift
if !visited[Int(neighbor)] {
    visited[Int(neighbor)] = true
    ...
}
```

but my intent with `testAndSet` was to see whether I could speed things up by getting rid of one modulo arithmetic operation since those can be expensive if not optimized. (In Swift’s case, the two approaches are pretty much identical.)

I’ll try `CACurrentMediaTime()`. Thanks for the tip.

`CACurrentMediaTime()` didn’t change the results:

without `:RandomAccessCollection`:
`Times: min: 640.769736026414, max: 765.039346995763, avg: 676.587614505843, std: 33.5452987882788`

with `:RandomAccessCollection`:
`Times: min: 1102.71427500993, max: 1316.3051699521, avg: 1187.05889147823, std: 50.3028173711964`

`testAndSet` is using `subscript` which is using `getBlockAndOffset`, and this is all of `BitVector` except the initializer. So do you think the cause of the regression might be the initializer or something unrelated to `BitVector`?

I think it’s unrelated to `testAndSet`, since when I replace that block with the equivalent get-then-set code above, I still have the performance issue when I add `: RandomAccessCollection` to `BitVector`.

I also just got rid of the `RandomAccessCollection` extension I had in there (it defined `searchSortedIndex` which is used on some graph initializers), and that made no difference to the results.

I don’t think the cause is the graph initializer, since the graph is already initialized by the time `BFS` is called.

Ok, I see what you mean; I just figured `testAndSet` covered about all of `BitVector`’s functionality, so that it should show the regression if it was in there somewhere. Anyway, I have downloaded the data and am able to reproduce your results, on my machine:

• With `BitVector` *not* conforming to `RandomAccessCollection`:
`Times: min: 751.859559, max: 960.927005, avg: 833.25661605, std: 53.2723263777442`

• With `BitVector` conforming to `RandomAccessCollection`:
`Times: min: 1328.528196, max: 1817.240632, avg: 1480.1516392, std: 122.247619056441`


Thanks for sticking with me. I’m new to Swift, so if you find a way to make an MWE, please share!

This is still a lot of code (I removed some), and it’s still dependent on the data file (in the Dropbox link above):

```swift
import Foundation

/// Read text file line by line
public class LineReader {
    public let path: String

    fileprivate let file: UnsafeMutablePointer<FILE>!

    init?(path: String) {
        self.path = path
        file = fopen(path, "r")
        guard file != nil else { return nil }
    }

    public var nextLine: String? {
        var line: UnsafeMutablePointer<CChar>?
        var linecap: Int = 0
        defer { free(line) }
        return getline(&line, &linecap, file) > 0 ? String(cString: line!) : nil
    }

    deinit {
        fclose(file)
    }
}

public struct BitVector {
    private var bits: Array<Int>

    private let blockSize = Int.bitWidth

    public init(repeating: Bool, count: Int) {
        let fillInt = repeating ? ~0 : 0
        let nBlocks: Int = (count / blockSize) + 1
        bits = [Int](repeating: fillInt, count: nBlocks)
    }

    private func getBlockAndOffset(of bitIndex: Int) -> (Int, Int) {
        return (bitIndex / blockSize, bitIndex % blockSize)
    }

    public var startIndex: Int { return 0 }
    public var endIndex: Int { return bits.count * blockSize }

    public subscript(_ bitIndex: Int) -> Bool {
        get {
            let (block, offset) = getBlockAndOffset(of: bitIndex)
            let mask = 1 << offset
            return mask & bits[block] != 0
        }
        set {
            let (block, offset) = getBlockAndOffset(of: bitIndex)
            let mask = 1 << offset
            if newValue {
                bits[block] |= mask
            } else {
                bits[block] &= ~mask
            }
        }
    }

    public mutating func testAndSet(_ bitIndex: Int) -> Bool {
        let (block, offset) = getBlockAndOffset(of: bitIndex)
        let mask = 1 << offset
        let oldval = mask & bits[block] != 0
        if !oldval {
            bits[block] |= mask
        }
        return oldval
    }
}

public struct Edge<T: BinaryInteger> {
    let src: T
    let dst: T

    public init(_ s: T, _ d: T) {
        src = s
        dst = d
    }

    public var ordered: Edge {
        if src > dst {
            return Edge(dst, src)
        } else { return self }
    }

    public var reverse: Edge {
        return Edge(dst, src)
    }
}

extension Edge: CustomStringConvertible {
    public var description: String {
        return "\(src) -> \(dst)"
    }
}

extension Edge: Comparable {
    public static func == (lhs: Edge, rhs: Edge) -> Bool {
        return lhs.src == rhs.src && lhs.dst == rhs.dst
    }

    public static func < (lhs: Edge, rhs: Edge) -> Bool {
        if lhs.src < rhs.src { return true }
        if lhs.src > rhs.src { return false }
        return lhs.dst < rhs.dst
    }
}

public struct Graph<T: BinaryInteger> {
    let rowidx: Array<T>
    let colptr: Array<Array<T>.Index>

    static var eltype: Any.Type { return T.self }
    public var nv: T { return T((colptr.count - 1)) }
    public var ne: Int { return rowidx.count / 2 }

    public var vertices: StrideTo<T> {
        return stride(from: T(0), to: nv, by: +1)
    }

    public init(fromBinaryFile fileName: String) {
        let file = URL(fileURLWithPath: (fileName as NSString).expandingTildeInPath)
        let fileHandle = try! FileHandle(forReadingFrom: file)
        let magic = fileHandle.readData(ofLength: 4)
        guard magic.elementsEqual("GRPH".utf8) else {
            fatalError("\(file) was not a graph file")
        }
        let colSize = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size).withUnsafeBytes { (ptr: UnsafePointer<UInt32>) -> Int in
            return Int(ptr.pointee.bigEndian)
        }
        colptr = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size * colSize).withUnsafeBytes({ (ptr: UnsafePointer<UInt32>) -> [Int] in
            let bufferPointer = UnsafeBufferPointer(start: ptr, count: colSize)
            return [Int](bufferPointer.lazy.map { Int($0.bigEndian) })
        })
        let rowSize = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size).withUnsafeBytes { (ptr: UnsafePointer<UInt32>) -> Int in
            return Int(ptr.pointee.bigEndian)
        }
        rowidx = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size * rowSize).withUnsafeBytes({ (ptr: UnsafePointer<UInt32>) -> [T] in
            let bufferPointer = UnsafeBufferPointer(start: ptr, count: rowSize)
            return [T](bufferPointer.lazy.map { T($0.bigEndian) })
        })
    }

    private func vecRange(_ s: Array<T>.Index) -> CountableRange<Array<T>.Index> {
        let rStart = colptr[s]
        let rEnd = colptr[s + 1]
        return rStart ..< rEnd
    }

    public func neighbors(of vertex: T) -> ArraySlice<T> {
        let range = vecRange(Array<T>.Index(vertex))
        return rowidx[range]
    }

    public func BFS(from sourceVertex: Int) -> [T] {
        let numVertices = Int(nv)
        let maxT = ~T()
        var visited = BitVector(repeating: false, count: numVertices)
        var vertLevel = Array<T>(repeating: maxT, count: numVertices)
        var nLevel: T = 1
        var curLevel = [T]()
        curLevel.reserveCapacity(numVertices)
        var nextLevel = [T]()
        nextLevel.reserveCapacity(numVertices)

        visited[sourceVertex] = true
        vertLevel[sourceVertex] = 0
        curLevel.append(T(sourceVertex))

        while !curLevel.isEmpty {
            for vertex in curLevel {
                for neighbor in neighbors(of: vertex) {
                    if !visited.testAndSet(Int(neighbor)) {
                        nextLevel.append(neighbor)
                        vertLevel[Int(neighbor)] = nLevel
                    }
                }
            }
            nLevel += 1
            curLevel.removeAll(keepingCapacity: true)
            (curLevel, nextLevel) = (nextLevel, curLevel)
            curLevel.sort()
        }
        return vertLevel
    }
}

extension Graph: CustomStringConvertible {
    public var description: String {
        return "{\(nv), \(ne)} Graph"
    }
}

//-----------------------------------------------------------------------------
// Comment/uncomment to see difference in timings:
//-----------------------------------------------------------------------------
// extension BitVector : RandomAccessCollection {}

// (
// With BitVector *not* conforming to RandomAccessCollection:
// Times: min: 751.859559, max: 960.927005, avg: 833.25661605, std: 53.2723263777442
//
// With BitVector conforming to RandomAccessCollection:
// Times: min: 1328.528196, max: 1817.240632, avg: 1480.1516392, std: 122.247619056441
// )

var start = DispatchTime.now()
// The graph data file from the Dropbox link above:
let h = Graph<UInt32>(fromBinaryFile: "/Users/me/dev/swift/SwiftGraphs/data/indptrvecs-4m-30m.0based.bin")
var end = DispatchTime.now()
print("graph read took \(Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000.0) ms")

print(h)
var times = [Double]()
var ms = 1_000_000.0
for i in 1 ... 40 {
    start = DispatchTime.now()
    let bfs1 = h.BFS(from: 0)
    end = DispatchTime.now()
    let timediff = Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / ms
    // let timeit = end2.uptimeNanoseconds - start2.uptimeNanoseconds
    print("Run \(i): BFS from vertex 0 to \(timediff) ms; sum = \(bfs1.reduce(0, +))")
    times.append(timediff)
}

let timeAvg = times.reduce(0, +) / Double(times.count)
let sumOfSquaredAvgDiff = times.map { pow($0 - timeAvg, 2.0) }.reduce(0, +)
let timeStd = sqrt(sumOfSquaredAvgDiff / Double(times.count - 1))
print("Times: min: \(times.min()!), max: \(times.max()!), avg: \(timeAvg), std: \(timeStd)")
```

If you check out the sbromberger/MWE branch, that might help.

I tried some things with the code (since I’ve run into similar issues that I never managed to isolate), but the only thing I could find was that the regression disappears if you annotate the get and set of `BitVector`’s subscript with `@_transparent`. I don’t know why this is, but I guess it does demonstrate a bug (or at least an opportunity for improvement) in the optimizer. But it would be nice if someone more skilled could find a smaller reproducing example, cc @Slava_Pestov, @Joe_Groff, @Andrew_Trick.

I’m attaching the code (it’s the files of your MWE branch, joined into one) in the state where I left it after finding this workaround (and after doing some other things that I tested but which made no difference, e.g. removing all `public`, sticking the test code in a func, and fixing the warnings that then appeared about changing some vars to lets).

```swift
import Foundation
import Dispatch
import QuartzCore

struct BitVector {
    private var bits: Array<Int>

    private let blockSize = Int.bitWidth

    init(repeating: Bool, count: Int) {
        let fillInt = repeating ? ~0 : 0
        let nBlocks: Int = (count / blockSize) + 1
        bits = [Int](repeating: fillInt, count: nBlocks)
    }

    private func getBlockAndOffset(of bitIndex: Int) -> (Int, Int) {
        return (bitIndex / blockSize, bitIndex % blockSize)
    }

    var startIndex: Int { return 0 }
    var endIndex: Int { return bits.count * blockSize }

    // NOTE: Adding @_transparent to the get and set of this subscript
    // removes the slowdown that is otherwise magically introduced by
    // conforming BitVector to RandomAccessCollection (see further down).
    // So if you comment out the two @_transparent and keep the conformance
    // to RandomAccessCollection, the slowdown will be there again.
    subscript(_ bitIndex: Int) -> Bool {
        @_transparent
        get {
            let (block, offset) = getBlockAndOffset(of: bitIndex)
            let mask = 1 << offset
            return mask & bits[block] != 0
        }
        @_transparent
        set {
            let (block, offset) = getBlockAndOffset(of: bitIndex)
            let mask = 1 << offset
            if newValue {
                bits[block] |= mask
            } else {
                bits[block] &= ~mask
            }
        }
    }
}

struct Edge<T: BinaryInteger> {
    let src: T
    let dst: T

    init(_ s: T, _ d: T) {
        src = s
        dst = d
    }

    var ordered: Edge {
        if src > dst {
            return Edge(dst, src)
        } else { return self }
    }

    var reverse: Edge {
        return Edge(dst, src)
    }
}

struct Graph<T: BinaryInteger> {
    let rowidx: Array<T>
    let colptr: Array<Array<T>.Index>

    static var eltype: Any.Type { return T.self }
    var nv: T { return T((colptr.count - 1)) }
    var ne: Int { return rowidx.count / 2 }

    var vertices: StrideTo<T> {
        return stride(from: T(0), to: nv, by: +1)
    }

    init(fromBinaryFile fileName: String) {
        let file = URL(fileURLWithPath: fileName)
        let fileHandle = try! FileHandle(forReadingFrom: file)
        let magic = fileHandle.readData(ofLength: 4)
        guard magic.elementsEqual("GRPH".utf8) else {
            fatalError("\(file) was not a graph file")
        }
        let colSize = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size).withUnsafeBytes { (ptr: UnsafePointer<UInt32>) -> Int in
            return Int(ptr.pointee.bigEndian)
        }
        colptr = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size * colSize).withUnsafeBytes({ (ptr: UnsafePointer<UInt32>) -> [Int] in
            let bufferPointer = UnsafeBufferPointer(start: ptr, count: colSize)
            return [Int](bufferPointer.lazy.map { Int($0.bigEndian) })
        })
        let rowSize = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size).withUnsafeBytes { (ptr: UnsafePointer<UInt32>) -> Int in
            return Int(ptr.pointee.bigEndian)
        }
        rowidx = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size * rowSize).withUnsafeBytes({ (ptr: UnsafePointer<UInt32>) -> [T] in
            let bufferPointer = UnsafeBufferPointer(start: ptr, count: rowSize)
            return [T](bufferPointer.lazy.map { T($0.bigEndian) })
        })
    }

    private func vecRange(_ s: Array<T>.Index) -> Range<Array<T>.Index> {
        let rStart = colptr[s]
        let rEnd = colptr[s + 1]
        return rStart ..< rEnd
    }

    func neighbors(of vertex: T) -> ArraySlice<T> {
        let range = vecRange(Array<T>.Index(vertex))
        return rowidx[range]
    }

    func BFS(from sourceVertex: Int) -> [T] {
        let numVertices = Int(nv)
        let maxT = ~T()
        var visited = BitVector(repeating: false, count: numVertices)
        var vertLevel = Array<T>(repeating: maxT, count: numVertices)
        var nLevel: T = 1
        var curLevel = [T]()
        curLevel.reserveCapacity(numVertices)
        var nextLevel = [T]()
        nextLevel.reserveCapacity(numVertices)

        visited[sourceVertex] = true
        vertLevel[sourceVertex] = 0
        curLevel.append(T(sourceVertex))

        while !curLevel.isEmpty {
            for vertex in curLevel {
                for neighbor in neighbors(of: vertex) {
                    if !visited[Int(neighbor)] {
                        visited[Int(neighbor)] = true
                        nextLevel.append(neighbor)
                        vertLevel[Int(neighbor)] = nLevel
                    }
                }
            }
            nLevel += 1
            curLevel.removeAll(keepingCapacity: true)
            (curLevel, nextLevel) = (nextLevel, curLevel)
            curLevel.sort()
        }
        return vertLevel
    }
}

extension Graph: CustomStringConvertible {
    var description: String {
        return "{\(nv), \(ne)} Graph"
    }
}

// NOTE: This conformance is not necessary for the test to succeed, but
// without the workaround of adding @_transparent to the get and set of
// BitVector's subscript (see above), this conformance will somehow cause
// a performance regression in the below test.
extension BitVector : RandomAccessCollection {}

func test() {
    let start = DispatchTime.now()
    // The graph data file from the Dropbox link above:
    let h = Graph<UInt32>(fromBinaryFile: "/Users/me/dev/swift/SwiftGraphs/data/indptrvecs-4m-30m.0based.bin")
    let end = DispatchTime.now()
    print("graph read took \(Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000.0) ms")
    print(h)
    var times = [Double]()
    for i in 1 ... 40 {
        let start = CACurrentMediaTime()
        let bfs1 = h.BFS(from: 0)
        let end = CACurrentMediaTime()
        let timediff = (end - start) * 1000
        print("Run \(i): BFS from vertex 0 to \(timediff) ms; sum = \(bfs1.reduce(0, +))")
        times.append(timediff)
    }

    let timeAvg = times.reduce(0, +) / Double(times.count)
    let sumOfSquaredAvgDiff = times.map { pow($0 - timeAvg, 2.0) }.reduce(0, +)
    let timeStd = sqrt(sumOfSquaredAvgDiff / Double(times.count - 1))
    print("Times: min: \(times.min()!), max: \(times.max()!), avg: \(timeAvg), std: \(timeStd)")
}
test()
```

For anyone wanting to look into this: In order to run the above demo program, you also need the graph data file linked to here.

Thanks for narrowing this down, @Jens! It'd also be interesting to compare the `-dump-ast` output with and without the `RandomAccessCollection` conformance. It's likely that it's influencing overload resolution somewhere here in a way that's bad for performance; maybe it's picking a "better" overload that isn't really better in this case.

I'm happy to help narrow this down, but I'll need explicit steps.

If you build the Swift file by replacing the normal output option flag (`-c`, `-emit-library`, etc.) with `-dump-ast` on the command line, you ought to get a dump of the Swift compiler's parse and semantic analysis of the program. It'd be interesting to diff the before and after dumps.
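Concretely, something along these lines ought to work (the file name and the redirection of the dump output are assumptions; adjust as needed):

```shell
# Dump the AST for the version without the conformance:
swiftc -O -dump-ast main.swift > without.ast 2>&1

# Add `: RandomAccessCollection` to BitVector, then dump again:
swiftc -O -dump-ast main.swift > with.ast 2>&1

# Compare the two semantic-analysis dumps:
diff without.ast with.ast | less
```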