I have a struct called BitVector
that could be : RandomAccessCollection
(RAC). When I add RAC to the definition, I get a horrible performance regression in functions that use this class (but I'm not using any function that RAC requires, like startIndex
or endIndex
). This confuses me: why would declaring that my struct implements the property cause a slowdown?
I'm seeing this in another completely unrelated struct as well, though the regression isn't 2x the way it was with BitVector
- in this latest case, it's "only" about 20%.
Can you post your timing code that exposes the regression, so we can see exactly what you’re measuring?
It's here (that is, this is an update to main.swift in the current master branch, which has the 40-run version of this code):
let h = Graph<UInt32>(fromBinaryFile: "/Users/me/dev/swift/SwiftGraphs/data/indptrvecs-4m-30m.0based.bin")
for i in 1 ... 100_000 {
start = DispatchTime.now()
let bfs1 = h.BFS(from: 0)
end = DispatchTime.now()
let timediff = Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / ms
times.append(timediff)
let timeAvg = times.reduce(0, +) / Double(times.count)
let sumOfSquaredAvgDiff = times.map { pow($0 - timeAvg, 2.0) }.reduce(0, +)
let timeStd = sqrt(sumOfSquaredAvgDiff / Double(times.count - 1))
print("\(i): curr: \(timediff.truncate(places: 3)), min: \(times.min()!.truncate(places: 3)), max: \(times.max()!.truncate(places: 3)), avg: \(timeAvg.truncate(places: 3)), std: \(timeStd.truncate(places: 3))")
}
Without the property, I'm in the 600-700ms range for a single BFS. With the property, I'm in the 1400-1500ms range.
The BFS code is here: SwiftGraphs/Graph.swift at dc4c9c0462e54fdb8dac84aa529882736b38ca9a · sbromberger/SwiftGraphs · GitHub
If you build and run the current master, you will get timings that look like this*:
Times: min: 681.877303, max: 769.144946, avg: 703.053950575, std: 22.2595559447778
if you make a single change and add : RandomAccessCollection
to line 9 of BitVector.swift
so that it looks like this:
public struct BitVector: RandomAccessCollection {
you should see times like this*:
Times: min: 1209.338157, max: 1727.203483, avg: 1333.903255475, std: 125.676044887502
(* exact timings may differ based on machine, of course, but what's important is the regression.)
In cases like this, I always try to reduce it into a minimal complete/self-contained (command line) program that clearly demonstrates the issue.
Your current test depends on an (to readers of this forum) unknown file called: "/Users/me/dev/swift/SwiftGraphs/data/indptrvecs-4m-30m.0based.bin"
. It would be better if the test just produced some random data instead of this, which would probably also mean that LineReader could be skipped.
Having all the code in a single post, being able to just copy paste it in order to reproduce the issue, makes it much easier for people who wish to look into it.
(Also, by "adding a a property" I assume you mean "adding conformance to a protocol (RandomAccessCollection)"?)
Below is such a test program, measuring the performance of your BitVector
's testAndSet()
method (which calls its set and get subscript internally).
import AppKit
public struct BitVector {
private var bits: Array<Int>
private let blockSize = Int.bitWidth // NOTE: Should probably be static
public init(repeating: Bool, count: Int) {
let fillInt = repeating ? ~0 : 0
let nBlocks: Int = (count / blockSize) + 1
bits = [Int](repeating: fillInt, count: nBlocks)
}
// Added this for testing:
init(randomBitBlocks blockCount: Int) {
let byteCount = (Int.bitWidth / 8) * blockCount
bits = [Int](repeating: 0, count: blockCount)
let rc = SecRandomCopyBytes(nil, byteCount, &bits)
precondition(rc == errSecSuccess)
}
private func getBlockAndOffset(of bitIndex: Int) -> (Int, Int) {
return (bitIndex / blockSize, bitIndex % blockSize)
}
public var startIndex: Int { return 0 }
public var endIndex: Int { return bits.count * blockSize }
public subscript(_ bitIndex: Int) -> Bool {
get {
let (block, offset) = getBlockAndOffset(of: bitIndex)
let mask = 1 << offset
return mask & bits[block] != 0
}
set {
let (block, offset) = getBlockAndOffset(of: bitIndex)
let mask = 1 << offset
if newValue {
bits[block] |= mask
} else {
bits[block] &= ~mask
}
}
}
public mutating func testAndSet(_ bitIndex: Int) -> Bool {
let (block, offset) = getBlockAndOffset(of: bitIndex)
let mask = 1 << offset
let oldval = mask & bits[block] != 0
if !oldval {
bits[block] |= mask
}
return oldval
}
}
// Comment/uncomment to see if it makes any difference in timings of test below:
// extension BitVector : RandomAccessCollection {}
func test() {
for _ in 0 ..< 5 {
var bv = BitVector(randomBitBlocks: 1_000_000 / Int.bitWidth)
var setCount = 0
let t0 = CACurrentMediaTime()
for i in bv.startIndex ..< bv.endIndex {
setCount += bv.testAndSet(i) ? 1 : 0
}
let t1 = CACurrentMediaTime()
print("time:", t1 - t0, "seconds. (\(setCount))")
}
}
test()
Using Xcode 9.3, and compiling with -O will result in output like the following on my MBP:
time: 0.00946520201978274 seconds. (499910)
time: 0.00749993498902768 seconds. (500297)
time: 0.00683424001908861 seconds. (500176)
time: 0.00753337901551276 seconds. (500429)
time: 0.00689123698975891 seconds. (498780)
And the timings are the same no matter if I comment or uncomment this line:
// extension BitVector : RandomAccessCollection {}
So unfortunately this didn't reproduce the slowdown that you're seeing, but perhaps you can devise a similar little (command line) program that does?
I tried most of the weekend to get it down to a manageable example, but I had no luck. The graph file is located on dropbox, but please be aware it's ~244MB.
Sorry this isn't an easier test, but with that file in the correct directory, you should have my environment fully replicated.
Also: testAndSet
is not the cause of the regression. If I merely use the getter followed by the setter inside the conditional, I get the same performance.
Inside what conditional?
Also, just to rule out any unrelated causes, you might try what happens if you use eg CACurrentMediaTime()
instead of DispatchTime.now()
.
testAndSet
does two things: it gets a value at a given index and sets it to true
if it's not set already. This is functionally equivalent to
if !visited[Int(neighbor)] {
visited[Int(neighbor)] = true
...
}
but my intent with testAndSet
was to see whether I could speed things up by getting rid of one modulo arithmetic operation since those can be expensive if not optimized. (In Swift's case, the two approaches are pretty much identical.)
I'll try CACurrentMediaTime()
. Thanks for the tip.
CACurrentMediaTime()
didn't change the results:
without :RandomAccessCollection
:
Times: min: 640.769736026414, max: 765.039346995763, avg: 676.587614505843, std: 33.5452987882788
with :RandomAccessCollection
:
Times: min: 1102.71427500993, max: 1316.3051699521, avg: 1187.05889147823, std: 50.3028173711964
testAndSet
is using subscript
which is using getBlockAndOffset
, and this is all of BitVector
except the initializer. So do you think the cause of the regression might be the initializer or something unrelated to BitVector
?
I think it's unrelated to testAndSet
, since when I replace that block with
I still have the performance issue when I include : RandomAccessCollection
to BitVector
.
I also just got rid of the RandomAccessCollection
extension I had in there (it defined searchSortedIndex
which is used on some graph initializers), and that made no difference to the results.
I don't think the cause is the graph initializer, since the graph is already initialized by the time BFS
is called.
Ok, I see what you mean, I just figured testAndSet
covered about all of BitVector
s functionality, so that it should show the regression if it was in there somewhere. Anyway, I have downloaded the data and am able to reproduce your results, on my machine:
-
With
BitVector
not conforming toRandomAccessCollection
:
Times: min: 751.859559, max: 960.927005, avg: 833.25661605, std: 53.2723263777442 -
With
BitVector
conforming toRandomAccessCollection
:
Times: min: 1328.528196, max: 1817.240632, avg: 1480.1516392, std: 122.247619056441
Thanks for sticking with me. I'm new to Swift, so if you find a way to make a MWE, please share!
This is still a lot of code (removed some) and it's still dependent on the data file (in the dropbox link above):
import Foundation
/// Read text file line by line
public class LineReader {
public let path: String
fileprivate let file: UnsafeMutablePointer<FILE>!
init?(path: String) {
self.path = path
file = fopen(path, "r")
guard file != nil else { return nil }
}
public var nextLine: String? {
var line: UnsafeMutablePointer<CChar>?
var linecap: Int = 0
defer { free(line) }
return getline(&line, &linecap, file) > 0 ? String(cString: line!) : nil
}
deinit {
fclose(file)
}
}
public struct BitVector {
private var bits: Array<Int>
private let blockSize = Int.bitWidth
public init(repeating: Bool, count: Int) {
let fillInt = repeating ? ~0 : 0
let nBlocks: Int = (count / blockSize) + 1
bits = [Int](repeating: fillInt, count: nBlocks)
}
private func getBlockAndOffset(of bitIndex: Int) -> (Int, Int) {
return (bitIndex / blockSize, bitIndex % blockSize)
}
public var startIndex: Int { return 0 }
public var endIndex: Int { return bits.count * blockSize }
public subscript(_ bitIndex: Int) -> Bool {
get {
let (block, offset) = getBlockAndOffset(of: bitIndex)
let mask = 1 << offset
return mask & bits[block] != 0
}
set {
let (block, offset) = getBlockAndOffset(of: bitIndex)
let mask = 1 << offset
if newValue {
bits[block] |= mask
} else {
bits[block] &= ~mask
}
}
}
public mutating func testAndSet(_ bitIndex: Int) -> Bool {
let (block, offset) = getBlockAndOffset(of: bitIndex)
let mask = 1 << offset
let oldval = mask & bits[block] != 0
if !oldval {
bits[block] |= mask
}
return oldval
}
}
public struct Edge<T: BinaryInteger> {
let src: T
let dst: T
public init(_ s: T, _ d: T) {
src = s
dst = d
}
public var ordered: Edge {
if src > dst {
return Edge(dst, src)
} else { return self }
}
public var reverse: Edge {
return Edge(dst, src)
}
}
extension Edge: CustomStringConvertible {
public var description: String {
return "\(src) -> \(dst)"
}
}
extension Edge: Comparable {
public static func == (lhs: Edge, rhs: Edge) -> Bool {
return lhs.src == rhs.src && lhs.dst == rhs.dst
}
public static func < (lhs: Edge, rhs: Edge) -> Bool {
if lhs.src < rhs.src { return true }
if lhs.src > rhs.src { return false }
return lhs.dst < rhs.dst
}
}
public struct Graph<T: BinaryInteger> {
let rowidx: Array<T>
let colptr: Array<Array<T>.Index>
static var eltype: Any.Type { return T.self }
public var nv: T { return T((colptr.count - 1)) }
public var ne: Int { return rowidx.count / 2 }
public var vertices: StrideTo<T> {
return stride(from: T(0), to: nv, by: +1)
}
public init(fromBinaryFile fileName: String) {
let file = URL(fileURLWithPath: (fileName as NSString).expandingTildeInPath)
let fileHandle = try! FileHandle(forReadingFrom: file)
let magic = fileHandle.readData(ofLength: 4)
guard magic.elementsEqual("GRPH".utf8) else {
fatalError("\(file) was not a graph file")
}
let colSize = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size).withUnsafeBytes { (ptr: UnsafePointer<UInt32>) -> Int in
return Int(ptr.pointee.bigEndian)
}
colptr = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size * colSize).withUnsafeBytes({ (ptr: UnsafePointer<UInt32>) -> [Int] in
let bufferPointer = UnsafeBufferPointer(start: ptr, count: colSize)
return [Int](bufferPointer.lazy.map { Int($0.bigEndian) })
})
let rowSize = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size).withUnsafeBytes { (ptr: UnsafePointer<UInt32>) -> Int in
return Int(ptr.pointee.bigEndian)
}
rowidx = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size * rowSize).withUnsafeBytes({ (ptr: UnsafePointer<UInt32>) -> [T] in
let bufferPointer = UnsafeBufferPointer(start: ptr, count: rowSize)
return [T](bufferPointer.lazy.map { T($0.bigEndian) })
})
}
private func vecRange(_ s: Array<T>.Index) -> CountableRange<Array<T>.Index> {
let rStart = colptr[s]
let rEnd = colptr[s + 1]
return rStart ..< rEnd
}
public func neighbors(of vertex: T) -> ArraySlice<T> {
let range = vecRange(Array<T>.Index(vertex))
return rowidx[range]
}
public func BFS(from sourceVertex: Int) -> [T] {
let numVertices = Int(nv)
let maxT = ~T()
var visited = BitVector(repeating: false, count: numVertices)
var vertLevel = Array<T>(repeating: maxT, count: numVertices)
var nLevel: T = 1
var curLevel = [T]()
curLevel.reserveCapacity(numVertices)
var nextLevel = [T]()
nextLevel.reserveCapacity(numVertices)
visited[sourceVertex] = true
vertLevel[sourceVertex] = 0
curLevel.append(T(sourceVertex))
while !curLevel.isEmpty {
for vertex in curLevel {
for neighbor in neighbors(of: vertex) {
if !visited.testAndSet(Int(neighbor)) {
nextLevel.append(neighbor)
vertLevel[Int(neighbor)] = nLevel
}
}
}
nLevel += 1
curLevel.removeAll(keepingCapacity: true)
(curLevel, nextLevel) = (nextLevel, curLevel)
curLevel.sort()
}
return vertLevel
}
}
extension Graph: CustomStringConvertible {
public var description: String {
return "{\(nv), \(ne)} Graph"
}
}
//-----------------------------------------------------------------------------
// Comment/uncomment to see difference in timings:
//-----------------------------------------------------------------------------
// extension BitVector : RandomAccessCollection {}
// (
// With BitVector *not* conforming to RandomAccessCollection:
// Times: min: 751.859559, max: 960.927005, avg: 833.25661605, std: 53.2723263777442
//
// With BitVector conforming to RandomAccessCollection:
// Times: min: 1328.528196, max: 1817.240632, avg: 1480.1516392, std: 122.247619056441
// )
var start = DispatchTime.now()
let h = Graph<UInt32>(fromBinaryFile: "~/Downloads/indptrvecs-4m-30m.0based.bin")
var end = DispatchTime.now()
print("graph read took \(Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000.0) ms")
print(h)
var times = [Double]()
var ms = 1_000_000.0
for i in 1 ... 40 {
start = DispatchTime.now()
let bfs1 = h.BFS(from: 0)
end = DispatchTime.now()
let timediff = Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / ms
// let timeit = end2.uptimeNanoseconds - start2.uptimeNanoseconds
print("Run \(i): BFS from vertex 0 to \(timediff) ms; sum = \(bfs1.reduce(0, +))")
times.append(timediff)
}
let timeAvg = times.reduce(0, +) / Double(times.count)
let sumOfSquaredAvgDiff = times.map { pow($0 - timeAvg, 2.0) }.reduce(0, +)
let timeStd = sqrt(sumOfSquaredAvgDiff / Double(times.count - 1))
print("Times: min: \(times.min()!), max: \(times.max()!), avg: \(timeAvg), std: \(timeStd)")
If you check out the sbromberger/MWE branch, that might help.
I tried some things with the code (since I've ran into similar issues that I never managed to isolate), but the only thing I could find out was that the regression disappears if you annotate the get and set of BitVector
s subscript with @_transparent
. I don't know why this is, but I guess it does demonstrate a bug (or at least an opportunity for improvement) in the optimizer. But it would be nice if someone more skilled could find a smaller reproducing example, cc @Slava_Pestov, @Joe_Groff, @Andrew_Trick.
I'm attaching the code (it's the files of your MWE-branch, joined into one) in the state where I left it after finding this workaround (and after doing some other things that I tested but which made no difference, eg removing all public
, sticking the test code in a func, fixing the warnings that then appeared about changing some vars to lets etc.).
import Dispatch
import QuartzCore
struct BitVector {
private var bits: Array<Int>
private let blockSize = Int.bitWidth
init(repeating: Bool, count: Int) {
let fillInt = repeating ? ~0 : 0
let nBlocks: Int = (count / blockSize) + 1
bits = [Int](repeating: fillInt, count: nBlocks)
}
private func getBlockAndOffset(of bitIndex: Int) -> (Int, Int) {
return (bitIndex / blockSize, bitIndex % blockSize)
}
var startIndex: Int { return 0 }
var endIndex: Int { return bits.count * blockSize }
// NOTE: Adding @_transparent to the get and set of this subscript
// removes the slowdown that is otherwise magically introduced by
// conforming BitVector to RandomAccessCollection (see further down).
// So if you comment out the two @_transparent and keep the conformance
// to RandomAccessCollection, the slowdown will be there again.
subscript(_ bitIndex: Int) -> Bool {
@_transparent
get {
let (block, offset) = getBlockAndOffset(of: bitIndex)
let mask = 1 << offset
return mask & bits[block] != 0
}
@_transparent
set {
let (block, offset) = getBlockAndOffset(of: bitIndex)
let mask = 1 << offset
if newValue {
bits[block] |= mask
} else {
bits[block] &= ~mask
}
}
}
}
struct Edge<T: BinaryInteger> {
let src: T
let dst: T
init(_ s: T, _ d: T) {
src = s
dst = d
}
var ordered: Edge {
if src > dst {
return Edge(dst, src)
} else { return self }
}
var reverse: Edge {
return Edge(dst, src)
}
}
struct Graph<T: BinaryInteger> {
let rowidx: Array<T>
let colptr: Array<Array<T>.Index>
static var eltype: Any.Type { return T.self }
var nv: T { return T((colptr.count - 1)) }
var ne: Int { return rowidx.count / 2 }
var vertices: StrideTo<T> {
return stride(from: T(0), to: nv, by: +1)
}
init(fromBinaryFile fileName: String) {
let file = URL(fileURLWithPath: fileName)
let fileHandle = try! FileHandle(forReadingFrom: file)
let magic = fileHandle.readData(ofLength: 4)
guard magic.elementsEqual("GRPH".utf8) else {
fatalError("\(file) was not a graph file")
}
let colSize = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size).withUnsafeBytes { (ptr: UnsafePointer<UInt32>) -> Int in
return Int(ptr.pointee.bigEndian)
}
colptr = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size * colSize).withUnsafeBytes({ (ptr: UnsafePointer<UInt32>) -> [Int] in
let bufferPointer = UnsafeBufferPointer(start: ptr, count: colSize)
return [Int](bufferPointer.lazy.map { Int($0.bigEndian) })
})
let rowSize = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size).withUnsafeBytes { (ptr: UnsafePointer<UInt32>) -> Int in
return Int(ptr.pointee.bigEndian)
}
rowidx = fileHandle.readData(ofLength: MemoryLayout<UInt32>.size * rowSize).withUnsafeBytes({ (ptr: UnsafePointer<UInt32>) -> [T] in
let bufferPointer = UnsafeBufferPointer(start: ptr, count: rowSize)
return [T](bufferPointer.lazy.map { T($0.bigEndian) })
})
}
private func vecRange(_ s: Array<T>.Index) -> Range<Array<T>.Index> {
let rStart = colptr[s]
let rEnd = colptr[s + 1]
return rStart ..< rEnd
}
func neighbors(of vertex: T) -> ArraySlice<T> {
let range = vecRange(Array<T>.Index(vertex))
return rowidx[range]
}
func BFS(from sourceVertex: Int) -> [T] {
let numVertices = Int(nv)
let maxT = ~T()
var visited = BitVector(repeating: false, count: numVertices)
var vertLevel = Array<T>(repeating: maxT, count: numVertices)
var nLevel: T = 1
var curLevel = [T]()
curLevel.reserveCapacity(numVertices)
var nextLevel = [T]()
nextLevel.reserveCapacity(numVertices)
visited[sourceVertex] = true
vertLevel[sourceVertex] = 0
curLevel.append(T(sourceVertex))
while !curLevel.isEmpty {
for vertex in curLevel {
for neighbor in neighbors(of: vertex) {
if !visited[Int(neighbor)] {
visited[Int(neighbor)] = true
nextLevel.append(neighbor)
vertLevel[Int(neighbor)] = nLevel
}
}
}
nLevel += 1
curLevel.removeAll(keepingCapacity: true)
(curLevel, nextLevel) = (nextLevel, curLevel)
curLevel.sort()
}
return vertLevel
}
}
extension Graph: CustomStringConvertible {
var description: String {
return "{\(nv), \(ne)} Graph"
}
}
// NOTE: This conformance is not necessary for the test to succeed, but
// without the workaround of adding @_transparent to the get and set of
// BitVector's subscript (see above), this conformance will somehow cause
// a performance regression in the below test.
extension BitVector : RandomAccessCollection {}
func test() {
let start = DispatchTime.now()
let h = Graph<UInt32>(fromBinaryFile: "/Users/jens/Downloads/indptrvecs-4m-30m.0based.bin")
let end = DispatchTime.now()
print("graph read took \(Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000.0) ms")
print(h)
var times = [Double]()
for i in 1 ... 40 {
let start = CACurrentMediaTime()
let bfs1 = h.BFS(from: 0)
let end = CACurrentMediaTime()
let timediff = (end - start) * 1000
print("Run \(i): BFS from vertex 0 to \(timediff) ms; sum = \(bfs1.reduce(0, +))")
times.append(timediff)
}
let timeAvg = times.reduce(0, +) / Double(times.count)
let sumOfSquaredAvgDiff = times.map { pow($0 - timeAvg, 2.0) }.reduce(0, +)
let timeStd = sqrt(sumOfSquaredAvgDiff / Double(times.count - 1))
print("Times: min: \(times.min()!), max: \(times.max()!), avg: \(timeAvg), std: \(timeStd)")
}
test()
For anyone wanting to look into this: In order to run the above demo program, you also need the graph data file linked to here.
Thanks for narrowing this down, @Jens! It'd also be interesting to compare the -dump-ast
output with and without the RandomAccessCollection
conformance. It's likely that it's influencing overload resolution somewhere here in a way that's bad for performance; maybe it's picking a "better" overload that isn't really better in this case.
I'm happy to help narrow this down, but I'll need explicit steps.
If you build the Swift file by replacing the normal output option flag (-c
, -emit-library
, etc.) with -dump-ast
on the commandline, then you ought to get a dump of the Swift compiler's parse and semantic analysis of the program. It'd be interesting to diff the before and after dumps.