Runtime performance cost of key paths

atfelix · May 2, 2020, 9:36pm

Questions

Is there an expectation that key paths are significantly slower than functions or straight access with a closure?
If so, is there any way to mitigate these discrepancies while still using key paths?

Context

I'm noticing a slowdown of anywhere from 2x to more than 10x when using key paths as functions or when using dynamic member lookup.

Program 1: Key paths as functions

import AppKit

func arrayClosureTest(_ size: UInt) {
    let xs = (0 ... size).map(String.init)
    let t0 = CACurrentMediaTime()
    let ys = xs.map { $0.count }
    let t1 = CACurrentMediaTime()

    print("closure test:\n\t\t\(t1 - t0)")
}

func arrayKeyPathTest(_ size: UInt) {
    let xs = (0 ... size).map(String.init)
    let t0 = CACurrentMediaTime()
    let ys = xs.map(\.count)
    let t1 = CACurrentMediaTime()

    print("key path test:\n\t\t\(t1 - t0)")
}

func main() {
    print(String(repeating: "=", count: 50))
    let sizes: [UInt] = [1_000, 10_000, 100_000, 1_000_000]
    for size in sizes {
        arrayClosureTest(size)
        arrayKeyPathTest(size)
        print(String(repeating: "=", count: 50))
    }
}

With no optimizations (-O0), this yields:

==================================================
closure test:
		9.460499859414995e-05
key path test:
		0.0006224889948498458
==================================================
closure test:
		0.0005262159975245595
key path test:
		0.006581138004548848
==================================================
closure test:
		0.0057080870028585196
key path test:
		0.06060750799952075
==================================================
closure test:
		0.05970345200330485
key path test:
		0.5920746110059554
==================================================

If I change the optimization setting to Fastest, Smallest (-Os), then the output is

==================================================
closure test:
		0.0001036890025716275
key path test:
		0.0008002639951882884
==================================================
closure test:
		0.0005304780061123893
key path test:
		0.006536139000672847
==================================================
closure test:
		0.00698990399541799
key path test:
		0.06180560700886417
==================================================
closure test:
		0.060244746011449024
key path test:
		0.5942129169998225
==================================================

If I change the optimization setting to Fastest (-Ofast), then the output is

==================================================
closure test:
		8.602200250606984e-05
key path test:
		0.0006292640027822927
==================================================
closure test:
		0.0005449989985208958
key path test:
		0.006529158999910578
==================================================
closure test:
		0.005752074997872114
key path test:
		0.05974736200005282
==================================================
closure test:
		0.06013945199083537
key path test:
		0.5992421360133449
==================================================

Program 2: `@dynamicMemberLookup` with key paths

import AppKit

@dynamicMemberLookup
struct Identity<A> {
    let value: A

    subscript<B>(dynamicMember keyPath: KeyPath<A, B>) -> B {
        value[keyPath: keyPath]
    }
}

func straightAccessTest(_ size: UInt) {
    let xs = (0 ... size).map(Identity.init)
    let t0 = CACurrentMediaTime()
    let ys = xs.map { $0.value.hashValue }
    let t1 = CACurrentMediaTime()

    print("access test:\n\t\t\(t1 - t0)")
}

func dynamicMemberLookupTest(_ size: UInt) {
    let xs = (0 ... size).map(Identity.init)
    let t0 = CACurrentMediaTime()
    let ys = xs.map { $0.hashValue }
    let t1 = CACurrentMediaTime()

    print("dynamic member test:\n\t\t\(t1 - t0)")
}

func main() {
    print(String(repeating: "=", count: 50))
    let sizes: [UInt] = [1_000, 10_000, 100_000, 1_000_000]

    for size in sizes {
        straightAccessTest(size)
        dynamicMemberLookupTest(size)
        print(String(repeating: "=", count: 50))
    }
}

The output was pretty much the same for -O0, -Os, and -Ofast. Here is the -Ofast output:

==================================================
access test:
        3.349900362081826e-05
dynamic member test:
        0.0005792470037704334
==================================================
access test:
        0.00019683199934661388
dynamic member test:
        0.005823437997605652
==================================================
access test:
        0.0014278379967436194
dynamic member test:
        0.057225974000175484
==================================================
access test:
        0.015988421000656672
dynamic member test:
        0.5222387870016973
==================================================

johannesweiss · May 4, 2020, 2:08pm

github.com/apple/swift

[SR-9323] KeyPaths are quite slow

opened 01:08PM - 22 Nov 18 UTC

weissi

bug performance compiler

| | |
|------------------|-----------------|
|Previous ID | SR-9323 |
|Radar | rdar://problem/52529589 |
|Original Reporter | @weissi |
|Type | Bug |

Attachment: [Download](https://user-images.githubusercontent.com/2727770/164963044-daaa6692-c66d-47a3-b822-a02930633670.gz)

Additional Detail from JIRA

| | |
|------------------|-----------------|
|Votes | 14 |
|Component/s | Compiler |
|Labels | Bug, Performance |
|Assignee | None |
|Priority | Medium |

md5: 8a7995ec006b266bd138d8a19ef4ebed

**is duplicated by**:

* [SR-11983](https://bugs.swift.org/browse/SR-11983) KeyPath performance below expectation compared to alternative (see test)

**Issue Description:**

## Description

Naively I always assumed KeyPaths are quite fast. In my head they were basically a tuple of two function pointers (a getter and a setter, both not capturing so `@convention(thin)` like) that get just handed around and applied. At least I assumed that would be some sort of fast path when it has enough information at compile-time.

To see how fast/slow keypaths are, I made a quick benchmark which just incremented a struct member (`Int`) 100 M times. To have an idea what the theoretical maximum is, I compared that to a version that doesn't use key paths at all and just does `thing.x += 1` in a loop. (I checked the assembly and the compiler does spit out every single increment (for overflow checking) it does however unroll the loop five times). Anyway, the result is:

time taken for 100M direct increments (in s)

| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 0.02154 | 0.02355 | 0.02358 | 0.02613 | 0.02450 | 0.03828 |

Now I benchmarked that against KeyPaths

```swift
var thing = SomeStruct()
for _ in 0..<100_000_000 {
    thing[keyPath: \SomeStruct.x] += 1
}
```

and the result is:

time taken for 100M keypath increments (in s)

| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 4.691 | 4.698 | 4.722 | 4.738 | 4.778 | 4.821 |

which is 200x the runtime of the original one.

I used `Apple Swift version 5.0-dev (LLVM cbe8d5e28f, Clang 3452631569, Swift 201dcba300)`, before that it was even slower.

Then I tried to understand why Keypaths are so slow and I created yet another benchmark which goes through a pretty naive approximation:

```swift
public struct SomeStruct {
    public var x: Int = 0
    public init(){}
}

public struct FakeWritableKeyPath<Thing, Element> {
    public let writeIt: (inout Thing, Element) -> Void
    public let readIt: (Thing) -> Element
}

extension SomeStruct {
    public static let fakeKeyPathsForX: FakeWritableKeyPath<SomeStruct, Int> =
        FakeWritableKeyPath(writeIt: { thing, newValue in thing.x = newValue },
                            readIt: { thing in return thing.x })
}
```

and the loop was

```swift
for _ in 0..<100_000_000 {
    let read = SomeStruct.fakeKeyPathsForX.readIt
    let write = SomeStruct.fakeKeyPathsForX.writeIt
    write(&thing, read(thing) + 1)
}
```

to my absolute surprise, that yielded better performance ("only" 47x slower):

| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 1.073 | 1.091 | 1.103 | 1.116 | 1.131 | 1.217 |

To finish off, I benchmarked against what I thought would kind of approximate the implementation at least in a fast path (just handing two function pointers around):

```swift
public struct FakeCheatedWritableKeyPath {
    public let writeIt: @convention(thin) (inout SomeStruct, Int) -> Void
    public let readIt: @convention(thin) (SomeStruct) -> Int
}

extension SomeStruct {
    public static let fakeCheatedKeyPathsForX: FakeCheatedWritableKeyPath =
        FakeCheatedWritableKeyPath(writeIt: { thing, newValue in thing.x = newValue },
                                   readIt: { thing in return thing.x })
}
```

with the loop just like above. That started to yield reasonable performance

| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 0.2298 | 0.2329 | 0.2351 | 0.2362 | 0.2401 | 0.2440 |

which is only about 10x slower than the direct additions and I think that's reasonable because `INC` is a processor instruction which naturally is a bit faster than 'function call to read the value, increment, function call to write the value'. Also loop unrolling etc...

## Notes

### Compiler

Apple Swift version 5.0-dev (LLVM cbe8d5e28f, Clang 3452631569, Swift 201dcba300)
Target: x86_64-apple-darwin18.2.0

### OS

macOS 10.14 on
Model Identifier: MacBookPro15,1
Processor Name: Intel Core i9
Processor Speed: 2.9 GHz

### Observations

I found `%5 = keypath $WritableKeyPath<SomeStruct, Int>, (root $SomeStruct; stored_property #SomeStruct.x : $Int) // users: %20, %8` in the SIL which looks like the compiler has actually quite some understanding of key paths, so maybe there's hope they will soon be faster? 😉

### Code

the structs/fake key paths were defined in a module `Foo` and all the calls were always from another module `TestApp` in order not to get any inlining effects. But even with everything in one module, the slow versions didn't get faster at all.

the whole code is attached (the .tar.gz and can be run with just `swift run -c release`), but is also here (note that everything below `// MODULE: Foo` is in another module).

```swift
import Foundation
import Foo

public func measure(_ fn: () throws -> Int) rethrows -> [TimeInterval] {
    func measureOne(_ fn: () throws -> Int) rethrows -> (TimeInterval, Int) {
        let start = Date()
        let v = try fn()
        let end = Date()
        return (end.timeIntervalSince(start), v)
    }

    let firstRes = try measureOne(fn).1 /* pre-heat and throw away */
    var measurements = Array(repeating: 0.0, count: 10)
    for i in 0..<10 {
        let timeAndRes = try measureOne(fn)
        measurements[i] = timeAndRes.0
        precondition(firstRes == timeAndRes.1)
    }

    print(firstRes)
    return measurements
}

public func measureAndPrint(desc: String, fn: () throws -> Int) rethrows -> Void {
    print("measuring: \(desc): ", terminator: "")
    let measurements = try measure(fn)
    print(measurements.reduce("") { $0 + "\($1), " })
}

measureAndPrint(desc: "direct") {
    var thing = SomeStruct()
    for _ in 0..<100_000_000 {
        thing.x += 1
    }
    return thing.x
}

measureAndPrint(desc: "fake key paths") {
    var thing = SomeStruct()
    for _ in 0..<100_000_000 {
        let read = SomeStruct.fakeKeyPathsForX.readIt
        let write = SomeStruct.fakeKeyPathsForX.writeIt
        write(&thing, read(thing) + 1)
    }
    return thing.x
}

measureAndPrint(desc: "totally cheated fake key paths") {
    var thing = SomeStruct()
    for _ in 0..<100_000_000 {
        let read = SomeStruct.fakeCheatedKeyPathsForX.readIt
        let write = SomeStruct.fakeCheatedKeyPathsForX.writeIt
        write(&thing, read(thing) + 1)
    }
    return thing.x
}

measureAndPrint(desc: "real key paths") {
    var thing = SomeStruct()
    for _ in 0..<100_000_000 {
        thing[keyPath: \SomeStruct.x] += 1
    }
    return thing.x
}

// MODULE: Foo
public struct SomeStruct {
    public var x: Int = 0
    public init(){}
}

// compiler generated

// fake
public struct FakeWritableKeyPath<Thing, Element> {
    public let writeIt: (inout Thing, Element) -> Void
    public let readIt: (Thing) -> Element
}

extension SomeStruct {
    public static let fakeKeyPathsForX: FakeWritableKeyPath<SomeStruct, Int> =
        FakeWritableKeyPath(writeIt: { thing, newValue in thing.x = newValue },
                            readIt: { thing in return thing.x })
}

// cheat
public struct FakeCheatedWritableKeyPath {
    public let writeIt: @convention(thin) (inout SomeStruct, Int) -> Void
    public let readIt: @convention(thin) (SomeStruct) -> Int
}

extension SomeStruct {
    public static let fakeCheatedKeyPathsForX: FakeCheatedWritableKeyPath =
        FakeCheatedWritableKeyPath(writeIt: { thing, newValue in thing.x = newValue },
                                   readIt: { thing in return thing.x })
}
```

Jon_Shier · May 4, 2020, 2:36pm

Even if KeyPaths themselves are slow, I would've expected the KeyPath -> function equivalence to not actually create a KeyPath, just the equivalent function. I guess the conversion happens at the KeyPath end at runtime, not to a function at compile time.

gwendal.roue · May 4, 2020, 3:23pm

Me too. Now that's called room for improvement, because I think the design does not lock us in a bad corner :-)

Joe_Groff · May 4, 2020, 4:11pm

Yeah, the runtime implementation has not yet been looked at with any eye for performance, and there are also relatively straightforward optimizations for converting a keypath to a getter function without involving the key path object at all that have not yet been implemented.
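To make the equivalence under discussion concrete, here is a minimal sketch (the function names are hypothetical, chosen only for illustration). Per SE-0249, a key path literal passed where a function is expected behaves like a closure that applies the key path with the `[keyPath:]` subscript, so, as described in this thread, the call still goes through the key path object at runtime. Writing the closure by hand is the spelling that avoids the key path object altogether.

```swift
// Three spellings of the same transformation. All return identical results;
// as described in this thread, the first two go through a KeyPath object.

// Hypothetical helper names, for illustration only.
func countsViaKeyPathLiteral(_ xs: [String]) -> [Int] {
    // Key path literal converted to a function (SE-0249, Swift 5.2+).
    xs.map(\.count)
}

func countsViaExplicitKeyPath(_ xs: [String]) -> [Int] {
    // Roughly what the literal above behaves like: the key path is formed
    // once and then applied to each element with the keyPath subscript.
    let kp: KeyPath<String, Int> = \String.count
    return xs.map { $0[keyPath: kp] }
}

func countsViaClosure(_ xs: [String]) -> [Int] {
    // Direct property access; no key path object is involved.
    xs.map { $0.count }
}

let sample = ["a", "bb", "ccc"]
print(countsViaKeyPathLiteral(sample))   // [1, 2, 3]
print(countsViaExplicitKeyPath(sample))  // [1, 2, 3]
print(countsViaClosure(sample))          // [1, 2, 3]
```

In a hot loop, the closure spelling is the obvious workaround for the slowdown measured above, at the cost of giving up the key path syntax.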

atfelix · May 4, 2020, 9:52pm

When profiling, it seems that it was spending a lot of time here. I don't understand the comment `// TODO: For perf, we could use a local growable buffer instead of Any`, but I would like to understand key paths in more depth. Is such a task suitable for a starter with some guidance?

Are any of these tasks suitable for a starter with some guidance?

Joe_Groff · May 4, 2020, 11:17pm

The comment there is referring to the use of the curBase local variable to hold the current base of the projection operation. As written, when it traverses a key path, it copies out the current base value at each step into the curBase variable. Since an Any value incurs an allocation for large values, and reassigning the Any will deallocate its current buffer and reallocate a new one, that won't be very efficient. It would be better to keep a single temporary allocation, growing it as needed during the traversal. (It might be even better if, when the KeyPath was instantiated, it pre-computed the size and alignment of a buffer allocation it would need to hold the largest base value resulting from the traversal.) Reducing the number of allocations from Any reassignment, as well as reducing the number of copies generally by avoiding copies during projections of stored property components, would likely improve the performance quite a bit, and might be a worthwhile project, if you're comfortable with unsafe Swift code.
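The "single temporary allocation, growing it as needed" idea can be sketched roughly as follows. This is not the standard library's implementation: `ScratchBuffer`, `reserve(size:alignment:)`, and the step sizes are all hypothetical, and the sketch only counts allocations rather than performing a real key path traversal.

```swift
// Hypothetical scratch buffer: one raw allocation that is reused across
// traversal steps and reallocated only when a step needs more room.
final class ScratchBuffer {
    private var pointer: UnsafeMutableRawPointer?
    private var capacity = 0
    private var alignment = 1
    private(set) var allocationCount = 0

    // Returns storage that can hold `size` bytes at the requested alignment.
    func reserve(size: Int, alignment: Int) -> UnsafeMutableRawPointer {
        if let existing = pointer, size <= capacity, alignment <= self.alignment {
            return existing  // Large enough already: reuse, no allocation.
        }
        pointer?.deallocate()
        let newCapacity = max(size, capacity, 1)
        let newAlignment = max(alignment, self.alignment)
        let fresh = UnsafeMutableRawPointer.allocate(byteCount: newCapacity,
                                                     alignment: newAlignment)
        pointer = fresh
        capacity = newCapacity
        self.alignment = newAlignment
        allocationCount += 1
        return fresh
    }

    deinit { pointer?.deallocate() }
}

let scratch = ScratchBuffer()
// Assumed sizes (in bytes) of the intermediate base values at each step.
let stepSizes = [8, 64, 16, 64]
for size in stepSizes {
    let storage = scratch.reserve(size: size, alignment: 8)
    // Stand-in for copying the projected value into the buffer.
    storage.storeBytes(of: size, as: Int.self)
    assert(storage.load(as: Int.self) == size)
}
print(scratch.allocationCount)  // 2, versus 4 if every step allocated
```

If the largest size were known when the key path was instantiated, as the parenthetical above suggests, a single `reserve` call up front would reduce this to one allocation.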

atfelix · May 5, 2020, 4:25am

I don't know what you mean by projection operation. How would this relate to, say, a key path of the form `\A.one.two?.three`?

I'm not familiar with unsafe Swift. I know of it but have never written anything with it before. Do you know some community resources regarding it?

filip-sakel · May 5, 2020, 5:59am

Here’s a really good starting point for unsafe pointers.

Joe_Groff · May 5, 2020, 3:43pm

Sorry, projection refers to the application of the key path to get the result of applying it to a base, so starting with a value of A, it extracts the value of the one field, then extracts the two field from that, optional-chains the new value, and so on.
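As a small illustration of that description, the sketch below spells out the same projection both ways. The types `A`, `B`, and `C` are hypothetical, shaped only so that `\A.one.two?.three` is a valid key path.

```swift
// Hypothetical types, shaped so that \A.one.two?.three type-checks.
struct C { var three: Int }
struct B { var two: C? }
struct A { var one: B }

let a = A(one: B(two: C(three: 3)))

// Applying the whole key path to the base in one go.
let viaKeyPath: Int? = a[keyPath: \A.one.two?.three]

// The same projection written out as individual steps.
let step1: B = a.one              // extract `one` from the base value
let step2: C? = step1.two         // extract `two` from that
let step3: Int? = step2?.three    // optional-chain, then extract `three`

print(viaKeyPath == step3)  // true

// When the optional in the middle is nil, the chain stops and yields nil.
let empty = A(one: B(two: nil))
print(empty[keyPath: \A.one.two?.three] == nil)  // true
```

Each intermediate value (`step1`, `step2`) plays the role of the "current base" that the runtime carries from one component of the key path to the next.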
