Runtime performance cost of key paths

Questions

  • Are key paths expected to be significantly slower than functions or direct access through a closure?
  • If so, is there any way to mitigate the discrepancy while still using key paths?

Context

I'm noticing a slowdown factor of anywhere from 2× to 10× or more when using key paths as functions or when using dynamic member lookup.

Program 1: Key paths as functions

import AppKit

func arrayClosureTest(_ size: UInt) {
    let xs = (0 ... size).map(String.init)
    let t0 = CACurrentMediaTime()
    let ys = xs.map { $0.count }
    let t1 = CACurrentMediaTime()

    print("closure test:\n\t\t\(t1 - t0)")
}

func arrayKeyPathTest(_ size: UInt) {
    let xs = (0 ... size).map(String.init)
    let t0 = CACurrentMediaTime()
    let ys = xs.map(\.count)
    let t1 = CACurrentMediaTime()

    print("key path test:\n\t\t\(t1 - t0)")
}

func main() {
    print(String(repeating: "=", count: 50))
    let sizes: [UInt] = [1_000, 10_000, 100_000, 1_000_000]
    for size in sizes {
        arrayClosureTest(size)
        arrayKeyPathTest(size)
        print(String(repeating: "=", count: 50))
    }
}

With no optimization (-Onone), this yields:

==================================================
closure test:
		9.460499859414995e-05
key path test:
		0.0006224889948498458
==================================================
closure test:
		0.0005262159975245595
key path test:
		0.006581138004548848
==================================================
closure test:
		0.0057080870028585196
key path test:
		0.06060750799952075
==================================================
closure test:
		0.05970345200330485
key path test:
		0.5920746110059554
==================================================

If I change the optimization setting to Fastest, Smallest (-Os), then the output is

==================================================
closure test:
		0.0001036890025716275
key path test:
		0.0008002639951882884
==================================================
closure test:
		0.0005304780061123893
key path test:
		0.006536139000672847
==================================================
closure test:
		0.00698990399541799
key path test:
		0.06180560700886417
==================================================
closure test:
		0.060244746011449024
key path test:
		0.5942129169998225
==================================================

If I change the optimization setting to Fastest (-Ofast), then the output is

==================================================
closure test:
		8.602200250606984e-05
key path test:
		0.0006292640027822927
==================================================
closure test:
		0.0005449989985208958
key path test:
		0.006529158999910578
==================================================
closure test:
		0.005752074997872114
key path test:
		0.05974736200005282
==================================================
closure test:
		0.06013945199083537
key path test:
		0.5992421360133449
==================================================

Program 2: @dynamicMemberLookup with key paths

import AppKit

@dynamicMemberLookup
struct Identity<A> {
    let value: A

    subscript<B>(dynamicMember keyPath: KeyPath<A, B>) -> B {
        value[keyPath: keyPath]
    }
}

func straightAccessTest(_ size: UInt) {
    let xs = (0 ... size).map(Identity.init)
    let t0 = CACurrentMediaTime()
    let ys = xs.map { $0.value.hashValue }
    let t1 = CACurrentMediaTime()

    print("access test:\n\t\t\(t1 - t0)")
}

func dynamicMemberLookupTest(_ size: UInt) {
    let xs = (0 ... size).map(Identity.init)
    let t0 = CACurrentMediaTime()
    let ys = xs.map { $0.hashValue }
    let t1 = CACurrentMediaTime()

    print("dynamic member test:\n\t\t\(t1 - t0)")
}

func main() {
    print(String(repeating: "=", count: 50))
    let sizes: [UInt] = [1_000, 10_000, 100_000, 1_000_000]

    for size in sizes {
        straightAccessTest(size)
        dynamicMemberLookupTest(size)
        print(String(repeating: "=", count: 50))
    }
}

The output was pretty much the same for all three optimization settings (-Onone, -Os, and -Ofast). Here is the -Ofast output:

==================================================
access test:
        3.349900362081826e-05
dynamic member test:
        0.0005792470037704334
==================================================
access test:
        0.00019683199934661388
dynamic member test:
        0.005823437997605652
==================================================
access test:
        0.0014278379967436194
dynamic member test:
        0.057225974000175484
==================================================
access test:
        0.015988421000656672
dynamic member test:
        0.5222387870016973
==================================================

Even if KeyPaths themselves are slow, I would've expected the KeyPath-to-function conversion to not actually create a KeyPath, but to emit the equivalent function directly. I guess the conversion happens on the KeyPath side at runtime, not to a function at compile time.

Me too. Now that's what I'd call room for improvement, because I think the design does not lock us into a bad corner :-)

Yeah, the runtime implementation has not yet been looked at with an eye toward performance, and there are also relatively straightforward optimizations for converting a key path into a getter function, without involving the key path object at all, that have not yet been implemented.
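In the meantime, one user-level mitigation is to do that conversion by hand. The `getter` helper below is purely illustrative (it is not a standard API); note that it still applies the key path per element, so the only way to avoid the key-path machinery entirely today is to write the access directly in a closure:

```swift
// Illustrative only: hand-lifting a key path into a plain getter closure.
// The closure still performs one key-path projection per call; a direct
// member access in a closure skips the key-path machinery entirely.
func getter<Root, Value>(_ keyPath: KeyPath<Root, Value>) -> (Root) -> Value {
    { root in root[keyPath: keyPath] }
}

let xs = (0 ... 10).map(String.init)

let viaKeyPath = xs.map(getter(\.count)) // still one projection per element
let viaClosure = xs.map { $0.count }     // direct access: the fast path
```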


When profiling, it seems that a lot of the time is spent here. I don't understand the comment // TODO: For perf, we could use a local growable buffer instead of Any, and I would like to understand key paths in more depth. Is such a task suitable for a starter with some guidance?

Are any of these tasks suitable for a starter with some guidance?

The comment there is referring to the use of the curBase local variable to hold the current base of the projection operation. As written, when the runtime traverses a key path, it copies the current base value at each step into the curBase variable.

Since an Any value incurs an allocation for large values, and reassigning the Any deallocates its current buffer and allocates a new one, that isn't very efficient. It would be better to keep a single temporary allocation, growing it as needed during the traversal. (It might be even better if, when the KeyPath was instantiated, it precomputed the size and alignment of the buffer allocation needed to hold the largest base value arising during the traversal.)

Reducing the number of allocations from Any reassignment, as well as reducing the number of copies generally by avoiding copies during projections of stored property components, would likely improve performance quite a bit, and might be a worthwhile project if you're comfortable with unsafe Swift code.
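A minimal sketch of that growable-buffer idea might look like the following. This is not the actual standard-library implementation; the type name and structure are made up, and for simplicity it assumes trivially copyable value types (which `storeBytes`/`load` require):

```swift
// Sketch of the "local growable buffer" idea from the TODO. Instead of
// boxing each intermediate base in an Any (which can heap-allocate on
// every reassignment), keep one raw allocation and grow it only when a
// larger intermediate value shows up.
struct GrowableProjectionBuffer {
    private var pointer: UnsafeMutableRawPointer
    private var capacity: Int

    init(initialCapacity: Int = 16) {
        capacity = max(initialCapacity, 1)
        pointer = UnsafeMutableRawPointer.allocate(
            byteCount: capacity, alignment: 16)
    }

    /// Store a value, reallocating only when the buffer is too small.
    mutating func store<T>(_ value: T) {
        let needed = MemoryLayout<T>.size
        if needed > capacity {
            pointer.deallocate()
            capacity = needed
            pointer = UnsafeMutableRawPointer.allocate(
                byteCount: capacity, alignment: 16)
        }
        pointer.storeBytes(of: value, as: T.self) // assumes T is trivial
    }

    /// Read the current value back out as the given type.
    func load<T>(_ type: T.Type) -> T {
        pointer.load(as: T.self)
    }

    func deallocate() {
        pointer.deallocate()
    }
}

var buffer = GrowableProjectionBuffer()
buffer.store(42)                          // fits in the initial allocation
let small = buffer.load(Int.self)
buffer.store((1, 2, 3, 4))                // 32 bytes: forces one regrowth
let big = buffer.load((Int, Int, Int, Int).self)
buffer.deallocate()
```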

I don't know what you mean by "projection operation". How would this relate to, say, a key path of the form \A.one.two?.three?

I'm not familiar with unsafe Swift. I know of it but have never written anything with it before. Do you know of any community resources for it?

Here’s a really good starting point for unsafe pointers.

Sorry, "projection" refers to applying the key path to a base to get its result. Starting with a value of A, it extracts the value of the one field, then extracts the two field from that, optional-chains the new value, and so on.
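To make that concrete, here is the same projection written both ways; the A, B, and C types below are made up purely to give the path \A.one.two?.three something to operate on:

```swift
// Hypothetical types matching the shape of \A.one.two?.three.
struct C { let three: Int }
struct B { let two: C? }
struct A { let one: B }

let a = A(one: B(two: C(three: 7)))

// Projection via the key path: extract `one`, then `two`, optional-chain,
// then `three`. The result type is Int? because of the chaining step.
let viaKeyPath = a[keyPath: \A.one.two?.three]

// The same projection written out by hand.
let byHand = a.one.two?.three

// A nil `two` would short-circuit both projections to nil.
```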