KeyPath performance

I have a package that exports the following simple struct,

public struct Dimensions {
        public var indices: Int {
                return jots - steps
        }
        public var boundaries: Int {
                return steps - units
        }
        public let units: Int
        public let jots: Int
        public let steps: Int
        public static var zero: Dimensions {
                return Dimensions()
        }
        public init(indices: Int = 0, boundaries: Int = 0,
            units: Int = 0) {
                self.units = units
                self.jots = indices +  boundaries + units
                self.steps = boundaries + units
        }
}

I am surprised at the cost of accessing stored (i.e., non-computed)
members of the struct, such as steps, by keypath. Each access
involves a call to swift_getAtKeyPath; under swift_getAtKeyPath
the call stack is three calls deep. I had expected for such a simple
KeyPath to compile into a simple offset into the struct instead of
a handful of calls. Is there some way that I can help the compiler
optimize these accesses?

Dave

1 Like

Is the KeyPath access happening in the same module that declares Dimensions?

I checked it out in godbolt, with the access and declaration in the same file, and there don't seem to be any runtime calls. As you expected, it just loads the value directly:

output.test(output.Dimensions) -> Swift.Int:
        mov     rax, rdx
        ret

Also, how are you accessing the value? Directly with a literal, or via a variable?

var kp: KeyPath<Dimensions, Int> = ...
dimensions[keyPath: kp]

In the same file there is no overhead.
Since the layout might change in the future, Swift will use an accessor rather than using an offset.
If you are sure that you won't modify this struct in can mark it as @frozen.

However this will break binary compatibility when you make changes in the future.

As an example:
If you have a struct like this in a binary framework

public struct Foo {
    public init(x: String){
        self.x = x
    }
    public let x: String
}

and you use it in a different binary you'll end up with with a call to a generated getter method.

If you mark it as frozen, it will be able to just use the offset.

1 Like

That shouldn’t be necessary unless the library is compiled with library evolution support. If it does make a difference, that’s a missed optimization opportunity.

(Note also that it is considered an optimization in Swift, so double-check that you’re not using a debug build.)

1 Like

Thank you for that advice. I tried declaring the type @frozen,

@frozen public struct Dimensions {
// ...
}

Alas, that does not improve the assembly for inner,

func inner(_ dimensions: Dimensions, _ path: KeyPath<Dimensions, Int>) -> Int {
    return dimensions[keyPath: path]
}

which still calls swift_getAtKeyPath,

output.inner(output.Dimensions, Swift.KeyPath<output.Dimensions, Swift.Int>) -> Swift.Int:
        sub     rsp, 40
        mov     qword ptr [rsp + 16], rdi
        mov     qword ptr [rsp + 24], rsi
        mov     qword ptr [rsp + 32], rdx
        lea     rax, [rsp + 8]
        lea     rdi, [rsp + 16]
        mov     rsi, rcx
        call    swift_getAtKeyPath@PLT
        mov     rax, qword ptr [rsp + 8]
        add     rsp, 40
        ret

As another experiment, I rewrote Dimensions without any computed
properties, just in case they prevented optimization:

@frozen public struct Dimensions {
    public let boundaries: Int
    public let indices: Int
    public let units: Int
    public let jots: Int
    public let steps: Int
    public static var zero: Dimensions {
            return Dimensions()
    }
    public init(indices: Int = 0, boundaries: Int = 0,
        units: Int = 0) {
            self.indices = indices
            self.boundaries = boundaries
            self.units = units
            self.jots = indices +  boundaries + units
            self.steps = boundaries + units
    }
}

No effect.

Dave

Is the KeyPath access happening in the same module that declares Dimensions?

I want to make sure that I'm answering the right question. Do you mean,
in the same package, or in the same source file?

The access happens in the package that declares Dimensions, however, the
in-package declarations and uses are spread over a few files.

My app imports the package and calls API functions with
KeyPath<Dimensions, Int>s.

As you expected, it just loads the value directly:

output.test(output.Dimensions) -> Swift.Int:
        mov     rax, rdx
        ret

Thanks, I should have thought of godbolt!

Looks like the compiler optimized the constant-keypath access,
dimensions[keyPath: \.steps] -> dimensions.steps.

Below, I have introduced a level of indirection:


func inner(_ dimensions: Dimensions, _ path: KeyPath<Dimensions, Int>) -> Int {
    return dimensions[keyPath: path]
}

func outer(_ dimensions: Dimensions) -> Int {
    return inner(dimensions, \.steps)
}
public struct Dimensions {
    public var indices: Int {
            return jots - steps
    }
    public var boundaries: Int {
            return steps - units
    }
    public let units: Int
    public let jots: Int
    public let steps: Int
    public static var zero: Dimensions {
            return Dimensions()
    }
    public init(indices: Int = 0, boundaries: Int = 0,
        units: Int = 0) {
            self.units = units
            self.jots = indices +  boundaries + units
            self.steps = boundaries + units
    }
}

outer still optimizes well:

output.outer(output.Dimensions) -> Swift.Int:
        mov     rax, rdx
        ret

For inner, however, the compiler generates a swift_getAtKeyPath
call:

output.inner(output.Dimensions, Swift.KeyPath<output.Dimensions, Swift.Int>) -> Swift.Int:
        sub     rsp, 40
        mov     qword ptr [rsp + 16], rdi
        mov     qword ptr [rsp + 24], rsi
        mov     qword ptr [rsp + 32], rdx
        lea     rax, [rsp + 8]
        lea     rdi, [rsp + 16]
        mov     rsi, rcx
        call    swift_getAtKeyPath@PLT
        mov     rax, qword ptr [rsp + 8]
        add     rsp, 40
        ret

Following the ABI specification, I would hope for the compiler to emit a
fast path resembling this C code in concept:

int
inner(struct Dimensions dimensions, struct KeyPath_Dimensions_Int *path)
{
    int result;

    // fast path: stored property, only one key path component
    if (path->keypath_object_header.kvc_string == NULL &&
        path->keypath_buffer_header.buffer_size == sizeof(path->component[0]) &&
        path->component[0].header.kind == stored)
        return (int *)((char *)&dimensions + path->component[0].payload);

    // slow path: computed property or other
    swift_getAtKeyPath(dimensions, path, &result);
    return result;
}

I believe that fast path would go a long way toward reducing the keypath
overhead in my program.

Since the compiler has full knowledge about the shape of struct Dimensions, I believe that in principle some further optimization is
possible:

int
inner(struct Dimensions dimensions, struct KeyPath_Dimensions_Int *path)
{
    int result;

    // fast path: stored property, only one key path component
    if (path->keypath_object_header.kvc_string == NULL &&
        path->component[0].header.kind == stored)
        return (int *)((char *)&dimensions + path->component[0].payload);

    if (path->keypath_object_header.kvc_string == NULL &&
               path->component[0].header.kind == stored) {
        // slower path: computed property
        ((struct computed_component *)&path->component[0])->getter(
            &dimensions, &result);
        return result;
    } else {
        // slowest path: all other cases
        swift_getAtKeyPath(dimensions, path, &result);
        return result;
    }
}

Dave

Sorry, this is unrelated but how did you get the assembly code for that function?
I would like to learn :slight_smile:

KeyPath's internal representation is not ABI, so the compiler cannot generate code that assumes any layout. The compiler has to generate a generic implementation for inner, but as you saw any statically known uses of inner can and should get optimized away to a property access.

1 Like

In this thread we have been using godbolt. Choose the
Swift language in the left pane and enter your Swift program. In the
right pane the assembly for your Swift program will appear.

You can do a similar thing using tools that come with Xcode:

$ swiftc -target x86_64-apple-macosx12.0 -o - -S keypath.swift | swift demangle

That pipeline emits the assembly on the standard output stream and
converts the mangled Swift names to something more readable.

Another thing you can do is compile to an object file and then dump the
object as assembly,

$ swiftc -target x86_64-apple-macosx12.0 -o keypath.o keypath.swift
$ objdump -S ./keypath.o | swift demangle

The assembly output by swiftc is in the so-called AT&T format instead
of Intel. I don't know how to select otherwise.

You can pass -M intel to objdump and it will use the Intel format
instead of the AT&T format that is the default.

Dave

3 Likes

I would like to understand this in greater detail.

My understanding from reading the Key Path Memory
Layout

is that a key path literal (\.steps) is stored in the executable
binary as a key path pattern. Further, I understand that the key path
pattern is part of the stable ABI.

Before a key path pattern can be applied, it has to be instantiated as a
key path object by the runtime. I had overlooked previously that a key
path object does not have a stable ABI. The compiler will not generate
code that grovels in the runtime object like my hypothetical fast path
does. So I am not going to get rid of that swift_getAtKeyPath call.

Is there some way to make swift_getAtKeyPath run faster? Conversely,
is there a way to avoid cases where it runs particularly slowly?
During a lengthy computation, my program spends ~17% of its time in
swift_getAtKeyPath. Profiles that I collect with dtrace show that
there is an awful lot happening under the hood:

              libswiftCore.dylib`swift::MetadataCacheKey::operator==(swift::MetadataCacheKey) const+0x24
              libswiftCore.dylib`std::__1::pair<swift::HashMapElementWrapper<(anonymous namespace)::GenericCacheEntry>*, unsigned int> swift::ConcurrentReadableHashMap<swift::HashMapElementWrapper<(anonymous namespace)::GenericCacheEntry>,swift::StaticMutex>::find<swift::MetadataCacheKey>(swift::MetadataCacheKey const&, swift::ConcurrentReadableHashMap<swift::HashMapElementWrapper<(anonymous namespace)::GenericCacheEntry>, swift::StaticMutex>::IndexStorage, unsigned long, swift::HashMapElementWrapper<(anonymous namespace)::GenericCacheEntry>*)+0x104
              libswiftCore.dylib`_swift_getGenericMetadata(swift::MetadataRequest, void const* const*, swift::TargetTypeContextDescriptor<swift::InProcess> const*)+0x288
              libswiftCore.dylib`__swift_instantiateCanonicalPrespecializedGenericMetadata+0x24
              libswiftCore.dylib`closure #1 in KeyPath._projectReadOnly(from:)+0x2e4
              libswiftCore.dylib`swift_getAtKeyPath+0xfc

Dave