Generic type metadata prespecialization

One source of memory and performance overhead in Swift code is the instantiation and fetching of type metadata. Even though generic specialization eliminates the need for type metadata in most fully-specialized code, we still need the metadata in many frequently-occurring situations:

  • Objects always need their class metadata, which serves as the "isa" pointer with the object's method table and other dynamic metadata.
  • When putting a value inside an existential box, the type metadata for the value's type is stored in the box to represent its dynamic type.
  • When calling into unspecialized code, type metadata for the generic type arguments has to be formed. Code may remain unspecialized because it crosses ABI boundaries or is invoked via dynamic reflection.
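To make these three situations concrete, here is a small Swift sketch. The names Widget and describe are made up for illustration, and @inline(never) is only there to discourage the optimizer from specializing the generic call away.

```swift
// 1. A class instance carries its class metadata, which we can read back
//    through type(of:).
final class Widget {}
let object: AnyObject = Widget()
let classOfObject: AnyClass = type(of: object)

// 2. An existential box records the dynamic type of the value stored in it.
let boxed: Any = 42
let typeInBox: Any.Type = type(of: boxed)

// 3. Unspecialized generic code receives the metadata for T from its caller.
@inline(never)
func describe<T>(_ value: T) -> String {
    return String(describing: T.self)
}

print(classOfObject, typeInBox, describe("hello"))
```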

Currently, when Swift needs the metadata for a generic type or for builtin structural types such as tuples and functions, it always calls into the runtime, which will allocate and initialize metadata records for these types on demand. Although the runtime maintains caches for the resulting records, these calls can still be expensive and lead to noticeable time spent in functions like swift_getGenericMetadata. This cost can be particularly compounded by libraries that rely on deep composition of generic adapter types, like the standard library's lazy collections. If you compose a bunch of transformer wrappers, then put the result in AnySequence or some other dynamic-typed container, like:

let seq = AnySequence(array.lazy.concat().filter { ... }.map { ... })

then we need the metadata for the composed sequence, and getting metadata for a deeply nested generic type like LazyMap<LazyFilter<LazyConcat<...>>> through the runtime requires instantiating every level of generic type, which can take a significant amount of time and memory if it occurs in a hot path.
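For readers who want to try this, here is a runnable variant of the snippet above. It assumes that joined() is an acceptable stand-in for the concat() shorthand, and the sample data is made up. Printing the type shows how deeply the adapters nest; the exact type names are whatever the standard library defines, so no expected output is given for that line.

```swift
let nested = [[1, -2], [3, -4, 5]]

// Each adapter wraps the previous one in another layer of generic type.
let composed = nested.lazy.joined().filter { $0 > 0 }.map { $0 * 2 }

// Wrapping the result in AnySequence hides the composed type behind a
// dynamically-typed container.
let seq = AnySequence(composed)

print(type(of: composed))
print(Array(seq))
```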

In many cases, we only really need the metadata for the outermost type, and building the component types' metadata is just a side effect of relying on the runtime to dynamically build the metadata on-demand. We also know in most cases exactly what types we need metadata for at compile time, when we emit code that instantiates the metadata. We can minimize the runtime's involvement in metadata instantiation and reduce memory use by having the compiler generate pre-specialized metadata records for these types, and updating the runtime and ABI in forward-compatible ways to efficiently accommodate prespecializations. @nate_chandler has been working on this optimization in a PR currently open on the Swift compiler: "Generic metadata prespecialization, part 1" (apple/swift pull request #28610).

Maintaining uniqueness of metadata records

Prespecialization is limited by some ABI decisions we've already made. The Swift runtime relies on the pointer identity of metadata records to correspond to the identity of types; in other words, if two runtime values have the same dynamic type, their type metadata records should be the same identical object at the same address in memory, so that pointer equality corresponds to type equality. This poses a challenge for pre-specialized metadata records, because they may not be unique:

  • Different dynamic libraries in the process may have generated the same specialization. Only one can be picked as the process-wide canonical metadata for the type.
  • Dynamic requests to instantiate the type metadata need to produce the same metadata record as direct references to the specialized type. In other words, a request to build the metadata for Array with the dynamic element type (Int) -> String has to return the same pointer as a direct reference to specialized metadata for Array<(Int) -> String>.
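The second point can be observed from ordinary Swift code. In this sketch, arrayType(of:) is a hypothetical helper that forms the Array metadata from an element type passed in dynamically, and ObjectIdentifier is used to compare the identity of the two metatype values.

```swift
// Forms Array<Element> from an element type supplied by the caller.
@inline(never)
func arrayType<Element>(of _: Element.Type) -> Any.Type {
    return [Element].self
}

// A direct reference to the fully specialized type.
let direct: Any.Type = [(Int) -> String].self

// The same type, requested with a dynamic element type.
let dynamic: Any.Type = arrayType(of: ((Int) -> String).self)

// Both routes have to yield the same metatype value.
let sameIdentity = ObjectIdentifier(direct) == ObjectIdentifier(dynamic)
print(direct == dynamic, sameIdentity)
```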

Therefore, we need a way to register specialized type metadata with the runtime so that it can be "blessed" as canonical metadata. We can do this with various levels of confidence and overhead, depending on how much control we have over the types involved in the specialization.

Alternatively, we might be able to relax the uniqueness requirement in some situations, since there are relatively few operations in Swift that fundamentally rely on that uniqueness, such as == for metatypes and dynamic casts. We could potentially relax the ABI requirement for new code, allowing "surrogate" metadata records to be used up to the point the canonical metadata is actually required, such as when performing one of those operations, or calling into code compiled with the older ABI. Establishing that latter criterion is tricky, however, and having multiple metadata records active for the same type has its own potential for increasing the working set of a process, or for exposing surprising behavior because of differences between different metadata records.

Specialized metadata for generic types defined in the current compilation unit

The current ABI already gives a module full control over how the generic types it defines are instantiated, by having references from outside the module call a metadata accessor function defined by the module. This makes specialization of metadata for types defined in the current compilation unit relatively straightforward. Since the defining module controls metadata instantiation, it can ensure that any prespecialized metadata records it created itself are canonical, by using them to serve dynamic metadata instantiations of the same type. For instance, if a module contains the following code:

struct Foo<T> {}

then it will define a metadata accessor function for Foo, which looks something like this pseudo-C:

const Type *`metadata accessor for Foo`(const Type *T) {
  return swift_getGenericMetadata(&`type descriptor for Foo`, T);
}

where swift_getGenericMetadata is the Swift runtime's default mechanism for dynamically instantiating and caching metadata records at runtime. However, if within the same compilation unit, we know that specific metadata records are used:

func foo() {
  // Instantiate metadata for the given generic instances, by passing the type
  // as a dynamic value to print()
  print(Foo<Int>.self)
  print(Foo<String>.self)
  print(Foo<Float>.self)
}

then we can generate pre-specialized metadata records for Foo<Int>, Foo<String>, and Foo<Float>, and to serve dynamic metadata requests, we can extend the metadata accessor to check for these specializations before falling back to swift_getGenericMetadata:

const Type *`metadata accessor for Foo`(const Type *T) {
  if (T == &`metadata for Int`) {
    return &`metadata for Foo<Int>`;
  }
  if (T == &`metadata for String`) {
    return &`metadata for Foo<String>`;
  }
  if (T == &`metadata for Float`) {
    return &`metadata for Foo<Float>`;
  }
  return swift_getGenericMetadata(&`type descriptor for Foo`, T);
}

In turn, this guarantees that the specialized metadata records are canonical, so code within the compilation unit can directly reference the metadata records by address instead of calling the metadata accessor. The prespecialized metadata objects themselves would normally remain private to the module, because the exact set of prespecializations that happened to be used inside the implementation of the module should not be ABI. We could conceivably extend the @_specialize attribute, which currently applies to functions to emit specializations for specific types or constraints as ABI, to allow a library to explicitly export certain metadata prespecializations. This would allow the standard library in particular to pre-specialize metadata for common Array, Set, Dictionary, and other types, and let other modules directly access those metadata records.
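To see why serving dynamic requests from the static records keeps them canonical, here is a toy Swift model of the extended accessor. ToyMetadata and FooMetadataAccessor are hypothetical stand-ins invented for this sketch; they do not reflect the real metadata layout or the code the compiler would emit.

```swift
// Toy stand-in for a metadata record.
final class ToyMetadata {
    let name: String
    init(name: String) { self.name = name }
}

// Hypothetical model of the metadata accessor for Foo<T>.
final class FooMetadataAccessor {
    // Stand-ins for statically emitted, prespecialized records.
    let fooOfInt = ToyMetadata(name: "Foo<Int>")
    let fooOfString = ToyMetadata(name: "Foo<String>")

    // Stand-in for the runtime cache behind swift_getGenericMetadata.
    private var dynamicCache: [ObjectIdentifier: ToyMetadata] = [:]
    private(set) var dynamicInstantiations = 0

    func metadata(for argument: Any.Type) -> ToyMetadata {
        // Known specializations are served from the static records.
        if argument == Int.self { return fooOfInt }
        if argument == String.self { return fooOfString }

        // Everything else is instantiated on demand and cached.
        let key = ObjectIdentifier(argument)
        if let cached = dynamicCache[key] { return cached }
        dynamicInstantiations += 1
        let record = ToyMetadata(name: "Foo<\(argument)>")
        dynamicCache[key] = record
        return record
    }
}

let accessor = FooMetadataAccessor()
let intViaAccessor = accessor.metadata(for: Int.self)
let doubleOnce = accessor.metadata(for: Double.self)
let doubleTwice = accessor.metadata(for: Double.self)
print(intViaAccessor === accessor.fooOfInt,
      doubleOnce === doubleTwice,
      accessor.dynamicInstantiations)
```

Because every request goes through the same accessor, a direct reference to fooOfInt and a dynamic request for Foo<Int> can never disagree.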

Specialized metadata for generic types defined in other modules

For generic types defined outside the current module, we don't have the benefit of completely controlling the type's metadata instantiation, but we can build mechanisms into the runtime that give modules the opportunity to influence other modules' generic metadata instantiation.

Because specialized metadata is not guaranteed to be unique in cases like this, we will not be able to access it by direct reference. We'll need to feed accesses through a runtime call that caches the canonical metadata record for the type, similar to what swift_getForeignTypeMetadata does for type metadata generated by the Clang importer from C struct and enum types. So for something like:

func bar() {
  print(Array<Int>.self)
}

we could generate a metadata record specialized for Array<Int>, along with a global variable to cache the instantiated canonical record, and access it through a runtime call, like this pseudo-C:

const Type `type metadata for Array<Int>`;

const Type *`cache for type metadata for Array<Int>` = 0;

void bar(void) {
  // Fetch the metadata
  const Type *type = swift_getSpecializedGenericMetadata(
    &`type metadata for Array<Int>`,
    &`cache for type metadata for Array<Int>`);

  print(type);
}

swift_getSpecializedGenericMetadata would return the cached pointer if it has already been set to non-null. Otherwise, it would do the one-time instantiation work: try to register the metadata as canonical if possible, and, failing that, get the canonical metadata record from the runtime.
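That control flow can be sketched as a toy model in Swift. Everything here is hypothetical: MetadataRecord, ToyRuntime, and the first-registration-wins policy are assumptions made for illustration, not a description of how the runtime would actually pick a canonical record.

```swift
// Toy stand-in for a prespecialized metadata record emitted by one binary.
final class MetadataRecord {
    let typeName: String
    let origin: String
    init(typeName: String, origin: String) {
        self.typeName = typeName
        self.origin = origin
    }
}

// Toy stand-in for the runtime's table of canonical records.
final class ToyRuntime {
    private var canonical: [String: MetadataRecord] = [:]

    // Blesses the candidate as canonical unless its type already has one.
    func canonicalize(_ candidate: MetadataRecord) -> MetadataRecord {
        if let existing = canonical[candidate.typeName] { return existing }
        canonical[candidate.typeName] = candidate
        return candidate
    }
}

// Models the proposed entry point: cached fast path, one-time slow path.
func getSpecializedGenericMetadata(
    _ candidate: MetadataRecord,
    cache: inout MetadataRecord?,
    runtime: ToyRuntime
) -> MetadataRecord {
    if let cached = cache { return cached }
    let record = runtime.canonicalize(candidate)
    cache = record
    return record
}

let runtime = ToyRuntime()
let fromLibA = MetadataRecord(typeName: "Array<Int>", origin: "LibA")
let fromLibB = MetadataRecord(typeName: "Array<Int>", origin: "LibB")
var cacheA: MetadataRecord? = nil
var cacheB: MetadataRecord? = nil

let seenByA = getSpecializedGenericMetadata(fromLibA, cache: &cacheA, runtime: runtime)
let seenByB = getSpecializedGenericMetadata(fromLibB, cache: &cacheB, runtime: runtime)
print(seenByA === seenByB, seenByB.origin)
```

Both callers end up with the same record even though each binary shipped its own copy, which is the uniqueness property the runtime depends on.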

As to how that registration occurs, there are a number of possibilities. The most straightforward one is probably to introduce a new registration section to the binary, akin to the __swift5_proto and __swift5_types sections for protocol conformances and types, to register the set of specializations in each binary with the runtime. This would impose some overhead when the runtime needs to instantiate a generic type, since it would have to scan these sections first, though once all the used generic types are cached, that overhead should eventually be amortized.

There may be ways to improve on this basic approach, particularly to allow for pre-specialized metadata records to be assumed to be canonical in more situations. One rule we could implement is that, if the generic arguments consist only of types defined in a specific module, the runtime always favors specialization records from that module as canonical. This would allow the module that defines a type Foo to also generate the canonical specialized metadata for Array<Foo>, Set<Foo>, and other standard library types.

Specialized metadata for structural types

Metadata for specialized structural types, such as tuples and functions, presents similar issues to those for generic types defined in the standard library. One complication is that the structural type metadata layouts are currently mostly private to the runtime. We would need to establish some longer-term guarantees about the layouts for kinds of types we want to be able to prespecialize.

Lazifying access to generic arguments through metadata

One of the optimization opportunities of metadata pre-specialization is the ability for the compiler to skip over emitting metadata for unnecessary intermediate types; for instance, if the metadata for Array<(Int, String) -> (Int, String)> is needed, then the compiler can generate a metadata record for specifically that type, without also generating metadata for (Int, String) -> (Int, String) and (Int, String). Unfortunately, the current ABI interferes with this ability, because code generation expects to be able to load the generic arguments directly out of the metadata for a generic type. For example, a metadata record for an instantiation of a generic struct Foo<T, U, V> looks something like this in memory:

struct FooMetadata {
  uintptr_t kind;
  const TypeContextDescriptor *contextDescriptor;
  const Metadata *T;
  const Metadata *U;
  const Metadata *V;
}

And if code needs to derive the generic arguments from the metadata record, such as when calling a generic function from a protocol witness, then the compiler today will generate code that loads directly from those offsets. Given something like this:

protocol P {
  func foo()
}

struct S<T, U, V>: P {
  func foo() {
    bar(T.self, U.self, V.self)
  }
}

func bar<T, U, V>(_: T.Type, _: U.Type, _: V.Type) {}

the implementation of foo might end up looking something like this pseudo-C, with the metadata for S laid out like the FooMetadata record above:

void `S.foo`(const FooMetadata *Self) {
  bar(Self->T, Self->U, Self->V);
}

This isn't great for prespecialized metadata, since it means that we would have to instantiate all of the generic argument metadata anyway, either with additional metadata prespecializations or at runtime, which would increase the code size cost of specialization and introduce instantiation overhead when using pre-specialized records.

We're stuck with the ABI we have for code that has to deploy to existing OSes, which means that we will have to fully instantiate the generic arguments before passing metadata prespecializations to code that might be compiled with older compilers. We can do better for code built with new compilers, by having the compiler generate code to trigger instantiation of the generic arguments on demand, making the above more like:

struct FooMetadata {
  uintptr_t kind;
  const TypeContextDescriptor *contextDescriptor;
  // These fields point at a mangled name or instantiated type metadata
  // depending on the instantiation state of the metadata record
  union {
    const Metadata *T;
    const char *T_mangled_name;
  };
  union {
    const Metadata *U;
    const char *U_mangled_name;
  };
  union {
    const Metadata *V;
    const char *V_mangled_name;
  };
}


void `S.foo`(const FooMetadata *Self) {
  // Runtime call forces all the type arguments to be instantiated
  swift_instantiateGenericArguments(Self);
  bar(Self->T, Self->U, Self->V);
}

which adds a small amount of overhead, but would let us put off recursively instantiating metadata for the type's generic arguments until we need to, in code compiled for newer platforms. The compiler-generated prespecialized metadata would then include something like a mangled name for each of its arguments that could be used by the runtime to instantiate the metadata, instead of pointers directly to metadata.
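The on-demand scheme can be modeled in a few lines of Swift. This is only a sketch under stated assumptions: ToyMetadata, ToyResolver, and the way a mangled name is resolved are all hypothetical, and the strings "Si", "SS", and "Sf" are used purely as illustrative keys.

```swift
// Toy metadata record whose generic arguments start out as mangled names.
final class ToyMetadata {
    enum Argument {
        case mangledName(String)
        case instantiated(ToyMetadata)
    }
    let name: String
    var arguments: [Argument]
    init(name: String, arguments: [Argument] = []) {
        self.name = name
        self.arguments = arguments
    }
}

// Toy stand-in for the runtime side of the scheme.
final class ToyResolver {
    private var byMangledName: [String: ToyMetadata] = [:]
    private(set) var instantiationCount = 0

    // Instantiates, or fetches, the record named by a mangled name.
    func resolve(_ mangledName: String) -> ToyMetadata {
        if let existing = byMangledName[mangledName] { return existing }
        instantiationCount += 1
        let record = ToyMetadata(name: mangledName)
        byMangledName[mangledName] = record
        return record
    }

    // Models swift_instantiateGenericArguments: replaces each mangled name
    // with its instantiated record, the first time it is asked for.
    func instantiateGenericArguments(of record: ToyMetadata) -> [ToyMetadata] {
        var resolved: [ToyMetadata] = []
        for (index, argument) in record.arguments.enumerated() {
            switch argument {
            case .instantiated(let metadata):
                resolved.append(metadata)
            case .mangledName(let mangledName):
                let metadata = resolve(mangledName)
                record.arguments[index] = .instantiated(metadata)
                resolved.append(metadata)
            }
        }
        return resolved
    }
}

let resolver = ToyResolver()
let record = ToyMetadata(
    name: "S<Int, String, Float>",
    arguments: [.mangledName("Si"), .mangledName("SS"), .mangledName("Sf")])

let countBefore = resolver.instantiationCount
let first = resolver.instantiateGenericArguments(of: record)
let second = resolver.instantiateGenericArguments(of: record)
print(countBefore, resolver.instantiationCount, first[0] === second[0])
```

Nothing is instantiated until the arguments are first requested, and later requests find the pointers already filled in.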

Code size and performance tradeoffs with metadata prespecialization

Generating specialized metadata records at compile time should reduce memory usage by replacing dynamic allocations with statically-allocated metadata records, and by avoiding instantiating intermediate types when building composed generic types. The prebuilt metadata records will of course increase the code size of binaries. One issue is that type metadata records are not "true const" because they contain absolute pointers, so generating prespecialized metadata records as pre-instantiated metadata could also carry a load-time cost. It may still make sense for prespecialized records to use a type metadata pattern and be instantiated lazily.

Metadata prespecialization also opens up further specialization opportunities; at the cost of yet more code size, we could also specialize the value witness methods for specialized types, as well as the destructor and virtual methods of classes. We'll want to experiment to see how the performance/code size tradeoff works out if we take those opportunities.

Compatibility issues

There are places in the Swift runtime that need to be modified to handle pre-specialized metadata records, particularly the metadata cache. Since metadata prespecialization is primarily an optimization, we can disable it when targeting OSes with older Swift runtimes. Some of the ABI changes discussed would also only be possible to take advantage of in programs that don't link in any binaries built with the older ABI, which is a trickier condition to establish. It may not immediately be practical to implement these changes.

This is pretty neat. I'm wondering if there will be some generic args that point directly to metadata and some that point to a mangled name? Or is it that most of the args aren't instantiated statically, so we'll make all of the args mangled names? Vice versa? If there can be mangled names and metadata pointers, how will we be able to differentiate a type with many args? Can we fit flags for all of those in the context descriptor's flags (or is there going to be some new flag field just for this)?

Maybe it's as easy as just reading the first word and looking for a metadata kind, but could there be a character sequence that could be equal to one of these kinds?

Although each record itself won't necessarily be huge in terms of code size, lots of specializations could add up. Does the compiler know how much it's specializing, and does it take code size into account? Will -Osize disable code pre-specialization or at least be very conservative with emitting a specialized record?

What exactly do you mean by specializing the value witness methods for specialized types? Aren't the value witness table and its methods always emitted, even if the type is generic?

Aside: Is it strictly necessary for a type to have its own value witness table if it shares the same layout with other types?

struct Foo<T> {
  let x: Int
}

struct Bar<T> {
  let x: Double
}

These two types share a layout, share a base memcpy method, and share size, stride and other flags, but is it worth having two vwts along with the getEnumTagSinglePayload and storeEnumTagSinglePayload being unique for each one? (I don't know if the last two are exactly the same for each type, but at a quick glance they look the same)
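The size, stride, and alignment part of that claim can be checked with MemoryLayout. This sketch assumes a platform where Int is 64 bits wide, and it picks Void as an arbitrary generic argument; MemoryLayout says nothing about the other value witnesses.

```swift
struct Foo<T> {
  let x: Int
}

struct Bar<T> {
  let x: Double
}

typealias FooLayout = MemoryLayout<Foo<Void>>
typealias BarLayout = MemoryLayout<Bar<Void>>

// Compare the three layout properties that MemoryLayout exposes.
let sameLayout = FooLayout.size == BarLayout.size
    && FooLayout.stride == BarLayout.stride
    && FooLayout.alignment == BarLayout.alignment

print(FooLayout.size, FooLayout.stride, FooLayout.alignment, sameLayout)
```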

Sorry for all of the questions, I've just become extremely interested in Swift metadata and would love some insight into what considerations are being taken that help shape it. Thanks! :smile:

There are a few different ways we might be able to do this. @Slava_Pestov had mentioned that the state of a metadata record without its argument types filled in already exists as the "abstract" initialization stage in the runtime, to let it break circular references among types during instantiation. We could potentially relax the ABI so that only abstract metadata, rather than complete metadata, is required to pass as generic arguments.

We don't have very sophisticated mechanisms for weighting specialization cost against performance impact yet. We do intend to explore this, because we would like to perform function specialization in a more targeted way as well, potentially by using profile-guided optimization to trace what type specializations are hot in practice in a running process.

Value witness tables are not required to be unique, and the runtime does take advantage of some opportunities to share value witness tables when multiple types share a layout. For generic types, the compiler currently only generates an unspecialized value witness table. When the runtime instantiates metadata, it has the ability to recognize when the instantiated type is fully trivial and swap the generic value witnesses for trivial ones. Independent of this optimization, @Arnold has been looking into a systematic solution for keying value witness tables by the layout they represent, which will allow us to more generally share value witnesses among same-layout types, and even have an interpreter in the runtime that can implement value witness operations based on a layout bytecode instead of open-coding generic value witnesses for every type.
