The type metadata that gets emitted for struct, enum, and class types includes a reference to a nominal type descriptor. The descriptor carries information that pertains to the nominal type declaration itself, independent of specific instantiations of generic types, as well as information that’s of a more “reflective” nature which isn’t on the fast path for compiler-generated code but may be of interest to reflection APIs or one-time initialization actions. The current nominal type descriptor format is lacking in a number of ways that I’d like to improve:
Each nominal type descriptor currently identifies the type it represents by mangling its fully-qualified name. This is space-inefficient: most of the types in a binary are likely defined in the same Swift module, yet each descriptor independently mangles its whole context. It’s also not an efficient format for the things we want to use nominal type descriptor lookup for, such as printing type names or looking up types by name. The APIs for those operations will likely want to deal in human-readable representations of the type rather than mangled names.
Nominal type descriptors do not have enough information to dynamically instantiate all generic types. The metadata for a generic type instance has to carry data about the generic arguments of the instance; currently, this can include metadata pointers for type arguments and witness table pointers for required protocol conformances. For example, the metadata for Dictionary<String, Int> needs to be instantiated using the type metadata for String, the conformance String: Hashable, and the type metadata for Int. Nominal type descriptors carry information about the location and number of generic arguments a generic type needs, but no precise information about what those arguments are. If we wanted to instantiate Dictionary<String, Int> from a string at runtime, we would need to know that Dictionary’s Key parameter must conform to Hashable, so that we could look up the String: Hashable conformance and pass it along.
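To make the gap concrete, here is a small C++ sketch contrasting a count-only description of a generic argument buffer with one that says what each slot holds. Every name here (CountOnlyGenericInfo, SlotDescription, requiredConformances) is hypothetical, and the slot order is illustrative rather than a committed ABI.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Roughly what a descriptor can say today: how many argument slots exist,
// but not what goes in them.
struct CountOnlyGenericInfo {
  uint32_t numArguments;  // e.g. 3 for Dictionary: Key, Value, Key: Hashable
};

// A richer, hypothetical description: one entry per slot.
enum class SlotKind { TypeMetadata, WitnessTable };

struct SlotDescription {
  SlotKind kind;
  std::string parameter;  // generic parameter the slot refers to
  std::string protocol;   // required protocol, for witness table slots
};

// Dictionary<Key, Value> where Key: Hashable. Metadata slots first, then
// witness tables; this order is only for illustration.
std::vector<SlotDescription> dictionarySlots() {
  return {
      {SlotKind::TypeMetadata, "Key", ""},
      {SlotKind::TypeMetadata, "Value", ""},
      {SlotKind::WitnessTable, "Key", "Hashable"},
  };
}

// Given concrete bindings for the parameters, lists the conformances a
// runtime would have to look up before it could fill the buffer.
std::vector<std::string> requiredConformances(
    const std::vector<SlotDescription> &slots,
    const std::vector<std::pair<std::string, std::string>> &bindings) {
  std::vector<std::string> result;
  for (const auto &slot : slots) {
    if (slot.kind != SlotKind::WitnessTable) continue;
    for (const auto &[param, type] : bindings)
      if (param == slot.parameter) result.push_back(type + ": " + slot.protocol);
  }
  return result;
}
```

With only the count, a runtime parsing "Dictionary<String, Int>" knows it needs three slots but not that one of them must be the String: Hashable witness table; the per-slot description supplies exactly that.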
Nominal type descriptors contain some information that is now redundant, such as a reference to an accessor function that generates a list of field types for structs and classes. This costs code size for information that can now potentially be derived from the detailed “remote mirror” reflection metadata.
In addition to having a descriptor for each type, we could have a more general hierarchy of context descriptors that describe the common traits of a declaration scope, such as a type, extension, or top-level module. This could reduce the size of the string data we emit by having one shared instance of each context’s name (module, parent type, etc.), while also making it easier to use that information for printing and parsing types by name. We should also have a format for describing generic requirements, so that we can understand them at runtime. This will be useful not only in context descriptors but also for other runtime mechanisms, particularly for recording the constraints on conditional conformances and the base class constraints on protocols so that dynamic conformance lookup can check them.
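A minimal C++ sketch of the parent-linked hierarchy, assuming each descriptor stores only its kind, a pointer to its parent, and its own unqualified name. The struct and function names are hypothetical. Because every type in a module points at the same module descriptor, the module name is stored once, and a human-readable qualified name falls out of walking the parent chain.

```cpp
#include <cassert>
#include <string>

// Hypothetical context kinds, following the proposal's list.
enum class ContextKind { Module, Extension, Anonymous, Struct, Enum, Class };

// Each descriptor records its kind, its parent, and only its own name,
// instead of re-mangling the full context into every descriptor.
struct ContextDescriptor {
  ContextKind kind;
  const ContextDescriptor *parent;  // nullptr for a module
  const char *name;                 // nullptr for unnamed contexts
};

// Builds a dot-separated name such as "MyApp.Outer.Inner" by walking
// parents. Unnamed contexts are simply skipped here; a real printer would
// need a deliberate policy for extensions and anonymous contexts.
std::string qualifiedName(const ContextDescriptor *context) {
  if (!context) return "";
  std::string prefix = qualifiedName(context->parent);
  if (!context->name) return prefix;
  return prefix.empty() ? std::string(context->name)
                        : prefix + "." + context->name;
}
```

Parsing a name runs the same chain in reverse: match the last component against a type descriptor, then check each parent against the preceding components.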
Kinds of context descriptor
We should have records describing a few different kinds of context:
A module descriptor represents a Swift module. There shouldn’t be any immediate need for anything other than the module’s name in Swift today.
A nominal type descriptor serves the same purpose it does today. It should include:
Kind (struct, class, or enum)
Classes currently also carry vtable metadata that’s used with resilient base classes to lay out and construct method tables when their metadata is instantiated.
An extension descriptor represents an extension context. Nominal types can appear inside extensions, which may constrain away generic parameters of the extended type or introduce new generic requirements via protocol constraints.
Extended context (the nominal type or protocol being extended)
Parent module context of the extension
Generic constraints on the extended context
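One consequence of an extension descriptor having both a parent module and an extended context is that a nested type has two different “upward” paths: the module that defines it, and the type under which it is named. The C++ sketch below assumes a hypothetical layout with those fields plus a constraint count; the display policy shown (naming through the extended type) is only one possible choice.

```cpp
#include <cassert>
#include <string>

enum class ContextKind { Module, Extension, Struct };

// Hypothetical layout: an extension's parent is the module that declares
// it, and it separately points at the context it extends.
struct ContextDescriptor {
  ContextKind kind;
  const ContextDescriptor *parent;
  const char *name;                             // null for extensions
  const ContextDescriptor *extended = nullptr;  // extensions only
  unsigned numGenericConstraints = 0;           // extensions only
};

// The module that defines a declaration: follow parent links, which pass
// through the extension itself rather than the extended type.
const ContextDescriptor *definingModule(const ContextDescriptor *c) {
  while (c && c->kind != ContextKind::Module) c = c->parent;
  return c;
}

// A display name that shows an extension as its extended context. This
// hides which module declared the extension.
std::string displayName(const ContextDescriptor *c) {
  if (!c) return "";
  if (c->kind == ContextKind::Extension) return displayName(c->extended);
  std::string prefix = displayName(c->parent);
  return prefix.empty() ? std::string(c->name) : prefix + "." + c->name;
}
```

For a struct Helper declared in MyApp inside an extension of Swift’s Array constrained by `where Element: Equatable`, definingModule returns MyApp while displayName produces "Swift.Array.Helper", and the extension’s constraint count records the extra requirement.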
A local context descriptor represents a function that contains local nested types. My feeling is that we’d want to treat these as anonymous singletons that record the enclosing generic context, but nothing else. We could potentially also use this anonymous context kind to represent private types. Such contexts would thus only need to include their parent context and, if that context is generic, its generic signature.
Describing generic requirements
To describe a generic signature, we need to encode the number of generic parameters along with the requirements on those parameters. Currently Swift only supports type parameters, but we’ll want to keep space in the format to eventually encode non-type parameters. Parameters can be constrained by same-type, base class, and/or protocol constraints, which can apply either directly to the parameters themselves or to associated types of the parameters. Protocol constraints require arguments when the generic environment is instantiated to provide the conformances that satisfy the constraint. We will also want to provide space in the encoding for new kinds of requirement, which may or may not require additional arguments, in the future.
Therefore, a generic requirements descriptor needs at the top level two sections, one to describe parameters, and one to describe constraints. In some situations where the parameter set is implied, such as extensions or conditional conformances, we may only want to encode constraints. For each parameter, we’ll want to know:
what kind of parameter it is. Currently there are only type parameters.
whether it requires an instantiation argument. Same-type constraints may constrain away type parameters that formally exist, and future kinds of parameter may or may not require a runtime argument. Keeping this as a separate bit will allow for some amount of backward and forward compatibility, by allowing clients reading a generic argument list with a generic requirement descriptor to be able to extract the kinds of information they know about without being desynced by new kinds of information they don’t.
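A C++ sketch of how the separate argument bit keeps a reader in sync, assuming a hypothetical one-byte parameter encoding with the kind in the low six bits and the “requires an argument” flag in the top bit. The bit positions and names are assumptions for illustration. A reader can count the argument slots the parameter section consumes even when it meets a kind it doesn’t recognize.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical one-byte encoding for a generic parameter:
//   bits 0-5: parameter kind (0 = type parameter; other values reserved)
//   bit 7:    the parameter requires an instantiation argument
struct GenericParamDescriptor {
  uint8_t bits;

  static constexpr uint8_t KindMask = 0x3F;
  static constexpr uint8_t HasArgumentBit = 0x80;

  enum Kind : uint8_t { Type = 0 };

  uint8_t kind() const { return bits & KindMask; }
  bool hasArgument() const { return (bits & HasArgumentBit) != 0; }
};

// Counts the argument slots consumed by the parameter section. This works
// for unknown kinds too, because the argument bit is independent of kind.
unsigned countParameterArguments(
    const std::vector<GenericParamDescriptor> &params) {
  unsigned count = 0;
  for (auto p : params)
    if (p.hasArgument()) ++count;
  return count;
}
```

For example, a context with two formal type parameters where the second is constrained away by a same-type constraint, plus one parameter of some future kind that takes an argument, consumes two slots: one for the first type parameter and one for the future parameter.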
For each constraint, we’ll want to know:
what kind of constraint: same type, base class, or protocol. We’ll want room for more constraint kinds.
what parameter the constraint applies to; for now, either a type parameter, or an associated type thereof.
whether the constraint requires an instantiation argument. As with parameter descriptions, keeping this as a separate bit lets clients reading a generic argument list with a generic requirement descriptor skip constraint kinds they don’t understand without getting desynced from the arguments they do.
information specific to the constraint. For same type or base class constraints, we’d want a description of the constraining type; for protocol constraints, a reference to the protocol descriptor should suffice.
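Putting the constraint fields together, here is a C++ sketch of a reader that extracts protocol witness tables from the constraint section of an argument buffer. The record layout (a string subject such as "T" or "T.Element", a kind, an argument bit, and a payload naming the protocol or constraining type) is hypothetical and uses strings only for readability. Constraints of an unknown kind are skipped purely by their argument bit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical constraint kinds; Future stands in for a kind this reader
// was not built to understand.
enum class ConstraintKind { SameType, BaseClass, Protocol, Future };

struct ConstraintDescriptor {
  ConstraintKind kind;
  std::string subject;  // constrained parameter or associated type path
  bool hasArgument;     // does this constraint consume an argument slot?
  std::string payload;  // constraining type, or protocol name
};

struct WitnessTableArgument {
  std::string conformance;  // e.g. "T: Sequence"
  const void *table;        // value found in the argument slot
};

// Walks constraint arguments starting at `firstSlot` (just past the
// parameter arguments). Every constraint with the argument bit advances
// the slot, whether or not the reader understands its kind.
std::vector<WitnessTableArgument> readWitnessTables(
    const std::vector<ConstraintDescriptor> &constraints,
    const std::vector<const void *> &arguments, std::size_t firstSlot) {
  std::vector<WitnessTableArgument> result;
  std::size_t slot = firstSlot;
  for (const auto &c : constraints) {
    if (!c.hasArgument) continue;
    if (c.kind == ConstraintKind::Protocol)
      result.push_back({c.subject + ": " + c.payload, arguments.at(slot)});
    ++slot;
  }
  return result;
}
```

For a signature like `<T> where T: Sequence, T.Element == Int`, the buffer holds T’s metadata and then the T: Sequence witness table; the same-type constraint takes no slot. If a future constraint kind with an argument appeared before the protocol constraint, this reader would still find the witness table in the right slot.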
I think that’s a summary of everything we should need. Let me know if I missed anything.