ABI for variadic generics

John_McCall · February 9, 2022, 9:11am

The language design for variadic generics is well underway, and it's time to start settling on decisions about how we're going to implement it. We can roughly divide the implementation into eight pieces:

parsing
representation in the AST
type-checking
representation in SIL
AST-to-SIL lowering
SIL analyses and optimizations
SIL-to-LLVM lowering
ABI and runtime

This thread is about the last piece: the ultimate ABI for generated code that uses variadic generics. As always, the SIL representation will be strongly influenced by the combination of the AST representation and the requirements of the ABI.

Brief language overview

A pack is a flat, possibly empty sequence. There are packs at multiple levels: type packs are sequences of types, value packs are sequences of (owned) values, borrow packs are sequences of borrowed packs, etc.

For clarity, in our presentation, packs will be explicitly delimited within curly braces ({}). Packs are composed of a sequence of

concrete components and
expansions of a pattern component for a parallel set of pack variables.

For clarity, whenever a pack is required, we will write it as a delimited pack. For example, if T names a type pack, we will never write Foo<T> and mean forwarding the entire pack; instead, we will write Foo<{T...}>. These rules may differ from the rules of the source language, but at the implementation level, we want to be explicit.

A type pack is not itself a type; it is a different kind of type-level entity, one which can only be used in specific ways. Generic contexts can be declared with a new kind of generic parameter, a generic type parameter pack, which has type-pack kind; the corresponding generic argument must be a type pack (and so is alwayss written in braces, as previously discussed).

Generic type parameter packs may be constrained:

two generic type parameter packs may be constrained to have the same length
the elements of an expansion may be constrained to conform to some protocol
the elements of an expansion may be constrained to (all) equal some common type
the elements of an expansion may be constrained to (pairwise) equal the element of an expansion of the same length

The elements of a value pack are expressions. Each expression has a type, which is therefore a sequence of types parallel with the value pack, and so we can say that the type of an value pack is a type pack of equal length.

A function parameter may be declared to be a function parameter pack. Its type is written as a type pack, and references to it produce a value pack. The corresponding argument in a call to the function must be a value pack.

A tuple type may be composed using a type pack as a component. The resulting type is the same as a tuple in which the elements type pack is expanded (without labels); thus, (Int, {Float, String}) is the same type as (Int, Float, String). A tuple value may be composed using a value pack as a component; its type is that of a tuple with the value pack's type as the corresponding component.

Type and value packs might be usable in other places, but it's not fundamentally important for understanding the feature.

ABI for generic type parameter packs

A generic context will need to be able to extract, for any generic type parameter pack T:

the length of T
the type metadata for the ith element of T

In addition, it will need to be able to extract witness tables for conformance requirements imposed on the elements of T and their associated types, but we will discuss that in the section on witness table packs.

Type metadata packs

The most obvious representation for any sort of sequence is a flat array of pointers to type metadata with a separately stored length. With a slight tweak, that's the representation we will use here; we'll run through the alternatives at the end of the section.

struct TypePack {
  size_t length;
  size_t isLocal : 1; // in bit 0 of elements
  const Metadata * const *elements;
};

The isLocal bit permits packs to be constructed in local memory, while still allowing the runtime to avoid redundant copying when packs are in fact global. The bit must be set if the memory backing the pack array is not guaranteed to be valid and constant for the duration of the program. The bit is guaranteed to be clear for packs loaded from type metadata.

The isLocal bit is stored in the elements pointer so that it can be set separately for different packs that happen to share the same length.

There is a runtime function to copy a pack into global memory:

/// Returns pack if !pack.isLocal; otherwise, copies it into
/// global memory and returns that pack.  The isLocal bit is
/// guaranteed to be clear in the returned pack.
extern "C" __attribute__((swiftcall))
TypePack swift_getTypeMetadataPack(TypePack pack);

We may also want a runtime function to build a pack from a rope with complex internal structure.

Type metadata pack fulfillment

A type metadata pack will be passed to a generic function for each of its generic type parameter packs that is not fulfilled. Fulfillment is an established concept in the Swift ABI which aims to avoid the need to pass redundant type metadata. The rules for fulfillment must be adjusted for variadic generics.

In all the examples below, let X, Y, and Z be ordinary generic type parameters, let P and Q be generic type parameter packs, and let A be a generic type with a single generic type parameter pack.

Fulfillment may consider concrete pack components of an argument pack, but only when they are part of a concrete prefix or suffix of the pack.

For example, given metadata for the type A<X, P..., Y, Q... Z>, X and Z are fulfilled, but Y is not.

Fulfillment from intermediate positions is likely to be rarely useful. Disallowing it minimizes complexity as well as avoiding circularity problems with deriving the pack length that is necessary to find the intermediate position.
Fulfillment may not consider the pattern component of a pack expansion.

For example, given metadata for the type A<Dictionary<X, P>...>, X is not fulfilled.

This is necessary because packs may be empty.
Generic type parameter packs are fulfilled by argument packs if the pack contains a single expansion of a pattern component that is the generic type parameter pack.

For example, P is fulfilled given metadata for any of the types A<P...>, A<X, P...>, or A<P..., Y>, but not given metadata for the types A<P?...> or A<P..., Q...>.

Fulfilling pack parameters by doing non-constant arithmetic on pack argument bounds would be complex and rarely useful, and it could lead to circularity problems.

When a type pack is passed, a data pointer is always passed (with the isLocal bit possibly set). The pack length is also passed, immediately prior to the data pointer, unless the parameter pack is constrained to have the same length as another parameter pack which is either fulfilled or precedes the parameter pack in the parameter list.

Alternatives

Ropes

Some patterns of code are likely to frequently build packs from smaller components that are not fully concrete. For example, a generic function might "append" another type to a pack by constructing the type A<T..., Int>. The operation of constructing such packs would be simplified if packs could be "ropes", which is to say, a tree of concatenations with fully concrete packs at the leaves.

However, composition is not necessarily the best operation to optimize for. It is also important to be able to efficiently decompose a pack by extracting its length and indexing into it. These operations would be made much more complex by a rope representation: in practice, they would require runtime calls instead of inlinable arithmetic and loads.

Rope representations tend to pay off better if uses of the value can simply walk the tree instead of needing to index into it. That would be a severe constraint for variadic generics. The length of a pack must often be evaluated before the loop over the pack can proceed, e.g. to allocate space for a pack of results. Efficiently building the iterative code of a function around a recursive tree walk would also be challenging, and it is unclear how "zip" expansions that would walk multiple packs in parallel could possibly work. Requiring the pack to be flattened before use greatly simplifies this problem.

Even if we accept that the standard representation of a pack will be flat, the rope idea can still be useful to simplify the problem of constructing a pack. Creating a pack from an arbitrary concatenation of expansion and concrete components can be very tricky, and it would be helpful to have a runtime function dedicated to this. Such a function probably would not need a recursive rope, however, just a flat sequence of different components.

Arrays with intrusive lengths

Packs will be passed around a fair amount. It would avoid some code size, and maybe some memory use, if the pack could be passed around as a single pointer by somehow storing the length as part of the array. For example, the length could be stored prior to the first element, like a Pascal string.

This is a very tempting option. The main downside is that it would become impossible to construct a pack from an existing array:

from the elements of a tuple metadata
from a slice of a larger pack

The C++ style of variadic templates relies heavily on slicing packs. For example, the standard way to deconstruct a parameter pack (if expansion alone isn't sufficient) is to use a function template to pull a single value off the front of the pack, then forward the rest of the pack to the same template. Each level of recursion therefore slices the pack one element smaller. If C++ had to dynamically construct type packs, it would be crucial to ensure that this slicing was efficient.

However, C++ does not have to dynamically construct type packs; they are fully erased by template instantiation. Furthermore, this C++-style approach relies on the ability to statically resolve between templated overloads based on the concrete elements of the pack. That is totally incompatible with a Swift-like approach to generics, in which behavior is resolved statically based solely on the knowledge available to the generic definition. There are no basic operations in a Swift-like design which destructure packs into smaller packs.

Another possible use for pack slicing comes from fulfillment. If we're in a function that is passed the metadata for A<X, P...>, it would be useful to be able to efficiently recover P from that metadata. But if we're in this situation because we're planning to recurse, we will need to pass the metadata for A<P...> to the recursive call, and that is not something we naturally have at hand, so we'll still need to fetch metadata.

So it's fair to ask whether this sort of slicing will ever be important enough to justify leaving room for it in the ABI. If we do not, we can store the lenght in the pack and reap some (likely small) benefits.

In the end, we chose to be more future-proof. The benefits of storing the length in the pack are small. The risks of doing so if we find a reason why slicing would be valuable are large.

Tuple type metadata.

Certain kinds of variadic generic code will require the allocation of storage for each type in a pack. It would be reasonable to allocate this as a tuple, which means that it would be valuable to have the type metadata for (P...) on hand. Some variadic generic code will frequently work with such tuples for other reasons. So simply passing a type pack as the corresponding tuple type metadata would have some advantages.

But it would also have some disadvantages. It isn't possibly to statically or locally allocate tuple type metadata, so functions would always have to call into the runtime to pass a pack. A lot of variadic generic functions don't actually need to allocate storage for pack elements; e.g., they might simply forward the values they were passed to other functions. Tuple type metadata also have the same slicing problems that arrays with intrusive lengths have.

It would be nice if we could optimize the creation of tuple type metadata from a pack, but it's unclear how to do so without recreating the slice problems.

ABI for conformance requirement packs

The elements of a generic type parameter pack (or the associated types thereof, etc.) may be constrained to conform to a protocol. For non-@objc protocols, these conformances are reified dynamically as pointers to witness tables. To use these conformances in variadic generic functions, we must therefore support witness table packs.

By and large, the ABI for witness table packs closely resembles that of type metadata packs: it is a pointer to a flat array with an isLocal bit with a separately stored length.

Since witness table packs are always parallel to a type pack, which much somehow be separately discoverable, there's no reason to ever pass the length as part of a witness table pack.

ABI for function parameter packs

A function may be passed a pack of values. We can expect that it will eventually be desired to allow values to be passed using any of the ownership conventions available for normal parameters:

as owned values
as immutably-borrowed values
as mutable references (inout)

Swift doesn't currently have an official way to distinguish between owned and borrowed arguments. However, it's in the ownership manifesto, and it's on the performance roadmap, and there's already an unofficial way to do it. We should clearly design the ABI to not rule anything out.

In all three cases, the natural ABI is an array of pointers to values with a separately stored length. It would be possible to improve on this: for example, if the elements are all constrained to have the same size, the ABI could pass an array of values rather than an array of pointers to values. But it's far from clear that this is a common enough case for variadic generic funtions to justify special treatment in the ABI.

The type of a function parameter pack is a type pack, and the length of the function parameter pack is the length of that pack. The components of that type pack always have to be recoverable independently from the function parameter pack, so there's no reason to pass the length as part of a function parameter pack.

Alternatives

Tuples

The main alternative to passing an array of pointers would be to pass a tuple. This is tempting, since the call would only need to pass a single pointer. However, this idea has a lot of problems:

If the parameter pack is forwarded as part of a larger pack, the tuple will have to be broken up, and the values will have to be moved into the larger tuple. If the tuple isn't owned by the function, this will require copying all the values, which is a lot of relatively heavy work just to forward arguments.
Copying values in this way is not possible for move-only or unmovable values, so the ABI would not be very future-proof.
Similarly, it would not be possible in general to form a function argument pack from individual borrowed values without copying them into a tuple.
Even in the cases where forming a new tuple would just involve copying bits instead of doing full copies, that's still a lot more expensive than copying pointers into a new array.
Projecting an element from the pack would require computing tuple layout instead of just indexing into an array.
Projecting a slice of the pack (if that's ever useful) would, in general, require copying values into a new tuple, since tuple layout is dependent on earlier elements. (For example, there is no portion of a (Bool, Bool, Int) tuple that matches the layout of a (Bool, Int) tuple representing its last two elements.)

ABI for other value packs

Value packs other than function parameter packs only arise locally to a function, so they're not strictly part of the ABI. Nonetheless, this seems like an appropriate price to discuss them.

Borrowed or inout value packs, when it's possible to produce them, can be represented as an array of pointers. The chief implementation difficulty here is that forming an arbitrary number of borrows at once is likely to require a number of other simultaneous allocations, like access storage for dynamic exclusivity enforcement. This isn't inherently difficult in terms of code generation, but it'll probably be challenging to express in SIL.

When a unital owned value pack is required, we can naturally allocate and store it as a tuple, together with an array of pointers to the elements. It would be nice to have some runtime support to minimize the code-size cost of that.

But in many cases, we will not actually require a unital owned value pack. For example, a function which originates a variadic generic call with concrete arguments will naturally produce all of the arguments independently, and then borrows of those values can be assembled into a pack without ever created an owned value pack for all of them.

jrose · February 9, 2022, 4:48pm

This is an extremely narrow point, but I’d suggest limiting the “length” field for packs up front to reserve room for flags. That way anything comparing the lengths of two packs, even just as an assertion, won’t have to mask off a possibly dynamic section of the field. I see that it makes sense to attach isLocal directly to the pointer though.

John_McCall · February 9, 2022, 5:37pm

That’s the thing. There are definitely clever savings we could get by limiting the length, but what’s the point if the length isn’t specific to a pack?

codafi · February 9, 2022, 7:29pm

Indeed, it seems like we have the opposite problem. In general, the problematic operation here is

func append<T..., U...>(front: (T...), back: (U...)) -> (T..., U...) { return (...front, ...back) }

Applying append recursively involves a tree of tuple-to-pack expansions that will inevitably just copy their data into a larger tuple and be immediately destroyed. The explicit append primitive here is a distillation of a pattern that has been seen as part of the strongly-typed regular expression captures effort which seeks to build precisely these kinds of trees via the function builder transform.

This isn't the end of the world by any means. The flat representation implies we'll have to invest in optimizations that can efficiently perform this kind of recursive compound initialization. For example, it's clear that only the outermost allocation in our hypothetical pack tree is the one that requires meaningful initialization. The inliner is our friend in that case, since it can chop the tree structure apart and a cleanup pass can subsequently forward the sub-expansions from the leaves towards their final destination.

Another more radical approach would be to add an instruction that borrows a fixed non-overlapping region of a pack or tuple to construct an uninitialized sub-structure. Effectively, we'd be implementing the "emit-into" peephole but for tuple structure. That all reads so cleanly in my head but it has wild implications for SIL and its ownership model so I'm definitely open to other ideas.

jrose · February 9, 2022, 11:14pm

Even flags that can only be used optimistically or within a resilience boundary could be useful. Since it costs basically nothing to make the length field smaller, why not? (When have we not wanted more space in an ABI?)

John_McCall · February 10, 2022, 1:59am

The struct is really more of a conceptual tool than something that's going to be permanently stored like that in a bunch of places. When a length is stored, we could certainly store a 32-bit value rather than a larger one, but the remaining space will not be used for anything because there will not be anything it naturally belongs to.

Mike_Ash · February 11, 2022, 4:47pm

This all sounds very sensible to me. A couple of minor questions/comments:

Does swift_getTypeMetadataPack intern the pack, or is there expected to be a corresponding free call once it's done?

Passing a parameter pack as an array of pointers makes sense, but I'm a bit vague on how the array gets passed. I assume the actual ABI-level parameter is a pointer to the array?

If we store isLocal in the high bit of elements on ARM64 instead of the low bit, we can save a teensy weensy bit of code by taking advantage of TBI so that code dereferencing elements doesn't have to do any masking.

John_McCall · February 11, 2022, 5:56pm

Interned.

Yeah.

That’s a great point, thanks!

Slava_Pestov · August 1, 2022, 7:54pm

This is an excellent writeup, I just have a few clarifying questions.

This should read sequences of borrowed values, right?

If a generic type has a pack-kinded generic parameter, do we require that the pack is the sole generic parameter in the type's generic parameter list?

How is the metadata for this type constructed in the first place? Does the caller allocate space for the pack X, P..., Y, Q... Z, copy the constituent parts in, and then call the metadata accessor for A which takes a single pack?

What does P?... mean, is that just sugar for Optional<P>...?

Slava_Pestov · August 1, 2022, 8:24pm

Thinking again about generic signature minimization with packs:

Each pack generic parameter has an abstract 'multiplicity', computed with a union-find algorithm. Projecting an associated type from a type pack produces a new type pack where the associated type is projected element-wise.
Associated types cannot be variadic, so the multiplicity of a general type parameter is determined solely by its root generic parameter.
The main operation on multiplicities is asking if two general type parameters have the same multiplicity.
A new "same-length" requirement merges the multiplicity of two generic parameter packs.
A same-type requirement between two packs merges their multiplicity classes and pairwise-equates their elements.
A same-type requirement between a pack and a non-pack type parameter equates each element of the pack to that type parameter.

The last one is interesting because we have to think about the canonical type order. With the above model, consider this generic signature:

<T..., U..., V where T...: Sequence, U...: Equatable, T.Element... == U..., V: Sequence, U... == V.Element>

We're saying that:

T, U are pack kinded
each element of T conforms to Sequence
each element of U conforms to Equatable
the pack T.Element is equal element-wise to the pack U
V is type kinded
V conforms to Sequence
each element of the pack U (which is the same as the pack T.Element) is actually equal to V

If we just ignore packs, we have this generic signature:

<T, U, V where T: Sequence, U: Equatable, T.Element == U, V: Sequence, U == V.Element>

Here, the canonical type of V.Element is U. However with packs, we don't want to canonicalize V.Element to U; they have different kinds. So a same-type requirement between a pack and a non-pack is special, because it doesn't define a reduction of type terms.

I think we want the canonical type computation to work so that the canonical form of a type parameter is another type parameter, and the canonical form of a pack parameter is a pack parameter.

This means I need to extend the representation used in the requirement machine with two additions:

this notion of 'multiplicity', which seems straightforward
preserving kind when computing canonical types (reducing terms)

John_McCall · August 2, 2022, 12:20am

Yes, sorry.

We might have that as a restriction for now in order to avoid hard problems with decomposing flat argument sequences. I don't think we should design around that being a permanent restriction, though. That probably means that e.g. we need a syntax for delimiting packs explicitly, even if we only use it in internal-facing printing like SIL.

When necessary, yeah, we could do it that way. I think we'll probably want a runtime function to create a composite pack from a list of parts like this, though, for code-size reasons if nothing else.

Yeah, that's all I meant.

vitamin · September 8, 2022, 6:12pm

Why does the list of generics have to be an array? Like could we have an option to use variadic generics where it's a set rather than an array, so the order in which the arguments are supplied can't be seen as changing the overall type of a given variadic signature?

John_McCall · September 8, 2022, 6:46pm

That's really a question about the language design of variadic generics, not the ABI. If you can think of a situation where uniquing variadic generics by type identity would be useful, it's worth bringing that up in the variadic generics design thread. Purely from an ABI perspective, it would be a different feature in the source language and could use a different representation at runtime, so the decisions we make about list-variadics wouldn't have to limit set-variadics.