SIL instructions for variadic generics

John_McCall · January 12, 2023, 11:56pm

I'm going to start with the lowered SIL instructions, i.e. those going into IRGen. I think we'll also need instructions to support the use of opaque values coming out of SILGen, but I don't want to focus on those quite yet.

Explicit pack syntax

As usual, textual SIL needs to be more explicit about some things than ordinary Swift source code. There are several places in SIL, such as apply instructions, which have explicit generic arguments for arbitrary generic signatures. Arbitrary generic signatures can contain multiple parameter packs, so we can't flatten argument packs in these argument lists without losing key structure. Instead, we have to have a way to spell packs explicitly in textual SIL. (I anticipate that we'll eventually want this in source as well, but we don't need to design that now.) The current pretty-printer for PackType uses the syntax Pack { ... }, which seems perfectly adequate, so we just need to teach the parser to accept this in SIL mode.

Status: implemented

Pack ownership and inline-ness

While pack types in the source language are always simply lists of types, pack values in SIL must express layout and ownership of their element values.

For layout, the chief question is whether values are stored inline in the pack or as addresses. We really want pack element to have uniform layout within the pack — that is, we want the pack value itself to be a flat array. So storing values inline in a pack is basically only imaginable in very specific circumstances, like if we have a parameter pack of class-constrained type, or a parameter pack of repeat Array<each U>. It's unclear if those optimizations are worthwhile.

As for ownership, we want to make sure that just changing ownership of a pack (borrowing the elements of a consuming parameter pack, for example) doesn't require extra operations dynamically. And we also want to encourage SIL patterns that still permit exclusivity checking.

Status: still being designed

Opened pack archetypes

When iterating over a pack, we need to be able to refer to the current pack element type in basically arbitrary type positions in SIL. Naturally, we use an archetype for this; the AST calls it an "opened element archetype". Like opened existentials, these archetypes are resolved dynamically within the body of a function and can take on different dynamic types within a single activation, and they can't be resolved earlier than a certain point. We therefore have very similar problems to opened existentials in terms of needing to mark type dependencies on the point where the pack element archetype is bound. As a result, we need to generalize the type-dependency tracking for opened existential archetypes to work for all kinds of "local archetype". This is straightforward.

Unlike (current) opened existentials, we may need to bind multiple pack element archetypes simultaneously. This is because two packs may be not only constrained to have the same length, but also constrained to refer to one another. For example, you could have <T..., U... where T: Collection, U: Collection, T.Element == U.Element>. In this situation, type derivations rooted in one element archetype can end up being expressed in terms of a different element archetype. The only reasonable way to avoid this problem is to bind them at the same time. This isn't really a deep problem for SIL's type-dependency tracking; it just needs to be taught that a single instruction can bind multiple archetypes. (Note that there's no need for the different types bound by an instruction to use different type dependencies.)

Currently, for subtle reasons, this is set up to bind all packs in a signature with the same length (to be precise, the same "shape class"). We may want to have a high-level optimization which recognizes that some of this data is unneeded in a particular iteration and so removes unnecessary parameters and requirements from the opened generic signature. (The opened signature has no inherent relationship to the enclosing contextual signature; see the section below on open_pack_element.)

Status: implemented

Pack index type

We need some way to refer to a specific index of a pack. On a primitive level, this could just be an integer. But we really want to try to keep pack indices well-typed: we want to make sure we don't use an index in the wrong pack. To make that possible, we need to preserve the use/def structure of how pack indices are immediately produced. That means we need to use a different type that discourages abstraction (e.g. hiding the production of a value behind a basic-block argument) and algebraic manipulation, which basically means "not just an integer type".

For this purpose, we introduce a Builtin.PackIndex, which is basically just a UInt32.

An alternative design would to be reflect the pack that's being indexed into the type of the pack index. But the specific pack isn't necessarily meaningful — we might want to use the same pack index to index into several different packs that share the same shape class. We can just require that things have the same shape class, and ultimately we are going to have to do that. Still, I think it's better to not overstate the meaningfulness of some specific pack type. It also should lead to more readable SIL.

Status: in progress

Instructions for indexing into packs

We define three instructions for indexing into packs, in service of two different basic patterns. These instructions are collectively the pack indexing instructions. All pack indexing instructions produce a $Builtin.PackIndex and define a pack shape that they index into.

`dynamic_pack_index`

The first pattern for pack indices is that we may need to do something uniformly for an entire pack, treating its elements opaquely. This is what we'll naturally produce in SILGen when we expand over a pack parameter. For this, we need to basically produce a for-like loop pattern. The easiest way to do that is to generate integer indices normally with arithmetic, then translate them into a specific pack index value, locking down the structure of any subsequent derivation.

It is undefined behavior to use an index that is greater than or equal to the length of the given pack.

%packIndex = dynamic_pack_index %i : $Builtin.Int32 of $Pack{repeat each T}

Status:: in progress

`scalar_pack_index` and `pack_pack_index`

The second pattern for pack indices is that we may need to decompose a pack. For example, if we are passing values of type Int and Float to a variadically-generic function, we will need to construct a concrete pack of type Pack{Int, Float} (or maybe borrows thereof). But we could also be doing the same thing to produce a Pack{Int, repeat each T, Float}. We need some way to produce pack indices for specific statically-known components of this. This actually requires two different instructions: one for deriving a scalar pack index, and one for deriving a pack index of an element of a component pack.

The scalar pack index case is easy: we just provide the constant index of the desired component within the pack. The component must not be a pack expansion. SIL substitution may need to change this index if there are packs preceding the index.

%packIndex = scalar_pack_index 2 of $Pack{Int, repeat each T, Float}

The component pack index is a little more complex: we need to select the component pack, but after substitution, this may no longer be a single component. The shape constraint is therefore that a slice of the pack components starting at the given constant index must have the same shape as the pack that the given pack index indexes into.

%subPackIndex = dynamic_pack_index %i : $Builtin.Int32 of $Pack{Int, repeat each T, Float}
...
%packIndex = pack_pack_index 1, %subPackIndex of $Pack{String, Int, repeat each T, Float, Double}

Status: in progress

`open_pack_element`

We need an instruction to bind a pack element archetype. In order to work after inlining (or any other form of substitution), it must not rely on the pack being a contextual pack parameter of the enclosing function. It also needs to stably define the generalization pattern of the pack elements independently of the enclosing environment. Consider, for example, a substitution that replaces a pack parameter with the concrete pack Pack{Int, Float}; when iterating this pack, what protocol requirements should we bind for each type? We derive this pattern (called the opened generic signature) from the original function and generally preserve its structure through passes (although it would be acceptable to reduce it when information is provably unused). The structure of this signature will therefore be the same as the enclosing context during initial emission in SILGen, but not necessarily at any later stage. We then provide contextual-meaningful substitutions for the parameters in this signature, which for packs will be PackTypes. SIL substitution (SILCloner) then just needs to substitute these substitutions.

The shape operand must be a pack parameter in the signature. The pack parameters in the signature with the same shape as the shape operand are collectively called the opened pack parameters. The instruction binds an opened element archetype for each opened pack parameter. The substitutions for each opened pack parameter must have the same shape, which must also be the shape of the pack indexed by the pack index operand.

This instruction expresses a relationship between each opened element archetype and a type pack in the substitutions; this relationship is important for the type-matching algorithm below. The instruction doesn't work with any pack values itself and has a value result solely for the use of type-dependency tracking.

%typeToken = open_pack_element %packIndex of <T...> at <Pack{Int, Float}>,
               shape $T, uuid "01234567-89AB-CDEF-0123-000000000000"

Status: implemented

Structural type matching for pack indices

When manipulating a pack value at a specific pack index, we must generally provide or produce an element value of an appropriate type for the pack index. These instructions will generally state the SIL type that they expect or produce. This type is then required to be well-typed for the element according to the following algorithm:

The inputs to the algorithm are a SIL type for the pack value (e.g. $Pack{*Int, *repeat each T, *Float}), a pack indexing instruction that indexes into a matching shape class, and a SIL type purportedly for the element at that index.
If the pack indexing instruction is scalar_pack_index N of $Pack{...}, then N must be the index of a non-pack-expansion component of the pack value type. The element type is well-typed if it is exactly this component type.
If the pack indexing instruction is pack_pack_index N, %idx of $Pack{...}, then %idx must index into a pack with the same shape as a slice of the pack value type starting at component N. The element type is well-typed if it is well-typed for that slice of the pack value type and %idx.
If the pack indexing instruction is dynamic_pack_index %idx of $Pack{...}, then let S be the set of opened element archetypes in the purported element type which use this same pack indexing instruction as their index. Transitively by construction, the substitution packs corresponding to these archetypes in their opening instructions must have the same shape as the value pack type. Then the element type is well-typed if, for component i of these shapes, there is a substitution which replaces opened element archetypes in S with component i of their corresponding substitution packs that makes the element type equal to component i of the value pack type (handling pack expansions appropriately).

Status: not yet implemented

`alloc_pack` and `dealloc_pack`

Pack values, in this design, are mutable arrays, generally of addresses. That's not great — it'd be much better if we had a way to iteratively construct them as immutable values — but it's the easiest thing to do in the short term, because all we really need to do is allocate them and then insert and extract values.

Pack allocation uses the stack allocator and therefore must obey stack discipline with every other instruction that does.

%pack = alloc_pack $Pack{*Int, *repeat each T, *Float}
...
dealloc_pack %pack

Status: not yet implemented

`pack_set_element_addr` and `pack_get_element_addr`

This instruction is specifically designed for packs of addresses; inline value packs have ownership questions that shouldn't be conflated with this. Honestly, the ownership questions posed by this are bad enough; it ends up creating abstraction of addresses, which we specifically do not generally want in SIL.

The type of the element must match the type of the pack at the given pack index.

pack_set_element_addr %addr: $*@opened(...) U, %packIndex, %pack: $Pack{*Int, *repeat each T, *Float}

%addr = pack_get_element_addr %packIndex, %pack: $Pack{*Int, *repeat each T, *Float} as $*@opened(...) U

Status: not yet implemented

`tuple_pack_element_addr`

In lowered SIL, tuples containing packs must be address-only, since we don't know their size. We can also force address-only patterns on tuples when we use them in element-abstracting ways (I think?). So at that point, the only thing we need is an instruction which dynamically projects out a single element of a tuple, treating the tuple as a sort of pack with heterogeneous layout. Imagine a pack type that contains all the elements of the tuple. Then a pack index into that pack is concretely the index of the tuple element, and we can use the pack index manipulations to construct that element index.

The pack index must index into a pack with the same shape as the imputed pack of the tuple elements. The result type must be an opening of the tuple element type at the pack index.

%componentPackIndex = dynamic_pack_index %i of $Pack{repeat each T}

%tuplePackIndex = pack_pack_index 1, %i of $Pack{Int, repeat each T, Float}
...
%eltAddr = tuple_pack_element_addr %tupleAddr : $*(Int, repeat each T, Float), %tuplePackIndex as $*@opened("01234567-89AB-CDEF-0123-000000000000") U

Status: not yet implemented

Erik_Eckstein · January 16, 2023, 7:19am

Looks good to me

John_McCall · January 20, 2023, 5:00am

PR for the pack indexing instructions. That brings us up to implemented for Builtin.PackIndex, dynamic_pack_index, pack_pack_index, scalar_pack_index, and structural pack indexing. The pack allocation and manipulation instructions are up next.

I haven't been updating the SIL documentation so far, but this thread can stand in until I have things working, at which point it should be painless to condense it into SIL.rst.

John_McCall · January 24, 2023, 12:55am

I'm thinking that what I'll do for pack types is introduce a SILPackType whose element type can carry address-ness and ownership. The basic spelling for that can be the same as for the AST type (Pack { ... }), and we'll distinguish them by position, the same way we do for function types. Lowering will introduce SILPackType in concrete positions.

John_McCall · January 30, 2023, 8:22am

This was done in #63292. I decided not to carry ownership, since that's currently not reflected in the type normally, but it does carry whether the pack elements are indirected a second time. For now, all packs are indirect; we'll need to think about how we can reasonably allow direct packs when the element pattern is know to be fixed-size, small, and bitwise-movable.

I need to take a quick detour to fix the parser for element archetypes to unblock testing for IRGen. After that, I'll add the instructions for allocating and deallocating value packs.

John_McCall · February 2, 2023, 12:03am

Parsing and printing support was #63324, which has now landed. Allocating and deallocating value packs is #63345, which is taking a while to land because of general CI trouble. Next up is pack insertion and extraction.

@nate_chandler has offered to help with implementing IRGen for some of these instructions while I brutally plunge forward making more work for him.

John_McCall · February 7, 2023, 6:22am

pack_element_get and pack_element_set are in #63488.

This includes the actual implementation of the structural pack indexing rules; I said I implemented them above, but I'm not sure why, because I definitely did not do any of it beyond basic shape-matching on the indexing instructions. There's a significant FIXME in the verifier logic around mapping opened-element conformances back to a lane of the substitution conformance pack that we'll definitely need sooner rather than later.

That PR also fixes a bug where you couldn't use the Pack{...} syntax in the signature of a sil function declaration. I was testing for SIL mode by checking if the parser's SIL field was clear rather than calling isInSILMode(), and that was a mistake. Apparently we do an initial parser pass over SIL source files to process all the non-SIL declarations, and the SIL field is clear during this, which is sortof reasonable; and we parse the function types of the SIL declarations during this pass to find the leading { of the function so that we can properly delimiter-match from that point. Anyway, testing for SIL mode the same thing as the rest of the code is obviously the right call, but something about why this approach doesn't work rubs me the wrong way.

John_McCall · February 8, 2023, 4:37am

tuple_pack_element_addr is #63512.

I'm not sure if I'm quite done adding SIL instructions. Because variadic substitution in the current language design can eliminate tuple-ness (e.g. substituting T := Pack{Int} into the tuple type (repeat each T) leaves you with a non-tuple type), I need SIL instruction substitution to recognize that that's happened and eliminate tuple_pack_element_addr. We know (either by structure or on pain of UB) that the index is in bounds, so we should be able to just rewrite the "tuple" address to a different type. I suppose I can just use unchecked_addr_cast for that, but I'd rather take a bigger step and introduce a type_refine_addr that asserts that we have static knowledge that the types match. I'll sleep on it.

John_McCall · February 10, 2023, 12:14am

I wrote some tests for inlining the new instructions, and the inevitable fixes are #63557. I decided to stick with unchecked_addr_cast for now. I'll add the docs for all this and then get started on SILGen.

John_McCall · February 11, 2023, 2:51am

I forgot about adding pack_length. #63589

John_McCall · February 21, 2023, 5:28pm

There were a lot of other demands on my time for a week there, but the docs are now up as #63803.

John_McCall · February 25, 2023, 6:43am

I'll just continue the development log here now that I'm working on SILGen.

I've landed a PR for baseline SILGen support as #63883. This is mostly just adding the core representational support to AbstractionPattern, although it also successfully implements calls with packs of concrete arity. At least, certain kinds of calls.

Next up is supporting packs with expansions in them. Most of the code for that is actually in the PR above, but it doesn't work yet because the AST coming out of the typechecker isn't quite what I was expecting. I talked it over with @Slava_Pestov, and I'm planning to change the AST so that it matches my expectations. Basically, the representation currently allows substitutions for pack parameters to be either packs or bare pack parameters / archetypes. The latter is equivalent to a pack containing an expansion of the parameter, so this is actually an ambiguous representation, which is a serious problem unless we do the work to canonicalize single-expansion packs down to their bare contents. Less importantly (but more personally vexing), it forces clients to handle both representations, and my PR doesn't expect to see bare parameters. So I'm looking into a PR to force substitutions to always be packs, which is something we ought to be able to immediately enforce with assertions.

I'm getting the feeling that supporting packs generally in SILGen is going to be a slog; there are just so many different things.

John_McCall · March 3, 2023, 7:49pm

I made a fair amount of progress on the representation change, but it was dragging out, so I've shelved it for the moment in order to go back to getting basic SILGen working; WIP PR is over here. We will still need to fix this before releases because of the ambiguous representation problem, I think.

I got pack expansions in variadic calls working; that PR is here. There was relatively little pain dealing with the representation problems.

Next up on the direct variadic-generics front is reabstraction, which will undoubtedly be a lot of fun. But before I do that, I've been fighting the code in TypeLowering.cpp, which has a lot of redundancy and misbehavior around recursive type properties; I've got a patch in progress to rework that into cleaner phases (and maybe eventually stop having to build TypeLowering objects just to fetch properties in the first place).

John_McCall · March 4, 2023, 7:19am

Well, the type lowering patch is up.

John_McCall · March 6, 2023, 9:33am

I've put that PR on hold, too; I need to rework it to be less ambitious, at least in the short term.

Instead, here's a PR to handle variadic pack expansions in tuples, at least enough to (1) support tuples in ignored contexts and (2) get the callee side (return statement emission) of returning tuples containing pack expansions working. This is notably missing the caller side, which is a whole other piece of code. It's also missing some code in RValue that will be necessary to unblock general tuple emission in expressions, because RValue likes to eagerly explode tuples, and we either need to stop it from doing that to tuples with pack expansions or teach it to do it correctly.

John_McCall · March 7, 2023, 4:50pm

Caller side of variadic returns is #64167. Oh, and Holly took over my pack-substitutions patch and landed it, which is awesome.

I mentioned this in the commit message, but it's worth giving it a little more prominence. When we have result types that require transformation (bridging or reabstraction, and reabstraction is the only one that can affect variadic generics), we need to first set up to receive the call result, then transform those values after the call. Ideally, we would have a dominance relationship between these two points so that we can allocate temporaries and so on. The low-level representation we're using, with explicit pack loops, breaks dominance between the points — we have to have one loop before the call to set up the pack argument, then another loop after the call to do the transformation. This makes it hard to set up anything particularly clever, and the result is that we have to fall back conservatively on emitting to a temporary.

I think the long-term solution here is the same as it is for borrowing arguments, which have a similar dominance need that pack loops break: we need to "pack-apply" some kind of local yield-once coroutine whose yields will build up packs. In the lowering, values in the coroutine that are live across the yield would be pack-ified, and then the pack-apply would turn into a loop. If a turn through the pack-apply throws, it reverses the loop and aborts the previous iterations. But we'll have to ship what we've got for at least a few releases.

John_McCall · March 9, 2023, 4:02pm

There were wild reports that getters and setters for properties of tuple-with-expansions type were "completely broken" and "crashed immediately in SILGen" and "required all such properties to be private as a workaround", but they were from untrustworthy sources so I disregarded them appropriately.

Unrelatedly, yesterday I was writing an innocent test case and discovered that the compiler crashes if you declare a tuple property that contains a pack expansion! A shocking state of affairs which I promptly set about righting. #64231 is a solid start.

The PR summary talks about the problems with the RValue abstraction and its desire to eagerly expand things. Thinking about it a little more, though, I think the RValue abstraction might be a great way of solving a different problem that we have with local variables. Basically, if we recursively expand a tuple in the arguments, we'll end up with a pack parameter for that component; but the first thing the callee has to do is implode it into a single buffer, because currently all local bindings have to be single buffers. This is really unfortunate for something like a tuple parameter that contains non-trivial types (including pack expansions), because it means we have to copy them all straight off instead of being able to rely on borrowing. It would be better if we just preserved the tuple in its destructured form in the binding, and hey, that's what RValue is all about. It's just the eagerness of doing that — and the flattened structure it uses — that are so unfortunate.

For those readers concerned that I have thoughtlessly eradicated all the SILGen crashers with variadic generics, fear not. I have another test case I'm working on involving subscripts where the indices are tuples-with-expansions, which exercises a bunch of other paths that are still happily asserting. And I haven't even looked at reabstraction yet.

John_McCall · March 10, 2023, 6:48am

Slava did a ton of work to get variadic generic tuple types working in IRGen, but he needed to take a couple days off and asked me finish his patch by making some fixes to IRGen's type fulfillment and local type caching system, which falls squarely into the category of My Fault. That PR is now up as #64273. I'm pretty happy with where this ended up, but there's some interesting future work to consider.

First, there's a lot more we can do with fulfillments:

fulfilling N counts from N+1 packs; I actually wrote the code for this but didn't make the fulfillment code take advantage of it
fulfilling scalar type metadata from packs; we can actually fully decompose packs as long as all the pack lengths involved are fulfilled elsewhere, which I think wouldn't be too difficult for us to handle
fulfilling arbitrary pack expansions, not just repeat each T components
fulfilling complete packs

And secondly, we don't seem to be caching complete packs right now, which is a little unfortunate.

John_McCall · March 11, 2023, 12:38am

A few more scattered fixes across the entire lowering pipeline in #64293:

There was an ownership bug in the last few SILGen PRs where we weren't actually activating a cleanup when copying from a tuple value into a pack expansion.
I hit the optimizer with a big hammer marked SILType::aggregateHasUnreferenceableStorage() to try to convince it to ignore tuples with pack expansions in them in passes like SROA. This is not the right thing to do, but it's efficacious.
The outlining logic in IRGen needed to be taught not to try to apply to opened-element archetypes.

DevAndArtist · March 12, 2023, 6:41pm

Hi John, quick questions:

is this some kind of personal progress reporting thread? The information you share is very interesting to follow along.
are there any plans or reservation in the implementation for variadic packs to be able to expand / project onto enum cases? Basically where each generic type parameter from a pack would project a single associated value in a new separate case. Either is an example of two generic type parameters.

SIL instructions for variadic generics

Explicit pack syntax

Pack ownership and inline-ness

Opened pack archetypes

Pack index type

Instructions for indexing into packs

dynamic_pack_index

scalar_pack_index and pack_pack_index

open_pack_element

Structural type matching for pack indices

alloc_pack and dealloc_pack

pack_set_element_addr and pack_get_element_addr

tuple_pack_element_addr

`dynamic_pack_index`

`scalar_pack_index` and `pack_pack_index`

`open_pack_element`

`alloc_pack` and `dealloc_pack`

`pack_set_element_addr` and `pack_get_element_addr`

`tuple_pack_element_addr`