Starting to implement Variadic Generics

technicated · March 3, 2019, 1:33pm

Let's say that, for fun, I want to start toying around implementing this feature, more or less in line to what is written in the Generics Manifesto.
What are, roughly, the steps that I would need to follow? What needs to be implemented (new AST nodes, new lowering features, new function application rules, etc)?

I obviously understand that this is a very challenging feature to implement, and one that might impact many areas of the language, but recently I've seen this topic coming out multiple times (like for example here, here and here), together with the "extending non-nominal types" feature.

Any kind of help / guidance would be much appreciated!

technicated · March 3, 2019, 1:36pm

@Joe_Groff may I ask you who is the Swift team member "expert" on the "generics stuff"?

suyashsrijan · March 3, 2019, 1:36pm

cc @Douglas_Gregor

Joe_Groff · March 3, 2019, 3:47pm

For something like variadic generics, I think getting the design planned out ahead of time is important, because it could significantly affect the implementation model. @Douglas_Gregor or @xedin would probably be the best people to offer help if you want to prototype ideas.

Douglas_Gregor · March 4, 2019, 6:50am

It's definitely a challenging feature! That said, there are numerous incremental steps along the way, and they should build on each other without requiring too many big leaps.

I agree with @Joe_Groff's point about having a design mapped out in advance, to some extent: the generics manifesto's description of variadic generics is fairly terse, and is in many ways a direct port of C++ variadic templates into Swift. Abstractly, the key ideas there are probably right: having the notion of "parameter packs" (for generic parameters, function parameters) that capture zero or more parameters with roughly the same shape and having "pack expansions" that replicate a particular pattern (whether it's a type, expression, or similar) for each element in the parameter pack. Syntax can be discussed at length and various features are up for debate (e.g., what to do about multiple parameter packs in a pack expansion, how inference works with parameter packs, etc.), but likely won't have a large effect on the initial stages of implementation.

Here's how I'd start on an implementation.

First, prepare the frontend: there are many places in the frontend where we assume that the only kind of generic parameter is a type parameter. With variadic templates, there will be a new kind of generic parameter: a parameter pack. For example, a GenericParamList contains a sequence of GenericTypeParamDecl * nodes. You'll want to abstract that that GenericParamList contains (e.g.) a GenericParamDecl node that effectively wraps up the GenericParamDecl * and can be extended later with a new case for parameter packs. It's easier to describe what I means in terms of Swift, which can model data structures so much better. It's like going from a GenericParamList containing an Array<GenericTypeParamDecl> to containing an Array<GenericParamDecl>, defined like this:

enum GenericParamDecl {
  case typeParameter(GenericTypeParamDecl)
}

C++ enums aren't very expressive, but you can write a C++ class that wraps an enum and handles the data storage. Now, when you do this, you'll have to do some fairly-mechanical updates to a bunch of places in the frontend to make them switch over the different kinds of generic parameters. If we could write Swift, that would mean things like this:

switch genericParam {
  case .typeParameter(let typeParam):
    // existing code that works on typeParam
}

Of course it'll be uglier because it's C++, but the idea is the same: by doing this everywhere, we set ourselves up for an easier time when we add a new case into GenericParamDecl. When you're doing this, you'll likely want to factor more functionality into the new GenericParamDecl type, so it has an API that should work for different kinds of generic parameters that come along (packs, for variadic templates, but maybe even value parameters in the future).

You'll likely find other places in the AST that need similar treatment, e.g., SubstitutionMap and BoundGenericType will need some way to describe a generic argument (e.g., a GenericArgument that contains just a Type today, but will eventually contain a list of GenericArguments, a pack expansion, etc.). Pick off just one part of the AST at a time, and we can get PRs reviewed and merged---this is infrastructure work that's general goodness and doesn't need to wait for a proposal.

Once the frontend is prepared, it's time to start working on the representation of variadic generics: add the new case for a parameter pack to GenericParamDecl:

  case typeParameterPack(GenericTypeParamDecl)

And add an isParameterPack bit and ellipsis location to GenericTypeParamDecl. The parser can set up this state, but the most significant amount of work here will be in updating all of those switch statements. It's okay to stub out ones that can't meaningfully be implemented now with llvm_unreachable("unimplemented variadic generics"), so you'll get a nice compiler crash and can find all of the places later that need updating.

At this point you can write something like:

struct Tuple<Types...> { }

but not do anything with it. Then work on something like:

Tuple<Int, String, Double>

Here, you'll want to extend your GenericArgument to include the array of generic arguments ( Int, String, Double) that are bound to the generic parameter pack Types.

This would be a good point to "lock down" any incorrect uses of parameter packs. For example, if inside the body of Tuple I were to write:

  var foo: Types

I should probably get an error that says "parameter pack Types has not been expanded; did you mean to expand it with ...?". You can do that with an AST walk that verifies that every use of a parameter pack is covered by an expansion, and will save you a lot of grief when users (or your tests) try to use a parameter pack in a place that you don't yet support.

Other features will follow: pack expansions in generic argument lists, in where clauses, etc. Each time you'll extend some structure with new cases and deal with those throughout the frontend. We'd want to scope out more of the design along the way, so we know (for example) every place in the language where we will be able to introduce a parameter pack (generic param lists, function/subscript param lists) and write a pack expansion (function calls, subscript argument lists, tuples, etc.), as well as getting a sense of any "add on" features we might want.

There is also some very interesting work on mapping this down to LLVM IR and designing the runtime metadata to support this feature, but I think it's good to tackle that further down the road.

Doug

technicated · March 4, 2019, 9:38am

That is super great advice!
I think I understand how I can start, but I have a question (for now).

When you mention the GenericParamDecl node to wrap the generics stuff do you actually mean a new entry to DeclNodes.def or it's just a separate class used in GenericParamList & friends?
In the former case, where in the node hierarchy should we put it? At the moment GenericTypeParamDecl is an AbstratcTypeParamDecl, that is a TypeDecl, but parameter packs (probably) and value parameters (of course) are not types.
I have this doubt because you mention the eventual factoring of functionalities in the new GenericParamDecl type so I am (wrongly?) assuming a type/class hierarchy may be involved.

Sorry for the errors but I'm writing this reply in a little bit of hurry!

Chris_Lattner3 · March 4, 2019, 4:41pm

Just out of curiosity, instead of making packs be a special new concept, have you considered making them be a generic type that is known to be a tuple? Just like we have "class constraints" on generic types, we could have "tuple constraints", and tuple constrained generic parameters could have accessors on them for getting the tuple length etc.

This could allow a more dynamic model for variadics that might fit better into the overarching swift design than the static C++ model.

-Chris

Chris_Lattner3 · March 4, 2019, 4:46pm

Basically, I'm thinking that it is very likely to be interesting over time to investigate making tuples be more first class: e.g. tuples conforming to protocols, the ExpressibleByTupleLiteral thread, allowing types to conform to a protocol that gives them participation with tuple patterns, etc.

If/when we do those sorts of thing, having variadics is essential, but instead of making variadics a heavy core language feature, maybe it is just light sugar for working with tuples.

In any case, I'm certainly not an expert on this stuff and this isn't exactly a baked plan :-) I'm just curious if you've thought about it.

-Chris

Joe_Groff · March 4, 2019, 6:38pm

In my thinking, we kinda already closed off that "variadics are just tuples" avenue once we committed to breaking up function arguments. That means that there are now at least two expression contexts we'd want to be able to "unpack" into. Nonetheless, we can probably limit the number of places you need variadic declarations compared to C++, since we ought to make it easy to pack into and unpack from tuple values.

One benefit of tuples preserving their built-in status is that we can maintain the "unlabeled single-element tuple types are equivalent to their element type" invariant (in other words, (T) and T are always the same type), which is important for the ergonomics of variadic returns or fields, since you don't want to have to unwrap them in the single element case.

Joe_Groff · March 4, 2019, 6:42pm

As a forum etiquette thing, I wonder whether it'd be a good idea to start pitch threads on this and the other "tuple literals" thread, so that these development threads can remain focused on implementation questions.

technicated · March 4, 2019, 6:57pm

Regarding this, in reality I initially thought (like Chris) that the variadic generics could be sugar for tuples, as I expressed in my other thread; but this is obvious subject to discussion and your point on (T) and T is a good reminder of the implications one should consider when relating new features to tuples.

I'm not sure if this is a reply to Chris's post or mine, but in any case I agree with you and would like to invite anyone interested to participate to my other thread for discussion / pitching so that this thread can "only" collect informations about implementation-level stuff

Joe_Groff · March 4, 2019, 6:59pm

It wasn't really directed at anyone in particular, I was mostly concerned about your own tolerance for design discussion.

technicated · March 4, 2019, 7:09pm

I'm sorry, I don't understand what you mean here - not saying this in a sarcastic way, I literally do not understand what you mean by "tolerance for design discussion".

Joe_Groff · March 4, 2019, 7:10pm

Maybe I'm being paranoid, but I was only worried that you had come in here asking for implementation guidance, and then we all started arguing about aspects of the design. I could see some people being put off by that.

technicated · March 4, 2019, 7:15pm

Ah, now I understand!
I'm personally not too concerned about this, but obviously for reference and discussion simplicity I'd like to keep the two aspects separated because like you say not everyone might be happy with things being mixed.

Chris_Lattner3 · March 6, 2019, 4:42am

Oh yes, very good point. Does this mean that variadic parameter packs will be "what tuples should have been"? Variadic parameter packs don't need to be varargs (in the current sense) so that is one bit of complexity they wouldn't have to eat. I think the only other things that are missing are the ownership modifiers and autoclosures.

Does ownership give us a new way to introduce inout elements to tuples? What will the fundamental difference be between a parameter pack and a tuple? We don't have partial specialization, so we can't use the C++ model for variadic templates, we need some sort of "dynamic" model.

-Chris

John_McCall · March 6, 2019, 5:01am

Random thoughts about representations:

I don't know if the runtime representation of a variadic pack of types should be an ArrayRef or a pointer to "Pascal-style" array or what, but I know it should not be flattened into the surrounding types.
That's when storing the pack in type metadata; as an argument to a function I think an ArrayRef is the way to go, especially if recursion is going to be important somehow.
If we want to do piecemeal destructuring of the value representation, and we use a tuple for the representation of a variadic pack of values, then we should promote pulling elements off the end rather than the beginning or else we'll have to move the elements in order to make the padding right.
That said, it might be more efficient to use an array of pointers to values.

technicated · March 19, 2019, 8:39am

@Douglas_Gregor @Chris_Lattner3
I was writing my document about Variadic Generics when I stumbled across Chris's post on the Introduce (static) callables thread. In the post, Chris mentions a long term goal of unifying nominal and structural types. If I'm correct, this desire is shared among multiple Swift users and other thread refers to this feature, too.

At the moment I'm thinking to represent Variadic Generics as tuples inside types and function bodies, much like variadic function arguments are passed as Arrays.
But if / when we have structural types for tuples and / or functions, because they will need Variadic Generics (I assume), they might look like this:

struct Tuple<T...> {
  // Tuple implementation
}

class Function<Args..., Result> {
  // Function implementation
}

But at this point it seems to me that we have a chicken-and-egg problem: Tuple definition requires Variadic Generics, but Variadic Generics are represented as tuples inside the type using them, so...

struct Tuple<T...> {
  // This is... a tuple?
  private var members: (T...)

  init(_ members: (T...)) {
    self.members = members
  }
}

So, looking at the (far) future, what can we do here? The only solution I can think of ATM is to actually not represent Variadic Generics as tuples, but as something else like a "parameter vector" or C++'s "parameter pack".

DevAndArtist · March 19, 2019, 10:52am

In case of Tuple<T...> I think we can rely on the stdlib internal capabilities and hide the inner storage from the user. For the end-user Tuple<T...> would conform to some kind of ExpressibleByTupleLiteral protocol to achieve the old tuple syntax. The inner storage tuple of Tuple struct does not need to be the swifts (T...) current tuple, which avoids the paradox.

technicated · March 19, 2019, 12:00pm

So you suggest that some magic could go on only with the Tuple type? Why not, this can be an acceptable trade-off.

This still leaves me with a doubt, let me clarify one thing just to be sure:

struct ZipSequence<Sequences... : Sequence> {
  // For what concerns `ZipSequence`, `Sequences` is a tuple of unknown arity (S1, S2, ..., Sn)
}
func zip<Sequences... : Sequence>() {
  // Also here, `Sequences` is a tuple of unknown arity (S1, S2, ..., Sn)
}

Simply by declaring a generic T to be variadic, I would like the type / function to see it as a tuple, and this is independent of any member declared to be of that type. So if we have this in the stdlib:

struct Tuple<T...> {
  // Here, the compiler would treat `T` as a tuple like it would do
  // for every other type, even if there are no explicit members and
  // magic storage is going on
}

part of the issue will still remain. Maybe I'm missing part of your reasoning?