The syntax for variadic generics

hborla · November 20, 2022, 8:09pm

Hello, Swift community,

It's time to bikeshed the syntax for variadic generics!

The currently proposed syntax for variadic generics uses ... for parameter pack declarations and pack expansions. This syntax choice follows precedent from C++ variadic templates and non-pack variadic parameters in Swift. However, there are some serious downsides of this choice, because ... is already a postfix unary operator in the Swift standard library that is commonly used across existing Swift code bases. You can find a description of the ambiguities introduced by using ... for variadic generics in the value and type parameter packs pitch.
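For context, here is a minimal sketch of how ... already appears as a range operator in existing Swift code. The variable names are illustrative only and are not taken from the pitch.

```swift
// `...` as it already exists in the standard library.
let numbers = [10, 20, 30, 40, 50]

// Postfix `...`: a one-sided range starting at a lower bound (PartialRangeFrom).
let fromTwo = 2...
let tail = numbers[fromTwo]

// Prefix `...`: a one-sided range up to and including an upper bound (PartialRangeThrough).
let head = numbers[...1]

// Infix `...`: a closed range.
let middle = numbers[1...3]

print(Array(tail), Array(head), Array(middle))
```

Any pack expansion spelled with postfix ... has to be resolved alongside these existing overloads.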

As I get further into the type checker implementation, I'm also becoming concerned about the impact of resolving the pack expansion operator amongst the existing ... overloads. This won't impact existing code, since code needs to be written in a variadic generic context in order to use the expansion operator, but this will have an impact on compile-time performance and diagnostics behavior in variadic generic code.

A few syntax alternatives are outlined in the variadic generics vision document. Personally, I'm most amenable to the postfix * alternative, especially because * is not a postfix unary operator in the standard library, but this choice doesn't fundamentally solve the issue with using operators for variadic generics. Previous discussion threads have also surfaced the idea to have a more verbose "pack element binding" syntax:

Pitching The Start of Variadic Generics

I could imagine us introducing, for instance, a new type of generic constraint that would allow us to supply a name for each element in a type sequence. Something like:
func allPresent<Haystacks..., Needles...>(haystacks: Haystacks..., needles: Needles...) -> Bool
  where forEach(Haystack in Haystacks, Needle in Needles, Haystack == Set<Needle>)

I personally find this syntax harder to understand; it's harder for me to read a for-in-style construct as producing a new, flat list through pack expansion, and requiring this syntax in generic signatures seems like it'd be a very long-winded way of expressing something that's fairly straightforward (e.g. zipping together two type packs). In other words, I think this is overkill for most variadic generic code, though I could see us exploring such a syntax later on to enable multi-dimensional packs.

Thoughts?

-Holly

filip-sakel · November 20, 2022, 9:42pm

I don't think declaring parameter packs will be as common as other generic types, so <pack Haystacks> seems a lot clearer than <Haystacks...> without adding a lot of verbosity. Unpacking, though, will probably be a lot more common, such as in methods of variadic types, so I'm not sure what the best syntax is. One would naturally consider for haystack in unpack haystacks {} but that will quickly get annoying while manipulating parameter packs. For example, if I forget to write unpack and see the compiler complain about it, I will have to go back to the beginning of "haystacks" to add the keyword, whereas the ellipses can easily be added at the end. Have we considered unpacking implicitly?

hborla · November 20, 2022, 10:24pm

Yes, and it really doesn't work because there's a fundamental ambiguity between element-wise repetition and repeating the entire pack. This was explored through modeling packs as tuples, but you'd get the same fundamental issue with implicit expansion using a distinct pack concept too. The following sections from the parameter packs pitch and the variadic generics vision document explain the issue in more detail:

I'm not too worried about an explicit syntax being annoying - this is an advanced feature that will be used relatively infrequently compared to all other generics features. I care most about the syntax encouraging the right mental model for thinking about parameter packs.

FWIW, using a keyword such as unpack or expand for pack expansion has the same sorts of issues as using an operator because the operand must be parenthesized for patterns involving nested or multiple pack references, making the syntax in expression context no different than a function call.

xwu · November 20, 2022, 11:05pm

I'm not sure I ever understood based on the vision document why declaring a pack and expanding it has to use the same operator, as envisioned. Years ago, in earlier iterations of the variadic generics discussion, these did not have to be (I believe in one iteration prefix versus postfix ellipses were used).

As we've discussed in the past I've been a fan of the postfix * operator due to its parallels with the regex syntax we now support in the language, where * also means zero or more of something exactly as it does for a pack. (On a cursory review, it seems Scala uses postfix * for variadic arguments, so there's some other precedent for that, and many of us are no doubt familiar with Python's prefix * syntax.)
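As a small illustration of the regex parallel, here is a sketch of * meaning "zero or more" in a Swift regex. It assumes Swift 5.7 or later (and macOS 13 or later on Apple platforms); the pattern and variable names are made up for this example.

```swift
// `a*` matches zero or more "a" characters, followed here by a single "b".
let zeroOrMoreA = try! Regex("a*b")

// Zero occurrences of "a" still match.
let matchesNone = "b".wholeMatch(of: zeroOrMoreA) != nil

// Several occurrences match as well.
let matchesMany = "aaab".wholeMatch(of: zeroOrMoreA) != nil

// A string that does not end in "b" is rejected.
let rejectsOther = "ac".wholeMatch(of: zeroOrMoreA) == nil
```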

The point about the operator's subtlety is well taken; I think if we want a little more verbosity, using a "wordy" pack declaration syntax seems to work nicely. As a bonus, I think a syntax such as the following is naturally conducive to sticking with the singular throughout without too much awkwardness:

func zip<variadic T, variadic U>(_ first: T*, _ second: U*) -> ((T, U)*) {
  return ((first, second)*)
}

(Incidentally, the semantics of * in HTML table sizing suggests a natural semantics for allowing unlabeled (_: T*, _: U*) as above, where the number of "leftover" parameters would be automatically evenly divided amongst T* and U* after other parameters are accounted for.)

filip-sakel · November 20, 2022, 11:59pm

I haven’t really worked with regex so when I see the asterisk my mind jumps to pointers. I’d assume it’s Swift’s weird take on C pointers, rather than an unpacking operator.

ebg · November 21, 2022, 1:55pm

* has so much prior art as a pointer it would be quite confusing, and additionally takes one of the few remaining unclaimed symbols left. I don't know if this is possible, but can we automatically bridge non-pack variadic parameters into packs, and use ...?

mtsrodrigues · November 21, 2022, 2:29pm

That's good. I was already pro * because of the regex reason, and now even more so seeing that it's used in a similar way in other languages. I don't think we should let the prior use of * for pointers in other languages prevent us from using it, because that's not something that makes sense for most Swift programmers.

Nickolas_Pohilets · November 21, 2022, 4:35pm

For a tuple of length n, the complexity of converting a tuple value to a pack is O(n).

That's disappointing. What is the difference between the ABI representations of a tuple and a pack? Is it possible to keep them the same?

hborla · November 21, 2022, 4:47pm

The discussion on the ABI for variadic generics is over here:

Slava_Pestov · November 21, 2022, 6:23pm

Two reasons:

1. A tuple value stores its elements contiguously in memory; the offset of each field is computed either at compile time or run time from the layouts of the element types. A value pack is not a value in and of itself, but is passed as a list of pointers to values. So the tuple expansion operator takes the address of each element of the tuple and forms a pack from these addresses.
2. The runtime metadata for a tuple type stores a list of element/label pairs, where each element is a pointer to metadata and the label is a string. A type pack is just a list of pointers to metadata. So again, we have to iterate over the tuple elements because we need to extract the element types and skip over the labels.

The tradeoff with 1) is that forming a pack from individual values is cheaper than if a value pack stored its elements contiguously, like a tuple. We expect this to be more common than a tuple expansion. If value packs stored their elements contiguously, forming a value pack from individual values would instead require copying values around.

My assumption is that tuples and value packs aren't going to be huge in practice, so this might not be a big problem. If the tuple has a fixed size on the caller's side, we already have various optimizations that split tuples up into their constituent fields, so forming the pack might not always require O(n) operations.
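The tuple side of this can be observed in today's Swift. The sketch below only illustrates that a tuple is one contiguous value whose field offsets follow from the element layouts; it does not model value packs, and the constant names are illustrative.

```swift
// Two 4-byte fields stored back to back in a single value.
let pairSize = MemoryLayout<(Int32, Int32)>.size

// Each element contributes its own size; alignment may add padding between fields.
let mixedSize = MemoryLayout<(Int8, Int64)>.size
let elementSizes = MemoryLayout<Int8>.size + MemoryLayout<Int64>.size

print(pairSize, mixedSize, elementSizes)
```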

Slava_Pestov · November 21, 2022, 6:27pm

It doesn't; there are in fact three operators which could in theory use different syntaxes:

1. Declaring a type pack in a generic parameter list, <T...>
2. Expanding a type pack T in a type context, like a function parameter type: args: Array<T>...
3. Expanding a value pack x in expression context, like an argument to a call: forward(args: foo(x)...)

However I suspect that at least 2 and 3 should use the same syntax to avoid confusion, especially since types can appear in expression context, like Array<T>.self... to form a value pack of metatypes by expanding a type pack T.

Tuple expansion also makes sense with and without an immediately following ..., so I think overloading ... to mean tuple expansion might not be the best idea. Eg, suppose foo(x:y:) takes two pack parameters, and in the caller's scope, myTuple is a tuple and t is a value pack:

foo(x: t, y: myTuple.expand)... forms a value pack by applying foo() to each element of t and myTuple pairwise
foo(x: t, y: myTuple.expand...)... forms a value pack by applying foo() to each element of t in turn, together with all elements of myTuple

Zollerboy1 · November 21, 2022, 10:12pm

I'm a bit confused right now. Aren't variadic generic functions/types (and thus parameter packs) a feature that only the compiler sees, because it then specializes them to normal non-generic functions/types? Why would there be runtime metadata for parameter packs?

hborla · November 21, 2022, 10:15pm

I suggest moving this discussion over to the dedicated thread on variadic generics ABI to keep this thread focused on syntax. Thanks!

Short answer

No, specialization of generic code is just an optimization in Swift; generally, generic code is separately compiled, and substitution happens at runtime.

hborla · November 21, 2022, 10:56pm

I agree; I think it's important for pack expansion to be expressed in the same way at the type and value level.

Using a keyword for pack declarations also might help indicate that * means something different than a pointer type in Swift for those familiar with other languages where * means pointers. That said, the pack declaration keyword wouldn't always be directly visible, e.g. if you're in an extension over a variadic generic type:

struct Container<pack Value> { ... }

// in another file
extension Container {
  func getTuple() -> (Value*) { ... }
}

The problem isn't non-pack variadic parameters using .... The issue is that ... is used as a postfix and infix range operator, and attempting to migrate this family of operators would be a big undertaking that comes with a large source break:

jeremyabannister · November 23, 2022, 6:46pm

If we end up going with a keyword I think I like @xwu’s idea of variadic Element more than pack Element. I think it sounds more grammatically fluent/natural when I read it out loud/in my head while also aligning with the proper mental model (as far as I understand it) as well or better, and more importantly it’s the ideal search term for inexperienced readers of the code looking to understand it. It seems to me like variadic covers the needs of both the experienced and the inexperienced variadic generics programmer better than pack. Just for the sake of putting ideas on the table, zeroOrMore could be argued to be the most instantly/broadly meaningful option, but so far I quite prefer variadic.

jeremyabannister · November 23, 2022, 6:47pm

This reminded me of a problem that we already have, and then I realized that maybe the two share a solution. I'll describe the possible solution here since it was directly inspired by this but if you consider it off-topic I'm happy to move the discussion elsewhere. This solution is a breaking change which would obviously need to be introduced in Swift 6, but it may be considered too radical/breaking even for that, or just not a good idea in the first place - I don't know.

The problem that we already have, unrelated to variadic generics

The problem we already have is that when I extend a generic type or a PAT I don’t get any help from code completion for the generic parameters/associated types until I have correctly typed at least the first letter, which makes discoverability poor. I often have to jump to the definition of a generic type to remind myself of the names of its generic parameters, and jump to definition doesn't always work in a complex and modular code base, so in my experience this problem is not entirely insignificant.

The solution to this current problem which leaves us just one small step from solving the problem Holly pointed out

The possible solution to this problem that occurred to me would be that we introduce and require a new syntax for declaring extensions, in which the “extension signature” (do we call it that?) is analogous to a case pattern, and a generic type or PAT is analogous to an enum case with associated values.

It's fine to extend the bare type without binding any of the generic parameters/associated types**, but it means that you can't reference them within the body of the extension:

extension Array {
    func mutated (by mutation: (inout Self)->()) -> Self {
        var copy = self
        mutation(&copy)
        return copy
    }
}

** Is there a shorter way to say "generic parameters/associated types"? Can we come up with one? I don't think "subtypes" is correct/available to take on that meaning.

If you want to reference the "subtypes" then you have to bind them (similar to having to bind the associated values of an enum case in order to use them when pattern matching using case). Autocomplete could make this generic type pattern matching very easy because after you type extension Arr it offers an autocompletion to extension Array<Element>. Using this syntax you could enable references to the "subtypes" in the extension:

extension Array<Element> {
    var first: Element? {
        guard self.count > 0 else { return nil }
        return self[0]
    }
}

If you want to extend an array of Int then you'd write:

extension Array<Element == Int> {
   
}

If you want to extend Array where the Element conforms to CustomStringConvertible then you'd write:

extension Array<Element: CustomStringConvertible> {

}

The final small evolution of the above solution that would address the issue Holly raised

And lastly, if you wanted to extend a generic type with a variadic generic parameter you would have to write:

extension Container <variadic Value> {
    func getTuple() -> (Value*) { ... }
}

Caret too tiny; didn't expand (ctt;de): Below is the syntax that would theoretically solve the issue @hborla raised:

extension Container <variadic Value> {
    func getTuple() -> (Value*) { ... }
}

Slava_Pestov · November 23, 2022, 7:33pm

This doesn't directly address your issue, but FWIW in a protocol or protocol extension, associated types are actually member types of Self:

extension Sequence {
  var first: Element { ... }

  // equivalent to:
  var first: Self.Element { ... }
}
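A runnable sketch of the same point; the member names firstOrNil and firstOrNilExplicit are hypothetical and chosen to avoid clashing with existing members:

```swift
extension Sequence {
    // `Element` here is shorthand for the member type `Self.Element`.
    func firstOrNil() -> Element? {
        var iterator = makeIterator()
        return iterator.next()
    }

    // The same return type, spelled out explicitly.
    func firstOrNilExplicit() -> Self.Element? {
        return firstOrNil()
    }
}
```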

What if the extended type has generic requirements, would you have to re-state those too? Like an extension of Set, whose Element is Hashable.

They're called type parameters. A type parameter is either a generic parameter, or a member type of another type parameter, recursively.

Wouldn't this just be the following then?

extension Array<Element> where Element == Int
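For reference, the where-clause spelling is what works in Swift today without binding a name in angle brackets. A minimal sketch, with a hypothetical total property:

```swift
// Constrained extension: available only when Element is exactly Int.
extension Array where Element == Int {
    var total: Int {
        return reduce(0, +)
    }
}

print([1, 2, 3].total)
```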

beccadax · November 23, 2022, 8:34pm

I know I've made this point in Language Workgroup meetings, but perhaps not fully in public before:

One of the first prototypes of the regex builder DSL used postfix operators like * instead of named quantifiers like ZeroOrMore, based on its use in the regex literal syntax. This resulted in code samples that looked something like this:

// Adapted from WWDC22 "What's new in Swift" -- not the shipping syntax!
let regex = Regex {
    CharacterClass.horizontalWhitespace*
    Capture(CharacterClass.noneOf("<#")+?)??
    CharacterClass.horizontalWhitespace*
    "<"
    Capture(CharacterClass.noneOf(">#")+)
    ">"
    CharacterClass.horizontalWhitespace*
    ChoiceOf {
        "#"
        Anchor.endOfSubjectBeforeNewline
    }
}

When we looked at code samples like this, we noticed that single punctuation characters in postfix position, like * and +, were easily lost in the clutter of identifiers and parentheses. A single incorrect or missing quantifier can dramatically change the behavior of a line in a builder, so rendering these behaviors near-invisible in the syntax seemed like a bad design, and we chose a different direction.
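For comparison, here is a small sketch using named quantifiers with RegexBuilder. It is not the sample above rewritten; the pattern is a made-up example, and it assumes Swift 5.7 or later (macOS 13 or later on Apple platforms).

```swift
import RegexBuilder

// Named quantifiers stand out more than a trailing `*` or `+`.
let numberAfterSpaces = Regex {
    ZeroOrMore(.whitespace)
    Capture {
        OneOrMore(.digit)
    }
}

// The capture is element 1 of the match output; element 0 is the whole match.
let capturedDigits = "   42".wholeMatch(of: numberAfterSpaces).map { String($0.1) }
```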

I think the same logic likely applies to using postfix * for variadic generics. The invisibility of postfix * is kind of a problem when all of these would be valid and would mean slightly different things:

printPack(tuple, pack*)             // Concatenating tuple with pack
printPack(tuple, pack)*             // Expanding tuple with pack
printPack(tuple.element*, pack*)    // Concatenating tuple element pack with pack
printPack(tuple.element, pack)*     // Expanding tuple element pack with pack

It would get worse in more complicated examples where the expansion was buried in a subexpression.

hborla:

[Pitch] Parameter Packs

Now, this conflict would be resolved if we allowed the use site to bind a different name to the pack element. I can think of a couple different syntaxes for that:

for A in Arguments { A? }

each A in Arguments { A? }

Arguments.map { A in A? }

I don't love any of them, though. And in context, I worry that people won't understand that this is an expansion, so maybe they also need a trailing ..., which just makes the syntax worse. They're also a little awkward when it comes to expanding multiple packs in parallel — the first two, I can imagine something with parens, but for .map I really don't know.

I personally find this syntax harder to understand; it's harder for me to read a for-in-style construct as producing a new, flat list through pack expansion, and requiring this syntax in generic signatures seems like it'd be a very long-winded way of expressing something that's fairly straightforward (e.g. zipping together two type packs). In other words, I think this is overkill for most variadic generic code, though I could see us exploring such a syntax later on to enable multi-dimensional packs.

One thing I'll say for the map-style approach in particular is that it makes the section of code covered by the expansion very clear:

printPack(tuple, pack.map { $0 })                         // Concatenating tuple with pack
pack.map { printPack(tuple, $0) }                         // Expanding tuple with pack
printPack(tuple.element.map { $0 }, pack.map { $0 })      // Concatenating tuple element pack with pack
zip(tuple.element, pack).map { printPack($0.0, $0.1) }    // Expanding tuple element pack with pack

Outside of expressions, though, I agree that it's not a great fit.

I don't have a single set of recommendations at this stage, but here are some things I'm thinking about:

struct VariadicZip<Collections: many Collection>: Collection {
    var underlying: each Collections

    typealias Index = (each Collections.Index)
    typealias Element = (each Collections.Element)

    subscript(i: Index) -> Element { (each underlying[i.element]) }

    var startIndex: Index { ((each underlying).startIndex) }
    var endIndex: Index { ((each underlying).endIndex) }

    func formIndex(after index: inout Index) {
        for (c, inout i) in each (underlying, index.element) {
            c.formIndex(after: &i)
        }
    }
}

(Pardon any expression syntax mistakes—I'm still struggling a little with the scope of non-map-style keywords, particularly in expressions like the one in the subscript.)

Why?

many and each get across similar ideas to variadic/pack and .../expand without using jargon or overloaded symbols. I have to admit that the many/any rhyme is kind of pleasing too.
many is after the colon, not before the argument, because the fact that a generic argument is variadic feels type-y to me. (many would be allowed as a standalone keyword in these positions, short for many Any.)
Why two different keywords? I like something like each in expression context where there's literal iteration happening, and it feels strange to have either many or each used in opposing ways in a generic signature (labeling constraint) vs. a concrete type (labeling generic parameter).

An alternative that would avoid this last problem is:

struct VariadicZip<each Collections: Collection>: Collection {
    var underlying: each Collections
    // ...as before...

By moving the keyword before the generic parameter, it's no longer labeling the constraint in generic context, so it no longer has an opposing meaning in those two positions.

And, of course, I'm still thinking about map-style syntax. This seems way clearer than a version with an each keyword, or a ... or * suffix, or whatever else:

    subscript(i: Index) -> Element {
        (zip(underlying, i.element).map { $0.0[$0.1] })
    }

hborla · November 23, 2022, 9:40pm

A new keyword in expression context would break existing code that uses that keyword name, e.g. as the name of a function.

I just remembered that we conveniently already have a keyword that fits with the concept of pack expansion that would not break existing code. I've been hammering the concept of "repetition patterns" for variadic generics, so... we could potentially use the repeat keyword, which already cannot be used as a regular identifier in expressions.

func zip<variadic T, variadic U>(t: repeat T, u: repeat U) -> (repeat (T, U)) {
  return (repeat (t, u))
}
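For comparison, the spelling that later shipped in Swift 5.9 declares packs with each and expands them with repeat each. The sketch below assumes a Swift 5.9 or later toolchain; zipPacks is a name made up here to avoid clashing with the standard library's zip.

```swift
// `each T` declares a type parameter pack; `repeat` marks a pack expansion.
func zipPacks<each T, each U>(t: repeat each T, u: repeat each U) -> (repeat (each T, each U)) {
    // Pairs up the elements of the two value packs position by position.
    return (repeat (each t, each u))
}

// Both packs must have the same length; here each has two elements.
let zipped = zipPacks(t: 1, "a", u: true, 2.5)
```

The result is a tuple of pairs, one pair per position in the packs.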

jeremyabannister · November 23, 2022, 9:53pm

No, only the name. I think of it as similar to this:

enum CandleUpgrade {
    case multipleWicks (wickCount: Int)
}

func demo1 (upgrade: CandleUpgrade) -> String {
    switch upgrade {
    case .multipleWicks (let wickCount):
        return """
            Here, we don't have to restate that wickCount is an Int,
            but we do have to write the name of the associated value
            if we want to use it in the body of this switch case.
        """
    }
}

func demo2 (upgrade: CandleUpgrade) -> String {
    switch upgrade {
    case .multipleWicks (wickCount: 3):
        return """
            Here, we have added an additional specification, letting us
            know that in this context wickCount is not just an Int
            but more specifically the integer 3.
        """
    default:
        // Needed for exhaustiveness: every other wick count lands here.
        return "Some other wick count."
    }
}

extension Set<Element> {
    // Element is known to be Hashable
    // just like wickCount was known to be an Int
}

extension Set<Element == Int> {
    // Element is known to not only be Hashable
    // but more specifically an Int, just like in demo2
    // where wickCount was bound with extra specificity
}

Yes, but if we were to force people to declare extensions in this somewhat new way starting in Swift 6 then perhaps allowing my slightly more concise syntax would be an important way to ease the transition. But it's true that the extension Set<Element == Int> syntax is technically orthogonal to my suggestion/idea and could be considered separately.