[Pitch 2] Light-weight same-type requirement syntax

Quoting myself yet again:

I feel like there’s a significant disconnect between what people are asking for and what the Core Team is hearing. People are asking for a language-level abstraction over writing out a bunch of (e.g.) ConvertibleFrom* types. They’re not asking for types with multiple Selfs, and I don’t think there has been a thorough argument that proves that would indeed be necessary to achieve what they’re asking for.

People aren’t asking for a new kind of associated type. They are asking for a factory for protocols. In other words, they’re asking for a type constructor T -> U -> V. To substitute real names, Int -> ConstructibleFrom<T> -> ConstructibleFrom<Int>. ConstructibleFrom<Int>.Self has a dependency on ConstructibleFrom<T>, and thus transitively on T.

5 Likes

The Self type of a protocol is a parameter to the conformance predicate. Generic protocols introduce additional parameters. That is why we talk about them as multi-parameter protocols.

The fact that the parameters would be written differently is an artificial distinction with significant consequences. Rust having an unnecessary distinction between From and Into purely to allow a contextually-typed into() member requirement to be added to the second parameter of From is a fine example of that.

When people talk about functional dependencies, they mean opaque types in the requirements having functional dependencies on other opaque types, which can then be thought of as primary opaque types / parameters. This is highly valuable for most protocols, which do not need more than one parameter, but conversions are a well-known exception. In fact, conversions are the only exception that I've ever seen, across many languages that have explored this space; if there's a pool of amazing things other than conversions waiting to be unlocked by this capability, well, it's well-hidden.

Regardless, my argument is not that we should never support functionality like this. I can certainly see the value in allowing abstraction over convertibility. I do, however, think it's quite a bit more specific to convertibility than you're admitting. More importantly, I think it would be bad to use this specific syntax for such a narrow purpose, and there are other ways to express these requirements.

It's also worth pointing out that the soundness problems with multi-parameter protocols that I alluded to come up immediately with conversions. (Again, this is very well-studied in other languages.) Abstractions over convertibility provide most of their value by deriving conversions on aggregates (optionals, arrays, etc.) from conversions on components. Those derivation rules have an irritating tendency to be overlapping and potentially unsound, and libraries often run into unanticipated limits. (For example, you cannot generally provide a transitive derivation rule; users will need to manually chain conversions.)
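As a concrete illustration of the aggregate derivation being described, here is a rough sketch in today's Swift using a fixed single-parameter protocol (all names are invented for illustration):

protocol ConvertibleFromInt {
  init(converting value: Int)
}

// Deriving a conversion on an aggregate from a conversion on its element:
// [Int] -> [Element] whenever Int -> Element is available.
extension Array where Element: ConvertibleFromInt {
  init(converting values: [Int]) {
    self = values.map(Element.init(converting:))
  }
}

Even in this narrow form, adding a second derivation (say, through Optional) or a transitive chaining rule is where the overlap and soundness issues mentioned above start to appear.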

4 Likes

Honestly, in another context, I would hardly associate such a spelling with a generic protocol, and I still have no intuition about how exactly this should work.
I don't think it makes sense to use the obvious syntax for something different and then come up with something else later.

Wait… exactly that was presented as a major goal of the pitch, wasn't it??

Guessing what people actually want is a tough business ;-), so I consider arguments based on that to be very weak. However, I can say that this is exactly what I'd be asking for:
A more elegant way to declare ProtocolXForInt, ProtocolXForString, ProtocolXForFoo… and imo protocol ProtocolX<T> would nail it.

Sidenote: even if it has already been decided that generic protocols will never be added, I don't think the concept of primary associated types as presented has a good tradeoff (see [Pitch 2] Light-weight same-type requirement syntax - #100 by Karl).

2 Likes

It's also worth noting that Swift generics are already undecidable as specified due to the combination of same-type requirements in protocols and recursive conformance constraints. The new formulation in terms of a rewrite system carves out a decidable subset of the language by imposing a limit on the total size of the rewrite system constructed by the completion procedure, but it's hard to get an intuitive understanding of what this actually means in terms of the constraints the user is able to write. Adding more "advanced" features such as MPTCs and generic associated types will further complicate the theoretical and intuitive model of Swift generics in a way that is probably not desirable.

7 Likes

I don’t think that applies to the kinds of protocol factories people are asking for. Wouldn’t that be more like a function which took all the non-Self parameters and produced a protocol which accepted Self as a parameter?

Quoting myself from upthread:

Perhaps not coincidentally, collision handling is also a common illustration of the power of multi-methods. There’s probably an analogy hiding in here, not just between multi-methods and multi-Self protocols but in specializations of both which language designers have taken as practical optimizations.

I would find it very strange not to use angle bracket syntax for such a feature. But I believe you made a point earlier that associated types are not necessarily the dual of type parameters, so perhaps there’s room for both types of parameters in the angle bracket syntax.

2 Likes

The "protocol factory" is a different way of looking at the same amount of expressivity. In the same way that you can go from a "curried" representation of a function that takes its first argument, then produces a function that takes the second argument like Int -> String -> Float, to a function that takes its arguments all at once like (Int, String) -> Float, you can look at a protocol factory as a function that you pass the generic arguments into to get a concrete protocol, which you then pass in a conforming Self type to get to a specific conformance. This is ultimately as expressive as having a model where all of the arguments are provided to the protocol at once.

11 Likes

This is effectively just currying the conformance predicate. Can you explain why you think that this is a fundamental difference?

It’s the difference between (T x Self) -> Conformance and (T, Self) -> Conformance. In the first, T and Self are peers; in the second, T is “superior” to Self because it can be curried out.

I bring this up because so much of the pushback against generic protocols has been about the “unprivileged” nature of Self. It just doesn’t seem “unprivileged” to me when seen through this lens. Within a protocol definition, Self is known to refer to the conformance identified by the type argument(s) as much as the conformance identified by what precedes the colon.

I understand the argument is deeper than that, especially with respect to associated types. I am not discounting that argument, because I don’t yet completely understand it. But that’s not where the initial opposition seemed to come from.

I'm not sure that's true. The primary place you "apply" protocol requirements in Swift is in generic constraints. If we had multi-parameter type classes, I would think we'd have syntax that lets you constrain any or all arguments no matter what surface syntax you use; you would be able to write both <T> where Foo(T, Int) and where Foo(Int, T). Since we allow protocol constraints to be applied directly to type arguments, the generic protocol syntax adds a bit of sugar in one direction: you could write just <T: Foo<Int>>, but would have to write out <T> where Int: Foo<T>. Still, you can "curry" in either direction.

But Swift doesn’t have multi-parameter typeclasses. So why does this syntax factor into the discussion? The fact of the matter is that every Swift protocol conformance has a well-known Self type, and I’m not sure anyone has suggested changing that fact except as part of a potential generalization of the more restricted feature people are actually requesting.

In other words, could we not keep the expressive power of <T> where T: Foo<Int> to that which can be expressed today by <T> where T: FooFromInt? Or is there something absolutely fundamental to how associated types would behave in such a regime that necessitates bringing in the whole multi-parameter typeclass idea?

If you imposed some heavy restrictions on the "generic protocol" syntax, such as prohibiting generic protocols from declaring associated types and insisting that the generic parameter is always instantiated with a concrete type (so FooProtocol<Int> would be okay, but not FooProtocol<T>), then it could probably desugar to something equivalent to today's system of non-generic protocols. As soon as you allow slightly more generality, you open a Pandora's box of type-level computation and implementation complexity.
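To make that restricted desugaring concrete, here is a rough sketch under those assumptions (every name is invented for illustration). A hypothetical ConvertibleFrom<T>, limited to concrete arguments and no associated types, could lower to one ordinary protocol per instantiation:

// stands in for ConvertibleFrom<Int>
protocol ConvertibleFromInt {
  init(converting value: Int)
}

// stands in for ConvertibleFrom<String>
protocol ConvertibleFromString {
  init(converting value: String)
}

// A concrete type can then adopt several instantiations, exactly as it
// could adopt several hand-written protocols today.
extension Double: ConvertibleFromInt, ConvertibleFromString {
  init(converting value: Int) { self.init(value) }
  init(converting value: String) { self = Double(value) ?? .nan }
}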

OK, I think this is where the knowledge gap lives. I know it’s been referenced previously, but I don’t think it’s been conclusively illustrated how associated types turn a relatively simple request (shorthand for generating protocols) into something too complex to implement.

It's no longer quite so simple once you allow the generic protocol to be parametrized by another type parameter (Foo<T>) and not just a concrete type (Foo<Int>). But even the latter interpretation runs afoul of some pretty fundamental assumptions in the generics system.

For better or worse, today if a type parameter T conforms to two protocols P and Q that both define an associated type named A, we introduce an implicit same-type requirement T.[P]A == T.[Q]A, with the reasoning being that a concrete type that conforms to both P and Q will use the same type witness for both instances of A. This no longer makes sense if P and Q are two instantiations of a "generic protocol" where you want A to depend on the type parameter of the protocol, for example this becomes ambiguous:

protocol Foo<T> {
  associatedtype A where A == Array<T>
}

struct Bar : Foo<Int> {}
extension Bar : Foo<Double> {}

What is Bar.A?
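For contrast, here is a quick sketch of how today's rule plays out with two ordinary protocols (names invented for illustration):

protocol P { associatedtype A }
protocol Q { associatedtype A }

// For a type parameter constrained to both, the implicit requirement
// T.[P]A == T.[Q]A means T.A names a single type.
func f<T: P & Q>(_ value: T) {
  let _: T.A.Type = T.A.self
}

// A conforming concrete type supplies one witness that satisfies both.
struct S: P, Q {
  typealias A = Int
}

Once A is meant to depend on the protocol's type parameter, that unification is exactly what breaks down.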

3 Likes

Isn’t this one of the uses C++ has for the :: operator? You can envision a system in which Bar.A cannot be looked up, and one must do (Bar as some Foo<Int>).A. This also extends to generic functions:

func superGeneric<T>(_ arg: Foo<T>) -> Foo<T>.A {
  return arg.A()
}

superGeneric(Bar()) // error: cannot infer return type Foo<T>.A
superGeneric(Bar()) as [Int] // ok, assumes T == Int
superGeneric(Bar() as some Foo<Int>) // ok, same as above
superGeneric(Bar()) as [String] // error: cannot convert expression Bar to type Foo<String>

I'm not sure what you mean here. Tuples are the way we write product types in Swift; there's no missing (T x Self) thing that's radically different.

Swift doesn't support the higher-rank polymorphism of constraints (!) that would let you actually curry this out. Generic code that uses your ExpressibleWith<X> protocol as a constraint will always fully apply the predicate, e.g. where T: ExpressibleWith<U>.

Now, Swift does let you "flip the quantifier" on a constraint to get an existential type. So if you "curry" to get a single-parameter constraint, you can make the existential type ExpressibleWith<U> from that. That happens to be useless because the protocol has no instance requirements, but the existential metatype ExpressibleWith<U>.Type is useful ("any type that you can convert a value of U to").

But here we run into the usual problem with currying, which is that it's great if the bias is right, but it actively works against you if it isn't. The notional existential type ∃ U where T: ExpressibleWith<U> . U ("any value that can be converted to T") is probably at least as useful as ExpressibleWith<U>.

And that directly comes back to the problem of privileging one parameter over the other. Fundamentally, there's no particular reason for ExpressibleWith<U> to be a protocol that T conforms to instead of ExpressibleFrom<T> being a protocol that U conforms to.

1 Like

Sure, but this assumption is deeply baked into the theoretical model and changing it unilaterally would break existing code. The difficulty is not in coming up with a concrete syntax for referencing the two different associated types.

Another difficulty is that in the existing "monoid" interpretation, all terms are either ground terms (generic parameters) or are constructors with arity 1 (protocols, associated types).

If a type parameter T.U is known to conform to P, this introduces a rule

T.U.[P] => T.U

Or if we make the substitution variables explicit,

$x.T.U.[P].$y => $x.T.U.$y

Similarly, a same-type constraint X == Y in protocol P becomes

$x.[P].Y.$y => $x.[P].X.$y

Which in turn creates a rule $x.T.U.Y.$y => $x.T.U.X.$y.
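To connect these rules back to source-level declarations, here is a small Swift sketch (invented names) of requirements that give rise to rules of these shapes:

protocol P {
  associatedtype X
  associatedtype Y where Y == X   // the same-type constraint "X == Y" in P
}

// A generic signature in which the type parameter T.Element conforms to P:
func f<T: Sequence>(_ value: T) where T.Element: P {}

// The conformance requirement contributes a rule like
//   T.Element.[P] => T.Element
// and the same-type requirement contributes a derived rule identifying
// T.Element.Y with T.Element.X.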

Another way of looking at a rule like $x.T.U.[P].$y => $x.T.U.$y is with a more general "universal algebra" view instead of a "monoid" view where the terms are functions of variable arity:

Y(U(T($x))) => X(U(T($x)))

This is the correct view for introducing MPTCs and generic associated types, because now an MPTC conformance is a term with arity 2; ConvertibleTo(U, V) rather than just P(U).

This is a major departure from the existing implementation and redoing some of the existing algorithms such as generic signature minimization in the "universal algebra" view is going to be an involved research project.

The paper on coherent presentations of monoids that our current minimization algorithm is based on wasn't even published until 2013; this isn't a well-trod area of type system research.

The other languages that have implemented similar functionality, such as Rust and Haskell, don't use recursive associated types to the extent Swift does. Nor do they support separate compilation with an ABI based around passing a minimal set of witness tables, together with a symbol mangling scheme that encodes a minimal rewrite system of same-type requirements, which is why rewrite system minimization is a fundamental part of our approach.

6 Likes

I think what's pitched here is a nice addition.

I don't have many comments to make, but I'll say this: I don't think this pitch excludes generic protocols, or even generic protocols with a nice syntax. I think a generic parameter could coexist nicely in the same parameter list if generic protocols are added to the language in the future. For instance:

// -- hypothetical example with generic protocols --

// 'required T' makes the protocol generic over T
protocol InitializableFrom<required T> {
  init(_ value: T)
}

// attempting non-parametrized conformance declaration:
class X: InitializableFrom { ... } // error: T is required for conformance

// and this makes multiple conformances possible:
class X: InitializableFrom<Int>, InitializableFrom<Double> { ... }

Maybe people will find that syntax confusing, but for my part I think it's quite nice. And it's probably a good thing that "generic protocol" isn't the default too.

2 Likes

That’s just me falling back to SML notation.

This isn’t new to Swift programmers, or programmers in any language, especially languages with single-inheritance polymorphism. Sometimes the language forces you to choose among many possible ways to align your model with language features. It doesn’t seem like a fatal flaw to me.

Thank you, I'd completely forgotten about collision testing; this is something I'd seen before, and I was wrong to neglect it. But I think this sample code is actually very informative, because the way you're using CollidesWith definitely does not follow the model of CollidesWith being a factory of protocols:

var objects: [any CollidesWith]

Here we're using CollidesWith without any arguments at all; this is apparently the existential type ∃ T,U where T: CollidesWith<U> . T. This would not be legal in your "protocol factory" interpretation; to make sense of it, you have to think of CollidesWith as a single protocol with multiple parameters. It is then capturing a specific conformance of its Self type to CollidesWith, and the second type parameter Other of that conformance is statically erased the same way that Self is. Crucially, it is not somehow capturing all of the available conformances.

Also, it's very significant that you've skipped over the implementation of gatherIntersections, or even its declaration, because it appears to be returning an array of the existential tuple ∃ T,U where T: CollidesWith<U> . (T, U). That is not actually something that can be expressed in the type system, but let's put that aside. How do we get from an any CollidesWith to this existential tuple?

Well, for starters, we presumably have a geometric algorithm that finds pairs (x,y) of colliding values from objects. That algorithm doesn't naturally only find pairs that can collide with each other: we don't know that the dynamic type of y is the hidden type parameter U of x, or vice-versa. We can imagine having a way to do that cast, but that's not really what we want anyway, because x might have multiple conformances to CollidesWith, and the conformance stored here might not be the right one. (I assume we haven't added x to objects once per conformance!) Really we want to check whether typeOf(x) conforms to CollidesWith<typeOf(y)>. That's a full application of the predicate again.

And note that the only parts of the CollidesWith conformance from the original any CollidesWith that we're actually using are the geometry parts that have nothing to do with collision. So objects probably shouldn't store any CollidesWith at all, just the geometry; when we detect a collision, we'll dynamically cast the object pair to see if we're supposed to handle it, and somehow magically get this existential tuple back that we can safely use. The "curried" protocol is useless to us, precisely because it privileges one parameter over the other; we can never actually call the collide method it provides.

Again, there's nothing wrong with the idea of using a multi-parameter protocol to solve this problem, but thinking of it as a generic protocol factory actively leads us down the wrong track.

10 Likes

Thanks @John_mccall for taking the time to thoroughly analyze that example. I think it does a great job showing how the complexity creeps in. This provides a lot more support to the Core Team’s current position w.r.t. generic protocols, making it easier to understand why the team is willing to let design decisions about this pitch impact or preclude the future design space that generic protocols would exist within.

10 Likes