Generalized opaque and existential type constraints

jayton · February 23, 2022, 11:56am

For completeness: you can already write meaningful functions where the result type is chosen by the caller, using regular generic syntax. The minimal example is:

protocol DefaultInitializable {
    init()
}

func instantiate<T: DefaultInitializable>() -> T {
    T()
}

extension Int: DefaultInitializable {}

let zero: Int = instantiate()

This may be bad style, but it currently works (intentionally), and changing the meaning would be a breaking change.

Separately, it is also meaningful in principle to have “reverse generics” in the argument list, as the argument to a continuation, although this would require a limited form of generic closure.
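
A rough sketch of the continuation idea in today's Swift, assuming a protocol with a generic method as a stand-in for the generic closure the language lacks. The names `Shape`, `ShapeContinuation`, `CountSides`, and `withHiddenShape` are made up for illustration:

```swift
protocol Shape {
    var sides: Int { get }
}

struct Triangle: Shape {
    let sides = 3
}

// Stand-in for a generic closure: a single generic method
// plays the role of `<S: Shape>(S) -> Int`.
protocol ShapeContinuation {
    func run<S: Shape>(_ shape: S) -> Int
}

struct CountSides: ShapeContinuation {
    func run<S: Shape>(_ shape: S) -> Int { shape.sides }
}

// The callee picks the concrete type (Triangle here);
// the continuation has to work for any Shape it is handed.
func withHiddenShape<K: ShapeContinuation>(_ continuation: K) -> Int {
    continuation.run(Triangle())
}

let sideCount = withHiddenShape(CountSides())
```

Here the type flows "outward" from the callee into the argument of the continuation, which is the direction a reverse generic in argument position would express directly.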

ensan-hcl · February 23, 2022, 12:02pm

But I think it's better to replace this feature with opaque result types. Can't we deprecate this code in a future version of Swift and introduce a new syntax based on some? This feature only produces APIs which are said to be 'terrible'.

func foo<T>() -> T

jayton · February 23, 2022, 12:09pm

The “non-terrible” form is:

func foo<T>(_: T.Type) -> T
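
To make that spelling concrete, here is a small sketch that reuses the DefaultInitializable protocol from earlier in the thread; the String conformance and the variable names are only there for illustration:

```swift
protocol DefaultInitializable {
    init()
}

extension Int: DefaultInitializable {}
extension String: DefaultInitializable {}

// The caller still chooses T, but does so visibly by passing the metatype
// instead of relying on the type annotation at the call site.
func instantiate<T: DefaultInitializable>(_ type: T.Type) -> T {
    type.init()
}

let zero = instantiate(Int.self)
let empty = instantiate(String.self)
```

The result type is now readable at the call site without an annotation on the left hand side.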

Are you suggesting that we should infer that a type parameter is “outward” if it only occurs on the right hand side, and inward if it occurs on only the left or on both sides, and also preclude the possibility of ever extending reverse generics to continuations?

Why is this better than being explicit about the direction of information flow?

michelf · February 23, 2022, 12:19pm

If we wanted opaque return types in the same brackets, there's a simple and syntactically unambiguous way to do it:

func foo<opaque T>() -> T

I don't find this spelling very appealing, but I think it's better than having a second pair of angle brackets at the end.

ensan-hcl · February 23, 2022, 12:34pm

Yes.

SE-0341 made generics easier for learners. However, when they eventually have to deal with named generics, they must overcome two difficulties. One is that named generics do not fully correspond to the some syntax, as some in result position works like reverse generics. The other is that there is no named version of some in result position, i.e. reverse generics. In short, there is still a large gap between the complete generics syntax and the some syntax.

Earlier in this thread, I suggested spelling it <some T: P>. This is based entirely on some P, in both parameter position and result position. When the same type parameter is shared across ->, it becomes an ordinary generic parameter, because T is first used in parameter position. For learners who first understand generics through some, this syntax would be easier to reach than the current one. But as @hborla said earlier, there should not be two ways to do the same thing. If so, I think the one that should retire is the current one, which is why I suggested deprecating the current syntax for generic result types.

jayton · February 23, 2022, 12:39pm

Hmm. To me, having output type parameters in the angle brackets, and spelled the same way as input type parameters, seems analogous to declaring functions like this:

func add(x: Int, y: Int, result: Int) -> result

ensan-hcl · February 23, 2022, 1:01pm

I'm not attached to the detailed syntax. Maybe it would feel more natural if the syntax were as follows. What I wanted to suggest was deprecating the current generics syntax and replacing it with a new one, so that there are not two ways to do the same thing and there is no way to create a 'terrible API'.

func foo(_: (some P T).Type) -> T

bjhomer · February 23, 2022, 2:23pm

hborla:

However, I do think this syntax suffers the same problem of creating two very similar ways of accomplishing exactly the same thing for input type parameters. For example, I'd assume that both of these signatures would be allowed:
func evenValues<C: Collection>(in collection: C) -> [Int] where C.Element == Int

func evenValues(in collection: some Collection C) -> [Int] where C.Element == Int
I'm having a hard time justifying having two different syntaxes to accomplish the same thing when both of them are equally verbose.

I don't think it's a big problem to have two different syntaxes to accomplish the same thing. After all, that's exactly what some P in parameter position does:

func example1<C: Collection>(collection: C) {}

// This is the same thing as example 1
func example2(collection: some Collection) {}

These are both roughly equally verbose, so the benefit is not primarily about reducing the number of characters the user needs to type. Rather, the benefit is increased clarity; we don't have to introduce a generic parameter and then use it separately. It feels analogous to how in early versions of C, you had to declare all your variables at the beginning of the function block:

void sillyExampleInC() {
  int x = 0;
  printf("x is zero\n");

  // This was illegal! Variable declarations had to come before all statements
  int y = 3; 
  printf("y is three\n");
}

Thankfully, C moved on from that requirement, and you're now allowed to declare variables at the point where you need them. What I like about some P syntax is that it allows the same thing for generic function signatures:

// I can introduce the generic parameter at the point where
// it is used
func example3(a: Int, b: String, c: Float, d: some Collection) {}

// I don't have to do this, which requires generic parameters
// to be declared up front, even though it won't be used until the end.
func example4<C: Collection>(a: Int, b: String, c: Float, d: C) {}

Now, there are still cases where it makes sense to declare a parameter up front, but those are primarily cases where I want to constrain multiple parameters to have the same type:

// These all need to be the same type
func example5<C: Collection>(one: C, two: C, three: C) {}

In that case, declaring the generic parameter up front feels reasonable, because it is being used in multiple places. But when a generic parameter is only used in one position, declaring it inline feels reasonable.

Wrapping this all up, I like this proposed syntax:

func proposed(one: some Collection A, two: some Collection B)
  where A.Element == B.Element

and it doesn't feel like a problem to me that the same thing could also be done using angle brackets:

func current<A: Collection, B: Collection>(one: A, two: B)
  where A.Element == B.Element

The proposed way seems significantly clearer to me, but if someone is comfortable with the angle brackets, that's fine too.
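
For reference, here is a small sketch of the angle-bracket form doing real work in today's Swift. The function name `commonPrefix` and the extra Equatable requirement are my own additions for the sake of a runnable example:

```swift
// Two independent collection types that only have to agree on their Element.
func commonPrefix<A: Collection, B: Collection>(one: A, two: B) -> [A.Element]
    where A.Element == B.Element, A.Element: Equatable
{
    var result: [A.Element] = []
    for (x, y) in zip(one, two) {
        if x != y { break }
        result.append(x)
    }
    return result
}

// An Array and a ClosedRange are different types with the same Element.
let shared = commonPrefix(one: [1, 2, 3, 4], two: 1...2)
```

Under the proposed spelling, the same where clause would hang off the names A and B introduced inline.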

tera · February 23, 2022, 2:52pm

Here's my take:

example of two parameters that must agree on collection's element type:

func proposed(one: some Collection(E), two: some Collection(E))

example of two parameters that must agree on collection's index type:

func proposed(one: some Collection(Index: T), two: some Collection(Index: T))

example of two parameters that must agree on both element and index types:

func proposed(one: some Collection(E, Index: T), two: some Collection(E, Index: T))

example of two parameters, the second must be of the same type as the first:

func proposed(one: some Collection C, two: C)

hborla · February 23, 2022, 6:00pm

The language is not moving from one syntax to another. We have introduced a very concise syntax that covers a limited -- but extremely common -- subset of generic code for the purpose of progressive disclosure, and this allows you to elide an explicit generic signature completely in those cases. I don't think it's feasible to completely change the syntax for generics and transition all code to a new syntax, especially because I don't think the some Collection C syntax covers all use cases for generics. For example, this syntax does not work for generic types with stored properties that reference type parameters for reasons I laid out in this post, nor does it work for input type parameters in a generic signature that are only used in the result type (which we've determined isn't the best pattern, but it is used in Swift code today). Generic types are especially problematic because a type parameter can be declared in the primary declaration of a generic type and only used in an extension. In general, the some Collection C syntax or any variation of it suffers from the problem of becoming attached to the nearest possible generic signature to where the implicit type parameter is declared. If we introduced this new variation of some, we'd have this matrix of use cases that each have a different recommended syntax. I don't think that decreases the complexity of the language or improves learnability at all.
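
As a sketch of the generic-type cases mentioned above (the names `Cache`, `Tag`, and `Images` are hypothetical, chosen only for illustration), here is a type whose stored property references one type parameter, and whose second type parameter is declared up front but only used in an extension:

```swift
// The stored property needs the named parameter `Storage`.
struct Cache<Storage: Collection, Tag> {
    var items: Storage
}

// `Tag` is declared on the primary declaration but only used here,
// so an inline spelling would have no obvious place to introduce it.
extension Cache {
    func label(for _: Tag.Type) -> String {
        "\(Tag.self) cache with \(items.count) items"
    }
}

enum Images {}

let cache = Cache<[String], Images>(items: ["a.png", "b.png"])
let label = cache.label(for: Images.self)
```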

I do agree that the some Collection C syntax reads more like prose, which I personally find very helpful, but I think we're also conflating the goals between this post and the other recent generics proposals. I think this discussion would benefit from an answer to this very good question:

The goal of opaque parameters and the same-type requirement sugar is indeed to improve the learnability of the generics system and provide a stepping stone to more advanced generic code via progressive disclosure. However, the goal of this discussion is to enable the full expressivity of arbitrary constraints on opaque result types and existential types (to the extent that we want arbitrary constraints on existential types), and therefore the intended audience here is much more advanced. I anticipate that the need for this feature will be much more rare than the need for the very concise some Collection<Element>. I completely agree with @xwu that the design here should prioritize expressivity, which likely means sacrificing conciseness in the syntax.

Philippe_Hausler · February 23, 2022, 7:11pm

Among existing Swift code out there, I think some Publisher<String, Never> (or however we decide to spell it) will be pretty common. Perhaps I am not fully considering all of the usage here, but for app developers using SwiftUI, I would guess that use case may be more pervasive than some Sequence<Int>, since [Int] is a really easy, approachable, and useful type. In no way am I saying that some Sequence<Int> is not a great improvement here, but I feel that the usage patterns for things like the Swift Composable Architecture, RxSwift, Combine, and AsyncSequence all seem to gravitate into the camp of "APIs that REALLY lean heavily on generics". Those are perhaps not horribly advanced tools that folks use.

As for Combine and the Swift Composable Architecture, those are the double generic cases where the signatures would make sense with two primary generic types; you can't have a publisher without expressing both its Output and Failure. I would be quite happy if the result of generalized opaque types is that no one has to write .eraseToAnyPublisher() ever again, not only for the performance implication but also for the readability improvement. As it stands, folks have to write compositional operators with Combine through a fairly complicated dance (including making types that implement Publisher and such) just to avoid that .eraseToAnyPublisher().

For example (only slightly fictitious), here are four options:

The very verbose route (but it gives some flexibility, since the inner behavior is controlled by the receive(subscriber:) method):

extension Publisher {
  func nonRepeatingMap<T: Equatable>(_ transform: @escaping (Output) -> T) -> NonRepeatingMap<Self, T> {
    NonRepeatingMap(self, transform: transform)
  }
}

public struct NonRepeatingMap<Upstream: Publisher, Output: Equatable>: Publisher {
  public typealias Failure = Upstream.Failure
  let upstream: Upstream
  let transform: (Upstream.Output) -> Output

  init(_ upstream: Upstream, transform: @escaping (Upstream.Output) -> Output) {
    self.upstream = upstream
    self.transform = transform
  }

  public func receive<Downstream: Subscriber>(subscriber: Downstream) where Downstream.Input == Output, Downstream.Failure == Failure {
    upstream.map(transform).removeDuplicates().subscribe(subscriber)
  }
}

or the less verbose but explicit type route... (which means that you can't change anything later on since it is now exposed as ABI of the signature)

extension Publisher {
  func nonRepeatingMap<T: Equatable>(_ transform: @escaping (Output) -> T) -> Publishers.RemoveDuplicates<Publishers.Map<Self, T>> {
    map(transform).removeDuplicates()
  }
}

or the type eraser route... (which hides the ABI nature behind an eraser at the cost of some performance)

extension Publisher {
  func nonRepeatingMap<T: Equatable>(_ transform: @escaping (Output) -> T) -> AnyPublisher<T, Failure> {
    map(transform).removeDuplicates().eraseToAnyPublisher()
  }
}

or the opaque type route... (which is just as performant as the explicit!)

extension Publisher {
  func nonRepeatingMap<T: Equatable>(_ transform: @escaping (Output) -> T) -> some Publisher<T, Failure> {
    map(transform).removeDuplicates()
  }
}

As a library author I would say that the last option is quite nice. For the case of Publisher I feel that type really makes sense to have two primary generic characteristics.
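
The shape of that last option can be tried without Combine. The following is only an analogy, assuming Swift 5.7 or later for the `some Sequence<T>` spelling: it uses Sequence as a stand-in for Publisher and does the work eagerly into an array, which a real publisher would not do:

```swift
extension Sequence {
    // The caller only learns "some Sequence of T"; the concrete
    // return type (an Array here) stays an implementation detail.
    func nonRepeatingMap<T: Equatable>(_ transform: (Element) -> T) -> some Sequence<T> {
        var result: [T] = []
        for element in self {
            let mapped = transform(element)
            // Skip values equal to the one emitted just before.
            if result.last != mapped {
                result.append(mapped)
            }
        }
        return result
    }
}

let deduplicated = Array([1, 1, 2, 2, 3, 1].nonRepeatingMap { $0 * 10 })
```

Because the signature only promises `some Sequence<T>`, the body could later switch to a lazy wrapper without changing what callers see.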

Personally, I consider Sequence to have only one primary generic characteristic: the element. Would Collection then inherit the primary-ness of Sequence's element? If so, would the signature then be some Collection<String, .Index = Int>? It might be interesting to consider that primary characteristics cannot be elided, but if you had some Collection<String>, wouldn't the Index then be some Comparable? If so, that means some Collection<String> is just shorthand for some Collection<String, .Index = some Comparable>.

That exercise means to me that the "primary" generic characteristics are not primary (as in singular) but instead they are "required" generic characteristics. This falls in line with Publisher which has two distinct characteristics; the Output and the Failure.

ExFalsoQuodlibet · February 23, 2022, 9:04pm

I think I got slightly confused by the fact that you mentioned the Result type parameter clause option in the original post, which I read as a way to put in the same ballpark the opaque syntax and the "reverse generics" one, like they were 2 possible options to achieve the same goal. Also, I don't fully understand if we're only talking about result types, a.k.a. types in the result position, or opaque types in general, either parameter or result.

It seems to me that without explicitly naming type parameters, we'll need to "scope" the constraints in some way. Using the angle brackets is an option, but a possible alternative, one that would allow us to reserve the angle brackets for the primary associated types, could be to use parentheses for the arbitrary constraints:

func foo(_ collection: some Collection<Int>(.Index == Int)) -> some Collection<Int>(.Index == Int)

This wouldn't allow to express cross-parameter constraints, but we should then understand what level of generality we want for this before going back to explicit type parameters.
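
For comparison, the parameter side of that signature can already be written today with an explicit type parameter. A minimal sketch, where the function name `middleElement` is made up for illustration:

```swift
// Today's spelling of "some Collection<Int> whose Index is also Int"
// in parameter position: name the type and constrain it in a where clause.
func middleElement<C: Collection>(_ collection: C) -> Int?
    where C.Element == Int, C.Index == Int
{
    if collection.isEmpty { return nil }
    // Integer arithmetic on the index is only allowed because Index == Int.
    return collection[collection.startIndex + collection.count / 2]
}

let fromArray = middleElement([10, 20, 30])
let fromSlice = middleElement([10, 20, 30, 40][1...])
```

There is no equivalent for the result side today, which is the gap the parenthesized constraint is meant to fill.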

But even if we want to focus our attention only on common use cases – less common than those where only the primary associated types are involved, but more common than complex cases where cross constraints on type parameters are involved – there would still be some confusion due to the fact that on the "parameter" side of things we always have the more powerful syntax at our disposal – named parameters in angle brackets – so the reasoning about a lightweight syntax with some and any on that side would actually be about syntactic sugar for relatively simple cases; while on the result side of things it wouldn't really be sugar, at least for now, because we don't have reverse generics in the language, thus we would be compelled to achieve the highest possible expressive power with generalized opaque and existential type constraints (thus, for example, the interest in adding named parameters to the some syntax).

hborla · February 23, 2022, 9:57pm

I agree! The "primary associated type" pitch has been updated to support more than one primary associated type.

Yes, the "result type parameter clause" idea is a more general version of opaque result types, i.e. "reverse generics". Arbitrary constraints are already possible for opaque types in parameter position by using explicit type parameters, so I intended for this discussion to be focused on opaque result types, specifically.

Personally, I think the level of generality before moving to explicit type parameters in angle brackets should be exactly what's in the primary associated type pitch. Of course, others might disagree. I explained why I think that's the case in my post here:

And John's post here explains the "grand vision" in more detail:

tevelee · February 25, 2022, 10:40am

In an earlier discussion on the same topic, @beccadax proposed an alternative spelling which I quite like. The result type parameter declaration doesn't have to be in the middle of the function signature if we annotate it with some prefix at its regular position.

bjhomer · March 2, 2022, 6:54am

Okay, I was just thinking in terms of function declarations, but this helped me understand that there's a larger picture here. If I understand correctly, there's a desire to build a syntax for declaring constraints on opaque and existential types throughout the language. To make sure I understand that desire, I'd like to sketch out a few scenarios.

(I'll use the some Collection C syntax as a strawman for now, acknowledging that arguments have been made against it, but I need to use some syntax to illustrate the ideas.)

// Given this:
protocol Strawman {
  associatedtype Input
}

We need a syntax that can be used in all of the following scenarios:

Function parameters

func example1(_ x: some Strawman A)
  where A.Input == Int

func example2(_ x: any Strawman A)
  where A.Input == Int

Return types

func example3() -> some Strawman A  
  where A.Input == Int

func example4() -> any Strawman A  
  where A.Input == Int

Stored properties
```
struct Example5 {
  var x: any Strawman A where A.Input == Int
}

struct Example6 {
  var x: some Strawman A where A.Input == Int
}
```
(There was some question whether supporting Example6 is a desirable feature; see Holly's post here. I'm just showing how it might be spelled if we did want to support it.)

Structural return types

func example7() -> (some Strawman A, some Strawman B)
  where A.Input == B.Input 

func example8() -> (any Strawman A, any Strawman B)
  where A.Input == B.Input

Are there other cases I missed?

As mentioned before, the goal of any syntax proposed here is not to replace the existing <T> syntax; rather, we're just trying to find a syntax that allows us to put constraints on some and any types. Some of the above examples could be written today using regular generics (specifically, examples 1 and maybe 6), but most of the above are currently inexpressible. Likewise, it is not a requirement that everything you can do with generics be expressible via constraints on opaque types.
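
To back up the claim about examples 1 and 6, here is roughly how they look with regular generics today. The conforming type `IntSink` and the String return value are my own additions so that the sketch can be run:

```swift
protocol Strawman {
    associatedtype Input
}

// A hypothetical conforming type, just for illustration.
struct IntSink: Strawman {
    typealias Input = Int
}

// Example 1 with a named generic parameter instead of `some Strawman A`.
func example1<A: Strawman>(_ x: A) -> String where A.Input == Int {
    String(describing: A.Input.self)
}

// Example 6, approximately: the type parameter leaks into the struct's
// own signature, which is why this one is only a "maybe".
struct Example6<A: Strawman> where A.Input == Int {
    var x: A
}

let inputName = example1(IntSink())
let stored = Example6(x: IntSink())
```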

I'll also note that the "primary associated types" proposal could simplify many of the examples here, but does not replace the need for a general syntax, because a protocol can have many associated types that are not primary.

With all of that written out, I'd like some clarification. Many of the suggestions earlier in this thread talk only about opaque result types, but those don't seem like they'd work with existential types. For example, using the "angle brackets after the arrow" syntax, how would you write something that returns a constrained existential type, like example4 above?

func whatDoWeDoHere() -> <T> any Strawman where ???

Likewise, what would a constrained existential member look like, especially in the case where there may be a naming conflict with an external struct?

struct S<Input> {
  var x: any Strawman A where A.Input == Input

  // If we don't support naming `any Strawman`, what can you
  // reference in a where clause?
  var y: any Strawman where ???
}

It's not clear to me that any other syntax has been proposed which can handle all the use cases suggested above.

ExFalsoQuodlibet · March 2, 2022, 8:08am

I don't think it's possible, or even desirable, to define a single syntax that could work both in a context where the where clause clearly refers to, and only to, a specific parameter, and in a context where where introduces a list of constraints that refer to several parameters at once.

Your examples 5 and 6 show a case where the where clause unambiguously refers to a specific parameter, so there's no real need to name it:

struct AltExample5 {
  var x: any Strawman where Input == Int
}

struct AltExample6 {
  var x: some Strawman where Input == Int
}

this breaks if we add structure, for example in case of tuples or some other type with generic parameters:

struct AltAltExample5 {
  var x: (any Strawman, any Foo) where Input == Int // ambiguous
}

struct AltAltExample6 {
  var x: Result<some Strawman, some Foo> where Input == Int  // ambiguous
}

which means, to me, that we could introduce named parameters only if needed, that is, to remove ambiguity when it presents itself. The compiler could help here, with an error that clearly suggests to introduce named parameters when needed.

In case of functions, the situation seems very similar, with the only difference that having more parameters, thus potential ambiguity, is simply more likely to happen. In theory, your first 2 examples could work without named parameters:

func example1(_ x: some Strawman)
  where Input == Int

func example2(_ x: any Strawman)
  where Input == Int

but once we add more parameters or return types, either because we have both input and output generics, or due to "structural" opaque return types, the only way to resolve ambiguity would be to name them, unless we attach a where clause to every single parameter for which additional constraints are declared.

But naming parameters is really only needed if we want to:

put together constraints for multiple type parameters in a single place;
cross-reference type parameters when declaring constraints;

If we decide for a smaller, simpler goal, and leave "total generality" to explicit declaration of type parameters in angle brackets, we can think about ways to attach additional constraints to some and any type parameters directly, without a detached where clause.

For example, some Strawman<.Input == Int> has been proposed several times, but I'm not a fan of it for reasons I laid out above. some Strawman(.Input == Int) could be interesting, as could:

some Strawman(Input == Int);
some Strawman(where Input == Int);
(some Strawman where Input == Int).

bjhomer · March 2, 2022, 1:52pm

Sure, if a name is not needed, it could perhaps be elided. I'll note, though, that even when there's only a single some/any, the where clause could still be ambiguous if it could reference an enclosing context:

struct ChemistryExample<Element> {
  // Is 'Element' here Collection.Element or ChemistryExample.Element?
  var x: any Collection where Element == String
}

Eliding the name where not needed may be acceptable, but it creates another "stopping point" along the continuum, which means it's another syntax the user needs to learn. I'll grant that it's a fairly intuitive one, so that may be fine.

Part of my argument here is that explicit declaration of type parameters in angle brackets does not give total generality, especially in the case of existential constraints… or at least, I haven't seen anyone demonstrate how it would. How would you declare something like my example8 using angle brackets? Or how would you do something like this?

func doThisWithAngleBracketsPlease(x: any Publisher A, y: any Publisher B)
  where A.Input == B.Input, A.Output == B.Output

The conversion to generics would be fairly straightforward if these were using opaque some types, but how would you do it while accepting existential any types?
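
For the some case, that conversion would look roughly like the following. This is a sketch using a made-up protocol `Pipe` with Input and Output associated types rather than Combine's Publisher, so that it is self-contained; `Doubler`, `Incrementer`, and `runBoth` are likewise hypothetical:

```swift
protocol Pipe {
    associatedtype Input
    associatedtype Output
    func process(_ input: Input) -> Output
}

struct Doubler: Pipe {
    func process(_ input: Int) -> Int { input * 2 }
}

struct Incrementer: Pipe {
    func process(_ input: Int) -> Int { input + 1 }
}

// Each `some Pipe` becomes a named parameter, and the cross-parameter
// constraints move into an ordinary where clause.
func runBoth<A: Pipe, B: Pipe>(x: A, y: B, on input: A.Input) -> [A.Output]
    where A.Input == B.Input, A.Output == B.Output
{
    [x.process(input), y.process(input)]
}

let outputs = runBoth(x: Doubler(), y: Incrementer(), on: 5)
```

The existential version has no such translation, because A and B here are fixed per call rather than per value.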

Or how would you do this?

struct S<Input> {
  func moreAngleBracketsPlease() -> (any Publisher A, any Publisher B)
    where A.Input == Input,
          B.Input == Input,
          A.Output == B.Output
}

I just don't see how adding angle brackets is even supposed to solve this kind of problem.

(Edit: I adjusted my code examples slightly for clarity)

xwu · March 2, 2022, 5:55pm

Swift actually still has the remnants of an older syntax for existentials, which uses angle brackets: protocol<…>. You can see it in the definition of Any as a typealias for protocol<>.

Since it’s actually a currently available syntax (very deprecated but still understood by the compiler), it could be resurrected and expanded for this use case.
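
For readers who never saw the old spelling: a composition that used to be written protocol<A, B> is written A & B since Swift 3. A small sketch of the modern form, with made-up protocols `Named` and `Aged`:

```swift
protocol Named {
    var name: String { get }
}

protocol Aged {
    var age: Int { get }
}

struct Person: Named, Aged {
    let name: String
    let age: Int
}

// In Swift 2 this parameter type was spelled `protocol<Named, Aged>`.
func describe(_ value: Named & Aged) -> String {
    "\(value.name), \(value.age)"
}

let summary = describe(Person(name: "Ada", age: 36))
```

Note that neither spelling has a place to state a constraint on an associated type, which is what any resurrected syntax would have to add.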

bjhomer · March 2, 2022, 7:13pm

That's true. Do you have any thoughts on what that might look like? I'm not seeing how protocol<> would actually help here, other than "it has angle brackets". Specifically, how would you add a constraint between multiple types, as in this example?

I can't see how you'd do it in this form…

// Where do I put `A.Input == B.Input`?
func attempt1(x: protocol<Publisher>, y: protocol<Publisher>)

Maybe something like this?

func attempt2<T, U>(x: T, y: U) 
  where T: protocol<Publisher>,
        U: protocol<Publisher>,
        T.Input == U.Input,
        T.Output == U.Output

But now we're still not getting anything from the angle brackets; it would probably be clearer to write it like this:

func attempt3<T, U>(x: T, y: U) 
  where T: any Publisher,
        U: any Publisher,
        T.Input == U.Input,
        T.Output == U.Output

Note that I'm using T: any Publisher, but that's kinda odd, because we aren't passing in a subtype of any Publisher; we'd always be passing in exactly any Publisher; we just have some additional restrictions on it. So maybe it should be T == any Publisher there. But in either case, is this a desirable direction? It still doesn't support constraints on return types, so it seems to lack generality.

John_McCall · March 2, 2022, 11:06pm

In addition to Holly's post, I tried to lay out the larger picture here.