Generalized opaque and existential type constraints

I don't think it's a big problem to have two different syntaxes to accomplish the same thing. After all, that's exactly what some P in parameter position does:

func example1<C: Collection>(collection: C) {}

// This is the same thing as example 1
func example2(collection: some Collection) {}

These are both roughly equally verbose, so the benefit is not primarily about reducing the number of characters the user needs to type. Rather, the benefit is increased clarity; we don't have to introduce a generic parameter and then use it separately. It feels analogous to how in early versions of C, you had to declare all your variables at the beginning of the function block:

void sillyExampleInC() {
  int x = 0;
  printf("x is zero\n");

  // This was illegal! Variable declarations had to come before all statements
  int y = 3;
  printf("y is three\n");
}

Thankfully, C moved on from that requirement, and you're now allowed to declare variables at the point where you need them. What I like about some P syntax is that it allows the same thing for generic function signatures:

// I can introduce the generic parameter at the point where
// it is used
func example3(a: Int, b: String, c: Float, d: some Collection) {}

// I don't have to do this, which requires generic parameters
// to be declared up front, even though it won't be used until the end.
func example4<C: Collection>(a: Int, b: String, c: Float, d: C) {}

Now, there are still cases where it makes sense to declare a parameter up front, but those are primarily cases where I want to constrain multiple parameters to have the same type:

// These all need to be the same type
func example5<C: Collection>(one: C, two: C, three: C) {}

In that case, declaring the generic parameter up front feels reasonable, because it is being used in multiple places. But when a generic parameter is only used in one position, declaring it inline feels reasonable.
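To make the comparison concrete, here is a small sketch that should compile with Swift 5.7 or later, where some is allowed in parameter position. The function names and bodies are invented for illustration; only the shape of the signatures comes from the examples above.

```swift
// Inline form: the generic parameter is introduced where it is used.
func countInline(label: String, items: some Collection) -> String {
    "\(label): \(items.count)"
}

// Up-front form: equivalent, but C is declared before it is needed.
func countUpFront<C: Collection>(label: String, items: C) -> String {
    "\(label): \(items.count)"
}

// Declaring the parameter up front pays off when one type is shared
// by several parameters.
func totalCount<C: Collection>(one: C, two: C, three: C) -> Int {
    one.count + two.count + three.count
}

let inlineResult = countInline(label: "inline", items: [1, 2, 3])
let upFrontResult = countUpFront(label: "upFront", items: "abc")
let total = totalCount(one: [1], two: [2, 3], three: [])
```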

Wrapping this all up, I like this proposed syntax:

func proposed(one: some Collection A, two: some Collection B)
  where A.Element == B.Element

and it doesn't feel like a problem to me that the same thing could also be done using angle brackets:

func current<A: Collection, B: Collection>(one: A, two: B)
  where A.Element == B.Element

The proposed way seems significantly clearer to me, but if someone is comfortable with the angle brackets, that's fine too.
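For reference, the angle-bracket version is valid Swift today. The sketch below gives it a body; the sharesFirstElement name and the Equatable requirement are assumptions added so the example does something observable. The some Collection A spelling is only a proposal and is not shown as code.

```swift
// Two different collection types whose element types must match.
func sharesFirstElement<A: Collection, B: Collection>(one: A, two: B) -> Bool
    where A.Element == B.Element, A.Element: Equatable
{
    guard let first = one.first, let other = two.first else { return false }
    return first == other
}

let singleton: Set<Int> = [1]
let sameStart = sharesFirstElement(one: [1, 2, 3], two: singleton)
let differentStart = sharesFirstElement(one: [1, 2, 3], two: [9, 1])
```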


Here's my take:

example of two parameters that must agree on collection's element type:

func proposed(one: some Collection(E), two: some Collection(E))

example of two parameters that must agree on collection's index type:

func proposed(one: some Collection(Index: T), two: some Collection(Index: T))

example of two parameters that must agree on both element and index types:

func proposed(one: some Collection(E, Index: T), two: some Collection(E, Index: T))

example of two parameters, the second must be of the same type as the first:

func proposed(one: some Collection C, two: C)

The language is not moving from one syntax to another. We have introduced a very concise syntax that covers a limited -- but extremely common -- subset of generic code for the purpose of progressive disclosure, and this allows you to elide an explicit generic signature completely in those cases. I don't think it's feasible to completely change the syntax for generics and transition all code to a new syntax, especially because I don't think the some Collection C syntax covers all use cases for generics. For example, this syntax does not work for generic types with stored properties that reference type parameters for reasons I laid out in this post, nor does it work for input type parameters in a generic signature that are only used in the result type (which we've determined isn't the best pattern, but it is used in Swift code today). Generic types are especially problematic because a type parameter can be declared in the primary declaration of a generic type and only used in an extension. In general, the some Collection C syntax or any variation of it suffers from the problem of becoming attached to the nearest possible generic signature to where the implicit type parameter is declared. If we introduced this new variation of some, we'd have this matrix of use cases that each have a different recommended syntax. I don't think that decreases the complexity of the language or improves learnability at all.

I do agree that the some Collection C syntax reads more like prose, which I personally find very helpful, but I think we're also conflating the goals between this post and the other recent generics proposals. I think this discussion would benefit from an answer to this very good question:

The goal of opaque parameters and the same-type requirement sugar is indeed to improve the learnability of the generics system and provide a stepping stone to more advanced generic code via progressive disclosure. However, the goal of this discussion is to enable the full expressivity of arbitrary constraints on opaque result types and existential types (to the extent that we want arbitrary constraints on existential types), and therefore the intended audience here is much more advanced. I anticipate that the need for this feature will be much more rare than the need for the very concise some Collection<Element>. I completely agree with @xwu that the design here should prioritize expressivity, which likely means sacrificing conciseness in the syntax.


In the realm of existing Swift code out there, I think some Publisher<String, Never> (or however we decide to spell it) will be pretty common. Perhaps I am not fully considering all of the usage here, but for app developers using SwiftUI, I would guess that is a use case that may be more pervasive than some Sequence<Int>, since [Int] is a really easy, approachable, and useful type. In no way am I saying that some Sequence<Int> is not a great improvement here, but I feel that the usage patterns for things like the Swift Composable Architecture, RxSwift, Combine, and AsyncSequence all seem to gravitate into the camp of "APIs that REALLY lean heavily on generics". Those things are perhaps not horribly advanced tools that folks use.

As for Combine and the Swift Composable Architecture, those are the double generic cases where the signatures would make sense for two primary generic types; you can't have a publisher without expressing both its Output and Failure. I would be quite happy if the result of generalized opaque types is that no one ever has to write .eraseToAnyPublisher() ever again. Not only for the performance implication but also for the readability improvement. As it stands, folks have to write compositional operators with Combine by a fairly complicated dance (including making types that implement Publisher and such) just to avoid that .eraseToAnyPublisher().

For example (only slightly fictitious), here are four options:

The very verbose route (but it gives some flexibility, since the inner behavior is controlled by the receive(subscriber:) method):

extension Publisher {
  func nonRepeatingMap<T: Equatable>(_ transform: @escaping (Output) -> T) -> NonRepeatingMap<Self, T> {
    NonRepeatingMap(self, transform: transform)
  }
}

public struct NonRepeatingMap<Upstream: Publisher, Output: Equatable>: Publisher {
  public typealias Failure = Upstream.Failure
  let upstream: Upstream
  let transform: (Upstream.Output) -> Output

  init(_ upstream: Upstream, transform: @escaping (Upstream.Output) -> Output) {
    self.upstream = upstream
    self.transform = transform
  }

  public func receive<Downstream: Subscriber>(subscriber: Downstream) where Downstream.Input == Output, Downstream.Failure == Failure {
    upstream.map(transform).removeDuplicates().subscribe(subscriber)
  }
}

or the less verbose but explicit type route... (which means that you can't change anything later on since it is now exposed as ABI of the signature)

extension Publisher {
  func nonRepeatingMap<T: Equatable>(_ transform: @escaping (Output) -> T) -> Publishers.RemoveDuplicates<Publishers.Map<Self, T>> {
    map(transform).removeDuplicates()
  }
}

or the type eraser route... (which hides the ABI nature behind an eraser at the cost of some performance)

extension Publisher {
  func nonRepeatingMap<T: Equatable>(_ transform: @escaping (Output) -> T) -> AnyPublisher<T, Failure> {
    map(transform).removeDuplicates().eraseToAnyPublisher()
  }
}

or the opaque type route... (which is just as performant as the explicit!)

extension Publisher {
  func nonRepeatingMap<T: Equatable>(_ transform: @escaping (Output) -> T) -> some Publisher<T, Failure> {
    map(transform).removeDuplicates()
  }
}

As a library author I would say that the last option is quite nice. For the case of Publisher I feel that type really makes sense to have two primary generic characteristics.
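The same trade-off can be tried without Combine. The following is a rough analogue on Sequence, assuming Swift 5.7 or later for the some Sequence<T> spelling. The eager, array-based body is purely illustrative and is not how Combine's operators work.

```swift
extension Sequence {
    // Callers see only "some Sequence of T"; the concrete type ([T] here)
    // stays an implementation detail that can change later.
    func nonRepeatingMap<T: Equatable>(_ transform: (Element) -> T) -> some Sequence<T> {
        var result: [T] = []
        for element in self {
            let mapped = transform(element)
            // Skip values equal to the one emitted immediately before.
            if result.last != mapped {
                result.append(mapped)
            }
        }
        return result
    }
}

let deduplicated = Array([1, 1, 2, 2, 3, 1].nonRepeatingMap { $0 * 10 })
```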

Personally, I consider Sequence to only have one primary generic characteristic: the element. Would Collection then inherit the primary-ness of Sequence's element? If so, would the signature then be some Collection<String, .Index = Int>? It might be interesting to consider that primary characteristics cannot be elided, but if you had some Collection<String>, wouldn't the Index then be some Comparable? If so, that means that the signature some Collection<String> is just shorthand for some Collection<String, .Index = some Comparable>.

That exercise means to me that the "primary" generic characteristics are not primary (as in singular) but instead they are "required" generic characteristics. This falls in line with Publisher which has two distinct characteristics; the Output and the Failure.


I think I got slightly confused by the fact that you mentioned the Result type parameter clause option in the original post, which I read as a way to put in the same ballpark the opaque syntax and the "reverse generics" one, like they were 2 possible options to achieve the same goal. Also, I don't fully understand if we're only talking about result types, a.k.a. types in the result position, or opaque types in general, either parameter or result.

It seems to me that without explicitly naming type parameters, we'll need to "scope" the constraints in some way. Using the angle brackets is an option, but a possible alternative, which would allow us to reserve the angle brackets for the primary associated types, could be to use parentheses for the arbitrary constraints:

func foo(_ collection: some Collection<Int>(.Index == Int)) -> some Collection<Int>(.Index == Int)

This wouldn't allow to express cross-parameter constraints, but we should then understand what level of generality we want for this before going back to explicit type parameters.

But even if we want to focus our attention only on common use cases – less common than those where only the primary associated types are involved, but more common than complex cases where cross constraints on type parameters are involved – there would still be some confusion due to the fact that on the "parameter" side of things we always have the more powerful syntax at our disposal – named parameters in angle brackets – so the reasoning about a lightweight syntax with some and any on that side would actually be about syntactic sugar for relatively simple cases; while on the result side of things it wouldn't really be sugar, at least for now, because we don't have reverse generics in the language, thus we would be compelled to achieve the highest possible expressive power with generalized opaque and existential type constraints (thus, for example, the interest in adding named parameters to the some syntax).

I agree! The "primary associated type" pitch has been updated to support more than one primary associated type.

Yes, the "result type parameter clause" idea is a more general version of opaque result types, i.e. "reverse generics". Arbitrary constraints are already possible for opaque types in parameter position by using explicit type parameters, so I intended for this discussion to be focused on opaque result types, specifically.

Personally, I think the level of generality before moving to explicit type parameters in angle brackets should be exactly what's in the primary associated type pitch. Of course, others might disagree 🙂 I explained why I think that's the case in my post here:

And John's post here explains the "grand vision" in more detail:

In an earlier discussion on the same topic, @beccadax proposed an alternative spelling which I quite like. The result type parameter declaration doesn't have to be in the middle of the function signature if we annotate it with some prefix at its regular position.

Okay, I was just thinking in terms of function declarations, but this helped me understand that there's a larger picture here. If I understand correctly, there's a desire to build a syntax for declaring constraints on opaque and existential types throughout the language. To check that I understand that desire, I'd like to sketch out a few scenarios.

(I'll use the some Collection C syntax as a strawman for now, acknowledging that arguments have been made against it, but I need to use some syntax to illustrate the ideas.)

// Given this:
protocol Strawman {
  associatedtype Input
}

We need a syntax that can be used in all of the following scenarios:

  • Function parameters

    func example1(_ x: some Strawman A)
      where A.Input == Int
    
    func example2(_ x: any Strawman A)
      where A.Input == Int
    
  • Return types

    func example3() -> some Strawman A  
      where A.Input == Int
    
    func example4() -> any Strawman A  
      where A.Input == Int
    
  • Stored properties

    struct Example5 {
      var x: any Strawman A where A.Input == Int
    }
    
    struct Example6 {
      var x: some Strawman A where A.Input == Int
    }
    

    (There was some question whether supporting Example6 is a desirable feature; see Holly's post here. I'm just showing how it might be spelled if we did want to support it.)

  • Structural return types

    func example7() -> (some Strawman A, some Strawman B)
      where A.Input == B.Input 
    
    func example8() -> (any Strawman A, any Strawman B)
      where A.Input == B.Input
    
  • Are there other cases I missed?

As mentioned before, the goal of any syntax proposed here is not to replace the existing <T> syntax; rather, we're just trying to find a syntax that allows us to put constraints on some and any types. Some of the above examples could be written today using regular generics (specifically, examples 1 and maybe 6), but most of the above are currently inexpressible. Likewise, it is not a requirement that everything you can do with generics be expressible via constraints on opaque types.
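For comparison, this is roughly how the two cases mentioned above (example 1 and Example6) can be spelled with regular generics today. IntPrinter is a made-up conforming type, and the consume requirement is a hypothetical addition to the Strawman protocol so that the example has something to call.

```swift
protocol Strawman {
    associatedtype Input
    func consume(_ input: Input) -> String // hypothetical requirement
}

// Made-up conforming type; Input is inferred as Int.
struct IntPrinter: Strawman {
    func consume(_ input: Int) -> String { "got \(input)" }
}

// example1, spelled with an explicit generic parameter.
func example1<A: Strawman>(_ x: A) -> String where A.Input == Int {
    x.consume(42)
}

// Example6, spelled as a generic struct. Note that A becomes part of
// the struct's own type, which a stored `some` property would have hidden.
struct Example6<A: Strawman> where A.Input == Int {
    var x: A
}

let direct = example1(IntPrinter())
let stored = Example6(x: IntPrinter()).x.consume(7)
```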

I'll also note that the "primary associated types" proposal could simplify many of the examples here, but does not replace the need for a general syntax, because a protocol can have many associated types which are not primary associated types.


With all of that written out, I'd like some clarification. Many of the suggestions earlier in this thread talk only about opaque result types, but those don't seem like they'd work with existential types. For example, using the "angle brackets after the arrow" syntax, how would you write something that returns a constrained existential type, like example4 above?

func whatDoWeDoHere() -> <T> any Strawman where ???

Likewise, what would a constrained existential member look like, especially in the case where there may be a naming conflict with an external struct?

struct S<Input> {
  var x: any Strawman A where A.Input == Input

  // If we don't support naming `any Strawman`, what can you
  // reference in a where clause?
  var y: any Strawman where ???
}

It's not clear to me that any other syntax has been proposed which can handle all the use cases suggested above.


I don't think it's possible, or even desirable, to define a single syntax that could work both in a context where the where clause clearly refers to, and only to, a specific parameter, and in a context where where introduces a list of constraints that refer to several parameters at once.

Your examples 5 and 6 show a case where the where clause unambiguously refers to a specific parameter, so there's no real need to name it:

struct AltExample5 {
  var x: any Strawman where Input == Int
}

struct AltExample6 {
  var x: some Strawman where Input == Int
}

This breaks if we add structure, for example with tuples or some other type with generic parameters:

struct AltAltExample5 {
  var x: (any Strawman, any Foo) where Input == Int // ambiguous
}

struct AltAltExample6 {
  var x: Result<some Strawman, some Foo> where Input == Int  // ambiguous
}

which means, to me, that we could introduce named parameters only if needed, that is, to remove ambiguity when it presents itself. The compiler could help here, with an error that clearly suggests to introduce named parameters when needed.

In case of functions, the situation seems very similar, with the only difference that having more parameters, thus potential ambiguity, is simply more likely to happen. In theory, your first 2 examples could work without named parameters:

func example1(_ x: some Strawman)
  where Input == Int

func example2(_ x: any Strawman)
  where Input == Int

but once we add more parameters or return types, either because we have both input and output generics, or due to "structural" opaque return types, the only way to resolve ambiguity would be to name them, unless we attach a where clause to every single parameter for which additional constraints are declared.

But naming parameters is really only needed if we want to:

  • put together constraints for multiple type parameters in a single place;
  • cross-reference type parameters when declaring constraints;

If we decide for a smaller, simpler goal, and leave "total generality" to explicit declaration of type parameters in angle brackets, we can think about ways to attach additional constraints to some and any type parameters directly, without a detached where clause.

For example, some Strawman<.Input == Int> has been proposed several times, but I'm not a fan of it for reasons I laid out above. some Strawman(.Input == Int) could be interesting, also:

  • some Strawman(Input == Int);
  • some Strawman(where Input == Int);
  • (some Strawman where Input == Int).

Sure, if a name is not needed, it could perhaps be elided. I'll note, though, that even when there's only a single some/any, the where clause could still be ambiguous if it could reference an enclosing context:

struct ChemistryExample<Element> {
  // Is 'Element' here Collection.Element or ChemistryExample.Element?
  var x: any Collection where Element == String
}

Eliding the name where not needed may be acceptable, but it creates another "stopping point" along the continuum, which means it's another syntax the user needs to learn. I'll grant that it's a fairly intuitive one, so that may be fine.

Part of my argument here is that explicit declaration of type parameters in angle brackets does not give total generality, especially in the case of existential constraints… or at least, I haven't seen anyone demonstrate how it would. How would you declare something like my example8 using angle brackets? Or how would you do something like this?

func doThisWithAngleBracketsPlease(x: any Publisher A, y: any Publisher B)
  where A.Input == B.Input, A.Output == B.Output

The conversion to generics would be fairly straightforward if these were using opaque some types, but how would you do it while accepting existential any types?
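As a sketch of that conversion for the opaque case: with some parameters, the signature is just sugar for the generic one below. Combine's real Publisher has Output and Failure rather than Input, so this example uses a hypothetical Stage protocol with Input and Output to mirror the constraints as written above; Doubler, Incrementer, and runBoth are likewise invented for illustration.

```swift
// Hypothetical stand-in protocol with Input and Output associated types.
protocol Stage {
    associatedtype Input
    associatedtype Output
    func run(_ input: Input) -> Output
}

struct Doubler: Stage {
    func run(_ input: Int) -> Int { input * 2 }
}

struct Incrementer: Stage {
    func run(_ input: Int) -> Int { input + 1 }
}

// Generic spelling of the cross-parameter constraints.
func runBoth<A: Stage, B: Stage>(x: A, y: B, input: A.Input) -> (A.Output, B.Output)
    where A.Input == B.Input, A.Output == B.Output
{
    (x.run(input), y.run(input))
}

let results = runBoth(x: Doubler(), y: Incrementer(), input: 10)
```

Here A and B are bound to concrete types at each call site, which is what makes the where clause expressible; the open question is what plays that role when x and y are existential values.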

Or how would you do this?

struct S<Input> {
  func moreAngleBracketsPlease() -> (any Publisher A, any Publisher B)
    where A.Input == Input,
          B.Input == Input,
          A.Output == B.Output
}

I just don't see how adding angle brackets is even supposed to solve this kind of problem.

(Edit: I adjusted my code examples slightly for clarity)


Swift actually still has the remnants of an older syntax for existentials, which uses angle brackets: protocol<…>. You can see it in the definition of Any as a typealias for protocol<>.

Since it’s actually a currently available syntax (very deprecated but still understood by the compiler), it could be resurrected and expanded for this use case.

That's true. Do you have any thoughts on what that might look like? I'm not seeing how protocol<> would actually help here, other than "it has angle brackets". Specifically, how would you add a constraint between multiple types, as in this example?

I can't see how you'd do it in this form…

// Where do I put `A.Input == B.Input`?
func attempt1(x: protocol<Publisher>, y: protocol<Publisher>)

Maybe something like this?

func attempt2<T, U>(x: T, y: U) 
  where T: protocol<Publisher>,
        U: protocol<Publisher>,
        T.Input == U.Input,
        T.Output == U.Output

But now we're still not getting anything from the angle brackets; it would probably be clearer to write it like this:

func attempt3<T, U>(x: T, y: U) 
  where T: any Publisher,
        U: any Publisher,
        T.Input == U.Input,
        T.Output == U.Output

Note that I'm using T: any Publisher, but that's kinda odd, because we aren't passing in a subtype of any Publisher; we'd always be passing in exactly any Publisher; we just have some additional restrictions on it. So maybe it should be T == any Publisher there. But in either case, is this a desirable direction? It still doesn't support constraints on return types, so it seems to lack generality.

In addition to Holly's post, I tried to lay out the larger picture here.


I did see that, but I think it took me a while to really digest it. I did have some questions.

(Emphasis added.)

This seems to suggest that adding a where clause here is not what you're hoping to achieve. The paragraph is specifically in the context of proposing the some Collection<Int> syntax, though, and I think what you were saying is not that where clauses in general are bad, but rather that adding the lightweight same-type syntax proposed there makes generics and same-type constraints feel more unified. Is that correct?

As you said later, the same-type constraint syntax handles case #2, and this thread (I believe) is more about case #1. I wanted to clarify, though. You said that we want "a fully general syntax that can express any constraint that a generic signature with a single type parameter could". I'm not sure I entirely understand what you mean there. Does that mean that we should not be focusing on cases where we need to constrain multiple parameters to have the same types? For example, I've used examples like this quite a bit:

func example(any Strawman A, any Strawman B)
  where A.Input == B.Input

There are two different existentials here, and we're constraining them to each other. It seems desirable to me to be able to express this kind of constraint. But this kind of constraint cannot be expressed by a generic signature at all right now. Am I tilting at the wrong windmills in this thread by trying to address this use case? The syntax proposed here does seem to address "Generalized opaque and existential type constraints" as per the title of the thread, but does not specifically align them with generics.

I'm not sure how you would actually construct two values to pass to this example() function here. It seems like it might be difficult for the type checker to reason about this kind of code.

Assume the same-type constraint syntax is accepted, would this do it?

func example(any Collection A, any Collection B)
  where A.Element == B.Element { ... }

func takeInts(a: any Collection<Int>, b: any Collection<Int>) {
  example(a, b) // Marker
}

I would assume that at the marker, the type checker can see that both parameters have Element == Int.
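This can be approximated in Swift 5.7 or later, assuming both constrained existentials (any Collection<Int>) and implicit opening of existential arguments are available. Since the any Collection A spelling does not exist, the sketch writes example as an ordinary generic function; its body, its return type, and the Equatable requirement are invented for illustration.

```swift
// Generic stand-in for the proposed signature.
func example<A: Collection, B: Collection>(_ a: A, _ b: B) -> Int
    where A.Element == B.Element, A.Element: Equatable
{
    // Count positions at which both collections hold equal elements.
    zip(a, b).filter { pair in pair.0 == pair.1 }.count
}

func takeInts(a: any Collection<Int>, b: any Collection<Int>) -> Int {
    // Each existential is opened to its underlying type at the call;
    // both are known to have Element == Int.
    example(a, b) // Marker
}

let array: [Int] = [1, 2, 3]
let set: Set<Int> = [1]
let matches = takeInts(a: array, b: set)
```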


Yes. I'm saying that adding a fully general where-like syntax doesn't by itself achieve our goals, because that syntax will necessarily look different from simple generic arguments, and so it will fail at the critical goal of establishing stronger ties between the features.

I mean that you can't meaningfully express those things in something that's contained within a single type. That sort of link ought to require something at a wider scope. Your example has a where clause that applies across the entire function signature, for example. Or you might express it like this (using one particular syntax that I know has been proposed, without meaning to imply anything about my own preferences):

func example<T>(any Strawman<.Input == T>, any Strawman<.Input == T>)

Unless I'm mistaken, A and B here are the same type (i.e., identical existential boxes).

If we had, say, a DictionaryProtocol<Key, Value>, then I could make sense perhaps of a function that takes two values of distinct types A and B where, say, A: DictionaryProtocol, B: DictionaryProtocol, A.Key == B.Key, A.Value == String, B.Value == Int. But since these aren't all constrained within a single type, existing features can be composed to express this:

func f<K>(_: any DictionaryProtocol<K, String>, _: any DictionaryProtocol<K, Int>)

You're right; those are likely the same type. A better example would be something like this:

func example(a: any Publisher A, b: any Publisher B)
  where A.Output == B.Input

If we accept the "light-weight same-type constraint" syntax pitched elsewhere, and if Publisher adopted it, then this could be expressed like so:

func example<T, U, V>(a: any Publisher<T, U>, b: any Publisher<U, V>)

T and V are somewhat distracting here, though, if we actually don't care about the types. So maybe it could be expressed like this:

func example2<U>(a: any Publisher<_, U>, b: any Publisher<U, _>)

But again, this only works for protocols that choose to adopt the "primary associated types" pitch that allows generic parameters to follow a protocol. If we want this to work with protocols that don't adopt that, then we need some way of attaching additional constraints to individual parameters. As far as I can see, that either requires attaching the constraints directly to the type (e.g. any Publisher<.Input = T>) or it requires being able to reference them in a where clause, and doing that pretty much requires that we be able to assign names to the various types.

Having expressed these constraints, what would you proceed to do with these types or values? I’m having trouble imagining how to work with this.