Generalized opaque and existential type constraints

hborla · February 22, 2022, 4:11am

Constraints on opaque and existential types are currently limited to protocol and protocol composition requirements on the underlying type. More advanced requirements, such as constraining an associated type, are not yet supported in the language. The pitch "[Pitch 2] Light-weight same-type requirement syntax" proposes adding a limited form of same-type requirement on "primary" associated types, but that feature is not a replacement for a more general syntax.
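One workaround that compiles today is to bundle the associated-type constraints into a helper protocol and return `some` of that protocol. The sketch below is a minimal illustration under that assumption; `IntIndexedIntCollection` and `evenNumbers(upTo:)` are hypothetical names:

```swift
// Hypothetical helper protocol that bundles the associated-type constraints
// which cannot currently be written directly on `some Collection`.
protocol IntIndexedIntCollection: Collection where Index == Int, Element == Int {}

// Array<Int> satisfies both constraints, so it can conform conditionally.
extension Array: IntIndexedIntCollection where Element == Int {}

// Hypothetical function: the opaque result exposes Index == Int and Element == Int.
func evenNumbers(upTo limit: Int) -> some IntIndexedIntCollection {
    Array(stride(from: 0, through: limit, by: 2))
}

let evens = evenNumbers(upTo: 8)
// Integer arithmetic on indices works because Index == Int is visible to the caller.
let second = evens[evens.startIndex + 1]
```

The cost is one new protocol per combination of constraints, plus a conformance for every concrete type that should satisfy it, which is the boilerplate a general constraint syntax would remove.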

This post outlines a few ideas for expressing more sophisticated constraints on opaque and existential types, along with their implications and tradeoffs. This is not an exhaustive list; other ideas and brainstorming are welcome!

Constraints in a normal where clause

A seemingly obvious place to express constraints on opaque result types is in a trailing where clause, along with other constraints on the input type parameters.

func test() -> some Collection where Index == Int

This syntax could apply to both opaque and existential types. However, this syntax has two glaring ambiguities that, in my opinion, make this option unviable:

It’s not clear whether the requirements in the where clause apply to the opaque type or the enclosing function. This creates an ambiguity between type parameters in scope, and associated types of the opaque result type.

Requiring a leading dot to reference associated types of the opaque result type would resolve the ambiguity between associated types and type parameters in scope, but the dot is very easy to forget, and forgetting it constrains the wrong type whenever one of the input type parameters or associated types has the same name. Such name collisions are very common with Collection.Element. This also doesn't offer a solution to the second ambiguity:

There’s no way to disambiguate which associated type you want to constrain when you have multiple opaque types with associated types of the same name, e.g. (some Collection, some Collection) where Index == Int

We could allow where clauses in parentheses to disambiguate, and require a distinct where clause for each opaque or existential type, but this could lead to extremely verbose function signatures with an arbitrary number of explicit where clauses, such as (some Collection where Index == Int, some Collection where Index == Int) where Element: Comparable.
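For comparison, the structural form that SE-0328 makes legal today (assuming Swift 5.7 or later, so that primary associated types are also available) treats each `some` as an independent type: callers learn the element type, but nothing about `Index`, and nothing about whether the two halves share a type. `evensAndOdds` is a hypothetical name:

```swift
// Hypothetical function. Each `some Collection<Int>` is a separate opaque type;
// only Element == Int is promised to callers.
func evensAndOdds(_ values: [Int]) -> (some Collection<Int>, some Collection<Int>) {
    (values.filter { $0 % 2 == 0 }, values.filter { $0 % 2 != 0 })
}

let (evens, odds) = evensAndOdds([1, 2, 3, 4, 5])
// Iteration works because Element == Int is known...
let evenSum = evens.reduce(0, +)
let oddSum = odds.reduce(0, +)
// ...but `evens[0]` would not compile: Index is opaque,
// and the two results cannot be assumed to share a type.
```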

Constraints in angle-brackets

A very popular suggestion is to write constraints on associated types in angle-brackets directly on the opaque or existential type, using leading-dot syntax to refer to associated types:

func test() -> some Collection<.Index == Int> { ... }

This syntax looks a lot like a where clause, but it can express less than a general where clause can. For example, this syntax does not allow expressing relationships between multiple opaque types, e.g. a pair of opaque types that are statically the same type.

It’s also not clear where this syntax would be supported. If it’s only allowed on opaque or existential types, i.e. types declared with some or any, it could be confusing that you can’t use this same syntax in other places where associated types are constrained, e.g. extension Collection<.Index == Int> { ... } and other contexts where the type parameter constrained to the protocol has a name. If you can use it everywhere, then we’ve just added two different ways to write constraints on a single type parameter that have almost the same syntax:

extension Collection where Element == Int { ... }

extension Collection<.Element == Int> { ... }

It doesn't seem great to have two nearly-identical ways of writing the same thing. This would also create further distance between requirements on associated types versus type parameters, because presumably this would be invalid:

extension Array<.Element == Int> { ... } // error!

Result type parameter clause

Without a way to name an implicit type parameter declared by some, enabling fully generalized constraints on opaque result types requires named result type parameters:

func test() -> <C> (C, C) where C: Collection, C.Index == Int { ... }
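When the caller chooses the type, ordinary generics can already express both the shared type and the `Index` constraint; the result type parameter clause would mirror this for types chosen by the callee. A minimal sketch with a hypothetical `duplicate` function:

```swift
// Hypothetical function. C is caller-chosen: both tuple elements are statically
// the same type, and C.Index == Int is part of the contract.
func duplicate<C>(_ collection: C) -> (C, C) where C: Collection, C.Index == Int {
    (collection, collection)
}

let source = [1, 2, 3]
let (first, second) = duplicate(source)
// Same static type, so `==` is available; Index == Int allows integer offsets.
let same = first == second
let middle = first[first.startIndex + 1]
```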

The first issue with this syntax is that it’s unclear how it would apply to existential types because there isn’t currently a way to name a type parameter and explicitly erase it while maintaining its requirements.

Next, this syntax is pretty inscrutable, especially in combination with input type parameters:

func groupedValues<C>(in collection: C) -> <Output> (even: Output, odd: Output)
  where C: Collection, C.Element == Int, Output: Collection, Output.Element == Int
{ ... }
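Today, the only way to promise that both halves share a type with `Element == Int` is to commit to a concrete type such as `[Int]`, which is exactly the implementation detail an opaque result would hide. A minimal sketch of that fallback; the body is an assumption, since only the signature is given above:

```swift
// Input side is generic; output side is a concrete tuple of arrays, because
// opaque results cannot currently state "same type, with Element == Int".
func groupedValues<C>(in collection: C) -> (even: [Int], odd: [Int])
    where C: Collection, C.Element == Int
{
    var even: [Int] = []
    var odd: [Int] = []
    for value in collection {
        if value % 2 == 0 { even.append(value) } else { odd.append(value) }
    }
    return (even, odd)
}

let range = 1...6
let grouped = groupedValues(in: range)
```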

In addition to having another type parameter declaration clause in the middle of the function signature, untangling requirements between the input and output generic signatures specified in the same where clause adds implementation complexity and cognitive load for programmers writing and using such a declaration. Further, this extremely verbose syntax could “infect” callers that need to ascribe a type to the result of calling the function, e.g. to disambiguate between overloads, requiring a full generic signature in the type annotation:

let grouped: <Output> (even: Output, odd: Output) where Output: Collection, Output.Element == Int = groupedValues()

The generality of this syntax is great, but it would most likely be extremely onerous and confusing, as similar issues with regular generic signatures already demonstrate today.

Requirement inference

SE-0328: Structural opaque result types made the deliberate decision not to support requirement inference on opaque types. However, enabling requirement inference for opaque types unlocks the full expressivity of a where clause on the opaque type via a generic type alias. For example:

typealias IntegerIndexed<C> = C where C: Collection, C.Index == Int
func test() -> IntegerIndexed<some Collection> { ... }

Requirement inference offers a solution to the "call-site infection" mentioned above, because the type alias can be used in a type annotation rather than writing the generic signature directly. However, this pro is also a con: additional constraints cannot be expressed at the declaration of the opaque type, so programmers must look up the typealias in order to understand the API contract of an opaque result type. Finally, constraint inference via generic type aliases doesn't offer a solution for existential types.
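Requirement inference already exists for named generic parameters; the typealias idea would extend it to `some` types. In the sketch below (`mostFrequent` is a hypothetical name), `T: Hashable` is never written: the compiler infers it from `Set<T>` in the signature, which is what lets the body build a `[T: Int]`:

```swift
// Hypothetical function. No explicit `T: Hashable` constraint is written;
// it is inferred because Set<T> in the signature requires it.
func mostFrequent<T>(in values: [T], ignoring excluded: Set<T>) -> T? {
    // Dictionary keys require Hashable, which T has via inference.
    var counts: [T: Int] = [:]
    for value in values where !excluded.contains(value) {
        counts[value, default: 0] += 1
    }
    // Returns the key with the highest count, or nil if nothing was counted.
    return counts.max { $0.value < $1.value }?.key
}
```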

Please let me know your thoughts on the ideas above, and any other ideas that I didn't think of!

xwu · February 22, 2022, 4:42am

To my simplistic mind, a general syntax for an advanced feature has quite different design constraints from what we've been discussing recently in other related threads—those are by contrast aimed (in large measure, at least) at improving the ergonomics of the most common uses by achieving the right amount of "lightweightness" intentionally with loss of generality.

With that framing, here's my take:

I agree very much with your assessment that the "result type parameter clause" syntax has great generality. Thus, it seems to optimize for exactly the right things—namely, generality and, implicit in that statement, a certain amount of predictability because it's generalized from a feature that users already know and use. Indeed, in other threads some of us have been trying to explain to each other what the lightweight syntax means in certain tricky spots by "desugaring" that syntax into this more verbose form that isn't even a real thing yet for Swift: to me this is a great testimony to its usefulness. That the syntax can be onerous due to verbosity is significantly less of an issue for what we're optimizing for, and doubly so since we've got that other conversation about a more lightweight counterpart to this feature.

I think your critique of "constraints in angle brackets" is spot on as well and, in contrast to the verbosity shortcoming of the alternative, quite a big strike against it: "It doesn't seem great to have two nearly identical ways of writing the same thing." Put another way, this syntax isn't a generalization of an existing feature but rather its own idiosyncratic spelling; it duplicates rather than builds upon what users already know. Yes, it does manage to be more lightweight than the alternative mentioned above (but of course not as lightweight as the lightweight syntax in the other pitch), but at the cost of generality (you can't name the type parameter even if you want to) and duplication. This seems to run counter to what we're trying to optimize for.

Your point that it's unclear how result type parameter clause syntax would apply to existential types is an issue; I suspect with some thought this too would be achievable, and some of how we approach the discussion is surely contingent on how certain it is that the underlying support for existential boxes with such advanced combinations of requirements is coming.

Philippe_Hausler · February 22, 2022, 6:22am

One question to perhaps highlight here that might be of use: What is the primary use case of what this type of syntax is supposed to solve? In other words, what usages are the target audience?

For example - in a project I am working on one generic syntax ended up having a total of 14 types nested over 7 layers deep. A redacted form (and even shortened it is a beast) A<B<C<D<E<F, G<F.H>>, I<J<G<F.H>>>>, E<K, G<F.H>>>>, L<M>>, N>. In truth this really boiled down to a type that could have been expressed with a conformance to a protocol with two associated types being specified. We ended up using a type alias to solve it but spelling that out means that if we ever need to change anything it is an entire ordeal to do so.

That anecdote is a long way of saying that even the longer form where clauses are preferable to the alternative of needing to spell everything out, and even more preferable to exposing those types for folks to have to consume. I would much rather have to write some X<.Y == Foo, .Z == Bar> than that monstrosity previously illustrated, or even needing to write out some more complex where-clause is preferable to the nearly unreadable signatures.

Any technical hindrance to shortening (such as losing effects) is, in my view, quite important; almost as important as getting those associated types bound correctly, or performance being top notch.

If nothing is available to solve those effects, either a) folks will just operate under the case where the effect is presumed and the meaning of said effect is then devalued or b) folks will be pushed to make erasers which can hinder performance (and maybe even correctness).

Let's take the case where a non-zero fraction of developers takes choice a); that means code will be written presuming that some types (even more advanced ones) never ferry effects, and always presuming the worst case. Later on, we cannot require a specification of that effect unless we default the effects to be enforced. By choosing to avoid effects now, we let our fate be chosen for us later down the road. For something that has an effect of throws, the default must then be throws; for something that has an effect of async, the default must then be async. Honestly, I am not convinced that should unilaterally be the case; as a matter of fact, I would say that the default is the other way around for throws: things should ONLY throw if throwing is a specified effect.

For a more concrete example (which I have previously raised concerns about) take AsyncSequence. The question is then: What are the potential valid effects of an AsyncSequence? Is it just throws? Is it throws and nothrows? Is it throws, nothrows, async, and noasync? No matter the choice that is made about type constraints, the decision on what is default will be made for us if we don't choose how effects play into this.

It was suggested before that @rethrows (when finally formalized, though perhaps it should already be considered de facto formalized due to the existence of AsyncSequence) would create a second synthetic typealias, Failure, that could be either Error or Never. If that pseudo "typed throws" is the solution, can we have a type specify only part of its signature as a type constraint? E.g. would it even make sense to have a some DictionaryProtocol<.Key == String>? Could we later, without breaking ABI or source, permit some DictionaryProtocol<.Key == String, .Value == Int>?

Long/short is that there are a number of types and use cases that really could benefit from a fully operational opaque type constraint system that includes effects even if the actual spelling of which is a bit longer than a short hand of some Foo<Bar>.

Hopefully these concerns can help find a solution that works for more complex types as well as effects.

ExFalsoQuodlibet · February 22, 2022, 11:02am

I think that, overall, the Result type parameter clause is the best, for several reasons:

expressive power;
same syntax as named input parameters;
allows putting the where clause at the end and listing all constraints in a single place.

The issues that you mention seem to me, on average, pretty minor, – save for the impossibility to apply it to existential types (I rarely use them so I never really thought about this issue) – if compared to its power and generality. I think that, even if we define some sugar for it, adding type parameters to the function outputs (a.k.a. "reverse generics") is inevitable. Of course if the need to have the same feature for existential types is proven to be essential for the evolution of the language, then this option doesn't seem viable.

So let's assume that we want to push for opaque type declarations with some, and want to give them as much expressive power as possible.

The Constraints in a normal where clause option seems clearly not expressive enough, and prone to ambiguities.

I don't personally like the Constraints in angle-brackets option, for the reasons you mentioned, and for the fact that I'd like, eventually in Swift, to be able to name the generic parameters that are being specialized in all places where angle brackets are used: what I mean is that, for example, if I have a (concrete) type like Reducer<State, Action, Environment>, I'd love to be able to write something like Reducer<State == [String: String], Action == String, Environment == Void> to clarify, at call site, which type is associated to which type parameter. This would also apply to protocols with "primary" associated types.

The Requirement inference option is interesting, but suffers from several issues you mentioned in the other options, and doesn't really scale when needing to add more type parameters to the typealias, for additional constraints.

In thinking about a different option, I'd like to keep the following requirements:

the option to have a single where clause with the list of constraints;
the possibility to express cross-type constraints.

It seems to me that these requirements can only be satisfied by explicitly naming each type, by assigning them a named parameter. But this "assignment" could be done differently, for example by declaring the name of the parameter in parentheses. This code

func test() -> <C> (C, C) where C: Collection, C.Index == Int { ... }

could be written like this:

func test() -> (some Collection(C), some Collection(C)) where C.Index == Int { ... }

This way the C.Index == Int constraint could be added in a second step without rewriting the whole function signature. This would also have the added benefit to constrain the 2 collections to have the same type.

This would play well enough with "primary" associated types

func test() -> some Collection<String>(C) where C.Index == Int { ... }

The progressive disclosure (my favorite Swift design principle) would be strong with this one. I could see a teaching path like:

some Collection

some Collection<Int>

some Collection<Int>(C) where C.Index == Int

and, in theory, this could also apply to the any case.
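The first two steps of that path are valid Swift today (the second assuming Swift 5.7's primary associated types); the third is only proposed syntax. A minimal sketch, with hypothetical function names, including the closest equivalent of step three for an input parameter:

```swift
// Step 1 (valid today): some collection; the element type stays hidden.
func numbersStep1() -> some Collection { [1, 2, 3] as [Int] }

// Step 2 (Swift 5.7+, primary associated types): Element == Int is exposed.
func numbersStep2() -> some Collection<Int> { [1, 2, 3] as [Int] }

// Step 3 (`some Collection<Int>(C) where C.Index == Int`) is only a proposal.
// For an input parameter, the closest spelling today names the generic parameter.
func secondElement<C: Collection<Int>>(of collection: C) -> Int? where C.Index == Int {
    // Index == Int allows offsetting from startIndex with integer arithmetic.
    collection.count > 1 ? collection[collection.startIndex + 1] : nil
}
```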

hisekaldma · February 22, 2022, 11:16am

ExFalsoQuodlibet:

It seems to me that these requirements can only be satisfied by explicitly naming each type, by assigning them a named parameter. But this "assignment" could be done differently, for example by declaring the name of the parameter in parentheses. This code
func test() -> <C> (C, C) where C: Collection, C.Index == Int { ... }
could be written like this:
func test() -> (some Collection(C), some Collection(C)) where C.Index == Int { ... }

I’ve been thinking about something similar, but without the parentheses:

func test() -> (some Collection C, some Collection C) where C.Index == Int { ... }

That feels very natural to me. But I can understand if it’s a pain to parse.

jayton · February 22, 2022, 12:00pm

One seemingly obvious approach that should be discussed, even if it can be shot down quickly, is to put “output types” in the regular angle brackets, with a sigil:

func groupedValues<C, out Output>(in collection: C) -> (even: Output, odd: Output) where ...

In addition to avoiding the “yet another parameter list in the middle” problem, it works nicely for continuations (should we add the requisite semantics):

func unzip<T, U, C: Collection<(T, U)>, out R1: Collection<T>, out R2: Collection<U>>
(_ collection: C) -> (R1, R2)

func unzipWithPyramidOfDoom<T, U, C: Collection<(T, U)>, out R1: Collection<T>, out R2: Collection<U>>
(_ collection: C, continuation: (R1, R2) -> Void)
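Without output type parameters, a version of `unzip` that compiles today has to commit to concrete result types. A minimal sketch returning arrays; the body is an assumption, not part of the signatures above:

```swift
// Caller-chosen T and U are inferred from the element type of C;
// the results are concrete arrays rather than opaque collections.
func unzip<C: Collection, T, U>(_ collection: C) -> ([T], [U]) where C.Element == (T, U) {
    var firsts: [T] = []
    var seconds: [U] = []
    firsts.reserveCapacity(collection.count)
    seconds.reserveCapacity(collection.count)
    for (t, u) in collection {
        firsts.append(t)
        seconds.append(u)
    }
    return (firsts, seconds)
}

let pairs: [(Int, String)] = [(1, "a"), (2, "b"), (3, "c")]
let (numbers, letters) = unzip(pairs)
```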

Edit: I actually have something of a principled argument for this, not just aesthetics.

As I understand it, the intuition behind using some for inward type parameters as well as outward ones is that they’re essentially the same thing, just with information flowing in different directions. If we believe that, then using the same generalized syntax (but annotating the direction of information flow) seems like the most coherent option.

jayton · February 22, 2022, 12:01pm

Incidentally, looking at this five-parameter monstrosity, my feeling is that the “inness” of C and “outness” of R1 and R2 feels like a lesser distinction than that between T and U as mere “carriers” of type identity vs. C, R1 and R2 as actual protocol constraints. One could imagine syntax to leverage this, such as:

func unzip(_ collection: some Collection<($T, $U)>) -> (some Collection<$T>, some Collection<$U>)

Jumhyn · February 22, 2022, 1:47pm

hisekaldma:

I’ve been thinking about something similar, but without the parentheses:
func test() -> (some Collection C, some Collection C) where C.Index == Int { ... }
That feels very natural to me. But I can understand if it’s a pain to parse.

I've always been attracted to the some Collection C where ... syntax due to the way it reads out almost as just the natural English I'd use to describe the signature, but I think the difficulty to figure out what the actual named parameters are makes it a non-ideal syntax. Especially so since I think this syntax reads better with terse, ~single letter generic parameter names, whereas Swift tends towards longer, more self-explanatory names for generic parameters.

ensan-hcl · February 22, 2022, 1:48pm

Thank you for the post!

After the acceptance of SE-0341, I'm inclined to think that the syntax of named generics should share the semantics of some. Let's consider a pseudo-syntax, <some T>. This is a named version of the some syntax, but it can provide the necessary and sufficient features.

Generic parameters and opaque result types correspond as follows. Of course, where constraints also work.

func foo(value: some Numeric) -> some Numeric
// corresponds to
func foo<some T: Numeric, some U: Numeric>(value: T) -> U

// works
func test<some C>() -> (C, C) where C: Collection, C.Index == Int { ... }

Furthermore, when T is shared between the parameter position and the result type position, we can understand it as follows.

func square<some T: Numeric>(value: T) -> T { value * value }
// corresponds to
func square<T: Numeric>(value: T) -> T { value * value }
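In today's Swift (assuming 5.7 or later for `some` parameters), the name is what lets a signature tie the result type to the argument type. A minimal sketch; `isZero` is a hypothetical name:

```swift
// Named generic parameter: T links the argument type to the result type.
func square<T: Numeric>(value: T) -> T { value * value }

// SE-0341: `some` in parameter position is an anonymous generic parameter.
// Without a name, the signature cannot say "the result has the argument's type",
// so this hypothetical function returns a concrete Bool instead.
func isZero(_ value: some Numeric) -> Bool { value == .zero }
```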

Though very simple, the merit of this <some T> syntax is huge.

First, learners would understand this syntax more easily than generics and the proposed 'result type parameter clause', because it is just a named version of some. Second, we can write reverse generic result types more naturally and fluently, which reduces the potential 'cognitive load' you raised as a problem with the 'result type parameter clause'. Finally, we could even replace the current syntax of generics with this <some T> syntax.

Currently, a 'generic result type' is allowed, as in badCreatePi. However, this usage of generics is not recommended, and goodCreatePi is the better API.

func badCreatePi<T: ExpressibleByFloatLiteral>() -> T { 3.14 }

func goodCreatePi<T: ExpressibleByFloatLiteral>(type: T.Type) -> T { 3.14 }
// corresponds to
func goodCreatePi<some T: ExpressibleByFloatLiteral>(type: T.Type) -> T { 3.14 }

The <some T> syntax does not provide stand-alone 'generic result types', so it cannot express badCreatePi, while it does support goodCreatePi. In this way, we can do all the 'good' things with the <some T> syntax.
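The difference is visible at the call site. In the sketch below (the `Double` and `Float` choices are arbitrary), `badCreatePi` gets its type only from the surrounding annotation, while `goodCreatePi` states it in the argument list:

```swift
func badCreatePi<T: ExpressibleByFloatLiteral>() -> T { 3.14 }
func goodCreatePi<T: ExpressibleByFloatLiteral>(type: T.Type) -> T { 3.14 }

// The caller must annotate the result; `let x = badCreatePi()` would not compile
// because T cannot be inferred.
let inferred: Double = badCreatePi()
// The metatype argument makes the chosen type explicit at the call site.
let explicit = goodCreatePi(type: Float.self)
```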

So, I'd like to suggest named some syntax like <some T> which fully supports generic parameters, opaque result types, and generic result types with the metatype arguments, should replace current syntax of generics.

Tino · February 22, 2022, 1:56pm

I don't like the idea of having two completely different ways to declare constraints, but I think this flexibility isn't hard to achieve:

func test<T>(input: some Collection<.Index == T>) -> some Collection<.Index == T> { ... }

Edit: You may have to add some more some to match the requested scenario... it would help to have a full example

hborla · February 22, 2022, 4:26pm

An example of "a pair of opaque [result] types that are statically the same type" is already included in the post:

There is no way to express this without naming the opaque type, or having some other way to refer to it, in order to use it twice in the tuple.

Ben_Cohen · February 22, 2022, 5:02pm

Part of the problem here continues to be the lack of any good example of when these constraints on opaque result types would be useful. Holly's post showed an example for the purpose of discussing syntax, but it is still artificial because we don't have that good canonical real world example.

Constraining an index to Int is not a good pattern in practice and not something you'd want to do when returning an opaque collection. There are also plenty of examples involving returning a tuple of collections where the collections happen to be the same. For example, unzip, or a method that splits a collection into two for even and odd elements. But in those cases, the fact those collections are the same type isn't necessarily useful to relay back as part of the result.

It would be a great service to the community if someone came up with a real example that could be used in these discussions.

edit: in fact unzip isn't even a good example because while the collection generic types might be the same, the elements would be different.

hborla · February 22, 2022, 5:03pm

ExFalsoQuodlibet:

It seems to me that these requirements can only be satisfied by explicitly naming each type, by assigning them a named parameter. But this "assignment" could be done differently, for example by declaring the name of the parameter in parentheses. This code
func test() -> <C> (C, C) where C: Collection, C.Index == Int { ... }
could be written like this:
func test() -> (some Collection(C), some Collection(C)) where C.Index == Int { ... }
This way the C.Index == Int constraint could be added in a second step without rewriting the whole function signature. This would also have the added benefit to constrain the 2 collections to have the same type.

I definitely like the benefits you outline here. There's something unsettling about not naming the type parameter upfront, before its use in the return type, but I'm having a hard time articulating why, and I'll admit it's probably due to my familiarity with the "usual" way of declaring type parameters.

However, I do think this syntax suffers the same problem of creating two very similar ways of accomplishing exactly the same thing for input type parameters. For example, I'd assume that both of these signatures would be allowed:

func evenValues<C: Collection>(in collection: C) -> [Int] where C.Element == Int

func evenValues(in collection: some Collection C) -> [Int] where C.Element == Int

I'm having a hard time justifying having two different syntaxes to accomplish the same thing when both of them are equally verbose.

rauhul · February 22, 2022, 5:03pm

Maybe one use case could be constraining Protocols that haven't adopted "primary" associated types?

hborla · February 22, 2022, 5:18pm

It's difficult to talk about use cases in the abstract - the ask here is a concrete, real world code example that would make use of these more advanced constraints that you cannot express today, and that would also not be covered by the primary associated types pitch.

Tino · February 22, 2022, 5:56pm

Maybe

func test<C: some Collection>() -> (C, C) where C.Index == Int { ... }

?
(although I wouldn’t like this, because it changes the meaning of the angle brackets)

ExFalsoQuodlibet · February 23, 2022, 8:02am

hborla:

However, I do think this syntax suffers the same problem of creating two very similar ways of accomplishing exactly the same thing for input type parameters. For example, I'd assume that both of these signatures would be allowed:
func evenValues<C: Collection>(in collection: C) -> [Int] where C.Element == Int

func evenValues(in collection: some Collection C) -> [Int] where C.Element == Int

I also don't find it great to have 2 ways of accomplishing the same thing, but it might be a natural and necessary stepping stone when a language is moving from a certain syntax to another for expressing something.

Opaque types in general suffer from this issue, because

func foo<C>(_ collection: C) -> Int where C: Collection

is equivalent to

func foo(_ collection: some Collection) -> Int

The spellings are not very similar, but we still have a situation where the exact same thing is expressed in two different ways.
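Both spellings compile side by side today (the second assuming Swift 5.7 or later); the function names are hypothetical:

```swift
// Spelling 1: named generic parameter with a where clause.
func countGeneric<C>(_ collection: C) -> Int where C: Collection {
    collection.count
}

// Spelling 2 (SE-0341): anonymous opaque parameter with the same meaning.
func countOpaque(_ collection: some Collection) -> Int {
    collection.count
}
```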

From the early days of Swift, expressing generic parameters in generic contexts has been done with the familiar (to programmers used to working with generics) syntax of "named parameters in angle brackets", which was a good choice, of course. But over the years things have changed, and we realized that we needed:

a better way to teach (and learn) and progressively disclose generics;
reverse generics;

If, for example, the need for reverse generics had arrived earlier, and the core team had pushed for it in 2015 or 2016, we would probably have ended up with angle-bracket type parameters in the return position. And if, hypothetically, we found ourselves now in the same situation of discussing generalized constraints on some and any declarations, we would have had 2 pieces of syntax to replace instead of 1. So, in a sense, we're lucky to have discovered that opaque types could be a better way to express generics before burdening the language with one more piece of syntax that could be replaced with a better alternative.

In a "full angle brackets" world, a function could be written like this:

func foo<InC>(_ collection: InC) -> <OutC> OutC where
  InC: Collection,
  InC.Element == Int,
  InC.Index == Int,
  OutC: Collection,
  OutC.Element == Int,
  OutC.Index == Int

I find this syntax familiar and useful, so I wouldn't take any issue with it, but I have to admit that the "no angle brackets" alternative looks better:

func foo(_ collection: some Collection<Int> InC) -> some Collection<Int> OutC where
  InC.Index == Int,
  OutC.Index == Int

In both cases, if one fully adopts the syntax, they can achieve a progressive inclusion of type constraints without major changes to the function signature. But the second syntax has extra powers:

it scales well to existentials;
it's easier and cleaner to adopt for simple cases and/or multiple generic parameters.

It also has a downside (not sure how major it is): with the first syntax, the difference between a generic parameter whose specialization is decided by the caller (normal generics) and a parameter whose specialization is decided by the function (reverse generics) is 100% explicit. The second syntax doesn't explicitly make this distinction. For example, consider a decode function:

func decode<T>(_ type: T.Type) -> T where T: Decodable

this function has a generic parameter in return position, but the specialization is decided by the caller. How would this be translated in the second syntax? Well, maybe

func decode(_ type: (some Decodable T).Type) -> T

which seems less clear than the regular version (but maybe just because I'm used to the regular one), and must use the plain T type parameter for the return value in order to clearly express the relationship.
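For reference, the caller-chosen form compiles today. The sketch below assumes Foundation's `JSONDecoder` as the concrete decoder and a hypothetical `Point` type; the extra `from:` parameter is added only so the function has something to decode:

```swift
import Foundation

// Hypothetical Decodable type for illustration.
struct Point: Decodable {
    let x: Int
    let y: Int
}

// T appears in the result, but the caller picks it by passing a metatype.
func decode<T>(_ type: T.Type, from json: String) throws -> T where T: Decodable {
    try JSONDecoder().decode(type, from: Data(json.utf8))
}

let point = try! decode(Point.self, from: #"{"x": 1, "y": 2}"#)
```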

So, there are pitfalls to consider if we go the path of "no angle brackets".

I don't personally think that having 2 similar ways to express one thing for a certain period of the language's evolution will be a major issue, but there could be some issues lurking around in the "no angle brackets" world. On the other hand, I don't think it's desirable to limit the some/any syntax to a watered-down version of the regular generics one, because we would still:

need to implement a fully-featured reverse generics syntax;
be limited in the way we can express constraints on existentials;
need to completely replace the signature of a generic function in case we need more specific constraints.

So, up to this point, it feels like the ability to somehow name opaque and existential parameters in order to express complex constraints is the way to go (and I notice that it reads very well out loud, too).

DevAndArtist · February 23, 2022, 11:43am

Never mind these questions

Was it anywhere mentioned why the reverse generics aka opaque types require the generic type parameter to be defined after the arrow?
Why would it be illegal to extend the existing list of generic type parameters and simply lookup their final position (if it's not used in parameter position, it defines an opaque type)?

// let's take this example for a spin
func foo() -> some Hashable

// why do we prefer this general syntax form
func foo() -> <T: Hashable> T
func foo() -> <T> T where T: Hashable

// over this one?
func foo<T: Hashable>() -> T
func foo<T>() -> T where T: Hashable

Do we need that extra list at the end of the function for something else?

jayton · February 23, 2022, 11:46am

This already has a meaning in Swift. The specific example can’t be implemented, but if Hashable had required initializers it could.
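To make that concrete: with a hypothetical protocol `Makeable` that refines `Hashable` and adds a required initializer, `foo<T: Makeable>() -> T` is already legal and means the caller picks `T`, so it cannot be reinterpreted as an opaque result:

```swift
// Hypothetical protocol: Hashable plus a required no-argument initializer.
protocol Makeable: Hashable {
    init()
}

extension Int: Makeable {}     // Int() produces 0
extension String: Makeable {}  // String() produces ""

// Existing meaning: T is chosen by the caller, here via the type annotation.
func foo<T: Makeable>() -> T {
    T()
}

let number: Int = foo()
let text: String = foo()
```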

DevAndArtist · February 23, 2022, 11:48am

I don't understand, what are you trying to even answer here? I picked a random protocol from the top of my head to not use imaginary P or Foo protocols. This seems irrelevant to any of my questions.

Last time I checked, Swift does not allow you to specify any generic type parameters in return positions without them being in any parameter positions.

EDIT: Oh wait, never mind my questions. You could infer the generic type like that from the outside. Yeah I guess I get why we need the secondary list at a different position.