Lifting the "Self or associated type" constraint on existentials

We already have type erasure in the language today, via Any (i.e. you could have a function parameter or stored property typed as Any, and dynamically check if it's a String or Int), and it doesn't seem to be a problem -- people very much like the ability of Swift to statically enforce proper typing of data. The point of using protocols as types is to reduce the need for dynamic checks or strong coupling by allowing more precise forms of erasure.

in other words: if you can better express and work with your types in a generic/existential context, you'll be able to do more at that level of abstraction and shouldn't need to downcast to concrete types as often.

As for compiler optimisations, there is no intrinsic reason an existential must be less optimisable than a generic parameter. For example, we have a (relatively new) ExistentialSpecializer optimisation pass which transforms functions of the form func f(_: P) to func f<T:P>(_: T), at which point our other generic specialisation infrastructure can take over. https://github.com/apple/swift/blob/a4103126b309549166182744265991c9a4db2819/lib/SILOptimizer/FunctionSignatureTransforms/ExistentialSpecializer.cpp

Protocol existential self-conformance is a massive issue, though. I'll come back to this when I have some time to elaborate, but basically, I think it's a flaw in the design of the type-system. The type of an existential should be some kind of type-existential (i.e. "any type which conforms to P") rather than the protocol type P itself.

Opaque result types are a great grooming tool for API authors, but they wouldn't solve my most pressing need for beefed-up protocol existentials, which is that I sometimes need to store something which may be populated by a variety of different types (e.g. var myThing: Collection where Element == String). Generic parameters are awkward for this - if this was inside a struct, MyStruct<X> and MyStruct<Y> would have different layouts and could not be substituted.

1 Like

Yeah, passing an existential as an argument is isomorphic to passing a generic argument with its own type variable—anything that's possible with one representation is possible on the other. We've even discussed giving them the same underlying ABI.

In cases where you have a single existential value like this, it should be possible to automatically open the existential and pass along the value inside as an argument that's required to conform to the protocol, rather than requiring the existential itself to conform. Would that address your use case?

The trick is that protocols can only resiliently add new requirements, and if they do so, they need to provide defaults for existing conformances. An example when it comes to hashing would be the transition from hashValue to hash(into:)—we added the latter to support incremental hashing, but still provide a default implementation for types that only provided hashValue.

Using an existential does not require anything about the existential type conforming to the protocol. It's true that, since protocols can resiliently add new members, an existential based on a public resilient protocol would not "self-conform" unless explicitly promised to do so, when we add that feature. However, that's not because of associated types specifically but because of contravariant self/associated type requirements. If a protocol requirement takes a value of Self type or of an associated type as an argument, then there isn't any automatic way to generalize from the implementations for individual types to an implementation that works dynamically with heterogeneous types.

1 Like

I suppose it's too late for big ABI changes like that, but it would be cool if they were literally identical. The generics infrastructure seems far more fleshed-out (e.g. @_specialize would be cool for existentials, too).

The big win that we'd get from relaxing the discussed restrictions would be for heterogeneous storage. We need the existential boxing to give a uniform layout regardless of the contents, but if that heterogeneity involves PATs, currently we lose that information.

Opaque result types don't help with that problem - especially since the proposal involved them always resolving to a single type (e.g. func foo() -> opaque Sequence where Element == Int cannot return an Array<Int> on one path and a Set<Int> on another - just like you can't today).

Well, ABI only constrains the interface between separately-compiled, resilient components. We'll still have freedom to add optimizations within a binary. One nice optimization for returned existentials would be to be able to return them on-stack, avoiding heap allocation.

2 Likes

Hi @algal,

I think I'd like to answer this in reverse order:

I agree that the reference point of protocols in Objective-C was probably a source of confusion. However, I think the bigger source of confusion is that we have one thing called protocol that

  • plays two distinct roles that overlap significantly in their capabilities
  • can only support a fraction of its declared API in one of its roles

You can define the “existential type P” to be the most specific possible common supertype of all types conforming to the protocol P. [I don't think it's crucial that anyone understand why we use the word “existential,” FWIW. I consider that an artifact of nerdy language research whose explanation does almost nothing to illuminate the meaning of the word].

As for the fundamental reasons for this difference, it's what I said in the talk: capturing an instance of type T as an existential type erases type information, and in particular, type relationships. If you work through a couple of examples with a protocol like:

protocol P {
   init()
   associatedtype A
   func f(_: A) -> A
}

you'll see that the most specific common supertype of types conforming to P has no usable API. The compiler can't provide a working init() because it has no way to know which subtype to create. I suppose it could provide a trapping init(). But it can't even provide a trapping f because that would have to take a type that is a subtype of every conforming type's A and return a type that is a supertype of every conforming type's A. In a world where Never was a true bottom type, that could be func f(_: Never) -> Any, but of course even that doesn't satisfy the requirements for f, which say it has matching parameter and return types.

The proposal to generalize existentials says that we'll have the compiler figure out which part of a protocol's API can be used on its existential type without violating the soundness of the type system, and simply forbid the use of the other parts when handling the existential. From my point of view, without an explicit declaration from the programmer that certain parts are intended for use on the existential type, the author, users, and maintainers of a protocol have to do some fairly subtle reasoning about type soundness to understand what they are getting.

10 Likes

Sure users want the limitations lifted, but most of them don't understand that some of the limitations are inherent, and generalized existentials won't eliminate the pain.

I loathe writing type-erasing wrappers as much as the next guy. But generalized existentials don't eliminate the need to write them, nor (AFAICS) do they make it much easier to do so. An existential Collection type would not remotely replace AnyCollection, because that type would not conform to Collection. In fact, most basic things we expect from a Collection would not be available: you can't index it, and first() could at best return Any?.

Nobody has proposed a set of features really targeted at the pain points you cite; that would be a really interesting exploration (I suspect the answer has at least as much to do with solving what I call “the API forwarding problem” as it does with existentials). Even if what is being proposed makes incremental progress toward solving those problems, though, I fear that in the specific place we arrive, having applied that increment, we'll have done more harm than good as explained in my opening message.

I totally understand we have problems with the limitations on existentials, but before we charge ahead lifting limitations IMO we have to consider how the whole language works and fits together once we've done that. The future I see when I do that mental exercise has what I consider some serious new problems—problems that aren't addressed by simply saying “the limitations are bad so we should lift them.”

11 Likes

No it is not. Please see the example P I gave in this post, along with a complete explanation of why the existential type P can never support the API required by protocol P.

they will be drawn toward the syntactically lightweight use, which I think is more often inappropriate.

I would argue that points to a flaw in the language that should be solved

I agree; it should be solved if possible, but…

rather than artificially imposing pain on users because we think we know better.

…the pain is not artificial; AFAICT it's inherent, as I hope my example shows.

It doesn't go nearly far enough.

I agree again. We should take a good look at the actual use cases that create what you call a “mile-high wall of overhanging granite” and design language features (or write educational articles, if that is more appropriate to our analysis) that address those needs. Incrementally chipping away at the restrictions on existentials does not necessarily seem like it leads to a good answer, and in the meantime, as I have mentioned elsewhere, just doing that could leave us with some serious new problems.

IMHO these concepts are impossible for mortal programmers to grasp because the bar for working with them is so high.

I don't think so. The problem is simple: we've created a confusing language feature by making the creation of existential types implicit. There are three sources of confusion that I know of:

  1. Sometimes when you declare a protocol, you also get an existential type that you can use to handle instances of any conforming type. Sometimes, though, depending on details of how you declared the protocol's interface, you don't get an existential type.
  2. Even when you do get an existential type, it doesn't conform to the protocol.
  3. Also, some parts of the protocol's API may be unavailable on the existential type (rare today, but true for extension methods with Self arguments).

Generalizing existentials in the way proposed means that the compiler would no longer bite you right away when you try to use an existential type; hooray! But instead, the compiler will bite you when you try to use the parts of the API that today are preventing us from creating the existential type. That takes away confusion #1 but compounds confusion #3. That could be a much worse place to be than the situation we have today for reasons cited in my first post.

I'm also not sold on the idea that existentials must impose a performance cost; maybe static compilation makes some form of specialization for an existential (aka non-generic) function impossible but is that true for all cases or just some (If we were to admit JIT ala JS then it is definitely possible to provide dynamic specializations of a function, switched based on argument type). I'm not saying Swift could, would, or should do this, just muttering questions out loud.

Just some: when the compiler can “see” all the types involved, it can use the information (e.g. knowledge that two types must be the same) to optimize code just as well as if that constraint were captured in a generic. But that can only happen under special conditions and trying to broaden the cases that get optimized usually requires optimizer heroics (a big development investment) and increased compile time (which is bad for end users). To be fair, most cases with resilient generics will not optimize well, either.

So, yes: using existentials where you could use generic constraints implies a performance cost in the general case. Applied without careful discrimination over the majority of Swift programs, it cannot help but be significant. (And, yes, Swift should probably have a JIT)

I expect the addition of opaque result types to satisfy the majority of needs for which people want to use existentials today

It might solve many of the cases where protocol methods return Self but that is only a subset (I'll admit I don't know how big that subset is in realistic codebases; maybe it is far larger than I think?).

? I don't know of any special applicability to protocol methods that return Self. Opaque result types cover all cases where an existential would be used purely to avoid exposing the actual type of a return value as API.

5 Likes

It's not a problem because nobody expects Any to have usable API.

people very much like the ability of Swift to statically enforce proper typing of data. The point of using protocols as types is to reduce the need for dynamic checks or strong coupling by allowing more precise forms of erasure.

Yeah, sure. But under the proposal, Swift won't let you be “precise” about which APIs get erased on the existential type. It will decide for you, based on some rules of type soundness that most people can't and/or legitimately don't want to understand. That problem exists on the margins today, and will get worse under the proposal.

As for compiler optimisations, there is no intrinsic reason an existential must be less optimisable than a generic parameter.

Yes, for all practical purposes, there is. Not in every case, but in the general case.

For example, we have a (relatively new) ExistentialSpecializer optimisation pass which transforms functions of the form func f(_: P) to func f<T:P>(_: T) , at which point our other generic specialisation infrastructure can take over. https://github.com/apple/swift/blob/a4103126b309549166182744265991c9a4db2819/lib/SILOptimizer/FunctionSignatureTransforms/ExistentialSpecializer.cpp

And yet it can't produce optimal code for func f(_: P, _: P) if it would otherwise have been written as func f<T:P>(_: T, _: T).

Protocol existential self-conformance is a massive issue, though. I'll come back to this when I have some time to elaborate, but basically, I think it's a flaw in the design of the type-system. The type of an existential should be some kind of type-existential (i.e. "any type which conforms to P") rather than the protocol type P itself.

That begins to get at the root of the problem I am talking about. The point of confusion is that for many protocols P, the existential P cannot satisfy the requirements of P. Note that this is inherent as long as you allow init requirements, for reasons I have outlined in another post.

Opaque result types are a great grooming tool for API authors, but they wouldn't solve my most pressing need for beefed-up protocol existentials, which is that I sometimes need to store something which may be populated by a variety of different types (e.g. var myThing: Collection where Element == String ). Generic parameters are awkward for this - if this was inside a struct, MyStruct<X> and MyStruct<Y> would have different layouts and could not be substituted.

I can't quite visualize your example “inside the struct.” Care to elaborate? Also I don't know what you mean about generic parameters being “awkward for this;” I'd have thought they simply wouldn't work in the case described. But yeah, I agree that opaque result types wouldn't solve your problem, which requires either type erasure or maybe rethinking your approach at a higher level to make type erasure unnecessary (not claiming the latter is possible, BTW).

2 Likes

Do I understand this correctly: In a world where we have generalized existentials some function func foo(_: P) can never instantiate a new instance because P can be any conforming type in the program, thus an init() requirement can never be satisfied. Hence P doesn't conform to P?

A protocol with type parameters (Self, init() which can be thought of as func init() -> Self, and associated types) is incomplete by definition. Working with these kinds of existentials has some inherent language design issues that need to be addressed.

I'm coming around to your way of thinking, re: an explicit annotation on the protocol or a different spelling like Any<P> that highlights the differences.

I'm not convinced that this is as bad an option as you see it to be. In my eyes, it moves any errors or issues closer to the places where they're relevant and therefore easily explained. For example, rather than being presented with a message that a protocol can't be used as an existential, you're instead presented with a message that a particular method can't be called since the type is unknown (or else is erased to Any where possible). That then naturally leads to imposing more specific constraints (i.e. the <T : Protocol> syntax) to get to a point where that information is available.

Fundamentally, this proposal just shifts the error closer to the actual problem (i.e. you don't know what the type is) while enabling use-cases that currently require cumbersome workarounds.

9 Likes

I would agree with @dabrahams that such an eventuality would be strictly, and significantly, worse than the status quo for the reasons below. This is why I said upthread that lifting the "Self or associated type" restriction should happen only in tandem with significantly improved diagnostics that allow users to avoid the scenario you outline above.

Certainly there are some "use cases that currently require cumbersome workarounds" that would be enabled by lifting the restriction, and I would very much like to be able to enjoy that functionality. However, @dabrahams outlines above why lifting the "Self or associated type" restriction won't actually enable or enhance a large portion of use cases that people have mentioned even in this thread, such as an existential collection type replacing AnyCollection.

What we often see in the "Using Swift" portion of these forums is that users reach for existential types when they should be using generic contraints--"should" not merely for performance reasons, but because they truly do not need or intend for any type erasure and often do intend to access APIs that require the type relationships being erased. That they run into the "Self or associated type" restriction now and would run into the "method can't be called" issue in the future is not the actual problem but only a symptom of that problem (i.e., using existential types instead of generic constraints).

Today, users are told upfront of this fact if they are dealing with a protocol with Self or associated type constraints. Without the "Self or associated type" restriction, then, more uses of existential types by the typical user would fall into the category of problems that would be best served by features other than existential types. This becomes even more so the case if/when opaque types and other enhancements are added to the language. Given the limited extent to which intentional use cases would actually be enabled by lifting this restriction, one must be careful that it's not outweighed dramatically by the extent to which unintentional use cases would be encouraged--and "unintentional" here referring not to the intentions of language designers but to the intentions of the user who actually does not want or might not even know about the type erasure that's going on.

One component of solving this problem might be to change the spelling so that Any<P> (or, for reasons that will become apparent below, I'll use an alternative strawman syntax Existential<P>) rather than P is the existential type. The goal here is to reduce as much as possible the scenario where users reach for existential types without even realizing that they are doing so. I have to admit that, even after years of working with the language, I still catch myself sometimes unintentionally using an existential type when I meant to have a generic constraint!

A spelling such as Existential<P> neatly avoids the baffling situation that "P does not conform to P," since even on visual inspection it's clear that Foo<Bar> has no reason to conform automatically to Bar.

I'd imagine it could then be possible for authors to conform Existential<P> to P by manually implementing the necessary methods in an extension (i.e., extension Existential where Protocol == P). (If the existential type were to be spelled Any<P>, then extension Any where Protocol == P would naturally prompt the question of whether one can extend Any without constraints, which is a different topic altogether best avoided here.)

5 Likes

An initializer or static method requirement doesn’t technically preclude a protocol type from self-conforming as long as there’s at least one conforming type: the protocol could just pick that type and construct it / call the method on it. But if that type isn’t unique (which is reasonable to assume a priori), picking one type in specific would be an arbitrary choice, so as a policy matter it doesn’t make sense to allow it. So sure, maybe with an annotation it could be done if there’s really a reasonable default that wouldn’t cause more confusion than it saved.

Don’t think about it in terms of a function that takes an actual value of the protocol type. Think of a generic function over T: P. What actually happens if you use a particular requirement when T is dynamically the protocol type P itself?

3 Likes

I haven't had time to properly digest and contemplate the argument @dabrahams is making yet so nothing I say here should be considered as a direct response to that. However I do want to point out now that the statement above is simply not true.

Lifting the restriction would significantly simplify designs that store type-erase values and use various dispatching strategies to interact with the existential. The current workarounds I'm aware of rely on introducing an additional protocol which can be used as an existential and dispatching through that. Lifting the restriction would allow storage, casting and dispatching to happen directly on the PAT protocol itself which would streamline designs significantly. I have kept this pitch in mind since it began and have already run into several use cases where it would be extremely handy.

I don't have an opinion on this syntactic change yet but I don't buy the argument that it will reduce accidental use of existentials. The reason users often reach for existentials is because many programmers are most familiar with Objective-C protocols or interfaces from other object-oriented languages. The will reach for a tool that feels familiar in this way regardless of the syntax used to invoke that tool.

Allowing extensions on the existential would be a really useful way of allowing existentials to conform to protocols (including their defining protocol). On the other hand, allowing extensions on existentials could introduce significant confusion between those and protocol extensions. I think we need to study the use cases and consider alternative solutions closely before heading too far down that path.

7 Likes

Yes, indeed, I too am very excited about the fact that lifting this restriction would significantly simplify designs that store type-erased values. However, that does not change the fact that many uses discussed above do not fall into the category of things that would be simplified by lifting the "Self or associated type" restriction. Just quickly scrolling through some of the items that people mentioned here:

  • @karim mentioned Equatable conformance for existential types: lifting the restriction would not allow that
  • @dmcyk mentioned not having to create type-erased boxes: lifting the restriction would not allow that (for reasons @dabrahams outlines above)
  • @Karl mentioned replacing AnyHashable with a thin wrapper around Hashable: lifting the restriction would not allow that
  • @rbishop mentioned self-conformance for existential types: lifting the restriction would not allow that
4 Likes

That's a nice positive story to tell ourselves about the potential outcome, and I might even be inclined to believe it if we had a list of actual use-cases that would demonstrably become much nicer (hint, hint).

The problem is, if it just (as you say) “shifts the error”, then it necessarily doesn't remove the fundamental source of confusion around protocols and their existential types. If what @xwu says is true, that most people run into this wall in places where even a generalized existential would be a poor fit for the use case, then the wall—as frustrating as it might be—is actually performing a valuable service.

5 Likes

Although @John_McCall described a special case where the init() can reasonably be satisfied, and ways, where—usually—it can be satisfied arbitrarily (but IMO unreasonably), in the general case, IIUC when no types conform to the protocol, it can't be satisfied.

Anyway, although init() is simple to understand it might not be as much of a killer example as func f(_: A) -> A.

If your example would be enhanced by this proposal it would be very instructive to see how the code could be improved were the proposal accepted. I note, however, that the technique you showed is generally useful even where there are no existentials; I use it that way to deal with heterogeneous “collections” of similar items without type erasure, and I'm pretty certain my use case would see no benefit from generalizing existentials. I point that out because I think if it can be much better there's probably a more general feature that would benefit both of us.

If we are to believe all the grumbling we've heard about angle brackets, changing the syntax could easily be enough to make existentials not “feel familiar.” But doing that alone strikes me as a strictly punitive approach that I'd like to avoid. I would like to also address the fundamental confusion, increasing expressivity for protocol authors and comprehensibility and predictability for protocol users.

I'm keen to agree with @dabrahams here. The way I understand it, shifting the error can be worse than the status quo because the amount of refactoring that has to be performed in order to remove the error can be dramatically bigger.

Lifting the constraint wouldn’t indeed help to avoid creating type erased boxes, but it could somewhat simplify it when working with members that don’t use Self/associated types.
e.g. (primary for being able to mock things) I often write such wrappers:

protocol Foo {
  associatedtype Bar

  var x: Int { get }
  var y: Int { get }
}

struct AnyFoo<T>: Foo {
  typealias Bar = T
  private let _getX: () -> Int
  private let _getY: () -> Int

  var x: Int { return _getX() }
  var y: Int { return _getY() }

  init<K: Foo>(_ val: K) where K.Bar == T {
    self._getX = { return val.x }
    self._getY = { return val.y }
  }
}

Being able to use simple type members would decrease memory footprint of such wrapper and simplify the code.

Technically, is there a reason func f(_: A) -> A couldn't be mapped to func f(_: Any) -> Any with a runtime trap if the argument isn't of the expected type? By itself this isn't a very satisfactory solution, but I think it's close to what people would expect to happen. And then maybe there's way to improve on that by making the runtime trap clearly visible in the code like in ex.f(a as! ex.A) or something like that.