Lifting the "Self or associated type" constraint on existentials

Karl · September 13, 2020, 2:47pm

This topic seems to have gained some traction across a couple of threads, so I felt like writing out my opinion on where I think existentials should go in Swift in one place.

I'd really like us to remove this restriction, but more broadly, I think we need to drastically cut ("focus") what you can do with existentials, deprecate certain uses, and revise the syntax.

Summary: Existentials have only 2 real use-cases: storage of an erased value, and inout function parameters which may assign the object to a different type. There is a reason both of these are related to type-flexible storage: boxing is a storage concept. Existentials are not generics, and they have no place offering pseudo-generic interfaces. As such, existentials as function input parameters should be deprecated, and replaced behind-the-scenes with generic type parameters. This naturally allows the use of protocol requirements with Self or associated types. The stripped-down existentials should be given a new name to more clearly say what they do.

EDIT: And one more use-case: intentional erasure to hide implementation details (i.e. function/property return types). Again, this only exists so clients know how to handle the value whose type and layout are unknown (e.g. do I need to retain it?); the value should be unboxed when you actually access its functionality.

What is an existential?

If I write a function which takes an existential argument:

func takesExistential(arg: MyProto)

Many developers might think this function is generic, but it isn't - its argument has a single, concrete type: namely, a box containing some other value that conforms to MyProto. That box knows how to forward some methods, but it's just a box - it doesn't (shouldn't) conform to MyProto. Conformance creates an "is a" relationship between two types: an Array "is a" Collection, but it would be conceptually inaccurate to say that the box "is a" MyProto, or the box "is a" Collection. It holds a thing that really does have an "is a" relation, but the box itself doesn't. A dog house isn't a dog.
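One way to see the box in today's Swift is to compare the static type of an existential with the dynamic type of its contents. This is a minimal sketch. It assumes a variant of MyProto whose doSomething() returns a String, so that the forwarded call can be checked. Both constants are declared with the existential type, but type(of:) reports the type of the value inside the box.

```swift
// Assumption: a variant of the post's MyProto whose requirement returns a
// String, so that the forwarded call can be observed.
protocol MyProto {
  func doSomething() -> String
}
extension Array: MyProto {
  func doSomething() -> String { "Array with \(count) elements" }
}
extension String: MyProto {
  func doSomething() -> String { "String: \(self)" }
}

let numbers = [1, 2, 3]

// Both constants have the same static type: the existential box 'MyProto'.
let boxedArray: MyProto = numbers
let boxedString: MyProto = "hello"

// type(of:) looks through the box and reports the dynamic type of its contents.
let arrayContents = type(of: boxedArray)   // Array<Int>
let stringContents = type(of: boxedString) // String

// Calling a requirement through the box just forwards to the contained value.
let forwarded = boxedArray.doSomething()
```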

The reason that developers might confuse this with a generic function is that Swift automatically boxes values for you. You can call this function and pass any conforming type as the argument, and there is no obvious difference between it and a bona fide generic function:

protocol MyProto {
  func doSomething()
}
extension Array: MyProto { func doSomething() { print("Array with \(count) elements") } }
extension String: MyProto { func doSomething() { print("String: \(self)") } }

func takesExistential(arg: MyProto) {
  arg.doSomething() // Looks like generic code.
}

takesExistential(arg: [1, 2, 3])
takesExistential(arg: "smells like generic code")

To really see the difference, it helps to examine what happens if you declare the parameter as inout: suddenly, the compiler seems unable or unwilling to do the boxing for you:

func takesMutableExistential(arg: inout MyProto) {
  arg = "weren't expecting me, were you?"
}

var array = [1, 2, 3]
takesMutableExistential(arg: &array) // Error (no automatic boxing).

var boxedArray: MyProto = array // Manual boxing.
takesMutableExistential(arg: &boxedArray) // Works.

By making the parameter inout, I don't just allow calling mutating methods (like replaceSubrange) on the contained value - I actually make it possible to swap that underlying value with something of an entirely different type. I'm passing a mutable box, not just a mutable value.
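The same point can be checked at run time. In this sketch, which again assumes a String-returning doSomething(), the box keeps its static type across the call, but the type of its contents changes. The original array is untouched, because only the box was passed inout.

```swift
// Assumption: doSomething() returns a String so the new contents can be checked.
protocol MyProto {
  func doSomething() -> String
}
extension Array: MyProto {
  func doSomething() -> String { "Array with \(count) elements" }
}
extension String: MyProto {
  func doSomething() -> String { "String: \(self)" }
}

// Mirrors takesMutableExistential from the post: it replaces the box's
// contents with a value of a different type.
func takesMutableExistential(arg: inout MyProto) {
  arg = "weren't expecting me, were you?"
}

let numbers = [1, 2, 3]
var boxed: MyProto = numbers          // Manual boxing, as in the post.
let typeBefore = type(of: boxed)      // Array<Int>
takesMutableExistential(arg: &boxed)
let typeAfter = type(of: boxed)       // String
```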

This illustrates a really important thing about existentials: when used as parameters, they only provide forwarding, not generics. Every time you interact with an existential, you are interacting with a box, not the contained object. This is the cause of all our current limitations on existentials.

The only time an existential is appropriate is when the type of its contained value is flexible. Whenever the existential is immutable, even if for a limited scope (such as when it is passed as a read-only argument to a function), its type is fixed, and we can unbox its underlying value as some local generic type.

Step 1: Automatically unbox existentials when calling generic functions

One of the major issues with existentials is the "self-conformance" issue: because the box doesn't conform to the protocol (it only appears to, because it can forward function calls), it can't satisfy generic constraints of the form <T: MyProto>.

func takesExistential(arg: MyProto) {
  takesGeneric(arg: arg) // Error: existential 'MyProto' does not conform to 'MyProto'
}

func takesGeneric<T: MyProto>(arg: T) {
}

IMO, the solution for this is not to introduce additional magic conformances for the box. Adding self-conformances means the existential box could be passed into generic functions directly (i.e. with T.self == MyProto.self). This could mess up a lot of code which depends on a fixed set of conformances, as well as any code with fast paths for known types (if T.self == Float.self, etc.).
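Here is a hypothetical generic function that special-cases one concrete type, to illustrate the fast-path concern. With an unboxed String, T is exactly String and the fast path is taken. If the box itself were allowed to stand in for T, T.self would be the existential metatype, and a check like this would quietly stop matching. The function body and the String-returning doSomething() are assumptions for illustration.

```swift
// Assumption: doSomething() returns a String so the general path is observable.
protocol MyProto {
  func doSomething() -> String
}
extension Array: MyProto {
  func doSomething() -> String { "Array with \(count) elements" }
}
extension String: MyProto {
  func doSomething() -> String { "String: \(self)" }
}

// Hypothetical generic function with a fast path keyed on the exact type of T.
func takesGeneric<T: MyProto>(arg: T) -> String {
  if T.self == String.self {
    // Only taken when T is exactly String, not a box that happens to hold one.
    return "fast path"
  }
  return "general path: " + arg.doSomething()
}

let numbers = [1, 2, 3]
let viaString = takesGeneric(arg: "hi")     // T == String
let viaArray = takesGeneric(arg: numbers)   // T == Array<Int>
```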

Instead, I believe we need to automatically unbox existentials to retrieve their contents, which do satisfy those constraints, and pass the type of whatever is inside the box as our T.

func takesExistential(arg: MyProto) {
  let unboxed: <T: MyProto> = unbox(arg) // Hypothetical syntax: opens the box as a local generic type T.
  takesGeneric(arg: unboxed)
}

func takesGeneric<T: MyProto>(arg: T) {
}
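There is no unbox operation today, but Swift 5 already opens an existential when you call a protocol extension method on it. Inside the extension, self has the concrete type Self, which really does conform. This is a sketch of that workaround. openAndCallGeneric is a hypothetical helper name, and here takesGeneric reports which type T was bound to.

```swift
// Assumption: doSomething() returns a String (unused here, kept for parity).
protocol MyProto {
  func doSomething() -> String
}
extension Array: MyProto {
  func doSomething() -> String { "Array with \(count) elements" }
}
extension String: MyProto {
  func doSomething() -> String { "String: \(self)" }
}

// Requires a real conformance, which the box alone does not have.
func takesGeneric<T: MyProto>(arg: T) -> String {
  return String(describing: T.self)   // Report what T was bound to.
}

extension MyProto {
  // Inside a protocol extension, 'self' has the concrete type 'Self',
  // which conforms to MyProto and so can satisfy <T: MyProto>.
  func openAndCallGeneric() -> String {
    return takesGeneric(arg: self)
  }
}

func takesExistential(arg: MyProto) -> String {
  // Calling an extension method on the existential opens the box first.
  return arg.openAndCallGeneric()
}

let numbers = [1, 2, 3]
let fromArray = takesExistential(arg: numbers)
let fromString = takesExistential(arg: "hello")
```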

My understanding is that this is doable: at the ABI level, unspecialised generic code really accepts a pointer to the value plus type metadata and witness tables as parameters, and an existential is basically a value (or a pointer to a boxed value) plus type metadata and a witness table.
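The layout difference can be observed with MemoryLayout. This sketch assumes a single protocol that is not class-bound. The existential then has the same fixed size whatever it holds: five pointer-sized words, made up of a three-word inline value buffer, a type-metadata pointer, and one witness-table pointer. Once the box is opened through a protocol extension method, generic code sees the contained type's own metadata instead. payloadSize and openedPayloadSize are hypothetical names.

```swift
// Assumption: doSomething() returns a String, as in the other sketches.
protocol MyProto {
  func doSomething() -> String
}
extension Array: MyProto {
  func doSomething() -> String { "Array with \(count) elements" }
}
extension String: MyProto {
  func doSomething() -> String { "String: \(self)" }
}

// Reports the size of whatever concrete type T is bound to.
func payloadSize<T: MyProto>(_ value: T) -> Int {
  return MemoryLayout<T>.size
}

extension MyProto {
  // Opens the box: 'Self' is the contained type here.
  func openedPayloadSize() -> Int { payloadSize(self) }
}

let numbers = [1, 2, 3]
let boxedArray: MyProto = numbers
let boxedString: MyProto = "hello"

let wordSize = MemoryLayout<UnsafeRawPointer>.size
// Existential container: 3-word value buffer + metadata + 1 witness table.
let boxSize = MemoryLayout<MyProto>.size
// After opening, generic code sees the contents' own layout.
let arrayPayload = boxedArray.openedPayloadSize()
let stringPayload = boxedString.openedPayloadSize()
```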

The real difficulty is how to propagate type information outside of takesExistential. The unboxed generic parameter <T> only exists within that function, and any generic types we create using that type, like an Array<T>, don't make sense to code on the outside:

func takesExistential(arg: MyProto) -> ??? {
  let unboxed: <T: MyProto> = unbox(arg)
  return takesGeneric(arg: unboxed) // What is this type? An Array of... some type.
}

func takesGeneric<T: MyProto>(arg: T) -> Array<T> {
  return [arg]
}

I believe that this kind of situation, where one wants to propagate a generic type out of a function, is roughly what opaque types are supposed to do. So takesExistential could return an Array<some MyProto> - that is to say, an Array whose Elements all have the same, unboxed type, but we don't know what that type is beyond that it conforms to MyProto. I'm not sure if opaque types as they are implemented today can capture that, but we need some way to express the type I just described (an unknown, unboxed type which satisfies some bunch of constraints).
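Today the closest approximation is a hand-written wrapper that boxes the whole array once, so that all elements are guaranteed to share one hidden type. This is a sketch under stated assumptions: HomogeneousMyProtoArray and callTakesGenericAndBox are hypothetical names, and doSomething() returns a String. A protocol extension method is used to open the existential because there is no unbox operation.

```swift
// Assumption: doSomething() returns a String so forwarding can be checked.
protocol MyProto {
  func doSomething() -> String
}
extension Array: MyProto {
  func doSomething() -> String { "Array with \(count) elements" }
}
extension String: MyProto {
  func doSomething() -> String { "String: \(self)" }
}

// Hypothetical stand-in for "Array<some MyProto>": one box around the
// whole array, so every element has the same (hidden) type T.
struct HomogeneousMyProtoArray {
  let elementType: Any.Type
  private let results: () -> [String]

  init<T: MyProto>(_ elements: [T]) {
    elementType = T.self
    results = { elements.map { $0.doSomething() } }
  }

  // Forwards doSomething() to every element.
  func doSomethingToAll() -> [String] { results() }
}

func takesGeneric<T: MyProto>(arg: T) -> Array<T> {
  return [arg]
}

extension MyProto {
  // Opens the existential, calls the generic function, and boxes the
  // resulting [Self] as a whole before it leaves this scope.
  func callTakesGenericAndBox() -> HomogeneousMyProtoArray {
    HomogeneousMyProtoArray(takesGeneric(arg: self))
  }
}

func takesExistential(arg: MyProto) -> HomogeneousMyProtoArray {
  return arg.callTakesGenericAndBox()
}

let result = takesExistential(arg: "hi")
```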

That is infinitely better than our current solution of Array<MyProto>, whose Elements are actually individual existential boxes and thus may have different underlying types (in our example, let mixed: Array<MyProto> = [[1], "two", [3], "four"] is valid). Constraining the elements to a single type would allow things like conditional conformances, including to Equatable, Hashable and Codable. Such arrays will need their own kind of box so they can be passed around by the parent code, but it will be the entire Array<some MyProto> in the box, rather than each element being a box.
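The mixed array is easy to demonstrate. This sketch assumes the String-returning doSomething() variant. Each element carries its own dynamic type, so nothing ties the elements together.

```swift
// Assumption: doSomething() returns a String, as in the other sketches.
protocol MyProto {
  func doSomething() -> String
}
extension Array: MyProto {
  func doSomething() -> String { "Array with \(count) elements" }
}
extension String: MyProto {
  func doSomething() -> String { "String: \(self)" }
}

let one: [Int] = [1]
let three: [Int] = [3]

// Each element is its own existential box, so their types may differ.
let mixed: Array<MyProto> = [one, "two", three, "four"]

// Look through each box at the dynamic type it holds.
let elementTypes = mixed.map { String(describing: type(of: $0)) }
```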

Ultimately what this does is remove existential function arguments as a substitute for generic code. They're no good at it; that's why real generics exist. If we had some way to unbox them, or unboxed them automatically, we would always have a way to access the full API of an erased value - even if we could only reason about type relationships within a limited scope.

Step 2: Deprecate existential function arguments and replace them with generic functions under the hood

Now that we have access to a value's full API through generics, we really don't have any need for read-only existential function parameters any more. Legacy code still has to be supported, but any future code like:

func takesExistential(arg: MyProto) {
}

should trigger a warning, and the compiler should automatically emit a generic duplicate of the function, which is what new callers would be compiled against. Since existentials automatically get unboxed, this should be a source-compatible change.
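Written out by hand, the generated duplicate might look roughly like the following sketch. The generic function holds the logic. The legacy existential entry point only opens the box and forwards; it does the opening through a protocol extension method, since there is no unbox operation. The name forwardToGeneric and the String-returning doSomething() are assumptions. Under the proposal, the compiler would write this shim, not the programmer.

```swift
// Assumption: doSomething() returns a String so both paths can be compared.
protocol MyProto {
  func doSomething() -> String
}
extension Array: MyProto {
  func doSomething() -> String { "Array with \(count) elements" }
}
extension String: MyProto {
  func doSomething() -> String { "String: \(self)" }
}

// The generic version: what new callers would be compiled against.
func takesGeneric<T: MyProto>(arg: T) -> String {
  return arg.doSomething()
}

extension MyProto {
  // Opens the existential so the generic version can be called.
  fileprivate func forwardToGeneric() -> String { takesGeneric(arg: self) }
}

// Legacy existential signature kept for existing callers; it only unboxes
// and forwards, so both entry points behave identically.
func takesExistential(arg: MyProto) -> String {
  return arg.forwardToGeneric()
}

let numbers = [1, 2, 3]
let legacy = takesExistential(arg: numbers)
let modern = takesGeneric(arg: numbers)
```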

Step 3: New spelling for existentials

Existentials now only serve 3 remaining purposes:

Function parameters where the function may reassign the value to an instance of a different type.
Storage (e.g. in instance variables) of unknown types.
Intentional erasure to hide implementation details.

All of these are related to having type/layout-flexible storage, not to abstracting functionality, which I think is a good sign. We can introduce a new spelling to emphasise this - perhaps Any<MyProto>. These types would have no API; you can only create them and unbox their values. We might even consider making them invalid as read-only function parameters, since you'd have to unbox into a generic scope to use the value anyway.
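A storage-only box along these lines can be approximated today for one specific protocol. This is a sketch, not a design. AnyMyProto and MyProtoVisitor are hypothetical names. Swift has no generic closures, so "unbox into a generic scope" is modelled as a visitor protocol with a generic visit method. The box exposes no MyProto API: it can only be created, refilled with a value of any conforming type, and unboxed.

```swift
// Assumption: doSomething() returns a String, as in the other sketches.
protocol MyProto {
  func doSomething() -> String
}
extension Array: MyProto {
  func doSomething() -> String { "Array with \(count) elements" }
}
extension String: MyProto {
  func doSomething() -> String { "String: \(self)" }
}

// A generic "callback" to run on the unboxed value.
protocol MyProtoVisitor {
  associatedtype Result
  func visit<T: MyProto>(_ value: T) -> Result
}

// Hypothetical stand-in for 'Any<MyProto>': storage only, no forwarding API.
struct AnyMyProto {
  private var storage: MyProto

  init<T: MyProto>(_ value: T) { storage = value }

  // Type-flexible storage: the new value may have a different type.
  mutating func replace<T: MyProto>(with value: T) { storage = value }

  // The only way to use the value: unbox it into a generic scope.
  func unbox<V: MyProtoVisitor>(_ visitor: V) -> V.Result {
    return storage.accept(visitor)
  }
}

extension MyProto {
  // Opens the existential: 'Self' is the concrete contained type here.
  fileprivate func accept<V: MyProtoVisitor>(_ visitor: V) -> V.Result {
    return visitor.visit(self)
  }
}

// Example visitor: reports the type the value was unboxed as.
struct TypeName: MyProtoVisitor {
  typealias Result = String
  func visit<T: MyProto>(_ value: T) -> String { String(describing: T.self) }
}

let numbers = [1, 2, 3]
var box = AnyMyProto(numbers)
let before = box.unbox(TypeName())
box.replace(with: "now a String")
let after = box.unbox(TypeName())
```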

Step 4: New spelling for generics

The arguments are summarised well in "Improving the UI of generics"; in short, our UI for generics could certainly use some improvement. With existentials no longer providing their pseudo-generic interfaces, it becomes more important to improve our generics syntax.