A Feasible Implementation of Type Disjunctions

Type disjunctions are a commonly rejected proposal, but the recent addition of typed throws will, in (not just) my opinion, make the need for them much more pressing. I think it would be unfortunate to create a one-off solution that works only for typed throws and not for the general case, like the one briefly mentioned in this thread.

What I want to propose here is what seems to me (admittedly with no knowledge of the Swift compiler internals) to be a hopefully easy way to get general type disjunctions. I believe it avoids all the problems that have been raised about type checker performance.

We only really need one new feature, which seems minor, to provide an underlying implementation of type disjunctions. That new feature is "closed" or "sealed" protocols.

A "closed" protocol is a protocol with the restriction that it cannot be conformed to outside its own module. Alone this is a useful feature, such as for a library that wishes to hand out instances of a protocol, perhaps as a some type, but does not intend clients to define their own conformances, e.g. Metal (eliminating the common “please don’t conform to this yourself” documentation, which also indicates the library is doing unsafe downcasts somewhere).

Since no type outside the defining module can conform to a closed protocol, there is no point in refining it outside its module either (no one would be able to conform to the refined protocol), so that is also not allowed. This is important for other reasons I'll explain later.

But a restriction on a type has a corresponding new capability (forbidding X at design time means Not X is now known at design time). The new capability that pairs with being "closed" is exhaustive switching. If I define a closed protocol MyProtocol and, say, three conforming types struct MyStruct1: MyProtocol, struct MyStruct2: MyProtocol and struct MyStruct3: MyProtocol in its module, then the compiler knows an any MyProtocol instance is "inhabited" by one of these three concrete types. Correspondingly I should be able to exhaustively switch on one:

let value: any MyProtocol = getValue()

switch value {
  case let struct1 as MyStruct1: ...
  case let struct2 as MyStruct2: ...
  case let struct3 as MyStruct3: ...
  // No open-ended default
}

And not just from within the module, but anywhere all the conforming types are visible (i.e. if they're all public, everyone can exhaustively switch on it; otherwise that's only possible inside the module).

There is precedent for this feature in Kotlin, which supports "sealed" classes and interfaces and allows exhaustive visitation of them. This is in fact how Kotlin devs write type-safe unions or variants: the same thing Swift supports through enums with associated values.

Speaking of which, type disjunctions and tagged unions are not the same thing, even though each can emulate the other, and a lot of confusion has been bred by conflating them. Tagged unions can be nominal or structural (people here call the structural kind "anonymous sums", but that's a misnomer: anonymous types don't have an accessible name, while tuples do, and so would the dual to tuples). In my opinion type disjunctions are far more important than structural unions, and using the latter to emulate the former (I do this often with nominal unions, by writing a type eraser for a protocol as an enum) is a hack to work around a missing language capability.
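Here's roughly what that hack looks like in today's Swift (the Shape types are invented for illustration):

protocol Shape {
  var area: Double { get }
}

struct Circle: Shape { let radius: Double; var area: Double { .pi * radius * radius } }
struct Square: Shape { let side: Double; var area: Double { side * side } }

// The hand-written eraser: one case per conformance, and every use site pays
// a wrapping/unwrapping tax just to get an exhaustive switch.
enum AnyShape {
  case circle(Circle)
  case square(Square)

  var area: Double {
    switch self {
    case .circle(let circle): return circle.area
    case .square(let square): return square.area
    }
  }
}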

The above is the simplest example: an existential of a protocol with only concrete conformances and no refining protocols.

What if we add a refining protocol, like protocol MySubProtocol: MyProtocol? The important question is: is this protocol also closed? This might seem strange at first but I believe the default answer (no added keywords) should be no: that is, a protocol should not inherit "closedness" from protocols it refines. The consistent universal rule is a protocol is open unless it is explicitly defined not to be.

But this means introducing a (non-closed) refining protocol partially "re-opens" the base protocol: types outside the module can now conform to it but only by conforming to the refined protocol. What does this mean for switching?

What it means is to exhaustively switch, you must handle an existential of the refined protocol as one of the cases:

let value: any MyProtocol = getValue()

switch value {
  case let struct1 as MyStruct1: ...
  case let struct2 as MyStruct2: ...
  case let struct3 as MyStruct3: ...
  case let subProtocol as any MySubProtocol: ...
  // No open-ended default
}

Since the conformances to MySubProtocol are open-ended, the only way to handle all possibilities is to handle all conformances to MySubProtocol together. You can also handle concrete conformances to or further refinements of MySubProtocol but you then must handle the existential or have a default case.

I’m open to suggestions from others that “re-opening” a closed protocol this way shouldn’t even be allowed, but I believe it doesn’t really complicate things that much.

On the other hand, if a refined protocol also declares itself as closed, this problem does not exist, and the behavior of the top closed protocol cascades down to the refined protocol. Not only can we exhaustively switch on the refined protocol by handling all of its conformances or existentials of its refinements, we can exhaustively switch on the top protocol by handling all its conformances/existentials of refinements besides any MySubProtocol, plus all the conformances/existentials of refinements of MySubProtocol.

For example, let's say MySubProtocol has two concrete conformances struct SubStruct1: MySubProtocol and struct SubStruct2: MySubProtocol, and one open refining protocol MySubSubProtocol: MySubProtocol. We can then exhaustively switch on MyProtocol like this:

let value: any MyProtocol = getValue()

switch value {
  case let struct1 as MyStruct1: ...
  case let struct2 as MyStruct2: ...
  case let struct3 as MyStruct3: ...
  case let subStruct1 as SubStruct1: ...
  case let subStruct2 as SubStruct2: ...
  case let subSubProtocol as any MySubSubProtocol: ...
  // No open-ended default
}

So in general, the list of all the types that must be handled to fully cover a switch is generated by walking the tree of subtypes (where concrete conformances are leaves and refinements are branches) and terminating a search down a branch when a non-closed refinement is encountered, instead adding the existential for that refinement to the list. A similar algorithm would determine whether a switch is really exhaustive.
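To make that concrete, here's a toy model of the walk in today's Swift (the tree representation is purely illustrative, not actual compiler machinery):

// Concrete conformances are leaves; refinements are branches.
enum ConformanceNode {
  case concrete(String)
  case refinement(name: String, closed: Bool, children: [ConformanceNode])
}

// Walk the tree, terminating a branch at any non-closed refinement by
// emitting its existential instead of descending further.
func requiredCases(for node: ConformanceNode) -> [String] {
  switch node {
  case .concrete(let name):
    return [name]
  case .refinement(let name, let closed, let children):
    guard closed else { return ["any \(name)"] }
    return children.flatMap(requiredCases(for:))
  }
}

let tree = ConformanceNode.refinement(name: "MyProtocol", closed: true, children: [
  .concrete("MyStruct1"), .concrete("MyStruct2"), .concrete("MyStruct3"),
  .refinement(name: "MySubProtocol", closed: true, children: [
    .concrete("SubStruct1"), .concrete("SubStruct2"),
    .refinement(name: "MySubSubProtocol", closed: false, children: []),
  ]),
])

print(requiredCases(for: tree))
// ["MyStruct1", "MyStruct2", "MyStruct3", "SubStruct1", "SubStruct2", "any MySubSubProtocol"]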

This, I believe, covers the behavior of existentials for closed protocols. Everything besides switching should work exactly the same as with normal protocols. In particular, even though we might imagine fancy stuff like the compiler looking for members with matching signatures on all conforming types and exposing a way to call them through the existential (there's a trivial way to implement that by exhaustively switching over the conformances), this shouldn't be supported. It's not consistent with Swift's nominal notion of requirements (i.e. a concrete type cannot implicitly conform to a protocol even if it fulfills all the requirements). You can either do that yourself as an extension or just add the member as a requirement on the protocol.

Speaking of extensions, closed protocols allow extensions that are effectively dynamically dispatched to be added to protocols from anywhere (visitation is really just dynamic dispatch on a hierarchy, bolted on from the outside). You have to write the dispatch table yourself, but that's better than not being able to do it at all. Another fancy feature we can imagine is the compiler allowing you to write an extension containing only a declaration, which then forces you to extend every conforming type to implement what is declared; the compiler internally turns this into the exhaustive switch you would otherwise write by hand:

extension MyProtocol {
  func doSomethingNew() // By adding this, the compiler will raise an error here until every type conforming to `MyProtocol` defines this function in an extension
}

extension MyStruct1 {
  // This will get called when an `any MyProtocol` holding a `MyStruct1` receives a `doSomethingNew` call.
  func doSomethingNew() {
    print("I'm a MyStruct1!")
  }
}

...

This would be cool, and (unlike the previous hypothetical feature) consistent with Swift's type system. But to me it's a bonus that can be added, if ever, later. I wouldn't miss it dearly if it's never added because I can accomplish the same thing by implementing the "base" extension with an exhaustive switch (and even dispatch to corresponding extensions on each concrete type), which isn't a ton of boilerplate and only has to be done once (the kind of boilerplate I don’t like is the kind that scales with how much you use something).
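For reference, the hand-written version in today's Swift would look something like this, with the open-ended default that closed protocols would let us delete (the Impl naming is just illustrative):

extension MyProtocol {
  func doSomethingNew() {
    switch self {
    case let value as MyStruct1: value.doSomethingNewImpl()
    case let value as MyStruct2: value.doSomethingNewImpl()
    case let value as MyStruct3: value.doSomethingNewImpl()
    default: fatalError("Unhandled conformance to MyProtocol")
    }
  }
}

extension MyStruct1 {
  func doSomethingNewImpl() { print("I'm a MyStruct1!") }
}

// ...and likewise for MyStruct2 and MyStruct3.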

One more thing to cover for existentials is the metatype existential, any MyProtocol.Type. This should support exhaustive switching just like the existential for the protocol itself:

let value: any MyProtocol = getValue()

switch type(of: value) {
  case let struct1Type as MyStruct1.Type: ...
  case let struct2Type as MyStruct2.Type: ...
  case let struct3Type as MyStruct3.Type: ...
  case let subStruct1Type as SubStruct1.Type: ...
  case let subStruct2Type as SubStruct2.Type: ...
  case let subSubProtocolType as (any MySubSubProtocol).Type: ...
  // No open-ended default
}

How could the compiler implement these existentials? Well, the optimally efficient way would be to compile them to tagged unions (enums), so that the memory size of one equals the largest size of any possible inhabitant (plus the space needed for the type tag), and to implement the members by dispatching on the type tag instead of jumping over to a witness table. But if this would make them too incompatible with other existentials, then they should just be implemented the way existentials always are. Really, this is an entirely compile-time feature: marking a protocol as "closed" tells the compiler when it's okay to accept a switch without a default, because it knows one of the other cases will always get hit. The resulting compiled code doesn't need to change at all (except for omitting the default branch). The extra knowledge just enables new optimization opportunities.
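As a sketch of that optimal layout, the compiler might synthesize something morally equivalent to this enum for the example hierarchy above (hypothetical; nothing here is real compiler output):

enum _MyProtocolStorage {
  case myStruct1(MyStruct1)
  case myStruct2(MyStruct2)
  case myStruct3(MyStruct3)
  case subStruct1(SubStruct1)
  case subStruct2(SubStruct2)
  case mySubSubProtocol(any MySubSubProtocol) // the open branch stays an existential box
}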

Now what about generics? How does a closed protocol interact with generics when it’s used as a constraint? I believe there's nothing different about them (and same for opaque types). Since they behave just like protocols except when switching on them, and you only switch on them when you have an existential and need to recover the underlying concrete type (which you already have in a generic), there's no difference in capabilities inside a generic. For a T: MyProtocol you have access to the requirements on MyProtocol, same as always, and that's that.

What's important is what that constrained type parameter means: it guarantees it can only be bound to one of a known closed set of concrete types or a type that conforms to a non-closed refinement. This is a restriction on how that generic code can be called, and it's a powerful guarantee to know generic code can only be invoked with one of a closed set of known types. Simply having that guarantee strengthens type safety, even if values simply pass through generic code. But the way this emerges is simply as a consequence of a constraint to a protocol that doesn't allow additional conformances. This means if a library exposes generic code, either functions or types (including other protocols with associated types constrained to the closed protocol), those functions/types cannot be used with anything except types defined within the library itself. This restriction alone can prevent incorrect usages of a library at compile time.
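Here's a minimal sketch of what that buys a library author, assuming the proposed closed keyword (the Token types are invented for illustration):

// In the library:
closed public protocol Token {}
public struct Identifier: Token { public let name: String }
public struct Keyword: Token { public let text: String }

public func highlight<T: Token>(_ token: T) { /* ... */ }

// In a client module:
// struct CustomToken: Token {}   // error: cannot conform to a closed protocol
// highlight(CustomToken())       // so this misuse can never compile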

But since every restriction has a corresponding new capability, what is the new capability within this generic code, made possible by the fact you can enumerate all the possible concrete types the type parameter is bound to?

The answer is simply the same exhaustive switching. You can do this in generic code just as you can in code with existentials, because a value of type T: MyProtocol trivially erases to an any MyProtocol. But, while this isn't required and could be a later enhancement, it would be much more powerful if the type checker could track the restriction on the type parameter in switch cases. I think it would be better, and easier on the compiler, to add specific syntax for this.

For example, you should be able to exhaustively switch on a type parameter T, so that T gets rebound within the body of the switch case:

private var currentInt = 5
private var currentString = "Hello!"

func makeAValue<T: MyProtocol>() -> T {
  switch T {
    case MyStruct1:
      print("Making a MyStruct1")
      return MyStruct1(intValue: currentInt)
    case MyStruct2:
      print("Making a MyStruct2")
      return MyStruct2(intValue: currentInt, stringValue: currentString)
    case MyStruct3:
      print("Making a MyStruct3")
      return MyStruct3(stringValue: currentString)
    case SubStruct1:
      print("Making a SubStruct1")
      ...
    case SubStruct2:
      print("Making a SubStruct2")
      ...
    case MySubSubProtocol:
      print("Making some MySubSubProtocol")
      ...
    // No open-ended default
  }
}

struct MyStruct1: MyProtocol {
  let intValue: Int
}

struct MyStruct2: MyProtocol {
  let intValue: Int
  let stringValue: String
}

struct MyStruct3: MyProtocol {
  let stringValue: String
}

...

Notice the bare type names: no .Type patterns, and no any on the last case. This is different from switching on the runtime metatype.

In the first switch cases, T is rebound to the corresponding concrete type (so within the case the function is no longer generic). If one or more parameters of type T were in scope they could then be passed to non-generic code taking the matching concrete type. In the last case of the refined protocol, T is rebound to a further constrained generic parameter T: MySubSubProtocol and can then be used to call generic code with that same constraint.

Really this is a more general enhancement to generics that is just as usable with normal open protocols (you'd just need an open-ended default). But if this isn't feasible or is problematic for any reason, you can hack around it by falling back to erased existentials and force casts:

func makeAValue<T: MyProtocol>() -> T {
  switch T.self as any MyProtocol.Type {
    case is MyStruct1.Type:
      print("Making a MyStruct1")
      return MyStruct1(intValue: currentInt) as! T
    case is MyStruct2.Type:
      print("Making a MyStruct2")
      return MyStruct2(intValue: currentInt, stringValue: currentString) as! T
    case is MyStruct3.Type:
      print("Making a MyStruct3")
      return MyStruct3(stringValue: currentString) as! T
    case is SubStruct1.Type:
      print("Making a SubStruct1")
      ...
    case is SubStruct2.Type:
      print("Making a SubStruct2")
      ...
    case is (any MySubSubProtocol).Type:
      print("Making some MySubSubProtocol")
      ...
    // No open-ended default
  }
}

Here we're just exercising the same exhaustive switching capability on the existentials. The compiler can't tell this is type safe, so we have to tell it to just trust us. I expect the earlier type-safe example would just compile to this (possibly omitting the runtime checks and trap pathways of the force casts).

You can also do the same thing in a generic type:

struct Container<T: MyProtocol> {
  static var description: String {
    switch T.self as any MyProtocol.Type {
    case is MyStruct1.Type:
      "Container for MyStruct1"
    case is MyStruct2.Type:
      "Container for MyStruct2"
    case is MyStruct3.Type:
      "Container for MyStruct3"
    case is SubStruct1.Type:
      "Container for SubStruct1"
    case is SubStruct2.Type:
      "Container for SubStruct2"
    case is (any MySubSubProtocol).Type:
      "Container for MySubSubProtocol"
    // No open-ended default
    }
  }
}

Here we don't even need that feature of rebinding the type parameter and can just use switching on the existential.

You can also do this with values of T. This allows the behavior of a generic type to vary with its type parameter in more significant ways that don't require offloading the variation to T itself. It's really just another use of visitation from the outside:

import Combine

final class OptionsViewModel<T: MyProtocol>: ObservableObject {
  init(value: T) {
    self._value = value
  }

  private let _value: T

  var options: [String] {
    switch _value as any MyProtocol {
    case let struct1 as MyStruct1: options(for: struct1)
    case let struct2 as MyStruct2: options(for: struct2)
    case let struct3 as MyStruct3: options(for: struct3)
    case let subStruct1 as SubStruct1: options(for: subStruct1)
    case let subStruct2 as SubStruct2: options(for: subStruct2)
    case let subSubProtocol as any MySubSubProtocol: options(for: subSubProtocol)
    // No open-ended default
    }
  }

  private func options(for value: MyStruct1) -> [String] { ... }
  private func options(for value: MyStruct2) -> [String] { ... }
  private func options(for value: MyStruct3) -> [String] { ... }
  private func options(for value: SubStruct1) -> [String] { ... }
  private func options(for value: SubStruct2) -> [String] { ... }
  private func options(for value: any MySubSubProtocol) -> [String] { ... }
}

Here again we don't even need the fancy feature of rebinding T and can just use existentials, although it would be nice to switch on T itself and have the compiler know we can pass _value directly to those private functions.

Aside from this feature of rebinding type parameters by switching on them, all of these added capabilities within generic code are simply taking advantage of the added capability on existentials. I don't believe any changes to generics specifically are warranted here. The feature of rebinding type parameters by switching isn't specific to closed protocols or exhaustive switching.

Swift could also handle non-open classes in the same way. This would just be a matter of treating non-open classes as equivalent to closed protocols, so that you'd be able to exhaustively switch on the subclasses. I consider this less important and wouldn't really miss it much if it never gets supported, but that's because I don't use class hierarchies all that often.
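For illustration, it would look something like this (hypothetical: today this switch would demand a default):

public class Node {}                              // public but not open: no external subclasses
public final class TextNode: Node { var text = "" }
public final class ImageNode: Node { var path = "" }

func render(_ node: Node) -> String {
  switch node {
  case let text as TextNode: return "Text: \(text.text)"
  case let image as ImageNode: return "Image at \(image.path)"
  // No open-ended default
  }
}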

I believe that covers closed protocols. If anyone is unsure why this is useful, I can provide copious examples. The lack of this feature is the main reason Swift devs gratuitously overuse enums, and in doing so destroy significant amounts of type safety that could otherwise validate business rules at design time. The typical pattern: declaring a struct X with a corresponding XType enum stored as a let type: XType member, then doing runtime checks in code that requires specific "types" of X, with fatalErrors or thrown errors on a mismatch. I see this sort of thing all the time. It's devs creating their own type system the compiler can't validate. But making X a protocol, with a struct for each "type", loses the necessary ability to exhaustively switch on the type, which can only be recovered with workaround boilerplate and tedious wrapping/unwrapping.
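To spell out that anti-pattern in today's Swift (names invented for illustration):

// The dev's hand-rolled "type system": the compiler can't validate any of it.
enum AccountType { case checking, savings }

struct Account {
  let type: AccountType
  let balance: Double
}

func applyInterest(to account: Account) -> Double {
  // A runtime check for what should have been a compile-time guarantee.
  guard account.type == .savings else {
    fatalError("applyInterest called with a non-savings account")
  }
  return account.balance * 1.02
}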


Now what does any of this have to do with type disjunctions, like Int | String?

Well that's just sugar for a closed protocol. If we had closed protocols we could implement this disjunction ourselves:

closed protocol IntOrString {}

extension Int: IntOrString {}
extension String: IntOrString {}

func doSomethingWithAnIntOrString(value: some IntOrString) { … }

I've done this sort of thing several times. To do anything useful with such a value I'd need to find all the protocols both Int and String conform to whose requirements I want to access and make IntOrString refine all those protocols. But I can't exhaustively switch (which might be all I really wanted to do, making the search for common protocols unnecessary). For that I have to wrap it in an enum which is again tedious wrapping and unwrapping (and the biggest reason why structural sum types aren’t the right tool for this). But by supporting closed protocols, I gain the ability to exhaustively switch and don't need to define the enum anymore.
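With closed protocols, the whole workaround collapses to something like this (hypothetical closed keyword, exhaustive switch and all):

closed protocol IntOrString {}
extension Int: IntOrString {}
extension String: IntOrString {}

func describe(_ value: any IntOrString) -> String {
  switch value {
  case let int as Int: return "Int: \(int)"
  case let string as String: return "String: \(string)"
  // No open-ended default
  }
}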

The only problem is that this is a nominal protocol I introduced in my module. If someone else introduces the same one in their module, the two are incompatible. This actually isn't as bad as it first seems (or as bad as it would be with structural sums), because the concrete types would conform to both protocols. The incompatibility surfaces only when dealing with existentials or generic constraints, but that's certainly still important.

In fact this raises the question of whether the compiler should treat two closed protocols with the same set of conforming types as equivalent. It certainly could. But the answer is definitely no. That's structural typing again, and Swift is nominally typed. Plus, you wouldn't want clients coupling to an equivalence today only to have all their code break when one library adds a new conformance to its closed protocol (and the other library doesn't).

What we want is to be able to define Int | String as a structural protocol instead of a nominal one, specifically to ensure it is equivalent to every other use of Int | String... but also to relieve us of the boilerplate protocol declaration and extensions, which if we write by hand we might write incorrectly (we might accidentally extend something else to conform to IntOrString, or forget to extend one of the two). It's exactly analogous to why we want to define Codable as Encodable & Decodable instead of as a nominal protocol we'd then have to explicitly conform things to.

So the compiler can implement Int | String as sugar over that handwritten closed protocol; it just needs to do it in a way that makes every appearance of this type, in any module, the same type. I'm curious whether this causes any implementation challenges that haven't already been solved for type conjunctions. These protocols shouldn't even need their own witness tables, and I would expect the compiled code to be equivalent to using the lowest common protocol existential of all the types in the disjunction. It's just stronger type checking.

And similar to type conjunctions, neither the order nor the multiplicity of the types makes a difference. Int | String, String | Int and Int | Int | String should all be treated as identical protocols by the compiler.

Perhaps the central insight here is that we're not asking the compiler to directly support disjunctive type calculus, which is what I think the type theorists on the language team have pointed out causes a combinatorial explosion in the compiler implementation. This is instead built on existing features, namely protocols, with only the added ability to exhaustively switch on a set of types the compiler needs to search only a single module to enumerate (in the case of type disjunctions it's even easier: the list is right there in the name). A type disjunction isn't some significantly unique type. It's just a protocol that is conformed to only by the types it lists out.

This also means it acts as a generic constraint "for free". I can write generic code where T: Int | String, and this isn't some special constraint that's different from normal constraints. It's just a constraint to a protocol, one that the compiler writes and doesn't allow anything except Int and String to conform to it. That would be huge, because we’ve all wanted to write two retroactive extensions with different constraints, which Swift doesn’t support. But recast those different constraints as a single one by defining a protocol to cover the two cases and now you can (and if you want different behavior in each one, you can exhaustively switch and then dispatch).

If anyone is worried that leads to combinatorial explosion of type checking, remember you can do this today by hand-writing the disjunction protocol:

protocol IntOrString: Codable, CustomStringConvertible, … {}

extension Int: IntOrString {}
extension String: IntOrString {}

extension Array where Element: IntOrString {
  …
}

In fact, this is worse from a type checking perspective, because multiple modules that need this would each write their own separate protocol and declare their own conditional conformances.

Now ideally the protocol the compiler writes for you when you declare Int | String refines all the protocols that both Int and String conform to, not just to have access to those requirements on an any Int | String but also to be able to upcast one to a, e.g., any CustomStringConvertible. This should be simple for the compiler to calculate. The potential caveat I can think of is conformances in extensions. If I'm in a translation unit where I see that both Int and String have been extended to conform to MyProtocol, then ideally the requirements of MyProtocol are now available on Int | String. Again it should be simple for the compiler to figure that out while compiling that translation unit. And extending concrete types to conform to protocols causes a witness table to be built, so I imagine things don’t change at all at runtime. The added ability to call a requirement on an any Int | String that was added by retroactive conformances to both Int and String would be the same thunk to the corresponding witness table, right? The compiler just needs to check for conformances by each type in the union to be sure that thunk doesn’t compile to UB.

In fact, like I said, I don't even consider it essential that the requirements of common protocols that Int and String directly conform to (not in retroactive extensions) be available on Int | String. If I can just exhaustively switch on it, I'm happy. That gives me access to common requirements anyway.

The example so far involves only concrete types. What happens if one of the types in the disjunction is a protocol? In particular, Int | any MyProtocol is not the same as Int | MyProtocol. What's tricky is that neither of the underlying protocols these would translate to can be written in Swift today (even ignoring the "closed" part).

Int | any MyProtocol is sugar for this:

closed protocol IntOrAnyMyProtocol {}

extension Int: IntOrAnyMyProtocol {}
extension any MyProtocol: IntOrAnyMyProtocol {} // Not possible

User-defined extensions of language-provided existentials are a critical feature Swift will have to gain if it wants to fully eliminate the hand-written type eraser (i.e. AnyHashable) problem. But while we can't write this today, I don't know if that means the compiler can't write it. That's really all Int | any MyProtocol is... a protocol conformed to by exactly two types, one of which happens to be a protocol existential. You exhaustively switch on those exact two cases.

The other option, Int | MyProtocol, is a problem for another reason. It would also be sugar for a feature I know has been discussed before: generic extensions, which are what retroactive protocol refinements are "really" trying to do:

closed protocol IntOrMyProtocol {}

extension Int: IntOrMyProtocol {}

extension <T> T: IntOrMyProtocol where T: MyProtocol {} // Also not possible

The difference is that while Int | any MyProtocol has exactly two conformances, Int | MyProtocol has conformances by both Int and every type that conforms to MyProtocol.

Well what's the difference really? Doesn't that just end up being the same as the first one? After all the set of concrete types that can inhabit either one are the same. Why not just make the latter an alternate spelling of the former?

Because it's only the existential that's equivalent. any Int | MyProtocol would indeed be the same as any Int | any MyProtocol. Exhaustive switching would be the same cases for both, and that's true even if MyProtocol were closed (allowing the cases to be Int and all the conformances or open refinements of MyProtocol).

Things are different in the other two ways that protocols can be used besides existentials: as opaque types and generic constraints.

As an opaque type, some Int | MyProtocol is not the same as some Int | any MyProtocol. The latter allows returning an existential, any MyProtocol, and the opaque type is just a wrapper for that existential; it requires dynamic dispatch and existential boxing. The former requires returning a known concrete type conforming to MyProtocol (or an Int of course), and the resulting opaque type wraps whatever that concrete type is, with no boxing.

Switching on it would still require boxing the value in an existential. But if type disjunctions supported access to common requirements, and both Int and MyProtocol were subtypes of a common protocol with requirements not available on an existential (like a Self requirement), this requirement would be accessible on some Int | MyProtocol but not some Int | any MyProtocol. Correspondingly, MyStruct1 | any MySubProtocol would not refine MyProtocol because any MySubProtocol doesn’t conform to MyProtocol (existentials don’t conform to any protocols). But MyStruct1 | MySubProtocol would refine MyProtocol.

It's a similar situation with generic constraints. If a type parameter T is constrained to Int | MyProtocol, you'd have access to all requirements of protocols common to the two; if it were constrained to Int | any MyProtocol, you'd only have access to requirements of common protocols that are available on existentials. And you'd only be able to forward a T: MyStruct1 | MySubProtocol to generic code constrained to T: MyProtocol, not a T: MyStruct1 | any MySubProtocol.

Since the underlying code Int | MyProtocol is sugar for would require generic extensions, and I'm assuming we don't have those today for good reason, I think this sort of type disjunction should just not be allowed unless or until generic extensions are supported. I don't think that's a very painful limitation. Again the main thing I want is exhaustive switching, and that works the same on both of these. So the compiler can just tell you that you have to use existentials in type disjunctions.

So there you go. Type disjunctions as sugar over closed protocols, first phase can support nothing except for exhaustive switching (don't even expose common protocol requirements). That handles the main case people probably need for typed throws. They want to declare throws(E1 | E2) so that the catch can handle E1 and E2 without an open-ended default. That's just exhaustive switching. It would be annoying if E1 | E2 didn't automatically upcast to any Error (a consequence of not chasing down the common base types) but this could be handled as a special case similar to how any Error conforms to Error as a special case today. That’s assuming it really is a problem to handle that in general. The type checking to determine common conformances doesn’t seem to suffer a combinatorial explosion (possibly because a type disjunction is not a nominal protocol that anyone can explicitly conform to, which is a key difference from type conjunctions, where conformance to one is sugar for conformance to all the individual protocols), and I don’t expect anything to change at runtime since there aren’t even any additional witness tables.

Closed protocols plus type disjunctions as sugar over them would be an absolute game changer for error handling, especially now that we have typed throws. Errors are one of the best examples of the overuse of enums. Error categories are often hierarchical and even nontrivial many-to-many graphs, which are awkward or impossible to model as enum cases. Imagine being able to write a simple catch table that can cover individual error cases, or intermediate categories of errors, without having to repeatedly unwrap cases to get access to the more granular cases within that case! You basically can’t model something as simple as HTTP errors cleanly with enum cases (unless you just give up on capturing categories like client vs. server errors). That’s easily and gracefully modeled by closed protocols. If errors with multiple subtypes were generally defined as closed protocols refining Error, type disjunctions in throws statements would elegantly combine into throw types that can always be handled by a simple one-level catch table with the flexibility to catch individual types or catch an intermediate category where needed, all without open-ended defaults. This probably also leads to powerful techniques to catch and handle subsets of errors while propagating the remaining ones up as disjunctions that omit the types you handle yourself.
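Here's a sketch of the HTTP example under this proposal (hypothetical syntax throughout; the error types and the catch table are invented for illustration):

closed protocol HTTPError: Error {}
closed protocol ClientError: HTTPError {}
closed protocol ServerError: HTTPError {}

struct NotFound: ClientError { let path: String }
struct Unauthorized: ClientError {}
struct InternalServerError: ServerError {}
struct BadGateway: ServerError {}

func fetch() throws(any HTTPError) -> String { /* ... */ }

do {
  _ = try fetch()
} catch let error as NotFound {
  print("Missing resource: \(error.path)")  // catch one granular case...
} catch is any ClientError {
  print("Fix the request")                  // ...or an intermediate category...
} catch is any ServerError {
  print("Retry later")                      // ...with no open-ended catch needed
}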


This is unsound. Consider

public sealed protocol P {}
public protocol Q: P {}
 
struct X: P {}

// in a module that imports this one:
// struct Y: Q {}

public func foo(_ x: any P) {
  switch x {
  case let x as X: print("X")
  }
}

// in a module elsewhere
// foo(Y()) // 💥

The property of being "sealed" is a form of access control applied at conformance sites. Failure to inherit the "sealed" trait is like having an internal protocol with a public refinement. The proposed fix of "just switch on the derived protocol" doesn't work if any other module can define refinements that the original module cannot have been designed to handle. To recover the property you need in order to guarantee exhaustivity, you'd need to say the descendants of sealed protocols must be defined in the module that defines the sealed protocol. But then that seems to lose a lot of the benefits one would hope to gain from this re-opening feature.

Speaking of compiler performance, I'm nervous about extending exhaustivity analysis to require whole-module information (for a given protocol, gather all conforming types and, depending on which other features land, all their descendants), whereas before it could get by with per-declaration checking, as we only really guarantee this analysis for individual enums.

That said, the above shouldn’t be taken as a criticism of the rest of the post. You’ve expounded a great deal on something that I believe has promise. For example, one reason existentials are problematic for embedded Swift is because the compiler cannot guarantee a fixed layout for the underlying type and also maintain the open world assumption. This is not so with sealed protocols and their conformances which we could elect to fix an ordering for and compile as multi-payload enums as you’ve no doubt surmised.


For reference, the ability to "unseal" in this way is how it works in Kotlin. This is explained here.

The proposed fix of “just switch on the derived protocol” doesn’t work if any other module can define refinements that the original module cannot have been designed to handle

I might be misunderstanding you, but if my reading of this is correct, you don't switch on the derived protocol, you handle all conformances to the refinement as an existential. In your example, once public protocol Q: P is defined (which must happen in P's module, since P is sealed), the switch on only X no longer compiles. You have to update it to this:

public func foo(_ x: any P) {
  switch x {
  case let x as X: print("X")
  case let q as any Q: print("Q")
  }
}

And the ability to exhaustively switch stops there. No one (in any module) can exhaustively switch on an any Q, because it isn't sealed, so when they exhaustively switch on any P, they can drill down to any Q but no further.

So I don't think it's unsound, but that doesn't mean it's a good idea. Like I said, I wouldn't protest too much if a consensus emerges that re-opening shouldn't be allowed. I'm struggling to think of a clear use case for it anyways.

Regarding compiler performance, I believe this feature can and certainly should be done in a way that follows the zero overhead principle, in that it should cost nothing when you don't use it. That's why it's essential that it's opt-in: you only gain this capability when you explicitly request it, and for this it may even be necessary to distinguish "sealing" or "closing" a protocol from merely disallowing external conformances (I talked about this in this thread).

I imagine it would work by the compiler seeing the sealed keyword, enabling the extra work to collect the list of subtypes, and then storing that list in the module interface. A concern there might be: if a module decides to use this, does it cost anything for importers of that module who aren't taking advantage of it? Hopefully not, as it's just a little extra metadata the compiler can ignore until it sees switches without defaults. Of course, importing open source packages can make it a problem for everyone, but that's a general problem SPM needs to figure out (why is it ever compiling package dependencies except once after a version is pulled down, and I mean for my entire dev machine?)

That way, the compiler would only get bogged down chasing an excessively large or complex hierarchy of types if someone decides to mark that hierarchy as sealed, which indicates they want that and might be willing to pay for it... and if it turns out the price tag is too high, they can just turn it back off.

and compile as multi-payload enums as you’ve no doubt surmised

Yes exactly. Knowing all the possibilities means heap allocations can be entirely avoided.

(What if it gets re-opened by a refinement? That would just mean one of the payloads is an existential, but at least you for sure avoid allocations for other cases).