Sealed protocols

I don't mean to suggest that sealed should imply @_frozen.

I'm pointing out that sealed, on its own, would not permit you to get rid of the default case when switching over, say, StringProtocol without a guarantee that the protocol is not just sealed but frozen, which would require extending @_frozen (an orthogonal feature) to protocols.

I don't either. The code I am talking about is in the same module as the protocol declaration. Exhaustive switching makes sense any time all conformances are statically known: within the declaring module for non-public and sealed protocols, and everywhere for @frozen sealed protocols.

Great, I think we agree completely. My reply was principally to @Karl and the broader point was that one needn't consider sealed to be "less accessible" than public for the reasons I mentioned above, namely that it both adds and subtracts possibilities, just like open does.

Oh, certainly. The thing is, a lot of the time when I see this, it's because the protocol has something which is basically a requirement, but people don't want to make it a formal requirement because it involves non-public types, or more generally involves implementation details which others can't reasonably conform to. So I think non-public requirements could remove a lot of that downcasting.
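Roughly the pattern I mean, as a sketch with hypothetical names: the "real" requirement involves an internal type, so it can't become a formal requirement on the public protocol, and the implementation falls back to downcasting to each known conformer.

public protocol Exportable {
    var filename: String { get }
}

internal struct InternalWriter { /* module-private machinery */ }

internal struct PDFExport: Exportable {
    var filename: String { "report.pdf" }
    func write(using writer: InternalWriter) { /* uses non-public types */ }
}

internal func export(_ value: Exportable) {
    // write(using:) can't be a protocol requirement because InternalWriter isn't public,
    // so we downcast instead; a non-public requirement would remove this.
    if let pdf = value as? PDFExport {
        pdf.write(using: InternalWriter())
    }
}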

In any case, it would be a source-compatible change which we could add at any time. Your existing switch statements would simply warn on the default case if the other cases already covered all possible subtypes.
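To sketch what I mean with hypothetical names (the sealed modifier itself doesn't exist yet): today a switch over a protocol existential needs a default even when every conformance is known, and that default is what could start warning and eventually disappear.

protocol Token {}  // imagine this were marked 'sealed' (hypothetical modifier)
struct Identifier: Token { var name: String }
struct Literal: Token { var value: Int }

func describe(_ token: Token) -> String {
    switch token {
    case let id as Identifier: return "identifier \(id.name)"
    case let lit as Literal: return "literal \(lit.value)"
    // Required today; with sealed, this is the case that would warn and could
    // eventually be dropped once exhaustiveness is known:
    default: fatalError("unreachable")
    }
}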

Correct. sealed would be enough within the declaring module, but not across modules.

Well, for one thing, open is more accessible than public. From the proposal:

For the purposes of interpreting existing language rules, open is a higher (more permissive) access level above public.

By the same logic, a sealed protocol is less permissive than a public protocol, yet is still exported as part of your library's "public interface". The word "open" kind of implies public in a way that "sealed" doesn't, really, so I'd prefer to keep them separate.
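For reference, a quick illustration of that rule, with hypothetical names, from the point of view of a client module importing the library:

// In the library:
open class OpenBase {}      // clients may subclass this
public class PublicBase {}  // visible to clients, but only subclassable inside the library

// In a client module:
class Fine: OpenBase {}
// class NotAllowed: PublicBase {}  // error: PublicBase is not open outside its defining module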

In fact that proposal does a great job of summarising why we want this for protocols, too. Maybe the core team should re-read the excellent arguments presented there before deciding if they'd consider sealed-by-default :wink:

It seems like a hard sell to me. Like frozen-ness of enums, it's only really a resilience concern for ABI-stable libraries, and if StringProtocol is the only motivating example you can find from the standard library, that suggests to me that the current default hasn't been much of a problem so far. (StringProtocol is a poor example in general; given more time, I'm under the impression from @Ben_Cohen that his team would rather have eliminated it altogether from public API.)

It's not only a problem for ABI-stable libraries. It's really part of a larger goal to split visibility from conformance, which would allow all kinds of great enhancements, like the ability to add requirements while retaining source stability. If you can't add a reasonable default and just create something which traps instead, you lose conformance checking in your own types... and yeah, you might trap if somebody else conformed.
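As a minimal sketch of that situation (names are hypothetical): adding a requirement to an already-published protocol needs a default implementation to stay source-compatible, and when no sensible default exists the fallback traps, so the compiler stops telling you when one of your own types forgot to implement it.

public protocol Renderer {
    func draw()
    func drawPreview()  // requirement added after the protocol first shipped
}

public extension Renderer {
    // A default is needed for source compatibility with existing conformances.
    // There's no reasonable behaviour here, so it traps at runtime, and we lose
    // the compile-time error for our own conforming types that forget it.
    func drawPreview() {
        fatalError("\(Self.self) does not implement drawPreview()")
    }
}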

The standard library isn't really a great place to look, because it provides common-currency protocols like Collection, Codable and Numeric, which are definitely designed for you to conform your own types to them. The same goes for Foundation; part of its goal is to provide a set of common interface types for everybody. They are really too low level for this to surface often - typically this will occur in higher-level libraries, where you want to hide implementation details behind an erased interface.

For the standard library specifically, it mostly comes down to implementation details. For example, you could rewrite the _Variant storage enums with a private protocol. In theory it should be just as fast and require less code, but I think in practice the optimiser doesn't handle it as well (yet).

Well, the question Matthew asked was specifically about breaking source compatibility. We don't want to break source compatibility unless it's clear that the existing behavior is actively harmful, and given the existing body of libraries we have to look at, it doesn't seem to be.

Well, for a private protocol, sealed-ness doesn't matter for optimization purposes, since the closed set of conformances can be known at compile time no matter what. It would require the same amount of code either as an enum or a protocol, just factored differently.

Enums require you to switch and basically do your own dynamic dispatch. It would be less code if you just had a protocol which declared the requirement signature (maybe significantly less, depending on how many requirements/conformers you have).

Another benefit is that by using a protocol, it's easier to experiment with new storage types - e.g. you could easily introduce a dummy/logging storage for testing/benchmarking. Tests can't add a new case to an enum, though.

But yes, non-public protocols are implicitly sealed. We should totally give them special existential layouts and dispatching so they optimise down to the same thing as those _Variant enums do. This will be a bigger deal as we get expanded support for existentials.

And protocols require you to declare a method requirement for each switch point, and every conformance must then implement all of those requirements. If you have M types and N dispatched operations, you end up with M * N case blocks or method implementations either way. There can certainly be different benefits and costs to organizing code one way or the other, but the size of the matrix doesn't change.

Something to consider here: how would you teach this? Overall, I think sealed protocols are a more flexible and adaptable tool for managing operations over closed sets of types than enums, but given that we already have enums, if we add sealed protocols, we'd have two tools for very similar situations. We would need to be able to explain when it's appropriate to use enums or closed protocols.

There are some situations where you absolutely must use protocols. For example, if you need to use the protocol as a generic constraint which accepts the "case" types, you don't have a choice. On the other hand, if you want exhaustive switching you have to use an enum (at least presently).
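A quick sketch of the generic-constraint case, with hypothetical names; there's no way to write this constraint in terms of an enum, because the type parameter ranges over the conforming types themselves.

protocol Shape {
    var area: Double { get }
}
struct Circle: Shape {
    var radius: Double
    var area: Double { Double.pi * radius * radius }
}
struct Square: Shape {
    var side: Double
    var area: Double { side * side }
}

// S is constrained to the "case" types; an enum can't play this role.
func largest<S: Shape>(of shapes: [S]) -> S? {
    shapes.max { $0.area < $1.area }
}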

Personally, I would like to see the drawbacks of each approach lifted as much as possible which would inevitably bring them closer together. In an ideal world it may even be possible to unite the concepts altogether, making one just syntactic sugar for the other. That would make it harder to choose between them in some use cases but I think that's ok.

I think explaining the advantages and limitations of each approach in a clear way is enough. We should help people understand the tradeoffs involved and let them make their own choice about what tradeoff is best to make in a specific situation.

Enums are really powerful; you can do a lot with them (albeit with a lot of boilerplate). With private cases, they'd get even more powerful. Still, that doesn't mean they're always the best solution for everything.

Protocols provide an abstract interface over a set of types. That is certainly one of the things that enums can do, but protocols are designed for that kind of work and are generally more expressive. For example, you can't mimic associated types with an enum, because you'd need functions to have different return types based on the dynamic case. Your only option would be to return another enum.
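A small sketch of that limitation (hypothetical names): the requirement's return type differs per conforming type, which an enum method can't express because it has to pick one static return type for every case.

protocol Storage {
    associatedtype Element
    func first() -> Element?
}

struct IntBuffer: Storage {
    var values: [Int]
    func first() -> Int? { values.first }
}

struct NameBuffer: Storage {
    var values: [String]
    func first() -> String? { values.first }
}

// An enum wrapping IntBuffer and NameBuffer would need a single return type for
// first() - in practice yet another enum - rather than a per-case Element.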

It also does require less code. With both enum dispatch and protocols, there will be a declaration, and every type needs to implement each requirement. But with an enum, each of the N requirements also needs an extra M lines of switch cases to perform the dispatch that a protocol gets for free from the witness table (dynamically, or statically via the optimiser). So that's another M * N lines.

Another really important point is that sealed protocols don't need to stay sealed forever. As the open/public class proposal explains, designing a base class which is appropriate for subclassing is not trivial. In early development, you might make superficially non-breaking changes like rearranging the order in which certain open methods are called, and accidentally break somebody's subclass. The same is true of protocols - designing a truly abstract interface like Collection or Numeric requires a lot of thinking and careful documentation. We made lots of changes to Collection over the years (the new index model, removing IndexDistance). Numeric gained inheritance from Addable a few weeks ago, which is technically ABI-breaking. A protocol which is sealed while you work out the requirements can be opened to external conformances at any time.

Sealed or not, protocols are the best thing for type erasure, not enums.

I agree with most of that, but I'm not sure it directly answers my question. That explains why sealed protocols are a desirable feature, but it doesn't help explain why enums also exist and when you should use them.

It's not really about when you should use enums, it's more about when you shouldn't.

Don't use them for type erasure: I mean you can emulate a protocol if you want, but it's more boilerplate, less flexible, and should get lowered down to the same thing anyway.

Enums can do more than just type erasure. If that's what you're doing - go for it.

Enums are quite obviously preferable when you have simple C-type enums (especially when they have a raw value). They are also often the right solution when writing state machines, and they're a better fit for some use cases that might eventually be covered by generalized existentials but cannot be done with protocols today.
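For instance, a simple state machine with hypothetical states is a natural fit for an enum, because exhaustive switching over a closed set of cases is exactly what you want there.

enum ConnectionState {
    case idle
    case connecting(attempt: Int)
    case connected
    case failed(message: String)
}

func nextState(after state: ConnectionState) -> ConnectionState {
    switch state {  // exhaustive: no default needed
    case .idle:
        return .connecting(attempt: 1)
    case .connecting(let attempt) where attempt < 3:
        return .connecting(attempt: attempt + 1)
    case .connecting:
        return .failed(message: "too many attempts")
    case .connected, .failed:
        return state
    }
}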

Except that, for now, they offer exhaustive switching and protocols don't, which can be useful if you want to be able to recover the type information. From a pragmatic standpoint, enums also provide a central location where the full set of possible cases is documented. This is something tooling could address for sealed and @frozen sealed protocols in time, but it is not something that is as immediately clear when just reading the text of the source.

So there are tradeoffs right now, but hopefully less on both sides in time...

Eh, this is digressing a little bit, but I actually think enums are not the best thing for C-type enums, which is kind of interesting. Going to/from raw values requires a switch on the dynamic value. If you wrap the C integer in a struct, you can feed it back into a C API directly.
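A rough sketch of the struct alternative, with hypothetical names (including the C functions in the comments): the raw value travels to and from C without any switch, and values that C adds later stay representable.

struct StatusCode: RawRepresentable, Equatable {
    var rawValue: Int32

    static let ok      = StatusCode(rawValue: 0)
    static let timeout = StatusCode(rawValue: 1)
}

// No switch in either direction:
// let status = StatusCode(rawValue: c_get_status())  // any Int32 round-trips
// c_set_status(status.rawValue)                      // feeds straight back to C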

Yeah for libraries that's kind of... not great. That's a pretty strong commitment to never add cases ("conformers"). It leaks implementation details - I mentioned this when talking about unions before.

What I meant was C-style enums which are basically a wrapper around an int. I don't have much experience working directly with C APIs from Swift, so maybe there are interop issues that motivate a different approach. However, I usually try not to let interop concerns impact the Swift interface of APIs I write, so I might not worry about having to write a switch for interop if the enum was the best design for Swift code using the API.

Well, this tradeoff depends a lot on circumstance. I said "which can be useful if you want to be able to recover the type information", which you may not always care about. Or you may only care within a specific context, such as the implementation of a library (while not wanting to make external commitments). But I agree that the commitments made by each approach in a contract at a system boundary are important to consider.

I am not making any points about what tradeoffs are better or worse. I am only trying to describe the tradeoffs that exist. With that shared understanding, it's easier to talk about specific use cases and discuss why one might be a better choice than the other.

And for a protocol, you have additional syntax from M * N method definitions, whereas switches don't need to be alone in their own methods, and even if they are, you only need N method definitions. I don't think the "amount of code" argument is compelling or necessary to make your case; it's a distraction at best.

My point was that (for teaching purposes), if somebody writes a data structure and then decides "oh, I could write some kind of optimised storage for this particular case; I need to talk about this type in a more abstract way", or realises that some algorithm could return an optimised result for a particular case, they are doing type erasure and should go for a protocol (or a class).

Even if it's technically possible with an enum, this way is just easier to write and maintain, and should have no performance consequences (assuming the types are not public; cross-module is more complicated).

It's probably better to talk about with some code.

enum MyEnum {
  case a(A)
  case b(B)

  func aReq() -> Int {
    switch self {
    case .a(let a): return a.aReq()
    case .b(let b): return b.aReq()
    }
  }
}

extension A {
  func aReq() -> Int { /* ... */ }
}
extension B {
  func aReq() -> Int { /* ... */ }
}

======= vs =======

protocol MyProto {
  func aReq() -> Int
}
extension A: MyProto {
  func aReq() -> Int { /* ... */ }
}
extension B: MyProto {
  func aReq() -> Int { /* ... */ }
}

