Sealed protocols

Eh, this is digressing a little bit, but I actually think they are not the best thing for C-type enums, which is kind of interesting. Going to/from raw values requires a switch on the dynamic value. If you wrap the C integer in a struct, you can feed it back into a C API directly.
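To make that concrete, here is a sketch of the two approaches (the C function and all names are hypothetical stand-ins, since a real example would come from an imported C header):

```swift
// Hypothetical stand-in for an imported C function taking a raw Int32.
func c_set_mode(_ rawMode: Int32) { /* imagine this calls into C */ }

// Enum approach: RawRepresentable's init?(rawValue:) is effectively a
// switch on the dynamic value, and unknown values have no representation.
enum ModeEnum: Int32 {
  case fast = 0
  case safe = 1
}

// Struct approach: the wrapper *is* the integer, so it can be fed
// straight back into the C API, and unknown future values round-trip.
struct Mode: RawRepresentable, Equatable {
  var rawValue: Int32
  static let fast = Mode(rawValue: 0)
  static let safe = Mode(rawValue: 1)
}

c_set_mode(Mode.fast.rawValue)    // no switch, no optional unwrapping
let unknown = Mode(rawValue: 99)  // fine; ModeEnum(rawValue: 99) would be nil
_ = unknown
```

The struct's synthesized memberwise `init(rawValue:)` is non-failable, which is exactly what lets values the library has never heard of pass through unchanged.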

Yeah for libraries that's kind of... not great. That's a pretty strong commitment to never add cases ("conformers"). It leaks implementation details - I mentioned this when talking about unions before.

What I meant was C-style enums, which are basically a wrapper around an int. I don't have much experience working directly with C APIs from Swift, so maybe there are interop issues that motivate a different approach. However, I usually try not to let interop concerns impact the Swift interface of APIs I write, so I might not worry about having to write a switch for interop if the enum was the best design for Swift code using the API.

Well, this tradeoff depends a lot on circumstance. I said "which can be useful if you want to be able to recover the type information", which you may not always care about. Or you may only care within a specific context, such as the implementation of a library (while not wanting to make external commitments). But I agree that the commitments made by each approach in a contract at a system boundary are important to consider.

I am not making any points about what tradeoffs are better or worse. I am only trying to describe the tradeoffs that exist. With that shared understanding, it's easier to talk about specific use cases and discuss why one might be a better choice than the other.

And for a protocol, you have additional syntax from M × N method definitions (N requirements across M conforming types), whereas switches don't need to be alone in their own methods, and even if they are, you only need N method definitions. I don't think the "amount of code" argument is compelling or necessary to make your case; it's a distraction at best.

My point was that (for teaching purposes), if somebody writes a data-structure, then decides "oh, I could write some kind of optimised storage for this particular case; I need to talk about this type in a more abstract way", or that some algorithm could return an optimised result for a particular case, they are doing type erasure and should go for a protocol (or a class).

Even if it's technically possible with an enum, this way is just easier to write and maintain, and should have no performance consequences (assuming the types are not public; cross-module is more complicated).
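As a concrete sketch of that scenario (all names hypothetical): a data structure abstracts over its storage with a protocol, so an optimised representation can be swapped in later without touching callers:

```swift
// The abstract view of the storage; clients talk to this.
protocol IntStorage {
  var count: Int { get }
  func value(at index: Int) -> Int
}

// General-purpose storage backed by an array.
struct ArrayStorage: IntStorage {
  var elements: [Int]
  var count: Int { elements.count }
  func value(at index: Int) -> Int { elements[index] }
}

// Optimised storage added later for the special case of a constant run:
// O(1) memory instead of O(n).
struct RepeatedStorage: IntStorage {
  var element: Int
  var count: Int
  func value(at index: Int) -> Int { element }
}

func makeStorage(repeating element: Int, count: Int) -> IntStorage {
  RepeatedStorage(element: element, count: count)
}
```

Adding `RepeatedStorage` after the fact only requires a new conformance; an enum would have needed a new case and an edit to every switch.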

It's probably better to talk about this with some code.

enum MyEnum {
  case a(A)
  case b(B)

  func aReq() -> Int {
    switch self {
    case .a(let a): return a.aReq()
    case .b(let b): return b.aReq()
    }
  }
}

extension A {
  func aReq() -> Int { /* ... */ }
}
extension B {
  func aReq() -> Int { /* ... */ }
}

======= vs =======

protocol MyProto {
  func aReq() -> Int
}
extension A: MyProto {
  func aReq() -> Int { /* ... */ }
}
extension B: MyProto {
  func aReq() -> Int { /* ... */ }
}


The way I see it, enums are just much easier to reason about because you know in advance all the possible cases there are (e.g. a Result type). The information about how to deal with the different cases is encapsulated in each function / method that uses the enum, while with protocols that information is to some extent delegated to conforming types. And even with sealed protocols, your different conformances could be spread out over a large module.

Protocols open up a lot of flexibility (which is needed for highly generic code or when you know the list of possible conformances is not somehow naturally limited etc.), but they also add indirection and thus the code is not as easy to inspect (even though the compiler does prevent any obvious bugs).

I see the discussion as being somewhat similar to using sum types vs. typeclasses in Haskell, so it's probably helpful to read up a bit on the Haskell community consensus about when to use which (of course, there are some differences; for example, you can't use Haskell typeclasses directly to build heterogeneous lists, etc.).

This doesn't feel like an idiomatic use of enums to me; the individual methods on A and B are superfluous and could just as well be written inline in the switch. As you said, if it's interesting for A and B to both share the API individually and as a single abstraction, then a protocol makes much more sense.

Setting aside temporary limitations, I think it'd help to look at the fundamental difference between enums and protocols to help derive guidance about when each is ideal. For a protocol, the "cases" are always types with individual identity, and the protocol itself does not directly have a type identity. (Existential types exist, sure, but from a model perspective I think it makes sense to consider these structural types derivable from generic constraints rather than something fundamentally tied to protocols.) On the other hand, an enum is the reverse—the cases have no type identity, but the sum of them does. That suggests that enums are at least the best tool for generic abstract sums like Optional and Result, since a Some<T> or Success<T> type would have little use outside of the enum, and for enums with lots of no-payload cases, since each empty case is likewise of little use as a type independent of the collection of cases.
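That guideline is easy to see in a Result-like sum, sketched here as a hypothetical `Outcome` (to avoid colliding with the standard library's `Result`): neither case is a useful type on its own; only the sum carries meaning.

```swift
// A minimal Result-like sum. A standalone `Success<Value>` or `Failure`
// type would have little use outside of this enum.
enum Outcome<Value> {
  case success(Value)
  case failure(Error)
}

struct ParseError: Error {}

func parse(_ s: String) -> Outcome<Int> {
  if let n = Int(s) { return .success(n) }
  return .failure(ParseError())
}
```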

11 Likes

In alignment with the notion of identity you're describing, protocols have the advantage of describing subsets of types via inheritance:

sealed protocol ABC { }
struct A: ABC { }
struct B: ABC { }
struct C: ABC { }

func takesABorC(_ abc: ABC) { /* ... */ }

protocol AB: ABC { }
extension A: AB { }
extension B: AB { } 

func takesAorB(_ ab: AB) { /* ... */ }

With enums, there's no simple way to express restriction among values without creating a separate type and a function to convert between the two.

enum ABC { case a, b, c }

func takesABorC(_ abc: ABC) { /* ... */ }

enum AB {
    case a, b

    var asABC: ABC {
        switch self {
            case .a: return .a
            case .b: return .b
        }
    }
}

func takesAorB(_ ab: AB) {
    /* requires a call to `asABC` to do `ABC` things with `ab` */
}

This is a great way of looking at it. The guideline would then be to use enums when the cases don't have a meaningful independent identity and the API offered is defined in terms of the entire sum. When the "cases" do have a meaningful independent identity a protocol should be used, even when the sum itself also has a meaningful identity (i.e. the existential).

In order to make this guideline as viable as possible, we need to remove the current limitations of protocols. Generalized existentials will help to make protocols more viable in cases where a generic enum might be a better choice today. Exhaustive switch over (sealed and non-public) existentials will allow protocols to be used without sacrificing the ability to organize code by method rather than by type, or to switch inline in part of a larger operation if necessary. Ideally, existentials of sealed and non-public protocols would also receive a representation that is just as optimized as enum representations (which would hopefully avoid the boxing existentials require in many cases).

1 Like

It should also be noted that making a protocol sealed as proposed is not by itself enough to have exhaustive knowledge of the types conforming without significant implementation complexity and compile-time cost, so this proposal probably should not promise optimizations or language features that depend on this knowledge. Private types and local types within a module are not normally visible outside of their scope, and if we were going to base language features like exhaustive pattern matching or optimizations on this knowledge, it would require that every translation unit exhaustively scan the source of the program to find any private or local types that might be conforming, which would not normally be necessary.

Furthermore, layout optimizations on existentials are not as trivial as they may seem, since existentials are structural types, and significant amounts of runtime code expects existentials to have certain layouts. Existential-specific layout optimizations could also end up pessimizing conversions between related existential types, since their optimized forms could have significantly different layouts. An optimization pass could conceivably replace an existential by a synthesized enum in places where it's a concrete type, but we would likely have to reabstract to the generic existential representation any time we dynamically manipulate the existential type. It would also be impossible to lay out existentials for sealed public protocols differently without breaking ABI if we want it to be resilient to remove sealed-ness.

There are other benefits to this proposal for sure, but if exhaustive knowledge of conforming types is an important goal of this proposal, it would need some adjustment to achieve that goal, such as possibly restricting the conformance of private or local types. Otherwise, the proposal text should not take this for granted and should probably not promise anything about layout optimizations or exhaustive pattern matching.

6 Likes

I didn't mean to imply that this proposal would promise any of that. @Karl has already made it clear that it is out of scope for this proposal. sealed protocols have plenty of merit without those features. But I would like to see these directions explored in the future as they would reduce the number of tradeoffs we face when making design decisions. It would be fine if they were subject to limitations (such as all conformances must be visible in order to switch exhaustively).

I don't know too much about layout optimizations but it would be nice to be able to choose the logical semantics of protocols and existentials without having to pay for boxing. Conversions between related existentials are not always necessary so that cost isn't always a factor. It would sometimes be acceptable as a tradeoff if we had the ability to influence the choice of layout. re: resilience, I think it would be fine to require @frozen sealed for layout optimization of public protocols.

More specifically, the proposal draft at sealed protocols by karwa · Pull Request #972 · apple/swift-evolution · GitHub says:

Similarly, when the compiler has knowledge about the conforming types, it can use optimised operations to handle existentials. Currently, we advise to make protocols which are only conformed-to by classes inherit AnyObject, but this then becomes part of the protocol's ABI and clients may depend on it. sealed protocols have the possibility to lower this to an 'informal' optimisation within the declaring module, and support more patterns between conforming types.

and the proposal makes passing references to optimizability in other places, which is not really possible as proposed.

I see. Definitely makes sense to remove this text then. I think the feature has sufficient motivation without discussing optimizations.

I agree with Joe. I fear it will bring up many resilience and access control discussions which are not really the important problems to be facing at this point in Swift's evolution. I'm personally very -1 on this feature, from a "it adds a bunch of complexity for very little gain" perspective.

Edit: The complexity I'm concerned about here is "language and conceptual complexity", not compiler complexity.

-Chris

2 Likes

Can you elaborate on this? Just the basic proposal without any of the enhancements that have been discussed would be enough to eliminate the need for this awful hack that prevents conformances from being added outside a module. I have needed to use that hack in a few places and would very much like to get rid of it.
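For readers who haven't seen it, the kind of hack being discussed is typically some variant of a hidden requirement that only the declaring module can satisfy. A sketch (all names hypothetical; real variants differ in the details):

```swift
// A public type whose initializer is internal: outside the module,
// nobody can construct one.
public struct _SealingToken {
  internal init() {}
}

public protocol Shape {
  // External code can see this requirement but cannot implement it,
  // because it cannot produce a _SealingToken. In-module conformances can.
  var _token: _SealingToken { get }
  func area() -> Double
}

public struct Circle: Shape {
  public var radius: Double
  public init(radius: Double) { self.radius = radius }
  public var _token: _SealingToken { _SealingToken() }
  public func area() -> Double { .pi * radius * radius }
}
```

The protocol is effectively sealed, but at the cost of an underscored requirement polluting the public interface and showing up in documentation and code completion.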

The fact is that there are important library design techniques that don’t work if conformances can be added outside the library. I think enabling them is an important goal and reduces complexity for both libraries and users by allowing the library to more clearly state its intent within the language itself.

5 Likes

Yes, please elaborate. There has been plenty of discussion on this topic over the years, and every time, people who write Swift libraries and applications every day share stories about the hacks and tricks they use to emulate this feature. You can't just drop a bomb like that and walk off ;)

The only access control discussion I can see is whether or not protocols should be sealed by default and made open instead of the reverse. It's worth having that discussion as soon as possible, but I think everybody understands that it's unlikely. I also don't see any "resilience" impact (in the @_fixed_contents/@_frozen sense); it's valuable even for source libraries to declare protocols as sealed. Again, writing good protocols is hard.

1 Like

The alternative to that hack is to document that this protocol is not to be conformed to, and leave it at that. Protocols aren't just bags of syntax. You must always read and understand the documented requirements of a protocol when you implement it – or else you haven't actually implemented the protocol.

In this case, if you read the requirements and they are "don't implement this", things are pretty clear. Proposing a new keyword is no small thing, and the case would need to be made that this proposal delivers significant benefit over this approach.

9 Likes

Yes. This is what the stdlib does today, and I don't hear much fuss about people trying to conform to StringProtocol.

I mean, the status quo is a very serious option, and nobody should feel bound to adopt the various hacks described in this thread.

This is especially true now that it has been clearly stated above that the pitch should not make any promise about possible compiler optimizations or extensions such as exhaustive switches.

1 Like

It's not in the standard library proper, but I would hope SwiftSyntax would seal the Syntax protocol instead of having an internal _SyntaxBase to serve as the "real" protocol for conformers.

Some of us adopt a different philosophy. The hack is awful, but it doesn’t take that much code (which can even be generated making it almost as concise as using sealed). It provides an ironclad guarantee that the library is not abused. It is quite reasonable for a library to take relatively small measures like that to prevent abuse.

IMO, this is the most responsible approach to library development. Yes, protocols are also about semantics and users should read documentation and use them accordingly, but that doesn’t mean the language shouldn’t provide tools to prevent abuse.

Further, the reality is that not everyone bothers to read documentation. We shouldn’t modify our designs around that bad behavior, but we should still strive to offer the best experience possible for all Swift users. Every abuse a library is able to prevent by construction is a win in this regard. (I am not speaking of abuse with malicious intent here, primarily abuse by accident and / or naivety).

5 Likes

I don't know about the actual use cases for sealed, but I guess hiding a protocol might be suffient in some situations.
In this context, the meaning of "hiding" would be using an internal protocol in a public method, with the effect that the module would expose overloads of that method for each type that conforms to the protocol (while keeping the connecting protocol secret).
This woudn't need new keywords, and afaics has no impact on backwards compatibility.
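A sketch of that "hidden protocol" idea (names hypothetical): the protocol stays internal, and the module exposes one concrete public overload per conforming type.

```swift
// Internal protocol: never appears in the module's public interface.
internal protocol Renderable {
  func rendered() -> String
}

public struct Label {
  public var text: String
  public init(text: String) { self.text = text }
}
public struct Icon {
  public var name: String
  public init(name: String) { self.name = name }
}

extension Label: Renderable {
  func rendered() -> String { "Label(\(text))" }
}
extension Icon: Renderable {
  func rendered() -> String { "Icon(\(name))" }
}

// Shared implementation keyed off the internal protocol...
internal func renderImpl(_ r: Renderable) -> String { r.rendered() }

// ...surfaced as concrete public overloads, one per conforming type.
public func render(_ label: Label) -> String { renderImpl(label) }
public func render(_ icon: Icon) -> String { renderImpl(icon) }
```

The obvious cost is that every new conforming type requires a new public overload, and clients cannot write code generic over the hidden abstraction.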