Why does encode(to:) in an Encodable type take in an existential type?

From this documentation, encoder is an any Encoder. But this seems slower and involves more indirection than some Encoder. What are the reasons behind using any instead of some?

1 Like

The some T syntax didn't exist at the time this API was introduced, although it could have used <T: Encoder> instead to the same effect.
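
For concreteness, here's a minimal sketch of the two spellings, using placeholder protocol names (MyEncoder and so on) rather than the real Encodable/Encoder declarations:

// Placeholder protocols so this compiles alongside the standard library.
protocol MyEncoder {}

protocol MyEncodableAsShipped {
    // What Codable shipped with: an existential parameter,
    // spelled `any MyEncoder` in modern Swift.
    func encode(to encoder: any MyEncoder) throws
}

protocol MyEncodableGeneric {
    // The generic spelling that was available in Swift 4; today's
    // `some MyEncoder` parameter syntax is sugar for the same thing.
    func encode<E: MyEncoder>(to encoder: E) throws
}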

In this case, there's not going to be any practical difference because, when dealing with JSONEncoder or PropertyListEncoder, the implementation of encode(to:) is opaque and dynamically linked, so the compiler would end up producing an existential box either way. (I'm ignoring statically-linked copies of Foundation and WMO for the moment.)

I don't mean to speak for the Foundation code owners, but I'm sure they'd be happy to look at a pitch that introduces an overload taking some Encoder. 🙂

2 Likes

Good question! There are a few related reasons, partly historical, partly still relevant:

  1. The effective performance difference between some and any in this specific context is negligible, especially in comparison to the rest of encoding and decoding
  2. From an API design perspective, there's also no benefit to knowing the type of an Encoder or Decoder statically (either for someone implementing encode(to:)/init(from:), or for someone writing an Encoder/Decoder)
  3. Making encode(to:) and init(from:) generic complicates the design of the rest of the Encoder/Decoder API

The specifics:

The Codable APIs were designed and implemented in the Swift 3→4 timeframe, and at that time, there was a much wider gulf between existential and generic types in the Swift type system. I believe this was before _openExistential was publicly available, and this was certainly long before SE-0352 Implicitly Opened Existentials. This meant that in order to be able to call init<D: Decoder>(from decoder: D) and encode<E: Encoder>(to encoder: E) (the full spelling of these methods, since this long preceded opaque types too), you had to know D and E statically, all the way up the call chain.
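
Here's a hedged sketch of that constraint, with stand-in protocol names (FormatEncoder and FormatEncodable are not real types). Before SE-0352, the last call below wouldn't compile, which is why the generic parameter had to propagate all the way up the call chain:

protocol FormatEncoder {}

protocol FormatEncodable {
    // Hypothetical generic requirement, as Codable would have looked.
    func encode<E: FormatEncoder>(to encoder: E) throws
}

// Fine: this caller knows E statically, so it can forward it along.
func archive<E: FormatEncoder>(_ value: some FormatEncodable, into encoder: E) throws {
    try value.encode(to: encoder)
}

// Before SE-0352, this would not compile: an `any FormatEncoder` couldn't
// satisfy the generic parameter E, so callers holding only an existential
// were stuck. Today the existential is implicitly opened and this is fine.
func archive(_ value: some FormatEncodable, into encoder: any FormatEncoder) throws {
    try value.encode(to: encoder)
}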

On the face of it, this doesn't sound like an issue, except that the Codable APIs were written to support encapsulated inheritance. When a class C inherits from a Codable superclass SC that it doesn't necessarily know anything about, it might be tempting to call super.encode(to: encoder) and super.init(from: decoder) (passing in the same encoder/decoder that it is handed), but this isn't generally safe:

  1. C and SC may want to encode/decode in different formats. It's perfectly valid for C, for instance, to want to use a keyed container, while SC wants to use an unkeyed container. This means that they can't use the same encoder/decoder
  2. If C and SC both want to encode into a keyed container, there's no guarantee that they won't accidentally clobber each other's data in the container. They're both welcome to use private keys which may conflict

So in the general case, there needs to be a way to encode super into its own context. Encoder and Decoder therefore both have the concept of "super encoders" and "super decoders", exposed on their containers (e.g., KeyedEncodingContainer.superEncoder()): calling superEncoder() gives you a new Encoder that's safe to pass to super.encode(to:), and it allows the superclass to encode into a nested object instead of at the same level.
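
As a minimal sketch of that pattern (the class and key names here are made up for illustration):

class Base: Codable {
    var id: Int = 0
}

class Derived: Base {
    var note: String = ""

    private enum CodingKeys: String, CodingKey {
        case note
    }

    override func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(note, forKey: .note)
        // Hand the superclass its own nested Encoder so its keys can never
        // collide with (or clobber) Derived's keys.
        try super.encode(to: container.superEncoder())
    }

    required init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        note = try container.decode(String.self, forKey: .note)
        // Likewise, decode the superclass from its own nested Decoder.
        try super.init(from: container.superDecoder())
    }
}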

The issue, then, is that if init(from:) and encode(to:) are generic, without implicitly-opened existentials, the type of Encoder and Decoder must be known statically — which means that superEncoder() and superDecoder() would need to return concrete types instead of any Encoder and any Decoder.

Since Unkeyed{En,De}codingContainer and Keyed{En,De}codingContainerProtocol are both protocols, this would have required adding those Encoder/Decoder types as associatedtypes, and Keyed{En,De}codingContainer would have needed to become generic over that type as well. And in order to preserve those types, Encoder and Decoder themselves would have needed to expose their container types as associatedtypes too.

This would have turned (greatly simplified)

protocol Encoder {
    func container<Key: CodingKey>(keyedBy: Key.Type) -> KeyedEncodingContainer<Key>
    func singleValueContainer() -> any SingleValueEncodingContainer
    func unkeyedContainer() -> any UnkeyedEncodingContainer

    // ...
}

protocol KeyedEncodingContainerProtocol {
    associatedtype Key: CodingKey
    // ...
}

protocol UnkeyedEncodingContainer {
    // ...
}

into something closer to

protocol Encoder {
    associatedtype KeyedContainerType: KeyedEncodingContainerProtocol where KeyedContainerType.Encoder == Self
    associatedtype UnkeyedContainerType: UnkeyedEncodingContainer where UnkeyedContainerType.Encoder == Self
    associatedtype SingleValueContainerType: SingleValueEncodingContainer where SingleValueContainerType.Encoder == Self

    func container<Key: CodingKey>(keyedBy: Key.Type) -> KeyedEncodingContainer<Key, Self>
    func singleValueContainer() -> SingleValueContainerType
    func unkeyedContainer() -> UnkeyedContainerType

    // ...
}

protocol KeyedEncodingContainerProtocol {
    associatedtype Key: CodingKey
    associatedtype Encoder: Encoder
    // ...
}

protocol UnkeyedEncodingContainer {
    associatedtype Encoder: Encoder
    // ...
}

I don't remember if, at the time, the where constraints on the associatedtypes on Encoder were possible to express or not.

This would have massively complicated the APIs, for very little benefit:

  1. There's very little performance to be gained by this switch, especially since you only ever call a single method on any given Encoder/Decoder instance
  2. Every Encoder and Decoder would need to make all of their types public in order to conform to these protocols, for no real API-consumer benefit

Instead, all of this was hidden behind existential types.


As for whether this is possible today or not: it's possible to add generic overloads of init(from:) and encode(to:), but the calculus of performance benefit hasn't really changed. The cost of static vs. dynamic dispatch is pretty negligible, and in many cases, you end up with dynamic dispatch anyway.

(FWIW, it is possible to squeeze performance out of this: Inlinable Codable. If you make everything generic and @inlinable (and make sure that all Encoder/Decoder authors and all Codable types also keep everything generic and @inlinable), then there's performance to be gained at the cost of code size; but changing init(from:) and encode(to:) alone isn't enough.)
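
As a rough illustration of that direction (the names Size and encodeBody are made up here, not taken from that thread), note how even a generic, @inlinable entry point still hands back existential containers:

public struct Size: Encodable {
    public var width: Double
    public var height: Double

    public init(width: Double, height: Double) {
        self.width = width
        self.height = height
    }

    // The existential requirement still has to exist to satisfy Encodable.
    public func encode(to encoder: any Encoder) throws {
        try encodeBody(to: encoder)   // implicitly opens the existential (SE-0352)
    }

    // A generic, inlinable path that clients with a statically known Encoder
    // could call directly. encoder.unkeyedContainer() still returns an
    // existential container, though, which is why changing encode(to:) alone
    // isn't enough: the Encoder's containers (and everything nested inside
    // them) would all need to be generic and @inlinable too.
    @inlinable
    public func encodeBody<E: Encoder>(to encoder: E) throws {
        var container = encoder.unkeyedContainer()
        try container.encode(width)
        try container.encode(height)
    }
}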

5 Likes

As far as I can tell, an any Encoder can be thought of as a concrete type and can be passed where a generic (some) parameter is expected, thanks to implicitly opened existentials. The code below runs without problems:

protocol HasNum {
    var theNum: Int { get }
}

struct S: HasNum {
    var theNum: Int = 11
}

class C: HasNum {
    var theNum: Int = 22
}

func getHasNumObject(_ condition: Bool) -> any HasNum {
    if condition { return S() }
    else { return C() }
}

func printNum(for object: some HasNum) {
    print(object.theNum)
}

printNum(for: getHasNumObject(.random()))

AFAIK, migrating away from existentials in some next-gen Codable would also help unblock bringing it to Embedded Swift.