Why you can't make someone else's class Decodable: a long-winded explanation of 'required' initializers

Thanks for putting these thoughts together, Jordan! Some additional comments inline.

David Hart recently asked on Twitter <https://twitter.com/dhartbit/status/891766239340748800> if there was a good way to add Decodable support to somebody else's class. The short answer is "no, because you don't control all the subclasses", but David already understood that and wanted to know if there was anything in the works to mitigate the problem. So I decided to write up a long email about it instead. (Well, actually I decided to write a short email and then failed at doing so.)

The Problem

You can add Decodable to someone else's struct today with no problems:

extension Point: Decodable {
  enum CodingKeys: String, CodingKey {
    case x
    case y
  }
  public init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    let x = try container.decode(Double.self, forKey: .x)
    let y = try container.decode(Double.self, forKey: .y)
    self.init(x: x, y: y)
  }
}

But if Point is a (non-final) class, then this gives you a pile of errors:

- init(from:) needs to be 'required' to satisfy a protocol requirement. 'required' means the initializer can be invoked dynamically on subclasses. Why is this important? Because someone might write code like this:

func decodeMe<Result: Decodable>() throws -> Result {
  let decoder = getDecoderFromSomewhere()
  return try Result(from: decoder)
}
let specialPoint: VerySpecialSubclassOfPoint = try decodeMe()

…and the compiler can't stop them, because VerySpecialSubclassOfPoint is a Point, and Point is Decodable, and therefore VerySpecialSubclassOfPoint is Decodable. A bit more on this later, but for now let's say that's a sensible requirement.

- init(from:) also has to be a 'convenience' initializer. That one makes sense too—if you're outside the module, you can't necessarily see private properties, and so of course you'll have to call another initializer that can.

But once it's marked 'convenience' and 'required' we get "'required' initializer must be declared directly in class 'Point' (not in an extension)", and that defeats the whole purpose. Why this restriction?

The Semantic Reason

The initializer is 'required', right? So all subclasses need to have access to it. But the implementation we provided here might not make sense for all subclasses—what if VerySpecialSubclassOfPoint doesn't have an 'init(x:y:)' initializer? Normally, the compiler checks for this situation and makes the subclass reimplement the 'required' initializer…but that only works if the 'required' initializers are all known up front. So it can't allow this new 'required' initializer to go by, because someone might try to call it dynamically on a subclass. Here's a dynamic version of the code from above:

func decodeDynamic(_ pointType: Point.Type) throws -> Point {
  let decoder = getDecoderFromSomewhere()
  return try pointType.init(from: decoder)
}
let specialPoint = try decodeDynamic(VerySpecialSubclassOfPoint.self)
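
To make the danger concrete, here's a minimal sketch; the class definitions are invented for illustration. Because the subclass declares its own designated initializer, it does not inherit init(x:y:), so a 'required' init(from:) that delegated to self.init(x:y:) would have no valid target on the subclass:

```swift
// Hypothetical stand-ins for the third-party class and its subclass.
class Point {
    var x: Double
    var y: Double
    init(x: Double, y: Double) {
        self.x = x
        self.y = y
    }
}

class VerySpecialSubclassOfPoint: Point {
    var label: String
    // Defining this designated initializer means init(x:y:) is NOT
    // inherited, which is exactly what the compiler's 'required'
    // checking is designed to catch.
    init(label: String) {
        self.label = label
        super.init(x: 0, y: 0)
    }
}
```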

The Implementation Reason

'required' initializers are like methods: they may require dynamic dispatch. That means that they get an entry in the class's dynamic dispatch table, commonly known as its vtable. Unlike Objective-C method tables, vtables aren't set up to have entries arbitrarily added at run time.

(Aside: This is one of the reasons why non-@objc methods in Swift extensions can't be overridden; if we ever lift that restriction, it'll be by using a separate table and a form of dispatch similar to objc_msgSend. I sent a proposal to swift-evolution about this last year but there wasn't much interest.)

The Workaround

Today's answer isn't wonderful, but it does work: write a wrapper struct that conforms to Decodable instead:

struct DecodedPoint: Decodable {
  var value: Point
  enum CodingKeys: String, CodingKey {
    case x
    case y
  }
  public init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    let x = try container.decode(Double.self, forKey: .x)
    let y = try container.decode(Double.self, forKey: .y)
    self.value = Point(x: x, y: y)
  }
}

This doesn't have any of the problems with inheritance, because it only handles the base class, Point. But it makes everywhere else a little less convenient—instead of directly encoding or decoding Point, you have to use the wrapper, and that means no implicitly-generated Codable implementations either.

I'm not going to spend more time talking about this, but it is the officially recommended answer at the moment. You can also just have all your own types that contain points manually decode the 'x' and 'y' values and then construct a Point from that.
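
To show what "use the wrapper everywhere" looks like in practice, here's a sketch; the Point class and the Segment container are invented for illustration:

```swift
import Foundation

// Stand-in for the third-party class (invented for illustration).
class Point {
    var x: Double
    var y: Double
    init(x: Double, y: Double) {
        self.x = x
        self.y = y
    }
}

// The wrapper struct from above.
struct DecodedPoint: Decodable {
    var value: Point
    enum CodingKeys: String, CodingKey {
        case x, y
    }
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let x = try container.decode(Double.self, forKey: .x)
        let y = try container.decode(Double.self, forKey: .y)
        self.value = Point(x: x, y: y)
    }
}

// A type of your own that contains Points: decode the wrapper, then unwrap.
struct Segment: Decodable {
    var start: Point
    var end: Point

    enum CodingKeys: String, CodingKey {
        case start, end
    }
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        self.start = try container.decode(DecodedPoint.self, forKey: .start).value
        self.end = try container.decode(DecodedPoint.self, forKey: .end).value
    }
}

let json = #"{"start": {"x": 0, "y": 0}, "end": {"x": 3, "y": 4}}"#
let segment = try JSONDecoder().decode(Segment.self, from: Data(json.utf8))
```

Note the cost this section describes: Segment loses its compiler-synthesized conformance and has to spell out init(from:) by hand, solely to insert the unwrapping step.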

I would actually take this a step further and recommend that any time you intend to extend someone else’s type with Encodable or Decodable, you should almost certainly write a wrapper struct for it instead, unless you have reasonable guarantees that the type will never attempt to conform to these protocols on its own.

This might sound extreme (and inconvenient), but Jordan mentions the issue here below in The Dangers of Retroactive Modeling. Any time you conform a type which does not belong to you to a protocol, you make a decision about its behavior where you might not necessarily have the "right" to — if the type later adds conformance to the protocol itself (e.g. in a library update), your code will no longer compile, and you’ll have to remove your own conformance. In most cases, that’s fine, e.g., there’s not much harm done in dropping your custom Equatable conformance on some type if it starts adopting it on its own. The real risk with Encodable and Decodable is that unless you don’t care about backwards/forwards compatibility, the implementations of these conformances are forever.

Using Point here as an example, it’s not unreasonable for Point to eventually get updated to conform to Codable. It’s also not unreasonable for the implementation of Point to adopt the default conformance, i.e., get encoded as {"x": …, "y": …}. This form might not be the most compact, but it leaves room for expansion (e.g. if Point adds a z field, which might also be reasonable, considering the type doesn’t belong to you). If you update your library dependency with the new Point class and have to drop the conformance you added to it directly, you’ve introduced a backwards and forwards compatibility concern: all new versions of your app now encode and decode a new archive format, which now requires migration. Unless you don’t care about other versions of your app, you’ll have to deal with this:
- Old versions of your app which users may have on their devices cannot read archives with this new format
- New versions of your app cannot read archives with the old format

Unless you don’t care for some reason, you will now have to write the wrapper struct, to either
Have new versions of your app attempt to read old archive versions and migrate them forward (leaving old app versions in the dust), or
Write all new archives with the old format so old app versions can still read archives written with newer app versions, and vice versa

Either way, you’ll need to write some wrapper to handle this; it’s significantly safer to do that work up front on a type which you do control (and safely allow Point to change out underneath you transparently), rather than potentially end up between a rock and a hard place later on because a type you don’t own changes out from under you.
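
The second option above can be sketched as a wrapper that pins the archive format your shipped app versions understand, regardless of any conformance Point itself may gain later. All names here are invented for illustration:

```swift
import Foundation

// Stand-in for the third-party class (invented for illustration).
class Point {
    var x: Double
    var y: Double
    init(x: Double, y: Double) {
        self.x = x
        self.y = y
    }
}

// The wrapper owns the archive format, so a future Point: Codable
// conformance (with whatever keys it picks) can't silently change
// what your app writes to disk.
struct LegacyPoint: Codable {
    var value: Point

    enum CodingKeys: String, CodingKey {
        case x, y
    }
    init(_ value: Point) {
        self.value = value
    }
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let x = try container.decode(Double.self, forKey: .x)
        let y = try container.decode(Double.self, forKey: .y)
        self.value = Point(x: x, y: y)
    }
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(value.x, forKey: .x)
        try container.encode(value.y, forKey: .y)
    }
}

let data = try JSONEncoder().encode(LegacyPoint(Point(x: 1, y: 2)))
let roundTripped = try JSONDecoder().decode(LegacyPoint.self, from: data).value
```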

Future Direction: 'required' + 'final'

One language feature we could add to make this work is a 'required' initializer that is also 'final'. Because it's 'final', it wouldn't have to go into the dynamic dispatch table. But because it's 'final', we have to make sure its implementation works on all subclasses. For that to work, it would only be allowed to call other 'required' initializers…which means you're still stuck if the original author didn't mark anything 'required'. Still, it's a safe, reasonable, and contained extension to our initializer model.

Future Direction: runtime-checked convenience initializers

In most cases you don't care about hypothetical subclasses or about invoking init(from:) on some dynamic Point type. If there were a way to mark init(from:) as always available on subclasses, but dynamically checked to make sure that was okay, we'd be good. That check could take one of two forms:

- If 'self' is not Point itself, trap.
- If 'self' did not inherit or override all of Point's designated initializers, trap.

The former is pretty easy to implement but not very extensible. The latter seems more expensive: it's information we already check in the compiler, but we don't put it into the runtime metadata for a class, and checking it at run time requires walking up the class hierarchy until we get to the class we want. This is all predicated on the idea that this is rare, though.
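
The first option can be approximated by hand today, though only in a class you control, which is exactly the situation this email is not about; the sketch below just illustrates the semantics of the check. The check runs after delegation, since 'self' can't be inspected before it's initialized:

```swift
import Foundation

class Point: Decodable {
    var x: Double
    var y: Double
    init(x: Double, y: Double) {
        self.x = x
        self.y = y
    }

    enum CodingKeys: String, CodingKey {
        case x, y
    }
    required convenience init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let x = try container.decode(Double.self, forKey: .x)
        let y = try container.decode(Double.self, forKey: .y)
        self.init(x: x, y: y)
        // "If 'self' is not Point itself, trap."
        precondition(type(of: self) == Point.self,
                     "init(from:) only supports Point itself, not subclasses")
    }
}

let data = Data(#"{"x": 1, "y": 2}"#.utf8)
let point = try JSONDecoder().decode(Point.self, from: data)
```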

This is a much more intrusive change to the initializer model, and it's turning a compile-time check into a run-time check, so I think we're less likely to want to take this any time soon.

Future Direction: Non-inherited conformances

All of this is only a problem because people might try to call init(from:) on a subclass of Point. If we said that subclasses of Point weren't automatically Decodable themselves, we'd avoid this problem. This sounds like a terrible idea but it actually doesn't change very much in practice. Unfortunately, it's also a very complicated and intrusive change to the Swift protocol system, and so I don't want to spend more time on it here.

The Dangers of Retroactive Modeling

Even if we magically make this all work, however, there's still one last problem: what if two frameworks do this? Point can't conform to Decodable in two different ways, but neither can it just pick one. (Maybe one of the encoded formats uses "dx" and "dy" for the key names, or maybe it's encoded with polar coordinates.) There aren't great answers to this, and it calls into question whether the struct "solution" at the start of this message is even sensible.

I'm going to bring this up on swift-evolution soon as part of the Library Evolution discussions (there's a very similar problem if the library that owns Point decides to make it Decodable too), but it's worth noting that the wrapper struct solution doesn't have this problem.
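
One reason the wrapper approach sidesteps this: each framework can define its own wrapper type, and distinct types never collide the way two conformances on the same type would. A sketch, with the key names invented for illustration:

```swift
import Foundation

// Stand-in for the shared third-party class (invented for illustration).
class Point {
    var x: Double
    var y: Double
    init(x: Double, y: Double) {
        self.x = x
        self.y = y
    }
}

// One framework's format: {"x": ..., "y": ...}
struct CartesianPoint: Decodable {
    var value: Point
    enum CodingKeys: String, CodingKey {
        case x, y
    }
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        self.value = Point(x: try container.decode(Double.self, forKey: .x),
                           y: try container.decode(Double.self, forKey: .y))
    }
}

// Another framework's format: {"dx": ..., "dy": ...}
struct DeltaPoint: Decodable {
    var value: Point
    enum CodingKeys: String, CodingKey {
        case dx, dy
    }
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        self.value = Point(x: try container.decode(Double.self, forKey: .dx),
                           y: try container.decode(Double.self, forKey: .dy))
    }
}

let a = try JSONDecoder().decode(CartesianPoint.self,
                                 from: Data(#"{"x": 1, "y": 2}"#.utf8))
let b = try JSONDecoder().decode(DeltaPoint.self,
                                 from: Data(#"{"dx": 1, "dy": 2}"#.utf8))
```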

Whew! So, that's why you can't do it. It's not a very satisfying answer, but it's one that falls out of our compile-time safety rules for initializers. For more information on this I suggest checking out my write-up of some of our initialization model problems <https://github.com/apple/swift/blob/master/docs/InitializerProblems.rst>. And I plan to write another email like this to discuss some solutions that are actually doable.

Jordan

P.S. There's a reason why Decodable uses an initializer instead of a factory-like method on the type but I can't remember what it is right now. I think it's something to do with having the right result type, which would have to be either 'Any' or an associated type if it wasn't just 'Self'. (And if it is 'Self' then it has all the same problems as an initializer and would require extra syntax.) Itai would know for sure.

To give background on this — the protocols originally had factory initializers in mind for this (to allow for object replacement and avoid some of these issues), but without a "real" factory initializer pattern like we’re discussing here, the problems with this approach were intractable (all due to subclassing issues).

An initializer pattern like static func decode(from: Decoder) throws -> ??? has a few problems:
- The return type is one consideration. If we allow for an associated type representing the return type, subclasses cannot override the associated type to return something different. This makes object replacement impossible in situations which use subclassing. The only reasonable thing is to return Self (which would allow for returning instances of self, or of subclasses). (We could return Any, but that defeats the entire purpose of having a type-safe API to begin with; we want to avoid the dynamic casting altogether.)
- Even if we return Self, this method cannot be overridden by subclasses:
  - If implemented as static func decode(from: Decoder) throws -> Self, the method clearly cannot be overridden in a subclass, as it is a static method.
  - The method cannot be implemented as class func decode(from: Decoder) throws -> Self on a non-final class:
protocol Foo {
    static func create() -> Self
}

class Bar : Foo {
    class func create() -> Bar { // method 'create()' in non-final class 'Bar' must return 'Self' to conform to protocol 'Foo'
        return Bar()
    }
}

protocol Foo {
    static func create() -> Self
}

class Bar : Foo {
    class func create() -> Self {
        return Bar() // cannot convert return expression of type 'Bar' to return type 'Self'
    }
}

protocol Foo {
    static func create() -> Self
}

class Bar : Foo {
    class func create() -> Self {
        return Bar() as! Self // error: 'Self' is only available in a protocol or as the result of a method in a class; did you mean 'Bar'?; warning: forced cast of 'Bar' to same type has no effect; error: cannot convert return expression of type 'Bar' to return type 'Self'
    }
}

final class Bar : Foo {
    class func create() -> Bar { // no problems
        return Bar()
    }
}
This means that we either allow adoption of these protocols on final classes only (which, again, defeats the whole purpose!), or every class which implements these protocols has to have knowledge of all of its potential subclasses and their implementations of these protocols. This is prohibitive as well.
- Even if it were possible to override these methods, they don't follow the regular initializer pattern. In order to construct an instance of a subclass, you need to be able to call a superclass initializer. But these methods are not initializers; even if you call super's factory method, there's nothing you can do with the returned instance of the superclass. Unlike in Objective-C, there's no super- or self-reassignment (in general), so classes would have to follow a completely different (and awkward) pattern: create an instance of the superclass, initialize from it in a separate initializer (e.g. self.init(superInstance)), and also set any decoded properties.

Overall, the lack of a true factory initializer pattern prevented us from doing something like this, and we took the regular initializer approach.


On Aug 2, 2017, at 5:08 PM, Jordan Rose <jordan_rose@apple.com> wrote:
