Codable with references?

itaiferber · September 8, 2018, 2:17pm

Yes, you're right; I should have read what @QuinceyMorris was suggesting more closely, because the point I was trying to make was slightly different. My assertion (and I will edit this into the post above) is that, at initialization time, it is not possible to require the initialization of an object a which requires a b, and of a b which requires an a, without an "escape hatch" somewhere.

Your Foo and Bar example offers this via SuperFoo (i.e. you can create a Bar without a Foo specifically, because you have an object which does not link back to Bar and which Bar can use). This is isomorphic to

class Bar {
    var foo: Foo?
}

class Foo {
    var bar = Bar()
}

let foo = Foo()
foo.bar.foo = foo

The point that I am trying to make is in a scenario like

class Bar {
    var foo: Foo
    init(foo: Foo) { self.foo = foo }
}

class Foo {
    var bar: Bar
    init(bar: Bar) { self.bar = bar }
}

it is impossible to initialize either a Foo or a Bar, and thus impossible to rely on this behavior. My point is less about the reference cycle from a memory management perspective and more from a construction perspective. I'll refine this point.
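For contrast, the usual way out in plain Swift is to weaken one side of the relationship, as in the Country/City example from The Swift Programming Language: one property becomes an implicitly unwrapped optional, which starts out as nil, so self can be handed out before the cycle is closed. A minimal sketch reusing the thread's Foo/Bar names:

```swift
// Escape hatch: Foo.bar is an implicitly unwrapped optional, so it already
// has a value (nil) and Foo is fully initialized before `self` is used.
final class Foo {
    var bar: Bar!

    init() {
        // Every stored property has a value at this point, so self may escape.
        bar = Bar(foo: self)
    }
}

final class Bar {
    // unowned: Foo owns its Bar, so the back link must not be strong.
    unowned let foo: Foo

    init(foo: Foo) { self.foo = foo }
}

let foo = Foo()
```

The Bar! is exactly the kind of escape hatch referred to above; with two non-optional stored properties set only through init(bar:) and init(foo:), there is no such hole.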

itaiferber · September 8, 2018, 2:43pm

I wouldn't necessarily call this "guessing"; these are well-defined relationships defined by the semantics of the types you're working with. I think I see what you're getting at, though: patching up distant relationships cannot always be done at any one object's init time, since you could have a long chain of references where no single object knows how to patch things up. For example, if you write

class Node: Codable {
    var children: [Node] = []
    weak var someAncestor: Node? = nil

    enum CodingKeys: String, CodingKey {
        case children, someAncestor
    }

    // Needed because declaring init(from:) suppresses the default init().
    init() {}

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(children, forKey: .children)

        // Hypothetical API, not in the standard library today:
        // If someAncestor has not yet been encoded elsewhere, this encodes it and stores a UID.
        // If someAncestor has already been encoded elsewhere, this reuses the UID.
        try container.encodeReference(someAncestor, forKey: .someAncestor)
    }

    // `required` because a non-final class conforming to Decodable must mark init(from:) required.
    required init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        self.children = try container.decode([Node].self, forKey: .children)

        // Hypothetical API: looks up the stored UID in the table.
        // If the corresponding object does not yet exist, decodes it; if it does, just returns it.
        self.someAncestor = try container.decodeReference(to: Node.self, forKey: .someAncestor)
    }
}

In the case of

let a = Node()
let b = Node()
let c = Node()

// a -> b -> c
// a < - - - c
a.children = [b]
b.children = [c]
c.someAncestor = a

Decoding a decodes b, which decodes c. In c's init(from:) we reference a again — because a is not done initializing, we'd try to decode it all over again.

In this specific case, the solution does require an awaken(), yes, and it is a good motivator for introducing it. One easy way to untangle this is to have decodeReference() return an opaque CodingReference, which can later be handed back to the decoder to look up the actual object, e.g.

required init(from decoder: Decoder) throws {
    // ...
    self.someAncestorRef = try container.decodeReference(to: Node.self, forKey: .someAncestor)
}

func awaken(from decoder: Decoder) {
    self.someAncestor = decoder.object(for: self.someAncestorRef)
}

In this case, decode(Node.self, forKey: .someAncestor) would work the same as it does today: decode an object then and there (and if what's stored is a UID, translate it to the object regardless of cycles). Alternatively, you can cooperatively break cycles yourself by requesting references.

Here, encodeReference, decodeReference, and object(for:) could all have relatively simple default implementations for formats that don't support them (throw) and otherwise they'll do the right thing. Decodable can be extended with a default awaken(from:) method which defaults to doing nothing.
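To make the two phases concrete, here is a toy model that does not use Codable at all. CodingReference, ArchiveRecord, ToyDecoder, and awaken(from:) are hypothetical stand-ins for the proposed decodeReference/object(for:)/awaken(from:) APIs, and the "archive" is just a dictionary of records keyed by UID. It replays the a -> b -> c example with c pointing back at a.

```swift
// Toy, Codable-free model of two-phase decoding. All names are hypothetical.

/// Opaque token handed out in phase 1 instead of a live object.
struct CodingReference: Hashable {
    fileprivate let uid: Int
}

/// What the archive stores for one Node: child UIDs and an optional ancestor UID.
struct ArchiveRecord {
    let children: [Int]
    let someAncestor: Int?
}

final class Node {
    var children: [Node] = []
    weak var someAncestor: Node?
    private var someAncestorRef: CodingReference?

    /// Phase 1 (the role of init(from:)): children are decoded eagerly,
    /// the ancestor is only recorded as an opaque reference.
    init(record: ArchiveRecord, decoder: ToyDecoder) {
        children = record.children.map { decoder.decodeObject(uid: $0) }
        someAncestorRef = record.someAncestor.map { decoder.decodeReference(uid: $0) }
    }

    /// Phase 2 (the role of awaken(from:)): every object now exists.
    func awaken(from decoder: ToyDecoder) {
        if let ref = someAncestorRef {
            someAncestor = decoder.object(for: ref)
        }
    }
}

final class ToyDecoder {
    private let records: [Int: ArchiveRecord]
    private var objects: [Int: Node] = [:]
    private var decoded: [Node] = []

    init(records: [Int: ArchiveRecord]) { self.records = records }

    /// Like decode(_:forKey:): returns the existing object or decodes it now.
    /// Assumes the eager edges (here, `children`) contain no cycles.
    func decodeObject(uid: Int) -> Node {
        if let existing = objects[uid] { return existing }
        let node = Node(record: records[uid]!, decoder: self)
        objects[uid] = node
        decoded.append(node)
        return node
    }

    /// Like decodeReference(to:forKey:): records the UID without decoding anything.
    func decodeReference(uid: Int) -> CodingReference {
        CodingReference(uid: uid)
    }

    /// Like object(for:): only meaningful once phase 1 has finished.
    func object(for reference: CodingReference) -> Node? {
        objects[reference.uid]
    }

    /// Runs phase 1 from the root, then awakens every decoded object.
    func decodeRoot(uid: Int) -> Node {
        let root = decodeObject(uid: uid)
        for node in decoded { node.awaken(from: self) }
        return root
    }
}

// a -> b -> c, with c.someAncestor pointing back at a.
let archive: [Int: ArchiveRecord] = [
    1: ArchiveRecord(children: [2], someAncestor: nil),
    2: ArchiveRecord(children: [3], someAncestor: nil),
    3: ArchiveRecord(children: [], someAncestor: 1),
]
let a = ToyDecoder(records: archive).decodeRoot(uid: 1)
let c = a.children[0].children[0]
```

Because the ancestor link is only a token during phase 1, nothing tries to decode a a second time; the price is that someAncestor stays nil until awaken(from:) runs.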

QuinceyMorris · September 8, 2018, 8:16pm

(It's my intention to reply in a series of posts, so that I don't have to try to get everything into one ginormous post. This may take a couple of days to get done, because there's a fair amount of thinking involved. Spoiler alert: I'm going to back down, partially, from some of the things I said earlier, based on the responses so far. This post is in response to @Nevin, and also to the continuation of the same ideas in a later response from @itaiferber.)

Nevin:

The answer remains unchanged: it is entirely possible to create a cycle of strong references with no optionals and no unsafe pointers involved. Here is an example:
class SuperFoo {}
class Bar { var foo = SuperFoo() }
class Foo : SuperFoo { var bar = Bar() }

let foo = Foo()
let bar = Bar()
foo.bar = bar
bar.foo = foo

This is a really interesting case to discuss. First of all, I agree that this is an easy way to create a circular chain of non-optional references. Nothing I say below should be taken to suggest that the example is in some way invalid, or not suitable for consideration.

Second, I want to clarify that my primary interest is in finding a methodology that will let both Encodable and Decodable synthesize conformance in cases where reference identity is to be preserved. We already know from NSCoding that there is often a lot of boilerplate in encoding and decoding manually, and it would be nice (and safer!) to get the compiler to write it for us. If there is no general solution, then I would fall back to looking for a methodology that provides synthesis for reasonable and natural design patterns, leaving other less common scenarios to a manually coded solution.

So, let's begin …

Taking @Nevin's example as-is, I want to make an argument that this design is not decodable in general, and never decodable via synthesized code.

The problem is that this example relies on the creation of auxiliary objects. Unfortunately, synthesized decoding code would not have license to create auxiliary objects in this way. Even if the decoding code were written manually, it will not generally be acceptable for it to arbitrarily create new objects that are not in the archive: there may well be constraints external to the encoding/decoding procedure that preclude the creation of transient objects. (We could imagine a developer devising a way to program around such a restriction, but forcing that kind of solution is, I think, exactly what we're trying to avoid here.)

OTOH, @Nevin's example isn't quite as natural as it seems. It constructs object graphs (little ones) with "structural" references linking the nodes, but makes no attempt to protect the structure from accidental (or unwise) tampering. In practice, most (though certainly not all) designs of this sort will want something like:

class Bar { private(set) var foo: SuperFoo = … }
class Foo : SuperFoo { private(set) var bar: Bar = … }

along with API inside one or both of these classes to establish the structural references. Given that the private access control provides safety and security, it is not practical to require developers to drop the protection globally just to ensure that a decoding "fixup" can be done.

Similarly, the example breaks if the references are not modifiable at all:

class Bar { let foo: SuperFoo; … }
class Foo : SuperFoo { let bar: Bar; … }

along with initializers to set these properties initially.

These points also apply to @itaiferber's example (which I'm quoting out of context, but which looks like it provides an escape hatch):

As soon as the stored properties become private or let, there is no escape hatch available in a later fixup pass.

OK, that's enough for one post.

itaiferber · September 8, 2018, 11:50pm

Fair enough — if need be, too, we can split up the conversation into multiple threads if it makes sense.

There's a lot to say in this direction but I'll keep it brief for now so we don't go in a million directions: I don't think that at this point we can change the code being synthesized for reference types, due to

Backwards compatibility: there is already code in the wild which encodes objects inside of value trees, and the data produced by that code has almost certainly already been written to disk somewhere. We can't change the format for objects in a way which breaks the input/output format.
(Less strongly,) a lack of knowledge on the compiler's part about how one might want an object encoded. It's entirely reasonable to want to encode an object as a reference, or as a value (as things are today). As you allude to in your discussion below, the compiler also has no knowledge about how to patch things up after the fact (or whether patching up is even necessary). This means the compiler can only really do the simple thing (call encodeReference() and decodeReference()) and, at best, hope the object graph is a DAG. Given this lack of knowledge, it's not even clear how useful the default behavior would be (and I would claim: "not terribly").

As such, I think the focus here should not be "how can we best apply this to synthesized code?", but "how can we most easily empower a developer to express the relationship between decoded objects in a way that's useful?"

I largely agree with the rest of your post, and again, I think it's entirely reasonable to ask developers to look at how their objects relate to one another in their regular initializers (as you point out, there are plenty of limitations there as well, and every use case is different) and apply that to init(from:) and awaken(from:) as well. This thread shows there are many, many different ways to express these relationships even in regular initializers, and IMO, we need to provide the tools to allow developers to express the same relationships here.

QuinceyMorris · September 9, 2018, 12:07am

I'm busy writing a post that picks up from the last one, but I wanted to comment here that I'm envisaging reference archiving as being done by a specific Encoder/Decoder pair (similar to the JSON and property list pairs that exist now). This wouldn't depend on enhancing Codable to support reference semantics; instead, the pair would provide reference semantics implicitly (as NSKeyedArchiver/NSKeyedUnarchiver currently do).

QuinceyMorris · September 10, 2018, 3:42am

Still working on it...

itaiferber · September 10, 2018, 3:45am

There’s no rush!

QuinceyMorris · September 13, 2018, 5:00am

Note: This came out nothing like I intended. I've been working on it for days, and it's either a feasible insight, or the product of a crazed mind. I can't tell the difference any more.

Purpose

There needs to be a Swift-native replacement for NSCoding-based NSKeyedArchiver and NSKeyedUnarchiver, for encoding and decoding object graphs with reference identity preserved. The Swift solution should in straightforward cases be synthesizable by the compiler, to abolish the boilerplate code found in the NSCoding mechanism, but should be simple enough to code manually when necessary, just like Codable.

TL;DR

The Swift language cannot currently meet this need.
Multi-pass decoding typically won't help.
A feasible solution could be based on separating object allocation from initialization.

General Approach

My belief is that a natural solution would introduce a new pair of encoder and decoder protocols, which I'll refer to as ReferenceEncoder and ReferenceDecoder, implemented by ArchiveEncoder and ArchiveDecoder classes, with an associated pair of encoding and decoding protocols, ReferenceEncodable and ReferenceDecodable, and various …Reference[En/De]codingContainer protocols. All of these protocols would be largely identical to the existing protocols of similar names (without the Reference… prefix). The ReferenceEncodable and ReferenceDecodable protocols are named differently to act as markers for different synthesis semantics.

The reference-identity aspect of both encoding and decoding — assigning each archived object a unique ID, and representing object references in the archive using unique IDs — seems straightforward, and I don't think it needs to be discussed here.
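The bookkeeping really is small. As a sketch (ObjectTable is a hypothetical type, not part of Foundation), an encoder could key objects by ObjectIdentifier and hand out sequential UIDs, reporting whether each object is new so the caller knows whether its contents still need to be written:

```swift
// Hypothetical encoder-side identity table. The first encounter with an
// object assigns a fresh UID; later encounters reuse it, preserving identity.
final class ObjectTable {
    private var uids: [ObjectIdentifier: Int] = [:]
    // Keeps numbered objects alive: an ObjectIdentifier is only unique while
    // its object exists, so a freed object's identifier could otherwise be reused.
    private var retained: [AnyObject] = []
    private var nextUID = 1

    /// Returns the object's UID and whether this is the first time it was seen.
    func uid(for object: AnyObject) -> (uid: Int, isNew: Bool) {
        let key = ObjectIdentifier(object)
        if let existing = uids[key] { return (existing, false) }
        let uid = nextUID
        nextUID += 1
        uids[key] = uid
        retained.append(object)
        return (uid, true)
    }
}

final class Thing {}

let table = ObjectTable()
let x = Thing()
let y = Thing()
let first = table.uid(for: x)   // new UID; x's contents must be encoded
let again = table.uid(for: x)   // same UID; only the reference is written
let other = table.uid(for: y)   // a different object gets a different UID
```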

I also believe that a reasonable Swift-native archiving scheme is impossible without enhancements to the language itself. Swift's current initialization rules are a bit too restrictive for archive decoding to work. The rest of this post is an attempt to justify this claim.

The Problem

Previously, I've kept making the mistake of thinking that the hard part of decoding archives with reference-identity semantics is the formation and storage of object references. I finally realized this isn't true. Object references are central to a solution, but they are not themselves the problem.

The problem is how to determine a correct order of initialization for objects being unarchived.

The order matters because the initialization of one object, in general, depends on the prior initialization of other objects. In the simplest case, it would be possible to choose any archived object and begin decoding it. This would lead to invoking init(from:) on its type, and letting that initializer trigger the decoding of the objects and values on which it depends, recursively.

When an object is encountered which has no further dependencies, that chain of decoding ends, and the process restarts from a different, as-yet-undecoded object in the archive. When no more undecoded objects remain, decoding of the archive is finished.

Obviously, this strategy fails if there are any circular chains of dependencies in the archive.
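The recursive strategy, and exactly where it breaks, fits in a few lines. In this sketch (SinglePassDecoder and DecodeError are hypothetical), an archive is reduced to a map from each object's UID to the UIDs whose initialization that object's init(from:) must wait for. Decoding recurses as described above and reports the chain when it re-enters an object whose initializer has not yet returned.

```swift
// Illustrative model of single-pass, dependency-driven decoding.
enum DecodeError: Error, Equatable {
    case cycle([Int])
}

struct SinglePassDecoder {
    let dependencies: [Int: [Int]]        // UID -> UIDs its init(from:) needs finished
    private(set) var completionOrder: [Int] = []
    private var inProgress: [Int] = []    // init(from:) calls that have not returned
    private var done: Set<Int> = []

    init(dependencies: [Int: [Int]]) { self.dependencies = dependencies }

    /// Restarts from any as-yet-undecoded object until none remain.
    mutating func decodeAll() throws {
        for uid in dependencies.keys.sorted() where !done.contains(uid) {
            try decode(uid)
        }
    }

    private mutating func decode(_ uid: Int) throws {
        if done.contains(uid) { return }
        if let start = inProgress.firstIndex(of: uid) {
            // Re-entered an object that is still initializing: a circular chain.
            throw DecodeError.cycle(Array(inProgress[start...]) + [uid])
        }
        inProgress.append(uid)
        for dependency in dependencies[uid, default: []] {
            try decode(dependency)
        }
        inProgress.removeLast()
        done.insert(uid)
        completionOrder.append(uid)
    }
}

var acyclic = SinglePassDecoder(dependencies: [1: [2], 2: [3], 3: []])
try acyclic.decodeAll()

var cyclic = SinglePassDecoder(dependencies: [1: [2], 2: [3], 3: [1]])
var cyclicError: Error?
do { try cyclic.decodeAll() } catch { cyclicError = error }
```

With the acyclic archive the objects finish in the order 3, 2, 1; with the circular one, decoding stops and reports the chain 1, 2, 3, 1.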

Breaking the Chains

With the current Swift initialization mechanism, there is no general way of resolving circularity in a single decoding pass (that is, by just invoking init(from:) for each object in the archive, in some order or other). A circular chain will have at least one object that cannot finish initialization until a second one finishes, and the second object cannot finish without the first.

At first, it seems hopeful that a two-pass decoding mechanism would work. Basically, this strategy would break all circular chains by ignoring some of the dependencies, initializing the objects in the remaining non-circular chains in a first pass, then dealing with the ignored dependencies in a second pass.

If "initialization" meant "setting the values of stored properties", then this might be a feasible solution. Some stored properties would be stored in pass 1, and the rest in pass 2. (In fact, this is not so easy, and I wasted a lot of time trying to taxonomize this strategy. It turns out to be irrelevant.)

In general, initialization of one object also involves dereferencing — invoking methods on or accessing properties of — the objects it depends on. For that to be safe, those objects must have completed their initialization. A multi-pass strategy doesn't help with that, it just extends initialization (for at least some objects) over all of the passes.

So, we're stuck.

But, Objective-C, Dammit

It's worth asking why NSCoding-based unarchiving isn't faced with the same dilemma.

In part, the answer is that the NSCoding pattern isn't entirely safe. In particular, Obj-C object initialization conventions are quite happy to vend object references outside the initialization method before the object is fully initialized. Developers using NSCoding learn to avoid relying on objects being in a consistent/complete state during unarchiving, except perhaps in a few carefully curated cases.

This is not foolproof. The typical symptom of getting it wrong is crashing, or encountering an unexpected nil reference during decoding. Sometimes, the symptom is a subtle bug that manifests much later, and takes a lot of debugging to find.

It's obviously not an option for Swift to let objects be used in an invalid state, but I think the Obj-C pattern points in the correct direction for Swift.

How Swift Can Win

Decoding typically involves a lot more setting of properties within an initializer than dereferencing object dependencies. If we could safely allow stored properties to be set, even when their values are references to objects that haven't completed their initialization (or sometimes, in some sense, haven't started), then there's a pretty good chance that the remaining dependencies could be resolved without circularity. After all, decades of Obj-C programming have relied on just this (though without language-enforced safety guarantees).

Safety could be provided by a few simple rules, which would apply to code executing during an init(from:) method:

If the code attempts to decode an archived value of value type, the init(from:) initializer for that type is run to completion.
If the code attempts to decode an archived object of reference type, and the object's init(from:) has already run to completion, the object reference is simply returned.
If the code attempts to decode an archived object of reference type, and the object does not yet exist, the object is allocated (that is, the memory to which the object reference points is allocated) but not initialized, and the object reference is returned.
If the code attempts to decode an archived object of reference type, and the object's init(from:) is partially completed, the object reference is simply returned.

Rule 4 applies in the case of a circular chain of dependencies of stored references. It would be unsafe, except that there are more rules:

If the code attempts to dereference an object reference, and the object's init(from:) has already run to completion, the invocation or access is allowed.
If the code attempts to dereference an object reference, and the object's init(from:) is partially completed, the program is aborted with an error describing the chain of coding keys.

Rule 6 describes the case of a circular chain of dependencies on initialization completion. If it's encountered, the archive is undecodable by the ReferenceDecodable types provided for the decoding.

If the code attempts to dereference an object reference, and the object's init(from:) has not begun to execute, the initialization should be run to completion, then the invocation or access should be allowed.

Rule 7 may sometimes lead to a failure in the dereferenced object's initialization, because of rule 6.
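Swift cannot currently hand out a reference to allocated-but-uninitialized memory, so the following sketch models that memory as a Slot object whose state plays the role of the three initialization states. Slot, ArchivedObject, RuleDecoder, and RuleViolation are hypothetical; rule 1 (value types) is not modeled, and rule 6 throws rather than aborting so the outcome can be inspected.

```swift
enum InitState { case uninitialized, initializing, initialized }

/// One archived object: UIDs it only stores, and UIDs it must dereference
/// (call methods on or read properties of) during its init(from:).
struct ArchivedObject {
    let stores: [Int]
    let uses: [Int]
}

/// Stand-in for object memory that may be allocated but not yet initialized.
final class Slot {
    let uid: Int
    var state: InitState = .uninitialized
    var stored: [Slot] = []
    init(uid: Int) { self.uid = uid }
}

enum RuleViolation: Error, Equatable {
    /// Rule 6: dereferenced an object whose init(from:) is still running.
    case dereferencedWhileInitializing(chain: [Int])
}

final class RuleDecoder {
    private let archive: [Int: ArchivedObject]
    private var slots: [Int: Slot] = [:]
    private var chain: [Int] = []   // objects whose init(from:) is in progress

    init(archive: [Int: ArchivedObject]) { self.archive = archive }

    /// Rules 2-4: decoding a reference never runs an initializer. It returns the
    /// existing slot (in any state) or allocates a fresh, uninitialized one.
    func decodeReference(_ uid: Int) -> Slot {
        if let slot = slots[uid] { return slot }
        let slot = Slot(uid: uid)
        slots[uid] = slot
        return slot
    }

    /// Rules 5-7: dereferencing demands a fully initialized object.
    func dereference(_ uid: Int) throws -> Slot {
        let slot = decodeReference(uid)
        switch slot.state {
        case .initialized:                   // rule 5
            return slot
        case .initializing:                  // rule 6 (an abort in the proposal)
            throw RuleViolation.dereferencedWhileInitializing(chain: chain + [uid])
        case .uninitialized:                 // rule 7: initialize first, then allow
            try initialize(slot)
            return slot
        }
    }

    /// The model's init(from:): store references freely, then dereference.
    func initialize(_ slot: Slot) throws {
        guard slot.state == .uninitialized else { return }
        let record = archive[slot.uid]!      // assumes every UID is in the archive
        slot.state = .initializing
        chain.append(slot.uid)
        defer { chain.removeLast() }
        slot.stored = record.stores.map { decodeReference($0) }
        for uid in record.uses { _ = try dereference(uid) }
        slot.state = .initialized
    }

    func decodeAll() throws -> [Int: Slot] {
        for uid in archive.keys.sorted() { try initialize(decodeReference(uid)) }
        return slots
    }
}

// 1 and 2 store references to each other (rule 4); 2 also dereferences 1.
let decoded = try RuleDecoder(archive: [
    1: ArchivedObject(stores: [2], uses: []),
    2: ArchivedObject(stores: [1], uses: [1]),
]).decodeAll()

// Each object dereferences the other: a cycle of initialization dependencies.
var failure: Error?
do {
    _ = try RuleDecoder(archive: [
        1: ArchivedObject(stores: [], uses: [2]),
        2: ArchivedObject(stores: [], uses: [1]),
    ]).decodeAll()
} catch {
    failure = error
}
```

The first archive decodes even though 1 and 2 refer to each other, because storing a reference never requires the other object to be finished. The second fails with the chain 1, 2, 1, the kind of diagnostic rule 6 asks for.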

What Swift Needs

The above rules cannot be implemented in the Swift language as it presently stands. However, the needed changes seem fairly simple, at least in overview:

The ReferenceDecoder object or compiler-generated code would need to maintain the initialization state of every object in the archive as one of three cases: .uninitialized, .initializing, and .initialized.
The ReferenceDecoder object or compiler-generated code would need a way to get a reference to an object's memory without actually initializing the object. (Obj-C already has an alloc/init distinction. This would be something similar. Presumably Swift does actually allocate memory before initializing it.)
The ReferenceDecoder object or compiler-generated code would need a way of running initialization using a reference to previously-allocated object memory.
Compiler-generated code would need to check the initialization state of the reference being used for method dispatch and property access during init(from:).

An actual implementation would likely be more complicated than this, but I don't see any obvious technical barrier to the general approach.

Dante-Broggi · September 13, 2018, 10:19am

Based on what you describe, my immediate thought is "yielding initializers".

itaiferber · September 13, 2018, 4:45pm

Thanks for taking the time to put this together! There's a lot to unpack here, so let me respond to some of the things that stand out to me:

Although this is aside from the rest of the points below (and I'll avoid responding to Reference{En,De}coder/Archive{En,De}coder for now): I don't see a meaningful reason to split this concept out into a dedicated encoder/decoder pair that can handle reference semantics. In my mind, ideally, we should be able to enable this for all Encoder/Decoder pairs (or at least give them the option to), even JSON{En,De}coder and PropertyList{En,De}coder.

From our previous discussions — I'm not convinced of this without further evidence. You allude to this somewhat down in this next quoted part, but I'd be curious to see where a discussion of this might lead:

Ignoring the Objective-C side of things for now, let's dig in to this. Do you have a particular example in mind here? For those dependencies which require setup in a later phase (let's keep calling this awaken(from:) for now): is there a reason why accessing properties on these decoded types cannot also then be deferred to awaken(from:)?

If a type A has properties B and C where C's initialization might depend on reference type B, it's possible that both B and C might need to be pushed out into awaken(from:), but this is not impossible. Is there a scenario I'm missing?

As for the rules below:

Of these rules, this one introduces the largest change to the language model. Changing this means we have to split allocation from initialization, which would require remodeling of how we do things. CC @jrose/@Joe_Groff for comment but I think this would require a major rearchitecture (or may even be impossible in some cases).

(This is beside the fact that some initializers are allowed to replace the allocated reference with a different value IIRC, which means the old allocated reference is no longer relevant. And yes, this is possible in Objective-C as well, and you can run into trouble there as well without help.)

I don't know how feasible it would be to actually track this. If you have an object reference obj and further decode(...) calls end up initializing the reference much later, there's no way to know statically whether the reference is valid to use or not. For example:

init(from decoder: Decoder) throws {
    // ...
    obj = try container.decode(...)
    // -- obj is NOT currently safe to access --
    otherObj = try container.decode(...) // deep in here resolves some other reference cycle which allows obj to be initialized
    // -- obj is now safe to access --
}

How would this be enforced? Is the intent here that this would be a runtime failure? If so, do we consider this to be a significant improvement over what Objective-C already offers? A runtime trap is definitely better than undefined behavior, but do we consider this to improve the situation enough to be worth the effort of changing the whole initialization model?

(There are also going to be cases where obj is NEVER safe to access throughout the course of init(from:) — what if initialization depends on calling through to obj.something? Is the synthesized initializer just guaranteed to always end up trapping?)

QuinceyMorris:

<snip>

The ReferenceDecoder object or compiler-generated code would need to maintain the initialization state of every object in the archive as one of three cases: .uninitialized, .initializing, and .initialized.

The ReferenceDecoder object or compiler-generated code would need a way to get a reference to an object's memory without actually initializing the object. (Obj-C already has an alloc/init distinction. This would be something similar. Presumably Swift does actually allocate memory before initializing it.)

The ReferenceDecoder object or compiler-generated code would need a way of running initialization using a reference to previously-allocated object memory.

Compiler-generated code would need to check the initialization state of the reference being used for method dispatch and property access during init(from:).

Two things about this:

One thing to keep in mind here is that code synthesis at the moment cannot "type" anything that you as a developer cannot type — there is nothing privileged about synthesized code that makes things magic.

As such, anything the compiler can type out, you can too, which means that opening up the initialization model in this way will not (and I think should not) be limited to Codable, or synthesized code, or anything like that. If we go this route, we have to accept a new model and offer enough runtime tools for Swift developers to write unsafe code in this way themselves.

This is not inherently a bad thing, but a change of a much larger scope.
There are other questions to answer here: how would the compiler know to synthesize code related to ReferenceDecoder? What might an init(from:) like this look like? Keep in mind that an init(from:) can receive any Decoder reference — at the very least I think we'd need to add API here to help distinguish between possible decoders.

Overall, I still believe that it would be easier and safer to explore further in the direction of two-phase initialization, for a few reasons:

This is a problem that so far has largely only been brought up in the context of Codable. Codable brings with it a more structured initialization pattern (i.e. you can call init(from:) but it's harder to control what happens afterwards, unlike when you initialize values manually), which makes some scenarios harder to represent. As such, addressing this issue at the smallest scope (i.e. the Codable protocols themselves, rather than at the language level) is, in my mind at least, a more appropriate and targeted approach
Understanding order of initialization regarding object references is, at the end of the day, something that is up to the developer: given only properties, it is impossible for the compiler to somehow divine the order in which they need to be initialized, and how to initialize them. As such, it's not unreasonable to require the developer to put some thought into how they want their references initialized. As we've mentioned upthread, there are many schemes for doing this, and there's no way for the compiler to guess.

As part of this, I think it should be understandable that a developer might need to forgo a synthesized implementation in favor of code that does the right thing. I'd love some clarification on rules 5 and 6 above, but at the moment, it seems that the alternative is a black-box synthesized implementation which may or may not trap at runtime based on the order of initialization. I don't personally see this as enough of an improvement in developers' day-to-day lives to be worth the risk. And with that, too:
There are cases in which developers still need to customize their init(from:) implementations for other reasons. This means that they still need to be able to write the implementations themselves, and track these reference relationships anyway. If these are already being expressed, I think developers should be given the tools to write these inits correctly: whether you write the code which traps at runtime or whether the compiler does, the result is the same. I would prefer being able to offer two-phase initialization for Codable anyway as a correctness guarantee and option for a good way out

These are my personal opinions and I am very happy to receive input. Please, change my mind.

(And again, thanks so much for taking the time to offer the pitch — looking forward to a response if you have the time.)

Joe_Groff · September 13, 2018, 5:00pm

We could conceivably provide mechanisms for allocating and passing around partially-allocated class instances, and allowing direct initialization of stored properties, with the understanding that they must be fully initialized before they have any chance of being deinitialized (and, more obviously, that you can't load from a stored property before it's initialized). Some extremely low-level mechanisms exist for this in the standard library and are used by things like ManagedBuffer and Array's buffer classes, but they have to be used with extreme care.
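For a feel of what those low-level mechanisms demand, here is a small sketch using ManagedBuffer, a real standard-library class. It is not a decoding mechanism (IntStorage and makeStorage are illustrative names); it only shows the discipline described above: memory is allocated first, pieces are initialized later by hand, and everything that was initialized must be deinitialized before the object goes away.

```swift
// Header = number of elements initialized so far; elements are Ints stored
// in tail-allocated memory that starts out uninitialized.
final class IntStorage: ManagedBuffer<Int, Int> {
    deinit {
        // Every initialized element must be deinitialized before the memory is freed.
        _ = withUnsafeMutablePointers { header, elements in
            elements.deinitialize(count: header.pointee)
        }
    }
}

func makeStorage(_ values: [Int]) -> IntStorage {
    // Allocation happens here; only the header is initialized (to 0).
    let buffer = IntStorage.create(minimumCapacity: values.count) { _ in 0 }
    let storage = unsafeDowncast(buffer, to: IntStorage.self)
    storage.withUnsafeMutablePointers { header, elements in
        for (i, value) in values.enumerated() {
            (elements + i).initialize(to: value)
            header.pointee += 1   // count an element only once it is initialized
        }
    }
    return storage
}

let storage = makeStorage([10, 20, 30])
let sum = storage.withUnsafeMutablePointers { header, elements in
    // Read only the initialized prefix.
    (0..<header.pointee).reduce(0) { $0 + elements[$1] }
}
```

None of this is checked by the compiler: reading past header.pointee, or forgetting the deinit, compiles just as happily, which is why such APIs have to be used with extreme care.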

QuinceyMorris · September 13, 2018, 7:33pm

Indeed. I had a similar thought, but it wasn't obvious to me that it doesn't lead to deadlock by yielding, which is no better than deadlock by dependency.

But, yes, putting things in a more Swift-specific way, I think reference decoding is going to need a third kind of init (supplementing the existing designated and convenience).

If yielding init is workable, I'd be all for it.

QuinceyMorris · September 13, 2018, 11:52pm

OK, I have nothing against this approach. It is, however, a separate topic, because it's essentially about designing a way of expanding JSON semantics — of representing unique references in JSON syntax. (Ditto for property lists.) Once you have that, decoding different representations is all the same problem.

If it's all the same set of underlying protocols, then one issue is that Encodable and Decodable may need new generic methods, whose parameter type conformance distinguishes between reference and value types. That is, both T: Decodable and T: Decodable & AnyObject, etc.

When I was playing around with this in code, the fact that <expression> is AnyObject now returns true for everything, including non-objects, is a bit of a hindrance to the implementation. Distinguishing by different generic conformances seemed like the easiest solution.
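As an illustration of the generic-conformance approach, the sketch below uses two overloads that differ only in their constraints (codingKind(of:) is a hypothetical helper). For a class type both apply, and Swift's overload ranking prefers the more constrained one:

```swift
// Chosen for any Decodable type that is not known to be a class.
func codingKind<T: Decodable>(of type: T.Type) -> String {
    "value"
}

// More constrained, so preferred whenever T is statically known to be a class.
func codingKind<T: Decodable & AnyObject>(of type: T.Type) -> String {
    "reference"
}

struct Point: Decodable { var x = 0, y = 0 }
final class Document: Decodable { var title = "" }

let pointKind = codingKind(of: Point.self)
let documentKind = codingKind(of: Document.self)
```

The choice is made statically: inside a function that is itself generic only over T: Decodable, the call binds to the first overload even when T is a class at run time, so an encoder built this way would need the AnyObject overloads threaded through every generic entry point.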

QuinceyMorris · September 14, 2018, 12:16am

Nope. My argument is that it's provably impossible, formally — but the argument is expressed informally:

In Swift, an object declares when it regards its initialization as complete, so that it's ready for any of its public API to be used. Canonically, this declaration is implicit by returning from its init, since other objects can't get hold of the object's reference before it returns.

In practice, it's a little bit more subtle than this. Once the initializing object has passed the point where it calls super.init, it is free to "vend" its own reference to other objects by passing self as a parameter to a method invocation in another object. Obviously, the initializing object is responsible for ensuring that it can handle dereferences [method invocations and property accesses] via vended references that bounce back onto it before it returns from its init. In this case, vending a reference is the object's declaration that it is complete.

Now imagine that object A needs to dereference its reference to object B before it can declare its own initialization complete. You can put some of A's initialization into pass 1, and all of B's initialization into pass 1, and the rest of A's initialization into pass 2.

That's great, but it means that A's initialization isn't complete until the end of pass 2.

Now, object C that needs to dereference A in order to complete its own initialization is stuck. It either needs a pass 3 (to "wait" for A to be "ready"), or it needs to initialize in pass 2 but after A is "ready".

Falling back to a pass 3 starts an infinite regress. Ordering C's initialization after A's breaks if this happens to be one link in a circular chain of such dependencies.

Neither solution works.
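The regress can be made concrete by counting passes. In this sketch (readyPass(of:uses:) is a hypothetical illustration), an object that dereferences nothing is ready after pass 1, and any other object is ready one pass after the latest of the objects it dereferences. A chain of n such objects needs n passes; a circular chain has no finite answer.

```swift
/// uses[x] lists the objects x must dereference before x itself is "ready".
/// Returns the pass after which `object` is ready, or nil for a circular chain.
func readyPass(of object: String,
               uses: [String: [String]],
               visiting: Set<String> = []) -> Int? {
    if visiting.contains(object) { return nil }   // came back around: no finite pass
    var latest = 0
    for dependency in uses[object, default: []] {
        guard let pass = readyPass(of: dependency, uses: uses,
                                   visiting: visiting.union([object])) else {
            return nil
        }
        latest = max(latest, pass)
    }
    return latest + 1
}

// B needs nothing; A dereferences B; C dereferences A.
let chain = ["A": ["B"], "C": ["A"]]
// A -> B -> C -> A: each object dereferences the next.
let cycle = ["A": ["B"], "B": ["C"], "C": ["A"]]
```

For the chain, B is ready after pass 1, A after pass 2, and C only after pass 3; for the cycle, every object comes out as nil.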

QuinceyMorris · September 14, 2018, 12:24am

Replacement is only allowed for value type initializers, and for convenience initializers of reference types (and the latter only after the very recent change that allowed this, which isn't rolled out yet, I think).

The initializer for decoding would have to be like a designated initializer, which does not permit reference substitution.

(I'm not sure how this impacts Obj-C objects in Swift, since self-replacement can't be prevented even for designated initializers written in Obj-C with legacy semantics. This issue needs to be looked at in a bit more detail.)

QuinceyMorris · September 14, 2018, 12:35am

My suggestion would be a third flavor of init semantics, after designated and convenience. Let's call it checked for the sake of discussion.

Within the checked init method, the compiler would treat all object dereferences something like an IUO — inserting low level code to check the pointer for validity, and aborting with a meaningful message if the referenced object isn't fully initialized.

Currently, reference-identity-preserving Decodable doesn't work at all. Even something with run-time checks is better than what we have now.

Do you mean that obj is not the same as the object executing init(from:)?

If it's never safe to access the obj.something property, then there's a circular dependency chain where obj depends on this object's initialization. That can't be decoded, and should fail.

Conversely, if there's no circular dependency chain, then obj can be fully initialized as soon as the dependent object needs it.
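That reasoning amounts to on-demand, depth-first decoding: when an object needs `obj`, decode `obj` right then, and fail only if `obj` is already partway through its own initialization. A minimal sketch, assuming a hypothetical archive that maps object IDs to the IDs each object must dereference during init; `OnDemandDecoder`, `DecodedObject` and `DecodingCycle` are illustrative names, not Foundation API.

```swift
/// Hypothetical error: an object was requested while still initializing.
struct DecodingCycle: Error {
    let id: Int
}

/// Hypothetical decoded object; its prerequisites were complete before it.
final class DecodedObject {
    let id: Int
    let prerequisites: [DecodedObject]
    init(id: Int, prerequisites: [DecodedObject]) {
        self.id = id
        self.prerequisites = prerequisites
    }
}

/// Hypothetical decoder that initializes each object the first time it is needed.
final class OnDemandDecoder {
    private let needs: [Int: [Int]]            // id -> ids it dereferences in init
    private var finished: [Int: DecodedObject] = [:]
    private var inProgress: Set<Int> = []

    init(needs: [Int: [Int]]) { self.needs = needs }

    func object(_ id: Int) throws -> DecodedObject {
        if let done = finished[id] { return done }   // reuse preserves identity
        // Requested again while still initializing: a circular chain.
        guard inProgress.insert(id).inserted else { throw DecodingCycle(id: id) }
        defer { inProgress.remove(id) }
        let deps = try (needs[id] ?? []).map { try object($0) }
        let obj = DecodedObject(id: id, prerequisites: deps)
        finished[id] = obj
        return obj
    }
}

// Object 1 needs 2, 2 needs 3; objects 4 and 5 need each other.
let decoder = OnDemandDecoder(needs: [1: [2], 2: [3], 3: [], 4: [5], 5: [4]])
do {
    let one = try decoder.object(1)      // finishes 3, then 2, then 1
    let two = try decoder.object(2)      // already finished: same instance
    print(one.prerequisites[0] === two)  // true
    _ = try decoder.object(4)            // throws DecodingCycle
} catch {
    print("cannot decode:", error)
}
```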

QuinceyMorris · September 14, 2018, 12:43am

My choice of terminology was unfortunate here. I'm not talking about Codable synthesis.

"Compiler-generated code" literally meant the code that the compiler produces from the source code. I was originally going to say "runtime", but I realized the code might be generated inline without calls into the runtime, so that might not be accurate.

What I was really trying to say is that some of the smarts of (say) keeping track of object initialization state could be assigned to the Decoder, rather than being built into the compiler. Or, it could be built into the compiler (as the semantics of a checked init method).

Most of the work could be done in the Decoder implementation, except for 2 things:

The breaking apart of allocation and initialization. Obviously the compiler/runtime has to provide some kind of support for this.
The check for dereferencing uninitialized references. Even if the Decoder does the work, the compiler has to emit instructions to call into the Decoder. At this stage, I have no idea how that might be made to work, but again I'll point in the general direction of unowned.

Sorry for being too hand-wavy in this part of the discussion. I was trying to be brief, but that wasn't a good idea.

QuinceyMorris · September 14, 2018, 3:22am

Now that I've written this out at such length (sorry!) I think I can restate the essence briefly:

During archive decoding, if we can safely divorce object allocation from initialization, then we can simply and safely remove all ordering dependencies of stored properties.

This has several extremely interesting implications:

Since [Reference]Decodable synthesized conformance can do nothing but re-set values of stored properties, synthesis is a completely solved problem. Any archive whose contents can be described by synthesized Decodable types can be unarchived.
Re-setting stored properties typically accounts for the vast majority of the operations needed during unarchiving. Of the relatively few that remain, only those that actually have initialization order dependencies will need to be handled carefully.
If problems of initialization order dependency are few, it seems feasible to leave them to developers to solve on an ad hoc basis, and there's no need to design additional language or runtime changes to solve them more generally.
Recreating synthesized behavior manually, when synthesis can't be used for other reasons, would be as easy for a developer as it is now.
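Swift offers no way today to allocate an instance without running an initializer, so this sketch only approximates "allocation" with an initializer that sets an identity and leaves references as optionals to be re-set later. Under that assumption (the `Person` type, `ArchivedValue`, `Assignment` and the flat archive format are all hypothetical), applying the archived property values in any order yields the same object graph.

```swift
/// Hypothetical model type; `init(id:)` stands in for bare allocation.
final class Person {
    let id: Int
    var name = ""
    var spouse: Person?              // may refer to an object not yet filled in
    init(id: Int) { self.id = id }
}

/// Hypothetical flat archive: one record per stored-property value.
enum ArchivedValue { case text(String), ref(Int) }
struct Assignment { let object: Int; let property: String; let value: ArchivedValue }

func decode(_ assignments: [Assignment], ids: [Int]) -> [Int: Person] {
    // Step 1: "allocate" every object first, so every ID already resolves.
    var table: [Int: Person] = [:]
    for id in ids { table[id] = Person(id: id) }
    // Step 2: re-set stored properties; their order no longer matters.
    for a in assignments {
        guard let target = table[a.object] else { continue }   // unknown ID: skipped
        switch (a.property, a.value) {
        case ("name", .text(let s)): target.name = s
        case ("spouse", .ref(let r)): target.spouse = table[r]
        default: break                                          // unknown property: ignored
        }
    }
    return table
}

let archive = [
    Assignment(object: 1, property: "spouse", value: .ref(2)),
    Assignment(object: 2, property: "spouse", value: .ref(1)),
    Assignment(object: 1, property: "name", value: .text("Ann")),
    Assignment(object: 2, property: "name", value: .text("Bob")),
]
for order in [archive, Array(archive.reversed())] {
    let people = decode(order, ids: [1, 2])
    print(people[1]!.spouse!.name, people[2]!.spouse === people[1])  // Bob true
}
```

The spouse references are strong in both directions to keep the sketch short, which leaves a retain cycle; a real model would choose `weak` or `unowned` where appropriate.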

itaiferber · September 14, 2018, 4:05pm

Agreed — the actual details and implementation are a separate topic best discussed elsewhere. Just wanted to bring up the point.

A simpler solution these days is to check the type itself rather than the value:

struct Foo {}
class Bar {}

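func f(_ v: Any) {
    // `v is AnyObject` is true even for the struct (the value gets boxed),
    // so only the metatype check tells class instances apart.
    print("\(v):", v is AnyObject, type(of: v) is AnyObject.Type)
}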

f(Foo()) // => Foo(): true false
f(Bar()) // => Untitled.Bar: true true

So this should be possible to do without changing the protocols directly (at least for this portion).
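Building on that, a decoder could branch on the metatype alone to decide whether identity is worth tracking for a given type. A minimal sketch; `isReferenceType` is a hypothetical helper, not part of the `Decoder` or `Decodable` protocols.

```swift
/// Hypothetical helper: true only when T is a class type. Unlike
/// `value is AnyObject`, this does not succeed for boxed structs.
func isReferenceType<T>(_: T.Type) -> Bool {
    return T.self is AnyObject.Type
}

struct Point { var x = 0, y = 0 }
final class Node {}

print(isReferenceType(Point.self), isReferenceType(Node.self))  // false true
```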

itaiferber · September 14, 2018, 4:26pm

I was actually thinking of the initializer pattern made possible by protocol extensions, which unconditionally allow self-assignment on both structs and classes:

struct Bar {}

class Foo : RawRepresentable, Decodable {
    let rawValue: Bar
    required init(rawValue: Bar) {
        self.rawValue = rawValue
    }
}

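extension RawRepresentable where RawValue == Bar, Self : Decodable {
    // A protocol extension initializer may assign to `self`, even when Self is a class.
    // The requirement is `init?(rawValue:)`, so the result is optional here, hence the `!`.
    init(from decoder: Decoder) throws {
        self = Self.init(rawValue: Bar())!
    }
}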

We use this factory initializer pattern in a few places in Foundation (namely with NSNumber IIRC) — it's possible to replace self with a default implementation provided by a protocol AFAIK. @Joe_Groff Is my understanding of what happens in this case incorrect? This does actually replace self for classes in a way that allows you to effectively return a different instance from an init, yeah?
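The same self-assignment can be used to hand back an existing instance instead of a fresh one, which is the "return a different instance from an init" effect in question. A minimal sketch under stated assumptions: `Interned`, `InternTable`, `Symbol` and `init(interning:in:)` are hypothetical names, and the cache is passed in explicitly to avoid global state.

```swift
/// Hypothetical cache mapping keys to already-created instances.
final class InternTable {
    var instances: [String: Any] = [:]
}

/// Hypothetical protocol providing the "real" initializer.
protocol Interned {
    init(freshKey: String)
}

extension Interned {
    /// Factory initializer: assigns a cached instance to `self` when one exists,
    /// otherwise creates, caches and assigns a new one.
    init(interning key: String, in table: InternTable) {
        if let existing = table.instances[key] as? Self {
            self = existing
        } else {
            let fresh = Self.init(freshKey: key)
            table.instances[key] = fresh
            self = fresh
        }
    }
}

final class Symbol: Interned {
    let key: String
    init(freshKey: String) { self.key = freshKey }
}

let table = InternTable()
let a = Symbol(interning: "x", in: table)
let b = Symbol(interning: "x", in: table)
let c = Symbol(interning: "y", in: table)
print(a === b, a === c)  // true false
```

`Symbol` is `final` so that `init(freshKey:)` need not be marked `required`; a non-final class would need that, as `Foo` does above.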