Yes, protocol extension initializers can return an arbitrary instance. If class convenience initializers gain the ability to reassign self too, though, that will be just as expressive.
Understood, but I still have a few concerns here about the model:
- It seems unlikely, by your description of this, that the fact that an object was initialized via a checked `init` would be represented any differently in the type system. In that case, what guarantees, if any, do methods have that a given object value they accept won't blow up on them when attempting access (because the value wasn't fully initialized)? (I can't think of any, and this runs somewhat counter to the safety guarantees Swift tries to make.)
- Because these values are not represented in the type system any differently, and because the compiler cannot statically tell when a value is going to be fully initialized, this makes a check for object initialization necessary on every access to the object over the entirety of its lifetime. With IUOs, this is true only until you unwrap them into guaranteed instances that no longer require checks.
As such, is it possible to come up with a creative way to use IUOs or similar to model this? Perhaps the return type of these methods should not be `Self` but `Self!` (or similar), and be treated as such until meaningfully unwrapped?
To clarify my intent with the question here: we can indeed offer a solution that is better than what Objective-C has, sure, but is it the best solution that we can come up with? Whatever we decide on can make a difference for the semantics of the language (or not), and we should feel confident that we've explored the space to know that we're happy with the solution.
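For concreteness, here's a minimal sketch of what the IUO-style modeling looks like in today's Swift (my own illustration, not proposal text): every access to `parent` is checked (and traps) until it's assigned, and binding it to a non-optional drops further checks.

```swift
final class Node {
    let name: String
    var parent: Node!   // implicitly unwrapped: every access is checked until assigned

    init(name: String) { self.name = name }
}

let root = Node(name: "root")
let child = Node(name: "child")
child.parent = root          // second phase: patch up the back-reference

// A "meaningful unwrap": within this branch, `p` is a guaranteed instance
// that no longer requires checks.
if let p = child.parent {
    print(p.name)
}
```

The open question is whether the compiler could track something `Self!`-like for these init results, rather than leaving the trap-on-access behavior implicit.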
This is indeed what I mean: in @jamesmoschou's example, various nodes point back up to their ancestors. However, ancestors are never safe to access until you yourself have completed decoding. Because this relationship is impossible to model in the type system (or expressly in code, really), it's trivial to write code (or worse, have code synthesized on your behalf) which looks correct but can never safely execute (except with two-phase initialization); the failure here would be an unavoidable runtime trap, though.
This is one of my primary concerns here: that it will be too easily possible to express in code incorrect assumptions about relationships between objects, in a way that will be automated by synthesis with no guarantees of safety. We already have this problem with Objective-C, but Swift works so hard to guarantee safety and correctness: I don't think we should be willing to abandon that to solve this problem.
This is the last response I'll make for now to give myself a chance to digest the rest, but I think this is the crux of my belief in two-phase initialization:
Sure: the object has to be fully initialized when it returns, but that does not imply it is in its final, most useful state. See all of the above examples of parent-child relationships, which fundamentally cannot be represented without an escape hatch (e.g. a `nil` reference to the parent). An object is free to be returned from an `init`, but that does not imply the process is done.
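Concretely, the escape-hatch pattern I'm describing looks something like this (a sketch; `addChild(_:)` is my own naming): the `init` returns a fully initialized but unconnected node, and the relationship is wired up in a second phase.

```swift
final class TreeNode {
    private(set) var children: [TreeNode] = []
    private(set) weak var parent: TreeNode?   // nil is the escape hatch

    // Phase 1: returns a fully initialized, but childless and parentless, node.
    init() {}

    // Phase 2: both ends now exist, so the relationship can be wired up safely.
    func addChild(_ child: TreeNode) {
        children.append(child)
        child.parent = self
    }
}

let parent = TreeNode()
let child = TreeNode()
parent.addChild(child)
print(child.parent === parent)   // true
```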
My argument is that you either:

- Cannot end up relying on a situation like this, because it's impossible to represent with regular `init` semantics, or
- Whatever solution you come up with for manual initialization can similarly be solved with structured two-phase initialization.
This sounds like:

```swift
class A {
    var b: B
    init(b: B) {
        self.b = b
        // do something with b
    }
}

class B {
    var c: C
    init(c: C) {
        self.c = c
        // do something with c
    }
}

class C {
    var a: A
    init(a: A) {
        self.a = a
        // do something with a
    }
}
```
Am I mistaken? I'm happy to accept a counter-example.
Today's episode of the long-running saga:
I think there's a useful way forward if we take the large problem space and divide it into two smaller problem spaces, which I'll call "Decodology" and "Reconstructology".
- Decodology deals with the problem of reconstituting objects in an object graph from the information in a stored archive. The reconstituted graph may not be the complete, original graph that was archived.
- Reconstructology deals with the problem of completing a partial or unusable object graph using whatever information already exists in the graph, or from other sources of information.
Let's take a micro-example:

```swift
class Node: Codable {
    private(set) var children: [Node] = []   // a one-to-many relationship
    private(set) weak var parent: Node?      // inverse of the "children" relationship
}
```
Suppose that both the `children` and `parent` properties of a graph of `Node` instances were archived at encoding time. [Also, suppose that the postulated encoder and decoder are capable of preserving reference identity. That's implicit in everything that follows.] The decodology of this is the problem of how to decode (how to write an `init(from:)` method for) the archived objects in a way that restores the values of both properties.
As it happens, this problem is difficult within Swift's current capabilities. Circular chains of initialization dependencies make it impossible with just the `init(from:)` method.
However, let's put that problem to one side, press on, and change the scenario. Suppose the `parent` property value isn't archived. It is, after all, redundant, so it uselessly doubles the amount of storage needed for archiving the object graph structure.
Upon completion of unarchiving, therefore, we need the `parent` property to have the correct value. How is that to be achieved?
What's important to notice here is that the answer to that question doesn't involve decoding anything, because the answer isn't in the archive; the archive contains no un-decoded information. Decodology doesn't help. We are now in the realm of reconstructology, where we need to find some way to interrogate existing objects (maybe: "Hey, Node! Got any children?") to fill in the missing information.
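As a concrete illustration of such a reconstructology pass (my own sketch, not part of any proposal): archive only `children`, then walk the decoded graph once to rebuild `parent` by interrogating each node for its children.

```swift
import Foundation

final class Node: Codable {
    private(set) var children: [Node] = []
    private(set) weak var parent: Node?   // not archived; reconstructed below

    private enum CodingKeys: String, CodingKey { case children }

    // Reconstructology: interrogate existing objects ("Got any children?")
    // to fill in the missing inverse relationship.
    func reconnectParents() {
        for child in children {
            child.parent = self
            child.reconnectParents()
        }
    }
}

let json = Data(#"{"children":[{"children":[]}]}"#.utf8)
let root = try! JSONDecoder().decode(Node.self, from: json)
root.reconnectParents()
print(root.children[0].parent === root)   // true
```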
I've come to the conclusion it helps to keep these two problem domains apart. You can think of this as two-pass unarchiving if you like; it's as correct a characterization as anything else.
Both decodology and reconstructology have trouble with order dependencies, but that doesn't mean we need the same solution in both domains.
At the moment (and, arguably, so far in this thread) I am mainly interested in decodology. I've got some new ideas about how to approach it, but nothing fully worked out yet. OTOH I haven't thought much about reconstructology. If anyone wants to jump in and solve it, have at it!
Next episode under construction…
Hi @QuinceyMorris, just following the thread and letting others speak, but I feel that you and @itaiferber are speaking in similar ways w.r.t. your last post here. Though I don't think anyone has the best approach forward yet... just observing, I think that justifies the "two-pass" approach?
Also, mainly posting to state I'm interested and listening.
(I'm sooooooo sorry this is long.)
Decodology
The problem to solve for decoding is initialization order dependency.
The Experimental Subject, Dissected
Here's the example I mentioned earlier:
```swift
class Node: Codable {
    private(set) var children: [Node] = []   // a one-to-many relationship
    private(set) weak var parent: Node?      // inverse of the "children" relationship
}
```
Let's add simple-minded decoding to this:
```swift
required init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    children = try container.decode([Node].self, forKey: .children)
    parent = try container.decodeIfPresent(Node.self, forKey: .parent)
}
```
This will crash after mutual recursion fills up the stack. A child cannot be decoded before its parent, and a parent cannot be decoded before its children… forever.
Getting Our Feet Wet
Previously, I was trying to find a way to defer the setting of (for example) the `parent` property. It occurs to me that, instead of dipping our toes in the waters of dependency ordering, we could just try jumping into the deep end. Consider this:
```swift
required init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    children = try container.decode([Node].self, forKey: .children)
    defer {
        // defer blocks can't throw, hence `try?` here
        parent = try? container.decodeIfPresent(Node.self, forKey: .parent)
    }
}
```
Clearly, this doesn't actually solve anything, if we use the current semantics of `defer`. It moves a line of code to the end of the initialization, but it's still in the initialization. But it is kind of interesting to think about, if you think of it as meaning "defer the availability of the value of `parent` until later".
If `parent` had "deferred" availability, that would suggest that code inside this initializer could not be allowed to dereference or vend the property value.

- Note: "Dereference" means invoke a method or access a property via the `parent` reference. "Vend" means pass the reference as a parameter to another method.
Previously, I suggested that this restriction could be enforced at runtime by using a specially crafted reference that would lead to a crash if it was dereferenced. That came unstuck because (amongst other reasons) it couldn't handle `self`-replacement during initialization, which is possible in some contexts in Swift objects, and always in Obj-C objects.
Serious Syntax
Gathering all these pieces together, consider this alternative:
```swift
class Node: Codable {
    private(set) var children: [Node] = []
    deferred weak private(set) var parent: Node?

    deferring init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        children = try container.decode([Node].self, forKey: .children)
        parent = try container.decodeIfPresent(Node.self, forKey: .parent)
    }
}
```
[The syntax and naming is of course schematic.] This would work as follows:
- The `deferred` attribute would have no effect on property accesses in code outside a `deferring` initializer. It would only modify how the property is interpreted inside such an initializer.
- Inside a `deferring` initializer, `deferred` properties could be set or assigned to other `deferred` properties, but not dereferenced or vended.
- We also need to express that a method such as `decode(_:forKey:)` can return a reference of deferred availability, so it would be declared like this:

  ```swift
  func decode<T>(_ type: T.Type, forKey key: KeyedDecodingContainer<K>.Key) throws -> deferred T
  ```

  If you want a mental model for the syntactic rules for `deferred`, think of it along the lines of `!` indicating an IUO. Like `!`, it's a declaration attribute, not a type attribute. More on its meaning later.
- Finally, a `deferring` initializer would produce a "referencing value"; that is, a result of value type that encapsulates an initialized object reference. More on this later.
That's the syntax. Now let's move on to the semantics.
Serious Semantics
A `deferred` property is, in effect, a pair of properties that shares the same storage. One version of the property "exists" only during a `deferring` initializer, and it contains a unique ID (details private to the implementation) that identifies the instance it is eventually going to be.
- Note: This is an abstraction of the allocated-but-uninitialized pointer scheme that I proposed earlier, but it doesn't have to be a pointer: it just has to be unique. It also has to be, for all practical purposes, an `Int`, since it has to fit into the storage normally allowed for a pointer. It'd also be nice if its bit pattern could easily be distinguished from a "real" pointer.
The second version of the property "exists" at all other times, and it's a normal reference to a fully initialized object.
- Note: This distinction between two property versions is known statically at compile time.
To make this work, the compiler has to synthesize code associated with each `deferring` initializer. The synthesized code is responsible for replacing the unique ID of each `deferred` property with a real object reference, thus transitioning the property to full usability, and completing the object's initialization. For the sake of concreteness, let's assume the synthesized code is packaged as a private method which we can refer to as `_postinit` (the actual name, if there is one, is an implementation detail). More about this later, too.
When or how is this `_postinit` method invoked? Imagine that there might be a pool or collection of object references, indexed by unique ID. If you want a mental model for this, think of an autorelease pool, which is an existing mechanism for a collection of references on which an operation is to be carried out "later"; that's called "draining" the pool. That's approximately how the deferral pool works too, but the "draining" operations in this case are invocations of `_postinit`.
Now, the interesting thing to note here is that a collection of object references indexed by some unique ID happens to be exactly the kind of data structure that a reference-identity-preserving decoder needs to build anyway. That makes it reasonable to push the implementation of the deferral pool mechanism out of the compiler, and into the decoder.
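As a sketch of what such a decoder-owned pool might look like (the `DeferralPool` and `PoolEntry` names and shapes here are my invention):

```swift
// Each entry tracks where a given unique ID is in its decoding lifecycle.
enum PoolEntry<T: AnyObject> {
    case initializing        // its deferring init has started but not returned
    case initialized(T)      // holds the reference awaiting its _postinit pass
}

struct DeferralPool<T: AnyObject> {
    private var entries: [Int: PoolEntry<T>] = [:]

    mutating func beginDecoding(id: Int) { entries[id] = .initializing }
    mutating func finishDecoding(id: Int, reference: T) {
        entries[id] = .initialized(reference)
    }
    func entry(for id: Int) -> PoolEntry<T>? { entries[id] }
    var ids: [Int] { Array(entries.keys) }   // for the final "draining" pass
}

// Usage:
final class Obj {}
var pool = DeferralPool<Obj>()
pool.beginDecoding(id: 1)
let obj = Obj()
pool.finishDecoding(id: 1, reference: obj)
if case .initialized(let ref)? = pool.entry(for: 1) {
    print(ref === obj)   // true
}
```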
The decoding mechanism is controlled by a "root decoder" object. This is an object similar to `JSONDecoder` or `PropertyListDecoder`, in that it triggers the entire decoding process, but likely doesn't actually conform to the `Decoder` protocol.
The Decoder Algorithm
Let's look at what a root decoder needs to do. To hold the discussion to manageable length, let's make some simplifying assumptions:
- We can assume, without loss of generality, that the root decoder can find in the archive data a single root object from which all other objects are reachable. [If there are multiple root objects, or disconnected subgraphs, or if the decoder starts with a non-root object, the algorithm can be run multiple times until all objects are decoded.]
- We can assume, without loss of generality, that every object in the archive data already has a unique ID in the archive. [This is just another way of saying we can find objects in the archive.]
- We can assume, without loss of generality, that we can get directly to archive data from the unique ID, and that a `Decoder` magically knows the unique IDs of the objects it's responsible for decoding. [This relieves us of the burden of constantly mentioning decoder containers, coding keys, and the rest of the standard `Decodable` machinery.]
- We can temporarily ignore archived values of value types, and pretend there are only archived objects of reference types. [I'll circle back to value types later.]
The root decoder starts by invoking a `decode(…) -> deferred T` method to decode the root object. This `decode` method causes other `decode` methods to be invoked, and so on until everything in the object graph is decoded.
Each `decode` method starts by finding the unique ID of each value it is to decode, and looks up the unique ID in the deferral pool.
- If the corresponding value has not yet been decoded, there will be no entry in the pool. The `decode` method creates an entry, marks it as "initializing", and invokes the `deferring init(from:)` method for the appropriate type. Upon return from the initializer, the `decode` method updates the pool entry, marking it as "initialized" and storing the "referencing value" from the initializer in the updated entry.
- If the pool contains an entry for the unique ID, the `decode` method simply returns the unique ID, regardless of whether the entry is marked as "initializing" or "initialized".
- (Optimization) If the existing entry is marked "initialized", the `decode` method could return the "referencing value", and that could be stored as the `deferred` property value instead of a unique ID. In many, many use cases, this will eliminate most of what `_postinit` otherwise has to do.
Eventually, the original `decode` invocation will return to the root decoder. At this point, all decoded objects are represented in the deferral pool, and all entries in the pool will be marked "initialized".
In a final step, the root decoder must cause `_postinit` to be invoked on each entry in the pool. When this is complete, all references are valid, and all objects are fully initialized.
One approach to this would be to find a way to let the root decoder invoke this (invisible) method on each entry, while iterating through the pool. However, it's pretty difficult to guarantee that the root decoder gets this right.
A better approach would be to provide a way for the root decoder to initiate post-processing on the root object (indirectly invoking its `_postinit`), and to synthesize each `_postinit` to invoke other `_postinit`s recursively. The function that initiates this would return a valid reference to the root object, and all reachable objects in the object graph will be valid too. Note that each subclass `_postinit` would need to invoke `super._postinit`, so the method would need to be dynamically dispatched (or something equivalent).
That second approach is safer because it doesn't expose any object references until all objects they depend on are fully initialized.
Standard Types
In the above description, I glossed over the details of deferred return types (β¦ -> deferred T
) and the "referencing values" produced by deferring
initializers.
A `deferred` return value is essentially an `Int` unique ID. However, to enforce type safety when a `deferred` return type is assigned to a `deferred` property, we can conceptualize the return type as a generic standard library type, which I'll call `DeferredReference<T>`. This type will have a `uniqueID` property to recover the `Int` value it was initialized with, and an `incompleteReference` property to keep track of an associated reference value.
The value returned by a `deferring` initializer is essentially a valid object reference, except that the object's `_postinit` method hasn't been invoked yet (and so it's not yet safe to dereference). To represent that, the actual reference would be wrapped in another generic standard library type, which I'll call `IncompleteReference<T>`. This type exposes a method that invokes `_postinit` using the wrapped reference, and returns a usable object reference.
Here's the essence of those types:

```swift
struct DeferredReference<T> where T: AnyObject {
    let uniqueID: Int
    var incompleteReference: IncompleteReference<T>?  // nil == "initializing", non-nil == "initialized"
}

struct IncompleteReference<T> where T: AnyObject {
    private let reference: T
    let postinit: (_ pool: [Int: DeferredReference<T>]) -> T
}
```
The root decoder (or one of its proxy `Decoder`s) is responsible for creating `DeferredReference<T>` values, and later setting their `incompleteReference` property.
Note that the `IncompleteReference<T>.postinit` method takes the deferral pool as a parameter. It passes the `pool` parameter to the `reference`'s `_postinit` method, so that it can look up the reference values it needs. One other tiny wrinkle is that one of these two struct types will need to keep track of whether `_postinit` has already been invoked, so that it's not done more than once.
Value Types
I haven't said anything about value types yet, because I think they're not much of a problem. (This is based on earlier coding experiments with reference-identity-preserving archiving, where things just "fell out" when I did the obvious thing with value types. I'm expecting that here, too.)
A `deferred …: V` property, if `V` is a value type, doesn't need a unique ID, because it can serve as its own deferred value. If the value type is a custom type with members of reference type, the deferred value is the value with deferred references for those members. Such a value type has to have a `deferring init` to create deferred values, and will need a `_postinit` method to re-set references within its `deferred` members (recursively).
The only oddity is that a `deferred …: T?` will (I think) need to be treated in a `deferring` initializer as `Optional<DeferredReference<T>>`, not `DeferredReference<Optional<T>>`. I'm not sure if there's any real weirdness here.
That's All
No doubt I've overlooked a tragic flaw in all this, but this is where I'm taking my stand.
Sorry to be resurrecting an old discussion, but I wrote a little library that can correctly encode/decode a `Codable` object graph containing both duplicates (multiple references to the same object) and cycles (with a small change to the data model): GitHub - greg/CyclicCoding: Encodes & decodes Codable object graphs containing duplicates and cycles
More info is in the README.
I don't think something like that (in its present form) would be worth adding to the Swift language, but it does the job for me and I think it might do the job for some of you as well.
For folks who might still be interested in this whole topic, my latest solution to the cyclical reference problem is here: Archivable.swift · GitHub
It uses property wrappers to annotate properties for archiving, and discovers those properties using `Mirror`. In order to patch up cyclical references, my `Archivable` protocol requires that the type have an empty `init()`; so instead of using an initializer to populate the instance with the property values, all values are set later, when `read(from:)` is called on that instance. This is obviously a pretty different approach from `Codable`, but I think it'll work pretty well.
```swift
public protocol Archivable {
    /// You'll need this. You can likely get this for free from Swift if you use structs
    /// or final classes and provide initial values for all your properties.
    init()

    /// Generally you don't have to provide this - use the default implementation.
    func write(to archive: ArchiveWriter) throws

    /// Generally you don't have to provide this - use the default implementation.
    mutating func read(from archive: ArchiveReader) throws
}
```
Using clever default implementations means you pretty much never have to write any encode/decode functions, so using it is as simple as this:
```swift
struct Point3: Archivable {
    @Archive var x: Float = 0
    @Archive var y: Float = 0
    @Archive var z: Float = 0
}

final class Child: Archivable {
    @ArchiveWeak var parent: Parent?
    @Archive var position = Point3(x: 10, y: 3, z: -2)
}

final class Parent: Archivable {
    @Archive var name = "whatever"
    @Archive var children: [Child] = []
    @Archive var number = 42
    @Archive var dictionary: [Int : String] = [:]
}
```
One key limitation of my implementation, though, is that it doesn't support subclasses because I couldn't work out how to solve it without requiring the user to pre-register types or something like that. (Swift doesn't seem to be quite dynamic enough to make this work?) I'm sure there are other problems with it - but perhaps it'll be fun and/or horrifying to read over the code (linked above).