Allow forming weak references to partially constructed objects

Nickolas_Pohilets · April 14, 2023, 2:10pm

Problem

Currently, when a class initialises child objects that need a callback to the parent object, additional code is needed to pass the initialisation checks:

class Inner {
   let foo: Int
   let callback: () -> Void
   init(foo: Int, callback: @escaping () -> Void) {
       self.foo = foo
       self.callback = callback
   }
}

class Parent {}

class Outer: Parent {
    let inner: Inner

    init(foo: Int) {
        inner = Inner(foo: foo) { [weak self] in // error: 'self' used before 'super.init' call
            self?.handleCallback()
        }
        super.init()
    }
    
    private func handleCallback() {}
}

Some common workarounds:

#1. Make callback writable:

class Inner {
   let foo: Int
   var callback: () -> Void
   init(foo: Int, callback: @escaping () -> Void) {
       self.foo = foo
       self.callback = callback
   }
}

class Outer: Parent {
    let inner: Inner

    init(foo: Int) {
        inner = Inner(foo: foo, callback: {})
        super.init()
        inner.callback =  { [weak self] in
            self?.handleCallback()
        }
    }
}

#2. Capture mutable variable:

class Outer: Parent {
    let inner: Inner

    init(foo: Int) {
        // The closure captures the mutable local variable, not `self`.
        weak var weakSelf: Outer? = nil
        inner = Inner(foo: foo) {
            weakSelf?.handleCallback()
        }
        super.init()
        weakSelf = self
    }
}

Idea

Workarounds may vary, but constructing a reference cycle like that always needs some form of mutability to break the cycle in the initialisation dependency.

Weak references are already a tool for breaking retain cycles. With changes to the implementation of weak references, the same language mechanism could be used to break cycles in initialisation dependencies. This would reduce the amount of boilerplate code and smooth the learning curve of the language.

Currently an object with weak references can be in one of two states: live or deinitialising. Allowing weak references to partially constructed objects introduces a third state: initialising. A strong reference can be formed only for objects in the live state. Attempting to get a strong reference to an object in the initialising state returns nil.

There are two alternatives for when the object transitions from the initialising to the live state:
A) On init() of the root class.
B) After init() of the most-derived class.

class Inner {
    let check: () -> Bool
    init(check: @escaping () -> Bool) {
        self.check = check
    }
}

class Outer: Parent {
    let inner: Inner
    override init() {
        inner = Inner { [weak self] in self != nil }
        super.init()
        print(inner.check()) // true for A, false for B
    }
}

func test() {
    let outer = Outer()
    print(outer.inner.check()) // true for both A & B
}

In practice, there is little difference between them, since typically callbacks will be called based on events delivered on the next runloop iteration/executor job.

Challenges

Currently weak references don't exist as a type. As a consequence, a partially initialised self can be assigned to a weak reference only if that reference exists in the scope of the initialiser. There is no way to pass around a partially initialised self or a weak reference to it.

struct WeakRef<T: AnyObject> {
    weak var ref: T?
}

class Outer: Parent {
    init() {
        weak var weak1 = self // ok
        let callback: () -> Void = { [weak self] in self?.doSomething() } // ok
        var weak2 = WeakRef(ref: self) // error
        super.init()
    } 
}

Also, some existing code may assume that reading nil from a weak reference implies that the object is dead and associated data can be cleaned up. Introducing the initialising state breaks this assumption. It might be safer to keep the behaviour of the built-in weak references as-is, and instead introduce the suggested functionality in a new type:

// Protocol that all class types conform to, and also all existentials bound to `AnyObject`
protocol ReferenceType {}

/// Weak reference as a type
struct WeakRef<T: ReferenceType>: Hashable {
    /// @_maybePartiallyInitialized is a magical attribute that suppresses checks for complete initialisation.
    /// Argument type is intentionally non-optional
    init(@_maybePartiallyInitialized _ object: T)

    var object: T? { get }
    var state: ObjectState<T> { get }
}

/// Enum that clearly disambiguates between `initializing` and `deinitializing` states
enum ObjectState<T: ReferenceType> {
    /// Object is still initialising - try again later
    case initializing
    /// Object is alive, and will not deinitialize while you are using the provided strong reference
    case live(T)
    /// Object is deinitializing - remove all associated data
    case deinitializing
}

Such a new type could also provide a stable object identity that persists after the object starts deinitialising (see Hashing Weak Variables), or be part of an API that allows listening to changes in object state.
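For comparison, the three-state model can be roughly emulated in today's Swift by making the transition explicit. The sketch below is only an illustration under stated assumptions: `EmulatedState`, `LateBoundWeakRef`, `bind(_:)` and `describe(_:)` are hypothetical names, not part of the pitch or of the standard library, and the holder is a class so that it can be created and handed out before `super.init()` and bound afterwards.

```swift
/// Hypothetical user-level stand-in for the proposed three-state model.
enum EmulatedState<T: AnyObject> {
    case initializing
    case live(T)
    case deinitializing
}

/// Hypothetical helper: a weak reference that is created empty and bound later.
final class LateBoundWeakRef<T: AnyObject> {
    private weak var object: T?
    private var isBound = false

    /// Call once the object is fully initialised (for example after `super.init()`).
    func bind(_ object: T) {
        self.object = object
        isBound = true
    }

    var state: EmulatedState<T> {
        guard isBound else { return .initializing }
        if let strong = object { return .live(strong) }
        // Bound but nil: the object is deinitialising or already gone.
        return .deinitializing
    }
}

class Parent {}

final class Outer: Parent {
    let selfRef: LateBoundWeakRef<Outer>

    override init() {
        let ref = LateBoundWeakRef<Outer>()
        selfRef = ref      // could be passed to child objects at this point
        super.init()
        ref.bind(self)     // explicit transition: initializing -> live
    }
}

/// Helper used only to print or compare states by name.
func describe<T: AnyObject>(_ state: EmulatedState<T>) -> String {
    switch state {
    case .initializing: return "initializing"
    case .live: return "live"
    case .deinitializing: return "deinitializing"
    }
}
```

Unlike the pitched design, nothing here happens automatically: the owner decides when to call `bind(_:)`, which is essentially workaround #2 packaged as a type.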

See also Should weak be a type?

michelf · April 14, 2023, 5:52pm

My preferred way to handle this is to use an implicitly unwrapped optional:

class Outer: Parent {
    private(set) var inner: Inner!
    init() {
        super.init()
        inner = Inner { [weak self] in ... }
    }
}

or lazy when I don't need the object to be alive right away:

class Outer: Parent {
    private(set) lazy var inner = Inner(foo: foo) { [weak self] in ... }
}

I don't think a weak reference with an initialization phase is a good idea in general. I think it's better to subscribe to a callback once you're ready to actually receive it, not before, as this makes interactions between the callback and self clearer.

So the best solution here I think is the implicitly unwrapped optional inner above. Alternatively, making the callback mutable (#1) isn't bad either. In both cases you're making the moment when you're ready to handle a callback explicit in the initializer.

Perhaps a little better than an implicitly unwrapped optional: it should be possible to make a "set only once" property wrapper that would make it impossible to mutate inner after it was first set to a value.
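A minimal sketch of what such a "set only once" wrapper might look like, assuming a hypothetical `SetOnce` name and a trap on misuse (the post does not specify either). It relies on the wrapper having a no-argument `init()`, so the wrapped property counts as initialised before `super.init()` without needing an optional type. The `callbackCount` property is added purely to make the behaviour observable.

```swift
/// Hypothetical wrapper: the value may be assigned exactly once.
@propertyWrapper
struct SetOnce<Value> {
    private var storage: Value?

    init() {}

    var wrappedValue: Value {
        get {
            guard let value = storage else {
                fatalError("Property read before it was set")
            }
            return value
        }
        set {
            precondition(storage == nil, "Property may only be set once")
            storage = newValue
        }
    }
}

final class Inner {
    let callback: () -> Void
    init(callback: @escaping () -> Void) {
        self.callback = callback
    }
}

class Parent {}

final class Outer: Parent {
    @SetOnce private(set) var inner: Inner
    private(set) var callbackCount = 0   // illustration only

    override init() {
        super.init()
        // `self` is fully initialised here, so a weak capture is allowed.
        inner = Inner { [weak self] in self?.handleCallback() }
    }

    private func handleCallback() {
        callbackCount += 1
    }
}
```

Compared with `Inner!`, a second assignment (including one that would reset the value) traps in the setter instead of silently succeeding, at the cost of a runtime check rather than a compile-time guarantee.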

tera · April 14, 2023, 7:02pm

Technically it's not "a use", unless I call inner before calling super.init, or unless super.init() changes self (which happens extremely rarely), or unless the Inner initialiser itself calls the passed closure right away – it's just hard for the compiler to know all those details.

For completeness yet another workaround would be making inner a variable, assigning it to some default value first:

    var inner = Inner(foo: 0) {}

    init(foo: Int) {
        super.init()
        inner = Inner(foo: foo) { [weak self] in self != nil }
    }

It's marginally better than using unwrapped optional – with unwrapped optional I can mistakenly assign the variable back to nil and get a crash. Although it would still be possible to reassign inner to something wrong / or more than once – not ideal but ok-ish in practice.

John_McCall · April 14, 2023, 7:50pm

Edit: Sorry, my initial post here was unfair. Your post is a little oddly organized, and it took me a second to really wrap my head around what you're proposing, which is essentially that we allow weak references to incompletely-initialized objects to propagate arbitrarily, but that we should have a runtime mechanism by which those references evaluate as nil until the referent is completely initialized.

This is an interesting idea, but it'd be a pretty major change to weak references. It would become extremely important whether we do something like weakRef1 = weakRef2 as a copy of weak references or by temporarily transitioning through a strong reference. It would also impose pervasive costs on most classes — we don't currently have to do extra dynamic work at the point of complete initialization, but now we'd have to if it was possible that a weak reference had been formed to self. Unfortunately, only the first of those would be solved by using a new kind of reference.

kelin · April 14, 2023, 9:10pm

Looks too tricky to me. @michelf proposed a simple, working solution, so why mess with weak?

Nickolas_Pohilets · April 14, 2023, 11:10pm

Does Swift even handle this? I'm aware of this in ObjC, but in ObjC [super init] must be called before initialising any ivars, and in Swift it is the other way around. If super.init() changes self, will the already initialised properties be copied to the new instance? Or will Swift ignore the new instance completely and keep using the old one?

Do you have any examples where fine control over when self is ready to receive callbacks is needed? I see why option A may not suit the needs: even after initialising all the stored properties, a class may still not be logically fully initialised without extra actions. But with option B all initialisation actions are guaranteed to be complete. I don't think option B can be too early, but maybe you have examples where it is too late?

I'm thinking if this could be optimised.

Option A.

Let's say there is an attribute computed by the compiler that indicates if weak references to incomplete object are formed in the init. This attribute is emitted into SIL and swiftinterface. Such weak references can be created only in the body of the init before call to super.init or in the super.init. To compute this property, we don't need to have access to the body of the super.init, only to the value of the attribute for the super.init and body of init. Since self cannot be passed to any other function before super.init is called, no other functions need to be analysed.

If attribute indicates that there are no weak references to the incomplete object, then objects can be allocated (by the allocating init) already in the live state, and transition from initialising to live (atomic write) can be avoided.

Value of the attribute may vary between base and derived classes. With option A transition from initializing to live happens inside the init() of the root class which would need to perform some runtime check.

Since there cannot be concurrent writes at this moment, the object status could be read non-atomically. So it looks like we can optimise from "atomic write for every class" to "non-atomic read + conditional jump for every class, and atomic write for some classes". Not sure how much of an optimization this is.

Option B.

In option B the transition happens in the allocating init, which is generated for the most-derived class, and if we can know statically whether weak references to incomplete objects are used, we can truly pay only for what is used. But computing the attribute becomes much harder. Now the scope of analysis also includes all the code after super.init(), which may include self being passed to arbitrary functions and even escaping. And if self escapes, it could be accessed by other threads/tasks. So as long as self escapes, it is not possible to know statically whether any weak references are formed during the init. But even if self does not escape, the analysis looks challenging.

michelf · April 14, 2023, 11:57pm

I see it more as "knowing you're not dropping a call to the callback because it happened before the object was ready". For instance, you could subscribe for an observer that fires its initial state upon subscription. Perhaps this observer is asynchronous and timing-dependent and now you have a race condition. Or perhaps it's synchronously calling with its initial value upon subscription and you'll always miss the initial call for the initial value.
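The synchronous case can be reproduced in today's Swift with the captured-variable workaround, where the weak reference also reads as nil until initialisation finishes. This is only a sketch: `Source`, `Listener` and the `subscribeBeforeReady` flag are hypothetical names invented for the illustration.

```swift
/// Hypothetical observable that reports its current value upon subscription.
final class Source {
    private var handlers: [(Int) -> Void] = []
    private(set) var value: Int

    init(value: Int) {
        self.value = value
    }

    func subscribe(_ handler: @escaping (Int) -> Void) {
        handlers.append(handler)
        handler(value)   // initial state is delivered synchronously
    }

    func send(_ newValue: Int) {
        value = newValue
        handlers.forEach { $0(newValue) }
    }
}

class Parent {}

final class Listener: Parent {
    private(set) var received: [Int] = []

    init(source: Source, subscribeBeforeReady: Bool) {
        if subscribeBeforeReady {
            // Subscribing first: the weak reference is still nil,
            // so the initial value is silently dropped.
            weak var weakSelf: Listener?
            source.subscribe { weakSelf?.received.append($0) }
            super.init()
            weakSelf = self
        } else {
            // Subscribing once ready: the initial value is received.
            super.init()
            source.subscribe { [weak self] in self?.received.append($0) }
        }
    }
}
```

With a source whose value starts at 1 and is later changed to 2, the early subscriber records only the later update, while the one that subscribes after `super.init()` records both values.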

In contrast to debugging the issues mentioned above, catching a bug where inner is going to be reassigned after the initial assignment is a piece of cake. It's much easier to add runtime checks for an object's internal state than for inter-object interactions involving callbacks. The easiest check would be this:

var inner: Inner! {
  willSet { precondition(inner == nil, "Don't reassign a second value to inner!") }
}

This could also be accomplished with a property wrapper.

It might not matter for all situations, but it's just a good pattern not to tell the other object that you're ready to receive calls before you're actually ready if you care about separation of concerns. When the callback gets called is the other object's concern; yours is to handle the callback correctly.