Enabling safe, non-optional, circular referencing with shared reference counts

chrisbia · July 5, 2020, 10:46pm

I don't often come across a data modeling problem where circular referencing appears to be a good solution. In general I find that arguments that attempt to justify circular references appear contrived (Apartment <-> Person), and therefore lack any true real-world applicability. That said, circular referencing isn't always an attempt at a data modeling solution; more often than not, I find it's an attempt at an encapsulation solution.

Often when defining complex models, code files can become large, and in an effort to make the repository more digestible, functionality and properties are encapsulated into smaller, bite-sized classes. The problem arises when some of those bite-sized classes require access to data within the original scope. So you attempt to provide access to it:

class Human {
    private var brain: Brain!   // Very complex component. So naturally it's moved into its own file.
    private let heart: Heart
    private let lungs: Lungs
    // ... tons of code ...
    init() {
        self.heart=Heart()
        self.lungs=Lungs()
        self.brain=Brain(human: self)
    }
}

New file:

class Brain {
    private let human: Human
    // ... tons of code ...
    init(human: Human) { self.human=human }
}

Obviously this is a problem because it causes a retain cycle. You can try to make Brain.human unowned, but that code becomes fragile. You can make it weak, but now any members that return something in Brain that use human will have to return Optionals instead, polluting the API unnecessarily with uncertainty.
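To make the trade-off concrete, here is a minimal sketch of the strong and weak variants. The type names (StrongHuman, WeakBrain, and so on), the name property, and ownerName() are made up for illustration and are not part of the pitch. A weak probe variable is used to observe whether an object was deallocated.

```swift
final class StrongBrain {
    let human: StrongHuman                 // strong back-reference completes the cycle
    init(human: StrongHuman) { self.human = human }
}

final class StrongHuman {
    var brain: StrongBrain!
    init() { brain = StrongBrain(human: self) }
}

final class WeakBrain {
    weak var human: WeakHuman?             // weak back-reference: no cycle
    init(human: WeakHuman) { self.human = human }
    // Anything derived from the human now has to be Optional.
    func ownerName() -> String? { human?.name }
}

final class WeakHuman {
    let name = "Ada"                       // hypothetical property, for illustration
    var brain: WeakBrain!
    init() { brain = WeakBrain(human: self) }
}

// Strong cycle: dropping the last outside reference frees nothing.
var strongHuman: StrongHuman? = StrongHuman()
weak var leakProbe = strongHuman           // observes the object without retaining it
strongHuman = nil
let cycleLeaked = (leakProbe != nil)       // true: the pair keeps itself alive

// Weak back-reference: the human is freed, and the brain's API has to show it.
var weakHuman: WeakHuman? = WeakHuman()
let escapedBrain: WeakBrain = weakHuman!.brain
let nameWhileAlive = escapedBrain.ownerName()   // Optional("Ada")
weakHuman = nil
let nameAfterwards = escapedBrain.ownerName()   // nil
```

The unowned variant is left out on purpose: calling into an escaped brain after its human is gone would trap at runtime, which is the fragility mentioned above.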

What I'm about to pitch here is a rough sketch of a mechanism for solving this problem, not necessarily the syntax that the solution would eventually take.

So, instead of making just Brain.human unowned we can also make Human.brain unowned and also add something else to the procedure during Human.brain's initialization:

class Human {
    private unowned var brain: Brain!
    // ... tons of code ...
    init() { 
        // ... init code ...
        self.brain={
            let brain=Brain(human: self)
            pool_reference_counts(self, brain)   // Secret sauce.
            return brain
        }()
    }
}

What pool_reference_counts(_:_:) would do is sum the reference counts of the given parameters into a single count and make the given objects share that count. Every time a new strong reference to either the brain or the human is created, that count would increment by 1, effectively incrementing both their reference counts. By the same token, when a strong reference to either of them is destroyed, that count would decrement by 1. And the only time Human.brain or the owning Human would be deallocated is when there are no more strong references to either of them.

This way we can guarantee that Brain.human will always exist, and rest assured that we won't run into any issues when accessing it in the body of Brain — the same going for Human.brain in the body of Human as well.

Disregarding the specific syntax I used to express this shared reference counting concept, I think this mechanism is a means of eliminating the headaches induced by trying to shoehorn circular referencing into a language without a tracing garbage collector, and will make us less likely to have to choose between encapsulation, safety, and model integrity.

Lantua · July 6, 2020, 7:28pm

Interesting idea. Though it may be a little tricky, since you need to keep track that Brain is tied to a particular Human, even if it escapes this unowned variable.

I’m not sure where the ref count is stored, but chances are, it’s with some other unsharable metadata.

wiresoft · July 6, 2020, 9:42pm

Wouldn’t it be sufficient for retain(myBrain) to actually be forwarded to retain(myBrain.human), and similarly for release()?

I guess this might break if you are allowed to assign a brain to a different human. Maybe this could be restricted to let references?

wiresoft · July 6, 2020, 9:55pm

Actually now that I think about it just forwarding ARC traffic from Brain to Human won’t work because in the general case there’s no guarantee that multiple Humans won’t reference the same brain, and then Brain doesn’t know who to forward to. We would need some kind of uniqueness guarantee, almost like a move-only reference.

chrisbia · July 7, 2020, 4:22am

Presumably the reference count is stored at an address in memory. If that's the case ARC traffic would not need to be forwarded. At the point of sharing of reference counts, the counts of the two objects would be summed to a new integer. That integer would then have its own address in memory. Then the pointer for the reference counts of the two existing objects would be mutated to point to this new sum. Then each object's reference count would be incremented/decremented as usual, the only difference being that the memory address for the reference count of each object is now the same.
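The repointing described here can be modeled in ordinary Swift. This is a toy model only: CountBox, ModelObject, and poolReferenceCounts are hypothetical names, the counts are plain integers managed by hand, and nothing here touches the real runtime, which exposes no such hook.

```swift
// A heap-allocated integer standing in for "the address of the reference count".
final class CountBox {
    var value: Int
    init(_ value: Int) { self.value = value }
}

// A stand-in for an object whose count lives behind a pointer.
final class ModelObject {
    let name: String
    var count: CountBox
    init(name: String, initialCount: Int) {
        self.name = name
        self.count = CountBox(initialCount)
    }
    func retain()  { count.value += 1 }
    func release() { count.value -= 1 }
    var isAlive: Bool { count.value > 0 }
}

// Sum both counts into a new box and make both objects point at it.
func poolReferenceCounts(_ a: ModelObject, _ b: ModelObject) {
    let shared = CountBox(a.count.value + b.count.value)
    a.count = shared
    b.count = shared
}

let human = ModelObject(name: "human", initialCount: 1)  // one outside strong reference
let brain = ModelObject(name: "brain", initialCount: 0)  // so far only reachable via human
poolReferenceCounts(human, brain)

brain.retain()                              // something else takes a reference to the brain
let countAfterRetain = human.count.value    // 2: the human's count moved too
human.release()                             // the outside reference to the human goes away
let humanStillAlive = human.isAlive         // true: the brain reference keeps both alive
brain.release()
let bothDead = !human.isAlive && !brain.isAlive   // true
```

The model assumes every object reaches its count through a pointer, which is the assumption the rest of the thread goes on to question.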

jrose · July 7, 2020, 4:49am

Reference counts in Swift are (currently) stored inline in an object, except in some circumstances where one machine word of information isn't enough. Those cases are generally kept to a minimum to keep reference counting fast, and I think at least one of the out-of-line cases stores additional per-object information that you wouldn't want to pool (though I don't remember at the moment).

This pitch doesn't break the semantics of Swift, but it would detract from the performance of reference counting in at least some circumstances, so the tricky part would be limiting those circumstances to just those where you're doing this pool thing. I have to admit I don't think the additional implementation complexity (and teaching complexity) is worth it for this feature.

chrisbia · July 7, 2020, 8:06pm

What if instead of sharing reference counts, the objects were in fact the same instance? This way reference counting speed wouldn't be compromised because it would simply operate as it is currently. The difference would be that Brain would be a different 'perspective' into the same instance.

You could define it like so:

class Human {
    private unowned var brain: Brain!
    private let heart: Heart
    private let lungs: Lungs
    // ... tons of code ...
    init() { 
        self.heart=Heart()
        self.lungs=Lungs()
        self.brain=self.Brain(human: self)  // A specific `Human`'s `Brain` initializer could only be called once.
    }
}

New file:

class Brain (Human) {
    private unowned let human: Human
    // ... tons of code ...
    init(human: Human) { self.human=human }
}

In a way this 'sideclass' is like a subclass except instead of defining a hierarchical relationship, it creates a lateral one. Not an 'is a' relationship, but an intrinsic 'has a' relationship. This would mean casting a Human to a Brain would be incongruous, so the only way to access it would be if a reference to it was stored at the point of its creation, and then made accessible through Human's API.

This is important because relationships don't only exist between distinct entities. Often, you have relationships that are intrinsically compositional; where the existence of one thing is dependent on the existence of another (Human -> Brain). Then you have others that are not, for example (Person -> Car). In the latter case, if a person loses their car it is conceivable that the person may still exist. This is ok because Swift's type system allows us to reflect the independent nature of that relationship. But if it stands to reason that it's inconceivable for a Human instance to exist without a Brain instance we should have a means of reflecting that relationship in our type definitions without compromising on safety or truthfulness.

An instance of a subclass includes all the data of its superclasses. You can cast a subclass to one of its superclasses easily because all of the necessary data is there, inline with the subclass's data. What I'm pitching here is that another type of relationship is created similar to the inheritance one, but it is compositional in nature. The result of using it would mean that all the data for an instance of the composed type is inline with the data of its composing instance. The characteristic that would underpin the dependent nature of this relationship is that the composing instance's reference count would be used by all the composed instances as their own reference counts; basically an instance in an instance.

As an aside, you may ask why define a Human.brain property when you've already defined an explicit 'has a' relationship between a Human and Brain. It's because I like being able to look at the top of a class's definition, see the stored properties, and know what objects I can interact with in that scope. Without a stored property for Human.brain, say if instead it was just left implied by Brain's very definition, it would be more ambiguous as to what properties I can interact with from within Human. That said, I imagine there are alternatives for defining an API that is non-ambiguous and not redundant, but I figure those can be fleshed out if this is worth pursuing; hoping that that aspect of it is not the thing that makes it a worthwhile pursuit.

beccadax · July 8, 2020, 12:36am

The one that comes to mind immediately: a weak ref actually points to the side allocation; reading a weak ref really reads a pointer back to the original object from that side allocation. That pointer gets nilled out when the object is deallocated, which is how weak refs magically become nil without anything explicitly setting them.

This depends on objects and side allocations having a 1:1 (or 1:0) relationship—if two objects shared the same side allocation, weak refs to one would actually go to the other.

wowbagger · July 8, 2020, 3:40am

What's side allocation?

glessard · July 8, 2020, 10:36am

The side allocation is (in this case) an additional memory allocation done by the runtime. When you create the first weak reference to an object, a small amount of memory is allocated to keep track of the weak references. It remains allocated until the last weak reference disappears, which may happen after the main object has been deinitialized.
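The side allocation itself is a runtime detail that code cannot inspect, but the behavior it enables is easy to observe. A minimal sketch, with an Organ class invented for the example:

```swift
final class Organ {
    let name: String
    init(name: String) { self.name = name }
}

var strong: Organ? = Organ(name: "heart")
weak var observer = strong                 // first weak reference to the object

let seenWhileAlive = observer?.name        // Optional("heart")
strong = nil                               // last strong reference goes away
let seenAfterRelease = observer?.name      // nil: the weak reference reads as nil
```

Nothing in this code assigns nil to observer; the weak reference simply stops resolving once the object is gone.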

chrisbia · July 8, 2020, 3:05pm

In this case though, that's the point. If it's dissatisfactory for two separate objects to share a single reference count then an alternative is to have one object, one reference count, but be able to interact with that one object in a way that can (from the perspective of a Swift developer) effectively make it behave as multiple objects.

For example, let's say you have:

class Animal {...}
class Human {...}

If you instantiate both of these classes they will exist separately in memory because they are two different classes.

Now if you do this:

class Human: Animal {...}

Presumably Humans no longer exist separately from an Animal in memory. You can cast a Human instance to an Animal and lose access to the Human API. But then you can cast it right back, because nothing about the object in memory fundamentally changed, just how you were able to interact with it.
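A short sketch of that round trip, with breathe() and speak() added purely as placeholder members:

```swift
class Animal {
    func breathe() -> String { "breathing" }
}

class Human: Animal {
    func speak() -> String { "hello" }
}

let human = Human()
let animal: Animal = human                    // upcast: same object, narrower API
let sameObject = (animal === human)           // true: one instance in memory
let roundTrip = (animal as? Human)?.speak()   // Optional("hello"): the Human API is back
```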

What I'm pitching is along the same lines. When I say "basically an instance in an instance", more specifically this is a set of properties and methods that exist on a 'host' object from the point of its creation. However, this other API, or 'guest' object, would not be able to be interacted with from the host type. For example:

class Human {
    unowned let brain: Brain
    ...
}
class Brain (Human) {
    func slowHeartRate() {}
}
class Ninja: Human {
    func feignDeath() {
        self.slowHeartRate()   // Error: `Ninja` does not have a member named 'slowHeartRate()'
        self.brain.slowHeartRate()   // Ok.
    }
}

Despite slowHeartRate() existing inline with Human's API, and thereby Ninja's, it would not be accessible because it would only be accessible on an actual Brain type. From the perspective of the Swift developer, a Brain and a Human would be objects with different APIs; no hierarchical relationship between them, no ability to cast from one to the other, but they would both be sourced from a single object in memory, passed around as a single object. This would enable developers to define models with relationships that are actually compositional in nature, not just one distinct object holding a reference to another completely distinct object.
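For comparison, part of this behavior can be approximated in current Swift by making Brain a lightweight view that strongly references its host while all state lives in the host. This is only an approximation under stated assumptions, not the pitched feature: the heartRate property and the host name are invented here, the view is a separate value rather than inline storage, and a fresh view is created on each access.

```swift
class Human {
    // State that conceptually belongs to the brain lives in the one real object.
    fileprivate var heartRate = 60
    // Hands out a view on demand; nothing is stored, so there is no cycle.
    var brain: Brain { Brain(host: self) }
}

struct Brain {
    // Strong reference: the host cannot be deallocated while a view exists.
    fileprivate let host: Human
    func slowHeartRate() { host.heartRate = 40 }
    var heartRate: Int { host.heartRate }
}

class Ninja: Human {
    func feignDeath() {
        // self.slowHeartRate() would not compile: it is only part of Brain's API.
        self.brain.slowHeartRate()
    }
}

var human: Human? = Ninja()
weak var probe = human
let brain = human!.brain               // the view escapes, e.g. into a subcontroller
human = nil                            // the last direct reference to the human goes away
brain.slowHeartRate()                  // still safe: the view keeps the host alive
let hostKeptAlive = (probe != nil)     // true
let rate = brain.heartRate             // 40
```

The design choice is that the host owns all the data and the view owns only a reference, so a Brain can never outlive its Human and no member has to return an Optional.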

Avi · July 8, 2020, 5:32pm

I think this is the money quote. This cuts away all the discussion about syntax and semantics and neatly describes what is wanted. Combined with the original post about dealing with reference cycles, we have an explanation for why it's wanted.

brandon · July 8, 2020, 8:09pm

Not sure if I fully understand, but you should be able to do this currently in Swift with protocols and generics with full type-safety.

protocol AnyBrain {
    func think()
}

class HumanBrain: AnyBrain {
    func think() { }
}

class NinjaBrain: HumanBrain {
    func slowHeartRate() { }
}


protocol HasBrain {
    associatedtype BrainType: AnyBrain
    var brain: BrainType { get }
}

protocol Human: HasBrain {
    // other requirements
}

extension Human {
    // normal human stuff
    
    func learn() {
        brain.think()
    }
    
    func eat() { }
}


class NormalHuman: Human {
    let brain = HumanBrain()
}

class Ninja: Human {
    let brain = NinjaBrain()
    
    func feignDeath() {
        brain.slowHeartRate() // only available in NinjaBrain, not HumanBrain
    }
}

class Frankenstein<Brain>: HasBrain where Brain: AnyBrain {
    let brain: Brain
    init(brain: Brain) {
        self.brain = brain
    }
}

chrisbia · July 8, 2020, 8:43pm

Missed the mark Brandon. This topic is about composition. The problem being addressed here is that you cannot currently link the lifespan of one object with another. A Human requires a Brain and vice-versa. If there are methods of a Brain that require its Human to produce the result, what safeties are there in place preventing its Human from being deallocated while Brain is still in existence? Can't strongly reference the Human from Brain because that would create a strong reference cycle causing a memory leak, using unowned is inherently unsafe, and with weak, members that utilize the Human must either now return optionals or throw an error when encountering a nil Human.

Those are the only three options for Brain to keep a reference to Human, and they are unsatisfactory to safely model many real world relationships.

John_McCall · July 8, 2020, 9:01pm

Is there actually a reason here not to use unowned references? It seems like you have some non-specific concerns about code becoming "fragile". However, this kind of circularly-referential system requires a multi-stage setup anyway, and it often requires multi-stage teardown. It's very rarely a problem in practice to just set things up so that no code will be using a Brain after the Human has been destroyed. You are almost assuredly relying on that kind of intelligent staging in your program already. Building complex new conceptual models where certain objects are actually part of other objects except that they can be created independently just seems like a mess.
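A minimal sketch of the unowned arrangement being described, assuming the Human is the sole owner of its Brain and no Brain is used after its Human is gone. The name property and ownerName() are invented for the example.

```swift
final class Brain {
    unowned let human: Human                    // no retain, no Optional
    init(human: Human) { self.human = human }
    func ownerName() -> String { human.name }   // non-Optional API
}

final class Human {
    let name = "Ada"
    private(set) var brain: Brain!              // second stage of setup fills this in
    init() { brain = Brain(human: self) }
}

var human: Human? = Human()
weak var humanProbe = human
weak var brainProbe = human?.brain

let name = human!.brain.ownerName()             // "Ada"
human = nil                                     // the Human is deallocated, then its Brain
let bothFreed = (humanProbe == nil && brainProbe == nil)   // true: no leak
```

If a Brain were retained elsewhere and ownerName() were called after the Human was deallocated, the unowned access would trap at runtime; that case is the one the rest of the thread argues about.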

brandon · July 8, 2020, 9:49pm

What I wanted to demonstrate in my example was that you can already compose a concrete "Human" type based off of protocol conformances and generics.

When you mentioned you wanted "a set of properties and methods that exist on a 'host' object from the point of its creation" with an "other API, or 'guest' object, would not be able to be interacted with from the host type", it seemed like you were referring to protocol composition.

In terms of memory safety, it falls within the current implementation of Swift's memory model (no changes needed to the runtime). And it also allows you to compartmentalize your code into smaller portions.

chrisbia · July 9, 2020, 12:34am

What's your point here?

I build the model layer. If another programmer builds the controller layer I have no control over how they decide to use a particular Human instance. There will likely be a human controller to display a Human. Then that controller will have a subcontroller whose sole responsibility is to display a Brain. Now that controller has a reference to a Brain and has no knowledge of the existence of a Human. That is encapsulation.

What you are suggesting here is that every time an app is built using MVC that code has to be "set up" so that subcontrollers don't do anything with their objects once their supercontroller's data is deallocated; which also happens to take place in an entirely separate layer of the program.

This is anything but "very rare" and any time you have to write code to work around inadequacies in a programming language it's a problem. Not to mention this is a conceptual hurdle for beginners to grasp. Doesn't it just make sense that if a Human is destroyed then its Brain must have been too? It's not intuitive for composing objects to outlive the very objects they're composing.

Yes, there is an underlying assumption that you do not work with Brains if they don't belong to a specific Human. The problem is that the things that work with Brains have no concept of a Human. They are controllers that work only with matters that pertain to a specific Brain: display and mutation. The onus is on the programmer to ensure that nothing happens to a Brain after its Human is deallocated. And programmers are not perfect; we are very capable of making mistakes. That is why we depend on the language we're working with to provide features that can limit the types of mistakes we can make.

What we've been talking about thus far is a simplified version of the reality. Consider a model layer where there's an object for each organ system, and then again for each organ within each system. Then there's a specific controller for each of those models, and at every level there is no concept of anything existing in controllers at levels above. Each supercontroller is a black box to its subcontrollers and each subcontroller is a black box to its supercontroller.

There are listeners set in each of these controllers on their respective models, these listeners may request data from that model, and that data may depend on the existence of a Human, the particular system object it belongs to, or another organ from another system. You may suggest to remove these listeners any time one of the dependencies is deallocated, but that's the thing, these controllers are encapsulated. They have no concept of model layer dependencies. Those dependencies are the domain of the model layer — implementation details therein.

It just makes sense for all this behavior to be enforced in the model layer and unowned does not provide that level of enforcement.

Composition is what we are talking about here, and it's not new. Certain objects being part of other objects is actually the very definition of composition.

As to them being able to be created independently, I'm not quite sure where you got that from. I actually suggested that they would be created dependently: the host object would need to be initialized before the guest objects were.