How to implement copy-on-write with thread safety?

Hello guys,

I'm implementing a COW value type, and I'm confused whether I should use lock facilities to achieve thread safety.

For instance, here is my COW tyoe.

class Storage {
  var value: Int
  // and many other stuff
  
  init(_ value: Int) { self.value = value }
  func copy() -> Storage { return Storage(value) }
}

struct Value {
  private var storage: Storage
  
  init(_ value: Int) {
    storage = Storage(value)
  }
  
  func get() -> Int { return storage.Int }

  mutable func update(_ value: Int) {
    if !isKnownUniquelyReferenced(&storage) {   // (a)
      storage = storage.copy()                  // (b)
    }
    storage.value = value                       // (c)
  }
}

After some digging in this forum, I understand isKnownUniquelyReferenced is thread safe as long as the caller does not create race condition on a same Value variable.

But are the codes above enough to guarantee thread safety? Are there any mistakes if I use Value in the following way?

var v1 = Value(3)  
// v1 will always be mutated on queue1
let queue1 = DispatchQueue("queue1")
let queue2 = DispatchQueue("queue2")

queue1.main.async {
  v1.update(5)
}

queue2.main.async { [var v2 = v1] in
  v2.update(3)
}

It seems perfectly fine for me, because semantically Value is a value type, and when queue2 makes a copy with v2 = v1, everything should be OK just like Array.

However, after closer analyses, I'm not so sure about that. Here's what I think:
There're 2 threads, let's call them T1, T2. During runtime, the order of executions of lines (a), (b), (c) in the 2 threads could be:
T1: (a) ---> (c)
T2: (a) ---> (b)

But when combined, I believe it is totally possible that the order of executions are as follows, because the copy v2 = v1 happens after v1.update(5) is submitted on queue1.
T1 (a)
v2 is created, reference on storage +1
T2 (a)
T1 (c) (sorry I mistyped this line before)
T2 (b)

Now there's a race condition on the shared object Storage, the object is mutated and fetched (for copy) at the same time: the object is used for copy during mutating - god knows what the values will be in the copy.

So which part of my codes is flawed?
Do I need to wrap my COW implementation in locks?
If I do, witch part of my codes should I change, the methods of Storage or Value?
Or is the caller to blame for the race condition happening here?

If you enter update(), check for uniqueness of reference, and then before update() exits v2 is created, that means the reference was not unique in the first place.

I mistyped the line "T1 (c)" before.

What I means is, for the first time update is entered, there's only 1 variable v1 , so isKnownUniquelyReferenced will return true , and line c will be executed. But just before that, v2 is initialized and update is entered on another thread. This time, because v1 and v2 are both alive, isKnownUniquelyReferenced will return false and line b will be executed.

This is possible, right?

No. The capture list is taken at the time of the closure creation. The moment we hit the call to queue2.main.async we create the second reference to Value.

In the lifecycle of this program there are 2 strong references to the initial Value. The first is v1, held in the main thread until the point of its last use (the call to queue2.async) and in queue2 until that queue exits. There is no point in the program where only one strong reference exists unless either of the two blocks enqueued on the queues has exited.

Your implementation of COW is correct, but usage of the Value is not.

There are 3 threads here. There is also a thread where everything is setup - queues and closures are created.

And copying of v1 into v2 is happening on that thread, while v1 might be being mutated on queue1. That’s the data race that is causing problems.

:+1: Yup, I missed this data race in my answer. That's absolutely not safe.

This is correct. But there's no telling this must happens before v1.update(5) is called in queue1.

I believe you are right, this clears a lot of things.

Thank you.