How am I supposed to implement value semantics in a multi-threaded environment without synchronization?

I've been taught that "value semantics" means full isolation between different instances. For example,

let a = "abc"
var b = a // copied and isolated.
b.append("def") // changes to b never affect a

This is also true in a multi-threaded environment as long as I make another instance.

let a = "abc"
DispatchQueue.global().async { var b = a; b.append("def") } // copied and isolated. changes to b never affect a.

This is well explained in this article, which clearly says you can pass value-semantic types freely among threads without synchronization.

Importantly, you can safely pass copies of values across threads without synchronization. In the spirit of improving safety, this model will help you write more predictable code in Swift.

AFAIK, the recommended way to build such value-semantic types is the CoW pattern, which depends on the isKnownUniquelyReferenced function to provide the copy-and-isolate property.

But the manual says the function can return a false positive under multi-threaded access. IMO, that means an accidental simultaneous write to shared storage can happen, and the "copy and isolate" assumption can be broken. Furthermore, the manual says synchronization is required to make the function work properly. So types built with the CoW pattern and this function are not value types anymore, because two instances are not fully isolated without synchronization.

What am I supposed to do to build fully value-semantic types like String or Array? Is the CoW pattern really enough? How is it possible to build synchronization-free types on top of isKnownUniquelyReferenced, a function that requires synchronization? Is there some hidden magic that I don't know about? Please let me know how I am supposed to make CoW value types.

You could wrap all the isKnownUniquelyReferenced calls into synchronized blocks, and use that to build your own value types, but I wouldn't recommend it.

Better yet, just avoid interactions between your threads whenever possible. Start threads off with their own unshared state, and have them produce a result at the completion of their computation. If they need to share any other state (try to avoid this, but it's sometimes necessary), put the state in a shared, synchronized object (instance of a class).

You should certainly avoid threads trying to access global state stored in CoW types, because as you said, isKnownUniquelyReferenced isn't thread-safe on its own.

The check performed by isKnownUniquelyReferenced is meaningful as long as the reference pointing to the storage is accessible only from one thread. If the storage itself is available to another thread via another reference, that's fine, as isKnownUniquelyReferenced will simply return false in that case. Only if the reference itself is shared between threads do you need to synchronize access to the reference.
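
To make that concrete, here is a minimal single-threaded sketch of what the check reports with one reference versus two. Box and uniquenessDemo are hypothetical names used only for illustration; they are not part of any library.

```swift
// Hypothetical storage class, standing in for a CoW type's backing object.
final class Box { var value = 0 }

func uniquenessDemo() -> (alone: Bool, shared: Bool) {
    var first = Box()
    // Only `first` refers to the object at this point.
    let alone = isKnownUniquelyReferenced(&first)

    // A second strong reference to the same object.
    let second = first
    let shared = isKnownUniquelyReferenced(&first)

    // Keep `second` alive until after the check above.
    withExtendedLifetime(second) {}
    return (alone, shared)
}

let result = uniquenessDemo()
print(result.alone, result.shared)
```

With a single reference the check should report true, and with a second reference it should report false, which is what triggers the copy in a CoW setter.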

This sounds like all CoW types lose their value-semantic attributes under multi-threaded access, unlike what the Apple post says.

This also voids what the Apple post promised about the thread safety of value types. And since String and Array involve CoW, they wouldn't be value types either.

They are thread safe as long as you pass a copy of the variable to the other thread (and therefore a copy of the reference to the storage). Because there are now two references to the storage, any attempt to mutate will create a copy before the mutation happens. After that copy, each thread will have a reference to a different storage.
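
As a rough sketch of that copy-before-mutation step (single-threaded, to keep it short): Counter, CounterStorage, and storageID are hypothetical names invented for this illustration, and storageID exists only so the example can show which storage object each value points at.

```swift
// Hypothetical backing storage for the sketch.
final class CounterStorage {
    var count: Int
    init(count: Int) { self.count = count }
}

// Hypothetical CoW value type.
struct Counter {
    private var storage = CounterStorage(count: 0)

    // Exposed only so the sketch can compare storage identity.
    var storageID: ObjectIdentifier { ObjectIdentifier(storage) }

    var count: Int {
        get { storage.count }
        set {
            // Two references to the storage: copy it before writing.
            if !isKnownUniquelyReferenced(&storage) {
                storage = CounterStorage(count: storage.count)
            }
            storage.count = newValue
        }
    }
}

let original = Counter()
var copy = original
let sharedBefore = (copy.storageID == original.storageID)
copy.count = 5
let sharedAfter = (copy.storageID == original.storageID)
print(sharedBefore, sharedAfter, original.count, copy.count)
```

Before the write both values point at the same storage; after the write the copy has its own storage and the original still reads 0.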

Maybe you should post an example of code of something you worry may be unsafe.

I mean, if the uniqueness test can return a false positive under concurrent access, there's no reliable way to tell a single reference from multiple references.

For example,

// I want a value-semantic type.
// that can be passed to any thread freely
// without synchronization.
struct Foo {
    private var storage = FooStorage()
    var bar: Int { 
        get { return storage.bar }
        set(v) {
            // my worry: this check can report "unique"
            // even if storage is accessed from multiple threads.
            if !isKnownUniquelyReferenced(&storage) {
                storage = FooStorage()
            }
            storage.bar = v
        }
    }
}
class FooStorage { 
    var bar = 111
}


let a = Foo()
for _ in 0..<1000 {
    DispatchQueue.global().async {
        var b = a // each closure mutates its own copy
        // unsynchronized write.
        // if the uniqueness test returns a false positive...?
        b.bar = 222
    }
}
DispatchQueue.global().async {   
    // unsynchronized read. result undefined.
    print(a.bar) 
}    

Because

If the instance passed as object is being accessed by multiple threads simultaneously, this function may still return true.

I think your example is all fine and the printed result at the end is always going to be 111, because every write goes to a copy and never to a. Or have you been able to reproduce something else?

If the instance passed as object is being accessed by multiple threads simultaneously, this function may still return true.

I read this as "isKnownUniquelyReferenced does not check if the object is being accessed from multiple threads, only that the reference is unique". But the wording is confusing, I'll grant you that.

I think it's trying to say the reference being unique does not automatically imply that only one thread has access. You can even make an example of that with nested CoW arrays:

var a = [[1, 2], [3, 4]]
var b = a // `var` because `&` needs a mutable variable
// (Assuming we somehow have access to the storage.)
// The storage for `a` and `b` has two references:
assert(isKnownUniquelyReferenced(&a._storage) == false)
assert(isKnownUniquelyReferenced(&b._storage) == false)
// But the storage for the inner arrays only has one reference
// (because `a._storage` hasn't been duplicated yet)
assert(isKnownUniquelyReferenced(&a[0]._storage) == true)
assert(isKnownUniquelyReferenced(&b[0]._storage) == true)
assert(isKnownUniquelyReferenced(&a[1]._storage) == true)
assert(isKnownUniquelyReferenced(&b[1]._storage) == true)

Note how b[0] and a[0] are referencing the same inner storage through the same reference. If b was given to another thread, we'd have two threads accessing the same inner storage simultaneously, even though that inner storage is uniquely referenced.

We don't really need synchronization here because the reference itself is stored in CoW storage and nobody is going to mutate the inner storage without copying the outer one first. For instance:

b[0][0] = 99
// does the following
// - makes a copy of `b._storage`
//   (`b[0]._storage` is no longer uniquely referenced)
// - makes a copy of `b[0]`'s storage
// - writes to `b[0][0]`
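
The observable effect of those steps can be checked with plain nested arrays, without touching any storage internals. This is only a small sketch of the visible behavior; it does not show the copies themselves.

```swift
let a = [[1, 2], [3, 4]]
var b = a

// Writing through `b` copies whatever storage is still shared with `a`.
b[0][0] = 99

print(a) // `a` keeps its original contents
print(b) // only `b` sees the write
```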

The way to think about this, which the comment doesn’t make clear perhaps, is that the isKnownUniquelyReferenced check provides the same level of thread safety as you’d expect from a primitive value type like Int or Float. You can be sure that copies of the same value can be safely accessed from multiple threads, and mutated independently. However, if there is a race condition accessing the same variable, then you’ll get undefined behavior:

let a = someValue()
var b = a
var c = a
// This is safe
async { b.mutate() }
async { c.mutate() }

// This is not
async { b.mutate() }
async { b.mutate() }

This is true whether someValue is an Int or a copy on write value type implemented using isKnownUniquelyReferenced. Thread Sanitizer can detect races of this kind at runtime.
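
A minimal runnable sketch of the safe case, assuming Dispatch is available: mutateCopiesConcurrently is a hypothetical helper name, and the count of 100 tasks is arbitrary. Each task copies the shared array into its own variable before mutating it, so no variable is written from two threads.

```swift
import Dispatch

// Hypothetical helper: mutates per-task copies of one array on many threads
// and returns the original so the caller can confirm it was not affected.
func mutateCopiesConcurrently() -> [Int] {
    let shared = [1, 2, 3]
    let group = DispatchGroup()

    for i in 0..<100 {
        DispatchQueue.global().async(group: group) {
            var local = shared   // own variable, storage still shared
            local.append(i)      // storage is not unique here, so it is copied first
            precondition(local == [1, 2, 3, i])
        }
    }

    group.wait()                 // wait for every task before reading the result
    return shared
}

print(mutateCopiesConcurrently())
```

The unsafe case would be several tasks appending to one captured var; that is the kind of race Thread Sanitizer is meant to flag.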

OK, I get it. It would be better if the manual were updated to match your reply. Thanks.

@nnnnnnnn might be able to help with that!
