Using a protocol to more easily implement copy-on-write

nikitamounier · April 3, 2021, 4:29pm

I was watching a great talk recently by @johannesweiss called "High-performance systems in Swift", where he talked about the importance of copy-on-write for large value types in high-performance systems (which is already implemented on common Swift types like String, Array, and Dictionary).

However, implementing copy-on-write manually, although pretty straightforward, comes with lots of boilerplate, with having to create a computed property for each of the properties in the underlying storage, and in each of computed properties' setter an !isKnownUniquelyReferenced(_:) check.

So, I set out on a mission to find a way to reduce the boilerplate as much as possible when implementing copy-on-write, while still having accessing and modifying the properties feeling natural, and came up with this:

@dynamicMemberLookup
protocol CopyOnWrite {
    associatedtype Storage: AnyObject & Copyable
    
    var _storage: Storage { get set }
}

extension CopyOnWrite {
    subscript<Value>(dynamicMember keypath: ReferenceWritableKeyPath<Storage, Value>) -> Value {
        get {
            return _storage[keyPath: keypath]
        }
        set {
            if !isKnownUniquelyReferenced(&_storage) {
                _storage = _storage.copy()
            }
            self._storage[keyPath: keypath] = newValue
        }
    }
}

With Copyable a simple protocol:

protocol Copyable {
    func copy() -> Self
}

As you see, instead of making a computed property for each of the properties, we can use keypath member lookup to access the underlying storage which is constrained to being a reference type – this means we only need to write the !isKnownUniquelyReferenced(_:) check once.

Now, implementing copy-on-write for a value type looks like this:

struct LargeType: CopyOnWrite {
    typealias Storage = _Storage
    var _storage: _Storage
    
    init(value: String) {
        self._storage = _Storage(value: value)
    }
    
    final class _Storage: Copyable { // requires the `copy()` method
        var value: String
        
        init(value: String) {
            self.value = value
        }
        
        func copy() -> _Storage {
            return .init(value: self.value)
        }
    }
}

To test this out:

var first = LargeType(value: "first")
print(first.value) // first

var second = first
print(second.value) // first
print(first._storage === second._storage) // true

second.value = "second"
print(first.value) // first
print(second.value) // second
print(first._storage === second._storage) // false

Is this a valid approach? Are all the rules of copy-on-write respected?

I saw a copy-on-write wrapper implementation using a property wrapper. Although it definitely has even less boilerplate involved, modifying the storage using the projectedValue feels awkward.

As Swift continues to move into the server space (especially with the introduction of actors and other concurrency features), performance will become increasingly important. As we wait for ownership semantics, it's important that users are able to use copy-on-write semantics for their large types succinctly.

Here's the full gist – I'd love to hear your thoughts!

idrougge · April 3, 2021, 11:35pm

Doesn’t @dynamicMemberLookup incur a cost you wouldn’t want in high-performance applications?

TellowKrinkle · April 4, 2021, 2:25am

dynamicMemberLookup isn't the thing you should be looking at, it's keypaths, as dynamicMemberLookup is just a shortcut for calling a special subscript

"Does accessing a variable through a compile-time-known keypath incur overhead?"

It appears the compiler already has optimizations in place for some simple cases

But for some reason it doesn't always work

You can confirm here that adding dynamicMemberLookup does not affect member access

filip-sakel · April 4, 2021, 12:57pm

You should not have any significant performance penalty from using @dynamicMemberLookup, if you annotate the key-path-accepting subscript either with @inline(__always) or @_transparent.

One limitation of your implementation, though, is that you can have only one copy-on-write object per container type. A property wrapper could solve that:

@propertyWrapper
public struct CopyOnWrite<Value: Copyable> {
    public var wrappedValue: Handle
    
    public init(object objectGetter: @autoclosure () -> Value) {
        self.wrappedValue = Handle(storage: objectGetter())
    }
    
    @_transparent
    public var projectedValue: Value {
        mutating get {
            wrappedValue.copyIfNotUnique()
            return wrappedValue.storage
        }
        set {
            wrappedValue.copyIfNotUnique()
            wrappedValue.storage = newValue
        }
    }
}

extension CopyOnWrite {
    @dynamicMemberLookup
    public struct Handle {
        @usableFromInline
        internal var storage: Value
        
        @_transparent
        @usableFromInline
        internal mutating func copyIfNotUnique() {
            if !isKnownUniquelyReferenced(&storage) {
                storage = storage.copy()
            }
        }
        
        public subscript<Member>(
            dynamicMember keyPath: ReferenceWritableKeyPath<
                Value,
                Member
            >
        ) -> Member {
            @_transparent
            get {
                storage[keyPath: keyPath]
            }
            @_transparent
            set {
                copyIfNotUnique()
                storage[keyPath: keyPath] = newValue
            }
        }
    }
}

This way, we can write:

final class ClassA: Copyable {
    var propertyA: Int 

    init(propertyA: Int) {
        self. propertyA = propertyA 
    }

    func copy() -> Self {
        ClassA(propertyA: propertyA)
    }
}

final class ClassB: Copyable {
    var propertyB: Int 
  
     ...
}

struct Container {
    @CopyOnWrite(object: ClassA())
    var classA

    @CopyOnWrite(object: ClassB())
    var classB

    func useAB() {
        classA.propertyA
        classB.propertyB 

        // Here we access the handle, not the object, which is safe.
        _ = classA 

        print($classA) ❌ 'useAB()' is not mutating
    }
}

nikitamounier · April 4, 2021, 6:16pm

Thanks for this! The website you linked is very interesting, I'll definitely dig into it some more.

I really like this implementation! I find it more intuitive than the other implementation using property wrappers – if you want access to the class' properties, you use the wrappedValue, and if you want to change the type itself or call one of its methods (so interfacing with the "meta" of the type), you use the projectedValue. And it's definitely more intuitive to make types obtain copy-on-write behaviour with this compared to my protocol-oriented method.

However, you say:

I disagree that the property wrapper enables you to have more than one copy-on-write object per container type – the property wrapper itself is the container. So, when putting the wrapped class inside a Container struct, you're actually putting a container in a container. This is epitomised by how you access the object's properties.

With the protocol-oriented method:

struct ClassAContainer: CopyOnWrite {
    // ....
    class ClassA: Copyable {
        var property: Int
        // ....
    }
}

var cow = ClassAContainer()
cow.property = 1

With the property-wrapper method:

class ClassA: Copyable {
    var property: Int
    // ....
}

struct ClassAContainer {
    @CopyOnWrite(object: ClassA(property: 0)) var classA
}

var cow = ClassAContainer()
cow.classA.property = 1

In addition, one could consider the CopyOnWrite.Handle to be the actual container – does that make it a container in a container in a container? Would such layering incur performance costs?

Arasthel · April 5, 2021, 6:16am

I believe the best way to do this would be either by code generation using Sourcery or, as they advised you above, keypaths. About a year ago I made some experiments on some syntax that would mimick how Kotlin data classes' copy method works using keypaths.

You can see all my attempts in this gist: https://gist.github.com/Arasthel/f46452623e44e111f9c343de5ceb0329

The syntax I liked the most is the one at the end of the file:

let original = MyStruct(a: "Foo", b: 0)
let copy = original.copy(\.a .= "Bar")
print(copy.a) // Bar

I'm sorry that it's not commented, but I didn't really plan to make this public at that point.

I you'd want a code generation based solution I can also provide a Sourcery template, although after using it for some time I can't actually recommend this approach.

EDIT: actually, after re-reading the whole thread I realised this is probably not what you wanted to achieve, sorry. I'm leaving it here in case someone finds it useful, though.

kiel · April 5, 2021, 8:13am

We know that the current copy on write implementation is susceptible to data races. Should any storage type, going forward, solve this problem with an actor? My intuition says it should not, because that kind of problem should be solved at a higher level of abstraction. But if it does, how does that affect the shape of the COW protocol?

ktoso · April 5, 2021, 8:16am

Actors should not be used to represent “data”. They are “live entities, that mutate data”.

As such, actors may use (including storing) CoW types, but it’d be pretty weird for an actor itself to utilize CoW itself.

kiel · April 5, 2021, 8:20am

Great response, thanks.

nikitamounier · April 5, 2021, 12:22pm

I'm guessing you mean by the fact that multiple threads could access the same storage without forcing a copy, since isKnownUniquelyReferenced(_:) could sometimes return true in such cases. If so, how are String and Array thread-safe, if they use much of the same mechanism as @filip-sakel's or mine's implementation?

Indeed this isn't exactly what I was looking for, but it's still some very interesting code! I especially like your use of infix operators – very creative.

filip-sakel · April 8, 2021, 8:55am

I meant that, with the wrapper method, multiple copy-on-write properties can be declared in a single struct; with the protocol approach, however, you will need to nest copy-on-write "container" types or resort to boilerplate.

No, nesting doesn't, usually, incur performance costs; key-paths, on the other hand, can sometimes not be well optimized, as @TellowKrinkle mentioned. That's the reason why it's important to include @_transparent, which is essential for libraries vending key-path-based APIs, in most cases.