Copy-on-write implementation choice

Hi everyone,

I’m currently trying to understand Copy-on-Write (COW) behavior, and I’ve come across two different implementations. I was wondering why some people might prefer the first approach over the second.

Here’s the first implementation:

struct MyStruct_1 {
    private final class _Storage {
        var array: [Int64] = []

        func copy() -> Self {
            let copy = Self()
            copy.array = self.array
            return copy
        }

        func append(_ element: Int64) {
            array.append(element)
        }
    }

    private var _storage = _Storage()

    mutating func append(_ element: Int64) {
        if !isKnownUniquelyReferenced(&self._storage) {
            self._storage = self._storage.copy()
        }
        self._storage.append(element)
    }
}

And here’s the second one:

struct MyStruct_2 {
    var array: [Int64]
}

Since Swift’s Array already implements Copy-on-Write semantics, the array property in MyStruct_2 should avoid duplicating its content when MyStruct_2 is passed around—until it’s modified.
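
For example (a minimal sketch of what I mean, with comments describing my understanding of Array's behavior):

let first = MyStruct_2(array: [1, 2, 3])
var second = first      // only a reference to the shared Int64 buffer is copied here
second.array.append(4)  // Array's copy-on-write duplicates the buffer at this point
// first.array is still [1, 2, 3]; second.array is [1, 2, 3, 4]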

So, my question is:

Is there a performance or other practical reason to store the array inside a reference type (_Storage) in the first implementation?

Thanks in advance for your insights!

If it's just an array, then no. But if MyStruct_1 contains a large number of stored properties, the first approach might have some performance benefit: copying the struct then only copies (and retains) a single reference instead of copying every stored property, and the full copy of the fields is deferred until you actually mutate a value whose storage is still shared.

I believe you can also fake this without resorting to isKnownUniquelyReferenced(), by using a one-element array instead:

struct CowPoint {
  private struct Storage {
    var x: Int = 0
    var y: Int = 0
    var z: Int = 0
  }

  private var _storage = [Storage()]

  var x: Int { get { return _storage[0].x } set { _storage[0].x = newValue } }
  var y: Int { get { return _storage[0].y } set { _storage[0].y = newValue } }
  var z: Int { get { return _storage[0].z } set { _storage[0].z = newValue } }
}
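
Copies then behave the way you'd expect (quick sketch; as written you'd also want to add a plain init() {}, since the implicit memberwise initializer ends up private when a stored property is private):

var point = CowPoint()      // assumes an internal `init() {}` has been added
let snapshot = point        // only the one-element buffer's reference is copied
point.x = 42                // Array's copy-on-write duplicates the buffer on this first mutation
print(point.x, snapshot.x)  // 42 0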

Thank you for taking the time to respond 🙂

This is an interesting idea… and it got me thinking: to what extent – if any – could a "regular old" Swift.Array serve as a legit "cow storage" box? Would that be more performant than a solution built from a "roll your own" storage box?

The Swift-CowBox repo (full disclosure: self-promotion) is a set of macros for adding copy-on-write semantics to Swift structs: it writes the boilerplate for you at compile time. For the most part, CowBox follows the patterns from Swift NIO[1][2][3]: a custom storage class is the "Box" that is copied by reference when the struct container is copied.
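
Roughly speaking, the boilerplate that gets written looks like this (a simplified, hand-written sketch of the pattern on a hypothetical Person type, not the literal macro expansion):

struct Person {
  private final class _Storage {
    let id: Int64
    var name: String

    init(id: Int64, name: String) {
      self.id = id
      self.name = name
    }

    func copy() -> _Storage {
      _Storage(id: self.id, name: self.name)
    }
  }

  private var _storage: _Storage

  init(id: Int64, name: String) {
    self._storage = _Storage(id: id, name: name)
  }

  // Read-only access goes straight through the box.
  var id: Int64 { _storage.id }

  // Writes copy the box first if anyone else still holds a reference to it.
  var name: String {
    get { _storage.name }
    set {
      if !isKnownUniquelyReferenced(&_storage) {
        _storage = _storage.copy()
      }
      _storage.name = newValue
    }
  }
}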

The CowBox repo comes with Performance Benchmarks built from the Ordo One Benchmarks package. We focus on three operations for measuring performance (sketched in code after the list):

  • We create a Swift.Array of N elements.
  • We create a Swift.Array of N elements, copy the array, and mutate the copy with one new element.
  • We create a Swift.Array of N elements, copy the array, mutate the copy with one new element, and then test both arrays for equality.
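
In simplified form, the three operations look roughly like this (a sketch; element construction and the Benchmark harness are omitted, and the mutation here overwrites the last element so the two arrays stay the same length and the equality check still has to walk them):

// Operation 1: create an array of N elements.
func createArray<Element>(count: Int, makeElement: (Int64) -> Element) -> [Element] {
  var array = [Element]()
  array.reserveCapacity(count)
  for index in 0..<count {
    array.append(makeElement(Int64(index)))
  }
  return array
}

// Operation 2: copy the array and mutate the copy with one new element.
func copyAndMutate<Element>(_ array: [Element], newElement: Element) -> [Element] {
  var copy = array                    // copy-on-write: no buffer copy yet
  copy[copy.count - 1] = newElement   // the element buffer is copied here
  return copy
}

// Operation 3: copy, mutate, then test both arrays for equality.
func copyMutateAndCompare<Element: Equatable>(_ array: [Element], newElement: Element) -> Bool {
  let copy = copyAndMutate(array, newElement: newElement)
  return array == copy                // false; exercises Element's == for every shared element
}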

For those three operations, we test against an Element built from a "classic" swift struct:

struct StructElement: Hashable {
  let a: Int64
  let b: Int64
  let c: Int64
  let d: Int64
  let e: Int64
  let f: Int64
  let g: Int64
  let h: Int64
  let i: Int64
  let j: Int64
}

And a "CowBox" struct:

@CowBox struct CowBoxElement: Hashable {
  @CowBoxNonMutating var a: Int64
  @CowBoxNonMutating var b: Int64
  @CowBoxNonMutating var c: Int64
  @CowBoxNonMutating var d: Int64
  @CowBoxNonMutating var e: Int64
  @CowBoxNonMutating var f: Int64
  @CowBoxNonMutating var g: Int64
  @CowBoxNonMutating var h: Int64
  @CowBoxNonMutating var i: Int64
  @CowBoxNonMutating var j: Int64
}

Here is a new element type for testing the performance of copy-on-write storage built from a Swift.Array:

struct ArrayElement: Hashable {
  private struct _Storage: Hashable {
    let a: Int64
    let b: Int64
    let c: Int64
    let d: Int64
    let e: Int64
    let f: Int64
    let g: Int64
    let h: Int64
    let i: Int64
    let j: Int64
  }
  
  private var _storage: [_Storage]
}
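
For illustration, field access goes through the single boxed _Storage value (a simplified sketch, not the exact benchmark code; it assumes the extension lives in the same file so the private members are visible):

extension ArrayElement {
  // Box one _Storage value in a one-element array.
  init(a: Int64, b: Int64, c: Int64, d: Int64, e: Int64,
       f: Int64, g: Int64, h: Int64, i: Int64, j: Int64) {
    self._storage = [_Storage(a: a, b: b, c: c, d: d, e: e,
                              f: f, g: g, h: h, i: i, j: j)]
  }

  // Each field reads through element zero; the remaining fields follow the same pattern.
  var a: Int64 { _storage[0].a }
  var b: Int64 { _storage[0].b }
}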

Once we have these three element types defined, we can run our three operations on each element (for nine benchmarks). Let's focus on three metrics: Instructions, Memory, and CPU.

For our experiment, we create arrays of 10 million elements and run each benchmark 100 times. These results are from running on an MBP M2 Pro Max with 96 GB of memory.

Let's start with measuring instructions:

Instructions

Test p0 p25 p50 p75 p90 p99 p100 Samples
ArrayElement Create (M) * 7219 7227 7227 7231 7231 7234 7234 100
CowBoxElement Create (M) * 4763 4769 4773 4777 4777 4777 4779 100
StructElement Create (M) * 440 441 441 441 441 441 441 100

Instructions

Test p0 p25 p50 p75 p90 p99 p100 Samples
ArrayElement Copy (M) * 432 438 440 443 445 447 462 100
CowBoxElement Copy (M) * 378 381 381 381 381 382 383 100
StructElement Copy (M) * 437 438 438 438 439 439 439 100

Instructions

Test p0 p25 p50 p75 p90 p99 p100 Samples
ArrayElement Equal (M) * 152 152 152 152 152 153 153 100
CowBoxElement Equal (M) * 80 82 82 82 82 83 83 100
StructElement Equal (M) * 316 319 320 320 320 321 322 100

Across the board, our CowBoxElement uses fewer instructions than ArrayElement.

Here are the results for memory:

Memory (resident peak)

Test p0 p25 p50 p75 p90 p99 p100 Samples
ArrayElement Create (M) 1184 1214 1228 1228 1228 1228 1228 100
CowBoxElement Create (M) 1057 1057 1057 1057 1057 1057 1057 100
StructElement Create (M) 763 807 809 809 809 809 809 100

Memory (resident peak)

Test p0 p25 p50 p75 p90 p99 p100 Samples
ArrayElement Copy (M) 1278 1294 1302 1302 1302 1302 1302 100
CowBoxElement Copy (M) 1114 1137 1137 1137 1137 1137 1137 100
StructElement Copy (M) 1561 1609 1609 1609 1609 1609 1609 100

Memory (resident peak)

Test p0 p25 p50 p75 p90 p99 p100 Samples
ArrayElement Equal (M) 1293 1294 1310 1310 1310 1310 1310 100
CowBoxElement Equal (M) 1127 1137 1137 1137 1137 1137 1137 100
StructElement Equal (M) 1609 1609 1609 1609 1609 1609 1609 100

Across the board, our CowBoxElement uses less memory than ArrayElement.

Here are the results for CPU time:

Time (total CPU)

Test p0 p25 p50 p75 p90 p99 p100 Samples
ArrayElement Create (μs) * 400034 408420 412353 416547 418382 422838 423065 100
CowBoxElement Create (μs) * 240640 250348 250872 251265 251527 253493 253817 100
StructElement Create (μs) * 64299 65765 65962 66519 69140 71369 71391 100

Time (total CPU)

Test p0 p25 p50 p75 p90 p99 p100 Samples
ArrayElement Copy (μs) * 42760 44499 46006 48071 49775 52527 74258 100
CowBoxElement Copy (μs) * 31069 33505 34144 35062 40075 45941 46178 100
StructElement Copy (μs) * 55292 56459 56885 57344 59113 61374 61536 100

Time (total CPU)

Test p0 p25 p50 p75 p90 p99 p100 Samples
ArrayElement Equal (μs) * 22184 22839 22938 23069 23167 23806 24130 100
CowBoxElement Equal (μs) * 3396 4354 4395 4436 4477 4813 5473 100
StructElement Equal (μs) * 34959 36733 37356 37650 37814 38928 39466 100

Across the board, our CowBoxElement uses less CPU than ArrayElement. We also see what looks like a big win when comparing two non-identical array instances for equality.
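
That equality result is consistent with the copy-on-write box paying off at comparison time: after the copy-and-mutate, almost every element in the two arrays still shares its storage object, so an == that checks reference identity before comparing fields can exit early. A hand-rolled version of that check looks like this (sketch with a hypothetical Box type, not the literal generated code):

struct Box: Equatable {
  private final class _Storage {
    let value: Int64
    init(value: Int64) { self.value = value }
  }

  private var _storage: _Storage

  init(value: Int64) { self._storage = _Storage(value: value) }

  static func == (lhs: Box, rhs: Box) -> Bool {
    // Copies of a Box share the same _Storage instance, so most comparisons
    // after a copy never need to look at the stored fields.
    if lhs._storage === rhs._storage { return true }
    return lhs._storage.value == rhs._storage.value
  }
}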

Based on these experiments, a custom-built copy-on-write storage class seems to perform better than leveraging a Swift.Array for data storage. The Swift.Array approach does mean less code than building copy-on-write semantics by hand, but the CowBox macro can also write that boilerplate for you.

You can experiment with this for yourself. Here is a patch you can apply on the 1.0.0 commit from Swift-CowBox to set up the new benchmarks:


  1. https://www.youtube.com/watch?v=iLDldae64xE

  2. https://www.youtube.com/watch?v=WCUj581Dpec

  3. https://www.youtube.com/watch?v=m9JZmP9E12M


Swift.Array somewhat unsurprisingly has more overhead because there's additional indirection inside the type implementation to support multiple storage representations, like bridged NSArrays from Objective-C. It's not just a simple wrapper around a heap allocation like a hand-written CoW type would tend to be.


I would be extremely surprised if that was the source of the overhead here. Single branches are very cheap, and the compiler has a lot of knowledge of Array internals, to the point where ContiguousArray sometimes ends up slower than Array.

Step one if someone wants to dig into this would be to get time profiles of the different cases so we can avoid guessing.


There's also the fact that the array version has to perform a bounds check on each access.
