Question about CoW on value type

Why don't they print the same address?
I was expecting "copy on write" for value type.
But the value of arr2 didn't change here.

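import Foundation // needed for String(format:)

// Prints the address that the pointer argument refers to.
func print(address o: UnsafeRawPointer) {
    print(String(format: "%p", Int(bitPattern: o)))
}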

struct NumArr<R> {
    var vals: [R]
}

var arr1 = NumArr(vals: [1,2,3,4,5])
var arr2 = arr1

print(address: &arr1) // print 0x555c9c6505b0
print(address: &arr2) // print 0x555c9c6505b8

This is printing the address of the stack variables that arr1 and arr2 are stored into, not the underlying storage of the internal array buffer.


I don't want to know the address of the array.
NumArr is a value type. I expect the address of arr2 to change after I modify it. If I don't modify it, I expect the address to stay the same.

isn't it?

Nope. Two local variables hold two separate values from the moment the second one is declared, so each variable has its own address. What copy-on-write avoids is copying the array buffer when the struct is stored into a second variable.


So CoW only works on arrays?

Copy-on-write isn't some universal phenomenon of value types. It's hand-written for String, Set, Array, Dictionary, etc.

See swift/Array.swift at 054bae50807167c499be2932386bd21cb6854544 · apple/swift · GitHub


Apple Developer Documentation is used for implementing CoW structures like swift-collections/_DequeBuffer.swift at main · apple/swift-collections · GitHub
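
To make "hand-written" concrete, here is a minimal sketch of the usual pattern: a struct wraps a class instance and copies it only when it is shared at the moment of mutation. isKnownUniquelyReferenced is a real standard library function that is commonly used for this; whether it is the page linked above is my assumption. Storage, CowBox and sharesStorage(with:) are made-up names for illustration, not anything from the standard library or swift-collections.

```swift
// Hypothetical reference-type storage; all names here are illustrative only.
final class Storage {
    var values: [Int]
    init(values: [Int]) { self.values = values }
}

struct CowBox {
    private var storage: Storage

    init(_ values: [Int]) { storage = Storage(values: values) }

    var values: [Int] { storage.values }

    // True when both boxes currently point at the same Storage object.
    func sharesStorage(with other: CowBox) -> Bool {
        storage === other.storage
    }

    mutating func append(_ value: Int) {
        // Copy the storage only if another box also references it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(values: storage.values)
        }
        storage.values.append(value)
    }
}

let x = CowBox([1, 2, 3])
var y = x                          // copies the struct, shares the Storage object
print(x.sharesStorage(with: y))    // true
y.append(4)                        // triggers the copy
print(x.sharesStorage(with: y))    // false
print(x.values, y.values)          // [1, 2, 3] [1, 2, 3, 4]
```

Without the uniqueness check, y.append(4) would also change x, and the struct would lose its value semantics.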

Swift has always promoted value semantics as the core language concept. Value semantics means that different variables behave like they hold completely independent values: if you copy the value out of one variable into another, subsequent modifications of either of the variables have no effect on the other. Unfortunately, some of the early discussion about Swift mentioned that many of our core data types are implemented with copy-on-write internally, and that mixed the two concepts up in people's minds. That confusion has been very persistent, and we do still see a lot of people today who think that they have to understand copy-on-write in order to use Swift. Fortunately, they do not. Value semantics are the important thing to understand. Copy-on-write is important if you want to understand the performance of some operations, but it's a second-order concern; it is neither necessary in order to implement value semantics nor used implicitly for basic language features in Swift.

We use CoW in Swift because we think it's the right way to implement core data types like String and Array: we decided a long time ago that it's worth incurring a small overhead on every modification in order to optimize copies. If we didn't have CoW, we'd have added some kind of "copy constructor" feature to structs, and we'd make these types do a deep copy instead. In either case, they would still have value semantics.
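
A small sketch of that point: a plain struct with no buffer behind it has nothing to copy lazily, yet it behaves exactly like an Array as far as independence of copies goes. Point is a made-up type for illustration.

```swift
// A plain struct: no heap buffer, no copy-on-write involved.
struct Point {
    var x: Int
    var y: Int
}

let p1 = Point(x: 1, y: 2)
var p2 = p1            // an independent value
p2.x = 99
print(p1.x, p2.x)      // 1 99

// An Array uses CoW internally, but the observable behavior is the same.
let a1 = [1, 2, 3]
var a2 = a1
a2[0] = 99
print(a1, a2)          // [1, 2, 3] [99, 2, 3]
```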


This is how you can drill into Swift's data structures. Use at your own risk:

import Foundation

func addressToBytes(_ address: UnsafeRawPointer) -> UnsafePointer<UInt8>? {
    address.assumingMemoryBound(to: UInt8.self)
}

func printBytes(_ address: UnsafeRawPointer, offset: Int = 0, count: Int) {
    let p = addressToBytes(address + offset)!
    
    print(String(format: "%016lx:  ", p), terminator: "")
    for i in 0 ..< count {
        let s = String(format: "%02x ", p[i])
        print(s, terminator: "")
    }
    print()
}

func getPointer(_ address: UnsafeRawPointer, offset: Int) -> UnsafeRawPointer? {
    let p = (address + offset).assumingMemoryBound(to: UnsafeRawPointer.self)
    return p.pointee
}

struct NumArr<R> {
    var vals: [R]
    var magic: UInt64 = 0x1122334455667788
}

var a = NumArr<UInt8>(vals: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25])
var b = a
let size = MemoryLayout.size(ofValue: a)
print("bytes of a:")
printBytes(&a, count: size)
print("bytes of b:")
printBytes(&b, count: size)

a.vals[0] = 0x7F
print("new bytes of a:")
printBytes(&a, count: size)
print("new bytes of b:")
printBytes(&b, count: size)

// let's drill into it further:
var av = getPointer(&a, offset: 0)!
var bv = getPointer(&b, offset: 0)!

var avSize = malloc_size(av)
print("contents of a's array: (\(avSize == 0 ? "first bytes" : "known to be malloced block of size \(avSize)"))")
printBytes(av, count: avSize == 0 ? 64 : avSize)

var bvSize = malloc_size(bv)
print("contents of b's array: (\(bvSize == 0 ? "first bytes" : "known to be malloced block of size \(bvSize)"))")
printBytes(bv, count: bvSize == 0 ? 64 : bvSize)

Note that I've changed your struct a little bit to make it easier to see what's going on without scrolling too much:

struct NumArr<R> {
    var vals: [R]
    var magic: UInt64 = 0x1122334455667788
}

At start:

bytes of a:
0000000100008048:  00 02 70 01 00 60 00 00 88 77 66 55 44 33 22 11 
bytes of b:
0000000100008058:  00 02 70 01 00 60 00 00 88 77 66 55 44 33 22 11 

So far so good: as you can see, both contents are equal at this point, so both structs point to the same array buffer. If you wonder why the bytes appear in reverse order, it's because we are on a little-endian computer.
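
If the byte order is confusing, this small sketch reassembles the dumped bytes into the 64-bit values they encode. littleEndianValue is a helper name I made up; the byte arrays are copied from the dump above.

```swift
// Combines bytes stored least-significant-first into one 64-bit value.
func littleEndianValue(_ bytes: [UInt8]) -> UInt64 {
    var value: UInt64 = 0
    for (index, byte) in bytes.enumerated() {
        value |= UInt64(byte) << (8 * index)
    }
    return value
}

// First 8 bytes of the struct: the array's buffer pointer.
let bufferWord: [UInt8] = [0x00, 0x02, 0x70, 0x01, 0x00, 0x60, 0x00, 0x00]
// Last 8 bytes of the struct: the magic field.
let magicWord: [UInt8] = [0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11]

print(String(littleEndianValue(bufferWord), radix: 16)) // 600001700200
print(String(littleEndianValue(magicWord), radix: 16))  // 1122334455667788
```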

Now changing one of the arrays:

a.vals[0] = 0x7F

new bytes of a:
0000000100008048:  00 03 70 01 00 60 00 00 88 77 66 55 44 33 22 11 
new bytes of b:
0000000100008058:  00 02 70 01 00 60 00 00 88 77 66 55 44 33 22 11 

As you can see, a's buffer pointer has changed (from 0x600001700200 to 0x600001700300) while b's has not: the CoW machinery did its job.

To further see the internals:

var av = getPointer(&a, offset: 0)!
var bv = getPointer(&b, offset: 0)!

Obviously if we got the memory layout wrong that would either crash or give us some garbage, but it appears we know what we are doing:

contents of a's array: (known to be malloced block of size 64)

0000600001700300: 48 59 9d db 01 00 00 00 03 00 00 00 00 00 00 00 19 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 7f 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 0a 1a dc 01 00 00 00

contents of b's array: (known to be malloced block of size 64)

0000600001700200: 48 59 9d db 01 00 00 00 03 00 00 00 00 00 00 00 19 00 00 00 00 00 00 00 32 00 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 4e 8f db 01 00 00 00

The array contents are the 25 bytes starting at offset 32 of each block (7f 02 03 ... 19 for a, 01 02 03 ... 19 for b). The rest of the block contains things like the size (25 == 0x19), the capacity and some internal stuff.

Now you have the superpowers – use them wisely! :mage:


What do you think about having an FAQ on the official Swift.org site for these kinds of matters? Here are some other related points of confusion I've seen a fair amount (on these forums, StackOverflow, and elsewhere):

  1. Not all structs have CoW magically implemented for them.
  2. It's possible for classes to have value semantics, and structs can have reference semantics
  3. Common misconception: Structs are allocated on the stack, and class objects are allocated on the heap (the C# world struggles with this confusion too)
  4. People often try to inspect the "address of a struct" as a proxy for some kind of stable identity (which they don't actually have)
  5. The difference between passing a reference by value (i.e. passing an instance of an object) vs passing a reference by reference (inout)
  6. and more?

(Wouldn't it be ironic if I made mistakes in laying out these common misconceptions? :sweat_smile:)
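
For point 5, here is a minimal sketch of the difference. Box and both function names are made up for illustration.

```swift
final class Box {
    var value: Int
    init(_ value: Int) { self.value = value }
}

// The reference is passed by value: the callee can mutate the object,
// but it cannot make the caller's variable point at a different object.
func mutateThroughReference(_ box: Box) {
    box.value = 10
}

// The reference is passed by reference: the callee can replace
// the object that the caller's variable refers to.
func reassignInout(_ box: inout Box) {
    box = Box(20)
}

var original = Box(1)
let alias = original

mutateThroughReference(original)
print(alias.value)                   // 10 (same object, seen through both names)

reassignInout(&original)
print(original === alias)            // false (original now refers to a new object)
print(original.value, alias.value)   // 20 10
```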


More precisely: structs are allocated inline in their container, which for local variables is the stack. struct members of a class are on the heap (unless the class allocation is nonescaping and was promoted to the stack by the optimizer), and global structs can be in the __DATA segment.


…except when local variables are captured by closures. Stack-allocating local variables is technically an optimization in Swift!
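
A rough sketch of the closure case: the captured struct has to outlive the function call that declared it, so it cannot live only in that call's stack frame. Where exactly the compiler puts it is an implementation detail; typically it is boxed on the heap. Counter and makeIncrementer are made-up names.

```swift
struct Counter {
    var count = 0
}

func makeIncrementer() -> () -> Int {
    var counter = Counter()      // a local variable holding a struct
    return {
        counter.count += 1       // captured by the escaping closure
        return counter.count
    }
}

let next = makeIncrementer()     // makeIncrementer has already returned here
print(next())                    // 1
print(next())                    // 2 (the struct is still alive and was mutated in place)
```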


Pardon the confusion, I was putting that line forth as a common misconception.

Oh, structs can be optimized into registers as well.

Also structs in local vars of async functions are hoisted into a Task's storage, but only if their lifetimes cross a suspension point (await).

Tons of exceptions to the rule!


Indeed, one level of dereferencing is missing. Here's a possible fix:

func print(address: UnsafeRawPointer) {
    let p = address.assumingMemoryBound(to: UnsafeRawPointer.self).pointee
    print(p)
}

print(address: &arr1) // prints 0x00006000017041c0
print(address: &arr2) // prints 0x00006000017041c0
arr1.vals[0] = 0x7F
print(address: &arr1) // prints 0x0000600001700180 *** changed
print(address: &arr2) // prints 0x00006000017041c0
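
The fix above assumes that the first word of the struct is the array's buffer pointer. A variant that doesn't depend on the struct layout is to ask the array itself for the address of its elements. This is only a sketch: bufferAddress(of:) is a name I made up, and the returned pointer is meant for comparing or printing, not for dereferencing outside the closure.

```swift
struct NumArr<R> {
    var vals: [R]
}

// Returns the address of the array's element storage (nil if the array has none).
// The pointer is only valid inside the closure; we use it purely as a number.
func bufferAddress<T>(of array: [T]) -> UnsafeRawPointer? {
    array.withUnsafeBufferPointer { UnsafeRawPointer($0.baseAddress) }
}

var arr1 = NumArr(vals: [1, 2, 3, 4, 5])
let arr2 = arr1

// Before any mutation both structs share one buffer.
print(bufferAddress(of: arr1.vals) == bufferAddress(of: arr2.vals)) // true

arr1.vals[0] = 0x7F // the buffer is shared, so this write copies it first

print(bufferAddress(of: arr1.vals) == bufferAddress(of: arr2.vals)) // false
print(arr2.vals) // [1, 2, 3, 4, 5]
```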