Suppose I have some code that uses a value-semantic data structure from the stdlib (specifically wondering about Array), and I want to ensure that CoW does not occur within a particular region of execution. Is there a reliable way to do that at runtime from within the program? Underscored APIs are fine, this would just be for investigative purposes. Some searching suggests comparing buffer pointer values before and after a mutation, but I don't think that works in general (if it works at all).
I would expect comparing buffer pointers to work, except that it would give a false positive on resizes due to appends. If that's not an issue for your code it should work fine.
Interesting, thank you. I am primarily interested in what happens when appending, and oddly the simple tests I've done seem to report that the buffer base address changes on any append, unless I explicitly reserve capacity beforehand. Not sure if that is some artifact of optimizations for small arrays or what.
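For anyone who wants to try the pointer comparison themselves, here is a minimal sketch. The helper names (`baseAddress(of:)` and the two test functions) are made up for illustration, and the base address is only a proxy for the identity of the storage. Reserving capacity up front takes resizing out of the picture, so any move that remains should come from the storage being shared.

```swift
// Hypothetical helper: the address of the first element slot, used only as an
// identity for the array's current storage buffer.
func baseAddress<T>(of array: [T]) -> UnsafeRawPointer? {
    array.withUnsafeBufferPointer { UnsafeRawPointer($0.baseAddress) }
}

// Unique storage with spare capacity: the append can happen in place.
func appendToUniqueArrayMovesBuffer() -> Bool {
    var array: [Int] = []
    array.reserveCapacity(16)
    let before = baseAddress(of: array)
    array.append(1)
    return before != baseAddress(of: array)
}

// Shared storage: a second value refers to the same buffer, so the append
// has to copy even though there is spare capacity.
func appendToSharedArrayMovesBuffer() -> Bool {
    var array = [1, 2, 3]
    array.reserveCapacity(16)
    let other = array                  // second reference to the same storage
    let before = baseAddress(of: array)
    array.append(4)
    let moved = before != baseAddress(of: array)
    withExtendedLifetime(other) {}     // keep `other` alive across the append
    return moved
}

print("unique array moved:", appendToUniqueArrayMovesBuffer())
print("shared array moved:", appendToSharedArrayMovesBuffer())
```

Without the `reserveCapacity` call the same check cannot tell a resize from a copy, which is the false positive mentioned above.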
There might be something here that could help:
var a = [1, 2, 3]
do {
    weak let weakOwner = a._owner
    a.append(4)
    if let weakOwner = weakOwner {
        print(
            "weakOwner: ",
            ObjectIdentifier(weakOwner)
        )
    } else {
        print("nil")
    }
    if let owner = a._owner {
        print(
            "owner: ",
            ObjectIdentifier(owner)
        )
    } else {
        print("nil")
    }
}
var b = a
do {
    weak let weakOwner = a._owner
    a.append(4)
    if let weakOwner = weakOwner {
        print(
            "weakOwner: ",
            ObjectIdentifier(weakOwner)
        )
    } else {
        print("nil")
    }
    if let owner = a._owner {
        print(
            "owner: ",
            ObjectIdentifier(owner)
        )
    } else {
        print("nil")
    }
}
When a is "unique" and we save a weak pointer to _owner that pointer is nil after we append.
When a is "shared" and we save a weak pointer to _owner that pointer is not nil after we append.
If the weak owner and the current owner are the same pointer that could imply there was no copy. I'm not sure how robust this all would be... but this might be Good Enough just for debugging purposes.
I guess one possibility is that the Array buffer is being stack promoted before the first append... but that wouldn't account for any subsequent ones.
Ah yes, I think you're right. And I was imprecise before: it does appear to be just the first one that causes the initial base address change. Subsequent appends don't alter the address unless a resize is actually needed.
When implementing my own containers or wrappers of existing containers that are generic I have found it useful to use a simple value that can verify in-place mutation:
struct CoWValue: @unchecked Sendable {
    private final class UniqueIndicator {}
    /// This reference is "copied" if not uniquely referenced
    private var uniqueIndicator = UniqueIndicator()
    /// Mutates `self` and returns a boolean indicating whether it was mutated in place
    /// - Returns: true if mutations happened in-place, false if Copy on Write (CoW) was triggered
    @discardableResult
    mutating func mutateInPlace() -> Bool {
        if !isKnownUniquelyReferenced(&uniqueIndicator) {
            uniqueIndicator = UniqueIndicator()
            return false
        }
        return true
    }
}
which you can then use like this:
var array = [CoWValue]()
array.append(CoWValue())
assert(array[0].mutateInPlace())
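To see the `false` case as well, here is a small sketch that restates `CoWValue` (minus the `Sendable` conformance) so it can be pasted on its own. The function name `probeSharedThenUnique` is invented for the example. The first mutation happens while a second array value still shares the storage, so the element gets copied along with the buffer; the second mutation happens after the array has become unique again.

```swift
struct CoWValue {
    private final class UniqueIndicator {}
    private var uniqueIndicator = UniqueIndicator()

    /// Returns true if the mutation happened in place, false if the value
    /// had been copied since the last mutation.
    @discardableResult
    mutating func mutateInPlace() -> Bool {
        if !isKnownUniquelyReferenced(&uniqueIndicator) {
            uniqueIndicator = UniqueIndicator()
            return false
        }
        return true
    }
}

func probeSharedThenUnique() -> (first: Bool, second: Bool) {
    var original = [CoWValue()]
    let snapshot = original                   // shares storage with `original`
    // Mutating through the subscript forces `original` to copy its buffer,
    // which copies the element and leaves its indicator shared.
    let first = original[0].mutateInPlace()
    // `original` now owns its buffer and the element has a fresh indicator.
    let second = original[0].mutateInPlace()
    withExtendedLifetime(snapshot) {}         // keep the snapshot alive until here
    return (first, second)
}

let result = probeSharedThenUnique()
print("first in place:", result.first, "second in place:", result.second)
```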
Thanks for the tip; TIL about _owner. FWIW I think both the owner and the buffer pointer sort of produce the same information: a proxy for the current identity of the underlying storage. I thought there might be some way to just directly query isKnownUniquelyReferenced passing that value, but since it's computed it seems you have to make a new reference to the underlying value, thus incrementing the reference count. I thought maybe an unowned local variable might get around this, but that doesn't work because isKnownUniquelyReferenced always returns false when passed such a binding (makes sense, and the rationale is in the docs).
That does seem useful, thanks for sharing. Possibly related: I did find this utility in the stdlib unit tests which looks like the kind of thing I was hoping might exist, though I haven't actually played around with it enough to be certain.
I think the buffer pointer test works sufficiently well to answer the question in the specific case that got me wondering about this, which, if you're curious, is the following sort of thing:
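A tiny sketch of why that extra reference gets in the way, using a stand-in class rather than Array's real storage (`Storage` and the function name are hypothetical). Any second strong reference, which is what reading a computed owner property would hand back according to the description above, is enough to make the check report `false`.

```swift
final class Storage {}  // stand-in for an array's storage object

func uniquenessWithAndWithoutExtraReference() -> (alone: Bool, withExtra: Bool) {
    var ref = Storage()
    // Only `ref` refers to the object at this point.
    let alone = isKnownUniquelyReferenced(&ref)
    // A second strong reference, like the one you would get back from a
    // computed property that returns the owner.
    let extra = ref
    let withExtra = isKnownUniquelyReferenced(&ref)
    withExtendedLifetime(extra) {}  // keep `extra` alive during the check
    return (alone, withExtra)
}

let uniqueness = uniquenessWithAndWithoutExtraReference()
print("alone:", uniqueness.alone, "with extra reference:", uniqueness.withExtra)
```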
class C {
    enum State {
        case none
        case some([Int])
    }
    var state = State.some([])
    func append(_ i: Int) {
        switch self.state {
        case .none:
            self.state = .some([i])
        case .some(var value):
            value.append(i) // <-- Do we (necessarily) trigger CoW here?
            self.state = .some(value)
        }
    }
}
If we use a utility to check for changes in the base address before and after a mutation:
extension Array {
    @discardableResult
    mutating func didBufferMove(
        during mutation: (inout Self) -> Void
    ) -> Bool {
        let startPtr = self.withUnsafeBufferPointer { $0.baseAddress }
        mutation(&self)
        let endPtr = self.withUnsafeBufferPointer { $0.baseAddress }
        let didMove = (startPtr != endPtr)
        print("buffer moved: \(didMove)")
        return didMove
    }
}
// and use it here...
func append(_ i: Int) {
    switch self.state {
    case .none:
        self.state = .some([i])
    case .some(var value):
        value.didBufferMove { $0.append(i) }
        self.state = .some(value)
    }
}
Then the local binding appears to always have its buffer moved during the append(). Presumably this is due to CoW, since both the local binding and the class property have references to the underlying storage. If we instead overwrite the property before the mutation:
func append(_ i: Int) {
    switch self.state {
    case .none:
        self.state = .some([i])
    case .some(var value):
        self.state = .none // Try to ensure `value` is the only reference during mutation
        value.didBufferMove { $0.append(i) }
        self.state = .some(value)
    }
}
Then the utility only reports that the buffer base address changes when you hit the "breakpoints" that force the buffer to resize and allocate more memory (IIUC for Array's buffer this occurs at powers of two).
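If you want to see where those resize points fall on your own toolchain, a rough sketch like the one below records every capacity change while appending to a uniquely referenced array. The function name `capacityChanges(appending:)` is made up, and the exact capacities are an implementation detail that can differ between platforms and Swift versions, so no particular output is promised here.

```swift
// Appends `n` integers to a fresh array and records each new capacity
// the first time it is observed.
func capacityChanges(appending n: Int) -> [Int] {
    var array: [Int] = []
    var capacities: [Int] = []
    var last = array.capacity
    for i in 0..<n {
        array.append(i)
        if array.capacity != last {
            last = array.capacity
            capacities.append(last)
        }
    }
    return capacities
}

let changes = capacityChanges(appending: 1_000)
print("capacity changed \(changes.count) times:", changes)
```

The number of changes should stay far below the number of appends, which fits the observation that the buffer only moves occasionally once the array is uniquely referenced.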
For this specific case you can use a couple techniques to get rid of CoW.
If you make the type not a class but a struct you can consume it which should get rid of the intermediate copy:
struct C {
    enum State {
        case none
        case some([Int])
    }
    var state = State.some([])
    mutating func append(_ i: Int) {
        switch (consume self).state {
        case .none:
            self = .init(state: .some([i]))
        case .some(var value):
            value.append(i)
            self = .init(state: .some(value))
        }
    }
}
Copyable structs don't support partial consumption so you will need to consume self fully. A ~Copyable struct at least supports partial consumption but sadly doesn't yet support reinitializing fields after they are consumed.
It still improves it slightly:
struct C2: ~Copyable {
    enum State: ~Copyable {
        case none
        case some([Int])
    }
    var state = State.some([])
    mutating func append(_ i: Int) {
        switch consume state {
        case .none:
            self = .init(state: .some([i]))
        case .some(var value):
            value.append(i)
            self = .init(state: .some(value))
        }
    }
}
The real advantage of making everything ~Copyable is that we can now use UniqueArray (from swift-collections today or eventually in the standard library):
struct C3: ~Copyable {
    enum State: ~Copyable {
        case none
        case some(UniqueArray<Int>)
    }
    var state = State.some(.init())
    mutating func append(_ i: Int) {
        switch consume state {
        case .none:
            var array = UniqueArray<Int>()
            array.append(i)
            self = .init(state: .some(array))
        case .some(var value):
            value.append(i)
            self = .init(state: .some(value))
        }
    }
}
which actually guarantees that we never trigger CoW, as it doesn't even support it to begin with.
The consume of state and the reinitializing of the enum is still just a workaround for a missing inout for pattern matching feature.
I will point out though, that you may as well just use Optional instead of your custom state enum. Not only is it the same shape as your enum, it also comes with a take() method which is perfect for situations like these.
struct C4: ~Copyable {
    var state: [Int]? = []
    public mutating func append(_ i: Int) {
        // This consumes the contained value, if any, and replaces it with nil.
        switch state.take() {
        case nil:
            state = [i]
        case var array?:
            // array is uniquely referenced.
            array.append(i)
            state = array
        }
    }
}
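The same take-then-reassign shape can also be tried on a class, which is what the original question used. This is only a sketch: `Accumulator` and `bufferMoves` are hypothetical names, it assumes a toolchain where `Optional.take()` is available, and the move counter is just the base-address proxy from earlier in the thread, so it counts resizes as well as copies.

```swift
final class Accumulator {  // hypothetical stand-in for the class in question
    var state: [Int]? = []
    private(set) var bufferMoves = 0

    func append(_ i: Int) {
        // take() moves the array out of the property and leaves nil behind,
        // so the property no longer holds a reference during the mutation.
        switch state.take() {
        case nil:
            state = [i]
        case var array?:
            let before = array.withUnsafeBufferPointer { $0.baseAddress }
            array.append(i)
            let after = array.withUnsafeBufferPointer { $0.baseAddress }
            if before != after { bufferMoves += 1 }
            state = array
        }
    }
}

let accumulator = Accumulator()
for i in 1...100 { accumulator.append(i) }
print("elements:", accumulator.state?.count ?? 0, "buffer moves:", accumulator.bufferMoves)
```

If the move count printed here is close to the number of appends, something is still holding a second reference during the mutation.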
Although since arrays already have a non-allocating empty state, you could skip the optional and use something like this:
@inlinable
public func replace<T: ~Copyable>(_ value: inout T, with newValue: consuming T) -> T {
    let oldValue = value
    value = newValue
    return oldValue
}
struct C5: ~Copyable {
    var state: [Int] = []
    public mutating func append(_ i: Int) {
        var array = replace(&state, with: [])
        array.append(i)
        state = array
    }
}
Your points are well-taken, but this example was a reduction from a more complex use case which is not isomorphic to Optional (I probably should have made that more clear in retrospect).
I like the idea of consuming the existing value in the switch expression and then re-initializing it in the case branches as both you and David suggested; that seems like it has the right "flow" of ownership to me. Thank you both for the suggestions.