When the byte count is no greater than 14 on a 64-bit system, or 6 on a 32-bit system, Data.init(bytesNoCopy:count:deallocator:) does not behave as its label suggests: it makes a copy instead of using the bytes pointer.
let testPointer = UnsafeMutableRawPointer.allocate(byteCount: 1, alignment: 1)
testPointer.storeBytes(of: 1, as: UInt8.self)
print(testPointer.load(as: UInt8.self))
// prints 1
let testData = Data(bytesNoCopy: testPointer, count: 1, deallocator: .free)
testPointer.storeBytes(of: 0, as: UInt8.self)
print(testPointer.load(as: UInt8.self))
// prints 0
print(testData[0])
// prints 1, but it should print 0
EDIT: Sorry for the noise. I removed the content of my post, which essentially boiled down to a verification of what you show in the OP. And I agree that the current behavior is surprising / a bug.
I don’t consider this to be a bug in Data. IMO the ‘no copy’ variants are an optimisation and there’s no requirement that Data implement that optimisation in all circumstances.
Notably, if Data decides to not use the buffer it frees it immediately. Consider this:
let size = 1
let p = calloc(size, 1)!
let d = Data(bytesNoCopy: p, count: size, deallocator: .custom({ p, _ in
print("free")
free(p)
}))
print(d)
which prints:
free
1 bytes
And that suggests that this isn’t a simple omission.
You could, of course, argue that the documentation should cover this non-obvious behaviour, and file a bug on that basis (-:
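For anyone who wants to check what Data does with a given size, one option is to compare addresses only, so that memory Data may already have freed is never read or written. This is a minimal sketch, not a documented technique: adoptsBuffer(count:) is a hypothetical helper name (it is not part of Foundation), and the buffer comes from malloc so that it pairs with the .free deallocator.

```swift
import Foundation

// Hypothetical helper (not a Foundation API): reports whether Data kept using
// the caller's buffer or copied the bytes into storage of its own.
func adoptsBuffer(count: Int) -> Bool {
    // malloc pairs with the `.free` deallocator passed below.
    let pointer = malloc(count)!
    // Record the address only. After the initializer runs, Data owns the
    // buffer and may already have freed it, so it is never dereferenced here.
    let originalAddress = UInt(bitPattern: pointer)
    let data = Data(bytesNoCopy: pointer, count: count, deallocator: .free)
    return data.withUnsafeBytes { buffer in
        UInt(bitPattern: buffer.baseAddress) == originalAddress
    }
}

for count in [1, 14, 15, 4096] {
    print(count, adoptsBuffer(count: count))
}
```

Going by the behaviour reported at the top of this thread, I would expect the small counts to report false on a 64-bit system, but the result is an implementation detail and may differ between platforms and Foundation versions.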
I agree with @eskimo that this is confusing given the current state of the documentation, but that it's not a bug. When you give NSData/Data a no-copy buffer, it needs to own that buffer in order to correctly handle deallocation. From NSData.init(bytesNoCopy:length:freeWhenDone:) on the freeWhenDone flag:
If true, the returned object takes ownership of the bytes pointer and frees it on deallocation.
The same is implicitly true for the other deallocator cases, though it could be spelled out more clearly.
If the result is mutated and is not a unique reference, then the Data will still follow copy-on-write semantics. In this case, the copy will use its own deallocator. Therefore, it is usually best to only use this initializer when you either enforce immutability with let or ensure that no other references to the underlying data are formed.
Because the bytesNoCopy initializers take ownership* of the buffer, you can't rely on being able to access the buffer through any reference except the Data reference itself. And given that Data now owns the buffer and may need to make copies of the data in the future, it is valid for it to copy the data into a more efficient representation immediately, as an optimization, as long as it cleans up the underlying buffer (which @eskimo shows).
Now, if you need a guarantee that the buffer itself will remain in use even if you have multiple references to the data objects, NSData would be preferred over Data because of its object semantics.
*There is one case I can think of where this is more unexpected than others: when you pass in a deallocator of .none, Data can't necessarily take ownership of the buffer. In those cases, it should likely make stronger guarantees about not copying up-front (though again, Data must be able to copy in order to maintain value semantics).
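To illustrate the NSData suggestion, here is a minimal sketch that hands a small calloc-allocated buffer to NSData and then compares addresses. The variable names are made up for illustration, and whether NSData keeps the original pointer for every size is my assumption, not something the documentation quoted above promises.

```swift
import Foundation

let length = 4
// calloc pairs with `freeWhenDone: true`, which releases the buffer with free().
let buffer = calloc(length, 1)!

// NSData takes ownership of `buffer` and frees it when the object is deallocated.
let nsData = NSData(bytesNoCopy: buffer, length: length, freeWhenDone: true)

// Compare addresses only; the buffer now belongs to `nsData`.
let sharesBuffer = nsData.bytes == UnsafeRawPointer(buffer)
print("length:", nsData.length)
print("uses original buffer:", sharesBuffer)
```

Even if the addresses match, writing through the raw pointer after handing it over is still outside what NSData promises, so treat this as a way to observe the behaviour and not as an endorsement of external mutation.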
However, when the user already expressed that they want the Data to be backed with a __DataStorage instead of a Buffer tuple by specifying it with bytesNoCopy, shouldn't it be honored? To be specific, doesn't it make more sense to force .slice(InlineSlice) to be used, even if InlineData.canStore(count:) returns true, when copy is passed as false to the __DataStorage initializer? In that case, the deallocator would not be called immediately.
Anyway, I would appreciate knowing whether there is a viable way to force a .slice representation even when the data is small enough to fall into the .inline category.
If you have an existing workaround that achieves this, I would be grateful if you could share it.
There's currently no way to force it. What's the reason for needing a .slice?
However, when the user already expressed that they want the Data to be backed with a __DataStorage instead of a Buffer tuple by specifying it with bytesNoCopy, shouldn't it be honored?
Again, Data takes ownership of the buffer that you give it in cases like this, and once it does, it's allowed to copy it if need be. Are you looking to effect changes on the Data instance by writing to the raw pointer? Because that's neither safe (because you don't own the buffer anymore), nor guaranteed to be possible (because it may have been copied due to CoW).
If you absolutely need to do this, consider NSData/NSMutableData, which don't have as many copying considerations, but modifying the pointer externally is still kind of iffy.
I understand, and thank you for all your concerns and suggestions.
Well, as I said, and as you have pointed out is not safe, I am trying to keep the data initialized with bytesNoCopy immutable through a let declaration, and mutable only through the pointer.
I agree. But CoW only happens when mutating the value, right? So:
let pointer = UnsafeMutableRawPointer(...)
let pointerBasedData = Data(bytesNoCopy: pointer, ...)
// `pointerBasedData` uses `pointer` as its storage directly
var data = pointerBasedData
// `data` shares the storage of `pointerBasedData` which is also `pointer`
data.append(1)
// CoW happens
// `data` gets copied to a new address and the mutation happens
This is fine in my case: once data has been mutated elsewhere, it no longer holds the original content of the pointer, so I would not care that changing the pointer does not affect data, as long as the change is reflected in pointerBasedData.
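The copy-on-write part of the outline above can be checked without writing through the raw pointer at all. This is a minimal sketch under my own assumptions: a 64-byte calloc buffer (chosen to sit above the small sizes discussed earlier) and the .free deallocator. It only shows that mutating the copy leaves the original value alone; it does not show that writes through the pointer are reflected in pointerBasedData.

```swift
import Foundation

let byteCount = 64
// calloc pairs with the `.free` deallocator passed below.
let pointer = calloc(byteCount, 1)!
let pointerBasedData = Data(bytesNoCopy: pointer, count: byteCount, deallocator: .free)

// Assigning the value does not copy the bytes yet.
var data = pointerBasedData

// Mutating the non-unique copy triggers copy-on-write, so only `data` changes.
data.append(1)

print(pointerBasedData.count, data.count)
// prints 64 65
```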
Sadly that is not likely to be a doable choice for our code base.
I meant that pointerBasedData changes its value from external stimuli. The point of value semantics is precisely that you can predict how and where the mutation occurs, which is generally local to boot.