`Data.init(bytesNoCopy:count:deallocator:)` does not work as expected when `Data` uses the `InlineData` representation

ylorn · October 8, 2020, 12:15am

When the byte count is no greater than 14 on a 64-bit system, or 6 on a 32-bit system, Data.init(bytesNoCopy:count:deallocator:) does not behave as its label suggests: it makes a copy instead of using the bytes pointer.

import Foundation

let testPointer = UnsafeMutableRawPointer.allocate(byteCount: 1, alignment: 1)
testPointer.storeBytes(of: 1, as: UInt8.self)

print(testPointer.load(as: UInt8.self))
// prints 1

let testData = Data(bytesNoCopy: testPointer, count: 1, deallocator: .free)

testPointer.storeBytes(of: 0, as: UInt8.self)
print(testPointer.load(as: UInt8.self))
// prints 0

print(testData[0])
// prints 1, but it should print 0

Jens · October 8, 2020, 6:14am

EDIT: Sorry for the noise. I removed the content of my post, which essentially boiled down to a verification of what you show in the OP. And I agree that the current behavior is surprising / a bug.

lukasa · October 8, 2020, 6:59am

Issues with Foundation should be reported using feedbackassistant.apple.com.

eskimo · October 8, 2020, 9:40am

I don’t consider this to be a bug in Data. IMO the ‘no copy’ variants are an optimisation and there’s no requirement that Data implement that optimisation in all circumstances.

Notably, if Data decides to not use the buffer it frees it immediately. Consider this:

import Foundation

let size = 1
let p = calloc(size, 1)!
let d = Data(bytesNoCopy: p, count: size, deallocator: .custom({ p, _ in
    print("free")
    free(p)
}))
print(d)

which prints:

free
1 bytes

And that suggests that this isn’t a simple omission.
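
To compare small and large buffers side by side, here is a minimal sketch. The helper name `deallocatorRanDuringInit` and the `Flag` class are hypothetical, introduced only for illustration, and the sizes 1 and 1024 are arbitrary picks on either side of the inline threshold mentioned above. It records whether the custom deallocator has already run by the time the initializer returns.

```swift
import Foundation

// Hypothetical helper type: a tiny box so the deallocator closure can record
// that it ran. Marked @unchecked Sendable because it is only touched from one thread.
final class Flag: @unchecked Sendable {
    var ran = false
}

// Hypothetical helper, not a Foundation API: returns true if the custom
// deallocator was already called when Data.init(bytesNoCopy:...) returned.
func deallocatorRanDuringInit(count: Int) -> Bool {
    let buffer = UnsafeMutableRawPointer.allocate(byteCount: count, alignment: 1)
    buffer.initializeMemory(as: UInt8.self, repeating: 0, count: count)

    let flag = Flag()
    let data = Data(bytesNoCopy: buffer, count: count, deallocator: .custom({ pointer, _ in
        flag.ran = true
        pointer.deallocate() // matches the allocate(byteCount:alignment:) call above
    }))

    // Keep `data` alive while sampling the flag, so a deallocation caused by
    // `data` going out of scope is not mistaken for one that happened in init.
    return withExtendedLifetime(data) { flag.ran }
}

print(deallocatorRanDuringInit(count: 1))
print(deallocatorRanDuringInit(count: 1024))
```

With the behaviour described in this thread, I would expect the first call to report that the buffer was released during initialization and the second to report that it was not, but treat that as an observation about current implementations rather than a documented guarantee.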

You could, of course, argue that the documentation should cover this non-obvious behaviour, and file a bug on that basis (-:

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

itaiferber · October 8, 2020, 12:52pm

I agree with @eskimo that this is confusing given the current state of the documentation, but that it's not a bug. When you give NSData/Data a no-copy buffer, it needs to own that buffer in order to correctly handle deallocation. From NSData.init(bytesNoCopy:length:freeWhenDone:) on the freeWhenDone flag:

If true, the returned object takes ownership of the bytes pointer and frees it on deallocation.

The same is implicitly true for the other deallocator cases, though it could be spelled out more clearly.

There are also cases in which Data must copy, despite being initialized with bytesNoCopy. From Data.init(bytesNoCopy:count:deallocator:):

If the result is mutated and is not a unique reference, then the Data will still follow copy-on-write semantics. In this case, the copy will use its own deallocator. Therefore, it is usually best to only use this initializer when you either enforce immutability with let or ensure that no other references to the underlying data are formed.

Because the bytesNoCopy initializers take ownership* of the buffer, you can't rely on being able to access the buffer through any reference except the Data reference itself. And since Data now owns the buffer and may need to make copies of the data in the future, it is valid for it to copy the data into a more efficient representation immediately, as an optimization, as long as it cleans up the underlying buffer (which @eskimo shows).

Now, if you need a guarantee that the buffer itself will remain in use even if you have multiple references to the data objects, NSData would be preferred over Data because of its object semantics.

*there is one case I can think of where this is more unexpected than others: when you pass in a deallocator of .none, Data can't necessarily take ownership of the buffer. In those cases, it should likely make stronger guarantees about not copying up-front (though again, Data must be able to copy in order to maintain value semantics).

ylorn · October 8, 2020, 4:06pm

It is interesting to know that the deallocator is called immediately during initialization in that case.

But that is still inconsistent behavior. It works as the documentation suggests when the _Representation is NOT .inline(InlineData):

Creates a data buffer with memory content without copying the bytes.

Unlike other _Representation cases such as .slice(InlineSlice), the Buffer of InlineData is a 14-element tuple (6 elements on a 32-bit system) that is stored inline in the value itself rather than in separate heap storage, so I understand that it has to copy the bytes into its own buffer instead of using the given pointer.

However, when the user has already expressed that they want the Data to be backed by a __DataStorage instead of a Buffer tuple by specifying bytesNoCopy, shouldn't that be honored? To be specific, when copy is passed as false to the __DataStorage initializer, doesn't it make more sense to force .slice(InlineSlice) to be used even if InlineData.canStore(count:) returns true? In that case, the deallocator would not be called immediately.

ylorn · October 8, 2020, 4:13pm

Anyway, I would appreciate knowing whether there is a viable way to force a .slice representation even when the size is small enough to fall into the .inline category.

If any of you have an existing workaround to achieve that, I would be grateful if you could share it with me.

@eskimo @lukasa @Jens @itaiferber

itaiferber · October 8, 2020, 5:03pm

There's currently no way to force it. What's the reason for needing a .slice?

However, when the user already expressed that they want the Data to be backed with a __DataStorage instead of a Buffer tuple by specifying it with bytesNoCopy, shouldn't it be honored?

Again, Data takes ownership of the buffer that you give it in cases like this, and once it does, it's allowed to copy it if need be. Are you looking to effect changes on the Data instance by writing to the raw pointer? Because that's neither safe (because you don't own the buffer anymore), nor guaranteed to be possible (because it may have been copied due to CoW).

If you absolutely need to do this, consider NSData/NSMutableData, which don't have as many copying considerations, but modifying the pointer externally is still kind of iffy.
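
For reference, a minimal sketch of the NSData route, assuming a 4-byte buffer (small enough that Data would use its inline form). The names `buffer`, `object` and `sharesBuffer` are made up for the example. It only checks whether the object still points at the original buffer; it deliberately does not write through the raw pointer afterwards, for the reasons above.

```swift
import Foundation

let length = 4
let buffer = UnsafeMutableRawPointer.allocate(byteCount: length, alignment: 1)
buffer.initializeMemory(as: UInt8.self, repeating: 1, count: length)

// NSData has reference semantics; the deallocator block releases the buffer
// when the object itself is destroyed.
let object = NSData(bytesNoCopy: buffer, length: length, deallocator: { pointer, _ in
    pointer.deallocate() // matches the allocate(byteCount:alignment:) call above
})

// Compare the object's byte pointer with the buffer that was handed in.
let sharesBuffer = object.bytes == UnsafeRawPointer(buffer)
print(sharesBuffer)

// Read the first byte back through the object.
let firstByte = object.bytes.load(as: UInt8.self)
print(firstByte)
```

If `sharesBuffer` comes out true on your platform, the object is using the buffer directly even at this small size; that is something to verify rather than assume.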

ylorn · October 8, 2020, 7:08pm

I understand, and thank you for all your concerns and suggestions.

Well, as I said, and as you have pointed out is not safe, I am trying to keep the data initialized with bytesNoCopy immutable through a let declaration and mutable only through the pointer.

I agree. But CoW only happens when mutating the value, right? So:

let pointer = UnsafeMutableRawPointer(...)

let pointerBasedData = Data(bytesNoCopy: pointer, ...)
// `pointerBasedData` uses `pointer` as its storage directly

var data = pointerBasedData
// `data` shares the storage of `pointerBasedData` which is also `pointer`

data.append(1) 
// CoW happens
// `data` gets copied to a new address and the mutation happens

This is fine in my case: once data is mutated, it no longer holds the original content of the pointer, so I would not care that changing pointer does not affect data, as long as the change is reflected in pointerBasedData.
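
A runnable version of the outline above, as a sketch only: the 1024-byte size and the custom deallocator are my assumptions (chosen so the data is too large for the inline form), not something taken from the real code base.

```swift
import Foundation

let count = 1024
let buffer = UnsafeMutableRawPointer.allocate(byteCount: count, alignment: 1)
buffer.initializeMemory(as: UInt8.self, repeating: 0, count: count)

// Large enough that Data is not expected to use its inline representation.
let pointerBasedData = Data(bytesNoCopy: buffer, count: count, deallocator: .custom({ pointer, _ in
    pointer.deallocate() // matches the allocate(byteCount:alignment:) call above
}))

// `data` starts out sharing storage with `pointerBasedData`.
var data = pointerBasedData

// Mutating the copy triggers copy-on-write; `pointerBasedData` is untouched.
data.append(1)

// Check which of the two values still reads from the original buffer.
let originalSharesBuffer = pointerBasedData.withUnsafeBytes { $0.baseAddress == UnsafeRawPointer(buffer) }
let copySharesBuffer = data.withUnsafeBytes { $0.baseAddress == UnsafeRawPointer(buffer) }

print(originalSharesBuffer, copySharesBuffer)
print(pointerBasedData.count, data.count)
```

The counts show that only the mutated copy grew, and the two flags show which value still reads from the original buffer.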

Sadly, switching to NSData/NSMutableData is not likely to be a viable choice for our code base.

Lantua · October 8, 2020, 7:12pm

I'm pretty sure that breaks Data's value semantics, since the data mutates without going through any mutating operations.

ylorn · October 8, 2020, 7:17pm

Sorry, I am not sure if I follow.

Isn't Data.append(_ newElement:) a mutating function?

Can you elaborate? Thanks.

Lantua · October 8, 2020, 7:20pm

I meant that pointerBasedData changes its value from external stimuli. The point of value semantics is precisely that you can predict how/where the mutation occurs, which is generally local to boot.

ylorn · October 8, 2020, 7:23pm

Ah, I see.

Yes, pointerBasedData is immutable with the let declaration, but can be mutated secretly through the pointer. That does break value semantics.