Hi, I've been instructed by the latest Xcode (10.2) to migrate some code that used "withUnsafeMutableBytes" and "withUnsafeBytes" on the Data type, with this message: "use withUnsafeMutableBytes<R>(_: (UnsafeMutableRawBufferPointer) throws -> R) rethrows -> R".
After some digging on the net, I started using a mix of UnsafeMutablePointer.allocate(capacity:) and "myData.withUnsafeBytes { (_ ptr: UnsafeRawBufferPointer) -> Void ... ", which now compiles without warnings.
However, I tried to look at the definitions of those variants in the Data documentation (Apple Developer Documentation) to make sure I wasn't doing incorrect things with the memory, and found that pretty much every "withUnsafe" function is deprecated (yet Xcode no longer gives any warning), and not a single line explains what is now the officially recommended way of accessing the data (except for the subscript operators, which don't suit my case).
Autocompletion also suggested "withContiguousStorageIfAvailable", which seems to indicate that there could be issues regarding the contiguity of the underlying storage? But there too, I couldn't find anything on that subject in the documentation (is this a Sequence-only thing that's not really relevant in the case of Data?).
I must admit I really don't know where to look, nor which methods are now supposed to be the correct way of accessing a Data bytes buffer (for context, the initial goal of the function was to run the CommonCrypto CC_MD5 function over the data buffer).
Data conforms to Foundation’s ContiguousBytes protocol, which defines a single function you can use to get the underlying bytes. That’s the function you should call.
You may find it helpful to explicitly declare the type of your closure's argument as UnsafeMutableRawBufferPointer so that the compiler selects the correct function.
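As a minimal sketch of that suggestion (the sample bytes are made up for illustration), annotating the closure parameter and return type leaves the compiler only the raw-buffer overload to pick:

```swift
import Foundation

// Illustrative sample bytes; any mutable Data value works the same way.
var data = Data([1, 2, 3, 4])

// The explicit parameter type steers overload resolution to the
// non-deprecated, raw-buffer variant of withUnsafeMutableBytes.
data.withUnsafeMutableBytes { (buffer: UnsafeMutableRawBufferPointer) -> Void in
    // A raw buffer is a collection of UInt8, so it can be indexed directly.
    for index in buffer.indices {
        buffer[index] = 0xFF
    }
}
```

The same annotation works for the read-only variant, with UnsafeRawBufferPointer as the parameter type.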
Thanks for the info. I tried to look at the documentation for this method, and there's simply nothing. What's the return type for? Since we're talking about "bytes", why is the closure parameter a raw pointer instead of a Something<UInt8> (which would be convenient in my case, since CC_MD5 uses an UnsafeMutablePointer<UInt8> for the destination buffer)? Would it work fine with the "memory rebound" APIs in case I need to have it typed?
Trying to do "data.withUnsafeMutableBytes { (_ ptr: UnsafeMutableRawBufferPointer<UInt8>) -> Void in .. " results in a "Cannot specialize non generic type UnsafeMutableRawPointer"...
I had a look at the "Manual Memory Management" chapter of the documentation, which explains the "unsafe" types very well, but I'm starting to get the feeling there are still gaps in how those types are integrated in the stdlib (or at least in the stdlib documentation).
The memory access functions are an extremely sensitive part of the API that most average developers (like myself) don't use on a daily basis, so I was a bit surprised by the lack of documentation, especially if Xcode starts throwing new warnings...
Do you know if there is any effort to get the official Swift documentation into the hands of the community? I would gladly contribute.
The change here primarily had to do with the possibility of creating Data with Data.init(bytesNoCopy:count:deallocator:), which allows someone to create a Data instance wrapping an already-existing buffer.
When someone does this, they can pass in a raw pointer to any buffer which they have created, which may or may not have been initialized with various types of data; specifically, the passed pointer could be bound to Typed Memory where the bound type is non-trivial.
Previously, Data presented an interface which returned an UnsafeBufferPointer<UInt8>, and did this by rebinding the memory on your behalf: this could implicitly trigger undefined behavior if the original buffer was one you didn't own, and you had no control over how it was allocated and initialized.
The change here keeps underlying Data access entirely untyped via Raw pointers. With a Raw pointer, you can read the bytes directly (via load(fromByteOffset:as:)/copyMemory(from:)), without running the risk of implicit undefined behavior. If you did have control over how the buffer was initialized (specifically, you know the original buffer was either untyped, or bound to a trivial type like UInt8), then it is also safe to rebind the raw buffer to the type you want with bindMemory(to:).
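The three access styles mentioned above could look roughly like this. It is only a sketch: the sample bytes and variable names are invented, and the bindMemory(to:) step assumes, as the paragraph requires, that the Data was created from plain bytes that we control.

```swift
import Foundation

// Illustrative sample; this Data is built from plain bytes that we own.
let data = Data([0x01, 0x02, 0x03, 0x04])

// 1. Read a single byte straight out of the raw buffer, with no binding.
let second = data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> UInt8 in
    raw.load(fromByteOffset: 1, as: UInt8.self)
}

// 2. Copy the raw bytes into storage we own.
var copy = [UInt8](repeating: 0, count: data.count)
copy.withUnsafeMutableBytes { (destination: UnsafeMutableRawBufferPointer) -> Void in
    data.withUnsafeBytes { (source: UnsafeRawBufferPointer) -> Void in
        destination.copyMemory(from: source)
    }
}

// 3. Bind to UInt8, which is fine here because we know how the bytes
//    were initialized (assumption stated above).
let sum = data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> Int in
    raw.bindMemory(to: UInt8.self).reduce(0) { $0 + Int($1) }
}
```

Loading a wider type such as UInt32 with load(fromByteOffset:as:) also requires the offset to be suitably aligned, which is why the sketch sticks to UInt8.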
Indeed, raw buffers differ from typed buffers, but there are various ways of reading directly out of a raw buffer, and hopefully the specific documentation on UnsafeRawBufferPointer and continued reading of the Manual Memory Management guide can help. (Also happy to answer specific questions to help guide you!)
Unfortunately, the documentation on developer.apple.com is not part of the open-source effort, but please do file a Radar for any unclear/missing documentation you find — we really do want the documentation on this to be clear, understandable, and easy to find.
func md5DigestA(of data: Data) -> Data {
    precondition(!data.isEmpty)
    var result = [UInt8](repeating: 0, count: Int(CC_MD5_DIGEST_LENGTH))
    data.withUnsafeBytes { buffer in
        _ = CC_MD5(buffer.baseAddress!, CC_LONG(buffer.count), &result)
    }
    return Data(result)
}
If you want to handle the empty data case [1], remove the precondition check on line 2 and the force unwrap on line 5. This works because CC_MD5 will handle a NULL parameter if the count is 0.
If you want to handle the empty data case and you’re dealing with a C function that doesn’t allow a NULL pointer when the count is 0, things get more complex (-:
Share and Enjoy
Quinn “The Eskimo!” @ DTS @ Apple
[1] A question that deserves serious consideration, at least in a crypto context.
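For the more complex case mentioned above (a C function that rejects NULL even when the count is 0), one possible workaround is sketched below. The helper name withNonNullBytes is hypothetical, not a Foundation API; it simply lends the callee a placeholder byte to point at while still reporting a count of 0.

```swift
import Foundation

/// Hypothetical helper: calls `body` with a non-nil pointer even if `data` is empty.
func withNonNullBytes<R>(
    of data: Data,
    _ body: (UnsafeRawPointer, Int) throws -> R
) rethrows -> R {
    try data.withUnsafeBytes { (buffer: UnsafeRawBufferPointer) -> R in
        if let base = buffer.baseAddress {
            return try body(base, buffer.count)
        }
        // No base address (empty data): point at a placeholder byte,
        // but tell the callee there are 0 bytes to read.
        var placeholder: UInt8 = 0
        return try withUnsafeBytes(of: &placeholder) { try body($0.baseAddress!, 0) }
    }
}
```

The pointer is only valid for the duration of the closure, the same rule that applies to withUnsafeBytes itself.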
let nbBytes = Int(CC_MD5_DIGEST_LENGTH)
let digestBytes = UnsafeMutablePointer<UInt8>.allocate(capacity: nbBytes)
defer { digestBytes.deallocate() }
data.withUnsafeBytes { ptr in
    guard let baseAddress = ptr.baseAddress else { return }
    CC_MD5(baseAddress, CC_LONG(ptr.count), digestBytes)
}
return Data(bytes: digestBytes, count: nbBytes)
I wrote it this way because I wasn't sure about memory contiguity for any of the Foundation/stdlib structures (be they arrays or Data), so using an explicitly allocated UnsafeMutablePointer seemed the safest bet.
With the Swift 5 changes described above, Data is guaranteed to be contiguous, so allocating a separate copy should not be necessary.
Data(result) here creates a copy, but it is possible to avoid this by creating a Data instead of an array (with the right count) and writing into its buffer directly:
import Foundation
import CommonCrypto

func digest(_ data: Data) -> Data {
    var md5 = Data(count: Int(CC_MD5_DIGEST_LENGTH))
    md5.withUnsafeMutableBytes { md5Buffer in
        data.withUnsafeBytes { buffer in
            let _ = CC_MD5(buffer.baseAddress!, CC_LONG(buffer.count), md5Buffer.bindMemory(to: UInt8.self).baseAddress)
        }
    }
    return md5
}
Thanks, this is helpful. I wanted to try a different approach, but it is not working (getting wrong result) and I can't figure out why. Do you mind having a look?
func digest2(_ data: Data) -> Data {
    let size = Int(CC_MD5_DIGEST_LENGTH)
    let md = UnsafeMutablePointer<UInt8>.allocate(capacity: size)
    data.withUnsafeBytes {
        CC_MD5($0.baseAddress!, UInt32(size), md)
    }
    return Data(bytesNoCopy: md, count: size, deallocator: .free)
}
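One likely cause of the wrong result in digest2: the second argument of CC_MD5 is the length of the input, but the code passes the digest size, so at most the first 16 bytes get hashed. A possible corrected variant is sketched below; the name digest3 is made up for illustration, and the buffer comes from malloc so that it pairs with the .free deallocator.

```swift
import Foundation
import CommonCrypto

// Hypothetical corrected variant of digest2 (Apple platforms only).
func digest3(_ data: Data) -> Data {
    let size = Int(CC_MD5_DIGEST_LENGTH)
    // malloc pairs correctly with the .free deallocator used below.
    guard let raw = malloc(size) else { fatalError("malloc failed") }
    let md = raw.bindMemory(to: UInt8.self, capacity: size)
    data.withUnsafeBytes { (buffer: UnsafeRawBufferPointer) -> Void in
        // Pass the input length here, not the digest length.
        _ = CC_MD5(buffer.baseAddress, CC_LONG(buffer.count), md)
    }
    return Data(bytesNoCopy: raw, count: size, deallocator: .free)
}
```

For inputs of exactly 16 bytes digest2 happens to produce the right answer, which can make the bug easy to miss in a quick test.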
Data(result) is using the memory allocated by result or is it allocating again?
It’s allocating again. Should you be concerned about that? Only if you’re calling this a lot. Unless this code is very hot, that extra allocation just won’t matter.
Moreover, attempting to remove it can cause you grief. For example, the code you posted downthread has these lines:
let md = UnsafeMutablePointer<UInt8>.allocate(capacity: size)
…
return Data(bytesNoCopy: md, count: size, deallocator: .free)
which is not valid. Memory that you allocate with allocate(capacity:) must be freed by deallocate, but .free causes it to be freed by free. This happens to work on Apple platforms, but is not guaranteed by the API. For more details, see UnsafeMutablePointer allocation compatibility with C malloc/free.
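If you do want a no-copy Data backed by allocate(capacity:), one way to keep allocation and deallocation matched is a custom deallocator. This is a sketch only, and the function name makeZeroedData is hypothetical:

```swift
import Foundation

// Hypothetical helper: wraps a Swift-allocated buffer in a Data without copying.
func makeZeroedData(count: Int) -> Data {
    let bytes = UnsafeMutablePointer<UInt8>.allocate(capacity: count)
    // Initialize the memory before anything reads it.
    bytes.initialize(repeating: 0, count: count)
    return Data(bytesNoCopy: bytes, count: count, deallocator: .custom { pointer, _ in
        // Release with deallocate(), matching allocate(capacity:) above.
        pointer.assumingMemoryBound(to: UInt8.self).deallocate()
    })
}
```

The other consistent pairing is to allocate with malloc and keep the .free deallocator.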
It's a serious usability bug that Data and UnsafeRawPointer do not interoperate with C functions that take char * byte buffers. App programmers should never need to use any of the memory binding APIs just to call libraries.
I'm struggling with this whole memory management stuff, probably because I've never come across a good introduction to it. So I've basically relied on imitating other people's code. If someone can point me in the right direction for a good introduction (I've been programming for nearly 40 years in several languages, but seriously in Swift for only a few months), I'd be grateful.
Anyway, my issue comes up with the same recommendation from Xcode 10.2. I have a function to do a simple test of a block of data to confirm that it is likely to actually be icns data:
I'm struggling with this whole memory management stuff …
That’s understandable. Swift makes this challenging because:
The API details have changed quite a lot over the years.
Recent versions have strict rules about aliasing (in this sense of the word). These will yield long-term benefits, but they do take some getting used to.
With regard to your specific issue, I’m a big fan of moving up a level of abstraction. In your case, I’d rethink this as a parsing problem rather than a structure access problem. The fact that Data exposes its contents as a collection of bytes means you can take advantage of lots of functionality that’s available on collections. For example:
One thing to note about this code is that, on a 32-bit machine, it avoids the trap you might encounter converting the length bytes of a maliciously crafted icns to Int.
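The original example is not reproduced here. As a rough sketch of the collection-based idea (not the original code), the check below assumes the icns header layout of a 4-byte "icns" magic followed by a 4-byte big-endian total length; the function name looksLikeICNS is hypothetical.

```swift
import Foundation

/// Hypothetical helper: returns true if `data` plausibly holds icns content.
func looksLikeICNS(_ data: Data) -> Bool {
    // Assumed header: 4 magic bytes "icns", then a 4-byte big-endian length.
    guard data.count >= 8 else { return false }
    guard data.prefix(4).elementsEqual("icns".utf8) else { return false }
    // Fold the length bytes into a UInt32, never into an Int.
    let length = data.dropFirst(4).prefix(4).reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
    // Compare as UInt64 so neither side can overflow, even where Int is 32 bits.
    return UInt64(length) == UInt64(data.count)
}
```

Because everything goes through collection operations, no unsafe pointer API is needed, and the length never passes through a conversion that could trap.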