withUnsafeBytes Data API confusion

This is a case where "avoiding force unwrapping" has gotten you into a trap: you are no longer producing an MD5 hash for empty Data instances without storage.


Sorry for my ignorance, but what's the point of generating the MD5 of empty data?

If I'm committing files into a git repo, I need to know the hash of each one so that they get stored (and deduplicated) properly. That should work even if I try to check in an empty file!

(Git uses SHA-1 for now, not MD5, but you get the idea.)

Well, the code that I wrote above was not meant to handle this case. I am not sure how to modify it to accept empty data; I just modified the other examples above to make them "safe".

CC_MD5 takes an optional pointer; you should pass baseAddress even if it is nil. If it didn't, though, it's important to not silently produce a wrong answer:

guard let baseAddress: UnsafeRawPointer = buffer.baseAddress else {
  preconditionFailure("data must be non-empty")
}

Alternately, you could be maximally correct and still call this hypothetical CC_MD5_with_non_optional_param by providing your own dummy address. This ought to be safe because you pass 0 for the count.

guard let baseAddress: UnsafeRawPointer = buffer.baseAddress else {
  assert(buffer.count == 0)
  var dummy = 0
  _ = CC_MD5_with_non_optional_param(&dummy, CC_LONG(buffer.count), md5Buffer.bindMemory(to: UInt8.self).baseAddress)
  return
}
_ = CC_MD5_with_non_optional_param(baseAddress, CC_LONG(buffer.count), md5Buffer.bindMemory(to: UInt8.self).baseAddress)
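
For completeness, here is a self-contained sketch of the dummy-address idea. It uses the real CC_MD5 as a stand-in for the hypothetical non-optional variant, and the function name md5Base64(of:) is made up for illustration. Treat it as one way to write this, not the definitive one.

```swift
import CommonCrypto
import Foundation

// Hypothetical helper: MD5 of `data` as a base64 string.
// The pointer passed to CC_MD5 is never nil, even for empty data.
func md5Base64(of data: Data) -> String {
  var digest = [UInt8](repeating: 0, count: Int(CC_MD5_DIGEST_LENGTH))
  data.withUnsafeBytes { (buffer: UnsafeRawBufferPointer) in
    if let baseAddress = buffer.baseAddress {
      _ = CC_MD5(baseAddress, CC_LONG(buffer.count), &digest)
    } else {
      // No storage at all: hand over a dummy address with a count of 0,
      // so nothing is ever read through it.
      var dummy: UInt8 = 0
      _ = CC_MD5(&dummy, 0, &digest)
    }
  }
  return Data(digest).base64EncodedString()
}
```

Both branches hash the same bytes (none, in the empty case), so the result does not depend on whether the buffer happened to have storage.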

Why not simply do this? I didn't know that CC_MD5 accepted nil values. I tested it with an empty Data object and it returns 1B2M2Y8AsgTpgAmY7PhCfg==.

import CommonCrypto
import Foundation

func buildMD5(data: Data) -> String {
  var md5: Data = Data(count: Int(CC_MD5_DIGEST_LENGTH))
  md5.withUnsafeMutableBytes { (md5Buffer: UnsafeMutableRawBufferPointer) in
    data.withUnsafeBytes { (buffer: UnsafeRawBufferPointer) in
      _ = CC_MD5(buffer.baseAddress, CC_LONG(buffer.count), md5Buffer.bindMemory(to: UInt8.self).baseAddress)
    }
  }
  return md5.base64EncodedString()
}

Yep, that's the best answer! I was trying to include the recommended alternatives if it wasn't supported, but I should have included the "happy path" first, my bad.

How do you make an empty Data instance without storage?

It actually doesn't accept nil values, and neither does the newer CC_SHA256. The function doesn't crash, but it doesn't do anything useful either. You have to pass a non-nil pointer for it to perform the calculation, even if the count is 0. So use a variation of the method above with a dummy variable, or a force unwrap if you are not afraid of it.

Or just stop using Common Crypto for this stuff (-:

import CryptoKit

let md5 = CryptoKit.Insecure.MD5.hash(data: Data())
// -> MD5 digest: d41d8cd98f00b204e9800998ecf8427e

let sha256 = CryptoKit.SHA256.hash(data: Data())
// -> SHA256 digest: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
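
If you want the same base64 string that buildMD5 returned earlier in the thread, something like the following should work. This is only a sketch: the name buildMD5WithCryptoKit is made up, and it relies on the digest being a sequence of bytes that Data can be initialized from.

```swift
import CryptoKit
import Foundation

// Hypothetical CryptoKit counterpart of buildMD5(data:) from earlier in the thread.
func buildMD5WithCryptoKit(data: Data) -> String {
  let digest = Insecure.MD5.hash(data: data)
  // The digest is a sequence of UInt8, so it can be copied into a Data.
  return Data(digest).base64EncodedString()
}

print(buildMD5WithCryptoKit(data: Data()))
// 1B2M2Y8AsgTpgAmY7PhCfg== (the base64 form of d41d8cd98f00b204e9800998ecf8427e)
```

There are no unsafe pointers involved here, so the nil baseAddress question never comes up.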

In fact, there's no need to be afraid: always unwrap here, as Data's withUnsafeBytes can never give a buffer that has a nil baseAddress:

    let bp = UnsafeMutableRawBufferPointer(start: nil, count: 0)
    assert(bp.baseAddress == nil)
    let data = Data(bp)
    data.withUnsafeBytes { p in
        assert(p.baseAddress != nil)
        _ = p.baseAddress! // safe
    }

And indeed, do what Quinn says if you can tolerate @available(iOS 13.2, macOS 10.15, watchOS 6.1, tvOS 13.2, *).

If it’s not documented, it’s an implementation detail and may change in future releases. Apple Developer Documentation

