withUnsafeBytes Data API confusion

This is a case where "avoiding force unwrapping" has gotten you into a trap: you are no longer producing an MD5 hash for empty Data instances without storage.

Sorry for my ignorance, but what's the point of generating the MD5 of empty data?

If I'm committing files into a git repo, I need to know the hash of each one so that they get stored (and deduplicated) properly. That should work even if I try to check in an empty file!

(Git uses SHA-1 for now, not MD5, but you get the idea.)

Well, the code that I wrote above was not meant to handle this case. I am not sure how to modify it to accept empty data; I just modified the other examples above to make them "safe".

CC_MD5 takes an optional pointer; you should pass baseAddress even if it is nil. If it didn't accept nil, though, it would be important not to silently produce a wrong answer:

guard let baseAddress: UnsafeRawPointer = buffer.baseAddress else {
  preconditionFailure("data must be non-empty")
}

Alternately, you could be maximally correct and still call this hypothetical CC_MD5_with_non_optional_param by providing your own dummy address. This ought to be safe because you pass 0 for the count.

guard let baseAddress: UnsafeRawPointer = buffer.baseAddress else {
  assert(buffer.count == 0)
  var dummy = 0
  _ = CC_MD5_with_non_optional_param(&dummy, CC_LONG(buffer.count), md5Buffer.bindMemory(to: UInt8.self).baseAddress)
  return
}
_ = CC_MD5_with_non_optional_param(baseAddress, CC_LONG(buffer.count), md5Buffer.bindMemory(to: UInt8.self).baseAddress)

Why not simply do this? I didn't know that CC_MD5 accepted nil values. I tested it with an empty Data object and it returns 1B2M2Y8AsgTpgAmY7PhCfg==.

import Foundation
import CommonCrypto

func buildMD5(data: Data) -> String {
  var md5: Data = Data(count: Int(CC_MD5_DIGEST_LENGTH))
  md5.withUnsafeMutableBytes { (md5Buffer: UnsafeMutableRawBufferPointer) in
    data.withUnsafeBytes { (buffer: UnsafeRawBufferPointer) in
      _ = CC_MD5(buffer.baseAddress, CC_LONG(buffer.count), md5Buffer.bindMemory(to: UInt8.self).baseAddress)
    }
  }
  return md5.base64EncodedString()
}

Yep, that's the best answer! I was trying to include the recommended alternatives if it wasn't supported, but I should have included the "happy path" first, my bad.

How do you make an empty Data instance without storage?

It actually doesn't accept nil values, and neither does the newer CC_SHA256. The function doesn't crash, but it doesn't do anything useful either. So you have to pass a non-null pointer for it to perform the calculation, even if the count is 0: use a variation of the method above with a dummy variable, or force unwrap if you are not afraid of it.
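
Whichever way CC_MD5 behaves with a nil input pointer, one way to avoid relying on it is to never pass nil. Here is a minimal sketch of that idea, assuming an Apple platform where CommonCrypto is available. The names md5Digest(of:) and placeholder are made up for illustration and do not come from the posts above.

```swift
import Foundation
import CommonCrypto

// Hypothetical helper: always hands CC_MD5 a non-null input pointer.
func md5Digest(of data: Data) -> [UInt8] {
    var digest = [UInt8](repeating: 0, count: Int(CC_MD5_DIGEST_LENGTH))
    data.withUnsafeBytes { (input: UnsafeRawBufferPointer) -> Void in
        if let base = input.baseAddress {
            // Normal case: hash the bytes Data exposes.
            _ = CC_MD5(base, CC_LONG(input.count), &digest)
        } else {
            // No base address: point at a placeholder byte instead.
            // The length passed is 0, so the byte itself is never hashed.
            var placeholder: UInt8 = 0
            _ = CC_MD5(&placeholder, 0, &digest)
        }
    }
    return digest
}

// Usage: Base64 form, comparable with the string quoted earlier in the thread.
let emptyMD5 = Data(md5Digest(of: Data())).base64EncodedString()
print(emptyMD5)
```

The branch costs almost nothing and keeps the call valid whether or not the input pointer is allowed to be nil.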

Or just stop using Common Crypto for this stuff (-:

import CryptoKit

let md5 = CryptoKit.Insecure.MD5.hash(data: Data())
print(md5)
// -> MD5 digest: d41d8cd98f00b204e9800998ecf8427e

let sha256 = CryptoKit.SHA256.hash(data: Data())
print(sha256)
// -> SHA256 digest: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple
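
To compare the CryptoKit result with the Base64 string that buildMD5 returns, the digest can be copied into a Data value, since a CryptoKit digest is a sequence of bytes. This is only a sketch: md5Base64 and md5Hex are hypothetical helper names, and only Insecure.MD5.hash(data:) is taken from the post above.

```swift
import Foundation
import CryptoKit

// Hypothetical helper: Base64 text of the MD5 digest.
func md5Base64(_ data: Data) -> String {
    let digest = Insecure.MD5.hash(data: data)
    // Digest values iterate as UInt8, so Data can be built from them directly.
    return Data(digest).base64EncodedString()
}

// Hypothetical helper: lowercase hex text of the MD5 digest.
func md5Hex(_ data: Data) -> String {
    return Insecure.MD5.hash(data: data)
        .map { String(format: "%02x", $0) }
        .joined()
}

print(md5Base64(Data()))
// -> 1B2M2Y8AsgTpgAmY7PhCfg==
print(md5Hex(Data()))
// -> d41d8cd98f00b204e9800998ecf8427e
```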

In fact, there is no need to be afraid: always unwrap here, as Data's withUnsafeBytes can never give a buffer that has a nil baseAddress:

    let bp = UnsafeMutableRawBufferPointer(start: nil, count: 0)
    assert(bp.baseAddress == nil)
    let data = Data(bp)
    data.withUnsafeBytes { p in
        assert(p.baseAddress != nil)
        _ = p.baseAddress! // safe
    }

And indeed, do what Quinn says if you can tolerate @available(iOS 13.2, macOS 10.15, watchOS 6.1, tvOS 13.2, *).
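
If older OS versions still need to be supported, the two approaches can be combined behind an availability check. What follows is a rough sketch, not a recommendation: portableMD5Hex(of:) is a hypothetical name, the availability versions are copied from the line above, and the fallback branch mirrors the buildMD5 approach from earlier in the thread, so it inherits whatever CC_MD5 does with a nil pointer.

```swift
import Foundation
import CommonCrypto
import CryptoKit

// Hypothetical helper: CryptoKit where available, CommonCrypto otherwise.
func portableMD5Hex(of data: Data) -> String {
    let bytes: [UInt8]
    if #available(iOS 13.2, macOS 10.15, watchOS 6.1, tvOS 13.2, *) {
        // CryptoKit path: no unsafe pointers involved.
        bytes = Array(Insecure.MD5.hash(data: data))
    } else {
        // CommonCrypto fallback, same shape as buildMD5 in the thread.
        var out = [UInt8](repeating: 0, count: Int(CC_MD5_DIGEST_LENGTH))
        data.withUnsafeBytes { (buffer: UnsafeRawBufferPointer) -> Void in
            _ = CC_MD5(buffer.baseAddress, CC_LONG(buffer.count), &out)
        }
        bytes = out
    }
    return bytes.map { String(format: "%02x", $0) }.joined()
}

print(portableMD5Hex(of: Data()))
```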

If it’s not documented, it’s an implementation detail and may change in future releases. Apple Developer Documentation