Unsafe bytes, 64-byte alignment, CCCrypt, TensorFlow, and deallocation

JetForMe · March 10, 2020, 12:25am

We’re using TensorFlow in our iOS app, which requires the weights file be 64-byte aligned. We also encrypt our weights file, so to decrypt it we have this mess (I didn’t write this, and I believe the outPtr is wrong, although it seems to work in practice).

	private func crypt(input: Data, operation: CCOperation) throws -> DecryptionResults?
	{
		var outLength = Int(0)
		let outData = Data(count: input.count + kCCBlockSizeAES128 + 64)		//	Allocate an extra 64 bytes to allow for 64-byte alignment of the resulting data.
		var outPtr: UnsafeMutableRawPointer? = nil

		outData.withUnsafeBytes
		{ (u8Ptr: UnsafePointer<UInt8>) in
			outPtr = UnsafeMutableRawPointer(mutating: u8Ptr)
		}
		
		// We want to pass 64 byte aligned data to Tensorflow, so we align the output buffer to that
		var offset = unsafeBitCast(outPtr, to: Int.self)
		offset = offset % 64 > 0 ? (64 - offset % 64) : 0

		var status: CCCryptorStatus = CCCryptorStatus(kCCSuccess)
		input.withUnsafeBytes { (encryptedBytes: UnsafePointer<UInt8>!) -> () in
			self.iv.withUnsafeBytes { (ivBytes: UnsafePointer<UInt8>!) in
				self.key.withUnsafeBytes { (keyBytes: UnsafePointer<UInt8>!) -> () in
					status = CCCrypt(operation,
							CCAlgorithm(kCCAlgorithmAES128),            // algorithm
							CCOptions(kCCOptionPKCS7Padding),           // options
							keyBytes,                                   // key
							self.key.count,                             // keylength
							ivBytes,                                    // iv
							encryptedBytes,                             // dataIn
							input.count,                                // dataInLength
							outPtr! + offset,                           // dataOut
							input.count + kCCBlockSizeAES128,           // dataOutAvailable
							&outLength)                                 // dataOutMoved
				}
			}
		}
		...
		return DecryptionResults(decryptionData: outData, offset: offset, outLength: outLength)
	}

I got here because while profiling memory use in Instruments, it seems the input Data isn't being deallocated in the scope in which it’s being allocated, and I'm wondering if something about this code here is hanging on to it.

It seems I also need to nest this yet one more scope to properly use outData.withUnsafeBytes.

So, a few questions:

Is there a way to get a Data allocated to a 64-byte boundary that's better than this? This current solution makes us pass an offset and “real” length around along with the decrypted data.
Does withUnsafeBytes increase the reference count on Data in some unexpected way?
As I’ve lamented several times in the past, working with C and “unsafe” data is very cumbersome in Swift. Is there a more elegant way to do this in Swift 5.1?

UPDATE: After a bit more investigating, it seems I should be able to do this:

	let outputRaw = UnsafeMutableRawPointer.allocate(byteCount: input.count + kCCBlockSizeAES128, alignment: 64)
	let outputData = Data(bytesNoCopy: outputRaw, count: input.count + kCCBlockSizeAES128, deallocator: .custom({ inPtr, inSize in inPtr.deallocate() }))

Does this seem reasonable? I haven't actually tried it yet.

As always, thank you.
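The UPDATE approach can be sketched as a small function. This is a minimal sketch that assumes the buffer is filled before it is wrapped and is never mutated afterwards. `alignedZeroedData` is a hypothetical name, and the zero-fill stands in for the decryption step.

```swift
import Foundation

/// Allocates `byteCount` bytes on an `alignment` boundary and wraps them in a Data
/// without copying. The custom deallocator frees the buffer once the last Data
/// referencing it is released.
func alignedZeroedData(byteCount: Int, alignment: Int = 64) -> Data {
    let raw = UnsafeMutableRawPointer.allocate(byteCount: byteCount, alignment: alignment)
    // Initialize before exposing the bytes; a real caller would decrypt into `raw` here.
    raw.initializeMemory(as: UInt8.self, repeating: 0, count: byteCount)
    return Data(bytesNoCopy: raw, count: byteCount,
                deallocator: .custom({ pointer, _ in pointer.deallocate() }))
}

let weights = alignedZeroedData(byteCount: 4096)   // keep it a `let`: no mutation, no copy
```

Two points raised later in this thread still apply:

- Pass the number of bytes actually produced (CCCrypt's dataOutMoved) as `count`, not the allocation size.
- Keep the result in a `let`, so that copy-on-write never moves the bytes into an unaligned copy.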

Lantua · March 10, 2020, 2:28am

At least one gotcha would be this warning.

If the result is mutated and is not a unique reference, then the Data will still follow copy-on-write semantics. In this case, the copy will use its own deallocator. Therefore, it is usually best to only use this initializer when you either enforce immutability with let or ensure that no other references to the underlying data are formed.

You can just work with outputRaw directly until the end. If you don't need the alignment after returning, copy the bytes into an ordinary Data (for example with Data(bytes:count:)) and deallocate outputRaw.

Do not store outPtr from inside withUnsafeBytes and then use it afterwards, as the original code does. That directly conflicts with this warning:

The byte pointer argument should not be stored and used outside of the lifetime of the call to the closure.

Also, you shouldn't convert immutable memory to mutable memory unless you know for certain that the memory was originally mutable.
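For contrast with the original outPtr code, here is a sketch that keeps the over-allocate-and-offset scheme but follows these rules:

- The Data is a `var`.
- Its memory is obtained with withUnsafeMutableBytes.
- The pointer never leaves the closure.

`fillAligned` is a hypothetical helper, and the `write` closure stands in for the CCCrypt call. The returned offset is only meaningful as long as the Data's storage does not move later. Foundation does not document that as guaranteed, so allocating an aligned buffer directly remains the more robust option.

```swift
import Foundation

/// Finds the first `alignment`-aligned address inside `outData` and lets `write`
/// fill memory from there. The pointer never leaves the closure, and the memory is
/// genuinely mutable because it comes from withUnsafeMutableBytes on a `var`.
/// Returns the aligned offset and the number of bytes `write` produced.
func fillAligned(_ outData: inout Data, alignment: Int = 64,
                 write: (UnsafeMutableRawBufferPointer) -> Int) -> (offset: Int, length: Int) {
    return outData.withUnsafeMutableBytes { buffer -> (offset: Int, length: Int) in
        precondition(buffer.count >= alignment, "over-allocate by at least `alignment` bytes")
        let address = Int(bitPattern: buffer.baseAddress)
        let offset = (alignment - address % alignment) % alignment
        let aligned = UnsafeMutableRawBufferPointer(rebasing: buffer[offset...])
        return (offset: offset, length: write(aligned))
    }
}
```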

Most of the withUnsafe… functions are generic over their closure's result type, so you can likely do:

var status = input.withUnsafeBytes { encryptedBytes in
  iv.withUnsafeBytes { ivBytes in
    key.withUnsafeBytes { keyBytes in
      CCCrypt(operation, ...)
    }
  }
}
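Filled in, that pattern might look like the sketch below. It makes these assumptions:

- AES with PKCS#7 padding, as in the original code.
- `key` and `iv` are Data values.

`cryptAES` is a hypothetical name. The innermost closure returns both the status and dataOutMoved as a tuple, and that tuple flows out through every level, so no captured `var status` is needed. The output here is an ordinary, unaligned Data. The code is wrapped in `#if canImport(CommonCrypto)` because CommonCrypto exists only on Apple platforms.

```swift
import Foundation
#if canImport(CommonCrypto)
import CommonCrypto

/// AES-CBC with PKCS#7 padding, as in the original code. Each withUnsafe… closure
/// returns its result, so no `var` has to be captured from the enclosing scope.
func cryptAES(_ operation: CCOperation, input: Data, key: Data, iv: Data) -> Data? {
    var output = Data(count: input.count + kCCBlockSizeAES128)
    let (status, moved): (CCCryptorStatus, Int) = output.withUnsafeMutableBytes { outBytes in
        input.withUnsafeBytes { inBytes in
            iv.withUnsafeBytes { ivBytes in
                key.withUnsafeBytes { keyBytes -> (CCCryptorStatus, Int) in
                    var bytesMoved = 0
                    let result = CCCrypt(operation,
                                         CCAlgorithm(kCCAlgorithmAES),   // key size comes from keyBytes.count
                                         CCOptions(kCCOptionPKCS7Padding),
                                         keyBytes.baseAddress, keyBytes.count,
                                         ivBytes.baseAddress,
                                         inBytes.baseAddress, inBytes.count,
                                         outBytes.baseAddress, outBytes.count,
                                         &bytesMoved)
                    return (result, bytesMoved)
                }
            }
        }
    }
    guard status == CCCryptorStatus(kCCSuccess) else { return nil }
    output.count = moved   // drop the unused padding slack
    return output
}
#endif
```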

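self is not needed since the closures aren't escaping. Also, the Data.withUnsafeBytes overload whose closure takes a typed UnsafePointer<UInt8> (the one used in the original code) is deprecated in Swift 5. Its replacement passes an UnsafeRawBufferPointer instead.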

PS

There's also this rule about bound/untyped memory that one needs to adhere to, though I don't know how it interacts with C imports.

JetForMe · March 10, 2020, 4:59am

I’m 99% certain the output data is immutable, but it needs to be 64-byte aligned. It is consumed by TensorFlow as a weights file. Can I decrypt to outputRaw and then build the Data from that, and then pass that along to C code later (it would be a pointer to the Data’s buffer)?

Yes, I know. I didn’t write this code originally, and will be fixing it. Part of my question dealt with the fact that Swift’s approach to this requires so much nesting of closures, one for each raw pointer one works with. I hoped there was a more linear way.

Getting rid of all the parameter typing would help, but I like using self. to make it clear where the symbols come from (in C++ I always prefix symbols with a single lowercase letter indicating their scope, like m; Swift makes that more awkward because property names really look better without that).

None of those manipulations would cause the input data to be retained outside this scope, would they?

Lantua · March 10, 2020, 5:44am

Yup,

Allocate memory to outputRaw.
Decrypt to outputRaw.
Use Data(bytesNoCopy:count:deallocator:) to make an immutable Data.
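Those three steps can be packaged so that the raw pointer never escapes and the buffer is freed if decryption fails. This sketch assumes the decrypt step is passed in as a closure that writes into the buffer and returns the number of bytes it produced. In the app, that closure would be a CCCrypt call reporting dataOutMoved. `makeAlignedData(capacity:alignment:fill:)` is a hypothetical helper, not a Foundation API.

```swift
import Foundation

/// 1. Allocate an aligned buffer, 2. let `fill` write into it (e.g. decrypt),
/// 3. wrap the written bytes in an immutable Data that owns and later frees the buffer.
func makeAlignedData(capacity: Int, alignment: Int = 64,
                     fill: (UnsafeMutableRawBufferPointer) throws -> Int) rethrows -> Data {
    let raw = UnsafeMutableRawPointer.allocate(byteCount: capacity, alignment: alignment)
    let written: Int
    do {
        written = try fill(UnsafeMutableRawBufferPointer(start: raw, count: capacity))
    } catch {
        raw.deallocate()   // don't leak the buffer if the fill (decrypt) step fails
        throw error
    }
    precondition((0...capacity).contains(written), "fill reported an impossible byte count")
    // Only the bytes actually produced become part of the Data; the slack stays unused.
    return Data(bytesNoCopy: raw, count: written,
                deallocator: .custom({ pointer, _ in pointer.deallocate() }))
}
```

Only `written` bytes become the Data's count. That removes the offset and "real" length bookkeeping from the original code, while the spare slack stays unused until the buffer is freed.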

Might as well make a function for it.

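import Foundation

func withUnsafeBytes<T: ContiguousBytes, U: ContiguousBytes, V: ContiguousBytes, R>(
  _ t: T, _ u: U, _ v: V,
  _ block: (UnsafeRawBufferPointer, UnsafeRawBufferPointer, UnsafeRawBufferPointer) throws -> R
) rethrows -> R {
  try t.withUnsafeBytes { tBytes in
    try u.withUnsafeBytes { uBytes in
      try v.withUnsafeBytes { vBytes in
        try block(tBytes, uBytes, vBytes)
      }
    }
  }
}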

I don’t see anything that would retain input. The only place input appears is the withUnsafeBytes call. That is just a function call, so even if it retains input, it releases it when the call returns.

JetForMe · March 10, 2020, 6:05am

Heh, sure, but that’s not quite as flexible as I was hoping for. What if there were four, five, six, seven, or more buffers? ;)

Strange. input is created a couple of frames up the stack via a call to Data(contentsOf:), and I would have expected that memory to be released after that scope exits. But according to the Xcode memory monitor (this is running on an iOS device), memory shoots up 380 MiB when the input is loaded (the size of the encrypted data), and doesn’t come down 380 MiB after that Data exits scope.

At least the data alignment works out nicely in this instance, which ironically it would not if there were a Swift version of CommonCrypto.

lukasa · March 10, 2020, 9:00am

Is there any reason not to use CryptoKit or Swift Crypto to make this problem go away? It would address some of the cryptographic concerns I have with the way you're encrypting your weights.

lukasa · March 10, 2020, 9:11am

Call posix_memalign. This is the correct allocation function to use for expressing arbitrary alignment requirements. The appropriate function to free this memory is free, so Data(bytesNoCopy:count:deallocator:) can safely be used with the returned pointer, passing .free as the deallocator.
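A minimal sketch of that suggestion, assuming posix_memalign and memset as re-exported by Foundation (from Darwin on Apple platforms). `posixAlignedData` is a hypothetical name, and the memset stands in for writing the decrypted bytes. POSIX requires the alignment to be a power of two and a multiple of sizeof(void *), which 64 satisfies.

```swift
import Foundation

/// Allocates `count` bytes aligned to `alignment` with posix_memalign and wraps them
/// in a Data whose deallocator is `.free`, matching the allocator that produced them.
func posixAlignedData(count: Int, alignment: Int = 64) -> Data? {
    var memory: UnsafeMutableRawPointer? = nil
    // posix_memalign returns 0 on success and an error number otherwise.
    guard posix_memalign(&memory, alignment, count) == 0, let buffer = memory else { return nil }
    memset(buffer, 0, count)   // initialize before handing the bytes to anyone
    return Data(bytesNoCopy: buffer, count: count, deallocator: .free)
}
```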

There is not, and cannot be. Swift must bound the lifetime of any pointer into a managed CoW data structure in order to correctly manage its memory. The only way Swift can do that is by scoping the lifetime, and for CoW data structures the only way to scope it is with a closure.

In general the good advice is given by @lantua: wrap the unsafe operations into small, safe ones. In this context the unsafe operation is CCCrypt, so you'd want to wrap that. This keeps the nesting in a single function, which is much less appalling than spreading it out through your code. It also puts all the unsafe code in one place, making auditing it much easier.

Of course, you can also just use CryptoKit, which has done the hard work for you. The only downside is that you can't easily achieve your alignment requirement, but that can be resolved by performing a copy if needed.
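A sketch of that copy-if-needed fallback, assuming the plaintext (for example from CryptoKit) already sits in an ordinary Data. `alignedCopy(of:)` is a hypothetical name. Until the source Data is released, peak memory is about twice the plaintext size.

```swift
import Foundation

/// Copies `source` into a new 64-byte-aligned buffer owned by the returned Data.
func alignedCopy(of source: Data, alignment: Int = 64) -> Data {
    let count = source.count
    guard count > 0 else { return Data() }
    let raw = UnsafeMutableRawPointer.allocate(byteCount: count, alignment: alignment)
    source.withUnsafeBytes { bytes in
        // baseAddress is non-nil because count > 0.
        raw.copyMemory(from: bytes.baseAddress!, byteCount: count)
    }
    return Data(bytesNoCopy: raw, count: count,
                deallocator: .custom({ pointer, _ in pointer.deallocate() }))
}
```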

Karl · March 10, 2020, 9:12am

Probably the OS version requirements. SwiftCrypto doesn’t officially support versions of iOS which predate CryptoKit (i.e. as a polyfill), last time I checked.

lukasa · March 10, 2020, 9:17am

If that's the concern, then @JetForMe please please please authenticate your ciphertext. Encryption without authentication is a very bad idea. HMAC is fine and provided by CommonCrypto.

JetForMe · March 10, 2020, 12:44pm

Alignment is critical, and the copy is rough. We're already running out of RAM on an iPhone 8, and the decrypt hangs on to the input data until it's finished. I suppose I could do that copy-for-alignment after. RAM exhaustion is the most critical aspect here, since it’s killing the app.

We must support iOS 12, and CryptoKit is only available in iOS 13. What are your concerns? Our encryption was looked at extensively by another team before being implemented.

That's exactly what this function does. I just hoped it didn't have to be so clunky.

It would just be interesting if there were a way to do it with a variable number of arguments instead of a fixed number. I guess the wrapper you created above could just be made in 2- to 10-parameter versions to cover most needs.
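Swift 5.1 has no variadic generics. When every buffer has the same type (here Data), recursion can stand in for a family of fixed-arity overloads. In this sketch, `withUnsafeBytesOfAll` and `nestUnsafeBytes` are hypothetical names. Each buffer adds one closure frame, and the pointers arrive as an array that you index rather than as named parameters.

```swift
import Foundation

/// Obtains raw pointers to any number of Data buffers by nesting one
/// withUnsafeBytes call per buffer, then calls `body` with all of them.
/// The pointers are only valid inside `body`.
func withUnsafeBytesOfAll<R>(_ buffers: [Data],
                             _ body: ([UnsafeRawBufferPointer]) throws -> R) rethrows -> R {
    try nestUnsafeBytes(buffers[...], acquired: [], body)
}

func nestUnsafeBytes<R>(_ remaining: ArraySlice<Data>, acquired: [UnsafeRawBufferPointer],
                        _ body: ([UnsafeRawBufferPointer]) throws -> R) rethrows -> R {
    guard let first = remaining.first else { return try body(acquired) }
    return try first.withUnsafeBytes { bytes in
        // Recurse while this buffer's pointer is still valid.
        try nestUnsafeBytes(remaining.dropFirst(), acquired: acquired + [bytes], body)
    }
}
```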

Lantua · March 10, 2020, 1:08pm

I’d go with the magic number 3, maybe also 4. Even with 10 parameters it still has only 3-4 layers. More parameters will just spread the clunkiness horizontally rather than vertically. That said, it can definitely be done.

lukasa · March 10, 2020, 1:39pm

Yeah, you should be able to do this in 2x memory. Sadly, I don't think it's possible to do it in 1x if you're going through Foundation or CryptoKit as they don't give you access to the appropriate in-place APIs.

See my earlier post in this thread (#9).

JetForMe · March 10, 2020, 2:22pm

Oh, I hadn't considered that CCCrypt could do it in place. I don't know if I can manage to read the encrypted data off disk directly into the UnsafeMutableRawPointer.
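One way to read the encrypted file straight into an aligned buffer is sketched below. It uses the C stdio functions that Foundation re-exports, so no intermediate Data ever holds the ciphertext. `readFileAligned(atPath:slack:alignment:)` is a hypothetical helper. The 16-byte default slack assumes the kCCBlockSizeAES128 headroom from the original code.

Decrypting in place afterwards means passing the same pointer as both dataIn and dataOut. That relies on CCCrypt supporting it, as discussed here; check the CommonCrypto header documentation before depending on it.

```swift
import Foundation

/// Reads a whole file directly into a newly allocated, `alignment`-aligned buffer,
/// with `slack` spare bytes after the contents. The caller owns the buffer and must
/// deallocate it, or hand it to Data(bytesNoCopy:count:deallocator:) with a matching
/// deallocator.
func readFileAligned(atPath path: String, slack: Int = 16, alignment: Int = 64)
    -> (buffer: UnsafeMutableRawPointer, count: Int)? {
    guard let file = fopen(path, "rb") else { return nil }
    defer { fclose(file) }
    guard fseek(file, 0, SEEK_END) == 0 else { return nil }
    let size = ftell(file)
    guard size >= 0, fseek(file, 0, SEEK_SET) == 0 else { return nil }
    let buffer = UnsafeMutableRawPointer.allocate(byteCount: size + slack, alignment: alignment)
    guard fread(buffer, 1, size, file) == size else {
        buffer.deallocate()   // short read: free the buffer and report failure
        return nil
    }
    return (buffer: buffer, count: size)
}
```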

Yeah, I don't understand what you're telling me to do here.

lukasa · March 10, 2020, 2:53pm

Ok. Quick primer.

Encryption is a transformation that takes sensitive data ("plaintext") and makes it statistically indistinguishable from random data ("ciphertext") in a way that is reversible for those in possession of a small secret. This allows encryption to provide secrecy: if you encrypt data, it is not supposed to be possible for anyone with access to the ciphertext to determine what was inside it.

However, encryption does not validate data integrity. This means that attackers can change your ciphertext and this will not be detectable by the encryption scheme. After all, to be indistinguishable from random data it must be possible for any cipher combined with any key to potentially produce all possible ciphertext blocks: therefore, all possible ciphertext blocks must be validly decryptable.

For AES-CBC (what you're using here), the malleability of the ciphertext is well understood. An attacker that flips a bit in the ciphertext will deterministically flip the corresponding bit in the plaintext in the next block, as well as lead to gibberish in the block whose bit was flipped.
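In equation form, using standard CBC notation:

- $D_K$ is block decryption under key $K$.
- $C_0$ is the IV.
- $e_j$ is a block with only bit $j$ set.

```latex
\begin{aligned}
P_i &= D_K(C_i) \oplus C_{i-1}, \qquad C_0 = \mathrm{IV} \\
C_{i-1}' &= C_{i-1} \oplus e_j \\
P_i' &= D_K(C_i) \oplus C_{i-1}' = P_i \oplus e_j \\
P_{i-1}' &= D_K(C_{i-1} \oplus e_j) \oplus C_{i-2}
\end{aligned}
```

The third line is the deterministic bit flip in the next block. The last line is the gibberish block: decrypting a modified ciphertext block gives an unpredictable result without the key.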

This is an attack that can be used both to render data useless (by just wildly flipping bits) as well as to launch highly targeted attacks. For example, if the attacker is familiar with what some of your weights are likely to be, they can happily change those weights to anything else, at the cost of turning some other weights to gibberish.

As a first-order assumption, any data you cared enough about to want to encrypt should also be authenticated. This defends against the above attacks by ensuring that the data you're operating on is the same as the data you encrypted in the first place.

CryptoKit and SwiftCrypto only offer authenticated encryption modes (AES-GCM and ChaCha20-Poly1305). Unfortunately for you, CommonCrypto does not offer any, so you need to build one yourself.

The way to do this is to combine your encryption with a MAC (Message Authentication Code). A MAC is a function that takes two parameters, a shared secret key and the data to authenticate, and produces a public authentication "tag". The other end of the connection can then take a tag, the same shared secret, and the data to validate, and confirm that this data was used with this key to generate this tag.

The most common and easiest-to-use MAC is HMAC (Hash-based Message Authentication Code). This is simply a specific way of applying a hash function to the secret key and the data together. It's extremely fast, extremely well-understood, and well-trusted.
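For reference, here is the HMAC construction from RFC 2104. The symbols are:

- $H$ is the hash function.
- $K'$ is the key padded with zeros to the hash's block size (hashed first if it is longer than a block).
- $\Vert$ is concatenation.
- ipad and opad are the bytes 0x36 and 0x5c repeated to the block size.

```latex
\mathrm{HMAC}(K, m) = H\big((K' \oplus \mathrm{opad}) \,\Vert\, H\big((K' \oplus \mathrm{ipad}) \,\Vert\, m\big)\big)
```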

To build this crypto system properly you need to bear in mind the Cryptographic Doom Principle: when combining MACing and encryption there is only one right way to do it. Specifically, on the encryption side you must do this:

Generate your shared keys. Different ones for encryption and MAC please, though you can derive these keys from the same shared secret using a key derivation function.
Encrypt the plaintext data P using the encryption key K_e to produce ciphertext C.
Generate the MAC tag M by applying HMAC to the ciphertext C with MAC key K_M.
Serialize the result by concatenating C and M to form your result data R.
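Here is a sketch of these four steps with CommonCrypto. It makes these assumptions:

- The two keys from step 1 have already been derived (derivation is not shown).
- Encryption is AES-CBC with PKCS#7 padding, as in the original code.
- The MAC is HMAC-SHA256.

The IV is generated randomly per message with SecRandomCopyBytes and treated as the first block of C, so the tag covers it. `sealEncryptThenMAC` is a hypothetical name, and this is an illustration, not audited code.

```swift
import Foundation
#if canImport(CommonCrypto) && canImport(Security)
import CommonCrypto
import Security

/// Encrypt-then-MAC: AES-CBC with a random IV, then HMAC-SHA256 over IV || ciphertext.
/// Output layout: IV || ciphertext || tag, where the tag is always 32 bytes.
func sealEncryptThenMAC(_ plaintext: Data, encryptionKey: Data, macKey: Data) -> Data? {
    // Fresh random IV for every message; it is treated as part of C and covered by the MAC.
    var iv = Data(count: kCCBlockSizeAES128)
    let rngStatus = iv.withUnsafeMutableBytes {
        SecRandomCopyBytes(kSecRandomDefault, $0.count, $0.baseAddress!)
    }
    guard rngStatus == errSecSuccess else { return nil }

    // Step 2: encrypt P with K_e.
    var ciphertext = Data(count: plaintext.count + kCCBlockSizeAES128)
    var moved = 0
    let status = ciphertext.withUnsafeMutableBytes { out in
        plaintext.withUnsafeBytes { input in
            iv.withUnsafeBytes { ivBytes in
                encryptionKey.withUnsafeBytes { key in
                    CCCrypt(CCOperation(kCCEncrypt), CCAlgorithm(kCCAlgorithmAES),
                            CCOptions(kCCOptionPKCS7Padding),
                            key.baseAddress, key.count, ivBytes.baseAddress,
                            input.baseAddress, input.count,
                            out.baseAddress, out.count, &moved)
                }
            }
        }
    }
    guard status == CCCryptorStatus(kCCSuccess) else { return nil }
    ciphertext.count = moved

    // Step 3: MAC the ciphertext (including its IV) with K_M.
    let authenticated = iv + ciphertext
    var tag = Data(count: Int(CC_SHA256_DIGEST_LENGTH))
    tag.withUnsafeMutableBytes { tagBytes in
        authenticated.withUnsafeBytes { message in
            macKey.withUnsafeBytes { key in
                CCHmac(CCHmacAlgorithm(kCCHmacAlgSHA256), key.baseAddress, key.count,
                       message.baseAddress, message.count, tagBytes.baseAddress)
            }
        }
    }
    // Step 4: R = C || M.
    return authenticated + tag
}
#endif
```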

Then when you decrypt, you do this:

Receive the result data R'.
Trim off the tag from the end of R', leaving you with ciphertext C' and tag M'. You will know how big the tag should be (tags are fixed width), so you should just unconditionally use that many bytes. Do not allow your data R' to somehow encode the length of the tag M'; this is a violation of the doom principle.
Verify the tag M' is valid for the ciphertext C' using the pre-arranged MAC key K_M. If the tag fails to validate, go no further: the ciphertext cannot be trusted. Report a generic error.
Only now may you decrypt the ciphertext C' using the pre-arranged encryption key K_e.
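Steps 2 and 3 need no CommonCrypto-specific code, so they can be sketched portably. `verifiedCiphertext` and `constantTimeEqual` are hypothetical names. In practice, `computeTag` is assumed to be HMAC-SHA256 with K_M, which produces 32-byte tags. The comparison does not exit early on the first mismatch. This is a common precaution so that timing does not reveal how much of a forged tag was correct.

```swift
import Foundation

/// Splits R' into (C', M') using a fixed tag length and checks M' against the tag
/// recomputed over C'. Returns C' only if the tags match; decrypt only after that.
func verifiedCiphertext(_ received: Data, tagLength: Int,
                        computeTag: (Data) -> Data) -> Data? {
    // The tag length is a constant of the scheme, never read from the data itself.
    guard received.count >= tagLength else { return nil }
    let ciphertext = Data(received.prefix(received.count - tagLength))
    let receivedTag = Data(received.suffix(tagLength))
    guard constantTimeEqual(computeTag(ciphertext), receivedTag) else { return nil }
    return ciphertext
}

/// Compares two byte strings without stopping at the first differing byte.
func constantTimeEqual(_ a: Data, _ b: Data) -> Bool {
    guard a.count == b.count else { return false }
    var difference: UInt8 = 0
    for (x, y) in zip(a, b) { difference |= x ^ y }
    return difference == 0
}
```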

I appreciate that this is a pain in the neck: this is why high level libraries like CryptoKit exist. But I stress that you really must do this: there are to a first-order approximation no threat models in which it is sensible to encrypt data that you don't also authenticate.

JetForMe · March 11, 2020, 5:10am

Ah, okay. I appreciate the thorough and well-formatted post, but that's way outside the scope of my question. The purpose of the encryption has less to do with authenticating the contents than with preventing others from getting them. We go through a number of authentication and key exchange steps long before we get to the decryption phase that I'm dealing with here.

ahti · March 11, 2020, 8:08am

Fwiw, and you might well be aware of this already: if the decryption happens on the device, there is basically no way to prevent an attacker from doing the following (at least none that I know of; I'd be happy to be proven wrong): running the app in a debugger on a jailbroken phone, setting an appropriate breakpoint, and dumping the decrypted weights from memory. So you can't really prevent others from getting it, just make it slightly more involved.

JetForMe · March 11, 2020, 8:09am

Yeah, we make an effort to detect a jailbroken phone and use Apple’s device authentication APIs, but honestly all that stuff is in there to make the execs happy.

Karl · March 11, 2020, 11:14am

If the attackers are able to do all of that, they’re more than capable of stepping through and skipping whatever checks you had in place.

But yeah, in business, sometimes you have to make a token effort, even if you know it can’t actually stand up to any scrutiny.