Is it safe to use a variable after modifying it via a rebound pointer?

I have some code which serialises a data type to a string. The maximum length of the string is 40 bytes, and for various reasons I would like to be able to serialise into a kind of fixed-size array. My serialisation code uses various utility functions which require an UnsafeMutableBufferPointer<UInt8>. So here's what I came up with:

var result: (UInt64, UInt64, UInt64, UInt64, UInt64) = (0, 0, 0, 0, 0)
    
let count = withUnsafeMutableBytes(of: &result) { rawStringBuffer -> Int in
  let stringBuffer = rawStringBuffer.bindMemory(to: UInt8.self)
  // Write contents into stringBuffer, which is a UMBP<UInt8>, and return the number of bytes written.
}

return (result, count)

My understanding is that .bindMemory is needed for this, as I'm reinterpreting result as a type of a different size, and that I should never use rawStringBuffer again after binding it.

But is it safe to use result after the closure has completed? I think it is, but I'm not sure.

[edit: see Andrew's post, I had forgotten a critical piece of info]


Accessing the UInt64 tuple result is undefined as long as its memory is bound to UInt8.

It should be:

  var result: (UInt64, UInt64, UInt64, UInt64, UInt64) = (0, 0, 0, 0, 0)
    
  let count = withUnsafeMutableBytes(of: &result) { rawStringBuffer -> Int in
      // Write contents into rawStringBuffer, which is a UMRBP.
  }

  return (result, count)
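
As a minimal sketch of the raw-pointer approach above: the helper name `serialise` and the sample input are made up for illustration, and I'm assuming the contents are just the UTF-8 bytes of a string. The point is only that every write goes through the raw buffer, so the tuple's memory is never bound to another type.

```swift
// Hypothetical helper, not from the original post: copies the UTF-8 bytes of
// `text` into a 40-byte tuple through a raw buffer, with no memory binding.
typealias FortyBytes = (UInt64, UInt64, UInt64, UInt64, UInt64)

func serialise(_ text: String) -> (FortyBytes, Int) {
  var result: FortyBytes = (0, 0, 0, 0, 0)

  let count = withUnsafeMutableBytes(of: &result) { rawStringBuffer -> Int in
    var written = 0
    for byte in text.utf8 {
      // Stop rather than run past the 40 bytes of storage.
      guard written < rawStringBuffer.count else { break }
      // Subscripting a raw buffer stores an untyped byte.
      rawStringBuffer[written] = byte
      written += 1
    }
    return written
  }

  // `result` was only ever accessed as raw bytes, so reading it here is fine.
  return (result, count)
}

let (storage, length) = serialise("::1")
```

Here `length` is the number of bytes written and `storage` carries them in its first bytes.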

Presumably you wanted a typed pointer so you could pass it to a C char * function? That requires a type system fix: [SR-10246] Support limited implicit pointer conversion when calling C functions.
Until then, you need to lie to the type system and create a CChar pointer that's never used in the Swift code:

strcpy(rawStringBuffer.baseAddress!.assumingMemoryBound(to: CChar.self), ...)
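
For context, here is roughly how that line could sit inside the closure. This is a sketch only: the source string and the use of `strlen` to get the length are my assumptions, and it relies on the C library being available through Foundation.

```swift
import Foundation

var storage: (UInt64, UInt64, UInt64, UInt64, UInt64) = (0, 0, 0, 0, 0)

let length = withUnsafeMutableBytes(of: &storage) { rawStringBuffer -> Int in
  // The CChar pointer exists only to satisfy the C signatures; it is never
  // dereferenced from Swift code.
  let cString = rawStringBuffer.baseAddress!.assumingMemoryBound(to: CChar.self)
  // "fe80::1" plus its NUL terminator fits comfortably in the 40 bytes.
  strcpy(cString, "fe80::1")
  return strlen(cString)
}
```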

You should also be able to use withMemoryRebound for this sort of thing if you have something against raw pointers, but it has silly limitations: SR-11082 - Support withMemoryRebound(to:) with mismatched sizes

Similarly, accessing a String value while its memory is bound to UInt8 can miscompile:


Ah I had forgotten about that fix, though I saw it go by. Thank you for the correction!

Ah, that's what I was afraid of - that binding the pointer inside withUnsafeBytes(of:) would poison the original value. I figured that since it's a "with..." method, I should be able to bind the pointer within the closure scope without affecting anything before/after the call. Guess I'll have to fix that.

What about if I don't use result again as a UInt64 tuple, but instead load the tuple out of the bound pointer's raw bytes? Are there any safety issues if I abandon the result local with its memory left bound to a different type?

typealias FortyBytes = (UInt64, UInt64, UInt64, UInt64, UInt64)

var result: FortyBytes = (0, 0, 0, 0, 0)    
return withUnsafeMutableBytes(of: &result) { rawStringBuffer -> (FortyBytes, Int) in
  let stringBuffer = rawStringBuffer.bindMemory(to: UInt8.self)
  // Write contents into stringBuffer, which is a UMBP<UInt8>; `count` is the number of bytes written.

  return (
    UnsafeRawBufferPointer(stringBuffer).load(fromByteOffset: 0, as: FortyBytes.self),
    count
  )
}

--

I'm serialising an IPv6 address in a kind of pure-Swift version of inet_ntop. IP addresses require a lot of these sorts of type-punning tricks.

The only reason I was using a UMBP<UInt8> was because a utility function to print an integer as hex was written to use them. It certainly could work with a raw pointer (and that's what I will do to fix the issue), but at the time, it felt more appropriate to use a typed pointer since the function is writing UTF-8 code units, which semantically are UInt8s rather than raw, untyped bytes.

Is there any guidance about when to use a raw pointer over a UInt8-typed pointer? My intuition is that a raw pointer should be used for binary blobs (e.g. a file read from disk, a packet received from the network), but once you impose some structure/meaning on the bytes, it's reasonable (and maybe preferable) to codify that structure via the type system.

That's right. Binding a local variable's memory to a different type poisons that local variable. You could bind memory back to the UInt64 tuple when you're done (that's all withMemoryRebound(to:) does), but it's cumbersome.
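
To show what "bind back when you're done" might look like, here is a rough sketch. The sample bytes are made up, and I'm assuming the typed writes all happen before the final rebind.

```swift
typealias FortyBytes = (UInt64, UInt64, UInt64, UInt64, UInt64)

var result: FortyBytes = (0, 0, 0, 0, 0)

let count = withUnsafeMutableBytes(of: &result) { rawStringBuffer -> Int in
  // Temporarily bind the storage to UInt8 so typed utilities can write to it.
  let stringBuffer = rawStringBuffer.bindMemory(to: UInt8.self)
  let text = Array("::1".utf8)  // illustrative contents only
  for (offset, byte) in text.enumerated() {
    stringBuffer[offset] = byte
  }

  // Restore the original binding before the closure returns, so that later
  // accesses to `result` see memory bound to the tuple type again.
  _ = rawStringBuffer.bindMemory(to: FortyBytes.self)
  return text.count
}
```

It works, but the extra step is easy to forget, which is why sticking to the raw buffer is the simpler fix.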

No problem there, technically, although it won't be obvious to someone reading the code why you did that.

Right, I understand that. I personally think UTF8 utilities should operate on raw pointers because the utility already assumes the encoding. It already establishes that contract with the caller, so why should it care where the memory came from? There's just no reason to force it to use a typed pointer. People do it thinking it gives you more type safety, then type safety is thrown out the window when you want to decode a blob of memory without copying it first. If you really want the API to be type safe, and avoid copying data, then you should pass the utility a UTF8 code unit view that is a thin wrapper around a raw pointer.
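
As an illustration of a UTF-8 utility that takes a raw pointer, here is a sketch of a hex printer. The name `writeHex`, its signature, and the choice to drop leading zeros are all assumptions on my part, not the utility from the original post.

```swift
// Hypothetical utility: writes `value` as lowercase hex ASCII digits (without
// leading zeros) into a raw buffer and returns the number of bytes written.
func writeHex(_ value: UInt16, into buffer: UnsafeMutableRawBufferPointer) -> Int {
  let digits = Array("0123456789abcdef".utf8)
  var started = false
  var written = 0
  for shift in stride(from: 12, through: 0, by: -4) {
    let nibble = Int((value >> shift) & 0xF)
    // Skip leading zeros, but always emit the final digit.
    if nibble == 0 && !started && shift != 0 { continue }
    started = true
    // Stop rather than write past the end of the buffer.
    guard written < buffer.count else { break }
    buffer[written] = digits[nibble]
    written += 1
  }
  return written
}

var storage: (UInt64, UInt64) = (0, 0)
let count = withUnsafeMutableBytes(of: &storage) { writeHex(0x0db8, into: $0) }
```

The caller never has to bind `storage` to UInt8; the utility's contract is simply "I write UTF-8 bytes here".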

Simply put, UnsafePointer<UInt8> is only for objects that have UInt8 type, not for things that happen to be encoded as bytes!

I think it's wrong to use a typed pointer for memory that was initialized as a different type, if you can avoid it. (I realize you can't avoid it in practice because Swift utilities are already written using typed pointers.)

Imposing structure on top of a memory buffer should be done one of two ways:

  • Loading the expected type from a raw pointer
  • Defining a typed "view" over a raw pointer, which is trivial to do
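
A sketch of both options, using big-endian 16-bit pieces of an address as the example. The type name `UInt16PiecesView` and the sample bytes are made up, and the `load` calls assume the underlying storage is at least 2-byte aligned (true here, since it is a UInt64 tuple).

```swift
// Hypothetical view type: presents raw bytes as big-endian 16-bit pieces
// without binding the underlying memory to UInt16.
struct UInt16PiecesView: RandomAccessCollection {
  let bytes: UnsafeRawBufferPointer

  var startIndex: Int { 0 }
  var endIndex: Int { bytes.count / MemoryLayout<UInt16>.size }

  subscript(position: Int) -> UInt16 {
    precondition(position >= 0 && position < endIndex, "index out of range")
    // Assumes `bytes` starts on a 2-byte boundary, so every offset is aligned.
    return UInt16(bigEndian: bytes.load(fromByteOffset: position * 2, as: UInt16.self))
  }
}

var address: (UInt64, UInt64) = (0, 0)
withUnsafeMutableBytes(of: &address) { raw -> Void in
  // Illustrative contents: the first two pieces of an address, in network order.
  raw[0] = 0x20
  raw[1] = 0x01
  raw[2] = 0x0d
  raw[3] = 0xb8
}

let (first, pieces) = withUnsafeBytes(of: address) { raw -> (UInt16, [UInt16]) in
  // Option 1: load the expected type directly from the raw bytes.
  let loaded = UInt16(bigEndian: raw.load(fromByteOffset: 0, as: UInt16.self))
  // Option 2: wrap the raw bytes in a typed view and use it like any collection.
  let view = UInt16PiecesView(bytes: raw)
  return (loaded, Array(view.prefix(2)))
}
```

Neither option changes what the memory is bound to, so `address` stays usable as a UInt64 tuple afterwards.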

I realize C programmers will have a really hard time with this because they're used to using pointers as views over memory buffers and pretending that strict aliasing isn't really a thing.
