withUnsafeTypePunnedPointer

When working with C APIs, or even low-level memory manipulation in Swift, it's frequently necessary to pass a pointer with a different pointee type from the value being pointed to. "How the heck do I do this?" is a frequently-asked question on these forums. It seems like something we could supply easily in the standard library:

func withUnsafeTypePunnedPointer<T, U, R>(to: T, do body: (UnsafePointer<U>) -> R) -> R {
  return withUnsafePointer(to: to) {
    return $0.withMemoryRebound(to: U.self, capacity: MemoryLayout<T>.size / MemoryLayout<U>.size, body)
  }
}

For example:

var addr = sockaddr_in6()
// set up IPv6 address...
// ... then pass it to bind as a sockaddr
withUnsafeTypePunnedPointer(to: addr) {
  bind(sock, $0, socklen_t(MemoryLayout<sockaddr_in6>.size))
}

This seems like a nice addition. This exact pattern appears 4 times in SwiftNIO, each time with sockaddr structures.

Technically, isn't it a combination of withUnsafePointer() and bindMemory(to:)?

I have strongly mixed feelings about this proposal. I'll explain why, and how it fits with other possible solutions.

TL;DR: This is a bad API for Swift, and there are better alternatives that we definitely should do. OTOH, I also think we should provide the simplest possible interop with legacy (mistyped) C APIs, and don't think those APIs will go away. Ultimately, it just depends on whether dealing with those APIs is really a significant pain point. It really comes down to why people are asking how to do this, not just what they are asking to do.


withUnsafeTypePunnedPointer is basically a straightforward contraction of withUnsafePointer and withMemoryRebound. An obvious advantage is eliminating a nested closure when calling out to incorrectly typed C APIs. Seems harmless enough, right? Here are the less obvious, somewhat more philosophical, aspects:

  • withMemoryRebound was deliberately given a type argument to prevent type punning via inference. A major usability advantage of your proposed API is that the developer does not need to identify the type imported from C. In general, I think this is a terrible idea for a Swift API, but it does make sense for the express purpose of legacy interop.

  • Common programming tasks should not even introduce developers to the concept of "memory binding". That concept should be the purview of system programmers and library developers (e.g. writing a custom allocator). Wrapping withMemoryRebound in a type inferred helper is a pretty good solution.

  • The safety and correctness problems surrounding unsafe pointers and memory binding are a much bigger concern to me than the convenience of type punning. Let's be clear that this API does nothing to help with those issues. This is purely an additional convenience that is only useful when working with bad C APIs.

These issues were thoroughly discussed prior to Swift 3. The conclusion at the time was that we should not design APIs to encourage direct use of mistyped C APIs. Instead, we wanted to encourage the development of Swift libraries that modernize those C APIs. We wanted Swift developers to move away from them, not because we don't want Swift developers to call C, but because those APIs are inherently broken even in C land. This is paternalistic, but a fair stance considering we can always add the convenience later. So is now the time?

I see plenty of withUnsafePointer + withMemoryRebound compositions on GitHub. A few of them are calling socket bind. Is it really just this one C API causing the problem? How often do developers need to reach for this to use legacy C APIs? (Seriously, can anyone get an idea?) I've looked through several hundred GitHub examples, and most of them do nothing but immediately load the pointee. Developers should not be binding memory to do this!

  return withUnsafePointer(to: x) {
    $0.withMemoryRebound(to: T.self, capacity: 1) { $0.pointee }
  }

I repeat, no one should be doing this. It is completely unnecessary. They just need to write this:

  return withUnsafeBytes(of: &x) {
    $0.load(as: T.self)
  }
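
To make the load-based pattern concrete, here is a minimal self-contained sketch. The function name rawBits(of:) and the Float/UInt32 pairing are just illustrative choices; the point is that no memory binding API is involved.

```swift
// Reinterpret the bytes of a Float as a UInt32 without binding memory.
// `rawBits(of:)` is an illustrative name, not a standard library API.
func rawBits(of value: Float) -> UInt32 {
  var copy = value  // withUnsafeBytes(of:) is given an addressable variable
  return withUnsafeBytes(of: &copy) { rawBuffer in
    // A raw load reads the bytes as UInt32; the memory stays bound to Float.
    rawBuffer.load(as: UInt32.self)
  }
}

let bits = rawBits(of: 1.0)
print(String(bits, radix: 16))  // prints "3f800000"
```

The same shape works for any trivial type whose size fits within the source value.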

All that said, there does need to be more convenience for using raw pointers. (The Swift 3 proposal was deliberately stripped of all convenience because bike-shedding was putting the memory model itself at risk of not happening in time).

I have always been convinced that we need a typed unsafe pointer with raw pointer semantics. The only reason that UnsafeRawPointer doesn't have an element type is that it's analogous to void *. It's supposed to be the obvious choice for "I don't know what the type is at this point in the code".

But there are times when you want a view of typed elements over raw (untyped) memory. In fact, there are some serious issues with Foundation.Data's use of pointer types that could be addressed with new Swift APIs.

In particular, there should probably be a closure-based API for using type-punned pointers, just as you proposed. But it should look like this:

func withUnsafeTypePunnedPointer<T, U, R>(to value: inout T, do body: (UnsafeTypePunnedPointer<U>) -> R) -> R {
  // UnsafeTypePunnedPointer is a proposed type; it does not exist in the standard library.
  return body(UnsafeTypePunnedPointer<U>(Builtin.addressof(&value)))
}

This closure-based API would be convenient sometimes but is really unnecessary and insufficient. Developers should also be able to directly convert any Unsafe[Raw]Pointer into an UnsafeTypePunnedPointer and use it following the same lifetime rules as other unsafe pointers.

We would also want to provide this on Array/UnsafeBufferPointer, and I think that argues for a solution that makes sense for pure Swift code.

Note: This solution does not depend on rebinding memory! That makes it much safer. It's perfectly fine to simultaneously access an UnsafePointer and an UnsafeTypePunnedPointer into the same memory.
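
As a rough illustration of how such a type could avoid rebinding, here is one possible sketch. UnsafeTypePunnedPointer does not exist in the standard library; this version is an assumption that simply wraps an UnsafeRawPointer and performs a raw load on every access.

```swift
// Hypothetical type for illustration only; NOT part of the standard library.
struct UnsafeTypePunnedPointer<Element> {
  let base: UnsafeRawPointer

  init(_ base: UnsafeRawPointer) { self.base = base }

  // Every access is a raw load, so no memory is bound or rebound.
  // Element must be a trivial type and `base` suitably aligned for it.
  subscript(index: Int) -> Element {
    return base.load(
      fromByteOffset: index * MemoryLayout<Element>.stride,
      as: Element.self)
  }
}

// Store the value in little-endian byte order so the bytes in memory
// are 04 03 02 01 regardless of the host's endianness.
var word = UInt32(0x0102_0304).littleEndian
let (first, last) = withUnsafeBytes(of: &word) { raw -> (UInt8, UInt8) in
  let bytes = UnsafeTypePunnedPointer<UInt8>(raw.baseAddress!)
  return (bytes[0], bytes[3])
}
```

Because the memory stays bound to UInt32 throughout, a regular typed pointer to `word` could be used alongside this view.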

The problem with my solution: it does not directly help with calling socket bind. To do that, we would need to add an implicit conversion from UnsafeTypePunnedPointer to UnsafePointer that is only supported for imported functions!

After digging deeper...

The most problematic APIs that I've seen w.r.t. pointer types are Foundation.Data and SwiftNIO. I'll go through a lot of the issues with those below.

In short, withUnsafeTypePunnedPointer would only be useful when implementing Swift wrappers on top of some "difficult" C APIs. It does not solve the vast majority of the usability and correctness issues that I've seen in practice.

I still think it's fine to add such a thing, as long as it doesn't make those correctness issues worse. Along those lines, something called withUnsafeTypePunnedPointer should not actually bind memory as proposed:

func withUnsafeTypePunnedPointer<T, U, R>(
  to: T,
  do body: (UnsafePointer<U>) -> R
) -> R {
  return withUnsafePointer(to: to) {
    return $0.withMemoryRebound(
      to: U.self,
      capacity: MemoryLayout<T>.size / MemoryLayout<U>.size,
      body)
  }
}

The above semantics will conflict with later support for type punned pointers. Given the name, people would naturally think it's OK to use type punning within the closure, but that would be undefined behavior because the closure actually takes a strongly typed UnsafePointer.

For now, we could add the helper, but call it withUnsafeTypeConvertedPointer. Later we can introduce the safer variant, withUnsafeTypePunnedPointer, once we have an UnsafeTypePunnedPointer type.

Here are the actual issues that I see today with developers using memory binding APIs:

Foundation.Data

In Foundation.Data, we have a raw buffer that may be shared with other code, and we want to temporarily view that buffer as some user-provided type. Memory binding is not designed for this. Memory binding only makes sense when the code doing the binding has exclusive ownership of the memory at the time and knows the memory's current type. When you bind memory to a type, you either persistently bind uninitialized memory to a type, or temporarily rebind memory to a different type.

You cannot temporarily rebind memory of unknown type that is used somewhere else, then rebind it back to that unknown type when the closure completes.

Another way to look at this is that the memory model gives each memory location a single global type state. Verification of the model can make use of this property. We could add more complexity to the model with the addition of "memory type scopes" in SIL. I just don't think that's desirable.

Really, the only way to fix Foundation.Data's API is for the user-provided closure to either take UnsafeRawPointer (which can be accessed with its type safe API), or take a weakly typed UnsafeTypePunnedPointer. Specifically, Data should definitely declare this method:

  public func withUnsafeBytes<ResultType>(
    _ body: (UnsafeRawBufferPointer) throws -> ResultType
  ) rethrows -> ResultType

And it should possibly eventually have this method too:

  public func withUnsafeTypePunnedPointer<ResultType, ContentType>(
    _ body: (UnsafeTypePunnedBufferPointer<ContentType>) throws -> ResultType
  ) rethrows -> ResultType

Alternatively, we could deprecate Data's bytesNoCopy initializer so that the only way Data's memory can be typed is if Data binds the type itself.
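
For reference, a raw-buffer form of Data.withUnsafeBytes taking UnsafeRawBufferPointer is available in Swift 5 and later. Here is a small sketch of reading a little-endian UInt32 through it without binding memory; the four-byte payload is made up for the example.

```swift
import Foundation

// Example payload: the value 42 encoded as a little-endian UInt32.
let data = Data([0x2A, 0x00, 0x00, 0x00])

let decoded = data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> UInt32 in
  var value: UInt32 = 0
  // Copy the raw bytes into a local value; this works even if Data's
  // storage is not aligned for UInt32, and binds no memory.
  withUnsafeMutableBytes(of: &value) { $0.copyMemory(from: raw) }
  return UInt32(littleEndian: value)
}
```

Copying into a local is one conservative choice here; a raw load would also work when the storage is known to be suitably aligned.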

SwiftNIO

SwiftNIO has a lot of code that works with raw byte buffers and needs to call out to various typed C APIs. Here are the basic use cases...

Nested TLDR: There's a lot of code here that I think should be simplified. @johannesweiss pointed out that, with full knowledge of the codebase, most of these cases are actually correct, in that memory model verification will succeed once we have it. However, the safe way to write the code is always simpler.

Load and store bytes of a raw buffer

Current Swift

let address = buffer.baseAddress!.assumingMemoryBound(to: UInt8.self)
for { //...
    let byte = address.advanced(by: idx).pointee
}

Safe Swift

for { //...
    let byte = buffer[idx]
}

And...

Current Swift

func returnStorage() -> UnsafeMutablePointer<UInt8> {
  return self.bytes.advanced(by: idx).assumingMemoryBound(to: UInt8.self)
}
while { //...
  base = returnStorage()
  base[idx] = byte
  idx += 1
}

Safe Swift

while { //...
  self.bytes[idx] = byte
  idx += 1
}

UnsafeRawBufferPointer is already a collection of UInt8 bytes.
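
Here is a small runnable sketch of both directions (loading and storing bytes by subscript). The function name and the byte values are arbitrary illustrations, not SwiftNIO code.

```swift
// Read and write individual bytes through raw buffer pointers,
// with no typed pointer and no memory binding.
func checksumAndFill() -> (sum: Int, filled: [UInt8]) {
  let input: [UInt8] = [0x48, 0x69, 0x21]

  // Loading: UnsafeRawBufferPointer subscripts yield UInt8 directly.
  let sum = input.withUnsafeBytes { raw -> Int in
    var total = 0
    for idx in 0..<raw.count {
      total += Int(raw[idx])
    }
    return total
  }

  // Storing: UnsafeMutableRawBufferPointer subscripts accept UInt8.
  let out = UnsafeMutableRawBufferPointer.allocate(byteCount: 3, alignment: 1)
  defer { out.deallocate() }
  var idx = 0
  while idx < out.count {
    out[idx] = UInt8(idx + 1)
    idx += 1
  }
  // Copy the bytes out before the buffer is deallocated.
  return (sum, Array(out))
}

let result = checksumAndFill()
```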

Copy any UInt8 Collection into memory

Current Swift

let base = outBytes.assumingMemoryBound(to: UInt8.self)
inCollection.withUnsafeBytes { srcPtr in
  base.assign(from: srcPtr.baseAddress!.assumingMemoryBound(to: UInt8.self), count: n)
}

Safe Swift

outBytes.copyBytes(from: inCollection)
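
A self-contained version of that one-liner might look like the following sketch; the helper name and the temporary allocation exist only to make it runnable.

```swift
// Copy any collection of UInt8 into raw memory with copyBytes(from:).
// `copyIntoRawBuffer` is an illustrative helper, not a SwiftNIO API.
func copyIntoRawBuffer(_ inCollection: [UInt8]) -> [UInt8] {
  let outBytes = UnsafeMutableRawBufferPointer.allocate(
    byteCount: inCollection.count, alignment: 1)
  defer { outBytes.deallocate() }

  // The destination must be at least as large as the source collection.
  outBytes.copyBytes(from: inCollection)

  // Return a copy so nothing refers to the buffer after deallocation.
  return Array(outBytes)
}

let copied = copyIntoRawBuffer([1, 2, 3, 4])
```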

Initialize a UInt8 Array

Current Swift

return Array.init(UnsafeBufferPointer<UInt8>(
  start: rawptr.baseAddress?.advanced(by: index).assumingMemoryBound(to: UInt8.self),
  count: length))

Safe Swift

return [UInt8](rawptr[index..<(index + length)])

Decode a String

Current Swift

String(decoding: UnsafeBufferPointer(
  start: rawptr.baseAddress?.assumingMemoryBound(to: UInt8.self).advanced(by: index),
  count: length),
as: UTF8.self)

Safe Swift

String(decoding: rawptr[index..<(index + length)], as: UTF8.self)
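
Both slice-based forms (the array initializer and the string decoding) can be tried with a sketch like this. The message contents, index, and length are made up for the example.

```swift
// Slicing a raw buffer yields a collection of UInt8 that can feed
// Array and String initializers directly.
let message: [UInt8] = Array("Hi, NIO".utf8)

let (prefixBytes, text) = message.withUnsafeBytes { rawptr -> ([UInt8], String) in
  let index = 4
  let length = 3
  // Initialize a UInt8 array from the first two bytes.
  let bytes = [UInt8](rawptr[0..<2])
  // Decode three UTF-8 bytes starting at `index`.
  let string = String(decoding: rawptr[index..<(index + length)], as: UTF8.self)
  return (bytes, string)
}
```

Both results are copies, so they stay valid after the closure returns.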

Store a typed pointer pointing to the interior of a raw buffer

C

struct z_stream {
  unsigned char *next_in;
  //...
};

Current Swift

let typedPtr = dataPtr.baseAddress!.assumingMemoryBound(to: UInt8.self)
let typedDataPtr = UnsafeMutableBufferPointer(start: typedPtr, count: dataPtr.count)
zstream.next_in = typedDataPtr.baseAddress!

Safe Swift

zstream.next_in = dataPtr.bindMemory(to: UInt8.self).baseAddress

This is one of the rare cases where binding memory is appropriate. We need a persistent, strongly typed pointer into a raw byte buffer. (NIO is essentially implementing its own memory allocator.)

The only alternative I can think of would be to provide some annotation on top of zlib to indicate that the struct fields should be imported as a weakly typed pointer, like the proposed UnsafeTypePunnedPointer. Then the byte buffer's memory never needs to be given a type.

Call an arbitrary C API (through a wrapper)

Current Swift

public static func read(pointer: UnsafeMutablePointer<UInt8>, size: size_t)

buf.writeWithUnsafeMutableBytes { ptr in
  read(pointer: ptr.baseAddress!.assumingMemoryBound(to: UInt8.self), size: n)
}

Safe Swift

public static func read(pointer: UnsafeMutableRawPointer, size: size_t)

read(pointer: pointer, size: n)

There's already a Swift wrapper around the C API. Just make its byte buffer argument type UnsafeMutableRawPointer. Defining a wrapper is a far better approach than always calling the C API via withUnsafeTypePunnedPointer(to:) or withMemoryRebound(to:).

The pthread API

Current and Safe Swift

let res = pthread_create(
  &pt,
  nil,
  { p in
    let box = Unmanaged<ThreadBox>.fromOpaque((p as UnsafeMutableRawPointer?)!
           .assumingMemoryBound(to:ThreadBox.self)).takeRetainedValue()
    // ... The rest of the thread start routine
  },
  Unmanaged.passRetained(box).toOpaque())

Wow, this is horrible. Exactly as horrible as it should be! This is a great use of the assumingMemoryBound API that proves that we still need it even though it's massively misused. Notice that, within the same statement, we pass a typed pointer in, which is then passed to a void * callback. Naturally, within the callback, we can "assume" the type of the pointer!

The socket API

Current Swift

class Socket {
  func read(pointer: UnsafeMutablePointer<UInt8>, size: Int)
}

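// The closure's return type is elided in the original; T is a placeholder.
mutating func withMutableWritePointer<T>(body: (UnsafeMutablePointer<UInt8>, Int) throws -> T) rethrows -> T {
  //...
  let localWriteResult = try body(rawptr.baseAddress!.assumingMemoryBound(to: UInt8.self), rawptr.count)
  //...
}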

buffer.withMutableWritePointer(body: socket.read(pointer:size:))

Safe Swift

class Socket {
  func read(pointer: UnsafeMutableRawBufferPointer)
}

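// The closure's return type is elided in the original; T is a placeholder.
mutating func withMutableWritePointer<T>(body: (UnsafeMutableRawBufferPointer) throws -> T) rethrows -> T {
  //...
  let localWriteResult = try body(rawptr)
  //...
}

buffer.withMutableWritePointer(body: socket.read(pointer:))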

I think it's extremely rare to call a C API like socket or pthread directly from Swift. There's always going to be a wrapper. Just use raw pointers in the wrapper, since the underlying memory is really untyped. And use buffer pointers rather than passing around counts!

Extending sockaddr_in

Current Swift

extension sockaddr_in: SockAddrProtocol {
  mutating func withSockAddr<R>(_ body: (UnsafePointer<sockaddr>, Int) throws -> R) rethrows -> R {
    var me = self
    return try withUnsafeBytes(of: &me) { p in
      try body(p.baseAddress!.assumingMemoryBound(to: sockaddr.self), p.count)
    }
  }
}

func doBind(ptr: UnsafePointer<sockaddr>, bytes: Int) throws {
  try Posix.bind(descriptor: fd, ptr: ptr, bytes: bytes)
}

switch address {
case .v4(let address):
  address.withSockAddr(doBind)

Safe Swift

extension sockaddr_in: SockAddrProtocol {
  mutating func withSockAddr<R>(_ body: (UnsafePointer<sockaddr>, Int) throws -> R) rethrows -> R {
    return try withUnsafePointer(to: self) {
      // Temporarily view the sockaddr_in as a sockaddr for the duration of the call.
      try $0.withMemoryRebound(to: sockaddr.self, capacity: 1) { p in
        try body(p, MemoryLayout<sockaddr_in>.size)
      }
    }
  }
}

func doBind(ptr: UnsafePointer<sockaddr>, bytes: Int) throws {
  try Posix.bind(descriptor: fd, ptr: ptr, bytes: bytes)
}

switch address {
case .v4(let address):
  address.withSockAddr(doBind)

Here you can finally see a legitimate use of withMemoryRebound(to:). We have a pointer to memory, of a known type sockaddr_in. We need to bridge out to an API that needs to view the same memory as a different type for the duration of the call. The scope is nicely contained with no chance of accessing the same memory from differently typed pointers.

Here's where withUnsafeTypePunnedPointer could be used--at the lowest level of C interop. But let's call it withUnsafeTypeConvertedPointer.

Thanks very much @Andrew_Trick, we will address the issues you pointed out. However, I have a question regarding the first five examples you mention from the SwiftNIO code base. Essentially, the code you propose just skips the assumingMemoryBound(to:). First of all: I just didn’t know that you could get the UInt8s directly from an UnsafeRawBufferPointer, so we should implement what you suggest because the code looks nicer. However, I don’t understand why this is a correctness issue. When we allocate the memory, we bind it to UInt8 (using bindMemory). So I assumed that I can then (in later uses of the UnsafeRawPointer that I bound) use assumingMemoryBound(to: UInt8.self) to access it as UnsafePointer<UInt8>, because we do bind the memory to be of type UInt8. Is that not correct? Again, I see that your code looks nicer; I do not understand how it’s correct and the current version is not (only speaking of the first five examples, which I assume are all from ByteBuffer). As a rule, we only use assumingMemoryBound(to: SomeType.self) if we have previously bound the memory to that type.

Am I misunderstanding something around those APIs?

For reference here is where we bind all memory that comes out of ByteBuffer to UInt8:

/* bind the memory so we can assume it elsewhere to be bound to UInt8 */
ptr.bindMemory(to: UInt8.self, capacity: Int(bytes))

And regarding our (internal) Socket API: Yes, this should definitely be expressed in terms of UnsafeMutableRawBufferPointers. This is one of the oldest bits of the NIO codebase and we never got around to fixing it. At the very least, I have now raised an issue to address this.

Regarding the sockaddr_in: I believe the code we have today is an artefact of this bug: SR-2749, "Invalid copy of sockaddr_storage struct on Linux" (apple/swift issue #45353). At least on Ubuntu 14.04 it's extremely delicate to change anything around sockaddr_*.

I think I have now gone through all the bits of SwiftNIO code/API in your post, and from my current understanding our usage is correct (although unnecessarily awkward at times) because we bind all ByteBuffer memory to UInt8 at initialisation and we only ever assume it to be already bound to UInt8 (which is always the case). You point out that 1) the code would fail memory model verification if Swift did that. However, I can't currently find any instances where that is the case; would you mind pointing them out? 2) Some pointers escape their with closures. I couldn't immediately find any of those instances either; would you mind letting us know where they are?

@johannesweiss is correct. If the code always binds the buffer to UInt8 before it initializes the memory (via either initializeMemory or bindMemory), and the memory is never bound to a different type, then all of your assumingMemoryBound(to: UInt8.self) calls would pass a hypothetical memory model verifier. Wherever I said "correct" in my previous post, I should have said "safe" (except maybe the sockaddr_in example).

I know you fully understand how the Swift APIs work and why those uses of assumingMemoryBound(to:) were dangerous as written. But for the sake of a larger audience...

UnsafeRaw[Buffer]Pointer is an important family of types for working with byte buffers and streams. Regular developers should feel comfortable with and confident about using them safely. Developers should not need to reason about the type that memory is "bound" to, and they should not normally need to use the explicit memory binding APIs: bindMemory(to:capacity:), withMemoryRebound(to:capacity:), and assumingMemoryBound(to:).

It's validating for me to see, after reviewing SwiftNIO, that exactly the same C APIs are causing the need for memory binding workarounds today as originally motivated those memory binding APIs in the UnsafeRawPointer Migration Guide.

The usability of raw pointers was improved quickly after the initial Swift 3 release by introducing UnsafeRawBufferPointer. That reduced the temptation for pure Swift code to bind raw memory to a type just as a convenience.

assumingMemoryBound(to:)

I'm very concerned that developers seeing code that uses assumingMemoryBound will think it is a generally safe and intended mechanism for pointer type conversion in Swift. It's not that. It is entirely a backdoor for writing wrappers around C APIs.

I consider any Swift code calling assumingMemoryBound to be unsafe unless:

  • It is accompanied by a comment explaining why the assumption holds in this particular case.
  • That logic can be easily confirmed by locally reasoning about the code.

I know of two legitimate use cases for assumingMemoryBound:

  1. Writing a wrapper around a C API taking a void * callback. The pthread_create example from SwiftNIO that I posted above is a perfect example of that.

  2. Passing a Swift Unsafe[Mutable]RawPointer to a C API taking char * or unsigned char *.

(The original Swift 3 migration guide justified the need for this API using exactly the same two pthread and CString examples.)

In both cases, the typed pointer produced by assumingMemoryBound(to:) should be passed directly to the imported C code without accessing it in Swift. There's always a danger that the imported C function could be replaced by a Swift shim. However, I don't know how to avoid assumingMemoryBound(to:) in these cases without special support in the Swift type system, which is a fairly high bar.

withMemoryRebound(to:capacity:)

withMemoryRebound(to:capacity:) is safer than assumingMemoryBound, but it is also only intended for C interop when two distinct imported C types are known to have a compatible layout. Again, the canonical example of this from the original Swift 3 migration guide is exactly the same case where it's needed in SwiftNIO:

var addr = sockaddr_in()
let sock = socket(PF_INET, SOCK_STREAM, 0)

let result = withUnsafePointer(to: &addr) {
  // Temporarily bind the memory at &addr to a single instance of type sockaddr.
  $0.withMemoryRebound(to: sockaddr.self, capacity: 1) {
    connect(sock, $0, socklen_t(MemoryLayout<sockaddr_in>.stride))
  }
}

The subject of this thread is (or was) introducing a new API to make this particular case more convenient:

// Temporarily bind the memory at &addr to a single instance of type sockaddr.
let result = withUnsafeTypeConvertedPointer(to: &addr, as: sockaddr.self) {
  connect(sock, $0, socklen_t(MemoryLayout<sockaddr_in>.stride))
}
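
For anyone who wants to experiment with that shape today, here is one possible sketch of the helper. withUnsafeTypeConvertedPointer is not a standard library API; the signature below (an inout value plus an explicit as: type) is an assumption modeled on the call site above, and the Int32/UInt32 pair merely stands in for sockaddr_in/sockaddr so the example runs without sockets.

```swift
// Hypothetical helper: a contraction of withUnsafePointer(to:) and
// withMemoryRebound(to:capacity:). Not part of the standard library.
func withUnsafeTypeConvertedPointer<T, U, R>(
  to value: inout T,
  as type: U.Type,
  _ body: (UnsafePointer<U>) throws -> R
) rethrows -> R {
  // The caller is still responsible for T and U being layout compatible;
  // this check only rules out reading past the end of `value`.
  precondition(MemoryLayout<U>.size <= MemoryLayout<T>.size)
  return try withUnsafePointer(to: &value) {
    // Temporarily rebind the memory to a single instance of U.
    try $0.withMemoryRebound(to: U.self, capacity: 1, body)
  }
}

var signed: Int32 = -1
let unsigned = withUnsafeTypeConvertedPointer(to: &signed, as: UInt32.self) {
  $0.pointee
}
```

Keeping the explicit as: argument preserves the property that the target type is never picked purely by inference.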

bindMemory(to:capacity:)

bindMemory(to:capacity:) is the only one of the three meant for use in pure Swift. It allows Swift code to decouple memory allocation from knowledge of the type that memory will hold. So, it's useful for layering a raw memory allocator, or raw byte buffer, underneath a typed view of the buffer.
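
As a minimal sketch of that layering, the following allocates raw memory first and only later binds it to an element type. The function name, the Int32 element type, and the count are arbitrary choices for illustration.

```swift
// Allocate untyped memory, then bind it once to the type it will hold.
func makeTypedView() -> [Int32] {
  let count = 4
  let raw = UnsafeMutableRawPointer.allocate(
    byteCount: count * MemoryLayout<Int32>.stride,
    alignment: MemoryLayout<Int32>.alignment)
  defer { raw.deallocate() }

  // Persistently bind the uninitialized memory to Int32.
  let typed = raw.bindMemory(to: Int32.self, capacity: count)
  typed.initialize(repeating: 0, count: count)
  for i in 0..<count {
    typed[i] = Int32(i * 10)
  }

  // Copy the elements out before tearing the memory down.
  let values = Array(UnsafeBufferPointer(start: typed, count: count))
  typed.deinitialize(count: count)
  return values
}

let values = makeTypedView()
```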

Interestingly, SwiftNIO reveals a situation where bindMemory(to:capacity:) is also needed for C interop. The C struct z_stream exposes a typed pointer into a raw byte buffer.

struct z_stream {
  unsigned char *next_in;
  //...
};

zstream.next_in = dataPtr.bindMemory(to: UInt8.self, capacity: count)

I'm not sure how else to get around this without defining your own layout compatible z_stream struct in C that takes void *. Then you would need to deal with the additional type mismatch when calling zlib.

If you ask me, this is a pretty good example of why it could be worthwhile to add some type system support for importing [unsigned] char * "differently", at least in some cases.

@Andrew_Trick Thanks a lot for the explanations and examples, very helpful.

I have a question about the pthread example. Here's the code snippet from above:

let res = pthread_create(
  &pt,
  nil,
  { p in
    let box = Unmanaged<ThreadBox>.fromOpaque((p as UnsafeMutableRawPointer?)!
           .assumingMemoryBound(to:ThreadBox.self)).takeRetainedValue()
    // ... The rest of the thread start routine
  },
  Unmanaged.passRetained(box).toOpaque())

You said it's "a great use of the assumingMemoryBound API". But it seems unnecessary in this example. Unmanaged already carries type information via its generic parameter, and the fromOpaque method takes a raw pointer, not a typed pointer. (I was actually surprised this compiles. I didn't know you can pass a typed pointer to a function expecting a raw pointer.)

So I think you can simplify this:

Unmanaged<ThreadBox>
    .fromOpaque((p as UnsafeMutableRawPointer?)!.assumingMemoryBound(to:ThreadBox.self))
    .takeRetainedValue()

to this:

Unmanaged<ThreadBox>
    .fromOpaque((p as UnsafeMutableRawPointer?)!)
    .takeRetainedValue()

(I left in the cast to the optional and force-unwrap that's necessary to make the same code work on macOS and Linux, according to a comment in NIO.)

Am I correct? This compiles just fine.

Side note: Unmanaged.fromOpaque uses unsafeBitCast and not assumingMemoryBound(to:) in its implementation. This is it:

public struct Unmanaged<Instance : AnyObject> {
  public static func fromOpaque(_ value: UnsafeRawPointer) -> Unmanaged {
    return Unmanaged(_private: unsafeBitCast(value, to: Instance.self))
  }
}

Yes. Thank you @ole. This would definitely be the preferred idiom when passing a reference into the pthread_create body:

let box = Unmanaged<ThreadBox>
    .fromOpaque((p as UnsafeMutableRawPointer?)!)
    .takeRetainedValue()
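
The round trip can be tried without pthreads. In this sketch, startRoutine is a plain Swift function standing in for the void * callback, and ThreadBox is a stand-in class, so both are assumptions made for the example.

```swift
// Stand-in for the boxed state handed to a thread start routine.
final class ThreadBox {
  let value: Int
  init(value: Int) { self.value = value }
}

// Simulates a C callback that receives an opaque `void *` context.
func startRoutine(_ p: UnsafeMutableRawPointer?) -> Int {
  // Unmanaged carries the type, so no memory binding API is needed.
  let box = Unmanaged<ThreadBox>.fromOpaque(p!).takeRetainedValue()
  return box.value
}

// passRetained adds a +1 retain that takeRetainedValue balances.
let context = Unmanaged.passRetained(ThreadBox(value: 42)).toOpaque()
let observed = startRoutine(context)
```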

assumingMemoryBound would only make sense if you want to use a typed UnsafePointer inside the thread body:

  let buffer = UnsafeMutablePointer<Int>.allocate(capacity: n)
  pthread_create(&pt, nil, { p in
      let buffer = (p as UnsafeMutableRawPointer?)!.assumingMemoryBound(to: Int.self)
      // ...
  }, buffer)

@Andrew_Trick Thanks!

@Andrew_Trick Do you have any good resources for building my understanding of Swift's pointer types? I've had a pretty rough time most times I've needed to interact with them, mainly because the documentation was written using terms whose definitions I couldn't find.

I'm quite familiar with assembly, manual memory management, etc., so I get the basics. But there are a lot of concepts here:

  1. Memory allocation/deallocation
  2. Memory initialization
  3. Memory binding
  4. Moving vs copying

I would find it really useful if there were a state graph, that shows all possible states a pointer and its underlying memory could be in, connected by arcs representing the various verbs (or Swift function names) for manipulating them. Basically the nodes are nouns, like:

  1. Unallocated and uninitialized memory
  2. Allocated but uninitialized memory
  3. Allocated and initialized memory

And then the arcs are verbs, like:

  • "allocate" takes you from 1 to 2
  • "initialize" takes you from 2 to 3
  • "deinitialize" takes you from 3 to 2
  • "deallocate" takes you from 2 to 1

etc. Though I'm missing a bunch, because I have no idea how any of this works. I know I'm missing the states for "bound" vs "unbound" (and the corresponding arcs for bind/unbind), "typed" vs "untyped" (which might be what binding is about?). And then there's loading, moving and copying, which fit in somehow.
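
For what it's worth, the four arcs listed above can be walked with a typed pointer as in the sketch below. It only covers allocation and initialization state (binding is implicit here because the memory is allocated through a typed pointer), and the function name and String payload are arbitrary.

```swift
// Walk one element of memory through the states in the list above.
func lifecycle() -> (copied: String, moved: String) {
  // "allocate": state 1 -> 2 (allocated, uninitialized, bound to String)
  let p = UnsafeMutablePointer<String>.allocate(capacity: 1)

  // "initialize": state 2 -> 3
  p.initialize(to: "payload")

  // Copying reads the value and leaves the memory initialized.
  let copied = p.pointee

  // Moving returns the value and leaves the memory uninitialized:
  // state 3 -> 2, equivalent to a read followed by "deinitialize".
  let moved = p.move()

  // "deallocate": state 2 -> 1
  p.deallocate()
  return (copied, moved)
}

let states = lifecycle()
```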

@AlexanderM I assume SR-12739 was created in response to your post.

Cool, thanks Andrew Trick!