Usage of Data.withUnsafeBytes()?

How am I supposed to access the contents of a memory block contained in Data?
I get "Unable to infer closure type in the current context" with the following snippet and can't find any Swift 5+ documentation that explains the semantics in a understandable manner

    import Foundation

    struct PacketHeader {
        let packetType: UInt32
        let packetLength: UInt32
    }

    let data = Data(  [0, 1, 2, 3] )
    var size : UInt32

    data.withUnsafeBytes() {
        $0.withMemoryRebound(to: PacketHeader.self, capacity: 1 ) {
            size = CFSwapInt32LittleToHost( $0.pointee.packetLength )
        }
    }

I think you want something like this instead:

import Foundation

struct PacketHeader {
    var packetType: UInt32
    var packetLength: UInt32
}

let data = Data(  [0, 1, 2, 3, 0, 1, 0, 0] )
var size : UInt32

let p: PacketHeader = data.withUnsafeBytes {
    var p = $0.bindMemory(to: PacketHeader.self).first!
    p.packetLength = CFSwapInt32LittleToHost(p.packetLength)

    return p
}

print(p) // PacketHeader(packetType: 50462976, packetLength: 256)

There are a bunch of problems here. The confusing error you’re seeing is likely due to poor compiler diagnostics. There’s some sort of compile error within the the various closures in play, and because they are single line closures, and thus involved in type inference, you get a wacky error. I’ve seen this many times before and, to my shame, haven’t yet filed a bug about it )-:

However, there are a much bigger issues with the approach you’ve outlined. First up, Swift does not provide any guarantees about the layout of structs, and thus it’s not safe to declare PacketHeader in Swift and then access the packet’s bytes via that structure.

If you want to use the struct approach, you’ll need to declare the struct in C and import that struct into Swift. C does provide some limited guarantees about struct layout, and Swift will honour those.

However, there’s a second problem that suggests that this isn’t the right approach, namely that Data does not guarantee any alignment. If the base address of the buffer ($0 in your example) was misaligned, you could run into problems accessing packetLength.

I write a lot of networking code and I’ve long since given up on the entire notion of declaring a struct that maps on to a packet. Rather, I treat parsing packets like parsing strings: I start at the front and deal with them byte-wise. For example:

extension Data {

    mutating func parseLEUInt32() -> UInt32? {
        guard self.count >= 4 else { return nil }
        defer {
            self = self.dropFirst(4)
        }
        return self.prefix(4).reversed().reduce(0) { $0 << 8 | UInt32($1) }
    }

    mutating func parsePacketHeader() -> PacketHeader? {
        guard
            let packetType = self.parseLEUInt32(),
            let packetLength = self.parseLEUInt32()
        else { return nil }
        return PacketHeader(packetType: packetType, packetLength: packetLength)
    }
}

var d = Data([0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef])
let header = d.parsePacketHeader()!
print(String(header.packetType, radix: 16))     // 67452301
print(String(header.packetLength, radix: 16))   // efcdab89
print(d.count)                                  // 0

Note Here I’m extended Data just for brevity, but in real code I tend to define a custom type to help with this parsing.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

14 Likes

Can I just +1 this really hard? Parsing network protocols by casting buffer pointers to structures is really brittle. It also leads to needing to think really hard about endianness issues that can simply be avoided by doing as @eskimo suggests and parsing them bytewise.

This is also the approach used by SwiftNIO, due to the resilience of this approach.

9 Likes

Thanks for the hints, and for pointing out the pitfalls.
The "real world" structs already come from a bridged header. Strange the Swift compiler respects struct alignment information that can not be expressed in the language and is missing in the "Swift counterpart" of the C header.
Regarding Data alignment: The assertion that the underlying buffer is safe aligned for an 64-bit-datatype is not valid any more for Data? (AFAIK this was true for NSMutableData)

I'll probably end up with the "Streaming conversion" (even if only to avoid 64-UInt8-tuples for fixed size string buffers in the structs) but also try to evaluate the tradeoff (memory, time) of different approaches before - hence the version "reinterpret memory without copying".

I've always had difficulty understanding the diagnostics due to type inference on closure-taking APIs. Although the one I recently filed related to withUnsafeBytes was quickly fixed: [SR-11155] Severely misleading type inference of closure argument · Issue #53552 · apple/swift · GitHub.

Part of the problem is that withUnsafeBytes is overloaded and one of the overloads is actually deprecated. Your case is different than my bug because you're actually trying to use deprecated API to get an UnsafePointer<PacketHeader>. The way you would write that is:

data.withUnsafeBytes() {
  (p: UnsafePointer<PacketHeader>) in
    size = CFSwapInt32LittleToHost( p.pointee.packetLength )
}

Which will rightly give you a warning:

warning: 'withUnsafeBytes' is deprecated: use `withUnsafeBytes<R>(_: (UnsafeRawBufferPointer) throws -> R) rethrows -> R` instead
data.withUnsafeBytes() {

Once we add this feature, the original code would work as written, without using the deprecated API: SR-11087: Add a closure taking API: UnsafeRaw[Mutable][Buffer]Pointer.withMemoryRebound(to:[capacity:])

But that's still a bad way to write the code. Any approach using withMemoryRebound or bindMemory is unnecessarily confusing and dangerous.

If you want to reinterpret a sequence of raw bytes as a struct in a way that's type safe you can write it this way.

data.withUnsafeBytes() {
  size = $0.load(as: PacketHeader.self).packetLength
}

I left out the call to CFSwapInt32LittleToHost, because, if you're parsing a byte stream produced by a different host, you should first parse and reorder the bytes before initializing the PacketHeader, as @eskimo explained above.

1 Like

Strange the Swift compiler respects struct alignment information that
can not be expressed in the language and is missing in the "Swift
counterpart" of the C header.

I understand why you find that strange, but it make sense when you look at how things have evolved. Swift has to honour the layout of structs imported from C because that’s critical to C, and hence Objective-C, interoperability. The question of how to specify the layout of native Swift structs is a complex one, and it’s basically been put off because it’s not on the critical path.

If that’s something you’d like to see fixed, feel free to wade in to Swift Evolution.

Regarding Data alignment: The assertion that the underlying buffer is
safe aligned for an 64-bit-datatype is not valid any more for Data?
(AFAIK this was true for NSMutableData)

Memory you get from malloc, and hence the base pointer of a newly created NSMutableData, will be aligned. The problem comes when dealing with subdatas. Consider this Objective-C program:

NSData * data = [NSData dataWithContentsOfFile:@"/System/Library/Kernels/kernel"];
printf("%p\n", data.bytes);     // -> 0x104000000

NSData * subdata = [data subdataWithRange:NSMakeRange(1, data.length - 1)];
printf("%p\n", subdata.bytes);  // -> 0x104000001

If you’re writing general-purpose code, it’s not safe to assume any alignment on your data pointers, even in Objective-C.

Most people don’t notice this because most code is run on CPUs where misalignment is just a performance problem. However, my first Mac was a Mac Classic, whose 68000 CPU would trap on misaligned memory accesses.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

3 Likes

Sure, been there with Think-C on my Mac LC (and on my 68000 based Atari ST)...

Thanks for the input. With the "data stream" approach I'm still a bit torn. Given even my initial sample contains an buffer overrun there is a point in the extra safety.

Still it feels like trading buffer safety for likely implementation errors - for round trip stream conversion, member order and datatype need to be specified twice (unless I abuse Codable or use an Codegen). Easy for simple Type / Length headers, in my experience error prone for multiple complex structures.

Initially I assumed also the performance impact pretty high, but the optimizer does a pretty good job:
100000 x reading 3 members of a struct of Data:
Unoptimized:
just test setup: 0.2s
bindMemory: 0.47s
load: 0.5s
stream implementation using internal offset: 1.9s
using data with map / reduce: 3.8s

Optimized:
just test setup (create data, manipulate an inout var three times): 0.19s
bindMemory: 0.34s
load: 0.221s
stream implementation using internal offset with constant data: 0.57s
using data with map / reduce: 0.6s

Just wanted to add that there is UInt32(littleEndian: littleEndianEncodedUInt32) as well as UInt32(bigEndian: bigEndianEncodedUInt32). So no need to implement an endianness swap manually. (Same applies to all the other integer types).

2 Likes