Best practice for parsing heterogeneous types from Data (in Swift 5)

This post is partly motivated by the new warnings in Xcode 10.2 / Swift 5 about the deprecation of the withUnsafeBytes variant that takes UnsafePointer<T>, in favor of the newer variant that takes UnsafeRawBufferPointer (see, e.g., "withUnsafeBytes Data API Confusion").

I have read through a number of strategies people use to get their existing code working, but I am interested in taking a fresh look at best practice for parsing serialized binary protocols from Data in Swift 5. For the sake of discussion, say I'm trying to parse a binary packet of data I received that looks like:

var receivedData: Data
// Internally structured like [ Length: UInt8 ] [ Type: UInt16 ] [ Payload (variable length) ] [ Checksum: UInt16 ]

Previously I might have done something like:

let parsedLength: UInt8 = receivedData.withUnsafeBytes { $0.pointee }
// Continue parsing typed values out individually...

Is this the recommended way to parse heterogeneous types out of a Data instance now?

receivedData.withUnsafeBytes { (rawPointer) -> Void in
    var offset = 0
    let parsedLength = rawPointer.load(as: UInt8.self)
    offset += MemoryLayout.size(ofValue: parsedLength)
    let parsedType = rawPointer.load(fromByteOffset: offset, as: UInt16.self)
    offset += MemoryLayout.size(ofValue: parsedType)
    ... 
    // Do something with the parsed values
}

The load method of UnsafeRawBufferPointer is a wrapper around UnsafeRawPointer.load.

That method checks the alignment of the address, and a misaligned load triggers a fatal error.
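To make the alignment requirement concrete, here is a minimal sketch of the kind of check involved. The helper name isAligned is hypothetical (it is not a standard library API), and the 8-byte scratch allocation only stands in for a received packet:

```swift
// Hypothetical helper: true when `pointer` satisfies the alignment of T.
func isAligned<T>(_ pointer: UnsafeRawPointer, for type: T.Type) -> Bool {
    return Int(bitPattern: pointer) % MemoryLayout<T>.alignment == 0
}

// Scratch memory with a known 8-byte alignment, standing in for the packet.
let storage = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 8)
let base = UnsafeRawPointer(storage)

// The UInt8 length field at offset 0 is always fine (alignment 1).
let lengthFieldAligned = isAligned(base, for: UInt8.self)
// The UInt16 type field at offset 1 sits on an odd address, so it is misaligned.
let typeFieldAligned = isAligned(base + 1, for: UInt16.self)

storage.deallocate()
```

With the packed layout from the original question, the UInt16 that follows the one-byte length is therefore exactly the case that load(fromByteOffset:as:) rejects.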

Yes, I think that's roughly best practice. You can also update a var raw pointer by byte offsets if that's your thing: rawPointer += offset, but I think what you've written is better.

I will point out that you're relying on type inference to generate the correct byte offset. That might be more error-prone than being explicit:

offset += MemoryLayout<UInt16>.size

[edit] @SusanCheng made an important point about unaligned memory access. Adding an unaligned load to the UnsafeRawPointer API is long overdue. I wasn't able to find a bug tracking it though, so please file a bug if you need it. It was alluded to here:


@Andrew_Trick, @SusanCheng, alignment is one of the things I've yet to strongly grasp. Could you help me understand what my options are (today) if the received byte stream is "packed", i.e. potentially not aligned the way Swift would like the values to be? My intuition is that accessing a misaligned value in place seems bad, but my understanding was that UnsafeRawPointer.load(as:) first makes a copy of the data, so it never associates the type with the original (potentially unaligned) bytes, which seems... less bad?

Thanks

Although load(as:) doesn't associate a type with the memory being loaded, it still unfortunately assumes that the pointer is aligned properly for a value of that type. Like in C, you could use memcpy or Swift's equivalent UnsafeMutableRawPointer.copyMemory method to do unaligned loads and stores. I usually add some helper methods to UnsafeRawPointer to facilitate this; as Andrew said, it'd be great to have these in the standard library:

import Foundation // provides memcpy

extension UnsafeRawPointer {
  func loadUnaligned<T>(as: T.Type) -> T {
    assert(_isPOD(T.self)) // relies on the type being POD (no refcounting or other management)
    let buffer = UnsafeMutablePointer<T>.allocate(capacity: 1)
    defer { buffer.deallocate() }
    memcpy(buffer, self, MemoryLayout<T>.size)
    return buffer.pointee
  }
}
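For the packet layout from the original question, a bounds-checked variant on UnsafeRawBufferPointer might look like the sketch below. The name readUnaligned is hypothetical, and constraining T to FixedWidthInteger is my own assumption: it lets the value start out as 0 and be overwritten in place, so no temporary allocation is needed for integer fields.

```swift
import Foundation

extension UnsafeRawBufferPointer {
    // Hypothetical helper: copies MemoryLayout<T>.size bytes starting at
    // `offset` into a local integer, so source alignment does not matter.
    // Returns nil if the read would run past the end of the buffer.
    func readUnaligned<T: FixedWidthInteger>(at offset: Int, as type: T.Type) -> T? {
        let size = MemoryLayout<T>.size
        guard offset >= 0, size <= count - offset, let base = baseAddress else {
            return nil
        }
        var value: T = 0
        Swift.withUnsafeMutableBytes(of: &value) { destination in
            destination.baseAddress!.copyMemory(from: base.advanced(by: offset),
                                                byteCount: size)
        }
        return value // bytes are in memory order, i.e. host-endian
    }
}

// Usage: [ Length: UInt8 ] [ Type: UInt16 ] ..., assuming little-endian fields.
let packet = Data([0x05, 0x34, 0x12, 0xAA, 0xBB])
let rawType = packet.withUnsafeBytes { (buffer: UnsafeRawBufferPointer) in
    buffer.readUnaligned(at: 1, as: UInt16.self)
}
// Convert from the wire's byte order to the host's.
let packetType = rawType.map { UInt16(littleEndian: $0) }
let pastTheEnd = packet.withUnsafeBytes { (buffer: UnsafeRawBufferPointer) in
    buffer.readUnaligned(at: 4, as: UInt16.self)
}
```

The copy yields the bytes in memory order, so the explicit UInt16(littleEndian:) step is what makes the result independent of the host's byte order.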

Thanks for sharing that helper method. I have two questions:

  • Is _isPOD() meant to be called from outside the standard library?
  • Is allocating memory temporarily the only way to instantiate an “empty” instance of a trivial type?

Not officially, but there isn't an ordained way to do so.

Currently, yes. A withScopedAllocation { } API to give you a pointer to temporary, uninitialized memory would also be a great addition. If you have a balanced malloc/free, however, note that it will get promoted away by LLVM:


That is good to know, thanks!

Unfortunately, memcpy is currently the way to load misaligned data (semantically one byte at a time). Or you can load the bytes yourself and piece them together with bitwise operators.
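As a rough sketch of the second option, this assembles a UInt16 from two individual bytes with shifts and ORs. The function name is hypothetical, and the field is assumed to be little-endian on the wire:

```swift
// Hypothetical helper: builds a little-endian UInt16 from two single-byte
// reads. Only UInt8 values are ever read, so alignment is not a concern.
func littleEndianUInt16(_ bytes: [UInt8], at offset: Int) -> UInt16? {
    guard offset >= 0, offset + 1 < bytes.count else { return nil }
    let low = UInt16(bytes[offset])        // least significant byte comes first
    let high = UInt16(bytes[offset + 1])
    return low | (high << 8)
}

let received: [UInt8] = [0x05, 0x34, 0x12]
let typeField = littleEndianUInt16(received, at: 1)
let truncated = littleEndianUInt16(received, at: 2)
```

This also makes the byte order explicit in the code rather than inheriting whatever the host happens to use.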

This is obviously an unsatisfactory way to handle packed data. I think there's an argument to be made for having the UnsafeRawPointer load default to loading unaligned data. That wasn't done initially because:

  • compiler support didn't exist, but that's easy to add
  • ignoring alignment may indicate a programmer error; this way you're forced to think about alignment and make it explicit
  • we wanted higher-level APIs to be expressible in terms of raw pointers without changing semantics or losing performance
  • we can loosen this restriction over time but can't strengthen it

I filed [SR-10273] Add an UnsafeRaw[Buffer]Pointer API for loading unaligned/packed data

Yes, that would be very helpful, also because – as I was told here – one may not even know how the memory of a given Data value is aligned.

It'd be great if load and store just worked with unaligned data, I agree. That would require constraining them to POD types, or perhaps supporting unaligned loads only on POD types, since generic non-POD types don't have value witnesses for unaligned accesses. That's probably fine, and supporting unaligned loads only for POD types would be backward compatible with the current semantics.

@Andrew_Trick I'm a long-time Swift user but new to contributing: Is there anything I can do to help add weight to that [SR-10273]? I know with bugreporter.apple.com the WWDC recommendation is to file duplicates to express cumulative interest.

Jira lets you "vote" for an issue; there's no need or benefit to dup bugs in the public tracker.

@cconway Bringing it up on this forum was a good start. I added an explanation about how to proceed in the bug. Essentially, someone needs to follow through with a prototype. I could help with that. Then someone needs to drive the Swift Evolution proposal to decide whether to change default behavior or simply add a new public API flag.


Ok, I will plan to do something like the extension @Joe_Groff posted above for the time being.

I'm curious, however: The code snippet that @SusanCheng posted seems to suggest that calling UnsafeRawPointer.load(fromByteOffset: as:) in an unaligned case will trigger a stop in execution due to _debugPrecondition(), but I haven't noticed any issues with my non-alignment-aware parsing code prior to Xcode 10.2. Have I just gotten lucky that my binary data was aligned and didn't trigger the precondition? Say I do end up parsing misaligned data in a production build without a mitigation like memcpy: Would I expect a crash, poor performance, unpredictable behavior, or other?

It is an error in debug mode.

I work around it by reimplementing the load and store methods.

It should behave the same as the old version of Data.withUnsafeBytes, which also used bindMemory directly.


Doesn't binding (and dereferencing) memory also require that the memory is aligned for the type T?

My code does the same thing as UnsafeRawPointer.load but without the alignment check. That is, it force-loads a value of type T from memory and ignores the alignment.

I think it’s easy to get dragged down in the maw of unsafe memory access here, because most of us have come from some C-based language where that’s the only option. Personally I think that’s a mistake in most cases. One of the main goals of Swift is safety, and if you parse incoming data using the same style as you’d use in C, you are likely to suffer from the same exploitable memory management bugs as C.

Hence my post on the thread you referenced, outlining a completely different, and much safer, way to approach this problem. It’s not applicable in all cases, but IMO you should default to this approach, leaving the unsafe, C-style code for situations where profiling has shown that you absolutely need it.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple


@eskimo, thanks for re-linking to that post. I looked more closely at your alternative implementation and that does answer part of what I was originally asking about, which is best practices when parsing binary data from a source you don't control.

Does my extension below still capture what you were trying to demonstrate about moving up in abstraction? Instead of calling Data.withUnsafeBytes() I envision getting slices of my Data packet around expected values, then parsing the expected integer type from each slice with this function. NOTE: I had to insert a .reversed() into your example code in order for it to correctly parse little-endian integers.

extension Data {
    
    func parse<T: FixedWidthInteger>(type: T.Type) -> T? {
        
        let typeSize = MemoryLayout<T>.size
        guard self.count >= typeSize else { return nil }
        
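        // Reversing folds the first (least significant) byte in last: little-endian.
        // truncatingIfNeeded avoids a trap when T is Int8 and a byte exceeds 127.
        return self.prefix(typeSize).reversed().reduce(0) { $0 << 8 | T(truncatingIfNeeded: $1) }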
    }
}
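Putting the slicing idea together for the whole packet from the original question might look roughly like this. The Packet type and both function names are hypothetical, and I'm assuming little-endian fields and that Length counts only the payload bytes, neither of which the thread actually specifies:

```swift
import Foundation

// Hypothetical model of [ Length ] [ Type ] [ Payload ] [ Checksum ].
struct Packet {
    let type: UInt16
    let payload: Data
    let checksum: UInt16
}

// Folds a two-byte slice into a UInt16, least significant byte first.
func littleEndianUInt16(_ bytes: Data) -> UInt16 {
    return bytes.reversed().reduce(UInt16(0)) { ($0 << 8) | UInt16($1) }
}

func parsePacket(_ data: Data) -> Packet? {
    // ASSUMPTION: Length is the number of payload bytes only.
    guard let lengthByte = data.first else { return nil }
    let payloadLength = Int(lengthByte)
    let body = data.dropFirst()
    // Type (2) + payload + checksum (2) must all be present.
    guard body.count >= 2 + payloadLength + 2 else { return nil }
    // prefix/dropFirst respect the slice's own indices, which matters
    // because a Data slice does not necessarily start at index 0.
    let type = littleEndianUInt16(body.prefix(2))
    let payload = body.dropFirst(2).prefix(payloadLength)
    let checksum = littleEndianUInt16(body.dropFirst(2 + payloadLength).prefix(2))
    return Packet(type: type, payload: Data(payload), checksum: checksum)
}

let sample = Data([0x02, 0x34, 0x12, 0xAA, 0xBB, 0xCD, 0xAB])
let parsed = parsePacket(sample)
let tooShort = parsePacket(Data([0x02, 0x34]))
```

Every length is checked before any bytes are read, so a truncated or malicious packet yields nil instead of an out-of-bounds access.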