Accessing a "misaligned" raw pointer safely

I have the following:

func parseVarRec<Header: FixedWidthInteger>(
    buffer: Data, 
    headerType: Header.Type, 
    output: (Data) -> Void) throws {

    let hdrLen = MemoryLayout<Header>.size
    var hdrIdxs = (start: buffer.startIndex, end: buffer.startIndex + hdrLen)
    while hdrIdxs.end < buffer.endIndex {
        let recLenHdr = buffer[hdrIdxs.start ..< hdrIdxs.end].withUnsafeBytes { 
            $0.pointee as Header 
        }
        let recLen = Int(recLenHdr.bigEndian)
        output(buffer[hdrIdxs.end ..< hdrIdxs.end + recLen])
        next(indexPair: &hdrIdxs, byOffset: hdrLen + recLen) // helper (not shown) that advances both indices
    }
}

When compiled with Swift 5.0 I now get a warning:

x9parse.swift:24:63: warning: 'withUnsafeBytes' is deprecated: use `withUnsafeBytes<R>(_: (UnsafeRawBufferPointer) throws -> R) rethrows -> R` instead
        let recLenHdr = buffer[hdrIdxs.start ..< hdrIdxs.end].withUnsafeBytes { $0.pointee as Header }

So I'm trying to figure out how to use this new withUnsafeBytes method. Here is what I've come up with:

        let recLenHdr = buffer[hdrIdxs.start ..< hdrIdxs.end].withUnsafeBytes {
            $0.baseAddress!.load(as: Header.self) 
        }

The problem is that not all of the header fields in this file are aligned, so in those cases I end up with "Fatal error: load from misaligned raw pointer".

How can I resolve this?

Not all hardware architectures support unaligned loads. Intel CPUs do, but on ARM some loads will crash with a bus error. See this link, which also has a nice explanation of why we need to worry about this. It's undefined behaviour in C; in Swift it is defined to crash, which is what you're seeing.
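To make the alignment requirement concrete, here is a small sketch that checks an address against `MemoryLayout<T>.alignment` before calling `load(as:)`. The helper name `isAligned(_:for:)` is made up for illustration; it is not a standard-library API.

```swift
/// Returns true if `pointer` meets the alignment that `T` requires.
/// `isAligned(_:for:)` is a hypothetical helper, not a standard-library API.
func isAligned<T>(_ pointer: UnsafeRawPointer, for _: T.Type) -> Bool {
    return UInt(bitPattern: pointer) % UInt(MemoryLayout<T>.alignment) == 0
}

// Allocate 8 bytes with 4-byte alignment, so the base address is a multiple of 4.
let raw = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 4)
raw.storeBytes(of: 42, toByteOffset: 4, as: UInt32.self)    // offset 4: aligned store

let baseIsAligned = isAligned(raw, for: UInt32.self)         // base is a multiple of 4
let offsetOneIsAligned = isAligned(raw + 1, for: UInt32.self) // base + 1 is misaligned for UInt32

// Only use load(as:) when the address is suitably aligned.
var loaded: UInt32? = nil
if isAligned(raw + 4, for: UInt32.self) {
    loaded = raw.load(fromByteOffset: 4, as: UInt32.self)
}

raw.deallocate()
```

Calling `raw.load(fromByteOffset: 1, as: UInt32.self)` instead would hit the same "load from misaligned raw pointer" trap as in the question.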

The solutions are basically to load and shift the bytes individually (yes, we actually do this), or to use memcpy. Unfortunately, the standard library doesn't yet have a convenient way to do unaligned loads. The compiler can already generate them for us (see the __attribute__((__packed__)) example here); it's just that nobody has gotten around to adding the API.
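A minimal sketch of the "load and shift the bytes individually" approach, assuming big-endian length headers as in the question. Because it only ever reads single bytes, alignment never comes into play. `readBigEndian(_:from:at:)` is a hypothetical helper name.

```swift
/// Decodes a big-endian integer of type T from `bytes`, starting at `offset`,
/// by shifting in one byte at a time. No pointer loads, so alignment never matters.
/// Hypothetical helper; the caller must ensure enough bytes are available.
func readBigEndian<T: FixedWidthInteger, Bytes: Collection>(
    _ type: T.Type, from bytes: Bytes, at offset: Bytes.Index
) -> T where Bytes.Element == UInt8 {
    var result: T = 0
    var index = offset
    for _ in 0..<MemoryLayout<T>.size {
        // truncatingIfNeeded keeps the raw bit pattern, so signed types such as Int8 work too
        result = (result << 8) | T(truncatingIfNeeded: bytes[index])
        bytes.formIndex(after: &index)
    }
    return result
}

let record: [UInt8] = [0xAA, 0x00, 0x00, 0x01, 0x02, 0xFF]
let length = readBigEndian(UInt32.self, from: record, at: 1)  // 4 bytes starting at an odd offset
```

Here `length` is `0x00000102`, that is 258, regardless of the host's byte order, so no `.bigEndian` conversion is needed afterwards.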


First off, dereferencing a misaligned pointer is undefined behaviour, even though some architectures support it (Swift 5 helps you spot this by trapping in debug mode). The fix is pretty simple: instead of dereferencing the memory, copy it into a target value that is already correctly aligned:

var header = Header() /* create a zero-valued target value, which is correctly aligned */
withUnsafeMutableBytes(of: &header) { headerPtr in
    buffer[hdrIdxs.start ..< hdrIdxs.end].withUnsafeBytes { sourcePtr in
        // copy the bytes from the source into the memory of the header value
        headerPtr.copyMemory(from: sourcePtr)
    }
}
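The same copy-into-an-aligned-value idea can be wrapped in a reusable function. This is only a sketch: `readUnaligned(_:from:at:)` is a hypothetical name, and the bounds precondition is an added assumption rather than part of the snippet above.

```swift
import Foundation

/// Copies MemoryLayout<T>.size bytes starting at `offset` into a local value,
/// which the compiler aligns correctly, so no misaligned load ever happens.
/// Hypothetical helper, not a Foundation or standard-library API.
func readUnaligned<T: FixedWidthInteger>(_ type: T.Type, from data: Data, at offset: Data.Index) -> T {
    let size = MemoryLayout<T>.size
    precondition(offset >= data.startIndex && offset + size <= data.endIndex, "read out of bounds")
    var value: T = 0  // correctly aligned target
    withUnsafeMutableBytes(of: &value) { target in
        data[offset ..< offset + size].withUnsafeBytes { source in
            target.copyMemory(from: source)  // byte-wise copy, no typed load
        }
    }
    return value
}

let data = Data([0x7F, 0x00, 0x00, 0x01, 0x02])
let rawHeader = readUnaligned(UInt32.self, from: data, at: 1)  // offset 1: not 4-byte aligned
let recordLength = Int(UInt32(bigEndian: rawHeader))          // interpret the bytes as big-endian
```

Note that `offset` is an index into `data`, not a distance from zero: a `Data` slice keeps its parent's indices, so pass `slice.startIndex` rather than `0` for the first byte of a slice.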

For example, here's NIO's ByteBuffer code for loading an integer: https://github.com/apple/swift-nio/blob/06649bb8c704a042fc07f2013ae429e2d646e7bb/Sources/NIO/ByteBuffer-int.swift#L50-L61

I should add that Swift structs do not have any layout guarantees, so you should only do the above if you serialised the bytes with the exact same Swift version from the exact same type. Also, for this to work, the type has to be trivial. The docs say:

A trivial type can be copied bit for bit with no indirection or reference-counting operations. Generally, native Swift types that do not contain strong or weak references or other forms of indirection are trivial, as are imported C structs and enumerations.
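A sketch of what "copied bit for bit" means in practice: a struct made only of integers is round-tripped through its raw bytes. `RecordHeader` is a hypothetical type, and the round trip only works because both sides use the same type compiled by the same compiler, per the caveat above.

```swift
/// A trivial type: only stored integers, no references or other indirection.
/// Two UInt32 fields, so there are no padding bytes in the layout.
struct RecordHeader: Equatable {
    var tag: UInt32
    var length: UInt32
}

// "Serialise" by copying the value's raw bytes out.
var original = RecordHeader(tag: 7, length: 1_000)
let bytes: [UInt8] = withUnsafeBytes(of: &original) { Array($0) }

// "Deserialise" by copying the bytes into an aligned value of the same type.
var restored = RecordHeader(tag: 0, length: 0)
withUnsafeMutableBytes(of: &restored) { target in
    bytes.withUnsafeBytes { source in target.copyMemory(from: source) }
}
```

A struct holding a class reference or a `String` would not be trivial, and copying its bytes this way would bypass reference counting.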


Thanks. I had a bit of trouble because I think your code assumes the withUnsafe[Mutable]Bytes calls hand their closures plain pointers, whereas in Swift 5 they pass buffer pointers (aka buffers!), so it wasn't working. And the error messages were less than helpful. In any case, I have it working now:

func parseVarRec<Header: FixedWidthInteger>(
    buffer: Data, 
    headerType: Header.Type, 
    output: (Data) -> Void) throws {

    let hdrLen = MemoryLayout<Header>.size
    var hdrIdxs = (start: buffer.startIndex, end: buffer.startIndex + hdrLen)
    var header = Header()
    while hdrIdxs.end < buffer.endIndex {
        withUnsafeMutableBytes(of: &header) { headerBuf in
            buffer[hdrIdxs.start ..< hdrIdxs.end].withUnsafeBytes { sourceBuf in 
                headerBuf.copyMemory(from: sourceBuf)
            }
        }
        let recLen = Int(header.bigEndian)
        output(buffer[hdrIdxs.end ..< hdrIdxs.end + recLen])
        next(indexPair: &hdrIdxs, byOffset: hdrLen + recLen)
    }
}

Annoying, to me, that it has to be done this way. But I realize it's a CPU architecture issue, not a Swift issue.

Thanks for your help!!

Figured I'd share a further result: I "genericized" (is that a word?) the buffer type.

func parseVarRec <Buffer, Header> (
    buffer: Buffer, 
    headerType: Header.Type, 
    output: (Buffer.SubSequence) -> Void
)   throws 
    where Buffer: ContiguousBytes & Collection,
          Buffer.SubSequence: ContiguousBytes,
          Header: FixedWidthInteger
{
    let hdrLen = MemoryLayout<Header>.size
    var hdrIdxs = (start: buffer.startIndex, 
                   end: buffer.index(buffer.startIndex, offsetBy: hdrLen))
    var header = Header()
    var recLen = 0
    repeat {
        buffer[hdrIdxs.start ..< hdrIdxs.end].withUnsafeBytes { sourceBuf in 
            withUnsafeMutableBytes(of: &header) { headerBuf in
                headerBuf.copyMemory(from: sourceBuf)
            }
        }
        recLen = Int(header.bigEndian)
        output(buffer[hdrIdxs.end ..< buffer.index(hdrIdxs.end, offsetBy: recLen)])
    } while buffer.formIndices(&hdrIdxs, offsetBy: hdrLen + recLen) 
}

extension Collection {
    func formIndices(_ idxs: inout (start: Index, end: Index), offsetBy offset: Int) -> Bool {
        _ = formIndex(&idxs.start, offsetBy: offset, limitedBy: endIndex)
        return formIndex(&idxs.end, offsetBy: offset, limitedBy: endIndex)
    }
}
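The loop's termination relies on `formIndex(_:offsetBy:limitedBy:)`, which returns `false` (and clamps the index to the limit) once the offset would pass `endIndex`. A small sketch that walks a hypothetical 10-byte buffer with a 2-byte window, using the `formIndices` extension from the post:

```swift
extension Collection {
    /// Advances both indices by `offset`, clamping at `endIndex`.
    /// Returns false once `end` would run past `endIndex`.
    func formIndices(_ idxs: inout (start: Index, end: Index), offsetBy offset: Int) -> Bool {
        _ = formIndex(&idxs.start, offsetBy: offset, limitedBy: endIndex)
        return formIndex(&idxs.end, offsetBy: offset, limitedBy: endIndex)
    }
}

let buffer = [UInt8](repeating: 0, count: 10)
var window = (start: buffer.startIndex, end: buffer.startIndex + 2)
var starts = [window.start]
while buffer.formIndices(&window, offsetBy: 4) {
    starts.append(window.start)  // window still lies fully inside the buffer
}
```

Landing exactly on `endIndex` still counts as success, so the final window `8..<10` is visited. Also note that because the parser uses `repeat`/`while`, its body runs once before this check, so as written it assumes the buffer holds at least one complete header.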

Actually, it doesn't have to work this way. With a little compiler support, Swift could provide a much nicer API for unaligned access. See SR-10273, "Add an UnsafeRaw[Buffer]Pointer API for loading and storing unaligned/packed data" (apple/swift issue #52673 on GitHub).
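For what it's worth, an API along these lines did later ship: SE-0349 added `loadUnaligned(fromByteOffset:as:)` to the raw pointer types in Swift 5.7. A sketch, assuming a Swift 5.7 or newer toolchain:

```swift
// Requires Swift 5.7 or later (SE-0349).
let bytes: [UInt8] = [0xFF, 0x00, 0x00, 0x01, 0x02]

// Reads 4 bytes starting at offset 1 without any alignment requirement.
let header = bytes.withUnsafeBytes { raw in
    raw.loadUnaligned(fromByteOffset: 1, as: UInt32.self)
}
let recordLength = Int(UInt32(bigEndian: header))  // interpret the bytes as big-endian
```

With `Data`, the same call works inside `data.withUnsafeBytes { ... }`, which replaces the copy-into-a-local-value workaround.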
