Unaligned Load

I've got some code that I am migrating to Swift 5. I had thought this code was relatively safe, or as safe as I could make it considering it was using the Unsafe* APIs. It turns out I was wrong and it was actually to be undefined behavior for Swift all along.

I have an extension to Data called getFixedWidthInteger which gets a FixedWidthInteger from a specific index in the Data. I used to use the now deprecated withUnsafeBytes.

I tried the naïve migration:

extension Data {
  func getFixedWidthInteger<I: FixedWidthInteger>(at index: Data.Index) -> I {
    return self.withUnsafeBytes { $0.load(fromByteOffset: index, as: I.self) }
  }
}

let data = Data([0x00, 0x00, 0x40, 0xA4, 0x81])
let uut: UInt32 = data.getFixedWidthInteger(at: 1) // Fatal error: load from misaligned raw pointer
XCTAssertEqual(uut, 2_175_025_152)

Which traps as expected. I've read multiple threads on here (e.g., Best practice for parsing heterogeneous types from Data (in Swift 5), Accessing a “misaligned” raw pointer safely, Built-in unaligned loads).

tl;dr: The Actual Question

I've come up with a solution that does not trap.

extension Data {
  func getFixedWidthInteger<I: FixedWidthInteger>(at index: Data.Index) -> I {
    return self.advanced(by: index).withUnsafeBytes { $0.load(fromByteOffset: 0, as: I.self) }
  }
}

let data = Data([0x00, 0x00, 0x40, 0xA4, 0x81])
let uut: UInt32 = data.getFixedWidthInteger(at: 1)
XCTAssertEqual(uut, 2_175_025_152)

My guess is that it is not safe and I've just stumbled into a corner case that is not triggering an assertion. Maybe this is safe and completely defined behavior. However, given that I used to think my old code was safe and defined behavior and I was wrong I figured I'd rather ask and be sure.

How bad is this?

I'm starting to think that maybe it is safe but possibly not very performant because it copies the entire block of memory.

I switched my code to using the following:

fileprivate func read<T: FixedWidthInteger>(_ type: T.Type) throws -> T {
    let byteCount = MemoryLayout<T>.size

    guard cursor + byteCount <= data.endIndex else { throw Errors.prematureEndOfData }

    defer { advance(by: byteCount) }

    return data[cursor ..< cursor + byteCount]
        .reduce(T(0), { $0 << 8 | T($1) })
}

Due to the Data subscript ensuring that reading a single element (byte) will always succeed.

[edit]

Despite not being an issue in the codebase from which the above method appears, I edited it to be more correct as a standalone method.

Is this code implicitly assuming that data has zero-based indices? (It’s hard to tell without knowing what cursor represents.)

Yes, that assumption is being made. It's not an issue in the codebase in which this method appears. The key here, though, is to demonstrate how I'm forming integers.

I'm with @Avi in that, based on those other threads, I have switched to parsing FixedWidthIntegers using the indexed byte access from Data and shifting. From what I understand you can avoid unaligned access via pointers, but that maybe if you don't require using pointers it's safer to use more abstract APIs. I created some extensions to suit my major use cases which involve stepping over a specified count of FixedWidthIntegers and then parsing a FixedWidthInteger. If you're relying on .prefix() and such it's much harder to escape the bounds of your Data instance then with UnsafeRawPointer.