Unaligned Load

swift5
memory-safety
(Ryan Lovelett) #1

I've got some code that I am migrating to Swift 5. I had thought this code was relatively safe, or as safe as I could make it considering it was using the Unsafe* APIs. It turns out I was wrong and it was actually to be undefined behavior for Swift all along.

I have an extension to Data called getFixedWidthInteger which gets a FixedWidthInteger from a specific index in the Data. I used to use the now deprecated withUnsafeBytes.

I tried the naïve migration:

extension Data {
  func getFixedWidthInteger<I: FixedWidthInteger>(at index: Data.Index) -> I {
    return self.withUnsafeBytes { $0.load(fromByteOffset: index, as: I.self) }
  }
}

let data = Data([0x00, 0x00, 0x40, 0xA4, 0x81])
let uut: UInt32 = data.getFixedWidthInteger(at: 1) // Fatal error: load from misaligned raw pointer
XCTAssertEqual(uut, 2_175_025_152)

Which traps as expected. I've read multiple threads on here (e.g., Best practice for parsing heterogeneous types from Data (in Swift 5), Accessing a “misaligned” raw pointer safely, Built-in unaligned loads).

tl;dr: The Actual Question

I've come up with a solution that does not trap.

extension Data {
  func getFixedWidthInteger<I: FixedWidthInteger>(at index: Data.Index) -> I {
    return self.advanced(by: index).withUnsafeBytes { $0.load(fromByteOffset: 0, as: I.self) }
  }
}

let data = Data([0x00, 0x00, 0x40, 0xA4, 0x81])
let uut: UInt32 = data.getFixedWidthInteger(at: 1)
XCTAssertEqual(uut, 2_175_025_152)

My guess is that it is not safe and I've just stumbled into a corner case that is not triggering an assertion. Maybe this is safe and completely defined behavior. However, given that I used to think my old code was safe and defined behavior and I was wrong I figured I'd rather ask and be sure.

How bad is this?

(Ryan Lovelett) #2

I'm starting to think that maybe it is safe but possibly not very performant because it copies the entire block of memory.

#3

I switched my code to using the following:

fileprivate func read<T: FixedWidthInteger>(_ type: T.Type) throws -> T {
    let byteCount = MemoryLayout<T>.size

    guard cursor + byteCount <= data.endIndex else { throw Errors.prematureEndOfData }

    defer { advance(by: byteCount) }

    return data[cursor ..< cursor + byteCount]
        .reduce(T(0), { $0 << 8 | T($1) })
}

Due to the Data subscript ensuring that reading a single element (byte) will always succeed.

[edit]

Despite not being an issue in the codebase from which the above method appears, I edited it to be more correct as a standalone method.

#4

Is this code implicitly assuming that data has zero-based indices? (It’s hard to tell without knowing what cursor represents.)

#5

Yes, that assumption is being made. It's not an issue in the codebase in which this method appears. The key here, though, is to demonstrate how I'm forming integers.

(Chas Conway) #6

I'm with @Avi in that, based on those other threads, I have switched to parsing FixedWidthIntegers using the indexed byte access from Data and shifting. From what I understand you can avoid unaligned access via pointers, but that maybe if you don't require using pointers it's safer to use more abstract APIs. I created some extensions to suit my major use cases which involve stepping over a specified count of FixedWidthIntegers and then parsing a FixedWidthInteger. If you're relying on .prefix() and such it's much harder to escape the bounds of your Data instance then with UnsafeRawPointer.