Data type returning wrong index values when sliced

I'm trying to decode some binary chunks into various fixed-width integers on iOS. The abridged format consists of 5 UInt32s, followed by n Int16s.

Each chunk I receive starts with 1 initial UInt8 byte stating the size of the payload, followed by 244 bytes of actual data. My approach has been to discard this initial byte and append the remaining data to a data buffer using the following code:

let payload = data.advanced(by: 1)

After receiving the expected total number of bytes (determined by payloadBuffer.count), I perform a CRC32 check to ensure data integrity, which passes.

All good so far!
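The accumulation step described above might be sketched like this (a minimal sketch under the stated assumptions — 1 size byte, then the payload bytes; `appendChunk` is a hypothetical helper name, not from the original code):

```swift
import Foundation

// Sketch of accumulating chunks into a buffer. The name `payloadBuffer`
// matches the post; `appendChunk` is a hypothetical helper.
var payloadBuffer = Data()

func appendChunk(_ chunk: Data, to buffer: inout Data) {
    // The first byte declares the payload size. Note startIndex, not 0,
    // in case `chunk` is itself a slice of a larger Data.
    let declaredSize = Int(chunk[chunk.startIndex])
    let payload = chunk.dropFirst()
    assert(payload.count == declaredSize, "size header mismatch")
    buffer.append(payload)   // append copies bytes; the buffer's own indices stay 0-based
}
```

The `startIndex`-relative subscript matters because `Data` slices keep their parent's indices.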

However, I've noticed that if I slice the payloadBuffer to remove the initial 20 bytes and focus solely on the Int16s, using a similar approach to the above, the index is always off by 2 bytes.

So when I use the following I seem to lose the initial Int16 value:

let slicedBuffer = payload.advanced(by: 20)

However if I use:

let slicedBuffer = payload.advanced(by: 18)

I correctly get the initial Int16 value, even though those should be the last two bytes of the previous UInt32.

I'm using the following code to decode the slicedBuffer (usually with a byte offset):

Int16(littleEndian: slicedBuffer.withUnsafeBytes({ $0.load(as: Int16.self) }))
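For comparison, a slice-agnostic way to read a little-endian Int16 is to index relative to `startIndex` and assemble the bytes by hand; `int16LE` here is a hypothetical helper, not part of the original code:

```swift
import Foundation

// Hypothetical helper: reads a little-endian Int16 at a byte offset,
// indexing relative to startIndex so it behaves identically on a fresh
// Data and on a slice (whose indices do not start at 0).
func int16LE(in buffer: Data, atByteOffset offset: Int) -> Int16 {
    let start = buffer.startIndex + offset
    let lo = UInt16(buffer[start])
    let hi = UInt16(buffer[start + 1])
    return Int16(bitPattern: hi << 8 | lo)
}
```

Because it only subscripts individual bytes, alignment never comes into play.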

I've tried using subscripting instead of .advanced(by:), as well as .subdata(in:), but I'm still getting the same result. I've also tried different methods of decoding the Int16s, such as the following:

let test = payloadBuffer.withUnsafeBytes { bytes -> Int16 in
    let int: Int16 = bytes.loadFixedWidthInteger(fromByteOffset: 20)
    return int
}

extension UnsafeRawBufferPointer {

    func loadFixedWidthInteger<T: FixedWidthInteger>(fromByteOffset offset: Int) -> T {
        var value: T = 0
        withUnsafeMutableBytes(of: &value) { valuePtr in
            valuePtr.copyBytes(from: UnsafeRawBufferPointer(start: self.baseAddress!.advanced(by: offset),
                                                            count: MemoryLayout<T>.size))
        }
        return value
    }
}

or simply...

Int16(bytes[21]) << 8 | Int16(bytes[20])
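Putting those pieces together, one way to walk the whole described layout (5 little-endian UInt32s followed by Int16s) without any intermediate slicing is to do all the offset arithmetic inside a single withUnsafeBytes call. This is only an illustrative sketch — `parse` and its names are assumptions, and it presumes little-endian data as the post does:

```swift
import Foundation

// Illustrative sketch: decode the assumed layout with explicit offsets.
// UnsafeRawBufferPointer subscripts are always zero-based, unlike Data slices.
func parse(payload: Data) -> (header: [UInt32], samples: [Int16]) {
    return payload.withUnsafeBytes { raw -> ([UInt32], [Int16]) in
        var header: [UInt32] = []
        for i in 0..<5 {
            let o = i * 4
            // Manual byte assembly sidesteps alignment entirely.
            header.append(UInt32(raw[o]) | UInt32(raw[o + 1]) << 8
                        | UInt32(raw[o + 2]) << 16 | UInt32(raw[o + 3]) << 24)
        }
        var samples: [Int16] = []
        var o = 20
        while o + 1 < raw.count {
            samples.append(Int16(bitPattern: UInt16(raw[o + 1]) << 8 | UInt16(raw[o])))
            o += 2
        }
        return (header, samples)
    }
}
```

Keeping everything in one raw-buffer closure means there is only one index origin to reason about.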

I believe this has something to do with the bytes being unaligned in memory. I've read through many forum posts related to unaligned loads (e.g. 1, 2), but none of the suggested solutions seem to resolve the issue:

Swift Forum - Unaligned Load

Stack Overflow - Creating an aligned array of bytes

For the life of me, though, I cannot work out what is causing this quirk. If anyone has any suggestions I'd be very grateful!

I tried to see if I could reproduce the unexpected behavior you describe using this program:

import Foundation

func test() {
  var data = Data()
  data.append(contentsOf: [
    26, // Initial byte stating the remaining number of bytes
    11, 12, 13, 14, // bytes of 1st UInt32 value
    21, 22, 23, 24, // bytes of 2nd UInt32 value
    31, 32, 33, 34, // bytes of 3rd UInt32 value
    41, 42, 43, 44, // bytes of 4th UInt32 value
    51, 52, 53, 54, // bytes of 5th UInt32 value
    61, 62, // bytes of 1st UInt16
    71, 72, // bytes of 2nd UInt16
    81, 82, // bytes of 3rd UInt16
  ])
  print("data:", data.map { String($0) }.joined(separator: ", "))
  let payload = data.advanced(by: 1)
  print("payload:", payload.map { String($0) }.joined(separator: ", "))

  let slicedBuffer = payload.advanced(by: 20)
  print("slicedBuffer:", slicedBuffer.map { String($0) }.joined(separator: ", "))

  slicedBuffer.withUnsafeBytes { ptr in
    print("UInt16 values:")
    for uint16Index in 0 ..< ptr.count/2 {
      let uint16Value = ptr.load(fromByteOffset: uint16Index * 2, as: UInt16.self)
      print(uint16Value, "( low byte:", uint16Value & 0xff, ", high byte:", uint16Value >> 8, ")")
    }
  }
}

test()

But couldn't, as the output is as expected:

data: 26, 11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34, 41, 42, 43, 44, 51, 52, 53, 54, 61, 62, 71, 72, 81, 82
payload: 11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34, 41, 42, 43, 44, 51, 52, 53, 54, 61, 62, 71, 72, 81, 82
slicedBuffer: 61, 62, 71, 72, 81, 82
UInt16 values:
15933 ( low byte: 61 , high byte: 62 )
18503 ( low byte: 71 , high byte: 72 )
21073 ( low byte: 81 , high byte: 82 )

Thanks Jens,

Out of interest were you running this on x86 or ARM?


% swiftc --version
Apple Swift version 5.4.2 (swiftlang-1205.0.28.2 clang-1205.0.19.57)
Target: x86_64-apple-darwin20.5.0

What does my test program output if you run it?

It shouldn't have anything to do with alignment - although you are doing the right thing by using an unaligned load.

Raw bytes themselves don't have any alignment requirements. It's only once you start saying "this is a 16-bit integer" that the integer must be aligned. So in the loadFixedWidthInteger function, value will be correctly aligned, and then you copy the (potentially misaligned) bytes into the correctly-aligned value. The bytes can be anywhere, because they're just bytes, but the integer has stricter requirements.

As for the slicing issue? No idea. I'd recommend inspecting the contents of the data at each step, bearing in mind that Data's indices might not start at 0! This catches a lot of people out and can easily be the source of the strange behaviour. Confirm that, before you call .advanced(by: 20), those initial 20 bytes really are what you expect to be there and what you want to discard. If you see that it isn't discarding the initial 20 bytes despite you asking it to, then that would seem pretty clearly to be a bug.
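The index caveat above — that a `Data` slice keeps its parent's indices — can be demonstrated in a few lines:

```swift
import Foundation

// A Data slice keeps the parent's indices rather than restarting at 0.
let data = Data([10, 11, 12, 13, 14])
let slice = data[2...]          // a slice of Data, not a fresh Data

print(slice.startIndex)         // 2, not 0
print(slice[2])                 // 12 — `slice[0]` would trap
print(slice.first!)             // 12 — position-based access is index-safe

// Copying into a new Data rebases the indices at 0:
let fresh = Data(slice)
print(fresh[0])                 // 12
```

This is why code that hard-codes integer subscripts can appear "off" by exactly the number of bytes previously dropped.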

Thanks Karl, I've started to extract the data as .bin files at various steps so that I can inspect it in a hex editor, and I'm starting to think this may be a documentation issue.

Just for my understanding - is there much difference between using .copyBytes(from:) vs. .load(as:) in terms of safety?

They do different things. copyBytes copies bytes, but load(as:) dereferences the pointer as a typed value (hence the as: parameter).

Types come with additional complexity - alignment, layout, padding, etc. Bytes have none of that; a byte is the smallest unit of addressable memory - they are the units in which memory addresses are expressed (i.e. the space between addresses n and n+1 always contains exactly 1 byte). That means they are indivisible - they have no internal layout or padding, and by definition no pointer to bytes can ever be misaligned.

(Pedantic note: at lower levels, bytes may have internal layout due to things like parity bits, but the MMU handles that at the hardware level. I don't think you could see those even if you wanted to.)

So the difference comes down to whether you can guarantee that the source data (and pointers to elements within it) will always meet the requirements needed to dereference directly as a type. If it doesn't, that's undefined behaviour. Generally it's best to assume as little as possible, then later work out how much those assumptions can be tightened, how you'd verify that, and what the benefits would be.
