Convert [UInt8] to Int

somu · October 23, 2019, 6:33pm

Hi,

I would like to convert [UInt8] to Int

What is the recommended / safe way to do this ?

Note: I am using Swift 5.1

Lantua · October 23, 2019, 6:40pm

If you're somewhat comfortable with UnsafePointer, you can do

let a: [UInt8] = [0, 1, 0, 0, 0, 0, 0, 0]

assert(a.count * MemoryLayout<UInt8>.stride >= MemoryLayout<Int>.size)
let b = UnsafeRawPointer(a).assumingMemoryBound(to: Int.self).pointee.littleEndian

which assumes that the data is in littleEndian format.

somu · October 23, 2019, 6:52pm

I am not so familiar with unsafe pointer, I think I would need to read about it.

We are using big endian, so what should be the changes in the above code ?

Lantua · October 23, 2019, 7:01pm

There's a .littleEndian at the end of the last line; that's where the conversion happens. Change it to .bigEndian.

PS

I might be breaking some UnsafePointer taboo here. Perhaps someone will be able to point that out. It works with the current implementation but might break in unpredictable ways in future versions. Come back and check regularly.

somu · October 23, 2019, 7:07pm

Thanks a lot !!

Lantua · October 23, 2019, 7:26pm

After diving back into UnsafeRawPointer docs, I believe using load(fromByteOffset:as:) would be the correct usage.

assert(a.count * MemoryLayout<UInt8>.stride >= MemoryLayout<Int>.size)
let b = UnsafeRawPointer(a).load(as: Int.self).bigEndian

You'd still have memory alignment considerations to be aware of, but that holds true for the previous method as well.
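For readers who want to try the load(as:) approach, here is a minimal sketch that keeps the pointer inside a withUnsafeBytes closure, so it is only used while the array's storage is guaranteed to be alive. It reads a fixed-width Int64 rather than Int (my choice, so the byte count does not depend on the platform), and it assumes the start of the array's storage is suitably aligned for Int64, which is true in practice but is exactly the alignment consideration mentioned above.

```swift
let a: [UInt8] = [0, 0, 0, 0, 0, 0, 1, 0]

// The raw buffer is only valid inside the closure, so the load happens there.
let b: Int64 = a.withUnsafeBytes { raw -> Int64 in
    precondition(raw.count >= MemoryLayout<Int64>.size, "need at least 8 bytes")
    // Assumption: offset 0 of the array's storage is aligned for Int64.
    let stored = raw.load(as: Int64.self)
    // Interpret the stored bytes as big endian, whatever the host byte order is.
    return Int64(bigEndian: stored)
}
print(b)  // 256
```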

somu · October 23, 2019, 7:37pm

@Lantua thank you so much. Since both solutions use Int (load(as: Int.self) and assumingMemoryBound(to: Int.self)), is one of them less implementation-specific, or does working with unsafe raw pointers always rely on implementation details of the types, so that it might break when those details change?

Lantua · October 23, 2019, 8:13pm

It's largely about semantics; the memory layout for simple types should be pretty solid by now.

I'm more concerned about memory binding. In an ideal world, each part of the memory would be tagged, as Int or UInt8, and we're trying to read Int out of UInt8 memory, which is not allowed in Swift. (There's also an implication that using memory with incorrect binding could lead to undefined behavior though I'm not sure how that works.)

assumingMemoryBound, well, assumes that you have already bound that region of memory to Int at some point (which we didn't), and binding memory to a type causes it to lose its binding to the previous type (we would lose the UInt8 binding), so we don't want that either.

So I'm trying to find a variant that doesn't change/assume the binding of the memory, and load(fromByteOffset:as:) seems to be what I want.

If you're curious, the docs as well as SE-0107 would be useful.

eskimo · October 23, 2019, 8:25pm

I do this a lot because I’m always parsing network packets and other weirdo data structures, and in my experience the best way to do this is byte-wise. For example, for little endian:

let bytes: [UInt8] = [0x01, 02, 03, 04]
let u32 = bytes.reversed().reduce(0) { soFar, byte in
    return soFar << 8 | UInt32(byte)
}
print(String(u32, radix: 16))   // 4030201

Drop the reversed for big endian.

Doing this using unsafe pointers is a pain for a number of reasons:

It’s… well… unsafe. A simple mistake can ruin your whole day.
And it’s easy to make a mistake. As an example, my code above converts to UInt32 because it’s pretty darned rare that you find a network protocol that uses Int (remember that Int changes size depending on your architecture).
You have to worry about type punning.
You have to worry about memory alignment.
You have to worry about big and little endian. Admittedly the code above deals with little and big endian, but when you assemble multi-byte integers byte-by-byte, it’s hard to forget about that problem (-:

Now, don’t get me wrong, there are clearly cases where dealing with unsafe pointers is the right choice, but IMO it shouldn’t be your first choice.
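To tie the byte-wise approach back to the original question, here is a minimal sketch that builds a fixed-width 64-bit value from eight big-endian bytes and only then converts it to Int. The function name intFromBigEndian is made up for this example, and treating the bytes as a two's-complement signed value is an assumption about the data format. Because Int changes size with the architecture, the final conversion uses Int(exactly:), which returns nil instead of trapping if the value does not fit.

```swift
// Hypothetical helper: assembles 8 big-endian bytes into an Int.
func intFromBigEndian(_ bytes: [UInt8]) -> Int? {
    // Reject input of the wrong length instead of silently mis-parsing it.
    guard bytes.count == 8 else { return nil }
    // Fold the bytes into a UInt64, most significant byte first.
    let wide = bytes.reduce(UInt64(0)) { soFar, byte in
        soFar << 8 | UInt64(byte)
    }
    // Reinterpret as signed (assumed two's complement), then narrow safely.
    return Int(exactly: Int64(bitPattern: wide))
}

if let value = intFromBigEndian([0, 0, 0, 0, 0, 0, 1, 0]) {
    print(value)  // 256
}
```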

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

somu · October 23, 2019, 8:33pm

Thanks a lot @Lantua for explaining the concern.

I think I really need to read about unsafe raw pointers and how they are used.

somu · October 23, 2019, 8:38pm

Thanks a lot @eskimo, I didn't realise that Int was platform specific.

That's a nice solution to shift bits and do an OR operation, so that the bytes get added to form the final UInt32.

scanon · October 24, 2019, 1:07am

We can generalize @eskimo's solution a bit if we want (perhaps too much) to work with any iterator or collection of UInt8 and any FixedWidthInteger type:

extension FixedWidthInteger {
  init<I>(littleEndianBytes iterator: inout I)
  where I: IteratorProtocol, I.Element == UInt8 {
    self = stride(from: 0, to: Self.bitWidth, by: 8).reduce(into: 0) {
      $0 |= Self(truncatingIfNeeded: iterator.next()!) &<< $1
    }
  }
  
  init<C>(littleEndianBytes bytes: C) where C: Collection, C.Element == UInt8 {
    precondition(bytes.count == (Self.bitWidth+7)/8)
    var iter = bytes.makeIterator()
    self.init(littleEndianBytes: &iter)
  }
}

Even in this ridiculous generality, Swift generates basically optimal code with optimization enabled, which I think is pretty cool:

func foo(_ bytes: [UInt8]) -> UInt32 {
    return UInt32(littleEndianBytes: bytes)
}

output.foo([Swift.UInt8]) -> Swift.UInt32:
        push    rbp                         // setup stack frame
        mov     rbp, rsp
        cmp     qword ptr [rdi + 16], 4     // check count == 4
        jne     .LBB6_1
        mov     eax, dword ptr [rdi + 32]   // 4-byte load from array
        pop     rbp                         // tear down stack frame
        ret
.LBB6_1:
        ud2                                 // trap if count != 4
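Since the original question was about big-endian data, a big-endian counterpart might look like the sketch below. The bigEndianBytes label is hypothetical and not part of the code above, and this version returns nil on a length mismatch rather than trapping, which is a design choice of the sketch. I have not checked what code the optimizer generates for it.

```swift
extension FixedWidthInteger {
    /// Hypothetical big-endian counterpart: most significant byte comes first.
    init?<C>(bigEndianBytes bytes: C) where C: Collection, C.Element == UInt8 {
        // Require exactly as many bytes as the target type holds.
        guard bytes.count == Self.bitWidth / 8 else { return nil }
        self = bytes.reduce(into: 0 as Self) { result, byte in
            // Make room for the next byte, then OR it into the low bits.
            result = (result &<< 8) | Self(truncatingIfNeeded: byte)
        }
    }
}

let sample: [UInt8] = [0x01, 0x02, 0x03, 0x04]
if let u32 = UInt32(bigEndianBytes: sample) {
    print(String(u32, radix: 16))  // 1020304
}
```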

eskimo · October 24, 2019, 7:39am

Even in this ridiculous generality, Swift generates basically optimal code with optimization enabled

Wow.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

ajo · October 24, 2019, 9:45am

I think it's quite common to parse a byte array into some logical structure, and handle e.g. endianness and different widths.

I usually use an ArraySlice and an extension to the ArraySlice and a throwing method to extract values with the correct endianness.

E.g. (quite pseudo):

var slice = buffer[buffer.startIndex..<buffer.endIndex]
let value = try slice.consume(type: UInt32.self, endian: .big)
let count = try slice.consume(type: Int16.self)
let string = try slice.consumeString(encoding: .utf8, length: count)

which takes care of e.g. reading out-of-bounds and type conversion.

IMHO something like this is missing from the standard library, as we otherwise have to drop down to using UnsafePointers or the slightly hacky solution of reducing the buffer, when we just want to get the data parsed and get on with it. :)
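One way such an extension could be written is sketched below. Everything in it is an assumption made for illustration: the ByteOrder and ParseError types, the take helper, the big-endian default, and a consumeUTF8 method standing in for consumeString. It is not the actual extension described above.

```swift
enum ByteOrder { case big, little }
enum ParseError: Error { case outOfBounds }

extension ArraySlice where Element == UInt8 {
    /// Removes and returns the first `length` bytes, or throws if too few remain.
    mutating func take(_ length: Int) throws -> ArraySlice<UInt8> {
        guard length >= 0, count >= length else { throw ParseError.outOfBounds }
        let chunk = prefix(length)
        self = dropFirst(length)
        return chunk
    }

    /// Consumes one fixed-width integer in the given byte order.
    mutating func consume<T: FixedWidthInteger>(type: T.Type, endian: ByteOrder = .big) throws -> T {
        let chunk = try take(MemoryLayout<T>.size)
        // Put the most significant byte first, then fold byte by byte.
        let ordered: [UInt8] = endian == .big ? Array(chunk) : Array(chunk.reversed())
        return ordered.reduce(into: 0 as T) { result, byte in
            result = (result &<< 8) | T(truncatingIfNeeded: byte)
        }
    }

    /// Consumes `length` bytes and decodes them as UTF-8.
    mutating func consumeUTF8(length: Int) throws -> String {
        return try String(decoding: take(length), as: UTF8.self)
    }
}

// Example record layout (made up): UInt32 value, Int16 length, UTF-8 text.
func parseRecord(_ buffer: [UInt8]) throws -> (value: UInt32, text: String) {
    var slice = buffer[...]
    let value = try slice.consume(type: UInt32.self, endian: .big)
    let count = try slice.consume(type: Int16.self)
    let text = try slice.consumeUTF8(length: Int(count))
    return (value, text)
}

if let record = try? parseRecord([0, 0, 1, 0, 0, 2, 0x68, 0x69]) {
    print(record.value, record.text)  // 256 hi
}
```

Because each call shrinks the slice, a truncated buffer surfaces as a thrown error at the field that could not be read, instead of an out-of-bounds trap.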

somu · October 25, 2019, 12:00am

Thanks a lot @scanon, the extension is really nice, as it is convenient and clear at the point of use.

Thanks everyone, this was a really good learning experience for me and I found it really useful.

Nevin · October 25, 2019, 4:13pm

Is there any reason not to use load(as:) here?

Something like this for the general case:

extension ArraySlice {
  func load<T>(as type: T.Type) -> T {
    return self.withUnsafeBytes{ $0.load(as: T.self) }
  }
}

Usage:

let bytes: [UInt8] = [0, 1, 2, 3, 4, 5, 6, 7]
let x = bytes[4...].load(as: Float.self)

scanon · October 25, 2019, 5:37pm

load(as:) requires that the pointer be suitably-aligned for the type being loaded (this is something that @Joe_Groff, @Andrew_Trick and others have been chatting about improving recently). From the documentation of load(as:):

The buffer pointer plus offset must be properly aligned for accessing an instance of type T.

Your example happens to work some of the time, because there's no opportunity for the compiler to abuse the undefined behavior you're invoking, and you're running on a CPU that silently supports unaligned access to memory (and your slice is likely to be four-byte aligned by virtue of how it was created). However, on other architectures this might trap or produce unspecified results, and even on x86, it may allow the compiler to generate vectorized code that traps when the pointer is unaligned, or optimize on the assumption that it is and produce undefined behavior.

Please don't do this.

Andrew_Trick · October 25, 2019, 6:02pm

In debug builds at least, UnsafeRawPointer.load(as:) will assert on misaligned data. Clearly, it would be really nice to be able to use UnsafeRawPointer.load(as:) on unaligned data. I think we've seen enough justification by now to loosen this restriction, which won't affect any existing code. If we really want aligned raw pointer loads for performance in the future, then we could introduce an alignedLoad API later.
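Until the restriction is loosened, one way to sidestep the alignment requirement is to copy the bytes into a properly aligned local variable instead of loading through the source pointer. The sketch below assumes that approach; the name loadByCopying is made up, and it is limited to FixedWidthInteger so the zero initial value and the byte-order conversion are well defined. The result holds the bytes exactly as stored, so the caller still applies the endianness.

```swift
// Hypothetical helper: alignment-safe read that copies into a local value.
func loadByCopying<T: FixedWidthInteger>(_ type: T.Type, from bytes: ArraySlice<UInt8>) -> T? {
    let size = MemoryLayout<T>.size
    guard bytes.count >= size else { return nil }
    var value: T = 0
    // `value` is aligned for T, so only byte-sized writes touch memory here.
    withUnsafeMutableBytes(of: &value) { destination in
        destination.copyBytes(from: bytes.prefix(size))
    }
    return value
}

let packet: [UInt8] = [0xAA, 0x01, 0x02, 0x03, 0x04]
// Starts at offset 1, which is not 4-byte aligned relative to the array start.
if let stored = loadByCopying(UInt32.self, from: packet[1...]) {
    print(String(UInt32(bigEndian: stored), radix: 16))  // 1020304
}
```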

fswarbrick · October 27, 2019, 8:41pm

Lots of good stuff here. I'd like to add the following.

import Foundation   // needed for Data

public enum Endian {
    case big, little
}

protocol IntegerTransform: Sequence where Element: FixedWidthInteger {
    func toInteger<I: FixedWidthInteger>(endian: Endian) -> I 
}

extension IntegerTransform {
    func toInteger<I: FixedWidthInteger>(endian: Endian) -> I {
        let f = { (accum: I, next: Element) in accum &<< next.bitWidth | I(next) }
        return endian == .big ? reduce(0, f) : reversed().reduce(0, f)
    }
}

extension Data: IntegerTransform {}
extension Array: IntegerTransform where Element: FixedWidthInteger {}

let bytes: [UInt8] = [0x01, 02, 03, 04]
print(bytes)                       // [1, 2, 3, 4]
let u64le: UInt64 = Data(bytes).toInteger(endian: .little)
print(String(u64le, radix: 16))    // 4030201
let u64be: UInt64 = Data(bytes).toInteger(endian: .big)
print(String(u64be, radix: 16))    // 1020304

let words: [UInt16] = [0xffff, 0xfffe, 1, 0]
let u64be2: UInt64 = words.toInteger(endian: .big)
print(String(u64be2, radix: 16))   // fffffffe00010000

lukasa · October 28, 2019, 4:36pm

Just for those who want a stable follow-up for Steve's example, here's a compiler explorer link to the code. This is a good example of why it's important to push on the Swift optimiser: it should always be possible to write very general safe code that generates optimal (or nearly optimal) output in the easy case and good output in the complex case.