Hi,
I would like to convert [UInt8] to Int.
What is the recommended / safe way to do this?
Note: I am using Swift 5.1
If you're somewhat comfortable with UnsafePointer, you can do:
let a: [UInt8] = [0, 1, 0, 0, 0, 0, 0, 0]
assert(a.count * MemoryLayout<UInt8>.stride >= MemoryLayout<Int>.size)
let b = UnsafeRawPointer(a).assumingMemoryBound(to: Int.self).pointee.littleEndian
which assumes that the data is in littleEndian format.
I am not very familiar with unsafe pointers; I think I need to read up on them.
We are using big endian, so what changes would the code above need?
There's a .littleEndian at the end of the last line; that's where the conversion happens. Change it to .bigEndian.
PS
I might be breaking some UnsafePointer taboo. Perhaps someone here will be able to point that out. It works with the current implementation but might break in unpredictable ways in future versions. Come back to check regularly.
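To illustrate what the .littleEndian and .bigEndian properties mentioned above do, here is a minimal sketch using only standard-library integer APIs. The sample value is made up for illustration.

```swift
// Hypothetical sample value, chosen so that each byte is distinct.
let value: UInt32 = 0x01020304

// byteSwapped always reverses the byte order, whatever the host is.
let swapped = value.byteSwapped
print(String(swapped, radix: 16)) // 4030201

// .bigEndian / .littleEndian swap bytes only when the host's byte order
// differs, so the matching initializers round-trip on any platform.
let roundTripBE = UInt32(bigEndian: value.bigEndian)
let roundTripLE = UInt32(littleEndian: value.littleEndian)
print(roundTripBE == value, roundTripLE == value) // true true
```

On a little-endian host, .littleEndian is a no-op and .bigEndian is equivalent to byteSwapped; on a big-endian host it is the other way round.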
Thanks a lot !!
After diving back into UnsafeRawPointer docs, I believe using load(fromByteOffset:as:) would be the correct usage.
assert(a.count * MemoryLayout<UInt8>.stride >= MemoryLayout<Int>.size)
let b = UnsafeRawPointer(a).load(as: Int.self).bigEndian
You'd still have to be aware of memory alignment, but that holds true for the previous method as well.
@Lantua thank you so much. Since we are using Int in both solutions (load(as: Int.self) and assumingMemoryBound(to: Int.self)), is one of them less implementation-specific, or does working with unsafe raw pointers always rely on implementation details of the types, so that it might break when those details change?
It's largely about semantics; the memory layout for simple types should be pretty solid by now.
I'm more concerned about memory binding. In an ideal world, each part of memory would be tagged as Int or UInt8, and we're trying to read an Int out of UInt8 memory, which is not allowed in Swift. (There's also an implication that using memory with an incorrect binding could lead to undefined behavior, though I'm not sure how that works.)
assumingMemoryBound, well, assumes that you already bound that region of memory to Int at some point (which we didn't), and binding memory to a type would cause it to lose the binding to the previous type (we would lose the UInt8 binding), so we don't want that either.
So I'm trying to find a variant that doesn't change or assume the binding of the memory, and load(fromByteOffset:as:) seems to be what I want.
If you're curious, the docs as well as SE-0107 would be useful.
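One way to sidestep both the binding and the alignment questions is to copy the raw bytes into a local integer instead of reinterpreting the array's storage in place. The following is only a sketch under that assumption; the helper name loadBigEndianUInt32 is hypothetical, and UInt32 is used instead of Int so that the width is fixed.

```swift
/// Hypothetical helper: reads the first four bytes of `bytes` as a
/// big-endian UInt32, or returns nil if there are fewer than four bytes.
func loadBigEndianUInt32(from bytes: [UInt8]) -> UInt32? {
    let size = MemoryLayout<UInt32>.size
    guard bytes.count >= size else { return nil }

    var raw: UInt32 = 0
    // Copy byte-by-byte into the local value. A raw copy neither rebinds
    // the array's memory nor requires the source to be aligned.
    withUnsafeMutableBytes(of: &raw) { destination in
        bytes.withUnsafeBytes { source in
            destination.copyMemory(
                from: UnsafeRawBufferPointer(rebasing: source.prefix(size)))
        }
    }
    // The copied bytes are in big-endian order; convert to host order.
    return UInt32(bigEndian: raw)
}

print(loadBigEndianUInt32(from: [0x00, 0x00, 0x01, 0x02]) as Any) // Optional(258)
```

Using withUnsafeBytes also keeps the pointer valid for the duration of the closure, which the UnsafeRawPointer(a) form does not guarantee.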
I do this a lot because I’m always parsing network packets and other weirdo data structures, and in my experience the best way to do this is byte-wise. For example, for little endian:
let bytes: [UInt8] = [0x01, 02, 03, 04]
let u32 = bytes.reversed().reduce(0) { soFar, byte in
    return soFar << 8 | UInt32(byte)
}
print(String(u32, radix: 16)) // 4030201
Drop the reversed for big endian.
Doing this using unsafe pointers is a pain for a number of reasons:
It’s… well… unsafe. A simple mistake can ruin your whole day.
And it’s easy to make a mistake. As an example, my code above converts to UInt32 because it’s pretty darned rare that you find a network protocol that uses Int (remember that Int changes size depending on your architecture).
You have to worry about type punning.
You have to worry about memory alignment.
You have to worry about big and little endian. Admittedly the code above deals with little and big endian, but when you assemble multi-byte integers byte-by-byte, it’s hard to forget about that problem (-:
Now, don’t get me wrong, there are clearly cases where dealing with unsafe pointers is the right choice, but IMO it shouldn’t be your first choice.
Share and Enjoy
Quinn “The Eskimo!” @ DTS @ Apple
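The point about Int changing size can be checked directly. This is a small sketch using only standard-library properties; the exact value printed for Int depends on the platform the code runs on.

```swift
// Int matches the platform word size, so its width varies by architecture.
print(Int.bitWidth)               // 64 on 64-bit platforms, 32 on 32-bit ones

// Explicitly sized types have the same width everywhere, which is why
// they are the better fit for wire formats.
print(UInt32.bitWidth)            // 32
print(MemoryLayout<UInt32>.size)  // 4
```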
Thanks a lot @Lantua for explaining the concern.
I think I really need to read about unsafe raw pointers and how they are used.
Thanks a lot @eskimo, I didn't realise that Int was platform specific.
That's a nice solution to shift bits and do an OR operation, so that the bytes get added to form the final UInt32.
We can generalize @eskimo's solution a bit if we want (perhaps too much) to work with any iterator or collection of UInt8 and any FixedWidthInteger type:
extension FixedWidthInteger {
    init<I>(littleEndianBytes iterator: inout I)
    where I: IteratorProtocol, I.Element == UInt8 {
        self = stride(from: 0, to: Self.bitWidth, by: 8).reduce(into: 0) {
            $0 |= Self(truncatingIfNeeded: iterator.next()!) &<< $1
        }
    }

    init<C>(littleEndianBytes bytes: C) where C: Collection, C.Element == UInt8 {
        precondition(bytes.count == (Self.bitWidth+7)/8)
        var iter = bytes.makeIterator()
        self.init(littleEndianBytes: &iter)
    }
}
Even in this ridiculous generality, Swift generates basically optimal code with optimization enabled, which I think is pretty cool:
func foo(_ bytes: [UInt8]) -> UInt32 {
    return UInt32(littleEndianBytes: bytes)
}
output.foo([Swift.UInt8]) -> Swift.UInt32:
push rbp // setup stack frame
mov rbp, rsp
cmp qword ptr [rdi + 16], 4 // check count == 4
jne .LBB6_1
mov eax, dword ptr [rdi + 32] // 4-byte load from array
pop rbp // tear down stack frame
ret
.LBB6_1:
ud2 // trap if count != 4
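For the big-endian case raised earlier in the thread, a similar generic initializer can be sketched. This is not part of the extension above: the bigEndianBytes label is hypothetical, and the initializer is made failable (returning nil rather than trapping on a length mismatch) purely as a design choice for illustration.

```swift
extension FixedWidthInteger {
    /// Hypothetical big-endian counterpart. Returns nil unless `bytes`
    /// contains exactly as many bytes as the integer type is wide.
    init?<C>(bigEndianBytes bytes: C) where C: Collection, C.Element == UInt8 {
        guard bytes.count == Self.bitWidth / 8 else { return nil }
        var result: Self = 0
        for byte in bytes {
            // Shift what we have so far up by one byte, then append the next.
            result = (result &<< 8) | Self(truncatingIfNeeded: byte)
        }
        self = result
    }
}

let parsed = UInt32(bigEndianBytes: [0x01, 0x02, 0x03, 0x04] as [UInt8])
print(parsed.map { String($0, radix: 16) } ?? "nil") // 1020304
```

A failable initializer lets callers handle truncated input without crashing, at the cost of an optional at every call site.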
"Even in this ridiculous generality, Swift generates basically optimal code with optimization enabled"
Wow.
Share and Enjoy
Quinn “The Eskimo!” @ DTS @ Apple
I think it's quite common to parse a byte array into some logical structure, and handle e.g. endianness and different widths.
I usually use an ArraySlice with an extension that adds a throwing method to extract values with the correct endianness.
E.g. (quite pseudo):
var slice = buffer[buffer.startIndex..<buffer.endIndex]
let value = try slice.consume(type: UInt32.self, endian: .big)
let count = try slice.consume(type: Int16.self)
let string = try slice.consumeString(encoding: .utf8, length: count)
which takes care of e.g. reading out-of-bounds and type conversion.
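The consume method in the pseudocode above is not shown in the thread, so the following is only a guess at how it might look. ParseError and ByteOrder are hypothetical names, the default byte order is an assumption, and only the integer case is sketched (consumeString is left out).

```swift
// Hypothetical supporting types, not from the original post.
enum ParseError: Error { case outOfBounds }
enum ByteOrder { case big, little }

extension ArraySlice where Element == UInt8 {
    /// Removes the leading bytes needed for `T` and returns them as an
    /// integer, throwing instead of reading past the end of the slice.
    mutating func consume<T: FixedWidthInteger>(
        type: T.Type, endian: ByteOrder = .big
    ) throws -> T {
        let size = T.bitWidth / 8
        guard count >= size else { throw ParseError.outOfBounds }

        let chunk = prefix(size)
        // Put the most significant byte first, whatever the wire order is.
        let ordered: [UInt8] = endian == .big ? Array(chunk) : chunk.reversed()

        var value: T = 0
        for byte in ordered {
            value = (value &<< 8) | T(truncatingIfNeeded: byte)
        }
        removeFirst(size)
        return value
    }
}

var slice = ([0x00, 0x00, 0x01, 0x00, 0x02, 0x00, 0x41] as [UInt8])[...]
let first = try! slice.consume(type: UInt32.self, endian: .big)
let second = try! slice.consume(type: Int16.self, endian: .little)
print(first, second, slice.count) // 256 2 1
```

Because the slice shrinks as values are consumed, the parsing code never has to track offsets by hand.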
IMHO something like this is missing from the standard library, as we otherwise have to drop down to using UnsafePointers or the slightly hacky solution of reducing the buffer, when we just want to get the data parsed and get on with it. :)
Thanks a lot @scanon, the extension is really nice as it is convenient and clear at the point of use.
Thanks everyone, this was a really good learning experience for me and I found it really useful.
Is there any reason not to use load(as:) here?
Something like this for the general case:
extension ArraySlice {
    func load<T>(as type: T.Type) -> T {
        return self.withUnsafeBytes{ $0.load(as: T.self) }
    }
}
Usage:
let bytes: [UInt8] = [0, 1, 2, 3, 4, 5, 6, 7]
let x = bytes[4...].load(as: Float.self)
load(as:) requires that the pointer be suitably aligned for the type being loaded (this is something that @Joe_Groff, @Andrew_Trick and others have been chatting about improving recently). From the documentation of load(as:):
The buffer pointer plus offset must be properly aligned for accessing an instance of type T.
Your example happens to work some of the time, because there's no opportunity for the compiler to abuse the undefined behavior you're invoking, and you're running on a CPU that silently supports unaligned access to memory (and your slice is likely to be four-byte aligned by virtue of how it was created). However, on other architectures this might trap or produce unspecified results, and even on x86, it may allow the compiler to generate vectorized code that traps when the pointer is unaligned, or optimize on the assumption that it is and produce undefined behavior.
Please don't do this.
In debug builds at least, UnsafeRawPointer.load(as:) will assert on misaligned data. Clearly, it would be really nice to be able to call UnsafeRawPointer.load(as:) on unaligned data. I think we've seen enough justification by now to loosen this restriction, which won't affect any existing code. If we really want aligned raw pointer loads for performance in the future, then we could introduce an alignedLoad API later.
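Until the alignment restriction is relaxed, a Float can be read from an arbitrary offset by assembling its bit pattern byte-wise and never forming a typed load at all. This is a sketch under the assumption that the bytes are little endian; the helper name littleEndianFloat is hypothetical.

```swift
/// Hypothetical helper: interprets exactly four little-endian bytes as a
/// Float, regardless of where the slice starts in its parent array.
func littleEndianFloat(from bytes: ArraySlice<UInt8>) -> Float? {
    guard bytes.count == 4 else { return nil }
    // Build the 32-bit pattern with the most significant byte first.
    let bits = bytes.reversed().reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
    return Float(bitPattern: bits)
}

// The slice starts at offset 1, which is not 4-byte aligned.
let buffer: [UInt8] = [0xAA, 0x00, 0x00, 0x80, 0x3F]
print(littleEndianFloat(from: buffer[1...]) as Any) // Optional(1.0)
```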
Lots of good stuff here. I'd like to add the following.
import Foundation // needed for Data

public enum Endian {
    case big, little
}

protocol IntegerTransform: Sequence where Element: FixedWidthInteger {
    func toInteger<I: FixedWidthInteger>(endian: Endian) -> I
}

extension IntegerTransform {
    func toInteger<I: FixedWidthInteger>(endian: Endian) -> I {
        // Shift the accumulator by one element's width, then append the next element.
        let f = { (accum: I, next: Element) in accum &<< next.bitWidth | I(next) }
        return endian == .big ? reduce(0, f) : reversed().reduce(0, f)
    }
}

extension Data: IntegerTransform {}
extension Array: IntegerTransform where Element: FixedWidthInteger {}
let bytes: [UInt8] = [0x01, 02, 03, 04]
print(bytes)
let u64le: UInt64 = Data(bytes).toInteger(endian: .little)
print(String(u64le, radix: 16))
let u64be: UInt64 = Data(bytes).toInteger(endian: .big)
print(String(u64be, radix: 16))
let words: [UInt16] = [0xffff, 0xfffe, 1, 0]
let u64be2: UInt64 = words.toInteger(endian: .big)
print(String(u64be2, radix: 16))
Just for those who want a stable follow-up for Steve's example, here's a compiler explorer link to the code. This is a good example of why it's important to push on the Swift optimiser: it should always be possible to write very general safe code that generates optimal (or nearly optimal) output in the easy case and good output in the complex case.