File format?
In my experience it is quite rare to write if bigEndian { … } else { … }.
Most of the time this whole 2 * UInt8 -> UInt16 dance is caused by a binary file format that asks us to interpret bytes from 100 to 140 as [Int16]. In this case the format specifies whether the numbers are stored as little or big endian.
With this you do not need to write the if bigEndian { … } else { … }; you can just hard-code the endianness directly in the decoding function. (Hopefully with a comment that links to the relevant section of the documentation/standard.)
For example the protobufs documentation states (emphasis mine):
How do you figure out that this is 150? First you drop the MSB from each byte, as this is just there to tell us whether we’ve reached the end of the number (as you can see, it’s set in the first byte as there is more than one byte in the varint). These 7-bit payloads are in little-endian order. Convert to big-endian order, concatenate, and interpret as an unsigned 64-bit integer:
10010110 00000001 // Original inputs.
0010110 0000001 // Drop continuation bits.
0000001 0010110 // Convert to big-endian.
00000010010110 // Concatenate.
128 + 16 + 4 + 2 = 150 // Interpret as an unsigned 64-bit integer.
When implementing the protobuf decoder you can just hard-code the algorithm above. No need to check endianness.
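As a rough illustration, the quoted steps could be hard-coded like this. It is a minimal sketch, not the official protobuf implementation: the function name `decodeVarint`, its return shape, and the 10-byte cap are my own assumptions for the example.

```swift
/// Decodes one base-128 varint starting at `offset` (hypothetical helper).
/// Returns the value and the number of bytes consumed, or nil if the input
/// ends before the last byte or the varint runs past 10 bytes.
func decodeVarint(_ bytes: [UInt8], at offset: Int = 0) -> (value: UInt64, length: Int)? {
    var value: UInt64 = 0
    var shift: UInt64 = 0
    var index = offset

    // 10 groups of 7 bits are enough to cover 64 bits (shift 0, 7, ..., 63).
    while index < bytes.count, shift < 64 {
        let byte = bytes[index]
        index += 1
        // The low 7 bits are the payload. Groups arrive least-significant
        // first, so each group is shifted further left than the previous one.
        // No endianness check is needed: the order is fixed by the format.
        value |= UInt64(byte & 0x7F) << shift
        // A clear MSB marks the last byte of the varint.
        if byte & 0x80 == 0 {
            return (value: value, length: index - offset)
        }
        shift += 7
    }
    return nil
}
```

With the bytes from the quote, `decodeVarint([0x96, 0x01])` yields the value 150 after consuming 2 bytes.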
It may happen that your file format contains a flag that dictates the endianness. In such a case you use the value supplied in the file.
The big question is: where did you get this data? What does it represent? What does the documentation/standard say?
Is [UInt8] aligned to UInt16?
Personally I always see such cases as: we have a 50% chance that [UInt8] is UInt16 aligned. 50% is not suitable for production. (This is a simplification, but I feel it is better than relying on some internal implementation details of the Swift runtime.)
At the same time a single UInt8 in a [UInt8] is always UInt8 aligned. Thus I will always write the decoding step manually, possibly hard-coding the endianness according to the spec.
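A manual decoding step for the "bytes from 100 to 140 as [Int16]" case might look roughly like the sketch below. The function name `decodeInt16LittleEndian` is hypothetical, and little-endian is only an assumption here; the real byte order has to come from your format's documentation.

```swift
/// Decodes `bytes[range]` as little-endian Int16 values (hypothetical helper).
/// Little-endian is an assumption for this sketch: hard-code whatever the
/// specification of your format says, and link to the relevant section.
func decodeInt16LittleEndian(_ bytes: [UInt8], in range: Range<Int>) -> [Int16] {
    precondition(range.lowerBound >= 0 && range.upperBound <= bytes.count,
                 "range out of bounds")
    precondition(range.count.isMultiple(of: 2),
                 "range must cover whole Int16 values")

    var result: [Int16] = []
    result.reserveCapacity(range.count / 2)

    for offset in stride(from: range.lowerBound, to: range.upperBound, by: 2) {
        // Every read is a single UInt8, so UInt16 alignment never matters.
        let low = UInt16(bytes[offset])
        let high = UInt16(bytes[offset + 1])
        // Reassemble the 16 bits, then reinterpret them as a signed value.
        result.append(Int16(bitPattern: (high << 8) | low))
    }
    return result
}
```

For the example from the beginning of this post the call would be `decodeInt16LittleEndian(fileBytes, in: 100..<140)`, where `fileBytes` stands for whatever byte array you already have.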
Some may say that this is slower than a reinterpret-cast, but it is definitely safer. Performance can be improved later, possibly with a reinterpret-cast. That would be a separate ticket that requires additional research and possibly a deep dive into the Swift source code. This approach also gives your manager more data on what you did during your working hours:
- [IMPL-007] Implemented the file format - 24h
- [IMPL-008] Improved [UInt16] decoding performance - 12h
Final
+1 on the wrapper suggested by @lukasa. Personally I would simplify it to just the bare essentials:
- remove generic
- remove RandomAccessCollection
Unless you actually need those features.
struct UInt16View {
    private let bytes: [UInt8] // Foundation.Data/UnsafePointer whatever you have

    var count: Int { self.bytes.count / 2 }

    init(bytes: [UInt8]) {
        // Do not use '%'. Swift has 'isMultiple' for exactly this reason.
        precondition(bytes.count.isMultiple(of: 2), "…")
        self.bytes = bytes
    }

    subscript(index: Int) -> UInt16 {
        // Implement based on the standard/documentation.
        // This version is big-endian: the high byte comes first.
        let high = self.bytes[2 * index]
        let low = self.bytes[2 * index + 1]
        return UInt16(high) << 8 | UInt16(low)
    }
}
If the endianness is supplied in the file and known only at runtime, then just provide it in init and store it as a property:
init(bytes: [UInt8], endian: Endian) { … }
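Filled in, that variant could look something like the sketch below. The `Endian` enum is not a standard library type; its definition and case names are assumptions made for this example, and so is the precondition message.

```swift
/// Hypothetical byte-order flag, read from the file at runtime.
enum Endian {
    case little
    case big
}

struct UInt16View {
    private let bytes: [UInt8]
    private let endian: Endian

    var count: Int { self.bytes.count / 2 }

    init(bytes: [UInt8], endian: Endian) {
        precondition(bytes.count.isMultiple(of: 2), "byte count must be even")
        self.bytes = bytes
        self.endian = endian
    }

    subscript(index: Int) -> UInt16 {
        // Still reading one UInt8 at a time, so alignment is not an issue.
        let first = UInt16(self.bytes[2 * index])
        let second = UInt16(self.bytes[2 * index + 1])
        switch self.endian {
        case .big:
            // Big-endian: the first byte is the high byte.
            return (first << 8) | second
        case .little:
            // Little-endian: the first byte is the low byte.
            return (second << 8) | first
        }
    }
}
```

The switch sits in one place, inside the subscript, so the rest of the decoder never has to know which byte order the file uses.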