Swift 5: Read binary file (problem with unsafe mutable pointers)

jbrohan · November 22, 2019, 3:03pm

The recent changes to Swift 5 in the area of Unsafe Buffer pointers to Data raise serious syntax issues for a beginner (me).

Does anyone have a "read binary file of Int16 to an array" that actually compiles in Swift 5? All the examples I can find of extracting items out of a Data turn out to not compile clean, and raise the same issue with unsafe pointers of one kind or another.

'withUnsafeMutableBytes' is deprecated: use withUnsafeMutableBytes(_: (UnsafeMutableRawBufferPointer) throws -> R) rethrows -> R instead

micampe · November 22, 2019, 5:42pm

Since Data can be read as a sequence of UInt8 you can use one of the techniques described here:

A simplified example of what is described there that only reads Int16 would be something like this (note that this example assumes the data is stored as big endian and that it will discard the last byte if the number of bytes in Data is odd):

var numbers: [Int16] = []
for index in data.startIndex ..< data.index(before: data.endIndex) {
    let num = Int16(data[index]) << 8 + Int16(data[data.index(after: index)])
    numbers.append(num)
}

jbrohan · November 25, 2019, 1:16pm

Thanks, but my problem is to get the Data as an array of Int16. The examples simply do not compile.

eskimo · November 26, 2019, 10:08am

@micampe’s code compiles just fine for me (in an Xcode 11.2 command-line tool project). It does not, alas, work )-: The problem is that it iterates every index of the input data, and it should be iterating every other index.

Here’s something that does work:

import Foundation

func main() {
    let data = Data([0x12, 0x34, 0x45, 0x67, 0x78])
    var numbers: [Int16] = []
    var iter = data.makeIterator()
    while true {
        guard
            let b1 = iter.next(),
            let b2 = iter.next()
        else {
            break
        }
        let num = Int16(b1) << 8 | Int16(b2)
        numbers.append(num)
    }
    print(numbers)
    // prints [4660, 17767]
    // where 4660 is 0x1234 and 17767 is 0x4567
}

main()
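For comparison, the "every other index" fix can also be written with indices rather than an iterator. This is only a sketch: the function name `bigEndianInt16s(in:)` is made up for illustration, it assumes big-endian input like the examples above, and it drops a trailing odd byte.

```swift
import Foundation

// Hypothetical helper (not from the thread): walks the data two indices
// at a time and decodes each pair of bytes as a big-endian Int16.
func bigEndianInt16s(in data: Data) -> [Int16] {
    var numbers: [Int16] = []
    numbers.reserveCapacity(data.count / 2)
    // Stop before a trailing odd byte. Data indices are Ints, but a slice
    // may not start at zero, so everything is relative to startIndex.
    let end = data.startIndex + (data.count / 2) * 2
    for index in stride(from: data.startIndex, to: end, by: 2) {
        let high = UInt16(data[index]) << 8
        let low = UInt16(data[index + 1])
        // Build the value unsigned, then reinterpret the bits as signed.
        numbers.append(Int16(bitPattern: high | low))
    }
    return numbers
}

let sample = Data([0x12, 0x34, 0x45, 0x67, 0x78])
print(bigEndianInt16s(in: sample))
// prints [4660, 17767]
```

Going through UInt16 and Int16(bitPattern:) is a design choice, not a requirement; it just keeps the sign handling explicit for bytes of 0x80 and above.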

Alternatively, you could make something that works on any sequence of bytes:

import Foundation

func asUInt16s<Input>(_ input: Input) -> AnySequence<UInt16>
where
    Input: Sequence,
    Input.Element == UInt8
{
    let s = sequence(state: input.makeIterator()) { iter -> UInt16? in
        guard
            let b1 = iter.next(),
            let b2 = iter.next()
        else {
            return nil
        }
        return UInt16(b1) << 8 | UInt16(b2)
    }
    return AnySequence(s)
}

func main() {
    let d = Data([0x12, 0x34, 0x45, 0x67, 0x78])
    print([UInt16](asUInt16s(d)))
    // prints [4660, 17767]
    let b: [UInt8] = [0x12, 0x34, 0x45, 0x67, 0x78]
    print([UInt16](asUInt16s(b)))
    // likewise
}

main()
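Since the original question asked for Int16 rather than UInt16, and the byte order of the file was not stated, here is a sketch of a variant that handles both. The name `asInt16s(_:littleEndian:)` and its `littleEndian` parameter are assumptions made for this example, not part of the code above.

```swift
// Hypothetical helper (not from the thread): decodes pairs of bytes from any
// byte sequence into Int16 values, with the byte order chosen by the caller.
func asInt16s<Input>(_ input: Input, littleEndian: Bool) -> [Int16]
where
    Input: Sequence,
    Input.Element == UInt8
{
    var result: [Int16] = []
    var iter = input.makeIterator()
    // A trailing odd byte is dropped, as in the examples above.
    while let b1 = iter.next(), let b2 = iter.next() {
        let (high, low) = littleEndian ? (b2, b1) : (b1, b2)
        let bits = UInt16(high) << 8 | UInt16(low)
        // Reinterpret the 16 bits as a signed value.
        result.append(Int16(bitPattern: bits))
    }
    return result
}

let bytes: [UInt8] = [0x12, 0x34, 0xFF, 0xFE, 0x78]
print(asInt16s(bytes, littleEndian: false))
// prints [4660, -2]
print(asInt16s(bytes, littleEndian: true))
// prints [13330, -257]
```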

There is, of course, a question of whether you should be using this approach at all. I’m a big fan of avoiding unsafe techniques in general, but it’s not always the best option.

How big is your input data? Where does it come from? And where is the output array of Int16 going? And how frequently are you doing this?

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

jbrohan · November 26, 2019, 3:20pm

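import Foundation

// Reads a file of little-endian Int16 values; a trailing odd byte is dropped.
func readFileToInt16(path: URL?) -> [Int16] {
    var numbers: [Int16] = []
    // Unwrap the URL instead of force-unwrapping it, then read the file.
    if let path = path, let data = try? Data(contentsOf: path) {
        var iter = data.makeIterator()
        while true {
            guard
                let b1 = iter.next(),
                let b2 = iter.next()
            else {
                break
            }
            // Low byte first (little endian), unlike the big-endian examples above.
            let num = Int16(b2) << 8 | Int16(b1)
            numbers.append(num)
        }
    } else {
        print("in readFileToInt16...read file failed")
    }
    return numbers
}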

Note... It will return an empty array to denote a failure to read the file.
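One drawback of signalling failure with an empty array is that an empty file and an unreadable file look the same to the caller. A possible alternative, sketched here with a made-up function name `readInt16s(from:)` and a made-up temporary file name, is to let the read error propagate:

```swift
import Foundation

// Hypothetical variant (not from the thread): throws on I/O failure
// instead of returning an empty array.
func readInt16s(from url: URL) throws -> [Int16] {
    let data = try Data(contentsOf: url)
    var numbers: [Int16] = []
    numbers.reserveCapacity(data.count / 2)
    var iter = data.makeIterator()
    while let b1 = iter.next(), let b2 = iter.next() {
        // Little endian, matching the function above: low byte first.
        numbers.append(Int16(bitPattern: UInt16(b2) << 8 | UInt16(b1)))
    }
    return numbers
}

// Usage: write a small sample file, then read it back.
let url = FileManager.default.temporaryDirectory
    .appendingPathComponent("int16-sample.bin")
do {
    try Data([0x34, 0x12, 0xFE, 0xFF]).write(to: url)
    print(try readInt16s(from: url))
    // prints [4660, -2]
} catch {
    print("read failed: \(error)")
}
```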

Many thanks to Quinn “The Eskimo!”

dhoepfl · November 27, 2019, 10:44am

The recent changes to Swift 5 in the area of Unsafe Buffer pointers to Data raise serious syntax issues for a beginner (me).

Same for me. I have failed several times to do simple mappings of data to a given type (without copying).

My take on “convert binary block to [Int16]”:

import Foundation

let data = Data([0x00, 0x01, 0x23, 0x42, 0x80, 0x00, 0x00, 0x80])

let array = data.withUnsafeBytes { (pointer: UnsafeRawBufferPointer) -> [Int16] in
    let buffer = pointer.bindMemory(to: Int16.self)
    return buffer.map { Int16(bigEndian: $0) }
}

Note that this does copy the data, converting it from net order to host order. If you do not need the array later, just use the buffer directly (within the withUnsafeBytes block):

data.withUnsafeBytes { (pointer: UnsafeRawBufferPointer) -> Void in
    let buffer = pointer.bindMemory(to: Int16.self)
    buffer.forEach { print("Native: \($0), Net Order: \(Int16(bigEndian: $0))") }
}

(Tested using Swift version 5.1.1-dev (LLVM 6e04008c7f, Swift db902a19cd))

eskimo · November 28, 2019, 9:03am

The problem with this approach is that it doesn’t handle misalignment. To continue your example, if the base address of pointer is not even, buffer is constructed with a base address that’s not even, which is not supported [1].

You can see this in action if you add one byte to data and then call data.dropFirst().withUnsafeBytes ….

There are circumstances when this approach makes sense, the most notable being in performance-sensitive code. However, my recommendation is that, in general, you avoid unsafe pointers and work byte-by-byte.
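If the bulk conversion is still wanted, one way to sidestep the alignment problem is to copy the raw bytes into storage that is already aligned for Int16, and only then fix up the byte order. This is a sketch under the same big-endian assumption as the example being discussed; the name `int16sByCopying(bigEndian:)` is invented for illustration.

```swift
import Foundation

// Hypothetical helper (not from the thread): copies the bytes into correctly
// aligned [Int16] storage, so no typed pointer is ever bound to memory that
// might be misaligned.
func int16sByCopying(bigEndian data: Data) -> [Int16] {
    let count = data.count / 2
    var result = [Int16](repeating: 0, count: count)
    result.withUnsafeMutableBytes { destination in
        // Raw byte copy; the source bytes may start at any address.
        destination.copyBytes(from: data.prefix(count * 2))
    }
    // The copied values are in file (big-endian) order; convert to host order.
    return result.map { Int16(bigEndian: $0) }
}

// dropFirst() produces the odd starting offset described above.
let padded = Data([0xAA, 0x12, 0x34, 0x45, 0x67, 0x78])
print(int16sByCopying(bigEndian: padded.dropFirst()))
// prints [4660, 17767]
```

This still uses an unsafe buffer, but only as a destination for raw bytes, which has no alignment requirement.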

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

[1] See the discussion of the start parameter on this page.

gonsolo · November 29, 2019, 1:49pm

Can you do the same with a file of Floats?

jbrohan · November 29, 2019, 3:35pm

Ordinary mortals have no business with the internal layout of Floats and Doubles.

The reason you cannot simply get a pointer to 4 bytes and interpret it as a Float is that it has to be aligned on a particular boundary in memory (4 bytes for a Float and 8 bytes for a Double, I think).

My (Quinn's really) line of code says that num:Int16 = another Int16, which was made by shifting two Int16's.

This will not work with Floats because the purpose of each byte in a Float depends where it is (mantissa or exponent).

An array of Floats is correctly aligned, so you could move your buffer to the Array using pointers, I would expect, and extract the Float.
My success with pointers is low so far. It is my opinion that such matters should be handled by compiler writers rather than ordinary mortals like me.

gonsolo · November 29, 2019, 3:43pm

Ordinary mortals have to deal with binary ply files where Floats and UInt8s are freely mixed (without alignment). I am able to read them, it's just slow (50x slower than C++); see my other thread, "Reading binary file is slow".
I was hoping to find another way that is faster.

eskimo · December 1, 2019, 3:16pm

Reading arbitrarily-aligned binary floating point numbers from a file is, indeed, a pain. Doing things byte-wise, like you do with integers, isn’t easy, which means you will likely end up having to convert between bytes and floats. The best way to do that will depend on how you decide to handle buffered I/O. Looking at your other thread, it seems that you’re still wrangling that question.
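As one illustration of converting between bytes and floats without any pointers, the four bytes can be assembled into a UInt32 and handed to Float(bitPattern:). This is a minimal sketch that assumes the file stores little-endian IEEE 754 single-precision values; the helper name `littleEndianFloat(in:at:)` and the one-tag-plus-one-float record layout are assumptions for the example.

```swift
import Foundation

// Hypothetical helper (not from the thread): reads a little-endian Float
// starting at a byte offset, or returns nil if fewer than 4 bytes remain.
func littleEndianFloat(in data: Data, at offset: Int) -> Float? {
    guard offset >= 0, offset <= data.count - 4 else { return nil }
    let start = data.startIndex + offset
    var bits: UInt32 = 0
    for i in 0..<4 {
        // Byte i contributes bits 8*i through 8*i + 7.
        bits |= UInt32(data[start + i]) << (8 * i)
    }
    return Float(bitPattern: bits)
}

// One UInt8 tag followed by one unaligned Float (1.0 is 0x3F800000).
let record = Data([0x07, 0x00, 0x00, 0x80, 0x3F])
if let value = littleEndianFloat(in: record, at: 1) {
    print(value)
    // prints 1.0
}
```

Whether this is fast enough depends on the buffering question raised above; it says nothing about I/O performance.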

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

gonsolo · December 1, 2019, 4:12pm

I'd like something like

extension Data {
        func convert<T>(to type: T.Type) -> T {
                return self.withUnsafeBytes { $0.load(as: T.self) }
        }
}

which works for unaligned Floats with swiftc -O but not in debug mode.
Is there a way to load unaligned Floats if I know that my file consists of exactly one UInt8 and one Float?

eskimo · December 2, 2019, 8:42am

[I see, reading your other thread, that you’re already past this roadblock, but I want to follow up here just so that other folks don’t get tripped up by this. My recommendation is that you not respond here, but instead continue to follow up on your other thread, because that captures a bunch more details about your specific requirements.]

You wrote:

which works for unaligned Floats with swiftc -O but not in debug mode.

Right. The problem here is that unsafe pointers must be aligned (per the doc I referenced back on 28 Nov). This isn’t a requirement of the current crop of CPUs but rather a requirement of Swift. If you don’t follow this rule, you run the risk of trapping, or other misbehaviour, on future CPUs that don’t allow misaligned access. Such problems may only show up under obscure circumstances [1], so the Swift compiler inserts a check in the debug build to avoid you accidentally shipping code that might trigger it in the field.
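To make the alignment rule concrete, here is a small sketch of how one might test whether an address satisfies a type's alignment before calling load(as:). The helper `isAligned(_:for:)` is hypothetical and is not something the standard library provides under that name.

```swift
// Hypothetical helper (not from the thread): true if the address is a
// multiple of T's alignment. Alignments are powers of two, so a mask works.
func isAligned<T>(_ pointer: UnsafeRawPointer, for type: T.Type) -> Bool {
    let mask = UInt(MemoryLayout<T>.alignment - 1)
    return UInt(bitPattern: pointer) & mask == 0
}

let storage: [Float] = [1, 2, 3]
storage.withUnsafeBytes { raw in
    guard let base = raw.baseAddress else { return }
    // Array storage is allocated with the element type's alignment.
    print(isAligned(base, for: Float.self))
    // One byte further along cannot be aligned for a 4-byte Float.
    print(isAligned(base + 1, for: Float.self))
    // load(as:) is only valid at offsets that pass the first check.
    print(raw.load(fromByteOffset: 4, as: Float.self))
}
```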

The approach you posted in the other thread — using a byte-by-byte copy to extract the value from the buffer into a local Float — is the direction I’d recommend.
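The code from the other thread is not reproduced here, so the following is only a sketch of what such a byte-by-byte copy might look like. The name `loadFloat(from:at:)` is invented, and the sketch assumes the bytes are already in the host's byte order.

```swift
import Foundation

// Hypothetical helper (not from the thread): copies 4 bytes from an arbitrary
// offset into a local Float. The local variable is always correctly aligned,
// so the source offset does not matter.
func loadFloat(from data: Data, at offset: Int) -> Float? {
    let size = MemoryLayout<Float>.size
    guard offset >= 0, offset <= data.count - size else { return nil }
    var value: Float = 0
    let start = data.startIndex + offset
    withUnsafeMutableBytes(of: &value) { destination in
        // Raw byte copy into the local; assumes host byte order.
        destination.copyBytes(from: data[start ..< start + size])
    }
    return value
}

// Build a 5-byte record: one UInt8 tag, then a Float in host byte order.
var record = Data([0x07])
withUnsafeBytes(of: Float(1.5)) { record.append(contentsOf: $0) }
print(loadFloat(from: record, at: 1) ?? .nan)
// prints 1.5
```

If the file's byte order differs from the host's, the same copy could target a UInt32 instead, followed by a byte swap and Float(bitPattern:).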

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

[1] I once helped a developer debug a problem that hung a single core on specific multicore PowerPC systems. It turned out that they were reading a misaligned floating point value that crossed a 256 MiB boundary (known as a segment in the PowerPC architecture, although that has nothing to do with segments as you might understand them from x86). The CPU would trap on such accesses, and for compatibility reasons the microkernel would handle that trap and emulate the correct result. This emulation had a bug that crashed the microkernel on that core, with the end result being that the microkernel would get stuck in a crash loop that hung the core. Weirdly, other cores on the CPU would continue to run just fine, so the Mac would keep working, just with one core pinned in a crash loop.