Reading a binary file is slow

gonsolo · November 29, 2019, 2:55pm

Hi!

I'm trying to read binary ply files with mixed and/or unaligned UInt8 and Float values. For a 16 MB file (4 million Floats), the C++ version takes 0.09 s to read it, whereas the Swift version takes nearly 5 s. Is it possible to make this faster? Right now I'm using something like this:

extension Data {
    func convert<T>(at index: Data.Index, to type: T.Type) -> (T, Self.Index) {
        let size = MemoryLayout<T>.size
        let end = index + size
        let view = self[index..<end]
        let value = view.withUnsafeBytes { $0.load(as: T.self) }
        return (value, end)
    }
}

Here are my files:

common.h:

const int count = 4 * 1024 * 1024;

write.cc:

#include <cstdint>
#include <fstream>
#include "common.h"

using namespace std;

template <typename T>
void write(ostream& out, T value) {
    out.write((char*)&value, sizeof(value));
}

void write() {
    ofstream out("bla.ply", ios::binary);
    uint8_t x = 124;
    write(out, x);

    float f = 13.3;
    for(int i = 0; i < count; ++i) {
        write(out, f);
    }

    out.close();
}

int main() {
    write();
}

read.cc:

#include <cstdint>
#include <fstream>
#include <iostream>
#include "common.h"

using namespace std;

template <typename T>
void read(istream& in, T& value) {
    in.read((char*)&value, sizeof(value));
}

void read() {
    ifstream in("bla.ply", ios::binary);
    uint8_t x;
    read(in, x);
    //cout << (int)x << endl;

    float f;
    for(int i = 0; i < count; ++i) {
        read(in, f);
        //cout << f << endl;
    }

    in.close();
}

int main() {
    read();
}

readSwift.swift:

import Foundation

let count = 4 * 1024 * 1024
//let count = 1

enum E: Error {
    case e
}

extension Data {
    func convert<T>(at index: Data.Index, to type: T.Type) -> (T, Self.Index) {
        let size = MemoryLayout<T>.size
        let end = index + size
        let view = self[index..<end]
        let value = view.withUnsafeBytes { $0.load(as: T.self) }
        return (value, end)
    }
}

func main() {
    guard let handle = FileHandle(forReadingAtPath: "bla.ply") else {
        exit(0)
    }
    let data = handle.readDataToEndOfFile()
    var index = data.startIndex

    var x: UInt8
    (x, index) = data.convert(at: index, to: UInt8.self)
    //print(x)

    var f: Float
    for _ in 0..<count {
        (f, index) = data.convert(at: index, to: Float.self)
        //print(f)
    }
}

main()

lukasa · December 1, 2019, 10:33am

Before we go further, I have to ask: are you compiling with optimisations turned on? (That is, have you passed either -O to swiftc or -c release to swift build or swift run?) If you have not turned on optimisations in Swift it will be hilariously slow.

Assuming you have done that:

Unfortunately these programs are really nothing like each other.

FileHandle.readDataToEndOfFile is not an equivalent of a C++ iostream: for files of many sizes it will bring the entire file into memory at once.

Similarly, Data.convert is creating slices of Data for no reason: let view = self[index..<end] is entirely unnecessary, as you can instead ask for the offset of index from self.startIndex and then simply use self.withUnsafeBytes and load from the appropriate offset.
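A minimal sketch of that suggestion, using a hypothetical helper named loadValue(at:as:) (not a Foundation API): it turns the Data.Index into a byte offset relative to startIndex and loads from the whole buffer rather than from a slice. One caveat: load(fromByteOffset:as:) requires the address to be aligned for T, and in debug builds it traps when it is not, so this form only suits aligned data, not the one-UInt8-then-Floats ply layout in this thread.

```swift
import Foundation

extension Data {
    /// Hypothetical helper, not a Foundation API: reads a T starting at `index`
    /// without creating a slice of the Data, and returns the index just past it.
    ///
    /// `index` is a Data.Index, so it is turned into a byte offset relative to
    /// startIndex (a slice of a larger Data does not start at 0).
    /// load(fromByteOffset:as:) requires the resulting address to be aligned
    /// for T; a misaligned load traps in debug builds.
    func loadValue<T>(at index: Data.Index, as type: T.Type) -> (T, Data.Index) {
        let size = MemoryLayout<T>.size
        precondition(index >= startIndex && index + size <= endIndex, "read past end of data")
        let offset = index - startIndex
        let value = withUnsafeBytes { $0.load(fromByteOffset: offset, as: T.self) }
        return (value, index + size)
    }
}
```

Because the offset is measured from startIndex, the same call also works on a slice such as data[8..<32], whose first index is 8 rather than 0.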

However, knowing where the time is going is hard without first profiling the outcome. Try using Instruments Time Profiler to see where you’re spending your time. That may help you determine what your Swift program is doing.

gonsolo · December 1, 2019, 11:16am

Yes, this was with -Ounchecked.

That's via mmap, right? Do you have a better idea how to read binary files byte by byte? I couldn't find anything in the documentation.

Ok, that was it. I changed that to load(fromByteOffset: index, as: T.self) and timings are down from 4.6s to 0.04s.

I would have assumed, though, that creating a view is not that expensive.

Thanks!

lukasa · December 1, 2019, 12:16pm

The cost of creating a view is very dependent on the data structure itself, and how it is implemented. For a CoW data type like Data, you can generally assume that at the bare minimum you will incur some ARC traffic to correctly reflect that mutations to the original Data must not affect your slice, and vice versa.

Slices are a very good idea when what you want is to compute on a restricted subset of the data, but when you're simply going to create a pointer and load from it (an operation that is not bounds checked) the slice is just noise.

Data does not promise that: in fact, it rarely promises exactly how it will load things into memory. Sometimes it'll mmap them, sometimes it'll just allocate a buffer and call read. If you want to read byte-by-byte, you are likely to be best served by using either the length-based reading methods on FileHandle (which I would expect to delegate to read under the hood) or to wrap read yourself.
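One way to wrap read yourself, sketched under assumptions: the class BufferedFileReader, its methods and its 64 KiB default buffer size are hypothetical; only open, read and close (from Glibc on Linux, Darwin on Apple platforms) are real POSIX APIs. The idea is to read the file in large chunks and hand out small values from that buffer, so there is one system call per chunk instead of one per Float.

```swift
#if canImport(Glibc)
import Glibc
#elseif canImport(Darwin)
import Darwin
#endif

/// Hypothetical helper (not a Foundation or stdlib API): a minimal buffered
/// reader over the POSIX open/read/close calls.
final class BufferedFileReader {
    private let fd: Int32
    private var buffer: [UInt8]
    private var start = 0  // index of the next unread byte in `buffer`
    private var end = 0    // one past the last valid byte in `buffer`

    init?(path: String, bufferSize: Int = 64 * 1024) {
        let fd = open(path, O_RDONLY)
        guard fd >= 0 else { return nil }
        self.fd = fd
        self.buffer = [UInt8](repeating: 0, count: max(bufferSize, 1))
    }

    deinit { close(fd) }

    /// Fills `destination` completely, refilling the buffer from the file as
    /// needed. Returns false at end of file or on a read error (this sketch
    /// does not retry on EINTR).
    func readBytes(into destination: UnsafeMutableRawBufferPointer) -> Bool {
        var copied = 0
        while copied < destination.count {
            if start == end {
                // Buffer exhausted: ask the kernel for the next chunk.
                let n = buffer.withUnsafeMutableBytes { read(fd, $0.baseAddress, $0.count) }
                if n <= 0 { return false }  // 0 means end of file, -1 an error
                start = 0
                end = n
            }
            let chunk = min(destination.count - copied, end - start)
            buffer.withUnsafeBytes { source in
                (destination.baseAddress! + copied)
                    .copyMemory(from: source.baseAddress! + start, byteCount: chunk)
            }
            start += chunk
            copied += chunk
        }
        return true
    }

    /// Copies the next MemoryLayout<T>.size bytes into `value`. Only meant for
    /// trivial types such as UInt8 or Float stored in native byte order.
    func readValue<T>(into value: inout T) -> Bool {
        return withUnsafeMutableBytes(of: &value) { readBytes(into: $0) }
    }
}
```

Usage mirrors the C++ reader: create the reader once, then call readValue(into:) with a UInt8 for the header and a Float for each value. Copying bytes instead of loading through a typed pointer also means the unaligned layout is not a problem.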

Incidentally, I propose forgetting that -Ounchecked is a thing. While it technically works, removing the assertions around bounds checking eliminates all memory safety in Swift, so you're just writing a weird high-level C instead. -O and -Os are the optimisation levels worth using.

gonsolo · December 1, 2019, 12:39pm

I see.

I read the file via FileHandle.readDataToEndOfFile(). The only other reading method I see in Apple's FileHandle documentation is readData(ofLength: Int), which also returns Data. What would be the difference here?

Are you referring to read from Glibc/Darwin? I couldn't find any documentation of that either.

Ok, I was trying out things. It made no difference at all so I will remove that.

Many thanks again,
g

gonsolo · December 1, 2019, 4:35pm

I tested reading from a FileHandle via readData(ofLength:) and reading from Data by indexing from Data.startIndex.
The latter is approximately 20x faster. Since I am going to read the whole file anyway, I have no problem with using FileHandle.readDataToEndOfFile().

I did not try Glibc.read.

import Foundation

let count = 1 * 1024 * 1024

extension Data {
	func convert<T>(to type: T.Type) -> T {
		return self.withUnsafeBytes { $0.load(as: T.self) }
	}

	func convert<T>(at index: Data.Index, to type: T.Type) -> (T, Data.Index) {
		let value = self.withUnsafeBytes { $0.load(fromByteOffset: index, as: T.self) }
		return (value, index + MemoryLayout<T>.size)
	}
}

func readFromHandle() {
	guard let handle = FileHandle(forReadingAtPath: "bla.ply") else { return }
	let data = handle.readData(ofLength: 1)
	let _ = data.convert(to: UInt8.self)
	for _ in 0..<count {
		let data = handle.readData(ofLength: MemoryLayout<Float>.size)
		let _ = data.convert(to: Float.self)
	}
}

func readFromData() {
	guard let handle = FileHandle(forReadingAtPath: "bla.ply") else { return }
	let data = handle.readDataToEndOfFile()
	var index = data.startIndex
	(_, index) = data.convert(at: index, to: UInt8.self)
	for _ in 0..<count {
		(_, index) = data.convert(at: index, to: Float.self)
	}
}

//readFromHandle()
readFromData()

Unfortunately, the unaligned load(fromByteOffset:as:) in convert(at:to:) gives a fatal error in debug mode. I guess I have to resort to C for that.

lukasa · December 1, 2019, 9:32pm

No, that API is just a bit awkward because it doesn’t support unaligned loads. You can use this:

let result = data.withUnsafeBytes { src in
    var result: Float = 0.0
    withUnsafeBytes(of: &result) { dst in
        memcpy(dst, src, MemoryLayout<Float>.size)
    }
    return result
}

There is an open Swift bug to add an API like load that supports unaligned access.
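An API of that kind was added later: Swift 5.7 introduced loadUnaligned(fromByteOffset:as:) on the raw pointer types (SE-0349). A sketch assuming Swift 5.7 or later, with a hypothetical helper named loadUnalignedValue(at:as:) that keeps the shape of the original convert(at:to:) but needs neither a slice nor memcpy:

```swift
// Requires Swift 5.7 or later, where SE-0349 added loadUnaligned(fromByteOffset:as:).
import Foundation

extension Data {
    /// Hypothetical helper, not a Foundation API: reads a T at `index` whatever
    /// its alignment, and returns the index just past it.
    func loadUnalignedValue<T>(at index: Data.Index, as type: T.Type) -> (T, Data.Index) {
        let size = MemoryLayout<T>.size
        precondition(index >= startIndex && index + size <= endIndex, "read past end of data")
        let offset = index - startIndex  // byte offset within this Data (slices need not start at 0)
        let value = withUnsafeBytes { $0.loadUnaligned(fromByteOffset: offset, as: T.self) }
        return (value, index + size)
    }
}
```

With this, reading the UInt8 header and then each Float at offsets 1, 5, 9, ... works in debug builds as well, since no alignment precondition applies.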

gonsolo · December 1, 2019, 9:54pm

This does not compile for me:

bla.swift:4:35: error: unable to infer complex closure return type; add explicit type to disambiguate

Adding the type like this

let result: Float = ...

results in

bla.swift:7:21: error: cannot convert value of type 'UnsafeRawBufferPointer' to expected argument type 'UnsafeRawPointer'

This is what I ended up with. This memcpy version is 15x faster than a comparable one based on copyBytes:

func convert<T>(data: Data, at index: Data.Index, to value: inout T) -> Data.Index {
        let size = MemoryLayout<T>.size
        _ = withUnsafeMutableBytes(of: &value) { (valuePointer) -> Void in
                data.withUnsafeBytes { (dataPointer) -> Void in
                        let source = dataPointer.baseAddress! + index
                        let destination = valuePointer.baseAddress!
                        memcpy(destination, source, size)
                }
        }
        return index + size
}

lukasa · December 2, 2019, 7:45am

Ah yes, the baseAddress addition was very necessary. This looks right to me.
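A usage sketch of gonsolo's final memcpy-based convert(data:at:to:), repeated so the block is self-contained. It carries two changes that are not in the post: a bounds check, and a source offset measured from data.startIndex so that Data slices also work. The parse(_:floatCount:) wrapper is hypothetical and mirrors the layout from the first post (one UInt8 followed by Floats).

```swift
import Foundation

// gonsolo's convert, with a bounds check and a startIndex-relative offset added.
// Only meant for trivial types such as UInt8 or Float.
func convert<T>(data: Data, at index: Data.Index, to value: inout T) -> Data.Index {
    let size = MemoryLayout<T>.size
    precondition(index >= data.startIndex && index + size <= data.endIndex, "read past end of data")
    withUnsafeMutableBytes(of: &value) { (valuePointer: UnsafeMutableRawBufferPointer) in
        data.withUnsafeBytes { (dataPointer: UnsafeRawBufferPointer) in
            let source = dataPointer.baseAddress! + (index - data.startIndex)
            // memcpy has no alignment requirement, unlike load(fromByteOffset:as:).
            _ = memcpy(valuePointer.baseAddress!, source, size)
        }
    }
    return index + size
}

// Hypothetical usage: one UInt8 header followed by `floatCount` Floats,
// matching the layout written by the C++ program in the first post.
func parse(_ data: Data, floatCount: Int) -> (UInt8, [Float]) {
    var index = data.startIndex
    var header: UInt8 = 0
    index = convert(data: data, at: index, to: &header)
    var floats: [Float] = []
    floats.reserveCapacity(floatCount)
    var f: Float = 0
    for _ in 0..<floatCount {
        index = convert(data: data, at: index, to: &f)
        floats.append(f)
    }
    return (header, floats)
}
```

The inout parameter means the caller supplies an initialized variable of the target type, which is what lets the function stay generic without needing a way to create a zero value of T.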