I'm trying to read binary PLY files with mixed and/or unaligned UInt8 and Float values. For a 16 MB file (4 million Floats), the C++ version takes 0.09 s to read, whereas the Swift version takes nearly 5 s. Is it possible to make this faster? Right now I'm using something like this:
extension Data {
    func convert<T>(at index: Data.Index, to type: T.Type) -> (T, Self.Index) {
        let size = MemoryLayout<T>.size
        let end = index + size
        let view = self[index..<end]
        let value = view.withUnsafeBytes { $0.load(as: T.self) }
        return (value, end)
    }
}
Here are my files:
common.h:
const int count = 4 * 1024 * 1024;
write.cc:
#include <fstream>
#include <iostream>
#include "common.h"
using namespace std;
template <typename T>
void write(ostream& out, T value) {
    out.write((char*)&value, sizeof(value));
}
    float f;
    for (int i = 0; i < count; ++i) {
        read(in, f);
        //cout << f << endl;
    }
    in.close();
}
int main() {
    read();
}
readSwift.swift:
import Foundation
let count = 4 * 1024 * 1024
//let count = 1
enum E: Error {
    case e
}
extension Data {
    func convert<T>(at index: Data.Index, to type: T.Type) -> (T, Self.Index) {
        let size = MemoryLayout<T>.size
        let end = index + size
        // Slice of the bytes for this value; indices are shared with the parent Data.
        let view = self[index..<end]
        let value = view.withUnsafeBytes { $0.load(as: T.self) }
        return (value, end)
    }
}
func main() {
    guard let handle = FileHandle(forReadingAtPath: "bla.ply") else {
        exit(0)
    }
    let data = handle.readDataToEndOfFile()
    var index = data.startIndex
    // One leading byte, then `count` Floats.
    var x: UInt8
    (x, index) = data.convert(at: index, to: UInt8.self)
    //print(x)
    var f: Float
    for _ in 0..<count {
        (f, index) = data.convert(at: index, to: Float.self)
        //print(f)
    }
}
main()
Before we go further, I have to ask: are you compiling with optimisations turned on? (That is, have you passed either -O to swiftc or -c release to swift build or swift run?) If you have not turned on optimisations in Swift it will be hilariously slow.
Assuming you have done that:
Unfortunately these programs are really nothing like each other.
FileHandle.readDataToEndOfFile is not equivalent to a C++ iostream: for many file sizes it will bring the entire file into memory at once.
Similarly, Data.convert is creating slices of Data for no reason: let view = self[index..<end] is entirely unnecessary, as you can instead ask for the offset of index from self.startIndex and then simply use self.withUnsafeBytes and load from the appropriate offset.
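The slice-free approach described above can be sketched roughly as follows. The helper name `value(at:as:)` is hypothetical, not part of Foundation. Note that `load` requires the resulting address to be suitably aligned for `T`, so this sketch only covers reads at aligned offsets.

```swift
import Foundation

extension Data {
    /// Hypothetical helper: loads a `T` at `index` without creating a slice.
    /// Assumes the byte offset is correctly aligned for `T`.
    func value<T>(at index: Data.Index, as type: T.Type) -> (T, Data.Index) {
        // A sliced Data keeps its parent's indices, so convert the index
        // into a byte offset relative to startIndex first.
        let offset = index - startIndex
        let value = withUnsafeBytes { $0.load(fromByteOffset: offset, as: T.self) }
        return (value, index + MemoryLayout<T>.size)
    }
}
```

The subtraction matters because a `Data` obtained by slicing does not necessarily start at index 0, while the buffer passed to `withUnsafeBytes` always starts at offset 0.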
However, knowing where the time is going is hard without first profiling the outcome. Try using Instruments Time Profiler to see where you’re spending your time. That may help you determine what your Swift program is doing.
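Short of a full Instruments session, a coarse wall-clock timer can at least show which variant is slower. This is a minimal sketch; `measure` is a hypothetical helper (not a Foundation API), and it uses `ProcessInfo.processInfo.systemUptime` as its clock.

```swift
import Foundation

/// Hypothetical helper: runs `body` once and reports the elapsed wall-clock time.
func measure<R>(_ label: String, _ body: () throws -> R) rethrows -> (result: R, seconds: TimeInterval) {
    let start = ProcessInfo.processInfo.systemUptime
    let result = try body()
    let seconds = ProcessInfo.processInfo.systemUptime - start
    print("\(label): \(seconds) s")
    return (result, seconds)
}
```

This only gives totals for a whole call, so it is no substitute for a profiler, and the numbers are only meaningful for an optimised build.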
The cost of creating a view is very dependent on the data structure itself, and how it is implemented. In general you can assume for a CoW data type like Data that at the bare minimum you will need to encounter some ARC traffic to correctly reflect that mutations to the original Data must not affect your slice and vice versa.
Slices are a very good idea when what you want is to compute on a restricted subset of the data, but when you're simply going to create a pointer and load from it (an operation that is not bounds checked) the slice is just noise.
Data does not promise that: in fact, it rarely promises exactly how it will load things into memory. Sometimes it'll mmap them, sometimes it'll just allocate a buffer and call read. If you want to read byte-by-byte, you are likely to be best served by using either the length-based reading methods on FileHandle (which I would expect to delegate to read under the hood) or to wrap read yourself.
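For reference, wrapping `read` yourself could look something like the sketch below. It assumes the POSIX `open`, `read`, and `close` functions are visible through `import Foundation` on your platform (Darwin or Glibc). The function name `readFile(atPath:chunkSize:)` is hypothetical, and error handling is kept to a bare minimum.

```swift
import Foundation

/// Hypothetical helper: reads a whole file through POSIX `read`, one chunk at a time.
/// Returns nil if the file cannot be opened or a read fails.
func readFile(atPath path: String, chunkSize: Int = 64 * 1024) -> [UInt8]? {
    let fd = open(path, O_RDONLY)
    guard fd >= 0 else { return nil }
    defer { _ = close(fd) }

    var contents: [UInt8] = []
    var chunk = [UInt8](repeating: 0, count: chunkSize)
    while true {
        // read returns the number of bytes read, 0 at end of file, -1 on error.
        let n = chunk.withUnsafeMutableBytes { read(fd, $0.baseAddress, $0.count) }
        if n < 0 { return nil }
        if n == 0 { break }
        contents.append(contentsOf: chunk[0..<n])
    }
    return contents
}
```

Reading in large chunks keeps the number of system calls low. Calling `read` once per value would pay the system call cost millions of times for a file of this size.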
Incidentally, I propose forgetting that -Ounchecked is a thing. While it technically works, removing the assertions around bounds checking eliminates all memory safety in Swift, so you're just writing a weird high-level C instead. -O and -Os are the optimisation levels worth using.
I read the file via FileHandle.readDataToEndOfFile(). The only other method I see in the FileHandle page of the Apple Developer Documentation is readData(ofLength: Int), which also returns Data. What would be the difference here?
Are you referring to read from Glibc/Darwin? I couldn't find any documentation of that either.
Ok, I was trying out things. It made no difference at all so I will remove that.
I tested reading from a FileHandle via readData(ofLength:) and from Data via Data.startIndex.
The latter is approximately 20x faster. Since I am going to read the whole file anyway I have no problem with using FileHandle.readDataToEndOfFile().
I did not try Glibc.read.
import Foundation
let count = 1 * 1024 * 1024
extension Data {
    func convert<T>(to type: T.Type) -> T {
        return self.withUnsafeBytes { $0.load(as: T.self) }
    }
    func convert<T>(at index: Data.Index, to type: T.Type) -> (T, Data.Index) {
        // Byte offset relative to startIndex, so this also works on sliced Data.
        let value = self.withUnsafeBytes { $0.load(fromByteOffset: index - startIndex, as: T.self) }
        return (value, index + MemoryLayout<T>.size)
    }
}
func readFromHandle() {
    guard let handle = FileHandle(forReadingAtPath: "bla.ply") else { return }
    let data = handle.readData(ofLength: 1)
    let _ = data.convert(to: UInt8.self)
    for _ in 0..<count {
        // A Float is 4 bytes, so read exactly one value per call.
        let data = handle.readData(ofLength: MemoryLayout<Float>.size)
        let _ = data.convert(to: Float.self)
    }
}
func readFromData() {
    guard let handle = FileHandle(forReadingAtPath: "bla.ply") else { return }
    let data = handle.readDataToEndOfFile()
    var index = data.startIndex
    (_, index) = data.convert(at: index, to: UInt8.self)
    for _ in 0..<count {
        (_, index) = data.convert(at: index, to: Float.self)
    }
}
//readFromHandle()
readFromData()
Unfortunately, this gives a fatal error in debug mode. I guess I have to resort to C for that.
No, that API is just a bit awkward because it doesn’t support unaligned loads. You can use this:
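let result = data.withUnsafeBytes { src -> Float in
    var result: Float = 0.0
    // The destination must be mutable; memcpy copies byte-wise, so src need not be aligned.
    withUnsafeMutableBytes(of: &result) { dst in
        _ = memcpy(dst.baseAddress!, src.baseAddress!, MemoryLayout<Float>.size)
    }
    return result
}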
There is an open Swift bug to add an API like load that supports unaligned access.
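Such an API has since been added: Swift 5.7 introduced `loadUnaligned(fromByteOffset:as:)` on the raw pointer types (SE-0349). A minimal sketch, assuming a Swift 5.7 or newer toolchain and the layout used in this thread (one UInt8 followed by Floats); the function name `readPayload(from:floatCount:)` is hypothetical:

```swift
import Foundation

/// Hypothetical helper: reads one leading UInt8 and then `floatCount` Floats,
/// none of which need to be aligned. Assumes Swift 5.7+ for `loadUnaligned`.
func readPayload(from data: Data, floatCount: Int) -> (tag: UInt8, values: [Float]) {
    return data.withUnsafeBytes { raw -> (tag: UInt8, values: [Float]) in
        let tag = raw.load(as: UInt8.self)
        var values: [Float] = []
        values.reserveCapacity(floatCount)
        // The Floats start right after the single tag byte, at an odd offset.
        var offset = MemoryLayout<UInt8>.size
        for _ in 0..<floatCount {
            values.append(raw.loadUnaligned(fromByteOffset: offset, as: Float.self))
            offset += MemoryLayout<Float>.size
        }
        return (tag, values)
    }
}
```

On older toolchains the memcpy approach above remains the way to go.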