I'm trying to read large files as memory-efficiently and as fast as I can.
Right now I am using FileHandle.availableData, which reads in 4 KB chunks, which is great:
    func scan(advance: Int = 1) -> UInt8 {
        if data.isEmpty {
            return 0
        }
        if index >= data.count {
            return 0
        }
        let c = data[index]
        index += advance
        if index == data.count {
            index = 0
            data = file.availableData
        }
        return c
    }
But performance is not optimal: it takes 1 s to read 50 MB. Running perf, I get:
29,00% bla libswiftCore.so [.] swift_beginAccess
26,66% bla bla [.] $s3bla4scan7advances5UInt8VSi_tF
18,94% bla libFoundation.so [.] $s10Foundation4DataV15_RepresentationOys5UInt8VSicig
5,36% bla ld-2.31.so [.] __tls_get_addr
Is there a way to get rid of "swift_beginAccess"? I'm already compiling with
First: I found InputStream which seems more appropriate.
Platform: Linux
File size: 1GB or more
File type: ASCII text
Now I am using InputStream.read() which seems fast enough and interpreting the resulting UInt8 myself since everything else seems too slow (especially String and Character).
Reading a 1 GB file with an InputStream buffer size of 16k takes 2 seconds without further parsing.
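For future readers, here is a minimal sketch of what chunked reading with InputStream and working on the raw UInt8 values might look like. The function name `countNewlines` and the newline-counting task are illustrative assumptions, not the code actually used above; the 16 KB default simply mirrors the buffer size mentioned.

```swift
import Foundation

/// Counts newline bytes (0x0A) in a file by reading fixed-size chunks.
/// `countNewlines` is a hypothetical helper, not part of Foundation.
/// Returns nil if the stream cannot be created or a read fails.
func countNewlines(atPath path: String, bufferSize: Int = 16 * 1024) -> Int? {
    guard let stream = InputStream(fileAtPath: path) else { return nil }
    stream.open()
    defer { stream.close() }

    // One reusable buffer, so nothing is allocated per chunk.
    var buffer = [UInt8](repeating: 0, count: bufferSize)
    var newlines = 0

    while true {
        // read(_:maxLength:) returns the number of bytes read,
        // 0 at end of stream, or a negative value on error.
        let n = stream.read(&buffer, maxLength: bufferSize)
        if n < 0 { return nil }
        if n == 0 { break }

        // Work on the raw bytes directly instead of going through String/Character.
        buffer.withUnsafeBufferPointer { bytes in
            for i in 0..<n where bytes[i] == 0x0A {
                newlines += 1
            }
        }
    }
    return newlines
}
```

Only the first `n` bytes of the buffer are valid after each read, which is why the loop does not iterate over the whole array.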
If anyone is still interested in the part of the question about swift_beginAccess: it would seem that swift_beginAccess gets called even when exclusivity enforcement is set to none (judging from the comments in the definition of swift_beginAccess in swift/Exclusivity.cpp, at commit e37eb35c7c6e9faeabe2bda7de59dc92a718d779 of apple/swift on GitHub). I may be interpreting that incorrectly, but from your evidence it does seem to be the case (unless the compiler option is just broken).
It looks to me like your scan function is part of a class, and swift_beginAccess is called because you are accessing members of the class. Making the class final or refactoring it into a struct would most likely get rid of the swift_beginAccess calls (not so sure about final, but pretty sure about struct). I'm surprised that the calls had such a high performance impact, though, given that they short-circuit pretty quickly when exclusivity is not being enforced. Reading more bytes at once could also minimise the performance impact of swift_beginAccess.
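To make the struct suggestion concrete, here is a rough sketch of a value-type scanner that also refills in larger chunks. The name `ByteScanner`, the 64 KB default, and the use of readData(ofLength:) instead of availableData are my own assumptions, and I have not measured whether this actually removes swift_beginAccess from the profile.

```swift
import Foundation

/// A value-type byte scanner (hypothetical name) that refills its buffer
/// from a FileHandle in large chunks.
struct ByteScanner {
    private let file: FileHandle
    private let chunkSize: Int
    private var buffer: [UInt8] = []
    private var index = 0

    // The 64 KB default is an arbitrary choice for illustration.
    init(file: FileHandle, chunkSize: Int = 64 * 1024) {
        self.file = file
        self.chunkSize = chunkSize
    }

    /// Returns the next byte, or nil at end of file.
    mutating func next() -> UInt8? {
        if index >= buffer.count {
            // readData(ofLength:) returns empty Data at end of file.
            let data = file.readData(ofLength: chunkSize)
            if data.isEmpty { return nil }
            // Copy into a plain array so each byte access is a simple array
            // subscript rather than a call into Data.
            buffer = [UInt8](data)
            index = 0
        }
        let c = buffer[index]
        index += 1
        return c
    }
}

/// Example use: count the bytes in a file (returns nil if it cannot be opened).
func byteCount(atPath path: String) -> Int? {
    guard let file = FileHandle(forReadingAtPath: path) else { return nil }
    defer { file.closeFile() }
    var scanner = ByteScanner(file: file)
    var count = 0
    while scanner.next() != nil { count += 1 }
    return count
}
```

Returning an optional instead of 0 at end of file is a deliberate change from the original scan function, since 0 is also a valid byte value.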
I know that the question is quite old and you already found an alternative solution, but I was working on a similar problem and some future reader of the thread might find it useful.