I found the following on Stack Overflow. It reads the whole file into memory, which is not what I want. What I need is to read a text file line by line.
import Foundation
// Place the file TextFile.txt in the executable directory.
let file = "TextFile"
if let path = Bundle.main.path(forResource: file, ofType: "txt") {
    do {
        let data = try String(contentsOfFile: path, encoding: .utf8)
        let myStrings = data.components(separatedBy: .newlines)
        let text = myStrings.joined(separator: "\n")
        print("\(text)")
    } catch {
        print(error)
    }
}
//hello world
//hello earth
$ pwd
/Users/jianhuali/Library/Developer/Xcode/DerivedData/hello_swift-eiidwnzfqfhbzycdgbbyhhodxfjy/Build/Products/Debug
$ ls
TextFile.txt hello_swift hello_swift.swiftmodule
$ cat TextFile.txt
hello world
hello earth
$
Do you need to read the file line by line, or just process it line by line? If the latter, you can read the whole file with String(contentsOfFile: fileName) and then split it on newlines (.split { $0.isNewline }).
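As a rough sketch of that second option, assuming the file fits comfortably in memory and is UTF-8 encoded (the helper name `linesOfFile(atPath:)` is made up for illustration):

```swift
import Foundation

/// Hypothetical helper: loads the whole file, then splits it into lines.
/// Only suitable when the file fits comfortably in memory.
func linesOfFile(atPath path: String) throws -> [String] {
    // Assumption: the file is UTF-8 text.
    let contents = try String(contentsOfFile: path, encoding: .utf8)
    // `isNewline` matches LF, CR, CR LF and other Unicode line breaks.
    // `split` omits empty subsequences by default, so blank lines are dropped.
    return contents.split(whereSeparator: { $0.isNewline }).map(String.init)
}
```

Note that this drops blank lines. If you need them, pass `omittingEmptySubsequences: false` to `split`.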
If you read a file line-by-line, it’s critical to do user-space buffering to avoid hitting the kernel for each line. Swift doesn’t have that facility, and neither does Foundation.
Implementing line buffering yourself (on top of, say, InputStream or FileHandle) is quite tricky. Rather than rolling my own, I typically use the C standard library for this. That is, I use FILE * with a Swift wrapper. Here’s an example of how you might do that.
A Swift wrapper around C’s `FILE *`
import Foundation

class QFile {

    init(fileURL: URL) {
        self.fileURL = fileURL
    }

    deinit {
        // You must close before releasing the last reference.
        precondition(self.file == nil)
    }

    let fileURL: URL

    private var file: UnsafeMutablePointer<FILE>? = nil

    func open() throws {
        guard let f = fopen(fileURL.path, "r") else {
            throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno), userInfo: nil)
        }
        self.file = f
    }

    func close() {
        if let f = self.file {
            self.file = nil
            let success = fclose(f) == 0
            assert(success)
        }
    }

    // Returns nil at end of file. Lines longer than maxLength - 1 bytes
    // are split across calls, matching the behaviour of fgets.
    func readLine(maxLength: Int = 1024) throws -> String? {
        guard let f = self.file else {
            throw NSError(domain: NSPOSIXErrorDomain, code: Int(EBADF), userInfo: nil)
        }
        var buffer = [CChar](repeating: 0, count: maxLength)
        guard fgets(&buffer, Int32(maxLength), f) != nil else {
            if feof(f) != 0 {
                return nil
            } else {
                throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno), userInfo: nil)
            }
        }
        return String(cString: buffer)
    }
}
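If all you want is a straight loop over every line, the same fgets approach can be folded into a single function so the file is always closed. This is only a sketch: `forEachLine` is a hypothetical name, not part of the wrapper above, and like readLine it splits lines longer than maxLength and leaves the trailing "\n" on each line.

```swift
import Foundation

/// Hypothetical helper, not part of QFile: calls `body` once per fgets result.
func forEachLine(atPath path: String,
                 maxLength: Int = 1024,
                 _ body: (String) throws -> Void) throws {
    guard let f = fopen(path, "r") else {
        throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno), userInfo: nil)
    }
    defer { fclose(f) }  // runs on every exit path, including throws

    var buffer = [CChar](repeating: 0, count: maxLength)
    // fgets returns nil at end of file or on error.
    while fgets(&buffer, Int32(maxLength), f) != nil {
        try body(String(cString: buffer))
    }
    // Distinguish a read error from a normal end of file.
    if ferror(f) != 0 {
        throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno), userInfo: nil)
    }
}
```

Usage would be something like `try forEachLine(atPath: "/path/to/file.txt") { line in print(line, terminator: "") }`.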
Reading the whole file into memory will cause bigger resource problems in some cases.
If the text file is sufficiently big to cause memory problems, you’ll probably want to store it compressed on disk. That becomes a whole different kettle of fish.
Personally, I’d love to see improvements in how Swift handles files, including this specific case. If you think likewise, consider engaging with Swift Evolution.
Funnily enough, I needed to do this last week. I first looked at NSFileHandle but had issues with it. I think there is a memory leak somewhere, but I haven't had time to investigate too deeply. Using it caused the data read to never be released from memory, which isn't ideal when reading a 3.3 GB file.
I ended up using the excellent work by the SwiftNIO developers and their version of FileHandle. It doesn't read a file line by line but chunk by chunk, and you can then split each chunk into lines. Try looking at the NonBlockingFileIO object: you read in the chunked data and are given a ByteBuffer, which you can in turn use to parse out primitive types or raw bytes. It was probably never intended to read multi-gigabyte files (I have a 70 GB file I need to test it on soon), but it works great for what I'm doing. A basic macOS app can read through the entire file and never go above 30 MB of memory usage if you are just processing the data, but obviously if you start caching, your memory usage will increase. One thing to note is that you will find a HUGE speed difference between running in debug mode and running in release mode.
@lukasa I was going to message you separately about this, but since this thread came up I don't suppose there is any desire to split out the file IO work from SwiftNIO so it can be used independently of SwiftNIO? It's really nice to use for low level file reading work, but as you can see in my use case I don't really need all of the rest of SwiftNIO.
I'll try SwiftNIO later. Right now, I'm trying Qt. Both Qt and C++ can do line oriented file io, and find substrings at specified position in a string.
Why do we need to reinvent so many wheels in Swift?
We could do it, but we'd have to remove the use of EventLoopFuture, as to use that requires being able to have an EventLoop and once you have one of those you basically have all the NIO code anyway.
Actually doing the extraction is moderately awkward, as we were pretty free with using our internal abstractions. For example, it uses NIO's FileRegion and FileHandle types, which bring in our _UInt56 type, as well as our Posix syscall wrappers and our IOError struct. It also uses NIO's NIOThreadPool type, which further relies on NIO's CircularBuffer type (and so also _UInt24), and NIO's Lock and Thread types.
All of these things are also used by other parts of NIO, so in order to bring out the NonBlockingFileIO object we'd end up having to create a kind of "NIO helpers" module, not unlike the already existing NIOConcurrencyHelpers module that already has Lock and Thread. This module would be a total grab bag: we'd have some weird integer types, some syscall wrappers, some errors, etc. I think the dependency graph here starts getting pretty awkward.
This means my suggestion would be probably to factor the code out and build a slightly less complex version. The actual logic is fairly simple and relies on relatively few syscalls, so you can probably build a sensible version by dropping some types entirely and just copying out little bits of the others. For example, you could not use CircularBuffer and take the mild performance hit that implies, avoid the weird integer packing we did, write simpler versions of FileHandle and FileRegion (which were never intended for use in this API anyway), and then you just need to pull out the core code and the syscall wrappers for the syscalls you need (open, close, read, write, lseek should be sufficient).
That would give you a much smaller code size story (avoiding bringing all those NIO types with the extra baggage) at the cost of very little performance and no loss of features. You could even extract our unit tests and slightly rewrite them. Change the interface to call back on DispatchQueues instead of using EventLoopFutures and you have a really great little macOS/iOS utility library.
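To give a feel for the shape of such a library, here is a very rough sketch. To be clear, this is not extracted NIO code: the function name, parameters and error handling are all invented for illustration. It just uses open, read and close on a background queue and calls back on a DispatchQueue.

```swift
import Foundation

/// Hypothetical sketch, not NIO code: reads `path` in chunks on a background
/// queue and delivers each chunk, then the final result, on `callbackQueue`.
func readChunked(path: String,
                 chunkSize: Int = 64 * 1024,
                 callbackQueue: DispatchQueue,
                 onChunk: @escaping ([UInt8]) -> Void,
                 completion: @escaping (Result<Void, Error>) -> Void) {
    DispatchQueue.global(qos: .utility).async {
        let fd = open(path, O_RDONLY)
        guard fd >= 0 else {
            let code = Int(errno)
            callbackQueue.async {
                completion(.failure(NSError(domain: NSPOSIXErrorDomain, code: code, userInfo: nil)))
            }
            return
        }
        defer { close(fd) }

        var buffer = [UInt8](repeating: 0, count: chunkSize)
        while true {
            let count = read(fd, &buffer, chunkSize)
            if count < 0 {
                if errno == EINTR { continue }  // interrupted, just retry
                let code = Int(errno)
                callbackQueue.async {
                    completion(.failure(NSError(domain: NSPOSIXErrorDomain, code: code, userInfo: nil)))
                }
                return
            }
            if count == 0 { break }  // end of file
            let chunk = Array(buffer[0..<count])
            // `sync` gives crude backpressure: the next read waits until the
            // handler has finished, so chunks do not pile up in memory.
            callbackQueue.sync { onChunk(chunk) }
        }
        callbackQueue.async { completion(.success(())) }
    }
}
```

Because chunks are delivered with `sync`, the caller must not block `callbackQueue` while waiting for the read to finish. A real library would want a proper error type and cancellation as well.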
This is mostly speculation, but my guess is that file-based operations are still extremely lacking in Swift because they haven't been a focal point yet. Most iOS apps probably don't use heavy file-based operations the way a server or a desktop app might. Another likely reason it hasn't been tackled yet is that a solution would need to support all the OSs that Swift currently supports, which leads to a more complex implementation and to API questions.
But I agree with @eskimo that this is something that sorely needs improvement in Swift. Requiring a user to drop to C just to perform this task is asking way too much IMO.
To be honest I thought there might be quite a bit of the complexity you describe to make it an independent module. Just from my very limited experience with using it last week there are a number of components that were very much tied to the event loop etc.
I don't personally have time right now to look into what it would mean to extract all of those components and simplify certain areas, and my bet is that you guys are very busy with other SwiftNIO work too. It would be nice to keep it in mind for the future, though, as it could be a good starting point for building out the IO features lacking in Swift right now. I do want to say that, from my limited experience with a small subset of SwiftNIO, it's been a pleasure to use, so thanks for all the great work.
Thanks!
If the length of the line from data file is larger than maxLength in readLine, it doesn't read the whole line. It still needs to call fgets to read the rest of the line and append it to the contents already read. I may do this later.
It still needs to call fgets to read the rest of the line and append it to the contents already read.
Be careful doing this. Reading a large file line-by-line puts you in a fundamental bind:
Limiting the line length is, obviously, annoying.
Not limiting the line length exposes you to the possibility that the file might consist of one large line, which undermines the whole point of the line-by-line approach.
The code I posted errs on the side of simplicity, adopting the same policy as fgets (which splits long lines). In real code I adopt one of the following options:
Change the interface to be able to return partial lines. This makes things more complex for the client, but it can deal with long lines correctly.
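A sketch of what a partial-line interface could look like, built on fgets like the earlier code. The `LinePiece` type and `readLinePiece` function are hypothetical names invented here, and the sketch assumes UTF-8 text with "\n" line breaks.

```swift
import Foundation

/// Hypothetical result type: one fgets-sized piece of a line.
enum LinePiece: Equatable {
    case partial(String)   // the buffer filled up before a line break was seen
    case complete(String)  // last piece of a line, with the "\n" stripped
}

/// Hypothetical sketch: reads the next piece from an already-open `FILE *`.
/// Returns nil at end of file.
func readLinePiece(from f: UnsafeMutablePointer<FILE>,
                   maxLength: Int = 1024) throws -> LinePiece? {
    var buffer = [CChar](repeating: 0, count: maxLength)
    guard fgets(&buffer, Int32(maxLength), f) != nil else {
        if feof(f) != 0 { return nil }
        throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno), userInfo: nil)
    }
    // Work on raw bytes so the check for "\n" is not affected by how
    // Swift groups "\r\n" into a single Character.
    var bytes = buffer[0..<strlen(buffer)].map { UInt8(truncatingIfNeeded: $0) }
    if bytes.last == 0x0A {
        bytes.removeLast()
        return .complete(String(decoding: bytes, as: UTF8.self))
    }
    let text = String(decoding: bytes, as: UTF8.self)
    // No line break: peek one byte to tell "buffer was full" apart from
    // "last line of the file has no trailing line break".
    let next = getc(f)
    if next < 0 { return .complete(text) }
    ungetc(next, f)
    return .partial(text)
}
```

The client appends `.partial` pieces until it sees a `.complete`, and can bail out early if a line grows past whatever limit it considers sane. One known weakness of this sketch: a piece boundary can fall in the middle of a multi-byte UTF-8 character, in which case the decoded strings will contain replacement characters.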
I'm trying to learn Swift (coming mostly from a Python background). This thread really interests me because some things which are so easy (and well documented!) in Python are for a mostly non-programmer like myself very difficult to discover in Swift.
I was reading your posts above and trying the code in a playground. Suppose a plain text file which contains these lines:
01234567
01234567
01234567
When I call readLine() per your QFile example, it always returns all the characters up to the specified maxLength. If there's a line break in the text file after only 8 characters, why does it always return maxLength characters?
Note: A few minutes after writing this it occurred to me that the problem is probably line-endings in the file I tested with, ie "\r" vs "\n" or "\r\n". My line endings in this test case appear to be ASCII char 13. What is the 'Swiftest' way to handle line-ending variance? In Python, it's as simple as something along the lines of: with open(filePath, 'rU') as f: ...
Thank you!
let filePath = "/Users/users/myFile.txt"
let f = QFile(fileURL: URL(fileURLWithPath: filePath))
do {
    try f.open()
} catch {
    print("file open error: \(error)")
}
do {
    let line = try f.readLine(maxLength: 64)
    print(line ?? "nil")
    f.close()
} catch {
    print("readline error: \(error)")
}
The code I posted was meant to be as simple as possible, which means it uses fgets. That’s a C library function, so it only deals with C line breaks, which on Apple platforms is \n.
If the string you’re dealing with fits easily in memory, you can support different line break styles with code like this:
let s1 = "Hello\nCruel\nWorld!" // Unix, LF
let l1 = s1.split(whereSeparator: { $0.isNewline })
print(l1) // ["Hello", "Cruel", "World!"]
let s2 = "Hello\rCruel\rWorld!" // Traditional Mac OS, CR
let l2 = s2.split(whereSeparator: { $0.isNewline })
print(l2) // ["Hello", "Cruel", "World!"]
let s3 = "Hello\r\nCruel\r\nWorld!" // Windows, CR LF
let l3 = s3.split(whereSeparator: { $0.isNewline })
print(l3) // ["Hello", "Cruel", "World!"]
If you want to support arbitrary line breaks while streaming a large file, things get trickier (-:
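To illustrate where the trickiness comes from, here is a minimal sketch of a splitter that you feed chunks of bytes from whatever reader you use. It treats LF, CR and CR LF as line breaks, including a CR LF pair that straddles two chunks. `LineSplitter` is a hypothetical type invented for this example. It assumes UTF-8 text and ignores the other Unicode line breaks that `isNewline` recognises.

```swift
import Foundation

/// Hypothetical sketch: accepts byte chunks of any size and emits complete
/// lines, treating LF, CR and CR LF as line breaks. Assumes UTF-8 text.
struct LineSplitter {
    private var pending = [UInt8]()  // bytes of the current, unfinished line
    private var lastWasCR = false    // true if the previous byte was a CR

    /// Feeds one chunk and returns the lines it completed.
    mutating func feed<S: Sequence>(_ chunk: S) -> [String] where S.Element == UInt8 {
        var lines = [String]()
        for byte in chunk {
            switch byte {
            case 0x0A:  // LF
                // An LF right after a CR is the second half of CR LF;
                // the line was already emitted when the CR was seen.
                if !lastWasCR { lines.append(flush()) }
                lastWasCR = false
            case 0x0D:  // CR
                lines.append(flush())
                lastWasCR = true
            default:
                pending.append(byte)
                lastWasCR = false
            }
        }
        return lines
    }

    /// Call once at end of input to get a final line with no trailing break.
    mutating func finish() -> String? {
        lastWasCR = false
        return pending.isEmpty ? nil : flush()
    }

    private mutating func flush() -> String {
        defer { pending.removeAll(keepingCapacity: true) }
        return String(decoding: pending, as: UTF8.self)
    }
}
```

Unlike `split`, this keeps empty lines. It also buffers an unfinished line without any limit, so a file consisting of one huge line still ends up entirely in memory.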