What is the best way to work with the file system?

Currently, there is Apple's Foundation FileManager (which takes both URLs and Strings as paths, even though I thought URLs were designed for networks), Apple's Foundation FileHandle (which is used just for reading and writing), the Swift.org Foundation FileManager (which is incomplete), and the Swift.org Foundation FileHandle (which is mostly complete).

Is there a package I am missing that deals with this mess?


If you are developing for Apple platforms you will probably be best served by using FileManager from Foundation for most things. Generally speaking you should use the highest level abstraction that meets your needs.

Also, using URLs for file paths is a much more flexible way to work and has better support with the newer APIs across many libraries.
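As a small illustration of that flexibility, here is a sketch of path manipulation on a file URL without any string slicing. The path is made up for the example and does not need to exist:

```swift
import Foundation

// Hypothetical path, used only to show the URL accessors.
let file = URL(fileURLWithPath: "/Users/me/project/Sources/main.swift")

print(file.lastPathComponent)   // main.swift
print(file.pathExtension)       // swift

// Derive a sibling file in the same directory.
let sibling = file.deletingLastPathComponent().appendingPathComponent("util.swift")
print(sibling.lastPathComponent) // util.swift
print(sibling.isFileURL)         // true
```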


even though I thought URLs were designed for networks

They were? AFAICT file: URLs have been around as long as http: URLs.

Anyway, if you’re curious as to why Foundation works this way, see my explanation here.

Which isn’t to say that everything is peachy when it comes to file system APIs in Swift. As you’ve noted, there’s a lot of room for improvement.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple


Depends on what you need. Reading files, writing files, getting file attributes, moving/copying files from one place to another, enumerating directories – these are quite different tasks. If it's reading, there are quite a few ways to choose from: URL's resourceBytes, URLSession + dataTask, FileHandle, Data(contentsOf:), DispatchIO, read, stdio's fread, aio, mmap, Carbon's relics.
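To make the "different tasks" point concrete, here is a minimal sketch that does several of them with FileManager in a scratch directory. The function name demoFileTasks and the file names are made up for the example:

```swift
import Foundation

/// Hypothetical demo: runs several unrelated file tasks in a scratch directory.
func demoFileTasks() throws -> (text: String, size: Int, names: [String]) {
    let fm = FileManager.default
    let dir = fm.temporaryDirectory.appendingPathComponent(UUID().uuidString)
    try fm.createDirectory(at: dir, withIntermediateDirectories: true)
    defer { try? fm.removeItem(at: dir) }   // clean up the scratch directory

    let original = dir.appendingPathComponent("a.txt")
    let copy = dir.appendingPathComponent("b.txt")

    try Data("Hello, World".utf8).write(to: original)                  // writing
    let text = try String(contentsOf: original, encoding: .utf8)       // reading
    let attributes = try fm.attributesOfItem(atPath: original.path)    // attributes
    let size = (attributes[.size] as? NSNumber)?.intValue ?? -1
    try fm.copyItem(at: original, to: copy)                            // copying
    let names = try fm.contentsOfDirectory(atPath: dir.path).sorted()  // enumerating
    return (text, size, names)
}

do {
    print(try demoFileTasks())
} catch {
    print("failed:", error)
}
```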

I am currently using Apple's file manager and URLs, but to check if a file exists at a path I have to do FileManager.default.contentsOfDirectory(atPath: sourcesDirectory.path(percentEncoded: false)) which is way too long.

According to this, URLs are "networking primitives".

Would the FileManager method func fileExists(atPath path: String) -> Bool do what you need?

If you're using it often with file URLs, you could write an extension method that takes a URL and accesses the path.
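Such an extension could look roughly like this. The name fileExists(at:) is my own choice, not an existing Foundation API, and returning false for non-file URLs is an assumption about what you'd want:

```swift
import Foundation

extension FileManager {
    /// Hypothetical convenience wrapper around fileExists(atPath:).
    func fileExists(at url: URL) -> Bool {
        // Only file URLs have a meaningful file system path.
        guard url.isFileURL else { return false }
        return fileExists(atPath: url.path)
    }
}

let sourcesDirectory = URL(fileURLWithPath: NSTemporaryDirectory())
print(FileManager.default.fileExists(at: sourcesDirectory))
```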

Files and network resources are like brother and sister. You can access a network resource as a file (in some cases) and you can access a file as a network resource:

print("start")
let file = URL.temporaryDirectory.appending(component: UUID().uuidString)
let sourceData = "Hello, World".data(using: .utf8)!
try! sourceData.write(to: file)

URLSession.shared.dataTask(with: URLRequest(url: file)) { data, response, error in
    guard let data else { fatalError() }
    precondition(data == sourceData)
    try! FileManager.default.removeItem(at: file)
    print("done")
}.resume()
RunLoop.current.run(until: .distantFuture)

Note that using FileManager and URL is highly recommended if you intend to support other platforms (e.g. Windows). FileManager does a lot of work under the covers to abstract away file path limits, API differences, semantic differences (file systems are UTF-16 on Windows, not UTF-8), etc. You get all of this for free when you use the Foundation APIs for file operations.

The URL(fileURLWithPath:) and withUnsafeFileSystemRepresentation are core to ensuring that you properly handle paths, so while it may appear verbose, it serves as a good way to ensure that you can easily move your code around.
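A minimal sketch of how those two pieces fit together when you do have to drop down to a C API. The helper name fileSize(at:) is made up for the example, and it uses plain stdio, so treat it as a POSIX-flavoured illustration rather than a portable recipe:

```swift
import Foundation

/// Hypothetical helper: returns the size of the file at `url` using C stdio.
func fileSize(at url: URL) -> Int? {
    // Foundation hands us a C string suitable for file system calls.
    url.withUnsafeFileSystemRepresentation { fsPath -> Int? in
        guard let fsPath, let file = fopen(fsPath, "rb") else { return nil }
        defer { fclose(file) }
        guard fseek(file, 0, SEEK_END) == 0 else { return nil }
        let size = ftell(file)
        return size >= 0 ? Int(size) : nil
    }
}

let sample = URL(fileURLWithPath: NSTemporaryDirectory())
    .appendingPathComponent(UUID().uuidString)
try? Data("Hello".utf8).write(to: sample)
print(fileSize(at: sample) ?? -1)
try? FileManager.default.removeItem(at: sample)
```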

I think that this was not given the proper visibility - this is excellent advice and should have been highlighted :slight_smile:


To check if a file exists using a URL, you can use checkResourceIsReachable() (see the Apple Developer Documentation).

Note also that, depending on what you are trying to achieve, you may not need this:

When performing operations such as opening a file or copying resource properties, it is more efficient to simply try the operation and handle failures.


Thanks, haven't seen this before. It doesn't return false BTW (contrary to the documentation), instead it throws (whether it's an unsupported URL or an absent file). Makes me wonder why bother returning Bool at all, could have been Void. :sweat_smile:


Edit: there could be a bug or two here:

This is how it is declared in Obj-C:

- (BOOL)checkResourceIsReachableAndReturnError:(NSError **)error NS_SWIFT_NOTHROW

This is how it is imported into Swift:

public func checkResourceIsReachable() throws -> Bool
  • So it throws despite NS_SWIFT_NOTHROW?
  • And yet it returns Bool even when (BOOL)method:(NSError **)error is normally imported as "func method() throws -> Void" ?

File Manager is one of the oldest remaining subsystems of Apple's OSes (it dates from an era when macOS was called "System"). We've been through a series of transitions over the years: vol+name → vol+dir+name → FSSpec → FSRef → NSURL → URL. Now we need another transition: make everything async. Even if the underlying operations are currently synchronous (and thread blocking), the API could still be exposed as async, and when time allows the implementation could be tweaked to be more asynchronous under the hood.


It’s been a while since I needed to use it, so had completely forgotten that particular gotcha!

I can see that the swift-corelibs-foundation implementation always throws or returns true.

I’ve filed a Feedback (FB13706122 if anyone from Apple is reading this) about this as a documentation bug.
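Given the "throws or returns true" behaviour, one way to get a plain Bool is to collapse the error with try?. The sketch below also shows the alternative the documentation suggests: skip the check, attempt the operation, and handle the failure. The missing URL is a made-up path that should not exist:

```swift
import Foundation

// Hypothetical path that should not exist.
let missing = URL(fileURLWithPath: NSTemporaryDirectory())
    .appendingPathComponent(UUID().uuidString)

// Collapse "throws or returns true" into a plain Bool.
let reachable = (try? missing.checkResourceIsReachable()) ?? false
print("reachable:", reachable)

// Alternative: attempt the read and handle the failure.
do {
    let data = try Data(contentsOf: missing)
    print("read \(data.count) bytes")
} catch {
    print("read failed:", error.localizedDescription)
}
```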


I just want to throw in one more alternative, which is NIOFileSystem, which we have been developing in the swift-nio package. It works on both Darwin and Linux and provides a concurrency-native interface for file system operations. The module is currently underscored (_NIOFileSystem) since we are still polishing the API as we get new user feedback, but it is considered to be relatively stable.


@tera

And in the end, the swift-corelibs-foundation implementation of checkResourceIsReachable() ends up calling FileManager.default.fileExists(atPath: path) anyway.

Sure thing, that's how it usually is. Rarely there would be more than one "bottom" implementation which every other "helper" on top is calling through directly or indirectly.

This is how async version of this call could be implemented (simplified version with error checking removed):

extension URL {
    
    func exists1() async -> (exists: Bool, isDirectory: Bool) {
        var isDirectory: ObjCBool = false
        let exists = FileManager.default.fileExists(atPath: path, isDirectory: &isDirectory)
        return (exists: exists, isDirectory: isDirectory.boolValue)
    }
    
    func exists2() async -> (exists: Bool, isDirectory: Bool) {
        await withCheckedContinuation { continuation in
            DispatchQueue.global().async { // any background queue would do here
                var isDirectory: ObjCBool = false
                let exists = FileManager.default.fileExists(atPath: path, isDirectory: &isDirectory)
                continuation.resume(returning: (exists: exists, isDirectory: isDirectory.boolValue))
            }
        }
    }
}

exists1 is a simple stub implementation; exists2 is a more proper "don't block the current thread" implementation.


Edit:

The recently introduced DirectoryHint.checkFileSystem mode of operation, while it cleans things up in certain respects, makes potential async APIs more complicated (checkFileSystem needs to be async, while the other modes could be served immediately).

It might be possible to keep the look of the current API by minimally breaking it and changing it to be more async friendly, for example:

// REDO: using stub names with `1` and `2` suffixes here
enum DirectoryHint1 { case isDirectory, notDirectory, inferFromPath }
enum DirectoryHint2 { case checkFileSystem }

func appending(path: S, directoryHint: DirectoryHint1 = .inferFromPath) -> URL
func appending(path: S, directoryHint: DirectoryHint2) -> URL

with the following async version:

func appending(path: S, directoryHint: DirectoryHint2, execute: @escaping (URL) -> Void)
func appending(path: S, directoryHint: DirectoryHint2) async -> URL

leaving the DirectoryHint1 methods only available in sync (immediate) form.

cc @icharleshu

I recently looked into this a bunch. I tried to do a complete survey of all the possible APIs.

I went with FileManager.default.fileExists for my checks as well, but my version of async was just to call the function async and then... do nothing. ("I'll get back to it". Ha. )

I'm chiming in mostly because I saw the SwiftNIO stuff when I was poking around and it does look great! That and

This would be a great future direction!

If one had improvements to add... which API would be the most fruitful to learn better?


What is the quickest way to read a large file?

That's `mmap` (1x or base time)
func readWithMMap(_ url: URL, callback: (UInt8) -> Void) async {
    let file = open(url.path, O_RDONLY)
    var s = stat()
    fstat(file, &s)
    let size = Int(s.st_size)
    let bytes = mmap(nil, size, PROT_READ, MAP_PRIVATE, file, 0)!.assumingMemoryBound(to: UInt8.self)
    for i in 0 ..< size {
        callback(bytes[i])
    }
    munmap(bytes, size)
    close(file)
}
Followed by reading / "freading" file by large chunks (1.2x base time)
func readFileByBlocks(_ url: URL, callback: (UInt8) -> Void) async {
    let blockSize = 10*1024
    let file = open(url.path, O_RDONLY)
    let mem = malloc(blockSize)!.assumingMemoryBound(to: UInt8.self)
    while true {
        let count = read(file, mem, blockSize)
        for i in 0 ..< count { callback(mem[i]) }
        if count < blockSize { break }
    }
    free(mem); close(file)
}

func freadFileByBlocks(_ url: URL, blockSize: Int = 10*1024, callback: (UInt8) -> Void) async {
    let file = fopen(url.path, "rb")!
    let mem = malloc(blockSize)!.assumingMemoryBound(to: UInt8.self)
    while true {
        let count = fread(mem, 1, blockSize, file)
        for i in 0 ..< count { callback(mem[i]) }
        if count < blockSize { break }
    }
    free(mem); fclose(file)
}
Followed by Data(contentsOf:) (1.4x base time)
func readWithData(_ url: URL, callback: (UInt8) -> Void) async {
    let data = try! Data(contentsOf: url)
    let size = data.count
    data.withUnsafeBytes { bytes in
        let p = bytes
        for i in 0 ..< size {
            callback(p[i])
        }
    }
}
Followed by URL.resourceBytes, well done! (2.8x base time)
func readWithResourceBytes(_ url: URL, callback: @escaping (UInt8) -> Void) async {
    do {
        for try await byte in url.resourceBytes {
            callback(byte)
        }
    } catch {
        fatalError()
    }
}
Followed by URLSession.data (7.5x base time)
func readWithURLSessionAsync(_ url: URL, callback: @escaping (UInt8) -> Void) async {
    let data = try! await URLSession.shared.data(from: url).0
    let size = data.count
    data.withUnsafeBytes { bytes in
        let p = bytes
        for i in 0 ..< size {
            callback(p[i])
        }
    }
}
Followed by URLSession.dataTask, no idea why it is slower (30x base time)
func readWithURLSession(_ url: URL, callback: @escaping (UInt8) -> Void) async {
    await withCheckedContinuation { continuation in
        URLSession.shared.dataTask(with: URLRequest(url: url)) { data, response, error in
            let size = data!.count
            data!.withUnsafeBytes { bytes in
                let p = bytes
                for i in 0 ..< size {
                    callback(p[i])
                }
            }
            continuation.resume(returning: ())
        }.resume()
    }
}
Followed by "freading" file byte by byte (130x base time)
func freadFileByBytes(_ url: URL, callback: (UInt8) -> Void) async {
    let file = fopen(url.path, "rb")!
    while true {
        var byte: UInt8 = 0
        let count = fread(&byte, 1, 1, file)
        if count < 1 { break }
        callback(byte)
    }
    fclose(file)
}
Followed by reading file byte by byte – no buffering (2900x base time)
func readFileByBytes(_ url: URL, callback: (UInt8) -> Void) async {
    let file = open(url.path, O_RDONLY)
    while true {
        var byte: UInt8 = 0
        let count = read(file, &byte, 1)
        if count < 1 { break }
        callback(byte)
    }
    close(file)
}
Then something is seriously wrong with this AsyncStream implementation (7600x base time)

See the fragment at the bottom of this post.

Rest of the code if you want to try it
func test() {
    Task {
        let url = URL.temporaryDirectory.appending(component: UUID().uuidString)
        let data = Data(repeating: 0xAD, count: 100_000_001)
        try! data.write(to: url)
        
        let base = await measure(url, nil, "readWithMMap", readWithMMap)
        await measure(url, base, "readFileByBlocks", readFileByBlocks)
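        // Wrapped in a closure because freadFileByBlocks has an extra defaulted blockSize parameter.
        await measure(url, base, "freadFileByBlocks") { await freadFileByBlocks($0, callback: $1) }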
        await measure(url, base, "readWithData", readWithData)
        await measure(url, base, "readWithResourceBytes", readWithResourceBytes)
        await measure(url, base, "readWithURLSessionAsync", readWithURLSessionAsync)
        await measure(url, base, "readWithURLSession", readWithURLSession)
        await measure(url, base, "freadFileByBytes", freadFileByBytes)
        await measure(url, base, "readFileByBytes", readFileByBytes)
        await measure(url, base, "readWithAsyncStream", readWithAsyncStream)
        try! FileManager.default.removeItem(at: url)
        print("done")
    }
}

@discardableResult func measure(_ url: URL, _ base: Double?, _ title: String, _ execute: (URL, @escaping (UInt8) -> Void) async -> Void) async -> Double {
    let start = Date()
    var result = 0
    await execute(url) { result &+= Int($0) }
    let elapsed = Date().timeIntervalSince(start)
    precondition(result == 17300000173)
    let elapsedString = String(format: "%.3f", elapsed)
    let factorString = String(format: "%.3f", elapsed / (base ?? elapsed))
    print("\(title): \(elapsedString)sec, \(factorString)x")
    return elapsed
}

test()
RunLoop.current.run(until: .distantFuture)

This is the one that takes a ridiculous amount of time (7600x base time):

func readWithAsyncStream(_ url: URL, callback: @escaping (UInt8) -> Void) async {
    let stream = AsyncStream<UInt8> { continuation in
        let data = try! Data(contentsOf: url)
        let size = data.count
        data.withUnsafeBytes { bytes in
            let p = bytes
            for i in 0 ..< size {
                continuation.yield(p[i])
            }
        }
        continuation.finish()
    }
    for await byte in stream {
        callback(byte)
    }
}

I don't know whether AsyncStream / continuations are slow or I am using them incorrectly.
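One thing that might be worth trying (an assumption on my part, not something measured in this thread) is yielding chunks instead of single bytes, so that any per-element cost of the stream is paid once per chunk. A sketch, where readWithChunkedAsyncStream and its chunkSize parameter are hypothetical names:

```swift
import Foundation

/// Hypothetical variant of readWithAsyncStream that yields chunks, not single bytes.
func readWithChunkedAsyncStream(_ url: URL, chunkSize: Int = 64 * 1024,
                                callback: @escaping (UInt8) -> Void) async {
    let stream = AsyncStream<Data> { continuation in
        do {
            let handle = try FileHandle(forReadingFrom: url)
            defer { try? handle.close() }
            // read(upToCount:) returns nil or empty data at end of file.
            while let chunk = try handle.read(upToCount: chunkSize), !chunk.isEmpty {
                continuation.yield(chunk)
            }
        } catch {
            // Error handling omitted in this sketch; the stream simply ends early.
        }
        continuation.finish()
    }
    for await chunk in stream {
        for byte in chunk { callback(byte) }
    }
}
```

Like the original, this buffers the whole file in the stream before the consumer starts, so it only changes the number of elements, not the memory profile.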

All tests were done with -O, and there was no significant difference between a Mac and an iOS device.


That’s a great, comprehensive summary. Thanks. I really wonder, though, what’s wrong with async here…

I don’t know that there is anything wrong with it. By using an async method you are telling the system that the speed with which the operation returns isn’t something that you are particularly worried about.

Because of that it is going to take its own sweet time in performing the read while doing things like coalescing IO reads and conserving power budget.