What is the best way to work with the file system?

I don't think so based on the timing of:

for try await byte in url.resourceBytes { ... }

which is amazingly fast (for file URLs at least). I'd love to see its source.

Well, that implicitly rules out OSes outside of Darwin (and maybe Linux in some cases). Has this been designed with Windows support in mind or would it cause a massive restructuring of the design to deal with the different semantics?

Our intention was to design a cross-platform API. Under the hood we are using swift-system; however, all the APIs that we are exposing are focused on concrete actions such as creating a file or reading a file. We would love to see contributions that add support for Windows.

That's not a bug; it's the correct translation. The BOOL return is technically whether the method succeeds or not (which happens to be aligned with whether the resource is reachable or not). If it returns NO then error is populated with an explanation. That's standard practice regarding NSError - from before Swift even existed - and translates into Swift as simply throwing or not.

They tend to be slow…

…although that's not really the intent or necessarily intrinsic. Ideally async code is effectively as fast as sync code. However, it tends to rely much more heavily on the optimiser successfully removing a bunch of boilerplate and abstractions, and the Swift compiler's just not that good [yet].

That said, I don't foresee async approaches being quite as performant and efficient as sync code anytime soon. User-space continuations, as async/await in Swift uses, are technically redundant in most cases because the kernel is already using continuations anyway, for the underlying syscalls like select or read. So in that sense async code has more overhead vs a simpler direct [sync] syscall or libc call.

Hypothetically, user-space task switching can be more efficient than kernel-based equivalents (i.e. thread context switching), so it is certainly possible for async code to be more performant on the whole. But ultimately I think that's still bottlenecked by missed optimisation opportunities in the compiler, if not also by poor implementations of continuations (like using an Array for a continuation FIFO), whereas the kernel is pretty heavily optimised and written in C/C++, which makes it substantially easier to obtain good performance.

Tangentially, the better way to improve I/O performance (both latency and throughput) is to remove the kernel from the I/O path entirely. This is done a lot in server systems, for example, using memory-mapped device I/O such that userspace code just talks directly to storage and network hardware. To my knowledge none of Apple's CPUs support this, though.

There were plenty of changes for NIO that enabled Windows support but were not allowed to be merged due to concerns over CI. I'd love to see the changes merged irrespective of that so that libraries like this could be adopted, but until such a time, I think that it is important to recognize that it is not always possible to use the tools at hand.

Let's see. For this Obj-C:

@interface TestClass: NSObject
- (BOOL)foo1:(NSError **)error NS_SWIFT_NOTHROW;
- (BOOL)foo2:(NSError **)error;
@end

I am normally getting this Swift API:

class TestClass: NSObject {
    func foo1(_ error: NSErrorPointer) -> Bool
    func foo2() throws
}

callable as:

let c = TestClass()
var nsError: NSError?
let x: Bool = c.foo1(&nsError)
let y: Void = try! c.foo2()

Note: it's either Bool + NSErrorPointer or Void + throw. The two errors I am talking about above are:

  1. With checkResourceIsReachableAndReturnError we are getting a mixture of Bool + throw (which normally doesn't happen).

  2. NS_SWIFT_NOTHROW of checkResourceIsReachableAndReturnError is getting ignored (which normally doesn't happen).

(Plus a third error, in the documentation, which incorrectly states that false is returned (in the Swift API) if there's no file. Technically that's correct for the Obj-C API, but not for Swift: we get a throw in that case, and the only value we ever get back is true.)
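To make the practical consequence concrete, here is a minimal sketch of how calling code can cope with the "Bool + throws" shape. The helper name isReachable is hypothetical (not part of Foundation); it simply treats any thrown error as "not reachable".

```swift
import Foundation

/// Hypothetical helper: collapses the imported "Bool + throws" shape into a plain Bool.
func isReachable(_ url: URL) -> Bool {
    // As described above, an unreachable resource surfaces as a thrown error
    // rather than as a `false` return value, so map any error to `false`.
    (try? url.checkResourceIsReachable()) ?? false
}
```

Note that this throws away the error details, so it only suits callers that really just want a yes/no answer.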

Oh yes, my mistake - I missed that it was importing it as returning bool.

Ah yes. Per the docs:

You use the NS_SWIFT_NOTHROW macro on Objective-C method declarations that produce an NSError to prevent it from being imported by Swift as a method that throws.

So I suspect someone applied this by mistake, thinking it meant that function doesn't throw Objective-C exceptions.

What is the quickest way to read a large file?

These results reinforce the idea that it’s always best to access a file by memory mapping. That’s not a statement I agree with. There are other factors to consider.

First is error handling. If you access a file via memory mapping and you get an I/O error, that translates to a machine exception that is effectively uncatchable. Thus, it’s only safe to memory map a file if it’s on a volume where a failure on that volume means that all bets are off. On Apple platforms that means the system volume and the volume containing your home directory [1].
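For what it's worth, Foundation exposes a related knob: the .mappedIfSafe reading option on Data asks for mapping only when Foundation considers it safe, whereas .alwaysMapped asks for mapping whenever possible. A minimal sketch, assuming a hypothetical helper named loadContents; exactly which volumes Foundation treats as safe is its own decision and not something stated in this thread.

```swift
import Foundation

/// Hypothetical helper: maps the file only when Foundation considers that safe.
func loadContents(of url: URL) throws -> Data {
    // With .mappedIfSafe, Foundation may fall back to an ordinary read
    // instead of mapping; the caller gets a Data value either way.
    try Data(contentsOf: url, options: .mappedIfSafe)
}
```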

Second, accessing files by memory mapping puts pressure on the page cache [2]. Even if your access is faster, you have to consider the impact it has on other code running on your system.

Note: Caching also makes this hard to benchmark. You have to make sure that none of the target file is in the cache before each iteration of your test.

Finally, there’s a misconception that memory mapping is the only way to do ‘no copy’ file access. That’s not true. On Apple platforms you have F_NOCACHE [3]. On Linux you have io_uring and, if not that, O_DIRECT.
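As a rough illustration of the F_NOCACHE route, the sketch below reads a file with a page-aligned buffer and transfers that are a whole number of pages, following the advice in footnote [3]. The names readUncached and ReadError are hypothetical, the 64-page chunk size is an arbitrary assumption, and on non-Apple platforms the fcntl call is simply skipped, so there it is an ordinary pread loop.

```swift
import Foundation

/// Error carrying the errno of a failed call (hypothetical type for this sketch).
struct ReadError: Error { let code: Int32 }

/// Hypothetical helper: reads a whole file using page-aligned, page-multiple transfers.
func readUncached(atPath path: String) throws -> [UInt8] {
    let fd = open(path, O_RDONLY)
    guard fd >= 0 else { throw ReadError(code: errno) }
    defer { close(fd) }

    #if canImport(Darwin)
    // Ask the kernel not to cache this descriptor's data (Apple platforms only).
    _ = fcntl(fd, F_NOCACHE, 1)
    #endif

    let pageSize = Int(sysconf(Int32(_SC_PAGESIZE)))
    let chunkSize = pageSize * 64   // assumed chunk size: a whole number of pages
    let buffer = UnsafeMutableRawPointer.allocate(byteCount: chunkSize, alignment: pageSize)
    defer { buffer.deallocate() }

    var bytes: [UInt8] = []
    var offset: off_t = 0           // stays page aligned while each read fills the chunk
    while true {
        let count = pread(fd, buffer, chunkSize, offset)
        if count < 0 { throw ReadError(code: errno) }
        if count == 0 { break }     // end of file
        bytes.append(contentsOf: UnsafeRawBufferPointer(start: buffer, count: count))
        offset += off_t(count)
    }
    return bytes
}
```

Copying each chunk into an array defeats the 'no copy' goal, of course; a real consumer would process the buffer in place. The copy is only there to keep the sketch easy to check.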

If you’re interested in this stuff then I’m gonna recommend that you have a listen to Software Unscripted: Implementing Databases with Glauber Costa. Indeed, it’s what reminded me that I want to reply here (-:

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

[1] On macOS, “your” refers to the user. On iOS and its child platforms, “your” refers to the app.

[2] On Apple platforms this is known as the unified buffer cache (UBC), because it’s used for both the page cache and the file system cache.

[3] To get the most out of this:

  • Make sure your transfers are page aligned within the file.

  • And have a length that’s an even multiple of a page.

  • And that your buffer is page aligned in memory.

Also, if you access data multiple times [at the same location] in a memory-mapped file, it may mutate out from under you, whereas if you do linear I/O and save the bits into your own private memory, it will not. This is not usually an issue for most apps and most use cases, but something to keep in mind if you're accessing a shared resource, or are doing something security-sensitive with the file's contents. Probably you have mutual exclusion issues anyway in those cases, but it can make the failure modes even worse if your program logic has redundant reads that can dynamically diverge.
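A minimal sketch of the "read once into private memory" alternative, using FileHandle for the linear read. The helper name snapshotBytes is hypothetical. Once the bytes are in the returned array, later changes to the file cannot alter them, so redundant reads in your logic all see the same data.

```swift
import Foundation

/// Hypothetical helper: reads the file once into memory owned by this process.
func snapshotBytes(of url: URL) throws -> [UInt8] {
    let handle = try FileHandle(forReadingFrom: url)
    defer { try? handle.close() }
    // readToEnd() returns nil when there is nothing to read; treat that as empty.
    let data = try handle.readToEnd() ?? Data()
    return [UInt8](data)
}
```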

Not that I'm against mmap, to be clear, just noting that additional consideration.

FWIW, years ago I experimented a lot with file I/O methods on Mac OS X, and long story short I found that the fastest approach was Dispatch I/O, closely followed by plain old read. I don't recall why Dispatch I/O was faster than read, and that is a bit unintuitive, but it might be that Dispatch I/O tweaks the file descriptor's configuration (e.g. with fcntl) in some way that improves performance by default.

I recall trying mmap and finding it to be surprisingly slower. I'm not sure I ever got to the bottom of why, but hazarding a guess, it might be that mmap doesn't benefit from kernel-side prefetching in the same way as linear file I/O? Mac OS X by default reads 1 MiB ahead (or at least it used to, going back a few years) when you use file descriptors. It's possible that for memory-mapped files you don't get any more prefetching than the CPU's cache prefetchers provide?

For the record, I didn't claim that it is best to access a file by memory mapping (always or otherwise)... Only that it is fastest (at least in the quoted tests). "Fastest" != "bestest".

temporaryDirectory is available in iOS 16 only

Point of order: NSFileManager dates all the way back to NeXTSTEP and didn't exist on Classic Mac OS. :grin:

gerchicov-vg wrote:

temporaryDirectory is available in iOS 16 only

If you need to go back further, use the FileManager property of the same name. For example:

extension URL {
    
    static var temporaryDirectoryCompat: URL {
        if #available(iOS 16.0, *) {
            self.temporaryDirectory
        } else {
            FileManager.default.temporaryDirectory
        }
    }
}

Point of order: NSFileManager dates all the way back to NeXTSTEP and didn't exist on Classic Mac OS.

File Manager did exist on traditional Mac OS, but it was a completely different thing. Prior to Swift you'd say File Manager and NSFileManager to distinguish the two. But then Swift went and renamed NSFileManager to FileManager and confusion ensued.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

Yep, even before it was called "Mac OS". And it still works today! :sweat_smile:

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <CoreServices/CoreServices.h>

#pragma GCC diagnostic ignored "-Wdeprecated-declarations"

int main(int argc, const char * argv[]) {
    FSRef folder, file;
    printf("test start\n");
    const UInt8* folderPath = (const UInt8*)"/private/tmp";
    const UInt8* filePath = (const UInt8*)"/private/tmp/ABC";
    OSErr err = FSPathMakeRef(filePath, &file, nil);
    if (!err)
        FSDeleteObject(&file);
    err = FSPathMakeRef(folderPath, &folder, nil);
    const UniChar name[] = { 0x41, 0x42, 0x43 };
    err = FSCreateFileUnicode(&folder, sizeof(name) / sizeof(*name), name, 0, nil, &file, nil);
    assert(err == noErr);
    FSForkIOParam pb = {};
    pb.ref = &file;
    pb.permissions = fsRdWrPerm;
    err = PBOpenForkSync(&pb);
    assert(err == noErr);
    int ref = pb.forkRefNum;
    printf("%d\n", ref);
    UInt8 buffer[10] = {0}; // zero-filled so the 5 bytes read back are NUL-terminated
    const char* s = "Hello, World!";
    err = FSWriteFork(ref, fsFromStart, 0, strlen(s), s, nil);
    assert(err == noErr);
    err = FSReadFork(ref, fsFromStart, 0, 5, buffer, nil);
    assert(err == noErr);
    assert(strcmp((const char*)buffer, "Hello") == 0);
    FSCloseFork(ref);
    printf("%s\n", buffer);
    err = FSDeleteObject(&file);
    assert(err == noErr);
    printf("test done\n");
    return 0;
}

PS. I wasn't able to use the more ordinary FSOpen / HOpen, etc.; somehow they are not visible to the linker (even though I could see them in the header). All of those calls (including the ones in the sample above) are marked deprecated; perhaps some calls are more deprecated than others.

Nice.

Is the following possible somehow?

extension URL {
    // some availability mantra here to indicate that this is for pre iOS 16 only
    static var temporaryDirectory: URL {
        FileManager.default.temporaryDirectory
    }
}

Is the following possible somehow?

I don’t think so.

Even if it were, I’m not sure it’s a good idea. I’ve found that it’s better to be clear than clever in situations like this.

Oh, one thing I should’ve mentioned: You can mark temporaryDirectoryCompat as deprecated when the deployment target is iOS 16 or later. That practice lets you avoid an unbounded build up of compatibility shims like this.
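One way to sketch that, assuming the shim from earlier in the thread: an @available attribute with a deprecated: version only produces warnings once the deployment target reaches that version. The message text is made up for illustration, and the canImport(Darwin) guard and the extra platform versions in the availability check are additions so that the sketch also builds outside iOS.

```swift
import Foundation

extension URL {
    // Callers start seeing a deprecation warning once the deployment target is iOS 16 or later.
    @available(iOS, deprecated: 16.0, message: "Use URL.temporaryDirectory directly.")
    static var temporaryDirectoryCompat: URL {
        #if canImport(Darwin)
        if #available(iOS 16.0, macOS 13.0, tvOS 16.0, watchOS 9.0, *) {
            return URL.temporaryDirectory
        }
        #endif
        // Fallback for older systems (and, in this sketch, for non-Apple platforms).
        return FileManager.default.temporaryDirectory
    }
}
```

When the warning shows up, that's your cue to replace the call sites and delete the shim.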

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple
