To/from memory-mapped file?

I have compiled a program represented by a custom binary format. It lives in memory as an UnsafeRawBufferPointer, but I would like to save it as a file.myformat on disk. I'll also need a way to obtain a read-only view of the file's contents. In short:

1. UnsafeRawBufferPointer -> file.myformat
2. file.myformat -> UnsafeRawBufferPointer

What is the recommended way to go to and from a memory-mapped file in Swift?

Memory-mapped files are not supported on all platforms that Swift supports. Would you be able to clarify if you're looking for a fully cross-platform solution? Otherwise which subset of platforms would you need?

Thanks for the clarification! I’m targeting iOS and macOS, so full cross-platform support isn’t necessary at this stage. If there are any platform-specific considerations or best practices for these platforms, I’d appreciate hearing about them.

On Apple platforms you have to be very careful with memory mapping. If the targeted file is stored on a volume that can ‘go away’ — for example, on an external drive or a network volume — then memory mapping isn’t safe because accessing a file that’s gone away triggers a machine exception [1], which isn’t something you can reasonably handle [2].

Still, if you want to go down this path there are two standard APIs for mapping a file:

  • mmap — See the mmap man page.

  • Data.init(contentsOf:options:) — Using either .mappedIfSafe or .alwaysMapped option.

You go the other way using the file system API of your choice: write, fwrite, System framework, FileHandle, and so on.
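To make both directions concrete, here is a minimal sketch using only Foundation. The helper names `save(_:to:)` and `withMappedContents(of:_:)` are made up for illustration. It assumes the file lives on a local volume and that copying the buffer once on the way out is acceptable.

```swift
import Foundation

/// Direction 1: write the bytes of `buffer` to `url`.
/// Note: `Data(buffer)` copies the bytes before writing them.
func save(_ buffer: UnsafeRawBufferPointer, to url: URL) throws {
    try Data(buffer).write(to: url, options: .atomic)
}

/// Direction 2: hand `body` a read-only view of the file's bytes.
/// `.mappedIfSafe` lets Foundation decide whether to map the file or read it;
/// `.alwaysMapped` would force a mapping. The pointer is only valid inside `body`.
func withMappedContents<R>(
    of url: URL,
    _ body: (UnsafeRawBufferPointer) throws -> R
) throws -> R {
    let data = try Data(contentsOf: url, options: .mappedIfSafe)
    return try data.withUnsafeBytes(body)
}

// Usage: round-trip a few bytes through a temporary file.
let url = FileManager.default.temporaryDirectory
    .appendingPathComponent("file.myformat")
let program: [UInt8] = [0xCA, 0xFE, 0x00, 0x01]
try program.withUnsafeBytes { try save($0, to: url) }
let byteCount = try withMappedContents(of: url) { $0.count }
```

Scoping the pointer to a closure keeps the backing Data alive for as long as the view is in use. If you need a longer-lived view, hold on to the Data value itself rather than the raw pointer.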


IMO memory mapping is overused. I see two primary motivations here:

  • Unix Lore™ is that memory mapping is the only way to do no-copy reads and writes. That’s never been true on Apple platforms, and hasn’t been true on most Unix-y platforms for decades.

  • Folks think it’ll be convenient because they can define a language structure that maps to their data structure and then they’re just reading and writing fields rather than doing I/O. That rarely works out as well as you might hope [3].

And memory mapping has a lot of drawbacks:

  • The big one is the safety issue I’ve discussed above.

  • If the file is large, you have to worry about address space issues on iOS.

  • And on 32-bit platforms [4].

  • If you’re streaming through a large file you end up running all your I/O through the buffer cache, which is much less efficient than doing no-copy reads and writes.

  • If you’re accessing the file at random, those accesses put pressure on the VM system, which causes other pages to get evicted. That might result in your I/O being fast but the rest of the system suffering.

Which isn’t to say that this approach is always wrong, just that folks often try to use it when it’s not appropriate.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

[1] Using the terminology defined here.

[2] I talk about that problem a lot in Implementing Your Own Crash Reporter.

[3] This technique has serious problems:

  • It only works if you defined the language structure in C, because Swift doesn’t give you a way to define the layout of its structures.

  • Even then, C’s structure layout is more of a shared hallucination than something defined by the standard.

  • And you still have to deal with alignment issues.

  • And byte ordering.
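
One way to sidestep the layout, alignment, and byte-ordering problems in this list is to decode each field explicitly from the raw bytes instead of overlaying a struct. The sketch below assumes a made-up 10-byte little-endian header (magic, version, entry count); the `Header` type and its offsets are hypothetical and not part of any real format. It relies on `loadUnaligned(fromByteOffset:as:)`, which needs Swift 5.7 or later.

```swift
/// Hypothetical header layout, all fields little-endian:
///   offset 0: magic      (UInt32)
///   offset 4: version    (UInt16)
///   offset 6: entryCount (UInt32, deliberately not 4-byte aligned)
struct Header {
    var magic: UInt32
    var version: UInt16
    var entryCount: UInt32

    init?(_ bytes: UnsafeRawBufferPointer) {
        // Reject buffers that are too short to hold the header.
        guard bytes.count >= 10 else { return nil }
        // loadUnaligned copes with the misaligned field at offset 6;
        // init(littleEndian:) converts to host byte order if necessary.
        magic = UInt32(littleEndian: bytes.loadUnaligned(fromByteOffset: 0, as: UInt32.self))
        version = UInt16(littleEndian: bytes.loadUnaligned(fromByteOffset: 4, as: UInt16.self))
        entryCount = UInt32(littleEndian: bytes.loadUnaligned(fromByteOffset: 6, as: UInt32.self))
    }
}

// Usage: decode a header from an in-memory byte array.
let raw: [UInt8] = [0x4D, 0x59, 0x46, 0x4D,  0x02, 0x00,  0x03, 0x00, 0x00, 0x00]
let header = raw.withUnsafeBytes { Header($0) }
```

The same initializer works whether the bytes come from a mapped file or from a buffer you read into, so this choice does not commit you to memory mapping.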

[4] In the Apple ecosystem there is one remaining 32-bit platform, watchOS.


Agree with all you say in general, but there are some other approaches for [3] too. For example, we use Flatbuffers as the on-disk and on-the-wire representation for our cached data (and use LMDB for the mmap access to it). For our use case that is working pretty nicely, but then we also want to do sparse access of fields (e.g. for filtering) and wanted a zero-copy format that allows direct field access without parsing the full message. That can all be done in Swift, but is admittedly a fairly special usage.

Hi, Quinn! Thanks for pointing me in the right direction. I appreciate the time you took to respond. Your knowledgeable and candid perspective is invaluable.

Unix Lore™ is that memory mapping is the only way to do no-copy reads and writes. That’s never been true on Apple platforms, and hasn’t been true on most Unix-y platforms for decades.

My use case only requires read access and a pointer to the file’s contents. Is there a more modern or platform-preferred alternative for zero-copy file access? As far as I can tell, I’ll avoid the memory-mapping hazards you pointed out. Thanks again for your insights!

[...] we use Flatbuffers as the on-disk and on-the-wire representation for our cached data (and use LMDB for the mmap access to it) [...] we also want to do sparse access of fields [...] and [...] zero-copy [...].

While my use case differs from Hassila’s, I’m also working with a zero-copy format and sparse access patterns. On the surface, memory mapping appears to be the right tool for the job. That said, I’d be interested in reading any alternative suggestions you or others might have!

The alternative is to allocate a buffer and read data into that. Or allocate a pool of buffers and read chunks of data into those.

In terms of transferring data between a file and a buffer, the mechanism to do that varies by platform. On Apple platforms you use fcntl to set F_NOCACHE, but this is only effective under specific circumstances. A good rule of thumb is that everything should be page aligned, that is, the buffer, the transfer length, and the offset into the file.
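
As a rough illustration of that rule of thumb, here is a sketch for Apple platforms that sets F_NOCACHE and reads one page-aligned chunk into a page-aligned buffer. The function name `readUncached(path:offset:length:)` is made up for this example. Whether the read actually avoids a copy depends on the circumstances mentioned above, so treat this as a starting point for profiling rather than a guarantee.

```swift
import Foundation

/// Reads up to `length` bytes at `offset` into a page-aligned buffer with
/// F_NOCACHE set on the descriptor. Returns the buffer and the number of bytes
/// actually read (fewer than `length` near the end of the file).
/// The caller owns the buffer and must call `deallocate()` on it.
func readUncached(path: String, offset: off_t, length: Int) throws
    -> (buffer: UnsafeMutableRawBufferPointer, bytesRead: Int) {
    let pageSize = Int(getpagesize())
    precondition(length > 0 && length % pageSize == 0,
                 "length must be a positive multiple of the page size")
    precondition(Int(offset) % pageSize == 0, "offset must be page aligned")

    let fd = open(path, O_RDONLY)
    guard fd >= 0 else { throw POSIXError(POSIXErrorCode(rawValue: errno) ?? .EIO) }
    defer { close(fd) }

    // Ask the kernel to bypass the buffer cache for this descriptor.
    guard fcntl(fd, F_NOCACHE, 1) != -1 else {
        throw POSIXError(POSIXErrorCode(rawValue: errno) ?? .EIO)
    }

    // Buffer address, transfer length, and file offset are all page aligned.
    let buffer = UnsafeMutableRawBufferPointer.allocate(byteCount: length,
                                                        alignment: pageSize)
    let bytesRead = pread(fd, buffer.baseAddress, length, offset)
    guard bytesRead >= 0 else {
        let code = errno
        buffer.deallocate()
        throw POSIXError(POSIXErrorCode(rawValue: code) ?? .EIO)
    }
    return (buffer, bytesRead)
}
```

A pool of such buffers, each a multiple of the page size, would cover the "read chunks of data" variant.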

As to whether memory mapping is the right tool for the job, there are specific show stoppers to watch out for:

  • Is there any chance the file can go away [1]?
  • Or be bigger than your available address space? [2]
  • Are you streaming through the file from end to end?

If the answer to any of those is “Yes”, then I’d stay away from memory mapping. If not, you then get to evaluate secondary criteria, profile a prototype, and so on.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

[1] I recently discovered MAP_RESILIENT_MEDIA, which will cause the mapping to return zeroes rather than trigger a machine exception if the file goes away. You could imagine designing a file format that took advantage of that option but, sheesh, that’s gonna be full of pitfalls.

Oh, and this was added in macOS 10.11, just about a decade ago, so a) there are no worries about back deployment, and b) having learnt about it only recently, I clearly missed a memo (-;

[2] If you’re address space constrained you could, in theory, set up one or more windows on the file and move them around. I suspect that would really hurt performance, but I must admit to having never tried it.
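
For anyone curious what such a window might look like, here is an untested-for-performance sketch built on plain mmap. The `MappedWindow` class is hypothetical. It assumes the caller keeps the requested range inside the file, because touching mapped pages beyond the end of the file still faults.

```swift
import Foundation

/// A read-only mapping of one window of a file. To move the window, drop this
/// object (which unmaps it) and create another at a different offset.
final class MappedWindow {
    /// The requested range; valid only while this object is alive.
    let bytes: UnsafeRawBufferPointer
    private let base: UnsafeMutableRawPointer
    private let mappedLength: Int

    init(fileDescriptor fd: Int32, offset: off_t, length: Int) throws {
        precondition(offset >= 0 && length > 0)
        // mmap requires a page-aligned file offset, so round down and
        // remember how far into the mapping the requested range starts.
        let pageSize = off_t(getpagesize())
        let alignedOffset = offset - offset % pageSize
        let slack = Int(offset - alignedOffset)
        let mapLength = length + slack

        let pointer = mmap(nil, mapLength, PROT_READ, MAP_PRIVATE, fd, alignedOffset)
        guard pointer != MAP_FAILED, let mapped = pointer else {
            throw POSIXError(POSIXErrorCode(rawValue: errno) ?? .EIO)
        }
        base = mapped
        mappedLength = mapLength
        bytes = UnsafeRawBufferPointer(start: UnsafeRawPointer(mapped + slack),
                                       count: length)
    }

    deinit {
        munmap(base, mappedLength)
    }
}
```

Each move costs an munmap and an mmap plus fresh page faults, which is consistent with the suspicion that this would hurt performance.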


Good to know there's a limit. Checked just now - it's around 5GB on my iPhone.

This looks like quite an unsafe option to me. For example, I'm copying a file and, instead of (safely) crashing, the app continues copying zeroes. Then I think all is good and done and tell the other guy on the remote end to delete the original file :person_facepalming:

it's around 5GB on my iPhone.

Cool.

But for context, I regularly saw folks bump into this limit at around the 500 MB mark, even on 64-bit devices. That’s because page mapping tables have a cost, and so each address space was artificially limited to reduce that cost for folks who didn’t need large amounts of address space. The com.apple.developer.kernel.extended-virtual-addressing entitlement allowed you to raise that limit.

This looks like quite an unsafe option to me

Well, it certainly can be. I can only imagine this option being useful if the file format were designed with it in mind.

I'm copying a file

If you’re using memory mapping to copy files, you’re doing it wrong. Specifically, you’re failing at the “Are you streaming through the file from end to end?” test.

But, yeah, your general point is correct. Using this option without due care and attention could easily result in data loss.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple
