Before I get into my experience and opinions on the matter, I want to refer to the forum post that sparked this one: Why must i build ALL of NIOCore just to say the name ByteBuffer? by @taylorswift
After that post, I had a follow-up conversation with @glessard, and we discussed the need for more/different bag-of-bytes types in Swift. I didn't immediately get back to him, but I finally kicked off the discussion with him after the presentation by @Tony_Parker at the Server-Side Swift conference and the related forum post: What’s next for Foundation
I commented that it's time to seriously talk about Data. I don't mean the specific Data type seen in Foundation, I mean the general way Swift developers interact with bytes. And because there's a whole set of use cases that impact the design, I've written out my experience in both open source and closed source development in Swift. Note that these are my experiences and opinions, and it's important to hear more perspectives. I neither think nor hope that I have a one-size-fits-all solution for binary data.
Below is a mostly-intact message I've sent in my conversation with @glessard.
Over the last 7 years of Swift development, I've had a large variety of use cases for which I need(ed) a "bag of bytes" type. I think the best way to approach this is to first cover all the use cases that need "a" bag of bytes type, and then see how and if we can find an agreement between these requirements. For the sake of completeness, I'll list use cases regardless of whether they're already covered today. This'll also reflect upon the wider ecosystem, including NIO. This could then evolve into either one or many types covering different needs.
The first, and most common, is protocol (de-)serialization. I need control over endianness, writer and reader indices, and consuming and non-consuming reads. I also sometimes need control over (re)allocations. I think SwiftNIO's ByteBuffer is just amazing here, especially combined with ByteBufferAllocator. We also need slices of buffers that don't make copies, and copy-on-write semantics. Again, we're all good here.
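To make those requirements concrete, here's a minimal sketch of a buffer with separate reader and writer indices, explicit endianness, and both consuming (`read`) and non-consuming (`get`) access. `TinyBuffer` and `ByteOrder` are hypothetical names I made up for this example. This is not SwiftNIO's ByteBuffer, it only mirrors the general idea.

```swift
// Hypothetical sketch: `TinyBuffer` is NOT SwiftNIO's ByteBuffer.
enum ByteOrder { case big, little }

struct TinyBuffer {
    private(set) var storage: [UInt8] = []
    private(set) var readerIndex = 0
    var writerIndex: Int { storage.count }
    var readableBytes: Int { writerIndex - readerIndex }

    init(capacity: Int = 0) {
        // Reserve up front so that writes within `capacity` don't reallocate.
        storage.reserveCapacity(capacity)
    }

    /// Appends the integer in the requested byte order and advances the writer index.
    mutating func writeInteger<T: FixedWidthInteger>(_ value: T, endianness: ByteOrder = .big) {
        let ordered = endianness == .big ? value.bigEndian : value.littleEndian
        withUnsafeBytes(of: ordered) { storage.append(contentsOf: $0) }
    }

    /// Non-consuming read: leaves the reader index untouched.
    func getInteger<T: FixedWidthInteger>(
        at index: Int, endianness: ByteOrder = .big, as type: T.Type = T.self
    ) -> T? {
        let size = MemoryLayout<T>.size
        guard index >= 0, index + size <= writerIndex else { return nil }
        var raw: T = 0
        withUnsafeMutableBytes(of: &raw) { $0.copyBytes(from: storage[index..<index + size]) }
        return endianness == .big ? T(bigEndian: raw) : T(littleEndian: raw)
    }

    /// Consuming read: advances the reader index on success.
    mutating func readInteger<T: FixedWidthInteger>(
        endianness: ByteOrder = .big, as type: T.Type = T.self
    ) -> T? {
        guard let value = getInteger(at: readerIndex, endianness: endianness, as: type) else {
            return nil
        }
        readerIndex += MemoryLayout<T>.size
        return value
    }
}

var buffer = TinyBuffer(capacity: 16)
buffer.writeInteger(UInt16(0x0102))                          // stored as 01 02
buffer.writeInteger(UInt32(0xAABBCCDD), endianness: .little) // stored as DD CC BB AA

let peeked: UInt16? = buffer.getInteger(at: 0)  // non-consuming
let first: UInt16? = buffer.readInteger()       // consuming
let second: UInt32? = buffer.readInteger(endianness: .little)
```

The split between `get` and `read` is what lets a parser peek at a length or type field before deciding whether enough bytes have arrived to consume the whole message.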
The second use case is for even lower-level systems. In some of my projects I need tight control over memory (de)allocations, and because we're working with Apple's APIs in these projects, we're actually running into a couple of tough situations. First of all, iOS can have tight memory limits, especially in app extensions. We don't want to allocate new buffers for each message we transfer, so we'll need something like a memory arena. In addition, we want to apply back pressure if a buffer is not yet available. We've built our own types for this right now. The main 'issue' here initially was that Apple's APIs use Data. So we ended up wrapping our arena's pointers using Data's bytesNoCopy initializer, with a custom deallocator function that returns the region to the arena.
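As a rough illustration of that approach (not our actual implementation), here's a sketch of a fixed-slot arena that hands out `Data` values backed by arena memory through `Data(bytesNoCopy:count:deallocator:)`. `BufferArena`, `lease()` and `availableSlots` are hypothetical names. Returning `nil` when the arena is exhausted is just one assumed way to signal that the caller should apply back pressure.

```swift
import Foundation

// Hypothetical sketch of a fixed-slot arena; all names are illustrative.
// Marked @unchecked Sendable because all mutable state is guarded by `lock`.
final class BufferArena: @unchecked Sendable {
    let slotSize: Int
    private let base: UnsafeMutableRawPointer
    private var freeSlots: [Int]
    private let lock = NSLock()

    init(slotSize: Int, slotCount: Int) {
        self.slotSize = slotSize
        // One allocation up front; slots are carved out of it.
        self.base = UnsafeMutableRawPointer.allocate(byteCount: slotSize * slotCount, alignment: 16)
        self.freeSlots = Array(0..<slotCount)
    }

    deinit { base.deallocate() }

    var availableSlots: Int {
        lock.lock(); defer { lock.unlock() }
        return freeSlots.count
    }

    /// Returns nil when every slot is in use, so the caller can apply back pressure.
    func lease() -> Data? {
        lock.lock()
        let popped = freeSlots.popLast()
        lock.unlock()
        guard let slot = popped else { return nil }
        let pointer = base + slot * slotSize
        // The closure keeps the arena alive and returns the slot instead of freeing memory.
        return Data(bytesNoCopy: pointer, count: slotSize, deallocator: .custom({ _, _ in
            self.release(slot)
        }))
    }

    private func release(_ slot: Int) {
        lock.lock(); defer { lock.unlock() }
        freeSlots.append(slot)
    }
}

let arena = BufferArena(slotSize: 4096, slotCount: 2)
var firstLease = arena.lease()
let secondLease = arena.lease()
let thirdLease = arena.lease()   // nil: the arena is exhausted
firstLease = nil                 // dropping the Data runs the custom deallocator
let retryLease = arena.lease()   // succeeds again, reusing the returned slot
```

Note that this sketch says nothing about what happens when the receiving API copies or mutates the `Data`. That part depends on the API you hand it to.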
Next up, when working with iOS, you'll work with (NS)Data more often than not right now. But even assuming we don't need to take that into consideration, it's fundamental that we can, at least to some degree, use contiguous binary data types interchangeably. This is more of an API design thing than anything else, but I do think it's essential if you want to support various different use cases without making one or more copies of your data.
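One existing building block for this kind of interchangeability is Foundation's `ContiguousBytes` protocol, which `[UInt8]`, `ArraySlice<UInt8>` and `Data` all conform to. A small sketch, where `byteSum` is a made-up helper for illustration:

```swift
import Foundation

// Illustrative helper (hypothetical name): accepts any contiguous byte
// container and reads it in place, without copying into a common type first.
func byteSum<Bytes: ContiguousBytes>(_ bytes: Bytes) -> Int {
    bytes.withUnsafeBytes { buffer in
        buffer.reduce(0) { partial, byte in partial + Int(byte) }
    }
}

let fromArray = byteSum([1, 2, 3] as [UInt8])
let fromSlice = byteSum(([9, 1, 2, 3] as [UInt8]).dropFirst())
let fromData = byteSum(Data([1, 2, 3]))
```

This requires Foundation, though, so on its own it doesn't address the dependency concerns in this post.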
Another use case, more so on the server, is static file serving. Ideally you'd stay within kernel-space, something you can do with NIO's IOData/FileRegion. Regardless, I think this use case is not to be forgotten, as it's a huge boost when your TCP stack is not in user space.
In many of my libraries, end users receive their data over an HTTP(/2), database or other TCP-based connection. In these cases, we're working with ByteBuffer. One or more TCP packets can form a single application layer message, like an HTTP body. This can be seen as a contiguous blob, mainly for databases, or as a stream, which is ideal for HTTP bodies. Regardless, I think it's important to be able to accumulate information in a buffer with a corresponding capacity, since many times you'll know the final payload size before the arbitrarily sized payload starts. Having this contiguous blob with a predefined capacity prevents (re)allocs. We could then reuse this message buffer for the next message in line.
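Here's a minimal sketch of that accumulate-and-reuse pattern using a plain `[UInt8]`, assuming the expected length is known before the payload arrives. `MessageAccumulator` and its methods are hypothetical names for illustration only.

```swift
// Hypothetical sketch: accumulates one message at a time into reusable storage.
struct MessageAccumulator {
    private(set) var bytes: [UInt8] = []
    private var expectedLength = 0

    init(maximumMessageSize: Int) {
        // Allocate once, sized for the largest message the protocol allows.
        bytes.reserveCapacity(maximumMessageSize)
    }

    var capacity: Int { bytes.capacity }
    var isComplete: Bool { bytes.count == expectedLength }

    /// Starts a new message, keeping the existing allocation around.
    mutating func begin(expectedLength: Int) {
        bytes.removeAll(keepingCapacity: true)
        bytes.reserveCapacity(expectedLength)
        self.expectedLength = expectedLength
    }

    /// Takes as many bytes as the current message still needs and
    /// returns whatever is left over (the start of the next message).
    mutating func append(_ chunk: ArraySlice<UInt8>) -> ArraySlice<UInt8> {
        let missing = expectedLength - bytes.count
        let taken = chunk.prefix(missing)
        bytes.append(contentsOf: taken)
        return chunk.dropFirst(taken.count)
    }
}

var accumulator = MessageAccumulator(maximumMessageSize: 64)
let initialCapacity = accumulator.capacity

accumulator.begin(expectedLength: 4)
_ = accumulator.append([1, 2])                 // first "packet"
let leftover = accumulator.append([3, 4, 5])   // second packet spills into the next message
```

One caveat with this array-based sketch: storing a copy of `bytes` elsewhere shares the storage, so the next `begin` would have to allocate again. That is exactly the kind of detail a dedicated type could make explicit.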
Likewise, many protocols have a hard limit on the size of a message. I think it would be great if we could reuse buffers for that. Often enough, a message is processed directly into, say, a data structure of the user's or library's choice. Many times this'll happen through Codable. Parsers for many structured data formats, including MongoDB's BSON and MySQL, will make slices of data and pass them around a Decoder as well. Decoding for these formats tends to use non-consuming reads, though parsing strategies will vary. BSON specifically is a type that contains itself recursively. The main type (Document) closely resembles a JSON Object type, and can contain another Document, just like JSON Objects can contain other objects.
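To illustrate the slicing and non-consuming reads, here's a toy sketch of a recursive document view over an `ArraySlice<UInt8>`. It handles only a tiny BSON-like subset (little-endian length prefix, int32 elements tagged `0x10`, embedded documents tagged `0x03`) and is not a real BSON parser. `DocumentView` is a hypothetical name.

```swift
// Toy sketch of a recursive, non-copying document view (hypothetical type).
struct DocumentView {
    /// Shares storage with the parent buffer; no bytes are copied.
    let bytes: ArraySlice<UInt8>

    enum Value { case int32(Int32), document(DocumentView) }

    init?(_ bytes: ArraySlice<UInt8>) {
        // The first 4 bytes hold the total document length, including themselves.
        guard let length = DocumentView.int32(in: bytes, at: bytes.startIndex),
              length >= 5, Int(length) <= bytes.count else { return nil }
        self.bytes = bytes.prefix(Int(length))
    }

    /// Non-consuming little-endian read at an absolute index of the slice.
    static func int32(in bytes: ArraySlice<UInt8>, at index: Int) -> Int32? {
        guard index >= bytes.startIndex, index + 4 <= bytes.endIndex else { return nil }
        var value: UInt32 = 0
        for offset in 0..<4 { value |= UInt32(bytes[index + offset]) << UInt32(8 * offset) }
        return Int32(bitPattern: value)
    }

    subscript(key: String) -> Value? {
        var index = bytes.startIndex + 4
        let end = bytes.endIndex - 1   // skip the trailing 0x00 terminator
        while index < end {
            let type = bytes[index]
            index += 1
            // Keys are null-terminated strings.
            guard let keyEnd = bytes[index..<end].firstIndex(of: 0) else { return nil }
            let name = String(decoding: bytes[index..<keyEnd], as: UTF8.self)
            index = keyEnd + 1
            switch type {
            case 0x10:
                guard let number = DocumentView.int32(in: bytes, at: index) else { return nil }
                if name == key { return .int32(number) }
                index += 4
            case 0x03:
                // The nested document is just a narrower slice of the same storage.
                guard let nested = DocumentView(bytes[index..<end]) else { return nil }
                if name == key { return .document(nested) }
                index += nested.bytes.count
            default:
                return nil   // element type not covered by this sketch
            }
        }
        return nil
    }
}

// Encodes {"a": 1, "child": {"b": 2}} in the subset described above.
let raw: [UInt8] = [
    0x1F, 0, 0, 0,
    0x10, 0x61, 0, 1, 0, 0, 0,
    0x03, 0x63, 0x68, 0x69, 0x6C, 0x64, 0,
    0x0C, 0, 0, 0, 0x10, 0x62, 0, 2, 0, 0, 0, 0,
    0,
]
let root = DocumentView(raw[...])
```

Because the nested view keeps the parent's indices, a Decoder can pass it around freely while the underlying bytes stay where they are.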
A big issue with ByteBuffer here is that we have to bundle the whole of NIO, which matters mainly because of binary size. Many people want to use (my) libraries for things like XML, JSON and BSON on other platforms, but don't want to depend on the whole of SwiftNIO as well. This becomes vastly more important when talking to people doing WASM and/or embedded Swift.
I think this mostly boils down to reducing copies/allocations, both within Swift and between kernel- and user-space applications, as well as reducing copies between your apps and the libraries created in both Apple's and Linux's ecosystems (not ignoring the work on Windows either!).
Likewise, we need to consider that, while they're all basically a bag of bytes, the public APIs, (de/re)allocations and various other details can differ greatly. We don't want to import an unnecessarily huge library for this use case, but we do want to use our bytes where we can.
Now, I don't think I have a good solution to all these needs, and I can't say I've really tried. But I do know there's a huge demand for this, not just from me but from a large set of Swift developers. I'm happy to put effort into (contributing to) designing various solutions to the problems and use cases listed above, and to those that I'm sure will come up in the comments below. Before any large effort gets kickstarted, I'd love to hear other opinions on the matter.
Finally, some thanks to @johannesweiss and @lukasa for their work on SwiftNIO, and a polite request for your experiences and opinions on the matter as well.