Hi Everyone,
A while back I ran into the NIOTooManyBytesError while working with Swiftly and managed to fix it there. I'm reporting back what I learned in case it helps others who run into it. This was a novel class of error for me.
As very quick background about myself, I have some experience writing HTTP endpoints in both REST and RPC styles in other languages. Older Java frameworks directly exposed the request and response with their readers and writers, and had document object model based parsers for JSON and XML. Newer ones in Java and Go are based around marshalling and un-marshalling payloads using provided data types and reflection, saving the developer from some kinds of hazards, such as schema changes and divergence between client and server schema versions. The frameworks can also often save some memory by preventing byte sequences from accidentally remaining after the payloads have been un-marshalled. Unrecognized fields are stripped away entirely, helping to avoid data contamination.
As I understand it, the core of the NIOTooManyBytesError is the idea that one type of malicious payload is simply one that is large enough to cause memory problems when un-marshalling it, perhaps big enough to crash your service, or even to slow down the OS/container where it runs through memory contention. This is a potential vector for a Denial of Service (DoS) attack.
The idea is that when you read the bytes out of a stream, you should put some kind of upper bound on the acceptable payload size. Typically, you'll read all of the bytes from the stream to create the Data for the Decoder, and that's where you provide the upTo limit. If the payload is too big, the error is thrown before the decoder is involved. It's better to fail early than to incur a large processing cost decoding a large malicious payload. This two-step process was novel to me, and I found it slightly unergonomic until I understood the reason.
var request = makeRequest(url: url)
let response = try await requestExecutor.execute(request, timeout: .seconds(30))
let buffer = try await response.body.collect(upTo: 1024 * 1024) // 1MB is much larger than what we expect
let data = try JSONDecoder().decode(MyData.self, from: buffer)
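To make the two-step idea concrete, here is a minimal sketch of what a bounded collect does conceptually. It is not NIO's implementation: `collectBounded`, `PayloadTooLarge`, `Status`, and `decodeStatus` are hypothetical names made up for illustration, and it reads a plain synchronous sequence of `Data` chunks rather than an async response body.

```swift
import Foundation

// Hypothetical error for this sketch; it plays the role of NIOTooManyBytesError.
struct PayloadTooLarge: Error {
    let limit: Int
}

// Hypothetical payload type, standing in for MyData.
struct Status: Decodable {
    let state: String
}

// Step 1: accumulate chunks, failing as soon as the running total
// would exceed the limit. Nothing has been decoded at this point.
func collectBounded<S: Sequence>(_ chunks: S, upTo limit: Int) throws -> Data
where S.Element == Data {
    var collected = Data()
    for chunk in chunks {
        guard collected.count + chunk.count <= limit else {
            throw PayloadTooLarge(limit: limit)
        }
        collected.append(chunk)
    }
    return collected
}

// Step 2: decode only after the size check has passed.
func decodeStatus<S: Sequence>(from chunks: S, limit: Int) throws -> Status
where S.Element == Data {
    let data = try collectBounded(chunks, upTo: limit)
    return try JSONDecoder().decode(Status.self, from: data)
}
```

The point of the split is that an oversized payload is rejected while it is still just bytes, so the decoder never sees it.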
With the frameworks that I have used in the past in other languages, there's no real protection against malicious large payloads, since that part is abstracted away and there isn't an obvious way to set a limit on the bytes. The un-marshalled data can grow and grow as long as the data type has any variable-width fields in it, such as strings and arrays. There's also the processing time, which compounds the memory consumption.
So, how does one decide what the limit should be for a particular payload? I think that judgement depends on how close the code is to the point where the data is consumed. If the code is very close to where the data will be acted on, then the domain might give a clue to what a static limit could be. Would a payload of 1MB be much more than what is needed for a single JSON status object from a well-known service? Could that memory consumption be handled within the budget of the service given X concurrent service requests?
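The budget question comes down to simple arithmetic. As a rough sketch, with made-up figures (200 concurrent requests and a 256 MiB budget are assumptions, not measurements) and hypothetical function names:

```swift
// Worst case for raw buffered bytes only: every in-flight request
// fills its buffer up to the limit at the same time.
func worstCaseBufferedBytes(limitPerPayload: Int, concurrentRequests: Int) -> Int {
    limitPerPayload * concurrentRequests
}

// Compare that worst case against a memory budget chosen by the service owner.
func fitsBudget(limitPerPayload: Int, concurrentRequests: Int, budgetBytes: Int) -> Bool {
    worstCaseBufferedBytes(
        limitPerPayload: limitPerPayload,
        concurrentRequests: concurrentRequests
    ) <= budgetBytes
}

let oneMiB = 1024 * 1024
// 1 MiB limit x 200 concurrent requests = 200 MiB of buffers in the worst case.
let worstCase = worstCaseBufferedBytes(limitPerPayload: oneMiB, concurrentRequests: 200)
let withinBudget = fitsBudget(
    limitPerPayload: oneMiB,
    concurrentRequests: 200,
    budgetBytes: 256 * oneMiB
)
```

This counts only the collected bytes. The decoded values take additional memory on top, so the real headroom is smaller than the number suggests.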
What about the case where you don't have an answer at the level of this code? You might be able to pass the responsibility of specifying a limit up through the call stack to a trusted client that might have a better understanding of what it should be. If they can't provide one either, then it's possible that this limit should be configurable at deployment time through an environment variable or configuration file.
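Here is one way the deployment-time option could look. This is only a sketch: the variable name `MAX_RESPONSE_BYTES`, the function `payloadLimit`, and the 1 MiB fallback are all assumptions for illustration, not something Swiftly or NIO defines.

```swift
import Foundation

// Reads the byte limit from the environment, falling back to a default
// when the variable is missing, not an integer, or not positive.
// The environment is a parameter so callers (and tests) can supply their own.
func payloadLimit(
    environment: [String: String] = ProcessInfo.processInfo.environment,
    key: String = "MAX_RESPONSE_BYTES",   // hypothetical variable name
    fallback: Int = 1024 * 1024           // assumed default of 1 MiB
) -> Int {
    guard let raw = environment[key], let value = Int(raw), value > 0 else {
        return fallback
    }
    return value
}
```

The result can then be passed down as the `upTo` argument, so the code that collects the bytes doesn't have to hard-code the number itself.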
When this error is encountered, either the payload is malicious or the limit needs to be adjusted. The error documentation now summarizes these points in case someone finds themselves in this position.
Cheers,
Chris