Are we saying it is normal for Swift programs to potentially hold onto whatever peak memory they spike to?
It is highly implementation and access pattern dependent, but this is generally the case for malloc-backed memory allocation (that is, for most languages except those that come with their own "VM"), particularly on Linux.
Malloc doesn't usually return memory to the system, but puts freed blocks into a free list for reuse. So the RSS may indeed appear to grow and never really shrink (except for large allocations). In the worst case, unused memory pages will get swapped out.
(BTW: The same is true for Java/JS: it is implementation specific and depends on the GC in use. If you use a copying/compacting GC, which the JVM allows, it can actually resolve the fragmentation.)
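To make the free-list behavior concrete, here is a minimal sketch: free a small block, then allocate another of the same size. Most allocators on Linux will hand the freed block straight back rather than return the page to the OS. Note that the address reuse shown here is common but not guaranteed by any allocator, so the sketch prints rather than asserts it.

```swift
import Foundation

// Sketch of malloc free-list reuse: free a small block, then allocate
// another one of the same size. The allocator typically serves the new
// request from its free list, so the process RSS does not go down when
// the first block is freed.
let first = malloc(64)!
let firstAddress = UInt(bitPattern: first)
free(first)                    // goes onto the free list; RSS unchanged

let second = malloc(64)!       // often the very same block again
let secondAddress = UInt(bitPattern: second)
print("reused:", firstAddress == secondAddress)
free(second)
```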
But it seems to me one ought to be able to write for example an image or in-memory data processing program in Swift that temporarily loads a couple GB.
Again, highly implementation dependent, but if you allocate memory over a certain size (presumably over the page size [4KB..16KB]), malloc will place you on a separate code path and deallocate the memory on free (there is no point in returning a 30GB memory block to the free list).
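For reference, glibc's malloc serves allocations above its mmap threshold (128KB by default, tunable via `mallopt(M_MMAP_THRESHOLD, ...)`) directly with mmap and munmaps them on free, so those pages really do go back to the OS. A quick sketch:

```swift
import Foundation

// Sketch: a large allocation (64 MB here, well above glibc's default
// 128 KB mmap threshold) is typically served by mmap directly. When it
// is freed, the allocator munmaps it and the pages go back to the OS,
// so RSS actually shrinks again -- unlike small free-listed blocks.
let size = 64 * 1024 * 1024
let big = malloc(size)!
memset(big, 0, size)   // touch the pages so they actually count toward RSS
free(big)              // munmap'ed: memory returned to the system
print("allocated and released \(size / (1024 * 1024)) MB")
```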
For image processing (or other large files) in particular, the file may not even be loaded into "regular" malloc memory; the file might be mapped into memory (though be careful with that as well).
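A sketch of the memory-mapping approach using Foundation's `Data` reading options (the temp file here is created just for illustration):

```swift
import Foundation

// Sketch: map a file into memory instead of reading it into a malloc'd
// buffer. The pages are backed by the file itself and paged in by the
// kernel on demand; clean pages can simply be dropped under memory
// pressure instead of swapped. (Careful: the mapping can break if the
// file is truncated underneath you.)
let url = URL(fileURLWithPath: NSTemporaryDirectory())
    .appendingPathComponent("mapped-demo.bin")
try Data(repeating: 0x41, count: 1024 * 1024).write(to: url)  // 1 MB of "A"s

let mapped = try Data(contentsOf: url, options: .alwaysMapped)
print(mapped.count, mapped.first == 0x41)

try FileManager.default.removeItem(at: url)
```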
But yes, it is a common pattern on Unix to do expensive stuff out of a concurrent server and in an external process / in a job queue.
But you mention one thing which is quite likely another problem here: in-memory data processing.
The original program sounded like it fetches a whole set of data from the DB into memory just to deliver it a second later and then drop it again. That is almost universally not a good approach. (Obviously I can't speak to this particular implementation.) But I've seen this often, and generally you should stream the data directly from the database to the HTTP connection and do your transformations on the fly. This way you get "O(1)" memory use instead of "O(n)".
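A minimal, hypothetical sketch of the streaming pattern: `fetchRows` and `send` are stand-ins for the real database driver and NIO channel write, not actual APIs. The point is that each row is transformed and handed to the connection as it arrives, then dropped, so memory use stays constant in the number of rows.

```swift
import Foundation

// Hypothetical sketch of "stream, don't buffer": transform each row as
// it arrives and write it out immediately, instead of accumulating the
// whole result set in memory first.
func fetchRows() -> AnySequence<[String: String]> {
    // Pretend these arrive one by one from the DB driver.
    AnySequence((1...3).lazy.map { ["id": String($0), "name": "row\($0)"] })
}

var sent: [String] = []
func send(_ chunk: String) { sent.append(chunk) }  // stand-in for channel.write

// O(1) memory: each row is serialized and sent, then goes out of scope.
for row in fetchRows() {
    send("{\"id\":\(row["id"]!)}")
}
print(sent)
```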
Even though NIO mysteriously has no piping function built in :-), it is set up to support this really well (I eventually need to rebuild Noze.io on top of it).
The problem with "just load everything into RAM" is that in a highly concurrent framework like NIO, many concurrent connections may be doing that at the same time, turning the issue into "O(n^2)" ...
So: Stream if you can! ;-)
As with every algorithm, you need to choose the right one for your task and be aware of how your computer works. Resources are not free, neither CPU nor RAM :-)
backing stores that could take up residence in new pages, I think? It wasn’t clear to me if these were sticking around because XYZ was holding on to them or if they’re just the sort of things that can get sprinkled about when memory grows, making a Swift program’s RSS like a balloon that never deflates
No, that won't easily happen. NIO and its ByteBuffer are already doing the right thing here, and NIO itself doesn't hold onto buffer objects. My understanding of what happened is this:
- NIO is receiving network packets, say 1KB ByteBuffer blocks
- it is vending those blocks to the MySQL driver
- even for just reading tiny amounts of data (integers, 4 bytes?), that driver would use ByteBuffer.getData to create a Foundation.Data object for those 4/8 bytes
- NIO tries to be smart and not copy the data, but vend a Data object which is backed by the ByteBuffer itself (instead of copying unnecessarily)
- Unfortunately Swift's Data implementation for "backed data" is really inefficient: it is not just holding a reference and a slice, it actually creates a new heap block to hold that reference.
- Plus there is SR-7378, which requires yet another heap allocated closure.
To recap: to read and transform 4 bytes of data, we are allocating two tiny memory objects on the heap, just to drop them a second later.
If that incoming 1KB block has 100 ints, that is ~200 extra heap allocations just to parse a single incoming buffer.
In short: the driver was using an inappropriate algorithm, aka a plain bug (please correct me if I'm wrong). A low-level driver should probably not use Data at all, but just use NIO's built-in ByteBuffer reading facilities (or peek at its raw data).
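To illustrate the recommended pattern without pulling in swift-nio as a dependency, here is a hypothetical `Reader` type standing in for ByteBuffer (it is not NIO's actual type): fields are decoded straight from the raw bytes, the way NIO's `ByteBuffer.readInteger` works, with no per-field Data object and no heap allocation.

```swift
import Foundation

// Hypothetical stand-in for NIO's ByteBuffer, showing the zero-allocation
// way to parse fixed-width integers out of an incoming byte block.
struct Reader {
    let bytes: [UInt8]
    var index = 0

    // Reads a big-endian fixed-width integer directly from the bytes;
    // no intermediate Data object, no extra heap blocks.
    mutating func readInteger<T: FixedWidthInteger>(_ type: T.Type) -> T? {
        let size = MemoryLayout<T>.size
        guard index + size <= bytes.count else { return nil }
        var value: T = 0
        for _ in 0..<size {
            value = (value << 8) | T(truncatingIfNeeded: bytes[index])
            index += 1
        }
        return value
    }
}

var reader = Reader(bytes: [0x00, 0x00, 0x00, 0x2A, 0x01, 0x00])
let answer = reader.readInteger(UInt32.self)  // 42, no Data object created
let short = reader.readInteger(UInt16.self)   // 256
print(answer ?? 0, short ?? 0)
```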
P.S.: Small ad: Apache aka mod_swift is really good because of this. It allows you to automatically and gracefully recycle its request-handling subprocesses if they peak, whether due to an RC bug, due to fragmentation, or due to excessive one-time RAM use :->