tl;dr: There are tools to figure out what's going on with memory & RSS. On Linux,
/proc/pid/smaps is of particular interest. Specifically the difference between
To expand on this a little: application memory can be in a few different states, and whilst everything said here is true, I think we should add that you can actually check what the problem is.
But let's first start with a few states of the memory. If you allocate memory in your application, this will likely go through
malloc(3) or friends (
posix_memalign, ...) and hopefully (unless it's in use forever or you have a leak) eventually be freed using
free(3). Many people I speak to assume that memory from
malloc comes straight from the kernel and gets released back to the kernel after
free. This is not true.
malloc, free etc. are implemented by an allocator; the default one usually comes with your libc. These are complicated beasts that can be tuned and replaced, which is what @lukasa is alluding to.
Just to be clear, in Swift you usually don't actually call malloc and free yourself; you typically create instances of classes or data structures that are backed by them (Array, Dictionary, String, ...). Once their reference count drops to zero, Swift's runtime will free them for you. But the Swift runtime won't keep malloc'd memory around, so for the sake of this discussion we can ignore Swift's runtime.
Conceptually, if you (or the Swift runtime) call malloc, the allocator will likely check whether it already has spare memory that it previously requested from the kernel (usually using mmap). If it does, it will assign that memory to you without having to ask the kernel for anything. It will just mark this memory as in use in its data structures and your malloc will return it.
Similarly, if you call free, the allocator can and will absolutely hold on to that memory, i.e. it will usually not immediately return it to the kernel (or even tell the kernel about that fact). In fact it often can't return the memory, because you can only ever return whole pages (usually 4 kB or 16 kB) to the kernel. But even if you free fully page-aligned memory that is a multiple of your page size, the allocator might (and will) still hold onto it. If the allocator's heuristics decide that something is worth returning to the kernel, then it will do so, and only then is there a chance of RSS going down.
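To make the "whole pages only" point concrete, here's a tiny sketch of the arithmetic. The function name releasable_pages and the 4 kB default are just my assumptions for illustration, this is not what any particular allocator actually does:

```python
def releasable_pages(addr, length, page_size=4096):
    """Return (start, end) of the whole pages inside [addr, addr + length).

    Hypothetical helper: only pages lying completely inside the freed range
    could ever be handed back to the kernel. Returns None if there are none.
    """
    start = -(-addr // page_size) * page_size       # round the start up
    end = (addr + length) // page_size * page_size  # round the end down
    return (start, end) if end > start else None
```

For example, releasable_pages(100, 3 * 4096) gives (4096, 12288), i.e. only two of the three pages' worth of freed bytes could be returned, and releasable_pages(4196, 4096) gives None because the freed range straddles two pages without covering either of them fully.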
But, as Cory points out, there are different ways of returning memory to the kernel:
- munmap: invalidates the page mapping and returns the pages to the kernel immediately. Memory returned using munmap will immediately decrease your RSS.
- madvise(..., MADV_FREE): does not invalidate any mappings/pages, it merely tells the kernel that you do not need the contents of that memory anymore (so the physical memory pages backing this memory can be reused by the kernel when it wants). But it's important to understand that these mappings are still owned by the application. The application can even still use (read/write) that memory (which will re-dirty it). Memory "returned" using madvise(..., MADV_FREE) does not immediately decrease your RSS. It will eventually decrease your RSS if the kernel runs into memory pressure, which means it'll actually reuse the physical pages for something else (and you'll likely get mapped a copy-on-write zero page).
- madvise(..., MADV_DONTNEED): like madvise(..., MADV_FREE), except that the memory is returned immediately, i.e. your RSS will decrease immediately.
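If you want to watch this happen, here's a rough sketch that dirties an anonymous mapping, applies an madvise hint and reads the process's RSS from /proc/self/statm at each step. It assumes Linux and Python 3.8+ (for mmap.madvise); the helper names rss_kb and measure are mine, and the numbers are approximate because everything else in the process also counts towards RSS:

```python
import mmap
import os

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")


def rss_kb():
    """Current RSS of this process in kB, from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])  # 2nd field: resident pages
    return resident_pages * PAGE_SIZE // 1024


def measure(advice, size=64 * 1024 * 1024):
    """Dirty a private anonymous mapping, apply `advice`, then unmap it.

    Returns the RSS in kB after each of the three steps.
    """
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        for offset in range(0, size, PAGE_SIZE):
            region[offset] = 1       # touching a page makes it resident
        dirty = rss_kb()
        region.madvise(advice)       # hint applies to the whole mapping
        advised = rss_kb()
    finally:
        region.close()               # munmap
    return {"dirty": dirty, "advised": advised, "unmapped": rss_kb()}
```

Calling measure(mmap.MADV_DONTNEED) should show "advised" roughly 64 MB below "dirty". With measure(mmap.MADV_FREE) (where the constant is available) I'd expect "advised" to stay close to "dirty" on a machine that isn't under memory pressure, and only "unmapped" to drop.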
Okay, we covered a few basics. Let's go through a few (non-exhaustive) states of memory again:
1. allocated (using malloc) and in use: obviously counts towards RSS, you're using it [you, the allocator and the kernel agree that it's in use]
2. allocated using malloc and leaked: counts towards RSS [you might think it's not in use but the allocator and the kernel don't know that]
3. not allocated using malloc but either in the allocator's pools or sharing a page with something that's allocated: counts towards RSS [you think it's freed, the allocator thinks it's freed, the kernel thinks it's allocated]
4. not allocated & returned using madvise(..., MADV_FREE): counts towards RSS until the kernel has actually reused that memory [you think it's freed, the allocator knows it's freed, the kernel knows that you don't need the data (so it can just drop the physical memory pages) but your application still has the mappings]. Essentially, you no longer control your exact RSS after MADV_FREE because the kernel can reduce it when it wants, which is usually when it suffers memory pressure.
5. not allocated & returned using madvise(..., MADV_DONTNEED): does not count towards your RSS [you think it's freed, the allocator knows it's freed, the kernel knows that you don't need the data and has unmapped the physical pages]. You still get to keep the virtual memory mappings though.
6. not allocated & returned using munmap: does not count towards RSS [you, the allocator and the kernel know this isn't allocated]. If your application were to touch the memory after munmap'ing it, it would crash.
And now, let's look into how we can figure out how much memory in your application is in what state.
If you want the kernel's understanding of (1), (2) and (3) you can
grep ^Private_Dirty: /proc/YOUR_PID/smaps | grep -v ' 0 kB' and see all mappings that are "private" (not shared with other processes) & "dirty" (you actually used this memory, kernel can't just forget the contents).
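If you'd rather have totals than a list of lines, something like the following sketch sums up a field across all mappings. The function names are made up for this post, and it assumes the usual "Field:   123 kB" layout of smaps lines:

```python
import re


def smaps_total_kb(smaps_text, field):
    """Sum one smaps field (e.g. "Private_Dirty") over all mappings, in kB."""
    pattern = re.compile(rf"^{re.escape(field)}:\s+(\d+) kB$", re.MULTILINE)
    return sum(int(kb) for kb in pattern.findall(smaps_text))


def read_smaps(pid="self"):
    """Read /proc/<pid>/smaps (Linux only, needs permission for that process)."""
    with open(f"/proc/{pid}/smaps") as f:
        return f.read()
```

For example, smaps_total_kb(read_smaps(YOUR_PID), "Private_Dirty") gives the kernel's combined view of (1), (2) and (3), and the same call with "Rss" gives you the total to compare it against.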
Unfortunately, we can't easily tell (1) and (2) apart without introspecting the actual data of your address space.
heaptrack/... can do this partially for Swift and
leaks on macOS can do this pretty well.
(3) is actually quite similar because the kernel is fully oblivious to this. The only way to get information about memory that's allocated & in use vs. memory that's sitting in the allocator's pools or lost to fragmentation is from the allocator itself. On Linux,
mallinfo2 can help here, on macOS it's
(4) MADV_FREE'd pages that haven't been reclaimed are fortunately quite easy to spot on Linux because the kernel knows about them. You can look into /proc/YOUR_PID/smaps and check the LazyFree entries, for example with grep ^LazyFree: /proc/YOUR_PID/smaps. Each line with a non-zero value belongs to a mapping that your allocator allocated using mmap earlier and then told the kernel it no longer needs using madvise(..., MADV_FREE). Over time (when the kernel actually reclaims those pages), they become like (5).
(5) MADV_DONTNEED'd pages (and MADV_FREE'd ones that have been reclaimed by the kernel) are a little bit harder to spot, but they don't increase your RSS so you can probably just ignore them. If you're curious, those would appear as mappings that have a size (e.g. Size: 132 kB) but whose Rss shows up as 0 kB. That means that whilst you still own the virtual memory mappings, no physical pages are mapped to those. You can read/write them but you'll suffer page faults.
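To pick out both kinds of mappings from an smaps dump, here's a rough sketch that groups the fields per mapping. All names are hypothetical, and it assumes each mapping starts with the usual "start-end perms ..." header line followed by its "Field:   123 kB" lines:

```python
import re

HEADER = re.compile(r"^([0-9a-f]+)-([0-9a-f]+) \S+ ")
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def parse_smaps(smaps_text):
    """Return one dict per mapping: its address range plus all kB fields."""
    mappings = []
    for line in smaps_text.splitlines():
        header = HEADER.match(line)
        if header:
            mappings.append({"range": f"{header.group(1)}-{header.group(2)}"})
            continue
        field = FIELD.match(line)
        if field and mappings:
            mappings[-1][field.group(1)] = int(field.group(2))
    return mappings


def lazy_free(mappings):
    """Mappings that still hold MADV_FREE'd pages the kernel hasn't reclaimed."""
    return [m for m in mappings if m.get("LazyFree", 0) > 0]


def not_resident(mappings):
    """Mappings that have a size but no physical pages behind them."""
    return [m for m in mappings if m.get("Size", 0) > 0 and m.get("Rss") == 0]
```

Note that not_resident only catches mappings that are entirely non-resident. A mapping where just some pages were MADV_DONTNEED'd will simply show an Rss that's smaller than its Size.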
All the memory in (6) (
munmap'd memory) doesn't appear in
/proc/YOUR_PID/smaps at all and is just like memory that never got mapped.
Lastly, it's worth noting that not all allocators even use MADV_FREE, and sometimes you can configure whether the allocator uses MADV_FREE or not. In general, MADV_FREE is a good idea because it makes reusing previously "returned" memory much cheaper in certain cases. For example, if the kernel hasn't gotten around to really unmapping the physical pages from your MADV_FREE'd memory, then you can just reuse it without doing a syscall or suffering a page fault.
But if you use a resource limiting system that limits based on your RSS, then
MADV_FREE might be an issue for you.
Especially with containers, MADV_FREE could be an issue if your host kernel has access to a lot of memory but your container is limited by RSS. That would mean that the kernel itself never really runs into memory pressure, so it might never actually reclaim the MADV_FREE'd pages, but you still "exhaust" your resources because your RSS stays high.
@taylorswift if you share a copy of your
/proc/YOUR_PID/smaps in the bad state, I'm happy to have a look too.