Measuring memory usage accurately (for scaling purposes)

I'm looking for a metric I can use to accurately measure memory usage so I can auto-scale my server application. This is difficult, because malloc doesn't return freed memory to the OS and I'm not sure how to get a metric that represents used and free memory.

Background:
My server has high peak memory usage for sustained periods of time. It is a document-syncing server and works by loading a document (a whiteboard) into memory. When all clients disconnect, it can close the document and remove it from memory. You can see an example of a document/whiteboard here.

Out of the box, I can measure the OS's free memory, but this metric has a problem: the reported memory usage never seems to go down, even when a whiteboard is closed and removed from memory. From what I've learned in other posts (Memory leaking in Vapor app - #50 by Helge_Hess1), this seems to be normal behavior for malloc: when memory is freed, malloc may recycle it, but it doesn't return it to the OS. That means the process's memory usage reaches a peak and then doesn't go back down, even if there is plenty of freed heap space ready to be reused.

Here is a graph of memory usage in my app over a couple of months. You can see it goes up and then plateaus at peak memory usage. It never goes down significantly, except when I restart the app (those are the sharp drops).

The question:
What metric can I use to determine whether I need more or fewer servers at a given moment? For example, is there an API to ask the allocator how much memory has been freed and is available for recycling? Maybe the metric has something to do with paging (which I know very little about)?

If I use the out-of-the-box system memory metric, then my application will scale out but it will never scale in, because that metric never goes down.

I'm also not sure what happens if my application reaches 100% memory usage. I think swapping happens and it gets slower, but I'm not sure how bad that would be...

In case the answer is OS-specific: I am using the official Swift Docker image based on Ubuntu 18.04.

Eventually at or after 100% memory usage your process will likely get forcibly killed by the Linux OOM killer.

In general, my recommendation is not to use memory consumption as a trigger to scale out. Memory can be returned under pressure if the allocator chooses to do so, fragmentation can ease over time, and in the absolute worst case Linux will make your process go away, freeing up the memory (at the cost of lost transactions).

The better metrics for scaling are going to be customer observable ones. That is, latency, throughput, etc. At high memory usage and high CPU load you will likely be in an overload scenario, and this will be visible in the form of spikes in latency and throughput. That’s a good time to scale out.

Alternatively, you can do some capacity planning on memory and then recycle your processes over time. This is a good practice anyway as it makes things like rolling upgrade easier.

From my understanding, his problem is less about scaling out and more about scaling in once the extra capacity is no longer needed. His server uses an unusually large amount of memory per request.

How is this usually done with NIO processes? Have a frontend load balancer stop distributing requests to the instance and then perform a graceful shutdown?
(Apache has this sort of functionality built in; it would be nice to have in NIO as well, i.e. a protocol-aware watchdog.)

Thanks @lukasa!

The better metrics for scaling are going to be customer observable ones. That is, latency, throughput, etc. At high memory usage and high CPU load you will likely be in an overload scenario, and this will be visible in the form of spikes in latency and throughput. That’s a good time to scale out.

I'm worried that for my server it is kind of an all-or-nothing failure. CPU doesn't get maxed out and latency is fine even as we toe the line on memory consumption. The CPU usage on my server is around 2-5%, even when I choose the smallest CPU configuration. The bottleneck is just memory. As far as I can tell, there won't be any user-observable effects until, bam, the server is killed because it ran out of memory.

So the standard API for getting this information on Unix systems is getrusage, but unfortunately the numbers it gives are only dubiously meaningful. Even more unfortunately, the reason they're dubiously meaningful is that the question itself is only loosely meaningful. Some examples:

  • Is a page from a memory-mapped file that's currently resident "memory used by your process"? Logically it is, but it can also be discarded at any moment without incurring any paging costs, so for most metrics it makes more sense not to count it.
  • If two processes have some shared memory, does it count against both, neither, or something in between? If one exits does it now count fully against the other?
  • Should pages from the "purgeable memory" facility on Darwin be counted as used or not used? Should they toggle back and forth from used to not used when they're locked and unlocked?
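
That said, if you do want to look at what getrusage reports, the call from Swift on Linux looks roughly like the sketch below. Note that ru_maxrss is a high-water mark, so it will track the plateau in your graph rather than current usage, and on Linux it is reported in kilobytes. The function name here is just for illustration.

```swift
import Glibc

// Sketch: read the peak resident set size of the current process via getrusage(2).
func peakResidentSetSizeBytes() -> Int? {
    var usage = rusage()
    guard getrusage(RUSAGE_SELF, &usage) == 0 else {
        return nil  // inspect errno for the reason
    }
    // On Linux, ru_maxrss is reported in kilobytes.
    return Int(usage.ru_maxrss) * 1024
}

if let peak = peakResidentSetSizeBytes() {
    print("peak RSS: \(peak / (1024 * 1024)) MiB")
}
```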

The right thing to do is probably what @lukasa says and proactively recycle processes.

If you can't do that, my suggestion would be to take matters more into your own hands: if you have large transient buffers of memory, consider mmap()ing them yourself with MAP_PRIVATE | MAP_ANON. Then when you're done with them you can unmap them from your process, rather than hoping malloc decides to help you out.
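
Something along these lines (a sketch; on Linux the flag is spelled MAP_ANONYMOUS, and the helper name is just for illustration):

```swift
import Glibc

// Allocate a large transient buffer straight from the OS and hand it back
// with munmap() as soon as we're done, instead of waiting for malloc.
func withTransientBuffer<R>(byteCount: Int,
                            _ body: (UnsafeMutableRawBufferPointer) throws -> R) rethrows -> R {
    guard let base = mmap(nil, byteCount,
                          PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS,
                          -1,   // no backing file descriptor
                          0),   // offset
          base != UnsafeMutableRawPointer(bitPattern: -1)  // MAP_FAILED
    else {
        fatalError("mmap failed, errno: \(errno)")
    }
    defer {
        // Unlike free(), this returns the pages to the OS immediately.
        munmap(base, byteCount)
    }
    return try body(UnsafeMutableRawBufferPointer(start: base, count: byteCount))
}
```

Then something like `withTransientBuffer(byteCount: 64 * 1024 * 1024) { buffer in ... }` scopes the buffer's lifetime, and its pages, to the closure.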

You could also see if there's a less CPU-focused memory allocator you could use. Darwin's default malloc is more aggressive about returning memory to the OS, for example.

Thank you @David_Smith! Now I’ve got a bunch to read up on :sweat_smile: I should have taken that Operating Systems course...

I’ll definitely look into task recycling. I’m running this application on Amazon’s container service with Fargate and just got to the point where I can seamlessly deploy by launching new containers and retiring old ones. I can probably recycle containers using that system.

I also want to check out getrusage because a lot of those tricky questions are easier in my setup. The container only has one process (my app) that has an impact on memory usage. Fargate doesn’t support memory paging, so that isn’t a factor either. Maybe with those complications out of the way I can get a useful metric. Thank you for that tip!

You could also use mallinfo to retrieve statistics about malloc's usage, to confirm that data is really freed and not leaked.

SwiftNIO Extras comes with a ServerQuiescingHelper which can be used for this use case. When you tell it that you want to start quiescing the server, it does the following:

  • closes the server socket so no new connections are coming in
  • on all the open channels, it sends an inbound event that we're starting to quiesce (which you can ignore if you want)
  • once all channels are closed, it tells you by fulfilling a future

Here's also an example HTTP server that implements quiescing on SIGINT.

As an extra gift, if you use NIO's HTTPServerPipelineHandler (which you do by default if you use ChannelPipeline.configureHTTPServerPipeline), quiescing is sped up by encouraging connections to close without hard-closing them. In normal operation we use Connection: keep-alive for as long as the client allows us to, but once we receive the ChannelShouldQuiesceEvent we send Connection: close to speed up the closure of the network connections. And once they're all gone, the quiescing helper tells you about it.

A common strategy could be: use the quiescing helper, but additionally set a longish timeout (say 5 minutes), and if things haven't quiesced by then, just exit. Of course, your infrastructure may already do that, in which case there's nothing more to do :slight_smile:.
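
Putting that together, a rough sketch of the wiring might look like the following (NIO 2 spellings; MyHTTPHandler is a placeholder for your own handler, and the port and timeout are made up):

```swift
import Glibc       // for exit()
import NIO
import NIOHTTP1
import NIOExtras   // from swift-nio-extras

let group = MultiThreadedEventLoopGroup(numberOfThreads: System.coreCount)
let quiesce = ServerQuiescingHelper(group: group)

let serverChannel = try ServerBootstrap(group: group)
    // The quiescing handler goes on the *server* channel so it can stop
    // accepting new connections and keep track of the child channels.
    .serverChannelInitializer { channel in
        channel.pipeline.addHandler(quiesce.makeServerChannelHandler(channel: channel))
    }
    .childChannelInitializer { channel in
        channel.pipeline.configureHTTPServerPipeline().flatMap {
            channel.pipeline.addHandler(MyHTTPHandler())
        }
    }
    .bind(host: "0.0.0.0", port: 8080)
    .wait()
print("server running on \(serverChannel.localAddress!)")

// Later, when it's time to drain (e.g. on SIGINT, or before your
// infrastructure retires the container):
let fullyQuiesced = group.next().makePromise(of: Void.self)
quiesce.initiateShutdown(promise: fullyQuiesced)

// The "longish timeout" strategy: if things haven't quiesced after
// 5 minutes, just exit.
let deadline = group.next().scheduleTask(in: .minutes(5)) { exit(1) }

try fullyQuiesced.futureResult.wait()
deadline.cancel()
try group.syncShutdownGracefully()
```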

Nice! But there is no ready-made frontend server which deals with that, right? Does nginx or something else have built-in support for that kind of thing?
("quiescing", what a word! :-) )

Yes, most servers can do this. In the case of nginx, if you send it SIGQUIT it will perform what nginx calls a "graceful shutdown", which is broadly analogous to what NIO calls "quiescing": new connections are not accepted, and existing connections are allowed to complete their work. Most other servers can do this too:

  • Apache has the WINCH signal (graceful-stop).
  • HAProxy has graceful stop on SIGUSR1.
  • Caddy uses SIGTERM.

The pattern is quite common.

A fun evolution of this pattern is to use graceful shutdown for zero-downtime config reloads or binary upgrades. In this mode, rather than closing the server socket, you use the signal to spawn a new copy of the binary that loads the new config from disk. You then send the server socket to the new process over a Unix domain socket (file-descriptor passing), and wait for the existing processes to exit. This way there is no point at which the server socket is dead, and all existing connections get served.

NIO is missing a few small chunks of functionality for that use-case, sadly.

You misunderstood me: of course all the established servers have graceful shutdown built in, and usually a frontend which manages those backend processes.
The latter is what I’m interested in (how to do it with NIO servers). Can nginx act as a frontend and trigger shutdowns and such? Does Pound still exist and work for that?

couple of potentially helpful links:

Ah, I see. The answer is no: there is no common system for frontend servers to signal to backend servers that they should terminate. This is the job of the init system or whatever else manages the runtime of the backend servers.