Uneconomical stack usage in non-optimized builds

The way the compiler assigns stack space for locals and temporaries in non-optimized builds seems uneconomical when there are a lot of branches. It's possible to write code that works perfectly well under -O, where the optimizer combines everything together, but easily exhausts the stack in debug builds.

This pain is felt most often when working with code that's recursive (because the stack usage is compounded), or when the code is running on background threads (which get a much smaller default stack size; if you want to use Dispatch APIs, you can't change it without falling back to Foundation.Thread). A couple of examples I've seen personally are:

Here's a godbolt link where I explored some examples in more detail to try to nail down what was going on and mitigate it in the meantime:

This problem isn't specific to switch/case statements either; rewriting the code using if/else shows the same behavior. So it looks like every basic block may be getting its own separate space on the stack even when their execution is mutually exclusive. When compiling with optimizations, these slots get collapsed appropriately, but without optimizations, we can see significant growth from even minor changes.
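Since the godbolt examples aren't reproduced here, below is a hypothetical reduction of the pattern (the names `Big`, `Chunk`, `Field`, and `bump` are illustrative, not from the original code). Each branch makes a local copy of a 512-byte struct. The branches are mutually exclusive, but per the behavior described above, an unoptimized build may reserve a separate slot for each copy. One way to compare is to compile with `swiftc -Onone -emit-assembly` and again with `-O` and look at the stack adjustment in the prologue of `bump`; the exact numbers vary by compiler version and target, so none are claimed here.

```swift
// Hypothetical 512-byte value type: 8 chunks of 8 UInt64 words each.
struct Chunk {
    var words: (UInt64, UInt64, UInt64, UInt64, UInt64, UInt64, UInt64, UInt64) =
        (0, 0, 0, 0, 0, 0, 0, 0)
}

struct Big {
    var c0 = Chunk(), c1 = Chunk(), c2 = Chunk(), c3 = Chunk()
    var c4 = Chunk(), c5 = Chunk(), c6 = Chunk(), c7 = Chunk()
}

enum Field { case first, second, third, fourth }

// Each branch copies `Big` into a local. Only one branch ever runs, but an
// unoptimized build may still give every `copy` its own stack slot.
func bump(_ value: inout Big, _ field: Field) {
    switch field {
    case .first:
        var copy = value
        copy.c0.words.0 += 1
        value = copy
    case .second:
        var copy = value
        copy.c1.words.0 += 1
        value = copy
    case .third:
        var copy = value
        copy.c2.words.0 += 1
        value = copy
    case .fourth:
        var copy = value
        copy.c3.words.0 += 1
        value = copy
    }
}

var big = Big()
bump(&big, .first)
bump(&big, .third)
print(MemoryLayout<Big>.size, big.c0.words.0, big.c2.words.0)  // prints "512 1 1"
```

Adding more cases (or rewriting the switch as an if/else chain) is the kind of minor change that, per the observations above, grows the debug frame roughly linearly.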

Some changes I didn't include above make the problem even worse; for example, if Value is a class instead of a struct, the extra temporaries generated for retains/releases and to read/modify/write the value passed via inout to the function contribute even more to the stack usage. (We manage to get around this one already in SwiftProtobuf by wrapping whole functions in withExtendedLifetime.)
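For readers unfamiliar with the `withExtendedLifetime` workaround, here is a minimal sketch of its shape, assuming a message type backed by a storage class (`Storage`, `Message`, and `render` are hypothetical names, loosely modeled on how SwiftProtobuf's generated code wraps function bodies). The whole body runs inside the closure, and the closure parameter shadows the stored reference, so the body works with one pinned reference rather than re-reading `self.storage` at each use. Whether this actually shrinks a particular function's debug frame is something to verify with `-emit-assembly`.

```swift
// Hypothetical class-backed storage standing in for a message's storage class.
final class Storage {
    var name = ""
    var values: [Int] = []
}

struct Message {
    var storage = Storage()

    // The entire body is wrapped so `storage` stays alive for the whole call;
    // the parameter `storage` shadows `self.storage` inside the closure.
    func render() -> String {
        withExtendedLifetime(storage) { (storage: Storage) -> String in
            var parts: [String] = []
            for v in storage.values {
                parts.append("\(storage.name)=\(v)")
            }
            return parts.joined(separator: ",")
        }
    }
}

let message = Message()
message.storage.name = "x"      // mutates the class instance, not the struct
message.storage.values = [1, 2]
print(message.render())         // prints "x=1,x=2"
```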

One mitigation is to wrap the code in an immediately-executed closure to hoist its locals into a separate frame from the outer function, but if the code uses throws, this creates an awkward-looking "double-try" (and the try handling still requires a small amount of local stack space, so it still causes linear growth w.r.t. the number of cases).
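A sketch of the closure mitigation with throwing code, using a hypothetical `readField` helper in place of real decoding work. The intent is that the locals `a` and `b` live in the closure's own frame rather than in `decode`'s frame in a debug build; the "double-try" is visible as one `try` inside the closure and another on the call that runs it.

```swift
struct DecodeError: Error {}

// Hypothetical throwing helper standing in for real decoding work.
func readField(_ tag: Int) throws -> Int {
    guard tag >= 0 else { throw DecodeError() }
    return tag * 10
}

func decode(tag: Int) throws -> Int {
    switch tag {
    case 0:
        // Immediately-executed closure: its locals belong to the closure's
        // frame. Note the second `try` needed to invoke the closure itself.
        return try { () throws -> Int in
            let a = try readField(tag)
            let b = try readField(tag + 1)
            return a + b
        }()
    case 1:
        return try { () throws -> Int in
            let a = try readField(tag)
            return a * 2
        }()
    default:
        return try readField(tag)
    }
}

do {
    print(try decode(tag: 0), try decode(tag: 1))  // prints "10 20"
} catch {
    print("decode failed:", error)
}
```

As noted above, each `try` on the closure call still needs a little error-handling space in the outer frame, so this reduces the per-case cost but does not make it constant.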

We're looking separately at ways to improve the memory layout of SwiftProtobuf messages so that indirect storage is used more frequently when the message might be large and that would help tackle this from one direction. But in the general case, whether the problem is a few branches with large stack usage or many branches that are individually small but add up to a large amount of stack usage, is there anything preventing the compiler from using a more compact stack layout when possible, even for non-optimized builds, so that authors don't have to rewrite their code in strange ways to avoid this issue?
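To illustrate the "indirect storage" direction, here is a minimal copy-on-write box, similar in spirit to (but not the same as) the storage classes SwiftProtobuf generates; `Inline`, `Storage`, and `Boxed` are hypothetical names. Moving the large payload behind a class reference shrinks the inline size of the value, which is the size every local copy and temporary of it would otherwise occupy.

```swift
// Hypothetical 256-byte payload: 4 chunks of 8 UInt64 words.
struct Chunk {
    var words: (UInt64, UInt64, UInt64, UInt64, UInt64, UInt64, UInt64, UInt64) =
        (0, 0, 0, 0, 0, 0, 0, 0)
}

struct Inline {
    var a = Chunk(), b = Chunk(), c = Chunk(), d = Chunk()
}

// Heap-allocated storage that holds the large payload.
final class Storage {
    var payload: Inline
    init(_ payload: Inline) { self.payload = payload }
}

// Value-semantic wrapper: only a single reference is stored inline.
struct Boxed {
    private var storage: Storage

    init() { storage = Storage(Inline()) }

    var payload: Inline {
        get { storage.payload }
        set {
            // Copy-on-write: clone the storage if another copy shares it.
            if !isKnownUniquelyReferenced(&storage) {
                storage = Storage(storage.payload)
            }
            storage.payload = newValue
        }
    }
}

let original = Boxed()
var copy = original
copy.payload.a.words.0 = 42
print(MemoryLayout<Inline>.size, MemoryLayout<Boxed>.size)  // prints "256 8" on 64-bit platforms
print(original.payload.a.words.0, copy.payload.a.words.0)   // prints "0 42"
```

The trade-off is a heap allocation per value and an extra indirection on access, which is why choosing when to box (for example, only when a message might be large) matters.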

cc @tbkka @thomasvl


To clarify the scope of the problem a bit: One user reported a single function that had an 800k stack frame in debug builds due to allocating redundant storage for a 6k struct in every branch of a large case statement. In optimized builds, that same function only required a 26k stack frame. As Tony said, we're experimenting with ways to reduce the size of our generated structs, and with ways to exploit closures to make better use of stack space in debug builds, but I expect this is going to be a problem for a lot of other folks.


I'm the user who reported the issue with SwiftProtobuf. Naturally, I hope the compiler can become smarter about stack usage in unoptimized builds and this problem will go away, but the experience of debugging this issue left me wishing the compiler would issue a warning when it generates a very large stack frame. I got lucky this time because the stack frame was absurdly large (800KB); if it had been more like 300KB, the crash might not have happened every time and it would have been trickier to track down. I'd love to be able to set a threshold like 50KB and have the compiler warn me if any single stack frame exceeds it.

How did you end up with a 6k struct? Did it have a fixed-size array imported from C?

It's nesting of a nesting of a nesting of ... of value types. This is the part where, as @allevato mentioned, we're looking at making SwiftProtobuf smarter about inserting a class for the storage, so that the composition doesn't produce values this large.

But as seen with SwiftSyntax, if you have enums with associated values, the total size might only be, say, 100 bytes, but if there are 100 cases, each carrying a 100-byte payload, then a switch on the cases can result in at least 100*100 bytes of stack. On top of that, each case adds some overhead for try, inout parameters to functions, etc. Use that code in a recursive structure, and you can run out of stack space in a non-optimized build really quickly.
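At those numbers, 100 cases × 100 bytes is 10,000 bytes (about 10 KB) per frame before any try or inout overhead, multiplied again by recursion depth. The sketch below uses hypothetical types (`Payload`, `Node`, `BoxedNode`) to show how an enum's inline size tracks its largest payload, and how `indirect` boxes the payloads so the enum itself holds only a reference. This shrinks the enum value; the per-case bindings in a switch may still need their own stack in a debug build, so it is worth measuring rather than assuming.

```swift
// Hypothetical 96-byte payload (12 UInt64 words).
struct Payload {
    var words: (UInt64, UInt64, UInt64, UInt64, UInt64, UInt64,
                UInt64, UInt64, UInt64, UInt64, UInt64, UInt64) =
        (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
}

// Inline enum: every value is at least as large as its largest payload.
enum Node {
    case a(Payload)
    case b(Payload)
    case leaf(Int)
}

// `indirect` stores each case's payload in a heap box.
indirect enum BoxedNode {
    case a(Payload)
    case b(Payload)
    case leaf(Int)
}

// Each case binds its payload; in -Onone these bindings may each get a slot.
func firstWord(_ node: BoxedNode) -> UInt64 {
    switch node {
    case .a(let p), .b(let p): return p.words.0
    case .leaf(let n): return UInt64(n)   // assumes n is non-negative
    }
}

var sample = Payload()
sample.words.0 = 5
print(firstWord(.b(sample)), firstWord(.leaf(2)))  // prints "5 2"
print(MemoryLayout<Payload>.size)                  // prints "96"
print(MemoryLayout<Node>.size, MemoryLayout<BoxedNode>.size)
```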


And not just this: even code that reads as fairly straightforward Swift can get lowered to a large number of LLVM temporaries, each of which appears to be given its own unique stack address. (I could be wrong and it might be smarter than this, but even small changes to the code cause a significant increase in the number of bytes used in a non-optimized build.)

I imagine there's an LLVM pass that could be used to collapse this stack usage, but I wonder if there's something that prevents it from being enabled for debug builds.