The way the compiler assigns stack space for locals and temporaries in non-optimized builds seems to be uneconomical when there are a lot of branches. As a result, it's possible to write code that works perfectly well under `-O`, when the optimizer combines everything together, but easily exhausts the stack in debug builds.
This pain is felt most often when working with recursive code (because the stack usage is compounded), or when the code is running on background threads (which are given a much smaller default stack size; if you want to use Dispatch APIs, you can't change it without falling back to `Foundation.Thread`). A couple of examples I've seen personally are:
- SwiftSyntax refactored a `visit` method containing a large `switch`/`case` into separate methods, one for each `case`, to relieve stack pressure (Move implementation of each case in SyntaxRewriter.doVisit to separate function by ahoppen · Pull Request #147 · apple/swift-syntax · GitHub)
- SwiftProtobuf users encounter problems involving messages that get generated as large value types (EXC_BAD_ACCESS in decodeMessage(decoder:) due to stack overflow · Issue #1034 · apple/swift-protobuf · GitHub)
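To make the first bullet concrete, here is a minimal sketch of the "one method per `case`" shape. The `Node` and `Rewriter` types and their methods are hypothetical stand-ins, not the real SwiftSyntax API; the point is only that the dispatching function keeps nothing but the `switch`, while each branch's locals live in its own helper's frame.

```swift
// Hypothetical node type for illustration; not the SwiftSyntax API.
enum Node {
    case identifier(String)
    case integer(Int)
    case list([Node])
}

struct Rewriter {
    // The dispatcher contains only the switch. Locals and temporaries
    // for each case are declared in the per-case helpers below, so they
    // belong to those helpers' stack frames rather than to this one.
    func visit(_ node: Node) -> String {
        switch node {
        case .identifier(let name):
            return visitIdentifier(name)
        case .integer(let value):
            return visitInteger(value)
        case .list(let children):
            return visitList(children)
        }
    }

    private func visitIdentifier(_ name: String) -> String {
        let upper = name.uppercased()
        return "id(\(upper))"
    }

    private func visitInteger(_ value: Int) -> String {
        return "int(\(value))"
    }

    private func visitList(_ children: [Node]) -> String {
        // Recursion goes back through the small dispatcher frame.
        let parts = children.map { visit($0) }
        return "[" + parts.joined(separator: ", ") + "]"
    }
}
```

The cost of this shape is an extra call per node and a body that no longer reads top to bottom, which is the kind of rewrite this post would like authors not to need.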
Here's a godbolt link where I explored some examples in more detail to try to nail down what was going on and mitigate it in the meantime:
This problem isn't specific to `switch`/`case` statements either; rewriting it using `if`/`else` has the same characteristic. So it looks like every basic block may be getting its own separate space on the stack, even if their execution is mutually exclusive. When compiling with optimizations, they get collapsed appropriately, but without optimizations, we can see significant growth from just minor changes.
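For readers without the linked example at hand, the following is a rough sketch of the kind of code being described; it is not the code from the godbolt link. `Value`, `makeValue`, and `process` are assumed names, and the field count is arbitrary. Each branch declares its own local of a moderately large value type, and only one of those locals can ever be live on a given call.

```swift
// Hypothetical value type; eight Int fields is an arbitrary choice
// made only so that each local is noticeably larger than a pointer.
struct Value {
    var a = 0, b = 0, c = 0, d = 0, e = 0, f = 0, g = 0, h = 0
}

// Hypothetical helper that produces a Value for a branch to use.
func makeValue(_ seed: Int) -> Value {
    var value = Value()
    value.a = seed
    return value
}

// Each case declares its own `value`. The branches are mutually
// exclusive, so in principle their locals could share one stack slot.
// The observation in this post is that unoptimized builds appear to
// give each branch separate space instead.
func process(_ tag: Int) -> Int {
    switch tag {
    case 0:
        let value = makeValue(10)
        return value.a
    case 1:
        let value = makeValue(20)
        return value.a
    case 2:
        let value = makeValue(30)
        return value.a
    default:
        return -1
    }
}
```

One way to check the behavior on a particular toolchain is to compile a file like this with `-Onone` and with `-O` and compare how much the prologue of `process` subtracts from the stack pointer as cases are added.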
Some changes I didn't include above make the problem even worse; for example, if `Value` is a `class` instead of a `struct`, the extra temporaries generated for retains/releases and to read/modify/write the value passed via `inout` to the function contribute even more to the stack usage. (We manage to get around this one already in SwiftProtobuf by wrapping whole functions in `withExtendedLifetime`.)
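As a sketch of what "wrapping whole functions in `withExtendedLifetime`" can look like, assuming a hypothetical reference-type payload named `Payload` and a hypothetical `accumulate(into:tag:)` function (neither is taken from SwiftProtobuf's sources): the whole branchy body runs inside a single closure during which the object is guaranteed to stay alive. This shows only the shape of the workaround; how much stack it saves will depend on the compiler version.

```swift
// Hypothetical reference type standing in for a class-based payload.
final class Payload {
    var total = 0
}

// Hypothetical function whose entire body is wrapped in
// withExtendedLifetime. The standard library guarantees that the
// object passed as the first argument is not destroyed before the
// closure returns.
func accumulate(into payload: inout Payload, tag: Int) {
    withExtendedLifetime(payload) {
        switch tag {
        case 0:
            payload.total += 1
        case 1:
            payload.total += 10
        default:
            break
        }
    }
}
```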
One mitigation is to wrap the code in an immediately-executed closure to hoist its locals into a separate frame from the outer function, but if the code uses `throws`, this creates an awkward-looking "double-`try`" (and the `try` handling still requires a small amount of local stack space, so it still causes linear growth w.r.t. the number of cases).
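A minimal sketch of that mitigation, with `DecodeError`, `decodeField`, and `decode` as hypothetical names chosen for illustration. The branch body moves into a closure that is called on the spot, so its locals belong to the closure's frame; because the body throws, one `try` is needed for calling the closure and another inside it. The explicit closure signature is there only to keep the sketch unambiguous for the type checker.

```swift
// Hypothetical error and helper, used only for illustration.
struct DecodeError: Error {}

func decodeField(_ raw: Int) throws -> Int {
    guard raw >= 0 else { throw DecodeError() }
    return raw * 2
}

func decode(_ tag: Int, _ raw: Int) throws -> Int {
    switch tag {
    case 1:
        // Outer `try`: calling the throwing closure.
        // Inner `try`: the throwing call inside the closure.
        // `doubled` lives in the closure's frame, not in decode's.
        return try { () throws -> Int in
            let doubled = try decodeField(raw)
            return doubled + 1
        }()
    default:
        return 0
    }
}
```

With many cases, each one repeats the `try { ... }()` wrapper, which is where the per-case overhead mentioned above comes from.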
We're looking separately at ways to improve the memory layout of SwiftProtobuf messages so that indirect storage is used more frequently when the message might be large, which would help tackle this from one direction. But in the general case, whether the problem is a few branches with large stack usage or many branches that are individually small but add up to a large amount of stack usage, is there anything preventing the compiler from using a more compact stack layout when possible, even for non-optimized builds, so that authors don't have to rewrite their code in strange ways to avoid this issue?