The way the compiler assigns stack space for locals and temporaries in non-optimized builds seems to be a bit uneconomical when there are a lot of branches. As a result, it's possible to write code that functions perfectly well under `-O`, when the optimizer combines everything together, but easily exhausts the stack in debug builds.
This pain is felt most often when working with code that's recursive (because the stack usage is compounded), or when the code is running on background threads (which are given a much smaller default stack size, and if you want to use Dispatch APIs, you're unable to change it without falling back to `Foundation.Thread`). A couple of examples I've seen personally are:
- SwiftSyntax refactored a `visit` method containing a large `switch`/`case` into separate methods, one for each `case`, to relieve stack pressure (https://github.com/apple/swift-syntax/pull/147)
- SwiftProtobuf users encounter problems involving messages that get generated as large value types (https://github.com/apple/swift-protobuf/issues/1034)
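To make the first workaround concrete, here is a minimal sketch of the "one method per `case`" refactor. The `Payload` and `Node` types and all function names are hypothetical stand-ins, not code from SwiftSyntax; the point is only that each case's locals move out of the dispatching function's frame and into a callee's frame that exists only while that case runs.

```swift
// Hypothetical stand-in for a large value type (not from SwiftSyntax).
struct Payload {
    var words: (Int, Int, Int, Int, Int, Int, Int, Int) = (0, 0, 0, 0, 0, 0, 0, 0)
}

enum Node {
    case leaf(Int)
    case pair(Int, Int)
    case empty
}

// Before: the locals and temporaries of every case belong to this one frame.
func visitInline(_ node: Node) -> Int {
    switch node {
    case .leaf(let x):
        var p = Payload()
        p.words.0 = x
        return p.words.0
    case .pair(let x, let y):
        var p = Payload()
        p.words.0 = x
        p.words.1 = y
        return p.words.0 + p.words.1
    case .empty:
        return 0
    }
}

// After: each case body is its own function, so its locals are only on the
// stack while that case is actually executing.
@inline(never)
func visitLeaf(_ x: Int) -> Int {
    var p = Payload()
    p.words.0 = x
    return p.words.0
}

@inline(never)
func visitPair(_ x: Int, _ y: Int) -> Int {
    var p = Payload()
    p.words.0 = x
    p.words.1 = y
    return p.words.0 + p.words.1
}

func visitSplit(_ node: Node) -> Int {
    switch node {
    case .leaf(let x): return visitLeaf(x)
    case .pair(let x, let y): return visitPair(x, y)
    case .empty: return 0
    }
}
```

Both versions compute the same results; only the frame that owns each `Payload` differs. The `@inline(never)` attribute is there so that an optimized build doesn't fold the helpers back into the caller.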
Here's a godbolt link where I explored some examples in more detail to try to nail down what was going on and mitigate it in the meantime:
This problem isn't specific to `case` statements either; rewriting it using `if`/`else` has the same characteristic. So it looks like every basic block may be getting its own separate space on the stack, even if their execution is mutually exclusive. When compiling with optimizations, they get collapsed appropriately, but without optimizations, we can see significant growth from just minor changes.
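As a rough back-of-the-envelope illustration of that hypothesis, the sketch below uses a hypothetical 64-byte `Value` and a three-way `if`/`else` chain. It compares the space the large locals would need if each branch got its own slot against what they would need if mutually exclusive branches shared one. These are only estimates of the locals themselves; the real frame also holds other temporaries, and the actual layout is best checked in the assembly (for example with `swiftc -Onone -emit-assembly`).

```swift
// Hypothetical large value type: eight 64-bit fields, 64 bytes in total.
struct Value {
    var fields: (Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64) =
        (0, 0, 0, 0, 0, 0, 0, 0)
}

// Three mutually exclusive branches, each with its own large local.
func process(_ selector: Int) -> Int64 {
    if selector == 0 {
        var v = Value()
        v.fields.0 = 1
        return v.fields.0
    } else if selector == 1 {
        var v = Value()
        v.fields.0 = 2
        return v.fields.0
    } else {
        var v = Value()
        v.fields.0 = 3
        return v.fields.0
    }
}

let branchCount = 3
let perBranch = MemoryLayout<Value>.size

// If every branch keeps a separate slot, the cost grows with the branch count.
let separateSlots = branchCount * perBranch

// If mutually exclusive branches could share one slot, the cost is the maximum.
let sharedSlot = perBranch

print("separate slots: \(separateSlots) bytes, shared slot: \(sharedSlot) bytes")
```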
Some changes I didn't include above make the problem even worse; for example, if `Value` is a `class` instead of a `struct`, the extra temporaries generated for retains/releases and to read/modify/write the value passed via `inout` to the function contribute even more to the stack usage. (We manage to get around this one already in SwiftProtobuf by wrapping whole functions in
One mitigation is to wrap the code in an immediately-executed closure to hoist its locals into a separate frame from the outer function, but if the code uses `throws`, this creates an awkward-looking "double-`try`" (and the `try` handling still requires a small amount of local stack space, so it still causes linear growth w.r.t. the number of cases).
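For readers who haven't seen the pattern, this is roughly what the closure workaround and its "double-`try`" look like. The names `DecodeError`, `decodeField`, and `handle` are made up for illustration, and the explicit closure signature is only there so the sketch also compiles on older toolchains.

```swift
struct DecodeError: Error {}

// Hypothetical throwing helper standing in for real per-field decoding work.
func decodeField(_ raw: Int) throws -> Int {
    guard raw >= 0 else { throw DecodeError() }
    return raw * 2
}

func handle(tag: Int, raw: Int) throws -> Int {
    switch tag {
    case 1:
        // The outer `try` covers the call to the closure; the inner `try`
        // covers the throwing call inside it. Locals declared in the closure
        // belong to the closure's frame rather than to `handle`'s.
        return try { () throws -> Int in
            let doubled = try decodeField(raw)
            return doubled + 1
        }()
    case 2:
        return try { () throws -> Int in
            let doubled = try decodeField(raw)
            return doubled - 1
        }()
    default:
        return 0
    }
}
```

Calling `try handle(tag: 1, raw: 3)` goes through the first closure, and an error thrown by `decodeField` propagates out through both `try` sites.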
We're looking separately at ways to improve the memory layout of SwiftProtobuf messages so that indirect storage is used more frequently when the message might be large, which would help tackle this from one direction. But in the general case, whether the problem is a few branches with large stack usage or many branches that are individually small but add up to a large amount of stack usage, is there anything preventing the compiler from using a more compact stack layout when possible, even for non-optimized builds, so that authors don't have to rewrite their code in strange ways to avoid this issue?
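To illustrate what "indirect storage" buys in terms of stack usage, here is a minimal copy-on-write sketch. It is an assumption about the general shape of the technique, not SwiftProtobuf's actual generated code; `InlineMessage`, `IndirectMessage`, and `Storage` are hypothetical names. The struct keeps value semantics, but each instance that lands on the stack is a single reference rather than the full set of fields.

```swift
// Fields stored inline: every local or temporary copy costs the full size.
struct InlineMessage {
    var a: Int64 = 0
    var b: Int64 = 0
    var c: Int64 = 0
    var d: Int64 = 0
}

// Fields stored behind a class reference: the struct itself is pointer-sized.
struct IndirectMessage {
    private final class Storage {
        var a: Int64 = 0
        var b: Int64 = 0
        var c: Int64 = 0
        var d: Int64 = 0

        init() {}

        init(copying other: Storage) {
            a = other.a
            b = other.b
            c = other.c
            d = other.d
        }
    }

    private var storage = Storage()

    // Copy the heap storage only if another value still shares it.
    private mutating func uniqueStorage() -> Storage {
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(copying: storage)
        }
        return storage
    }

    var a: Int64 {
        get { storage.a }
        set { uniqueStorage().a = newValue }
    }
}

print(MemoryLayout<InlineMessage>.size, MemoryLayout<IndirectMessage>.size)
```

The trade-off is a heap allocation and reference counting per message, which is why this only helps from one direction and doesn't address the stack layout question itself.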