There's an interesting discussion evolving about potential performance problems where large switch statements (or if-else conditionals) lead to excessive stack usage and even overflows in debug builds. It looks like this has led to some workarounds in swift-protobuf[1] and swift-syntax[2].
I'm searching around for more information on this topic. Please let me know if there are any open tasks on swift I can follow along to learn more. Please let me know if there are any more essays or documentation teaching product engineers how to defend against these situations. Thanks!
If anybody wants to play around with this, I've ported a test program from our crustacean friends who very much have the exact same bug with their LLVM backend:
Here's what the results look like for debug and release with Swift version 6.1 (swiftlang-6.1.0.109.103 clang-1700.0.13.2) distributed with Xcode 16.3 beta 2
It looks like since this has been posted nearly 8 years ago, outlining can materially improve the size of the stack frame that gets pushed, but that doesn't solve the overall asymptotic growth the OP is reporting.
It's funny this was posted as I was poking around sqlite and noticed that they use a script to generate a giant union-of-structs in heart of their virtual machine that amounts to the missing stack size optimization that needs to happen here at -Onone.
LLVM is generally prone to this sort of problem. The way we addressed it in Clang was using LLVM's lifetime intrinsics to enable better stack-slot re-use. We could look at enabling those in more situations.
In some (but not all, to be fair) of the reports, the problem is more pronounced in unoptimized builds, in which LLVM presumably does little or no stack slot optimization, so I wonder if there are some LLVM passes we could run in -Onone, or if we need to do some SIL-level preprocessing.
Just curious and want to understand (as compilers are magic to me)—will some functional language compiled with LLVM have same problem when using pattern matching, or it's different to using switch in Swift?
It’s a problem in any large function that isn’t using lifetime intrinsics. switch / pattern matching is just a particularly obvious code pattern in which allocations from different paths do not need to co-exist with each other.
I believe Clang does do some stack slot optimization even at -O0 for exactly this reason. I haven’t looked into it, though, maybe we need to enable a pass.
The LLVM inliner will automatically apply intrinsics to inlined allocas, so optimization doesn’t generally make this problem worse. But of course that doesn’t help for functions that get inlined in SIL, or if there are optimizations that are being lazy about making tight lifetimes (alloc_stack ranges in SIL), so our optimizer is still potentially implicated.
Yeah, my original issue was primarily a Debug issue, Release builds were fine. But for this type of more-hobby-than-commercial project, Debug is often the primary target.
It seems like everyone here correctly understands the scenario, but if anyone wants any context around the issue that I was discussing in the original toot, it was the printName function in CwlDemangle, literally an implementation of Swift's NodePrinter::print. The C++ original is here:
and you can see the Swift function here:
The biggest insult here is that the C++ ran fine but Swift would overflow, even though both functions are roughly equivalent in terms of structure.
I briefly had an implementation that refactored the immediate printing into deferred printing via a heap-based queue (and iterated until the queue was drained):
but needing to do this made me feel sad – mostly because it's admitting defeat to C++. Instead, I got the stack usage down from 36kB to about 12kB by aggressively factoring all the case legs out into child functions and even though Xcode does most of the work for you (select code, refactor -> extract to method), it's still 750+/750- lines of churn.
And at the end of that, the function still takes 12kB (i.e. a lot) for a function where no path through the function is longer than 3 steps. But 12kB is "enough to pass my test case" so
Oh, and we'll all quietly ignore the fact that SE-0262 remains stuck in permanent limbo? Excellent