When we were discussing the Charter of the SSWG last year, one area several people mentioned they were interested in collaborating on is improving the experience of deploying Swift applications and managing them at scale.
A critical part of that is ensuring that Swift has a good story for FFDC (First Failure Data Capture - please forgive me using IBM terminology, I'm sure other people have different/better names for the same thing). The idea is that when your app falls over, you should have useful diagnostics available immediately without having to recreate the problem a second time (which of course may be impossible for heisenbugs).
I'm also aware of two other libraries which tackle this problem in different ways.
(uses libunwind)
(uses some hackery (@_silgen_name to call into stdlib for demangling))
Would people be interested in collaborating on a stacktrace library through the SSWG? Does anyone have thoughts on the technical approaches taken by the libraries I mention above?
It would be fantastic for the community. Especially if it will also work on iOS and macOS, as nowadays you basically have to sell out your customers when you use any of the "free" solutions out there.
It's not clear to me what the best/recommended path is on Darwin. libunwind ships with macOS, but there is also CoreSymbolication.framework. Now that we are ABI stable on Darwin, what is the official way?
First, thanks @IanPartridge for starting this discussion. I think we all agree that the current situation is not optimal and we are very eager to do something about it.
We have been collecting some information about how other languages with a similar scope to Swift handle this today. The most prominent ones are probably C++, Go and Rust. They all have support for retrieving backtraces, with varying degrees of options and manual work required. Go and Rust both have built-in support for printing backtraces. Go always does it with no configuration required, while Rust only prints the backtrace when the environment variable RUST_BACKTRACE is set to "1". Rust also by default strips debug symbols from release builds (just like Swift), so building with "-g" is required in release mode. Go does not differentiate between debug and release builds and always includes debug symbols. Both languages also allow users to catch and recover from panics, in which case a backtrace must be printed manually by the user. In Go this can be done with PrintStack(), found in the runtime/debug package, and in Rust the backtrace can be retrieved using the backtrace crate (https://crates.io/crates/backtrace). C++ does not print a backtrace on crashes by default, but one can be retrieved manually using backtrace() and then demangled with abi::__cxa_demangle(), or more comfortably by using the Boost stacktrace module (boostorg/stacktrace on GitHub).
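For reference, the C route mentioned above is also reachable from Swift. The following is a minimal sketch, assuming a platform where `backtrace()` and `backtrace_symbols()` from execinfo.h are visible through the Glibc or Darwin module; `captureSymbols` and `maxFrames` are illustrative names, not part of any existing library.

```swift
#if canImport(Glibc)
import Glibc
#elseif canImport(Darwin)
import Darwin
#endif

/// Captures the current call stack with `backtrace()` and converts each
/// return address to a string with `backtrace_symbols()`.
/// (`captureSymbols` is a hypothetical helper for illustration.)
func captureSymbols(maxFrames: Int = 64) -> [String] {
    var addresses = [UnsafeMutableRawPointer?](repeating: nil, count: maxFrames)
    let count = backtrace(&addresses, Int32(maxFrames))
    guard count > 0, let symbols = backtrace_symbols(addresses, count) else {
        return []
    }
    // backtrace_symbols() returns a single malloc'd block that owns all strings.
    defer { free(symbols) }
    return (0..<Int(count)).map { index in
        symbols[index].map { String(cString: $0) } ?? "<unknown>"
    }
}

for line in captureSymbols() {
    print(line)
}
```

The strings come straight from the C library, so Swift frames typically appear with mangled names (or only as addresses), and there is no file or line information. This is not safe to call from a signal handler as written, because it allocates.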
So there is quite a bit of prior art here that could be used as inspiration. In Swift today there's no built-in way to trap panics, but signal handlers can be used instead (like Ian already does in his library). I think a good start would be to check our options (e.g. backtrace() vs libunwind) and see how well each of those works on the platforms we want to support. We also need to decide how much control we want to give the user: is it sufficient to just install a pre-defined hook and print the backtrace, or should users be able to install their own hooks and retrieve the backtrace as a proper data structure (like in Rust's backtrace crate)?
We think this would be a great candidate for an SSWG-hosted project and would love to join your effort in making this real, as this would dramatically improve the overall experience of developing server-side Swift code.
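To make the "pre-defined hook vs. user hook" question a bit more concrete, here is a rough sketch of what the second option could look like. Everything in it (`Frame`, `CapturedBacktrace`, `CrashReporter`) is hypothetical and only illustrates the shape of such an API; it deliberately leaves out how the frames are captured and how the hook gets triggered.

```swift
/// One stack frame; `symbol` stays nil until something symbolicates it.
struct Frame: Equatable {
    let address: UInt
    var symbol: String?
}

/// A backtrace as a plain value that callers can inspect, log or serialize.
struct CapturedBacktrace: Equatable {
    var frames: [Frame]

    /// One line per frame: index, hex address, symbol (or ??? if unknown).
    func rendered() -> String {
        frames.enumerated().map { index, frame in
            "#\(index) 0x\(String(frame.address, radix: 16)) \(frame.symbol ?? "???")"
        }.joined(separator: "\n")
    }
}

/// Holds the hook that runs when a backtrace is reported.
final class CrashReporter {
    typealias Hook = (CapturedBacktrace) -> Void
    private var hook: Hook

    /// The default hook is the "pre-defined" behaviour: just print the trace.
    init(hook: @escaping Hook = { trace in print(trace.rendered()) }) {
        self.hook = hook
    }

    /// Lets a user replace the default behaviour with their own.
    func install(_ newHook: @escaping Hook) {
        hook = newHook
    }

    func report(_ trace: CapturedBacktrace) {
        hook(trace)
    }
}

let sample = CapturedBacktrace(frames: [
    Frame(address: 0x10, symbol: "main"),
    Frame(address: 0x2a, symbol: nil),
])
CrashReporter().report(sample)
```

With a value type like this, a framework could ship the printing default while an application forwards the same data to its logging system instead.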
On Apple platforms, we rely on the system crash tracer, which collects a crash report once a process crashes and handles symbolication of the backtrace. It might be interesting to consider a similar out-of-process monitor-based approach for the server, since in-process signal handlers might fail to capture some forms of failure (particularly SIGKILLs) and could interfere with an app's own signal handlers.
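As a small illustration of the out-of-process idea, the sketch below uses Foundation's `Process` to supervise a child and report how it ended. It assumes a POSIX system with `/bin/sh`; the shell command stands in for a real server binary, and `ChildOutcome` and `monitor` are hypothetical names. It does not collect a backtrace, it only shows that the parent can observe a SIGKILL that no in-process handler would ever see.

```swift
import Foundation

/// What the monitor learned about how the child ended.
enum ChildOutcome: Equatable {
    case exited(code: Int32)
    case killedBySignal(Int32)
}

/// Launches `executable` as a child process, waits for it to finish,
/// and reports whether it exited normally or died from a signal.
func monitor(executable: String, arguments: [String]) throws -> ChildOutcome {
    let child = Process()
    child.executableURL = URL(fileURLWithPath: executable)
    child.arguments = arguments
    try child.run()
    child.waitUntilExit()
    if child.terminationReason == .uncaughtSignal {
        // For signal deaths, terminationStatus carries the signal number.
        return .killedBySignal(child.terminationStatus)
    }
    return .exited(code: child.terminationStatus)
}

// Stand-in for a crashing server: a shell that sends SIGKILL to itself.
let outcome = try monitor(executable: "/bin/sh", arguments: ["-c", "kill -KILL $$"])
print(outcome)
```

A real monitor would do its symbolication and report collection at the point where this sketch just returns the outcome.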
We've also been looking into addressing shortcomings of backtrace symbolication itself in Swift; improvements there would likely benefit both Apple and server platforms. Traditional backtrace libraries which merely walk a callstack and rely on the symbol table for symbolication are limited in how well they can deal with inline frames. Something that uses DWARF debug info to symbolicate backtraces could give a more accurate account of inlined functions. On Apple platforms, the inessential debug info is separated from the binary so that customer machines don't need to download it, but the developer can use the debug info to symbolicate crash reports on their end; this separation of concerns may be less important on the server, though.
On a related note, Swift uses trap instructions for safety checks, and although these instructions are uniqued for each trap reason, and the instructions are associated with source locations for the trap reason in DWARF, there's no more detailed accounting of the reason for the trap. We've been discussing ways we might be able to record richer messages for these traps in debug info as well.
It also abstracts away the backend implementation so we could experiment with both backtrace() and libunwind and leave the door open to supporting other platforms in future.
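One way to picture such an abstraction is a small protocol that only asks a backend for raw return addresses. This is a sketch under assumptions, not the API of any existing library: `UnwindBackend`, `ExecinfoBackend`, `FoundationBackend` and `describeStack` are made-up names, and a libunwind backend is left out because it would need a C module that is not available here. Foundation's `Thread.callStackReturnAddresses` is used as the second backend purely to show that callers do not change when the backend does.

```swift
import Foundation
#if canImport(Glibc)
import Glibc
#elseif canImport(Darwin)
import Darwin
#endif

/// Hypothetical abstraction: a backend only has to produce return addresses.
protocol UnwindBackend {
    func returnAddresses(maxFrames: Int) -> [UInt]
}

/// Backend built on `backtrace()` from execinfo.h.
struct ExecinfoBackend: UnwindBackend {
    func returnAddresses(maxFrames: Int) -> [UInt] {
        var buffer = [UnsafeMutableRawPointer?](repeating: nil, count: maxFrames)
        let count = Int(backtrace(&buffer, Int32(maxFrames)))
        // Keep only the slots that backtrace() actually filled in.
        return buffer.prefix(max(count, 0)).compactMap { pointer in
            pointer.map { UInt(bitPattern: $0) }
        }
    }
}

/// Backend built on Foundation's own call stack accessor.
struct FoundationBackend: UnwindBackend {
    func returnAddresses(maxFrames: Int) -> [UInt] {
        Thread.callStackReturnAddresses.prefix(maxFrames).map { $0.uintValue }
    }
}

/// Caller code depends only on the protocol, not on a concrete backend.
func describeStack(using backend: UnwindBackend, maxFrames: Int = 32) -> [String] {
    backend.returnAddresses(maxFrames: maxFrames).map { "0x" + String($0, radix: 16) }
}

print(describeStack(using: ExecinfoBackend()))
print(describeStack(using: FoundationBackend()))
```

Symbolication and demangling could then be layered on top of the addresses independently of which unwinder produced them.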
The question of hooking the backtrace generator up to, for example, a SIGILL handler is a separate concern in my view.
Another separate question is about the viability of running the Swift demangler in-process, including during a crash.
The demangler is already in the Swift runtime, so using it to pretty up backtraces should not be a problem. The issues I raised seem readily applicable to Linux as well as Darwin; in both environments, a traditional backtrace is going to miss out on inline frames, and having reason metadata for traps would allow crash reports to contain more descriptive and actionable information.
Exposing that makes a lot of sense, especially for in-process backtraces and things like that. My main concern would be people trying to parse the demangler output, when it isn't really designed to be a stable output format, but that's an existing problem with things like String(describing: T.self) that already generate demangled strings.
Excellent! Very glad that this does not seem to be too controversial.
The second part of the thread is where to get the backtraces from. We could invest in getting the information out of DWARF debug info if we think that'd be the way to go. We'll need to explore it a bit, but with a few hints here and there I hope we'd be able to pull it off. Or, as an MVP, we could start out with the simple backtrace() and improve over time...
I agree that installing the signal handler may not necessarily be part of the same discussion, but the library could perhaps provide either a small function or a pattern for runtimes (i.e. HTTP frameworks like Kitura / Vapor) so that they could install those handlers for their users, and end-users would not have to care "how" they got the better traces. We are also in a good position to collaborate with developers/users of the potential "nice backtraces library", so overall I'm quite optimistic here.
Hope to have more information once I'm back from traveling after WWDC.
What might a stdlib demangle look like? The existing API is String -> String but possibly we could match the existing print() and debugPrint() APIs which have a pair of functions each:
    // demangle to stdout
    public func demangle(_ mangledName: String)

    // demangle to the given output stream
    public func demangle<TargetStream>(
        _ mangledName: String,
        to target: inout TargetStream
    ) where TargetStream : TextOutputStream
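Until something like this exists in the stdlib, the pair can be approximated with the `@_silgen_name` hackery mentioned earlier in the thread. The sketch below is exactly that, a hack: it binds to the runtime's `swift_demangle` entry point through an underscored attribute, which is not a supported or stable interface, and `_runtimeDemangle` is just a local name for the binding. The fall-through behaviour for non-Swift symbols (writing the input unchanged) is a choice made here for illustration, not part of any proposal.

```swift
import Foundation

// Unofficial binding to the runtime's `swift_demangle` entry point.
// Passing nil for the output buffer asks the runtime to malloc the result.
@_silgen_name("swift_demangle")
func _runtimeDemangle(
    _ mangledName: UnsafePointer<CChar>?,
    _ mangledNameLength: UInt,
    _ outputBuffer: UnsafeMutablePointer<CChar>?,
    _ outputBufferSize: UnsafeMutablePointer<UInt>?,
    _ flags: UInt32
) -> UnsafeMutablePointer<CChar>?

/// Demangles `mangledName` into `target`.
/// If the runtime does not recognise the name, the input is written unchanged.
func demangle<TargetStream: TextOutputStream>(
    _ mangledName: String,
    to target: inout TargetStream
) {
    let length = UInt(mangledName.utf8.count)
    let result = mangledName.withCString { _runtimeDemangle($0, length, nil, nil, 0) }
    guard let demangled = result else {
        target.write(mangledName)
        return
    }
    defer { free(demangled) }  // the runtime allocated this buffer with malloc
    target.write(String(cString: demangled))
}

/// Demangles `mangledName` and prints the result to standard output.
func demangle(_ mangledName: String) {
    var output = ""
    demangle(mangledName, to: &output)
    print(output)
}

demangle("$s4main3fooyyF")
```

Writing into a `TextOutputStream` means the same function can target a `String`, standard error, or a log sink without an intermediate copy at the call site.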
Another benefit of using a separate process for crash reporting would be that the crash handling process doesn't need to live in an austere runtime environment because of a possibly corrupt host process. I was recently talking to some engineers about their work on the crash handler for Clang; they had struggled with the limitations of in-process handling for a while, but ultimately switched to forking a supervisor process, and that's what allowed Clang to report not only a simple backtrace but also collect inputs from the filesystem in order to bundle up inputs to reproduce the crash. That specific case might not be of much relevance to servers, but I can imagine servers wanting to be able to collect more interesting information from their environment, such as logs, and bundling them into rich crash reports, and that becomes tricky if you have to work from an arbitrarily-corrupted process state.
Hi all, here's an update on my progress with backtraces.
I started off by looking at the various options we have:
backtrace() for unwinding and backtrace_symbols() for symbolication. This is the approach currently used in the swift-server/swift-backtrace repository on GitHub, and while it does successfully unwind, it does not fully symbolicate, because backtrace_symbols() does not read DWARF debug info. addr2line can be used post-mortem to symbolicate.
Then I thought "Well if addr2line is symbolicating, can't we just do whatever it is doing internally?" Yes we can, and https://oroboro.com/printing-stack-traces-file-line/ explains nicely how to do it. Basically you can use libbfd which is part of GNU binutils. The problem with this is that libbfd is GPL licensed, which makes it unusable for SSWG purposes.
Next I looked at libdw. This is part of elfutils and again looks like it ticks the boxes. It is used successfully by the Haskell runtime for stacktraces and symbolication, and is also an unwind/symbolicate backend for Linux perf. It doesn't look easy to work with though from a quick look at the API, and the licensing is LGPLv3+ which may be problematic. It may well be worth investing more time looking at it though.
Next I looked at eu-stack. This is part of elfutils again and is a command-line tool that can print stacktraces of any process. It uses libdw internally. We could run it by catching SIGILL, calling fork() then execve()ing eu-stack and capturing the output. I'm not really a fan of these out of process options though, and it would require users to install the elfutils package.
libcwd - seems aimed at C++ and I couldn't spot an easy-to-use API. Didn't spend much time looking at this one.
Lastly I looked at libbacktrace. It's used as GCC's unwinder and is designed to run in and out of process. It's liberally licensed. It has a very simple (read: idiot-proof) API and is not much code in and of itself.
I decided to try and get libbacktrace working. I vendored it into an SPM target (many hacks were done, as a PoC) and used its backtrace_print() API. Here are my first results:
I think this could be a viable backend for our backtrace library. It requires no additional system packages to be installed, it is quick to build (just 13 C files), it can be vendored inside an SPM package, and it seems to give good stacktraces including function names, source files and line numbers.
I was using its simplest backtrace_print() API - there are others which give more control and would enable us to feed the symbolicated function names to the Swift demangler. Together it could be quite nice.
Very good stuff indeed! This combined with the demangling proposal gets us a lot closer to where we want to be. What are those ??? entries, btw? Any way to get rid of that?