Strategies for Debugging Server-Side Swift in Production

I'm curious what strategies people are using to debug live server-side Swift code in production, or what is possible with a bit of work.

For example, is there a way to:

  • trace the performance of live server-side code?
  • print the current stack traces of all running threads in the server?
  • capture crash logs when an application crashes?
  • log a stack trace when a Swift Error is thrown?

Any other strategies that come to mind would also be helpful.

Thanks


In production, server-side Swift appears to the OS just like any other native binary, so the same debugging strategies apply. For performance tracing, perf is the main tool for the job: you can use it to generate flame graphs and other useful diagnostics, and it also works as a replacement for the macOS sample command for getting call stacks.

Capturing crash logs is usually the job of your runtime environment. Some process is responsible for starting and monitoring your service, whether that's the Docker daemon or your init process (such as systemd), and that process should also be responsible for storing your logs and delivering them to a log aggregation system. Exactly what your solution should look like is hard to say because there are so many different choices, but this is a fairly standard log aggregation setup.

Your last point, logging a stack trace on a Swift Error, is the most important one. This is not something you can do in general in Swift. Error does not capture a stack trace: it's an extremely cheap error propagation mechanism, not much more complex at runtime than Go's (result, error) return pattern. This is great for performance but limiting in terms of metadata, and for now there is no general solution to the problem of capturing stack traces on Error. If the Error is one your own code emits, you can always emit an appropriate error log that contains a stack; if it is not, there is little more you can do at this time.
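As an illustration of the "errors your own code emits" case, here is a minimal sketch that wraps an error together with the stack at the point of creation. It assumes Foundation's Thread.callStackSymbols is available (it is on Darwin, and swift-corelibs-foundation provides it on Linux). TracedError, ConfigError and parsePort are hypothetical names made up for this example, not part of any library.

```swift
import Foundation

// Hypothetical wrapper (not from any library): it records the call stack at
// the point where our own code creates the error, because Swift's Error does
// not capture one by itself.
struct TracedError: Error, CustomStringConvertible {
    let underlying: Error
    let stack: [String]

    init(_ underlying: Error) {
        self.underlying = underlying
        // Foundation API. Depending on how the binary was built, frames may be
        // mangled names or raw addresses; pipe the log through `swift demangle`.
        self.stack = Thread.callStackSymbols
    }

    var description: String {
        "\(underlying)\n" + stack.joined(separator: "\n")
    }
}

// Hypothetical domain error and function, for illustration only.
enum ConfigError: Error {
    case invalidPort(String)
}

func parsePort(_ text: String) throws -> Int {
    guard let port = Int(text) else {
        // Capture the stack where our code detects the failure.
        throw TracedError(ConfigError.invalidPort(text))
    }
    return port
}

do {
    _ = try parsePort("eighty")
} catch let error as TracedError {
    // In a real server this would go to your logger at error level.
    print("error: \(error)")
} catch {
    print("untraced error: \(error)")
}
```

Only errors your code constructs get a stack this way; errors thrown by libraries you call still arrive without one, which is the limitation described above. Capturing and symbolicating a stack also costs far more than throwing a plain Error, so this is best reserved for genuinely unexpected failures.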


Thank you very much for the thorough and quick explanation. This gives me a lot to start exploring. One follow-up question, though: you mentioned "you can always emit an appropriate error log that contains a stack". Is there a function in Swift that provides the current stack in a Linux/Docker environment? Is it the one described in the Apple Developer Documentation?

What Cory says, plus a random collection of other tips:

  • run production code in release mode: swift build -c release

  • until Swift 5.2 lands, always pass -Xswiftc -g to get debugging symbols; otherwise you won't get symbolicated stack traces

  • if you run --privileged (or --security-opt seccomp=unconfined) containers, or run in VMs or even on bare metal, you can run your binary with

    lldb --batch -o run -k "image list" -k "register read" -k "bt all" -k "exit 134" ./my-program
    

    instead of just ./my-program to get something akin to a 'crash report' when it crashes.

  • if you don't have --privileged (or --security-opt seccomp=unconfined) containers (meaning you won't be able to use lldb), or you don't want to use lldb, I'd advise using a library like swift-backtrace to get stack traces on crash

  • for best performance in Swift 5.2 (once released), pass -Xswiftc -cross-module-optimization (this won't work in Swift versions below 5.2)

  • perf output, stack traces, etc. often appear "mangled" (lots of stuff looking like 3NIO14CircularBufferV5IndexV); you can use swift demangle to turn that into something readable. Just run cat file-containing-mangled-stuff | swift demangle and the output will be much more readable

  • make use of the sanitizers; before running code in production, do the following:

    • run your test suite with TSan (thread sanitizer): swift test --sanitize=thread
    • run your test suite with ASan (address sanitizer): swift test --sanitize=address and swift test --sanitize=address -c release -Xswiftc -enable-testing
    • it's also very wise to compile a binary using swift build --sanitize=thread and use it in your testing. The binary will run slower and is not suitable for production, but you'll catch many threading issues before you deploy your software. Threading issues are often really hard to debug and reproduce and cause seemingly random problems; TSan helps catch them early
  • make use of logging and metrics
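For the swift-backtrace suggestion, a sketch of the Package.swift wiring might look like the following. It assumes the library's product is named Backtrace and that the executable target (hypothetically called MyServer here) calls Backtrace.install() at the top of its main.swift, as the library's README describes. The version requirement is an assumption; pin a release you have tested.

```swift
// swift-tools-version:5.1
import PackageDescription

let package = Package(
    name: "MyServer",  // hypothetical package name
    dependencies: [
        // Version requirement is an assumption; pin a release you have tested.
        .package(url: "https://github.com/swift-server/swift-backtrace.git", from: "1.0.0"),
    ],
    targets: [
        // main.swift in this target should start with:
        //     import Backtrace
        //     Backtrace.install()
        .target(name: "MyServer", dependencies: ["Backtrace"]),
    ]
)
```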
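To see what the TSan advice buys you, here is a small, hypothetical program. RacyCounter does an unsynchronized read-modify-write from many threads at once, which a binary built with swift build --sanitize=thread (or a test run with swift test --sanitize=thread) should report as a data race, while LockedCounter serializes every access through an NSLock. The type and function names are illustrative only.

```swift
import Dispatch
import Foundation

// Deliberately racy: concurrent threads do `value += 1` with no
// synchronization, so updates can be lost. TSan should flag this.
final class RacyCounter {
    var value = 0
    func increment() { value += 1 }
}

// The same counter with every read and write guarded by a lock.
final class LockedCounter {
    private let lock = NSLock()
    private var _value = 0

    func increment() {
        lock.lock()
        defer { lock.unlock() }
        _value += 1
    }

    var value: Int {
        lock.lock()
        defer { lock.unlock() }
        return _value
    }
}

// Runs `body` `iterations` times, spread across multiple threads.
func hammer(iterations: Int, _ body: () -> Void) {
    DispatchQueue.concurrentPerform(iterations: iterations) { _ in body() }
}

let racy = RacyCounter()
hammer(iterations: 10_000) { racy.increment() }
print("racy:", racy.value)      // may or may not be 10000

let locked = LockedCounter()
hammer(iterations: 10_000) { locked.increment() }
print("locked:", locked.value)  // always 10000
```

The racy count may come out below 10,000, or may happen to be exact on a given run, which is why such bugs slip through ordinary testing; TSan should flag the unsynchronized access even on runs where the count happens to be correct.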

The good news is that creating a deployment guide is one of the SSWG's focus areas for 2020.


Thank you so much for this cheat sheet!