Debugging crashes in deployed swift apps

Hi,

I was wondering what the best practice for debugging deployed swift apps is. Especially if containerized and running in a Kubernetes cluster. Specifically, I have a server-side swift app that crashes but I don't see the crash detail (error message, stack trace) in kubectl logs -p POD. I assume this is because, of course, the deployed version is a release build.

If you can point me to some info (blog post, documentation, etc) or at least throw some keywords in the message box, would be greatly appreciated :)

Cheers
Tobias

1 Like

I'm glad you ask this here! I have absolutely not much time right now, so I'll give you a braindump of what should work.

So first of all, Swift applications should run in release mode (as you do) because otherwise they're super slow. And btw, in debug mode you also don't always get stack traces, for example on overflow.

Ok, how do we get stacktraces then?

Step 1: Debug Info

We need debug information in the binary, to get that we should compile with

swift build -c release -Xswiftc -g

done.

Step 2: Let's print stack traces

To do that, we need to do something dodgy, we'll install a signal handler that will print us the backtrace :slight_smile:. How do we do that? Well, first, we'll need access to the backtrace and backtrace_symbols functions from glibc.

Step 2.1 Getting access to backtrace and backtrace_symbols

The easiest is to create a new SwiftPM target with the following source layout:

Sources/CBacktrace
Sources/CBacktrace/c_backtrace.c  # this file can be empty 
Sources/CBacktrace/include/CBacktrace.h

So the .c file can be empty, and CBacktrace.h just needs to contain:

#include <execinfo.h>

Step 2.2

In Package.swift, just create the target and depend on it as usually. In your main.swift do import CBacktrace

Step 3 Let's make the stack traces printed on SIGILL

Okay, this is murky, but there you go. If you put the following code in your main.swift you should be getting stack traces in release mode.

import Glibc
import CBacktrace

private func setupHandler(signal: Int32, handler: @escaping @convention(c) (CInt) -> Void) {
    typealias sigaction_t = sigaction
    let sa_flags = CInt(SA_NODEFER) | CInt(bitPattern: CUnsignedInt(SA_RESETHAND))
    var sa = sigaction_t(__sigaction_handler: unsafeBitCast(handler, to: sigaction.__Unnamed_union___sigaction_handler.self),
						 sa_mask: sigset_t(),
						 sa_flags: sa_flags,
						 sa_restorer: nil)
    withUnsafePointer(to: &sa) { ptr -> Void in
        sigaction(signal, ptr, nil)
    }
}

setupHandler(signal: SIGILL) { _ in
	// this is all undefined behaviour, not allowed to malloc or call backtrace here...

	let maxFrames = 50
	let stackSymbols: UnsafeMutableBufferPointer<UnsafeMutableRawPointer?> = .allocate(capacity: maxFrames)
	stackSymbols.initialize(repeating: nil)
	let howMany = backtrace(stackSymbols.baseAddress!, CInt(maxFrames))
	let ptr = backtrace_symbols(stackSymbols.baseAddress!, howMany)
	let realAddresses = Array(UnsafeBufferPointer(start: ptr, count: Int(howMany))).compactMap { $0 }
	realAddresses.forEach {
		print(String(cString: $0))
	}
}

So as I said, this is very undefined behaviour but we don't really mind as we're crashing anyway. What we're doing here is installing a one-off signal handler for SIGILL, that's the signal that will be triggered when Swift crashes. And in that signal handler, we call backtrace and backtrace_symbols and print the result.

Now, when our binary crashes, we should see something like:

root@f56a02308177:/tmp/crashtest# swift run -c release -Xswiftc -g
Fatal error: file /tmp/crashtest/Sources/crashtest/main.swift, line 33
.build/x86_64-unknown-linux/release/crashtest(+0x1a71) [0x55dac5f2ea71]
.build/x86_64-unknown-linux/release/crashtest(+0x1429) [0x55dac5f2e429]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7ff45b033890]
/usr/lib/swift/linux/libswiftCore.so(+0x321aba) [0x7ff45aa1faba]
/usr/lib/swift/linux/libswiftCore.so(+0x1485e9) [0x7ff45a8465e9]
.build/x86_64-unknown-linux/release/crashtest(+0x14b8) [0x55dac5f2e4b8]
.build/x86_64-unknown-linux/release/crashtest(+0x1445) [0x55dac5f2e445]
.build/x86_64-unknown-linux/release/crashtest(+0x1480) [0x55dac5f2e480]
.build/x86_64-unknown-linux/release/crashtest(+0x1465) [0x55dac5f2e465]
.build/x86_64-unknown-linux/release/crashtest(+0x140b) [0x55dac5f2e40b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7ff4599efb97]
.build/x86_64-unknown-linux/release/crashtest(+0x115a) [0x55dac5f2e15a]
Illegal instruction

Already much better, looks like a stacktrace!

Step 4

If we copy the stack trace output, so basically this part

.build/x86_64-unknown-linux/release/crashtest(+0x1a71) [0x55dac5f2ea71]
.build/x86_64-unknown-linux/release/crashtest(+0x1429) [0x55dac5f2e429]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7ff45b033890]
/usr/lib/swift/linux/libswiftCore.so(+0x321aba) [0x7ff45aa1faba]
/usr/lib/swift/linux/libswiftCore.so(+0x1485e9) [0x7ff45a8465e9]
.build/x86_64-unknown-linux/release/crashtest(+0x14b8) [0x55dac5f2e4b8]
.build/x86_64-unknown-linux/release/crashtest(+0x1445) [0x55dac5f2e445]
.build/x86_64-unknown-linux/release/crashtest(+0x1480) [0x55dac5f2e480]
.build/x86_64-unknown-linux/release/crashtest(+0x1465) [0x55dac5f2e465]
.build/x86_64-unknown-linux/release/crashtest(+0x140b) [0x55dac5f2e40b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7ff4599efb97]
.build/x86_64-unknown-linux/release/crashtest(+0x115a) [0x55dac5f2e15a]
Illegal instruction

into a file called /tmp/stacktrace we can run this beautiful bash script:

cat /tmp/stacktrace | tr '()' '  ' | while read bin addr junk; do addr2line -e "$bin" -a "$addr" -ipf; done | swift demangle

aaand

0x0000000000001a71: function signature specialization <Arg[0] = Dead> of closure #1 (Swift.Int32) -> () in crashtest at /tmp/crashtest/Sources/crashtest/main.swift:22
0x0000000000001429: @objc closure #1 (Swift.Int32) -> () in crashtest at /tmp/crashtest/<compiler-generated>:?
0x0000000000012890: __restore_rt at ??:?
0x0000000000321aba: function signature specialization <Arg[0] = Exploded, Arg[1] = Exploded> of Swift._assertionFailure(_: Swift.StaticString, _: Swift.String, file: Swift.StaticString, line: Swift.UInt, flags: Swift.UInt32) -> Swift.Never at crtstuff.c:?
0x00000000001485e9: Swift._assertionFailure(_: Swift.StaticString, _: Swift.String, file: Swift.StaticString, line: Swift.UInt, flags: Swift.UInt32) -> Swift.Never at ??:?
0x00000000000014b8: merged crashtest.recurse2(n: Swift.Int) -> () at main.swift.o:?
0x0000000000001445: crashtest.recurse2(n: Swift.Int) -> () at ??:?
0x0000000000001480: merged crashtest.recurse2(n: Swift.Int) -> () at main.swift.o:?
0x0000000000001465: crashtest.recurse(n: Swift.Int) -> () at ??:?
0x000000000000140b: main at /tmp/crashtest/Sources/crashtest/main.swift:48
0x0000000000000000: ?? ??:0
0x000000000000115a: _start at ??:?

which is basically what we wanted, wohoo! I don't exactly know why addr2line (which is in the binutils package) can't find the file/line information but it's probably a hell of a lot better than no stacktraces at all :).

Appendix

Should this be so hard? NO, this should be fixed and high-quality stack traces should be printed.

Can we get higher quality stack traces? Yes! With the help of lldb, the symbolicate-linux-fatal script in the Swift repository can get you really high-quality stack traces. The problem: Right now it tries to parse the output that Swift programs only output in debug mode. But it would be not too hard to combine steps 1 to 3 with symbolicate-linux-fatal and make it be able to produce those stack traces from the output from step 3.

Sorry for typos etc, I need to run...

6 Likes

related https://bugs.swift.org/browse/SR-10439

Is there any reason we can't include backtrace() and backtrace_symbols() in the Glibc modulemap? Seems like that would simplify this quite a bit.

1 Like

Thank you Johannes for the detailed explanation! I’ll definitely try this. Happy Easter!:hatching_chick:

I've thrown this into a Swift package, it might be useful to people?

4 Likes

@IanPartridge this is cool, do you want to also pitch it so more folks find it?

Please don't claim that system headers are part of your package! That's what module map packages are for.

It will be wonderful if apps build with debug information on crash to output it stack trace by default. When they aren't attached to a debugger. Or to have a compile flag for this :)

1 Like

Thanks - fixed in https://github.com/ianpartridge/swift-backtrace/pull/2

Ok I don't know what is happening, but when I add Ian's package, swift build gets stuck with the line Updating https://github.com/vapor/validation.git. It compiles fine, without dependency to Ian's package. :weary:

Ah Ian's package declares // swift-tools-version:5.0, my app is still on 4.2 maybe that is an issue. Weird though that this manifests as a seemingly random hang during resolution...

I can add Swift 4.2 support too. Hang on... :slight_smile:

2 Likes

Thanks guys, this worked really well. One thing to note, if running the app in a container you need to enable tty otherwise the signal handler does not work.

One last question: I still don't see the fatalError() message in the logs. Is there some way to make swift output the message from a fatalError / precondition in release builds?

Hmm, the fatalError messages should be output in release mode:

$ cat test3.swift
fatalError("boo")
johannes:/tmp
$ jw-docker-swift-5.0 bash -c 'swiftc -O test3.swift && ./test3'
Fatal error: boo: file test3.swift, line 1
-ILL- johannes:/tmp

@t089 are you sure you're actually hitting a fatalError with a message and not maybe an integer overflow or smth?

Hm fair enough.

So the issue was that for some reason a dictionary hit the "duplicate key found" precondition. Before your nice tutorial, I was running the develop build of the app within lldb to see the crash report (:exploding_head:) and I remember seeing a "duplicate key found, this can happen when the key changes after insertion" error message there (paraphrased, didn't save the log :see_no_evil: ). Now, with your backtrace proposal I saw the nice backtrace in my logs when the crash occurred, but no error message. Maybe this error message is an assertion ...?!

Background

The crash is related to this dictionary

import Foundation 

struct LookupKey: Hashable {
        let regionId: Int
        let locale: Locale
        let domainId: Int
}
    
private var fetchedRegionNames: [LookupKey: String] = [:]

My suspicion is that Foundation.Locale does something fishy. So I replaced let locale: Locale now with let locale: String and so far the crash has not appeared again.

Here is the relevant backtrace of the crash:

0x00000000000f65af: function signature specialization <Arg[0] = Dead> of closure #1 (Swift.Int32) -> () in static Backtrace.Backtrace.install() -> () at /app/.build/checkouts/swift-backtrace.git-7189386639530628169/Sources/Backtrace/Backtrace.swift:12
0x00000000000f5e29: @objc closure #1 (Swift.Int32) -> () in static Backtrace.Backtrace.install() -> () at /app/<compiler-generated>:?
0x0000000000011390: __restore_rt at ??:?
0x00000000000e7203: static App.RegionNameService.LookupKey.__derived_struct_equals(App.RegionNameService.LookupKey, App.RegionNameService.LookupKey) -> Swift.Bool at /app/Sources/App/Services/RegionNameService.swift:?
 (inlined by) protocol witness for static Swift.Equatable.== infix(A, A) -> Swift.Bool in conformance App.RegionNameService.LookupKey : Swift.Equatable in App at /app/<compiler-generated>:?
 (inlined by) generic specialization <App.RegionNameService.LookupKey, App.RegionNameService.FetchedName> of (extension in Swift):Swift._NativeDictionaryBuffer< where A: Swift.Hashable>.unsafeAddNew(key: A, value: B) -> () at /app/<compiler-generated>:?
 (inlined by) function signature specialization <Arg[1] = Exploded> of generic specialization <App.RegionNameService.LookupKey, App.RegionNameService.FetchedName> of Swift._VariantDictionaryBuffer.nativeUpdateValue(_: B, forKey: A) -> B? at /app/<compiler-generated>:?
0x00000000000e75b0: generic specialization <App.RegionNameService.LookupKey, App.RegionNameService.FetchedName> of Swift._VariantDictionaryBuffer.nativeUpdateValue(_: B, forKey: A) -> B? at /app/<compiler-generated>:?
 (inlined by) generic specialization <App.RegionNameService.LookupKey, App.RegionNameService.FetchedName> of Swift.Dictionary.subscript.setter : (A) -> B? at /app/<compiler-generated>:?
 (inlined by) function signature specialization <Arg[1] = Exploded> of App.RegionNameService.fetchRegionName(regionId: Swift.Int, locale: Foundation.Locale, domainId: App.DomainId) -> NIO.EventLoopFuture<Swift.String?> at /app/Sources/App/Services/RegionNameService.swift:62
0x00000000000d12a2: App.RegionNameService.fetchRegionName(regionId: Swift.Int, locale: Foundation.Locale, domainId: App.DomainId) -> NIO.EventLoopFuture<Swift.String?> at /app/<compiler-generated>:?

Seems like it is not really a fatalError:

Still, I find it surprising that this crash does not output a single line to indicate sth went wrong...

So I think you're running Swift 4.2 maybe? Because if I try to repro your situation with my terrible program

struct Bad: Hashable {
    func hash(into: inout Hasher) {
        into.combine(Int.random(in: 0 ..< 100))
    }

    static func ==(_ lhs: Bad, _ rhs: Bad) -> Bool {
        return true
    }
}

var h: [Bad: ()] = [:]

for _ in 0..<20 {
    h[Bad()] = ()
}

and then run it, I get the following (nothing) for Swift 4.2.4:

-ILL- johannes:/tmp
$ jw-docker-swift-4.2  bash -c 'swiftc -O -o test test.swift && ./test'
bash: line 1:    13 Illegal instruction     ./test
-ILL- johannes:/tmp

but for Swift 5.0

$ jw-docker-swift-5.0  bash -c 'swiftc -O -o test test.swift && ./test'
Fatal error: Duplicate keys of type 'Bad' were found in a Dictionary.
This usually means either that the type violates Hashable's requirements, or
that members of such a dictionary were mutated after insertion.
Current stack trace:
0    libswiftCore.so                    0x00007f249c7db920 _swift_stdlib_reportFatalError + 69
1    libswiftCore.so                    0x00007f249c51266d <unavailable> + 1349229
2    libswiftCore.so                    0x00007f249c6fe6e8 <unavailable> + 3364584
3    libswiftCore.so                    0x00007f249c6047b1 <unavailable> + 2340785
4    test                               0x000056251f853196 <unavailable> + 8598
5    test                               0x000056251f852286 <unavailable> + 4742
6    libc.so.6                          0x00007f249b6baab0 __libc_start_main + 231
7    test                               0x000056251f851fca <unavailable> + 4042

I think your suspicion is absolutely right. Would you mind filing a bug with as much info as possible? Like what Locales are using putting in that dictionary etc.