[Pitch] Swift Backtracing API

mattie · January 25, 2023, 1:55pm

I love the idea of capturing the presence of a discontinuity. That's exactly what you might want. Honestly, even without that in the API, it would still produce more useful, if slightly dishonest, backtrace. The whole repeated frame thing was garbage. But, it was easy to do, and was immensely helpful in practice.

You might also want to consider removing the limit altogether and make that an implementation detail. Clients won't really have a way of knowing what a reasonable limit is anyways - they'll likely either just leave it the default or make it unlimited. Crash reports on Darwin have a much smaller, predefined limit and that works great for basically all cases except infinite recursion.

fclout · January 25, 2023, 5:28pm

Yes, fundamentally the problem is that Backtrace is full of unsafe pointers to (code) objects with a lifetime we usually can't guarantee. Strictly speaking, it's not even correct to assume that the return addresses currently in the backtrace are still all mapped, or mapped to what they were at the point the function was executing. I also wonder if there are use-after-free-like issues with grabbing a backtrace to symbolicate later with DWARF, since DWARF unwinding is Turing-complete...

Backtrace is currently structured as a precursor to SymbolicatedBacktrace. IMO, we could improve the situation by having both Backtrace and SymbolicatedBacktrace have a capture method and moving images off of Backtrace. I would say that it's true regardless of the details of turning a Backtrace into a SymbolicatedBacktrace.

One way I think that we could let you turn Backtraces into SymbolicatedBacktraces fairly safely would be to have something like:

withImageSnapshot { snapshot in
    let backtrace = callDeepAndReturnTrace()
    let symbolicated = snapshot.symbolicate(backtrace)
}

where we would have to ensure that snapshot does not escape, and do nothing for frames that don't lie in images captured by the snapshot.

grynspan · January 25, 2023, 5:54pm

There's some precedent in the standard library for this sort of tuple return (where one tuple element is a flag describing the other element):

String.decodeCString(_:as:repairingInvalidCodeUnits:) returns a (optional) tuple with a flag indicating whether repairs were made.
FixedWidthInteger.dividedReportingOverflow(by:) returns a tuple with a flag indicating whether overflow occurred.

al45tair · January 26, 2023, 9:17am

I don't think this is as big a problem as you might think. SymbolicatedBacktrace has to locate the image files on disk and read symbols from them there, because the debug information doesn't get mapped into address space on most systems. The only time we access information using addresses we've obtained while backtracing is during backtracing, at which point the call stack is still threaded so there's no way that things are getting unmapped.

I've also been pondering the images thing some more. For frame-pointer-based unwinding, we don't need the image list to do the unwind, but for anything fancier we do (or, at least, we need to find the images corresponding to frames in the call stack) because we need to locate the unwind information. So I think it's right that Backtrace should be able to capture images — but it should probably be optional and maybe there should be a separate function/method to capture them after the event. I think making it a lazy var might have been the wrong choice, because it hides when that happens and I think you're right that it's important that people at least know when they're doing that capture (if it's after the backtracing).

al45tair · February 6, 2023, 2:22pm

Update: I've updated the Gist with a few changes, notably:

The Backtrace.Frame type is now an enum, and in addition to various kinds of "normal" frame it supports omitted frames so that we can represent partial backtraces of various kinds.
There is now support in Backtrace for capturing frames up to a limit, optionally with a minimum number captured at the top of the stack. This is useful when dealing with cases where a deep recursion might have taken place.
The SymbolicatedBacktrace now has its own Frame struct, which facilitates the capture of virtual frames for cases where we can recover the necessary information.

grynspan · February 6, 2023, 8:03pm

This enum is quite complicated compared to the traditional "here's a bunch of void*s, go away" model. I appreciate where the added complexity comes from, but it might be worth seeing if it can be streamlined to fewer cases. Can any of the cases be represented as missing optional values (e.g. a discontinuity being n nil values)?

bbrk24 · February 6, 2023, 9:40pm

Reading through this pitch, I see two things:
1.

  // This will be `UInt32` or `UInt64` on current platforms.
  typealias Address

Isn't that just Int? From the developer docs:

On 32-bit platforms, Int is the same size as Int32, and on 64-bit platforms, Int is the same size as Int64.

I don't see any @frozen attributes in the draft. If this is intentional, that sounds particularly painful for Backtrace.Frame as every switch statement will need an @unknown default case.

I'm not familiar enough with backtracing to comment on the substance of the API, but these two things stuck out to me.

al45tair · February 7, 2023, 12:58pm

I think that would be undesirable. The entire point here is that if there's a 100,000 frame backtrace, we don't end up allocating space for 100,000 entries in the array (or indeed trying to demangle 100,000 strings and so on).

I've pondered a few times whether it might be better to have .programCounter(Address, async: Bool) instead of the .programCounter(Address) and .programCounterInAsync(Address). That might simplify things a little, I suppose.

al45tair · February 7, 2023, 1:00pm

UInt maybe, but yes. Perhaps I should just not have put the comment there.

Good point. I'll go through and think about where we need @frozen. Backtrace.Frame is an obvious place, certainly.

grynspan · February 7, 2023, 4:50pm

I wouldn't be opposed to that. Same for returnAddress(). It would simplify calling code that doesn't need to distinguish.

I'm not certain what value .truncated has as an enum case. Why does it need to be a distinct case? Why not just have a var isTruncated: Bool property on Backtrace?

Frame.adjustedProgramCounter might need to be optional since it has no meaningful value for some of the cases in the enum.

Sorry for the nitpicks! As always, just looking to make the best possible interface.

fclout · February 7, 2023, 6:26pm

Is UInt defined to be uintptr_t, or is it defined to be the size of a general-purpose register? That matters on platforms where those aren't the same, like the watchOS arm64_32. (That would also matter on weird platforms that have segmented addresses and such, but I don't think that Swift exists there yet.)

al45tair · February 9, 2023, 12:37pm

Interesting question; I assume UInt is the size of a general-purpose register, since defining it to be the size of an address would make no sense on arm64_32. What matters here is that Address will be a type suitable for holding an address. I've updated the comment to clarify that the expectation is that it will be something appropriate that conforms to FixedWidthInteger.

al45tair · February 9, 2023, 12:38pm

So, it turns out that we don't need the programCounterInAsync or returnAddressInAsync cases after all (those appeared because of some code I was working on that I'd since changed).

Karl · February 9, 2023, 1:17pm

Why not use UnsafeRawPointer? Why even define an address type at all?

al45tair · February 9, 2023, 2:00pm

We don't want people to dereference them. They might find their way into a totally unrelated process for one reason or another (e.g. you could save the data from the backtrace into a file and then have another program load it and symbolicate it later). Additionally in a threaded program, some addresses might actually not be valid any more at pretty much any point after they've been captured.

jayton · February 9, 2023, 2:01pm

Nevertheless, UInt is 32 bits on arm64_32.

al45tair · February 9, 2023, 2:02pm

Oh, interesting (I hadn't checked). I'm not sure that's the choice I would have made, but there we are.

Karl · February 9, 2023, 2:19pm

Well, a pointer doesn't mean it's safe to dereference.

But I think we should use the standard Swift types for addresses. If we ever did support systems with non-numeric addresses/address spaces, the pointer type would be adjusted appropriately I think. It is also how C APIs get imported, so if there are any contexts in which you can safely pass this data to a platform API, URP would be the most convenient type.

I'm not sure that giving people a less convenient type really helps with much. It's likely to be more of a mild annoyance IMO.

al45tair · February 9, 2023, 2:26pm

I think you're making the assumption that the process that you got the backtrace from has the same sized pointers as the one that you're looking at them in. That isn't an assumption I want to code into the API here, which is another reason (besides discouraging dereferencing) to use a separate type. (It might seem like a non-issue on Apple platforms, since we don't support 32-bit processes any more, but Linux and Windows very much do — even though it's unusual for a Linux box to be set up to do it.)

These really aren't pointers. System APIs that you're likely to use them with likely have made a similar choice to encode them as unsigned integers.

Karl · February 9, 2023, 2:44pm

So shouldn't it be a generic parameter rather than a fixed type? If it's a fixed type then it can only ever support backtraces with a single address type (i.e. the same size as the current process).

I just think it's a strange design choice. If you want to keep the values as a kind of opaque token, I'd suggest wrapping the value in a struct rather than exposing the actual integer.