[Pitch] Swift Backtracing API

al45tair · January 23, 2023, 4:29pm

So, if you have an async frame that calls a non-async frame, then sure, that's a return address. But if you have an async frame in the activation chain below that, the frame address is the resume address, which is an accurate program counter that points to the start of a function (which corresponds to the continuation of your Swift function after the await). Subtracting from that will not yield the expected results.

Similarly, program counter addresses captured from a signal handler or, as I say, captured by grabbing a thread context somehow, are going to be accurate values and not return addresses.

mattie · January 23, 2023, 4:37pm

Yeah, I've never been super-clear on all the different situations where an address needs to be adjusted. I have just spent a lot of time investigating symbolication issues where you have a wrong line/column, and it was due to adjustment.

I'm totally into this! It absolutely belongs here. I guess I just got tripped up by the comment, which explicitly lists just one specific situation.

fclout · January 23, 2023, 5:21pm

Glad to see this! What is Frame.adjustedProgramCounter? Is it, like, the address of a call instruction rather than the return address of the call?

al45tair · January 23, 2023, 5:33pm

Like that, but not quite.

In general it will point into the call instruction, rather than necessarily at the start of it, which is sufficient to make things like symbolication work correctly.

The reason it might not point at the call instruction is that some instruction sets are variable length; the worst example is obviously x86 (including 32 and 64-bit), where it's extremely difficult to parse the instruction stream backwards to find the call instruction that executed — indeed, an instruction could contain a large immediate value that itself looks like a valid call instruction, and there's really no way to know which thing it was that executed. A mischievous assembly language programmer, or a Sufficiently Advanced Compiler (TM), could even construct an instruction stream where both instructions were used!

As a result, the traditional strategy for this is just to subtract one(!) from the return address.
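To make the "subtract one" rule concrete, here is a minimal sketch. The `CapturedAddress` enum, its `adjusted` property, and the toy symbol table are hypothetical names invented for illustration; they are not the pitched API, and the addresses are made up.

```swift
typealias Address = UInt64

// Hypothetical classification of a captured address (not the pitched API).
enum CapturedAddress {
    case programCounter(Address)    // exact PC, e.g. from a signal or thread context
    case returnAddress(Address)     // points just past a call instruction
    case asyncResumePoint(Address)  // points at the start of a continuation function

    // Address to use for symbol lookup: only return addresses are adjusted.
    var adjusted: Address {
        switch self {
        case .returnAddress(let address):
            return address - 1      // lands somewhere inside the call instruction
        case .programCounter(let address), .asyncResumePoint(let address):
            return address          // already accurate; subtracting would be wrong
        }
    }
}

// Toy symbol table: (start address, name), sorted by start address.
let symbols: [(start: Address, name: String)] = [
    (0x1000, "caller"),
    (0x1040, "continuation"),
]

// Returns the name of the last symbol that starts at or before `address`.
func symbolName(for address: Address) -> String? {
    return symbols.last(where: { $0.start <= address })?.name
}

// `caller` ends with a call that never returns, so the return address 0x1040
// is also the first byte of the next function.
let afterCall = CapturedAddress.returnAddress(0x1040)
let resume = CapturedAddress.asyncResumePoint(0x1040)

print(symbolName(for: 0x1040) ?? "?")              // continuation (wrong for a return address)
print(symbolName(for: afterCall.adjusted) ?? "?")  // caller
print(symbolName(for: resume.adjusted) ?? "?")     // continuation
```

The same numeric address symbolicates differently depending on how it was captured, which is why the kind of address has to travel with the value.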

blangmuir · January 23, 2023, 5:50pm

    /// The base address of the image.
    public var baseAddress: Address

Every tool I've used shows the address range; would it make sense to include the upper bound as well? I don't feel strongly about this, just something I noticed.

It would be nice to be more explicit about whether backtraces can contain inline frames or not (obviously subject to having enough information to reconstruct them). If inline frames are possible, will they appear in the Backtrace or only the SymbolicatedBacktrace?

I'm not an expert here, but one thing I would expect is that tracing across multiple images comes with a potentially large cost per image to load debug info if it is separate from the executable/library image itself (loading object files or dsyms, maybe even retrieving dsyms from the network).

etcwilde · January 23, 2023, 6:32pm

The captured info looks mostly reasonable to me. I'm confused about the public API though. I can't think of another language that lets/makes folks insert their functions into a backtrace. Even in places where there is something of an "API", isn't that usually for JIT operations? So, just to clarify, getting backtraces won't require explicit programmer intervention, and we will see frames in places that look/behave like function calls (actual function calls, closure-closures, auto-closures, computed properties, etc.) without source changes, right? How does this behave around inlining? Will the displayed backtrace be like the frames listed by lldb, which can sometimes include inlined frames, or will those disappear?

Edit: I just re-read what capture does. That explains more. I can't help but wonder if folks are going to try to re-implement C++ exceptions with this.

al45tair · January 23, 2023, 7:21pm

My original designs for this did just that, however it turns out that it's not quite as simple as it sounds and it could even be a little misleading. Images aren't necessarily linearly mapped into address space; an obvious example is that on Darwin, images that are in the shared cache have been somewhat rearranged (which is why the Crash Reporter logs actually show the extents of the __TEXT segment, rather than the whole image as you might imagine).

I will ponder whether it might make sense to provide the upper boundary of the text segment, since that might conceivably be useful.
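As a rough illustration of why an upper bound could be useful, the sketch below maps an address to the image whose text segment contains it. `ImageRecord` and `endOfText` are hypothetical names, the addresses are made up, and the sketch assumes the text segment begins at the image's base address.

```swift
typealias Address = UInt64

// Hypothetical image record; `endOfText` is the upper bound being pondered
// above and is an assumption, not part of the pitch.
struct ImageRecord {
    var name: String
    var baseAddress: Address
    var endOfText: Address   // one past the last byte of the text segment
}

// Finds the image whose text segment contains `address`, if any.
func image(containing address: Address, in images: [ImageRecord]) -> ImageRecord? {
    return images.first { $0.baseAddress <= address && address < $0.endOfText }
}

// Made-up addresses for illustration.
let images = [
    ImageRecord(name: "app", baseAddress: 0x1_0000_0000, endOfText: 0x1_0000_4000),
    ImageRecord(name: "libExample", baseAddress: 0x1_8000_0000, endOfText: 0x1_8002_0000),
]

let hit = image(containing: 0x1_8000_1234, in: images)
let miss = image(containing: 0x1_0000_4000, in: images)  // just past the text segment
print(hit?.name ?? "no image", miss?.name ?? "no image")
```

With only a base address, the second lookup would have to guess that the address belongs to the nearest preceding image.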

al45tair · January 23, 2023, 7:29pm

I'm glad because I was really confused for a moment there. At some point you'll have to explain to me what wacky behaviour you thought I was proposing.

@mattie did point out that there might be an issue surrounding inlined frames. Right now, you'll get one frame per program counter/unwind step, but it sounds like it might be possible to do better than that in some cases — I need to investigate further on that front I think.

etcwilde · January 23, 2023, 9:30pm

al45tair wrote: "I'm glad because I was really confused for a moment there. At some point you'll have to explain to me what wacky behaviour you thought I was proposing."

For a second, I thought you were trying to productize the "stacktraces" in the StdlibUnittests, where, instead of walking the stack, it keeps its own stack of "SourceLoc" objects, and then have people do the equivalent of SourceLocStack.withCurrentLoc()

github.com/apple/swift/blob/eecde02dde62424bc8a93de09f30741cf6f946e8/stdlib/private/StdlibUnittest/StdlibUnittest.swift#L51-L111


      
    public struct SourceLoc {
      public let file: String
      public let line: UInt
      public let comment: String?

      public init(_ file: String, _ line: UInt, comment: String? = nil) {
        self.file = file
        self.line = line
        self.comment = comment
      }

      public func withCurrentLoc(
        _ file: String = #file, line: UInt = #line
      ) -> SourceLocStack {
        return SourceLocStack(self).with(SourceLoc(file, line))
      }
    }

    public struct SourceLocStack {
      let locs: [SourceLoc]

This file has been truncated.

With regard to inlining, if it's effectively the same as stack traces from C or C++, that's good enough for me. Technically, the wacky behavior above would "work" in the inlined case, but also, no, please don't.

xwu · January 24, 2023, 5:29am

Just one drive-by nit: since it’s non-mutating, it appears the method should be Backtrace.symbolicated() rather than symbolicate().

al45tair · January 24, 2023, 1:08pm

I've updated the Gist to reflect some of the feedback above; thank you to everyone who has contributed so far.

fclout · January 24, 2023, 6:29pm

It's surprising to me that the Image array belongs to Backtrace instead of SymbolicatedBacktrace. It's not obvious how to use Image to do anything with Frame yourself (especially given that there's only a baseAddress and no upper bound), while I find that the relationship to SymbolicatedBacktrace is evident as it vends Symbol objects with indices into the Image array. Wouldn't it be a better match there?

al45tair · January 24, 2023, 6:30pm

The reason I did that is that I was thinking that you might want to capture the image list and then symbolicate later. I can see why you might think it odd, though, because as you say it doesn't seem like it's useful in Backtrace.

fclout · January 24, 2023, 7:01pm

I thought about it more and there's another potential issue: images is lazy, so the instances you get when you start looking at the array could be different from the ones you had when the backtrace was captured. On Apple platforms, dlclose is unusual and absolutely hated by the linker folks, but for instance, it's a common thing to do on Windows because of how much runtime dynamic linkage COM carries around with it. You could get incorrect results (missing images, or worse, different images at captured addresses) if you capture the list of images after the function capturing the backtrace has returned.

al45tair · January 25, 2023, 8:34am

Indeed. I was aware of this.

I'm not entirely sure what the best approach is here, honestly. We definitely don't want to capture images every time someone captures a backtrace, because it's unnecessary and expensive. I wonder if perhaps we should have a separate type to hold images, and require that as an argument to the symbolicated() function (maybe with a default value that constructs the image list automatically if you don't provide it). That would at least make it clear that you were choosing to capture images at a particular point in your program.
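A minimal sketch of that shape, to show how the call site would read. Every name here (`LoadedImage`, `ImageSnapshot`, `captureCurrent()`, `RawBacktrace`, the `with:` label) is hypothetical, and the image list is hard-coded rather than obtained from the dynamic linker.

```swift
typealias Address = UInt64

// Every name here is hypothetical; the sketch only illustrates the API shape.
struct LoadedImage {
    var name: String
    var baseAddress: Address
    var endOfText: Address
}

struct ImageSnapshot {
    var images: [LoadedImage]

    // Stand-in for asking the dynamic linker what is loaded right now.
    static func captureCurrent() -> ImageSnapshot {
        return ImageSnapshot(images: [
            LoadedImage(name: "app", baseAddress: 0x1000, endOfText: 0x2000)
        ])
    }
}

struct RawBacktrace {
    var addresses: [Address]

    // The caller chooses when images are captured. The default captures them
    // at symbolication time, which may be later than the backtrace itself.
    func symbolicated(with snapshot: ImageSnapshot = .captureCurrent()) -> [String] {
        return addresses.map { address -> String in
            guard let image = snapshot.images.first(where: {
                $0.baseAddress <= address && address < $0.endOfText
            }) else { return "<unknown>" }
            return "\(image.name)+0x\(String(address - image.baseAddress, radix: 16))"
        }
    }
}

let backtrace = RawBacktrace(addresses: [0x1010, 0x9000])
let snapshot = ImageSnapshot.captureCurrent()   // taken alongside the backtrace
print(backtrace.symbolicated(with: snapshot))   // explicit capture point
print(backtrace.symbolicated())                 // default: capture now
```

Passing the snapshot explicitly makes the capture point visible in the source, while the default keeps the common case short.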

mattie · January 25, 2023, 12:38pm

Obviously easy for me to say, but I really like the change to UnwindAlgorithm. Do you still think of this enum as an algorithm, or do you think perhaps "strategy" fits better now?

I understand why Backtrace.Image.buildID is a [UInt8], but do you think Data could be a little easier to work with?

Can I ask why Backtrace.Frame's programCounter is a tuple? Could isReturnAddress just be moved to a regular property, kind of like isAsync?

Thinking about the capture limit, I totally get why it's in there. Runaway recursion on a macOS main thread can get into the hundreds of thousands of frames before crashing. A strategy I've used in the past was to detect repeated frames during unwind and just count them. This is very simplistic, but it helps to compress the stack enough, as it is often desirable to know how you got to this point, instead of just seeing a huge stack that's ultimately cut off. This could be done with a Backtrace.Frame.count property. Just food for thought.
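The counting strategy can be sketched as a run-length encoding over the captured addresses. `CountedFrame` and `collapseRepeats` are hypothetical names for illustration, and the addresses are made up.

```swift
typealias Address = UInt64

// Hypothetical frame record carrying the `count` suggested above.
struct CountedFrame: Equatable {
    var address: Address
    var count: Int
}

// Collapses runs of consecutive identical addresses into one counted frame.
func collapseRepeats(_ addresses: [Address]) -> [CountedFrame] {
    var result: [CountedFrame] = []
    for address in addresses {
        if let last = result.last, last.address == address {
            result[result.count - 1].count += 1
        } else {
            result.append(CountedFrame(address: address, count: 1))
        }
    }
    return result
}

// Innermost frame first: a helper, then a self-recursive function, then main.
let raw: [Address] = [0x30, 0x20, 0x20, 0x20, 0x20, 0x10]
let collapsed = collapseRepeats(raw)
for frame in collapsed {
    print("0x\(String(frame.address, radix: 16)) x\(frame.count)")
}
```

This merges only adjacent duplicates, so it relies on a self-recursive function returning to the same call site each time.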

It sounds like you may still be investigating how to handle inlined functions. But, I am interested to see where that goes.

Can I also just ask, how is the user supposed to supply symbol-rich files for symbolication?
Might a user-supplied symbolication system be an option?

hassila · January 25, 2023, 12:44pm

That would add a Foundation dependency.

al45tair · January 25, 2023, 1:00pm

I have no strong views on "algorithm" versus "strategy".

Honestly, in most places where I might have used Data in the past, I'm inclined to use [UInt8] instead these days anyway (and that's leaving aside the Foundation dependency that we can't take here).

Indeed, that's how I had it originally, but @grynspan pointed out that you will always want the return address flag whenever you access the program counter, and making it a tuple instead stops anyone from failing to realise that fact.

People who think they don't care probably want the adjustedProgramCounter instead anyway.

That is an interesting suggestion.

I think the answer is that we aren't going to attempt that. LLDB can do it because it has the full debug information available, but we won't, at least not when unwinding.

The current plan is to lean on the platform APIs for this where available, so the answer is going to be "whatever they do". The exception is Linux, which doesn't really have a platform API for symbolication, but even there, there is already a standard way to locate debug information using the build ID, so we'll probably do that.
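For reference, the common build-ID convention on Linux places separate debug files under a `.build-id` directory, using the first byte of the ID as a subdirectory name and the remaining bytes as the file name. A minimal sketch, assuming that layout; the function names and the default root `/usr/lib/debug` are illustrative assumptions, and nothing here says the implementation will look in exactly this place.

```swift
// Hex-encodes one byte as two lowercase digits, without Foundation.
func hex(_ byte: UInt8) -> String {
    let digits = String(byte, radix: 16)
    return byte < 0x10 ? "0" + digits : digits
}

// Builds the conventional separate-debug-info path for a build ID:
// <root>/.build-id/<first byte>/<remaining bytes>.debug
// The function name and the default root are assumptions for illustration.
func debugFilePath(forBuildID buildID: [UInt8],
                   root: String = "/usr/lib/debug") -> String? {
    guard buildID.count > 1 else { return nil }  // need a directory and a file part
    let directory = hex(buildID[0])
    let file = buildID.dropFirst().map(hex).joined()
    return "\(root)/.build-id/\(directory)/\(file).debug"
}

// A shortened, made-up build ID; real ones are typically 16 or 20 bytes.
let buildID: [UInt8] = [0xab, 0x0c, 0xde, 0x01]
print(debugFilePath(forBuildID: buildID) ?? "no path")
// /usr/lib/debug/.build-id/ab/0cde01.debug
```

Working from a `[UInt8]` keeps the lookup free of any Foundation dependency.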

I'm not planning anything particularly special in this area.

mattie · January 25, 2023, 1:10pm

If you can fill in source location information, you've already got what you need to expand the inlined stuff. I'm not referring at all to changing how unwinding works, I'm only talking about how one frame address can map to more than one symbol/source location.
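A small sketch of that one-to-many mapping, with unwinding left untouched. The table, the type names, and the addresses are all hypothetical; real debug information would come from the platform's symbolication facilities.

```swift
typealias Address = UInt64

struct SourceLocation: Equatable {
    var function: String
    var file: String
    var line: Int
}

// Hypothetical debug-info entry: the functions covering an address range,
// innermost (inlined) first, ending with the function that owns the frame.
struct InlineEntry {
    var range: Range<Address>
    var locations: [SourceLocation]
}

let inlineTable = [
    InlineEntry(range: 0x1000..<0x1010, locations: [
        SourceLocation(function: "outer", file: "main.swift", line: 10),
    ]),
    InlineEntry(range: 0x1010..<0x1020, locations: [
        SourceLocation(function: "helper", file: "util.swift", line: 3),   // inlined
        SourceLocation(function: "outer", file: "main.swift", line: 12),   // physical frame
    ]),
]

// Expands one physical frame address into zero or more logical frames.
func logicalFrames(at address: Address) -> [SourceLocation] {
    return inlineTable.first { $0.range.contains(address) }?.locations ?? []
}

let expanded = logicalFrames(at: 0x1014)
for location in expanded {
    print("\(location.function) at \(location.file):\(location.line)")
}
```

The unwinder still yields a single address for `outer`; the extra `helper` entry appears only at symbolication time.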

Which API is going to be used on Darwin? I'm only aware of CoreSymbolication, which is SPI (and also supports symbolicating inlined functions).

al45tair · January 25, 2023, 1:24pm

Having thought further on this, this strategy doesn't really work. It solves the simple case where you have a single function that is recursing many times, but it won't work for mutual recursion and in general you'd need to try to detect and somehow annotate cycles.

Perhaps the simplest solution might be to add another parameter that lets you specify the minimum number of frames from the top of the stack that should be captured and then have some indication of a discontinuity. I'll think more on that.
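A minimal sketch of that idea, reading "top of the stack" as the frames nearest the thread's entry point (that reading is an assumption). `CapturedFrame`, `truncate`, and the parameter names `limit` and `top` are hypothetical.

```swift
typealias Address = UInt64

// Hypothetical frame representation with an explicit discontinuity marker.
enum CapturedFrame: Equatable {
    case frame(Address)
    case omitted(Int)   // number of frames dropped at this point
}

// Keeps at most `limit` entries: the frames nearest the capture point, one
// marker, and the `top` frames nearest the thread's entry point.
// Assumes `frames` is ordered innermost first and that 0 <= top < limit.
func truncate(_ frames: [Address], limit: Int, top: Int) -> [CapturedFrame] {
    precondition(top >= 0 && top < limit, "need room for the marker")
    guard frames.count > limit else {
        return frames.map(CapturedFrame.frame)
    }
    let head = limit - top - 1                 // one slot is spent on the marker
    let omitted = frames.count - head - top
    var result = frames.prefix(head).map(CapturedFrame.frame)
    result.append(.omitted(omitted))
    result += frames.suffix(top).map(CapturedFrame.frame)
    return result
}

// Ten made-up frames, innermost first, squeezed into six entries.
let deep: [Address] = Array(1...10)
let kept = truncate(deep, limit: 6, top: 2)
print(kept)
```

Unlike repeat counting, this does not depend on recognising cycles, so mutual recursion is handled the same way as simple recursion.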