[stdlib] Cleanup callback for fatal Swift errors

davidbaraff · December 15, 2019, 6:16pm

Is this going anywhere?

More than two years ago, I posed a very similar question: "could someone PLEASE let me have the contents of just the fatal error string or similar", and it's hard to believe that after nearly 4 years of Swift, we still have no way of getting this vital info out of a crashing program.

Joe_Groff · December 15, 2019, 7:03pm

The Swift compiler in development now emits fatal error traps with an artificial inline frame, which will allow crash reporters that understand debug info to show the error directly as part of the backtrace. Future toolchains that include this change should be able to do better.

jrose · March 25, 2022, 6:17am

I’d like to necro this thread. More information in crash logs is good, but a crash log can’t always be correlated with, say, an external logging system that gets submitted to support by the user. Having the ability to write the fatal error and maybe the backtrace to the external log, and at least try to flush it, would be invaluable.

Can this functionality be abused? (say, by restarting the run loop) Yes, it can. Can it result in even worse behavior? Yes it can, but not worse than anything you can do with threads (your thread can always be suspended between the cmp and the jmp swift_reportFatalError). Is it still useful? Yes, absolutely.

Is there a useful feature handling actor-isolated recovery that does not have to result in process termination? Perhaps. Will the currently-proposed feature handle all kinds of exits? Definitely not. Would this preclude any such expansion in the future? Nope!

Is it hard to implement? Nope, the code is already well-factored for it! Will it cost anything at the fatal error site? Nope, there’s already a call there! Can it be backwards-deployed? Not easily (none of these symbols were planned as interposable), but maybe someone clever can figure something out. If not, it’s still worth it.

(Personally I think supporting multiple handlers is a good idea, but chainable handlers are technically sufficient: fetch the old handler, capture it into your new handler, call it when you’re done with your own work.)
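
For illustration, the chaining pattern in that parenthetical might look roughly like the sketch below. Everything in it is an assumption made up for the example: `FatalErrorHandler`, `FatalErrorHook`, `exchange`, and `report` are hypothetical names, and no such hook exists in the standard library today. The class only stands in for whatever slot the runtime would own.

```swift
/// Hypothetical handler signature; not an existing standard library API.
typealias FatalErrorHandler = (_ message: String) -> Void

/// Stand-in for the runtime-owned slot holding the single current handler.
final class FatalErrorHook {
    private var current: FatalErrorHandler?

    /// Installs `newHandler` and returns the handler it replaced.
    @discardableResult
    func exchange(_ newHandler: FatalErrorHandler?) -> FatalErrorHandler? {
        let old = current
        current = newHandler
        return old
    }

    /// Simulates the runtime calling the handler just before it traps.
    func report(_ message: String) {
        current?(message)
    }
}

func chainingDemo() -> [String] {
    let hook = FatalErrorHook()
    var log: [String] = []

    // First client, e.g. a crash-reporting SDK, installs its handler.
    hook.exchange { message in log.append("crash reporter: \(message)") }

    // Second client: fetch the old handler, capture it, do its own work,
    // then call the old handler.
    let previous = hook.exchange(nil)
    hook.exchange { message in
        log.append("app log: \(message)")
        previous?(message)
    }

    hook.report("Index out of range")
    return log
}
```

Calling `chainingDemo()` returns the app's entry first and the crash reporter's second, since each handler does its own work before deferring to the one it replaced. A real hook would also need to decide what is safe to do in a process that is about to trap, which this sketch ignores.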

Joe_Groff · March 25, 2022, 3:56pm

There are still a lot of bare traps that are directly generated by the compiler, integer overflow checks being the typical example, and I think we'd want to reserve the right to reduce fatal errors with static strings into traps with inline frames in the future too. I think a crash interceptor is something we could reasonably provide, but I also think it'll need to involve catching signals to some degree.

jrose · March 25, 2022, 4:24pm

The problem with catching signals is, well, everything involved in catching signals. Either you’re handling them in a context that’s very limited (POSIX allows me to write to a file here, but does Foundation’s implementation?), or you’re waking up another thread. Neither is great.
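
To make the "very limited context" concrete, here is a minimal sketch of a handler that restricts itself to `write(2)`, which POSIX lists as async-signal-safe. It assumes a Darwin or Glibc platform, and `installMinimalHandler` is a name invented for the example. Swift itself makes no formal async-signal-safety guarantees as far as I know, so treat this as an illustration of the constraint rather than a recommended logging path.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Installs a handler for `signo` that does nothing except write(2) a fixed message.
/// (Hypothetical helper, named for this example only.)
func installMinimalHandler(for signo: Int32) {
    _ = signal(signo) { _ in
        // A StaticString literal points at constant data, so forming the
        // buffer needs no allocation and takes no locks.
        let note: StaticString = "caught signal; richer logging is not safe here\n"
        _ = write(STDERR_FILENO, note.utf8Start, note.utf8CodeUnitCount)
    }
}

// Demonstrated with SIGUSR1 so the process survives the handler returning.
installMinimalHandler(for: SIGUSR1)
_ = raise(SIGUSR1)
```

Anything beyond this, such as formatting a string, appending to a Foundation-backed log, or flushing a buffered logger, may allocate or take locks, which is exactly the uncertainty raised above.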

There’s definitely a code size concern here, and I see how compact crash info would help that. You could still get most of that benefit from a hypothetical swift_deriveFailureInfoFromReturnAddress or something, that did exception-table-like lookup before calling reportFatalError.
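
As a rough sketch of what an "exception-table-like lookup" could mean here: the compiler would emit a side table keyed by the return address of each trap site, and the runtime would search it only when a failure actually happens. The `FailureInfo` type, the table layout, and `deriveFailureInfo` are all assumptions for illustration; nothing like this is known to exist in the runtime.

```swift
/// Hypothetical side-table entry, one per trap site, kept out of line
/// so the trap site itself stays small.
struct FailureInfo {
    let returnAddress: UInt   // address just after the call at the trap site
    let message: String
}

/// Binary search over a table sorted by ascending `returnAddress`.
/// Returns nil when the address does not match any known trap site.
func deriveFailureInfo(fromReturnAddress address: UInt,
                       in table: [FailureInfo]) -> FailureInfo? {
    var low = 0
    var high = table.count
    while low < high {
        let mid = low + (high - low) / 2
        if table[mid].returnAddress < address {
            low = mid + 1
        } else {
            high = mid
        }
    }
    guard low < table.count, table[low].returnAddress == address else {
        return nil
    }
    return table[low]
}

// Made-up addresses and messages, purely for the example.
let sampleTable = [
    FailureInfo(returnAddress: 0x1000, message: "Unexpectedly found nil"),
    FailureInfo(returnAddress: 0x1040, message: "Index out of range"),
    FailureInfo(returnAddress: 0x10a8, message: "Arithmetic overflow"),
]
let found = deriveFailureInfo(fromReturnAddress: 0x1040, in: sampleTable)
print(found?.message ?? "unknown trap site")
```

The cost moves from every fatal error site to the rare failure path, which is where the code size benefit would come from.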

The main thing here is that “custom handling” is at one end of the process and “what gets reported and how” is at the other end, and they don’t have to be designed together. Adding this functionality now does not preclude adding more later—though it does ratchet in anything currently using reportFatalError to keep providing that functionality. The perfect has been the enemy of the good for quite a while.

EDIT: I kind of don’t want to put signal handling in this, because it’s much harder to distinguish whether a signal is a “soft fault” or a “hard fault”. Signal handlers are also limited to one per signal, and I’m not sure Swift should claim that handler for this mechanism. Meanwhile, we know this kind of thing is useful, because of NSSetUncaughtExceptionHandler, mentioned in the very first post in this thread.

Joe_Groff · March 25, 2022, 5:10pm

Yeah, I agree that signal handling is gross. Windows and Darwin have more composable alternatives, at least—SEH is a natural fit to containing fault handling behavior to a scope, and Mach exception handling is gross but more flexible than signal handling. I know Linux has grown a few new mechanisms like signalfd (though AIUI that's targeted primarily toward asynchronous signals and not usually terminal signals like SIGSEGV/BUS/ILL/TRAP), but I don't know if there are yet more alternatives there these days.