Force-unwrapping, `try!`, and `fatalError` in the LLDB REPL cause memory leaks

I have noticed a situation where a failed force-unwrap, `try!`, or `fatalError` in the LLDB REPL causes a memory leak.

I'm very interested in finding a way to make them not leak, because force-unwrapping, `try!`, and `fatalError` are very convenient in the REPL when you're just playing around and don't want to handle errors with Swift's error-handling mechanisms.

The leak

Consider what happens if you run this in the LLDB REPL:

func eatMyMemory(_ thing: Int?) {
  let veryBigArray = Array(repeating: Int64(1), count: 100_000_000)
  _ = thing!               // traps here when `thing` is nil
  _ = veryBigArray.count   // keeps the array alive past the trap point
}
eatMyMemory(nil)

This leaks memory: the failed unwrap halts execution before the cleanup code that would release `veryBigArray` runs, and (as far as I know) nothing you can do will make LLDB run that cleanup code afterwards.
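
Even `defer` doesn't help: a trap halts execution before the function's scope exits, so deferred blocks never run. This variant (the function name is mine) leaks in exactly the same way:

func eatMyMemoryWithDefer(_ thing: Int?) {
  let veryBigArray = Array(repeating: Int64(1), count: 100_000_000)
  defer {
    // Never executes after the trap below: the failed unwrap halts
    // execution before the scope exits, so this block is skipped.
    print("done with \(veryBigArray.count) elements")
  }
  _ = thing!
}
eatMyMemoryWithDefer(nil)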

Fix idea

I'm curious what people think of this fix idea: teach the compiler to emit, for each function, cleanup code that makes it possible to exit that function cleanly (this is similar to the cleanup tables that some C++ implementations use to handle exceptions). Whenever the LLDB REPL unwinds the stack after a fatal error, it could run the appropriate cleanup code for each frame.
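
For comparison, Swift already emits this kind of cleanup on the error-propagation path. Here's a minimal sketch (the helper function and error type are mine, not standard library API) where the same failure is expressed as a thrown error: the array is released when the error propagates out of the function, because the throwing path runs the normal cleanup code. The fix idea would give the trap path, at least under the REPL, something analogous.

struct UnwrapFailed: Error {}

func unwrapOrThrow<T>(_ value: T?) throws -> T {
  guard let value = value else { throw UnwrapFailed() }
  return value
}

func eatMyMemoryPolitely(_ thing: Int?) throws {
  let veryBigArray = Array(repeating: Int64(1), count: 100_000_000)
  _ = try unwrapOrThrow(thing)   // throws instead of trapping
  _ = veryBigArray.count
}

do {
  try eatMyMemoryPolitely(nil)
} catch {
  // By the time we get here, veryBigArray has already been released:
  // error propagation runs the function's cleanup code on the way out.
  print("caught: \(error)")
}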

This sounds pretty big and complicated. At least it could be turned off entirely for code that isn't intended to run in the REPL, so it wouldn't add any cost to non-REPL code.

Are there any simpler ways to fix this? Any preexisting code that does something similar that I could use?

More context: Swift for TensorFlow

I noticed this situation while I was thinking about Swift for TensorFlow. In TensorFlow, it's common to run stuff like this:

var model = Model()
for epoch in 0..<10 {
  let update = bigComputationThatAllocatesLotsOfIntermediateStuff()
  model += update
}

Now there are a few ways that the leak can be triggered:

  • `bigComputationThatAllocatesLotsOfIntermediateStuff` often encounters runtime errors (e.g. because the user tries to multiply matrices with incompatible shapes). In our TensorFlow library, these runtime errors are implemented with `fatalError`, because it would be very onerous to make every mathematical operation `throws` (see the sketch after this list). So the leak occurs.
  • The user might want to halt execution and unwind the stack because something is taking longer than they expected. This also causes the leak.
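
To make the first point concrete, here's a minimal sketch (not the actual TensorFlow library code) of why these errors end up as traps: the shape check lives inside the operator itself, and turning it into a thrown error would force `try` onto every call site of every operator.

struct Matrix {
  var rows: Int
  var cols: Int
  var storage: [Float]

  static func * (lhs: Matrix, rhs: Matrix) -> Matrix {
    // The shape mismatch is reported with a trap, so when it fires it
    // unwinds past all the intermediate allocations without releasing them.
    guard lhs.cols == rhs.rows else {
      fatalError("cannot multiply a \(lhs.rows)x\(lhs.cols) matrix by a \(rhs.rows)x\(rhs.cols) matrix")
    }
    // Real multiplication omitted; only the shape check matters here.
    return Matrix(rows: lhs.rows, cols: rhs.cols,
                  storage: Array(repeating: 0, count: lhs.rows * rhs.cols))
  }
}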

The LLDB REPL is always going to leak, because it stores the results of operations forever. I don't think it's worth protecting against leaks-from-traps. It'd be better to have a "start over from scratch" command that lets you keep your defined declarations.


I still think it's worthwhile to protect against leaks-from-traps (or to find some other way to provide runtime errors that are easy to use in an interactive environment and don't leak), for the TensorFlow use case and in particular in Jupyter. When doing ML work in Jupyter, it's very common to:

  1. Run some time-consuming code that loads up some data.
  2. Run some experiment that allocates gigabytes of intermediate memory but only returns a few Floats at the end.
  3. Get a runtime error while the experiment is running.
  4. Fix the code and go back to step 2.

Leaks-from-traps block this workflow; other small leaks don't. (Also, I think the Jupyter kernel implementation that I have now doesn't store intermediate results or do anything else that "intentionally" leaks memory.)

Asking the user to manually start over from scratch is a workaround that works today, but it forces the user to wait while step (1) reruns, which is inconvenient.

Your suggestion of a "start over from scratch" command that lets you keep your defined declarations sounds very interesting. How would it work? If it just reruns your code to reconstruct your defined declarations, it suffers from the same problem as manually starting over from scratch, just a bit more conveniently. If it could somehow move all referenceable resources from the old session to the new one, it would solve all my problems, but that sounds quite complicated to do.