Differentiable programming for gradient-based machine learning

I agree with this. While it is interesting that you've reduced the cost of these in a specific case, we shouldn't have codegen for a core language feature rely on a heroic runtime implementation, and having compile-time behavior be fragile w.r.t. the number of loop iterations or the execution path seems unacceptable. We need guaranteed correctness out of core language features, and ideally a predictable execution cost model.

I think what you need here is a way in SIL to guarantee stack allocation of these closures. That will naturally solve the problem you're seeing, because the closure allocation in each loop iteration will get reused (since the alloc/dealloc stack pair is scoped to the loop body). It will also make everything that uses autodiff cheaper, even more so than a bump-pointer allocator, and it will enable secondary optimizations that clean up the IR much more effectively: many of the closures will be inlined, and the alloc/dealloc stack pairs will be removed by existing optimizations.
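To sketch the idea (SIL-like pseudocode, purely illustrative; the block labels, the `PullbackContext` type, and the instruction layout are assumptions, not the actual autodiff-generated SIL), a loop-scoped alloc/dealloc stack pair would look roughly like this:

```
// Illustrative sketch: the closure context for each loop iteration
// lives in a stack slot scoped to the loop body, so the same slot is
// reused every iteration instead of being heap-allocated.
bb_loop(%i : $Builtin.Int64):
  %ctx = alloc_stack $PullbackContext   // slot allocated per iteration
  // ... store captures into %ctx, run the body, build the pullback ...
  dealloc_stack %ctx                    // slot reclaimed before branching
  cond_br %cond, bb_loop, bb_exit
```

Because the alloc/dealloc pair is strictly nested inside the loop body, existing stack-promotion and dead-allocation passes can fold or eliminate it once the closures are inlined.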

-Chris
