I'd like to express my enthusiastic support for this proposal: we started the SwiftFusion project, initially a collaboration between Google Research and Georgia Tech, precisely because the idea of having differentiable functions as first-class citizens is so appealing. In our case, it allowed us to unify non-linear optimization based on factor graphs (what gtsam.org is all about) with deep learning, and to learn factors in a data-driven way.
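To make that concrete, here is a minimal sketch of what a "factor as a differentiable function" could look like under this proposal. Everything in it is hypothetical: `rangeFactorError` and its measurement model are invented for illustration and are not SwiftFusion code, and the `@differentiable(reverse)` / `gradient(at:)` spellings follow the experimental `_Differentiation` module, which may differ slightly from the final API.

```swift
import _Differentiation

// A toy "factor": the squared error between a simple measurement model
// and an observed value. The name and the x*x model are made up purely
// for illustration.
@differentiable(reverse)
func rangeFactorError(_ x: Double, measurement: Double) -> Double {
    let predicted = x * x                  // stand-in measurement model
    let residual = predicted - measurement
    return 0.5 * residual * residual
}

// The same code path serves both worlds: an optimizer over the factor graph
// and a training loop that learns the factor can each ask for derivatives.
let g = gradient(at: 3.0) { x in rangeFactorError(x, measurement: 10.0) }
print(g)  // 2 * x * (x^2 - 10) at x = 3, i.e. -6
```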
I am not enough of a Swift guru to provide deep technical feedback, but I am glad to see there is substantial discussion.
Forward differentiation would be a great future contribution for non-ML applications, specifically those where second-order optimizers can be used efficiently: unlike gradient descent, which only needs the gradient, these optimizers need the full Jacobians. Thinking about sparse Jacobians would also be interesting.
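To spell out why the full Jacobian matters there (my own summary, not part of the proposal; the notation r, J, λ below is just the standard least-squares setup): a first-order method only needs the gradient, which one reverse-mode pass gives without ever materializing J, whereas a Gauss-Newton or Levenberg-Marquardt step needs J itself, which forward mode can build one column (one Jacobian-vector product) at a time.

```latex
% Least-squares objective over residuals r : R^n -> R^m
f(x) = \tfrac{1}{2}\,\lVert r(x) \rVert^2

% Gradient descent needs only the gradient, one reverse-mode pass:
\nabla f(x) = J(x)^\top r(x), \qquad J(x) = \frac{\partial r}{\partial x}

% A Gauss-Newton / Levenberg-Marquardt step needs J itself:
\bigl(J^\top J + \lambda I\bigr)\,\delta = -J^\top r

% Forward mode builds J column by column via Jacobian-vector products:
J\,e_i = \frac{\partial r}{\partial x}\,e_i, \quad i = 1,\dots,n
```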