While "tangent" is short and obvious to differentiation users, its meaning may not be obvious to people who don't use differentiable programming. Most standard library numeric types will conform to Differentiable, so I think it's best if the name is self-documenting and can be quickly disambiguated with a simple web search. When you search for "tangent", the definition that pops up is "tangent line" in geometry. But when you search for "tangent vector", what shows up is a very accurate definition of what it means in Swift differentiable programming:
"In mathematics, a tangent vector is a vector that is tangent to a curve or surface at a given point. Tangent vectors are described in the differential geometry of curves in the context of curves in Rⁿ. More generally, tangent vectors are elements of a tangent space of a differentiable manifold." (Wikipedia)
Reverse-mode AD derivatives produce pullbacks; forward-mode AD derivatives produce differentials. The two are mathematically transposes of each other, but they are produced by different compiler implementations. Currently, only reverse-mode differentiation is stably implemented in the compiler. While we believe differentiation would be complete with both modes (ideally unified together), forward mode will require a significant amount of engineering, and its use cases are not nearly as common as gradient-based machine learning. With this proposal we hope to enable Swift to deliver a good experience for ML use cases in a way that's forward-compatible with more general abstractions (@differentiable functions, for example).
At the type level, they are very different because they have different ABIs. A reverse-mode differentiable function's ABI is a tuple of the original function and a derivative function that produces a pullback ((R'...) -> (T'...)):
original: (T...) -> (R...)
derivative: (T...) -> ((R...), (R'...) -> (T'...))
* An apostrophe stands for the associated tangent vector. For example, T' means T.TangentVector.
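To make the reverse-mode representation concrete, here is a minimal sketch that spells it out with ordinary Swift closures rather than compiler-generated derivatives. It assumes the function f(x, y) = x * y over Double, whose tangent vector is Double itself, so every T' and R' is Double. The names multiply and multiplyVJP are hypothetical and not part of any Swift API.

```swift
// Plain-Swift model of the reverse-mode ABI for f(x, y) = x * y.
// No compiler differentiation is used; Double's tangent vector is Double,
// so T' and R' are all Double here.

// original: (T...) -> (R...)
func multiply(_ x: Double, _ y: Double) -> Double {
    x * y
}

// derivative: (T...) -> ((R...), (R'...) -> (T'...))
// The pullback maps one output tangent v back to one tangent per input:
// df/dx = y and df/dy = x, so pullback(v) = (y * v, x * v).
func multiplyVJP(_ x: Double, _ y: Double) -> (Double, (Double) -> (Double, Double)) {
    (x * y, { v in (y * v, x * v) })
}

let (value, pullback) = multiplyVJP(3, 4)
let (dx, dy) = pullback(1)
print(value, dx, dy)  // 12.0 4.0 3.0
```

Calling the pullback with an output tangent of 1 yields the gradient (y, x) at the given point, which is what gradient-based training consumes.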
A forward-mode differentiable function would be a tuple of the original function and a derivative function that produces a differential ((T'...) -> (R'...)):
original: (T...) -> (R...)
derivative: (T...) -> ((R...), (T'...) -> (R'...))
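A matching sketch for forward mode, again using plain Swift closures and assuming the hypothetical function f(x, y) = x * y over Double (multiply and multiplyJVP are illustrative names only), shows how the differential's shape runs in the opposite direction:

```swift
// Plain-Swift model of the forward-mode ABI for f(x, y) = x * y.
// No compiler differentiation is used; all tangent types are Double.

// original: (T...) -> (R...)
func multiply(_ x: Double, _ y: Double) -> Double {
    x * y
}

// derivative: (T...) -> ((R...), (T'...) -> (R'...))
// The differential maps one tangent per input to a single output tangent:
// df = y * dx + x * dy.
func multiplyJVP(_ x: Double, _ y: Double) -> (Double, (Double, Double) -> Double) {
    (x * y, { dx, dy in y * dx + x * dy })
}

let (value, differential) = multiplyJVP(3, 4)
print(value, differential(1, 0), differential(0, 1))  // 12.0 4.0 3.0
```

Here the differential has type (Double, Double) -> Double, whereas a reverse-mode pullback for the same function would have type (Double) -> (Double, Double). Recovering the full gradient in forward mode takes one differential call per input, which is one reason reverse mode dominates gradient-based machine learning.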
In the manifesto, @differentiable functions are defined with an efficient, compact representation in which a single derivative function produces all three things: the original result, the differential, and the pullback. (It uses @differentiable(linear) to represent the bundle of differential and pullback, because they are transposes of each other.) At the binary level, though, this is what it looks like:
original: (T...) -> (R...)
derivative: (T...) -> ((R...), ((T'...) -> (R'...), (R'...) -> (T'...)))
                                ^~~~~~~~~~~~~~~~~  ^~~~~~~~~~~~~~~~~
                                differential        pullback
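The following sketch models this binary layout with plain Swift closures, assuming the hypothetical function f(x, y) = x * y over Double (multiplyDerivative is an illustrative name, not a compiler-generated symbol). It also checks the transpose relationship numerically: for an input tangent (dx, dy) and an output tangent v, v * differential(dx, dy) should equal the dot product of pullback(v) with (dx, dy).

```swift
// Plain-Swift model of the manifesto-style bundle for f(x, y) = x * y:
// one derivative call returns the original result, the differential,
// and the pullback.

// derivative: (T...) -> ((R...), ((T'...) -> (R'...), (R'...) -> (T'...)))
func multiplyDerivative(_ x: Double, _ y: Double)
    -> (Double, ((Double, Double) -> Double, (Double) -> (Double, Double))) {
    let differential: (Double, Double) -> Double = { dx, dy in y * dx + x * dy }
    let pullback: (Double) -> (Double, Double) = { v in (y * v, x * v) }
    return (x * y, (differential, pullback))
}

let (value, (differential, pullback)) = multiplyDerivative(3, 4)

// The two linear maps are transposes: for any input tangent (dx, dy) and
// output tangent v, v * differential(dx, dy) == dot(pullback(v), (dx, dy)).
let (dx, dy, v) = (2.0, 5.0, 7.0)
let lhs = v * differential(dx, dy)
let (px, py) = pullback(v)
let rhs = px * dx + py * dy
print(value, lhs, rhs)  // 12.0 161.0 161.0
```

The values are chosen so every product is exact in Double; with arbitrary inputs the two sides would only agree up to floating-point rounding.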
Every function type needs to have a stable ABI. A differentiable function whose derivative does not produce a differential is not the same as a general @differentiable function: it differs at the representation level, and therefore at the type level.
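As a rough illustration, assuming the same hypothetical f(x, y) = x * y over Double, the two derivative shapes can be written as distinct function types. A full derivative can be projected down to a reverse-only one by discarding the differential, but a reverse-only derivative carries no differential closure, so it cannot stand in for the full type without one being produced separately. ReverseOnlyDerivative, FullDerivative, and reverseOnly are hypothetical names used only for this sketch.

```swift
// Hypothetical type aliases modeling the two representations for a
// (Double, Double) -> Double function. They are distinct function types,
// so a value of one cannot be passed where the other is expected.
typealias ReverseOnlyDerivative =
    (Double, Double) -> (Double, (Double) -> (Double, Double))
typealias FullDerivative =
    (Double, Double) -> (Double, ((Double, Double) -> Double, (Double) -> (Double, Double)))

// A full derivative for f(x, y) = x * y: result, differential, pullback.
let full: FullDerivative = { x, y in
    (x * y, ({ dx, dy in y * dx + x * dy }, { v in (y * v, x * v) }))
}

// Dropping the differential turns a full derivative into a reverse-only one.
// There is no corresponding projection in the other direction, because a
// reverse-only derivative has no differential closure to return.
func reverseOnly(_ derivative: @escaping FullDerivative) -> ReverseOnlyDerivative {
    return { x, y in
        let (result, (_, pb)) = derivative(x, y)
        return (result, pb)
    }
}

let vjp = reverseOnly(full)
let (value, pullback) = vjp(3, 4)
print(value, pullback(1))  // 12.0 (4.0, 3.0)
```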