Thanks for starting this interesting discussion!
I also assumed that valueWithDifferential(at: x, in: f).value == f(x) and valueWithPullback(at: x, in: f).value == f(x) are reasonable semantic requirements, but it seems there may be performance-related motivations for relaxing them.
I'm not familiar with Julia, so here's my interpretation of the linked Julia code:
```julia
function rrule(::typeof(/), A::AbstractVecOrMat{<:Real}, B::AbstractVecOrMat{<:Real})
    Aᵀ, dA_pb = rrule(adjoint, A)
    Bᵀ, dB_pb = rrule(adjoint, B)
    Cᵀ, dS_pb = rrule(\, Bᵀ, Aᵀ)
    C, dC_pb = rrule(adjoint, Cᵀ)
    function slash_pullback(Ȳ)
        # Optimization note: dAᵀ, dBᵀ, dC are calculated no matter which partial you want;
        # this is not a problem if you want the 2nd or 3rd, but if you want the first, it
        # is fairly wasteful
        _, dC = dC_pb(Ȳ)
        _, dBᵀ, dAᵀ = dS_pb(extern(dC))
        # need to extern as dAᵀ, dBᵀ are generally `Thunk`s, which don't support adjoint
        ∂A = @thunk last(dA_pb(extern(dAᵀ)))
        ∂B = @thunk last(dB_pb(extern(dBᵀ)))
        (NO_FIELDS, ∂A, ∂B)
    end
    return C, slash_pullback
end
```
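For readers who don't speak Julia, here is a rough NumPy sketch of the same structure — a primal value plus a pullback closure that reuses work from the forward computation. The name `rdiv_vjp` and the closed-form cotangents are my own (assuming real matrices, no thunking); this is not the ChainRules implementation itself:

```python
import numpy as np

def rdiv_vjp(A, B):
    # Sketch of the rrule above: compute C = A / B as adjoint(adjoint(B) \ adjoint(A)),
    # i.e. solve Bᵀ Xᵀ = Aᵀ, and return C together with a pullback closure.
    C = np.linalg.solve(B.T, A.T).T          # C = (Bᵀ \ Aᵀ)ᵀ == A @ inv(B)
    def slash_pullback(Ybar):
        dA = np.linalg.solve(B, Ybar.T).T    # ∂A = Ȳ B⁻ᵀ
        dB = -C.T @ dA                       # ∂B = -Cᵀ Ȳ B⁻ᵀ = -Cᵀ ∂A
        return dA, dB                        # (NO_FIELDS omitted: `/` has no fields)
    return C, slash_pullback
```

Note how the pullback reuses `C` (and the factorization work behind it) from the forward pass, which is exactly the efficiency argument being made for the Julia code.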
The derivative of matrix right division (A / B) is defined in terms of the conjugate transpose (adjoint) and matrix left division (A \ B, which solves A * X == B for X), via the identity A / B == adjoint(adjoint(B) \ adjoint(A)).
The original result C is computed as adjoint(adjoint(B) \ adjoint(A)). This C may differ slightly from C_true = A / B in floating point, but the intermediate pullbacks d{A,B,S,C}_pb computed along the way enable a more efficient slash_pullback implementation. This motivates relaxing the strict C == C_true requirement.
I wonder if you meant to flip the inequality? i.e. ∀x. abs(v - v_true) <= abs(f(x) - v_true)
The flipped inequality makes sense to me: it's okay for valueWithDifferential(at: x, in: f).value to be more accurate than f(x) for all x. The original inequality doesn't quite make sense to me, since it places no upper bound on abs(v - v_true).
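As a concrete (hypothetical, not Swift's actual behavior) instance of the flipped inequality: suppose f sums its inputs naively, while the derivative-computing path happens to use compensated summation internally. The value it returns is then at least as accurate as f(x), satisfying abs(v - v_true) <= abs(f(x) - v_true):

```python
import math

def f(xs):
    # Naive left-to-right summation: loses low-order bits
    total = 0.0
    for x in xs:
        total += x
    return total

xs = [1e16, 1.0, -1e16, 1.0]
v_true = 2.0       # exact mathematical sum
v = math.fsum(xs)  # compensated summation; exact for this input

# Flipped inequality: v may be *more* accurate than f(xs), never less
assert abs(v - v_true) <= abs(f(xs) - v_true)
```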