Unable to define custom derivative for MPSGraph methods

Hi Swift community! It's my first post here.

So it's been a long time since S4TF was archived. But Swift's differentiation feature remains attractive to me!

I am able to register a custom derivative for a function of type (Float) -> Float:

import _Differentiation

var input: Float = 0.1
func squareFunc(_ input: Float) -> Float {
    input * input
}

@derivative(of: squareFunc)
func squareFunc_(_ input: Float) -> (value: Float, pullback: (Float) -> Float) {
    let value = squareFunc(input)
    return (value: value, pullback: { dx in 3.0 * value * dx })
}

print(gradient(at: input, of: squareFunc))
// Prints: 0.030000001

But when it comes to defining a custom derivative for one of Metal Performance Shaders Graph's methods, the following code gives me an unexpected error: Referenced declaration 'square' could not be resolved.

import _Differentiation
import MetalPerformanceShadersGraph

extension MPSGraph {
    
    @derivative(of: MPSGraph.square)
    func squareDerivative(_ inputTensor: MPSGraphTensor) -> (value: MPSGraphTensor,
                                                             pullback: (MPSGraphTensor) -> MPSGraphTensor) {
        let value = square(with: inputTensor, name: nil)
        let fakeDerivative = multiplication(value,
                                            constant(3.0,
                                                     dataType: .float32),
                                            name: nil)
        return (value: value, pullback: { _ in fakeDerivative })
    }
}

Has anyone come across this issue and resolved it, or can somebody please guide me?

Mentioning you both, @rxwei and @dan-zheng, for help, because algorithmic differentiation for Metal doesn't seem to have been touched yet.

Regards
Rahul Bhalley

You'll need to make MPSGraphTensor conform to Differentiable and AdditiveArithmetic.

Thanks for the reply.

So I tried conforming to AdditiveArithmetic:

extension MPSGraphTensor: AdditiveArithmetic {
    
    public static func + (lhs: MPSGraphTensor, rhs: MPSGraphTensor) -> Self {
        return lhs as! Self
    }
    
    public static func - (lhs: MPSGraphTensor, rhs: MPSGraphTensor) -> Self {
        return lhs as! Self
    }
    
    public static func += (lhs: inout MPSGraphTensor, rhs: MPSGraphTensor) {
        // Do nothing for now.
    }
    
    public static func -= (lhs: inout MPSGraphTensor, rhs: MPSGraphTensor) {
        // Do nothing for now.
    }
}

For both func += (lhs:rhs:) and func -= (lhs:rhs:) I get the error: Protocol 'AdditiveArithmetic' requirement '+=' cannot be satisfied by a non-final class ('MPSGraphTensor') because it uses 'Self' in a non-parameter, non-result type position. I think we can't mark a class final using retroactive modeling.

Is there a workaround for this issue?

Instead of conforming the class itself to AdditiveArithmetic, you could define a nested TangentVector type within the class type (wrapping an MPSGraphTensor), and conform TangentVector to AdditiveArithmetic.
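One way to read this suggestion, as a minimal sketch rather than a tested solution: a struct can satisfy AdditiveArithmetic where the non-final class cannot, because Self in a struct is always the concrete type. The name TensorTangent and the convention that nil stands for zero are assumptions made for illustration, not MPSGraph API. The sketch also assumes both operands were created in the same graph, which it reaches through tensor.operation.graph.

```swift
import MetalPerformanceShadersGraph

// Hypothetical wrapper type (not part of MPSGraph).
// A nil tensor stands for the additive identity, so `zero` needs no graph.
struct TensorTangent: AdditiveArithmetic {
    var tensor: MPSGraphTensor?

    static var zero: TensorTangent { TensorTangent(tensor: nil) }

    static func + (lhs: TensorTangent, rhs: TensorTangent) -> TensorTangent {
        switch (lhs.tensor, rhs.tensor) {
        case let (a?, b?):
            // Assumes a and b belong to the same graph.
            return TensorTangent(tensor: a.operation.graph.addition(a, b, name: nil))
        case (nil, _):
            return rhs
        case (_, nil):
            return lhs
        }
    }

    static func - (lhs: TensorTangent, rhs: TensorTangent) -> TensorTangent {
        switch (lhs.tensor, rhs.tensor) {
        case let (a?, b?):
            return TensorTangent(tensor: a.operation.graph.subtraction(a, b, name: nil))
        case (_, nil):
            return lhs
        case let (nil, b?):
            // 0 - b is the negation of b.
            return TensorTangent(tensor: b.operation.graph.negative(with: b, name: nil))
        }
    }
}
```

The += and -= operators come from AdditiveArithmetic's default implementations, and Equatable is synthesized because MPSGraphTensor is an NSObject subclass. Note that this only addresses the AdditiveArithmetic error; it does not by itself make MPSGraphTensor conform to Differentiable.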

MPSGraph has its own highly optimized automatic differentiation. I plan on using that instead of Swift's autodiff whenever possible in MetalXLA. Since Swift's autodiff is currently incredibly buggy, that's a safer choice as well.
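For reference, the built-in differentiation mentioned here is exposed through gradients(of:with:name:). A minimal sketch, assuming a Metal device is available; the placeholder shape and the input value 0.1 are arbitrary choices for illustration:

```swift
import Foundation
import Metal
import MetalPerformanceShadersGraph

let graph = MPSGraph()

// y = x * x, built symbolically.
let x = graph.placeholder(shape: [1], dataType: .float32, name: "x")
let y = graph.square(with: x, name: "y")

// Ask MPSGraph for dy/dx; the result is keyed by the tensors passed in `with:`.
let grads = graph.gradients(of: y, with: [x], name: nil)

guard let dydx = grads[x], let device = MTLCreateSystemDefaultDevice() else {
    fatalError("No gradient tensor or no Metal device available")
}

// Feed a concrete value for x and evaluate only the gradient tensor.
let input: [Float] = [0.1]
let inputData = input.withUnsafeBufferPointer { Data(buffer: $0) }
let feed = MPSGraphTensorData(device: MPSGraphDevice(mtlDevice: device),
                              data: inputData,
                              shape: [1],
                              dataType: .float32)
let results = graph.run(feeds: [x: feed],
                        targetTensors: [dydx],
                        targetOperations: nil)

// Copy the result back to the CPU.
var output: [Float] = [0]
results[dydx]?.mpsndarray().readBytes(&output, strideBytes: nil)
print(output[0])
```

Here the derivative is computed entirely by MPSGraph, so no @derivative registration or Differentiable conformance is involved.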

Also @rxwei any chance we can update some of the broken S4TF tutorials about autodiff? See swift-colab for more context.

I think MPS rather than MPSGraph would be a better choice too, as it allows explicit encoding and committing of commands to the queue, as you might already be used to from the bare Metal API. MPSGraph, by contrast, takes a simpler and more abstract approach of running a session, like Python TF 1.x (IDK if that still exists in TF 2.x).

MPSGraph has its own highly optimized automatic differentiation.

I wonder if it's able to recognize derivative methods for custom (using extension) methods.

MPS may be more flexible, but MPSGraph is incredibly optimized. If you're going for performance and have the time to figure out how to work MPSGraph into your workflow, it's a good idea to attempt doing that. There is no XLA compiler for Apple, but MPSGraph does use MLIR internally.


I wonder if it's able to recognize derivative methods for custom (using extension) methods.

That isn't possible. MPSGraph isn't extensible.

I'm sure we can add new methods using extension MPSGraph. (I've been doing this. But, just mentioning, MLCGraph doesn't let you define new methods.) What I doubt is that MPSGraph will recognize a derivative method for the corresponding forward method.

What I meant is that MPSGraph only recognizes a specific instruction set internally. You could compose new ops out of smaller ones that it recognizes, but there are fundamental problems with that approach. For example, it might not handle 3D convolutions well.

I might have not understood what you meant. But I tried it anyway and it didn't work either:

extension MPSGraphTensor: Differentiable {
    //public typealias TangentVector = type
    
    public struct TangentVector: AdditiveArithmetic {
        // IDK what to write here, for now.
    }
}

I'm prompted to add protocol stubs, and when I accept I get public typealias TangentVector = type.

What is the issue in my code?

All the members must be declared differentiable in the original declaration, or you must try @memberwise Differentiable. Neither approach is feasible. I recommend making a wrapper around MPSGraphTensor, and making copies of all math functions (with custom derivatives) that manually calculate derivatives on an MPSGraphTensor.
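To make the "manually calculate derivatives" part concrete, here is a rough sketch of what one such copied math function could look like. It covers only the hand-written derivative, expressed as a plain closure, and does not register anything with @derivative or Swift's autodiff. The method name squareWithPullback is hypothetical, not an MPSGraph API:

```swift
import MetalPerformanceShadersGraph

extension MPSGraph {
    // Hypothetical helper: returns x^2 together with a hand-written pullback
    // that maps an upstream gradient to (upstream * 2x).
    func squareWithPullback(_ x: MPSGraphTensor)
        -> (value: MPSGraphTensor, pullback: (MPSGraphTensor) -> MPSGraphTensor) {
        let value = square(with: x, name: nil)
        // d(x^2)/dx = 2x, built from ops MPSGraph already provides.
        let two = constant(2.0, dataType: x.dataType)
        let localGradient = multiplication(two, x, name: nil)
        return (value: value, pullback: { upstream in
            self.multiplication(upstream, localGradient, name: nil)
        })
    }
}

// Usage: build the forward value and the gradient tensor in the same graph.
let pullbackGraph = MPSGraph()
let operand = pullbackGraph.placeholder(shape: [1], dataType: .float32, name: nil)
let squared = pullbackGraph.squareWithPullback(operand)
let seed = pullbackGraph.constant(1.0, dataType: .float32)
let gradientTensor = squared.pullback(seed)
```

Both squared.value and gradientTensor are ordinary graph tensors, so they can be evaluated with the usual run call. Hooking such pairs into Swift's Differentiable machinery would still need the wrapper type described above.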

Again I'll get stuck at making those wrapped functions differentiable with Differentiable. It seems tedious to make Metal functions differentiable using Swift compiler's differentiation feature.

Update: Nah nah, this can't be done easily! It requires digging very deep into the MPSGraphTensor type's properties, like S4TF's Tensor conformance does. It seems crazy difficult for Metal-derived frameworks.

I think a better approach would be to expose the underlying C++ tensor implementation using module maps, from either PyTorch's ATen library or CTensorFlow (as S4TF does), and then conform the Tensor structure to Differentiable as in S4TF.

Wait, again no Metal support. I might write custom Metal shaders in Swift for both the @differentiable derivative and the forward function. Or, secondly, I could convert Tensor -> scalars -> MPSGraphTensor[Data] and put that code in @differentiable. I can't be sure until both are tried. The second seems more uncertain.

Have you tried looking at how S4TF interfaces with the XLA compiler for CUDA and TPU? That might give you some ideas for how to proceed, as they are in a similar situation to MPSGraph.

If you would help me with the Metal backend for S4TF, it might be the only feasible solution to your problem. Correct me if I misunderstood your update.

The low-level XLA interface is completely independent of autodiff. S4TF lowers it down into raw machine instructions, so S4TF's frontend takes care of the problems with using differentiation with MPSGraph. That's why I see it as a solution to your problem.

If you would help me with the Metal backend for S4TF, it might be the only feasible solution to your problem. Correct me if I misunderstood your update.

I'm not an expert at GPU programming like you are. But if you're okay with me asking any dumb questions along the way then I am excited to help you.

Really? You changed your mind? If so, then that is awesome!

Yes, I want to run DL algorithms in Swift on Metal.

Also, it doesn't feel right that you alone are putting this much effort into S4TF. I want to help you.


Let's continue this discussion in a new issue on MetalXLA, so it doesn't flood Swift Forums with updates.