DeepSwift

Dear community,

Let me introduce DeepSwift.

DeepSwift is a pure Swift repo for differentiable programming that - surprise! - does not require any compiler magic.

Ever since I came across Backprop as a Functor, I have been trying to provide a Swift implementation. While coming up with something that works was simple, it took me years of throwing everything away and starting over to arrive at a design that I find elegant and general.

Here's an example usage of the high-level API, which could just as well have been wrapped around other differentiable programming frameworks:

struct MyModel : Learner {

   var body = SomeLayer() |> SomeOtherLayer() |> ThirdLayer()

}

The "low level" API on the other hand looks like this:

struct Foo : Layer {

    /* //can be inferred!
    typealias Input = Float
    typealias Output = Float 
    typealias Adjustment = Float 
    typealias AuxiliaryData = Float
    */
    
    func inspectableApply(_ input: Float) -> (result: Float, auxiliaryData: Float) {
        //...
    }
    
    func adjustment(input: Float, auxiliaryData: Float, gradient: Float) -> (adjustment: Float, backprop: Float) {
        //...
    }
    
    mutating func move(_ adjustment: Float) {
        //...
    }

}
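
To make the skeleton above concrete, here is a hypothetical layer of the same shape - my own illustration, not code from the repo - that scales its input by a trainable weight:

struct Scale : Layer {

    var weight: Float

    func inspectableApply(_ input: Float) -> (result: Float, auxiliaryData: Float) {
        // Remember the weight that produced this result; here it happens to be
        // the derivative with respect to the input, but it could be anything.
        return (result: weight * input, auxiliaryData: weight)
    }

    func adjustment(input: Float, auxiliaryData: Float, gradient: Float) -> (adjustment: Float, backprop: Float) {
        // d(weight * input)/dweight = input, d(weight * input)/dinput = weight.
        return (adjustment: gradient * input, backprop: gradient * auxiliaryData)
    }

    mutating func move(_ adjustment: Float) {
        // Plain gradient-descent style update; sign and scaling conventions are a guess here.
        weight -= adjustment
    }

}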

It's all just plain Swift, without any @differentiable annotations. Instead, the information required for the backward pass (i.e., the layer input and the "auxiliary data" [which, in general, does not need to be the derivative]) is simply stored by a special ChainedLayers layer that can be found in the repo. No mysteries here!
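For readers wondering what such a chaining layer could look like, here is a minimal sketch in the spirit of Backprop as a Functor. It assumes a Layer protocol with exactly the three requirements implied by the Foo example (with gradients simply reusing the Input/Output types); Chain2, the |> overload and its precedence are my own guesses for illustration, not DeepSwift's actual ChainedLayers or operator.

protocol Layer {
    associatedtype Input
    associatedtype Output
    associatedtype Adjustment
    associatedtype AuxiliaryData

    func inspectableApply(_ input: Input) -> (result: Output, auxiliaryData: AuxiliaryData)
    func adjustment(input: Input, auxiliaryData: AuxiliaryData, gradient: Output) -> (adjustment: Adjustment, backprop: Input)
    mutating func move(_ adjustment: Adjustment)
}

struct Chain2<First : Layer, Second : Layer> : Layer where First.Output == Second.Input {

    var first: First
    var second: Second

    func inspectableApply(_ input: First.Input)
        -> (result: Second.Output, auxiliaryData: (First.AuxiliaryData, First.Output, Second.AuxiliaryData)) {
        let (mid, aux1) = first.inspectableApply(input)
        let (out, aux2) = second.inspectableApply(mid)
        // Keep both layers' auxiliary data plus the intermediate value for the backward pass.
        return (out, (aux1, mid, aux2))
    }

    func adjustment(input: First.Input,
                    auxiliaryData: (First.AuxiliaryData, First.Output, Second.AuxiliaryData),
                    gradient: Second.Output)
        -> (adjustment: (First.Adjustment, Second.Adjustment), backprop: First.Input) {
        let (aux1, mid, aux2) = auxiliaryData
        // Walk backwards: second layer first, then feed its backprop into the first layer.
        let (adj2, grad2) = second.adjustment(input: mid, auxiliaryData: aux2, gradient: gradient)
        let (adj1, grad1) = first.adjustment(input: input, auxiliaryData: aux1, gradient: grad2)
        return ((adj1, adj2), grad1)
    }

    mutating func move(_ adjustment: (First.Adjustment, Second.Adjustment)) {
        first.move(adjustment.0)
        second.move(adjustment.1)
    }

}

// A |> overload could then simply build such a chain (the precedence group is a guess).
infix operator |> : AdditionPrecedence

func |> <First : Layer, Second : Layer>(lhs: First, rhs: Second) -> Chain2<First, Second> where First.Output == Second.Input {
    Chain2(first: lhs, second: rhs)
}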

The rest of the repo is mostly my attempt to provide conveniences for further high-level APIs (like layers that combine the outputs of two layers, thereby enabling e.g. ResNet), a proof of concept of optimizers (which are currently designed as a local property), and alternatives to ChainedLayers that contain the same type of layer multiple times or even repeatedly apply the exact same layer.
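As a taste of what such a combining layer might look like - again only a sketch on top of the Layer protocol above, not the repo's API - here is a residual wrapper that adds a layer's output back onto its input and routes the gradient through both branches:

struct Residual<Body : Layer> : Layer where Body.Input == Body.Output, Body.Output : AdditiveArithmetic {

    var body: Body

    func inspectableApply(_ input: Body.Input) -> (result: Body.Output, auxiliaryData: Body.AuxiliaryData) {
        let (out, aux) = body.inspectableApply(input)
        return (out + input, aux)   // skip connection
    }

    func adjustment(input: Body.Input, auxiliaryData: Body.AuxiliaryData, gradient: Body.Output)
        -> (adjustment: Body.Adjustment, backprop: Body.Input) {
        let (adj, grad) = body.adjustment(input: input, auxiliaryData: auxiliaryData, gradient: gradient)
        // The identity branch passes the incoming gradient through unchanged.
        return (adj, grad + gradient)
    }

    mutating func move(_ adjustment: Body.Adjustment) {
        body.move(adjustment)
    }

}

With the hypothetical Scale layer from above, Residual(body: Scale(weight: 0.5)) computes 0.5 * x + x in the forward pass and splits the incoming gradient across both branches in the backward pass.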

In a private repo, I even have a proof of concept showing that one can write attention modules purely with a few operators. I will open-source this additional repo as soon as I find a way to test all of it, but it should only be considered a proof of concept, as I was only able to make sense of Accelerate's BLAS implementation, vDSP and vForce, but nothing that can do any useful math on GPUs or support remote computation.

Given how long I've been throwing this away and starting over, and how satisfied I am with the current design, there's a good chance that I will continue development until I can meaningfully evaluate whether - using appropriate math frameworks - the approach can compete with torch and flow.


Awesome work! Have you checked out DL4S? That framework's creator had something similar in mind: building an entire alternative to Swift for TensorFlow using library-integrated differentiation. It could even do second-order differentiation, which Swift AutoDiff can't do. He used just Accelerate and MKL, rewriting his repo several times, but could only implement eager mode and a CPU backend. I actually spent a lot of time planning to make a GPU backend for DL4S before changing my mind and going with S4TF. The reason I switched was how aesthetic the language-oriented AutoDiff looked.

I think you might find the Differentiation package interesting as well.

Thanks so much for your feedback - I was not aware that there are other no-toolchain-yet-general approaches!

I'll review that and come back to y'all :)

Info for everyone who might be interested: I just answered @philipturner's issue on GitHub. I also opened a Discussions section in the repo.