Dear community,
Let me introduce DeepSwift.
DeepSwift is a pure Swift repo for differentiable programming that - surprise! - does not require any compiler magic.
Ever since I came across Backprop as a Functor, I have tried to provide a Swift implementation. While coming up with something that works was simple, it took me years of throwing everything away and starting over to arrive at a design that I find elegant and general.
Here's an example usage of the high-level API, which could just as well have been wrapped around other differentiable programming frameworks:
struct MyModel : Learner {
    var body = SomeLayer() |> SomeOtherLayer() |> ThirdLayer()
}
The "low level" API on the other hand looks like this:
struct Foo : Layer {

    /* can be inferred!
    typealias Input = Float
    typealias Output = Float
    typealias Adjustment = Float
    typealias AuxiliaryData = Float
    */

    func inspectableApply(_ input: Float) -> (result: Float, auxiliaryData: Float) {
        //...
    }

    func adjustment(input: Float, auxiliaryData: Float, gradient: Float) -> (adjustment: Float, backprop: Float) {
        //...
    }

    mutating func move(_ adjustment: Float) {
        //...
    }

}
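To make this concrete, here is a minimal sketch of what a filled-in layer could look like: a single learnable scale factor computing y = weight * x. The method signatures are the ones shown above, but the name weight, the choice of auxiliary data and the update rule in move(_:) are illustrative assumptions rather than the repo's conventions.

struct Scale : Layer {

    var weight: Float = 1

    func inspectableApply(_ input: Float) -> (result: Float, auxiliaryData: Float) {
        // Stash the weight used in the forward pass so the backward pass
        // sees the same value even if move(_:) runs in between.
        (result: weight * input, auxiliaryData: weight)
    }

    func adjustment(input: Float, auxiliaryData: Float, gradient: Float) -> (adjustment: Float, backprop: Float) {
        // d(weight * input)/d(weight) = input, d(weight * input)/d(input) = weight.
        (adjustment: gradient * input, backprop: gradient * auxiliaryData)
    }

    mutating func move(_ adjustment: Float) {
        // Assumes the adjustment already encodes the optimizer's step
        // (e.g. a learning-rate-scaled gradient).
        weight -= adjustment
    }

}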
It's all just plain Swift, without any @differentiable annotations. Instead, the required information for the backward pass (i.e. the layer input and the "auxiliary data", which in general does not need to be the derivative) will simply be stored using a special ChainedLayers layer that can be found in the repo. No mysteries here!
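For illustration, here is a self-contained sketch of that idea. It uses a deliberately simplified stand-in protocol rather than the repo's actual Layer and ChainedLayers types, and it assumes that the incoming gradient has the layer's output type and the backpropagated gradient has its input type, which need not match DeepSwift's real associated types. The point is only that chaining two layers amounts to storing each layer's auxiliary data plus the intermediate value of the forward pass:

protocol SimpleLayer {
    associatedtype Input
    associatedtype Output
    associatedtype Adjustment
    associatedtype AuxiliaryData

    func inspectableApply(_ input: Input) -> (result: Output, auxiliaryData: AuxiliaryData)
    func adjustment(input: Input, auxiliaryData: AuxiliaryData, gradient: Output)
        -> (adjustment: Adjustment, backprop: Input)
    mutating func move(_ adjustment: Adjustment)
}

struct SimpleChain<First : SimpleLayer, Second : SimpleLayer> : SimpleLayer
where First.Output == Second.Input {

    var first: First
    var second: Second

    func inspectableApply(_ input: First.Input)
        -> (result: Second.Output,
            auxiliaryData: (first: First.AuxiliaryData, second: Second.AuxiliaryData, mid: First.Output)) {
        let (mid, auxFirst) = first.inspectableApply(input)
        let (out, auxSecond) = second.inspectableApply(mid)
        // Keep everything the backward pass will need.
        return (result: out, auxiliaryData: (first: auxFirst, second: auxSecond, mid: mid))
    }

    func adjustment(input: First.Input,
                    auxiliaryData: (first: First.AuxiliaryData, second: Second.AuxiliaryData, mid: First.Output),
                    gradient: Second.Output)
        -> (adjustment: (First.Adjustment, Second.Adjustment), backprop: First.Input) {
        // Backward pass: run the second layer first ...
        let (adjSecond, gradMid) = second.adjustment(input: auxiliaryData.mid,
                                                     auxiliaryData: auxiliaryData.second,
                                                     gradient: gradient)
        // ... then feed its backpropagated gradient into the first layer.
        let (adjFirst, gradInput) = first.adjustment(input: input,
                                                     auxiliaryData: auxiliaryData.first,
                                                     gradient: gradMid)
        return (adjustment: (adjFirst, adjSecond), backprop: gradInput)
    }

    mutating func move(_ adjustment: (First.Adjustment, Second.Adjustment)) {
        first.move(adjustment.0)
        second.move(adjustment.1)
    }

}

The actual ChainedLayers type plays this role in the repo, just in a more general form.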
The rest of the repo is mostly my attempt to provide conveniences for further high-level APIs (like layers that combine the outputs of two layers, thereby enabling e.g. ResNet), a proof of concept of optimizers (which are currently designed as a local property), and alternatives to ChainedLayers that contain the same type of layer multiple times or even repeatedly apply the exact same layer.
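As an example of the kind of combination mentioned above, here is a sketch of a skip connection that adds a wrapped layer's output back onto its input, which is the basic building block of a ResNet-style architecture. It reuses the simplified stand-in protocol from the previous snippet (again, not the repo's actual API) and hard-wires one branch to the identity for brevity, whereas the repo's combinators combine the outputs of two arbitrary layers.

struct SimpleResidual<Body : SimpleLayer> : SimpleLayer
where Body.Input == Float, Body.Output == Float {

    var body: Body

    func inspectableApply(_ input: Float) -> (result: Float, auxiliaryData: Body.AuxiliaryData) {
        let (out, aux) = body.inspectableApply(input)
        // y = x + body(x): the skip connection itself needs no extra bookkeeping.
        return (result: input + out, auxiliaryData: aux)
    }

    func adjustment(input: Float, auxiliaryData: Body.AuxiliaryData, gradient: Float)
        -> (adjustment: Body.Adjustment, backprop: Float) {
        let (adj, backprop) = body.adjustment(input: input, auxiliaryData: auxiliaryData, gradient: gradient)
        // The identity branch passes the gradient through unchanged,
        // so the two contributions simply add up.
        return (adjustment: adj, backprop: gradient + backprop)
    }

    mutating func move(_ adjustment: Body.Adjustment) {
        body.move(adjustment)
    }

}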
In a private repo, I even have proofs of concept showing that one can write attention modules purely with a few operators. I will open source this additional private repo as soon as I have found a way to test all of it, but it should only be considered a proof of concept, as I was only able to make sense of Accelerate's BLAS implementation, vDSP and vForce, but not of anything that can do useful math on GPUs or even do remote computation.
Given how long I've been throwing this away and starting over, and how satisfied I am with the current design, there's a good chance that I will continue development until I can meaningfully evaluate whether the approach, backed by appropriate math frameworks, can compare to torch and flow.