Ongoing work on differentiable Swift

Hi @philipturner, I assure you I'm quite familiar with all the technology you mention. :)


There isn't yet a formal working group for Differentiable Swift, or for ML-in-Swift efforts in general, but I think this thread is a reasonable place to ask this question. We're putting together a project-wide "roadmap" blog post looking forward to the next year of Swift development, and as part of that we're soliciting 2–4 paragraphs from a bunch of different corners of the project about the major things we're hoping to achieve in this timeframe. If the people here are willing to contribute that (say, by September 6th), I'd be happy to include it in the post.

It should be focused mostly on deliverables in official Swift projects, although briefly covering / linking to related projects may be fine.


Hey, John! Thanks for the opportunity, glad to hear about the interest. I can only speak for the group of people I work with, but we believe the next year will be an important one for differentiable Swift and related language features. We plan on investing in upstreaming improvements to the Swift language in these core areas:

Robustness: We will focus on fixing issues in differentiable Swift that impact production applications as we encounter them, though we are observing fewer of those over time. There are also several other known issues (many with simple reproducers) in the Swift GitHub issue tracker that can be addressed.

Automatic differentiation performance: We'd like to significantly improve the host-side performance of compiled code using differentiable Swift. Nominally, the compiler-generated backwards pass through arbitrary Swift functions should be as fast as (or only slightly slower than) the original version (forward pass) of those functions. At present, the compiler-generated backwards pass is orders of magnitude slower in many cases, so we have optimizations planned over the next year to make the backwards pass much faster (one example). We'd like to see if we can make Swift automatic differentiation as fast as, or even faster than, alternative implementations like Enzyme.
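For context on the terms above: the forward pass is the original computation, and the backwards pass is the derivative code the compiler generates alongside it. A minimal sketch using the standard `_Differentiation` module (the `loss` function here is made up for illustration):

```swift
import _Differentiation

// A plain Swift function the compiler can differentiate.
@differentiable(reverse)
func loss(_ x: Double) -> Double {
    x * x - 4 * x + 4  // (x - 2)^2
}

// One call runs the forward pass and the compiler-generated
// backwards pass; the goal is for the latter to cost roughly
// as much as the former.
let (value, grad) = valueWithGradient(at: 5.0, of: loss)
// value == 9.0, grad == 2 * 5 - 4 == 6.0
```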

Swift key path performance: While not strictly a part of differentiable Swift, when optimizing strongly typed models in Swift, key paths become extremely important for introspection of these models. Fast key path traversal is vital for many uses of strongly-typed models, so we are hoping to upstream performance improvements in this area. As a first step, we've been working to add a robust set of key path benchmarks to the compiler suite.
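As an illustration of why key path traversal speed matters (the `Model` type and learning rate below are hypothetical): a generic optimizer can walk a strongly-typed model's parameters via key paths, so per-access overhead multiplies across every parameter and every training step.

```swift
// Hypothetical strongly-typed model; real models have many more
// parameters, which is why traversal cost adds up.
struct Model {
    var weight: Double
    var bias: Double
}

// Introspection: the parameters to optimize, expressed as key paths.
let parameters: [WritableKeyPath<Model, Double>] = [\Model.weight, \Model.bias]

var model = Model(weight: 1.0, bias: 0.5)
let gradients = Model(weight: 0.2, bias: -0.1)

// A single gradient-descent step traverses every key path twice.
for kp in parameters {
    model[keyPath: kp] -= 0.1 * gradients[keyPath: kp]
}
// model.weight ≈ 0.98, model.bias ≈ 0.51
```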

Outside of this roadmap, the company I work for is gearing up for a beta launch of a large suite of Swift-heavy products built on the concept of generalized autonomy. These products depend on differentiable Swift to enable gradient-descent-based physics models for use in control and optimization of automation systems. Over the next year, we plan on a staged release into general availability where Swift will be powering the automation of many large and small buildings.


Thanks! We may need to distill this down, but at the very least I can link to it, and this is very useful to know.


The current text is as follows; the edits are mostly to remove uses of "we" (a little vague/ambiguous in context) and to start with a short imperative sentence, which is the style we're using for the other bullets. Let me know if you want any changes.

Differentiable Swift

  • Improve robustness by fixing issues in differentiable Swift that impact production applications as they are encountered. Fewer and fewer of these issues are being observed over time, but there are still some known issues (many with simple reproducers) in the issue tracker.
  • Significantly improve the host-side performance of compiled code using differentiable Swift. Nominally, the compiler-generated backwards pass through Swift functions should be as fast as (or only slightly slower than) the original version (forward pass) of those functions. At present, the compiler-generated backwards pass is orders of magnitude slower in many cases; there are some planned optimizations over the next year that should make the backwards pass much faster (one example).
  • Implement performance improvements to key paths. While this is not strictly a part of differentiable Swift, when optimizing strongly typed models in Swift, key paths become extremely important for introspection of these models. Fast key path traversal is vital for many uses of strongly-typed models, so the hope is to upstream performance improvements in this area. As a first step, there’s been an effort to add a robust set of key path benchmarks to the compiler suite.

That looks great to me, thanks!


As a technical update, work has continued on some of the performance and robustness objectives laid out above. For those interested, I can summarize some of the pull requests that have gone in recently:

On the performance front, @asl polished and landed a pull request originally developed by @dan-zheng to enable peephole optimizations that unblock optimizations of simple calls to gradient(at:of:). For simple differentiable functions (calculations involving scalar types, no control flow, etc.) in optimized builds, this can lead to a backwards pass that's as fast as the forward pass. Prior to these optimizations, the backwards pass for these functions could be up to 130 times slower, so this is a huge improvement for these simple cases.
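As a rough sketch of the kind of call those peephole optimizations target (the function below is illustrative): scalar math, no control flow, and a direct `gradient(at:of:)` invocation.

```swift
import _Differentiation

// A "simple" differentiable function: scalar arithmetic only,
// no control flow.
@differentiable(reverse)
func polynomial(_ x: Double) -> Double {
    3 * x * x + 2 * x + 1
}

// With the peephole optimizations, the backwards pass behind this
// call can be as fast as evaluating `polynomial` itself.
let slope = gradient(at: 4.0, of: polynomial)  // 6 * 4 + 2 == 26
```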

My coworker Martin Cwikla upstreamed the rigorous keypath benchmarks I'd mentioned before. As an initial optimization, he has a pull request open now that precomputes the offset from the root to the value referenced by the keypath for structs with trivially-typed internals. This yields significant improvements in several keypath benchmarks, ranging from 13× to 64×. More work remains, but early results are encouraging.

In terms of making differentiable Swift more robust, @asl improved the system for custom derivative lookups, so that custom derivatives for functions defined in other modules can now be declared and then used in the same module (along with fixes for other cases). He also identified and fixed a source of segmentation faults around differentiable functions, which allowed a series of tests to be re-enabled.

Work is ongoing; just thought it was worth providing an update for anyone following along.


Hi @Brad_Larson thanks so much for your work here and for the updates you've been posting. I read this thread earlier with a lot of enthusiasm and have been wondering how to sink my teeth into it properly (noting that this would be the first ML training project I'd be undertaking myself).

In particular, I'm wondering whether there are any plans to implement the Layer and optimiser APIs from the TensorFlow package? Or if you have alternative suggestions? I'm aware that differentiable Swift is supposed to be bigger than "just" neural nets, but I'm finding it very difficult to get my head around the implications of that without any higher-level APIs to play around with first.

What I'd really like to do is recreate a CNN we originally trained using TF 'proper', and deploy it to devices (iOS, Wasm, Android) with the help of import _Differentiation. With that, I'm hoping we could:

  1. Remove TFLite from our stack (provided the performance is the same or better)
  2. More easily allow on-device training
  3. Further improve our cross-platform story

Does that seem realistic to you, given that our models do run on the CPU today? Is the language feature at a stage where we can expect to match the performance of the likes of TFLite and its highly-optimised XNNPack CPU delegate? Is it even a good use case?

edit: I should add that I'm not at all averse to getting my hands dirty, going off the beaten track, etc. Mostly curious whether it's even in the ballpark of being worth it at this point. And whether there are standardised solutions to, e.g. dense layers, activation functions, convolutions, etc., that I should be aware of (or whether I should just try to copy/paste some TensorFlow code initially).


Sorry for the slow reply, was on the road for a bit. Glad to hear about the interest.

There are maybe different concerns when talking about the performance and robustness of differentiable Swift as a language feature itself, versus higher-level frameworks built upon it. A higher-level framework oriented around a Tensor-like type that dispatches to an accelerator or has optimized vector operations might have different performance bottlenecks and may only use a subset of the language features differentiable Swift provides. The overhead that we're trying to eliminate at the language level in the backwards pass of compiled differentiable Swift functions might be trivial compared to that involved in the preparation and dispatch of big parallel operations.

For traditional CNNs, your performance will probably be determined by how well data can be batched for parallel operations, how well available hardware can be used, whether operations can be fused, whether memory can be re-used, and so on. Established frameworks have put a lot of time and effort into those parts of the process, and they can be hard to beat in their sweet spot of traditional neural networks.

Differentiable Swift shines in the situations that need a fast host language, tremendous flexibility, and / or the ability to blur the lines between application logic and machine learning code. If a custom operation would let you perform a calculation much faster, you can simply write one in Swift instead of waiting for a framework's authors to build and deploy one. We've seen instances on mobile where that was a huge win over existing solutions. At PassiveLogic, differentiable Swift is absolutely vital to getting great performance at the edge for our physics-based simulations, where it is ~117X faster in one internal comparison vs. PyTorch for a representative simulation, and 189X faster than TensorFlow for the same. Our code seamlessly mixes application logic in Swift with optimization processes.
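To make the "write your own custom operation" point concrete, here's a hedged sketch (the operation and its pullback are invented for illustration, not code from any of the projects mentioned):

```swift
import _Differentiation

// A custom activation written directly in Swift, with a
// hand-specified derivative registered via @derivative(of:).
func clampedLinear(_ x: Double) -> Double {
    min(max(x, 0), 6)
}

@derivative(of: clampedLinear)
func clampedLinearVJP(_ x: Double)
    -> (value: Double, pullback: (Double) -> Double) {
    (value: clampedLinear(x),
     pullback: { v in (x > 0 && x < 6) ? v : 0 })
}

// Usable immediately in differentiated code; no framework release needed.
let g = gradient(at: 2.0, of: clampedLinear)  // 1.0 inside the linear region
```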

Also, while differentiable Swift currently can be used on iOS, Android, and SwiftWASM (I've deployed to all three), support is what I would call "highly experimental" on those platforms and may not pass your comfort threshold for everyday production use. It's certainly fun to play with there, though.

That's a bit of a roundabout way of saying that I'm not sure whether replacing your existing TensorFlow or TFLite CNN model with something built in a framework layered on differentiable Swift would justify the rewrite work today. There are certainly cases where we've seen large performance advantages by building something in Swift that is outside of the comfort zone of existing ML frameworks, but that may not be your situation from what you describe.


Thanks a lot for your reply and for the great practical-level info @Brad_Larson, much appreciated! And I've taken your disclaimer to heart.

Nevertheless, I believe it's worth a shot – above all for our Wasm platform – in order to reduce overheads and dependency complexity, and because a successful deployment there could mean further unifying our codepaths across all of our platforms.


For clarity (and sorry if this was already covered or implied and I missed it), does your implementation still use the Accelerate framework or similar, or is everything hand-written? Are there internal benchmarks when leveraging Accelerate or similar?

And thank you to you and your team for all the incredible work here :slight_smile:

I can't speak for Brad, but I would guess that he means "program differentiation", i.e. differentiating the result of a simulator, etc.

BTW, I am also trying to bring the SwiftFusion project back to life by stripping out the TensorFlow Swift dependency. We beat (by a small threshold) a C++ impl for a sparse graph optimization workload without any matrix libraries, and that was before the S4TF project shutdown... and I heard from Brad that they got a lot more speedups in the two years since :)


Sorry, I could have been clearer. Though I too can't speak for him, I think it's clear from what he said that autodiff/program differentiation is at work; my understanding, though, is that one could still use Accelerate/SIMD/what-have-you for basic operations even with autodiff. By "hand-written", I meant not using any hardware acceleration at all: every basic operator is either written by hand or comes from some standard library (i.e. not hardware accelerated), and is then auto-differentiated. That's what I'm curious about, though maybe that piece is also implied here.

We beat (by a small threshold) a C++ impl for a sparse graph optimization workload without any matrix libraries.

That's really promising :slight_smile:

Sorry for the slow reply, got behind after some travel.

For our current simulator code, we're targeting aarch64 Linux devices and doing development / testing on macOS. Therefore, we can't rely on a platform-specific framework like Accelerate. We do use Swift SIMD types where it makes sense, but there's no explicit underlying framework. It's straightforward Swift code that we're differentiating through for simulation and control path optimization. Our physics and control teams write the natural Swift code that describes the equations at work, and language-integrated automatic differentiation handles the rest.
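As a toy example of that pattern (the physics and constants here are invented, not our production code): a plain Swift equation, differentiated directly for use in control or calibration.

```swift
import _Differentiation

// Plain Swift expressing a physical relationship: the potential
// energy stored in a spring.
@differentiable(reverse)
func springEnergy(stiffness k: Double, displacement x: Double) -> Double {
    0.5 * k * x * x
}

// The restoring force is the negative gradient of energy with
// respect to displacement; the compiler derives it for us.
let force = -gradient(at: 0.2) { x in
    springEnergy(stiffness: 100, displacement: x)
}
// force ≈ -k * x == -20
```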

That's not to say that we wouldn't start layering on top of another framework to simplify dispatch to accelerators at some point, just that we're not using an existing one right now.


Incredible, thanks.

It's been a few months since my last update, but a lot has still been going on around differentiable Swift on the compiler side. In line with our goals for differentiable Swift in 2023, many patches have been upstreamed to fix issues with and improve performance of differentiable Swift code. These include:

I have to thank @asl for almost all of the effort on the fixes above; he did phenomenal work over the last few months. I may have also mentioned it before, but my coworker Martin was able to upstream his optimizations for keypath access, as well as some new keypath-related benchmarks, covering an area of Swift that we're also heavy consumers of.

Differentiable Swift remains an incredibly important area of the Swift language for my team at PassiveLogic. We've been growing a team of compiler engineers to work on and upstream improvements for this and other areas of Swift that are key to our autonomous control systems. You should start seeing some new names on pull requests, and we're excited with how Swift is evolving. In particular, I think the intersection of Swift macros and differentiability presents opportunities to simplify code and add powerful new capabilities.


This is very interesting, and I would like to hear more about it (maybe this could make an interesting Swift blog entry?). I know autodiff from machine learning, and Swift is not the obvious choice as a machine learning tool (to put it mildly), so it is interesting that autodiff in Swift is still useful.

We've written a few posts about differentiable Swift itself and why it's neat, but we definitely could elaborate on the specific use cases that we find so exciting. I go into this in one of my posts above, but differentiable Swift really shines for machine learning and optimization applications that are not well served by existing frameworks. It definitely can be used for traditional neural networks (we did quite a bit of that in Swift for TensorFlow), but differentiable Swift uniquely enables certain classes of problems. It also lets you merge production and ML / optimization code such that you don't see where one begins and the other ends.

But yeah, I am behind in putting together a public repository of differentiable Swift examples that demonstrate use cases, following the excellent example of the Swift macros sample repository.


First I must say, well done Brad and the PL compiler team! The progress is so awesome.

Very good question. The reason PassiveLogic is investing so heavily in the differentiable Swift compiler is that we are very focused on edge-based generative autonomous systems. Swift has a unique set of qualities that don't exist in any other language or framework:

  1. Industrial systems AI. How do we merge systems programming and AI into a scalable solution? Nobody would claim you're going to build an industrial system, application code, or system frameworks out of Python. We need a modern compiled systems language to do that. That means Swift or Rust. And Swift is the only language with a serious effort to build generalized differentiable computing compiler support at the moment.

  2. Edge-based inferencing AND training. There are many applications where backroom training and edge-based inferencing won't get us to where we need to go. We are particularly interested in systems that train themselves at the edge.

  3. Post-deep-learning AI. Current frameworks have a very narrow POV. If you don't fit into a tensor-shaped homogeneous MATMUL matrix, your problem might be SOL. We are developing a generative platform for autonomous systems. This work is not going to happen with just conventional deep learning frameworks.

  4. Speed. Did I say it was freaking fast? We don't need to worry about the dispatch limitations of TensorFlow, PyTorch, or JAX. Right now differentiable Swift is between two and three orders of magnitude faster at solving our problem sets than those frameworks. Again, this is crucial at the edge.

Hopefully that helps!



This is a super interesting project, and I definitely appreciate all the great work happening here, although I am not very familiar with all the details of differentiable programming given how much has happened in the meantime. I know my post is slightly off topic, but please hear me out.

Back in late 2020, I came across the original Google work on Swift for TensorFlow and started experimenting with applying Swift protocols to computational causality. When the Google project was archived in 2022, it wasn't clear to me whether differentiable programming had a future in Swift, so eventually I ported my project to Rust, which isn't as elegant as Swift, but traits in Rust are similar to protocols and support default implementations.

The original Swift Jupyter notebook with the basic idea is still in the repo.

Meanwhile, things have evolved quite a bit towards hypergeometric protocol-based causality in Rust, so I just want to share my humble project here because, back in 2020, everything started with protocol-based differentiable programming.

You never know who follows your traces when you walk in the sand. Thank you all.