Hi @philipturner, I assure you I'm quite familiar with all the technology you mention. :)
There isn't yet a formal working group for Differentiable Swift, or for ML-in-Swift efforts in general, but I think this thread is a reasonable place to ask this question. We're putting together a project-wide "roadmap" blog post looking forward to the next year of Swift development, and as part of that we're soliciting 2–4 paragraphs from a bunch of different corners of the project about the major things we're hoping to achieve in this timeframe. If the people here are willing to contribute that (say, by September 6th), I'd be happy to include it in the post.
It should be focused mostly on deliverables in official Swift projects, although briefly covering / linking to related projects may be fine.
Hey, John! Thanks for the opportunity, glad to hear about the interest. I can only speak for the group of people I work with, but we believe the next year will be an important one for differentiable Swift and related language features. We plan on investing in upstreaming improvements to the Swift language in these core areas:
Robustness: We will focus on fixing issues in differentiable Swift that impact production applications as we encounter them, but we are observing fewer of those over time. There are several other known issues (many with simple reproducers) present in Swift GitHub issues that can also be addressed.
Automatic differentiation performance: We'd like to significantly improve the host-side performance of compiled code using differentiable Swift. Nominally, the compiler-generated backwards pass through arbitrary Swift functions should be as fast (or only slightly slower) than the original version (forward pass) of those functions. At present, the compiler-generated backwards pass is orders of magnitude slower in many cases, so we have optimizations planned over the next year to make the backwards pass much faster (one example). We'd like to see if we can make Swift automatic differentiation as fast as, or even faster than, alternative implementations like Enzyme.
Swift key path performance: While not strictly a part of differentiable Swift, when optimizing strongly typed models in Swift, key paths become extremely important for introspection of these models. Fast key path traversal is vital for many uses of strongly-typed models, so we are hoping to upstream performance improvements in this area. As a first step, we've been working to add a robust set of key path benchmarks to the compiler suite.
Outside of this roadmap, the company I work for is gearing up for a beta launch of a large suite of Swift-heavy products built on the concept of generalized autonomy. These products depend on differentiable Swift to enable gradient-descent-based physics models for use in control and optimization of automation systems. Over the next year, we plan on a staged release into general availability where Swift will be powering the automation of many large and small buildings.
Thanks! We may need to distill this down, but at the very least I can link to it, and this is very useful to know.
The current text is as follows; the edits are mostly to remove uses of "we" (a little vague/ambiguous in context) and to start with a short imperative sentence, which is the style we're using for the other bullets. Let me know if you want any changes.
- Improve robustness by fixing issues in differentiable Swift that impact production applications as they are encountered. Fewer and fewer of these issues are being observed over time, but there are still some known issues (many with simple reproducers) in the issue tracker.
- Significantly improve the host-side performance of compiled code using differentiable Swift. Nominally, the compiler-generated backwards pass through Swift functions should be as fast (or only slightly slower) than the original version (forward pass) of those functions. At present, the compiler-generated backwards pass is orders of magnitude slower in many cases; there are some planned optimizations over the next year that should make the backwards pass much faster (one example).
- Implement performance improvements to key paths. While this is not strictly a part of differentiable Swift, when optimizing strongly typed models in Swift, key paths become extremely important for introspection of these models. Fast key path traversal is vital for many uses of strongly-typed models, so the hope is to upstream performance improvements in this area. As a first step, there’s been an effort to add a robust set of key path benchmarks to the compiler suite.
That looks great to me, thanks!
As a technical update, work has continued on some of the performance and robustness objectives laid out above. For those interested, I can summarize some of the pull requests that have gone in recently:
On the performance front, @asl polished and landed a pull request originally developed by @dan-zheng to enable peephole optimizations that unblock optimizations of simple calls to
gradient(at:of:). For simple differentiable functions (calculations involving scalar types, no control flow, etc.) in optimized builds, this can lead to a backwards pass that's as fast as the forward pass. Prior to these optimizations, the backwards pass for these functions could be up to 130 times slower, so this is a huge improvement for these simple cases.
My coworker Martin Cwikla upstreamed the rigorous keypath benchmarks I'd mentioned before. As an initial optimization, he has a pull request open now that precomputes the offset from the root to the value referenced by the keypath for structs with trivially-typed internals. This yields significant improvements in several keypath benchmarks, ranging from 13 - 64X. More work remains, but early results are encouraging.
In terms of making differentiable Swift more robust, @asl improved the system for custom derivative lookups, which allowed custom derivatives for functions previously defined in other modules to be defined and then used in the same module (as well as fixing other cases). He also identified and fixed a source of segmentation faults around differentiable functions, which allowed a series of tests to be re-enabled.
Work is ongoing, just though it was worth providing an update for anyone following along.
Hi @Brad_Larson thanks so much for your work here and for the updates you've been posting. I read this thread earlier with a lot of enthusiasm and have been wondering how to sink my teeth into it properly (noting that this would be the first ML training project I'd be undertaking myself).
In particular, I'm wondering whether there are any plans to implement the
Layer and optimiser APIs from the
TensorFlow package? Or if you have alternative suggestions? I'm aware that differentiable Swift is supposed to be bigger than "just" neural nets, but I'm finding it very difficult to get my head around the implications of that without any higher-level APIs to play around with first.
What I'd really like to do is recreate a CNN we originally trained using TF 'proper', and deploy it to devices (iOS, Wasm, Android) with the help of
import _Differentiation. With that, I'm hoping we could:
- Remove TFLite from our stack (provided the performance is same or better)
- More easily allow on-device training
- Further improve our cross-platform story
Does that seem realistic to you, given that our models do run on the CPU today? Is the language feature at a stage where we can expect to match performance of the likes of TFLite and its highly-optimised XNNPack CPU delegate? Is it even a good use case?
edit: I should add that I'm not at all averse to getting my hands dirty, going off the beaten track, etc. Mostly curious whether it's even in the ballpark of being worth it at this point. And whether there are standardised solutions to, e.g. dense layers, activation functions, convolutions, etc., that I should be aware of (or whether I should just try to copy/paste some
TensorFlow code initially).
Sorry for the slow reply, was on the road for a bit. Glad to hear about the interest.
There are maybe different concerns when talking about the performance and robustness of differentiable Swift as a language feature itself, versus higher-level frameworks built upon it. A higher-level framework oriented around a Tensor-like type that dispatches to an accelerator or has optimized vector operations might have different performance bottlenecks and may only use a subset of the language features differentiable Swift provides. The overhead that we're trying to eliminate at the language level in the backwards pass of compiled differentiable Swift functions might be trivial compared to that involved in the preparation and dispatch of big parallel operations.
For traditional CNNs, your performance will probably be determined by how well data can be batched for parallel operations, how well available hardware can be used, can operations be fused, can memory be re-used, etc. Established frameworks have put a lot of time and effort into those parts of the process, and they can be hard to beat for their sweet spot of traditional neural networks.
Differentiable Swift shines in the situations that need a fast host language, tremendous flexibility, and / or the ability to blur the lines between application logic and machine learning code. If a custom operation would let you perform a calculation much faster, you can simply write one in Swift instead of waiting for a framework's authors to build and deploy one. We've seen instances on mobile where that was a huge win over existing solutions. At PassiveLogic, differentiable Swift is absolutely vital to getting great performance at the edge for our physics-based simulations, where it is ~117X faster in one internal comparison vs. PyTorch for a representative simulation, and 189X faster than TensorFlow for the same. Our code seamlessly mixes application logic in Swift with optimization processes.
Also, while differentiable Swift currently can be used on iOS, Android, and SwiftWASM (I've deployed to all three), support is what I would call "highly experimental" on those platforms and may not pass your comfort threshold for everyday production use. It's certainly fun to play with there, though.
That's a bit of a roundabout way of saying that I'm not sure whether replacing your existing TensorFlow or TFLite CNN model with something built in a framework layered on differentiable Swift would justify the rewrite work today. There are certainly cases where we've seen large performance advantages by building something in Swift that is outside of the comfort zone of existing ML frameworks, but that may not be your situation from what you describe.
Thanks a lot for your reply and for the great practical-level info @Brad_Larson, much appreciated! And I've taken your disclaimer to heart.
Nevertheless, I believe it's worth a shot – above all for our Wasm platform – in order to reduce overheads and dependency complexity, and because a successful deployment there could mean further unifying our codepaths across all of our platforms.
For clarity (and sorry if this was already covered or implied and I missed this) does your implementation use Accelerate framework or similar still, or is everything hand written? Are there internal benchmarks when leveraging Accelerate or similar?
And thank you to you and your team for all the incredible work here
Cannot speak for Brad, but I would imply that he means "program differentiation" i.e. differentiating the result of a simulator, etc.
BTW, I am also trying to bring back the SwiftFusion project back to life, by stripping out the TensorFlow Swift dependency. We beat (by a small threshold) a C++ impl for a sparse graph optimization workload without any matrix libraries, and that is before the S4TF project shutdown...and I heard from Brad that they got a lot more speedups in the two years passed :)
Sorry, I could have been more clear. Though I too can't speak for him, I think it's clear from what he said that autodiff/program differentiation is at work, but my understanding is one could still use Accelerate/SIMD/what-have-you for basic operations even with autodiff. By hand written, I meant not using any acceleration at all and all basic operators are either written by hand (and then auto-differentiated) or use some std lib (i.e. not hardware accelerated at all) (and then auto-differentiated). That's what I'm curious about, though maybe that piece is also implied here.
We beat (by a small threshold) a C++ impl for a sparse graph optimization workload without any matrix libraries.
That's really promising
Sorry for the slow reply, got behind after some travel.
For our current simulator code, we're targeting aarch64 Linux devices and doing development / testing on macOS. Therefore, we can't rely on a platform-specific framework like Accelerate. We do use Swift SIMD types where it makes sense, but there's no explicit underlying framework. It's straightforward Swift code that we're differentiating through for simulation and control path optimization. Our physics and control teams write the natural Swift code that describes the equations at work, and language-integrated automatic differentiation handles the rest.
That's not to say that we wouldn't start layering on top of another framework to simplify dispatch to accelerators at some point, just that we're not using an existing one right now.