Differentiable programming for gradient-based machine learning

I'm trying to figure out whether there's a problem with MPS that I can solve in the S4TF backend. The GitHub repository may not have compared the two GPUs fairly (other factors could have been affecting performance without the authors realizing it), which makes it difficult for me to draw conclusions for optimization purposes.

@philipturner Would you mind moving this discussion to the Related Projects forum? We'd like to keep this thread focused on autodiff. Thanks!

We are working on that right now. Stable autodiff is a requirement before we take any action to resurrect S4TF. There are plenty of bugs, but they should be fixed in a month or two once I start devoting time to the effort.

We (Apple) do not yet have a timeline for autodiff to share, but if members from the community like @philipturner would like to help push forward the implementation, I'm more than happy to provide guidance. Tackling bugs that are blocking third-party clients would be a great starting point, but note that in order for autodiff to be eligible for Swift Evolution, we still need to:

  • Complete the ABI [0]
  • Fix any issues when compiling with library evolution
  • Reorganize the _Differentiation module [1]
  • Support differentiating additional language constructs such as _modify [2] and @_alwaysEmitIntoClient functions [3]
  • Support specifying availability on differentiability [4]
  • Most importantly, discover more major use cases

@rxwei from what @BradLarson told me, it's fine to proceed with work on S4TF once at least VJPs work 100% reliably. However, will we also need JVPs and linear/transpose maps to get it fully merged, as @machineko is asking about?

I don't want to go into discussing clients outside of the Swift open source project here, but for autodiff itself to be eligible for Swift Evolution, we do not need to implement the transposition or forward-mode differentiation described in the manifesto. The initial implementation should be in line with the proposal in this thread: reverse-mode differentiation (based on VJPs), which we have already implemented today.
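For readers unfamiliar with the terminology, the VJP concept can be sketched by hand in plain Swift, without a differentiation-enabled toolchain. The function names here (`squareVJP`, `sinVJP`) are purely illustrative, not part of any API; with autodiff enabled, the compiler synthesizes equivalent value-and-pullback pairs for `@differentiable` functions.

```swift
import Foundation

// A hand-written VJP (vector-Jacobian product) for f(x) = x * x:
// the value plus a pullback closure mapping a cotangent at the
// output to a cotangent at the input.
func squareVJP(_ x: Double) -> (value: Double, pullback: (Double) -> Double) {
    (x * x, { v in 2 * x * v })
}

func sinVJP(_ x: Double) -> (value: Double, pullback: (Double) -> Double) {
    (sin(x), { v in cos(x) * v })
}

// Reverse-mode differentiation of sin(x * x): run the forward
// pass, then chain the pullbacks in reverse order, seeded with 1.
let x = 0.5
let (y1, pb1) = squareVJP(x)   // y1 = x^2
let (y2, pb2) = sinVJP(y1)     // y2 = sin(x^2)
let dydx = pb1(pb2(1))         // = cos(x^2) * 2x
print(y2, dydx)
```

This composition of pullbacks is exactly what the reverse-mode transform automates across an entire program.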


For more major use cases, I think we should look at physics. My AR demo was about that.

I spent two months in summer 2020 creating a simulation called MultiPendulum. It required solving a system of linear equations, then taking the derivative of the solution operation. It was very complex, ending up with O(n^4) algorithmic complexity and a lot of hand-tuned optimizations; it took days to come up with that algorithm.

That specific derivative example isn't something Swift autodiff could have optimized on its own, but it shows how ingrained derivatives are in physics simulations. The O(n^4) part was finding partial derivatives of momentum with respect to all the other pendulums. I was also using adaptive-step Runge-Kutta, where momentum and position were treated as distinct entities, with their derivatives used to find their change at each time step. There was also an error-correction function computed from a ratio of differentials of the Hamiltonian and another quantity.
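To make the shape of that concrete: here is a minimal fixed-step RK4 integrator for a single pendulum, with the Hamiltonian derivatives written by hand. This is only a sketch for illustration, not MultiPendulum's adaptive O(n^4) solver; in an autodiff world, the `dynamics` derivatives could be generated from the Hamiltonian instead of derived manually.

```swift
import Foundation

// State of a single pendulum: angle and conjugate momentum.
struct State { var theta: Double; var p: Double }

let g = 9.81, l = 1.0, m = 1.0

// Hamilton's equations, with hand-derived partial derivatives of
// H = p^2 / (2 m l^2) + m g l (1 - cos theta):
//   dtheta/dt =  dH/dp = p / (m l^2)
//   dp/dt     = -dH/dtheta = -m g l sin(theta)
func dynamics(_ s: State) -> State {
    State(theta: s.p / (m * l * l), p: -m * g * l * sin(s.theta))
}

// One classic fourth-order Runge-Kutta step (fixed step size).
func rk4Step(_ s: State, dt: Double) -> State {
    func add(_ a: State, _ b: State, _ c: Double) -> State {
        State(theta: a.theta + c * b.theta, p: a.p + c * b.p)
    }
    let k1 = dynamics(s)
    let k2 = dynamics(add(s, k1, dt / 2))
    let k3 = dynamics(add(s, k2, dt / 2))
    let k4 = dynamics(add(s, k3, dt))
    let combined = State(
        theta: k1.theta + 2 * k2.theta + 2 * k3.theta + k4.theta,
        p: k1.p + 2 * k2.p + 2 * k3.p + k4.p)
    return add(s, combined, dt / 6)
}

// Integrate for one second from a 0.3 rad release at rest;
// the amplitude stays bounded by the initial energy.
var s = State(theta: 0.3, p: 0)
for _ in 0..<1000 { s = rk4Step(s, dt: 0.001) }
print(s.theta, s.p)
```

Every derivative in this sketch was written by hand; that is the kind of mechanical work a differentiable language could take over.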

This post is very long, but I just mentioned three unique examples of derivatives in my simulation. If I could rebuild some of the higher-level components of the physics-simulation part of AR MultiPendulum and push an update to the App Store, that would constitute a new application. We could also apply autodiff to an entirely different physics simulation, as there are many examples out there to choose from.


Very cool feature... vs very niche use...

I wish there were a formula we could use to formally assess features, e.g. "a feature of coolness X that requires added language complexity Y and will be used in Z percent of real-world apps; combine X, Y, and Z into an overall mark N, and check N against a threshold T. If N is bigger, the feature should be included in the language."

One possible way forward is to release the cut-down feature in library form (whatever is expressible without language support), ship it, and see how well it is received. Then, based on feedback and frequency of usage, make an executive decision on whether it's worth including as a language feature.

As a question from a potential user: Does the “no timeline to share” mean that things will move slowly in the worst case, and we don’t have to fear that autodiff features are going to break when other Swift features evolve?

The background is that I am currently evaluating for my research lab whether we should bet on the unique selling point of "autodiff for a compiled language" for some of our projects. We could probably live with the current rough state for some time, but breakage of features would be fatal. To some extent, this is a chicken-and-egg problem: we could potentially provide some visible use cases, but I am uncertain whether I should really invest at this time.

@binderh could you give more specific detail on which projects you think autodiff could be used for?


In machine learning, combining neural networks with models that impose some structure, such as differential equations, is currently a big topic. Among many other application domains, this is very useful in biomedicine for incorporating knowledge about the processes being modeled. Yet differential equations are only one example of a suitable model class, and many others will probably become available in the next few years.

One limitation on more flexibly incorporating model classes into neural networks is that specifying them in PyTorch can be challenging; a real differentiable programming language is needed. In addition, other frameworks typically require that estimation work without updating array elements (see, e.g., the "mutating arrays not supported" topic in the Julia/Zygote community, and the corresponding issue with JAX). This is a limitation that Swift autodiff doesn't seem to have.

Thus a much broader class of models would potentially be feasible with Swift, and we have several biomedical examples (ranging from single-cell data to large clinical registries) where this could be demonstrated. In addition, implementations in Swift can be compiled into libraries that can easily be deployed into R, Python, Julia, or even web/WASM-based workflows.
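The "mutating arrays" point can be made concrete with a hand-written pullback for an in-place array update, the kind of code Zygote and JAX reject but that Swift autodiff is designed to differentiate through via `inout` support. The helper name `scaleElementVJP` is illustrative, not any framework's API, and the pullback shown is with respect to the array only.

```swift
// Hand-written value-and-pullback for an in-place update
// xs[i] *= c. Frameworks that forbid mutation force you to rebuild
// the whole array instead; Swift autodiff aims to handle the
// mutating form directly.
func scaleElementVJP(
    _ xs: [Double], at i: Int, by c: Double
) -> (value: [Double], pullback: ([Double]) -> [Double]) {
    var out = xs
    out[i] *= c            // the in-place mutation being differentiated
    let pullback: ([Double]) -> [Double] = { cotangent in
        var grad = cotangent
        grad[i] *= c       // chain rule through the scaled element
        return grad
    }
    return (out, pullback)
}

let (ys, pb) = scaleElementVJP([1.0, 2.0, 3.0], at: 1, by: 10)
print(ys)            // [1.0, 20.0, 3.0]
print(pb([1, 1, 1])) // [1.0, 10.0, 1.0]
```

In a structured model (e.g. an ODE solver updating state in place), this pattern repeats at every step, which is why first-class mutation support matters.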


That is definitely an important use case, and it could help Swift autodiff/S4TF stand out from other languages/frameworks. @rxwei we now have complex physics simulations and biomedicine as additional major use cases for autodiff.


@binderh have you checked out my thread on enabling differentiation in Swift release toolchains? Swift for TensorFlow Resurrection: Differentiation running on iOS



I have great respect for your tireless efforts in this area.

Don't take this as discouragement; it's just the humble opinion of one member of this list.

I have no idea how others feel about this. My view on this and other features like it:

  • Swift is already a quite large and complicated language.
  • We can't add everything to Swift to satisfy everyone.
  • There are costs to every new language feature (compiler complexity, app complexity, user education, etc.).
  • Swift should stay a general-purpose language without extending into niche domains and areas that would only be used in, say, 0.1% of all apps.
  • Can a feature be done in library form (even if that's not totally perfect compared to being built into the language itself)? If yes, it should stay in the form of a separate library.

All IMHO above, and I'm happy to be proven wrong.

To be clear, the current stance of the Swift project towards this feature is still what Ted laid out in his post from 2019. There's technical progress that needs to be made, and then we can evaluate adding this to the language. Philip's work is welcome, but this is a large feature that will be ready when it's ready, and it is not on the roadmap for any particular release.

There probably isn't a whole lot of point to discussing this feature in the abstract right now. Implementation progress and discussion should really go into a dedicated thread in Development, and nothing is happening on the Evolution front.

@tera, you've made your reservations about adding this feature clear, repeatedly. I think it's best for you to just ignore this thread now. If the feature makes technical progress and comes up for review, you will have an opportunity to give more feedback then.


Counterexample: half of every language feature in C++ that got pushed into the library. Move semantics, discriminated unions, SFINAE (concepts mostly fixed this one), std::initializer_list, etc.

I'm going to say something that isn't particularly relevant to the discussion occurring on this thread. Thus, we should make a new thread under Development to discuss it further. It would be helpful to plan for its implementation now instead of waiting until this SE proposal is implemented into release toolchains.

Pitch for a new thread

Making Array, Optional, Float, etc. differentiable is not part of this SE proposal, but I imagine it will come in a subsequent one. There is still a lot to decide, such as which convention we'll follow for differentiating functions that have jump discontinuities. I discussed the idea of pretending we're differentiating the raw assembly instructions with @scanon. As for Array, there are many more collection types that could be differentiated. There has been progress on a DifferentiableCollection protocol, which might be possible to merge once its blocking compiler crashers have been fixed. There are also many functions of floating-point types that do not have derivatives, and a gap in their test suite that allows manual derivatives to avoid propagating the pullback. One more thing to discuss is the unofficial Differentiation Swift package and its role in allowing differentiation of the standard library before a release toolchain officially supports it.

With my current skill set, I'm much more inclined to work on the standard library than to help push ABI stability forward. So I would be very interested in having another thread where we could exclusively discuss differentiation of standard-library types. Is this discussion viable at the moment?

I don’t know if there’s a new thread now, but I just wanted everyone to know what @philipturner already read elsewhere:

There might actually be a way to move forward with ML/DL without a special autodiff toolchain. I'm not sure about the performance of my more high-level solution, but I'm going to assess that.

Regarding apples-to-apples comparisons, I would hope that a good library in a good language can identify the hardware it needs and then be written against that hardware. If we're going to assess the performance of a library, we should be using the hardware that the leaders of the field are using. I haven't worked with Swift outside the Apple ecosystem a lot, so I don't know how well that works by now, but I think it will be a requirement here.

It's been a little while, but to follow up on John's suggestion I've created a separate Development thread in an attempt to aggregate information about ongoing implementation work for differentiable Swift. Hopefully that will help to keep the discussions here focused on the pitch itself.