Differentiable Programming Mega-Proposal

Also, having taken a closer look at the proposal, I suggest that the section on Automatic Differentiation be expanded a bit more. I know enough maths to be comfortable with symbolic and numerical differentiation, but I had never heard of AD until it became a topic on these forums, and I suspect that might be the case for many other people, even if they had calculus (or even numerical analysis, optimisation, etc.) in high school or university. To me, that section doesn't yet fully explain how the paradigm works and in what sense it is superior.

3 Likes

It's my understanding that the "Swift for TensorFlow" project aims to solve many of those issues. For example, there is a Swift Jupyter kernel and Google's Colab environment has experimental support for Swift (example), so anybody can use Swift from their browser without even having to install it (even Windows users).

EDIT: Also, one of the TensorFlow GSoC projects was to create a cross-platform graphing library (SwiftPlot) with support for Linux and Windows (and it also supports Jupyter notebooks), so that's another major ecosystem improvement. I think the swift-format tool was also developed by Googlers, no doubt due to the increasing amount of Swift code they're writing. So these things are indeed being worked on in parallel.

TensorFlow is probably the most popular ML framework, and AFAIK there are no plans for similarly excellent integration with other languages (Go, Rust, etc), which together means that Swift would become the very best language for machine-learning applications.

Of course, we can't force anybody to use Swift, but S4TF would offer ML developers a much better experience, designed by the ML experts behind TensorFlow (who surely know better than anybody what their users want/expect), and which can easily be integrated into existing projects (via interop with Python and other dynamic languages).

12 Likes

So, if I may try to summarise the thread so far:

  • Some people have chimed in stating that they see immediate potential applications of this. Nobody is disagreeing that this is useful to some people, particularly to important domains such as machine learning.

    • Conversely, some people have asserted that it isn't broadly useful, but there is no consensus on this assertion (though counter-arguments have tended to be speculative). Little data has been provided by either side.
  • Many folks have asserted that this is necessary to make Swift competitive with Python, but other than the potential to coerce TensorFlow users into Swift - iff this is a dealbreaker for continued TensorFlow development for Swift - this appears to be only speculation.

    • Counter-arguments have been raised to this assertion, including that Python itself doesn't actually support any of this stuff in this manner either, therefore it is clearly not a requirement for success, in machine learning or elsewhere.
      • The counter-counter argument acknowledges this, but posits that it's a "fight fire with earth" strategy of differentiation over Python, not equivalence.
    • It has been acknowledged that there are many other areas in which Swift is notably lacking vs Python, which are considered by some people to be [more] significant reasons Python is more popular. e.g. Swift's:
      • Poor support for non-Apple platforms.
      • Lack of meta-programming, variadic generics, and other more powerful language features.
      • Much less comprehensive and active ecosystem of libraries & tools.
  • A lot of concern around this proposal's premise of language integration seems to centre around:

    • Opportunity cost, premised on a zero sum element whereby supporting this in Swift distracts from or delays support for other language features some people feel are more important.
    • That this establishes a precedent of 'hacking' the language & the compiler case-by-case rather than empowering the language more broadly via meta-programming, hygienic macros, or similar.
    • Potential side-effects of this support on the Swift language & compiler - e.g. more bugs, more complexity in the core language, etc. Thus far nobody from the core compiler / language team seems to have weighed in on this point.
    • Lack of clarity as to why this cannot be done sufficiently if not equivalently in pure[r] library code.
  • There are only a few concerns around the implementation itself (if the premise of language integration is taken as given). e.g. that it doesn't handle inout parameters properly / elegantly as currently drafted.

  • There have been assertions (including from Chris Lattner himself, of particular note) that the proposal's length exaggerates its complexity. Others have similarly suggested, if not requested, that it could & should be pared down into piecemeal and/or fewer elements. Some others have requested more detail, however, such as in explaining the principles of automatic differentiation.

I personally am still not convinced the pitch's proponents have made their case, but I do want to thank everyone who's contributed to this thread so far. Despite it being a bit of a 'warmly heated' topic, I think the discussion has been largely constructive & respectful, and I for one have been learning from it.

33 Likes

Swift has avoided Unix-style letter-dropping-smash-words abbreviations like "autodiff." I think these names match Swift's naming conventions of using well-known terms of art without resorting to abbreviations. I agree that my fingers get a bit lost in these spellings, but I don't expect them to be typed that often, even in heavy-AD code.

@autodiff is definitely the wrong term IMO because it sounds far too much like stdlib's new CollectionDifference types.

7 Likes

Even though I previously made the point that the inclusion of this proposal would drive traffic, I have to agree with your point about the importance of the ecosystem improving jointly. I've recently been attempting to use Swift without the hand-holding of Xcode (both on Mac and Linux) and I have to say that the experience has been pretty lackluster so far. It's better on Linux (although IDE support is pretty much non-existent), but the Mac is awful, and I'm a strong believer that a language/compiler should never be tied to any one IDE. As I understand it, though, there aren't any plans to separate Swift from Xcode. Swift can have endless earth-shattering features, but if it's not a cohesive, painless experience to get started and use them, they are ultimately meaningless. I do believe that adding these features solves the "why" problem for ecosystem and tooling growth though. It could act as a forcing factor for the community to develop more stuff for Swift.

4 Likes

With that being said, I think there is some validity to the idea that adding this feature set will drive ecosystem development, at least anecdotally. After seeing this proposal, I myself have started working on an initial, simple (and very crude) Tensor library that is intended to be a "SwiftTorch"-like API in the future. I have no idea what the end goal is for it, but I think a proof of concept would be nice. I'm also aware that @cgarciae has begun work on a similar project. Both of these will probably never end up being the next PyTorch or NumPy, but the excitement surrounding the possible inclusion of these features has already spurred some basic development. I'm also eyeballing working on a PIL-like image library for easily loading image data into Swift for vision models.

The SourceKit-LSP project from Apple is dedicated to creating a cross-platform experience using the open Language Server Protocol that can be plugged into any IDE that supports it (such as Visual Studio Code).

6 Likes

Yeah! I actually played around with it a few days ago. I couldn't seem to get any useful suggestions/completion out of CLion (seemingly the best non-Xcode IDE when using the Swift plugin) with it, though. ATM that's probably on the CLion side, as the LSP support was only through a third-party plugin. I am happy that there is some movement on this front though.

It's not just that though. Installing Swift on Mac still essentially means installing Xcode (there are other means, like swiftenv, but they are not supported), and this leads to the well-known problem of "due to a recent Xcode update the code doesn't compile anymore" (or even worse, behaves subtly differently than the code running in production).

I don't think I agree that the current lack of proposals (reasonable or otherwise) to add meta-programming support necessarily means that the fundamental issue you raised can't be solved without going "fully dynamic". Would you mind elaborating on this?

There are many in the data analytics world who are running out of headroom with Python.

  1. As a dynamic language without a compiler, it has real limitations managing ever-larger code bases, even for experts.
  2. It is slow, which limits its future in the very space that made it successful.
  3. When we look towards the future of heterogeneous computing (wait, we are already there!), you need to fill the multi-core computation pipes of the CPU, vector unit, GPU, and NPU/TPU, all with one code base managed by the average programmer. You'll need a compiler + language that understands types, how those types form computational proofs, how those proofs form graphs, and how to dispatch those proofs across multiple heterogeneous units.

Swift can be said to be the next step in systems languages (like C/C++), with high-level scripting-like semantics (like Python) and the static guarantees of functional programming (like Haskell). It's an interesting mix. And it enables both this future need and the merging of domains.

I think AD is likely the most useful approach to derivatives for most programmers. I think of the space as having three major options:

  1. Symbolic derivatives. Useful in pure/academic math, but often less useful in dirty real-world problems. And you need to be an expert to model your problems.
  2. Numerical derivatives (finite difference methods). They often require custom solutions for each case, have accuracy and truncation problems, and lead to an approximation of the derivative to a given order.
  3. Computational derivatives (AD). With automatic differentiation you can solve real-world problems with the code you already wrote (sketched below). You don't need to be a calculus expert (because you already wrote the forward code), and you should obtain the exact derivative. It's kind of the best of all worlds for many problems.
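
To make point 3 concrete, here is a minimal forward-mode AD sketch in plain Swift using dual numbers. It is purely illustrative (the Dual type and dualSin are names made up for this post); the proposal's implementation is far more general and works on the functions you already have:

import Foundation

// A tiny forward-mode AD illustration: each Dual carries a value and the
// derivative of that value with respect to the input.
struct Dual {
    var value: Double
    var derivative: Double
}

// Product rule: (f * g)' = f' * g + f * g'
func * (a: Dual, b: Dual) -> Dual {
    return Dual(value: a.value * b.value,
                derivative: a.derivative * b.value + a.value * b.derivative)
}

// Chain rule for sine: (sin ∘ f)' = cos(f) * f'
func dualSin(_ a: Dual) -> Dual {
    return Dual(value: sin(a.value), derivative: cos(a.value) * a.derivative)
}

// Differentiate f(x) = x * sin(x) at x = 2 by seeding dx/dx = 1.
let x = Dual(value: 2, derivative: 1)
let y = x * dualSin(x)
print(y.value, y.derivative)   // f(2) and f'(2) = sin(2) + 2 * cos(2), to machine precision

The derivative falls out of the same arithmetic that computes the value, which is why no symbolic manipulation or step-size tuning is needed.
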
2 Likes

For those who need inspiration, imagine you have a Graph<User> and want to somehow optimize that structure via a gradient-based algorithm. In other languages (including Python) you probably have to convert the problem to TF, PyTorch, JAX, etc. (probably by converting the users into a tensor), solve the problem there, and then convert back to the original representation.

In Swift for TensorFlow right now you can "just" get the gradient of a Graph<User>, which usually has the same structure, and use the key-path-iterable allDifferentiableVariables to update the whole structure. This is amazing! People in computer graphics and simulation have to do this kind of stuff a lot, e.g. optimizing a mesh.
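
Roughly, on a Swift for TensorFlow toolchain, that workflow looks something like the sketch below. The Point type and springEnergy function are made-up names for illustration, and the exact spellings (gradient(at:), move(along:), the synthesized TangentVector) follow the proposal and S4TF as I understand them, so treat this as a sketch rather than gospel:

// A custom structure whose Differentiable conformance is synthesized.
struct Point: Differentiable {
    var x: Float
    var y: Float
}

// An ordinary differentiable function over that structure.
@differentiable
func springEnergy(_ p: Point) -> Float {
    return 0.5 * (p.x * p.x + p.y * p.y)
}

var p = Point(x: 3, y: 4)

// The gradient has the same shape as the structure (its TangentVector),
// so there is no converting to and from tensors.
let grad = gradient(at: p) { springEnergy($0) }

// One gradient-descent step, applied directly to the structure.
p.move(along: Point.TangentVector(x: -0.1 * grad.x, y: -0.1 * grad.y))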

9 Likes

In the Introduction, a type is defined:

struct Perceptron: @memberwise Differentiable {
...
}

with the @memberwise attribute, and that attribute shows up in a few other places as well. I don't see its semantics defined anywhere in this proposal (or this thread). A quick googling doesn't turn anything up, other than a reference to struct's memberwise initializers. Did @memberwise get added to the language while I wasn't looking (which is entirely possible)?

WRT the claim of inefficient code in the section on Symbolic differentiation, is there a source on that? I don't just mean "Mathematica benchmarks" or the like; that section seems (to me, anyway) to imply that symbolic differentiation is inherently slower, not just that whatever implementations the authors are familiar with are slower. If the former is what's being claimed, I'm not aware of any proof. Unless the differentiation is assumed to be occurring at runtime instead of compile-time, at which point you're essentially running that code in an interpreter and that's pretty clearly going to be slower.

Are mega-proposals really bad, though? I rather enjoy them. :grin:

Depends on what you mean by "low-level optimization". If you're referring to faster compile times, then I think that might almost necessarily be true... I can't see how, in the best case, there wouldn't at least be an extra layer of indirection to go through. If you're referring to the performance of the resulting binary, I'm unaware of proof that this is inherently the case (even if it has been historically).

I have doubts that it is necessary...

TL;DR: This proposal seems like another reason to add support for meta-programming, expanding the generics system, etc sooner rather than later. I'd advocate that we add general functionality to the language before tackling more narrowly-focused features (even if they are super cool, which this one absolutely is) because the former will surely inform the designs of the latter, potentially leading to us being stuck with suboptimal designs due to our compatibility requirements (also, I don't know how to evaluate how much of a risk this really is).

Anyway...

[Beginning of what the TL;DR is summarizing]

The Embedded Domain-Specific Languages subsection of Approaches to Automatic Differentiation lists five reasons not to use EDSLs. Two to four of the five (depending on whether you think the first and fourth are subsets of the other reasons) have previously been called out as pain-points in the language, and while I'm reasonably certain that fixing them is not on any "official" roadmaps for Swift, the impression I've gotten is that there's little doubt that reasonable proposals to address them would be accepted. In particular, the second reason says:

(we also synthesize conformance for CaseIterable or whatever it ended up being called). My admittedly fallible recollection is that part of the rationale for accepting at least the first of the proposals to have the compiler synthesize a protocol(s)'s conformance(s) is that it'd just be a stop-gap measure to address a pain-point now while we* work on a general-purpose code synthesis mechanism later (possibly not even restricted to protocol conformances... I really don't remember). Additionally, I think it's fairly reasonable to assume that the last reason not to use the EDSL approach — inadequate diagnostics support — would be addressed in the process of dealing with the other reasons.
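
For anyone who hasn't bumped into them, this is the kind of compiler-synthesized conformance being referred to; it's plain, current Swift, not part of the proposal:

// The compiler synthesizes both conformances here: allCases for CaseIterable
// and == for Equatable, with no user-written implementation.
enum Direction: CaseIterable, Equatable {
    case north, south, east, west
}

print(Direction.allCases.count)             // 4
print(Direction.north == Direction.south)   // false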

Similarly, the main argument against Source code transformation tools seems to be that they're clunky to use. Well, yeah... AFAIK, source-code transformations in the sense that we're talking about them here aren't really part of any widely-used languages. If Swift had support for source code transformers as a language feature, IDE authors would have a compelling reason to find ways to remove said clunkiness and presumably a framework with which to do it (adding support to the LSP, which is itself a rather new thing, seems the obvious place to start since it isn't tied to any particular IDE or language, but I haven't given it much thought). The key point is that it'd need to be an integral part of the language's specification, not an optional or compiler/toolchain-specific thing. Additionally, the proposal itself states in The pursuit for user-defined code transformations:

... so maybe we could explore that? Enough to find out if performant AD based on it is possible, anyway.

Since we have compatibility requirements now, I'm wary of narrower proposals potentially eating into syntaxes which might be needed for more general proposals, especially when a general language feature + a library could be used to give the same functionality as the proposal for a narrower language feature. Speaking of which, does anyone have a view as to whether this proposal's syntax would be a good fit for inclusion in a general meta-programming system? If so, maybe we treat this as another "magic for now but not forever" thing, like the automatic synthesis of conformances for Equatable and such?

[End of what the TL;DR is summarizing]

If the proposal's authors have already looked at adding general meta-programming facilities and then building this proposal on top of them, I haven't seen it mentioned and would love to know their thoughts on the matter (beyond "don't do that", of course... otherwise presumably they'd have done that, and we'd be having a different conversation).

AD is a seriously cool feature! Aside from the concerns raised about this specific implementation of it, I haven't heard any compelling reasons not to do this and many reasons to do it. OTOH, as I sit here writing this, I can't make up my mind between being a reluctant -1 on this proposal due to the compiler magic or a reluctant +1 in spite of it. IMHO, without those concerns, it'd be an obvious and strong +1.

* The "royal" we, I mean. IIRC the task wasn't given to or taken up by anyone in particular.

8 Likes

Minor point: AD has different issues with cancellation than numerical differentiation has, and they are less common, but it still has them. It does not guarantee an exact derivative.
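
As one toy illustration of the kind of thing that can go wrong (not necessarily the exact case being alluded to here): differentiating abs(x) written as sqrt(x * x) at x == 0 runs the mechanical chain rule into 0 / 0.

// Chain rule: d/dx sqrt(u) = u' / (2 * sqrt(u)), with u = x * x and u' = 2 * x.
let x = 0.0
let u = x * x                               // 0
let du = 2 * x                              // 0
let derivative = du / (2 * u.squareRoot())  // 0 / 0
print(derivative)                           // nan, not an exact (or even finite) derivative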

2 Likes

I just can’t shake concerns that it sits at the wrong level of generality.

That’s not a concern about complexity; the proposal is less complex than most type system proposals. It’s also not a concern about utility; this would be a useful feature for many. It’s a concern about generality appropriate to a general-purpose language.

The “function builders” proposal is a useful contrast to this one: motivated by and built for SwiftUI, yet clearly general enough to sit at the language level.

Let’s run with that analogy.

Imagine a Swift without closures. I could imagine, say, a proposal to add “callbacks” as a first-class language feature. Functions may be marked with @acceptsCallback. Such functions may be called with a trailing block of callback code, which is passed to the @acceptsCallback function as, say, a compiler-generated Callback that has a call method … etc.

It’s … almost closures, but it’s not. The structure and terminology keep it narrow in its potential applications. It undeniably has broad applications, yet still feels a bit too specific to be a language-level feature. It seems like it would make a nice library, but needs compiler support.

That’s where this proposal sits for me. Thank goodness we have closures instead! And the proposal itself lays out this concern:

That “system for custom code transformation” is what the proposal makes me wish for — and I wish the proposal didn’t gloss over it. Imagine a system for AST transformation that could make “differentiable” a library, and it’s easy to imagine other exciting uses as well. That seems like what we should be aiming for here in the long term.

All of this isn’t really a -1 on this proposal per se. It’s a request for more forward-looking context in the proposal’s development. If we do bake differentiable programming into the language for now, I hope it fits into some larger future vision. What might such a generic AST transformation proposal look like? Is this work on differentiable programming a step toward it? Would this proposal likely fit sensibly into such a future system, both in syntax and behavior?

There’s a danger here of baking something into the language that seemed compelling in tech’s current moment, but had too many specific design decisions baked in before the motivating trend had matured, and ended up being compatibility baggage the language must maintain long after the problem domain has moved beyond it. Think of Java Swing.

13 Likes

Beyond the fundamental nature of “it’s just math”... I guess I’m somewhat confused by the pushback in light of Swift’s type system architecture. It’s always appeared to me that certain functionality will need access to the compiler to interact with the type system in a way that preserves the architectural goals of the language (the type system also happens to be why first class AD is feasible)

It also appears to me that, given the above, "just make a library" isn't possible, as a library doesn't have any more access to the compiler. The S4TF team seemed to state as much (so why keep saying it should be a library?). It also seems that nobody is sure it could even be achieved through some as-yet-uninvented and theoretical metaprogramming method that presumably may take years, and may not be possible (am I off base on this?).

At the end of the day, for those who can't conceive of using the functionality, how would it even affect you? Or, for that matter, if you never use "@differentiable", how would you even know it's there? How many compiler flags in Clang have you never used? Most of them, right? Have they ever bothered you? Nope, you don't even know they exist.

3 Likes

I should have added this before, but hopefully better late than never...

In spite of the concerns, I do want to say kudos and thanks to the folks who worked on this, for hacking on the language to make it better. However this ends up, sharing that work is awesome and appreciated!

7 Likes

WRT the claim of inefficient code in the section on Symbolic differentiation, is there a source on that? I don't just mean "Mathematica benchmarks" or the like; that section seems (to me, anyway) to imply that symbolic differentiation is inherently slower, not just that whatever implementations the authors are familiar with are slower. If the former is what's being claimed, I'm not aware of any proof.

The problem with symbolic differentiation comes from "expression swelling" and from not handling control flow. Here are a couple of great, publicly available sources about it. I couldn't find an exact proof as to why this is true, but hopefully this clears it up.
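
For a rough sense of what expression swell means (my example, not from the sources above): symbolically differentiating an n-fold product already fans out into n terms, each of which is itself an (n-1)-fold product, and nested compositions compound this growth:

\frac{d}{dx}\bigl(f_1(x) f_2(x) \cdots f_n(x)\bigr) = \sum_{i=1}^{n} f_i'(x) \prod_{j \neq i} f_j(x)

Forward-mode AD, by contrast, just carries a (value, derivative) pair through the original n multiplications, so the work stays proportional to the original program.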

2 Likes

At the very least, we (I) want to be assured that we're not unnecessarily closing off future paths.

Frankly, I think the generic meta-programming and/or function transformation mentioned is a huge undertaking compared to this pitch. It's not a trivial step to get there from here.

2 Likes

TAH2, it seems you may have read my post a bit hastily, perhaps responding to the conversation as a whole without full attention to what you’re responding to in particular. You wrote:

But I would potentially use this functionality, which is why I said

…and please note that I said:

Let me re-emphasize that: I’m not opposing the proposal; I’m asking more of it.

What am I asking? Well, it's quite apparent that, as you wrote,

…at least with the current compiler. (That’s why I quoted the section of the proposal stating as much.)

What I’m saying is that it would be nice to give other potential features similar access to the compiler without them also having to alter the compiler, and we should at least give some speculative due diligence to that future direction, so we can think about how this proposal might grow into it.

We often do this in the development of other proposals, considering whether they make sense in light of a larger future vision, even if we ultimately accept or reject them on their own “current world” merits. We should do it here too.

Please consider what I wrote in the OP (in the last few paragraphs) about the risks of failing to do that “guessing ahead” work now.

2 Likes