I think the "it's just math" argument is actually pretty good. Languages already come with primitives and operators that cover a wide range of mathematics, why should this be any different? It is definitely not domain specific and simply opens up new doors for writing derivative oriented software(ML just happens to fall nicely into that category).
This. While it would be great for Swift to support enough metaprogramming for this feature to be implemented, it doesn't seem like we are anywhere close to that.
In my opinion the application of this is broad enough that it probably deserves its own feature in the compiler. Off the top of my head, I would have loved to have something like this in my robotics class, where we had to do multiple derivatives by hand to implement in code later. Also, in education, imagine a "playground"-style app where kids can play around with the equations of motion and see in real time a plot of how each change affects speed and acceleration over time, without having to compute the derivative each time they change the equation. Having to do that by hand really kills exploration.
I'm sure there's a lot more, but the biggest one of all is obviously AI, and we can't downplay its importance right now. We can't ignore its contribution to the rise in Python's popularity, as others have mentioned. This feature is the only thing that has a chance to shift the tide toward Swift, and even if it's a long shot, in my opinion this is the most desirable possible future for the language, and it's worth every effort to try. How big is this feature? As far as I know, it's the part that keeps TensorFlow from being a regular library. How important is that? Important enough that one of the most popular MOOCs has started teaching Swift for TensorFlow in the final lessons of its advanced course, because they believe it has the potential to be a much better alternative to current tools: https://course.fast.ai/videos/?lesson=13.
The community seems divided between the ones who think this feature is of critical importance (I am one of these), the ones who think just a handful of users will use it, and the ones who see its potential but would like it implemented not as a special case, but in a way that similar custom extensions to the compiler can be built on top of.
Group one doesn't need to be convinced. I think if someone can confirm that this feature will in no way impact execution or compile time for those who don't need it, then a lot of group two would be neutral about it. As for group three, I've seen hints that if this were implemented in a more generic way, the compiler's errors and warnings couldn't be as good. It would be good to get a much more in-depth explanation of exactly what we'd be missing and why it's important.
Just my 2 cents from reading this thread.
Well, it's clearly not of critical importance, because no current mainstream language has it (except Python?) and we've gotten along for 70 or 80 years without it.
Compile time and execution time are not the only potential costs. There's also compiler size and executable size, plus the extra bugs the code is going to introduce into the compiler. The proposal is 60 pages long, which makes it pretty much impossible for the community to review for errors, suggests the implementation is going to be time- and resource-consuming, possibly delaying other needed features, and means there will be a cost to future maintenance.
The idea does sound pretty cool, but don't pretend there will be no impact on the people who won't be using it.
By that logic nothing new is ever of critical importance, simply because it's never been done before. My opinion (emphasis on opinion) is that this is of critical importance if we ever want Swift to have any chance of taking Python's place (Python doesn't have this feature, btw, and that's why it's the best/only chance to win over developers). This is an opinion, and it's just as valid as yours. I have no interest in using Swift for any purpose other than data engineering and machine learning. Obviously it's not critically important to people who are already using Swift for other purposes.
I'm not pretending anything as I'm only a potential user and not involved in the creation of this feature. I'm just trying to prompt the developers to address the potential costs that you and others have brought up and tell us if and how much they will impact existing users who will never use this feature. I think you would agree that this needs to be brought into the open sooner rather than later because otherwise we'll keep endlessly discussing abstract things.
Not being critical doesn't mean it's not worth doing. I just dispute the implication that the world (of Swift) would end if it isn't done.
Do you mean take over in your field? Or generally? Because the latter is not happening any time soon no matter what features we bundle into Swift. Swift is already vastly superior IMO to Python as a general purpose programming language and it's not really denting Python's dominance. The reasons why are not because of missing features.
As for dominance in your field, well, you know it a lot better than I do. I can understand why you might want Swift to take over from Python (for me, programming in Python after programming in Swift feels a bit like time-travelling back to the 1990s), but I would be concerned if our primary motivation for adding any new feature was so Swift could eat some other language's lunch.
I apologise. I worded that sentence poorly. It should have been more like "let us not pretend...": I wasn't accusing you specifically, or at all, really. In fact, I think we agree on the point about costs, i.e. we want to know what they are. I just want to be sure people are aware that speed is not the only metric of cost.
I'll reiterate: I think this is a cool feature. I am just concerned that the costs might be too high at this stage.
Sure, it's not close to being critical to Swift's survival; that's a fact. However, it's arguable whether it's critical for Swift to break out of the iOS/OS X programming-language box it's currently in.
I disagree that Python feels like the 1990s; I actually really enjoy the language, but I agree with you that I find Swift superior in almost every way. Still, I don't know many programming languages that became dominant because they were better than the ones before. They usually break through the competition when there's a feature/library that makes a certain task they are trying to solve a lot easier. It happened with Ruby after Ruby on Rails, then with Python after numpy/Pandas/Jupyter made it easier than any other language to prototype for data science, then again after TensorFlow/Airflow etc.
I believe that languages get popular by riding waves when they come, by providing superior libraries to their competitors'. There's no bigger wave than machine learning right now; the libraries are still immature/hard to use, and if you give people a compelling reason to switch and get some key players behind it (Google is behind this project, fast.ai is teaching a new generation and hoping to use this in their courses, etc.), then people might follow.
No matter how good Vapor gets it would be almost impossible to get the majority of people to drop Django, but compel them to learn Swift for Tensorflow and you might get a lot more people willing to start using it for other projects.
I think we all agree on this point, and that was the whole purpose of my initial post. We need to know the costs before we decide to knock down or praise the proposal.
I appreciate you correcting me on that.
I like the idea of adding differentiable programming to Swift.
At the same time, I can appreciate some of the pushback. Integrated source-transformation techniques could be used to power lots of advanced and exciting features. Even acknowledging the importance of differentiation as a fundamental mathematical function transformation, might there be other kinds of transformations that would benefit from similar treatment, e.g. integrals or Fourier transforms? Would it be reasonable to add "first-class language support" for all of them? I'm not qualified to answer those questions. Maybe there is something about differentiation which makes it uniquely suited to this kind of implementation strategy.
If we think this is a general problem and would like to support other mathematical transformations one day, I think we should break the protocol and conformances out into a Differentiation core library, into which we gradually move more of the transformation-related logic as we figure out how that should work.
Some other things to consider (responding to concerns above, and just generally thinking out-loud):
- **Maintainability:** How confident can we be that this won't bit-rot? That's a question the core team and corporate contributors will have to consider. Google have a bit of a spotty record when it comes to long-term support for projects; how do we know they won't one day drop their investment in Swift in favour of Go or some other language? It's impossible to say what will happen in the future, but hopefully Chris Lattner's involvement is a sign that Google's interest in Swift is not fleeting. Additionally, Google is far from the only company interested in machine learning; I'm sure Apple and IBM would be very interested in using Swift for ML.
- **Portability:** I might be alone in this, but I consider it important that it remains at least theoretically possible to write third-party Swift compilers. In fact, there already is (at least) one: RemObjects' Silver compiler. It seems like the S4TF team have published whitepapers and technical documentation describing how the transform should work, and the implementation in the official compiler is of course open-source, so I hope it would be possible to implement in other backends as well. We might ask @marc_hoffman for his perspective on how portable this feature is.
Finally, I'd like to thank the S4TF team for all their hard work. As somebody who has been spending more and more time with TensorFlow, I truly cannot wait to ditch Python. I think I might literally throw a party the day that S4TF has its first stable release.
Among these exciting additions, there's one I'm particularly interested in: that you can, in a way, tag a related function onto another.
```swift
func expf(_ x: Float) -> Float { ... }

@differentiating(expf)
func _(_ x: Float) -> (value: Float, differential: @differentiable(linear) (Float) -> Float) { ... }
```
This adds a great amount of safety to the function implementations, and seems to make sense as a function member. Functions are first-class objects, but we can barely do anything with them aside from calling them and/or passing them around. Now, if we could extract the related function like so:
```swift
expf.differentiate(4.5)
```
It'd be an independently useful feature, especially since there are many meaningful transformations: inverse, Fourier/z-transform, derivative, integration, etc. Then the compiler synthesis could come later.
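For illustration, here is a rough sketch of how a registered derivative might instead be consumed through a free-function differential operator. This is my assumption, not the proposal text: the `gradient(at:of:)` name, the `@differentiable(reverse)` spelling, and the `_Differentiation` module are all hypothetical here, and this requires a toolchain with the differentiation feature.

```swift
// Hypothetical sketch: assumes the proposal's attributes plus a top-level
// `gradient(at:of:)` differential operator; exact names may differ.
import _Differentiation
import Foundation

@differentiable(reverse)
func expf(_ x: Float) -> Float {
    exp(x)
}

// The compiler supplies the derivative, so the caller never hand-writes
// d/dx exp(x) = exp(x):
let slope = gradient(at: 4.5, of: expf)  // conceptually ≈ exp(4.5)
```

Whether the operator is spelled as a member (`expf.differentiate`) or a free function is exactly the kind of surface-level question the piecemeal proposals could settle.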
My understanding is that the Swift for TensorFlow team is working on adding MLIR as a compiler backend so everything can run on GPUs, so my guess is it's pretty portable. Don't take my word for it, though; someone official should answer.
As I understand it, the ML frameworks really want to take differentiable objects as arguments, so ML clients either need to hand-differentiate objects/functions to take advantage of (some) ML frameworks, or they need to autogenerate differentiated objects/functions. The actual ML frameworks just get to assume that if they need differentiated stuff, the callers will comply.
In many ways this feature seems like closures — not in any technical sense of a shared implementation, but in that entire APIs went from "in theory you can pass a function pointer and a void pointer wherever we might want a callback... but that is such a pain to use you had better really need a callback in order to code it that way" to "anything that would be better with a callback gets a closure! Closures everywhere!". This feels like "well, ML frameworks need derivatives all over the place, and it is a big pain for people to provide them, but them's the breaks"... and this proposal hopes to reduce the client-side pain.
(note my actual experience with ML has more to do with simulated ants, and not solving any real problems though, so I may be overstating the amount of ML that actually cares about differentiable stuff)
I think the closure analogy is a good one. It occurred to me while considering this proposal as well. In particular, the "closures are just objects" argument against supporting them in a language came to mind. While there is an analogy, the qualitative difference is significant in its influence on how people write code. I suspect the same can probably be said of first-class support for differentiation. (Of course it's not possible to say for sure — we just don't have enough experience with it yet.)
Since this proposal is from the Swift for TensorFlow team, people within our community who are not closely following this development might link this proposal with TensorFlow integration. My understanding is that differentiable programming is an infrastructure on which other libraries can be built; it has broader usage than serving TensorFlow integration. (To use TensorFlow, one has to use swift-apis, which is where all TensorFlow-related code resides, separate from the Swift repo.) For example, on macOS, where TensorFlow on GPU is missing for now, one could potentially write custom algorithms using Metal Performance Shaders (MPS) to offload certain computation tasks onto the GPU, and still benefit from other libraries that build on this differentiable programming compiler support. This way, you are not limited to what CoreML can do, which is currently the only way to leverage GPU compute on macOS.
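To make that TensorFlow-independence concrete, here is a hypothetical sketch that differentiates an ordinary struct of `Float` parameters — no Tensor or TensorFlow types anywhere. The `_Differentiation` module name, `@differentiable(reverse)` spelling, and `gradient(at:_:)` operator are assumptions based on the manifesto, not guaranteed API:

```swift
// Hypothetical sketch: plain Floats only, no TensorFlow involved.
import _Differentiation

struct Line: Differentiable {
    var slope: Float
    var intercept: Float

    @differentiable(reverse)
    func callAsFunction(_ x: Float) -> Float {
        slope * x + intercept
    }
}

let line = Line(slope: 2, intercept: 1)
// Gradient of the output with respect to the stored parameters:
let grads = gradient(at: line) { model in model(3) }
// Conceptually: grads.slope == 3, grads.intercept == 1
```

The point is that any library — an MPS-backed one included — could accept `Differentiable` values like this without depending on swift-apis.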
I am an app developer, so I mainly care about how to run machine learning inference in an app, without an expensive cloud GPU backend. I had some machine-learning-related ideas, but simply could not find a viable way to put them into an app. I tried many frameworks: Torch, PyTorch, TensorFlow C++. None of them is easy to work with. Most frameworks are geared towards running on Linux in the cloud and using Python as the frontend language, which is slow for certain tasks.
There is no good GPU compute solution for the Mac. Nvidia GPUs are not supported since Mojave, and even when Nvidia GPUs are available, since CUDA is proprietary, shipping an app that uses CUDA isn't as easy as it should be. Sure, you can use CoreML if you are willing to write your model in Python and use a converter to turn it into something CoreML can understand (unless the model is simple, you need a lot of luck for that to be smooth).
With Swift for TensorFlow, I finally could build an app with the feature I wanted and upload it to the Mac App Store. For iOS, I imagine there are a few ways this proposal can help. Swift has very good integration with C++, so, potentially, many frameworks can be exposed to Swift and combined with this proposal. Swift developers can access them and enhance them wherever necessary, and as long as those enhancements conform to the proposed protocol, they should work together.
I don't consider this feature a potential burden for the future feature development of Swift. On the contrary, I think the way it is done can provide a reference for other future features. Some day, we might say: "the feature under development shares quite some similarities with differentiable programming — let's abstract them better." But I don't expect the community to take this implementation 100% as-is. It's like SE-0253, which ended up with `callAsFunction`: at the time, `call` seemed short and clean, but now, looking back, I appreciate the `callAsFunction` naming whenever I encounter it. I am kind of hoping that collectively we can come up with better naming here as well.
A few random comments, just my 2c:
- I'm super +1 on proper metaprogramming support in Swift, and Eugene and others are pushing on this, but I'm not aware of a reasonable proposal that can provide a good experience for differentiable programming. The fundamental issue is that we need type-system integration, witness/vtable emission support, and other features to make this feel Swifty. Going with a "fully dynamic" approach makes the feature feel Python-esque, which is... not what we're going for.
- Differentiable programming is not widely available in other languages, but I agree with others that this is a huge potential /differentiator/ (sorry, couldn't help myself) for Swift, in a key and growing market.
- This is all "just math", and is intentionally well factored such that it applies to Float, Double, and arbitrary other types, not just tensors or S4TF-specific things. S4TF is specifically trying not to create a fork of Swift or develop "tensor only" features. Our focus is on work that develops features that are important to audiences outside the established iOS developer ecosystem in which Swift is currently well placed.
- The proposal is much simpler (imo) than it looks from the manifesto. The manifesto starts with a deep dive on calculus and other background concepts, but the actual Swift language extensions are modest. I've asked Richard and Dan to develop a set of piecemeal proposals which can be considered à la carte, but it is important to also be able to see the "big picture" of where things are going, which is the normal role of manifestos in the swift-evolution process.
I expect the individual proposals to need iteration and we are committed to following the normal swift-evolution process. We're most interested in getting the best possible result (e.g. see the year+ we've been incubating this on a branch, plus extensive iteration in the face of user feedback!) rather than being in a hurry to "get something in" to any given release.
I'd love to see other big changes to the language get similar diligence paid to them.
-Chris
I would love to see this functionality brought into Swift. I don't do much machine learning work, at least in the deep learning sense, but I do a lot of computational science work which would benefit from this. Even languages widely used in my area do not have good support here, and people mostly manually implement derivatives or rely on numerical differentiation. This is one factor that leads to bugs and poor code reuse, because people find it easier to just copy-and-paste algorithms and rewrite the function and derivative evaluations than to e.g. pass around separate function and derivative closures.
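To make that pain point concrete, here is a minimal sketch in plain present-day Swift (no autodiff): a Newton's-method root finder that forces every caller to derive and pass `df` by hand — exactly the parameter that first-class differentiation could synthesize. The function name and signature are my own illustration, not from the proposal:

```swift
// Plain Swift today: the derivative must be threaded through by hand.
func newton(f: (Double) -> Double,
            df: (Double) -> Double,  // caller supplies f' manually
            start: Double,
            iterations: Int = 20) -> Double {
    var x = start
    for _ in 0..<iterations {
        x -= f(x) / df(x)  // Newton step: x_{n+1} = x_n - f(x_n) / f'(x_n)
    }
    return x
}

// Root of x² - 2: the caller derives d/dx (x² - 2) = 2x themselves.
let root = newton(f: { $0 * $0 - 2 }, df: { 2 * $0 }, start: 1)
// root ≈ 1.4142135 (√2)
```

With language-level differentiation, the `df:` parameter could disappear, and copy-pasted hand derivatives with it.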
I would prefer concerns about implementation complexity, maintenance, etc. to be based on concrete technical concerns and knowledge than things like document length.
This should be characters, not words, for anyone else terrified at the thought of a 120,000-word proposal. It might be better to call this a manifesto rather than a mega-proposal, because I think some people have missed:
I have some nitpicks in a few areas that can wait for the bite-sized proposals. Thanks for everyone's work on this so far.
Having been a mathematician in a previous life, I really love this, even before digging into the details. But I wonder if there have been any pronouncements by the core team?
Personally I think this is the kind of change that the core team should make more or less unilaterally - or rather together with the authors and other experts - but in any case it would be very interesting to know what their stance is.
Well, Swift doesn't currently have any official, stable integration with C++. It seems that some members of the S4TF team have been working on it, but it's not directly related to differentiable programming and not part of this proposal AFAIK.
There's "nothing wrong" in the sense that the industry seems to get along fine with the status quo, but it does mean you have to learn an entirely new language -- as well as any requisite language interop mechanisms if you want to integrate with an existing app -- to make the jump from being a "regular programmer" to being an "advanced math programmer" (or however you want to phrase it). Learning the math can be hard enough... why raise the barrier to entry even higher?
A slight aside, but if people are curious for more concrete examples of how this could be useful outside of machine learning, it's worth having a look at Mitsuba 2 and the Enoki library it's built upon. Mitsuba 2 is a state-of-the-art renderer/path tracer that implements inverse rendering (i.e. trying to reconstruct a scene based on photographs), and uses automatic differentiation to do so. (I have nothing to do with these projects, I just think they’re very impressive.)
As someone who works in computer graphics, the idea of having first-class language support in Swift for these types of tasks is fairly exciting. The combination of built-in SIMD primitives, automatic differentiation, and potential support for heterogeneous compute through MLIR all make Swift an appealing language for computer graphics, and I'm happy the Swift for TensorFlow team has been putting all this work towards it.
This proposal includes the `Differentiable` protocol, `@differentiable` function types, and `@differentiating`. I am aware those keywords have been carefully chosen to meet existing best practices and conventions; however, they are spelling-wise rather close to each other. Comparable attributes to `@differentiable` would be `@objc` and `@IBAction`, which tend to describe the intended usage of a type rather than the type itself. Would `@autodiff` serve the purpose of indicating that a function can be used in automatic differentiation, and also direct the compiler to generate a derivative for it?
I've been reading a lot of comments to the effect that "ML practitioners want another language than Python" and that "differentiable programming in Swift will make the language attractive to those practitioners", but no evidence for those statements.
Personally, I believe that the lack of good support for Windows (and the many rough edges still existing around Linux support) are much bigger reasons why people won't just switch from Python to Swift. Some people said that moving from Swift to Python feels like regressing to the 1990s; in terms of language features, that might be true, but in terms of tooling and ecosystem, it's the other way around.
Maybe these things need to go hand in hand (add unique ML features to Swift and improve the ecosystem and tooling), but not doing them together IMHO risks creating a Swift that is significantly more complex, while the people who would benefit from that complexity don't even use the language. I think history shows that languages with amazing features can fail to be adopted for a number of reasons, and "making it hard to install the language" typically is one of them.
(Also, I almost take offense at some statements above that "discrete maths is simple" and that CS maths is a regression from calculus; there is nothing wrong with calculus/analysis, it's a wonderful branch of mathematics, but it's not all of mathematics, and there are just naturally a lot more concrete use cases from (equally interesting) branches such as combinatorics, graph theory, abstract algebra etc. in computer science. And even calculus doesn't necessarily mean numerical calculus, but includes symbolic methods as well. As an example, I am working on a symbolic maths engine, and for that purpose, AD doesn't seem to be useful (which is fine, since probably numerical applications are more common, but still).)