Differentiable Programming Mega-Proposal

Your comment reads like a lot of confirmation bias. You do this all day, so everyone will be doing it all day. Most programs require simple logic that doesn't necessarily use much more than basic arithmetic. It's not likely this will change anytime soon, and there's nothing wrong with niche languages. Consider Matlab or R.

1 Like

@Avi I didn't say "everybody", in fact what I am saying is: please consider including features that other communities (aka not everybody) will use.

8 Likes

I'm really excited to see this work start to make its way into the official SE process. I can't wait to see the applications the Swift community develops with this in the coming years.

Thank you for all the hard work you're doing to make Swift a great language in new domains! It is important for Swift to continue to grow beyond its initial domain of building apps for Apple platforms. I can't imagine a better way to do that than giving Swift a unique and innovative capability that is extremely relevant to the industry today. Even better, this feature targets a domain focused on high performance numerical computing. Making Swift a great language for numerical computing will benefit the whole community by letting us do more without leaving the language (especially since leaving the language often means dropping down to C and losing memory safety).

Language vs library

Since there are no precedents for a language with first class automatic differentiation, it is no surprise that there is some pushback on deep compiler integration. This is especially the case given Swift's relative weakness in the area of metaprogramming and its absence of support for library-defined code transformations.

While it may be possible to move some parts of this feature into a library, it looks to me like requesting that the entire feature live in a library isn't reasonable given the stated goal. The design is deeply integrated with the language and the type system. This integration is essential to providing truly first class AD. As an example, it is not at all clear to me how a library could replace @differentiable function types without imposing a significant burden on users of AD, especially given the carefully thought out subtyping relationships of such functions.
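To make the "burden on users" point concrete, here is a minimal sketch of what a purely library-based approach might look like: forward-mode differentiation with dual numbers. The names `Dual` and `derivative(of:at:)` are hypothetical and do not come from the proposal. This is only one of several possible library designs.

```swift
// Hypothetical library-only forward-mode AD using dual numbers.
// None of these names come from the proposal.
struct Dual {
    var value: Float    // f(x)
    var tangent: Float  // f'(x)
}

func + (a: Dual, b: Dual) -> Dual {
    return Dual(value: a.value + b.value, tangent: a.tangent + b.tangent)
}

func * (a: Dual, b: Dual) -> Dual {
    // Product rule: (ab)' = a'b + ab'
    return Dual(value: a.value * b.value,
                tangent: a.tangent * b.value + a.value * b.tangent)
}

// Seeds the input with tangent 1 and reads the tangent of the output.
func derivative(of f: (Dual) -> Dual, at x: Float) -> Float {
    return f(Dual(value: x, tangent: 1)).tangent
}

// The user has to write the function against Dual rather than Float.
func square(_ x: Dual) -> Dual {
    return x * x
}

let slope = derivative(of: square, at: 3)
```

Note that `square` cannot simply be an existing `(Float) -> Float` function: it has to be rewritten (or made generic) in terms of the library's number type, which is the kind of cost a first-class function type would avoid.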

I would be interested in seeing the authors explore this topic a bit deeper. In an ideal world, what metaprogramming features would you like to see in Swift to support AD and how would that influence the design of this proposal? What parts do you think could ideally live in a library and which features really must be implemented in language itself?

I agree with the concern others have expressed: there is a risk of continuing to kick metaprogramming down the road if, when we encounter large concrete use cases for metaprogramming features, we put special-case features in the language instead of designing and implementing those metaprogramming features, on the theory that the special cases might someday be replaced with a library solution built on top of future metaprogramming features. We should be cognizant of this risk and honest about how far we go down this road before we start prioritizing the metaprogramming tools Swift needs in order to enable library developers.

AD beyond ML

Other pushback appears to be due to the roots of this feature in the ML domain and a sense that the feature won't really be general purpose in practice. It's natural that many of the concrete examples in the proposal are derived from the domain the team building it is focused on. This gives the proposal an emphasis on ML that I don't believe is representative of how the Swift community will use the feature if it is accepted. Of course it will be heavily used in the ML domain, but it will also be used in interesting ways in many other domains.

With no precedent in other languages we aren't able to draw on concrete examples that have been developed elsewhere in other domains. We have to use our imaginations. I think the proposal does a good job of describing domains beyond ML where AD will be extremely useful, such as graphics and scientific computing. But pointing in a direction is much different than providing concrete examples. I think the authors could make a stronger case for the general utility of the feature by fleshing out even just a few simple examples in other domains.

I also encourage people pushing back to think not only about their own code, but also about the libraries and frameworks that might be interesting to them which might take advantage of AD. We all benefit (at least indirectly) when Swift becomes a better language for writing libraries.

Additional language integration

The proposal does not discuss any interaction of AD with Swift's mutability model. There is a close relationship between a function with the signature (Float) -> Float and one with the signature (inout Float) -> Void. Is there a reason AD supports the former and not the latter? Is deeper integration with the mutability model something that could be explored in the future?
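The relationship between the two signatures can be sketched as a pair of conversions. The helper names `makeMutating` and `makeFunctional` are made up for illustration. This says nothing about how the proposal would differentiate either form.

```swift
// Value-returning form.
func doubled(_ x: Float) -> Float {
    return x * 2
}

// Derive the mutating form from the value-returning one.
func makeMutating(_ f: @escaping (Float) -> Float) -> (inout Float) -> Void {
    return { x in x = f(x) }
}

// Derive the value-returning form from the mutating one.
func makeFunctional(_ g: @escaping (inout Float) -> Void) -> (Float) -> Float {
    return { x in
        var copy = x
        g(&copy)
        return copy
    }
}

let doubleInPlace = makeMutating(doubled)
var v: Float = 1.5
doubleInPlace(&v)                      // v has been doubled in place

let roundTrip = makeFunctional(doubleInPlace)
let w = roundTrip(2)                   // same result as doubled(2)
```

Since each form can be mechanically recovered from the other, it seems plausible that a derivative defined for one could be carried over to the other.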

Similarly, the proposal mentions the possibility of providing a conditional conformance for Result to Differentiable. There is a close relationship between (Float) -> Result<Float, E> (which could be differentiable when E: Differentiable) and (Float) throws -> Float. If typed errors are added to Swift in the future, would it be reasonable to consider supporting differentiation of throwing functions when the error type conforms to Differentiable? The utility of this is not immediately obvious, so feel free to consider it a hypothetical question.
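The correspondence between the two shapes can be shown with the standard library's `Result(catching:)` and `get()`. This sketch uses untyped `Error`, because typed errors are only hypothetical in the question above. `DomainError` and the function names are invented for the example.

```swift
// Illustrative error type (not from the proposal).
enum DomainError: Error {
    case negativeInput
}

// Throwing form.
func checkedSqrt(_ x: Float) throws -> Float {
    guard x >= 0 else { throw DomainError.negativeInput }
    return x.squareRoot()
}

// throws -> Result: capture the thrown error as a value.
func checkedSqrtResult(_ x: Float) -> Result<Float, Error> {
    return Result(catching: { try checkedSqrt(x) })
}

// Result -> throws: rethrow the stored error, or return the success value.
func checkedSqrtAgain(_ x: Float) throws -> Float {
    return try checkedSqrtResult(x).get()
}
```

Calling `checkedSqrtAgain` behaves like calling `checkedSqrt` directly, which is the sense in which the two signatures are closely related.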

Minor questions

If a non-Differentiable or a let stored property is not marked with @noDerivative, then it is treated as if it has @noDerivative and the compiler emits a warning (with a fix-it in IDEs) asking the user to make the attribute explicit.

Why is this a warning rather than an error? It seems hard to justify impacting behavior with only a warning. We don't do that anywhere else in Swift (afaik).

Properties in the synthesized TangentVector have the same effective access level as their corresponding original properties.

Can you clarify what you mean by the same effective access level when the original property was declared private? The original property would be private to the Differentiable type within the declaring file. There is no way to spell that scope of visibility on a property of a different type. Would the property be private to the synthesized TangentVector, fileprivate (so it is also visible in the scope of the Differentiable type in the declaring file), or would it be a Voldemort scope of some kind?
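The scoping problem can be illustrated with a hand-written stand-in for the synthesized type. Everything below is an assumption made for illustration and is not what the compiler actually generates. The sketch picks `fileprivate`, one of the options raised in the question.

```swift
struct Model {
    // Visible inside Model and its extensions in the same file.
    private var weight: Float = 1

    // Hand-written stand-in for a synthesized tangent type (an assumption).
    struct TangentVector {
        // Declaring this `private` would hide it from Model itself,
        // because `private` is scoped to TangentVector, not to Model.
        fileprivate var weight: Float

        init(weight: Float) {
            self.weight = weight
        }
    }

    // Needs to read the tangent's property, so it cannot be `private` there.
    mutating func move(along direction: TangentVector) {
        weight += direction.weight
    }

    var currentWeight: Float {
        return weight
    }
}

var model = Model()
model.move(along: Model.TangentVector(weight: 0.5))
let movedWeight = model.currentWeight
```

With `fileprivate`, the tangent property is visible to more code than the original `private` property was, which is why "same effective access level" needs spelling out.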

36 Likes

While reading this thread, I can't help but get the impression that the size of the potential user base for this is being significantly understated. Judging from the discussion here one might assume this is only useful for a few hundred people or something. The scientific/ML world is huge and is one of the fastest growing software fields. We are talking about potentially millions of users who could benefit from this going forward. Who do you think is driving all this growth in Python? I think it would be foolish to miss the opportunity to extend Swift as the scientific successor to Python. The data science community desperately needs to migrate to a high performance language and having first class support for the underlying maths they use could be the catalyst for that migration.

27 Likes

I am +1. With this reservation about the implementation:

2 Likes

The MLIR team is already working on metaprogramming for Swift (although for another purpose):

I am not an expert on language stuff, but I think it might be a bit unrealistic to ask for this to be implemented on top of a (non-existent) metaprogramming language feature, given that it has taken 1.5 years to get this right. It seems that differentiating through control flow is especially tricky. It would be nice to hear from the authors about this, though.

1 Like

I think that while ML will probably be used by millions of users, most software that provides ML functionality is just built on existing ML frameworks and doesn't need a low-level primitive like differentiable programming.

That does not mean that we should not include differentiable programming in the language, if only to help the implementers of these frameworks, but I don't think that it will actually be used by as many users as you think.

2 Likes

I want to preface what I'm saying with a few thoughts. I am an ML engineer by trade, so I am certainly biased here; however, I come from a background in software engineering before getting into ML, so I see Swift as more than a potential replacement for Python (although I would love nothing more). I have spent the last few years patiently waiting for Swift to mature into a strong general purpose language, and I firmly believe it would be amazing as such. I am not a contributor to Swift, only a loyalist who has used it for iOS/macOS dev work and someone who wants to see it succeed outside that scope.

Having said that, it's going to be a long road getting there, if we get there at all. Right now Swift is just another option in a pool of a dozen or so languages fighting to gain traction. For as awesome as Swift is, there's nothing ground-breaking about it that truly makes it stand out as a GP language. I understand the desire to protect and serve the existing users, but adhering too strictly to that cause will ensure that Swift never breaks out of the "Oh yeah, Swift is that iOS language right?" rut it is in currently. Swift needs something that none of those other languages have to differentiate it (maybe a little pun intended), and I can't think of a better thing than what is being proposed here.

The scientific/ML community is the perfect set of users to help push Swift into being a GP language, and I think the Swift community should be excited about providing them with such an incredible opportunity to build new tools in a way they couldn't before. They have a broad set of use cases and they are deeply cross-platform. I have colleagues who use Windows, Mac, and Linux, and most of them use a mixture of those in their daily lives (macOS/Windows for dev, Linux for training/production). Embracing scientific/ML (and all the other crazy stuff that could be done with baked-in differentiation) would most likely lead to an explosion in development and use for Swift: new libraries, tooling, and resources, as well as higher visibility even to those outside scientific/ML. Python's growth is evidence of this effect.

Adding these features and embracing scientific computing doesn't have to only benefit the people inside that field. I believe it could benefit the greater Swift user base directly and indirectly.

25 Likes

Certainly Swift is not going to have a million new users overnight. And I'm really coming from a "total addressable market" perspective with that number.

It definitely is possible to have a lot of people begin using Swift for ML in the future though. With first-class diff in Swift, the development of ML frameworks like Swift for TensorFlow (and certainly new ones that follow) will be dramatically easier and more painless. And once these frameworks mature, I think the Python users will be very interested. Look at the momentum PyTorch has behind it. Most of that simply comes from having a really nice clean API (it's why I switched to PyTorch from TensorFlow). If all these people are beginning to use PyTorch because it made their life a little easier, I think the performance and flexibility Swift frameworks could offer would be equally enticing.

3 Likes

This is needlessly accusatory and divisive. No one reading these forums wants Swift to fail. (Least of all the Apple-platform developers who have already put years into using it.)

If the language can support more users in other fields and disciplines, that's wonderful. The concern is whether this specific proposal comes at the expense of both existing and future uses. The proposal wants to add various things to the core language. Is that going to negatively affect future changes (including those that might be wanted by some new users from yet another field)? For example, there are obvious generalizations to some of the mechanisms that are proposed -- is adding them now in this specific form going to make it harder to make those generalizations later?

Consider dynamic member subscripts: this was proposed because some folks wanted better Python bridging to support their work. But it wasn't added as "Python bridging for TensorFlow": we got a really nice, generally applicable feature.
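As a reminder of how general that feature turned out to be, here is a small use of `@dynamicMemberLookup` that has nothing to do with Python or TensorFlow. The `Settings` type is invented for this example.

```swift
// Illustrative type: forwards unknown member names to a dictionary lookup.
@dynamicMemberLookup
struct Settings {
    private var storage: [String: String]

    init(_ storage: [String: String]) {
        self.storage = storage
    }

    // `settings.theme` is rewritten to `settings[dynamicMember: "theme"]`.
    subscript(dynamicMember key: String) -> String? {
        return storage[key]
    }
}

let settings = Settings(["theme": "dark"])
let theme = settings.theme        // looks up the "theme" key
let missing = settings.fontSize   // no such key, so this is nil
```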

"Is this proposal the right way to support this case" is not the same as "this case should not be supported".

16 Likes

Sorry if it read that way, I should've quoted specific comments so as not to generalize. My concern was that there are comments rejecting the proposal on the basis that it is not useful for their work.

4 Likes

Not at all. I see the conflict here as anchor bias, between the systems programmers wanting a "better C++" and the data scientists wanting a "better Python". They are different backgrounds, but inevitably the same future.

My point is that there is a good chance the data science side will win the larger audience, as more people need to solve data problems than code apps. The authors offered many use cases, as did I. It is neither obscure nor domain specific to ML. As they say, "it's just math"; it has no domain.

Your premise however is that they should, but I’m not seeing a lot of evidence for that.

No, my premise is that we don't, because 50 years of programming languages didn't support it. So we did the only thing we could. People are practical.

5 Likes

I think the "it's just math" argument is actually pretty good. Languages already come with primitives and operators that cover a wide range of mathematics, so why should this be any different? It is definitely not domain specific and simply opens up new doors for writing derivative-oriented software (ML just happens to fall nicely into that category).

12 Likes

This. While it would be great for Swift to support enough meta-programming for this feature to be implemented, it doesn't seem like we are in any way close to this.

In my opinion the application of this is broad enough that it probably deserves its own feature in the compiler. Off the top of my head, I would have loved to have something like this in my robotics class, where we had to work out multiple derivatives by hand to implement in code later. Also, in education, imagine having a "playground" style app where kids can play around with the equations of motion and see in real time a plot of how they affect speed and acceleration over time, without having to compute the derivative each time they change the equation. Having to do that by hand really kills exploration.

I'm sure there's a lot more, but the biggest one of all is obviously AI, and we can't downplay its importance right now. We can't ignore its contribution to the increase in popularity of Python, as others mentioned. This feature is the only thing that has the chance to shift the tide toward Swift, and even if it's a long shot, in my opinion this is the most desirable future possible for the language and it's worth every effort to try. How big is this feature? As far as I know it's the part that keeps TensorFlow from being a regular library. How important is that? Well, important enough that one of the most popular MOOCs has started teaching Swift for TensorFlow in the final lessons of its advanced course because they believe it has the potential to be a much better alternative to current tools: https://course.fast.ai/videos/?lesson=13.
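As a sketch of the manual workflow described above: a position function, its hand-derived velocity, and a central finite difference as a numerical stand-in. Finite differences are an approximation technique, not automatic differentiation. The constants and function names are made up for illustration.

```swift
// Position under constant acceleration: s(t) = v0*t + a*t^2/2.
// v0 and a are made-up illustrative values.
let v0 = 2.0
let a = 9.81

func position(_ t: Double) -> Double {
    return v0 * t + 0.5 * a * t * t
}

// Hand-derived velocity: must be re-derived whenever `position` changes.
func velocityByHand(_ t: Double) -> Double {
    return v0 + a * t
}

// Central finite difference: a numerical approximation of the derivative.
func numericalDerivative(of f: (Double) -> Double,
                         at t: Double,
                         step h: Double = 1e-5) -> Double {
    return (f(t + h) - f(t - h)) / (2 * h)
}

let approxVelocity = numericalDerivative(of: position, at: 1.5)
let exactVelocity = velocityByHand(1.5)
```

The hand-written version is exact but has to be kept in sync with the equation, while the numerical version follows the equation automatically but only approximately. A compiler-provided derivative would aim to give both properties at once.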

The community seems divided between the ones who think this feature is of critical importance (I am one of these), the ones who think just a handful of users will use it, and the ones who see its potential but would like it to be implemented in a way that is not a special case, but that similar custom extensions to the compiler can be built on top.

Group one doesn't need to be convinced. I think if someone can confirm that this feature will in no way impact execution or compile time for those that don't need it, then a lot of group two would be neutral about it. As for group three, I've seen hints that if this was implemented in a more generic way the errors and warnings from the compiler can't be as good. It would be good to get a much more in depth explanation of what exactly we'd be missing and why it's important.

Just my 2 cents from reading this thread.

10 Likes

Well it's clearly not of critical importance because no current mainstream language has it (except Python?) and we've got along for 70 or 80 years without it.

Compile time and execution time are not the only potential costs. There's also compiler size and executable size, as well as the extra bugs that the code is going to introduce into the compiler. The proposal is 60 pages long, which makes it pretty much impossible for the community to review it for errors and suggests that implementation is going to be time- and resource-consuming, possibly delaying other needed features, and that there will be a cost to future maintenance.

The idea does sound pretty cool, but don't pretend there will be no impact on the people who won't be using it.

3 Likes

By that logic nothing new is ever of critical importance just because it's never been done before. My opinion (emphasis on opinion) is that this is of critical importance if we ever want Swift to have any chance of taking Python's place (Python doesn't have this feature btw, and that's why it's the best/only chance to win over developers). This is an opinion and it's just as valid as yours. I have no interest in using Swift for any purpose other than data engineering and machine learning. Obviously it's not critically important to people who are already using Swift for other purposes.

I'm not pretending anything as I'm only a potential user and not involved in the creation of this feature. I'm just trying to prompt the developers to address the potential costs that you and others have brought up and tell us if and how much they will impact existing users who will never use this feature. I think you would agree that this needs to be brought into the open sooner rather than later because otherwise we'll keep endlessly discussing abstract things.

3 Likes

Not being critical doesn't mean it's not worth doing. I just dispute the implication that the world (of Swift) would end if it isn't done.

Do you mean take over in your field? Or generally? Because the latter is not happening any time soon no matter what features we bundle into Swift. Swift is already vastly superior IMO to Python as a general purpose programming language and it's not really denting Python's dominance. The reasons why are not because of missing features.

As for dominance in your field, well you know it a lot better than I do. I can understand why you might want Swift to take over from Python (for me, programming in Python after programming in Swift feels a bit like time travelling back to the 1990s), but I would be concerned if our primary motivation for adding any new feature was so Swift could eat some other language's lunch.

I apologise. I worded that sentence poorly. It should have been more like "let us not pretend...": I wasn't accusing you specifically, or at all, really. In fact, I think we agree on the point about costs, i.e. we want to know what they are. I just want to be sure people are aware that speed is not the only measure of cost.

I'll reiterate: I think this is a cool feature. I am just concerned that the costs might be too high at this stage.

5 Likes

Sure it's not close to being critical to Swift's survival, that's a fact. However it's arguable whether it's critical for Swift to break away from the iOS/OS X programming language box that it's currently in.

I disagree that Python feels like the 1990s; I actually really enjoy the language, but I agree with you that Swift is superior in almost every way. But I don't know many programming languages that became dominant because they were better than the ones before. They usually break through the competition when there's a feature or library that makes a certain task they try to solve a lot easier. It happened with Ruby after Ruby on Rails, then with Python after numpy/Pandas/Jupyter made it easier than any other language to prototype for data science, then again after TensorFlow/Airflow etc.

I believe that languages get popular by riding waves when they come, by providing libraries superior to their competitors'. There's no bigger wave than machine learning right now. The libraries are still immature and hard to use, and if you give people a compelling reason to switch and get some key people behind it (Google is behind this project, fast.ai is teaching a new generation and hoping to use this in their courses, etc.), then people might follow.

No matter how good Vapor gets, it would be almost impossible to get the majority of people to drop Django, but compel them to learn Swift for TensorFlow and you might get a lot more people willing to start using it for other projects.

I think we all agree on this point and that was the whole purpose of my initial post. We need to know the costs before we decide to knock down or praise the proposal.

I appreciate you correcting me on that.

2 Likes

I like the idea of adding differentiable programming to Swift.

At the same time, I can appreciate some of the pushback. Integrated source-transformation techniques could be used to power lots of advanced and exciting features. Even acknowledging the importance of differentiation as a fundamental mathematical function transformation, might there be other kinds of transformations which would benefit from similar treatment, e.g. integrals or Fourier transforms? Would it be reasonable to add "first-class language support" for all of them? I'm not qualified to answer those questions. Maybe there is something about differentiation which makes it uniquely suited to this kind of implementation strategy.

If we think this is a general problem and would like to support other mathematical transformations one day, I think we should break the protocol and conformances out into a Differentiation core-library, which we gradually move more of the transformation-related logic into as we figure out how that should work.

Some other things to consider (responding to concerns above, and just generally thinking out-loud):

  • Maintainability
    How confident can we be that this won't bit-rot? That's a question the core team and corporate contributors will have to consider. Google have a bit of a spotty record when it comes to long-term support for projects - how do we know they won't one day drop their investment in Swift in favour of Go or some other language? It's impossible to say what will happen in the future, but hopefully Chris Lattner's involvement is a sign that Google's interest in Swift is not fleeting. Additionally - Google is far from the only company interested in machine learning. I'm sure Apple and IBM would be very interested in using Swift for ML.

  • Portability
    I might be alone in this, but I consider it important that it remains at least theoretically possible to write 3rd-party Swift compilers. In fact, there already is (at least) one: RemObjects' Silver compiler. It seems like the S4TF team have published whitepapers and technical documentation describing how the transform should work, and the implementation in the official compiler is of course open-source, so I hope it would be possible to implement in other compilers as well. We might ask @marc_hoffman for his perspective on how portable this feature is.

Finally, I'd like to thank the S4TF team for all their hard work. As somebody who has been spending more and more time with TensorFlow, I truly cannot wait to ditch Python. I think I might literally throw a party the day that S4TF has its first stable release.

22 Likes

Among these exciting additions, there's one that I'm particularly interested in: that you can, in a way, tag a related function onto another.

func expf(_ x: Float) -> Float { ... }

@differentiating(expf)
func _(_ x: Float) -> (value: Float, differential: @differentiable(linear) (Float) -> Float) { ... }

This adds a great amount of safety to the function implementations, and seems to make sense as a function member. Functions are first-class objects, but we can barely do anything with them aside from calling them and/or passing them around. Now if we could extract the related function like so

expf.differentiate(4.5)

It'd be an independently useful feature, especially since there are a lot of meaningful transformations: inverse, Fourier/z transform, derivative, integration, etc. Then the compiler synthesis could come later.
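Swift does not currently allow members on function types, but the idea can be approximated today with a wrapper type. This is only a sketch of the shape of the suggestion. `WithDerivative` is a hypothetical name, and the derivative is supplied by hand rather than synthesized.

```swift
import Foundation

// Hypothetical wrapper pairing a function with a hand-supplied derivative.
struct WithDerivative {
    let function: (Float) -> Float
    let derivative: (Float) -> Float

    // Lets the wrapper be called like the original function.
    func callAsFunction(_ x: Float) -> Float {
        return function(x)
    }

    // Evaluates the attached derivative at x.
    func differentiate(_ x: Float) -> Float {
        return derivative(x)
    }
}

// exp is its own derivative, so both closures call expf.
let wrappedExp = WithDerivative(function: { expf($0) },
                                derivative: { expf($0) })

let y = wrappedExp(4.5)
let dy = wrappedExp.differentiate(4.5)
```

The same wrapper shape could carry other related functions (an inverse, for example), which is what makes the "related function as a member" idea independently interesting.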

6 Likes