Differentiable Programming Mega-Proposal

Alejandro · September 7, 2019, 10:44pm

I'd like to thank everyone on the S4TF team because this is really well worded and really in-depth.

While personally I don't see much of a use case for this, I can understand the desire to upstream it so that others who do need AD have access to it without downloading another Swift compiler. That does lead me to believe that maybe we could gate this somehow so that those who don't need it aren't exposed to it, while those who do can explicitly ask for it (maybe something like -enable-differentiation or similar?). I also question whether this needs to be in the standard library, because perhaps one day, if/when we have custom code transformations, we'd want to separate the two?
Maybe it also makes sense to expose this functionality through an import Differentiation module?

I'm not too sure on the fine details, but these are just some initial thoughts.

wadetregaskis · September 7, 2019, 10:50pm

While I did chuckle at this humorous comic, it’s all the funnier because it’s true. Most CS graduates don’t use half the mathematics they learned in school. Your premise however is that they should, but I’m not seeing a lot of evidence for that. Programming for most people is more about moving data around than mathematics. (“programmer”, noun: a person that copies data between protobufs)

I think it’s also approaching ironic to apply this line of argument in this pitch given that machine learning is basically about shoving a metric duckton of empirical data through magic boxes and seeing what comes out, which is about as far from the purity of advanced mathematics as you can get.

haikuty · September 8, 2019, 4:56am

Since this big change is so specific to a seemingly small subgroup of users of Swift, I think a clear indication of any added friction this will introduce for the majority of Swift users would be helpful.

How will it impact compile times for code bases that don’t use this functionality?

Does it add any friction for programs that don’t want or need this functionality?

Impact on debugger speed and/or ergonomics for programs not using this?

Basically, is supporting this for the small group who need it going to negatively impact the much larger group who doesn’t? Not doing so seems like a base requirement for approving / merging this into Swift.

Beyond that, it sounds like a powerful capability to add (though the discussion about a better way to include it seems pertinent).

Lantua · September 8, 2019, 9:17am

I have a few questions:

How does @differentiable propagate across statements and/or generics?

@differentiable func foo(value: Float) -> Float { ... }
let bar = foo // Is it still `@differentiable`?
let result = array.map(foo) // Does the map *see* the `@differentiable`?

What's the runtime cost for upcasting from @differentiable functions to non-@differentiable ones?

From what I see of the runtime memory-layout diagram, you'd at least need to copy the function pointer and context pointer into new memory (heap or stack). Is there a reason for not appending the derivative pointer at the end so that we'd have zero-cost conversion? Or is that what's happening and the diagram is simply misleading (if so, fixing the diagram would be nice)?
How future-proof is the memory layout? If we want to add another extension to the function, say invertibility, which needs an inverse function pointer, would it compose with the current layout?
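To make the conversion question concrete, here is a minimal sketch in plain Swift. It is not the compiler's actual representation of @differentiable function values; the `DifferentiableFunction` struct and its field names are hypothetical and only model the idea of an original function bundled with its derivative.

```swift
// Hypothetical model only: this is NOT the real runtime layout of
// @differentiable functions, just a bundle of two ordinary closures.
struct DifferentiableFunction {
    let original: (Float) -> Float    // the function itself
    let derivative: (Float) -> Float  // its derivative, stored alongside
}

let square = DifferentiableFunction(
    original: { x in x * x },
    derivative: { x in 2 * x }
)

// "Upcasting" to a non-differentiable function keeps only the original
// component and drops the derivative.
let plain: (Float) -> Float = square.original

print(plain(3))              // value of x * x at x = 3
print(square.derivative(3))  // value of 2 * x at x = 3
```

In this model the conversion is a projection out of the bundle. Whether the real conversion can be made zero-cost depends on where the derivative pointer sits relative to the function and context pointers, which is exactly what the question above asks.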

sighoya · September 8, 2019, 11:29am

I don't understand the role of the transposing annotation; as far as I can remember, it wasn't included in the original gist. Why do we need this now?

pvieito · September 8, 2019, 12:20pm

I am pretty +1 on this proposal, it seems an exciting addition to Swift, but I think all its features should be gated with a new module Differentiation. So the Differentiable protocol, @differentiable or derivative(of:) should not be available until the user does import Differentiation.

Paul_Cantrell · September 8, 2019, 2:11pm

I share Rob’s concern.

Certainly a general-purpose framework that would allow this capability as a library-level feature would lead to an even-more-mega proposal. And I can imagine that keeping this baked into the compiler allows more low-level optimization in a performance-sensitive space.

Still, this seems to sit at a different level of abstraction and domain specificity than the entire rest of the language. Baking these things in as language features does feel a bit like the tail wagging the dog. It’s not ideal, even if it proves necessary.

cgarciae · September 8, 2019, 2:26pm

Sorry, this comment won't be technical because the discussion contains a lot of opinion:

So sad that the (mobile) community is just rejecting this because they (the individuals) don't use it, without thinking about what is best for Swift (the language). Python's rise is thanks to it being able to integrate diverse communities (desktop, backend, scientific). Differentiable programming is definitely the future for scientific computing, and as a Machine Learning Engineer I use gradient-based algorithms on a daily basis now. I've been using S4TF and I think integrating this into Swift is THE solution for scientific computing; Swift has the chance to become this.

The real question is: do you want Swift to remain a mobile-only language?

Avi · September 8, 2019, 2:30pm

Your comment reads like a lot of confirmation bias. You do this all day, so everyone will be doing it all day. Most programs require simple logic that doesn't necessarily use much more than basic arithmetic. It's not likely this will change anytime soon, and there's nothing wrong with niche languages. Consider Matlab or R.

cgarciae · September 8, 2019, 2:43pm

@Avi I didn't say "everybody", in fact what I am saying is: please consider including features that other communities (aka not everybody) will use.

anandabits · September 8, 2019, 4:39pm

I'm really excited to see this work start to make its way into the official SE process. I can't wait to see the applications for this the Swift community develops in the coming years.

Thank you for all the hard work you're doing to make Swift a great language in new domains! It is important for Swift to continue to grow beyond its initial domain of building apps for Apple platforms. I can't imagine a better way to do that than giving Swift a unique and innovative capability that is extremely relevant to the industry today. Even better, this feature targets a domain focused on high performance numerical computing. Making Swift a great language for numerical computing will benefit the whole community by letting us do more without leaving the language (especially since leaving the language often means dropping down to C and losing memory safety).

Language vs library

Since there are no precedents for a language with first class automatic differentiation it is no surprise that there is some pushback on deep compiler integration. This is especially the case given Swift's relative weakness in the area of metaprogramming and absence of support for library-defined code transformations.

While it may be possible to move some parts of this feature into a library, it looks to me like requesting that the entire feature live in a library isn't reasonable given the stated goal. The design is deeply integrated with the language and the type system. This integration is essential to providing truly first class AD. As an example, it is not at all clear to me how a library could replace @differentiable function types without imposing a significant burden on users of AD, especially given the carefully thought out subtyping relationships of such functions.

I would be interested in seeing the authors explore this topic a bit deeper. In an ideal world, what metaprogramming features would you like to see in Swift to support AD and how would that influence the design of this proposal? What parts do you think could ideally live in a library and which features really must be implemented in the language itself?

I agree with the concern others have expressed: there is a risk of continuing to kick metaprogramming down the road if, when we encounter large concrete use cases for metaprogramming features, we put special case features in the language instead of designing and implementing those metaprogramming features, on the theory that the special cases might someday be replaced with a library solution built on top of them. We should be cognizant of this risk and honest about how far we go down this road before we start prioritizing the metaprogramming tools Swift needs in order to enable library developers.

AD beyond ML

Other pushback appears to be due to the roots of this feature in the ML domain and a sense that the feature won't really be general purpose in practice. It's natural that many of the concrete examples in the proposal are derived from the domain the team building it is focused on. This gives the proposal an emphasis on ML that I don't believe is representative of how the Swift community will use the feature if it is accepted. Of course it will be heavily used in the ML domain, but it will also be used in interesting ways in many other domains.

With no precedent in other languages we aren't able to draw on concrete examples that have been developed elsewhere in other domains. We have to use our imaginations. I think the proposal does a good job of describing domains beyond ML where AD will be extremely useful, such as graphics and scientific computing. But pointing in a direction is much different than providing concrete examples. I think the authors could make a stronger case for the general utility of the feature by fleshing out even just a few simple examples in other domains.

I also encourage people pushing back to think not only about their own code, but also about the libraries and frameworks that might be interesting to them which might take advantage of AD. We all benefit (at least indirectly) when Swift becomes a better language for writing libraries.

Additional language integration

The proposal does not discuss any interaction of AD with Swift's mutability model. There is a close relationship between a function with the signature (Float) -> Float and one with the signature (inout Float) -> Void. Is there a reason AD supports the former and not the latter? Is deeper integration with the mutability model something that could be explored in the future?
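The relationship between the two signatures can be sketched in plain Swift with no differentiation features involved. The function names below (`doubled`, `doubleInPlace`, `makePure`) are hypothetical and chosen only for illustration.

```swift
// A pure function: takes a value and returns a new one.
func doubled(_ x: Float) -> Float {
    return 2 * x
}

// The same computation expressed as an in-place mutation,
// i.e. a function of type (inout Float) -> Void.
func doubleInPlace(_ x: inout Float) {
    x = doubled(x)
}

// Hypothetical helper: recover a (Float) -> Float function from any
// (inout Float) -> Void function by mutating a local copy.
func makePure(_ mutate: @escaping (inout Float) -> Void) -> (Float) -> Float {
    return { x in
        var copy = x
        mutate(&copy)
        return copy
    }
}

let viaInout = makePure(doubleInPlace)
print(doubled(3), viaInout(3))  // both forms compute the same value
```

Since each form can be mechanically rewritten into the other, it seems plausible that a derivative defined for one could in principle be carried over to the other, which is the gist of the question above.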

Similarly, the proposal mentions the possibility of providing a conditional conformance for Result to Differentiable. There is a close relationship between (Float) -> Result<Float, E> (which could be differentiable when E: Differentiable) and (Float) throws -> Float. If typed errors are added to Swift in the future, would it be reasonable to consider supporting differentiation of throwing functions when the error type conforms to Differentiable? The utility of this is not immediately obvious so feel free to consider it a hypothetical question.
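As a small illustration of that correspondence, the sketch below converts a throwing function to a Result-returning one and back using the standard library's `Result(catching:)` initializer and `get()` method. It uses the untyped `Error` existential, since typed errors are only hypothetical here; `DomainError` and the `checkedSqrt` functions are made-up names.

```swift
// Hypothetical error type for the example.
struct DomainError: Error {}

// A throwing function of type (Float) throws -> Float.
func checkedSqrt(_ x: Float) throws -> Float {
    guard x >= 0 else { throw DomainError() }
    return x.squareRoot()
}

// The same function in Result form, via Result(catching:).
func checkedSqrtResult(_ x: Float) -> Result<Float, Error> {
    return Result { try checkedSqrt(x) }
}

// And back to a throwing function, via Result.get().
func checkedSqrtRethrown(_ x: Float) throws -> Float {
    return try checkedSqrtResult(x).get()
}

switch checkedSqrtResult(-1) {
case .success(let value): print("value:", value)
case .failure(let error): print("failed with:", error)
}
```

The round trip loses nothing, which is why a Differentiable conformance on Result naturally raises the question of differentiating throwing functions.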

Minor questions

If a non-Differentiable or a let stored property is not marked with @noDerivative, then it is treated as if it has @noDerivative and the compiler emits a warning (with a fix-it in IDEs) asking the user to make the attribute explicit.

Why is this a warning rather than an error? It seems hard to justify impacting behavior with a warning. We don't do that anywhere else in Swift (afaik).

Properties in the synthesized TangentVector have the same effective access level as their corresponding original properties.

Can you clarify what you mean by the same effective access level when the original property was declared private? The original property would be private to the Differentiable type within the declaring file. There is no way to spell that scope of visibility on a property of a different type. Would the property be private to the synthesized TangentVector, fileprivate (so it is also visible in the scope of the Differentiable type in the declaring file), or would it be a Voldemort scope of some kind?

mdlockyer · September 8, 2019, 4:47pm

While reading this thread, I can't help but get the impression that the size of the potential user base for this is being significantly understated. Judging from the discussion here one might assume this is only useful for a few hundred people or something. The scientific/ML world is huge and is one of the fastest growing software fields. We are talking about potentially millions of users who could benefit from this going forward. Who do you think is driving all this growth in Python? I think it would be foolish to miss the opportunity to extend Swift as the scientific successor to Python. The data science community desperately needs to migrate to a high performance language and having first class support for the underlying maths they use could be the catalyst for that migration.

hooman · September 8, 2019, 5:32pm

I am +1. With this reservation about the implementation:

cgarciae · September 8, 2019, 6:03pm

The MLIR team is already working on metaprogramming for Swift (although for another purpose):

I am not an expert on language stuff, but I think it might be a bit unrealistic to ask for this to be implemented on top of a (non-existent) metaprogramming language feature, given it's taken 1.5 years to get this right; it seems differentiating through control flow is especially tricky. Although it would be nice to hear from the authors about this.

Jean-Daniel · September 8, 2019, 8:44pm

I think that while ML will probably be used by millions of users, most of the software that provides ML functionality is just built on existing ML frameworks and doesn't need low-level primitives like differentiable programming.

That does not mean that we should not include differentiable programming in the language, if only to help the implementers of these frameworks, but I don't think that it will be effectively used by as many users as you think.

mdlockyer · September 8, 2019, 8:59pm

I want to preface what I'm saying with a few thoughts. I am an ML engineer by trade so I am certainly biased here; however, I come from a background in software engineering before getting into ML so I see Swift as more than a potential replacement for Python (although I would love nothing more). I have spent the last few years patiently waiting for Swift to mature into a strong general purpose language, and I firmly believe it would be amazing as such. I am not a contributor to Swift, only a loyalist that has used it for iOS/macOS dev work and someone who wants to see it succeed outside that scope.

Having said that, it's going to be a long road getting there, if at all. Right now Swift is just another option in a pool of a dozen or so languages fighting to gain traction. For as awesome as Swift is, there's nothing ground-breaking about it that truly makes it stand out as a GP language. I understand the desire to protect and serve the existing users, but adhering too strictly to that cause will undoubtedly ensure that Swift never breaks out of the "Oh yeah, Swift is that iOS language right?" rut it is in currently. Swift needs something that none of those other languages have to differentiate it (maybe a little pun intended) and I can't think of a better thing than what is being proposed here.

The scientific/ML community is the perfect set of users to help push Swift into being a GP language and I think the Swift community should be excited about providing them with such an incredible opportunity to build new tools in a way they couldn't before. They have a broad set of use cases and they are deeply cross-platform. I have colleagues that use Windows, Mac, and Linux, and most of them use a mixture of those in their daily lives (macOS/Windows for dev, Linux for training/production). Embracing scientific/ML (and all the other crazy stuff that could be done with baked-in differentiation) would most likely lead to an explosion in development and use for Swift: new libraries, tooling, and resources, as well as higher visibility even to those outside scientific/ML. Python's growth is evidence of this effect.

Adding these features and embracing scientific computing doesn't have to only benefit the people inside that field. I believe it could benefit the greater Swift user base directly and indirectly.

mdlockyer · September 8, 2019, 9:14pm

Certainly Swift is not going to have a million new users overnight. And I'm really coming from a "total addressable market" perspective with that number.

It definitely is possible to have a lot of people begin using Swift for ML in the future though. With first-class diff in Swift, the development of ML frameworks like Swift for TensorFlow (and certainly new ones that follow) will be dramatically easier and more painless. And once these frameworks mature, I think the Python users will be very interested. Look at the momentum PyTorch has behind it. Most of that simply comes from just having a really nice clean API (it's why I switched to PyTorch from TensorFlow). If all these people are beginning to use PyTorch because it made their life a little easier, I think the performance and flexibility Swift frameworks could offer would be equally enticing.

woolsweater · September 9, 2019, 2:03am

This is needlessly accusatory and divisive. No one reading these forums wants Swift to fail. (Least of all the Apple-platform developers who have already put years into using it.)

If the language can support more users in other fields and disciplines, that's wonderful. The concern is whether this specific proposal comes at the expense of both existing and future uses. The proposal wants to add various things to the core language. Is that going to negatively affect future changes (including those that might be wanted by some new users from yet another field)? For example, there are obvious generalizations to some of the mechanisms that are proposed -- is adding them now in this specific form going to make it harder to make those generalizations later?

Consider dynamic member subscripts: this was proposed because some folks wanted better Python bridging to support their work. But it wasn't added as "Python bridging for TensorFlow": we got a really nice, generally applicable feature.

"Is this proposal the right way to support this case" is not the same as "this case should not be supported".

cgarciae · September 9, 2019, 4:11am

Sorry if it read that way, I should've quoted specific comments so as not to generalize. My concern was that there are comments rejecting the proposal on the basis that it is not useful for their work.

Troy_Harvey · September 9, 2019, 4:21am

Not at all. I see the conflict here as anchor bias, between the systems programmers wanting a "better C++" and the data scientists wanting a "better Python". They are different backgrounds, but inevitably the same future.

My point is there is a good chance the data science side will win the larger audience, as more people need to solve data problems than code apps. The authors offered many use cases, as did I. It is neither obscure nor domain specific to ML. As they say, "it's just math"; it has no domain.

Your premise however is that they should, but I’m not seeing a lot of evidence for that.

No, my premise is we don't, because 50 years of programming languages didn't support it. So we did the only thing we could. People are practical.