Differentiable Programming Mega-Proposal

Karl · September 13, 2019, 1:50pm

That's a good point, and something I wanted to bring up as well. I'm not sure if it fits into the language - currently, we synthesise member-wise implementations of things like Equatable and Codable without an attribute if an implementation is not given. Is this one really necessary, or should it follow that precedent and do things member-wise by default?

Letan · September 13, 2019, 2:02pm

@memberwise was introduced because of this pitch about synthesis of AdditiveArithmetic, if I'm not mistaken. That pitch was not about Differentiable exactly, but the same rationale applies.
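To make the precedent concrete, here is a minimal sketch in plain Swift. The `Point` type is purely illustrative and not from the pitch. The `Equatable` conformance is synthesized member-wise today with no attribute, while the `AdditiveArithmetic` conformance is written out by hand to show the kind of member-wise code the pitch would derive.

```swift
// Illustrative type; `Point` is an assumption, not something from the pitch.
struct Point: Equatable {
    var x: Double
    var y: Double
    // No `==` is written here: the compiler synthesizes a member-wise
    // comparison because every stored property is itself Equatable.
}

// Hand-written member-wise conformance, i.e. the boilerplate that
// derived conformances would remove.
extension Point: AdditiveArithmetic {
    static var zero: Point { return Point(x: 0, y: 0) }

    static func + (lhs: Point, rhs: Point) -> Point {
        return Point(x: lhs.x + rhs.x, y: lhs.y + rhs.y)
    }

    static func - (lhs: Point, rhs: Point) -> Point {
        return Point(x: lhs.x - rhs.x, y: lhs.y - rhs.y)
    }

    static func += (lhs: inout Point, rhs: Point) { lhs = lhs + rhs }
    static func -= (lhs: inout Point, rhs: Point) { lhs = lhs - rhs }
}

let a = Point(x: 1, y: 2)
let b = Point(x: 3, y: 4)
print(a == Point(x: 1, y: 2))  // uses the synthesized ==
print(a + b)                   // uses the hand-written +
```

Every member is handled the same way, which is why synthesis without an extra attribute looks like a natural fit.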

David_Sweeris · September 13, 2019, 2:23pm

bartchr808:

WRT the claim of inefficient code in the section on Symbolic differentiation, is there a source on that? I don't just mean "Mathematica benchmarks" or the like; that section seems (to me, anyway) to imply that symbolic differentiation is inherently slower, not just that whatever implementations the authors are familiar with are slower. If the former is what's being claimed, I'm not aware of any proof.

The problem with symbolic differentiation comes from "expression swelling" and not working on control flow. Here are a couple of great sources that are publicly available about it. I couldn't find an exact proof as to why this is true, but hopefully this clears it up.

Slide 12

Page 3 under "AD is not symbolic differentiation"

Slides 4-8

Page 2, second paragraph of introduction
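As a rough illustration of "expression swelling" (not a proof, and not taken from the linked sources), here is a toy sketch in plain Swift. All names (`Expr`, `derivative`, `size`, `power`, `Dual`, `powerDual`) are made up for this example. It differentiates x * x * ... * x symbolically with the textbook rules and no simplification, then computes the same derivative with forward-mode dual numbers, which run straight through an ordinary loop.

```swift
// Minimal expression tree over one variable.
indirect enum Expr {
    case x
    case constant(Double)
    case add(Expr, Expr)
    case mul(Expr, Expr)
}

// Textbook symbolic rules, with no simplification afterwards.
func derivative(_ e: Expr) -> Expr {
    switch e {
    case .x: return .constant(1)
    case .constant: return .constant(0)
    case let .add(a, b):
        return .add(derivative(a), derivative(b))
    case let .mul(a, b):
        // Product rule: both a and b are copied into the result.
        return .add(.mul(derivative(a), b), .mul(a, derivative(b)))
    }
}

// Number of nodes in the tree.
func size(_ e: Expr) -> Int {
    switch e {
    case .x, .constant: return 1
    case let .add(a, b), let .mul(a, b): return 1 + size(a) + size(b)
    }
}

// x * x * ... * x with n factors, built left to right.
func power(_ n: Int) -> Expr {
    precondition(n >= 1)
    var e = Expr.x
    for _ in 1..<n { e = .mul(e, .x) }
    return e
}

// Forward-mode AD: carry the value and its derivative together.
struct Dual {
    var value: Double
    var tangent: Double
}

func * (lhs: Dual, rhs: Dual) -> Dual {
    return Dual(value: lhs.value * rhs.value,
                tangent: lhs.tangent * rhs.value + lhs.value * rhs.tangent)
}

// Same computation as `power`, evaluated on dual numbers in a plain loop.
func powerDual(_ x: Double, _ n: Int) -> Dual {
    precondition(n >= 1)
    let seed = Dual(value: x, tangent: 1)
    var result = seed
    for _ in 1..<n { result = result * seed }
    return result
}

for n in [2, 4, 8, 16] {
    // Original tree size vs. size of its unsimplified symbolic derivative.
    print(n, size(power(n)), size(derivative(power(n))))
}

let d = powerDual(2, 5)
print(d.value, d.tangent)  // 32.0 80.0
```

In this toy the original tree has 2n - 1 nodes while the unsimplified derivative has n² + 3n - 3, and the dual-number version does n - 1 multiplications. A real computer algebra system simplifies and shares subexpressions, so this only shows where the swelling comes from.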

Thanks!

Spencer_Kohan · September 13, 2019, 2:33pm

I agree with you that Swift's cross-platform story is sorely lacking, but in terms of using S4TF it's pretty trivial to get it up and running in the context of a Docker container. It's no more convoluted than getting a Python workflow set up with pyenv/anaconda etc.

Spencer_Kohan · September 13, 2019, 2:51pm

Big +1 from me on this proposal. As someone who works on simulations, this would improve my QOL and add a ton of power to the language, and it seems to come with few or no drawbacks for users for whom the feature is not relevant (i.e. it should not affect compile times if you simply don't use it).

With respect to the concerns about the priority of metaprogramming, I don't necessarily think it's putting the cart before the horse to implement a proposal like this before giving attention to general metaprogramming tools in the language (which I agree are currently missed).

It's clear that a lot of work has already been put into this proposal and it is already quite clearly defined, and IMO it doesn't make sense to put something like this which is "ready to go" on pause in order to flesh out another massive set of language features which may require months or years of refinement. IMO that's letting the perfect be the enemy of the good, and it's a great way not to make progress in the language.
I am sure that this feature, if implemented, would be tremendously elucidating with respect to the design and implementation of metaprogramming in the future. IMO it's always valuable to have tangible, non-trivial use cases implemented for a feature like metaprogramming in order to tease out the non-obvious requirements.

cgarciae · September 13, 2019, 7:19pm

I would encourage people who are hesitant about this proposal to actually check out its implementation, that is, the tensorflow/swift repo. This isn't just a proposal idea: you can install it and follow its tutorials. It even comes with the fastai course, where Jeremy and Chris talk about this for 4 hours, plus tens of example notebooks, or you could just open the Jupyter notebook on Google Colab (it can't get easier than that).

I think many are just underestimating the scope and ambitions of Swift for data science in general, and nobody is to blame: Swift has been a "niche" language for the Apple ecosystem (even Vapor hasn't been able to grow the backend ecosystem much), hence these alien features seem hard to digest. I think people should take a step back and think more strategically about the opportunity that having this unique feature presents for Swift in 10 years' time.

Many people here compare this with the usefulness of other features that power e.g. SwiftUI. Just so you know, I don't own any Apple products and know nothing about iOS development; I've been using Swift on Ubuntu + VSCode + sourcekit-lsp, so in theory I could say "yeah, this function builder stuff is interesting, but I am not going to use it, -1". In fact I love the idea of function builders and I hope Swift grows enough so I can someday maybe do some UI stuff with it. I do ML in my day to day but I also get excited about frontend stuff even if I am not going to use it much.

The idea is sound, the changes are minimal (a protocol + 3 attributes), and the impact for the future is huge (it opens Swift for AI, which would also grow the Linux and backend ecosystems, because models have to be deployed). Please look at MLIR, SwiftAI, SwiftRL, and OpenSpiel by DeepMind; Swift is starting to get used for AI, if that is news to you. This should be a +1 with modifications at most; I don't know how rejection is being considered.

David_Sweeris · September 13, 2019, 7:23pm

True. OTOH, what happens if/when we later add sufficient metaprogramming such that all this special code is no longer needed for this proposal’s functionality, but the resulting design for whatever reason isn’t either source or binary compatible (or both, I suppose) with code written to this proposal? Do we break compatibility? If so, why is metaprogramming a strong enough reason while other, arguably more widely-applicable proposals are still bound by compatibility requirements? Do we instead make the metaprogramming implementation special-case everything to maintain compatibility with this proposal? Or maybe we just maintain the proposal’s code and the metaprogramming code and just make sure this proposal’s code tries to parse it first?

Once we have metaprogramming, unless we can ensure it can be implemented in a manner that meets our compatibility requirements, there may not be a clean solution for having both.

Edit: Or at least that’s my worry.

mdlockyer · September 13, 2019, 9:07pm

This is really what I was getting at. It's a pain. I have an older MacBook that I wanted to use as a dev machine, but it doesn't support the new Xcode, so I can't. It runs fine and is perfectly capable of the workload, but essentially Apple still decides whether or not I can run the latest version of an open source language. I should be able to install the latest version of Swift with an installer (like Python) or just brew install swift regardless of what version of an IDE I have.

elhoov · September 14, 2019, 8:19am

Kind of off topic, but I couldn't agree more. If the OS X specific parts can't be distributed separately, then at least we should have the ability to download a server-side Swift toolchain that command-line executables can be built with, without needing Xcode. I'm currently using Swift 5.1 in Docker on a MacBook Pro, which is kind of ridiculous, because I can't run 5.1 without downloading a 5GB beta version of Xcode.

Paul_Cantrell · September 15, 2019, 1:48am

Cristian, I suspect this was directed in part at me, because I wrote this:

It seems my point here got lost. I’ll try to clarify it.

Function builders were built around one specific framework in one specific domain. Yet the way they’re laid out opens the doors to all sorts of eDSL applications distant from the original problem. Function builders are general, repurposable, engineered to let the feature’s problem domain grow in ways the feature’s designers couldn’t have anticipated.

The proposal at hand is about derivatives. It doesn’t generalize to anything other than derivatives. The only unanticipated uses it serves are unanticipated uses of derivatives.

And yes, I get it, derivatives are useful in a great many applications. So are, say, trig functions and complex numbers — but they don’t get specially privileged language features, because the language provides machinery flexible enough to present them using more general language constructs.

But … derivatives are so important! Don’t they deserve their own specially privileged corner of the language itself?

Well, sure, maybe, for now! But here’s the thing: the particular way the chain rule lets derivatives propagate through ASTs and the type system is not a pattern magically unique to derivatives and derivatives alone. There are doubtless other interesting applications with some kind of chain-rule-like “recursive transformation” behavior that could benefit from the underlying machinery of this proposal.

There is an abstraction here yearning to emerge. That’s what my function builders comparison was about: they found that larger abstraction for SwiftUI, and it’s worth trying to find it here.

I’m not saying to reject or even substantially delay the proposal. I’m saying just please, as a part of this proposal, let’s think about how this might generalize in the future, both (1) to lay the groundwork for that future context and (2) to make sure differentiable functions have a chance of making sense in a future world where they’re migrated to this hypothetical more general feature and no longer entirely baked into the compiler itself.

Spencer_Kohan · September 16, 2019, 8:49am

I can understand the desire not to over-specify the language and have a lot of "one off" features that cater to specific use-cases, but I think there's also a danger in trying to find an abstraction for the sake of abstraction.

In some sense it's always possible to go one level up, and to solve the more general case than you have at hand. But what can and does happen with this kind of pure abstraction, when not informed by concrete use-cases, is that you end up serving a general case that doesn't end up mapping well to your future needs, because you were just imagining what those needs might be in the first place.

Now, if someone has a second concrete use-case right now for something similar to automatic differentiation, that might be worth considering; or if there are some pieces of the implementation which could easily be generalized (@rxwei and others who've worked on this might be able to speak to that), then I think it's worth having that discussion.

However, IMO it's better not to reach for alternate use-cases with little intrinsic motivation behind them only for the sake of finding another abstraction which might be applicable. I think it's perfectly fine to add just this feature to the language, which serves real users right this minute and opens up a wide variety of mathematical applications in the future. IMO the right time to think about abstraction is when another concrete use-case emerges from the community which can use some shared machinery from this feature.

DeFrenZ · September 16, 2019, 9:05am

That's when you don't have to care about compatibility...
The fear here is that with Swift's source and ABI stability constraints, once this feature is in place it could seriously hinder further development on the hypothesised abstraction.

Dante-Broggi · September 16, 2019, 10:51am

I think that adding this ~now is acceptable only if it is explicitly not ABI stable, and is only available when one imports Differentiation. It would be, effectively, a library feature, just faked.

Incidentally, I think that the generalized feature is also a direct generalization of “function builders” as well: ExpressibleByFunctionLiteral & callAsFunction & generics over function type parameter count (either via variadic generics, or non-type generics).

ExFalsoQuodlibet · September 16, 2019, 1:34pm

I'm super happy about this proposal, and I 100% agree with people like @cgarciae who are excited about the strategic advantage that something like this gives to Swift, as a vehicle to expand interest in Swift and, subsequently, its user base. So, definitely a +1 from me.

I also share the 5-year-old concern that something more general could be done, but honestly, after having pointed out for years that Swift is missing major abstraction primitives, I don't really care anymore. There are plenty of excellent proposals, happily accepted with great support, that contain a "future directions" section with conditionals like "if Swift had this feature" or "when this will be implemented" that are, honestly, not particularly useful: things evolve, requirements change, and if there's a chance to do something useful right now, that could be transformative in userland without breaking the compiler, so be it.

I think that the feature added by the proposal is extremely forward-thinking, and it's the result of a carefully considered long-term plan, rather than confronting a necessity that exists right now; this also means that considerable work was done to assess a (currently) specific, moderate use case instead of working on more general things, which causes inevitable friction with people whose "fringe" requests were many times deemed too specific and not urgent (and I'm one of those people). But I want to be positive on this and, even if I'm unable to see how I could use @differentiable in my current work, who says I won't see that when the feature is readily available and explorable? I'm confident.

Lantua · September 18, 2019, 5:00am

We should reframe this thread. It quickly devolved into a "who needs this!?" vs "heck, I do!" discussion.

Many have said that the proposal is needlessly niche, without much concrete suggestion as to the direction in which it could be generalized. We could do function transformation, but... How could it work in the current type system? What do the argument/return types look like compared to the original function's? What are the relations between the original and transformed functions? There isn't even a syntax-level discussion. What would the declaration look like? What would the call-site look like?

The other side mostly provides anecdotes about how it would improve their current work, but says little (if anything) about the current design. Could it be shipped as is, or are there refinements to be made?

It feels stagnant. It's not moving, either toward rejection or refinement.

I want to discuss the in-memory representation. Why would differentiable functions be a subtype? Can't they compile down to a tuple? What if we represent them in memory as functions with a shared context instead? How could we come up with a (somewhat) future-proof call-site syntax? What did they try? What did they ditch, and why? There's so, so, so, so much I want to discuss, but it can easily be drowned out by the aforementioned debates. In fact, I just realized I missed many of those points when I went back and tried to re-read the entire discussion.

I feel that the framing is partially at fault. It's being presented as a proposal, but the sheer scale makes it unfitting. It introduces new (implicit) function subtyping, adds new annotations, and adds a synthesizer for said annotations. Even if we all agree to refine this proposal, discussion of any part of the proposal easily competes with other parts, as they all currently go through the same thread.

Perhaps it fits better as a Manifesto. Or perhaps everybody already thinks it's a high-level manifesto and left me out.

jawbroken · September 18, 2019, 5:29am

Yes, as covered here:

The established name in Swift evolution for a document that is intended to be divided into manageable proposals for review is “manifesto”. I have various nitpicks and suggestions but I am waiting for the individual threads to discuss them.

Lantua · September 18, 2019, 12:05pm

Ahh, oops.

porterchild · September 24, 2019, 4:29am

+1 for me, I can see several applications of differentiable programming for my current Swift work in simulation, autonomous systems, networking, and of course machine learning.
As far as I'm aware, Swift would be the first language in which this math becomes so easy to implement that it turns into a general optimization and problem-solving tool that's easy for me to reach for.

Chris_Lattner3 · September 25, 2019, 10:32pm

Hi David,

I'm sorry for the delay in responding to this; I missed it as it flew by.

I can't "prove" that this feature cannot be implemented in a reasonable way with some metaprogramming feature. I can just say that I am not aware of any reasonable metaprogramming system that is likely to be adopted in Swift that would allow a first class implementation of AD along the lines of this proposal.

There are two possible explanations for this: 1) either there isn't a reasonable metaprogramming feature, or 2) I'm just unaware of it :-). I welcome exploration about this, because while we (the S4TF team) have sunk a lot of time and thought into metaprogramming related features and exploration (e.g. see Eugene and Alex's recent talks), I'm not aware of something that could provide this level of language integration.

The issue here isn't simply "code generation" for the backward path. We want first class type system integration, as well as SIL level and IRGen level integration. Notably, the proposal extends function types to track the VJP for the function - this means that function pointers get a different representation, as do witness table entries, vtable entries, and other things that touch function types. We also want to provide fine grained (native feeling) customization for VJPs, as that is one of the most common requests by advanced users.
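For readers who haven't met the term, here is a hedged, library-level sketch in plain Swift of what "a function that carries its VJP" means. It is not the proposal's representation: `DifferentiableFunction`, `compose`, and `derivative(at:of:)` are hypothetical names for this example only, and everything here is an ordinary value with none of the type system, SIL, or IRGen integration described above.

```swift
// Hypothetical stand-in for a function bundled with its
// vector-Jacobian product (VJP).
struct DifferentiableFunction {
    // Returns the result together with a pullback that maps an
    // output cotangent back to an input cotangent.
    let vjp: (Double) -> (value: Double, pullback: (Double) -> Double)

    func value(at x: Double) -> Double {
        return vjp(x).value
    }
}

// Chain rule: run the forward passes in order, then apply the
// pullbacks in reverse order.
func compose(_ outer: DifferentiableFunction,
             _ inner: DifferentiableFunction) -> DifferentiableFunction {
    return DifferentiableFunction(vjp: { x in
        let (y, innerPullback) = inner.vjp(x)
        let (z, outerPullback) = outer.vjp(y)
        return (value: z, pullback: { v in innerPullback(outerPullback(v)) })
    })
}

// Derivative of a scalar function: seed the pullback with 1.
func derivative(at x: Double, of f: DifferentiableFunction) -> Double {
    return f.vjp(x).pullback(1)
}

// x -> x * x, with a hand-written ("custom") VJP.
let square = DifferentiableFunction(vjp: { x in
    return (value: x * x, pullback: { v in 2 * x * v })
})

// x -> 3 * x
let tripled = DifferentiableFunction(vjp: { x in
    return (value: 3 * x, pullback: { v in 3 * v })
})

let f = compose(tripled, square)   // x -> 3 * x * x
print(f.value(at: 2))              // 12.0
print(derivative(at: 2, of: f))    // 12.0
```

The catch with a wrapper like this is that an ordinary `(Double) -> Double` cannot be passed where a `DifferentiableFunction` is expected, and nothing checks that a pullback matches its function.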

AFAIK, eliminating the type system integration would eliminate the benefit of static checking from the proposal - pushing us down the same lines as the dynamic ("tape based") implementations, and (in my opinion) undermining the fundamental advances of this approach. That said, I can't prove that there isn't some other approach. I'd love to hear about it if anyone has a "reasonable" approach to this - where "reasonable" means "likely to be acceptable for the Swift language and likely to be approved by the evolution community".
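As a point of comparison, a dynamic "tape based" implementation can be sketched in a few lines of plain Swift. This is a toy under stated assumptions, not how any particular library does it; `Tape`, `Tracked`, `variable(_:on:)`, and `gradients(of:)` are hypothetical names. Each operation is recorded at runtime and the derivative is recovered by walking the tape backwards, so mistakes such as mixing values from two tapes only surface when the code runs.

```swift
// Records, for every value produced, which earlier values it depended on
// and the local partial derivative with respect to each of them.
final class Tape {
    private(set) var entries: [(parents: [Int], partials: [Double])] = []

    func push(parents: [Int], partials: [Double]) -> Int {
        entries.append((parents: parents, partials: partials))
        return entries.count - 1
    }
}

// A value together with its position on the tape.
struct Tracked {
    let tape: Tape
    let index: Int
    let value: Double
}

func variable(_ value: Double, on tape: Tape) -> Tracked {
    let index = tape.push(parents: [], partials: [])
    return Tracked(tape: tape, index: index, value: value)
}

func + (lhs: Tracked, rhs: Tracked) -> Tracked {
    // Checked only at runtime; the type system cannot see this.
    precondition(lhs.tape === rhs.tape, "operands must share a tape")
    let index = lhs.tape.push(parents: [lhs.index, rhs.index], partials: [1, 1])
    return Tracked(tape: lhs.tape, index: index, value: lhs.value + rhs.value)
}

func * (lhs: Tracked, rhs: Tracked) -> Tracked {
    precondition(lhs.tape === rhs.tape, "operands must share a tape")
    let index = lhs.tape.push(parents: [lhs.index, rhs.index],
                              partials: [rhs.value, lhs.value])
    return Tracked(tape: lhs.tape, index: index, value: lhs.value * rhs.value)
}

// Reverse pass: propagate adjoints from the output back to every entry.
func gradients(of output: Tracked) -> [Double] {
    let entries = output.tape.entries
    var adjoint = [Double](repeating: 0, count: entries.count)
    adjoint[output.index] = 1
    for i in stride(from: output.index, through: 0, by: -1) {
        for (parent, partial) in zip(entries[i].parents, entries[i].partials) {
            adjoint[parent] += partial * adjoint[i]
        }
    }
    return adjoint
}

let tape = Tape()
let x = variable(3, on: tape)
let y = variable(4, on: tape)
let z = x * y + x                  // dz/dx = y + 1, dz/dy = x

let grads = gradients(of: z)
print(grads[x.index], grads[y.index])  // 5.0 3.0
```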

As we've said a few times, our primary goal for S4TF is to use it to incubate advanced language features needed for applications of Swift outside the core iOS/Apple developer ecosystem, and autodiff is a very important part of this. We don't want S4TF to be a fork of the language and ecosystem.

I'm really thrilled to see the discussions on this thread: it is great to see both the support for it as well as the concern about the overall language direction and complexity. These are super critical factors to balance - one of the reasons why I'm a big fan of swift-evolution despite all the challenges we face with it.

-Chris

David_Sweeris · September 26, 2019, 12:58am

Thanks for the thought-out reply! Is SF Scala: Eugene Burmako & Alex Suhan, Swift as syntactic sugar for MLIR what you're referring to here by "Eugene and Alex's recent talks"? I haven't had a chance to watch more than the first couple minutes of it, but I should have time after work tonight.