Formalizing a Numerical/ML Working Group

philipturner · January 17, 2022, 6:04pm

I'm waiting on answers to a few questions about autodiff from those with expertise with the compiler or anything related to S4TF. Do any of you think you could help? My objective is to fix up autodiff so I can build S4TF on the newest nightly and open it up to arbitrary backends besides CTensorFlow (e.g. Metal, Accelerate, DirectX).

Even if you can't help me out right now, I really need to be in contact with people from the Swift community who are just as motivated as me to continue using Swift for ML.

PS: Why is every post on this thread made on the 21st day of a month?

scanon · January 17, 2022, 6:08pm

"Aug '21" is August 2021, not August 21st.

xwu · January 17, 2022, 7:20pm

Darn it, we’ll have to keep posting only on the 22nd of every month now.

binderh · January 17, 2022, 7:55pm

While I am currently considering to move some of my groups‘ algorithm development to Swift, strengths and weaknesses have to be carefully considered. Even in the Julia community, which has much more ML infrastructure compared to Swift, there are some serious discussion concerning the areas where competition with Python-based frameworks, such as PyTorch, and recent developments, such as JAX, can be successful (State of machine learning in Julia - Machine Learning - Julia Programming Language).

In my opinion Swift has a potential strength in terms of deployment, but even there competition is intense (TensorFlow.js | למידת מכונה למפתחי JavaScript).

Therefore, successful community building could/should be based on use cases that community members themselves have. Then a larger framework may or may not emerge, but this cannot be forced. Maybe a (virtual) meeting, where community members present their use cases, could be a starting point.

tjadejong · January 17, 2022, 8:52pm

Hi @philipturner,

I’d really like and am considering to move some of my data science work over to Swift. I have been using Python exclusively for this up until now, but i’m seeing quite some advantages from Swift over Python. One of them strong typing (which on the other hand can also cause some difficulties implementing ML), but I also think the language itself is much more suited to build error free and fast code. I have tried migrating some of my image processing code over to Swift (from c++) and that was already such a joy and so much simpler than c++. I have been thinking of more scenarios and would like to start working them in this year but one big hurdle for me would be the lack of (maintained) packages that would make this possible. A working S4TF would already make it possible for me to train some CNNs in Swift and that involves a large part of my work, so your work there is greatly appreciated. I am not sure I would be able to help there as it seems quite low level, but i am considering writing other packages that might at least provide some part of the puzzle. First of all, however, i would like to look at those parts of my data science work that can already be done fine with the current swift ecosystem. I think more will be possible within the Apple ecosystem but ideally i would like to be able to do my work on swift for linux,

Kind regards and big thanks for your work,

Tim

philipturner · January 17, 2022, 8:54pm

My primary objective and use case is to resurrect Swift for TensorFlow. Whether Google accepts or rejects my request for resurrection, I'll greatly benefit from the community's help. If it succeeds, it will be because so many people put momentum behind the proposal and showed Google that there is a reason to bring it back.

If it fails, I'll be all alone and lacking in resources to maintain a successor to S4TF. Having people contribute to and validate my work will decrease the chance that it dies off in the future.

Therefore, successful community building could/should be based on use cases that community members themselves have. Then a larger framework may or may not emerge, but this cannot be forced. Maybe a (virtual) meeting, where community members present their use cases, could be a starting point.

I'm open to other use cases, especially if it lets me talk to more people, but they will be a second priority. The scoped-down proposal for autodiff benefits my effort because it's locking in the connection between differentiation and machine learning. If autodiff isn't used in a resurrection of Swift for TensorFlow, then it will contradict the proposal's purpose.

philipturner · January 17, 2022, 8:54pm

@tjadejong perhaps my work with Swift-Colab will interest you?

tjadejong · January 17, 2022, 8:56pm

@philipturner yes i have been following your progress there. I need to still give it a spin though! Will try in the coming days!

If i understood correctly, it is possible to use Swift in colab now, but how far along is it? What kind of scenarios could we use it for now (asking out of interest, not trying to criticize)?

philipturner · January 17, 2022, 9:03pm

Not for Swift for TensorFlow! But, if you look under some issues on my Swift-Colab repository, you'll find it was used for graphing stuff. Right now, it's just a very accessible way for a beginner to learn Swift. In order to use it for S4TF, we need a Swift release toolchain with autodiff fixed. That's only going to happen if I fix autodiff entirely by the next WWDC, otherwise it will be summer 2023 before you can use S4TF in Google Colab!

Troy_Harvey · January 17, 2022, 9:16pm

A bit of an update on our end. We are still super keen on supporting the efforts of a working group. Our team has been continuing to develop differentiable Swift, focusing on improving performance. We currently manage a Swift development fork for our own use, upstreaming features as they get accepted.

Additionally our team has been building Swift frameworks and tooling for introspection, graph compute, heterogeneous neural networks, constraint solving, run-time kernel generation, and distributed computing. This year we will be focusing on accelerators. Our goal this year is to add an open source product manager to help assist the transfer to the community. We have 30+ engineers working on Swift numerical, ML, server-side, and embedded frameworks — perhaps(?) the largest group targeting non-apple platforms. So we are pretty committed. We have a national lab collaboration starting soon, we'll be diffusing numerical Swift further into the research ecosystem.

As for the marketplace. Swift is a unique target in todays market for what we call "AI for Grownups". For our needs in industrial solutions, the Swift vin diagram would be hard to replace given 1st class differentiability, compiled fast performance, the strong type system, introspection, and strong safety guarantees. Many of these problems are poorly suited untyped python, or the dynamic approach in Julia, or the smaller solution footprint in Rust. Given Swifts unique shape, we will be doubling down on our investment this year on the Swift ecosystem (team, OSS, partnerships).

-Troy

philipturner · January 17, 2022, 9:20pm

Talking about hardware acceleration, I originally got into ML because I wanted to accelerate DL4S with Metal. My plan for opening up S4TF to other backends might align with your work. I'm thinking of a simple API for both eager and graph computation, similar in concept to TensorFlow's PluggableDevice. By itself, it's not strictly tied to ML. What do you think about this?

github.com/tensorflow/swift-apis

Implementation of Metal/Accelerate/DirectX backend

opened 11:52PM - 10 Jan 22 UTC

closed 11:12PM - 21 Jan 22 UTC

philipturner

I'm splitting this into a third issue for organization purposes. This conversati…on stems from https://github.com/tensorflow/swift-apis/issues/1185, but that conversation is going to get crowded with multiple diverging topics otherwise. The following is a comment I copied and deleted from the parent thread (Brad Larson may have been pinged twice as a result): --- Quoting this reply by @BradLarson from way above: https://github.com/tensorflow/swift-apis/issues/1185#issuecomment-994129723 > Broader accelerator support would most likely be possible only through a replacement of the TensorFlow layer with a different underlying runtime. That would open up many options for accelerators that would be challenging to support via TensorFlow. I now see the meaning behind that statement. My goal is to allow acceleration through Metal, Accelerate, DirectX, and custom backends. I'm currently debating whether I should go for PluggableDevice or something entirely custom that removes the dependency on CTensorFlow. 1. Using PluggableDevice would be the easiest option, only if I could get it to work. I'm scanning over TensorFlow's documentation, and there seems to be some pain points with using it for S4TF. One is the `LoadLibrary` function, which must load a file in a Python-like file structure. Another is that it seems to make assumptions that you're using Python TensorFlow. Furthermore, I have to gain a lot of expertise with TensorFlow's C API, which would cost time. Using PluggableDevice means 100% investing in CTensorFlow, so no chance for deep learning on iOS like with [DL4S](https://github.com/palle-k/dl4s). 2. The alternative option is to remove as many connections as possible to CTensorFlow. This is more flexible, but has a greater chance of bugs; I'm maintaining it alone while TensorFlow has dozens or hundreds of maintainers. There are many ops that I can't synthesize (e.g. `cholesky_grad`), so I might fall back to TensorFlow eager for them. I have to assume the backend has a graph compiler, which puts more burden on the backend creator. That's true for Metal/Accelerate and might be true for DirectML. If there's a way to separate X10 from XLA and use a custom graph compiler instead of XLA, then a custom runtime would be preferable to PluggableDevice. One more limiting factor is that I can't personally cache new CTensorFlow/X10 builds (e.g. TF 2.7.0) and download them from SwiftPM. X10 binaries seem to be ~40 MB, but the maximum file upload size to GitHub is 25 MB. Google Drive seems like an even worse option. Removing CTensorFlow entirely means I don't need to store any pre-compiled executables online, and it fits more nicely with SwiftPM. I would rely on the existing 2.4.0 CTensorFlow binaries for Google Colab, Linux with CUDA, and nothing else.

I am interested in learning more about your group and maybe even joining, if it both helps your goals and gives me the resources to complete the new backends for S4TF. I'm entering college next year, so I don't think I can be a full-time employee in the case I do join.

philipturner · January 17, 2022, 9:28pm

I rewrote the Swift Jupyter kernel in Swift instead of Python because Swift is for grown-ups. Removing Python is one very big reason why I’m resurrecting S4TF. Also, it’s that I had a dynamic where my iPhone was more powerful that my Mac for a year (I had an Intel Mac mini). If TensorFlow were written in Swift, I could have used my iPhone for model training.

binderh · January 17, 2022, 10:07pm

Thanks for the update. I take this as a sign that Swift autodiff is not at the danger of software rot due to a lack of users, thus encouraging investment by my group. As a group in academia (with a focus on developing new algorithms for modeling of biomedical data), we are not equipped well enough for developing the foundation, such as in an own Swift autodiff fork, but can contribute implementations of machine learning techniques that hopefully also are interesting to others.

philipturner · January 17, 2022, 10:09pm

@binderh I was wondering whether you had decided to avoid Swift because I was wavering on whether I would devote time to fixing autodiff. I'm glad you stuck around until now!

fan · January 26, 2022, 8:53pm

Just at a side note , SwiftFusion is still available as an example on doing graph optimization with Swift. The missing piece in 2022 is 1) S4TF still not useable in Release mode, and 2) one outstanding PR that is WIP which enables SwiftFusion to be compiled with the main toolchain.

can be easily solved by extracting the Tensor type from S4TF (just implement some shims), and (2) needs a little bit of patching. But both are not difficult to do. Hope this is useful to @binderh .

philipturner · January 28, 2022, 1:58am

can be easily solved by extracting the Tensor type from S4TF (just implement some shims), and (2) needs a little bit of patching. But both are not difficult to do. Hope this is useful to @binderh

@fan my current plan for making new backend for my S4TF fork requires extracting the Tensor type from S4TF. I want to find a generic protocol (named after PluggableDevice from Python TensorFlow) that provides the ideal interface between a frontend and accelerator backend. It can require a method that takes an arbitrary kernel name as a string, similar to how the CTensorFlow API executes operations. Thus, I can redirect kernel calls to MPSGraph and BNNS, getting it to run on iOS just like DL4S.

Retroactively restructuring the existing CTensorFlow backend for TPU and CUDA support might be a hurdle, though. Either way, I have a gut feeling this will be much more than "just implement some shims", at least if you're looking for a long-term solution. If the Swift numerical community comes together and agrees to share a common lightweight Swift package (might be named Tensor instead of PluggableDevice), we could all mix and match each other's frontend frameworks and backends. This would help in the long run regardless of whether I succeed at bringing back S4TF, and make Swift a unique opportunity for cross-platform software.

The package I'm thinking of will be similar in size and function to Swift Numerics and its ElementaryFunctions protocol. It just has new types like Tensor, maybe LazyTensorBarrier, and Device/PluggableDevice (I'm a bit wary of using Device since that name is already taken in S4TF and DeviceKit).

philipturner · January 30, 2022, 4:52pm

@binderh does not having forward-mode differentiation restrict what you can do? I don't know much about biomedicine and how derivatives are used for it.

binderh · January 30, 2022, 7:15pm

As many biomedical scenarios are of the type “many inputs, few outputs”, forward more is not so important.

Troy_Harvey · February 3, 2022, 11:00pm

It would be good to get other PassiveLogic team member thoughts on this question... but here is my view.

Tensors are an unnecessarily narrow type to be overly focused on. I think the notion of tensors in deep learning are largely an implementation detail, that has over time become the jailer of the thinking of what ML should look like. If instead we come from a fully differentiable Swift point of view, within the scope of a richly typed language... shouldn't the goal be a generalized complier for any type, any function... all differentiable? Shouldn't we be able to resolve compute graphs of many types and compile them for many accelerator architectures? My view is Tensorflow was a simple stepping stone, but ultimately an unnecessary part of the puzzle... and honestly was in the way toward solving the larger set of problems.

As an example of new avenues in differentiable computing: We just released an industry open standard called Quantum that defines heterogeneous networks of physically-based compute graphs. This digital twin standard is built on a Swift DSL, runs on a Swift compute engine, and solves autonomous control problems in real-time in a Swift AI engine. We will open source some of the infrastructure this year, after the release of the digital twin standard.

All that being said, we'd love to support pitches for a fixed size array type. But in the end, they are just another type.

philipturner · February 3, 2022, 11:31pm

All that being said, we'd love to support pitches for a fixed size array type. But in the end, they are just another type.

There were a lot of people telling me to “rip the Tensor out of Swift for TensorFlow”. Your insight makes it easier for me to bring S4TF back to life. Instead of having to meet the demands of other users, I can aggressively optimize the Tensor type for my purposes. I still plan to open up the new S4TF to arbitrary backends, and a lot of that is about how Tensor is structured.

P.S. I just fixed 2 AutoDiff bugs and I'm learning the skill of debugging the Swift compiler much faster than I anticipated. Maybe this is the sign of something good for finishing AutoDiff and the proposal for Swift evolution?