Swift for TensorFlow Resurrection: Swift running on Colab again

philipturner · December 20, 2021, 1:27am

This weekend, I've been working on how to sideload Swift on Google Colab (repo: philipturner/swift-colab). Eventually, this will turn into loading Swift for TensorFlow as a Swift package, pre-compiled as a binary target instead of a toolchain. I got to the point where I can pass an arbitrary string of Swift code as a Python string, then compile and run it.

The next step is to install PythonKit into the Colab virtual machine and support SwiftPM. I was exploring the Swift compiler and Swift Package Manager, and came across this tutorial.

After getting it to work on my Mac, I decided to try it out again on Linux. I realized that carrying out the commands in Swift instead of Shell had some advantages (I could write a string literal to a file without learning the Shell command for that), and Google Colab is easier to set up than Docker. Also, it was exciting to see Swift code execute something generally useful in Google Colab!

Below is the link to the Colab notebook. Anyone can make a copy and run the program. The first code block is modified to pull from the save-1 branch of the GitHub repository, which will stay stable unlike the main branch.

import swift

swift.run('''
import Foundation
let fm = FileManager.default

... // do some stuff

func doCommand(args: [String]) throws {
    let command = Process()
    command.executableURL = .init(fileURLWithPath: "/usr/bin/env")
    command.arguments = args
    try command.run()
    command.waitUntilExit()
}

try doCommand(args: ["swiftc", "-D", "DEBUG", "point.swift", "main.swift", "-o", "point-app"])
try doCommand(args: ["./point-app"])
''')

Output:

debug mode
Hello world! 4 20

I can't guarantee that sideloading will ever move beyond passing strings to Python and support the full Jupyter notebook experience. Regardless, sideloading makes it possible to utilize cloud GPUs and TPUs in the new Swift for TensorFlow. I can also use it to validate that the upcoming Metal backend doesn't break CUDA support.

For more context about the Swift for TensorFlow resurrection:

philipturner · December 20, 2021, 6:49pm

I should be able to call Swift functions from Python. I can take the memory address of a Python object via id(object) and spawn a process using the address as an argument. Then, a pre-compiled Swift executable can transform the memory address into a PythonObject using a modified fork of PythonKit.
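As a rough illustration of the address round trip, here is the Python side only. In CPython, id(object) is the object's memory address, and ctypes can turn such an address back into a reference. This is a sketch of the idea, not the Swift code from the repository, and the helper name object_from_address is made up for the example.

```python
import ctypes


def object_from_address(address):
    """Recover a Python object from its CPython memory address.

    Only valid while the original object is still alive; the address of a
    freed object must never be passed here.
    """
    return ctypes.cast(address, ctypes.py_object).value


message = "hello from Python"
address = id(message)  # in CPython, id() returns the object's address
recovered = object_from_address(address)

# The recovered reference is the very same object, not a copy.
print(recovered is message)
```

A Swift executable would presumably do the equivalent of the cast on its side, which is why the object has to stay alive in Python for as long as Swift holds the address.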

This form of bridging is needed to allow subclassing the Jupyter Kernel class, while implementing the kernel's logic in Swift instead of Python. Currently, the Swift Jupyter kernel from @marcrasi is written almost entirely in Python (in the swift_kernel.py file).

philipturner · December 21, 2021, 1:25am

I got to the point where I can call a Swift function from Python! I made a C-compatible Swift function, compiled that into a dynamic library (.so file), then loaded that file using the Python ctypes library. I passed the id of a Python string into the C-like Swift function, and verified that it was a memory address!
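For anyone following along, the Python half of that experiment might look roughly like the sketch below. The library path and symbol name passed to load_address_function are placeholders, not the names used in swift-colab, and a ctypes callback stands in for the compiled Swift function so the snippet runs without a .so file.

```python
import ctypes


def load_address_function(library_path, symbol_name):
    """Load a C-compatible function that takes and returns one pointer-sized integer.

    Both arguments are hypothetical placeholders for a real compiled library.
    """
    library = ctypes.CDLL(library_path)
    function = getattr(library, symbol_name)
    function.argtypes = [ctypes.c_size_t]
    function.restype = ctypes.c_size_t
    return function


# Stand-in for the Swift function: it echoes back the address it receives.
AddressFunction = ctypes.CFUNCTYPE(ctypes.c_size_t, ctypes.c_size_t)
echo_address = AddressFunction(lambda address: address)

text = "a Python string"
returned = echo_address(id(text))

# The value that crossed the C boundary is the same address id() reported.
print(returned == id(text))
```

With a real dynamic library, the stand-in would be replaced by a call to load_address_function with whatever path and exported symbol the Swift build produces.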

The next step is to compile my fork of PythonKit and convert that reference back into a Python string! I'll post the source code for this once I have the two-way interface between Swift and Python fully working.

philipturner · December 22, 2021, 8:14pm

I got PythonKit to compile on Google Colab. All I have left is to make Swift executables accessible to Python, then I can subclass the Jupyter kernel and (hopefully) resurrect full Swift support in Colab.

Michael_Gottesman · December 22, 2021, 8:36pm

@philipturner I have a question for you/sort of a challenge/I don't know if it is possible... but I would think about how the swift project can have a test to make sure that whatever you are fixing still works.

Do you have any thoughts on how we could do that? Otherwise there isn't a guarantee that the breakage will not reappear.

philipturner · December 22, 2021, 8:42pm

I am eager to help you out. Colab support is still unstable, and I'm nowhere near finished. The first paragraph of your reply wasn't very clear due to how it was worded. Could you reiterate that?

Michael_Gottesman · December 22, 2021, 8:44pm

Sorry, my first sentence means that I am unfamiliar with what you are exactly doing and what that would mean in terms of testing it.

For instance, I am not sure if it is possible to test on swift's CI whatever you are fixing. The reason I am saying it is a challenge is that I am challenging you to think if it is possible = p.

philipturner · December 22, 2021, 9:38pm

It is possible to set up a semi-automated test. Since Colab requires a Google account and authorization to run, I don't know whether a GitHub YAML bot could use it. However, you could automate most of the workflow and manually trigger the test on a personal Colab notebook, much like you manually trigger a @swift-ci workflow.

Modify my build script to pull from the nightly build, and add some Swift scripts that you can download into a Colab notebook. You need to be cautious about build times, as you will be kicked off of Colab if one task takes too long. However, what you are thinking of is entirely doable.

Once I have Colab fully functional again (or the best it can possibly get), I'm going to switch to MetalXLA and be fully occupied with finishing it before PyTorch releases their Metal backend. When that happens, you could work on some testing scripts and try them out in Colab.

philipturner · December 31, 2021, 12:20am

Swift-Colab is complete! Several tutorials from Swift for TensorFlow have been tested on it, and the Python unit testing suite has been transformed into a series of Colab notebooks. These future-proof it by allowing you to test specific Swift versions, ensuring Colab support is never dropped again. Furthermore, I can test Swift 5.3 (the last version S4TF worked on) and the toolchain only takes 30 seconds to download because of how fast Google's internal servers are.

Unlike before March 2021, this version of the Swift Jupyter kernel does not come with any libraries built in. You must explicitly import PythonKit and Differentiation. Soon, I'll patch up TensorFlow and allow that too. It will require a special installation command because it takes a very long time to compile and SwiftPM doesn't support pre-compiled Swift binaries yet.

Thanks @Michael_Gottesman for the suggestion to future-proof it. These tests must be manually run, but they don't take too long and I don't expect them to break often.

If anyone is interested in catching up on the effort to resurrect S4TF, here's a good repository for reference: https://github.com/philipturner/resurrection-of-s4tf

Daniel_Mullenborn · December 31, 2021, 1:32am

I didn't expect you to finish so quickly. Just tried it myself, works really great.
So far I have not had any problems.

nikitamounier · December 31, 2021, 6:20am

Congrats man! Super impressive stuff.

robnik · January 1, 2022, 3:56pm

How are you getting Swift to be "interactive"? Is this using LLDB? (That's what I saw in some old Jupyter kernel code.)

I ask because I recently looked into using Swift as an app extension/scripting language. Unfortunately, things seemed very dead, so I gave up and I'm currently experimenting with other languages.

I'd like to stay with using Swift, but I need something like this (basically, eval):

import SwiftCompiler

let code = "... func baz() { ... "
let compiled = try! compiler.compile(code)
let baz = compiled.lookup("baz()")
baz()

It would need to run quickly, and I fear that Swift's advanced type-checking may make that impossible.

philipturner · January 1, 2022, 5:53pm

@robnik at the lowest level, it involves an LLDB type called SBValue. It's accessed through the LLDB Python API, which I called from Swift using PythonKit. The types are converted to either fundamental Python types or a hierarchy of members (e.g. a struct or class). For your purposes, I'd first import the "swift" Python library in Google Colab as outlined in my earlier posts on this thread:

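import PythonKit

// Import the sideloaded "swift" Python module (lowercase, as in the first post)
let SwiftModule = Python.import("swift")
SwiftModule.run("""
... some Swift code
let data = try JSONEncoder().encode(...)
try data.write(to: \(... input filename here))
""")
// Back in the calling process, read the file and decode the result
let data = ... (read from filename)
let jsonDecoded = try JSONDecoder().decode(...)
// jsonDecoded is the output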

Note that the above technique has very high latency and involves spawning a new process. If you study the Jupyter kernel's source code, there's a function called preprocess_and_execute in both the new and old code. It does what I described above, but in a more elegant way. Also, it goes directly through LLDB/REPL without spawning a new process.
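To make the file-based round trip concrete, here is a minimal sketch in Python. It uses a second Python interpreter as a stand-in for the spawned Swift process, and the function name run_and_collect and the file name result.json are invented for the example. The point is only the shape of the exchange: the child writes JSON to a path it is given, and the parent reads it back.

```python
import json
import os
import subprocess
import sys
import tempfile


def run_and_collect(child_code):
    """Run code in a new process and read back the JSON it writes to a file."""
    with tempfile.TemporaryDirectory() as directory:
        output_path = os.path.join(directory, "result.json")
        # The child receives the output path as its first argument.
        subprocess.run([sys.executable, "-c", child_code, output_path], check=True)
        with open(output_path) as file:
            return json.load(file)


# Stand-in for the code that would run in the spawned process.
child_code = """
import json, sys
with open(sys.argv[1], "w") as file:
    json.dump({"sum": 4 + 20}, file)
"""

result = run_and_collect(child_code)
print(result["sum"])
```

Every call pays for process startup plus two trips through the file system, which is where the latency mentioned above comes from.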

Second, I recommend that you try copying some of the code from the Sources/SwiftColab/JupyterKernel directory, which includes preprocess_and_execute and all of the functions it calls. Then, refactor it into the API you described in the above comment.

marcrasi · January 4, 2022, 8:06pm

Hi! I was on holiday and not checking any of my notifications or emails, but I'm back now. I'll read through this thread and see if I know anything that can help.

philipturner · January 4, 2022, 8:51pm

I have already got everything figured out on Google Colab, and I can close out the issue on the swift-jupyter repo. I’m having trouble compiling Swift for TensorFlow, so I would rather have you help me with that if possible.

robnik · January 7, 2022, 4:12pm

Would you agree that in the long run, we need some kind of pitch/proposal to improve interactive Swift, with a runtime API to the compiler and loader? Going through LLDB seems like a hack. I'm also worried that it may be too difficult to create a single system that has both an advanced whole-program type checker (like Swift) and a low-latency interactive programming environment (like Python). But I don't know; I'm curious what the compiler devs think. I've not seen them comment on this topic. Swift playgrounds gave me hope, but in my experience they are buggy.

philipturner · January 7, 2022, 5:00pm

That’s what Swift REPL is for (sort of), although it’s broken on every platform except the built-in toolchain on macOS. I get your point that a runtime API is distinct from the currently available tools. If Colab support withstands the test of time, I think your idea should be brought up on Swift Forums as a serious proposal.

philipturner · May 1, 2022, 9:31pm

If anyone is confused by the large number of deleted comments, I'm transferring them over to the GitHub repository to chronicle its history. They made it difficult to follow the actual discussion taking place, and served better as a log or journal. They're now viewable at Documentation/ForumThreadComments.md.