Started to work on swift-diffusion, a port of Stable Diffusion in Swift

Hi, all,

I started to work on this a few days ago. Now the model runs on my computer* and produces matching images against Stable Diffusion: GitHub - liuliu/swift-diffusion

At the moment, it is a bit of a hassle to get it running. The immediate next step is to move the tokenizer off Python, which will make the whole thing "Swift-only". After that will come various performance optimizations, memory-usage optimizations, and CPU / Metal work to finally make this usable on mobile. I estimate at least a month of work ahead to make it mobile-friendly.
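For a sense of what that tokenizer port involves, here is a rough, much-simplified sketch of the byte-pair-encoding (BPE) merge loop that CLIP-style tokenizers use. The names and structure are placeholders for illustration, not the eventual implementation (the real tokenizer also handles byte encoding, end-of-word markers, and caching):

```swift
import Foundation

// Simplified BPE tokenizer sketch: greedily merge the highest-priority
// adjacent symbol pair until no known merges remain, then look up token ids.
struct SimpleBPETokenizer {
  let vocabulary: [String: Int32]  // token string -> id
  let mergeRanks: [String: Int]    // "left right" pair -> merge priority (lower = earlier)

  func tokenize(word: String) -> [Int32] {
    var symbols = word.map { String($0) }
    while symbols.count > 1 {
      var bestRank = Int.max
      var bestIndex = -1
      for i in 0..<(symbols.count - 1) {
        if let rank = mergeRanks[symbols[i] + " " + symbols[i + 1]], rank < bestRank {
          bestRank = rank
          bestIndex = i
        }
      }
      if bestIndex < 0 { break }  // no applicable merge left
      symbols[bestIndex] = symbols[bestIndex] + symbols[bestIndex + 1]
      symbols.remove(at: bestIndex + 1)
    }
    return symbols.compactMap { vocabulary[$0] }
  }
}
```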

For some fun, here is the logo for this project, generated with the prompt: "a logo for swift diffusion, with a carton animal and a text diffusion underneath". It seems to have trouble understanding exactly what "diffusion" is, though :slight_smile:

21 Likes

The Python tokenizer is removed. Now it is "Swift-only".

4 Likes

This is exciting!

I realize this is extremely early, but in its current state, how does performance compare to something like DiffusionBee on an M1?

There is no comparison. swift-diffusion in its current form only supports Linux + CUDA. CPU support will come later, and at that point it can run on Mac / iOS. But to run efficiently, some ops need to leverage the hardware, either as Metal compute kernels or through the neural engine: tinygrad/accel/ane at master · geohot/tinygrad · GitHub DiffusionBee currently uses the MPS backend implemented in PyTorch to run efficiently on M1. I haven't looked too deeply into how the MPS backend is implemented, but I would imagine some Metal kernels plus ANE there.
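For illustration, here is a toy sketch of what dispatching a custom Metal compute kernel from Swift looks like (a simple elementwise scale); this is only meant to show the API shape, not anything from swift-diffusion or DiffusionBee:

```swift
import Metal

// Toy elementwise kernel, compiled from source at runtime.
let source = """
#include <metal_stdlib>
using namespace metal;
kernel void scale(device float *data [[buffer(0)]],
                  constant float &factor [[buffer(1)]],
                  uint id [[thread_position_in_grid]]) {
  data[id] *= factor;
}
"""

let device = MTLCreateSystemDefaultDevice()!
let library = try! device.makeLibrary(source: source, options: nil)
let pipeline = try! device.makeComputePipelineState(function: library.makeFunction(name: "scale")!)

let values: [Float] = [1, 2, 3, 4]
var factor: Float = 0.5
let buffer = device.makeBuffer(bytes: values, length: values.count * MemoryLayout<Float>.stride, options: [])!

let queue = device.makeCommandQueue()!
let commandBuffer = queue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(buffer, offset: 0, index: 0)
encoder.setBytes(&factor, length: MemoryLayout<Float>.stride, index: 1)
encoder.dispatchThreads(MTLSize(width: values.count, height: 1, depth: 1),
                        threadsPerThreadgroup: MTLSize(width: 4, height: 1, depth: 1))
encoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()

// Read back the result: [0.5, 1.0, 1.5, 2.0].
let output = buffer.contents().bindMemory(to: Float.self, capacity: values.count)
print(Array(UnsafeBufferPointer(start: output, count: values.count)))
```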

1 Like

@liuliu note that on the M1 Max at least, the GPU's F16 processing power is higher than the ANE's. Also, the GPU is more programmable and can be accessed with lower latency. Make sure to use MPS and MPSGraph; otherwise, try simdgroup_matrix in Metal Shading Language. That will provide the highest matrix multiplication performance.

Apple went this route with MetalFX temporal upscaling. Most people suspect that it runs on the ANE, but it actually runs entirely on the GPU. It's also restricted to M1/Pro/Max and doesn't run on A14/A15, probably because it needs sufficient GPU F16 TFLOPS.

MPSGraph should be pleasant to use for making neural networks, so I advise trying exclusively MPSGraph at first. But measure the CPU-side overhead, which is massive with MPSGraph, before shipping the final product.
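For reference, a minimal sketch of the MPSGraph API shape, assuming a single matrix multiplication (the placeholder names and shapes are arbitrary, not tied to any particular model):

```swift
import Metal
import MetalPerformanceShadersGraph

// Build a tiny graph: c = a × b for two 2×2 matrices.
let graph = MPSGraph()
let a = graph.placeholder(shape: [2, 2], dataType: .float32, name: "a")
let b = graph.placeholder(shape: [2, 2], dataType: .float32, name: "b")
let c = graph.matrixMultiplication(primary: a, secondary: b, name: "c")

let device = MPSGraphDevice(mtlDevice: MTLCreateSystemDefaultDevice()!)
func tensorData(_ values: [Float]) -> MPSGraphTensorData {
  MPSGraphTensorData(
    device: device,
    data: values.withUnsafeBufferPointer { Data(buffer: $0) },
    shape: [2, 2],
    dataType: .float32)
}

// Feed the placeholders and fetch the product.
let results = graph.run(
  feeds: [a: tensorData([1, 2, 3, 4]), b: tensorData([5, 6, 7, 8])],
  targetTensors: [c],
  targetOperations: nil)

var output = [Float](repeating: 0, count: 4)
results[c]!.mpsndarray().readBytes(&output, strideBytes: nil)
print(output) // [19.0, 22.0, 43.0, 50.0]
```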

Also if you're struggling with performance, don't hesitate to ping me in a GitHub issue and ask for advice. I know the ins and outs of Metal :)

7 Likes

Hi liuliu,
can this work in Xcode? SwiftUI specifically.

I've been updating the repo over the past few days. Now img2img should work, as well as inpainting (or you can call it outpainting, really depends on where the mask is). Both require a text prompt to work (which is weird for inpainting, but I haven't figured out a way to avoid that).
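Conceptually, the mask works something like the sketch below: keep the model's output where the mask says to repaint, and re-inject the original image's (appropriately noised) latent everywhere else. The names here are placeholders for illustration, not the repo's API:

```swift
// Per-step latent blend for inpainting/outpainting (rough sketch).
// mask == 1: region to repaint (keep the denoised latent);
// mask == 0: region to preserve (re-inject the original latent, noised to this step).
func blendLatents(denoised: [Float], originalNoised: [Float], mask: [Float]) -> [Float] {
  precondition(denoised.count == originalNoised.count && denoised.count == mask.count)
  var result = [Float](repeating: 0, count: denoised.count)
  for i in 0..<denoised.count {
    result[i] = mask[i] * denoised[i] + (1 - mask[i]) * originalNoised[i]
  }
  return result
}
```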

As of now, it doesn't work with Apple hardware yet (requires CUDA, therefore, Linux).

What about M2 in the MacBook Air? Can it run there?

@liuliu Are you going to change it to what @philipturner recommends and use MPSGraph to replace CUDA? It's of no use to people in this forum, as we are all using Apple's M-series processors.

Thanks for your work! So where are you running this Swift port? On what hardware platform, and to do what? Just curious.

Yeah, the plan is to support macOS / iOS with enough work. Starting with CUDA is easy as it is proven and I know where to look. Once I get it running on CPU, the work will move to enabling it with MPS (so that I can compare results on macOS).

I am running with Swift 5.6.3 on Ubuntu 20.04 with CUDA 11.7 (any CUDA after 10.2 should be fine), on an RTX 2080 Ti (it should be compatible with other RTX cards as long as they have more than 8 GiB of memory).

How can I help?

1 Like

I am setting up now to validate that the CPU version works on my macOS machine. I need to do some scaffolding so we can port one MPSGraph op over at a time. If you have a Mac with an M1 / M2 chip, it would certainly be helpful when we port MPSGraph over. I am still running Intel macOS, so any MPS work needs to be validated on an iDevice. I will let you know when the scaffolding is done and we are in op-porting mode.
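The scaffolding amounts to something like the sketch below: run each op on the existing reference path and on the MPSGraph path, and check that the outputs agree within a tolerance. The function names are placeholders, not the repo's actual API:

```swift
// Compare a reference (e.g. CPU/CUDA) implementation of an op against a
// candidate (e.g. MPSGraph) implementation on the same input.
func validateOp(
  reference: ([Float]) -> [Float],
  candidate: ([Float]) -> [Float],
  input: [Float],
  tolerance: Float = 1e-4
) -> Bool {
  let expected = reference(input)
  let actual = candidate(input)
  guard expected.count == actual.count else { return false }
  // Element-wise absolute difference check.
  return zip(expected, actual).allSatisfy { abs($0 - $1) <= tolerance }
}
```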

1 Like

I have an M2 MacBook Air (10 core GPU).

I am very curious to know how this M2 CPU and GPU perform compared to the M1 and M1 Max!

Are you going to use the GPU or ANE? Or both to see which is better?

Please let me know how I can help!