Swift concurrency and Metal

Lua · May 17, 2024, 10:19am

I am using MetalKit and Cocoa to render my game, but because terrain is procedurally generated and mutable I want to use concurrency to recalculate the meshes in the background so the game doesn't freeze every time something changes or more of the world is generated.

I have strict concurrency checking enabled, which gives me warnings for code like this:

let mouse = parent.window.mouseLocationOutsideOfEventStream

This warns me that I can't access the MainActor-isolated Cocoa API from the MTKViewDelegate. The warning is definitely correct: ignoring it and using concurrency anywhere in my program results in not even the window appearing.
But I do need access to things like the mouse position at the exact time of rendering, for example so the software UI renderer can draw the cursor.

How would I bridge Metal code with a concurrent Swift program? Are there any relevant WWDC sessions I should see (I didn't find any)?

nkbelov · May 17, 2024, 10:26am

Not particular to your question, but typically you wouldn't want to "access mouse position at the exact time of rendering"; you would read out all relevant events / keystrokes / etc. at the beginning of your frame (say, into a Sendable struct) and then consider them immutable for the duration of the frame.

This obviously not only helps with concurrency, but is also a general architectural pattern to have predictable buffers of per-frame data to ensure that all rendering is consistently done using the same information.
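As a rough illustration of the per-frame snapshot idea, here is a minimal sketch. The names `InputSnapshot`, `RawInputSource`, and `describeFrame` are hypothetical and not taken from the game's code; `RawInputSource` stands in for whatever main-thread-only object actually holds the live input state.

```swift
import Foundation

// Hypothetical per-frame input snapshot; the field names are illustrative.
struct InputSnapshot: Sendable, Equatable {
    var mouseX: Int
    var mouseY: Int
    var leftPressed: Bool
    var rightPressed: Bool
}

// Stand-in for the main-thread-only input source (the window, in a real app).
struct RawInputSource {
    var mouseX = 0
    var mouseY = 0
    var leftPressed = false
    var rightPressed = false

    // Copies the current values once; meant to be called at the start of a frame.
    func snapshot() -> InputSnapshot {
        InputSnapshot(mouseX: mouseX, mouseY: mouseY,
                      leftPressed: leftPressed, rightPressed: rightPressed)
    }
}

// Everything that runs during the frame reads the same immutable snapshot.
func describeFrame(_ input: InputSnapshot) -> String {
    "cursor at (\(input.mouseX), \(input.mouseY)), left: \(input.leftPressed)"
}

var source = RawInputSource()
source.mouseX = 12
source.mouseY = 34
let frameInput = source.snapshot()   // taken once, at frame start
source.mouseX = 99                   // later movement does not affect this frame
print(describeFrame(frameInput))     // prints "cursor at (12, 34), left: false"
```

Because the snapshot is a plain value type that conforms to `Sendable`, it can be passed to background work without the compiler complaining about main-actor state.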

Lua · May 17, 2024, 10:37am

That is actually what I'm doing. By "time of rendering" I meant func draw(in view: MTKView) of the MTKViewDelegate.
The game is already structured to be platform independent, which implies getting input in a platform-independent struct, so it eventually makes its way to the software UI like so:

    public func frame(input: Input, renderer: inout Image) {
        let elapsed = timer.lap()
        
        renderer.clear()
        renderer.text("Frame: \(elapsed)", x: 2, y: 2)
        renderer.draw(input.mouse.left ? cursorPressed : cursor, x: input.mouse.x - 1, y: input.mouse.y - 1)
    }

johnfairh · May 17, 2024, 10:43am

Apple haven't updated MetalKit / MTKViewDelegate for concurrency yet.

I can't explain the part about the window not even appearing, but if it helps, the workaround I use in my renderer class, which is marked @MainActor and implements MTKViewDelegate, is:

    public nonisolated func draw(in view: MTKView) {
        MainActor.assumeIsolated {
            realDraw(in: view)
        }
    }

Lua · May 17, 2024, 10:51am

That did not go well for some reason

Building for production...
error: compile command failed due to signal 6 (use -v to see invocation)
Assertion failed: (SGF.ExpectedExecutor || SGF.unsafelyInheritsExecutor()), function emit, file SILGenConcurrency.cpp, line 650.

Nevermind that was not what crashed it

Ok I think the latest toolchain is just compiling my code into nonsense (again)

It should be closer to 200MB

It would be really useful if Xcode could launch the memory debugger without rebooting with SIP disabled

It's leaking memory so badly macOS started stuttering

Nevermind, I forgot I'm not yet removing invisible faces, so I was accidentally trying to allocate 1 610 612 736 vertices. It's still using more memory on classes/actors without the vertices than when I was manually managing it in Zig (with vertices), but it's not bad.

After solving that issue I could try MainActor.assumeIsolated, and sadly it is not working for me: I see a blank window without the clear color or my UI on top.

wadetregaskis · May 17, 2024, 3:44pm

It's probably not the best approach (something like pre-fetching the necessary info at the start of each frame, as @nkbelov suggests, is probably better), but for edification you can do:

let mouse = DispatchQueue.main.sync {
    parent.window.mouseLocationOutsideOfEventStream
}

The really big caveat with that is that it blocks the current thread until the main thread responds. If the main thread is already doing something, that could be a while. So you risk serialising your code.

It's certainly possible to make well-behaved programs that use this sort of thing, but it requires discipline and care to basically keep the main thread idle all the time, so that it can serve these syncs very quickly.

nkbelov · May 17, 2024, 3:57pm

I think I understand what you're trying to do. But assuming that draw(in view: MTKView) is indeed called from somewhere that isn't the main thread (i.e. if MainActor.assumeIsolated happens to crash there; I can't check this myself at the moment), or if you really want to respect concurrency semantics, I'd suggest you think about the event ordering in the following way (FWIW this also applies if it's happening on the main thread / actor, which, being a method on a view, it probably should):

At the time when the loop calls your draw(in:), it's already kind of irrelevant to gather user input (which includes the cursor position). Remember that rendering triple-buffers, meaning that the actual cursor position which changes during this call (maybe your user just so happens to move the mouse while you're assembling your GPU calls) will lag behind at least this one frame. Screen refreshes are fast enough that you don't have to be this instantaneous with handling the input; the proper place for the new pointer coordinates is the frame that comes after, and there you have ample space to read out this property from the main thread.

Or, correct me if I'm wrong and somehow misunderstand the particular case of mouse pointers.

nkbelov · May 17, 2024, 4:16pm

Just to add, if the Metal call actually happens to run on the main thread, and the compiler's complaint is simply because it wasn't annotated @MainActor, then DispatchQueue.main.sync will deadlock, so @Lua may want to keep this in mind too.
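To make the deadlock caveat concrete, here is a small sketch of a guard around the synchronous hop. `onMainThread` is a hypothetical helper, not an API from Dispatch or Foundation. It only addresses the deadlock; it does not by itself satisfy the compiler's actor-isolation checks, and off the main thread it still blocks the caller until the main thread is free.

```swift
import Foundation
import Dispatch

// Hypothetical helper: runs `body` on the main thread and returns its result.
// If the caller is already on the main thread, `body` is called directly,
// because DispatchQueue.main.sync from the main thread would deadlock.
func onMainThread<T>(_ body: () -> T) -> T {
    if Thread.isMainThread {
        return body()
    }
    // Off the main thread: block until the main queue has run `body`.
    return DispatchQueue.main.sync(execute: body)
}

// Top-level code runs on the main thread, so this takes the direct path.
let value = onMainThread { 6 * 7 }
print(value)   // prints 42
```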

Lua · May 17, 2024, 4:57pm

I would like to do that, but Cocoa makes control flow incomprehensible. I have no idea what it's going to call and when, so I don't know how to structure my code correctly.
And it seems to change every few macOS updates in ways that subtly break my code.

My code is otherwise very structured and deterministic, and most of my functions are pure.

I barely found any documentation on creating a windowed app without Xcode, this is what I have now:

    static func main() {
        let instance = Self()
        
        let delegate = AppDelegate(game: instance)
        let app = NSApplication.shared
        app.delegate = delegate
        app.setActivationPolicy(.regular)
        app.run()
    }

Once I call run() I have no idea where anything is running. This is not what I want. Could I run Apple's classes on the side without giving Cocoa control over my program?

This is why I put everything in draw(in view: MTKView), it was the most obvious way to have my code run reliably every frame.

vns · May 17, 2024, 5:32pm

If you are writing a macOS app, then despite Xcode being… how to put it… a controversial IDE overall, I’d suggest using it in such cases; it brings more benefits to the table.

If the app didn’t crash, then we can safely assume that the delegate is called on the main actor, so the issue is not in the delegate call but in the implementation of the drawing itself.

If there was a crash, then the delegate wasn’t called on the main actor, and you can safely use a synchronous dispatch to the main queue here. But I would be a bit surprised if a view’s delegate isn’t called on the main thread.

Lua · May 17, 2024, 5:39pm

I am; what I meant was that my code is just a Swift package and I'm creating the .app myself.

But I might switch to something else, I was just experiencing editing a 100 line file with 1 second input delay (not exaggerating) on an M2 Max.

It didn't crash, but when I used any actors in my program other than MainActor (even not doing anything) draw(in view: MTKView) was actually never called in the first place. Xcode said I have +infinity frames per second and never hit breakpoints in the delegate functions.

vns · May 17, 2024, 5:47pm

I struggle to understand your experience with Metal, but it seems like you are new to it, so I would go with the default templates in the first place.

(I totally understand your pain with the editing nightmare in Xcode, but it's either that or googling/remembering tons of APIs, because I've found it impossible to keep autosuggestions for Apple's SDKs working for longer than 10 minutes outside of Xcode.)

There are too many unclear details to understand what's happening. You might not be setting up the flow properly, for example. Or, as I've said, the drawing implementation might be incorrect. If you can provide more details, the discussion might be more helpful.

Lua · May 17, 2024, 6:25pm

Okay

App Delegate

final class AppDelegate: NSObject, NSApplicationDelegate {
    public var game: Game
    public var interface: Image!
    
    public var window: NSWindow!
    private var metalView: MTKView!
    private var renderer: Renderer!
    
    public init(game: Game) { self.game = game }
    
    func applicationDidFinishLaunching(_ notification: Notification) {
        self.interface = .init(width: 400, height: 300)
        self.window = .init(
            contentRect: .init(x: 0, y: 0, width: 800, height: 600),
            styleMask: [.titled, .closable, .resizable, .miniaturizable],
            backing: .buffered,
            defer: false
        )
        window.center()
        window.minSize = .init(width: 800, height: 600)
        window.title = "Game"
        
        let menu = NSMenu()
        let main = NSMenuItem()
        main.submenu = NSMenu()
        main.submenu!.items = [
            NSMenuItem(title: "Quit Game", action: #selector(NSApplication.terminate(_:)), keyEquivalent: "q")
        ]
        menu.addItem(main)
        NSApplication.shared.mainMenu = menu
        
        self.metalView = MTKView(frame: window.contentView!.bounds, device: MTLCreateSystemDefaultDevice())
        metalView.autoresizingMask = [.width, .height]
        metalView.preferredFramesPerSecond = .max
        window.contentView = metalView
        
        self.renderer = Renderer(self, device: metalView.device!)
        metalView.delegate = renderer
        
        NSApplication.shared.activate()
        window.makeKeyAndOrderFront(nil)
    }
    
    func applicationShouldTerminateAfterLastWindowClosed(_ sender: NSApplication) -> Bool { true }
}

Metal View Delegate

@MainActor
final class Renderer: NSObject, MTKViewDelegate {
    private unowned let parent: AppDelegate
    private var commandQueue: (any MTLCommandQueue)!
    
    private var shaderLibrary: (any MTLLibrary)!
    
    private var interfacePipelineState: (any MTLRenderPipelineState)!
    private var interfaceVertexBuffer: (any MTLBuffer)!
    private var interfaceTexture: (any MTLTexture)!
    
    private var terrainPipelineState: (any MTLRenderPipelineState)!
    private var terrainVertexCount = 0
    private var terrainVertexBuffer: (any MTLBuffer)!
    private var terrainTexture: (any MTLTexture)!
    
    private var sampler: (any MTLSamplerState)!
    
    private var terrainUniformBuffer: (any MTLBuffer)!
    
    init(_ parent: AppDelegate, device: any MTLDevice) {
        self.parent = parent
        super.init()
        commandQueue = device.makeCommandQueue()
        createShaderLibrary(device: device)
        
        createInterfacePipelineState(device: device)
        createTerrainPipelineState(device: device)

        createInterfaceVertexBuffer(device: device)
        createInterfaceTextureState(device: device)
        
        createTerrainVertexBuffer(device: device)
        createTerrainTextureState(device: device)
        
        createSamplerState(device: device)
        
        self.terrainUniformBuffer = device.makeBuffer(
            length: MemoryLayout<Matrix<Float>>.stride,
            options: []
        )
    }
    
    func createTerrainTextureState(device: any MTLDevice) {
        let textureDescriptor = MTLTextureDescriptor()
        textureDescriptor.pixelFormat = .rgba8Unorm
        textureDescriptor.width = Block.atlas.width
        textureDescriptor.height = Block.atlas.height
        textureDescriptor.usage = [.shaderRead]
        textureDescriptor.storageMode = .shared
        textureDescriptor.mipmapLevelCount = 9
        
        let texture = device.makeTexture(descriptor: textureDescriptor)!
        self.terrainTexture = texture
        
        let bytesPerPixel = MemoryLayout<Color>.stride
        let bytesPerRow = bytesPerPixel * Block.atlas.width
        
        texture.replace(
            region: MTLRegionMake2D(0, 0, Block.atlas.width, Block.atlas.height),
            mipmapLevel: 0,
            withBytes: Block.atlas.flatten().data,
            bytesPerRow: bytesPerRow
        )
        
        // Generate mipmaps
        let commandBuffer = commandQueue.makeCommandBuffer()!
        let blitCommandEncoder = commandBuffer.makeBlitCommandEncoder()!
        blitCommandEncoder.generateMipmaps(for: texture)
        blitCommandEncoder.endEncoding()
        commandBuffer.commit()
    }
    
    func createInterfaceTextureState(device: any MTLDevice) {
        let textureDescriptor = MTLTextureDescriptor()
        textureDescriptor.pixelFormat = .rgba8Unorm
        textureDescriptor.width = parent.interface.width
        textureDescriptor.height = parent.interface.height
        textureDescriptor.usage = [.shaderRead]
        textureDescriptor.storageMode = .shared
        
        let texture = device.makeTexture(descriptor: textureDescriptor)!
        self.interfaceTexture = texture
        
        let bytesPerPixel = MemoryLayout<Color>.stride
        let bytesPerRow = bytesPerPixel * parent.interface.width
        
        texture.replace(
            region: MTLRegionMake2D(0, 0, parent.interface.width, parent.interface.height),
            mipmapLevel: 0,
            withBytes: parent.interface.data,
            bytesPerRow: bytesPerRow
        )
    }
    
    func updateInterfaceTexture() {
        let bytesPerPixel = MemoryLayout<Color>.stride
        let bytesPerRow = bytesPerPixel * parent.interface.width
        
        guard parent.interface.width == interfaceTexture.width &&
                parent.interface.height == interfaceTexture.height else { return }
        interfaceTexture.replace(
            region: MTLRegionMake2D(0, 0, parent.interface.width, parent.interface.height),
            mipmapLevel: 0,
            withBytes: parent.interface.data,
            bytesPerRow: bytesPerRow
        )
    }
    
    func createSamplerState(device: any MTLDevice) {
        let samplerDescriptor = MTLSamplerDescriptor()
        samplerDescriptor.minFilter = .nearest
        samplerDescriptor.magFilter = .nearest
        samplerDescriptor.mipFilter = .linear // Maybe separate for 3d?
        samplerDescriptor.maxAnisotropy = 8
        samplerDescriptor.sAddressMode = .repeat
        samplerDescriptor.tAddressMode = .repeat
        samplerDescriptor.normalizedCoordinates = true
        self.sampler = device.makeSamplerState(descriptor: samplerDescriptor)
    }
    
    func createShaderLibrary(device: any MTLDevice) {
        let compileOptions = MTLCompileOptions()
        compileOptions.fastMathEnabled = true
        self.shaderLibrary = try! device.makeLibrary(source: String(cString: SHADERS_METAL), options: compileOptions)
    }
    
    func createInterfacePipelineState(device: any MTLDevice) {
        let vertexFunction = shaderLibrary.makeFunction(name: "vertex_passthrough")
        let fragmentFunction = shaderLibrary.makeFunction(name: "fragment_passthrough")
        
        let pipelineDescriptor = MTLRenderPipelineDescriptor()
        pipelineDescriptor.vertexFunction = vertexFunction
        pipelineDescriptor.fragmentFunction = fragmentFunction
        pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
        pipelineDescriptor.colorAttachments[0].isBlendingEnabled = true
        pipelineDescriptor.colorAttachments[0].rgbBlendOperation = .add
        pipelineDescriptor.colorAttachments[0].alphaBlendOperation = .add
        pipelineDescriptor.colorAttachments[0].sourceRGBBlendFactor = .sourceAlpha
        pipelineDescriptor.colorAttachments[0].sourceAlphaBlendFactor = .sourceAlpha
        pipelineDescriptor.colorAttachments[0].destinationRGBBlendFactor = .oneMinusSourceAlpha
        pipelineDescriptor.colorAttachments[0].destinationAlphaBlendFactor = .oneMinusSourceAlpha
        
        self.interfacePipelineState = try! device.makeRenderPipelineState(descriptor: pipelineDescriptor)
    }
    
    func createTerrainPipelineState(device: any MTLDevice) {
        let vertexFunction = shaderLibrary.makeFunction(name: "vertex_terrain")
        let fragmentFunction = shaderLibrary.makeFunction(name: "fragment_terrain")
        
        let pipelineDescriptor = MTLRenderPipelineDescriptor()
        pipelineDescriptor.vertexFunction = vertexFunction
        pipelineDescriptor.fragmentFunction = fragmentFunction
        pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
        pipelineDescriptor.colorAttachments[0].isBlendingEnabled = true
        pipelineDescriptor.colorAttachments[0].rgbBlendOperation = .add
        pipelineDescriptor.colorAttachments[0].alphaBlendOperation = .add
        pipelineDescriptor.colorAttachments[0].sourceRGBBlendFactor = .sourceAlpha
        pipelineDescriptor.colorAttachments[0].sourceAlphaBlendFactor = .sourceAlpha
        pipelineDescriptor.colorAttachments[0].destinationRGBBlendFactor = .oneMinusSourceAlpha
        pipelineDescriptor.colorAttachments[0].destinationAlphaBlendFactor = .oneMinusSourceAlpha
        
        self.terrainPipelineState = try! device.makeRenderPipelineState(descriptor: pipelineDescriptor)
    }
    
    func createInterfaceVertexBuffer(device: any MTLDevice) {
        let vertices: [PassthroughVertex] = [
            .init(x: -1, y: 1, z: 0, u: 0, v: 0),
            .init(x: -1, y: -1, z: 0, u: 0, v: 1),
            .init(x: 1, y: 1, z: 0, u: 1, v: 0),
            
            .init(x: 1, y: 1, z: 0, u: 1, v: 0),
            .init(x: -1, y: -1, z: 0, u: 0, v: 1),
            .init(x: 1, y: -1, z: 0, u: 1, v: 1)
        ]
        
        self.interfaceVertexBuffer = device.makeBuffer(
            bytes: vertices,
            length: MemoryLayout<PassthroughVertex>.stride * vertices.count,
            options: []
        )
    }
    
    func createTerrainVertexBuffer(device: any MTLDevice) {
        let vertices = parent.game.world.unifiedMesh
        self.terrainVertexCount = vertices.count
        guard terrainVertexCount > 0 else { return }
        
        self.terrainVertexBuffer = device.makeBuffer(
            bytes: vertices,
            length: MemoryLayout<BlockVertex>.stride * vertices.count,
            options: []
        )
    }
    
    nonisolated func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
        MainActor.assumeIsolated {
            let scale = parent.window.backingScaleFactor
            parent.interface.resize(width: Int(size.width / 2 / scale), height: Int(size.height / 2 / scale))
            createInterfaceTextureState(device: view.device!)
        }
    }
    
    var isMouseHidden = false
    
    nonisolated func draw(in view: MTKView) {
        MainActor.assumeIsolated {
            let upsideMouse = parent.window.mouseLocationOutsideOfEventStream
            if parent.window.contentView!.frame.contains(upsideMouse) {
                if !isMouseHidden { NSCursor.hide() }
                isMouseHidden = true
            } else {
                if isMouseHidden { NSCursor.unhide() }
                isMouseHidden = false
            }
            
            let mouse = NSPoint(x: upsideMouse.x, y: parent.window.contentView!.frame.height - upsideMouse.y)
            let btn = NSEvent.pressedMouseButtons
            let left = btn & 1 << 0 == 1 << 0
            let right = btn & 1 << 1 == 1 << 1
            parent.game.frame(
                input: .init(
                    mouse: .init(x: Int(mouse.x / 2), y: Int(mouse.y / 2), left: left, right: right)
                ),
                renderer: &parent.interface
            )
            updateInterfaceTexture()
            
            guard let drawable = view.currentDrawable else { return }
            guard let renderPassDescriptor = view.currentRenderPassDescriptor else { return }
            
            let commandBuffer = commandQueue.makeCommandBuffer()!
            let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)!
            
            // Matrix
            let size = parent.window.contentView!.frame
            let bufferPointer = terrainUniformBuffer.contents()
            var mat = parent.game.world.primaryMatrix(width: Float(size.width), height: Float(size.height))
            memcpy(bufferPointer, &mat, MemoryLayout<Matrix<Float>>.size)
            
            // Render terrain
            if terrainVertexCount > 0 {
                renderEncoder.setRenderPipelineState(terrainPipelineState)
                renderEncoder.setVertexBuffer(terrainVertexBuffer, offset: 0, index: 0)
                renderEncoder.setFragmentTexture(terrainTexture, index: 0)
                renderEncoder.setFragmentSamplerState(sampler, index: 0)
                renderEncoder.setVertexBuffer(terrainUniformBuffer, offset: 0, index: 1)
                renderEncoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: terrainVertexCount)
            }
            
            // Render interface
            renderEncoder.setRenderPipelineState(interfacePipelineState)
            renderEncoder.setVertexBuffer(interfaceVertexBuffer, offset: 0, index: 0)
            renderEncoder.setFragmentTexture(interfaceTexture, index: 0)
            renderEncoder.setFragmentSamplerState(sampler, index: 0)
            
            renderEncoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 6)
            renderEncoder.endEncoding()
            
            commandBuffer.present(drawable)
            commandBuffer.commit()
        }
    }
}

I am new to Metal, but it is infinitely easier than OpenGL. 90% of my issues come from the forced object-oriented structure of Cocoa and all the delegates; I don't like it, or how spaghetti-like it feels to initialize anything.

The last thing I want is to know even less about what my code is doing, use storyboards, and have less type-safe resource bundling.

tera · May 17, 2024, 6:50pm

An idea: install a 1/60 or 1/120 sec timer (or even better: a display link callback) that grabs the current mouse location and remembers it in some common variable, and when you want to use the current mouse location from a secondary thread use that variable instead of calling window.mouseLocationOutsideOfEventStream. A more optimal variation of this method – subscribe to a "mouseMoved" event. In all cases the variable must be read/write protected, e.g. with a mutex, or you could store x/y coordinates into an atomic UInt64 variable (32 bits per coordinate should be more than enough to represent exact mouse position on the screen). Not sure which is preferable in this case, a mutex or an atomic.
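A minimal sketch of both variants mentioned above, under stated assumptions: `pack`, `unpack`, and `SharedMousePosition` are hypothetical names, and the holder uses an `NSLock` as the mutex. The packed 64-bit word is the form you would store in an atomic if your toolchain provides one; under a lock the packing is not strictly needed and is kept here only to show the encoding.

```swift
import Foundation

// Packs two signed 32-bit coordinates into one 64-bit word (x in the high half).
func pack(x: Int32, y: Int32) -> UInt64 {
    (UInt64(UInt32(bitPattern: x)) << 32) | UInt64(UInt32(bitPattern: y))
}

// Reverses `pack`, recovering the signed coordinates.
func unpack(_ word: UInt64) -> (x: Int32, y: Int32) {
    let x = Int32(bitPattern: UInt32(truncatingIfNeeded: word >> 32))
    let y = Int32(bitPattern: UInt32(truncatingIfNeeded: word))
    return (x, y)
}

// Hypothetical lock-protected holder: written from the main thread
// (timer, display link, or mouseMoved handler), read from any thread.
final class SharedMousePosition: @unchecked Sendable {
    private let lock = NSLock()
    private var word: UInt64 = 0

    func store(x: Int32, y: Int32) {
        lock.lock()
        defer { lock.unlock() }
        word = pack(x: x, y: y)
    }

    func load() -> (x: Int32, y: Int32) {
        lock.lock()
        defer { lock.unlock() }
        return unpack(word)
    }
}

let shared = SharedMousePosition()
shared.store(x: -5, y: 1200)        // what the main-thread callback would do
let position = shared.load()        // what the render thread would do
print(position.x, position.y)       // prints "-5 1200"
```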

There's also the CGEvent(source: nil)!.location route, which AFAIK is safe to use from a secondary thread (I could be mistaken), but its results are in global coordinates.

vns · May 17, 2024, 6:51pm

I'd reuse the Xcode template for a Metal app in that case. I suspect (I've never set up macOS windows from the ground up) that your view is simply not being rendered, because MainActor.assumeIsolated does in fact work just fine on this method.

Lua · May 18, 2024, 10:10am

As usual I went with the opposite of the reasonable solution

Code

static func main() {
    let id = CGSMainConnectionID()
    
    var tags: CGSWindowTagBit = kCGSDocumentWindowTagBit
    var rect = CGRect(x: 0, y: 0, width: 800, height: 600)
    var region: Unmanaged<CGSRegion>!
    defer { region.release() }
    CGSNewRegionWithRect(&rect, &region)
    
    var window: CGWindowID = 0
    CGSNewWindowWithOpaqueShape(
        id, kCGSBackingBuffered, 0, 0,
        region.takeUnretainedValue(),
        region.takeUnretainedValue(),
        0, &tags, 32,
        &window
    )
    defer { CGSReleaseWindow(id, window) }
    CGSOrderWindow(id, window, kCGSOrderIn, 0)
    
    let dictionary = NSDictionary(object: true, forKey: NSString("CGWindowContextShouldUseCA"))
    let context = CGWindowContextCreate(id, window, dictionary)!.takeUnretainedValue()
    
    context.setFillColor(.black)
    context.fill(.init(x: 20, y: 20, width: 100, height: 100))
    context.flush()
    while true {
        
    }
}

Result

I went around Cocoa completely and made my own window and CGContext using CoreGraphicsServices.
That API is not public, so it will definitely break one day, but it's a fun experiment.

It uses so much less memory than Cocoa and starts up basically instantly; Cocoa is definitely adding a lot of overhead.

At least now I can be 100% sure what's running on the MainActor
