Truly decoupling two parallel async tasks in SwiftUI or: Random snake audio visualization thanks to Core Audio witchcraft

Dear community,

I'm new to Swift and wrote my first app last weekend. I'm an experienced software engineer, so I adapted quickly. I wrote some C code (a low-level audio-visualization implementation, an anachronistic reimplementation of the famous Geiss screensaver/Winamp plugin from 1998) that renders into an RGBA framebuffer. I wanted to display it natively in a macOS app and therefore decided to use Swift. I somewhat regret that choice atm, as I'm really struggling to win control over parallelization.

Basically, I have everything working: audio app capture via an aggregate audio device and an audio tap with Core Audio; SwiftUI with its UI and input fields; C integration of my code with FFI and pointer arithmetic. I don't think I did a bad job at learning Swift in those 2-3 days...

I even implemented the data type conversion to C types and back, the FFT and all that. But when it comes to making Swift decouple two Tasks that have to work in a truly parallel manner, I'm almost breaking my hands ;)

Core Audio shows some insane behaviour in delivering the audio buffers. Sometimes they arrive at a rate of 2 FPS and sometimes at 60 FPS; only god knows when they will arrive. But due to the apparently strict synchronization in SwiftUI, my C code's render function is only called when audio data has arrived, pushing the rendering down to 2 FPS... or up to 60 FPS... depends on witchcraft, I guess ;)

Well, I understand that my description sounds weird, and that's why I prepared a code repo for you to check. By simply checking it out, you'll find a beautiful new open-source music visualization... that runs at snake speed... and you'll be able to reproduce the issue in a matter of seconds.

Issue details: The audioQueue basically receives data at a pace I haven't found a way to control. However, only when data is received does the renderQueue actually call updateData() and let the detached rendering window call my C code and re-render.

A) The primary fix I need is to get Swift to detach the queues. I simply want the renderQueue to re-render at the configured frequency, and not to wait for the audioQueue at all. It should always call updateData() at that pace, pick the latest audio data and that's all.

B) If that's fixed, my C code will finally render at the highest speed it can. I know this is possible, because sometimes Core Audio delivers at 60 FPS and then everything is fine. It's a true Heisenbug. To get back control over that, I'd like to force Core Audio to hand me the buffers at the fastest pace possible, so that the waveform syncs better with the rendering visually...

This is probably where it hangs: the render path is only triggered when the audio callback delivers data, so the two queues are effectively chained together.
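What I want instead is a render loop that ticks on its own and just grabs whatever audio data is newest. A minimal sketch of that idea (illustrative only, not the repo code; the queue names and updateData() are the ones from above, everything else is a placeholder):

    import Dispatch
    import Foundation

    final class DecoupledRenderer {
        private let renderQueue = DispatchQueue(label: "render", qos: .userInteractive)
        private let stateQueue = DispatchQueue(label: "audio.state") // guards the latest buffer
        private var latestAudioBuffer: [Float] = []
        private var timer: DispatchSourceTimer?

        // Called from the Core Audio tap callback, whenever buffers happen to arrive.
        func audioDidArrive(_ samples: [Float]) {
            stateQueue.async { self.latestAudioBuffer = samples }
        }

        // The render loop ticks at its own fixed rate, independent of the audio pace.
        func startRendering(fps: Double = 60) {
            let t = DispatchSource.makeTimerSource(queue: renderQueue)
            t.schedule(deadline: .now(), repeating: 1.0 / fps)
            t.setEventHandler { [weak self] in
                guard let self else { return }
                let snapshot = self.stateQueue.sync { self.latestAudioBuffer }
                self.updateData(snapshot)   // hands the newest samples to the C render code
            }
            t.resume()
            timer = t
        }

        private func updateData(_ samples: [Float]) {
            // ... call into the C FFI render function here ...
        }
    }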

Could you please help me with this? I really did work hard to get all of this done on my own. This project will be open source -- I also have a WebAssembly version working. It's a bit frustrating to struggle so much with native performance when the thing renders at 25 FPS at high resolution in a browser...

https://kyr0.github.io/Milky.js/

How does that make Swift look? ;)

Thank you in advance!
kyr0

I'd not use queues here (of any shape or form) and would decouple mic capturing from everything else. Mic capturing (e.g. done with AudioUnits for cross-platform portability) is done in real time and uses the push model to write into a ring buffer; everything else is non-realtime and uses the pull model to read from that ring buffer. As for where to put the FFT: if it's quick enough it could be on the realtime side (in which case the ring buffer actually becomes a ring buffer of frequencies), otherwise it could be on the non-realtime side, pulling from the ring buffer of audio samples. Draw the diagram first. Good luck, and well done on your first Swift app.
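A rough sketch of the shape I mean (illustrative only; single producer, single consumer; index atomicity here assumes the swift-atomics package):

    import Atomics   // swift-atomics package, assumed as a dependency

    // Single-producer / single-consumer ring buffer of Float samples.
    // The realtime side only writes memory and bumps an index; the reader
    // never blocks the writer.
    final class SampleRingBuffer {
        private let capacity: Int
        private let storage: UnsafeMutablePointer<Float>
        private let writeIndex = ManagedAtomic<Int>(0)
        private let readIndex = ManagedAtomic<Int>(0)

        init(capacity: Int) {
            self.capacity = capacity
            self.storage = .allocate(capacity: capacity)
            self.storage.initialize(repeating: 0, count: capacity)
        }

        deinit { storage.deallocate() }

        // Push model, realtime side: called from the capture callback.
        func write(_ samples: [Float]) {
            var w = writeIndex.load(ordering: .relaxed)
            for s in samples {
                storage[w % capacity] = s
                w += 1
            }
            writeIndex.store(w, ordering: .releasing)
        }

        // Pull model, non-realtime side: drains whatever has arrived since last time.
        func read(into out: inout [Float]) -> Int {
            var r = readIndex.load(ordering: .relaxed)
            let w = writeIndex.load(ordering: .acquiring)
            var count = 0
            while r < w && count < out.count {
                out[count] = storage[r % capacity]
                r += 1
                count += 1
            }
            readIndex.store(r, ordering: .releasing)
            return count
        }
    }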

Hi tera,

thank you for your reply. I'm using audio taps because the visualization uses the output audio stream of another app (the user can select it, Spotify for example, or iTunes) as its input. Microphone recording doesn't support this, AFAIK. Audio taps were introduced in macOS 14.2 with a Core Audio update.

Can I prioritize the Core Audio queues somehow? I suspect they are being de-prioritized, and that's why they randomly deliver buffers slowly, at medium speed, or fast.

The ring buffer idea is interesting. I implemented one for a similar reason in another integration of the same code, but since the syncing/blocking seemed to happen at the language level in Swift, I didn't bother to implement my own here.

Does this implementation look good to you?

That implementation uses an UnsafeMutablePointer, much like my C FFI code does. I was thinking about trying that myself too, but I was a bit too tired and decided to ask here first. Reflecting on it, Swift should definitely not block on direct, pointer-based memory reads, so... it might work. I'll give it a try.

The FFT calculations are already on a global background DispatchQueue and are debounced by skipping every 2nd operation:
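Simplified, it looks like this (a sketch of the pattern, not the exact repo code; performFFT is a stand-in for the actual FFT call):

    import Dispatch

    // Every 2nd audio buffer is skipped, and the FFT work runs on a
    // global background queue.
    final class FFTScheduler {
        private var bufferCounter = 0

        func audioBufferArrived(_ samples: [Float]) {
            bufferCounter += 1
            guard bufferCounter % 2 == 0 else { return }   // debounce: skip every 2nd buffer

            DispatchQueue.global(qos: .background).async {
                let spectrum = performFFT(samples)
                // hand `spectrum` over to the renderer / C side here
                _ = spectrum
            }
        }
    }

    // Placeholder for the real FFT (e.g. Accelerate/vDSP) implementation.
    func performFFT(_ samples: [Float]) -> [Float] { samples }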

Thanks and wish you a great day.


I'd triple-check that: first reading the docs, then googling, then doing tests, and finally, if it's still not clear, asking on the relevant forums (Stack Overflow or the Apple dev forums). This forum is focused on the Swift language itself.

If it's intended to be used in a real-time context (for writing or reading or both), then no, as you can't do anything other than "read memory, write memory, and do math" (look around ~31:00 and ~38:00). Note that if this is about visualising audio in a screen saver or similar, the provider and the consumer don't have to be perfectly synchronised. So long as they are not too far apart in time (say, 50 ms - 100 ms), nobody will notice if the consumer is slower or faster than the producer: in the former case it will miss some samples, in the latter case it will use some samples more than once. And there's a trivial low-tech solution to keep the producer and consumer within 50-100 ms of each other – just make the ring buffer that big.
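For example, at an assumed sample rate of 48 kHz, "that big" is tiny:

    // Keep producer and consumer within ~100 ms of each other simply by
    // sizing the ring buffer to hold that much audio.
    let sampleRate = 48_000.0      // assumed sample rate
    let slack = 0.1                // 100 ms of tolerated drift
    let ringCapacity = Int(sampleRate * slack)   // 4,800 frames per channel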

Implementing true parallelism in Swift seems impossible to me. There is no way to stop Swift from syncing Tasks, variable access, etc. I ended up implementing it in C with double buffering. Now I have fast rendering for the super slow audio waveforms passed down by Swift. That's already half the rent paid... but as the audio buffers are sometimes delivered at a pace of 2 FPS, the image of course still looks totally snake-slow, even though it is rendered at 30 FPS now. So... I guess I have to hack my way through Core Audio to identify this mess of a bottleneck xD

So... I'm pretty convinced I found the root cause of the issue. When I start the program with power plugged in, Swift selects a P (performance) core of my M3 MacBook Air for processing the queue that pulls the audio stream buffers, and it always delivers buffers fast-paced (60 FPS+). Once I unplug and restart the program, it selects an E (efficiency) core, and audio buffers are delivered at 15 FPS. When the battery goes low, buffer delivery drops to 2 FPS. I clearly need to find a way to force Core Audio and my program onto P cores, no matter what.


I'm not familiar with the Core Audio API, but a DispatchQueue's QoS affects thread priority and which CPU cores it runs on. How about changing the .background or unspecified QoS to .userInteractive?
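Something like this (the queue label is just an example):

    // A .userInteractive queue gets the highest QoS, which makes its threads
    // eligible for the performance cores and a higher scheduling priority.
    let audioQueue = DispatchQueue(label: "audio.pull", qos: .userInteractive)
    audioQueue.async {
        // pull the next audio buffers here
    }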

@nukka123 Thank you for your reply. I hadn't pushed my recent code when I posted my last reply, but here you go :) This is what changed:

  • Render loop in C instead of Swift
  • Double-buffered rendering
  • Trying to force high priority on a low level
  • Setting everything to .userInteractive

I still get slow audio buffer updates at times, even when the power cord is attached and macOS is configured to never go into energy-saving mode.

I also tried setting the thread priority via low-level pthread calls, but it didn't help...
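The pthread attempt was along these lines (a simplified sketch; the exact call in the repo may differ):

    import Darwin

    // Executed from inside the audio queue's work item: promote the backing
    // thread's QoS class as far as the userspace API allows.
    let status = pthread_set_qos_class_self_np(QOS_CLASS_USER_INTERACTIVE, 0)
    if status != 0 {
        print("Failed to raise the thread's QoS class, error:", status)
    }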

At this point I'm a bit lost. There is definitely some uncontrollable witchcraft going on in the runtime behind the scenes; something, it seems, a developer has no control over. Is there an API to get hold of references to all of the app's threads, so that I could loop through them regularly and set their priority high?

@mickeyl Thanks for the star on my repo :) I've read that you're a Swift expert. Maybe you have a small tip for me? In return, I'm happy to help with something complicated and web-based or the like, if you ever need anything.

It may be a different problem than yours.

    Audio Provider FPS: 26.53220608096354
    Rendering Call FPS: 28.449460761039138
    Audio Provider FPS: 28.421989191753205
    Rendering Call FPS: 2.8689712535414933
    Audio Provider FPS: 2.8691773226920154
    Rendering Call FPS: 29.013274305675647

As for the FPS drops that this log indicates, reducing the frequency of the print calls may solve the problem.

    // In the audio provider callback: only log when the rate actually drops.
    if fps < 10 {
        print("Audio Provider FPS:", fps)
    }
    // In the rendering call path: same guard, so print() no longer runs every frame.
    if fps < 10 {
        print("Rendering Call FPS:", fps)
    }

I don't have the details, but the internal code of print takes a lock related to OutputStream.
I suspect the wait occurs when the internal buffer is full.

@nukka123 Good idea. Thank you for cloning the repo and setting it up! I appreciate it! Actually, I commented out all logs and the issue remains. What you see with 2.8 FPS is exactly the issue I mean though. Please notice that "Rendering Call FPS" is not the actual rendering FPS.

It's the queue that used to do the rendering before I moved the render loop to C; now it's just the code that calls the actual rendering code with updated audio data.

btw. "Audio Provider FPS" and "Rendering Call FPS" are synced/locked to roughly the same number always because those queues cannot be decoupled in Swift.

So, for the moment, I'm planning to force the OS to move my threads onto performance cores by using system commands; let's see how successful I'll be. If that doesn't show good results, I'm going to rewrite the audio code in footgun C++...

Whoever reads this someday, this is how you do it:

  • Use Metal for fast framebuffer painting instead of NSImage, and use Metal shaders for post-processing effects and upscaling:
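The framebuffer upload part boils down to something like this (a sketch; the draw pass and the post-processing/upscaling shaders are separate):

    import Metal

    // Upload the RGBA framebuffer written by the C renderer into an
    // MTLTexture once per frame; shaders then sample that texture.
    final class FramebufferUploader {
        private let device: MTLDevice
        private let texture: MTLTexture
        private let width: Int
        private let height: Int

        init?(width: Int, height: Int) {
            guard let device = MTLCreateSystemDefaultDevice() else { return nil }
            self.device = device
            self.width = width
            self.height = height

            let desc = MTLTextureDescriptor.texture2DDescriptor(
                pixelFormat: .rgba8Unorm, width: width, height: height, mipmapped: false)
            desc.usage = [.shaderRead]
            guard let texture = device.makeTexture(descriptor: desc) else { return nil }
            self.texture = texture
        }

        // `framebuffer` is the RGBA pixel memory filled by the C render loop.
        func upload(framebuffer: UnsafeRawPointer) {
            let region = MTLRegionMake2D(0, 0, width, height)
            texture.replace(region: region,
                            mipmapLevel: 0,
                            withBytes: framebuffer,
                            bytesPerRow: width * 4)
        }
    }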

DO NOT USE SWIFT FOR ANY REALTIME TASKS. IT'S A FOOTGUN ;)
The witchcraft is neither Core Audio nor the CPU core affinity of the process/thread.
It is the Swift runtime itself, which comes with a scheduler that always prefers stability over performance. You'll constantly have background GC going on, and there is no safe way to implement any realtime algorithm in a truly lock-/sync-free way. This is good for the most part, but terrible for DSP/realtime code.

Here is a release build of the Realtime Music Visualizer for any App's Audio: Tags · kyr0/MilkyApp · GitHub


Well, the good news is that you've discovered the correct answer to your original issue. I sort of feel that the all-caps are a bit over the top; this issue has been discussed multiple times on the Swift forums, so it's pretty well known.

It isn't correct to blame this on Swift alone, though. It is also true that you should not use Obj-C for any real-time stuff like this. The actual rule is that there are certain things your code should not do in real-time processing. The most common (and easiest to overlook) things to avoid are:

  1. Memory allocations
  2. Locks
  3. I/O

Both the Swift runtime and the Obj-C runtime do some of these things, which means those runtimes are to be avoided in real-time code, since your code cannot opt out of those behaviors. C and C++ generally don't do these things, assuming you don't call into libraries which do.
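To make that concrete, a real-time callback has to be reduced to roughly this shape: everything allocated up front, and nothing but memory reads/writes and math inside the hot path. (An illustrative sketch only; and note that even this is not guaranteed allocation- and lock-free in Swift, which is exactly the point above.)

    // All allocation happens at setup time; the callback is restricted to
    // reading memory, writing memory, and doing math.
    final class Visualizer {
        private let scratch: UnsafeMutablePointer<Float>
        private let frameCount: Int

        init(frameCount: Int) {
            self.frameCount = frameCount
            self.scratch = .allocate(capacity: frameCount)   // allocation up front
            self.scratch.initialize(repeating: 0, count: frameCount)
        }

        deinit { scratch.deallocate() }

        // Called on the real-time audio thread.
        func process(_ input: UnsafePointer<Float>) {
            for i in 0..<frameCount {
                scratch[i] = input[i] * 0.5   // read memory, do math, write memory
            }
            // No malloc/free, no locks, no print/file/network I/O in here.
        }
    }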

I don't know what this means, really. Swift (and Obj-C and some other languages) aren't architected for real-time work. That's about as extreme as you can validly get. :)

This is false. There's no GC in Swift or Obj-C. They also don't really do any housekeeping in the "background".


FWIW, Swift recently got @nolocks / @noallocations, features which (once they have truly landed) make Swift safer than C with regard to realtime programming.

Reference counting is GC according to some definitions of GC. So maybe it's just a terminology issue.

This thread was a great read!