Idea for a declarative audio system for Swift

At the moment it's just an immature and potentially crazy idea, but I wanted to share it to see if it resonates, and ideally also to discuss the implementation.

How does audio work on Apple platforms? At the lowest level, the system periodically asks you to fill a pre-allocated buffer with a chunk of audio data; whatever you write into the buffer is sent to the DAC and played on your speakers or headphones. Similarly, the system can periodically hand you buffers with audio data recorded from the microphone or another source.
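For concreteness, this is roughly what that lowest-level callback model looks like with AVFoundation's AVAudioSourceNode. It's just a plain sine-wave sketch with error handling omitted, not part of the proposal:

import AVFoundation

// The engine periodically calls the render block; we fill the supplied
// buffers with samples and they end up on the speakers.
let engine = AVAudioEngine()
let sampleRate = engine.outputNode.outputFormat(forBus: 0).sampleRate
var phase = 0.0

let source = AVAudioSourceNode { _, _, frameCount, audioBufferList in
	let buffers = UnsafeMutableAudioBufferListPointer(audioBufferList)
	let increment = 2 * Double.pi * 440 / sampleRate // 440 Hz tone
	for frame in 0..<Int(frameCount) {
		let sample = Float(sin(phase))
		phase += increment
		for buffer in buffers { // one buffer per channel in non-interleaved formats
			buffer.mData?.assumingMemoryBound(to: Float.self)[frame] = sample
		}
	}
	return noErr
}

engine.attach(source)
engine.connect(source, to: engine.mainMixerNode, format: nil)
try engine.start()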

Of course there are higher-level APIs that are much easier to use: AVPlayer can play a file with some minimal control over playback, or you can go down a level and use AudioUnits.

But if you want maximum control over what to generate and how to mutate the sound, you usually build a graph within your app, where each node does something with data and passes it on to one or more other nodes; the root node is connected to the system where it receives input data or sends output data.

For example, you can have multiple sound generator nodes connected to some effect nodes that in turn are connected to a mixer node that mixes everything and returns the result in the system buffer to be played on your speakers (that's a synth!).
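For reference, and this is exactly the kind of graph the proposal wants to hide, the AVAudioEngine version of such a chain looks roughly like this (a sketch; scheduling a file on the player and error handling are omitted):

import AVFoundation

let engine = AVAudioEngine()
let player = AVAudioPlayerNode()
let delay = AVAudioUnitDelay()
let reverb = AVAudioUnitReverb()

// player -> delay -> reverb -> mixer -> speakers
engine.attach(player)
engine.attach(delay)
engine.attach(reverb)
engine.connect(player, to: delay, format: nil)
engine.connect(delay, to: reverb, format: nil)
engine.connect(reverb, to: engine.mainMixerNode, format: nil)
try engine.start()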

So, the first instinct here is a classical OOP design: a root Node class with basic functionality like mute, bypass, etc., plus various connectors, and other node types derived from it.

You can find plenty of frameworks on practically every platform that work this way.


But OOP is boring and so last century. Now, what if we forget all that and build a declarative/functional audio system, similar to SwiftUI but for audio?

Let's see!

Firstly, I want to play a sound with echo and reverberation on a button click. In an imaginary OOP audio system you would have something like this (it's not actually imaginary, but that's not the point):

BEFORE:

struct MyView: View {
	@State var system = System()
	@State var player = FilePlayer(resource: "ding.wav")

	var body: some View {
		Button("Ding!") {
			player.play()
		}
		.onAppear {
			// build the chain: player -> delay -> reverb -> system output
			let reverb = Reverb(decay: 4)
			let delay = Delay(time: 2, feedback: -50)
			system.connect(source: reverb)
			reverb.connect(source: delay)
			delay.connect(source: player)
			system.start()
		}
	}
}

And now with an imaginary functional/declarative system:

AFTER:

struct MyView: View {
	var body: some View {
		Button("Ding!") {
			FilePlayer(resource: "ding.wav") {
				Delay(time: 2, feedback: -50)
				Reverb(decay: 4)
			}
		}
	}
}

So what's going on?

In a non-ViewBuilder context you can have a Generator component, FilePlayer in this case. The top-level component receives an audio buffer to be filled and sent back to the system, which it does by loading data from the file. Optionally, as you can see, the buffer can be passed to any nested Filter components for additional mutation.

The crucial difference here is that you don't need to instantiate and connect audio nodes. Let's skip the magic behind the top-level generator for a moment; everything nested within FilePlayer's trailing filter closure is a struct that (magically) receives the audio buffer on each render cycle and applies its own mutations.
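To make this a bit more concrete, here is one way the underlying types could look. All of these names are assumptions, not an existing API; the point is just that a result builder can collect the nested filters into a plain array:

struct AudioChannel {
	var data: UnsafeMutableBufferPointer<Float> // one channel's samples for this render cycle
}

struct AudioBuffers: Sequence {
	var channels: [AudioChannel]
	func makeIterator() -> IndexingIterator<[AudioChannel]> { channels.makeIterator() }
}

protocol Filter {
	func process(channels: AudioBuffers) // mutates the buffer in place
}

protocol Generator {
	// Fills the buffer; returns false once the generator has finished.
	mutating func render(into channels: AudioBuffers) -> Bool
}

@resultBuilder
enum FilterBuilder {
	static func buildExpression(_ filter: any Filter) -> [any Filter] { [filter] }
	static func buildBlock(_ parts: [any Filter]...) -> [any Filter] { parts.flatMap { $0 } }
	static func buildOptional(_ part: [any Filter]?) -> [any Filter] { part ?? [] }
	static func buildEither(first: [any Filter]) -> [any Filter] { first }
	static func buildEither(second: [any Filter]) -> [any Filter] { second }
}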

Once the file is played in full, i.e. the generator finishes and disables itself, its connection to the system is destroyed and FilePlayer's filter chain is no longer called. Underneath, I think there should be only one persistent object, the top-level one (e.g. FilePlayer); the rest becomes a chain of function calls.
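Building on the hypothetical types sketched above, the persistent FilePlayer could own the collected filters and run them as plain function calls on every render cycle, with nothing to connect or tear down:

struct FilePlayer: Generator {
	let resource: String
	let filters: [any Filter]

	init(resource: String, @FilterBuilder filters: () -> [any Filter] = { [] }) {
		self.resource = resource
		self.filters = filters()
	}

	mutating func render(into channels: AudioBuffers) -> Bool {
		let moreToPlay = true // decoding the next chunk of `resource` into `channels` is omitted
		for filter in filters {
			filter.process(channels: channels) // plain calls, no graph traversal
		}
		return moreToPlay // false once the file has played in full,
		                  // after which the system detaches the generator
	}
}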

It's important to note that FilePlayer's filter closure is executed on a special high-priority actor, which means the processing chain should be kept minimal. A debug-time subsystem would ensure no heavy computations take place inside it and warn if the processing time is unacceptable (you usually have to wrap everything up within 10 ms or less).
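A sketch of what that debug-time check might look like; the names are assumptions, and printing from the render thread is itself unsafe, so a real implementation would record the overrun and report it elsewhere:

import Foundation

func timedRender(deadline: TimeInterval = 0.010, _ render: () -> Void) {
	#if DEBUG
	let start = DispatchTime.now().uptimeNanoseconds
	render()
	let elapsedMs = Double(DispatchTime.now().uptimeNanoseconds - start) / 1_000_000
	if elapsedMs > deadline * 1000 * 0.8 {
		print("Audio render took \(elapsedMs) ms, close to the \(deadline * 1000) ms deadline")
	}
	#else
	render()
	#endif
}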

Now, one more cool demo: let's define a custom audio component! It will be a component that changes volume (extremely simplified, with no ramps etc.):

struct Volume: Filter {
	@AudioState var level: Float // 0...1

	func process(channels: AudioBuffers) { // protocol requirement, like View's `body`
		// map 0...1 to a gain factor of 0.01...1, i.e. roughly -40 dB...0 dB
		let factor = powf(10, (level - 1) * 2)
		for channel in channels { // e.g. 2 channels for stereo
			dspMultiply(channel.data, factor) // stand-in for vDSP_vsmul or similar
		}
	}
}
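Assuming @AudioState exposes a standard init(wrappedValue:), Volume would then drop into a chain like any other filter:

FilePlayer(resource: "ding.wav") {
	Volume(level: 0.8)
	Reverb(decay: 4)
}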

And finally let's connect the two worlds, SwiftUI and this audio system:

struct MyView: View {
	@State var left: Float = 0
	@State var right: Float = 0
	@State var reverb: Bool = true

	var body: some View {
		ProgressView(value: left)
		ProgressView(value: right)

		Toggle("Reverb", isOn: $reverb)

		Button("Ding!") {
			FilePlayer(resource: "ding.wav") {
				RMSMeter($left, $right) // magic! we have levels shown in real time!
				if reverb {
					Reverb(decay: 4)
				}
			}
		}
	}
}
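For illustration, RMSMeter could be built on the hypothetical Filter protocol from earlier. This sketch computes the RMS per channel with vDSP and pushes the values to the SwiftUI bindings; dispatching (and allocating) from the render thread is not acceptable in a real implementation, which would instead hand the levels off through a lock-free buffer polled at display rate:

import SwiftUI
import Accelerate

struct RMSMeter: Filter {
	var left: Binding<Float>
	var right: Binding<Float>

	init(_ left: Binding<Float>, _ right: Binding<Float>) {
		self.left = left
		self.right = right
	}

	func process(channels: AudioBuffers) {
		var levels: [Float] = []
		for channel in channels {
			var rms: Float = 0
			vDSP_rmsqv(channel.data.baseAddress!, 1, &rms, vDSP_Length(channel.data.count))
			levels.append(rms)
		}
		DispatchQueue.main.async { // shortcut for the sketch only
			left.wrappedValue = levels.first ?? 0
			right.wrappedValue = levels.count > 1 ? levels[1] : 0
		}
	}
}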

Again, this is a very, very, very immature idea that needs infinite polishing. Audio is surprisingly hard, but so is graphical UI and yet there are declarative UI systems like SwiftUI that are taking the world by storm and will probably leave the OOP ones behind someday.

If this becomes reality, the underlying implementation will be quite complex because you want maximum flexibility, maximum efficiency, and of course multithreading done properly.

And it might be that some support from the Swift compiler and/or Foundation will be required too, but I'm not sure.

Please feel free to critique, amend, ask difficult questions, or just shout in joy (or rage) at no more than 120dB :laughing:


It could work. My input/suggestion would be to consider the declarative audio system separately from SwiftUI to begin with.

You might want to play around with AudioKit as a base to design the API and iron out issues that come up.
