Add API that makes it easier to interoperate with C callbacks

AlexanderM · January 21, 2023, 9:22pm

C doesn't have the luxury of closures, and by extension, neither does Swift code that tries to interoperate with imported C APIs. C functions that want to have a callback have to take a function pointer, which is @convention(c) in Swift terms. If you try to pass a capturing closure, you get a compilation error:

a C function pointer cannot be formed from a closure that captures context

Working around this is pretty tricky, and requires some pretty sophisticated knowledge. (There are Objective-C blocks which can be used in C code, but they're not a standard part of C itself, and there are plenty of APIs that don't use them, so I don't see them as a solution to this problem.)

C APIs often simulate closures by using a pair of parameters:

A function pointer (for defining the behaviour)
And an accompanying context pointer (for the contextual data). This is often called "userInfo" or "context" and is just a void * pointer.

When the callback is called, you're passed back your context as an argument, which you can cast to whatever type you had given it, and unpack your contextual variables from there.

Here's an example simulated C API:

import Foundation
import Dispatch

// Just a simple example, written in Swift so you don't need to set up a complex multi-language project.
// Pretend this was an imported C API.
func runCallback(
	after delaySeconds: Int,
	userInfo: UnsafeRawPointer?,
	callback: @convention(c) (_ currentTime: UInt64, _ userInfo: UnsafeRawPointer?) -> Void
) {
	DispatchQueue.main.asyncAfter(deadline: .now() + .seconds(delaySeconds) ) {
		callback(
			DispatchTime.now().uptimeNanoseconds, // Some values given to you by the C API
			userInfo // Passes back your `userInfo` for you to access your context.
		)
	}
}

Calling it from Swift is a little tricky, in part because you need to be pretty careful with specifying how userInfo's memory should be managed.

Here's an example caller that uses a Swift object to store the context:

func cAPICaller_userInfoObject() {
	let i = 123 // Some local state we want to close over and use in our callback

	class CallbackUserInfo {
		let i: Int
		
		init(i: Int) { self.i = i }
	}
	
	let userInfo = CallbackUserInfo(i: i) // Package up `i` into our own context object
	
	runCallback(
		after: 1,
		userInfo: Unmanaged.passRetained(userInfo).toOpaque(),
		callback: { _, userInfoP in // The C callback takes two arguments; we ignore `currentTime` here.
			guard let userInfoP else { fatalError("The userInfo pointer was nil!") }
			
			// Cast our raw pointer back to `CallbackUserInfo`, consuming the retain made by `passRetained`.
			let userInfo = Unmanaged<CallbackUserInfo>.fromOpaque(userInfoP).takeRetainedValue()
			
			// Fish out whatever context we cared about
			let i = userInfo.i
			
			print("hello, world! \(i)")
		}
	)
}

In that example, we had to hand-write our own CallbackUserInfo class. If we want to capture different values instead, we have to manually update that class' properties and initializer.

We can notice that this context object is really just a hand made closure capturing mechanism, which we already have in Swift. We can simplify this code by leveraging the userInfo to pass a normal Swift closure, which can capture arbitrary context just like normal:

Example caller that uses a Swift closure to store the context

func cAPICaller_closure() {
	let i = 123 // Some local state we want to close over and use in our callback
	
	typealias ClosureType = (_ currentTimeNS: UInt64) -> Void
	
	// This is just a simple Swift closure. It can capture variables like normal,
	// and doesn't need to know/worry about the `userInfo` pointer
	let closure: ClosureType = { currentTimeNS in
		print("Hello, world! \(currentTimeNS) \(i)")
	}
	
	runCallback(
		after: 1,
		userInfo: Unmanaged.passRetained(closure as AnyObject).toOpaque(), // Needs `as AnyObject`? A bit odd, but sure.
		callback: { currentTimeNS, closureP in
			guard let closureP else { fatalError("The userInfo pointer was nil!") }
			
			// Take ownership of the retained object (balancing `passRetained`), and cast it to our Swift closure type
			guard let closure = Unmanaged<AnyObject>.fromOpaque(closureP).takeRetainedValue() as? ClosureType else {
				fatalError("The userInfo points to an object that wasn't our expected closure type.")
			}
			
			// Call our Swift closure, passing along the callback arguments, but not any `userInfo` pointer
			closure(currentTimeNS)
		}
	)
}

I think this is an area where the compiler can help us. From what I understand, closures are already objects that have this two part combination of a function pointer (which contains the behaviour) and heap-allocated storage (which contains the captured context). It would be great if we could use these two values directly with our C API. By using the existing context-building capability of the compiler, we simplify the user's code, and also get to piggy-back off some optimizations for free. For example, if you're capturing only a single memory-managed object (commonly self), and nothing else, then the pointer to that object can be used directly as the context pointer, without needing any other heap allocations.

I don't know how this would look syntactically. One idea might be to have a hypothetical API like createCCallback which returns these values for us, and lets us pass them directly to our C API. It might look something like this:

func cAPICaller_proposedImprovement() {
	let i = 123
	
	// `createCCallback` takes a regular Swift function and returns a `@convention(c)` function.
	// It's special, like `withoutActuallyEscaping` (which takes an `@escaping` function and
	// returns a non-escaping function), in that its type can't be expressed in the Swift type system.
	//
	// `@closureContext` is hypothetical syntax that indicates which arg passes the closure context.
	let (callback, userInfo) = createCCallback { currentTimeNS, @closureContext userInfo in
		// This is just a proper Swift closure. It can capture variables like normal.
		print("Hello, world! \(currentTimeNS) \(i)")
	}

	runCallback(
		after: 1,
		// Assumes that `userInfo` should always be passed retained. Is that right?
		userInfo: userInfo,
		// The `callback` already has the correct type, and knows how to unpack captured values
		// from the `userInfo` object.
		callback: callback
	)
}

If we only have one argument, it's implied that it'll be the userInfo pointer used to store the context. It gets messier when a C callback has multiple arguments: it wouldn't be obvious to the compiler which one of them should be the context pointer. One (clunky) option is to ask users to point it out with a marker annotation, like @closureContext.

Do you guys think this is a problem worth solving? Is there any nicer way to simplify Swift code that calls C APIs?

jrose · January 21, 2023, 10:28pm

The next bit of complexity is cleaning up the context. Different C functions might call the callback exactly once, or 0-1 times, or N times. They might have an explicit “destroy” step, or they might not. And they might be escaping, or non-escaping.

I bring this up only because I think it’s part of the design space. In the non-escaping case, cleanup can be implicit, attached to a scope or an object (and possibly more options coming soon with move-only types). In the escaping-with-explicit-destroy case, things are still pretty easy if the helper API can provide an appropriate “destroyer” function.

Maybe those two use cases are enough to focus on for now?
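The escaping-with-explicit-destroy case could be sketched roughly as below. This is only an illustration under stated assumptions: `CallbackBox`, `CCallbackContext` and `makeCCallbackContext` are hypothetical names, not an existing or proposed API, and the callback signature is borrowed from the `runCallback` example earlier in the thread. The callback reads the box at +0, so it can run any number of times, and the single retain is balanced only by the "destroyer".

```swift
// Hypothetical box that keeps the Swift closure alive while C holds the pointer.
final class CallbackBox {
	let body: (UInt64) -> Void
	init(_ body: @escaping (UInt64) -> Void) { self.body = body }
}

// Hypothetical bundle of the three values an escaping C API with a destroy step needs.
struct CCallbackContext {
	let userInfo: UnsafeMutableRawPointer
	let callback: @convention(c) (UInt64, UnsafeMutableRawPointer?) -> Void
	let destroy: @convention(c) (UnsafeMutableRawPointer?) -> Void
}

func makeCCallbackContext(_ body: @escaping (UInt64) -> Void) -> CCallbackContext {
	let callback: @convention(c) (UInt64, UnsafeMutableRawPointer?) -> Void = { currentTime, userInfo in
		guard let userInfo = userInfo else { return }
		// +0 access: does not change the retain count, so 0, 1 or N calls are all fine.
		Unmanaged<CallbackBox>.fromOpaque(userInfo).takeUnretainedValue().body(currentTime)
	}
	let destroy: @convention(c) (UnsafeMutableRawPointer?) -> Void = { userInfo in
		guard let userInfo = userInfo else { return }
		// Balances the single `passRetained` below; must be called exactly once.
		Unmanaged<CallbackBox>.fromOpaque(userInfo).release()
	}
	return CCallbackContext(
		userInfo: Unmanaged.passRetained(CallbackBox(body)).toOpaque(),
		callback: callback,
		destroy: destroy
	)
}
```

The three values would be handed to the C API's callback, context and destroy parameters respectively. A callback that fires after `destroy` would still be a use-after-free, so this relies on the C API honoring its own contract.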

AlexanderM · January 21, 2023, 10:49pm

You're right, I did gloss over that complexity. My post was already getting pretty long haha. I have no idea how we'd express those nuances. In general, I'm not sure if this kind of thing should lean towards being a Swift API (withoutActuallyEscaping being a perfect example) or dedicated syntax (since there'd be quite a bit of magic going on, anyway). Perhaps it could be written with the new macro system? Not sure!

In any case, do you think this is a problem worth considering?

Semi-related, in experimenting with this, I found that this line:

userInfo: Unmanaged.passRetained(closure as AnyObject).toOpaque()

Is bugged. It can cause a dangling pointer and subsequent crash. closure as AnyObject boxes the closure into a __SwiftValue, and allows it to escape. I filed a bug for it: `unescapingClosure as AnyObject` allows closure to escape · Issue #63151 · apple/swift · GitHub
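One way to sidestep the `as AnyObject` boxing is to wrap the closure in an explicit class. The initializer then requires `@escaping`, so the compiler rejects a non-escaping closure instead of silently letting it escape. A minimal sketch, assuming a C API that calls back exactly once; `runCallbackNow`, `ClosureBox` and `callWithBoxedClosure` are hypothetical names made up for illustration, and the stand-in API calls back synchronously to keep the example self-contained.

```swift
// Hypothetical stand-in for an imported C API; it calls back synchronously, exactly once.
func runCallbackNow(
	userInfo: UnsafeMutableRawPointer?,
	callback: @convention(c) (_ currentTime: UInt64, _ userInfo: UnsafeMutableRawPointer?) -> Void
) {
	callback(42, userInfo)
}

// Hypothetical explicit box. Because `body` must be `@escaping`,
// a non-escaping closure cannot be stored here by accident.
final class ClosureBox {
	let body: (UInt64) -> Void
	init(_ body: @escaping (UInt64) -> Void) { self.body = body }
}

func callWithBoxedClosure(_ body: @escaping (UInt64) -> Void) {
	runCallbackNow(
		// +1 retain, handed to the C side as an opaque pointer.
		userInfo: Unmanaged.passRetained(ClosureBox(body)).toOpaque(),
		callback: { currentTime, userInfoP in
			guard let userInfoP = userInfoP else { return }
			// Consumes the +1 retain; correct only if the callback runs exactly once.
			let unboxed = Unmanaged<ClosureBox>.fromOpaque(userInfoP).takeRetainedValue()
			unboxed.body(currentTime)
		}
	)
}
```

Since the box has a concrete type, the `as? ClosureType` downcast from the earlier example is no longer needed.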

tera · January 22, 2023, 3:15am

The "recent" trend in the Apple API's world (for quite some number of years already!) is to fix the problem "at source" on the C side by using blocks. A couple of examples:

// OLD WAY:
typedef void (*VTCompressionOutputCallback)(void* refCon, void* sourceFrameRefCon, OSStatus, VTEncodeInfoFlags, CMSampleBufferRef);

// NEW WAY:
typedef void (^VTCompressionOutputHandler)(OSStatus, VTEncodeInfoFlags, CMSampleBufferRef);
API_AVAILABLE(macosx(10.11), ios(9.0), tvos(10.2))

// OLD WAY:
typedef OSStatus (*AURenderCallback)(void* refCon, AudioUnitRenderActionFlags*, AudioTimeStamp*, UInt32, UInt32, AudioBufferList*);

// NEW WAY:
typedef AUAudioUnitStatus (^AURenderPullInputBlock)(AudioUnitRenderActionFlags*, AudioTimeStamp*, UInt32, UInt32, AudioBufferList*);
API_AVAILABLE(macos(10.11), ios(9.0), watchos(2.0), tvos(9.0))

When this is not an option in some peculiar case, I tend to find a simple solution that does not involve lifetime management of the passed "refCon". E.g. I know (and ensure) that my class instance will outlive all callback invocations, and thus I can pass my class instance reference at +0 (and cast it to void*); similarly, on the callback side I can treat it as a reference passed at +0 and just cast it back to my class instance reference. Then there's no problem whether the callback is called 0 or 1 or N times. It's not a general approach but it has served me well in all cases I've encountered so far, and the number of such cases steadily decreases every year given the trend outlined above.

andrews05 · January 22, 2023, 9:37am

I would really appreciate something like this. AudioToolbox doesn't seem to have implemented the "new way" mentioned above for functions like AudioQueueAddPropertyListener and AudioConverterFillComplexBuffer.

AlexanderM · January 22, 2023, 4:13pm

Do we have any idea how Apple's block-based replacements work? Are they hand-crafted wrappers, or are there some kind of annotation/macros that autogenerate them?

stuchlej · January 22, 2023, 4:18pm

It is a Clang extension. I believe this Wikipedia page describes what we're talking about: Blocks (C language extension) - Wikipedia.

Look at the first link in "External Links".

jrose · January 22, 2023, 5:25pm

I suspect most of Apple’s block-based APIs have the real implementation, with the function-based APIs calling them, rather than the other way around. But also (a) blocks are easier to stuff in a pointer than Swift closures, and (b) when you’re wrapping an individual API you don’t have to worry about generalizing your ownership model.

tera · January 22, 2023, 5:41pm

So this is one of those peculiar cases I mentioned above; in this case I have an instance of MyAudioConverter class handy, this instance is responsible for audio conversion (obviously it has to stay alive until audio conversion is finished):

let userData = unsafeBitCast(myAudioConverter, to: UnsafeMutableRawPointer.self)
    
let err = AudioConverterFillComplexBuffer(converter, {
    converter, count, io, outPacketDescription, context in
    let myAudioConverter = unsafeBitCast(context, to: MyAudioConverter.self)
    //  do something here using myAudioConverter
    return noErr
}, userData, &count, &io, &desc)

AlexanderM · January 22, 2023, 5:49pm

I'd recommend Unmanaged.passUnretained(myAudioConverter).toOpaque() over unsafeBitCast(myAudioConverter, to: UnsafeMutableRawPointer.self). It's semantically equivalent, but makes it a bit clearer that there's no retain going on.

Though in your case, your passing is essentially equivalent to unsafe unowned. If you're certain your callback is always called once, and only once, you can passRetained and make this safe without needing an "obviously it has to stay alive until audio conversion is finished" caveat.
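For a callback that may run many times, the +0 variant with `Unmanaged` could look like the sketch below. It is an illustration only: `fillBuffer` is a hypothetical stand-in for a multi-shot C API (it is not `AudioConverterFillComplexBuffer`), and `MyAudioConverter` here is a placeholder class with a counter.

```swift
// Hypothetical stand-in for the Swift class that owns the conversion.
final class MyAudioConverter {
	var callbackCount = 0
}

// Hypothetical stand-in for a C API that invokes its callback several times.
func fillBuffer(
	repeatCount: Int,
	callback: @convention(c) (UnsafeMutableRawPointer?) -> Int32,
	userData: UnsafeMutableRawPointer?
) -> Int32 {
	for _ in 0..<repeatCount {
		let status = callback(userData)
		if status != 0 { return status }
	}
	return 0
}

func convert(using myAudioConverter: MyAudioConverter) -> Int32 {
	// `withExtendedLifetime` keeps the object alive for the duration of the synchronous call.
	return withExtendedLifetime(myAudioConverter) {
		fillBuffer(
			repeatCount: 3,
			callback: { context in
				guard let context = context else { return -1 }
				// +0 on both sides: no retain is taken, so repeated calls never over-release.
				let owner = Unmanaged<MyAudioConverter>.fromOpaque(context).takeUnretainedValue()
				owner.callbackCount += 1
				return 0
			},
			userData: Unmanaged.passUnretained(myAudioConverter).toOpaque()
		)
	}
}
```

If the C API stored the pointer and called back after `convert` returned, the caller would have to guarantee the object's lifetime some other way, which is exactly the caveat discussed above.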

tera · January 22, 2023, 5:57pm

It is an audio conversion callback that's called many times, and by "obviously it has to stay alive until audio conversion is finished" I meant that it would be a logical error if it didn't... The class that is responsible for doing audio conversion going away without waiting for the audio to be fully converted?! Kind of absurd. If I do not want all of the audio converted (e.g. I want to stop conversion midway), then I'd make sure I am cancelling the system audio converter properly (e.g. releasing it, so it won't call my callback any more) and then safely kill the myAudioConverter instance without fear.

AlexanderM · January 22, 2023, 6:05pm

Haha that didn't even occur to me. I suppose that's the kind of change you can make when you own the source. :D

tera · January 23, 2023, 1:38pm

This:

the need to allocate a block of memory (and then memory manage it!) to pass closure to a pointer sized quantity; and this quote from a different thread:

How do closures work (memory management)

Closures are actually conceptually a bit more like this:
struct Closure {
    var functionPointer: UnsafeRawPointer
    var closureContext: AnyObject?
}

Makes me wonder if the following can happen:

a closure functionPointer can be odd †
a closure functionPointer can point to some illegal instruction (unless of course you badly want the closure to die in a specific way)

If either of these cannot (realistically) happen, this could open up an opportunity to use single-word closures:

struct NewClosure {
    var pointer: UnsafeRawPointer
}

The benefits would be a better interop between closures <-> blocks and closures <-> C API's that accept function pointer + userData pointers.

Option 1 ("closure function pointers must be even"):

    pointer is odd †, ††:
        yes? then pointer - 1 actually points to a heap allocated object with:
            var function pointer
            var closure captured variables
            ...
        no? then it points to a function code directly and there is no captured state

† - a variation on this method could use some unusual / unmapped memory address. As an example, imagine that all valid function and heap addresses must not have their most significant bit "on". If that bit is set, we know that this is not a normal function address, so we invert the bits to get the heap allocated object address.

†† - a second variation would be to invert the cases: make function pointer odd-ball case (and adjust the pointer accordingly before calling through it) - if this plays better with ARC rules or some such.

Option 2 ("a specific illegal instruction must not happen at the very beginning of the closure function"):

    pointer points to the chosen illegal instruction?
        yes? then it is actually pointing to a heap allocated object with:
            var illegalInstruction: UInt64
            var functionPointer: UnsafeRawPointer
            var closure captured variables
            ...
        no? then it points to a function code directly and there is no captured state

Option 3 ("magic spell"):

    pointer points to a specific sequence of magic words?
        yes? then it is actually pointing at a heap allocated object with:
            var magicSpell: (UInt64, UInt64) // like deadbeef feedface etc
            var functionPointer: UnsafeRawPointer
            var closure captured variables
            ...
        no? then it points to a function code directly and there is no captured state

Could this fly?
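To make Option 1 concrete, here is a sketch of the tagging arithmetic only, done on plain `UInt` addresses. It assumes (as the option itself does) that real function and heap addresses are always even; `DecodedClosure`, `encodeHeapContext` and `decode` are hypothetical names, and nothing here reflects how the Swift runtime actually represents closures.

```swift
// Hypothetical decoded form of a one-word closure under Option 1.
enum DecodedClosure: Equatable {
	case plainFunction(address: UInt) // No captured state; the word is the code address.
	case heapContext(address: UInt)   // Heap object holding the function pointer and captures.
}

// Tags a heap object address by setting the low bit.
func encodeHeapContext(address: UInt) -> UInt {
	precondition(address & 1 == 0, "heap objects are assumed to be at least 2-byte aligned")
	return address | 1
}

// Inspects the low bit to decide how the single word should be interpreted.
func decode(word: UInt) -> DecodedClosure {
	if word & 1 == 1 {
		return .heapContext(address: word - 1)
	} else {
		return .plainFunction(address: word)
	}
}
```

The variation marked †† would simply swap which case carries the tag bit.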

stuchlej · January 23, 2023, 4:48pm

Looks interesting, but are we talking about some new C compiler feature? Otherwise I fail to see how this could be used with an existing C binary following standard calling conventions.

tera · January 24, 2023, 3:14am

It's on Swift side only, C side remains the same. Put simply, if you want to stuff a two-pointer quantity (of the current closure) into a single pointer storage (to squeeze it into a C API's userinfo field) you'd need to allocate another intermediate block of memory:

struct Closure {
    var functionPointer: UnsafeRawPointer
    var closureContext: AnyObject?
}

new memory Block: [8 bytes for functionPointer, 8 bytes for closureContext]

This is in addition to the (optional) memory block that's already allocated for "closureContext".

The proposed method suggests a mechanism to only have a single memory block to worry about. And, as with current closures, when a closure context is not needed there's no memory block at all; in this case the resulting closure is in effect a function pointer.
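The two-word versus one-word difference can be observed with `MemoryLayout`. This is a small check rather than a statement about guaranteed ABI: the typealias names are made up for illustration, and the sizes are what current toolchains report as far as I know.

```swift
// Hypothetical typealiases for the two kinds of function value being compared.
typealias SwiftCallback = (UInt64) -> Void
typealias CCallback = @convention(c) (UInt64) -> Void

let wordSize = MemoryLayout<UnsafeRawPointer>.size
// A Swift closure value carries a function pointer plus a context reference.
let swiftClosureSize = MemoryLayout<SwiftCallback>.size
// A C function pointer is a bare code address.
let cFunctionPointerSize = MemoryLayout<CCallback>.size

print("word: \(wordSize), Swift closure: \(swiftClosureSize), C function pointer: \(cFunctionPointerSize)")
```

This is why a Swift closure cannot be stuffed into a single `void *` without an intermediate allocation today.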

stuchlej · January 28, 2023, 10:56am

This issue is surfacing almost any time we discuss adding something to the C interop in general.

I remember that in the past, Swift (the Clang importer?) was able to infer a Swift throwing API from an Objective-C method returning (BOOL) and having an (NSError **) argument, and more recently it gained the ability to infer a Swift async method from an Objective-C method having a specific completionHandler argument.

The ability to annotate C APIs with _Nullable, _Nonnull or various __attribute__s like swift_name (or in this case, using Clang Blocks) brings us no benefit outside of the somewhat narrow world of Mac-focused C libraries.

I always thought the obvious solution is APINotes (Blogpost, Clang Doc), but I was never able to get it to work and I suspect it does not work on non-Apple platforms at all.
In my view, the ability to provide additional context without modifying the header files would solve a lot of these issues (for example with C strings), the issue at hand included.

Is there some obvious problem I'm missing?

ksluder · January 28, 2023, 5:51pm

If you relax the problem from “interoperate with existing C code” to “augment the C code to make Swift do the right thing”, it’s a lot more straightforward to wrap the original C function in a function that takes an ObjC block.

stuchlej · January 28, 2023, 10:29pm

Sure, but only if I am in control of the codebase.

I use Swift-C interop mainly on Linux with libraries like SDL, GTK, AdPlug, ...

Even if I somehow had the time and knowledge of the inner workings of those libraries, I don't think the maintainers would accept a PR that adds a lot of complexity for the sake of one language that is kind of niche in the Linux world (and, in the case of Clang Blocks, possibly a major ABI-breaking change).
And even then, it would only be useful if you interact with the most up-to-date libraries, which is also not the norm.

ksluder · January 28, 2023, 11:05pm

It’s the same for API notes.

stuchlej · January 29, 2023, 9:00am

I don't think so. In my view, APINotes are more like a modulemap.