Considerations when adapting completion-block async code to Swift concurrency via `withTaskCancellationHandler` & `withCheckedContinuation`

jamieQ · December 8, 2023, 3:43pm

i was recently thinking about some code that adapts a completion-block-based API to Swift concurrency with support for cancelation of the async work. i found things a bit more complex at times than i initially expected, so wanted to walk through my thought process and the resulting code more explicitly. the general structure began with something like this:

// v1
func doAsyncWorkWithCancelation() async -> Int? {
    // async work cancelation handler
    var cancelHandler: DispatchWorkItem?

    // wrap async work in a cancelation handler
    return await withTaskCancellationHandler(
        operation: {
            return await withCheckedContinuation { continuation in
                // configure work
                let workItem = DispatchWorkItem {
                    continuation.resume(returning: 42)
                }

                // configure cancelation handler
                cancelHandler = workItem

                // enqueue work
                DispatchQueue.global().async(execute: workItem)
            }
        }, onCancel: {
            cancelHandler?.cancel() // 🛑 Reference to captured var 'cancelHandler' in concurrently-executing code
        }
    )
}

this implementation would not compile due to the mutable reference to the cancelation handler being identified as potentially being accessed concurrently (aside: it's great that this can be surfaced statically!). this led me to conclude that we must use something like an immutable, atomic reference to the cancelation handler, which resulted in a version like this:

// v2
func doAsyncWorkWithCancelation() async -> Int? {
    // async work cancelation handler
    let atomicCancelHandler = Atomic<DispatchWorkItem>()

    // wrap async work in a cancelation handler
    return await withTaskCancellationHandler(
        operation: {
            return await withCheckedContinuation { continuation in
                // configure work
                let workItem = DispatchWorkItem {
                    continuation.resume(returning: 42)
                }

                // configure cancelation handler
                atomicCancelHandler.value = workItem

                // enqueue work
                DispatchQueue.global().async(execute: workItem)
            }
        }, onCancel: {
            atomicCancelHandler.value?.cancel() // now compiles successfully
        }
    )
}

this addressed the compiler error, and presumably fixed the data race by adding a lock around the cancelation handler. however, upon thinking through the potential orderings of onCancel and operation, it became clear that there was another potential issue – since the operation and onCancel closures run concurrently with respect to each other, there is a race condition. it would be possible for the cancel handler to be invoked after the work item was created but before it was stored in atomicCancelHandler.value, which would prevent the expected cancelation of the work item from ocurring. this led to the next implementation which looked like:

// v3
func doAsyncWorkWithCancelation() async -> Int? {
    // async work cancelation handler
    let atomicCancelHandler = Atomic<DispatchWorkItem>()

    // wrap async work in a cancelation handler
    return await withTaskCancellationHandler(
        operation: {
            return await withCheckedContinuation { continuation in
                // check if we were canceled before starting
                guard !Task.isCancelled else {
                    continuation.resume(returning: nil)
                    return
                }

                // configure work
                let workItem = DispatchWorkItem {
                    continuation.resume(returning: 42)
                }

                // configure cancelation handler
                atomicCancelHandler.value = workItem

                // check for cancelation again after
                // configuring the cancelation handler
                // but before submitting the work
                guard !Task.isCancelled else {
                    continuation.resume(returning: nil)
                    return
                }

                // enqueue work
                DispatchQueue.global().async(execute: workItem)
            }
        }, onCancel: {
            atomicCancelHandler.value?.cancel()
        }
    )
}

i thought this was basically the final version, but then realized that a similar race exists between enqueuing the work item, canceling the task, and executing the work item. this resulted in the following, final implementation:

// v4
func doAsyncWorkWithCancelation() async -> Int? {
    // async work cancelation handler
    let atomicCancelHandler = Atomic<DispatchWorkItem>()

    // wrap async work in a cancelation handler
    return await withTaskCancellationHandler(
        operation: {
            return await withCheckedContinuation { continuation in
                // check if we were canceled before starting
                guard !Task.isCancelled else {
                    continuation.resume(returning: nil)
                    return
                }

                // configure work item
                var innerCancelHandler: DispatchWorkItem?
                let workItem = DispatchWorkItem { [unowned innerCancelHandler] in
                    // check for cancelation
                    guard innerCancelHandler?.isCancelled == false else {
                        continuation.resume(returning: nil)
                        return
                    }

                    continuation.resume(returning: 42)
                }
                // configure 'inner' cancelation handler.
                // note: the specific approach here could
                // differ. this one was used for simplicity
                // and to avoid possibly creating a retain cycle
                // with the work item
                innerCancelHandler = workItem

                // configure cancelation handler
                atomicCancelHandler.value = workItem

                // check if we were canceled again after
                // configuring the cancelation handler
                guard !Task.isCancelled else {
                    continuation.resume(returning: nil)
                    return
                }

                // enqueue work
                DispatchQueue.global().async(execute: workItem)
            }
        }, onCancel: {
            atomicCancelHandler.value?.cancel()
        }
    )
}

curious if this final implementation seems generally 'correct' for this scenario, or if there's anything else i may be overlooking.

wadetregaskis · December 8, 2023, 5:24pm

Yeah, that's a thorny one. Although not all that surprising - Structured Concurrency doesn't play well with GCD.

And another problem I see is that you're not actually concluding the continuation when cancellation occurs 'successfully' (when your work item never actually runs, due to the cancellation), so in that situation your code will either never return from doAsyncWorkWithCancelation or trigger a runtime error when the continuation deinits without first being resumed.

You might be able to accomplish this using a Future of some kind, e.g. from Combine. Something like:

func doAsyncWorkWithCancelation() async -> Int? {
    return await Future<Int?, Never>() { resultPromise in
        let cancelHandler = Future<DispatchWorkItem, Never>() { cancellationHandlerPromise in
            await withTaskCancellationHandler(
                operation: {
                    let workItem = DispatchWorkItem {
                        resultPromise(.success(42))
                    }

                    cancellationHandlerPromise(.success(workItem))

                    DispatchQueue.global().async(execute: workItem)
                }, onCancel: {
                    Task {
                        await cancelHandler.value.cancel()
                        resultPromise(.success(nil))
                    }
                })
        }
    }.value
}

This does potentially call the resultPromise twice, and I don't know if that's permitted or not (bafflingly the Combine documentation doesn't cover this essential usage aspect, although YMMV depending on what Future implementation you ultimately go with, anyway).

It's still technically racey as your work item might actually execute yet not call resultPromise first, but that's a benign race - it merely results in the work being wasted, and with asynchronous prompt cancellation that's always a possibility.

It's a bit ugly that you have to create another Task during cancellation, but unfortunately there's no version of withTaskCancellationHandler that takes an async cancel closure.

You can try putting the checked continuation back in there if you really want its checks, but otherwise it's unnecessary… and it might be tricky to make it work correctly since (per the documentation) "resuming from a continuation more than once is undefined behavior".

jamieQ · December 8, 2023, 6:46pm

thanks, that's a great point. cancelation could occur after the item was enqueued but before it was invoked, in which case you're right we would be violating the continuation's contract by not resuming. that case could presumably be handled with something like workItem.notify(...), but would also add some more complexity... .