Blog post: The retain cycle of Swift async/await

simonseyer · November 12, 2024, 10:30am

Hi everyone,

I wrote a small blog post about a retain cycle I inadvertently caused while using Swift concurrency: The retain cycle of Swift async/await.

If you want to share some feedback or recommend other approaches, I'm happy to hear from you. Otherwise, I hope the post is a pleasant read.

FranzBusch · November 12, 2024, 1:07pm

Thanks for sharing. I think you are highlighting one of the problems of unstructured concurrency, which can be avoided entirely by using structured concurrency instead. Spawning a task in init that references self and cancelling that task in deinit is a very common source of retain cycles.
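A minimal sketch of that pattern, for readers who have not run into it yet. The `Poller` class and its `tick()` method are hypothetical and not taken from the blog post. It assumes Swift 5 language mode; stricter concurrency checking may ask for additional Sendable annotations.

```swift
// Hypothetical type for illustration only.
final class Poller {
    private var task: Task<Void, Never>?

    init() {
        // The closure captures `self` strongly for as long as the task runs.
        task = Task {
            while !Task.isCancelled {
                self.tick()
                try? await Task.sleep(nanoseconds: 1_000_000_000)
            }
        }
    }

    func tick() {}

    deinit {
        // Not reached while the task is running: the task keeps `self`
        // alive, and the only cancellation lives here in `deinit`.
        task?.cancel()
    }
}

// Drop the only external reference and check whether the object survives.
weak var probe: Poller?
do {
    let poller = Poller()
    probe = poller
}
print(probe == nil ? "deallocated" : "still alive")
```

Capturing `[weak self]` in the closure is one common way to break the cycle, but it keeps the unstructured task around.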

Instead of using unstructured tasks at all, I would recommend adopting structured concurrency by exposing a func run() async method and letting the caller decide which task to call it on. There are also more advanced patterns, such as with-style methods for scoped access. I recently gave a talk about this at the Server-Side Swift conference: https://www.youtube.com/watch?v=JmrnE7HUaDE.
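As a rough sketch of the run() idea (not code from the talk): the hypothetical `Heartbeat` actor below creates no task of its own, and the hypothetical caller `useHeartbeatBriefly()` runs it as a child task via `async let`, so the work cannot outlive the calling function.

```swift
// Hypothetical service; the name and the work it does are illustrative.
actor Heartbeat {
    private(set) var beats = 0

    // Runs until the calling task is cancelled. No Task is created here,
    // so the object never owns a task that could keep it alive.
    func run() async {
        while !Task.isCancelled {
            beats += 1
            try? await Task.sleep(nanoseconds: 10_000_000)
        }
    }
}

// The caller decides where run() executes. Here it is a child task that
// is cancelled and awaited automatically when the function returns.
func useHeartbeatBriefly() async -> Int {
    let heartbeat = Heartbeat()
    async let running: Void = heartbeat.run()
    try? await Task.sleep(nanoseconds: 50_000_000)
    return await heartbeat.beats
}

let observedBeats = await useHeartbeatBriefly()
print("beats observed:", observedBeats)
```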

simonseyer · November 12, 2024, 3:53pm

Thanks @FranzBusch for sharing your perspective/recommendations! I'll definitely watch your talk, looking forward to it.

Using a run method makes a lot of sense because it basically allows the caller to manage both the lifetime of the object and the task, avoiding such cycles. I assume the run method is long-running, though, so at some level it needs to be called on an unstructured Task to allow concurrent execution with the rest of the application, right?

Some additional context/thoughts

Maybe this is all answered in your talk, but I can share a bit of my motivation for using unstructured concurrency (to see whether a change of mindset would indeed make the difference):

The concrete problem was to implement multiple protocol layers in a Bluetooth communication. Layers pass messages (or fragments) down and process them on the way up (either as responses or as independently incoming messages). Some of those layers manage timeouts for certain operations.

One design goal of mine was that I can locally reason about concurrency to prevent race conditions (in addition to data races). This required that sending a message was fully completed before the next one was started, even while awaiting a response (the reentrancy topic).

The way I solved this was to use AsyncSequence as a queue on each layer that messages (or fragments) could be added to. Then I spawned a new Task that looped over the AsyncSequence, sent out a request, waited for the response and then continued with the next. This way I could be sure that there would be no overlapping communication.
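A simplified sketch of such a queue, already written with a run() method instead of a self-spawned task. `Message`, `Layer`, `enqueue`, and the sleeping `exchange` stand-in are all hypothetical names, and `AsyncStream.makeStream(of:)` assumes Swift 5.9 or later.

```swift
struct Message: Sendable { let id: Int }

// Hypothetical protocol layer that handles one message at a time.
actor Layer {
    private let queue: AsyncStream<Message>
    private let continuation: AsyncStream<Message>.Continuation
    private(set) var handled: [Int] = []

    init() {
        let (stream, continuation) = AsyncStream.makeStream(of: Message.self)
        self.queue = stream
        self.continuation = continuation
    }

    // Adding to the queue never waits for earlier messages.
    nonisolated func enqueue(_ message: Message) {
        continuation.yield(message)
    }

    nonisolated func finish() {
        continuation.finish()
    }

    // The next message is only taken from the queue after the previous
    // exchange has completed, so exchanges cannot overlap.
    func run() async {
        for await message in queue {
            await exchange(message)
            handled.append(message.id)
        }
    }

    // Stand-in for "send the request and wait for the response".
    private func exchange(_ message: Message) async {
        try? await Task.sleep(nanoseconds: 1_000_000)
    }
}

let layer = Layer()
layer.enqueue(Message(id: 1))
layer.enqueue(Message(id: 2))
layer.finish()
await layer.run()
let handledIDs = await layer.handled
print(handledIDs)
```

Because only the single loop in run() consumes the stream, ordering is enforced by the loop itself rather than by actor isolation, which would not help across the suspension point inside `exchange`.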

Applying your suggestion, the layers themselves would not manage their own task but just offer a run method that loops over the AsyncSequence. The place where all layers are wired up (let's call it CommunicationStack) could then use, for example, a task group to call run on all layers. If CommunicationStack is deallocated, it would cancel the group.
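One possible shape for this, as a sketch only: the `Service` protocol, the `IdleLayer` placeholder, and the way `CommunicationStack` is built here are assumptions for illustration. The single `Task` at the bottom is the one remaining piece of unstructured concurrency, owned by whoever owns the stack.

```swift
// Hypothetical protocol: anything with a long-running run() method.
protocol Service: Sendable {
    func run() async
}

// Placeholder layer that idles until its task is cancelled.
struct IdleLayer: Service {
    let name: String

    func run() async {
        while !Task.isCancelled {
            try? await Task.sleep(nanoseconds: 10_000_000)
        }
    }
}

struct CommunicationStack: Sendable {
    let layers: [any Service]

    // Runs every layer as a child task. The group returns once all
    // layers have returned; cancelling the task that called run()
    // cancels every child task as well.
    func run() async {
        await withTaskGroup(of: Void.self) { group in
            for layer in layers {
                group.addTask { await layer.run() }
            }
        }
    }
}

let stack = CommunicationStack(layers: [
    IdleLayer(name: "transport"),
    IdleLayer(name: "session"),
])

// The owner holds the only unstructured task and cancels it explicitly.
let runner = Task { await stack.run() }
try? await Task.sleep(nanoseconds: 50_000_000)
runner.cancel()
await runner.value
print("stack stopped")
```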

No cycle could happen this way (which is good!) but I would still use unstructured concurrency, just on a higher level. So I'm not sure I got your recommendation right. Maybe I will post an update after watching your talk ;)

Jean-Daniel · November 12, 2024, 7:48pm

If your task is always running, it can be started in main() and doesn't need to be a detached task; otherwise you have to manage the lifecycle of your task at some point.

If you want to tie it to the lifecycle of your Bluetooth device connection, you just have a single task and a single object to manage: your CommunicationStack.
Start it when the device is connected.
Stop it when it is disconnected.

It is still explicitly managed, but does not rely on init/deinit and so does not create a retain cycle.
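A small sketch of what explicit start/stop could look like. `DeviceSession` and its method names are hypothetical, and the class is not thread-safe, so it assumes start and stop are called from one place such as the main actor.

```swift
// Hypothetical owner of the single long-running task.
final class DeviceSession {
    private var task: Task<Void, Never>?

    var isRunning: Bool { task != nil }

    // Call when the device connects. The task captures only `work`,
    // not `self`, so the session does not keep itself alive.
    func start(_ work: @escaping @Sendable () async -> Void) {
        guard task == nil else { return }
        task = Task { await work() }
    }

    // Call when the device disconnects.
    func stop() {
        task?.cancel()
        task = nil
    }
}

let session = DeviceSession()
session.start {
    while !Task.isCancelled {
        try? await Task.sleep(nanoseconds: 10_000_000)
    }
}
let wasRunning = session.isRunning
session.stop()
print(wasRunning, session.isRunning)
```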

simonseyer · November 17, 2024, 8:38pm

Watched your talk, @FranzBusch. Well done! Can recommend it.

@Jean-Daniel yes, it was tied to a lifecycle in my case. Using explicit lifecycle methods would make it more obvious, but a call could be forgotten in some code path. Using deinit ensures the cleanup always happens.

Making each layer a "Service" by introducing a run method, creating a TaskGroup in the CommunicationStack, and cancelling the group when the CommunicationStack goes out of scope could work quite well and not risk any cycles.

I could also look again (it's a past project by now) at the object holding the CommunicationStack — it might be simple enough to ensure the lifecycle methods are called properly as you described.

Thank you both for your feedback!