I posted some time ago about a function in an actor that needs to be non-reentrant, and there are numerous examples around of why this is sometimes necessary.
One proposed solution is to use an "async semaphore", that is, a semaphore implemented entirely using Swift async/await methods. Since async/await is supposed to be deadlock-free, one should be able to achieve non-reentrancy in a deadlock-free manner: the first line of the function acquires the semaphore; all exit paths of the function release it. (Let's ignore throwing and cancellation for now to keep it simple.)
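For concreteness, here is a minimal sketch of what I mean (the AsyncSemaphore type, its wait()/signal() API, and the Loader actor are made up for illustration, and cancellation is ignored as stated):

actor AsyncSemaphore {
    private var count: Int
    private var waiters: [CheckedContinuation<Void, Never>] = []

    init(count: Int = 1) { self.count = count }

    // Suspends the caller (rather than blocking a thread) when no permit is available.
    func wait() async {
        if count > 0 {
            count -= 1
            return
        }
        await withCheckedContinuation { continuation in
            waiters.append(continuation)
        }
    }

    // Hands the permit to the oldest waiter, or returns it to the pool.
    func signal() {
        if waiters.isEmpty {
            count += 1
        } else {
            waiters.removeFirst().resume()
        }
    }
}

actor Loader {
    private let semaphore = AsyncSemaphore(count: 1)

    // "Non-reentrant" in the sense that a second call parks until the first finishes.
    func loadExclusively() async {
        await semaphore.wait()      // first line: acquire
        // ... work that must not overlap ...
        await semaphore.signal()    // every exit path: release
    }
}

Because wait() suspends the caller instead of blocking a thread, a second call to loadExclusively() simply parks until the first one signals, which is the non-reentrancy described above.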
But what if the function called itself? I became aware that this would have to block because it would involve a task that was... waiting for itself to finish. Huh? Is this even possible?
Sure, easily:
actor TaskHolder {
    var task1: Task<Void, Never>?
    func setTask1(_ task: Task<Void, Never>) {
        self.task1 = task
    }
}

let taskHolder = TaskHolder()

func load() async {
    let task = Task {
        print("Task starts.")
        // By the time this runs, task1 typically already holds this very task (set below).
        if let task = await taskHolder.task1 {
            print("I found a task. Waiting... for myself?")
            await task.value   // awaiting our own completion
            print("Done with await.")
        }
    }
    await taskHolder.setTask1(task)
    await task.value
}
When you call load(), it never finishes. Its launched task waits for itself.
But this isn't a deadlock, is it? Do we have another word for it? I was under the impression that deadlock was impossible with async/await, so clearly this can't be a deadlock, which means I need another phrase to describe what's going on here...
The reason this matters is that, clearly, a non-reentrant function which tries to recurse is a bad idea. But if it can happen in simple cases like that, it can also happen in more complex cases involving more than one non-reentrant function, and we won't see it coming.
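For instance, a sketch of the multi-function version, reusing the hypothetical AsyncSemaphore from the earlier sketch: neither function recurses directly, but a() can reach b(), and on some code path b() can reach a() again, so the inner a() ends up waiting on the semaphore that the outer a() is still holding, just like the single-function case.

let semA = AsyncSemaphore(count: 1)
let semB = AsyncSemaphore(count: 1)

func a() async {
    await semA.wait()
    await b()                 // fine on its own...
    await semA.signal()
}

func b() async {
    await semB.wait()
    await a()                 // ...but imagine this happens on some rarely-taken path
    await semB.signal()
}

// A single call to a() now waits on itself, two functions deep.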
Again, do we have a precise definition of what deadlock is when it comes to Swift concurrency? Because effectively, this scheme would run into it.
It very much is a deadlock: the kind of deadlock people are concerned about, and it's the reason we don't have non-reentrant actors by default.
It's a tradeoff: non-reentrancy like this means you can deadlock; reentrancy means you get confusing, unexpected behaviors.
If we added non-reentrant actors to the language, they'd suffer a similar issue.
Personally, I'd argue that it is easier (though still a hard problem) to build tooling to detect deadlocks than to detect "something weird happened because the developer didn't think about reentrancy", which is a very undefined state to be looking for. But yes, it means we'd be at constant risk of deadlocks.
I have heard so many times that async/await always allows forward progress and is free from deadlocks that I thought it was impossible to paint yourself into this corner using only async/await. So I figured there must be a narrow technical definition of "deadlock" as it pertains to async/await, but clearly not.
Just to add, there are two critical parts of this description that are missing:
1. Swift concurrency always allows threads to make forward progress (but not necessarily individual units of work), and
2. Making forward progress is a cooperative effort: it's still possible to write tasks which never make forward progress in a way that prevents scheduling around them, blocking other work.
In Swift, tasks can only await other tasks that are known to the Swift runtime -- be it continuations or child tasks. Therefore, code structured with Swift's concurrency primitives provides the runtime with a clear understanding of the dependency chain between the tasks. So far, you've learned how Swift's language features allow a task to be suspended during an await; the thread that was executing it does not block on the await. Instead, it is able to reason about task dependencies and pick up a different task. This means that code written with Swift concurrency can maintain a runtime contract that threads are always able to make forward progress.
and
Using a lock in synchronous code is safe when used for data synchronization around a tight, well-known critical section. This is because the thread holding the lock is always able to make forward progress towards releasing the lock. As such, while the primitive may block a thread for a short period of time under contention, it does not violate the runtime contract of forward progress. It is worth noting that unlike Swift concurrency primitives, there is no compiler support to aid in correct usage of locks, so it is your responsibility to use this primitive correctly. On the other hand, primitives like semaphores and condition variables are unsafe to use with Swift concurrency. This is because they hide dependency information from the Swift runtime, but introduce a dependency in execution in your code. Since the runtime is unaware of this dependency, it cannot make the right scheduling decisions and resolve them. In particular, do not use primitives that create unstructured tasks and then retroactively introduce a dependency across task boundaries by using a semaphore or an unsafe primitive. Such a code pattern means that a thread can block indefinitely against the semaphore until another thread is able to unblock it. This violates the runtime contract of forward progress for threads.
(2) isn't relevant in this specific case, but (1) is: the task's self-dependency means it may never complete, but while it awaits itself forever, it's not blocking one of the cooperative thread pool's threads; other work can be scheduled on that thread so that other tasks continue making forward progress.
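One way to convince yourself of this, reusing load() from the example above (the 0.1-second sleep is just an arbitrary delay for illustration):

Task {
    await load()                                       // suspends forever
    print("This never prints.")
}

Task {
    try? await Task.sleep(nanoseconds: 100_000_000)    // ~0.1 s
    print("Other tasks still run fine.")               // this does print
}

The stuck task never completes, but it is suspended rather than blocking a thread, so unrelated work proceeds normally.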
As long as a concurrent system allows circular waits on shared resources, it can never be free from deadlocks.
Again, individual tasks don't necessarily need to make forward progress themselves, so long as they don't otherwise prevent the scheduling of other tasks. The task in Jordan's example will never complete, but it also yields and allows other tasks to run, so it's not problematic beyond sitting there suspended and doing nothing.
Sure — the type of lock itself doesn't matter so much as the fact that it's not held unnecessarily long, as the quote mentions. If you have many tasks all waiting on the same lock, none of them can make forward progress while it's held; if the task holding it keeps it for an arbitrarily long time, none of the waiters can progress. And if you have enough waiting tasks to saturate the cooperative thread pool, the concurrency system will grind to a halt until the lock is released.
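As a deliberately bad sketch of that saturation scenario (DispatchSemaphore stands in for any blocking wait here, and 64 is just assumed to exceed the pool size, which is roughly the core count, on a typical machine):

import Dispatch

// ANTI-PATTERN sketch -- don't do this. A blocking wait inside enough tasks
// ties up the cooperative pool's threads one by one.
let gate = DispatchSemaphore(value: 0)

for _ in 0..<64 {            // assumed to exceed the pool size
    Task {
        gate.wait()           // blocks whichever pool thread runs this task
        print("finally unblocked")
    }
}
// Until gate.signal() is called 64 times from outside the pool, every thread
// that picks up one of these tasks is stuck; once all pool threads are stuck,
// no other Swift concurrency work can run.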
This scenario is the type of "unstructured" dependency model warned against in the talk, since the concurrency system has no way of knowing that all of those tasks are waiting on a single external shared resource. If you find yourself in a scenario like this, you'd likely want to convert the lock protecting that resource into an actor, access to which can then be safely awaited in a structured manner.
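A minimal sketch of that conversion, with a made-up Cache standing in for the shared resource:

import Foundation

// Model the shared resource as an actor so that waiting for it is a structured
// await the runtime understands, rather than a lock or semaphore it can't see.
actor Cache {
    private var storage: [String: Data] = [:]

    func value(for key: String) -> Data? {
        storage[key]
    }

    func insert(_ data: Data, for key: String) {
        storage[key] = data
    }
}

// Callers write `await cache.value(for: key)`; while one task is inside the
// actor, the others are suspended instead of blocking pool threads.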
Edit: Just to add that the reason the talk focuses on condition variables and semaphores to describe this scenario instead of locks is that locks are much more commonly scoped in nature (and thus more likely to allow for forward progress), whereas condition variables and semaphores are not. The typical use-case for a semaphore (and by definition, a condition variable) is to wait indefinitely until some signal is received — this indefinite wait for some external signal blocks the thread, which would tie up a shared concurrency thread and prevent other tasks from being able to use it. (A lock technically does the same, but the entity that took the lock is also typically the one to release it when it's done, so the external dependency is often not there, and it's usually well known that you should hold a lock only as long as you really need to.)