Hang when awaiting call to actor

kavon · December 14, 2021, 8:14pm

I've filed a JIRA with a minimal reproducer on your behalf, but feel free to add some context or motivation for how it affects your particular situation, etc.

Now, I have a larger teaching example, with a lot of comments, to help illuminate what's going on so that you can work around this in your program:

actor A {
  func f(_ i: Int) async {
    print("task \(i) called A.f()")
  }
}

@main
struct Main {
  static func main() async {
    let a = A()
    await withTaskGroup(of: Void.self) { group in
      for i in 0..<3 {
        group.addTask {
          await caller(a, i)
        }
      }
    }
  }
}

func caller(_ a: A, _ task: Int) async {
  print("task \(task) starting")

  // Because this caller function is not isolated to any actor, after completing
  // this call to an async actor function, we remain on a's executor, which 
  // can prevent other tasks from using the same actor.
  await a.f(task)
  
  /////
  // Now, here are some one-liner tricks to play with. Try commenting, 
  // uncommenting, or even reordering:

  // Temporarily gives up a's executor, but I believe it will try to 
  // resume on the same executor upon returning? I'm not sure.
  // await Task.yield()

  // This gives up a's executor and switches to the main actor during the call.
  // Similar to a.f(), since we're calling an async function, we won't give 
  // up the main actor after returning.
  // await asyncMainActorFunc(task)

  // This one would also give up a's executor during the call, but upon 
  // returning it will try to switch back to whichever executor it was on prior
  // to the call. So, this can still prevent forward progress if it appears after
  // a call to an async actor-isolated function.
  // await ordinaryMainActorFunc(task)

  // This terrible hack should get us off of whichever executor we're on now
  // and onto one that is unique, so every task can make progress in this func.
  // await DropExecutor().doIt()

  ///// end of one-liners
  
  // The goal is to have every task make it to `doLongRunningWork`.
  doLongRunningWork(task)
}


actor DropExecutor {
  var state: Int = 0
  func doIt() async {
    state = 0 // needed to prevent optimization
  }
}

func doLongRunningWork(_ i: Int) {
  print("task \(i) starting long-running work")
  while true {}
}

@MainActor
func asyncMainActorFunc(_ i: Int) async {
  print("task \(i) called asyncMainActorFunc()")
}

@MainActor
func ordinaryMainActorFunc(_ i: Int) {
  print("task \(i) called ordinaryMainActorFunc()")
}

To play with the example above, you can compile with:

xcrun swiftc -parse-as-library hang.swift

(Just drop the xcrun if you're on Linux.) I particularly recommend starting off by commenting out all four "tricks". You should see something like this:

task 0 starting
task 1 starting
task 2 starting
task 0 called A.f()
task 0 starting long-running work

which shows that the other two tasks are stuck trying to call a.f(), because the one task that got through is still holding a's executor while doing its long-running work. Next, if you uncomment the line that calls asyncMainActorFunc, you should see something like this:

task 2 starting
task 0 starting
task 1 starting
task 2 called A.f()
task 0 called A.f()
task 1 called A.f()
task 2 called asyncMainActorFunc()
task 2 starting long-running work

Notice that all three tasks now made it to a.f() but no further, because task 2 is holding the main actor while doing its long-running work. Whenever the DropExecutor hack is uncommented, all three tasks make it to their long-running work (shown here with the asyncMainActorFunc call still uncommented as well):

task 1 starting
task 0 starting
task 1 called A.f()
task 2 starting
task 0 called A.f()
task 2 called A.f()
task 1 called asyncMainActorFunc()
task 0 called asyncMainActorFunc()
task 2 called asyncMainActorFunc()
task 1 starting long-running work
task 0 starting long-running work
task 2 starting long-running work

That DropExecutor hack creates a fresh actor instance and calls one of its async methods, which must run on that instance's executor in order to update its state. Since each instance has a unique executor, it doesn't matter that each task running caller continues on that executor after the call. Of course, this hack is terrible, so please watch the bug report closely for a better solution or fix.