Clarification needed on UnsafeContinuation documentation

ole · June 3, 2022, 2:50pm

The documentation for withUnsafeContinuation and withCheckedContinuation says (emphasis added):

Suspends the current task, then calls the given closure with [an unsafe|a checked] continuation for the current task.

To me, this says unambiguously that the suspension happens before the continuation closure is called. But I don't think those are the actual semantics.

SE-0300 says:

withUnsafe*Continuation will run its operation argument immediately in the task's current context, passing in a continuation value that can be used to resume the task.

To me, this says that the task is not suspended before the continuation closure is called. I believe these are the actual semantics of the implementation. Correct?

Would you agree that the documentation is wrong?

Joe_Groff · June 3, 2022, 5:34pm

They're both right. The task stops executing any async code at all before the continuation is formed, and any state will be moved off of the callstack into the task object at that point. The closure is then immediately executed in the same execution context (in other words, the current thread) with the continuation as a parameter. Once the closure returns, control goes back to the executor.
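To make this concrete, here is a minimal sketch of the most common use of a continuation: bridging a callback-based API. legacyFetch is a hypothetical function invented for this example. It delivers its result later on a global dispatch queue. The closure passed to withCheckedContinuation runs right away on the calling thread, and fetch() does not return until the callback has resumed the continuation.

```swift
import Dispatch

// Hypothetical callback-based API, used only for illustration:
// it calls `completion` later on a global dispatch queue.
func legacyFetch(_ completion: @escaping @Sendable (Int) -> Void) {
    DispatchQueue.global().async {
        completion(7)
    }
}

func fetch() async -> Int {
    await withCheckedContinuation { (continuation: CheckedContinuation<Int, Never>) in
        // Runs immediately, on the thread that called withCheckedContinuation,
        // with the continuation as its parameter.
        legacyFetch { value in
            continuation.resume(returning: value)
        }
        // After this closure returns, control goes back to the executor;
        // fetch() does not return until resume(returning:) has also been called.
    }
}
```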

John_McCall · June 3, 2022, 5:49pm

I think you could argue that the task is still running while the closure is executing. I know that's detectable in at least one way, which is that task-local values are still set; you can't change them without running async code, which you can't do within the closure, but you can still read them. I'm blanking on whether there are other ways this is semantically detectable.
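A small sketch of the task-local observation, assuming a hypothetical RequestContext.requestID task-local declared only for this illustration: a value bound with withValue is still readable inside the continuation closure, even though the closure itself cannot run async code.

```swift
enum RequestContext {
    // Hypothetical task-local value, introduced only for this illustration.
    @TaskLocal static var requestID: String = "none"
}

func readTaskLocalInsideContinuation() async -> String {
    await RequestContext.$requestID.withValue("req-42") {
        await withCheckedContinuation { (continuation: CheckedContinuation<String, Never>) in
            // No async code is allowed here, but the task-local value bound
            // by the surrounding task is still visible.
            continuation.resume(returning: RequestContext.requestID)
        }
    }
}
```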

nikolai.ruhe · June 3, 2022, 6:08pm

I find the meaning of the documentation hard to understand.

Consider the following code (which I'm using in unit tests to synchronize the progress of two tasks). Depending on how one interprets the documentation, there is a race condition at self.completion = $0.

```swift
actor MeetingPoint {
    var completion: CheckedContinuation<Void, Never>? = nil

    func join() async {
        if let completion = completion {
            self.completion = nil
            completion.resume()
        } else {
            await withCheckedContinuation {
                self.completion = $0
            }
        }
    }
}
```

When two tasks call join at the same time, one will succeed and enter the actor. This task progresses to await withCheckedContinuation. If the task suspends at this point, before calling the closure, the second task may enter the actor and find the condition in the wrong state (completion == nil). That's not the intended behavior.

If, on the other hand, the task does not suspend before the closure is executed, the behavior is as intended and there's no race condition.

The documentation makes me think the code is wrong, and checking needs to take place within the closure.

nikolai.ruhe · June 3, 2022, 6:12pm

To further illustrate what I mean, this is the code I'm actually using, because the documentation makes me afraid the code will suspend at the wrong point:

```swift
func join() async {
    return await withCheckedContinuation { newCompletion in
        if let oldCompletion = completion {
            self.completion = nil
            oldCompletion.resume()
            newCompletion.resume()
        } else {
            self.completion = newCompletion
        }
    }
}
```

John_McCall · June 3, 2022, 6:36pm

The closure is executed synchronously, without allowing any interleaving on the actor; your first code is correct.
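A minimal usage sketch for the first MeetingPoint version, which is confirmed correct here. The actor is repeated so the snippet stands alone, and meet() is a hypothetical helper that starts two child tasks in a task group and counts how many get past join(). If another join() call could interleave before the closure stored the continuation, both tasks could end up waiting forever; with the synchronous closure execution described above, both should arrive.

```swift
// The first MeetingPoint from above, repeated so this snippet is self-contained.
actor MeetingPoint {
    var completion: CheckedContinuation<Void, Never>? = nil

    func join() async {
        if let completion = completion {
            // Second arrival: release the task that is already waiting.
            self.completion = nil
            completion.resume()
        } else {
            // First arrival: the closure stores the continuation before any
            // other call to join() can run on this actor.
            await withCheckedContinuation {
                self.completion = $0
            }
        }
    }
}

// Hypothetical helper: two child tasks meet; returns how many got through.
func meet() async -> Int {
    let point = MeetingPoint()
    return await withTaskGroup(of: Int.self, returning: Int.self) { group in
        for _ in 0..<2 {
            group.addTask {
                await point.join()
                return 1
            }
        }
        var arrived = 0
        for await count in group {
            arrived += count
        }
        return arrived
    }
}
```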

This scheduling behavior is actually a special power of the with*Continuation functions ever since SE-0338. We intend to generalize that so that other functions can opt in to that behavior, but we haven't done so yet.

ole · June 5, 2022, 9:19am

I think this is where my misunderstanding comes from. My mental model is that the task continues running, executing the continuation closure, and that the task suspends when the continuation closure returns. (This raises the question of what happens when the continuation closure resumes the continuation synchronously, which would kind of resume the task before it suspended, so maybe this mental model isn't ideal.)

This interpretation better fits my mental model. Another way to detect this seems to be withUnsafeCurrentTask:

```swift
func doSomething() async {
  withUnsafeCurrentTask { unsafeTask in
    print("Task before continuation: \(unsafeTask!.hashValue)")
  }
  let _: Void = await withCheckedContinuation { continuation in
    withUnsafeCurrentTask { unsafeTask in
      print("Task inside continuation closure: \(unsafeTask!.hashValue)")
    }
    continuation.resume()
  }
}
```

This will print the same hash value for both unsafe tasks. (hashValue isn't ideal to verify these are identical, but close enough; another way could be to use withUnsafeCurrentTask to cancel the current task before creating the continuation, then check Task.isCancelled inside the continuation closure.)
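The cancellation-based check mentioned above could look roughly like this (a sketch; closureSeesSameTask is a hypothetical name). It cancels whatever task is current, then reports from inside the continuation closure whether Task.isCancelled is set, which it should be if the closure runs while the same task is still current. Because it cancels the current task, run it inside a throwaway task, for example `await Task { await closureSeesSameTask() }.value`.

```swift
func closureSeesSameTask() async -> Bool {
    withUnsafeCurrentTask { unsafeTask in
        // Mark the current task as cancelled (does nothing if there is no task).
        unsafeTask?.cancel()
    }
    return await withCheckedContinuation { (continuation: CheckedContinuation<Bool, Never>) in
        // Task.isCancelled reads whichever task is current while this closure runs.
        continuation.resume(returning: Task.isCancelled)
    }
}
```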

ole · June 5, 2022, 9:47am

Interesting, thanks for this information. I didn't consider how SE-0338 would change things, but knowing that with…Continuation needs special handling to preserve its semantics makes things clearer.

Let me expand on your post to verify my understanding:

Among other things, SE-0338 prescribes:

non-actor-isolated async functions never formally run on any actor's executor

I.e. if an actor calls a non-actor-isolated async func, the runtime must switch executors immediately. The executor hop may (not must) suspend the current task, e.g. if the target executor is busy.

There is a special exception for await with*Continuation (implemented via @_unsafeInheritExecutor, I think) that opts out of the new SE-0338 semantics and continues to execute these functions on the calling executor.

Correct?

John_McCall · June 5, 2022, 3:28pm

That’s correct, yes.

The semantic rule has always been that the task is not resumed until both resume is called and the closure has returned.

If resume is called synchronously, the task is not suspended at all.
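This rule can be checked with a small sketch that logs the order of events when resume is called synchronously inside the closure. synchronousResumeOrder is a hypothetical helper. Per the rule above, the log should show the closure finishing before any code after the await runs.

```swift
func synchronousResumeOrder() async -> [String] {
    var log: [String] = []
    let value: Int = await withCheckedContinuation { continuation in
        log.append("closure started")
        continuation.resume(returning: 42)
        // resume has been called, but the task cannot continue past the
        // await until this closure has returned as well.
        log.append("closure returning")
    }
    log.append("after await: \(value)")
    return log
}
```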

ole · June 5, 2022, 3:38pm

Thanks John!

gwendal.roue · October 7, 2022, 12:25pm

I think that in your sentence, "the task" is the task that calls withUnsafeContinuation. There is a second task that enters the equation, and it is the eventual task that is resumed by withUnsafeContinuation (as a suspension point).

This second task can start running before the closure has returned. For all we know, it may even start running on the same thread as the first task, before the closure starts. This uncertainty raises questions: should we assume that methods that call withUnsafeContinuation must support reentrancy? To be explicit: if the second resumed task immediately calls the same method that is still inside withUnsafeContinuation, waiting for its continuation, then this method must support reentrancy. And what if we don't want to (support reentrancy)? And which expectations will break once users are able to write their own executors?

Unsafe continuations are still not sufficiently documented and described. This is all very confusing, and a lot of people are writing buggy code thinking they are safe. Here is one bug around unsafe continuations that I found today in the Swift runtime: "`withUnsafeContinuation` can break actor isolation" (apple/swift issue #61485 on GitHub).

ibex10 · October 7, 2022, 12:41pm

Personally, I find the documentation for concurrency stuff quite inadequate and poorly written. Sometimes I feel that it is written for those who are familiar with the internals of the system, not for those who actually write code to solve real-world problems.

gwendal.roue · October 7, 2022, 3:07pm

Hmm. It is important to distinguish the language and its standard library from the ecosystem that can flourish, based on them.

Sure, the concurrency aspects of the language and the stdlib are insufficiently documented, leaving too much room for interpretation and nagging doubts. This will improve, I suppose, with time, and also with the discovered flaws that will help ideas settle. One does not create a robust concurrency system in a few months. It takes more time.

What you call "real world problems" are supposed to be solved in the ecosystem, not in the language+stdlib. This is not a fact or an opinion; it is my interpretation of what I see.

The ecosystem is slow to provide the tools we need. And recent progress is limited to the latest Xcode beta (looking at you, apple/swift-async-algorithms, with its dependencies on the future Clock), which means that we may never get back-deployable APIs when the tools we need ship. To me, this is the most frustrating part.

The ecosystem is slow to adapt to the new concurrency apis, and we're stuck with a marketing motto "fearless concurrency", which lacks building blocks.

asdf_bro · October 7, 2022, 3:16pm

The bug you found may be exclusive to macOS + Xcode 14.0. Check out my post from a few weeks ago.

gwendal.roue · October 7, 2022, 3:34pm

Maybe!

The closure argument of withUnsafeContinuation is documented to run "immediately", and it is not @escaping, so by any reasonable line of thought it has to run on the same thread as the caller.

Now, I don't know if I would assume that it runs on the same dispatch queue. We know that DispatchQueue.sync can reuse the same thread, but changes the outcome of dispatchPrecondition(condition: .onQueue(*)).

I think that the bug I found is more related to the task that is resumed from the withUnsafeContinuation suspension point (not the current task, but the task that has an opportunity to resume, which I call "the second task" in this post) - but this is just my interpretation.

All right, instead of spending more time trying to make sense of all of this, compiler bugs included, let's have a nice weekend

John_McCall · October 7, 2022, 9:29pm

I'm not totally sure how to respond to this. You're arguing that there's a lot of confusion about Swift concurrency, and that's very convincing, because your post also asserts a lot of stuff that's wrong. I think you've misunderstood some of the basic terms in use in Swift concurrency, so let me try to clear things up.

It sounds like you're using "task" as if it's basically a scheduling unit — the amount of code that would be indivisibly scheduled by a single call to, say, dispatch_async. In Swift concurrency, we use the term "job" or "partial task" for that. A "task" is an asynchronous thread, which is ultimately executed as a sequence of scheduling units; those units never execute concurrently with one another, and are in fact totally sequential, and their execution is formally well-ordered with respect to concurrency so that the events in one unit must all happen-before the events in the next.

Continuations are not an exception to this. withUnsafeContinuation does not return until both something has called resume on the continuation and the function passed to withUnsafeContinuation has returned, and that is also formally well-ordered with respect to concurrency. So it is absolutely not the case there are somehow two tasks involved with continuations or that the "second task" can start running before the closure has returned.
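The same ordering holds when the continuation is resumed from somewhere else before the closure returns. In this sketch (earlyResumeOrder is a hypothetical name), a detached task may call resume while the closure is still running, yet the caller's log always comes out in the same order, because the calling task does not continue until the closure has returned. The detached task only delivers the value; it never runs the caller's code.

```swift
func earlyResumeOrder() async -> [String] {
    var log: [String] = []
    let value: Int = await withUnsafeContinuation { continuation in
        log.append("closure started")
        // Another task may call resume before this closure returns...
        Task.detached {
            continuation.resume(returning: 7)
        }
        // ...but the calling task still does not continue until the closure
        // has returned, so this append always precedes the one below.
        log.append("closure returning")
    }
    log.append("after await: \(value)")
    return log
}
```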

Now, there is a bug in Xcode 14 when compiling for macOS because it ships with an old macOS SDK. That bug doesn't actually break any of the ordering properties above. It does, however, break Swift's data isolation guarantees because it causes withUnsafeContinuation, when called from an actor-isolated context, to send a non-Sendable function to a non-isolated executor and then call it, which is completely against the rules. And in fact, if you turn strict sendability checking on when compiling against that SDK, you will get a diagnostic about calling withUnsafeContinuation because it thinks that you're violating the rules (because withUnsafeContinuation doesn't properly inherit the execution context of its caller).

But that has nothing to do with the basic correctness of the order of execution on a task, and its only relation to the scheduling of partial tasks is that it incorrectly creates suspension points at the call to and return from withUnsafeContinuation, forcing more partial tasks to be scheduled. (There is not otherwise necessarily a suspension point on the return from withUnsafeContinuation — if the function passed in manages to call resume on the continuation before it returns, then withUnsafeContinuation will return without the task ever having been suspended.)

gwendal.roue · October 7, 2022, 9:57pm

You sound like you're trying to make me look stupid.

If you want to be useful, please chime in on this discussion. I'm trying to build a counting semaphore on top of Swift concurrency (after all, even Microsoft thinks that awaiting a semaphore is not a stupid idea), and we have a few questions that need a practical answer, the main one being: is the current implementation correct?
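For readers wondering what an awaitable counting semaphore might look like, here is a deliberately minimal sketch built on the same pattern as MeetingPoint. It is not the AsyncSemaphore implementation mentioned later in this thread; CountingSemaphore and its methods are hypothetical names, and the sketch ignores task cancellation. It relies on the guarantee described earlier in the thread: the closure passed to withCheckedContinuation runs without interleaving on the actor, so a waiter is always recorded before any signal() can look for it.

```swift
actor CountingSemaphore {
    // Hypothetical semaphore for illustration: value > 0 counts free permits,
    // value < 0 counts suspended waiters.
    private var value: Int
    private var waiters: [CheckedContinuation<Void, Never>] = []

    init(value: Int) {
        self.value = value
    }

    func wait() async {
        value -= 1
        if value >= 0 {
            return // A permit was available.
        }
        // No permit: record this task before anything else can run on the actor.
        await withCheckedContinuation { continuation in
            self.waiters.append(continuation)
        }
    }

    func signal() {
        value += 1
        if !waiters.isEmpty {
            // Hand the permit to the longest-waiting task (FIFO).
            waiters.removeFirst().resume()
        }
    }
}
```

Waiters are released in FIFO order; a production version would also need to decide what a cancelled waiter should do.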

John_McCall · October 7, 2022, 10:11pm

I am not trying to make you look stupid. I responded to the reply you made to me to try to clarify what I think you have misunderstood, which is important not just for your benefit, but for the benefit of other people who might find this thread.

I'll go look at that thread. If you thought that that was the right way for me to be useful, though, you might have linked it at some point above instead of just popping off.

gwendal.roue · October 8, 2022, 8:03am

Thank you, John, for shedding light on the dangers of the Xcode 14 + macOS combination, and for helping clarify the behavior of unsafe continuations in the context of AsyncSemaphore.

Look, we now have a reliable semaphore that we can await. It back-deploys as far as it can, and it even works on the unstable Xcode 14 + macOS combo. Isn't that good news? Yes, it is. The actor reentrancy problem reported by some people is solved, for example. I wish more people knew that. Thank you!

ole · October 10, 2022, 2:04pm

@John_McCall If you don’t mind my asking, was this bug known inside Apple before the final release of Xcode 14.0? Or were Apple folks surprised because nobody thought through the implications of pairing the Swift 5.7 compiler with the Swift 5.6 standard library's module interface? Or was it known but not considered a serious issue? I don't mean to blame anyone; as evidenced by this thread, I was aware of the special behavior of withUnsafeContinuation and didn't think it through, either.

I'm asking because the compiler generating code that breaks concurrency invariants is a serious problem, and I'm surprised by the lack of communication from Apple about it:

- I can't find any mention of it in the Xcode 14.0 or Xcode 14.0.1 release notes. In my opinion, something like this warrants a big red warning at the top of the release notes: "Don't use Xcode 14.0 to build macOS targets that use concurrency!"
- The bug is hard or impossible for third-party developers to catch during the SDK beta phase, because Xcode betas ship with the beta macOS SDK, only to revert to last year's SDK for the final release. This is another argument for Apple to communicate it proactively.
- The thread that first (AFAIK) mentioned this problem on this forum, Concurrency is broken in Xcode 14 for macOS (2022-09-14), received little engagement or acknowledgement of the issue. (I know that no one can read everything, so again, no blame!)