Incremental migration to Structured Concurrency

Semaphores aren't safe. NSLock can be used safely, but it requires caution: use it only in synchronous code, never across an await.


Hmm, we do also have unstructured Tasks, so I don't think we can entirely depend on structured concurrency. That said, if we consider the ultimate goal of having everything isolated to some actor/global actor, then it would be true. It would mean every access to every piece of mutable state is dispatched by some serial executor, so we can simply say that every task in the queue depends on all tasks before it.

Every access to mutable state would need to await and hop to an appropriate executor rather than block/wait for a lock.
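As a sketch of what that looks like in practice (the `Counter` actor here is a made-up example): every mutation goes through the actor's serial executor, so callers suspend and hop rather than block on a lock.

```swift
// Minimal sketch, assuming a hypothetical Counter actor: all access to
// `count` is serialized by the actor's executor, so callers `await`
// (suspend and hop) rather than block on a lock.
actor Counter {
    private var count = 0

    func increment() -> Int {
        count += 1
        return count
    }
}

let counter = Counter()
// 100 concurrent child tasks; each access hops to the actor's executor.
await withTaskGroup(of: Void.self) { group in
    for _ in 0..<100 {
        group.addTask { _ = await counter.increment() }
    }
}
print(await counter.increment())  // 101: all prior increments are serialized
```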

It's interesting. I'm not entirely sure what I think about it, especially in terms of performance. Hopeful but a little sceptical, perhaps.

I know all this, thanks. I'm still trying to get a complete explanation of what specific rules I must follow to use the “use with caution” primitives safely in the presence of Swift async code. These primitives always needed to be used with caution (i.e. according to their documentation) but presumably there are special rules for correct interoperation with Swift async. These details are important if I'm going to use Swift async with an existing body of code that uses threading primitives.

Another example: one thread could lock a mutex, then pass the lock to another thread (by move) to be unlocked.

pthread mutexes very unfortunately allow this, but if you do it, the behaviour is actually undefined.

As I reckon things, if the behavior is undefined, they don't allow it. The compiler doesn't enforce the rules, I get that, but the rules are spelled out and, at least with respect to this rule, once you know whether an async task can thread-hop, you know everything needed in order to ensure that it's followed.

I had thought lock-passing among threads was allowed by C++ but now I see §32.5.3.3.2 ¶3 forbids it.

Lastly, since a lock can be implemented in terms of a binary semaphore, it doesn't seem to be an intrinsic property of the semaphore that makes it problematic.

Yes, you're right, but the only difference here is that you are using a single bit which is flipped on and off instead of having, say, a pthread_t worth of information for additional bookkeeping on who the locking thread is.

My point here is that you said something like “it's just a matter of which primitives you use,” and also something like “locks can interoperate correctly with Swift async (if used with caution) but semaphores can't.” If I can implement something with the exact semantics of a lock using a binary semaphore, and that lock can be used correctly, then the semaphore in that lock must be interoperating correctly too. I must have misunderstood one of your statements.

Regarding locks, the caution is as follows: Locks intrinsically rely on thread locality - you need to unlock the lock on the same thread which took the lock. You can't hold a lock across an await because there is no guarantee that the same thread will pick up the continuation.
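A sketch of that caution (the `Cache` type and `computeValue` are hypothetical names): the lock is only ever taken and released inside a single synchronous critical section, and any await happens outside it.

```swift
import Foundation

final class Cache: @unchecked Sendable {
    private let lock = NSLock()
    private var storage: [String: Int] = [:]

    // Safe: lock and unlock happen on the same thread, inside one
    // synchronous call - no suspension point can occur while it's held.
    func set(_ value: Int, for key: String) {
        lock.lock()
        defer { lock.unlock() }
        storage[key] = value
    }

    func get(_ key: String) -> Int? {
        lock.lock()
        defer { lock.unlock() }
        return storage[key]
    }
}

func computeValue() async -> Int { 42 }  // stands in for real async work

func update(_ cache: Cache) async {
    // Unsafe pattern (don't do this): lock() ... await ... unlock() -
    // the continuation may resume on a different thread than the locker.
    // Safe pattern: finish the async work first, then lock synchronously.
    let value = await computeValue()
    cache.set(value, for: "answer")
}

let cache = Cache()
await update(cache)
print(cache.get("answer") ?? -1)  // 42
```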

OK, this is good, thanks. Presumably this doesn't apply to code known to be in the @MainActor, though, since it is locked to a single thread?

Using thread local storage is another example of something that is not safe in Swift concurrency since you don't know which threads will pick up your task and the various partial tasks as it suspends and resumes.
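By contrast, Swift's task-local values travel with the task across suspensions, which makes them the concurrency-safe analogue of thread-local storage. A minimal sketch (`RequestContext` is an illustrative name):

```swift
// Task-local values follow the task, not the thread, so they survive
// suspension points even if the task resumes on a different thread.
enum RequestContext {
    @TaskLocal static var requestID: String = "none"
}

func handler() async -> String {
    await Task.yield()  // may resume on another thread from the pool
    return RequestContext.requestID  // still visible after resumption
}

let id = await RequestContext.$requestID.withValue("req-123") {
    await handler()
}
print(id)  // req-123
```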

OK, do all the special rules for interop with Swift async have to do with thread-hopping, or are there others?

Blocking a thread on a primitive can be safe if you can guarantee that the task which will unblock that primitive and your thread has already run or is concurrently running.

This sounds like it's going to be useful information, if I can get a few more questions answered (sorry):

  • Is this true even if the task that will unblock the primitive is itself going to block?
  • When you say "has already run" I suppose you mean that the async function that will unblock the primitive has started, is suspended, and is guaranteed to unblock the primitive before exiting?
  • Aside from this rule, and the caveats about thread-hopping, are there any others?

Thanks again!


You're right, but the dependency of a task on an unstructured task is also known - just at the point of awaiting it. The distinction is that with structured concurrency you know the dependency up front and there is clear scoping, whereas with an unstructured task the dependency is only known at the point you await it. But it is known to the runtime.


I was referring to the lock APIs we have today - pthread_mutexes, os_unfair_locks, NSLock, etc - and in general when I said "using a lock", I meant a lock implementation that is typically used by clients in the following manner in synchronous code:

Thread 1:
lock()
<critical section>
unlock()

while with a semaphore, I was referring to the likes of DispatchSemaphore or DispatchGroup which typically are used in client code in the following manner:

Thread 1:
<do some work>
if (!condition) 
   semaphore.wait() 

Thread 2:
<do other work to satisfy condition>
condition = true; 
semaphore.signal()

A lock can be implemented with a semaphore, but that's the internal implementation of the lock and not of interest to clients who are using the lock in async code. While such an internal implementation of a lock allows a thread other than the one which called lock() to call unlock(), I consider that to be (a) undefined behavior and (b) not representative of 99.9% of how people use locks or mutexes as clients of these APIs.
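For completeness, here is a hedged sketch of such a semaphore-backed lock (`SemaphoreLock` is a made-up name): used strictly in the lock/critical-section/unlock pattern, with each critical section entered and exited on the same thread, the semaphore underneath is invisible to clients.

```swift
import Foundation

// A lock built on a binary semaphore: value 1 means "unlocked".
final class SemaphoreLock: @unchecked Sendable {
    private let semaphore = DispatchSemaphore(value: 1)

    func lock()   { semaphore.wait() }
    func unlock() { semaphore.signal() }
}

let lock = SemaphoreLock()
var counter = 0

// Four plain threads contend for the lock; each critical section is
// entered and exited on the same thread, matching the usual lock pattern.
let threads = (0..<4).map { _ in
    Thread {
        for _ in 0..<1000 {
            lock.lock()
            counter += 1
            lock.unlock()
        }
    }
}
threads.forEach { $0.start() }
// Crude join: poll until every thread has finished its block.
while threads.contains(where: { !$0.isFinished }) {
    Thread.sleep(forTimeInterval: 0.001)
}
print(counter)  // 4000
```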

OK, this is good, thanks. Presumably this doesn't apply to code known to be in the @MainActor , though, since it is locked to a single thread?

The main actor is tied to the main dispatch queue. The main queue is tied to the main thread, but that tie can be broken if your application calls dispatch_main(). dispatch_main() does a bit of bookkeeping and exits the main thread, at which point the main queue is no longer bound to a main thread. It will then be serviced on demand by a thread from dispatch's worker thread pool whenever there is work on the main queue.

So you could try to make the case that you have some freedom to hold locks across await if your code executes on the @MainActor but I think that is fragile and requires additional knowledge about whether or not the application has called dispatch_main(). Relying on auxiliary knowledge like this to use locks in async code on the MainActor, is not how I'd recommend someone write code with async.

OK, do all the special rules for interop with Swift async have to do with thread-hopping, or are there others?

It's about thread-hopping and also about using primitives that assume a minimum number of threads. A semaphore assumes at least 2 threads being vended to you - the thread which will wait and another one which will signal. A lock doesn't have this requirement - it is perfectly possible, albeit redundant - to use a lock for code that is entirely single threaded. This ties back into thinking about the guarantee of forward progress as being able to finish the workload on a single thread if that's what the runtime decides it can vend to you.
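To illustrate the difference with a small sketch: a lock makes forward progress with only one thread, while a semaphore wait needs some other thread to signal it.

```swift
import Foundation

// A single thread can take and release a lock repeatedly - no second
// thread is ever required for forward progress.
let lock = NSLock()
var total = 0
for i in 1...10 {
    lock.lock()
    total += i
    lock.unlock()
}
print(total)  // 55

// A semaphore wait, in contrast, blocks until *another* thread signals;
// with only one thread, the wait() below would never return.
let semaphore = DispatchSemaphore(value: 0)
Thread.detachNewThread {
    semaphore.signal()
}
semaphore.wait()
print("signaled")
```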

  • Is this true even if the task that will unblock the primitive is itself going to block?

How is that possible? You have a thread running a task; if the task is using a primitive that causes it to block, you are now blocking the thread as well. How can you guarantee that the task will unblock itself if the thread that is executing it is blocked?

  • When you say "has already run" I suppose you mean that the async function that will unblock the primitive has started, is suspended, and is guaranteed to unblock the primitive before exiting?

I meant that it has already unblocked the primitive and so your thread doesn't have to block on the primitive at all when it is trying to acquire it.

If the Task that will unblock the primitive is suspended and hasn't yet unblocked the primitive, once the Task becomes runnable, there is no guarantee that you will get an additional thread to execute that task - the cooperative pool may be at its limit and it may not give you another thread.

This is a very fragile guarantee to be able to uphold as a developer because you are now relying on the scheduling order between tasks, and that can change.

  • Aside from this rule, and the caveats about thread-hopping, are there any others?

The main thing I'd advise is to make sure your workload can complete with a single thread, using the environment variable. If you are able to run to completion reliably in that environment, you are safe and will be able to handle multiple threads running your workload.

The Swift concurrency runtime reserves the right to make different scheduling decisions, including optimizing the size of the thread pool based on global information on what is happening in the system. Therefore relying on specific scheduling order between tasks and threads is discouraged.


// replied on wrong thread :wink:

Thanks for your reply, Rokhini! Sorry it's been so long - I went on vacation and am only now getting back to my stack of discussions…

(emphasis mine)

OK, I don't mean to pick nits here, but I am still trying to nail down the truth. From your language above, it seems like it's not just a matter of which primitives you use, but how you use them. However, I also understand that a semaphore does not convey task dependency information, and that in a thread pool with task-stealing, avoiding deadlock can depend on using dependency information to ensure the right task is stolen. So which is it? If I implement a (correct) lock with a semaphore and use it according to the pattern that you recommend for locks, can I deadlock?

While such an internal implementation of a lock allows for a thread that is not the one which called lock() to call unlock()

Can we please pretend I never brought that up? I was wrong about the rules for locks and I'm not actually interested in the case of passing a lock across threads. I haven't been focused on it for several messages now, except to say that it could theoretically make sense, if it were allowed (but it isn't!)

The main actor is tied to the main dispatch queue. The main queue is tied to the main thread but that tie can be broken if your application calls dispatch_main() .

Sorry, do you mean DispatchQueue.dispatchMain? Sadly, I am not well versed in GCD and can't figure out what that documentation says it's doing. I don't know what "park" or "wait" mean in this context. There's nothing that says this function must be called on the main thread, so “parks the main thread” (presumably?) doesn't mean “blocks the current thread.” But you generally can't force a thread that's not the current thread to stop or pause, so it's hard to know what that means… When documentation says a call “waits for” something, that generally does mean the current thread is blocked. So I'm gonna guess that dispatchMain is callable only from the main thread (despite that not being documented), and causes that thread to block until some other thread submits new blocks to the main queue.

dispatch_main() does a bit of bookkeeping and exits the main thread

When you say this call "exits the main thread," do you mean the main thread actually exits, or do you just mean that it blocks?

at which point, the main queue is no longer thread bound to a main thread. It will be serviced by a thread on demand from the dispatch's worker thread pool when there is work on the main queue.

So you could try to make the case that you have some freedom to hold locks across await if your code executes on the @MainActor but I think that is fragile… Relying on auxiliary knowledge like this to use locks in async code on the MainActor, is not how I'd recommend someone write code with async.

Agreed.

OK, do all the special rules for interop with Swift async have to do with thread-hopping, or are there others?

It's about thread-hopping and also about using primitives that assume a minimum number of threads. A semaphore assumes at least 2 threads being vended to you - the thread which will wait and another one which will signal.
A lock doesn't have this requirement - it is perfectly possible, albeit redundant - to use a lock for code that is entirely single threaded. This ties back into thinking about the guarantee of forward progress as being able to finish the workload on a single thread if that's what the runtime decides it can vend to you.

This is great.

  • Is this true even if the task that will unblock the primitive is itself going to block?

How is that possible? You have a thread running a task; if the task is using a primitive that causes it to block, you are now blocking the thread as well. How can you guarantee that the task will unblock itself if the thread that is executing it is blocked?

Maybe I'm missing something, but this seems straightforward. Let me try to reconstruct the context. You wrote:

Blocking a thread on a primitive can be safe if you can guarantee that the task which will unblock that primitive and your thread has already run or is concurrently running.

and I asked

Is this true even if the task that will unblock the primitive is itself going to block?

It's possible this way:

  • Thread T0 is running a task S0.
  • S0 launches thread T1 which runs task S1
  • S0 blocks on a result from S1
  • S1 launches thread T2 which runs task S2
  • S1 blocks waiting on a result from S2

Here, S1 is the task that will eventually unblock the primitive blocking S0. It blocks in the last step but is guaranteed to be unblocked when S2 completes.
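A sketch of that scenario with dedicated OS threads, which is what makes it safe here: unlike the cooperative pool, each blocked wait has its own thread, and the thread that will signal it is guaranteed to exist.

```swift
import Foundation

let s1Done = DispatchSemaphore(value: 0)  // S0 blocks on this
let s2Done = DispatchSemaphore(value: 0)  // S1 blocks on this

Thread.detachNewThread {              // T1 running S1
    Thread.detachNewThread {          // T2 running S2
        s2Done.signal()               // S2 completes its work
    }
    s2Done.wait()                     // S1 blocks, but T2 is a real thread
    s1Done.signal()                   // S1 unblocks S0
}

s1Done.wait()                         // S0 (on T0) blocks on S1's result
print("S0 unblocked")
```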

  • When you say "has already run" I suppose you mean that the async function that will unblock the primitive has started, is suspended, and is guaranteed to unblock the primitive before exiting?

I meant that it has already unblocked the primitive and so your thread doesn't have to block on the primitive at all when it is trying to acquire it.

If the Task that will unblock the primitive is suspended and hasn't yet unblocked the primitive, once the Task becomes runnable, there is no guarantee that you will get an additional thread to execute that task - the cooperative pool may be at its limit and it may not give you another thread.

Got it! So “is concurrently running” means “is running on a different thread,” which is usually pretty hard to guarantee.

The main thing I'd advise is to make sure your workload can complete with a single thread, using the environment variable. If you are able to run to completion reliably in that environment, you are safe and will be able to handle multiple threads running your workload.

So that's what the environment variable does? Very revealing.

Thanks, this is super-helpful!
