Why are semaphores, conditions, and read/write locks unsafe in Swift Concurrency?

I've recently rewatched the Swift concurrency: Behind the scenes session from WWDC21 and came back across the section on which synchronization primitives are safe and unsafe to use in Swift Concurrency.

It states that (apart from async constructs like actors) os_unfair_lock and NSLock - and presumably by extension pthread_mutex_t which underpins NSLock - are safe to use in synchronous code but require caution.

On the other hand, it specifically calls out semaphores, conditions (pthread_cond, NSCondition), and read/write locks (pthread_rwlock_t) as unsafe primitives to use, but doesn't provide a satisfying explanation as to why these primitives are unsafe. Specifically, the video states:

... primitives like semaphores and condition variables are unsafe to use with Swift concurrency. This is because they hide dependency information from the Swift runtime, but introduce a dependency in execution in your code. Since the runtime is unaware of this dependency, it cannot make the right scheduling decisions and resolve them.

Can someone provide additional context as to what exactly this means practically? Is code that uses these primitives at risk of hanging or deadlocking? Is it an issue of priority inversion, as alluded to here? Could it lead to data corruption or crashes?

Mainly I'm just trying to understand the underlying (low-level) rationale as to why these primitives are unsafe and the consequences of their use in asynchronous code (assuming that locks aren't held across await boundaries) or synchronous code invoked by asynchronous code.

1 Like

There is the priority inversion issue, but there's also a thread-switching issue: Many locks require that they be released on the same thread where they were acquired. Swift Concurrency doesn't know about threads*, so if you acquire a lock, await something, and then attempt to release the lock, you could encounter an error due to the await silently resuming on a different thread.

Swift concurrency also has a much smaller thread pool than, say, Dispatch -- one thread per logical core, plus the main thread. You could easily block all tasks if only a handful of them are all waiting on the same lock, since the concurrency runtime will never pre-empt one job to run another. This is what is meant by "tasks must make forward progress" -- it should be assumed that a task left to its own devices will eventually either finish or suspend, and never be blocked by something outside of itself (in this case the lock).

*The main thread and the MainActor have a special relationship, and you could write a custom actor that's bound to a specific thread, but in general it's not safe to make any assumptions about specific threads in the context of Swift concurrency.
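To make the thread-switching hazard concrete, here's a sketch (the `update` function and the `fetch` placeholder are hypothetical) of the shape to avoid and a safe, scoped alternative:

```swift
import Foundation

let lock = NSLock()
var shared = 0

// Unsafe shape (don't do this): the resumption after `await` may land on a
// different thread than the one that acquired the lock, and NSLock must be
// unlocked from the thread that locked it.
//
//   lock.lock()
//   let value = await fetch()   // suspension point: thread may change here
//   shared = value
//   lock.unlock()               // possibly on the wrong thread
//
// Safe shape: keep the critical section synchronous and scoped, and never
// suspend while the lock is held.
func update(_ newValue: Int) {
    lock.lock()
    defer { lock.unlock() }
    shared = newValue
}

update(42)
print(shared)  // 42
```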

5 Likes

"Unsafe" in the video means that they are not subject to Swift's guarantees about data-race safety.

It doesn't mean they don't work as advertised. It means you'd be responsible for integrating their behavior with the Swift compiler's verification of data-race safety.

You can create components that are safe using other synchronization primitives and then mark them Sendable (possibly unchecked). Essentially, you tell the compiler what to trust in this regard.

The usual cautions apply: those primitives might rely on invariants not guaranteed in Swift (e.g., the thread hopping mentioned above), and there are many ways to use locks and other synchronization primitives that don't afford the guarantees that you actually want (lock granularity, ordering, ...).
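As a concrete (hedged) sketch of that idea: a type whose state is entirely guarded by a lock, vouched for with `@unchecked Sendable` because the compiler can't verify the invariant itself:

```swift
import Foundation

// The class is safe because every access to `count` goes through the lock,
// but the compiler can't see that, so we take responsibility for it with
// @unchecked Sendable.
final class Counter: @unchecked Sendable {
    private let lock = NSLock()
    private var count = 0

    func increment() -> Int {
        lock.lock()
        defer { lock.unlock() }
        count += 1
        return count
    }
}

let counter = Counter()
DispatchQueue.concurrentPerform(iterations: 100) { _ in
    _ = counter.increment()
}
print(counter.increment())  // 101
```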

1 Like

Swift Concurrency's scheduling algorithm assumes that threads will never be blocked on "future work" — that the program will always make progress even if nothing that's currently waiting for a thread starts running until indefinitely "later". This is an important assumption because otherwise the system can be required to proactively create extra threads under heavy load conditions, which is exactly when you don't want to be creating extra threads. Unfortunately, this assumption is broken by several common use patterns of semaphores and condition variables. For example, if you create a task that will signal a condition variable, and then you wait on that condition variable on the current thread, you are blocking on future work: the current thread will not make progress until the system finds a different thread to run the task. That is why those are dangerous to use in Swift Concurrency.

If you have a use of semaphores / condition variables that avoids blocking on future work (e.g. the condition will be signaled by a dedicated thread that is not scheduled by Swift Concurrency), that is not unsafe in this same way.
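A minimal sketch of that distinction (the unsafe variant is shown only in comments):

```swift
import Foundation

let sem = DispatchSemaphore(value: 0)
var message = ""

// Blocking on future work (unsafe in Swift Concurrency):
//
//   Task { sem.signal() }  // a task that needs a pool thread to run
//   sem.wait()             // blocks the current thread until it does
//
// Waiting on a dedicated thread that the Swift Concurrency runtime does
// not schedule avoids that particular hazard:
Thread.detachNewThread {
    message = "signaled"
    sem.signal()
}
sem.wait()
print(message)  // signaled
```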

Semaphores and condition variables are also inherently prone to priority inversion. This can be a serious performance problem, but it is not specifically worse in Swift Concurrency than it is in any other environment.

Reader-writer locks aren't inherently prone to priority inversion, but most implementations do not protect against it correctly. This is because it's very difficult (and much more expensive) to write one that does the necessary priority elevation. Moreover, optimizing reader-writer locks is usually not an implementation priority because high-performance code with heavy reader/writer skew generally aims to avoid reader-writer locks anyway (in favor of lock-free algorithms). Reader-writer locks are not otherwise problematic in Swift Concurrency, except that (as is usually the case for locks) you must take care to unlock them from the same thread that locked them — which is to say, don't await while holding a lock.

27 Likes

Respectfully, this is not a “data-race safety” issue: It is a scheduling concern. As they say, the fundamental issue is that these unsafe primitives “hide dependency information” between tasks. Also, we must fulfill our contract with the Swift concurrency system not to impede forward progress in the cooperative thread pool. (And as bbrk24 said, there is also the issue that Swift concurrency makes few guarantees about what thread is used after a suspension point, so any thread-dependent API will be incompatible with Swift concurrency across suspension points, i.e., an await.)

The threads in the cooperative thread pool are (deliberately) quite limited and the whole Swift concurrency system is predicated on its ability to make clever, optimized scheduling decisions about the dependencies between tasks. Anything that introduces outside dependencies (about which Swift concurrency is unable to reason), or, worse, blocks a thread from the cooperative thread pool, can introduce scheduling problems.

It is not a race safety concern, but a scheduling one.

4 Likes

It sounds, then, that if I were to create a type -- similar to Apple's OSAllocatedUnfairLock or to the new Mutex type -- but using a pthread_rwlock_t for its underlying synchronization mechanism, it wouldn't create larger issues apart from the potential priority inversion, depending on the underlying implementation of the lock itself?

If I am understanding you correctly, if I am waiting on a semaphore or condition variable that will be signaled from a thread that is not managed by the Swift Concurrency runtime, then this doesn't inherently break the runtime contract of forward progress? Of course there are other pitfalls like tying up one of the pool's threads or priority inversion, but apart from those there aren't any issues in regards to the concurrency runtime?

More broadly, it sounds like "unsafe" in this context is mainly referring to breaking the runtime contract of forward progress that underpins the whole system. Am I understanding that correctly? Since the runtime isn't able to just spawn additional threads, the risk is that if enough of these "unsafe" primitives are used, the runtime could end up in a deadlock?

1 Like

Am I to understand then that the main reason that os_unfair_lock and pthread_mutex_t are considered "safe" (but require caution) but pthread_rwlock_t is considered "unsafe" is due to the potential of a priority inversion?

Am I correct in interpreting this as meaning that the Swift Concurrency runtime doesn't (or can't) know about the dependency that these primitives (specifically semaphores and condition variables) create between two threads, leading to the possibility of a priority inversion?

It sounds like the runtime scheduler would be able to know about the priorities and dependencies between two async tasks and be able to schedule them appropriately, whereas if a task awaits a semaphore or condition signaled by another task/thread then it wouldn't know about this relationship and could potentially schedule things in an unperformant manner (in addition to the issue of blocking one of the runtime's threads)?

If that is the case, don't forget to initialize it inside an UnsafeMutablePointer yourself instead of implicitly wrapping it every time you pass it to the lock/unlock/etc. functions:

The pointer created through implicit bridging of an instance or of an array’s elements is only valid during the execution of the called function. Escaping the pointer to use after the execution of the function is undefined behavior.
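For what it's worth, here's a minimal sketch of such a wrapper (the `RWLockBox` name and its API are made up for illustration), allocating the `pthread_rwlock_t` at a stable address as the quote above requires:

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

// A read/write-lock-guarded box. The lock is manually allocated so it has
// a stable address for the pthread functions (no implicit bridging).
final class RWLockBox<Value>: @unchecked Sendable {
    private let lockPtr: UnsafeMutablePointer<pthread_rwlock_t>
    private var value: Value

    init(_ initial: Value) {
        value = initial
        lockPtr = .allocate(capacity: 1)
        pthread_rwlock_init(lockPtr, nil)
    }

    deinit {
        pthread_rwlock_destroy(lockPtr)
        lockPtr.deallocate()
    }

    // Shared (reader) access.
    func read<R>(_ body: (Value) throws -> R) rethrows -> R {
        pthread_rwlock_rdlock(lockPtr)
        defer { pthread_rwlock_unlock(lockPtr) }
        return try body(value)
    }

    // Exclusive (writer) access.
    func write<R>(_ body: (inout Value) throws -> R) rethrows -> R {
        pthread_rwlock_wrlock(lockPtr)
        defer { pthread_rwlock_unlock(lockPtr) }
        return try body(&value)
    }
}

let box = RWLockBox(0)
box.write { $0 = 7 }
print(box.read { $0 })  // 7
```

As discussed above, the critical-section closures are synchronous, so the lock can't accidentally be held across an await.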

1 Like

Yup! :slight_smile:

Inspecting the Swift interface of Apple's os module, OSAllocatedUnfairLock actually uses ManagedBuffer under the hood to manage the pointers to the state and lock, which is what I use in my own implementation.

Right.

Right. Swift Concurrency's default task executor has an overall progress guarantee that relies on the jobs it executes making their own progress guarantees, which must not recursively depend on the default task executor making progress.

That's correct, yes. Normal use of pthread_rwlock_t doesn't interfere with progress (other than by priority inversion) because threads waiting on the lock are only ever waiting for other threads to make progress. It's a little unfortunate that the presentation slide puts them into the same category of unsafe primitives — the spoken text only talks about semaphores and condition variables, but the slide clearly lumps them all together.

Now, priority inversion can be a serious obstacle to progress, depending on the kernel's thread scheduling algorithm and exactly how overloaded the system is, but that's a somewhat separable concern.

Yes. In fact, it can be as little as one improper use — on Darwin, Dispatch assumes that threads running on behalf of Swift Concurrency never block in this way, so if they violate the rules, the program can actually deadlock with just a single scheduled thread.

I should note here that using semaphores this way can create progress problems even without Swift Concurrency's stronger assumptions. Dispatch will spawn extra threads to try to make progress if the jobs submitted to e.g. dispatch_async seem to have stalled out, but it does have a hard upper limit, and that hard upper limit means it can still deadlock if all of the threads are blocked on future work. It's just more forgiving because it can't happen from isolated semaphore use.

6 Likes

The unavoidable priority inversion problem with any API that looks like a semaphore — as opposed to any API that looks like a lock — is that a thread waiting on the semaphore has no idea what thread is going to signal it.
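A tiny illustration of that shape (the QoS values are just for illustration): the waiter blocks on a semaphore that records no owner, so the kernel has nothing to boost:

```swift
import Foundation

let sem = DispatchSemaphore(value: 0)
var value = 0

// Low-QoS work will eventually signal...
DispatchQueue.global(qos: .utility).async {
    value = 1
    sem.signal()
}

// ...while a higher-priority waiter blocks. Unlike a lock, the semaphore
// has no notion of an owning thread, so no priority donation to the
// signaler is possible.
sem.wait()
print(value)  // 1
```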

2 Likes

Yep, that’s part of it: It can’t reason correctly about task priorities and dependencies between tasks when using these unsafe primitives. As you quoted in your original question (emphasis added):

Another concern (which you raise below) is that one can easily block one of the very limited number of threads in the cooperative thread pool.

Yes. And it's not just “unperformant” scheduling; it can introduce blocking (or even deadlock) risks, too. As that video continues:

Regarding the safe use of locks, that video clarifies the proper usage:

(Needless to say, we would generally use OSAllocatedUnfairLock rather than using os_unfair_lock, directly.)

But, bottom line (and my apologies for pointing out what I suspect you already know), we only use lock.withLock {…} or mutex.withLock {…} around “tight, well-known critical sections”, but not across a suspension point or between tasks. And we do so with great care (and when we know that it will be very quick). And the beauty of the withLock pattern is that not only does it ensure that the locks and unlocks are balanced, but it also eliminates a potential source of misuse, namely, across asynchronous boundaries.
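For illustration, here's what that scoping buys you, sketched with a local helper (`withScopedLock` is a made-up name, in case the Foundation `withLock` extension isn't available on a given toolchain). Because the closure is synchronous, the compiler won't let you `await` inside it:

```swift
import Foundation

extension NSLock {
    // Scoped locking: the unlock is balanced on every path and happens on
    // the same thread that locked, because the closure cannot suspend.
    func withScopedLock<R>(_ body: () throws -> R) rethrows -> R {
        lock()
        defer { unlock() }
        return try body()
    }
}

let lock = NSLock()
var total = 0
lock.withScopedLock { total += 7 }
// lock.withScopedLock { await somethingAsync() }  // would not compile
print(total)  // 7
```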

2 Likes

Something else that’s worth noting about priority inversion is that in app programming contexts it’s less likely to be a serious issue. It’s not unreasonable to know that a particular lock in an app is never held by truly low priority threads for example.

Whereas in a daemon programming context the default assumption is that everything is extremely low priority and then some things suddenly get boosted to high priority when the foreground app sends a message. It’s surprisingly easy to get stuck so badly that processes start getting killed due to being unresponsive for too long. I’ve even seen OSes fail to boot due to this back when we still used spinlocks in userspace.

I would still encourage everyone to care about priority inversion; it’ll especially pay off when the device is overloaded or overheated. But hopefully this perspective helps in understanding why we’re this careful about the topic.

7 Likes

Apart from John's reply, I'd like to add that you can easily introduce other critical problems like deadlocks if you are not careful. For example, if this not-in-runtime thread somehow participates in a waiting chain and eventually waits for a signal coming from one of the threads managed by the Swift concurrency runtime, then there can be a deadlock. This is especially difficult to avoid if this thread of yours can run custom code not controlled by you.

3 Likes

Thank you all for your replies and explanations; this has been extremely informative in terms of these primitive types and the respective consequences of their use!

It seems a common motif in this thread has been priority inversions and how they can come about. I would like to get a little more information about this issue, but first want to confirm my understanding:

At the most basic level a priority inversion is a situation where a thread with a high priority is waiting for a thread with a lower priority to finish -- or in the case of a semaphore or condition variable waiting for "permission" to continue by means of a signal -- whether or not the actual thread that's being waited for is known.

It also seems that a condition for classifying this as an "issue" is that the system can't elevate the priority of the lower-priority thread to match the priority of the higher-priority thread, but I'm not particularly confident about this point.

Is this understanding correct or is there more nuance to this issue?

Now, thread priority and how the system schedules which threads should be running at any given time seems like a broader conversation outside of the purview of this specific thread, but if there are good resources or other threads that I could be pointed to, I would greatly appreciate that. That being said, I want to understand a bit more about the seriousness of the issue of priority inversion.

For a bit more background, I'm effectively leading the charge for my company in our transition to Swift 6. I'm working on an internal library with a number of higher-level "primitives" that are meant for consumption within our application for aiding in synchronization and creation of Sendable types across our codebase. Also of note, we've come across the "reentrancy problem" in our own code and have sought similar solutions, but practically all suggestions in the linked thread warn against this kind of pattern due to the potential of priority inversion.

In terms of low-level system programming (e.g. daemons, kernel extensions, OS, etc.) I can certainly see priority inversion being a huge issue (as mentioned by @David_Smith above), however, in practical "Application" programming contexts how big of an issue is this really?

I am trying to balance between creating types and APIs that abstract enough of the nuances of Swift Concurrency to make it simpler for those that aren't as knowledgeable and ensuring that all of the types/APIs that I create are extremely performant and don't introduce these kinds of issues under the hood. Any guidance or tips for navigating this would be greatly appreciated.

It depends a lot on what you’re doing and how you do it, but if you aren’t extensively using prioritization yourself I would expect the main symptom to be degraded UI latency when the device is under thermal pressure or multitasking a lot. Think situations like “plugged into a fast charger in a hot car while streaming music and doing turn by turn directions with poor cellular reception”. (My thinking here is that even if you aren’t using a wide range of priorities there’s still main thread vs everything else)

The issues around not synchronously waiting for future work are the more potentially serious ones.

And of course none of this is specific to Swift, you’d have all the same concerns in a pure Objective-C app, just with variations on how they presented themselves (I’ve certainly gotten enough bug reports from ObjC apps deadlocking due to blocking too many libdispatch threads to the point that it refused to spawn any more).

5 Likes

I would be cautious about burying any low-level synchronization primitives inside a library that lots of developers will be using. In my mind, the choice to descend into these lower-level techniques is an edge case, always one that should be a very conscious choice, perhaps not something casually hidden behind some library’s API. Before you do introduce these sorts of low-level primitives, I would ask myself how likely a well-intentioned, newer developer could innocently misuse it and introduce problems.

Personally, rather than worrying about the abstract problem of “why not use these low-level primitives?”, I might suggest a stretch objective of writing a library that is entirely free of them. And if you have a specific problem for which you are inclined to fall back to our old, familiar, low-level synchronization primitives, I would pose that specific problem to a community like this, to see if others know of a more natural async-await solution. Discussing specific use-cases is likely going to be more fruitful than an abstract conversation, like this.

But, don’t get me wrong: There are edge-cases where we need a different synchronization mechanism. (See the discussion in SE-0433 – Mutex.) But use them sparingly, only where you must.

Yes, that is precisely the sort of scenario where these primitives would be a catastrophically bad approach. There are tons of natural Swift concurrency solutions without any need to fall back to legacy synchronization techniques.

IMHO, simple priority inversion issues tend to be secondary. The more profound risk is deadlocks. We must avoid hidden dependencies between Swift concurrency tasks.

1 Like

Handling reentrancy is rather simple when you're aware of it - you just need to implement some kind of serial (queued) execution at these points. Or avoid/remove the async part and put it inside an actor if possible.

I’d suggest asking yourself where you actually need concurrency-safe primitives. Quite often you might be better off with non-Sendable types instead, mostly passing structs as data. And for cases where you do need some shared mutable state, use either an actor or a mutex (I guess Mutex itself might not be available due to runtime requirements, but any other will be fine). That should cover most of the cases with Swift 6 mode turned on.
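For example, a minimal actor replacing a hand-rolled lock; the runtime serializes access and, unlike a semaphore, can see the dependency when scheduling:

```swift
// An actor guards its state without any explicit lock; each call to
// increment() runs serially on the actor.
actor Counter {
    private var count = 0

    func increment() -> Int {
        count += 1
        return count
    }
}

let counter = Counter()
for _ in 0..<10 {
    _ = await counter.increment()
}
print(await counter.increment())  // 11
```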

I would avoid providing extensive tools for implementing Sendable types at the core of the project and limit it to basic tools. From my personal experience, some things that look like they need to be Sendable at first sight turn out to be much simpler the other way around.

1 Like