That’s at odds with a very important feature of atomics, which is that their storage is inline.
Not entirely; they still have reference semantics. Because they are not copyable, you can only access them via borrows (which are references).
Of course, this is the only way atomics make sense -- they must be a kind of shared mutable state (i.e. have reference semantics), otherwise there is no point to atomic operations because there will never be concurrent accesses.
If we consider a mutable Int property of a class, we don’t say the Int itself has reference semantics.
We need better terminology here, because there is such a thing as a ManagedAtomic, which does have reference semantics.
Many of the semantic differences between value and reference types disappear for non-copyable types. A uniquely-held reference has all of the local-reasoning properties we associate with a value type; that’s part of why copy-on-write works, as well as the core insight behind exclusivity, Rust ownership, etc. So on some level, sure, we could say that atomics have reference semantics and that the inline-ness of the storage is an implementation detail that we could expect only experts to understand.
Where the reference-type idea falls apart is that you can have mutable variables that hold references, and you can assign new references into those variables. You can come up with ways that this makes sense for atomic types, actually — assignment has to be done with exclusive access to a variable, so this “reference” assignment replaces the value in the atomic at a moment when you know it’s not being concurrently accessed. The problem is that that readmits the exclusivity-tracking problem; we still don’t want vars of these types in most of the contexts we’d like to use them.
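To make the inline-storage point concrete, here's a rough sketch using the names from the proposal draft (Atomic, wrappingAdd(_:ordering:), load(ordering:)); these may well change in revision. The atomic value lives directly inside the enclosing class, is declared with let so it can never be reassigned, and is only ever reached through a borrow:

```swift
import Synchronization

// Sketch only: API names are taken from the proposal draft and may change.
// The Atomic<Int> is stored inline in the class instance; there is no
// separate heap allocation for it, and it is only ever accessed through
// a borrow of `self`.
final class Statistics {
    // `let`, not `var`: the atomic itself is never reassigned, only
    // mutated through its atomic operations.
    let hitCount = Atomic<Int>(0)

    func recordHit() {
        hitCount.wrappingAdd(1, ordering: .relaxed)
    }

    func currentCount() -> Int {
        hitCount.load(ordering: .relaxed)
    }
}
```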
Atomics are Weird; they’re an advanced feature and I think it’s okay if they don’t fit neatly into the introductory value vs. reference semantics dichotomy. I feel like they are closer to being references than values, so I’m okay with saying they have “reference semantics”.
Fantastic proposal, very thorough and I support it wholeheartedly. Props to the authors!
Just one question: several examples in the proposal include a compareExchange that is retried in a while loop (as it usually should be), but I’d expect some sort of yield operation at each iteration and I’m seeing none. There is also nothing proposed to that end — is this considered beyond the scope of the proposal?
Back-off (such as yielding) is important when your attempts to perform an operation can interfere with other threads' ability to make progress. Compare-exchanges do cause contention in the memory subsystem that can stall other compare-exchanges for a few cycles, but in typical lock-free algorithms, the expectation is that this contention will always rapidly clear because (1) at least one of the threads should succeed and (2) failure on any thread should set up that thread to succeed on its next attempt, unless another thread succeeds first again. As a result, there's not really any point in adding back-off.
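To illustrate the pattern in question, here's a rough sketch of such a loop written against the existing swift-atomics package API (ManagedAtomic and compareExchange(expected:desired:ordering:)); note that a failed exchange hands back the freshly observed value, which is exactly what sets the thread up to succeed on its next attempt:

```swift
import Atomics

// Sketch: the typical lock-free update loop, with no back-off. Here we
// atomically record a new maximum; any failed exchange returns the value
// some other thread just installed, so we retry from up-to-date state.
func recordMaximum(_ maxSoFar: ManagedAtomic<Int>, candidate: Int) {
    var observed = maxSoFar.load(ordering: .relaxed)
    while candidate > observed {
        let (exchanged, original) = maxSoFar.compareExchange(
            expected: observed,
            desired: candidate,
            ordering: .relaxed)
        if exchanged { return }   // we won; done
        observed = original       // we lost; retry from the new value
    }
}
```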
Now, that caveat is important — the fact that other threads might repeatedly succeed over ours is a failure in basic fairness. This is generally considered acceptable for three reasons:
- As far as I know, you can't really fix unfairness while keeping a lock-free algorithm; you need to switch to a fair lock.
- Fair locks are generally a lot more expensive than unfair locks, and much more expensive than lock-free algorithms. The gains from a lock-free algorithm are generally more than enough to pay for a few threads starving for a few cycles.
- If you have a single atomic that's so heavily contended that some threads are completely unable to make progress, something is seriously broken about your design.
But simple back-off strategies like "back off after failure" don't really help with fairness anyway, because it's the other threads you need to make back off.
This all just applies to lock-free algorithms that follow the typical pattern of a load followed by a compare-exchange loop that builds a new value based on the last observed value. Back-off is important for something like a spin lock because it doesn't follow those two rules above. In particular, failing to acquire a spin lock doesn't set the failing thread up to succeed on its next attempt, so a thread failing to acquire a naive spin lock can definitely throw up enough contention to interfere with an attempt to unlock. But implementing a good spin lock is a very tricky business and tends to be very environment-specific; I don't think there's a useful way to offer a general yield function that would help people to implement portable spin locks. And frankly I'm not sure we really want people to offer portable spin lock libraries anyway, because in most environments you should probably not be using a spin lock.
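For contrast with the compare-exchange loop sketched earlier, here's a sketch of the kind of naive spin lock being warned against, again written against the swift-atomics package API; failing to acquire it does nothing to set the failing thread up to succeed, so it just spins and generates contention:

```swift
import Atomics

// Illustration of what NOT to do: a naive test-and-set spin lock.
// A thread that fails to acquire the lock learns nothing useful and
// simply hammers the cache line, with no back-off and no scheduler hint.
final class NaiveSpinLock {
    private let isHeld = ManagedAtomic<Bool>(false)

    func lock() {
        // Spin until we flip the flag from false to true.
        while !isHeld.compareExchange(
            expected: false,
            desired: true,
            ordering: .acquiring).exchanged {
            // Busy-wait: this is the contention described above.
        }
    }

    func unlock() {
        isHeld.store(false, ordering: .releasing)
    }
}
```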
Thank you for the extensive reply @John_McCall! Understood.
Beyond fairness though, when a thread does indeed starve, does it matter in practice what it’s doing? In other words, is spinning considered harmful and to be avoided, or does it just not matter in practice?
As a general rule, yeah, I’d say spinning for what you might reasonably expect to be a long time is never great. It’s wasted energy if nothing else. Sometimes you don’t have any good alternatives because e.g. you’re implementing the thread scheduler; but short of that, there are better alternatives, and you should use them.
On yield (i.e. POSIX's sched_yield), Linus Torvalds has a (well, fairly aggressive TBH, but also informative) piece on it:
The problem with that is "yield" is pretty much undefined. The definition of it is literally about single queue of a real-time behavior with a real-time scheduler with priorities.
But that "definition" has almost nothing to do with actual usage. There are various random people who use it, and some might use it for locking, while some use it for other things.
What you want to use it for is "schedule the right process". But you don't even know what the right process is, or if you do you don't tell the system (because sched_yield() literally doesn't have that interface), so the kernel has to guess.
In some cases, "sched_yield()" is basically used by processes that say "I'm CPU-intensive, but I'm not important for latency, and I don't want to cause problems for others". You'll find various random GUI programs doing that because they are threaded, and one thread does things like update the screen, while another thread does calculations. The calculation loop (which is still important, just not latency-critical) might have "sched_yield()" in it as a cheap way of saying "maybe there's a more important UI event going on".
In other cases, it's a performance optimization, where somebody says "I have done my part of the work and written it out, now I'm yielding because there's another user that is likely to want to use it, and that other user is actually the more heavy CPU hog and should run before I start generating more data for it".
And in others, it's because they are actually using real-time scheduling - perhaps on dedicated CPU doing the true Hard real-time kinds of things - and depend on the FIFO definition within their priority group.
That's the defined usage, but even that is really defined only in the traditional UP sense of "there is one process running at a time, and we have one single queue of them".
End result: sched_yield() is basically historical garbage. It's often literally wrong even for the defined usage because times have moved on from that whole "we run one thing at a time" long long ago. If your RT system actually has multiple concurrent threads (as opposed to being limited to a dedicated CPU) it's not really all that well-defined even there. But at least there you can still pretend it is.
Good locking simply needs to be more directed than what "sched_yield()" can ever give you outside of a UP system without caches. It needs to actively tell the system what you're yielding to (and optimally it would also tell the system about whether you care about fairness/latency or not - a lot of loads don't).
But that's not "sched_yield()" - that's something different. It's generally something like std::mutex, pthread_mutex_lock(), or perhaps a tuned thing that uses an OS-specific facility like "futex", where you do the nonblocking (and non-contended) case in user space using a shared memory location, but when you get contention you tell the OS what you're waiting for (and what you're waking up).
The really unfortunate thing about spinlocks, and the reason they're pretty much entirely avoided in userspace on Darwin-based OSs, is the combination of spinning and lacking priority inversion avoidance support. This means that if several high priority threads are waiting on the lock, they may keep a low priority thread from being scheduled to release the very lock that they're waiting for, since from the scheduler's point of view they're extremely busy doing higher priority work.
Luckily, in my experience at least, very few userspace spinlock users actually cared specifically about spinning under contention; they just wanted either fast uncontended locking, unfair locking, or both. Newer locks like OSAllocatedUnfairLock provide both of those properties without the downsides.
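As a concrete illustration of that last point, here's a minimal sketch of OSAllocatedUnfairLock from Apple's os module (macOS 13 / iOS 16 and later); it is unfair and fast when uncontended, but under contention the waiter blocks rather than spins, and the kernel can resolve priority inversion:

```swift
import os

// Sketch: an unfair lock guarding a piece of state. Under contention the
// waiting thread blocks (it does not spin), and the kernel can resolve
// priority inversion by boosting the lock's current owner.
let counter = OSAllocatedUnfairLock(initialState: 0)

func increment() {
    counter.withLock { value in
        value += 1
    }
}

func read() -> Int {
    counter.withLock { $0 }
}
```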
It's not just Darwin-based OSes: Linus Torvalds has an even more incendiary rant about spinlocks, and says, essentially, that they should never be used in user space, period, end of story. He's obviously speaking about Linux here, but I think his arguments apply to most (if not all) OSs.
Thanks for the review feedback, everyone! The language steering group has decided to return this proposal for revision based on some of the topics raised in this review thread.