Here is a pitch for adding a limited set of low-level atomic operations to the Standard Library, including native spellings for C++-style acquire-release memory orderings. Our goal is to enable intrepid library authors to start building synchronization constructs directly in Swift.
This means we must start talking about how these things will work in Swift -- in other words, we need to start working on a concurrency memory model. Given that Swift already interoperates with the C/C++ memory model, it seems like a good idea to use that as a starting point.
The document linked below attempts to informally translate the C++ memory model to Swift, loosely describing how the various abstractions/operations work, and how they interact with some of Swift's unique language features.
If you're interested in this area, please give it a read and let me know your thoughts!
(Author's warning: This is a long document on an intricate subject. It is intended to be a coherent introduction to the underlying concepts in a way that works as a standalone article, but reading this one document alone won't (can't!) give you the full details. I included some references for people interested in learning more about this subject.)
As a quick taste, we want to make atomics look like this:
import Atomics
import Dispatch

let counter = UnsafeAtomicInt.create(initialValue: 0)

DispatchQueue.concurrentPerform(iterations: 10) { _ in
  for _ in 0 ..< 1_000_000 {
    counter.wrappingIncrement(ordering: .relaxed)
  }
}

print(counter.load(ordering: .relaxed))
counter.destroy()
Thank you so much @lorentey for putting together such a well-thought-through pitch. The functionality this will offer is a major building block for high-performance applications and libraries, and it is crucial to making Swift more useful as a systems programming language.
In SwiftNIO we have been missing this sort of functionality since forever. For SwiftNIO specifically, this will improve two things:
- minor: we can delete a lot of C code (CNIOAtomics)
- major: because of the memory ordering guarantees proposed, we will be able to remove a number of locks from some hot code paths (for example in the EventLoop task list), which should provide really nice speedups in high-load scenarios
+1 for leaving out double-word atomics for now, although I already can't wait for that proposal.
+1 also on offering the low-level APIs first. They might not look as pretty as other things in the stdlib, but IMHO it's exactly the right call: it allows library authors to build the lock-free/atomic things that users can then consume through a nice API that fits whatever is being built.
Thank you @lorentey, looks really great! One thing I would like to propose is adding a basic AtomicDouble primitive as well. At least one of my libraries already contains a C-based implementation (much like NIO), and I think this would be generally useful; many languages, such as Java and C++, offer such a primitive. I understand that it can be implemented on top of an AtomicInt, but it requires a lot of unsafe bit casting, so there is a chance that people who need this functionality will have to implement it themselves and may run into hard-to-understand bugs. wdyt?
Thank you @lorentey! Overall, I think this pitch describes a very well-thought-out and reasonable approach given the current shortcomings/constraints of the language. I've got a couple of points I'd like to share.
I personally think all exposed atomic operations should use sequentially consistent ordering by default. I appreciate encouraging programmers to think about which ordering to use, but in my experience it takes quite a while to build up the knowledge and intuition needed to even remotely answer that question correctly. A sequentially consistent default means a programmer can start using atomics immediately and defer thinking about or experimenting with orderings until much later.
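Concretely, I'm imagining something like the sketch below; the ordering spellings follow the pitch, while the default-argument overloads are only my suggestion and not part of it:

import Atomics

extension UnsafeAtomicInt {
  // Hypothetical convenience overloads that layer a sequentially
  // consistent default over the pitched, ordering-explicit operations.
  func load() -> Int {
    return load(ordering: .sequentiallyConsistent)
  }

  func wrappingIncrement() {
    wrappingIncrement(ordering: .sequentiallyConsistent)
  }
}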
One thing missing from the pitch and implementation, which I think is required, is a primitive for yielding the processor when implementing busy-wait loops. I think the examples provided are all suboptimal and actually need a PAUSE instruction at the end of each iteration. I'm by no means an expert, but I know it's considered best practice, and at least one CPU vendor [1] advises it.
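For illustration, the kind of loop I have in mind would look roughly like this; hardwarePause() is a hypothetical stand-in for whatever PAUSE/yield hint the library could expose (it is not part of the pitch), and the .acquiring spelling is assumed from the pitched ordering names:

import Atomics

// Hypothetical placeholder: a real implementation would emit a CPU
// pause/yield hint (e.g. x86 PAUSE); this stub exists only for the sketch.
func hardwarePause() { /* pause/yield hint would go here */ }

// Spin until another thread sets the flag to a nonzero value.
func spinUntilSet(_ flag: UnsafeAtomicInt) {
  while flag.load(ordering: .acquiring) == 0 {
    hardwarePause() // yield the processor between iterations
  }
}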
Thank you again! I really can't wait to see this making it into the standard library.
This is awesome! I really like the overall design.
One minor point I'm curious about is the constant-evaluability enforcement for the ordering argument. The proposal suggests this should be applied during type-checking, which seems at odds with the current constant evaluation infrastructure used to support things like the os_log prototypes. Are there any concerns that the type checker and optimizer's interpretation of what is constant-evaluable would diverge over time?
Given that the enforcement isn't proposed as a user-accessible feature, I wonder if it makes sense to just enforce that the switch gets optimized away instead of trying to analyze the passed-in argument. Maybe some kind of private attribute like @_staticAssertNoDispatch which could be applied to a switch to assert that the corresponding branch/lookup is removed during optimization?
That said, I don't have much experience in this area so it's possible the existing design is better, and this is a very small part of the overall proposal.
A minor tangent: it doesn't require any unsafe bit casting. You can use the bitPattern property and initializer (those are implemented in terms of unsafe bit casting, but that's an implementation detail for the standard library to worry about):
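Roughly like this (a sketch only, assuming the pitch's 64-bit unsigned atomic type, spelled UnsafeAtomicUInt64 here, and a store(_:ordering:) operation to go with the load shown earlier):

import Atomics

// Keep the Double as a 64-bit bit pattern inside an atomic integer.
// Double.bitPattern and Double(bitPattern:) do the conversions; no
// user-level unsafe casting is needed.
let atomicDouble = UnsafeAtomicUInt64.create(initialValue: (1.5 as Double).bitPattern)

let current = Double(bitPattern: atomicDouble.load(ordering: .relaxed))
atomicDouble.store((current * 2).bitPattern, ordering: .relaxed)

print(Double(bitPattern: atomicDouble.load(ordering: .relaxed))) // 3.0
atomicDouble.destroy()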
With a quick look (always dangerous with such an intricate topic! :) this looks really great! I agree with others that the approach of providing low-level primitives (effectively at the LLVM IR level) is the way to go to let library devs build interesting things on top.
Some random questions:
It seems a bit odd that UnsafeAtomicMutablePointer<T> is implicitly nullable when UnsafeMutablePointer is not. I agree with you that the non-nullable version would not be very useful, but did you consider making it explicit in the name that it is nullable? Similar question for Unmanaged and Reference. Incidentally, +1 for including these helpers: if you don't, each library developer would have to reimplement them.
I would expect UnsafeAtomicInt to be a non-movable value type (which, of course, we can't do quite yet). However, as you describe it, it is more of a pointer to an unsafe pointer to an atomic value in memory. Have you considered including the word "Reference" or "Pointer" in its name? Something like UnsafeAtomicIntReference would connote something a bit closer to what you're proposing and leave the door open to an actual UnsafeAtomicInt or AtomicInt type later.
Did you consider an API style where you (also?) provide static methods that perform these sorts of operations, and take the unsafe pointers as arguments?
Why static func create instead of using an initializer with a keyword argument?
I agree with the point upthread that memory ordering should default to sequential consistency, for progressive disclosure of complexity.
I also agree with the comment upthread that you should use the existing constant-evaluation support used by os_log instead of creating a new syntax-driven concept.
In "Memory-Safe Atomic Constructs" in future directions, generally you'd want atomics to be non-movable, not move-only. This is how std::atomic works in C++.
As a general comment about the pitch writing itself, I think it should be made significantly shorter. For example:
It sometimes tries to explain what ordering and consistency models are (e.g. in the Acquire-Release Ordering section), and sometimes references the C/C++ model. I would pick one or the other and use it consistently. More specifically, I would recommend describing everything in terms of a mapping to the C model (which has had many experts working on it for years). This makes it concrete and unambiguous, instead of opening the door to imprecision and interpretation problems.
The paper also goes into things like compiler optimizations and other things that are not core to the proposal; I would recommend dropping them. Some of the examples are also dangerous and don't seem necessary.
The paper also includes a bunch of topics like "However, it can be surprisingly difficult to generally reason about what program flows are allowed under acquire-release ordering. For example, consider the following example,..." which are general background about atomics, but not about the proposal itself.
In short, I'd prefer to see a focused API proposal, not a tutorial on how to use atomics and the traps therein. Such a thing could be its own book; this proposal cannot do justice to the topic, and the extra content dilutes the critical parts. The additional material could of course be split out into a reference doc or something like that.
Super happy to see we're finally defining concurrency! Also, nice job sidestepping the problems with memory_order_consume, which was a worry for me when I first saw this announcement. I will follow up when I've had a chance to digest more, but having read @Chris_Lattner3's remarks I'd like to prematurely +1 them: focused pitches and proposals really help me evaluate and provide useful feedback, and in so doing, incentivize positive participation.
While the background might dilute the substance of the pitch, I hope such feedback doesn’t deter future pitches from providing the background as a reference doc. I learned new things from it. I am humbled and far more appreciative of the effort that went into the design. My deepest gratitude to the author!
I hope proposals in the future don’t deprive me and others of an appreciation for the complexity of a topic or the opportunity to learn something new.
I am far from an expert on atomic and lock-free programming, but am very excited to see progress happening! I agree with others that starting with the low-level foundations is the right approach. Thanks for working on this @lorentey!
I also found some of the background very helpful as a refresher. That said, I think that part could easily be split off into a separate companion document for those who need it, leaving the proposal itself more focused. It was a very long read that took quite a while to get through. If I were an expert in this area, I would have preferred a more concise presentation of the proposed design for Swift.
UnsafeAtomicInt isn't an Int. It's a kind of wrapper around an Int that provides atomic functionality, and that's not what I expect from the name. As mentioned upthread, an initializer that takes an Int value is what I'd expect, at least for the common case; having a create factory method doesn't seem sensible.
Isn't decrementing the same as incrementing by a negative number?
Would it make sense to have an Atomic protocol that the various types adopt so other devs could add their own types in a sensible way?
Disclaimer: I have not read any of the background material that the author has written on atomics, nor do I have a solid familiarity with atomics in general.
IIRC, there is a limited set of primitive values that CPUs support for atomic operations, primarily integers.
That would basically restrict people's ability to provide their own types unless we had something like
public protocol AtomicRepresentable: RawRepresentable
where RawValue: BinaryInteger { }
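For example (purely illustrative; none of these names are in the pitch), a custom type with an integer raw value could then participate:

// Hypothetical conformance under the strawman AtomicRepresentable above.
enum ConnectionState: Int, AtomicRepresentable {
  case idle = 0
  case connecting = 1
  case connected = 2
}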
This does raise the point that I would like atomic values to cover more than just Int. I very much appreciate the Atomic API provided by NIO, where it is generic, allowing things like Atomic<Int8>.
Especially if the name is to be UnsafeAtomic, as that would be familiar alongside the other Unsafe APIs.
Is there a performance reason (or any reason) that UnsafeAtomicInt and UnsafeAtomicUInt can't be represented by a single generic type (but still constrained to Int and UInt)?
First of all, I really like the discussion in this proposal. Many have noted that it is a little long-winded, but this is a complex topic being posted on a forum that is meant to be accessible to a broad class of programmers and I think it does a great job of filling in the gaps.
One idea I had that might make the API and proposal simpler is to rename the C-style memory orderings in terms of what they allow and make them an OptionSet. I've come up with some terrible strawman names to illustrate: allowReorderingOfSubsequentOrPrecedingAtomicOperations, allowDeferOfPrecedingMemoryAccesses, allowOptimisticLoadOfSubsequentMemoryAccesses.
I reiterate: these are terrible names. What I like about this approach, though, is that it fits with progressive disclosure: you start with what you naively expect (no reordering) and then allow bits of complexity in, giving you an opportunity to mentally verify that the thing you are allowing doesn't break your invariants. This would be significantly more verbose, and I think that is good too: most concurrent algorithms I've encountered are fairly terse, but significantly more complex than that terseness implies. It would also be a win both for folks who haven't looked at memory barriers in a while and need a refresher on what acquiring means, and for programmers stumbling upon a concurrent algorithm for the first time.
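As a sketch only (reusing the deliberately terrible names above; nothing here is part of the pitch):

// Strawman: each option names a reordering the programmer explicitly allows.
struct AtomicOrderingOptions: OptionSet {
  let rawValue: Int

  static let allowReorderingOfSubsequentOrPrecedingAtomicOperations =
    AtomicOrderingOptions(rawValue: 1 << 0)
  static let allowDeferOfPrecedingMemoryAccesses =
    AtomicOrderingOptions(rawValue: 1 << 1)
  static let allowOptimisticLoadOfSubsequentMemoryAccesses =
    AtomicOrderingOptions(rawValue: 1 << 2)
}

// The empty set would behave like sequential consistency; allowing every
// reordering would behave like relaxed.
let sequentiallyConsistent: AtomicOrderingOptions = []
let relaxed: AtomicOrderingOptions = [
  .allowReorderingOfSubsequentOrPrecedingAtomicOperations,
  .allowDeferOfPrecedingMemoryAccesses,
  .allowOptimisticLoadOfSubsequentMemoryAccesses,
]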
Exciting times! I appreciate everyone's comments so far and would like to add a few of my own:
I like the idea of a single generic struct + associated protocol, though I'd name it UnsafeAtomicPointer<Value>. I wanted to make sure this was feasible from an API perspective, so I made a mockup here. Of course, I'm not sure if the compiler can support this level of optimization…but if we put the SIMD types through the wringer to support a pretty and somewhat-extensible syntax, we can probably do it for atomic pointers too?
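Roughly, the shape I have in mind looks something like this (a sketch, not the actual mockup linked above; the ordering types are assumed from the pitch and the protocol name is made up):

import Atomics

// Hypothetical: one protocol describing the primitive operations...
public protocol PrimitiveAtomicValue {
  static func atomicLoad(
    at address: UnsafeMutableRawPointer,
    ordering: AtomicLoadOrdering
  ) -> Self

  static func atomicStore(
    _ value: Self,
    at address: UnsafeMutableRawPointer,
    ordering: AtomicStoreOrdering
  )
}

// ...and one generic pointer-like struct dispatching to it.
public struct UnsafeAtomicPointer<Value: PrimitiveAtomicValue> {
  let address: UnsafeMutableRawPointer

  public func load(ordering: AtomicLoadOrdering) -> Value {
    return Value.atomicLoad(at: address, ordering: ordering)
  }

  public func store(_ value: Value, ordering: AtomicStoreOrdering) {
    Value.atomicStore(value, at: address, ordering: ordering)
  }
}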
The pitfalls with Unmanaged referenced in the proposal seem pretty bad, to the point where even with my protocols I think it's probably best not to make Unmanaged conform at all. (Unmanaged can go through its opaque UnsafeRawPointer representation just fine; my API prototype contains a reimplementation of UnsafeAtomicLazyReference.)
I mildly prefer a separate static func create over a labeled initializer, to make it clear that it's a compound operation, but I could live with either one.
Does create need an ordering argument, or is it guaranteed that anyone who's managed to acquire the atomic reference has already performed some kind of synchronization? (Worried about memory being deallocated and reused, thus leading to other threads getting ahold of the address and seeing an old value there.)
Seconding "default-to-sequentially-consistent". (For everything, not just create.)
It would be nice™ if a global initialized with create and never destroyed could be promoted to a static allocation by compiler magic.
UnsafeAtomicLazyReference.initialize(to:) feels like the wrong level of abstraction to me. What do you think about this instead? This is definitely the most common idiom I've seen for initializing a lazy singleton.
func getIfPresent() -> Instance? {
  return underlying.load()
}

// Filled out more in my UnsafeAtomicPointer prototype.
func getOrCreate(_ constructor: () -> Instance) -> Instance {
  if let value = getIfPresent() { return value }
  let newValue = constructor()
  let (exchanged, existingValue) = underlying.compareExchange(nil, newValue) // omitting refcounting logic for now
  return exchanged ? newValue : existingValue
}
Relatedly, I think destroy should be named deallocate, to match UnsafeMutablePointer. It's only UnsafeAtomicLazyReference that does any additional work anyway. (You may have seen me put that into my API prototype.)
Floating point atomics are called out in the Potential Future Directions section. I expect Float and Double will eventually become primitive atomic types; however, I'd much prefer to defer this to a follow-up proposal.
The implementation of atomics for these types deserves its own separate discussion. (The main issue is that compareExchange wants to implement bitwise equality rather than the usual floating point comparison.)
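To make the mismatch concrete (plain Swift, no atomics involved), == and bitwise equality disagree in both directions:

// NaN is unequal to itself under ==, yet its bit pattern matches exactly.
let nan = Double.nan
print(nan == nan)                       // false
print(nan.bitPattern == nan.bitPattern) // true

// +0.0 and -0.0 compare equal under ==, yet their bit patterns differ.
print(0.0 == -0.0)                           // true
print((0.0).bitPattern == (-0.0).bitPattern) // false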