Low-Level Atomic Operations

This can also be done by explicitly spelling out .sequentiallyConsistent -- but I see the point!

This looks like a universal request, so in the next revision I'm defaulting to sequential consistency.

(Fun fact: for a couple months, seq_cst was not even part of this feature set. :smiling_imp:)

This sounds like a good idea. I don't want to encourage unbounded busy waits, but we definitely want to support spinning a few hundred/thousand times before blocking.

On ARM I expect we'll want to emit WFE, which needs to be paired with SEV (and sometimes prepared with SEVL).

Assuming we're okay with modeling a WFE/SEV-style global event (rather than something more fine-grained that may not have hardware support), we'd need to add two or three new entry points. (Suggestions for better names welcome!)

/// Pause and briefly rest within an iteration of a spinlock-style retry loop.
func spinPause() {
  // WFE on ARM, PAUSE on Intel.
}

/// Signals that this would be a good time for threads that are currently
/// paused in `spinPause` to wake up and retry their condition.
func spinWake() {
  // SEV on ARM, nothing on Intel.
}

// Optional:

/// Prepare for a spinlock-style retry loop that immediately starts 
/// with a call to `spinPause`.
func spinPrepare() {
  // SEVL on ARM, nothing on Intel.
}
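To sketch the bounded-spin pattern this enables (spin a few hundred/thousand times, then fall back to blocking), here is one way the entry points might compose. Everything below is a hypothetical stand-in: `spinPrepare`/`spinPause` are stubbed as no-ops and `flag` is a plain variable, so this only illustrates control flow, not real atomics or the real instructions.

```swift
// Hypothetical bounded spin-wait built from the entry points above.
// Real implementations would emit SEVL/WFE on ARM and PAUSE on Intel;
// here they are no-op stand-ins so the sketch runs anywhere.
func spinPrepare() {}  // would be SEVL: arm the event flag
func spinPause() {}    // would be WFE: rest until the next event

var flag = false  // stand-in for an atomic flag set by another thread

/// Spins at most `maxSpins` times waiting for `flag`; returns false if the
/// caller should give up and block on a real synchronization primitive.
func waitBriefly(maxSpins: Int) -> Bool {
    spinPrepare()
    for _ in 0 ..< maxSpins {
        if flag { return true }
        spinPause()
    }
    return false
}

flag = true
print(waitBriefly(maxSpins: 100))  // true
```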

Here is how these would look in actual use:

class NaïveFuture<Value> {
  private let _barrier = UnsafeAtomicInt.create(initialValue: 0)
  private var _result: Result<Value, Error>?

  deinit {
    _barrier.destroy()
  }

  func set(_ body: () throws -> Value) {
    precondition(_result == nil)
    _result = Result(catching: body)
    _barrier.store(1, ordering: .releasing)
    spinWake()
  }

  func get() throws -> Value {
    while _barrier.load(ordering: .acquiring) == 0 {
      spinPause()
    }
    return try _result!.get()
  }
}

A bit slow to chime in here, but wanted to re-read the latest version :slight_smile:

Overall this is fantastic and exactly what we need in Swift to get going with building some low-level concurrency building blocks and concurrent data-structures – can't wait to remove tons of C code we have to use today to achieve the same results :tada: :tada:

Following up on some specific topics from the doc and brought up in this thread:

  • :thinking: As someone working with atomics quite often, I really, really would want them not to have a default ordering. If this gets overruled by the majority of folks, so be it; however, here is the rationale which led to not having default values to begin with:
    • It has happened to me or my peers multiple times that an atomic operation with the "default" ordering slipped through during review, when it should have been an acquire/release etc. Things work and "look fine", but one always has to expend the special mental energy of remembering to "never use the default ordering". At the level of data structures where one cares about those primitives, it often is (and should be) a thoughtful decision which ordering to use. If it is sequential, then it is so for a reason, not because "I didn't think about orderings".
    • tl;dr; I'm sympathetic to the progressive disclosure argument, but I'll argue that it actively hurts users who really work with such operations. The "I don't care" style of writing atomics based algorithms and data-structures is not really something that should be encouraged IMHO, those things are hard enough to get right in the first place, and we should not make it simpler to "not think about it" when building those
    • :notebook: if someone really does not care, it is simple enough to add an extension function which has the sequential ordering by default. It is not possible to "remove" it though.
  • :tada: Very nice to see fences being adopted after all! They're a nice-to-have (at least for all cases I can think of, as there's usually some counter I'd use as a barrier anyway), but very nice to have them right away.
  • :+1: Ordering requirement to be a compile-time constant: not a problem. These operations are always pretty tightly connected to the environment they work in, and changing from one ordering to another based on some setting (passing it in) is not really useful.
  • :tada: Very excited for UnsafeAtomicLazyReference<Instance> being included in this proposal :+1: I have numerous lock-protected variables which this will replace :tada:
  • :+1: The set of atomic types looks solid :+1: Doubles and Floats can follow some other time, but the most important (U)Int64 being in the initial proposal is all I really wish for.
  • :thinking: minor: The spelling of create surprised me a bit. In reviews I'm always told to stick to make... consistently, though grepping through some codebases now I guess create is also being used sometimes... No strong preference, just surprised me now.
  • :+1: no strong feelings around the other "shape" of the APIs really, current or similar proposals all look fine
  • :+1: for PAUSE if we could get those in the initial pitch! Was thinking we'd get those later, but they're definitely very helpful -- exactly for "a few quick spins before heading to sleep," definitely have use cases for this.
  • One final note: let's not reinvent custom spellings for the orderings; adopting the existing ones is fine (current wording) :wink: These things are confusing enough that sticking to the same wording across runtimes (C/C++/Swift/other concurrency models) helps with building or porting the right thing.
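The "add your own default in an extension" point above can be made concrete. The following is a runnable miniature, not the pitched API: `AtomicOrdering` and `ToyAtomicInt` are stand-ins, and the "load" is an ordinary read.

```swift
// Stand-ins for the pitched types, only to show the API shape.
enum AtomicOrdering { case relaxed, acquiring, releasing, sequentiallyConsistent }

struct ToyAtomicInt {
    private var value: Int
    init(_ value: Int) { self.value = value }
    // The primitive API always requires an explicit ordering...
    func load(ordering: AtomicOrdering) -> Int { value }
}

extension ToyAtomicInt {
    // ...and a codebase that truly doesn't care can layer a sequentially
    // consistent default on top. The reverse -- removing a default baked
    // into the primitive API -- is not possible.
    func load() -> Int { load(ordering: .sequentiallyConsistent) }
}

let counter = ToyAtomicInt(42)
print(counter.load())                      // 42, via the defaulted overload
print(counter.load(ordering: .acquiring))  // 42, explicit where it matters
```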

In any case, the proposal is :100: / 10, thank you again for all the hard work and can't wait to put these to good use!


We hope to piggy-back constexpr-constrained orderings on @ravikandhadai's Sema-level diagnostics improvements for logging -- see https://github.com/apple/swift/pull/26969 for an earlier work-in-progress PR. (This doesn't include any atomics support, of course.)

Statically asserting that the switch always needs to be eliminated could be tricky if e.g. any of these are ever emitted as standalone functions. (Arguably we shouldn't be able to utter things like let myFunc = UnsafeAtomicInt.load, but preventing this sort of thing may not be feasible in the short term.) Ravi is much more competent in this area than I am, so I'll stop pulling things out of my nose and let him chime in instead. :slight_smile:


I agree on this point. This API is explicitly designed as a low-level API and requires that the user knows how to use atomics and understands them in the first place, so I don't think the progressive disclosure argument applies here.

Higher level APIs would be free to add a default ordering as they see fit.


The problem is how to spell these. I think UnsafeAtomicMutablePointer is already overly explicit, and UnsafeAtomicOptionalMutablePointer would be even worse.

Arguably Mutable is redundant for atomics, so how about UnsafeAtomicOptionalPointer?

I'm okay with UnsafeAtomicOptionalUnmanaged.

I don't think we need to also support non-optional variants.

The non-copiable/movable variant would be called AtomicInt, because it would be memory safe.

I believe that the consistent UnsafeAtomic prefix will be enough to alert readers to the new convention. All UnsafeAtomic types work the same way, and they all share the same prefix.

On a superficial level, a Pointer suffix wouldn't work well for UnsafeAtomicMutablePointerPointer, while Reference gets confusing with UnsafeAtomicOptionalUnmanagedReference and UnsafeAtomicLazyReferenceReference.

More importantly, I want to encourage people to think of UnsafeAtomicFoo as approximating the eventual AtomicFoo that owns its storage. I expect most use-cases will be happy with create/destroy -- and with those, UnsafeAtomicFoo does logically own its storage.

That said, I'm open to suggestions for alternative naming schemes!

Until/unless we get non-copiable types, I expect most synchronization constructs will use the same mechanics, so it would be nice to have a consistent naming convention for all of them.

This is indeed something I'm currently considering adding, to better support custom atomic-convertible types. (This would be in addition to, not replacing, the UnsafeAtomic types.) Stay tuned!


Cont'd!

I modeled this after UnsafeMutablePointer.allocate and similar APIs. I feel the allocation makes this API more than a mere initializer.

If an UnsafeAtomic value was made by calling create, I expect to see a call to destroy at the end of its lifetime. Having both of these as explicit function calls makes it easier to review code using them.

Agreed & updated. Note that the Ownership Manifesto currently uses the moveonly strawman keyword for its demo of non-copiable types.

Absolutely. In a revision or two, I expect to move the tutorial sections into API docs (and probably a standalone document somewhere under docs/).

However, I feel we need to be a lot more careful about how we document these APIs than most stdlib additions. API docs tend to get nowhere near as much scrutiny here as the APIs themselves -- and I really, really would like people to carefully read the tutorial sections, and to report if anything is amiss. Are the explanations easy to understand? Are there any misleading oversimplifications? Are there better illustrative examples?

As we add more API surface in upcoming revisions, I expect the document will be plenty long without the intro parts. :upside_down_face:


Thank you for the proposal! I have a couple of comments:

The proposed abstractions seem to allow atomic operations on non-atomic objects. Usually, an atomics abstraction combines the storage and the operations in one type, and ensures that users can't perform non-atomic accesses to the storage. The proposed abstractions don't guarantee that. I understand why this is designed like that -- some use cases want to control precisely where their atomics are allocated and laid out in memory. However, I would want us, at this point, to avoid adopting a model where mixing atomic and non-atomic accesses to storage is allowed. Atomics are difficult as they are, and other languages don't have much experience mixing atomic and non-atomic accesses to the same object. C++ only added std::atomic_ref just now, in C++20.

There are certainly valid use cases for mixing atomic and non-atomic accesses, however, I think we should gain more experience before adding such capability to the standard library.

How to fix: just specify that if a storage location is accessed with atomic operations, all accesses except initialization must be atomic. We can relax this later.

Names. I wanted to echo the comments above that UnsafeAtomicInt is not an int itself, but it is a pointer. Consider UnsafeMutablePointerToAtomicInt (similarly for other types), or, if we adopt a "universal" atomic pointer type, UnsafeMutablePointerToAtomic<Pointee>.

UnsafeAtomicInt.create(initialValue:) should probably be an initializer on the type, and include the word "allocate" in the argument label, or it should be called "allocate" like the corresponding unsafe pointer method.

UnsafeAtomicInt.destroy() should probably reuse the "deallocate" terminology established by existing unsafe pointer types.

UnsafeAtomicLazyReference.initialize(to:) is a "maybe-initialize" operation, unlike other existing operations in the standard library called "initialize". I think this difference in semantics should be reflected in the name, like initializeIfNil(to:). Or should it even use the word "initialize"? The memory is already initialized to nil, this operation changes nil to non-nil -- we call that operation "store" or "set". So maybe setIfNil(to:)? Or even ifNilSet(to:), or ifNilStore(_:)?
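A sketch of the maybe-initialize semantics in question, under whichever name wins. The helper below is hypothetical and non-atomic; the real operation would be a compare-exchange on the reference slot.

```swift
// Hypothetical setIfNil(to:): store only if the slot is still nil, and
// return whatever ends up in the slot (the winning value).
func setIfNil<T>(_ slot: inout T?, to newValue: T) -> T {
    if slot == nil { slot = newValue }
    return slot!
}

var cached: String? = nil
print(setIfNil(&cached, to: "first"))   // "first" -- we initialized it
print(setIfNil(&cached, to: "second"))  // "first" -- already set; unchanged
```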

Explore the feasibility of a "semi-universal" pointer-to-atomic type. I would very much like us to explore a design where the user must know about as few types as possible, and where atomics compose with existing standard library types, something along the lines of SwiftNIO, or the design that Jordan posted in this thread.

I completely agree with the concerns expressed in the "A Truly Universal Generic Atomic Type" section, and I agree that this type must provide lock-free guarantees.

I think there are two possible directions to explore:

  • Only allow composing a limited set of standard library types with this semi-universal pointer-to-atomic.

  • Allow a user-defined type to declare that it is "bitwise equatable", that is, that its Equatable conformance can be implemented by comparing the underlying bits. This way, a struct Point { var x, y: Int8 } could compose with such a semi-universal pointer-to-atomic type.
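The second bullet can be illustrated with real Swift today. `bitwiseEqual` below is a hypothetical helper showing what "compare the underlying bits" means -- effectively what a semi-universal atomic's compare-exchange would do (ignoring the padding-bit subtleties; this particular `Point` has none).

```swift
// A type whose Equatable conformance happens to coincide with comparing
// its raw bytes (two Int8 fields: no padding bits to worry about).
struct Point: Equatable {
    var x, y: Int8
}

// Compare two values by their underlying bit patterns.
func bitwiseEqual<T>(_ a: T, _ b: T) -> Bool {
    withUnsafeBytes(of: a) { ab in
        withUnsafeBytes(of: b) { bb in
            ab.elementsEqual(bb)
        }
    }
}

print(bitwiseEqual(Point(x: 1, y: 2), Point(x: 1, y: 2)))  // true
print(bitwiseEqual(Point(x: 1, y: 2), Point(x: 2, y: 1)))  // false
```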

Nullability and mutability of UnsafeAtomicMutablePointer<Pointee>. I would prefer to see a more detailed argument why the pointee should be nullable. Similarly regarding mutability. If we can't make a convincing argument, then I think we should provide more variants of these types. It is important to note that adopting a generic type UnsafeMutablePointerToAtomic<Pointee> would allow us to avoid the question altogether and allow each user to specify whether they want a nullable/non-nullable pointer, mutable/non-mutable pointer.

Only providing wrapping semantics of arithmetic. I understand that there is a desire to provide direct access to the underlying CPU instructions that don't perform overflow checks. However, I think we should discuss providing both wrapping and trapping variants of atomic arithmetic, with trapping semantics being default.

No convenience operations? If there is a desire for the proposed abstractions to only reflect the underlying machine (from the list of goals "Every atomic operation must compile down to the corresponding CPU instruction (when available), with minimal overhead. (Ideally even if the code is compiled without optimizations.)"), then we should think about the place where the convenience operations will go (for example, trapping variants of arithmetic or some kind of closure-based variant of compare-exchange). If the proposed types are not the right place for convenience operations (because they may have more overhead and don't correspond to a machine instruction), maybe we should name these types something like "machine atomics", freeing up more ergonomic names for more user-friendly abstractions.

Explain in more detail what implementations are acceptable. You specified that a implementation must be lock-free. What about wait freedom? Regardless of what the answer is, the proposal should be explicit about the intent: whether a compare-exchange loop is an acceptable implementation is important to users.

Explore which of the proposed operations are implementable on supported architectures. It would be great if the proposal confirmed if the proposed operations are implementable, and if not, listed the operations that will be missing on certain architectures.

Exposing atomics as a separate module. I would like the proposal to discuss the pros and cons of including atomics into the standard library module vs. exposing them as a separate module. Some thoughts from me: adding atomics to the standard library means that they would be imported into every Swift program, most of which are not going to use atomics. On the other hand, putting atomics into a separate module means that the standard library implementation won't be able to use them. It is almost as if we want a "submodule" that is imported on demand.


I'm happy to see us address this! It's a critical step for concurrency.

One thing I'm not sure about is the idea of having a new atomic pointer type. As the linked paper about C++ tearable atomics and atomic_ref indicates, there is not really such a thing as an atomic memory location. Instead, there are atomic operations on regular memory locations. The paper summarizes it quite well:

After all, this has been solved in non-C++ contexts. To assembly programmers, or to those used to memory models such as Linux’s memory model, the distinctions the Standard makes seems overly complex. Their code simply defines atomicity as a property of code rather than C++'s definition of atomicity as a property of particular memory locations during an object’s lifetime. Indeed, in assembly a memory location can be concurrently accessed with a regular non-atomic memory instruction as well as an atomic memory instruction.

So with that in mind, and seeing that the atomic pointers just wrap an UnsafeMutablePointer, why not expose this as a .atomic wrapper view directly on UnsafeMutablePointer? Then we could replace all the separate atomic pointer types as constrained extensions of a nested wrapper view:

extension UnsafeMutablePointer {
  struct Atomic {
    var address: UnsafeMutablePointer
  }
  var atomic: Atomic { Atomic(address: self) }
}
// Basic load, store, cas.
extension UnsafeMutablePointer.Atomic {
  func load(ordering: AtomicLoadOrdering) -> Pointee { ... }
}
// Read-modify-write ops as applicable.
extension UnsafeMutablePointer.Atomic where Pointee == Int {
  func wrappingIncrementThenLoad(
    by operand: Int,
    ordering: AtomicUpdateOrdering = .sequentiallyConsistent
  ) -> Int
}

UnsafeAtomicLazyReference could be difficult to express in this form without parameterised extensions, though.

This model would make it easy to change code to use atomics - you don't need to change types, and you are not forced to use them everywhere. If you have external synchronisation which guarantees accesses don't overlap, you can only use atomics for part of your processing. Writing x.atomic.load(.relaxed) rather than just x.load(.relaxed) is better for thinking about atomicity of code and not of types, and for informing others that some cross-thread signal is being sent/received IMO.
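The wrapper-view shape itself works in Swift today. Here is a runnable miniature over the real `UnsafeMutablePointer`; the view performs plain (non-atomic!) accesses and is only meant to show the API surface, not to provide atomicity.

```swift
// A nested "view" type over UnsafeMutablePointer, mirroring the proposed
// `.atomic` shape. Accesses here are ordinary loads/stores, not atomic.
extension UnsafeMutablePointer {
    struct View {
        var address: UnsafeMutablePointer
        func load() -> Pointee { address.pointee }
        func store(_ newValue: Pointee) { address.pointee = newValue }
    }
    var view: View { View(address: self) }
}

let p = UnsafeMutablePointer<Int>.allocate(capacity: 1)
p.initialize(to: 1)
p.view.store(5)
print(p.view.load())  // 5
p.deinitialize(count: 1)
p.deallocate()
```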


The direct equivalent for x86 PAUSE is YIELD on ARM and it's what I've always used. I can't comment on whether the potential benefits of using the SEV/WFE outweigh the cost of the extra API surface -- others more knowledgeable than me would have to comment on that. It seems to me however that those instructions carry more semantic meaning (work in a sort of equivalent way to wait/signal on a semaphore) and could be added later.


Such a design for low-level atomics feels a bit like a step backwards (edit: well, admittedly no worse/better than the status quo in C, I guess). And it would cause us to go back to "scary name" variables to indicate "this one is only intended to be accessed atomically".

I'd much rather be on board with Dmitri's proposal of not allowing "mixed" atomic/non-atomic accesses at all (at first) than making it way too easy to forget to write .atomic in a single access, which will happen to work for a while and then come down burning (and while the error is simple, it causes much time "staring at the code" in disbelief at first), because the API allowed me to write a clearly unsafe access to an "only touch this using atomic loads/stores" variable.

Atomic variables are not "just an Int, but thread-safe"; they are often used to implement more complex data structures (e.g. queues, concurrent caches or dicts), which heavily rely on the memory visibility guarantees an access to such a variable provides. As such, I'd really vote for keeping those things very explicit.


I don't think we want to bake in a guarantee that a type's atomic storage is bitwise identical to its regular storage, since it would be nice if we could do a little bit better than C/C++ when we extend atomics to user-defined types -- by being able to add additional alignment, normalize padding bits to zero, or make other transformations necessary to atomically update values of a type. I think the design Karoy proposed allows us to add a protocol later, since the atomic types mostly share an interface, but they aren't tightly coupled to their corresponding non-atomic storage type. We could add a protocol later to associate a type with its atomic pointer representation, like:

protocol AtomicRepresentable {
  associatedtype UnsafeAtomic: UnsafeAtomicPointerProtocol
}

extension Int: AtomicRepresentable {
  typealias UnsafeAtomic = UnsafeAtomicInt
}

typealias UnsafeAtomicPointer<T: AtomicRepresentable> = T.UnsafeAtomic

This should be taken with a grain of salt, because as I said in my earlier post I am no expert in this area. But what I do know about atomics leads me to believe that this is the best perspective. This is an expert feature, and one where even experts need as much help as they can get in writing correct code. What experts need most is to be clear about their intent, both when reading and writing code.

+1

I also agree with the comments from @gribozavr


Since the decrement / increment operations use wrapping arithmetic, you can decrement by d by incrementing by 0 &- d.
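Concretely, with 8-bit values (plain wrapping arithmetic here, standing in for the atomic operation):

```swift
// With wrapping arithmetic, 0 &- d is the two's-complement negation of d,
// so incrementing by it decrements by d.
let decrement: UInt8 = 3
let increment = 0 &- decrement   // 253
var value: UInt8 = 10
value &+= increment              // 10 + 253 wraps to 7, i.e. 10 - 3
print(value)  // 7
```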


You would still have the option of using the atomic wrapper type if you wanted to ensure all accesses were atomic: var x: UnsafeMutablePointer<Int>.Atomic. It has the benefit of having a predictable spelling, too.

Ultimately, it comes down to thinking of code as being atomic and not of particular variables as being atomic. If you're not used to that way of thinking, it may seem strange to you that the type-system doesn't limit you to atomic ops.

As C++ shows, a type-based system has issues, requiring ways to opt in to and out of those constraints. That suggests that it might not be a good lowest-level abstraction.

I agree about keeping things explicit.

In the context of "scary name" variables, though, it's quite rare even for complex data structures to worry about the interactions between a huge number of atomics. It shouldn't be necessary to give those kinds of names anyway, but I suppose it's a defensive move.

What I don't like about the current approach is that it's necessarily going to leave out types, both smaller-than-word-sized integer types and parts of our panoply of pointer types. I do agree that a general AtomicRepresentable sounds nice, and I even started with that in my prototype; the main thing that kept me away from it was the interoperability with regular pointers that the address property (and init(at:)) implies. If we were willing to drop that, then you could easily hop in and out of an atomic representation (and even a separate optional representation). But I don't think that's the right tradeoff at this point, because it makes it harder to declare atomic storage using ManagedBuffer or something like that (because you need to write the storage type there, not the representable type).


I fall in between @gribozavr and @Karl in terms of mixing atomic and non-atomic ops. While the C/C++ model of atomics is (mostly) about atomic memory locations, that's not the model that the actual CPU uses, so we get to decide if we think that's a good model. I liken this to separating the choice of wrapping vs. non-wrapping arithmetic from signed vs. unsigned integers. (In C, unsigned integers get wrapping arithmetic, always, and signed integers get "it's undefined behavior if you overflow" unless your compiler supports an alternate behavior.) I do actually think the current balance is better given the reference-semantics API it's going to have; a true atomic value model via property wrappers or whatever is going to take more thought and more compiler support than what Karoy has here.

What I don't think is a good idea is to make the atomic operations subordinate to UnsafeMutablePointer. That feels like it's making it too easy to mix atomic and non-atomic operations by accident (rather than intentionally), because you're able to pass around an UMP and get atomic operations whenever you want. Having a separate top-level type makes it a little more obvious that you're not supposed to keep converting from UMP to UnsafeAtomicWhatever. I guess it doesn't do anything to stop someone from going the other way via the address property, but that feels like it'd be less accidental; still, if we were worried about it we could make that an initializer on UMP too (init(addressOf: UnsafeAtomicWhatever)).


OTOH, I don't see that it prevents us from adding UnsafeAtomicInt8 or UnsafeAtomicDoubleWord or other forms in the future.

To support that in the general case, I think we'd need the atomic-representable types to provide size and alignment and/or a suitable type with the correct properties to be able to allocate space of the right shape.


Quick note: although it's not part of the pitch document, the implementation already supports all integer sizes, not just Int/UInt. While most bit widths can be considered inessential, atomic Int64/UInt64 values cannot be emulated by Int/UInt on 32-bit systems. We will also need to add double-wide atomic primitives at some point, most likely sooner rather than later.

Support for all bit widths is currently implemented by having a dedicated UnsafeAtomic type for every built-in integer type. I don’t think this is the right approach, which is why I omitted these from the first text. I expect to introduce a protocol for primitive “atomicable” types and some sort of generic UnsafePrimitiveAtomic to replace the standalone atomic integer types.

This approach will have trouble representing atomic pointer/reference types due to our current lack of support for parameterized extensions. So I expect those will either continue to remain standalone types, or (less likely) they will hook into a facility for custom atomic-convertible types.


Thanks for working towards an official approach to atomics.

A few comments on the pitch, the implementation, and other commenters' comments:

  • Personally, I have had uses for non-optional atomic pointers. Queues with a dummy node come to mind. It's not nothing, I guess. I would prefer to be able to select the most appropriate pointer type for each application.

  • I am doubtful that there is another safe use of UnsafeAtomicUnmanaged.load beyond the one used in the pitched lazy reference. As I commented on github, load and store on that UnsafeAtomicUnmanaged perhaps shouldn't exist at all. The various exchange methods are sound, because one can ensure the refcount doesn't get decremented from 1 to 0 in a racing thread.

  • I agree with the atomics-only-storage requirement recommended by Dmitri, or at least atomics-first storage. It seems to me that adding special-case non-atomic operations on otherwise exclusively-atomic memory locations is a better approach than adding atomic operations to non-atomic memory locations. When it's too easy to interleave atomic and non-atomic code, it gets messy.

  • If there is a way to group atomics in a way similar to the SIMD types, it would be much nicer than a panoply of new type names. Given that the prefix would be UnsafeAtomic, the good name wouldn't be taken away by doing that.

  • Int64/UInt64 should be in the initial version, because Int is often insufficient on a 32-bit platform.

  • I don't understand the purpose of having both load-then-atomic-rmw and atomic-rmw-then-load operations, especially when the implementation of the latter is based on the former:

public func ${lowerFirst(name)}ThenLoad(
  ${label} operand: Value${defaultValue},
  ordering: AtomicUpdateOrdering
) -> Value {
  loadThen${name}(${argLabel(label)}operand, ordering: ordering) ${op} operand
}

Having both seems like clutter to me, since in the majority of cases one can readily express the algorithms using either approach. When it's not the case, doing the operation quoted above in user code is easy enough, and I'm sure the optimizer will do a good job of it.

  • Finally, if all this functionality must only be used as inline code, why does it need to be gated by ABI version?
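The equivalence the quoted template relies on can be checked with plain integers (ordinary variables standing in for the atomic value):

```swift
// "op-then-load" is recoverable from "load-then-op" plus the operand:
// for a wrapping add, newValue == oldValue &+ operand.
var value = 10
let operand = 3
let loadThenAdd = value                   // old value: what load-then-op returns
value &+= operand                         // the read-modify-write itself
let addThenLoad = loadThenAdd &+ operand  // == new value
print(loadThenAdd, addThenLoad, value)    // 10 13 13
```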

Are there any concerns that the type checker and optimizer's interpretation of what is constant-evaluable would diverge over time?

That is a very valid concern. But let me first clarify that there is no general, language-level concept of constant evaluability at this point. The "constant evaluation" machinery currently exists in the compiler as a SIL-level utility that is completely internal to the compiler, which it can use to improve diagnostics, optimization, etc. The constant evaluator can change, expand the fragment it can analyze, and so on.

But even if there is no language-level concept, it does help to not have very divergent interpretations of "constant evaluability" in the compiler, as you are concerned.

The idea here is that the Sema check would be so restricted (and over-fitted to these APIs) that they would fit within any reasonable notion of constant evaluability. Moreover, in my opinion, to support any future language-level notion of constant evaluability, we do require some Sema-level checks to detect and diagnose non-constant evaluable fragments reliably. The nature of the problem is such that the SIL level constant evaluator alone would not be sufficient (or would be inefficient) to provide a good user-model for writing constant evaluable expressions.

Given that the enforcement isn't proposed as a user-accessible feature, I wonder if it makes sense to just enforce that the switch gets optimized away instead of trying to analyze the passed-in argument. Maybe some kind of private attribute like @_staticAssertNoDispatch which could be applied to a switch to assert that the corresponding branch/lookup is removed during optimization?

The biggest problem there is that if the switch is not optimized, it would be difficult to diagnose what the actual error in the user code was. The Sema check lets us diagnose which argument was not constant, and emit errors at the source location of the call.

4 Likes