SE-0282: Low-Level Atomic Operations

lorentey · April 16, 2020, 5:54pm

Moving the create/destroy use case to standalone classes would crystallize the UnsafeAtomic types to the pointer-like use case, removing my major objection against considering them as such.

gwendal.roue · April 16, 2020, 6:24pm

What has changed since the below message from the pitch thread?

lorentey · April 16, 2020, 6:27pm

Nothing! The classes are additions to, not replacements of, the UnsafeAtomic types. Authors of high-performance concurrency libs like Konrad will just ignore them, like they would ignore create/destroy.

gribozavr · April 16, 2020, 6:32pm

I'm sorry that it hurt you -- I didn't intend to. I understand the proposal is long, but it is a complicated topic with lots of things to consider; and you are doing a great job navigating this space, designing and implementing APIs and managing the feedback!

Mordil · April 16, 2020, 6:38pm

I'm +0.9 on the proposal, which I followed early in the pitch phase but then fell away from as it got deeper into the weeds of atomics. My experience with them is limited to what's provided by SwiftNIO, and they are used in only a handful of places in my code.

The remaining 10% is in regards to the points already brought up by Chris and others about:

1. the API design
2. the Atomics module

RE: #1 - I'm on board with the reasoning behind create/destroy methods, but several other method names and argument labels feel like they don't adhere to the Swift API Design guidelines.

RE: #2 - I'm mostly wanting to see something (either from the Core Team, or in the proposal) outlining what necessitates adding new explicit modules like this in the future, as everything so far is in a global namespace.

Joe_Groff · April 16, 2020, 6:48pm

In newer Swift compilers, accessing instance properties in a class held in a let ivar should also not usually incur any ARC traffic in optimized builds.

Chris_Lattner3 · April 16, 2020, 10:26pm

Hi all,

In this response, I want to address the primary point I brought up in my review - "should this be a struct, a class, a bag of static methods, or something else?" This is a response to these sorts of comments:

Specifically I want to underscore this fundamental point that Dmitri brought up, which I think is the crux of the issue:

In my opinion, the UnsafeAtomic type (as proposed) is not a suitable replacement (safe or unsafe) for std::atomic. The helpers to provide out-of-line allocation were a huge distraction for me, because I thought that was the goal of the type. On the other hand, Joe is right that the proposal as written can solve a broad range of use cases, but I still think it does so poorly and - to repeat a point Dmitri already made - I think this is confusion about the role of the type, trying to be a bridge to a future, all in one go.

For the sake of this post and to further the discussion, let's take the malloc/free helpers out of discussion - just pretend they aren't there.

With the helpers subset out of the proposal, UnsafeAtomic is a wrapper around an UnsafePointer, providing methods like:

  // The proposal.
  ua.loadThenWrappingIncrement(by: 1, ordering: .sequentiallyConsistent)

This is biasing towards a use-case where you want an at-rest storage of an UnsafePointer, at the cost of making computed pointers more awkward to work with. This makes the computed case more annoying, because you have to write it as:

  // The proposal.
  UnsafeAtomic(at: some expression).loadThenWrappingIncrement(by: 1, ordering: .sequentiallyConsistent)

This is what I mean when I say to Joe that "everything is possible with this proposal, it is just unnecessarily awkward." I find this to be very problematic, and I agree that the root of the issue is a confusion about which use case this is trying to serve.

Furthermore, I'd like to reunderscore the point that efficient use of atomics critically relies on knowing about cache locality, false sharing, and other low level issues. You don't just malloc an int and atomically access it - this is going to lead to all sorts of unpredictable performance and cliffs. To be clear, this complaint is true of both the proposal's approach and of the class-based approach I suggested exploring. I think that both would be a bad door to open in this proposal, and recommend we do neither of them.
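As a rough illustration of the false-sharing concern, the sketch below gives each counter its own cache-line-sized, cache-line-aligned block, using the standard library's aligned raw allocation. The 64-byte line size and the names cacheLineSize and makePaddedCounter are assumptions made for this example, and the stores are plain rather than atomic; it only shows how the memory could be laid out.

```swift
// Assumed cache line size; real hardware varies (commonly 64 or 128 bytes).
let cacheLineSize = 64

/// Hypothetical helper: allocates one Int in its own cache-line-sized,
/// cache-line-aligned block. The caller must call `deallocate()` on the result.
func makePaddedCounter(initialValue: Int) -> UnsafeMutablePointer<Int> {
    let raw = UnsafeMutableRawPointer.allocate(
        byteCount: cacheLineSize,
        alignment: cacheLineSize)
    return raw.initializeMemory(as: Int.self, repeating: initialValue, count: 1)
}

let a = makePaddedCounter(initialValue: 0)
let b = makePaddedCounter(initialValue: 0)

// Each counter starts on a cache line boundary and has the line to itself.
assert(UInt(bitPattern: a) % UInt(cacheLineSize) == 0)
assert(UInt(bitPattern: b) % UInt(cacheLineSize) == 0)

// Plain (non-atomic) stores, only to show that the storage is usable.
a.pointee = 1
b.pointee = 2

a.deallocate()
b.deallocate()
```

Padding like this trades memory for isolation, which is one reason a single default allocation strategy is hard to pick.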

In my opinion, the solution to this is to stop trying to provide a C++-like API, and provide a C-like procedural API instead. I believe that this can be better for the "today case" served by the proposal, as well as a better bridge towards the future.

Counterproposal

To be specific, I recommend that this proposal expose a resilient Swift struct named Atomic that has no initializer and no public instance methods (in this round of proposal), but does provide static methods that correspond to the C atomics builtins. The two examples above would be written as:

  // This counterproposal.  The <Int> can be inferred, I include them for clarity.
  Atomic<Int>.loadThenWrappingIncrement(address: uaptr, by: 1, ordering: .sequentiallyConsistent)
  Atomic<Int>.loadThenWrappingIncrement(address: some expression, by: 1, ordering: .sequentiallyConsistent)

This lower-level API solves the stated problem in the proposal of exposing the low level compiler builtins that only the standard library can have access to, without leading to the bad use-case (mallocing an int and atomically accessing it). Following precedent in other standard library APIs, we don't even need to mark these as unsafe, because they take UnsafePointers as arguments.

The nice thing about this is that - in a future proposal when the language is more fully baked with move semantics etc - we can fill in the Atomic type that provides the safer move-only value type that we want to eventually get to, maintaining a single and consistent API that gels together.

-Chris

Chris_Lattner3 · April 16, 2020, 10:28pm

The best response to this is Dmitri's point here:

While it appears to be a nice little helper, it really does deserve its own clear motivation and discussion; I don't see that happening while the larger design points are still up in the air.

-Chris

glessard · April 16, 2020, 11:09pm

I don't see how this counterproposal changes what you call the bad use case. If the API relies on UnsafePointer it will still lead to "mallocing an int and atomically accessing it", because that is simply the only way to get a pointer in Swift right now. That bad use case cannot be designed out with the language as it exists now; it can only be solved by users profiling their code and having the awareness of possible performance cliffs.

lorentey · April 17, 2020, 10:23pm

I believe blowing the Atomic name for this would be a mistake. I don't see an upgrade path from a resilient Swift struct to a @frozen move-only type.

This seems like an irresponsible move to me. The static pointer-based methods aren't exposed as public API because they're practically impossible to use correctly in the current language. (This is also true for UnsafeAtomic, but at least the initializer's @_nonEphemeral annotation provides a compiler warning for the most egregious forms of misuse, and UnsafeAtomic provides a clear path for plugging into a potential @RawStorage feature.)

I spent a day or so on a bit of soul searching. It seems like we're stuck with two equally bad options:

1. Managed atomics via classes. These would be memory safe, but come with refcounting and heap allocations for every atomic variable, so they aren't fit for purpose.
2. Unsafe atomics via UnsafeAtomic or naked pointer APIs. These ostensibly allow inline storage, but given that Swift does not currently provide a sensible way to create such inline storage, providing these right now would be extremely irresponsible.

The proposal is as complex as it is because it is trying very very hard to shoehorn the library-level parts of @RawStorage from Exposing the Memory Locations of Class Instance Variables into the atomics feature set. This is why there is so much pushback on the UnsafeAtomic construct -- it works toward a vision that the Core Team has not yet accepted as correct.

So I think the only right option is to table this proposal for now, and return to it when we've decided how we want to provide stable memory locations for atomics and similar lightweight synchronization constructs.

The work that has gone into this so far won't go to waste -- the atomic operations themselves and the protocol hierarchy look right to me, and I'm pretty sure we can just apply them to whatever we come up with. Removing the tedious memory management discussions will also free up precious space in the proposal to round out the atomics feature set with additional constructs, including hugely important abstractions for lock-free data structures (atomic strong references, versioned atomic values), and inessential additions like atomic floating point and boolean operations.

If we like, we can use the rest of this review period to discuss the parts of the proposal that are about actual atomics.

Chris_Lattner3 · April 18, 2020, 12:11am

I agree with you: if that is true, we should call it something else. I'm curious though: why wouldn't we be able to define this as a frozen move-only type in the future if it were left fully resilient and had no public instance members?

The inability to do that seems like a huge hole in the resilience model if true.

I don't think this is true. You're right that there is an issue with & on an ivar according to Swift's abstract machine model, but there are lots of other places to obtain pointers, including malloc() as the proposal observes. Moreover, both the proposal as written and the "things are static methods" approach have the same problem with the machine model.

I agree with you that those two options seem like the only reasonable ones, and I also really dislike the class-based model now that I've had some time to think about it.

I'm curious though, why not provide access to the static method APIs? This is effectively a C-like interface. Adding them would be incredibly valuable to Swift today, and is a bridge to the future. If you're worried about taking the Atomic name, then we can squirrel them away somewhere else - we could even just provide the AtomicValue protocol and the conformances to it, and have people access it directly. For a low level feature, the sugar is just a "nice to have".

Joe_Groff · April 18, 2020, 12:15am

One issue I can see is that any type you declare today is going to be assumed copyable, but maybe that's not a practical concern if there are never any values of the type accessible to programs. More practically, having a type already named Atomic compiled into binaries today might also impede our ability to back-deploy a different "real" implementation of the type with the same name using a static library shim or other tricks.

lorentey · April 18, 2020, 1:01am

One problem is that this feels like a bridge that we would probably want to burn once we have crossed it. Once we have working move-only atomics, why would we need to continue exposing the standalone pointer operations? This cuts UnsafeAtomic too -- it provides essentially the same model as the static pointer-based methods, just rearranged slightly in preparation for an interim solution with @RawStorage. I think UnsafeAtomic is the better approach, because it allows us to build a (wobbly) bridge into a scaled-back version of the move-only future sooner. However, we should have a standalone discussion on this plan!

The static methods are right there in the _PrimitiveAtomic protocol -- we could expose them directly.

let value = Int._AtomicStorage._atomicLoad(at: ptr, ordering: .relaxed)

However, this highlights the problem of separating the logical atomic values from their storage representation. Allowing these two things to diverge and (more or less) transparently converting between the two is one of the useful things UnsafeAtomic does behind the scenes.

(This is less important for integers, but it's rather essential for (optional) pointers, and it's especially tricky to get right for atomic strong references such as AtomicLazyReference (and the eventual double-wide atomic reference implementation).)
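To make the split between a logical value and its storage representation concrete, here is a minimal sketch for optional pointers. The encode and decode helpers are hypothetical names for this example and are not part of the proposal; they only use the standard library's bit-pattern conversions, and no atomic operation is performed.

```swift
// Logical value: an optional raw pointer.
// Storage representation: a plain UInt, which is what would actually sit in
// the atomic storage cell.

/// Hypothetical helper: converts the logical value to its stored form.
func encode(_ pointer: UnsafeMutableRawPointer?) -> UInt {
    return UInt(bitPattern: pointer)   // nil maps to 0
}

/// Hypothetical helper: converts the stored form back to the logical value.
func decode(_ bits: UInt) -> UnsafeMutableRawPointer? {
    return UnsafeMutableRawPointer(bitPattern: bits)   // 0 maps back to nil
}

let block = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 8)

// Round-tripping through the storage representation preserves the value.
assert(decode(encode(block)) == block)
assert(encode(nil) == 0)
assert(decode(0) == nil)

block.deallocate()
```

For strong references the conversion also has to account for retain counts, which is where hiding it behind a single type becomes most valuable.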

lorentey · April 18, 2020, 3:56am

We can choose to have the @RawStorage discussion right here on this thread, too, of course. (It feels like inline storage is a prerequisite of usable atomics, but it isn't really part of the proposal text, and it seems largely orthogonal to atomics in general. I worry that all this talk on memory management has derailed the review, and it is scaring away folks who would want to discuss the minutiae of atomic operations.)

The problem is that we want to carve out storage space within class instances to store atomic values. To do this, we need a way to reliably retrieve the address of their memory location.

In a nutshell, I see three potential approaches:

1. Fix the & conversion somehow to keep the syntax everyone is trying to use right now
2. Add keypath-based access to storage locations, such as the MemoryLayout.unsafeAddress(of:in:) method in the original addressable ivars pitch
3. Add a magical attribute-based solution like @RawStorage

Let's quickly go over these one by one. One way to try fixing & would be to introduce an attribute to reject cases where the inout-to-pointer conversion doesn't use a direct storage location, such as @stableStorageLocation below:

extension Int {
  struct AtomicStorage { ... }
  static func atomicLoad(
    at address: @stableStorageLocation UnsafeMutablePointer<AtomicStorage>,
    ordering: AtomicLoadOrdering
  ) -> Int
}

Int.atomicLoad(at: &someComputedProperty, ordering: .relaxed) 
// error: 'atomicLoad' needs directly addressable storage for 'address'

(This is in the same ballpark as the @_nonEphemeral attribute we have right now, but it produces errors, not warnings, and it allows the use of & when it happens to generate a pointer that is "safe" to escape.)

The problem, of course, is that the & syntax implies that the variable is being mutated, and write access conflicts with atomic access:

class Counter {
  var _value: Int.AtomicStorage
  
  func load() -> Int {
    // Blatant exclusivity violation: 
    // atomic access overlaps with a write access
    Int.atomicLoad(at: &_value, ordering: .relaxed) 
  }
}

I think this rules out &; saving it would require major surgery on the law of exclusivity. (We could try forcing it by saying that the inout-to-pointer conversion completes the write access before the function call begins, but that would be unlike how regular inout arguments work, and I think it would just lead to even more confusion.)

The MemoryLayout.unsafeAddress(of: \._value, in: self) idea in #2 above could get rid of the exclusivity violation, since we're free to define what sort of access (if any) it entails. However, the (very reasonable) feedback on the pitch thread was that this would be far too dangerous -- it would allow code to circumvent exclusivity checks on any ivar by simply switching to accessing it through unsafe pointer operations. So we should rather go with an opt-in approach, where ivars would be explicitly annotated with some attribute that exposes their storage location.

#3 is the obvious next step on that path: it hides the actual mechanics of extracting and passing around pointers behind an attribute that works a bit like property wrappers:

// This probably wouldn’t actually be a protocol; rather it would be a 
// compiler-enforced “shape” like @propertyWrapper.
protocol RawStorable {
  associatedtype Storage: RawStorage // comes with init(_:) and dispose()
  init(at: UnsafeMutablePointer<Storage>)
}

extension UnsafeAtomic: RawStorable {}

public class Counter {
  @raw var _value: UnsafeAtomic<Int>

  // _value is a computed property of type UnsafeAtomic<Int>
  // $_value is the underlying ivar of type UnsafeAtomic<Int>.Storage

  init(_ initialValue: Int) {
    $_value = .init(initialValue)
  }
  deinit {
    $_value.dispose()
  }
  func load() -> Int {
    _value.load(ordering: .relaxed)
  }
  // ...
}

A slightly (?) more elaborate version of this would hide $_value and let the compiler autogenerate the boilerplate-y initialization and disposal of the backing storage:

public class Counter {
  @raw var _value: UnsafeAtomic<Int>

  init(_ initialValue: Int) {
    _value = initialValue   // Note the weirdly mismatched types
  }
  // a call to dispose() is generated by the compiler at the end of deinit

  func load() -> Int {
    _value.load(ordering: .relaxed)
  }
  // ...
}

Either of these last two approaches gets us some of the practical benefits of move-only types without having to wait for their implementation. When Atomic<Int> becomes a thing, my hope is that code using UnsafeAtomic this way can simply upgrade to that with a simple, (more or less) mechanical migration step:

public moveonly struct Counter {
  var _value: Atomic<Int>

  init(_ initialValue: Int) {
    _value.init(initialValue)   // How exactly are we going to spell this?
  }

  func load() -> Int {
    _value.load(ordering: .relaxed)
  }
  // ...
}

I hope this explains why I’m against naked pointer-based methods like Int.atomicLoad(at:ordering:). We will need to tackle inline storage soon, and naked pointer APIs won’t fit into the most likely design for that. If we introduce them now, we will end up also introducing something like UnsafeAtomic later, and then we’d have two separate unsafe APIs for the exact same thing.

Of course, unsafe pointer-based atomics (either through UnsafeAtomic or direct pointer APIs) have some reason to exist in their own right, even after we introduce Atomic. They also interoperate with manually malloced dynamic variables, ManagedBuffer, withUnsafeMutablePointer(to:), pointers coming from C, and any of the other weird & wonderful ways people may get hold of pointer values. UnsafeAtomic has an additional long-term benefit: I expect that retrieving the address of ivars within move-only types will be similarly difficult, so the eventual move-only Atomic type will most likely still use @raw and UnsafeAtomic in its internal implementation.

lukasa · April 18, 2020, 10:03am

I share @lorentey’s concern here. A surprising amount of my time is spent policing the pointer management code of programmers who do not understand the way pointers work in Swift. Some of this is simple (UnsafePointer(&x)), some is trickier (pointer lifetime management).
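The simple case can be sketched as follows. The function name readThroughPointer is made up for this example. The commented-out line shows the problematic pattern: the pointer produced by an inout-to-pointer conversion is only valid for the duration of the call it is passed to, so storing it leaves a dangling pointer. The scoped form keeps every access inside the closure.

```swift
func readThroughPointer() -> Int {
    var x = 10

    // Problematic: the pointer is only guaranteed valid during the
    // initializer call itself, so `p` would dangle afterwards.
    // let p = UnsafePointer(&x)

    // Scoped alternative: the pointer is valid only inside the closure,
    // and it is not allowed to escape from it.
    return withUnsafeMutablePointer(to: &x) { p -> Int in
        p.pointee += 1
        return p.pointee
    }
}

assert(readThroughPointer() == 11)
```

Note that the scoped form fixes the lifetime problem only; it still does not give a location that is suitable for sharing between threads.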

This proposal would add another rule: any pointer vended by a Swift CoW struct is ineligible for being used as an atomic. This is for multiple reasons: it likely violates the rule of exclusivity, and even if it didn’t the memory location is not stable across the required multiple-ownership state needed here.

So far as I know there are only two safe places to get a pointer that can back an atomic from today: malloc and ManagedBuffer, as well as their spiritual cousins and indirections to them (i.e. memory allocated by C libraries, maybe). Have I missed some other source? If not, why not try to discourage using the many other ways to obtain a pointer that will lead to either subtle or not-subtle breakage?
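For reference, the two sources named above look roughly like this in today's standard library. This is a sketch with made-up function names (manualAllocationExample, managedBufferExample); the accesses are ordinary loads and stores, since the point is only where the memory comes from.

```swift
// Source 1: manual allocation, the Swift counterpart of malloc/free.
// The address stays fixed until `deallocate()` is called.
func manualAllocationExample() -> Int {
    let cell = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    cell.initialize(to: 0)
    defer {
        cell.deinitialize(count: 1)
        cell.deallocate()
    }
    cell.pointee += 1
    return cell.pointee
}

// Source 2: the tail-allocated elements of a ManagedBuffer. The storage
// lives inside the heap object, so it does not move while the object is alive.
func managedBufferExample() -> Int {
    let buffer = ManagedBuffer<Void, Int>.create(
        minimumCapacity: 1,
        makingHeaderWith: { _ in () })
    buffer.withUnsafeMutablePointerToElements {
        (elements: UnsafeMutablePointer<Int>) -> Void in
        elements.initialize(to: 0)
        elements.pointee += 1
    }
    // Int is a trivial type, so no explicit deinitialization is needed here.
    return buffer.withUnsafeMutablePointerToElements {
        (elements: UnsafeMutablePointer<Int>) -> Int in
        return elements.pointee
    }
}

assert(manualAllocationExample() == 1)
assert(managedBufferExample() == 1)
```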

Chris_Lattner3 · April 19, 2020, 4:42am

Hi Karoy,

I think the best way to handle this is to expose the static members on the protocol (and provide conformances of standard library types to it). This is all that is required in this step to achieve your goal laid out in the motivation of the proposal:

These new primitives are intended for people who wish to implement synchronization constructs or concurrent data structures in pure Swift code. Note that this is a hazardous area that is full of pitfalls. While a well-designed atomics facility can help simplify building such tools, the goal here is merely to make it possible to build them, not necessarily to make it easy to do so. We expect that the higher-level synchronization tools that can be built on top of these atomic primitives will provide a nicer abstraction layer.

This avoids all of the questions about how best to expose the user-facing functionality, while providing the core abstraction required for people to start experimenting with it. We can standardize one or more of the user-facing APIs once we have implementation and usage experience with them. This becomes possible when the core mechanics are available to general Swift programmers, which the protocol does.

WDYT?

-Chris

lorentey · April 19, 2020, 7:32pm

These interfaces do not make good public API, even tucked away as they would be in an obscure module. The time is definitely right to carefully expose the inherent complexities of atomics, but pointer-based atomics introduce an unconscionable amount of extra complications on their own. To use these correctly, one has to be an expert at both atomics and the (underdocumented) Swift execution model — and as this thread has clearly demonstrated, mistakes will slip in even then.

I’m happy to do a cleanup pass on the pointer-based atomic methods to make sure that they are usable for the handful of people who may be able to responsibly use them; but I strongly believe these need to remain underscored.

There is but one way to expose atomics that actually fits well in the language we have today, and that’s class ManagedAtomic<Value>. Going with that as the single public atomic construct will considerably simplify and focus the proposal, letting it concentrate on atomics.

Class-based atomics will work as an excellent stand-in for Atomic<Value>. The heap allocations will limit their usefulness, but as always, the responsible choice is to prefer correctness to performance. And, as Joe aptly observed, managed atomics generally won’t incur ARC traffic during actual use; the overhead is mostly limited to init/deinit.
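The ownership model of a class-based atomic can be sketched with a stand-in. LockedCounter below is a hypothetical type for illustration, not ManagedAtomic itself: it uses an NSLock instead of hardware atomic instructions, so it shows only the shape (one heap object per value, storage created and destroyed with the object), not the performance characteristics.

```swift
import Foundation

/// Hypothetical stand-in for the class-based shape. A lock replaces real
/// atomic instructions, which this sketch does not have access to.
final class LockedCounter {
    private let lock = NSLock()
    private var value: Int   // stored inside the class instance on the heap

    init(_ initialValue: Int) {
        value = initialValue
    }

    func load() -> Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }

    /// Returns the old value, then adds `delta` with wrapping arithmetic.
    @discardableResult
    func loadThenWrappingIncrement(by delta: Int) -> Int {
        lock.lock()
        defer { lock.unlock() }
        let old = value
        value = old &+ delta
        return old
    }
}

let counter = LockedCounter(0)
DispatchQueue.concurrentPerform(iterations: 1000) { _ in
    counter.loadThenWrappingIncrement(by: 1)
}
assert(counter.load() == 1000)
```

Because the instance is a class, its address is stable for its whole lifetime, which is exactly the property the pointer-based designs have trouble obtaining.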

Sounds good?

Chris_Lattner3 · April 20, 2020, 6:06pm

I think you're conflating two very important and very different things:

1. Atomics are fundamentally UnsafeMutablePointer based. There is nothing involving extra complications - this is the inherent complexity of how these operations work, e.g. at the LLVM IR and C levels of abstraction.
2. Swift has completely separate parts of the language that convert some things to UnsafePointers in places that may have dangerous or unexpected lifetime implications, e.g. the & on an ivar example discussed upthread.

You seem very focused on #2, but there is nothing you can do in an atomics API that "solves" #2 completely; there are just different ways of attempting to sweep the issue under the rug (and I don't think the original proposal was particularly successful at this). The right way to solve #2 is through new language mechanics that are completely orthogonal to atomics.

I think that #1 is completely solvable, extremely valuable, and very important. That is why I'm recommending that this proposal focus on it.

-Chris

lorentey · April 20, 2020, 7:18pm

Hey Chris,

APIs don't exist in a vacuum. It must be possible to productively use them to solve real-life problems.

Can you please show me a piece of code solving some toy problem that uses pointer-based atomics without separate heap allocations for every atomic value? I've been looking at this for (on and off) half a year now, but I have so far failed to make an example that isn't broken, ridiculously overcomplicated or both. (There is a reason why the proposal doesn't show how to implement inline storage through ManagedBuffer or MemoryLayout.offset(of:).) Something like the proposal's silly lock-free single-consumer stack should be enough to illustrate how these APIs will work in practice.

I believe ManagedAtomic is the sweet solution that lets us move ahead until the language matures. Again, for the two or three full-time Swift engineers who think they may be able to directly use pointer-based APIs, they will be available as public-but-underscored methods. Trying to document how they work is a fool's errand at this point.

There is plenty of room to expose pointer-based APIs in followup proposals, as soon as it becomes possible to responsibly do so.

Thanks,
Karoy

Joe_Groff · April 20, 2020, 8:18pm

My concern with adding class-based atomic types is that it puts us on a track to committing to three different APIs for the same thing—the "unsafe" low-level API, a stopgap class-based API, and a future safe move-only API. The unsafe API has reason to exist in the future, if nothing else as a mechanism for implementing the move-only types, though I think it will remain interesting even when move-only atomics exist, for more specialized cases that don't fit cleanly in the confines of the safe model. The use cases for a class-based API are at best questionable today, because of the performance concerns with any doubly-indirected design that Chris raised, and I think they would evaporate pretty much completely once move-only atomics exist. A ManagedAtomic class might be a great package to ship on top of the low-level atomics API, but I don't think it belongs in the long-term API of the standard library.