SE-0410: Atomics

Hi everyone. The review of SE-0410: Atomics begins now and runs through November 6th, 2023.

Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to me as the review manager by email or DM. When contacting the review manager directly, please put "SE-0410" in the subject line.

Trying it out

An implementation of this proposal is provided in these custom built toolchains:

What goes into a review?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift. When writing your review, here are some questions you might want to answer:

  • What is your evaluation of the proposal?
  • Is the problem being addressed significant enough to warrant a change to Swift?
  • Does this proposal fit well with the feel and direction of Swift?
  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

More information about the Swift evolution process is available at:

https://github.com/apple/swift-evolution/blob/main/process.md

Thank you,

Joe Groff
Review Manager

37 Likes

I’m very happy to see this proposal! +1

I’ve read through it, and it will be a nice addition to the language; the RawRepresentable support in particular was :star_struck:

One edge case I didn’t see covered is how this will play with SIMD types that fit in 64 bits, e.g. SIMD8<UInt8>?

1 Like

SIMD is a hard one to do because a type can only conform to a protocol once and we'd need different atomic representations depending on the scalar type.

extension SIMD2: AtomicValue {
  // When Scalar == Int8 || Scalar == UInt8
  public typealias AtomicRepresentation = Int16.AtomicRepresentation

  // When Scalar == Int16 || Scalar == UInt16
  public typealias AtomicRepresentation = Int32.AtomicRepresentation

  ...
}

It could in theory do something like:

extension SIMD2: AtomicValue {
  public typealias AtomicRepresentation = TwoOf<Scalar.AtomicRepresentation>
}

(TwoOf is not real, just an example. We'd need FourOf, EightOf, etc. for all the various sizes.)

which would double the storage of whatever the scalar's representation is, but it would be difficult to express the alignment guarantees of the type, considering we don't know the size/alignment of the containing type.
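
To make the alignment point concrete, here's roughly what such a container would look like (hypothetical, as noted):

// Hypothetical "TwoOf" container; nothing like this is in the proposal.
// The alignment problem: for Scalar == UInt32 we'd want the pair to be 8-byte
// aligned so it can be touched with one 64-bit atomic operation, but a plain
// generic struct only gets the alignment of its stored properties (4 bytes
// here), and Swift has no way to say "align this to twice the size of a
// generic parameter".
struct TwoOf<Representation> {
  var first: Representation
  var second: Representation
}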

(Although, now that I think about it, we could go pretty extreme here and do something like:

protocol AtomicValue {
  associatedtype AtomicRepresentation
  associatedtype TwoAtomicRepresentation = Never
  ...
}

extension UInt8: AtomicValue {
  typealias AtomicRepresentation = ...
  typealias TwoAtomicRepresentation = ...
}

extension SIMD2: AtomicValue where Scalar: AtomicValue {
  typealias AtomicRepresentation = Scalar.TwoAtomicRepresentation

  ...
}

but I feel this would add a fair amount of complexity to the AtomicValue protocol.)

6 Likes

This is a fantastic addition and something I know we've got use cases for in swift-testing. Thanks for taking on this work!


I think the names of the various member functions on Atomic could do with a bit of finessing, as they don't really follow the typical API naming patterns. For example:

  public borrowing func compareExchange(
    expected: consuming Value,
    desired: consuming Value,
    ordering: AtomicUpdateOrdering
  ) -> (exchanged: Bool, original: Value)

This might be better-named:

  public borrowing func compareAndExchange(
    expecting: consuming Value,
    desiring: consuming Value,
    orderedAs: AtomicUpdateOrdering
  ) -> (exchanged: Bool, original: Value)

So it reads:

... = x.compareAndExchange(
  expecting: 10,
  desiring: 20,
  orderedAs: .relaxed
)

Or some such. (I suspect others on the forums can come up with some good suggestions here!)


I might have missed it, but I don't think I saw an explanation as to why AtomicOptionalWrappable must be a separate protocol from AtomicValue? It seems like it could also be implemented as a specialization of AtomicValue?

7 Likes

I think in this case I would define a struct with a single SIMD8<UInt8> member and then conform that struct to AtomicValue with an AtomicRepresentation = UInt64.
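
One way to spell that, leaning on the proposal's RawRepresentable support rather than a hand-written conformance (the wrapper name and the bit-cast detail are mine, so treat this as a sketch):

// Hypothetical wrapper so an 8-byte vector can live in an Atomic by reusing
// UInt64's atomic representation.
struct AtomicByteVector: RawRepresentable, AtomicValue {
  var rawValue: UInt64

  init(rawValue: UInt64) { self.rawValue = rawValue }

  init(_ vector: SIMD8<UInt8>) {
    // Both types are 8 bytes wide, so the bit cast is well-defined.
    self.rawValue = unsafeBitCast(vector, to: UInt64.self)
  }

  var vector: SIMD8<UInt8> {
    unsafeBitCast(rawValue, to: SIMD8<UInt8>.self)
  }
}

// let flags = Atomic(AtomicByteVector(SIMD8<UInt8>(repeating: 0)))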

8 Likes

Hi all, I built a toolchain with these changes that you can try here:

macOS: https://ci.swift.org/job/swift-PR-toolchain-macos/909/artifact/branch-main/swift-PR-68857-909-osx.tar.gz

Ubuntu 20.04 (x86_64): https://download.swift.org/tmp/pull-request/68857/620/ubuntu2004/PR-ubuntu2004.tar.gz

4 Likes

:+1:

Good to see this coming about.

Non-functional fences

I think I mentioned this in an earlier thread, but I'm still not thrilled with the ability to write atomicMemoryFence(ordering: .relaxed) which is effectively a no-op. I don't think it's a big deal, but it is a slightly sharp edge that could confuse newbies (understanding the C++ memory ordering model is hard enough as it is).

Documentation

It'd be great to see a more human-readable version of the explanation of memory ordering as part of adding this to the standard library, in the documentation (perhaps the Swift Reference Guide?). Not only is it preferable if folks learning about atomics can do so without referring to another language's documentation, but it'd also be great if Swift's version of that documentation could be the best out there.

Atomic weak references

The inclusion of AtomicLazyReference is excellent. It's one of my most common use-cases for atomics, other than atomic integers.

It could perhaps benefit from a throwing store method, so that callers can more easily write preconditions (e.g. value.store(x)!) or otherwise fail (e.g. try value.store(x)).

However, what about the variant case where you want to be able to write the value multiple times? e.g. for a 'most recent value' cache (of sorts). I don't need this often but it has come up occasionally (and might be more frequently used with a nice high level abstraction, rather than having to drop into C).

Before I got to that part of the proposal, I actually spent a bunch of time trying to figure out how to derive this from the earlier primitives (e.g. Atomic<Unmanaged<T>>), but ran into challenges exactly as the proposal mentions regarding reference counting. It makes me wonder whether the inclusion of Unmanaged (perhaps among others?) in the atomic world should be made private, given how difficult it is to use correctly.
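
For the curious, the shape of the problem is roughly this (a sketch only; Node is illustrative, and I'm assuming the Unmanaged conformance from the proposal):

final class Node { var value = 0 }

let slot = Atomic<Unmanaged<Node>?>(Unmanaged.passRetained(Node()))

// Reader:
if let current = slot.load(ordering: .acquiring) {
  // A writer can swap the slot and release the old object between the load
  // above and the retain ARC performs on the result of takeUnretainedValue(),
  // leaving `current` dangling. Closing that gap is exactly what a real
  // atomic strong reference has to solve.
  let object = current.takeUnretainedValue()
  print(object.value)
}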

Unsupported architectures

Will the compiler specifically diagnose unsupported-only-on-the-target-architecture atomic types? e.g. a simple "UInt64 does not support atomic operations on i386" rather than whatever vaguer message it would if left to the default e.g. some "does not conform to protocol" diagnostic?

Actual instructions for memory fences

If I understand correctly, then with atomicMemoryFence the various arguments map as:

AtomicUpdateOrdering      Instruction   Unintended side-effects
relaxed                   None          No actual function.
acquiring                 dmb ishld     Prevents earlier reads crossing the barrier.
releasing                 dmb ish       Prevents later reads & writes crossing the barrier.
acquiringAndReleasing     dmb ish
sequentiallyConsistent    dmb ish

Is that correct? I'm basing it on Table B2-1 in the ARMv8 ARM.

I'm assuming use of the Inner Shareable domain because that seems to me like the most commonly applicable one for user-space code, because per the ARMv8 ARM:

The Inner Shareable domain is expected to be the set of PEs controlled by a single hypervisor or operating system.

The unintended side-effects appear to be because ARMv8 doesn't provide more nuanced modes for dmb…?

How do I get a dmb ishst instruction?

Beyond basic memory barriers

I suspect the answer is that it's beyond the chosen scope (which is fine), but has consideration been given to other types of barriers? e.g. speculation barriers (sb / csdb / ssdb / pssbb), sync barriers (dsb), instruction barriers (isb), etc. I'm not all that familiar with most of them, but as I understand it they are intended for userspace code (instruction barriers for runtime codegen, for example, and speculation barriers for cryptographic or otherwise security-sensitive code).

Acquire-release pairing (etc)

A lot of the time the nature of accesses is intrinsic in the variable, e.g. it's always relaxed, or you always read acquiring and store releasing, or it's always sequentially consistent. Should that be codified in the types themselves, e.g. Atomic<T> (meaning .relaxed), OrderedAtomic<T>, TotallyOrderedAtomic<T>, or somesuch?
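
A minimal sketch of what I mean, using the names above (only Atomic and the orderings come from the proposal; the wrapper itself is hypothetical):

struct TotallyOrderedAtomic<Value: AtomicValue>: ~Copyable {
  private let storage: Atomic<Value>

  init(_ initialValue: consuming Value) {
    storage = Atomic(initialValue)
  }

  borrowing func load() -> Value {
    storage.load(ordering: .sequentiallyConsistent)
  }

  borrowing func store(_ desired: consuming Value) {
    storage.store(desired, ordering: .sequentiallyConsistent)
  }
}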

It'd be nicer to use, then, because you:

  1. Wouldn't have to specify the ordering on every access (and therefore could use standard operators etc.), and
  2. Couldn't accidentally screw it up by using the wrong ordering in an individual access (e.g. Xcode auto-completed .relaxed when you meant .releasing, and you didn't notice).

Maybe this is what is alluded to when the proposal says higher-level data types are expected to be built atop these primitives?

At the end of the proposal, it does discuss this, although some of the arguments aren't persuasive to me.

  • This design puts the ordering specification far away from the actual operations -- obfuscating their meaning.

Well, only insofar as it does for putting the variable's type, or whether a variable is 'weak' (vs strong), etc. I don't think the ordering method needs to be mentioned at every single use any more than the variable's type needs to be. Like the type of the variable, or its reference semantics, it's (usually) more tied to the nature of the variable than specific uses of the variable.

  • It makes it a lot more difficult to use custom orderings for specific operations…

Only if it's the only way to do it. I think it's acceptable to have an optional ordering argument in this scenario, so you can override the default behaviour on a case-specific basis. It does introduce a risk of confusion, between the type declaration and its usage, but IMO it's worth it. But if that is a real bother to folks, then there could always be two layers - one of primitive "Atomic" and one above that. It does make the API surface nominally much bigger but conceptually it's only slightly more complicated.

Alternatively:

let foo = Atomic<UInt64>(0, defaultOrdering: .acquiringAndReleasing)

The trick with the above is you'd want the corresponding ordering parameters to be omitted (or become optional) only if a default ordering was specified at the declaration site. To my knowledge Swift (the language) doesn't have a way to express that, currently.

Ordering views

FWIW I like that proposed alternative, counter.relaxed.wrappingAdd(1) etc.

I think the resulting code reads well, and it might also make it possible to address the prior point by allowing users to stash the view in a variable instead of the base atomic (assuming the view retains its base object) - which'd also allow a user to choose to enforce a specific ordering by only storing that ordering's view, and throwing out all accessible references to the base atomic value.
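
Something like this, assuming the ordering-view spelling from the proposal's alternatives section (and assuming such a view could be stored at all, which is the open question):

let counter = Atomic<Int>(0)
let relaxedCounter = counter.relaxed   // keep only the view around...
relaxedCounter.wrappingAdd(1)          // ...so every access through it is relaxed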

Re. some of the other counter-arguments listed in the proposal:

  • Composability. Such ordering views are unwieldy for the variant of compareExchange that takes separate success/failure orderings. Ordering views don't nest very well at all:

counter.acquiringAndReleasing.butAcquiringOnFailure.compareExchange(...)

I disagree, even with that specific example. It reads easily and clearly.

  • API surface area and complexity.

Granted, it's easy for me to say as not the implementor of this functionality, but the API explosion seems worth the benefits. We're talking about the Swift standard library here - it's really widely used, so the trade-off between implementation challenge and user experience should be weighted heavily towards user experience.

The documentation explosion is a minor annoyance - it's true it'd nominally multiply the volume of documentation, but it'd have little impact on the actual conceptual complexity of the API - folks will pretty quickly realise the API for the views is the same; they're not likely to mistakenly read the redundant copies.

  • Unintuitive syntax. While the syntax is indeed superficially attractive, it feels backward to put the memory ordering before the actual operation. While memory orderings are important, I suspect most people would consider them secondary to the operations themselves.

I don't think it's unintuitive. Unconventional, perhaps.

And if you consider my prior point about the ability to save the view rather than the base object, this approach actually gives you a way to abstract the ordering away from individual use sites, if you want.

  • Limited Reuse. Implementing ordering views takes a rather large amount of (error-prone) boilerplate-heavy code that is not directly reusable. Every new atomic type would need to implement a new set of ordering views, tailor-fit to its own use-case.

Every additional type or set of equivalent types? I would have thought you could have a single view [for each ordering] for e.g. all atomic integers, genericised over the specific integer type… no?

English sucks

compareExchange doesn't make sense (for the intended meaning) in English, strictly speaking. It parses as "compare Exchange", whatever the noun "Exchange" means. Any particular reason to omit the And, or to not phrase it conditionallyExchange or somesuch?

weakCompareExchange

It's not clear [to me] from the method's documentation what this really means; why I might want this over the [regular] compareExchange. The docs say only:

(In this weak form, transient conditions may cause the original == expected check to sometimes return false when the two values are in fact the same.)

What's missing is an explanation of why those false negatives might occur, which might be important for the caller to understand, at least for performance (e.g. if there are deterministic factors, could this cause an infinite loop?).

The proposal itself does provide more insight:

The weakCompareExchange form may sometimes return false even when the original and expected values are equal. (Such failures may happen when some transient condition prevents the underlying operation from succeeding -- such as an incoming interrupt during a load-link/store-conditional instruction sequence.) This variant is designed to be called in a loop that only exits when the exchange is successful; in such loops, using weakCompareExchange may lead to a performance improvement by eliminating a nested loop in the regular, "strong", compareExchange variants.

That should be in the actual "headerdoc".
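
For reference, the retry loop the weak form is designed for looks something like this (a sketch: transform stands in for an arbitrary read-modify-write, and I'm assuming weakCompareExchange takes the same arguments as the strong variant shown earlier):

func transform(_ value: Int) -> Int { value &+ 1 }

let counter = Atomic<Int>(0)

var current = counter.load(ordering: .relaxed)
var exchanged = false
repeat {
  (exchanged, current) = counter.weakCompareExchange(
    expected: current,
    desired: transform(current),
    ordering: .acquiringAndReleasing
  )
} while !exchanged
// A spurious failure just costs one more trip around the loop, because
// `current` is refreshed from the returned original value each time.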

Missing type information in API listings

e.g. logicalAnd(with:ordering:) is ambiguous because it's not apparent from context what type the first parameter is…?

var, inout, consuming

The proposal notes that use of these with atomic types is basically wrong and will cause problems, but it doesn't seem to be proposing that the compiler prevent it…?

For full implementation details…

For the full API definition, please refer to the [implementation][implementation].

i.e. typo; missing link.

Atomic Strong References

Note that you can probably implement these in a single pointer-sized value, if there's at least a bit still free in the pointer representation. That bit can be made to mean "I'm retaining or releasing this", and with compare-and-exchange you can use that essentially as a spinlock on the pointer.

If you have two bits available then you can make a more efficient version, I believe.

But of course you need to wrap that pointer in a non-copyable type (same as for the double-word variant the proposal alludes to).

It'd be less efficient than a "non-atomic" reference, of course, due to the extra locking (essentially), but presumably it'd be worth it in some cases. And honestly, it's probably an insignificant inefficiency to the vast majority of users.

Default ordering

I concur that having a default - which would basically have to be .sequentiallyConsistent to be safe - is a bad idea. It's not just that it defeats the point somewhat (re. performance) but that making callers explicitly choose an ordering helps ensure they understand its ramifications (or at the very least, that the 'ordering' concept exists).

13 Likes

I'm not very familiar with atomics, but I read the pitch thread and the proposal and this seems good to me. +1.

As part of this proposal, I'd expect to see a few standard protocol conformances for the new WordPair type: Equatable, Hashable, Comparable, and Sendable at least.

2 Likes

Beware, this proposal mentions no such thing as an "atomic weak reference". We do not have an implementation for such a construct, and we don't currently have plans to create one, either.

AtomicLazyReference has strong reference semantics; it implements one useful way to generalize the lazy var feature for concurrency. (But it's just one way -- there are others. The primary benefit of this particular one is that it doesn't rely on blocking; its primary drawback is that it may induce duplicate work, so use cases must allow their result to be discarded. Another approach would be to use a dispatch_once-style, blocking solution; IIRC, behind the scenes, global let variables already work that way in Swift.)

I don't see how it would be desirable for callers to ever consider losing a race a precondition violation, or to do different things based on whether or not they won -- can you please expand? If a caller loses the race, they must simply discard their object and use the returned instance instead; the API reflects and encourages this.

The API does make it possible to do otherwise (as callers can technically hold on to their original reference and use === to compare it against the returned one), but doing that is outside of the intended target use case -- it's true that it looks quite clumsy to do that, but the clumsiness is intentional.

This is covered in the Future Directions section. To keep this proposal at a (relatively) manageable length, it postpones introducing a reassignable atomic strong reference construct to a future date. This is not a value judgement; it's just a practical matter.

Safe atomic strong references aren't rarely needed, niche constructs -- they are crucial, unavoidable building blocks for concurrent data structures. They bring Swift to relative feature parity with garbage collected languages, by providing a baseline memory reclamation solution that avoids the usual problems. (It still does not make such things easy to write, of course, but it provides an important shortcut.)

We have had a stable implementation for this in the swift-atomics package for several years now. I fully expect a proposal for adding them to the stdlib to follow shortly after this one (although the usual caveats about scheduling difficulties apply).

This is an excellent idea -- trying to roll your own atomic strong reference implementation is a great way to get into the right mindset about the difficulty level of such code.

(In my opinion, the right mindset is to avoid deploying atomics code into production at all costs, unless there is absolutely no other way to solve a problem.)

I don't think that would be right! Single-word, untagged atomic pointers are difficult to use correctly in general, but so are atomic integers. It is possible to write useful algorithms over raw atomic pointers. E.g., an atomic pointer that simply cycles through a preset number of storage buffers is useful, and isn't subject to any of the issues that plague the general case.

This is unfortunately not the case. We cannot let readers/writers ever get blocked on a particular thread's forward progress -- that would effectively be a (bad) reimplementation of a lock, and it would lead to the same problems as naive userspace locks invariably run into (such as problems with priority inversion).

One basic goal for atomic strong references (and atomic operations in general) is that the suspension of any thread must not block any others from continuing to run operations on the same entity. Forward progress must always remain possible -- even if an existing thread is suspended midway through a read or update operation, it must be possible for other threads to continue reading/updating the variable.

To make atomic strong references work reliably as a standalone construct (i.e., not in coordination with some tailor-made memory reclamation solution), they need to include a version number (to avoid the ABA problem), and they need to include an "inlined" refcount that can be updated in the same atomic transaction that loads the reference. (Well, I say "need", but there may be alternative approaches.)

3 Likes

(In my opinion, the right mindset is to avoid deploying atomics code into production at all costs, unless there is absolutely no other way to solve a problem.)

Related: I still think that it would be a mistake to put struct Atomic and its paraphernalia in the default namespace of every Swift program.

Ah yes, I did conflate the two in my reading. I was indeed thinking about the weak case.

Can you elaborate on the [lack of] plans to add such functionality? e.g. is it a matter of difficulty, or lack of perceived utility?

Right, and I think that might be partly why I mistook it as meaning weak. I imagine that the built-in let or lazy var functionality will be what most people need most of the time, for strong references, but to my knowledge you can't have a weak lazy var.

The write path might be inherently serial (or at least should be per a given design, thus the interest in a precondition-like check to ensure that), even if the read paths aren't (and therefore you can't just use a vanilla variable).

I think it's fine that the design allows you to gracefully lose a race, for cases where that might be inherent to the usage and benign.

They might have clean-up or follow-up actions to take (which would fit naturally in the exception handling mechanism), beyond just letting the object release. e.g. sending a response back over HTTP, such as 409 "Conflict" (meaning 'you lost the race and your version was not installed').

As you note the API as currently proposed does permit this kind of check, but it makes it a bit awkward. I don't think it benefits from being awkward.

Fair enough.

I know this is a common opinion, and I understand where it's coming from. I think there's room for disagreement, though. Particularly with numeric values, oftentimes it is very straightforward to use an atomic rather than a more heavy-handed method of mutual exclusion. Yes, you can end up with performance-degrading cache ping-ponging, but not necessarily (and addressing that is a whole other level of design complexity, so it's not something to be rushed into).

Right, but Unmanaged isn't really a raw pointer. It has methods which actually do things with the underlying pointer in ways that are often overlooked by users - e.g. takeRetainedValue calls release on the underlying pointer, nominally after reading the 'current' value, but it neither uses appropriate barriers to prevent reordering nor anything to prevent races against other threads.

I haven't mapped out a specific sequence that would cause an issue, but I assume there is one. If there weren't, then wouldn't Unmanaged already be essentially thread-safe?

Can you clarify what you mean by "blocked"? Delayed, sure - that's the "spin" aspect of 'spinlock' - but in this case I don't see how anything could be blocked permanently… unless, maybe a re-entrancy problem? I know it's possible to 'resurrect' objects in their dealloc methods, for example, but isn't that sort of thing basically forbidden anyway? And I'm not sure it's worth worrying about someone e.g. implementing a custom retain or release method and somehow causing re-entrancy that way. On the basis that it's surely exceedingly rare, at least.

I'm less familiar with the ABA problem in a practical sense. I can see academically why it would be a problem for some use-cases, although seemingly not all? A lot of the time all I care about with an atomic [strongly-referenced] pointer is that I get a valid object when I interact with it; whether it's the same object, and what its address is, between interactions is irrelevant for many purposes… isn't it?

The compareExchange method performs the atomic compare/exchange operation. Its name follows the term of art very strongly established by C++11, which has also been adopted by popular languages born since then.

We intentionally chose to align the Swift names of the core atomic operations with C++ -- there are very strong benefits in not arbitrarily diverging from Swift's parent and sister languages. (Or, for that matter, from SE-0282v1 and the swift-atomics package, which are the precursors of this proposal.)

At the time those precursors were drafted, feedback on the name compareAndExchange (or compareThenExchange) was quite negative within the primary target audience for atomics -- low-level systems programmers. A less inventive name would be "compare-and-swap", which is a widely used term for this in academic literature, but the same issue applies: the actual people who earn a living using these seem to prefer the C++ name. "Compare-exchange" isn't a confusing or misleading name, so I don't think it's worth annoying experts by trying to invent our own.

(Interestingly, some of the same experts did also recommend breaking C++/Rust precedent by switching compareExchange to use a tuple return type.)

4 Likes

I'm simply talking about lock-freedom, which we really want these operations to universally guarantee. (As opposed to wait-freedom, which doesn't seem practical to guarantee.)

ABA is one particular aspect of the wider problem of memory reclamation, usually discussed in texts about lock-free data structures. Wikipedia has a plausible-looking explanation.

(Edit: the wiki's example is plausible-looking, but its "simplifying" assumptions make it quite useless.)

You might be interested in many of the replies to this thread:

1 Like

I agree that it's not ideal that one can write that; however, I don't think introducing new API to prevent this, something like an AtomicFenceOrdering, would provide a big benefit here. Atomics are fundamentally a low-level implementation detail, so "newbies" shouldn't be reaching for these in the first place. Instead, we should push them towards using all of the amazing concurrency features and support in the language to help write better and more correct concurrent code.

I agree, documentation should be updated to better reflect what each ordering does, how some operations synchronize with each other, etc. This can all be incremental, however!

It would be nice if we could do something like the following:

@available(
  *,
  unavailable,
  message: "Target does not support double wide atomics"
)
extension WordPair: AtomicValue {...}

However, this is only a warning (until Swift 6). This feature was designed for Sendable, but it would be nice to make it a hard error for protocols that either 1. aren't marker protocols or 2. aren't Sendable. In the meantime, we must make this conformance not exist at all for those targets, so yes, you will see "WordPair does not conform to AtomicValue", for example.

For ARMv8, yes those memory orderings compile down to those instructions.

The unintended side effects are not entirely unintended. Keep in mind that the compiler is also required to prevent reads/writes from being reordered past certain fences. These are somewhat documented in the std::atomic_thread_fence documentation:

While an atomic store-release operation prevents all preceding reads and writes from moving past the store-release, an atomic_thread_fence with memory_order_release ordering prevents all preceding reads and writes from moving past all subsequent stores.

This is not only enforced by the CPU architecture itself, but also at the compiler level.
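
To make that concrete, here's the classic fence-based publication pattern expressed with the proposed API (a sketch; the payload/ready/publish/consume names are mine, and the plain shared variable would need extra care under strict concurrency checking):

var payload = 0                      // deliberately an ordinary, non-atomic variable
let ready = Atomic<Bool>(false)

func publish(_ value: Int) {
  payload = value                           // ordinary store
  atomicMemoryFence(ordering: .releasing)   // keeps the store above from sinking below the flag store
  ready.store(true, ordering: .relaxed)
}

func consume() -> Int? {
  guard ready.load(ordering: .relaxed) else { return nil }
  atomicMemoryFence(ordering: .acquiring)   // pairs with the releasing fence in publish(_:)
  return payload
}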

If you want dmb ishst then you gotta write the assembly yourself :sweat_smile: More seriously though, the semantics of the current memory orderings best compile down to what you've written, and anything beyond those is outside of C++'s memory ordering model (which we're now using for our atomics). x86_64, for instance, doesn't issue any instruction for fences other than sequentially consistent ones.

A lot of these barriers are way more granular than the level Swift code is operating at. Also, keep in mind that not all architectures have support for these various barrier kinds, so it's not immediately obvious that we need an instruction barrier just for Arm targets in Swift code. Perhaps for embedded targets it would make sense to, say, do an isb with inline assembly or something, but I feel it wouldn't be super useful outside of those environments.

The other barrier we were interested in was std::atomic_signal_fence, which is essentially just a compiler barrier. It doesn't emit any instructions whatsoever, but it does require the compiler to prevent reorderings from happening underneath your nose. Fully supporting it would require more work in the SILOptimizer, but it is something we're looking into.

That may be true for some applications of atomics, but others need different orderings depending on the operation. For example, AtomicLazyReference's load is an acquiring operation while its store is acquiring-and-releasing. Having a type-level atomic like TotallyOrderedAtomic would mean we could express one but not the other. The same goes for accepting a default ordering in the atomic initializer: the atomic type would have to store that value somewhere in its representation, and we do not want to introduce storage just to hold a default ordering.

At a higher level, atomics are very hard to get right. It is immensely useful to those writing code and reviewing it to see explicitly at every call site of an atomic operation what ordering it's using. Allowing for custom defaults would create immense confusion reading arbitrary atomic operations because the semantics between the orderings are very different. We do not want to design an atomic API whatsoever where the ordering of the operation is hidden away somewhere.

We think the views are really cool too. However, they introduce a bunch of ergonomic problems that we don't think are worth the risk. API documentation would be a mess to scrub through, and custom atomic operations would be significantly harder to write:

extension AtomicRelaxedView
  where Value.AtomicRepresentation == UInt8.AtomicRepresentation
{
  func myCustomOp() -> Value {...}
}

extension AtomicRelaxedView
  where Value.AtomicRepresentation == UInt16.AtomicRepresentation
{
  func myCustomOp() -> Value {...}
}

... For each storage kind and ordering

Of course, if you shared a single view type then it's only 5 extensions, one for each storage (UInt8, UInt16, UInt32, UInt64, and WordPair). It would also mean yet another thing we really need the optimizer to compile away: the intermediate state storing the kind of ordering in that view. We feel the constant-expression ordering API is something most experts using atomics would be very familiar with.

I can include what the proposal documents for the weak form in the docs :+1:

The context is that it's in a table called specialized boolean operations :see_no_evil: I can update the proposal to be more explicit here, though.

Correct, we are not proposing any way of disallowing this at the compiler level. We could in theory incrementally add that if we feel those using atomics are doing it wrong with these keywords.

Will update, thanks!

3 Likes

Bikeshedding: It seems like the obvious names for AtomicValue and AtomicOptionalWrappable would be AtomicRepresentable and AtomicOptionalRepresentable. The proposal even uses the phrase “custom ‘atomic-representable’ types” to describe the types that conform to AtomicValue.

Has there been any discussion as to why the current names were chosen?

Tangentially, is there a reason dmb ishst's function is so demoted? Isn't it the one you actually want a lot of the time, e.g. for the common pattern of initialising a structure and then publishing it by writing its address somewhere? On the writer side you don't care about loads in that case (beyond normal dependency orderings that the processor always enforces anyway), only that all your stores into the structure happen before the store of its address.

It seems like the natural complement to the LD variants (as used by readers). The Arm docs don't explicitly say it, but I get the impression these two modes (ST & LD) were specifically designed to pair together in this way.

I'd be surprised if other [modern] architectures don't have equivalent functionality? Even if (like x86-64) it's relatively coarse-grained compared to ARMv8.

Even if they don't, ARMv8 is the most widely used CPU architecture - doubly-so within the Apple & Swift worlds - so IMO it warrants "special treatment" in this sense.

It's not really any different from having support for more than just dmb sy - x86-64 doesn't have any of the finer granularities (IIRC), and doesn't even need a memory barrier at all in some cases. That doesn't mean Swift code written without the barriers is valid (nor that Arm systems should be penalised by excessively strict barriers, in the dmb case).

This is academic for me personally, at least for now. And I don't see any need at all to try to incorporate these other barrier types in this proposal. I was partly just curious about future plans, and partly hoping to ensure the proposal doesn't inadvertently preclude adding them later (I see nothing in it which does, but you'd know better than me).

Will the Clang importer import C/C++ atomics as these native atomic types?

Also, are there any thoughts about adding an equivalent to C++20's atomic_ref one day?

1 Like

Hi all, I pushed a PR to address some of the things brought up in this review here: [SE-0410] Atomics: Some clarifications (minor fixes), WordPair changes, and RawRepresentable OptionalWrappable by Azoy · Pull Request #2203 · apple/swift-evolution · GitHub

This adds the implementation link that is referenced later in the proposal, adds type information to the specialized integer and boolean functions, rewords the atomic strong references future direction to directly mention the AtomicReference construct in swift-atomics, and includes the WordPair conformances to standard protocols like Equatable, Hashable, Comparable, and more. In addition, I've renamed highWord/lowWord in WordPair to just first/second, because WordPair is deliberately not an integer type; rather, it's semantically akin to (UInt, UInt).

Thanks! I included these conformances and a few others in the updated proposal text.

3 Likes