SE-0282 (Review #2): Interoperability with the C Atomic Operations Library

I'll start off by saying that I haven't read all the replies, sorry!

I think interop with C and C++ atomics makes sense. However, what this actually means is "participate in a memory model", that is: you have X threads making accesses to Y memory locations, and there's a model which details the possible outcomes for those accesses and threads. It doesn't really matter what languages the threads are written in! Just that they access the same memory locations.

I'm therefore saying that you want a memory model, compatible with C and C++'s.

In other words, you want to have a model which, given threads and accesses, outputs the possible valid values observed by the program. You can then list interesting litmus tests, and run them a few billion times on relevant hardware to make sure your compiler lowering and the hardware actually obey the memory model. Concretely, here's an example of such an interactive memory model checker for C++.
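As a concrete illustration, here's a sketch of the classic "message passing" litmus test written against C11 atomics (illustrative only, not from the proposal); the memory model has to say whether the outcome "flag=1, data=0" is allowed, and with this release/acquire pairing it must never be observed:

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_int data = 0;
static atomic_int flag = 0;

static void *writer(void *arg) {
    atomic_store_explicit(&data, 1, memory_order_relaxed);
    atomic_store_explicit(&flag, 1, memory_order_release); // publish
    return NULL;
}

static void *reader(void *arg) {
    int f = atomic_load_explicit(&flag, memory_order_acquire);
    int d = atomic_load_explicit(&data, memory_order_relaxed);
    printf("flag=%d data=%d\n", f, d); // "flag=1 data=0" is forbidden by the model
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, writer, NULL);
    pthread_create(&t2, NULL, reader, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

This is the sort of test you'd run a few billion times on relevant hardware.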

There's plenty of research in that area.

Other interesting reads related to C++ directly:

FWIW, the WebAssembly and JS memory models handle mixing seq_cst with non-atomic accesses (because the instructions are atomic, not the locations). Also, you might want to think about what unaligned or mismatched-size accesses do, if you allow them at all (C and C++ don't, because atomicity is a property of the type, not the code, so a location can't be mixed atomic/non-atomic, nor accessed at different sizes).
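To illustrate that last parenthetical (a small sketch, nothing more): in C the atomicity lives in the type, so any given location is either always accessed atomically or never is:

#include <stdatomic.h>

atomic_int shared_flag;   // _Atomic type: every access to this location is atomic
int plain_counter;        // plain type: concurrent unsynchronized access is a data race

void worker(void) {
    atomic_store_explicit(&shared_flag, 1, memory_order_seq_cst); // always atomic
    plain_counter++; // never atomic; needs external synchronization
}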

IMO you should force lock-free for everything. We don't even ship libatomic on our platforms so it's gotta be lock free, or fail linking. I wouldn't bother with interop with non-lock-free atomics in C and C++.
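For example (a sketch, assuming a C11 toolchain), you can refuse to build at all unless the relevant atomics are lock-free:

#include <stdatomic.h>
#include <assert.h>

/* 2 means "always lock-free" for these C11 macros. */
_Static_assert(ATOMIC_INT_LOCK_FREE == 2, "int atomics must be lock-free");
_Static_assert(ATOMIC_POINTER_LOCK_FREE == 2, "pointer atomics must be lock-free");

int main(void) {
    atomic_int counter = 0;
    assert(atomic_is_lock_free(&counter)); // per-object runtime check
    atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    return 0;
}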

One thing you ought to consider is only allowing a specific ordering for some atomic objects. Like “you can only do acquire/release on this”, or “this object is sequentially consistent only”, instead of allowing arbitrary orderings. It greatly simplifies things, doesn't really hurt interop, and rarely if ever loses capabilities.

If you do a full memory model then you probably still want to detail what happens when individual locations are modified inconsistently, but Swift itself should only allow one location to be one of:

  • relaxed
  • consume / release
  • acquire / release
  • seq_cst
  • non-atomic
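To make that restriction concrete, here's a minimal sketch (names hypothetical) of a wrapper that pins one location to a single discipline, acquire/release only:

#include <stdatomic.h>
#include <stdint.h>

typedef struct { _Atomic uint64_t raw; } acq_rel_u64;

static inline uint64_t acq_rel_u64_load(acq_rel_u64 *a) {
    return atomic_load_explicit(&a->raw, memory_order_acquire);
}

static inline void acq_rel_u64_store(acq_rel_u64 *a, uint64_t v) {
    atomic_store_explicit(&a->raw, v, memory_order_release);
}

Because only these two entry points exist, callers can't accidentally mix other orderings into accesses to the same location.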

You also want to consider volatile atomic (separate from volatile, and separate from atomic). C and C++ have volatile atomic, and if you want interop you probably want this as well. Or is this implicit in Swift, i.e. can a compiler optimize around atomic accesses by assuming that the program is data-race free? Basically, in C++ a data race is UB, so LLVM can (and does) optimize assuming that there isn't a data race. See https://wg21.link/n4455
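For reference, the C side looks like this (a small sketch): volatile and _Atomic compose, and a volatile atomic access can't be merged or elided even where the compiler could otherwise prove it unobservable:

#include <stdatomic.h>

volatile atomic_int device_status;   // every access must actually be performed
atomic_int shared_counter;           // plain atomic: the compiler may, in principle, combine accesses

void spin_until_ready(void) {
    while (atomic_load_explicit(&device_status, memory_order_acquire) == 0) {
        // each iteration performs a real load
    }
}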

You also need to think about notify: https://wg21.link/P1135

You'll want to think about out-of-thin-air values, and pray that someone fixes the issues there... Similarly, C and C++ have a pointer zap issue.

On consume: yes, https://wg21.link/p0750 is still the state of the art. Until someone implements a proof of concept in a compiler (i.e. tracks dependencies for real, preventing things like GVN from breaking them), the committee will ask that people avoid consume, and compilers will auto-upgrade consume to acquire. I don't think it's particularly hard; we just need to track that dependency through IR and teach various parts of LLVM to leave dependencies alone instead of ICE'ing.

One last thing: you probably want to ignore C++'s atomic_flag, or at least match what C++20 has for it. It's kind of a historical thing; we almost removed it, but then decided to fix it instead.
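For completeness, C11's atomic_flag is used like this (a minimal test-and-set spinlock sketch); note that before C++20 there was no way to read the flag without also setting it:

#include <stdatomic.h>

static atomic_flag spinlock = ATOMIC_FLAG_INIT;

void spinlock_acquire(void) {
    while (atomic_flag_test_and_set_explicit(&spinlock, memory_order_acquire)) {
        // spin until the previous holder clears the flag
    }
}

void spinlock_release(void) {
    atomic_flag_clear_explicit(&spinlock, memory_order_release);
}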


I disagree with that; there are several useful algorithms where you need to mix and match more complex orderings than these options allow.

Examples:

  • a lock that also contains flags (that interact with the lock behavior); the flags are modified relaxed, but the lock is taken/dropped with acquire/release
  • refcounts, which are relaxed on increment, release on decrement, and acquire on the last release (sketched below)

Those are just the two that came to mind.
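Here is what the refcount case looks like in C11 terms, as a sketch (helper names are made up, not lifted from any particular library):

#include <stdatomic.h>
#include <stdlib.h>

typedef struct {
    atomic_uint refs;
    /* payload ... */
} object_t;

void object_retain(object_t *o) {
    atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed); // increments can be relaxed
}

void object_release(object_t *o) {
    if (atomic_fetch_sub_explicit(&o->refs, 1, memory_order_release) == 1) {
        atomic_thread_fence(memory_order_acquire); // acquire on the last release only
        free(o); // now safe to tear down: all prior uses happen-before this point
    }
}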

I do agree with pretty much everything else you said, but while I can see why orienting implementors toward one of those options makes sense in a lot of cases, it is also too constricting. I would suggest instead:

  • relaxed
  • explicit mandatory barriers
  • seq_cst
  • non-atomic

as the 4 behaviors a field can have.


And for the love of everything that is holy, please make 'consume' work. In Swift, which is a more managed language than C, it is possible to have consume work for the typical usage that 99% of people need, which is dereferencing a pointer and seeing the object it points at as initialized.

Sure, it's cool if we can express things such as injecting dependencies with "fake zeroes", but that's extremely niche and advanced. The natural HW dependency pairing of "consume on pointer dereference" with "store-release" is IMO something that Swift should try to expose.

I also think that consume in C{,++} is doomed because it tries to be automatic. In 99% of the cases where I've used it, I'm fine with manual propagation of dependencies for the weird edge cases (like when I have that one global atomic field that I read, and I want to inject that load dependency into another load; I'm 100% fine having to spell that out), provided that you get that "pointer deref" behavior in the language. That's what C is failing at, because it assumes we'll build machines like DEC Alpha again, where the compiler needs to insert barriers to make up for HW that doesn't respect consume-on-deref. If Swift can ignore that, or maybe manually annotate pointers that assume this behavior, then we'd have effectively "fixed" consume.


If I understand you correctly, @Pierre_Habouzit, it sounds like you're advocating for something similar to what @Andrew_Trick suggested, which is to prohibit the compiler from messing with dependency chains so that "relaxed" ordering effectively gives the guarantees of "consume" (and hope that nobody resurrects the Alpha architecture). Is that a fair assessment?

I agree that these cases exist, and are useful. However, atomics are already an expert tool, and I think the use cases you outline should be relegated to implementation-provided facilities, or to "unsafe" builtins.

My rationale for suggesting what I am is that, if someone uses atomics, they ought to have really strong guarantees and a solid understanding of ordering. My experience is that people can understand seq_cst, and they can understand acq/rel, but mixed orderings aren't something most people can follow. It's way easier to provide a sensible memory model when there is only one way to use a particular memory location.

More powerful stuff shouldn't make the entire feature more difficult to use. It can live in a separate toolbox, with a safety label on it.

Yes. Except that I disagree with:

Programmers who need 'consume'-like ordering should use 'relaxed' and know the rules for protecting themselves against the compiler.

The compiler should provide guarantees for reasonable expectations, and the ones I'm asking for are:

  • dereferences
  • non-optimizable arithmetic propagates dependencies (pointer arithmetic)

For anything smarter than that, explicit compiler primitives should be provided to inject a dependency reliably (the "magic 0" tricks that are sometimes used).

I don't think consume should become relaxed. You should just not have consume until it's implemented. C++ standardized something that didn't actually work, and so compilers just strengthen consume to acquire, because that allows fewer executions than what the programmer requested. Weakening consume to relaxed is wrong because it allows more executions (i.e. the programmer said "do X or Y or Z", and you're doing Ω).

This doesn't hurt interop with C and C++, because the reader is independent from the writer (in different languages). It does affect the entire model, but I don't think this matters.

I agree with Andrew that you can optimize consume as if it were acquire. The only thing I don't agree with is "just lower it as if it were relaxed", i.e. it can't be a simple load on some architectures. On x86, sure, but on ARM you either have to use ldaex, or a load with a barrier, or use the address dependency rule, which is what my prototype in wg21.link/p0750 does.
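For what it's worth, here's roughly how those loads lower on AArch64 with today's Clang/GCC (my understanding of current codegen, not a guarantee):

#include <stdatomic.h>

int load_relaxed(const atomic_int *p) {
    return atomic_load_explicit(p, memory_order_relaxed); // ldr: plain load
}
int load_acquire(const atomic_int *p) {
    return atomic_load_explicit(p, memory_order_acquire); // ldar: load-acquire
}
int load_consume(const atomic_int *p) {
    return atomic_load_explicit(p, memory_order_consume); // upgraded to acquire today, so also ldar
}

An address-dependency-based lowering of consume would instead be a plain ldr whose result feeds the dependent load's address.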

Ideally, someone who cares would implement dependencies in the compiler, and then all languages would benefit. C++ folks are playing chicken with you, so maybe Swift folks can implement it? :wink:

That's the thing: what I'm saying is that the language should provide that relaxed accesses respect the address dependency rule; all HW out there (except Alpha) respects it. If we were to have Alpha again, then I'd argue that marking a variable as _Atomic (in C parlance) should always give you address dependencies, and make the rest explicit.

As such, it means that relaxed atomics give you that property (I assume that's what @Joe_Groff and @Andrew_Trick are talking about, and I would tend to agree).

EDIT: To elaborate on this, when I have used "consume" ordering, I have only used it to inject dependencies from one load into something that respected address dependencies. For example, libdispatch uses it in a structure that looks like this:

struct dispatch_queue_s {
    ...
    _Atomic uint64_t dq_state;
    ...
    dispatch_queue_t dq_targetq;
    ...
};

When dq_targetq is modified, there's an RMW-release done on dq_state, and I will do things like:

  1. uint64_t state = atomic_load_relaxed(&queue->dq_state);
  2. queue = inject_dependency(queue, state); // inject a dependency into queue
  3. target = queue->dq_targetq; // use address dependency to see the proper target queue
  4. target->...; // use address dependency on dq_targetq to see a properly initialized struct

I want the language to guarantee (3) and (4) and to sometimes let me express (2). and that's it.
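For the curious, step (2) is usually done with the "magic zero" trick mentioned earlier; here's a hypothetical sketch of such a helper (not the actual libdispatch code), shown for ARM64 where it matters:

#include <stdint.h>

static inline void *inject_dependency(void *ptr, uint64_t value) {
#if defined(__aarch64__)
    uint64_t zero = value;
    // AND with xzr inside the asm: the result is zero, the hardware dependency
    // on 'value' is preserved, and the compiler can't prove it's zero and fold it away.
    __asm__("and %x[zero], %x[zero], xzr" : [zero] "+r"(zero));
    return (void *)((uintptr_t)ptr | (uintptr_t)zero);
#else
    // No-op fallback for this sketch (e.g. x86, where plain loads are already ordered enough).
    (void)value;
    return ptr;
#endif
}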


I think I'm misunderstanding you. Can you provide a code example of relaxed code, and what the corresponding ARM64 assembly ought to be (versus what it is today)?

Optimizing a 'consume' as-if it were an 'acquire' would mean lowering to a pseudo-instruction (ld.consume). I'm not proposing this, just laying it out as an implementation choice. If I'm wrong, it doesn't change what I'm proposing. More concisely:

SE-0282 does not define a Swift memory model. It specifies, very nicely, how Swift values map to C's memory model. There is no promoting/downgrading of memory ordering happening in this proposal or in the Swift compiler.

The C implementation upholds its memory model at language boundaries. A 'consuming' load actually emits an acquire fence before returning a dependent value. Since Swift does not provide native atomics and does not import atomic types, any access to the same storage in Swift is considered a non-atomic access, and likely not legal according to C's memory model.

Until Swift provides its own native atomics, as a matter of practical necessity, programmers who truly need 'consume' ordering for performance will need to make use of 'relaxed' or use inline assembly, with an understanding of how to protect themselves from the compiler. SE-0282 has nothing to do with this problem.

In some later proposal, after Swift introduces native atomics, Swift should support "consume-like" atomic ordering by introducing a "load consume" API that produces a dependent pointer type. Along these lines: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0750r1.html
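To give a feel for the shape of such an API (all names here are hypothetical, loosely in the spirit of p0750, and expressed in C for concreteness), the load would hand back a wrapper type that carries the dependency; a conservative implementation can simply strengthen the load to acquire, which is what happens today anyway:

#include <stdatomic.h>

typedef struct node { struct node *next; int payload; } node_t;

typedef struct { node_t *value; } dependent_node_ptr; // the dependency-carrying wrapper

static inline dependent_node_ptr load_consume_node(node_t *_Atomic *src) {
    // Conservative lowering: acquire. A real implementation would instead keep
    // the address dependency alive through the compiler's IR.
    dependent_node_ptr d = { atomic_load_explicit(src, memory_order_acquire) };
    return d;
}

static inline int read_payload(dependent_node_ptr p) {
    return p.value->payload; // dereference is ordered after the load above
}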


Right, there are two sides to this: a set of rules the programmer must follow, and an agreement by the compiler to respect data dependencies within those rules. This paper lays out a bunch of rules in section 4.1.1:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0098r1.pdf

I'm not trying to debate those rules, just pointing out that, as a practical matter, we rely on some set of rules in addition to 'relaxed' ordering semantics, at least until the language introduces a new consume-like concept. The point here is simply that 'consume' ordering as specified is useless.

The core team has decided to accept this proposal, with clarifying text about the interaction with consume ordering:

Thanks everyone!
