Low-Level Atomic Operations

I think there is a strong and reasonable argument to be made that nothing about this API is really suitable for beginners. Optimizing for predictability seems like a higher priority than progressive disclosure of complexity. I was one of the people that suggested a default argument, and I retract that suggestion!

-Chris

9 Likes

This is not something we should expose. It is something that some low-level OS primitives might want to use to implement higher-level constructs, but basically unless you are the kernel and can disable preemption, in the context of a heterogeneous priority environment, building anything around an atomic busy loop (spinlocks being the most obvious example) is incorrect for both performance and power.

Thanks for chiming in!

Yeah, I hear the argument against spin-loops in general and it's a solid one AFAICS. It's definitely not a thing that "has to" be included in an initial atomics pitch so seems like it'd be fine to skip those here until we have proven they'd really help. If we ever end up in a design / use case that'd want to use one, I'm happy to be then proven wrong and do something better :slight_smile:

For reference: my "oh nice!" reaction to PAUSE came from pause recently being exposed in the JVM, to much rejoicing among people implementing queues and messaging systems ( https://openjdk.java.net/jeps/285 ). However, that proposal and use case are very Intel-centric when one thinks about it.

I'd be (personally) happy to not have pause exposed in this initial pitch, and revisit it with proper discussion and use cases when the time comes.

Very glad the argument resonates, and thanks for reconsidering the suggestion :+1: Predictability / readability are indeed paramount in those APIs :slight_smile:

I agree with @Pierre_Habouzit on PAUSE. Whether there is a more useful alternative available is heavily platform dependent. As @lorentey mentioned, ARM has WFE/SEV, where the thread at least gets paused until an event is signaled. On x86 there is MONITOR/MWAIT, that waits for a change on a specific memory location, but that is unfortunately only available at privilege level 0, so of no use here. Intel recently added UMONITOR/UMWAIT, but AFAIK that is only available on some Atom processors at the moment. With Excavator, AMD added MONITORX/MWAITX, which seems to be pretty much equivalent to UMONITOR/UMWAIT. Both take a timeout and wake up either when the watch triggered, or the timeout was exceeded.

Given that all of this is very platform dependent, I don't think it's feasible to expose this in the stdlib.

2 Likes

I think I'm starting to convince myself that it's worth it.

It would also be desirable to have a RawRepresentable extension to allow a limited set of custom atomic types:

protocol AtomicProtocol {
  associatedtype AtomicStorage = Self
  static func atomicLoad(at address: UMP<AtomicStorage>) -> Self
}

extension Int: AtomicProtocol {...}
extension UInt: AtomicProtocol {...}
...
extension UInt8: AtomicProtocol {...}

// ⚛︎ ⚛︎ ⚛︎
protocol NullableAtomicProtocol: AtomicProtocol {
  static func atomicLoadOptional(at address: UMP<AtomicStorage>) -> Self?
}
extension Optional: AtomicProtocol where Wrapped: NullableAtomicProtocol {
  typealias AtomicStorage = Wrapped.AtomicStorage
  static func atomicLoad(at address: UMP<AtomicStorage>) -> Self {
    Wrapped.atomicLoadOptional(at: address)
  }
}
extension UnsafeMutablePointer: NullableAtomicProtocol {...}
extension Unmanaged: NullableAtomicProtocol {...}

// ⚛︎ ⚛︎ ⚛︎
protocol AtomicRepresentable: AtomicProtocol, RawRepresentable
  where AtomicStorage == RawValue {}
extension AtomicRepresentable {
  static func atomicLoad(at address: UMP<RawValue>) -> Self {
    Self(rawValue: RawValue.atomicLoad(at: address))!
  }
}

// ⚛︎ ⚛︎ ⚛︎
struct AtomicHandle<Value: AtomicProtocol> {
  init(at: UMP<Value.AtomicStorage>)
  static func create(initialValue: Value) -> Self
  func destroy()

  func load() -> Value {
    Value.atomicLoad(at: _address)
  }
}  

This way, AtomicHandle can support:

// Integer types:
let counter = AtomicHandle<Int>.create(initialValue: 42)
let cnt32 = AtomicHandle<UInt32>.create(initialValue: 23)

// Optional and non-optional pointers:
let ptr1 = AtomicHandle<UnsafeMutablePointer<Node>>.create(initialValue: ...)
let ptr2 = AtomicHandle<UnsafeMutablePointer<Node>?>.create(initialValue: nil)

// Optional and non-optional unmanaged references:
let ref1 = AtomicHandle<Unmanaged<Foo>>.create(initialValue: Unmanaged.passRetained(Foo()))
let ref2 = AtomicHandle<Unmanaged<Foo>?>.create(initialValue: nil)

// Custom atomicable types:
enum State: Int, AtomicRepresentable {
  case starting
  case running
  case stopped
}
let state = AtomicHandle<State>.create(initialValue: .starting)
...

It is my sad duty to report that supporting implicitly unwrapped optionals doesn't seem feasible without reintroducing dedicated type(s) for such things. (Which I'm not willing to do.)

This requires three more protocols than I originally planned on adding, but I really like where it's going.

(These protocols/generics would be like FixedWidthInteger in that unspecialized usages won't work very well at all.)

3 Likes

Hello Pierre, thanks for chiming in, I really appreciate your input.

This is not something we should expose. It is something that some low-level OS primitives might want to use to implement higher-level constructs, but basically unless you are the kernel and can disable preemption, in the context of a heterogeneous priority environment, building anything around an atomic busy loop (spinlocks being the most obvious example) is incorrect for both performance and power.

I see your point and completely agree. What I have in mind however is not necessarily spinlocks (with os_unfair_lock, there really is no need for custom spinlocks) but any algorithm that includes a compareAndExchange operation which typically needs to be retried. Do you suggest retrying is bad in general and should be avoided? I don't think you do, and I also don't think it can be avoided altogether (otherwise we should just not even expose compareAndExchange).
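To make the retry pattern concrete, here is a minimal sketch of a compare-exchange loop. It assumes a Swift 6 toolchain and its Synchronization module (Atomic, compareExchange(expected:desired:ordering:)), which postdates this thread and is spelled differently from the API being pitched here. storeMax and retryDemo are hypothetical names used only for illustration.

```swift
import Synchronization

/// Hypothetical helper: raises `target` to at least `candidate`.
/// Returns how many compare-exchange attempts failed and had to be retried.
func storeMax(_ candidate: Int, into target: borrowing Atomic<Int>) -> Int {
    var retries = 0
    var current = target.load(ordering: .relaxed)
    while current < candidate {
        let (exchanged, original) = target.compareExchange(
            expected: current,
            desired: candidate,
            ordering: .relaxed)
        if exchanged { break }
        // Another thread changed the value first: re-check against what it wrote.
        current = original
        retries += 1
    }
    return retries
}

func retryDemo() -> Int {
    let highWaterMark = Atomic<Int>(3)
    _ = storeMax(10, into: highWaterMark)  // raises 3 to 10
    _ = storeMax(7, into: highWaterMark)   // no effect, 10 is already larger
    return highWaterMark.load(ordering: .relaxed)
}
```

A failed attempt in this loop means some other thread made progress, and the loop exits as soon as the stored value is large enough. That is arguably a different situation from a spinlock, where a thread spins while waiting for another thread to be scheduled.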

I'm strongly opposed to this making it into the standard library "just because"; instead, the library should provide better higher-level constructs that the OS has a chance to be able to optimize for you, and provide the adaptive spinning in a controlled way (only the OS can do that safely, the library can't).

Agreed. I personally don't care if we expose PAUSE specifically, my point is that it's important to expose something (anything really) that can hint to the OS that we're in a busy loop context (FWIW, Rust exposes a spin_loop_hint() top-level function) and, more importantly, that we shouldn't punt this for later. Otherwise, what we'll end up with is exactly what you want to prevent: badly performing and energy inefficient code that also adversely affects the rest of the system, because people will either have to write their own PAUSE/backoff/etc. strategy or just ignore it altogether and write naive busy-wait loops.

1 Like

SGTM!

I mean, UnsafeMutablePointer<UnsafeMutablePointer<Foo>> is the current type of a pointer to a pointer. Someone coming from a C background might say that it is too long, but that's not what we are optimizing for.

The proposed types are memory unsafe, so they must have the word "unsafe" somewhere in them. They are also pointers, so I think using the word "Pointer" is appropriate. The word "Mutable" is less necessary (since there is no other option), but it is nice to be consistent with existing unsafe pointer naming, which uses it.

AtomicHandle for me invokes a sense of memory safety, which is not there. UnsafeAtomicHandle would be better, but it is somewhat obscuring the pointer nature -- "handle" is something quite abstract and often memory safe (think file handle, Windows kernel object handle etc.)

Both SGTM.

Could we use just one conformance somehow?

extension Optional: PrimitiveAtomic where Wrapped: PrimitiveAtomic {}

(and then, if necessary, use a trick similar to _customIndexOfEquatableElement in the standard library, for example, define _customLoadOptionalValue which would be implemented for UnsafePointer but not implemented for Int.)
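For readers unfamiliar with that trick, here is a rough sketch of what a single-conformance design could look like. Everything in it is hypothetical: the protocol name, the loadValue requirement, and the use of a plain UInt word in place of real atomic storage (nothing here is atomic). The customization point returns a double optional, where an outer nil means "this type has no nullable representation".

```swift
// Sketch only: plain values stand in for atomic storage, and all names are hypothetical.
protocol PrimitiveAtomicSketch {
    /// Decodes a stored word into a value.
    static func loadValue(from word: UInt) -> Self

    /// Customization point: returns nil when the type cannot encode "no value"
    /// in the same word, otherwise the decoded optional value.
    static func _customLoadOptionalValue(from word: UInt) -> Self??
}

extension PrimitiveAtomicSketch {
    // Default: the type has no nullable representation.
    static func _customLoadOptionalValue(from word: UInt) -> Self?? { nil }
}

extension Int: PrimitiveAtomicSketch {
    static func loadValue(from word: UInt) -> Int { Int(bitPattern: word) }
}

extension UnsafeRawPointer: PrimitiveAtomicSketch {
    static func loadValue(from word: UInt) -> UnsafeRawPointer {
        UnsafeRawPointer(bitPattern: word)!
    }
    // Pointers can use the all-zero word to mean nil.
    static func _customLoadOptionalValue(from word: UInt) -> UnsafeRawPointer?? {
        .some(UnsafeRawPointer(bitPattern: word))
    }
}

extension Optional: PrimitiveAtomicSketch where Wrapped: PrimitiveAtomicSketch {
    static func loadValue(from word: UInt) -> Wrapped? {
        guard let value = Wrapped._customLoadOptionalValue(from: word) else {
            // Types such as Int end up here, and only at run time.
            fatalError("\(Wrapped.self)? has no single-word representation")
        }
        return value
    }
}
```

In this sketch `Optional<UnsafeRawPointer>.loadValue(from: 0)` yields nil, while `Optional<Int>.loadValue(from: 0)` still type-checks and would only fail when executed.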

1 Like

Hello Chris!

I think there is a strong and reasonable argument to be made that nothing about this API is really suitable for beginners. Optimizing for predictability seems like a higher priority than progressive disclosure of complexity.

With all due respect, I would argue that, equivalently, a language with a static type system is also not really suitable for beginners in programming. Does this mean beginners in programming should not start learning programming with Swift? As far as I have seen until now, Swift has been welcoming to beginners and when atomics are added to the standard library I'm fairly certain users with no prior experience in atomics will use them to implement counters, flags and other simple algorithms. Some of these people will get fascinated by them and will continue on their journey and learn more about the intricacies of memory orderings, cache line invalidation, etc. Should the rest have to figure out all these things before they ever implement a counter? [1]

My argument is that, as I understand it, sequentially consistent ordering offers the strongest guarantees, relieving the user from the burden of having to reason about a bunch of things that in my opinion sit one level lower than most people will care about. The end result is that the final implementation will be correct by default (assuming the algorithm itself is implemented correctly in the first place), and that at worst it will perform slightly worse than an implementation with manually specified orderings. In that regard, and I may be wrong, I consider memory orderings as a tool for optimising an algorithm.

That said, I'm also okay with not having a default if people end up agreeing against -- like it was mentioned upthread, this can simply be added in an extension as needed.

[1]: granted, you might argue that a relaxed memory ordering typically offers enough guarantees for a counter, but a user would still have to figure this out for themselves, if they cared enough, but a default seqcst ordering would still be correct.
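As an illustration of the counter case, here is a minimal sketch with the ordering spelled out at every call site. It assumes a Swift 6 toolchain with the Synchronization module (which postdates this thread) plus Dispatch for the worker threads; HitCounter and countHits are hypothetical names.

```swift
import Dispatch
import Synchronization

/// Hypothetical event counter. The count does not publish any other data,
/// so relaxed ordering is assumed to be sufficient here.
final class HitCounter: Sendable {
    private let hits = Atomic<Int>(0)

    func record() {
        hits.wrappingAdd(1, ordering: .relaxed)
    }

    var total: Int {
        hits.load(ordering: .relaxed)
    }
}

func countHits() -> Int {
    let counter = HitCounter()
    // concurrentPerform returns only after every iteration has finished.
    DispatchQueue.concurrentPerform(iterations: 8) { _ in
        for _ in 0..<1_000 { counter.record() }
    }
    return counter.total
}
```

Replacing both .relaxed arguments with .sequentiallyConsistent would give the same total. The difference would only be in the ordering guarantees relative to other memory accesses, and possibly in cost.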

AFAIU, any memory ordering can be replaced with seqcst and the algorithm will still be sound. Therefore, I do not find it worrying if a seqcst load or store happens to slip through during review: this by itself is not an issue, the code will still behave correctly, and it will at worst perform slightly worse.

What is an issue, however, is people not having a complete understanding of memory visibility (and there is unfortunately no "I know just enough of this"), or having a false sense of security that they know what they're doing and spraying acquire and release here and there. This leads to code that crashes at runtime, or worse.

Edit: Ugh, re-reading my post I think I need to clarify that I don't mean to suggest you or your peers don't understand atomics or have a false sense of security :) All I'm saying is that I believe the bar before dabbling with orderings is quite high and can discourage most people from using atomics, or worse, use them incorrectly.

storeOnce doesn’t sound like a function that may have no effect. storeIfNil happens to be the name I picked for this functionality a couple years ago, so I agree with it.
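For context, the operation under discussion can be sketched as a single compare-exchange against nil. This assumes a Swift 6 toolchain with the Synchronization module, where optional pointers are assumed to be usable as atomic values; the free function below is a hypothetical spelling, not the pitched method.

```swift
import Synchronization

/// Hypothetical helper: publishes `candidate` only if the slot is still nil.
/// Returns true if this call performed the store, false if it had no effect.
func storeIfNil(
    _ candidate: UnsafeMutablePointer<Int>,
    into slot: borrowing Atomic<UnsafeMutablePointer<Int>?>
) -> Bool {
    let result = slot.compareExchange(
        expected: nil,
        desired: candidate,
        ordering: .acquiringAndReleasing)
    return result.exchanged
}

func storeIfNilDemo() -> (first: Bool, second: Bool, winner: Int) {
    let slot = Atomic<UnsafeMutablePointer<Int>?>(nil)
    let a = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    let b = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    a.initialize(to: 1)
    b.initialize(to: 2)
    defer { a.deallocate(); b.deallocate() }

    let first = storeIfNil(a, into: slot)   // slot was nil, so this stores
    let second = storeIfNil(b, into: slot)  // slot is already set, so no effect
    let winner = slot.load(ordering: .acquiring)!.pointee
    return (first, second, winner)
}
```

The second call silently leaves the slot alone, which is the behavior a name like storeOnce arguably fails to convey.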

2 Likes

Specifically because class initializers could not replace self -- otherwise ManagedBuffer.create would have been an initializer.

I'm not sure what you're getting at, but this doesn't seem very pertinent to the discussion on this thread. This is not a general discussion, this is a specific discussion about atomic APIs.

1 Like

Here is my thought process on this (having recently written a bunch of code doing fiddly things with atomics in the TFRT project):

People (should) only reach for atomics when they care about very low level performance. It is true that you can default to sequentially consistent semantics and get something that is correct, but if you're reaching for this for performance in the first place, it seems useful to make these places explicit. Particularly on non-X86 architectures, there can be huge performance differences between using the correct consistency model and relying on sequential consistency.

Atomics are not a beginner concept - there are many complex topics that cannot be swept under a rug. I personally don't think that defaulting this leads to a simpler or better API.

-Chris

5 Likes

Apologies, instead of “Does this mean beginners in programming should not start learning programming with Swift?” I should have asked “should Swift not make a reasonable effort to simplify concepts and syntax so that it’s approachable by beginners?”. That is in response to your argument that atomics are not really beginners’ material, with which I agree, and that progressive disclosure is not important, with which I personally disagree.

Thanks for sharing your reasoning in your other comment. I agree with everything you say but I still think there’s room to make atomics more generally approachable and defaulting to seqcst helps a bit to achieve that.

1 Like

It seems that the point of debate is whether or not memory ordering is an intrinsic concept with respect to atomics. Chris is arguing that it is, and that one should not use atomics if one is not prepared to understand that parameter and what it means. You are arguing that atomics are useful even if one defaults to a safe memory ordering, and thus one can make atomics available to less-knowledgeable programmers without detracting from both correctness and available power.

2 Likes

Yes, it is actually even more fundamental than atomics.

Typically, the language assumes a single-threaded execution model, so the compiler can reason about the state of memory and whether or not operations which change that state can be observed. That is what enables it to reorder or eliminate bits of code.

Atomics are intrinsically multi-threaded. We are telling the compiler that another thread will be executing, maybe reading or writing to memory in our address space, and the atomic is the channel over which they coordinate, because it has defined behaviour for concurrent reads/writes. It’s a hole in the typical, single-threaded execution model, so the memory order tells the compiler how it should do its job around that hole.

So, while it may not necessarily be intrinsic to an atomic operation itself, it really is intrinsic to using an atomic, because it’s about how everything else happens relative to them.

(So yes, I’m against having a default)

3 Likes

I too think that there is room to make atomics more approachable, but this is not the goal of a low-level atomic API IMHO.

It will then be pretty easy to build a simpler and more approachable API on top of it.

3 Likes

No offence taken :wink:

It seems, though, that (luckily!) we are converging on that part of the discussion: yes, these are the low-level APIs after all, and any kind of help for people writing such code is welcome. Thanks everyone, always spelling out the ordering explicitly will make reviews much simpler.

Thanks for your replies gentlemen. Avi's summary above is extremely accurate (thank you!).

Yes, it does seem we're converging against having a default. While I'm still personally not convinced, I realise I'm arguing for watering down a really difficult subject so that it can be more widely consumed, which might ultimately be to the detriment of people who actually know what they're doing, and I respect that. I don't see any serious harm in going forward without a default; we can revisit this any time in the future if it becomes important enough, and it would just be an additive change.

2 Likes

This can only be implemented when Optional<T> has a layout compatible with atomics, which may not be true even if T is itself atomicable.

UnsafeAtomicThingamabob<Int?> // Ought to be rejected

(There is a similar problem with RawRepresentable, since it doesn't require conformant types to have a layout that matches their RawValue. This is why custom atomic-representable types will use their RawValue as their storage representation.)
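A quick way to see both layout problems is to compare sizes with MemoryLayout. The sketch below uses only the standard library; RunState and layoutReport are hypothetical names, and the storage pointer is accessed with plain (non-atomic) loads and stores purely to show the raw-value round trip.

```swift
enum RunState: Int {
    case starting, running, stopped
}

func layoutReport() -> (
    enumSize: Int, rawSize: Int, optionalIntSize: Int,
    pointerSize: Int, optionalPointerSize: Int, decoded: RunState?
) {
    // Keep the value in storage shaped like its RawValue, not like the enum itself.
    let storage = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    storage.initialize(to: RunState.running.rawValue)
    defer { storage.deallocate() }

    return (
        enumSize: MemoryLayout<RunState>.size,           // smaller than Int
        rawSize: MemoryLayout<Int>.size,
        optionalIntSize: MemoryLayout<Int?>.size,        // needs an extra byte for the nil case
        pointerSize: MemoryLayout<UnsafeMutablePointer<Int>>.size,
        optionalPointerSize: MemoryLayout<UnsafeMutablePointer<Int>?>.size,  // nil reuses the null address
        decoded: RunState(rawValue: storage.pointee)
    )
}
```

Int? is wider than Int, so it does not fit the same atomic storage, whereas an optional pointer has the same size as a non-optional one. The enum is narrower than its Int raw value, which is consistent with using the RawValue as the storage representation.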

The problem is that we wouldn't be able to provide a default implementation that does anything other than trap.

The nullability constraint can be modeled in the type system through a refinement of the atomic protocol, and I think that's the right way to resolve this. (Why stop at adding just one protocol, when we can have two?)

This is now implemented:

Properly modeling the distinction between atomic values (AtomicProtocol) and their storage representation (AtomicStorage) is far more fiddly than it may seem; there is plenty to dislike about API usability there. But on the whole, I think this is the right approach.

I named the generic UnsafeAtomicPointer per Dmitri’s advice — although I still deeply dislike how similar it looks to UnsafeMutablePointer. The two API names blur together when I use this in actual code and it interferes with readability.

2 Likes