~Copyable Synchronization Atomic

wes1 · August 6, 2024, 6:06am

(Am I not holding this right? I hope I'm mistaken...)

Per [SE-410], Synchronization Atomic in Swift 6 is ~Copyable.

But atomics are often used in teams, e.g., producer-consumer pairs. I'm building a library that produces validly configured teams (to make their use correct by construction): the client provides an atomic value, and the library returns the team members, e.g., the producer and consumer. But ~Copyable seems to make it impossible to produce distinct team members, since the atomic can only be consumed once.

(At best I can create one component as a god-class knowing everything about all team members, possibly including client-specific per-state tracking, and showing distinct faces via (slow) protocols - yuck.)

I understand how ~Copyable move-only types can help in other situations to maintain singular ownership, but isn't the goal of an atomic to be used from multiple contexts?

So what can I do to make code like the following work in the current 6.0 toolchains? Am I missing some way to ~~copy~~ refer to the same underlying atomic? (~Copyable requires any container to be ~Copyable.)

import Synchronization

enum MagicLib {
  static func demo() {
    let a = Atomic(0)
    let rw = rw(a)
    // now go do something with them...
    _ = consume rw.read
    _ = consume rw.write
  }
  static func rw(_ a: consuming Atomic<Int>) -> RW {
    // error here: consumed twice
    RW(read: Read(a: a), write: Write(a: a))
  }
  struct RW: ~Copyable {
    let read: Read
    let write: Write
  }
  struct Read: ~Copyable {
    public let a: Atomic<Int>
    public func read() -> Int {
      a.load(ordering: .sequentiallyConsistent) + 1
    }
  }

  struct Write: ~Copyable {
    public let a: Atomic<Int>
    public func write(_ a: Int) {
      self.a.store(a - 1, ordering: .sequentiallyConsistent)
    }
  }
}

[SE-410 acceptance] https://forums.swift.org/t/accepted-with-modifications-se-0410-atomics/69244

nkbelov · August 6, 2024, 7:00am

An atomic should not be copyable because it needs to have a stable memory address. The predecessor library (swift-atomics) had ManagedAtomic, which is a class, but that requires a separate allocation and refcount traffic for each atomic, which is less than ideal.

No protocols are required. ~~You'd just have to make the RW in your example a class~~: from the proposal,

Variables of type struct Atomic are always located at a single, stable memory location, no matter its nature (be that a stored property in a class type or a noncopyable struct, an associated value in a noncopyable enum, a local variable that got promoted to the heap through a closure capture, or any other kind of variable.)

(emphasis mine).

Edit: don't make RW a class; instead, store the atomic in a class, then vend the parts separately if you wish:

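import Synchronization

// The class instance gives the atomic one stable address shared by Read and Write.
final class Inner: Sendable {
    fileprivate let atomic: Atomic<Int>
    init(_ initial: Int) { atomic = Atomic(initial) }
}

struct Read {
    private let inner: Inner
    init(_ inner: Inner) { self.inner = inner }

    // Orderings must be compile-time constants, so pick one here instead of forwarding a parameter.
    func load() -> Int { inner.atomic.load(ordering: .acquiring) }
}

struct Write {
    private let inner: Inner
    init(_ inner: Inner) { self.inner = inner }

    func store(_ new: Int) { inner.atomic.store(new, ordering: .releasing) }
}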
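A self-contained sketch of that pattern, with a small factory that vends both parts from one class instance. The names `AtomicBox`, `Reader`, `Writer`, and `makeReaderWriter` are hypothetical, and the acquiring/releasing orderings are one reasonable choice rather than a requirement:

```swift
import Synchronization

// Hypothetical box: the class instance gives the atomic a single, stable heap address.
final class AtomicBox: Sendable {
    let atomic: Atomic<Int>
    init(_ initial: Int) { atomic = Atomic(initial) }
}

// Copyable, Sendable team members that only hold a reference to the shared box.
struct Reader: Sendable {
    let box: AtomicBox
    func read() -> Int { box.atomic.load(ordering: .acquiring) }
}

struct Writer: Sendable {
    let box: AtomicBox
    func write(_ value: Int) { box.atomic.store(value, ordering: .releasing) }
}

// Hypothetical factory: both members refer to the same box, so they share one atomic.
func makeReaderWriter(initial: Int) -> (reader: Reader, writer: Writer) {
    let box = AtomicBox(initial)
    return (Reader(box: box), Writer(box: box))
}
```

Because `Reader` and `Writer` are ordinary copyable structs holding a class reference, they can be handed to different tasks or threads; the refcounted class is the price paid for the stable shared location.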

grynspan · August 6, 2024, 4:09pm

If we imagine how we'd use an atomic integer in another language (say, C++) across two types:

// 🛑 WRONG! Producer and Consumer aren't sharing state!
struct Producer {
  std::atomic<int> value;
};

struct Consumer {
  std::atomic<int> value;
};

We can immediately see we need to share a reference to the same atomic value across a producer/consumer pair, because each std::atomic<int> instance has a unique location in memory.

In Swift, value types do not have unique locations in memory by default. Swift uses ~Copyable to tell the compiler that an instance of a type cannot be copied; as an intentional side effect and with some behind-the-scenes attribute magic, the compiler can then reliably store instances of such types at fixed locations in memory (knowing no other component of the system will accidentally make a copy in another location).

In C++, to resolve this issue, you'd share a single instance of std::atomic<int> between a producer/consumer pair by assigning it a unique location in memory that both the producer and consumer know about: in other words, they'd reference it by pointer (or by reference, but let's ignore that for now):

#include <atomic>
#include <cassert>
#include <utility>

struct Producer {
  std::atomic<int> *value;
};

struct Consumer {
  std::atomic<int> *value;
};

std::pair<Producer, Consumer> makePCPair() {
  std::atomic<int> *value = ...; // heap-allocate, get from a bucket, whatever
  return std::make_pair(Producer{value}, Consumer{value});
}

void destroyPCPair(Producer&& producer, Consumer&& consumer) {
  // producer/consumer are references, so members are accessed with '.', not '->'
  assert(producer.value == consumer.value);
  delete producer.value; // or however you need to clean it up
  producer.value = nullptr;
  consumer.value = nullptr;
}

In Swift, you need to do something broadly similar. You could allocate the Atomic<Int> manually using UnsafeMutablePointer:

import Synchronization

struct Producer {
  nonisolated(unsafe) let value: UnsafeMutablePointer<Atomic<Int>>
}

struct Consumer {
  nonisolated(unsafe) let value: UnsafeMutablePointer<Atomic<Int>>
}

func makePCPair() -> (Producer, Consumer) {
  let value: UnsafeMutablePointer<Atomic<Int>> = ... // as with C++
  return (Producer(value: value), Consumer(value: value))
}

func destroyPCPair(_ producer: consuming Producer, _ consumer: consuming Consumer) {
  precondition(producer.value == consumer.value)
  producer.value.deinitialize(count: 1) // deinitialize(count:) needs a mutable pointer
  producer.value.deallocate()
}
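To make the manual-allocation route concrete, here is one possible complete sketch. It assumes Swift 6's noncopyable-aware `UnsafeMutablePointer` APIs (`allocate(capacity:)`, `initialize(to:)`, `pointee`, `deinitialize(count:)`); the `PointerProducer`/`PointerConsumer` types and the `makePointerPair`/`destroyPointerPair` functions are hypothetical:

```swift
import Synchronization

// Hypothetical producer/consumer that share one heap-allocated atomic by pointer.
struct PointerProducer {
    let value: UnsafeMutablePointer<Atomic<Int>>
    func produce(_ n: Int) { value.pointee.store(n, ordering: .releasing) }
}

struct PointerConsumer {
    let value: UnsafeMutablePointer<Atomic<Int>>
    func consume() -> Int { value.pointee.load(ordering: .acquiring) }
}

func makePointerPair(initial: Int) -> (PointerProducer, PointerConsumer) {
    // Allocate room for exactly one Atomic<Int> and move a fresh atomic into it.
    let value = UnsafeMutablePointer<Atomic<Int>>.allocate(capacity: 1)
    value.initialize(to: Atomic(initial))
    return (PointerProducer(value: value), PointerConsumer(value: value))
}

// The caller must call this exactly once, after both sides are done.
func destroyPointerPair(_ producer: PointerProducer, _ consumer: PointerConsumer) {
    precondition(producer.value == consumer.value)
    producer.value.deinitialize(count: 1)
    producer.value.deallocate()
}
```

Nothing stops a caller from using either side after `destroyPointerPair`, or from never calling it; that lifetime bookkeeping is exactly what the class-based approach takes off your hands.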

This looks suspiciously like the C++ strawman implementation above. Unfortunately, using unsafe pointers is pretty ugly, and you end up being responsible for cleaning up any allocated memory manually (as you were in the C++ example). Instead, you can store the atomic value inside a class (whose instances are refcounted and already have fixed locations in memory). @nkbelov already provided a good example of how to do that.

So why not just make Atomic a class to guarantee its fixed location and make it refcountable? Because reference types are significantly more expensive in both space and time than a simple atomic value (which can be as small as a single machine word and accessed with direct CPU instructions). So the primitive type provided by Swift is ~Copyable, and callers that need to share an atomic across space/time can opt into the necessary overhead.
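A quick way to see the "simple atomic value" side of that trade-off, as a sketch: on current platforms `Atomic<Int>` should occupy the same inline storage as a plain `Int`, with no object header or refcount. A class-based wrapper, by contrast, stores only a reference inline and keeps the atomic in a separate heap object, which `MemoryLayout` does not show. Exact sizes are platform-dependent:

```swift
import Synchronization

// Inline size of the atomic itself: just the storage for the value.
let atomicIntSize = MemoryLayout<Atomic<Int>>.size
// Inline size of a plain Int, for comparison.
let plainIntSize = MemoryLayout<Int>.size

print("Atomic<Int>: \(atomicIntSize) bytes, Int: \(plainIntSize) bytes")
```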

I hope that was helpful!

jrose · August 6, 2024, 4:13pm

Additionally, classes and raw allocations aren’t the only references out there: capturing a binding in a closure counts as well, which can be useful for structured concurrency and other forms of scoped work.
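As a sketch of the closure-capture route: a local `Atomic` can be captured by a non-escaping closure that runs on several threads at once, with no class or pointer involved. `DispatchQueue.concurrentPerform` is used here only as one scoped, non-escaping API; `countConcurrently` is a hypothetical name:

```swift
import Dispatch
import Synchronization

// Hypothetical helper: many threads increment one local atomic via a closure capture.
func countConcurrently(iterations: Int) -> Int {
    let counter = Atomic(0)  // lives in this scope; the closure borrows it
    DispatchQueue.concurrentPerform(iterations: iterations) { _ in
        _ = counter.add(1, ordering: .relaxed)
    }
    // concurrentPerform returns only after every iteration has finished.
    return counter.load(ordering: .relaxed)
}
```

The closure never outlives `counter`, which is what makes the borrow legal; the same scoping is what you get from structured concurrency.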

wes1 · August 6, 2024, 4:20pm

Great suggestion, but that's what drove me here: when I tried that (albeit inside a switch on a copyable value), the compiler crashed in SILGen. I'm still isolating that (as I dream of a safe refcounted handle without any class semantics...).

Edit: here's the code that crashes the Xcode beta-4 compiler 75724:

import Synchronization

public enum Bug {
  case a, b
  public typealias AR = AtomicRepresentable

  // avoids crash
  //public func writeOp(_ item: consuming Atomic<Int>) -> (Int) -> Void {
  // crash when generic
  public func writeOp<T: AR>(_ item: consuming Atomic<T>) -> (T) -> Void
  where T.AtomicRepresentation == _Atomic64BitStorage {
    { (item).store($0, ordering: .sequentiallyConsistent) }
  }
}
Bug.a.writeOp(Atomic(0))(3)

Alejandro · August 6, 2024, 4:26pm

Hopefully sometime in the future we can have stored borrows which would let you write something like:

struct RW: ~Escapable {
  let read: Read
  let write: Write
}

struct Read: ~Escapable {
  let a: borrow Atomic<Int>
}

struct Write: ~Escapable {
  let a: borrow Atomic<Int>
}

static func rw(_ a: borrowing Atomic<Int>) -> RW {
  ...
}

Note that RW, Read, and Write no longer have to be noncopyable, and the declarations make clear that these types don't own their own instances of an atomic; they share the same instance owned by the producer of the atomic. They are, however, nonescapable, meaning they can't outlive the producer of the atomic.
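Until something like stored borrows exists, a scoped version of the same idea can be written today with `borrowing` parameters: the owner keeps the atomic in a local, and the producer and consumer functions only borrow it for the duration of each call. The function names are hypothetical, and this only works when the producer and consumer run within the owner's scope:

```swift
import Synchronization

// Producer side: borrows the caller's atomic for the duration of the call.
func produce(_ value: Int, into slot: borrowing Atomic<Int>) {
    slot.store(value, ordering: .releasing)
}

// Consumer side: borrows the same atomic; no copy is made.
func consumeLatest(from slot: borrowing Atomic<Int>) -> Int {
    slot.load(ordering: .acquiring)
}

// Hypothetical owner: the atomic lives in this function's frame.
func runScopedPair() -> Int {
    let slot = Atomic(0)
    produce(42, into: slot)
    return consumeLatest(from: slot)
}
```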

Alejandro · August 6, 2024, 4:31pm

Note that we don't strictly need this extra indirection (at least in C++): the producer can store the value inline and vend a pointer to that atomic to the consumer.

grynspan · August 6, 2024, 4:35pm

Sure, one or the other can own the pointer, but the other (or the one) still needs a pointer/reference back to it, which was my point (no pun intended.)

wes1 · August 6, 2024, 5:03pm

Thanks to you and @nkbelov for the excellent, clear replies. I see the benefit of the primitive and the need for a handle/reference type.

Looking at legacy [ManagedAtomic], I liked the work put into inlining the API into clients (at the cost of replicating all the available semantics) and the purpose-built Storage. But perhaps the hoops needed to reach the atomic through a final class with limited access aren't significant (performance-wise) compared to those complications.

But I don't want to give up yet, and this is where I strain at the abstractions a bit in my limited understanding. I assume the storage in memory is movable (hence the effort to make AtomicRepresentable bitwise-copyable, and the risk of managing pointers directly). ~Copyable for move-only types makes it easier for the compiler to track actual moves, and should make it possible, e.g., to prove that all references are in the same stack (or even the same register?). I was hoping that code from different tasks using an atomic value in the same isolation region wouldn't always have to go to main memory. If I always have to use a class reference, that would seem to prevent the compiler from optimizing producer/consumer pairs in the same isolation domain when they share a lightweight thread. I had an admittedly vague understanding that the inlining and Storage of ManagedAtomic made that possible, but perhaps an ordinary class restricted to a domain is enough?

My naive impression of Go channels is that they manage to put the producer/consumer pair on the same lightweight thread (or at least they share the same Go stack?) to avoid going to main memory or doing a full context switch when the pair alternates. Is that possible in Swift?

(This brings up other questions about the scope affected by the memory orderings, but that's a separate issue.)

(Again, apologies if the question is bad, since this is mostly above my pay grade)

Edit: as a reminder from last year's discussion

That was perhaps assuming an older implementation:

Perhaps the producer can use await to call consumers with a borrowing atomic parameter.

[ManagedAtomic] - https://github.com/apple/swift-atomics/blob/main/Sources/Atomics/Types/ManagedAtomic.swift

nkbelov · August 6, 2024, 5:25pm

Perhaps the following will make it a bit clearer for you:

In reality, on the CPU, there's no such thing as an "atomic value". All a CPU can do is atomically load/write/exchange etc. to/from a memory location (read: the cache line carrying that memory location). This explains why the memory location has to be stable: synchronisation is achieved by syncing the cache line between CPU cores, so all accesses have to go through the same address.

The higher-level Atomic<Something> abstractions that languages provide simply aim to make those ops type-safe (that is, prevent you from, say, storing a UInt while loading an Int8). Structurally, the Atomic type is "just" a wrapper around its storage, and the actual atomic ops are performed on the address of that storage.

When the address changes, this essentially becomes a different atomic. This doesn't conflict with atomics being movable, because after a move the old location is no longer accessible; a move means "access me through a new cache line instead".
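A small sketch of that rule as the compiler enforces it: because `Atomic` is ~Copyable, binding it to a new name is a move, and the old binding can no longer be used, so no code can keep accessing the "old" location. Whether the bytes physically move is up to the optimizer; `moveAtomic` is a hypothetical name:

```swift
import Synchronization

func moveAtomic() -> Int {
    let original = Atomic(1)
    // A move, not a copy: Atomic is ~Copyable, so `original` is consumed here.
    let moved = original
    // original.load(ordering: .relaxed)  // would not compile: 'original' is used after consume
    moved.store(2, ordering: .relaxed)
    return moved.load(ordering: .relaxed)
}
```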

wes1 · August 6, 2024, 5:42pm

Ah, I guess that was my question about regions. My hope was that when an isolation region could be pinned to a core, we could avoid cross-core synchronization.

It's kind of the reverse optimization. Swift concurrency avoids assuming the same (OS) thread context so they can be moved across threads and cores. But with isolation regions, I was hoping that the runtime would sometimes arrange code using the same region on the same core and avoid secondary caching. Otherwise it sounds like we're back to the model that the underlying CPU-level primitives sync the cache, so (conservatively) any fully-sequential atomic access translates to a full-cache flush (making them quite expensive).