Memory footprint of UnsafeAtomic

lorentey · November 30, 2023, 7:46pm

It does. Swift (very deliberately!) does not provide a reliable way to get this address, as accessing ivars through unsafe pointers defeats Swift's elaborate exclusivity checking infrastructure. Swift's model for variable accesses is at the heart of its concurrency strategy; allowing code to randomly escape it in an ad hoc way is not a good idea. (We have learned this in the old Exposing the Memory Locations of Class Instance Variables thread.)

The current implementation of struct Atomic relies on the non-public, experimental @_rawLayout attribute that implements an evolved variant of @Joe_Groff's @RawStorage suggestion from the same thread. It would be possible to use it to make this a first-class, public construct by e.g. providing a Swift equivalent of something like Rust's Cell and UnsafeCell types -- however, properly designing this is future work.

tera · November 30, 2023, 10:36pm

Hmm, is this a broken ring buffer implementation?

// single writer single reader ring buffer
class RingBuffer {
    let totalSize: Int64
    var readPos: Int64 = 0
    var writePos: Int64 = 0
    let buffer: UnsafeMutableRawPointer
    
    init(totalSize: Int) {
        self.totalSize = Int64(totalSize)
        buffer = malloc(totalSize)!
    }
    deinit {
        free(buffer)
    }
    
    var usedSize: Int {
        Int(OSAtomicAdd64(0, &writePos) - OSAtomicAdd64(0, &readPos))
    }
    
    var freeSize: Int {
        Int(totalSize) - usedSize
    }
    
    func write(_ bytes: UnsafeRawPointer, _ size: Int) {
        precondition(size <= freeSize)
        var remSize = Int64(size)
        var bytesOffset: Int64 = 0
        var bufferOffset = OSAtomicAdd64(0, &writePos)

        while remSize > 0 {
            let offset = bufferOffset % totalSize
            let chunkSize = min(remSize, totalSize - offset)
            memmove(buffer + Int(offset), bytes + Int(bytesOffset), Int(chunkSize))
            remSize -= chunkSize
            bytesOffset += chunkSize
            bufferOffset += chunkSize
        }
        OSAtomicAdd64(Int64(size), &writePos)
   }
    
    func read(_ bytes: UnsafeMutableRawPointer, _ size: Int) {
        precondition(size <= usedSize)
        var remSize = Int64(size)
        var bytesOffset: Int64 = 0
        var bufferOffset = OSAtomicAdd64(0, &readPos)
        
        while remSize > 0 {
            let offset = bufferOffset % totalSize
            let chunkSize = min(remSize, totalSize - offset)
            memmove(bytes + Int(bytesOffset), buffer + Int(offset), Int(chunkSize))
            remSize -= chunkSize
            bytesOffset += chunkSize
            bufferOffset += chunkSize
        }
        OSAtomicAdd64(Int64(size), &readPos)
    }
}

The relevant extract:

class RingBuffer {
    var readPos: Int64 = 0
    var writePos: Int64 = 0
    ....
    var usedSize: Int {
        Int(OSAtomicAdd64(0, &writePos) - OSAtomicAdd64(0, &readPos))
    }
    func write(_ bytes: UnsafeRawPointer, _ size: Int) {
        ...
        OSAtomicAdd64(Int64(size), &writePos)
        ...
    }
    func read(_ bytes: UnsafeMutableRawPointer, _ size: Int) {
        ...
        OSAtomicAdd64(Int64(size), &readPos)
        ...
    }
}

Joe_Groff · December 4, 2023, 3:30pm

Yeah, that isn't sound. &readPos is the "inout" operator even when you're using it to pass an address to a C function, which means Swift will impose exclusivity checks surrounding the access, and atomics rely on shared access. In general, the storage used for properties of typical Swift types is subject to Swift's semantics, and while you can dodge the high-level semantics in various ways, it's an uphill battle, and you're better off using raw memory in the first place for things with nonstandard semantics. Noncopyable types finally give Swift a way to express types made up of raw storage in a composable way, so hopefully the addition of a standard Atomic type, and more generally a Cell type like Karoy mentioned, will make this sort of thing easier to write robustly.

tera · December 4, 2023, 4:38pm

Thank you. Until something better lands would use the explicitly allocated storage:

class AtomicStorage<T> {
    private (set) var value: UnsafeMutablePointer<T>
    init(initialValue: T) {
        let value = UnsafeMutablePointer<T>.allocate(capacity: 1)
        value.initialize(to: initialValue)
        self.value = value
    }
    deinit {
        value.deinitialize(count: 0)
        value.deallocate()
    }
}

Relevant ring buffer implementation bits:

class RingBuffer {
    let readPos = AtomicStorage<Int64>(initialValue: 0)
    let writePos = AtomicStorage<Int64>(initialValue: 0)
    ...
    var usedSize: Int {
        Int(OSAtomicAdd64(0, writePos.value) - OSAtomicAdd64(0, readPos.value))
    }
    func write(_ bytes: UnsafeRawPointer, _ size: Int) {
        ...
        OSAtomicAdd64(Int64(size), writePos.value)
        ...
    }
    func read(_ bytes: UnsafeMutableRawPointer, _ size: Int) {
        ...
        OSAtomicAdd64(Int64(size), readPos.value)
        ...
    }
}

Joe_Groff · December 4, 2023, 4:40pm

Yeah, memory you allocate yourself is fine to use with external atomic APIs.

tera · December 4, 2023, 10:01pm

Experimentally I found that I should use "acquire" when reading atomic variable and "acq_rel" when updating it.

    // over-simplified code
    func write(_ bytes: UnsafeRawPointer, _ size: Int) {
        let bufferOffset = atomicAdd(0, writeAddress, .memory_order_acquire)
        let offset = bufferOffset % totalSize
        memmove(buffer + offset, bytes, size)
        atomicAdd(size, writeAddress, .memory_order_acq_rel)
    }

    func read(_ bytes: UnsafeMutableRawPointer, _ size: Int) {
        let bufferOffset = atomicAdd(0, readAddress, .memory_order_acquire)
        let offset = bufferOffset % totalSize
        memmove(bytes, buffer + offset, size)
        atomicAdd(size, readAddress, .memory_order_acq_rel)
    }

although some other combinations work as well (e.g. memory_order_consume + memory_order_seq_cst). What combination is the best?

Is it normal to use "atomicAdd(0, address, ...)" as an "atomic" getter?

tera · December 5, 2023, 12:00am

Yep, once I enabled thread sanitiser and runtime exclusivity checks (maybe only one of those is required but I enabled both) I started getting exclusivity access violations at runtime with the old code.

The above solution based on externally storage works good. I also managed to find a working solution that doesn't involve using external storage. A simplified version of a ring buffer implementation (storage and read/write size parameters are limited to 1 byte):

// getAddress
// usage: getAddress(&c.field)
// where c is a class instance and field is a stored property of simple type.
// in this case the address passed is stable and could be used afterwards.
func getAddress<T>(_ v: UnsafeMutablePointer<T>) -> UnsafeMutablePointer<T> { v }

class RingBuffer {
    let totalSize = 1
    var readPos = 0
    var writePos = 0
    var buffer: UInt8 = 0
    var bufferAddress: UnsafeMutablePointer<UInt8>
    var readAddress: UnsafeMutablePointer<Int>
    var writeAddress: UnsafeMutablePointer<Int>

    init() {
        bufferAddress = getAddress(&buffer)
        readAddress = getAddress(&readPos)
        writeAddress = getAddress(&writePos)
    }
    var freeSize: Int {
        // acqure memory order
        Int(totalSize) - usedSize
    }
    var usedSize: Int {
        // acqure memory order
        atomic_add(writeAddress, 0, .acquire) - atomic_add(readAddress, 0, .acquire)
    }
    func writeByte(_ byte: UInt8) {
        precondition(freeSize >= 1)             // acqure memory order
        bufferAddress.pointee = byte            // nonatomic assess is fine here
        atomic_add(writeAddress, 1, .acq_rel)   // acqure + release memory order
    }
    func readByte() -> UInt8 {
        precondition(usedSize >= 1)             // acqure memory order
        let byte = bufferAddress.pointee        // nonatomic assess is fine here
        atomic_add(readAddress, 1, .acq_rel)    // acqure + release memory order
        return byte
    }
}

extension memory_order {
    static let relaxed = memory_order_relaxed
    static let consume = memory_order_consume
    static let acquire = memory_order_acquire
    static let release = memory_order_release
    static let acq_rel = memory_order_acq_rel
    static let seq_cst = memory_order_seq_cst
}

This is a simplified version, full version uses storage and read/write size parameters of arbitrary sizes. I wasn't able to break this approach so far (Thread sanitizer and exclusivity checks are enabled).

tera · December 5, 2023, 3:48am

Applying the same approach to the original problem would result into this version:

class Edge {
    var totalActionValue: Float16 = 0.0
    var visitCountStorage: UInt32 = 0
    var virtualLossStorage: UInt8 = 0
    let visitCountAddress: UnsafeMutablePointer<UInt32>
    let virtualLossAddress: UnsafeMutablePointer<UInt8>

    init() {
        visitCountAddress = getAddress(&visitCountStorage)
        virtualLossAddress = getAddress(&virtualLossStorage)
    }
    
    func visit() {
        OSAtomicAdd32(1, visitCountAddress)
    }
    var visitCount: UInt32 {
        UInt32(bitPattern: OSAtomicAdd32(0, visitCountAddress))
    }
}

Note the 32-48 bytes overhead on top of useful payload.

jrose · December 5, 2023, 5:27am

That getAddress isn't safe. That's what everyone's saying. It's not just exclusivity, it's that you have no guarantee the compiler won't insert a temporary, or poison the address, or anything else.

Beyond that, your acquire/release is in the wrong order with your read to function as a fence, and…you're not actually using writeAddress and readAddress to change which position you read and write from.

You should definitely be testing data structures like this under both ASan and TSan.

tera · December 5, 2023, 12:08pm

Realistically if compiler doesn't use a temporary to form a mutable pointer today it is highly unlikely it will be changed to use a temporary in the future as it will break many things, so effectively this is a part of contract de-facto. The change is not impossible though, so if/when/once this happens the code would need revising, potentially using something totally new and cool we can't imagine today.

What is it?

        precondition(freeSize >= 1)             // acqure memory order
        bufferAddress.pointee = byte            // nonatomic assess is fine here
        atomic_add(writeAddress, 1, .acq_rel)   // acqure + release memory order

As I mentioned above this pair works (according to limited tests on ARM) but some other pairs work as well. What would be the best here?

That's right and correct in this particular case as I am using a "1-byte" ring buffer in this simplified example (all possible read/write positions are 0).

Those are the best friends

jrose · December 6, 2023, 2:54am

Ah, this was my mistake, I wasn't taking into account the fact that the increments were acquiring and releasing every time. That part should be safe, I believe.

As in, it would be valid for ASan or a similar kind of tool (including a hardened runtime implementation) to refuse to allow reads or writes to the storage without exclusivity checks. Passing class storage via inout simply does not produce a pointer you can use for the lifetime of the class. You can get "miscompiles" this way today if you go looking for them, but it's undefined behavior regardless.

(I'm kind of surprised ASan doesn't check for this today, but I guess no one got around to implementing it.)

As written, your code is already relying on a particular optimization: the class has no subclasses that override the property. (You can use final to prevent this, of course.)

It does not matter that your code appears to behave correctly today. Do not do this.

tera · December 6, 2023, 3:38am

I like to live dangerously. If it works today - that all that matters to me. And if it breaks tomorrow - fine, I'll fix it right there and then; hopefully the proper atomics will land in Swift by the time so will use those. YAGNI at its finest. I doubt it will break during my lifetime and would welcome any improvements to TSAN/ASAN/compiler that break this code (would in fact willing to pay a reasonable monetary contribution for that to happen).