Is there any way to ensure vars (atomics) are laid out on separate cache lines?

I'm reproducing a FIFO queue in Swift using Swift Atomics. The original implementation uses cache line padding to boost performance. Is there any equivalent way to do it in Swift?

The layout in code is like this:

import Atomics

public final class BroadcastChannel<Element> {
  
  @usableFromInline
  internal struct Cell {
    
    @usableFromInline
    let sequence: UnsafeAtomic<Int>
    @usableFromInline
    let data: UnsafeMutablePointer<Element>
    
    init(sequence: Int) {
      self.sequence = .create(sequence)
      self.data = .allocate(capacity: 1)
    }
  }
  
  public let capacity: Int
  public let mask: Int
  @usableFromInline
  internal let headIndex: UnsafeAtomic<Int>
  @usableFromInline
  internal let tailIndex: UnsafeAtomic<Int>
  @usableFromInline
  internal let buffer: UnsafeMutableBufferPointer<Cell>

  // rest of implementation
}

So long as you provide properly aligned memory locations to the unsafe atomics API, that should be possible.

Swift does not currently provide enough control over memory layout to explicitly describe such constraints directly within the language. (E.g., the underscored @_alignment attribute is roughly in the same ballpark -- but not quite, and IIRC it only supports alignments up to 16 bytes.)

However, since you're allocating memory dynamically anyway, you can use UnsafeMutableRawPointer.allocate(byteCount:alignment:) to manually set up cache line aligned & sized storage for your atomic values. (I believe it does support higher alignments, although this needs to be carefully verified.)
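A minimal sketch of that allocation, using only the standard library. The helper name `allocateCacheLine(lineSize:)` is made up for illustration, and the line size is passed in as an assumption (64 bytes is common on x86-64; Apple silicon typically reports 128). The precondition checks the returned address instead of trusting the allocator:

```swift
/// Hypothetical helper: allocates one cache line's worth of bytes,
/// starting on a cache line boundary.
func allocateCacheLine(lineSize: Int) -> UnsafeMutableRawPointer {
  precondition(lineSize > 0 && lineSize & (lineSize - 1) == 0,
               "line size must be a power of two")
  let raw = UnsafeMutableRawPointer.allocate(byteCount: lineSize, alignment: lineSize)
  // Verify rather than trust: the address must be a multiple of the line size.
  precondition(UInt(bitPattern: raw) % UInt(lineSize) == 0,
               "allocator ignored the requested alignment")
  return raw
}

let line = allocateCacheLine(lineSize: 128) // 128 is an assumed line size
let counter = line.initializeMemory(as: Int.self, repeating: 0, count: 1)
counter.pointee += 1
line.deallocate()
```

Sizing the allocation to a full line (not just the size of the value) is what keeps unrelated allocations from landing on the same line.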

The cache line size varies between architectures and it can be queried at runtime using the hw.cachelinesize sysctl on Darwin. Unfortunately we don't yet have an easy way to get this information (and similar low-level data, like page sizes, L1/L2/L3 cache sizes, core counts) in a platform agnostic way. (Even calling sysctl correctly in Swift is a challenging puzzle. We really should add the solution to that particular sub-task to the swift-system package, and we should also consider adding platform independent APIs to query such data to the stdlib.)
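For the Darwin case, something along these lines should work. This is a sketch, not a vetted implementation: the function name `cacheLineSize()` is hypothetical, it returns nil on other platforms instead of guessing, and it assumes the value fits in 8 bytes on a little-endian machine (so a 4-byte result written into the zeroed buffer still reads back correctly):

```swift
#if canImport(Darwin)
import Darwin
#endif

/// Hypothetical helper: the cache line size in bytes, or nil if it cannot be queried.
func cacheLineSize() -> Int? {
  #if canImport(Darwin)
  var value: Int64 = 0                 // zeroed, so a shorter result is still valid
  var size = MemoryLayout<Int64>.size  // in: buffer size, out: bytes written
  let status = sysctlbyname("hw.cachelinesize", &value, &size, nil, 0)
  guard status == 0, value > 0 else { return nil }
  return Int(value)
  #else
  return nil                           // no portable query is assumed here
  #endif
}

if let size = cacheLineSize() {
  print("cache line size: \(size) bytes")
} else {
  print("cache line size unavailable on this platform")
}
```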

The eventual noncopyable Atomic type is expected to provide "inline" storage for atomics, which will give us an opportunity to hide such nitty gritty details. However, I don't know if we'd want the default Atomic type to always occupy a full cache line itself -- perhaps we'd prefer to have it use tight layout (i.e., the same alignment/size as its value representation), and defer layout issues in the hope that Swift will eventually gain precise (opt-in) control over the memory layout of types containing these atomics. (I expect that will have to happen, for Swift to become actually usable for systems programming -- but I don't have any insight into when. :stuck_out_tongue:)


Thanks for the reply, it's really insightful – agree it would be great to see language support.

I guess in the meantime I'll see if I can work out how to call sysctl and create some kind of UnsafePaddedAtomic<T> wrapping type. :slight_smile:

This is correct. Static alignment for Swift types is limited to 16B (you can define types with more-than-16B alignment in a C header, but we might drop it on the floor if you pass/return them by value in a Swift function; @beccadax would know).

UnsafeMutableRawPointer.allocate(byteCount:alignment:) will give you whatever alignment you ask for, so long as the underlying platform allocator supports it.


You shouldn't need a new type because UnsafeAtomic<T> is already just a pointer to the atomic. As long as you initialize it with a properly-aligned pointer you should be good to go.
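To illustrate what "properly aligned" could look like for something like the head and tail indices, here is a rough standard-library-only sketch that carves one allocation into cache-line-sized slots. `PaddedSlots` is a made-up name and the 64-byte line size is an assumption; it uses plain Int storage to stay self-contained, whereas for real atomics each slot would hold the value's AtomicRepresentation and the pointer would be handed to UnsafeAtomic's init(at:):

```swift
/// Hypothetical helper: `count` Int-sized slots, each on its own cache line.
struct PaddedSlots {
  let base: UnsafeMutableRawPointer
  let lineSize: Int
  let count: Int

  init(count: Int, lineSize: Int) {
    precondition(count > 0 && lineSize >= MemoryLayout<Int>.stride)
    precondition(lineSize & (lineSize - 1) == 0, "line size must be a power of two")
    self.count = count
    self.lineSize = lineSize
    // One allocation, aligned to a line and spanning `count` whole lines.
    base = .allocate(byteCount: count * lineSize, alignment: lineSize)
    for i in 0..<count {
      // Zero-initialize the first Int of each line; the rest is padding.
      (base + i * lineSize).initializeMemory(as: Int.self, repeating: 0, count: 1)
    }
  }

  /// Typed pointer to slot `i`; consecutive slots are `lineSize` bytes apart.
  func slot(_ i: Int) -> UnsafeMutablePointer<Int> {
    precondition(i >= 0 && i < count)
    return (base + i * lineSize).assumingMemoryBound(to: Int.self)
  }

  func deallocate() { base.deallocate() }
}

let slots = PaddedSlots(count: 2, lineSize: 64) // 64 is an assumed line size
let head = slots.slot(0)
let tail = slots.slot(1)
head.pointee = 1
tail.pointee = 2
slots.deallocate()
```

Using one allocation for all slots keeps cleanup to a single deallocate call, at the cost of having to manage the slots' lifetime together.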


Ah, yes – I hadn't noticed the .init(at:) initialiser. Thanks!

Yes, Swift 5.0 and later drop alignments greater than 16 bytes on the floor, even when defined in C. Here is a closed issue on the topic.

You can do arbitrary alignments manually with code like this:

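import Foundation

func foo(size: Int, alignment: Int) {
    // Over-allocate so an aligned address is guaranteed to fall inside the block.
    let extra = alignment - 1
    let originalPointer = calloc(1, size + extra)!
    // Round the start address up to the next multiple of `alignment`.
    let unaligned = Int(bitPattern: originalPointer) + extra
    let remainder = unaligned % alignment
    let r = unaligned - remainder
    precondition(r % alignment == 0)
    let pointer = UnsafeMutablePointer<UInt8>(bitPattern: r)! // properly aligned pointer
    // ...
    free(originalPointer) // always free the original pointer, not the aligned one
}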

Note, this is quite tricky to do correctly (unless you use locks, but as you are using atomics I assume you don't want locks), even before you start worrying about cache line optimisations. I remember seeing a lock-free FIFO queue based on a singly linked list – quite a clever implementation that avoided locks, but it might not be applicable in all scenarios (IIRC it was useful for file I/O).

In the end I went with this:

import Atomics

extension UnsafeAtomic {
  
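  static func createCacheAligned(_ initialValue: Value) -> Self {
    let alignment = Int(Sysctl.cacheLineSize)
    let size = MemoryLayout<Value.AtomicRepresentation>.size
    // Round the size up to whole cache lines so nothing else shares the line.
    let byteCount = (size + alignment - 1) / alignment * alignment
    let rawPtr = UnsafeMutableRawPointer.allocate(byteCount: byteCount, alignment: alignment)
    // Bind the fresh raw memory to the storage type before initializing it.
    let ptr = rawPtr.bindMemory(to: Value.AtomicRepresentation.self, capacity: 1)
    ptr.initialize(to: Value.AtomicRepresentation(initialValue))
    return Self(at: ptr)
  }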
}

Where Sysctl.cacheLineSize pulls its value in from Darwin's sysctlbyname(_:_:_:_:_:).

Yeah, definitely. "Playing with razor blades." Luckily, I'm porting a tried and tested algorithm. Working so far... :crossed_fingers:
