What are the benefits of ManagedBuffer?

I have a buffer class that looks like this:

class Buffer1 {
    inner: ContiguousArray<UInt8>
}

For learning and maybe performance I'm testing this against a ManagedBuffer.

My understanding is that ManagedBuffer should help because it requires only one allocation, while my Buffer1 requires two (one for the class instance and one for the array's element storage). The single allocation should also give better cache locality when reading data from multiple buffers.
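To make the "one allocation vs. two" claim concrete, here is a minimal sketch (class names `ArrayBacked` and `InlineBacked` are just illustrative stand-ins for Buffer1 and Buffer2). With ManagedBuffer, the element pointer sits just a few words past the object's own address, because header and elements share one heap block; with the array-backed class, the element storage is a separate allocation at an unrelated address:

```swift
// Stand-in for Buffer1: the class instance and the array storage are
// two separate heap allocations.
final class ArrayBacked {
    var inner: ContiguousArray<UInt8>
    init(_ size: Int) { inner = .init(repeating: 1, count: size) }
}

// Stand-in for Buffer2: header (the count) and elements live in one block.
final class InlineBacked: ManagedBuffer<Int, UInt8> {
    static func make(_ size: Int) -> InlineBacked {
        unsafeDowncast(InlineBacked.create(minimumCapacity: size) { buf in
            buf.withUnsafeMutablePointerToElements { $0.assign(repeating: 1, count: size) }
            return size
        }, to: InlineBacked.self)
    }
}

let inline = InlineBacked.make(1024)
let objAddr  = UInt(bitPattern: Unmanaged.passUnretained(inline).toOpaque())
let elemAddr = inline.withUnsafeMutablePointerToElements { UInt(bitPattern: $0) }
// Elements start a few tens of bytes past the object header: same allocation.
print("inline element offset:", elemAddr - objAddr)

let boxed = ArrayBacked(1024)
let boxedObj  = UInt(bitPattern: Unmanaged.passUnretained(boxed).toOpaque())
let boxedElem = boxed.inner.withUnsafeBufferPointer { UInt(bitPattern: $0.baseAddress) }
// Separate heap block: the distance here is arbitrary.
print("array storage distance:", boxedElem > boxedObj ? boxedElem - boxedObj : boxedObj - boxedElem)
```

The exact offset for the inline case depends on the object header layout, but it should be well under a cache line on a 64-bit toolchain.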

I ask because I'm having a hard time making ManagedBuffer stand out in my simple performance tests below. I'm allocating about 1 GB of data across 65,000 buffers, and after allocation I touch each buffer's data.

When I profile these tests, the results are close:

  • Buffer1: 1.8 seconds
  • Buffer2: 1.6 seconds

ManagedBuffer is faster, but not by a huge amount. Is that about what you would expect or am I missing some other benefits or optimizations?

Thanks,
Jesse

import XCTest

class Buffer1 {
    var inner: ContiguousArray<UInt8>
    init(_ size: Int) {
        self.inner = .init(repeating: 1, count: size)
    }
}

class Buffer2: ManagedBuffer<Int, UInt8> {
    static func create(_ size: Int) -> Self {
        // ManagedBuffer.create(minimumCapacity:makingHeaderWith:) returns
        // ManagedBuffer<Int, UInt8>, but it allocates an instance of the class
        // it was called on, so the downcast to Self is safe.
        unsafeDowncast(Buffer2.create(minimumCapacity: size) { buffer in
            buffer.withUnsafeMutablePointerToElements { pointer in
                pointer.assign(repeating: 1, count: size)
            }
            return size // stored as the header: the element count
        }, to: Self.self)
    }
}

final class BufferTests: XCTestCase {

    let bufferCount = 65000

    static var bufferSize: Int = {
        let capacityInBytes = 16000
        let elementsCapacityInBytes = capacityInBytes - MemoryLayout<Int>.stride // count
        let elementCount = elementsCapacityInBytes / MemoryLayout<UInt8>.stride
        return elementCount
    }()
    
    var buffers1: [Buffer1] = []
    var buffers2: [Buffer2] = []

    override func setUp() {
        buffers1 = []
        buffers1.reserveCapacity(bufferCount)
        buffers2 = []
        buffers2.reserveCapacity(bufferCount)
    }
    
    override func tearDown() {
        buffers1 = []
        buffers2 = []
    }

    func testBuffer1() throws {
        let size = Self.bufferSize
        
        measure {
            // create buffers
            for _ in 0..<bufferCount {
                buffers1.append(.init(size))
            }
            
            // touch buffers
            for b in buffers1 {
                assert(b.inner[0] == 1)
            }
            
            buffers1.removeAll()
        }
    }

    func testBuffer2() throws {
        let size = Self.bufferSize
        
        measure {
            // create buffers
            for _ in 0..<bufferCount {
                buffers2.append(.create(size))
            }
            
            // touch buffers
            for b in buffers2 {
                b.withUnsafeMutablePointerToElements { pointer in
                    assert(pointer.pointee == 1)
                }
            }
            
            buffers2.removeAll()
        }
    }   
}

The prefetcher is probably recognizing that any time your Buffer1 instance’s memory is accessed, it results in an immediate dereference and load of its inner array’s buffer. You’re also accessing that memory linearly, which modern CPUs are extremely well optimized for.

I’d try random access patterns.
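A minimal sketch of what a random-access touch phase could look like (sizes are shrunk here so it runs quickly; in the real tests you would shuffle indices into `buffers1`/`buffers2` instead):

```swift
let bufferCount = 1_000
let bufferSize = 16_000

// Allocate array-backed buffers, as in testBuffer1.
let buffers: [ContiguousArray<UInt8>] = (0..<bufferCount).map { _ in
    ContiguousArray(repeating: 1, count: bufferSize)
}

// Visit the buffers in a shuffled order so the hardware prefetcher
// cannot predict the address of the next buffer's storage.
var order = Array(buffers.indices)
order.shuffle()

var sum = 0
for i in order {
    // One byte per buffer; with a random order each read is a likely cache miss.
    sum += Int(buffers[i][0])
}
print(sum)
```

The same shuffled-index loop would go around the `withUnsafeMutablePointerToElements` reads in testBuffer2, so both tests pay for the unpredictable access pattern.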

If I'm reading the source right (and I might not be, as I'm reading it in the browser rather than looking at a running program), prefetching has nothing to do with this. The sample code is initializing ~16,000 bytes for each buffer or contiguous array it allocates, which completely dominates any overhead from the buffer header being inline versus out-of-line. In as much as that matters as an optimization, it only matters when buffers are small (a few cache lines or so). I'm actually pretty surprised that there's as much of a difference as reported.

Further, because you're repeatedly allocating and freeing allocations of exactly the same size, it is quite likely that the allocator keeps handing you back the memory you just freed, further hiding any differences between the allocation patterns of these two strategies.
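One way to sidestep that allocator reuse in a benchmark is to perturb the request size per iteration, so the allocations are spread across several malloc size classes instead of recycling one block. A sketch (the base size and offsets are arbitrary choices, not anything from the original tests):

```swift
let baseSize = 16_000
let bufferCount = 65_000

// Vary each request so consecutive allocations land in different size
// classes; a just-freed block is then less likely to be handed straight back.
let sizes = (0..<bufferCount).map { baseSize + ($0 % 8) * 1_024 }

// The allocation loops would then use the varied size, e.g.:
//     buffers2.append(Buffer2.create(sizes[i]))
// instead of a single fixed size for every buffer.
print(Set(sizes).sorted())
```

This keeps the total work roughly comparable while making the allocator do real work on each iteration.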
