Input on a per-resource serialized access coordinator with global parallelism

Hey, I am currently improving a performance-critical pipeline and landed on the topic of a per-resource serialized access coordinator with global parallelism, plus a small abstraction for resource management via, e.g., LRU/TTL eviction.

I am now struggling to decide between the new structured concurrency model and lower-level, more controllable options such as unfair locks, threads, etc.

The current architecture uses a class with an actor inside for bookkeeping.

// Defaults: Queue depth 512, In-flight limit 10
let coordinator = try ResourceAccessCoordinator(resources: [
    UnsafeResource(id: "resource_1"),
    UnsafeResource(id: "resource_2")
])
try await coordinator.enableLRU(depth: 2)

try await coordinator.access(resourceID: "resource_1") { res in
    // ...
}

// runs concurrently with the 'resource_1' access above
try await coordinator.access(resourceID: "resource_2") { res in
    // ...
}

// serialized behind the previous 'resource_2' access
try await coordinator.access(resourceID: "resource_2") { res in
    // ...
}

// try await coordinator.add(UnsafeResource(id: "resource_3")) // could/would trigger LRU eviction
// try await coordinator.remove(resourceID: "resource_2")

// try await coordinator.enqueueRemoval(resourceID: "resource_2")
// try await coordinator.enqueueAddition(UnsafeResource(id: "resource_3"))
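For context, the "class with an actor inside for bookkeeping" shape could look roughly like the sketch below: the inner actor owns the per-ID serial chains, the outer class is the public surface, and awaiting happens outside the actor so accesses to different IDs run in parallel. All names here (Coordinator, Bookkeeping, chain, MissingResource) are illustrative, not the actual ResourceAccessCoordinator API, and the LRU/queue-depth parts are omitted.

```swift
// Sketch: per-resource serialization via a chained "tail" task per ID,
// global parallelism because different IDs have independent chains.
final class Coordinator<Resource: Sendable>: Sendable {
    struct MissingResource: Error {}

    private actor Bookkeeping {
        var resources: [String: Resource] = [:]
        // Tail of the serial chain per ID: each new access awaits the
        // previous one before running. (Sketch: tails are never pruned.)
        var tails: [String: Task<Void, Never>] = [:]

        func add(_ resource: Resource, id: String) { resources[id] = resource }

        func chain<T: Sendable>(
            _ id: String,
            _ body: @escaping @Sendable (Resource) async throws -> T
        ) throws -> Task<T, Error> {
            guard let resource = resources[id] else { throw MissingResource() }
            let previous = tails[id]
            let task = Task {
                await previous?.value // serialize behind prior access to same ID
                return try await body(resource)
            }
            tails[id] = Task { _ = try? await task.value }
            return task
        }
    }

    private let bookkeeping = Bookkeeping()

    func add(_ resource: Resource, id: String) async {
        await bookkeeping.add(resource, id: id)
    }

    func access<T: Sendable>(
        resourceID: String,
        _ body: @escaping @Sendable (Resource) async throws -> T
    ) async throws -> T {
        // One hop into the actor to enqueue; awaiting the task's value
        // happens outside the actor, so the actor stays free while
        // 'body' runs (global parallelism across IDs).
        try await bookkeeping.chain(resourceID, body).value
    }
}
```

Note this pays exactly the costs discussed below: one actor hop plus one or two unstructured Task allocations per access.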

The motivation here is to manage, e.g., file I/O for tensor-like file-based storage with minimal overhead; it would help to hold pooled FileDescriptor channels open until LRU eviction kicks in. However, I measured the actor-related management and task-creation overhead on my MacBook Pro M4 at around 4 µs per operation (not sure if I measured correctly: t_total / count_op = avg_op_duration). That is not much on its own, but it adds up quickly since we interact with the data storage a lot.
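In case it helps to compare numbers, here is a minimal micro-benchmark sketch along the lines of that t_total / count_op measurement: N awaited actor hops, total wall time divided by N. The Bookkeeper name is made up, and the result depends heavily on build mode (measure with -O), contention, and whether the uncontended fast path avoids an actual suspension.

```swift
import Dispatch

// Minimal actor whose method we hop into repeatedly.
actor Bookkeeper {
    private var counter = 0
    func touch() -> Int { counter += 1; return counter }
}

// Returns the average cost of one awaited actor hop, in microseconds.
// This measures the uncontended case only; real per-op cost also
// includes Task creation and any continuation machinery.
func measureActorHopOverhead(iterations: Int = 100_000) async -> Double {
    let bookkeeper = Bookkeeper()
    let start = DispatchTime.now().uptimeNanoseconds
    for _ in 0..<iterations {
        _ = await bookkeeper.touch()
    }
    let elapsed = DispatchTime.now().uptimeNanoseconds - start
    return Double(elapsed) / Double(iterations) / 1_000.0 // µs per op
}
```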

I like the semantics of structured concurrency, but I also have to optimize for maximum throughput. Since we queue and resume internally, I am using withTaskCancellationHandler + withCheckedThrowingContinuation, which forces me to use Task {} twice here. The backend is swift-nio datagram-based, so we are already in the event-loop world, etc., but also not. I am really no fan of the unpredictable overhead of async/await's potential suspensions and task creation, but at the same time it feels wrong to push further into that mixture of unfair locks with withCheckedThrowingContinuation, where I could not find out whether the latter might cost more overhead than the unfair locks gain in performance.
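For what it is worth, one way the lock + continuation mixture can be written without any extra Task {} is to let the onCancel handler itself take the lock and resume the continuation synchronously. The sketch below assumes a queue of waiters keyed by UUID; WaiterQueue and its methods are made-up names, and NSLock stands in for whatever unfair lock you use (e.g. OSAllocatedUnfairLock on Apple platforms). Checking Task.isCancelled inside the lock closes the cancel-before-registration race.

```swift
import Foundation

// Queue-and-resume sketch: callers suspend on a continuation that a
// lock-protected table holds until resumeOne() or cancellation fires.
final class WaiterQueue<T: Sendable>: @unchecked Sendable {
    private let lock = NSLock() // stand-in for an unfair lock
    private var waiters: [UUID: CheckedContinuation<T, Error>] = [:]

    func suspend() async throws -> T {
        let id = UUID()
        return try await withTaskCancellationHandler {
            try await withCheckedThrowingContinuation { continuation in
                lock.lock()
                if Task.isCancelled {
                    // Cancelled before we registered: fail immediately
                    // instead of parking a continuation nobody will resume.
                    lock.unlock()
                    continuation.resume(throwing: CancellationError())
                } else {
                    waiters[id] = continuation
                    lock.unlock()
                }
            }
        } onCancel: {
            // Runs synchronously on cancellation; no Task {} needed.
            lock.lock()
            let continuation = waiters.removeValue(forKey: id)
            lock.unlock()
            continuation?.resume(throwing: CancellationError())
        }
    }

    func resumeOne(with value: T) {
        lock.lock()
        let entry = waiters.first
        if let entry { waiters.removeValue(forKey: entry.key) }
        lock.unlock()
        entry?.value.resume(returning: value)
    }
}
```

Either the onCancel handler removes the waiter and resumes it, or suspend() sees the cancellation flag under the lock, so each continuation is resumed exactly once; this keeps the unfair lock strictly on the synchronous bookkeeping path.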