Thanks for your sample code. Ever since I read about those potential deadlocks in the interaction between GCD (a.k.a. Dispatch) and Swift concurrency, I haven't felt comfortable with GRDB's async SQLite accesses, which use GCD under the hood. No one has ever reported this issue, but maybe people are shy.
So I tried to reproduce the deadlock, inspired by your example, with a `DatabasePool`, the database connection that can perform parallel reads:
```swift
func testTaskGroup() async throws {
    let dbPool = try makeDatabasePool()
    try await dbPool.write { db in
        try db.execute(sql: "CREATE TABLE t(a)")
    }
    try await withThrowingTaskGroup(of: Int.self) { group in
        for i in 1...1000 {
            group.addTask {
                print("Task \(i) starting")
                return try await dbPool.read(Table("t").fetchCount)
            }
        }
        for try await count in group {
            print(count)
        }
    }
}
```
And... This works. No deadlock. Xcode 15.3, macOS 14.3.1, M1 Max.
Screenshot in the middle of the 1000 concurrent tasks:
This is a serious topic, so I'll share precisely how GCD is used here, even if I don't understand everything.
The `read` async method just uses a continuation on top of a completion-based method:
```swift
public func read<T: Sendable>(_ value: @escaping @Sendable (Database) throws -> T) async throws -> T {
    try await withUnsafeThrowingContinuation { continuation in
        asyncRead { result in
            do {
                try continuation.resume(returning: value(result.get()))
            } catch {
                continuation.resume(throwing: error)
            }
        }
    }
}
```
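As an aside, the same bridging technique can be sketched in a self-contained way. Everything below is illustrative, not GRDB API: `legacyFetch` is a made-up completion-based function standing in for `asyncRead`, and `fetch` plays the role of `read`:

```swift
import Dispatch

// Hypothetical completion-based API, standing in for `asyncRead`.
func legacyFetch(completion: @escaping @Sendable (Result<Int, Error>) -> Void) {
    DispatchQueue.global().async {
        completion(.success(42))
    }
}

// Bridge to async/await with a continuation, as `read` does above.
// The continuation must be resumed exactly once.
func fetch() async throws -> Int {
    try await withUnsafeThrowingContinuation { continuation in
        legacyFetch { result in
            continuation.resume(with: result)
        }
    }
}
```

Note that `resume(with:)` accepts a `Result` directly, which avoids the do/catch dance when the completion value needs no further transformation (GRDB's `read` needs it because it still has to run the `value` closure on the fetched result).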
The `asyncRead` method asynchronously acquires a connection from a pool of SQLite connections. The pool has a maximum size: some reads must wait until a reader becomes available (this limit is enforced by a `DispatchSemaphore` that we'll see below). The call to `pool.asyncGet` below eventually provides an available connection, along with a release closure that must be called when we're done with the connection (so that the connection becomes available for another read). Slightly simplified, while retaining the important GCD parts:
```swift
public func asyncRead(_ value: @escaping @Sendable (Result<Database, Error>) -> Void) {
    readerPool.asyncGet { result in
        do {
            let (reader, releaseReader) = try result.get()
            // Second async jump because that's how `Pool.asyncGet` has to be used.
            reader.async { db in
                value(.success(db))
                releaseReader(.reuse)
            }
        } catch {
            value(.failure(error))
        }
    }
}
```
The interesting comment above is "Second async jump because that's how `Pool.asyncGet` has to be used". The reason for this async jump (we're talking about plain GCD `DispatchQueue.async`) is this post by @soroush, where he explains how to avoid thread explosion with GCD. Importantly, each concurrent job that has to wait for the semaphore is enqueued in one serial DispatchQueue (see the blog post). Here's the code of `readerPool.asyncGet`:
```swift
/// Eventually produces a tuple (element, release), where element is
/// intended to be used asynchronously.
///
/// Client must call release(), only once, after the element has been used.
///
/// - important: The `execute` argument is executed in a serial dispatch
///   queue, so make sure you use the element asynchronously.
func asyncGet(_ execute: @escaping @Sendable (Result<ElementAndRelease, Error>) -> Void) {
    // Inspired by https://khanlou.com/2016/04/the-GCD-handbook/
    // > We wait on the semaphore in the serial queue, which means that
    // > we’ll have at most one blocked thread when we reach maximum
    // > executing blocks on the concurrent queue. Any other tasks the user
    // > enqueues will sit inertly on the serial queue waiting to be
    // > executed, and won’t cause new threads to be started.
    semaphoreWaitingQueue.async {
        execute(Result { try self.get() })
    }
}
```
The `DispatchSemaphore` is finally there, in `pool.get()` (again, simplified):
```swift
typealias ElementAndRelease = (element: T, release: @Sendable (PoolCompletion) -> Void)

/// Returns a tuple (element, release).
/// Client must call release(), only once, after the element has been used.
func get() throws -> ElementAndRelease {
    itemsSemaphore.wait()
    ...
    return (element: element, release: {
        itemsSemaphore.signal()
    })
}
```
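To make the whole pattern concrete, here is a self-contained sketch that combines the serial waiting queue and the semaphore. All names here (`Limiter`, `maxConcurrent`, ...) are mine, not GRDB's; the point is only the shape of the technique: the semaphore caps concurrency while at most one thread ever blocks on it.

```swift
import Dispatch

// Illustrative limiter, not GRDB API: at most `maxConcurrent` jobs run at
// once, and all waiting happens on ONE serial queue, so at most one thread
// is blocked on the semaphore at any time (no thread explosion).
final class Limiter: @unchecked Sendable {
    private let semaphore: DispatchSemaphore
    private let semaphoreWaitingQueue = DispatchQueue(label: "limiter.waiting")
    private let workQueue = DispatchQueue(label: "limiter.work", attributes: .concurrent)

    init(maxConcurrent: Int) {
        semaphore = DispatchSemaphore(value: maxConcurrent)
    }

    func async(_ work: @escaping @Sendable () -> Void) {
        semaphoreWaitingQueue.async {
            // The only blocking wait, on the serial queue's single thread.
            self.semaphore.wait()
            self.workQueue.async {
                work()
                self.semaphore.signal()
            }
        }
    }
}
```

Enqueuing a thousand jobs on a `Limiter(maxConcurrent: 4)` keeps the number of simultaneously running jobs at four or fewer, with a single blocked thread instead of hundreds.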
That's the complete picture of GCD usage in GRDB. I omitted a few details in the pool that look unrelated to thread creation, such as a concurrent queue and a dispatch group that make it possible to run some code with the guarantee that no connection is in use (similar to the "barrier" flag of dispatch queues). These features are not used in the test.
I thought it could be interesting to share this experience. I'm not sure what workaround @ole is asking about, but it looks like it is possible to write GCD code that plays well with Swift concurrency?