Is this concurrentMap implementation efficient?

I'm exploring concurrentMap implementations looking for a better solution. I started with John Sundell's CollectionConcurrencyKit, here:

func concurrentMap<T>(
    withPriority priority: TaskPriority? = nil,
    _ transform: @escaping (Element) async -> T
) async -> [T] {
    let tasks = map { element in
        Task(priority: priority) {
            await transform(element)
        }
    }
    
    return await tasks.asyncMap { task in
        await task.value
    }
}

Is it ok to create a Task for every collection item, assuming I want to run concurrentMap on huge collections of maybe a million elements?

Another implementation I'm considering limits the number of Tasks (making it a parameter), then uses evenlyChunked(in: maxTasks) and performs maxTasks synchronous maps. Would that be faster/safer?

Chunking is almost certainly a good idea for perf, but may result in the "long pole" problem where one long-running chunk ends up taking a disproportionate amount of time.

The ideal solution likely involves work-stealing queues to deal with that, but simple chunking is sufficient for most people most of the time.
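For what it's worth, here is a rough sketch of the chunked approach described above: split the input into maxTasks roughly equal chunks, map each chunk serially inside one child task, then reassemble in order. It assumes swift-algorithms for evenlyChunked(in:); the extension name and parameter are hypothetical, not from any library mentioned in the thread.

```swift
import Algorithms // assumed dependency: swift-algorithms, for evenlyChunked(in:)

extension Collection where Element: Sendable {
    /// Hypothetical chunked variant: one child task per chunk,
    /// serial map inside each chunk, results stitched back in order.
    func chunkedConcurrentMap<T: Sendable>(
        maxTasks: Int,
        _ transform: @escaping @Sendable (Element) async -> T
    ) async -> [T] {
        await withTaskGroup(of: (Int, [T]).self) { group in
            let chunks = Array(evenlyChunked(in: maxTasks))
            for (index, chunk) in chunks.enumerated() {
                group.addTask {
                    var partial = [T]()
                    partial.reserveCapacity(chunk.count)
                    for element in chunk {
                        partial.append(await transform(element))
                    }
                    return (index, partial)
                }
            }
            // Chunks finish in arbitrary order; index them back into place.
            var results = [[T]](repeating: [], count: chunks.count)
            for await (index, partial) in group {
                results[index] = partial
            }
            return results.flatMap { $0 }
        }
    }
}
```

Note this is exactly the shape that suffers from the "long pole" problem: if one chunk is slow, the whole call waits on it.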


See these implementations in async-collections:


Oh and as always: there's no substitute for empirical data. Profile early, profile often.


@MahdiBM, they use the same approach with a Task for every element.

@David_Smith, yeah, I already bumped into a "long pole".

I'm not too good at Swift concurrency (yet), but I've never seen a proper / first-party library create new Tasks through the initializer in a loop, like the library you mentioned does.
I can't point to exactly what is wrong or suboptimal about that code, but it looks off to me.
So using a task group is definitely the better way IMO, although I don't think anything is functionally wrong with just creating new Tasks.

Also, in the link I provided there is a fairly smart and easy "maxConcurrentTasks" implementation, which might be of use to you to limit the number of tasks spawned rather than spawning as many tasks as Swift lets you.
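For readers who haven't followed the link, the usual shape of a bounded-concurrency map with a task group looks roughly like this. This is my own sketch of the pattern, not the library's code; the names boundedConcurrentMap and maxConcurrentTasks are illustrative.

```swift
extension Sequence where Element: Sendable {
    /// Sketch of a bounded-width concurrent map: at most `maxConcurrentTasks`
    /// child tasks are in flight at once; results come back in input order.
    func boundedConcurrentMap<T: Sendable>(
        maxConcurrentTasks: Int,
        _ transform: @escaping @Sendable (Element) async -> T
    ) async -> [T] {
        await withTaskGroup(of: (Int, T).self) { group in
            var tagged = [(Int, T)]()
            var inFlight = 0
            for (index, element) in self.enumerated() {
                if inFlight >= maxConcurrentTasks {
                    // One in, one out: drain a finished task before adding another.
                    if let finished = await group.next() {
                        tagged.append(finished)
                        inFlight -= 1
                    }
                }
                group.addTask {
                    let value = await transform(element)
                    return (index, value)
                }
                inFlight += 1
            }
            for await finished in group {
                tagged.append(finished)
            }
            // Tasks complete out of order, so sort by the original index.
            return tagged.sorted { $0.0 < $1.0 }.map(\.1)
        }
    }
}
```

The nice property of this shape versus fixed chunking is that it naturally load-balances: as soon as one element finishes, the next one starts, so a single slow element doesn't hold back a whole chunk.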

Let's wait for a Concurrency expert to explain how well tasks are optimized; maybe it's fine to create a huge number of them. But my first thought would be: not ok, because a Task carries a context with it.

Thanks for pointing out maxConcurrentTasks. I didn't scroll that page all the way down. I guess it works for me right now. A Concurrency expert is still welcome, though; I'd like to be educated.

A task group also feels to me like the better way to do the parallel computation. I suspect that John's asyncMap() isn't actually concurrent: it just awaits every element in a loop. Here:

func asyncMap<T>(
    _ transform: (Element) async throws -> T
) async rethrows -> [T] {
    var values = [T]()

    for element in self {
        try await values.append(transform(element))
    }

    return values
}