Technically, yes, you can do this.
A major caveat, though: If your synchronous calculation is so slow as to warrant parallelism, then it might be so slow that we would want to keep it out of the Swift concurrency cooperative thread pool. We have a “runtime contract” with Swift concurrency that threads should always be able to make “forward progress”. We should never block a thread from the cooperative thread pool for any prolonged time.
So, if the calculation is just a stand-in for something substantially more complicated, you should consider keeping these intense calculations out of Swift concurrency entirely, and instead use, for example, GCD’s `concurrentPerform`, and then bridge back to Swift concurrency with a continuation (e.g., `withCheckedContinuation`) or an equivalent pattern. And if the calculations are simple enough that they will not necessitate this, then you should benchmark your parallel algorithm against a simple serial calculation, and verify whether there is any performance benefit from parallelism. Often, with simple algorithms, there is not, and parallel implementations can even be slower than their serial counterparts.
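For reference, that continuation bridge can be sketched as follows. This is a minimal sketch, not the asker’s code: `parallelSquares` is a hypothetical function, and the one-multiplication-per-iteration body is just a stand-in (in practice you would chunk the work, as discussed below):

```swift
import Foundation

// Do the parallel, blocking work with GCD, then hop back to Swift
// concurrency with a checked continuation.
func parallelSquares(of values: [Int]) async -> [Int] {
    await withCheckedContinuation { continuation in
        DispatchQueue.global(qos: .utility).async {
            var results = [Int](repeating: 0, count: values.count)
            results.withUnsafeMutableBufferPointer { buffer in
                // `concurrentPerform` blocks this GCD thread, not a thread
                // from the cooperative pool. Each iteration writes a distinct
                // index, so there is no contention between iterations.
                DispatchQueue.concurrentPerform(iterations: buffer.count) { index in
                    buffer[index] = values[index] * values[index]
                }
            }
            // Resume exactly once, after all iterations have finished.
            continuation.resume(returning: results)
        }
    }
}
```

From an `async` context, the caller simply does `let squares = await parallelSquares(of: input)`, never blocking a cooperative-pool thread.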
A few observations:
- We generally stride in order to chunk our work. In your example, you are striding by 10, which suggests that each iteration should process 10 items. But you are only doing one calculation per task. I don’t know whether that was your intent.
- There is a slight overhead introduced by parallelization and synchronization. To avoid having this offset any of the performance gains you were hoping to achieve, we generally maximize the amount of work in each task. If you do not have enough work in each task, the modest overhead of parallelization can completely wipe out any performance gains from doing the work in parallel. It is not at all unusual for naive parallelization routines to actually be much slower than their serial counterparts.

  For what it is worth, processing 1 integer (or even 10 of them) per task is unlikely to be nearly enough work per task to enjoy parallelism performance benefits. E.g., if I were processing 1m items, I might consider 100 tasks, each processing 10,000 items, or something of that order of magnitude. And with something as simple as “multiply by a random integer”, even that may be insufficient. (That having been said, I assume your multiplication example was just a proxy for something more complicated.)
- As an aside, another reason we stride is to avoid false sharing, and striding by 10 may be insufficient to achieve that.
- When performing parallelization, note that tasks can finish in an arbitrary order. Usually, when processing an ordered collection, you want the results in the same order, and your code snippet will not do that. If order is important, you would collate your results to get them back in the intended order.
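Setting aside, for the moment, the cooperative-pool caveat above, one common way to collate results is to tag each child task with its starting offset and reassemble as results arrive. A sketch, where `processInOrder` and the doubling work are hypothetical stand-ins:

```swift
func processInOrder(_ values: [Int]) async -> [Int] {
    let chunkSize = 1_000  // illustrative; tune empirically
    var results = [Int](repeating: 0, count: values.count)
    await withTaskGroup(of: (Int, [Int]).self) { group in
        for start in stride(from: 0, to: values.count, by: chunkSize) {
            let range = start ..< min(start + chunkSize, values.count)
            group.addTask {
                (start, values[range].map { $0 * 2 })  // stand-in for real work
            }
        }
        // Children finish in arbitrary order; the offset puts each
        // chunk back in its original position.
        for await (start, chunk) in group {
            results.replaceSubrange(start ..< start + chunk.count, with: chunk)
        }
    }
    return results
}
```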
So, let us assume for a second that we fix the example so that it does enough synchronous work on each task to justify parallelization. The question (and I think this was your original question) is whether this is suitable to do within Swift concurrency. The answer, in short, is that it may not be.
The problem is that we have a contract with Swift concurrency that threads should always be able to make forward progress. E.g., Swift concurrency: Behind the scenes says:
> Recall that with Swift, the language allows us to uphold a runtime contract that threads will always be able to make forward progress. It is based on this contract that we have built a cooperative thread pool to be the default executor for Swift. As you adopt Swift concurrency, it is important to ensure that you continue to maintain this contract in your code as well so that the cooperative thread pool can function optimally.
In short, to enjoy the benefits of parallel execution, you need enough work on each item to offset the modest overhead of parallelism. But this is at odds with our runtime contract with Swift concurrency (namely, that we will never block a thread from the cooperative thread pool). Technically, you can periodically `await Task.yield()` inside your blocking code (to satisfy this runtime contract), but that will largely negate any performance gains achieved by parallelism.
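To illustrate what that yielding pattern looks like (a sketch only, and not one I recommend for genuinely heavy work; `slowCalculation` is a hypothetical stand-in):

```swift
// Periodically yielding inside otherwise long-running work satisfies the
// forward-progress contract, but the added suspension points usually cost
// enough to defeat the purpose of parallelizing in the first place.
func slowCalculation(_ values: [Int]) async -> Int {
    var total = 0
    for (index, value) in values.enumerated() {
        total += value  // stand-in for expensive synchronous work
        if index.isMultiple(of: 1_000) {
            await Task.yield()  // let other tasks run on this thread
        }
    }
    return total
}
```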
So, what to do? In Visualize and optimize Swift concurrency, they suggest:
> … move that code outside of the concurrency thread pool– for example, by running it on a Dispatch queue– and bridge it to the concurrency world using continuations. Whenever possible, use async APIs for blocking operations to keep the system operating
So, they are recommending that you keep this code within legacy patterns (e.g., `concurrentPerform` for your parallelized algorithm) and then bridge it back with a continuation.
There are lots of ways to do this. Here is one example, where I wrap `concurrentPerform` in an asynchronous sequence of results:
```swift
extension DispatchQueue {
    /// Chunked `concurrentPerform`
    ///
    /// - Parameters:
    ///   - iterations: How many total iterations.
    ///   - chunkCount: How many chunks into which these iterations will be divided. This is optional and will
    ///     default to `activeProcessorCount`. If the work is largely uniform, you can safely omit this parameter
    ///     and the work will be evenly distributed amongst the CPU cores.
    ///
    ///     If different chunks are likely to take significantly different amounts of time, you may want to
    ///     increase this value above the processor count to avoid blocking the whole process for the slowest
    ///     chunk, affording threads that process faster chunks the opportunity to handle more than one.
    ///
    ///     But be careful to not increase this value too high, as each dispatched chunk entails a modest
    ///     amount of overhead. You may want to empirically test different chunk counts (vs. the default value)
    ///     for your particular use case.
    ///   - qos: The `DispatchQoS.QoSClass` of the work to be performed. Defaults to `.utility`, which offers
    ///     good performance while minimizing preemption of code on higher-priority queues, such as the main queue.
    ///   - operation: Closure to be called for each chunk, taking the `Range<Int>` of iterations as its parameter.
    ///
    /// - Returns: An asynchronous sequence of the processed chunks.

    static func concurrentPerformResults<T: Sendable>(
        iterations: Int,
        chunkCount: Int? = nil,
        qos: DispatchQoS.QoSClass = .utility,
        operation: @Sendable @escaping (Range<Int>) -> T
    ) -> AsyncStream<(range: Range<Int>, values: T)> {
        AsyncStream { continuation in
            DispatchQueue.global(qos: qos).async {
                let chunks = Swift.min(iterations, chunkCount ?? ProcessInfo.processInfo.activeProcessorCount)
                let (quotient, remainder) = iterations.quotientAndRemainder(dividingBy: chunks)
                let chunkSize = remainder == 0 ? quotient : quotient + 1
                DispatchQueue.concurrentPerform(iterations: chunks) { chunkIndex in
                    let start = Swift.min(chunkIndex * chunkSize, iterations) // clamp, lest a trailing chunk overshoot `iterations`
                    let end = Swift.min(start + chunkSize, iterations)
                    let range = start ..< end
                    continuation.yield((range, operation(range)))
                }
                continuation.finish()
            }
        }
    }
}
```
And then I can do things like:
```swift
actor Experiment {
    func performExperiment() async {
        let iterations = 1_000_000
        let values = Array(0 ..< iterations)
        let sequence = DispatchQueue.concurrentPerformResults(iterations: iterations) { range in
            values[range].map { $0 * 2 }
        }

        var results = Array(repeating: 0, count: iterations)
        for await chunk in sequence {
            results.replaceSubrange(chunk.range, with: chunk.values)
        }
        print(results)
    }
}
```
And when I profile this, I can see that the chunks finish in a fairly random order (which is why I preallocated the array and used `replaceSubrange`, to reassemble the results in the correct order).