I had to use Java for a couple of things during my master's thesis, and I came across a very nice API that Swift does not provide yet: parallelStream().
For those of you who are not familiar with higher-order functions in Java: if you need to map or filter a collection, you first have to invoke .stream() on it. The cool part is that if you invoke .parallelStream() instead, the collection is automatically divided into chunks and the work is parallelized across the cores of your machine.
I wanted to replicate this in Swift, so I studied the implementation of LazySequence and LazyCollection to have a starting point.
I came up with this draft, which is heavily inspired by the implementation of lazy and, for now, supports map and filter: https://github.com/Buratti/Parallel
Please note that I posted this code just to let you see what the main idea is, as there are a couple of problems with the current implementation that I will discuss later.
The usage is exactly the same as lazy:
let someCollection = 0..<30_000_000
let doubles = someCollection.parallel.map { $0 * 2 }
print(doubles[1]) // 2
The above code splits the Range<Int> into n parts, applies the transform function to each of the n parts on a different thread in parallel, and then flattens the results into a new ParallelCollection.
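To make the split/apply/flatten steps concrete, here is a minimal sketch of the same idea. It is not the code from the repository linked above: the name chunkedParallelMap and its chunkCount parameter are made up for illustration, it uses DispatchQueue.concurrentPerform instead of Foundation.Thread, it returns a plain Array rather than a ParallelCollection, and it assumes the Swift 5 language mode (stricter concurrency checking may reject the shared mutable array).

```swift
import Foundation

// Hypothetical helper for illustration only; not the API of the linked draft.
extension RandomAccessCollection {
    /// Splits the collection into at most `chunkCount` contiguous chunks,
    /// maps each chunk concurrently, and concatenates the results in order.
    func chunkedParallelMap<T>(
        chunkCount: Int = ProcessInfo.processInfo.activeProcessorCount,
        _ transform: (Element) -> T
    ) -> [T] {
        let total = count
        guard total > 0 else { return [] }
        let chunks = Swift.max(1, Swift.min(chunkCount, total))
        let chunkSize = (total + chunks - 1) / chunks  // ceiling division

        // One slot per chunk, so the final order does not depend on timing.
        var partials = [[T]](repeating: [], count: chunks)
        let lock = NSLock()

        DispatchQueue.concurrentPerform(iterations: chunks) { chunk in
            let lower = Swift.min(chunk * chunkSize, total)
            let upper = Swift.min(lower + chunkSize, total)
            let start = index(startIndex, offsetBy: lower)
            let end = index(startIndex, offsetBy: upper)
            let mapped = self[start..<end].map(transform)
            // Only the write into the shared array is serialized.
            lock.lock()
            partials[chunk] = mapped
            lock.unlock()
        }
        return partials.flatMap { $0 }
    }
}

let sketchDoubles = (0..<1_000).chunkedParallelMap { $0 * 2 }
print(sketchDoubles[1]) // 2
```

Each chunk writes into its own slot, which is what keeps the output in the same order as the input even though the chunks finish in an arbitrary order.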
Current problems:
- Since my current implementation is just a draft, I used Foundation.Thread, but if we wanted to add parallel to the Standard Library we would need to work with pthreads and, as far as I know, it is not possible to use them to execute code that at some point needs to work with generic types. I also tried to use SwiftPrivateThreadExtras, with no luck.
- As far as I know, neither Foundation.Thread nor DispatchQueue allows rethrowing errors.
- There might be confusion about the combination of parallel and lazy, and their behaviour should be analyzed carefully in order to decide what happens in cases like myArray.lazy.parallel.map, myArray.parallel.lazy.map or myArray.parallel.lazy.parallel.lazy.map.
- It would be up to the user to synchronize access to shared state inside the given closure, so for example code like the following
var globalState: MutatingState = ...
var result = someCollection.parallel.map { val in
    globalState.change()
    return val.someOperation()
}
would need to be written as
var globalState: MutatingState = ...
let synchronized = Synchronized()
var result = someCollection.parallel.map { val in
    synchronized {
        globalState.change()
    }
    return val.someOperation()
}
(Off topic: you can find my example of Synchronized here.)
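For readers who just want to see the shape of such a helper, a Synchronized type along these lines could be sketched as follows. This is an assumption about what it might look like, not the implementation linked above: it wraps an NSLock and relies on callAsFunction (Swift 5.2 or later) so that an instance can be called with a trailing closure, and the usage part assumes the Swift 5 language mode.

```swift
import Foundation

// Hypothetical stand-in for the Synchronized helper; an illustration only.
final class Synchronized {
    private let lock = NSLock()

    /// Runs `body` while holding the lock, so at most one thread
    /// executes a synchronized block of this instance at a time.
    func callAsFunction<R>(_ body: () throws -> R) rethrows -> R {
        lock.lock()
        defer { lock.unlock() }
        return try body()
    }
}

// Usage: many concurrent iterations mutate one shared counter.
var counter = 0
let synchronized = Synchronized()
DispatchQueue.concurrentPerform(iterations: 1_000) { _ in
    synchronized {
        counter += 1
    }
}
print(counter) // 1000
```

Without the synchronized block, the increments would race and the final count would not be reliable.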
Conclusion
As @hartbit suggested to me, this idea might make more sense to implement once we have first-class concurrency features in the language, but I'd still like to discuss it with the community and hear your opinions about it.