I had to use Java for a couple of things during my Master's thesis, and I discovered a very nice API that Swift does not provide yet: parallelStream().
Now, for those of you who are not familiar with higher-order functions in Java: if you need to map or filter a sequence, you first have to invoke .stream() on the sequence itself. The cool part is that if you invoke .parallelStream() instead, your sequence gets automatically divided into chunks and the whole execution gets parallelized across the cores of your machine.
I wanted to replicate this in Swift, so I studied the implementation of LazySequence and LazyCollection to have a starting point to work from.
I came up with this draft, heavily inspired by the implementation of Lazy, which for now supports map and filter: https://github.com/Buratti/Parallel
Please note that I posted this code just to let you see what the main idea is, as there are a couple of problems with the current implementation that I will discuss later.
The usage is exactly the same as lazy:
let someCollection = 0..<30_000_000
let doubles = someCollection.parallel.map { $0 * 2 }
print(doubles[1]) // 2
The above code will split the Range<Int> into n parts, apply the transform function to each of the n parts on a different thread in parallel, and then flatten the results into a new ParallelCollection.
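To make the split / transform / combine steps concrete, here is a minimal sketch of the same idea written as a free function. This is not the code from the linked repository: parallelMap and its parts parameter are hypothetical names, it uses DispatchQueue.concurrentPerform instead of Foundation.Thread, and it writes each result directly into its final slot rather than flattening per-chunk arrays afterwards.

```swift
import Dispatch

// Hypothetical helper for illustration only; not the API of the Parallel draft.
// Splits `input` into at most `parts` contiguous chunks and transforms each chunk
// on a concurrent worker provided by DispatchQueue.concurrentPerform.
func parallelMap<T, U>(_ input: [T], parts: Int, _ transform: (T) -> U) -> [U] {
    let count = input.count
    guard count > 0 else { return [] }
    let n = max(1, min(parts, count))
    let chunkSize = (count + n - 1) / n // ceiling division

    return [U](unsafeUninitializedCapacity: count) { buffer, initializedCount in
        let base = buffer.baseAddress! // non-nil because count > 0
        DispatchQueue.concurrentPerform(iterations: n) { part in
            let start = part * chunkSize
            let end = min(start + chunkSize, count)
            // The last chunks can be empty when count is not a multiple of n.
            guard start < end else { return }
            for i in start..<end {
                // Each index is written by exactly one worker, so no lock is needed.
                (base + i).initialize(to: transform(input[i]))
            }
        }
        initializedCount = count
    }
}

let doubled = parallelMap(Array(0..<1_000), parts: 4) { $0 * 2 }
print(doubled[1]) // 2
```

Because concurrentPerform takes a non-escaping closure and blocks until every iteration has finished, the transform closure does not need to be @escaping in this sketch. It assumes the transform does not throw.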
Current problems:
- Since my current implementation is just a draft, I used Foundation.Thread, but if we wanted to add parallel to the Standard Library we would need to work with pthreads and, as far as I know, it is not possible to use them to execute code that at some point needs to work with generic types. I also tried to use SwiftPrivateThreadExtras, with no luck.
- As far as I know, neither Foundation.Thread nor DispatchQueue allows errors to be rethrown.
- There might be confusion about the combination of Parallel and Lazy, and their behaviour should be analyzed in depth in order to decide what happens in cases like myArray.lazy.parallel.map, myArray.parallel.lazy.map or myArray.parallel.lazy.parallel.lazy.map.
- It would be up to the user to synchronize access to shared state inside the given closure, so for example code like the following
var globalState: MutatingState = ...
var result = someCollection.parallel.map { val in
    globalState.change()
    return val.someOperation()
}
would need to be written as
var globalState: MutatingState = ...
let synchronized = Synchronized()
var result = someCollection.parallel.map { val in
    synchronized {
        globalState.change()
    }
    return val.someOperation()
}
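For readers who want something to experiment with, a helper with the call syntax used above could be sketched roughly like this. It is a hypothetical stand-in built on NSLock and callAsFunction (assuming a reasonably recent Swift toolchain), not necessarily how my own Synchronized works; countInParallel is likewise just a made-up demo function.

```swift
import Dispatch
import Foundation

// Hypothetical stand-in for the Synchronized helper; the real one may differ.
final class Synchronized {
    private let lock = NSLock()

    // callAsFunction lets an instance be used as `synchronized { ... }`.
    func callAsFunction<R>(_ body: () throws -> R) rethrows -> R {
        lock.lock()
        defer { lock.unlock() } // released even if body throws
        return try body()
    }
}

// Demo: many concurrent iterations mutate one shared counter.
func countInParallel(iterations: Int) -> Int {
    let synchronized = Synchronized()
    var counter = 0
    DispatchQueue.concurrentPerform(iterations: iterations) { _ in
        synchronized { counter += 1 } // only one worker at a time gets in here
    }
    return counter
}

print(countInParallel(iterations: 1_000)) // 1000
```

Only the mutation of the shared state sits inside the lock, so the rest of each iteration can still run in parallel.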
(Off topic: you can find my example of Synchronized here.)
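Regarding the error problem in the list above, one possible workaround (just a sketch, not something the draft does) is to capture each outcome as a Result inside the worker and throw only after all workers have finished, back on the calling thread. parallelTryMap is a hypothetical name, and note that the function has to be declared throws rather than rethrows, because the error is thrown outside the direct call to the closure.

```swift
import Dispatch
import Foundation

// Hypothetical sketch: errors are captured per element and rethrown afterwards.
// For simplicity every element gets its own iteration instead of being chunked.
func parallelTryMap<T, U>(_ input: [T], _ transform: (T) throws -> U) throws -> [U] {
    let lock = NSLock()
    var slots = [Result<U, Error>?](repeating: nil, count: input.count)

    DispatchQueue.concurrentPerform(iterations: input.count) { i in
        // Run the transform outside the lock and keep success or failure as a value.
        let outcome: Result<U, Error> = Result { try transform(input[i]) }
        lock.lock()
        slots[i] = outcome // writes to the shared array are serialized
        lock.unlock()
    }

    // All workers are done here; the first failure in input order is thrown.
    return try slots.map { try $0!.get() }
}
```

A caller would use it as try parallelTryMap(values) { try riskyOperation($0) }, where riskyOperation stands for any throwing function. One consequence of this design is that every element is still processed even if an early one fails; cancelling the remaining work is left out of the sketch.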
Conclusion
As @hartbit suggested to me, this idea might make more sense to implement once we have first-class concurrency features in the language, but I'd still like to discuss it with the community and hear your opinions about it.