This looks essentially equivalent to what was shown in the thread I linked previously, except that they hard-coded the number of chunks and you left it unspecified.
Is there a good way to make the number of chunks automatically match the number of CPU cores, or otherwise just let the concurrency system decide how many to use?