There is one potentially important difference between the behavior of the two snippets!
In the original code, batchShuffle is always invoked on a fresh, unique copy of input. The benchmark loop will look like this:
```swift
var array = input; array.reserveCapacity(0)
timer.measure { array.batchShuffle(using: &rng) }
array = input; array.reserveCapacity(0)
timer.measure { array.batchShuffle(using: &rng) }
array = input; array.reserveCapacity(0)
timer.measure { array.batchShuffle(using: &rng) }
// ...
```
In the updated version, array is created exactly once per input. That unique array instance is captured by the task closure itself, so every batchShuffle call mutates the same array, and each iteration's output becomes the next iteration's input. The benchmark loop looks like this:
```swift
var array = input; array.reserveCapacity(0)
timer.measure { array.batchShuffle(using: &rng) }
timer.measure { array.batchShuffle(using: &rng) }
timer.measure { array.batchShuffle(using: &rng) }
// ...
```
This whole closure-nesting business is really quite subtle -- but it gives us the most flexibility without forcing developers to create one or more custom types per task to represent its various states. (Although that would be a potentially palatable API alternative, and it might lead to easier-to-understand, if more verbose, definitions.)
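To make the two closure levels concrete, here is a toy model of them. Everything in it is hypothetical (FakeTimer, FakeTaskBody, the counters, and the driver loop stand in for the library's real types and runner); only the overall shape, an outer body closure that runs once per input and returns a task closure that runs once per measurement, is meant to mirror add(title:input:body:).

```swift
// Hypothetical stand-ins; only the two-level closure shape is meant to
// mirror the benchmark library, not its actual types.
struct FakeTimer {}
typealias FakeTaskBody = (inout FakeTimer) -> Void

// Counters showing how often each closure level runs.
var setupRuns = 0
var taskRuns = 0

// Outer closure: runs once per input, performs setup, returns the task.
let body: ([Int]) -> FakeTaskBody? = { input in
    setupRuns += 1
    var array = input          // created exactly once per input...
    array.reserveCapacity(0)
    return { _ in              // ...and captured by the task closure
        taskRuns += 1
        array.reverse()        // every run mutates the same captured array
    }
}

// Toy driver: one setup call, then several task executions.
if let task = body(Array(0..<8)) {
    var timer = FakeTimer()
    for _ in 0..<3 { task(&timer) }
}
print(setupRuns, taskRuns)  // prints "1 3"
```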
If chaining batchShuffle like that is not expected to change its performance, then the difference isn't an issue. (An example of an operation that would probably be significantly affected by such chaining is the standard Array.sort(): after the first run, every subsequent run would be sorting data that is already sorted.)
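A minimal sketch of why sort() would be affected, assuming nothing about the benchmark library: it replays the two loop shapes described above with Array.sort() as the measured operation and counts how many executions start from already-sorted data. alreadySortedStarts is a hypothetical helper.

```swift
// Counts how many of `runs` executions receive already-sorted input,
// under either loop shape: fresh copy per run, or chained mutation.
func alreadySortedStarts(runs: Int, freshCopy: Bool) -> Int {
    let input = [5, 3, 9, 1, 7]
    var array = input
    var count = 0
    for _ in 0..<runs {
        if freshCopy { array = input }            // original style: reset every run
        if array == array.sorted() { count += 1 } // would this run sort sorted data?
        array.sort()                              // the "measured" operation
    }
    return count
}

print(alreadySortedStarts(runs: 4, freshCopy: true))   // prints 0
print(alreadySortedStarts(runs: 4, freshCopy: false))  // prints 3
```

With a fresh copy per run, no execution sees sorted input; with chaining, every run after the first does.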
Avoiding timer.measure when possible does improve the stability of benchmark results: the runner can then batch up multiple executions in a single timer run, which can significantly improve the effective clock resolution. (Plus it amortizes any variable cost of reading the clock itself.)
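The batching idea can be sketched as follows. This is not the runner's actual implementation; averageNanoseconds is a hypothetical helper that reads a monotonic clock (DispatchTime from Dispatch) once before and once after a whole batch, then divides by the batch size, so the clock's granularity and read overhead are spread across all executions.

```swift
import Dispatch

// Runs `body` `batchSize` times between just two clock reads and returns
// the average time per execution in nanoseconds.
func averageNanoseconds(batchSize: Int, _ body: () -> Void) -> Double {
    precondition(batchSize > 0)
    let start = DispatchTime.now().uptimeNanoseconds
    for _ in 0..<batchSize { body() }
    let end = DispatchTime.now().uptimeNanoseconds   // monotonic: end >= start
    return Double(end - start) / Double(batchSize)
}

var data = Array(0..<1_000)
let perRun = averageNanoseconds(batchSize: 1_000) {
    data.reverse()
}
print("~\(perRun) ns per execution")
```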
The difference should mostly show up as lower variance in the measured results on the left (quick) side of the charts (and sometimes as faster data gathering), without significantly affecting the minimum or average execution time. Experimental results seem to confirm that the various execution styles produce close enough numbers, but I haven't done a formal proof of it -- statistics is not my strongest field.
Re rng: I'm a little surprised that it's okay for the task closure to capture a local variable and pass it inout, as add(title:input:body:) takes an escaping closure, too. But if it works, then it works! The benchmarks are always run one at a time, on a single thread, with no concurrency.
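For what it's worth, the language does allow this: a local var captured by an escaping closure is boxed on the heap, so the closure can keep passing it inout, and its state persists between calls. (Only inout parameters are barred from escaping closures.) A small self-contained sketch, using a hypothetical SplitMix64 generator so the output is reproducible:

```swift
// SplitMix64: a tiny deterministic generator, an illustrative choice that
// is not part of the benchmark library.
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64
    init(seed: UInt64) { state = seed }
    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

// Returns an escaping closure that captures a local `var` and passes it
// inout on every call. The captured variable is boxed, so this compiles,
// and the generator state carries over from one call to the next.
func makeTask() -> () -> [Int] {
    var rng = SplitMix64(seed: 42)
    return {
        var array = Array(0..<10)
        array.shuffle(using: &rng)
        return array
    }
}

let task = makeTask()
let first = task()
let second = task()
```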
If you want the tasks to always run on the same random sequence, this would be one way to do it:
```swift
self.add(
  title: "Batch shuffle (\(rngName))",
  input: [Int].self,
  body: { input in
    var array = input
    array.reserveCapacity(0)
    return { timer in
      var rng = rng // copy the captured generator, so every run replays the same sequence
      array.batchShuffle(using: &rng)
      blackHole(array)
    }
  }
)
```
The tradeoff is that this would add the cost of copying the generator to the measured task.
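The replay behavior relies on value semantics: var rng = rng gives each execution its own copy of the captured generator, starting from the same state. A sketch under that assumption, again with a hypothetical SplitMix64 struct standing in for the real generator (a class-based generator would be shared by reference, and copying it would not reset its state):

```swift
// Hypothetical value-type generator (SplitMix64), for illustration only.
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64
    init(seed: UInt64) { state = seed }
    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

// The generator captured by the task closure; it is never advanced itself.
let rng = SplitMix64(seed: 7)

// Mirrors `var rng = rng` in the task: each run draws from a fresh copy,
// so every run sees the same three values.
let task = { () -> [UInt64] in
    var local = rng
    return (0..<3).map { _ in local.next() }
}
```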
If you want each task execution to run on a distinct random number sequence, I think one good option would be to initialize the RNG from some seed within the task:
```swift
self.add(
  title: "Batch shuffle (\(rngName))",
  input: [Int].self,
  body: { input in
    var array = input
    array.reserveCapacity(0)
    return { timer in
      var rng = MyRNG(<some unique seed>)
      array.batchShuffle(using: &rng)
      blackHole(array)
    }
  }
)
```
The unique seed could come from a global integer counter (using, say, Atomic<Int>.wrappingAdd(1, ordering: .relaxed).oldValue).
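One way this could look, as a sketch: SplitMix64 stands in for MyRNG, and SeedCounter is a hypothetical seed source. Since the benchmarks run one at a time on a single thread, a plain counter is enough here; the atomic counter mentioned above would be the safe choice if that ever changes.

```swift
// Hypothetical seedable generator (SplitMix64) standing in for MyRNG.
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64
    init(seed: UInt64) { state = seed }
    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

// Hypothetical seed source. Not thread-safe: it assumes single-threaded,
// one-at-a-time execution; otherwise use an atomic counter such as
// Atomic<Int> from the Synchronization module.
final class SeedCounter {
    private var nextSeed: UInt64 = 0
    func take() -> UInt64 {
        let seed = nextSeed
        nextSeed &+= 1
        return seed
    }
}

let seeds = SeedCounter()
var array = Array(0..<10)

// Task closure: each execution builds its own generator from a distinct
// seed, then shuffles the (chained) array. Returns the seed it used.
let task = { () -> UInt64 in
    let seed = seeds.take()
    var rng = SplitMix64(seed: seed)
    array.shuffle(using: &rng)
    return seed
}

let firstSeed = task()
let secondSeed = task()
```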
If initializing or copying the RNG has a significant cost, it may be a good idea to revert to timer.measure and let the benchmark run for more cycles; that should similarly help smooth out the deviation.
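A sketch of that arrangement, with the generator set up outside the measured region. FakeTimer is a hypothetical stand-in for the library's timer (it only accumulates the time spent inside measure), and SystemRandomNumberGenerator merely plays the part of a generator that is expensive to create.

```swift
import Dispatch

// Hypothetical timer stand-in: only the code passed to `measure` is timed,
// mimicking the role of timer.measure in a benchmark task.
struct FakeTimer {
    private(set) var measuredNanoseconds: UInt64 = 0
    private(set) var measuredCalls = 0

    mutating func measure(_ body: () -> Void) {
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        measuredNanoseconds += DispatchTime.now().uptimeNanoseconds - start
        measuredCalls += 1
    }
}

var timer = FakeTimer()
var array = Array(0..<1_000)

for _ in 0..<5 {
    // Setup, excluded from the measurement. SystemRandomNumberGenerator is
    // cheap to create; it only stands in for an expensive generator.
    var rng = SystemRandomNumberGenerator()
    timer.measure { array.shuffle(using: &rng) }  // only the shuffle is timed
}
```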