Performance of AsyncSequence

I was recently playing around with AsyncSequence and was wondering about the performance implications of awaiting each element of the sequence. It seems to me that this would come with a performance penalty compared to yielding multiple elements at once. As such, it may be necessary to do the following in performance-sensitive scenarios:

for await chunk in foo {
  for element in chunk {
    process(element)
  }
}
  1. Is my intuition here correct?
  2. Would it make sense to alter AsyncIteratorProtocol to provide a mutating func next() async -> [Element], where an empty array would signal termination (or something analogous)?
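
For concreteness, the two iteration shapes being compared can be sketched as follows. This is only an illustration: elementStream, chunkStream, sumPerElement and sumChunked are hypothetical helpers built on AsyncStream, and the chunk size is arbitrary. Both functions compute the same result; the chunked version just awaits once per chunk instead of once per element.

```swift
// Hypothetical helper: an AsyncStream that yields one Int per iteration.
func elementStream(_ values: [Int]) -> AsyncStream<Int> {
    AsyncStream { continuation in
        for value in values { continuation.yield(value) }
        continuation.finish()
    }
}

// Hypothetical helper: the same values, yielded as arrays of up to `chunkSize`.
func chunkStream(_ values: [Int], chunkSize: Int) -> AsyncStream<[Int]> {
    precondition(chunkSize > 0, "chunkSize must be positive")
    return AsyncStream { continuation in
        var start = 0
        while start < values.count {
            let end = min(start + chunkSize, values.count)
            continuation.yield(Array(values[start..<end]))
            start = end
        }
        continuation.finish()
    }
}

// One await per element.
func sumPerElement(_ values: [Int]) async -> Int {
    var total = 0
    for await element in elementStream(values) {
        total += element
    }
    return total
}

// One await per chunk; the inner loop is ordinary synchronous code.
func sumChunked(_ values: [Int], chunkSize: Int) async -> Int {
    var total = 0
    for await chunk in chunkStream(values, chunkSize: chunkSize) {
        for element in chunk {
            total += element
        }
    }
    return total
}
```

Whether the chunked form is actually faster depends on the sequence and on how much the compiler can inline, so it is worth measuring before restructuring code around it.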

It depends on how much inlining visibility there is. One of the things Swift does quite well is peering into inlinable functions: if the compiler can see through to the implementation, it can in some cases determine that a series of calls is really just equivalent to a memcpy and optimize it down to that. The benchmarks so far are quite promising, and right now they are limited primarily by the async/await runtime rather than by the Swift side of any implementation. That means that as the compiler and the runtime are optimized in step, we will get some pretty fast code, and more often than not awaiting each element won't matter too much (even doing things byte by byte is pretty reasonable in some initial tests/benchmarks).

Personally, I would like to see async sequences reach a few million elements per second on a reasonably powerful desktop machine.
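
A rough way to check a number like this on your own machine might look like the sketch below. It is not the benchmark referred to in this thread; measureThroughput is a hypothetical helper, and it assumes an AsyncStream whose elements are all buffered up front, so it only times the consuming side.

```swift
import Foundation

/// Consumes `count` integers from an AsyncStream, one await per element,
/// and reports how many were consumed and the rate in elements per second.
func measureThroughput(count: Int) async -> (consumed: Int, perSecond: Double) {
    // The build closure runs immediately, so every element is buffered
    // (default unbounded policy) before timing starts.
    let stream = AsyncStream<Int> { continuation in
        for value in 0..<count { continuation.yield(value) }
        continuation.finish()
    }

    let start = Date()
    var consumed = 0
    for await _ in stream {
        consumed += 1
    }
    let elapsed = Date().timeIntervalSince(start)

    // Guard against a zero interval on very small inputs.
    let rate = elapsed > 0 ? Double(consumed) / elapsed : Double.infinity
    return (consumed, rate)
}
```

Results from a sketch like this vary a lot with build settings (debug versus optimized) and with the kind of sequence being iterated, so treat them as a ballpark only.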

If we determine that it is worthwhile to add a convenience mechanism offering faster raw buffer access, I would not claim that an array is the best path. Instead, I would add a method to AsyncIteratorProtocol along the lines of func nextBuffer(_ apply: (UnsafeBufferPointer<Element>) -> Void) async rethrows, with a default implementation that just calls next. That way we get the best of both worlds: a raw buffer dumping mechanism and a single-element accessor.
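
As a sketch of what such a default implementation could look like: nextBuffer here is hypothetical and not part of the standard library. This version also makes two assumptions beyond the signature above. It is marked throws rather than rethrows, since a plain rethrows requires a throwing closure parameter, and it returns a Bool so the caller can detect termination.

```swift
extension AsyncIteratorProtocol {
    /// Hypothetical default: fetches one element via `next()` and hands it
    /// to `apply` as a one-element buffer. Returns false once the iterator
    /// is exhausted. An iterator with real internal storage could instead
    /// provide its own version that exposes many elements per call.
    mutating func nextBuffer(
        _ apply: (UnsafeBufferPointer<Element>) -> Void
    ) async throws -> Bool {
        guard let element = try await next() else { return false }
        // The pointer is only valid for the duration of the closure.
        withUnsafePointer(to: element) { pointer in
            apply(UnsafeBufferPointer(start: pointer, count: 1))
        }
        return true
    }
}

/// Usage sketch: sums an async sequence of Ints through the buffer API.
func sum<S: AsyncSequence>(_ sequence: S) async throws -> Int
where S.Element == Int {
    var iterator = sequence.makeAsyncIterator()
    var total = 0
    while try await iterator.nextBuffer({ buffer in
        total += buffer.reduce(0, +)
    }) {}
    return total
}
```

With only the default implementation this is no faster than calling next directly; the benefit would come from iterators that override it to expose a whole internal buffer in one await.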

To be clear from a perf standpoint: we are not yet at millions of events per second, but I think that is attainable. Other frameworks that do similar things can reach ~30M or so events per second on reasonably powerful desktop machines.
