We use AsyncStream to transfer data from the non-async to the async domain, and we can see that it severely limits the throughput of our app.
A raw read from our data source gives us ~700K messages per second, but after going through AsyncStream we get only ~90K, i.e. a ~7-8x slowdown.
Samples point to __ulock_wait() called from AsyncStream.Continuation.yield(), so it looks like contention on some internal AsyncStream mutex.
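For context, the bridging pattern looks roughly like this. This is only a minimal sketch, not our real code: `Message` and `readMessage()` are placeholders standing in for our actual data source.

```swift
import Foundation

// Minimal sketch of the bridging pattern. `Message` and `readMessage()` are
// placeholders standing in for our real data source, not the actual code.
struct Message { var payload: Int }

func readMessage() -> Message? {
    Message(payload: 42)  // pretend we read from a socket/file on a plain thread
}

let stream = AsyncStream<Message>(bufferingPolicy: .unbounded) { continuation in
    // Producer runs in the non-async domain.
    Thread.detachNewThread {
        while let message = readMessage() {
            // Samples show __ulock_wait() underneath this yield.
            if case .terminated = continuation.yield(message) { break }
        }
        continuation.finish()
    }
}

// Consumer on the async side.
let consumer = Task {
    for await message in stream {
        _ = message  // process(message)
    }
}
```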
So the questions are:
Is such a slowdown something known/expected?
Are there plans to improve the implementation using, e.g., some lock-free structures?
BTW, if I use a DispatchQueue instead I get only a ~3-4x slowdown (it's "unbounded", but I use the same unbounded buffering policy for AsyncStream).
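The DispatchQueue variant I compare against looks roughly like this (again a sketch, reusing the placeholder `Message`/`readMessage()` from the sketch above; work items are simply enqueued with no backpressure, so it is effectively unbounded, comparable to the .unbounded AsyncStream policy):

```swift
// Sketch of the DispatchQueue variant, reusing the placeholders from above.
let consumerQueue = DispatchQueue(label: "message-consumer")

Thread.detachNewThread {
    while let message = readMessage() {
        consumerQueue.async {
            _ = message  // process(message) on the consumer queue
        }
    }
}
```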
The hardware I use is a MacBook Pro (M1 Max), macOS 13.2.1.
$ swift --version
swift-driver version: 1.62.15 Apple Swift version 5.7.2 (swiftlang-5.7.2.135.5 clang-1400.0.29.51)
Target: arm64-apple-macosx13.0
On Linux it looks a bit better: the AsyncStream slowdown is ~4x, and DispatchQueue is ~2x.
It's interesting to see a difference between Linux and macOS.
Could you share the code you used to test the performance? I would love to understand if you are yielding/consuming from a single task/thread or from many.
For very high performance (hundreds of thousands of events per second), you would likely need to tune a custom AsyncSequence - my guess would be something similar to how we do AsyncBufferedByteIterator.
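Very roughly, the idea is to amortize the per-element synchronization by moving whole batches across the boundary. The sketch below only illustrates that shape - the names (`BatchedMessages`, etc.) are made up for illustration, and this is not how AsyncBufferedByteIterator is actually implemented:

```swift
// Sketch of the batching idea only. Instead of crossing the async boundary
// once per element, the producer yields whole arrays, and the iterator drains
// them locally, so the per-element synchronization cost is amortized.
struct BatchedMessages<Element>: AsyncSequence {
    let batches: AsyncStream<[Element]>

    struct AsyncIterator: AsyncIteratorProtocol {
        var batchIterator: AsyncStream<[Element]>.Iterator
        var current: [Element] = []
        var index = 0

        mutating func next() async -> Element? {
            // Refill the local buffer from the next batch when it runs dry.
            while index == current.count {
                guard let batch = await batchIterator.next() else { return nil }
                current = batch
                index = 0
            }
            defer { index += 1 }
            return current[index]
        }
    }

    func makeAsyncIterator() -> AsyncIterator {
        AsyncIterator(batchIterator: batches.makeAsyncIterator())
    }
}
```

The producer side would then accumulate messages and yield `[Element]` batches into the underlying stream, so whatever lock sits inside yield is only taken once per batch rather than once per message.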
I tried to make a small reproducer and got slightly different numbers -- the slowdown is only ~2-3x (which is similar to the DispatchQueue performance).
But the heaviest stacks in the sample look similar.
Nice! Just casually looking at that, it looks about on point. This is something I have considered making more formalized as a general-purpose thing - there are definitely some opportunities there to make something that works more generally for that use case.