Performance of AsyncStream

Hi,

We use AsyncStream to transfer data from the non-async to the async domain and can see that it severely limits the throughput of our app.
A raw read from our data source gives us ~700K messages per second, but through AsyncStream we get only ~90K, i.e. a ~7-8x slowdown.
Samples point to __ulock_wait() called from AsyncStream.Continuation.yield(), so it looks like contention on some internal AsyncStream mutex.
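
Roughly, the handoff looks like this (a simplified sketch; Message and readNextMessage() are placeholders for our real data source):

import Foundation

struct Message { let payload: [UInt8] }

// Placeholder for the blocking read from the real source.
func readNextMessage() -> Message? { nil }

var continuation: AsyncStream<Message>.Continuation!
let stream = AsyncStream<Message>(bufferingPolicy: .unbounded) { continuation = $0 }

// Non-async producer: a plain thread pushes messages into the stream.
Thread.detachNewThread {
    while let msg = readNextMessage() {
        continuation.yield(msg)   // samples show __ulock_wait under this call
    }
    continuation.finish()
}

// Async consumer drains the stream.
Task {
    for await msg in stream {
        _ = msg   // handle the message
    }
}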

So the questions are:

  • Is such a slowdown known/expected?
  • Are there plans to improve the implementation using, e.g., lock-free structures?

BTW, if I use DispatchQueue instead I see only a ~3-4x slowdown (a dispatch queue is effectively "unbounded", but I use the same .unbounded buffering policy for AsyncStream, so the comparison is fair).
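
The DispatchQueue variant is essentially this (again a sketch, reusing Message and readNextMessage() from above; handle(_:) is a placeholder consumer):

import Dispatch

func handle(_ msg: Message) { /* process the message */ }

// Serial queue; its backlog of pending blocks is effectively unbounded.
let queue = DispatchQueue(label: "consumer")

// Producer side: hand each message off to the consumer queue.
while let msg = readNextMessage() {
    queue.async { handle(msg) }
}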

The hardware is a MacBook Pro (M1 Max) running macOS 13.2.1.

$ swift --version
swift-driver version: 1.62.15 Apple Swift version 5.7.2 (swiftlang-5.7.2.135.5 clang-1400.0.29.51)
Target: arm64-apple-macosx13.0

On Linux it looks a bit better: the AsyncStream slowdown is ~4x, and the DispatchQueue slowdown ~2x.

Thanks!

3 Likes

Interesting to see a difference between Linux and macOS.

Could you share the code you used to test the performance? I would love to understand whether you are yielding/consuming from a single task/thread or from many.

1 Like

For very high performance (hundreds of thousands of events per second), you would likely need to tune a custom AsyncSequence; my guess would be something similar to how we do AsyncBufferedByteIterator.
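
The general shape is roughly this (adapted from the swift-async-algorithms documentation; ReadableThing stands in for whatever bulk-readable source you have):

import AsyncAlgorithms   // provides AsyncBufferedByteIterator

// Stand-in for a source that can fill a buffer in one call.
protocol ReadableThing: Sendable {
    func read(into buffer: UnsafeMutableRawBufferPointer) async throws -> Int
}

struct AsyncBytes: AsyncSequence {
    typealias Element = UInt8
    let handle: ReadableThing

    func makeAsyncIterator() -> AsyncBufferedByteIterator {
        AsyncBufferedByteIterator(capacity: 16384) { buffer in
            // Refill in bulk: one suspension per 16 KiB chunk rather than
            // per element. Returning 0 ends the sequence.
            try await handle.read(into: buffer)
        }
    }
}

The win is that next() mostly just reads the next byte out of the buffer; the synchronization cost is paid once per chunk instead of once per element.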

4 Likes

Thanks for the replies, and sorry for the delay.

I tried to make a small reproducer and got slightly different numbers: the slowdown is only ~2-3x (which is similar to DispatchQueue performance).
But the heaviest stacks in the samples look similar.

We are reading data from Kafka, so I forked swift-kafka-gsoc (GitHub - dimlio/swift-kafka-gsoc at async-stream-performance) and added a simple producer to populate a topic with messages, plus a consumer that reads the messages and transfers them to another task/thread.

Assuming there is a Kafka broker running locally:

# Create a topic with many partitions
$ bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic read-perf-test --partitions 6
# Populate the topic with 10M 100-byte messages
$ KAFKA_BROKERS=localhost:9092 swift run -c release TestProducer
# Consume
$ KAFKA_BROKERS=localhost:9092 swift run -c release TestConsumer poll
Consume using raw poll
rate: 609399 msgs/s, 58 MiB/s
rate: 592405 msgs/s, 56 MiB/s
...
$ KAFKA_BROKERS=localhost:9092 swift run -c release TestConsumer stream
Consume using AsyncStream
rate: 367188 msgs/s, 35 MiB/s
rate: 275825 msgs/s, 26 MiB/s
...
$ KAFKA_BROKERS=localhost:9092 swift run -c release TestConsumer queue
Consume using DispatchQueue
rate: 539611 msgs/s, 51 MiB/s
rate: 561143 msgs/s, 53 MiB/s
...
$ KAFKA_BROKERS=localhost:9092 swift run -c release TestConsumer nio
Consume using NIOAsyncSequenceProducer
rate: 277865 msgs/s, 26 MiB/s
rate: 285212 msgs/s, 27 MiB/s

The numbers above are from the MacBook Pro.

On Linux (different hardware, but reading from the same cluster):

$ KAFKA_BROKERS=192.168.0.5 swift run -c release TestConsumer poll
Consume using raw poll
rate: 668716 msgs/s, 63 MiB/s
rate: 811677 msgs/s, 77 MiB/s
...
$ KAFKA_BROKERS=192.168.0.5 swift run -c release TestConsumer stream
Consume using AsyncStream
rate: 204341 msgs/s, 19 MiB/s
rate: 211097 msgs/s, 20 MiB/s
...
$ KAFKA_BROKERS=192.168.0.5 swift run -c release TestConsumer queue
Consume using DispatchQueue
rate: 298310 msgs/s, 28 MiB/s
rate: 282511 msgs/s, 26 MiB/s
...
$ KAFKA_BROKERS=192.168.0.5 swift run -c release TestConsumer nio
Consume using NIOAsyncSequenceProducer
rate: 102812 msgs/s, 9 MiB/s
rate: 99723 msgs/s, 9 MiB/s

Thanks!

1 Like

Thanks for pointing to AsyncBufferedByteIterator.

Using the same idea, I made my own AsyncSequence implementation with performance similar to the raw read and a nice async sequence interface.
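
For reference, the core of it is an iterator that awaits a whole batch per suspension and then serves it synchronously, something like this (a simplified sketch, not the exact code in the fork; fetchBatch is a placeholder for the actual poll):

struct Message { let payload: [UInt8] }   // simplified message type

struct BatchedMessages: AsyncSequence {
    typealias Element = Message

    // Placeholder: returns the next batch, or an empty array at the end.
    let fetchBatch: @Sendable () async -> [Message]

    struct AsyncIterator: AsyncIteratorProtocol {
        let fetchBatch: () async -> [Message]
        var batch: [Message] = []
        var index = 0

        mutating func next() async -> Message? {
            if index == batch.count {
                batch = await fetchBatch()   // one suspension per batch
                index = 0
                if batch.isEmpty { return nil }
            }
            defer { index += 1 }
            return batch[index]
        }
    }

    func makeAsyncIterator() -> AsyncIterator {
        AsyncIterator(fetchBatch: fetchBatch)
    }
}

Since most next() calls just return batch[index] without suspending or locking, the per-message overhead drops to almost nothing.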

Thanks!

4 Likes

Nice! Just casually looking at that, it looks about on point. This is something I have considered making more formalized as a general-purpose thing; there are definitely some opportunities there to make something that works more generally for that use.

4 Likes