Performance problem sending WebSocket frames

I'm rather new to SwiftNIO, but I think I'm starting to understand it. I have an app that needs to perform a speed test, measuring the communication speed to and from a server. To do this it sends and receives binary WebSocket frames. I'm on a 1 Gbit/s network with a switch between the client and server. Receiving data achieves 110 MB/s, but sending gets me only about 10 MB/s.

I've been debugging this and have now reduced the code to a very simple test that only uses the WebSocketFrameEncoder; all other parts of the WebSocket handling have been removed. The receiver on the server is a simple TCP listener that only receives data and throws it away after measuring the speed.

import Foundation
import NIO
import NIOWebSocket

fileprivate class SpeedHandlerTcp: ChannelOutboundHandler {
    typealias OutboundIn = ByteBuffer
    typealias OutboundOut = ByteBuffer

    func write(context: ChannelHandlerContext, data: NIOAny, promise: EventLoopPromise<Void>?) {
        context.write(data, promise: promise)
    }
}

fileprivate class SpeedHandlerWebSocket: ChannelOutboundHandler {
    typealias OutboundIn = ByteBuffer
    typealias OutboundOut = WebSocketFrame

    func write(context: ChannelHandlerContext, data: NIOAny, promise: EventLoopPromise<Void>?) {
        let buffer = self.unwrapOutboundIn(data)
        var key = [UInt8]()
        for _ in 1 ... 4 {
            key.append(UInt8.random(in: 0 ... 255))
        }
        let maskingKey = WebSocketMaskingKey(key)
        let frame = WebSocketFrame(fin: true, opcode: .binary, maskKey: maskingKey, data: buffer)
        context.write(self.wrapOutboundOut(frame), promise: promise)
    }
}

fileprivate class Connection {
    private var ip: String
    private var port: Int
    private var useWebSocket: Bool
    
    private let group = MultiThreadedEventLoopGroup(numberOfThreads: 2)
    private var channel: Channel?
    let bufferSize = 65536
    private let buffer: ByteBuffer
    private var startTime: Date?
    private var bytesWritten = 0
    
    
    init(ip: String, port: Int, useWebSocket: Bool) {
        self.ip = ip
        self.port = port
        self.useWebSocket = useWebSocket
        self.buffer = ByteBuffer(repeating: 0, count: bufferSize)
    }
    
    func start() {
        let bootstrap = ClientBootstrap(group: group)
            .channelOption(ChannelOptions.socketOption(.so_reuseaddr), value: 1)
            .channelOption(ChannelOptions.tcpOption(.tcp_nodelay), value: 1)
            .protocolHandlers {
                if self.useWebSocket {
                    return [
                        WebSocketFrameEncoder(),
                        SpeedHandlerWebSocket()
                    ]
                } else {
                    return [SpeedHandlerTcp()]
                }
            }
        
        let channel = bootstrap.connect(host: ip, port: port)
        channel.whenSuccess { channel in
            print("Connected")
            self.channel = channel
            self.channel?.eventLoop.scheduleRepeatedTask(
                initialDelay: .seconds(1),
                delay: .seconds(1),
                notifying: nil
            ) { _ in
                self.statistics()
            }
            self.writeData()
        }
        
        channel.whenFailure { error in
            print("Connect error: \(error)")
        }
    }
    
    func statistics() {
        let speed = Double(bytesWritten) / Double(Date().timeIntervalSince(startTime!)) / 1_000_000.0
        print("Speed: \(speed) MBps")
    }

    func writeData() {
        bytesWritten += bufferSize
        if startTime == nil { startTime = Date() }
        channel?.writeAndFlush(buffer).whenSuccess(writeData)
    }
}

let connection = Connection(ip: "10.0.30.40", port: 8888, useWebSocket: true)
connection.start()

The code can either use a raw TCP connection or filter everything through WebSocketFrameEncoder. With raw TCP I get full speed; using WebSocketFrameEncoder reduces it by 90%. With raw TCP the code uses around 40% CPU, while with WebSocket it uses 100% CPU. Also, the task scheduled with scheduleRepeatedTask never gets called when using WebSocket.

I'm sure I'm doing something wrong but I can't figure out what it is. I've tried a lot of different approaches but nothing seems to help. Right now I'm using

writeAndFlush().whenSuccess()

to loop the write operations but I've tried other ways of doing this, such as checking the isWritable property of the channel but it doesn't help. I've also tried using multiple threads to synchronise operations but there's still no change.

I'm sure the problem is that the NIO thread is using all CPU time. I tried using Instruments to figure out what it's doing but it didn't help.

Can anyone see what's wrong? Any help is greatly appreciated.

You're likely burning a lot of CPU generating random bytes like that. I suggest just sending constant bytes. If you need random bytes, you can generate four at a time rather than looping over UInt8s. I do something similar in my test server:

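// Allocate a buffer with room for `count` bytes.
var buffer = request.application.allocator.buffer(capacity: count)
let big = count / 8
let remainder = count % 8

// Fill most of the buffer eight random bytes at a time.
for _ in 0..<big {
    buffer.writeInteger(UInt64.random(in: .min ... .max))
}

// Fill the remaining 0 to 7 bytes one at a time.
for _ in 0..<remainder {
    buffer.writeInteger(UInt8.random(in: .min ... .max))
}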

Where count is the total number of bytes I want to send.

Like I said, sending non-random data will be faster, whether that is entirely constant data or just randomly generating a certain number of bytes and repeating them when sending.
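
To make both suggestions concrete, here is a minimal sketch using only the standard library. The function names and the 64-byte block size are my own assumptions for illustration, not part of NIO or of the code above. It derives four key bytes from a single random call and builds a payload by generating a small random block once and repeating it.

```swift
/// Returns four random bytes using a single call to the random number generator.
/// (Hypothetical helper; the bytes could then be passed to WebSocketMaskingKey.)
func makeRandomKeyBytes() -> [UInt8] {
    let word = UInt32.random(in: .min ... .max)
    // Split the 32-bit word into its four bytes, most significant first.
    return [
        UInt8(truncatingIfNeeded: word >> 24),
        UInt8(truncatingIfNeeded: word >> 16),
        UInt8(truncatingIfNeeded: word >> 8),
        UInt8(truncatingIfNeeded: word)
    ]
}

/// Builds `count` bytes by generating `blockSize` random bytes once and repeating them.
/// (Hypothetical helper; the block size of 64 is an arbitrary choice.)
func makeRepeatedPayload(count: Int, blockSize: Int = 64) -> [UInt8] {
    precondition(count >= 0 && blockSize > 0)
    let block = (0..<blockSize).map { _ in UInt8.random(in: .min ... .max) }
    var payload = [UInt8]()
    payload.reserveCapacity(count)
    while payload.count < count {
        // Append a whole block, or only the part that still fits.
        payload.append(contentsOf: block.prefix(count - payload.count))
    }
    return payload
}

let keyBytes = makeRandomKeyBytes()
let payload = makeRepeatedPayload(count: 65536)
print("key bytes: \(keyBytes.count), payload bytes: \(payload.count)")
```

The payload only needs to be built once, before the send loop starts, so the cost of the random number generator stays out of the measurement.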

A NIO engineer may have better ideas for overall performance here as well.

Thanks, but I've already tried a version using pre-generated frames that are already masked and fully prepared, and the result is the same.

Ah, right, that's just a random frame identifier. How are you generating your data, and how much are you sending at one time?

Otherwise, your next step is a flame graph or Instruments to see where the CPU time is going, barring an obvious solution from someone more familiar with NIO.

Before we do any more digging, can you please confirm that you're using -c release to build your code?

I was using a debug build. Changing it to release fixed the problem!

Thank you so much! It never occurred to me that this could be the problem. I was just assuming that I didn't understand SwiftNIO properly.

Yeah, this is an easy problem to hit: Swift's "zero-cost" abstractions are only zero-cost in release mode, and they can be expensive in debug mode. I'm pleased you got stuff working!
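
One way to guard a benchmark against this mistake is to have it report how it was built. The sketch below relies on the fact that assert conditions are only evaluated in unoptimized (-Onone) builds; the isDebugBuild name is a hypothetical helper, not something NIO or the standard library provides.

```swift
/// Returns true when the code was compiled without optimization (a debug build).
/// The condition passed to `assert` is only evaluated in -Onone builds, so the
/// closure that sets the flag never runs in a release build.
func isDebugBuild() -> Bool {
    var debug = false
    assert({ () -> Bool in
        debug = true
        return true
    }())
    return debug
}

if isDebugBuild() {
    print("Warning: debug build; throughput numbers will not be representative.")
} else {
    print("Release build; measurements should be representative.")
}
```

Building with -c release, as suggested above, should make the second branch run.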

Speaking of which, any clues on what could be a bottleneck? Not necessarily in this particular case, but in general.