Multiplexing or multiple connections?

My topic here went a little bit off topic, so I want to open a separate one to isolate my question.

I still have very little knowledge about networking, but I managed to create a simple TCP server and client application, which is working quite well. But as I wanted to introduce some new features (at the moment server and client only communicate with JSON messages, and now I also want to transfer files), I ran into some conceptual problems in my code.

In the other thread and by googling I figured out that a TCP connection carries exactly one byte stream in each direction: one from client to server and one from server to client. When I send data, it is queued in a buffer, and the other side receives it in the same order in which it was sent.

So now my problem is, that if I introduce a way to send bigger data from the server to the client, this will block my whole application: In the meantime there is no way to receive other messages from the server.

I now heard about "multiplexing" and I found this thread, but unfortunately there is no real information I could use. The only information I can find on the internet about multiplexing is about HTTP/2. It seems that this multiplexing really allows sending data at the same time, as I would need in my case. But unfortunately I am not able to understand the multiplexing part of the code in swift-nio-http2 or swift-nio-ssh.

I am unsure, what would be the best now:

Trying to implement some kind of multiplexing in my own code (but how?) or just using multiple connections? My other idea (I didn't try that yet) was to open multiple connections at the same time; then I can use one main connection for all the simple and small data and one or more other connections for transferring files. But this leads me to another question: every user in my application authorizes himself with username and password. I would then have two possibilities: Option a) Open up a given number of separate connections when authorizing at the beginning (so I would have for example 4 additional connections for file transfers) or option b) I open up one main connection at the beginning and the client gets an authentication token that he can use when opening separate connections for file transfer.

Then I also could switch my whole project to HTTP, I could maybe create a REST API. But in this case the client would need to connect for every request to the server, every time I would need a TLS handshake and I have the bad feeling, that this would slow down everything (that was the initial reason, why I use a permanent TCP connection).

I know that my question has a lot to do with the design of my application. But I had to write all of this because I wanted to show how my problem leads to a simple question: Is there an easy way to use multiplexing when using a ClientBootstrap and a ServerBootstrap?

I think you learn a lot by building your own network protocol, so I fully support that idea.

That being said, if your payloads are anyhow mostly JSON, maybe just pick a battle tested protocol that has well tested implementations available…

I am sure @lukasa will probably give the definitive answer here, but I will give it a shot anyhow:

HTTP 1.1 supports connection reuse, so after a request and response pair has completed, the connection can be reused for the next request. Usually a client will use some kind of connection pool that maintains up to some max number of connections with a server and will distribute the requests accordingly. This way a long lasting file upload will not block other requests. Also, as you said, once the connections are established, there is no additional overhead for subsequent requests that reuse an existing connection.

HTTP 2 adds to this the ability to even execute multiple requests on the same connection concurrently. This is done by multiplexing: essentially, every request/response is part of a logical stream identified by a number. Server and client use these stream IDs to correctly relate the messages to each other. To keep a large file transfer from "blocking" the entire connection, I think large messages are split into smaller pieces (called DATA frames) so that other requests or responses can be interspersed with the large file transfer.

Anyhow, in Swift land Async HTTP client supports both HTTP 1.1 and 2 with persistent connections and connection re-use.

Unless your specific use-case forbids the overhead of HTTP (big question mark), it is probably easier to build your communication on top of that.

If you want to go bit more fancy and abstract away some of the nitty gritty network details, you could also check out gRPC which is built on top of HTTP 2 and gives you very efficient message encoding and decoding as well as bi-directional streaming support. There is an excellent Swift implementation available for both server and client. Of course this adds a full new layer of complexity to your stack that takes time to fully understand.

To answer your last question, I think building multiplexing is "up to you". In your protocol and channel handlers you need some way to identify which message parts belong to which logical stream and then correctly assemble those parts.
This is essentially what the multiplexer in NIO HTTP2 does.

Let's start with this because it's the easiest: no, every client would not need to create a new connection for every request. All good HTTP implementations aggressively re-use connections whenever possible, even in HTTP/1.1. Far and away the best way to solve this problem for you is to get out of the business of doing custom networking, and use HTTP.

With that said, let's tackle this question:

At a fundamental level, multiplexing is a way to send multiple parallel streams of data inside a single meta-stream of data. This is a really very simple idea. The simplest way to achieve multiplexing is to use a simple TLV (type-length-value) pattern for structuring your data.

For example, imagine you want to be able to send n parallel streams of data at once. You decide to identify each stream with a unique number, starting from 0 and going upwards. How would we encode this?

The simplest way might look something like this:

    +---------------------------------------------------------------+
    |                 Stream Identifier (32)                        |
    +---------------------------------------------------------------+
    |                 Length (32)                                   |
    +---------------------------------------------------------------+
    |                 Payload (0...)                      ...
    +---------------------------------------------------------------+

In this case, the data in each stream is chunked up into chunks no larger than 2^32 bytes in size, tagged with the stream it belongs to, and then sent. This gives the receiver enough information to work out where each chunk begins and ends, and then to "demultiplex" the payload into a series of parallel streams.

It might seem too simple, but this is really all that multiplexing is. In fact, that format above was taken from the HTTP/2 "frame format" section from the (now-obsolete) original HTTP/2 specification, with some of the unnecessary stuff removed.
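As a rough illustration of the layout in the diagram, here is a minimal sketch in plain Swift. It is not taken from swift-nio-http2 or any other library: the `Frame` type and the `appendBigEndian`, `encode` and `chunk` functions are hypothetical names invented for this example, and it assumes both 32-bit fields are written in network (big-endian) byte order.

```swift
/// Hypothetical frame matching the diagram: a 32-bit stream identifier,
/// a 32-bit payload length, then the payload bytes.
struct Frame: Equatable {
    var streamID: UInt32
    var payload: [UInt8]
}

/// Appends `value` to `bytes` in network (big-endian) byte order.
func appendBigEndian(_ value: UInt32, to bytes: inout [UInt8]) {
    bytes.append(UInt8(truncatingIfNeeded: value >> 24))
    bytes.append(UInt8(truncatingIfNeeded: value >> 16))
    bytes.append(UInt8(truncatingIfNeeded: value >> 8))
    bytes.append(UInt8(truncatingIfNeeded: value))
}

/// Serializes one frame as: stream ID, length, payload.
func encode(_ frame: Frame) -> [UInt8] {
    var out: [UInt8] = []
    out.reserveCapacity(8 + frame.payload.count)
    appendBigEndian(frame.streamID, to: &out)
    appendBigEndian(UInt32(frame.payload.count), to: &out)
    out.append(contentsOf: frame.payload)
    return out
}

/// Splits one stream's data into frames of at most `maxPayload` bytes.
func chunk(_ data: [UInt8], streamID: UInt32, maxPayload: Int) -> [Frame] {
    precondition(maxPayload > 0, "maxPayload must be positive")
    var frames: [Frame] = []
    var start = 0
    while start < data.count {
        let end = min(start + maxPayload, data.count)
        frames.append(Frame(streamID: streamID, payload: Array(data[start..<end])))
        start = end
    }
    return frames
}

// Usage: 11 bytes on stream 7, at most 4 payload bytes per frame.
let frames = chunk(Array("hello world".utf8), streamID: 7, maxPayload: 4)
let wire = frames.flatMap(encode)
print(frames.count, wire.count)
```

Each frame costs 8 bytes of header, so small chunks interleave better but waste more bandwidth. Where to set that trade-off is a design choice the sketch leaves open.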

However, an important note about multiplexing is that it does not inherently solve the problem you described in your original post, where one large transfer blocks every other message on the connection.

This problem (called "head of line blocking") is not solved by multiplexing. Multiplexing reduces it, because you no longer need to wait for an entire message to be sent before you can handle a different one, but it doesn't eliminate it. You get further by introducing more complex concepts, like flow control and priorities. But even then, the shared TCP connection is a single bottleneck and if it gets stuck, you cannot make forward progress at all.

All of this is to say, there's a reason that HTTP/2 has been supplemented by HTTP/3: HTTP/3 is able to finally remove the one remaining head-of-line blocking problem that HTTP/2 could not.

I'd also like to suggest that if you're curious about this, you should consider reading some of the early sections of RFC 9113, the HTTP/2 specification. I'm a little biased, but I think we did a decent job of making the early sections reasonably approachable for non-experts.

oh my, if only it were that simple.

today, if you want to do HTTP in swift on the server, you basically have two options:

  1. you can write the channel handlers yourself and make a bare-bones, HTTP/1-only server implementation, and you are responsible for everything.

  2. you can write a delegate for a Web Framework (such as Vapor) that does everything, and shoehorn your use-case to fit Vapor’s idea of a Server-Side Swift Web Service, which wouldn’t otherwise make a lot of sense were it not for the motivation of you not wanting to spend time doing #1.

there aren’t a lot of options in between those two extremes.

While I understand what this feedback is trying to drive at, I don't think it's appropriate to the discussion we're having here. The feedback I gave was specifically targeted at the question of pivoting the project to use REST, where @Lupurus had specified a concrete objection to doing so (TCP connection re-use). My point there was that if this was the only reason not to do HTTP, then @Lupurus should do HTTP, because this problem doesn't actually manifest.

Though for what it's worth, if you want something between those two extremes you could consider using Hummingbird, specifically hummingbird-core.

Thank you so much for all your detailed answers. I will have a closer look at the HTTP specifications, as this is surely a very interesting topic.

I still don't really understand the multiplexing, I have to read more about it. But so far... Is the data being sent one frame after another or is it sent truly in parallel? Option a) would be

Queue: | Frame 1/20 Stream 1 | Frame 1/3 Stream 2 | Frame 2/20 Stream 1 | Frame 2/3 Stream 2 | .....

Or is it:

Queue 1: | Frame 1/20 | Frame 2/20 | .....
Queue 2: | Frame 1/3 | Frame 2/3 | .....

At the beginning of my project, I thought about using a REST API. But I chose TCP, because I think that this will be faster with a minimal protocol that only fits my needs. One other argument against REST is that in my case I have a multi-user application and sometimes the changes of one user will be pushed to the other users. So if I used a REST API, I would need (web-)sockets for this.

It is being sent one frame after another, but this is not different to what happens in all networking. The only way to achieve "truly parallel" transmission (which I'll define as "sending or receiving two network packets at the exact same time on the local clock") is to have two entirely separate physical links. This can mean two ethernet cables, or one ethernet and one wifi, or two wifi radios, or something similar to that. But assuming you have only one physical link then nothing is sent truly in parallel.

If we use ethernet as an example, down at the ethernet level your data is chunked up and sent in frames. While a frame is being sent on a physical length of ethernet cabling, no other data can be sent. Full-duplex media allow you to receive while you send, but there can only be one frame sent at a time. Period.

If you create two TCP connections, packets may be sent "simultaneously" on those two connections but, once the TCP segments have been created, your kernel will create ethernet frames as well and then hand those frames off to the NIC driver which will send them one at a time.

Multiplexing as described above does the same thing, just at a higher level of abstraction. In fact, you can argue that all of the wire protocols in your stack are already multiplexed! Ethernet frames carry a 16-bit "ethertype" field which tells the kernel what kind of data is inside the frame. When that's set to 0x0800 (IPv4), the IP packet header contains an 8-bit "protocol" field that tells the kernel what kind of data is inside the IP packet. When that is set to 0x06 (TCP), the TCP segment header contains two 16-bit numbers, a source port and a destination port, that together identify what connection a segment belongs to. These are all multiplexed, supporting multiple different kinds of data inside one another.

So yes, with multiplexing the data is sent one frame after another. But it will be anyway.

Wow, your knowledge is just impressive. Thank you so much for your perfect explanations and for all the time you invested in helping me, I really appreciate that. Now after your answer everything is clear for me: There is no real simultaneous transfer because it's not possible with one ethernet connection. Should have been obvious ;)

I will now think about the best (and maybe easiest) way to prevent blocking data transfers.

As literally always, @lukasa is giving very readable and clear descriptions of everything. Since @Lupurus is asking about a "way to prevent blocking data transfers", I want to highlight one of Cory's earlier snippets that may have been a little hidden between all the other important information.

I think this is an important point which helps to understand other multiplexed systems -- especially HTTP/2.

I think it's already clear how HTTP/2 (and many other protocols, even TCP) does the multiplexing: it embeds a stream (port) ID into each frame. This doesn't, however, immediately explain why the multiplexing still works if we hit the following issue: What if the user enqueues a huge amount of data for one particular stream? How do we prevent starvation of the other streams?

And I think flow control (and to a lesser extent priorities) are important here. In HTTP/2's case IMHO the most important ones are:

  • MAX_FRAME_SIZE: It's illegal to put more than that into a single frame (which means that at some point that frame is sent which allows "switching" to other streams)
  • The flow control windowing mechanism: The sending peer just isn't allowed to send arbitrary amounts of data on either the connection or an individual stream. The connection starts with an initial window of 65,535 bytes (just under 64 KiB) and so does every stream (also 65,535 bytes by default, adjustable via SETTINGS_INITIAL_WINDOW_SIZE). Unless the receiving peer explicitly sends a WINDOW_UPDATE frame, the sending peer will reach a state where it's no longer allowed to send any more data (on both the whole connection and individual streams).

Those two concepts combined do effectively prevent the problem that could arise if an HTTP server would for example send a multi-GB response in one go. At first it appears that that multi-GB thing might need to go out "in one piece" which would therefore block anything else from being sent on the single TCP stream. But that's not true for two reasons:

  1. (Assuming that it's larger than MAX_FRAME_SIZE): It has to be chopped up into multiple smaller frames that would get reassembled on the other side
  2. (Obviously depending on what window updates/settings the receiver sent) The sender wouldn't be allowed to send arbitrary amounts of data at once. It would be forced to send a bit (no more than the allowed remaining connection & stream windows), wait for the WINDOW_UPDATE(s), then send more. Whilst waiting for the WINDOW_UPDATEs for one stream, it may (and hopefully will) use the remaining connection window (assuming that's not exhausted too) for other streams.
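The windowing idea from the two lists above can be sketched as simple credit bookkeeping on the sending side. This is a simplified illustration and not HTTP/2's actual state machine: the `SendWindows` type and its method names are hypothetical, and the window sizes in the usage part are made-up small numbers chosen so the effect is easy to follow.

```swift
/// Hypothetical sender-side credit tracking: one window for the whole
/// connection and one window per stream. Sending consumes both.
struct SendWindows {
    var connection: Int
    var streams: [UInt32: Int] = [:]
    let initialStreamWindow: Int

    init(connectionWindow: Int, streamWindow: Int) {
        connection = connectionWindow
        initialStreamWindow = streamWindow
    }

    /// How many bytes may be sent on `streamID` right now.
    func sendable(on streamID: UInt32) -> Int {
        let streamWindow = streams[streamID] ?? initialStreamWindow
        return max(0, min(connection, streamWindow))
    }

    /// Records that `count` bytes were sent on `streamID`.
    mutating func consume(_ count: Int, on streamID: UInt32) {
        precondition(count <= sendable(on: streamID), "would exceed the window")
        connection -= count
        streams[streamID, default: initialStreamWindow] -= count
    }

    /// Applies a window update from the receiver; `nil` means the connection window.
    mutating func credit(_ count: Int, to streamID: UInt32?) {
        if let streamID = streamID {
            streams[streamID, default: initialStreamWindow] += count
        } else {
            connection += count
        }
    }
}

// Usage: stream 1 uses up its own window, but stream 3 can still send.
var windows = SendWindows(connectionWindow: 100, streamWindow: 40)
windows.consume(40, on: 1)
let blockedStream = windows.sendable(on: 1)   // stream 1 must wait for an update
let otherStream = windows.sendable(on: 3)     // stream 3 still has credit
windows.credit(40, to: 1)
let afterUpdate = windows.sendable(on: 1)
print(blockedStream, otherStream, afterUpdate)
```

The per-stream window is what stops one large transfer from starving the others: once stream 1 has used its credit, the sender has nothing better to do than serve other streams until the receiver grants more.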

I hope that helps. I'll amplify two (annoying) points that @lukasa already added:

  1. Flow-control is not easy to implement, whatever protocol you base it on. Hence: Using something that's already multiplexed for you (e.g. HTTP/2 or HTTP/3 or even multiple TCP connections) is much simpler. Especially because you also have standard tooling to analyse potential issues.
  2. Doing flow-control over TCP is kinda like doing TCP over TCP and that has a number of interesting issues. I won't go into details here but the most important one is what Cory already mentioned: It won't fully solve the head of line blocking. If you lose a packet that happens to carry data for stream 1, this will also slow down stream 3 on the same connection. They are not independent because they're both in the same TCP connection. QUIC/HTTP/3 solve that.

Thank you Johannes for joining this thread with even more useful information.

I am still not completely sure which way I will go: using HTTP with additional websockets for push messages, using multiple connections, or implementing multiplexing. I think I will read the HTTP specification to gather some more background information.

The flow control though is something I still struggle with. As far as I understand, it prevents the client from losing frames if he is not capable of processing all incoming frames. Is this true? If so, then this is not specific to multiplexing, is it? With only one connection carrying a single stream, it could also happen that the other party loses data.

Without adding flow control, multiplexing now seems not so complicated anymore. I could create an actor that has an array of streams. If one party now sends data, the actor will split the data into small frames and put them into a new array element. Afterwards it will start processing the stream arrays by sending, step by step, one frame of one stream array, then the next frame of the next stream array, and so on. If more data is to be sent while sending is in progress, a new array will get added and the processing will also iterate over this array.

If I think this through, even adding a simple flow control could be manageable: I define an initial window when I start processing a stream array. I will stop sending for this stream when I have reached the window size, and I wait until the other party sends an okay that he can receive more data.
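The "one frame of one stream, then one frame of the next" idea could look roughly like the following. It is a minimal sketch without flow control and without the actor wrapper; `RoundRobinScheduler`, `enqueue` and `next` are hypothetical names invented for this illustration.

```swift
/// Hypothetical round-robin scheduler: every stream has a queue of
/// pending chunks, and streams take turns sending one chunk each.
struct RoundRobinScheduler {
    private var queues: [UInt32: [[UInt8]]] = [:]
    private var order: [UInt32] = []   // streams with pending chunks, in turn order

    /// Splits `data` into chunks and queues them on `streamID`.
    mutating func enqueue(_ data: [UInt8], on streamID: UInt32, chunkSize: Int) {
        precondition(chunkSize > 0, "chunkSize must be positive")
        guard !data.isEmpty else { return }
        if queues[streamID] == nil { order.append(streamID) }
        var start = 0
        while start < data.count {
            let end = min(start + chunkSize, data.count)
            queues[streamID, default: []].append(Array(data[start..<end]))
            start = end
        }
    }

    /// Returns the next chunk to write, or nil when nothing is pending.
    mutating func next() -> (streamID: UInt32, chunk: [UInt8])? {
        guard !order.isEmpty else { return nil }
        let streamID = order.removeFirst()
        var queue = queues[streamID] ?? []
        let chunk = queue.removeFirst()
        if queue.isEmpty {
            queues[streamID] = nil          // stream is done, drop it from rotation
        } else {
            queues[streamID] = queue
            order.append(streamID)          // back of the line for its next turn
        }
        return (streamID: streamID, chunk: chunk)
    }
}

// Usage: a 10-byte transfer on stream 1 and a 4-byte message on stream 2.
var scheduler = RoundRobinScheduler()
scheduler.enqueue([UInt8](repeating: 1, count: 10), on: 1, chunkSize: 2)
scheduler.enqueue([UInt8](repeating: 2, count: 4), on: 2, chunkSize: 2)
var sentOrder: [UInt32] = []
while let item = scheduler.next() {
    sentOrder.append(item.streamID)
}
print(sentOrder)
```

The short message on stream 2 finishes after its second turn instead of waiting behind all five chunks of stream 1, which is the whole point of the interleaving.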

Please forgive me for those questions, which now really don't have anything to do with SwiftNIO. Of course using a proven protocol would maybe be easier, but where is the fun then? ;) And for someone who didn't study computer science, this is all very exciting :)

This is only true depending on the level of abstraction you consider.

Up to the TCP layer, yes, this is true. If the receiver does not have space in the buffer to store an ethernet frame or IP packet, they will drop that packet.

One of the things that TCP provides is reliable, in-order delivery. It does this by having the receiver acknowledge receiving a segment. If a segment gets dropped, TCP won't see the acknowledgement, and it'll send that segment again. Importantly, this intersects with flow control: as long as that segment has not been acknowledged, it consumes part of the flow control window, and the sender cannot send more data. Essentially, packet loss has the effect of slowing TCP down, but not causing data loss. (Sidebar: packet loss also impacts congestion control, which is related to but different from flow control.)

You're right, flow control is not specific to multiplexing. However, it is somewhat more critical to implement when multiplexing, or you will not get the benefits from your multiplexing strategy that you wanted.

This is a basic flow control algorithm that works pretty well for simple use-cases. In fact, it's not too terribly far from HTTP/2's flow control algorithm as @johannesweiss discussed above.

I need to come back to this topic once more...

I don't understand this. Why 2^32 bytes? That would be 4,294,967,296 Bytes = 4,194,304 KB = 4,096 MB?!

I'm actually trying to implement my ideas, but I am stuck on splitting the data into separate chunks. As I understand it, I will need to split the data and put the stream number in front of every chunk. But when the MTU is 1500 bytes and I can send 1,460 bytes max. via TCP, will I need to split it into 1,460 - 32 (stream identifier) - 32 (length)?

If so: How can I then ensure, that SwiftNIO will send the whole chunk in one ethernet frame, so that the other side can always receive the stream identifier?

Because the length field is 32-bits long, and it records the number of bytes in the frame, so 2^32 bytes is the maximum length.

But I didn't say "as large as possible", only that the chunks cannot be larger than this. They can very definitely be smaller.

No. The MTU is an implementation detail of TCP. You don't have to worry about it.

You don't need to, TCP will reassemble the stream.

Thank you so very much for your answer! So TCP will split the chunks into even smaller parts, but due to the frame header it will be capable of reassembling the data?

But what will arrive in channelRead? Do I receive the whole chunk even if it is bigger than 1460 bytes? Is there any limit when I read the readableBytes in channelRead? Or could the chunk size be whatever I like? (I don't plan to do this, but I just want to ensure that every data packet that arrives in channelRead starts with the stream identifier.)

Yes.

channelRead will give you streams of bytes. There are no delimiters there, so you can receive the bytes in any size chunk: the only requirement is that the bytes will be in the same order as you sent them. You'll need to implement a little length-prefixed parser, probably.
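A length-prefixed parser of the kind mentioned here might look like the sketch below. It uses plain byte arrays so it stays self-contained; in a SwiftNIO pipeline the same logic would more naturally live in a `ByteToMessageDecoder` working on a `ByteBuffer`. The `ParsedFrame` and `FrameParser` names are hypothetical, and the sketch assumes the `| stream id (32) | length (32) | payload |` layout with big-endian fields.

```swift
/// One complete frame recovered from the byte stream (hypothetical type).
struct ParsedFrame: Equatable {
    var streamID: UInt32
    var payload: [UInt8]
}

/// Hypothetical incremental parser. Bytes may arrive in pieces of any
/// size, so leftovers are buffered until a whole frame is available.
struct FrameParser {
    private var buffer: [UInt8] = []

    /// Reads a big-endian UInt32 from `buffer` starting at `offset`.
    private func readUInt32(at offset: Int) -> UInt32 {
        var value: UInt32 = 0
        for byte in buffer[offset..<(offset + 4)] {
            value = (value << 8) | UInt32(byte)
        }
        return value
    }

    /// Feeds newly received bytes and returns every frame that is now complete.
    mutating func feed(_ bytes: [UInt8]) -> [ParsedFrame] {
        buffer.append(contentsOf: bytes)
        var frames: [ParsedFrame] = []
        var offset = 0
        while buffer.count - offset >= 8 {            // a full header is available
            let streamID = readUInt32(at: offset)
            let length = Int(readUInt32(at: offset + 4))
            guard buffer.count - offset - 8 >= length else { break }  // payload incomplete
            let start = offset + 8
            let payload = Array(buffer[start..<(start + length)])
            frames.append(ParsedFrame(streamID: streamID, payload: payload))
            offset = start + length
        }
        buffer.removeFirst(offset)                    // keep only the unparsed tail
        return frames
    }
}

// Usage: two frames arrive split at arbitrary points, as they may in channelRead.
var parser = FrameParser()
let received: [UInt8] = [0, 0, 0, 1, 0, 0, 0, 3, 97, 98, 99,
                         0, 0, 0, 2, 0, 0, 0, 1, 120]
let first = parser.feed(Array(received[0..<5]))    // header still incomplete
let second = parser.feed(Array(received[5..<12]))  // completes the frame on stream 1
let third = parser.feed(Array(received[12...]))    // completes the frame on stream 2
print(first.count, second.count, third.count)
```

Because the length field already says where each frame ends, the parser needs no extra delimiter between frames. A production version would also want to reject absurdly large lengths instead of buffering without limit.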

Shame on me, that I forgot, what I already knew. Alright, thank you. I will split the data in different chunks, that will have:

| stream id (32) | length (32) | chunk data |

and I will send those chunks with a prefix and a suffix to separate them in channelRead =>

| prefix | stream id | length | data | suffix |