Partial input

Helge_Hess1 · May 21, 2018, 8:00pm

When hooking up SwiftProtobuf with SwiftNIO you run into the issue that a stream of data may either have more or less data than what a Protobuf message carries.

What is the proper way to deal with that? Incomplete data can be detected using BinaryDecodingError.truncated, but what about data that belongs to the next message? You would need something like func decode() -> ( Message, RemainingData )?

johannesweiss · May 21, 2018, 9:28pm

As long as the buffer holds enough bytes to parse one protocol buffer message, read that message off the buffer you got (buffer.readData(length: sizeOfTheProtoMessage)) and return .continue. Once the buffer doesn't have enough to decode a message, just return .needMoreData.

ByteToMessageDecoder will automatically call you in a loop and keep a cumulation buffer of bytes available for you.

Pseudo code for a simple protocol that is [32 bits of unsigned big endian int: length of message][message1][32 bits of unsigned big endian int: length of message][message 2]...:

/* disclaimer: haven't compiled or tested this but should give an idea */
class MyProtoDecoder: ByteToMessageDecoder {
    [...]

    enum State {
        case waitingForLength
        case waitingForMessage(Int)
    }
    private var state = State.waitingForLength

    func decode(ctx: ChannelHandlerContext, buffer: inout ByteBuffer) -> DecodingState {
        switch self.state {
        case .waitingForLength:
            /* readInteger reads big endian by default */
            if let len = buffer.readInteger(as: UInt32.self) {
                if len > 4 * 1024 * 1024 {
                    /* just close channel if over 4 MB, in real-world code you might want to handle differently */
                    ctx.channel.close(promise: nil)
                    /* stop decoding, the channel is going away */
                    return .needMoreData
                }
                /* the state stores an Int, so convert the UInt32 length */
                self.state = .waitingForMessage(Int(len))
                return .continue
            } else {
                return .needMoreData
            }
        case .waitingForMessage(let length):
            /* readData(length:) comes from NIOFoundationCompat and returns nil until `length` bytes are readable */
            if let data = buffer.readData(length: length) {
                ctx.fireChannelRead(somehowDecodeAMessageFrom(data))
                self.state = .waitingForLength
                return .continue
            } else {
                return .needMoreData
            }
        }
    }
}

does that make sense?

(edit: added a length check just in case someone wants to use code derived from this in real-world apps)

Helge_Hess1 · May 21, 2018, 9:31pm

That makes a lot of sense (it is handling the BinaryDecodingError.truncated part) but actually doesn't answer my question ;-) The question is specifically about the case when more data is available (i.e. a second message in the receive buffer). What is the API for this in SwiftProtobuf?

tbkka · May 21, 2018, 9:34pm

Protobuf is not self-framing. This means you cannot simply append protobuf messages to a stream and then separate them again.

You'll need to add some information to your data stream to help you identify the start/end of each message. Usually, people do this by inserting an integer length value before each message.

SwiftProtobuf does not have an API to do this for you, so you'll have to implement it yourself. Johannes' code shows one approach.
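
For illustration, here is a rough Foundation-only sketch of that length-prefix idea, using the same 32-bit big-endian prefix as Johannes' example. The names frame and readFrame are made up for this sketch and are not part of SwiftProtobuf or SwiftNIO; readFrame has roughly the ( Message, RemainingData ) shape asked about above, except that it hands back the undecoded payload bytes.

```swift
import Foundation

/// Prefixes `payload` with its length as a 32-bit unsigned big-endian integer.
/// (Hypothetical helper; assumes the payload is smaller than 4 GiB.)
func frame(_ payload: Data) -> Data {
    let n = UInt32(payload.count)
    var framed = Data([
        UInt8(truncatingIfNeeded: n >> 24),
        UInt8(truncatingIfNeeded: n >> 16),
        UInt8(truncatingIfNeeded: n >> 8),
        UInt8(truncatingIfNeeded: n)
    ])
    framed.append(payload)
    return framed
}

/// Returns the first complete payload plus the bytes left over, or nil if
/// `stream` does not yet contain a complete frame.
func readFrame(from stream: Data) -> (payload: Data, remaining: Data)? {
    // Copy into an array so indices start at 0 even if `stream` is a slice.
    let bytes = [UInt8](stream)
    guard bytes.count >= 4 else { return nil }
    let length = bytes[0..<4].reduce(0) { ($0 << 8) | Int($1) }
    guard bytes.count >= 4 + length else { return nil }
    return (Data(bytes[4..<(4 + length)]), Data(bytes[(4 + length)...]))
}

// Two framed payloads written back to back into one stream.
let exampleStream = frame(Data([0x08, 0x01])) + frame(Data([0x08, 0x02]))
if let first = readFrame(from: exampleStream) {
    // first.payload is the first message's bytes, ready for the protobuf decoder;
    // first.remaining still holds the complete second frame.
    print(first.payload.count, first.remaining.count)
}
```

Each payload would then be handed to the protobuf decoder on its own, so the decoder never sees bytes that belong to the next message.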

johannesweiss · May 21, 2018, 9:34pm

If you return .continue it’ll call decode again and form a parsing loop. So that should work just fine or am I missing anything?

Helge_Hess1 · May 21, 2018, 9:34pm

@johannesweiss Yes, you're missing the case in which more data is available

Helge_Hess1 · May 21, 2018, 9:36pm

@tbkka I don't quite get this. When decoding anything, you consume N bytes of data. So there should be API which gives you the data not consumed. I can't see why I need an explicit frame here.

tbkka · May 21, 2018, 9:37pm

I should also clarify: BinaryDecodingError.truncated does not detect all cases where a message might be truncated.

Helge_Hess1 · May 21, 2018, 9:37pm

Oh

Helge_Hess1 · May 21, 2018, 9:41pm

OK, I see. You explicitly add framing. Fair enough.

johannesweiss · May 21, 2018, 9:42pm

@Helge_Hess1 the data not consumed will be handed to you in a subsequent call to decode that happens automatically when you return .continue. So you really only need to read one message off the buffer and B2MD (ByteToMessageDecoder) will call you repeatedly until you need more data to proceed.

If for whatever reason you want to pre-access data that decode would deliver on the next call, just check the bytes in buffer. After you read off the first message, you can use the usual API and the bytes between readerIndex and writerIndex are all yours. But really just return .continue and B2MD will call you again immediately (even if there are no further bytes received from the network as .continue indicates the buffer contains at least as much and likely more than needed for one message).
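
To show the loop behaviour described here without pulling in NIO, below is a small simulation. Cumulation, FrameDecoder and feed are stand-ins invented for this sketch, not NIO types; the only thing borrowed from the thread is the contract that decode is called again after .continue and that unread bytes stay in the buffer after .needMoreData.

```swift
enum DecodingState { case `continue`, needMoreData }

/// Stand-in for a cumulation buffer: all bytes received so far plus a reader index.
struct Cumulation {
    var bytes: [UInt8] = []
    var readerIndex = 0
    var readableBytes: Int { return bytes.count - readerIndex }

    /// Reads `n` bytes and advances the reader index, or returns nil
    /// (leaving the index untouched) if fewer than `n` bytes are readable.
    mutating func readBytes(_ n: Int) -> [UInt8]? {
        guard readableBytes >= n else { return nil }
        defer { readerIndex += n }
        return Array(bytes[readerIndex..<(readerIndex + n)])
    }
}

final class FrameDecoder {
    enum State { case waitingForLength, waitingForMessage(Int) }
    private var state = State.waitingForLength
    private(set) var messages: [[UInt8]] = []

    /// Consumes at most one header or one payload per call.
    func decode(_ buffer: inout Cumulation) -> DecodingState {
        switch state {
        case .waitingForLength:
            guard let header = buffer.readBytes(4) else { return .needMoreData }
            state = .waitingForMessage(header.reduce(0) { ($0 << 8) | Int($1) })
            return .continue
        case .waitingForMessage(let length):
            guard let payload = buffer.readBytes(length) else { return .needMoreData }
            messages.append(payload)
            state = .waitingForLength
            return .continue
        }
    }
}

/// Appends newly received bytes, then calls decode until it asks for more data.
func feed(_ chunk: [UInt8], into buffer: inout Cumulation, decoder: FrameDecoder) {
    buffer.bytes.append(contentsOf: chunk)
    while decoder.decode(&buffer) == .continue {}
}

var buffer = Cumulation()
let decoder = FrameDecoder()
// First chunk: two complete frames plus half of a third length header.
feed([0, 0, 0, 2, 0xAA, 0xBB, 0, 0, 0, 1, 0xCC, 0, 0], into: &buffer, decoder: decoder)
// Second chunk: the rest of the header and the third payload.
feed([0, 1, 0xDD], into: &buffer, decoder: decoder)
print(decoder.messages.count)
```

The second message in the first chunk is picked up without any extra API: decode only reads one unit per call, and the loop in feed keeps calling it until the buffer runs dry.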

If this still doesn’t answer your question, can you provide example code on what you’d like to do?

Helge_Hess1 · May 21, 2018, 9:42pm

I'm fine with this, but would you be so kind to explain why implicit framing doesn't work (w/ Protobuf)?

Helge_Hess1 · May 21, 2018, 9:45pm

I'm just looking at "Protocol Buffers with SwiftNIO. Previously, we looked at sending a…" (by Jonathan Wong, on Medium) and how to do this properly. He is writing messages without frames, and I'm not entirely sure why frames are necessary.

johannesweiss · May 21, 2018, 9:46pm

@tbkka is the expert but from what I know about protobufs it’s just not a thing the protocol does. You always need to know the message size before attempting to parse. How you do that is up to you. If your wrapping protocol has explicit framing just use that. If it does not (like TCP) you need to create something yourself, like sending the size before the message like in my example.

Helge_Hess1 · May 21, 2018, 9:46pm

Yeah, but this makes no sense to me. A parser parses and there can be left overs. Just like HTTP upgrade.

johannesweiss · May 21, 2018, 9:49pm

That will need framing. That’s not a NIO problem, it’s just how protobufs work. From Techniques | Protocol Buffers Documentation:

If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own.
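
To make that concrete: on the wire a message is just a run of (tag, value) fields with no header and no terminator. The sketch below is a deliberately tiny field scanner, not SwiftProtobuf's parser, and it only understands wire types 0 (varint) and 2 (length-delimited); readVarint and fieldNumbers are names made up for this example. It shows that two messages written back to back scan as one longer message, and that a message cut off exactly at a field boundary still scans cleanly.

```swift
/// Reads a base-128 varint starting at `index`; returns nil if the bytes run out.
func readVarint(_ bytes: [UInt8], _ index: inout Int) -> UInt64? {
    var result: UInt64 = 0
    var shift: UInt64 = 0
    while index < bytes.count, shift < 64 {
        let byte = bytes[index]
        index += 1
        result |= UInt64(byte & 0x7F) << shift
        if (byte & 0x80) == 0 { return result }
        shift += 7
    }
    return nil
}

/// Returns the field numbers found in `bytes`, or nil if the bytes end in the
/// middle of a field or use a wire type this sketch does not handle.
func fieldNumbers(in bytes: [UInt8]) -> [UInt64]? {
    var index = 0
    var fields: [UInt64] = []
    while index < bytes.count {
        guard let tag = readVarint(bytes, &index) else { return nil }
        switch tag & 0x7 {
        case 0: // varint value
            guard readVarint(bytes, &index) != nil else { return nil }
        case 2: // length-delimited value
            guard let length = readVarint(bytes, &index),
                  length <= UInt64(bytes.count - index) else { return nil }
            index += Int(length)
        default:
            return nil
        }
        fields.append(tag >> 3)
    }
    return fields
}

// Message A: field 1 = 1 (varint), field 2 = "hi" (length-delimited).
let messageA: [UInt8] = [0x08, 0x01, 0x12, 0x02, 0x68, 0x69]
// Message B: field 1 = 2 (varint).
let messageB: [UInt8] = [0x08, 0x02]

// Back to back, the bytes scan as a single message with three fields.
print(fieldNumbers(in: messageA + messageB) as Any)
// Cut off after the first field, A still scans as a complete message.
print(fieldNumbers(in: Array(messageA.prefix(2))) as Any)
// Cut off inside the second field, the truncation is detectable.
print(fieldNumbers(in: Array(messageA.prefix(4))) as Any)
```

Nothing in the bytes marks where A stops and B starts, which is why the boundary has to come from outside the message, for example from a length prefix. The second case is also why a truncation check cannot catch every truncated message.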

Helge_Hess1 · May 21, 2018, 9:50pm

I never said it is a NIO issue, which is why I posted that here :-) I just would like to understand why/how the format is not self delimiting. No big deal, I'm also happy to just accept the fact and then use Cap'n Proto ;-)

johannesweiss · May 21, 2018, 9:52pm

Oh sorry! Thought this was the NIO forum. Well, it’s not a SwiftProtobuf issue either. You somehow need to provide framing by whatever means, and when you have a full frame you hand it to SwiftProtobuf or any other protobuf implementation.

tbkka · May 21, 2018, 9:54pm

Details of protobuf encoding are here:
Encoding | Protocol Buffers | Google Developers

There's also a Google forum for general protobuf questions:
Redirecting to Google Groups

Helge_Hess1 · May 21, 2018, 9:57pm

Yes I can use Google. So what is the part which requires explicit framing?