IdleStateHandler not working when remote server crashes


(Kirill Titov) #1

Hello everyone. Yesterday I found an interesting issue.

Say I have a NIO TCP server and a NIO TCP client. Both have an IdleStateHandler in the channel pipeline with a 1-second timeout, and normally it works perfectly.

But when the remote server crashes for some reason after it has started reading client input and before it has written anything back (the crash itself isn't the problem), the client just hangs.

All threads are waiting for an OS event, and there doesn't seem to be a deadlock. I tried to debug it a little and found that the scheduled task from IdleStateHandler isn't even executed. I thought there might be a problem with my actual code, so I tried to recreate the problem with just a minimal bootstrap.

And guess what, the problem is still there :D I understand that it's 99% me being stupid, but I still kindly ask this lovely community to help me.

Here is an example: https://github.com/kirilltitov/swift-nio-stuck-idle


(Cory Benfield) #2

The IdleStateHandler cannot help you because your client doesn't notice that the connection has gone idle.

If the connection is idle, the IdleStateHandler will fire user events: specifically, IdleStateEvent.read, IdleStateEvent.write and IdleStateEvent.all. However, the handler for your client doesn't actually look for any of these events.

You should add a function to your handler like this:

func userInboundEventTriggered(ctx: ChannelHandlerContext, event: Any) {
    if event is IdleStateEvent {
        ctx.close(promise: nil)
        self.promise.fail(error: SomeError())
    }
    ctx.fireUserInboundEventTriggered(event: event)
}

(Kirill Titov) #3

Yes, @lukasa, my bad, I forgot to commit that function (see). Of course, I tried it. Still not working :(


(Cory Benfield) #4

Your code is a bit different from mine: mine completes self.promise. You are blocking while waiting for that promise to complete, but nothing ever completes it. Try adding the self.promise.fail line from my code.


(Kirill Titov) #5

OK, I added it, but nothing changed, because this method isn't run even once: none of the prints are invoked.


(Cory Benfield) #6

Aha, sorry, the server crashed. In this case what's happening is that your TCP connection is going away, which is what's preventing those timers from firing. You should add this:

func channelInactive(ctx: ChannelHandlerContext) {
    self.promise.fail(error: SomeError())
    ctx.fireChannelInactive()
}

(Kirill Titov) #7

Ohhh, I see... I considered trying it... But it's always invoked when the channel closes (even after a successful transfer, which is totally fine), so an already completed promise (succeeded or failed) will be failed a second time. Is that OK? Won't it invoke the future's callbacks again?


(Kirill Titov) #8

Just to be clear: it worked, and further experiments showed that everything works perfectly. I presume it's because this channel handler is deinitialized after the channel has closed...


(Cory Benfield) #9

That's correct.

Right now we allow a future to be completed multiple times, and it will not run the callbacks twice. We may change this in the future (I don't like it much). If you're worried, you can make the promise an optional and set it to nil when you've finished with it.
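
A minimal sketch of the optional-promise approach, assuming the NIO 1 API names used in this thread (`ctx:`, `fail(error:)`, `succeed(result:)`). `ResponseHandler` and `ConnectionClosedError` are hypothetical names, and treating the first inbound read as success is only an assumption about the client's protocol:

```swift
import NIO

// Hypothetical error type, standing in for SomeError from the snippets above.
struct ConnectionClosedError: Error {}

// Hypothetical client handler that completes its promise at most once.
final class ResponseHandler: ChannelInboundHandler {
    typealias InboundIn = ByteBuffer

    // Optional so it can be dropped as soon as it has been completed.
    private var promise: EventLoopPromise<Void>?

    init(promise: EventLoopPromise<Void>) {
        self.promise = promise
    }

    func channelRead(ctx: ChannelHandlerContext, data: NIOAny) {
        // Assumption: any inbound data counts as a successful response.
        self.promise?.succeed(result: ())
        self.promise = nil
        ctx.fireChannelRead(data)
    }

    func channelInactive(ctx: ChannelHandlerContext) {
        // Does nothing if the promise was already completed and cleared.
        self.promise?.fail(error: ConnectionClosedError())
        self.promise = nil
        ctx.fireChannelInactive()
    }
}
```

With this shape, channelInactive can still fire after a successful transfer, but it no longer touches a promise that has already been completed.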


(Kirill Titov) #10

Yeah, I think that would be the perfect solution. Thank you very much for the help!