How We Improved Reliability of our WebSocket Connections

by philfreo on 10/28/2021, 2:49 PM with 18 comments

by moffkalast on 10/28/2021, 4:04 PM

> The (sadly all too) common approach to rarely occurring bugs & edge cases: Pretend like the problem doesn't exist. Blame it on faulty networking, solar flares, etc.

How to tell if someone hasn't worked with a piece of software in production yet? They've never blamed a bug on cosmic radiation :D

by robinjhuang on 10/28/2021, 4:53 PM

I think socket.io handles this client keepalive automatically.

https://socket.io/docs/v4/how-it-works/

See the "Disconnection detection" section.

by politician on 10/28/2021, 4:17 PM

Good to know that the WebSocket API is broken by design. Thanks W3C!

https://www.w3.org/Bugs/Public/show_bug.cgi?id=13104

by acknackack on 10/28/2021, 6:35 PM

A classic TCP half-open connection issue. The client/browser side still thinks the WebSocket/TCP connection is alive. This happens because the client is not actively sending any outbound data, which would otherwise have eventually reset the connection. It would be nice if the browser side of the WebSocket connection could also start a PING/PONG mechanism.
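Since browser JavaScript cannot send protocol-level PING frames, one way to approximate what this comment asks for is a client-side watchdog that sends an ordinary data message and closes the socket when nothing has come back for too long. The following is only a minimal sketch: `SocketLike`, `HeartbeatWatchdog`, the `{"type":"ping"}` payload, and the timeout value are all hypothetical names chosen for illustration, and timestamps are passed in explicitly so the logic can be exercised without a real connection.

```typescript
// Hypothetical minimal interface: only the parts of a WebSocket-like
// object that this sketch needs.
interface SocketLike {
  send(data: string): void;
  close(): void;
}

// Client-side watchdog (illustrative, not the article's implementation).
class HeartbeatWatchdog {
  private socket: SocketLike;
  private timeoutMs: number;
  private lastSeen: number;

  constructor(socket: SocketLike, timeoutMs: number, now: number) {
    this.socket = socket;
    this.timeoutMs = timeoutMs;
    this.lastSeen = now;
  }

  // Call whenever any message (including a pong) arrives from the server.
  noteActivity(now: number): void {
    this.lastSeen = now;
  }

  // Call on a timer. Closes the socket and returns false if the server has
  // been silent for longer than timeoutMs; otherwise sends an
  // application-level ping and returns true.
  tick(now: number): boolean {
    if (now - this.lastSeen > this.timeoutMs) {
      this.socket.close();
      return false;
    }
    this.socket.send(JSON.stringify({ type: "ping" }));
    return true;
  }
}
```

In a browser, one would presumably call `noteActivity(Date.now())` from the socket's `onmessage` handler and `tick(Date.now())` from a `setInterval` callback, reconnecting whenever `tick` returns false.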

by david422 on 10/28/2021, 4:28 PM

Interesting read, thanks. I've delved into websockets and hit some interesting issues. I don't think I've had this scenario - that I know of - but this is good to know.

by heliostatic on 10/28/2021, 3:05 PM

> You need to prove that what you think your code does is truly what happens.

Such a good insight -- seems obvious, but too often the source of gotchas, bad data, and bad user experience.

by hungnv on 10/28/2021, 11:58 PM

This is a practical implementation when working with WebSockets. When the server gets an error or times out waiting for the client's pong, it closes the connection. At the same time, if the client sends a "health check" message (with whatever message value you choose) and receives no response, it closes the connection and reconnects.
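The server half of this scheme can be sketched as a periodic sweep over tracked connections. This is an assumption-laden illustration rather than any particular library's API: `TrackedConnection`, `lastPongAt`, and `sweepStaleConnections` are hypothetical names, and the timeout is whatever value suits the deployment.

```typescript
// Hypothetical per-connection record kept by the server.
interface TrackedConnection {
  id: string;
  lastPongAt: number; // ms timestamp of the most recent pong from the client
  close(): void;
}

// Closes every connection whose last pong is older than timeoutMs and
// returns the connections that are still considered alive.
function sweepStaleConnections(
  connections: TrackedConnection[],
  now: number,
  timeoutMs: number,
): TrackedConnection[] {
  const alive: TrackedConnection[] = [];
  for (const conn of connections) {
    if (now - conn.lastPongAt > timeoutMs) {
      conn.close(); // timed out waiting for the client's pong
    } else {
      alive.push(conn);
    }
  }
  return alive;
}
```

The server would run this on a timer and update `lastPongAt` whenever a pong arrives, so a client that has silently gone away is dropped after at most one timeout plus one sweep interval.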

by renewiltord on 10/28/2021, 3:54 PM

This is why so many crypto exchanges send pings and pongs periodically as regular requests rather than as control frames.

It's an application-layer keepalive.
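An application-layer keepalive of this kind boils down to treating ping as a normal data message and answering it in application code. A minimal sketch of the responding side, assuming a JSON message format with a `type` field (the format and the name `keepaliveReply` are hypothetical; real exchanges each define their own):

```typescript
// Returns the reply to send for an incoming text message, or null if the
// message is not a keepalive ping and should go to normal handling.
function keepaliveReply(raw: string): string | null {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null; // not JSON, so not one of our keepalive messages
  }
  if (
    typeof msg === "object" &&
    msg !== null &&
    (msg as { type?: unknown }).type === "ping"
  ) {
    return JSON.stringify({ type: "pong" });
  }
  return null;
}
```

Because the ping travels as ordinary data, either side can observe it from application code, which is not possible with WebSocket control frames in the browser API.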