Skip to content

Device Connection Flipping, KeepAlive Fixup #899

@lorennorman

Description

@lorennorman

On the IO frontend, WipperSnapper devices flip "disconnected"/"connected" frequently. It's a pain because:

  • it can wipe out frontend activity if you're in a form that relies on being connected
  • it "junks up" a device's connection log with many spurious connection events

I believe this can be fixed entirely in the device code, per the following 3 snippets:

constant is set to 5 seconds:

#define WS_KEEPALIVE_INTERVAL_MS                                               \
  5000 ///< Session keepalive interval time, in milliseconds

broker is told "disconnect me if i don't see you after 5 seconds":

      WS._mqtt->setKeepAliveInterval(WS_KEEPALIVE_INTERVAL_MS / 1000);

device gives itself a half-second to:

  • finish up whatever it was doing
  • get the ping packet to the broker
void Wippersnapper::pingBroker() {
  // ping within keepalive-10% to keep connection open
  if (millis() > (_prv_ping + (WS_KEEPALIVE_INTERVAL_MS -
                               (WS_KEEPALIVE_INTERVAL_MS * 0.10)))) {

Analysis: the device is inadvertently giving itself a mere 500ms to finish up whatever it is doing and get a "ping" packet to the broker. So every 5 seconds, there's a chance this lands after the timeout, causing a "disconnect" following closely by a "connect".

We discussed quickly as a team to make sure this wasn't intended, and we agree it is not. It's more that it's not obvious this is how this all works!

Proposed Fix:
Instead of using 1 constant plus math to express everything, break it into 2 obvious constants and remove the math from the callsites.

// ...
#define WS_BROKER_KEEPALIVE_MS 10000 // maximum time without a ping before a disconnect
#define WS_DEVICE_PING_MS 5000 // minimum time before a ping
// ...
// tell the broker how long to wait for a ping
WS._mqtt->setKeepAliveInterval(WS_BROKER_KEEPALIVE_MS / 1000);
// ...
void Wippersnapper::pingBroker() {
  // if it's past time to send the next ping
  if (millis() > (_prv_ping + WS_DEVICE_PING_MS)
// ...

I hope this will fix the problem and be easier to reason about going forward.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions