Remote Procedure Calls stop functioning when device is left on for about an hour

Doing some more experimentation with RPC. I noticed that after a certain period of time the device on the Golioth Console will show up as “offline” despite still being on and sending regular messages to the log (which show up in the console). When this switch to offline occurs, the device no longer accepts RPC commands through the webpage (they always time out).

Is there a way to prevent this from happening, like a function that will refresh the online status so it doesn’t time out?

I appreciate you reporting the issue. We’re currently investigating and will report back with more info.

Update here. It appears that periodically calling the function “wifi_wait_for_connected_with_timeout(5);” allows RPC to go through even when the console shows the device as “offline”.

We drilled down on this and found an edge case where observations were occasionally being prematurely garbage collected on the server side. The gist of it is that if a device reconnected, two observations existed (a defunct one and a current one) and under just the right circumstances both were garbage collected.
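To make the edge case easier to picture, here is a toy model of how this class of bug can arise. It is not Golioth's server code, and the actual trigger was not described in this thread. The `observation_t` type and both functions are hypothetical. The sketch assumes a collector that matches on the device alone, which removes the current observation along with the defunct one, and contrasts it with one that matches on the stale session.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one observation registered by a device. */
typedef struct {
    int device_id;
    int session_id;
    bool active; /* false once garbage collected */
} observation_t;

/* Over-eager collector: drops every observation belonging to the device. */
size_t gc_by_device(observation_t *obs, size_t n, int device_id)
{
    size_t removed = 0;
    for (size_t i = 0; i < n; i++) {
        if (obs[i].active && obs[i].device_id == device_id) {
            obs[i].active = false;
            removed++;
        }
    }
    return removed;
}

/* Narrower collector: drops only observations tied to the stale session. */
size_t gc_by_session(observation_t *obs, size_t n, int device_id,
                     int stale_session_id)
{
    size_t removed = 0;
    for (size_t i = 0; i < n; i++) {
        if (obs[i].active && obs[i].device_id == device_id &&
            obs[i].session_id == stale_session_id) {
            obs[i].active = false;
            removed++;
        }
    }
    return removed;
}
```

With one defunct and one current observation for the same device, the first function leaves nothing registered, so the device stops receiving RPC requests, while the second keeps the current observation alive.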

A fix is now on production for that issue. You should see better RPC performance right away (assuming your device was in that state). Thanks for sharing info that enabled us to reproduce this!

The issue with the “online”/“offline” indicator is still present. That shouldn’t affect the stability of your device, but we are prioritizing a fix as that is valuable information and currently it is not being reported correctly to users. I’ll publish an update here when I have more info.

Sounds good. Let me know when you have an ETA for the fix for the “online”/“offline” indicator.

As a side note, I am also seeing that the “online”/“offline” indicator light does not update to “offline” for a long time after a device is shut off.

An update to Golioth has changed the connection status behavior to address the connected/not connected issue.

Here’s the gist of it:

  • Session Established: the last time the device had a new handshake with the server
  • Last Report: the last time a communication was received from the device
  • For both values, hover over the value for a precise timestamp

The online/offline status was always triggered by the start of a new session, but the garbage collection of old sessions caused an incorrect “offline” status to be shown in some cases. Previously, “online” wasn’t actually a measurement of the device’s status, just an indication that a session had started and hadn’t been cancelled (or garbage collected). This update better reflects the information we have available about each device.
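If you want to reason about the two values in your own tooling, a rough sketch follows. The `device_status_t` type, its field names, and the `device_recently_active` helper are hypothetical and are not part of any Golioth API. The idea is to judge liveness from the time of the last report, not from the age of the session.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two values described above. */
typedef struct {
    int64_t session_established_s; /* Unix time of the last handshake */
    int64_t last_report_s;         /* Unix time of the last message received */
} device_status_t;

/*
 * Treat a device as recently active if its last report is no older than
 * max_silence_s. The threshold is yours to choose.
 */
bool device_recently_active(const device_status_t *st, int64_t now_s,
                            int64_t max_silence_s)
{
    return (now_s - st->last_report_s) <= max_silence_s;
}
```

A device that established its session an hour ago but logged a message a minute ago counts as active under this check, which matches the situation described at the top of this thread.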

More information on this is being added to our documentation. I think that could be merged as soon as this afternoon, but it may slip to next week.

Just a quick follow-up: we’ve added documentation for the status behavior: Status | Golioth

Thanks for the update, Mike. I will be experimenting a bit with the new update this week.

2 posts were split to a new topic: Session not created after devices offline for 24 hours