I’m working on a “sleepy” cellular device where the device goes to sleep when it’s not sending data to the Stream service. I’m also using the OTA and settings services.
What causes a device to check for updates to the device settings and LightDB State, and to receive a new manifest from the OTA service?
For example, let’s say the following config is used:
If the device sends data to the Stream service every 15 minutes, should I expect that the device will also check for updated settings/state/new firmware versions at the same time (i.e. every 15min)? Or are these only checked on the interval defined by CONFIG_GOLIOTH_COAP_CLIENT_RX_TIMEOUT_SEC=3600 (1hr)?
Actual Behavior
It looks like the device only checks for updates based on CONFIG_GOLIOTH_COAP_CLIENT_RX_TIMEOUT_SEC=3600
(see comment below)
Apologies… I’m testing this again and it looks like the device is updating the settings and checking for new firmware whenever the device sends new data to the Stream service. The first time I tested out the longer RX timeout settings it appeared the device wasn’t getting updates to settings/OTA, but it was probably something I misconfigured.
Just to confirm, the device should get updates from the server whenever it connects to stream new data, right? And the CONFIG_GOLIOTH_COAP_CLIENT_RX_TIMEOUT_SEC defines how long before the device tries to connect again in the absence of any other connection to the server?
The settings are not re-fetched on a cadence. There is the possibility of removing the settings observation and then re-establishing it, but I don’t think we continuously fetch them in the background.
If you performed an OTA update, the device should automatically re-establish the settings observation once it reconnects. It’s highly peculiar that you’re seeing otherwise, and I wasn’t able to reproduce this issue on my end.
As for GOLIOTH_COAP_CLIENT_RX_TIMEOUT_SEC, it defines how long (in seconds) the CoAP client will wait for a response after sending a request to the Golioth Cloud.
I was able to reproduce this behavior a couple different times on a modified version of the Thingy:91 Example repo where I removed the LightDB State & RPC functionality and changed the prj.conf for low power (here are the changes I made)
Here’s the specific behavior I’m seeing:
After the device finishes an OTA update, settings are initially synchronized successfully with the device.
The device transmits sensor data to LightDB Stream successfully on an interval defined by LOOP_DELAY_S (I was using 60s).
If I change the LOOP_DELAY_S to something else (e.g. 65), the new setting is never synchronized and the console shows “Out of sync” indefinitely.
I uploaded the zephyr.signed.bin as a new package v99.99.99 and created a new deployment in the cohort
The device OTA upgraded successfully to v99.99.99 firmware and settings sync’d correctly
I waited for a couple minutes to verify the device was sending Stream data correctly
I changed the LOOP_DELAY_S from 60 to 65
The settings remain “Out of sync” indefinitely even though the device is sending Stream data every 60s.
It’s as if the Settings service gets “stuck” and the device can no longer receive new settings.
However, while I’ve seen this same behavior multiple times now, I can’t figure out how to reproduce it 100% of the time on demand… sometimes I repeat the whole process above (or even just reboot the device) and the settings just sync correctly without any changes to the firmware image.
Can you confirm something for me?
If I have an app that is only using the Stream, Settings, and OTA services, and has the following config:
Assuming the app is sending stream data more frequently than CONFIG_GOLIOTH_COAP_CLIENT_RX_TIMEOUT_SEC, e.g. every 60s, then the device should only check for new settings and OTA updates each time the stream data is sent to Golioth? Is that a true statement?
@cdwilson I can reproduce the described behavior with GOLIOTH_COAP_KEEPALIVE_INTERVAL_S = 0 (as the blog post suggests). Stream uploads continue to work in that state, but Settings/OTA pushes go missing. The core issue is that pushes require a live server→device path and an active Observation. With keepalive off, the NAT/CGN downlink mapping times out between publishes. When the server sends a confirmable notification, the device often never sees it, doesn’t ACK, and the server eventually cancels the Observation. Outbound Stream still works (client traffic reopens the path), but you won’t get Settings/OTA again until the client reconnects or re-subscribes.
Connection ID helps the DTLS association survive IP/port changes, but it’s not a keepalive—it doesn’t keep the downlink open or maintain the Observation.
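To make the timing argument above concrete, here is a minimal sketch of a toy model, not Golioth code. It assumes the NAT/CGN mapping stays open for a fixed time after the device's last uplink packet; the 30 s timeout used in the usage comment and all function names are hypothetical, and real carriers vary. The model ignores CoAP retransmissions and the server cancelling the Observation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (assumption): the NAT/CGN mapping stays open for
 * nat_timeout_s seconds after the last uplink packet from the device. */
bool downlink_open(uint32_t now_s, uint32_t last_uplink_s, uint32_t nat_timeout_s)
{
    return (now_s - last_uplink_s) < nat_timeout_s;
}

/* Time of the most recent uplink at or before now_s, when the device sends
 * one packet every uplink_period_s seconds starting at t = 0.
 * A period of 0 means no periodic uplink after the first packet at t = 0. */
uint32_t last_uplink_before(uint32_t now_s, uint32_t uplink_period_s)
{
    if (uplink_period_s == 0) {
        return 0;
    }
    return (now_s / uplink_period_s) * uplink_period_s;
}

/* True if a server notification sent at notify_s would reach the device. */
bool notification_delivered(uint32_t notify_s, uint32_t uplink_period_s,
                            uint32_t nat_timeout_s)
{
    uint32_t last_s = last_uplink_before(notify_s, uplink_period_s);
    return downlink_open(notify_s, last_s, nat_timeout_s);
}

/* Usage, with an assumed 30 s NAT timeout and a push at t = 100 s:
 *   notification_delivered(100, 60, 30)  -> false (Stream every 60 s only)
 *   notification_delivered(100, 9, 30)   -> true  (keepalive every 9 s)
 */
```

In this model, a push at t = 100 s is lost when the only uplink traffic is a Stream publish every 60 s, because the last uplink was 40 s earlier. With a 9 s keepalive the last uplink was 1 s earlier, so the mapping is still open.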
We suggest you:
Enable a small keepalive (9–60 s). This keeps the NAT/CGN downlink mapping open so notifications arrive in time to be ACKed, and the Observation stays alive.
Keep RX timeout sane (30–120 s). If the downlink path dies, the client notices sooner, reconnects, and re-establishes Settings/OTA observations.
We’re planning to introduce a polling option for observations, which I believe would be helpful in this case. This should be available relatively soon for OTA, and in the medium term we’ll extend it to all observation types.
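As a rough illustration, the two suggestions above might look like this in prj.conf. The values are just examples picked from within the suggested ranges, not tested recommendations, and the sketch assumes the keepalive interval should be shorter than the RX timeout so that keepalive traffic arrives before the RX timer expires.

```conf
# Example values only, chosen from the ranges suggested above
CONFIG_GOLIOTH_COAP_KEEPALIVE_INTERVAL_S=30
CONFIG_GOLIOTH_COAP_CLIENT_RX_TIMEOUT_SEC=90
```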
I originally had a small RX timeout set, but in my app the delay between outbound stream messages is set at runtime through the Settings service. This delay could be 60s or it could be 24hr. If the stream delay is 24hr, but the client’s RX timeout is 30s, the device will wake up every 30s to check in with the server and battery life suffers. To avoid this, I set the RX timeout to be greater than the max allowed stream delay value (24hr), so the client doesn’t wake up early because of the RX timeout.
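To put rough numbers on that trade-off, here is a minimal sketch. It assumes the device wakes at whichever of the two periods is shorter, which is a simplification of the real client behavior; the function name is hypothetical.

```c
#include <stdint.h>

#define SECONDS_PER_DAY 86400u

/* Simplified model (assumption): the shorter of the stream delay and the
 * RX timeout determines how often the device has to wake up. */
uint32_t wakeups_per_day(uint32_t stream_delay_s, uint32_t rx_timeout_s)
{
    uint32_t period_s = (rx_timeout_s < stream_delay_s) ? rx_timeout_s
                                                        : stream_delay_s;
    if (period_s == 0) {
        return 0; /* undefined period; avoid dividing by zero */
    }
    return SECONDS_PER_DAY / period_s;
}

/* Usage:
 *   wakeups_per_day(86400, 30)    -> 2880 (24hr stream delay, 30s RX timeout)
 *   wakeups_per_day(86400, 90000) -> 1    (RX timeout longer than stream delay)
 */
```

Under this model, a 30s RX timeout turns one wake-up per day into 2880.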
That would be helpful in my case.
Alternatively, I looked to see if there was a way to dynamically set the CoAP client RX timeout at runtime, but it appears to only be configurable at compile time via the CONFIG_GOLIOTH_COAP_CLIENT_RX_TIMEOUT_SEC Kconfig setting.
Would it be possible to make this RX timeout configurable at runtime? (similar to the way that I can set a default LTE RPTAU period via CONFIG_LTE_PSM_REQ_RPTAU_SECONDS but also change it at runtime via lte_lc_psm_param_set() LC API function)
If the RX timeout was configurable at runtime, I could automatically change it to a reasonable value whenever the user updates the stream delay value.
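A minimal sketch of how that derivation could look, assuming a runtime setter existed. As far as I can tell it does not exist today, so this only computes the value. The function name, the 30s margin, and the 30s floor are all hypothetical.

```c
#include <stdint.h>

#define RX_TIMEOUT_MIN_S    30u /* assumed floor for the RX timeout */
#define RX_TIMEOUT_MARGIN_S 30u /* assumed slack on top of the stream delay */

/* Pick an RX timeout slightly longer than the stream delay, so that the
 * RX timeout never wakes the device before the next stream publish. */
uint32_t rx_timeout_for_stream_delay(uint32_t stream_delay_s)
{
    uint32_t timeout_s;

    if (stream_delay_s > UINT32_MAX - RX_TIMEOUT_MARGIN_S) {
        timeout_s = UINT32_MAX; /* saturate instead of overflowing */
    } else {
        timeout_s = stream_delay_s + RX_TIMEOUT_MARGIN_S;
    }

    if (timeout_s < RX_TIMEOUT_MIN_S) {
        timeout_s = RX_TIMEOUT_MIN_S;
    }
    return timeout_s;
}

/* Usage:
 *   rx_timeout_for_stream_delay(60)    -> 90
 *   rx_timeout_for_stream_delay(86400) -> 86430
 */
```

The settings callback for the stream delay would call this and pass the result to whatever runtime setter the SDK might expose.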
BTW, I think I followed most of this explanation, but there is some terminology in here that was initially not familiar to me (e.g. “Observation”, “confirmable notification”, etc). Is there any documentation or blog post you can point me to that describes how the Golioth client/server interaction is intended to work at a high-level for each of these services? If not, I would find that documentation super helpful to have. When I first saw this behavior, I wasn’t sure what was supposed to happen, so it was hard to determine if what I was seeing was intended/correct behavior, a bug, or me using something in the wrong way.
When I enable the keepalive (GOLIOTH_COAP_KEEPALIVE_INTERVAL_S), the client gets updates from the server (as expected, since it’s keeping the session active). However, to conserve battery life, I’d like to avoid waking up frequently to send keepalive requests.
Let’s say that I change a setting in the Golioth web console while my device is currently sleeping. If the device wakes up to send data to the stream service, doesn’t the client need to reconnect in order to send the stream request? I’m still a bit confused as to why the settings/OTA aren’t received when this connection is opened by the device.
You mentioned that the “server eventually cancels the Observation”. Does this mean that even though the device eventually reconnects to send the stream data, the server won’t push updates to the settings/OTA because the observations have been cancelled from the server’s perspective? Is there a way to extend the timeout on the server such that it won’t cancel the observations before the device has a chance to reconnect? Or is there something I can call on the device SDK side to recover/reinitialize the observations if the server has cancelled them?
I guess I’m trying to figure out if there is a hard limit (on the server or elsewhere) to how long a device can sleep while still keeping the ability to receive settings/OTA updates when the device eventually wakes up and sends data to the stream service.