@sebastian,
The firmware team wasn’t able to reproduce the issue you’re seeing with firmware update state reporting using the fw_update
example. When stopping the client mid-update, they consistently saw messages indicating that the block download resumes. Once the client started again, it picked up from where it left off.
*** Booting My Application v1.2.3-81bb59b876ba ***
*** Using nRF Connect SDK v2.8.0-a2386bfc8401 ***
*** Using Zephyr OS v3.7.99-0bc3393fb112 ***
*** Golioth Firmware SDK v0.17.0-30-gf6d2c1d74ed0 ***
[00:00:00.510,467] <dbg> fw_update_sample: main: Start FW Update sample
[00:00:00.510,498] <inf> golioth_samples: Bringing up network interface
[00:00:00.510,498] <inf> golioth_samples: Waiting to obtain IP address
[00:00:01.288,330] <inf> lte_monitor: Network: Searching
[00:00:03.367,431] <inf> lte_monitor: Network: Registered (roaming)
[00:00:03.368,164] <inf> golioth_mbox: Mbox created, bufsize: 1232, num_items: 10, item_size: 112
[00:00:03.368,865] <inf> golioth_fw_update: Current firmware version: main - 1.2.3
[00:00:03.370,483] <inf> golioth_fw_update: State = Idle
[00:00:04.595,031] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:00:04.595,306] <inf> fw_update_sample: Golioth client connected
[00:00:04.595,336] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:00:04.945,617] <inf> golioth_fw_update: Waiting to receive OTA manifest
[00:00:04.945,617] <inf> golioth_fw_update: State = Idle
[00:00:05.284,027] <inf> golioth_fw_update: Received OTA manifest
[00:00:05.284,088] <inf> golioth_fw_update: Current version = 1.2.3, Target version = 5.5.5
[00:00:05.284,118] <inf> golioth_fw_update: State = Downloading
[00:00:06.359,893] <inf> golioth_fw_update: Received block 0/351
[00:00:06.359,985] <inf> mcuboot_util: Image index: 0, Swap type: none
[00:00:06.360,015] <inf> golioth_fw_zephyr: swap type: none
[00:00:07.062,805] <inf> golioth_fw_update: Received block 1/351
[00:00:07.576,263] <inf> golioth_fw_update: Received block 2/351
[00:00:08.136,169] <inf> golioth_fw_update: Received block 3/351
[00:00:08.700,347] <inf> golioth_fw_update: Received block 4/351
[00:00:09.364,318] <inf> golioth_fw_update: Received block 5/351
[00:00:09.926,330] <inf> golioth_fw_update: Received block 6/351
[00:00:10.536,285] <inf> golioth_fw_update: Received block 7/351
[00:00:11.056,243] <inf> golioth_fw_update: Received block 8/351
[00:00:11.797,454] <inf> golioth_fw_update: Received block 9/351
[00:00:12.392,364] <inf> golioth_fw_update: Received block 10/351
[00:00:12.942,474] <inf> golioth_fw_update: Received block 11/351
[00:00:13.486,663] <inf> golioth_fw_update: Received block 12/351
[00:00:14.117,462] <inf> golioth_fw_update: Received block 13/351
[00:00:14.595,336] <wrn> fw_update_sample: Stopping client
[00:00:14.595,367] <inf> golioth_coap_client_zephyr: Attempting to stop client
[00:00:14.595,520] <inf> golioth_coap_client_zephyr: Stop request
[00:00:14.595,520] <inf> golioth_coap_client_zephyr: Ending session
[00:00:14.595,550] <inf> fw_update_sample: Golioth client disconnected
[00:00:14.596,069] <inf> golioth_fw_update: Block download failed at block idx: 14; status: GOLIOTH_ERR_FAIL; resuming
uart:~$
uart:~$
[00:00:25.696,380] <wrn> fw_update_sample: Starting client
[00:00:26.913,360] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:00:26.913,421] <inf> fw_update_sample: Golioth client connected
[00:00:26.913,513] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:00:43.445,495] <inf> golioth_fw_update: Received block 14/351
[00:00:44.023,590] <inf> golioth_fw_update: Received block 15/351
[00:00:44.633,544] <inf> golioth_fw_update: Received block 16/351
[00:00:45.288,330] <inf> golioth_fw_update: Received block 17/351
That said, if the client is stopped while the state reporting function is already in its backoff cycle, those retries will continue to trigger in the background. The state message will try to resend up to five times before timing out.
It’s also worth reiterating that the SDK’s fw_update
module is a reference implementation. It’s designed to demonstrate OTA functionality, not to handle power management or sleep-heavy applications out of the box. Users are free to implement their own logic on top of the core OTA APIs.
Separately, I tested stopping and starting the Golioth client using the Reference Design Template — adding golioth_client_stop
and golioth_client_start
calls inside the main loop (once every 5 passes). I didn’t see any issues like the ones you’re describing.
[00:00:00.519,897] <inf> golioth_rd_template: Modem firmware version: mfw_nrf9160_1.3.4
[00:00:00.519,958] <inf> golioth_rd_template: Connecting to LTE, this may take some time...
[00:00:00.552,642] <dbg> app_sensors: app_sensors_read_and_stream: No connection available, skipping streaming counter: 0
[00:00:04.630,218] <inf> lte_monitor: Network: Searching
[00:00:06.419,403] <inf> lte_monitor: Network: Registered (home)
[00:00:06.419,982] <inf> golioth_mbox: Mbox created, bufsize: 1232, num_items: 10, item_size: 112
[00:00:06.420,776] <inf> golioth_fw_update: Current firmware version: main - 2.6.0
[00:00:06.423,248] <inf> golioth_fw_update: State = Idle
[00:00:07.663,116] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:00:07.663,421] <inf> golioth_rd_template: Golioth client connected
[00:00:07.663,421] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:00:07.913,146] <dbg> app_state: app_state_desired_handler: desired
7b 22 65 78 61 6d 70 6c 65 5f 69 6e 74 30 22 3a |{"exampl e_int0":
2d 31 2c 22 65 78 61 6d 70 6c 65 5f 69 6e 74 31 |-1,"exam ple_int1
22 3a 2d 31 7d |":-1}
[00:00:07.913,238] <dbg> app_state: app_state_desired_handler: No change requested for example_int0
[00:00:07.913,238] <dbg> app_state: app_state_desired_handler: No change requested for example_int1
[00:00:07.980,804] <dbg> app_state: async_handler: State successfully set
[00:00:08.063,201] <inf> app_settings: Set loop delay to 10 seconds
[00:00:08.063,232] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 1
[00:00:08.112,213] <inf> app_settings: Set loop delay to 10 seconds
[00:00:08.112,274] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 2
[00:00:08.216,003] <inf> golioth_rpc: RPC observation established
[00:00:08.401,092] <inf> golioth_fw_update: Waiting to receive OTA manifest
[00:00:08.401,092] <inf> golioth_fw_update: State = Idle
[00:00:09.521,148] <inf> golioth_fw_update: Received OTA manifest
[00:00:09.521,209] <inf> golioth_fw_update: Manifest does not contain target component: main
[00:00:09.521,240] <inf> golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
[00:00:09.521,240] <inf> golioth_fw_update: State = Idle
[00:00:18.112,487] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 3
[00:00:28.112,731] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 4
[00:00:38.112,976] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 5
[00:00:48.113,189] <inf> golioth_coap_client_zephyr: Attempting to stop client
[00:00:48.113,342] <inf> golioth_coap_client_zephyr: Stop request
[00:00:48.113,342] <inf> golioth_coap_client_zephyr: Ending session
[00:00:48.113,372] <inf> golioth_rd_template: Golioth client disconnected
[00:00:49.213,989] <dbg> app_sensors: app_sensors_read_and_stream: No connection available, skipping streaming counter: 6
[00:00:59.214,080] <dbg> app_sensors: app_sensors_read_and_stream: No connection available, skipping streaming counter: 7
[00:01:09.214,202] <dbg> app_sensors: app_sensors_read_and_stream: No connection available, skipping streaming counter: 8
[00:01:19.214,324] <dbg> app_sensors: app_sensors_read_and_stream: No connection available, skipping streaming counter: 9
[00:01:29.214,447] <dbg> app_sensors: app_sensors_read_and_stream: No connection available, skipping streaming counter: 10
[00:01:39.214,599] <dbg> app_sensors: app_sensors_read_and_stream: No connection available, skipping streaming counter: 11
[00:01:48.691,772] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:01:48.691,802] <inf> golioth_rd_template: Golioth client connected
[00:01:48.692,138] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:01:48.945,739] <dbg> app_state: app_state_desired_handler: desired
7b 22 65 78 61 6d 70 6c 65 5f 69 6e 74 30 22 3a |{"exampl e_int0":
2d 31 2c 22 65 78 61 6d 70 6c 65 5f 69 6e 74 31 |-1,"exam ple_int1
22 3a 2d 31 7d |":-1}
[00:01:48.945,800] <dbg> app_state: app_state_desired_handler: No change requested for example_int0
[00:01:48.945,831] <dbg> app_state: app_state_desired_handler: No change requested for example_int1
[00:01:48.994,842] <inf> app_settings: Set loop delay to 10 seconds
[00:01:48.994,873] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 12
[00:01:49.098,907] <inf> golioth_rpc: RPC observation established
[00:01:49.129,699] <inf> golioth_fw_update: Received OTA manifest
[00:01:49.129,760] <inf> golioth_fw_update: Manifest does not contain target component: main
[00:01:49.129,760] <inf> golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
[00:01:49.129,760] <inf> golioth_fw_update: State = Idle
[00:01:58.995,117] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 13
[00:02:08.995,361] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 14
Getting back to the matter at hand, from your logs, it looks like the client is being stopped while operations like status reporting and settings synchronization are still in progress:
stop_client_if_update_complete(client);
pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);
This is backed up by the logs:
[00:00:21.529,327] <inf> golioth_coap_client_zephyr: Stop request
...
golioth_fw_update: Failed to report fw status: 1
golioth_settings: Settings callback received status error: 1
...
Client not running, dropping set request for path .u/c/main
golioth_fw_update: Failed to report fw status: 12
These show that the client was stopped, and then async operations continued to try and run — resulting in failures and retries.
To avoid this, the client should only be stopped once all pending requests have been processed. You can do this using the API call:
uint32_t golioth_client_num_items_in_request_queue(struct golioth_client *client);
Here’s a pattern you can test with to wait for the request queue to flush before stopping the client, while avoiding infinite loops:
#define MAX_WAIT_MS 5000
#define POLL_INTERVAL_MS 200
uint32_t waited = 0;
while (golioth_client_num_items_in_request_queue(client) > 0 && waited < MAX_WAIT_MS) {
LOG_INF("Waiting for Golioth request queue to empty...");
k_sleep(K_MSEC(POLL_INTERVAL_MS));
waited += POLL_INTERVAL_MS;
}
if (waited >= MAX_WAIT_MS) {
LOG_WRN("Timeout waiting for Golioth queue to empty");
}
stop_client_if_update_complete(client);
pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);
This should prevent operations from getting cut off mid-flight and avoid the retry/backoff logs you’re seeing. Be sure to run this through full testing before relying on it.