Handling Firmware Updates with Golioth: Dealing with Golioth Client Stop Interruptions

In a typical application that interacts with the Golioth platform, you might initiate a firmware update using the golioth_fw_update_init() and golioth_fw_update_register_state_change_callback() functions. During the update process, Golioth communicates the state of the update using state change notifications such as GOLIOTH_OTA_STATE_IDLE, GOLIOTH_OTA_STATE_DOWNLOADING, etc.

However, a problem arises when golioth_client_stop() is called while a firmware update is starting or in progress. The call stops the client, which in turn can abort any ongoing operations, including firmware updates. The connection is closed by the stop, but the firmware update keeps trying to run:

[00:00:24.250,274] <inf> golioth_fw_update: State = Idle
[00:00:24.252,838] <inf> golioth_coap_client_zephyr: Attempting to stop client
[00:00:24.252,929] <inf> golioth_coap_client_zephyr: Stop request
[00:00:24.252,929] <inf> golioth_coap_client_zephyr: Ending session
[00:00:24.252,960] <inf> lte_data_thread: Golioth client disconnected
[00:00:24.253,417] <err> golioth_fw_update: Failed to report fw status: 1; retry in 5s
[00:00:29.253,631] <wrn> golioth_coap_client: Client not running, dropping set request for path .u/c/main
[00:00:29.253,631] <err> golioth_fw_update: Failed to report fw status: 12; retry in 10s
[00:00:39.253,875] <wrn> golioth_coap_client: Client not running, dropping set request for path .u/c/main
[00:00:39.253,875] <err> golioth_fw_update: Failed to report fw status: 12; retry in 20s

Example Scenario

In my case:

  1. Initialize the Golioth client.
  2. Start the firmware update.
  3. Send Sync Data Report.
  4. Call golioth_client_stop().

Step 3 failed, and step 2 kept trying to fetch blocks. The stop really stops the client immediately rather than waiting for open connections as documented. In particular, the update seems to start a bit late, or is delayed before entering the DOWNLOADING state: with a sleep between steps 3 and 4, the download really does start.

My current workaround

static bool is_fw_update_in_progress = false;

static void on_fw_update_state_change(enum golioth_ota_state state,
                                      enum golioth_ota_reason reason,
                                      void *user_arg)
{
    switch (state) {
        case GOLIOTH_OTA_STATE_IDLE:
            is_fw_update_in_progress = false; // Update completed, safe to stop client
            break;
        case GOLIOTH_OTA_STATE_DOWNLOADING:
            is_fw_update_in_progress = true;  // Update in progress
            break;
        case GOLIOTH_OTA_STATE_DOWNLOADED:
            is_fw_update_in_progress = false; // Download completed
            break;
        default:
            break;
    }
}

In this case, the on_fw_update_state_change callback keeps track of whether the update is in progress. You can use this flag (is_fw_update_in_progress) to prevent calling golioth_client_stop() until the update has finished.
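
One caveat with a plain bool: the state change callback is likely invoked from a different thread than the code that decides when to stop the client, so the flag is shared between threads. A minimal sketch of the same flag using C11 atomics follows. The helper names fw_update_mark() and fw_update_can_stop_client() are hypothetical, and it assumes <stdatomic.h> is available in the toolchain (Zephyr also ships its own atomic API, which would work equally well).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared between the callback thread and the thread that stops the client.
 * Static storage is zero-initialized, so the flag starts out false. */
static atomic_bool fw_update_in_progress;

/* Hypothetical helper: call from the state change callback. */
static void fw_update_mark(bool in_progress)
{
    atomic_store(&fw_update_in_progress, in_progress);
}

/* Hypothetical helper: call before golioth_client_stop(). */
static bool fw_update_can_stop_client(void)
{
    return !atomic_load(&fw_update_in_progress);
}
```

The callback would call fw_update_mark(true) on DOWNLOADING and fw_update_mark(false) on IDLE, and the sleep path would check fw_update_can_stop_client() before stopping.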

Another solution is to use a semaphore to block the process until the firmware update is finished. The semaphore will allow you to synchronize the client stop operation with the completion of the firmware update.

K_SEM_DEFINE(fw_update_complete, 0, 1); // Semaphore to track the update progress

static void on_fw_update_state_change(enum golioth_ota_state state,
                                      enum golioth_ota_reason reason,
                                      void *user_arg)
{
    switch (state) {
        case GOLIOTH_OTA_STATE_IDLE:
            k_sem_give(&fw_update_complete); // Firmware update completed
            break;
        case GOLIOTH_OTA_STATE_DOWNLOADING:
            break; // Do nothing, waiting for the update to complete
        default:
            break;
    }
}

// Wait for update completion before stopping the client
k_sem_take(&fw_update_complete, K_FOREVER);
golioth_client_stop(client);

However, the first reported state is Idle, so the semaphore is given before the real download has started.
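
One way around the early Idle report would be to signal only when the state falls back to Idle after a non-idle state has been seen. The sketch below is self-contained: enum ota_state is a local stand-in for the SDK's enum golioth_ota_state, and struct ota_tracker and ota_tracker_on_state() are hypothetical names, not part of the Golioth SDK.

```c
#include <stdbool.h>

/* Local stand-in for enum golioth_ota_state so the sketch compiles alone. */
enum ota_state {
    OTA_STATE_IDLE,
    OTA_STATE_DOWNLOADING,
    OTA_STATE_DOWNLOADED,
    OTA_STATE_UPDATING,
};

struct ota_tracker {
    bool active;     /* true once any non-idle state has been reported */
    int completions; /* number of times an active phase ended in Idle */
};

/* Returns true only for an Idle report that follows an active phase. */
static bool ota_tracker_on_state(struct ota_tracker *t, enum ota_state state)
{
    if (state != OTA_STATE_IDLE) {
        t->active = true;
        return false;
    }
    if (!t->active) {
        return false; /* initial Idle reports: nothing has run yet */
    }
    t->active = false;
    t->completions++;
    return true; /* this is where k_sem_give() would go */
}
```

In the callback, k_sem_give(&fw_update_complete) would be called only when ota_tracker_on_state() returns true. This still does not cover the case where no update is available at all, because the semaphore is then never given, so the wait would need a timeout instead of K_FOREVER.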

Delaying the client stop until the update completes is the worst option, I guess.

Would it not be better to have this logic in Golioth itself, so that the stop really waits for all ongoing activity or, better, forces it to finish?

K_SEM_DEFINE(fw_update_complete, 0, 1); 
static bool _is_update_in_progress = false;
static void on_fw_update_state_change(enum golioth_ota_state state,
                                      enum golioth_ota_reason reason,
                                      void *user_arg)
{

    switch (state) {
        case GOLIOTH_OTA_STATE_IDLE:
            _is_update_in_progress = false; // No update in progress
            LOG_INF("Firmware update is idle.");
            break;
        case GOLIOTH_OTA_STATE_DOWNLOADING:
            _is_update_in_progress = true; // Update started
            LOG_INF("Firmware update downloading...");
            break;
        case GOLIOTH_OTA_STATE_DOWNLOADED:
            _is_update_in_progress = true; // Update downloaded, ready to apply
            LOG_INF("Firmware update downloaded.");
            k_sem_give(&fw_update_complete); // Release semaphore
            break;
        case GOLIOTH_OTA_STATE_UPDATING:
            _is_update_in_progress = true; // Firmware is being applied
            LOG_INF("Firmware is being applied...");
            break;
        default:
            _is_update_in_progress = false;
            break;
    }
}

static void stop_client_if_update_complete(struct golioth_client *client) {
    if (!_is_update_in_progress) {
        // No update is in progress, skip waiting for the semaphore
        LOG_INF("No firmware update in progress, stopping the client.");
        golioth_client_stop(client);
        return;
    }

    // If update is in progress, wait for the semaphore
    LOG_INF("Waiting for firmware update to complete...");
    k_sem_take(&fw_update_complete, K_FOREVER);

    // Now it's safe to stop the client
    LOG_INF("Firmware update complete, stopping the client.");
    golioth_client_stop(client);
}

Later in the code:

golioth_fw_update_init(client, _current_version);
golioth_fw_update_register_state_change_callback(on_fw_update_state_change, NULL);
k_sem_take(&fw_update_complete, K_FOREVER);
// go to sleep
stop_client_if_update_complete(client);
pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);

My output

[00:00:22.758,605] <err> lte_data_thread: golioth_stream_set_sync failed: 9
[00:00:22.758,636] <err> lte_data_thread: Failed to send status payload
[00:00:22.758,697] <inf> golioth_fw_update: Current firmware version: main - 0.0.3
[00:00:22.759,552] <inf> golioth_fw_update: State = Idle
[00:00:22.784,912] <inf> app_golioth: Setting on_send_adc_data_setting to 0 s
[00:00:22.785,003] <inf> app_golioth: Setting SEND_ADC_DATA_ON_NEXT_ONLINE to false
[00:00:22.785,034] <inf> app_golioth: Setting on_send_adc_record_setting to 3600 s
[00:00:23.089,447] <wrn> golioth_coap_client: 1 resends in last 10 seconds
[00:00:23.457,489] <inf> app_golioth: Setting on_send_adc_data_setting to 0 s
[00:00:23.457,550] <inf> app_golioth: Setting SEND_ADC_DATA_ON_NEXT_ONLINE to false
[00:00:23.457,611] <inf> app_golioth: Setting on_send_adc_record_setting to 3600 s
[00:00:23.567,382] <inf> app_golioth: Setting on_send_adc_data_setting to 0 s
[00:00:23.567,474] <inf> app_golioth: Setting SEND_ADC_DATA_ON_NEXT_ONLINE to false
[00:00:23.567,504] <inf> app_golioth: Setting on_send_adc_record_setting to 3600 s
[00:00:24.943,603] <inf> golioth_fw_update: Waiting to receive OTA manifest
[00:00:24.943,634] <inf> golioth_fw_update: State = Idle
[00:00:24.943,634] <inf> lte_data_thread: Firmware update is idle.
[00:00:26.863,677] <inf> golioth_fw_update: Received OTA manifest
[00:00:26.863,769] <inf> golioth_fw_update: Current version = 0.0.3, Target version = 0.0.4
[00:00:26.863,800] <inf> golioth_fw_update: State = Downloading
[00:00:26.863,800] <inf> lte_data_thread: Firmware update downloading...
[00:00:29.293,945] <inf> golioth_fw_update: Received block 0/411
[00:00:29.294,036] <inf> mcuboot_util: Image index: 0, Swap type: none
[00:00:29.294,067] <inf> golioth_fw_zephyr: swap type: none
[00:00:30.407,623] <inf> golioth_fw_update: Received block 1/411
[00:00:31.400,634] <inf> golioth_fw_update: Received block 2/411
[00:00:32.407,775] <inf> golioth_fw_update: Received block 3/411
[00:00:33.414,550] <inf> golioth_fw_update: Received block 4/411
[00:00:33.514,678] <wrn> golioth_coap_client: 3 resends in last 10 seconds
[00:00:34.695,831] <inf> golioth_fw_update: Received block 5/411
[00:00:35.527,709] <inf> golioth_fw_update: Received block 6/411
[00:00:36.534,820] <inf> golioth_fw_update: Received block 7/411
[00:00:37.526,580] <inf> golioth_fw_update: Received block 8/411
[00:00:38.487,640] <inf> golioth_fw_update: Received block 9/411
[00:00:39.371,551] <inf> golioth_fw_update: Received block 10/411
[00:00:40.341,583] <inf> golioth_fw_update: Received block 11/411
[00:00:41.335,021] <inf> golioth_fw_update: Received block 12/411
[00:00:42.327,758] <inf> golioth_fw_update: Received block 13/411
[00:00:43.207,733] <inf> golioth_fw_update: Received block 14/411
[00:00:44.182,678] <inf> golioth_fw_update: Received block 15/411
[00:00:45.174,621] <inf> golioth_fw_update: Received block 16/411
[00:00:46.246,887] <inf> golioth_fw_update: Received block 17/411
[00:00:47.447,692] <inf> golioth_fw_update: Received block 18/411
[00:00:48.326,568] <inf> golioth_fw_update: Received block 19/411
[00:00:49.275,939] <inf> golioth_fw_update: Received block 20/411
[00:00:50.326,568] <inf> golioth_fw_update: Received block 21/411
[00:00:51.287,872] <inf> golioth_fw_update: Received block 22/411

I don't know why the sync call fails, but that is another topic.

Hey @sebastian

Thank you for taking the time to report this and for the detailed explanation.

I see what you’re describing regarding golioth_client_stop() interrupting a firmware update that’s in progress or just about to begin. Your workaround using a state flag and the semaphore approach makes sense, but I understand the limitations you’re pointing out — especially with the update sometimes being delayed in entering the DOWNLOADING state.

I’ll go ahead and forward this to our firmware team so they can take a closer look.

We’ll keep you posted as we dig in further.

Could you help me with that? Why is the swap type none?

[00:00:46.333,160] golioth_fw_update: Current version = 0.0.3, Target version = 0.0.4
[00:00:46.333,190] golioth_fw_update: State = Downloading
[00:00:46.333,190] lte_data_thread: Firmware update state changed to: 1, reason: 0
[00:00:46.333,221] lte_data_thread: Firmware update downloading…
[00:00:48.286,712] app_golioth: golioth_stream_set_sync failed: 9
[00:00:48.286,743] lte_data_thread: Failed to send status payload
[00:00:48.286,743] lte_data_thread: Waiting for firmware update to complete…
[00:00:52.035,980] golioth_fw_update: Received block 0/411
[00:00:52.036,041] mcuboot_util: Image index: 0, Swap type: none
[00:00:52.036,071] golioth_fw_zephyr: swap type: none

Hey @sebastian,

You’re seeing swap type: none most likely because the firmware image has not finished downloading and hasn’t been marked for an upgrade yet.

Maybe there is also a bug. I check whether anything is pending and then stop the client, and afterwards the settings and fw_update modules start sending requests.

[00:00:21.529,205] <inf> lte_data_thread: Number of items in request queue: 0
[00:00:21.529,235] <inf> golioth_coap_client_zephyr: Attempting to stop client
[00:00:21.529,327] <inf> golioth_coap_client_zephyr: Stop request
[00:00:21.529,357] <inf> golioth_coap_client_zephyr: Ending session
[00:00:21.529,388] <inf> lte_data_thread: Golioth client disconnected
[00:00:30.030,059] <err> golioth_fw_update: Failed to report fw status: 1; retry in 5s
[00:00:30.030,273] <err> golioth_settings: Settings callback received status error: 1  
[00:00:30.030,395] <err> golioth_settings: Settings callback received status error: 1  
[00:00:30.030,487] <err> golioth_settings: Settings callback received status error: 1  
[00:00:35.030,273] <wrn> golioth_coap_client: Client not running, dropping set request for path .u/c/main
[00:00:35.030,303] <err> golioth_fw_update: Failed to report fw status: 12; retry in 10s
[00:00:45.030,517] <wrn> golioth_coap_client: Client not running, dropping set request for path .u/c/main
[00:00:45.030,548] <err> golioth_fw_update: Failed to report fw status: 12; retry in 20s

Could you provide more context for this post?
Are you calling fw_update_init() and golioth_settings_register() while the Golioth client is not running?

    struct golioth_client *client = golioth_client_create(&my_client_config);
    golioth_client_register_event_callback(client, on_client_event, NULL);
    golioth_fw_update_init(client, _current_version);
    golioth_fw_update_register_state_change_callback(on_fw_update_state_change, NULL);
    app_settings_register(client); // init and register my settings

    if (!wait_for_connection()) 
    {
        LOG_ERR("Failed to connect to Golioth");
        golioth_client_stop(client);
        return; // Exit the thread if connection fails multiple times
    }
    print_cellular_info(modem);

    // send status data
    int err = send_status_payload(modem, client, STATUS_FIRST_BOOT, 0);
    if (err != 0)
    {
        LOG_ERR("Failed to send status payload");
    }

    //  go to sleep
    stop_client_if_update_complete(client); // the stop happens here, and the requests happen after it
    pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
    pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);

So in general, I do everything, and there is no update, so I stop the client. After that, the requests from the log above happen.

@sebastian,

The firmware team wasn’t able to reproduce the issue you’re seeing with firmware update state reporting using the fw_update example. When stopping the client mid-update, they consistently saw messages indicating that the block download resumes. Once the client started again, it picked up from where it left off.

*** Booting My Application v1.2.3-81bb59b876ba ***
*** Using nRF Connect SDK v2.8.0-a2386bfc8401 ***
*** Using Zephyr OS v3.7.99-0bc3393fb112 ***
*** Golioth Firmware SDK v0.17.0-30-gf6d2c1d74ed0 ***
[00:00:00.510,467] <dbg> fw_update_sample: main: Start FW Update sample
[00:00:00.510,498] <inf> golioth_samples: Bringing up network interface
[00:00:00.510,498] <inf> golioth_samples: Waiting to obtain IP address
[00:00:01.288,330] <inf> lte_monitor: Network: Searching
[00:00:03.367,431] <inf> lte_monitor: Network: Registered (roaming)
[00:00:03.368,164] <inf> golioth_mbox: Mbox created, bufsize: 1232, num_items: 10, item_size: 112
[00:00:03.368,865] <inf> golioth_fw_update: Current firmware version: main - 1.2.3
[00:00:03.370,483] <inf> golioth_fw_update: State = Idle
[00:00:04.595,031] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:00:04.595,306] <inf> fw_update_sample: Golioth client connected
[00:00:04.595,336] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:00:04.945,617] <inf> golioth_fw_update: Waiting to receive OTA manifest
[00:00:04.945,617] <inf> golioth_fw_update: State = Idle
[00:00:05.284,027] <inf> golioth_fw_update: Received OTA manifest
[00:00:05.284,088] <inf> golioth_fw_update: Current version = 1.2.3, Target version = 5.5.5
[00:00:05.284,118] <inf> golioth_fw_update: State = Downloading
[00:00:06.359,893] <inf> golioth_fw_update: Received block 0/351
[00:00:06.359,985] <inf> mcuboot_util: Image index: 0, Swap type: none
[00:00:06.360,015] <inf> golioth_fw_zephyr: swap type: none
[00:00:07.062,805] <inf> golioth_fw_update: Received block 1/351
[00:00:07.576,263] <inf> golioth_fw_update: Received block 2/351
[00:00:08.136,169] <inf> golioth_fw_update: Received block 3/351
[00:00:08.700,347] <inf> golioth_fw_update: Received block 4/351
[00:00:09.364,318] <inf> golioth_fw_update: Received block 5/351
[00:00:09.926,330] <inf> golioth_fw_update: Received block 6/351
[00:00:10.536,285] <inf> golioth_fw_update: Received block 7/351
[00:00:11.056,243] <inf> golioth_fw_update: Received block 8/351
[00:00:11.797,454] <inf> golioth_fw_update: Received block 9/351
[00:00:12.392,364] <inf> golioth_fw_update: Received block 10/351
[00:00:12.942,474] <inf> golioth_fw_update: Received block 11/351
[00:00:13.486,663] <inf> golioth_fw_update: Received block 12/351
[00:00:14.117,462] <inf> golioth_fw_update: Received block 13/351
[00:00:14.595,336] <wrn> fw_update_sample: Stopping client
[00:00:14.595,367] <inf> golioth_coap_client_zephyr: Attempting to stop client
[00:00:14.595,520] <inf> golioth_coap_client_zephyr: Stop request
[00:00:14.595,520] <inf> golioth_coap_client_zephyr: Ending session
[00:00:14.595,550] <inf> fw_update_sample: Golioth client disconnected
[00:00:14.596,069] <inf> golioth_fw_update: Block download failed at block idx: 14; status: GOLIOTH_ERR_FAIL; resuming
uart:~$
uart:~$
[00:00:25.696,380] <wrn> fw_update_sample: Starting client
[00:00:26.913,360] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:00:26.913,421] <inf> fw_update_sample: Golioth client connected
[00:00:26.913,513] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:00:43.445,495] <inf> golioth_fw_update: Received block 14/351
[00:00:44.023,590] <inf> golioth_fw_update: Received block 15/351
[00:00:44.633,544] <inf> golioth_fw_update: Received block 16/351
[00:00:45.288,330] <inf> golioth_fw_update: Received block 17/351

That said, if the client is stopped while the state reporting function is already in its backoff cycle, those retries will continue to trigger in the background. The state message will try to resend up to five times before timing out.
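
To get a feel for how long those background retries can linger, here is a small illustrative sketch. It assumes the doubling visible in the logs above (5 s, 10 s, 20 s) simply continues for five delays. The constants and function names are hypothetical and the SDK's actual schedule may differ, so treat the numbers as a rough estimate only.

```c
#include <stdint.h>

#define INITIAL_BACKOFF_S 5u /* first retry delay seen in the logs */
#define MAX_ATTEMPTS 5u      /* assumption: five delays in total */

/* Delay in seconds before retry number `attempt` (0-based), doubling each time. */
static uint32_t backoff_delay_s(uint32_t attempt)
{
    return INITIAL_BACKOFF_S << attempt;
}

/* Worst-case time spent in backoff if every attempt fails. */
static uint32_t backoff_total_s(void)
{
    uint32_t total = 0;

    for (uint32_t i = 0; i < MAX_ATTEMPTS; i++) {
        total += backoff_delay_s(i);
    }
    return total;
}
```

Under these assumptions, backoff_total_s() comes to 5 + 10 + 20 + 40 + 80 = 155 seconds, which is worth keeping in mind when deciding how long a device should stay awake after stopping the client.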

It’s also worth reiterating that the SDK’s fw_update module is a reference implementation. It’s designed to demonstrate OTA functionality, not to handle power management or sleep-heavy applications out of the box. Users are free to implement their own logic on top of the core OTA APIs.

Separately, I tested stopping and starting the Golioth client using the Reference Design Template — adding golioth_client_stop and golioth_client_start calls inside the main loop (once every 5 passes). I didn’t see any issues like the ones you’re describing.

[00:00:00.519,897] <inf> golioth_rd_template: Modem firmware version: mfw_nrf9160_1.3.4
[00:00:00.519,958] <inf> golioth_rd_template: Connecting to LTE, this may take some time...
[00:00:00.552,642] <dbg> app_sensors: app_sensors_read_and_stream: No connection available, skipping streaming counter: 0
[00:00:04.630,218] <inf> lte_monitor: Network: Searching
[00:00:06.419,403] <inf> lte_monitor: Network: Registered (home)
[00:00:06.419,982] <inf> golioth_mbox: Mbox created, bufsize: 1232, num_items: 10, item_size: 112
[00:00:06.420,776] <inf> golioth_fw_update: Current firmware version: main - 2.6.0
[00:00:06.423,248] <inf> golioth_fw_update: State = Idle
[00:00:07.663,116] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:00:07.663,421] <inf> golioth_rd_template: Golioth client connected
[00:00:07.663,421] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:00:07.913,146] <dbg> app_state: app_state_desired_handler: desired
                                    7b 22 65 78 61 6d 70 6c  65 5f 69 6e 74 30 22 3a |{"exampl e_int0":
                                    2d 31 2c 22 65 78 61 6d  70 6c 65 5f 69 6e 74 31 |-1,"exam ple_int1
                                    22 3a 2d 31 7d                                   |":-1}
[00:00:07.913,238] <dbg> app_state: app_state_desired_handler: No change requested for example_int0
[00:00:07.913,238] <dbg> app_state: app_state_desired_handler: No change requested for example_int1
[00:00:07.980,804] <dbg> app_state: async_handler: State successfully set
[00:00:08.063,201] <inf> app_settings: Set loop delay to 10 seconds
[00:00:08.063,232] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 1
[00:00:08.112,213] <inf> app_settings: Set loop delay to 10 seconds
[00:00:08.112,274] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 2
[00:00:08.216,003] <inf> golioth_rpc: RPC observation established
[00:00:08.401,092] <inf> golioth_fw_update: Waiting to receive OTA manifest
[00:00:08.401,092] <inf> golioth_fw_update: State = Idle
[00:00:09.521,148] <inf> golioth_fw_update: Received OTA manifest
[00:00:09.521,209] <inf> golioth_fw_update: Manifest does not contain target component: main
[00:00:09.521,240] <inf> golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
[00:00:09.521,240] <inf> golioth_fw_update: State = Idle
[00:00:18.112,487] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 3
[00:00:28.112,731] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 4
[00:00:38.112,976] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 5
[00:00:48.113,189] <inf> golioth_coap_client_zephyr: Attempting to stop client
[00:00:48.113,342] <inf> golioth_coap_client_zephyr: Stop request
[00:00:48.113,342] <inf> golioth_coap_client_zephyr: Ending session
[00:00:48.113,372] <inf> golioth_rd_template: Golioth client disconnected
[00:00:49.213,989] <dbg> app_sensors: app_sensors_read_and_stream: No connection available, skipping streaming counter: 6
[00:00:59.214,080] <dbg> app_sensors: app_sensors_read_and_stream: No connection available, skipping streaming counter: 7
[00:01:09.214,202] <dbg> app_sensors: app_sensors_read_and_stream: No connection available, skipping streaming counter: 8
[00:01:19.214,324] <dbg> app_sensors: app_sensors_read_and_stream: No connection available, skipping streaming counter: 9
[00:01:29.214,447] <dbg> app_sensors: app_sensors_read_and_stream: No connection available, skipping streaming counter: 10
[00:01:39.214,599] <dbg> app_sensors: app_sensors_read_and_stream: No connection available, skipping streaming counter: 11
[00:01:48.691,772] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:01:48.691,802] <inf> golioth_rd_template: Golioth client connected
[00:01:48.692,138] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:01:48.945,739] <dbg> app_state: app_state_desired_handler: desired
                                    7b 22 65 78 61 6d 70 6c  65 5f 69 6e 74 30 22 3a |{"exampl e_int0":
                                    2d 31 2c 22 65 78 61 6d  70 6c 65 5f 69 6e 74 31 |-1,"exam ple_int1
                                    22 3a 2d 31 7d                                   |":-1}
[00:01:48.945,800] <dbg> app_state: app_state_desired_handler: No change requested for example_int0
[00:01:48.945,831] <dbg> app_state: app_state_desired_handler: No change requested for example_int1
[00:01:48.994,842] <inf> app_settings: Set loop delay to 10 seconds
[00:01:48.994,873] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 12
[00:01:49.098,907] <inf> golioth_rpc: RPC observation established
[00:01:49.129,699] <inf> golioth_fw_update: Received OTA manifest
[00:01:49.129,760] <inf> golioth_fw_update: Manifest does not contain target component: main
[00:01:49.129,760] <inf> golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
[00:01:49.129,760] <inf> golioth_fw_update: State = Idle
[00:01:58.995,117] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 13
[00:02:08.995,361] <dbg> app_sensors: app_sensors_read_and_stream: Streaming counter: 14

Getting back to the matter at hand, from your logs, it looks like the client is being stopped while operations like status reporting and settings synchronization are still in progress:

stop_client_if_update_complete(client);
pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);

This is backed up by the logs:

[00:00:21.529,327] <inf> golioth_coap_client_zephyr: Stop request
...
golioth_fw_update: Failed to report fw status: 1
golioth_settings: Settings callback received status error: 1
...
Client not running, dropping set request for path .u/c/main
golioth_fw_update: Failed to report fw status: 12

These show that the client was stopped, and then async operations continued to try and run — resulting in failures and retries.

To avoid this, the client should only be stopped once all pending requests have been processed. You can do this using the API call:

uint32_t golioth_client_num_items_in_request_queue(struct golioth_client *client);

Here’s a pattern you can test with to wait for the request queue to flush before stopping the client, while avoiding infinite loops:

#define MAX_WAIT_MS 5000
#define POLL_INTERVAL_MS 200

uint32_t waited = 0;
while (golioth_client_num_items_in_request_queue(client) > 0 && waited < MAX_WAIT_MS) {
    LOG_INF("Waiting for Golioth request queue to empty...");
    k_sleep(K_MSEC(POLL_INTERVAL_MS));
    waited += POLL_INTERVAL_MS;
}

if (waited >= MAX_WAIT_MS) {
    LOG_WRN("Timeout waiting for Golioth queue to empty");
}

stop_client_if_update_complete(client);
pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);

This should prevent operations from getting cut off mid-flight and avoid the retry/backoff logs you’re seeing. Be sure to run this through full testing before relying on it.