High data usage for the manifest update

Description

Hello,
I’ve started to use the Golioth SDK with the NRF9161 and I’m impressed with how easy it is to setup something as complex as OTA updates! However, upon reviewing the data usage, I found out that the firmware update routine is consuming approximately 2kB/min of both uplink and downlink data. This would amount to almost 3MB/day, which is unreasonably high considering a provider like 1NCE has 500MB for 10 years.

First of all, is this kind of data usage expected, or do I have something misconfigured? I did turn off the logging and other Golioth functions, only OTA is active. Second, as far as I can tell, the update check runs each 30 seconds now. How can I change this default period? I didn’t find a way to set it through compile configuration (prj.conf) or during execution time.

Thank you for any advice on the topic!

Expected Behavior

I want to decrease the amount of data the device is using, either by fixing the configuration (if there’s anything I missed) or by lowering the update check periodicity.

Actual Behavior

The device is working fine, but is using more data than expected.

Environment

Golioth SDK - latest (0.18.0)
NRF SDK - 2.9.0

Logs and Console Output

No relevant errors, data usage approximately 60kB after 30 minutes

1 Like

@jakub.nagy thank you for posting on the forum! The fw_update thread uses a CoAP observation to determine when a new manifest is available for a device, so in the default configuration it should not be performing operations on a cadence. It does appear as though your device is reporting the state of its current firmware version on a roughly 30 second cadence though. Could you share more information about your application? Are you using one of the Golioth Firmware SDK samples?

@jakub.nagy following up here – the frequent firmware status reports are due to the CoAP client restarting, which causes observations to be re-established and the firmware state to be reported. This is occurring due to a bug that can cause client restarts if there is no other activity taking place. This is fixed on main and is included in a pending patch release (v0.18.1). This patch release should be out promptly, but in the mean time you can use the previous release (v0.17.0), or, if still in the development phase, build from tip of main. However, note that no guarantee regarding compatibility or functionality are made on tip of main, so tagged releases should always be used in production settings.

Thank you for the answer! I made a mistake, the version I am using actually is 0.17.0. But the behavior you describe (client restarts) still seem to be taking place. Here are the logs:

00:05:05.826,141] <inf> golioth_fw_update: State = Idle
[00:05:36.108,551] <wrn> golioth_coap_client_zephyr: Receive timeout
[00:05:36.108,581] <inf> golioth_coap_client_zephyr: Ending session
[00:05:38.748,413] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:05:38.748,626] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:05:39.097,229] <inf> golioth_fw_update: Received OTA manifest
[00:05:39.097,290] <inf> golioth_fw_update: Current version = 1.0.8, Target version = 1.0.8
[00:05:39.097,290] <inf> golioth_fw_update: Current version matches target version.
[00:05:39.097,320] <inf> golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
[00:05:39.097,320] <inf> golioth_fw_update: State = Idle
[00:06:09.415,710] <wrn> golioth_coap_client_zephyr: Receive timeout
[00:06:09.415,710] <inf> golioth_coap_client_zephyr: Ending session

Should I wait for the 0.18.1 patch or do you think it’s something else? Thank you.

@jakub.nagy thanks for the update! Can you try building from tip of main and see if the behavior continues to reproduce?

@hasheddan I’m not sure how to do this since I downloaded it as a package using west.

I guess the revision only works for tagged releases, I’m not that familiar with west either.

    - name: golioth
      path: modules/lib/golioth-firmware-sdk
      revision: main
      url: https://github.com/golioth/golioth-firmware-sdk.git
      submodules: true

are there any instructions for building the sdk?

@jakub.nagy instructions for using the Golioth Firmware SDK with Zephyr can be found here. You can point to a tag, commit, or branch using the revision field. When changes are made in your west.yml, it is necessary to run west update and perform a pristine build.

@jakub.nagy the v0.18.1 release is now available. Please let us know if you continue to encounter issues!

Hi, thank you for the update!

Unfortunately the behavior is still the same. If you check the logs:

*** Booting My Application v1.0.8-4bc7593d9f4e ***
*** Using nRF Connect SDK v2.9.0-7787b2649840 ***
*** Using Zephyr OS v3.7.99-1f8f3dc29142 ***
*** Golioth Firmware SDK v0.18.1 ***
[00:00:00.263,854] <inf> golioth_settings_autoload: Initializing settings subsystem
[00:00:00.269,866] <inf> golioth_settings_autoload: Loading settings
[00:00:00.269,897] <inf> watchdog: Watchdog started (timeout 600 s)
[00:00:00.275,695] <wrn> nvs: Registering NVS keys for the first time.
[00:00:04.042,846] <inf> lte_monitor: Network: Searching
[00:00:10.941,436] <inf> lte_monitor: Network: Registered (roaming)
[00:00:11.042,572] <inf> golioth_mbox: Mbox created, bufsize: 1232, num_items: 10, item_size: 112
[00:00:13.953,765] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:00:13.953,857] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:00:14.045,684] <inf> golioth_fw_update: Current firmware version: main - 1.0.8
[00:00:14.049,163] <inf> golioth_fw_update: State = Idle
[00:00:14.430,145] <inf> mqtt: IPv4 Address found 35.214.203.191
[00:00:14.710,601] <inf> net_mqtt: Connect completed
[00:00:14.710,754] <inf> golioth_fw_update: Waiting to receive OTA manifest
[00:00:14.710,754] <inf> golioth_fw_update: State = Idle
[00:00:14.926,269] <inf> mqtt: Subscribing to: device-cloud/N0008 len 18
[00:00:14.927,398] <inf> mqtt: Connected to MQTT server: emqx.tvojlox.sk:1883
[00:00:15.020,721] <inf> golioth_fw_update: Received OTA manifest
[00:00:15.020,782] <inf> golioth_fw_update: Current version = 1.0.8, Target version = 1.0.8
[00:00:15.020,812] <inf> golioth_fw_update: Current version matches target version.
[00:00:15.020,812] <inf> golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
[00:00:15.020,812] <inf> golioth_fw_update: State = Idle
[00:00:15.075,347] <inf> mqtt: Publishing to topic: device-cloud-statuses/N0008 | len: 792
[00:00:45.371,032] <wrn> golioth_coap_client_zephyr: Receive timeout
[00:00:45.371,063] <inf> golioth_coap_client_zephyr: Ending session
[00:00:47.377,960] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:00:47.378,173] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:00:47.794,677] <inf> golioth_fw_update: Received OTA manifest
[00:00:47.794,738] <inf> golioth_fw_update: Current version = 1.0.8, Target version = 1.0.8
[00:00:47.794,738] <inf> golioth_fw_update: Current version matches target version.
[00:00:47.794,769] <inf> golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
[00:00:47.794,769] <inf> golioth_fw_update: State = Idle

Is this behavior the client restarts you meant?

@jakub.nagy I notice that you are also setting up an MQTT client, can you share more information about how your firmware is structured? Also, can you try running the firmware update sample and see if you observe the same behavior?

@jakub.nagy also, based on the banner printed at the beginning of your logs, it looks like you may have updated the Golioth Firmware SDK version, but did not run a west update afterwards.

*** Booting My Application v1.0.8-4bc7593d9f4e ***
*** Using nRF Connect SDK v2.9.0-7787b2649840 ***
*** Using Zephyr OS v3.7.99-1f8f3dc29142 ***
*** Golioth Firmware SDK v0.18.1 ***

The Golioth Firmware SDK uses NCS v3.0.1 in v0.18.1.

Yes, I’m using an MQTT client for publishing and subscribing to data. I’m not sure how the client is implemented but it’s likely using the same socket.

The main difference between my code and the sample is that I don’t use net_connect() to connect to the network but rather use this custom function:

int MQTT::connect_to_network()
{
    int err;

    // Initialize modem library
    err = nrf_modem_lib_init();
    if (err)
    {
        LOG_ERR("Failed to initialize the modem library, error: %d", err);
        k_sleep(K_SECONDS(CONFIG_MQTT_RECONNECT_DELAY_S));
        sys_reboot(SYS_REBOOT_COLD);
    }
    k_msleep(1000);

    // Connect to LTE network
    LOG_DBG("Connecting to LTE network");
    lte_lc_register_handler(lte_handler);
    err = lte_lc_connect();
    if (err)
    {
        LOG_ERR("Error when connecting to network: %d", err);
        k_sleep(K_SECONDS(CONFIG_MQTT_RECONNECT_DELAY_S));
        sys_reboot(SYS_REBOOT_COLD);
    }
    LOG_DBG("Connected to LTE network!");
    k_msleep(100);

    // Activate connectivity statistics
    char response[3];
    nrf_modem_at_scanf("AT%XCONNSTAT=1", "%s", response);
    if (strstr(response, "OK") == NULL)
    {
        LOG_ERR("Failed to set connectivity statistics.");
    }

    return 0;
}

Commenting out all of my MQTT related code, adding the configs for network connection manager to the project config and connecting using net_connect() doesn’t fix the issue.

However if I try to use MQTT with connection manager and net_connect(), the connection fails more often. It also reconnects sometimes, but is overall much less stable:

[00:00:49.447,021] <inf> golioth_coap_client_zephyr: Ending session
[00:00:52.218,994] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:00:52.219,177] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:00:52.562,713] <inf> golioth_fw_update: Received OTA manifest
[00:00:52.562,774] <inf> golioth_fw_update: Current version = 1.0.8, Target version = 1.0.8
[00:00:52.562,805] <inf> golioth_fw_update: Current version matches target version.
[00:00:52.562,805] <inf> golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
[00:00:52.562,805] <inf> golioth_fw_update: State = Idle
[00:01:22.850,250] <wrn> golioth_coap_client_zephyr: Receive timeout
[00:01:22.850,280] <inf> golioth_coap_client_zephyr: Ending session
[00:01:26.877,960] <err> golioth_coap_client_zephyr: Failed to connect to socket: -11
[00:01:26.879,272] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:26.879,272] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:34.881,805] <err> golioth_coap_client_zephyr: Failed to connect to socket: -11
[00:01:34.882,965] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:34.882,965] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:42.899,688] <err> golioth_coap_client_zephyr: Failed to connect to socket: -11
[00:01:42.901,214] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:42.901,245] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:50.070,037] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:01:50.070,220] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:01:50.424,774] <inf> golioth_fw_update: Received OTA manifest
[00:01:50.424,835] <inf> golioth_fw_update: Current version = 1.0.8, Target version = 1.0.8
[00:01:50.424,835] <inf> golioth_fw_update: Current version matches target version.
[00:01:50.424,865] <inf> golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
[00:01:50.424,865] <inf> golioth_fw_update: State = Idle
[00:02:20.803,283] <wrn> golioth_coap_client_zephyr: Receive timeout
[00:02:20.803,283] <inf> golioth_coap_client_zephyr: Ending session
[00:02:24.816,497] <err> golioth_coap_client_zephyr: Failed to connect to socket: -11
[00:02:24.817,687] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:02:24.817,687] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
[00:02:31.168,457] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:02:31.168,670] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:02:31.495,239] <inf> golioth_fw_update: Received OTA manifest
[00:02:31.495,300] <inf> golioth_fw_update: Current version = 1.0.8, Target version = 1.0.8
[00:02:31.495,300] <inf> golioth_fw_update: Current version matches target version.
[00:02:31.495,330] <inf> golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
[00:02:31.495,330] <inf> golioth_fw_update: State = Idle```

updated the Golioth Firmware SDK version, but did not run a west update afterwards.

I did, but I used west only for installation, as I’m compiling using the NRF VSCode extension, which was set to use NCS 2.9.0. There is a breaking change with the NCS 3.0.1 directly relating to the SDK:

/opt/nordic/ncs/v3.0.1/nrf/include/flash_map_pm.h:10:10: fatal error: pm_config.h: No such file or directory

Now, I can’t find an default partition manager file for the NRF9160/1 so I can’t compile the code, and the Golioth SDK also only provides .conf files for this processor family.

The Golioth Firmware SDK uses NCS v3.0.1 in v0.18.1.

Which version does 0.17.0 use? I can try to invest more time to migrate to the new NCS but I’m not sure if it would even fix the issue.

v0.17.0 uses NCS v2.8.0. If you are able to share more of your firmware application then we can try to provide additional support. I would try building and running the sample application to help isolate the root cause.

My application is relatively large, it’s not in a public repo and I can’t upload files here so I don’t think I’ll be able to share it whole.

I did replace my whole main file with the sample and also replace the whole project config. The resulting behavior is much worse:

Booting TF-M v2.1.1-ncs2-snapshot1
[Sec Thread] Secure image initializing!
*** Booting My Application v1.0.8-4439dd2d4bf9 ***
*** Using nRF Connect SDK v2.9.0-7787b2649840 ***
*** Using Zephyr OS v3.7.99-1f8f3dc29142 ***
*** Golioth Firmware SDK v0.18.1 ***
[00:00:00.537,048] <inf> golioth_settings_autoload: Initializing settings subsystem
[00:00:00.542,816] <inf> golioth_settings_autoload: Loading settings
[00:00:00.542,846] <dbg> fw_update_sample: main: Start FW Update sample
[00:00:00.542,846] <inf> golioth_samples: Bringing up network interface
[00:00:00.542,877] <inf> golioth_samples: Waiting to obtain IP address
[00:00:06.265,319] <inf> lte_monitor: Network: Searching
[00:00:25.670,043] <inf> lte_monitor: Network: Registered (roaming)
[00:00:25.670,806] <inf> golioth_mbox: Mbox created, bufsize: 1232, num_items: 10, item_size: 112
[00:00:25.671,630] <inf> golioth_fw_update: Current firmware version: main - 1.0.8
[00:00:25.673,339] <inf> golioth_fw_update: State = Idle
[00:00:25.893,218] <err> golioth_coap_client_zephyr: Failed to connect to socket: -22
[00:00:25.893,768] <err> golioth_coap_client_zephyr: Failed to connect: -22
[00:00:25.893,768] <wrn> golioth_coap_client_zephyr: Failed to connect: -22
[00:00:30.895,080] <err> golioth_coap_client_zephyr: Failed to connect to socket: -22
[00:00:30.895,507] <err> golioth_coap_client_zephyr: Failed to connect: -22
[00:00:30.895,507] <wrn> golioth_coap_client_zephyr: Failed to connect: -22
[00:00:35.896,911] <err> golioth_coap_client_zephyr: Failed to connect to socket: -22
[00:00:35.897,338] <err> golioth_coap_client_zephyr: Failed to connect: -22
[00:00:35.897,369] <wrn> golioth_coap_client_zephyr: Failed to connect: -22

Please note that I did base the application from the sample, but the sample never really worked reliably for me.

Right not the only hot fix I have in mind is to manually stop the golioth coap client, but this could introduce more instabilities.

So it turns out the solution was very simple. I did read this article from the official blog before: https://blog.golioth.io/nat-is-the-enemy-of-low-power-devices/

The CONFIG_GOLIOTH_COAP_KEEPALIVE_INTERVAL_S=0 setting helped me get rid of the -22 connection errors before.

But it also turns out the setting I was looking for the whole time was also in the same article, it’s CONFIG_GOLIOTH_COAP_CLIENT_RX_TIMEOUT_SEC.

Setting it to 300s and enabling CONFIG_GOLIOTH_USE_CONNECTION_ID reduces the data and energy usage by 10x without breaking the connection. I could likely use even a higher timeout, but it’s not necessary/I’m not going to test it now. Either way, this solves my problem.