OTA firmware update loop using excessive data

I was updating my device over OTA from version 0.0.2 to 0.0.4 and it continuously updated and used >10MB of data in an hour. It appears to download the data but doesn’t successfully update it. Any idea what would cause this?

here is the serial console dump for one of the loops.

[00:03:07.903,198] <dbg> app_dfu: data_received: Received 1024 bytes at offset 423936
[00:03:08.562,927] <dbg> app_dfu: data_received: Received 522 bytes at offset 424960 (last)
[00:03:09.375,701] <inf> app_dfu: Requesting upgrade
[00:03:09.383,728] <inf> app_dfu: Rebooting in 1 second(s)
*** Booting Zephyr OS build v3.3.99-ncs1-1 ***
[00:00:00.529,235] <inf> golioth_system: Initializing
[00:00:00.543,457] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.551,544] <inf> fs_nvs: alloc wra: 0, fb8
[00:00:00.559,082] <inf> fs_nvs: data wra: 0, 60
------------------------------------------------------------------------------
Steadfast SM-IOT Foundation Monitor. Copyright 2023. version: v0.23 10/4/2023 (Oct  4 2023 09:58:26) MCUboot Ver:0.0.2
[00:00:00.590,423] <inf> sm_iot: Connecting to LTE, this may take some time...
[00:00:03.057,495] <inf> sm_iot: Network: Searching
[00:00:17.632,202] <inf> sm_iot: Network: Registered (roaming)
[00:00:17.641,265] <inf> golioth_system: Starting connect
[00:00:17.774,383] <dbg> sm_iot: main: Steadfast SM-IOT Foundation Monitor. Copyright 2023. version: v0.23 10/4/2023 (Oct  4 2023 09:58:26) MCUboot Ver: 0.0.2
[00:00:18.986,358] <inf> sm_iot: Golioth connected. Registering observations
[00:00:19.000,091] <inf> golioth_system: Client connected!
[00:00:19.434,600] <dbg> app_dfu: golioth_desired_update: Desired
                                  a3 01 1a 65 25 6d 16 02  78 40 34 31 37 66 36 34 |...e%m.. x@417f64
                                  61 30 31 64 37 36 37 34  39 38 38 62 32 64 33 35 |a01d7674 988b2d35
                                  63 39 38 64 39 65 38 35  34 31 34 35 64 31 34 32 |c98d9e85 4145d142
                                  32 33 37 31 61 38 38 35  32 62 65 65 30 66 37 34 |2371a885 2bee0f74
                                  61 36 64 61 39 65 64 35  66 38 03 81 a6 01 64 6d |a6da9ed5 f8....dm
                                  61 69 6e 02 65 30 2e 30  2e 34 03 78 40 61 66 39 |ain.e0.0 .4.x@af9
                                  31 34 31 39 63 31 36 34  62 30 31 64 35 34 39 65 |1419c164 b01d549e
                                  66 66 34 39 65 64 63 31  64 32 30 36 64 62 39 66 |ff49edc1 d206db9f
                                  32 66 32 30 38 35 61 38  35 39 64 34 61 64 39 36 |2f2085a8 59d4ad96
                                  63 35 31 30 38 66 65 65  63 36 64 62 35 04 1a 00 |c5108fee c6db5...
                                  06 7e 0a 05 70 2f 2e 75  2f 63 2f 6d 61 69 6e 40 |.~..p/.u /c/main@
                                  30 2e 30 2e 34 06 67 6d  63 75 62 6f 6f 74       |0.0.4.gm cuboot
[00:00:19.556,579] <inf> app_settings: Set DEBUG_LEVEL to 20
[00:00:19.571,319] <dbg> app_state: async_handler: State successfully set
[00:00:19.590,362] <inf> sm_iot: nRF9160 hardwire ID :            0x54c69ae8a26120e9
[00:00:19.600,921] <inf> sm_iot: Cellular Modem IMEI number:      350457798708960
[00:00:19.611,236] <dbg> sm_iot: main: Cellular Modem ICCID number:     8944501207217644380
[00:00:19.623,138] <dbg> sm_iot: main: Cellular Modem Signal Strength:  -81db
[00:00:19.642,608] <dbg> sm_iot: main: OS time and date: 10-10-23 16:38:06
[00:00:20.073,577] <dbg> app_dfu: data_received: Received 1024 bytes at offset 0
[00:00:20.085,021] <inf> mcuboot_util: Swap type: test
[00:00:20.093,811] <inf> golioth_dfu: swap type: test
[00:00:20.677,337] <dbg> app_dfu: data_received: Received 1024 bytes at offset 1024
[00:00:21.126,434] <dbg> app_dfu: data_received: Received 1024 bytes at offset 2048

I created a 0.0.3 version and programmed it manually using SWD. it did the update the first time and then reported “00:00:09.254,486] app_dfu: Desired version (0.0.4) matches current firmware version!” on the second boot like I would expect so I’ve fixed it for now.

Looking at the golioth_dfu log it went from UPDATING to IDLE with no other status when it was failing.

2023-10-10T16:38:06.855311378Z,DIAGNOSTICS,INFO,golioth_dfu,main(IDLE): ready state,main,ready state,0,IDLE,0,2023-10-10T16:38:06.855311378Z,0.0.2,651ce9fc5d4d6a17c7d39011,
2023-10-10T16:37:45.668010156Z,DIAGNOSTICS,INFO,golioth_dfu,main(UPDATING): ready state,main,ready state,0,UPDATING,3,0.0.4,2023-10-10T16:37:45.668010156Z,0.0.2,651ce9fc5d4d6a17c7d39011,

When I manually re-programmed it to v0.03 it passed with:

2023-10-10T16:46:42.827268176Z,DIAGNOSTICS,INFO,golioth_dfu,main(IDLE): firmware updated successfully,main,firmware updated successfully,1,IDLE,0,2023-10-10T16:46:42.827268176Z,0.0.4,651ce9fc5d4d6a17c7d39011
2023-10-10T16:45:49.514255380Z,DIAGNOSTICS,INFO,golioth_dfu,main(UPDATING): ready state,main,ready state,0,UPDATING,3,0.0.4,2023-10-10T16:45:49.51425538Z,0.0.3,651ce9fc5d4d6a17c7d39011

I’m worried that it will happen in the field and consume vast amounts of data in the field so I need to know what went wrong. Any ideas?

Hey John,
It looks to me that we have two issues here: first, the device uses too much data as it continuously downloads the same FW image, and the second issue is the device not running the downloaded v0.0.4 version of the firmware.
The first issue could be addressed in the application; that being said, it would be a future good feature in the firmware SDK. A temporary way to solve it could be to use something like LightDB State to track the number of times an update has been attempted and stop trying to perform it after some number of tries.
The second issue could be a mcuboot issue. Between v0.0.2 and v0.0.4, have you made any changes to app_dfu or flash files, or project configuration files? Have you been customizing the bootloader or making changes to the signing keys?
Have you seen behavior like this before?

I haven’t modified app_dfu, flash files, or bootloader at all. They are straight from the reference code. No change to signing keys recently either. There are always changes to the project config file but manually flashing the same code with a 0.0.3 revision on it using SWD fixed it.
This was the first update I’ve ever done on my code base. I did create a 0.0.5 version and it updated from 0.0.4 correctly though.