Issue with COAP_EXTENDED_OPTIONS_LEN_VALUE causing OTA failure

Description

I recently ported my working app to Golioth SDK v0.19.0 to support the latest location services, and with the new SDK comes new issues and bugs. When triggering the OTA, I was getting the following Path too long error message. So to fix that, I added the following to the prj.conf file as stated in the Golioth template

CONFIG_COAP_EXTENDED_OPTIONS_LEN=y
CONFIG_COAP_EXTENDED_OPTIONS_LEN_VALUE=39

However, the issue persists, and although the device manages to download the image from the cloud, it gets stuck in a reset loop (see the full log),

[00:00:07.456,939] <inf> golioth_fw_update: Current version = 1.0.0, Target version = 1.1.1
[00:00:07.768,554] <inf> golioth_fw_update: Target component already downloaded. Attempting to update.
[00:00:07.768,554] <inf> golioth_fw_update: State = Updating
[00:00:08.096,893] <inf> golioth_fw_update: Rebooting into new image in 5 seconds
[00:00:09.097,045] <inf> golioth_fw_update: Rebooting into new image in 4 seconds
[00:00:10.097,198] <inf> golioth_fw_update: Rebooting into new image in 3 seconds
[00:00:11.097,351] <inf> golioth_fw_update: Rebooting into new image in 2 seconds

which seems to be related to this issue, as the OTA was working flawlessly in the previous SDK v0.18.0.

So, what should be the correct CoAP length value to prevent this?
39 does not seem to be working atm.

Expected Behavior

The device should be able to download without giving coap length error and boot in the downloaded image.

Actual Behavior

[00:00:07.728,027] <err> net_coap: 234 is > sizeof(coap_option->value)(39)!
[00:00:07.728,057] <inf> golioth_rpc: RPC observation established
[00:00:07.828,186] <inf> golioth_fw_update: Waiting to receive OTA manifest
[00:00:07.828,186] <inf> golioth_fw_update: State = Idle
[00:00:08.156,005] <inf> golioth_fw_update: Received OTA manifest
[00:00:08.156,066] <inf> golioth_fw_update: Current version = 1.0.0, Target version = 1.1.1
[00:00:08.468,231] <inf> golioth_fw_update: State = Downloading
[00:00:09.574,493] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:09.574,615] <inf> golioth_fw_update: Received block 0/368

Impact

Hampering the deployment and release of the firmware update to the fleet

Environment

NCS 3.0.2
Golioth SDK v0.19.0
Device nRF9151 SiP

Logs and Console Output

Booting TF-M v2.1.1-ncs4-2
[Sec Thread] Secure image initializing!

[00:00:00.256,866] <inf> lis2dh: fs=2, odr=0x4 lp_en=0x0 scale=9576
*** Booting My Application v1.0.0-a3bb163d1c60 ***
*** Using nRF Connect SDK v3.0.2-89ba1294ac9b ***
*** Using Zephyr OS v4.0.99-f791c49f492c ***
*** Golioth Firmware SDK v0.19.0 ***
[00:00:00.598,999] <dbg> battery: npm1300_fuel_gauge_init: Init and start nPM1300 PMIC
[00:00:00.599,212] <dbg> battery: fuel_gauge_init_params: nRF Fuel Gauge version: 1.0.1
[00:00:00.601,959] <inf> battery: Fuel gauge initialized successfully

[00:00:00.601,989] <inf> golioth_settings_autoload: Initializing settings subsystem
[00:00:00.608,337] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.608,337] <inf> fs_nvs: alloc wra: 0, fb8
[00:00:00.608,367] <inf> fs_nvs: data wra: 0, 74
[00:00:00.608,428] <inf> golioth_settings_autoload: Loading settings
[00:00:00.609,252] <inf> tracker: -------------------- MAIN APP --------------------
[00:00:00.609,283] <dbg> tracker: main: Starting application
[00:00:00.609,313] <inf> tracker: Firmware version: 1.0.0
[00:00:00.696,594] <inf> modem: Modem firmware version: mfw_nrf91x1_2.0.2
[00:00:00.796,020] <inf> modem: Reset reason: PIN reset | software (0x10001)
[00:00:02.609,680] <wrn> date_time: Valid time not currently available
[00:00:02.609,710] <err> vibe_monitor: Failed to obtain current time, error -61
[00:00:02.783,660] <inf> modem: Incoming RSRP status message, RSRP value is -88
[00:00:02.783,874] <inf> cellular_nrf91: LTE cell changed: Cell ID: 103705110, Tracking area: 41601
[00:00:02.783,905] <inf> cellular_nrf91: LTE mode: LTEM
[00:00:04.796,417] <inf> user_button: User button initialized on pin 31
[00:00:04.796,447] <inf> modem: Connecting to LTE, this may take some time...
[00:00:06.199,829] <wrn> at_monitor: No heap space for incoming notification: %XTIME: "0A","5280303013650A","01"

[00:00:06.202,606] <inf> modem: Network: Registered (roaming)
[00:00:06.203,063] <inf> golioth_mbox: Mbox created, bufsize: 1320, num_items: 10, item_size: 120
[00:00:06.203,765] <inf> golioth_fw_update: Current firmware version: main - 1.0.0
[00:00:06.205,718] <inf> modem: PSM parameter update: TAU: 3600 s, Active time: 0 s

[00:00:06.206,024] <inf> golioth_fw_update: State = Idle
[00:00:07.317,260] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:00:07.317,474] <inf> tracker: Golioth client connected
[00:00:07.317,596] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop

[00:00:07.728,027] <err> net_coap: 234 is > sizeof(coap_option->value)(39)!
[00:00:07.728,057] <inf> golioth_rpc: RPC observation established
[00:00:07.828,186] <inf> golioth_fw_update: Waiting to receive OTA manifest
[00:00:07.828,186] <inf> golioth_fw_update: State = Idle
[00:00:08.156,005] <inf> golioth_fw_update: Received OTA manifest
[00:00:08.156,066] <inf> golioth_fw_update: Current version = 1.0.0, Target version = 1.1.1
[00:00:08.468,231] <inf> golioth_fw_update: State = Downloading
[00:00:09.574,493] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:09.574,615] <inf> golioth_fw_update: Received block 0/368
[00:00:09.574,676] <inf> mcuboot_util: Image index: 0, Swap type: none
[00:00:09.574,707] <inf> golioth_fw_zephyr: swap type: none
[00:00:10.310,546] <inf> modem: Incoming RSRP status message, RSRP value is -93
[00:00:10.331,146] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:10.331,207] <inf> golioth_fw_update: Received block 1/368
[00:00:10.873,504] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:10.873,596] <inf> golioth_fw_update: Received block 2/368
[00:00:11.567,260] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:11.567,352] <inf> golioth_fw_update: Received block 3/368
[00:00:12.111,724] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:12.111,816] <inf> golioth_fw_update: Received block 4/368
[00:00:12.197,967] <inf> modem: Incoming RSRP status message, RSRP value is -91
[00:00:12.891,296] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:12.891,387] <inf> golioth_fw_update: Received block 5/368
[00:00:13.434,326] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:13.434,417] <inf> golioth_fw_update: Received block 6/368
[00:00:13.967,285] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:13.967,376] <inf> golioth_fw_update: Received block 7/368
[00:00:14.441,314] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:14.441,436] <inf> golioth_fw_update: Received block 8/368
[00:00:15.066,619] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:15.066,711] <inf> golioth_fw_update: Received block 9/368
[00:00:15.618,377] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:15.618,469] <inf> golioth_fw_update: Received block 10/368
[00:00:16.157,257] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:16.157,348] <inf> golioth_fw_update: Received block 11/368
[00:00:16.718,200] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:16.718,322] <inf> golioth_fw_update: Received block 12/368
[00:00:17.350,708] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:17.350,799] <inf> golioth_fw_update: Received block 13/368
[00:00:17.887,390] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
....
[00:03:44.854,187] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:03:44.854,278] <inf> golioth_fw_update: Received block 368/368
[00:03:44.949,554] <inf> golioth_fw_update: Successfully downloaded 377651 bytes in 216154 ms
[00:03:44.949,584] <inf> golioth_fw_update: State = Downloaded
[00:03:45.281,127] <inf> golioth_fw_update: State = Updating
[00:03:45.604,064] <inf> golioth_fw_update: Rebooting into new image in 5 seconds
[00:03:46.604,248] <inf> golioth_fw_update: Rebooting into new image in 4 seconds
[00:03:47.604,400] <inf> golioth_fw_update: Rebooting into new image in 3 seconds
[00:03:48.189,575] <inf> modem: Incoming RSRP status message, RSRP value is -85
[00:03:48.189,880] <inf> cellular_nrf91: LTE cell changed: Cell ID: 5370133, Tracking area: 41707
[00:03:48.189,910] <inf> modem: LTE cell changed: Cell ID: 5370133, Tracking area: 41707

[00:03:48.604,553] <inf> golioth_fw_update: Rebooting into new image in 2 seconds
uart:~$ All pins have been configured as non-secure
Booting TF-M v2.1.1-ncs4-2
[Sec Thread] Secure image initializing!

[00:00:00.256,835] <inf> lis2dh: fs=2, odr=0x4 lp_en=0x0 scale=9576
*** Booting My Application v1.0.0-a3bb163d1c60 ***
*** Using nRF Connect SDK v3.0.2-89ba1294ac9b ***
*** Using Zephyr OS v4.0.99-f791c49f492c ***
*** Golioth Firmware SDK v0.19.0 ***
[00:00:00.598,907] <dbg> battery: npm1300_fuel_gauge_init: Init and start nPM1300 PMIC
[00:00:00.599,121] <dbg> battery: fuel_gauge_init_params: nRF Fuel Gauge version: 1.0.1
[00:00:00.601,867] <inf> battery: Fuel gauge initialized successfully

[00:00:00.601,898] <inf> golioth_settings_autoload: Initializing settings subsystem
[00:00:00.608,245] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.608,245] <inf> fs_nvs: alloc wra: 0, fb8
[00:00:00.608,276] <inf> fs_nvs: data wra: 0, 74
[00:00:00.608,337] <inf> golioth_settings_autoload: Loading settings
[00:00:00.609,130] <inf> tracker: -------------------- MAIN APP --------------------
[00:00:00.609,191] <dbg> tracker: main: Starting application
[00:00:00.609,222] <inf> tracker: Firmware version: 1.0.0
[00:00:00.696,502] <inf> modem: Modem firmware version: mfw_nrf91x1_2.0.2
[00:00:00.795,928] <inf> modem: Reset reason: PIN reset | software (0x10001)
[00:00:04.405,029] <inf> modem: Incoming RSRP status message, RSRP value is -85
[00:00:04.405,242] <inf> cellular_nrf91: LTE cell changed: Cell ID: 5370133, Tracking area: 41707
[00:00:04.405,273] <inf> cellular_nrf91: LTE mode: LTEM
[00:00:04.796,325] <inf> user_button: User button initialized on pin 31
[00:00:04.796,325] <inf> modem: Connecting to LTE, this may take some time...
[00:00:05.821,807] <inf> modem: Incoming RSRP status message, RSRP value is -89
uart:~$ 
[00:00:09.586,517] <err> net_coap: 813 is > sizeof(coap_option->value)(39)!
[00:00:09.611,389] <err> net_coap: 234 is > sizeof(coap_option->value)(39)!
[00:00:09.611,419] <inf> golioth_rpc: RPC observation established
[00:00:09.666,564] <inf> golioth_fw_update: Waiting to receive OTA manifest
[00:00:09.666,564] <inf> golioth_fw_update: State = Idle
[00:00:09.960,662] <inf> golioth_fw_update: Received OTA manifest
[00:00:09.960,723] <inf> golioth_fw_update: Current version = 1.0.0, Target version = 1.1.1
[00:00:10.272,552] <inf> golioth_fw_update: Target component already downloaded. Attempting to update.
[00:00:10.272,552] <inf> golioth_fw_update: State = Updating
[00:00:10.571,746] <inf> golioth_fw_update: Rebooting into new image in 5 seconds
[00:00:11.571,899] <inf> golioth_fw_update: Rebooting into new image in 4 seconds
[00:00:12.572,052] <inf> golioth_fw_update: Rebooting into new image in 3 seconds
[00:00:13.572,204] <inf> golioth_fw_update: Rebooting into new image in 2 seconds
uart:~$ All pins have been configured as non-secure
Booting TF-M v2.1.1-ncs4-2
[Sec Thread] Secure image initializing!

[00:00:00.256,805] <inf> lis2dh: fs=2, odr=0x4 lp_en=0x0 scale=9576
*** Booting My Application v1.0.0-a3bb163d1c60 ***
*** Using nRF Connect SDK v3.0.2-89ba1294ac9b ***
*** Using Zephyr OS v4.0.99-f791c49f492c ***
*** Golioth Firmware SDK v0.19.0 ***
[00:00:00.598,602] <dbg> battery: npm1300_fuel_gauge_init: Init and start nPM1300 PMIC
[00:00:00.598,815] <dbg> battery: fuel_gauge_init_params: nRF Fuel Gauge version: 1.0.1
[00:00:00.601,562] <inf> battery: Fuel gauge initialized successfully

[00:00:00.601,593] <inf> golioth_settings_autoload: Initializing settings subsystem
[00:00:00.607,940] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.607,940] <inf> fs_nvs: alloc wra: 0, fb8
[00:00:00.607,971] <inf> fs_nvs: data wra: 0, 74
[00:00:00.608,001] <inf> golioth_settings_autoload: Loading settings
[00:00:00.608,825] <inf> tracker: -------------------- MAIN APP --------------------
[00:00:00.608,886] <dbg> tracker: main: Starting application 
[00:00:00.608,886] <inf> tracker: Firmware version: 1.0.0
[00:00:00.696,197] <inf> modem: Modem firmware version: mfw_nrf91x1_2.0.2
[00:00:00.795,623] <inf> modem: Reset reason: PIN reset | software (0x10001)
[00:00:02.168,853] <inf> modem: Incoming RSRP status message, RSRP value is -81
[00:00:02.169,067] <inf> cellular_nrf91: LTE cell changed: Cell ID: 5370133, Tracking area: 41707
[00:00:02.169,097] <inf> cellular_nrf91: LTE mode: LTEM
[00:00:04.796,020] <inf> user_button: User button initialized on pin 31
[00:00:04.796,051] <inf> modem: Connecting to LTE, this may take some time...
[00:00:05.565,490] <wrn> at_monitor: No heap space for incoming notification: %XTIME: "0A","5280303063300A","01"

[00:00:05.668,182] <inf> modem: Network: Registered (roaming)
[00:00:05.668,670] <inf> golioth_mbox: Mbox created, bufsize: 1320, num_items: 10, item_size: 120
[00:00:05.669,372] <inf> golioth_fw_update: Current firmware version: main - 1.0.0
[00:00:05.671,325] <inf> modem: PSM parameter update: TAU: 3600 s, Active time: 0 s

[00:00:05.671,630] <inf> golioth_fw_update: State = Idle
[00:00:06.755,798] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:00:06.756,011] <inf> tracker: Golioth client connected
[00:00:06.756,011] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:00:07.097,595] <err> net_coap: 27259 is > sizeof(coap_option->value)(39)!
[00:00:07.111,633] <err> net_coap: 234 is > sizeof(coap_option->value)(39)!
[00:00:07.111,694] <inf> golioth_rpc: RPC observation established
[00:00:07.143,737] <inf> golioth_fw_update: Waiting to receive OTA manifest
[00:00:07.143,768] <inf> golioth_fw_update: State = Idle
[00:00:07.456,878] <inf> golioth_fw_update: Received OTA manifest
[00:00:07.456,939] <inf> golioth_fw_update: Current version = 1.0.0, Target version = 1.1.1
[00:00:07.768,554] <inf> golioth_fw_update: Target component already downloaded. Attempting to update.
[00:00:07.768,554] <inf> golioth_fw_update: State = Updating
[00:00:08.096,893] <inf> golioth_fw_update: Rebooting into new image in 5 seconds
[00:00:09.097,045] <inf> golioth_fw_update: Rebooting into new image in 4 seconds
[00:00:10.097,198] <inf> golioth_fw_update: Rebooting into new image in 3 seconds
[00:00:11.097,351] <inf> golioth_fw_update: Rebooting into new image in 2 seconds

@rajeev thank you for reporting! Could you confirm that the 1.1.1 image that the device is successfully downloading does in fact of the version set to 1.1.1 in the firmware binary and is not just named 1.1.1 on Golioth? I am surprised to see that when you reboot into the new image that the logged firmware version is the same as the previous image.

@hasheddan,
Correct. The new image has been built with version 1.1.1 as set in the VERSION file:

VERSION_MAJOR = 1
VERSION_MINOR = 1
PATCHLEVEL = 1
VERSION_TWEAK = 0
EXTRAVERSION =

I am also confused as to why it is unable to reboot into the new version, and I started seeing this issue after upgrading to v0.19.0.

I am using sysbuild with pristine + pm_static.yml to compile the new image, and zephyr.signed.bin is the binary uploaded to the Golioth cloud.

I had one of the devices running the app with the previous Golioth SDK v0.17.0, which was the stable version for me, and then, when that device received the new image with Golioth SDK v0.19.0, it started showing this consistent reboot issue. Another device running the app with the latest SDK, same behaviour.

I am a bit confused about what might still be missing or misconfigured, causing this new behavior. I hope it’s not the COAP_EXTENDED_OPTIONS params some how glitching things out.

Below are the config params if this may help.
Here is my prj.conf

# Enable Golioth Firmware SDK
CONFIG_GOLIOTH_FIRMWARE_SDK=y

# Enable common sample library
CONFIG_GOLIOTH_SAMPLE_COMMON=y

# Golioth services used in this app
CONFIG_GOLIOTH_FW_UPDATE=y
CONFIG_GOLIOTH_LIGHTDB_STATE=y
CONFIG_GOLIOTH_RPC=y
CONFIG_GOLIOTH_SETTINGS=y
CONFIG_GOLIOTH_STREAM=y
CONFIG_GOLIOTH_NET_INFO=y

# Enabling CONFIG_LOG_BACKEND_GOLIOTH
# and CONFIG_GOLIOTH_AUTO_LOG_TO_CLOUD
# can result in a large amount of data transmitted 
# to Golioth, so use with caution.

# Enable/disable sending logs to Golioth backend
CONFIG_LOG_BACKEND_GOLIOTH=n

# Enable/disable automatic logging to Golioth for GLTH_LOGX statements
# mainly for OTA module
CONFIG_GOLIOTH_AUTO_LOG_TO_CLOUD=n

# Configure Golioth SDK dependencies
CONFIG_ZVFS_EVENTFD_MAX=14
CONFIG_ZVFS_OPEN_MAX=23
CONFIG_LOG_PROCESS_THREAD_STACK_SIZE=1536
CONFIG_MBEDTLS_ENABLE_HEAP=y
CONFIG_MBEDTLS_HEAP_SIZE=10240
CONFIG_NETWORKING=y
CONFIG_NET_IPV4=y
CONFIG_COAP_EXTENDED_OPTIONS_LEN=y
CONFIG_COAP_EXTENDED_OPTIONS_LEN_VALUE=39

# Application
CONFIG_MAIN_STACK_SIZE=2048
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2048
CONFIG_NET_LOG=y

# Flash memory (etc.) for firmware upgrade
CONFIG_FLASH=y
CONFIG_FLASH_MAP=y
CONFIG_NVS=y
CONFIG_STREAM_FLASH=y
CONFIG_IMG_MANAGER=y
CONFIG_IMG_ERASE_PROGRESSIVELY=y
CONFIG_REBOOT=y

# The rest of the runtime credentials config
CONFIG_SETTINGS=y
CONFIG_SETTINGS_RUNTIME=y
CONFIG_GOLIOTH_SAMPLE_SETTINGS_AUTOLOAD=y
CONFIG_GOLIOTH_SAMPLE_SETTINGS_SHELL=y

# Misc.
CONFIG_JSON_LIBRARY=y
# Longer response length needed for network info
CONFIG_GOLIOTH_RPC_MAX_RESPONSE_LEN=512

# Date-Time library
CONFIG_DATE_TIME=y

# Enable I2C
CONFIG_I2C=y

# newlibc for printing floats
CONFIG_NEWLIB_LIBC=y
CONFIG_NEWLIB_LIBC_FLOAT_PRINTF=y

# Disable keepalive
CONFIG_GOLIOTH_COAP_KEEPALIVE_INTERVAL_S=0
CONFIG_GOLIOTH_COAP_CLIENT_RX_TIMEOUT_SEC=1200
CONFIG_GOLIOTH_USE_CONNECTION_ID=y

and the nrf9151_ns.conf

# General config
CONFIG_HEAP_MEM_POOL_SIZE=4096
CONFIG_NEWLIB_LIBC=y

# Networking
CONFIG_NET_SOCKETS_OFFLOAD=y
CONFIG_NET_IPV6=y
CONFIG_NET_IPV6_NBR_CACHE=n
CONFIG_NET_IPV6_MLD=n

# Increase native TLS socket implementation, so that it is chosen instead of
# offloaded nRF91 sockets
CONFIG_NET_SOCKETS_TLS_PRIORITY=35

# Modem library
CONFIG_NRF_MODEM_LIB=y

# LTE connectivity with network connection manager
CONFIG_NRF_MODEM_LIB_NET_IF=y
CONFIG_NRF_MODEM_LIB_NET_IF_AUTO_START=y
CONFIG_NRF_MODEM_LIB_NET_IF_AUTO_CONNECT=y
CONFIG_NET_CONNECTION_MANAGER=y
CONFIG_NET_CONNECTION_MANAGER_MONITOR_STACK_SIZE=1024

# Add Logs for LTE Link Handler
CONFIG_GOLIOTH_SAMPLE_NRF91_LTE_MONITOR=n

# Increased sysworkq size, due to LTE connectivity
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2048

# Disable options y-selected by NCS for no good reason
CONFIG_MBEDTLS_KEY_EXCHANGE_DHE_PSK_ENABLED=n
CONFIG_MBEDTLS_KEY_EXCHANGE_DHE_RSA_ENABLED=n

# MbedTLS configuration to support p-384 curve. These options
# enable using the MbedTLS built-in support for operations not
# supported by the default nRF Oberon crypto backend
CONFIG_NORDIC_SECURITY_BACKEND=n
CONFIG_MBEDTLS_LEGACY_CRYPTO_C=y

# Configure Golioth SDK dependencies (for NCS)
CONFIG_MBEDTLS_SSL_IN_CONTENT_LEN=2048
CONFIG_MBEDTLS_SSL_OUT_CONTENT_LEN=2048

# Add Network Info Support
CONFIG_MODEM_INFO=y

# For location using LTE
CONFIG_LTE_LC_NEIGHBOR_CELL_MEAS_MODULE=y

# Generate MCUboot compatible images
CONFIG_BOOTLOADER_MCUBOOT=y

Will continue testing to see if I manage to resolve the issue.
Thanks

@hasheddan for quick testing, reverted to the stable version of the app that uses:

Environment

NCS 2.9.0
Golioth SDK v0.17.0

Built and uploaded the binary to the Golioth cloud. Triggered OTA, and the image booted up flawlessly in the newer version. This was to confirm that the pm_static.yml and the build process are correct, and the mcuboot also works on the device.

The downside, the GA Golioth location does not work as the location API is outdated for SDK v0.17.0 :frowning:
Hence, it was the main reason to upgrade to Golioth SDK v0.19.0.

Let me know what else I should try out?

@rajeev I am attempting to reproduce, but in the meantime I notice that you are using NCS 3.0.2. While I don’t expect that this is the root of the issues you are seeing, v0.19.0 of the Golioth Firmware SDK is tested against NCS 3.0.1. It may be worth seeing if the issue reproduces with NCS 3.0.1. I don’t suspect that COAP_EXTENDED_OPTIONS_LEN is related to this specific behavior you are seeing at this point.

1 Like

@rajeev are you able to share the devicetree from your build? I’m curious if the call to to boot_request_upgrade may be silently not marking the secondary slot for upgrade. It may also be worth validating whether the firmware update sample works on your hardware, which would help us narrow down the source of the issue.

1 Like

Lemme test against the NCS 3.0.1 and will update you.

1 Like

@hasheddan sharing the zipped version of the entire build folder.

Hoping this might help.

I will also test the OTA sample and get back to you.

1 Like

@hasheddan, tested the fw_update sample and the device did boot up the image, although it did give a few CoAP length errors.

Here are the redacted logs

All pins have been configured as non-secure
Booting TF-M v2.1.1-ncs4-2
[Sec Thread] Secure image initializing!

[00:00:00.256,988] <inf> lis2dh: fs=2, odr=0x4 lp_en=0x0 scale=9576
*** Booting My Application v1.0.0 - unknown commit ***
*** Using nRF Connect SDK v3.0.2-89ba1294ac9b ***
*** Using Zephyr OS v4.0.99-f791c49f492c ***
*** Golioth Firmware SDK v0.19.0 ***
[00:00:00.602,722] <inf> golioth_settings_autoload: Initializing settings subsystem
[00:00:00.608,551] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.608,581] <inf> fs_nvs: alloc wra: 0, fb8
[00:00:00.608,581] <inf> fs_nvs: data wra: 0, 74
[00:00:00.608,612] <inf> golioth_settings_autoload: Loading settings
[00:00:00.608,825] <dbg> fw_update_sample: main: Start FW Update sample
[00:00:00.608,856] <inf> golioth_samples: Bringing up network interface
[00:00:00.608,886] <inf> golioth_samples: Waiting to obtain IP address
[00:00:06.267,761] <inf> golioth_mbox: Mbox created, bufsize: 1320, num_items: 10, item_size: 120
[00:00:06.269,256] <inf> golioth_fw_update: Current firmware version: main - 1.0.0
[00:00:06.270,843] <inf> golioth_fw_update: State = Idle
[00:00:07.380,065] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:00:07.380,371] <inf> fw_update_sample: Golioth client connected
[00:00:07.380,401] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:00:07.704,772] <err> net_coap: 242 is > sizeof(coap_option->value)(39)!
[00:00:07.706,817] <inf> golioth_fw_update: Waiting to receive OTA manifest
[00:00:07.706,817] <inf> golioth_fw_update: State = Idle
[00:00:08.018,035] <inf> golioth_fw_update: Received OTA manifest
[00:00:08.018,096] <inf> golioth_fw_update: Manifest does not contain target component: main
[00:00:08.018,096] <inf> golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
[00:00:08.018,096] <inf> golioth_fw_update: State = Idle
[00:03:09.135,955] <err> net_coap: 242 is > sizeof(coap_option->value)(39)!
[00:03:09.136,566] <inf> golioth_fw_update: Received OTA manifest
[00:03:09.136,627] <inf> golioth_fw_update: Current version = 1.0.0, Target version = 1.0.1
[00:03:09.466,156] <inf> golioth_fw_update: State = Downloading
[00:03:10.310,546] <inf> golioth_fw_update: Received block 0/387
[00:03:10.310,638] <inf> mcuboot_util: Image index: 0, Swap type: none
[00:03:10.310,668] <inf> golioth_fw_zephyr: swap type: none
[00:03:10.909,729] <inf> golioth_fw_update: Received block 1/387
[00:03:11.300,689] <inf> golioth_fw_update: Received block 2/387
[00:03:11.686,584] <inf> golioth_fw_update: Received block 3/387
[00:03:12.086,608] <inf> golioth_fw_update: Received block 4/387
[00:03:12.580,718] <inf> golioth_fw_update: Received block 5/387

.....
[00:03:53.151,000] <inf> golioth_fw_update: Received block 102/387
[00:03:53.558,074] <err> net_coap: 12910 is > sizeof(coap_option->value)(39)!
[00:03:53.558,166] <inf> golioth_fw_update: Received block 103/387
[00:03:53.928,100] <inf> golioth_fw_update: Received block 104/387
[00:03:54.431,060] <inf> golioth_fw_update: Received block 105/387
[00:03:54.849,212] <inf> golioth_fw_update: Received block 106/387

....
[00:04:33.321,472] <inf> golioth_fw_update: Received block 198/387
[00:04:33.730,560] <err> net_coap: 12910 is > sizeof(coap_option->value)(39)!
[00:04:33.730,651] <inf> golioth_fw_update: Received block 199/387

[00:05:51.075,286] <inf> golioth_fw_update: Received block 383/387
[00:05:51.452,209] <inf> golioth_fw_update: Received block 384/387
[00:05:51.932,342] <inf> golioth_fw_update: Received block 385/387
[00:05:52.355,316] <inf> golioth_fw_update: Received block 386/387
[00:05:52.655,181] <inf> golioth_fw_update: Received block 387/387
[00:05:52.655,639] <inf> golioth_fw_update: Successfully downloaded 396299 bytes in 162864 ms
[00:05:52.655,639] <inf> golioth_fw_update: State = Downloaded
[00:05:53.000,183] <inf> golioth_fw_update: State = Updating
[00:05:53.310,546] <inf> golioth_fw_update: Rebooting into new image in 5 seconds
[00:05:54.310,668] <inf> golioth_fw_update: Rebooting into new image in 4 seconds
[00:05:55.310,821] <inf> golioth_fw_update: Rebooting into new image in 3 seconds
[00:05:56.310,974] <inf> golioth_fw_update: Rebooting into new image in 2 seconds
[00:05:57.311,126] <inf> golioth_fw_update: Rebooting into new image in 1 seconds
uart:~$ All pins have been configured as non-secure
Booting TF-M v2.1.1-ncs4-2
[Sec Thread] Secure image initializing!

[00:00:00.256,988] <inf> lis2dh: fs=2, odr=0x4 lp_en=0x0 scale=9576
*** Booting My Application v1.0.1 - unknown commit ***
*** Using nRF Connect SDK v3.0.2-89ba1294ac9b ***
*** Using Zephyr OS v4.0.99-f791c49f492c ***
*** Golioth Firmware SDK v0.19.0 ***
[00:00:00.600,402] <inf> golioth_settings_autoload: Initializing settings subsystem
[00:00:00.606,231] <inf> fs_nvs: 2 Sectors of 4096 bytes
[00:00:00.606,231] <inf> fs_nvs: alloc wra: 0, fb8
[00:00:00.606,231] <inf> fs_nvs: data wra: 0, 74
[00:00:00.606,292] <inf> golioth_settings_autoload: Loading settings
[00:00:00.606,506] <dbg> fw_update_sample: main: Start FW Update sample
[00:00:00.606,536] <inf> golioth_samples: Bringing up network interface
[00:00:00.606,536] <inf> golioth_samples: Waiting to obtain IP address
[00:00:05.468,170] <inf> golioth_mbox: Mbox created, bufsize: 1320, num_items: 10, item_size: 120
[00:00:05.469,665] <inf> golioth_fw_update: Current firmware version: main - 1.0.1
[00:00:05.470,306] <inf> golioth_fw_update: Waiting for golioth client to connect before cancelling rollback
[00:00:06.586,456] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:00:06.586,761] <inf> fw_update_sample: Golioth client connected
[00:00:06.586,853] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:00:07.470,489] <inf> golioth_fw_update: Firmware updated successfully!
[00:00:07.778,533] <inf> golioth_fw_update: State = Idle
[00:00:08.116,424] <inf> golioth_fw_update: Waiting to receive OTA manifest
[00:00:08.116,455] <inf> golioth_fw_update: State = Idle
[00:00:08.418,487] <inf> golioth_fw_update: Received OTA manifest
[00:00:08.418,548] <inf> golioth_fw_update: Current version = 1.0.1, Target version = 1.0.1
[00:00:08.418,579] <inf> golioth_fw_update: Current version matches target version.
[00:00:08.418,579] <inf> golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
[00:00:08.418,579] <inf> golioth_fw_update: State = Idle

Will now move on to testing with NCS v3.0.1.

1 Like

@hasheddan tested against NCS 3.0.1 and is getting the same behavior.

I am moving this issue to [email protected] so that I can send over the project files for your review.

Thanks

1 Like

@rajeev sounds good, thanks for testing!

1 Like

@rajeev I know you moved this to email, but if you figure out how to fix this, can you reply back to this thread? (I’m seeing the same error)

@cdwilson the net_coap length overflow issue is still pending (WIP) @hasheddan and team are working on that end to get it fixed. I think the Golioth SDK params, or probably the cloud side, needs to be fixed to overcome this length issue.

Thanks

1 Like

Just adding here that we were able to work with @rajeev to determine that the firmware update issue was unrelated. This log message was added in the newly pinned Zephyr version in v0.19.0 of the Golioth Firmware SDK, so this is unlikely to be related to code changes in the SDK itself, but rather due to the Zephyr version bump. Nevertheless, we are working on tracking down the source of the issue and resolving it in a forthcoming release.

2 Likes

@rajeev @cdwilson providing an update here – @sam was able to track down the source of the issue and has implemented a fix (zephyr: coap: reset block context on first block2 response by sam-golioth · Pull Request #819 · golioth/golioth-firmware-sdk · GitHub). This will be included in an upcoming v0.19.1 release. Thanks again for the reports here!

2 Likes

@hasheddan thanks for getting the fix out. Will continue testing and report if any other bugs might appear.

1 Like

We just released v0.19.1 of the Golioth Firmware SDK, which includes this fix. I’ll close out this topic as resolved, but please let us know if you experience any further issues. Thanks!

1 Like