OTA Update dropping

Hi all,
This may be a silly one - I am new to Zephyr and Golioth but just looking for some help to point me in the right direction!

I have a device that we created in-house that uses an Actinius Icarus SOM with an nRF9160 onboard. We are currently using a SoftSIM from Onomondo and are looking at the OTA update functionality, but I am running into an issue where the download drops off and the modem goes to RRC Idle/Connected but doesn't resume or recover the update.

After a few minutes it times out and I get -128 and -115 errors. The Golioth client then disconnects and reconnects, but it won't resume or restart the OTA.
I had a feeling it may be signal, so I have brought the device home with me, but the signal seems to stay around the same values during the update and when the issue occurs, so I am a little lost.
I am also getting the same behaviour using the Icarus SOM dev kit.

[00:05:03.647,583] <inf> fw_block_processor: Downloading block index 203 (204/372)
[00:05:04.352,630] <inf> fw_block_processor: Downloading block index 204 (205/372)
[00:05:05.087,677] <inf> fw_block_processor: Downloading block index 205 (206/372)
[00:05:05.824,676] <inf> fw_block_processor: Downloading block index 206 (207/372)
[00:05:06.527,709] <inf> fw_block_processor: Downloading block index 207 (208/372)
[00:05:07.230,743] <inf> fw_block_processor: Downloading block index 208 (209/372)
[00:05:07.947,784] <inf> fw_block_processor: Downloading block index 209 (210/372)
[00:05:08.670,806] <inf> fw_block_processor: Downloading block index 210 (211/372)
[00:05:09.385,833] <inf> fw_block_processor: Downloading block index 211 (212/372)
[00:05:10.110,870] <inf> fw_block_processor: Downloading block index 212 (213/372)
> AT+CESQ
+CESQ: 99,99,255,255,24,48
OK
[00:05:10.827,911] <inf> fw_block_processor: Downloading block index 213 (214/372)
[00:05:11.550,933] <inf> fw_block_processor: Downloading block index 214 (215/372)
[00:05:12.265,930] <inf> fw_block_processor: Downloading block index 215 (216/372)
[00:05:12.992,980] <inf> fw_block_processor: Downloading block index 216 (217/372)
> AT+CESQ
[00:05:15.201,080] <wrn> golioth_coap_client: Resending request 0x2003f7d0 (reply 0x2003f808) (retries 2)
+CESQ: 99,99,255,255,28,48
OK
[00:05:17.568,176] <inf> fw_block_processor: Downloading block index 217 (218/372)
[00:05:18.271,209] <inf> fw_block_processor: Downloading block index 218 (219/372)
> AT+CESQ
+CESQ: 99,99,255,255,27,48
OK
[00:05:21.169,250] <wrn> golioth_coap_client: Resending request 0x2003f7d0 (reply 0x2003f808) (retries 2)
+CSCON: 0
[00:05:23.622,253] <inf> softsim_module: RRC: Idle
[00:05:26.963,714] <wrn> golioth_coap_client: Resending request 0x2003f7d0 (reply 0x2003f808) (retries 1)
> AT+CESQ
+CESQ: 99,99,255,255,21,49
OK
[00:05:38.551,879] <wrn> golioth_coap_client: Resending request 0x2003f7d0 (reply 0x2003f808) (retries 0)
> AT+CESQ
+CESQ: 99,99,255,255,21,49
OK
> AT+CESQ
+CESQ: 99,99,255,255,22,48
OK
> AT+CESQ
+CESQ: 99,99,255,255,22,48
OK
[00:08:11.458,953] <wrn> golioth_coap_client: Packet 0x2003f7d0 (reply 0x2003f808) was not replied to
[00:08:11.470,672] <inf> fw_block_processor: Downloading block index 219 (220/372)
+CSCON: 1
[00:08:12.310,791] <inf> softsim_module: RRC: Connected
[00:08:12.535,552] <wrn> golioth_coap_client_zephyr: Receive timeout
[00:08:12.547,271] <inf> fw_block_processor: Downloading block index 220 (221/372)
[00:08:12.551,971] <wrn> golioth_coap_client_zephyr: Receive timeout
[00:08:12.552,062] <err> golioth_coap_client_zephyr: Failed to receive: -128
[00:08:12.552,490] <err> golioth_coap_client_zephyr: Failed to schedule CoAP GET BLOCK: -115
[00:08:12.552,520] <inf> golioth_coap_client_zephyr: Ending session
[00:08:12.552,551] <inf> softsim_module: Golioth client disconnected
[00:08:16.183,105] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:08:17.683,197] <inf> softsim_module: Connected to Golioth
[00:08:17.683,288] <inf> golioth_fw_update: Current firmware version: main - 0.2.0
[00:08:17.683,319] <inf> main_module: Connected for updates
[00:08:17.683,349] <inf> main_module: Golioth client connected
[00:08:17.683,441] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
+CSCON: 0
[00:08:25.194,885] <inf> softsim_module: RRC: Idle
+CSCON: 1
[00:08:25.751,342] <inf> softsim_module: RRC: Connected
+CSCON: 0
[00:08:31.470,153] <inf> softsim_module: RRC: Idle
+CSCON: 1
[00:08:32.151,611] <inf> softsim_module: RRC: Connected
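
One way to see where the download stalls in a capture like the one above is to compare the timestamps of consecutive "Downloading block index" lines. The following is a minimal host-side sketch, not part of the Golioth SDK; the function names and the 5-second threshold are assumptions chosen for illustration, and it assumes the `[hh:mm:ss.mmm,uuu]` timestamp format shown in the log.

```python
import re

# Matches block-download lines with timestamps such as "[00:05:12.992,980]"
# (hours:minutes:seconds.milliseconds,microseconds).
BLOCK_LINE = re.compile(
    r"\[(\d+):(\d+):(\d+)\.(\d+),(\d+)\].*Downloading block index (\d+)"
)


def parse_block_times(log_text):
    """Return a list of (block_index, microseconds_since_boot) tuples."""
    result = []
    for line in log_text.splitlines():
        m = BLOCK_LINE.search(line)
        if m is None:
            continue  # not a block-download line (AT output, warnings, ...)
        h, mi, s, ms, us, idx = (int(g) for g in m.groups())
        t_us = ((h * 3600 + mi * 60 + s) * 1000 + ms) * 1000 + us
        result.append((idx, t_us))
    return result


def find_stalls(block_times, threshold_s=5.0):
    """Return (prev_index, next_index, gap_seconds) for gaps above threshold_s."""
    stalls = []
    for (i0, t0), (i1, t1) in zip(block_times, block_times[1:]):
        gap_us = t1 - t0
        if gap_us > threshold_s * 1_000_000:
            stalls.append((i0, i1, gap_us / 1_000_000))
    return stalls
```

Run against the capture above, this flags the gap between block index 218 (00:05:18.271) and block index 219 (00:08:11.470), which is roughly 173 seconds and lines up with the CoAP resend warnings and the RRC Idle transition.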

Hi Jack,

Thanks for the report! I agree it looks like cell network instability is causing issues here. Have you ever seen the update succeed? My concern with restarting the update is it would likely fail again, and use up a lot of data on your cell plan in the process.

OTA firmware updates in general are an area of current focus for us, and we expect to make several improvements over the coming months, including some that should make updates more reliable on unstable networks.

Hi Sam, thanks for your reply.
Yeah, I thought it may be. I have seen it succeed a few times now; I was just wondering whether the errors pointed at me making a mistake somewhere. My thinking on restarting was that it usually attempted to start again from the last block once it got a connection again, but it has begun to not retry at all and just go idle.

We have got a unit going out to site next week which will report its signal quality/strength for me to monitor.
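
For monitoring, the raw `+CESQ` values in the log above can be converted to physical units. The sketch below assumes the standard 3GPP TS 27.007 field order (`rxlev, ber, rscp, ecno, rsrq, rsrp`) and index mapping, where 255 means "not known or not detectable"; the function names are hypothetical and this is a host-side helper, not a Golioth or nRF Connect SDK API. Each index is returned as a `(low, high)` range, with `None` marking an open end.

```python
def rsrq_range_db(idx):
    """Map a +CESQ RSRQ index (0..34, 255 = unknown) to a (low, high) range in dB."""
    if idx == 255:
        return None
    if not 0 <= idx <= 34:
        raise ValueError(f"RSRQ index out of range: {idx}")
    low = None if idx == 0 else -20.0 + 0.5 * idx    # index 1 starts at -19.5 dB
    high = None if idx == 34 else -19.5 + 0.5 * idx  # index 34 means >= -3 dB
    return (low, high)


def rsrp_range_dbm(idx):
    """Map a +CESQ RSRP index (0..97, 255 = unknown) to a (low, high) range in dBm."""
    if idx == 255:
        return None
    if not 0 <= idx <= 97:
        raise ValueError(f"RSRP index out of range: {idx}")
    low = None if idx == 0 else idx - 141    # index 1 starts at -140 dBm
    high = None if idx == 97 else idx - 140  # index 97 means >= -44 dBm
    return (low, high)


def decode_cesq(response):
    """Decode the LTE fields of a '+CESQ: ...' response line."""
    fields = response.split(":", 1)[1].split(",")
    _rxlev, _ber, _rscp, _ecno, rsrq, rsrp = (int(f) for f in fields)
    return {"rsrq_db": rsrq_range_db(rsrq), "rsrp_dbm": rsrp_range_dbm(rsrp)}
```

With this mapping, `+CESQ: 99,99,255,255,24,48` decodes to an RSRQ between -8.0 and -7.5 dB and an RSRP between -93 and -92 dBm. The readings taken during the stall (RSRQ index 21 to 22, RSRP index 48 to 49) are within a couple of dB of the readings taken while blocks were still downloading, which matches the observation that the signal stays around the same values.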

Improvements for unstable networks would be great! Although it’s hoped it can be sorted, O2 blocking third-party roaming on LTE-M in the UK has not been very helpful.