Description
This issue occurs when the golioth_client_stop(client); function is called after a connection attempt has timed out. The function gets stuck in while loop, causing a thread lock that is eventually reset by the Watchdog Timer (WDT). The last log message observed is "Attempting to stop client".
Steps to Reproduce
I need some guide here, how can I reproduce a connection timeout with/without changing client parameters?
I don’t want to disconnect LTE, also not reduce connection timeout as normally the connection works in 2-5 seconds and keeping the timeout less than 3 seconds lead to other issues of getting the connection after trying to stop the client at timeout which was already reported back in time. However I have attached the logs of last night when this issue happened.
Expected Behavior
The golioth_client_stop() function should successfully exit the while loop, shut down the client, and return GOLIOTH_OK as it does in normal scenerio and print the log msg after golioth_client_stop(client);
static void stop() {
LOG_INF("Golioth client stopping");
golioth_client_stop(client);
LOG_INF("Golioth client stopped");
}
Actual Behavior
LOG_INF("Golioth client stopped");, this msgs is never printed and the firmware is rebooted by watchdog timer after the wdt timeout.
Impact
The device stops responding and is reset by WDT.
Environment
The hardware is nrf9160 based custom board working completely fine in normal cases.
Golioth sdk: 0.15.0
NCS: 2.7.0
zephyr os: v3.6.99-ncs2 (nordic fork)
Connection timeout is set to 30 seconds.
Logs and Console Output
Normal output when everything works fine:
[02:50:01.262,512] <inf> buffer: Publishing 1 datapoint
[02:50:01.615,203] <inf> rf_manager: RRC mode: connected
[02:50:14.018,310] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[02:50:14.018,341] <inf> golioth_client: Golioth client connected
[02:50:14.020,233] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[02:50:15.240,844] <inf> golioth_settings: Setting ACCELERATION_OBA_INTERRUPT_THRESHOLD = 2.000000
[02:50:15.343,231] <inf> golioth_rpc: RPC observation established
[02:50:15.413,024] <inf> golioth_fw_update: Received OTA manifest
[02:50:15.413,024] <inf> golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
[02:50:15.413,024] <inf> golioth_fw_update: Waiting to receive OTA manifest
[02:50:15.624,023] <inf> golioth_uplink: Successfully published data
[02:50:15.624,145] <inf> golioth_dtrqq: All datatransfers completed
[02:50:17.040,100] <inf> golioth_dtrqq: Disconnecting provider
[02:50:17.040,130] <wrn> golioth_client: Golioth client stopping
[02:50:17.040,161] <inf> golioth_coap_client_zephyr: Attempting to stop client
[02:50:17.040,252] <inf> golioth_coap_client_zephyr: Stop request
[02:50:17.040,252] <inf> golioth_coap_client_zephyr: Ending session
[02:50:17.040,252] <inf> golioth_client: Golioth client disconnected
[02:50:18.140,930] <wrn> golioth_client: Golioth client stopped
[02:50:30.015,655] <inf> rf_manager: RRC mode: idle
Logs when it is not working properly:
[02:51:15.262,512] <inf> buffer: Publishing 1 datapoint
[02:51:42.338,897] <inf> rf_manager: RRC mode: idle
[02:51:45.262,786] <wrn> golioth_dtrqq: Connection attempt timed out
[02:51:45.263,061] <wrn> golioth_dtrqq: Queue provider failed to connect
[02:51:45.263,122] <inf> golioth_dtrqq: All datatransfers timed out or cancelled
[02:51:45.263,305] <inf> golioth_dtrqq: Disconnecting provider
[02:51:45.263,305] <wrn> golioth_client: Golioth client stopping
[02:51:45.263,305] <inf> golioth_coap_client_zephyr: Attempting to stop client
[00:00:00.518,157] <inf> LIS2DW12: lis2dw12@19: int-pin is on INT1
[00:00:00.518,218] <inf> LIS2DW12: lis2dw12@19: int on [email protected]