Golioth_system: Receive timeout

Hi,

I’m testing the lightdb_stream sample program using SparkFun Thing Plus nRF9160 and was running fine for a few hours. But then I got “golioth_system: Receive timeout” warning message and it stopped sending data.

I changed the uplink frequency from 5sec to 30min and repeated the test. This time, it sent the new data after 30min, but then 20 min later I got the same warning message and it won’t recover until I reset the board.

Has anyone experience similar issues? If yes, please help!

Also, is there an example on how to implement a watchdog?

Thanks!

Hey @invosa welcome to the forum!

Do you call out a specific version of the Golioth Zephyr SDK in your manifest? If not, when did you pull the Golioth code into your project?

Chris

Hi @ChrisGammell, thanks!

No, I did not call out a specific version. I just used the standard sample program which I first downloaded on 10/31, but I’ve done west update before compiling it today.

Btw, I reset the board and it’s currently been running for 3hrs. I’ll let you know when it stops again.

Hi invosa! Will you please send the hash of the Golioth SDK? You can do this from the modules/lib/golioth folder as follows:

➜ cd ~/golioth-ncs-workspace/modules/lib/golioth/
➜ git rev-parse HEAD
fa816c5c74848dfdca0827e25a6562a1bc237e94

Hi @mike, here’s the hash of the Golioth SDK: e77f9355307d1177c3b569cbda3ea796f59955a7

After the last reset, so far it’s been running for almost 12hrs with no issue (only sending new values every 30min). I plan to run it for several days.

FYI, it stopped again after 22.5 hrs. See below logs:

[22:00:46.187,683] golioth_lightdb_stream: main: Sending temperature 22.000000
[22:00:46.831,207] golioth_lightdb_stream: temperature_push_sync: Temperature successfully sync pushed
[22:00:46.831,207] golioth_lightdb_stream: main: Sending humidity 72.000000
[22:00:47.066,619] golioth_lightdb_stream: humidity_push_sync: Humidity successfully sync pushed
[22:30:47.066,680] golioth_lightdb_stream: main: Sending temperature 22.500000
[22:30:49.357,482] golioth_lightdb_stream: temperature_push_sync: Temperature successfully sync pushed
[22:30:49.357,513] golioth_lightdb_stream: main: Sending humidity 72.500000
[22:30:49.606,811] golioth_lightdb_stream: humidity_push_sync: Humidity successfully sync pushed
[22:50:43.360,839] golioth_system: Receive timeout
[23:00:49.606,872] golioth_lightdb_stream: main: Sending temperature 23.000000

No more messages after that and the device went offline.

Thanks for reporting and sending over the hash. I’ve checkout out the same version of Golioth and have my Sparkfun Thing Plus nrf9160 running the lightdb_stream sample to see if I can replicate the same behavior you are seeing.

As a workaround while we troubleshoot this, the watchdog timer you mentioned isn’t a bad idea. Zephyr has a watchdog sample that you can use as an implementation guide. I checked and this board already has a watchdog0 node in the Devicetree.

Thank you @mike, I’ll try the watchdog timer. Btw, the board didn’t reset automatically after flashing so I wonder if the reset pin was set correctly. Where is this being defined?

FYI, the issue is intermittent. Over the weekend it was running for more than 29 hrs before I reset it accidentally. But today it stopped again after ~16 hrs.

uart:~$ [15:37:23.330,383] golioth_lightdb_stream: main: Sending temperature 22.000000
uart:~$ [15:37:23.602,203] golioth_lightdb_stream: temperature_push_sync: Temperature successfully sync pushed
uart:~$ [15:37:23.602,233] golioth_lightdb_stream: main: Sending humidity 62.000000
uart:~$ [15:37:23.843,231] golioth_lightdb_stream: humidity_push_sync: Humidity successfully sync pushed
uart:~$ uart:~$ [15:40:25.862,945] golioth_system: Receive timeout
uart:~$ uart:~$ [16:07:23.843,292] golioth_lightdb_stream: main: Sending temperature 22.500000

So far I’m 22 hours in and haven’t encountered the issue yet.

[22:15:11.331,420] <dbg> golioth_lightdb_stream: main: Sending temperature 24.000000
[22:15:11.977,966] <dbg> golioth_lightdb_stream: temperature_push_sync: Temperature successfully pushed
[22:15:16.978,027] <dbg> golioth_lightdb_stream: main: Sending temperature 24.500000
[22:15:18.063,171] <dbg> golioth_lightdb_stream: temperature_push_handler: Temperature successfully pushed
[22:15:21.978,759] <dbg> golioth_lightdb_stream: main: Sending temperature 25.000000

Do you get any other messages after the Receive Timeout? I would expect the stream to keep trying to send data and spitting out failure messages.

Regarding the reset pin: I also have the issue that the board needs manual reset after flashing. I don’t know if this is a problem with the board mapping or not. The workaround I’ve used is to flash it using:

west flash -r jlink --reset-after-load

I’ve tried putting the reset-after-load flag into the board_runner_args in board.cmake but haven’t have success with that yet.

Mine is still running without issue. It did reset the timestamp after 36 hours and 24 minutes but it didn’t hang or reboot:

[36:24:07.232,482] <dbg> golioth_lightdb_stream: main: Sending temperature 26.000000                                                                                                           
[36:24:07.646,057] <dbg> golioth_lightdb_stream: temperature_push_sync: Temperature successfully pushed                                                                                        
[36:24:12.646,148] <dbg> golioth_lightdb_stream: main: Sending temperature 26.500000                                                                                                           
[36:24:13.822,265] <dbg> golioth_lightdb_stream: temperature_push_handler: Temperature successfully pushed                                                                                     
[36:24:17.646,881] <dbg> golioth_lightdb_stream: main: Sending temperature 27.000000                                                                                                           
[36:24:18.294,433] <dbg> golioth_lightdb_stream: temperature_push_sync: Temperature successfully pushed                                                                                        
[36:24:23.294,525] <dbg> golioth_lightdb_stream: main: Sending temperature 27.500000                                                                                                           
[36:24:24.362,640] <dbg> golioth_lightdb_stream: temperature_push_handler: Temperature successfully pushed                                                                                     
[36:24:28.295,257] <dbg> golioth_lightdb_stream: main: Sending temperature 28.000000                                                                                                           
[36:24:28.837,829] <dbg> golioth_lightdb_stream: temperature_push_sync: Temperature successfully pushed                                                                                        
[00:00:01.837,890] <dbg> golioth_lightdb_stream: main: Sending temperature 28.500000                                                                                                           
[00:00:02.916,046] <dbg> golioth_lightdb_stream: temperature_push_handler: Temperature successfully pushed                                                                                     
[00:00:06.838,623] <dbg> golioth_lightdb_stream: main: Sending temperature 29.000000                                                                                                           
[00:00:07.396,209] <dbg> golioth_lightdb_stream: temperature_push_sync: Temperature successfully pushed                                                                                        
[00:00:12.396,301] <dbg> golioth_lightdb_stream: main: Sending temperature 29.500000                                                                                                           
[00:00:13.668,426] <dbg> golioth_lightdb_stream: temperature_push_handler: Temperature successfully pushed                                                                                     
[00:00:17.397,033] <dbg> golioth_lightdb_stream: main: Sending temperature 20.000000                                                                                                           
[00:00:18.143,585] <dbg> golioth_lightdb_stream: temperature_push_sync: Temperature successfully pushed   

Okay, good news, I was able to replicate the issue and we are working on it now. I’ll post back when I have more information:

[12:02:41.496,551] <dbg> golioth_lightdb_stream: main: Sending temperature 26.500000                                                                                                                                 
[12:02:42.577,697] <dbg> golioth_lightdb_stream: temperature_push_handler: Temperature successfully pushed                                                                                                           
[12:02:46.497,283] <dbg> golioth_lightdb_stream: main: Sending temperature 27.000000                                                                                                                                 
[12:02:47.055,847] <dbg> golioth_lightdb_stream: temperature_push_sync: Temperature successfully pushed                                                                                                              
[12:02:52.055,938] <dbg> golioth_lightdb_stream: main: Sending temperature 27.500000                                                                                                                                 
[12:02:53.226,074] <dbg> golioth_lightdb_stream: temperature_push_handler: Temperature successfully pushed                                                                                                           
[12:02:57.056,671] <dbg> golioth_lightdb_stream: main: Sending temperature 28.000000                                                                                                                                 
[12:02:59.563,751] <wrn> golioth: Resending request 0x20020e78 (reply 0x20020eac) (retries 2)                                                                                                                        
[12:03:00.587,280] <dbg> golioth_lightdb_stream: temperature_push_sync: Temperature successfully pushed                                                                                                              
[12:03:05.587,402] <dbg> golioth_lightdb_stream: main: Sending temperature 28.500000                                                                                                                                 
[12:03:07.803,833] <wrn> golioth: Resending request 0x20020e78 (reply 0x20020eac) (retries 2)                                                                                                                        
[12:03:10.588,134] <dbg> golioth_lightdb_stream: main: Sending temperature 29.000000                                                                                                                                 
[12:03:12.208,221] <wrn> golioth: Resending request 0x20020e78 (reply 0x20020eac) (retries 1)                                                                                                                        
[12:03:12.784,606] <wrn> golioth: Resending request 0x20020ed8 (reply 0x20020f0c) (retries 2)                                                                                                                        
[12:03:14.407,836] <dbg> golioth_lightdb_stream: temperature_push_handler: Temperature successfully pushed                                                                                                           
[12:03:15.368,804] <dbg> golioth_lightdb_stream: temperature_push_sync: Temperature successfully pushed                                                                                                              
[12:03:20.368,896] <dbg> golioth_lightdb_stream: main: Sending temperature 29.500000                                                                                                                                 
[12:03:22.794,311] <wrn> golioth: Resending request 0x20020f70 (reply 0x20020fa4) (retries 2)                                                                                                                        
[12:03:25.369,628] <dbg> golioth_lightdb_stream: main: Sending temperature 20.000000                                                                                                                                 
[12:03:27.624,145] <wrn> golioth: Resending request 0x20020f70 (reply 0x20020fa4) (retries 1)                                                                                                                        
[12:03:28.343,505] <wrn> golioth: Resending request 0x20020e78 (reply 0x20020eac) (retries 2)                                                                                                                        
[12:03:34.262,664] <wrn> golioth: Resending request 0x20020e78 (reply 0x20020eac) (retries 1)                                                                                                                        
[12:03:37.275,573] <wrn> golioth: Resending request 0x20020f70 (reply 0x20020fa4) (retries 0)                                                                                                                        
[12:03:46.119,018] <wrn> golioth: Resending request 0x20020e78 (reply 0x20020eac) (retries 0)                                                                                                                        
[12:03:54.888,458] <wrn> golioth_system: Receive timeout                                                                                                                                                             
uart:~$

Hi @mike, I’m glad that you’re able to replicate the issue. On my setup, I get one last attempt of sending data (see below) after the Receive Timeout before it got hang. No failure messages after that. Is it the same in your setup?

uart:~$ [15:37:23.602,233] golioth_lightdb_stream: main: Sending humidity 62.000000
uart:~$ [15:37:23.843,231] golioth_lightdb_stream: humidity_push_sync: Humidity successfully sync pushed
uart:~$ uart:~$ [15:40:25.862,945] golioth_system: Receive timeout
uart:~$ uart:~$ [16:07:23.843,292] golioth_lightdb_stream: main: Sending temperature 22.500000

Likely the loop was still running when the timeout occurred and you get one more send attempt before the synchronous call halts main thread flow.

To us the Receive Timeout appears to be the correct failure mode, but the system client should try to reconnect automatically, which isn’t happening. Thanks for reporting this, it will be nice to get it fixed!