Lots of Client Timeouts (legacy Golioth Zephyr SDK)

I am seeing Golioth client timeouts. Networking is fine as I can ping continuously and Golioth always reconnects immediately. The only recent change is disabling the Golioth logging backend, so I suspect the lack of logging traffic may have something to do with the issue.

[00:04:06.580,000] <dbg> golioth_system: golioth_system_client_main: Event in eventfd
[00:04:06.584,000] <dbg> golioth_system: golioth_system_client_main: Next timeout: 119990
[00:06:06.583,000] <dbg> golioth_system: golioth_system_client_main: Waiting for client to be started
[00:06:06.583,000] <inf> golioth_system: Starting connect
[00:06:07.734,000] <inf> golioth_system: Client connected!
[00:06:07.734,000] <dbg> golioth_system: golioth_system_client_main: Next timeout: 120000
[00:06:07.984,000] <dbg> golioth_system: golioth_system_client_main: Next timeout: 119996
[00:08:07.981,000] <dbg> golioth_system: golioth_system_client_main: Event in eventfd
[00:08:07.985,000] <dbg> golioth_system: golioth_system_client_main: Next timeout: 120000
[00:10:07.986,000] <dbg> golioth_system: golioth_system_client_main: Event in eventfd
[00:10:07.990,000] <dbg> golioth_system: golioth_system_client_main: Next timeout: 119990
[00:12:07.980,000] <dbg> golioth_system: golioth_system_client_main: Event in eventfd
[00:12:07.980,000] <wrn> golioth_system: Receive timeout
[00:12:07.988,000] <dbg> golioth_system: golioth_system_client_main: Waiting for client to be started
[00:12:07.988,000] <inf> golioth_system: Starting connect
[00:12:08.838,000] <inf> golioth_system: Client connected!
[00:12:08.838,000] <dbg> golioth_system: golioth_system_client_main: Next timeout: 120000
[00:12:09.117,000] <dbg> golioth_system: golioth_system_client_main: Next timeout: 119996

I have the RX timeout set to 360 seconds and the ping interval set to 120 seconds.
CONFIG_GOLIOTH_SYSTEM_CLIENT_RX_TIMEOUT_SEC=360 CONFIG_GOLIOTH_SYSTEM_CLIENT_PING_INTERVAL_SEC=120

This is for Zephyr v3.5 and v0.8.0 (4f1538de21b160f7e8993d8820e32e2f9120ca0e) of GitHub - golioth/golioth-zephyr-sdk: Golioth SDK For Zephyr which isn’t scheduled to be updated to the new repo until July timeframe.

Hey @EricNRS,

CONFIG_GOLIOTH_SYSTEM_CLIENT_RX_TIMEOUT_SEC is the receive timeout, after which the connection will be re-established.

CONFIG_GOLIOTH_SYSTEM_CLIENT_PING_INTERVAL_SEC is a periodic interval between consecutive ping messages being sent to Golioth.

So if you disable the Logging backend, there won’t be networking events from the Logging system.

The Zephyr SDK has been depreciated, and there won’t be further support or updates after July 31, 2024. We recommend that you migrate your project to the Firmware SDK using the Migration Guide.

Hi @marko,

Thanks your reply, but the question still remains as to why the Golioth client is timing out. I would think the golioth_ping() having an interval of less than the RX timeout would prevent the client timeout from happening since the system should receive an ACK from the Golioth servers.

Hey @EricNRS, I have created a PR for client timeouts that needs to be approved and merged.

I have tested it with the nRF9160 DK, and was not experiencing client timeouts.

I’ll notify you once it is merged.

Hi @marko, thanks for the fix. I have started testing it and results are promising with no reconnects in the first 30 minutes. I’ll run the test for the rest of the day just to confirm there isn’t another another code path causing the issue.

@marko The fixed worked - the session is staying established based upon the pings on an otherwise idle device with no timeouts for the last 10 hours of operation (the longest it went before was about 4 minutes). Much appreciated.