Possible bug in CoAP client connection behaviour

Description

Hello everyone,

I think I might have found a bug in the golioth_connect_host_port function in the coap_client_zephyr.c file. As I understand it, the function is supposed to resolve the IP address of the Golioth server, create a network socket, connect that socket to the resolved address, and then free the address info that was just fetched with zsock_freeaddrinfo(addrs). This should mean that the IP address is resolved anew on a subsequent connection attempt, right?
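For reference, the resolve, connect, free sequence described above can be sketched in plain POSIX C. This is a minimal illustration, not the SDK's code: connect_host_port is a hypothetical helper, and it uses the standard getaddrinfo/freeaddrinfo calls, which I assume the zsock_-prefixed Zephyr functions mirror. The point it shows is that freeaddrinfo only releases the result list held by the application; it says nothing about any cache kept by the resolver underneath.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not the SDK function): resolve host, connect a
 * UDP socket to the first usable address, then free the result list.
 * Returns the socket descriptor, or -1 on failure. */
int connect_host_port(const char *host, const char *port)
{
    struct addrinfo hints;
    struct addrinfo *addrs = NULL;
    int sock = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM; /* CoAP over DTLS runs on UDP */

    int err = getaddrinfo(host, port, &hints, &addrs);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
        return -1;
    }

    for (struct addrinfo *ai = addrs; ai != NULL; ai = ai->ai_next) {
        sock = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (sock < 0) {
            continue;
        }
        if (connect(sock, ai->ai_addr, ai->ai_addrlen) == 0) {
            break; /* connected; keep this socket */
        }
        close(sock);
        sock = -1;
    }

    /* Frees only the application-side list returned by getaddrinfo.
     * Any caching done below this API is not affected by this call. */
    freeaddrinfo(addrs);
    return sock;
}
```

Calling connect_host_port("coap.golioth.io", "5684") twice goes through getaddrinfo both times (5684 is the registered port for CoAP over DTLS). Whether the second call actually produces a DNS query on the wire depends on the resolver underneath.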

However, when I reconnect within the TTL of the DNS response, this does not happen: the device tries to connect directly to the previously resolved IP address. I suspect the zsock_close and zsock_freeaddrinfo functions do not actually discard the DNS response in the modem (in fact I couldn’t find an implementation of zsock_close at all).

Is there any way to discard this information via the Zephyr network interface every time as soon as the connection is established? We had connection problems with our devices because the SIM cards we are using have a firewall feature, and we do not want to whitelist the Golioth server IP manually as it might change in the future. Our internet provider told us that they cannot simply change the TTL of the DNS response and advised us to manually discard it.

If this is something that cannot be solved on the Golioth level, please let me know so I can start looking into possible solutions in the lower layers of Zephyr and the Nordic SDK.

Best regards
Peter

Steps to Reproduce

Connect to Golioth twice within the TTL of the DNS response of the first connection.

Expected Behavior

The DNS is queried again on the second connection.

Actual Behavior

The DNS is not queried again and the connection is established via the IP address given by the previous DNS response.

Impact

Not every connection attempt to Golioth succeeds, which results in delayed data transfers.

Environment

We are using Golioth SDK 0.15.0, Zephyr 3.7.0 and NCS 2.5.2.

Logs and Console Output

Wireshark output on first connection:

51 4.122894 10.179.105.72 10.105.16.254 DNS 73 Standard query 0x6a82 A coap.golioth.io
56 4.329010 10.105.16.254 10.179.105.72 DNS 89 Standard query response 0x6a82 A coap.golioth.io A 34.135.90.112
57 4.350067 10.179.105.72 34.135.90.112 DTLSv1.2 179 Client Hello (SNI=coap.golioth.io)
58 4.670990 34.135.90.112 10.179.105.72 DTLSv1.2 88 Hello Verify Request
59 4.672058 10.179.105.72 34.135.90.112 DTLSv1.2 199 Client Hello (SNI=coap.golioth.io)
60 4.864929 34.135.90.112 10.179.105.72 DTLSv1.2 116 Server Hello

followed by certificate exchange and application data.

Wireshark output on subsequent connection:

309 35.980377 10.179.105.74 34.135.90.112 DTLS 179 Client Hello (SNI=coap.golioth.io)
310 37.081298 10.179.105.74 34.135.90.112 DTLS 179 Client Hello (SNI=coap.golioth.io)

and 3 more similar lines before the connection attempt is timed out by our firmware. As you can see, the client does not attempt a DNS query first.

Attempts to Resolve

A temporary fix is manually whitelisting the Golioth server IP, but we would prefer a more stable solution.

Hey @kolozspe,

Thanks for your detailed report and for reaching out again on this issue. Based on our investigation, this behavior is expected due to the DNS resolution being cached within the modem for the duration of the TTL. At the moment, we don’t have a way to manually override or purge the DNS cache at the modem level.
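To illustrate why the second connection skips the query, here is a minimal model of a TTL-based cache in C. It is only a sketch: the modem's actual implementation is not public to us, and every name here (dns_cache_entry, cached_resolve, query_upstream) as well as the 60-second TTL in the usage note is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative model only; all names are hypothetical. */
struct dns_cache_entry {
    char host[64];
    uint32_t addr;       /* IPv4 address in host byte order */
    uint32_t expires_at; /* seconds, same clock as `now` */
    bool valid;
};

/* Counts how many lookups actually went out to the DNS server. */
static int upstream_queries;

/* Stand-in for a real DNS query; returns 34.135.90.112. */
static uint32_t query_upstream(const char *host)
{
    (void)host;
    upstream_queries++;
    return 0x22875A70u;
}

/* Returns the cached address while the entry is still within its TTL,
 * otherwise queries upstream and refreshes the entry. */
uint32_t cached_resolve(struct dns_cache_entry *e, const char *host,
                        uint32_t now, uint32_t ttl)
{
    if (e->valid && now < e->expires_at && strcmp(e->host, host) == 0) {
        return e->addr; /* cache hit: no query leaves the device */
    }

    e->addr = query_upstream(host);
    strncpy(e->host, host, sizeof(e->host) - 1);
    e->host[sizeof(e->host) - 1] = '\0';
    e->expires_at = now + ttl;
    e->valid = true;
    return e->addr;
}
```

With an assumed TTL of 60 seconds, a lookup at t=0 and another at t=30 produce a single upstream query, which matches the capture where the second Client Hello goes out without a preceding DNS request. A lookup at t=61 would query again.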

However, we are planning to introduce support within Zephyr RTOS to avoid offloading DNS resolution to the modem, as this is a limitation of the operating system itself. This would provide tighter control over DNS handling in the future, allowing for more flexibility in cases like yours, where a fresh DNS query on each connection attempt is preferred.

In the meantime, if your provider cannot adjust the TTL, you might need to explore solutions at the Nordic SDK level to force a fresh resolution. If you find a workaround that works well in your setup, we’d be happy to hear about it!

Hello Marko,

Thanks for the swift reply. I already assumed it would be like this. With some more research, I managed to find this thread in the Nordic forums: nRF9160 DNS cache & TTL - Nordic Q&A - Nordic DevZone

Unfortunately, it appears that there is no way to manually discard the DNS information even with the lowest-level drivers available, at least not on the chip we are using. We will find another solution though, I’m sure of it :)

Best regards
Peter