We have observed connectivity issues on Nordic Semiconductor’s nRF9160 SiP, a popular chipset for many Golioth customers, in which devices appear unable to connect to the Golioth cloud platform. We suspect the same issues may be present on sibling products, such as the nRF9161 and the nRF9151. We are working with Nordic to identify a long-term solution, but in the meantime this post describes the problem and enumerates the available mitigation strategies.
The issue manifests as a failure to perform DNS resolution for coap.golioth.io. When encountered, the following logging output will typically be observed.
[00:01:40.747,802] <err> golioth_coap_client_zephyr: Fail to get address (coap.golioth.io 5684) -2
[00:01:40.747,802] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:40.747,802] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:45.756,652] <err> golioth_coap_client_zephyr: Fail to get address (coap.golioth.io 5684) -2
[00:01:45.756,683] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:45.756,683] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:50.765,533] <err> golioth_coap_client_zephyr: Fail to get address (coap.golioth.io 5684) -2
[00:01:50.765,563] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:50.765,563] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
We have only observed this issue when connecting to NB-IoT networks. By default, Nordic’s modem firmware appears to use extended protocol configuration options (ePCO) when connecting to NB-IoT networks, and protocol configuration options (PCO) when connecting to LTE-M networks. NB-IoT networks are required to support ePCO according to 3GPP release 13.
However, some NB-IoT networks do not support ePCO, so requests made in the configuration options go unanswered in the LTE attach accept response. Importantly, PCO / ePCO is used to configure the primary and secondary DNS servers that the modem uses when DNS is offloaded. It is also important to note that DNS is offloaded to the modem even when TLS/DTLS sockets have higher priority than offloaded sockets. Because the modem does not have a fallback DNS server address, all DNS resolution requests fail and the device is unable to connect to Golioth.
We have identified two potential workarounds for this behavior. The first is setting CONFIG_PDN_LEGACY_PCO=y. This will cause the modem to use PCO in all cases. While this may address the immediate issue, we have some concerns about recommending this as a general solution given that some networks may only support ePCO.
The second option, which we have found to work reliably thus far in our testing, is using nrf_setdnsaddr, which sets the secondary DNS server address (see example here). The secondary server should be used when a primary is not provided by the network operator (i.e. via PCO / ePCO) or when resolution via the primary server fails. Google (8.8.8.8) and Cloudflare (1.1.1.1) public DNS servers are common defaults.
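As a rough illustration of the second workaround, the sketch below wraps nrf_setdnsaddr in a small helper. The helper name set_fallback_dns is hypothetical and not part of any SDK, and the sketch assumes the modem library has already been initialized before the helper is called. The #else branch contains host-only stand-ins (including a placeholder value for NRF_AF_INET) so the sketch compiles off-target; on a real nRF91 build only the nrf_socket.h include is used.

```c
#include <stdint.h>
#include <stdio.h>

#if defined(CONFIG_NRF_MODEM_LIB)
#include <nrf_socket.h> /* real nrf_setdnsaddr() and nrf_inet_pton() */
#else
/* Host-only stand-ins so the sketch compiles off-target.
 * They are NOT part of the workaround and the values are placeholders. */
typedef uint32_t nrf_socklen_t;
#define NRF_AF_INET 1
struct nrf_in_addr { uint32_t s_addr; };

static int nrf_inet_pton(int af, const char *src, void *dst)
{
    unsigned int a, b, c, d;
    char tail;
    uint8_t *out = dst;

    /* Accept exactly four dot-separated octets with nothing trailing. */
    if (af != NRF_AF_INET ||
        sscanf(src, "%u.%u.%u.%u%c", &a, &b, &c, &d, &tail) != 4 ||
        a > 255 || b > 255 || c > 255 || d > 255) {
        return 0;
    }
    out[0] = (uint8_t)a;
    out[1] = (uint8_t)b;
    out[2] = (uint8_t)c;
    out[3] = (uint8_t)d;
    return 1;
}

static int nrf_setdnsaddr(int family, const void *in_addr, nrf_socklen_t in_size)
{
    /* Stand-in only validates its arguments; the real call talks to the modem. */
    if (family != NRF_AF_INET || in_addr == NULL ||
        in_size != sizeof(struct nrf_in_addr)) {
        return -1;
    }
    return 0;
}
#endif

/* Hypothetical helper: configure the modem's secondary (fallback) DNS server
 * from a dotted-quad IPv4 string. Returns 0 on success, -1 on failure. */
int set_fallback_dns(const char *ipv4)
{
    struct nrf_in_addr dns;

    /* Convert the text address into the binary form the modem expects. */
    if (nrf_inet_pton(NRF_AF_INET, ipv4, &dns) != 1) {
        return -1;
    }

    return nrf_setdnsaddr(NRF_AF_INET, &dns, sizeof(dns));
}
```

An application might call set_fallback_dns("1.1.1.1") or set_fallback_dns("8.8.8.8") once during startup, before the first connection attempt, and log a warning if it returns a non-zero value.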
We will continue working with Nordic and popular carriers utilized by Golioth customers to provide more clarity and consistency around this behavior. We will make sure to update this post with any further information.