Golioth_coap_client_zephyr failes to connect with error -22

Hello,

we are using the Golioth Firmware SDK v0.10.0 on a nRF9160 device, along with the nRF Connect SDK v2.5.1.
The device is able to connect to the Network (what takes ~75 seconds) and seems to resolve DNS without a problem, but than fails to connect to the Golioth Cloud with error -22.
The log repeatedly prints the following messages after connecting:

[00:01:14.061,462] <err> golioth_coap_client_zephyr: Failed to send empty CoAP message: -22
[00:01:14.061,828] <err> golioth_coap_client_zephyr: Failed to connect: -22
[00:01:14.061,828] <wrn> golioth_coap_client_zephyr: Failed to connect: -22

This happens with the hello sample and the logging sample (from golioth-firmware-sdk/examples/zephyr).

With a firmware based on the Golioth Zephyr SDK, I already connected successfully to the Golioth Cloud, using the same hardware.

Any ideas on this issue are welcome.

1 Like

Hi @SenseJo, thank you for this detailed report! While I am not certain this is the issue, if you are using a pre-shared key (PSK) for authentication, it is possible that the PSK ID has exceeded the default max length in the Golioth Firmware SDK. Could you check and see if your PSK ID is longer than 32 characters, and, if so, try to connect again using a shorter PSK ID?

1 Like

Hi @hasheddan, thank you for your response!
Yes, I am using PSK auth, and actually, my PSK ID was about 40 characters long! I now changed it to a 30 character PSK ID.
But unfortunately, this did not make any difference about the error, I’m still getting the same log output.

Maybe it helps if I describe how I build the firmware: I have a west manifest repo with the following west.yml:

west.yml
manifest:
  remotes:
    - name: nrfconnect
      url-base: https://github.com/nrfconnect
    - name: golioth
      url-base: https://github.com/golioth

  projects:
    # Always ensure nRF Connect SDK and the Golioth Firmware SDK versions are
    # compatible.
    - name: sdk-nrf
      remote: nrfconnect
      revision: v2.5.1
      path: nrf
      import:
        name-allowlist:
          - cmsis
          - hal_nordic
          - hal_st
          - mbedtls
          - mbedtls-nrf
          - mcuboot
          - net-tools
          - nrf
          - nrfxlib
          - qcbor
          - segger
          - tfm-mcuboot
          - tinycrypt
          - trusted-firmware-m
          - zcbor
          - zephyr
    - name: golioth-firmware-sdk
      remote: golioth
      revision: v0.10.0
      path: modules/lib/golioth-firmware-sdk
      submodules: true

I am building the hello sample on the terminal:

cd modules/lib/golioth-firmware-sdk/examples/zephyr/hello

west build -b nrf9160dk_nrf9160_ns -p -- -DEXTRA_CONF_FILE="golioth-auth.conf;rtt.conf" -DCONFIG_LOG=y

My Kconfig overlay files look as follows:

rtt.conf
CONFIG_USE_SEGGER_RTT=y
CONFIG_RTT_CONSOLE=y
CONFIG_UART_CONSOLE=n
golioth-auth.conf
CONFIG_GOLIOTH_SAMPLE_PSK_ID="test1@xxxxxx-01-xxx-xx-xxxxxx"
CONFIG_GOLIOTH_SAMPLE_PSK="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

The board I’m using actually isn’t a nRF9160DK, but a custom board. However, the board connects to Golioth with no issues when I use a firmware, e.g. the hello sample again, that is based on the Golioth Zephyr SDK. I just tried it again successfully with Golioth Zephyr SDK v0.8.0, using the same compile command as for the Golioth Firmware SDK hello sample.

@SenseJo thank you for that information! Could you also provide your config you are using with the Zephyr SDK? That will help me compare and reproduce.

@hasheddan Sure.
With the Golioth Zephyr SDK, I’m using the hello sample under modules/lib/golioth/samples/hello, and building using the same command as above:

The rtt.conf is the same as well, golioth-auth.conf differs slightly, and west.yml is different, of course:

golioth-auth.conf

CONFIG_GOLIOTH_SAMPLE_HARDCODED_PSK_ID="test1@xxxxxx-01-xxx-xx-xxxxxx" CONFIG_GOLIOTH_SAMPLE_HARDCODED_PSK="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

west.yml
manifest:
  remotes:
    - name: golioth
      url-base: [email protected]:golioth

  projects:
    # the following is taken from
    # https://github.com/golioth/reference-design-template/blob/main/west.yml
    - name: golioth-zephyr-sdk
      path: modules/lib/golioth
      remote: golioth
      revision: v0.8.0
      west-commands: scripts/west-commands.yml
      import:
        file: west-ncs.yml
        name-allowlist:
          - nrf
          - zephyr
          - cmsis
          - hal_nordic
          - mbedtls
          - mbedtls-nrf
          - mcuboot
          - net-tools
          - nrfxlib
          - qcbor
          - segger
          - tfm-mcuboot
          - tinycrypt
          - trusted-firmware-m
          - zcbor

Thanks @SenseJo! I doubt this is the issue, but it may be worth specifying pristine in your build to ensure that your new PSK ID / PSK is being used. The v0.8.0 Zephyr SDK release does use a different version of the nRF Connect SDK (v2.5.0), but I am also doubtful that would be causing this behavior. Could you try increasing the logging verbosity, particularly in the networking stack to provide additional informatin?

Hi ! I’ve the same issue on my thingy91

prj.conf
# Application

CONFIG_GOLIOTH_SAMPLE_COMMON=y

CONFIG_GOLIOTH_SAMPLE_HARDCODED_CREDENTIALS=y

CONFIG_LOG_BACKEND_GOLIOTH=y

# Golioth Firmware SDK with all dependencies

CONFIG_GOLIOTH_FIRMWARE_SDK=y

CONFIG_GOLIOTH_SAMPLE_PSK_ID="xx@xxxxxx-xxxxxx"

CONFIG_GOLIOTH_SAMPLE_PSK="xxxxxxxxxxxxxxxxxxxxxxxxxxx"

# General config

CONFIG_HEAP_MEM_POOL_SIZE=4096

CONFIG_NEWLIB_LIBC=y

# Networking

CONFIG_NET_SOCKETS_OFFLOAD=y

CONFIG_NET_IPV6=y

CONFIG_NET_IPV6_NBR_CACHE=n

CONFIG_NET_IPV6_MLD=n

# Increase native TLS socket implementation, so that it is chosen instead of

# offloaded nRF91 sockets

CONFIG_NET_SOCKETS_TLS_PRIORITY=35

# Modem library

CONFIG_NRF_MODEM_LIB=y

CONFIG_NRF_MODEM_LIB_ON_FAULT_APPLICATION_SPECIFIC=y

# LTE connectivity with network connection manager

CONFIG_LTE_CONNECTIVITY=y

CONFIG_NET_CONNECTION_MANAGER=y

CONFIG_NET_CONNECTION_MANAGER_MONITOR_STACK_SIZE=1024

# Increased sysworkq size, due to LTE connectivity

CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2048

# Disable options y-selected by NCS for no good reason

CONFIG_MBEDTLS_KEY_EXCHANGE_DHE_PSK_ENABLED=n

CONFIG_MBEDTLS_KEY_EXCHANGE_DHE_RSA_ENABLED=n

# Generate MCUboot compatible images

CONFIG_BOOTLOADER_MCUBOOT=y
west.yml
manifest:
  self:
    path: golioth-track
  remotes:
    - name: nrf
      url-base: https://github.com/nrfconnect
  projects:
    - name: sdk-nrf
      remote: nrf
      path: nrf
      revision: v2.5.1
      import: true
      clone-depth: 1
    - name: golioth
      path: modules/lib/golioth-firmware-sdk
      revision: v0.10.0
      url: https://github.com/golioth/golioth-firmware-sdk.git
      submodules: true

Logs :

*** Booting nRF Connect SDK v2.5.1 ***
[00:00:00.514,129] <dbg> hello_zephyr: main: start hello sample
[00:00:00.514,282] <inf> golioth_samples: Waiting to obtain IP address
[00:00:02.770,080] <inf> lte_monitor: Network: Searching
[00:00:02.912,109] <inf> lte_monitor: RRC: Connected
[00:00:03.610,229] <inf> lte_monitor: Network: Registered (home)
[00:00:03.611,267] <inf> golioth_mbox: Mbox created, bufsize: 1232, num_items: 10, item_size: 112
[00:00:03.681,915] <err> golioth_coap_client_zephyr: Failed to send empty CoAP message: -22
[00:00:03.682,281] <err> golioth_coap_client_zephyr: Failed to connect: -22
[00:00:03.682,281] <wrn> golioth_coap_client_zephyr: Failed to connect: -22
[00:00:08.698,211] <err> golioth_coap_client_zephyr: Failed to send empty CoAP message: -22
...

Thank you @hasheddan, and sorry for the delay, I had a few days off.
I checked my .config file to ensure that the credentials with the short PSK_ID are used.

Then I activated more logging by building as follows:

west build -b nrf9160dk_nrf9160_ns (read) -- -DEXTRA_CONF_FILE="golioth-auth.conf;rtt.conf" -DCONFIG_LOG=y -DCONFIG_NET_LOG=y -DCONFIG_NET_SOCKETS_LOG_LEVEL_DBG=y

This yields the following log output:

RTT log output
*** Booting nRF Connect SDK 87355af5914e ***
[00:00:00.508,331] <dbg> hello_zephyr: main: start hello sample
[00:00:00.508,331] <inf> golioth_samples: Waiting to obtain IP address
>
[00:01:15.499,206] <inf> lte_monitor: Network: Searching
[00:01:15.541,992] <inf> lte_monitor: RRC: Connected
[00:01:16.651,123] <inf> lte_monitor: Network: Registered (roaming)
[00:01:16.652,160] <inf> golioth_mbox: Mbox created, bufsize: 1232, num_items: 10, item_size: 112
[00:01:17.011,688] <dbg> net_sock_tls: tls_alloc: (coap_client): Allocated TLS context, 0x2000d730
[00:01:17.012,695] <err> golioth_coap_client_zephyr: Failed to send empty CoAP message: -22
[00:01:17.012,725] <dbg> net_sock: z_impl_zsock_close: (coap_client): close: ctx=0x2000d730, fd=7
[00:01:17.012,908] <dbg> net_sock: z_impl_zsock_close: (coap_client): close: ctx=0x2000fb6c, fd=8
[00:01:17.013,122] <err> golioth_coap_client_zephyr: Failed to connect: -22
[00:01:17.013,153] <wrn> golioth_coap_client_zephyr: Failed to connect: -22
[00:01:22.019,897] <dbg> net_sock_tls: tls_alloc: (coap_client): Allocated TLS context, 0x2000d730
[00:01:22.026,550] <err> golioth_coap_client_zephyr: Failed to send empty CoAP message: -22
[00:01:22.026,580] <dbg> net_sock: z_impl_zsock_close: (coap_client): close: ctx=0x2000d730, fd=7
[00:01:22.026,763] <dbg> net_sock: z_impl_zsock_close: (coap_client): close: ctx=0x2000fb6c, fd=8
[00:01:22.026,977] <err> golioth_coap_client_zephyr: Failed to connect: -22
[00:01:22.026,977] <wrn> golioth_coap_client_zephyr: Failed to connect: -22

@73jn thank you for adding your reproduction, and @SenseJo thank you for the additional information! We are looking into this and I will follow up here shortly.

1 Like

@SenseJo @73jn are you both able to verify that your cellular data plan is active? Also, if supported by your provider, you may be able to observe packet captures that would help give a better indication of what data is actually being sent.

@hasheddan Yes, I am sure my cellular data plan is active, since I just swapped the firmware on the same board with the same SIM card, and things worked with the Golioth Zephyr SDK based firmware.
I don’t know whether our operator supports packet capturing, a quick search yielded no result.

1 Like

I just tried the the hello-sample again with different revision of the Golioth Firmware SDK. Of course, I always updated the NCS accordingly.
With the current main branch, it’s the same behavior.
Interestingly, with revision v0.9.0, the nRF91 does not output error -22. However, it seems like no connection was established anyways, since the device does not print the hello counter logs, and is show as offline in the Golioth web console.

RTT log output with Golioth Firmware SDK v0.9.0
*** Booting nRF Connect SDK 1fae141fc671 ***
[00:00:00.500,976] <dbg> hello_zephyr: main: start hello sample
[00:00:00.500,976] <inf> golioth_samples: Waiting to obtain IP address
[00:01:11.547,637] <inf> lte_monitor: Network: Searching
[00:01:11.589,172] <inf> lte_monitor: RRC: Connected
[00:01:13.021,179] <inf> lte_monitor: Network: Registered (roaming)
[00:01:13.022,064] <inf> golioth_mbox: Mbox created, bufsize: 1144, num_items: 10, item_size: 104
[00:01:13.253,112] <inf> golioth_coap_client: Start CoAP session with host: coaps://coap.golioth.io
[00:01:13.253,173] <inf> golioth_coap_client: Session PSK-ID: xxxxx@xxxxxx-xx-xxx-xx-xxxxxx
[00:01:13.254,516] <inf> golioth_coap_client: Entering CoAP I/O loop
[00:01:24.975,494] <inf> lte_monitor: RRC: Idle
1 Like

@SenseJo we have been unable to reproduce your observed behavior thus far, but it makes sense that you would see a difference between v0.9.0 and v0.10.0 as the latter moved to using the Zephyr CoAP library.

To help us continue investigation, would you mind sharing your build/zephyr/.config file with any sensitive information redacted?

@hasheddan Thank you for further looking into this.
Here is the .config file, from the hello sample build just as described above (golioth-firmware-sdk v0.10.0): 2024-03-11_golioth_hello_sample_nrf9160dk_nrf9160_ns_dotconfig · GitHub

1 Like

Hey @SenseJo,

I looked closer at your project configuration file, and one line caught my attention. On line 140 of your config file, CONFIG_MBEDTLS_ENABLE_HEAP is not set . This explains the errors you are experiencing, and I was able to reproduce them.

At this moment, I’m unsure why this has happened, but I’ll keep investigating.

Can you add CONFIG_MBEDTLS_ENABLE_HEAP=y to your prj.conf file (and check that it is actually set in the .config file) to confirm that this was the issue?

@73jn, could you also check your .config file and post if the MBEDTLS_ENABLE_HEAP is not set?

Hi @marko, thanks a lot for looking at this!

I took your advice and added CONFIG_MBEDTLS_ENABLE_HEAP=y, but that alone did not do the trick. That changed something, however, now I’m getting the following errors:

[00:01:13.725,982] <err> golioth_coap_client_zephyr: Failed to send empty CoAP message: -12
[00:01:13.726,348] <err> golioth_coap_client_zephyr: Failed to connect: -12
[00:01:13.726,379] <wrn> golioth_coap_client_zephyr: Failed to connect: -12

So now it’s error code -12 instead of -22.

I think the CONFIG_MBEDTLS_ENABLE_HEAP options should normally be set by golioth-firmware-sdk/examples/zephyr/common/Kconfig.defconfig, which includes the following:

Kconfig.defconfig
if GOLIOTH_SAMPLE_COMMON

[...]

if MBEDTLS

config MBEDTLS_ENABLE_HEAP
	default y

if MBEDTLS_ENABLE_HEAP

config MBEDTLS_HEAP_SIZE
	default 49152 if GOLIOTH_AUTH_METHOD_CERT
	default 10240

endif # MBEDTLS_ENABLE_HEAP

if MBEDTLS_BUILTIN

config MBEDTLS_SSL_MAX_CONTENT_LEN
	default 4096 if GOLIOTH_AUTH_METHOD_CERT
	default 2048

endif # MBEDTLS_BUILTIN

if NRF_SECURITY

config MBEDTLS_SSL_IN_CONTENT_LEN
	int
	default 4096 if GOLIOTH_AUTH_METHOD_CERT
	default 2048

config MBEDTLS_SSL_OUT_CONTENT_LEN
	int
	default 4096 if GOLIOTH_AUTH_METHOD_CERT
	default 2048

endif # NRF_SECURITY

endif # MBEDTLS

[...]

endif # GOLIOTH_SAMPLE_COMMON

However, it looks like in my build of the hello-sample, this .defconfig file is not picked up for some reason:

Build output
-- west build: making build dir /home/jo/Documents/Project/project_ncs/modules/lib/golioth-firmware-sdk/examples/zephyr/hello/build/nrf9160dk_nrf9160_ns pristine
-- west build: generating a build system
Loading Zephyr default modules (Zephyr base).
-- Application: /home/jo/Documents/Project/project_ncs/modules/lib/golioth-firmware-sdk/examples/zephyr/hello
-- CMake version: 3.25.1
-- Found Python3: /usr/bin/python3 (found suitable version "3.10.12", minimum required is "3.8") found components: Interpreter
-- Cache files will be written to: /home/jo/.cache/zephyr
-- Zephyr version: 3.4.99 (/home/jo/Documents/Project/project_ncs/zephyr)
-- Found west (found suitable version "1.0.0", minimum required is "0.14.0")
-- Board: nrf9160dk_nrf9160_ns, Revision: 0.14.0
-- ZEPHYR_TOOLCHAIN_VARIANT not set, trying to locate Zephyr SDK
-- Found host-tools: zephyr 0.16.5 (/home/jo/zephyr-sdk-0.16.5-1)
-- Found toolchain: zephyr 0.16.5 (/home/jo/zephyr-sdk-0.16.5-1)
-- Found Dtc: /home/jo/zephyr-sdk-0.16.5-1/sysroots/x86_64-pokysdk-linux/usr/bin/dtc (found suitable version "1.6.0", minimum required is "1.4.6")
-- Found BOARD.dts: /home/jo/Documents/Project/project_ncs/zephyr/boards/arm/nrf9160dk_nrf9160/nrf9160dk_nrf9160_ns.dts
-- Found devicetree overlay: /home/jo/Documents/Project/project_ncs/zephyr/boards/arm/nrf9160dk_nrf9160/nrf9160dk_nrf9160_ns_0_14_0.overlay
-- Generated zephyr.dts: /home/jo/Documents/Project/project_ncs/modules/lib/golioth-firmware-sdk/examples/zephyr/hello/build/nrf9160dk_nrf9160_ns/zephyr/zephyr.dts
-- Generated devicetree_generated.h: /home/jo/Documents/Project/project_ncs/modules/lib/golioth-firmware-sdk/examples/zephyr/hello/build/nrf9160dk_nrf9160_ns/zephyr/include/generated/devicetree_generated.h
-- Including generated dts.cmake file: /home/jo/Documents/Project/project_ncs/modules/lib/golioth-firmware-sdk/examples/zephyr/hello/build/nrf9160dk_nrf9160_ns/zephyr/dts.cmake

warning: Experimental symbol NET_SOCKETS_SOCKOPT_TLS is enabled.


warning: Experimental symbol NET_SOCKETS_ENABLE_DTLS is enabled.


warning: Experimental symbol NET_CONNECTION_MANAGER is enabled.

Parsing /home/jo/Documents/Project/project_ncs/modules/lib/golioth-firmware-sdk/examples/zephyr/hello/Kconfig
Loaded configuration '/home/jo/Documents/Project/project_ncs/zephyr/boards/arm/nrf9160dk_nrf9160/nrf9160dk_nrf9160_ns_defconfig'
Merged configuration '/home/jo/Documents/Project/project_ncs/modules/lib/golioth-firmware-sdk/examples/zephyr/hello/prj.conf'
Merged configuration '/home/jo/Documents/Project/project_ncs/modules/lib/golioth-firmware-sdk/examples/zephyr/hello/boards/nrf9160dk_nrf9160_ns.conf'
Merged configuration '/home/jo/Documents/Project/project_ncs/modules/lib/golioth-firmware-sdk/examples/zephyr/hello/golioth-auth.conf'
Merged configuration '/home/jo/Documents/Project/project_ncs/modules/lib/golioth-firmware-sdk/examples/zephyr/hello/rtt.conf'
Configuration saved to '/home/jo/Documents/Project/project_ncs/modules/lib/golioth-firmware-sdk/examples/zephyr/hello/build/nrf9160dk_nrf9160_ns/zephyr/.config'
Kconfig header saved to '/home/jo/Documents/Project/project_ncs/modules/lib/golioth-firmware-sdk/examples/zephyr/hello/build/nrf9160dk_nrf9160_ns/zephyr/include/generated/autoconf.h'
-- Found GnuLd: /home/jo/zephyr-sdk-0.16.5-1/arm-zephyr-eabi/bin/../lib/gcc/arm-zephyr-eabi/12.2.0/../../../../arm-zephyr-eabi/bin/ld.bfd (found version "2.38")
-- The C compiler identification is GNU 12.2.0
-- The CXX compiler identification is GNU 12.2.0
-- The ASM compiler identification is GNU
-- Found assembler: /home/jo/zephyr-sdk-0.16.5-1/arm-zephyr-eabi/bin/arm-zephyr-eabi-gcc
-- Found Python3: /usr/bin/python3 (found version "3.10.12") found components: Interpreter
Changed board to secure nrf9160dk_nrf9160 (NOT NS)

=== child image mcuboot -  begin ===
[...]

However, I currently have no idea why the file is not been picked up. CONFIG_GOLIOTH_SAMPLE_COMMON=y is set in the build/zehphyr/.config file.

I have the same error on v0.11.0 Golioth-sdk and NCS 2.5.2.

Hi @SenseJo!

Could you please reorder projects in west.yml, so that golioth-firmware-sdk is listed first, then sdk-nrf:

manifest:
  remotes:
    - name: nrfconnect
      url-base: https://github.com/nrfconnect
    - name: golioth
      url-base: https://github.com/golioth

  projects:
    - name: golioth-firmware-sdk
      remote: golioth
      revision: v0.10.0
      path: modules/lib/golioth-firmware-sdk
      submodules: true

    # Always ensure nRF Connect SDK and the Golioth Firmware SDK versions are
    # compatible.
    - name: sdk-nrf
      remote: nrfconnect
      revision: v2.5.1
      path: nrf
      import:
        name-allowlist:
          - cmsis
          - hal_nordic
          - hal_st
          - mbedtls
          - mbedtls-nrf
          - mcuboot
          - net-tools
          - nrf
          - nrfxlib
          - qcbor
          - segger
          - tfm-mcuboot
          - tinycrypt
          - trusted-firmware-m
          - zcbor
          - zephyr

Even better, you could automatically import sdk-nrf repo with all dependencies with:

manifest:
  remotes:
    - name: nrfconnect
      url-base: https://github.com/nrfconnect
    - name: golioth
      url-base: https://github.com/golioth

  projects:
    - name: golioth-firmware-sdk
      remote: golioth
      revision: v0.10.0
      path: modules/lib/golioth-firmware-sdk
      submodules: true
      import:
        file: west-ncs.yml
        name-allowlist:
          - cmsis
          - hal_nordic
          - hal_st
          - mbedtls
          - mbedtls-nrf
          - mcuboot
          - net-tools
          - nrf
          - nrfxlib
          - qcbor
          - segger
          - tfm-mcuboot
          - tinycrypt
          - trusted-firmware-m
          - zcbor
          - zephyr

Please note that this is just a workaround for now for unexpected Zephyr module import order. Still working on a better long term solution.

Thank you, and sorry for the late reply.
You are right, the order of the projects in the west.yml file makes the difference! After moving Golioth to the top, the hello-sample would connect successfully.

I agree that this solution is better, thanks!