Golioth_coap_client_zephyr fails to connect with error -22

I have had the same issue since Sunday; before that, my code worked.

[00:00:46.294,586] <err> golioth_coap_client_zephyr: Fail to get address (coap.golioth.io 5684) -101
[00:00:46.294,616] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:00:46.294,616] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
[00:00:47.398,040] <err> lte_data_thread: Failed to connect to Golioth

When I try multiple times and set the timeout high, it sometimes connects. I’m on main and on 2.9.0.
BR

@sebastian, have there been any changes between Sunday and now — either in the code, configuration, or environment?

Can you verify whether the modem is successfully attaching to the network, obtaining an IP address, and whether DNS resolution is working?

Also, try increasing the values of CONFIG_ZVFS_EVENTFD_MAX and CONFIG_ZVFS_OPEN_MAX in your configuration — these might be limiting resource availability at runtime.

I didn’t touch them. The environment changed a little: the RSSI went from -80 to -72.

CONFIG_ZVFS_EVENTFD_MAX=24
CONFIG_ZVFS_OPEN_MAX=22
CONFIG_MBEDTLS_ENABLE_HEAP=y
CONFIG_MBEDTLS_HEAP_SIZE=10240

DNS is working, but CONFIG_NET_L2_PPP_OPTION_DNS_USE=n (the default) made it more stable in the past, so not every attempt fails. I guess you run everything from the main thread, so I changed the priority from 6 to 2 to be higher than the CoAP thread.
Now not every connection attempt fails; instead, roughly every 4th or 5th attempt fails right at the start. I guess it’s a timing issue on my side.

Before connecting to Golioth, I changed the event I wait for:
NET_EVENT_DNS_SERVER_ADD instead of NET_EVENT_IPV4_ADDR_ADD
I’m not sure if this is a good solution. I thought about NET_EVENT_L4_CONNECTED too.
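The events named here signal different stages of network bring-up, so one option is to require several of them instead of picking just one. The sketch below models that gate in plain C so it can be run on a host. The `READY_*` flags and both functions are hypothetical stand-ins, not Zephyr definitions; in firmware the bits would presumably be set from a `net_mgmt` event callback.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical readiness bits. In firmware, each bit would be set when the
 * matching Zephyr network event is delivered. */
#define READY_IPV4_ADDR  (1u << 0) /* models NET_EVENT_IPV4_ADDR_ADD */
#define READY_DNS_SERVER (1u << 1) /* models NET_EVENT_DNS_SERVER_ADD */

#define READY_ALL (READY_IPV4_ADDR | READY_DNS_SERVER)

/* Record one event in the accumulated readiness state. */
static uint32_t net_mark_ready(uint32_t state, uint32_t bit)
{
    return state | bit;
}

/* True only once every required bit has been seen. */
static bool net_ready(uint32_t state)
{
    return (state & READY_ALL) == READY_ALL;
}
```

With this shape, the connection attempt starts only after both an address and a DNS server have been reported, regardless of the order in which the events arrive.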

[00:01:01.714,416] <inf> lte_data_thread: Attempt 1 to connect to Golioth...
[00:01:02.079,315] <inf> app_networking: www.zephyrproject.org IPv4 address: 23.185.0.4
[00:01:02.079,345] <inf> app_networking: DNS resolving finished
[00:01:11.668,640] <err> golioth_coap_client_zephyr: Fail to get address (coap.golioth.io 5684) -101
[00:01:11.668,670] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:11.668,670] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:21.669,036] <err> golioth_coap_client_zephyr: Fail to get address (coap.golioth.io 5684) -101
[00:01:21.669,067] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:21.669,067] <wrn> golioth_coap_client_zephyr: Failed to connect: -11

CONFIG_NET_SOCKETS_DNS_TIMEOUT=5000 is also set, and that is everything. DNS should work.
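A note on reading the codes in the log: if the -101 on the "Fail to get address" line is the raw return value of the address lookup (an assumption; the SDK source would confirm it), it is probably a DNS resolver status, not an errno. The table below is a minimal sketch for decoding such values. The numbers are written from memory of Zephyr's `dns_resolve_status` enum and should be checked against `zephyr/net/dns_resolve.h` in your tree; `dns_code_name()` is a hypothetical helper.

```c
#include <stddef.h>

/* Assumed subset of Zephyr's dns_resolve_status values (verify locally). */
struct code_name {
    int code;
    const char *name;
};

static const struct code_name dns_codes[] = {
    { -2,   "DNS_EAI_NONAME" },
    { -3,   "DNS_EAI_AGAIN" },
    { -4,   "DNS_EAI_FAIL" },
    { -11,  "DNS_EAI_SYSTEM" },
    { -101, "DNS_EAI_CANCELED" },
};

/* Return the symbolic name for a resolver status, or "unknown". */
static const char *dns_code_name(int code)
{
    for (size_t i = 0; i < sizeof(dns_codes) / sizeof(dns_codes[0]); i++) {
        if (dns_codes[i].code == code) {
            return dns_codes[i].name;
        }
    }
    return "unknown";
}
```

Under that reading, -101 would mean the query was canceled, which as far as I know is also what the resolver reports when a query times out. That would be consistent with lookups that succeed on some attempts and fail on others over a weak link.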

Due to the intermittent connection failures, this does feel like a timing or race condition issue. Adjusting the thread priority (from 6 to 2) to ensure it runs before the CoAP thread is a good experiment — and the improved behavior you observed suggests that thread scheduling could indeed be a factor. Are you calling the Golioth connection logic from a thread that’s guaranteed to run after the modem has fully attached and the network interface is up?

The issues you’re seeing may indicate that the Golioth connection is being attempted before the network stack is fully ready, even though DNS resolution appears to have succeeded a few log lines earlier.

Yes, it starts within the same time frame as the modem device init at startup.
I have a delay of around 4 seconds after startup, then the thread runs.

The problem occurs on a weak connection, around -80 dB to -90 dB.

I don’t understand why the device gets this error while the backend shows this info:

The modem is fully attached and I see it online in the 1NCE console.
DNS is also working and I have an IP.

So it seems it really is down to the thread.

#include <zephyr/kernel.h>
#include "temp.h"
#include "settings.h"
#include "config.h"

#include <zephyr/net/socket.h>
#include <zephyr/net/net_if.h>
#include <zephyr/net/dns_resolve.h>

#include "app_networking.h"
#include "app_golioth.h"

#include <golioth/client.h>
#include <golioth/fw_update.h>
#include <golioth/stream.h>

#include <app_version.h>

#include <zephyr/pm/device.h>
#include <zephyr/pm/device_runtime.h>
#include <zephyr/drivers/uart.h>

#include "file_manager.h"
#include <zephyr/drivers/cellular.h>

#include "saadc.h"
#include "utils.h"
#include "nb_payload.h"

#include <zephyr/logging/log.h>
LOG_MODULE_REGISTER(lte_data_thread,LOG_LEVEL_INF);

#define THREAD_PRIORITY_DATA 4
#define THREAD_STACKSIZE_DATA 9216
#define THREAD_START_DELAY 4000

#define CONNECTION_TIMEOUT (20 * 1000) // ms; parenthesized so the macro expands safely inside expressions

#define MAX_WAIT_MS 5000
#define POLL_INTERVAL_MS 200

K_SEM_DEFINE(lte_send_sem, 0, 1); // Initial count 0, max count 1
// K_SEM_DEFINE(lte_shell_wait, 0, 1);
K_SEM_DEFINE(connected, 0, 1); // check if golioth is connected or not

const struct device *modem = DEVICE_DT_GET(DT_ALIAS(modem));
static const struct device *uart_dev = DEVICE_DT_GET(DT_NODELABEL(uart1));

extern bool _force_send_status;

#include "app_sntp.h"

static void on_client_event(struct golioth_client *client,
                            enum golioth_client_event event,
                            void *arg)
{
    bool is_connected = (event == GOLIOTH_CLIENT_EVENT_CONNECTED);
    LOG_INF("Golioth client %s", is_connected ? "connected" : "disconnected");
    if (is_connected)
    {
        k_sem_give(&connected);
    }
    else
    {
        pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
        pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);
    }
}

K_SEM_DEFINE(fw_update_complete, 0, 1);
static bool _is_update_in_progress = false;

static void on_fw_update_state_change(enum golioth_ota_state state,
                                      enum golioth_ota_reason reason,
                                      void *user_arg)
{
    LOG_INF("Firmware update state changed to: %d, reason: %d", state, reason);

    switch (state)
    {
    case GOLIOTH_OTA_STATE_IDLE:
        _is_update_in_progress = false; // No update in progress
        LOG_INF("Firmware update is idle.");
        break;
    case GOLIOTH_OTA_STATE_DOWNLOADING:
        _is_update_in_progress = true; // Update started
        LOG_INF("Firmware update downloading...");
        break;
    case GOLIOTH_OTA_STATE_DOWNLOADED:
        _is_update_in_progress = true; // Update downloaded, ready to apply
        LOG_INF("Firmware update downloaded.");

        break;
    case GOLIOTH_OTA_STATE_UPDATING:
        _is_update_in_progress = true; // Firmware is being applied
        LOG_INF("Firmware is being applied...");
        if (reason == GOLIOTH_OTA_REASON_FIRMWARE_UPDATED_SUCCESSFULLY)
        {
            k_sem_give(&fw_update_complete); // Release semaphore
        }
        else if (reason == GOLIOTH_OTA_REASON_FIRMWARE_UPDATE_FAILED)
        {
            LOG_ERR("Firmware update failed.");
        }
        break;
    default:
        _is_update_in_progress = false;
        break;
    }
}

static void stop_client_if_update_complete(struct golioth_client *client)
{
    uint32_t waited = 0;
    while (golioth_client_num_items_in_request_queue(client) > 0 && waited < MAX_WAIT_MS)
    {
        LOG_INF("Waiting for Golioth request queue to empty...");
        k_sleep(K_MSEC(POLL_INTERVAL_MS));
        waited += POLL_INTERVAL_MS;
    }

    if (waited >= MAX_WAIT_MS)
    {
        LOG_WRN("Timeout waiting for Golioth queue to empty");
    }

    if (!_is_update_in_progress)
    {
        // No update is in progress, skip waiting for the semaphore
        LOG_INF("No firmware update in progress, stopping the client.");
        golioth_client_stop(client);
        return;
    }

    // If update is in progress, wait for the semaphore
    LOG_INF("Waiting for firmware update to complete...");
    k_sem_take(&fw_update_complete, K_FOREVER);
    // Now it's safe to stop the client
    LOG_INF("Firmware update complete, stopping the client.");
    golioth_client_stop(client);
}

// #include <zephyr/drivers/modem/simcom-sim7080.h>

#define MAX_RETRIES 5
#define RETRY_TIMEOUT_MS 90000            // 90 seconds per try (can also be 60000 for 60s)
#define BACKOFF_SLEEP_MS (60 * 60 * 1000) // 1 hour sleep on failure
#define SLEEP_BETTWEEN_TIMOUTS 120000     // 2 minutes sleep between retries

static bool wait_for_connection(void)
{
    for (int retry = 1; retry <= MAX_RETRIES; retry++)
    {
        LOG_INF("Attempt %d to connect to Golioth...", retry);
        if (retry > 1)
        {
            pm_device_action_run(uart_dev, PM_DEVICE_ACTION_RESUME); // EALREADY
            pm_device_action_run(modem, PM_DEVICE_ACTION_RESUME);    // EALREADY
        }

        int wait = k_sem_take(&connected, K_MSEC(RETRY_TIMEOUT_MS));
        if (wait == 0)
        {
            LOG_INF("Connected to Golioth");
            return true;
        }
        pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
        pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);
        k_sleep(K_MSEC(SLEEP_BETTWEEN_TIMOUTS));

        LOG_WRN("Retry %d/%d timed out after %d seconds",
                retry, MAX_RETRIES, RETRY_TIMEOUT_MS / 1000);
    }

    LOG_ERR("Failed to connect after %d attempts.", MAX_RETRIES);

    pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
    pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);

    return false;
}

static void data_thread(void)
{
    LOG_INF("data_thread started");

    static int64_t last_time_sensor_readings_send = 0;
    pm_device_action_run(modem, PM_DEVICE_ACTION_RESUME);
    // net_connect();
    if (!app_net_connect())
    {
        LOG_ERR("No IP connectivity — fallback");
        return;
    }


    if (strlen(lte_cfg.client_psk_id) == 0 || strlen(lte_cfg.client_psk) == 0) {
        LOG_INF("PSK credentials not set, skipping...");
        return;
    }
    
    struct golioth_client_config my_client_config = {
        .credentials =
            {
                .auth_type = GOLIOTH_TLS_AUTH_TYPE_PSK,
                .psk =
                    {
                        .psk_id = lte_cfg.client_psk_id,
                        .psk_id_len = strlen(lte_cfg.client_psk_id),
                        .psk = lte_cfg.client_psk,
                        .psk_len = strlen(lte_cfg.client_psk),
                    },
            },
    };
    LOG_INF("PSK ID: %s", lte_cfg.client_psk_id);
    LOG_INF("PSK: %s", lte_cfg.client_psk);
    LOG_INF("PSK ID len: %zu", strlen(lte_cfg.client_psk_id)); // strlen() returns size_t
    LOG_INF("PSK len: %zu", strlen(lte_cfg.client_psk));
    
    int ret = sync_date();
    if (ret < 0)
    {
        LOG_ERR("Failed to sync date/time: %d", ret);
    }

    struct golioth_client *client = golioth_client_create(&my_client_config);
    golioth_client_register_event_callback(client, on_client_event, NULL);
    // TODO: disabled, because we don't get the right finish command
    // golioth_fw_update_init(client, _current_version);
    // golioth_fw_update_register_state_change_callback(on_fw_update_state_change, NULL);
    app_settings_register(client);
    print_cellular_info(modem);
    check_all_registrations(modem);
    // do_ipv4_lookup();

    if (!wait_for_connection()) // hard version  // k_sem_take(&connected, K_FOREVER);
    {
        LOG_ERR("Failed to connect to Golioth");
        golioth_client_stop(client);
        pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
        pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);
        return; // Exit the thread if connection fails multiple times
    }

    // send status data
    int err = send_status_payload(modem, client, STATUS_FIRST_BOOT, 0);
    if (err != 0)
    {
        LOG_ERR("Failed to send status payload");
    }

    stop_client_if_update_complete(client);
    pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
    pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);

    while (1)
    {
        int reason = k_sem_take(&lte_send_sem, K_MSEC(lte_cfg.send_record_data_interval_s * 1000)); // wake up after the configured interval and check whether there is something to send

        uint8_t status = STATUS_OK;

        if (reason == 0 || _force_send_status)
        {
            LOG_INF("Semaphore triggered or force status. Sending data.");
            status = STATUS_MAINTENANCE; // TODO: mark the send type!
        }
        else if (reason == -EAGAIN)
        {
            LOG_INF("Semaphore Timeout reached. Sending data.");
        }
        else
        {
            LOG_ERR("Semaphore take failed with error %d", reason);
            continue; // Retry if there was an error
        }

        // check if we have records to send
        int total_records_in_bytes = get_file_size(REC_FILE_PATH); // Pass 0 to read all available records
        if (total_records_in_bytes <= 0 && status == STATUS_OK)
        {
            LOG_INF("No records to send");
            continue;
        }

        pm_device_action_run(uart_dev, PM_DEVICE_ACTION_RESUME);
        pm_device_action_run(modem, PM_DEVICE_ACTION_RESUME);
        if (!app_net_connect())
        {
            LOG_ERR("No IP connectivity — next time");
            pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
            pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);
            continue;
        }
        //sync time
        int ret = sync_date();
        if (ret < 0)
        {
            LOG_ERR("Failed to sync date/time: %d", ret);
        }

        int start_err = golioth_client_start(client);
        if (start_err != 0)
        {
            LOG_ERR("Failed to start Golioth client: %d", start_err);
            pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
            pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);
            continue;
        }
        LOG_INF("Golioth client started");

        // int connectionReason = k_sem_take(&connected, CONNECTION_TIMEOUT);

        if (!golioth_client_wait_for_connect(client, 3000))
        {
            LOG_ERR("Failed to connect to Golioth");
            pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
            pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);
            continue;
        }

        // send status data
        int err = send_status_payload(modem, client, status, 0); // TODO: check error register!
        if (err != 0)
        {
            LOG_ERR("Failed to send status payload");
        }

        err = stream_file_to_golioth(client, "/record", REC_FILE_PATH);
        if (err != 0)
        {
            LOG_ERR("Failed streaming record data. %d", err);
        }
        else
        {
            LOG_INF("Sent %d bytes of records ", total_records_in_bytes);
            file_manager_delete(REC_FILE_PATH);
            print_counter(&counter);
            counter_reset(&counter, false);
        }

        // !!! we don't send data if the send_adc_data_interval_s is 0!!!
        int64_t now = k_uptime_get();
        if (lte_cfg.send_adc_data_interval_s != 0 && (now - last_time_sensor_readings_send) >= lte_cfg.send_adc_data_interval_s * 1000)
        {
            LOG_INF("Sending sensor data");
            err = stream_file_to_golioth(client, "/sensor", ADC_FILE_PATH);
            if (err != 0)
            {
                LOG_ERR("Failed streaming sensor data. %d", err);
                GLTH_LOGE(TAG, "Failed streaming sensor data. %d", err);
            }
            else
            {

                LOG_INF("Sent %d records", total_records_in_bytes);
                file_manager_delete(ADC_FILE_PATH); // TODO: check whether we need an extra check, or discuss whether this is good
                last_time_sensor_readings_send = now;
            }
            // check if we need to send data on next online
            if (lte_cfg.send_adc_data_interval_s == 1)
            {
                LOG_INF("Sending sensor data on next online");
                reset_send_adc_data_next_online();
            }
        }
        stop_client_if_update_complete(client);
        pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
        pm_device_action_run(uart_dev, PM_DEVICE_ACTION_SUSPEND);
    }
}

// TODO: Kconfig option for not using the modem API
K_THREAD_DEFINE(data_tid, THREAD_STACKSIZE_DATA, data_thread, NULL, NULL, NULL, THREAD_PRIORITY_DATA, 0, THREAD_START_DELAY);
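
For reference, the worst-case time spent in `wait_for_connection()` as posted can be worked out from its constants. The sketch below mirrors the loop's timing in host-runnable C; `worst_case_wait_ms()` is a hypothetical helper and ignores the time taken by the power-management calls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Worst-case duration of a retry loop in which every attempt waits
 * timeout_ms and is followed by a sleep of sleep_ms. If sleep_after_last
 * is false, the sleep after the final attempt is skipped. */
static uint32_t worst_case_wait_ms(uint32_t retries, uint32_t timeout_ms,
                                   uint32_t sleep_ms, bool sleep_after_last)
{
    if (retries == 0) {
        return 0;
    }

    uint32_t sleeps = sleep_after_last ? retries : retries - 1;

    return retries * timeout_ms + sleeps * sleep_ms;
}
```

With the posted values (5 retries, 90 s timeout, 120 s sleep) this gives 1,050,000 ms, about 17.5 minutes, because the loop as written also sleeps after the final attempt. Skipping that last sleep would give 930,000 ms.
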
*** Booting My Application v0.0.6-stable-480f0d9f11a9 ***
*** Using nRF Connect SDK v2.9.0-7787b2649840 ***
*** Using Zephyr OS v3.7.99-1f8f3dc29142 ***
*** Golioth Firmware SDK v0.17.0-5-g7594c7ad6c5b ***
[00:00:00.060,791] <inf> edge_app: GPIO 2V4 already set 0xFFFFFFFA
[00:00:00.060,791] <inf> edge_app: Starting application...
[00:00:00.060,821] <inf> edge_app: Build time: Mar 27 2025 13:41:24
[00:00:00.060,852] <inf> edge_app: Version: 0.0.6
[00:00:00.060,852] <inf> edge_app: SerialNo db185061
[00:00:00.063,964] <inf> fs_nvs: 4 Sectors of 4096 bytes
[00:00:00.063,995] <inf> fs_nvs: alloc wra: 0, d38
[00:00:00.063,995] <inf> fs_nvs: data wra: 0, a4c
[00:00:00.076,202] <inf> edge_app: SIMCOM module initalised
[00:00:00.076,232] <inf> edge_app: Golioth psk_id:xxx@xxx
[00:00:00.076,263] <inf> edge_app: Golioth psk:xxx
[00:00:00.076,293] <inf> led0m


[00:00:04.060,821] <inf> lte_data_thread: data_thread started
[00:00:04.060,913] <inf> app_networking: Bringing up network interface
[00:00:04.061,004] <inf> app_networking: Waiting to obtain IP address
[00:00:24.100,524] <inf> app_networking: Network event received
[00:00:25.100,677] <inf> lte_data_thread: PSK ID: xxx@txxx
[00:00:25.100,708] <inf> lte_data_thread: PSK: xxxx
[00:00:25.100,738] <inf> lte_data_thread: PSK ID len: 30
[00:00:25.100,738] <inf> lte_data_thread: PSK len: 32
[00:00:25.146,942] <inf> app_networking: RSSI -79
[00:00:25.147,003] <inf> app_networking: IMEI: 860016049954113
[00:00:25.147,033] <inf> app_networking: Technology 4: Registration status = Registered (Roaming)
[00:00:25.147,064] <inf> app_networking: Technology 7: Registration status = Registered (Roaming)
[00:00:25.147,125] <inf> app_networking: Technology 0: Registration status = Not Registered
[00:00:25.147,155] <inf> app_networking: Technology 2: Registration status = Registered (Roaming)
[00:00:25.147,155] <inf> lte_data_thread: Attempt 1 to connect to Golioth...
[00:00:28.738,281] <err> golioth_coap_client_zephyr: Failed to connect to socket: -11
[00:00:28.739,318] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:00:28.739,318] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
[00:00:37.499,023] <err> golioth_coap_client_zephyr: Failed to connect to socket: -11
[00:00:37.500,061] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:00:37.500,061] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
[00:00:46.372,985] <err> golioth_coap_client_zephyr: Failed to connect to socket: -11
[00:00:46.373,840] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:00:46.373,840] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
[00:00:54.628,051] <err> golioth_coap_client_zephyr: Failed to connect to socket: -11
[00:00:54.629,058] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:00:54.629,089] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:03.049,041] <err> golioth_coap_client_zephyr: Failed to connect to socket: -11
[00:01:03.050,048] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:03.050,048] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:11.333,129] <err> golioth_coap_client_zephyr: Failed to connect to socket: -11
[00:01:11.333,953] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:11.333,953] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:19.699,035] <err> golioth_coap_client_zephyr: Failed to connect to socket: -11
[00:01:19.700,073] <err> golioth_coap_client_zephyr: Failed to connect: -11
[00:01:19.700,073] <wrn> golioth_coap_client_zephyr: Failed to connect: -11
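
To read the retry cadence off the log above, the timestamps can be converted to microseconds and subtracted. A minimal sketch, assuming the `[hh:mm:ss.mmm,uuu]` layout seen in these lines; `parse_log_ts_us()` is a hypothetical helper.

```c
#include <stdint.h>
#include <stdio.h>

/* Parse a log timestamp of the form "[hh:mm:ss.mmm,uuu]" into microseconds.
 * Returns -1 if the text does not match the expected layout. */
static int64_t parse_log_ts_us(const char *line)
{
    unsigned int h, m, s, ms, us;

    if (sscanf(line, "[%u:%u:%u.%u,%u]", &h, &m, &s, &ms, &us) != 5) {
        return -1;
    }

    return ((((int64_t)h * 60 + m) * 60 + s) * 1000 + ms) * 1000 + us;
}
```

Applied to the first two socket failures (00:00:28.738,281 and 00:00:37.499,023), the difference is 8,760,742 µs, so roughly 8.8 s pass between attempts. It may be worth comparing that with the 3000 ms passed to `golioth_client_wait_for_connect()` in the periodic loop.
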

BR