SDK 0.10.0 sends >15million logs/day in error condition

When I get some kind of network error SDK 0.10.0 sends extreme log levels. Since Errors typically aren’t turned off this would create extreme cellular and Golioth charges.
Since errors of this type may not show up in testing it could easily get to the field where devices aren’t monitored closely.

The sample below is what my system sent in 1 second. If it hadn’t dropped any messages the DEFAULT SDK is sending >15 million log messages per day at a rate of 15KB/second.
15K/second= 54MB/hour = 1296MB/day = 38880MB/month = 38.8GB/month.
At 0.20/MB after the first 200M of free logs per month this ONE DEVICE would rack up ~$260 per day or $7,776 per month in log charges at $0.20/MB

I’m not sure a code base that would do this is ready for production.

[00:03:46.187,011] <wrn> golioth_coap_client_zephyr: Receive timeout
[00:03:46.187,072] <err> net_sock: invalid access on sock 11 by thread 0x200247a8
[00:03:46.187,133] <err> net_sock: invalid access on sock 11 by thread 0x200247a8
[00:03:46.187,164] <err> golioth_coap_client_zephyr: Failed to receive: -128
[00:03:46.187,622] <inf> golioth_coap_client_zephyr: Ending session
[00:03:46.187,652] <err> net_sock: invalid access on sock 0 by thread 0x200247a8
[00:03:46.187,683] <err> net_sock: invalid access on sock 0 by thread 0x200247a8
[00:03:46.187,713] <err> golioth_sys_zephyr: sem eventfd_read failed, errno: 9
[00:03:46.187,774] <dbg> sm_iot: on_client_event: Golioth client connected
[00:03:48.486,511] <inf> golioth_coap_client_zephyr: Golioth CoAP client connected
[00:03:48.486,816] <dbg> sm_iot: on_client_event: Golioth client connected
[00:03:48.486,846] <inf> golioth_coap_client_zephyr: Entering CoAP I/O loop
[00:03:49.145,385] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:49.146,026] <err> golioth_coap_client: Unknown CoAP response code class (0)
...
[00:03:55.001,922] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.004,425] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.006,866] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.009,521] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.014,373] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.016,784] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.019,195] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.021,697] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.024,536] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.027,099] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.061,309] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.063,751] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.066,619] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.069,030] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.071,380] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.073,883] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 5 messages dropped ---
[00:03:55.133,087] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 2 messages dropped ---
[00:03:55.135,528] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.138,092] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 1 messages dropped ---
[00:03:55.141,601] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 9 messages dropped ---
[00:03:55.149,597] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 4 messages dropped ---
[00:03:55.150,695] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 2 messages dropped ---
[00:03:55.151,611] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.159,637] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.160,644] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.161,529] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.169,586] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.185,638] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.207,641] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.208,862] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.209,808] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.219,451] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.284,210] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.289,062] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.291,870] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.294,494] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.342,224] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.350,189] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.355,041] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.356,384] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.357,147] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.357,910] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.361,663] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.363,006] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.363,830] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.371,734] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.372,955] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.373,901] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.382,934] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.383,941] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.392,669] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.395,599] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.401,702] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.402,893] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.403,839] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.414,245] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.416,656] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.417,449] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.526,794] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.541,900] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.542,755] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.543,548] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.544,311] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.544,982] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.553,924] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.556,335] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.563,476] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.564,208] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.609,924] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.612,304] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.613,037] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.617,889] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.638,336] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 1 messages dropped ---
[00:03:55.639,068] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.645,721] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.647,033] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 4 messages dropped ---
[00:03:55.647,796] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 21 messages dropped ---
[00:03:55.655,975] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.656,982] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.657,928] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.683,837] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.684,539] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 4 messages dropped ---
[00:03:55.692,932] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 3 messages dropped ---
[00:03:55.700,805] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.701,843] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 2 messages dropped ---
[00:03:55.710,815] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 5 messages dropped ---
[00:03:55.721,771] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 6 messages dropped ---
[00:03:55.723,175] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 3 messages dropped ---
[00:03:55.723,876] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 10 messages dropped ---
[00:03:55.731,597] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.740,051] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 1 messages dropped ---
[00:03:55.771,087] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:03:55.796,722] <err> golioth_coap_client: Unknown CoAP response code class (0)

Hey @john.stuewe ,

We can reproduce this behavior with Firmware SDK v0.10.0 but not with v0.11.0.

The issue is this function:

{
    switch (COAP_RESPONSE_CODE_CLASS(code))
    {
        case 2:
            return 0;
        case 4:
            switch (code)
            {
                case COAP_RESPONSE_CODE_BAD_REQUEST:
                    return -EFAULT;
                case COAP_RESPONSE_CODE_UNAUTHORIZED:
                    return -EACCES;
                case COAP_RESPONSE_CODE_BAD_OPTION:
                    return -EINVAL;
                case COAP_RESPONSE_CODE_FORBIDDEN:
                    return -EACCES;
                case COAP_RESPONSE_CODE_NOT_FOUND:
                    return -ENOENT;
                case COAP_RESPONSE_CODE_NOT_ALLOWED:
                    return -EACCES;
                case COAP_RESPONSE_CODE_NOT_ACCEPTABLE:
                    return -EACCES;
                case COAP_RESPONSE_CODE_INCOMPLETE:
                    return -EINVAL;
                case COAP_RESPONSE_CODE_CONFLICT:
                    return -EBUSY;
                case COAP_RESPONSE_CODE_PRECONDITION_FAILED:
                    return -EACCES;
                case COAP_RESPONSE_CODE_REQUEST_TOO_LARGE:
                    return -E2BIG;
                case COAP_RESPONSE_CODE_UNSUPPORTED_CONTENT_FORMAT:
                    return -ENOTSUP;
                case COAP_RESPONSE_CODE_UNPROCESSABLE_ENTITY:
                    return -EBADMSG;
                case COAP_RESPONSE_CODE_TOO_MANY_REQUESTS:
                    return -EBUSY;
            }
            __fallthrough;
        case 5:
            return -EBADMSG;
        default:
            LOG_ERR("Unknown CoAP response code class (%u)",
                    (unsigned int) COAP_RESPONSE_CODE_CLASS(code));
            return -EBADMSG;
    }
}

Prior to this commit in upstream Zephyr from @Mike, Zephyr didn’t forward us the TOO_MANY_REQUESTS error code, so we would log an unknown error resulting in the behavior you experienced.

Now, in the Firmware SDK v0.11.0, we get the TOO_MANY_REQUESTS code from the Zephyr stack, and we return an error without logging, which prevents us from getting into the loop.

Can you update the Firmware SDK to v0.11.0 and try to reproduce the issue?

SDK 0.11.0 does appear to fix the 15K/second log issue. It will still send >15 warning logs per second when the connection isn’t good though.

[00:01:36.385,192] <wrn> golioth_coap_client: Resending request 0x20026900 (reply 0x20026938) (retries 1)
[00:01:36.386,260] <wrn> golioth_coap_client: Resending request 0x200269f8 (reply 0x20026a30) (retries 2)
[00:01:36.387,359] <wrn> golioth_coap_client: Resending request 0x200289b0 (reply 0x200289e8) (retries 2)
[00:01:36.388,336] <wrn> golioth_coap_client: Resending request 0x20029148 (reply 0x20029180) (retries 2)
[00:01:36.389,556] <wrn> golioth_coap_client: Resending request 0x20029978 (reply 0x200299b0) (retries 2)
[00:01:36.390,594] <wrn> golioth_coap_client: Resending request 0x20029410 (reply 0x20029448) (retries 2)
[00:01:36.391,662] <wrn> golioth_coap_client: Resending request 0x20025708 (reply 0x20025740) (retries 2)
[00:01:36.392,486] <wrn> golioth_coap_client: Resending request 0x2002ac70 (reply 0x2002aca8) (retries 2)
[00:01:36.393,768] <wrn> golioth_coap_client: Resending request 0x2002aa50 (reply 0x2002aa88) (retries 2)
[00:01:36.394,958] <wrn> golioth_coap_client: Resending request 0x2002b078 (reply 0x2002b0b0) (retries 2)
[00:01:36.902,587] <wrn> golioth_coap_client: Resending request 0x20027600 (reply 0x20027638) (retries 1)
[00:01:36.903,594] <wrn> golioth_coap_client: Resending request 0x200271f8 (reply 0x20027230) (retries 1)
[00:01:36.904,571] <wrn> golioth_coap_client: Resending request 0x20024e28 (reply 0x20024e60) (retries 1)
[00:01:36.905,578] <wrn> golioth_coap_client: Resending request 0x200288c8 (reply 0x20028900) (retries 1)
[00:01:36.906,585] <wrn> golioth_coap_client: Resending request 0x2002b408 (reply 0x2002b440) (retries 2)

Hi John!

Glad to hear that v0.11 got you out of that doom loop. Thanks for the feedback about the warning messages. We’re constantly evaluating the tradeoff between the usefulness of log messages and the costs for bandwidth. As a reminder, you have options to control the logging at various points and with differing granularity:

  • You can set one of the CONFIG_GOLIOTH_LOG_LEVEL_* macros to control the maximum compiled in log messages. This will affect all of the logs coming from the SDK
  • You can change the log level filter on a frontend source at runtime. In this case, the module giving you the warning logs is golioth_coap_client, so you could set that module to only log error messages.
  • You can change the log level filter on the Golioth log backend. This can be done for the whole backend, or for a given frontend. So, for example, you can set it up so that the golioth_coap_client module logs at the info level to the UART console, but only logs at the error level to the Golioth backend.

Hope that helps.

Best,
Sam

I’m not sure I added in the original function in my testing above. I have not reproduced the same issue on SDK 0.11.0 so this issue isn’t fixed.

[00:00:15.039,794] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 2 messages dropped ---
[00:00:15.040,649] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 7 messages dropped ---
[00:00:15.050,689] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 2 messages dropped ---
[00:00:15.058,746] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 5 messages dropped ---
[00:00:15.083,801] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 6 messages dropped ---
[00:00:15.108,825] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 1 messages dropped ---
[00:00:15.111,083] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.128,997] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.159,393] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.167,114] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 4 messages dropped ---
[00:00:15.184,631] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 1 messages dropped ---
[00:00:15.277,648] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 6 messages dropped ---
[00:00:15.279,876] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.493,774] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.498,260] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.505,432] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.554,290] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.561,645] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.573,516] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 3 messages dropped ---
[00:00:15.578,063] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 8 messages dropped ---
[00:00:15.620,910] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 3 messages dropped ---
[00:00:15.630,615] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.633,178] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.635,498] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 2 messages dropped ---
[00:00:15.680,206] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.682,495] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.689,422] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.803,955] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.813,995] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.814,697] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.822,814] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.824,096] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.825,134] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.885,162] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.887,512] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.892,059] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.906,646] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.909,637] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.910,339] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.911,010] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.912,384] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.913,757] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.914,459] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.915,130] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.949,523] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.951,812] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.956,146] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.958,587] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.960,845] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 5 messages dropped ---
[00:00:15.964,813] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 7 messages dropped ---
[00:00:15.966,064] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 8 messages dropped ---
[00:00:15.982,849] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.984,069] <err> golioth_coap_client: Unknown CoAP response code class (0)
[00:00:15.984,985] <err> golioth_coap_client: Unknown CoAP response code class (0)
--- 10 messages dropped ---
[00:00:16.025,299] <err> golioth_coap_client: Unknown CoAP response code class (0)

Hey John, thanks for the reply. In our original response, we overlooked that you’re using the nRF Connect SDK (NCS). NCS is forked from an earlier commit in upstream Zephyr that doesn’t include the fix we mentioned. To workaround this issue until NCS is updated, I’ve introduced a change for us to get the CoAP response code directly instead of relying on the filtering done in the Zephyr CoAP stack. You can see the change in this PR. We’ll include this change in our next SDK release, which we expect to happen in the very near future.