[tarantool-patches] Re: [PATCH v2 2/2] say: take getaddrinfo() errors into account
Alexander Turenko
alexander.turenko at tarantool.org
Thu Aug 29 00:34:31 MSK 2019
This particular patch is mostly okay, but I would work a bit more on
tests and minor details.
Please consider the comments below.
WBR, Alexander Turenko.
On Mon, Aug 05, 2019 at 04:32:41PM +0300, Roman Khabibov wrote:
>
>
> > On Jul 23, 2019, at 17:52, Alexander Turenko <alexander.turenko at tarantool.org> wrote:
> >
> >> @@ -594,6 +594,7 @@ log_syslog_init(struct log *log, const char *init_str)
> >> log->fd = log_syslog_connect(log);
> >> if (log->fd < 0) {
> >> /* syslog indent is freed in atexit(). */
> >> + diag_log();
> >
> > 1. It is need to be properly commented: we need to log a diagnostics
> > here until stacked diagnostics will be implemented (#1148).
> >
> > 2. syslog_connect_unix() does not set a diag, diag_log() will lead to an
> > assertion fail.
> >
> > 3. I would mention this change in the commit message, because it is not
> > part of the problem with Mac OS you described in it.
> @@ -506,10 +506,15 @@ syslog_connect_remote(const char *server_address)
> hints.ai_protocol = IPPROTO_UDP;
>
> ret = getaddrinfo(remote, portnum, &hints, &inf);
> - if (ret < 0) {
> + if (ret != 0) {
> errno = EIO;
> diag_set(SystemError, "getaddrinfo: %s",
> gai_strerror(ret));
> + /*
> + * We need to log a diagnostics here until stacked
> + * diagnostics will be implemented (#1148).
> + */
> + diag_log();
It is not the only error that is possible in this function, but others
will not be logged. I think the logging should be added before replacing
the diagnostic with the next one:
| --- a/src/lib/core/say.c
| +++ b/src/lib/core/say.c
| @@ -593,6 +593,7 @@ log_syslog_init(struct log *log, const char *init_str)
| say_free_syslog_opts(&opts);
| log->fd = log_syslog_connect(log);
| if (log->fd < 0) {
| + /* XXX: comment. */
| + diag_log();
| /* syslog indent is freed in atexit(). */
| diag_set(SystemError, "syslog logger: %s", strerror(errno));
| return -1;
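To illustrate why the order matters, here is a standalone sketch. It is not Tarantool code: `toy_diag`, `toy_diag_set()`, `toy_diag_log()` and `toy_syslog_init_failure()` are made-up names that model a diagnostics area holding only one error at a time, which is my assumption about the behaviour until #1148 is done.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical single-slot diagnostics area (a stand-in, not the real diag). */
struct toy_diag {
	char msg[128];
};

/* Replace the stored diagnostic; the previous one is lost. */
static void
toy_diag_set(struct toy_diag *diag, const char *msg)
{
	snprintf(diag->msg, sizeof(diag->msg), "%s", msg);
}

/* Append the current diagnostic to a text buffer that plays the role of a log. */
static void
toy_diag_log(const struct toy_diag *diag, char *logbuf, size_t logbuf_size)
{
	size_t used = strlen(logbuf);
	if (used >= logbuf_size)
		return;
	snprintf(logbuf + used, logbuf_size - used, "%s\n", diag->msg);
}

/*
 * Model of the failure path: the low-level error is logged first,
 * and only then replaced by the more general one.
 */
static void
toy_syslog_init_failure(struct toy_diag *diag, char *logbuf,
			size_t logbuf_size)
{
	toy_diag_set(diag, "getaddrinfo: name resolution failed");
	toy_diag_log(diag, logbuf, logbuf_size);
	toy_diag_set(diag, "syslog logger: Input/output error");
}
```

After toy_syslog_init_failure() the log buffer still holds the getaddrinfo message, while the diagnostics area holds only the later 'syslog logger' one. Logging at the single place where the diagnostic gets replaced covers every error that can reach that point, not just the getaddrinfo() one.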
> commit f5b19e933fbf2eb3784a0c3cc28d999a3fa85abe
> Author: Roman Khabibov <roman.habibov at tarantool.org>
> Date: Tue Jul 30 15:39:21 2019 +0300
>
> coio/say: take getaddrinfo() errors into account
I would say that this fix is for Mac OS in the commit header if
possible. Say, 'coio/say: fix getaddrinfo error handling on Mac OS'.
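For context: POSIX only promises that getaddrinfo() returns zero on success and a non-zero EAI_* code on failure. The sign of those codes differs between libc implementations (negative in glibc, positive on Mac OS), so only a `!= 0` check is portable. A minimal standalone sketch follows; `resolve_numeric()` is a hypothetical helper, not something from the patch, and AI_NUMERICHOST is used only to keep the example independent of DNS.

```c
/* Expose getaddrinfo() and snprintf() under strict -std=c99/c11 builds. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <netdb.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: resolve a numeric host address.
 * Returns 0 on success, -1 on failure with a message in errbuf.
 */
static int
resolve_numeric(const char *host, const char *port,
		char *errbuf, size_t errbuf_size)
{
	struct addrinfo hints;
	struct addrinfo *res = NULL;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;
	hints.ai_socktype = SOCK_DGRAM;
	/* Accept only numeric addresses, so no DNS lookup is performed. */
	hints.ai_flags = AI_NUMERICHOST;

	int rc = getaddrinfo(host, port, &hints, &res);
	/* Portable check: any non-zero value is an error, whatever its sign. */
	if (rc != 0) {
		snprintf(errbuf, errbuf_size, "getaddrinfo: %s",
			 gai_strerror(rc));
		return -1;
	}
	freeaddrinfo(res);
	return 0;
}
```

With `rc < 0` in place of `rc != 0`, the error branch above would be skipped on a platform where EAI_* codes are positive, and `res` would be used although it was never filled in.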
>
> Before this patch, branch when getaddrinfo() returns error codes
> couldn't be reached on Mac OS, because they are greater than 0 on
> Mac OS (assumption "rc < 0" in commit ea1da04 is incorrect for
> Mac OS).
>
> * diag log() in say.c was added, because we need to log a
> diagnostics here until stacked diagnostics will be implemented
> (#1148).
I would say: because otherwise it will be hidden by the following
diagnostic, and then say that it should be handled in a better way after
#1148.
The asterisk here is redundant, because it is not a list.
>
> Need for #4138
>
> diff --git a/src/lib/core/coio_task.c b/src/lib/core/coio_task.c
> index 908b336ed..83f669d05 100644
> --- a/src/lib/core/coio_task.c
> +++ b/src/lib/core/coio_task.c
> @@ -413,7 +413,7 @@ coio_getaddrinfo(const char *host, const char *port,
> return -1; /* timed out or cancelled */
>
> /* Task finished */
> - if (task->rc < 0) {
> + if (task->rc != 0) {
> /* getaddrinfo() failed */
> errno = EIO;
> diag_set(SystemError, "getaddrinfo: %s",
> diff --git a/src/lib/core/say.c b/src/lib/core/say.c
> index 0b2cf2c34..a45595443 100644
> --- a/src/lib/core/say.c
> +++ b/src/lib/core/say.c
> @@ -506,10 +506,15 @@ syslog_connect_remote(const char *server_address)
> hints.ai_protocol = IPPROTO_UDP;
>
> ret = getaddrinfo(remote, portnum, &hints, &inf);
> - if (ret < 0) {
> + if (ret != 0) {
> errno = EIO;
> diag_set(SystemError, "getaddrinfo: %s",
> gai_strerror(ret));
> + /*
> + * We need to log a diagnostics here until stacked
> + * diagnostics will be implemented (#1148).
> + */
> + diag_log();
> goto out;
> }
> for (ptr = inf; ptr; ptr = ptr->ai_next) {
> diff --git a/test/box-tap/cfg.test.lua b/test/box-tap/cfg.test.lua
> index 55de5e41c..d92e9140d 100755
> --- a/test/box-tap/cfg.test.lua
> +++ b/test/box-tap/cfg.test.lua
> @@ -6,7 +6,7 @@ local socket = require('socket')
> local fio = require('fio')
> local uuid = require('uuid')
> local msgpack = require('msgpack')
> -test:plan(104)
> +test:plan(105)
>
> --------------------------------------------------------------------------------
> -- Invalid values
> @@ -566,6 +566,23 @@ os.exit(0)
> ]]
> test:is(run_script(code), 0, "log_nonblock")
>
> +--
> +-- gh-4138: check getaddrinfo() error and panic after that.
> +--
> +code=[[
> +local socket = require('socket')
> +local log = require('log')
> +local fio = require('fio')
> +
> +path = fio.pathjoin(fio.cwd(), 'log_unix_socket_test.sock')
> +unix_socket = socket('AF_UNIX', 'SOCK_DGRAM', 0)
> +unix_socket:bind('unix/', path)
> +
> +opt = string.format("syslog:server=non_exists_hostname:%s,identity=tarantool", path)
> +box.cfg{log = opt, log_nonblock=true}
log_nonblock is not needed here, so it is better to remove it.
box.cfg{log = 'syslog:server=non_exists_hostname:3301'} is enough: there is
no need to form a file path, set an identity, or require socket, log and fio.
The test passes even before the patch, so what is it intended to test? I
think we should write a test that verifies stderr output to find all log
messages we expect to appear in this case:
Linux:
| SystemError getaddrinfo: Temporary failure in name resolution: Input/output error
| SystemError syslog logger: Input/output error: Input/output error
| failed to initialize logging subsystem
gai_strerror() message corresponds to EAI_AGAIN.
Mac OS:
| SystemError getaddrinfo: nodename nor servname provided, or not known: Input/output error
| SystemError syslog logger: Input/output error: Input/output error
| failed to initialize logging subsystem
gai_strerror() message corresponds to EAI_NONAME.
I propose to call ffi.C.gai_strerror() right from the test to form two
error messages and verify that the actual output matches one of them.
If it is hard to catch stderr, then let's proceed w/o this test. However
I think it is doable.
I also propose to test error messages in a similar way (using
ffi.C.gai_strerror(EAI_AGAIN) and ffi.C.gai_strerror(EAI_NONAME)) in
the test cases in the second patch of the patchset.
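The check proposed above could look roughly like the following, written in plain C rather than Lua/FFI just to show the idea. `log_has_expected_gai_error()` is a hypothetical helper; the message format is copied from the log lines quoted above, and how the stderr text gets captured is left out.

```c
/* Expose gai_strerror() and snprintf() under strict -std=c99/c11 builds. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <netdb.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return 1 if the captured log text contains the
 * getaddrinfo error line for EAI_AGAIN or EAI_NONAME, 0 otherwise.
 */
static int
log_has_expected_gai_error(const char *log_output)
{
	const int codes[] = { EAI_AGAIN, EAI_NONAME };
	char expected[512];

	for (size_t i = 0; i < sizeof(codes) / sizeof(codes[0]); i++) {
		/* Build the expected line from the platform's own message. */
		snprintf(expected, sizeof(expected),
			 "SystemError getaddrinfo: %s: Input/output error",
			 gai_strerror(codes[i]));
		if (strstr(log_output, expected) != NULL)
			return 1;
	}
	return 0;
}
```

Taking the text from gai_strerror() instead of hardcoding it keeps the expectation valid on both Linux and Mac OS, where the wording differs.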
> +]]
> +test:is(run_script(code), PANIC, "log_nonblock")
> +
> --
> -- Crash (instead of panic) when trying to recover a huge tuple.
> --
> diff --git a/test/unit/coio.cc b/test/unit/coio.cc
> index bb8bd7131..a70d3254d 100644
> --- a/test/unit/coio.cc
> +++ b/test/unit/coio.cc
> @@ -72,7 +72,7 @@ static void
> test_getaddrinfo(void)
> {
> header();
> - plan(1);
> + plan(3);
> const char *host = "127.0.0.1";
> const char *port = "3333";
> struct addrinfo *i;
> @@ -81,6 +81,12 @@ test_getaddrinfo(void)
> is(rc, 0, "getaddrinfo");
> freeaddrinfo(i);
>
> + /* gh-4138: Check getaddrinfo() error. */
> + isnt(coio_getaddrinfo("non_exists_hostname", port, NULL, &i, 1), 0,
> + "getaddrinfo error");
I would say 'getaddrinfo retval' instead of 'getaddrinfo error'.
Use `rc = coio_getaddrinfo(<...>)` as above within this function; it
is easier to read.
> + isnt(strstr(diag_get()->last->errmsg, "getaddrinfo"), NULL,
> + "getaddrinfo error message");
> +
I propose to verify the entire error message using
gai_strerror(EAI_AGAIN) and gai_strerror(EAI_NONAME), just as proposed
above for a log message.
> /*
> * gh-4209: 0 timeout should not be a special value and
> * detach a task. Before a fix it led to segfault
> diff --git a/test/unit/coio.result b/test/unit/coio.result
> index 5019fa48a..49759b747 100644
> --- a/test/unit/coio.result
> +++ b/test/unit/coio.result
> @@ -7,6 +7,8 @@
> # call done with res 0
> *** test_call_f: done ***
> *** test_getaddrinfo ***
> -1..1
> +1..3
> ok 1 - getaddrinfo
> +ok 2 - getaddrinfo error
> +ok 3 - getaddrinfo error message
> *** test_getaddrinfo: done ***
>