[tarantool-patches] Re: [PATCH v2 1/2] lua: return getaddrinfo() errors

Alexander Turenko alexander.turenko at tarantool.org
Thu Aug 29 03:45:47 MSK 2019


I'm okay here with the minimalistic approach when we only change those
functions that were requested to be changed.

I have several comments below.

WBR, Alexander Turenko.

On Mon, Aug 05, 2019 at 04:36:46PM +0300, Roman Khabibov wrote:
> The problem that stated in the issue is about improper error message. I don't
> >>> understand the test in this context: it does not check whether it is so. It is
> >>> applicable to lua tests too.
> >>> I would add relevant test case for coio_getaddrinfo() into
> >>> test/unit/coio.cc.
> >> I added this for this reason.
> >> Why? Now, error message is proper, and I check it by grep:
> >> +-- gh-4138 Check getaddrinfo() error. Error code and error message
> >> +-- returned by getaddrinfo() depends on system's gai_strerror()
> >> +-- and compiler. So that there is no checking for certain error
> >> +-- message.
> >> +
> >> +s, err = socket:connect('non_exists_hostname:3301')
> >> +string.find(err, 'getaddrinfo') ~= nil
> >> +s, err = socket:bind('non_exists_hostname:3301')
> >> +string.find(err, 'getaddrinfo') ~= nil
> >> +
> >> Same about net.box test.
> > 
> > 
> > It does not depend on compiler, but on libc. Anyway, if there are two
> > variants of an error message, just check that it is one of them. I saw
> > this check in a previous verison of the patch (but it was checked
> > against three messages; don't know why).
> +-- gh-4138 Check getaddrinfo() error from socket:connect() only.
> +-- Error code and error message returned by getaddrinfo() depends
> +-- on system's gai_strerror(). So that there is no checking for
> +-- certain error message.

It checks for certain error messages now.

> +test_run:cmd("setopt delimiter ';'")
> +function check_err(err)
> +    if err == 'getaddrinfo: nodename nor servname provided, or not known' or
> +       err == 'getaddrinfo: Servname not supported for ai_socktype' or
> +       err == 'getaddrinfo: Name or service not known' then
> +            return true
> +    end
> +    return false
> +end;
> +test_run:cmd("setopt delimiter ''");
> +
> +s, err = socket:connect('non_exists_hostname:3301')
> +check_err(err) == true
> +
>  test_run:cmd("clear filter”)

It is better to verify it against ffi.C.gai_strerror(EAI_AGAIN)
ffi.C.gai_strerror(EAI_NONAME) and don't rely on the fact that those
messages don't changed across libc versions. The same for other tests.

> 
> >> 
> >>> Also applicable to similar changes in other lua functions.
> >>> 
> >>>>    for i, remote in pairs(dns) do
> >>>>        timeout = stop - fiber.clock()
> >>>>        if timeout <= 0 then
> >>>>            boxerrno(boxerrno.ETIMEDOUT)
> >>>> -            return nil
> >>>> +            return nil, 'timed out'
> >>>>        end
> >>>>        local s = socket_new(remote.family, remote.type, remote.protocol)
> >>>>        if s then
> >>> 
> >>> Do you really think this change is self-explanatory and it is obvious
> >>> why this is needed (and why this was not needed before)?
> >>> 
> >>> Yep, we discussed it voicely, but I'm not a last reader of your code.
> >> -            return nil
> >> +            -- Second arg added to do the behaviour of this
> >> +            -- function symmetrical in case of timeout on both
> >> +            -- Linux and Mac OS. On Mac OS error message 'timed
> >> +            -- out' is thrown by getaddrinfo(). On Linux the
> >> +            -- behaviour of getaddrinfo() in this case isn't same
> >> +            -- and timeout is checked here.
> >> +            return nil, 'timed out'
> > 
> > It is unclear why the difference between OSes exists here. I don't sure,
> > but this can be flaky behaviour. We check timeout in coio_task and set a
> > TimedOut diag (with 'timed out' message) for address resolving timeout.
> > Here we (most likely) will get timeout if connect() takes a long time. I
> > guess your fix is for the same problem that Mergen already fixed in
> > 79f876adee1e11961db7840381bea7a18fe18180 ('box: increase connection
> > timeout in "net.box.test.lua"'). If so this change is not needed
> > anymore.
> -socket.tcp_connect('127.0.0.1', 80, 0.00000000001)
> +-- Test timeout. In this test, tcp_connect can return the second
> +-- output value on Mac OS from internal.getaddrinfo (check for
> +-- timeout in coio_task and setting a TimedOut diag (with
> +-- 'timed out' message) for address resolving timeout). On Linux,
> +-- it is not occurred and time-out is checked in tcp_connect().
> +-- This difference has appeared after gh-4138 patch.
> +s, err = socket.tcp_connect('127.0.0.1', 80, 0.00000000001)
> +s == nil

I tested it: it is flaky on Mac OS: sometimes getaddrinfo is timed out,
sometimes connect. On Linux however getaddrinfo is fast enough to never
give timeout error in the case. But if we'll add a busy loop into
getaddrinfo_cb, then the situation becomes the same.

So I would not say that it is not appears on Linux: it is possible
theoretically. The root of the problem is not in OS differences, but
that there are two sources of timeout errors that are reported
differently. Please, describe the problem in OS agnostic way, because it
is not OS dependent by its nature.

I'm not against mention that **usually** Mac OS behaves in one way, but
Linux **usually** in another: you can mention it afterwards if you wish.

> diff --git a/src/lua/socket.c b/src/lua/socket.c
> index 130378caf..b4282335f 100644
> --- a/src/lua/socket.c
> +++ b/src/lua/socket.c
> @@ -774,6 +774,18 @@ lbox_getaddrinfo_result_wrapper(struct lua_State *L)
>  	return 1;
>  }
>  
> +/**
> + * Wrap coio_getaddrinfo() and call it. Push returned values onto
> + * @a L Lua stack. Raise a Lua error in case of error.

It does not raise an error usually. As I see (consider e04413b1d0) it
can raise an error only in case of 'not enough memory' Lua error. I
would ignore this and remove the sentence. You already described how
errors are reported (in retvals below), this is enough.

> + *
> + * @param L Lua stack.
> + *
> + * @retval 1 Number of returned values by Lua function if
> + * coio_getaddrinfo() success.
> + * @retval 2 Number of returned values by Lua function if
> + * coio_getaddrinfo() is failed (first is nil, second is error
> + * message).
> + */
>  static int
>  lbox_socket_getaddrinfo(struct lua_State *L)
>  {

> @@ -1360,10 +1363,9 @@ local function lsocket_tcp_connect(self, host, port)
>      -- This function is broken by design
>      local ga_opts = { family = 'AF_INET', type = 'SOCK_STREAM' }
>      local timeout = deadline - fiber.clock()
> -    local dns = getaddrinfo(host, port, timeout, ga_opts)
> +    local dns, err = getaddrinfo(host, port, timeout, ga_opts)
>      if dns == nil or #dns == 0 then
> -        self._errno = boxerrno.EINVAL
> -        return nil, socket_error(self)
> +        return nil, err

If dns is nil, then we should set self._errno to boxerrno() value and
return err.

If #dns is 0, then we should set self._errno to boxerrno() value too and
return socket_error(self).

> +-- gh-4138 Check getaddrinfo() error from socket:connect() only.
> +-- Error code and error message returned by getaddrinfo() depends
> +-- on system's gai_strerror(). So that there is no checking for
> +-- certain error message.
> +test_run:cmd("setopt delimiter ';'")
> +---
> +- true
> +...
> +function check_err(err)
> +    if err == 'getaddrinfo: nodename nor servname provided, or not known' or
> +       err == 'getaddrinfo: Servname not supported for ai_socktype' or
> +       err == 'getaddrinfo: Name or service not known' then
> +            return true
> +    end
> +    return false
> +end;
> +---
> +...
> +test_run:cmd("setopt delimiter ''");
> +---
> +- true
> +...
> +s, err = socket:connect('non_exists_hostname:3301')

socket.connect() should be called, the colon here is the mistake.

I think we should test both native and Lua Socket API, because both were
changed.

Should we ever change Lua Socket API? Please, verify it with a
documentation and an actual behaviour of the external module. We emulate
it, so we should be compatible.

I think it also worth to test socket.getaddrinfo() function.

> +---
> +...
> +check_err(err) == truea

== true is redundant.




More information about the Tarantool-patches mailing list