Tarantool development patches archive
 help / color / mirror / Atom feed
From: Alexander Turenko <alexander.turenko@tarantool.org>
To: Roman Khabibov <roman.habibov@tarantool.org>
Cc: tarantool-patches@freelists.org
Subject: [tarantool-patches] Re: [PATCH v2 1/2] lua: return getaddrinfo() errors
Date: Thu, 29 Aug 2019 03:45:47 +0300	[thread overview]
Message-ID: <20190829004547.k24fktai33tlbl73@tkn_work_nb> (raw)
In-Reply-To: <704142F3-802F-4C1C-8FA2-6155BFF9A951@tarantool.org>

I'm okay here with the minimalistic approach when we only change those
functions that were requested to be changed.

I have several comments below.

WBR, Alexander Turenko.

On Mon, Aug 05, 2019 at 04:36:46PM +0300, Roman Khabibov wrote:
> The problem that stated in the issue is about improper error message. I don't
> >>> understand the test in this context: it does not check whether it is so. It is
> >>> applicable to lua tests too.
> >>> I would add relevant test case for coio_getaddrinfo() into
> >>> test/unit/coio.cc.
> >> I added this for this reason.
> >> Why? Now, error message is proper, and I check it by grep:
> >> +-- gh-4138 Check getaddrinfo() error. Error code and error message
> >> +-- returned by getaddrinfo() depends on system's gai_strerror()
> >> +-- and compiler. So that there is no checking for certain error
> >> +-- message.
> >> +
> >> +s, err = socket:connect('non_exists_hostname:3301')
> >> +string.find(err, 'getaddrinfo') ~= nil
> >> +s, err = socket:bind('non_exists_hostname:3301')
> >> +string.find(err, 'getaddrinfo') ~= nil
> >> +
> >> Same about net.box test.
> > 
> > 
> > It does not depend on compiler, but on libc. Anyway, if there are two
> > variants of an error message, just check that it is one of them. I saw
> > this check in a previous verison of the patch (but it was checked
> > against three messages; don't know why).
> +-- gh-4138 Check getaddrinfo() error from socket:connect() only.
> +-- Error code and error message returned by getaddrinfo() depends
> +-- on system's gai_strerror(). So that there is no checking for
> +-- certain error message.

It checks for certain error messages now.

> +test_run:cmd("setopt delimiter ';'")
> +function check_err(err)
> +    if err == 'getaddrinfo: nodename nor servname provided, or not known' or
> +       err == 'getaddrinfo: Servname not supported for ai_socktype' or
> +       err == 'getaddrinfo: Name or service not known' then
> +            return true
> +    end
> +    return false
> +end;
> +test_run:cmd("setopt delimiter ''");
> +
> +s, err = socket:connect('non_exists_hostname:3301')
> +check_err(err) == true
> +
>  test_run:cmd("clear filter”)

It is better to verify it against ffi.C.gai_strerror(EAI_AGAIN)
ffi.C.gai_strerror(EAI_NONAME) and don't rely on the fact that those
messages don't changed across libc versions. The same for other tests.

> 
> >> 
> >>> Also applicable to similar changes in other lua functions.
> >>> 
> >>>>    for i, remote in pairs(dns) do
> >>>>        timeout = stop - fiber.clock()
> >>>>        if timeout <= 0 then
> >>>>            boxerrno(boxerrno.ETIMEDOUT)
> >>>> -            return nil
> >>>> +            return nil, 'timed out'
> >>>>        end
> >>>>        local s = socket_new(remote.family, remote.type, remote.protocol)
> >>>>        if s then
> >>> 
> >>> Do you really think this change is self-explanatory and it is obvious
> >>> why this is needed (and why this was not needed before)?
> >>> 
> >>> Yep, we discussed it voicely, but I'm not a last reader of your code.
> >> -            return nil
> >> +            -- Second arg added to do the behaviour of this
> >> +            -- function symmetrical in case of timeout on both
> >> +            -- Linux and Mac OS. On Mac OS error message 'timed
> >> +            -- out' is thrown by getaddrinfo(). On Linux the
> >> +            -- behaviour of getaddrinfo() in this case isn't same
> >> +            -- and timeout is checked here.
> >> +            return nil, 'timed out'
> > 
> > It is unclear why the difference between OSes exists here. I don't sure,
> > but this can be flaky behaviour. We check timeout in coio_task and set a
> > TimedOut diag (with 'timed out' message) for address resolving timeout.
> > Here we (most likely) will get timeout if connect() takes a long time. I
> > guess your fix is for the same problem that Mergen already fixed in
> > 79f876adee1e11961db7840381bea7a18fe18180 ('box: increase connection
> > timeout in "net.box.test.lua"'). If so this change is not needed
> > anymore.
> -socket.tcp_connect('127.0.0.1', 80, 0.00000000001)
> +-- Test timeout. In this test, tcp_connect can return the second
> +-- output value on Mac OS from internal.getaddrinfo (check for
> +-- timeout in coio_task and setting a TimedOut diag (with
> +-- 'timed out' message) for address resolving timeout). On Linux,
> +-- it is not occurred and time-out is checked in tcp_connect().
> +-- This difference has appeared after gh-4138 patch.
> +s, err = socket.tcp_connect('127.0.0.1', 80, 0.00000000001)
> +s == nil

I tested it: it is flaky on Mac OS: sometimes getaddrinfo is timed out,
sometimes connect. On Linux however getaddrinfo is fast enough to never
give timeout error in the case. But if we'll add a busy loop into
getaddrinfo_cb, then the situation becomes the same.

So I would not say that it is not appears on Linux: it is possible
theoretically. The root of the problem is not in OS differences, but
that there are two sources of timeout errors that are reported
differently. Please, describe the problem in OS agnostic way, because it
is not OS dependent by its nature.

I'm not against mention that **usually** Mac OS behaves in one way, but
Linux **usually** in another: you can mention it afterwards if you wish.

> diff --git a/src/lua/socket.c b/src/lua/socket.c
> index 130378caf..b4282335f 100644
> --- a/src/lua/socket.c
> +++ b/src/lua/socket.c
> @@ -774,6 +774,18 @@ lbox_getaddrinfo_result_wrapper(struct lua_State *L)
>  	return 1;
>  }
>  
> +/**
> + * Wrap coio_getaddrinfo() and call it. Push returned values onto
> + * @a L Lua stack. Raise a Lua error in case of error.

It does not raise an error usually. As I see (consider e04413b1d0) it
can raise an error only in case of 'not enough memory' Lua error. I
would ignore this and remove the sentence. You already described how
errors are reported (in retvals below), this is enough.

> + *
> + * @param L Lua stack.
> + *
> + * @retval 1 Number of returned values by Lua function if
> + * coio_getaddrinfo() success.
> + * @retval 2 Number of returned values by Lua function if
> + * coio_getaddrinfo() is failed (first is nil, second is error
> + * message).
> + */
>  static int
>  lbox_socket_getaddrinfo(struct lua_State *L)
>  {

> @@ -1360,10 +1363,9 @@ local function lsocket_tcp_connect(self, host, port)
>      -- This function is broken by design
>      local ga_opts = { family = 'AF_INET', type = 'SOCK_STREAM' }
>      local timeout = deadline - fiber.clock()
> -    local dns = getaddrinfo(host, port, timeout, ga_opts)
> +    local dns, err = getaddrinfo(host, port, timeout, ga_opts)
>      if dns == nil or #dns == 0 then
> -        self._errno = boxerrno.EINVAL
> -        return nil, socket_error(self)
> +        return nil, err

If dns is nil, then we should set self._errno to boxerrno() value and
return err.

If #dns is 0, then we should set self._errno to boxerrno() value too and
return socket_error(self).

> +-- gh-4138 Check getaddrinfo() error from socket:connect() only.
> +-- Error code and error message returned by getaddrinfo() depends
> +-- on system's gai_strerror(). So that there is no checking for
> +-- certain error message.
> +test_run:cmd("setopt delimiter ';'")
> +---
> +- true
> +...
> +function check_err(err)
> +    if err == 'getaddrinfo: nodename nor servname provided, or not known' or
> +       err == 'getaddrinfo: Servname not supported for ai_socktype' or
> +       err == 'getaddrinfo: Name or service not known' then
> +            return true
> +    end
> +    return false
> +end;
> +---
> +...
> +test_run:cmd("setopt delimiter ''");
> +---
> +- true
> +...
> +s, err = socket:connect('non_exists_hostname:3301')

socket.connect() should be called, the colon here is the mistake.

I think we should test both native and Lua Socket API, because both were
changed.

Should we ever change Lua Socket API? Please, verify it with a
documentation and an actual behaviour of the external module. We emulate
it, so we should be compatible.

I think it also worth to test socket.getaddrinfo() function.

> +---
> +...
> +check_err(err) == truea

== true is redundant.

  reply	other threads:[~2019-08-29  0:46 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-15 16:58 [tarantool-patches] [PATCH] " Roman Khabibov
2019-06-23 20:31 ` [tarantool-patches] " Alexander Turenko
2019-06-25 13:38   ` [tarantool-patches] Re: [PATCH v2 1/2] " Roman Khabibov
2019-07-09  8:04     ` Alexander Turenko
2019-07-10  2:16       ` Roman Khabibov
2019-07-23 12:39         ` Alexander Turenko
2019-07-26 13:48           ` Alexander Turenko
2019-08-05 13:36           ` Roman Khabibov
2019-08-29  0:45             ` Alexander Turenko [this message]
     [not found]               ` <868EAF2C-A491-46C9-AD37-7512D6CAB213@tarantool.org>
2019-09-06 13:45                 ` [tarantool-patches] Re: [server-dev] " Alexander Turenko
2019-09-10 12:54                   ` Roman Khabibov
2019-11-01 14:39                     ` [Tarantool-patches] [server-dev] Re: [tarantool-patches] " Alexander Turenko
2019-11-21 17:27                       ` [Tarantool-patches] [tarantool-patches] [server-dev] " Roman Khabibov
2019-12-08 19:48                         ` Alexander Turenko
2019-12-10 16:25                           ` Roman Khabibov
2019-12-18 14:58                             ` Alexander Turenko
2019-12-21 17:50                               ` Roman Khabibov
2019-12-23 13:36                                 ` Alexander Turenko
2019-12-26 17:29                                   ` Roman Khabibov
2020-02-18 13:55                                     ` [Tarantool-patches] Fwd: " Roman Khabibov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190829004547.k24fktai33tlbl73@tkn_work_nb \
    --to=alexander.turenko@tarantool.org \
    --cc=roman.habibov@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='[tarantool-patches] Re: [PATCH v2 1/2] lua: return getaddrinfo() errors' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox