[tarantool-patches] Re: [PATCH v2 1/2] lua: return getaddrinfo() errors

Alexander Turenko alexander.turenko at tarantool.org
Tue Jul 23 15:39:56 MSK 2019


Please, split Mac OS specific fix (I think both coio_task.c and say.c
can be fixed in one commit) from OS independent Lua change (that just
pass received error to a user).

Returning either `nil` or `nil, err` still look strange for me. Please,
try to implement the following way: set an additional field in a socket
object (we have '_errno', let's add another one: say, '_error') with a
message from a diagnostics and return it from socket_error() if it
exists. Don't forget to flush it when update any of those two fields (I
guess an auxiliary function to set _errno and _error).

Don't sure how should we check whether a diag is set. Maybe flush it
before a call to C and then check whether it was set afterwards.

There are places where strerror() is called directly to return `nil,
err`. They should be replaced with socket_error().

I think it worth to introduce socket_clear_error() and
socket_set_error() in a separate commit w/o any logic change.

CCed Vladimir, because we discussed possible approaches with him.

WBR, Alexander Turenko.

> >>> The commit message should mention Roman's commit and cleanly state that
> >>> (rc < 0) assumption does not work on Mac OS.
> > 
> > This is not done.
>     lua: return getaddrinfo() errors
>     
>     Before this patch, branch when getaddrinfo() returns error codes
>     couldn't be reached on Mac OS, because they are greater than 0 on
>     Mac OS (assumption "rc < 0" in commit ea1da04d5 is incorret for
>     Mac OS). Add this errors into the socket and the net.box modules.
>     Now getaddrinfo() can return a pair of values (nil and error
>     message) in case of error.
>     
>     Closes #4138

Typo: incorret -> incorrect.

> > The problem that stated in the issue is about improper error message. I don't
> > understand the test in this context: it does not check whether it is so. It is
> > applicable to lua tests too.
> >I would add relevant test case for coio_getaddrinfo() into
> >test/unit/coio.cc.
> I added this for this reason.
> Why? Now, error message is proper, and I check it by grep:
> +-- gh-4138 Check getaddrinfo() error. Error code and error message
> +-- returned by getaddrinfo() depends on system's gai_strerror()
> +-- and compiler. So that there is no checking for certain error
> +-- message.
> +
> +s, err = socket:connect('non_exists_hostname:3301')
> +string.find(err, 'getaddrinfo') ~= nil
> +s, err = socket:bind('non_exists_hostname:3301')
> +string.find(err, 'getaddrinfo') ~= nil
> +
> Same about net.box test.


It does not depend on compiler, but on libc. Anyway, if there are two
variants of an error message, just check that it is one of them. I saw
this check in a previous verison of the patch (but it was checked
against three messages; don't know why).

> >> @@ -1534,9 +1537,12 @@ local function lsocket_connect(host, port)
> >>     if host == nil or port == nil then
> >>         error("Usage: luasocket.connect(host, port)")
> >>     end
> >> -    local s = tcp_connect(host, port)
> >> +    local s, err = tcp_connect(host, port)
> >>     if not s then
> >> -        return nil, boxerrno.strerror()
> >> +        if not err then
> > 
> >> +            return nil, boxerrno.strerror()
> >> +        end
> >> +        return nil, err
> > 
> > Why not always set errno to EINVAL and return err, nil in case on an error? It
> > seems the 'if' here is redundant.
> I think that if we remove this ‘if', the behaviour of these methods
> (lsocket_connect(), lsocket_bind()) will be strange. In some cases error
> message is present, in some cases it is not, because tcp_connect() can return
> just nil without err.

Hope this will become non-actual with the way proposed above.

> 
> > Also applicable to similar changes in other lua functions.
> > 
> >>     for i, remote in pairs(dns) do
> >>         timeout = stop - fiber.clock()
> >>         if timeout <= 0 then
> >>             boxerrno(boxerrno.ETIMEDOUT)
> >> -            return nil
> >> +            return nil, 'timed out'
> >>         end
> >>         local s = socket_new(remote.family, remote.type, remote.protocol)
> >>         if s then
> > 
> > Do you really think this change is self-explanatory and it is obvious
> > why this is needed (and why this was not needed before)?
> > 
> > Yep, we discussed it voicely, but I'm not a last reader of your code.
> -            return nil
> +            -- Second arg added to do the behaviour of this
> +            -- function symmetrical in case of timeout on both
> +            -- Linux and Mac OS. On Mac OS error message 'timed
> +            -- out' is thrown by getaddrinfo(). On Linux the
> +            -- behaviour of getaddrinfo() in this case isn't same
> +            -- and timeout is checked here.
> +            return nil, 'timed out'

It is unclear why the difference between OSes exists here. I don't sure,
but this can be flaky behaviour. We check timeout in coio_task and set a
TimedOut diag (with 'timed out' message) for address resolving timeout.
Here we (most likely) will get timeout if connect() takes a long time. I
guess your fix is for the same problem that Mergen already fixed in
79f876adee1e11961db7840381bea7a18fe18180 ('box: increase connection
timeout in "net.box.test.lua"'). If so this change is not needed
anymore.



More information about the Tarantool-patches mailing list