[tarantool-patches] Re: [PATCH v2 1/2] lua: return getaddrinfo() errors

Roman Khabibov roman.habibov at tarantool.org
Mon Aug 5 16:36:46 MSK 2019


The problem that stated in the issue is about improper error message. I don't
>>> understand the test in this context: it does not check whether it is so. It is
>>> applicable to lua tests too.
>>> I would add relevant test case for coio_getaddrinfo() into
>>> test/unit/coio.cc.
>> I added this for this reason.
>> Why? Now, error message is proper, and I check it by grep:
>> +-- gh-4138 Check getaddrinfo() error. Error code and error message
>> +-- returned by getaddrinfo() depends on system's gai_strerror()
>> +-- and compiler. So that there is no checking for certain error
>> +-- message.
>> +
>> +s, err = socket:connect('non_exists_hostname:3301')
>> +string.find(err, 'getaddrinfo') ~= nil
>> +s, err = socket:bind('non_exists_hostname:3301')
>> +string.find(err, 'getaddrinfo') ~= nil
>> +
>> Same about net.box test.
> 
> 
> It does not depend on compiler, but on libc. Anyway, if there are two
> variants of an error message, just check that it is one of them. I saw
> this check in a previous verison of the patch (but it was checked
> against three messages; don't know why).
+-- gh-4138 Check getaddrinfo() error from socket:connect() only.
+-- Error code and error message returned by getaddrinfo() depends
+-- on system's gai_strerror(). So that there is no checking for
+-- certain error message.
+test_run:cmd("setopt delimiter ';'")
+function check_err(err)
+    if err == 'getaddrinfo: nodename nor servname provided, or not known' or
+       err == 'getaddrinfo: Servname not supported for ai_socktype' or
+       err == 'getaddrinfo: Name or service not known' then
+            return true
+    end
+    return false
+end;
+test_run:cmd("setopt delimiter ''");
+
+s, err = socket:connect('non_exists_hostname:3301')
+check_err(err) == true
+
 test_run:cmd("clear filter”)

>> 
>>> Also applicable to similar changes in other lua functions.
>>> 
>>>>    for i, remote in pairs(dns) do
>>>>        timeout = stop - fiber.clock()
>>>>        if timeout <= 0 then
>>>>            boxerrno(boxerrno.ETIMEDOUT)
>>>> -            return nil
>>>> +            return nil, 'timed out'
>>>>        end
>>>>        local s = socket_new(remote.family, remote.type, remote.protocol)
>>>>        if s then
>>> 
>>> Do you really think this change is self-explanatory and it is obvious
>>> why this is needed (and why this was not needed before)?
>>> 
>>> Yep, we discussed it voicely, but I'm not a last reader of your code.
>> -            return nil
>> +            -- Second arg added to do the behaviour of this
>> +            -- function symmetrical in case of timeout on both
>> +            -- Linux and Mac OS. On Mac OS error message 'timed
>> +            -- out' is thrown by getaddrinfo(). On Linux the
>> +            -- behaviour of getaddrinfo() in this case isn't same
>> +            -- and timeout is checked here.
>> +            return nil, 'timed out'
> 
> It is unclear why the difference between OSes exists here. I don't sure,
> but this can be flaky behaviour. We check timeout in coio_task and set a
> TimedOut diag (with 'timed out' message) for address resolving timeout.
> Here we (most likely) will get timeout if connect() takes a long time. I
> guess your fix is for the same problem that Mergen already fixed in
> 79f876adee1e11961db7840381bea7a18fe18180 ('box: increase connection
> timeout in "net.box.test.lua"'). If so this change is not needed
> anymore.
-socket.tcp_connect('127.0.0.1', 80, 0.00000000001)
+-- Test timeout. In this test, tcp_connect can return the second
+-- output value on Mac OS from internal.getaddrinfo (check for
+-- timeout in coio_task and setting a TimedOut diag (with
+-- 'timed out' message) for address resolving timeout). On Linux,
+-- it is not occurred and time-out is checked in tcp_connect().
+-- This difference has appeared after gh-4138 patch.
+s, err = socket.tcp_connect('127.0.0.1', 80, 0.00000000001)
+s == nil


commit 1e7d16ffa5c1908cac83ddee4e0f6cf42fce55e5
Author: Roman Khabibov <roman.habibov at tarantool.org>
Date:   Tue Jul 30 15:49:13 2019 +0300

    lua: return getaddrinfo() errors
    
    Add getaddrinfo() errors into the several fuctions of socket. Now
    getaddrinfo() can return a pair of values (nil and error message)
    in case of error.
    
    Closes #4138

diff --git a/src/box/lua/net_box.lua b/src/box/lua/net_box.lua
index 31a8c16b7..beab3b0a7 100644
--- a/src/box/lua/net_box.lua
+++ b/src/box/lua/net_box.lua
@@ -150,9 +150,12 @@ local function next_id(id) return band(id + 1, 0x7FFFFFFF) end
 local function establish_connection(host, port, timeout)
     local timeout = timeout or DEFAULT_CONNECT_TIMEOUT
     local begin = fiber.clock()
-    local s = socket.tcp_connect(host, port, timeout)
+    local s, err = socket.tcp_connect(host, port, timeout)
     if not s then
-        return nil, errno.strerror(errno())
+        if not err then
+            return nil, errno.strerror(errno())
+        end
+        return nil, err
     end
     local msg = s:read({chunk = IPROTO_GREETING_SIZE},
                         timeout - (fiber.clock() - begin))
diff --git a/src/lua/socket.c b/src/lua/socket.c
index 130378caf..b4282335f 100644
--- a/src/lua/socket.c
+++ b/src/lua/socket.c
@@ -774,6 +774,18 @@ lbox_getaddrinfo_result_wrapper(struct lua_State *L)
 	return 1;
 }
 
+/**
+ * Wrap coio_getaddrinfo() and call it. Push returned values onto
+ * @a L Lua stack. Raise a Lua error in case of error.
+ *
+ * @param L Lua stack.
+ *
+ * @retval 1 Number of returned values by Lua function if
+ * coio_getaddrinfo() success.
+ * @retval 2 Number of returned values by Lua function if
+ * coio_getaddrinfo() is failed (first is nil, second is error
+ * message).
+ */
 static int
 lbox_socket_getaddrinfo(struct lua_State *L)
 {
@@ -816,7 +828,9 @@ lbox_socket_getaddrinfo(struct lua_State *L)
 
 	if (dns_res != 0) {
 		lua_pushnil(L);
-		return 1;
+		struct error *err = diag_get()->last;
+		lua_pushstring(L, err->errmsg);
+		return 2;
 	}
 
 	/* no results */
diff --git a/src/lua/socket.lua b/src/lua/socket.lua
index a334ad45b..324c5fbb5 100644
--- a/src/lua/socket.lua
+++ b/src/lua/socket.lua
@@ -1041,9 +1041,12 @@ local function tcp_connect(host, port, timeout)
     end
     local timeout = timeout or TIMEOUT_INFINITY
     local stop = fiber.clock() + timeout
-    local dns = getaddrinfo(host, port, timeout, { type = 'SOCK_STREAM',
+    local dns, err = getaddrinfo(host, port, timeout, { type = 'SOCK_STREAM',
         protocol = 'tcp' })
-    if dns == nil or #dns == 0 then
+    if dns == nil then
+        return nil, err
+    end
+    if #dns == 0 then
         boxerrno(boxerrno.EINVAL)
         return nil
     end
@@ -1360,10 +1363,9 @@ local function lsocket_tcp_connect(self, host, port)
     -- This function is broken by design
     local ga_opts = { family = 'AF_INET', type = 'SOCK_STREAM' }
     local timeout = deadline - fiber.clock()
-    local dns = getaddrinfo(host, port, timeout, ga_opts)
+    local dns, err = getaddrinfo(host, port, timeout, ga_opts)
     if dns == nil or #dns == 0 then
-        self._errno = boxerrno.EINVAL
-        return nil, socket_error(self)
+        return nil, err
     end
     for _, remote in ipairs(dns) do
         timeout = deadline - fiber.clock()
@@ -1547,9 +1549,12 @@ local function lsocket_connect(host, port)
     if host == nil or port == nil then
         error("Usage: luasocket.connect(host, port)")
     end
-    local s = tcp_connect(host, port)
+    local s, err = tcp_connect(host, port)
     if not s then
-        return nil, boxerrno.strerror()
+        if not err then
+            return nil, boxerrno.strerror()
+        end
+        return nil, err
     end
     setmetatable(s, lsocket_tcp_client_mt)
     return s
diff --git a/test/app/socket.result b/test/app/socket.result
index 77eff7370..a361c378e 100644
--- a/test/app/socket.result
+++ b/test/app/socket.result
@@ -941,10 +941,18 @@ sc:close()
 - true
 ...
 -- tcp_connect
--- test timeout
-socket.tcp_connect('127.0.0.1', 80, 0.00000000001)
+-- Test timeout. In this test, tcp_connect can return the second
+-- output value on Mac OS from internal.getaddrinfo (check for
+-- timeout in coio_task and setting a TimedOut diag (with
+-- 'timed out' message) for address resolving timeout). On Linux,
+-- it is not occurred and time-out is checked in tcp_connect().
+-- This difference has appeared after gh-4138 patch.
+s, err = socket.tcp_connect('127.0.0.1', 80, 0.00000000001)
 ---
-- null
+...
+s == nil
+---
+- true
 ...
 -- AF_INET
 s = socket('AF_INET', 'SOCK_STREAM', 'tcp')
@@ -1664,7 +1672,7 @@ fio.stat(path) == nil
 { socket.tcp_connect('abrakadabra#123') == nil, errno.strerror() }
 ---
 - - true
-  - Invalid argument
+  - Input/output error
 ...
 -- wrong options for getaddrinfo
 socket.getaddrinfo('host', 'port', { type = 'WRONG' }) == nil and errno() == errno.EINVAL
@@ -2870,6 +2878,35 @@ server:close()
 ---
 - true
 ...
+-- gh-4138 Check getaddrinfo() error from socket:connect() only.
+-- Error code and error message returned by getaddrinfo() depends
+-- on system's gai_strerror(). So that there is no checking for
+-- certain error message.
+test_run:cmd("setopt delimiter ';'")
+---
+- true
+...
+function check_err(err)
+    if err == 'getaddrinfo: nodename nor servname provided, or not known' or
+       err == 'getaddrinfo: Servname not supported for ai_socktype' or
+       err == 'getaddrinfo: Name or service not known' then
+            return true
+    end
+    return false
+end;
+---
+...
+test_run:cmd("setopt delimiter ''");
+---
+- true
+...
+s, err = socket:connect('non_exists_hostname:3301')
+---
+...
+check_err(err) == true
+---
+- true
+...
 test_run:cmd("clear filter")
 ---
 - true
diff --git a/test/app/socket.test.lua b/test/app/socket.test.lua
index 7ae9a98aa..600eaf2a6 100644
--- a/test/app/socket.test.lua
+++ b/test/app/socket.test.lua
@@ -300,8 +300,14 @@ sc:close()
 
 -- tcp_connect
 
--- test timeout
-socket.tcp_connect('127.0.0.1', 80, 0.00000000001)
+-- Test timeout. In this test, tcp_connect can return the second
+-- output value on Mac OS from internal.getaddrinfo (check for
+-- timeout in coio_task and setting a TimedOut diag (with
+-- 'timed out' message) for address resolving timeout). On Linux,
+-- it is not occurred and time-out is checked in tcp_connect().
+-- This difference has appeared after gh-4138 patch.
+s, err = socket.tcp_connect('127.0.0.1', 80, 0.00000000001)
+s == nil
 
 -- AF_INET
 s = socket('AF_INET', 'SOCK_STREAM', 'tcp')
@@ -982,6 +988,24 @@ fiber.cancel(echo_fiber)
 client:read(1, 5) == ''
 server:close()
 
+-- gh-4138 Check getaddrinfo() error from socket:connect() only.
+-- Error code and error message returned by getaddrinfo() depends
+-- on system's gai_strerror(). So that there is no checking for
+-- certain error message.
+test_run:cmd("setopt delimiter ';'")
+function check_err(err)
+    if err == 'getaddrinfo: nodename nor servname provided, or not known' or
+       err == 'getaddrinfo: Servname not supported for ai_socktype' or
+       err == 'getaddrinfo: Name or service not known' then
+            return true
+    end
+    return false
+end;
+test_run:cmd("setopt delimiter ''");
+
+s, err = socket:connect('non_exists_hostname:3301')
+check_err(err) == true
+
 test_run:cmd("clear filter")
 
 -- case: sicket receive inconsistent behavior
diff --git a/test/box/net.box.result b/test/box/net.box.result
index e3dabf7d9..2c0219c24 100644
--- a/test/box/net.box.result
+++ b/test/box/net.box.result
@@ -3927,6 +3927,35 @@ test_run:grep_log('default', '00000040:.*')
 ---
 - null
 ...
+-- gh-4138 Check getaddrinfo() error from connect() only. Error
+-- code and error message returned by getaddrinfo() depends on
+-- system's gai_strerror(). So that there is no checking for
+-- certain error message.
+test_run:cmd("setopt delimiter ';'")
+---
+- true
+...
+function check_err(err)
+    if err == 'getaddrinfo: nodename nor servname provided, or not known' or
+       err == 'getaddrinfo: Servname not supported for ai_socktype' or
+       err == 'getaddrinfo: Name or service not known' then
+            return true
+    end
+    return false
+end;
+---
+...
+test_run:cmd("setopt delimiter ''");
+---
+- true
+...
+s = remote.connect('non_exists_hostname:3301')
+---
+...
+check_err(s['error']) == true
+---
+- true
+...
 box.cfg{log_level=log_level}
 ---
 ...
diff --git a/test/box/net.box.test.lua b/test/box/net.box.test.lua
index 8e65ff470..a6d98da00 100644
--- a/test/box/net.box.test.lua
+++ b/test/box/net.box.test.lua
@@ -1582,4 +1582,22 @@ test_run:wait_log('default', '00000030:.*', nil, 10)
 -- we expect nothing below, so don't wait
 test_run:grep_log('default', '00000040:.*')
 
+-- gh-4138 Check getaddrinfo() error from connect() only. Error
+-- code and error message returned by getaddrinfo() depends on
+-- system's gai_strerror(). So that there is no checking for
+-- certain error message.
+test_run:cmd("setopt delimiter ';'")
+function check_err(err)
+    if err == 'getaddrinfo: nodename nor servname provided, or not known' or
+       err == 'getaddrinfo: Servname not supported for ai_socktype' or
+       err == 'getaddrinfo: Name or service not known' then
+            return true
+    end
+    return false
+end;
+test_run:cmd("setopt delimiter ''");
+
+s = remote.connect('non_exists_hostname:3301')
+check_err(s['error']) == true
+
 box.cfg{log_level=log_level}




More information about the Tarantool-patches mailing list