* [Tarantool-patches] [PATCH v1] test: flaky box/net.box_wait_connected_gh-3856
@ 2020-06-16 14:25 Alexander V. Tikhonov
2020-06-18 18:25 ` Alexander Turenko
0 siblings, 1 reply; 3+ messages in thread
From: Alexander V. Tikhonov @ 2020-06-16 14:25 UTC
To: Oleg Piskunov, Sergey Bronnikov; +Cc: tarantool-patches, Alexander Turenko
Found an issue when running the test on a FreeBSD VBox host:
[011] --- box/net.box_wait_connected_gh-3856.result Mon Jun 15 09:39:49 2020
[011] +++ box/net.box_wait_connected_gh-3856.reject Fri May 8 08:23:30 2020
[011] @@ -12,7 +12,8 @@
[011] - opts:
[011] wait_connected: false
[011] host: 8.8.8.8
[011] - state: initial
[011] + state: error
[011] + error: Invalid argument
[011] port: '123456'
[011] ...
[011] c:close()
The test uses an external Google DNS IP address; see:
https://developers.google.com/speed/public-dns/docs/using
The issue appears because the address is external and the connection
may fail from time to time. In this case the test should wait
until the connection state becomes 'initial' and only then
continue.
Closes #5083
---
Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5083-net-box-google-dns
Issue: https://github.com/tarantool/tarantool/issues/5083
test/box/net.box_wait_connected_gh-3856.result | 9 +++++++++
test/box/net.box_wait_connected_gh-3856.test.lua | 4 ++++
2 files changed, 13 insertions(+)
diff --git a/test/box/net.box_wait_connected_gh-3856.result b/test/box/net.box_wait_connected_gh-3856.result
index 9234e6cb9..6b8a94b43 100644
--- a/test/box/net.box_wait_connected_gh-3856.result
+++ b/test/box/net.box_wait_connected_gh-3856.result
@@ -1,12 +1,21 @@
net = require('net.box')
---
...
+test_run = require('test_run').new()
+---
+...
--
-- gh-3856: wait_connected = false is ignored.
+-- Test uses Google DNS IP for testing:
+-- https://developers.google.com/speed/public-dns/docs/using
--
c = net.connect('8.8.8.8:123456', {wait_connected = false})
---
...
+test_run:wait_cond(function() return c.state == 'initial' end)
+---
+- true
+...
c
---
- opts:
diff --git a/test/box/net.box_wait_connected_gh-3856.test.lua b/test/box/net.box_wait_connected_gh-3856.test.lua
index 29e997fb5..d9fa80f3f 100644
--- a/test/box/net.box_wait_connected_gh-3856.test.lua
+++ b/test/box/net.box_wait_connected_gh-3856.test.lua
@@ -1,8 +1,12 @@
net = require('net.box')
+test_run = require('test_run').new()
--
-- gh-3856: wait_connected = false is ignored.
+-- Test uses Google DNS IP for testing:
+-- https://developers.google.com/speed/public-dns/docs/using
--
c = net.connect('8.8.8.8:123456', {wait_connected = false})
+test_run:wait_cond(function() return c.state == 'initial' end)
c
c:close()
--
2.17.1
* Re: [Tarantool-patches] [PATCH v1] test: flaky box/net.box_wait_connected_gh-3856
2020-06-16 14:25 [Tarantool-patches] [PATCH v1] test: flaky box/net.box_wait_connected_gh-3856 Alexander V. Tikhonov
@ 2020-06-18 18:25 ` Alexander Turenko
2020-06-20 4:47 ` Alexander V. Tikhonov
0 siblings, 1 reply; 3+ messages in thread
From: Alexander Turenko @ 2020-06-18 18:25 UTC
To: Alexander V. Tikhonov
Cc: Oleg Piskunov, tarantool-patches, Vladislav Shpilevoy
The reason for the failure is that getaddrinfo() returns EAI_SERVICE for
an incorrect TCP/IP port on FreeBSD, but truncates it modulo 65536 on
Linux/glibc. You may check it yourself:
| /* cc getaddrinfo.c -o getaddrinfo */
|
| #include <sys/types.h>
| #include <sys/socket.h>
| #include <netdb.h>
| #include <netinet/in.h>
| #include <stdio.h>
| #include <stdlib.h>
| #include <string.h>
| #include <errno.h>
|
| const char *
| family_str(int family)
| {
| if (family == AF_INET)
| return "AF_INET";
| if (family == AF_INET6)
| return "AF_INET6";
| return "?";
| }
|
| const char *
| socktype_str(int socktype)
| {
| if (socktype == SOCK_DGRAM)
| return "SOCK_DGRAM";
| if (socktype == SOCK_STREAM)
| return "SOCK_STREAM";
| if (socktype == SOCK_RAW)
| return "SOCK_RAW";
| return "?";
| }
|
| const char *
| protocol_str(int protocol)
| {
| if (protocol == IPPROTO_TCP)
| return "IPPROTO_TCP";
| if (protocol == IPPROTO_UDP)
| return "IPPROTO_UDP";
| return "?";
| }
|
| int
| main(int argc, char **argv)
| {
| static char host[1024];
| static char serv[1024];
|
| struct addrinfo hints;
| memset(&hints, (char) 0, sizeof(hints));
| hints.ai_family = AF_UNSPEC;
| hints.ai_socktype = SOCK_STREAM;
|
| if (argc != 3) {
| fprintf(stderr, "Usage: %s host port\n", argv[0]);
| return 1;
| }
|
| struct addrinfo *addrs;
| int rc = getaddrinfo(argv[1], argv[2], &hints, &addrs);
| if (rc != 0) {
| fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
| exit(1);
| }
|
| int flags = NI_NUMERICHOST | NI_NUMERICSERV;
| struct addrinfo *addr;
| for (addr = addrs; addr != NULL; addr = addr->ai_next) {
| int rc = getnameinfo(addr->ai_addr, addr->ai_addrlen,
| host, 1024, serv, 1024, flags);
| if (rc != 0) {
| fprintf(stderr, "getnameinfo error\n");
| exit(1);
| }
| printf("----\n");
| printf("family: %s\n", family_str(addr->ai_family));
| printf("socktype: %s\n", socktype_str(addr->ai_socktype));
| printf("protocol: %s\n", protocol_str(addr->ai_protocol));
| printf("host: %s\n", host);
| printf("serv: %s\n", serv);
|
| #if 0
| printf("Connecting...\n");
| int fd = socket(addr->ai_family, addr->ai_socktype, 0);
| if (connect(fd, addr->ai_addr, addr->ai_addrlen) == -1) {
| fprintf(stderr, "connect errno: %d\n", errno);
| perror("connect");
| } else {
| printf("connected successfully\n");
| }
| #endif
| }
|
| freeaddrinfo(addrs);
|
| return 0;
| }
(Linux/glibc) $ ./getaddrinfo 8.8.8.8 123456
----
family: AF_INET
socktype: SOCK_STREAM
protocol: IPPROTO_TCP
host: 8.8.8.8
serv: 57920
(FreeBSD) $ ./getaddrinfo 8.8.8.8 123456
getaddrinfo: Service was not recognized for socket type
So the obvious fix would be to change 123456 to something less than or
equal to 65535. Say, 1234.
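A minimal C sketch of the arithmetic behind the two outputs above. It
assumes that the Linux/glibc result amounts to keeping the low 16 bits
of the parsed number; this is inferred from the 57920 output, not from
the glibc sources. Both wrap_port() and parse_port_strict() are
hypothetical helpers written for illustration only; they are not part
of libc or Tarantool:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: what truncating a number to a 16-bit port
 * amounts to, i.e. the value modulo 65536.
 */
unsigned
wrap_port(unsigned long value)
{
	return (unsigned)(value % 65536UL);
}

/*
 * Hypothetical helper: parse a decimal port strictly.
 * Returns the port in [0, 65535], or -1 if the string is empty,
 * has trailing characters, or is out of range.
 */
long
parse_port_strict(const char *str)
{
	char *end;
	errno = 0;
	long value = strtol(str, &end, 10);
	if (errno != 0 || end == str || *end != '\0')
		return -1;
	if (value < 0 || value > 65535)
		return -1;
	return value;
}
```

With these helpers, wrap_port(123456) gives 57920 (123456 - 65536), the
value printed on Linux/glibc, while parse_port_strict("123456") gives
-1, which corresponds to the rejection seen on FreeBSD. A port such as
"1234" passes both ways unchanged, which is why it behaves the same on
both platforms.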
> diff --git a/test/box/net.box_wait_connected_gh-3856.test.lua b/test/box/net.box_wait_connected_gh-3856.test.lua
> index 29e997fb5..d9fa80f3f 100644
> --- a/test/box/net.box_wait_connected_gh-3856.test.lua
> +++ b/test/box/net.box_wait_connected_gh-3856.test.lua
> @@ -1,8 +1,12 @@
> net = require('net.box')
> +test_run = require('test_run').new()
>
> --
> -- gh-3856: wait_connected = false is ignored.
> +-- Test uses Google DNS IP for testing:
> +-- https://developers.google.com/speed/public-dns/docs/using
> --
> c = net.connect('8.8.8.8:123456', {wait_connected = false})
> +test_run:wait_cond(function() return c.state == 'initial' end)
> c
> c:close()
It should not work and does not. I checked it with the following command
on a FreeBSD virtual machine:
$ ( cd test && ./test-run.py -j 20 `yes box/net.box_wait_connected_gh-3856.test.lua | head -n 1000` )
The 123456 -> 1234 change, however, passes.
The test still depends on the order in which fibers are scheduled
(net_box.connect() creates a separate fiber for connecting in the
background using fiber.create(), which yields). It is unlikely that our
fiber will not get execution time during the connection attempt, so
this is more of a formal concern.
But we can decrease the probability of this situation even more if we
grab all connection fields right when net_box.connect() returns, not
after a yield in the console (which happens while it waits for the next
command from test-run).
Consider this way:
| $ cat test/box/net.box_wait_connected_gh-3856.test.lua
| net = require('net.box')
|
| --
| -- gh-3856: wait_connected = false is ignored.
| --
| do \
| c = net.connect('8.8.8.8:1234', {wait_connected = false}) \
| return c.state \
| end
| c:close()
CCed Vlad as author of the test case.
WBR, Alexander Turenko.
* Re: [Tarantool-patches] [PATCH v1] test: flaky box/net.box_wait_connected_gh-3856
2020-06-18 18:25 ` Alexander Turenko
@ 2020-06-20 4:47 ` Alexander V. Tikhonov
0 siblings, 0 replies; 3+ messages in thread
From: Alexander V. Tikhonov @ 2020-06-20 4:47 UTC
To: Alexander Turenko; +Cc: tarantool-patches, Vladislav Shpilevoy
Hi Alexander, thanks for your deep review, please check my comments below.
On Thu, Jun 18, 2020 at 09:25:13PM +0300, Alexander Turenko wrote:
> The reason for the failure is that getaddrinfo() returns EAI_SERVICE for
> an incorrect TCP/IP port on FreeBSD, but truncates it modulo 65536 on
> Linux/glibc. You may check it yourself:
>
> | /* cc getaddrinfo.c -o getaddrinfo */
> |
> | #include <sys/types.h>
> | #include <sys/socket.h>
> | #include <netdb.h>
> | #include <netinet/in.h>
> | #include <stdio.h>
> | #include <stdlib.h>
> | #include <string.h>
> | #include <errno.h>
> |
> | const char *
> | family_str(int family)
> | {
> | if (family == AF_INET)
> | return "AF_INET";
> | if (family == AF_INET6)
> | return "AF_INET6";
> | return "?";
> | }
> |
> | const char *
> | socktype_str(int socktype)
> | {
> | if (socktype == SOCK_DGRAM)
> | return "SOCK_DGRAM";
> | if (socktype == SOCK_STREAM)
> | return "SOCK_STREAM";
> | if (socktype == SOCK_RAW)
> | return "SOCK_RAW";
> | return "?";
> | }
> |
> | const char *
> | protocol_str(int protocol)
> | {
> | if (protocol == IPPROTO_TCP)
> | return "IPPROTO_TCP";
> | if (protocol == IPPROTO_UDP)
> | return "IPPROTO_UDP";
> | return "?";
> | }
> |
> | int
> | main(int argc, char **argv)
> | {
> | static char host[1024];
> | static char serv[1024];
> |
> | struct addrinfo hints;
> | memset(&hints, (char) 0, sizeof(hints));
> | hints.ai_family = AF_UNSPEC;
> | hints.ai_socktype = SOCK_STREAM;
> |
> | if (argc != 3) {
> | fprintf(stderr, "Usage: %s host port\n", argv[0]);
> | return 1;
> | }
> |
> | struct addrinfo *addrs;
> | int rc = getaddrinfo(argv[1], argv[2], &hints, &addrs);
> | if (rc != 0) {
> | fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
> | exit(1);
> | }
> |
> | int flags = NI_NUMERICHOST | NI_NUMERICSERV;
> | struct addrinfo *addr;
> | for (addr = addrs; addr != NULL; addr = addr->ai_next) {
> | int rc = getnameinfo(addr->ai_addr, addr->ai_addrlen,
> | host, 1024, serv, 1024, flags);
> | if (rc != 0) {
> | fprintf(stderr, "getnameinfo error\n");
> | exit(1);
> | }
> | printf("----\n");
> | printf("family: %s\n", family_str(addr->ai_family));
> | printf("socktype: %s\n", socktype_str(addr->ai_socktype));
> | printf("protocol: %s\n", protocol_str(addr->ai_protocol));
> | printf("host: %s\n", host);
> | printf("serv: %s\n", serv);
> |
> | #if 0
> | printf("Connecting...\n");
> | int fd = socket(addr->ai_family, addr->ai_socktype, 0);
> | if (connect(fd, addr->ai_addr, addr->ai_addrlen) == -1) {
> | fprintf(stderr, "connect errno: %d\n", errno);
> | perror("connect");
> | } else {
> | printf("connected successfully\n");
> | }
> | #endif
> | }
> |
> | freeaddrinfo(addrs);
> |
> | return 0;
> | }
>
> (Linux/glibc) $ ./getaddrinfo 8.8.8.8 123456
> ----
> family: AF_INET
> socktype: SOCK_STREAM
> protocol: IPPROTO_TCP
> host: 8.8.8.8
> serv: 57920
>
> (FreeBSD) $ ./getaddrinfo 8.8.8.8 123456
> getaddrinfo: Service was not recognized for socket type
>
> So the obvious fix would be to change 123456 to something less than or
> equal to 65535. Say, 1234.
>
Ok, sure, changed.
> > diff --git a/test/box/net.box_wait_connected_gh-3856.test.lua b/test/box/net.box_wait_connected_gh-3856.test.lua
> > index 29e997fb5..d9fa80f3f 100644
> > --- a/test/box/net.box_wait_connected_gh-3856.test.lua
> > +++ b/test/box/net.box_wait_connected_gh-3856.test.lua
> > @@ -1,8 +1,12 @@
> > net = require('net.box')
> > +test_run = require('test_run').new()
> >
> > --
> > -- gh-3856: wait_connected = false is ignored.
> > +-- Test uses Google DNS IP for testing:
> > +-- https://developers.google.com/speed/public-dns/docs/using
> > --
> > c = net.connect('8.8.8.8:123456', {wait_connected = false})
> > +test_run:wait_cond(function() return c.state == 'initial' end)
> > c
> > c:close()
>
> It should not work and does not. I checked it with the following command
> on a FreeBSD virtual machine:
>
> $ ( cd test && ./test-run.py -j 20 `yes box/net.box_wait_connected_gh-3856.test.lua | head -n 1000` )
>
> The 123456 -> 1234 change, however, passes.
>
Right, I see it too now.
> The test still depends on the order in which fibers are scheduled
> (net_box.connect() creates a separate fiber for connecting in the
> background using fiber.create(), which yields). It is unlikely that our
> fiber will not get execution time during the connection attempt, so
> this is more of a formal concern.
>
> But we can decrease the probability of this situation even more if we
> grab all connection fields right when net_box.connect() returns, not
> after a yield in the console (which happens while it waits for the next
> command from test-run).
>
> Consider this way:
>
> | $ cat test/box/net.box_wait_connected_gh-3856.test.lua
> | net = require('net.box')
> |
> | --
> | -- gh-3856: wait_connected = false is ignored.
> | --
> | do \
> | c = net.connect('8.8.8.8:1234', {wait_connected = false}) \
> | return c.state \
> | end
> | c:close()
>
Ok, I've used your code as you suggested, thank you.
> CCed Vlad as author of the test case.
>
Sure, I'll send him too.
> WBR, Alexander Turenko.