* [Tarantool-patches] [PATCH v1] test: flaky box/net.box_wait_connected_gh-3856 @ 2020-06-16 14:25 Alexander V. Tikhonov 2020-06-18 18:25 ` Alexander Turenko 0 siblings, 1 reply; 3+ messages in thread From: Alexander V. Tikhonov @ 2020-06-16 14:25 UTC (permalink / raw) To: Oleg Piskunov, Sergey Bronnikov; +Cc: tarantool-patches, Alexander Turenko Found issue running test on FreeBSD VBox host: [011] --- box/net.box_wait_connected_gh-3856.result Mon Jun 15 09:39:49 2020 [011] +++ box/net.box_wait_connected_gh-3856.reject Fri May 8 08:23:30 2020 [011] @@ -12,7 +12,8 @@ [011] - opts: [011] wait_connected: false [011] host: 8.8.8.8 [011] - state: initial [011] + state: error [011] + error: Invalid argument [011] port: '123456' [011] ... [011] c:close() The test uses external Google DNS IP, check information on it: https://developers.google.com/speed/public-dns/docs/using This issue appears because the link is external and connection may fail from time to time. In this case the test should wait till connection state became 'initial' and only after that the test can continue. Closes #5083 --- Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5083-net-box-google-dns Issue: https://github.com/tarantool/tarantool/issues/5083 test/box/net.box_wait_connected_gh-3856.result | 9 +++++++++ test/box/net.box_wait_connected_gh-3856.test.lua | 4 ++++ 2 files changed, 13 insertions(+) diff --git a/test/box/net.box_wait_connected_gh-3856.result b/test/box/net.box_wait_connected_gh-3856.result index 9234e6cb9..6b8a94b43 100644 --- a/test/box/net.box_wait_connected_gh-3856.result +++ b/test/box/net.box_wait_connected_gh-3856.result @@ -1,12 +1,21 @@ net = require('net.box') --- ... +test_run = require('test_run').new() +--- +... -- -- gh-3856: wait_connected = false is ignored. +-- Test uses Google DNS IP for testing: +-- https://developers.google.com/speed/public-dns/docs/using -- c = net.connect('8.8.8.8:123456', {wait_connected = false}) --- ... +test_run:wait_cond(function() return c.state == 'initial' end) +--- +- true +... c --- - opts: diff --git a/test/box/net.box_wait_connected_gh-3856.test.lua b/test/box/net.box_wait_connected_gh-3856.test.lua index 29e997fb5..d9fa80f3f 100644 --- a/test/box/net.box_wait_connected_gh-3856.test.lua +++ b/test/box/net.box_wait_connected_gh-3856.test.lua @@ -1,8 +1,12 @@ net = require('net.box') +test_run = require('test_run').new() -- -- gh-3856: wait_connected = false is ignored. +-- Test uses Google DNS IP for testing: +-- https://developers.google.com/speed/public-dns/docs/using -- c = net.connect('8.8.8.8:123456', {wait_connected = false}) +test_run:wait_cond(function() return c.state == 'initial' end) c c:close() -- 2.17.1 ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [Tarantool-patches] [PATCH v1] test: flaky box/net.box_wait_connected_gh-3856 2020-06-16 14:25 [Tarantool-patches] [PATCH v1] test: flaky box/net.box_wait_connected_gh-3856 Alexander V. Tikhonov @ 2020-06-18 18:25 ` Alexander Turenko 2020-06-20 4:47 ` Alexander V. Tikhonov 0 siblings, 1 reply; 3+ messages in thread From: Alexander Turenko @ 2020-06-18 18:25 UTC (permalink / raw) To: Alexander V. Tikhonov Cc: Oleg Piskunov, tarantool-patches, Vladislav Shpilevoy The reason of the fail is that getaddrinfo() returns EIA_SERVICE for an incorrect TCP/IP port on FreeBSD, but crops it as modulo of 65536 on Linux/glibc. You may check it youself: | /* cc getaddrinfo.c -o getaddrinfo */ | | #include <sys/types.h> | #include <sys/socket.h> | #include <netdb.h> | #include <netinet/in.h> | #include <stdio.h> | #include <stdlib.h> | #include <string.h> | #include <errno.h> | | const char * | family_str(int family) | { | if (family == AF_INET) | return "AF_INET"; | if (family == AF_INET6) | return "AF_INET6"; | return "?"; | } | | const char * | socktype_str(int socktype) | { | if (socktype == SOCK_DGRAM) | return "SOCK_DGRAM"; | if (socktype == SOCK_STREAM) | return "SOCK_STREAM"; | if (socktype == SOCK_RAW) | return "SOCK_RAW"; | return "?"; | } | | const char * | protocol_str(int protocol) | { | if (protocol == IPPROTO_TCP) | return "IPPROTO_TCP"; | if (protocol == IPPROTO_UDP) | return "IPPROTO_UDP"; | return "?"; | } | | int | main(int argc, char **argv) | { | static char host[1024]; | static char serv[1024]; | | struct addrinfo hints; | memset(&hints, (char) 0, sizeof(hints)); | hints.ai_family = AF_UNSPEC; | hints.ai_socktype = SOCK_STREAM; | | if (argc != 3) { | fprintf(stderr, "Usage: %s host port\n", argv[0]); | return 1; | } | | struct addrinfo *addrs; | int rc = getaddrinfo(argv[1], argv[2], &hints, &addrs); | if (rc != 0) { | fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc)); | exit(1); | } | | int flags = NI_NUMERICHOST | NI_NUMERICSERV; | struct addrinfo *addr; | for (addr = addrs; addr != NULL; addr = addr->ai_next) { | int rc = getnameinfo(addr->ai_addr, addr->ai_addrlen, | host, 1024, serv, 1024, flags); | if (rc != 0) { | fprintf(stderr, "getnameinfo error\n"); | exit(1); | } | printf("----\n"); | printf("family: %s\n", family_str(addr->ai_family)); | printf("socktype: %s\n", socktype_str(addr->ai_socktype)); | printf("protocol: %s\n", protocol_str(addr->ai_protocol)); | printf("host: %s\n", host); | printf("serv: %s\n", serv); | | #if 0 | printf("Connecting...\n"); | int fd = socket(addr->ai_family, addr->ai_socktype, 0); | if (connect(fd, addr->ai_addr, addr->ai_addrlen) == -1) { | fprintf(stderr, "connect errno: %d\n", errno); | perror("connect"); | } else { | printf("connected successfully\n"); | } | #endif | } | | freeaddrinfo(addrs); | | return 0; | } (Linux/glibc) $ ./getaddrinfo 8.8.8.8 123456 ---- family: AF_INET socktype: SOCK_STREAM protocol: IPPROTO_TCP host: 8.8.8.8 serv: 57920 (FreeBSD) $ ./getaddrinfo 8.8.8.8 123456 getaddrinfo: Service was not recognized for socket type So obvious fix would be change 123456 to something less or equal to 65535. Say, 1234. > diff --git a/test/box/net.box_wait_connected_gh-3856.test.lua b/test/box/net.box_wait_connected_gh-3856.test.lua > index 29e997fb5..d9fa80f3f 100644 > --- a/test/box/net.box_wait_connected_gh-3856.test.lua > +++ b/test/box/net.box_wait_connected_gh-3856.test.lua > @@ -1,8 +1,12 @@ > net = require('net.box') > +test_run = require('test_run').new() > > -- > -- gh-3856: wait_connected = false is ignored. > +-- Test uses Google DNS IP for testing: > +-- https://developers.google.com/speed/public-dns/docs/using > -- > c = net.connect('8.8.8.8:123456', {wait_connected = false}) > +test_run:wait_cond(function() return c.state == 'initial' end) > c > c:close() It should not work and does not. I checked it with the following command on a FreeBSD virtual machine: $ ( cd test && ./test-run.py -j 20 `yes box/net.box_wait_connected_gh-3856.test.lua | head -n 1000` ) The 123456 -> 1234 change, however, passes. The test still depend on an order in which fibers will be scheduled (net_box.connect() creates a separate fiber for connecting in background using fiber.create(), which yields). Unlikely our fiber will not get execution time during the connection attempt, so it is more like a formal thing. But we can decrease probability of this situation even more if we'll grab all connection fields just when net_box.connect() returns, not after yield in console (which is due to waiting a next command from test-run). Consider this way: | $ cat test/box/net.box_wait_connected_gh-3856.test.lua | net = require('net.box') | | -- | -- gh-3856: wait_connected = false is ignored. | -- | do \ | c = net.connect('8.8.8.8:1234', {wait_connected = false}) \ | return c.state \ | end | c:close() CCed Vlad as author of the test case. WBR, Alexander Turenko. ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [Tarantool-patches] [PATCH v1] test: flaky box/net.box_wait_connected_gh-3856 2020-06-18 18:25 ` Alexander Turenko @ 2020-06-20 4:47 ` Alexander V. Tikhonov 0 siblings, 0 replies; 3+ messages in thread From: Alexander V. Tikhonov @ 2020-06-20 4:47 UTC (permalink / raw) To: Alexander Turenko; +Cc: tarantool-patches, Vladislav Shpilevoy Hi Alexander, thanks for your deep review, please chec my comments below. On Thu, Jun 18, 2020 at 09:25:13PM +0300, Alexander Turenko wrote: > The reason of the fail is that getaddrinfo() returns EIA_SERVICE for an > incorrect TCP/IP port on FreeBSD, but crops it as modulo of 65536 on > Linux/glibc. You may check it youself: > > | /* cc getaddrinfo.c -o getaddrinfo */ > | > | #include <sys/types.h> > | #include <sys/socket.h> > | #include <netdb.h> > | #include <netinet/in.h> > | #include <stdio.h> > | #include <stdlib.h> > | #include <string.h> > | #include <errno.h> > | > | const char * > | family_str(int family) > | { > | if (family == AF_INET) > | return "AF_INET"; > | if (family == AF_INET6) > | return "AF_INET6"; > | return "?"; > | } > | > | const char * > | socktype_str(int socktype) > | { > | if (socktype == SOCK_DGRAM) > | return "SOCK_DGRAM"; > | if (socktype == SOCK_STREAM) > | return "SOCK_STREAM"; > | if (socktype == SOCK_RAW) > | return "SOCK_RAW"; > | return "?"; > | } > | > | const char * > | protocol_str(int protocol) > | { > | if (protocol == IPPROTO_TCP) > | return "IPPROTO_TCP"; > | if (protocol == IPPROTO_UDP) > | return "IPPROTO_UDP"; > | return "?"; > | } > | > | int > | main(int argc, char **argv) > | { > | static char host[1024]; > | static char serv[1024]; > | > | struct addrinfo hints; > | memset(&hints, (char) 0, sizeof(hints)); > | hints.ai_family = AF_UNSPEC; > | hints.ai_socktype = SOCK_STREAM; > | > | if (argc != 3) { > | fprintf(stderr, "Usage: %s host port\n", argv[0]); > | return 1; > | } > | > | struct addrinfo *addrs; > | int rc = getaddrinfo(argv[1], argv[2], &hints, &addrs); > | if (rc != 0) { > | fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc)); > | exit(1); > | } > | > | int flags = NI_NUMERICHOST | NI_NUMERICSERV; > | struct addrinfo *addr; > | for (addr = addrs; addr != NULL; addr = addr->ai_next) { > | int rc = getnameinfo(addr->ai_addr, addr->ai_addrlen, > | host, 1024, serv, 1024, flags); > | if (rc != 0) { > | fprintf(stderr, "getnameinfo error\n"); > | exit(1); > | } > | printf("----\n"); > | printf("family: %s\n", family_str(addr->ai_family)); > | printf("socktype: %s\n", socktype_str(addr->ai_socktype)); > | printf("protocol: %s\n", protocol_str(addr->ai_protocol)); > | printf("host: %s\n", host); > | printf("serv: %s\n", serv); > | > | #if 0 > | printf("Connecting...\n"); > | int fd = socket(addr->ai_family, addr->ai_socktype, 0); > | if (connect(fd, addr->ai_addr, addr->ai_addrlen) == -1) { > | fprintf(stderr, "connect errno: %d\n", errno); > | perror("connect"); > | } else { > | printf("connected successfully\n"); > | } > | #endif > | } > | > | freeaddrinfo(addrs); > | > | return 0; > | } > > (Linux/glibc) $ ./getaddrinfo 8.8.8.8 123456 > ---- > family: AF_INET > socktype: SOCK_STREAM > protocol: IPPROTO_TCP > host: 8.8.8.8 > serv: 57920 > > (FreeBSD) $ ./getaddrinfo 8.8.8.8 123456 > getaddrinfo: Service was not recognized for socket type > > So obvious fix would be change 123456 to something less or equal to > 65535. Say, 1234. > Ok, sure, changed. > > diff --git a/test/box/net.box_wait_connected_gh-3856.test.lua b/test/box/net.box_wait_connected_gh-3856.test.lua > > index 29e997fb5..d9fa80f3f 100644 > > --- a/test/box/net.box_wait_connected_gh-3856.test.lua > > +++ b/test/box/net.box_wait_connected_gh-3856.test.lua > > @@ -1,8 +1,12 @@ > > net = require('net.box') > > +test_run = require('test_run').new() > > > > -- > > -- gh-3856: wait_connected = false is ignored. > > +-- Test uses Google DNS IP for testing: > > +-- https://developers.google.com/speed/public-dns/docs/using > > -- > > c = net.connect('8.8.8.8:123456', {wait_connected = false}) > > +test_run:wait_cond(function() return c.state == 'initial' end) > > c > > c:close() > > It should not work and does not. I checked it with the following command > on a FreeBSD virtual machine: > > $ ( cd test && ./test-run.py -j 20 `yes box/net.box_wait_connected_gh-3856.test.lua | head -n 1000` ) > > The 123456 -> 1234 change, however, passes. > Right, I see it too now. > The test still depend on an order in which fibers will be scheduled > (net_box.connect() creates a separate fiber for connecting in background > using fiber.create(), which yields). Unlikely our fiber will not get > execution time during the connection attempt, so it is more like a > formal thing. > > But we can decrease probability of this situation even more if we'll > grab all connection fields just when net_box.connect() returns, not > after yield in console (which is due to waiting a next command from > test-run). > > Consider this way: > > | $ cat test/box/net.box_wait_connected_gh-3856.test.lua > | net = require('net.box') > | > | -- > | -- gh-3856: wait_connected = false is ignored. > | -- > | do \ > | c = net.connect('8.8.8.8:1234', {wait_connected = false}) \ > | return c.state \ > | end > | c:close() > Ok, I've used your code as you suggested, thank you. > CCed Vlad as author of the test case. > Sure, I'll send him too. > WBR, Alexander Turenko. ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2020-06-20 4:47 UTC | newest] Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-16 14:25 [Tarantool-patches] [PATCH v1] test: flaky box/net.box_wait_connected_gh-3856 Alexander V. Tikhonov 2020-06-18 18:25 ` Alexander Turenko 2020-06-20 4:47 ` Alexander V. Tikhonov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox