Tarantool development patches archive
 help / color / mirror / Atom feed
From: "Ilya Kosarev" <i.kosarev@tarantool.org>
To: "Vladislav Shpilevoy" <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] test: fix flaky socket test
Date: Fri, 06 Dec 2019 19:01:27 +0300	[thread overview]
Message-ID: <1575648087.956039140@f128.i.mail.ru> (raw)
In-Reply-To: <ec633a19-20d5-fc31-2c67-23d05e26e278@tarantool.org>

[-- Attachment #1: Type: text/plain, Size: 7266 bytes --]


Hi!

Thanks for your review.

4. I run socket.test to simulate "high load" in parallel with itself
(while ./test-run.py -j20 `for r in {1..64} ; do echo socket ; done` 2>/dev/null ; do l=$(($l+1)) ; echo ======== $l ============= ; done).
Using fiber.time() as random seed leads to the same results produced
on parallel tasks. Using instance uuid as a seed seems to be a good
decision.
5. Well, considering the new seed generation, 10s seem to be enough.
6. If we close socket in the main fiber, we might attempt to use closed
socket in background fiber.
7. Of course it shouldn't be nil and test will fail in this case. We
just want to be sure that test won't hang even if server is still nil.
Otherwise we will hang on chan:put(), which is not nice. Sent v2 considering mentioned drawbacks.

>Пятница,  6 декабря 2019, 0:31 +03:00 от Vladislav Shpilevoy < v.shpilevoy@tarantool.org >:
>
>Hi! Thanks for the patch!
>
>See 7 comments below.
>
>On 04/12/2019 14:43, Ilya Kosarev wrote:
>> socket.test had a number of flaky problems:
>> - socket readableness expectation
>> - race conditions on socket shutdown in emulation test cases
>> - tcp_server stability in socket receive inconsistent behavior case
>> Now they are solved. Port randomization is improved.
>> Socket test is not fragile anymore.
>> 
>> Closes #4451, #4426, #4469
>> ---
>> Branch:  https://github.com/tarantool/tarantool/tree/i.kosarev/gh-4426-4451-fix-socket-test
>> Issues:  https://github.com/tarantool/tarantool/issues/4426
>>  https://github.com/tarantool/tarantool/issues/4451
>>  https://github.com/tarantool/tarantool/issues/4469
>> 
>>  test/app/socket.result   | 84 +++++++++++++++++++++++++---------------
>>  test/app/socket.test.lua | 60 +++++++++++++++++++---------
>>  test/app/suite.ini       |  1 -
>>  3 files changed, 94 insertions(+), 51 deletions(-)
>> 
>> diff --git a/test/app/socket.result b/test/app/socket.result
>> index fd299424c9..42dde8f375 100644
>> --- a/test/app/socket.result
>> +++ b/test/app/socket.result
>> @@ -45,6 +45,9 @@ test_run:cmd("push filter '(error: .builtin/.*[.]lua):[0-9]+' to '\\1'")
>>  WAIT_COND_TIME = 10
>1. This is now unused.
>
>>  ---
>>  ...
>> +WAIT_TCP_CONNECT_TIME = 240
>> +---
>> +...
>>  socket('PF_INET', 'SOCK_STREAM', 'tcp121222');
>>  ---
>>  - null
>> @@ -107,7 +110,7 @@ s:nonblock(true)
>>  ---
>>  - true
>>  ...
>> -s:readable(.1)
>> +s:readable(WAIT_TCP_CONNECT_TIME)
>>  ---
>>  - true
>>  ...
>> @@ -227,7 +230,7 @@ s:syswrite(ffi.cast('const char *', ping), #ping)
>>  ---
>>  - 6
>>  ...
>> -s:readable(1)
>> +s:readable(WAIT_TCP_CONNECT_TIME)
>
>2. I understand, why it is 'connect time' in the previous
>hunk. But here the socket is already connected. So it is
>not connect time.
>
>On the other hand, there are places right after connect,
>which don't use this timeout. For example, line 457
>in socket.result file:
>
>    sc:sysconnect('127.0.0.1', s:name().port) or errno() == errno.EINPROGRESS
>    ---
>    - true
>    ...
>    sc:writable(10)
>    ---
>    - true
>    ...
>
>Here it is 10 seconds, while in other places you started
>using 240.
>
>>  ---
>>  - true
>>  ...
>> @@ -830,7 +833,7 @@ sc:sendto('127.0.0.1', s:name().port, 'Hello, world')
>>  ---
>>  - 12
>>  ...
>> -s:readable(10)
>> +s:readable(WAIT_TCP_CONNECT_TIME)
>>  ---
>>  - true
>>  ...
>
>3. There is no connect time. This is DGRAM socket with UDP.
>No connections.
>
>The same below. Moreover, there are now 2 waits for
>'connect time' on the same socket.
>
>> @@ -842,7 +845,7 @@ sc:sendto('127.0.0.1', s:name().port, 'Hello, world, 2')
>>  ---
>>  - 15
>>  ...
>> -s:readable(10)
>> +s:readable(WAIT_TCP_CONNECT_TIME)
>>  ---
>>  - true
>>  ...
>> @@ -898,7 +901,7 @@ sc:sendto('127.0.0.1', s:name().port, 'Hello, World!')
>>  ---
>>  - 13
>>  ...
>> -s:readable(1)
>> +s:readable(WAIT_TCP_CONNECT_TIME)
>>  ---
>>  - true
>>  ...
>> @@ -1074,6 +1077,15 @@ master:setsockopt('SOL_SOCKET', 'SO_REUSEADDR', true)
>>  ---
>>  - true
>>  ...
>> +seed = ''
>> +---
>> +...
>> +for d in string.gmatch(box.info.uuid, '%d') do  seed = seed .. d end
>
>4. Well, this is probably the most intricate way to
>get a seed. Have you considered fiber.time()? And why do
>you need it? Isn't it initialized already?
>
>> +---
>> +...
>> +math.randomseed(tonumber(seed))
>> +---
>> +...
>>  port = 32768 + math.random(0, 32767)
>>  ---
>>  ...
>> @@ -1092,7 +1104,7 @@ test_run:wait_cond(function()
>>          return false, master:error()
>>      end
>>      return true
>> -end, WAIT_COND_TIME);
>> +end, 100);
>
>5. Why? 10 seconds was not enough to try a few random
>ports?
>
>>  ---
>>  - true
>>  ...
>> @@ -1822,8 +1834,14 @@ test_run:cmd("setopt delimiter ';'")
>>  ---
>>  - true
>>  ...
>> +socket_opened = true
>>  cfiber = fiber.create(function(sc, rch, wch)
>> -    while sc:send(wch:get()) and rch:put(sc:receive("*l")) do end
>> +    while socket_opened do
>> +        sc:send(wch:get())
>> +        local data = sc:receive("*l")
>> +        if not socket_opened then sc:close() end
>> +        rch:put(data)
>> +    end
>>  end, sc, rch, wch);
>
>6. What was a problem here? In the commit message you said
>'race conditions'. Where? You just moved socket close to
>the background fiber from the main fiber.
>
>>  ---
>>  ...
>> @@ -1936,6 +1954,9 @@ c:receive("*l")
>>  ---
>>  - 
>>  ...
>> +socket_opened = false
>> +---
>> +...
>>  wch:put("Fu")
>>  ---
>>  - true
>> @@ -1944,10 +1965,6 @@ c:send("354 Please type your message\n")
>>  ---
>>  - 29
>>  ...
>> -sc:close()
>> ----
>> -- 1
>> -...
>>  c:receive("*l", "Line: ")
>>  ---
>>  - null
>> @@ -2816,7 +2833,7 @@ test_run:cmd("clear filter")
>>  ---
>>  - true
>>  ...
>> --- case: sicket receive inconsistent behavior
>> +-- case: socket receive inconsistent behavior
>>  chan = fiber.channel()
>>  ---
>>  ...
>> @@ -2826,41 +2843,46 @@ counter = 0
>>  fn = function(s) counter = 0; while true do s:write((tostring(counter)):rep(chan:get())); counter = counter + 1 end end
>>  ---
>>  ...
>> -srv = socket.tcp_server('0.0.0.0', 8888, fn)
>> +srv = nil
>>  ---
>>  ...
>> -s = socket.connect('localhost', 8888)
>> +test_run:cmd("setopt delimiter ';'")
>>  ---
>> +- true
>>  ...
>> -chan:put(5)
>> +test_run:wait_cond(function()
>> +    port = 32768 + math.random(0, 32767)
>> +    srv = socket.tcp_server('0.0.0.0', port, fn)
>> +    return srv ~= nil
>> +end, 100);
>>  ---
>>  - true
>>  ...
>> -chan:put(5)
>> +receive1 = nil; receive2 = nil;
>>  ---
>> -- true
>>  ...
>> -s:receive(5)
>> +if srv ~= nil then
>> +    s = socket.connect('localhost', port)
>> +    chan:put(5)
>> +    chan:put(5)
>> +    receive1 = s:receive(5)
>> +    chan:put(5)
>> +    s:settimeout(1)
>> +    receive2 = s:receive('*a')
>> +    s:close()
>> +    srv:close()
>> +end;
>
>7. How is it possible, that the server still is nil here?
>At least one of 32k ports should have been free.
>
>>  ---
>> -- '00000'
>>  ...
>> -chan:put(5)
>> +test_run:cmd("setopt delimiter ''");
>>  ---
>>  - true
>>  ...
>> -s:settimeout(1)
>> +receive1
>>  ---
>> -- 1
>> +- '00000'
>>  ...
>> -s:receive('*a')
>> +receive2
>>  ---
>>  - '1111122222'
>>  ...
>> -s:close()
>> ----
>> -- 1
>> -...
>> -srv:close()
>> ----
>> -- true
>> -...


-- 
Ilya Kosarev

[-- Attachment #2: Type: text/html, Size: 10714 bytes --]

  reply	other threads:[~2019-12-06 16:01 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-04 13:43 Ilya Kosarev
2019-12-05 21:31 ` Vladislav Shpilevoy
2019-12-06 16:01   ` Ilya Kosarev [this message]
2019-12-08 15:52     ` Vladislav Shpilevoy
2019-12-10 14:36       ` Ilya Kosarev
2019-12-17  0:03         ` Vladislav Shpilevoy
  -- strict thread matches above, loose matches on Subject: below --
2019-11-26 21:45 Ilya Kosarev
2019-11-26  1:19 Ilya Kosarev
2019-08-30 15:28 [tarantool-patches] " Ilya Kosarev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1575648087.956039140@f128.i.mail.ru \
    --to=i.kosarev@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH] test: fix flaky socket test' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox