Tarantool development patches archive
 help / color / mirror / Atom feed
From: "Ilya Kosarev" <i.kosarev@tarantool.org>
To: "Vladislav Shpilevoy" <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] test: fix flaky socket test
Date: Tue, 10 Dec 2019 17:36:53 +0300	[thread overview]
Message-ID: <1575988613.571435536@f381.i.mail.ru> (raw)
In-Reply-To: <075c5646-976c-b673-2639-634333bf1eaf@tarantool.org>

[-- Attachment #1: Type: text/plain, Size: 10736 bytes --]

Hi!

Thanks for the comments!

5 answers are below.

Sent v3 considering discussed questions.

>Воскресенье,  8 декабря 2019, 18:52 +03:00 от Vladislav Shpilevoy <v.shpilevoy@tarantool.org>:
>
>Hi! Thanks for the patch!
>
>Please, keep my comments near your answers. Because
>I don't remember all my comments, and for each your
>answer I need to lookup my original email to remember,
>what was the problem.
>
>This is usually simpler both to you and a reviewer,
>when your answers are inlined into the original email.
>
>Also don't drop the comments. Here is my comment 2:
>
>>> 2. I understand, why it is 'connect time' in the previous
>>> hunk. But here the socket is already connected. So it is
>>> not connect time.
>>> 
>>> On the other hand, there are places right after connect,
>>> which don't use this timeout. For example, line 457
>>> in socket.result file:
>>> 
>>>     sc:sysconnect('127.0.0.1', s:name().port) or errno() == errno.EINPROGRESS
>>>     ---
>>>     - true
>>>     ...
>>>     sc:writable(10)
>>>     ---
>>>     - true
>>>     ...
>>> 
>>> Here it is 10 seconds, while in other places you started
>>> using 240.
>And it is still actual. Why is it called 'CONNECT_TIME' if
>it has nothing to do with connect in most of the places?
>The problem is in the name. I would introduce one timeout
>for conds, connects, readable, etc. You even already have
>it - COND_TIME. 
2. Ok, I see the problem here. I will carefully reconsider such
places to make them consistent.
>
>
>On 06/12/2019 17:01, Ilya Kosarev wrote:
>> Hi!
>> 
>> Thanks for your review.
>> 
>> 4. I run socket.test to simulate "high load" in parallel with itself
>> (while ./test-run.py -j20 `for r in {1..64} ; do echo socket ; done` 2>/dev/null ; do l=$(($l+1)) ; echo ======== $l ============= ; done).
>
>I tried this example on your branch and got that:
>
>=========================================================================
>[009] --- app/socket.result	Sun Dec  8 15:40:58 2019
>[009] +++ app/socket.reject	Sun Dec  8 16:09:23 2019
>[009] @@ -2670,13 +2670,17 @@
>[009]  -- case: recvfrom reads first 512 bytes from the message with tcp
>[009]  received_message, from = receiving_socket:recvfrom()
>[009]  ---
>[009] +- error: '[string "received_message, from = receiving_socket:rec..."]:1: attempt to
>[009] +    index global ''receiving_socket'' (a nil value)'
>[009]  ...
>[009]  e = receiving_socket:errno()
>[009]  ---
>[009] +- error: '[string "e = receiving_socket:errno() "]:1: attempt to index global ''receiving_socket''
>[009] +    (a nil value)'
>[009]  ...
>[009]  received_message == message:sub(1, 512) -- expected true
>[009]  ---
>[009] -- true
>[009] +- false
>[009]  ...
>[009]  received_message:len() == 512 -- expected true
>[009]  ---
>[009] @@ -2684,7 +2688,7 @@
>[009]  ...
>[009]  received_message
>[009]  ---
>[009] -- xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>[009] +- yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
>[009]  ...
>[009]  received_message:len()
>[009]  ---
>[009] @@ -2692,11 +2696,13 @@
>[009]  ...
>[009]  from == nil -- expected true
>[009]  ---
>[009] -- true
>[009] +- false
>[009]  ...
>[009]  from
>[009]  ---
>[009] -- null
>[009] +- host: 127.0.0.1
>[009] +  family: AF_INET
>[009] +  port: 57721
>[009]  ...
>[009]  e == 0 -- expected true
>[009]  ---
>[009] @@ -2709,33 +2715,39 @@
>[009]  -- case: recvfrom does not discard the message tail with tcp
>[009]  received_message, from = receiving_socket:recvfrom()
>[009]  ---
>[009] +- error: '[string "received_message, from = receiving_socket:rec..."]:1: attempt to
>[009] +    index global ''receiving_socket'' (a nil value)'
>[009]  ...
>[009]  e = receiving_socket:errno()
>[009]  ---
>[009] +- error: '[string "e = receiving_socket:errno() "]:1: attempt to index global ''receiving_socket''
>[009] +    (a nil value)'
>[009]  ...
>[009]  received_message == message:sub(513, 513) -- expected true
>[009]  ---
>[009] -- true
>[009] +- false
>[009]  ...
>[009]  received_message:len() == 1 -- expected true
>[009]  ---
>[009] -- true
>[009] +- false
>[009]  ...
>[009]  received_message
>[009]  ---
>[009] -- x
>[009] +- yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
>[009]  ...
>[009]  received_message:len()
>[009]  ---
>[009] -- 1
>[009] +- 512
>[009]  ...
>[009]  from == nil -- expected true
>[009]  ---
>[009] -- true
>[009] +- false
>[009]  ...
>[009]  from
>[009]  ---
>[009] -- null
>[009] +- host: 127.0.0.1
>[009] +  family: AF_INET
>[009] +  port: 57721
>[009]  ...
>[009]  e == 0 -- expected true
>[009]  ---
>[009] @@ -2747,7 +2759,8 @@
>[009]  ...
>[009]  receiving_socket:close()
>[009]  ---
>[009] -- true
>[009] +- error: '[string "return receiving_socket:close() "]:1: attempt to index global ''receiving_socket''
>[009] +    (a nil value)'
>[009]  ...
>[009]  sending_socket:close()
>[009]  ---
>[009] 
>=========================================================================
>
>Sometimes I get this:
>
>=========================================================================
>[008] app/socket.test.lua                                             [ fail ]
>[008] 
>[008] Test failed! Result content mismatch:
>[008] --- app/socket.result	Sun Dec  8 15:40:58 2019
>[008] +++ app/socket.reject	Sun Dec  8 16:11:27 2019
>[008] @@ -2340,23 +2340,24 @@
>[008]  ...
>[008]  received_message == '' -- expected true
>[008]  ---
>[008] -- true
>[008] +- false
>[008]  ...
>[008]  received_message
>[008]  ---
>[008] -- 
>[008] +- null
>[008]  ...
>[008]  from ~= nil -- expected true
>[008]  ---
>[008] -- true
>[008] +- false
>[008]  ...
>[008]  from.host == '127.0.0.1' -- expected true
>[008]  ---
>[008] -- true
>[008] +- error: '[string "return from.host == ''127.0.0.1'' -- expected t..."]:1: attempt
>[008] +    to index global ''from'' (a nil value)'
>[008]  ...
>[008]  e == 0 -- expected true
>[008]  ---
>[008] -- true
>[008] +- false
>[008]  ...
>[008]  -- case: recv, no datagram, explicit size
>[008]  received_message = receiving_socket:recv(512)
>[008] @@ -2386,23 +2387,25 @@
>[008]  ...
>[008]  received_message == nil -- expected true
>[008]  ---
>[008] -- true
>[008] +- false
>[008]  ...
>[008]  received_message
>[008]  ---
>[008] -- null
>[008] +- 
>[008]  ...
>[008]  from == nil -- expected true
>[008]  ---
>[008] -- true
>[008] +- false
>[008]  ...
>[008]  from
>[008]  ---
>[008] -- null
>[008] +- host: 127.0.0.1
>[008] +  family: AF_INET
>[008] +  port: 59867
>[008]  ...
>[008]  e == errno.EAGAIN -- expected true
>[008]  ---
>[008] -- true
>[008] +- false
>[008]  ...
>[008]  -- case: recv, zero datagram, explicit size
>[008]  sendto_zero(sending_socket, '127.0.0.1', receiving_socket_port)
>=========================================================================
>
>So socket.test.lua still is unstable. The command with 64
>socket.test.lua gives me an error in 100% cases. 
4. Ok, I guess these are mac os specific issues. I haven't got mac os
machine access yet, however, I hope to get it soon and fix these problems.
I still tried to stabilize these cases; if it don't work, I will reconsider them later.
>
>
>> Using fiber.time() as random seed leads to the same results produced
>> on parallel tasks. Using instance uuid as a seed seems to be a good
>> decision.
>
>I thought that it is strange, because UUIDs are also generated
>randomly, and by the same random generator as is used in Lua.
>But now I see why math.random() gives you the same numbers
>even though UUIDs are different: Lua math.random() has own
>random number generator. Not from the standard library.
>
>> 5. Well, considering the new seed generation, 10s seem to be enough.
>
>So you could skip this strange extraction of a number from
>UUID, but increase the timeout? I vote for it. Because the
>failing example with 64 instances of socket.test.lua is very
>artificial, and even then it will work in case the timeout
>is big enough, right? UUID parsing adds unnecessary complexity
>and raises questions like I asked. 
5. Ok, I agree.
>
>
>> 6. If we close socket in the main fiber, we might attempt to use closed
>> socket in background fiber.
>
>And what? We don't care anymore for the background
>fiber after the socket is not needed anymore. Ok, it will
>stumble into errors in that fiber, but how does it affect
>further execution of the test? 
6. It affects further test execution, as far as it leads
to the following error:
[015]  c:receive("*l", "Line: ")
[015]  ---
[015]  - null
[015] -- closed
[015] -- 'Line: Fu'
[015] +- Connection reset by peer
>
>
>> 
>> 7. Of course it shouldn't be nil and test will fail in this case. We
>> just want to be sure that test won't hang even if server is still nil.
>> Otherwise we will hang on chan:put(), which is not nice.
>
>If it will always be nil in case of an error, we won't notice the
>problem. Please, drop it. Increase a timeout somewhere if necessary,
>but consideration of a test passed, if in fact it didn't, is worse. 
7. We don't consider the test passed because of this construction.
We will notice the problem, as far as  test_run:wait_cond()
will return false instead of true, and also  receive1 &  receive2
values will be nil instead of  '00000' &  '1111122222'.
We just don't want to hang on chan:put and wait for extra 120 sec.
for test to fail, that is why it is hidden under if clause.
>


-- 
Ilya Kosarev

[-- Attachment #2: Type: text/html, Size: 15239 bytes --]

  reply	other threads:[~2019-12-10 14:36 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-04 13:43 Ilya Kosarev
2019-12-05 21:31 ` Vladislav Shpilevoy
2019-12-06 16:01   ` Ilya Kosarev
2019-12-08 15:52     ` Vladislav Shpilevoy
2019-12-10 14:36       ` Ilya Kosarev [this message]
2019-12-17  0:03         ` Vladislav Shpilevoy
  -- strict thread matches above, loose matches on Subject: below --
2019-11-26 21:45 Ilya Kosarev
2019-11-26  1:19 Ilya Kosarev
2019-08-30 15:28 [tarantool-patches] " Ilya Kosarev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1575988613.571435536@f381.i.mail.ru \
    --to=i.kosarev@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH] test: fix flaky socket test' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox