From: "Ilya Kosarev" <i.kosarev@tarantool.org> To: "Vladislav Shpilevoy" <v.shpilevoy@tarantool.org> Cc: tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH] test: fix flaky socket test Date: Tue, 10 Dec 2019 17:36:53 +0300 [thread overview] Message-ID: <1575988613.571435536@f381.i.mail.ru> (raw) In-Reply-To: <075c5646-976c-b673-2639-634333bf1eaf@tarantool.org> [-- Attachment #1: Type: text/plain, Size: 10736 bytes --] Hi! Thanks for the comments! 5 answers are below. Sent v3 considering discussed questions. >Воскресенье, 8 декабря 2019, 18:52 +03:00 от Vladislav Shpilevoy <v.shpilevoy@tarantool.org>: > >Hi! Thanks for the patch! > >Please, keep my comments near your answers. Because >I don't remember all my comments, and for each your >answer I need to lookup my original email to remember, >what was the problem. > >This is usually simpler both to you and a reviewer, >when your answers are inlined into the original email. > >Also don't drop the comments. Here is my comment 2: > >>> 2. I understand, why it is 'connect time' in the previous >>> hunk. But here the socket is already connected. So it is >>> not connect time. >>> >>> On the other hand, there are places right after connect, >>> which don't use this timeout. For example, line 457 >>> in socket.result file: >>> >>> sc:sysconnect('127.0.0.1', s:name().port) or errno() == errno.EINPROGRESS >>> --- >>> - true >>> ... >>> sc:writable(10) >>> --- >>> - true >>> ... >>> >>> Here it is 10 seconds, while in other places you started >>> using 240. >And it is still actual. Why is it called 'CONNECT_TIME' if >it has nothing to do with connect in most of the places? >The problem is in the name. I would introduce one timeout >for conds, connects, readable, etc. You even already have >it - COND_TIME. 2. Ok, I see the problem here. I will carefully reconsider such places to make them consistent. > > >On 06/12/2019 17:01, Ilya Kosarev wrote: >> Hi! >> >> Thanks for your review. >> >> 4. I run socket.test to simulate "high load" in parallel with itself >> (while ./test-run.py -j20 `for r in {1..64} ; do echo socket ; done` 2>/dev/null ; do l=$(($l+1)) ; echo ======== $l ============= ; done). > >I tried this example on your branch and got that: > >========================================================================= >[009] --- app/socket.result Sun Dec 8 15:40:58 2019 >[009] +++ app/socket.reject Sun Dec 8 16:09:23 2019 >[009] @@ -2670,13 +2670,17 @@ >[009] -- case: recvfrom reads first 512 bytes from the message with tcp >[009] received_message, from = receiving_socket:recvfrom() >[009] --- >[009] +- error: '[string "received_message, from = receiving_socket:rec..."]:1: attempt to >[009] + index global ''receiving_socket'' (a nil value)' >[009] ... >[009] e = receiving_socket:errno() >[009] --- >[009] +- error: '[string "e = receiving_socket:errno() "]:1: attempt to index global ''receiving_socket'' >[009] + (a nil value)' >[009] ... >[009] received_message == message:sub(1, 512) -- expected true >[009] --- >[009] -- true >[009] +- false >[009] ... >[009] received_message:len() == 512 -- expected true >[009] --- >[009] @@ -2684,7 +2688,7 @@ >[009] ... >[009] received_message >[009] --- >[009] -- xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx >[009] +- yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy >[009] ... >[009] received_message:len() >[009] --- >[009] @@ -2692,11 +2696,13 @@ >[009] ... >[009] from == nil -- expected true >[009] --- >[009] -- true >[009] +- false >[009] ... >[009] from >[009] --- >[009] -- null >[009] +- host: 127.0.0.1 >[009] + family: AF_INET >[009] + port: 57721 >[009] ... >[009] e == 0 -- expected true >[009] --- >[009] @@ -2709,33 +2715,39 @@ >[009] -- case: recvfrom does not discard the message tail with tcp >[009] received_message, from = receiving_socket:recvfrom() >[009] --- >[009] +- error: '[string "received_message, from = receiving_socket:rec..."]:1: attempt to >[009] + index global ''receiving_socket'' (a nil value)' >[009] ... >[009] e = receiving_socket:errno() >[009] --- >[009] +- error: '[string "e = receiving_socket:errno() "]:1: attempt to index global ''receiving_socket'' >[009] + (a nil value)' >[009] ... >[009] received_message == message:sub(513, 513) -- expected true >[009] --- >[009] -- true >[009] +- false >[009] ... >[009] received_message:len() == 1 -- expected true >[009] --- >[009] -- true >[009] +- false >[009] ... >[009] received_message >[009] --- >[009] -- x >[009] +- yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy >[009] ... >[009] received_message:len() >[009] --- >[009] -- 1 >[009] +- 512 >[009] ... >[009] from == nil -- expected true >[009] --- >[009] -- true >[009] +- false >[009] ... >[009] from >[009] --- >[009] -- null >[009] +- host: 127.0.0.1 >[009] + family: AF_INET >[009] + port: 57721 >[009] ... >[009] e == 0 -- expected true >[009] --- >[009] @@ -2747,7 +2759,8 @@ >[009] ... >[009] receiving_socket:close() >[009] --- >[009] -- true >[009] +- error: '[string "return receiving_socket:close() "]:1: attempt to index global ''receiving_socket'' >[009] + (a nil value)' >[009] ... >[009] sending_socket:close() >[009] --- >[009] >========================================================================= > >Sometimes I get this: > >========================================================================= >[008] app/socket.test.lua [ fail ] >[008] >[008] Test failed! Result content mismatch: >[008] --- app/socket.result Sun Dec 8 15:40:58 2019 >[008] +++ app/socket.reject Sun Dec 8 16:11:27 2019 >[008] @@ -2340,23 +2340,24 @@ >[008] ... >[008] received_message == '' -- expected true >[008] --- >[008] -- true >[008] +- false >[008] ... >[008] received_message >[008] --- >[008] -- >[008] +- null >[008] ... >[008] from ~= nil -- expected true >[008] --- >[008] -- true >[008] +- false >[008] ... >[008] from.host == '127.0.0.1' -- expected true >[008] --- >[008] -- true >[008] +- error: '[string "return from.host == ''127.0.0.1'' -- expected t..."]:1: attempt >[008] + to index global ''from'' (a nil value)' >[008] ... >[008] e == 0 -- expected true >[008] --- >[008] -- true >[008] +- false >[008] ... >[008] -- case: recv, no datagram, explicit size >[008] received_message = receiving_socket:recv(512) >[008] @@ -2386,23 +2387,25 @@ >[008] ... >[008] received_message == nil -- expected true >[008] --- >[008] -- true >[008] +- false >[008] ... >[008] received_message >[008] --- >[008] -- null >[008] +- >[008] ... >[008] from == nil -- expected true >[008] --- >[008] -- true >[008] +- false >[008] ... >[008] from >[008] --- >[008] -- null >[008] +- host: 127.0.0.1 >[008] + family: AF_INET >[008] + port: 59867 >[008] ... >[008] e == errno.EAGAIN -- expected true >[008] --- >[008] -- true >[008] +- false >[008] ... >[008] -- case: recv, zero datagram, explicit size >[008] sendto_zero(sending_socket, '127.0.0.1', receiving_socket_port) >========================================================================= > >So socket.test.lua still is unstable. The command with 64 >socket.test.lua gives me an error in 100% cases. 4. Ok, I guess these are mac os specific issues. I haven't got mac os machine access yet, however, I hope to get it soon and fix these problems. I still tried to stabilize these cases; if it don't work, I will reconsider them later. > > >> Using fiber.time() as random seed leads to the same results produced >> on parallel tasks. Using instance uuid as a seed seems to be a good >> decision. > >I thought that it is strange, because UUIDs are also generated >randomly, and by the same random generator as is used in Lua. >But now I see why math.random() gives you the same numbers >even though UUIDs are different: Lua math.random() has own >random number generator. Not from the standard library. > >> 5. Well, considering the new seed generation, 10s seem to be enough. > >So you could skip this strange extraction of a number from >UUID, but increase the timeout? I vote for it. Because the >failing example with 64 instances of socket.test.lua is very >artificial, and even then it will work in case the timeout >is big enough, right? UUID parsing adds unnecessary complexity >and raises questions like I asked. 5. Ok, I agree. > > >> 6. If we close socket in the main fiber, we might attempt to use closed >> socket in background fiber. > >And what? We don't care anymore for the background >fiber after the socket is not needed anymore. Ok, it will >stumble into errors in that fiber, but how does it affect >further execution of the test? 6. It affects further test execution, as far as it leads to the following error: [015] c:receive("*l", "Line: ") [015] --- [015] - null [015] -- closed [015] -- 'Line: Fu' [015] +- Connection reset by peer > > >> >> 7. Of course it shouldn't be nil and test will fail in this case. We >> just want to be sure that test won't hang even if server is still nil. >> Otherwise we will hang on chan:put(), which is not nice. > >If it will always be nil in case of an error, we won't notice the >problem. Please, drop it. Increase a timeout somewhere if necessary, >but consideration of a test passed, if in fact it didn't, is worse. 7. We don't consider the test passed because of this construction. We will notice the problem, as far as test_run:wait_cond() will return false instead of true, and also receive1 & receive2 values will be nil instead of '00000' & '1111122222'. We just don't want to hang on chan:put and wait for extra 120 sec. for test to fail, that is why it is hidden under if clause. > -- Ilya Kosarev [-- Attachment #2: Type: text/html, Size: 15239 bytes --]
next prev parent reply other threads:[~2019-12-10 14:36 UTC|newest] Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-12-04 13:43 Ilya Kosarev 2019-12-05 21:31 ` Vladislav Shpilevoy 2019-12-06 16:01 ` Ilya Kosarev 2019-12-08 15:52 ` Vladislav Shpilevoy 2019-12-10 14:36 ` Ilya Kosarev [this message] 2019-12-17 0:03 ` Vladislav Shpilevoy -- strict thread matches above, loose matches on Subject: below -- 2019-11-26 21:45 Ilya Kosarev 2019-11-26 1:19 Ilya Kosarev 2019-08-30 15:28 [tarantool-patches] " Ilya Kosarev
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=1575988613.571435536@f381.i.mail.ru \ --to=i.kosarev@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH] test: fix flaky socket test' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox