Hi!

Thanks for the comments!

5 answers are below.

Sent v3 considering discussed questions.

Воскресенье, 8 декабря 2019, 18:52 +03:00 от Vladislav Shpilevoy <v.shpilevoy@tarantool.org>:

Hi! Thanks for the patch!

Please, keep my comments near your answers. Because
I don't remember all my comments, and for each your
answer I need to lookup my original email to remember,
what was the problem.

This is usually simpler both to you and a reviewer,
when your answers are inlined into the original email.

Also don't drop the comments. Here is my comment 2:

>> 2. I understand, why it is 'connect time' in the previous
>> hunk. But here the socket is already connected. So it is
>> not connect time.
>>
>> On the other hand, there are places right after connect,
>> which don't use this timeout. For example, line 457
>> in socket.result file:
>>
>> sc:sysconnect('127.0.0.1', s:name().port) or errno() == errno.EINPROGRESS
>> ---
>> - true
>> ...
>> sc:writable(10)
>> ---
>> - true
>> ...
>>
>> Here it is 10 seconds, while in other places you started
>> using 240.
And it is still actual. Why is it called 'CONNECT_TIME' if
it has nothing to do with connect in most of the places?
The problem is in the name. I would introduce one timeout
for conds, connects, readable, etc. You even already have
it - COND_TIME.
2. Ok, I see the problem here. I will carefully reconsider such
places to make them consistent.


On 06/12/2019 17:01, Ilya Kosarev wrote:
> Hi!
>
> Thanks for your review.
>
> 4. I run socket.test to simulate "high load" in parallel with itself
> (while ./test-run.py -j20 `for r in {1..64} ; do echo socket ; done` 2>/dev/null ; do l=$(($l+1)) ; echo ======== $l ============= ; done).

I tried this example on your branch and got that:

=========================================================================
[009] --- app/socket.result Sun Dec 8 15:40:58 2019
[009] +++ app/socket.reject Sun Dec 8 16:09:23 2019
[009] @@ -2670,13 +2670,17 @@
[009] -- case: recvfrom reads first 512 bytes from the message with tcp
[009] received_message, from = receiving_socket:recvfrom()
[009] ---
[009] +- error: '[string "received_message, from = receiving_socket:rec..."]:1: attempt to
[009] + index global ''receiving_socket'' (a nil value)'
[009] ...
[009] e = receiving_socket:errno()
[009] ---
[009] +- error: '[string "e = receiving_socket:errno() "]:1: attempt to index global ''receiving_socket''
[009] + (a nil value)'
[009] ...
[009] received_message == message:sub(1, 512) -- expected true
[009] ---
[009] -- true
[009] +- false
[009] ...
[009] received_message:len() == 512 -- expected true
[009] ---
[009] @@ -2684,7 +2688,7 @@
[009] ...
[009] received_message
[009] ---
[009] -- xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[009] +- yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
[009] ...
[009] received_message:len()
[009] ---
[009] @@ -2692,11 +2696,13 @@
[009] ...
[009] from == nil -- expected true
[009] ---
[009] -- true
[009] +- false
[009] ...
[009] from
[009] ---
[009] -- null
[009] +- host: 127.0.0.1
[009] + family: AF_INET
[009] + port: 57721
[009] ...
[009] e == 0 -- expected true
[009] ---
[009] @@ -2709,33 +2715,39 @@
[009] -- case: recvfrom does not discard the message tail with tcp
[009] received_message, from = receiving_socket:recvfrom()
[009] ---
[009] +- error: '[string "received_message, from = receiving_socket:rec..."]:1: attempt to
[009] + index global ''receiving_socket'' (a nil value)'
[009] ...
[009] e = receiving_socket:errno()
[009] ---
[009] +- error: '[string "e = receiving_socket:errno() "]:1: attempt to index global ''receiving_socket''
[009] + (a nil value)'
[009] ...
[009] received_message == message:sub(513, 513) -- expected true
[009] ---
[009] -- true
[009] +- false
[009] ...
[009] received_message:len() == 1 -- expected true
[009] ---
[009] -- true
[009] +- false
[009] ...
[009] received_message
[009] ---
[009] -- x
[009] +- yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
[009] ...
[009] received_message:len()
[009] ---
[009] -- 1
[009] +- 512
[009] ...
[009] from == nil -- expected true
[009] ---
[009] -- true
[009] +- false
[009] ...
[009] from
[009] ---
[009] -- null
[009] +- host: 127.0.0.1
[009] + family: AF_INET
[009] + port: 57721
[009] ...
[009] e == 0 -- expected true
[009] ---
[009] @@ -2747,7 +2759,8 @@
[009] ...
[009] receiving_socket:close()
[009] ---
[009] -- true
[009] +- error: '[string "return receiving_socket:close() "]:1: attempt to index global ''receiving_socket''
[009] + (a nil value)'
[009] ...
[009] sending_socket:close()
[009] ---
[009]
=========================================================================

Sometimes I get this:

=========================================================================
[008] app/socket.test.lua [ fail ]
[008]
[008] Test failed! Result content mismatch:
[008] --- app/socket.result Sun Dec 8 15:40:58 2019
[008] +++ app/socket.reject Sun Dec 8 16:11:27 2019
[008] @@ -2340,23 +2340,24 @@
[008] ...
[008] received_message == '' -- expected true
[008] ---
[008] -- true
[008] +- false
[008] ...
[008] received_message
[008] ---
[008] --
[008] +- null
[008] ...
[008] from ~= nil -- expected true
[008] ---
[008] -- true
[008] +- false
[008] ...
[008] from.host == '127.0.0.1' -- expected true
[008] ---
[008] -- true
[008] +- error: '[string "return from.host == ''127.0.0.1'' -- expected t..."]:1: attempt
[008] + to index global ''from'' (a nil value)'
[008] ...
[008] e == 0 -- expected true
[008] ---
[008] -- true
[008] +- false
[008] ...
[008] -- case: recv, no datagram, explicit size
[008] received_message = receiving_socket:recv(512)
[008] @@ -2386,23 +2387,25 @@
[008] ...
[008] received_message == nil -- expected true
[008] ---
[008] -- true
[008] +- false
[008] ...
[008] received_message
[008] ---
[008] -- null
[008] +-
[008] ...
[008] from == nil -- expected true
[008] ---
[008] -- true
[008] +- false
[008] ...
[008] from
[008] ---
[008] -- null
[008] +- host: 127.0.0.1
[008] + family: AF_INET
[008] + port: 59867
[008] ...
[008] e == errno.EAGAIN -- expected true
[008] ---
[008] -- true
[008] +- false
[008] ...
[008] -- case: recv, zero datagram, explicit size
[008] sendto_zero(sending_socket, '127.0.0.1', receiving_socket_port)
=========================================================================

So socket.test.lua still is unstable. The command with 64
socket.test.lua gives me an error in 100% cases.
4. Ok, I guess these are mac os specific issues. I haven't got mac os
machine access yet, however, I hope to get it soon and fix these problems.
I still tried to stabilize these cases; if it don't work, I will reconsider them later.


> Using fiber.time() as random seed leads to the same results produced
> on parallel tasks. Using instance uuid as a seed seems to be a good
> decision.

I thought that it is strange, because UUIDs are also generated
randomly, and by the same random generator as is used in Lua.
But now I see why math.random() gives you the same numbers
even though UUIDs are different: Lua math.random() has own
random number generator. Not from the standard library.

> 5. Well, considering the new seed generation, 10s seem to be enough.

So you could skip this strange extraction of a number from
UUID, but increase the timeout? I vote for it. Because the
failing example with 64 instances of socket.test.lua is very
artificial, and even then it will work in case the timeout
is big enough, right? UUID parsing adds unnecessary complexity
and raises questions like I asked.
5. Ok, I agree.


> 6. If we close socket in the main fiber, we might attempt to use closed
> socket in background fiber.

And what? We don't care anymore for the background
fiber after the socket is not needed anymore. Ok, it will
stumble into errors in that fiber, but how does it affect
further execution of the test?
6. It affects further test execution, as far as it leads
to the following error:
[015]  c:receive("*l", "Line: ")
[015]  ---
[015]  - null
[015] -- closed
[015] -- 'Line: Fu'
[015] +- Connection reset by peer


>
> 7. Of course it shouldn't be nil and test will fail in this case. We
> just want to be sure that test won't hang even if server is still nil.
> Otherwise we will hang on chan:put(), which is not nice.

If it will always be nil in case of an error, we won't notice the
problem. Please, drop it. Increase a timeout somewhere if necessary,
but consideration of a test passed, if in fact it didn't, is worse.
7. We don't consider the test passed because of this construction.
We will notice the problem, as far as test_run:wait_cond()
will return false instead of true, and also receive1 & receive2
values will be nil instead of '00000' & '1111122222'.
We just don't want to hang on chan:put and wait for extra 120 sec.
for test to fail, that is why it is hidden under if clause.



--
Ilya Kosarev