[Tarantool-patches] [PATCH] Stabilize tcp_connect in test_run:cmd()

Ilya Kosarev i.kosarev at tarantool.org
Tue Nov 26 03:19:27 MSK 2019


Hi!

Thanks for your review.

The real reason socket.tcp_connect returns nil in this case
is the Linux "open files limit":
2019-11-26 02:34:49.426 [12405] main/202/console/unix/: test_run.lua:15 E> test_run:cmd
2019-11-26 02:34:49.427 [12405] main/103/console/unix/:/home/kosarev/tara socket.c:925 !> accept(6): Too many open files
2019-11-26 02:34:49.427 [12405] main/103/console/unix/:/home/kosarev/tara socket.lua:1090 E> accept(fd 6, aka unix/:/home/kosarev/tarantool/test/var/004_replication/master_quorum1.socket-admin) failed: Too many open files
2019-11-26 02:34:49.427 [12405] main/202/console/unix/: test_run.lua:18 E> sock == nil

I have sent v2 of the patch, which takes the mentioned drawbacks into account.
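
For illustration only, and not necessarily what v2 does: a minimal sketch of
a retry that is bounded by a timeout instead of looping forever. The helper
name connect_with_retry and its parameters (connect, now, sleep, timeout,
delay) are hypothetical; the connect, clock and sleep functions are passed in
so the sketch does not depend on any particular module.

```lua
-- Hypothetical helper, not part of test_run.lua.
-- connect: function returning a socket, or nil on failure
-- now:     function returning a monotonic time in seconds
-- sleep:   function pausing for the given number of seconds
-- timeout: total time budget in seconds
-- delay:   pause between attempts in seconds
local function connect_with_retry(connect, now, sleep, timeout, delay)
    local deadline = now() + timeout
    local attempts = 0
    while true do
        attempts = attempts + 1
        local sock = connect()
        if sock ~= nil then
            return sock
        end
        if now() >= deadline then
            -- Give up and report the failure instead of spinning forever.
            return nil, string.format(
                'connect failed after %d attempts', attempts)
        end
        sleep(delay)
    end
end

-- Possible wiring inside cmd() in Tarantool (the timeout and delay values
-- are assumptions):
-- local sock, err = connect_with_retry(
--     function() return socket.tcp_connect(self.host, self.port) end,
--     clock.monotonic, fiber.sleep, 10, 0.1)
-- if sock == nil then error(err) end
```

Sleeping between attempts also gives the server a chance to close
descriptors when the failure is caused by the open files limit, rather than
adding more connection attempts in a tight loop.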


>Monday, 25 November 2019, 18:31 +03:00 from Alexander Turenko <alexander.turenko at tarantool.org>:
>
>On Sat, Nov 23, 2019 at 05:50:12PM +0300, Ilya Kosarev wrote:
>> For some tests, for example, replication/box_set_replication_stress,
>> socket.tcp_connect() in test_run:cmd() might sometimes fail when
>> running under high load. Now it is fixed.
>> 
>> Closes #193
>> ---
>>  https://github.com/tarantool/test-run/tree/i.kosarev/gh-193-stabilize-test-run-cmd
>>  https://github.com/tarantool/test-run/issues/193
>> 
>>  test_run.lua | 3 +++
>>  1 file changed, 3 insertions(+)
>> 
>> diff --git a/test_run.lua b/test_run.lua
>> index 63dfdef..0d450bd 100644
>> --- a/test_run.lua
>> +++ b/test_run.lua
>> @@ -11,6 +11,9 @@ local clock = require('clock')
>> 
>>  local function cmd(self, msg)
>>      local sock = socket.tcp_connect(self.host, self.port)
>> +    while sock == nil do
>> +        sock = socket.tcp_connect(self.host, self.port)
>> +    end
>>      local data = msg .. '\n'
>>      sock:send(data)
>
>I'm wary of a possibly infinite loop. I know test-run will fail a
>hung test, but it would be better to fail gracefully and provide some
>information about the error (does socket.tcp_connect return anything
>about it?).
>
>Let's consider using wait_cond or, better, set a connection timeout (if
>possible).
>
>It is also interesting what the reason for the connection error is in
>the first place. Is test-run actually listening at that moment? Maybe it
>is unable to process many incoming connection requests at a time?
>
>Please, don't bury yourself in that, but look around briefly.
>
>WBR, Alexander Turenko.


-- 
Ilya Kosarev

