* [Tarantool-patches] [PATCH v2] Stabilize tcp_connect in test_run:cmd()
@ 2019-11-26 17:58 Ilya Kosarev
2019-12-05 1:35 ` Alexander Turenko
0 siblings, 1 reply; 3+ messages in thread
From: Ilya Kosarev @ 2019-11-26 17:58 UTC (permalink / raw)
To: tarantool-patches
For some tests, for example, replication/box_set_replication_stress,
socket.tcp_connect() in test_run:cmd() might sometimes fail with
"Too many open files" error when running under high load. In case of
box_set_replication_stress test it happens because of a huge number of
file descriptors being opened during many box_set_replication calls.
Under high load it takes time for them to be closed. Now we are trying
to perform socket.tcp_connect() under test_run:wait_cond() clause until
we succeed or time is gone.
Closes #193
---
https://github.com/tarantool/test-run/tree/i.kosarev/gh-193-stabilize-test-run-cmd
https://github.com/tarantool/test-run/issues/193
Changes in v2:
- replaced infinite loop with wait_cond
test_run.lua | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/test_run.lua b/test_run.lua
index 63dfdef..804d42c 100644
--- a/test_run.lua
+++ b/test_run.lua
@@ -10,7 +10,11 @@ local errno = require('errno')
local clock = require('clock')
local function cmd(self, msg)
- local sock = socket.tcp_connect(self.host, self.port)
+ local sock = nil
+ self:wait_cond(function()
+ sock = socket.tcp_connect(self.host, self.port)
+ return sock ~= nil
+ end, 100)
local data = msg .. '\n'
sock:send(data)
--
2.17.1
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] Stabilize tcp_connect in test_run:cmd()
2019-11-26 17:58 [Tarantool-patches] [PATCH v2] Stabilize tcp_connect in test_run:cmd() Ilya Kosarev
@ 2019-12-05 1:35 ` Alexander Turenko
0 siblings, 0 replies; 3+ messages in thread
From: Alexander Turenko @ 2019-12-05 1:35 UTC (permalink / raw)
To: Ilya Kosarev; +Cc: tarantool-patches
Thank you for the investigation and the patch.
Simplified it a bit:
diff --git a/test_run.lua b/test_run.lua
index 804d42c..e37b559 100644
--- a/test_run.lua
+++ b/test_run.lua
@@ -10,10 +10,8 @@ local errno = require('errno')
local clock = require('clock')
local function cmd(self, msg)
- local sock = nil
- self:wait_cond(function()
- sock = socket.tcp_connect(self.host, self.port)
- return sock ~= nil
+ local sock = self:wait_cond(function()
+ return socket.tcp_connect(self.host, self.port)
end, 100)
local data = msg .. '\n'
sock:send(data)
Pushed to test-run's master.
Updated in tarantool in master, 2.2 and 1.10.
WBR, Alexander Turenko.
----
TL;DR: I tried another approach, but then refuses it.
BTW, I was wonder why we connect to test-run's inspector each time and made
attempt to store a connected socket in test_run instance. I met several
problems with this approach:
* test-run/lib/inspector.py leans on the fact that tarantool closes a socket
after reading of a response (blocks on self.sem if it does not close it)
* test-run instance is recreated in many places: in an instance file, in a
test, after each test_run:cmd('switch <...>')
* test-run's pretest_clean cleans a global test_run variable
I found that if we'll eliminate self.sem.acquire() / self.sem.release()
(I'm very tentative it is right thing to do), will protect _G.test_run
using 'protected_globals' and reuse it in inspector_new(), will save all
established connections to inspector in a global weak table (and protect
it with 'protected_globals') using host:port as keys and will tweak
test_run:cmd() to use those connections when available (w/o wait_cond),
then the test will work reliably. (We also need a __gc metamethod to
close a connection.)
However when I tried to always use the pre-created connection in
test_run:cmd(), I got nils in self.sock. It seems that those connections
are sometimes fails anyway even when they are created from in
inspector_new() (when a test is just started).
If we'll try to keep the connections always open, but create them if not
available in test_run:cmd(), this looks as just increase of complexity
w/o enough gains. Aside of this, I'm tentative about removing locks
from inspector.py.
So I returned to your variant of the fix and pushed it.
On Tue, Nov 26, 2019 at 08:58:59PM +0300, Ilya Kosarev wrote:
> For some tests, for example, replication/box_set_replication_stress,
> socket.tcp_connect() in test_run:cmd() might sometimes fail with
> "Too many open files" error when running under high load. In case of
> box_set_replication_stress test it happens because of a huge number of
> file descriptors being opened during many box_set_replication calls.
> Under high load it takes time for them to be closed. Now we are trying
> to perform socket.tcp_connect() under test_run:wait_cond() clause until
> we succeed or time is gone.
>
> Closes #193
> ---
> https://github.com/tarantool/test-run/tree/i.kosarev/gh-193-stabilize-test-run-cmd
> https://github.com/tarantool/test-run/issues/193
>
> Changes in v2:
> - replaced infinite loop with wait_cond
>
> test_run.lua | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/test_run.lua b/test_run.lua
> index 63dfdef..804d42c 100644
> --- a/test_run.lua
> +++ b/test_run.lua
> @@ -10,7 +10,11 @@ local errno = require('errno')
> local clock = require('clock')
>
> local function cmd(self, msg)
> - local sock = socket.tcp_connect(self.host, self.port)
> + local sock = nil
> + self:wait_cond(function()
> + sock = socket.tcp_connect(self.host, self.port)
> + return sock ~= nil
> + end, 100)
> local data = msg .. '\n'
> sock:send(data)
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Tarantool-patches] [PATCH v2] Stabilize tcp_connect in test_run:cmd()
@ 2019-11-26 0:19 Ilya Kosarev
0 siblings, 0 replies; 3+ messages in thread
From: Ilya Kosarev @ 2019-11-26 0:19 UTC (permalink / raw)
To: tarantool-patches
For some tests, for example, replication/box_set_replication_stress,
socket.tcp_connect() in test_run:cmd() might sometimes fail with
"Too many open files" error when running under high load.
Now it is fixed.
Closes #193
---
https://github.com/tarantool/test-run/tree/i.kosarev/gh-193-stabilize-test-run-cmd
https://github.com/tarantool/test-run/issues/193
Changes in v2:
- replaced infinite loop with wait_cond
test_run.lua | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/test_run.lua b/test_run.lua
index 63dfdef..804d42c 100644
--- a/test_run.lua
+++ b/test_run.lua
@@ -10,7 +10,11 @@ local errno = require('errno')
local clock = require('clock')
local function cmd(self, msg)
- local sock = socket.tcp_connect(self.host, self.port)
+ local sock = nil
+ self:wait_cond(function()
+ sock = socket.tcp_connect(self.host, self.port)
+ return sock ~= nil
+ end, 100)
local data = msg .. '\n'
sock:send(data)
--
2.17.1
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2019-12-05 1:35 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-26 17:58 [Tarantool-patches] [PATCH v2] Stabilize tcp_connect in test_run:cmd() Ilya Kosarev
2019-12-05 1:35 ` Alexander Turenko
-- strict thread matches above, loose matches on Subject: below --
2019-11-26 0:19 Ilya Kosarev
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox