Date: Thu, 5 Dec 2019 04:35:14 +0300
From: Alexander Turenko
Message-ID: <20191205013513.x5ryflfpebvvepq2@tkn_work_nb>
References: <20191126175859.21804-1-i.kosarev@tarantool.org>
In-Reply-To: <20191126175859.21804-1-i.kosarev@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v2] Stabilize tcp_connect in test_run:cmd()
List-Id: Tarantool development patches
To: Ilya Kosarev
Cc: tarantool-patches@dev.tarantool.org

Thank you for the investigation and the patch.

Simplified it a bit:

diff --git a/test_run.lua b/test_run.lua
index 804d42c..e37b559 100644
--- a/test_run.lua
+++ b/test_run.lua
@@ -10,10 +10,8 @@ local errno = require('errno')
 local clock = require('clock')
 
 local function cmd(self, msg)
-    local sock = nil
-    self:wait_cond(function()
-        sock = socket.tcp_connect(self.host, self.port)
-        return sock ~= nil
+    local sock = self:wait_cond(function()
+        return socket.tcp_connect(self.host, self.port)
     end, 100)
     local data = msg .. '\n'
     sock:send(data)

Pushed to test-run's master.

Updated in tarantool in master, 2.2 and 1.10.

WBR, Alexander Turenko.

----

TL;DR: I tried another approach, but then rejected it.

BTW, I was wondering why we connect to test-run's inspector each time,
and made an attempt to store a connected socket in the test_run instance.
I ran into several problems with this approach:

* test-run/lib/inspector.py relies on the fact that tarantool closes a
  socket after reading a response (it blocks on self.sem if the socket
  is not closed)
* the test-run instance is recreated in many places: in an instance file,
  in a test, after each test_run:cmd('switch <...>')
* test-run's pretest_clean cleans the global test_run variable

I found that the test works reliably if we:

* eliminate self.sem.acquire() / self.sem.release() (I'm far from sure
  this is the right thing to do)
* protect _G.test_run using 'protected_globals' and reuse it in
  inspector_new()
* save all established connections to the inspector in a global weak
  table (also protected with 'protected_globals') using host:port as keys
* tweak test_run:cmd() to use those connections when available (w/o
  wait_cond)

(We also need a __gc metamethod to close a connection.)

However, when I tried to always use the pre-created connection in
test_run:cmd(), I got nils in self.sock. It seems that those connections
sometimes fail anyway, even when they are created in inspector_new()
(when a test has just started).

If we try to keep the connections always open, but create them in
test_run:cmd() when they are not available, this looks like just an
increase in complexity w/o enough gain. Aside from this, I'm hesitant
about removing the locks from inspector.py.

So I returned to your variant of the fix and pushed it.

On Tue, Nov 26, 2019 at 08:58:59PM +0300, Ilya Kosarev wrote:
> For some tests, for example, replication/box_set_replication_stress,
> socket.tcp_connect() in test_run:cmd() might sometimes fail with
> "Too many open files" error when running under high load. In case of
> box_set_replication_stress test it happens because of a huge number of
> file descriptors being opened during many box_set_replication calls.
> Under high load it takes time for them to be closed.
> Now we are trying
> to perform socket.tcp_connect() under test_run:wait_cond() clause until
> we succeed or time is gone.
>
> Closes #193
> ---
> https://github.com/tarantool/test-run/tree/i.kosarev/gh-193-stabilize-test-run-cmd
> https://github.com/tarantool/test-run/issues/193
>
> Changes in v2:
> - replaced infinite loop with wait_cond
>
>  test_run.lua | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/test_run.lua b/test_run.lua
> index 63dfdef..804d42c 100644
> --- a/test_run.lua
> +++ b/test_run.lua
> @@ -10,7 +10,11 @@ local errno = require('errno')
>  local clock = require('clock')
>
>  local function cmd(self, msg)
> -    local sock = socket.tcp_connect(self.host, self.port)
> +    local sock = nil
> +    self:wait_cond(function()
> +        sock = socket.tcp_connect(self.host, self.port)
> +        return sock ~= nil
> +    end, 100)
>      local data = msg .. '\n'
>      sock:send(data)
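
For illustration, the retry-and-return pattern that the simplified diff relies on can be sketched in plain Lua. This is only a minimal sketch, not test-run's actual wait_cond(): the standalone signature, the injected `now` / `sleep` helpers and the fake `flaky_connect` function are all assumptions, introduced so that the example runs without Tarantool's socket, clock and fiber modules. The one property it is meant to show is that a polling helper which returns the condition's truthy result lets the caller drop the outer `sock` variable.

```lua
-- Hypothetical stand-in for test_run:wait_cond(); NOT the real implementation.
-- Polls cond() until it returns a truthy value or the timeout expires,
-- and returns that value (nil on timeout).
-- `now` and `sleep` are injected (an assumption of this sketch) so the
-- code does not depend on Tarantool's clock and fiber modules.
local function wait_cond(cond, timeout, delay, now, sleep)
    local deadline = now() + timeout
    local res = cond()
    while not res do
        if now() >= deadline then
            return nil
        end
        sleep(delay)
        res = cond()
    end
    return res
end

-- Fake connect function: fails twice, as socket.tcp_connect() might under
-- "Too many open files", then returns a table standing in for a socket.
local attempts = 0
local function flaky_connect()
    attempts = attempts + 1
    if attempts < 3 then
        return nil
    end
    return {fd = 42}
end

-- Fake clock: every sleep simply advances the time by `delay`.
local t = 0
local function now() return t end
local function sleep(d) t = t + d end

-- Same shape as the simplified cmd(): the connected "socket" is the
-- return value of the polling helper.
local sock = wait_cond(flaky_connect, 100, 0.001, now, sleep)
```

In this sketch a timeout yields nil, so a caller would still need to check `sock` before calling `sock:send()`.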