Date: Thu, 23 Aug 2018 22:57:54 +0300
From: Vladimir Davydov
Subject: Re: [tarantool-patches] [PATCH] Don't throw an exception in a replication handler
Message-ID: <20180823195754.sxorwil73z36ub2z@esperanza>
To: Georgy Kirichenko
Cc: tarantool-patches@freelists.org

On Thu, Aug 23, 2018 at 07:00:31PM +0300, Georgy Kirichenko wrote:
> It is an error to throw an error out of a cbus message handler because
> it breaks cbus message delivery. In case of replication throwing an
> error prevents iproto against replication socket closing.
>
> Fixes 3642
> ---
> Branch:
> https://github.com/tarantool/tarantool/tree/g.kirichenko/gh-3642-fix-replication-socket-leak
> Issue: https://github.com/tarantool/tarantool/issues/3642

Please don't prefix branch and issue names with 'Branch' and 'Issue',
because it's pretty clear which is which without them.

The ticket doesn't have a milestone assigned. Please assign one and
rebase your branch on the latest 1.9 or 1.10, depending on the
milestone.

>  src/box/iproto.cc              |  2 +-
>  test/replication/misc.test.lua | 10 ++++++++++
>  2 files changed, 11 insertions(+), 1 deletion(-)

You seem to have forgotten to update the result file.
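To illustrate the rule stated in the commit message (a cbus message handler must not let an exception escape), here is a toy model in C++. It is not Tarantool's cbus: Message, deliver_all(), SocketError and the two join handlers are hypothetical stand-ins, and the real code throws and catches exception pointers rather than standard exceptions. The dispatcher calls handlers in order and catches nothing, so an exception escaping one handler unwinds the delivery loop and leaves the following messages undelivered; per the commit message, in the real code this is what keeps iproto from closing the replication socket.

```cpp
// Toy model of a message bus, NOT Tarantool's cbus: all names here are
// hypothetical stand-ins used only to illustrate the failure mode.
#include <cassert>
#include <deque>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stand-in for Tarantool's SocketError (the real code throws pointers).
struct SocketError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct Message {
    std::function<void()> handler;
};

// Delivers queued messages in order. It catches nothing, so an exception
// thrown by a handler unwinds this loop and the rest stay undelivered.
inline void deliver_all(std::deque<Message> &queue) {
    while (!queue.empty()) {
        Message msg = std::move(queue.front());
        queue.pop_front();
        msg.handler();
    }
}

// Pre-fix pattern: a socket error is rethrown out of the handler.
inline void join_handler_old(bool socket_fails,
                             std::vector<std::string> &events) {
    try {
        if (socket_fails)
            throw SocketError("peer closed the connection");
        events.push_back("joined");
    } catch (SocketError &) {
        throw; /* escapes the handler and breaks delivery */
    }
}

// Post-fix pattern: a socket error is swallowed (no error response is
// written, to avoid SIGPIPE), other errors are recorded (a stand-in for
// iproto_write_error()), and nothing escapes to the dispatcher.
inline void join_handler_new(bool socket_fails,
                             std::vector<std::string> &events) {
    try {
        if (socket_fails)
            throw SocketError("peer closed the connection");
        events.push_back("joined");
    } catch (SocketError &) {
        return;
    } catch (std::exception &e) {
        events.push_back(std::string("error: ") + e.what());
    }
}
```

With the old handler, a queue holding a failing message followed by a healthy one throws out of deliver_all() and leaves the healthy message in the queue; with the new handler, both messages are delivered.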
>
> diff --git a/src/box/iproto.cc b/src/box/iproto.cc
> index 0b92c316e..df32e4f2b 100644
> --- a/src/box/iproto.cc
> +++ b/src/box/iproto.cc
> @@ -1412,7 +1412,7 @@ tx_process_join_subscribe(struct cmsg *m)
>  			unreachable();
>  		}
>  	} catch (SocketError *e) {
> -		throw; /* don't write error response to prevent SIGPIPE */
> +		return; /* don't write error response to prevent SIGPIPE */
>  	} catch (Exception *e) {
>  		iproto_write_error(con->input.fd, e, ::schema_version,
>  				   msg->header.sync);
> diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
> index 850579769..32ab07924 100644
> --- a/test/replication/misc.test.lua
> +++ b/test/replication/misc.test.lua
> @@ -79,4 +79,14 @@ box.space.space1:drop()
>  test_run:cmd("switch default")
>  test_run:drop_cluster(SERVERS)
> 
> +test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"')
> +test_run:cmd(string.format('start server sock'))
> +test_run:cmd('switch sock')
> +fiber = require('fiber')
> +k = tonumber(io.popen('ulimit -n'):read())
> +for i = 2, k > 1024 and 1 or k + 20 do local replication = box.cfg.replication box.cfg{replication = {}} box.cfg{replication = replication} while box.info.replication[1].upstream.status ~= 'follow' do fiber.sleep(0.0001) end end

This line's definitely worth splitting.

This case adds another 3 seconds to the replication/misc run time,
which seems to be a little bit too much. Besides, it isn't tested
properly by Travis: Travis has an fd limit > 1024, so the loop upper
bound becomes 1 and the loop body never runs there. I think you should
lower the fd limit in the test itself. Please try to call setrlimit via
Lua ffi.

> +test_run:cmd('switch default')
> +test_run:cmd('stop server sock')
> +test_run:cmd('cleanup server sock')
> +
>  box.schema.user.revoke('guest', 'replication')
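The fd limit suggestion can be sketched as follows. A Lua ffi binding would declare and call the POSIX getrlimit()/setrlimit() functions with RLIMIT_NOFILE; since the patch's other language is C++, the sketch shows the same calls there, plus a helper that keeps opening /dev/null until open() fails with EMFILE, which is the condition leaked replication sockets would eventually run into. The names set_nofile_soft_limit() and count_openable_fds() and the limit of 64 are illustrative assumptions, not part of the patch. A Lua version would need to declare struct rlimit and setrlimit via ffi.cdef and hard-code RLIMIT_NOFILE, whose numeric value is platform-specific (7 on x86-64 Linux).

```cpp
// Illustrative sketch; function names and the limit value are assumptions,
// not taken from the patch or from Tarantool.
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>
#include <vector>

// Sets the soft RLIMIT_NOFILE limit of the calling process to `soft`,
// keeping the hard limit. Returns 0 on success, -1 (with errno set) on error.
inline int set_nofile_soft_limit(rlim_t soft) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    rl.rlim_cur = soft; // setrlimit() fails with EINVAL if soft > hard
    return setrlimit(RLIMIT_NOFILE, &rl);
}

// Opens /dev/null until open() fails, then closes everything it opened.
// Returns how many descriptors could still be opened and stores the errno
// of the failing open() (EMFILE once the soft limit is reached).
// Only call this after lowering the limit, or it may open a great many fds.
inline int count_openable_fds(int *last_errno) {
    std::vector<int> fds;
    for (;;) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            *last_errno = errno;
            break;
        }
        fds.push_back(fd);
    }
    for (int fd : fds)
        close(fd);
    return static_cast<int>(fds.size());
}
```

Lowering the soft limit affects only the process that makes the call, and a low limit lets a leak surface after a few dozen reconnects instead of more than a thousand.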