[tarantool-patches] [PATCH] Don't throw an exception in a replication handler

Vladimir Davydov vdavydov.dev at gmail.com
Thu Aug 23 22:57:54 MSK 2018


On Thu, Aug 23, 2018 at 07:00:31PM +0300, Georgy Kirichenko wrote:
> It is an error to throw an error out of a cbus message handler because
> it breaks cbus message delivery. In case of replication throwing an
> error prevents iproto against replication socket closing.
> 
> Fixes 3642
> ---
> Branch:
> https://github.com/tarantool/tarantool/tree/g.kirichenko/gh-3642-fix-replication-socket-leak
> Issue: https://github.com/tarantool/tarantool/issues/3642

Please don't prefix branch and issue names with 'Branch' and 'Issue',
because it's pretty clear which is which without them.

The ticket doesn't have a milestone assigned. Please assign one
and rebase your branch on the latest 1.9 or 1.10, depending on
the milestone.

>  src/box/iproto.cc              |  2 +-
>  test/replication/misc.test.lua | 10 ++++++++++
>  2 files changed, 11 insertions(+), 1 deletion(-)

You seem to have forgotten to update the result file.

> 
> diff --git a/src/box/iproto.cc b/src/box/iproto.cc
> index 0b92c316e..df32e4f2b 100644
> --- a/src/box/iproto.cc
> +++ b/src/box/iproto.cc
> @@ -1412,7 +1412,7 @@ tx_process_join_subscribe(struct cmsg *m)
>  			unreachable();
>  		}
>  	} catch (SocketError *e) {
> -		throw; /* don't write error response to prevent SIGPIPE */
> +		return; /* don't write error response to prevent SIGPIPE */
>  	} catch (Exception *e) {
>  		iproto_write_error(con->input.fd, e, ::schema_version,
>  				   msg->header.sync);
> diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
> index 850579769..32ab07924 100644
> --- a/test/replication/misc.test.lua
> +++ b/test/replication/misc.test.lua
> @@ -79,4 +79,14 @@ box.space.space1:drop()
>  test_run:cmd("switch default")
>  test_run:drop_cluster(SERVERS)
>  
> +test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"')
> +test_run:cmd(string.format('start server sock'))
> +test_run:cmd('switch sock')
> +fiber = require('fiber')
> +k = tonumber(io.popen('ulimit -n'):read())
> +for i = 2, k > 1024 and 1 or k + 20 do local replication = box.cfg.replication box.cfg{replication = {}} box.cfg{replication = replication} while box.info.replication[1].upstream.status ~= 'follow' do fiber.sleep(0.0001) end end

This line's definitely worth splitting.

This case adds another 3 seconds to the replication/misc run time,
which seems to be a bit too much. Besides, it isn't properly tested
by Travis: the fd limit there is > 1024, so the loop body never runs.
I think you should lower the fd limit in the test itself. Please try
calling setrlimit via Lua ffi.
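
Something along these lines might work. This is an untested sketch, not
a drop-in helper: the name lower_nofile is made up, the rlim_t typedef
assumes a 64-bit Linux or macOS host, and the numeric RLIMIT_NOFILE
values (7 on x86/ARM Linux, 8 on macOS/BSD) are hardcoded assumptions
that differ on some architectures.

```lua
local ffi = require('ffi')

-- Assumption: rlim_t is a 64-bit unsigned integer (true for 64-bit
-- Linux and macOS). pcall guards against the types being declared
-- already by some other module.
pcall(ffi.cdef, [[
typedef unsigned long rlim_t;
struct rlimit {
    rlim_t rlim_cur; /* soft limit */
    rlim_t rlim_max; /* hard limit */
};
int getrlimit(int resource, struct rlimit *rlim);
int setrlimit(int resource, const struct rlimit *rlim);
]])

-- Assumption: 7 on x86/ARM Linux, 8 on macOS and the BSDs.
local RLIMIT_NOFILE = ffi.os == 'Linux' and 7 or 8

-- Hypothetical helper: lower the soft fd limit of the current process
-- to new_soft. Never raises the limit. Returns the previous soft limit,
-- or nil and an error message.
local function lower_nofile(new_soft)
    local rlim = ffi.new('struct rlimit')
    if ffi.C.getrlimit(RLIMIT_NOFILE, rlim) ~= 0 then
        return nil, 'getrlimit failed, errno ' .. ffi.errno()
    end
    local old_soft = tonumber(rlim.rlim_cur)
    if new_soft < old_soft then
        rlim.rlim_cur = new_soft
        if ffi.C.setrlimit(RLIMIT_NOFILE, rlim) ~= 0 then
            return nil, 'setrlimit failed, errno ' .. ffi.errno()
        end
    end
    return old_soft
end
```

Calling e.g. lower_nofile(64) before the reconnect loop (64 is an
arbitrary pick) would let the loop exhaust the descriptors in well under
a hundred iterations regardless of the host's ulimit. The limit has to be
lowered in the process that actually leaks the sockets.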

> +test_run:cmd('switch default')
> +test_run:cmd('stop server sock')
> +test_run:cmd('cleanup server sock')
> +
>  box.schema.user.revoke('guest', 'replication')


