Date: Fri, 10 Aug 2018 13:48:02 +0300
From: Vladimir Davydov
Subject: Re: [PATCH v3] replication: do not ignore replication_connect_quorum.
Message-ID: <20180810104802.oelzkdcwijpwbz4i@esperanza>
References: <2DC61B5A-36F2-4F4B-8190-FD09E5BD9F50@tarantool.org>
 <20180810091453.94988-1-sergepetrenko@tarantool.org>
In-Reply-To: <20180810091453.94988-1-sergepetrenko@tarantool.org>
To: Serge Petrenko
Cc: tarantool-patches@freelists.org, georgy@tarantool.org

On Fri, Aug 10, 2018 at 12:14:53PM +0300, Serge Petrenko wrote:
> On bootstrap and after replication reconfiguration
> replication_connect_quorum was ignored. The instance tried to connect to
> every replica listed in replication parameter, and errored if it wasn't
> possible.
> The patch alters this behaviour. The instance still tries to connect to
> every node listed in replication, but does not raise an error if it was
> able to connect to at least replication_connect_quorum instances.
>
> Closes #3428
>
> @TarantoolBot document
> Title: replication_connect_quorum is not ignored.
> Now on replica set bootstrap instance tries to connect
> not to every replica, listed in box.cfg.replication, but
> only to replication_connect_quorum replicas. If after
> replication_connect_timeout seconds instance is not connected to
> at least replication_connect_quorum other instances, we
> throw an error.

Please also mention that quorum is not ignored in case of replication
reconfiguration (i.e. on subsequent calls to box.cfg{replication}).

> diff --git a/src/box/box.cc b/src/box/box.cc
> index e3eb2738f..8cf43a6ad 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -626,7 +626,7 @@ box_set_replication(void)
>
> 	box_check_replication();
> 	/* Try to connect to all replicas within the timeout period */

What happens if we fail to connect to all masters?
The configuration will succeed as long as we've managed to connect to at
least 'quorum' masters. Please update the comment.

> -	box_sync_replication(replication_connect_timeout, true);
> +	box_sync_replication(true);
> 	/* Follow replica */
> 	replicaset_follow();
> }
> @@ -1866,7 +1866,7 @@ box_cfg_xc(void)
> 	title("orphan");
>
> 	/* Wait for the cluster to start up */
> -	box_sync_replication(replication_connect_timeout, false);
> +	box_sync_replication(false);

This code cries for an explanation why we don't need to connect to
'quorum' masters here. Please update the comment.

> } else {
> 	if (!tt_uuid_is_nil(&instance_uuid))
> 		INSTANCE_UUID = instance_uuid;
> @@ -1888,7 +1888,7 @@ box_cfg_xc(void)
> 	 * receive the same replica set UUID when a new cluster
> 	 * is deployed.
> 	 */
> -	box_sync_replication(TIMEOUT_INFINITY, true);
> +	box_sync_replication(true);

Here's the comment to this code (it got trimmed by diff):

	/*
	 * Wait for the cluster to start up.
	 *
	 * Note, when bootstrapping a new instance, we have to
	 * connect to all masters to make sure all replicas
	 * receive the same replica set UUID when a new cluster
	 * is deployed.
	 */

It's obviously outdated by this patch. Please update.

> /* Bootstrap a new master */
> bootstrap(&replicaset_uuid, &is_bootstrap_leader);
> }
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index 528fe4459..dc1b9d55a 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -557,7 +557,7 @@ replicaset_connect(struct applier **appliers, int count,
> 	 * - register a trigger in each applier to wake up our
> 	 *   fiber via this channel when the remote peer becomes
> 	 *   connected and a UUID is received;
> -	 * - wait up to CONNECT_TIMEOUT seconds for `count` messages;
> +	 * - wait up to REPLICATION_CONNECT_TIMEOUT seconds for `count` messages;

The line is too long now and conspicuously stands out in the comment.
Please carry it to the next line.
Or not update it at all - as a matter of fact, I don't think we really
need to update it.

> 	 * - on timeout, raise a CFG error, cancel and destroy
> 	 *   the freshly created appliers (done in a guard);
> 	 * - an success, unregister the trigger, check the UUID set
> @@ -587,7 +589,8 @@ replicaset_connect(struct applier **appliers, int count,
> 		double wait_start = ev_monotonic_now(loop());
> 		if (fiber_cond_wait_timeout(&state.wakeup, timeout) != 0)
> 			break;
> -		if (state.failed > 0 && connect_all)
> +		if (count - state.failed <
> +		    MIN(count, replication_connect_quorum))
> 			break;
> 		timeout -= ev_monotonic_now(loop()) - wait_start;
> 	}
> @@ -595,7 +598,8 @@ replicaset_connect(struct applier **appliers, int count,
> 		say_crit("failed to connect to %d out of %d replicas",
> 			 count - state.connected, count);
> 		/* Timeout or connection failure. */
> -		if (connect_all)
> +		if (connect_quorum && state.connected <
> +		    MIN(count, replication_connect_quorum))

Please eliminate the code duplication by introducing a separate variable:

	int quorum = MIN(count, replication_connect_quorum);

> diff --git a/test/replication-py/master.lua b/test/replication-py/master.lua
> index 0f9f7a6f0..51283efdf 100644
> --- a/test/replication-py/master.lua
> +++ b/test/replication-py/master.lua
> @@ -3,6 +3,8 @@ os = require('os')
> box.cfg({
>     listen = os.getenv("LISTEN"),
>     memtx_memory = 107374182,
> +    replication_connect_timeout = 1.0,
> +    replication_timeout = 0.3
> })
>
> require('console').listen(os.getenv('ADMIN'))
> diff --git a/test/replication-py/replica.lua b/test/replication-py/replica.lua
> index 278291bba..b9d193b70 100644
> --- a/test/replication-py/replica.lua
> +++ b/test/replication-py/replica.lua
> @@ -7,6 +7,8 @@ box.cfg({
>     listen = os.getenv("LISTEN"),
>     replication = os.getenv("MASTER"),
>     memtx_memory = 107374182,
> +    replication_connect_timeout = 1.0,
> +    replication_timeout = 0.3

Why don't you pass timeouts to these scripts, as you do for other tests?
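By the way, the connect logic proposed in the replication.cc hunk above is
easy to model in isolation. Here is a minimal Python sketch of that logic
(names such as wait_for_quorum and poll_connected are illustrative, not
Tarantool's API), written with the single 'quorum' variable suggested above:

```python
import time

def wait_for_quorum(poll_connected, count, quorum_cfg, timeout):
    # Model of the proposed replicaset_connect() behavior: keep waiting
    # until at least min(count, quorum_cfg) appliers are connected, or
    # give up once quorum becomes unreachable or the timeout expires.
    # poll_connected() returns a (connected, failed) pair of counts.
    quorum = min(count, quorum_cfg)  # the deduplicated variable suggested above
    deadline = time.monotonic() + timeout
    while True:
        connected, failed = poll_connected()
        if connected >= count:
            return True              # everyone connected: done early
        if count - failed < quorum:
            break                    # quorum can no longer be reached
        if time.monotonic() >= deadline:
            break                    # replication_connect_timeout expired
        time.sleep(0.01)
    connected, _ = poll_connected()
    return connected >= quorum       # succeed iff at least quorum connected
```

Note how, unlike the old connect_all logic, a single failed applier no
longer aborts the whole configuration as long as quorum is still reachable.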
> })
>
> box_cfg_done = true
> diff --git a/test/replication/autobootstrap.lua b/test/replication/autobootstrap.lua
> index 4f55417ae..c98f3c559 100644
> --- a/test/replication/autobootstrap.lua
> +++ b/test/replication/autobootstrap.lua
> @@ -5,6 +5,10 @@ local INSTANCE_ID = string.match(arg[0], "%d")
> local USER = 'cluster'
> local PASSWORD = 'somepassword'
> local SOCKET_DIR = require('fio').cwd()
> +local TIMEOUT = tonumber(arg[1])
> +local CON_TIMEOUT = arg[2] and tonumber(arg[2]) or TIMEOUT * 3
> +

As far as I can see, you always pass both timeouts to this script so I
don't think you need the fallback.

> +

Extra new line. Please remove.

> local function instance_uri(instance_id)
>     --return 'localhost:'..(3310 + instance_id)
>     return SOCKET_DIR..'/autobootstrap'..instance_id..'.sock';
> @@ -21,7 +25,9 @@ box.cfg({
>     USER..':'..PASSWORD..'@'..instance_uri(2);
>     USER..':'..PASSWORD..'@'..instance_uri(3);
> };
> -    replication_connect_timeout = 0.5,
> +    replication_timeout = TIMEOUT;
> +    replication_connect_timeout = CON_TIMEOUT;
> +    replication_timeout = 0.1;

'replication_timeout' is set twice.

> diff --git a/test/replication/autobootstrap_guest.lua b/test/replication/autobootstrap_guest.lua
> index 40fef2c7a..2f6e527df 100644
> --- a/test/replication/autobootstrap_guest.lua
> +++ b/test/replication/autobootstrap_guest.lua
> @@ -4,6 +4,10 @@
> local INSTANCE_ID = string.match(arg[0], "%d")
>
> local SOCKET_DIR = require('fio').cwd()
> +
> +local TIMEOUT = tonumber(arg[1])
> +local CON_TIMEOUT = arg[2] and tonumber(arg[2]) or TIMEOUT * 3
> +

Again, I don't think you need to fall back on TIMEOUT * 3 in case
CON_TIMEOUT is unset - you pass both values anyways.
> diff --git a/test/replication/autobootstrap_guest.result b/test/replication/autobootstrap_guest.result
> index 49f9bee01..f0b4cff9b 100644
> --- a/test/replication/autobootstrap_guest.result
> +++ b/test/replication/autobootstrap_guest.result
> @@ -7,13 +7,21 @@ vclock_diff = require('fast_replica').vclock_diff
> test_run = env.new()
> ---
> ...
> +create_cluster_with_args = require('cluster_cmd').create_cluster_with_args
> +---
> +...
> SERVERS = { 'autobootstrap_guest1', 'autobootstrap_guest2', 'autobootstrap_guest3' }
> ---
> ...
> --
> -- Start servers
> --
> -test_run:create_cluster(SERVERS)
> +-- for some reason create_cluster_with_args, just like the
> +-- usual test_run:create_cluster, can hang for arbitrary
> +-- time, so we have to pass a rather large value for
> +-- replication_connect_timeout(4 seconds).
> +-- In most cases, 1 second is enough, though.

Don't think we need this comment - it raises some unsettling questions.
Let's just always pass a big timeout (say 10 seconds) to create_cluster,
just in case? In other tests too, for consistency.

> +create_cluster_with_args(test_run, SERVERS, "0.3 4.0")

Why do you pass 0.3 for replication_timeout? 0.1 should work just fine, no?

> diff --git a/test/replication/before_replace.result b/test/replication/before_replace.result
> index d561b4813..6f207ae01 100644
> --- a/test/replication/before_replace.result
> +++ b/test/replication/before_replace.result
> @@ -7,11 +7,14 @@ env = require('test_run')
> test_run = env.new()
> ---
> ...
> +create_cluster_with_args = require('cluster_cmd').create_cluster_with_args
> +---
> +...
> SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
> ---
> ...
> -- Deploy a cluster.
> -test_run:create_cluster(SERVERS)
> +create_cluster_with_args(test_run, SERVERS, "0.5 4.0")

Why 0.5 for replication_timeout? Let's leave it 0.1 as it used to be.
You don't change 'replication_timeout' logic in this patch so it should
work just fine.
And please always pass a substantially big timeout (say, 10 seconds) to
create_cluster(), the same in all old tests. Alternatively, you can just
use the default value in those scripts (autobootstrap.lua,
autobootstrap_guest.lua) if connect_timeout is omitted (it's 30 seconds,
right?). If you do, then scratch my comment regarding the uselessness of
the fallback.

> ---
> ...
> test_run:wait_fullmesh(SERVERS)
> @@ -125,7 +128,7 @@ box.space.test:select()
> - [9, 90]
> - [10, 100]
> ...
> -test_run:cmd('restart server autobootstrap1')
> +test_run:cmd('restart server autobootstrap1 with args="0.3 1.0"')

I don't think you need to change the timeouts here. That is, use
0.1 + 0.5 as before.

Also, this patch is getting rather difficult to review. I think you
should split it:

 - First, allow passing args to create_cluster.
 - Second, pass timeouts explicitly. Use a big timeout when creating a
   cluster, as a preparation for the final patch. Or omit it and use the
   default (30 seconds).
 - Finally, the patch changing the behavior of
   replication_connect_timeout. It shouldn't touch existing tests then,
   only add new ones.

Please do.

I stopped looking at the patch at this point. Just one more comment.
Please don't introduce a new function for starting a cluster with custom
arguments as you do below. Instead, patch tarantool/test-run so that
create_cluster can optionally take the arguments it is supposed to pass
to the script (I think they should go to 'opts').
> diff --git a/test/replication/lua/cluster_cmd.lua b/test/replication/lua/cluster_cmd.lua
> new file mode 100644
> index 000000000..e9e0ead65
> --- /dev/null
> +++ b/test/replication/lua/cluster_cmd.lua
> @@ -0,0 +1,14 @@
> +function create_cluster_with_args(test_run, servers, args)
> +    local cmd1 = 'create server %s with script="replication/%s.lua"'
> +    local cmd2 = 'start server %s with wait_load=False, wait=False, args="%s"'
> +    for _, srv in ipairs(servers) do
> +        test_run:cmd(cmd1:format(srv, srv))
> +    end
> +    for _, srv in ipairs(servers) do
> +        test_run:cmd(cmd2:format(srv, args))
> +    end
> +end
> +
> +return {
> +    create_cluster_with_args = create_cluster_with_args;
> +}
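To illustrate the direction I mean, a create_cluster that accepts extra
script arguments through an 'opts' table could look roughly like this.
This is a hypothetical Python sketch only - the names ('opts', 'args',
'cmd') are my assumptions about how test-run could be patched, not its
actual API:

```python
def create_cluster(test_run, servers, opts=None):
    # Hypothetical sketch: 'opts' carries optional extra arguments that
    # are forwarded to each instance script; not the real test-run API.
    args = (opts or {}).get('args', '')
    for srv in servers:
        test_run.cmd('create server %s with script="replication/%s.lua"'
                     % (srv, srv))
    for srv in servers:
        test_run.cmd('start server %s with wait_load=False, wait=False, args="%s"'
                     % (srv, args))
```

With something like this in test-run itself, the tests would not need the
separate cluster_cmd.lua helper at all.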