From: Vladimir Davydov <vdavydov.dev@gmail.com> To: Olga Arkhangelskaia <krishtal.olja@gmail.com> Cc: tarantool-patches@freelists.org Subject: Re: [tarantool-patches] [PATCH v3] box: adds replication sync after cfg. update Date: Mon, 27 Aug 2018 15:08:08 +0300 [thread overview] Message-ID: <20180827120808.ss3oo3weilhbescm@esperanza> (raw) In-Reply-To: <20180826082537.38552-1-krishtal.olja@gmail.com> On Sun, Aug 26, 2018 at 11:25:37AM +0300, Olga Arkhangelskaia wrote: > diff --git a/src/box/box.cc b/src/box/box.cc > index 8d7454d1f..617067ab5 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -634,6 +634,11 @@ box_set_replication(void) > box_sync_replication(true); > /* Follow replica */ > replicaset_follow(); > + /* Sync replica up to quorum */ > + if (!replicaset_sync()) { > + tnt_raise(ClientError, ER_REPLICA_SYNC, cfg_gets("instance_uuid"), > + cfg_gets("replicaset_uuid")); > + } I guess that if synchronization fails, we should rollback to the previous configuration, like we do in case we fail to connect. > } > > void > diff --git a/src/box/errcode.h b/src/box/errcode.h > index 7e3ea1ed1..4059930c0 100644 > --- a/src/box/errcode.h > +++ b/src/box/errcode.h > @@ -207,6 +207,7 @@ struct errcode_record { > /*152 */_(ER_NULLABLE_PRIMARY, "Primary index of the space '%s' can not contain nullable parts") \ > /*153 */_(ER_NULLABLE_MISMATCH, "Field %d is %s in space format, but %s in index parts") \ > /*154 */_(ER_TRANSACTION_YIELD, "Transaction has been aborted by a fiber yield") \ > + /*155 */_(ER_REPLICA_SYNC, "Failed to synchronize replica %s with replicaset %s") \ I don't think we need a separate error code for this - ER_CFG will do. > > /* > * !IMPORTANT! Please follow instructions at start of the file > diff --git a/src/box/replication.cc b/src/box/replication.cc > index 861ce34ea..9ccb13fa2 100644 > --- a/src/box/replication.cc > +++ b/src/box/replication.cc > @@ -661,13 +661,13 @@ replicaset_follow(void) > } > } > > -void > +bool > replicaset_sync(void) > { > int quorum = replicaset_quorum(); > > if (quorum == 0) > - return; > + return true; > > say_verbose("synchronizing with %d replicas", quorum); > > @@ -687,11 +687,12 @@ replicaset_sync(void) > * in 'orphan' state. > */ > say_crit("entering orphan mode"); > - return; > + return false; We don't enter orphan state on replication reconfiguration, instead we throw an error. So you should move this say_crit to box_cfg_xc. > } > > say_crit("replica set sync complete, quorum of %d " > "replicas formed", quorum); > + return true; > } > > void > diff --git a/src/box/replication.h b/src/box/replication.h > index 06a2867b6..d4e6f7e3e 100644 > --- a/src/box/replication.h > +++ b/src/box/replication.h > @@ -373,10 +373,10 @@ replicaset_follow(void); > > /** > * Wait until a replication quorum is formed. > - * Return immediately if a quorum cannot be > - * formed because of errors. > + * @return true in case of success. > + * @return false if a quorum cannot be formed because of errors. > */ > -void > +bool > replicaset_sync(void); > > /** > diff --git a/test/replication/config_change_sync.test.lua b/test/replication/config_change_sync.test.lua I think that you'd better call the test simply replication/sync.test.lua. > new file mode 100644 > index 000000000..b1a71dbaa > --- /dev/null > +++ b/test/replication/config_change_sync.test.lua > @@ -0,0 +1,43 @@ > +-- > +-- gh-3427: no sync after configuration update > +-- > + > +env = require('test_run') > +test_run = env.new() > +engine = test_run:get_cfg('engine') > + > +box.schema.user.grant('guest', 'read,write,execute', 'universe') You don't need this - replication role should be enough. > + > +box.schema.user.grant('guest', 'replication') > +test_run:cmd("create server replica with rpl_master=default, script='replication/replica_orphan.lua'") > +test_run:cmd("start server replica") > + > +repl = test_run:eval('replica', 'return box.cfg.listen')[1] > +box.cfg{replication = repl} Why? You just want to check that the replica doesn't leave box.cfg() until it synchronizes with the master (default) so why do you need to connect the master to the replica? > + > +s = box.schema.space.create('test', {engine = engine}) > +index = s:create_index('primary') > + > +-- change replica configuration > +test_run:cmd("switch replica") > +box.cfg{replication={}} > + > +test_run:cmd("switch default") > +-- insert values on the master while replica is unconfigured > +a = 50000 box.begin() while a > 0 do a = a-1 box.space.test:insert{a,a} end box.commit() That's going to take too long. Please use error injection instead, see ERRINJ_RELAY_TIMEOUT. > + > +test_run:cmd("switch replica") > +box.cfg{replication = os.getenv("MASTER")} Nit: please remember the old configuration before turning off replication and then re-enable it instead: replication = box.cfg.replication ... box.cfg{replication = replication} > +require'fiber'.sleep(0.1) This shouldn't be here, should it? > + > + > +box.info.replication[1].upstream.lag > 0.1 Nit: I expect that an expression like this returns true if everything is right, not false so please change > to <=. > + > +test_run:cmd("switch default") > + > +-- cleanup > +test_run:cmd("stop server replica") > +test_run:cmd("cleanup server replica") > +box.space.test:drop() > +box.schema.user.revoke('guest', 'replication') > +box.schema.user.revoke('guest', 'read,write,execute', 'universe') > diff --git a/test/replication/replica_orphan.lua b/test/replication/replica_orphan.lua > new file mode 100644 > index 000000000..97740d69a > --- /dev/null > +++ b/test/replication/replica_orphan.lua > @@ -0,0 +1,12 @@ > +#!/usr/bin/env tarantool Since box.cfg.replication_sync_lag is now taken into account during replication reconfiguration, it totally makes sense to make this option dynamic. Then you wouldn't need this script and could use replica.lua instead. Please do it in a separate patch. Don't forget to write a documentation request in the commit message. > + > +local TIMEOUT = tonumber(arg[1]) > + > +box.cfg({ > + listen = os.getenv("LISTEN"), > + replication = os.getenv("MASTER"), > + replication_connect_timeout = 0.5, > + replication_sync_lag = 0.01, > +}) > + > +require('console').listen(os.getenv('ADMIN'))
next prev parent reply other threads:[~2018-08-27 12:08 UTC|newest] Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-08-26 8:25 Olga Arkhangelskaia 2018-08-27 12:08 ` Vladimir Davydov [this message] 2018-08-27 12:13 ` Vladimir Davydov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20180827120808.ss3oo3weilhbescm@esperanza \ --to=vdavydov.dev@gmail.com \ --cc=krishtal.olja@gmail.com \ --cc=tarantool-patches@freelists.org \ --subject='Re: [tarantool-patches] [PATCH v3] box: adds replication sync after cfg. update' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox