From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Thu, 30 Aug 2018 13:11:36 +0300
From: Vladimir Davydov
Subject: Re: [tarantool-patches] [PATCH v5 3/3] box: adds replication sync after cfg. update
Message-ID: <20180830101136.tegvxuxd7vtqi72e@esperanza>
References: <20180829185642.49479-1-krishtal.olja@gmail.com>
 <20180829185642.49479-3-krishtal.olja@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180829185642.49479-3-krishtal.olja@gmail.com>
To: Olga Arkhangelskaia
Cc: tarantool-patches@freelists.org
List-ID:

On Wed, Aug 29, 2018 at 09:56:42PM +0300, Olga Arkhangelskaia wrote:
> When replica reconnects to replica set not for the first time, we
> suffer from absence of synchronization. Such behavior leads to giving
> away outdated data.
> 
> @TarantoolBot document
> Title: Orphan status after configuration update or initial bootstrap.
> In case of initial bootstrap or after configuration update we can get
> an orphan status in two cases. If we synced up with number of replicas
> that is smaller than quorum or if we failed to sync up during the time
> specified in replication_sync_lag_timeout.
> 
> Closes #3427

'Closes ####' must come before the TarantoolBot request.

> diff --git a/src/box/box.cc b/src/box/box.cc
> index 0f8364ebc..6bae9ea78 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -646,6 +646,10 @@ box_set_replication(void)
>          box_sync_replication(true);
>          /* Follow replica */
>          replicaset_follow();
> +        /* Sync replica up to quorum */
> +        if (!replicaset_sync()) {
> +                say_crit("entering orphan mode");
> +        }

I don't see where you enter the orphan state.

>  }
> 
>  void
> @@ -1967,7 +1971,8 @@ box_cfg_xc(void)
>          is_box_configured = true;
> 
>          if (!is_bootstrap_leader)
> -                replicaset_sync();
> +                if (!replicaset_sync())
> +                        say_crit("entering orphan mode");

Apparently, replicaset_sync doesn't need to return true/false anymore,
as box_set_replication and box_cfg_xc do exactly the same thing on
synchronization failure.

> diff --git a/test/replication/sync.test.lua b/test/replication/sync.test.lua
> new file mode 100644
> index 000000000..e63edd0d3
> --- /dev/null
> +++ b/test/replication/sync.test.lua
> @@ -0,0 +1,57 @@
> +--
> +-- gh-3427: no sync after configuration update
> +--
> +
> +--
> +-- successful sync
> +--
> +
> +env = require('test_run')
> +test_run = env.new()
> +engine = test_run:get_cfg('engine')
> +
> +box.schema.user.grant('guest', 'replication')
> +
> +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
> +test_run:cmd("start server replica")
> +
> +s = box.schema.space.create('test', {engine = engine})
> +index = s:create_index('primary')
> +
> +-- change replica configuration
> +test_run:cmd("switch replica")
> +box.cfg{replication_sync_lag = 0.1}

The test passes without this line, i.e. you don't check that
replication_sync_lag actually works. Please add a test case for this.

> +replication = box.cfg.replication
> +box.cfg{replication={}}
> +
> +test_run:cmd("switch default")
> +-- insert values on the master while replica is unconfigured
> +a = 100 box.begin() while a > 0 do a = a-1 box.space.test:insert{a,a} end box.commit()

for i = 1, 100 do ... end looks much more readable.
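
I.e. something like this (just a sketch of what I mean, keeping your
surrounding transaction; the exact keys don't matter here since the
test only checks the row count):

  box.begin() for i = 1, 100 do box.space.test:insert{i, i} end box.commit()

and, say, for i = 101, 200 in the similar statement further down.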

> +
> +test_run:cmd("switch replica")
> +box.cfg{replication = replication}
> +
> +box.space.test:count() == 100
> +
> +
> +--
> +-- unsuccessful sync
> +--
> +box.cfg{replication={}}
> +box.cfg{replication_sync_lag_timeout = 0.001}
> +
> +test_run:cmd("switch default")
> +-- insert values on the master while replica is unconfigured
> +a = 200 box.begin() while a > 100 do a = a-1 box.space.test:insert{a,a} end box.commit()
> +
> +test_run:cmd("switch replica")
> +box.cfg{replication = replication}
> +box.space.test:count() < 200

You should check that Tarantool enters the orphan state in this case
and leaves it once it is synchronized (see the sketch at the end of
this mail).

> +
> +test_run:cmd("switch default")
> +
> +-- cleanup
> +test_run:cmd("stop server replica")
> +test_run:cmd("cleanup server replica")
> +box.space.test:drop()
> +box.schema.user.revoke('guest', 'replication')
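
As for the orphan-state check mentioned above, I mean something along
these lines (a rough sketch, untested; AFAIK box.info.status reports
'orphan' while the instance is not synced and 'running' once it is):

  box.cfg{replication = replication}
  -- right after box.cfg{} returns the replica is expected to still be
  -- an orphan, since replication_sync_lag_timeout is tiny
  box.info.status
  -- wait for the background sync to complete and check that the
  -- orphan state is left
  fiber = require('fiber')
  while box.info.status == 'orphan' do fiber.sleep(0.001) end
  box.info.status

In a real test you'd probably want to bound the wait somehow, but this
shows the idea.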