From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 23 Aug 2018 18:06:27 +0300 From: Vladimir Davydov Subject: Re: [tarantool-patches] [PATCH v2] replication: adds replication sync after cfg. update Message-ID: <20180823150627.r4peajtud752kvgj@esperanza> References: <20180822155930.34980-1-krishtal.olja@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180822155930.34980-1-krishtal.olja@gmail.com> To: Olga Arkhangelskaia Cc: tarantool-patches@freelists.org List-ID: On Wed, Aug 22, 2018 at 06:59:30PM +0300, Olga Arkhangelskaia wrote: > When replica reconnects to replica set not for the first time, we > suffer from absence of synchronization. Such behavior leads to giving > away outdated data. > > Closes #3427 > --- > https://github.com/tarantool/tarantool/issues/3427 > https://github.com/tarantool/tarantool/tree/OKriw/replication_no_sync-1.9 > > v1: > https://www.freelists.org/post/tarantool-patches/PATCH-replication-adds-replication-sync-after-cfg-update > > Changes in v2: > - fixed test > - changed replicaset_sync > > src/box/box.cc | 7 ++- > src/box/replication.cc | 34 ++++++++------ > src/box/replication.h | 7 ++- > test/replication/orphan.result | 92 +++++++++++++++++++++++++++++++++++++ > test/replication/orphan.test.lua | 41 +++++++++++++++++ > test/replication/replica_orphan.lua | 12 +++++ The test isn't about the 'orphan' state. It's about replica set synchronization. Please call it appropriately. Also, the test takes prohibitively long - 15 seconds for memtx+vinyl. Please try to reduce the test run time. Also, the test passes with and without the changes done to the code, in other words it's useless. Please make sure, it doesn't pass without your patch. > 6 files changed, 177 insertions(+), 16 deletions(-) > create mode 100644 test/replication/orphan.result > create mode 100644 test/replication/orphan.test.lua > create mode 100644 test/replication/replica_orphan.lua > > diff --git a/src/box/box.cc b/src/box/box.cc > index 8d7454d1f..8c67c79e8 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -634,6 +634,9 @@ box_set_replication(void) > box_sync_replication(true); > /* Follow replica */ > replicaset_follow(); > + /* Sync replica up to quorum */ > + replicaset_sync(); > + say_verbose("synchronization complete"); This message is pointless IMO. I think that replication reconfiguration should throw an error if it failed to synchronize with the quorum. It would be consistent with the fact that it throws an error in case it failed to connect to the quorum. > } > > void > @@ -1941,8 +1944,10 @@ box_cfg_xc(void) > fiber_gc(); > is_box_configured = true; > > - if (!is_bootstrap_leader) > + if (!is_bootstrap_leader) { > replicaset_sync(); > + replicaset_is_synced(); > + } Instead of introducing another function, you could add a return value to replicaset_sync() - true in case it successfully synchronized, false otherwise - then use the return value to either log or throw an error, depending whether its initial configuration or reconfiguration. IMO it would look neater.