[tarantool-patches] [PATCH v2] replication: adds replication sync after cfg. update

Vladimir Davydov vdavydov.dev at gmail.com
Thu Aug 23 18:06:27 MSK 2018


On Wed, Aug 22, 2018 at 06:59:30PM +0300, Olga Arkhangelskaia wrote:
> When replica reconnects to replica set not for the first time, we
> suffer from absence of synchronization. Such behavior leads to giving
> away outdated data.
> 
> Closes #3427
> ---
> https://github.com/tarantool/tarantool/issues/3427
> https://github.com/tarantool/tarantool/tree/OKriw/replication_no_sync-1.9
> 
> v1:
> https://www.freelists.org/post/tarantool-patches/PATCH-replication-adds-replication-sync-after-cfg-update
> 
> Changes in v2:
> - fixed test
> - changed replicaset_sync
> 
>  src/box/box.cc                      |  7 ++-
>  src/box/replication.cc              | 34 ++++++++------
>  src/box/replication.h               |  7 ++-
>  test/replication/orphan.result      | 92 +++++++++++++++++++++++++++++++++++++
>  test/replication/orphan.test.lua    | 41 +++++++++++++++++
>  test/replication/replica_orphan.lua | 12 +++++

The test isn't about the 'orphan' state. It's about replica set
synchronization. Please call it appropriately.

Also, the test takes prohibitively long - 15 seconds for memtx+vinyl.
Please try to reduce the test run time.

Also, the test passes with and without the changes done to the code, in
other words it's useless. Please make sure, it doesn't pass without your
patch.


>  6 files changed, 177 insertions(+), 16 deletions(-)
>  create mode 100644 test/replication/orphan.result
>  create mode 100644 test/replication/orphan.test.lua
>  create mode 100644 test/replication/replica_orphan.lua
> 
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 8d7454d1f..8c67c79e8 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -634,6 +634,9 @@ box_set_replication(void)
>  	box_sync_replication(true);
>  	/* Follow replica */
>  	replicaset_follow();
> +	/* Sync replica up to quorum */
> +	replicaset_sync();
> +	say_verbose("synchronization complete");

This message is pointless IMO. I think that replication reconfiguration
should throw an error if it failed to synchronize with the quorum. It
would be consistent with the fact that it throws an error in case it
failed to connect to the quorum.

>  }
>  
>  void
> @@ -1941,8 +1944,10 @@ box_cfg_xc(void)
>  	fiber_gc();
>  	is_box_configured = true;
>  
> -	if (!is_bootstrap_leader)
> +	if (!is_bootstrap_leader) {
>  		replicaset_sync();
> +		replicaset_is_synced();
> +	}

Instead of introducing another function, you could add a return value to
replicaset_sync() - true in case it successfully synchronized, false
otherwise - then use the return value to either log or throw an error,
depending whether its initial configuration or reconfiguration. IMO it
would look neater.



More information about the Tarantool-patches mailing list