[tarantool-patches] [PATCH v3] box: adds replication sync after cfg. update

Vladimir Davydov vdavydov.dev at gmail.com
Mon Aug 27 15:08:08 MSK 2018


On Sun, Aug 26, 2018 at 11:25:37AM +0300, Olga Arkhangelskaia wrote:
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 8d7454d1f..617067ab5 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -634,6 +634,11 @@ box_set_replication(void)
>  	box_sync_replication(true);
>  	/* Follow replica */
>  	replicaset_follow();
> +	/* Sync replica up to quorum */
> +	if (!replicaset_sync()) {
> +		tnt_raise(ClientError, ER_REPLICA_SYNC, cfg_gets("instance_uuid"),
> +			  cfg_gets("replicaset_uuid"));
> +	}

I guess that if synchronization fails, we should rollback to the
previous configuration, like we do in case we fail to connect.

>  }
>  
>  void
> diff --git a/src/box/errcode.h b/src/box/errcode.h
> index 7e3ea1ed1..4059930c0 100644
> --- a/src/box/errcode.h
> +++ b/src/box/errcode.h
> @@ -207,6 +207,7 @@ struct errcode_record {
>  	/*152 */_(ER_NULLABLE_PRIMARY,		"Primary index of the space '%s' can not contain nullable parts") \
>  	/*153 */_(ER_NULLABLE_MISMATCH,		"Field %d is %s in space format, but %s in index parts") \
>  	/*154 */_(ER_TRANSACTION_YIELD,		"Transaction has been aborted by a fiber yield") \
> +	/*155 */_(ER_REPLICA_SYNC,		"Failed to synchronize replica %s with replicaset %s") \

I don't think we need a separate error code for this - ER_CFG will do.

>  
>  /*
>   * !IMPORTANT! Please follow instructions at start of the file
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index 861ce34ea..9ccb13fa2 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -661,13 +661,13 @@ replicaset_follow(void)
>  	}
>  }
>  
> -void
> +bool
>  replicaset_sync(void)
>  {
>  	int quorum = replicaset_quorum();
>  
>  	if (quorum == 0)
> -		return;
> +		return true;
>  
>  	say_verbose("synchronizing with %d replicas", quorum);
>  
> @@ -687,11 +687,12 @@ replicaset_sync(void)
>  		 * in 'orphan' state.
>  		 */
>  		say_crit("entering orphan mode");
> -		return;
> +		return false;

We don't enter orphan state on replication reconfiguration, instead we
throw an error. So you should move this say_crit to box_cfg_xc.

>  	}
>  
>  	say_crit("replica set sync complete, quorum of %d "
>  		 "replicas formed", quorum);
> +	return true;
>  }
>  
>  void
> diff --git a/src/box/replication.h b/src/box/replication.h
> index 06a2867b6..d4e6f7e3e 100644
> --- a/src/box/replication.h
> +++ b/src/box/replication.h
> @@ -373,10 +373,10 @@ replicaset_follow(void);
>  
>  /**
>   * Wait until a replication quorum is formed.
> - * Return immediately if a quorum cannot be
> - * formed because of errors.
> + * @return true in case of success.
> + * @return false if a quorum cannot be formed because of errors.
>   */
> -void
> +bool
>  replicaset_sync(void);
>  
>  /**

> diff --git a/test/replication/config_change_sync.test.lua b/test/replication/config_change_sync.test.lua

I think that you'd better call the test simply replication/sync.test.lua.

> new file mode 100644
> index 000000000..b1a71dbaa
> --- /dev/null
> +++ b/test/replication/config_change_sync.test.lua
> @@ -0,0 +1,43 @@
> +--
> +-- gh-3427: no sync after configuration update
> +--
> +
> +env = require('test_run')
> +test_run = env.new()
> +engine = test_run:get_cfg('engine')
> +
> +box.schema.user.grant('guest', 'read,write,execute', 'universe')

You don't need this - replication role should be enough.

> +
> +box.schema.user.grant('guest', 'replication')
> +test_run:cmd("create server replica with rpl_master=default, script='replication/replica_orphan.lua'")
> +test_run:cmd("start server replica")
> +
> +repl = test_run:eval('replica', 'return box.cfg.listen')[1]
> +box.cfg{replication = repl}

Why? You just want to check that the replica doesn't leave box.cfg()
until it synchronizes with the master (default) so why do you need to
connect the master to the replica?

> +
> +s = box.schema.space.create('test', {engine = engine})
> +index = s:create_index('primary')
> +
> +-- change replica configuration
> +test_run:cmd("switch replica")
> +box.cfg{replication={}}
> +
> +test_run:cmd("switch default")
> +-- insert values on the master while replica is unconfigured
> +a = 50000 box.begin() while a > 0 do a = a-1 box.space.test:insert{a,a} end box.commit()

That's going to take too long. Please use error injection instead, see
ERRINJ_RELAY_TIMEOUT.

> +
> +test_run:cmd("switch replica")
> +box.cfg{replication = os.getenv("MASTER")}

Nit: please remember the old configuration before turning off
replication and then re-enable it instead:

replication = box.cfg.replication
...
box.cfg{replication = replication}

> +require'fiber'.sleep(0.1)

This shouldn't be here, should it?

> +
> +
> +box.info.replication[1].upstream.lag > 0.1

Nit: I expect that an expression like this returns true if everything is
right, not false so please change > to <=.

> +
> +test_run:cmd("switch default")
> +
> +-- cleanup
> +test_run:cmd("stop server replica")
> +test_run:cmd("cleanup server replica")
> +box.space.test:drop()
> +box.schema.user.revoke('guest', 'replication')
> +box.schema.user.revoke('guest', 'read,write,execute', 'universe')
> diff --git a/test/replication/replica_orphan.lua b/test/replication/replica_orphan.lua
> new file mode 100644
> index 000000000..97740d69a
> --- /dev/null
> +++ b/test/replication/replica_orphan.lua
> @@ -0,0 +1,12 @@
> +#!/usr/bin/env tarantool

Since box.cfg.replication_sync_lag is now taken into account during
replication reconfiguration, it totally makes sense to make this option
dynamic. Then you wouldn't need this script and could use replica.lua
instead. Please do it in a separate patch. Don't forget to write a
documentation request in the commit message.

> +
> +local TIMEOUT = tonumber(arg[1])
> +
> +box.cfg({
> +    listen              = os.getenv("LISTEN"),
> +    replication         = os.getenv("MASTER"),
> +    replication_connect_timeout = 0.5,
> +    replication_sync_lag = 0.01,
> +})
> +
> +require('console').listen(os.getenv('ADMIN'))



More information about the Tarantool-patches mailing list