Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: Olga Arkhangelskaia <krishtal.olja@gmail.com>
Cc: tarantool-patches@freelists.org
Subject: Re: [tarantool-patches] [PATCH v3] box: adds replication sync after cfg. update
Date: Mon, 27 Aug 2018 15:08:08 +0300	[thread overview]
Message-ID: <20180827120808.ss3oo3weilhbescm@esperanza> (raw)
In-Reply-To: <20180826082537.38552-1-krishtal.olja@gmail.com>

On Sun, Aug 26, 2018 at 11:25:37AM +0300, Olga Arkhangelskaia wrote:
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 8d7454d1f..617067ab5 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -634,6 +634,11 @@ box_set_replication(void)
>  	box_sync_replication(true);
>  	/* Follow replica */
>  	replicaset_follow();
> +	/* Sync replica up to quorum */
> +	if (!replicaset_sync()) {
> +		tnt_raise(ClientError, ER_REPLICA_SYNC, cfg_gets("instance_uuid"),
> +			  cfg_gets("replicaset_uuid"));
> +	}

I guess that if synchronization fails, we should rollback to the
previous configuration, like we do in case we fail to connect.

>  }
>  
>  void
> diff --git a/src/box/errcode.h b/src/box/errcode.h
> index 7e3ea1ed1..4059930c0 100644
> --- a/src/box/errcode.h
> +++ b/src/box/errcode.h
> @@ -207,6 +207,7 @@ struct errcode_record {
>  	/*152 */_(ER_NULLABLE_PRIMARY,		"Primary index of the space '%s' can not contain nullable parts") \
>  	/*153 */_(ER_NULLABLE_MISMATCH,		"Field %d is %s in space format, but %s in index parts") \
>  	/*154 */_(ER_TRANSACTION_YIELD,		"Transaction has been aborted by a fiber yield") \
> +	/*155 */_(ER_REPLICA_SYNC,		"Failed to synchronize replica %s with replicaset %s") \

I don't think we need a separate error code for this - ER_CFG will do.

>  
>  /*
>   * !IMPORTANT! Please follow instructions at start of the file
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index 861ce34ea..9ccb13fa2 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -661,13 +661,13 @@ replicaset_follow(void)
>  	}
>  }
>  
> -void
> +bool
>  replicaset_sync(void)
>  {
>  	int quorum = replicaset_quorum();
>  
>  	if (quorum == 0)
> -		return;
> +		return true;
>  
>  	say_verbose("synchronizing with %d replicas", quorum);
>  
> @@ -687,11 +687,12 @@ replicaset_sync(void)
>  		 * in 'orphan' state.
>  		 */
>  		say_crit("entering orphan mode");
> -		return;
> +		return false;

We don't enter orphan state on replication reconfiguration, instead we
throw an error. So you should move this say_crit to box_cfg_xc.

>  	}
>  
>  	say_crit("replica set sync complete, quorum of %d "
>  		 "replicas formed", quorum);
> +	return true;
>  }
>  
>  void
> diff --git a/src/box/replication.h b/src/box/replication.h
> index 06a2867b6..d4e6f7e3e 100644
> --- a/src/box/replication.h
> +++ b/src/box/replication.h
> @@ -373,10 +373,10 @@ replicaset_follow(void);
>  
>  /**
>   * Wait until a replication quorum is formed.
> - * Return immediately if a quorum cannot be
> - * formed because of errors.
> + * @return true in case of success.
> + * @return false if a quorum cannot be formed because of errors.
>   */
> -void
> +bool
>  replicaset_sync(void);
>  
>  /**

> diff --git a/test/replication/config_change_sync.test.lua b/test/replication/config_change_sync.test.lua

I think that you'd better call the test simply replication/sync.test.lua.

> new file mode 100644
> index 000000000..b1a71dbaa
> --- /dev/null
> +++ b/test/replication/config_change_sync.test.lua
> @@ -0,0 +1,43 @@
> +--
> +-- gh-3427: no sync after configuration update
> +--
> +
> +env = require('test_run')
> +test_run = env.new()
> +engine = test_run:get_cfg('engine')
> +
> +box.schema.user.grant('guest', 'read,write,execute', 'universe')

You don't need this - replication role should be enough.

> +
> +box.schema.user.grant('guest', 'replication')
> +test_run:cmd("create server replica with rpl_master=default, script='replication/replica_orphan.lua'")
> +test_run:cmd("start server replica")
> +
> +repl = test_run:eval('replica', 'return box.cfg.listen')[1]
> +box.cfg{replication = repl}

Why? You just want to check that the replica doesn't leave box.cfg()
until it synchronizes with the master (default) so why do you need to
connect the master to the replica?

> +
> +s = box.schema.space.create('test', {engine = engine})
> +index = s:create_index('primary')
> +
> +-- change replica configuration
> +test_run:cmd("switch replica")
> +box.cfg{replication={}}
> +
> +test_run:cmd("switch default")
> +-- insert values on the master while replica is unconfigured
> +a = 50000 box.begin() while a > 0 do a = a-1 box.space.test:insert{a,a} end box.commit()

That's going to take too long. Please use error injection instead, see
ERRINJ_RELAY_TIMEOUT.

> +
> +test_run:cmd("switch replica")
> +box.cfg{replication = os.getenv("MASTER")}

Nit: please remember the old configuration before turning off
replication and then re-enable it instead:

replication = box.cfg.replication
...
box.cfg{replication = replication}

> +require'fiber'.sleep(0.1)

This shouldn't be here, should it?

> +
> +
> +box.info.replication[1].upstream.lag > 0.1

Nit: I expect that an expression like this returns true if everything is
right, not false so please change > to <=.

> +
> +test_run:cmd("switch default")
> +
> +-- cleanup
> +test_run:cmd("stop server replica")
> +test_run:cmd("cleanup server replica")
> +box.space.test:drop()
> +box.schema.user.revoke('guest', 'replication')
> +box.schema.user.revoke('guest', 'read,write,execute', 'universe')
> diff --git a/test/replication/replica_orphan.lua b/test/replication/replica_orphan.lua
> new file mode 100644
> index 000000000..97740d69a
> --- /dev/null
> +++ b/test/replication/replica_orphan.lua
> @@ -0,0 +1,12 @@
> +#!/usr/bin/env tarantool

Since box.cfg.replication_sync_lag is now taken into account during
replication reconfiguration, it totally makes sense to make this option
dynamic. Then you wouldn't need this script and could use replica.lua
instead. Please do it in a separate patch. Don't forget to write a
documentation request in the commit message.

> +
> +local TIMEOUT = tonumber(arg[1])
> +
> +box.cfg({
> +    listen              = os.getenv("LISTEN"),
> +    replication         = os.getenv("MASTER"),
> +    replication_connect_timeout = 0.5,
> +    replication_sync_lag = 0.01,
> +})
> +
> +require('console').listen(os.getenv('ADMIN'))

  reply	other threads:[~2018-08-27 12:08 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-26  8:25 Olga Arkhangelskaia
2018-08-27 12:08 ` Vladimir Davydov [this message]
2018-08-27 12:13 ` Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180827120808.ss3oo3weilhbescm@esperanza \
    --to=vdavydov.dev@gmail.com \
    --cc=krishtal.olja@gmail.com \
    --cc=tarantool-patches@freelists.org \
    --subject='Re: [tarantool-patches] [PATCH v3] box: adds replication sync after cfg. update' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox