[patches] [PATCH] replication: fix rebootstrap race that results in broken subscription

Georgy Kirichenko georgy at tarantool.org
Fri Feb 16 20:53:55 MSK 2018


Seems ok to push

On Thursday, February 15, 2018 3:30:07 PM MSK Vladimir Davydov wrote:
> While a node of the cluster is re-bootstrapping (joining again),
> other nodes may try to re-subscribe to it. They will fail, because
> the rebootstrapped node hasn't tried to subscribe yet, hence hasn't
> been added to the _cluster table, and so replica_on_applier_reconnect()
> on the subscriber's side cannot find it in the replica hash.
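> 
> For reference, here is the check that fails, condensed from the
> pre-patch replica_on_applier_reconnect() (these are the lines the
> diff below removes):
> 
>     struct replica *orig = replica_by_uuid(&applier->uuid);
>     if (orig == NULL || orig->applier != NULL) {
>         /* The rebootstrapped node is not in the hash yet. */
>         tnt_raise(ClientError, ER_INSTANCE_UUID_MISMATCH,
>                   tt_uuid_str(&replica->uuid),
>                   tt_uuid_str(&applier->uuid));
>     }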
> 
> Fix this by making a subscriber create an id-less (REPLICA_ID_NIL)
> struct replica in this case and reattach the applier to it. It will
> be assigned an id when it finally subscribes and is registered in
> _cluster.
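> 
> Condensed, the new reconnect path in replication.cc (the full hunk is
> in the diff below) becomes:
> 
>     struct replica *orig = replica_by_uuid(&applier->uuid);
>     if (orig == NULL) {
>         /* Create an id-less placeholder; the id is assigned
>          * later, when the node subscribes. */
>         orig = replica_new();
>         orig->uuid = applier->uuid;
>         replica_hash_insert(&replicaset.hash, orig);
>     }
>     if (orig->applier != NULL) {
>         tnt_raise(ClientError, ER_CFG, "replication",
>                   "duplicate connection to the same replica");
>     }
>     replica_set_applier(orig, applier);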
> 
> Fixes 71b3340537e9 ("replication: reconnect applier on master rebootstrap")
> ---
> https://github.com/tarantool/tarantool/tree/gh-3151-replication-cfg-improvements
> 
>  src/box/box.cc                   |  9 ++---
>  src/box/replication.cc           | 13 +++++---
>  test/replication/quorum.result   | 71 +++++++++++++++++++++++++++++++++++++---
>  test/replication/quorum.test.lua | 35 +++++++++++++++++---
>  4 files changed, 111 insertions(+), 17 deletions(-)
> 
> diff --git a/src/box/box.cc b/src/box/box.cc
> index fa1eb051..33d237bc 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1152,7 +1152,7 @@ box_register_replica(uint32_t id, const struct tt_uuid *uuid)
>  	if (boxk(IPROTO_INSERT, BOX_CLUSTER_ID, "[%u%s]",
>  		 (unsigned) id, tt_uuid_str(uuid)) != 0)
>  		diag_raise();
> -	assert(replica_by_uuid(uuid) != NULL);
> +	assert(replica_by_uuid(uuid)->id == id);
>  }
> 
>  /**
> @@ -1166,7 +1166,7 @@ static void
>  box_on_join(const tt_uuid *instance_uuid)
>  {
>  	struct replica *replica = replica_by_uuid(instance_uuid);
> -	if (replica != NULL)
> +	if (replica != NULL && replica->id != REPLICA_ID_NIL)
>  		return; /* nothing to do - already registered */
> 
>  	box_check_writable_xc();
> @@ -1270,7 +1270,8 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
>  	 * is complete. Fail early if the caller does not have
>  	 * appropriate access privileges.
>  	 */
> -	if (replica_by_uuid(&instance_uuid) == NULL) {
> +	struct replica *replica = replica_by_uuid(&instance_uuid);
> +	if (replica == NULL || replica->id == REPLICA_ID_NIL) {
>  		box_check_writable_xc();
>  		struct space *space = space_cache_find_xc(BOX_CLUSTER_ID);
>  		access_check_space_xc(space, PRIV_W);
> @@ -1323,7 +1324,7 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
>  	 */
>  	box_on_join(&instance_uuid);
> 
> -	struct replica *replica = replica_by_uuid(&instance_uuid);
> +	replica = replica_by_uuid(&instance_uuid);
>  	assert(replica != NULL);
>  	replica->gc = gc;
>  	gc_guard.is_active = false;
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index fc8f900b..b1c84d36 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -284,10 +284,15 @@ replica_on_applier_reconnect(struct replica *replica)
>  		 * the new UUID and reassign the applier to it.
>  		 */
>  		struct replica *orig = replica_by_uuid(&applier->uuid);
> -		if (orig == NULL || orig->applier != NULL) {
> -			tnt_raise(ClientError, ER_INSTANCE_UUID_MISMATCH,
> -				  tt_uuid_str(&replica->uuid),
> -				  tt_uuid_str(&applier->uuid));
> +		if (orig == NULL) {
> +			orig = replica_new();
> +			orig->uuid = applier->uuid;
> +			replica_hash_insert(&replicaset.hash, orig);
> +		}
> +
> +		if (orig->applier != NULL) {
> +			tnt_raise(ClientError, ER_CFG, "replication",
> +				  "duplicate connection to the same replica");
>  		}
> 
>  		replica_set_applier(orig, applier);
> diff --git a/test/replication/quorum.result b/test/replication/quorum.result
> index 5a294103..34408d42 100644
> --- a/test/replication/quorum.result
> +++ b/test/replication/quorum.result
> @@ -133,7 +133,7 @@ test_run:cmd('switch quorum3')
>  ---
>  - true
>  ...
> -box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.01)
> +box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.001)
>  ---
>  - ok
>  ...
> @@ -141,7 +141,7 @@ test_run:cmd('switch quorum2')
>  ---
>  - true
>  ...
> -box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.01)
> +box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.001)
>  ---
>  - ok
>  ...
> @@ -149,7 +149,7 @@ test_run:cmd('stop server quorum1')
>  ---
>  - true
>  ...
> -for i = 1, 10 do box.space.test:insert{i} end
> +for i = 1, 100 do box.space.test:insert{i} end
>  ---
>  ...
>  fiber = require('fiber')
> @@ -166,9 +166,70 @@ test_run:cmd('switch quorum1')
>  ---
>  - true
>  ...
> -box.space.test:count() -- 10
> +box.space.test:count() -- 100
>  ---
> -- 10
> +- 100
> +...
> +-- Rebootstrap one node of the cluster and check that others follow.
> +-- Note, due to ERRINJ_RELAY_TIMEOUT there is a substantial delay
> +-- between the moment the node starts listening and the moment it
> +-- completes bootstrap and subscribes. Other nodes will try and
> +-- fail to subscribe to the restarted node during this period.
> +-- This is OK - they have to retry until the bootstrap is complete.
> +test_run:cmd('switch quorum3')
> +---
> +- true
> +...
> +box.snapshot()
> +---
> +- ok
> +...
> +test_run:cmd('switch quorum2')
> +---
> +- true
> +...
> +box.snapshot()
> +---
> +- ok
> +...
> +test_run:cmd('switch quorum1')
> +---
> +- true
> +...
> +test_run:cmd('restart server quorum1 with cleanup=1')
> +box.space.test:count() -- 100
> +---
> +- 100
> +...
> +-- The rebootstrapped replica will be assigned id = 4,
> +-- because ids 1..3 are busy.
> +test_run:cmd('switch quorum2')
> +---
> +- true
> +...
> +fiber = require('fiber')
> +---
> +...
> +while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
> +---
> +...
> +box.info.replication[4].upstream.status
> +---
> +- follow
> +...
> +test_run:cmd('switch quorum3')
> +---
> +- true
> +...
> +fiber = require('fiber')
> +---
> +...
> +while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
> +---
> +...
> +box.info.replication[4].upstream.status
> +---
> +- follow
>  ...
>  -- Cleanup.
>  test_run:cmd('switch default')
> diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
> index 192f1369..85600684 100644
> --- a/test/replication/quorum.test.lua
> +++ b/test/replication/quorum.test.lua
> @@ -54,18 +54,45 @@ box.info.id == 3 or box.info.replication[3].upstream.status == 'follow'
>  -- Check that box.cfg() doesn't return until the instance
>  -- catches up with all configured replicas.
>  test_run:cmd('switch quorum3')
> -box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.01)
> +box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.001)
>  test_run:cmd('switch quorum2')
> -box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.01)
> +box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.001)
>  test_run:cmd('stop server quorum1')
> 
> -for i = 1, 10 do box.space.test:insert{i} end
> +for i = 1, 100 do box.space.test:insert{i} end
>  fiber = require('fiber')
>  fiber.sleep(0.1)
> 
>  test_run:cmd('start server quorum1')
>  test_run:cmd('switch quorum1')
> -box.space.test:count() -- 10
> +box.space.test:count() -- 100
> +
> +-- Rebootstrap one node of the cluster and check that others follow.
> +-- Note, due to ERRINJ_RELAY_TIMEOUT there is a substantial delay
> +-- between the moment the node starts listening and the moment it
> +-- completes bootstrap and subscribes. Other nodes will try and
> +-- fail to subscribe to the restarted node during this period.
> +-- This is OK - they have to retry until the bootstrap is complete.
> +test_run:cmd('switch quorum3')
> +box.snapshot()
> +test_run:cmd('switch quorum2')
> +box.snapshot()
> +
> +test_run:cmd('switch quorum1')
> +test_run:cmd('restart server quorum1 with cleanup=1')
> +
> +box.space.test:count() -- 100
> +
> +-- The rebootstrapped replica will be assigned id = 4,
> +-- because ids 1..3 are busy.
> +test_run:cmd('switch quorum2')
> +fiber = require('fiber')
> +while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
> +box.info.replication[4].upstream.status
> +test_run:cmd('switch quorum3')
> +fiber = require('fiber')
> +while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
> +box.info.replication[4].upstream.status
> 
>  -- Cleanup.
>  test_run:cmd('switch default')
