[tarantool-patches] Re: [PATCH v3] replication: fix assertion with duplicate connection

Olga Arkhangelskaia arkholga at tarantool.org
Tue Sep 25 11:53:08 MSK 2018


Pls do not look.


25/09/2018 11:26, Olga Arkhangelskaia пишет:
> Patch fixes behavior when replica tries to connect to the same master
> more than once. In case when it is initial configuration we raise the
> exception. If it in not initial config we print the error and disconnect
> the applier.
>
> Closes #3610
> ---
> https://github.com/tarantool/tarantool/issues/3610
> https://github.com/tarantool/tarantool/tree/OKriw/gh-3610-assert_fail_when_connect_master_twice-1.10
>
> v1:
> https://www.freelists.org/post/tarantool-patches/PATCH-box-fix-assertion-with-duplication-in-repl-source
>
> v2:
> https://www.freelists.org/post/tarantool-patches/PATCH-v2-replication-fix-assertion-with-duplicated-connect-to-same-master
>
> Changes in v2:
> - changed completely, now we let duplicated params to proceed
> - stop the applier at the moment when replica has the same hash that other one
> - changed test
>
> Changes in v3:
> - changed the wsy and order of how we clean up applier replica on duplication
> - test is now in misc.test.lua
> - need to think over second test case
>
>   src/box/applier.cc             |  2 +-
>   src/box/box.cc                 |  4 +++-
>   src/box/replication.cc         |  4 ++++
>   test/replication/misc.result   | 43 ++++++++++++++++++++++++++++++++++++++++++
>   test/replication/misc.test.lua | 17 +++++++++++++++++
>   5 files changed, 68 insertions(+), 2 deletions(-)
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 604119e9d..1f99f8090 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -630,7 +630,7 @@ applier_f(va_list ap)
>   				return -1;
>   			}
>   		} catch (FiberIsCancelled *e) {
> -			applier_disconnect(applier, APPLIER_OFF);
> +			applier_disconnect(applier, APPLIER_DISCONNECTED);
>   			break;
>   		} catch (SocketError *e) {
>   			applier_log_error(applier, e);
> diff --git a/src/box/box.cc b/src/box/box.cc
> index f25146d01..d3aeb5de0 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -667,8 +667,10 @@ box_sync_replication(bool connect_quorum)
>   		diag_raise();
>   
>   	auto guard = make_scoped_guard([=]{
> -		for (int i = 0; i < count; i++)
> +		for (int i = 0; i < count; i++) {
> +			applier_stop(appliers[i]);
>   			applier_delete(appliers[i]); /* doesn't affect diag */
> +		}
>   	});
>   
>   	replicaset_connect(appliers, count, connect_quorum);
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index 5755ad45e..0f205212e 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -427,6 +427,7 @@ replicaset_update(struct applier **appliers, int count)
>   	auto uniq_guard = make_scoped_guard([&]{
>   		replica_hash_foreach_safe(&uniq, replica, next) {
>   			replica_hash_remove(&uniq, replica);
> +			replica_clear_applier(replica);
>   			replica_delete(replica);
>   		}
>   	});
> @@ -454,6 +455,9 @@ replicaset_update(struct applier **appliers, int count)
>   		replica->uuid = applier->uuid;
>   
>   		if (replica_hash_search(&uniq, replica) != NULL) {
> +			applier = replica->applier;
> +			replica_clear_applier(replica);
> +			replica_delete(replica);
>   			tnt_raise(ClientError, ER_CFG, "replication",
>   				  "duplicate connection to the same replica");
>   		}
> diff --git a/test/replication/misc.result b/test/replication/misc.result
> index 0ac48ba34..9e6ca75d8 100644
> --- a/test/replication/misc.result
> +++ b/test/replication/misc.result
> @@ -391,6 +391,49 @@ test_run:cmd("delete server replica_auth")
>   ---
>   - true
>   ...
> +--
> +-- Test case for gh-3610. Before the fix replica would fail with the assertion
> +-- when trying to connect to the same master twice.
> +--
> +box.schema.user.grant('guest', 'replication')
> +---
> +...
> +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
> +---
> +- true
> +...
> +test_run:cmd("start server replica")
> +---
> +- true
> +...
> +test_run:cmd("switch replica")
> +---
> +- true
> +...
> +replication = box.cfg.replication
> +---
> +...
> +box.cfg{replication = {replication, replication}}
> +---
> +- error: 'Incorrect value for option ''replication'': duplicate connection to the
> +    same replica'
> +...
> +test_run:cmd("switch default")
> +---
> +- true
> +...
> +test_run:cmd("stop server replica")
> +---
> +- true
> +...
> +test_run:cmd('cleanup server replica')
> +---
> +- true
> +...
> +test_run:cmd("delete server replica")
> +---
> +- true
> +...
>   box.schema.user.drop('cluster')
>   ---
>   ...
> diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
> index 56e1bab69..f32464d19 100644
> --- a/test/replication/misc.test.lua
> +++ b/test/replication/misc.test.lua
> @@ -162,4 +162,21 @@ test_run:cmd("stop server replica_auth")
>   test_run:cmd("cleanup server replica_auth")
>   test_run:cmd("delete server replica_auth")
>   
> +
> +--
> +-- Test case for gh-3610. Before the fix replica would fail with the assertion
> +-- when trying to connect to the same master twice.
> +--
> +box.schema.user.grant('guest', 'replication')
> +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
> +test_run:cmd("start server replica")
> +test_run:cmd("switch replica")
> +replication = box.cfg.replication
> +box.cfg{replication = {replication, replication}}
> +
> +test_run:cmd("switch default")
> +test_run:cmd("stop server replica")
> +test_run:cmd('cleanup server replica')
> +test_run:cmd("delete server replica")
> +
>   box.schema.user.drop('cluster')





More information about the Tarantool-patches mailing list