From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id B1F092A3BD for ; Tue, 25 Sep 2018 04:26:19 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Tfp7a1tJtQjB for ; Tue, 25 Sep 2018 04:26:19 -0400 (EDT) Received: from smtp55.i.mail.ru (smtp55.i.mail.ru [217.69.128.35]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 1172E2A344 for ; Tue, 25 Sep 2018 04:26:18 -0400 (EDT) From: Olga Arkhangelskaia Subject: [tarantool-patches] [PATCH v3] replication: fix assertion with duplicate connection Date: Tue, 25 Sep 2018 11:26:00 +0300 Message-Id: <20180925082600.21363-1-arkholga@tarantool.org> Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: tarantool-patches@freelists.org Cc: Olga Arkhangelskaia Patch fixes behavior when replica tries to connect to the same master more than once. In case when it is initial configuration we raise the exception. If it in not initial config we print the error and disconnect the applier. Closes #3610 --- https://github.com/tarantool/tarantool/issues/3610 https://github.com/tarantool/tarantool/tree/OKriw/gh-3610-assert_fail_when_connect_master_twice-1.10 v1: https://www.freelists.org/post/tarantool-patches/PATCH-box-fix-assertion-with-duplication-in-repl-source v2: https://www.freelists.org/post/tarantool-patches/PATCH-v2-replication-fix-assertion-with-duplicated-connect-to-same-master Changes in v2: - changed completely, now we let duplicated params to proceed - stop the applier at the moment when replica has the same hash that other one - changed test Changes in v3: - changed the wsy and order of how we clean up applier replica on duplication - test is now in misc.test.lua - need to think over second test case src/box/applier.cc | 2 +- src/box/box.cc | 4 +++- src/box/replication.cc | 4 ++++ test/replication/misc.result | 43 ++++++++++++++++++++++++++++++++++++++++++ test/replication/misc.test.lua | 17 +++++++++++++++++ 5 files changed, 68 insertions(+), 2 deletions(-) diff --git a/src/box/applier.cc b/src/box/applier.cc index 604119e9d..1f99f8090 100644 --- a/src/box/applier.cc +++ b/src/box/applier.cc @@ -630,7 +630,7 @@ applier_f(va_list ap) return -1; } } catch (FiberIsCancelled *e) { - applier_disconnect(applier, APPLIER_OFF); + applier_disconnect(applier, APPLIER_DISCONNECTED); break; } catch (SocketError *e) { applier_log_error(applier, e); diff --git a/src/box/box.cc b/src/box/box.cc index f25146d01..d3aeb5de0 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -667,8 +667,10 @@ box_sync_replication(bool connect_quorum) diag_raise(); auto guard = make_scoped_guard([=]{ - for (int i = 0; i < count; i++) + for (int i = 0; i < count; i++) { + applier_stop(appliers[i]); applier_delete(appliers[i]); /* doesn't affect diag */ + } }); replicaset_connect(appliers, count, connect_quorum); diff --git a/src/box/replication.cc b/src/box/replication.cc index 5755ad45e..0f205212e 100644 --- a/src/box/replication.cc +++ b/src/box/replication.cc @@ -427,6 +427,7 @@ replicaset_update(struct applier **appliers, int count) auto uniq_guard = make_scoped_guard([&]{ replica_hash_foreach_safe(&uniq, replica, next) { replica_hash_remove(&uniq, replica); + replica_clear_applier(replica); replica_delete(replica); } }); @@ -454,6 +455,9 @@ replicaset_update(struct applier **appliers, int count) replica->uuid = applier->uuid; if (replica_hash_search(&uniq, replica) != NULL) { + applier = replica->applier; + replica_clear_applier(replica); + replica_delete(replica); tnt_raise(ClientError, ER_CFG, "replication", "duplicate connection to the same replica"); } diff --git a/test/replication/misc.result b/test/replication/misc.result index 0ac48ba34..9e6ca75d8 100644 --- a/test/replication/misc.result +++ b/test/replication/misc.result @@ -391,6 +391,49 @@ test_run:cmd("delete server replica_auth") --- - true ... +-- +-- Test case for gh-3610. Before the fix replica would fail with the assertion +-- when trying to connect to the same master twice. +-- +box.schema.user.grant('guest', 'replication') +--- +... +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") +--- +- true +... +test_run:cmd("start server replica") +--- +- true +... +test_run:cmd("switch replica") +--- +- true +... +replication = box.cfg.replication +--- +... +box.cfg{replication = {replication, replication}} +--- +- error: 'Incorrect value for option ''replication'': duplicate connection to the + same replica' +... +test_run:cmd("switch default") +--- +- true +... +test_run:cmd("stop server replica") +--- +- true +... +test_run:cmd('cleanup server replica') +--- +- true +... +test_run:cmd("delete server replica") +--- +- true +... box.schema.user.drop('cluster') --- ... diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua index 56e1bab69..f32464d19 100644 --- a/test/replication/misc.test.lua +++ b/test/replication/misc.test.lua @@ -162,4 +162,21 @@ test_run:cmd("stop server replica_auth") test_run:cmd("cleanup server replica_auth") test_run:cmd("delete server replica_auth") + +-- +-- Test case for gh-3610. Before the fix replica would fail with the assertion +-- when trying to connect to the same master twice. +-- +box.schema.user.grant('guest', 'replication') +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") +test_run:cmd("start server replica") +test_run:cmd("switch replica") +replication = box.cfg.replication +box.cfg{replication = {replication, replication}} + +test_run:cmd("switch default") +test_run:cmd("stop server replica") +test_run:cmd('cleanup server replica') +test_run:cmd("delete server replica") + box.schema.user.drop('cluster') -- 2.14.3 (Apple Git-98)