From: Vladimir Davydov <vdavydov.dev@gmail.com> To: Olga Arkhangelskaia <arkholga@tarantool.org> Cc: tarantool-patches@freelists.org Subject: Re: [tarantool-patches] [PATCH v3] replication: fix assertion with duplicate connection Date: Tue, 25 Sep 2018 13:26:14 +0300 [thread overview] Message-ID: <20180925102614.inmqslxq5zipm6ao@esperanza> (raw) In-Reply-To: <20180925085352.31408-1-arkholga@tarantool.org> On Tue, Sep 25, 2018 at 11:53:52AM +0300, Olga Arkhangelskaia wrote: > diff --git a/src/box/applier.cc b/src/box/applier.cc > index 604119e9d..1f99f8090 100644 > --- a/src/box/applier.cc > +++ b/src/box/applier.cc > @@ -630,7 +630,7 @@ applier_f(va_list ap) > return -1; > } > } catch (FiberIsCancelled *e) { > - applier_disconnect(applier, APPLIER_OFF); > + applier_disconnect(applier, APPLIER_DISCONNECTED); APPLIER_DISCONNECTED is a state that denotes temporary errors. It means that the applier will reconnect soon. In case when an applier is permanently stopped, we should use APPLIER_OFF. > break; > } catch (SocketError *e) { > applier_log_error(applier, e); > diff --git a/src/box/box.cc b/src/box/box.cc > index f25146d01..d3aeb5de0 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -667,8 +667,10 @@ box_sync_replication(bool connect_quorum) > diag_raise(); > > auto guard = make_scoped_guard([=]{ > - for (int i = 0; i < count; i++) > + for (int i = 0; i < count; i++) { > + applier_stop(appliers[i]); > applier_delete(appliers[i]); /* doesn't affect diag */ > + } As I mentioned in the previous review, this makes replicaset_connect protocol obscure: turns out it may or may not leave appliers running. I think that replicaset_connect should revert all appliers to the state they were in before this function was called, i.e. stopped. > }); > > replicaset_connect(appliers, count, connect_quorum); > diff --git a/src/box/replication.cc b/src/box/replication.cc > index 5755ad45e..0f205212e 100644 > --- a/src/box/replication.cc > +++ b/src/box/replication.cc > @@ -427,6 +427,7 @@ replicaset_update(struct applier **appliers, int count) > auto uniq_guard = make_scoped_guard([&]{ > replica_hash_foreach_safe(&uniq, replica, next) { > replica_hash_remove(&uniq, replica); > + replica_clear_applier(replica); > replica_delete(replica); > } > }); > @@ -454,6 +455,9 @@ replicaset_update(struct applier **appliers, int count) > replica->uuid = applier->uuid; > > if (replica_hash_search(&uniq, replica) != NULL) { > + applier = replica->applier; This line is pointless. > + replica_clear_applier(replica); > + replica_delete(replica); > tnt_raise(ClientError, ER_CFG, "replication", > "duplicate connection to the same replica"); > } > diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua > index 56e1bab69..95677e12b 100644 > --- a/test/replication/misc.test.lua > +++ b/test/replication/misc.test.lua > @@ -162,4 +162,22 @@ test_run:cmd("stop server replica_auth") > test_run:cmd("cleanup server replica_auth") > test_run:cmd("delete server replica_auth") > > + > +-- > +-- Test case for gh-3610. Before the fix replica would fail with the assertion > +-- when trying to connect to the same master twice. > +-- > +box.schema.user.grant('guest', 'replication') > +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") > +test_run:cmd("start server replica") > +test_run:cmd("switch replica") > +replication = box.cfg.replication > +box.cfg{replication = {replication, replication}} > + > +test_run:cmd("switch default") > +box.schema.user.revoke('guest', 'replication') > +test_run:cmd("stop server replica") > +test_run:cmd('cleanup server replica') > +test_run:cmd("delete server replica") > + > box.schema.user.drop('cluster') Your test case should go before this line, because user 'cluster' was created by the previous test case. Also, there's still no test for the case when applier detects duplicate connection asynchronously. It should be pretty easy to implement: 1. On the master (default instance). Disable replication by clearing box.cfg.listen. 2. On the replica. Set box.cfg.replication_connect_quorum to 0 and replication_connect_timeout to 0.001. Then set box.cfg.replication so that the replica tries to connect to the master twice. Wait for box.cfg() to return. 3. On the master. Enable replication by setting box.cfg.listen back to its original value. 4. On the replica. Check that replication connection is established and that 'duplicate connection' error is printed to the log.
next prev parent reply other threads:[~2018-09-25 10:26 UTC|newest] Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-09-25 8:53 Olga Arkhangelskaia 2018-09-25 10:26 ` Vladimir Davydov [this message] -- strict thread matches above, loose matches on Subject: below -- 2018-09-25 8:26 Olga Arkhangelskaia
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20180925102614.inmqslxq5zipm6ao@esperanza \ --to=vdavydov.dev@gmail.com \ --cc=arkholga@tarantool.org \ --cc=tarantool-patches@freelists.org \ --subject='Re: [tarantool-patches] [PATCH v3] replication: fix assertion with duplicate connection' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox