From: Serge Petrenko <sergepetrenko@tarantool.org> To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH 2/2] test: speed up election_qsync Date: Tue, 10 Nov 2020 10:44:31 +0300 [thread overview] Message-ID: <3612ded2-c2d4-aac8-6b2b-877e2084bb23@tarantool.org> (raw) In-Reply-To: <70029618-854d-ddec-95f7-b7de0adcc057@tarantool.org> 10.11.2020 01:36, Vladislav Shpilevoy пишет: > Hi! Thanks for the review! > >>> @@ -62,6 +65,7 @@ fiber = require('fiber') >>> -- Replication timeout is small to speed up a first election start. >> I checked, and the test speeds up a lot with this patch. But I don't understand >> why. We have only two instances, only one of them is candidate. >> Thanks to the small replication_timeout, the election starts shortly after the >> old leader dies. election_timeout isn't involved here, AFAICS. >> Am I missing something? >> Even when the test is restarted, there shouldn't be any 're-elections'. > That is a very good question! Honestly, I didn't think of it. Strangely, > when I did the patch, I just "believed" this is what I should do. > > I did some digging now, and there is a simple explanation. And a bug. So > this is really good you looked at this patch skeptically. > > Here is what is happening in the test, in short: > > master: create_replica() > master: set_mode('voter') > > replica: set_mode('candidate') > replica: wait_leader() > > When the instance is clear, master's term is 1 and this term does not have > votes yet. When replica is started, it votes for self, and gets elected > quite fast. > > This is happening fast even without this patch, when you run this test > separately. > > But when the test is running after some other Raft tests, on the leader the > term is not 1. It can be 3, 4, or more. When replica is attached, the > leader does not send its state to the replica, because election is disabled. > > So the replica starts from term 1, tries to elect itself and is ignored > because its term is too old. Waits for election timeout and tries again. This > is repeated as many times as is the term value. > > This is why the tests running in the end were always slower. > > There is a bug. The first node should have sent its state when election mode > was set to the voter on it. But it didn't. > > I reworked the patch so now the timeout, on the contrary, is set to 1000000 > seconds. And the tests hangs infinitely unless we send Raft state when election > is enabled. Thanks for all the digging and the explanation! The new patch LGTM. > > The new patch: > > ==================== > raft: send state when state machine is started > > Raft didn't broadcast its state when the state machine was > started. It could lead to the state being never sent until some > other node would generate a term number bigger that the local one. > > That happened when a node participated in some elections, > accumulated a big term number, then the election was turned off, > and a new replica was connected in a 'candidate' state. Then the > first node was configured to be a 'voter'. > > The first node didn't send anything to the replica, because at > the moment of its connection the election was off. > > So the replica started from term 1, tried to start elections in > this term, but was ignored by the first node. It waited for > election timeout, bumped the term to 2, and the process was > repeated until the replica reached the first node's term + 1. It > could take very long time. > > The patch fixes it so now Raft broadcasts its state when it is > enabled. To cover the replicas connected while it was disabled. > > Closes #5499 > > diff --git a/src/box/raft.c b/src/box/raft.c > index 914b0d68f..28ca74cb5 100644 > --- a/src/box/raft.c > +++ b/src/box/raft.c > @@ -877,6 +877,14 @@ raft_sm_start(void) > raft_sm_wait_leader_found(); > } > box_update_ro_summary(); > + /* > + * Nothing changed. But when raft was stopped, its state wasn't sent to > + * replicas. At least this was happening at the moment of this being > + * written. On the other hand, this instance may have a term bigger than > + * any other term in the cluster. And if it wouldn't share the term, it > + * would ignore all the messages, including vote requests. > + */ > + raft_schedule_broadcast(); > } > > static void > diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result > index 086b17686..cb349efcc 100644 > --- a/test/replication/election_qsync.result > +++ b/test/replication/election_qsync.result > @@ -9,6 +9,9 @@ box.schema.user.grant('guest', 'super') > old_election_mode = box.cfg.election_mode > | --- > | ... > +old_election_timeout = box.cfg.election_timeout > + | --- > + | ... > old_replication_synchro_timeout = box.cfg.replication_synchro_timeout > | --- > | ... > @@ -60,8 +63,11 @@ fiber = require('fiber') > | --- > | ... > -- Replication timeout is small to speed up a first election start. > +-- Election timeout is set to a huge value to ensure the election does not hang > +-- anywhere. Indeed, there can't be a split-vote when candidate is only one. > box.cfg{ \ > election_mode = 'candidate', \ > + election_timeout = 1000000, \ > replication_synchro_quorum = 3, \ > replication_synchro_timeout = 1000000, \ > replication_timeout = 0.1, \ > @@ -114,8 +120,11 @@ box.cfg{replication_synchro_timeout = 1000000} > -- Configure separately from synchro timeout not to depend on the order of > -- synchro and election options appliance. Replication timeout is tiny to speed > -- up notice of the old leader death. > +-- Election timeout is set to a huge value to ensure the election does not hang > +-- anywhere. Indeed, there can't be a split-vote when candidate is only one. > box.cfg{ \ > election_mode = 'candidate', \ > + election_timeout = 1000000, \ > replication_timeout = 0.01, \ > } > | --- > @@ -143,6 +152,7 @@ test_run:cmd('delete server replica') > | ... > box.cfg{ \ > election_mode = old_election_mode, \ > + election_timeout = old_election_timeout, \ > replication_timeout = old_replication_timeout, \ > replication = old_replication, \ > replication_synchro_timeout = old_replication_synchro_timeout, \ > diff --git a/test/replication/election_qsync.test.lua b/test/replication/election_qsync.test.lua > index 6a80f4859..eb89e5b79 100644 > --- a/test/replication/election_qsync.test.lua > +++ b/test/replication/election_qsync.test.lua > @@ -2,6 +2,7 @@ test_run = require('test_run').new() > box.schema.user.grant('guest', 'super') > > old_election_mode = box.cfg.election_mode > +old_election_timeout = box.cfg.election_timeout > old_replication_synchro_timeout = box.cfg.replication_synchro_timeout > old_replication_timeout = box.cfg.replication_timeout > old_replication = box.cfg.replication > @@ -28,8 +29,11 @@ box.cfg{election_mode = 'voter'} > test_run:switch('replica') > fiber = require('fiber') > -- Replication timeout is small to speed up a first election start. > +-- Election timeout is set to a huge value to ensure the election does not hang > +-- anywhere. Indeed, there can't be a split-vote when candidate is only one. > box.cfg{ \ > election_mode = 'candidate', \ > + election_timeout = 1000000, \ > replication_synchro_quorum = 3, \ > replication_synchro_timeout = 1000000, \ > replication_timeout = 0.1, \ > @@ -57,8 +61,11 @@ box.cfg{replication_synchro_timeout = 1000000} > -- Configure separately from synchro timeout not to depend on the order of > -- synchro and election options appliance. Replication timeout is tiny to speed > -- up notice of the old leader death. > +-- Election timeout is set to a huge value to ensure the election does not hang > +-- anywhere. Indeed, there can't be a split-vote when candidate is only one. > box.cfg{ \ > election_mode = 'candidate', \ > + election_timeout = 1000000, \ > replication_timeout = 0.01, \ > } > > @@ -70,6 +77,7 @@ box.space.test:drop() > test_run:cmd('delete server replica') > box.cfg{ \ > election_mode = old_election_mode, \ > + election_timeout = old_election_timeout, \ > replication_timeout = old_replication_timeout, \ > replication = old_replication, \ > replication_synchro_timeout = old_replication_synchro_timeout, \ -- Serge Petrenko
next prev parent reply other threads:[~2020-11-10 7:44 UTC|newest] Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-11-06 23:46 [Tarantool-patches] [PATCH 0/2] Raft slow tests Vladislav Shpilevoy 2020-11-06 23:46 ` [Tarantool-patches] [PATCH 1/2] test: fix a typo in election_basic Vladislav Shpilevoy 2020-11-09 9:01 ` Serge Petrenko 2020-11-06 23:46 ` [Tarantool-patches] [PATCH 2/2] test: speed up election_qsync Vladislav Shpilevoy 2020-11-09 9:20 ` Serge Petrenko 2020-11-09 22:36 ` Vladislav Shpilevoy 2020-11-10 7:44 ` Serge Petrenko [this message] 2020-11-10 21:11 ` [Tarantool-patches] [PATCH 0/2] Raft slow tests Alexander V. Tikhonov 2020-11-10 22:05 ` Vladislav Shpilevoy
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=3612ded2-c2d4-aac8-6b2b-877e2084bb23@tarantool.org \ --to=sergepetrenko@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH 2/2] test: speed up election_qsync' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox