Tarantool development patches archive
 help / color / mirror / Atom feed
From: Serge Petrenko <sergepetrenko@tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>,
	tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 2/2] test: speed up election_qsync
Date: Tue, 10 Nov 2020 10:44:31 +0300	[thread overview]
Message-ID: <3612ded2-c2d4-aac8-6b2b-877e2084bb23@tarantool.org> (raw)
In-Reply-To: <70029618-854d-ddec-95f7-b7de0adcc057@tarantool.org>


10.11.2020 01:36, Vladislav Shpilevoy пишет:
> Hi! Thanks for the review!
>
>>> @@ -62,6 +65,7 @@ fiber = require('fiber')
>>>    -- Replication timeout is small to speed up a first election start.
>> I checked, and the test speeds up a lot with this patch. But I don't understand
>> why. We have only two instances, only one of them is candidate.
>> Thanks to the small replication_timeout, the election starts shortly after the
>> old leader dies. election_timeout isn't involved here, AFAICS.
>> Am I missing something?
>> Even when the test is restarted, there shouldn't be any 're-elections'.
> That is a very good question! Honestly, I didn't think of it. Strangely,
> when I did the patch, I just "believed" this is what I should do.
>
> I did some digging now, and there is a simple explanation. And a bug. So
> this is really good you looked at this patch skeptically.
>
> Here is what is happening in the test, in short:
>
> 	master:  create_replica()
> 	master:  set_mode('voter')
>
> 	replica: set_mode('candidate')
> 	replica: wait_leader()
>
> When the instance is clear, master's term is 1 and this term does not have
> votes yet. When replica is started, it votes for self, and gets elected
> quite fast.
>
> This is happening fast even without this patch, when you run this test
> separately.
>
> But when the test is running after some other Raft tests, on the leader the
> term is not 1. It can be 3, 4, or more. When replica is attached, the
> leader does not send its state to the replica, because election is disabled.
>
> So the replica starts from term 1, tries to elect itself and is ignored
> because its term is too old. Waits for election timeout and tries again. This
> is repeated as many times as is the term value.
>
> This is why the tests running in the end were always slower.
>
> There is a bug. The first node should have sent its state when election mode
> was set to the voter on it. But it didn't.
>
> I reworked the patch so now the timeout, on the contrary, is set to 1000000
> seconds. And the tests hangs infinitely unless we send Raft state when election
> is enabled.

Thanks for all the digging and the explanation!

The new patch LGTM.

>
> The new patch:
>
> ====================
>      raft: send state when state machine is started
>      
>      Raft didn't broadcast its state when the state machine was
>      started. It could lead to the state being never sent until some
>      other node would generate a term number bigger that the local one.
>      
>      That happened when a node participated in some elections,
>      accumulated a big term number, then the election was turned off,
>      and a new replica was connected in a 'candidate' state. Then the
>      first node was configured to be a 'voter'.
>      
>      The first node didn't send anything to the replica, because at
>      the moment of its connection the election was off.
>      
>      So the replica started from term 1, tried to start elections in
>      this term, but was ignored by the first node. It waited for
>      election timeout, bumped the term to 2, and the process was
>      repeated until the replica reached the first node's term + 1. It
>      could take very long time.
>      
>      The patch fixes it so now Raft broadcasts its state when it is
>      enabled. To cover the replicas connected while it was disabled.
>      
>      Closes #5499
>
> diff --git a/src/box/raft.c b/src/box/raft.c
> index 914b0d68f..28ca74cb5 100644
> --- a/src/box/raft.c
> +++ b/src/box/raft.c
> @@ -877,6 +877,14 @@ raft_sm_start(void)
>   		raft_sm_wait_leader_found();
>   	}
>   	box_update_ro_summary();
> +	/*
> +	 * Nothing changed. But when raft was stopped, its state wasn't sent to
> +	 * replicas. At least this was happening at the moment of this being
> +	 * written. On the other hand, this instance may have a term bigger than
> +	 * any other term in the cluster. And if it wouldn't share the term, it
> +	 * would ignore all the messages, including vote requests.
> +	 */
> +	raft_schedule_broadcast();
>   }
>   
>   static void
> diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
> index 086b17686..cb349efcc 100644
> --- a/test/replication/election_qsync.result
> +++ b/test/replication/election_qsync.result
> @@ -9,6 +9,9 @@ box.schema.user.grant('guest', 'super')
>   old_election_mode = box.cfg.election_mode
>    | ---
>    | ...
> +old_election_timeout = box.cfg.election_timeout
> + | ---
> + | ...
>   old_replication_synchro_timeout = box.cfg.replication_synchro_timeout
>    | ---
>    | ...
> @@ -60,8 +63,11 @@ fiber = require('fiber')
>    | ---
>    | ...
>   -- Replication timeout is small to speed up a first election start.
> +-- Election timeout is set to a huge value to ensure the election does not hang
> +-- anywhere. Indeed, there can't be a split-vote when candidate is only one.
>   box.cfg{                                                                        \
>       election_mode = 'candidate',                                                \
> +    election_timeout = 1000000,                                                 \
>       replication_synchro_quorum = 3,                                             \
>       replication_synchro_timeout = 1000000,                                      \
>       replication_timeout = 0.1,                                                  \
> @@ -114,8 +120,11 @@ box.cfg{replication_synchro_timeout = 1000000}
>   -- Configure separately from synchro timeout not to depend on the order of
>   -- synchro and election options appliance. Replication timeout is tiny to speed
>   -- up notice of the old leader death.
> +-- Election timeout is set to a huge value to ensure the election does not hang
> +-- anywhere. Indeed, there can't be a split-vote when candidate is only one.
>   box.cfg{                                                                        \
>       election_mode = 'candidate',                                                \
> +    election_timeout = 1000000,                                                 \
>       replication_timeout = 0.01,                                                 \
>   }
>    | ---
> @@ -143,6 +152,7 @@ test_run:cmd('delete server replica')
>    | ...
>   box.cfg{                                                                        \
>       election_mode = old_election_mode,                                          \
> +    election_timeout = old_election_timeout,                                    \
>       replication_timeout = old_replication_timeout,                              \
>       replication = old_replication,                                              \
>       replication_synchro_timeout = old_replication_synchro_timeout,              \
> diff --git a/test/replication/election_qsync.test.lua b/test/replication/election_qsync.test.lua
> index 6a80f4859..eb89e5b79 100644
> --- a/test/replication/election_qsync.test.lua
> +++ b/test/replication/election_qsync.test.lua
> @@ -2,6 +2,7 @@ test_run = require('test_run').new()
>   box.schema.user.grant('guest', 'super')
>   
>   old_election_mode = box.cfg.election_mode
> +old_election_timeout = box.cfg.election_timeout
>   old_replication_synchro_timeout = box.cfg.replication_synchro_timeout
>   old_replication_timeout = box.cfg.replication_timeout
>   old_replication = box.cfg.replication
> @@ -28,8 +29,11 @@ box.cfg{election_mode = 'voter'}
>   test_run:switch('replica')
>   fiber = require('fiber')
>   -- Replication timeout is small to speed up a first election start.
> +-- Election timeout is set to a huge value to ensure the election does not hang
> +-- anywhere. Indeed, there can't be a split-vote when candidate is only one.
>   box.cfg{                                                                        \
>       election_mode = 'candidate',                                                \
> +    election_timeout = 1000000,                                                 \
>       replication_synchro_quorum = 3,                                             \
>       replication_synchro_timeout = 1000000,                                      \
>       replication_timeout = 0.1,                                                  \
> @@ -57,8 +61,11 @@ box.cfg{replication_synchro_timeout = 1000000}
>   -- Configure separately from synchro timeout not to depend on the order of
>   -- synchro and election options appliance. Replication timeout is tiny to speed
>   -- up notice of the old leader death.
> +-- Election timeout is set to a huge value to ensure the election does not hang
> +-- anywhere. Indeed, there can't be a split-vote when candidate is only one.
>   box.cfg{                                                                        \
>       election_mode = 'candidate',                                                \
> +    election_timeout = 1000000,                                                 \
>       replication_timeout = 0.01,                                                 \
>   }
>   
> @@ -70,6 +77,7 @@ box.space.test:drop()
>   test_run:cmd('delete server replica')
>   box.cfg{                                                                        \
>       election_mode = old_election_mode,                                          \
> +    election_timeout = old_election_timeout,                                    \
>       replication_timeout = old_replication_timeout,                              \
>       replication = old_replication,                                              \
>       replication_synchro_timeout = old_replication_synchro_timeout,              \

-- 
Serge Petrenko

  reply	other threads:[~2020-11-10  7:44 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-06 23:46 [Tarantool-patches] [PATCH 0/2] Raft slow tests Vladislav Shpilevoy
2020-11-06 23:46 ` [Tarantool-patches] [PATCH 1/2] test: fix a typo in election_basic Vladislav Shpilevoy
2020-11-09  9:01   ` Serge Petrenko
2020-11-06 23:46 ` [Tarantool-patches] [PATCH 2/2] test: speed up election_qsync Vladislav Shpilevoy
2020-11-09  9:20   ` Serge Petrenko
2020-11-09 22:36     ` Vladislav Shpilevoy
2020-11-10  7:44       ` Serge Petrenko [this message]
2020-11-10 21:11 ` [Tarantool-patches] [PATCH 0/2] Raft slow tests Alexander V. Tikhonov
2020-11-10 22:05 ` Vladislav Shpilevoy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3612ded2-c2d4-aac8-6b2b-877e2084bb23@tarantool.org \
    --to=sergepetrenko@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 2/2] test: speed up election_qsync' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox