From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Serge Petrenko <sergepetrenko@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] raft: add a test with synchronous replication
Date: Mon, 5 Oct 2020 23:40:02 +0200
Message-ID: <2a1cf4cc-6ab3-f5b0-774b-8a6e04be095a@tarantool.org>
In-Reply-To: <e8035fec-ed79-9e9a-9e27-041fd49ae7b5@tarantool.org>

Hi! Thanks for the fixes!

>>> + | ---
>>> + | ...
>>> diff --git a/test/replication/election_replica.lua b/test/replication/election_replica.lua
>>> index 36ea1f077..887d8a2a0 100644
>>> --- a/test/replication/election_replica.lua
>>> +++ b/test/replication/election_replica.lua
>>> @@ -19,8 +20,11 @@ box.cfg({
>>>      replication_timeout = 0.1,
>>>      election_is_enabled = true,
>>>      election_is_candidate = true,
>>> -    election_timeout = 0.1,
>>> -    replication_synchro_quorum = 3,
>>> +    -- Should be at least as big as replication_disconnect_timeout, which is
>>> +    -- 4 * replication_timeout.
>>> +    election_timeout = 0.4,
>> 2. Why? Election timeout has nothing to do with disconnect. It is about
>> split vote. This also will slow down raft_basic.test.lua, which is not
>> supposed to be long. For heartbeat timeouts Raft already uses
>> replication_disconnect_timeout = replication_timeout * 4.
>
> I've seen cases when a leader is elected, but doesn't send out the is_leader flag
> in time, so new elections start over and over again. This only happened when the
> tests were run in parallel, so the problem was probably in high load.

It should not be a problem. 100ms is enough to eventually elect a leader when the
instances run on the same machine. Several election attempts should not lead to
a test failure, because even 0.4 may lead to that. It is not a guaranteed protection.

> So, my logic was that if we wait for 4 times replication timeout for the leader to
> come back why not wait for 4 * replication timeout for the leader to establish
> its leadership.
>
> I mean, if it's considered a normal situation when a leader disappears for not more
> than 4 * replication_timeout, and this doesn't trigger an election, why should
> elections end before at least 4 * replication_timeout seconds pass?

Because it is safe to retry it, and it is normal due to split vote possibility.

> By the way, the raft paper doesn't have a separate leader disconnect timeout. The
> same election timeout is used for this purpose. So that's another argument for
> setting election_timeout to at least 4 * replication_timeout.

But I see your point. I started a discussion with other participants. It is likely
we will remove the election_timeout option and use the replication death timeout
instead.

Also we will probably drop election_is_enabled and election_is_candidate, and
replace them with a new option election_mode, which is a string: either 'off',
or 'candidate', or 'voter'. Another alternative - 'off' / 'on' / 'voter'. Or
'voter' -> 'only_vote'. Idk yet. Anyway it looks better than 2 flags, I think.

The patch LGTM. However it seems you didn't push the update to the branch.
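
For illustration only, here is a minimal sketch of how the two existing flags might map onto the proposed election_mode string. The helper name election_mode_from_flags and the exact mapping are assumptions made for this example; the thread leaves the final option values undecided, and nothing here is an existing Tarantool API.

```lua
-- Hypothetical helper, not part of Tarantool: maps the two flags discussed
-- in the thread (election_is_enabled, election_is_candidate) onto the single
-- string option proposed above. The mapping itself is an assumption.
local function election_mode_from_flags(is_enabled, is_candidate)
    if not is_enabled then
        -- With elections disabled the candidate flag has no effect.
        return 'off'
    end
    if is_candidate then
        -- The node votes and may also start elections itself.
        return 'candidate'
    end
    -- The node only votes for others and never nominates itself.
    return 'voter'
end

print(election_mode_from_flags(true, true))   -- candidate
print(election_mode_from_flags(true, false))  -- voter
print(election_mode_from_flags(false, true))  -- off
```

One string option cannot express the combination "disabled but candidate", which has no distinct meaning with two flags; that may be part of why a single option looks cleaner.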