From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v2 4/5] raft: introduce split vote detection
Date: Thu, 20 Jan 2022 16:22:43 +0300
Message-ID: <33c8217d-c839-1b1e-7595-db44c640eeac@tarantool.org>
In-Reply-To: <72798befd5d6e32f4386aaeeb3209cc93c0e44b4.1642639079.git.v.shpilevoy@tarantool.org>

20.01.2022 03:43, Vladislav Shpilevoy wrote:
> Split vote is a situation when during election nobody can win and
> will not win in this term for sure. Not a single node could get
> enough votes. For example, with 4 nodes one could get 2 votes and
> other also 2 votes. Nobody will get quorum 3 in this term.
>
> The patch makes raft able to notice that situation and speed up
> the term bump. It is not bumped immediately though, because nodes
> might do that simultaneously and will get the split vote again
> after voting for self. There is a random delay. But it is just max
> 10% of election timeout, so it should speed up split vote
> resolution on 90% at least.
>
> Part of #5285
> ---
>  src/lib/raft/raft.c         | 129 +++++++++++++++-
>  src/lib/raft/raft.h         |   6 +
>  test/unit/raft.c            | 301 +++++++++++++++++++++++++++++++++++-
>  test/unit/raft.result       |  64 +++++++-
>  test/unit/raft_test_utils.c |  12 ++
>  test/unit/raft_test_utils.h |   5 +
>  6 files changed, 512 insertions(+), 5 deletions(-)
>

Thanks for the fixes!

Please find 2 comments below.

> +static void
> +raft_check_split_vote(struct raft *raft)
> +{
> +	/* When leader is known, there is no election. Thus no vote to split. */
> +	if (raft->leader != 0)
> +		return;
> +	/* Not a candidate = can't trigger term bump anyway. */
> +	if (!raft->is_candidate)
> +		return;
> +	/*
> +	 * WAL write in progress means the state is changing. All is rechecked
> +	 * when it is done.
> +	 */
> +	if (raft->is_write_in_progress)
> +		return;
> +	if (!raft_has_split_vote(raft))
> +		return;
> +	assert(raft_ev_is_active(&raft->timer));
> +	/*
> +	 * Could be already detected before. The timeout would be updated by now
> +	 * then.
> +	 */
> +	if (raft->timer.repeat < raft->election_timeout)
> +		return;

I don't think you should decrease timer.repeat. This 'vote speedup'
is for a single term only. Besides, the check below (delay >= remaining)
is enough to tell whether split vote detection has already been triggered.

> +
> +	assert(raft->state == RAFT_STATE_FOLLOWER ||
> +	       raft->state == RAFT_STATE_CANDIDATE);
> +	struct ev_loop *loop = raft_loop();
> +	struct ev_timer *timer = &raft->timer;
> +	double delay = raft_new_random_election_shift(raft);
> +	/*
> +	 * Could be too late to speed up anything - probably the term is almost
> +	 * over anyway.
> +	 */
> +	double remaining = raft_ev_timer_remaining(loop, timer);
> +	if (delay >= remaining)
> +		delay = remaining;
> +	say_info("RAFT: split vote is discovered - %s, new term in %lf sec",
> +		 raft_scores_str(raft), delay);
> +	raft_ev_timer_stop(loop, timer);
> +	raft_ev_timer_set(timer, delay, delay);

...

> diff --git a/test/unit/raft_test_utils.h b/test/unit/raft_test_utils.h
> index c68dc3b22..2138a829e 100644
> --- a/test/unit/raft_test_utils.h
> +++ b/test/unit/raft_test_utils.h
> @@ -32,6 +32,7 @@
>  #include "fakesys/fakeev.h"
>  #include "fiber.h"
>  #include "raft/raft.h"
> +#include "raft/raft_ev.h"

Why do you need it here?

>  #include "unit.h"
>
>  /** WAL simulation. It stores a list of rows which raft wanted to persist. */
> @@ -105,6 +106,7 @@ struct raft_node {
>  	int cfg_election_quorum;
>  	double cfg_death_timeout;
>  	uint32_t cfg_instance_id;
> +	int cfg_cluster_size;
>  	struct vclock *cfg_vclock;
>  };
>
> @@ -227,6 +229,9 @@ raft_node_cfg_is_enabled(struct raft_node *node, bool value);
>  void
>  raft_node_cfg_is_candidate(struct raft_node *node, bool value);
>
> +void
> +raft_node_cfg_cluster_size(struct raft_node *node, int value);
> +
>  void
>  raft_node_cfg_election_timeout(struct raft_node *node, double value);
>

-- 
Serge Petrenko
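
For illustration only, here is a minimal sketch of the two ideas discussed in this message: the split vote condition from the quoted commit message (4 nodes, 2 votes each, quorum 3) and the clamping of the random delay to the time left in the term. It is not the code from the patch. The names has_split_vote() and clamp_delay(), and the flat array of per-candidate vote counts, are assumptions made up for this example. The body of raft_has_split_vote() is not shown in the quoted diff, so the real check may differ.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not the patch's raft_has_split_vote().
 * votes[i] is the number of votes candidate i has collected in the
 * current term, cluster_size is the total number of voters, and
 * quorum is the number of votes needed to win.
 */
static bool
has_split_vote(const int *votes, int candidate_count, int cluster_size,
	       int quorum)
{
	int voted = 0;
	int max_votes = 0;
	for (int i = 0; i < candidate_count; i++) {
		voted += votes[i];
		if (votes[i] > max_votes)
			max_votes = votes[i];
	}
	assert(quorum > 0 && voted <= cluster_size);
	int unused = cluster_size - voted;
	/*
	 * The vote is split when even the front-runner cannot reach the
	 * quorum after receiving every vote that is still not cast.
	 */
	return max_votes + unused < quorum;
}

/*
 * Hypothetical helper mirroring the "delay >= remaining" check in the
 * quoted code: never wait longer than the term would have lasted anyway.
 */
static double
clamp_delay(double delay, double remaining)
{
	return delay >= remaining ? remaining : delay;
}
```

With votes {2, 2}, a cluster of 4 and quorum 3, has_split_vote() returns true: no votes are left and the best score is 2. With votes {2, 1} it returns false, because the one remaining vote can still bring the leading candidate to 3.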