[Tarantool-patches] [PATCH v2 4/5] raft: introduce split vote detection
Serge Petrenko
sergepetrenko at tarantool.org
Thu Jan 20 16:22:43 MSK 2022
20.01.2022 03:43, Vladislav Shpilevoy wrote:
> A split vote is a situation in which nobody can win the election
> in the current term: not a single node can collect enough votes.
> For example, with 4 nodes one could get 2 votes and another also
> 2 votes. Nobody will reach the quorum of 3 in this term.
>
> The patch makes raft able to notice that situation and speed up
> the term bump. The term is not bumped immediately though, because
> nodes might do that simultaneously and get a split vote again
> after voting for themselves. There is a random delay, but it is at
> most 10% of the election timeout, so it should speed up split vote
> resolution by at least 90%.
>
> Part of #5285
> ---
> src/lib/raft/raft.c | 129 +++++++++++++++-
> src/lib/raft/raft.h | 6 +
> test/unit/raft.c | 301 +++++++++++++++++++++++++++++++++++-
> test/unit/raft.result | 64 +++++++-
> test/unit/raft_test_utils.c | 12 ++
> test/unit/raft_test_utils.h | 5 +
> 6 files changed, 512 insertions(+), 5 deletions(-)
>
Thanks for the fixes! Please find 2 comments below.
> +static void
> +raft_check_split_vote(struct raft *raft)
> +{
> + /* When leader is known, there is no election. Thus no vote to split. */
> + if (raft->leader != 0)
> + return;
> + /* Not a candidate = can't trigger term bump anyway. */
> + if (!raft->is_candidate)
> + return;
> + /*
> + * WAL write in progress means the state is changing. All is rechecked
> + * when it is done.
> + */
> + if (raft->is_write_in_progress)
> + return;
> + if (!raft_has_split_vote(raft))
> + return;
> + assert(raft_ev_is_active(&raft->timer));
> + /*
> + * Could be already detected before. The timeout would be updated by now
> + * then.
> + */
> + if (raft->timer.repeat < raft->election_timeout)
> + return;
I don't think you should decrease timer.repeat.
This 'vote speedup' is for a single term only.
Besides, the `delay >= remaining` check below is enough
to tell whether split vote detection has already been triggered.
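
To make the delay arithmetic concrete, here is a minimal sketch of how
the speed-up delay is chosen. The helper names are hypothetical and this
is not the patch's code: the 10% bound is taken from the commit message,
and the random value is passed in as `u` (assumed to be in [0, 1]) so
the example stays deterministic.

```c
/*
 * Hypothetical stand-in for raft_new_random_election_shift(): a shift
 * of at most 10% of the election timeout, where u is a random number
 * in [0, 1] supplied by the caller.
 */
static double
sketch_election_shift(double election_timeout, double u)
{
	return u * 0.1 * election_timeout;
}

/*
 * Pick the delay before the term bump. It is never longer than what is
 * left of the current election timer, so detecting a split vote can
 * only shorten the term, never extend it.
 */
static double
sketch_split_vote_delay(double random_shift, double remaining)
{
	return random_shift >= remaining ? remaining : random_shift;
}
```

For example, with a 5 second election timeout the shift is at most about
0.5 seconds. If only 0.1 seconds remain on the timer, the delay is
clamped to 0.1.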
> +
> + assert(raft->state == RAFT_STATE_FOLLOWER ||
> + raft->state == RAFT_STATE_CANDIDATE);
> + struct ev_loop *loop = raft_loop();
> + struct ev_timer *timer = &raft->timer;
> + double delay = raft_new_random_election_shift(raft);
> + /*
> + * Could be too late to speed up anything - probably the term is almost
> + * over anyway.
> + */
> + double remaining = raft_ev_timer_remaining(loop, timer);
> + if (delay >= remaining)
> + delay = remaining;
> + say_info("RAFT: split vote is discovered - %s, new term in %lf sec",
> + raft_scores_str(raft), delay);
> + raft_ev_timer_stop(loop, timer);
> + raft_ev_timer_set(timer, delay, delay);
...
> diff --git a/test/unit/raft_test_utils.h b/test/unit/raft_test_utils.h
> index c68dc3b22..2138a829e 100644
> --- a/test/unit/raft_test_utils.h
> +++ b/test/unit/raft_test_utils.h
> @@ -32,6 +32,7 @@
> #include "fakesys/fakeev.h"
> #include "fiber.h"
> #include "raft/raft.h"
> +#include "raft/raft_ev.h"
Why do you need it here?
> #include "unit.h"
>
> /** WAL simulation. It stores a list of rows which raft wanted to persist. */
> @@ -105,6 +106,7 @@ struct raft_node {
> int cfg_election_quorum;
> double cfg_death_timeout;
> uint32_t cfg_instance_id;
> + int cfg_cluster_size;
> struct vclock *cfg_vclock;
> };
>
> @@ -227,6 +229,9 @@ raft_node_cfg_is_enabled(struct raft_node *node, bool value);
> void
> raft_node_cfg_is_candidate(struct raft_node *node, bool value);
>
> +void
> +raft_node_cfg_cluster_size(struct raft_node *node, int value);
> +
> void
> raft_node_cfg_election_timeout(struct raft_node *node, double value);
>
--
Serge Petrenko