[Tarantool-patches] [PATCH 2/2] election: during bootstrap prefer candidates

Serge Petrenko sergepetrenko at tarantool.org
Fri Jul 16 14:30:36 MSK 2021



16.07.2021 02:49, Vladislav Shpilevoy wrote:
> During cluster bootstrap the boot master election algorithm didn't
> take into account election modes of the instances. It could be
> that all nodes have box.cfg.read_only = false, none is booted,
> all are read-only now. Then the node with the smallest UUID was
> chosen even if it was box.cfg.election_mode='voter' node.
>
> It could neither boot nor register other nodes and the cluster
> couldn't start.
>
> The patch makes the boot master election prefer the instances
> which can become a Raft leader when all the other parameters
> didn't help to choose.
>
> Closes #6018
> ---

Hi! Thanks for the patch!
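
To restate the selection order from the commit message, here is a minimal
sketch in C. It is not the code from src/box/replication.cc: the struct, its
fields and both function names are hypothetical, and the relative priority of
"booted" over "not read-only" is an assumption. The only parts taken from the
commit message are that the ability to become a Raft leader is consulted after
the other parameters, and that the smallest UUID is the final tie-breaker.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of what a node reports during bootstrap. */
struct boot_candidate {
	const char *uuid;	/* Textual UUID, compared with strcmp() here. */
	bool is_booted;		/* The node already has a bootstrapped data set. */
	bool is_ro_cfg;		/* box.cfg.read_only is true on the node. */
	bool can_be_leader;	/* Election mode allows becoming a Raft leader. */
};

/*
 * Higher score wins. The weights make each criterion strictly more
 * important than all the ones after it (assumed order).
 */
static int
boot_candidate_score(const struct boot_candidate *c)
{
	int score = 0;
	if (c->is_booted)
		score += 4;
	if (!c->is_ro_cfg)
		score += 2;
	if (c->can_be_leader)
		score += 1;
	return score;
}

/* Pick the best node; on equal scores the smallest UUID wins. */
static const struct boot_candidate *
choose_boot_master(const struct boot_candidate *nodes, int count)
{
	const struct boot_candidate *best = NULL;
	int best_score = -1;
	for (int i = 0; i < count; i++) {
		int score = boot_candidate_score(&nodes[i]);
		if (score > best_score ||
		    (score == best_score &&
		     strcmp(nodes[i].uuid, best->uuid) < 0)) {
			best = &nodes[i];
			best_score = score;
		}
	}
	return best;
}
```

With three unbooted, writable nodes where the one with the smallest UUID is a
voter, this sketch skips the voter and picks the smallest UUID among the
remaining nodes, which is the behaviour the commit message asks for.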

>   .../unreleased/gh-6018-election-boot-voter.md |   4 +
>   src/box/box.cc                                |  25 +++-
>   src/box/replication.cc                        |  11 +-
>   .../gh-6018-election-boot-voter.result        | 116 ++++++++++++++++++
>   .../gh-6018-election-boot-voter.test.lua      |  59 +++++++++
>   test/replication/gh-6018-master.lua           |  17 +++
>   test/replication/gh-6018-replica.lua          |  15 +++
>   test/replication/suite.cfg                    |   1 +
>   8 files changed, 245 insertions(+), 3 deletions(-)
>   create mode 100644 changelogs/unreleased/gh-6018-election-boot-voter.md
>   create mode 100644 test/replication/gh-6018-election-boot-voter.result
>   create mode 100644 test/replication/gh-6018-election-boot-voter.test.lua
>   create mode 100644 test/replication/gh-6018-master.lua
>   create mode 100644 test/replication/gh-6018-replica.lua
>
> diff --git a/changelogs/unreleased/gh-6018-election-boot-voter.md b/changelogs/unreleased/gh-6018-election-boot-voter.md
> new file mode 100644
> index 000000000..080484bbe
> --- /dev/null
> +++ b/changelogs/unreleased/gh-6018-election-boot-voter.md
> @@ -0,0 +1,4 @@
> +## bugfix/replication
> +
> +* Fixed a cluster sometimes being unable to bootstrap if it contains nodes with
> +  `election_mode` `manual` or `voter` (gh-6018).
> diff --git a/src/box/box.cc b/src/box/box.cc
> index ef3efe3e0..3105b04b6 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -3519,7 +3519,30 @@ box_cfg_xc(void)
>   		 * should take the control over the situation and start a new
>   		 * term immediately.
>   		 */
> -		raft_new_term(box_raft());
> +		struct raft *raft = box_raft();
> +		if (box_election_mode == ELECTION_MODE_MANUAL) {
> +			raft_start_candidate(raft);
> +			raft_new_term(raft);
> +			int rc = box_raft_wait_leader_found();
> +			/*
> +			 * No need to check if the mode is still manual - it
> +			 * couldn't change because box.cfg is protected with a
> +			 * fiber lock.
> +			 */
> +			assert(box_election_mode == ELECTION_MODE_MANUAL);
> +			raft_stop_candidate(raft, false);
> +			/*
> +			 * It should not fail, because on bootstrap the node is
> +			 * a single registered instance. It can't not win the
> +			 * elections while being a lone participant. But still
> +			 * check the result so as not to ignore potential
> +			 * problems.
> +			 */
> +			if (rc != 0)
> +				diag_raise();
> +		} else {
> +			raft_new_term(raft);
> +		}

Could you please extract this fix into a separate commit?

Speaking of your problems with raft_try_candidate: I also can't think
of a good enough alternative.

For promote it would be nice to do:

do {
     raft_try_candidate_for_1_term();
} while (leader is not known);

and simply
raft_try_candidate_for_1_term();
for bootstrap.

But raft_try_candidate_for_1_term() looks hard to implement.
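
To make the idea above concrete, here is a toy model in C. It is only a
sketch: struct toy_raft and all the toy_* functions are made up for the
illustration and have nothing to do with the real struct raft API, and the
terms_until_win field is just a simulation knob standing in for real election
outcomes.

```c
#include <stdbool.h>

/* Toy stand-in for the Raft state; not Tarantool's struct raft. */
struct toy_raft {
	int term;		/* Current term. */
	bool leader_known;	/* Whether a leader is known in this term. */
	int terms_until_win;	/* Simulation knob: campaigns left until a win. */
};

/*
 * Hypothetical raft_try_candidate_for_1_term(): bump the term, campaign
 * for exactly that one term, and stop being a candidate afterwards
 * regardless of the outcome.
 */
static void
toy_try_candidate_for_1_term(struct toy_raft *raft)
{
	raft->term++;
	if (raft->terms_until_win > 0)
		raft->terms_until_win--;
	if (raft->terms_until_win == 0)
		raft->leader_known = true;
}

/* Promote: retry one-term campaigns until some leader is known. */
static int
toy_promote(struct toy_raft *raft)
{
	int attempts = 0;
	do {
		toy_try_candidate_for_1_term(raft);
		attempts++;
	} while (!raft->leader_known);
	return attempts;
}

/* Bootstrap: a lone registered node is expected to win in one attempt. */
static void
toy_bootstrap(struct toy_raft *raft)
{
	toy_try_candidate_for_1_term(raft);
}
```

The toy model hides exactly the hard part: a real one-term campaign has to
wait for the election result and then reliably drop the candidate role.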

-- 
Serge Petrenko


