Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Serge Petrenko <sergepetrenko@tarantool.org>, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v3 08/10] Support manual elections in `box.ctl.clear_synchro_queue()`
Date: Fri, 16 Apr 2021 01:30:54 +0200	[thread overview]
Message-ID: <e7faa53f-6aa7-dbda-4b3b-f1c9d0bebc25@tarantool.org> (raw)
In-Reply-To: <f22339adb70bd44c857e09898909b0a1db1d2906.1618409665.git.sergepetrenko@tarantool.org>

Thanks for working on this!

See 5 comments below.

> diff --git a/src/box/box.cc b/src/box/box.cc
> index 3729ed997..6c7c8968a 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1525,12 +1526,74 @@ box_clear_synchro_queue(bool try_wait)
>  	if (!is_box_configured ||
>  	    raft_source_term(box_raft(), instance_id) == box_raft()->term)
>  		return 0;
> +
> +	bool run_elections = false;
> +
> +	switch (box_election_mode) {
> +	case ELECTION_MODE_OFF:
> +		break;
> +	case ELECTION_MODE_VOTER:
> +		assert(box_raft()->state == RAFT_STATE_FOLLOWER);
> +		diag_set(ClientError, ER_UNSUPPORTED, "election_mode='voter'",
> +			 "manual elections");
> +		return -1;
> +	case ELECTION_MODE_MANUAL:
> +		assert(box_raft()->state != RAFT_STATE_CANDIDATE);
> +		if (box_raft()->state == RAFT_STATE_LEADER) {
> +			diag_set(ClientError, ER_ALREADY_LEADER);
> +			return -1;
> +		}
> +		run_elections = true;
> +		try_wait = false;
> +		break;
> +	case ELECTION_MODE_CANDIDATE:
> +		/*
> +		 * Leader elections are enabled, and this instance is allowed to
> +		 * promote only if it's already an elected leader. No manual
> +		 * elections.
> +		 */
> +		if (box_raft()->state != RAFT_STATE_LEADER) {

1. That is strange. Why do you allow to promote the node
if it is already the leader when mode is candidate, but do
not allow the same when the mode is manual?

Shouldn't we throw an error when the mode is candidate
regardless of the node role?

> +			diag_set(ClientError, ER_UNSUPPORTED, "election_mode="
> +				 "'candidate'", "manual elections");
> +			return -1;
> +		}
> +		break;
> +	default:
> +		unreachable();
> +	}
> +
>  	uint32_t former_leader_id = txn_limbo.owner_id;
>  	int64_t wait_lsn = txn_limbo.confirmed_lsn;
>  	int rc = 0;
>  	int quorum = replication_synchro_quorum;
>  	in_clear_synchro_queue = true;
>  
> +	if (run_elections) {
> +		/*
> +		 * Make this instance a candidate and run until some leader, not
> +		 * necessarily this instance, emerges.
> +		 */
> +		raft_cfg_is_candidate(box_raft(), true, false);
> +		/*
> +		 * Trigger new elections without waiting for an old leader to
> +		 * disappear.
> +		 */
> +		raft_new_term(box_raft());
> +		box_raft_wait_leader_found();
> +		raft_cfg_is_candidate(box_raft(), false, false);

2. What if during box_raft_wait_leader_found() I made the node candidate
via box.cfg? Won't you then reset it back to non-candidate here?

It probably should reset the current box.cfg mode back. Not just
remove the candidate flag.

> +		if (!box_raft()->is_enabled) {
> +			diag_set(ClientError, ER_RAFT_DISABLED);
> +			in_clear_synchro_queue = false;
> +			return -1;
> +		}
> +		if (box_raft()->state != RAFT_STATE_LEADER) {
> +			diag_set(ClientError, ER_INTERFERING_PROMOTE,
> +				 box_raft()->leader);
> +			in_clear_synchro_queue = false;
> +			return -1;
> +		}
> +	}
> +
>  	if (txn_limbo_is_empty(&txn_limbo))
>  		goto promote;
>  
> diff --git a/src/box/raft.c b/src/box/raft.c
> index 285dbe4fd..c7dc79f9b 100644
> --- a/src/box/raft.c
> +++ b/src/box/raft.c
> @@ -336,6 +336,28 @@ fail:
>  	panic("Could not write a raft request to WAL\n");
>  }
>  
> +static int
> +box_raft_wait_leader_found_trig(struct trigger *trig, void *event)

3. I thought we usually call triggers with _f suffix, not _trig.

> +{
> +	struct raft *raft = (struct raft *)event;
> +	assert(raft == box_raft());
> +	struct fiber *waiter = (struct fiber *)trig->data;

4. No need to cast this and event - void * cast works naturally in C.

> +	if (raft->leader != REPLICA_ID_NIL || !raft->is_enabled)
> +		fiber_wakeup(waiter);
> +	return 0;
> +}
> diff --git a/src/box/raft.h b/src/box/raft.h
> index 15f4e80d9..8fce423e1 100644
> --- a/src/box/raft.h
> +++ b/src/box/raft.h
> @@ -97,6 +97,9 @@ box_raft_checkpoint_remote(struct raft_request *req);
>  int
>  box_raft_process(struct raft_request *req, uint32_t source);
>  
> +void
> +box_raft_wait_leader_found();
> +
>  void
>  box_raft_init(void);
>  
> diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
> index e9ce8cade..7b77e05ea 100644
> --- a/src/lib/raft/raft.c
> +++ b/src/lib/raft/raft.c
> @@ -846,7 +846,7 @@ raft_cfg_is_enabled(struct raft *raft, bool is_enabled)
>  }
>  
>  void
> -raft_cfg_is_candidate(struct raft *raft, bool is_candidate)
> +raft_cfg_is_candidate(struct raft *raft, bool is_candidate, bool demote)

5. I know it might lead to some code duplication, but probably
better move that to other functions. For example,

	raft_cfg_is_temporary_candidate()

or something like that. Otherwise it appears surprisingly hard
to follow these 2 flags together. Although I might be wrong and
it would look worse. Did you try?

Or another option:

	raft_cfg_is_candidate(box_raft(), true, false);
	raft_cfg_is_candidate(box_raft(), false, false);

turns into

	raft_start_candidate(box_raft())
	raft_stop_candidate(box_raft())

Also it would be good to have unit tests for the changes in raft.h
and raft.c.

  reply	other threads:[~2021-04-15 23:30 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-14 14:17 [Tarantool-patches] [PATCH v3 00/10] raft: introduce manual elections and fix a bug with re-applying rolled back transactions Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 01/10] wal: enrich row's meta information with sync replication flags Serge Petrenko via Tarantool-patches
2021-04-15 23:18   ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16  7:08     ` Serge Petrenko via Tarantool-patches
2021-04-16  7:11       ` Serge Petrenko via Tarantool-patches
2021-04-16  8:57       ` Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 02/10] xrow: introduce a PROMOTE entry Serge Petrenko via Tarantool-patches
2021-04-15 23:19   ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16 16:18     ` Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 03/10] box: actualise iproto_key_type array Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 04/10] box: make clear_synchro_queue() write a PROMOTE entry instead of CONFIRM + ROLLBACK Serge Petrenko via Tarantool-patches
2021-04-15 23:20   ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16  9:28     ` Serge Petrenko via Tarantool-patches
2021-04-16 14:03       ` Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 05/10] box: write PROMOTE even for empty limbo Serge Petrenko via Tarantool-patches
2021-04-15 23:21   ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16  9:33     ` Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 06/10] raft: keep track of greatest known term and filter replication sources based on that Serge Petrenko via Tarantool-patches
2021-04-15 23:27   ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16 14:16     ` Serge Petrenko via Tarantool-patches
2021-04-16 22:13       ` Vladislav Shpilevoy via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 07/10] replication: introduce a new election mode: "manual" Serge Petrenko via Tarantool-patches
2021-04-15 23:27   ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16 14:18     ` Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 08/10] Support manual elections in `box.ctl.clear_synchro_queue()` Serge Petrenko via Tarantool-patches
2021-04-15 23:30   ` Vladislav Shpilevoy via Tarantool-patches [this message]
2021-04-16 15:38     ` Serge Petrenko via Tarantool-patches
2021-04-16 15:40       ` Serge Petrenko via Tarantool-patches
2021-04-16 15:50         ` Serge Petrenko via Tarantool-patches
2021-04-14 14:17 ` [Tarantool-patches] [PATCH v3 09/10] box: remove parameter from clear_synchro_queue Serge Petrenko via Tarantool-patches
2021-04-14 14:18 ` [Tarantool-patches] [PATCH v3 10/10] box.ctl: rename clear_synchro_queue to promote Serge Petrenko via Tarantool-patches
2021-04-15 23:31   ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16 16:13     ` Serge Petrenko via Tarantool-patches
2021-04-14 18:21 ` [Tarantool-patches] [PATCH v3 00/10] raft: introduce manual elections and fix a bug with re-applying rolled back transactions Cyrill Gorcunov via Tarantool-patches
2021-04-15 23:16 ` Vladislav Shpilevoy via Tarantool-patches
2021-04-16 16:35   ` Serge Petrenko via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e7faa53f-6aa7-dbda-4b3b-f1c9d0bebc25@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=gorcunov@gmail.com \
    --cc=sergepetrenko@tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v3 08/10] Support manual elections in `box.ctl.clear_synchro_queue()`' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox