[Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Tue Jul 27 02:50:18 MSK 2021


Thanks for the fixes!

On 23.07.2021 09:44, Sergey Petrenko wrote:
> 
> 22.07.2021 01:28, Vladislav Shpilevoy пишет:
>> Thanks for the fixes!
>>
>>>> 2. On top of this commit and on top of the branch too I tried to
>>>> promote a candidate and got a strange error in the logs, although
>>>> the promotion was successful:
>>>>
>>>> -- 
>>>> -- Instance 1
>>>> -- 
>>>>
>>>> -- Step 1
>>>> box.cfg{
>>>>       listen = 3313,
>>>>       election_mode = 'candidate',
>>>>       replication_synchro_quorum = 2,
>>>>       replication = {'localhost:3314', 'localhost:3313'}
>>>> }
>>>> box.schema.user.grant('guest', 'super')
>>>>
>>>>
>>>> -- 
>>>> -- Instance 2
>>>> -- 
>>>>
>>>> -- Step 2
>>>> box.cfg{
>>>>       listen = 3314,
>>>>       election_mode = 'voter',
>>>>       replication_synchro_quorum = 2,
>>>>       replication = {'localhost:3314', 'localhost:3313'},
>>>>       read_only = true,
>>>> }
>>>>
>>>> -- Step 3
>>>> box.cfg{read_only = false, election_mode = 'candidate'}
>>>>
>>>> -- Step 4
>>>> box.ctl.promote()
>>>>
>>>> main/112/raft_worker box.cc:1538 E> ER_UNSUPPORTED: box.ctl.promote does not support simultaneous invocations
>>>> ---
>>>> ...
>>> That's because once a candidate becomes the leader, it tries to issue `box.ctl.promote()`, and fails,
>>> since we're already in `box.ctl.promote()` call.
>>> I'm not sure how to handle that properly. This doesn't break anything though.
>> Still, the error message looks really not good. There is an option -
>> make box_promote() for candidate node just call raft_promote() and set
>> box_in_promote = false. Then wait for the term outcome. Will it work?
>> You would need to rebase your patchset on master branch then.
> 
> 
> Hmm, I don't like that. This will work, but it will complicate things in
> 
> box_promote even more. Now one has to keep in mind that a manual
> 
> promote call has 2 phases, one when it is issued manually and the other
> 
> when it's called automatically when (and if) the instance becomes the leader.

But is it different in the new version on the branch? You still need to check
if you are "in promote", but instead of it being encapsulated into box_promote()
we now need to check it potentially in all box_promote() usage places. Which is
only 2 now though, and only 1 needs this. But you can see that now you need to
think if you should ignore being already in promote.

> diff --git a/src/box/box.cc b/src/box/box.cc
> index a30e4f78d..5cdca4bd4 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1692,20 +1692,24 @@ box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn)
>  }
> 
>  /* A guard to block multiple simultaneous promote()/demote() invocations. */
> -static bool box_in_promote = false;
> +static bool in_box_promote = false;
> +
> +bool box_in_promote(void) {

Could you please put the return value on a separate line and use 'is'
as a suffix? For instance, `box_is_in_promote()`.

> diff --git a/src/box/box.h b/src/box/box.h
> index aaf20d9dd..344ed90f2 100644
> --- a/src/box/box.h
> +++ b/src/box/box.h
> @@ -88,11 +88,11 @@ box_raft_update_synchro_queue(struct raft *raft)
>  {
>      assert(raft == box_raft());
>      /*
> -     * In case these are manual elections, we are already in the middle of a
> -     * `promote` call. No need to call it once again.
> +     * In case the elections were triggered manually, we are already in
> +     * the middle of a `promote` call. No need to call it once again.
>       */
>      if (raft->state == RAFT_STATE_LEADER &&
> -        box_election_mode != ELECTION_MODE_MANUAL) {
> +        !box_in_promote()) {

It just seems to me that it should not be of raft's business if it was
promoted manually or not.

>          int rc = 0;
>          uint32_t errcode = 0;
>          do {
> =====================================
> 
> 
>>
>> It might be a little easier to do if you apply the diff below. (Warning:
>> I didn't test it.) The motivation is that one of the main reasons why I
>> wanted box_promote() simplified was because of the strange meaning of some
>> flags. In particular, try_wait flag somewhy triggered elections before the
>> waiting which is super not obvious why. How does 'wait' come to 'elections'?
>>
>> In the diff I tried to remove these flags entirely. And now you have a
>> single place in the code of box_promote(), where ELECTION_MODE_CANDIDATE
>> stuff is handled. Here you could try the proposal I gave above.
> 
> 
> Thanks for the help! Your diff looks good, I've reworked my patches to comply.

Wouldn't it be too hard to merge this diff into the previous
commits? Because it could be done like that from the first
box_promote-refactoring commit it looks to me.

> diff --git a/src/box/box.cc b/src/box/box.cc
> index af39f66f4..36302310b 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1701,14 +1701,14 @@ box_promote(void)
>  	if (!is_box_configured)
>  		return 0;
>  	if (txn_limbo_replica_term(&txn_limbo, instance_id) ==
> -	    box_raft()->term)
> +	    raft->term)
>  		return 0;
> -	bool run_elections = false;
> -	bool try_wait = false;
> -
>  	switch (box_election_mode) {
>  	case ELECTION_MODE_OFF:
> -		try_wait = true;
> +		if (box_try_wait_confirm(2 * replication_synchro_timeout) != 0)
> +			return -1;
> +		if (box_trigger_elections() != 0)
> +			return -1;
>  		break;
>  	case ELECTION_MODE_VOTER:
>  		assert(raft->state == RAFT_STATE_FOLLOWER);
> @@ -1717,23 +1717,17 @@ box_promote(void)
>  		return -1;
>  	case ELECTION_MODE_MANUAL:
>  	case ELECTION_MODE_CANDIDATE:
> -		run_elections = raft->state != RAFT_STATE_LEADER;
> +		if (raft->state != RAFT_STATE_LEADER &&
> +		    box_run_elections() != 0) {
> +			return -1;
> +		}
>  		break;
>  	default:
>  		unreachable();
>  	}
>  
> -	int64_t wait_lsn = -1;
> -
> -	if (run_elections && box_run_elections() != 0)
> -		return -1;
> -	if (try_wait) {
> -		if (box_try_wait_confirm(2 * replication_synchro_timeout) != 0)
> -			return -1;
> -		if (box_trigger_elections() != 0)
> -			return -1;
> -	}
> -	if ((wait_lsn = box_wait_limbo_acked()) < 0)
> +	int64_t wait_lsn = box_wait_limbo_acked();
> +	if (wait_lsn < 0)
>  		return -1;
>  
>  	box_issue_promote(txn_limbo.owner_id, wait_lsn);

I thought more about how the manual promotion conflicts with the
automatic one and I tried to separate them. See my diff below, it
is done on top of this commit, but I didn't push it, because I
didn't run the tests except the one which used to log an error
in the first version of the patchset.

Thanks to the new small helper functions, I implemented a special
function for automatic promote which does not conflict with
the manual one. It you will apply it though, you might want to
do it earlier that this commit.

====================
diff --git a/src/box/box.cc b/src/box/box.cc
index 36302310b..65d064615 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1599,7 +1599,7 @@ box_try_wait_confirm(double timeout)
  * Return lsn of the last limbo entry on success, -1 on error.
  */
 static int64_t
-box_wait_limbo_acked(void)
+box_wait_limbo_acked(double timeout)
 {
 	if (txn_limbo_is_empty(&txn_limbo))
 		return txn_limbo.confirmed_lsn;
@@ -1629,7 +1629,7 @@ box_wait_limbo_acked(void)
 	int64_t wait_lsn = last_entry->lsn;
 
 	if (box_wait_quorum(txn_limbo.owner_id, wait_lsn, quorum,
-			    replication_synchro_timeout) != 0)
+			    timeout) != 0)
 		return -1;
 
 	if (box_check_promote_term_intact(promote_term) != 0)
@@ -1675,9 +1675,23 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
 /* A guard to block multiple simultaneous box_promote() invocations. */
 static bool in_box_promote = false;
 
-bool box_in_promote(void)
+int
+box_promote_qsync(void)
 {
-	return in_box_promote;
+	assert(!in_box_promote);
+	assert(is_box_configured);
+	struct raft *raft = box_raft();
+	if (raft->state != RAFT_STATE_LEADER)
+		return 0;
+	assert(box_election_mode == ELECTION_MODE_MANUAL ||
+	       box_election_mode == ELECTION_MODE_CANDIDATE);
+	if (txn_limbo_replica_term(&txn_limbo, instance_id) == raft->term)
+		return 0;
+	int64_t wait_lsn = box_wait_limbo_acked(TIMEOUT_INFINITY);
+	if (wait_lsn < 0)
+		return -1;
+	box_issue_promote(txn_limbo.owner_id, wait_lsn);
+	return 0;
 }
 
 int
@@ -1700,8 +1714,8 @@ box_promote(void)
 	 */
 	if (!is_box_configured)
 		return 0;
-	if (txn_limbo_replica_term(&txn_limbo, instance_id) ==
-	    raft->term)
+	if (txn_limbo_replica_term(&txn_limbo, instance_id) == raft->term ||
+	    raft->state == RAFT_STATE_LEADER)
 		return 0;
 	switch (box_election_mode) {
 	case ELECTION_MODE_OFF:
@@ -1715,18 +1729,16 @@ box_promote(void)
 		diag_set(ClientError, ER_UNSUPPORTED, "election_mode='voter'",
 			 "manual elections");
 		return -1;
-	case ELECTION_MODE_MANUAL:
 	case ELECTION_MODE_CANDIDATE:
-		if (raft->state != RAFT_STATE_LEADER &&
-		    box_run_elections() != 0) {
-			return -1;
-		}
-		break;
+	case ELECTION_MODE_MANUAL:
+		/* Let the automatic elections finish the promotion properly. */
+		in_box_promote = false;
+		return box_run_elections();
 	default:
 		unreachable();
 	}
 
-	int64_t wait_lsn = box_wait_limbo_acked();
+	int64_t wait_lsn = box_wait_limbo_acked(replication_synchro_timeout);
 	if (wait_lsn < 0)
 		return -1;
 
diff --git a/src/box/box.h b/src/box/box.h
index db636b058..5e5d0bf6d 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -273,12 +273,12 @@ extern "C" {
 
 typedef struct tuple box_tuple_t;
 
-bool
-box_in_promote(void);
-
 int
 box_promote(void);
 
+int
+box_promote_qsync(void);
+
 /* box_select is private and used only by FFI */
 API_EXPORT int
 box_select(uint32_t space_id, uint32_t index_id,
diff --git a/src/box/raft.c b/src/box/raft.c
index 5e496c2e4..07501157f 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -83,30 +83,6 @@ box_raft_request_to_msg(const struct raft_request *req, struct raft_msg *msg)
 	};
 }
 
-static void
-box_raft_update_synchro_queue(struct raft *raft)
-{
-	assert(raft == box_raft());
-	/*
-	 * In case the elections were triggered manually, we are already in
-	 * the middle of a `promote` call. No need to call it once again.
-	 */
-	if (raft->state == RAFT_STATE_LEADER &&
-	    !box_in_promote()) {
-		int rc = 0;
-		uint32_t errcode = 0;
-		do {
-			rc = box_promote();
-			if (rc != 0) {
-				struct error *err = diag_last_error(diag_get());
-				errcode = box_error_code(err);
-				diag_log();
-			}
-		} while (rc != 0 && errcode == ER_QUORUM_WAIT &&
-		       !fiber_is_cancelled());
-	}
-}
-
 static int
 box_raft_worker_f(va_list args)
 {
@@ -117,7 +93,12 @@ box_raft_worker_f(va_list args)
 		box_raft_has_work = false;
 
 		raft_process_async(raft);
-		box_raft_update_synchro_queue(raft);
+		/*
+		 * XXX: perhaps it should not ever fail. Or at least need a
+		 * proper support for failures instead of the ignorance.
+		 */
+		if (box_promote_qsync() != 0)
+			diag_log();
 
 		if (!box_raft_has_work)
 			fiber_yield();


More information about the Tarantool-patches mailing list