[Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`

Sergey Petrenko sergepetrenko at tarantool.org
Thu Jul 29 23:56:45 MSK 2021


27.07.2021 02:50, Vladislav Shpilevoy пишет:
> Thanks for the fixes!
>
> On 23.07.2021 09:44, Sergey Petrenko wrote:
>> 22.07.2021 01:28, Vladislav Shpilevoy пишет:
>>> Thanks for the fixes!
>>>
>>>>> 2. On top of this commit and on top of the branch too I tried to
>>>>> promote a candidate and got a strange error in the logs, although
>>>>> the promotion was successful:
>>>>>
>>>>> -- 
>>>>> -- Instance 1
>>>>> -- 
>>>>>
>>>>> -- Step 1
>>>>> box.cfg{
>>>>>        listen = 3313,
>>>>>        election_mode = 'candidate',
>>>>>        replication_synchro_quorum = 2,
>>>>>        replication = {'localhost:3314', 'localhost:3313'}
>>>>> }
>>>>> box.schema.user.grant('guest', 'super')
>>>>>
>>>>>
>>>>> -- 
>>>>> -- Instance 2
>>>>> -- 
>>>>>
>>>>> -- Step 2
>>>>> box.cfg{
>>>>>        listen = 3314,
>>>>>        election_mode = 'voter',
>>>>>        replication_synchro_quorum = 2,
>>>>>        replication = {'localhost:3314', 'localhost:3313'},
>>>>>        read_only = true,
>>>>> }
>>>>>
>>>>> -- Step 3
>>>>> box.cfg{read_only = false, election_mode = 'candidate'}
>>>>>
>>>>> -- Step 4
>>>>> box.ctl.promote()
>>>>>
>>>>> main/112/raft_worker box.cc:1538 E> ER_UNSUPPORTED: box.ctl.promote does not support simultaneous invocations
>>>>> ---
>>>>> ...
>>>> That's because once a candidate becomes the leader, it tries to issue `box.ctl.promote()`, and fails,
>>>> since we're already in `box.ctl.promote()` call.
>>>> I'm not sure how to handle that properly. This doesn't break anything though.
>>> Still, the error message looks really not good. There is an option -
>>> make box_promote() for candidate node just call raft_promote() and set
>>> box_in_promote = false. Then wait for the term outcome. Will it work?
>>> You would need to rebase your patchset on master branch then.
>> Hmm, I don't like that. This will work, but it will complicate things in
>>
>> box_promote even more. Now one has to keep in mind that a manual
>>
>> promote call has 2 phases, one when it is issued manually and the other
>>
>> when it's called automatically when (and if) the instance becomes the leader.
> But is it different in the new version on the branch? You still need to check
> if you are "in promote", but instead of it being encapsulated into box_promote()
> we now need to check it potentially in all box_promote() usage places. Which is
> only 2 now though, and only 1 needs this. But you can see that now you need to
> think if you should ignore being already in promote.
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index a30e4f78d..5cdca4bd4 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1692,20 +1692,24 @@ box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn)
>>   }
>>
>>   /* A guard to block multiple simultaneous promote()/demote() invocations. */
>> -static bool box_in_promote = false;
>> +static bool in_box_promote = false;
>> +
>> +bool box_in_promote(void) {
> Could you please put the return value on a separate line and use 'is'
> as a suffix? For instance, `box_is_in_promote()`.
>> diff --git a/src/box/box.h b/src/box/box.h
>> index aaf20d9dd..344ed90f2 100644
>> --- a/src/box/box.h
>> +++ b/src/box/box.h
>> @@ -88,11 +88,11 @@ box_raft_update_synchro_queue(struct raft *raft)
>>   {
>>       assert(raft == box_raft());
>>       /*
>> -     * In case these are manual elections, we are already in the middle of a
>> -     * `promote` call. No need to call it once again.
>> +     * In case the elections were triggered manually, we are already in
>> +     * the middle of a `promote` call. No need to call it once again.
>>        */
>>       if (raft->state == RAFT_STATE_LEADER &&
>> -        box_election_mode != ELECTION_MODE_MANUAL) {
>> +        !box_in_promote()) {
> It just seems to me that it should not be of raft's business if it was
> promoted manually or not.


Ok, I see your point. I'll apply your diff.


>>           int rc = 0;
>>           uint32_t errcode = 0;
>>           do {
>> =====================================
>>
>>
>>> It might be a little easier to do if you apply the diff below. (Warning:
>>> I didn't test it.) The motivation is that one of the main reasons why I
>>> wanted box_promote() simplified was because of the strange meaning of some
>>> flags. In particular, try_wait flag somewhy triggered elections before the
>>> waiting which is super not obvious why. How does 'wait' come to 'elections'?
>>>
>>> In the diff I tried to remove these flags entirely. And now you have a
>>> single place in the code of box_promote(), where ELECTION_MODE_CANDIDATE
>>> stuff is handled. Here you could try the proposal I gave above.
>> Thanks for the help! Your diff looks good, I've reworked my patches to comply.
> Wouldn't it be too hard to merge this diff into the previous
> commits? Because it could be done like that from the first
> box_promote-refactoring commit it looks to me.


No problem.


>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index af39f66f4..36302310b 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1701,14 +1701,14 @@ box_promote(void)
>>   	if (!is_box_configured)
>>   		return 0;
>>   	if (txn_limbo_replica_term(&txn_limbo, instance_id) ==
>> -	    box_raft()->term)
>> +	    raft->term)
>>   		return 0;
>> -	bool run_elections = false;
>> -	bool try_wait = false;
>> -
>>   	switch (box_election_mode) {
>>   	case ELECTION_MODE_OFF:
>> -		try_wait = true;
>> +		if (box_try_wait_confirm(2 * replication_synchro_timeout) != 0)
>> +			return -1;
>> +		if (box_trigger_elections() != 0)
>> +			return -1;
>>   		break;
>>   	case ELECTION_MODE_VOTER:
>>   		assert(raft->state == RAFT_STATE_FOLLOWER);
>> @@ -1717,23 +1717,17 @@ box_promote(void)
>>   		return -1;
>>   	case ELECTION_MODE_MANUAL:
>>   	case ELECTION_MODE_CANDIDATE:
>> -		run_elections = raft->state != RAFT_STATE_LEADER;
>> +		if (raft->state != RAFT_STATE_LEADER &&
>> +		    box_run_elections() != 0) {
>> +			return -1;
>> +		}
>>   		break;
>>   	default:
>>   		unreachable();
>>   	}
>>   
>> -	int64_t wait_lsn = -1;
>> -
>> -	if (run_elections && box_run_elections() != 0)
>> -		return -1;
>> -	if (try_wait) {
>> -		if (box_try_wait_confirm(2 * replication_synchro_timeout) != 0)
>> -			return -1;
>> -		if (box_trigger_elections() != 0)
>> -			return -1;
>> -	}
>> -	if ((wait_lsn = box_wait_limbo_acked()) < 0)
>> +	int64_t wait_lsn = box_wait_limbo_acked();
>> +	if (wait_lsn < 0)
>>   		return -1;
>>   
>>   	box_issue_promote(txn_limbo.owner_id, wait_lsn);
> I thought more about how the manual promotion conflicts with the
> automatic one and I tried to separate them. See my diff below, it
> is done on top of this commit, but I didn't push it, because I
> didn't run the tests except the one which used to log an error
> in the first version of the patchset.
>
> Thanks to the new small helper functions, I implemented a special
> function for automatic promote which does not conflict with
> the manual one. It you will apply it though, you might want to
> do it earlier that this commit.


Thanks for the help! I've applied your diff with minor changes to
patches

box: split manual and automatic promotion

box: allow calling promote on a candidate


>
> ====================
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 36302310b..65d064615 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1599,7 +1599,7 @@ box_try_wait_confirm(double timeout)
>    * Return lsn of the last limbo entry on success, -1 on error.
>    */
>   static int64_t
> -box_wait_limbo_acked(void)
> +box_wait_limbo_acked(double timeout)
>   {
>   	if (txn_limbo_is_empty(&txn_limbo))
>   		return txn_limbo.confirmed_lsn;
> @@ -1629,7 +1629,7 @@ box_wait_limbo_acked(void)
>   	int64_t wait_lsn = last_entry->lsn;
>   
>   	if (box_wait_quorum(txn_limbo.owner_id, wait_lsn, quorum,
> -			    replication_synchro_timeout) != 0)
> +			    timeout) != 0)
>   		return -1;
>   
>   	if (box_check_promote_term_intact(promote_term) != 0)
> @@ -1675,9 +1675,23 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
>   /* A guard to block multiple simultaneous box_promote() invocations. */
>   static bool in_box_promote = false;
>   
> -bool box_in_promote(void)
> +int
> +box_promote_qsync(void)
>   {
> -	return in_box_promote;
> +	assert(!in_box_promote);
> +	assert(is_box_configured);
> +	struct raft *raft = box_raft();
> +	if (raft->state != RAFT_STATE_LEADER)
> +		return 0;
> +	assert(box_election_mode == ELECTION_MODE_MANUAL ||
> +	       box_election_mode == ELECTION_MODE_CANDIDATE);
> +	if (txn_limbo_replica_term(&txn_limbo, instance_id) == raft->term)
> +		return 0;
> +	int64_t wait_lsn = box_wait_limbo_acked(TIMEOUT_INFINITY);
> +	if (wait_lsn < 0)
> +		return -1;
> +	box_issue_promote(txn_limbo.owner_id, wait_lsn);
> +	return 0;
>   }
>   
>   int
> @@ -1700,8 +1714,8 @@ box_promote(void)
>   	 */
>   	if (!is_box_configured)
>   		return 0;
> -	if (txn_limbo_replica_term(&txn_limbo, instance_id) ==
> -	    raft->term)
> +	if (txn_limbo_replica_term(&txn_limbo, instance_id) == raft->term ||
> +	    raft->state == RAFT_STATE_LEADER)
>   		return 0;
>   	switch (box_election_mode) {
>   	case ELECTION_MODE_OFF:
> @@ -1715,18 +1729,16 @@ box_promote(void)
>   		diag_set(ClientError, ER_UNSUPPORTED, "election_mode='voter'",
>   			 "manual elections");
>   		return -1;
> -	case ELECTION_MODE_MANUAL:
>   	case ELECTION_MODE_CANDIDATE:
> -		if (raft->state != RAFT_STATE_LEADER &&
> -		    box_run_elections() != 0) {
> -			return -1;
> -		}
> -		break;
> +	case ELECTION_MODE_MANUAL:
> +		/* Let the automatic elections finish the promotion properly. */
> +		in_box_promote = false;
> +		return box_run_elections();
>   	default:
>   		unreachable();
>   	}
>   
> -	int64_t wait_lsn = box_wait_limbo_acked();
> +	int64_t wait_lsn = box_wait_limbo_acked(replication_synchro_timeout);
>   	if (wait_lsn < 0)
>   		return -1;
>   
> diff --git a/src/box/box.h b/src/box/box.h
> index db636b058..5e5d0bf6d 100644
> --- a/src/box/box.h
> +++ b/src/box/box.h
> @@ -273,12 +273,12 @@ extern "C" {
>   
>   typedef struct tuple box_tuple_t;
>   
> -bool
> -box_in_promote(void);
> -
>   int
>   box_promote(void);
>   
> +int
> +box_promote_qsync(void);
> +
>   /* box_select is private and used only by FFI */
>   API_EXPORT int
>   box_select(uint32_t space_id, uint32_t index_id,
> diff --git a/src/box/raft.c b/src/box/raft.c
> index 5e496c2e4..07501157f 100644
> --- a/src/box/raft.c
> +++ b/src/box/raft.c
> @@ -83,30 +83,6 @@ box_raft_request_to_msg(const struct raft_request *req, struct raft_msg *msg)
>   	};
>   }
>   
> -static void
> -box_raft_update_synchro_queue(struct raft *raft)
> -{
> -	assert(raft == box_raft());
> -	/*
> -	 * In case the elections were triggered manually, we are already in
> -	 * the middle of a `promote` call. No need to call it once again.
> -	 */
> -	if (raft->state == RAFT_STATE_LEADER &&
> -	    !box_in_promote()) {
> -		int rc = 0;
> -		uint32_t errcode = 0;
> -		do {
> -			rc = box_promote();
> -			if (rc != 0) {
> -				struct error *err = diag_last_error(diag_get());
> -				errcode = box_error_code(err);
> -				diag_log();
> -			}
> -		} while (rc != 0 && errcode == ER_QUORUM_WAIT &&
> -		       !fiber_is_cancelled());
> -	}
> -}
> -
>   static int
>   box_raft_worker_f(va_list args)
>   {
> @@ -117,7 +93,12 @@ box_raft_worker_f(va_list args)
>   		box_raft_has_work = false;
>   
>   		raft_process_async(raft);
> -		box_raft_update_synchro_queue(raft);
> +		/*
> +		 * XXX: perhaps it should not ever fail. Or at least need a
> +		 * proper support for failures instead of the ignorance.
> +		 */
> +		if (box_promote_qsync() != 0)
> +			diag_log();
>   
>   		if (!box_raft_has_work)
>   			fiber_yield();


More information about the Tarantool-patches mailing list