[Tarantool-patches] [PATCH v3 04/12] box: make promote always bump the term

Serge Petrenko sergepetrenko at tarantool.org
Wed Jul 14 21:26:47 MSK 2021



04.07.2021 15:14, Vladislav Shpilevoy wrote:
> Thanks for the patch!

Thanks for the review!

> Did you think about making it NOP when the node is already a leader
> (even in manual/off mode)? The current solution is all good except
> that it makes the current leader temporary read-only until it wins
> the election again, which looks strange. I would say "unexpected" for
> users.

Sure, why not. I'll address it in a separate commit.

> See 4 comments below.
>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 6a0950f44..ce37b307d 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1687,16 +1687,19 @@ box_promote(void)
>>   			rc = -1;
>>   		} else {
>>   promote:
>> -			/* We cannot possibly get here in a volatile state. */
>> -			assert(box_raft()->volatile_term == box_raft()->term);
>> -			txn_limbo_write_promote(&txn_limbo, wait_lsn,
>> -						box_raft()->term);
>> +			if (try_wait) {
>> +				raft_new_term(box_raft());
>> +				if (box_raft_wait_persisted() < 0)
> 1. What if during term WAL write another node also started promote,
> won the elections and delivered the promote to us? I suppose after
> the WAL write we will silently write PROMOTE for the term which
> was won by somebody else, right? Can it be covered by a test?

Yep, need to handle that.
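
Roughly, the check needed after the term WAL write could look like the sketch below. It is only a self-contained illustration and not the actual fix: `toy_state`, its fields and `toy_promote_is_still_valid()` are hypothetical stand-ins, not existing Tarantool structures or functions. The idea is to remember the term we bumped to and to refuse to write PROMOTE if, by the time the term is persisted, either the term has moved on or a PROMOTE for that term (or a newer one) has already arrived from another node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the state consulted after the WAL write.
 * These are NOT Tarantool's real structures, just an illustration.
 */
struct toy_state {
	/* Persisted raft term. */
	uint64_t term;
	/* Term that may not be written to WAL yet. */
	uint64_t volatile_term;
	/* Greatest term of any PROMOTE already received from anyone. */
	uint64_t last_promote_term;
};

/*
 * Return true if this node may still write PROMOTE for bumped_term,
 * i.e. nobody started a newer term and nobody delivered a PROMOTE
 * for this term (or a newer one) while we were waiting for the WAL.
 */
static bool
toy_promote_is_still_valid(const struct toy_state *state,
			   uint64_t bumped_term)
{
	assert(state != NULL);
	if (state->term != bumped_term ||
	    state->volatile_term != bumped_term)
		return false;
	return state->last_promote_term < bumped_term;
}
```

With something like this, box_promote() would bail out with an error instead of silently writing PROMOTE for a term won by somebody else.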

>> +					return -1;
>> +			}
>> +			uint64_t term = box_raft()->term;
>> +			txn_limbo_write_promote(&txn_limbo, wait_lsn, term);
>>   			struct synchro_request req = {
>>   				.type = IPROTO_PROMOTE,
>>   				.replica_id = former_leader_id,
>>   				.origin_id = instance_id,
>>   				.lsn = wait_lsn,
>> -				.term = box_raft()->term,
>> +				.term = term,
>>   			};
>>   			txn_limbo_process(&txn_limbo, &req);
>>   			assert(txn_limbo_is_empty(&txn_limbo));
>> diff --git a/src/box/raft.c b/src/box/raft.c
>> index 7f787c0c5..17caf6f54 100644
>> --- a/src/box/raft.c
>> +++ b/src/box/raft.c
>> @@ -354,6 +354,42 @@ box_raft_wait_leader_found(void)
>>   	return 0;
>>   }
>>   
>> +struct raft_wait_persisted_data {
>> +	struct fiber *waiter;
>> +	uint64_t term;
>> +};
>> +
>> +static int
>> +box_raft_wait_persisted_f(struct trigger *trig, void *event)
>> +{
>> +	struct raft *raft = event;
>> +	struct raft_wait_persisted_data *data = trig->data;
>> +	if (raft->term >= data->term)
>> +		fiber_wakeup(data->waiter);
>> +	return 0;
>> +}
>> +
>> +int
>> +box_raft_wait_persisted(void)
>> +{
>> +	if (box_raft()->term == box_raft()->volatile_term)
> 2. Since it only waits for term being persisted, I would rather
> call it 'wait_term_persisted'. Because there is also vote, and
> you do not look at it.

Ok.

>> +		return 0;
>> +	struct raft_wait_persisted_data data = {
>> +		.waiter = fiber(),
>> +		.term = box_raft()->volatile_term,
>> +	};
>> +	struct trigger trig;
>> +	trigger_create(&trig, box_raft_wait_persisted_f, &data, NULL);
>> +	raft_on_update(box_raft(), &trig);
>> +	fiber_yield();
> 3. What about spurious wakeups? I could call fiber.wakeup() from
> Lua on this fiber.

Yep, need to handle that.
I'll do that for box_raft_wait_leader_found() as well, in a separate commit.
Good catch!
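
The usual way to tolerate spurious wakeups is to re-check the condition in a loop around the yield. Below is a minimal self-contained sketch of that pattern, not the real patch: `toy_raft`, `toy_yield_f`, `toy_wait_term_persisted()` and the fake yield function are all hypothetical and only model the behaviour of fiber_yield() and fiber_is_cancelled().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the raft state, not Tarantool's struct raft. */
struct toy_raft {
	uint64_t term;
	uint64_t volatile_term;
};

/*
 * Hypothetical yield hook. It returns after any wakeup, spurious or
 * not, and reports whether the waiter was cancelled while sleeping.
 */
typedef bool (*toy_yield_f)(struct toy_raft *raft, void *ctx);

/* Sleep until the term seen on entry is persisted. -1 on cancel. */
static int
toy_wait_term_persisted(struct toy_raft *raft, toy_yield_f yield, void *ctx)
{
	uint64_t target = raft->volatile_term;
	/* Re-check after every wakeup instead of trusting the first one. */
	while (raft->term < target) {
		if (yield(raft, ctx))
			return -1;
	}
	return 0;
}

/* Demo yield: a few spurious wakeups, then the WAL write "completes". */
struct toy_wakeups {
	int spurious_left;
	int total;
};

static bool
toy_yield_with_spurious(struct toy_raft *raft, void *ctx)
{
	struct toy_wakeups *w = ctx;
	w->total++;
	if (w->spurious_left > 0) {
		/* Woken up by somebody else, nothing has changed. */
		w->spurious_left--;
		return false;
	}
	raft->term = raft->volatile_term;
	return false;
}
```

In the real function the loop body would be fiber_yield() followed by the fiber_is_cancelled() check, with the trigger cleared once the loop exits.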

>> +	trigger_clear(&trig);
>> +	if (fiber_is_cancelled()) {
>> +		diag_set(FiberIsCancelled);
>> +		return -1;
>> +	}
>> +	return 0;
>> +}
>> diff --git a/test/replication/gh-4114-local-space-replication.result b/test/replication/gh-4114-local-space-replication.result
>> index 9b63a4b99..e71eb60a8 100644
>> --- a/test/replication/gh-4114-local-space-replication.result
>> +++ b/test/replication/gh-4114-local-space-replication.result
>> @@ -45,9 +45,8 @@ test_run:cmd('switch replica')
>>    | ---
>>    | - true
>>    | ...
>> -box.info.vclock[0]
>> +a = box.info.vclock[0] or 0
>>    | ---
>> - | - null
>>    | ...
>>   box.cfg{checkpoint_count=1}
>>    | ---
>> @@ -77,9 +76,9 @@ box.space.test:insert{3}
>>    | - [3]
>>    | ...
>>   
>> -box.info.vclock[0]
>> +assert(box.info.vclock[0] == a + 3)
>>    | ---
>> - | - 3
>> + | - true
> 4. Why do you need these changes? I reverted this test and it passed.

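When the test is run after some election test, the master has a non-default
raft term, and the replica persists this term, bumping vclock[0]. So the
absolute value of vclock[0] depends on which tests ran before, and the test
can only check how much it grew.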

-- 
Serge Petrenko


