[Tarantool-patches] [PATCH v3 04/12] box: make promote always bump the term
Serge Petrenko
sergepetrenko at tarantool.org
Wed Jul 14 21:26:47 MSK 2021
04.07.2021 15:14, Vladislav Shpilevoy wrote:
> Thanks for the patch!
Thanks for the review!
> Did you think about making it NOP when the node is already a leader
> (even in manual/off mode)? The current solution is all good except
> that it makes the current leader temporarily read-only until it wins
> the election again, which looks strange. I would say "unexpected" for
> users.
Sure, why not. I'll address it in a separate commit.
> See 4 comments below.
>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 6a0950f44..ce37b307d 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1687,16 +1687,19 @@ box_promote(void)
>> rc = -1;
>> } else {
>> promote:
>> - /* We cannot possibly get here in a volatile state. */
>> - assert(box_raft()->volatile_term == box_raft()->term);
>> - txn_limbo_write_promote(&txn_limbo, wait_lsn,
>> - box_raft()->term);
>> + if (try_wait) {
>> + raft_new_term(box_raft());
>> + if (box_raft_wait_persisted() < 0)
> 1. What if during term WAL write another node also started promote,
> won the elections and delivered the promote to us? I suppose after
> the WAL write we will silently write PROMOTE for the term which
> was won by somebody else, right? Can it be covered by a test?
Yep, need to handle that.
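A rough sketch of the check I have in mind, written as a standalone model
rather than the actual box.cc code. Everything here (struct raft_model,
promote_model(), the foreign_bumps parameter) is hypothetical and only
imitates the relevant part of the raft state: remember the term we bumped
to, wait for persistence, and refuse to write PROMOTE if the term moved
on in the meantime.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the raft state (not struct raft). */
struct raft_model {
	uint64_t term;          /* Term already persisted in WAL. */
	uint64_t volatile_term; /* Newest known term, maybe not persisted. */
};

/*
 * Model of "bump the term, wait for the WAL write, then decide".
 * foreign_bumps simulates how many times the term was raised by other
 * nodes' elections while we were waiting for the WAL write.
 * Returns 0 and stores the term in *promote_term if PROMOTE may be
 * written for our own term, -1 if somebody else moved the term.
 */
static int
promote_model(struct raft_model *raft, unsigned foreign_bumps,
	      uint64_t *promote_term)
{
	assert(raft->term <= raft->volatile_term);
	/* Analogue of raft_new_term(): remember the term we asked for. */
	uint64_t my_term = ++raft->volatile_term;
	/* Analogue of the WAL wait: the term may grow meanwhile. */
	raft->volatile_term += foreign_bumps;
	raft->term = raft->volatile_term;
	/* The re-check: PROMOTE is valid only for the term we bumped to. */
	if (raft->term != my_term)
		return -1;
	*promote_term = my_term;
	return 0;
}
```

With foreign_bumps == 0 the model returns our own term. With any other
value it fails instead of silently writing PROMOTE for a term won by
somebody else.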
>> + return -1;
>> + }
>> + uint64_t term = box_raft()->term;
>> + txn_limbo_write_promote(&txn_limbo, wait_lsn, term);
>> struct synchro_request req = {
>> .type = IPROTO_PROMOTE,
>> .replica_id = former_leader_id,
>> .origin_id = instance_id,
>> .lsn = wait_lsn,
>> - .term = box_raft()->term,
>> + .term = term,
>> };
>> txn_limbo_process(&txn_limbo, &req);
>> assert(txn_limbo_is_empty(&txn_limbo));
>> diff --git a/src/box/raft.c b/src/box/raft.c
>> index 7f787c0c5..17caf6f54 100644
>> --- a/src/box/raft.c
>> +++ b/src/box/raft.c
>> @@ -354,6 +354,42 @@ box_raft_wait_leader_found(void)
>> return 0;
>> }
>>
>> +struct raft_wait_persisted_data {
>> + struct fiber *waiter;
>> + uint64_t term;
>> +};
>> +
>> +static int
>> +box_raft_wait_persisted_f(struct trigger *trig, void *event)
>> +{
>> + struct raft *raft = event;
>> + struct raft_wait_persisted_data *data = trig->data;
>> + if (raft->term >= data->term)
>> + fiber_wakeup(data->waiter);
>> + return 0;
>> +}
>> +
>> +int
>> +box_raft_wait_persisted(void)
>> +{
>> + if (box_raft()->term == box_raft()->volatile_term)
> 2. Since it only waits for term being persisted, I would rather
> call it 'wait_term_persisted'. Because there is also vote, and
> you do not look at it.
Ok.
>> + return 0;
>> + struct raft_wait_persisted_data data = {
>> + .waiter = fiber(),
>> + .term = box_raft()->volatile_term,
>> + };
>> + struct trigger trig;
>> + trigger_create(&trig, box_raft_wait_persisted_f, &data, NULL);
>> + raft_on_update(box_raft(), &trig);
>> + fiber_yield();
> 3. What about spurious wakeups? I could call fiber.wakeup() from
> Lua on this fiber.
Yep, need to handle that.
I'll do that for box_raft_wait_leader_found() as well, in a separate commit.
Good catch!
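The usual fix is to re-check the condition in a loop around the yield.
A minimal sketch of that pattern as a standalone model, not the real
fiber API: struct fiber_model, yield_model() and the scripted wake
reasons are all made up for illustration, and the script only imitates
what could wake the fiber.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reasons for which the waiting fiber may be woken up. */
enum wake_reason { WAKE_SPURIOUS, WAKE_TERM_PERSISTED, WAKE_CANCEL };

/* Hypothetical model of the waiting fiber plus the raft terms it watches. */
struct fiber_model {
	const enum wake_reason *script; /* Scripted sequence of wakeups. */
	size_t len;
	size_t pos;
	bool is_cancelled;
	uint64_t term;          /* Persisted term. */
	uint64_t volatile_term; /* Term waiting to be persisted. */
	unsigned yield_count;   /* How many times we yielded. */
};

/* Analogue of fiber_yield(): returns after the next scripted wakeup. */
static void
yield_model(struct fiber_model *f)
{
	f->yield_count++;
	if (f->pos >= f->len) {
		/* Script exhausted: cancel so that the model terminates. */
		f->is_cancelled = true;
		return;
	}
	switch (f->script[f->pos++]) {
	case WAKE_TERM_PERSISTED:
		f->term = f->volatile_term;
		break;
	case WAKE_CANCEL:
		f->is_cancelled = true;
		break;
	case WAKE_SPURIOUS:
		/* Nothing changed, e.g. fiber.wakeup() called from Lua. */
		break;
	}
}

/* Returns 0 when the term is persisted, -1 if the fiber was cancelled. */
static int
wait_term_persisted_model(struct fiber_model *f)
{
	assert(f->term <= f->volatile_term);
	if (f->term == f->volatile_term)
		return 0;
	uint64_t target = f->volatile_term;
	/* Loop instead of a single yield: spurious wakeups just retry. */
	while (f->term < target && !f->is_cancelled)
		yield_model(f);
	return f->is_cancelled ? -1 : 0;
}
```

A single yield would return success after the first WAKE_SPURIOUS. The
loop keeps waiting until the term is really persisted or the fiber is
cancelled.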
>> + trigger_clear(&trig);
>> + if (fiber_is_cancelled()) {
>> + diag_set(FiberIsCancelled);
>> + return -1;
>> + }
>> + return 0;
>> +}
>> diff --git a/test/replication/gh-4114-local-space-replication.result b/test/replication/gh-4114-local-space-replication.result
>> index 9b63a4b99..e71eb60a8 100644
>> --- a/test/replication/gh-4114-local-space-replication.result
>> +++ b/test/replication/gh-4114-local-space-replication.result
>> @@ -45,9 +45,8 @@ test_run:cmd('switch replica')
>> | ---
>> | - true
>> | ...
>> -box.info.vclock[0]
>> +a = box.info.vclock[0] or 0
>> | ---
>> - | - null
>> | ...
>> box.cfg{checkpoint_count=1}
>> | ---
>> @@ -77,9 +76,9 @@ box.space.test:insert{3}
>> | - [3]
>> | ...
>>
>> -box.info.vclock[0]
>> +assert(box.info.vclock[0] == a + 3)
>> | ---
>> - | - 3
>> + | - true
> 4. Why do you need these changes? I reverted this test and it passed.
When the test runs after some election test, the master has a
non-default raft term, and the replica persists this term, which
bumps vclock[0].
--
Serge Petrenko