[Tarantool-patches] [PATCH v3 04/12] box: make promote always bump the term
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Sun Jul 4 15:14:33 MSK 2021
Thanks for the patch!
Did you consider making it a no-op when the node is already a leader
(even in manual/off mode)? The current solution is fine except
that it makes the current leader temporarily read-only until it wins
the election again, which looks strange and, I would say, unexpected
for users.
See 4 comments below.
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 6a0950f44..ce37b307d 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1687,16 +1687,19 @@ box_promote(void)
> rc = -1;
> } else {
> promote:
> - /* We cannot possibly get here in a volatile state. */
> - assert(box_raft()->volatile_term == box_raft()->term);
> - txn_limbo_write_promote(&txn_limbo, wait_lsn,
> - box_raft()->term);
> + if (try_wait) {
> + raft_new_term(box_raft());
> + if (box_raft_wait_persisted() < 0)
1. What if, during the term WAL write, another node also starts a
promote, wins the election, and delivers its PROMOTE to us? I suppose
that after the WAL write we will silently write a PROMOTE for a term
that was won by somebody else, right? Can this be covered by a test?
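
To illustrate the kind of check comment 1 is asking about, here is a
minimal sketch. It is not the patch's code and not the real limbo or
raft API: struct toy_state, its fields, and toy_promote_still_valid()
are hypothetical stand-ins. The assumption is that after the WAL write
the node can see both its current term and the greatest term for which
a PROMOTE has already been applied.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical snapshot of the node state after the term WAL write.
 * The field names are illustrative and are not taken from Tarantool.
 */
struct toy_state {
	/* Current persisted raft term. */
	uint64_t term;
	/* Greatest term for which a PROMOTE was already applied. */
	uint64_t greatest_promote_term;
};

/*
 * Return true only if it is still safe to write our own PROMOTE for
 * bumped_term, i.e. nobody moved the term on and nobody delivered a
 * PROMOTE for this term (or a newer one) while we were waiting.
 */
static bool
toy_promote_still_valid(const struct toy_state *st, uint64_t bumped_term)
{
	if (st->term != bumped_term)
		return false;
	if (st->greatest_promote_term >= bumped_term)
		return false;
	return true;
}
```

Whether such a check belongs in box_promote() or somewhere else is a
separate question; the sketch only shows which conditions would have
to be re-validated after the yield.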
> + return -1;
> + }
> + uint64_t term = box_raft()->term;
> + txn_limbo_write_promote(&txn_limbo, wait_lsn, term);
> struct synchro_request req = {
> .type = IPROTO_PROMOTE,
> .replica_id = former_leader_id,
> .origin_id = instance_id,
> .lsn = wait_lsn,
> - .term = box_raft()->term,
> + .term = term,
> };
> txn_limbo_process(&txn_limbo, &req);
> assert(txn_limbo_is_empty(&txn_limbo));
> diff --git a/src/box/raft.c b/src/box/raft.c
> index 7f787c0c5..17caf6f54 100644
> --- a/src/box/raft.c
> +++ b/src/box/raft.c
> @@ -354,6 +354,42 @@ box_raft_wait_leader_found(void)
> return 0;
> }
>
> +struct raft_wait_persisted_data {
> + struct fiber *waiter;
> + uint64_t term;
> +};
> +
> +static int
> +box_raft_wait_persisted_f(struct trigger *trig, void *event)
> +{
> + struct raft *raft = event;
> + struct raft_wait_persisted_data *data = trig->data;
> + if (raft->term >= data->term)
> + fiber_wakeup(data->waiter);
> + return 0;
> +}
> +
> +int
> +box_raft_wait_persisted(void)
> +{
> + if (box_raft()->term == box_raft()->volatile_term)
2. Since it only waits for the term to be persisted, I would rather
call it 'wait_term_persisted', because there is also the vote, and
you do not look at it.
> + return 0;
> + struct raft_wait_persisted_data data = {
> + .waiter = fiber(),
> + .term = box_raft()->volatile_term,
> + };
> + struct trigger trig;
> + trigger_create(&trig, box_raft_wait_persisted_f, &data, NULL);
> + raft_on_update(box_raft(), &trig);
> + fiber_yield();
3. What about spurious wakeups? I could call fiber.wakeup() from
Lua on this fiber.
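
A common guard against spurious wakeups is to re-check the awaited
condition in a loop after every yield. The sketch below shows the
pattern with hypothetical toy_* stand-ins instead of the real fiber and
raft API: the yield is modelled as a callback that returns true when
the waiter was cancelled, and toy_scripted_yield() exists only to
simulate wakeups.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct raft: only the two term counters. */
struct toy_raft {
	uint64_t term;          /* persisted term */
	uint64_t volatile_term; /* term not yet written to WAL */
};

/* Hypothetical yield hook: returns true if the waiter was cancelled. */
typedef bool (*toy_yield_f)(struct toy_raft *raft, void *arg);

/*
 * Wait until the term that was volatile at call time is persisted.
 * The loop re-checks the condition after each wakeup, so a wakeup
 * that did not come from the persistence trigger just yields again.
 */
static int
toy_wait_term_persisted(struct toy_raft *raft, toy_yield_f yield, void *arg)
{
	uint64_t target = raft->volatile_term;
	while (raft->term < target) {
		if (yield(raft, arg))
			return -1;
	}
	return 0;
}

/* Scripted yield used only to simulate wakeups in this sketch. */
struct toy_script {
	int spurious_left; /* wakeups that happen before persistence */
	int yields;        /* how many times the waiter yielded */
	bool cancel;       /* simulate fiber cancellation */
};

static bool
toy_scripted_yield(struct toy_raft *raft, void *arg)
{
	struct toy_script *s = arg;
	s->yields++;
	if (s->cancel)
		return true;
	if (s->spurious_left > 0) {
		/* Spurious wakeup: nothing was persisted. */
		s->spurious_left--;
		return false;
	}
	/* Real wakeup: the WAL write completed. */
	raft->term = raft->volatile_term;
	return false;
}
```

With two scripted spurious wakeups the waiter yields three times and
returns only once the persisted term has caught up.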
> + trigger_clear(&trig);
> + if (fiber_is_cancelled()) {
> + diag_set(FiberIsCancelled);
> + return -1;
> + }
> + return 0;
> +}
> diff --git a/test/replication/gh-4114-local-space-replication.result b/test/replication/gh-4114-local-space-replication.result
> index 9b63a4b99..e71eb60a8 100644
> --- a/test/replication/gh-4114-local-space-replication.result
> +++ b/test/replication/gh-4114-local-space-replication.result
> @@ -45,9 +45,8 @@ test_run:cmd('switch replica')
> | ---
> | - true
> | ...
> -box.info.vclock[0]
> +a = box.info.vclock[0] or 0
> | ---
> - | - null
> | ...
> box.cfg{checkpoint_count=1}
> | ---
> @@ -77,9 +76,9 @@ box.space.test:insert{3}
> | - [3]
> | ...
>
> -box.info.vclock[0]
> +assert(box.info.vclock[0] == a + 3)
> | ---
> - | - 3
> + | - true
4. Why do you need these changes? I reverted them in this test and it
still passed.