[Tarantool-patches] [PATCH v3 04/12] box: make promote always bump the term

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Sun Jul 4 15:14:33 MSK 2021


Thanks for the patch!

Did you think about making it a no-op when the node is already the leader
(even in manual/off mode)? The current solution is fine except
that it makes the current leader temporarily read-only until it wins
the election again, which looks strange. I would say "unexpected" for
users.

See 4 comments below.

> diff --git a/src/box/box.cc b/src/box/box.cc
> index 6a0950f44..ce37b307d 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1687,16 +1687,19 @@ box_promote(void)
>  			rc = -1;
>  		} else {
>  promote:
> -			/* We cannot possibly get here in a volatile state. */
> -			assert(box_raft()->volatile_term == box_raft()->term);
> -			txn_limbo_write_promote(&txn_limbo, wait_lsn,
> -						box_raft()->term);
> +			if (try_wait) {
> +				raft_new_term(box_raft());
> +				if (box_raft_wait_persisted() < 0)

1. What if, during the term WAL write, another node also starts a promote,
wins the election, and delivers its PROMOTE to us? I suppose that after
the WAL write we will silently write PROMOTE for a term which
was won by somebody else, right? Can it be covered by a test?
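
One possible guard, as a rough sketch only: remember the term we bumped to and compare it with the persisted term after the wait. Everything here is a hypothetical stand-in (`promote_raft`, `fake_wal_write`, `bump_term_and_check`), not the real `struct raft` or the box API. It also assumes that terms start above 0, so 0 can serve as a "somebody else moved the term" marker.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified raft state; not the real struct raft. */
struct promote_raft {
	uint64_t term;          /* persisted term */
	uint64_t volatile_term; /* term not yet written to WAL */
};

/*
 * Simulates the yield during the WAL write. While we wait,
 * `foreign_bumps` newer terms arrive from other nodes.
 */
static void
fake_wal_write(struct promote_raft *r, int foreign_bumps)
{
	r->volatile_term += foreign_bumps;
	r->term = r->volatile_term;
}

/*
 * Bump the term, wait for it to be persisted, then check that
 * the term is still ours. Returns the term that is safe to use
 * for PROMOTE, or 0 if the term changed during the wait.
 */
static uint64_t
bump_term_and_check(struct promote_raft *r, int foreign_bumps)
{
	uint64_t my_term = ++r->volatile_term;
	fake_wal_write(r, foreign_bumps);
	assert(r->term >= my_term);
	if (r->term != my_term)
		return 0; /* Lost the race, do not write PROMOTE. */
	return my_term;
}
```

In real code the "lost the race" branch would presumably set a diag and return -1 instead of writing PROMOTE.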

> +					return -1;
> +			}
> +			uint64_t term = box_raft()->term;
> +			txn_limbo_write_promote(&txn_limbo, wait_lsn, term);
>  			struct synchro_request req = {
>  				.type = IPROTO_PROMOTE,
>  				.replica_id = former_leader_id,
>  				.origin_id = instance_id,
>  				.lsn = wait_lsn,
> -				.term = box_raft()->term,
> +				.term = term,
>  			};
>  			txn_limbo_process(&txn_limbo, &req);
>  			assert(txn_limbo_is_empty(&txn_limbo));
> diff --git a/src/box/raft.c b/src/box/raft.c
> index 7f787c0c5..17caf6f54 100644
> --- a/src/box/raft.c
> +++ b/src/box/raft.c
> @@ -354,6 +354,42 @@ box_raft_wait_leader_found(void)
>  	return 0;
>  }
>  
> +struct raft_wait_persisted_data {
> +	struct fiber *waiter;
> +	uint64_t term;
> +};
> +
> +static int
> +box_raft_wait_persisted_f(struct trigger *trig, void *event)
> +{
> +	struct raft *raft = event;
> +	struct raft_wait_persisted_data *data = trig->data;
> +	if (raft->term >= data->term)
> +		fiber_wakeup(data->waiter);
> +	return 0;
> +}
> +
> +int
> +box_raft_wait_persisted(void)
> +{
> +	if (box_raft()->term == box_raft()->volatile_term)

2. Since it only waits for the term to be persisted, I would rather
call it 'wait_term_persisted'. Because there is also the vote, and
you do not look at it.

> +		return 0;
> +	struct raft_wait_persisted_data data = {
> +		.waiter = fiber(),
> +		.term = box_raft()->volatile_term,
> +	};
> +	struct trigger trig;
> +	trigger_create(&trig, box_raft_wait_persisted_f, &data, NULL);
> +	raft_on_update(box_raft(), &trig);
> +	fiber_yield();

3. What about spurious wakeups? I could call fiber.wakeup() from
Lua on this fiber.
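
A minimal sketch of a wait that tolerates spurious wakeups: yield in a loop and re-check the condition after every wakeup. The types and helpers (`wait_raft`, `fake_sched`, `fake_yield`) are hypothetical stand-ins for the raft state, `fiber_yield()`, and `fiber_is_cancelled()`, so that the loop can be run in isolation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified raft state; not the real struct raft. */
struct wait_raft {
	uint64_t term;          /* persisted term */
	uint64_t volatile_term; /* term not yet written to WAL */
};

/*
 * Hypothetical scheduler stand-in. Each fake_yield() call is one
 * wakeup; only wakeup number `persist_on_wakeup` coincides with
 * the term being persisted, all earlier ones are spurious.
 */
struct fake_sched {
	int wakeups;
	int persist_on_wakeup;
	bool cancelled;
};

static void
fake_yield(struct fake_sched *s, struct wait_raft *r)
{
	if (++s->wakeups == s->persist_on_wakeup)
		r->term = r->volatile_term;
}

/* Returns 0 once the target term is persisted, -1 on cancel. */
static int
wait_term_persisted(struct fake_sched *s, struct wait_raft *r)
{
	assert(r->term <= r->volatile_term);
	uint64_t target = r->volatile_term;
	/* A single yield is not enough: re-check after each wakeup. */
	while (r->term < target) {
		fake_yield(s, r);
		if (s->cancelled)
			return -1;
	}
	return 0;
}
```

With this shape a stray wakeup only costs one extra loop iteration.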

> +	trigger_clear(&trig);
> +	if (fiber_is_cancelled()) {
> +		diag_set(FiberIsCancelled);
> +		return -1;
> +	}
> +	return 0;
> +}
> diff --git a/test/replication/gh-4114-local-space-replication.result b/test/replication/gh-4114-local-space-replication.result
> index 9b63a4b99..e71eb60a8 100644
> --- a/test/replication/gh-4114-local-space-replication.result
> +++ b/test/replication/gh-4114-local-space-replication.result
> @@ -45,9 +45,8 @@ test_run:cmd('switch replica')
>   | ---
>   | - true
>   | ...
> -box.info.vclock[0]
> +a = box.info.vclock[0] or 0
>   | ---
> - | - null
>   | ...
>  box.cfg{checkpoint_count=1}
>   | ---
> @@ -77,9 +76,9 @@ box.space.test:insert{3}
>   | - [3]
>   | ...
>  
> -box.info.vclock[0]
> +assert(box.info.vclock[0] == a + 3)
>   | ---
> - | - 3
> + | - true

4. Why do you need these changes? I reverted this test and it passed.

