[Tarantool-patches] [PATCH v4 07/12] raft: filter rows based on known peer terms

Serge Petrenko sergepetrenko at tarantool.org
Wed Apr 21 08:58:57 MSK 2021



21.04.2021 01:30, Vladislav Shpilevoy пишет:
> Thanks for the patch!
>
>>> A couple of fixes on top:
>>> Only apply  PROMOTE when it's for a greater term than already received.
>>> If promote tries to confirm entries for instance id other than limbo->owner_id
>>> rollback everything that's unconfirmed.
>> Sorry, discard this. election_qsync_stress hangs again with this change.
>>
>> I haven't pushed it yet anyway.
> As we noticed in a private discussion, it hung on top of this commit, but
> does not hang on top of the branch. Because the last commit contains a
> fix for the test. So the initial change was good. I returned it in this
> commit (pushed on my branch sp/gh-5445-election-fixes-review):

Yes, indeed. Thanks for noticing!
Squashed your commit.

>
> ====================
>      [tosquash] Foreign promote should rollback local limbo
>      
>      See the comments why.
>
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index 31f4c1b1e..c22bd6665 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -658,12 +658,13 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
>   			return;
>   		}
>   	}
> +	int64_t lsn = req->lsn;
>   	if (req->replica_id == REPLICA_ID_NIL) {
>   		/*
>   		 * The limbo was empty on the instance issuing the request.
>   		 * This means this instance must empty its limbo as well.
>   		 */
> -		assert(req->lsn == 0 && req->type == IPROTO_PROMOTE);
> +		assert(lsn == 0 && req->type == IPROTO_PROMOTE);
>   	} else if (req->replica_id != limbo->owner_id) {
>   		/*
>   		 * Ignore CONFIRM/ROLLBACK messages for a foreign master.
> @@ -671,17 +672,25 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
>   		 * data from an old leader, who has just started and written
>   		 * confirm right on synchronous transaction recovery.
>   		 */
> -		return;
> +		if (req->type != IPROTO_PROMOTE)
> +			return;
> +		/*
> +		 * Promote has a bigger term, and tries to steal the limbo. It
> +		 * means it probably was elected with a quorum, and it makes no
> +		 * sense to wait here for confirmations. The other nodes already
> +		 * elected a new leader. Rollback all the local txns.
> +		 */
> +		lsn = 0;
>   	}
>   	switch (req->type) {
>   	case IPROTO_CONFIRM:
> -		txn_limbo_read_confirm(limbo, req->lsn);
> +		txn_limbo_read_confirm(limbo, lsn);
>   		break;
>   	case IPROTO_ROLLBACK:
> -		txn_limbo_read_rollback(limbo, req->lsn);
> +		txn_limbo_read_rollback(limbo, lsn);
>   		break;
>   	case IPROTO_PROMOTE:
> -		txn_limbo_read_promote(limbo, req->origin_id, req->lsn);
> +		txn_limbo_read_promote(limbo, req->origin_id, lsn);
>   		break;
>   	default:
>   		unreachable();
>

-- 
Serge Petrenko



More information about the Tarantool-patches mailing list