[Tarantool-patches] [PATCH v4 07/12] raft: filter rows based on known peer terms
Serge Petrenko
sergepetrenko at tarantool.org
Wed Apr 21 08:58:57 MSK 2021
21.04.2021 01:30, Vladislav Shpilevoy пишет:
> Thanks for the patch!
>
>>> A couple of fixes on top:
>>> Only apply PROMOTE when it's for a greater term than already received.
>>> If promote tries to confirm entries for instance id other than limbo->owner_id
>>> rollback everything that's unconfirmed.
>> Sorry, discard this. election_qsync_stress hangs again with this change.
>>
>> I haven't pushed it yet anyway.
> As we noticed in a private discussion, it hung on top of this commit, but
> does not hang on top of the branch. Because the last commit contains a
> fix for the test. So the initial change was good. I returned it in this
> commit (pushed on my branch sp/gh-5445-election-fixes-review):
Yes, indeed. Thanks for noticing!
Squashed your commit.
>
> ====================
> [tosquash] Foreign promote should rollback local limbo
>
> See the comments why.
>
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index 31f4c1b1e..c22bd6665 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -658,12 +658,13 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
> return;
> }
> }
> + int64_t lsn = req->lsn;
> if (req->replica_id == REPLICA_ID_NIL) {
> /*
> * The limbo was empty on the instance issuing the request.
> * This means this instance must empty its limbo as well.
> */
> - assert(req->lsn == 0 && req->type == IPROTO_PROMOTE);
> + assert(lsn == 0 && req->type == IPROTO_PROMOTE);
> } else if (req->replica_id != limbo->owner_id) {
> /*
> * Ignore CONFIRM/ROLLBACK messages for a foreign master.
> @@ -671,17 +672,25 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
> * data from an old leader, who has just started and written
> * confirm right on synchronous transaction recovery.
> */
> - return;
> + if (req->type != IPROTO_PROMOTE)
> + return;
> + /*
> + * Promote has a bigger term, and tries to steal the limbo. It
> + * means it probably was elected with a quorum, and it makes no
> + * sense to wait here for confirmations. The other nodes already
> + * elected a new leader. Rollback all the local txns.
> + */
> + lsn = 0;
> }
> switch (req->type) {
> case IPROTO_CONFIRM:
> - txn_limbo_read_confirm(limbo, req->lsn);
> + txn_limbo_read_confirm(limbo, lsn);
> break;
> case IPROTO_ROLLBACK:
> - txn_limbo_read_rollback(limbo, req->lsn);
> + txn_limbo_read_rollback(limbo, lsn);
> break;
> case IPROTO_PROMOTE:
> - txn_limbo_read_promote(limbo, req->origin_id, req->lsn);
> + txn_limbo_read_promote(limbo, req->origin_id, lsn);
> break;
> default:
> unreachable();
>
--
Serge Petrenko
More information about the Tarantool-patches
mailing list