[Tarantool-patches] [PATCH v4 07/12] raft: filter rows based on known peer terms
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Wed Apr 21 01:30:44 MSK 2021
Thanks for the patch!
>> A couple of fixes on top:
>> Only apply PROMOTE when it's for a greater term than already received.
>> If promote tries to confirm entries for instance id other than limbo->owner_id
>> rollback everything that's unconfirmed.
>
> Sorry, discard this. election_qsync_stress hangs again with this change.
>
> I haven't pushed it yet anyway.
As we noticed in a private discussion, it hung on top of this commit, but
does not hang on top of the branch. Because the last commit contains a
fix for the test. So the initial change was good. I returned it in this
commit (pushed on my branch sp/gh-5445-election-fixes-review):
====================
[tosquash] Foreign promote should rollback local limbo
See the comments why.
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 31f4c1b1e..c22bd6665 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -658,12 +658,13 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
return;
}
}
+ int64_t lsn = req->lsn;
if (req->replica_id == REPLICA_ID_NIL) {
/*
* The limbo was empty on the instance issuing the request.
* This means this instance must empty its limbo as well.
*/
- assert(req->lsn == 0 && req->type == IPROTO_PROMOTE);
+ assert(lsn == 0 && req->type == IPROTO_PROMOTE);
} else if (req->replica_id != limbo->owner_id) {
/*
* Ignore CONFIRM/ROLLBACK messages for a foreign master.
@@ -671,17 +672,25 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
* data from an old leader, who has just started and written
* confirm right on synchronous transaction recovery.
*/
- return;
+ if (req->type != IPROTO_PROMOTE)
+ return;
+ /*
+ * Promote has a bigger term, and tries to steal the limbo. It
+ * means it probably was elected with a quorum, and it makes no
+ * sense to wait here for confirmations. The other nodes already
+ * elected a new leader. Rollback all the local txns.
+ */
+ lsn = 0;
}
switch (req->type) {
case IPROTO_CONFIRM:
- txn_limbo_read_confirm(limbo, req->lsn);
+ txn_limbo_read_confirm(limbo, lsn);
break;
case IPROTO_ROLLBACK:
- txn_limbo_read_rollback(limbo, req->lsn);
+ txn_limbo_read_rollback(limbo, lsn);
break;
case IPROTO_PROMOTE:
- txn_limbo_read_promote(limbo, req->origin_id, req->lsn);
+ txn_limbo_read_promote(limbo, req->origin_id, lsn);
break;
default:
unreachable();
More information about the Tarantool-patches
mailing list