[Tarantool-patches] [PATCH v4 07/12] raft: filter rows based on known peer terms

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Wed Apr 21 01:30:44 MSK 2021


Thanks for the patch!

>> A couple of fixes on top:
>> Only apply  PROMOTE when it's for a greater term than already received.
>> If promote tries to confirm entries for instance id other than limbo->owner_id
>> rollback everything that's unconfirmed.
> 
> Sorry, discard this. election_qsync_stress hangs again with this change.
> 
> I haven't pushed it yet anyway.

As we noticed in a private discussion, it hung on top of this commit, but
does not hang on top of the branch. Because the last commit contains a
fix for the test. So the initial change was good. I returned it in this
commit (pushed on my branch sp/gh-5445-election-fixes-review):

====================
    [tosquash] Foreign promote should rollback local limbo
    
    See the comments why.

diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 31f4c1b1e..c22bd6665 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -658,12 +658,13 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 			return;
 		}
 	}
+	int64_t lsn = req->lsn;
 	if (req->replica_id == REPLICA_ID_NIL) {
 		/*
 		 * The limbo was empty on the instance issuing the request.
 		 * This means this instance must empty its limbo as well.
 		 */
-		assert(req->lsn == 0 && req->type == IPROTO_PROMOTE);
+		assert(lsn == 0 && req->type == IPROTO_PROMOTE);
 	} else if (req->replica_id != limbo->owner_id) {
 		/*
 		 * Ignore CONFIRM/ROLLBACK messages for a foreign master.
@@ -671,17 +672,25 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 		 * data from an old leader, who has just started and written
 		 * confirm right on synchronous transaction recovery.
 		 */
-		return;
+		if (req->type != IPROTO_PROMOTE)
+			return;
+		/*
+		 * Promote has a bigger term, and tries to steal the limbo. It
+		 * means it probably was elected with a quorum, and it makes no
+		 * sense to wait here for confirmations. The other nodes already
+		 * elected a new leader. Rollback all the local txns.
+		 */
+		lsn = 0;
 	}
 	switch (req->type) {
 	case IPROTO_CONFIRM:
-		txn_limbo_read_confirm(limbo, req->lsn);
+		txn_limbo_read_confirm(limbo, lsn);
 		break;
 	case IPROTO_ROLLBACK:
-		txn_limbo_read_rollback(limbo, req->lsn);
+		txn_limbo_read_rollback(limbo, lsn);
 		break;
 	case IPROTO_PROMOTE:
-		txn_limbo_read_promote(limbo, req->origin_id, req->lsn);
+		txn_limbo_read_promote(limbo, req->origin_id, lsn);
 		break;
 	default:
 		unreachable();



More information about the Tarantool-patches mailing list