[Tarantool-patches] [PATCH v17 3/5] qsync: track confirmed_lsn upon CONFIRM packet

Cyrill Gorcunov gorcunov at gmail.com
Wed Sep 22 16:05:33 MSK 2021


While investigating various cluster split-brain scenarios and
trying to gather valid incoming synchro request domains and ranges
we've discovered that limbo::confirmed_lsn is not updated often
enough to cover our needs.

In particular, this variable is always updated by the limbo owner upon
writing a synchro entry (to the journal), while a replica just reads such
a record without updating confirmed_lsn. So when the replica becomes the
cluster leader it sends a promote request back to the former leader where
the packet carries zero LSN instead of the previous confirmed_lsn, and
validation of such a packet won't pass.

Note that packet validation is not yet implemented in this patch, so
this is rather preparatory work for the future.

Part-of #6036

Signed-off-by: Cyrill Gorcunov <gorcunov at gmail.com>
---
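
A minimal sketch of the scenario from the commit message, for reviewers
who want to see the effect in isolation. Everything here is a
hypothetical stand-in (the toy_* names are not part of Tarantool) and it
assumes that the LSN reported in PROMOTE is simply the node's current
confirmed_lsn:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-ins for illustration only;
 * these are not the real Tarantool structures or functions.
 */
struct toy_limbo {
	/* Last LSN this node knows to be confirmed. */
	int64_t confirmed_lsn;
};

struct toy_synchro_request {
	/* LSN carried by the CONFIRM packet. */
	int64_t lsn;
};

/* Replica-side CONFIRM handling without the patch: the LSN is not recorded. */
static void
toy_process_confirm_unpatched(struct toy_limbo *limbo,
			      const struct toy_synchro_request *req)
{
	(void)limbo;
	(void)req;
}

/* Replica-side CONFIRM handling with the patch: remember the request LSN. */
static void
toy_process_confirm_patched(struct toy_limbo *limbo,
			    const struct toy_synchro_request *req)
{
	limbo->confirmed_lsn = req->lsn;
}

/*
 * LSN a replica would report in PROMOTE after reading a single CONFIRM
 * and then becoming the leader (assumed to be its confirmed_lsn).
 */
static int64_t
toy_promote_lsn_after_confirm(int patched, int64_t confirm_lsn)
{
	struct toy_limbo limbo = { .confirmed_lsn = 0 };
	struct toy_synchro_request req = { .lsn = confirm_lsn };

	if (patched)
		toy_process_confirm_patched(&limbo, &req);
	else
		toy_process_confirm_unpatched(&limbo, &req);
	return limbo.confirmed_lsn;
}
```

With these toy definitions, toy_promote_lsn_after_confirm(0, 42) yields 0
(the zero LSN the former leader would see), while
toy_promote_lsn_after_confirm(1, 42) yields 42.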
 src/box/txn_limbo.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index eb9aa7780..959811309 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -774,6 +774,20 @@ txn_limbo_process_run(struct txn_limbo *limbo,
 	switch (req->type) {
 	case IPROTO_RAFT_CONFIRM:
 		txn_limbo_read_confirm(limbo, lsn);
+		/*
+		 * We have to adjust confirmed_lsn according to
+		 * the LSN coming from the request, because we
+		 * will need to report it as the old limbo owner's
+		 * LSN inside PROMOTE requests (if the administrator
+		 * or the election engine makes us the new owner).
+		 *
+		 * We could update confirmed_lsn on every
+		 * txn_limbo_read_confirm call, but this function
+		 * is usually called together with
+		 * txn_limbo_write_confirm, so to avoid a redundant
+		 * update we do it once here, explicitly.
+		 */
+		limbo->confirmed_lsn = req->lsn;
 		break;
 	case IPROTO_RAFT_ROLLBACK:
 		txn_limbo_read_rollback(limbo, lsn);
-- 
2.31.1


