[Tarantool-patches] [PATCH v17 3/5] qsync: track confirmed_lsn upon CONFIRM packet
Cyrill Gorcunov
gorcunov at gmail.com
Wed Sep 22 16:05:33 MSK 2021
While investigating various cluster split-brain scenarios and
trying to gather valid incoming synchro request domains and ranges
we've discovered that limbo::confirmed_lsn is not updated often
enough to cover our needs.
In particular, this variable is always updated by the limbo owner upon
writing a synchro entry (to the journal), while a replica just reads
such a record without updating confirmed_lsn. So when the replica
becomes the cluster leader, it sends a promote request back to the
former leader where the packet carries a zero LSN instead of the
previous confirmed_lsn, and validation of such a packet won't pass.
Note that packet validation is not yet implemented in this patch, so
it is rather preparatory work for the future.
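To illustrate the problem outside of the real code base, here is a
minimal standalone sketch. All names in it (struct toy_limbo and the
toy_* helpers) are hypothetical and are not part of Tarantool; it only
models a single confirmed_lsn field and is not how txn_limbo.c is
actually structured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for struct txn_limbo. */
struct toy_limbo {
	/* Last LSN known to be confirmed; 0 means "never updated". */
	int64_t confirmed_lsn;
};

/*
 * Models the behaviour described above as the problem: a replica
 * processes a CONFIRM record but does not remember its LSN.
 */
static void
toy_process_confirm_old(struct toy_limbo *limbo, int64_t lsn)
{
	assert(limbo != NULL);
	(void)lsn; /* The LSN is applied elsewhere but not tracked. */
}

/* Models the patched behaviour: the LSN is tracked explicitly. */
static void
toy_process_confirm_new(struct toy_limbo *limbo, int64_t lsn)
{
	assert(limbo != NULL);
	limbo->confirmed_lsn = lsn;
}

/* LSN a freshly promoted node would report in its PROMOTE request. */
static int64_t
toy_promote_lsn(const struct toy_limbo *limbo)
{
	assert(limbo != NULL);
	return limbo->confirmed_lsn;
}
```

In this toy model, a replica that saw a CONFIRM through
toy_process_confirm_old() still reports zero from toy_promote_lsn(),
while one that went through toy_process_confirm_new() reports the LSN
of the last CONFIRM it processed.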
Part-of #6036
Signed-off-by: Cyrill Gorcunov <gorcunov at gmail.com>
---
src/box/txn_limbo.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index eb9aa7780..959811309 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -774,6 +774,20 @@ txn_limbo_process_run(struct txn_limbo *limbo,
switch (req->type) {
case IPROTO_RAFT_CONFIRM:
txn_limbo_read_confirm(limbo, lsn);
+ /*
+ * We have to adjust confirmed_lsn according
+ * to the LSN coming from the request, because
+ * we will need to report it as the old limbo
+ * owner's LSN inside PROMOTE requests (if the
+ * administrator or election engine promotes us).
+ *
+ * We could update confirmed_lsn on every
+ * txn_limbo_read_confirm call, but this function
+ * is usually called together with
+ * txn_limbo_write_confirm, thus to eliminate redundant
+ * variable updates we do it once but explicitly.
+ */
+ limbo->confirmed_lsn = req->lsn;
break;
case IPROTO_RAFT_ROLLBACK:
txn_limbo_read_rollback(limbo, lsn);
--
2.31.1