[Tarantool-patches] [PATCH v4 07/16] replication: send current Raft term in join response

Serge Petrenko sergepetrenko at tarantool.org
Wed Jul 14 21:25:35 MSK 2021


Make Raft nodes send out their latest persisted term to joining
replicas.

This is needed to avoid the situation where the txn_limbo-managed
'promote greatest term' is greater than the current Raft term. Otherwise
the following may happen: a replica joins off some instance and receives
its latest limbo state. The state includes the "greatest term seen" and
makes the limbo filter out any data coming from instances with smaller
terms. Imagine that the master this replica has joined from dies before
the replica has a chance to subscribe to it. Then the replica never
receives the master's current Raft term and starts elections at the
smallest term possible, 2 (when there are no suitable Raft nodes besides
the replica).

Once elections with such a small term number are won, a number of
problems arise, starting with filtering out PROMOTE requests for the
"old" term and nop-ifying any data coming from terms smaller than the
"greatest term seen".

Prerequisite #6034
---
 src/box/applier.cc | 5 +++++
 src/box/relay.cc   | 6 ++++++
 2 files changed, 11 insertions(+)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 4088fcc21..92ec088ea 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -459,6 +459,11 @@ applier_wait_snapshot(struct applier *applier)
 				if (xrow_decode_synchro(&row, &req) != 0)
 					diag_raise();
 				txn_limbo_process(&txn_limbo, &req);
+			} else if (iproto_type_is_raft_request(row.type)) {
+				struct raft_request req;
+				if (xrow_decode_raft(&row, &req, NULL) != 0)
+					diag_raise();
+				box_raft_recover(&req);
 			} else if (row.type != IPROTO_JOIN_SNAPSHOT) {
 				tnt_raise(ClientError, ER_UNKNOWN_REQUEST_TYPE,
 					  (uint32_t)row.type);
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 4b102a777..70f1a045b 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -428,7 +428,9 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
 		diag_raise();
 
 	struct synchro_request req;
+	struct raft_request raft_req;
 	txn_limbo_checkpoint(&txn_limbo, &req);
+	box_raft_checkpoint_local(&raft_req);
 
 	/* Respond to the JOIN request with the current vclock. */
 	struct xrow_header row;
@@ -451,6 +453,10 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
 		row.sync = sync;
 		coio_write_xrow(&relay->io, &row);
 
+		xrow_encode_raft(&row, &fiber()->gc, &raft_req);
+		row.sync = sync;
+		coio_write_xrow(&relay->io, &row);
+
 		/* Mark the end of the metadata stream. */
 		row.type = IPROTO_JOIN_SNAPSHOT;
 		coio_write_xrow(&relay->io, &row);
-- 
2.30.1 (Apple Git-130)


