[Tarantool-patches] [PATCH 5/7] replication: send latest effective promote in initial join

Serge Petrenko sergepetrenko at tarantool.org
Fri Jun 18 00:00:18 MSK 2021



16.06.2021 00:00, Vladislav Shpilevoy wrote:
> Thanks for working on this!
>
> Hm. The patch makes me think why don't we send the Raft
> checkpoint on join too?
>
> Otherwise it might happen that a replica joined, didn't
> get the most actual Raft term yet, then the leader died,
> and the replica would start elections with term 1. Even if
> the latest term was 1000.
>
> Nothing critical, Raft will probably work, but it could
> be an "optimization"? Also it would be consistent with the
> limbo state - send all the snapshot data on join.
I tried to implement such a patch, but faced the following problem:

Unfortunately, we don't have information about the replica's version
during join, so I can only decide whether to send the Raft state based
on term > 1.

Also, while writing a commit message I understood that this patch
doesn't help much. Even if a node joins, but doesn't subscribe to the
leader, it will still subscribe to someone else and receive the latest
Raft state.
After all, our doc states full-mesh is required for Raft to work, so we'll
have someone else to subscribe to and receive Raft state from for sure.

The patch is small and simple, but I don't think it's needed.

I've made a tiny amendment to this commit, though. Please find the diff
below.


> Btw, the limbo state contains a term. And it means
> that after join, but before subscribe, the limbo's term
> is bigger than raft's term. Even though in the comments
> of the limbo we say:
>
> 	* It means the limbo's term might be smaller than the raft term, while
> 	* there are ongoing elections, or the leader is already known and this
> 	* instance hasn't read its PROMOTE request yet. During other times the
> 	* limbo and raft are in sync and the terms are the same.
>
> which means the limbo term should be always <= raft term.
> Can this break something? Is it possible to make a test confirming
> that we can't send the limbo state before the raft state?

I don't think this could break anything.
Limbo and Raft terms are not that interconnected now.
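
For illustration only, the window described above (limbo term ahead of
the Raft term between join and subscribe) can be modelled with a toy
sketch. All names here (sketch_node, sketch_apply_join,
sketch_apply_subscribe, sketch_terms_ordered) are hypothetical and are
not Tarantool code; the only assumption is that join delivers the limbo
term and subscribe later delivers the Raft term.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two terms a node tracks; not Tarantool code. */
struct sketch_node {
	uint64_t raft_term;
	uint64_t limbo_term;
};

/* Assumed: initial join delivers the limbo state, including its term. */
static void
sketch_apply_join(struct sketch_node *node, uint64_t limbo_term)
{
	assert(node != NULL);
	node->limbo_term = limbo_term;
}

/* Assumed: subscribe delivers the Raft state; terms only ever grow. */
static void
sketch_apply_subscribe(struct sketch_node *node, uint64_t raft_term)
{
	assert(node != NULL);
	if (raft_term > node->raft_term)
		node->raft_term = raft_term;
}

/* The ordering the limbo comment implies: limbo term <= raft term. */
static bool
sketch_terms_ordered(const struct sketch_node *node)
{
	assert(node != NULL);
	return node->limbo_term <= node->raft_term;
}
```

In this model a fresh node with raft_term 1 that joins a cluster at
term 1000 fails sketch_terms_ordered() until sketch_apply_subscribe()
runs, which is exactly the interval the question is about.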

================================

diff --git a/src/box/relay.cc b/src/box/relay.cc
index 289dea0f3..e05b53d5d 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -408,12 +408,17 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock)
         row.sync = sync;
         coio_write_xrow(&relay->io, &row);

-       /* Send out the latest limbo state. */
-       char body[XROW_SYNCHRO_BODY_LEN_MAX];
-       xrow_encode_synchro(&row, body, &req);
-       row.replica_id = req.replica_id;
-       row.sync = sync;
-       coio_write_xrow(&relay->io, &row);
+       /*
+        * Send out the latest limbo state. Don't do that when limbo is unused,
+        * let the old instances join without trouble.
+        */
+       if (req.replica_id != REPLICA_ID_NIL) {
+               char body[XROW_SYNCHRO_BODY_LEN_MAX];
+               xrow_encode_synchro(&row, body, &req);
+               row.replica_id = req.replica_id;
+               row.sync = sync;
+               coio_write_xrow(&relay->io, &row);
+       }

         /* Send read view to the replica. */
         engine_join_xc(&ctx, &relay->stream);


================================
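
The condition added in the diff can be read in isolation as a small
predicate. The sketch below is a standalone illustration, not the real
code: sketch_synchro_request and SKETCH_REPLICA_ID_NIL are hypothetical
stand-ins for the request structure and REPLICA_ID_NIL used in relay.cc,
and the value 0 for the nil id is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for REPLICA_ID_NIL; the value is an assumption. */
enum { SKETCH_REPLICA_ID_NIL = 0 };

/* Hypothetical, reduced stand-in for the limbo state request. */
struct sketch_synchro_request {
	/* Limbo owner id; nil when the limbo has never been used. */
	uint32_t replica_id;
	/* Term carried with the limbo state. */
	uint64_t term;
};

/*
 * Mirrors the check from the diff: the limbo state row is sent only
 * when the limbo has an owner, so that nothing extra is written to
 * the joining instance while the limbo is unused.
 */
static bool
sketch_should_send_limbo_state(const struct sketch_synchro_request *req)
{
	assert(req != NULL);
	return req->replica_id != SKETCH_REPLICA_ID_NIL;
}
```

With an unused limbo (replica_id equal to the nil id) the predicate is
false and the join stream looks the same as before this patch, which is
what lets old instances join without trouble.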

-- 
Serge Petrenko


