[Tarantool-patches] [PATCH 5/7] replication: send latest effective promote in initial join
Serge Petrenko
sergepetrenko at tarantool.org
Fri Jun 18 00:00:18 MSK 2021
16.06.2021 00:00, Vladislav Shpilevoy wrote:
> Thanks for working on this!
>
> Hm. The patch makes me wonder why we don't send the Raft
> checkpoint on join too.
>
> Otherwise it might happen that a replica joins, doesn't
> get the most recent Raft term yet, then the leader dies,
> and the replica starts elections with term 1, even if
> the latest term was 1000.
>
> Nothing critical, Raft will probably work, but it could
> be an "optimization"? It would also be consistent with the
> limbo state: send all the snapshot data on join.
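A toy model of the scenario above. It assumes plain Raft term rules, not Tarantool's actual raft code, and every name in it is hypothetical: a candidate bumps its term to start an election, a voter that already knows a newer term rejects it, and the candidate then adopts that newer term and retries. Log freshness and "already voted" checks are left out. Under these assumptions a replica that only knows term 1 spends one extra election round before it reaches a term the cluster accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy node; not Tarantool's struct raft. */
struct toy_node {
	uint64_t term; /* Latest term this node knows about. */
};

/*
 * Voter side of a vote request, reduced to the term check only:
 * a voter that already knows a newer term rejects the candidate.
 */
static bool
toy_voter_accepts(const struct toy_node *voter, uint64_t candidate_term)
{
	return candidate_term >= voter->term;
}

/*
 * Candidate side: bump the term and ask the voter. On rejection adopt
 * the voter's newer term (it comes back in the reply) and try again.
 * Returns how many election rounds it took to reach an accepted term.
 */
static int
toy_election_rounds(struct toy_node *candidate, const struct toy_node *voter)
{
	int rounds = 0;
	for (;;) {
		candidate->term++;
		rounds++;
		if (toy_voter_accepts(voter, candidate->term))
			return rounds;
		candidate->term = voter->term;
	}
}
```

With a voter at term 1000, a candidate starting from term 1 needs two rounds and ends at term 1001, while a candidate that already knew term 1000 gets there in one round. That single saved round is the whole "optimization".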
I tried to implement such a patch, but ran into the following problem:
we don't have any information about the replica's version during join,
so the only criterion I could use is to send the Raft state only when
term > 1.
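A sketch of that criterion, under stated assumptions: the initial term of 1 is taken from the "start elections with term 1" remark above, and the names are hypothetical, not Tarantool's. The check only looks at local state, so it can tell whether Raft has ever been used in this cluster, but not whether the joining replica is able to parse a Raft row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the term every instance has before its first election. */
#define TOY_RAFT_INITIAL_TERM 1

/*
 * Hypothetical join-time check. Without the replica's version the relay
 * can only ask "has Raft ever moved past the initial term here?". If not,
 * sending nothing keeps old replicas safe; if yes, the row is sent
 * regardless of what the replica supports.
 */
static bool
toy_send_raft_on_join(uint64_t local_raft_term)
{
	return local_raft_term > TOY_RAFT_INITIAL_TERM;
}
```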
Also, while writing the commit message I realized that this patch
doesn't help much. Even if a node joins but doesn't subscribe to the
leader, it will still subscribe to some other node and receive the latest
Raft state from it.
After all, our docs state that a full mesh is required for Raft to work, so
there will surely be someone else to subscribe to and receive the Raft state from.
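The full-mesh argument can be modelled in a few lines. This is a hypothetical sketch, not relay or applier code: in a full mesh the new node subscribes to every peer, so the term it ends up with is the highest one among the peers that are still alive, and a dead leader doesn't matter as long as one alive peer has seen its term.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one peer from the joining node's side. */
struct toy_peer {
	bool alive;         /* Can the new node subscribe to it? */
	uint64_t raft_term; /* Term this peer would send on subscribe. */
};

/*
 * Full mesh: subscribe to every peer and keep the highest term seen.
 * Dead peers contribute nothing; with no alive peers the node keeps
 * its local term.
 */
static uint64_t
toy_term_after_subscribe(uint64_t local_term, const struct toy_peer *peers,
			 size_t count)
{
	uint64_t term = local_term;
	for (size_t i = 0; i < count; i++) {
		if (peers[i].alive && peers[i].raft_term > term)
			term = peers[i].raft_term;
	}
	return term;
}
```

For example, with the leader at term 1000 dead and another peer alive at term 1000, a node that joined with term 1 still ends up at 1000. It stays at term 1 only if no peer at all is reachable.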
The patch is small and simple, but I don't think it's needed.
I've made a tiny amendment to this commit though, please find the diff
below.
> Btw, the limbo state contains a term. And it means
> that after join, but before subscribe, the limbo's term
> is bigger than raft's term. Even though in the comments
> of the limbo we say:
>
> * It means the limbo's term might be smaller than the raft term, while
> * there are ongoing elections, or the leader is already known and this
> * instance hasn't read its PROMOTE request yet. During other times the
> * limbo and raft are in sync and the terms are the same.
>
> which means the limbo term should always be <= raft term.
> Can this break something? Is it possible to make a test confirming
> that we can't send the limbo state before the raft state?
I don't think this could break anything.
Limbo and Raft terms are not that interconnected now.
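The window the question is about can be shown with a toy replica state. All names here are hypothetical, and the starting limbo term of 0 is an assumption of the sketch. Join delivers only the limbo state, that is, the latest PROMOTE, so the limbo term runs ahead of the Raft term until subscribe brings the Raft state in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy state of a replica between join and subscribe. */
struct toy_replica {
	uint64_t raft_term;
	uint64_t limbo_term;
};

/* The invariant from the limbo comment: limbo term <= raft term. */
static bool
toy_limbo_not_ahead(const struct toy_replica *r)
{
	return r->limbo_term <= r->raft_term;
}

/* Join: only the latest limbo state (PROMOTE term) arrives. */
static void
toy_apply_join(struct toy_replica *r, uint64_t promote_term)
{
	if (promote_term > r->limbo_term)
		r->limbo_term = promote_term;
}

/* Subscribe: the Raft state arrives from a peer. */
static void
toy_apply_subscribe(struct toy_replica *r, uint64_t peer_raft_term)
{
	if (peer_raft_term > r->raft_term)
		r->raft_term = peer_raft_term;
}
```

Starting from {raft 1, limbo 0}, joining with a PROMOTE of term 5 breaks the invariant (5 > 1), and subscribing to a peer at term 5 restores it. Whether anything reads both terms inside that window is exactly what a test would have to check.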
================================
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 289dea0f3..e05b53d5d 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -408,12 +408,17 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock)
 	row.sync = sync;
 	coio_write_xrow(&relay->io, &row);
 
-	/* Send out the latest limbo state. */
-	char body[XROW_SYNCHRO_BODY_LEN_MAX];
-	xrow_encode_synchro(&row, body, &req);
-	row.replica_id = req.replica_id;
-	row.sync = sync;
-	coio_write_xrow(&relay->io, &row);
+	/*
+	 * Send out the latest limbo state. Don't do that when limbo is unused,
+	 * let the old instances join without trouble.
+	 */
+	if (req.replica_id != REPLICA_ID_NIL) {
+		char body[XROW_SYNCHRO_BODY_LEN_MAX];
+		xrow_encode_synchro(&row, body, &req);
+		row.replica_id = req.replica_id;
+		row.sync = sync;
+		coio_write_xrow(&relay->io, &row);
+	}
 
 	/* Send read view to the replica. */
 	engine_join_xc(&ctx, &relay->stream);
================================
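A standalone model of what the amended hunk changes in the join stream. It is a sketch with hypothetical names: the synchro row is written only when the limbo has an owner, so an old replica joining a cluster that never used the limbo sees exactly the rows it saw before. The value 0 for the NIL id and the label for the preceding row are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for REPLICA_ID_NIL; the value 0 is an assumption here. */
#define TOY_REPLICA_ID_NIL 0

/* Hypothetical stand-in for the synchro request filled from the limbo. */
struct toy_synchro_req {
	uint32_t replica_id; /* Limbo owner, NIL if the limbo is unused. */
	uint64_t term;
};

enum toy_row {
	TOY_ROW_PREVIOUS,  /* Row the hunk writes right before the limbo state. */
	TOY_ROW_SYNCHRO,   /* Latest limbo state. */
	TOY_ROW_READ_VIEW, /* Stands for the engine_join_xc() stream. */
};

/*
 * Records the order of rows relay_initial_join() would write after the
 * patch. Returns the number of entries stored in out.
 */
static size_t
toy_initial_join(const struct toy_synchro_req *req, enum toy_row out[3])
{
	size_t n = 0;
	out[n++] = TOY_ROW_PREVIOUS;
	if (req->replica_id != TOY_REPLICA_ID_NIL)
		out[n++] = TOY_ROW_SYNCHRO;
	out[n++] = TOY_ROW_READ_VIEW;
	return n;
}
```

An unowned limbo yields two rows, with the read view right after the preceding row; an owned limbo inserts the synchro row between them, as in the amended hunk.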
--
Serge Petrenko