[Tarantool-patches] [PATCH v2 05/11] [wip] box: do not register outgoing connections

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Thu Sep 10 02:16:56 MSK 2020


The first stage of the replication protocol for a non-anonymous
replica is registration in _cluster, which gives the replica a
unique ID number.

That happens when the replica connects to a writable node, which
performs the registration. In other words, registration always
happens on the master node, when it receives an incoming request
for it, that is, when a relay is created.

That wasn't the case for bootstrap. If box.cfg.replication wasn't
empty on the master node doing the cluster bootstrap, the master
registered all its outgoing connections in _cluster. Note that the
target node could even be anonymous and was still registered.
Also, the remote replicas were registered even before their own
bootstrap.

That breaks the protocol and sometimes leads to registration of
anon replicas. The patch drops this registration.

The main motivation here, though, is the specifics of Raft cluster
bootstrap. During Raft bootstrap it is going to be very important
that non-bootstrapped nodes are not registered in _cluster.
Registering them would break the leader election during bootstrap.

Closes #5287
---
The patch fixes 5287, but now the same test leads to a crash, because the code
does not handle the case when a not anon replica becomes anon.

That happens when a master connects to a replica before the replica is
bootstrapped, the replica allows it, and then, once the replica is bootstrapped,
it sends SUBSCRIBE right away. The master then crashes in the first line of
relay_subscribe(), because the replica was connected as not anon
(replica->anon == false), but it does not have an ID
(replica->id == REPLICA_ID_NIL).
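
To make the failing state concrete, here is a minimal standalone sketch of the
condition described above. It is not a proposed fix and not the actual
relay_subscribe() code: struct sketch_replica, SKETCH_REPLICA_ID_NIL (assumed
to be 0) and both functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for REPLICA_ID_NIL; the value 0 is an assumption. */
enum { SKETCH_REPLICA_ID_NIL = 0 };

/* Hypothetical, reduced model of the two replica fields named above. */
struct sketch_replica {
	uint32_t id;
	bool anon;
};

/*
 * True when the replica is in the state that triggers the crash:
 * connected as not anon, but never registered, so it has no ID.
 */
static bool
sketch_replica_is_unregistered(const struct sketch_replica *replica)
{
	return !replica->anon && replica->id == SKETCH_REPLICA_ID_NIL;
}

/*
 * Hypothetical guard: report an error (-1) instead of crashing when
 * SUBSCRIBE arrives from a not anon replica without an ID.
 */
static int
sketch_check_subscribe(const struct sketch_replica *replica)
{
	assert(replica != NULL);
	if (sketch_replica_is_unregistered(replica))
		return -1;
	return 0;
}
```

Whether such a SUBSCRIBE should be rejected, or the replica registered or
treated as anon at that point, is exactly the open question.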

I am not sure how to fix it yet. I decided to think more about it and see what
reviewers think. In its current state the fix is enough to unblock Raft, so it
is not urgent.

 src/box/box.cc | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index eeb00d5e2..3214ec340 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -2217,15 +2217,6 @@ bootstrap_master(const struct tt_uuid *replicaset_uuid)
 	box_register_replica(replica_id, &INSTANCE_UUID);
 	assert(replica_by_uuid(&INSTANCE_UUID)->id == 1);
 
-	/* Register other cluster members */
-	replicaset_foreach(replica) {
-		if (tt_uuid_is_equal(&replica->uuid, &INSTANCE_UUID))
-			continue;
-		assert(replica->applier != NULL);
-		box_register_replica(++replica_id, &replica->uuid);
-		assert(replica->id == replica_id);
-	}
-
 	/* Set UUID of a new replica set */
 	box_set_replicaset_uuid(replicaset_uuid);
 
-- 
2.21.1 (Apple Git-122.3)


