[Tarantool-patches] [PATCH 1/1] replication: set replica ID before _cluster commit

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Wed Jul 28 00:53:44 MSK 2021


Hi! Thanks for the review!

>> On 25 Jul 2021, at 19:53, Vladislav Shpilevoy via Tarantool-patches <tarantool-patches at dev.tarantool.org> wrote:
>>
>> Replica registration works via looking for the smallest not
>> occupied ID in _cluster and inserting it into the space.
>>
>> It works not so good when mvcc is enabled. In particular, if more
>> than 1 replica try to register at the same time, they might get
>> the same replica_id because don't see changes of each other until
>> the registration in _cluster is complete.
>>
>> This in the end leads to all replicas failing the registration
>> except one with the 'duplicate key' error (primary index in
>> _cluster is replica ID).
>>
>> The patch makes the replicas occupy their ID before they commit it
>> into _cluster. And new replica ID search now uses the replica ID
>> map instead of _cluster iterator.
>>
>> This way the registration works like before - like MVCC does not
>> exist which is fine.
> 
> Did you discuss the MVCC capabilities - if we can address it there, by 
> setting a dedicated flag for this particular - and perhaps for some 
> other internal - space(s) to suppress MVCC ‘coverage’ for them. The 
> solution will be way more common with supposedly less hassle with 
> local structures, triggers, and so on?

I did think of that in the beginning, but then I decided it is a bad
idea. Firstly, from the side of unnecessary complication of MVCC which
is complex enough already. Secondly, _cluster DQL is going to see the
uncommitted changes despite MVCC turned on. Does not look well. In my
patch I made so that you do not see the tuples in :get() and :select()
in _cluster and you won't stumble into the "dirty" errors because you
are not supposed to make any :insert() nor :replace() into this space
manually. Automatic registration bypasses the busy IDs before trying
to insert them.

>> diff --git a/src/box/alter.cc b/src/box/alter.cc
>> index 89bb5946c..64ba09021 100644
>> --- a/src/box/alter.cc
>> +++ b/src/box/alter.cc
>> @@ -4280,14 +4244,34 @@ on_replace_dd_cluster(struct trigger *trigger, void *event)
>> 					  "updates of instance uuid");
>> 				return -1;
>> 			}
>> -		} else {
>> -			struct trigger *on_commit;
>> -			on_commit = txn_alter_trigger_new(register_replica,
>> -							  new_tuple);
>> -			if (on_commit == NULL)
>> -				return -1;
>> -			txn_stmt_on_commit(stmt, on_commit);
>> +			return 0;
>> +		}
>> +		/*
>> +		 * With read-views enabled there might be already a replica
>> +		 * whose registration is in progress in another transaction.
>> +		 * With the same replica ID.
>> +		 */
>> +		if (replica_by_id(replica_id) != NULL) {
>> +			diag_set(ClientError, ER_UNSUPPORTED, "Tarantool",
>> +				 "more than 1 replica with the same ID");
> 
> There should be more details on replica ID/UUID/name here, same as in unregister
> panic below.

The info would be visible in the logs anyway because before something
is added to _cluster, there are always more logs about 'joining ...',
'subscribed replica ...' and so on.

But I don't mind to extend the logs.

====================
diff --git a/src/box/alter.cc b/src/box/alter.cc
index 64ba09021..390199298 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -4251,9 +4251,14 @@ on_replace_dd_cluster(struct trigger *trigger, void *event)
 		 * whose registration is in progress in another transaction.
 		 * With the same replica ID.
 		 */
-		if (replica_by_id(replica_id) != NULL) {
-			diag_set(ClientError, ER_UNSUPPORTED, "Tarantool",
-				 "more than 1 replica with the same ID");
+		struct replica *replica = replica_by_id(replica_id);
+		if (replica != NULL) {
+			const char *msg = tt_sprintf(
+				"more than 1 replica with the same ID %u: new "
+				"uuid - %s, old uuid - %s", replica_id,
+				tt_uuid_str(&replica_uuid),
+				tt_uuid_str(&replica->uuid));
+			diag_set(ClientError, ER_UNSUPPORTED, "Tarantool", msg);
 			return -1;
 		}
 		struct trigger *on_rollback = txn_alter_trigger_new(
@@ -4265,7 +4270,7 @@ on_replace_dd_cluster(struct trigger *trigger, void *event)
 		 * replica ID now. While WAL write is in progress, new replicas
 		 * might come, they should see the ID is already in use.
 		 */
-		struct replica *replica = replica_by_uuid(&replica_uuid);
+		replica = replica_by_uuid(&replica_uuid);
 		if (replica != NULL)
 			replica_set_id(replica, replica_id);
 		else
@@ -4283,6 +4288,10 @@ on_replace_dd_cluster(struct trigger *trigger, void *event)
 			return -1;
 		if (replica_check_id(replica_id) != 0)
 			return -1;
+		tt_uuid replica_uuid;
+		if (tuple_field_uuid(old_tuple, BOX_CLUSTER_FIELD_UUID,
+				    &replica_uuid) != 0)
+			return -1;
 
 		struct replica *replica = replica_by_id(replica_id);
 		if (replica == NULL) {
@@ -4292,7 +4301,15 @@ on_replace_dd_cluster(struct trigger *trigger, void *event)
 			 * subsystem is affected.
 			 */
 			panic("Tried to unregister a replica not stored in "
-			      "replica_by_id map, id is %u", replica_id);
+			      "replica_by_id map, id is %u, uuid is %s",
+			      replica_id, tt_uuid_str(&replica_uuid));
+		}
+		if (!tt_uuid_is_equal(&replica->uuid, &replica_uuid)) {
+			panic("Tried to unregister a replica with id %u, but "
+			      "its uuid is different from stored internally: "
+			      "in space - %s, internally - %s", replica_id,
+			      tt_uuid_str(&replica_uuid),
+			      tt_uuid_str(&replica->uuid));
 		}
 		/*
 		 * Unregister only after commit. Otherwise if the transaction
====================

>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 8c10a99dd..5c10aceff 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -2407,22 +2407,9 @@ box_on_join(const tt_uuid *instance_uuid)
>> 		return; /* nothing to do - already registered */
>>
>> 	box_check_writable_xc();
>> -
>> -	/** Find the largest existing replica id. */
>> -	struct space *space = space_cache_find_xc(BOX_CLUSTER_ID);
>> -	struct index *index = index_find_system_xc(space, 0);
>> -	struct iterator *it = index_create_iterator_xc(index, ITER_ALL,
>> -						       NULL, 0);
>> -	IteratorGuard iter_guard(it);
>> -	struct tuple *tuple;
>> -	/** Assign a new replica id. */
>> -	uint32_t replica_id = 1;
>> -	while ((tuple = iterator_next_xc(it)) != NULL) {
>> -		if (tuple_field_u32_xc(tuple,
>> -				       BOX_CLUSTER_FIELD_ID) != replica_id)
>> -			break;
>> -		replica_id++;
>> -	}
>> +	uint32_t replica_id;
>> +	if (replica_find_new_id(&replica_id) != 0)
>> +		diag_raise();
> 
> Any info on why register fails?

replica_find_new_id() sets diag with a proper error. This is
why I do diag_raise() here.


More information about the Tarantool-patches mailing list