Tarantool development patches archive
 help / color / mirror / Atom feed
From: Sergey Ostanevich via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: "Vladislav Shpilevoy" <v.shpilevoy@tarantool.org>
Cc: tarantool-patches <tarantool-patches@dev.tarantool.org>
Subject: Re: [Tarantool-patches]  [PATCH 1/1] replication: set replica ID before _cluster commit
Date: Thu, 29 Jul 2021 01:20:39 +0300	[thread overview]
Message-ID: <1627510838.498581177@f478.i.mail.ru> (raw)
In-Reply-To: <959e1a8f-f5f8-85db-e0ed-22d249ed496c@tarantool.org>

[-- Attachment #1: Type: text/plain, Size: 6914 bytes --]


Thanks for updates! LGTM.

Best regards,
Sergos 

Wednesday, 28 July 2021, 00:53 +0300 from Shpilevoy Vladislav  <v.shpilevoy@tarantool.org>:
>Hi! Thanks for the review!
>
>>> On 25 Jul 2021, at 19:53, Vladislav Shpilevoy via Tarantool-patches < tarantool-patches@dev.tarantool.org > wrote:
>>>
>>> Replica registration works via looking for the smallest not
>>> occupied ID in _cluster and inserting it into the space.
>>>
>>> It works not so good when mvcc is enabled. In particular, if more
>>> than 1 replica try to register at the same time, they might get
>>> the same replica_id because don't see changes of each other until
>>> the registration in _cluster is complete.
>>>
>>> This in the end leads to all replicas failing the registration
>>> except one with the 'duplicate key' error (primary index in
>>> _cluster is replica ID).
>>>
>>> The patch makes the replicas occupy their ID before they commit it
>>> into _cluster. And new replica ID search now uses the replica ID
>>> map instead of _cluster iterator.
>>>
>>> This way the registration works like before - like MVCC does not
>>> exist which is fine.
>> 
>> Did you discuss the MVCC capabilities - if we can address it there, by 
>> setting a dedicated flag for this particular - and perhaps for some 
>> other internal - space(s) to suppress MVCC ‘coverage’ for them. The 
>> solution will be way more common with supposedly less hassle with 
>> local structures, triggers, and so on?
>
>I did think of that in the beginning, but then I decided it is a bad
>idea. Firstly, from the side of unnecessary complication of MVCC which
>is complex enough already. Secondly, _cluster DQL is going to see the
>uncommitted changes despite MVCC turned on. Does not look well. In my
>patch I made so that you do not see the tuples in :get() and :select()
>in _cluster and you won't stumble into the "dirty" errors because you
>are not supposed to make any :insert() nor :replace() into this space
>manually. Automatic registration bypasses the busy IDs before trying
>to insert them.
>
>>> diff --git a/src/box/alter.cc b/src/box/alter.cc
>>> index 89bb5946c..64ba09021 100644
>>> --- a/src/box/alter.cc
>>> +++ b/src/box/alter.cc
>>> @@ -4280,14 +4244,34 @@ on_replace_dd_cluster(struct trigger *trigger, void *event)
>>> 					  "updates of instance uuid");
>>> 				return -1;
>>> 			}
>>> -		} else {
>>> -			struct trigger *on_commit;
>>> -			on_commit = txn_alter_trigger_new(register_replica,
>>> -							  new_tuple);
>>> -			if (on_commit == NULL)
>>> -				return -1;
>>> -			txn_stmt_on_commit(stmt, on_commit);
>>> +			return 0;
>>> +		}
>>> +		/*
>>> +		 * With read-views enabled there might be already a replica
>>> +		 * whose registration is in progress in another transaction.
>>> +		 * With the same replica ID.
>>> +		 */
>>> +		if (replica_by_id(replica_id) != NULL) {
>>> +			diag_set(ClientError, ER_UNSUPPORTED, "Tarantool",
>>> +				 "more than 1 replica with the same ID");
>> 
>> There should be more details on replica ID/UUID/name here, same as in unregister
>> panic below.
>
>The info would be visible in the logs anyway because before something
>is added to _cluster, there are always more logs about 'joining ...',
>'subscribed replica ...' and so on.
>
>But I don't mind to extend the logs.
>
>====================
>diff --git a/src/box/alter.cc b/src/box/alter.cc
>index 64ba09021..390199298 100644
>--- a/src/box/alter.cc
>+++ b/src/box/alter.cc
>@@ -4251,9 +4251,14 @@ on_replace_dd_cluster(struct trigger *trigger, void *event)
> 		 * whose registration is in progress in another transaction.
> 		 * With the same replica ID.
> 		 */
>-		if (replica_by_id(replica_id) != NULL) {
>-			diag_set(ClientError, ER_UNSUPPORTED, "Tarantool",
>-				 "more than 1 replica with the same ID");
>+		struct replica *replica = replica_by_id(replica_id);
>+		if (replica != NULL) {
>+			const char *msg = tt_sprintf(
>+				"more than 1 replica with the same ID %u: new "
>+				"uuid - %s, old uuid - %s", replica_id,
>+				tt_uuid_str(&replica_uuid),
>+				tt_uuid_str(&replica->uuid));
>+			diag_set(ClientError, ER_UNSUPPORTED, "Tarantool", msg);
> 			return -1;
> 		}
> 		struct trigger *on_rollback = txn_alter_trigger_new(
>@@ -4265,7 +4270,7 @@ on_replace_dd_cluster(struct trigger *trigger, void *event)
> 		 * replica ID now. While WAL write is in progress, new replicas
> 		 * might come, they should see the ID is already in use.
> 		 */
>-		struct replica *replica = replica_by_uuid(&replica_uuid);
>+		replica = replica_by_uuid(&replica_uuid);
> 		if (replica != NULL)
> 			replica_set_id(replica, replica_id);
> 		else
>@@ -4283,6 +4288,10 @@ on_replace_dd_cluster(struct trigger *trigger, void *event)
> 			return -1;
> 		if (replica_check_id(replica_id) != 0)
> 			return -1;
>+		tt_uuid replica_uuid;
>+		if (tuple_field_uuid(old_tuple, BOX_CLUSTER_FIELD_UUID,
>+				    &replica_uuid) != 0)
>+			return -1;
> 
> 		struct replica *replica = replica_by_id(replica_id);
> 		if (replica == NULL) {
>@@ -4292,7 +4301,15 @@ on_replace_dd_cluster(struct trigger *trigger, void *event)
> 			 * subsystem is affected.
> 			 */
> 			panic("Tried to unregister a replica not stored in "
>-			      "replica_by_id map, id is %u", replica_id);
>+			      "replica_by_id map, id is %u, uuid is %s",
>+			      replica_id, tt_uuid_str(&replica_uuid));
>+		}
>+		if (!tt_uuid_is_equal(&replica->uuid, &replica_uuid)) {
>+			panic("Tried to unregister a replica with id %u, but "
>+			      "its uuid is different from stored internally: "
>+			      "in space - %s, internally - %s", replica_id,
>+			      tt_uuid_str(&replica_uuid),
>+			      tt_uuid_str(&replica->uuid));
> 		}
> 		/*
> 		 * Unregister only after commit. Otherwise if the transaction
>====================
>
>>> diff --git a/src/box/box.cc b/src/box/box.cc
>>> index 8c10a99dd..5c10aceff 100644
>>> --- a/src/box/box.cc
>>> +++ b/src/box/box.cc
>>> @@ -2407,22 +2407,9 @@ box_on_join(const tt_uuid *instance_uuid)
>>> 		return; /* nothing to do - already registered */
>>>
>>> 	box_check_writable_xc();
>>> -
>>> -	/** Find the largest existing replica id. */
>>> -	struct space *space = space_cache_find_xc(BOX_CLUSTER_ID);
>>> -	struct index *index = index_find_system_xc(space, 0);
>>> -	struct iterator *it = index_create_iterator_xc(index, ITER_ALL,
>>> -						       NULL, 0);
>>> -	IteratorGuard iter_guard(it);
>>> -	struct tuple *tuple;
>>> -	/** Assign a new replica id. */
>>> -	uint32_t replica_id = 1;
>>> -	while ((tuple = iterator_next_xc(it)) != NULL) {
>>> -		if (tuple_field_u32_xc(tuple,
>>> -				       BOX_CLUSTER_FIELD_ID) != replica_id)
>>> -			break;
>>> -		replica_id++;
>>> -	}
>>> +	uint32_t replica_id;
>>> +	if (replica_find_new_id(&replica_id) != 0)
>>> +		diag_raise();
>> 
>> Any info on why register fails?
>
>replica_find_new_id() sets diag with a proper error. This is
>why I do diag_raise() here.

[-- Attachment #2: Type: text/html, Size: 14833 bytes --]

  reply	other threads:[~2021-07-28 22:20 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-25 16:53 Vladislav Shpilevoy via Tarantool-patches
2021-07-25 18:31 ` Sergey Petrenko via Tarantool-patches
2021-07-26 22:06   ` Vladislav Shpilevoy via Tarantool-patches
2021-07-27 12:13 ` Cyrill Gorcunov via Tarantool-patches
2021-07-27 15:10 ` Sergey Ostanevich via Tarantool-patches
2021-07-27 21:53   ` Vladislav Shpilevoy via Tarantool-patches
2021-07-28 22:20     ` Sergey Ostanevich via Tarantool-patches [this message]
2021-07-29 21:42 ` Vladislav Shpilevoy via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1627510838.498581177@f478.i.mail.ru \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=sergos@tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches]  [PATCH 1/1] replication: set replica ID before _cluster commit' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox