Tarantool development patches archive
 help / color / mirror / Atom feed
From: Serge Petrenko <sergepetrenko@tarantool.org>
To: tarantool-patches@freelists.org
Subject: [tarantool-patches] Re: [PATCH] replication: enter orphan mode on every erroneous config change
Date: Fri, 23 Aug 2019 18:33:56 +0300	[thread overview]
Message-ID: <2D6789E2-1CE2-4512-849F-AE6CCEE4B689@tarantool.org> (raw)
In-Reply-To: <20190821094455.GX13834@esperanza>

[-- Attachment #1: Type: text/plain, Size: 7299 bytes --]

Hi! Thank you for review.
I’ve addressed your comments in v2. Please, check it out


> 21 авг. 2019 г., в 12:44, Vladimir Davydov <vdavydov.dev@gmail.com> написал(а):
> 
> On Mon, Aug 19, 2019 at 03:11:01PM +0300, Serge Petrenko wrote:
>> We only entered orphan mode on bootrap and local recovery, but threw an
>> error when replicaton config was changed on the fly.
>> For consistency, in this case we should also enter orphan mode when
>> an instance fails to connect to quorum remote instances.
>> 
>> Closes #4424
> 
> A doc bot request is probably required.
> 
>> ---
>> https://github.com/tarantool/tarantool/issues/4424
>> https://github.com/tarantool/tarantool/tree/sp/gh-4424-repl-config-errors
>> 
>> src/box/box.cc                 |  9 ++--
>> src/box/replication.cc         | 30 ++++++-------
>> src/box/replication.h          |  2 +-
>> test/replication/misc.result   | 77 ++++++++++++++++++++++++++++++++--
>> test/replication/misc.test.lua | 29 ++++++++++++-
>> 5 files changed, 125 insertions(+), 22 deletions(-)
>> 
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 66cd6d3a4..43cc32d87 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -666,7 +666,7 @@ cfg_get_replication(int *p_count)
>>  * Sync box.cfg.replication with the cluster registry, but
>>  * don't start appliers.
>>  */
>> -static void
>> +static int
>> box_sync_replication(bool connect_quorum)
>> {
>> 	int count = 0;
>> @@ -679,9 +679,10 @@ box_sync_replication(bool connect_quorum)
>> 			applier_delete(appliers[i]); /* doesn't affect diag */
>> 	});
>> 
>> -	replicaset_connect(appliers, count, connect_quorum);
>> +	int rc = replicaset_connect(appliers, count, connect_quorum);
> 
> A function typically returns -1 as an error condition, but in this case
> -1 means something different. Besdies, this function can also throw an
> exception, and there isn't a single word in the comment regarding it,
> which is confusing.
> 
> Can you somehow manage without using a return code?
> 
>> 
>> 	guard.is_active = false;
>> +	return rc;
>> }
>> 
>> void
>> @@ -703,7 +704,9 @@ box_set_replication(void)
>> 	 * to connect to at least replication_connect_quorum
>> 	 * masters.
>> 	 */
>> -	box_sync_replication(true);
>> +	if (box_sync_replication(true) != 0) {
>> +		return;
>> +	}
> 
> The comment above is clearly outdated by this patch.
> 
>> 	/* Follow replica */
>> 	replicaset_follow();
>> 	/* Wait until appliers are in sync */
>> diff --git a/src/box/replication.cc b/src/box/replication.cc
>> index 28f7acedc..e9d0a9206 100644
>> --- a/src/box/replication.cc
>> +++ b/src/box/replication.cc
>> @@ -598,14 +598,14 @@ applier_on_connect_f(struct trigger *trigger, void *event)
>> 	applier_pause(applier);
>> }
>> 
>> -void
>> +int
>> replicaset_connect(struct applier **appliers, int count,
>> 		   bool connect_quorum)
>> {
>> 	if (count == 0) {
>> 		/* Cleanup the replica set. */
>> 		replicaset_update(appliers, count);
>> -		return;
>> +		return 0;
>> 	}
>> 
>> 	say_info("connecting to %d replicas", count);
>> @@ -660,9 +660,13 @@ replicaset_connect(struct applier **appliers, int count,
>> 			 count - state.connected, count);
>> 		/* Timeout or connection failure. */
>> 		if (connect_quorum && state.connected < quorum) {
>> -			diag_set(ClientError, ER_CFG, "replication",
>> -				 "failed to connect to one or more replicas");
>> -			goto error;
>> +			/* Destroy appliers */
>> +			for (int i = 0; i < count; i++) {
>> +				trigger_clear(&triggers[i].base);
>> +				applier_stop(appliers[i]);
>> +			}
>> +			box_set_orphan(true);
>> +			return -1;
>> 		}
>> 	} else {
>> 		say_info("connected to %d replicas", state.connected);
>> @@ -685,16 +689,14 @@ replicaset_connect(struct applier **appliers, int count,
>> 	try {
>> 		replicaset_update(appliers, count);
>> 	} catch (Exception *e) {
>> -		goto error;
>> -	}
>> -	return;
>> -error:
>> -	/* Destroy appliers */
>> -	for (int i = 0; i < count; i++) {
>> -		trigger_clear(&triggers[i].base);
>> -		applier_stop(appliers[i]);
>> +		/* Destroy appliers */
>> +		for (int i = 0; i < count; i++) {
>> +			trigger_clear(&triggers[i].base);
>> +			applier_stop(appliers[i]);
>> +		}
> 
> Copy-and-paste... Let's please try to avoid that. I'd prefer if you
> called replicaset_follow and _sync in any case and let them decide
> if the instance should enter the orphan state.
> 
>> +		diag_raise();
>> 	}
>> -	diag_raise();
>> +	return 0;
>> }
>> 
>> bool
>> diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
>> index 99e995509..91a77b584 100644
>> --- a/test/replication/misc.test.lua
>> +++ b/test/replication/misc.test.lua
>> @@ -9,6 +9,8 @@ replication_timeout = box.cfg.replication_timeout
>> replication_connect_timeout = box.cfg.replication_connect_timeout
>> box.cfg{replication_timeout=0.05, replication_connect_timeout=0.05, replication={}}
>> box.cfg{replication = {'127.0.0.1:12345', box.cfg.listen}}
>> +box.info.status
>> +box.info.ro <http://box.info.ro/>
>> 
>> -- gh-3606 - Tarantool crashes if box.cfg.replication is updated concurrently
>> fiber = require('fiber')
>> @@ -19,7 +21,9 @@ f()
>> c:get()
>> c:get()
>> 
>> -box.cfg{replication_timeout = replication_timeout, replication_connect_timeout = replication_connect_timeout}
>> +box.cfg{replication = "", replication_timeout = replication_timeout, replication_connect_timeout = replication_connect_timeout}
>> +box.info.status
>> +box.info.ro <http://box.info.ro/>
>> 
>> -- gh-3111 - Allow to rebootstrap a replica from a read-only master
>> replica_uuid = uuid.new()
>> @@ -293,3 +297,26 @@ test_run:cmd("cleanup server replica")
>> test_run:cmd("delete server replica")
>> test_run:cleanup_cluster()
>> box.schema.user.revoke('guest', 'replication')
>> +
>> +--
>> +-- gh-4424 Always enter orphan mode on error in replication
>> +-- configuration change.
>> +--
>> +replication_connect_timeout = box.cfg.replication_connect_timeout
>> +replication_connect_quorum = box.cfg.replication_connect_quorum
>> +
>> +box.cfg{replication="12345", replication_connect_timeout=0.1, replication_connect_quorum=1}
> 
> Let's please also test a more sophisticated use case: when there are two
> replicas and we manage to connect to one of them but fail to connect to
> another.
> 
>> +box.info.status
>> +box.info.ro <http://box.info.ro/>
>> +box.cfg{replication = ""}
>> +box.info.status
>> +box.info.ro <http://box.info.ro/>
>> +-- no switch to orphan when quorum == 0
>> +box.cfg{replication= "12345", replication_connect_quorum=0}
>> +box.info.status
>> +box.info.ro <http://box.info.ro/>
>> +test_run:cmd("setopt delimiter ';'")
>> +box.cfg{replication="",
>> +        replication_connect_timeout=replication_connect_timeout,
>> +        replication_connect_quorum=replication_connect_quorum};
>> +test_run:cmd("setopt delimiter ''");
> 
> Nit: I'd rewrite it without using 'setopt delimiter'::
> 
> box.cfg{replication = {}}
> box.cfg{replication_connect_timeout = replication_connect_timeout)
> box.cfg{replication_connect_quorum = replication_connect_quorum)


--
Serge Petrenko
sergepetrenko@tarantool.org



[-- Attachment #2: Type: text/html, Size: 36326 bytes --]

      reply	other threads:[~2019-08-23 15:33 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-19 12:11 Serge Petrenko
2019-08-21  9:44 ` Vladimir Davydov
2019-08-23 15:33   ` Serge Petrenko [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2D6789E2-1CE2-4512-849F-AE6CCEE4B689@tarantool.org \
    --to=sergepetrenko@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='[tarantool-patches] Re: [PATCH] replication: enter orphan mode on every erroneous config change' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox