From: Serge Petrenko <sergepetrenko@tarantool.org>
To: tarantool-patches@freelists.org
Subject: [tarantool-patches] Re: [PATCH] replication: enter orphan mode on every erroneous config change
Date: Fri, 23 Aug 2019 18:33:56 +0300 [thread overview]
Message-ID: <2D6789E2-1CE2-4512-849F-AE6CCEE4B689@tarantool.org> (raw)
In-Reply-To: <20190821094455.GX13834@esperanza>
[-- Attachment #1: Type: text/plain, Size: 7299 bytes --]
Hi! Thank you for review.
I’ve addressed your comments in v2. Please, check it out
> 21 авг. 2019 г., в 12:44, Vladimir Davydov <vdavydov.dev@gmail.com> написал(а):
>
> On Mon, Aug 19, 2019 at 03:11:01PM +0300, Serge Petrenko wrote:
>> We only entered orphan mode on bootrap and local recovery, but threw an
>> error when replicaton config was changed on the fly.
>> For consistency, in this case we should also enter orphan mode when
>> an instance fails to connect to quorum remote instances.
>>
>> Closes #4424
>
> A doc bot request is probably required.
>
>> ---
>> https://github.com/tarantool/tarantool/issues/4424
>> https://github.com/tarantool/tarantool/tree/sp/gh-4424-repl-config-errors
>>
>> src/box/box.cc | 9 ++--
>> src/box/replication.cc | 30 ++++++-------
>> src/box/replication.h | 2 +-
>> test/replication/misc.result | 77 ++++++++++++++++++++++++++++++++--
>> test/replication/misc.test.lua | 29 ++++++++++++-
>> 5 files changed, 125 insertions(+), 22 deletions(-)
>>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 66cd6d3a4..43cc32d87 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -666,7 +666,7 @@ cfg_get_replication(int *p_count)
>> * Sync box.cfg.replication with the cluster registry, but
>> * don't start appliers.
>> */
>> -static void
>> +static int
>> box_sync_replication(bool connect_quorum)
>> {
>> int count = 0;
>> @@ -679,9 +679,10 @@ box_sync_replication(bool connect_quorum)
>> applier_delete(appliers[i]); /* doesn't affect diag */
>> });
>>
>> - replicaset_connect(appliers, count, connect_quorum);
>> + int rc = replicaset_connect(appliers, count, connect_quorum);
>
> A function typically returns -1 as an error condition, but in this case
> -1 means something different. Besdies, this function can also throw an
> exception, and there isn't a single word in the comment regarding it,
> which is confusing.
>
> Can you somehow manage without using a return code?
>
>>
>> guard.is_active = false;
>> + return rc;
>> }
>>
>> void
>> @@ -703,7 +704,9 @@ box_set_replication(void)
>> * to connect to at least replication_connect_quorum
>> * masters.
>> */
>> - box_sync_replication(true);
>> + if (box_sync_replication(true) != 0) {
>> + return;
>> + }
>
> The comment above is clearly outdated by this patch.
>
>> /* Follow replica */
>> replicaset_follow();
>> /* Wait until appliers are in sync */
>> diff --git a/src/box/replication.cc b/src/box/replication.cc
>> index 28f7acedc..e9d0a9206 100644
>> --- a/src/box/replication.cc
>> +++ b/src/box/replication.cc
>> @@ -598,14 +598,14 @@ applier_on_connect_f(struct trigger *trigger, void *event)
>> applier_pause(applier);
>> }
>>
>> -void
>> +int
>> replicaset_connect(struct applier **appliers, int count,
>> bool connect_quorum)
>> {
>> if (count == 0) {
>> /* Cleanup the replica set. */
>> replicaset_update(appliers, count);
>> - return;
>> + return 0;
>> }
>>
>> say_info("connecting to %d replicas", count);
>> @@ -660,9 +660,13 @@ replicaset_connect(struct applier **appliers, int count,
>> count - state.connected, count);
>> /* Timeout or connection failure. */
>> if (connect_quorum && state.connected < quorum) {
>> - diag_set(ClientError, ER_CFG, "replication",
>> - "failed to connect to one or more replicas");
>> - goto error;
>> + /* Destroy appliers */
>> + for (int i = 0; i < count; i++) {
>> + trigger_clear(&triggers[i].base);
>> + applier_stop(appliers[i]);
>> + }
>> + box_set_orphan(true);
>> + return -1;
>> }
>> } else {
>> say_info("connected to %d replicas", state.connected);
>> @@ -685,16 +689,14 @@ replicaset_connect(struct applier **appliers, int count,
>> try {
>> replicaset_update(appliers, count);
>> } catch (Exception *e) {
>> - goto error;
>> - }
>> - return;
>> -error:
>> - /* Destroy appliers */
>> - for (int i = 0; i < count; i++) {
>> - trigger_clear(&triggers[i].base);
>> - applier_stop(appliers[i]);
>> + /* Destroy appliers */
>> + for (int i = 0; i < count; i++) {
>> + trigger_clear(&triggers[i].base);
>> + applier_stop(appliers[i]);
>> + }
>
> Copy-and-paste... Let's please try to avoid that. I'd prefer if you
> called replicaset_follow and _sync in any case and let them decide
> if the instance should enter the orphan state.
>
>> + diag_raise();
>> }
>> - diag_raise();
>> + return 0;
>> }
>>
>> bool
>> diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
>> index 99e995509..91a77b584 100644
>> --- a/test/replication/misc.test.lua
>> +++ b/test/replication/misc.test.lua
>> @@ -9,6 +9,8 @@ replication_timeout = box.cfg.replication_timeout
>> replication_connect_timeout = box.cfg.replication_connect_timeout
>> box.cfg{replication_timeout=0.05, replication_connect_timeout=0.05, replication={}}
>> box.cfg{replication = {'127.0.0.1:12345', box.cfg.listen}}
>> +box.info.status
>> +box.info.ro <http://box.info.ro/>
>>
>> -- gh-3606 - Tarantool crashes if box.cfg.replication is updated concurrently
>> fiber = require('fiber')
>> @@ -19,7 +21,9 @@ f()
>> c:get()
>> c:get()
>>
>> -box.cfg{replication_timeout = replication_timeout, replication_connect_timeout = replication_connect_timeout}
>> +box.cfg{replication = "", replication_timeout = replication_timeout, replication_connect_timeout = replication_connect_timeout}
>> +box.info.status
>> +box.info.ro <http://box.info.ro/>
>>
>> -- gh-3111 - Allow to rebootstrap a replica from a read-only master
>> replica_uuid = uuid.new()
>> @@ -293,3 +297,26 @@ test_run:cmd("cleanup server replica")
>> test_run:cmd("delete server replica")
>> test_run:cleanup_cluster()
>> box.schema.user.revoke('guest', 'replication')
>> +
>> +--
>> +-- gh-4424 Always enter orphan mode on error in replication
>> +-- configuration change.
>> +--
>> +replication_connect_timeout = box.cfg.replication_connect_timeout
>> +replication_connect_quorum = box.cfg.replication_connect_quorum
>> +
>> +box.cfg{replication="12345", replication_connect_timeout=0.1, replication_connect_quorum=1}
>
> Let's please also test a more sophisticated use case: when there are two
> replicas and we manage to connect to one of them but fail to connect to
> another.
>
>> +box.info.status
>> +box.info.ro <http://box.info.ro/>
>> +box.cfg{replication = ""}
>> +box.info.status
>> +box.info.ro <http://box.info.ro/>
>> +-- no switch to orphan when quorum == 0
>> +box.cfg{replication= "12345", replication_connect_quorum=0}
>> +box.info.status
>> +box.info.ro <http://box.info.ro/>
>> +test_run:cmd("setopt delimiter ';'")
>> +box.cfg{replication="",
>> + replication_connect_timeout=replication_connect_timeout,
>> + replication_connect_quorum=replication_connect_quorum};
>> +test_run:cmd("setopt delimiter ''");
>
> Nit: I'd rewrite it without using 'setopt delimiter'::
>
> box.cfg{replication = {}}
> box.cfg{replication_connect_timeout = replication_connect_timeout)
> box.cfg{replication_connect_quorum = replication_connect_quorum)
--
Serge Petrenko
sergepetrenko@tarantool.org
[-- Attachment #2: Type: text/html, Size: 36326 bytes --]
prev parent reply other threads:[~2019-08-23 15:33 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-19 12:11 Serge Petrenko
2019-08-21 9:44 ` Vladimir Davydov
2019-08-23 15:33 ` Serge Petrenko [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2D6789E2-1CE2-4512-849F-AE6CCEE4B689@tarantool.org \
--to=sergepetrenko@tarantool.org \
--cc=tarantool-patches@freelists.org \
--subject='[tarantool-patches] Re: [PATCH] replication: enter orphan mode on every erroneous config change' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox