From: Konstantin Osipov <kostja@tarantool.org>
To: Konstantin Belyavskiy <k.belyavskiy@tarantool.org>
Cc: vdavydov@tarantool.org, georgy@tarantool.org,
tarantool-patches@freelists.org
Subject: Re: [PATCH] replication: fix bug with read-only replica as a bootstrap leader
Date: Tue, 22 May 2018 19:45:42 +0300
Message-ID: <20180522164542.GC28644@atlas>
In-Reply-To: <20180522154004.2278-1-k.belyavskiy@tarantool.org>
* Konstantin Belyavskiy <k.belyavskiy@tarantool.org> [18/05/22 18:41]:
It's OK to push.
As a nitpick, I'd prefer extracting the leader search loop into a
routine, rather than adding a goto label.
I'm not crazy about goto, but I think in this case the code would
win from not having it.
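A rough sketch of what I mean, not the actual patch. The types below
are simplified stand-ins for struct replica and struct applier, the
names find_leader() and replicaset_leader_sketch() are made up for
illustration, and comparing signatures is only a placeholder for
whatever ordering the real loop applies between candidates:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for struct applier / struct replica.
struct Applier {
    bool remote_is_ro;        // remote instance reported itself read-only
    int64_t vclock_signature; // stand-in for applier->vclock.signature
};

struct Replica {
    int id;
    const Applier *applier; // nullptr when the replica has no applier
};

// One pass of the leader search. When skip_ro is true, read-only
// replicas are not considered at all.
static const Replica *
find_leader(const std::vector<Replica> &replicas, bool skip_ro)
{
    const Replica *leader = nullptr;
    for (const Replica &replica : replicas) {
        if (replica.applier == nullptr)
            continue;
        if (skip_ro && replica.applier->remote_is_ro)
            continue;
        // Placeholder ordering (an assumption): prefer the larger signature.
        if (leader == nullptr ||
            replica.applier->vclock_signature >
            leader->applier->vclock_signature)
            leader = &replica;
    }
    return leader;
}

// Two phases without a goto: read-write replicas first, then any
// replica as a fallback for backward compatibility.
static const Replica *
replicaset_leader_sketch(const std::vector<Replica> &replicas)
{
    const Replica *leader = find_leader(replicas, true);
    if (leader == nullptr)
        leader = find_leader(replicas, false);
    return leader;
}
```

The caller then reads as two plain calls, and the skip_ro flag is a
parameter of the search instead of state mutated before a jump.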
> Another broken case. Adding a new replica to cluster:
> + if (replica->applier->remote_is_ro &&
> + replica->applier->vclock.signature == 0)
> In this case we may get an ER_READONLY, since the signature is not 0.
> So leader election now has two phases:
> 1. Select among read-write replicas.
> 2. If none is found, try the old algorithm for backward compatibility
> (the case when all replicas exist in the cluster table).
>
> Closes #3257
> ---
> https://github.com/tarantool/tarantool/issues/3257
> https://github.com/tarantool/tarantool/tree/kbelyavs/gh-3257-fix-bug-with-read-only-as-a-leader
>
> src/box/replication.cc | 22 +++++++++++++++++-----
> test/replication/replica_uuid_ro3.lua | 1 +
> test/replication/replicaset_ro_mostly.result | 20 ++++++++++++++++++++
> test/replication/replicaset_ro_mostly.test.lua | 8 ++++++++
> 4 files changed, 46 insertions(+), 5 deletions(-)
> create mode 120000 test/replication/replica_uuid_ro3.lua
>
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index 0b770c913..8dcf1f656 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -691,16 +691,24 @@ struct replica *
> replicaset_leader(void)
> {
> struct replica *leader = NULL;
> + bool skip_ro = true;
> + /**
> + * Two loops, first prefers read-write replicas among others.
> + * Second for backward compatibility, if there is no such
> + * replicas at all.
> + */
> +loop:
> replicaset_foreach(replica) {
> if (replica->applier == NULL)
> continue;
> /**
> - * While bootstrapping a new cluster,
> - * read-only replicas shouldn't be considered
> - * as a leader.
> + * While bootstrapping a new cluster, read-only
> + * replicas shouldn't be considered as a leader.
> + * The only exception if there is no read-write
> + * replicas since there is still a possibility
> + * that all replicas exist in cluster table.
> */
> - if (replica->applier->remote_is_ro &&
> - replica->applier->vclock.signature == 0)
> + if (skip_ro && replica->applier->remote_is_ro)
> continue;
> if (leader == NULL) {
> leader = replica;
> @@ -721,6 +729,10 @@ replicaset_leader(void)
> continue;
> leader = replica;
> }
> + if (skip_ro && leader == NULL) {
> + skip_ro = false;
> + goto loop;
> + }
> return leader;
> }
>
> diff --git a/test/replication/replica_uuid_ro3.lua b/test/replication/replica_uuid_ro3.lua
> new file mode 120000
> index 000000000..342d71c57
> --- /dev/null
> +++ b/test/replication/replica_uuid_ro3.lua
> @@ -0,0 +1 @@
> +replica_uuid_ro.lua
> \ No newline at end of file
> diff --git a/test/replication/replicaset_ro_mostly.result b/test/replication/replicaset_ro_mostly.result
> index d753a182d..b9e8f1fe8 100644
> --- a/test/replication/replicaset_ro_mostly.result
> +++ b/test/replication/replicaset_ro_mostly.result
> @@ -53,6 +53,26 @@ create_cluster_uuid(SERVERS, UUID)
> test_run:wait_fullmesh(SERVERS)
> ---
> ...
> +-- Add third replica
> +name = 'replica_uuid_ro3'
> +---
> +...
> +test_run:cmd(create_cluster_cmd1:format(name, name))
> +---
> +- true
> +...
> +test_run:cmd(create_cluster_cmd2:format(name, uuid.new()))
> +---
> +- true
> +...
> +test_run:cmd('switch replica_uuid_ro3')
> +---
> +- true
> +...
> +test_run:cmd('switch default')
> +---
> +- true
> +...
> -- Cleanup.
> test_run:drop_cluster(SERVERS)
> ---
> diff --git a/test/replication/replicaset_ro_mostly.test.lua b/test/replication/replicaset_ro_mostly.test.lua
> index 539ca5a13..f2c2d0d11 100644
> --- a/test/replication/replicaset_ro_mostly.test.lua
> +++ b/test/replication/replicaset_ro_mostly.test.lua
> @@ -26,5 +26,13 @@ test_run:cmd("setopt delimiter ''");
> -- Deploy a cluster.
> create_cluster_uuid(SERVERS, UUID)
> test_run:wait_fullmesh(SERVERS)
> +
> +-- Add third replica
> +name = 'replica_uuid_ro3'
> +test_run:cmd(create_cluster_cmd1:format(name, name))
> +test_run:cmd(create_cluster_cmd2:format(name, uuid.new()))
> +test_run:cmd('switch replica_uuid_ro3')
> +test_run:cmd('switch default')
> +
> -- Cleanup.
> test_run:drop_cluster(SERVERS)
> --
> 2.14.3 (Apple Git-98)
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov