[Tarantool-patches] [PATCH 6/6] replication: prefer to join from booted replicas

Serge Petrenko sergepetrenko at tarantool.org
Thu Jun 10 17:14:33 MSK 2021



05.06.2021 02:38, Vladislav Shpilevoy wrote:
> The algorithm of looking for an instance to join the replicaset
> from didn't take into account that some of the instances might
> not be bootstrapped but still perfectly available.
>
> As a result, a ridiculous situation could happen - an instance
> could connect to a cluster consisting only of read-only instances
> while itself having box.cfg{read_only = false}. Then, instead of
> failing or waiting, it just booted a brand new cluster. And after
> that the node started complaining about the others having a
> different replicaset UUID.
>
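For the record, the broken scenario can be sketched like this (the
socket paths and layout are made up for illustration): two instances
are started with box.cfg{read_only = true}, and then a third, writable
one runs:

    -- Every node in the replication list is read-only, while the
    -- joining instance itself is writable. Before this patch it
    -- booted a brand new replicaset here instead of failing.
    box.cfg{
        listen = 'unix/:./rw.sock',
        replication = {'unix/:./ro1.sock', 'unix/:./ro2.sock'},
        read_only = false,
    }
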
> The patch makes a new instance always prefer a bootstrapped
> join source to a non-bootstrapped one, including itself. In the
> situation above the new instance now terminates with an error.
>
> In the future it should hopefully start a retry loop instead.
>
> Closes #5613

Thanks! LGTM.

>
> @TarantoolBot document
> Title: IPROTO_BALLOT rework and a new field
>
> A couple of fields in `IPROTO_BALLOT 0x29` used to have values not
> matching their names. They have been changed.
>
> * `IPROTO_BALLOT_IS_RO 0x01` used to mean "the instance has
>    `box.cfg{read_only = true}`". It was renamed in the source code
>    to `IPROTO_BALLOT_IS_RO_CFG`. It has the same code `0x01`, and
>    the value is the same. Only the name has changed, and it should
>    change in the doc too.
>
> * `IPROTO_BALLOT_IS_LOADING 0x04` used to mean "the instance has
>    finished `box.cfg()` and it has `read_only = true`". The name
>    was therefore wrong, because even if the instance finished
>    loading, the flag still was true for `read_only = true` nodes.
>    Also, such a value is hardly suitable for any sane usage.
>    The name was changed to `IPROTO_BALLOT_IS_RO`, the code stayed
>    the same, and the value now is "the instance is not writable".
>    The reason for being not writable can be anything: the node is
>    an orphan; or it has `read_only = true`; or it is a Raft
>    follower; or anything else.
>
> And there is a new field.
>
> `IPROTO_BALLOT_IS_BOOTED 0x06` means the instance has finished its
> bootstrap or recovery.
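To make the new field layout concrete, here is a hypothetical decoded
ballot of a booted, writable instance (only the three fields discussed
above are shown, keyed by field code):

    -- Hypothetical decoded ballot; only the fields from this doc
    -- request are listed, other ballot fields are omitted.
    local ballot = {
        [0x01] = false, -- IPROTO_BALLOT_IS_RO_CFG: box.cfg{read_only = true} set.
        [0x04] = false, -- IPROTO_BALLOT_IS_RO: read-only for any reason.
        [0x06] = true,  -- IPROTO_BALLOT_IS_BOOTED: finished bootstrap/recovery.
    }
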
> ---
>   .../gh-5613-bootstrap-prefer-booted.md        |  6 ++
>   src/box/replication.cc                        | 20 +++---
>   .../gh-5613-bootstrap-prefer-booted.result    | 70 +++++++++++++++++++
>   .../gh-5613-bootstrap-prefer-booted.test.lua  | 21 ++++++
>   test/replication/gh-5613-master.lua           | 11 +++
>   test/replication/gh-5613-replica1.lua         | 13 ++++
>   test/replication/gh-5613-replica2.lua         | 11 +++
>   test/replication/suite.cfg                    |  1 +
>   8 files changed, 144 insertions(+), 9 deletions(-)
>   create mode 100644 changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md
>   create mode 100644 test/replication/gh-5613-bootstrap-prefer-booted.result
>   create mode 100644 test/replication/gh-5613-bootstrap-prefer-booted.test.lua
>   create mode 100644 test/replication/gh-5613-master.lua
>   create mode 100644 test/replication/gh-5613-replica1.lua
>   create mode 100644 test/replication/gh-5613-replica2.lua
>
> diff --git a/changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md b/changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md
> new file mode 100644
> index 000000000..c022ee012
> --- /dev/null
> +++ b/changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md
> @@ -0,0 +1,6 @@
> +## bugfix/replication
> +
> +* Fixed a bug when a replica, trying to join a cluster where only read-only
> +  replicas were available, booted its own replicaset instead of failing or
> +  retrying. Now it fails with an error saying the other nodes are read-only
> +  and so cannot register it (gh-5613).
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index d33e70f28..52086c65e 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -951,15 +951,6 @@ replicaset_next(struct replica *replica)
>   	return replica_hash_next(&replicaset.hash, replica);
>   }
>   
> -/**
> - * Compare vclock, read only mode and orphan status
> - * of all connected replicas and elect a leader.
> - * Initiallly, skip read-only replicas, since they
> - * can not properly act as bootstrap masters (register
> - * new nodes in _cluster table). If there are no read-write
> - * replicas, choose a read-only replica with biggest vclock
> - * as a leader, in hope it will become read-write soon.
> - */
>   struct replica *
>   replicaset_find_join_master(void)
>   {
> @@ -972,12 +963,23 @@ replicaset_find_join_master(void)
>   		const struct ballot *ballot = &applier->ballot;
>   		int score = 0;
>   		/*
> +		 * First of all, try to ignore non-booted instances, including
> +		 * self if it is not booted yet. For self it is even dangerous,
> +		 * as the instance might decide to boot its own cluster if, for
> +		 * example, the other nodes are available but read-only. That
> +		 * would be a mistake.
> +		 *
> +		 * For a new cluster it is ok to use a non-booted instance as it
> +		 * means the algorithm tries to find an initial "boot-master".
> +		 *
>   		 * Prefer instances not configured as read-only via box.cfg, and
>   		 * not being in read-only state due to any other reason. The
>   		 * config is stronger because if it is configured as read-only,
>   		 * it is in read-only state for sure, until the config is
>   		 * changed.
>   		 */
> +		if (ballot->is_booted)
> +			score += 10;
>   		if (!ballot->is_ro_cfg)
>   			score += 5;
>   		if (!ballot->is_ro)
> diff --git a/test/replication/gh-5613-bootstrap-prefer-booted.result b/test/replication/gh-5613-bootstrap-prefer-booted.result
> new file mode 100644
> index 000000000..e8e7fb792
> --- /dev/null
> +++ b/test/replication/gh-5613-bootstrap-prefer-booted.result
> @@ -0,0 +1,70 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +test_run:cmd('create server master with script="replication/gh-5613-master.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server master with wait=False')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('create server replica1 with script="replication/gh-5613-replica1.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:switch('master')
> + | ---
> + | - true
> + | ...
> +box.cfg{read_only = true}
> + | ---
> + | ...
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('create server replica2 with script="replication/gh-5613-replica2.lua"')
> + | ---
> + | - true
> + | ...
> +-- It returns false, but that is expected.
> +test_run:cmd('start server replica2 with crash_expected=True')
> + | ---
> + | - false
> + | ...
> +opts = {filename = 'gh-5613-replica2.log'}
> + | ---
> + | ...
> +assert(test_run:grep_log(nil, 'ER_READONLY', nil, opts) ~= nil)
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('delete server replica2')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server master')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server master')
> + | ---
> + | - true
> + | ...
> diff --git a/test/replication/gh-5613-bootstrap-prefer-booted.test.lua b/test/replication/gh-5613-bootstrap-prefer-booted.test.lua
> new file mode 100644
> index 000000000..d3c1c1189
> --- /dev/null
> +++ b/test/replication/gh-5613-bootstrap-prefer-booted.test.lua
> @@ -0,0 +1,21 @@
> +test_run = require('test_run').new()
> +
> +test_run:cmd('create server master with script="replication/gh-5613-master.lua"')
> +test_run:cmd('start server master with wait=False')
> +test_run:cmd('create server replica1 with script="replication/gh-5613-replica1.lua"')
> +test_run:cmd('start server replica1')
> +test_run:switch('master')
> +box.cfg{read_only = true}
> +test_run:switch('default')
> +
> +test_run:cmd('create server replica2 with script="replication/gh-5613-replica2.lua"')
> +-- It returns false, but that is expected.
> +test_run:cmd('start server replica2 with crash_expected=True')
> +opts = {filename = 'gh-5613-replica2.log'}
> +assert(test_run:grep_log(nil, 'ER_READONLY', nil, opts) ~= nil)
> +
> +test_run:cmd('delete server replica2')
> +test_run:cmd('stop server replica1')
> +test_run:cmd('delete server replica1')
> +test_run:cmd('stop server master')
> +test_run:cmd('delete server master')
> diff --git a/test/replication/gh-5613-master.lua b/test/replication/gh-5613-master.lua
> new file mode 100644
> index 000000000..408427315
> --- /dev/null
> +++ b/test/replication/gh-5613-master.lua
> @@ -0,0 +1,11 @@
> +#!/usr/bin/env tarantool
> +
> +require('console').listen(os.getenv('ADMIN'))
> +box.cfg({
> +	listen = 'unix/:./gh-5613-master.sock',
> +	replication = {
> +		'unix/:./gh-5613-master.sock',
> +		'unix/:./gh-5613-replica1.sock',
> +	},
> +})
> +box.schema.user.grant('guest', 'super')
> diff --git a/test/replication/gh-5613-replica1.lua b/test/replication/gh-5613-replica1.lua
> new file mode 100644
> index 000000000..d0d6e3372
> --- /dev/null
> +++ b/test/replication/gh-5613-replica1.lua
> @@ -0,0 +1,13 @@
> +#!/usr/bin/env tarantool
> +
> +require('console').listen(os.getenv('ADMIN'))
> +box.cfg({
> +	listen = 'unix/:./gh-5613-replica1.sock',
> +	replication = {
> +		'unix/:./gh-5613-master.sock',
> +		'unix/:./gh-5613-replica1.sock',
> +	},
> +	-- Set read_only initially so that the bootstrap master is known
> +	-- in advance.
> +	read_only = true,
> +})
> diff --git a/test/replication/gh-5613-replica2.lua b/test/replication/gh-5613-replica2.lua
> new file mode 100644
> index 000000000..8cbd45b61
> --- /dev/null
> +++ b/test/replication/gh-5613-replica2.lua
> @@ -0,0 +1,11 @@
> +#!/usr/bin/env tarantool
> +
> +require('console').listen(os.getenv('ADMIN'))
> +box.cfg({
> +	listen = 'unix/:./gh-5613-replica2.sock',
> +	replication = {
> +		'unix/:./gh-5613-master.sock',
> +		'unix/:./gh-5613-replica1.sock',
> +		'unix/:./gh-5613-replica2.sock',
> +	},
> +})
> diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
> index 27eab20c2..f9d5ce1cc 100644
> --- a/test/replication/suite.cfg
> +++ b/test/replication/suite.cfg
> @@ -44,6 +44,7 @@
>       "gh-5435-qsync-clear-synchro-queue-commit-all.test.lua": {},
>       "gh-5536-wal-limit.test.lua": {},
>       "gh-5566-final-join-synchro.test.lua": {},
> +    "gh-5613-bootstrap-prefer-booted.test.lua": {},
>       "gh-6032-promote-wal-write.test.lua": {},
>       "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
>       "gh-6094-rs-uuid-mismatch.test.lua": {},

-- 
Serge Petrenko


