Tarantool development patches archive
From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: tarantool-patches@dev.tarantool.org, gorcunov@gmail.com,
	sergepetrenko@tarantool.org
Subject: [Tarantool-patches] [PATCH 6/6] replication: prefer to join from booted replicas
Date: Sat,  5 Jun 2021 01:38:00 +0200	[thread overview]
Message-ID: <b73268d5a80874f3edac1320352dc8e253a0fbfb.1622849790.git.v.shpilevoy@tarantool.org> (raw)
In-Reply-To: <cover.1622849790.git.v.shpilevoy@tarantool.org>

The algorithm of looking for an instance to join the replicaset
from didn't take into account that some of the instances might be
perfectly available and yet not bootstrapped.

As a result, a ridiculous situation could happen: an instance
could connect to a cluster consisting only of read-only instances
while itself being configured with box.cfg{read_only = false}.
Then, instead of failing or waiting, it simply booted a brand new
cluster. After that the node just kept complaining about the
others having a different replicaset UUID.
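
To make the scenario concrete, here is a minimal sketch of the
joining node's configuration (the socket paths are made up for the
example); with the old algorithm such a node booted its own
replicaset instead of reporting an error:

    -- Hypothetical joining instance. All peers listed in
    -- 'replication' are already bootstrapped and read-only.
    box.cfg({
        listen = 'unix/:./new-node.sock',
        replication = {
            'unix/:./ro-peer1.sock',
            'unix/:./ro-peer2.sock',
        },
        read_only = false,
    })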

The patch makes a new instance always prefer a bootstrapped
join source to a non-bootstrapped one, including itself. In the
situation above the new instance now terminates with an error.
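
For illustration, the preference order can be sketched in Lua as
follows; the weights mirror the C code in replication.cc below,
while the function name and ballot table here are just an example,
not an actual API:

    -- Rough model of the join-master scoring: a booted candidate
    -- always outranks any combination of the other flags.
    local function ballot_score(ballot)
        local score = 0
        if ballot.is_booted then score = score + 10 end
        if not ballot.is_ro_cfg then score = score + 5 end
        if not ballot.is_ro then score = score + 1 end
        return score
    end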

Hopefully, in the future it will start a retry loop instead.

Closes #5613

@TarantoolBot document
Title: IPROTO_BALLOT rework and a new field

A couple of fields in `IPROTO_BALLOT 0x29` used to have values
that did not match their names. They have been changed.

* `IPROTO_BALLOT_IS_RO 0x01` used to mean "the instance has
  `box.cfg{read_only = true}`". It was renamed in the source code
  to `IPROTO_BALLOT_IS_RO_CFG`. It has the same code `0x01`, and
  the value is the same. Only the name has changed, and the
  documentation should change accordingly.

* `IPROTO_BALLOT_IS_LOADING 0x04` used to mean "the instance has
  not finished `box.cfg()` or it has `read_only = true`". The name
  was therefore misleading: even after the instance finished
  loading, the flag stayed true for `read_only = true` nodes. Such
  a value is also not very suitable for any sane usage.
  The name was changed to `IPROTO_BALLOT_IS_RO`, the code `0x04`
  stayed the same, and the value now means "the instance is not
  writable". The reason for being not writable can be any: the
  node is an orphan; or it has `read_only = true`; or it is a Raft
  follower; or anything else.

And there is a new field.

`IPROTO_BALLOT_IS_BOOTED 0x06` means the instance has finished its
bootstrap or recovery.
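
For illustration only, here is how the documented key codes and a
decoded ballot of a bootstrapped instance with
`box.cfg{read_only = true}` could be modelled in Lua; the table is
a sketch, not an actual protocol API:

    -- Key codes as documented above.
    local IPROTO_BALLOT_IS_RO_CFG = 0x01
    local IPROTO_BALLOT_IS_RO     = 0x04
    local IPROTO_BALLOT_IS_BOOTED = 0x06

    -- A bootstrapped instance which is read-only only because of
    -- its box.cfg{read_only = true} setting.
    local ballot = {
        [IPROTO_BALLOT_IS_RO_CFG] = true,
        [IPROTO_BALLOT_IS_RO]     = true,
        [IPROTO_BALLOT_IS_BOOTED] = true,
    }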
---
 .../gh-5613-bootstrap-prefer-booted.md        |  6 ++
 src/box/replication.cc                        | 20 +++---
 .../gh-5613-bootstrap-prefer-booted.result    | 70 +++++++++++++++++++
 .../gh-5613-bootstrap-prefer-booted.test.lua  | 21 ++++++
 test/replication/gh-5613-master.lua           | 11 +++
 test/replication/gh-5613-replica1.lua         | 13 ++++
 test/replication/gh-5613-replica2.lua         | 11 +++
 test/replication/suite.cfg                    |  1 +
 8 files changed, 144 insertions(+), 9 deletions(-)
 create mode 100644 changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md
 create mode 100644 test/replication/gh-5613-bootstrap-prefer-booted.result
 create mode 100644 test/replication/gh-5613-bootstrap-prefer-booted.test.lua
 create mode 100644 test/replication/gh-5613-master.lua
 create mode 100644 test/replication/gh-5613-replica1.lua
 create mode 100644 test/replication/gh-5613-replica2.lua

diff --git a/changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md b/changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md
new file mode 100644
index 000000000..c022ee012
--- /dev/null
+++ b/changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md
@@ -0,0 +1,6 @@
+## bugfix/replication
+
+* Fixed a bug when a replica, trying to join a cluster with only read-only
+  replicas available, booted its own replicaset instead of failing or
+  retrying. Now it fails with an error saying the other nodes are read-only
+  and therefore can't register it (gh-5613).
diff --git a/src/box/replication.cc b/src/box/replication.cc
index d33e70f28..52086c65e 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -951,15 +951,6 @@ replicaset_next(struct replica *replica)
 	return replica_hash_next(&replicaset.hash, replica);
 }
 
-/**
- * Compare vclock, read only mode and orphan status
- * of all connected replicas and elect a leader.
- * Initiallly, skip read-only replicas, since they
- * can not properly act as bootstrap masters (register
- * new nodes in _cluster table). If there are no read-write
- * replicas, choose a read-only replica with biggest vclock
- * as a leader, in hope it will become read-write soon.
- */
 struct replica *
 replicaset_find_join_master(void)
 {
@@ -972,12 +963,23 @@ replicaset_find_join_master(void)
 		const struct ballot *ballot = &applier->ballot;
 		int score = 0;
 		/*
+		 * First of all, try to ignore non-booted instances, including
+		 * self if it is not booted yet. For self it is even dangerous,
+		 * as the instance might decide to boot its own cluster if, for
+		 * example, the other nodes are available but read-only. It
+		 * would be a mistake.
+		 *
+		 * For a new cluster it is ok to use a non-booted instance as it
+		 * means the algorithm tries to find an initial "boot-master".
+		 *
 		 * Prefer instances not configured as read-only via box.cfg, and
 		 * not being in read-only state due to any other reason. The
 		 * config is stronger because if it is configured as read-only,
 		 * it is in read-only state for sure, until the config is
 		 * changed.
 		 */
+		if (ballot->is_booted)
+			score += 10;
 		if (!ballot->is_ro_cfg)
 			score += 5;
 		if (!ballot->is_ro)
diff --git a/test/replication/gh-5613-bootstrap-prefer-booted.result b/test/replication/gh-5613-bootstrap-prefer-booted.result
new file mode 100644
index 000000000..e8e7fb792
--- /dev/null
+++ b/test/replication/gh-5613-bootstrap-prefer-booted.result
@@ -0,0 +1,70 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+test_run:cmd('create server master with script="replication/gh-5613-master.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server master with wait=False')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica1 with script="replication/gh-5613-replica1.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica1')
+ | ---
+ | - true
+ | ...
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.cfg{read_only = true}
+ | ---
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('create server replica2 with script="replication/gh-5613-replica2.lua"')
+ | ---
+ | - true
+ | ...
+-- It returns false, but that is expected.
+test_run:cmd('start server replica2 with crash_expected=True')
+ | ---
+ | - false
+ | ...
+opts = {filename = 'gh-5613-replica2.log'}
+ | ---
+ | ...
+assert(test_run:grep_log(nil, 'ER_READONLY', nil, opts) ~= nil)
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('delete server replica2')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server master')
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-5613-bootstrap-prefer-booted.test.lua b/test/replication/gh-5613-bootstrap-prefer-booted.test.lua
new file mode 100644
index 000000000..d3c1c1189
--- /dev/null
+++ b/test/replication/gh-5613-bootstrap-prefer-booted.test.lua
@@ -0,0 +1,21 @@
+test_run = require('test_run').new()
+
+test_run:cmd('create server master with script="replication/gh-5613-master.lua"')
+test_run:cmd('start server master with wait=False')
+test_run:cmd('create server replica1 with script="replication/gh-5613-replica1.lua"')
+test_run:cmd('start server replica1')
+test_run:switch('master')
+box.cfg{read_only = true}
+test_run:switch('default')
+
+test_run:cmd('create server replica2 with script="replication/gh-5613-replica2.lua"')
+-- It returns false, but that is expected.
+test_run:cmd('start server replica2 with crash_expected=True')
+opts = {filename = 'gh-5613-replica2.log'}
+assert(test_run:grep_log(nil, 'ER_READONLY', nil, opts) ~= nil)
+
+test_run:cmd('delete server replica2')
+test_run:cmd('stop server replica1')
+test_run:cmd('delete server replica1')
+test_run:cmd('stop server master')
+test_run:cmd('delete server master')
diff --git a/test/replication/gh-5613-master.lua b/test/replication/gh-5613-master.lua
new file mode 100644
index 000000000..408427315
--- /dev/null
+++ b/test/replication/gh-5613-master.lua
@@ -0,0 +1,11 @@
+#!/usr/bin/env tarantool
+
+require('console').listen(os.getenv('ADMIN'))
+box.cfg({
+	listen = 'unix/:./gh-5613-master.sock',
+	replication = {
+		'unix/:./gh-5613-master.sock',
+		'unix/:./gh-5613-replica1.sock',
+	},
+})
+box.schema.user.grant('guest', 'super')
diff --git a/test/replication/gh-5613-replica1.lua b/test/replication/gh-5613-replica1.lua
new file mode 100644
index 000000000..d0d6e3372
--- /dev/null
+++ b/test/replication/gh-5613-replica1.lua
@@ -0,0 +1,13 @@
+#!/usr/bin/env tarantool
+
+require('console').listen(os.getenv('ADMIN'))
+box.cfg({
+	listen = 'unix/:./gh-5613-replica1.sock',
+	replication = {
+		'unix/:./gh-5613-master.sock',
+		'unix/:./gh-5613-replica1.sock',
+	},
+	-- Set read_only initially so that the bootstrap master is known in
+	-- advance.
+	read_only = true,
+})
diff --git a/test/replication/gh-5613-replica2.lua b/test/replication/gh-5613-replica2.lua
new file mode 100644
index 000000000..8cbd45b61
--- /dev/null
+++ b/test/replication/gh-5613-replica2.lua
@@ -0,0 +1,11 @@
+#!/usr/bin/env tarantool
+
+require('console').listen(os.getenv('ADMIN'))
+box.cfg({
+	listen = 'unix/:./gh-5613-replica2.sock',
+	replication = {
+		'unix/:./gh-5613-master.sock',
+		'unix/:./gh-5613-replica1.sock',
+		'unix/:./gh-5613-replica2.sock',
+	},
+})
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 27eab20c2..f9d5ce1cc 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -44,6 +44,7 @@
     "gh-5435-qsync-clear-synchro-queue-commit-all.test.lua": {},
     "gh-5536-wal-limit.test.lua": {},
     "gh-5566-final-join-synchro.test.lua": {},
+    "gh-5613-bootstrap-prefer-booted.test.lua": {},
     "gh-6032-promote-wal-write.test.lua": {},
     "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
     "gh-6094-rs-uuid-mismatch.test.lua": {},
-- 
2.24.3 (Apple Git-128)


Thread overview: 22+ messages
2021-06-04 23:37 [Tarantool-patches] [PATCH 0/6] Instance join should prefer booted instances Vladislav Shpilevoy via Tarantool-patches
2021-06-04 23:37 ` [Tarantool-patches] [PATCH 1/6] replication: refactor replicaset_leader() Vladislav Shpilevoy via Tarantool-patches
2021-06-10 13:54   ` Serge Petrenko via Tarantool-patches
2021-06-04 23:37 ` [Tarantool-patches] [PATCH 2/6] replication: ballot.is_ro -> is_ro_cfg Vladislav Shpilevoy via Tarantool-patches
2021-06-10 13:56   ` Serge Petrenko via Tarantool-patches
2021-06-04 23:37 ` [Tarantool-patches] [PATCH 3/6] replication: ballot.is_loading -> is_ro Vladislav Shpilevoy via Tarantool-patches
2021-06-10 13:58   ` Serge Petrenko via Tarantool-patches
2021-06-04 23:37 ` [Tarantool-patches] [PATCH 4/6] replication: introduce ballot.is_booted Vladislav Shpilevoy via Tarantool-patches
2021-06-10 14:02   ` Serge Petrenko via Tarantool-patches
2021-06-04 23:37 ` [Tarantool-patches] [PATCH 5/6] replication: use 'score' to find a join-master Vladislav Shpilevoy via Tarantool-patches
2021-06-10 14:06   ` Serge Petrenko via Tarantool-patches
2021-06-10 15:02   ` Cyrill Gorcunov via Tarantool-patches
2021-06-10 20:09     ` Vladislav Shpilevoy via Tarantool-patches
2021-06-10 21:28       ` Cyrill Gorcunov via Tarantool-patches
2021-06-04 23:38 ` Vladislav Shpilevoy via Tarantool-patches [this message]
2021-06-06 17:06   ` [Tarantool-patches] [PATCH 6/6] replication: prefer to join from booted replicas Vladislav Shpilevoy via Tarantool-patches
2021-06-10 14:14   ` Serge Petrenko via Tarantool-patches
2021-06-06 17:03 ` [Tarantool-patches] [PATCH 7/6] raft: test join to a raft cluster Vladislav Shpilevoy via Tarantool-patches
2021-06-06 23:01   ` Vladislav Shpilevoy via Tarantool-patches
2021-06-10 14:17   ` Serge Petrenko via Tarantool-patches
2021-06-10 15:03 ` [Tarantool-patches] [PATCH 0/6] Instance join should prefer booted instances Cyrill Gorcunov via Tarantool-patches
2021-06-11 20:56 ` Vladislav Shpilevoy via Tarantool-patches
