[PATCH] replication: allow rebootstrapping a replica from a read-only master

Vladimir Davydov vdavydov.dev at gmail.com
Tue Feb 6 19:45:24 MSK 2018


If an instance is read-only, an attempt to join a new replica to it
fails with ER_READONLY, because joining a replica to a cluster requires
registering it in the _cluster system space. However, if the replica is
being rebootstrapped with the same uuid (see box.cfg.instance_uuid), its
record is already present in the _cluster space and no write operation
is required. Rebootstrap still fails with the same error, because the
writability check is performed before we look up the replica by uuid.

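Let's rearrange the access checks so that the writability check and the
write privilege check on the _cluster space are performed only if the
replica is not registered yet. This makes it possible to rebootstrap a
replica from a read-only master provided it has the same uuid.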

Closes #3111
---
Branch: gh-3111-replication-allow-rebootstrap-from-ro-master
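
For reviewers, the reordering in box_on_join() can be sketched as the
standalone model below. This is only an illustration, not the code from
box.cc: Instance, ReadOnlyError and on_join are hypothetical names, the
_cluster space is modelled as a plain set of uuid strings, and the
privilege checks are left out.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical model of an instance: the uuids recorded in _cluster
// and the read_only flag. Not the real Tarantool data structures.
struct Instance {
    std::set<std::string> cluster;
    bool read_only = false;
};

// Hypothetical stand-in for the ER_READONLY error.
struct ReadOnlyError : std::runtime_error {
    ReadOnlyError() : std::runtime_error("instance is read-only") {}
};

// Models the patched order of checks: look the replica up first and
// require a writable instance only if a new record has to be added.
// Returns true if a record was written, false if it already existed.
bool on_join(Instance &inst, const std::string &uuid)
{
    assert(!uuid.empty());
    if (inst.cluster.count(uuid) != 0)
        return false; // already registered, no write needed
    if (inst.read_only)
        throw ReadOnlyError(); // a new replica needs a writable master
    inst.cluster.insert(uuid);
    return true;
}
```

With this order, a join with a known uuid succeeds on a read-only
instance, while a join with an unknown uuid still fails.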

 src/box/box.cc                    | 17 ++++++++++---
 test/replication/misc.result      | 52 ++++++++++++++++++++++++++++++++++++++-
 test/replication/misc.test.lua    | 22 ++++++++++++++---
 test/replication/replica_uuid.lua | 11 +++++++++
 4 files changed, 94 insertions(+), 8 deletions(-)
 create mode 100644 test/replication/replica_uuid.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index c33243a8..9d494257 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1129,11 +1129,12 @@ box_register_replica(uint32_t id, const struct tt_uuid *uuid)
 static void
 box_on_join(const tt_uuid *instance_uuid)
 {
-	box_check_writable_xc();
 	struct replica *replica = replica_by_uuid(instance_uuid);
 	if (replica != NULL)
 		return; /* nothing to do - already registered */
 
+	box_check_writable_xc();
+
 	/** Find the largest existing replica id. */
 	struct space *space = space_cache_find_xc(BOX_CLUSTER_ID);
 	struct index *index = index_find_system_xc(space, 0);
@@ -1226,10 +1227,18 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
 
 	/* Check permissions */
 	access_check_universe_xc(PRIV_R);
-	access_check_space_xc(space_cache_find_xc(BOX_CLUSTER_ID), PRIV_W);
 
-	/* Check that we actually can register a new replica */
-	box_check_writable_xc();
+	/*
+	 * Unless already registered, the new replica will be
+	 * added to _cluster space once the initial join stage
+	 * is complete. Fail early if the caller does not have
+	 * appropriate access privileges.
+	 */
+	if (replica_by_uuid(&instance_uuid) == NULL) {
+		box_check_writable_xc();
+		struct space *space = space_cache_find_xc(BOX_CLUSTER_ID);
+		access_check_space_xc(space, PRIV_W);
+	}
 
 	/* Forbid replication with disabled WAL */
 	if (wal_mode() == WAL_NONE) {
diff --git a/test/replication/misc.result b/test/replication/misc.result
index ae26c703..070e4ea8 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -1,6 +1,15 @@
+uuid = require('uuid')
+---
+...
+test_run = require('test_run').new()
+---
+...
+box.schema.user.grant('guest', 'replication')
+---
+...
 -- gh-2991 - Tarantool asserts on box.cfg.replication update if one of
 -- servers is dead
-box.schema.user.grant('guest', 'replication')
+replication_timeout = box.cfg.replication_timeout
 ---
 ...
 box.cfg{replication_timeout=0.05, replication={}}
@@ -11,6 +20,47 @@ box.cfg{replication = {'127.0.0.1:12345', box.cfg.listen}}
 - error: 'Incorrect value for option ''replication'': failed to connect to one or
     more replicas'
 ...
+box.cfg{replication_timeout = replication_timeout}
+---
+...
+-- gh-3111 - Allow to rebootstrap a replica from a read-only master
+replica_uuid = uuid.new()
+---
+...
+test_run:cmd('create server test with rpl_master=default, script="replication/replica_uuid.lua"')
+---
+- true
+...
+test_run:cmd(string.format('start server test with args="%s"', replica_uuid))
+---
+- true
+...
+test_run:cmd('stop server test')
+---
+- true
+...
+test_run:cmd('cleanup server test')
+---
+- true
+...
+box.cfg{read_only = true}
+---
+...
+test_run:cmd(string.format('start server test with args="%s"', replica_uuid))
+---
+- true
+...
+test_run:cmd('stop server test')
+---
+- true
+...
+test_run:cmd('cleanup server test')
+---
+- true
+...
+box.cfg{read_only = false}
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index 04967a24..d4f714d9 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -1,9 +1,25 @@
--- gh-2991 - Tarantool asserts on box.cfg.replication update if one of
--- servers is dead
+uuid = require('uuid')
+test_run = require('test_run').new()
+
 box.schema.user.grant('guest', 'replication')
 
+-- gh-2991 - Tarantool asserts on box.cfg.replication update if one of
+-- servers is dead
+replication_timeout = box.cfg.replication_timeout
 box.cfg{replication_timeout=0.05, replication={}}
-
 box.cfg{replication = {'127.0.0.1:12345', box.cfg.listen}}
+box.cfg{replication_timeout = replication_timeout}
+
+-- gh-3111 - Allow to rebootstrap a replica from a read-only master
+replica_uuid = uuid.new()
+test_run:cmd('create server test with rpl_master=default, script="replication/replica_uuid.lua"')
+test_run:cmd(string.format('start server test with args="%s"', replica_uuid))
+test_run:cmd('stop server test')
+test_run:cmd('cleanup server test')
+box.cfg{read_only = true}
+test_run:cmd(string.format('start server test with args="%s"', replica_uuid))
+test_run:cmd('stop server test')
+test_run:cmd('cleanup server test')
+box.cfg{read_only = false}
 
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/replica_uuid.lua b/test/replication/replica_uuid.lua
new file mode 100644
index 00000000..f92d3119
--- /dev/null
+++ b/test/replication/replica_uuid.lua
@@ -0,0 +1,11 @@
+#!/usr/bin/env tarantool
+
+box.cfg({
+    instance_uuid       = arg[1],
+    listen              = os.getenv("LISTEN"),
+    replication         = os.getenv("MASTER"),
+    memtx_memory        = 107374182,
+})
+
+require('console').listen(os.getenv('ADMIN'))
+
-- 
2.11.0



