[PATCH v4 0/3] Replica rejoin

Tarantool development patches archive
 help / color / mirror / Atom feed

* [PATCH v4 0/3] Replica rejoin
@ 2018-07-21 12:38 Vladimir Davydov
  2018-07-21 12:38 ` [PATCH v4 1/3] replication: rebootstrap instance on startup if it fell behind Vladimir Davydov
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Vladimir Davydov @ 2018-07-21 12:38 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

After this patch set is applied, an instance will try to detect if
it fell too much behind its peers in the cluster and so needs to be
rebootstrapped. If it does, it will skip local recovery and instead
proceed to bootstrap from a remote master. Old files (xlog, snap)
are not deleted during rebootstrap. They will be removed by gc as
usual.

https://github.com/tarantool/tarantool/issues/461
https://github.com/tarantool/tarantool/commits/dv/gh-461-replica-rejoin

Changes in v4:
 - Rebase on top of the latest 1.10 and remove merged patches.
 - Log everything that affects rebootstrap decision making.
 - Rebootstrap an instance only if it can't follow *all* masters,
   not just *any* of them, as it used to be.

v3: https://www.freelists.org/post/tarantool-patches/PATCH-v3-0011-Replica-rejoin

Changes in v3:
 - Remove merged patches, add some new ones.
 - Rebase on top of the latest 1.10: this required patching gc to make
   it track vclocks instead of signatures so that it could report the
   vclock of the oldest xlog stored on the instance.
 - Follow-up on the recently committed patch for recovery subsystem: add
   some comments and remove double scanning of the WAL directory.
 - Introduce a new IPROTO command, IPROTO_REQUEST_STATUS, to be used
   instead of IPROTO_REQUEST_VOTE; send a map in reply to this command.
   Rationale: a map is more flexible and can be extended. In particular,
   we can use the very same message for inquiring the oldest vclock
   stored on the master to detect if a replica needs to be rejoined,
   instead of introducing a new IPROTO command, as we did in v2.
 - Do NOT rebootstrap a replica if it has some data that is absent on
   the master. Rationale: we don't want to lose ANY data by rejoining a
   replica; besides, if a replica's vclock is incomparable with the
   master's, xdir_scan may break.

v2: https://www.freelists.org/post/tarantool-patches/PATCH-v2-0011-Replica-rejoin

Changes in v2:
 - Implement rebootstrap support for vinyl engine.
 - Call recover_remaining_wals() explicitly after recovery_stop_local()
   as suggested by @kostja.
 - Add comment to memtx_engine_new() explaining why we need to init
   INSTANCE_UUID before proceeding to local recovery.

v1: https://www.freelists.org/post/tarantool-patches/RFC-PATCH-0012-Replica-rejoin

Vladimir Davydov (3):
  replication: rebootstrap instance on startup if it fell behind
  vinyl: simplify vylog recovery from backup
  vinyl: implement rebootstrap support

 src/box/box.cc                           |   9 ++
 src/box/relay.cc                         |   3 +
 src/box/replication.cc                   |  59 +++++++
 src/box/replication.h                    |   9 ++
 src/box/vy_log.c                         | 190 +++++++++++++++++------
 src/box/vy_log.h                         |  34 ++++
 src/errinj.h                             |   1 +
 test/box/errinj.result                   |   6 +-
 test/replication/replica_rejoin.result   | 250 ++++++++++++++++++++++++++++++
 test/replication/replica_rejoin.test.lua |  91 +++++++++++
 test/vinyl/replica_rejoin.lua            |  13 ++
 test/vinyl/replica_rejoin.result         | 257 +++++++++++++++++++++++++++++++
 test/vinyl/replica_rejoin.test.lua       |  88 +++++++++++
 test/vinyl/suite.ini                     |   2 +-
 14 files changed, 965 insertions(+), 47 deletions(-)
 create mode 100644 test/replication/replica_rejoin.result
 create mode 100644 test/replication/replica_rejoin.test.lua
 create mode 100644 test/vinyl/replica_rejoin.lua
 create mode 100644 test/vinyl/replica_rejoin.result
 create mode 100644 test/vinyl/replica_rejoin.test.lua

-- 
2.11.0

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v4 1/3] replication: rebootstrap instance on startup if it fell behind
  2018-07-21 12:38 [PATCH v4 0/3] Replica rejoin Vladimir Davydov
@ 2018-07-21 12:38 ` Vladimir Davydov
  2018-07-21 12:38 ` [PATCH v4 2/3] vinyl: simplify vylog recovery from backup Vladimir Davydov
  2018-07-21 12:38 ` [PATCH v4 3/3] vinyl: implement rebootstrap support Vladimir Davydov
  2 siblings, 0 replies; 4+ messages in thread
From: Vladimir Davydov @ 2018-07-21 12:38 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

If a replica fell too much behind its peers in the cluster and xlog
files needed for it to get up to speed have been removed, it won't be
able to proceed without rebootstrap. This patch makes the recovery
procedure detect such cases and initiate rebootstrap procedure if
necessary.

Note, rebootstrap is currently only supported by memtx engine. If there
are vinyl spaces on the replica, rebootstrap will fail. This is fixed by
the following patches.

Part of #461
---
 src/box/box.cc                           |   9 ++
 src/box/replication.cc                   |  59 ++++++++
 src/box/replication.h                    |   9 ++
 test/replication/replica_rejoin.result   | 247 +++++++++++++++++++++++++++++++
 test/replication/replica_rejoin.test.lua |  92 ++++++++++++
 test/replication/suite.cfg               |   1 +
 6 files changed, 417 insertions(+)
 create mode 100644 test/replication/replica_rejoin.result
 create mode 100644 test/replication/replica_rejoin.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index abb55e96..bf994835 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1797,6 +1797,9 @@ bootstrap(const struct tt_uuid *instance_uuid,
 /**
  * Recover the instance from the local directory.
  * Enter hot standby if the directory is locked.
+ * Invoke rebootstrap if the instance fell too much
+ * behind its peers in the replica set and needs
+ * to be rebootstrapped.
  */
 static void
 local_recovery(const struct tt_uuid *instance_uuid,
@@ -1832,6 +1835,12 @@ local_recovery(const struct tt_uuid *instance_uuid,
 	if (wal_dir_lock >= 0) {
 		box_listen();
 		box_sync_replication(replication_connect_timeout, false);
+
+		struct replica *master;
+		if (replicaset_needs_rejoin(&master)) {
+			say_crit("replica is too old, initiating rebootstrap");
+			return bootstrap_from_master(master);
+		}
 	}
 
 	/*
diff --git a/src/box/replication.cc b/src/box/replication.cc
index c4d6e6f2..bf7b8c22 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -41,6 +41,7 @@
 #include "error.h"
 #include "relay.h"
 #include "vclock.h" /* VCLOCK_MAX */
+#include "sio.h"
 
 uint32_t instance_id = REPLICA_ID_NIL;
 struct tt_uuid INSTANCE_UUID;
@@ -625,6 +626,64 @@ error:
 		  "failed to connect to one or more replicas");
 }
 
+bool
+replicaset_needs_rejoin(struct replica **master)
+{
+	struct replica *leader = NULL;
+	replicaset_foreach(replica) {
+		struct applier *applier = replica->applier;
+		if (applier == NULL)
+			continue;
+
+		const struct ballot *ballot = &applier->ballot;
+		if (vclock_compare(&ballot->gc_vclock,
+				   &replicaset.vclock) <= 0) {
+			/*
+			 * There's at least one master that still stores
+			 * WALs needed by this instance. Proceed to local
+			 * recovery.
+			 */
+			return false;
+		}
+
+		const char *addr_str = sio_strfaddr(&applier->addr,
+						applier->addr_len);
+		char *local_vclock_str = vclock_to_string(&replicaset.vclock);
+		char *remote_vclock_str = vclock_to_string(&ballot->vclock);
+		char *gc_vclock_str = vclock_to_string(&ballot->gc_vclock);
+
+		say_info("can't follow %s: required %s available %s",
+			 addr_str, local_vclock_str, gc_vclock_str);
+
+		if (vclock_compare(&replicaset.vclock, &ballot->vclock) > 0) {
+			/*
+			 * Replica has some rows that are not present on
+			 * the master. Don't rebootstrap as we don't want
+			 * to lose any data.
+			 */
+			say_info("can't rebootstrap from %s: "
+				 "replica has local rows: local %s remote %s",
+				 addr_str, local_vclock_str, remote_vclock_str);
+			goto next;
+		}
+
+		/* Prefer a master with the max vclock. */
+		if (leader == NULL ||
+		    vclock_sum(&ballot->vclock) >
+		    vclock_sum(&leader->applier->ballot.vclock))
+			leader = replica;
+next:
+		free(local_vclock_str);
+		free(remote_vclock_str);
+		free(gc_vclock_str);
+	}
+	if (leader == NULL)
+		return false;
+
+	*master = leader;
+	return true;
+}
+
 void
 replicaset_follow(void)
 {
diff --git a/src/box/replication.h b/src/box/replication.h
index fdf995c3..e8b391af 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -360,6 +360,15 @@ replicaset_connect(struct applier **appliers, int count,
 		   double timeout, bool connect_all);
 
 /**
+ * Check if the current instance fell too much behind its
+ * peers in the replica set and needs to be rebootstrapped.
+ * If it does, return true and set @master to the instance
+ * to use for rebootstrap, otherwise return false.
+ */
+bool
+replicaset_needs_rejoin(struct replica **master);
+
+/**
  * Resume all appliers registered with the replica set.
  */
 void
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
new file mode 100644
index 00000000..b7563ed9
--- /dev/null
+++ b/test/replication/replica_rejoin.result
@@ -0,0 +1,247 @@
+env = require('test_run')
+---
+...
+test_run = env.new()
+---
+...
+-- Cleanup the instance to remove vylog files left from previous
+-- tests, since vinyl doesn't support rebootstrap yet.
+test_run:cmd('restart server default with cleanup=1')
+--
+-- gh-461: check that a replica refetches the last checkpoint
+-- in case it fell behind the master.
+--
+box.schema.user.grant('guest', 'replication')
+---
+...
+_ = box.schema.space.create('test')
+---
+...
+_ = box.space.test:create_index('pk')
+---
+...
+_ = box.space.test:insert{1}
+---
+...
+_ = box.space.test:insert{2}
+---
+...
+_ = box.space.test:insert{3}
+---
+...
+-- Join a replica, then stop it.
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+---
+- true
+...
+test_run:cmd("start server replica")
+---
+- true
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.info.replication[1].upstream.status == 'follow' or box.info
+---
+- true
+...
+box.space.test:select()
+---
+- - [1]
+  - [2]
+  - [3]
+...
+test_run:cmd("switch default")
+---
+- true
+...
+test_run:cmd("stop server replica")
+---
+- true
+...
+-- Restart the server to purge the replica from
+-- the garbage collection state.
+test_run:cmd("restart server default")
+-- Make some checkpoints to remove old xlogs.
+checkpoint_count = box.cfg.checkpoint_count
+---
+...
+box.cfg{checkpoint_count = 1}
+---
+...
+_ = box.space.test:delete{1}
+---
+...
+_ = box.space.test:insert{10}
+---
+...
+box.snapshot()
+---
+- ok
+...
+_ = box.space.test:delete{2}
+---
+...
+_ = box.space.test:insert{20}
+---
+...
+box.snapshot()
+---
+- ok
+...
+_ = box.space.test:delete{3}
+---
+...
+_ = box.space.test:insert{30}
+---
+...
+fio = require('fio')
+---
+...
+#fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.xlog')) -- 1
+---
+- 1
+...
+box.cfg{checkpoint_count = checkpoint_count}
+---
+...
+-- Restart the replica. Since xlogs have been removed,
+-- it is supposed to rejoin without changing id.
+test_run:cmd("start server replica")
+---
+- true
+...
+box.info.replication[2].downstream.vclock ~= nil or box.info
+---
+- true
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.info.replication[1].upstream.status == 'follow' or box.info
+---
+- true
+...
+box.space.test:select()
+---
+- - [10]
+  - [20]
+  - [30]
+...
+test_run:cmd("switch default")
+---
+- true
+...
+-- Make sure the replica follows new changes.
+for i = 10, 30, 10 do box.space.test:update(i, {{'!', 1, i}}) end
+---
+...
+vclock = test_run:get_vclock('default')
+---
+...
+_ = test_run:wait_vclock('replica', vclock)
+---
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.space.test:select()
+---
+- - [10, 10]
+  - [20, 20]
+  - [30, 30]
+...
+-- Check that restart works as usual.
+test_run:cmd("restart server replica")
+box.info.replication[1].upstream.status == 'follow' or box.info
+---
+- true
+...
+box.space.test:select()
+---
+- - [10, 10]
+  - [20, 20]
+  - [30, 30]
+...
+-- Check that rebootstrap is NOT initiated unless the replica
+-- is strictly behind the master.
+box.space.test:replace{1, 2, 3} -- bumps LSN on the replica
+---
+- [1, 2, 3]
+...
+test_run:cmd("switch default")
+---
+- true
+...
+test_run:cmd("stop server replica")
+---
+- true
+...
+test_run:cmd("restart server default")
+checkpoint_count = box.cfg.checkpoint_count
+---
+...
+box.cfg{checkpoint_count = 1}
+---
+...
+for i = 1, 3 do box.space.test:delete{i * 10} end
+---
+...
+box.snapshot()
+---
+- ok
+...
+for i = 1, 3 do box.space.test:insert{i * 100} end
+---
+...
+fio = require('fio')
+---
+...
+#fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.xlog')) -- 1
+---
+- 1
+...
+box.cfg{checkpoint_count = checkpoint_count}
+---
+...
+test_run:cmd("start server replica")
+---
+- true
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.info.status -- orphan
+---
+- orphan
+...
+box.space.test:select()
+---
+- - [1, 2, 3]
+  - [10, 10]
+  - [20, 20]
+  - [30, 30]
+...
+-- Cleanup.
+test_run:cmd("switch default")
+---
+- true
+...
+test_run:cmd("stop server replica")
+---
+- true
+...
+test_run:cmd("cleanup server replica")
+---
+- true
+...
+box.space.test:drop()
+---
+...
+box.schema.user.revoke('guest', 'replication')
+---
+...
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
new file mode 100644
index 00000000..dfcb79cf
--- /dev/null
+++ b/test/replication/replica_rejoin.test.lua
@@ -0,0 +1,92 @@
+env = require('test_run')
+test_run = env.new()
+
+-- Cleanup the instance to remove vylog files left from previous
+-- tests, since vinyl doesn't support rebootstrap yet.
+test_run:cmd('restart server default with cleanup=1')
+
+--
+-- gh-461: check that a replica refetches the last checkpoint
+-- in case it fell behind the master.
+--
+box.schema.user.grant('guest', 'replication')
+_ = box.schema.space.create('test')
+_ = box.space.test:create_index('pk')
+_ = box.space.test:insert{1}
+_ = box.space.test:insert{2}
+_ = box.space.test:insert{3}
+
+-- Join a replica, then stop it.
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+test_run:cmd("start server replica")
+test_run:cmd("switch replica")
+box.info.replication[1].upstream.status == 'follow' or box.info
+box.space.test:select()
+test_run:cmd("switch default")
+test_run:cmd("stop server replica")
+
+-- Restart the server to purge the replica from
+-- the garbage collection state.
+test_run:cmd("restart server default")
+
+-- Make some checkpoints to remove old xlogs.
+checkpoint_count = box.cfg.checkpoint_count
+box.cfg{checkpoint_count = 1}
+_ = box.space.test:delete{1}
+_ = box.space.test:insert{10}
+box.snapshot()
+_ = box.space.test:delete{2}
+_ = box.space.test:insert{20}
+box.snapshot()
+_ = box.space.test:delete{3}
+_ = box.space.test:insert{30}
+fio = require('fio')
+#fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.xlog')) -- 1
+box.cfg{checkpoint_count = checkpoint_count}
+
+-- Restart the replica. Since xlogs have been removed,
+-- it is supposed to rejoin without changing id.
+test_run:cmd("start server replica")
+box.info.replication[2].downstream.vclock ~= nil or box.info
+test_run:cmd("switch replica")
+box.info.replication[1].upstream.status == 'follow' or box.info
+box.space.test:select()
+test_run:cmd("switch default")
+
+-- Make sure the replica follows new changes.
+for i = 10, 30, 10 do box.space.test:update(i, {{'!', 1, i}}) end
+vclock = test_run:get_vclock('default')
+_ = test_run:wait_vclock('replica', vclock)
+test_run:cmd("switch replica")
+box.space.test:select()
+
+-- Check that restart works as usual.
+test_run:cmd("restart server replica")
+box.info.replication[1].upstream.status == 'follow' or box.info
+box.space.test:select()
+
+-- Check that rebootstrap is NOT initiated unless the replica
+-- is strictly behind the master.
+box.space.test:replace{1, 2, 3} -- bumps LSN on the replica
+test_run:cmd("switch default")
+test_run:cmd("stop server replica")
+test_run:cmd("restart server default")
+checkpoint_count = box.cfg.checkpoint_count
+box.cfg{checkpoint_count = 1}
+for i = 1, 3 do box.space.test:delete{i * 10} end
+box.snapshot()
+for i = 1, 3 do box.space.test:insert{i * 100} end
+fio = require('fio')
+#fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.xlog')) -- 1
+box.cfg{checkpoint_count = checkpoint_count}
+test_run:cmd("start server replica")
+test_run:cmd("switch replica")
+box.info.status -- orphan
+box.space.test:select()
+
+-- Cleanup.
+test_run:cmd("switch default")
+test_run:cmd("stop server replica")
+test_run:cmd("cleanup server replica")
+box.space.test:drop()
+box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 95e94e5a..2b609f16 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -6,6 +6,7 @@
     "wal_off.test.lua": {},
     "hot_standby.test.lua": {},
     "rebootstrap.test.lua": {},
+    "replica_rejoin.test.lua": {},
     "*": {
         "memtx": {"engine": "memtx"},
         "vinyl": {"engine": "vinyl"}
-- 
2.11.0

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v4 2/3] vinyl: simplify vylog recovery from backup
  2018-07-21 12:38 [PATCH v4 0/3] Replica rejoin Vladimir Davydov
  2018-07-21 12:38 ` [PATCH v4 1/3] replication: rebootstrap instance on startup if it fell behind Vladimir Davydov
@ 2018-07-21 12:38 ` Vladimir Davydov
  2018-07-21 12:38 ` [PATCH v4 3/3] vinyl: implement rebootstrap support Vladimir Davydov
  2 siblings, 0 replies; 4+ messages in thread
From: Vladimir Davydov @ 2018-07-21 12:38 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

Since we don't create snapshot files for vylog, but instead append
records written after checkpoint to the same file, we have to use the
previous vylog file for backup (see vy_log_backup_path()). So when
recovering from a backup we need to rotate the last vylog to keep vylog
and checkpoint signatures in sync. Currently, we do it on recovery
completion and we use vy_log_create() instead of vy_log_rotate() for it.
This is done so that we can reuse the context that was used for recovery
instead of rereading vylog for rotation. Actually, there's no point in
this micro-optimization, because we rotate vylog only when recovering
from a backup. Let's remove it and use vy_log_rotate() for this.

Needed for #461
---
 src/box/vy_log.c | 59 +++++++++++++++++++++-----------------------------------
 1 file changed, 22 insertions(+), 37 deletions(-)

diff --git a/src/box/vy_log.c b/src/box/vy_log.c
index c20f8038..10648106 100644
--- a/src/box/vy_log.c
+++ b/src/box/vy_log.c
@@ -184,6 +184,12 @@ static int
 vy_recovery_process_record(struct vy_recovery *recovery,
 			   const struct vy_log_record *record);
 
+static int
+vy_log_create(const struct vclock *vclock, struct vy_recovery *recovery);
+
+int
+vy_log_rotate(const struct vclock *vclock);
+
 /**
  * Return the name of the vylog file that has the given signature.
  */
@@ -883,10 +889,11 @@ vy_log_begin_recovery(const struct vclock *vclock)
 	if (xdir_scan(&vy_log.dir) < 0 && errno != ENOENT)
 		return NULL;
 
-	struct vclock vy_log_vclock;
-	vclock_create(&vy_log_vclock);
-	if (xdir_last_vclock(&vy_log.dir, &vy_log_vclock) >= 0 &&
-	    vclock_compare(&vy_log_vclock, vclock) > 0) {
+	if (xdir_last_vclock(&vy_log.dir, &vy_log.last_checkpoint) < 0)
+		vclock_copy(&vy_log.last_checkpoint, vclock);
+
+	int cmp = vclock_compare(&vy_log.last_checkpoint, vclock);
+	if (cmp > 0) {
 		/*
 		 * Last vy_log log is newer than the last snapshot.
 		 * This can't normally happen, as vy_log is rotated
@@ -896,21 +903,27 @@ vy_log_begin_recovery(const struct vclock *vclock)
 		diag_set(ClientError, ER_MISSING_SNAPSHOT);
 		return NULL;
 	}
+	if (cmp < 0) {
+		/*
+		 * Last vy_log log is older than the last snapshot.
+		 * This happens if we are recovering from a backup.
+		 * Rotate the log to keep its signature in sync with
+		 * checkpoint.
+		 */
+		if (vy_log_rotate(vclock) != 0)
+			return NULL;
+	}
 
 	struct vy_recovery *recovery;
-	recovery = vy_recovery_new(vclock_sum(&vy_log_vclock), 0);
+	recovery = vy_recovery_new(vclock_sum(&vy_log.last_checkpoint), 0);
 	if (recovery == NULL)
 		return NULL;
 
 	vy_log.next_id = recovery->max_id + 1;
 	vy_log.recovery = recovery;
-	vclock_copy(&vy_log.last_checkpoint, vclock);
 	return recovery;
 }
 
-static int
-vy_log_create(const struct vclock *vclock, struct vy_recovery *recovery);
-
 int
 vy_log_end_recovery(void)
 {
@@ -931,34 +944,6 @@ vy_log_end_recovery(void)
 		return -1;
 	}
 
-	/*
-	 * On backup we copy files corresponding to the most recent
-	 * checkpoint. Since vy_log does not create snapshots of its log
-	 * files, but instead appends records written after checkpoint
-	 * to the most recent log file, the signature of the vy_log file
-	 * corresponding to the last checkpoint equals the signature
-	 * of the previous checkpoint. So upon successful recovery
-	 * from a backup we need to rotate the log to keep checkpoint
-	 * and vy_log signatures in sync.
-	 */
-	struct vclock *vclock = vclockset_last(&vy_log.dir.index);
-	if (vclock == NULL ||
-	    vclock_compare(vclock, &vy_log.last_checkpoint) != 0) {
-		vclock = malloc(sizeof(*vclock));
-		if (vclock == NULL) {
-			diag_set(OutOfMemory, sizeof(*vclock),
-				 "malloc", "struct vclock");
-			return -1;
-		}
-		vclock_copy(vclock, &vy_log.last_checkpoint);
-		xdir_add_vclock(&vy_log.dir, vclock);
-		if (vy_log_create(vclock, vy_log.recovery) < 0) {
-			diag_log();
-			say_error("failed to write `%s'",
-				  vy_log_filename(vclock_sum(vclock)));
-			return -1;
-		}
-	}
 	xdir_collect_inprogress(&vy_log.dir);
 	vy_log.recovery = NULL;
 	return 0;
-- 
2.11.0

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v4 3/3] vinyl: implement rebootstrap support
  2018-07-21 12:38 [PATCH v4 0/3] Replica rejoin Vladimir Davydov
  2018-07-21 12:38 ` [PATCH v4 1/3] replication: rebootstrap instance on startup if it fell behind Vladimir Davydov
  2018-07-21 12:38 ` [PATCH v4 2/3] vinyl: simplify vylog recovery from backup Vladimir Davydov
@ 2018-07-21 12:38 ` Vladimir Davydov
  2 siblings, 0 replies; 4+ messages in thread
From: Vladimir Davydov @ 2018-07-21 12:38 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

If vy_log_bootstrap() finds a vylog file in the vinyl directory, it
assumes it has to be rebootstrapped and calls vy_log_rebootstrap().
The latter scans the old vylog file to find the max vinyl object id,
from which it will start numbering objects created during rebootstrap to
avoid conflicts with old objects, then it writes VY_LOG_REBOOTSTRAP
record to the old vylog to denote the beginning of a rebootstrap
section. After that initial join proceeds as usual, writing information
about new objects to the old vylog file after VY_LOG_REBOOTSTRAP marker.
Upon successful rebootstrap completion, checkpoint, which is always
called right after bootstrap, rotates the old vylog and marks all
objects created before the VY_LOG_REBOOTSTRAP marker as dropped in the
new vylog. The old objects will be purged by the garbage collector as
usual.

In case rebootstrap fails and checkpoint never happens, local recovery
writes VY_LOG_ABORT_REBOOTSTRAP record to the vylog. This marker
indicates that the rebootstrap attempt failed and all objects created
during rebootstrap should be discarded. They will be purged by the
garbage collector on checkpoint. Thus even if rebootstrap fails, it is
possible to recover the database to the state that existed right before
a failed rebootstrap attempt.

Closes #461
---
 src/box/relay.cc                         |   3 +
 src/box/vy_log.c                         | 133 +++++++++++++++-
 src/box/vy_log.h                         |  34 ++++
 src/errinj.h                             |   1 +
 test/box/errinj.result                   |   6 +-
 test/replication/replica_rejoin.result   |  11 +-
 test/replication/replica_rejoin.test.lua |   7 +-
 test/replication/suite.cfg               |   1 -
 test/vinyl/replica_rejoin.lua            |  13 ++
 test/vinyl/replica_rejoin.result         | 257 +++++++++++++++++++++++++++++++
 test/vinyl/replica_rejoin.test.lua       |  88 +++++++++++
 test/vinyl/suite.ini                     |   2 +-
 12 files changed, 536 insertions(+), 20 deletions(-)
 create mode 100644 test/vinyl/replica_rejoin.lua
 create mode 100644 test/vinyl/replica_rejoin.result
 create mode 100644 test/vinyl/replica_rejoin.test.lua

diff --git a/src/box/relay.cc b/src/box/relay.cc
index 4cacbc84..05468f20 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -287,6 +287,9 @@ relay_final_join(struct replica *replica, int fd, uint64_t sync,
 	if (rc != 0)
 		diag_raise();
 
+	ERROR_INJECT(ERRINJ_RELAY_FINAL_JOIN,
+		     tnt_raise(ClientError, ER_INJECTION, "relay final join"));
+
 	ERROR_INJECT(ERRINJ_RELAY_FINAL_SLEEP, {
 		while (vclock_compare(stop_vclock, &replicaset.vclock) == 0)
 			fiber_sleep(0.001);
diff --git a/src/box/vy_log.c b/src/box/vy_log.c
index 10648106..3843cad6 100644
--- a/src/box/vy_log.c
+++ b/src/box/vy_log.c
@@ -124,6 +124,8 @@ static const char *vy_log_type_name[] = {
 	[VY_LOG_MODIFY_LSM]		= "modify_lsm",
 	[VY_LOG_FORGET_LSM]		= "forget_lsm",
 	[VY_LOG_PREPARE_LSM]		= "prepare_lsm",
+	[VY_LOG_REBOOTSTRAP]		= "rebootstrap",
+	[VY_LOG_ABORT_REBOOTSTRAP]	= "abort_rebootstrap",
 };
 
 /** Metadata log object. */
@@ -852,17 +854,43 @@ vy_log_next_id(void)
 	return vy_log.next_id++;
 }
 
+/**
+ * If a vylog file already exists, we are doing a rebootstrap:
+ * - Load the vylog to find out the id to start indexing new
+ *   objects with.
+ * - Mark the beginning of a new rebootstrap attempt by writing
+ *   VY_LOG_REBOOTSTRAP record.
+ */
+static int
+vy_log_rebootstrap(void)
+{
+	struct vy_recovery *recovery;
+	recovery = vy_recovery_new(vclock_sum(&vy_log.last_checkpoint),
+				   VY_RECOVERY_ABORT_REBOOTSTRAP);
+	if (recovery == NULL)
+		return -1;
+
+	vy_log.next_id = recovery->max_id + 1;
+	vy_recovery_delete(recovery);
+
+	struct vy_log_record record;
+	vy_log_record_init(&record);
+	record.type = VY_LOG_REBOOTSTRAP;
+	vy_log_tx_begin();
+	vy_log_write(&record);
+	if (vy_log_tx_commit() != 0)
+		return -1;
+
+	return 0;
+}
+
 int
 vy_log_bootstrap(void)
 {
-	/*
-	 * Scan the directory to make sure there is no
-	 * vylog files left from previous setups.
-	 */
 	if (xdir_scan(&vy_log.dir) < 0 && errno != ENOENT)
 		return -1;
-	if (xdir_last_vclock(&vy_log.dir, NULL) >= 0)
-		panic("vinyl directory is not empty");
+	if (xdir_last_vclock(&vy_log.dir, &vy_log.last_checkpoint) >= 0)
+		return vy_log_rebootstrap();
 
 	/* Add initial vclock to the xdir. */
 	struct vclock *vclock = malloc(sizeof(*vclock));
@@ -914,11 +942,29 @@ vy_log_begin_recovery(const struct vclock *vclock)
 			return NULL;
 	}
 
+	/*
+	 * If we are recovering from a vylog that has an unfinished
+	 * rebootstrap section, checkpoint (and hence rebootstrap)
+	 * failed, and we need to mark rebootstrap as aborted.
+	 */
 	struct vy_recovery *recovery;
-	recovery = vy_recovery_new(vclock_sum(&vy_log.last_checkpoint), 0);
+	recovery = vy_recovery_new(vclock_sum(&vy_log.last_checkpoint),
+				   VY_RECOVERY_ABORT_REBOOTSTRAP);
 	if (recovery == NULL)
 		return NULL;
 
+	if (recovery->in_rebootstrap) {
+		struct vy_log_record record;
+		vy_log_record_init(&record);
+		record.type = VY_LOG_ABORT_REBOOTSTRAP;
+		vy_log_tx_begin();
+		vy_log_write(&record);
+		if (vy_log_tx_commit() != 0) {
+			vy_recovery_delete(recovery);
+			return NULL;
+		}
+	}
+
 	vy_log.next_id = recovery->max_id + 1;
 	vy_log.recovery = recovery;
 	return recovery;
@@ -1292,6 +1338,7 @@ vy_recovery_do_create_lsm(struct vy_recovery *recovery, int64_t id,
 	 * before the final version.
 	 */
 	rlist_add_tail_entry(&recovery->lsms, lsm, in_recovery);
+	lsm->in_rebootstrap = recovery->in_rebootstrap;
 	if (recovery->max_id < id)
 		recovery->max_id = id;
 	return lsm;
@@ -1875,6 +1922,42 @@ vy_recovery_delete_slice(struct vy_recovery *recovery, int64_t slice_id)
 }
 
 /**
+ * Mark all LSM trees created during rebootstrap as dropped so
+ * that they will be purged on the next garbage collection.
+ */
+static void
+vy_recovery_do_abort_rebootstrap(struct vy_recovery *recovery)
+{
+	struct vy_lsm_recovery_info *lsm;
+	rlist_foreach_entry(lsm, &recovery->lsms, in_recovery) {
+		if (lsm->in_rebootstrap) {
+			lsm->in_rebootstrap = false;
+			lsm->create_lsn = -1;
+			lsm->modify_lsn = -1;
+			lsm->drop_lsn = 0;
+		}
+	}
+}
+
+/** Handle a VY_LOG_REBOOTSTRAP log record. */
+static void
+vy_recovery_rebootstrap(struct vy_recovery *recovery)
+{
+	if (recovery->in_rebootstrap)
+		vy_recovery_do_abort_rebootstrap(recovery);
+	recovery->in_rebootstrap = true;
+}
+
+/** Handle VY_LOG_ABORT_REBOOTSTRAP record. */
+static void
+vy_recovery_abort_rebootstrap(struct vy_recovery *recovery)
+{
+	if (recovery->in_rebootstrap)
+		vy_recovery_do_abort_rebootstrap(recovery);
+	recovery->in_rebootstrap = false;
+}
+
+/**
  * Update a recovery context with a new log record.
  * Return 0 on success, -1 on failure.
  *
@@ -1885,7 +1968,7 @@ static int
 vy_recovery_process_record(struct vy_recovery *recovery,
 			   const struct vy_log_record *record)
 {
-	int rc;
+	int rc = 0;
 	switch (record->type) {
 	case VY_LOG_PREPARE_LSM:
 		rc = vy_recovery_prepare_lsm(recovery, record->lsm_id,
@@ -1950,6 +2033,12 @@ vy_recovery_process_record(struct vy_recovery *recovery,
 		/* Not used anymore, ignore. */
 		rc = 0;
 		break;
+	case VY_LOG_REBOOTSTRAP:
+		vy_recovery_rebootstrap(recovery);
+		break;
+	case VY_LOG_ABORT_REBOOTSTRAP:
+		vy_recovery_abort_rebootstrap(recovery);
+		break;
 	default:
 		unreachable();
 	}
@@ -1960,6 +2049,26 @@ vy_recovery_process_record(struct vy_recovery *recovery,
 }
 
 /**
+ * Commit the last rebootstrap attempt - drop all objects created
+ * before rebootstrap.
+ */
+static void
+vy_recovery_commit_rebootstrap(struct vy_recovery *recovery)
+{
+	assert(recovery->in_rebootstrap);
+	struct vy_lsm_recovery_info *lsm;
+	rlist_foreach_entry(lsm, &recovery->lsms, in_recovery) {
+		if (!lsm->in_rebootstrap && lsm->drop_lsn < 0) {
+			/*
+			 * The files will be removed when the current
+			 * checkpoint is purged by garbage collector.
+			 */
+			lsm->drop_lsn = vy_log_signature();
+		}
+	}
+}
+
+/**
  * Fill index_id_hash with LSM trees recovered from vylog.
  */
 static int
@@ -2050,6 +2159,7 @@ vy_recovery_new_f(va_list ap)
 	recovery->run_hash = NULL;
 	recovery->slice_hash = NULL;
 	recovery->max_id = -1;
+	recovery->in_rebootstrap = false;
 
 	recovery->index_id_hash = mh_i64ptr_new();
 	recovery->lsm_hash = mh_i64ptr_new();
@@ -2103,6 +2213,13 @@ vy_recovery_new_f(va_list ap)
 
 	xlog_cursor_close(&cursor, false);
 
+	if (recovery->in_rebootstrap) {
+		if ((flags & VY_RECOVERY_ABORT_REBOOTSTRAP) != 0)
+			vy_recovery_do_abort_rebootstrap(recovery);
+		else
+			vy_recovery_commit_rebootstrap(recovery);
+	}
+
 	if (vy_recovery_build_index_id_hash(recovery) != 0)
 		goto fail_free;
 out:
diff --git a/src/box/vy_log.h b/src/box/vy_log.h
index 98cbf6ee..7718d9c6 100644
--- a/src/box/vy_log.h
+++ b/src/box/vy_log.h
@@ -196,6 +196,27 @@ enum vy_log_record_type {
 	 * a VY_LOG_CREATE_LSM record to commit it.
 	 */
 	VY_LOG_PREPARE_LSM		= 15,
+	/**
+	 * This record denotes the beginning of a rebootstrap section.
+	 * A rebootstrap section ends either by another record of this
+	 * type or by VY_LOG_ABORT_REBOOTSTRAP or at the end of the file.
+	 * All objects created between a VY_LOG_REBOOTSTRAP record and
+	 * VY_LOG_ABORT_REBOOTSTRAP or another VY_LOG_REBOOTSTRAP are
+	 * considered to be garbage and marked as dropped on recovery.
+	 *
+	 * We write a record of this type if a vylog file already exists
+	 * at bootstrap time, which means we are going to rebootstrap.
+	 * If rebootstrap succeeds, we rotate the vylog on checkpoint and
+	 * mark all objects written before the last VY_LOG_REBOOTSTRAP
+	 * record as dropped in the rotated vylog. If rebootstrap fails,
+	 * we write VY_LOG_ABORT_REBOOTSTRAP on recovery.
+	 */
+	VY_LOG_REBOOTSTRAP		= 16,
+	/**
+	 * This record is written on recovery if rebootstrap failed.
+	 * See also VY_LOG_REBOOTSTRAP.
+	 */
+	VY_LOG_ABORT_REBOOTSTRAP	= 17,
 
 	vy_log_record_type_MAX
 };
@@ -276,6 +297,12 @@ struct vy_recovery {
 	 * or -1 in case no vinyl objects were recovered.
 	 */
 	int64_t max_id;
+	/**
+	 * Set if we are currently processing a rebootstrap section,
+	 * i.e. we encountered a VY_LOG_REBOOTSTRAP record and haven't
+	 * seen matching VY_LOG_ABORT_REBOOTSTRAP.
+	 */
+	bool in_rebootstrap;
 };
 
 /** LSM tree info stored in a recovery context. */
@@ -326,6 +353,8 @@ struct vy_lsm_recovery_info {
 	 * this one after successful ALTER.
 	 */
 	struct vy_lsm_recovery_info *prepared;
+	/** Set if this LSM tree was created during rebootstrap. */
+	bool in_rebootstrap;
 };
 
 /** Vinyl range info stored in a recovery context. */
@@ -533,6 +562,11 @@ enum vy_recovery_flag {
 	 * of the last checkpoint.
 	 */
 	VY_RECOVERY_LOAD_CHECKPOINT	= 1 << 0,
+	/**
+	 * Consider the last attempt to rebootstrap aborted even if
+	 * there's no VY_LOG_ABORT_REBOOTSTRAP record.
+	 */
+	VY_RECOVERY_ABORT_REBOOTSTRAP	= 1 << 1,
 };
 
 /**
diff --git a/src/errinj.h b/src/errinj.h
index cde58d48..64d13b02 100644
--- a/src/errinj.h
+++ b/src/errinj.h
@@ -97,6 +97,7 @@ struct errinj {
 	_(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE, {.dparam = 0}) \
 	_(ERRINJ_RELAY_REPORT_INTERVAL, ERRINJ_DOUBLE, {.dparam = 0}) \
 	_(ERRINJ_RELAY_FINAL_SLEEP, ERRINJ_BOOL, {.bparam = false}) \
+	_(ERRINJ_RELAY_FINAL_JOIN, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_PORT_DUMP, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_XLOG_GARBAGE, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_XLOG_META, ERRINJ_BOOL, {.bparam = false}) \
diff --git a/test/box/errinj.result b/test/box/errinj.result
index 54b6d578..c6b2bbac 100644
--- a/test/box/errinj.result
+++ b/test/box/errinj.result
@@ -60,13 +60,15 @@ errinj.info()
     state: false
   ERRINJ_WAL_WRITE_DISK:
     state: false
+  ERRINJ_VY_LOG_FILE_RENAME:
+    state: false
   ERRINJ_VY_RUN_WRITE:
     state: false
-  ERRINJ_VY_LOG_FILE_RENAME:
+  ERRINJ_HTTP_RESPONSE_ADD_WAIT:
     state: false
   ERRINJ_VY_LOG_FLUSH_DELAY:
     state: false
-  ERRINJ_HTTP_RESPONSE_ADD_WAIT:
+  ERRINJ_RELAY_FINAL_JOIN:
     state: false
   ERRINJ_SNAP_COMMIT_DELAY:
     state: false
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index b7563ed9..4370fae4 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -4,9 +4,12 @@ env = require('test_run')
 test_run = env.new()
 ---
 ...
--- Cleanup the instance to remove vylog files left from previous
--- tests, since vinyl doesn't support rebootstrap yet.
-test_run:cmd('restart server default with cleanup=1')
+engine = test_run:get_cfg('engine')
+---
+...
+test_run:cleanup_cluster()
+---
+...
 --
 -- gh-461: check that a replica refetches the last checkpoint
 -- in case it fell behind the master.
@@ -14,7 +17,7 @@ test_run:cmd('restart server default with cleanup=1')
 box.schema.user.grant('guest', 'replication')
 ---
 ...
-_ = box.schema.space.create('test')
+_ = box.schema.space.create('test', {engine = engine})
 ---
 ...
 _ = box.space.test:create_index('pk')
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index dfcb79cf..f998f60d 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -1,16 +1,15 @@
 env = require('test_run')
 test_run = env.new()
+engine = test_run:get_cfg('engine')
 
--- Cleanup the instance to remove vylog files left from previous
--- tests, since vinyl doesn't support rebootstrap yet.
-test_run:cmd('restart server default with cleanup=1')
+test_run:cleanup_cluster()
 
 --
 -- gh-461: check that a replica refetches the last checkpoint
 -- in case it fell behind the master.
 --
 box.schema.user.grant('guest', 'replication')
-_ = box.schema.space.create('test')
+_ = box.schema.space.create('test', {engine = engine})
 _ = box.space.test:create_index('pk')
 _ = box.space.test:insert{1}
 _ = box.space.test:insert{2}
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 2b609f16..95e94e5a 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -6,7 +6,6 @@
     "wal_off.test.lua": {},
     "hot_standby.test.lua": {},
     "rebootstrap.test.lua": {},
-    "replica_rejoin.test.lua": {},
     "*": {
         "memtx": {"engine": "memtx"},
         "vinyl": {"engine": "vinyl"}
diff --git a/test/vinyl/replica_rejoin.lua b/test/vinyl/replica_rejoin.lua
new file mode 100644
index 00000000..7cb7e09a
--- /dev/null
+++ b/test/vinyl/replica_rejoin.lua
@@ -0,0 +1,13 @@
+#!/usr/bin/env tarantool
+
+local replication = os.getenv("MASTER")
+if arg[1] == 'disable_replication' then
+    replication = nil
+end
+
+box.cfg({
+    replication     = replication,
+    vinyl_memory    = 1024 * 1024,
+})
+
+require('console').listen(os.getenv('ADMIN'))
diff --git a/test/vinyl/replica_rejoin.result b/test/vinyl/replica_rejoin.result
new file mode 100644
index 00000000..bd5d1ed3
--- /dev/null
+++ b/test/vinyl/replica_rejoin.result
@@ -0,0 +1,257 @@
+env = require('test_run')
+---
+...
+test_run = env.new()
+---
+...
+--
+-- gh-461: check that garbage collection works as expected
+-- after rebootstrap.
+--
+box.schema.user.grant('guest', 'replication')
+---
+...
+_ = box.schema.space.create('test', { id = 9000, engine = 'vinyl' })
+---
+...
+_ = box.space.test:create_index('pk')
+---
+...
+pad = string.rep('x', 15 * 1024)
+---
+...
+for i = 1, 100 do box.space.test:replace{i, pad} end
+---
+...
+box.snapshot()
+---
+- ok
+...
+-- Join a replica. Check its files.
+test_run:cmd("create server replica with rpl_master=default, script='vinyl/replica_rejoin.lua'")
+---
+- true
+...
+test_run:cmd("start server replica")
+---
+- true
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+fio = require('fio')
+---
+...
+fio.chdir(box.cfg.vinyl_dir)
+---
+- true
+...
+fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
+---
+- - 9000/0/00000000000000000002.index
+  - 9000/0/00000000000000000002.run
+  - 9000/0/00000000000000000004.index
+  - 9000/0/00000000000000000004.run
+...
+test_run:cmd("switch default")
+---
+- true
+...
+test_run:cmd("stop server replica")
+---
+- true
+...
+-- Invoke garbage collector on the master.
+test_run:cmd("restart server default")
+checkpoint_count = box.cfg.checkpoint_count
+---
+...
+box.cfg{checkpoint_count = 1}
+---
+...
+box.space.test:delete(1)
+---
+...
+box.snapshot()
+---
+- ok
+...
+box.cfg{checkpoint_count = checkpoint_count}
+---
+...
+-- Rebootstrap the replica. Check that old files are removed
+-- by garbage collector.
+test_run:cmd("start server replica")
+---
+- true
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.cfg{checkpoint_count = 1}
+---
+...
+box.snapshot()
+---
+- ok
+...
+fio = require('fio')
+---
+...
+fio.chdir(box.cfg.vinyl_dir)
+---
+- true
+...
+fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
+---
+- - 9000/0/00000000000000000008.index
+  - 9000/0/00000000000000000008.run
+  - 9000/0/00000000000000000010.index
+  - 9000/0/00000000000000000010.run
+...
+box.space.test:count() -- 99
+---
+- 99
+...
+test_run:cmd("switch default")
+---
+- true
+...
+test_run:cmd("stop server replica")
+---
+- true
+...
+-- Invoke garbage collector on the master.
+test_run:cmd("restart server default")
+checkpoint_count = box.cfg.checkpoint_count
+---
+...
+box.cfg{checkpoint_count = 1}
+---
+...
+box.space.test:delete(2)
+---
+...
+box.snapshot()
+---
+- ok
+...
+box.cfg{checkpoint_count = checkpoint_count}
+---
+...
+-- Make the master fail join after sending data. Check that
+-- files written during failed rebootstrap attempt are removed
+-- by garbage collector.
+box.error.injection.set('ERRINJ_RELAY_FINAL_JOIN', true)
+---
+- ok
+...
+test_run:cmd("start server replica with crash_expected=True") -- fail
+---
+- false
+...
+test_run:cmd("start server replica with crash_expected=True") -- fail again
+---
+- false
+...
+test_run:cmd("start server replica with args='disable_replication'")
+---
+- true
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.cfg{checkpoint_count = 1}
+---
+...
+box.snapshot()
+---
+- ok
+...
+fio = require('fio')
+---
+...
+fio.chdir(box.cfg.vinyl_dir)
+---
+- true
+...
+fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
+---
+- - 9000/0/00000000000000000008.index
+  - 9000/0/00000000000000000008.run
+  - 9000/0/00000000000000000010.index
+  - 9000/0/00000000000000000010.run
+...
+box.space.test:count() -- 99
+---
+- 99
+...
+test_run:cmd("switch default")
+---
+- true
+...
+test_run:cmd("stop server replica")
+---
+- true
+...
+box.error.injection.set('ERRINJ_RELAY_FINAL_JOIN', false)
+---
+- ok
+...
+-- Rebootstrap after several failed attempts and make sure
+-- old files are removed.
+test_run:cmd("start server replica")
+---
+- true
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.cfg{checkpoint_count = 1}
+---
+...
+box.snapshot()
+---
+- ok
+...
+fio = require('fio')
+---
+...
+fio.chdir(box.cfg.vinyl_dir)
+---
+- true
+...
+fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
+---
+- - 9000/0/00000000000000000022.index
+  - 9000/0/00000000000000000022.run
+  - 9000/0/00000000000000000024.index
+  - 9000/0/00000000000000000024.run
+...
+box.space.test:count() -- 98
+---
+- 98
+...
+test_run:cmd("switch default")
+---
+- true
+...
+test_run:cmd("stop server replica")
+---
+- true
+...
+-- Cleanup.
+test_run:cmd("cleanup server replica")
+---
+- true
+...
+box.space.test:drop()
+---
+...
+box.schema.user.revoke('guest', 'replication')
+---
+...
diff --git a/test/vinyl/replica_rejoin.test.lua b/test/vinyl/replica_rejoin.test.lua
new file mode 100644
index 00000000..972b04e5
--- /dev/null
+++ b/test/vinyl/replica_rejoin.test.lua
@@ -0,0 +1,88 @@
+env = require('test_run')
+test_run = env.new()
+
+--
+-- gh-461: check that garbage collection works as expected
+-- after rebootstrap.
+--
+box.schema.user.grant('guest', 'replication')
+_ = box.schema.space.create('test', { id = 9000, engine = 'vinyl' })
+_ = box.space.test:create_index('pk')
+pad = string.rep('x', 15 * 1024)
+for i = 1, 100 do box.space.test:replace{i, pad} end
+box.snapshot()
+
+-- Join a replica. Check its files.
+test_run:cmd("create server replica with rpl_master=default, script='vinyl/replica_rejoin.lua'")
+test_run:cmd("start server replica")
+test_run:cmd("switch replica")
+fio = require('fio')
+fio.chdir(box.cfg.vinyl_dir)
+fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
+test_run:cmd("switch default")
+test_run:cmd("stop server replica")
+
+-- Invoke garbage collector on the master.
+test_run:cmd("restart server default")
+checkpoint_count = box.cfg.checkpoint_count
+box.cfg{checkpoint_count = 1}
+box.space.test:delete(1)
+box.snapshot()
+box.cfg{checkpoint_count = checkpoint_count}
+
+-- Rebootstrap the replica. Check that old files are removed
+-- by garbage collector.
+test_run:cmd("start server replica")
+test_run:cmd("switch replica")
+box.cfg{checkpoint_count = 1}
+box.snapshot()
+fio = require('fio')
+fio.chdir(box.cfg.vinyl_dir)
+fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
+box.space.test:count() -- 99
+test_run:cmd("switch default")
+test_run:cmd("stop server replica")
+
+-- Invoke garbage collector on the master.
+test_run:cmd("restart server default")
+checkpoint_count = box.cfg.checkpoint_count
+box.cfg{checkpoint_count = 1}
+box.space.test:delete(2)
+box.snapshot()
+box.cfg{checkpoint_count = checkpoint_count}
+
+-- Make the master fail join after sending data. Check that
+-- files written during failed rebootstrap attempt are removed
+-- by garbage collector.
+box.error.injection.set('ERRINJ_RELAY_FINAL_JOIN', true)
+test_run:cmd("start server replica with crash_expected=True") -- fail
+test_run:cmd("start server replica with crash_expected=True") -- fail again
+test_run:cmd("start server replica with args='disable_replication'")
+test_run:cmd("switch replica")
+box.cfg{checkpoint_count = 1}
+box.snapshot()
+fio = require('fio')
+fio.chdir(box.cfg.vinyl_dir)
+fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
+box.space.test:count() -- 99
+test_run:cmd("switch default")
+test_run:cmd("stop server replica")
+box.error.injection.set('ERRINJ_RELAY_FINAL_JOIN', false)
+
+-- Rebootstrap after several failed attempts and make sure
+-- old files are removed.
+test_run:cmd("start server replica")
+test_run:cmd("switch replica")
+box.cfg{checkpoint_count = 1}
+box.snapshot()
+fio = require('fio')
+fio.chdir(box.cfg.vinyl_dir)
+fio.glob(fio.pathjoin(box.space.test.id, 0, '*'))
+box.space.test:count() -- 98
+test_run:cmd("switch default")
+test_run:cmd("stop server replica")
+
+-- Cleanup.
+test_run:cmd("cleanup server replica")
+box.space.test:drop()
+box.schema.user.revoke('guest', 'replication')
diff --git a/test/vinyl/suite.ini b/test/vinyl/suite.ini
index ca964289..b9dae380 100644
--- a/test/vinyl/suite.ini
+++ b/test/vinyl/suite.ini
@@ -2,7 +2,7 @@
 core = tarantool
 description = vinyl integration tests
 script = vinyl.lua
-release_disabled = errinj.test.lua errinj_gc.test.lua errinj_vylog.test.lua partial_dump.test.lua quota_timeout.test.lua recovery_quota.test.lua
+release_disabled = errinj.test.lua errinj_gc.test.lua errinj_vylog.test.lua partial_dump.test.lua quota_timeout.test.lua recovery_quota.test.lua replica_rejoin.test.lua
 config = suite.cfg
 lua_libs = suite.lua stress.lua large.lua txn_proxy.lua ../box/lua/utils.lua
 use_unix_sockets = True
-- 
2.11.0

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-07-21 12:38 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-21 12:38 [PATCH v4 0/3] Replica rejoin Vladimir Davydov
2018-07-21 12:38 ` [PATCH v4 1/3] replication: rebootstrap instance on startup if it fell behind Vladimir Davydov
2018-07-21 12:38 ` [PATCH v4 2/3] vinyl: simplify vylog recovery from backup Vladimir Davydov
2018-07-21 12:38 ` [PATCH v4 3/3] vinyl: implement rebootstrap support Vladimir Davydov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox