From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: kostja@tarantool.org
Cc: tarantool-patches@freelists.org
Subject: [PATCH v3 07/11] replication: rebootstrap instance on startup if it fell behind
Date: Sat, 14 Jul 2018 23:49:22 +0300
Message-ID: <64016c63c3727e5df2e4495fe1de52eb8ca5d2eb.1531598427.git.vdavydov.dev@gmail.com>
In-Reply-To: <cover.1531598427.git.vdavydov.dev@gmail.com>

If a replica has fallen too far behind its peers in the cluster and the
xlog files it needs to catch up have been removed, it cannot proceed
without a rebootstrap. This patch makes the recovery procedure detect
such cases and initiate the rebootstrap procedure if necessary.

Note that rebootstrap is currently supported only by the memtx engine.
If there are vinyl spaces on the replica, rebootstrap will fail. This is
fixed by the following patches.

Part of #461
---
 src/box/box.cc                           |   9 ++
 src/box/replication.cc                   |  27 ++++
 src/box/replication.h                    |   9 ++
 test/replication/replica_rejoin.result   | 247 +++++++++++++++++++++++++++++++
 test/replication/replica_rejoin.test.lua |  92 ++++++++++++
 test/replication/suite.cfg               |   1 +
 6 files changed, 385 insertions(+)
 create mode 100644 test/replication/replica_rejoin.result
 create mode 100644 test/replication/replica_rejoin.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index b629a4d8..baf30fce 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1797,6 +1797,9 @@ bootstrap(const struct tt_uuid *instance_uuid,
 /**
  * Recover the instance from the local directory.
  * Enter hot standby if the directory is locked.
+ * Invoke rebootstrap if the instance fell too much
+ * behind its peers in the replica set and needs
+ * to be rebootstrapped.
  */
 static void
 local_recovery(const struct tt_uuid *instance_uuid,
@@ -1832,6 +1835,12 @@ local_recovery(const struct tt_uuid *instance_uuid,
 	if (wal_dir_lock >= 0) {
 		box_listen();
 		box_sync_replication(replication_connect_timeout, false);
+
+		struct replica *master;
+		if (replicaset_needs_rejoin(&master)) {
+			say_info("replica is too old, initiating rejoin");
+			return bootstrap_from_master(master);
+		}
 	}
 
 	/*
diff --git a/src/box/replication.cc b/src/box/replication.cc
index f12244c9..d61a984f 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -625,6 +625,33 @@ error:
 		  "failed to connect to one or more replicas");
 }
 
+bool
+replicaset_needs_rejoin(struct replica **master)
+{
+	replicaset_foreach(replica) {
+		/*
+		 * Rebootstrap this instance from a master if:
+		 * - the oldest vclock stored on the master is greater
+		 *   than or incomparable with the instance vclock
+		 *   (so that the instance can't follow the master) and
+		 * - the instance is strictly behind the master (so
+		 *   that we won't lose any data by rebootstrapping
+		 *   this instance)
+		 */
+		struct applier *applier = replica->applier;
+		if (applier != NULL &&
+		    vclock_compare(&applier->remote_status.gc_vclock,
+				   &replicaset.vclock) > 0 &&
+		    vclock_compare(&replicaset.vclock,
+				   &applier->remote_status.vclock) < 0) {
+			*master = replica;
+			return true;
+		}
+	}
+	*master = NULL;
+	return false;
+}
+
 void
 replicaset_follow(void)
 {
diff --git a/src/box/replication.h b/src/box/replication.h
index fdf995c3..e8b391af 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -360,6 +360,15 @@ replicaset_connect(struct applier **appliers, int count,
 		   double timeout, bool connect_all);
 
 /**
+ * Check if the current instance fell too much behind its
+ * peers in the replica set and needs to be rebootstrapped.
+ * If it does, return true and set @master to the instance
+ * to use for rebootstrap, otherwise return false.
+ */
+bool
+replicaset_needs_rejoin(struct replica **master);
+
+/**
  * Resume all appliers registered with the replica set.
  */
 void
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
new file mode 100644
index 00000000..b7563ed9
--- /dev/null
+++ b/test/replication/replica_rejoin.result
@@ -0,0 +1,247 @@
+env = require('test_run')
+---
+...
+test_run = env.new()
+---
+...
+-- Cleanup the instance to remove vylog files left from previous
+-- tests, since vinyl doesn't support rebootstrap yet.
+test_run:cmd('restart server default with cleanup=1')
+--
+-- gh-461: check that a replica refetches the last checkpoint
+-- in case it fell behind the master.
+--
+box.schema.user.grant('guest', 'replication')
+---
+...
+_ = box.schema.space.create('test')
+---
+...
+_ = box.space.test:create_index('pk')
+---
+...
+_ = box.space.test:insert{1}
+---
+...
+_ = box.space.test:insert{2}
+---
+...
+_ = box.space.test:insert{3}
+---
+...
+-- Join a replica, then stop it.
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+---
+- true
+...
+test_run:cmd("start server replica")
+---
+- true
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.info.replication[1].upstream.status == 'follow' or box.info
+---
+- true
+...
+box.space.test:select()
+---
+- - [1]
+  - [2]
+  - [3]
+...
+test_run:cmd("switch default")
+---
+- true
+...
+test_run:cmd("stop server replica")
+---
+- true
+...
+-- Restart the server to purge the replica from
+-- the garbage collection state.
+test_run:cmd("restart server default")
+-- Make some checkpoints to remove old xlogs.
+checkpoint_count = box.cfg.checkpoint_count
+---
+...
+box.cfg{checkpoint_count = 1}
+---
+...
+_ = box.space.test:delete{1}
+---
+...
+_ = box.space.test:insert{10}
+---
+...
+box.snapshot()
+---
+- ok
+...
+_ = box.space.test:delete{2}
+---
+...
+_ = box.space.test:insert{20}
+---
+...
+box.snapshot()
+---
+- ok
+...
+_ = box.space.test:delete{3}
+---
+...
+_ = box.space.test:insert{30}
+---
+...
+fio = require('fio')
+---
+...
+#fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.xlog')) -- 1
+---
+- 1
+...
+box.cfg{checkpoint_count = checkpoint_count}
+---
+...
+-- Restart the replica. Since xlogs have been removed,
+-- it is supposed to rejoin without changing id.
+test_run:cmd("start server replica")
+---
+- true
+...
+box.info.replication[2].downstream.vclock ~= nil or box.info
+---
+- true
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.info.replication[1].upstream.status == 'follow' or box.info
+---
+- true
+...
+box.space.test:select()
+---
+- - [10]
+  - [20]
+  - [30]
+...
+test_run:cmd("switch default")
+---
+- true
+...
+-- Make sure the replica follows new changes.
+for i = 10, 30, 10 do box.space.test:update(i, {{'!', 1, i}}) end
+---
+...
+vclock = test_run:get_vclock('default')
+---
+...
+_ = test_run:wait_vclock('replica', vclock)
+---
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.space.test:select()
+---
+- - [10, 10]
+  - [20, 20]
+  - [30, 30]
+...
+-- Check that restart works as usual.
+test_run:cmd("restart server replica")
+box.info.replication[1].upstream.status == 'follow' or box.info
+---
+- true
+...
+box.space.test:select()
+---
+- - [10, 10]
+  - [20, 20]
+  - [30, 30]
+...
+-- Check that rebootstrap is NOT initiated unless the replica
+-- is strictly behind the master.
+box.space.test:replace{1, 2, 3} -- bumps LSN on the replica
+---
+- [1, 2, 3]
+...
+test_run:cmd("switch default")
+---
+- true
+...
+test_run:cmd("stop server replica")
+---
+- true
+...
+test_run:cmd("restart server default")
+checkpoint_count = box.cfg.checkpoint_count
+---
+...
+box.cfg{checkpoint_count = 1}
+---
+...
+for i = 1, 3 do box.space.test:delete{i * 10} end
+---
+...
+box.snapshot()
+---
+- ok
+...
+for i = 1, 3 do box.space.test:insert{i * 100} end
+---
+...
+fio = require('fio')
+---
+...
+#fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.xlog')) -- 1
+---
+- 1
+...
+box.cfg{checkpoint_count = checkpoint_count}
+---
+...
+test_run:cmd("start server replica")
+---
+- true
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.info.status -- orphan
+---
+- orphan
+...
+box.space.test:select()
+---
+- - [1, 2, 3]
+  - [10, 10]
+  - [20, 20]
+  - [30, 30]
+...
+-- Cleanup.
+test_run:cmd("switch default")
+---
+- true
+...
+test_run:cmd("stop server replica")
+---
+- true
+...
+test_run:cmd("cleanup server replica")
+---
+- true
+...
+box.space.test:drop()
+---
+...
+box.schema.user.revoke('guest', 'replication')
+---
+...
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
new file mode 100644
index 00000000..dfcb79cf
--- /dev/null
+++ b/test/replication/replica_rejoin.test.lua
@@ -0,0 +1,92 @@
+env = require('test_run')
+test_run = env.new()
+
+-- Cleanup the instance to remove vylog files left from previous
+-- tests, since vinyl doesn't support rebootstrap yet.
+test_run:cmd('restart server default with cleanup=1')
+
+--
+-- gh-461: check that a replica refetches the last checkpoint
+-- in case it fell behind the master.
+--
+box.schema.user.grant('guest', 'replication')
+_ = box.schema.space.create('test')
+_ = box.space.test:create_index('pk')
+_ = box.space.test:insert{1}
+_ = box.space.test:insert{2}
+_ = box.space.test:insert{3}
+
+-- Join a replica, then stop it.
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+test_run:cmd("start server replica")
+test_run:cmd("switch replica")
+box.info.replication[1].upstream.status == 'follow' or box.info
+box.space.test:select()
+test_run:cmd("switch default")
+test_run:cmd("stop server replica")
+
+-- Restart the server to purge the replica from
+-- the garbage collection state.
+test_run:cmd("restart server default")
+
+-- Make some checkpoints to remove old xlogs.
+checkpoint_count = box.cfg.checkpoint_count
+box.cfg{checkpoint_count = 1}
+_ = box.space.test:delete{1}
+_ = box.space.test:insert{10}
+box.snapshot()
+_ = box.space.test:delete{2}
+_ = box.space.test:insert{20}
+box.snapshot()
+_ = box.space.test:delete{3}
+_ = box.space.test:insert{30}
+fio = require('fio')
+#fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.xlog')) -- 1
+box.cfg{checkpoint_count = checkpoint_count}
+
+-- Restart the replica. Since xlogs have been removed,
+-- it is supposed to rejoin without changing id.
+test_run:cmd("start server replica")
+box.info.replication[2].downstream.vclock ~= nil or box.info
+test_run:cmd("switch replica")
+box.info.replication[1].upstream.status == 'follow' or box.info
+box.space.test:select()
+test_run:cmd("switch default")
+
+-- Make sure the replica follows new changes.
+for i = 10, 30, 10 do box.space.test:update(i, {{'!', 1, i}}) end
+vclock = test_run:get_vclock('default')
+_ = test_run:wait_vclock('replica', vclock)
+test_run:cmd("switch replica")
+box.space.test:select()
+
+-- Check that restart works as usual.
+test_run:cmd("restart server replica")
+box.info.replication[1].upstream.status == 'follow' or box.info
+box.space.test:select()
+
+-- Check that rebootstrap is NOT initiated unless the replica
+-- is strictly behind the master.
+box.space.test:replace{1, 2, 3} -- bumps LSN on the replica
+test_run:cmd("switch default")
+test_run:cmd("stop server replica")
+test_run:cmd("restart server default")
+checkpoint_count = box.cfg.checkpoint_count
+box.cfg{checkpoint_count = 1}
+for i = 1, 3 do box.space.test:delete{i * 10} end
+box.snapshot()
+for i = 1, 3 do box.space.test:insert{i * 100} end
+fio = require('fio')
+#fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.xlog')) -- 1
+box.cfg{checkpoint_count = checkpoint_count}
+test_run:cmd("start server replica")
+test_run:cmd("switch replica")
+box.info.status -- orphan
+box.space.test:select()
+
+-- Cleanup.
+test_run:cmd("switch default")
+test_run:cmd("stop server replica")
+test_run:cmd("cleanup server replica")
+box.space.test:drop()
+box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 95e94e5a..2b609f16 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -6,6 +6,7 @@
     "wal_off.test.lua": {},
     "hot_standby.test.lua": {},
     "rebootstrap.test.lua": {},
+    "replica_rejoin.test.lua": {},
     "*": {
         "memtx": {"engine": "memtx"},
         "vinyl": {"engine": "vinyl"}
-- 
2.11.0
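
The rejoin condition in replicaset_needs_rejoin() combines two vclock
comparisons, and it may help to see them in isolation. The following is
a minimal standalone C sketch, not the code from this patch. The names
struct sketch_vclock, sketch_vclock_compare(), sketch_needs_rejoin() and
SKETCH_ORDER_UNDEFINED are hypothetical stand-ins. The sketch assumes
that the comparison returns a positive value for incomparable vclocks,
which is what the "> 0" check together with the comment in
replication.cc suggests.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-size vclock: one LSN per replica id. */
enum { SKETCH_VCLOCK_MAX = 4 };

/*
 * Assumed sentinel for incomparable vclocks. Any positive value
 * would satisfy the "> 0" check used in the rejoin condition.
 */
enum { SKETCH_ORDER_UNDEFINED = INT_MAX };

struct sketch_vclock {
	int64_t lsn[SKETCH_VCLOCK_MAX];
};

/*
 * Partial order on vclocks: -1 if a <= b in every component and
 * a != b, 0 if equal, 1 if a >= b in every component and a != b,
 * SKETCH_ORDER_UNDEFINED if neither dominates the other.
 */
static int
sketch_vclock_compare(const struct sketch_vclock *a,
		      const struct sketch_vclock *b)
{
	bool le = true, ge = true;
	for (int i = 0; i < SKETCH_VCLOCK_MAX; i++) {
		if (a->lsn[i] < b->lsn[i])
			ge = false;
		if (a->lsn[i] > b->lsn[i])
			le = false;
	}
	if (le && ge)
		return 0;
	if (le)
		return -1;
	if (ge)
		return 1;
	return SKETCH_ORDER_UNDEFINED;
}

/*
 * Mirror of the two conditions described in the patch comment:
 * - the oldest vclock kept on the master is greater than or
 *   incomparable with the local one (the master no longer has
 *   the xlogs this instance would need), and
 * - the local vclock is strictly behind the master's current
 *   one (nothing local is lost by rebootstrapping).
 */
static bool
sketch_needs_rejoin(const struct sketch_vclock *master_gc,
		    const struct sketch_vclock *master,
		    const struct sketch_vclock *local)
{
	return sketch_vclock_compare(master_gc, local) > 0 &&
	       sketch_vclock_compare(local, master) < 0;
}
```

With local = {0, 10, 0, 0}, master_gc = {0, 14, 0, 0} and
master = {0, 16, 0, 0}, sketch_needs_rejoin() returns true. If the
replica also has a write of its own, for example local = {0, 10, 1, 0},
the second comparison is incomparable rather than negative and the
function returns false. That corresponds to the last scenario in
replica_rejoin.test.lua, where the replica stays in the orphan state
instead of rejoining.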