From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: kostja@tarantool.org
Cc: tarantool-patches@freelists.org
Subject: [PATCH v4 3/3] vinyl: implement rebootstrap support
Date: Sat, 21 Jul 2018 15:38:14 +0300
Message-ID: <d99f38dea0f16db774680e8b49d75f8f6211da82.1532175728.git.vdavydov.dev@gmail.com>
In-Reply-To: <cover.1532175728.git.vdavydov.dev@gmail.com>

If vy_log_bootstrap() finds a vylog file in the vinyl directory, it
assumes it has to be rebootstrapped and calls vy_log_rebootstrap().
The latter scans the old vylog file to find the max vinyl object id,
from which it will start numbering objects created during rebootstrap
to avoid conflicts with old objects, then it writes VY_LOG_REBOOTSTRAP
record to the old vylog to denote the beginning of a rebootstrap
section. After that initial join proceeds as usual, writing information
about new objects to the old vylog file after VY_LOG_REBOOTSTRAP marker.

Upon successful rebootstrap completion, checkpoint, which is always
called right after bootstrap, rotates the old vylog and marks all
objects created before the VY_LOG_REBOOTSTRAP marker as dropped in the
new vylog. The old objects will be purged by the garbage collector as
usual.

In case rebootstrap fails and checkpoint never happens, local recovery
writes VY_LOG_ABORT_REBOOTSTRAP record to the vylog. This marker
indicates that the rebootstrap attempt failed and all objects created
during rebootstrap should be discarded. They will be purged by the
garbage collector on checkpoint. Thus even if rebootstrap fails, it is
possible to recover the database to the state that existed right before
a failed rebootstrap attempt.

Closes #461 --- src/box/relay.cc | 3 + src/box/vy_log.c | 133 +++++++++++++++- src/box/vy_log.h | 34 ++++ src/errinj.h | 1 + test/box/errinj.result | 6 +- test/replication/replica_rejoin.result | 11 +- test/replication/replica_rejoin.test.lua | 7 +- test/replication/suite.cfg | 1 - test/vinyl/replica_rejoin.lua | 13 ++ test/vinyl/replica_rejoin.result | 257 +++++++++++++++++++++++++++++++ test/vinyl/replica_rejoin.test.lua | 88 +++++++++++ test/vinyl/suite.ini | 2 +- 12 files changed, 536 insertions(+), 20 deletions(-) create mode 100644 test/vinyl/replica_rejoin.lua create mode 100644 test/vinyl/replica_rejoin.result create mode 100644 test/vinyl/replica_rejoin.test.lua diff --git a/src/box/relay.cc b/src/box/relay.cc index 4cacbc84..05468f20 100644 --- a/src/box/relay.cc +++ b/src/box/relay.cc @@ -287,6 +287,9 @@ relay_final_join(struct replica *replica, int fd, uint64_t sync, if (rc != 0) diag_raise(); + ERROR_INJECT(ERRINJ_RELAY_FINAL_JOIN, + tnt_raise(ClientError, ER_INJECTION, "relay final join")); + ERROR_INJECT(ERRINJ_RELAY_FINAL_SLEEP, { while (vclock_compare(stop_vclock, &replicaset.vclock) == 0) fiber_sleep(0.001); diff --git a/src/box/vy_log.c b/src/box/vy_log.c index 10648106..3843cad6 100644 --- a/src/box/vy_log.c +++ b/src/box/vy_log.c @@ -124,6 +124,8 @@ static const char *vy_log_type_name[] = { [VY_LOG_MODIFY_LSM] = "modify_lsm", [VY_LOG_FORGET_LSM] = "forget_lsm", [VY_LOG_PREPARE_LSM] = "prepare_lsm", + [VY_LOG_REBOOTSTRAP] = "rebootstrap", + [VY_LOG_ABORT_REBOOTSTRAP] = "abort_rebootstrap", }; /** Metadata log object. */ @@ -852,17 +854,43 @@ vy_log_next_id(void) return vy_log.next_id++; } +/** + * If a vylog file already exists, we are doing a rebootstrap: + * - Load the vylog to find out the id to start indexing new + * objects with. + * - Mark the beginning of a new rebootstrap attempt by writing + * VY_LOG_REBOOTSTRAP record. 
+ */ +static int +vy_log_rebootstrap(void) +{ + struct vy_recovery *recovery; + recovery = vy_recovery_new(vclock_sum(&vy_log.last_checkpoint), + VY_RECOVERY_ABORT_REBOOTSTRAP); + if (recovery == NULL) + return -1; + + vy_log.next_id = recovery->max_id + 1; + vy_recovery_delete(recovery); + + struct vy_log_record record; + vy_log_record_init(&record); + record.type = VY_LOG_REBOOTSTRAP; + vy_log_tx_begin(); + vy_log_write(&record); + if (vy_log_tx_commit() != 0) + return -1; + + return 0; +} + int vy_log_bootstrap(void) { - /* - * Scan the directory to make sure there is no - * vylog files left from previous setups. - */ if (xdir_scan(&vy_log.dir) < 0 && errno != ENOENT) return -1; - if (xdir_last_vclock(&vy_log.dir, NULL) >= 0) - panic("vinyl directory is not empty"); + if (xdir_last_vclock(&vy_log.dir, &vy_log.last_checkpoint) >= 0) + return vy_log_rebootstrap(); /* Add initial vclock to the xdir. */ struct vclock *vclock = malloc(sizeof(*vclock)); @@ -914,11 +942,29 @@ vy_log_begin_recovery(const struct vclock *vclock) return NULL; } + /* + * If we are recovering from a vylog that has an unfinished + * rebootstrap section, checkpoint (and hence rebootstrap) + * failed, and we need to mark rebootstrap as aborted. + */ struct vy_recovery *recovery; - recovery = vy_recovery_new(vclock_sum(&vy_log.last_checkpoint), 0); + recovery = vy_recovery_new(vclock_sum(&vy_log.last_checkpoint), + VY_RECOVERY_ABORT_REBOOTSTRAP); if (recovery == NULL) return NULL; + if (recovery->in_rebootstrap) { + struct vy_log_record record; + vy_log_record_init(&record); + record.type = VY_LOG_ABORT_REBOOTSTRAP; + vy_log_tx_begin(); + vy_log_write(&record); + if (vy_log_tx_commit() != 0) { + vy_recovery_delete(recovery); + return NULL; + } + } + vy_log.next_id = recovery->max_id + 1; vy_log.recovery = recovery; return recovery; @@ -1292,6 +1338,7 @@ vy_recovery_do_create_lsm(struct vy_recovery *recovery, int64_t id, * before the final version. 
*/ rlist_add_tail_entry(&recovery->lsms, lsm, in_recovery); + lsm->in_rebootstrap = recovery->in_rebootstrap; if (recovery->max_id < id) recovery->max_id = id; return lsm; @@ -1875,6 +1922,42 @@ vy_recovery_delete_slice(struct vy_recovery *recovery, int64_t slice_id) } /** + * Mark all LSM trees created during rebootstrap as dropped so + * that they will be purged on the next garbage collection. + */ +static void +vy_recovery_do_abort_rebootstrap(struct vy_recovery *recovery) +{ + struct vy_lsm_recovery_info *lsm; + rlist_foreach_entry(lsm, &recovery->lsms, in_recovery) { + if (lsm->in_rebootstrap) { + lsm->in_rebootstrap = false; + lsm->create_lsn = -1; + lsm->modify_lsn = -1; + lsm->drop_lsn = 0; + } + } +} + +/** Handle a VY_LOG_REBOOTSTRAP log record. */ +static void +vy_recovery_rebootstrap(struct vy_recovery *recovery) +{ + if (recovery->in_rebootstrap) + vy_recovery_do_abort_rebootstrap(recovery); + recovery->in_rebootstrap = true; +} + +/** Handle VY_LOG_ABORT_REBOOTSTRAP record. */ +static void +vy_recovery_abort_rebootstrap(struct vy_recovery *recovery) +{ + if (recovery->in_rebootstrap) + vy_recovery_do_abort_rebootstrap(recovery); + recovery->in_rebootstrap = false; +} + +/** * Update a recovery context with a new log record. * Return 0 on success, -1 on failure. * @@ -1885,7 +1968,7 @@ static int vy_recovery_process_record(struct vy_recovery *recovery, const struct vy_log_record *record) { - int rc; + int rc = 0; switch (record->type) { case VY_LOG_PREPARE_LSM: rc = vy_recovery_prepare_lsm(recovery, record->lsm_id, @@ -1950,6 +2033,12 @@ vy_recovery_process_record(struct vy_recovery *recovery, /* Not used anymore, ignore. 
*/ rc = 0; break; + case VY_LOG_REBOOTSTRAP: + vy_recovery_rebootstrap(recovery); + break; + case VY_LOG_ABORT_REBOOTSTRAP: + vy_recovery_abort_rebootstrap(recovery); + break; default: unreachable(); } @@ -1960,6 +2049,26 @@ vy_recovery_process_record(struct vy_recovery *recovery, } /** + * Commit the last rebootstrap attempt - drop all objects created + * before rebootstrap. + */ +static void +vy_recovery_commit_rebootstrap(struct vy_recovery *recovery) +{ + assert(recovery->in_rebootstrap); + struct vy_lsm_recovery_info *lsm; + rlist_foreach_entry(lsm, &recovery->lsms, in_recovery) { + if (!lsm->in_rebootstrap && lsm->drop_lsn < 0) { + /* + * The files will be removed when the current + * checkpoint is purged by garbage collector. + */ + lsm->drop_lsn = vy_log_signature(); + } + } +} + +/** * Fill index_id_hash with LSM trees recovered from vylog. */ static int @@ -2050,6 +2159,7 @@ vy_recovery_new_f(va_list ap) recovery->run_hash = NULL; recovery->slice_hash = NULL; recovery->max_id = -1; + recovery->in_rebootstrap = false; recovery->index_id_hash = mh_i64ptr_new(); recovery->lsm_hash = mh_i64ptr_new(); @@ -2103,6 +2213,13 @@ vy_recovery_new_f(va_list ap) xlog_cursor_close(&cursor, false); + if (recovery->in_rebootstrap) { + if ((flags & VY_RECOVERY_ABORT_REBOOTSTRAP) != 0) + vy_recovery_do_abort_rebootstrap(recovery); + else + vy_recovery_commit_rebootstrap(recovery); + } + if (vy_recovery_build_index_id_hash(recovery) != 0) goto fail_free; out: diff --git a/src/box/vy_log.h b/src/box/vy_log.h index 98cbf6ee..7718d9c6 100644 --- a/src/box/vy_log.h +++ b/src/box/vy_log.h @@ -196,6 +196,27 @@ enum vy_log_record_type { * a VY_LOG_CREATE_LSM record to commit it. */ VY_LOG_PREPARE_LSM = 15, + /** + * This record denotes the beginning of a rebootstrap section. + * A rebootstrap section ends either by another record of this + * type or by VY_LOG_ABORT_REBOOTSTRAP or at the end of the file. 
+ * All objects created between a VY_LOG_REBOOTSTRAP record and + * VY_LOG_ABORT_REBOOTSTRAP or another VY_LOG_REBOOTSTRAP are + * considered to be garbage and marked as dropped on recovery. + * + * We write a record of this type if a vylog file already exists + * at bootstrap time, which means we are going to rebootstrap. + * If rebootstrap succeeds, we rotate the vylog on checkpoint and + * mark all objects written before the last VY_LOG_REBOOTSTRAP + * record as dropped in the rotated vylog. If rebootstrap fails, + * we write VY_LOG_ABORT_REBOOTSTRAP on recovery. + */ + VY_LOG_REBOOTSTRAP = 16, + /** + * This record is written on recovery if rebootstrap failed. + * See also VY_LOG_REBOOTSTRAP. + */ + VY_LOG_ABORT_REBOOTSTRAP = 17, vy_log_record_type_MAX }; @@ -276,6 +297,12 @@ struct vy_recovery { * or -1 in case no vinyl objects were recovered. */ int64_t max_id; + /** + * Set if we are currently processing a rebootstrap section, + * i.e. we encountered a VY_LOG_REBOOTSTRAP record and haven't + * seen matching VY_LOG_ABORT_REBOOTSTRAP. + */ + bool in_rebootstrap; }; /** LSM tree info stored in a recovery context. */ @@ -326,6 +353,8 @@ struct vy_lsm_recovery_info { * this one after successful ALTER. */ struct vy_lsm_recovery_info *prepared; + /** Set if this LSM tree was created during rebootstrap. */ + bool in_rebootstrap; }; /** Vinyl range info stored in a recovery context. */ @@ -533,6 +562,11 @@ enum vy_recovery_flag { * of the last checkpoint. */ VY_RECOVERY_LOAD_CHECKPOINT = 1 << 0, + /** + * Consider the last attempt to rebootstrap aborted even if + * there's no VY_LOG_ABORT_REBOOTSTRAP record. 
+ */ + VY_RECOVERY_ABORT_REBOOTSTRAP = 1 << 1, }; /** diff --git a/src/errinj.h b/src/errinj.h index cde58d48..64d13b02 100644 --- a/src/errinj.h +++ b/src/errinj.h @@ -97,6 +97,7 @@ struct errinj { _(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE, {.dparam = 0}) \ _(ERRINJ_RELAY_REPORT_INTERVAL, ERRINJ_DOUBLE, {.dparam = 0}) \ _(ERRINJ_RELAY_FINAL_SLEEP, ERRINJ_BOOL, {.bparam = false}) \ + _(ERRINJ_RELAY_FINAL_JOIN, ERRINJ_BOOL, {.bparam = false}) \ _(ERRINJ_PORT_DUMP, ERRINJ_BOOL, {.bparam = false}) \ _(ERRINJ_XLOG_GARBAGE, ERRINJ_BOOL, {.bparam = false}) \ _(ERRINJ_XLOG_META, ERRINJ_BOOL, {.bparam = false}) \ diff --git a/test/box/errinj.result b/test/box/errinj.result index 54b6d578..c6b2bbac 100644 --- a/test/box/errinj.result +++ b/test/box/errinj.result @@ -60,13 +60,15 @@ errinj.info() state: false ERRINJ_WAL_WRITE_DISK: state: false + ERRINJ_VY_LOG_FILE_RENAME: + state: false ERRINJ_VY_RUN_WRITE: state: false - ERRINJ_VY_LOG_FILE_RENAME: + ERRINJ_HTTP_RESPONSE_ADD_WAIT: state: false ERRINJ_VY_LOG_FLUSH_DELAY: state: false - ERRINJ_HTTP_RESPONSE_ADD_WAIT: + ERRINJ_RELAY_FINAL_JOIN: state: false ERRINJ_SNAP_COMMIT_DELAY: state: false diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result index b7563ed9..4370fae4 100644 --- a/test/replication/replica_rejoin.result +++ b/test/replication/replica_rejoin.result @@ -4,9 +4,12 @@ env = require('test_run') test_run = env.new() --- ... --- Cleanup the instance to remove vylog files left from previous --- tests, since vinyl doesn't support rebootstrap yet. -test_run:cmd('restart server default with cleanup=1') +engine = test_run:get_cfg('engine') +--- +... +test_run:cleanup_cluster() +--- +... -- -- gh-461: check that a replica refetches the last checkpoint -- in case it fell behind the master. @@ -14,7 +17,7 @@ test_run:cmd('restart server default with cleanup=1') box.schema.user.grant('guest', 'replication') --- ... 
-_ = box.schema.space.create('test') +_ = box.schema.space.create('test', {engine = engine}) --- ... _ = box.space.test:create_index('pk') diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua index dfcb79cf..f998f60d 100644 --- a/test/replication/replica_rejoin.test.lua +++ b/test/replication/replica_rejoin.test.lua @@ -1,16 +1,15 @@ env = require('test_run') test_run = env.new() +engine = test_run:get_cfg('engine') --- Cleanup the instance to remove vylog files left from previous --- tests, since vinyl doesn't support rebootstrap yet. -test_run:cmd('restart server default with cleanup=1') +test_run:cleanup_cluster() -- -- gh-461: check that a replica refetches the last checkpoint -- in case it fell behind the master. -- box.schema.user.grant('guest', 'replication') -_ = box.schema.space.create('test') +_ = box.schema.space.create('test', {engine = engine}) _ = box.space.test:create_index('pk') _ = box.space.test:insert{1} _ = box.space.test:insert{2} diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg index 2b609f16..95e94e5a 100644 --- a/test/replication/suite.cfg +++ b/test/replication/suite.cfg @@ -6,7 +6,6 @@ "wal_off.test.lua": {}, "hot_standby.test.lua": {}, "rebootstrap.test.lua": {}, - "replica_rejoin.test.lua": {}, "*": { "memtx": {"engine": "memtx"}, "vinyl": {"engine": "vinyl"} diff --git a/test/vinyl/replica_rejoin.lua b/test/vinyl/replica_rejoin.lua new file mode 100644 index 00000000..7cb7e09a --- /dev/null +++ b/test/vinyl/replica_rejoin.lua @@ -0,0 +1,13 @@ +#!/usr/bin/env tarantool + +local replication = os.getenv("MASTER") +if arg[1] == 'disable_replication' then + replication = nil +end + +box.cfg({ + replication = replication, + vinyl_memory = 1024 * 1024, +}) + +require('console').listen(os.getenv('ADMIN')) diff --git a/test/vinyl/replica_rejoin.result b/test/vinyl/replica_rejoin.result new file mode 100644 index 00000000..bd5d1ed3 --- /dev/null +++ b/test/vinyl/replica_rejoin.result 
@@ -0,0 +1,257 @@ +env = require('test_run') +--- +... +test_run = env.new() +--- +... +-- +-- gh-461: check that garbage collection works as expected +-- after rebootstrap. +-- +box.schema.user.grant('guest', 'replication') +--- +... +_ = box.schema.space.create('test', { id = 9000, engine = 'vinyl' }) +--- +... +_ = box.space.test:create_index('pk') +--- +... +pad = string.rep('x', 15 * 1024) +--- +... +for i = 1, 100 do box.space.test:replace{i, pad} end +--- +... +box.snapshot() +--- +- ok +... +-- Join a replica. Check its files. +test_run:cmd("create server replica with rpl_master=default, script='vinyl/replica_rejoin.lua'") +--- +- true +... +test_run:cmd("start server replica") +--- +- true +... +test_run:cmd("switch replica") +--- +- true +... +fio = require('fio') +--- +... +fio.chdir(box.cfg.vinyl_dir) +--- +- true +... +fio.glob(fio.pathjoin(box.space.test.id, 0, '*')) +--- +- - 9000/0/00000000000000000002.index + - 9000/0/00000000000000000002.run + - 9000/0/00000000000000000004.index + - 9000/0/00000000000000000004.run +... +test_run:cmd("switch default") +--- +- true +... +test_run:cmd("stop server replica") +--- +- true +... +-- Invoke garbage collector on the master. +test_run:cmd("restart server default") +checkpoint_count = box.cfg.checkpoint_count +--- +... +box.cfg{checkpoint_count = 1} +--- +... +box.space.test:delete(1) +--- +... +box.snapshot() +--- +- ok +... +box.cfg{checkpoint_count = checkpoint_count} +--- +... +-- Rebootstrap the replica. Check that old files are removed +-- by garbage collector. +test_run:cmd("start server replica") +--- +- true +... +test_run:cmd("switch replica") +--- +- true +... +box.cfg{checkpoint_count = 1} +--- +... +box.snapshot() +--- +- ok +... +fio = require('fio') +--- +... +fio.chdir(box.cfg.vinyl_dir) +--- +- true +... 
+fio.glob(fio.pathjoin(box.space.test.id, 0, '*')) +--- +- - 9000/0/00000000000000000008.index + - 9000/0/00000000000000000008.run + - 9000/0/00000000000000000010.index + - 9000/0/00000000000000000010.run +... +box.space.test:count() -- 99 +--- +- 99 +... +test_run:cmd("switch default") +--- +- true +... +test_run:cmd("stop server replica") +--- +- true +... +-- Invoke garbage collector on the master. +test_run:cmd("restart server default") +checkpoint_count = box.cfg.checkpoint_count +--- +... +box.cfg{checkpoint_count = 1} +--- +... +box.space.test:delete(2) +--- +... +box.snapshot() +--- +- ok +... +box.cfg{checkpoint_count = checkpoint_count} +--- +... +-- Make the master fail join after sending data. Check that +-- files written during failed rebootstrap attempt are removed +-- by garbage collector. +box.error.injection.set('ERRINJ_RELAY_FINAL_JOIN', true) +--- +- ok +... +test_run:cmd("start server replica with crash_expected=True") -- fail +--- +- false +... +test_run:cmd("start server replica with crash_expected=True") -- fail again +--- +- false +... +test_run:cmd("start server replica with args='disable_replication'") +--- +- true +... +test_run:cmd("switch replica") +--- +- true +... +box.cfg{checkpoint_count = 1} +--- +... +box.snapshot() +--- +- ok +... +fio = require('fio') +--- +... +fio.chdir(box.cfg.vinyl_dir) +--- +- true +... +fio.glob(fio.pathjoin(box.space.test.id, 0, '*')) +--- +- - 9000/0/00000000000000000008.index + - 9000/0/00000000000000000008.run + - 9000/0/00000000000000000010.index + - 9000/0/00000000000000000010.run +... +box.space.test:count() -- 99 +--- +- 99 +... +test_run:cmd("switch default") +--- +- true +... +test_run:cmd("stop server replica") +--- +- true +... +box.error.injection.set('ERRINJ_RELAY_FINAL_JOIN', false) +--- +- ok +... +-- Rebootstrap after several failed attempts and make sure +-- old files are removed. +test_run:cmd("start server replica") +--- +- true +... +test_run:cmd("switch replica") +--- +- true +... 
+box.cfg{checkpoint_count = 1} +--- +... +box.snapshot() +--- +- ok +... +fio = require('fio') +--- +... +fio.chdir(box.cfg.vinyl_dir) +--- +- true +... +fio.glob(fio.pathjoin(box.space.test.id, 0, '*')) +--- +- - 9000/0/00000000000000000022.index + - 9000/0/00000000000000000022.run + - 9000/0/00000000000000000024.index + - 9000/0/00000000000000000024.run +... +box.space.test:count() -- 98 +--- +- 98 +... +test_run:cmd("switch default") +--- +- true +... +test_run:cmd("stop server replica") +--- +- true +... +-- Cleanup. +test_run:cmd("cleanup server replica") +--- +- true +... +box.space.test:drop() +--- +... +box.schema.user.revoke('guest', 'replication') +--- +... diff --git a/test/vinyl/replica_rejoin.test.lua b/test/vinyl/replica_rejoin.test.lua new file mode 100644 index 00000000..972b04e5 --- /dev/null +++ b/test/vinyl/replica_rejoin.test.lua @@ -0,0 +1,88 @@ +env = require('test_run') +test_run = env.new() + +-- +-- gh-461: check that garbage collection works as expected +-- after rebootstrap. +-- +box.schema.user.grant('guest', 'replication') +_ = box.schema.space.create('test', { id = 9000, engine = 'vinyl' }) +_ = box.space.test:create_index('pk') +pad = string.rep('x', 15 * 1024) +for i = 1, 100 do box.space.test:replace{i, pad} end +box.snapshot() + +-- Join a replica. Check its files. +test_run:cmd("create server replica with rpl_master=default, script='vinyl/replica_rejoin.lua'") +test_run:cmd("start server replica") +test_run:cmd("switch replica") +fio = require('fio') +fio.chdir(box.cfg.vinyl_dir) +fio.glob(fio.pathjoin(box.space.test.id, 0, '*')) +test_run:cmd("switch default") +test_run:cmd("stop server replica") + +-- Invoke garbage collector on the master. +test_run:cmd("restart server default") +checkpoint_count = box.cfg.checkpoint_count +box.cfg{checkpoint_count = 1} +box.space.test:delete(1) +box.snapshot() +box.cfg{checkpoint_count = checkpoint_count} + +-- Rebootstrap the replica. Check that old files are removed +-- by garbage collector. 
+test_run:cmd("start server replica") +test_run:cmd("switch replica") +box.cfg{checkpoint_count = 1} +box.snapshot() +fio = require('fio') +fio.chdir(box.cfg.vinyl_dir) +fio.glob(fio.pathjoin(box.space.test.id, 0, '*')) +box.space.test:count() -- 99 +test_run:cmd("switch default") +test_run:cmd("stop server replica") + +-- Invoke garbage collector on the master. +test_run:cmd("restart server default") +checkpoint_count = box.cfg.checkpoint_count +box.cfg{checkpoint_count = 1} +box.space.test:delete(2) +box.snapshot() +box.cfg{checkpoint_count = checkpoint_count} + +-- Make the master fail join after sending data. Check that +-- files written during failed rebootstrap attempt are removed +-- by garbage collector. +box.error.injection.set('ERRINJ_RELAY_FINAL_JOIN', true) +test_run:cmd("start server replica with crash_expected=True") -- fail +test_run:cmd("start server replica with crash_expected=True") -- fail again +test_run:cmd("start server replica with args='disable_replication'") +test_run:cmd("switch replica") +box.cfg{checkpoint_count = 1} +box.snapshot() +fio = require('fio') +fio.chdir(box.cfg.vinyl_dir) +fio.glob(fio.pathjoin(box.space.test.id, 0, '*')) +box.space.test:count() -- 99 +test_run:cmd("switch default") +test_run:cmd("stop server replica") +box.error.injection.set('ERRINJ_RELAY_FINAL_JOIN', false) + +-- Rebootstrap after several failed attempts and make sure +-- old files are removed. +test_run:cmd("start server replica") +test_run:cmd("switch replica") +box.cfg{checkpoint_count = 1} +box.snapshot() +fio = require('fio') +fio.chdir(box.cfg.vinyl_dir) +fio.glob(fio.pathjoin(box.space.test.id, 0, '*')) +box.space.test:count() -- 98 +test_run:cmd("switch default") +test_run:cmd("stop server replica") + +-- Cleanup. 
+test_run:cmd("cleanup server replica") +box.space.test:drop() +box.schema.user.revoke('guest', 'replication') diff --git a/test/vinyl/suite.ini b/test/vinyl/suite.ini index ca964289..b9dae380 100644 --- a/test/vinyl/suite.ini +++ b/test/vinyl/suite.ini @@ -2,7 +2,7 @@ core = tarantool description = vinyl integration tests script = vinyl.lua -release_disabled = errinj.test.lua errinj_gc.test.lua errinj_vylog.test.lua partial_dump.test.lua quota_timeout.test.lua recovery_quota.test.lua +release_disabled = errinj.test.lua errinj_gc.test.lua errinj_vylog.test.lua partial_dump.test.lua quota_timeout.test.lua recovery_quota.test.lua replica_rejoin.test.lua config = suite.cfg lua_libs = suite.lua stress.lua large.lua txn_proxy.lua ../box/lua/utils.lua use_unix_sockets = True -- 2.11.0
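
For readers following the recovery logic, here is a minimal standalone sketch of how a rebootstrap section is replayed. It is not the code from this patch: the record types, struct layouts, and the names replay_log() and is_live() are hypothetical simplifications, and only object creation is modelled. The abort_unfinished argument plays the role of the VY_RECOVERY_ABORT_REBOOTSTRAP flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record types; not the real vylog records. */
enum rec_type {
	REC_CREATE,            /* an object with the given id was created */
	REC_REBOOTSTRAP,       /* beginning of a rebootstrap section */
	REC_ABORT_REBOOTSTRAP, /* the current section was aborted */
};

struct rec {
	enum rec_type type;
	int64_t id; /* used by REC_CREATE only */
};

struct obj {
	int64_t id;
	bool in_rebootstrap; /* created inside the current section */
	bool dropped;
};

enum { MAX_OBJS = 32 };

struct replay {
	struct obj objs[MAX_OBJS];
	size_t count;
	int64_t max_id; /* -1 if no objects were seen */
	bool in_rebootstrap;
};

/* Drop everything created inside the current rebootstrap section. */
static void
abort_section(struct replay *r)
{
	for (size_t i = 0; i < r->count; i++) {
		if (r->objs[i].in_rebootstrap) {
			r->objs[i].in_rebootstrap = false;
			r->objs[i].dropped = true;
		}
	}
}

/* Drop everything created before the current rebootstrap section. */
static void
commit_section(struct replay *r)
{
	for (size_t i = 0; i < r->count; i++) {
		if (!r->objs[i].in_rebootstrap)
			r->objs[i].dropped = true;
	}
}

/*
 * Replay the log from scratch. If the log ends inside a rebootstrap
 * section, abort_unfinished decides whether the section is discarded
 * (local recovery) or committed (checkpoint after a successful join).
 */
void
replay_log(struct replay *r, const struct rec *log, size_t n,
	   bool abort_unfinished)
{
	r->count = 0;
	r->max_id = -1;
	r->in_rebootstrap = false;
	for (size_t i = 0; i < n; i++) {
		switch (log[i].type) {
		case REC_CREATE:
			assert(r->count < MAX_OBJS);
			r->objs[r->count++] = (struct obj){
				log[i].id, r->in_rebootstrap, false
			};
			if (r->max_id < log[i].id)
				r->max_id = log[i].id;
			break;
		case REC_REBOOTSTRAP:
			/* A new attempt implicitly aborts the previous one. */
			if (r->in_rebootstrap)
				abort_section(r);
			r->in_rebootstrap = true;
			break;
		case REC_ABORT_REBOOTSTRAP:
			if (r->in_rebootstrap)
				abort_section(r);
			r->in_rebootstrap = false;
			break;
		}
	}
	if (r->in_rebootstrap) {
		if (abort_unfinished)
			abort_section(r);
		else
			commit_section(r);
	}
}

/* Return true if the object exists and has not been dropped. */
bool
is_live(const struct replay *r, int64_t id)
{
	for (size_t i = 0; i < r->count; i++) {
		if (r->objs[i].id == id)
			return !r->objs[i].dropped;
	}
	return false;
}
```

In this model, a log that ends inside a section keeps only the objects of the last attempt when replayed with abort_unfinished set to false, and only the pre-rebootstrap objects when it is set to true. In both cases max_id + 1 is a safe id for the next object, since ids from failed attempts are counted too.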
Thread overview: 4+ messages
2018-07-21 12:38 [PATCH v4 0/3] Replica rejoin Vladimir Davydov
2018-07-21 12:38 ` [PATCH v4 1/3] replication: rebootstrap instance on startup if it fell behind Vladimir Davydov
2018-07-21 12:38 ` [PATCH v4 2/3] vinyl: simplify vylog recovery from backup Vladimir Davydov
2018-07-21 12:38 ` [PATCH v4 3/3] vinyl: implement rebootstrap support Vladimir Davydov [this message]