* [Tarantool-patches] [PATCH 0/2] replication: support CONFIRM and ROLLBACK in recovery @ 2020-06-19 18:00 Serge Petrenko 2020-06-19 18:00 ` [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit Serge Petrenko 2020-06-19 18:00 ` [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery Serge Petrenko 0 siblings, 2 replies; 11+ messages in thread From: Serge Petrenko @ 2020-06-19 18:00 UTC (permalink / raw) To: v.shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches Branch: gh-4842-sync-replication Serge Petrenko (2): box: rework local_recovery to use async txn_commit replication: support ROLLBACK and CONFIRM during recovery src/box/box.cc | 60 +++++++++++++++++++++++++++++++++++++++++---- src/box/txn_limbo.c | 6 ++--- 2 files changed, 57 insertions(+), 9 deletions(-) -- 2.24.3 (Apple Git-128) ^ permalink raw reply [flat|nested] 11+ messages in thread
* [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit 2020-06-19 18:00 [Tarantool-patches] [PATCH 0/2] replication: support CONFIRM and ROLLBACK in recovery Serge Petrenko @ 2020-06-19 18:00 ` Serge Petrenko 2020-06-19 18:45 ` Serge Petrenko 2020-06-19 18:00 ` [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery Serge Petrenko 1 sibling, 1 reply; 11+ messages in thread From: Serge Petrenko @ 2020-06-19 18:00 UTC (permalink / raw) To: v.shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches Local recovery should use asynchronous txn commit procedure in order to get to CONFIRM and ROLLBACK statements for a transaction that needs confirmation before confirmation timeout happens. Using async txn commit doesn't harm other transactions, since the journal used during local recovery fakes writes and its write_async() method may reuse plain write(). Follow-up #4847 Follow-up #4848 --- src/box/box.cc | 40 +++++++++++++++++++++++++++++++++++++--- 1 file changed, 37 insertions(+), 3 deletions(-) diff --git a/src/box/box.cc b/src/box/box.cc index 8ba7ffafb..f80d6f8e6 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -118,6 +118,8 @@ static struct gc_checkpoint_ref backup_gc; static bool is_box_configured = false; static bool is_ro = true; static fiber_cond ro_cond; +/** Set to true during recovery from local files. */ +static bool is_local_recovery = false; /** * The following flag is set if the instance failed to @@ -206,7 +208,24 @@ box_process_rw(struct request *request, struct space *space, goto rollback; if (is_autocommit) { - if (txn_commit(txn) != 0) + int res = 0; + /* + * During local recovery the commit procedure + * should be async, otherwise the only fiber + * processing recovery will get stuck on the first + * synchronous tx it meets until confirm timeout + * is reached and the tx is rolled back, yielding + * an error. + * Moreover, txn_commit_async() doesn't hurt at + * all during local recovery, since journal_write + * is faked at this stage and returns immediately. + */ + if (is_local_recovery) { + res = txn_commit_async(txn); + } else { + res = txn_commit(txn); + } + if (res < 0) goto error; fiber_gc(); } @@ -327,12 +346,25 @@ recovery_journal_write(struct journal *base, return 0; } +static int +recovery_journal_write_async(struct journal *base, + struct journal_entry *entry) +{ + recovery_journal_write(base, entry); + /* + * Since there're no actual writes, fire a + * journal_async_complete callback right away. + */ + journal_async_complete(base, entry); + return 0; +} + static void recovery_journal_create(struct vclock *v) { static struct recovery_journal journal; - journal_create(&journal.base, journal_no_write_async, - journal_no_write_async_cb, + journal_create(&journal.base, recovery_journal_write_async, + txn_complete_async, recovery_journal_write, NULL); journal.vclock = v; journal_set(&journal.base); @@ -2315,6 +2347,7 @@ local_recovery(const struct tt_uuid *instance_uuid, memtx = (struct memtx_engine *)engine_by_name("memtx"); assert(memtx != NULL); + is_local_recovery = true; recovery_journal_create(&recovery->vclock); /* @@ -2356,6 +2389,7 @@ local_recovery(const struct tt_uuid *instance_uuid, box_sync_replication(false); } recovery_finalize(recovery); + is_local_recovery = false; /* * We must enable WAL before finalizing engine recovery, -- 2.24.3 (Apple Git-128) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit 2020-06-19 18:00 ` [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit Serge Petrenko @ 2020-06-19 18:45 ` Serge Petrenko 2020-06-21 16:25 ` Vladislav Shpilevoy 0 siblings, 1 reply; 11+ messages in thread From: Serge Petrenko @ 2020-06-19 18:45 UTC (permalink / raw) To: v.shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches 19.06.2020 21:00, Serge Petrenko пишет: > Local recovery should use asynchronous txn commit procedure in order to > get to CONFIRM and ROLLBACK statements for a transaction that needs > confirmation before confirmation timeout happens. > Using async txn commit doesn't harm other transactions, since the > journal used during local recovery fakes writes and its write_async() > method may reuse plain write(). Added a new commit. I didn't squash it, because its debatable. commit 83760f8b0d1a59e17ab6adeeda60efc1bc6e94ad Author: Serge Petrenko <sergepetrenko@tarantool.org> Date: Fri Jun 19 21:42:35 2020 +0300 box: fix for 'box: rework local_recovery to use async txn_commit' [TO BE SQUASHED INTO THE PREVIOUS COMMIT] diff --git a/src/box/box.cc b/src/box/box.cc index f80d6f8e6..0fe7625fb 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -222,6 +222,12 @@ box_process_rw(struct request *request, struct space *space, */ if (is_local_recovery) { res = txn_commit_async(txn); + /* + * Hack: remove the unnecessary trigger. + * I don't know of a better place to do + * it. + */ + trigger_clear(&txn->on_write_failure); } else { res = txn_commit(txn); } diff --git a/src/box/txn.c b/src/box/txn.c index 2360ecae3..d36c11dda 100644 --- a/src/box/txn.c +++ b/src/box/txn.c @@ -688,9 +688,9 @@ txn_commit_async(struct txn *txn) * Set a trigger to abort waiting for confirm on WAL write * failure. */ - trigger_create(&txn->on_write_failure, txn_limbo_on_rollback, - limbo_entry, NULL); if (is_sync) { + trigger_create(&txn->on_write_failure, txn_limbo_on_rollback, + limbo_entry, NULL); txn_on_rollback(txn, &txn->on_write_failure); } return 0; -- Serge Petrenko ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit 2020-06-19 18:45 ` Serge Petrenko @ 2020-06-21 16:25 ` Vladislav Shpilevoy 2020-06-22 11:19 ` Serge Petrenko 0 siblings, 1 reply; 11+ messages in thread From: Vladislav Shpilevoy @ 2020-06-21 16:25 UTC (permalink / raw) To: Serge Petrenko, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches Hi! Thanks for the patch! > diff --git a/src/box/box.cc b/src/box/box.cc > index f80d6f8e6..0fe7625fb 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -222,6 +222,12 @@ box_process_rw(struct request *request, struct space *space, > */ > if (is_local_recovery) { > res = txn_commit_async(txn); > + /* > + * Hack: remove the unnecessary trigger. > + * I don't know of a better place to do > + * it. > + */ Why is it necessary to remove it? > + trigger_clear(&txn->on_write_failure); > } else { > res = txn_commit(txn); > } ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit 2020-06-21 16:25 ` Vladislav Shpilevoy @ 2020-06-22 11:19 ` Serge Petrenko 2020-06-22 18:55 ` Serge Petrenko 0 siblings, 1 reply; 11+ messages in thread From: Serge Petrenko @ 2020-06-22 11:19 UTC (permalink / raw) To: Vladislav Shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches 21.06.2020 19:25, Vladislav Shpilevoy пишет: > Hi! Thanks for the patch! > >> diff --git a/src/box/box.cc b/src/box/box.cc >> index f80d6f8e6..0fe7625fb 100644 >> --- a/src/box/box.cc >> +++ b/src/box/box.cc >> @@ -222,6 +222,12 @@ box_process_rw(struct request *request, struct space *space, >> */ >> if (is_local_recovery) { >> res = txn_commit_async(txn); >> + /* >> + * Hack: remove the unnecessary trigger. >> + * I don't know of a better place to do >> + * it. >> + */ > Why is it necessary to remove it? The trigger is set after journal_write_async() returns. The idea is that journal_write_async() just submits the request for writing, and later, journal_async_complete(), which is txn_complete_async() completes the tx processing. The key word here is "later", meaning that txn_complete_async() is called after txn_commit async() returns. This is why txn_complete_async() clears the trigger set on write failure: if write fails, we rollback the tx and remove the entry from the limbo. If writing doesn't fail, it's limbo's business to remove an entry. The tx still may be rolled back due to a timeout, then limbo will remove the entry, and if an on_rollback trigger isn't cleared, it'll try to remove the entry again. Now, back to this case: I reworked recovery_journal to use write_async(). recovery_journal_write_async() calls journal_async_complete() right away, since there're no actual writes and no one else can call journal_async_complete(). So, once journal_write_async() returns, the trigger is cleared, but later it's reset at the end of txn_commit_async(), because txn_commit_async() assumes that the write hasn't happened yet, and journal_async_complete() is yet to be called. One option is to call journal_async_complete() after txn_commit_async(), this will solve the problem, but journal_async_complete() receives `struct journal_entry *`, and we have no knowledge of the journal entry outside of `txn_commit_async()`. Another option is to set a txn_flag, like "TXN_DONT_EXPECT_WRITE_FAILURE", so that the txn doesn't set an on_write_failure trigger when we don't need it. But this also looks ugly. That's why I unconditionally reset the trigger after txn_commit_async(). > >> + trigger_clear(&txn->on_write_failure); >> } else { >> res = txn_commit(txn); >> } -- Serge Petrenko ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit 2020-06-22 11:19 ` Serge Petrenko @ 2020-06-22 18:55 ` Serge Petrenko 2020-06-22 21:43 ` Vladislav Shpilevoy 0 siblings, 1 reply; 11+ messages in thread From: Serge Petrenko @ 2020-06-22 18:55 UTC (permalink / raw) To: Vladislav Shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches 22.06.2020 14:19, Serge Petrenko пишет: > > 21.06.2020 19:25, Vladislav Shpilevoy пишет: >> Hi! Thanks for the patch! >> >>> diff --git a/src/box/box.cc b/src/box/box.cc >>> index f80d6f8e6..0fe7625fb 100644 >>> --- a/src/box/box.cc >>> +++ b/src/box/box.cc >>> @@ -222,6 +222,12 @@ box_process_rw(struct request *request, struct >>> space *space, >>> */ >>> if (is_local_recovery) { >>> res = txn_commit_async(txn); >>> + /* >>> + * Hack: remove the unnecessary trigger. >>> + * I don't know of a better place to do >>> + * it. >>> + */ >> Why is it necessary to remove it? > > The trigger is set after journal_write_async() returns. > The idea is that journal_write_async() just submits the request for > writing, and later, journal_async_complete(), which is > txn_complete_async() > completes the tx processing. The key word here is "later", meaning that > txn_complete_async() is called after txn_commit async() returns. > > This is why txn_complete_async() clears the trigger set on write failure: > if write fails, we rollback the tx and remove the entry from the limbo. > If writing doesn't fail, it's limbo's business to remove an entry. The > tx still > may be rolled back due to a timeout, then limbo will remove the entry, > and if > an on_rollback trigger isn't cleared, it'll try to remove the entry > again. > > Now, back to this case: I reworked recovery_journal to use write_async(). > recovery_journal_write_async() calls journal_async_complete() right away, > since there're no actual writes and no one else can call > journal_async_complete(). > > So, once journal_write_async() returns, the trigger is cleared, but > later it's > reset at the end of txn_commit_async(), because txn_commit_async() > assumes that > the write hasn't happened yet, and journal_async_complete() is yet to > be called. > > One option is to call journal_async_complete() after > txn_commit_async(), this will > solve the problem, but journal_async_complete() receives `struct > journal_entry *`, > and we have no knowledge of the journal entry outside of > `txn_commit_async()`. > > Another option is to set a txn_flag, like > "TXN_DONT_EXPECT_WRITE_FAILURE", so that > the txn doesn't set an on_write_failure trigger when we don't need it. > But this > also looks ugly. > > That's why I unconditionally reset the trigger after txn_commit_async(). After further discussion, we decided to use different values of txn->signature < 0 as reasons for rollbackand instead of resetting the on_write_failure triggers, we'll just make them work onlyfor some rollback reasons (like failed write). > >> >>> + trigger_clear(&txn->on_write_failure); >>> } else { >>> res = txn_commit(txn); >>> } > -- Serge Petrenko ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit 2020-06-22 18:55 ` Serge Petrenko @ 2020-06-22 21:43 ` Vladislav Shpilevoy 2020-06-23 8:39 ` Serge Petrenko 0 siblings, 1 reply; 11+ messages in thread From: Vladislav Shpilevoy @ 2020-06-22 21:43 UTC (permalink / raw) To: Serge Petrenko, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches I am looking at the commit which added on_write_failure. Why can't we reuse on_rollback trigger? As we discussed, we can pass whatever we want using txn->signature. So for manual rollback we can set it to -1, for write failure to -2, for limbo rollback to -3, for timeout to -4, etc. I am just afraid that +16 bytes for transaction object may affect performance of the async transactions. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit 2020-06-22 21:43 ` Vladislav Shpilevoy @ 2020-06-23 8:39 ` Serge Petrenko 2020-06-26 22:00 ` Vladislav Shpilevoy 0 siblings, 1 reply; 11+ messages in thread From: Serge Petrenko @ 2020-06-23 8:39 UTC (permalink / raw) To: Vladislav Shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches 23.06.2020 00:43, Vladislav Shpilevoy пишет: > I am looking at the commit which added on_write_failure. > Why can't we reuse on_rollback trigger? As we discussed, > we can pass whatever we want using txn->signature. So > for manual rollback we can set it to -1, for write failure > to -2, for limbo rollback to -3, for timeout to -4, etc. > > I am just afraid that +16 bytes for transaction object > may affect performance of the async transactions. We can. I've just done it & dropped the additionnal commit (the one that hacked the trigger removal in recovery. -- Serge Petrenko ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit 2020-06-23 8:39 ` Serge Petrenko @ 2020-06-26 22:00 ` Vladislav Shpilevoy 0 siblings, 0 replies; 11+ messages in thread From: Vladislav Shpilevoy @ 2020-06-26 22:00 UTC (permalink / raw) To: Serge Petrenko, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches On 23/06/2020 10:39, Serge Petrenko wrote: > > 23.06.2020 00:43, Vladislav Shpilevoy пишет: >> I am looking at the commit which added on_write_failure. >> Why can't we reuse on_rollback trigger? As we discussed, >> we can pass whatever we want using txn->signature. So >> for manual rollback we can set it to -1, for write failure >> to -2, for limbo rollback to -3, for timeout to -4, etc. >> >> I am just afraid that +16 bytes for transaction object >> may affect performance of the async transactions. > > We can. I've just done it & dropped the additionnal commit > > (the one that hacked the trigger removal in recovery. We also can drop TXN_IS_ABORTED_BY_YIELD and reuse the signature for this type of error too. ^ permalink raw reply [flat|nested] 11+ messages in thread
* [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery 2020-06-19 18:00 [Tarantool-patches] [PATCH 0/2] replication: support CONFIRM and ROLLBACK in recovery Serge Petrenko 2020-06-19 18:00 ` [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit Serge Petrenko @ 2020-06-19 18:00 ` Serge Petrenko 2020-06-23 11:50 ` Serge Petrenko 1 sibling, 1 reply; 11+ messages in thread From: Serge Petrenko @ 2020-06-19 18:00 UTC (permalink / raw) To: v.shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches Follow-up #4847 Follow-up #4848 --- src/box/box.cc | 20 ++++++++++++++++++-- src/box/txn_limbo.c | 6 ++---- 2 files changed, 20 insertions(+), 6 deletions(-) diff --git a/src/box/box.cc b/src/box/box.cc index f80d6f8e6..f4c22b340 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -374,9 +374,25 @@ static void apply_wal_row(struct xstream *stream, struct xrow_header *row) { struct request request; - // TODO: process confirmation during recovery. - if (iproto_type_is_synchro_request(row->type)) + if (iproto_type_is_synchro_request(row->type)) { + uint32_t replica_id; + int64_t lsn; + switch(row->type) { + case IPROTO_CONFIRM: + if (xrow_decode_confirm(row, &replica_id, &lsn) < 0) + diag_raise(); + assert(txn_limbo.instance_id == replica_id); + txn_limbo_read_confirm(&txn_limbo, lsn); + break; + case IPROTO_ROLLBACK: + if (xrow_decode_rollback(row, &replica_id, &lsn) < 0) + diag_raise(); + assert(txn_limbo.instance_id == replica_id); + txn_limbo_read_rollback(&txn_limbo, lsn); + break; + } return; + } xrow_decode_dml_xc(row, &request, dml_request_key_map(row->type)); if (request.type != IPROTO_NOP) { struct space *space = space_cache_find_xc(request.space_id); diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c index 1e1fa1aaf..ed19d4ec5 100644 --- a/src/box/txn_limbo.c +++ b/src/box/txn_limbo.c @@ -228,8 +228,7 @@ txn_limbo_write_confirm(struct txn_limbo *limbo, void txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn) { - assert(limbo->instance_id != REPLICA_ID_NIL && - limbo->instance_id != instance_id); + assert(limbo->instance_id != REPLICA_ID_NIL); struct txn_limbo_entry *e, *tmp; rlist_foreach_entry_safe(e, &limbo->queue, in_queue, tmp) { if (e->lsn > lsn) @@ -262,8 +261,7 @@ txn_limbo_write_rollback(struct txn_limbo *limbo, void txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn) { - assert(limbo->instance_id != REPLICA_ID_NIL && - limbo->instance_id != instance_id); + assert(limbo->instance_id != REPLICA_ID_NIL); struct txn_limbo_entry *e, *tmp; rlist_foreach_entry_safe_reverse(e, &limbo->queue, in_queue, tmp) { if (e->lsn <= lsn) -- 2.24.3 (Apple Git-128) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery 2020-06-19 18:00 ` [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery Serge Petrenko @ 2020-06-23 11:50 ` Serge Petrenko 0 siblings, 0 replies; 11+ messages in thread From: Serge Petrenko @ 2020-06-23 11:50 UTC (permalink / raw) To: v.shpilevoy, sergos, gorcunov, lvasiliev; +Cc: tarantool-patches 19.06.2020 21:00, Serge Petrenko пишет: > Follow-up #4847 > Follow-up #4848 Added a small test for CONFIRM/ROLLBACK: commit 5b91fbd3996c4ca43f5c96a1767e5b7ec7cea86d Author: Serge Petrenko <sergepetrenko@tarantool.org> Date: Tue Jun 23 14:20:45 2020 +0300 replication: add test for synchro CONFIRM/ROLLBACK [TO BE SQUASHED INTO THE PREVIOUS COMMIT] diff --git a/test/replication/sync_replication_sanity.result b/test/replication/sync_replication_sanity.result index 551df7daf..4b9823d77 100644 --- a/test/replication/sync_replication_sanity.result +++ b/test/replication/sync_replication_sanity.result @@ -69,3 +69,135 @@ box.schema.create_space('test', {is_sync = true, is_local = true}) | --- | - error: 'Failed to create space ''test'': local space can''t be synchronous' | ... + +-- +-- gh-4847, gh-4848: CONFIRM and ROLLBACK entries in WAL. +-- +env = require('test_run') + | --- + | ... +test_run = env.new() + | --- + | ... +fiber = require('fiber') + | --- + | ... +engine = test_run:get_cfg('engine') + | --- + | ... + +box.schema.user.grant('guest', 'replication') + | --- + | ... +-- Set up synchronous replication options. +quorum = box.cfg.replication_synchro_quorum + | --- + | ... +timeout = box.cfg.replication_synchro_timeout + | --- + | ... +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1} + | --- + | ... + +test_run:cmd('create server replica with rpl_master=default,\ + script="replication/replica.lua"') + | --- + | - true + | ... +test_run:cmd('start server replica with wait=True, wait_load=True') + | --- + | - true + | ... + +_ = box.schema.space.create('sync', {is_sync=true, engine=engine}) + | --- + | ... +_ = box.space.sync:create_index('pk') + | --- + | ... + +lsn = box.info.lsn + | --- + | ... +box.space.sync:insert{1} + | --- + | - [1] + | ... +-- 1 for insertion, 1 for CONFIRM message. +box.info.lsn - lsn + | --- + | - 2 + | ... +-- Raise quorum so that master has to issue a ROLLBACK. +box.cfg{replication_synchro_quorum=3} + | --- + | ... +t = fiber.time() + | --- + | ... +box.space.sync:insert{2} + | --- + | - error: Quorum collection for a synchronous transaction is timed out + | ... +-- Check that master waited for acks. +fiber.time() - t > box.cfg.replication_synchro_timeout + | --- + | - true + | ... +box.cfg{replication_synchro_quorum=2} + | --- + | ... +box.space.sync:insert{3} + | --- + | - [3] + | ... +box.space.sync:select{} + | --- + | - - [1] + | - [3] + | ... + +-- Check consistency on replica. +test_run:cmd('switch replica') + | --- + | - true + | ... +box.space.sync:select{} + | --- + | - - [1] + | - [3] + | ... + +-- Check consistency in recovered data. +test_run:cmd('restart server replica') + | +box.space.sync:select{} + | --- + | - - [1] + | - [3] + | ... + +-- Cleanup. +test_run:cmd('switch default') + | --- + | - true + | ... + +box.cfg{replication_synchro_quorum=quorum, replication_synchro_timeout=timeout} + | --- + | ... +test_run:cmd('stop server replica') + | --- + | - true + | ... +test_run:cmd('delete server replica') + | --- + | - true + | ... +box.space.sync:drop() + | --- + | ... +box.schema.user.revoke('guest', 'replication') + | --- + | ... diff --git a/test/replication/sync_replication_sanity.test.lua b/test/replication/sync_replication_sanity.test.lua index fd7ed537e..8715a4600 100644 --- a/test/replication/sync_replication_sanity.test.lua +++ b/test/replication/sync_replication_sanity.test.lua @@ -27,3 +27,55 @@ s2:drop() -- Local space can't be synchronous. box.schema.create_space('test', {is_sync = true, is_local = true}) + +-- +-- gh-4847, gh-4848: CONFIRM and ROLLBACK entries in WAL. +-- +env = require('test_run') +test_run = env.new() +fiber = require('fiber') +engine = test_run:get_cfg('engine') + +box.schema.user.grant('guest', 'replication') +-- Set up synchronous replication options. +quorum = box.cfg.replication_synchro_quorum +timeout = box.cfg.replication_synchro_timeout +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1} + +test_run:cmd('create server replica with rpl_master=default,\ + script="replication/replica.lua"') +test_run:cmd('start server replica with wait=True, wait_load=True') + +_ = box.schema.space.create('sync', {is_sync=true, engine=engine}) +_ = box.space.sync:create_index('pk') + +lsn = box.info.lsn +box.space.sync:insert{1} +-- 1 for insertion, 1 for CONFIRM message. +box.info.lsn - lsn +-- Raise quorum so that master has to issue a ROLLBACK. +box.cfg{replication_synchro_quorum=3} +t = fiber.time() +box.space.sync:insert{2} +-- Check that master waited for acks. +fiber.time() - t > box.cfg.replication_synchro_timeout +box.cfg{replication_synchro_quorum=2} +box.space.sync:insert{3} +box.space.sync:select{} + +-- Check consistency on replica. +test_run:cmd('switch replica') +box.space.sync:select{} + +-- Check consistency in recovered data. +test_run:cmd('restart server replica') +box.space.sync:select{} + +-- Cleanup. +test_run:cmd('switch default') + +box.cfg{replication_synchro_quorum=quorum, replication_synchro_timeout=timeout} +test_run:cmd('stop server replica') +test_run:cmd('delete server replica') +box.space.sync:drop() +box.schema.user.revoke('guest', 'replication') -- Serge Petrenko ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2020-06-26 22:00 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-19 18:00 [Tarantool-patches] [PATCH 0/2] replication: support CONFIRM and ROLLBACK in recovery Serge Petrenko 2020-06-19 18:00 ` [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit Serge Petrenko 2020-06-19 18:45 ` Serge Petrenko 2020-06-21 16:25 ` Vladislav Shpilevoy 2020-06-22 11:19 ` Serge Petrenko 2020-06-22 18:55 ` Serge Petrenko 2020-06-22 21:43 ` Vladislav Shpilevoy 2020-06-23 8:39 ` Serge Petrenko 2020-06-26 22:00 ` Vladislav Shpilevoy 2020-06-19 18:00 ` [Tarantool-patches] [PATCH 2/2] replication: support ROLLBACK and CONFIRM during recovery Serge Petrenko 2020-06-23 11:50 ` Serge Petrenko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox