<HTML><BODY><div>Hi!<br>Thanks for the patch, LGTM.</div><div data-signature-widget="container"><div data-signature-widget="content"><div>--<br>Serge Petrenko</div></div></div><div> </div><div> </div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;">Вторник, 11 августа 2020, 1:18 +03:00 от Vladislav Shpilevoy <v.shpilevoy@tarantool.org>:<br> <div id=""><div class="js-helper js-readmsg-msg"><style type="text/css"></style><div><div id="style_15970978860020428146_BODY">Box.snapshot() could include rolled back data in case synchronous<br>transaction ROLLBACK arrived during WAL rotation in preparation of<br>a checkpoint.<br><br>More specifically, snapshot consists of fixating the engines'<br>content (creation of a read-view), doing WAL rotation, and writing<br>the snapshot itself. All data changes after content fixation won't<br>go into the snap. So if ROLLBACK arrives during WAL rotation, the<br>fixated content will have rolled back data, not present in the<br>newest dataset.<br><br>The patch makes it fail if during WAL rotation anything was rolled<br>back. The bug sometimes appeared in an existing test about qsync<br>snapshots, but with a very poor reproducibility. In a new test<br>file it is reproduced 100% without the patch.<br><br>Closes #5167<br>---<br>Branch: <a href="http://github.com/tarantool/tarantool/tree/gerold103/gh-5167-flaky-qsync-snapshots" target="_blank">http://github.com/tarantool/tarantool/tree/gerold103/gh-5167-flaky-qsync-snapshots</a><br>Issue: <a href="https://github.com/tarantool/tarantool/issues/5167" target="_blank">https://github.com/tarantool/tarantool/issues/5167</a><br><br>@ChangeLog<br>* Snapshot could contain changes from a rolled back synchronous transaction (gh-5167).<br><br> src/box/gc.c | 12 +-<br> src/box/wal.c | 4 +<br> src/lib/core/errinj.h | 1 +<br> test/box/errinj.result | 1 +<br> .../gh-5167-qsync-rollback-snap.result | 165 ++++++++++++++++++<br> .../gh-5167-qsync-rollback-snap.test.lua | 67 +++++++<br> test/replication/suite.ini | 2 +-<br> 7 files changed, 250 insertions(+), 2 deletions(-)<br> create mode 100644 test/replication/gh-5167-qsync-rollback-snap.result<br> create mode 100644 test/replication/gh-5167-qsync-rollback-snap.test.lua<br><br>diff --git a/src/box/gc.c b/src/box/gc.c<br>index d50a64d66..76f7c6325 100644<br>--- a/src/box/gc.c<br>+++ b/src/box/gc.c<br>@@ -382,6 +382,7 @@ gc_do_checkpoint(bool is_scheduled)<br> {<br>  int rc;<br>  struct wal_checkpoint checkpoint;<br>+ int64_t limbo_rollback_count = txn_limbo.rollback_count;<br> <br>  assert(!gc.checkpoint_is_in_progress);<br>  gc.checkpoint_is_in_progress = true;<br>@@ -396,7 +397,16 @@ gc_do_checkpoint(bool is_scheduled)<br>  rc = wal_begin_checkpoint(&checkpoint);<br>  if (rc != 0)<br>  goto out;<br>-<br>+ /*<br>+ * Check if the checkpoint contains rolled back data. That makes the<br>+ * checkpoint not self-sufficient - it needs the xlog file with<br>+ * ROLLBACK. Drop it.<br>+ */<br>+ if (txn_limbo.rollback_count != limbo_rollback_count) {<br>+ rc = -1;<br>+ diag_set(ClientError, ER_SYNC_ROLLBACK);<br>+ goto out;<br>+ }<br>  /*<br>  * Wait the confirms on all "sync" transactions before<br>  * create a snapshot.<br>diff --git a/src/box/wal.c b/src/box/wal.c<br>index 220e68245..d8c92aa36 100644<br>--- a/src/box/wal.c<br>+++ b/src/box/wal.c<br>@@ -662,6 +662,7 @@ wal_begin_checkpoint_f(struct cbus_call_msg *data)<br>  }<br>  vclock_copy(&msg->vclock, &writer->vclock);<br>  msg->wal_size = writer->checkpoint_wal_size;<br>+ ERROR_INJECT_SLEEP(ERRINJ_WAL_DELAY);<br>  return 0;<br> }<br> <br>@@ -1272,6 +1273,9 @@ wal_write_async(struct journal *journal, struct journal_entry *entry)<br>  writer->last_entry = entry;<br>  batch->approx_len += entry->approx_len;<br>  writer->wal_pipe.n_input += entry->n_rows * XROW_IOVMAX;<br>+#ifndef NDEBUG<br>+ ++errinj(ERRINJ_WAL_WRITE_COUNT, ERRINJ_INT)->iparam;<br>+#endif<br>  cpipe_flush_input(&writer->wal_pipe);<br>  return 0;<br> <br>diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h<br>index aace8736f..814c57c2e 100644<br>--- a/src/lib/core/errinj.h<br>+++ b/src/lib/core/errinj.h<br>@@ -82,6 +82,7 @@ struct errinj {<br>  _(ERRINJ_WAL_DELAY, ERRINJ_BOOL, {.bparam = false}) \<br>  _(ERRINJ_WAL_DELAY_COUNTDOWN, ERRINJ_INT, {.iparam = -1}) \<br>  _(ERRINJ_WAL_FALLOCATE, ERRINJ_INT, {.iparam = 0}) \<br>+ _(ERRINJ_WAL_WRITE_COUNT, ERRINJ_INT, {.iparam = 0}) \<br>  _(ERRINJ_INDEX_ALLOC, ERRINJ_BOOL, {.bparam = false}) \<br>  _(ERRINJ_TUPLE_ALLOC, ERRINJ_BOOL, {.bparam = false}) \<br>  _(ERRINJ_TUPLE_FIELD, ERRINJ_BOOL, {.bparam = false}) \<br>diff --git a/test/box/errinj.result b/test/box/errinj.result<br>index 4bea0f46f..613d22c51 100644<br>--- a/test/box/errinj.result<br>+++ b/test/box/errinj.result<br>@@ -114,6 +114,7 @@ evals<br>   - ERRINJ_WAL_ROTATE: false<br>   - ERRINJ_WAL_SYNC: false<br>   - ERRINJ_WAL_WRITE: false<br>+ - ERRINJ_WAL_WRITE_COUNT: 3<br>   - ERRINJ_WAL_WRITE_DISK: false<br>   - ERRINJ_WAL_WRITE_EOF: false<br>   - ERRINJ_WAL_WRITE_PARTIAL: -1<br>diff --git a/test/replication/gh-5167-qsync-rollback-snap.result b/test/replication/gh-5167-qsync-rollback-snap.result<br>new file mode 100644<br>index 000000000..06f58526c<br>--- /dev/null<br>+++ b/test/replication/gh-5167-qsync-rollback-snap.result<br>@@ -0,0 +1,165 @@<br>+-- test-run result file version 2<br>+test_run = require('test_run').new()<br>+ | ---<br>+ | ...<br>+engine = test_run:get_cfg('engine')<br>+ | ---<br>+ | ...<br>+<br>+orig_synchro_quorum = box.cfg.replication_synchro_quorum<br>+ | ---<br>+ | ...<br>+orig_synchro_timeout = box.cfg.replication_synchro_timeout<br>+ | ---<br>+ | ...<br>+box.schema.user.grant('guest', 'super')<br>+ | ---<br>+ | ...<br>+<br>+test_run:cmd('create server replica with rpl_master=default,\<br>+ script="replication/replica.lua"')<br>+ | ---<br>+ | - true<br>+ | ...<br>+test_run:cmd('start server replica with wait=True, wait_load=True')<br>+ | ---<br>+ | - true<br>+ | ...<br>+<br>+--<br>+-- gh-5167:<br>+--<br>+fiber = require('fiber')<br>+ | ---<br>+ | ...<br>+box.cfg{replication_synchro_quorum = 2, replication_synchro_timeout = 1000}<br>+ | ---<br>+ | ...<br>+_ = box.schema.space.create('sync', {is_sync = true, engine = engine})<br>+ | ---<br>+ | ...<br>+_ = box.space.sync:create_index('pk')<br>+ | ---<br>+ | ...<br>+-- Write something to flush the current master's state to replica.<br>+_ = box.space.sync:insert{1}<br>+ | ---<br>+ | ...<br>+_ = box.space.sync:delete{1}<br>+ | ---<br>+ | ...<br>+<br>+box.cfg{replication_synchro_quorum = 3}<br>+ | ---<br>+ | ...<br>+ok, err = nil<br>+ | ---<br>+ | ...<br>+f = fiber.create(function() \<br>+ ok, err = pcall(box.space.sync.insert, box.space.sync, {1}) \<br>+end)<br>+ | ---<br>+ | ...<br>+<br>+test_run:switch('replica')<br>+ | ---<br>+ | - true<br>+ | ...<br>+fiber = require('fiber')<br>+ | ---<br>+ | ...<br>+test_run:wait_cond(function() return box.space.sync:count() == 1 end)<br>+ | ---<br>+ | - true<br>+ | ...<br>+-- Snapshot will stuck in WAL thread on rotation before starting wait on the<br>+-- limbo.<br>+box.error.injection.set("ERRINJ_WAL_DELAY", true)<br>+ | ---<br>+ | - ok<br>+ | ...<br>+wal_write_count = box.error.injection.get("ERRINJ_WAL_WRITE_COUNT")<br>+ | ---<br>+ | ...<br>+ok, err = nil<br>+ | ---<br>+ | ...<br>+f = fiber.create(function() ok, err = pcall(box.snapshot) end)<br>+ | ---<br>+ | ...<br>+<br>+test_run:switch('default')<br>+ | ---<br>+ | - true<br>+ | ...<br>+box.cfg{replication_synchro_timeout = 0.0001}<br>+ | ---<br>+ | ...<br>+test_run:wait_cond(function() return f:status() == 'dead' end)<br>+ | ---<br>+ | - true<br>+ | ...<br>+ok, err<br>+ | ---<br>+ | - false<br>+ | - Quorum collection for a synchronous transaction is timed out<br>+ | ...<br>+box.space.sync:select{}<br>+ | ---<br>+ | - []<br>+ | ...<br>+<br>+test_run:switch('replica')<br>+ | ---<br>+ | - true<br>+ | ...<br>+-- Rollback was received. Note, it is not legit to look for space:count() == 0.<br>+-- Because ideally ROLLBACK should not be applied before written to WAL. That<br>+-- means count() will be > 0 until WAL write succeeds.<br>+test_run:wait_cond(function() \<br>+ return box.error.injection.get("ERRINJ_WAL_WRITE_COUNT") > wal_write_count \<br>+end)<br>+ | ---<br>+ | - true<br>+ | ...<br>+-- Now WAL rotation is done. Snapshot will fail, because will see that a<br>+-- rollback happened during that. Meaning that the rotated WAL contains<br>+-- not confirmed data, and it can't be used as a checkpoint.<br>+box.error.injection.set("ERRINJ_WAL_DELAY", false)<br>+ | ---<br>+ | - ok<br>+ | ...<br>+test_run:wait_cond(function() return f:status() == 'dead' end)<br>+ | ---<br>+ | - true<br>+ | ...<br>+ok, err<br>+ | ---<br>+ | - false<br>+ | - A rollback for a synchronous transaction is received<br>+ | ...<br>+<br>+test_run:switch('default')<br>+ | ---<br>+ | - true<br>+ | ...<br>+box.space.sync:drop()<br>+ | ---<br>+ | ...<br>+test_run:cmd('stop server replica')<br>+ | ---<br>+ | - true<br>+ | ...<br>+test_run:cmd('delete server replica')<br>+ | ---<br>+ | - true<br>+ | ...<br>+box.schema.user.revoke('guest', 'super')<br>+ | ---<br>+ | ...<br>+box.cfg{ \<br>+ replication_synchro_quorum = orig_synchro_quorum, \<br>+ replication_synchro_timeout = orig_synchro_timeout, \<br>+}<br>+ | ---<br>+ | ...<br>diff --git a/test/replication/gh-5167-qsync-rollback-snap.test.lua b/test/replication/gh-5167-qsync-rollback-snap.test.lua<br>new file mode 100644<br>index 000000000..475727e61<br>--- /dev/null<br>+++ b/test/replication/gh-5167-qsync-rollback-snap.test.lua<br>@@ -0,0 +1,67 @@<br>+test_run = require('test_run').new()<br>+engine = test_run:get_cfg('engine')<br>+<br>+orig_synchro_quorum = box.cfg.replication_synchro_quorum<br>+orig_synchro_timeout = box.cfg.replication_synchro_timeout<br>+box.schema.user.grant('guest', 'super')<br>+<br>+test_run:cmd('create server replica with rpl_master=default,\<br>+ script="replication/replica.lua"')<br>+test_run:cmd('start server replica with wait=True, wait_load=True')<br>+<br>+--<br>+-- gh-5167:<br>+--<br>+fiber = require('fiber')<br>+box.cfg{replication_synchro_quorum = 2, replication_synchro_timeout = 1000}<br>+_ = box.schema.space.create('sync', {is_sync = true, engine = engine})<br>+_ = box.space.sync:create_index('pk')<br>+-- Write something to flush the current master's state to replica.<br>+_ = box.space.sync:insert{1}<br>+_ = box.space.sync:delete{1}<br>+<br>+box.cfg{replication_synchro_quorum = 3}<br>+ok, err = nil<br>+f = fiber.create(function() \<br>+ ok, err = pcall(box.space.sync.insert, box.space.sync, {1}) \<br>+end)<br>+<br>+test_run:switch('replica')<br>+fiber = require('fiber')<br>+test_run:wait_cond(function() return box.space.sync:count() == 1 end)<br>+-- Snapshot will stuck in WAL thread on rotation before starting wait on the<br>+-- limbo.<br>+box.error.injection.set("ERRINJ_WAL_DELAY", true)<br>+wal_write_count = box.error.injection.get("ERRINJ_WAL_WRITE_COUNT")<br>+ok, err = nil<br>+f = fiber.create(function() ok, err = pcall(box.snapshot) end)<br>+<br>+test_run:switch('default')<br>+box.cfg{replication_synchro_timeout = 0.0001}<br>+test_run:wait_cond(function() return f:status() == 'dead' end)<br>+ok, err<br>+box.space.sync:select{}<br>+<br>+test_run:switch('replica')<br>+-- Rollback was received. Note, it is not legit to look for space:count() == 0.<br>+-- Because ideally ROLLBACK should not be applied before written to WAL. That<br>+-- means count() will be > 0 until WAL write succeeds.<br>+test_run:wait_cond(function() \<br>+ return box.error.injection.get("ERRINJ_WAL_WRITE_COUNT") > wal_write_count \<br>+end)<br>+-- Now WAL rotation is done. Snapshot will fail, because will see that a<br>+-- rollback happened during that. Meaning that the rotated WAL contains<br>+-- not confirmed data, and it can't be used as a checkpoint.<br>+box.error.injection.set("ERRINJ_WAL_DELAY", false)<br>+test_run:wait_cond(function() return f:status() == 'dead' end)<br>+ok, err<br>+<br>+test_run:switch('default')<br>+box.space.sync:drop()<br>+test_run:cmd('stop server replica')<br>+test_run:cmd('delete server replica')<br>+box.schema.user.revoke('guest', 'super')<br>+box.cfg{ \<br>+ replication_synchro_quorum = orig_synchro_quorum, \<br>+ replication_synchro_timeout = orig_synchro_timeout, \<br>+}<br>diff --git a/test/replication/suite.ini b/test/replication/suite.ini<br>index 73f73eb89..247289d58 100644<br>--- a/test/replication/suite.ini<br>+++ b/test/replication/suite.ini<br>@@ -3,7 +3,7 @@ core = tarantool<br> script = master.lua<br> description = tarantool/box, replication<br> disabled = consistent.test.lua<br>-release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua<br>+release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-rollback-snap.test.lua<br> config = suite.cfg<br> lua_libs = lua/fast_replica.lua lua/rlimit.lua<br> use_unix_sockets = True<br>--<br>2.21.1 (Apple Git-122.3)<br> </div></div></div></div></blockquote><div> </div></BODY></HTML>