From: Vladimir Davydov <vdavydov.dev@gmail.com> To: kostja@tarantool.org Cc: tarantool-patches@freelists.org Subject: [PATCH 2/2] replication: fix garbage collection logic Date: Tue, 9 Apr 2019 20:07:11 +0300 [thread overview] Message-ID: <44bdf3f4affee3c95cf9b23add24c30dece44151.1554829366.git.vdavydov.dev@gmail.com> (raw) In-Reply-To: <24d1ce5202869981a2182ff1c7947d7d4914bb5b.1554829366.git.vdavydov.dev@gmail.com> In-Reply-To: <24d1ce5202869981a2182ff1c7947d7d4914bb5b.1554829366.git.vdavydov.dev@gmail.com> Currently, the garbage collector works with vclock signatures and doesn't take into account vclock components. This works as long as the caller (i.e. relay) makes sure that it doesn't advance a consumer associated with a replica unless its acknowledged vclock is greater than or equal to the vclock of a WAL file fed to it. The bug is that it does not - it only compares vclock signatures. As a result, if a replica has some local changes or changes pulled from other members of the cluster, which render its signature greater, the master may remove files that are still needed by the replica, permanently breaking replication and requiring rebootstrap. I guess the proper fix would be teaching the garbage collector operate on vclock components rather than signatures, but it's rather difficult to implement. This patch is a quick fix, which simply replaces vclock signature comparison in relay with vclock_compare. Closes #4106 --- https://github.com/tarantool/tarantool/issues/4106 https://github.com/tarantool/tarantool/commits/dv/gh-4106-replica-gc-fix src/box/relay.cc | 12 ++++- test/replication/gc.result | 124 +++++++++++++++++++++++++++++++++++++++++-- test/replication/gc.test.lua | 59 +++++++++++++++++++- 3 files changed, 189 insertions(+), 6 deletions(-) diff --git a/src/box/relay.cc b/src/box/relay.cc index 29570323..e3d7f0ed 100644 --- a/src/box/relay.cc +++ b/src/box/relay.cc @@ -422,7 +422,17 @@ relay_schedule_pending_gc(struct relay *relay, const struct vclock *vclock) { struct relay_gc_msg *curr, *next, *gc_msg = NULL; stailq_foreach_entry_safe(curr, next, &relay->pending_gc, in_pending) { - if (vclock_sum(&curr->vclock) > vclock_sum(vclock)) + /* + * We may delete a WAL file only if its vclock is + * less than or equal to the vclock acknowledged by + * the replica. Even if the replica's signature is + * is greater, but the vclocks are incomparable, we + * must not delete the WAL, because there may still + * be rows not applied by the replica in it while + * the greater signatures is due to changes pulled + * from other members of the cluster. + */ + if (vclock_compare(&curr->vclock, vclock) > 0) break; stailq_shift(&relay->pending_gc); free(gc_msg); diff --git a/test/replication/gc.result b/test/replication/gc.result index 5b7057ad..65785f47 100644 --- a/test/replication/gc.result +++ b/test/replication/gc.result @@ -404,10 +404,6 @@ test_run:cmd("cleanup server replica") --- - true ... -test_run:cmd("delete server replica") ---- -- true -... _ = s:auto_increment{} --- ... @@ -450,7 +446,127 @@ wait_xlog(0, 10) or fio.listdir('./master') box.cfg{replication = {}} --- ... +s:truncate() +--- +... +-- +-- Check that a master doesn't remove WAL files needed by a replica +-- in case the replica has local changes due to which its vclock has +-- greater signature than the master's (gh-4106). +-- +test_run:cmd("start server replica") +--- +- true +... +-- Temporarily disable replication. +test_run:cmd("switch replica") +--- +- true +... +replication = box.cfg.replication +--- +... +box.cfg{replication = {}} +--- +... +-- Generate some WALs on the master. +test_run:cmd("switch default") +--- +- true +... +test_run:cmd("setopt delimiter ';'") +--- +- true +... +for i = 1, 4 do + for j = 1, 100 do + s:replace{1, i, j} + end + box.snapshot() +end; +--- +... +test_run:cmd("setopt delimiter ''"); +--- +- true +... +-- Stall WAL writes on the replica and reestablish replication, +-- then bump local LSN and break replication on WAL error so that +-- the master sends all rows and receives ack with the replica's +-- vclock, but not all rows are actually applied on the replica. +-- The master must not delete unconfirmed WALs even though the +-- replica's vclock has a greater signature (checked later on). +test_run:cmd("switch replica") +--- +- true +... +box.error.injection.set('ERRINJ_WAL_DELAY', true) +--- +- ok +... +box.cfg{replication_connect_quorum = 0} -- disable sync +--- +... +box.cfg{replication = replication} +--- +... +fiber = require('fiber') +--- +... +test_run:cmd("setopt delimiter ';'"); +--- +- true +... +_ = fiber.create(function() + box.begin() + for i = 1, 1000 do + box.space.test:replace{1, i} + end + box.commit() + box.error.injection.set('ERRINJ_WAL_WRITE_DISK', true) +end); +--- +... +test_run:cmd("setopt delimiter ''"); +--- +- true +... +box.error.injection.set('ERRINJ_WAL_DELAY', false) +--- +- ok +... +-- Restart the replica and wait for it to sync. +test_run:cmd("switch default") +--- +- true +... +test_run:cmd("stop server replica") +--- +- true +... +test_run:cmd("start server replica") +--- +- true +... +test_run:wait_lsn('replica', 'default') +--- +... -- Cleanup. +test_run:cmd("stop server replica") +--- +- true +... +test_run:cmd("cleanup server replica") +--- +- true +... +test_run:cmd("delete server replica") +--- +- true +... +test_run:cleanup_cluster() +--- +... s:drop() --- ... diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua index a92a6ed7..890fe29a 100644 --- a/test/replication/gc.test.lua +++ b/test/replication/gc.test.lua @@ -189,7 +189,6 @@ box.cfg{replication = replica_port} -- Stop the replica and write a few WALs. test_run:cmd("stop server replica") test_run:cmd("cleanup server replica") -test_run:cmd("delete server replica") _ = s:auto_increment{} box.snapshot() _ = s:auto_increment{} @@ -207,7 +206,65 @@ wait_xlog(0, 10) or fio.listdir('./master') -- Restore the config. box.cfg{replication = {}} +s:truncate() + +-- +-- Check that a master doesn't remove WAL files needed by a replica +-- in case the replica has local changes due to which its vclock has +-- greater signature than the master's (gh-4106). +-- +test_run:cmd("start server replica") + +-- Temporarily disable replication. +test_run:cmd("switch replica") +replication = box.cfg.replication +box.cfg{replication = {}} + +-- Generate some WALs on the master. +test_run:cmd("switch default") +test_run:cmd("setopt delimiter ';'") +for i = 1, 4 do + for j = 1, 100 do + s:replace{1, i, j} + end + box.snapshot() +end; +test_run:cmd("setopt delimiter ''"); + +-- Stall WAL writes on the replica and reestablish replication, +-- then bump local LSN and break replication on WAL error so that +-- the master sends all rows and receives ack with the replica's +-- vclock, but not all rows are actually applied on the replica. +-- The master must not delete unconfirmed WALs even though the +-- replica's vclock has a greater signature (checked later on). +test_run:cmd("switch replica") +box.error.injection.set('ERRINJ_WAL_DELAY', true) +box.cfg{replication_connect_quorum = 0} -- disable sync +box.cfg{replication = replication} +fiber = require('fiber') +test_run:cmd("setopt delimiter ';'"); +_ = fiber.create(function() + box.begin() + for i = 1, 1000 do + box.space.test:replace{1, i} + end + box.commit() + box.error.injection.set('ERRINJ_WAL_WRITE_DISK', true) +end); +test_run:cmd("setopt delimiter ''"); +box.error.injection.set('ERRINJ_WAL_DELAY', false) + +-- Restart the replica and wait for it to sync. +test_run:cmd("switch default") +test_run:cmd("stop server replica") +test_run:cmd("start server replica") +test_run:wait_lsn('replica', 'default') + -- Cleanup. +test_run:cmd("stop server replica") +test_run:cmd("cleanup server replica") +test_run:cmd("delete server replica") +test_run:cleanup_cluster() s:drop() box.error.injection.set("ERRINJ_RELAY_REPORT_INTERVAL", 0) box.schema.user.revoke('guest', 'replication') -- 2.11.0
next prev parent reply other threads:[~2019-04-09 17:07 UTC|newest] Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-04-09 17:07 [PATCH 1/2] Revert "replication: update replica gc state on subscribe" Vladimir Davydov 2019-04-09 17:07 ` Vladimir Davydov [this message] 2019-04-11 7:32 ` [tarantool-patches] Re: [PATCH 2/2] replication: fix garbage collection logic Konstantin Osipov 2019-04-11 8:25 ` Vladimir Davydov 2019-04-11 11:01 ` Vladimir Davydov 2019-04-11 10:25 ` [tarantool-patches] Re: [PATCH 1/2] Revert "replication: update replica gc state on subscribe" Konstantin Osipov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=44bdf3f4affee3c95cf9b23add24c30dece44151.1554829366.git.vdavydov.dev@gmail.com \ --to=vdavydov.dev@gmail.com \ --cc=kostja@tarantool.org \ --cc=tarantool-patches@freelists.org \ --subject='Re: [PATCH 2/2] replication: fix garbage collection logic' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox