From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> To: tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org Subject: [Tarantool-patches] [PATCH 2/3] raft: use local LSN in relay recovery restart Date: Sat, 17 Oct 2020 19:17:56 +0200 [thread overview] Message-ID: <8d052e2a773dace9788603a4d0f85e89dfa99b5e.1602954690.git.v.shpilevoy@tarantool.org> (raw) In-Reply-To: <cover.1602954690.git.v.shpilevoy@tarantool.org> When a Raft node is elected as a leader, it should resend all its data to the followers from the last acked vclock. Because while the node was not a leader, the other instances ignored all the changes from it. The resending is done via restart of the recovery cursor in the relay thread. When the cursor was restarted, it used the last acked vclock to find the needed xlog file. But it didn't set the local LSN component, which was 0 (replicas don't send it). When in reality the component was not zero, the recovery cursor still tried to find the oldest xlog file having the first local row. And it couldn't. The first created local row may be gone long time ago. The patch makes the restart keep the local LSN component unchanged, as it was used by the previous recovery cursor, before the restart. Closes #5433 --- src/box/relay.cc | 20 ++- .../gh-5433-election-restart-recovery.result | 119 ++++++++++++++++++ ...gh-5433-election-restart-recovery.test.lua | 59 +++++++++ test/replication/suite.cfg | 1 + 4 files changed, 198 insertions(+), 1 deletion(-) create mode 100644 test/replication/gh-5433-election-restart-recovery.result create mode 100644 test/replication/gh-5433-election-restart-recovery.test.lua diff --git a/src/box/relay.cc b/src/box/relay.cc index 096f455a1..285f96108 100644 --- a/src/box/relay.cc +++ b/src/box/relay.cc @@ -864,8 +864,26 @@ relay_send_initial_join_row(struct xstream *stream, struct xrow_header *row) static void relay_restart_recovery(struct relay *relay) { + /* + * Need to keep the 0 component unchanged. The remote vclock has it + * unset, because it is a local LSN, and it is not sent between the + * instances. Nonetheless, the recovery procedures still need the local + * LSN set to find a proper xlog to start from. If it is not set, the + * recovery process will treat it as 0, and will try to find an xlog + * file having the first local row. Which of course may not exist for a + * long time already. And then it will fail with an xlog gap error. + * + * The currently used local LSN is ok to use, even if it is outdated a + * bit. Because the existing recovery object keeps the current xlog file + * not deleted thanks to xlog GC subsystem. So by using the current + * local LSN and the last confirmed remote LSNs we can be sure such an + * xlog file still exists. + */ + struct vclock restart_vclock; + vclock_copy(&restart_vclock, &relay->recv_vclock); + vclock_reset(&restart_vclock, 0, vclock_get(&relay->r->vclock, 0)); recovery_delete(relay->r); - relay->r = recovery_new(wal_dir(), false, &relay->recv_vclock); + relay->r = recovery_new(wal_dir(), false, &restart_vclock); recover_remaining_wals(relay->r, &relay->stream, NULL, true); } diff --git a/test/replication/gh-5433-election-restart-recovery.result b/test/replication/gh-5433-election-restart-recovery.result new file mode 100644 index 000000000..a8d7893bd --- /dev/null +++ b/test/replication/gh-5433-election-restart-recovery.result @@ -0,0 +1,119 @@ +-- test-run result file version 2 +test_run = require('test_run').new() + | --- + | ... +box.schema.user.grant('guest', 'super') + | --- + | ... + +old_election_mode = box.cfg.election_mode + | --- + | ... +old_replication_timeout = box.cfg.replication_timeout + | --- + | ... + +test_run:cmd('create server replica with rpl_master=default,\ + script="replication/replica.lua"') + | --- + | - true + | ... +test_run:cmd('start server replica with wait=True, wait_load=True') + | --- + | - true + | ... + +-- +-- gh-5433: leader, when elected, should resend rows to replicas starting from +-- the last acked vclock. Because they ignored all what was sent afterwards as +-- it didn't come from a leader. +-- +-- There was a bug, that the local LSN wasn't passed to the restarted recovery +-- iterator, and it tried to find not existing files. This caused a reconnect +-- and a scary error message. +-- + +-- Increase local LSN so as without it being accounted during recovery cursor +-- restart it would fail, as it won't find the xlogs with first local LSN. Do +-- snapshots to mark the old files for drop. +s = box.schema.create_space('test', {is_local = true}) + | --- + | ... +_ = s:create_index('pk') + | --- + | ... +for i = 1, 10 do s:replace({i}) end + | --- + | ... +box.snapshot() + | --- + | - ok + | ... +for i = 11, 20 do s:replace({i}) end + | --- + | ... +box.snapshot() + | --- + | - ok + | ... + +-- The log may contain old irrelevant records. Add some garbage and in the next +-- lines check for not more than the added garbage. So the old text won't be +-- touched. +require('log').info(string.rep('a', 1000)) + | --- + | ... + +-- Small timeout to speed up the election. +box.cfg{ \ + replication_timeout = 0.1, \ + election_mode = 'candidate', \ +} + | --- + | ... +test_run:wait_cond(function() return box.info.election.state == 'leader' end) + | --- + | - true + | ... + +test_run:switch('replica') + | --- + | - true + | ... +test_run:wait_cond(function() return box.info.election.leader ~= 0 end) + | --- + | - true + | ... + +test_run:switch('default') + | --- + | - true + | ... +-- When leader was elected, it should have restarted the recovery iterator in +-- the relay. And should have accounted the local LSN. Otherwise will see the +-- XlogGapError. +assert(not test_run:grep_log('default', 'XlogGapError', 1000)) + | --- + | - true + | ... +s:drop() + | --- + | ... + +test_run:cmd('stop server replica') + | --- + | - true + | ... +test_run:cmd('delete server replica') + | --- + | - true + | ... +box.cfg{ \ + election_mode = old_election_mode, \ + replication_timeout = old_replication_timeout, \ +} + | --- + | ... +box.schema.user.revoke('guest', 'super') + | --- + | ... diff --git a/test/replication/gh-5433-election-restart-recovery.test.lua b/test/replication/gh-5433-election-restart-recovery.test.lua new file mode 100644 index 000000000..0339a5504 --- /dev/null +++ b/test/replication/gh-5433-election-restart-recovery.test.lua @@ -0,0 +1,59 @@ +test_run = require('test_run').new() +box.schema.user.grant('guest', 'super') + +old_election_mode = box.cfg.election_mode +old_replication_timeout = box.cfg.replication_timeout + +test_run:cmd('create server replica with rpl_master=default,\ + script="replication/replica.lua"') +test_run:cmd('start server replica with wait=True, wait_load=True') + +-- +-- gh-5433: leader, when elected, should resend rows to replicas starting from +-- the last acked vclock. Because they ignored all what was sent afterwards as +-- it didn't come from a leader. +-- +-- There was a bug, that the local LSN wasn't passed to the restarted recovery +-- iterator, and it tried to find not existing files. This caused a reconnect +-- and a scary error message. +-- + +-- Increase local LSN so as without it being accounted during recovery cursor +-- restart it would fail, as it won't find the xlogs with first local LSN. Do +-- snapshots to mark the old files for drop. +s = box.schema.create_space('test', {is_local = true}) +_ = s:create_index('pk') +for i = 1, 10 do s:replace({i}) end +box.snapshot() +for i = 11, 20 do s:replace({i}) end +box.snapshot() + +-- The log may contain old irrelevant records. Add some garbage and in the next +-- lines check for not more than the added garbage. So the old text won't be +-- touched. +require('log').info(string.rep('a', 1000)) + +-- Small timeout to speed up the election. +box.cfg{ \ + replication_timeout = 0.1, \ + election_mode = 'candidate', \ +} +test_run:wait_cond(function() return box.info.election.state == 'leader' end) + +test_run:switch('replica') +test_run:wait_cond(function() return box.info.election.leader ~= 0 end) + +test_run:switch('default') +-- When leader was elected, it should have restarted the recovery iterator in +-- the relay. And should have accounted the local LSN. Otherwise will see the +-- XlogGapError. +assert(not test_run:grep_log('default', 'XlogGapError', 1000)) +s:drop() + +test_run:cmd('stop server replica') +test_run:cmd('delete server replica') +box.cfg{ \ + election_mode = old_election_mode, \ + replication_timeout = old_replication_timeout, \ +} +box.schema.user.revoke('guest', 'super') diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg index 766f276a2..8fd62fdb8 100644 --- a/test/replication/suite.cfg +++ b/test/replication/suite.cfg @@ -15,6 +15,7 @@ "gh-4399-misc-no-failure-on-error-reading-wal.test.lua": {}, "gh-4424-misc-orphan-on-reconfiguration-error.test.lua": {}, "gh-5426-election-on-off.test.lua": {}, + "gh-5433-election-restart-recovery.test.lua": {}, "once.test.lua": {}, "on_replace.test.lua": {}, "status.test.lua": {}, -- 2.21.1 (Apple Git-122.3)
next prev parent reply other threads:[~2020-10-17 17:18 UTC|newest] Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-10-17 17:17 [Tarantool-patches] [PATCH 0/3] Raft on leader election " Vladislav Shpilevoy 2020-10-17 17:17 ` [Tarantool-patches] [PATCH 1/3] raft: send state to new subscribers if Raft worked Vladislav Shpilevoy 2020-10-20 20:43 ` Vladislav Shpilevoy 2020-10-21 11:41 ` Serge Petrenko 2020-10-21 21:41 ` Vladislav Shpilevoy 2020-10-22 8:53 ` Alexander V. Tikhonov 2020-10-17 17:17 ` Vladislav Shpilevoy [this message] 2020-10-17 17:17 ` [Tarantool-patches] [PATCH 3/3] raft: don't drop GC when restart relay recovery Vladislav Shpilevoy 2020-10-19 9:36 ` [Tarantool-patches] [PATCH 0/3] Raft on leader election recovery restart Serge Petrenko 2020-10-19 20:26 ` Vladislav Shpilevoy 2020-10-20 8:18 ` Serge Petrenko 2020-10-22 8:55 ` Alexander V. Tikhonov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=8d052e2a773dace9788603a4d0f85e89dfa99b5e.1602954690.git.v.shpilevoy@tarantool.org \ --to=v.shpilevoy@tarantool.org \ --cc=sergepetrenko@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH 2/3] raft: use local LSN in relay recovery restart' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox