From mboxrd@z Thu Jan  1 00:00:00 1970
From: Vladislav Shpilevoy
Date: Sat, 17 Oct 2020 19:17:56 +0200
Message-Id: <8d052e2a773dace9788603a4d0f85e89dfa99b5e.1602954690.git.v.shpilevoy@tarantool.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH 2/3] raft: use local LSN in relay recovery restart
List-Id: Tarantool development patches
To: tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org

When a Raft node is elected as a leader, it should resend all its data
to the followers starting from the last acked vclock, because while the
node was not a leader, the other instances ignored all changes coming
from it.

The resending is done by restarting the recovery cursor in the relay
thread. When the cursor was restarted, it used the last acked vclock to
find the needed xlog file, but it did not set the local LSN component,
which was 0 in the acked vclock (replicas don't send it). When the real
component was not zero, the recovery cursor tried to find the oldest
xlog file containing the first local row, and it couldn't: the xlog
with the first local row may have been deleted long ago.

The patch makes the restart keep the local LSN component unchanged: the
same value the previous recovery cursor used before the restart.

Closes #5433
---
 src/box/relay.cc                              |  20 ++-
 .../gh-5433-election-restart-recovery.result  | 119 ++++++++++++++++++
 ...gh-5433-election-restart-recovery.test.lua |  59 +++++++++
 test/replication/suite.cfg                    |   1 +
 4 files changed, 198 insertions(+), 1 deletion(-)
 create mode 100644 test/replication/gh-5433-election-restart-recovery.result
 create mode 100644 test/replication/gh-5433-election-restart-recovery.test.lua

diff --git a/src/box/relay.cc b/src/box/relay.cc
index 096f455a1..285f96108 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -864,8 +864,26 @@ relay_send_initial_join_row(struct xstream *stream, struct xrow_header *row)
 
 static void
 relay_restart_recovery(struct relay *relay)
 {
+	/*
+	 * Need to keep the 0 component unchanged. The remote vclock has it
+	 * unset, because it is a local LSN, and it is not sent between the
+	 * instances. Nonetheless, the recovery procedures still need the local
+	 * LSN set to find a proper xlog to start from. If it is not set, the
+	 * recovery process will treat it as 0, and will try to find an xlog
+	 * file having the first local row. Which of course may not exist for a
+	 * long time already. And then it will fail with an xlog gap error.
+	 *
+	 * The currently used local LSN is ok to use, even if it is outdated a
+	 * bit. Because the existing recovery object keeps the current xlog file
+	 * not deleted thanks to xlog GC subsystem. So by using the current
+	 * local LSN and the last confirmed remote LSNs we can be sure such an
+	 * xlog file still exists.
+	 */
+	struct vclock restart_vclock;
+	vclock_copy(&restart_vclock, &relay->recv_vclock);
+	vclock_reset(&restart_vclock, 0, vclock_get(&relay->r->vclock, 0));
 	recovery_delete(relay->r);
-	relay->r = recovery_new(wal_dir(), false, &relay->recv_vclock);
+	relay->r = recovery_new(wal_dir(), false, &restart_vclock);
 	recover_remaining_wals(relay->r, &relay->stream, NULL, true);
 }
 
diff --git a/test/replication/gh-5433-election-restart-recovery.result b/test/replication/gh-5433-election-restart-recovery.result
new file mode 100644
index 000000000..a8d7893bd
--- /dev/null
+++ b/test/replication/gh-5433-election-restart-recovery.result
@@ -0,0 +1,119 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+box.schema.user.grant('guest', 'super')
+ | ---
+ | ...
+
+old_election_mode = box.cfg.election_mode
+ | ---
+ | ...
+old_replication_timeout = box.cfg.replication_timeout
+ | ---
+ | ...
+
+test_run:cmd('create server replica with rpl_master=default,\
+             script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+--
+-- gh-5433: leader, when elected, should resend rows to replicas starting from
+-- the last acked vclock. Because they ignored all what was sent afterwards as
+-- it didn't come from a leader.
+--
+-- There was a bug, that the local LSN wasn't passed to the restarted recovery
+-- iterator, and it tried to find not existing files. This caused a reconnect
+-- and a scary error message.
+--
+
+-- Increase local LSN so as without it being accounted during recovery cursor
+-- restart it would fail, as it won't find the xlogs with first local LSN. Do
+-- snapshots to mark the old files for drop.
+s = box.schema.create_space('test', {is_local = true})
+ | ---
+ | ...
+_ = s:create_index('pk')
+ | ---
+ | ...
+for i = 1, 10 do s:replace({i}) end
+ | ---
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+for i = 11, 20 do s:replace({i}) end
+ | ---
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+
+-- The log may contain old irrelevant records. Add some garbage and in the next
+-- lines check for not more than the added garbage. So the old text won't be
+-- touched.
+require('log').info(string.rep('a', 1000))
+ | ---
+ | ...
+
+-- Small timeout to speed up the election.
+box.cfg{                             \
+    replication_timeout = 0.1,       \
+    election_mode = 'candidate',     \
+}
+ | ---
+ | ...
+test_run:wait_cond(function() return box.info.election.state == 'leader' end)
+ | ---
+ | - true
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.info.election.leader ~= 0 end)
+ | ---
+ | - true
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+-- When leader was elected, it should have restarted the recovery iterator in
+-- the relay. And should have accounted the local LSN. Otherwise will see the
+-- XlogGapError.
+assert(not test_run:grep_log('default', 'XlogGapError', 1000))
+ | ---
+ | - true
+ | ...
+s:drop()
+ | ---
+ | ...
+
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+box.cfg{                                            \
+    election_mode = old_election_mode,              \
+    replication_timeout = old_replication_timeout,  \
+}
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'super')
+ | ---
+ | ...
diff --git a/test/replication/gh-5433-election-restart-recovery.test.lua b/test/replication/gh-5433-election-restart-recovery.test.lua
new file mode 100644
index 000000000..0339a5504
--- /dev/null
+++ b/test/replication/gh-5433-election-restart-recovery.test.lua
@@ -0,0 +1,59 @@
+test_run = require('test_run').new()
+box.schema.user.grant('guest', 'super')
+
+old_election_mode = box.cfg.election_mode
+old_replication_timeout = box.cfg.replication_timeout
+
+test_run:cmd('create server replica with rpl_master=default,\
+             script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+--
+-- gh-5433: leader, when elected, should resend rows to replicas starting from
+-- the last acked vclock. Because they ignored all what was sent afterwards as
+-- it didn't come from a leader.
+--
+-- There was a bug, that the local LSN wasn't passed to the restarted recovery
+-- iterator, and it tried to find not existing files. This caused a reconnect
+-- and a scary error message.
+--
+
+-- Increase local LSN so as without it being accounted during recovery cursor
+-- restart it would fail, as it won't find the xlogs with first local LSN. Do
+-- snapshots to mark the old files for drop.
+s = box.schema.create_space('test', {is_local = true})
+_ = s:create_index('pk')
+for i = 1, 10 do s:replace({i}) end
+box.snapshot()
+for i = 11, 20 do s:replace({i}) end
+box.snapshot()
+
+-- The log may contain old irrelevant records. Add some garbage and in the next
+-- lines check for not more than the added garbage. So the old text won't be
+-- touched.
+require('log').info(string.rep('a', 1000))
+
+-- Small timeout to speed up the election.
+box.cfg{                             \
+    replication_timeout = 0.1,       \
+    election_mode = 'candidate',     \
+}
+test_run:wait_cond(function() return box.info.election.state == 'leader' end)
+
+test_run:switch('replica')
+test_run:wait_cond(function() return box.info.election.leader ~= 0 end)
+
+test_run:switch('default')
+-- When leader was elected, it should have restarted the recovery iterator in
+-- the relay. And should have accounted the local LSN. Otherwise will see the
+-- XlogGapError.
+assert(not test_run:grep_log('default', 'XlogGapError', 1000))
+s:drop()
+
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+box.cfg{                                            \
+    election_mode = old_election_mode,              \
+    replication_timeout = old_replication_timeout,  \
+}
+box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 766f276a2..8fd62fdb8 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -15,6 +15,7 @@
     "gh-4399-misc-no-failure-on-error-reading-wal.test.lua": {},
     "gh-4424-misc-orphan-on-reconfiguration-error.test.lua": {},
     "gh-5426-election-on-off.test.lua": {},
+    "gh-5433-election-restart-recovery.test.lua": {},
     "once.test.lua": {},
     "on_replace.test.lua": {},
     "status.test.lua": {},
-- 
2.21.1 (Apple Git-122.3)
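
For context, the "local LSN" discussed above is component 0 of the vclock: it
grows only from writes to local (non-replicated) spaces and never appears in
replica acks, which is why it has to be taken from the relay's own recovery
vclock rather than from relay->recv_vclock. A minimal sketch of observing it
on a running Tarantool instance (not part of the patch; the space name is
arbitrary and the exact numbers depend on the instance's history):

    -- Writes to a local space bump only component 0 of the vclock.
    s = box.schema.create_space('loc', {is_local = true})
    s:create_index('pk')
    s:replace{1}
    box.info.vclock -- e.g. {0: 1, 1: N}; component 0 is the local LSN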