[Tarantool-patches] [PATCH 2/3] raft: use local LSN in relay recovery restart

Tarantool development patches archive
 help / color / mirror / Atom feed

From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org
Subject: [Tarantool-patches] [PATCH 2/3] raft: use local LSN in relay recovery restart
Date: Sat, 17 Oct 2020 19:17:56 +0200	[thread overview]
Message-ID: <8d052e2a773dace9788603a4d0f85e89dfa99b5e.1602954690.git.v.shpilevoy@tarantool.org> (raw)
In-Reply-To: <cover.1602954690.git.v.shpilevoy@tarantool.org>

When a Raft node is elected as a leader, it should resend all its
data to the followers from the last acked vclock. Because while
the node was not a leader, the other instances ignored all the
changes from it.

The resending is done via restart of the recovery cursor in the
relay thread. When the cursor was restarted, it used the last
acked vclock to find the needed xlog file. But it didn't set the
local LSN component, which was 0 (replicas don't send it).

When in reality the component was not zero, the recovery cursor
still tried to find the oldest xlog file having the first local
row. And it couldn't. The first created local row may be gone long
time ago.

The patch makes the restart keep the local LSN component
unchanged, as it was used by the previous recovery cursor, before
the restart.

Closes #5433
---
 src/box/relay.cc                              |  20 ++-
 .../gh-5433-election-restart-recovery.result  | 119 ++++++++++++++++++
 ...gh-5433-election-restart-recovery.test.lua |  59 +++++++++
 test/replication/suite.cfg                    |   1 +
 4 files changed, 198 insertions(+), 1 deletion(-)
 create mode 100644 test/replication/gh-5433-election-restart-recovery.result
 create mode 100644 test/replication/gh-5433-election-restart-recovery.test.lua

diff --git a/src/box/relay.cc b/src/box/relay.cc
index 096f455a1..285f96108 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -864,8 +864,26 @@ relay_send_initial_join_row(struct xstream *stream, struct xrow_header *row)
 static void
 relay_restart_recovery(struct relay *relay)
 {
+	/*
+	 * Need to keep the 0 component unchanged. The remote vclock has it
+	 * unset, because it is a local LSN, and it is not sent between the
+	 * instances. Nonetheless, the recovery procedures still need the local
+	 * LSN set to find a proper xlog to start from. If it is not set, the
+	 * recovery process will treat it as 0, and will try to find an xlog
+	 * file having the first local row. Which of course may not exist for a
+	 * long time already. And then it will fail with an xlog gap error.
+	 *
+	 * The currently used local LSN is ok to use, even if it is outdated a
+	 * bit. Because the existing recovery object keeps the current xlog file
+	 * not deleted thanks to xlog GC subsystem. So by using the current
+	 * local LSN and the last confirmed remote LSNs we can be sure such an
+	 * xlog file still exists.
+	 */
+	struct vclock restart_vclock;
+	vclock_copy(&restart_vclock, &relay->recv_vclock);
+	vclock_reset(&restart_vclock, 0, vclock_get(&relay->r->vclock, 0));
 	recovery_delete(relay->r);
-	relay->r = recovery_new(wal_dir(), false, &relay->recv_vclock);
+	relay->r = recovery_new(wal_dir(), false, &restart_vclock);
 	recover_remaining_wals(relay->r, &relay->stream, NULL, true);
 }
 
diff --git a/test/replication/gh-5433-election-restart-recovery.result b/test/replication/gh-5433-election-restart-recovery.result
new file mode 100644
index 000000000..a8d7893bd
--- /dev/null
+++ b/test/replication/gh-5433-election-restart-recovery.result
@@ -0,0 +1,119 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+box.schema.user.grant('guest', 'super')
+ | ---
+ | ...
+
+old_election_mode = box.cfg.election_mode
+ | ---
+ | ...
+old_replication_timeout = box.cfg.replication_timeout
+ | ---
+ | ...
+
+test_run:cmd('create server replica with rpl_master=default,\
+              script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+--
+-- gh-5433: leader, when elected, should resend rows to replicas starting from
+-- the last acked vclock. Because they ignored all what was sent afterwards as
+-- it didn't come from a leader.
+--
+-- There was a bug, that the local LSN wasn't passed to the restarted recovery
+-- iterator, and it tried to find not existing files. This caused a reconnect
+-- and a scary error message.
+--
+
+-- Increase local LSN so as without it being accounted during recovery cursor
+-- restart it would fail, as it won't find the xlogs with first local LSN. Do
+-- snapshots to mark the old files for drop.
+s = box.schema.create_space('test', {is_local = true})
+ | ---
+ | ...
+_ = s:create_index('pk')
+ | ---
+ | ...
+for i = 1, 10 do s:replace({i}) end
+ | ---
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+for i = 11, 20 do s:replace({i}) end
+ | ---
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+
+-- The log may contain old irrelevant records. Add some garbage and in the next
+-- lines check for not more than the added garbage. So the old text won't be
+-- touched.
+require('log').info(string.rep('a', 1000))
+ | ---
+ | ...
+
+-- Small timeout to speed up the election.
+box.cfg{                                                                        \
+    replication_timeout = 0.1,                                                  \
+    election_mode = 'candidate',                                                \
+}
+ | ---
+ | ...
+test_run:wait_cond(function() return box.info.election.state == 'leader' end)
+ | ---
+ | - true
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.info.election.leader ~= 0 end)
+ | ---
+ | - true
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+-- When leader was elected, it should have restarted the recovery iterator in
+-- the relay. And should have accounted the local LSN. Otherwise will see the
+-- XlogGapError.
+assert(not test_run:grep_log('default', 'XlogGapError', 1000))
+ | ---
+ | - true
+ | ...
+s:drop()
+ | ---
+ | ...
+
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+box.cfg{                                                                        \
+    election_mode = old_election_mode,                                          \
+    replication_timeout = old_replication_timeout,                              \
+}
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'super')
+ | ---
+ | ...
diff --git a/test/replication/gh-5433-election-restart-recovery.test.lua b/test/replication/gh-5433-election-restart-recovery.test.lua
new file mode 100644
index 000000000..0339a5504
--- /dev/null
+++ b/test/replication/gh-5433-election-restart-recovery.test.lua
@@ -0,0 +1,59 @@
+test_run = require('test_run').new()
+box.schema.user.grant('guest', 'super')
+
+old_election_mode = box.cfg.election_mode
+old_replication_timeout = box.cfg.replication_timeout
+
+test_run:cmd('create server replica with rpl_master=default,\
+              script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+--
+-- gh-5433: leader, when elected, should resend rows to replicas starting from
+-- the last acked vclock. Because they ignored all what was sent afterwards as
+-- it didn't come from a leader.
+--
+-- There was a bug, that the local LSN wasn't passed to the restarted recovery
+-- iterator, and it tried to find not existing files. This caused a reconnect
+-- and a scary error message.
+--
+
+-- Increase local LSN so as without it being accounted during recovery cursor
+-- restart it would fail, as it won't find the xlogs with first local LSN. Do
+-- snapshots to mark the old files for drop.
+s = box.schema.create_space('test', {is_local = true})
+_ = s:create_index('pk')
+for i = 1, 10 do s:replace({i}) end
+box.snapshot()
+for i = 11, 20 do s:replace({i}) end
+box.snapshot()
+
+-- The log may contain old irrelevant records. Add some garbage and in the next
+-- lines check for not more than the added garbage. So the old text won't be
+-- touched.
+require('log').info(string.rep('a', 1000))
+
+-- Small timeout to speed up the election.
+box.cfg{                                                                        \
+    replication_timeout = 0.1,                                                  \
+    election_mode = 'candidate',                                                \
+}
+test_run:wait_cond(function() return box.info.election.state == 'leader' end)
+
+test_run:switch('replica')
+test_run:wait_cond(function() return box.info.election.leader ~= 0 end)
+
+test_run:switch('default')
+-- When leader was elected, it should have restarted the recovery iterator in
+-- the relay. And should have accounted the local LSN. Otherwise will see the
+-- XlogGapError.
+assert(not test_run:grep_log('default', 'XlogGapError', 1000))
+s:drop()
+
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+box.cfg{                                                                        \
+    election_mode = old_election_mode,                                          \
+    replication_timeout = old_replication_timeout,                              \
+}
+box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 766f276a2..8fd62fdb8 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -15,6 +15,7 @@
     "gh-4399-misc-no-failure-on-error-reading-wal.test.lua": {},
     "gh-4424-misc-orphan-on-reconfiguration-error.test.lua": {},
     "gh-5426-election-on-off.test.lua": {},
+    "gh-5433-election-restart-recovery.test.lua": {},
     "once.test.lua": {},
     "on_replace.test.lua": {},
     "status.test.lua": {},
-- 
2.21.1 (Apple Git-122.3)

next prev parent reply	other threads:[~2020-10-17 17:18 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-17 17:17 [Tarantool-patches] [PATCH 0/3] Raft on leader election " Vladislav Shpilevoy
2020-10-17 17:17 ` [Tarantool-patches] [PATCH 1/3] raft: send state to new subscribers if Raft worked Vladislav Shpilevoy
2020-10-20 20:43   ` Vladislav Shpilevoy
2020-10-21 11:41     ` Serge Petrenko
2020-10-21 21:41       ` Vladislav Shpilevoy
2020-10-22  8:53         ` Alexander V. Tikhonov
2020-10-17 17:17 ` Vladislav Shpilevoy [this message]
2020-10-17 17:17 ` [Tarantool-patches] [PATCH 3/3] raft: don't drop GC when restart relay recovery Vladislav Shpilevoy
2020-10-19  9:36 ` [Tarantool-patches] [PATCH 0/3] Raft on leader election recovery restart Serge Petrenko
2020-10-19 20:26   ` Vladislav Shpilevoy
2020-10-20  8:18     ` Serge Petrenko
2020-10-22  8:55 ` Alexander V. Tikhonov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8d052e2a773dace9788603a4d0f85e89dfa99b5e.1602954690.git.v.shpilevoy@tarantool.org \
    --to=v.shpilevoy@tarantool.org \
    --cc=sergepetrenko@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 2/3] raft: use local LSN in relay recovery restart' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox