* [Tarantool-patches] [PATCH 0/2] Raft crash on re-enablence
@ 2020-11-07 16:45 Vladislav Shpilevoy
2020-11-07 16:45 ` [Tarantool-patches] [PATCH 1/2] raft: fix crash on sm restart during WAL write Vladislav Shpilevoy
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-07 16:45 UTC (permalink / raw)
To: tarantool-patches, sergepetrenko, avtikhon
The patchset fixes 2 crashes related to Raft WAL write being in progress,
when the state machine is restarted, and when it is configured to be a candidate
with a known leader.
Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5506-raft-crash
Issue: https://github.com/tarantool/tarantool/issues/5506
@ChangeLog
* Fixed a couple of crashes on various tweaks of election mode (gh-5506).
Vladislav Shpilevoy (2):
raft: fix crash on sm restart during WAL write
raft: fix crash on candidate cfg during WAL write
src/box/raft.c | 23 ++-
.../gh-5506-election-on-off.result | 140 ++++++++++++++++++
.../gh-5506-election-on-off.test.lua | 68 +++++++++
test/replication/suite.cfg | 1 +
test/replication/suite.ini | 2 +-
5 files changed, 225 insertions(+), 9 deletions(-)
create mode 100644 test/replication/gh-5506-election-on-off.result
create mode 100644 test/replication/gh-5506-election-on-off.test.lua
--
2.21.1 (Apple Git-122.3)
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Tarantool-patches] [PATCH 1/2] raft: fix crash on sm restart during WAL write
2020-11-07 16:45 [Tarantool-patches] [PATCH 0/2] Raft crash on re-enablence Vladislav Shpilevoy
@ 2020-11-07 16:45 ` Vladislav Shpilevoy
2020-11-09 10:16 ` Serge Petrenko
2020-11-07 16:45 ` [Tarantool-patches] [PATCH 2/2] raft: fix crash on candidate cfg " Vladislav Shpilevoy
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-07 16:45 UTC (permalink / raw)
To: tarantool-patches, sergepetrenko, avtikhon
Raft state machine crashed if was restarted during a WAL write
being in progress. When the machine was started, it didn't assume
there still can be a not finished WAL write from the time it was
enabled earlier.
The patch makes it continue waiting for the write end.
Part of #5506
---
src/box/raft.c | 13 ++++-
.../gh-5506-election-on-off.result | 55 +++++++++++++++++++
.../gh-5506-election-on-off.test.lua | 31 +++++++++++
test/replication/suite.cfg | 1 +
test/replication/suite.ini | 2 +-
5 files changed, 98 insertions(+), 4 deletions(-)
create mode 100644 test/replication/gh-5506-election-on-off.result
create mode 100644 test/replication/gh-5506-election-on-off.test.lua
diff --git a/src/box/raft.c b/src/box/raft.c
index 914b0d68f..3a99a0f26 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -857,13 +857,20 @@ raft_sm_start(void)
{
say_info("RAFT: start state machine");
assert(!ev_is_active(&raft.timer));
- assert(!raft.is_write_in_progress);
assert(!raft.is_enabled);
assert(raft.state == RAFT_STATE_FOLLOWER);
raft.is_enabled = true;
raft.is_candidate = raft.is_cfg_candidate;
- if (!raft.is_candidate) {
- /* Nop. */;
+ if (raft.is_write_in_progress) {
+ /*
+ * Nop. If write is in progress, the state machine is frozen. It
+ * is continued when write ends.
+ */
+ } else if (!raft.is_candidate) {
+ /*
+ * Nop. When a node is not a candidate, it can't initiate
+ * elections anyway, so it does not need to monitor the leader.
+ */
} else if (raft.leader != 0) {
raft_sm_wait_leader_dead();
} else {
diff --git a/test/replication/gh-5506-election-on-off.result b/test/replication/gh-5506-election-on-off.result
new file mode 100644
index 000000000..1a718396f
--- /dev/null
+++ b/test/replication/gh-5506-election-on-off.result
@@ -0,0 +1,55 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+old_election_mode = box.cfg.election_mode
+ | ---
+ | ...
+old_replication_timeout = box.cfg.replication_timeout
+ | ---
+ | ...
+
+--
+-- gh-5506: Raft state machine crashed in case there was a WAL write in
+-- progress, and Raft was disabled + enabled back immediately. It didn't assume
+-- that there can be a not finished WAL write when Raft is just enabled.
+--
+
+-- Start a WAL write and wait when it starts.
+box.error.injection.set("ERRINJ_WAL_DELAY_COUNTDOWN", 0)
+ | ---
+ | - ok
+ | ...
+box.cfg{ \
+ election_mode = 'candidate', \
+ replication_timeout = 0.1, \
+}
+ | ---
+ | ...
+test_run:wait_cond(function() \
+ return box.error.injection.get("ERRINJ_WAL_DELAY") \
+end)
+ | ---
+ | - true
+ | ...
+
+-- Restart the state machine. It should notice the not finished WAL write and
+-- continue waiting.
+box.cfg{election_mode = 'off'}
+ | ---
+ | ...
+box.cfg{election_mode = 'candidate'}
+ | ---
+ | ...
+box.error.injection.set("ERRINJ_WAL_DELAY", false)
+ | ---
+ | - ok
+ | ...
+
+box.cfg{ \
+ election_mode = old_election_mode, \
+ replication_timeout = old_replication_timeout, \
+}
+ | ---
+ | ...
diff --git a/test/replication/gh-5506-election-on-off.test.lua b/test/replication/gh-5506-election-on-off.test.lua
new file mode 100644
index 000000000..290408f06
--- /dev/null
+++ b/test/replication/gh-5506-election-on-off.test.lua
@@ -0,0 +1,31 @@
+test_run = require('test_run').new()
+
+old_election_mode = box.cfg.election_mode
+old_replication_timeout = box.cfg.replication_timeout
+
+--
+-- gh-5506: Raft state machine crashed in case there was a WAL write in
+-- progress, and Raft was disabled + enabled back immediately. It didn't assume
+-- that there can be a not finished WAL write when Raft is just enabled.
+--
+
+-- Start a WAL write and wait when it starts.
+box.error.injection.set("ERRINJ_WAL_DELAY_COUNTDOWN", 0)
+box.cfg{ \
+ election_mode = 'candidate', \
+ replication_timeout = 0.1, \
+}
+test_run:wait_cond(function() \
+ return box.error.injection.get("ERRINJ_WAL_DELAY") \
+end)
+
+-- Restart the state machine. It should notice the not finished WAL write and
+-- continue waiting.
+box.cfg{election_mode = 'off'}
+box.cfg{election_mode = 'candidate'}
+box.error.injection.set("ERRINJ_WAL_DELAY", false)
+
+box.cfg{ \
+ election_mode = old_election_mode, \
+ replication_timeout = old_replication_timeout, \
+}
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 8fd62fdb8..f2addebda 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -16,6 +16,7 @@
"gh-4424-misc-orphan-on-reconfiguration-error.test.lua": {},
"gh-5426-election-on-off.test.lua": {},
"gh-5433-election-restart-recovery.test.lua": {},
+ "gh-5506-election-on-off.test.lua": {},
"once.test.lua": {},
"on_replace.test.lua": {},
"status.test.lua": {},
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 6136c934f..34ee32550 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -3,7 +3,7 @@ core = tarantool
script = master.lua
description = tarantool/box, replication
disabled = consistent.test.lua
-release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua
+release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5506-election-on-off.test.lua
config = suite.cfg
lua_libs = lua/fast_replica.lua lua/rlimit.lua
use_unix_sockets = True
--
2.21.1 (Apple Git-122.3)
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Tarantool-patches] [PATCH 2/2] raft: fix crash on candidate cfg during WAL write
2020-11-07 16:45 [Tarantool-patches] [PATCH 0/2] Raft crash on re-enablence Vladislav Shpilevoy
2020-11-07 16:45 ` [Tarantool-patches] [PATCH 1/2] raft: fix crash on sm restart during WAL write Vladislav Shpilevoy
@ 2020-11-07 16:45 ` Vladislav Shpilevoy
2020-11-09 10:19 ` Serge Petrenko
2020-11-10 21:09 ` [Tarantool-patches] [PATCH 0/2] Raft crash on re-enablence Alexander V. Tikhonov
2020-11-10 22:05 ` Vladislav Shpilevoy
3 siblings, 1 reply; 9+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-07 16:45 UTC (permalink / raw)
To: tarantool-patches, sergepetrenko, avtikhon
Raft state machine crashed if it was configured to be a candidate
during a WAL write with a known leader.
It tried to start waiting for the leader death, but should have
waited for the WAL write end first.
The code tried to handle it, but the order of 'if' conditions was
wrong. WAL write being in progress was checked last, but should
have been checked first.
Closes #5506
---
src/box/raft.c | 10 +--
.../gh-5506-election-on-off.result | 85 +++++++++++++++++++
.../gh-5506-election-on-off.test.lua | 37 ++++++++
3 files changed, 127 insertions(+), 5 deletions(-)
diff --git a/src/box/raft.c b/src/box/raft.c
index 3a99a0f26..8f8b59ba6 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -960,17 +960,17 @@ raft_cfg_is_candidate(bool is_candidate)
if (raft.is_candidate) {
assert(raft.state == RAFT_STATE_FOLLOWER);
- if (raft.leader != 0) {
- raft_sm_wait_leader_dead();
- } else if (raft_is_fully_on_disk()) {
- raft_sm_wait_leader_found();
- } else {
+ if (raft.is_write_in_progress) {
/*
* If there is an on-going WAL write, it means there was
* some node who sent newer data to this node. So it is
* probably a better candidate. Anyway can't do anything
* until the new state is fully persisted.
*/
+ } else if (raft.leader != 0) {
+ raft_sm_wait_leader_dead();
+ } else {
+ raft_sm_wait_leader_found();
}
} else {
if (raft.state != RAFT_STATE_LEADER) {
diff --git a/test/replication/gh-5506-election-on-off.result b/test/replication/gh-5506-election-on-off.result
index 1a718396f..4fbc31986 100644
--- a/test/replication/gh-5506-election-on-off.result
+++ b/test/replication/gh-5506-election-on-off.result
@@ -47,6 +47,91 @@ box.error.injection.set("ERRINJ_WAL_DELAY", false)
| - ok
| ...
+box.cfg{ \
+ election_mode = old_election_mode, \
+}
+ | ---
+ | ...
+
+--
+-- Another crash could happen when election mode was configured to be
+-- 'candidate' with a known leader, but there was a not finished WAL write.
+-- The node tried to start waiting for the leader death, even though with an
+-- active WAL write it should wait for its end first.
+--
+box.schema.user.grant('guest', 'super')
+ | ---
+ | ...
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.cfg{election_mode = 'voter'}
+ | ---
+ | ...
+box.error.injection.set("ERRINJ_WAL_DELAY_COUNTDOWN", 0)
+ | ---
+ | - ok
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{election_mode = 'candidate'}
+ | ---
+ | ...
+test_run:wait_cond(function() \
+ return box.info.election.leader ~= 0 \
+end)
+ | ---
+ | - true
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() \
+ return box.error.injection.get("ERRINJ_WAL_DELAY") \
+end)
+ | ---
+ | - true
+ | ...
+box.cfg{election_mode = 'candidate'}
+ | ---
+ | ...
+box.error.injection.set("ERRINJ_WAL_DELAY", false)
+ | ---
+ | - ok
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+
+box.schema.user.revoke('guest', 'super')
+ | ---
+ | ...
box.cfg{ \
election_mode = old_election_mode, \
replication_timeout = old_replication_timeout, \
diff --git a/test/replication/gh-5506-election-on-off.test.lua b/test/replication/gh-5506-election-on-off.test.lua
index 290408f06..bb89477d1 100644
--- a/test/replication/gh-5506-election-on-off.test.lua
+++ b/test/replication/gh-5506-election-on-off.test.lua
@@ -25,6 +25,43 @@ box.cfg{election_mode = 'off'}
box.cfg{election_mode = 'candidate'}
box.error.injection.set("ERRINJ_WAL_DELAY", false)
+box.cfg{ \
+ election_mode = old_election_mode, \
+}
+
+--
+-- Another crash could happen when election mode was configured to be
+-- 'candidate' with a known leader, but there was a not finished WAL write.
+-- The node tried to start waiting for the leader death, even though with an
+-- active WAL write it should wait for its end first.
+--
+box.schema.user.grant('guest', 'super')
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+test_run:switch('replica')
+box.cfg{election_mode = 'voter'}
+box.error.injection.set("ERRINJ_WAL_DELAY_COUNTDOWN", 0)
+
+test_run:switch('default')
+box.cfg{election_mode = 'candidate'}
+test_run:wait_cond(function() \
+ return box.info.election.leader ~= 0 \
+end)
+
+test_run:switch('replica')
+test_run:wait_cond(function() \
+ return box.error.injection.get("ERRINJ_WAL_DELAY") \
+end)
+box.cfg{election_mode = 'candidate'}
+box.error.injection.set("ERRINJ_WAL_DELAY", false)
+
+test_run:switch('default')
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+
+box.schema.user.revoke('guest', 'super')
box.cfg{ \
election_mode = old_election_mode, \
replication_timeout = old_replication_timeout, \
--
2.21.1 (Apple Git-122.3)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] raft: fix crash on sm restart during WAL write
2020-11-07 16:45 ` [Tarantool-patches] [PATCH 1/2] raft: fix crash on sm restart during WAL write Vladislav Shpilevoy
@ 2020-11-09 10:16 ` Serge Petrenko
0 siblings, 0 replies; 9+ messages in thread
From: Serge Petrenko @ 2020-11-09 10:16 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, avtikhon
07.11.2020 19:45, Vladislav Shpilevoy пишет:
> Raft state machine crashed if was restarted during a WAL write
> being in progress. When the machine was started, it didn't assume
> there still can be a not finished WAL write from the time it was
> enabled earlier.
>
> The patch makes it continue waiting for the write end.
Hi! Thanks for the patch!
LGTM.
>
> Part of #5506
> ---
> src/box/raft.c | 13 ++++-
> .../gh-5506-election-on-off.result | 55 +++++++++++++++++++
> .../gh-5506-election-on-off.test.lua | 31 +++++++++++
> test/replication/suite.cfg | 1 +
> test/replication/suite.ini | 2 +-
> 5 files changed, 98 insertions(+), 4 deletions(-)
> create mode 100644 test/replication/gh-5506-election-on-off.result
> create mode 100644 test/replication/gh-5506-election-on-off.test.lua
>
> diff --git a/src/box/raft.c b/src/box/raft.c
> index 914b0d68f..3a99a0f26 100644
> --- a/src/box/raft.c
> +++ b/src/box/raft.c
> @@ -857,13 +857,20 @@ raft_sm_start(void)
> {
> say_info("RAFT: start state machine");
> assert(!ev_is_active(&raft.timer));
> - assert(!raft.is_write_in_progress);
> assert(!raft.is_enabled);
> assert(raft.state == RAFT_STATE_FOLLOWER);
> raft.is_enabled = true;
> raft.is_candidate = raft.is_cfg_candidate;
> - if (!raft.is_candidate) {
> - /* Nop. */;
> + if (raft.is_write_in_progress) {
> + /*
> + * Nop. If write is in progress, the state machine is frozen. It
> + * is continued when write ends.
> + */
> + } else if (!raft.is_candidate) {
> + /*
> + * Nop. When a node is not a candidate, it can't initiate
> + * elections anyway, so it does not need to monitor the leader.
> + */
> } else if (raft.leader != 0) {
> raft_sm_wait_leader_dead();
> } else {
> diff --git a/test/replication/gh-5506-election-on-off.result b/test/replication/gh-5506-election-on-off.result
> new file mode 100644
> index 000000000..1a718396f
> --- /dev/null
> +++ b/test/replication/gh-5506-election-on-off.result
> @@ -0,0 +1,55 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +old_election_mode = box.cfg.election_mode
> + | ---
> + | ...
> +old_replication_timeout = box.cfg.replication_timeout
> + | ---
> + | ...
> +
> +--
> +-- gh-5506: Raft state machine crashed in case there was a WAL write in
> +-- progress, and Raft was disabled + enabled back immediately. It didn't assume
> +-- that there can be a not finished WAL write when Raft is just enabled.
> +--
> +
> +-- Start a WAL write and wait when it starts.
> +box.error.injection.set("ERRINJ_WAL_DELAY_COUNTDOWN", 0)
> + | ---
> + | - ok
> + | ...
> +box.cfg{ \
> + election_mode = 'candidate', \
> + replication_timeout = 0.1, \
> +}
> + | ---
> + | ...
> +test_run:wait_cond(function() \
> + return box.error.injection.get("ERRINJ_WAL_DELAY") \
> +end)
> + | ---
> + | - true
> + | ...
> +
> +-- Restart the state machine. It should notice the not finished WAL write and
> +-- continue waiting.
> +box.cfg{election_mode = 'off'}
> + | ---
> + | ...
> +box.cfg{election_mode = 'candidate'}
> + | ---
> + | ...
> +box.error.injection.set("ERRINJ_WAL_DELAY", false)
> + | ---
> + | - ok
> + | ...
> +
> +box.cfg{ \
> + election_mode = old_election_mode, \
> + replication_timeout = old_replication_timeout, \
> +}
> + | ---
> + | ...
> diff --git a/test/replication/gh-5506-election-on-off.test.lua b/test/replication/gh-5506-election-on-off.test.lua
> new file mode 100644
> index 000000000..290408f06
> --- /dev/null
> +++ b/test/replication/gh-5506-election-on-off.test.lua
> @@ -0,0 +1,31 @@
> +test_run = require('test_run').new()
> +
> +old_election_mode = box.cfg.election_mode
> +old_replication_timeout = box.cfg.replication_timeout
> +
> +--
> +-- gh-5506: Raft state machine crashed in case there was a WAL write in
> +-- progress, and Raft was disabled + enabled back immediately. It didn't assume
> +-- that there can be a not finished WAL write when Raft is just enabled.
> +--
> +
> +-- Start a WAL write and wait when it starts.
> +box.error.injection.set("ERRINJ_WAL_DELAY_COUNTDOWN", 0)
> +box.cfg{ \
> + election_mode = 'candidate', \
> + replication_timeout = 0.1, \
> +}
> +test_run:wait_cond(function() \
> + return box.error.injection.get("ERRINJ_WAL_DELAY") \
> +end)
> +
> +-- Restart the state machine. It should notice the not finished WAL write and
> +-- continue waiting.
> +box.cfg{election_mode = 'off'}
> +box.cfg{election_mode = 'candidate'}
> +box.error.injection.set("ERRINJ_WAL_DELAY", false)
> +
> +box.cfg{ \
> + election_mode = old_election_mode, \
> + replication_timeout = old_replication_timeout, \
> +}
> diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
> index 8fd62fdb8..f2addebda 100644
> --- a/test/replication/suite.cfg
> +++ b/test/replication/suite.cfg
> @@ -16,6 +16,7 @@
> "gh-4424-misc-orphan-on-reconfiguration-error.test.lua": {},
> "gh-5426-election-on-off.test.lua": {},
> "gh-5433-election-restart-recovery.test.lua": {},
> + "gh-5506-election-on-off.test.lua": {},
> "once.test.lua": {},
> "on_replace.test.lua": {},
> "status.test.lua": {},
> diff --git a/test/replication/suite.ini b/test/replication/suite.ini
> index 6136c934f..34ee32550 100644
> --- a/test/replication/suite.ini
> +++ b/test/replication/suite.ini
> @@ -3,7 +3,7 @@ core = tarantool
> script = master.lua
> description = tarantool/box, replication
> disabled = consistent.test.lua
> -release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua
> +release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5506-election-on-off.test.lua
> config = suite.cfg
> lua_libs = lua/fast_replica.lua lua/rlimit.lua
> use_unix_sockets = True
--
Serge Petrenko
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] raft: fix crash on candidate cfg during WAL write
2020-11-07 16:45 ` [Tarantool-patches] [PATCH 2/2] raft: fix crash on candidate cfg " Vladislav Shpilevoy
@ 2020-11-09 10:19 ` Serge Petrenko
2020-11-09 22:42 ` Vladislav Shpilevoy
0 siblings, 1 reply; 9+ messages in thread
From: Serge Petrenko @ 2020-11-09 10:19 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, avtikhon
07.11.2020 19:45, Vladislav Shpilevoy пишет:
> Raft state machine crashed if it was configured to be a candidate
> during a WAL write with a known leader.
>
> It tried to start waiting for the leader death, but should have
> waited for the WAL write end first.
>
> The code tried to handle it, but the order of 'if' conditions was
> wrong. WAL write being in progress was checked last, but should
> have been checked first.
>
> Closes #5506
Hi! Thanks for the patch!
Please see some minor comments below.
> ---
> src/box/raft.c | 10 +--
> .../gh-5506-election-on-off.result | 85 +++++++++++++++++++
> .../gh-5506-election-on-off.test.lua | 37 ++++++++
> 3 files changed, 127 insertions(+), 5 deletions(-)
>
> diff --git a/src/box/raft.c b/src/box/raft.c
> index 3a99a0f26..8f8b59ba6 100644
> --- a/src/box/raft.c
> +++ b/src/box/raft.c
> @@ -960,17 +960,17 @@ raft_cfg_is_candidate(bool is_candidate)
>
> if (raft.is_candidate) {
> assert(raft.state == RAFT_STATE_FOLLOWER);
> - if (raft.leader != 0) {
> - raft_sm_wait_leader_dead();
> - } else if (raft_is_fully_on_disk()) {
> - raft_sm_wait_leader_found();
> - } else {
> + if (raft.is_write_in_progress) {
> /*
> * If there is an on-going WAL write, it means there was
> * some node who sent newer data to this node. So it is
> * probably a better candidate. Anyway can't do anything
> * until the new state is fully persisted.
> */
> + } else if (raft.leader != 0) {
> + raft_sm_wait_leader_dead();
> + } else {
> + raft_sm_wait_leader_found();
> }
> } else {
> if (raft.state != RAFT_STATE_LEADER) {
> diff --git a/test/replication/gh-5506-election-on-off.result b/test/replication/gh-5506-election-on-off.result
> index 1a718396f..4fbc31986 100644
> --- a/test/replication/gh-5506-election-on-off.result
> +++ b/test/replication/gh-5506-election-on-off.result
> @@ -47,6 +47,91 @@ box.error.injection.set("ERRINJ_WAL_DELAY", false)
> | - ok
> | ...
>
> +box.cfg{ \
> + election_mode = old_election_mode, \
> +}
> + | ---
> + | ...
> +
> +--
> +-- Another crash could happen when election mode was configured to be
> +-- 'candidate' with a known leader, but there was a not finished WAL write.
> +-- The node tried to start waiting for the leader death, even though with an
> +-- active WAL write it should wait for its end first.
> +--
> +box.schema.user.grant('guest', 'super')
box.schema.user.grant('guest', 'replication') is enough.
> + | ---
> + | ...
> +test_run:cmd('create server replica with rpl_master=default,\
> + script="replication/replica.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica with wait=True, wait_load=True')
> + | ---
> + | - true
> + | ...
> +
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +box.cfg{election_mode = 'voter'}
> + | ---
> + | ...
> +box.error.injection.set("ERRINJ_WAL_DELAY_COUNTDOWN", 0)
> + | ---
> + | - ok
> + | ...
> +
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.cfg{election_mode = 'candidate'}
> + | ---
> + | ...
> +test_run:wait_cond(function() \
> + return box.info.election.leader ~= 0 \
> +end)
Wouldn't it be simpler to wait for `box.info.election.state == 'leader'`?
> + | ---
> + | - true
> + | ...
> +
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +test_run:wait_cond(function() \
> + return box.error.injection.get("ERRINJ_WAL_DELAY") \
> +end)
> + | ---
> + | - true
> + | ...
> +box.cfg{election_mode = 'candidate'}
> + | ---
> + | ...
> +box.error.injection.set("ERRINJ_WAL_DELAY", false)
> + | ---
> + | - ok
> + | ...
> +
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server replica')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica')
> + | ---
> + | - true
> + | ...
> +
> +box.schema.user.revoke('guest', 'super')
> + | ---
> + | ...
> box.cfg{ \
> election_mode = old_election_mode, \
> replication_timeout = old_replication_timeout, \
> diff --git a/test/replication/gh-5506-election-on-off.test.lua b/test/replication/gh-5506-election-on-off.test.lua
> index 290408f06..bb89477d1 100644
> --- a/test/replication/gh-5506-election-on-off.test.lua
> +++ b/test/replication/gh-5506-election-on-off.test.lua
> @@ -25,6 +25,43 @@ box.cfg{election_mode = 'off'}
> box.cfg{election_mode = 'candidate'}
> box.error.injection.set("ERRINJ_WAL_DELAY", false)
>
> +box.cfg{ \
> + election_mode = old_election_mode, \
> +}
> +
> +--
> +-- Another crash could happen when election mode was configured to be
> +-- 'candidate' with a known leader, but there was a not finished WAL write.
> +-- The node tried to start waiting for the leader death, even though with an
> +-- active WAL write it should wait for its end first.
> +--
> +box.schema.user.grant('guest', 'super')
> +test_run:cmd('create server replica with rpl_master=default,\
> + script="replication/replica.lua"')
> +test_run:cmd('start server replica with wait=True, wait_load=True')
> +
> +test_run:switch('replica')
> +box.cfg{election_mode = 'voter'}
> +box.error.injection.set("ERRINJ_WAL_DELAY_COUNTDOWN", 0)
> +
> +test_run:switch('default')
> +box.cfg{election_mode = 'candidate'}
> +test_run:wait_cond(function() \
> + return box.info.election.leader ~= 0 \
> +end)
> +
> +test_run:switch('replica')
> +test_run:wait_cond(function() \
> + return box.error.injection.get("ERRINJ_WAL_DELAY") \
> +end)
> +box.cfg{election_mode = 'candidate'}
> +box.error.injection.set("ERRINJ_WAL_DELAY", false)
> +
> +test_run:switch('default')
> +test_run:cmd('stop server replica')
> +test_run:cmd('delete server replica')
> +
> +box.schema.user.revoke('guest', 'super')
> box.cfg{ \
> election_mode = old_election_mode, \
> replication_timeout = old_replication_timeout, \
--
Serge Petrenko
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] raft: fix crash on candidate cfg during WAL write
2020-11-09 10:19 ` Serge Petrenko
@ 2020-11-09 22:42 ` Vladislav Shpilevoy
2020-11-10 7:48 ` Serge Petrenko
0 siblings, 1 reply; 9+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-09 22:42 UTC (permalink / raw)
To: Serge Petrenko, tarantool-patches, avtikhon
Thanks for the review!
>> diff --git a/test/replication/gh-5506-election-on-off.result b/test/replication/gh-5506-election-on-off.result
>> index 1a718396f..4fbc31986 100644
>> --- a/test/replication/gh-5506-election-on-off.result
>> +++ b/test/replication/gh-5506-election-on-off.result
>> @@ -47,6 +47,91 @@ box.error.injection.set("ERRINJ_WAL_DELAY", false)
>> | - ok
>> | ...
>> +box.cfg{ \
>> + election_mode = old_election_mode, \
>> +}
>> + | ---
>> + | ...
>> +
>> +--
>> +-- Another crash could happen when election mode was configured to be
>> +-- 'candidate' with a known leader, but there was a not finished WAL write.
>> +-- The node tried to start waiting for the leader death, even though with an
>> +-- active WAL write it should wait for its end first.
>> +--
>> +box.schema.user.grant('guest', 'super')
>
>
> box.schema.user.grant('guest', 'replication') is enough.
I know. But I prefer using super, because it works always. I don't
want to think of the weakest possible rights which would fit. So
I usually add 'super' and forget about it.
>> + | ---
>> + | ...
>> +test_run:cmd('create server replica with rpl_master=default,\
>> + script="replication/replica.lua"')
>> + | ---
>> + | - true
>> + | ...
>> +test_run:cmd('start server replica with wait=True, wait_load=True')
>> + | ---
>> + | - true
>> + | ...
>> +
>> +test_run:switch('replica')
>> + | ---
>> + | - true
>> + | ...
>> +box.cfg{election_mode = 'voter'}
>> + | ---
>> + | ...
>> +box.error.injection.set("ERRINJ_WAL_DELAY_COUNTDOWN", 0)
>> + | ---
>> + | - ok
>> + | ...
>> +
>> +test_run:switch('default')
>> + | ---
>> + | - true
>> + | ...
>> +box.cfg{election_mode = 'candidate'}
>> + | ---
>> + | ...
>> +test_run:wait_cond(function() \
>> + return box.info.election.leader ~= 0 \
>> +end)
>
>
> Wouldn't it be simpler to wait for `box.info.election.state == 'leader'`?
I don't mind.
====================
diff --git a/test/replication/gh-5506-election-on-off.result b/test/replication/gh-5506-election-on-off.result
index 4fbc31986..b8abd7ecd 100644
--- a/test/replication/gh-5506-election-on-off.result
+++ b/test/replication/gh-5506-election-on-off.result
@@ -92,7 +92,7 @@ box.cfg{election_mode = 'candidate'}
| ---
| ...
test_run:wait_cond(function() \
- return box.info.election.leader ~= 0 \
+ return box.info.election.state == 'leader' \
end)
| ---
| - true
diff --git a/test/replication/gh-5506-election-on-off.test.lua b/test/replication/gh-5506-election-on-off.test.lua
index bb89477d1..476b00ec0 100644
--- a/test/replication/gh-5506-election-on-off.test.lua
+++ b/test/replication/gh-5506-election-on-off.test.lua
@@ -47,7 +47,7 @@ box.error.injection.set("ERRINJ_WAL_DELAY_COUNTDOWN", 0)
test_run:switch('default')
box.cfg{election_mode = 'candidate'}
test_run:wait_cond(function() \
- return box.info.election.leader ~= 0 \
+ return box.info.election.state == 'leader' \
end)
test_run:switch('replica')
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] raft: fix crash on candidate cfg during WAL write
2020-11-09 22:42 ` Vladislav Shpilevoy
@ 2020-11-10 7:48 ` Serge Petrenko
0 siblings, 0 replies; 9+ messages in thread
From: Serge Petrenko @ 2020-11-10 7:48 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, avtikhon
10.11.2020 01:42, Vladislav Shpilevoy пишет:
> Thanks for the review!
>
>>> diff --git a/test/replication/gh-5506-election-on-off.result b/test/replication/gh-5506-election-on-off.result
>>> index 1a718396f..4fbc31986 100644
>>> --- a/test/replication/gh-5506-election-on-off.result
>>> +++ b/test/replication/gh-5506-election-on-off.result
>>> @@ -47,6 +47,91 @@ box.error.injection.set("ERRINJ_WAL_DELAY", false)
>>> | - ok
>>> | ...
>>> +box.cfg{ \
>>> + election_mode = old_election_mode, \
>>> +}
>>> + | ---
>>> + | ...
>>> +
>>> +--
>>> +-- Another crash could happen when election mode was configured to be
>>> +-- 'candidate' with a known leader, but there was a not finished WAL write.
>>> +-- The node tried to start waiting for the leader death, even though with an
>>> +-- active WAL write it should wait for its end first.
>>> +--
>>> +box.schema.user.grant('guest', 'super')
>>
>> box.schema.user.grant('guest', 'replication') is enough.
> I know. But I prefer using super, because it works always. I don't
> want to think of the weakest possible rights which would fit. So
> I usually add 'super' and forget about it.
Ok.
>
>>> + | ---
>>> + | ...
>>> +test_run:cmd('create server replica with rpl_master=default,\
>>> + script="replication/replica.lua"')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:cmd('start server replica with wait=True, wait_load=True')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +
>>> +test_run:switch('replica')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +box.cfg{election_mode = 'voter'}
>>> + | ---
>>> + | ...
>>> +box.error.injection.set("ERRINJ_WAL_DELAY_COUNTDOWN", 0)
>>> + | ---
>>> + | - ok
>>> + | ...
>>> +
>>> +test_run:switch('default')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +box.cfg{election_mode = 'candidate'}
>>> + | ---
>>> + | ...
>>> +test_run:wait_cond(function() \
>>> + return box.info.election.leader ~= 0 \
>>> +end)
>>
>> Wouldn't it be simpler to wait for `box.info.election.state == 'leader'`?
> I don't mind.
>
> ====================
> diff --git a/test/replication/gh-5506-election-on-off.result b/test/replication/gh-5506-election-on-off.result
> index 4fbc31986..b8abd7ecd 100644
> --- a/test/replication/gh-5506-election-on-off.result
> +++ b/test/replication/gh-5506-election-on-off.result
> @@ -92,7 +92,7 @@ box.cfg{election_mode = 'candidate'}
> | ---
> | ...
> test_run:wait_cond(function() \
> - return box.info.election.leader ~= 0 \
> + return box.info.election.state == 'leader' \
> end)
> | ---
> | - true
> diff --git a/test/replication/gh-5506-election-on-off.test.lua b/test/replication/gh-5506-election-on-off.test.lua
> index bb89477d1..476b00ec0 100644
> --- a/test/replication/gh-5506-election-on-off.test.lua
> +++ b/test/replication/gh-5506-election-on-off.test.lua
> @@ -47,7 +47,7 @@ box.error.injection.set("ERRINJ_WAL_DELAY_COUNTDOWN", 0)
> test_run:switch('default')
> box.cfg{election_mode = 'candidate'}
> test_run:wait_cond(function() \
> - return box.info.election.leader ~= 0 \
> + return box.info.election.state == 'leader' \
> end)
>
> test_run:switch('replica')
Thanks for the changes!
LGTM.
--
Serge Petrenko
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Tarantool-patches] [PATCH 0/2] Raft crash on re-enablence
2020-11-07 16:45 [Tarantool-patches] [PATCH 0/2] Raft crash on re-enablence Vladislav Shpilevoy
2020-11-07 16:45 ` [Tarantool-patches] [PATCH 1/2] raft: fix crash on sm restart during WAL write Vladislav Shpilevoy
2020-11-07 16:45 ` [Tarantool-patches] [PATCH 2/2] raft: fix crash on candidate cfg " Vladislav Shpilevoy
@ 2020-11-10 21:09 ` Alexander V. Tikhonov
2020-11-10 22:05 ` Vladislav Shpilevoy
3 siblings, 0 replies; 9+ messages in thread
From: Alexander V. Tikhonov @ 2020-11-10 21:09 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: tarantool-patches
Hi Vlad, I've checked all results in gitlab-ci, and no new degradations
found [1], patch LGTM.
[1] - https://gitlab.com/tarantool/tarantool/-/pipelines/213913895
On Sat, Nov 07, 2020 at 05:45:30PM +0100, Vladislav Shpilevoy wrote:
> The patchset fixes 2 crashes related to Raft WAL write being in progress,
> when the state machine is restarted, and when it is configured to be a candidate
> with a known leader.
>
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5506-raft-crash
> Issue: https://github.com/tarantool/tarantool/issues/5506
>
> @ChangeLog
> * Fixed a couple of crashes on various tweaks of election mode (gh-5506).
>
> Vladislav Shpilevoy (2):
> raft: fix crash on sm restart during WAL write
> raft: fix crash on candidate cfg during WAL write
>
> src/box/raft.c | 23 ++-
> .../gh-5506-election-on-off.result | 140 ++++++++++++++++++
> .../gh-5506-election-on-off.test.lua | 68 +++++++++
> test/replication/suite.cfg | 1 +
> test/replication/suite.ini | 2 +-
> 5 files changed, 225 insertions(+), 9 deletions(-)
> create mode 100644 test/replication/gh-5506-election-on-off.result
> create mode 100644 test/replication/gh-5506-election-on-off.test.lua
>
> --
> 2.21.1 (Apple Git-122.3)
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Tarantool-patches] [PATCH 0/2] Raft crash on re-enablence
2020-11-07 16:45 [Tarantool-patches] [PATCH 0/2] Raft crash on re-enablence Vladislav Shpilevoy
` (2 preceding siblings ...)
2020-11-10 21:09 ` [Tarantool-patches] [PATCH 0/2] Raft crash on re-enablence Alexander V. Tikhonov
@ 2020-11-10 22:05 ` Vladislav Shpilevoy
3 siblings, 0 replies; 9+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-10 22:05 UTC (permalink / raw)
To: tarantool-patches, sergepetrenko, avtikhon
Pushed to master and 2.6.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-11-10 22:05 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-07 16:45 [Tarantool-patches] [PATCH 0/2] Raft crash on re-enablence Vladislav Shpilevoy
2020-11-07 16:45 ` [Tarantool-patches] [PATCH 1/2] raft: fix crash on sm restart during WAL write Vladislav Shpilevoy
2020-11-09 10:16 ` Serge Petrenko
2020-11-07 16:45 ` [Tarantool-patches] [PATCH 2/2] raft: fix crash on candidate cfg " Vladislav Shpilevoy
2020-11-09 10:19 ` Serge Petrenko
2020-11-09 22:42 ` Vladislav Shpilevoy
2020-11-10 7:48 ` Serge Petrenko
2020-11-10 21:09 ` [Tarantool-patches] [PATCH 0/2] Raft crash on re-enablence Alexander V. Tikhonov
2020-11-10 22:05 ` Vladislav Shpilevoy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox