From: Serge Petrenko <sergepetrenko@tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>,
tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 4/8] test: introduce raft unit tests
Date: Wed, 16 Dec 2020 16:03:04 +0300 [thread overview]
Message-ID: <1f433ed9-b8c8-f67a-fb7c-3da7cc504110@tarantool.org> (raw)
In-Reply-To: <51c5f48d04df7d43e4b773e8d9c2cdad76137dc1.1607879643.git.v.shpilevoy@tarantool.org>
13.12.2020 20:15, Vladislav Shpilevoy пишет:
> Raft algorithm was tested only by functional Lua tests, as a part
> of the Tarantool executable.
>
> Functional testing of something like raft algorithm has drawbacks:
>
> - Not possible or too hard to cover some cases without error
> injections and/or good stability. Such as invalid messages, or
> small time durations, or a complex case which needs multiple
> steps to be reproduced. For instance, start WAL write, receive a
> message, finish the WAL write, and see if an expected thing
> happens.
>
> - Too long time to run when need to test not tiny timeouts. On the
> other hand, with tiny timeouts the test would become unstable.
>
> - Poor reproducibility due to random used in raft, due to system
> load, and number of other tests running in parallel.
>
> - Hard to debug, because for raft it is necessary to start one
> Tarantool process per raft instance.
>
> - Involves too much other systems, such as threads, real journal,
> relays, appliers, and so on. They can affect the process on the
> whole and reduce reproducibility and debug simplicity even more.
>
> Exactly the same problem existed for SWIM algorithm implemented as
> a module 'swim'. In order to test it, swim was moved to a separate
> library, refactored to be able to start many swims in the process
> (no global variables), its functions talking to other systems
> were virtualized (swim_ev, swim_transport), and substituted in the
> unit tests with fake analogue systems.
>
> In the unit tests these virtual functions were implemented
> differently, but the core swim algorithm was left intact and
> properly tested.
>
> The same is done for raft. This patch implements a set of helper
> functions and objects to unit test raft in raft_test_utils.c/.h
> files, and uses it to cover almost all the raft algorithm code.
>
> During implementation of the tests some bugs were found, which are
> not covered here, but fixed and covered in next commits.
>
> Part of #5303
> ---
Hi! Thanks for the patch!
Please find my 6 comments below.
I've dropped the hunks where I have no comments so that the
letter is easier to follow.
> test/unit/CMakeLists.txt | 3 +
> test/unit/raft.c | 1199 +++++++++++++++++++++++++++++++++++
> test/unit/raft.result | 207 ++++++
> test/unit/raft_test_utils.c | 557 ++++++++++++++++
> test/unit/raft_test_utils.h | 287 +++++++++
> 5 files changed, 2253 insertions(+)
> create mode 100644 test/unit/raft.c
> create mode 100644 test/unit/raft.result
> create mode 100644 test/unit/raft_test_utils.c
> create mode 100644 test/unit/raft_test_utils.h
>
>
> diff --git a/test/unit/raft.c b/test/unit/raft.c
> new file mode 100644
> index 000000000..dfb5f8e43
> --- /dev/null
> +++ b/test/unit/raft.c
<stripped>
> +
> +static void
> +raft_test_vote_skip(void)
> +{
> + raft_start_test(33);
> + struct raft_node node;
> + raft_node_create(&node);
> +
> + /* Everything is skipped if the term is outdated. */
1. Let's also test a case when vote response has greater term than the
candidate itself.
> +
> + raft_run_next_event();
> + is(node.raft.state, RAFT_STATE_CANDIDATE, "became candidate");
> + is(node.raft.term, 2, "term is bumped");
> +
> + is(raft_node_send_vote_response(&node,
> + 1 /* Term. */,
> + 1 /* Vote. */,
> + 2 /* Source. */
> + ), 0, "message is accepted");
> + is(node.raft.vote_count, 1, "but ignored - too old term");
> +
> + /* Competing vote requests are skipped. */
> +
> + is(raft_node_send_vote_response(&node,
> + 2 /* Term. */,
> + 3 /* Vote. */,
> + 2 /* Source. */
> + ), 0, "message is accepted");
> + is(node.raft.vote_count, 1, "but ignored - vote not for this node");
> + is(node.raft.state, RAFT_STATE_CANDIDATE, "this node does not give up");
> +
> + /* Vote requests are ignored when node is disabled. */
> +
> + raft_node_cfg_is_enabled(&node, false);
> +
> + is(raft_node_send_follower(&node,
> + 3 /* Term. */,
> + 2 /* Source. */
> + ), 0, "message is accepted");
> + ok(raft_node_check_full_state(&node,
> + RAFT_STATE_FOLLOWER /* State. */,
> + 0 /* Leader. */,
> + 3 /* Term. */,
> + 0 /* Vote. */,
> + 3 /* Volatile term. */,
> + 0 /* Volatile vote. */,
> + "{0: 2}" /* Vclock. */
> + ), "term bump to be able to vote again");
> + is(raft_node_send_vote_request(&node,
> + 3 /* Term. */,
> + "{}" /* Vclock. */,
> + 2 /* Source. */
> + ), 0, "message is accepted");
> + is(node.raft.vote, 0, "but ignored - node is disabled");
> +
> + /* Disabled node still takes term from the vote request. */
> +
> + is(raft_node_send_vote_request(&node,
> + 4 /* Term. */,
> + "{}" /* Vclock. */,
> + 2 /* Source. */
> + ), 0, "message is accepted");
> + ok(raft_node_check_full_state(&node,
> + RAFT_STATE_FOLLOWER /* State. */,
> + 0 /* Leader. */,
> + 4 /* Term. */,
> + 0 /* Vote. */,
> + 4 /* Volatile term. */,
> + 0 /* Volatile vote. */,
> + "{0: 3}" /* Vclock. */
> + ), "term is bumped, but vote request is ignored");
> +
> + raft_node_cfg_is_enabled(&node, true);
> +
> + /* Not a candidate won't accept vote request for self. */
> +
> + is(raft_node_send_vote_response(&node,
> + 4 /* Term. */,
> + 1 /* Vote. */,
> + 2 /* Source. */
> + ), 0, "message is accepted");
> + is(node.raft.vote, 0, "but ignored - vote works only on a candidate");
> +
> + /* Ignore vote response for some third node. */
> +
> + is(raft_node_send_vote_response(&node,
> + 4 /* Term. */,
> + 3 /* Vote. */,
> + 2 /* Source. */
> + ), 0, "message is accepted");
> + is(node.raft.vote, 0, "but ignored - sender != vote, so it is not a "
> + "request");
> +
> + /* Ignore if leader is already known. */
> +
> + is(raft_node_send_leader(&node,
> + 4 /* Term. */,
> + 2 /* Source. */
> + ), 0, "message is accepted");
> + is(node.raft.leader, 2, "leader is accepted");
> +
> + is(raft_node_send_vote_request(&node,
> + 4 /* Term. */,
> + "{}" /* Vclock. */,
> + 3 /* Source. */
> + ), 0, "message is accepted");
> + is(node.raft.vote, 0, "but ignored - leader is already known");
> + is(node.raft.leader, 2, "leader is not changed");
> +
> + /* Ignore too small vclock. */
> +
> + /*
> + * Need to turn off the candidate role to bump the term and not become
> + * a candidate.
> + */
> + raft_node_cfg_is_candidate(&node, false);
> +
> + raft_node_journal_follow(&node, 1, 5);
> + raft_node_journal_follow(&node, 2, 5);
> + ok(raft_node_check_full_state(&node,
> + RAFT_STATE_FOLLOWER /* State. */,
> + 2 /* Leader. */,
> + 4 /* Term. */,
> + 0 /* Vote. */,
> + 4 /* Volatile term. */,
> + 0 /* Volatile vote. */,
> + "{0: 3, 1: 5, 2: 5}" /* Vclock. */
> + ), "vclock is bumped");
> +
> + is(raft_node_send_vote_request(&node,
> + 5 /* Term. */,
> + "{1: 4}" /* Vclock. */,
> + 3 /* Source. */
> + ), 0, "message is accepted");
> + is(node.raft.vote, 0, "but ignored - vclock is too small");
> + is(node.raft.term, 5, "new term");
> + is(node.raft.leader, 0, "leader is dropped in the new term");
> +
> + /* Ignore incomparable vclock. */
> +
> + is(raft_node_send_vote_request(&node,
> + 5 /* Term. */,
> + "{1: 4, 2: 6}" /* Vclock. */,
> + 3 /* Source. */
> + ), 0, "message is accepted");
> + is(node.raft.vote, 0, "but ignored - vclock is incomparable");
> +
> + /* Ignore if voted in the current term. */
> +
> + is(raft_node_send_vote_request(&node,
> + 6 /* Term. */,
> + "{1: 5, 2: 5}" /* Vclock. */,
> + 2 /* Source. */
> + ), 0, "message is accepted");
> + is(node.raft.vote, 2, "voted");
> +
> + is(raft_node_send_vote_request(&node,
> + 6 /* Term. */,
> + "{1: 5, 2: 5}" /* Vclock. */,
> + 3 /* Source. */
> + ), 0, "message is accepted");
> + is(node.raft.vote, 2, "but ignored - already voted in the term");
2. I'd also check the case that the node ignores vote request after restart.
> +
> + raft_node_cfg_is_candidate(&node, true);
> +
> + raft_node_destroy(&node);
> + raft_finish_test();
> +}
<stripped>
> +
> +static void
> +raft_test_heartbeat(void)
> +{
> + raft_start_test(12);
> + struct raft_node node;
> + raft_node_create(&node);
> +
> + /* Let the node know there is a leader somewhere. */
> +
> + is(raft_node_send_leader(&node,
> + 2 /* Term. */,
> + 2 /* Source. */
> + ), 0, "leader notification");
> + ok(raft_node_check_full_state(&node,
> + RAFT_STATE_FOLLOWER /* State. */,
> + 2 /* Leader. */,
> + 2 /* Term. */,
> + 0 /* Vote. */,
> + 2 /* Volatile term. */,
> + 0 /* Volatile vote. */,
> + "{0: 1}" /* Vclock. */
> + ), "follow the leader after notification");
> +
> + /* Leader can send the same message many times. */
> +
> + is(raft_node_send_leader(&node,
> + 2 /* Term. */,
> + 2 /* Source. */
> + ), 0, "leader notification");
> +
> + /* The node won't do anything if it is not a candidate. */
> +
> + raft_node_cfg_is_candidate(&node, false);
> + raft_run_for(node.cfg_death_timeout * 2);
> + ok(raft_node_check_full_state(&node,
> + RAFT_STATE_FOLLOWER /* State. */,
> + 2 /* Leader. */,
> + 2 /* Term. */,
> + 0 /* Vote. */,
> + 2 /* Volatile term. */,
> + 0 /* Volatile vote. */,
> + "{0: 1}" /* Vclock. */
> + ), "follow the leader because no candidate");
3. Shouldn't the node reset the leader to 0 even if if it isn't a
candidate itself?
> + raft_node_cfg_is_candidate(&node, true);
> +
> + /* Heartbeats from the leader are accepted. */
> +
> + for (int i = 0; i < 5; ++i) {
> + raft_run_for(node.cfg_death_timeout / 2);
> + raft_node_send_heartbeat(&node, 2);
> + }
> + ok(raft_node_check_full_state(&node,
> + RAFT_STATE_FOLLOWER /* State. */,
> + 2 /* Leader. */,
> + 2 /* Term. */,
> + 0 /* Vote. */,
> + 2 /* Volatile term. */,
> + 0 /* Volatile vote. */,
> + "{0: 1}" /* Vclock. */
> + ), "follow the leader because had heartbeats");
> +
> + /* Heartbeats not from the leader won't do anything. */
> +
> + double start = raft_time();
> + raft_run_for(node.cfg_death_timeout / 3);
> + raft_node_send_heartbeat(&node, 3);
> + raft_run_for(node.cfg_death_timeout / 3);
> + raft_node_send_heartbeat(&node, 0);
> + raft_run_next_event();
> + ok(raft_time() >= start + node.cfg_death_timeout, "death timeout "
> + "passed");
4. Looks like this test would succeed even if heartbeats from other
instances were counted as leader heartbeats. You simply say
`raft_run_next_event()` which'll make the death timeout fire sooner
or later anyway. I think it's better to add another
`raft_run_for(node.cfg_death_timeout / 2)` to make sure the node
times out exactly when it should.
> + ok(raft_node_check_full_state(&node,
> + RAFT_STATE_CANDIDATE /* State. */,
> + 0 /* Leader. */,
> + 3 /* Term. */,
> + 1 /* Vote. */,
> + 3 /* Volatile term. */,
> + 1 /* Volatile vote. */,
> + "{0: 2}" /* Vclock. */
> + ), "enter candidate state when no heartbeats from the leader");
> +
> + /* Non-candidate ignores heartbeats. */
> +
> + raft_node_cfg_is_candidate(&node, false);
> + raft_node_send_heartbeat(&node, 2);
> + raft_node_cfg_is_candidate(&node, true);
> +
> + /* Leader ignores all heartbeats - nothing to wait for. */
> +
> + raft_node_new_term(&node);
> + is(raft_node_send_vote_response(&node,
> + 4 /* Term. */,
> + 1 /* Vote. */,
> + 2 /* Source. */
> + ), 0, "vote from 2");
> + is(raft_node_send_vote_response(&node,
> + 4 /* Term. */,
> + 1 /* Vote. */,
> + 3 /* Source. */
> + ), 0, "vote from 3");
> + is(node.raft.state, RAFT_STATE_LEADER, "became leader");
> + /* From self. */
> + raft_node_send_heartbeat(&node, 1);
> + /* From somebody else. */
> + raft_node_send_heartbeat(&node, 2);
> +
> + /* Heartbeats are ignored during WAL write. */
> +
> + raft_node_block(&node);
> + is(raft_node_send_leader(&node,
> + 5 /* Term. */,
> + 2 /* Source. */
> + ), 0, "message from leader");
> + raft_node_send_heartbeat(&node, 2);
> + ok(raft_node_check_full_state(&node,
> + RAFT_STATE_FOLLOWER /* State. */,
> + 2 /* Leader. */,
> + 4 /* Term. */,
> + 1 /* Vote. */,
> + 5 /* Volatile term. */,
> + 0 /* Volatile vote. */,
> + "{0: 4}" /* Vclock. */
> + ), "nothing changed - waiting for WAL write");
> + raft_node_unblock(&node);
> +
> + raft_node_destroy(&node);
> + raft_finish_test();
> +}
> +
<stripped>
> diff --git a/test/unit/raft_test_utils.c b/test/unit/raft_test_utils.c
> new file mode 100644
> index 000000000..4acd74e8f
> --- /dev/null
> +++ b/test/unit/raft_test_utils.c
<stripped>
> +
> +static void
> +raft_method_node_broadcast(struct raft *raft, const struct raft_msg *msg);
> +
> +static void
> +raft_method_node_write(struct raft *raft, const struct raft_msg *msg);
> +
> +static void
> +raft_method_node_schedule_async(struct raft *raft);
> +
5. Why such a long prefix? Why not simply raft_node_broadcast?
Ok, raft_node_broadcast may sound too general and referring more
to a raft test node rather than the raft core itself.
What about raft_node_broadcast_f then?
> +static struct raft_vtab raft_vtab = {
> + .broadcast = raft_method_node_broadcast,
> + .write = raft_method_node_write,
> + .schedule_async = raft_method_node_schedule_async,
> +};
> +
> +static int
> +raft_node_on_update(struct trigger *t, void *event)
> +{
> + struct raft_node *n = t->data;
> + assert(&n->on_update == t);
> + assert(&n->raft == event);
6. `(void) event` for building in release?
> + ++n->update_count;
> + return 0;
> +}
> +
<stripped>
--
Serge Petrenko
next prev parent reply other threads:[~2020-12-16 13:03 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-13 17:15 [Tarantool-patches] [PATCH 0/8] Raft module, part 4 - " Vladislav Shpilevoy
2020-12-13 17:15 ` [Tarantool-patches] [PATCH 1/8] fakesys: fix ev_is_active not working on fake timers Vladislav Shpilevoy
2020-12-15 9:42 ` Serge Petrenko
2020-12-13 17:15 ` [Tarantool-patches] [PATCH 2/8] fakesys: introduce fakeev_timer_remaining() Vladislav Shpilevoy
2020-12-15 9:43 ` Serge Petrenko
2020-12-13 17:15 ` [Tarantool-patches] [PATCH 3/8] raft: introduce raft_ev Vladislav Shpilevoy
2020-12-15 10:02 ` Serge Petrenko
2020-12-13 17:15 ` [Tarantool-patches] [PATCH 4/8] test: introduce raft unit tests Vladislav Shpilevoy
2020-12-13 18:10 ` Vladislav Shpilevoy
2020-12-16 13:03 ` Serge Petrenko [this message]
2020-12-17 22:44 ` Vladislav Shpilevoy
2020-12-18 8:17 ` Serge Petrenko
2020-12-20 17:28 ` Vladislav Shpilevoy
2020-12-21 7:36 ` Serge Petrenko
2020-12-13 17:15 ` [Tarantool-patches] [PATCH 5/8] raft: fix crash when received 0 term message Vladislav Shpilevoy
2020-12-16 13:05 ` Serge Petrenko
2020-12-13 17:15 ` [Tarantool-patches] [PATCH 6/8] raft: fix ignorance of bad state receipt Vladislav Shpilevoy
2020-12-16 13:06 ` Serge Petrenko
2020-12-13 17:15 ` [Tarantool-patches] [PATCH 7/8] raft: fix crash on election timeout decrease Vladislav Shpilevoy
2020-12-16 13:08 ` Serge Petrenko
2020-12-13 17:15 ` [Tarantool-patches] [PATCH 8/8] raft: fix crash on death " Vladislav Shpilevoy
2020-12-16 13:10 ` Serge Petrenko
2020-12-21 16:50 ` [Tarantool-patches] [PATCH 0/8] Raft module, part 4 - unit tests Vladislav Shpilevoy
2020-12-21 17:29 ` Vladislav Shpilevoy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1f433ed9-b8c8-f67a-fb7c-3da7cc504110@tarantool.org \
--to=sergepetrenko@tarantool.org \
--cc=tarantool-patches@dev.tarantool.org \
--cc=v.shpilevoy@tarantool.org \
--subject='Re: [Tarantool-patches] [PATCH 4/8] test: introduce raft unit tests' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox