From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id BD0A96ECE3; Thu, 20 Jan 2022 03:45:50 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org BD0A96ECE3 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1642639550; bh=NUfejcP8GtIA9XrphCqrteHBYDqbOsqB5KiphOOv16k=; h=To:Date:In-Reply-To:References:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=Iri2ZlJtI8ULVAJF5ErhzhGEhoQxgvxD2ToQIO1ZKrkcpt/IobCnoakJSzwdmNusD 4XwmJak5gQM2PIFvApW/zPC1L7Xrg5DHmt/cTDdaIJ/e6SGMNyaX/PF6MW23EhqXkA svvC5Fyf8aH3Gw+PQHHQqufr6k9YLtw96Z1LdqWc= Received: from smtpng3.i.mail.ru (smtpng3.i.mail.ru [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 4DB086EC43 for ; Thu, 20 Jan 2022 03:43:52 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 4DB086EC43 Received: by smtpng3.m.smailru.net with esmtpa (envelope-from ) id 1nALYN-0006wT-GU; Thu, 20 Jan 2022 03:43:52 +0300 To: tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org Date: Thu, 20 Jan 2022 01:43:46 +0100 Message-Id: <72798befd5d6e32f4386aaeeb3209cc93c0e44b4.1642639079.git.v.shpilevoy@tarantool.org> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD98A33503A0B8627DB9E9A584FB8FFFC000C4C51AF5450967E182A05F538085040D0512AE6009BA8741420FAB3673A28AF8D0B24AD224366F9CA06B8A444E84C71 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE76964B76561B12926EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637DAE863916B47044D8638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D859CE274E344F58727ED5C73576736969117882F4460429724CE54428C33FAD305F5C1EE8F4F765FCAA867293B0326636D2E47CDBA5A96583BD4B6F7A4D31EC0BC014FD901B82EE079FA2833FD35BB23D27C277FBC8AE2E8B974A882099E279BDA471835C12D1D977C4224003CC8364762BB6847A3DEAEFB0F43C7A68FF6260569E8FC8737B5C2249EC8D19AE6D49635B68655334FD4449CB9ECD01F8117BC8BEAAAE862A0553A39223F8577A6DFFEA7CFA80D66F452D417A43847C11F186F3C59DAA53EE0834AAEE X-C1DE0DAB: C20DE7B7AB408E4181F030C43753B8186998911F362727C4C7A0BC55FA0FE5FCBEEB5B7715175159E4C5ECA03534A2A4F1E0B1C845B70A35B1881A6453793CE9C32612AADDFBE061C61BE10805914D3804EBA3D8E7E5B87ABF8C51168CD8EBDB049C5FA07E1E73E1DC48ACC2A39D04F89CDFB48F4795C241BDAD6C7F3747799A X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D346C409ABC5F9C579B109E00201CCA249C7E43AB3BDF28363E15FFB83B6B80B473501A619AD0BD06551D7E09C32AA3244CB93D857CC607C82029D1A93518C28669408A6A02710B7304729B2BEF169E0186 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojPeoZbWa28mz0whBNXPN0Tw== X-Mailru-Sender: 689FA8AB762F739339CABD9B3CA9A7D6826AD20A33162BE8A0E4196F8A22C6BF3841015FED1DE5223CC9A89AB576DD93FB559BB5D741EB963CF37A108A312F5C27E8A8C3839CE0E25FEEDEB644C299C0ED14614B50AE0675 X-Mras: Ok Subject: [Tarantool-patches] [PATCH v2 4/5] raft: introduce split vote detection X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Vladislav Shpilevoy via Tarantool-patches Reply-To: Vladislav Shpilevoy Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Split vote is a situation when during election nobody can win and will not win in this term for sure. Not a single node could get enough votes. For example, with 4 nodes one could get 2 votes and other also 2 votes. Nobody will get quorum 3 in this term. The patch makes raft able to notice that situation and speed up the term bump. It is not bumped immediately though, because nodes might do that simultaneously and will get the split vote again after voting for self. There is a random delay. But it is just max 10% of election timeout, so it should speed up split vote resolution on 90% at least. Part of #5285 --- src/lib/raft/raft.c | 129 +++++++++++++++- src/lib/raft/raft.h | 6 + test/unit/raft.c | 301 +++++++++++++++++++++++++++++++++++- test/unit/raft.result | 64 +++++++- test/unit/raft_test_utils.c | 12 ++ test/unit/raft_test_utils.h | 5 + 6 files changed, 512 insertions(+), 5 deletions(-) diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c index 8182e73e3..5dcc5beaf 100644 --- a/src/lib/raft/raft.c +++ b/src/lib/raft/raft.c @@ -152,20 +152,83 @@ raft_can_vote_for(const struct raft *raft, const struct vclock *v) return cmp == 0 || cmp == 1; } -static inline void +static inline bool raft_add_vote(struct raft *raft, int src, int dst) { struct raft_vote *v = &raft->votes[src]; if (v->did_vote) - return; + return false; v->did_vote = true; ++raft->votes[dst].count; + return true; +} + +static bool +raft_has_split_vote(const struct raft *raft) +{ + int max_vote = 0; + int vote_vac = raft->cluster_size; + int quorum = raft->election_quorum; + /* + * Quorum > cluster is either a misconfiguration or some instances + * didn't register yet. Anyway, speeding the elections up won't help. + */ + if (vote_vac < quorum) + return false; + for (int i = 0; i < VCLOCK_MAX; ++i) { + int count = raft->votes[i].count; + vote_vac -= count; + if (count > max_vote) + max_vote = count; + } + /* + * More nodes voted than there are nodes configured. The reason is the + * the same as with quorum > cluster. The action is also the same - + * faster term bumps won't help. + */ + if (vote_vac < 0) + return false; + return max_vote + vote_vac < quorum; +} + +static int +raft_scores_snprintf(const struct raft *raft, char *buf, int size) +{ + int total = 0; + bool is_empty = true; + SNPRINT(total, snprintf, buf, size, "{"); + for (int i = 0; i < VCLOCK_MAX; ++i) { + int count = raft->votes[i].count; + if (count == 0) + continue; + if (!is_empty) + SNPRINT(total, snprintf, buf, size, ", "); + else + is_empty = false; + SNPRINT(total, snprintf, buf, size, "%d: %d", i, count); + } + SNPRINT(total, snprintf, buf, size, "}"); + return total; +} + +static const char * +raft_scores_str(const struct raft *raft) +{ + char *buf = tt_static_buf(); + int rc = raft_scores_snprintf(raft, buf, TT_STATIC_BUF_LEN); + assert(rc >= 0); + (void)rc; + return buf; } /** Schedule broadcast of the complete Raft state to all the followers. */ static void raft_schedule_broadcast(struct raft *raft); +/** If there is split vote, the node might reduce the next term delay. */ +static void +raft_check_split_vote(struct raft *raft); + /** Raft state machine methods. 'sm' stands for State Machine. */ /** @@ -351,7 +414,8 @@ raft_process_msg(struct raft *raft, const struct raft_msg *req, uint32_t source) * persisted long time ago and still broadcasted. Or a vote response. */ if (req->vote != 0) { - raft_add_vote(raft, source, req->vote); + if (raft_add_vote(raft, source, req->vote)) + raft_check_split_vote(raft); switch (raft->state) { case RAFT_STATE_FOLLOWER: @@ -771,6 +835,11 @@ raft_sm_wait_election_end(struct raft *raft) raft_new_random_election_shift(raft); raft_ev_timer_set(&raft->timer, election_timeout, election_timeout); raft_ev_timer_start(raft_loop(), &raft->timer); + /* + * Could start the waiting after a WAL write during which the split vote + * could happen. + */ + raft_check_split_vote(raft); } static void @@ -977,6 +1046,8 @@ raft_cfg_election_quorum(struct raft *raft, int election_quorum) if (raft->state == RAFT_STATE_CANDIDATE && raft_vote_count(raft) >= raft->election_quorum) raft_sm_become_leader(raft); + else + raft_check_split_vote(raft); } void @@ -1024,6 +1095,13 @@ raft_cfg_vclock(struct raft *raft, const struct vclock *vclock) raft->vclock = vclock; } +void +raft_cfg_cluster_size(struct raft *raft, int size) +{ + raft->cluster_size = size; + raft_check_split_vote(raft); +} + void raft_new_term(struct raft *raft) { @@ -1048,6 +1126,50 @@ raft_schedule_broadcast(struct raft *raft) raft_schedule_async(raft); } +static void +raft_check_split_vote(struct raft *raft) +{ + /* When leader is known, there is no election. Thus no vote to split. */ + if (raft->leader != 0) + return; + /* Not a candidate = can't trigger term bump anyway. */ + if (!raft->is_candidate) + return; + /* + * WAL write in progress means the state is changing. All is rechecked + * when it is done. + */ + if (raft->is_write_in_progress) + return; + if (!raft_has_split_vote(raft)) + return; + assert(raft_ev_is_active(&raft->timer)); + /* + * Could be already detected before. The timeout would be updated by now + * then. + */ + if (raft->timer.repeat < raft->election_timeout) + return; + + assert(raft->state == RAFT_STATE_FOLLOWER || + raft->state == RAFT_STATE_CANDIDATE); + struct ev_loop *loop = raft_loop(); + struct ev_timer *timer = &raft->timer; + double delay = raft_new_random_election_shift(raft); + /* + * Could be too late to speed up anything - probably the term is almost + * over anyway. + */ + double remaining = raft_ev_timer_remaining(loop, timer); + if (delay >= remaining) + delay = remaining; + say_info("RAFT: split vote is discovered - %s, new term in %lf sec", + raft_scores_str(raft), delay); + raft_ev_timer_stop(loop, timer); + raft_ev_timer_set(timer, delay, delay); + raft_ev_timer_start(loop, timer); +} + void raft_create(struct raft *raft, const struct raft_vtab *vtab) { @@ -1058,6 +1180,7 @@ raft_create(struct raft *raft, const struct raft_vtab *vtab) .election_quorum = 1, .election_timeout = 5, .death_timeout = 5, + .cluster_size = VCLOCK_MAX, .vtab = vtab, }; raft_ev_timer_init(&raft->timer, raft_sm_schedule_new_election_cb, diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h index 1d064bef0..817148792 100644 --- a/src/lib/raft/raft.h +++ b/src/lib/raft/raft.h @@ -219,6 +219,8 @@ struct raft { * elections can be started. */ double death_timeout; + /** Number of instances registered in the cluster. */ + int cluster_size; /** Virtual table to perform application-specific actions. */ const struct raft_vtab *vtab; /** @@ -333,6 +335,10 @@ raft_cfg_instance_id(struct raft *raft, uint32_t instance_id); void raft_cfg_vclock(struct raft *raft, const struct vclock *vclock); +/** Configure number of registered instances in the cluster. */ +void +raft_cfg_cluster_size(struct raft *raft, int size); + /** * Bump the term. When it is persisted, the node checks if there is a leader, * and if there is not, a new election is started. That said, this function can diff --git a/test/unit/raft.c b/test/unit/raft.c index 9f18f4d8e..2fe6354cd 100644 --- a/test/unit/raft.c +++ b/test/unit/raft.c @@ -1497,10 +1497,308 @@ raft_test_promote_restore(void) raft_finish_test(); } +static void +raft_test_split_vote(void) +{ + raft_start_test(58); + struct raft_node node; + raft_node_create(&node); + + /* + * Normal split vote. + */ + raft_node_cfg_cluster_size(&node, 4); + raft_node_cfg_election_quorum(&node, 3); + raft_run_next_event(); + + ok(raft_node_check_full_state(&node, + RAFT_STATE_CANDIDATE /* State. */, + 0 /* Leader. */, + 2 /* Term. */, + 1 /* Vote. */, + 2 /* Volatile term. */, + 1 /* Volatile vote. */, + "{0: 1}" /* Vclock. */ + ), "elections with a new term"); + + /* Make so node 1 has votes 1 and 2. Node 3 has votes 3 and 4. */ + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 1 /* Vote. */, + 2 /* Source. */ + ), 0, "vote response for 1 from 2"); + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 3 /* Vote. */, + 3 /* Source. */ + ), 0, "vote response for 3 from 3"); + + ok(node.raft.timer.repeat >= node.raft.election_timeout, "term " + "timeout >= election timeout normally"); + + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 3 /* Vote. */, + 4 /* Source. */ + ), 0, "vote response for 3 from 4"); + + ok(node.raft.timer.repeat < node.raft.election_timeout, "split vote " + "reduced the term timeout"); + + raft_run_next_event(); + ok(raft_node_check_full_state(&node, + RAFT_STATE_CANDIDATE /* State. */, + 0 /* Leader. */, + 3 /* Term. */, + 1 /* Vote. */, + 3 /* Volatile term. */, + 1 /* Volatile vote. */, + "{0: 2}" /* Vclock. */ + ), "a new term"); + + ok(node.raft.timer.repeat >= node.raft.election_timeout, "timeout is " + "normal again"); + + /* + * Cluster size change can make split vote. + */ + raft_node_destroy(&node); + raft_node_create(&node); + raft_node_cfg_cluster_size(&node, 3); + raft_node_cfg_election_quorum(&node, 2); + raft_run_next_event(); + is(node.raft.state, RAFT_STATE_CANDIDATE, "is candidate"); + is(node.raft.vote, 1, "voted for self"); + + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 2 /* Vote. */, + 2 /* Source. */ + ), 0, "vote response for 2 from 2"); + ok(node.raft.timer.repeat >= node.raft.election_timeout, "the vote is " + "not split yet"); + + raft_node_cfg_cluster_size(&node, 2); + ok(node.raft.timer.repeat < node.raft.election_timeout, "cluster size " + "change makes split vote"); + + raft_run_next_event(); + ok(raft_node_check_full_state(&node, + RAFT_STATE_CANDIDATE /* State. */, + 0 /* Leader. */, + 3 /* Term. */, + 1 /* Vote. */, + 3 /* Volatile term. */, + 1 /* Volatile vote. */, + "{0: 2}" /* Vclock. */ + ), "a new term"); + + /* + * Split vote can be even when the leader is already known - then + * nothing to do. Votes are just left from the beginning of the term and + * then probably cluster size reduced a bit. + */ + raft_node_destroy(&node); + raft_node_create(&node); + raft_node_cfg_cluster_size(&node, 3); + raft_node_cfg_election_quorum(&node, 2); + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 2 /* Vote. */, + 2 /* Source. */ + ), 0, "vote response for 2 from 2"); + /* There is also a vote from 3 for 2. But it wasn't delivered to 1. */ + is(raft_node_send_leader(&node, + 2 /* Term. */, + 2 /* Source. */ + ), 0, "message is accepted"); + is(node.raft.leader, 2, "other node's leadership is accepted"); + is(raft_vote_count(&node.raft), 1, "the only own vote was from self"); + + raft_node_cfg_cluster_size(&node, 2); + ok(node.raft.timer.repeat >= node.raft.death_timeout, "cluster change " + "does not affect the leader's death waiting"); + + /* + * Non-candidate should ignore split vote. + */ + raft_node_destroy(&node); + raft_node_create(&node); + raft_node_cfg_cluster_size(&node, 3); + raft_node_cfg_election_quorum(&node, 3); + raft_node_cfg_is_candidate(&node, false); + + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 2 /* Vote. */, + 2 /* Source. */ + ), 0, "vote response for 2 from 2"); + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 3 /* Vote. */, + 3 /* Source. */ + ), 0, "vote response for 3 from 3"); + + ok(!raft_ev_is_active(&node.raft.timer), "voter couldn't schedule " + "new term"); + + /* + * Split vote can get worse, but it shouldn't lead to new term delay + * restart. + */ + raft_node_destroy(&node); + raft_node_create(&node); + raft_node_cfg_cluster_size(&node, 3); + raft_node_cfg_election_quorum(&node, 3); + + raft_run_next_event(); + is(node.raft.term, 2, "bump term"); + is(node.raft.vote, 1, "vote for self"); + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 2 /* Vote. */, + 2 /* Source. */ + ), 0, "vote response for 2 from 2"); + + double delay = node.raft.timer.repeat; + ok(delay < node.raft.election_timeout, "split vote is already " + "inevitable"); + + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 3 /* Vote. */, + 3 /* Source. */ + ), 0, "vote response for 3 from 3"); + + is(delay, node.raft.timer.repeat, "split vote got worse, but delay " + "didn't change"); + + /* + * Handle split vote when WAL write is in progress. + */ + raft_node_destroy(&node); + raft_node_create(&node); + raft_node_cfg_cluster_size(&node, 2); + raft_node_cfg_election_quorum(&node, 2); + + raft_node_block(&node); + raft_run_next_event(); + is(node.raft.term, 1, "old term"); + is(node.raft.vote, 0, "unused vote"); + is(node.raft.volatile_term, 2, "new volatile term"); + is(node.raft.volatile_vote, 1, "new volatile vote"); + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 2 /* Vote. */, + 2 /* Source. */ + ), 0, "vote response for 2 from 2"); + + raft_node_unblock(&node); + is(node.raft.term, 2, "new term"); + is(node.raft.vote, 1, "voted for self"); + is(node.raft.volatile_term, 2, "volatile term"); + is(node.raft.volatile_vote, 1, "volatile vote"); + ok(node.raft.timer.repeat < node.raft.election_timeout, "found split " + "vote after WAL write"); + + raft_run_next_event(); + is(node.raft.term, 3, "bump term"); + is(node.raft.vote, 1, "vote for self"); + + /* + * Split vote check is disabled when cluster size < quorum. Makes no + * sense to speed the elections up. + */ + raft_node_destroy(&node); + raft_node_create(&node); + raft_node_cfg_cluster_size(&node, 1); + raft_node_cfg_election_quorum(&node, 2); + + raft_run_next_event(); + is(node.raft.term, 2, "bump term"); + is(node.raft.vote, 1, "vote for self"); + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 2 /* Vote. */, + 2 /* Source. */ + ), 0, "vote response for 2 from 2"); + + ok(node.raft.timer.repeat >= node.raft.election_timeout, "split vote " + "is not checked for cluster < quorum"); + + /* + * Split vote check is disabled when vote count > cluster size. The + * reason is the same as with quorum > cluster size - something is odd, + * more term bumps won't help. + */ + raft_node_destroy(&node); + raft_node_create(&node); + raft_node_cfg_cluster_size(&node, 3); + raft_node_cfg_election_quorum(&node, 2); + + raft_run_next_event(); + is(node.raft.term, 2, "bump term"); + is(node.raft.vote, 1, "vote for self"); + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 2 /* Vote. */, + 2 /* Source. */ + ), 0, "vote response for 2 from 2"); + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 2 /* Vote. */, + 3 /* Source. */ + ), 0, "vote response for 2 from 3"); + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 3 /* Vote. */, + 4 /* Source. */ + ), 0, "vote response for 3 from 4"); + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 4 /* Vote. */, + 5 /* Source. */ + ), 0, "vote response for 4 from 5"); + + ok(node.raft.timer.repeat >= node.raft.election_timeout, "split vote " + "is not checked when vote count > cluster size"); + + /* + * Split vote can happen if quorum was suddenly increased. + */ + raft_node_destroy(&node); + raft_node_create(&node); + raft_node_cfg_cluster_size(&node, 3); + raft_node_cfg_election_quorum(&node, 2); + + raft_run_next_event(); + is(node.raft.term, 2, "bump term"); + is(node.raft.vote, 1, "vote for self"); + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 2 /* Vote. */, + 2 /* Source. */ + ), 0, "vote response for 2 from 2"); + + ok(node.raft.timer.repeat >= node.raft.election_timeout, "not split " + "vote yet"); + + raft_node_cfg_election_quorum(&node, 3); + ok(node.raft.timer.repeat < node.raft.election_timeout, "split vote " + "after quorum increase"); + + raft_run_next_event(); + is(node.raft.term, 3, "bump term"); + is(node.raft.vote, 1, "vote for self"); + + raft_node_destroy(&node); + raft_finish_test(); +} + static int main_f(va_list ap) { - raft_start_test(14); + raft_start_test(15); (void) ap; fakeev_init(); @@ -1519,6 +1817,7 @@ main_f(va_list ap) raft_test_enable_disable(); raft_test_too_long_wal_write(); raft_test_promote_restore(); + raft_test_split_vote(); fakeev_free(); diff --git a/test/unit/raft.result b/test/unit/raft.result index 3fcdf8180..cfc9be733 100644 --- a/test/unit/raft.result +++ b/test/unit/raft.result @@ -1,5 +1,5 @@ *** main_f *** -1..14 +1..15 *** raft_test_leader_election *** 1..24 ok 1 - 1 pending message at start @@ -262,4 +262,66 @@ ok 13 - subtests ok 12 - not a candidate ok 14 - subtests *** raft_test_promote_restore: done *** + *** raft_test_split_vote *** + 1..58 + ok 1 - elections with a new term + ok 2 - vote response for 1 from 2 + ok 3 - vote response for 3 from 3 + ok 4 - term timeout >= election timeout normally + ok 5 - vote response for 3 from 4 + ok 6 - split vote reduced the term timeout + ok 7 - a new term + ok 8 - timeout is normal again + ok 9 - is candidate + ok 10 - voted for self + ok 11 - vote response for 2 from 2 + ok 12 - the vote is not split yet + ok 13 - cluster size change makes split vote + ok 14 - a new term + ok 15 - vote response for 2 from 2 + ok 16 - message is accepted + ok 17 - other node's leadership is accepted + ok 18 - the only own vote was from self + ok 19 - cluster change does not affect the leader's death waiting + ok 20 - vote response for 2 from 2 + ok 21 - vote response for 3 from 3 + ok 22 - voter couldn't schedule new term + ok 23 - bump term + ok 24 - vote for self + ok 25 - vote response for 2 from 2 + ok 26 - split vote is already inevitable + ok 27 - vote response for 3 from 3 + ok 28 - split vote got worse, but delay didn't change + ok 29 - old term + ok 30 - unused vote + ok 31 - new volatile term + ok 32 - new volatile vote + ok 33 - vote response for 2 from 2 + ok 34 - new term + ok 35 - voted for self + ok 36 - volatile term + ok 37 - volatile vote + ok 38 - found split vote after WAL write + ok 39 - bump term + ok 40 - vote for self + ok 41 - bump term + ok 42 - vote for self + ok 43 - vote response for 2 from 2 + ok 44 - split vote is not checked for cluster < quorum + ok 45 - bump term + ok 46 - vote for self + ok 47 - vote response for 2 from 2 + ok 48 - vote response for 2 from 3 + ok 49 - vote response for 3 from 4 + ok 50 - vote response for 4 from 5 + ok 51 - split vote is not checked when vote count > cluster size + ok 52 - bump term + ok 53 - vote for self + ok 54 - vote response for 2 from 2 + ok 55 - not split vote yet + ok 56 - split vote after quorum increase + ok 57 - bump term + ok 58 - vote for self +ok 15 - subtests + *** raft_test_split_vote: done *** *** main_f: done *** diff --git a/test/unit/raft_test_utils.c b/test/unit/raft_test_utils.c index b5754cf78..1d7dd01ba 100644 --- a/test/unit/raft_test_utils.c +++ b/test/unit/raft_test_utils.c @@ -194,6 +194,7 @@ raft_node_create(struct raft_node *node) node->cfg_election_quorum = 3; node->cfg_death_timeout = 5; node->cfg_instance_id = 1; + node->cfg_cluster_size = VCLOCK_MAX; node->cfg_vclock = &node->journal.vclock; raft_journal_create(&node->journal, node->cfg_instance_id); raft_node_start(node); @@ -365,6 +366,7 @@ raft_node_start(struct raft_node *node) raft_cfg_election_quorum(&node->raft, node->cfg_election_quorum); raft_cfg_death_timeout(&node->raft, node->cfg_death_timeout); raft_cfg_instance_id(&node->raft, node->cfg_instance_id); + raft_cfg_cluster_size(&node->raft, node->cfg_cluster_size); raft_cfg_vclock(&node->raft, node->cfg_vclock); raft_run_async_work(); } @@ -423,6 +425,16 @@ raft_node_cfg_is_candidate(struct raft_node *node, bool value) } } +void +raft_node_cfg_cluster_size(struct raft_node *node, int value) +{ + node->cfg_cluster_size = value; + if (raft_node_is_started(node)) { + raft_cfg_cluster_size(&node->raft, value); + raft_run_async_work(); + } +} + void raft_node_cfg_election_timeout(struct raft_node *node, double value) { diff --git a/test/unit/raft_test_utils.h b/test/unit/raft_test_utils.h index c68dc3b22..2138a829e 100644 --- a/test/unit/raft_test_utils.h +++ b/test/unit/raft_test_utils.h @@ -32,6 +32,7 @@ #include "fakesys/fakeev.h" #include "fiber.h" #include "raft/raft.h" +#include "raft/raft_ev.h" #include "unit.h" /** WAL simulation. It stores a list of rows which raft wanted to persist. */ @@ -105,6 +106,7 @@ struct raft_node { int cfg_election_quorum; double cfg_death_timeout; uint32_t cfg_instance_id; + int cfg_cluster_size; struct vclock *cfg_vclock; }; @@ -227,6 +229,9 @@ raft_node_cfg_is_enabled(struct raft_node *node, bool value); void raft_node_cfg_is_candidate(struct raft_node *node, bool value); +void +raft_node_cfg_cluster_size(struct raft_node *node, int value); + void raft_node_cfg_election_timeout(struct raft_node *node, double value); -- 2.24.3 (Apple Git-128)