From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id E363C6E454; Thu, 10 Feb 2022 04:09:36 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org E363C6E454 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1644455377; bh=XbW/VXYjj6eFzGl1lp7fJ0PHUgk7wN5vEeCSQbx9N3M=; h=To:Date:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=VJIFNaxpPuWEGDY7S6seMonT2rfnjRTiN4t+A8fJynsxK0UT8542yZaGmQ9Jw8njq NAdewAf6HQ8OTHu8wgaH9vZLQowYaotnZyOvmf1InCL/CSXc938is4Ft4jIJP35CN5 StfcQmJeZsvTjrZIFjdVZKslJNgI2OtN6y86mnJM= Received: from smtpng3.i.mail.ru (smtpng3.i.mail.ru [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 60CA76E454 for ; Thu, 10 Feb 2022 04:09:34 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 60CA76E454 Received: by smtpng3.m.smailru.net with esmtpa (envelope-from ) id 1nHxxl-0006AT-90; Thu, 10 Feb 2022 04:09:33 +0300 To: tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org Date: Thu, 10 Feb 2022 02:09:32 +0100 Message-Id: <3a1c7b7824df4481c8d70d4867d8af430b6c5b83.1644455329.git.v.shpilevoy@tarantool.org> X-Mailer: git-send-email 2.24.3 (Apple Git-128) MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD9B74B50284A7C1A0BDB3817FC5E3D06AF2B7372310051E825182A05F538085040466B1F7A6C5231B89C1DA6D0EA78E960DEABA153030E1F4155E4482D68B4A869 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE77B382B3D3EF460BFEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F79006379BBD3AAEA3DAB18A8638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D864B0D2A23002EA58C861E445A1F86D80117882F4460429724CE54428C33FAD305F5C1EE8F4F765FC8C7ADC89C2F0B2A5A471835C12D1D9774AD6D5ED66289B52BA9C0B312567BB23117882F446042972877693876707352026055571C92BF10FC26CFBAC0749D213D2E47CDBA5A96583BA9C0B312567BB231DD303D21008E29813377AFFFEAFD269A417C69337E82CC2E827F84554CEF50127C277FBC8AE2E8BA83251EDC214901ED5E8D9A59859A8B6B1CFA6D474D4A6A4089D37D7C0E48F6C5571747095F342E88FB05168BE4CE3AF X-8FC586DF: 6EFBBC1D9D64D975 X-C1DE0DAB: C20DE7B7AB408E4181F030C43753B8186998911F362727C4C7A0BC55FA0FE5FC9470F1552F42C49B9CE157F21A1C6017E3E5076137C0A686B1881A6453793CE9C32612AADDFBE061A061FA01F0C13A426DABF04D5057A81F728CF7B057D10C70B4B51A2BAB7FBE05886A5961035A09600383DAD389E261318FB05168BE4CE3AF X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D3472E5ECC12A9739C1A5EF7ED83BA279D95133291CC1CA060FD7BC4ED30F8EE4DE484AF84D472C24101D7E09C32AA3244C7A197841002F4122F84667CE81D5BD0DB038C9161EF167A1729B2BEF169E0186 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojptAT+HCMDqNRJ7/sB3Aqng== X-Mailru-Sender: 689FA8AB762F739339CABD9B3CA9A7D6CBE82D14C14CC46E197D8DFCC42CEE973841015FED1DE5223CC9A89AB576DD93FB559BB5D741EB963CF37A108A312F5C27E8A8C3839CE0E25FEEDEB644C299C0ED14614B50AE0675 X-Mras: Ok Subject: [Tarantool-patches] [PATCH 1/1] raft: do not rely on just ev_is_active for timers X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Vladislav Shpilevoy via Tarantool-patches Reply-To: Vladislav Shpilevoy Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" When timer has 0 timeout and 0 repeat, during timers preparation libev makes them inactive right away but puts into the array of pending events. Therefore to check if a timer is really active (will be executed in the future) need to look both at ev_is_active() and ev_is_pending(). It could happen only during split-vote, because all the other places use election_timeout + random shift, while split vote uses just random shift - it can be 0. The patch makes raft do that for all timer checks. Also to make the testing reliable and not depend on random the shift factor now is configurable. For the test it is set to 0. Closes #6847 NO_DOC=Bugfix NO_CHANGELOG=Bug was not released --- Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-6847-split-vote-crash Issue: https://github.com/tarantool/tarantool/issues/6847 src/lib/fakesys/fakeev.c | 15 +++++++++++++-- src/lib/raft/raft.c | 33 ++++++++++++++++----------------- src/lib/raft/raft.h | 9 +++++++++ src/lib/raft/raft_ev.h | 5 +++++ test/unit/raft.c | 33 ++++++++++++++++++++++++++++++++- test/unit/raft.result | 8 +++++++- test/unit/raft_test_utils.c | 12 ++++++++++++ test/unit/raft_test_utils.h | 4 ++++ 8 files changed, 98 insertions(+), 21 deletions(-) diff --git a/src/lib/fakesys/fakeev.c b/src/lib/fakesys/fakeev.c index 333c1646f..00163e57d 100644 --- a/src/lib/fakesys/fakeev.c +++ b/src/lib/fakesys/fakeev.c @@ -184,8 +184,9 @@ fakeev_timer_event_delete(struct fakeev_event *e) { assert(e->type == FAKEEV_EVENT_TIMER); struct fakeev_timer_event *te = (struct fakeev_timer_event *)e; - assert(te->watcher->active == 1); + assert(te->watcher->active == 1 || te->watcher->pending == 1); te->watcher->active = 0; + te->watcher->pending = 0; mh_int_t rc = mh_i64ptr_find(events_hash, (uint64_t) te->watcher, NULL); assert(rc != mh_end(events_hash)); mh_i64ptr_del(events_hash, rc, NULL); @@ -206,8 +207,11 @@ fakeev_timer_event_process(struct fakeev_event *e, struct ev_loop *loop) struct ev_timer *t = (struct ev_timer *) w; fakeev_timer_event_delete(e); t->at = 0; + w->pending = 0; if (t->repeat > 0) fakeev_timer_event_new(w, t->repeat); + else + w->active = 0; ev_invoke(loop, w, EV_TIMER); } @@ -215,13 +219,20 @@ static void fakeev_timer_event_new(struct ev_watcher *watcher, double delay) { assert(watcher->active == 0); - watcher->active = 1; struct fakeev_timer_event *e = malloc(sizeof(*e)); assert(e != NULL); fakeev_event_create(&e->base, FAKEEV_EVENT_TIMER, delay, fakeev_timer_event_process, fakeev_timer_event_delete); e->watcher = watcher; + /* + * In libev 0 timeout without repeat makes the timer not + * active but just pending. To execute only once. + */ + if (delay > 0) + watcher->active = 1; + else + watcher->pending = 1; assert(fakeev_event_by_ev(watcher) == NULL); struct mh_i64ptr_node_t node = {(uint64_t) watcher, e}; mh_i64ptr_put(events_hash, &node, NULL, NULL); diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c index 89cfb3c17..f4ca67439 100644 --- a/src/lib/raft/raft.c +++ b/src/lib/raft/raft.c @@ -34,11 +34,6 @@ #include "fiber.h" #include "tt_static.h" -/** - * Maximal random deviation of the election timeout. From the configured value. - */ -#define RAFT_RANDOM_ELECTION_FACTOR 0.1 - /** * When decoding we should never trust that there is * a valid data incomes. @@ -124,10 +119,7 @@ raft_new_random_election_shift(const struct raft *raft) { double timeout = raft->election_timeout; /* Translate to ms. Integer is needed to be able to use mod below. */ - uint32_t rand_part = - (uint32_t)(timeout * RAFT_RANDOM_ELECTION_FACTOR * 1000); - if (rand_part == 0) - rand_part = 1; + uint32_t rand_part = (uint32_t)(timeout * raft->max_shift * 1000); /* * XXX: this is not giving a good distribution, but it is not so trivial * to implement a correct random value generator. There is a task to @@ -552,7 +544,7 @@ raft_process_heartbeat(struct raft *raft, uint32_t source) * anything was heard from the leader. Then in the timer callback check * the timestamp, and restart the timer, if it is fine. */ - assert(raft_ev_is_active(&raft->timer)); + assert(raft_ev_timer_is_active(&raft->timer)); raft_ev_timer_stop(raft_loop(), &raft->timer); raft_sm_wait_leader_dead(raft); } @@ -795,7 +787,7 @@ raft_sm_schedule_new_election_cb(struct ev_loop *loop, struct ev_timer *timer, static void raft_sm_wait_leader_dead(struct raft *raft) { - assert(!raft_ev_is_active(&raft->timer)); + assert(!raft_ev_timer_is_active(&raft->timer)); assert(!raft->is_write_in_progress); assert(raft->is_candidate); assert(raft->state == RAFT_STATE_FOLLOWER); @@ -807,7 +799,7 @@ raft_sm_wait_leader_dead(struct raft *raft) static void raft_sm_wait_leader_found(struct raft *raft) { - assert(!raft_ev_is_active(&raft->timer)); + assert(!raft_ev_timer_is_active(&raft->timer)); assert(!raft->is_write_in_progress); assert(raft->is_candidate); assert(raft->state == RAFT_STATE_FOLLOWER); @@ -819,7 +811,7 @@ raft_sm_wait_leader_found(struct raft *raft) static void raft_sm_wait_election_end(struct raft *raft) { - assert(!raft_ev_is_active(&raft->timer)); + assert(!raft_ev_timer_is_active(&raft->timer)); assert(!raft->is_write_in_progress); assert(raft->is_candidate); assert(raft->state == RAFT_STATE_FOLLOWER || @@ -841,7 +833,7 @@ static void raft_sm_start(struct raft *raft) { say_info("RAFT: start state machine"); - assert(!raft_ev_is_active(&raft->timer)); + assert(!raft_ev_timer_is_active(&raft->timer)); assert(!raft->is_enabled); assert(raft->state == RAFT_STATE_FOLLOWER); raft->is_enabled = true; @@ -1022,7 +1014,7 @@ raft_cfg_election_timeout(struct raft *raft, double timeout) raft->is_write_in_progress) return; - assert(raft_ev_is_active(&raft->timer)); + assert(raft_ev_timer_is_active(&raft->timer)); struct ev_loop *loop = raft_loop(); timeout += raft_ev_timer_remaining(loop, &raft->timer) - old_timeout; if (timeout < 0) @@ -1057,7 +1049,7 @@ raft_cfg_death_timeout(struct raft *raft, double timeout) raft->leader == 0) return; - assert(raft_ev_is_active(&raft->timer)); + assert(raft_ev_timer_is_active(&raft->timer)); struct ev_loop *loop = raft_loop(); timeout += raft_ev_timer_remaining(loop, &raft->timer) - old_timeout; if (timeout < 0) @@ -1067,6 +1059,12 @@ raft_cfg_death_timeout(struct raft *raft, double timeout) raft_ev_timer_start(loop, &raft->timer); } +void +raft_cfg_max_shift(struct raft *raft, double shift) +{ + raft->max_shift = shift; +} + void raft_cfg_instance_id(struct raft *raft, uint32_t instance_id) { @@ -1138,7 +1136,7 @@ raft_check_split_vote(struct raft *raft) return; if (!raft_has_split_vote(raft)) return; - assert(raft_ev_is_active(&raft->timer)); + assert(raft_ev_timer_is_active(&raft->timer)); /* * Could be already detected before. The timeout would be updated by now * then. @@ -1175,6 +1173,7 @@ raft_create(struct raft *raft, const struct raft_vtab *vtab) .election_quorum = 1, .election_timeout = 5, .death_timeout = 5, + .max_shift = 0.1, .cluster_size = VCLOCK_MAX, .vtab = vtab, }; diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h index 05e373254..8935d2a82 100644 --- a/src/lib/raft/raft.h +++ b/src/lib/raft/raft.h @@ -223,6 +223,8 @@ struct raft { * elections can be started. */ double death_timeout; + /** Maximal deviation from the election timeout. */ + double max_shift; /** Number of instances registered in the cluster. */ int cluster_size; /** Virtual table to perform application-specific actions. */ @@ -325,6 +327,13 @@ raft_cfg_election_quorum(struct raft *raft, int election_quorum); void raft_cfg_death_timeout(struct raft *raft, double timeout); +/** + * Configure maximal random shift from the election timeout. The timeout during + * election is randomized as cfg_timeout * (1 + shift). + */ +void +raft_cfg_max_shift(struct raft *raft, double shift); + /** * Configure ID of the given Raft instance. The ID can't be changed after it is * assigned first time. diff --git a/src/lib/raft/raft_ev.h b/src/lib/raft/raft_ev.h index cfa663743..01f069b79 100644 --- a/src/lib/raft/raft_ev.h +++ b/src/lib/raft/raft_ev.h @@ -52,6 +52,11 @@ raft_loop(void); #define raft_ev_is_active ev_is_active +#define raft_ev_is_pending ev_is_pending + #define raft_ev_timer_init ev_timer_init #define raft_ev_timer_set ev_timer_set + +#define raft_ev_timer_is_active(w) \ + (raft_ev_is_active(w) || raft_ev_is_pending(w)) diff --git a/test/unit/raft.c b/test/unit/raft.c index 19144d632..41ad8c632 100644 --- a/test/unit/raft.c +++ b/test/unit/raft.c @@ -1565,7 +1565,7 @@ raft_test_bump_term_before_cfg() static void raft_test_split_vote(void) { - raft_start_test(58); + raft_start_test(64); struct raft_node node; raft_node_create(&node); @@ -1856,6 +1856,37 @@ raft_test_split_vote(void) is(node.raft.term, 3, "bump term"); is(node.raft.vote, 1, "vote for self"); + /* + * Split vote can make delay to next election 0. Timer with 0 timeout + * has a special state in libev. Another vote can come on the next + * even loop iteration just before the timer is triggered. It should be + * ready to the special state of the timer. + */ + raft_node_destroy(&node); + raft_node_create(&node); + raft_node_cfg_cluster_size(&node, 3); + raft_node_cfg_election_quorum(&node, 3); + raft_node_cfg_max_shift(&node, 0); + + raft_run_next_event(); + is(node.raft.term, 2, "bump term"); + is(node.raft.vote, 1, "vote for self"); + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 2 /* Vote. */, + 2 /* Source. */ + ), 0, "vote response for 2 from 2"); + + is(node.raft.timer.repeat, 0, "planned new election after yield"); + + is(raft_node_send_vote_response(&node, + 2 /* Term. */, + 3 /* Vote. */, + 3 /* Source. */ + ), 0, "vote response for 3 from 3"); + + is(node.raft.timer.repeat, 0, "still waiting for yield"); + raft_node_destroy(&node); raft_finish_test(); } diff --git a/test/unit/raft.result b/test/unit/raft.result index a4057acee..07349e4e0 100644 --- a/test/unit/raft.result +++ b/test/unit/raft.result @@ -273,7 +273,7 @@ ok 14 - subtests ok 15 - subtests *** raft_test_bump_term_before_cfg: done *** *** raft_test_split_vote *** - 1..58 + 1..64 ok 1 - elections with a new term ok 2 - vote response for 1 from 2 ok 3 - vote response for 3 from 3 @@ -332,6 +332,12 @@ ok 15 - subtests ok 56 - split vote after quorum increase ok 57 - bump term ok 58 - vote for self + ok 59 - bump term + ok 60 - vote for self + ok 61 - vote response for 2 from 2 + ok 62 - planned new election after yield + ok 63 - vote response for 3 from 3 + ok 64 - still waiting for yield ok 16 - subtests *** raft_test_split_vote: done *** *** main_f: done *** diff --git a/test/unit/raft_test_utils.c b/test/unit/raft_test_utils.c index d7e50984c..7fd945185 100644 --- a/test/unit/raft_test_utils.c +++ b/test/unit/raft_test_utils.c @@ -193,6 +193,7 @@ raft_node_create(struct raft_node *node) node->cfg_election_timeout = 5; node->cfg_election_quorum = 3; node->cfg_death_timeout = 5; + node->cfg_max_shift = 0.1; node->cfg_instance_id = 1; node->cfg_cluster_size = VCLOCK_MAX; node->cfg_vclock = &node->journal.vclock; @@ -384,6 +385,7 @@ raft_node_cfg(struct raft_node *node) raft_cfg_election_timeout(&node->raft, node->cfg_election_timeout); raft_cfg_election_quorum(&node->raft, node->cfg_election_quorum); raft_cfg_death_timeout(&node->raft, node->cfg_death_timeout); + raft_cfg_max_shift(&node->raft, node->cfg_max_shift); raft_cfg_instance_id(&node->raft, node->cfg_instance_id); raft_cfg_cluster_size(&node->raft, node->cfg_cluster_size); raft_cfg_vclock(&node->raft, node->cfg_vclock); @@ -484,6 +486,16 @@ raft_node_cfg_death_timeout(struct raft_node *node, double value) } } +void +raft_node_cfg_max_shift(struct raft_node *node, double value) +{ + node->cfg_max_shift = value; + if (raft_node_is_started(node)) { + raft_cfg_max_shift(&node->raft, value); + raft_run_async_work(); + } +} + bool raft_msg_check(const struct raft_msg *msg, enum raft_state state, uint64_t term, uint32_t vote, const char *vclock) diff --git a/test/unit/raft_test_utils.h b/test/unit/raft_test_utils.h index 8eab41490..7fd17f16a 100644 --- a/test/unit/raft_test_utils.h +++ b/test/unit/raft_test_utils.h @@ -105,6 +105,7 @@ struct raft_node { double cfg_election_timeout; int cfg_election_quorum; double cfg_death_timeout; + double cfg_max_shift; uint32_t cfg_instance_id; int cfg_cluster_size; struct vclock *cfg_vclock; @@ -249,6 +250,9 @@ raft_node_cfg_election_quorum(struct raft_node *node, int value); void raft_node_cfg_death_timeout(struct raft_node *node, double value); +void +raft_node_cfg_max_shift(struct raft_node *node, double value); + /** Check that @a msg message matches the given arguments. */ bool raft_msg_check(const struct raft_msg *msg, enum raft_state state, uint64_t term, -- 2.24.3 (Apple Git-128)