Date: Tue, 18 Jan 2022 16:20:52 +0300
To: Vladislav Shpilevoy, tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 3/4] raft: introduce split vote detection
From: Serge Petrenko

Thanks for the patch!

I don't think this optimisation is "too much of a hassle". It's quite
nice, and it looks like a good part of the patch's SLOC is taken up by
verbose printing (I mean raft_scores_snprintf()).

In other words, I like the idea and I think we should have it on board
(just like pre-voting).

Please find my comments below.

> diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
> index 289d53fd5..5dcbc7821 100644
> --- a/src/lib/raft/raft.c
> +++ b/src/lib/raft/raft.c
> @@ -152,20 +152,69 @@ raft_can_vote_for(const struct raft *raft, const struct vclock *v)
>  	return cmp == 0 || cmp == 1;
>  }
>
> -static inline void
> +static inline bool
>  raft_add_vote(struct raft *raft, int src, int dst)
>  {
>  	struct raft_vote *v = &raft->votes[src];
>  	if (v->did_vote)
> -		return;
> +		return false;
>  	v->did_vote = true;
>  	++raft->votes[dst].count;
> +	return true;
> +}
> +

You could check for a split vote right in raft_add_vote(): simply track
the number of votes given in this term and the maximum number of votes
given to a single instance.

This way you won't have to iterate over all 32 nodes each time a vote
is cast.
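To illustrate, here is a minimal sketch of what I mean. It is not based on
the real struct raft: the struct below is a trimmed-down stand-in, and the
fields votes_total and votes_max as well as the helper raft_reset_votes()
are hypothetical names. It assumes the counters get reset wherever the
votes are reset for a new term, and it uses the simplified return
condition suggested further down for raft_has_split_vote().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* VCLOCK_MAX is 32 in Tarantool; redefined here to keep the sketch standalone. */
enum { VCLOCK_MAX = 32 };

struct raft_vote {
	bool did_vote;
	int count;
};

/* Hypothetical, trimmed-down stand-in for struct raft. */
struct raft {
	struct raft_vote votes[VCLOCK_MAX];
	int cluster_size;
	int election_quorum;
	/* Hypothetical: total number of votes counted in this term. */
	int votes_total;
	/* Hypothetical: highest vote count of any single instance. */
	int votes_max;
};

/* Hypothetical: to be called wherever the votes are reset for a new term. */
static void
raft_reset_votes(struct raft *raft)
{
	memset(raft->votes, 0, sizeof(raft->votes));
	raft->votes_total = 0;
	raft->votes_max = 0;
}

static inline bool
raft_add_vote(struct raft *raft, int src, int dst)
{
	assert(src >= 0 && src < VCLOCK_MAX);
	assert(dst >= 0 && dst < VCLOCK_MAX);
	struct raft_vote *v = &raft->votes[src];
	if (v->did_vote)
		return false;
	v->did_vote = true;
	int count = ++raft->votes[dst].count;
	/* Keep the aggregates up to date so that no scan is needed later. */
	++raft->votes_total;
	if (count > raft->votes_max)
		raft->votes_max = count;
	return true;
}

/*
 * O(1) check: even if every instance that hasn't voted yet voted for the
 * current front-runner, nobody could reach the quorum.
 */
static bool
raft_has_split_vote(const struct raft *raft)
{
	int vote_vac = raft->cluster_size - raft->votes_total;
	return raft->votes_max + vote_vac < raft->election_quorum;
}
```

For example, with cluster_size = 3 and election_quorum = 2, once the three
instances have voted for three different candidates, votes_max is 1 and no
votes are left, so 1 + 0 < 2 reports a split vote.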
> +static bool
> +raft_has_split_vote(const struct raft *raft)
> +{
> +	int max_vote = 0;
> +	int vote_vac = raft->cluster_size;
> +	int quorum = raft->election_quorum;
> +	for (int i = 0; i < VCLOCK_MAX; ++i) {
> +		int count = raft->votes[i].count;
> +		vote_vac -= count;
> +		if (count > max_vote)
> +			max_vote = count;
> +	}
> +	return max_vote < quorum && max_vote + vote_vac < quorum;

This is equivalent to `return max_vote + vote_vac < quorum`: as long as
vote_vac is non-negative, the second condition already implies the first.

> +}
> +
> +static int
> +raft_scores_snprintf(const struct raft *raft, char *buf, int size)
> +{
> +	int total = 0;
> +	bool is_empty = true;
> +	SNPRINT(total, snprintf, buf, size, "{");
> +	for (int i = 0; i < VCLOCK_MAX; ++i) {
> +		int count = raft->votes[i].count;
> +		if (count == 0)
> +			continue;
> +		if (!is_empty)
> +			SNPRINT(total, snprintf, buf, size, ", ");
> +		is_empty = false;

Nit: you may move is_empty = false into the 'else' branch.

> +		SNPRINT(total, snprintf, buf, size, "%d: %d", i, count);
> +	}
> +	SNPRINT(total, snprintf, buf, size, "}");
> +	return total;
> +}
> +

...

>
> +static void
> +raft_check_split_vote(struct raft *raft)
> +{
> +	/* When leader is known, there is no election. Thus no vote to split. */
> +	if (raft->leader != 0)
> +		return;
> +	/* Not a candidate = can't trigger term bump anyway. */
> +	if (!raft->is_candidate)
> +		return;
> +	/*
> +	 * WAL write in progress means the state is changing. All is rechecked
> +	 * when it is done.
> +	 */
> +	if (raft->is_write_in_progress)
> +		return;
> +	if (!raft_has_split_vote(raft))
> +		return;
> +	assert(raft_ev_is_active(&raft->timer));
> +	if (raft->timer.at < raft->election_timeout)
> +		return;

I don't understand this check. timer.at should point at the current time,
shouldn't it?
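Regarding the raft_has_split_vote() simplification above, a quick
brute-force check in C can confirm it. The helper names below are made up
for this illustration; the check only covers non-negative vote_vac values,
which holds as long as the counted votes never exceed cluster_size.

```c
#include <assert.h>
#include <stdbool.h>

/* The condition as written in the patch. */
static bool
split_vote_patch(int max_vote, int vote_vac, int quorum)
{
	return max_vote < quorum && max_vote + vote_vac < quorum;
}

/* The simplified condition suggested in the review. */
static bool
split_vote_simplified(int max_vote, int vote_vac, int quorum)
{
	return max_vote + vote_vac < quorum;
}

/*
 * Counts the (max_vote, vote_vac, quorum) triples, every value in
 * [0, limit], for which the two conditions disagree.
 */
static int
count_mismatches(int limit)
{
	assert(limit >= 0);
	int mismatches = 0;
	for (int max_vote = 0; max_vote <= limit; ++max_vote) {
		for (int vote_vac = 0; vote_vac <= limit; ++vote_vac) {
			for (int quorum = 0; quorum <= limit; ++quorum) {
				if (split_vote_patch(max_vote, vote_vac, quorum) !=
				    split_vote_simplified(max_vote, vote_vac,
							  quorum))
					++mismatches;
			}
		}
	}
	return mismatches;
}
```

The two conditions can only disagree when vote_vac is negative, e.g.
max_vote = 3, vote_vac = -2, quorum = 2.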
> +
> +	assert(raft->state == RAFT_STATE_FOLLOWER ||
> +	       raft->state == RAFT_STATE_CANDIDATE);
> +	struct ev_loop *loop = raft_loop();
> +	struct ev_timer *timer = &raft->timer;
> +	double delay = raft_new_random_election_shift(raft);
> +	/*
> +	 * Could be too late to speed up anything - probably the term is almost
> +	 * over anyway.
> +	 */
> +	double remaining = raft_ev_timer_remaining(loop, timer);
> +	if (delay >= remaining)
> +		delay = remaining;
> +	say_info("RAFT: split vote is discovered - %s, new term in %lf sec",
> +		 raft_scores_str(raft), delay);
> +	raft_ev_timer_stop(loop, timer);
> +	raft_ev_timer_set(timer, delay, delay);
> +	raft_ev_timer_start(loop, timer);
> +}
> +
>  void
>  raft_create(struct raft *raft, const struct raft_vtab *vtab)
>  {
> @@ -1053,6 +1150,7 @@ raft_create(struct raft *raft, const struct raft_vtab *vtab)
>  		.election_quorum = 1,
>  		.election_timeout = 5,
>  		.death_timeout = 5,
> +		.cluster_size = VCLOCK_MAX,
>  		.vtab = vtab,
>  	};
>  	raft_ev_timer_init(&raft->timer, raft_sm_schedule_new_election_cb,

...

-- 
Serge Petrenko