From mboxrd@z Thu Jan 1 00:00:00 1970
To: tarantool-patches@dev.tarantool.org, gorcunov@gmail.com, sergepetrenko@tarantool.org
Date: Sat, 5 Jun 2021 01:37:59 +0200
Message-Id: <6675abcfa409f1fd6e05a7e7852b42e1a08d1795.1622849790.git.v.shpilevoy@tarantool.org>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: 
[Tarantool-patches] [PATCH 5/6] replication: use 'score' to find a join-master
List-Id: Tarantool development patches
From: Vladislav Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy
Sender: "Tarantool-patches"

The patch refactors the join-master search in
replicaset_find_join_master() to use scores instead of multiple
iterations with different criteria.

The original code was acceptable as long as it had only one
parameter to vary: whether to skip `box.cfg{read_only = true}`
nodes. Even then it was on the edge of acceptable complexity
because of a second, non-configurable parameter: whether a replica
is in read-only state regardless of its config.

It is going to get more complicated when the algorithm starts to
take into account a third parameter: whether an instance is
bootstrapped. Then it has to make decisions like "among
bootstrapped nodes, prefer instances which do not have
read_only = true and are not in read-only state".

The easiest way to do that is to use scores (weights): the score of
an instance is incremented for every "good" property it has, and
the instance with the highest score wins.

Part of #5613
---
 src/box/replication.cc | 62 ++++++++++++++++------------------------
 1 file changed, 23 insertions(+), 39 deletions(-)

diff --git a/src/box/replication.cc b/src/box/replication.cc
index 990f6239c..d33e70f28 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -960,71 +960,55 @@ replicaset_next(struct replica *replica)
  * replicas, choose a read-only replica with biggest vclock
  * as a leader, in hope it will become read-write soon.
  */
-static struct replica *
-replicaset_round(bool skip_ro)
+struct replica *
+replicaset_find_join_master(void)
 {
 	struct replica *leader = NULL;
+	int leader_score = -1;
 	replicaset_foreach(replica) {
 		struct applier *applier = replica->applier;
 		if (applier == NULL)
 			continue;
 		const struct ballot *ballot = &applier->ballot;
-		/**
-		 * While bootstrapping a new cluster, read-only
-		 * replicas shouldn't be considered as a leader.
-		 * The only exception if there is no read-write
-		 * replicas since there is still a possibility
-		 * that all replicas exist in cluster table.
-		 */
-		if (skip_ro && ballot->is_ro_cfg)
-			continue;
-		if (leader == NULL) {
-			leader = replica;
-			continue;
-		}
-		const struct ballot *leader_ballot = &leader->applier->ballot;
+		int score = 0;
 		/*
-		 * Try to find a replica which has already left
-		 * orphan mode.
+		 * Prefer instances not configured as read-only via box.cfg, and
+		 * not being in read-only state due to any other reason. The
+		 * config is stronger because if it is configured as read-only,
+		 * it is in read-only state for sure, until the config is
+		 * changed.
 		 */
-		if (ballot->is_ro && !leader_ballot->is_ro)
+		if (!ballot->is_ro_cfg)
+			score += 5;
+		if (!ballot->is_ro)
+			score += 1;
+		if (leader_score < score)
+			goto elect;
+		if (score < leader_score)
 			continue;
+		const struct ballot *leader_ballot;
+		leader_ballot = &leader->applier->ballot;
 		/*
 		 * Choose the replica with the most advanced
 		 * vclock. If there are two or more replicas
 		 * with the same vclock, prefer the one with
 		 * the lowest uuid.
 		 */
-		int cmp = vclock_compare_ignore0(&ballot->vclock,
-						 &leader_ballot->vclock);
+		int cmp;
+		cmp = vclock_compare_ignore0(&ballot->vclock,
+					     &leader_ballot->vclock);
 		if (cmp < 0)
 			continue;
 		if (cmp == 0 && tt_uuid_compare(&replica->uuid,
 						&leader->uuid) > 0)
 			continue;
+	elect:
 		leader = replica;
+		leader_score = score;
 	}
 	return leader;
 }
 
-struct replica *
-replicaset_find_join_master(void)
-{
-	bool skip_ro = true;
-	/**
-	 * Two loops, first prefers read-write replicas among others.
-	 * Second for backward compatibility, if there is no such
-	 * replicas at all.
-	 */
-	struct replica *leader = replicaset_round(skip_ro);
-	if (leader == NULL) {
-		skip_ro = false;
-		leader = replicaset_round(skip_ro);
-	}
-
-	return leader;
-}
-
 struct replica *
 replica_by_uuid(const struct tt_uuid *uuid)
 {
-- 
2.24.3 (Apple Git-128)
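
Illustration only, not part of the patch: a minimal standalone sketch of
the score-based selection, assuming a simplified model. The names
demo_ballot, demo_score() and demo_find_join_master() are hypothetical.
The scalar vclock_sum and uuid_rank fields stand in for the real
vclock_compare_ignore0() and tt_uuid_compare() calls, which is a
simplification. Only the weights (5 and 1) and the tie-breaking order
are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ballot plus replica uuid. */
struct demo_ballot {
	bool is_ro_cfg;  /* read-only via box.cfg */
	bool is_ro;      /* read-only for any reason */
	long vclock_sum; /* assumption: a scalar replaces the vclock */
	int uuid_rank;   /* assumption: an int replaces the uuid */
};

/* Same weights as in the patch: the config flag outweighs the state. */
static int
demo_score(const struct demo_ballot *b)
{
	int score = 0;
	if (!b->is_ro_cfg)
		score += 5;
	if (!b->is_ro)
		score += 1;
	return score;
}

/*
 * Single pass: a higher score wins outright. On equal scores the
 * bigger vclock wins, then the lower uuid.
 */
static const struct demo_ballot *
demo_find_join_master(const struct demo_ballot *ballots, size_t count)
{
	assert(ballots != NULL || count == 0);
	const struct demo_ballot *leader = NULL;
	int leader_score = -1;
	for (size_t i = 0; i < count; i++) {
		const struct demo_ballot *b = &ballots[i];
		int score = demo_score(b);
		if (score < leader_score)
			continue;
		if (score == leader_score) {
			/* leader is non-NULL here: scores are >= 0. */
			if (b->vclock_sum < leader->vclock_sum)
				continue;
			if (b->vclock_sum == leader->vclock_sum &&
			    b->uuid_rank > leader->uuid_rank)
				continue;
		}
		leader = b;
		leader_score = score;
	}
	return leader;
}
```

With these weights a node which is writable both by config and by state
scores 6, a node writable by config but currently read-only scores 5,
and a node with read_only = true scores 0. The vclock matters only
between nodes with equal scores.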