From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko
To: v.shpilevoy@tarantool.org, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Date: Wed, 14 Apr 2021 17:17:18 +0300
Subject: [Tarantool-patches] [PATCH v3 08/10] Support manual elections in `box.ctl.clear_synchro_queue()`
List-Id: Tarantool development patches
X-Mailer: git-send-email 2.24.3 (Apple Git-128)

This patch adds support for manual elections from
`box.ctl.clear_synchro_queue()`. When an instance is in
`election_mode='manual'`, calling `clear_synchro_queue()` will make it
start a new election round.

Follow-up #5445
Part of #3055

@TarantoolBot document
Title: describe election_mode='manual'

Manual election mode is introduced. It may be used when the user wants to
control which instance is the leader explicitly instead of relying on the
Raft election algorithm.

When an instance is configured with `election_mode='manual'`, it behaves
as follows:

 1) By default, the instance acts like a voter: it is read-only and may
    vote for other instances that are candidates.
 2) Once `box.ctl.clear_synchro_queue()` is called, the instance becomes a
    candidate and starts a new election round. If the instance wins the
    elections, it remains leader, but won't participate in any new
    elections.
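
For reviewers, the pre-check this patch adds to `box_clear_synchro_queue()`
can be read as a pure function of the configured election mode and the
current Raft state. The sketch below is only an illustration of that
decision table, not the code from the patch: the enum names and
`promote_precheck()` are hypothetical stand-ins, and the real function also
sets diagnostics, runs the election and clears the limbo.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real enums live in the Tarantool sources. */
enum election_mode { MODE_OFF, MODE_VOTER, MODE_MANUAL, MODE_CANDIDATE };
enum raft_state { STATE_FOLLOWER, STATE_CANDIDATE, STATE_LEADER };

/* What the call decides to do before touching the synchro queue. */
enum promote_action {
	ACTION_CLEAR_QUEUE,		/* Go on without new elections. */
	ACTION_RUN_ELECTIONS,		/* Start a new election round first. */
	ACTION_ERR_UNSUPPORTED,		/* Maps to ER_UNSUPPORTED. */
	ACTION_ERR_ALREADY_LEADER,	/* Maps to ER_ALREADY_LEADER. */
};

enum promote_action
promote_precheck(enum election_mode mode, enum raft_state state)
{
	switch (mode) {
	case MODE_OFF:
		/* Elections are off: nothing to check. */
		return ACTION_CLEAR_QUEUE;
	case MODE_VOTER:
		/* A voter can never be promoted manually. */
		return ACTION_ERR_UNSUPPORTED;
	case MODE_MANUAL:
		/* Promoting an existing leader is an error. */
		if (state == STATE_LEADER)
			return ACTION_ERR_ALREADY_LEADER;
		return ACTION_RUN_ELECTIONS;
	case MODE_CANDIDATE:
		/* Only an already elected leader may proceed. */
		if (state != STATE_LEADER)
			return ACTION_ERR_UNSUPPORTED;
		return ACTION_CLEAR_QUEUE;
	}
	/* Mirrors the unreachable() default branch of the patch. */
	assert(false);
	return ACTION_ERR_UNSUPPORTED;
}
```

Keeping the decision separate from its side effects makes each of the four
modes easy to check in isolation.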
---
 src/box/box.cc              | 73 ++++++++++++++++++++++++++++++++++---
 src/box/errcode.h           |  3 ++
 src/box/raft.c              | 30 +++++++++++++--
 src/box/raft.h              |  3 ++
 src/lib/raft/raft.c         | 12 +++++-
 src/lib/raft/raft.h         |  2 +-
 test/box/error.result       |  3 ++
 test/unit/raft_test_utils.c |  4 +-
 8 files changed, 115 insertions(+), 15 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 3729ed997..6c7c8968a 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1161,7 +1161,8 @@ box_set_election_mode(void)
 	if (mode == ELECTION_MODE_INVALID)
 		return -1;
 	box_election_mode = mode;
-	raft_cfg_is_candidate(box_raft(), mode == ELECTION_MODE_CANDIDATE);
+	raft_cfg_is_candidate(box_raft(), mode == ELECTION_MODE_CANDIDATE,
+			      true);
 	raft_cfg_is_enabled(box_raft(), mode != ELECTION_MODE_OFF);
 	return 0;
 }
@@ -1525,12 +1526,74 @@ box_clear_synchro_queue(bool try_wait)
 	if (!is_box_configured ||
 	    raft_source_term(box_raft(), instance_id) == box_raft()->term)
 		return 0;
+
+	bool run_elections = false;
+
+	switch (box_election_mode) {
+	case ELECTION_MODE_OFF:
+		break;
+	case ELECTION_MODE_VOTER:
+		assert(box_raft()->state == RAFT_STATE_FOLLOWER);
+		diag_set(ClientError, ER_UNSUPPORTED, "election_mode='voter'",
+			 "manual elections");
+		return -1;
+	case ELECTION_MODE_MANUAL:
+		assert(box_raft()->state != RAFT_STATE_CANDIDATE);
+		if (box_raft()->state == RAFT_STATE_LEADER) {
+			diag_set(ClientError, ER_ALREADY_LEADER);
+			return -1;
+		}
+		run_elections = true;
+		try_wait = false;
+		break;
+	case ELECTION_MODE_CANDIDATE:
+		/*
+		 * Leader elections are enabled, and this instance is allowed to
+		 * promote only if it's already an elected leader. No manual
+		 * elections.
+		 */
+		if (box_raft()->state != RAFT_STATE_LEADER) {
+			diag_set(ClientError, ER_UNSUPPORTED, "election_mode="
+				 "'candidate'", "manual elections");
+			return -1;
+		}
+		break;
+	default:
+		unreachable();
+	}
+
 	uint32_t former_leader_id = txn_limbo.owner_id;
 	int64_t wait_lsn = txn_limbo.confirmed_lsn;
 	int rc = 0;
 	int quorum = replication_synchro_quorum;
 	in_clear_synchro_queue = true;
 
+	if (run_elections) {
+		/*
+		 * Make this instance a candidate and run until some leader, not
+		 * necessarily this instance, emerges.
+		 */
+		raft_cfg_is_candidate(box_raft(), true, false);
+		/*
+		 * Trigger new elections without waiting for an old leader to
+		 * disappear.
+		 */
+		raft_new_term(box_raft());
+		box_raft_wait_leader_found();
+		raft_cfg_is_candidate(box_raft(), false, false);
+		if (!box_raft()->is_enabled) {
+			diag_set(ClientError, ER_RAFT_DISABLED);
+			in_clear_synchro_queue = false;
+			return -1;
+		}
+		if (box_raft()->state != RAFT_STATE_LEADER) {
+			diag_set(ClientError, ER_INTERFERING_PROMOTE,
+				 box_raft()->leader);
+			in_clear_synchro_queue = false;
+			return -1;
+		}
+	}
+
 	if (txn_limbo_is_empty(&txn_limbo))
 		goto promote;
 
@@ -1548,12 +1611,10 @@ box_clear_synchro_queue(bool try_wait)
 		 * transactions. Exit in case someone did that for us.
 		 */
 		if (former_leader_id != txn_limbo.owner_id) {
-			/*
-			 * TODO: error once we see someone else has become the
-			 * leader already.
-			 */
+			diag_set(ClientError, ER_INTERFERING_PROMOTE,
+				 txn_limbo.owner_id);
 			in_clear_synchro_queue = false;
-			return 0;
+			return -1;
 		}
 	}
 
diff --git a/src/box/errcode.h b/src/box/errcode.h
index c63191fb6..df36085db 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -275,6 +275,9 @@ struct errcode_record {
 	/*220 */_(ER_TOO_EARLY_SUBSCRIBE,	"Can't subscribe non-anonymous replica %s until join is done") \
 	/*221 */_(ER_SQL_CANT_ADD_AUTOINC,	"Can't add AUTOINCREMENT: space %s can't feature more than one AUTOINCREMENT field") \
 	/*222 */_(ER_QUORUM_WAIT,		"Couldn't wait for quorum %d: %s") \
+	/*223 */_(ER_INTERFERING_PROMOTE,	"Instance with replica id %u was promoted first") \
+	/*224 */_(ER_RAFT_DISABLED,		"Elections were turned off while running box.ctl.promote()")\
+	/*225 */_(ER_ALREADY_LEADER,		"Can't promote an existing leader")\
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/raft.c b/src/box/raft.c
index 285dbe4fd..c7dc79f9b 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -91,11 +91,11 @@ box_raft_update_synchro_queue(struct raft *raft)
 	 * If the node became a leader, it means it will ignore all records from
 	 * all the other nodes, and won't get late CONFIRM messages anyway. Can
 	 * clear the queue without waiting for confirmations.
-	 * It's alright that the user may have called clear_synchro_queue
-	 * manually. In this case the call below will exit immediately and we'll
-	 * simply log a warning.
+	 * In case these are manual elections, we are already in the middle of a
+	 * `clear_synchro_queue` call. No need to call it once again.
 	 */
-	if (raft->state == RAFT_STATE_LEADER) {
+	if (raft->state == RAFT_STATE_LEADER &&
+	    box_election_mode != ELECTION_MODE_MANUAL) {
 		int rc = 0;
 		uint32_t errcode = 0;
 		do {
@@ -336,6 +336,28 @@ fail:
 	panic("Could not write a raft request to WAL\n");
 }
 
+static int
+box_raft_wait_leader_found_trig(struct trigger *trig, void *event)
+{
+	struct raft *raft = (struct raft *)event;
+	assert(raft == box_raft());
+	struct fiber *waiter = (struct fiber *)trig->data;
+	if (raft->leader != REPLICA_ID_NIL || !raft->is_enabled)
+		fiber_wakeup(waiter);
+	return 0;
+}
+
+void
+box_raft_wait_leader_found(void)
+{
+	struct trigger trig;
+	trigger_create(&trig, box_raft_wait_leader_found_trig, fiber(), NULL);
+	raft_on_update(box_raft(), &trig);
+	fiber_yield();
+	assert(box_raft()->leader != REPLICA_ID_NIL || !box_raft()->is_enabled);
+	trigger_clear(&trig);
+}
+
 void
 box_raft_init(void)
 {
diff --git a/src/box/raft.h b/src/box/raft.h
index 15f4e80d9..8fce423e1 100644
--- a/src/box/raft.h
+++ b/src/box/raft.h
@@ -97,6 +97,9 @@ box_raft_checkpoint_remote(struct raft_request *req);
 int
 box_raft_process(struct raft_request *req, uint32_t source);
 
+void
+box_raft_wait_leader_found();
+
 void
 box_raft_init(void);
 
diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
index e9ce8cade..7b77e05ea 100644
--- a/src/lib/raft/raft.c
+++ b/src/lib/raft/raft.c
@@ -846,7 +846,7 @@ raft_cfg_is_enabled(struct raft *raft, bool is_enabled)
 }
 
 void
-raft_cfg_is_candidate(struct raft *raft, bool is_candidate)
+raft_cfg_is_candidate(struct raft *raft, bool is_candidate, bool demote)
 {
 	bool old_is_candidate = raft->is_candidate;
 	raft->is_cfg_candidate = is_candidate;
@@ -874,8 +874,16 @@ raft_cfg_is_candidate(struct raft *raft, bool is_candidate)
 			raft_ev_timer_stop(raft_loop(), &raft->timer);
 		}
 		if (raft->state != RAFT_STATE_FOLLOWER) {
-			if (raft->state == RAFT_STATE_LEADER)
+			if (raft->state == RAFT_STATE_LEADER) {
+				if (!demote) {
+					/*
+					 * Remain leader until someone
+					 * triggers new elections.
+					 */
+					return;
+				}
 				raft->leader = 0;
+			}
 			raft->state = RAFT_STATE_FOLLOWER;
 			/* State is visible and changed - broadcast. */
 			raft_schedule_broadcast(raft);
diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h
index 01f548fee..390ea8ed8 100644
--- a/src/lib/raft/raft.h
+++ b/src/lib/raft/raft.h
@@ -325,7 +325,7 @@ raft_cfg_is_enabled(struct raft *raft, bool is_enabled);
  * the node still can vote, when Raft is enabled.
  */
 void
-raft_cfg_is_candidate(struct raft *raft, bool is_candidate);
+raft_cfg_is_candidate(struct raft *raft, bool is_candidate, bool demote);
 
 /** Configure Raft leader election timeout. */
 void
diff --git a/test/box/error.result b/test/box/error.result
index 7761c6949..dad6a21d3 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -441,6 +441,9 @@ t;
  |   220: box.error.TOO_EARLY_SUBSCRIBE
  |   221: box.error.SQL_CANT_ADD_AUTOINC
  |   222: box.error.QUORUM_WAIT
+ |   223: box.error.INTERFERING_PROMOTE
+ |   224: box.error.RAFT_DISABLED
+ |   225: box.error.ALREADY_LEADER
  | ...
 
 test_run:cmd("setopt delimiter ''");
diff --git a/test/unit/raft_test_utils.c b/test/unit/raft_test_utils.c
index b8735f373..a10ccae6a 100644
--- a/test/unit/raft_test_utils.c
+++ b/test/unit/raft_test_utils.c
@@ -360,7 +360,7 @@ raft_node_start(struct raft_node *node)
 		raft_process_recovery(&node->raft, &node->journal.rows[i]);
 
 	raft_cfg_is_enabled(&node->raft, node->cfg_is_enabled);
-	raft_cfg_is_candidate(&node->raft, node->cfg_is_candidate);
+	raft_cfg_is_candidate(&node->raft, node->cfg_is_candidate, true);
 	raft_cfg_election_timeout(&node->raft, node->cfg_election_timeout);
 	raft_cfg_election_quorum(&node->raft, node->cfg_election_quorum);
 	raft_cfg_death_timeout(&node->raft, node->cfg_death_timeout);
@@ -402,7 +402,7 @@ raft_node_cfg_is_candidate(struct raft_node *node, bool value)
 {
 	node->cfg_is_candidate = value;
 	if (raft_node_is_started(node)) {
-		raft_cfg_is_candidate(&node->raft, value);
+		raft_cfg_is_candidate(&node->raft, value, true);
 		raft_run_async_work();
 	}
 }
-- 
2.24.3 (Apple Git-128)
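
The new `demote` argument of `raft_cfg_is_candidate()` is easy to misread, so
here is a minimal sketch of the semantics as I understand them from the hunk in
src/lib/raft/raft.c. It is a toy model under stated assumptions: `struct
toy_raft` and `toy_cfg_is_candidate()` are hypothetical names, and timers,
broadcasts and the `is_enabled` flag of the real function are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_state { TOY_FOLLOWER, TOY_CANDIDATE, TOY_LEADER };

/* Hypothetical, heavily reduced model of struct raft. */
struct toy_raft {
	enum toy_state state;
	uint32_t leader;	/* 0 means "no known leader". */
	bool is_cfg_candidate;
};

/*
 * Toy version of raft_cfg_is_candidate(): when candidacy is switched
 * off, a leader steps down only if `demote` is true.
 */
void
toy_cfg_is_candidate(struct toy_raft *r, bool is_candidate, bool demote)
{
	assert(r != NULL);
	r->is_cfg_candidate = is_candidate;
	if (is_candidate || r->state == TOY_FOLLOWER)
		return;
	if (r->state == TOY_LEADER) {
		/* Remain leader until someone triggers new elections. */
		if (!demote)
			return;
		r->leader = 0;
	}
	r->state = TOY_FOLLOWER;
}
```

With `demote == false`, which is what the manual election path passes, an
elected leader keeps its role after its candidacy is withdrawn. With `demote ==
true`, which is what `box_set_election_mode()` passes, it steps down
immediately.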