From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko
To: Vladislav Shpilevoy, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Date: Fri, 16 Apr 2021 18:38:00 +0300
Subject: Re: [Tarantool-patches] [PATCH v3 08/10] Support manual elections in `box.ctl.clear_synchro_queue()`

16.04.2021 02:30, Vladislav Shpilevoy wrote:
> Thanks for working on this!
>
> See 5 comments below.
>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 3729ed997..6c7c8968a 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1525,12 +1526,74 @@ box_clear_synchro_queue(bool try_wait)
>>      if (!is_box_configured ||
>>          raft_source_term(box_raft(), instance_id) == box_raft()->term)
>>          return 0;
>> +
>> +    bool run_elections = false;
>> +
>> +    switch (box_election_mode) {
>> +    case ELECTION_MODE_OFF:
>> +        break;
>> +    case ELECTION_MODE_VOTER:
>> +        assert(box_raft()->state == RAFT_STATE_FOLLOWER);
>> +        diag_set(ClientError, ER_UNSUPPORTED, "election_mode='voter'",
>> +             "manual elections");
>> +        return -1;
>> +    case ELECTION_MODE_MANUAL:
>> +        assert(box_raft()->state != RAFT_STATE_CANDIDATE);
>> +        if (box_raft()->state == RAFT_STATE_LEADER) {
>> +            diag_set(ClientError, ER_ALREADY_LEADER);
>> +            return -1;
>> +        }
>> +        run_elections = true;
>> +        try_wait = false;
>> +        break;
>> +    case ELECTION_MODE_CANDIDATE:
>> +        /*
>> +         * Leader elections are enabled, and this instance is allowed to
>> +         * promote only if it's already an elected leader. No manual
>> +         * elections.
>> +         */
>> +        if (box_raft()->state != RAFT_STATE_LEADER) {
> 1. That is strange. Why do you allow to promote the node
> if it is already the leader when mode is candidate, but do
> not allow the same when the mode is manual?

It's because box_clear_synchro_queue() may be called by the raft worker
once the node becomes leader. So I cannot forbid this.

Actually, there's a guard for manual `box.ctl.clear_synchro_queue()`
above, I just didn't make proper use of it.
I mean these lines:

    if (!is_box_configured ||
        raft_source_term(box_raft(), instance_id) == box_raft()->term)
        return 0;

So I don't need ER_ALREADY_LEADER in the ELECTION_MODE_MANUAL case.
Will fix. Thanks for pointing this out!

>
> Shouldn't we throw an error when the mode is candidate
> regardless of the node role?
>
>> +            diag_set(ClientError, ER_UNSUPPORTED, "election_mode="
>> +                 "'candidate'", "manual elections");
>> +            return -1;
>> +        }
>> +        break;
>> +    default:
>> +        unreachable();
>> +    }
>> +
>>      uint32_t former_leader_id = txn_limbo.owner_id;
>>      int64_t wait_lsn = txn_limbo.confirmed_lsn;
>>      int rc = 0;
>>      int quorum = replication_synchro_quorum;
>>      in_clear_synchro_queue = true;
>>
>> +    if (run_elections) {
>> +        /*
>> +         * Make this instance a candidate and run until some leader, not
>> +         * necessarily this instance, emerges.
>> +         */
>> +        raft_cfg_is_candidate(box_raft(), true, false);
>> +        /*
>> +         * Trigger new elections without waiting for an old leader to
>> +         * disappear.
>> +         */
>> +        raft_new_term(box_raft());
>> +        box_raft_wait_leader_found();
>> +        raft_cfg_is_candidate(box_raft(), false, false);
> 2. What if during box_raft_wait_leader_found() I made the node candidate
> via box.cfg? Won't you then reset it back to non-candidate here?

Yes, that's true.

>
> It probably should reset the current box.cfg mode back. Not just
> remove the candidate flag.

I think it's strange to reconfigure box automatically.
I suggest resetting the node to non-candidate only if the mode is still
MANUAL after the election.
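
To make the suggestion above concrete, here is a minimal standalone sketch of the intended behaviour. It is only a model under stated assumptions: `struct node_model`, `model_start_candidate()`, `model_reconfigure()`, `model_finish_manual_election()` and `manual_election_demo()` are hypothetical names invented for illustration. They are not part of the patch or of the raft library, and the real state lives in `box_election_mode` and `struct raft`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the configured election mode. */
enum election_mode {
	ELECTION_MODE_OFF,
	ELECTION_MODE_VOTER,
	ELECTION_MODE_MANUAL,
	ELECTION_MODE_CANDIDATE,
};

/* Hypothetical model of a node: configured mode plus candidate flag. */
struct node_model {
	enum election_mode cfg_mode;
	bool is_candidate;
};

/* Temporarily make the node a candidate for a manual election. */
static void
model_start_candidate(struct node_model *node)
{
	node->is_candidate = true;
}

/* Models the user changing election_mode via box.cfg mid-election. */
static void
model_reconfigure(struct node_model *node, enum election_mode mode)
{
	node->cfg_mode = mode;
	node->is_candidate = (mode == ELECTION_MODE_CANDIDATE);
}

/*
 * Called once the election is over. The temporary candidate flag is
 * dropped only if the configured mode is still MANUAL, so a concurrent
 * reconfiguration is never overridden.
 */
static void
model_finish_manual_election(struct node_model *node)
{
	if (node->cfg_mode == ELECTION_MODE_MANUAL)
		node->is_candidate = false;
}

/* Exercises both scenarios; returns 0 when the model behaves as described. */
static int
manual_election_demo(void)
{
	struct node_model undisturbed = {ELECTION_MODE_MANUAL, false};
	model_start_candidate(&undisturbed);
	model_finish_manual_election(&undisturbed);
	assert(!undisturbed.is_candidate);

	struct node_model reconfigured = {ELECTION_MODE_MANUAL, false};
	model_start_candidate(&reconfigured);
	model_reconfigure(&reconfigured, ELECTION_MODE_CANDIDATE);
	model_finish_manual_election(&reconfigured);
	assert(reconfigured.is_candidate);
	return 0;
}
```

In the first scenario the mode stays MANUAL and the node goes back to non-candidate. In the second the user switches to 'candidate' while the election is running, and the flag is left alone.
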
>
>> +        if (!box_raft()->is_enabled) {
>> +            diag_set(ClientError, ER_RAFT_DISABLED);
>> +            in_clear_synchro_queue = false;
>> +            return -1;
>> +        }
>> +        if (box_raft()->state != RAFT_STATE_LEADER) {
>> +            diag_set(ClientError, ER_INTERFERING_PROMOTE,
>> +                 box_raft()->leader);
>> +            in_clear_synchro_queue = false;
>> +            return -1;
>> +        }
>> +    }
>> +
>>      if (txn_limbo_is_empty(&txn_limbo))
>>          goto promote;
>>
>> diff --git a/src/box/raft.c b/src/box/raft.c
>> index 285dbe4fd..c7dc79f9b 100644
>> --- a/src/box/raft.c
>> +++ b/src/box/raft.c
>> @@ -336,6 +336,28 @@ fail:
>>      panic("Could not write a raft request to WAL\n");
>>  }
>>
>> +static int
>> +box_raft_wait_leader_found_trig(struct trigger *trig, void *event)
> 3. I thought we usually call triggers with _f suffix, not _trig.

Sure. Fixed.

>
>> +{
>> +    struct raft *raft = (struct raft *)event;
>> +    assert(raft == box_raft());
>> +    struct fiber *waiter = (struct fiber *)trig->data;
> 4. No need to cast this and event - void * cast works naturally in C.

Ok.

>
>> +    if (raft->leader != REPLICA_ID_NIL || !raft->is_enabled)
>> +        fiber_wakeup(waiter);
>> +    return 0;
>> +}
>> diff --git a/src/box/raft.h b/src/box/raft.h
>> index 15f4e80d9..8fce423e1 100644
>> --- a/src/box/raft.h
>> +++ b/src/box/raft.h
>> @@ -97,6 +97,9 @@ box_raft_checkpoint_remote(struct raft_request *req);
>>  int
>>  box_raft_process(struct raft_request *req, uint32_t source);
>>
>> +void
>> +box_raft_wait_leader_found();
>> +
>>  void
>>  box_raft_init(void);
>>
>> diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
>> index e9ce8cade..7b77e05ea 100644
>> --- a/src/lib/raft/raft.c
>> +++ b/src/lib/raft/raft.c
>> @@ -846,7 +846,7 @@ raft_cfg_is_enabled(struct raft *raft, bool is_enabled)
>>  }
>>
>>  void
>> -raft_cfg_is_candidate(struct raft *raft, bool is_candidate)
>> +raft_cfg_is_candidate(struct raft *raft, bool is_candidate, bool demote)
> 5. I know it might lead to some code duplication, but probably
> better move that to other functions. For example,
>
>     raft_cfg_is_temporary_candidate()
>
> or something like that. Otherwise it appears surprisingly hard
> to follow these 2 flags together. Although I might be wrong and
> it would look worse. Did you try?
>
> Or another option:
>
>     raft_cfg_is_candidate(box_raft(), true, false);
>     raft_cfg_is_candidate(box_raft(), false, false);
>
> turns into
>
>     raft_start_candidate(box_raft())
>     raft_stop_candidate(box_raft())
>
> Also it would be good to have unit tests for the changes in raft.h
> and raft.c.

This variant sounds good. I'll implement it in a new commit.

Incremental diff:

==========================================

diff --git a/src/box/box.cc b/src/box/box.cc
index 37938df15..fcd812c09 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1157,8 +1157,7 @@ box_set_election_mode(void)
     if (mode == ELECTION_MODE_INVALID)
         return -1;
     box_election_mode = mode;
-    raft_cfg_is_candidate(box_raft(), mode == ELECTION_MODE_CANDIDATE,
-                  true);
+    raft_cfg_is_candidate(box_raft(), mode == ELECTION_MODE_CANDIDATE);
     raft_cfg_is_enabled(box_raft(), mode != ELECTION_MODE_OFF);
     return 0;
 }
@@ -1534,11 +1533,7 @@ box_clear_synchro_queue(bool try_wait)
              "manual elections");
         return -1;
     case ELECTION_MODE_MANUAL:
-        assert(box_raft()->state != RAFT_STATE_CANDIDATE);
-        if (box_raft()->state == RAFT_STATE_LEADER) {
-            diag_set(ClientError, ER_ALREADY_LEADER);
-            return -1;
-        }
+        assert(box_raft()->state == RAFT_STATE_FOLLOWER);
         run_elections = true;
         try_wait = false;
         break;
@@ -1569,14 +1564,19 @@ box_clear_synchro_queue(bool try_wait)
          * Make this instance a candidate and run until some leader, not
          * necessarily this instance, emerges.
          */
-        raft_cfg_is_candidate(box_raft(), true, false);
+        raft_start_candidate(box_raft());
         /*
          * Trigger new elections without waiting for an old leader to
          * disappear.
          */
         raft_new_term(box_raft());
         box_raft_wait_leader_found();
-        raft_cfg_is_candidate(box_raft(), false, false);
+        /*
+         * Do not reset raft mode if it was changed while running the
+         * elections.
+         */
+        if (box_election_mode == ELECTION_MODE_MANUAL)
+            raft_stop_candidate(box_raft(), false);
         if (!box_raft()->is_enabled) {
             diag_set(ClientError, ER_RAFT_DISABLED);
             in_clear_synchro_queue = false;
diff --git a/src/box/errcode.h b/src/box/errcode.h
index df36085db..d93820e96 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -277,7 +277,6 @@ struct errcode_record {
     /*222 */_(ER_QUORUM_WAIT,        "Couldn't wait for quorum %d: %s") \
     /*223 */_(ER_INTERFERING_PROMOTE,    "Instance with replica id %u was promoted first") \
     /*224 */_(ER_RAFT_DISABLED,        "Elections were turned off while running box.ctl.promote()")\
-    /*225 */_(ER_ALREADY_LEADER,        "Can't promote an existing leader")\
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/raft.c b/src/box/raft.c
index c7dc79f9b..425353207 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -337,11 +337,11 @@ fail:
 }
 
 static int
-box_raft_wait_leader_found_trig(struct trigger *trig, void *event)
+box_raft_wait_leader_found_f(struct trigger *trig, void *event)
 {
-    struct raft *raft = (struct raft *)event;
+    struct raft *raft = event;
     assert(raft == box_raft());
-    struct fiber *waiter = (struct fiber *)trig->data;
+    struct fiber *waiter = trig->data;
     if (raft->leader != REPLICA_ID_NIL || !raft->is_enabled)
         fiber_wakeup(waiter);
     return 0;
@@ -351,7 +351,7 @@ void
 box_raft_wait_leader_found(void)
 {
     struct trigger trig;
-    trigger_create(&trig, box_raft_wait_leader_found_trig, fiber(), NULL);
+    trigger_create(&trig, box_raft_wait_leader_found_f, fiber(), NULL);
     raft_on_update(box_raft(), &trig);
     fiber_yield();
     assert(box_raft()->leader != REPLICA_ID_NIL || !box_raft()->is_enabled);
diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
index d557c907b..8deb06eb5 100644
--- a/src/lib/raft/raft.c
+++ b/src/lib/raft/raft.c
@@ -846,7 +846,7 @@ raft_cfg_is_enabled(struct raft *raft, bool is_enabled)
 }
 
 void
-raft_cfg_is_candidate(struct raft *raft, bool is_candidate, bool demote)
+raft_cfg_is_candidate(struct raft *raft, bool is_candidate)
 {
     raft->is_cfg_candidate = is_candidate;
     is_candidate = is_candidate && raft->is_enabled;
diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h
index b140ff3ba..69dec63c6 100644
--- a/src/lib/raft/raft.h
+++ b/src/lib/raft/raft.h
@@ -325,7 +325,7 @@ raft_cfg_is_enabled(struct raft *raft, bool is_enabled);
  * the node still can vote, when Raft is enabled.
  */
 void
-raft_cfg_is_candidate(struct raft *raft, bool is_candidate, bool demote);
+raft_cfg_is_candidate(struct raft *raft, bool is_candidate);
 
 /**
  * Make the instance a candidate.
diff --git a/test/box/error.result b/test/box/error.result
index dad6a21d3..cc8cbaaa9 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -443,7 +443,6 @@ t;
  |   222: box.error.QUORUM_WAIT
  |   223: box.error.INTERFERING_PROMOTE
  |   224: box.error.RAFT_DISABLED
- |   225: box.error.ALREADY_LEADER
  | ...
 
 test_run:cmd("setopt delimiter ''");
diff --git a/test/unit/raft_test_utils.c b/test/unit/raft_test_utils.c
index a10ccae6a..b8735f373 100644
--- a/test/unit/raft_test_utils.c
+++ b/test/unit/raft_test_utils.c
@@ -360,7 +360,7 @@ raft_node_start(struct raft_node *node)
         raft_process_recovery(&node->raft, &node->journal.rows[i]);
 
     raft_cfg_is_enabled(&node->raft, node->cfg_is_enabled);
-    raft_cfg_is_candidate(&node->raft, node->cfg_is_candidate, true);
+    raft_cfg_is_candidate(&node->raft, node->cfg_is_candidate);
     raft_cfg_election_timeout(&node->raft, node->cfg_election_timeout);
     raft_cfg_election_quorum(&node->raft, node->cfg_election_quorum);
     raft_cfg_death_timeout(&node->raft, node->cfg_death_timeout);
@@ -402,7 +402,7 @@ raft_node_cfg_is_candidate(struct raft_node *node, bool value)
 {
     node->cfg_is_candidate = value;
     if (raft_node_is_started(node)) {
-        raft_cfg_is_candidate(&node->raft, value, true);
+        raft_cfg_is_candidate(&node->raft, value);
         raft_run_async_work();
     }
 }

-- 
Serge Petrenko
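
As a standalone illustration of comment 4 (dropping the casts in the trigger callback), the sketch below shows that C converts `void *` to another object pointer type implicitly. It assumes a hypothetical miniature trigger: `struct mini_trigger`, `struct mini_raft`, `struct mini_waiter`, `fire()` and `trigger_demo()` are invented for this example and are not Tarantool APIs. Bumping a counter stands in for waking up a fiber.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a trigger: a callback plus opaque user data. */
struct mini_trigger {
	int (*run)(struct mini_trigger *trig, void *event);
	void *data;
};

/* Hypothetical raft state: 0 means "no leader known". */
struct mini_raft {
	unsigned leader;
};

/* Hypothetical waiter: counts wakeups instead of resuming a fiber. */
struct mini_waiter {
	int wakeups;
};

static int
wait_leader_found_f(struct mini_trigger *trig, void *event)
{
	/* No casts needed: void * converts implicitly in C (not in C++). */
	struct mini_raft *raft = event;
	struct mini_waiter *waiter = trig->data;
	if (raft->leader != 0)
		waiter->wakeups++;
	return 0;
}

/* Run the trigger callback with the given event. */
static int
fire(struct mini_trigger *trig, void *event)
{
	return trig->run(trig, event);
}

/* Returns the number of wakeups observed across two updates. */
static int
trigger_demo(void)
{
	struct mini_raft raft = {0};
	struct mini_waiter waiter = {0};
	struct mini_trigger trig = {wait_leader_found_f, &waiter};

	fire(&trig, &raft);       /* no leader yet: waiter stays asleep */
	assert(waiter.wakeups == 0);
	raft.leader = 2;
	fire(&trig, &raft);       /* leader appeared: waiter is woken */
	assert(waiter.wakeups == 1);
	return waiter.wakeups;
}
```

This compiles as C only. A C++ translation unit would still need the explicit casts.
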