From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id CADA86EC55; Fri, 23 Jul 2021 10:44:41 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org CADA86EC55 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1627026281; bh=ucF9/l+RSjH5xYmWqVhb41Kes1z94x5V6g3lb/sc7+Q=; h=To:Cc:References:Date:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=XimClFh5U7VdXvKFaYTIczBaYjQkg32kD/W/tdZt8ccxMuz0gu/b+gcMyVl/eM+3L ZguMHEr3bZeKaLmILENjSGqZgjaKKYugG0WjZuH3ZB7M+sOlt51wrEdV2KZkj65dw+ FkdJWJOMRy+77MrkToBZ/tFdHIM9BSE3fqN0A5Zg= Received: from smtp42.i.mail.ru (smtp42.i.mail.ru [94.100.177.102]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 8E0E96EC55 for ; Fri, 23 Jul 2021 10:44:39 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 8E0E96EC55 Received: by smtp42.i.mail.ru with esmtpa (envelope-from ) id 1m6prK-0005eO-Kt; Fri, 23 Jul 2021 10:44:39 +0300 To: Vladislav Shpilevoy , gorcunov@gmail.com Cc: tarantool-patches@dev.tarantool.org References: <3b4f7752-f68b-afce-c83a-355bcfbc28cc@tarantool.org> <9ae5765f-ffcb-af66-469b-b0a99d71c331@tarantool.org> Message-ID: <036d7d32-54bc-3239-9291-d713b062f324@tarantool.org> Date: Fri, 23 Jul 2021 09:44:36 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.12.0 MIME-Version: 1.0 In-Reply-To: <9ae5765f-ffcb-af66-469b-b0a99d71c331@tarantool.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-GB X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD941C43E597735A9C3FDAB68B812060C77E621B90589399EB5182A05F538085040ECC031A8C6F7CE3E323B2A10641980FC1E138BB696E8EA13CF2858984DD86216 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE707E3B922BD41848CEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F790063727BBC20C3D5F36038638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D85689973341884EE95D8DBD1B82DBBC31117882F4460429724CE54428C33FAD305F5C1EE8F4F765FCF1175FABE1C0F9B6A471835C12D1D9774AD6D5ED66289B52BA9C0B312567BB23117882F446042972877693876707352033AC447995A7AD18618001F51B5FD3F9D2E47CDBA5A96583BA9C0B312567BB2376E601842F6C81A19E625A9149C048EE437C869540D2AB0F1C9461EB66F04EBFD8FC6C240DEA7642DBF02ECDB25306B2B78CF848AE20165D0A6AB1C7CE11FEE317119E5299B287EE9735652A29929C6CC4224003CC836476EA7A3FFF5B025636E2021AF6380DFAD1A18204E546F3947CB11811A4A51E3B096D1867E19FE1407959CC434672EE6371089D37D7C0E48F6C8AA50765F7900637FB177F6A8366F17BEFF80C71ABB335746BA297DBC24807EABDAD6C7F3747799A X-B7AD71C0: AC4F5C86D027EB782CDD5689AFBDA7A213B5FB47DCBC3458834459D11680B505242D48C77D4EE1AB46FD04A998461368 X-C1DE0DAB: 0D63561A33F958A53414E27AF55A9EC4D5FA868A9763BC10B628B167E3E756D2D59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA7501A9DF589746230F410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34AE88D5ADEBE7C983165493118417C5595CE50C914B328926D66350B2AF9F437A397804D0BC7D5F5D1D7E09C32AA3244CEFA66A4AC9A3E613553BD9571467F536BBA718C7E6A9E042FACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojtKXY6NTIXk0wF5C9DMkLbg== X-Mailru-Sender: 9482C2B233321BD2827D960CA7C4D2E2CAE17C45F23809F32FF0E7B6C79CD36A219434EAC0284D466BB2E709EA627F343C7DDD459B58856F0E45BC603594F5A135B915D4279FF0579437F6177E88F7363CDA0F3B3F5B9367 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote` X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Sergey Petrenko via Tarantool-patches Reply-To: Sergey Petrenko Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" 22.07.2021 01:28, Vladislav Shpilevoy пишет: > Thanks for the fixes! > >>> 2. On top of this commit and on top of the branch too I tried to >>> promote a candidate and got a strange error in the logs, although >>> the promotion was successful: >>> >>> -- >>> -- Instance 1 >>> -- >>> >>> -- Step 1 >>> box.cfg{ >>>      listen = 3313, >>>      election_mode = 'candidate', >>>      replication_synchro_quorum = 2, >>>      replication = {'localhost:3314', 'localhost:3313'} >>> } >>> box.schema.user.grant('guest', 'super') >>> >>> >>> -- >>> -- Instance 2 >>> -- >>> >>> -- Step 2 >>> box.cfg{ >>>      listen = 3314, >>>      election_mode = 'voter', >>>      replication_synchro_quorum = 2, >>>      replication = {'localhost:3314', 'localhost:3313'}, >>>      read_only = true, >>> } >>> >>> -- Step 3 >>> box.cfg{read_only = false, election_mode = 'candidate'} >>> >>> -- Step 4 >>> box.ctl.promote() >>> >>> main/112/raft_worker box.cc:1538 E> ER_UNSUPPORTED: box.ctl.promote does not support simultaneous invocations >>> --- >>> ... >> That's because once a candidate becomes the leader, it tries to issue `box.ctl.promote()`, and fails, >> since we're already in `box.ctl.promote()` call. >> I'm not sure how to handle that properly. This doesn't break anything though. > Still, the error message looks really not good. There is an option - > make box_promote() for candidate node just call raft_promote() and set > box_in_promote = false. Then wait for the term outcome. Will it work? > You would need to rebase your patchset on master branch then. Hmm, I don't like that. This will work, but it will complicate things in box_promote even more. Now one has to keep in mind that a manual promote call has 2 phases, one when it is issued manually and the other when it's called automatically when (and if) the instance becomes the leader. I think it'd be better to check whether we should run box_promote() right in box_raft_update_synchro_queue(). Check out the diff (mostly for commit "box: allow calling promote on a candidate"): ===================================== diff --git a/src/box/box.cc b/src/box/box.cc index a30e4f78d..5cdca4bd4 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -1692,20 +1692,24 @@ box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn)  }  /* A guard to block multiple simultaneous promote()/demote() invocations. */ -static bool box_in_promote = false; +static bool in_box_promote = false; + +bool box_in_promote(void) { +    return in_box_promote; +}  int  box_promote(void)  { -    if (box_in_promote) { +    if (in_box_promote) {          diag_set(ClientError, ER_UNSUPPORTED, "box.ctl.promote",               "simultaneous invocations");          return -1;      }      struct raft *raft = box_raft(); -    box_in_promote = true; +    in_box_promote = true;      auto promote_guard = make_scoped_guard([&] { -        box_in_promote = false; +        in_box_promote = false;      });      if (!is_box_configured) @@ -1757,14 +1761,14 @@ box_promote(void)  int  box_demote(void)  { -    if (box_in_promote) { +    if (in_box_promote) {          diag_set(ClientError, ER_UNSUPPORTED, "box.ctl.demote",               "simultaneous invocations");          return -1;      } -    box_in_promote = true; +    in_box_promote = true;      auto promote_guard = make_scoped_guard([&] { -        box_in_promote = false; +        in_box_promote = false;      });      if (!is_box_configured) diff --git a/src/box/box.h b/src/box/box.h index aaf20d9dd..344ed90f2 100644 --- a/src/box/box.h +++ b/src/box/box.h @@ -273,6 +273,9 @@ extern "C" {  typedef struct tuple box_tuple_t; +bool +box_in_promote(void); +  int  box_promote(void); diff --git a/src/box/raft.c b/src/box/raft.c index 35c471f58..5e496c2e4 100644 --- a/src/box/raft.c +++ b/src/box/raft.c @@ -88,11 +88,11 @@ box_raft_update_synchro_queue(struct raft *raft)  {      assert(raft == box_raft());      /* -     * In case these are manual elections, we are already in the middle of a -     * `promote` call. No need to call it once again. +     * In case the elections were triggered manually, we are already in +     * the middle of a `promote` call. No need to call it once again.       */      if (raft->state == RAFT_STATE_LEADER && -        box_election_mode != ELECTION_MODE_MANUAL) { +        !box_in_promote()) {          int rc = 0;          uint32_t errcode = 0;          do { ===================================== > > It might be a little easier to do if you apply the diff below. (Warning: > I didn't test it.) The motivation is that one of the main reasons why I > wanted box_promote() simplified was because of the strange meaning of some > flags. In particular, try_wait flag somewhy triggered elections before the > waiting which is super not obvious why. How does 'wait' come to 'elections'? > > In the diff I tried to remove these flags entirely. And now you have a > single place in the code of box_promote(), where ELECTION_MODE_CANDIDATE > stuff is handled. Here you could try the proposal I gave above. Thanks for the help! Your diff looks good, I've reworked my patches to comply. > ==================== > diff --git a/src/box/box.cc b/src/box/box.cc > index f68fffcab..e7765b657 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -1698,34 +1698,6 @@ box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn) > assert(txn_limbo_is_empty(&txn_limbo)); > } > > -/** > - * Check whether this instance may run a promote() and set promote parameters > - * according to its election mode. > - */ > -static int > -box_check_promote_election_mode(bool *try_wait, bool *run_elections) > -{ > - switch (box_election_mode) { > - case ELECTION_MODE_OFF: > - if (try_wait != NULL) > - *try_wait = true; > - break; > - case ELECTION_MODE_VOTER: > - assert(box_raft()->state == RAFT_STATE_FOLLOWER); > - diag_set(ClientError, ER_UNSUPPORTED, "election_mode='voter'", > - "manual elections"); > - return -1; > - case ELECTION_MODE_MANUAL: > - case ELECTION_MODE_CANDIDATE: > - if (run_elections != NULL) > - *run_elections = box_raft()->state != RAFT_STATE_LEADER; > - break; > - default: > - unreachable(); > - } > - return 0; > -} > - > /* A guard to block multiple simultaneous promote()/demote() invocations. */ > static bool box_in_promote = false; > > @@ -1757,27 +1729,35 @@ box_promote(void) > if (is_leader) > return 0; > > - bool run_elections = false; > - bool try_wait = false; > - > - if (box_check_promote_election_mode(&try_wait, &run_elections) < 0) > - return -1; > - > - int64_t wait_lsn = -1; > - > - if (run_elections && box_run_elections() < 0) > - return -1; > - if (try_wait) { > - if (box_try_wait_confirm(2 * replication_synchro_timeout) < 0) > + switch (box_election_mode) { > + case ELECTION_MODE_OFF: > + if (box_try_wait_confirm(2 * replication_synchro_timeout) != 0) > + return -1; > + if (box_trigger_elections() != 0) > return -1; > - if (box_trigger_elections() < 0) > + break; > + case ELECTION_MODE_VOTER: > + assert(box_raft()->state == RAFT_STATE_FOLLOWER); > + diag_set(ClientError, ER_UNSUPPORTED, "election_mode='voter'", > + "manual elections"); > + return -1; > + case ELECTION_MODE_MANUAL: > + case ELECTION_MODE_CANDIDATE: > + if (box_raft()->state != RAFT_STATE_LEADER && > + box_run_elections() != 0) > return -1; > + break; > + default: > + unreachable(); > } > - if ((wait_lsn = box_wait_limbo_acked()) < 0) > + if (box_check_promote_election_mode(&try_wait, &run_elections) < 0) > return -1; > > - box_issue_promote(txn_limbo.owner_id, wait_lsn); > + int64_t wait_lsn = box_wait_limbo_acked(); > + if (wait_lsn < 0) > + return -1; > > + box_issue_promote(txn_limbo.owner_id, wait_lsn); > return 0; > } > > @@ -1804,29 +1784,16 @@ box_demote(void) > is_leader = is_leader && box_raft()->state == RAFT_STATE_LEADER; > if (!is_leader) > return 0; > - > - bool try_wait = false; > - > - if (box_check_promote_election_mode(&try_wait, NULL) < 0) > - return -1; > - > - int64_t wait_lsn = -1; > - > if (box_trigger_elections() < 0) > return -1; > - > if (box_election_mode != ELECTION_MODE_OFF) > return 0; > - > - if (try_wait && > - box_try_wait_confirm(2 * replication_synchro_timeout) < 0) > + if (box_try_wait_confirm(2 * replication_synchro_timeout) != 0) > return -1; > - > - if ((wait_lsn = box_wait_limbo_acked()) < 0) > + int64_t wait_lsn = box_wait_limbo_acked(); > + if (wait_lsn < 0) > return -1; > - > box_issue_demote(txn_limbo.owner_id, wait_lsn); > - > return 0; > } >