From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tarantool-patches-bounces@dev.tarantool.org>
From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
Reply-To: Serge Petrenko
To: Vladislav Shpilevoy, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Date: Tue, 3 Aug 2021 10:56:47 +0300
Message-ID: <03c81edf-709b-6320-b3e3-a2331b2b08fe@tarantool.org>
References: <3b4f7752-f68b-afce-c83a-355bcfbc28cc@tarantool.org>
 <9ae5765f-ffcb-af66-469b-b0a99d71c331@tarantool.org>
 <036d7d32-54bc-3239-9291-d713b062f324@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`
List-Id: Tarantool development patches
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
"Tarantool-patches" 01.08.2021 19:19, Vladislav Shpilevoy пишет: > Hi! Thanks for the patch! > > I think we are on the finish line here, see 4 small > comments below. After them and after you fix the failing > vinyl test, the patchset will probably be finished! As discussed, fixed vinyl/errinj test by making iproto_write_error() use blocking writes in debug build. Here's the relevant commit (indentation is broken in the letter) : ============================================= commit c6283769a887429f81d0005b2354691ace22514d Author: Serge Petrenko Date:   Mon Aug 2 19:33:36 2021 +0300     iproto: make iproto_write_error() blocking in debug     iproto_write_error() used to be blocking until the commit     4dac37a66b0674c345e036faa9984c9ae0d70382 (iproto: remove     iproto_write_error_blocking())     Actually, it should block until the error is written to socket, because     some tests (vinyl/errinj.test.lua, for example) rely on that.     Do not make iproto_write_error() blocking in release builds for safety     reasons, as stated in commit above. But make it blocking in debug for     testing sake.     
    Part-of #6034

diff --git a/src/box/iproto.cc b/src/box/iproto.cc
index 3ed641eea..5cc69b77f 100644
--- a/src/box/iproto.cc
+++ b/src/box/iproto.cc
@@ -32,6 +32,7 @@
 #include
 #include
 #include
+#include <fcntl.h>
 #include
 #include
@@ -557,6 +558,20 @@ struct iproto_connection
 	struct iproto_thread *iproto_thread;
 };
 
+#ifdef NDEBUG
+#define iproto_write_error(fd, e, schema_version, sync)			\
+	iproto_do_write_error(fd, e, schema_version, sync);
+#else
+#define iproto_write_error(fd, e, schema_version, sync) do {		\
+	int flags = fcntl(fd, F_GETFL, 0);				\
+	if (flags >= 0)							\
+		fcntl(fd, F_SETFL, flags & (~O_NONBLOCK));		\
+	iproto_do_write_error(fd, e, schema_version, sync);		\
+	if (flags >= 0)							\
+		fcntl(fd, F_SETFL, flags);				\
+} while (0);
+#endif
+
 /**
  * Return true if we have not enough spare messages
  * in the message pool.
diff --git a/src/box/xrow.c b/src/box/xrow.c
index 116be01ce..5c5da4808 100644
--- a/src/box/xrow.c
+++ b/src/box/xrow.c
@@ -543,8 +543,8 @@ iproto_reply_error(struct obuf *out, const struct error *e, uint64_t sync,
 }
 
 void
-iproto_write_error(int fd, const struct error *e, uint32_t schema_version,
-		   uint64_t sync)
+iproto_do_write_error(int fd, const struct error *e, uint32_t schema_version,
+		      uint64_t sync)
 {
 	bool is_error = false;
 	struct mpstream stream;
diff --git a/src/box/xrow.h b/src/box/xrow.h
index 3867e0c0e..30d6b8639 100644
--- a/src/box/xrow.h
+++ b/src/box/xrow.h
@@ -719,8 +719,8 @@ iproto_reply_chunk(struct obuf *buf, struct obuf_svp *svp, uint64_t sync,
 
 /** Write error directly to a socket. */
 void
-iproto_write_error(int fd, const struct error *e, uint32_t schema_version,
-		   uint64_t sync);
+iproto_do_write_error(int fd, const struct error *e, uint32_t schema_version,
+		      uint64_t sync);
 
 enum {
 	/* Maximal length of protocol name in handshake */
=============================================

I've also noticed some failures in box-py/iproto.test.py, so I've made the
following changes in these 3 commits:

d61435f4a replication: send current Raft term in join response
f0e4d1b73 replication: send latest effective promote in initial join
2d2ba4d35 replication: add META stage to JOIN

=============================================

diff --git a/src/box/box.cc b/src/box/box.cc
index f3da02231..e7b8ddda5 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -2724,7 +2724,7 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
 
 	/* Decode JOIN request */
 	struct tt_uuid instance_uuid = uuid_nil;
-	uint32_t replica_version_id;
+	uint32_t replica_version_id = 0;
 	xrow_decode_join_xc(header, &instance_uuid, &replica_version_id);
 
 	/* Check that bootstrap has been finished */
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 805b5e7ff..2947468ba 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -444,20 +444,20 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
 	 */
 	if (replica_version_id > 0) {
 		/* Mark the beginning of the metadata stream. */
-		row.type = IPROTO_JOIN_META;
-		coio_write_xrow(&relay->io, &row);
+		xrow_encode_type(&row, IPROTO_JOIN_META);
+		xstream_write(&relay->stream, &row);
 
 		xrow_encode_raft(&row, &fiber()->gc, &raft_req);
-		coio_write_xrow(&relay->io, &row);
+		xstream_write(&relay->stream, &row);
 
 		char body[XROW_SYNCHRO_BODY_LEN_MAX];
 		xrow_encode_synchro(&row, body, &req);
 		row.replica_id = req.replica_id;
-		coio_write_xrow(&relay->io, &row);
+		xstream_write(&relay->stream, &row);
 
 		/* Mark the end of the metadata stream. */
-		row.type = IPROTO_JOIN_SNAPSHOT;
-		coio_write_xrow(&relay->io, &row);
+		xrow_encode_type(&row, IPROTO_JOIN_SNAPSHOT);
+		xstream_write(&relay->stream, &row);
 	}
 
 	/* Send read view to the replica. */
diff --git a/src/box/xrow.c b/src/box/xrow.c
index 5c5da4808..8ab8b2768 100644
--- a/src/box/xrow.c
+++ b/src/box/xrow.c
@@ -1730,6 +1730,13 @@ xrow_encode_timestamp(struct xrow_header *row, uint32_t replica_id, double tm)
 	row->tm = tm;
 }
 
+void
+xrow_encode_type(struct xrow_header *row, uint16_t type)
+{
+	memset(row, 0, sizeof(*row));
+	row->type = type;
+}
+
 void
 greeting_encode(char *greetingbuf, uint32_t version_id,
 		const struct tt_uuid *uuid, const char *salt, uint32_t salt_len)
diff --git a/src/box/xrow.h b/src/box/xrow.h
index 30d6b8639..c6e8ed0fd 100644
--- a/src/box/xrow.h
+++ b/src/box/xrow.h
@@ -570,6 +570,14 @@ xrow_decode_subscribe_response(struct xrow_header *row,
 void
 xrow_encode_timestamp(struct xrow_header *row, uint32_t replica_id, double tm);
 
+/**
+ * Encode any bodyless message.
+ * @param row[out] Row to encode into.
+ * @param type Message type.
+ */
+void
+xrow_encode_type(struct xrow_header *row, uint16_t type);
+
 /**
  * Fast encode xrow header using the specified header fields.
  * It is faster than the xrow_header_encode, because uses
=============================================

> There are only 2 things which bother me. They are not bugs
> and we can work on them in the next quarter.
>
> 1) Assume you have election_mode = 'manual'. And you are a
> leader. You call box.ctl.demote() and stop being a leader.
> But the limbo is still yours. If now you switch election_mode to
> 'off', you need to call box.ctl.demote() again to free the
> limbo.

This is an inconvenience rather than a bug, IMO. I couldn't find a
solution right away.

> 2) In the last commit I see we make too much actions to ensure
> we are a writable leader. Perhaps in the future we should not
> report box.info.election.state == 'leader' until promote is
> written and should not say the instance is writable.

Yes, maybe. Need to think about that.

> I don't have a plan for either of these ideas yet.
>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 41f665e38..a34e05e94 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1679,20 +1679,44 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
>>  	assert(txn_limbo_is_empty(&txn_limbo));
>>  }
>>
>> +/* A guard to block multiple simultaneous box_promote() invocations. */
>
> 1. For out-of-function comments we usually use /** as an opening.

Fixed, sorry.

>> +static bool in_box_promote = false;
>
> 2. Could you please use the `is_` prefix here? `is_in_box_promote`.

Ok.

>> +
>> +int
>> +box_promote_qsync(void)
>> +{
>> +	assert(!in_box_promote);
>> +	assert(is_box_configured);
>> +	struct raft *raft = box_raft();
>> +	in_box_promote = true;
>> +	auto promote_guard = make_scoped_guard([&] {
>> +		in_box_promote = false;
>> +	});
>> +	if (raft->state != RAFT_STATE_LEADER)
>> +		return 0;
>
> 3. This condition is not reachable, according to what I see in
> box_raft_worker_f().

Indeed. Changed, please see the diff below.
>> +	assert(box_election_mode == ELECTION_MODE_MANUAL ||
>> +	       box_election_mode == ELECTION_MODE_CANDIDATE);
>> +	if (txn_limbo_replica_term(&txn_limbo, instance_id) == raft->term)
>> +		return 0;
>> +	int64_t wait_lsn = box_wait_limbo_acked(TIMEOUT_INFINITY);
>> +	if (wait_lsn < 0)
>> +		return -1;
>
> 4. Perhaps this better be panic? Because infinity timeout should
> not ever be reached. And then the function becomes void, because
> would not be able to fail anymore.

No. Actually, there are still multiple reasons for box_wait_limbo_acked()
to fail: the quorum may be reached, but then increased; the fiber might be
cancelled; new synchronous transactions might appear while waiting.
I think box_promote_qsync() should be retried in the raft worker, like it
was done for box_promote().

Also there's box_check_promote_term_intact() inside box_wait_limbo_acked(),
which could cause box_promote_qsync() to fail, but we shouldn't panic in
this case.

Here's the diff for this commit (box: split manual and automatic promotion):

============================================

diff --git a/src/box/box.cc b/src/box/box.cc
index dad5e4557..53ed64e51 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1679,23 +1679,20 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
 	assert(txn_limbo_is_empty(&txn_limbo));
 }
 
-/* A guard to block multiple simultaneous box_promote() invocations. */
-static bool in_box_promote = false;
+/** A guard to block multiple simultaneous box_promote() invocations. */
+static bool is_in_box_promote = false;
 
 int
 box_promote_qsync(void)
 {
-	assert(!in_box_promote);
+	assert(!is_in_box_promote);
 	assert(is_box_configured);
 	struct raft *raft = box_raft();
-	in_box_promote = true;
+	is_in_box_promote = true;
 	auto promote_guard = make_scoped_guard([&] {
-		in_box_promote = false;
+		is_in_box_promote = false;
 	});
-	if (raft->state != RAFT_STATE_LEADER)
-		return 0;
-	assert(box_election_mode == ELECTION_MODE_MANUAL ||
-	       box_election_mode == ELECTION_MODE_CANDIDATE);
+	assert(raft->state == RAFT_STATE_LEADER);
 	if (txn_limbo_replica_term(&txn_limbo, instance_id) == raft->term)
 		return 0;
 	int64_t wait_lsn = box_wait_limbo_acked(TIMEOUT_INFINITY);
@@ -1708,15 +1705,15 @@ box_promote_qsync(void)
 int
 box_promote(void)
 {
-	if (in_box_promote) {
+	if (is_in_box_promote) {
 		diag_set(ClientError, ER_UNSUPPORTED, "box.ctl.promote",
 			 "simultaneous invocations");
 		return -1;
 	}
 	struct raft *raft = box_raft();
-	in_box_promote = true;
+	is_in_box_promote = true;
 	auto promote_guard = make_scoped_guard([&] {
-		in_box_promote = false;
+		is_in_box_promote = false;
 	});
 	/*
 	 * Do nothing when box isn't configured and when PROMOTE was already
@@ -1742,7 +1739,7 @@ box_promote(void)
 	case ELECTION_MODE_MANUAL:
 		if (raft->state == RAFT_STATE_LEADER)
 			return 0;
-		in_box_promote = false;
+		is_in_box_promote = false;
 		return box_run_elections();
 	case ELECTION_MODE_CANDIDATE:
 		/*
diff --git a/src/box/raft.c b/src/box/raft.c
index f8d13aa32..bc69f7f1b 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -83,6 +83,24 @@ box_raft_request_to_msg(const struct raft_request *req, struct raft_msg *msg)
 	};
 }
 
+static void
+box_raft_update_synchro_queue(struct raft *raft)
+{
+	assert(raft == box_raft());
+	if (raft->state != RAFT_STATE_LEADER)
+		return;
+	int rc = 0;
+	uint32_t errcode = 0;
+	do {
+		rc = box_promote_qsync();
+		if (rc != 0) {
+			struct error *err = diag_last_error(diag_get());
+			errcode = box_error_code(err);
+			diag_log();
+		}
+	} while (rc != 0 && errcode == ER_QUORUM_WAIT && !fiber_is_cancelled());
+}
+
 static int
 box_raft_worker_f(va_list args)
 {
@@ -93,9 +111,7 @@ box_raft_worker_f(va_list args)
 		box_raft_has_work = false;
 
 		raft_process_async(raft);
-		if (raft->state == RAFT_STATE_LEADER &&
-		    box_promote_qsync() != 0)
-			diag_log();
+		box_raft_update_synchro_queue(raft);
 
 		if (!box_raft_has_work)
 			fiber_yield();
============================================

>> +	box_issue_promote(txn_limbo.owner_id, wait_lsn);
>> +	return 0;
>> +}

>> commit 7980cb3096f2616a2851f8d97db8091f0d82879c
>> Author: Serge Petrenko
>> Date: Mon Jun 28 11:52:44 2021 +0300
>>
>>     box: allow calling promote on a candidate
>>
>>     Part of #6034
>>
>> diff --git a/test/replication/gh-6034-election-candidate-promote.result b/test/replication/gh-6034-election-candidate-promote.result
>> new file mode 100644
>> index 000000000..2b4bc0213
>> --- /dev/null
>> +++ b/test/replication/gh-6034-election-candidate-promote.result
>
> 5. The test name format `gh-####-...` is obligatory only for bug tests.
> This patch seems to be adding a feature.
Ok:

===============================

diff --git a/test/replication/gh-6034-election-candidate-promote.result b/test/replication/election-candidate-promote.result
similarity index 100%
rename from test/replication/gh-6034-election-candidate-promote.result
rename to test/replication/election-candidate-promote.result
diff --git a/test/replication/gh-6034-election-candidate-promote.test.lua b/test/replication/election-candidate-promote.test.lua
similarity index 100%
rename from test/replication/gh-6034-election-candidate-promote.test.lua
rename to test/replication/election-candidate-promote.test.lua
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 1ec1a94eb..1a3c991f0 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -57,8 +57,8 @@
     "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
     "gh-6094-rs-uuid-mismatch.test.lua": {},
     "gh-6127-election-join-new.test.lua": {},
-    "gh-6034-election-candidate-promote.test.lua": {},
     "gh-6035-applier-filter.test.lua": {},
+    "election-candidate-promote.test.lua": {},
     "*": {
         "memtx": {"engine": "memtx"},
         "vinyl": {"engine": "vinyl"}

===============================

-- 
Serge Petrenko