From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id A10F26EC40; Sun, 26 Sep 2021 17:29:17 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org A10F26EC40 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1632666557; bh=oOYfb/OGfe9qZXYEhMKYGlfPX8buN9s2RzU4jKFx6vs=; h=To:References:Date:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=chHAVlczvNEtZRuVhf65EvyMH+o1nC+4ud4lKL8M+u7WXLiDxVEBcO0O0s6UKNbHB Y3m9bkXnETLlYrowSaJgInN9bkKsqXmo+8XRX/YX9xZHyvv7OneHxCVeAgUoNIFSAL B4NoM2XHWBm/FLVteFMWaxlV6bF8fbcfLeVo4qVM= Received: from smtpng1.i.mail.ru (smtpng1.i.mail.ru [94.100.181.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 6CD786EC40 for ; Sun, 26 Sep 2021 17:29:15 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 6CD786EC40 Received: by smtpng1.m.smailru.net with esmtpa (envelope-from ) id 1mUV9W-0003ju-H7; Sun, 26 Sep 2021 17:29:15 +0300 To: Cyrill Gorcunov , tml References: <20210922130535.79479-1-gorcunov@gmail.com> <20210922130535.79479-3-gorcunov@gmail.com> Message-ID: <8b6bec31-0c60-3486-ba72-b1a12d2a1144@tarantool.org> Date: Sun, 26 Sep 2021 16:29:13 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 MIME-Version: 1.0 In-Reply-To: <20210922130535.79479-3-gorcunov@gmail.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-4EC0790: 10 X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD96A58C36AA2E996498E412B95EB25F30472F9DA0B5FF28EB2182A05F538085040DD0AD0F0C5873D5B1B224FD0220D6EF2DA7280DD324EEA14B933FCC41D9FB6AB X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE740ED6E8A51A4BC6EEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637991D0D3E51C637188F08D7030A58E5AD1A62830130A00468AEEEE3FBA3A834EE7353EFBB5533756635616092EE2E9A3BBAE8182AF9243BF732BCB294B4E93B06A471835C12D1D9774AD6D5ED66289B5278DA827A17800CE70F3DDF2BBF19B93A9FA2833FD35BB23D2EF20D2F80756B5F868A13BD56FB6657A471835C12D1D977725E5C173C3A84C3BEC81E4AEBD6D2BF117882F4460429728AD0CFFFB425014E868A13BD56FB6657E2021AF6380DFAD1A18204E546F3947CB11811A4A51E3B096D1867E19FE1407959CC434672EE6371089D37D7C0E48F6C8AA50765F7900637B8F435DEDE9E76EBEFF80C71ABB335746BA297DBC24807EABDAD6C7F3747799A X-B7AD71C0: AC4F5C86D027EB782CDD5689AFBDA7A213B5FB47DCBC3458834459D11680B5052BA8149F47818D4CD73FC097E0059837 X-C1DE0DAB: 0D63561A33F958A5D4C185D2A95FE5D6ED8AFF44B8050626BFC43F2AF79F862CD59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA75BFC02AB3DF06BA5A410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D341776B9FDE05FBA7A6B1C596AB960297BACB953C3484D9713398A91A365CD0803F7BF4DF92B4C1B661D7E09C32AA3244CC6ADA7CBDD140AC6CE7718E6882C0070F94338140B71B8EEFACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojP2ll4YlUPyaQbobASYheyg== X-Mailru-Sender: 689FA8AB762F7393C37E3C1AEC41BA5D150DF9BA184D05C2872C79C6EB3E7DDB3841015FED1DE5223CC9A89AB576DD93FB559BB5D741EB963CF37A108A312F5C27E8A8C3839CE0E267EA787935ED9F1B X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH v17 2/5] qsync: order access to the limbo terms X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Vladislav Shpilevoy via Tarantool-patches Reply-To: Vladislav Shpilevoy Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Hi! Thanks for the patch! I consider this series as an independent patchset which fixes the ordering, not split-brain. Like you said. But now it is not independent. The main problem is that you just blended in a few changes from the split-brain patches. I point them out in my comments. See 4 comments below. > src/box/applier.cc | 16 +++++++--- > src/box/box.cc | 71 +++++++++++++++++++---------------------- > src/box/memtx_engine.cc | 3 +- > src/box/txn_limbo.c | 34 ++++++++++++++++++-- > src/box/txn_limbo.h | 49 +++++++++++++++++++++++++--- > 5 files changed, 121 insertions(+), 52 deletions(-) > > diff --git a/src/box/applier.cc b/src/box/applier.cc > index b981bd436..f0751b68a 100644 > --- a/src/box/applier.cc > +++ b/src/box/applier.cc > @@ -458,7 +458,8 @@ applier_wait_snapshot(struct applier *applier) > struct synchro_request req; > if (xrow_decode_synchro(&row, &req) != 0) > diag_raise(); > - txn_limbo_process(&txn_limbo, &req); > + if (txn_limbo_process(&txn_limbo, &req) != 0) > + diag_raise(); 1. How txn_limbo_process() can fail? You just fixed ordering. Essentially, added a few yields in some places. You didn't add any validation, any new errors in this patchset. Please, drop the empty 'return 0's from this patchset. They can't be and are not tested here anyway. Addition of a lock-unlock pair to txn_limbo_process didn't affect whether it can fail. It just blocks the execution sometimes for a while. > } else if (iproto_type_is_raft_request(row.type)) { > struct raft_request req; > if (xrow_decode_raft(&row, &req, NULL) != 0) > @@ -857,7 +858,7 @@ apply_synchro_row_cb(struct journal_entry *entry) > applier_rollback_by_wal_io(entry->res); > } else { > replica_txn_wal_write_cb(synchro_entry->rcb); > - txn_limbo_process(&txn_limbo, synchro_entry->req); > + txn_limbo_process_run(&txn_limbo, synchro_entry->req); 2. _run is usually used for infinite loops in fibers, or for handling a sequence of something. Like trigger_run(). Here you handle a single request. The only difference from process() is that the lock is taken. I propose to rename it to _do or _ex or _in_tx or something. > diff --git a/src/box/box.cc b/src/box/box.cc > index 7b11d56d6..19e67b186 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -1670,48 +1669,43 @@ box_wait_limbo_acked(double timeout) > return wait_lsn; > } > > -/** Write and process a PROMOTE request. */ > -static void > -box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn) > +/** Write and process PROMOTE or DEMOTE request. */ > +static int > +box_issue_synchro(uint16_t type, uint32_t prev_leader_id, int64_t promote_lsn) > { > struct raft *raft = box_raft(); > + > + assert(type == IPROTO_RAFT_PROMOTE || > + type == IPROTO_RAFT_DEMOTE); > assert(raft->volatile_term == raft->term); > assert(promote_lsn >= 0); > - txn_limbo_write_promote(&txn_limbo, promote_lsn, > - raft->term); > + > struct synchro_request req = { > - .type = IPROTO_RAFT_PROMOTE, > - .replica_id = prev_leader_id, > - .origin_id = instance_id, > - .lsn = promote_lsn, > - .term = raft->term, > + .type = type, > + .replica_id = prev_leader_id, > + .origin_id = instance_id, > + .lsn = promote_lsn, > + .term = raft->term, > }; > - txn_limbo_process(&txn_limbo, &req); > + > + if (txn_limbo_process_begin(&txn_limbo, &req) != 0) > + return -1; > + > + if (type == IPROTO_RAFT_PROMOTE) > + txn_limbo_write_promote(&txn_limbo, req.lsn, req.term); > + else > + txn_limbo_write_demote(&txn_limbo, req.lsn, req.term); > + > + txn_limbo_process_run(&txn_limbo, &req); > assert(txn_limbo_is_empty(&txn_limbo)); > + > + txn_limbo_process_commit(&txn_limbo); > + return 0; > } > > /** A guard to block multiple simultaneous promote()/demote() invocations. */ > static bool is_in_box_promote = false; > > -/** Write and process a DEMOTE request. */ > -static void > -box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn) > -{ > - assert(box_raft()->volatile_term == box_raft()->term); > - assert(promote_lsn >= 0); > - txn_limbo_write_demote(&txn_limbo, promote_lsn, > - box_raft()->term); > - struct synchro_request req = { > - .type = IPROTO_RAFT_DEMOTE, > - .replica_id = prev_leader_id, > - .origin_id = instance_id, > - .lsn = promote_lsn, > - .term = box_raft()->term, > - }; > - txn_limbo_process(&txn_limbo, &req); > - assert(txn_limbo_is_empty(&txn_limbo)); 3. Why did you merge these 2 functions? AFAIR, their split was deliberate. To make each of them simpler to understand and maintain. > diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c > index 70447caaf..eb9aa7780 100644 > --- a/src/box/txn_limbo.c > +++ b/src/box/txn_limbo.c > @@ -786,6 +790,32 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req) > return; > } > > +int > +txn_limbo_process_begin(struct txn_limbo *limbo, > + const struct synchro_request *req) > +{ > + latch_lock(&limbo->promote_latch); > + /* > + * FIXME: For now we take a lock only but idea > + * is to verify incoming requests to detect split > + * brain situation. Thus we need to change the code > + * semantics in advance. > + */ > + (void)req; > + return 0; 4. Return value is a part of the split-brain patch, not of the ordering patch. It is clearly seen from this patchset, because this series never changes `return 0` to anything else. I get that you want to merge something. Hence we are working on this independent issue of reodering. But then lets make it truly independent.