From mboxrd@z Thu Jan 1 00:00:00 1970
To: Serge Petrenko, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Message-ID: <1b0f0a3d-59ba-38ba-7f7e-f214664c8976@tarantool.org>
Date: Sun, 23 May 2021 14:18:26 +0200
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Subject: Re: [Tarantool-patches] [PATCH 3/3] box: fix an assertion failure in box.ctl.promote()
List-Id: Tarantool development patches
From: Vladislav Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy
Sender: "Tarantool-patches"

Hi! Thanks for the patch!

I see some of the CI jobs have failed the new test:
https://github.com/tarantool/tarantool/runs/2620153809

See 4 comments below.

> diff --git a/src/box/box.cc b/src/box/box.cc
> index c10e0d8bf..1b1e7eec0 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1442,17 +1442,22 @@ box_quorum_on_ack_f(struct trigger *trigger, void *event)
>  }
>
>  /**
> - * Wait until at least @a quorum of nodes confirm @a target_lsn from the node
> - * with id @a lead_id.
> + * Wait until at least @a quorum of nodes confirm the last available synchronous
> + * entry from the node with id @a lead_id.
>   */
>  static int
> -box_wait_quorum(uint32_t lead_id, int64_t target_lsn, int quorum,
> +box_wait_quorum(uint32_t lead_id, struct txn_limbo_entry **entry, int quorum,
>  		double timeout)

1. Maybe try to keep this function independent of the limbo and its
entries? It was supposed to wait for replication of just an LSN, not
necessarily a synchronous transaction.

>  {
>  	struct box_quorum_trigger t;
>  	memset(&t, 0, sizeof(t));
>  	vclock_create(&t.vclock);
>
> +	*entry = txn_limbo_wait_lsn_assigned(&txn_limbo);
> +	if (*entry == NULL)
> +		return -1;
> +	int64_t target_lsn = (*entry)->lsn;
> +
>  	/* Take this node into account immediately. */
>  	int ack_count = vclock_get(box_vclock, lead_id) >= target_lsn;
>  	replicaset_foreach(replica) {
> @@ -1622,22 +1627,17 @@ box_promote(void)
>  		}
>  	}
>
> -	/*
> -	 * promote() is a no-op on the limbo owner, so all the rows
> -	 * in the limbo must've come through the applier meaning they already
> -	 * have an lsn assigned, even if their WAL write hasn't finished yet.
> -	 */
> -	wait_lsn = txn_limbo_last_synchro_entry(&txn_limbo)->lsn;
> -	assert(wait_lsn > 0);
> -
> -	rc = box_wait_quorum(former_leader_id, wait_lsn, quorum,
> +	struct txn_limbo_entry *last_entry;
> +	rc = box_wait_quorum(former_leader_id,&last_entry, quorum,

2. Missing whitespace after the first argument.

>  			     replication_synchro_timeout);
>  	if (rc == 0) {
> +		wait_lsn = last_entry->lsn;
>  		if (quorum < replication_synchro_quorum) {
>  			diag_set(ClientError, ER_QUORUM_WAIT, quorum,
>  				 "quorum was increased while waiting");
>  			rc = -1;
> -		} else if (wait_lsn < txn_limbo_last_synchro_entry(&txn_limbo)->lsn) {
> +		} else if (last_entry !=
> +			   txn_limbo_last_synchro_entry(&txn_limbo)) {
>  			diag_set(ClientError, ER_QUORUM_WAIT, quorum,
>  				 "new synchronous transactions appeared");
>  			rc = -1;

3. Could all 3 commits be replaced with a call to wal_sync() at the
beginning of promote() if we see that the last LSN is unknown? After
wal_sync() several outcomes are possible:

- Everything was rolled back, and the limbo is empty;

- The last transaction is different after the sync. It means it was added
  during promote(), which is an error like in the code above;

- The transaction at the end of the limbo is the same.

In the last case you work like before: box_wait_quorum() with the
known LSN. Will it work?

> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index f287369a2..406f2de89 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -69,6 +69,48 @@ txn_limbo_last_synchro_entry(struct txn_limbo *limbo)
>  	return NULL;
>  }
>
> +static int
> +txn_limbo_wait_lsn_assigned_f(struct trigger *trig, void *event)
> +{
> +	(void)event;
> +	struct fiber *fiber = trig->data;
> +	fiber_wakeup(fiber);
> +	return 0;
> +}
> +
> +struct txn_limbo_entry *
> +txn_limbo_wait_lsn_assigned(struct txn_limbo *limbo)
> +{
> +	assert(!txn_limbo_is_empty(limbo));
> +	struct txn_limbo_entry *entry = txn_limbo_last_synchro_entry(limbo);
> +	if (entry->lsn >= 0)
> +		return entry;
> +
> +	struct trigger write_trigger, rollback_trigger;
> +	trigger_create(&write_trigger, txn_limbo_wait_lsn_assigned_f, fiber(),
> +		       NULL);
> +	trigger_create(&rollback_trigger, txn_limbo_wait_lsn_assigned_f,
> +		       fiber(), NULL);
> +	txn_on_wal_write(entry->txn, &write_trigger);
> +	txn_on_rollback(entry->txn, &rollback_trigger);
> +	do {
> +		fiber_yield();
> +		if (fiber_is_cancelled()) {
> +			diag_set(FiberIsCancelled);
> +			entry = NULL;
> +			break;
> +		}
> +		if (entry->txn->signature < 0) {
> +			diag_set(ClientError, ER_SYNC_ROLLBACK);
> +			entry = NULL;
> +			break;
> +		}
> +	} while (entry->lsn == -1);
> +	trigger_clear(&write_trigger);
> +	trigger_clear(&rollback_trigger);

4. Why do you need the LSN assigned in the on_wal_write trigger in the
previous commit? I can't see where you use it here.
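
To make the wal_sync() idea from comment 3 a bit more concrete, here is
a rough, self-contained sketch of the decision logic only. It is a toy
model and not Tarantool code: struct toy_entry, struct toy_limbo,
toy_promote_check() and the toy_sync_*() callbacks are all hypothetical
names, and the callback merely stands in for whatever effect a real WAL
sync would have on the limbo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a limbo entry: only the LSN matters here. */
struct toy_entry {
	/* -1 until the WAL write assigns a real LSN. */
	int64_t lsn;
};

/* Toy stand-in for the limbo: tracks only its last synchronous entry. */
struct toy_limbo {
	/* NULL when the limbo is empty. */
	struct toy_entry *last;
};

/* The outcomes listed in comment 3, plus a generic failure. */
enum toy_promote_check {
	TOY_LIMBO_EMPTY,	/* everything was rolled back */
	TOY_NEW_TXN,		/* a different entry is last after the sync */
	TOY_WAIT_QUORUM,	/* same entry, its LSN is known now */
	TOY_SYNC_FAILED,	/* the sync itself failed */
};

/* Hypothetical callback modelling a WAL sync; returns 0 on success. */
typedef int (*toy_wal_sync_f)(struct toy_limbo *limbo);

/*
 * Decide what promote() should do. On TOY_WAIT_QUORUM the LSN to wait
 * for is stored in *wait_lsn, so the quorum wait can stay LSN-based.
 */
enum toy_promote_check
toy_promote_check(struct toy_limbo *limbo, toy_wal_sync_f sync,
		  int64_t *wait_lsn)
{
	assert(wait_lsn != NULL);
	struct toy_entry *before = limbo->last;
	if (before == NULL)
		return TOY_LIMBO_EMPTY;
	if (before->lsn < 0) {
		/* The last LSN is unknown: sync first, then look again. */
		if (sync(limbo) != 0)
			return TOY_SYNC_FAILED;
		if (limbo->last == NULL)
			return TOY_LIMBO_EMPTY;
		if (limbo->last != before)
			return TOY_NEW_TXN;
		/* In this model a successful sync must assign the LSN. */
		if (before->lsn < 0)
			return TOY_SYNC_FAILED;
	}
	*wait_lsn = before->lsn;
	return TOY_WAIT_QUORUM;
}

/* Example callback: the pending write completes and gets LSN 42. */
int
toy_sync_assign_lsn(struct toy_limbo *limbo)
{
	limbo->last->lsn = 42;
	return 0;
}

/* Example callback: the pending write fails, the limbo is emptied. */
int
toy_sync_rollback_all(struct toy_limbo *limbo)
{
	limbo->last = NULL;
	return 0;
}

/* Example callback: another transaction became the last one meanwhile. */
int
toy_sync_new_txn(struct toy_limbo *limbo)
{
	static struct toy_entry newer = { .lsn = 43 };
	limbo->last = &newer;
	return 0;
}
```

With toy_sync_assign_lsn() as the callback, an entry that starts with
lsn == -1 ends up as TOY_WAIT_QUORUM with *wait_lsn == 42. The other two
callbacks give TOY_LIMBO_EMPTY and TOY_NEW_TXN respectively. The point
of the shape is that the quorum wait only ever sees a plain LSN.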