From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id EA6216EC5F; Tue, 2 Mar 2021 20:51:58 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org EA6216EC5F DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1614707519; bh=iFdhRGrz0eBYKbsnquK87cQziYcw2Wgg2CcbHOJkpDE=; h=To:Cc:References:Date:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=CWAvKwN9iO6XCYRTlVTzyvFFC38kPcaSh+9zyxWCAEvSVqlUBQVL+8GosdIDEllYF vDn7lGkxFjIjaB0kT/hTz2TChao6H+J5X1jaIGl/uUN1mI6X3QxgsgIxoheqXmi4Lw HGlqFSB5jtkD5Jk3cSx3IsjOQ+RQJP+jGa8KlH2g= Received: from smtp29.i.mail.ru (smtp29.i.mail.ru [94.100.177.89]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id EEBE36EC5F for ; Tue, 2 Mar 2021 20:51:56 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org EEBE36EC5F Received: by smtp29.i.mail.ru with esmtpa (envelope-from ) id 1lH9Bc-0004QH-66; Tue, 02 Mar 2021 20:51:56 +0300 To: Vladislav Shpilevoy , gorcunov@gmail.com Cc: tarantool-patches@dev.tarantool.org References: <20210224193549.70017-1-sergepetrenko@tarantool.org> <469ef629-699c-0180-facb-f166d6d3c073@tarantool.org> <17046c4d-b239-f3d8-f150-941026096b1e@tarantool.org> Message-ID: <97734227-410b-3e72-2ee7-30e7d6f63217@tarantool.org> Date: Tue, 2 Mar 2021 20:51:55 +0300 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.16; rv:78.0) Gecko/20100101 Thunderbird/78.7.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-GB X-7564579A: B8F34718100C35BD X-77F55803: 4F1203BC0FB41BD92A98208ECBDD29F5D8C0A06ED169F0586A18EBFAC8707E15182A05F538085040CD78D6F055F7A9E831C602EA83A02D8DEF60AEE73E3A3C142F7B88D57AEDD402 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE79FF7180C05A1FF7CEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637A361B53D384D9D40EA1F7E6F0F101C67CDEEF6D7F21E0D1D174C73DBBBFC76649D30E1DED2820F1D35FE3E599308E4EC2F9C41A69BC58CCF389733CBF5DBD5E913377AFFFEAFD269A417C69337E82CC2CC7F00164DA146DAFE8445B8C89999729449624AB7ADAF37F6B57BC7E64490611E7FA7ABCAF51C92176DF2183F8FC7C07E7E81EEA8A9722B8941B15DA834481F9449624AB7ADAF37BA3038C0950A5D3613377AFFFEAFD269176DF2183F8FC7C01F93811EDFD95AC87B076A6E789B0E97A8DF7F3B2552694A1E7802607F20496D49FD398EE364050FCCD848CCB6FE560C846F39228950D27DB3661434B16C20AC78D18283394535A975ECD9A6C639B01BC09775C1D3CA48CFCD42BCEBB57B85E635872C767BF85DA22EF20D2F80756B5F40A5AABA2AD3711975ECD9A6C639B01B78DA827A17800CE772B24F7D6CBA3C2C731C566533BA786A40A5AABA2AD371193C9F3DD0FB1AF5EB82E77451A5C57BD33C9F3DD0FB1AF5EB4E70A05D1297E1BBCB5012B2E24CD356 X-B7AD71C0: 2623F767319EFA42AC98609FCEE262F9192335DD689A58EBAE0174B7F1092AFB3C5BCCDB361A5E937B6A47F5B933A485 X-C1DE0DAB: 0D63561A33F958A5470665F9B522418C5FAFC03AA49088B6DCD2534EFB52E149D59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA75448CF9D3A7B2C848410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D344A431191C56981FE2ACD1F22EC70C13AF62771AC9D1FAE40A898A146B63F432A4EEE503B9338EE391D7E09C32AA3244CC7DD365FD96A6BFB6D42D3536B36ADE6CE0B41342B755BCDFACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojnllmSZCgRqurJRxLz2Pi3Q== X-Mailru-Sender: 583F1D7ACE8F49BDF0EA4664CAF0825D0A13E2F9FF7DD7AD6A4A5B4A02F17D9D09C9060410F0400C823C4E0A9438D55D74690CA6451351EDEC462FDC9CAD1E11B969B486931C0B990F27244EEAA5B9A5AE208404248635DF X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH v3] wal: introduce limits on simultaneous writes X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Serge Petrenko via Tarantool-patches Reply-To: Serge Petrenko Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" 02.03.2021 01:05, Vladislav Shpilevoy пишет: > Hi! Thanks for the fixes! > >>>> @@ -159,6 +276,13 @@ journal_write(struct journal_entry *entry) >>>>   static inline int >>>>   journal_write_async(struct journal_entry *entry) >>>>   { >>>> +    /* >>>> +     * It's the job of the caller to check whether the queue is full prior >>>> +     * to submitting the request. >>>> +     */ >>>> +    assert(!journal_queue_is_full() || journal_queue.is_ready); >>>> +    journal_queue_on_append(entry); >>> 8. Probably must assert that waiters list is empty. Otherwise you >>> could go out of order. >> It's not empty by the time the first entry gets to 'journal_write_async'. >> Everyone else is waken up, but not yet removed from the queue. >> >> Looks like we cannot determine whether a write is called after waiting in queue >> or not. > It bothers me a lot, the rest looks good. We don't have any protection against > a reorder, even not an assertion. How about this? (I didn't test): Thanks for thinking this through! Yes, indeed, this is a problem. Unfortunately, your solution will allow reordering. We needed the waiter count to prevent anyone from sliding in front when the qeueue is already waken, but the fibers are not yet scheduled. We also discussed that a reorder may happen if applier tries to apply a vinyl tx, which may yield after it has waited in queue. In this case, if some tx is committed (synchronously) and added to the queue while the vinyl tx sleeps, the vinyl tx will bypass that synchronous tx in queue. So, there shouldn't be any yields between journal_queue_wait() and journal_write(_async)(). We cannot yield in memtx transactions, so the only solution is to put journal_queue_wait() after txn_prepare() inside txn_commit_async(). Take a look at this patch (on top of the branch): I decided to move journal_queue_wait() to journal_write_async() to be consistent with journal_write(), which also checks queue. Now reordering is not a problem since waiting in queue is done right before the write request is submitted. No yields in between. Also, I renamed journal_write_async() to journal_write_try_async() and txn_commit_async() to txn_commit_try_async() to indicate that they may yield occasionally now. What do you think of this approach? P.S. there's another thing I wanted to discuss: should we set defaults for wal_queue_max_size(len) to some finite value? If yes, then what value should it be? wal_queue_max_size=default memtx_memory (256 Mb)? Don't know which value to choose for wal_queue_max_len at all. =================================== Subject: [PATCH] [tosquash] journal_write and txn_commit: async -> try_async Make journal_write_async() wait for journal queue. Rename it to journal_write_try_async() to show that it might yield now (rarely). Same for txn_commit_async(): rename it to txn_commit_try_async() since it uses journal_write_try_async(). ---  src/box/applier.cc  | 12 ++----------  src/box/box.cc      |  4 ++--  src/box/journal.h   |  7 ++-----  src/box/txn.c       |  4 ++--  src/box/txn.h       |  3 ++-  src/box/txn_limbo.h |  2 +-  6 files changed, 11 insertions(+), 21 deletions(-) diff --git a/src/box/applier.cc b/src/box/applier.cc index dff43795c..5a88a013e 100644 --- a/src/box/applier.cc +++ b/src/box/applier.cc @@ -868,7 +868,7 @@ apply_synchro_row(struct xrow_header *row)      if (entry == NULL)          goto err; -    if (journal_write_async(&entry->journal_entry) != 0) { +    if (journal_write_try_async(&entry->journal_entry) != 0) {          diag_set(ClientError, ER_WAL_IO);          goto err;      } @@ -967,14 +967,6 @@ applier_apply_tx(struct applier *applier, struct stailq *rows)          goto success;      } -    /* -     * Do not spam WAL with excess write requests, let it process what's -     * piled up first. -     * This is done before opening the transaction to avoid problems with -     * yielding inside it. -     */ -    journal_queue_wait(); -      /**       * Explicitly begin the transaction so that we can       * control fiber->gc life cycle and, in case of apply @@ -1048,7 +1040,7 @@ applier_apply_tx(struct applier *applier, struct stailq *rows)      trigger_create(on_wal_write, applier_txn_wal_write_cb, NULL, NULL);      txn_on_wal_write(txn, on_wal_write); -    if (txn_commit_async(txn) < 0) +    if (txn_commit_try_async(txn) < 0)          goto fail;  success: diff --git a/src/box/box.cc b/src/box/box.cc index 9a3b092d0..a8aa2663f 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -246,12 +246,12 @@ box_process_rw(struct request *request, struct space *space,           * synchronous tx it meets until confirm timeout           * is reached and the tx is rolled back, yielding           * an error. -         * Moreover, txn_commit_async() doesn't hurt at +         * Moreover, txn_commit_try_async() doesn't hurt at           * all during local recovery, since journal_write           * is faked at this stage and returns immediately.           */          if (is_local_recovery) { -            res = txn_commit_async(txn); +            res = txn_commit_try_async(txn);          } else {              res = txn_commit(txn);          } diff --git a/src/box/journal.h b/src/box/journal.h index 5f0e0accd..3a945fa53 100644 --- a/src/box/journal.h +++ b/src/box/journal.h @@ -253,12 +253,9 @@ journal_write(struct journal_entry *entry)   * @return 0 if write was queued to a backend or -1 in case of an error.   */  static inline int -journal_write_async(struct journal_entry *entry) +journal_write_try_async(struct journal_entry *entry)  { -    /* -     * It's the job of the caller to check whether the queue is full prior -     * to submitting the request. -     */ +    journal_queue_wait();      journal_queue_on_append(entry);      return current_journal->write_async(current_journal, entry); diff --git a/src/box/txn.c b/src/box/txn.c index 71c89ce5f..40061ff09 100644 --- a/src/box/txn.c +++ b/src/box/txn.c @@ -789,7 +789,7 @@ txn_limbo_on_rollback(struct trigger *trig, void *event)  }  int -txn_commit_async(struct txn *txn) +txn_commit_try_async(struct txn *txn)  {      struct journal_entry *req; @@ -859,7 +859,7 @@ txn_commit_async(struct txn *txn)      }      fiber_set_txn(fiber(), NULL); -    if (journal_write_async(req) != 0) { +    if (journal_write_try_async(req) != 0) {          fiber_set_txn(fiber(), txn);          diag_set(ClientError, ER_WAL_IO);          diag_log(); diff --git a/src/box/txn.h b/src/box/txn.h index 29fe6d5ce..a45518064 100644 --- a/src/box/txn.h +++ b/src/box/txn.h @@ -473,12 +473,13 @@ txn_rollback(struct txn *txn);   * journal write completion. Note, the journal write may still fail.   * To track transaction status, one is supposed to use on_commit and   * on_rollback triggers. + * Note, this may yield occasionally, once journal queue gets full.   *   * On failure -1 is returned and the transaction is rolled back and   * freed.   */  int -txn_commit_async(struct txn *txn); +txn_commit_try_async(struct txn *txn);  /**   * Most txns don't have triggers, and txn objects diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h index c28b5666d..af0addf8d 100644 --- a/src/box/txn_limbo.h +++ b/src/box/txn_limbo.h @@ -229,7 +229,7 @@ txn_limbo_assign_local_lsn(struct txn_limbo *limbo,   * remote transactions. The function exists to be used in a   * context, where a transaction is not known whether it is local   * or not. For example, when a transaction is committed not bound - * to any fiber (txn_commit_async()), it can be created by applier + * to any fiber (txn_commit_try_async()), it can be created by applier   * (then it is remote) or by recovery (then it is local). Besides,   * recovery can commit remote transactions as well, when works on   * a replica - it will recover data received from master. -- 2.24.3 (Apple Git-128) =================================== > > ==================== > --- a/src/box/journal.c > +++ b/src/box/journal.c > @@ -69,8 +69,11 @@ void > journal_queue_wakeup(void) > { > struct rlist *list = &journal_queue.waiters; > - if (!rlist_empty(list) && !journal_queue_is_full()) > + if (!rlist_empty(list) && !journal_queue_is_full()) { > fiber_wakeup(rlist_first_entry(list, struct fiber, state)); > + --journal_queue.waiter_count; > + assert(journal_queue.waiter_count >= 0); > + } > } > > void > @@ -84,7 +87,6 @@ journal_queue_wait(void) > * Will be waken up by either queue emptying or a synchronous write. > */ > fiber_yield(); > - --journal_queue.waiter_count; > journal_queue_wakeup(); > } > > @@ -96,5 +98,6 @@ journal_queue_flush(void) > struct rlist *list = &journal_queue.waiters; > while (!rlist_empty(list)) > fiber_wakeup(rlist_first_entry(list, struct fiber, state)); > + journal_queue.waiter_count = 0; > journal_queue_wait(); > } > diff --git a/src/box/journal.h b/src/box/journal.h > index 5f0e0accd..ea56043e2 100644 > --- a/src/box/journal.h > +++ b/src/box/journal.h > @@ -260,6 +260,7 @@ journal_write_async(struct journal_entry *entry) > * to submitting the request. > */ > journal_queue_on_append(entry); > + assert(journal_queue.waiter_count == 0); > > return current_journal->write_async(current_journal, entry); > } -- Serge Petrenko