From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp32.i.mail.ru (smtp32.i.mail.ru [94.100.177.92]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 4D06B41C5DA for ; Mon, 22 Jun 2020 14:19:49 +0300 (MSK) References: <46ed0cc08e6ecf7ee7c9250d9cdfc3d68aa5c764.1592589312.git.sergepetrenko@tarantool.org> <30f75001-3ecc-9d7f-9cba-88c824087f4e@tarantool.org> From: Serge Petrenko Message-ID: Date: Mon, 22 Jun 2020 14:19:47 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Transfer-Encoding: 8bit Content-Language: en-GB Subject: Re: [Tarantool-patches] [PATCH 1/2] box: rework local_recovery to use async txn_commit List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy , sergos@tarantool.org, gorcunov@tarantool.org, lvasiliev@tarantool.org Cc: tarantool-patches@dev.tarantool.org 21.06.2020 19:25, Vladislav Shpilevoy пишет: > Hi! Thanks for the patch! > >> diff --git a/src/box/box.cc b/src/box/box.cc >> index f80d6f8e6..0fe7625fb 100644 >> --- a/src/box/box.cc >> +++ b/src/box/box.cc >> @@ -222,6 +222,12 @@ box_process_rw(struct request *request, struct space *space, >>                  */ >>                 if (is_local_recovery) { >>                         res = txn_commit_async(txn); >> +                       /* >> +                        * Hack: remove the unnecessary trigger. >> +                        * I don't know of a better place to do >> +                        * it. >> +                        */ > Why is it necessary to remove it? The trigger is set after journal_write_async() returns. The idea is that journal_write_async() just submits the request for writing, and later, journal_async_complete(), which is txn_complete_async() completes the tx processing. The key word here is "later", meaning that txn_complete_async() is called after txn_commit async() returns. This is why txn_complete_async() clears the trigger set on write failure: if write fails, we rollback the tx and remove the entry from the limbo. If writing doesn't fail, it's limbo's business to remove an entry. The tx still may be rolled back due to a timeout, then limbo will remove the entry, and if an on_rollback trigger isn't cleared, it'll try to remove the entry again. Now, back to this case: I reworked recovery_journal to use write_async(). recovery_journal_write_async() calls journal_async_complete() right away, since there're no actual writes and no one else can call journal_async_complete(). So, once journal_write_async() returns, the trigger is cleared, but later it's reset at the end of txn_commit_async(), because txn_commit_async() assumes that the write hasn't happened yet, and journal_async_complete() is yet to be called. One option is to call journal_async_complete() after txn_commit_async(), this will solve the problem, but journal_async_complete() receives `struct journal_entry *`, and we have no knowledge of the journal entry outside of `txn_commit_async()`. Another option is to set a txn_flag, like "TXN_DONT_EXPECT_WRITE_FAILURE", so that the txn doesn't set an on_write_failure trigger when we don't need it. But this also looks ugly. That's why I unconditionally reset the trigger after txn_commit_async(). > >> + trigger_clear(&txn->on_write_failure); >>                 } else { >>                         res = txn_commit(txn); >>                 } -- Serge Petrenko