From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp40.i.mail.ru (smtp40.i.mail.ru [94.100.177.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 4244D469719 for ; Sat, 12 Sep 2020 07:35:36 +0300 (MSK) Date: Sat, 12 Sep 2020 07:35:33 +0300 From: "Alexander V. Tikhonov" Message-ID: <20200912043533.GA32313@hpalx> References: <8e02ff5740f7ea73a7cbc45f3fbdb423054984d9.1599774793.git.v.shpilevoy@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8e02ff5740f7ea73a7cbc45f3fbdb423054984d9.1599774793.git.v.shpilevoy@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH 1/1] limbo: don't wake self fiber on CONFIRM write List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tarantool-patches@dev.tarantool.org Hi Vlad, thanks a lot for your patch. I've checked and it really helped to fix the issue, patch LGTM. On Thu, Sep 10, 2020 at 11:53:57PM +0200, Vladislav Shpilevoy wrote: > During recovery WAL writes end immediately, without yields. > Therefore WAL write completion callback is executed in the > currently active fiber. > > Txn limbo on CONFIRM WAL write wakes up the waiting fiber, which > appears to be the same as the active fiber during recovery. > > That breaks the fiber scheduler, because apparently it is not safe > to wake the currently active fiber unless it is going to call > fiber_yield() immediately after. See a comment in fiber_wakeup() > implementation about that way of usage. > > The patch simply stops waking the waiting fiber, if it is the > currently active one. > > Closes #5288 > Closes #5232 > --- > Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5288-limbo-wakeup > Issue: https://github.com/tarantool/tarantool/issues/5288 > Issue: https://github.com/tarantool/tarantool/issues/5232 > > src/box/txn_limbo.c | 3 ++- > .../replication/gh-5288-qsync-recovery.result | 27 +++++++++++++++++++ > .../gh-5288-qsync-recovery.test.lua | 11 ++++++++ > 3 files changed, 40 insertions(+), 1 deletion(-) > create mode 100644 test/replication/gh-5288-qsync-recovery.result > create mode 100644 test/replication/gh-5288-qsync-recovery.test.lua > > diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c > index 2ff3bdd05..3655338d8 100644 > --- a/src/box/txn_limbo.c > +++ b/src/box/txn_limbo.c > @@ -281,7 +281,8 @@ static void > txn_limbo_write_cb(struct journal_entry *entry) > { > assert(entry->complete_data != NULL); > - fiber_wakeup(entry->complete_data); > + if (fiber() != entry->complete_data) > + fiber_wakeup(entry->complete_data); > } > > static void > diff --git a/test/replication/gh-5288-qsync-recovery.result b/test/replication/gh-5288-qsync-recovery.result > new file mode 100644 > index 000000000..dc0babef6 > --- /dev/null > +++ b/test/replication/gh-5288-qsync-recovery.result > @@ -0,0 +1,27 @@ > +-- test-run result file version 2 > +test_run = require('test_run').new() > + | --- > + | ... > +-- > +-- gh-5288: transaction limbo could crash during recovery, because in WAL write > +-- completion callback it woken up the currently active fiber. > +-- > +s = box.schema.space.create('sync', {is_sync = true}) > + | --- > + | ... > +_ = s:create_index('pk') > + | --- > + | ... > +s:insert{1} > + | --- > + | - [1] > + | ... > +box.snapshot() > + | --- > + | - ok > + | ... > +test_run:cmd('restart server default') > + | > +box.space.sync:drop() > + | --- > + | ... > diff --git a/test/replication/gh-5288-qsync-recovery.test.lua b/test/replication/gh-5288-qsync-recovery.test.lua > new file mode 100644 > index 000000000..00bff7b87 > --- /dev/null > +++ b/test/replication/gh-5288-qsync-recovery.test.lua > @@ -0,0 +1,11 @@ > +test_run = require('test_run').new() > +-- > +-- gh-5288: transaction limbo could crash during recovery, because in WAL write > +-- completion callback it woken up the currently active fiber. > +-- > +s = box.schema.space.create('sync', {is_sync = true}) > +_ = s:create_index('pk') > +s:insert{1} > +box.snapshot() > +test_run:cmd('restart server default') > +box.space.sync:drop() > -- > 2.21.1 (Apple Git-122.3) >