From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng2.m.smailru.net (smtpng2.m.smailru.net [94.100.179.3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 087E4469719 for ; Fri, 11 Sep 2020 00:53:58 +0300 (MSK) From: Vladislav Shpilevoy Date: Thu, 10 Sep 2020 23:53:57 +0200 Message-Id: <8e02ff5740f7ea73a7cbc45f3fbdb423054984d9.1599774793.git.v.shpilevoy@tarantool.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [Tarantool-patches] [PATCH 1/1] limbo: don't wake self fiber on CONFIRM write List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org, avtikhon@tarantool.org During recovery WAL writes end immediately, without yields. Therefore WAL write completion callback is executed in the currently active fiber. Txn limbo on CONFIRM WAL write wakes up the waiting fiber, which appears to be the same as the active fiber during recovery. That breaks the fiber scheduler, because apparently it is not safe to wake the currently active fiber unless it is going to call fiber_yield() immediately after. See a comment in fiber_wakeup() implementation about that way of usage. The patch simply stops waking the waiting fiber, if it is the currently active one. Closes #5288 Closes #5232 --- Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5288-limbo-wakeup Issue: https://github.com/tarantool/tarantool/issues/5288 Issue: https://github.com/tarantool/tarantool/issues/5232 src/box/txn_limbo.c | 3 ++- .../replication/gh-5288-qsync-recovery.result | 27 +++++++++++++++++++ .../gh-5288-qsync-recovery.test.lua | 11 ++++++++ 3 files changed, 40 insertions(+), 1 deletion(-) create mode 100644 test/replication/gh-5288-qsync-recovery.result create mode 100644 test/replication/gh-5288-qsync-recovery.test.lua diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c index 2ff3bdd05..3655338d8 100644 --- a/src/box/txn_limbo.c +++ b/src/box/txn_limbo.c @@ -281,7 +281,8 @@ static void txn_limbo_write_cb(struct journal_entry *entry) { assert(entry->complete_data != NULL); - fiber_wakeup(entry->complete_data); + if (fiber() != entry->complete_data) + fiber_wakeup(entry->complete_data); } static void diff --git a/test/replication/gh-5288-qsync-recovery.result b/test/replication/gh-5288-qsync-recovery.result new file mode 100644 index 000000000..dc0babef6 --- /dev/null +++ b/test/replication/gh-5288-qsync-recovery.result @@ -0,0 +1,27 @@ +-- test-run result file version 2 +test_run = require('test_run').new() + | --- + | ... +-- +-- gh-5288: transaction limbo could crash during recovery, because in WAL write +-- completion callback it woken up the currently active fiber. +-- +s = box.schema.space.create('sync', {is_sync = true}) + | --- + | ... +_ = s:create_index('pk') + | --- + | ... +s:insert{1} + | --- + | - [1] + | ... +box.snapshot() + | --- + | - ok + | ... +test_run:cmd('restart server default') + | +box.space.sync:drop() + | --- + | ... diff --git a/test/replication/gh-5288-qsync-recovery.test.lua b/test/replication/gh-5288-qsync-recovery.test.lua new file mode 100644 index 000000000..00bff7b87 --- /dev/null +++ b/test/replication/gh-5288-qsync-recovery.test.lua @@ -0,0 +1,11 @@ +test_run = require('test_run').new() +-- +-- gh-5288: transaction limbo could crash during recovery, because in WAL write +-- completion callback it woken up the currently active fiber. +-- +s = box.schema.space.create('sync', {is_sync = true}) +_ = s:create_index('pk') +s:insert{1} +box.snapshot() +test_run:cmd('restart server default') +box.space.sync:drop() -- 2.21.1 (Apple Git-122.3)