Tarantool development patches archive
 help / color / mirror / Atom feed
From: "Alexander V. Tikhonov" <avtikhon@tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 1/1] limbo: don't wake self fiber on CONFIRM write
Date: Sat, 12 Sep 2020 07:35:33 +0300	[thread overview]
Message-ID: <20200912043533.GA32313@hpalx> (raw)
In-Reply-To: <8e02ff5740f7ea73a7cbc45f3fbdb423054984d9.1599774793.git.v.shpilevoy@tarantool.org>

Hi Vlad, thanks a lot for your patch. I've checked and it really helped
to fix the issue, patch LGTM.

On Thu, Sep 10, 2020 at 11:53:57PM +0200, Vladislav Shpilevoy wrote:
> During recovery WAL writes end immediately, without yields.
> Therefore WAL write completion callback is executed in the
> currently active fiber.
> 
> Txn limbo on CONFIRM WAL write wakes up the waiting fiber, which
> appears to be the same as the active fiber during recovery.
> 
> That breaks the fiber scheduler, because apparently it is not safe
> to wake the currently active fiber unless it is going to call
> fiber_yield() immediately after. See a comment in fiber_wakeup()
> implementation about that way of usage.
> 
> The patch simply stops waking the waiting fiber, if it is the
> currently active one.
> 
> Closes #5288
> Closes #5232
> ---
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5288-limbo-wakeup
> Issue: https://github.com/tarantool/tarantool/issues/5288
> Issue: https://github.com/tarantool/tarantool/issues/5232
> 
>  src/box/txn_limbo.c                           |  3 ++-
>  .../replication/gh-5288-qsync-recovery.result | 27 +++++++++++++++++++
>  .../gh-5288-qsync-recovery.test.lua           | 11 ++++++++
>  3 files changed, 40 insertions(+), 1 deletion(-)
>  create mode 100644 test/replication/gh-5288-qsync-recovery.result
>  create mode 100644 test/replication/gh-5288-qsync-recovery.test.lua
> 
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index 2ff3bdd05..3655338d8 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -281,7 +281,8 @@ static void
>  txn_limbo_write_cb(struct journal_entry *entry)
>  {
>  	assert(entry->complete_data != NULL);
> -	fiber_wakeup(entry->complete_data);
> +	if (fiber() != entry->complete_data)
> +		fiber_wakeup(entry->complete_data);
>  }
>  
>  static void
> diff --git a/test/replication/gh-5288-qsync-recovery.result b/test/replication/gh-5288-qsync-recovery.result
> new file mode 100644
> index 000000000..dc0babef6
> --- /dev/null
> +++ b/test/replication/gh-5288-qsync-recovery.result
> @@ -0,0 +1,27 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +--
> +-- gh-5288: transaction limbo could crash during recovery, because in WAL write
> +-- completion callback it woken up the currently active fiber.
> +--
> +s = box.schema.space.create('sync', {is_sync = true})
> + | ---
> + | ...
> +_ = s:create_index('pk')
> + | ---
> + | ...
> +s:insert{1}
> + | ---
> + | - [1]
> + | ...
> +box.snapshot()
> + | ---
> + | - ok
> + | ...
> +test_run:cmd('restart server default')
> + | 
> +box.space.sync:drop()
> + | ---
> + | ...
> diff --git a/test/replication/gh-5288-qsync-recovery.test.lua b/test/replication/gh-5288-qsync-recovery.test.lua
> new file mode 100644
> index 000000000..00bff7b87
> --- /dev/null
> +++ b/test/replication/gh-5288-qsync-recovery.test.lua
> @@ -0,0 +1,11 @@
> +test_run = require('test_run').new()
> +--
> +-- gh-5288: transaction limbo could crash during recovery, because in WAL write
> +-- completion callback it woken up the currently active fiber.
> +--
> +s = box.schema.space.create('sync', {is_sync = true})
> +_ = s:create_index('pk')
> +s:insert{1}
> +box.snapshot()
> +test_run:cmd('restart server default')
> +box.space.sync:drop()
> -- 
> 2.21.1 (Apple Git-122.3)
> 

  reply	other threads:[~2020-09-12  4:35 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-10 21:53 Vladislav Shpilevoy
2020-09-12  4:35 ` Alexander V. Tikhonov [this message]
2020-09-12 12:13 ` Vladislav Shpilevoy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200912043533.GA32313@hpalx \
    --to=avtikhon@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 1/1] limbo: don'\''t wake self fiber on CONFIRM write' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox