Tarantool development patches archive
From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] recovery: make it yield when positioning in a WAL
Date: Thu, 29 Apr 2021 11:55:28 +0300
Message-ID: <3d0997f3-066f-5b2c-3ae3-22b90583e9bf@tarantool.org> (raw)
In-Reply-To: <738d3030-80cf-8079-1b03-55a7d665dbbf@tarantool.org>



On 28.04.2021 23:50, Vladislav Shpilevoy wrote:
>>>> We had various places in box.cc and relay.cc which counted processed
>>>> rows and yielded every now and then. These yields didn't cover cases,
>>>> when recovery has to position inside a long WAL file:
>>>>
>>>> For example, when tarantool exits without leaving an empty WAL file,
>>>> which would be used to recover the instance vclock on restart. In this
>>>> case the instance freezes while processing the last available WAL in
>>>> order to recover the vclock.
>>>>
>>>> Another issue is with replication. If a replica connects and needs data
>>>> from the end of a really long WAL, recovery will read up to the needed
>>>> position without yields, making relay disconnect by timeout.
>>>>
>>>> In order to fix the issue, make recovery decide when a yield should
>>>> happen. Introduce a new callback: schedule_yield, which is called by
>>>> recovery once it processes (no matter how, either simply skips or calls
>>>> xstream_write) enough rows (WAL_ROWS_PER_YIELD).
>>>>
>>>> schedule_yield either yields right away, in case of relay, or saves the
>>>> yield for later, in case of local recovery, because it might be in the
>>>> middle of a transaction.
>>> 1. Did you consider the option of yielding explicitly in the recovery
>>> code when it skips rows? If they are being skipped, it does not matter
>>> what their transaction borders are.
>> I did consider that. It is possible to do so, but then we'll have yet another
>> place (in addition to relay and wal_stream) which counts rows and yields
>> every now and then.
>>
>> I thought it would be better to unify all these places. Actually, it could
>> have been done this way from the very beginning.
>> I think it's not recovery's business to decide whether to yield
>> once some rows are processed.
>>
>> Anyway, I can make it this way, if you insist.
> The current solution is also fine.
>
>>>> +
>>>> +/**
>>>> + * Plan a yield in recovery stream. Wal stream will execute it as soon as it's
>>>> + * ready.
>>>> + */
>>>> +static void
>>>> +wal_stream_schedule_yield(void)
>>>> +{
>>>> +    wal_stream.has_yield = true;
>>>> +    wal_stream_try_yield(&wal_stream);
>>>> +}
>>>> diff --git a/src/box/recovery.cc b/src/box/recovery.cc
>>>> index cd33e7635..5351d8524 100644
>>>> --- a/src/box/recovery.cc
>>>> +++ b/src/box/recovery.cc
>>>> @@ -241,10 +248,16 @@ static void
>>>>    recover_xlog(struct recovery *r, struct xstream *stream,
>>>>             const struct vclock *stop_vclock)
>>>>    {
>>>> +    /* Imitate old behaviour. Rows are counted separately for each xlog. */
>>>> +    r->row_count = 0;
>>> 3. But why do you need to imitate it? Does it mean if the files are
>>> too small to yield even once in each, but in total their number is
>>> huge, there won't be yields?
>> Yes, that's true.
> Doesn't this look wrong to you? The xlog files might not contain enough
> rows if wal_max_size is small enough, and then the same issue still
> exists: no yields.
>
>>> Also does it mean "1M rows processed" was not ever printed in that
>>> case?
>> Yes, when WALs are not big enough.
>> Recovery starts over with '0.1M rows processed' on every new WAL file.
> Doesn't this look wrong to you too? At the very least, the number of
> rows should not drop to 0 on each next xlog file.

Yep, let's change it then. I thought we had to preserve log output.
Fixed and added a changelog entry.

=================================

diff --git a/changelogs/unreleased/gh-5979-recovery-ligs.md b/changelogs/unreleased/gh-5979-recovery-ligs.md
new file mode 100644
index 000000000..86abfd66a
--- /dev/null
+++ b/changelogs/unreleased/gh-5979-recovery-ligs.md
@@ -0,0 +1,11 @@
+# bugfix/core
+
+* Now tarantool yields when scanning `.xlog` files for the latest applied
+  vclock and when finding the right place in `.xlog`s to start recovering.
+  This means that the instance is responsive right after the `box.cfg` call
+  even when an empty `.xlog` was not created on previous exit.
+  Also this prevents relay from timing out when a freshly subscribed replica
+  needs rows from the end of a relatively long (hundreds of MBs) `.xlog`
+  (gh-5979).
+* The counter in `x.yM rows processed` log messages does not reset on each
+  newly recovered `xlog` anymore.
diff --git a/src/box/recovery.cc b/src/box/recovery.cc
index 5351d8524..8359f216d 100644
--- a/src/box/recovery.cc
+++ b/src/box/recovery.cc
@@ -149,6 +149,13 @@ recovery_scan(struct recovery *r, struct vclock *end_vclock,
                 }
         }
         xlog_cursor_close(&cursor, false);
+
+       /*
+        * Do not show scanned rows in log output and yield just in case
+        * row_count was less than WAL_ROWS_PER_YIELD when we reset it.
+        */
+       r->row_count = 0;
+       r->schedule_yield();
  }

  static inline void
@@ -248,8 +255,6 @@ static void
  recover_xlog(struct recovery *r, struct xstream *stream,
              const struct vclock *stop_vclock)
  {
-       /* Imitate old behaviour. Rows are counted separately for each xlog. */
-       r->row_count = 0;
         struct xrow_header row;
         while (xlog_cursor_next_xc(&r->cursor, &row,
                                    r->wal_dir.force_recovery) == 0) {

=================================

-- 
Serge Petrenko


Thread overview: 7+ messages
2021-04-26 16:59 Serge Petrenko via Tarantool-patches
2021-04-26 21:20 ` Vladislav Shpilevoy via Tarantool-patches
2021-04-28 15:34   ` Serge Petrenko via Tarantool-patches
2021-04-28 20:50     ` Vladislav Shpilevoy via Tarantool-patches
2021-04-29  8:55       ` Serge Petrenko via Tarantool-patches [this message]
2021-04-29 20:03         ` Vladislav Shpilevoy via Tarantool-patches
2021-05-12 11:30           ` Serge Petrenko via Tarantool-patches
