Tarantool development patches archive
 help / color / mirror / Atom feed
From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] recovery: make it yield when positioning in a WAL
Date: Wed, 28 Apr 2021 18:34:05 +0300
Message-ID: <eff3dfea-b394-606d-e342-eb5aa46e5efd@tarantool.org> (raw)
In-Reply-To: <b10505e4-5732-c224-4e0c-e9ccb99c3924@tarantool.org>



27.04.2021 00:20, Vladislav Shpilevoy пишет:
> Hi! Thanks for the patch!
>
> See 2 questions, 1 comment.
>
> On 26.04.2021 18:59, Serge Petrenko wrote:

Thanks for the review!
>> We had various places in box.cc and relay.cc which counted processed
>> rows and yielded every now and then. These yields didn't cover cases,
>> when recovery has to position inside a long WAL file:
>>
>> For example, when tarantool exits without leaving an empty WAL file
>> which'll be used to recover instance vclock on restart. In this case
>> the instance freezes while processing the last availabe WAL in order
>> to recover the vclock.
>>
>> Another issue is with replication. If a replica connects and needs data
>> from the end of a really long WAL, recovery will read up to the needed
>> position without yields, making relay disconnect by timeout.
>>
>> In order to fix the issue, make recovery decide when a yield should
>> happen. Introduce a new callback: schedule_yield, which is called by
>> recovery once it processes (no matter how, either simply skips or calls
>> xstream_write) enough rows (WAL_ROWS_PER_YIELD).
>>
>> schedule_yield either yields right away, in case of relay, or saves the
>> yield for later, in case of local recovery, because it might be in the
>> middle of a transaction.
> 1. Did you consider an option to yield explicitly in recovery code when
> it skips rows? If they are being skipped, it does not matter what are
> their transaction borders.

I did consider that. It is possible to do so, but then we'll have yet 
another
place (in addition to relay and wal_stream) which counts rows and yields
every now and then.

I thought it would be better to unify all these places. Actually, this 
could be
done this way from the very beginning.
I think it's not recovery's business whether to yield or not once
some rows are processed.

Anyway, I can make it this way, if you insist.

>
> Then the whole patch would be to add the yield once per WAL_ROWS_PER_YIELD
> to recovery_scan(), correct?

True. One place in recovery_scan() and one place in recover_xlog(), when
the rows are skipped.

>
>> The only place with explicit row counting and manual yielding is now in
>> relay_initial_join, since its row sources are engines rather than recovery
>> with its WAL files.
>>
>> Closes #5979
>> ---
>> https://github.com/tarantool/tarantool/tree/sp/gh-5979-recovery-yield
>> https://github.com/tarantool/tarantool/issues/5979
>>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 59925962d..69a8f87eb 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -3101,6 +3087,19 @@ bootstrap(const struct tt_uuid *instance_uuid,
>>   	}
>>   }
>>   
>> +struct wal_stream wal_stream;
> 2. This must be static.

Sure.

===================================================

diff --git a/src/box/box.cc b/src/box/box.cc
index 69a8f87eb..62b55352e 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -3087,7 +3087,7 @@ bootstrap(const struct tt_uuid *instance_uuid,
         }
  }

-struct wal_stream wal_stream;
+static struct wal_stream wal_stream;

  /**
   * Plan a yield in recovery stream. Wal stream will execute it as soon 
as it's


===================================================
>
>> +
>> +/**
>> + * Plan a yield in recovery stream. Wal stream will execute it as soon as it's
>> + * ready.
>> + */
>> +static void
>> +wal_stream_schedule_yield(void)
>> +{
>> +	wal_stream.has_yield = true;
>> +	wal_stream_try_yield(&wal_stream);
>> +}
>> diff --git a/src/box/recovery.cc b/src/box/recovery.cc
>> index cd33e7635..5351d8524 100644
>> --- a/src/box/recovery.cc
>> +++ b/src/box/recovery.cc
>> @@ -241,10 +248,16 @@ static void
>>   recover_xlog(struct recovery *r, struct xstream *stream,
>>   	     const struct vclock *stop_vclock)
>>   {
>> +	/* Imitate old behaviour. Rows are counted separately for each xlog. */
>> +	r->row_count = 0;
> 3. But why do you need to imitate it? Does it mean if the files are
> too small to yield even once in each, but in total their number is
> huge, there won't be yields?

Yes, that's true.

>
> Also does it mean "1M rows processed" was not ever printed in that
> case?

Yes, when WALs are not big enough.
Recovery starts over with '0.1M rows processed' on every new WAL file.

>
>>   	struct xrow_header row;
>> -	uint64_t row_count = 0;
>>   	while (xlog_cursor_next_xc(&r->cursor, &row,
>>   				   r->wal_dir.force_recovery) == 0) {
>> +		if (++r->row_count % WAL_ROWS_PER_YIELD == 0) {
>> +			r->schedule_yield();
>> +		}
>> +		if (r->row_count % 100000 == 0)
>> +			say_info("%.1fM rows processed", r->row_count / 1000000.);
>>   		/*
>>   		 * Read the next row from xlog file.
>>   		 *
>> @@ -273,12 +286,7 @@ recover_xlog(struct recovery *r, struct xstream *stream,
>>   		 * failed row anyway.
>>   		 */
>>   		vclock_follow_xrow(&r->vclock, &row);
>> -		if (xstream_write(stream, &row) == 0) {
>> -			++row_count;
>> -			if (row_count % 100000 == 0)
>> -				say_info("%.1fM rows processed",
>> -					 row_count / 1000000.);
>> -		} else {
>> +		if (xstream_write(stream, &row) != 0) {
>>   			if (!r->wal_dir.force_recovery)
>>   				diag_raise();
>>   

-- 
Serge Petrenko


  reply	other threads:[~2021-04-28 15:34 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-26 16:59 Serge Petrenko via Tarantool-patches
2021-04-26 21:20 ` Vladislav Shpilevoy via Tarantool-patches
2021-04-28 15:34   ` Serge Petrenko via Tarantool-patches [this message]
2021-04-28 20:50     ` Vladislav Shpilevoy via Tarantool-patches
2021-04-29  8:55       ` Serge Petrenko via Tarantool-patches
2021-04-29 20:03         ` Vladislav Shpilevoy via Tarantool-patches
2021-05-12 11:30           ` Serge Petrenko via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=eff3dfea-b394-606d-e342-eb5aa46e5efd@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=gorcunov@gmail.com \
    --cc=sergepetrenko@tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Tarantool development patches archive

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://lists.tarantool.org/tarantool-patches/0 tarantool-patches/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 tarantool-patches tarantool-patches/ https://lists.tarantool.org/tarantool-patches \
		tarantool-patches@dev.tarantool.org.
	public-inbox-index tarantool-patches

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git