From: Oleg Babin via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>,
tarantool-patches@dev.tarantool.org,
yaroslav.dynnikov@tarantool.org
Subject: Re: [Tarantool-patches] [PATCH vshard 1/1] recovery: relax recovery messages verbosity
Date: Thu, 3 Jun 2021 13:19:17 +0300 [thread overview]
Message-ID: <38f8a40b-4fe7-f26a-ddc5-7f088445e4f2@tarantool.org> (raw)
In-Reply-To: <ddf4570db038934ba121f5a3e99f690930faa869.1622669605.git.v.shpilevoy@tarantool.org>
Hi! Thanks for your patch! Looks good but I placed three comments below.
On 03.06.2021 00:34, Vladislav Shpilevoy wrote:
> Recovery fiber on the storages used to print messages about
> starting recovery even when no recovery was needed yet:
>
> Starting ... buckets recovery step
> Finish bucket recovery step
>
> It happened a lot during rebalancing even if it worked fine.
> Because there appear receiving/sending buckets, and recovery
> double-checks if they are really transferring, not stuck.
>
> The patch makes recovery fiber not account the buckets, whose
> transfer is actually in progress, as broken. Hence it won't print
> the recovery messages anymore unless the transfer was really
> interrupted.
>
> Along with that the recovery now prints more details about the
> first bucket which triggered the real recovery.
>
> Closes #274
> ---
> Branch: http://github.com/tarantool/vshard/tree/gerold103/gh-274-user-friendly-recovery
> Issue: https://github.com/tarantool/vshard/issues/274
>
> vshard/storage/init.lua | 30 +++++++++++++++++++++++++-----
> 1 file changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index 7045d91..8a019fa 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -736,21 +736,25 @@ local function recovery_step_by_type(type)
> local is_empty = true
To be honest I don't completely understand what "is_empty" means.
> local recovered = 0
> local total = 0
> + local start_format = 'Starting %s buckets recovery step'
> for _, bucket in _bucket.index.status:pairs(type) do
> total = total + 1
> local bucket_id = bucket.id
> if M.rebalancer_transfering_buckets[bucket_id] then
Side-note: transfering -> transferring
> goto continue
> end
> - if is_empty then
> - log.info('Starting %s buckets recovery step', type)
> - end
> - is_empty = false
> assert(bucket_is_transfer_in_progress(bucket))
> - local destination = M.replicasets[bucket.destination]
> + local peer_uuid = bucket.destination
> + local destination = M.replicasets[peer_uuid]
> if not destination or not destination.master then
> -- No replicaset master for a bucket. Wait until it
> -- appears.
This comment states that there is no critical and it's appropriate and
we should just wait. Why "error" and not "warn"?
> + if is_empty then
> + log.info(start_format, type)
> + log.error('Can not find for bucket %s its peer %s', bucket_id,
> + peer_uuid)
> + is_empty = false
> + end
> goto continue
> end
> local remote_bucket, err =
> @@ -759,6 +763,15 @@ local function recovery_step_by_type(type)
> -- not be used to recovery anything. Try later.
> if not remote_bucket and (not err or err.type ~= 'ShardingError' or
> err.code ~= lerror.code.WRONG_BUCKET) then
> + if is_empty then
> + if err == nil then
> + err = 'unknown'
> + end
> + log.info(start_format, type)
> + log.error('Error during recovery of bucket %s on replicaset '..
> + '%s: %s', bucket_id, peer_uuid, err)
> + is_empty = false
> + end
> goto continue
> end
> -- Do nothing until the bucket on both sides stopped
> @@ -772,13 +785,20 @@ local function recovery_step_by_type(type)
> if not bucket or not bucket_is_transfer_in_progress(bucket) then
> goto continue
> end
> + if is_empty then
> + log.info(start_format, type)
> + end
> if recovery_local_bucket_is_garbage(bucket, remote_bucket) then
> _bucket:update({bucket_id}, {{'=', 2, consts.BUCKET.GARBAGE}})
> recovered = recovered + 1
> elseif recovery_local_bucket_is_active(bucket, remote_bucket) then
> _bucket:replace({bucket_id, consts.BUCKET.ACTIVE})
> recovered = recovered + 1
> + elseif is_empty then
> + log.info('Bucket %s is %s local and %s on replicaset %s, waiting',
> + bucket_id, bucket.status, remote_bucket.status, peer_uuid)
> end
> + is_empty = false
> ::continue::
> end
> if not is_empty then
next prev parent reply other threads:[~2021-06-03 10:19 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-02 21:34 Vladislav Shpilevoy via Tarantool-patches
2021-06-03 10:19 ` Oleg Babin via Tarantool-patches [this message]
2021-06-03 12:55 ` Vladislav Shpilevoy via Tarantool-patches
2021-06-03 13:22 ` Oleg Babin via Tarantool-patches
2021-06-03 14:13 ` Vladislav Shpilevoy via Tarantool-patches
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=38f8a40b-4fe7-f26a-ddc5-7f088445e4f2@tarantool.org \
--to=tarantool-patches@dev.tarantool.org \
--cc=olegrok@tarantool.org \
--cc=v.shpilevoy@tarantool.org \
--cc=yaroslav.dynnikov@tarantool.org \
--subject='Re: [Tarantool-patches] [PATCH vshard 1/1] recovery: relax recovery messages verbosity' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox