[Tarantool-patches] [PATCH vshard 1/1] recovery: relax recovery messages verbosity
Oleg Babin
olegrok at tarantool.org
Thu Jun 3 13:19:17 MSK 2021
Hi! Thanks for your patch! Looks good but I placed three comments below.
On 03.06.2021 00:34, Vladislav Shpilevoy wrote:
> Recovery fiber on the storages used to print messages about
> starting recovery even when no recovery was needed yet:
>
> Starting ... buckets recovery step
> Finish bucket recovery step
>
> It happened a lot during rebalancing even if it worked fine.
> Because there appear receiving/sending buckets, and recovery
> double-checks if they are really transferring, not stuck.
>
> The patch makes recovery fiber not account the buckets, whose
> transfer is actually in progress, as broken. Hence it won't print
> the recovery messages anymore unless the transfer was really
> interrupted.
>
> Along with that the recovery now prints more details about the
> first bucket which triggered the real recovery.
>
> Closes #274
> ---
> Branch: http://github.com/tarantool/vshard/tree/gerold103/gh-274-user-friendly-recovery
> Issue: https://github.com/tarantool/vshard/issues/274
>
> vshard/storage/init.lua | 30 +++++++++++++++++++++++++-----
> 1 file changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index 7045d91..8a019fa 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -736,21 +736,25 @@ local function recovery_step_by_type(type)
> local is_empty = true
To be honest I don't completely understand what "is_empty" means.
> local recovered = 0
> local total = 0
> + local start_format = 'Starting %s buckets recovery step'
> for _, bucket in _bucket.index.status:pairs(type) do
> total = total + 1
> local bucket_id = bucket.id
> if M.rebalancer_transfering_buckets[bucket_id] then
Side-note: transfering -> transferring
> goto continue
> end
> - if is_empty then
> - log.info('Starting %s buckets recovery step', type)
> - end
> - is_empty = false
> assert(bucket_is_transfer_in_progress(bucket))
> - local destination = M.replicasets[bucket.destination]
> + local peer_uuid = bucket.destination
> + local destination = M.replicasets[peer_uuid]
> if not destination or not destination.master then
> -- No replicaset master for a bucket. Wait until it
> -- appears.
This comment states that there is no critical and it's appropriate and
we should just wait. Why "error" and not "warn"?
> + if is_empty then
> + log.info(start_format, type)
> + log.error('Can not find for bucket %s its peer %s', bucket_id,
> + peer_uuid)
> + is_empty = false
> + end
> goto continue
> end
> local remote_bucket, err =
> @@ -759,6 +763,15 @@ local function recovery_step_by_type(type)
> -- not be used to recovery anything. Try later.
> if not remote_bucket and (not err or err.type ~= 'ShardingError' or
> err.code ~= lerror.code.WRONG_BUCKET) then
> + if is_empty then
> + if err == nil then
> + err = 'unknown'
> + end
> + log.info(start_format, type)
> + log.error('Error during recovery of bucket %s on replicaset '..
> + '%s: %s', bucket_id, peer_uuid, err)
> + is_empty = false
> + end
> goto continue
> end
> -- Do nothing until the bucket on both sides stopped
> @@ -772,13 +785,20 @@ local function recovery_step_by_type(type)
> if not bucket or not bucket_is_transfer_in_progress(bucket) then
> goto continue
> end
> + if is_empty then
> + log.info(start_format, type)
> + end
> if recovery_local_bucket_is_garbage(bucket, remote_bucket) then
> _bucket:update({bucket_id}, {{'=', 2, consts.BUCKET.GARBAGE}})
> recovered = recovered + 1
> elseif recovery_local_bucket_is_active(bucket, remote_bucket) then
> _bucket:replace({bucket_id, consts.BUCKET.ACTIVE})
> recovered = recovered + 1
> + elseif is_empty then
> + log.info('Bucket %s is %s local and %s on replicaset %s, waiting',
> + bucket_id, bucket.status, remote_bucket.status, peer_uuid)
> end
> + is_empty = false
> ::continue::
> end
> if not is_empty then
More information about the Tarantool-patches
mailing list