Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: tarantool-patches@dev.tarantool.org,
	yaroslav.dynnikov@tarantool.org, olegrok@tarantool.org
Subject: [Tarantool-patches] [PATCH vshard 1/1] recovery: relax recovery messages verbosity
Date: Wed,  2 Jun 2021 23:34:29 +0200	[thread overview]
Message-ID: <ddf4570db038934ba121f5a3e99f690930faa869.1622669605.git.v.shpilevoy@tarantool.org> (raw)

Recovery fiber on the storages used to print messages about
starting recovery even when no recovery was needed yet:

    Starting ... buckets recovery step
    Finish bucket recovery step

It happened a lot during rebalancing even if it worked fine.
Because there appear receiving/sending buckets, and recovery
double-checks if they are really transferring, not stuck.

The patch makes recovery fiber not account the buckets, whose
transfer is actually in progress, as broken. Hence it won't print
the recovery messages anymore unless the transfer was really

Along with that the recovery now prints more details about the
first bucket which triggered the real recovery.

Closes #274
Branch: http://github.com/tarantool/vshard/tree/gerold103/gh-274-user-friendly-recovery
Issue: https://github.com/tarantool/vshard/issues/274

 vshard/storage/init.lua | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index 7045d91..8a019fa 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -736,21 +736,25 @@ local function recovery_step_by_type(type)
     local is_empty = true
     local recovered = 0
     local total = 0
+    local start_format = 'Starting %s buckets recovery step'
     for _, bucket in _bucket.index.status:pairs(type) do
         total = total + 1
         local bucket_id = bucket.id
         if M.rebalancer_transfering_buckets[bucket_id] then
             goto continue
-        if is_empty then
-            log.info('Starting %s buckets recovery step', type)
-        end
-        is_empty = false
-        local destination = M.replicasets[bucket.destination]
+        local peer_uuid = bucket.destination
+        local destination = M.replicasets[peer_uuid]
         if not destination or not destination.master then
             -- No replicaset master for a bucket. Wait until it
             -- appears.
+            if is_empty then
+                log.info(start_format, type)
+                log.error('Can not find for bucket %s its peer %s', bucket_id,
+                          peer_uuid)
+                is_empty = false
+            end
             goto continue
         local remote_bucket, err =
@@ -759,6 +763,15 @@ local function recovery_step_by_type(type)
         -- not be used to recovery anything. Try later.
         if not remote_bucket and (not err or err.type ~= 'ShardingError' or
                                   err.code ~= lerror.code.WRONG_BUCKET) then
+            if is_empty then
+                if err == nil then
+                    err = 'unknown'
+                end
+                log.info(start_format, type)
+                log.error('Error during recovery of bucket %s on replicaset '..
+                          '%s: %s', bucket_id, peer_uuid, err)
+                is_empty = false
+            end
             goto continue
         -- Do nothing until the bucket on both sides stopped
@@ -772,13 +785,20 @@ local function recovery_step_by_type(type)
         if not bucket or not bucket_is_transfer_in_progress(bucket) then
             goto continue
+        if is_empty then
+            log.info(start_format, type)
+        end
         if recovery_local_bucket_is_garbage(bucket, remote_bucket) then
             _bucket:update({bucket_id}, {{'=', 2, consts.BUCKET.GARBAGE}})
             recovered = recovered + 1
         elseif recovery_local_bucket_is_active(bucket, remote_bucket) then
             _bucket:replace({bucket_id, consts.BUCKET.ACTIVE})
             recovered = recovered + 1
+        elseif is_empty then
+            log.info('Bucket %s is %s local and %s on replicaset %s, waiting',
+                     bucket_id, bucket.status, remote_bucket.status, peer_uuid)
+        is_empty = false
     if not is_empty then
2.24.3 (Apple Git-128)

             reply	other threads:[~2021-06-02 21:34 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-02 21:34 Vladislav Shpilevoy via Tarantool-patches [this message]
2021-06-03 10:19 ` Oleg Babin via Tarantool-patches
2021-06-03 12:55   ` Vladislav Shpilevoy via Tarantool-patches
2021-06-03 13:22     ` Oleg Babin via Tarantool-patches
2021-06-03 14:13       ` Vladislav Shpilevoy via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ddf4570db038934ba121f5a3e99f690930faa869.1622669605.git.v.shpilevoy@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=olegrok@tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --cc=yaroslav.dynnikov@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH vshard 1/1] recovery: relax recovery messages verbosity' \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox