Tarantool development patches archive
From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: tarantool-patches@dev.tarantool.org,
	yaroslav.dynnikov@tarantool.org, olegrok@tarantool.org
Subject: [Tarantool-patches] [PATCH vshard 1/1] rebalancer: give more info at bucket_recv() fail
Date: Tue, 25 May 2021 22:42:59 +0200
Message-ID: <8d7e89a4884559963c02719fca0dc0720632fc73.1621975324.git.v.shpilevoy@tarantool.org>

vshard.storage.bucket_recv() used to raise the natural
space:insert(...) error without any additional info when it
failed, for instance, due to an incorrect format or a duplicate
key.

When such an error happens, it is very useful to know which space
and which tuple caused it. This patch enriches the space
insertion error with that information.

The new detailed error object and its message should help to fix
rebalancing issues, which quite often are caused by a schema
mismatch between replicasets. These are especially hard to debug
when there are tens or even hundreds of spaces.

The old way was fine for errors like "duplicate key" because on
the newest version of Tarantool the error contains the space name,
the old tuple and the new tuple. But errors like a tuple format
mismatch are still not very informative. VShard now tries to
enrich all the possible errors.

Closes #275
---
Branch: http://github.com/tarantool/vshard/tree/gerold103/gh-275-bucket_recv-detailed-error
Issue: https://github.com/tarantool/vshard/issues/275
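
For readers who want to try the idea outside Tarantool, here is a
minimal sketch of the pattern in plain Lua. It is not the vshard
code: new_mock_space() and recv_space_data() are hypothetical
names, the mock space only checks the first field for uniqueness,
and the manual undo loop merely stands in for box.rollback().

```lua
-- Hypothetical stand-in for a Tarantool space with a unique key in
-- the first field. Not a real box.space object.
local function new_mock_space(name)
    local space = {name = name, rows = {}}
    function space:insert(tuple)
        local key = tuple[1]
        if self.rows[key] ~= nil then
            error('Duplicate key exists in space "' .. self.name .. '"')
        end
        self.rows[key] = tuple
        return tuple
    end
    return space
end

-- Insert the tuples one by one. On the first failure undo the
-- inserts made so far and return nil plus a table saying which
-- bucket, space and tuple failed, and why.
local function recv_space_data(bucket_id, space, tuples)
    local inserted = {}
    for _, tuple in ipairs(tuples) do
        local ok, err = pcall(space.insert, space, tuple)
        if not ok then
            -- Assumption: a poor man's replacement of box.rollback().
            for _, key in ipairs(inserted) do
                space.rows[key] = nil
            end
            return nil, {
                bucket_id = bucket_id,
                space = space.name,
                tuple = tuple,
                reason = tostring(err),
            }
        end
        table.insert(inserted, tuple[1])
    end
    return true
end

-- The same data as in the new test: the key 1 already exists.
local space = new_mock_space('test')
space:insert({1, 4})
local res, err = recv_space_data(4, space, {{9, 4}, {10, 4}, {1, 4}})
```

Here res is nil, and err names the space 'test', the bucket 4 and
the tuple {1, 4}, with the original error text kept in err.reason.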

 test/storage/storage.result   | 47 +++++++++++++++++++++++++++++++++++
 test/storage/storage.test.lua | 17 +++++++++++++
 vshard/error.lua              |  5 ++++
 vshard/storage/init.lua       |  8 +++++-
 4 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/test/storage/storage.result b/test/storage/storage.result
index 570d9c6..5372059 100644
--- a/test/storage/storage.result
+++ b/test/storage/storage.result
@@ -521,6 +521,53 @@ while box.space._bucket:get{4} do vshard.storage.recovery_wakeup() fiber.sleep(0
 ---
 ...
 --
+-- gh-275: detailed info when couldn't insert into a space.
+--
+res, err = vshard.storage.bucket_recv(                                          \
+    4, util.replicasets[2], {{box.space.test.id, {{9, 4}, {10, 4}, {1, 4}}}})
+---
+...
+assert(not res)
+---
+- true
+...
+assert(err.space == 'test')
+---
+- true
+...
+assert(err.bucket_id == 4)
+---
+- true
+...
+assert(tostring(err.tuple) == '[1, 4]')
+---
+- true
+...
+assert(err.reason:match('Duplicate key exists') ~= nil)
+---
+- true
+...
+err = err.message
+---
+...
+assert(err:match('bucket 4 data in space "test" at tuple %[1, 4%]') ~= nil)
+---
+- true
+...
+assert(err:match('Duplicate key exists') ~= nil)
+---
+- true
+...
+while box.space._bucket:get{4} do                                               \
+    vshard.storage.recovery_wakeup() fiber.sleep(0.01)                          \
+end
+---
+...
+assert(box.space.test:get{9} == nil and box.space.test:get{10} == nil)
+---
+- true
+...
+--
 -- Bucket transfer
 --
 -- Transfer to unknown replicaset.
diff --git a/test/storage/storage.test.lua b/test/storage/storage.test.lua
index 494e2e8..d1f3f50 100644
--- a/test/storage/storage.test.lua
+++ b/test/storage/storage.test.lua
@@ -125,6 +125,23 @@ vshard.storage.bucket_recv(100, 'from_uuid', {{1000, {{1}}}})
 res, err = vshard.storage.bucket_recv(4, util.replicasets[2], {{1000, {{1}}}})
 util.portable_error(err)
 while box.space._bucket:get{4} do vshard.storage.recovery_wakeup() fiber.sleep(0.01) end
+--
+-- gh-275: detailed info when couldn't insert into a space.
+--
+res, err = vshard.storage.bucket_recv(                                          \
+    4, util.replicasets[2], {{box.space.test.id, {{9, 4}, {10, 4}, {1, 4}}}})
+assert(not res)
+assert(err.space == 'test')
+assert(err.bucket_id == 4)
+assert(tostring(err.tuple) == '[1, 4]')
+assert(err.reason:match('Duplicate key exists') ~= nil)
+err = err.message
+assert(err:match('bucket 4 data in space "test" at tuple %[1, 4%]') ~= nil)
+assert(err:match('Duplicate key exists') ~= nil)
+while box.space._bucket:get{4} do                                               \
+    vshard.storage.recovery_wakeup() fiber.sleep(0.01)                          \
+end
+assert(box.space.test:get{9} == nil and box.space.test:get{10} == nil)
 
 --
 -- Bucket transfer
diff --git a/vshard/error.lua b/vshard/error.lua
index b02bfe9..bcbcd71 100644
--- a/vshard/error.lua
+++ b/vshard/error.lua
@@ -149,6 +149,11 @@ local error_message_template = {
         msg = 'Can not delete a storage ref: %s',
         args = {'reason'},
     },
+    [30] = {
+        name = 'BUCKET_RECV_DATA_ERROR',
+        msg = 'Can not receive the bucket %s data in space "%s" at tuple %s: %s',
+        args = {'bucket_id', 'space', 'tuple', 'reason'},
+    }
 }
 
 --
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index 63e0398..7045d91 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -1254,7 +1254,13 @@ local function bucket_recv_xc(bucket_id, from, data, opts)
         end
         box.begin()
         for _, tuple in ipairs(space_data) do
-            space:insert(tuple)
+            local ok, err = pcall(space.insert, space, tuple)
+            if not ok then
+                box.rollback()
+                return nil, lerror.vshard(lerror.code.BUCKET_RECV_DATA_ERROR,
+                                          bucket_id, space.name,
+                                          box.tuple.new(tuple), err)
+            end
             limit = limit - 1
             if limit == 0 then
                 box.commit()
-- 
2.24.3 (Apple Git-128)
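
A note on the new error template: the sketch below shows, in plain
Lua, how a template with 'msg' and 'args' fields can produce both
the named fields and the final message that the new test checks.
make_error() is a hypothetical helper written for illustration. It
is an assumption about the general approach, not a copy of what
vshard/error.lua does.

```lua
-- Works on Lua 5.1/LuaJIT (global unpack) and on Lua 5.2+.
local unpack = table.unpack or unpack

-- The template added by the patch under the code 30.
local template = {
    name = 'BUCKET_RECV_DATA_ERROR',
    msg = 'Can not receive the bucket %s data in space "%s" at tuple %s: %s',
    args = {'bucket_id', 'space', 'tuple', 'reason'},
}

-- Hypothetical helper: save each positional argument under its
-- name from 'args' and format the message from the same values.
local function make_error(tmpl, ...)
    local err = {name = tmpl.name}
    local parts = {}
    for i, arg_name in ipairs(tmpl.args) do
        local value = select(i, ...)
        err[arg_name] = value
        -- Convert explicitly, %s does not accept every type on
        -- all Lua versions.
        parts[i] = tostring(value)
    end
    err.message = string.format(tmpl.msg, unpack(parts))
    return err
end

local err = make_error(template, 4, 'test', '[1, 4]',
                       'Duplicate key exists')
```

The named fields let the caller inspect err.space or err.tuple
directly instead of parsing err.message.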

