Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH vshard 1/1] storage: allow replica to boot before master
@ 2020-08-04 21:04 Vladislav Shpilevoy
  2020-08-04 22:45 ` Yaroslav Dynnikov
  0 siblings, 1 reply; 5+ messages in thread
From: Vladislav Shpilevoy @ 2020-08-04 21:04 UTC (permalink / raw)
  To: tarantool-patches, olegrok, yaroslav.dynnikov

Before the patch a replica couldn't call vshard.storage.cfg()
before the master is bootstrapped. That could lead to an exception
in the middle of vshard.storage.cfg().

It sounds not so bad and even looks unreachable, because replica
bootstrap anyway is blocked in box.cfg() until a master node is
started, otherwise the replica instance is terminated.
But the box.cfg bootstrap != vshard bootstrap.

It could be that both replica and master nodes are already
bootstrapped and working, but vshard.storage.cfg() wasn't called
yet. In that case vshard.storage.cfg() on the replica wouldn't
block in box.cfg(), and would happily fail a few lines later in
vshard.storage.cfg() at attempt to touch not existing
box.space._bucket.

That was the case for cartridge. The framework configures
instances in its own way before vshard.storage.cfg(). Then it
calls vshard.storage.cfg() on them in arbitrary order, and
sometimes replicas were configured earlier than the master.

The fail is fixed by simply skipping the _bucket's trigger
installation on the replica node. Because so far it is not needed
here for anything. The trigger's sole purpose is to increment
bucket generation used only for DML on _bucket on the master node.

Now vshard.storage.cfg() on replica is able to finish before
master does the same. However the replica still won't be able to
handle vshard.storage.* requests until receives vshard schema
from master.

Closes #237
---
Branch: http://github.com/tarantool/vshard/tree/gerold103/gh-237-boot-replica-first
Issue: https://github.com/tarantool/vshard/issues/237

 test/lua_libs/storage_template.lua      |  55 +++++++++++-
 test/reload_evolution/storage.result    |  10 +++
 test/reload_evolution/storage.test.lua  |   8 ++
 test/router/boot_replica_first.result   | 112 ++++++++++++++++++++++++
 test/router/boot_replica_first.test.lua |  42 +++++++++
 vshard/storage/init.lua                 |  18 +++-
 vshard/storage/reload_evolution.lua     |  10 ++-
 7 files changed, 251 insertions(+), 4 deletions(-)
 create mode 100644 test/router/boot_replica_first.result
 create mode 100644 test/router/boot_replica_first.test.lua

diff --git a/test/lua_libs/storage_template.lua b/test/lua_libs/storage_template.lua
index 29a753d..84e4180 100644
--- a/test/lua_libs/storage_template.lua
+++ b/test/lua_libs/storage_template.lua
@@ -1,5 +1,7 @@
 #!/usr/bin/env tarantool
 
+local luri = require('uri')
+
 NAME = require('fio').basename(arg[0], '.lua')
 fiber = require('fiber')
 test_run = require('test_run').new()
@@ -10,12 +12,63 @@ log = require('log')
 if not cfg.shard_index then
     cfg.shard_index = 'bucket_id'
 end
+instance_uuid = util.name_to_uuid[NAME]
+
+--
+-- Bootstrap the instance exactly like vshard does. But don't
+-- initialize any vshard-specific code.
+--
+local function boot_like_vshard()
+    assert(type(box.cfg) == 'function')
+    for rs_uuid, rs in pairs(cfg.sharding) do
+        for replica_uuid, replica in pairs(rs.replicas) do
+            if replica_uuid == instance_uuid then
+                local box_cfg = {replication = {}}
+                box_cfg.instance_uuid = replica_uuid
+                box_cfg.replicaset_uuid = rs_uuid
+                box_cfg.listen = replica.uri
+                box_cfg.read_only = not replica.master
+                box_cfg.replication_connect_quorum = 0
+                box_cfg.replication_timeout = 0.1
+                for _, replica in pairs(rs.replicas) do
+                    table.insert(box_cfg.replication, replica.uri)
+                end
+                box.cfg(box_cfg)
+                if not replica.master then
+                    return
+                end
+                local uri = luri.parse(replica.uri)
+                box.schema.user.create(uri.login, {
+                    password = uri.password, if_not_exists = true,
+                })
+                box.schema.user.grant(uri.login, 'super')
+                return
+            end
+        end
+    end
+    assert(false)
+end
+
+local omit_cfg = false
+local i = 1
+while arg[i] ~= nil do
+    local key = arg[i]
+    i = i + 1
+    if key == 'boot_before_cfg' then
+        boot_like_vshard()
+        omit_cfg = true
+    end
+end
 
 vshard = require('vshard')
 echo_count = 0
 cfg.replication_connect_timeout = 3
 cfg.replication_timeout = 0.1
-vshard.storage.cfg(cfg, util.name_to_uuid[NAME])
+
+if not omit_cfg then
+    vshard.storage.cfg(cfg, instance_uuid)
+end
+
 function bootstrap_storage(engine)
     box.once("testapp:schema:1", function()
         if rawget(_G, 'CHANGE_SPACE_IDS') then
diff --git a/test/reload_evolution/storage.result b/test/reload_evolution/storage.result
index ebf4fdc..4652c4f 100644
--- a/test/reload_evolution/storage.result
+++ b/test/reload_evolution/storage.result
@@ -92,6 +92,16 @@ test_run:grep_log('storage_2_a', 'vshard.storage.reload_evolution: upgraded to')
 ...
 vshard.storage.internal.reload_version
 ---
+- 2
+...
+--
+-- gh-237: should be only one trigger. During gh-237 the trigger installation
+-- became conditional and therefore required to remember the current trigger
+-- somewhere. When reloaded from the old version, the trigger needed to be
+-- fetched from _bucket:on_replace().
+--
+#box.space._bucket:on_replace()
+---
 - 1
 ...
 -- Make sure storage operates well.
diff --git a/test/reload_evolution/storage.test.lua b/test/reload_evolution/storage.test.lua
index 56c1693..06f7117 100644
--- a/test/reload_evolution/storage.test.lua
+++ b/test/reload_evolution/storage.test.lua
@@ -37,6 +37,14 @@ package.loaded['vshard.storage'] = nil
 vshard.storage = require("vshard.storage")
 test_run:grep_log('storage_2_a', 'vshard.storage.reload_evolution: upgraded to') ~= nil
 vshard.storage.internal.reload_version
+--
+-- gh-237: should be only one trigger. During gh-237 the trigger installation
+-- became conditional and therefore required to remember the current trigger
+-- somewhere. When reloaded from the old version, the trigger needed to be
+-- fetched from _bucket:on_replace().
+--
+#box.space._bucket:on_replace()
+
 -- Make sure storage operates well.
 vshard.storage.bucket_force_drop(2000)
 vshard.storage.bucket_force_create(2000)
diff --git a/test/router/boot_replica_first.result b/test/router/boot_replica_first.result
new file mode 100644
index 0000000..1705230
--- /dev/null
+++ b/test/router/boot_replica_first.result
@@ -0,0 +1,112 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+REPLICASET_1 = { 'box_1_a', 'box_1_b', 'box_1_c' }
+ | ---
+ | ...
+test_run:create_cluster(REPLICASET_1, 'router', {args = 'boot_before_cfg'})
+ | ---
+ | ...
+util = require('util')
+ | ---
+ | ...
+util.wait_master(test_run, REPLICASET_1, 'box_1_a')
+ | ---
+ | ...
+_ = test_run:cmd("create server router with script='router/router_2.lua'")
+ | ---
+ | ...
+_ = test_run:cmd("start server router")
+ | ---
+ | ...
+
+--
+-- gh-237: replica should be able to boot before master. Before the issue was
+-- fixed, replica always tried to install a trigger on _bucket space even when
+-- it was not created on a master yet - that lead to an exception in
+-- storage.cfg. Now it should not install the trigger at all, because anyway it
+-- is not needed on replica for anything.
+--
+
+test_run:switch('box_1_b')
+ | ---
+ | - true
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+-- _bucket is not created yet. Will fail.
+util.check_error(vshard.storage.call, 1, 'read', 'echo', {100})
+ | ---
+ | - attempt to index field '_bucket' (a nil value)
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+util.map_evals(test_run, {REPLICASET_1}, 'bootstrap_storage(\'memtx\')')
+ | ---
+ | ...
+
+test_run:switch('box_1_a')
+ | ---
+ | - true
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('box_1_b')
+ | ---
+ | - true
+ | ...
+test_run:wait_lsn('box_1_b', 'box_1_a')
+ | ---
+ | ...
+-- Fails, but gracefully. Because the bucket is not found here.
+vshard.storage.call(1, 'read', 'echo', {100})
+ | ---
+ | - null
+ | - bucket_id: 1
+ |   reason: Not found
+ |   code: 1
+ |   type: ShardingError
+ |   message: 'Cannot perform action with bucket 1, reason: Not found'
+ |   name: WRONG_BUCKET
+ | ...
+-- Should not have triggers.
+#box.space._bucket:on_replace()
+ | ---
+ | - 0
+ | ...
+
+test_run:switch('router')
+ | ---
+ | - true
+ | ...
+vshard.router.bootstrap()
+ | ---
+ | - true
+ | ...
+vshard.router.callro(1, 'echo', {100})
+ | ---
+ | - 100
+ | ...
+
+test_run:switch("default")
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server router')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server router')
+ | ---
+ | - true
+ | ...
+test_run:drop_cluster(REPLICASET_1)
+ | ---
+ | ...
diff --git a/test/router/boot_replica_first.test.lua b/test/router/boot_replica_first.test.lua
new file mode 100644
index 0000000..7b1b3fd
--- /dev/null
+++ b/test/router/boot_replica_first.test.lua
@@ -0,0 +1,42 @@
+test_run = require('test_run').new()
+REPLICASET_1 = { 'box_1_a', 'box_1_b', 'box_1_c' }
+test_run:create_cluster(REPLICASET_1, 'router', {args = 'boot_before_cfg'})
+util = require('util')
+util.wait_master(test_run, REPLICASET_1, 'box_1_a')
+_ = test_run:cmd("create server router with script='router/router_2.lua'")
+_ = test_run:cmd("start server router")
+
+--
+-- gh-237: replica should be able to boot before master. Before the issue was
+-- fixed, replica always tried to install a trigger on _bucket space even when
+-- it was not created on a master yet - that lead to an exception in
+-- storage.cfg. Now it should not install the trigger at all, because anyway it
+-- is not needed on replica for anything.
+--
+
+test_run:switch('box_1_b')
+vshard.storage.cfg(cfg, instance_uuid)
+-- _bucket is not created yet. Will fail.
+util.check_error(vshard.storage.call, 1, 'read', 'echo', {100})
+
+test_run:switch('default')
+util.map_evals(test_run, {REPLICASET_1}, 'bootstrap_storage(\'memtx\')')
+
+test_run:switch('box_1_a')
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('box_1_b')
+test_run:wait_lsn('box_1_b', 'box_1_a')
+-- Fails, but gracefully. Because the bucket is not found here.
+vshard.storage.call(1, 'read', 'echo', {100})
+-- Should not have triggers.
+#box.space._bucket:on_replace()
+
+test_run:switch('router')
+vshard.router.bootstrap()
+vshard.router.callro(1, 'echo', {100})
+
+test_run:switch("default")
+test_run:cmd('stop server router')
+test_run:cmd('delete server router')
+test_run:drop_cluster(REPLICASET_1)
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index c6a78fe..ed577f9 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -94,6 +94,17 @@ if not M then
         -- detect that _bucket was not changed between yields.
         --
         bucket_generation = 0,
+        --
+        -- Reference to the function used as on_replace trigger on
+        -- _bucket space. It is used to replace the trigger with
+        -- a new function when reload happens. It is kept
+        -- explicitly because the old function is deleted on
+        -- reload from the global namespace. On the other hand, it
+        -- is still stored in _bucket:on_replace() somewhere, but
+        -- it is not known where. The only 100% way to be able to
+        -- replace the old function is to keep its reference.
+        --
+        bucket_on_replace = nil,
 
         ------------------- Garbage collection -------------------
         -- Fiber to remove garbage buckets data.
@@ -2435,8 +2446,11 @@ local function storage_cfg(cfg, this_replica_uuid, is_reload)
     local uri = luri.parse(this_replica.uri)
     schema_upgrade(is_master, uri.login, uri.password)
 
-    local old_trigger = box.space._bucket:on_replace()[1]
-    box.space._bucket:on_replace(bucket_generation_increment, old_trigger)
+    if is_master then
+        local old_trigger = M.bucket_on_replace
+        box.space._bucket:on_replace(bucket_generation_increment, old_trigger)
+        M.bucket_on_replace = bucket_generation_increment
+    end
 
     lreplicaset.rebind_replicasets(new_replicasets, M.replicasets)
     lreplicaset.outdate_replicasets(M.replicasets)
diff --git a/vshard/storage/reload_evolution.lua b/vshard/storage/reload_evolution.lua
index 5d09b11..f38af74 100644
--- a/vshard/storage/reload_evolution.lua
+++ b/vshard/storage/reload_evolution.lua
@@ -17,6 +17,14 @@ migrations[#migrations + 1] = function(M)
     -- Code to update Lua objects.
 end
 
+migrations[#migrations + 1] = function(M)
+    local bucket = box.space._bucket
+    if bucket then
+        assert(M.bucket_on_replace == nil)
+        M.bucket_on_replace = bucket:on_replace()[1]
+    end
+end
+
 --
 -- Perform an update based on a version stored in `M` (internals).
 -- @param M Old module internals which should be updated.
@@ -33,7 +41,7 @@ local function upgrade(M)
         log.error(err_msg)
         error(err_msg)
     end
-    for i = start_version, #migrations  do
+    for i = start_version + 1, #migrations do
         local ok, err = pcall(migrations[i], M)
         if ok then
             log.info('vshard.storage.reload_evolution: upgraded to %d version',
-- 
2.21.1 (Apple Git-122.3)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 1/1] storage: allow replica to boot before master
  2020-08-04 21:04 [Tarantool-patches] [PATCH vshard 1/1] storage: allow replica to boot before master Vladislav Shpilevoy
@ 2020-08-04 22:45 ` Yaroslav Dynnikov
  2020-08-05  9:46   ` Yaroslav Dynnikov
  2020-08-05 22:15   ` Vladislav Shpilevoy
  0 siblings, 2 replies; 5+ messages in thread
From: Yaroslav Dynnikov @ 2020-08-04 22:45 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: Yaroslav Dynnikov, tml

[-- Attachment #1: Type: text/plain, Size: 14428 bytes --]

Hi, Vlad

Thanks for the fix a lot. In general it looks good to me, but I've got few
questions. Find them below.

Best regards
Yaroslav Dynnikov


On Wed, 5 Aug 2020 at 00:04, Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
wrote:

> Before the patch a replica couldn't call vshard.storage.cfg()
> before the master is bootstrapped. That could lead to an exception
> in the middle of vshard.storage.cfg().
>
> It sounds not so bad and even looks unreachable, because replica
> bootstrap anyway is blocked in box.cfg() until a master node is
> started, otherwise the replica instance is terminated.
> But the box.cfg bootstrap != vshard bootstrap.
>
> It could be that both replica and master nodes are already
> bootstrapped and working, but vshard.storage.cfg() wasn't called
> yet. In that case vshard.storage.cfg() on the replica wouldn't
> block in box.cfg(), and would happily fail a few lines later in
> vshard.storage.cfg() at attempt to touch not existing
> box.space._bucket.
>
> That was the case for cartridge. The framework configures
> instances in its own way before vshard.storage.cfg(). Then it
> calls vshard.storage.cfg() on them in arbitrary order, and
> sometimes replicas were configured earlier than the master.
>
> The fail is fixed by simply skipping the _bucket's trigger
> installation on the replica node. Because so far it is not needed
> here for anything. The trigger's sole purpose is to increment
> bucket generation used only for DML on _bucket on the master node.
>
> Now vshard.storage.cfg() on replica is able to finish before
> master does the same. However the replica still won't be able to
> handle vshard.storage.* requests until receives vshard schema
> from master.
>
> Closes #237
> ---
> Branch:
> http://github.com/tarantool/vshard/tree/gerold103/gh-237-boot-replica-first
> Issue: https://github.com/tarantool/vshard/issues/237
>
>  test/lua_libs/storage_template.lua      |  55 +++++++++++-
>  test/reload_evolution/storage.result    |  10 +++
>  test/reload_evolution/storage.test.lua  |   8 ++
>  test/router/boot_replica_first.result   | 112 ++++++++++++++++++++++++
>  test/router/boot_replica_first.test.lua |  42 +++++++++
>  vshard/storage/init.lua                 |  18 +++-
>  vshard/storage/reload_evolution.lua     |  10 ++-
>  7 files changed, 251 insertions(+), 4 deletions(-)
>  create mode 100644 test/router/boot_replica_first.result
>  create mode 100644 test/router/boot_replica_first.test.lua
>
> diff --git a/test/lua_libs/storage_template.lua
> b/test/lua_libs/storage_template.lua
> index 29a753d..84e4180 100644
> --- a/test/lua_libs/storage_template.lua
> +++ b/test/lua_libs/storage_template.lua
> @@ -1,5 +1,7 @@
>  #!/usr/bin/env tarantool
>
> +local luri = require('uri')
> +
>  NAME = require('fio').basename(arg[0], '.lua')
>  fiber = require('fiber')
>  test_run = require('test_run').new()
> @@ -10,12 +12,63 @@ log = require('log')
>  if not cfg.shard_index then
>      cfg.shard_index = 'bucket_id'
>  end
> +instance_uuid = util.name_to_uuid[NAME]
> +
> +--
> +-- Bootstrap the instance exactly like vshard does. But don't
> +-- initialize any vshard-specific code.
> +--
> +local function boot_like_vshard()
> +    assert(type(box.cfg) == 'function')
> +    for rs_uuid, rs in pairs(cfg.sharding) do
> +        for replica_uuid, replica in pairs(rs.replicas) do
> +            if replica_uuid == instance_uuid then
> +                local box_cfg = {replication = {}}
> +                box_cfg.instance_uuid = replica_uuid
> +                box_cfg.replicaset_uuid = rs_uuid
> +                box_cfg.listen = replica.uri
> +                box_cfg.read_only = not replica.master
> +                box_cfg.replication_connect_quorum = 0
> +                box_cfg.replication_timeout = 0.1
> +                for _, replica in pairs(rs.replicas) do
> +                    table.insert(box_cfg.replication, replica.uri)
> +                end
> +                box.cfg(box_cfg)
> +                if not replica.master then
> +                    return
> +                end
> +                local uri = luri.parse(replica.uri)
> +                box.schema.user.create(uri.login, {
> +                    password = uri.password, if_not_exists = true,
> +                })
> +                box.schema.user.grant(uri.login, 'super')
> +                return
> +            end
> +        end
> +    end
> +    assert(false)
>

Should there be some meaningful message?


> +end
> +
> +local omit_cfg = false
> +local i = 1
> +while arg[i] ~= nil do
> +    local key = arg[i]
> +    i = i + 1
> +    if key == 'boot_before_cfg' then
> +        boot_like_vshard()
> +        omit_cfg = true
> +    end
> +end
>
>  vshard = require('vshard')
>  echo_count = 0
>  cfg.replication_connect_timeout = 3
>  cfg.replication_timeout = 0.1
> -vshard.storage.cfg(cfg, util.name_to_uuid[NAME])
> +
> +if not omit_cfg then
> +    vshard.storage.cfg(cfg, instance_uuid)
> +end
> +
>  function bootstrap_storage(engine)
>      box.once("testapp:schema:1", function()
>          if rawget(_G, 'CHANGE_SPACE_IDS') then
> diff --git a/test/reload_evolution/storage.result
> b/test/reload_evolution/storage.result
> index ebf4fdc..4652c4f 100644
> --- a/test/reload_evolution/storage.result
> +++ b/test/reload_evolution/storage.result
> @@ -92,6 +92,16 @@ test_run:grep_log('storage_2_a',
> 'vshard.storage.reload_evolution: upgraded to')
>  ...
>  vshard.storage.internal.reload_version
>  ---
> +- 2
> +...
> +--
> +-- gh-237: should be only one trigger. During gh-237 the trigger
> installation
> +-- became conditional and therefore required to remember the current
> trigger
> +-- somewhere. When reloaded from the old version, the trigger needed to be
> +-- fetched from _bucket:on_replace().
> +--
> +#box.space._bucket:on_replace()
> +---
>  - 1
>  ...
>  -- Make sure storage operates well.
> diff --git a/test/reload_evolution/storage.test.lua
> b/test/reload_evolution/storage.test.lua
> index 56c1693..06f7117 100644
> --- a/test/reload_evolution/storage.test.lua
> +++ b/test/reload_evolution/storage.test.lua
> @@ -37,6 +37,14 @@ package.loaded['vshard.storage'] = nil
>  vshard.storage = require("vshard.storage")
>  test_run:grep_log('storage_2_a', 'vshard.storage.reload_evolution:
> upgraded to') ~= nil
>  vshard.storage.internal.reload_version
> +--
> +-- gh-237: should be only one trigger. During gh-237 the trigger
> installation
> +-- became conditional and therefore required to remember the current
> trigger
> +-- somewhere. When reloaded from the old version, the trigger needed to be
> +-- fetched from _bucket:on_replace().
> +--
> +#box.space._bucket:on_replace()
> +
>  -- Make sure storage operates well.
>  vshard.storage.bucket_force_drop(2000)
>  vshard.storage.bucket_force_create(2000)
> diff --git a/test/router/boot_replica_first.result
> b/test/router/boot_replica_first.result
> new file mode 100644
> index 0000000..1705230
> --- /dev/null
> +++ b/test/router/boot_replica_first.result
> @@ -0,0 +1,112 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +REPLICASET_1 = { 'box_1_a', 'box_1_b', 'box_1_c' }
> + | ---
> + | ...
> +test_run:create_cluster(REPLICASET_1, 'router', {args =
> 'boot_before_cfg'})
> + | ---
> + | ...
> +util = require('util')
> + | ---
> + | ...
> +util.wait_master(test_run, REPLICASET_1, 'box_1_a')
> + | ---
> + | ...
> +_ = test_run:cmd("create server router with script='router/router_2.lua'")
> + | ---
> + | ...
> +_ = test_run:cmd("start server router")
> + | ---
> + | ...
> +
> +--
> +-- gh-237: replica should be able to boot before master. Before the issue
> was
> +-- fixed, replica always tried to install a trigger on _bucket space even
> when
> +-- it was not created on a master yet - that lead to an exception in
> +-- storage.cfg. Now it should not install the trigger at all, because
> anyway it
> +-- is not needed on replica for anything.
> +--
> +
> +test_run:switch('box_1_b')
> + | ---
> + | - true
> + | ...
> +vshard.storage.cfg(cfg, instance_uuid)
> + | ---
> + | ...
> +-- _bucket is not created yet. Will fail.
> +util.check_error(vshard.storage.call, 1, 'read', 'echo', {100})
> + | ---
> + | - attempt to index field '_bucket' (a nil value)
> + | ...
> +
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +util.map_evals(test_run, {REPLICASET_1}, 'bootstrap_storage(\'memtx\')')
> + | ---
> + | ...
> +
> +test_run:switch('box_1_a')
> + | ---
> + | - true
> + | ...
> +vshard.storage.cfg(cfg, instance_uuid)
> + | ---
> + | ...
> +
> +test_run:switch('box_1_b')
> + | ---
> + | - true
> + | ...
> +test_run:wait_lsn('box_1_b', 'box_1_a')
> + | ---
> + | ...
> +-- Fails, but gracefully. Because the bucket is not found here.
> +vshard.storage.call(1, 'read', 'echo', {100})
> + | ---
> + | - null
> + | - bucket_id: 1
> + |   reason: Not found
> + |   code: 1
> + |   type: ShardingError
> + |   message: 'Cannot perform action with bucket 1, reason: Not found'
> + |   name: WRONG_BUCKET
> + | ...
> +-- Should not have triggers.
> +#box.space._bucket:on_replace()
> + | ---
> + | - 0
> + | ...
> +
> +test_run:switch('router')
> + | ---
> + | - true
> + | ...
> +vshard.router.bootstrap()
> + | ---
> + | - true
> + | ...
> +vshard.router.callro(1, 'echo', {100})
> + | ---
> + | - 100
> + | ...
> +
> +test_run:switch("default")
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server router')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server router')
> + | ---
> + | - true
> + | ...
> +test_run:drop_cluster(REPLICASET_1)
> + | ---
> + | ...
> diff --git a/test/router/boot_replica_first.test.lua
> b/test/router/boot_replica_first.test.lua
> new file mode 100644
> index 0000000..7b1b3fd
> --- /dev/null
> +++ b/test/router/boot_replica_first.test.lua
> @@ -0,0 +1,42 @@
> +test_run = require('test_run').new()
> +REPLICASET_1 = { 'box_1_a', 'box_1_b', 'box_1_c' }
> +test_run:create_cluster(REPLICASET_1, 'router', {args =
> 'boot_before_cfg'})
> +util = require('util')
> +util.wait_master(test_run, REPLICASET_1, 'box_1_a')
> +_ = test_run:cmd("create server router with script='router/router_2.lua'")
> +_ = test_run:cmd("start server router")
> +
> +--
> +-- gh-237: replica should be able to boot before master. Before the issue
> was
> +-- fixed, replica always tried to install a trigger on _bucket space even
> when
> +-- it was not created on a master yet - that lead to an exception in
> +-- storage.cfg. Now it should not install the trigger at all, because
> anyway it
> +-- is not needed on replica for anything.
> +--
> +
> +test_run:switch('box_1_b')
> +vshard.storage.cfg(cfg, instance_uuid)
> +-- _bucket is not created yet. Will fail.
> +util.check_error(vshard.storage.call, 1, 'read', 'echo', {100})
> +
> +test_run:switch('default')
> +util.map_evals(test_run, {REPLICASET_1}, 'bootstrap_storage(\'memtx\')')
> +
> +test_run:switch('box_1_a')
> +vshard.storage.cfg(cfg, instance_uuid)
> +
> +test_run:switch('box_1_b')
> +test_run:wait_lsn('box_1_b', 'box_1_a')
> +-- Fails, but gracefully. Because the bucket is not found here.
> +vshard.storage.call(1, 'read', 'echo', {100})
> +-- Should not have triggers.
> +#box.space._bucket:on_replace()
> +
> +test_run:switch('router')
> +vshard.router.bootstrap()
> +vshard.router.callro(1, 'echo', {100})
> +
> +test_run:switch("default")
> +test_run:cmd('stop server router')
> +test_run:cmd('delete server router')
> +test_run:drop_cluster(REPLICASET_1)
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index c6a78fe..ed577f9 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -94,6 +94,17 @@ if not M then
>          -- detect that _bucket was not changed between yields.
>          --
>          bucket_generation = 0,
> +        --
> +        -- Reference to the function used as on_replace trigger on
> +        -- _bucket space. It is used to replace the trigger with
> +        -- a new function when reload happens. It is kept
> +        -- explicitly because the old function is deleted on
> +        -- reload from the global namespace. On the other hand, it
> +        -- is still stored in _bucket:on_replace() somewhere, but
> +        -- it is not known where. The only 100% way to be able to
> +        -- replace the old function is to keep its reference.
> +        --
> +        bucket_on_replace = nil,
>
>          ------------------- Garbage collection -------------------
>          -- Fiber to remove garbage buckets data.
> @@ -2435,8 +2446,11 @@ local function storage_cfg(cfg, this_replica_uuid,
> is_reload)
>      local uri = luri.parse(this_replica.uri)
>      schema_upgrade(is_master, uri.login, uri.password)
>
> -    local old_trigger = box.space._bucket:on_replace()[1]
> -    box.space._bucket:on_replace(bucket_generation_increment, old_trigger)
> +    if is_master then
> +        local old_trigger = M.bucket_on_replace
> +        box.space._bucket:on_replace(bucket_generation_increment,
> old_trigger)
> +        M.bucket_on_replace = bucket_generation_increment
> +    end
>
>      lreplicaset.rebind_replicasets(new_replicasets, M.replicasets)
>      lreplicaset.outdate_replicasets(M.replicasets)
> diff --git a/vshard/storage/reload_evolution.lua
> b/vshard/storage/reload_evolution.lua
> index 5d09b11..f38af74 100644
> --- a/vshard/storage/reload_evolution.lua
> +++ b/vshard/storage/reload_evolution.lua
> @@ -17,6 +17,14 @@ migrations[#migrations + 1] = function(M)
>      -- Code to update Lua objects.
>  end
>
> +migrations[#migrations + 1] = function(M)
> +    local bucket = box.space._bucket
> +    if bucket then
> +        assert(M.bucket_on_replace == nil)
> +        M.bucket_on_replace = bucket:on_replace()[1]
> +    end
> +end
> +
>  --
>  -- Perform an update based on a version stored in `M` (internals).
>  -- @param M Old module internals which should be updated.
> @@ -33,7 +41,7 @@ local function upgrade(M)
>          log.error(err_msg)
>          error(err_msg)
>      end
> -    for i = start_version, #migrations  do
> +    for i = start_version + 1, #migrations do
>

Is it already tested somewhere else (migrations I mean)? This change looks
like a fix for some other problem.


>          local ok, err = pcall(migrations[i], M)
>          if ok then
>              log.info('vshard.storage.reload_evolution: upgraded to %d
> version',
> --
> 2.21.1 (Apple Git-122.3)
>
>

[-- Attachment #2: Type: text/html, Size: 17873 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 1/1] storage: allow replica to boot before master
  2020-08-04 22:45 ` Yaroslav Dynnikov
@ 2020-08-05  9:46   ` Yaroslav Dynnikov
  2020-08-05 22:15     ` Vladislav Shpilevoy
  2020-08-05 22:15   ` Vladislav Shpilevoy
  1 sibling, 1 reply; 5+ messages in thread
From: Yaroslav Dynnikov @ 2020-08-05  9:46 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: Yaroslav Dynnikov, tml

[-- Attachment #1: Type: text/plain, Size: 15173 bytes --]

And one more question.

Best regards
Yaroslav Dynnikov


On Wed, 5 Aug 2020 at 01:45, Yaroslav Dynnikov <
yaroslav.dynnikov@tarantool.org> wrote:

> Hi, Vlad
>
> Thanks for the fix a lot. In general it looks good to me, but I've got few
> questions. Find them below.
>
> Best regards
> Yaroslav Dynnikov
>
>
> On Wed, 5 Aug 2020 at 00:04, Vladislav Shpilevoy <
> v.shpilevoy@tarantool.org> wrote:
>
>> Before the patch a replica couldn't call vshard.storage.cfg()
>> before the master is bootstrapped. That could lead to an exception
>> in the middle of vshard.storage.cfg().
>>
>> It sounds not so bad and even looks unreachable, because replica
>> bootstrap anyway is blocked in box.cfg() until a master node is
>> started, otherwise the replica instance is terminated.
>> But the box.cfg bootstrap != vshard bootstrap.
>>
>> It could be that both replica and master nodes are already
>> bootstrapped and working, but vshard.storage.cfg() wasn't called
>> yet. In that case vshard.storage.cfg() on the replica wouldn't
>> block in box.cfg(), and would happily fail a few lines later in
>> vshard.storage.cfg() at attempt to touch not existing
>> box.space._bucket.
>>
>> That was the case for cartridge. The framework configures
>> instances in its own way before vshard.storage.cfg(). Then it
>> calls vshard.storage.cfg() on them in arbitrary order, and
>> sometimes replicas were configured earlier than the master.
>>
>> The fail is fixed by simply skipping the _bucket's trigger
>> installation on the replica node. Because so far it is not needed
>> here for anything. The trigger's sole purpose is to increment
>> bucket generation used only for DML on _bucket on the master node.
>>
>> Now vshard.storage.cfg() on replica is able to finish before
>> master does the same. However the replica still won't be able to
>> handle vshard.storage.* requests until receives vshard schema
>> from master.
>>
>> Closes #237
>> ---
>> Branch:
>> http://github.com/tarantool/vshard/tree/gerold103/gh-237-boot-replica-first
>> Issue: https://github.com/tarantool/vshard/issues/237
>>
>>  test/lua_libs/storage_template.lua      |  55 +++++++++++-
>>  test/reload_evolution/storage.result    |  10 +++
>>  test/reload_evolution/storage.test.lua  |   8 ++
>>  test/router/boot_replica_first.result   | 112 ++++++++++++++++++++++++
>>  test/router/boot_replica_first.test.lua |  42 +++++++++
>>  vshard/storage/init.lua                 |  18 +++-
>>  vshard/storage/reload_evolution.lua     |  10 ++-
>>  7 files changed, 251 insertions(+), 4 deletions(-)
>>  create mode 100644 test/router/boot_replica_first.result
>>  create mode 100644 test/router/boot_replica_first.test.lua
>>
>> diff --git a/test/lua_libs/storage_template.lua
>> b/test/lua_libs/storage_template.lua
>> index 29a753d..84e4180 100644
>> --- a/test/lua_libs/storage_template.lua
>> +++ b/test/lua_libs/storage_template.lua
>> @@ -1,5 +1,7 @@
>>  #!/usr/bin/env tarantool
>>
>> +local luri = require('uri')
>> +
>>  NAME = require('fio').basename(arg[0], '.lua')
>>  fiber = require('fiber')
>>  test_run = require('test_run').new()
>> @@ -10,12 +12,63 @@ log = require('log')
>>  if not cfg.shard_index then
>>      cfg.shard_index = 'bucket_id'
>>  end
>> +instance_uuid = util.name_to_uuid[NAME]
>> +
>> +--
>> +-- Bootstrap the instance exactly like vshard does. But don't
>> +-- initialize any vshard-specific code.
>> +--
>> +local function boot_like_vshard()
>> +    assert(type(box.cfg) == 'function')
>> +    for rs_uuid, rs in pairs(cfg.sharding) do
>> +        for replica_uuid, replica in pairs(rs.replicas) do
>> +            if replica_uuid == instance_uuid then
>> +                local box_cfg = {replication = {}}
>> +                box_cfg.instance_uuid = replica_uuid
>> +                box_cfg.replicaset_uuid = rs_uuid
>> +                box_cfg.listen = replica.uri
>> +                box_cfg.read_only = not replica.master
>> +                box_cfg.replication_connect_quorum = 0
>> +                box_cfg.replication_timeout = 0.1
>> +                for _, replica in pairs(rs.replicas) do
>> +                    table.insert(box_cfg.replication, replica.uri)
>> +                end
>> +                box.cfg(box_cfg)
>> +                if not replica.master then
>> +                    return
>> +                end
>> +                local uri = luri.parse(replica.uri)
>> +                box.schema.user.create(uri.login, {
>> +                    password = uri.password, if_not_exists = true,
>> +                })
>> +                box.schema.user.grant(uri.login, 'super')
>> +                return
>> +            end
>> +        end
>> +    end
>> +    assert(false)
>>
>
> Should there be some meaningful message?
>
>
>> +end
>> +
>> +local omit_cfg = false
>> +local i = 1
>> +while arg[i] ~= nil do
>> +    local key = arg[i]
>> +    i = i + 1
>> +    if key == 'boot_before_cfg' then
>> +        boot_like_vshard()
>> +        omit_cfg = true
>> +    end
>> +end
>>
>>  vshard = require('vshard')
>>  echo_count = 0
>>  cfg.replication_connect_timeout = 3
>>  cfg.replication_timeout = 0.1
>> -vshard.storage.cfg(cfg, util.name_to_uuid[NAME])
>> +
>> +if not omit_cfg then
>> +    vshard.storage.cfg(cfg, instance_uuid)
>> +end
>> +
>>  function bootstrap_storage(engine)
>>      box.once("testapp:schema:1", function()
>>          if rawget(_G, 'CHANGE_SPACE_IDS') then
>> diff --git a/test/reload_evolution/storage.result
>> b/test/reload_evolution/storage.result
>> index ebf4fdc..4652c4f 100644
>> --- a/test/reload_evolution/storage.result
>> +++ b/test/reload_evolution/storage.result
>> @@ -92,6 +92,16 @@ test_run:grep_log('storage_2_a',
>> 'vshard.storage.reload_evolution: upgraded to')
>>  ...
>>  vshard.storage.internal.reload_version
>>  ---
>> +- 2
>> +...
>> +--
>> +-- gh-237: should be only one trigger. During gh-237 the trigger
>> installation
>> +-- became conditional and therefore required to remember the current
>> trigger
>> +-- somewhere. When reloaded from the old version, the trigger needed to
>> be
>> +-- fetched from _bucket:on_replace().
>> +--
>> +#box.space._bucket:on_replace()
>> +---
>>  - 1
>>  ...
>>  -- Make sure storage operates well.
>> diff --git a/test/reload_evolution/storage.test.lua
>> b/test/reload_evolution/storage.test.lua
>> index 56c1693..06f7117 100644
>> --- a/test/reload_evolution/storage.test.lua
>> +++ b/test/reload_evolution/storage.test.lua
>> @@ -37,6 +37,14 @@ package.loaded['vshard.storage'] = nil
>>  vshard.storage = require("vshard.storage")
>>  test_run:grep_log('storage_2_a', 'vshard.storage.reload_evolution:
>> upgraded to') ~= nil
>>  vshard.storage.internal.reload_version
>> +--
>> +-- gh-237: should be only one trigger. During gh-237 the trigger
>> installation
>> +-- became conditional and therefore required to remember the current
>> trigger
>> +-- somewhere. When reloaded from the old version, the trigger needed to
>> be
>> +-- fetched from _bucket:on_replace().
>> +--
>> +#box.space._bucket:on_replace()
>> +
>>  -- Make sure storage operates well.
>>  vshard.storage.bucket_force_drop(2000)
>>  vshard.storage.bucket_force_create(2000)
>> diff --git a/test/router/boot_replica_first.result
>> b/test/router/boot_replica_first.result
>> new file mode 100644
>> index 0000000..1705230
>> --- /dev/null
>> +++ b/test/router/boot_replica_first.result
>> @@ -0,0 +1,112 @@
>> +-- test-run result file version 2
>> +test_run = require('test_run').new()
>> + | ---
>> + | ...
>> +REPLICASET_1 = { 'box_1_a', 'box_1_b', 'box_1_c' }
>> + | ---
>> + | ...
>> +test_run:create_cluster(REPLICASET_1, 'router', {args =
>> 'boot_before_cfg'})
>> + | ---
>> + | ...
>> +util = require('util')
>> + | ---
>> + | ...
>> +util.wait_master(test_run, REPLICASET_1, 'box_1_a')
>> + | ---
>> + | ...
>> +_ = test_run:cmd("create server router with
>> script='router/router_2.lua'")
>> + | ---
>> + | ...
>> +_ = test_run:cmd("start server router")
>> + | ---
>> + | ...
>> +
>> +--
>> +-- gh-237: replica should be able to boot before master. Before the
>> issue was
>> +-- fixed, replica always tried to install a trigger on _bucket space
>> even when
>> +-- it was not created on a master yet - that lead to an exception in
>> +-- storage.cfg. Now it should not install the trigger at all, because
>> anyway it
>> +-- is not needed on replica for anything.
>> +--
>> +
>> +test_run:switch('box_1_b')
>> + | ---
>> + | - true
>> + | ...
>> +vshard.storage.cfg(cfg, instance_uuid)
>> + | ---
>> + | ...
>> +-- _bucket is not created yet. Will fail.
>> +util.check_error(vshard.storage.call, 1, 'read', 'echo', {100})
>> + | ---
>> + | - attempt to index field '_bucket' (a nil value)
>> + | ...
>> +
>> +test_run:switch('default')
>> + | ---
>> + | - true
>> + | ...
>> +util.map_evals(test_run, {REPLICASET_1}, 'bootstrap_storage(\'memtx\')')
>> + | ---
>> + | ...
>> +
>> +test_run:switch('box_1_a')
>> + | ---
>> + | - true
>> + | ...
>> +vshard.storage.cfg(cfg, instance_uuid)
>> + | ---
>> + | ...
>> +
>> +test_run:switch('box_1_b')
>> + | ---
>> + | - true
>> + | ...
>> +test_run:wait_lsn('box_1_b', 'box_1_a')
>> + | ---
>> + | ...
>> +-- Fails, but gracefully. Because the bucket is not found here.
>> +vshard.storage.call(1, 'read', 'echo', {100})
>> + | ---
>> + | - null
>> + | - bucket_id: 1
>> + |   reason: Not found
>> + |   code: 1
>> + |   type: ShardingError
>> + |   message: 'Cannot perform action with bucket 1, reason: Not found'
>> + |   name: WRONG_BUCKET
>> + | ...
>> +-- Should not have triggers.
>> +#box.space._bucket:on_replace()
>> + | ---
>> + | - 0
>> + | ...
>> +
>> +test_run:switch('router')
>> + | ---
>> + | - true
>> + | ...
>> +vshard.router.bootstrap()
>> + | ---
>> + | - true
>> + | ...
>> +vshard.router.callro(1, 'echo', {100})
>> + | ---
>> + | - 100
>> + | ...
>> +
>> +test_run:switch("default")
>> + | ---
>> + | - true
>> + | ...
>> +test_run:cmd('stop server router')
>> + | ---
>> + | - true
>> + | ...
>> +test_run:cmd('delete server router')
>> + | ---
>> + | - true
>> + | ...
>> +test_run:drop_cluster(REPLICASET_1)
>> + | ---
>> + | ...
>> diff --git a/test/router/boot_replica_first.test.lua
>> b/test/router/boot_replica_first.test.lua
>> new file mode 100644
>> index 0000000..7b1b3fd
>> --- /dev/null
>> +++ b/test/router/boot_replica_first.test.lua
>> @@ -0,0 +1,42 @@
>> +test_run = require('test_run').new()
>> +REPLICASET_1 = { 'box_1_a', 'box_1_b', 'box_1_c' }
>> +test_run:create_cluster(REPLICASET_1, 'router', {args =
>> 'boot_before_cfg'})
>> +util = require('util')
>> +util.wait_master(test_run, REPLICASET_1, 'box_1_a')
>> +_ = test_run:cmd("create server router with
>> script='router/router_2.lua'")
>> +_ = test_run:cmd("start server router")
>> +
>> +--
>> +-- gh-237: replica should be able to boot before master. Before the
>> issue was
>> +-- fixed, replica always tried to install a trigger on _bucket space
>> even when
>> +-- it was not created on a master yet - that lead to an exception in
>> +-- storage.cfg. Now it should not install the trigger at all, because
>> anyway it
>> +-- is not needed on replica for anything.
>> +--
>> +
>> +test_run:switch('box_1_b')
>> +vshard.storage.cfg(cfg, instance_uuid)
>> +-- _bucket is not created yet. Will fail.
>> +util.check_error(vshard.storage.call, 1, 'read', 'echo', {100})
>> +
>> +test_run:switch('default')
>> +util.map_evals(test_run, {REPLICASET_1}, 'bootstrap_storage(\'memtx\')')
>> +
>> +test_run:switch('box_1_a')
>> +vshard.storage.cfg(cfg, instance_uuid)
>> +
>> +test_run:switch('box_1_b')
>> +test_run:wait_lsn('box_1_b', 'box_1_a')
>> +-- Fails, but gracefully. Because the bucket is not found here.
>> +vshard.storage.call(1, 'read', 'echo', {100})
>> +-- Should not have triggers.
>> +#box.space._bucket:on_replace()
>> +
>> +test_run:switch('router')
>> +vshard.router.bootstrap()
>> +vshard.router.callro(1, 'echo', {100})
>> +
>> +test_run:switch("default")
>> +test_run:cmd('stop server router')
>> +test_run:cmd('delete server router')
>> +test_run:drop_cluster(REPLICASET_1)
>> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
>> index c6a78fe..ed577f9 100644
>> --- a/vshard/storage/init.lua
>> +++ b/vshard/storage/init.lua
>> @@ -94,6 +94,17 @@ if not M then
>>          -- detect that _bucket was not changed between yields.
>>          --
>>          bucket_generation = 0,
>> +        --
>> +        -- Reference to the function used as on_replace trigger on
>> +        -- _bucket space. It is used to replace the trigger with
>> +        -- a new function when reload happens. It is kept
>> +        -- explicitly because the old function is deleted on
>> +        -- reload from the global namespace. On the other hand, it
>> +        -- is still stored in _bucket:on_replace() somewhere, but
>> +        -- it is not known where. The only 100% way to be able to
>> +        -- replace the old function is to keep its reference.
>> +        --
>> +        bucket_on_replace = nil,
>>
>>          ------------------- Garbage collection -------------------
>>          -- Fiber to remove garbage buckets data.
>> @@ -2435,8 +2446,11 @@ local function storage_cfg(cfg, this_replica_uuid,
>> is_reload)
>>      local uri = luri.parse(this_replica.uri)
>>      schema_upgrade(is_master, uri.login, uri.password)
>>
>> -    local old_trigger = box.space._bucket:on_replace()[1]
>> -    box.space._bucket:on_replace(bucket_generation_increment,
>> old_trigger)
>> +    if is_master then
>> +        local old_trigger = M.bucket_on_replace
>> +        box.space._bucket:on_replace(bucket_generation_increment,
>> old_trigger)
>> +        M.bucket_on_replace = bucket_generation_increment
>> +    end
>>
>
It seems that the trigger is never removed. If a replica was a master some
time ago, it still has the trigger.
Is it safe?


>>      lreplicaset.rebind_replicasets(new_replicasets, M.replicasets)
>>      lreplicaset.outdate_replicasets(M.replicasets)
>> diff --git a/vshard/storage/reload_evolution.lua
>> b/vshard/storage/reload_evolution.lua
>> index 5d09b11..f38af74 100644
>> --- a/vshard/storage/reload_evolution.lua
>> +++ b/vshard/storage/reload_evolution.lua
>> @@ -17,6 +17,14 @@ migrations[#migrations + 1] = function(M)
>>      -- Code to update Lua objects.
>>  end
>>
>> +migrations[#migrations + 1] = function(M)
>> +    local bucket = box.space._bucket
>> +    if bucket then
>> +        assert(M.bucket_on_replace == nil)
>> +        M.bucket_on_replace = bucket:on_replace()[1]
>> +    end
>> +end
>> +
>>  --
>>  -- Perform an update based on a version stored in `M` (internals).
>>  -- @param M Old module internals which should be updated.
>> @@ -33,7 +41,7 @@ local function upgrade(M)
>>          log.error(err_msg)
>>          error(err_msg)
>>      end
>> -    for i = start_version, #migrations  do
>> +    for i = start_version + 1, #migrations do
>>
>
> Is it already tested somewhere else (migrations I mean)? This change looks
> like a fix for some other problem.
>
>
>>          local ok, err = pcall(migrations[i], M)
>>          if ok then
>>              log.info('vshard.storage.reload_evolution: upgraded to %d
>> version',
>> --
>> 2.21.1 (Apple Git-122.3)
>>
>>

[-- Attachment #2: Type: text/html, Size: 18995 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 1/1] storage: allow replica to boot before master
  2020-08-05  9:46   ` Yaroslav Dynnikov
@ 2020-08-05 22:15     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 5+ messages in thread
From: Vladislav Shpilevoy @ 2020-08-05 22:15 UTC (permalink / raw)
  To: Yaroslav Dynnikov; +Cc: tml

Thanks for the review!

>         diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
>         index c6a78fe..ed577f9 100644
>         --- a/vshard/storage/init.lua
>         +++ b/vshard/storage/init.lua
>         @@ -2435,8 +2446,11 @@ local function storage_cfg(cfg, this_replica_uuid, is_reload)
>              local uri = luri.parse(this_replica.uri)
>              schema_upgrade(is_master, uri.login, uri.password)
> 
>         -    local old_trigger = box.space._bucket:on_replace()[1]
>         -    box.space._bucket:on_replace(bucket_generation_increment, old_trigger)
>         +    if is_master then
>         +        local old_trigger = M.bucket_on_replace
>         +        box.space._bucket:on_replace(bucket_generation_increment, old_trigger)
>         +        M.bucket_on_replace = bucket_generation_increment
>         +    end
> 
> 
> It seems that the trigger is never removed. If a replica was a master some time ago, it still has the trigger.
> Is it safe?

Yes, it is. The trigger only increments a counter in M. And the counter is
not used for anything out of the node. It should be just locally monotonic.
So it is ok to increment it even on replica. After all this is how it
worked before.

But again, after this comment I had second thoughts, and decided to remove the
trigger when switched to replica to be consistent. See v2 in a new mail thread.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 1/1] storage: allow replica to boot before master
  2020-08-04 22:45 ` Yaroslav Dynnikov
  2020-08-05  9:46   ` Yaroslav Dynnikov
@ 2020-08-05 22:15   ` Vladislav Shpilevoy
  1 sibling, 0 replies; 5+ messages in thread
From: Vladislav Shpilevoy @ 2020-08-05 22:15 UTC (permalink / raw)
  To: Yaroslav Dynnikov; +Cc: tml

Hi! Thanks for the review!

>     diff --git a/test/lua_libs/storage_template.lua b/test/lua_libs/storage_template.lua
>     index 29a753d..84e4180 100644
>     --- a/test/lua_libs/storage_template.lua
>     +++ b/test/lua_libs/storage_template.lua
>     @@ -10,12 +12,63 @@ log = require('log')
>      if not cfg.shard_index then
>          cfg.shard_index = 'bucket_id'
>      end
>     +instance_uuid = util.name_to_uuid[NAME]
>     +
>     +--
>     +-- Bootstrap the instance exactly like vshard does. But don't
>     +-- initialize any vshard-specific code.
>     +--
>     +local function boot_like_vshard()
>     +    assert(type(box.cfg) == 'function')
>     +    for rs_uuid, rs in pairs(cfg.sharding) do
>     +        for replica_uuid, replica in pairs(rs.replicas) do
>     +            if replica_uuid == instance_uuid then
>     +                local box_cfg = {replication = {}}
>     +                box_cfg.instance_uuid = replica_uuid
>     +                box_cfg.replicaset_uuid = rs_uuid
>     +                box_cfg.listen = replica.uri
>     +                box_cfg.read_only = not replica.master
>     +                box_cfg.replication_connect_quorum = 0
>     +                box_cfg.replication_timeout = 0.1
>     +                for _, replica in pairs(rs.replicas) do
>     +                    table.insert(box_cfg.replication, replica.uri)
>     +                end
>     +                box.cfg(box_cfg)
>     +                if not replica.master then
>     +                    return
>     +                end
>     +                local uri = luri.parse(replica.uri)
>     +                box.schema.user.create(uri.login, {
>     +                    password = uri.password, if_not_exists = true,
>     +                })
>     +                box.schema.user.grant(uri.login, 'super')
>     +                return
>     +            end
>     +        end
>     +    end
>     +    assert(false)
> 
> 
> Should there be some meaningful message?

Nope. This is just a test, so if it will fail, I will fix it.
Don't want to invent anything here. The code is unreachable,
and assert(false) is telling that fine.

If you mean the message which would help to locate the error -
not needed as well. The assertion fail message will contain file
name and line number.

>     +end
>     +
>     +local omit_cfg = false
>     +local i = 1
>     +while arg[i] ~= nil do
>     +    local key = arg[i]
>     +    i = i + 1
>     +    if key == 'boot_before_cfg' then
>     +        boot_like_vshard()
>     +        omit_cfg = true
>     +    end
>     +end
>     diff --git a/vshard/storage/reload_evolution.lua b/vshard/storage/reload_evolution.lua
>     index 5d09b11..f38af74 100644
>     --- a/vshard/storage/reload_evolution.lua
>     +++ b/vshard/storage/reload_evolution.lua
>     @@ -17,6 +17,14 @@ migrations[#migrations + 1] = function(M)
>          -- Code to update Lua objects.
>      end
> 
>     +migrations[#migrations + 1] = function(M)
>     +    local bucket = box.space._bucket
>     +    if bucket then
>     +        assert(M.bucket_on_replace == nil)
>     +        M.bucket_on_replace = bucket:on_replace()[1]
>     +    end
>     +end
>     +
>      --
>      -- Perform an update based on a version stored in `M` (internals).
>      -- @param M Old module internals which should be updated.
>     @@ -33,7 +41,7 @@ local function upgrade(M)
>              log.error(err_msg)
>              error(err_msg)
>          end
>     -    for i = start_version, #migrations  do
>     +    for i = start_version + 1, #migrations do
> 
> 
> Is it already tested somewhere else (migrations I mean)? This change looks like a fix for some other problem.

Good you noticed this. I thought this fix can't be separated, because
reload evolution didn't have any meaningful migrations, and that
couldn't be tested. But after your question I had second thoughts and
it appears the bug does have a "test" already. I extracted it into a
separate commit. Although the test is reverted by the second commit
anyway.

See v2 patchset in a new thread.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-08-05 22:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-04 21:04 [Tarantool-patches] [PATCH vshard 1/1] storage: allow replica to boot before master Vladislav Shpilevoy
2020-08-04 22:45 ` Yaroslav Dynnikov
2020-08-05  9:46   ` Yaroslav Dynnikov
2020-08-05 22:15     ` Vladislav Shpilevoy
2020-08-05 22:15   ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox