[Tarantool-patches] [PATCH vshard 1/1] storage: allow replica to boot before master

Yaroslav Dynnikov yaroslav.dynnikov at tarantool.org
Wed Aug 5 12:46:57 MSK 2020


And one more question.

Best regards
Yaroslav Dynnikov


On Wed, 5 Aug 2020 at 01:45, Yaroslav Dynnikov <
yaroslav.dynnikov at tarantool.org> wrote:

> Hi, Vlad
>
> Thanks a lot for the fix. In general it looks good to me, but I've got a few
> questions. Find them below.
>
> Best regards
> Yaroslav Dynnikov
>
>
> On Wed, 5 Aug 2020 at 00:04, Vladislav Shpilevoy <
> v.shpilevoy at tarantool.org> wrote:
>
>> Before the patch, a replica couldn't call vshard.storage.cfg()
>> before the master was bootstrapped. That could lead to an exception
>> in the middle of vshard.storage.cfg().
>>
>> It doesn't sound too bad and even looks unreachable, because a
>> replica's bootstrap is anyway blocked in box.cfg() until a master
>> node is started; otherwise the replica instance is terminated.
>> But box.cfg() bootstrap != vshard bootstrap.
>>
>> It could be that both the replica and the master nodes are already
>> bootstrapped and working, but vshard.storage.cfg() wasn't called
>> yet. In that case vshard.storage.cfg() on the replica wouldn't
>> block in box.cfg(), and would happily fail a few lines later in
>> vshard.storage.cfg() at an attempt to touch the not yet existing
>> box.space._bucket.
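The failure mode boils down to this plain-Lua sketch (`box_space` here is a stand-in for the real box.space, names are illustrative):

```lua
-- Stand-in for box.space on a replica where _bucket has not been
-- replicated from the master yet: the field is simply nil.
local box_space = {}

local ok, err = pcall(function()
    -- Old behavior: touch _bucket unconditionally to install a trigger.
    box_space._bucket:on_replace(function() end)
end)
assert(not ok)
print(err)  -- on LuaJIT: attempt to index field '_bucket' (a nil value)
```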
>>
>> That was the case for cartridge. The framework configures
>> instances in its own way before vshard.storage.cfg(). Then it
>> calls vshard.storage.cfg() on them in arbitrary order, and
>> sometimes replicas were configured earlier than the master.
>>
>> The failure is fixed by simply skipping the _bucket trigger
>> installation on the replica node, because so far it is not needed
>> there for anything. The trigger's sole purpose is to increment the
>> bucket generation, which is used only for DML on _bucket on the
>> master node.
>>
>> Now vshard.storage.cfg() on a replica is able to finish before the
>> master does the same. However, the replica still won't be able to
>> handle vshard.storage.* requests until it receives the vshard
>> schema from the master.
>>
>> Closes #237
>> ---
>> Branch:
>> http://github.com/tarantool/vshard/tree/gerold103/gh-237-boot-replica-first
>> Issue: https://github.com/tarantool/vshard/issues/237
>>
>>  test/lua_libs/storage_template.lua      |  55 +++++++++++-
>>  test/reload_evolution/storage.result    |  10 +++
>>  test/reload_evolution/storage.test.lua  |   8 ++
>>  test/router/boot_replica_first.result   | 112 ++++++++++++++++++++++++
>>  test/router/boot_replica_first.test.lua |  42 +++++++++
>>  vshard/storage/init.lua                 |  18 +++-
>>  vshard/storage/reload_evolution.lua     |  10 ++-
>>  7 files changed, 251 insertions(+), 4 deletions(-)
>>  create mode 100644 test/router/boot_replica_first.result
>>  create mode 100644 test/router/boot_replica_first.test.lua
>>
>> diff --git a/test/lua_libs/storage_template.lua b/test/lua_libs/storage_template.lua
>> index 29a753d..84e4180 100644
>> --- a/test/lua_libs/storage_template.lua
>> +++ b/test/lua_libs/storage_template.lua
>> @@ -1,5 +1,7 @@
>>  #!/usr/bin/env tarantool
>>
>> +local luri = require('uri')
>> +
>>  NAME = require('fio').basename(arg[0], '.lua')
>>  fiber = require('fiber')
>>  test_run = require('test_run').new()
>> @@ -10,12 +12,63 @@ log = require('log')
>>  if not cfg.shard_index then
>>      cfg.shard_index = 'bucket_id'
>>  end
>> +instance_uuid = util.name_to_uuid[NAME]
>> +
>> +--
>> +-- Bootstrap the instance exactly like vshard does. But don't
>> +-- initialize any vshard-specific code.
>> +--
>> +local function boot_like_vshard()
>> +    assert(type(box.cfg) == 'function')
>> +    for rs_uuid, rs in pairs(cfg.sharding) do
>> +        for replica_uuid, replica in pairs(rs.replicas) do
>> +            if replica_uuid == instance_uuid then
>> +                local box_cfg = {replication = {}}
>> +                box_cfg.instance_uuid = replica_uuid
>> +                box_cfg.replicaset_uuid = rs_uuid
>> +                box_cfg.listen = replica.uri
>> +                box_cfg.read_only = not replica.master
>> +                box_cfg.replication_connect_quorum = 0
>> +                box_cfg.replication_timeout = 0.1
>> +                for _, replica in pairs(rs.replicas) do
>> +                    table.insert(box_cfg.replication, replica.uri)
>> +                end
>> +                box.cfg(box_cfg)
>> +                if not replica.master then
>> +                    return
>> +                end
>> +                local uri = luri.parse(replica.uri)
>> +                box.schema.user.create(uri.login, {
>> +                    password = uri.password, if_not_exists = true,
>> +                })
>> +                box.schema.user.grant(uri.login, 'super')
>> +                return
>> +            end
>> +        end
>> +    end
>> +    assert(false)
>>
>
> Should there be a meaningful error message here?
>
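Something along these lines would do (the wording is only a suggestion, and the uuid is a made-up example value):

```lua
-- Illustrative only: give the final assert() a message naming the
-- instance that was not found in the config.
local instance_uuid = '8a274925-a26d-47fc-9e1b-af88ce939412'  -- example value
local ok, err = pcall(assert, false,
    ('storage_template: instance %s is not in cfg.sharding'):format(instance_uuid))
assert(not ok)
print(err)
```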
>
>> +end
>> +
>> +local omit_cfg = false
>> +local i = 1
>> +while arg[i] ~= nil do
>> +    local key = arg[i]
>> +    i = i + 1
>> +    if key == 'boot_before_cfg' then
>> +        boot_like_vshard()
>> +        omit_cfg = true
>> +    end
>> +end
>>
>>  vshard = require('vshard')
>>  echo_count = 0
>>  cfg.replication_connect_timeout = 3
>>  cfg.replication_timeout = 0.1
>> -vshard.storage.cfg(cfg, util.name_to_uuid[NAME])
>> +
>> +if not omit_cfg then
>> +    vshard.storage.cfg(cfg, instance_uuid)
>> +end
>> +
>>  function bootstrap_storage(engine)
>>      box.once("testapp:schema:1", function()
>>          if rawget(_G, 'CHANGE_SPACE_IDS') then
>> diff --git a/test/reload_evolution/storage.result b/test/reload_evolution/storage.result
>> index ebf4fdc..4652c4f 100644
>> --- a/test/reload_evolution/storage.result
>> +++ b/test/reload_evolution/storage.result
>> @@ -92,6 +92,16 @@ test_run:grep_log('storage_2_a', 'vshard.storage.reload_evolution: upgraded to')
>>  ...
>>  vshard.storage.internal.reload_version
>>  ---
>> +- 2
>> +...
>> +--
>> +-- gh-237: should be only one trigger. During gh-237 the trigger installation
>> +-- became conditional and therefore required to remember the current trigger
>> +-- somewhere. When reloaded from the old version, the trigger needed to be
>> +-- fetched from _bucket:on_replace().
>> +--
>> +#box.space._bucket:on_replace()
>> +---
>>  - 1
>>  ...
>>  -- Make sure storage operates well.
>> diff --git a/test/reload_evolution/storage.test.lua b/test/reload_evolution/storage.test.lua
>> index 56c1693..06f7117 100644
>> --- a/test/reload_evolution/storage.test.lua
>> +++ b/test/reload_evolution/storage.test.lua
>> @@ -37,6 +37,14 @@ package.loaded['vshard.storage'] = nil
>>  vshard.storage = require("vshard.storage")
>>  test_run:grep_log('storage_2_a', 'vshard.storage.reload_evolution: upgraded to') ~= nil
>>  vshard.storage.internal.reload_version
>> +--
>> +-- gh-237: should be only one trigger. During gh-237 the trigger installation
>> +-- became conditional and therefore required to remember the current trigger
>> +-- somewhere. When reloaded from the old version, the trigger needed to be
>> +-- fetched from _bucket:on_replace().
>> +--
>> +#box.space._bucket:on_replace()
>> +
>>  -- Make sure storage operates well.
>>  vshard.storage.bucket_force_drop(2000)
>>  vshard.storage.bucket_force_create(2000)
>> diff --git a/test/router/boot_replica_first.result b/test/router/boot_replica_first.result
>> new file mode 100644
>> index 0000000..1705230
>> --- /dev/null
>> +++ b/test/router/boot_replica_first.result
>> @@ -0,0 +1,112 @@
>> +-- test-run result file version 2
>> +test_run = require('test_run').new()
>> + | ---
>> + | ...
>> +REPLICASET_1 = { 'box_1_a', 'box_1_b', 'box_1_c' }
>> + | ---
>> + | ...
>> +test_run:create_cluster(REPLICASET_1, 'router', {args = 'boot_before_cfg'})
>> + | ---
>> + | ...
>> +util = require('util')
>> + | ---
>> + | ...
>> +util.wait_master(test_run, REPLICASET_1, 'box_1_a')
>> + | ---
>> + | ...
>> +_ = test_run:cmd("create server router with script='router/router_2.lua'")
>> + | ---
>> + | ...
>> +_ = test_run:cmd("start server router")
>> + | ---
>> + | ...
>> +
>> +--
>> +-- gh-237: replica should be able to boot before master. Before the issue was
>> +-- fixed, replica always tried to install a trigger on _bucket space even when
>> +-- it was not created on a master yet - that led to an exception in
>> +-- storage.cfg. Now it should not install the trigger at all, because anyway it
>> +-- is not needed on replica for anything.
>> +--
>> +
>> +test_run:switch('box_1_b')
>> + | ---
>> + | - true
>> + | ...
>> +vshard.storage.cfg(cfg, instance_uuid)
>> + | ---
>> + | ...
>> +-- _bucket is not created yet. Will fail.
>> +util.check_error(vshard.storage.call, 1, 'read', 'echo', {100})
>> + | ---
>> + | - attempt to index field '_bucket' (a nil value)
>> + | ...
>> +
>> +test_run:switch('default')
>> + | ---
>> + | - true
>> + | ...
>> +util.map_evals(test_run, {REPLICASET_1}, 'bootstrap_storage(\'memtx\')')
>> + | ---
>> + | ...
>> +
>> +test_run:switch('box_1_a')
>> + | ---
>> + | - true
>> + | ...
>> +vshard.storage.cfg(cfg, instance_uuid)
>> + | ---
>> + | ...
>> +
>> +test_run:switch('box_1_b')
>> + | ---
>> + | - true
>> + | ...
>> +test_run:wait_lsn('box_1_b', 'box_1_a')
>> + | ---
>> + | ...
>> +-- Fails, but gracefully. Because the bucket is not found here.
>> +vshard.storage.call(1, 'read', 'echo', {100})
>> + | ---
>> + | - null
>> + | - bucket_id: 1
>> + |   reason: Not found
>> + |   code: 1
>> + |   type: ShardingError
>> + |   message: 'Cannot perform action with bucket 1, reason: Not found'
>> + |   name: WRONG_BUCKET
>> + | ...
>> +-- Should not have triggers.
>> +#box.space._bucket:on_replace()
>> + | ---
>> + | - 0
>> + | ...
>> +
>> +test_run:switch('router')
>> + | ---
>> + | - true
>> + | ...
>> +vshard.router.bootstrap()
>> + | ---
>> + | - true
>> + | ...
>> +vshard.router.callro(1, 'echo', {100})
>> + | ---
>> + | - 100
>> + | ...
>> +
>> +test_run:switch("default")
>> + | ---
>> + | - true
>> + | ...
>> +test_run:cmd('stop server router')
>> + | ---
>> + | - true
>> + | ...
>> +test_run:cmd('delete server router')
>> + | ---
>> + | - true
>> + | ...
>> +test_run:drop_cluster(REPLICASET_1)
>> + | ---
>> + | ...
>> diff --git a/test/router/boot_replica_first.test.lua b/test/router/boot_replica_first.test.lua
>> new file mode 100644
>> index 0000000..7b1b3fd
>> --- /dev/null
>> +++ b/test/router/boot_replica_first.test.lua
>> @@ -0,0 +1,42 @@
>> +test_run = require('test_run').new()
>> +REPLICASET_1 = { 'box_1_a', 'box_1_b', 'box_1_c' }
>> +test_run:create_cluster(REPLICASET_1, 'router', {args = 'boot_before_cfg'})
>> +util = require('util')
>> +util.wait_master(test_run, REPLICASET_1, 'box_1_a')
>> +_ = test_run:cmd("create server router with script='router/router_2.lua'")
>> +_ = test_run:cmd("start server router")
>> +
>> +--
>> +-- gh-237: replica should be able to boot before master. Before the issue was
>> +-- fixed, replica always tried to install a trigger on _bucket space even when
>> +-- it was not created on a master yet - that led to an exception in
>> +-- storage.cfg. Now it should not install the trigger at all, because anyway it
>> +-- is not needed on replica for anything.
>> +--
>> +
>> +test_run:switch('box_1_b')
>> +vshard.storage.cfg(cfg, instance_uuid)
>> +-- _bucket is not created yet. Will fail.
>> +util.check_error(vshard.storage.call, 1, 'read', 'echo', {100})
>> +
>> +test_run:switch('default')
>> +util.map_evals(test_run, {REPLICASET_1}, 'bootstrap_storage(\'memtx\')')
>> +
>> +test_run:switch('box_1_a')
>> +vshard.storage.cfg(cfg, instance_uuid)
>> +
>> +test_run:switch('box_1_b')
>> +test_run:wait_lsn('box_1_b', 'box_1_a')
>> +-- Fails, but gracefully. Because the bucket is not found here.
>> +vshard.storage.call(1, 'read', 'echo', {100})
>> +-- Should not have triggers.
>> +#box.space._bucket:on_replace()
>> +
>> +test_run:switch('router')
>> +vshard.router.bootstrap()
>> +vshard.router.callro(1, 'echo', {100})
>> +
>> +test_run:switch("default")
>> +test_run:cmd('stop server router')
>> +test_run:cmd('delete server router')
>> +test_run:drop_cluster(REPLICASET_1)
>> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
>> index c6a78fe..ed577f9 100644
>> --- a/vshard/storage/init.lua
>> +++ b/vshard/storage/init.lua
>> @@ -94,6 +94,17 @@ if not M then
>>          -- detect that _bucket was not changed between yields.
>>          --
>>          bucket_generation = 0,
>> +        --
>> +        -- Reference to the function used as on_replace trigger on
>> +        -- _bucket space. It is used to replace the trigger with
>> +        -- a new function when reload happens. It is kept
>> +        -- explicitly because the old function is deleted on
>> +        -- reload from the global namespace. On the other hand, it
>> +        -- is still stored in _bucket:on_replace() somewhere, but
>> +        -- it is not known where. The only 100% way to be able to
>> +        -- replace the old function is to keep its reference.
>> +        --
>> +        bucket_on_replace = nil,
>>
>>          ------------------- Garbage collection -------------------
>>          -- Fiber to remove garbage buckets data.
>> @@ -2435,8 +2446,11 @@ local function storage_cfg(cfg, this_replica_uuid, is_reload)
>>      local uri = luri.parse(this_replica.uri)
>>      schema_upgrade(is_master, uri.login, uri.password)
>>
>> -    local old_trigger = box.space._bucket:on_replace()[1]
>> -    box.space._bucket:on_replace(bucket_generation_increment, old_trigger)
>> +    if is_master then
>> +        local old_trigger = M.bucket_on_replace
>> +        box.space._bucket:on_replace(bucket_generation_increment, old_trigger)
>> +        M.bucket_on_replace = bucket_generation_increment
>> +    end
>>
>
It seems that the trigger is never removed. If a replica was a master some
time ago, it still has the trigger.
Is that safe?
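To make the question concrete, here is a plain-Lua model of the trigger list and of the `space:on_replace(new_func, old_func)` calling convention (pass `old_func` to replace it by reference, pass a nil `new_func` to drop it). Everything below is illustrative; `triggers` stands in for `_bucket:on_replace()`.

```lua
-- Minimal model of the on_replace trigger list semantics.
local triggers = {}
local function set_trigger(new_func, old_func)
    for i, f in ipairs(triggers) do
        if f == old_func then
            if new_func ~= nil then
                triggers[i] = new_func    -- replace by reference
            else
                table.remove(triggers, i) -- drop by reference
            end
            return
        end
    end
    if new_func ~= nil then
        table.insert(triggers, new_func)  -- nothing to replace: append
    end
end

local M = {bucket_on_replace = nil}

-- Master path from the patch: install the trigger and remember it.
local function increment() end
set_trigger(increment, M.bucket_on_replace)
M.bucket_on_replace = increment
assert(#triggers == 1)

-- Reload creates a *new* function object; the old one is reachable
-- only through M.bucket_on_replace, which is why the reference is kept.
local function increment2() end
set_trigger(increment2, M.bucket_on_replace)
M.bucket_on_replace = increment2
assert(#triggers == 1)

-- The branch the patch does not seem to have: on master -> replica
-- demotion the trigger could be dropped the same way.
set_trigger(nil, M.bucket_on_replace)
M.bucket_on_replace = nil
assert(#triggers == 0)
```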


>>      lreplicaset.rebind_replicasets(new_replicasets, M.replicasets)
>>      lreplicaset.outdate_replicasets(M.replicasets)
>> diff --git a/vshard/storage/reload_evolution.lua b/vshard/storage/reload_evolution.lua
>> index 5d09b11..f38af74 100644
>> --- a/vshard/storage/reload_evolution.lua
>> +++ b/vshard/storage/reload_evolution.lua
>> @@ -17,6 +17,14 @@ migrations[#migrations + 1] = function(M)
>>      -- Code to update Lua objects.
>>  end
>>
>> +migrations[#migrations + 1] = function(M)
>> +    local bucket = box.space._bucket
>> +    if bucket then
>> +        assert(M.bucket_on_replace == nil)
>> +        M.bucket_on_replace = bucket:on_replace()[1]
>> +    end
>> +end
>> +
>>  --
>>  -- Perform an update based on a version stored in `M` (internals).
>>  -- @param M Old module internals which should be updated.
>> @@ -33,7 +41,7 @@ local function upgrade(M)
>>          log.error(err_msg)
>>          error(err_msg)
>>      end
>> -    for i = start_version, #migrations  do
>> +    for i = start_version + 1, #migrations do
>>
>
> Is this already tested somewhere else (the migrations, I mean)? This change
> looks like a fix for some other problem.
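If I read the old loop bound right, it re-ran the migration whose number equals the already-stored version. A plain-Lua reduction of the loop (illustrative, not the real reload_evolution module; `reload_version` here means "index of the last applied migration"):

```lua
-- Two toy migrations which record their own number when applied.
local migrations = {
    function(M) table.insert(M.applied, 1) end,
    function(M) table.insert(M.applied, 2) end,
}

local function upgrade(M, buggy)
    local start_version = M.reload_version
    -- buggy = old loop bound (start_version), fixed = start_version + 1.
    local first = buggy and start_version or start_version + 1
    for i = first, #migrations do
        migrations[i](M)
        M.reload_version = i
    end
end

-- Module state says migration 1 is already applied.
local M_fixed = {reload_version = 1, applied = {}}
upgrade(M_fixed, false)
assert(#M_fixed.applied == 1 and M_fixed.applied[1] == 2)

local M_buggy = {reload_version = 1, applied = {}}
upgrade(M_buggy, true)
assert(#M_buggy.applied == 2)  -- migration 1 was applied a second time
```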
>
>
>>          local ok, err = pcall(migrations[i], M)
>>          if ok then
>>              log.info('vshard.storage.reload_evolution: upgraded to %d version',
>> --
>> 2.21.1 (Apple Git-122.3)
>>
>>