[tarantool-patches] Re: [PATCH 2/3] Complete module reload
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Thu Jul 19 18:14:53 MSK 2018
Thanks for the patch! See 13 comments below.
On 18/07/2018 20:47, AKhatskevich wrote:
> In case one need to upgrade vshard to a new version, this commit
> improves reload mechanism to allow to do that for a wider variety of
> possible changes (between two versions).
>
> Changes:
> * introduce cfg option `connection_outdate_delay`
> * improve reload mechanism
> * add `util.async_task` method, which runs a function after a
> delay
> * delete replicaset:rebind_connections method as it is replaced
> with `rebind_replicasets` which updates all replicasets at once
>
> Reload mechanism:
> * reload all vshard modules
> * create new `replicaset` and `replica` objects
> * reuse old netbox connections in new replica objects if
> possible
> * update router/storage.internal table
> * after a `connection_outdate_delay` disable old instances of
> `replicaset` and `replica` objects
>
> Reload works for modules:
> * vshard.router
> * vshard.storage
>
> Here is a module reload algorithm:
> * old vshard is working
> * delete old vshard src
> * install new vshard
> * call: package.loaded['vshard.router'] = nil
> * call: old_router = vshard.router -- Save working router copy.
> * call: vshard.router = require('vshard.router')
> * if require fails: continue using old_router
> * if require succeeds: use vshard.router
>
> In case reload process fails, old router/storage module, replicaset and
> replica objects continue working properly. If reload succeeds, all old
> objects would be deprecated.
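For reference, the fallback step of the algorithm above can be sketched in plain Lua. This is only an illustration under assumptions: 'demo.router' is a hypothetical module registered through package.preload, and reload() is a made-up helper, not a vshard function.

```lua
-- Hypothetical module loader; `should_fail` imitates an error injection.
local should_fail = false
local load_count = 0
package.preload['demo.router'] = function()
    if should_fail then
        error('Error injection: cfg')
    end
    load_count = load_count + 1
    return {version = load_count}
end

-- Hypothetical helper: try to reload `name`, fall back to `old_module`.
-- Returns the module to use and, on failure, the error from require().
local function reload(name, old_module)
    package.loaded[name] = nil
    local ok, result = pcall(require, name)
    if ok then
        return result
    end
    -- Put the old module back so later require() calls keep returning it.
    package.loaded[name] = old_module
    return old_module, result
end

local router = require('demo.router')
```

With should_fail set, reload('demo.router', router) is expected to hand back the old table together with the error, so the caller keeps a working router.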
>
> Extra changes:
> * introduce MODULE_INTERNALS which stores name of the module
> internal data in the global namespace
>
> Part of <shut git>112
1. Stray '<shut git>'.
> ---
> test/router/reload.result | 159 +++++++++++++++++++++++++++++++++++++++++++
> test/router/reload.test.lua | 48 +++++++++++++
> test/router/router.result | 3 +-
> test/storage/reload.result | 68 ++++++++++++++++++
> test/storage/reload.test.lua | 23 +++++++
> vshard/cfg.lua | 6 ++
> vshard/error.lua | 5 ++
> vshard/replicaset.lua | 100 ++++++++++++++++++++-------
> vshard/router/init.lua | 44 ++++++++----
> vshard/storage/init.lua | 45 ++++++++----
> vshard/util.lua | 20 ++++++
> 11 files changed, 466 insertions(+), 55 deletions(-)
>
> diff --git a/test/router/reload.result b/test/router/reload.result
> index 47f3c2e..3fbbe6e 100644
> --- a/test/router/reload.result
> +++ b/test/router/reload.result
> @@ -174,6 +174,165 @@ vshard.router.module_version()
> check_reloaded()
> ---
> ...
> +--
> +-- Outdate old replicaset and replica objets.
> +--
> +rs = vshard.router.route(1)
> +---
> +...
> +rs:callro('echo', {'some_data'})
> +---
> +- some_data
> +- null
> +- null
> +...
> +package.loaded["vshard.router"] = nil
> +---
> +...
> +_ = require('vshard.router')
> +---
> +...
> +-- Make sure outdate async task has had cpu time.
> +fiber.sleep(0.005)
2. As I asked earlier, please avoid fixed timeouts.
When you need to wait for something, poll for it in a 'while' loop.
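A minimal sketch of what I mean, written in plain Lua so it can be run outside Tarantool. wait_until() is a hypothetical helper, and the sleep function is injected; in the test it would presumably be fiber.sleep, and the condition would check whatever the async task changes.

```lua
-- Hypothetical helper: poll `cond` until it holds or the attempts run out.
-- `sleep` is injected; in a Tarantool test it would be fiber.sleep.
local function wait_until(cond, sleep, max_attempts)
    local attempts = 0
    while not cond() do
        attempts = attempts + 1
        if attempts > max_attempts then
            return false
        end
        sleep(0.01)
    end
    return true
end

-- Stand-in for the async outdate task: the flag flips during the third sleep.
local polls = 0
local object = {is_outdated = false}
local function fake_sleep(_)
    polls = polls + 1
    if polls == 3 then
        object.is_outdated = true
    end
end

local ok = wait_until(function() return object.is_outdated end, fake_sleep, 100)
```

The attempt limit keeps a broken test from hanging forever, while a passing test waits only as long as it has to.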
> +---
> +...
> +rs.callro(rs, 'echo', {'some_data'})
> +---
> +- null
> +- type: ShardingError
> + name: OBJECT_IS_OUTDATED
> + message: Object is outdated after module reload/reconfigure. Use new instance.
> + code: 20
> +...
> +vshard.router = require('vshard.router')
> +---
> +...
> +rs = vshard.router.route(1)
> +---
> +...
> +rs:callro('echo', {'some_data'})
> +---
> +- some_data
> +- null
> +- null
> +...
> +-- Test `connection_outdate_delay`.
> +old_connection_delay = cfg.connection_outdate_delay
> +---
> +...
> +cfg.connection_outdate_delay = 0.3
> +---
> +...
> +vshard.router.cfg(cfg)
> +---
> +...
> +cfg.connection_outdate_delay = old_connection_delay
> +---
> +...
> +vshard.router.internal.connection_outdate_delay = nil
> +---
> +...
> +vshard.router = require('vshard.router')
3. You have already done it a few lines above.
> +---
> +...
> +rs_new = vshard.router.route(1)
> +---
> +...
> +rs_old = rs
> +---
> +...
> +_, replica_old = next(rs_old.replicas)
> +---
> +...
> +rs_new:callro('echo', {'some_data'})
> +---
> +- some_data
> +- null
> +- null
> +...
> +-- Check old objets are still valid.
> +rs_old:callro('echo', {'some_data'})
> +---
> +- some_data
> +- null
> +- null
> +...
> +replica_old.conn ~= nil
> +---
> +- true
> +...
> +fiber.sleep(0.2)
> +---
> +...
> +rs_old:callro('echo', {'some_data'})
> +---
> +- some_data
> +- null
> +- null
> +...
> +replica_old.conn ~= nil
> +---
> +- true
> +...
> +replica_old.outdated == nil
> +---
> +- true
> +...
> +fiber.sleep(0.2)
> +---
> +...
> +rs_old:callro('echo', {'some_data'})
> +---
> +- null
> +- type: ShardingError
> + name: OBJECT_IS_OUTDATED
> + message: Object is outdated after module reload/reconfigure. Use new instance.
> + code: 20
> +...
> +replica_old.conn == nil
> +---
> +- true
> +...
> +replica_old.outdated == true
> +---
> +- true
> +...
> +rs_new:callro('echo', {'some_data'})
> +---
> +- some_data
> +- null
> +- null
> +...
> +-- Error during reconfigure process.
4. You added this test in the previous commit.
> +_ = vshard.router.route(1):callro('echo', {'some_data'})
> +---
> +...
> +vshard.router.internal.errinj.ERRINJ_CFG = true
> +---
> +...
> +old_internal = table.copy(vshard.router.internal)
> +---
> +...
> +package.loaded["vshard.router"] = nil
> +---
> +...
> +_, err = pcall(require, 'vshard.router')
> +---
> +...
> +err:match('Error injection:.*')
> +---
> +- 'Error injection: cfg'
> +...
> +vshard.router.internal.errinj.ERRINJ_CFG = false
> +---
> +...
> +util.has_same_fields(old_internal, vshard.router.internal)
> +---
> +- true
> +...
> +_ = vshard.router.route(1):callro('echo', {'some_data'})
> +---
> +...
> test_run:switch('default')
> ---
> - true
> diff --git a/test/storage/reload.result b/test/storage/reload.result
> index 531d984..0281e27 100644
> --- a/test/storage/reload.result
> +++ b/test/storage/reload.result
> @@ -174,6 +174,74 @@ vshard.storage.module_version()
> check_reloaded()
> ---
> ...
> +--
> +-- Outdate old replicaset and replica objets.
> +--
> +_, rs = next(vshard.storage.internal.replicasets)
> +---
> +...
> +package.loaded["vshard.storage"] = nil
> +---
> +...
> +_ = require('vshard.storage')
> +---
> +...
> +rs.callro(rs, 'echo', {'some_data'})
> +---
> +- null
> +- type: ShardingError
> + name: OBJECT_IS_OUTDATED
> + message: Object is outdated after module reload/reconfigure. Use new instance.
> + code: 20
> +...
> +_, rs = next(vshard.storage.internal.replicasets)
> +---
> +...
> +rs.callro(rs, 'echo', {'some_data'})
> +---
> +- some_data
> +- null
> +- null
> +...
> +-- Error during reload process.
> +_, rs = next(vshard.storage.internal.replicasets)
> +---
> +...
> +rs:callro('echo', {'some_data'})
> +---
> +- some_data
> +- null
> +- null
> +...
> +vshard.storage.internal.errinj.ERRINJ_CFG = true
5. Same as 4. We have already added this test in the previous
commit, haven't we?
> +---
> +...
> +old_internal = table.copy(vshard.storage.internal)
> +---
> +...
> +package.loaded["vshard.storage"] = nil
> +---
> +...
> +_, err = pcall(require, 'vshard.storage')
> +---
> +...
> +err:match('Error injection:.*')
> +---
> +- 'Error injection: cfg'
> +...
> +vshard.storage.internal.errinj.ERRINJ_CFG = false
> +---
> +...
> +util.has_same_fields(old_internal, vshard.storage.internal)
> +---
> +- true
> +...
> +_, rs = next(vshard.storage.internal.replicasets)
> +---
> +...
> +_ = rs:callro('echo', {'some_data'})
> +---
> +...
> test_run:switch('default')
> ---
> - true
> diff --git a/vshard/cfg.lua b/vshard/cfg.lua
> index d5429af..87d0fc8 100644
> --- a/vshard/cfg.lua
> +++ b/vshard/cfg.lua
> @@ -217,6 +217,10 @@ local cfg_template = {
> type = 'non-negative number', name = 'Sync timeout', is_optional = true,
> default = consts.DEFAULT_SYNC_TIMEOUT
> }},
> + {'connection_outdate_delay', {
> + type = 'non-negative number', name = 'Object outdate timeout',
> + is_optional = true, default = nil
6. default = nil makes no sense for Lua tables.
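To illustrate: assigning nil in a table constructor does not create a key, so the two tables below are indistinguishable. The table contents are shortened and count_keys() is a helper made up for this example.

```lua
-- Count the keys that actually exist in a table.
local function count_keys(t)
    local n = 0
    for _ in pairs(t) do
        n = n + 1
    end
    return n
end

-- `default = nil` stores nothing: both tables hold only `is_optional`.
local with_nil_default = {is_optional = true, default = nil}
local without_default = {is_optional = true}

local same_size = count_keys(with_nil_default) == count_keys(without_default)
```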
> + }},
> }
>
> --
> @@ -264,6 +268,8 @@ local function remove_non_box_options(cfg)
> cfg.collect_bucket_garbage_interval = nil
> cfg.collect_lua_garbage = nil
> cfg.sync_timeout = nil
> + cfg.sync_timeout = nil
7. Why do you need to nullify sync_timeout a second time?
> + cfg.connection_outdate_delay = nil
> end
>
> return {
> diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
> index 99f59aa..ec6e95b 100644
> --- a/vshard/replicaset.lua
> +++ b/vshard/replicaset.lua
> @@ -412,6 +424,49 @@ for name, func in pairs(replica_mt.__index) do
> end
> replica_mt.__index = index
>
> +--
> +-- Meta-methods of outdated objects
8. Put the dot at the end of the sentence.
> +-- They define only arrtibutes from corresponding metatables to
9. Typo: arrtibutes.
> +-- make user able to access fields of old objects.
> +--
> +local function outdated_warning(...)
> + return nil, lerror.vshard(lerror.code.OBJECT_IS_OUTDATED)
> +end
> +
> +local outdated_replicaset_mt = {
> + __index = {
> + outdated = true
> + }
> +}
10. outdated_replicaset/replica_mt are identical. Please
make one mt object named 'outdated_mt'.
> +for fname, func in pairs(replicaset_mt.__index) do
> + outdated_replicaset_mt.__index[fname] = outdated_warning
> +end
11. As I remember, my proposal was not to duplicate each
method here, but to make __index return, for any key, a
callable object that invokes outdated_warning when called.
You may ask what to do with the 'outdated' attribute then:
you can set it directly on the object in the
outdate_replicasets() function.
What is more, please use 'is_outdated', since this value is a
flag, and add it to the description at the top of the file,
where all attributes are documented.
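A rough sketch of comments 10 and 11 together, in plain Lua. The error table is a placeholder for lerror.vshard(lerror.code.OBJECT_IS_OUTDATED), and outdate(), live_mt and the toy rs object are invented for the example.

```lua
local function outdated_warning(...)
    -- Placeholder error; the patch returns a vshard error object here.
    return nil, {name = 'OBJECT_IS_OUTDATED'}
end

-- One metatable for both replicasets and replicas: any field that is
-- missing from the object itself resolves to outdated_warning.
local outdated_mt = {
    __index = function(_, _)
        return outdated_warning
    end
}

-- Hypothetical helper: swap the metatable and set the flag on the object.
local function outdate(object)
    setmetatable(object, outdated_mt)
    object.is_outdated = true
end

-- Toy object whose methods live in its metatable, as in replicaset.lua.
local live_mt = {__index = {
    callro = function(self, name, args) return args[1] end
}}
local rs = setmetatable({uuid = 'rs-1'}, live_mt)
local before = rs:callro('echo', {'some_data'})
outdate(rs)
local res, err = rs:callro('echo', {'some_data'})
```

Note one consequence of this variant: every absent field reads as the function, so a check like 'replica.conn == nil' would stop being true after 'replica.conn = nil'. The test would need to check is_outdated instead, or __index would have to special-case such fields.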
> +
> +local outdated_replica_mt = {
> + __index = {
> + outdated = true
> + }
> +}
> +for fname, func in pairs(replica_mt.__index) do
> + outdated_replica_mt.__index[fname] = outdated_warning
> +end
> +
> +--
> +-- Outdate replicaset and replica objects:
> +-- * Set outdated_metatables.
> +-- * Remove connections.
> +--
> +outdate_replicasets = function(replicasets)
> + for _, replicaset in pairs(replicasets) do
> + setmetatable(replicaset, outdated_replicaset_mt)
> + for _, replica in pairs(replicaset.replicas) do
> + setmetatable(replica, outdated_replica_mt)
> + replica.conn = nil
Here you can put 'replica.is_outdated = true'.
> + end
Here you can put 'replicaset.is_outdated = true'.
> + end
> + log.info('Old replicaset and replica objects are outdated.')
> +end
> +
> --
> -- Calculate for each replicaset its etalon bucket count.
> -- Iterative algorithm is used to learn the best balance in a
> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
> index a143070..a8f6bbc 100644
> --- a/vshard/router/init.lua
> +++ b/vshard/router/init.lua
> @@ -479,12 +493,13 @@ local function router_cfg(cfg)
> else
> log.info('Starting router reconfiguration')
> end
> - local new_replicasets = lreplicaset.buildall(cfg, M.replicasets)
> + local new_replicasets = lreplicaset.buildall(cfg)
> local total_bucket_count = cfg.bucket_count
> local collect_lua_garbage = cfg.collect_lua_garbage
> - lcfg.remove_non_box_options(cfg)
> + local box_cfg = table.deepcopy(cfg)
12. As I remember, I asked you to use table.copy where
possible, and this case looks appropriate.
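The reasoning, as a plain-Lua sketch: assuming remove_non_box_options() only clears top-level keys, as the hunk in vshard/cfg.lua suggests, a shallow copy is enough to protect the caller's cfg. shallow_copy() below is a stand-in for Tarantool's table.copy, and the cfg contents are made up.

```lua
-- Stand-in for Tarantool's table.copy: copies top-level keys only.
local function shallow_copy(t)
    local res = {}
    for k, v in pairs(t) do
        res[k] = v
    end
    return res
end

-- Made-up configuration with one nested table.
local cfg = {listen = 3301, sync_timeout = 1, nested = {value = 42}}

local box_cfg = shallow_copy(cfg)
-- Imitates remove_non_box_options(): only top-level keys are cleared.
box_cfg.sync_timeout = nil
```

The original cfg keeps sync_timeout, and the nested table is shared rather than duplicated, which is harmless as long as nobody modifies it through box_cfg.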
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index 052e94f..bf560e6 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -2,20 +2,34 @@ local log = require('log')
> local luri = require('uri')
> local lfiber = require('fiber')
> local netbox = require('net.box') -- for net.box:self()
> +local trigger = require('internal.trigger')
> +
> +local MODULE_INTERNALS = '__module_vshard_storage'
> +-- Reload requirements, in case this module is reloaded manually.
> +if rawget(_G, MODULE_INTERNALS) then
> + local vshard_modules = {
> + 'vshard.consts', 'vshard.error', 'vshard.cfg',
> + 'vshard.replicaset', 'vshard.util',
> + }
> + for _, module in pairs(vshard_modules) do
> + package.loaded[module] = nil
> + end
> +end
> local consts = require('vshard.consts')
> local lerror = require('vshard.error')
> -local util = require('vshard.util')
> local lcfg = require('vshard.cfg')
> local lreplicaset = require('vshard.replicaset')
> -local trigger = require('internal.trigger')
> +local util = require('vshard.util')
>
> -local M = rawget(_G, '__module_vshard_storage')
> +local M = rawget(_G, MODULE_INTERNALS)
> if not M then
> --
> -- The module is loaded for the first time.
> --
> M = {
> ---------------- Common module attributes ----------------
> + -- The last passed configuration.
> + current_cfg = nil,
13. Please, add the same assignment to the router module
initialization.