From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> To: AKhatskevich <avkhatskevich@tarantool.org>, tarantool-patches@freelists.org Subject: [tarantool-patches] Re: [PATCH 2/2] Complete module reload Date: Sun, 17 Jun 2018 22:46:21 +0300 [thread overview] Message-ID: <991c6880-9d5e-3f51-9003-a095984dc30d@tarantool.org> (raw) In-Reply-To: <d132dfc18a98151b4d27f4017a7c3d5e54ece3a6.1528566184.git.avkhatskevich@tarantool.org> Thanks for the patch! See 11 comments below. On 09/06/2018 20:47, AKhatskevich wrote: > In case one need to upgrade vshard to a new version, this commit > introduces a reload mechanism which allows to do that for a > noticeable variety of possible changes (between the two versions). Nitpicking: it does not introduce reload mechanism. This commit just improves the thing. Maybe should we place here a manual how to reload? A short example. > > Reload process: > * reload all vshard modules > * create new `replicaset` and `replica` objects > * reuse old netbox connections in new replica objects if > possible > * update router/storage.internal table > * after a `reload_deprecate_timeout` disable old instances of > `replicaset` and `replica` objects > > Reload works for modules: > * vshard.router > * vshard.storage > > In case reload process fails, old router/storage module continue > working properly. > > Extra changes: > * add `util.async_task` method, which runs a function after a > delay > * delete replicaset:rebind_connections method as it is replaced > with `rebind_replicasets` which updates all replicasets at once > * introduce `module_reloading` for distinguishing reloaf from 1. reloaf -> reload. > reconfigure > * introduce MODULE_INTERNALS which stores name of the module > internal data in the global namespace > > Closes #112 > --- > test/router/reload.result | 109 +++++++++++++++++++++++++++++++++++++++++++ > test/router/reload.test.lua | 41 ++++++++++++++++ > test/router/router.result | 3 +- > test/storage/reload.result | 61 ++++++++++++++++++++++++ > test/storage/reload.test.lua | 24 ++++++++++ > vshard/replicaset.lua | 91 +++++++++++++++++++++++++++++------- > vshard/router/init.lua | 40 ++++++++++++---- > vshard/storage/init.lua | 44 ++++++++++++----- > vshard/util.lua | 20 ++++++++ > 9 files changed, 392 insertions(+), 41 deletions(-) > I will review the tests later, after we finish with the code. > diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua > index 99f59aa..48053a0 100644 > --- a/vshard/replicaset.lua > +++ b/vshard/replicaset.lua > @@ -48,7 +48,8 @@ local lerror = require('vshard.error') > local fiber = require('fiber') > local luri = require('uri') > local ffi = require('ffi') > -local gsc = require('vshard.util').generate_self_checker > +local util = require('vshard.util') > +local gsc = util.generate_self_checker > > -- > -- on_connect() trigger for net.box > @@ -338,26 +339,79 @@ local function replicaset_tostring(replicaset) > end > > -- > --- Rebind connections of old replicas to new ones. > +-- Deprecate old objects in case of reload. > +-- > +local function clean_old_objects(old_replicaset_mt, old_replica_mt, > + old_replicas) > + local function get_deprecated_warning(obj_name) > + return function(...) > + local finfo = debug.getinfo(2) > + local file = finfo.short_src > + local name = finfo.name > + local line = finfo.currentline > + local err_fmt = '%s.%s:%s: Object %s is outdated. Use new instance' > + error(string.format(err_fmt, file, name, line, obj_name)) > + end > + end 2. Too complex. 2.1 Why could not you just declare this function out of clean_old_object()? It does not depend on any upvalue here. And it is parsed every time when clean_old_objects is called slowing it down. 2.2. Too complex error message. Lets just throw error that object is outdated. > + local function replace_mt_methods(mt, method) > + if not mt then > + return > + end > + for name, _ in pairs(mt.__index) do > + mt.__index[name] = method > + end > + end > + replace_mt_methods(old_replicaset_mt, get_deprecated_warning('replicaset')) > + replace_mt_methods(old_replica_mt, get_deprecated_warning('replica')) 3. Why replace_mt_methods needs old mt? Why not just replace the whole old __index? I thought you would do something like this: getmetatable(old_object).__index = function() return function() error('The object is outdated') end end Any attempt to call a method in an old object leads to throwing an error. It allows to remove both replace_mt_methods, get_deprecated_warning. And one more remark: our convention is to throw only OOM and misusage errors. Here we should return two values: nil, error_object. Lets add a new error code for this object. > + for _, replica in pairs(old_replicas) do > + replica.conn = nil > + end > + log.info('Old replicaset and replica objects are cleaned up.') > +end > + > -- > -local function replicaset_rebind_connections(replicaset) > - for _, replica in pairs(replicaset.replicas) do > - local old_replica = replica.old_replica > - if old_replica then > - local conn = old_replica.conn > - replica.conn = conn > - replica.down_ts = old_replica.down_ts > - replica.net_timeout = old_replica.net_timeout > - replica.net_sequential_ok = old_replica.net_sequential_ok > - replica.net_sequential_fail = old_replica.net_sequential_fail > - if conn then > - conn.replica = replica > - conn.replicaset = replicaset > - old_replica.conn = nil > +-- Copy netbox conections from old replica objects to new ones. > +-- > +local function rebind_replicasets(replicasets, deprecate, deprecate_timeout) 4. Lets always deprecate old objects. Reconfigure is like reload called from API. And remove 'deprecate' parameter. On reconfiguration deprecating is even more useful because if during the reconfiguration a cluster topology is changed, then some of buckets can be transferred making old routes (and thus old replicaset objects that had been got by router.route() call) outdated. > + -- Collect data to deprecate old objects. > + local old_replicaset_mt = nil > + local old_replica_mt = nil > + local old_replicas = {} > + for _, replicaset in pairs(replicasets) do > + if replicaset.old_replicaset then > + local old_mt = getmetatable(replicaset.old_replicaset) > + assert(old_replicaset_mt == nil or old_replicaset_mt == old_mt) > + assert(not deprecate or > + old_replicaset_mt ~= getmetatable(replicaset)) > + old_replicaset_mt = old_mt > + replicaset.old_replicaset = nil > + end > + for _, replica in pairs(replicaset.replicas) do > + local old_replica = replica.old_replica > + if old_replica then > + local old_mt = getmetatable(old_replica) > + assert(old_replica_mt == nil or old_replica_mt == old_mt) > + assert(not deprecate or old_replica_mt ~= getmetatable(replica)) > + old_replica_mt = old_mt > + table.insert(old_replicas, old_replica) > + local conn = old_replica.conn > + replica.conn = conn > + replica.down_ts = old_replica.down_ts > + replica.net_timeout = old_replica.net_timeout > + replica.net_sequential_ok = old_replica.net_sequential_ok > + replica.net_sequential_fail = old_replica.net_sequential_fail > + if conn then > + conn.replica = replica > + conn.replicaset = replicaset > + end > + replica.old_replica = nil > end > - replica.old_replica = nil > end > end > + if deprecate then > + util.async_task(deprecate_timeout, clean_old_objects, old_replicaset_mt, > + old_replica_mt, old_replicas) 5. Here you have lost replicas and replicasets that had been removed from the configuration. Not updated, but exactly removed. You iterate over new replicasets only, that has not links to removed ones. You should not deprecate old objects via links in new ones. You should have two different methods: to schedule old objects deprecation and to rebind connections. Looks, like rebind_connections should be returned. > + end > end > > -- > @@ -523,6 +576,7 @@ local function buildall(sharding_cfg, old_replicasets) > weight = replicaset.weight, > bucket_count = 0, > lock = replicaset.lock, > + old_replicaset = old_replicaset, 6. You should not have this link. > }, replicaset_mt) > local priority_list = {} > diff --git a/vshard/router/init.lua b/vshard/router/init.lua > index 1dee80c..c075436 100644 > --- a/vshard/router/init.lua > +++ b/vshard/router/init.lua > @@ -129,6 +143,9 @@ local function discovery_f(module_version) > consts.COLLECT_LUA_GARBAGE_INTERVAL / consts.DISCOVERY_INTERVAL > while module_version == M.module_version do > for _, replicaset in pairs(M.replicasets) do > + if module_version ~= M.module_version then > + return > + end 7. As I told you verbally, this check should be done after callro. I can reload the module during callro(), and old replicaset object will fill the route map. Can you please extract this 3-line fix in a separate commit with a test? > local active_buckets, err = > replicaset:callro('vshard.storage.buckets_discovery', {}, > {timeout = 2}) > @@ -457,8 +474,11 @@ end > -- Configuration > -------------------------------------------------------------------------------- > > +-- Distinguish reload from reconfigure. > +local module_reloading = true > local function router_cfg(cfg) > cfg = lcfg.check(cfg) > + local current_cfg = table.deepcopy(cfg) 8. cfg is already deepcopy of the original cfg. > if not M.replicasets then > log.info('Starting router configuration') > else > diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua > index 879c7c4..b570821 100644 > --- a/vshard/storage/init.lua > +++ b/vshard/storage/init.lua > @@ -2,14 +2,27 @@ local log = require('log') > local luri = require('uri') > local lfiber = require('fiber') > local netbox = require('net.box') -- for net.box:self() > +local trigger = require('internal.trigger') > + > +local MODULE_INTERNALS = '__module_vshard_storage' > +-- Reload requirements, in case this module is reloaded manually. > +if rawget(_G, MODULE_INTERNALS) then > + local vshard_modules = { > + 'vshard.consts', 'vshard.error', 'vshard.cfg', > + 'vshard.replicaset', 'vshard.util', > + } > + for _, module in pairs(vshard_modules) do > + package.loaded[module] = nil > + end > +end > local consts = require('vshard.consts') > local lerror = require('vshard.error') > -local util = require('vshard.util') > local lcfg = require('vshard.cfg') > local lreplicaset = require('vshard.replicaset') > -local trigger = require('internal.trigger') > +local util = require('vshard.util') > > -local M = rawget(_G, '__module_vshard_storage') > + 9. Extra empty line. > +local M = rawget(_G, MODULE_INTERNALS) > if not M then > -- > -- The module is loaded for the first time. > @@ -21,6 +34,8 @@ if not M then > -- See format in replicaset.lua. > -- > replicasets = nil, > + -- Time to deprecate old objects on reload. > + reload_deprecate_timeout = nil, 10. Looks like this thing is always 'nil' until somebody does not manually set it via storage/router.internal. It is not good. Please, move it to configuration and create a default value. > -- Triggers on master switch event. They are called right > -- before the event occurs. > _on_master_enable = trigger.new('_on_master_enable'), > diff --git a/vshard/util.lua b/vshard/util.lua > index bb71318..f5bd78c 100644 > --- a/vshard/util.lua > +++ b/vshard/util.lua > @@ -75,8 +75,28 @@ local function generate_self_checker(obj_name, func_name, mt, func) > end > end > > + > +-- > +-- Run a function without interrupting current fiber. > +-- @param delay Delay in seconds before the task should be > +-- executed. > +-- @param task Function to be executed. > +-- @param ... Arguments which would be passed to the `task`. > +-- > +local function async_task(delay, task, ...) > + assert(delay == nil or type(delay) == 'number') > + local function wait_and_call(...) > + if delay then > + fiber.sleep(delay) > + end > + task(...) > + end 11. Please, avoid closures when possible. Every time the function is called the closure is parsed again. > + fiber.create(wait_and_call, ...) > +end > + > return { > tuple_extract_key = tuple_extract_key, > reloadable_fiber_f = reloadable_fiber_f, > generate_self_checker = generate_self_checker, > + async_task = async_task, > } >
next prev parent reply other threads:[~2018-06-17 19:46 UTC|newest] Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-06-09 17:47 [tarantool-patches] [PATCH 0/2][vshard] Vshard safe reload AKhatskevich 2018-06-09 17:47 ` [tarantool-patches] [PATCH 1/2] Add test on error during reconfigure AKhatskevich 2018-06-17 19:46 ` [tarantool-patches] " Vladislav Shpilevoy 2018-06-18 10:10 ` Alex Khatskevich 2018-06-18 10:48 ` Vladislav Shpilevoy 2018-06-19 10:36 ` Alex Khatskevich 2018-06-09 17:47 ` [tarantool-patches] [PATCH 2/2] Complete module reload AKhatskevich 2018-06-17 19:46 ` Vladislav Shpilevoy [this message] 2018-06-19 7:09 ` [tarantool-patches] " Alex Khatskevich 2018-06-19 9:00 ` Vladislav Shpilevoy 2018-06-19 11:49 ` Alex Khatskevich 2018-06-19 11:57 ` Vladislav Shpilevoy 2018-06-17 19:39 ` [tarantool-patches] Re: [PATCH 0/2][vshard] Vshard safe reload Vladislav Shpilevoy 2018-06-19 7:20 ` Alex Khatskevich
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=991c6880-9d5e-3f51-9003-a095984dc30d@tarantool.org \ --to=v.shpilevoy@tarantool.org \ --cc=avkhatskevich@tarantool.org \ --cc=tarantool-patches@freelists.org \ --subject='[tarantool-patches] Re: [PATCH 2/2] Complete module reload' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox