From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 9F1F1267E3 for ; Tue, 19 Jun 2018 05:00:45 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id zLQ3jC3Wn3GU for ; Tue, 19 Jun 2018 05:00:45 -0400 (EDT) Received: from smtp42.i.mail.ru (smtp42.i.mail.ru [94.100.177.102]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 390C625F00 for ; Tue, 19 Jun 2018 05:00:45 -0400 (EDT) Subject: [tarantool-patches] Re: [PATCH 2/2] Complete module reload References: <991c6880-9d5e-3f51-9003-a095984dc30d@tarantool.org> From: Vladislav Shpilevoy Message-ID: <18667cab-8501-a46d-8255-89abfcb2bf0c@tarantool.org> Date: Tue, 19 Jun 2018 12:00:42 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: Alex Khatskevich , tarantool-patches@freelists.org Hello. Thanks for the fixes! See 10 comments below and the commit with some new fixes on the branch. > diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua > index 99f59aa..4e7bc10 100644 > --- a/vshard/replicaset.lua > +++ b/vshard/replicaset.lua > @@ -338,26 +339,80 @@ local function replicaset_tostring(replicaset) >  end > >  -- > --- Rebind connections of old replicas to new ones. > +-- Outdate old objects in case of reload/reconfigure. >  -- > -local function replicaset_rebind_connections(replicaset) > -    for _, replica in pairs(replicaset.replicas) do > -        local old_replica = replica.old_replica > -        if old_replica then > -            local conn = old_replica.conn > -            replica.conn = conn > -            replica.down_ts = old_replica.down_ts > -            replica.net_timeout = old_replica.net_timeout > -            replica.net_sequential_ok = old_replica.net_sequential_ok > -            replica.net_sequential_fail = old_replica.net_sequential_fail > -            if conn then > -                conn.replica = replica > -                conn.replicaset = replicaset > -                old_replica.conn = nil > +local function cleanup_old_objects(old_replicaset_mt, old_replica_mt, > +                                 old_replicas) > +    local function get_outdated_warning() > +        return function(...) > +            return nil, lerror.vshard(lerror.code.OBJECT_IS_OUTDATED) > +        end > +    end 1. Please, declare this outdated_mt out of this function in a module local variable like it is done with ordinal replica/replicaset mt. Not just __index, but entire mt. See comment 3 why. 2. You did not outdate replica metatables. How did your tests pass then? > +    old_replicaset_mt.__index = get_outdated_warning > +    -- Leave replica_mt unchanged. All its methods are safe. > +    for _, replica in pairs(old_replicas) do > +        replica.conn = nil > +    end > +    log.info('Old replicaset and replica objects are outdated.') > +end > + > +-- > +-- Initiate outdating process, which cancels connections and > +-- spoils object methods. > +-- > +local function outdate_replicasets(replicasets, outdate_delay) > +    -- Collect data to outdate old objects. > +    local old_replicaset_mt = nil > +    local old_replica_mt = nil > +    local old_replicas = {} > +    for _, replicaset in pairs(replicasets) do > +        local old_mt = getmetatable(replicaset) > +        assert(old_replicaset_mt == nil or old_replicaset_mt == old_mt) > +        old_replicaset_mt = old_mt > +        for _, replica in pairs(replicaset.replicas) do > +            table.insert(old_replicas, replica) > +            local old_mt = getmetatable(replica) > +            assert(old_replica_mt == nil or old_replica_mt == old_mt) > +            old_replica_mt = old_mt > +        end > +    end > +    util.async_task(outdate_delay, cleanup_old_objects, old_replicaset_mt, > +                    old_replica_mt, old_replicas) 3. You do not need to search old_replica_mt and old_replicaset_mt here. You do not need them at all. Just declare special object_outdate_mt and set this metatable to the old objects. Not just __index, but entire metatable. So you can get rid of this old mts search, old mt parameters. > +end > + > + > +-- > +-- Copy netbox conections from old replica objects to new ones > +-- and outdate old objects. > +-- @param replicasets New replicasets > +-- @param old_replicasets Replicasets and replicas to be outdated. > +-- @param outdate_delay Number of seconds; delay to outdate > +--        old objects. > +-- > +local function rebind_replicasets(replicasets, old_replicasets, outdate_delay) > +    for replicaset_uuid, replicaset in pairs(replicasets) do > +        local old_replicaset = old_replicasets and > + old_replicasets[replicaset_uuid]> +        for replica_uuid, replica in pairs(replicaset.replicas) do > +            local old_replica = old_replicaset and > + old_replicaset.replicas[replica_uuid]> +            if old_replica then > +                local conn = old_replica.conn > +                replica.conn = conn > +                replica.down_ts = old_replica.down_ts > +                replica.net_timeout = old_replica.net_timeout > +                replica.net_sequential_ok = old_replica.net_sequential_ok > +                replica.net_sequential_fail = old_replica.net_sequential_fail > +                if conn then > +                    conn.replica = replica > +                    conn.replicaset = replicaset > +                end >              end > -            replica.old_replica = nil >          end >      end > +    if old_replicasets then > +        outdate_replicasets(old_replicasets, outdate_delay) 4. Please, name it schedule_connections_outdate. Now the name confuses. It could be thought to deprecate the objects now, not after timeout. > +    end >  end > @@ -406,11 +452,30 @@ local replica_mt = { >          end >      end, >  } > -index = {} > -for name, func in pairs(replica_mt.__index) do > -    index[name] = gsc("replica", name, replica_mt, func) > + > +-- > +-- Get new copy of replicaset and replica meta tables. > +-- It is important to have distinct copies of meta tables to > +-- be able to outdate only old objects. > +-- > +local function copy_metatables() 5. Please, remove. This function is harmful and slow. You do not need to copy metatables. Use my recommendations above and think. > +    -- > +    -- Wrap self methods with a sanity checker. > +    -- > +    local replicaset_mt_copy = table.copy(replicaset_mt) > +    local replica_mt_copy = table.copy(replica_mt) > +    local index = {} > +    for name, func in pairs(replicaset_mt_copy.__index) do > +        index[name] = gsc("replicaset", name, replicaset_mt_copy, func) > +    end > +    replicaset_mt_copy.__index = index > +    index = {} > +    for name, func in pairs(replica_mt_copy.__index) do > +        index[name] = gsc("replica", name, replica_mt_copy, func) > +    end > +    replica_mt_copy.__index = index > +    return replicaset_mt_copy, replica_mt_copy >  end > -replica_mt.__index = index > > @@ -514,20 +579,17 @@ local function buildall(sharding_cfg, old_replicasets) 6. Unused parameter. >          zone_weights = {} >      end > @@ -535,8 +597,8 @@ local function buildall(sharding_cfg, old_replicasets) >                  uri = replica.uri, name = replica.name, uuid = replica_uuid, >                  zone = replica.zone, net_timeout = consts.CALL_TIMEOUT_MIN, >                  net_sequential_ok = 0, net_sequential_fail = 0, > -                down_ts = curr_ts, old_replica = old_replica, > -            }, replica_mt) > +                down_ts = curr_ts, > +            }, replica_mt_copy) >              new_replicaset.replicas[replica_uuid] = new_replica >              if replica.master then >                  new_replicaset.master = new_replica > @@ -596,4 +658,6 @@ return { >      buildall = buildall, >      calculate_etalon_balance = cluster_calculate_etalon_balance, >      wait_masters_connect = wait_masters_connect, > +    rebind_replicasets = rebind_replicasets, > +    outdate_replicasets = outdate_replicasets, 7. Why do you need to export this function? >  } > diff --git a/vshard/router/init.lua b/vshard/router/init.lua > index 1dee80c..d70d060 100644 > --- a/vshard/router/init.lua > +++ b/vshard/router/init.lua > @@ -459,6 +473,7 @@ end > >  local function router_cfg(cfg) >      cfg = lcfg.check(cfg) > +    local current_cfg = table.deepcopy(cfg) 8. Ok, I understand why do you copy it, but please, write a comment about it. >      if not M.replicasets then >          log.info('Starting router configuration') >      else > @@ -776,15 +792,16 @@ end >  -- About functions, saved in M, and reloading see comment in >  -- storage/init.lua. >  -- > -M.discovery_f = discovery_f > -M.failover_f = failover_f > - > -if not rawget(_G, '__module_vshard_router') then > -    rawset(_G, '__module_vshard_router', M) > +if not rawget(_G, MODULE_INTERNALS) then > +    rawset(_G, MODULE_INTERNALS, M) >  else > +    router_cfg(M.current_cfg) >      M.module_version = M.module_version + 1 >  end > > +M.discovery_f = discovery_f > +M.failover_f = failover_f 9. Why did you move these lines? > + >  return { >      cfg = router_cfg; >      info = router_info; > diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua > index 879c7c4..50347b0 100644 > --- a/vshard/storage/init.lua > +++ b/vshard/storage/init.lua > @@ -21,6 +33,8 @@ if not M then >          -- See format in replicaset.lua. >          -- >          replicasets = nil, > +        -- Time to deprecate old objects on reload. > +        reload_deprecate_timeout = nil, 10. Why here is 'reload_deprecate_timeout', but 'connection_outdate_delay' in router? >          -- Triggers on master switch event. They are called right >          -- before the event occurs. >          _on_master_enable = trigger.new('_on_master_enable'),