From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: tarantool-patches@freelists.org, AKhatskevich <avkhatskevich@tarantool.org>
Subject: [tarantool-patches] Re: [PATCH 2/2] Fix discovery/reconfigure race
Date: Thu, 21 Jun 2018 15:54:25 +0300
Message-ID: <edbc3f57-83b9-8434-7893-99857cacb699@tarantool.org>
In-Reply-To: <fa97ea478a9e4d85237eeb2bb8ccab0067d4914d.1529066485.git.avkhatskevich@tarantool.org>

Thanks for the patch! See 5 comments below.

On 15/06/2018 15:47, AKhatskevich wrote:
> This commit prevents discovery fiber from discovering old replicasets
> and spoiling `route_map`.
> ---
>  test/router/router.result   | 62 +++++++++++++++++++++++++++++++++++++++++++++
>  test/router/router.test.lua | 42 ++++++++++++++++++++++++++++++
>  vshard/router/init.lua      | 15 ++++++++++-
>  3 files changed, 118 insertions(+), 1 deletion(-)
>
> diff --git a/test/router/router.result b/test/router/router.result
> index 5643f3e..e61505e 100644
> --- a/test/router/router.result
> +++ b/test/router/router.result
> @@ -1095,6 +1095,68 @@ for bucket, old_rs in pairs(bucket_to_old_rs) do
>  end;
>  ---
>  ...
> +--
> +-- Check route_map is not filled with old replica objects after
> +-- recpnfigure.

1. Typo.

> +--
> +-- Perform #replicasets phases of discovery, to update replicasets
> +-- object in for loop of discovery fiber since previous cfg.
> +for _, __ in pairs(vshard.router.internal.replicasets) do
> +    vshard.router.discovery_wakeup()
> +    fiber.sleep(0.02)
> +end;
> +---
> +...
> +-- Simulate long `callro`.
> +-- Stuck on first rs in replicasets.
> +vshard.router.internal.errinj.LONG_DISCOVERY = true;

2. I do not see this error injection in the M.errinj declaration.

> +---
> +...
> +for _, __ in pairs(vshard.router.internal.replicasets) do
> +    vshard.router.discovery_wakeup()
> +    fiber.sleep(0.02)
> +end;

3. This cycle makes no sense. With LONG_DISCOVERY set, it is equivalent
to calling router.discovery_wakeup() once.

> +---
> +...
> +vshard.router.cfg(cfg);
> +---
> +...
> +vshard.router.internal.errinj.LONG_DISCOVERY = nil;
> +---
> +...
> +-- Do discovery iteration.
> +vshard.router.discovery_wakeup()
> +fiber.sleep(0.02)

4. Fixed sleep timeouts are a sure way to get an unstable test. Please
get rid of them and replace them with 'while not cond do wait end'
where necessary.

> +
> +rs_cnt = 0;
> +---
> +...
> +new_replicasets = {}
> +for _, rs in pairs(vshard.router.internal.replicasets) do
> +    new_replicasets[rs] = true
> +    rs_cnt = rs_cnt + 1
> +end;
> +---
> +...
> +rs_cnt;
> +---
> +- 2
> +...
> +bucket_cnt = 0;
> +---
> +...
> +for bucket_id, rs in pairs(vshard.router.internal.route_map) do
> +    if not new_replicasets[rs] then
> +        error('Old object added to route_map.')
> +    end
> +    bucket_cnt = bucket_cnt + 1
> +end;
> +---
> +...
> +bucket_cnt;
> +---
> +- 3000
> +...
>  test_run:cmd("setopt delimiter ''");
>  ---
>  - true
> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
> index 7e765fa..df5b343 100644
> --- a/vshard/router/init.lua
> +++ b/vshard/router/init.lua
> @@ -127,10 +127,23 @@ local function discovery_f(module_version)
>      local iterations_until_lua_gc =
>          consts.COLLECT_LUA_GARBAGE_INTERVAL / consts.DISCOVERY_INTERVAL
>      while module_version == M.module_version do
> -        for _, replicaset in pairs(M.replicasets) do
> +        local old_replicasets = M.replicasets
> +        for rs_uuid, replicaset in pairs(M.replicasets) do
>              local active_buckets, err =
>                  replicaset:callro('vshard.storage.buckets_discovery', {},
>                                    {timeout = 2})
> +            while M.errinj.LONG_DISCOVERY do
> +                -- Stuck on the first replicaset.
> +                if rs_uuid ~= select(1, next(M.replicasets)) then
> +                    break
> +                end
> +                lfiber.sleep(0.01)
> +            end
> +            -- Renew replicasets object in case of reconfigure
> +            -- and reload events.

5. You do not renew anything here.

> +            if M.replicasets ~= old_replicasets then
> +                break
> +            end
>              if not active_buckets then
>                  log.error('Error during discovery %s: %s', replicaset, err)
>              else
>
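
For comment 4, a minimal sketch of a 'while not cond do wait end' loop with a safety deadline. The helper name wait_until and its parameters are hypothetical and not part of vshard or test-run; the sleep and clock functions are passed in so the sketch also runs outside Tarantool. In a Tarantool test they would presumably be fiber.sleep and fiber.time, and the condition could be something like "route_map holds the expected number of buckets".

```lua
-- Hypothetical helper (an assumption, not vshard or test-run API).
-- Polls cond() until it returns a truthy value or `timeout` seconds
-- pass according to clock(). Returns true on success, false on timeout.
local function wait_until(cond, timeout, sleep, clock)
    local deadline = clock() + timeout
    while not cond() do
        if clock() >= deadline then
            return false
        end
        sleep(0.01)
    end
    return true
end

-- Stand-ins for fiber.sleep / fiber.time so the sketch is runnable
-- in plain Lua: sleeping just advances a fake clock.
local now = 0
local function fake_clock() return now end
local function fake_sleep(dt) now = now + dt end

-- The condition becomes true on the third poll.
local polls = 0
local ok = wait_until(function()
    polls = polls + 1
    return polls >= 3
end, 1, fake_sleep, fake_clock)

-- A condition that never holds ends with a timeout instead of hanging.
local timed_out_ok = wait_until(function() return false end,
                                0.05, fake_sleep, fake_clock)

-- Assumed usage inside the test (not executed here):
--   wait_until(function()
--       local cnt = 0
--       for _ in pairs(vshard.router.internal.route_map) do
--           cnt = cnt + 1
--       end
--       return cnt == 3000
--   end, 10, fiber.sleep, fiber.time)
```

The deadline only guards against an endless hang when the condition never becomes true; the test result depends on the condition itself rather than on how long a single discovery step happens to take.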