From: Oleg Babin via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH vshard 2/2] router: don't fallback RO to master right away
Date: Sun, 5 Dec 2021 20:53:26 +0300
Message-ID: <03bd48be-b017-a48f-773e-f4b8e95f8cf5@tarantool.org>
In-Reply-To: <36c68a48a627c8af98fb6f9809d2661abcd93658.1638577114.git.v.shpilevoy@tarantool.org>

Thanks for your patch. See one comment below.

Also, as far as I can see, CI is red on your branch. Is that related to
your changes?

On 04.12.2021 03:19, Vladislav Shpilevoy wrote:
> RO requests use the replica with the highest prio as specified in
> the weights matrix.
>
> If the best replica is not available now and failover didn't
> happen yet, then RO requests used to fall back to master, even if
> there were other RO replicas with better prio.
>
> This patch makes an RO call first try the currently selected most
> prio replica. If it is not available (no other connected replicas
> at all or failover didn't happen yet), the call will try to walk
> the prio list starting from this replica until it finds an
> available one.
>
> If that also fails, the call will try to walk the list from the
> beginning hoping that the unavailable replica wasn't the best one
> and there might be a better option on the other side of the prio
> list.
>
> The patch was done in scope of the task about replica backoff (#298)
> because the problem would additionally exist when the best replica
> is in backoff, not only disconnected. It would get worse.
>
> Closes #288
> Needed for #298
> ---
>  test/failover/failover.result         | 64 +++++++++++++------
>  test/failover/failover.test.lua       | 30 +++++----
>  test/router/master_discovery.result   | 92 +++++++++++++++++++++++----
>  test/router/master_discovery.test.lua | 38 ++++++++---
>  vshard/replicaset.lua                 | 21 ++++--
>  5 files changed, 185 insertions(+), 60 deletions(-)
>
> diff --git a/test/failover/failover.result b/test/failover/failover.result
> index bae57fa..31eda0c 100644
> --- a/test/failover/failover.result
> +++ b/test/failover/failover.result
> @@ -85,8 +85,8 @@ test_run:cmd('switch default')
> -- available;
> -- * up nearest replica priority if the best one is available
> -- again;
> --- * replicaset uses master connection, if the nearest's one is
> --- not available before call();
> +-- * replicaset uses next prio available connection, if the
> +-- nearest's one is not available before call();
> -- * current nearest connection is not down, when trying to
> -- connect to the replica with less weight.
> --
> @@ -199,30 +199,17 @@ test_run:cmd('stop server box_1_d')
> - true
> ...
> -- Down_ts must be set in on_disconnect() trigger.
> -while rs1.replica.down_ts == nil do fiber.sleep(0.1) end
> +box_1_d = rs1.replicas[names.replica_uuid.box_1_d]
> ---
> ...
> --- Try to execute read-only request - it must use master
> --- connection, because a replica's one is not available.
> -vshard.router.call(1, 'read', 'echo', {123})
> ----
> -- 123
> -...
> -test_run:switch('box_1_a')
> ----
> -- true
> -...
> -echo_count
> ----
> -- 2
> -...
> -test_run:switch('router_1')
> +test_run:wait_cond(function() return box_1_d.down_ts ~= nil end)
> ---
> - true
> ...
> -- New replica is box_1_b.
> -while rs1.replica.name ~= 'box_1_b' do fiber.sleep(0.1) end
> +test_run:wait_cond(function() return rs1.replica.name == 'box_1_b' end)
> ---
> +- true
> ...
> rs1.replica.down_ts == nil
> ---
> @@ -247,14 +234,49 @@ test_run:cmd('switch box_1_b')
> ...
> -- Ensure the 'read' echo was executed on box_1_b - nearest
> -- available replica.
> -echo_count
> +assert(echo_count == 1)
> ---
> -- 1
> +- true
> +...
> +test_run:switch('router_1')
> +---
> +- true
> +...
> +--
> +-- Kill the best replica. Don't need to wait for failover to happen for the
> +-- router to start using the next best available replica.
> +--
> +test_run:cmd('stop server box_1_b')
> +---
> +- true
> +...
> +box_1_b = rs1.replicas[names.replica_uuid.box_1_b]
> +---
> +...
> +test_run:wait_cond(function() return box_1_b.down_ts ~= nil end)
> +---
> +- true
> +...
> +vshard.router.callro(1, 'echo', {123})
> +---
> +- 123
> +...
> +test_run:switch('box_1_c')
> +---
> +- true
> +...
> +assert(echo_count == 1)
> +---
> +- true
> ...
> test_run:switch('router_1')
> ---
> - true
> ...
> +test_run:cmd('start server box_1_b')
> +---
> +- true
> +...
> -- Revive the best replica. A router must reconnect to it in
> -- FAILOVER_UP_TIMEOUT seconds.
> test_run:cmd('start server box_1_d')
> diff --git a/test/failover/failover.test.lua b/test/failover/failover.test.lua
> index a969e0e..b713319 100644
> --- a/test/failover/failover.test.lua
> +++ b/test/failover/failover.test.lua
> @@ -43,8 +43,8 @@ test_run:cmd('switch default')
> -- available;
> -- * up nearest replica priority if the best one is available
> -- again;
> --- * replicaset uses master connection, if the nearest's one is
> --- not available before call();
> +-- * replicaset uses next prio available connection, if the
> +-- nearest's one is not available before call();
> -- * current nearest connection is not down, when trying to
> -- connect to the replica with less weight.
> --
> @@ -86,15 +86,10 @@ rs1.replica_up_ts - old_up_ts >= vshard.consts.FAILOVER_UP_TIMEOUT
> -- box_1_d.
> test_run:cmd('stop server box_1_d')
> -- Down_ts must be set in on_disconnect() trigger.
> -while rs1.replica.down_ts == nil do fiber.sleep(0.1) end
> --- Try to execute read-only request - it must use master
> --- connection, because a replica's one is not available.
> -vshard.router.call(1, 'read', 'echo', {123})
> -test_run:switch('box_1_a')
> -echo_count
> -test_run:switch('router_1')
> +box_1_d = rs1.replicas[names.replica_uuid.box_1_d]
> +test_run:wait_cond(function() return box_1_d.down_ts ~= nil end)
> -- New replica is box_1_b.
> -while rs1.replica.name ~= 'box_1_b' do fiber.sleep(0.1) end
> +test_run:wait_cond(function() return rs1.replica.name == 'box_1_b' end)
> rs1.replica.down_ts == nil
> rs1.replica_up_ts ~= nil
> test_run:grep_log('router_1', 'New replica box_1_b%(storage%@')
> @@ -103,8 +98,21 @@ vshard.router.callro(1, 'echo', {123})
> test_run:cmd('switch box_1_b')
> -- Ensure the 'read' echo was executed on box_1_b - nearest
> -- available replica.
> -echo_count
> +assert(echo_count == 1)
> +test_run:switch('router_1')
> +
> +--
> +-- gh-288: kill the best replica. Don't need to wait for failover to happen for
> +-- the router to start using the next best available replica.
> +--
> +test_run:cmd('stop server box_1_b')
> +box_1_b = rs1.replicas[names.replica_uuid.box_1_b]
> +test_run:wait_cond(function() return box_1_b.down_ts ~= nil end)
> +vshard.router.callro(1, 'echo', {123})
> +test_run:switch('box_1_c')
> +assert(echo_count == 1)
> test_run:switch('router_1')
> +test_run:cmd('start server box_1_b')
>
> -- Revive the best replica. A router must reconnect to it in
> -- FAILOVER_UP_TIMEOUT seconds.
> diff --git a/test/router/master_discovery.result b/test/router/master_discovery.result
> index 7ebb67d..94a848b 100644
> --- a/test/router/master_discovery.result
> +++ b/test/router/master_discovery.result
> @@ -325,6 +325,22 @@ end)
> start_aggressive_master_search()
> | ---
> | ...
> +test_run:cmd('stop server storage_1_b')
> + | ---
> + | - true
> + | ...
> +rs1 = vshard.router.static.replicasets[util.replicasets[1]]
> + | ---
> + | ...
> +replica = rs1.replicas[util.name_to_uuid.storage_1_b]
> + | ---
> + | ...
> +-- Ensure the replica is not available. Otherwise RO requests sneak into it
> +-- instead of waiting for master.
> +test_run:wait_cond(function() return not replica:is_connected() end)
> + | ---
> + | - true
> + | ...
>
> forget_masters()
> | ---
> | ...
> @@ -337,8 +353,6 @@ vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)
> forget_masters()
> | ---
> | ...
> --- XXX: this should not depend on master so much. RO requests should be able to
> --- go to replicas.
> vshard.router.callro(1501, 'echo', {1}, opts_big_timeout)
> | ---
> | - 1
> @@ -357,8 +371,6 @@ vshard.router.route(1501):callrw('echo', {1}, opts_big_timeout)
> forget_masters()
> | ---
> | ...
> --- XXX: the same as above - should not really wait for master. Regardless of it
> --- being auto or not.
> vshard.router.route(1501):callro('echo', {1}, opts_big_timeout)
> | ---
> | - 1
> @@ -369,6 +381,10 @@ vshard.router.route(1501):callro('echo', {1}, opts_big_timeout)
> stop_aggressive_master_search()
> | ---
> | ...
> +test_run:cmd('start server storage_1_b')
> + | ---
> + | - true
> + | ...
>
> test_run:switch('storage_1_a')
> | ---
> @@ -411,7 +427,7 @@ vshard.storage.cfg(cfg, instance_uuid)
> | ---
> | ...
>
> --- Try to make RW and RO requests but then turn of the auto search.
> +-- Try to make an RW request but then turn of the auto search.
> test_run:switch('router_1')
> | ---
> | - true
> | ...
> @@ -425,14 +441,6 @@ f1 = fiber.create(function()
> end)
> | ---
> | ...
> --- XXX: should not really wait for master since this is an RO request. It could
> --- use a replica.
> -f2 = fiber.create(function() \
> -    fiber.self():set_joinable(true) \
> -    return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \
> -end)
> - | ---
> - | ...
> fiber.sleep(0.01)
> | ---
> | ...
> @@ -449,6 +457,31 @@ f1:join()
> | ---
> | - true
> | replicaset_uuid: <replicaset_1>
> | message: Master is not configured for replicaset <replicaset_1>
> | ...
> +
> +-- Try to make an RO request but then turn of the auto search.
> +test_run:cmd('stop server storage_1_a')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server storage_1_b')
> + | ---
> + | - true
> + | ...
> +forget_masters()
> + | ---
> + | ...
> +f2 = fiber.create(function() \
> +    fiber.self():set_joinable(true) \
> +    return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \
> +end)
> + | ---
> + | ...
> +fiber.sleep(0.01)
> + | ---
> + | ...
> +disable_auto_masters()
> + | ---
> + | ...
> f2:join()
> | ---
> | - true
> | replicaset_uuid: <replicaset_1>
> | message: Master is not configured for replicaset <replicaset_1>
> | ...
> +test_run:cmd('start server storage_1_a')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server storage_1_b')
> + | ---
> + | - true
> + | ...
>
> --
> -- Multiple masters logging.
> --
> @@ -473,6 +514,9 @@ replicas = cfg.sharding[util.replicasets[1]].replicas
> replicas[util.name_to_uuid.storage_1_a].master = true
> | ---
> | ...
> +replicas[util.name_to_uuid.storage_1_b].master = false
> + | ---
> + | ...
> vshard.storage.cfg(cfg, instance_uuid)
> | ---
> | ...
> @@ -484,6 +528,9 @@ test_run:switch('storage_1_b')
> replicas = cfg.sharding[util.replicasets[1]].replicas
> | ---
> | ...
> +replicas[util.name_to_uuid.storage_1_a].master = false
> + | ---
> + | ...
> replicas[util.name_to_uuid.storage_1_b].master = true
> | ---
> | ...
> @@ -495,6 +542,25 @@ test_run:switch('router_1')
> | ---
> | - true
> | ...
> +-- Ensure both replicas are connected. Otherwise the router can go to only one,
> +-- find it is master, and won't go to the second one until the first one resigns
> +-- or dies.
> +rs1 = vshard.router.static.replicasets[util.replicasets[1]]
> + | ---
> + | ...
> +replica1 = rs1.replicas[util.name_to_uuid.storage_1_a]
> + | ---
> + | ...
> +replica2 = rs1.replicas[util.name_to_uuid.storage_1_b]
> + | ---
> + | ...
> +test_run:wait_cond(function() \
> +    return replica1:is_connected() and replica2:is_connected() \
> +end)
> + | ---
> + | - true
> + | ...
> +
> forget_masters()
> | ---
> | ...
> diff --git a/test/router/master_discovery.test.lua b/test/router/master_discovery.test.lua
> index 6276dc9..8cacd6d 100644
> --- a/test/router/master_discovery.test.lua
> +++ b/test/router/master_discovery.test.lua
> @@ -181,24 +181,27 @@ end)
> -- Call tries to wait for master if has enough time left.
> --
> start_aggressive_master_search()
> +test_run:cmd('stop server storage_1_b')
> +rs1 = vshard.router.static.replicasets[util.replicasets[1]]
> +replica = rs1.replicas[util.name_to_uuid.storage_1_b]
> +-- Ensure the replica is not available. Otherwise RO requests sneak into it
> +-- instead of waiting for master.
> +test_run:wait_cond(function() return not replica:is_connected() end)
>
> forget_masters()
> vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)
>
> forget_masters()
> --- XXX: this should not depend on master so much. RO requests should be able to
> --- go to replicas.
> vshard.router.callro(1501, 'echo', {1}, opts_big_timeout)
>
> forget_masters()
> vshard.router.route(1501):callrw('echo', {1}, opts_big_timeout)
>
> forget_masters()
> --- XXX: the same as above - should not really wait for master. Regardless of it
> --- being auto or not.
> vshard.router.route(1501):callro('echo', {1}, opts_big_timeout)
>
> stop_aggressive_master_search()
> +test_run:cmd('start server storage_1_b')
>
> test_run:switch('storage_1_a')
> assert(echo_count == 4)
> @@ -218,23 +221,30 @@ replicas = cfg.sharding[util.replicasets[1]].replicas
> replicas[util.name_to_uuid.storage_1_a].master = false
> vshard.storage.cfg(cfg, instance_uuid)
>
> --- Try to make RW and RO requests but then turn of the auto search.
> +-- Try to make an RW request but then turn of the auto search.
> test_run:switch('router_1')
> forget_masters()
> f1 = fiber.create(function() \
>     fiber.self():set_joinable(true) \
>     return vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout) \
> end)
> --- XXX: should not really wait for master since this is an RO request. It could
> --- use a replica.
> +fiber.sleep(0.01)
> +disable_auto_masters()
> +f1:join()
> +
> +-- Try to make an RO request but then turn of the auto search.
> +test_run:cmd('stop server storage_1_a')
> +test_run:cmd('stop server storage_1_b')
> +forget_masters()
> f2 = fiber.create(function() \
>     fiber.self():set_joinable(true) \
>     return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \
> end)
> fiber.sleep(0.01)
> disable_auto_masters()
> -f1:join()
> f2:join()
> +test_run:cmd('start server storage_1_a')
> +test_run:cmd('start server storage_1_b')
>
> --
> -- Multiple masters logging.
> --
> @@ -242,14 +252,26 @@ f2:join()
> test_run:switch('storage_1_a')
> replicas = cfg.sharding[util.replicasets[1]].replicas
> replicas[util.name_to_uuid.storage_1_a].master = true
> +replicas[util.name_to_uuid.storage_1_b].master = false
> vshard.storage.cfg(cfg, instance_uuid)
>
> test_run:switch('storage_1_b')
> replicas = cfg.sharding[util.replicasets[1]].replicas
> +replicas[util.name_to_uuid.storage_1_a].master = false
> replicas[util.name_to_uuid.storage_1_b].master = true
> vshard.storage.cfg(cfg, instance_uuid)
>
> test_run:switch('router_1')
> +-- Ensure both replicas are connected. Otherwise the router can go to only one,
> +-- find it is master, and won't go to the second one until the first one resigns
> +-- or dies.
> +rs1 = vshard.router.static.replicasets[util.replicasets[1]]
> +replica1 = rs1.replicas[util.name_to_uuid.storage_1_a]
> +replica2 = rs1.replicas[util.name_to_uuid.storage_1_b]
> +test_run:wait_cond(function() \
> +    return replica1:is_connected() and replica2:is_connected() \
> +end)
> +
> forget_masters()
> start_aggressive_master_search()
> test_run:wait_log('router_1', 'Found more than one master', nil, 10)
> diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
> index 174c761..93f210e 100644
> --- a/vshard/replicaset.lua
> +++ b/vshard/replicaset.lua
> @@ -455,18 +455,25 @@ local function replicaset_template_multicallro(prefer_replica, balance)
>                  return r
>              end
>          end
> -    elseif prefer_replica then
> -        r = replicaset.replica
> +    else
> +        local start_r = replicaset.replica
> +        r = start_r
>          while r do
> -            if r:is_connected() and r ~= master then
> +            if r:is_connected() and (not prefer_replica or r ~= master) then
>                  return r
>              end
>              r = r.next_by_priority
>          end
> -    else
> -        r = replicaset.replica
> -        if r and r:is_connected() then
> -            return r
> +        -- Iteration above could start not from the best prio replica.
> +        -- Check the beginning of the list too.
> +        for _, r in pairs(replicaset.priority_list) do

pairs can't be JIT-compiled (at least for now, until
https://github.com/tarantool/tarantool/issues/6475 is resolved). Also,
pairs doesn't give any guarantees about the iteration order. Could it be
replaced with ipairs? A small sketch of what I mean is below the quoted
diff.

> +            if r == start_r then
> +                -- Reached already checked part.
> +                break
> +            end
> +            if r:is_connected() and (not prefer_replica or r ~= master) then
> +                return r
> +            end
>          end
>      end
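Just to illustrate the ipairs variant, a minimal sketch (assuming
priority_list is a plain Lua array filled from index 1 without holes -
I didn't check how it is actually constructed):

    for _, r in ipairs(replicaset.priority_list) do
        if r == start_r then
            -- Reached the already checked part of the list.
            break
        end
        if r:is_connected() and (not prefer_replica or r ~= master) then
            return r
        end
    end

ipairs walks the indexes 1..#list in order and stops at the first nil,
so the fallback would always be tried from the best prio replica
downwards.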