From mboxrd@z Thu Jan  1 00:00:00 1970
To: tarantool-patches@dev.tarantool.org, olegrok@tarantool.org
Date: Sat, 4 Dec 2021 01:19:38 +0100
Message-Id: <36c68a48a627c8af98fb6f9809d2661abcd93658.1638577114.git.v.shpilevoy@tarantool.org>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH vshard 2/2] router: don't fallback RO to master right away
List-Id: Tarantool development patches
From: Vladislav Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy

RO requests use the replica with the highest priority as specified in
the weights matrix. If the best replica was not available and failover
had not happened yet, RO requests used to fall back to the master, even
if there were other RO replicas with better priority.

This patch makes an RO call first try the currently selected
highest-priority replica. If it is not available (no other connected
replicas at all, or failover didn't happen yet), the call walks the
priority list starting from that replica until it finds an available
one. If that also fails, the call walks the list from the beginning,
hoping that the unavailable replica wasn't the best one and a better
option exists earlier in the priority list.

The patch was done in the scope of the replica backoff task (#298),
because the problem would also exist, and get worse, when the best
replica is in backoff and not only when it is disconnected.

Closes #288
Needed for #298
---
 test/failover/failover.result         | 64 +++++++++++++------
 test/failover/failover.test.lua       | 30 +++++----
 test/router/master_discovery.result   | 92 +++++++++++++++++++++++----
 test/router/master_discovery.test.lua | 38 ++++++++---
 vshard/replicaset.lua                 | 21 ++++--
 5 files changed, 185 insertions(+), 60 deletions(-)

diff --git a/test/failover/failover.result b/test/failover/failover.result
index bae57fa..31eda0c 100644
--- a/test/failover/failover.result
+++ b/test/failover/failover.result
@@ -85,8 +85,8 @@ test_run:cmd('switch default')
 -- available;
 -- * up nearest replica priority if the best one is available
 -- again;
--- * replicaset uses master connection, if the nearest's one is
--- not available before call();
+-- * replicaset uses next prio available connection, if the
+-- nearest's one is not available before call();
 -- * current nearest connection is not down, when trying to
 -- connect to the replica with less weight.
 --
@@ -199,30 +199,17 @@ test_run:cmd('stop server box_1_d')
 - true
 ...
 -- Down_ts must be set in on_disconnect() trigger.
-while rs1.replica.down_ts == nil do fiber.sleep(0.1) end
+box_1_d = rs1.replicas[names.replica_uuid.box_1_d]
 ---
 ...
--- Try to execute read-only request - it must use master
--- connection, because a replica's one is not available.
-vshard.router.call(1, 'read', 'echo', {123})
----
-- 123
-...
-test_run:switch('box_1_a')
----
-- true
-...
-echo_count
----
-- 2
-...
-test_run:switch('router_1')
+test_run:wait_cond(function() return box_1_d.down_ts ~= nil end)
 ---
 - true
 ...
 -- New replica is box_1_b.
-while rs1.replica.name ~= 'box_1_b' do fiber.sleep(0.1) end
+test_run:wait_cond(function() return rs1.replica.name == 'box_1_b' end)
 ---
+- true
 ...
 rs1.replica.down_ts == nil
 ---
@@ -247,14 +234,49 @@ test_run:cmd('switch box_1_b')
 ...
 -- Ensure the 'read' echo was executed on box_1_b - nearest
 -- available replica.
-echo_count
+assert(echo_count == 1)
 ---
-- 1
+- true
+...
+test_run:switch('router_1')
+---
+- true
+...
+--
+-- Kill the best replica. Don't need to wait for failover to happen for the
+-- router to start using the next best available replica.
+--
+test_run:cmd('stop server box_1_b')
+---
+- true
+...
+box_1_b = rs1.replicas[names.replica_uuid.box_1_b]
+---
+...
+test_run:wait_cond(function() return box_1_b.down_ts ~= nil end)
+---
+- true
+...
+vshard.router.callro(1, 'echo', {123})
+---
+- 123
+...
+test_run:switch('box_1_c')
+---
+- true
+...
+assert(echo_count == 1)
+---
+- true
 ...
 test_run:switch('router_1')
 ---
 - true
 ...
+test_run:cmd('start server box_1_b')
+---
+- true
+...
 -- Revive the best replica. A router must reconnect to it in
 -- FAILOVER_UP_TIMEOUT seconds.
 test_run:cmd('start server box_1_d')
diff --git a/test/failover/failover.test.lua b/test/failover/failover.test.lua
index a969e0e..b713319 100644
--- a/test/failover/failover.test.lua
+++ b/test/failover/failover.test.lua
@@ -43,8 +43,8 @@ test_run:cmd('switch default')
 -- available;
 -- * up nearest replica priority if the best one is available
 -- again;
--- * replicaset uses master connection, if the nearest's one is
--- not available before call();
+-- * replicaset uses next prio available connection, if the
+-- nearest's one is not available before call();
 -- * current nearest connection is not down, when trying to
 -- connect to the replica with less weight.
 --
@@ -86,15 +86,10 @@ rs1.replica_up_ts - old_up_ts >= vshard.consts.FAILOVER_UP_TIMEOUT
 -- box_1_d.
 test_run:cmd('stop server box_1_d')
 -- Down_ts must be set in on_disconnect() trigger.
-while rs1.replica.down_ts == nil do fiber.sleep(0.1) end
--- Try to execute read-only request - it must use master
--- connection, because a replica's one is not available.
-vshard.router.call(1, 'read', 'echo', {123})
-test_run:switch('box_1_a')
-echo_count
-test_run:switch('router_1')
+box_1_d = rs1.replicas[names.replica_uuid.box_1_d]
+test_run:wait_cond(function() return box_1_d.down_ts ~= nil end)
 -- New replica is box_1_b.
-while rs1.replica.name ~= 'box_1_b' do fiber.sleep(0.1) end
+test_run:wait_cond(function() return rs1.replica.name == 'box_1_b' end)
 rs1.replica.down_ts == nil
 rs1.replica_up_ts ~= nil
 test_run:grep_log('router_1', 'New replica box_1_b%(storage%@')
@@ -103,8 +98,21 @@ vshard.router.callro(1, 'echo', {123})
 test_run:cmd('switch box_1_b')
 -- Ensure the 'read' echo was executed on box_1_b - nearest
 -- available replica.
-echo_count
+assert(echo_count == 1)
+test_run:switch('router_1')
+
+--
+-- gh-288: kill the best replica. Don't need to wait for failover to happen for
+-- the router to start using the next best available replica.
+--
+test_run:cmd('stop server box_1_b')
+box_1_b = rs1.replicas[names.replica_uuid.box_1_b]
+test_run:wait_cond(function() return box_1_b.down_ts ~= nil end)
+vshard.router.callro(1, 'echo', {123})
+test_run:switch('box_1_c')
+assert(echo_count == 1)
 test_run:switch('router_1')
+test_run:cmd('start server box_1_b')

 -- Revive the best replica. A router must reconnect to it in
 -- FAILOVER_UP_TIMEOUT seconds.
diff --git a/test/router/master_discovery.result b/test/router/master_discovery.result
index 7ebb67d..94a848b 100644
--- a/test/router/master_discovery.result
+++ b/test/router/master_discovery.result
@@ -325,6 +325,22 @@ end)
 start_aggressive_master_search()
  | ---
  | ...
+test_run:cmd('stop server storage_1_b')
+ | ---
+ | - true
+ | ...
+rs1 = vshard.router.static.replicasets[util.replicasets[1]]
+ | ---
+ | ...
+replica = rs1.replicas[util.name_to_uuid.storage_1_b]
+ | ---
+ | ...
+-- Ensure the replica is not available. Otherwise RO requests sneak into it
+-- instead of waiting for master.
+test_run:wait_cond(function() return not replica:is_connected() end)
+ | ---
+ | - true
+ | ...

 forget_masters()
  | ---
@@ -337,8 +353,6 @@ vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)
 forget_masters()
  | ---
  | ...
--- XXX: this should not depend on master so much. RO requests should be able to
--- go to replicas.
 vshard.router.callro(1501, 'echo', {1}, opts_big_timeout)
  | ---
  | - 1
@@ -357,8 +371,6 @@ vshard.router.route(1501):callrw('echo', {1}, opts_big_timeout)
 forget_masters()
  | ---
  | ...
--- XXX: the same as above - should not really wait for master. Regardless of it
--- being auto or not.
 vshard.router.route(1501):callro('echo', {1}, opts_big_timeout)
  | ---
  | - 1
@@ -369,6 +381,10 @@
 stop_aggressive_master_search()
  | ---
  | ...
+test_run:cmd('start server storage_1_b')
+ | ---
+ | - true
+ | ...

 test_run:switch('storage_1_a')
  | ---
@@ -411,7 +427,7 @@ vshard.storage.cfg(cfg, instance_uuid)
  | ---
  | ...

--- Try to make RW and RO requests but then turn of the auto search.
+-- Try to make an RW request but then turn of the auto search.
 test_run:switch('router_1')
  | ---
  | - true
@@ -425,14 +441,6 @@ f1 = fiber.create(function()
 end)
  | ---
  | ...
--- XXX: should not really wait for master since this is an RO request. It could
--- use a replica.
-f2 = fiber.create(function() \
-    fiber.self():set_joinable(true) \
-    return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \
-end)
- | ---
- | ...
 fiber.sleep(0.01)
  | ---
  | ...
@@ -449,6 +457,31 @@ f1:join()
  | ---
  | - true
  | ...
  |   replicaset_uuid:
  |   message: Master is not configured for replicaset
  | ...
+
+-- Try to make an RO request but then turn of the auto search.
+test_run:cmd('stop server storage_1_a')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server storage_1_b')
+ | ---
+ | - true
+ | ...
+forget_masters()
+ | ---
+ | ...
+f2 = fiber.create(function() \
+    fiber.self():set_joinable(true) \
+    return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \
+end)
+ | ---
+ | ...
+fiber.sleep(0.01)
+ | ---
+ | ...
+disable_auto_masters()
+ | ---
+ | ...
 f2:join()
  | ---
  | - true
  | ...
  |   replicaset_uuid:
  |   message: Master is not configured for replicaset
  | ...
+test_run:cmd('start server storage_1_a')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server storage_1_b')
+ | ---
+ | - true
+ | ...

 --
 -- Multiple masters logging.
 --
@@ -473,6 +514,9 @@ replicas = cfg.sharding[util.replicasets[1]].replicas
 replicas[util.name_to_uuid.storage_1_a].master = true
  | ---
  | ...
+replicas[util.name_to_uuid.storage_1_b].master = false
+ | ---
+ | ...
 vshard.storage.cfg(cfg, instance_uuid)
  | ---
  | ...
@@ -484,6 +528,9 @@ test_run:switch('storage_1_b')
 replicas = cfg.sharding[util.replicasets[1]].replicas
  | ---
  | ...
+replicas[util.name_to_uuid.storage_1_a].master = false
+ | ---
+ | ...
 replicas[util.name_to_uuid.storage_1_b].master = true
  | ---
  | ...
@@ -495,6 +542,25 @@ test_run:switch('router_1')
  | ---
  | - true
  | ...
+-- Ensure both replicas are connected. Otherwise the router can go to only one,
+-- find it is master, and won't go to the second one until the first one resigns
+-- or dies.
+rs1 = vshard.router.static.replicasets[util.replicasets[1]]
+ | ---
+ | ...
+replica1 = rs1.replicas[util.name_to_uuid.storage_1_a]
+ | ---
+ | ...
+replica2 = rs1.replicas[util.name_to_uuid.storage_1_b]
+ | ---
+ | ...
+test_run:wait_cond(function() \
+    return replica1:is_connected() and replica2:is_connected() \
+end)
+ | ---
+ | - true
+ | ...
+
 forget_masters()
  | ---
  | ...
diff --git a/test/router/master_discovery.test.lua b/test/router/master_discovery.test.lua
index 6276dc9..8cacd6d 100644
--- a/test/router/master_discovery.test.lua
+++ b/test/router/master_discovery.test.lua
@@ -181,24 +181,27 @@ end)
 -- Call tries to wait for master if has enough time left.
 --
 start_aggressive_master_search()
+test_run:cmd('stop server storage_1_b')
+rs1 = vshard.router.static.replicasets[util.replicasets[1]]
+replica = rs1.replicas[util.name_to_uuid.storage_1_b]
+-- Ensure the replica is not available. Otherwise RO requests sneak into it
+-- instead of waiting for master.
+test_run:wait_cond(function() return not replica:is_connected() end)

 forget_masters()
 vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)

 forget_masters()
--- XXX: this should not depend on master so much. RO requests should be able to
--- go to replicas.
 vshard.router.callro(1501, 'echo', {1}, opts_big_timeout)

 forget_masters()
 vshard.router.route(1501):callrw('echo', {1}, opts_big_timeout)

 forget_masters()
--- XXX: the same as above - should not really wait for master. Regardless of it
--- being auto or not.
 vshard.router.route(1501):callro('echo', {1}, opts_big_timeout)

 stop_aggressive_master_search()
+test_run:cmd('start server storage_1_b')

 test_run:switch('storage_1_a')
 assert(echo_count == 4)
@@ -218,23 +221,30 @@ replicas = cfg.sharding[util.replicasets[1]].replicas
 replicas[util.name_to_uuid.storage_1_a].master = false
 vshard.storage.cfg(cfg, instance_uuid)

--- Try to make RW and RO requests but then turn of the auto search.
+-- Try to make an RW request but then turn of the auto search.
 test_run:switch('router_1')
 forget_masters()
 f1 = fiber.create(function() \
     fiber.self():set_joinable(true) \
     return vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout) \
 end)
--- XXX: should not really wait for master since this is an RO request. It could
--- use a replica.
+fiber.sleep(0.01)
+disable_auto_masters()
+f1:join()
+
+-- Try to make an RO request but then turn of the auto search.
+test_run:cmd('stop server storage_1_a')
+test_run:cmd('stop server storage_1_b')
+forget_masters()
 f2 = fiber.create(function() \
     fiber.self():set_joinable(true) \
     return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \
 end)
 fiber.sleep(0.01)
 disable_auto_masters()
-f1:join()
 f2:join()
+test_run:cmd('start server storage_1_a')
+test_run:cmd('start server storage_1_b')
 --
 -- Multiple masters logging.
 --
@@ -242,14 +252,26 @@ f2:join()
 test_run:switch('storage_1_a')
 replicas = cfg.sharding[util.replicasets[1]].replicas
 replicas[util.name_to_uuid.storage_1_a].master = true
+replicas[util.name_to_uuid.storage_1_b].master = false
 vshard.storage.cfg(cfg, instance_uuid)

 test_run:switch('storage_1_b')
 replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = false
 replicas[util.name_to_uuid.storage_1_b].master = true
 vshard.storage.cfg(cfg, instance_uuid)

 test_run:switch('router_1')
+-- Ensure both replicas are connected. Otherwise the router can go to only one,
+-- find it is master, and won't go to the second one until the first one resigns
+-- or dies.
+rs1 = vshard.router.static.replicasets[util.replicasets[1]]
+replica1 = rs1.replicas[util.name_to_uuid.storage_1_a]
+replica2 = rs1.replicas[util.name_to_uuid.storage_1_b]
+test_run:wait_cond(function() \
+    return replica1:is_connected() and replica2:is_connected() \
+end)
+
 forget_masters()
 start_aggressive_master_search()
 test_run:wait_log('router_1', 'Found more than one master', nil, 10)
diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
index 174c761..93f210e 100644
--- a/vshard/replicaset.lua
+++ b/vshard/replicaset.lua
@@ -455,18 +455,25 @@ local function replicaset_template_multicallro(prefer_replica, balance)
                     return r
                 end
             end
-        elseif prefer_replica then
-            r = replicaset.replica
+        else
+            local start_r = replicaset.replica
+            r = start_r
             while r do
-                if r:is_connected() and r ~= master then
+                if r:is_connected() and (not prefer_replica or r ~= master) then
                     return r
                 end
                 r = r.next_by_priority
             end
-        else
-            r = replicaset.replica
-            if r and r:is_connected() then
-                return r
+            -- Iteration above could start not from the best prio replica.
+            -- Check the beginning of the list too.
+            for _, r in pairs(replicaset.priority_list) do
+                if r == start_r then
+                    -- Reached already checked part.
+                    break
+                end
+                if r:is_connected() and (not prefer_replica or r ~= master) then
+                    return r
+                end
             end
         end
     end
-- 
2.24.3 (Apple Git-128)
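
Editor's addendum (not part of the patch, placed after the signature so it does
not affect `git am`): below is a minimal standalone Lua sketch of the lookup
order the new pick_next_replica branch implements - walk down the priority list
from the currently selected replica, then retry from the head of the list up to
the starting point. The helpers new_replica/new_replicaset and their fields are
hypothetical mocks that only mirror the names used in vshard's replicaset
(replica, next_by_priority, priority_list, is_connected); the prefer_replica /
master handling from the real hunk is omitted for brevity.

-- Mock replica object: is_connected() reflects a plain boolean flag.
local function new_replica(name, connected)
    return {
        name = name,
        connected = connected,
        is_connected = function(self) return self.connected end,
    }
end

-- Mock replicaset: priority_list is ordered best-first and next_by_priority
-- links each replica to the next worse one, like in vshard.
local function new_replicaset(replicas, current_idx)
    for i = 1, #replicas - 1 do
        replicas[i].next_by_priority = replicas[i + 1]
    end
    return {priority_list = replicas, replica = replicas[current_idx]}
end

-- The two-pass lookup described in the commit message.
local function pick_ro_replica(replicaset)
    local start_r = replicaset.replica
    local r = start_r
    -- Pass 1: from the currently selected replica down the priority list.
    while r do
        if r:is_connected() then
            return r
        end
        r = r.next_by_priority
    end
    -- Pass 2: from the head of the list up to the already checked part.
    for _, cand in ipairs(replicaset.priority_list) do
        if cand == start_r then
            break
        end
        if cand:is_connected() then
            return cand
        end
    end
    return nil
end

-- Usage: the current selection 'b' is down, so the first pass falls through
-- to 'c' without touching the master at all.
local a = new_replica('a', true)
local b = new_replica('b', false)
local c = new_replica('c', true)
local rs = new_replicaset({a, b, c}, 2)
assert(pick_ro_replica(rs) == c)
-- If 'c' is also down, the second pass finds 'a' at the head of the list.
c.connected = false
assert(pick_ro_replica(rs) == a)
print('ok')

Stopping the second pass at start_r means no replica is probed twice per
lookup, which keeps the fallback O(number of replicas) in the worst case.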