From mboxrd@z Thu Jan  1 00:00:00 1970
To: tarantool-patches@dev.tarantool.org, olegrok@tarantool.org
Date: Sat, 4 Dec 2021 01:19:38 +0100
Message-Id: <36c68a48a627c8af98fb6f9809d2661abcd93658.1638577114.git.v.shpilevoy@tarantool.org>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH vshard 2/2] router: don't fallback RO to master right away
List-Id: Tarantool development patches
From: Vladislav Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy

RO requests use the replica with the highest priority as specified in
the weights matrix. If the best replica was not available and failover
had not happened yet, RO requests used to fall back to the master, even
if there were other RO replicas with better priority.

This patch makes an RO call first try the currently selected
highest-priority replica. If it is not available (no other connected
replicas at all, or failover didn't happen yet), the call walks the
priority list starting from that replica until it finds an available
one. If that also fails, the call walks the list from the beginning,
hoping that the unavailable replica wasn't the best one and a better
option exists earlier in the priority list.

The patch was done in the scope of the replica backoff task (#298),
because the problem would also exist, and get worse, when the best
replica is in backoff and not only when it is disconnected.

Closes #288
Needed for #298
---
 test/failover/failover.result         | 64 +++++++++++++------
 test/failover/failover.test.lua       | 30 +++++----
 test/router/master_discovery.result   | 92 +++++++++++++++++++++++----
 test/router/master_discovery.test.lua | 38 ++++++++---
 vshard/replicaset.lua                 | 21 ++++--
 5 files changed, 185 insertions(+), 60 deletions(-)

diff --git a/test/failover/failover.result b/test/failover/failover.result
index bae57fa..31eda0c 100644
--- a/test/failover/failover.result
+++ b/test/failover/failover.result
@@ -85,8 +85,8 @@ test_run:cmd('switch default')
 -- available;
 -- * up nearest replica priority if the best one is available
 -- again;
--- * replicaset uses master connection, if the nearest's one is
--- not available before call();
+-- * replicaset uses next prio available connection, if the
+-- nearest's one is not available before call();
 -- * current nearest connection is not down, when trying to
 -- connect to the replica with less weight.
 --
@@ -199,30 +199,17 @@ test_run:cmd('stop server box_1_d')
 - true
 ...
 -- Down_ts must be set in on_disconnect() trigger.
-while rs1.replica.down_ts == nil do fiber.sleep(0.1) end
+box_1_d = rs1.replicas[names.replica_uuid.box_1_d]
 ---
 ...
--- Try to execute read-only request - it must use master
--- connection, because a replica's one is not available.
-vshard.router.call(1, 'read', 'echo', {123})
----
-- 123
-...
-test_run:switch('box_1_a')
----
-- true
-...
-echo_count
----
-- 2
-...
-test_run:switch('router_1')
+test_run:wait_cond(function() return box_1_d.down_ts ~= nil end)
 ---
 - true
 ...
 -- New replica is box_1_b.
-while rs1.replica.name ~= 'box_1_b' do fiber.sleep(0.1) end
+test_run:wait_cond(function() return rs1.replica.name == 'box_1_b' end)
 ---
+- true
 ...
 rs1.replica.down_ts == nil
 ---
@@ -247,14 +234,49 @@ test_run:cmd('switch box_1_b')
 ...
 -- Ensure the 'read' echo was executed on box_1_b - nearest
 -- available replica.
-echo_count
+assert(echo_count == 1)
 ---
-- 1
+- true
+...
+test_run:switch('router_1')
+---
+- true
+...
+--
+-- Kill the best replica. Don't need to wait for failover to happen for the
+-- router to start using the next best available replica.
+--
+test_run:cmd('stop server box_1_b')
+---
+- true
+...
+box_1_b = rs1.replicas[names.replica_uuid.box_1_b]
+---
+...
+test_run:wait_cond(function() return box_1_b.down_ts ~= nil end)
+---
+- true
+...
+vshard.router.callro(1, 'echo', {123})
+---
+- 123
+...
+test_run:switch('box_1_c')
+---
+- true
+...
+assert(echo_count == 1)
+---
+- true
 ...
 test_run:switch('router_1')
 ---
 - true
 ...
+test_run:cmd('start server box_1_b')
+---
+- true
+...
 -- Revive the best replica. A router must reconnect to it in
 -- FAILOVER_UP_TIMEOUT seconds.
 test_run:cmd('start server box_1_d')
diff --git a/test/failover/failover.test.lua b/test/failover/failover.test.lua
index a969e0e..b713319 100644
--- a/test/failover/failover.test.lua
+++ b/test/failover/failover.test.lua
@@ -43,8 +43,8 @@ test_run:cmd('switch default')
 -- available;
 -- * up nearest replica priority if the best one is available
 -- again;
--- * replicaset uses master connection, if the nearest's one is
--- not available before call();
+-- * replicaset uses next prio available connection, if the
+-- nearest's one is not available before call();
 -- * current nearest connection is not down, when trying to
 -- connect to the replica with less weight.
 --
@@ -86,15 +86,10 @@ rs1.replica_up_ts - old_up_ts >= vshard.consts.FAILOVER_UP_TIMEOUT
 -- box_1_d.
 test_run:cmd('stop server box_1_d')
 -- Down_ts must be set in on_disconnect() trigger.
-while rs1.replica.down_ts == nil do fiber.sleep(0.1) end
--- Try to execute read-only request - it must use master
--- connection, because a replica's one is not available.
-vshard.router.call(1, 'read', 'echo', {123})
-test_run:switch('box_1_a')
-echo_count
-test_run:switch('router_1')
+box_1_d = rs1.replicas[names.replica_uuid.box_1_d]
+test_run:wait_cond(function() return box_1_d.down_ts ~= nil end)
 -- New replica is box_1_b.
-while rs1.replica.name ~= 'box_1_b' do fiber.sleep(0.1) end
+test_run:wait_cond(function() return rs1.replica.name == 'box_1_b' end)
 rs1.replica.down_ts == nil
 rs1.replica_up_ts ~= nil
 test_run:grep_log('router_1', 'New replica box_1_b%(storage%@')
@@ -103,8 +98,21 @@ vshard.router.callro(1, 'echo', {123})
 test_run:cmd('switch box_1_b')
 -- Ensure the 'read' echo was executed on box_1_b - nearest
 -- available replica.
-echo_count
+assert(echo_count == 1)
+test_run:switch('router_1')
+
+--
+-- gh-288: kill the best replica. Don't need to wait for failover to happen for
+-- the router to start using the next best available replica.
+--
+test_run:cmd('stop server box_1_b')
+box_1_b = rs1.replicas[names.replica_uuid.box_1_b]
+test_run:wait_cond(function() return box_1_b.down_ts ~= nil end)
+vshard.router.callro(1, 'echo', {123})
+test_run:switch('box_1_c')
+assert(echo_count == 1)
 test_run:switch('router_1')
+test_run:cmd('start server box_1_b')

 -- Revive the best replica. A router must reconnect to it in
 -- FAILOVER_UP_TIMEOUT seconds.
diff --git a/test/router/master_discovery.result b/test/router/master_discovery.result
index 7ebb67d..94a848b 100644
--- a/test/router/master_discovery.result
+++ b/test/router/master_discovery.result
@@ -325,6 +325,22 @@ end)
 start_aggressive_master_search()
  | ---
  | ...
+test_run:cmd('stop server storage_1_b')
+ | ---
+ | - true
+ | ...
+rs1 = vshard.router.static.replicasets[util.replicasets[1]]
+ | ---
+ | ...
+replica = rs1.replicas[util.name_to_uuid.storage_1_b]
+ | ---
+ | ...
+-- Ensure the replica is not available. Otherwise RO requests sneak into it
+-- instead of waiting for master.
+test_run:wait_cond(function() return not replica:is_connected() end)
+ | ---
+ | - true
+ | ...

 forget_masters()
  | ---
@@ -337,8 +353,6 @@ vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)
 forget_masters()
  | ---
  | ...
--- XXX: this should not depend on master so much. RO requests should be able to
--- go to replicas.
 vshard.router.callro(1501, 'echo', {1}, opts_big_timeout)
  | ---
  | - 1
@@ -357,8 +371,6 @@ vshard.router.route(1501):callrw('echo', {1}, opts_big_timeout)
 forget_masters()
  | ---
  | ...
--- XXX: the same as above - should not really wait for master. Regardless of it
--- being auto or not.
 vshard.router.route(1501):callro('echo', {1}, opts_big_timeout)
  | ---
  | - 1
@@ -369,6 +381,10 @@
 stop_aggressive_master_search()
  | ---
  | ...
+test_run:cmd('start server storage_1_b')
+ | ---
+ | - true
+ | ...

 test_run:switch('storage_1_a')
  | ---
@@ -411,7 +427,7 @@ vshard.storage.cfg(cfg, instance_uuid)
  | ---
  | ...

--- Try to make RW and RO requests but then turn of the auto search.
+-- Try to make an RW request but then turn of the auto search.
 test_run:switch('router_1')
  | ---
  | - true
@@ -425,14 +441,6 @@ f1 = fiber.create(function()
 end)
  | ---
  | ...
--- XXX: should not really wait for master since this is an RO request. It could
--- use a replica.
-f2 = fiber.create(function() \
-    fiber.self():set_joinable(true) \
-    return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \
-end)
- | ---
- | ...
 fiber.sleep(0.01)
  | ---
  | ...
@@ -449,6 +457,31 @@ f1:join()
  | ---
  | - true
  | ...
  |   replicaset_uuid:
  |   message: Master is not configured for replicaset
  | ...
+
+-- Try to make an RO request but then turn of the auto search.
+test_run:cmd('stop server storage_1_a')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server storage_1_b')
+ | ---
+ | - true
+ | ...
+forget_masters()
+ | ---
+ | ...
+f2 = fiber.create(function() \
+    fiber.self():set_joinable(true) \
+    return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \
+end)
+ | ---
+ | ...
+fiber.sleep(0.01)
+ | ---
+ | ...
+disable_auto_masters()
+ | ---
+ | ...
 f2:join()
  | ---
  | - true
  | ...
  |   replicaset_uuid:
  |   message: Master is not configured for replicaset
  | ...
+test_run:cmd('start server storage_1_a')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server storage_1_b')
+ | ---
+ | - true
+ | ...

 --
 -- Multiple masters logging.
 --
@@ -473,6 +514,9 @@ replicas = cfg.sharding[util.replicasets[1]].replicas
 replicas[util.name_to_uuid.storage_1_a].master = true
  | ---
  | ...
+replicas[util.name_to_uuid.storage_1_b].master = false
+ | ---
+ | ...
 vshard.storage.cfg(cfg, instance_uuid)
  | ---
  | ...
@@ -484,6 +528,9 @@ test_run:switch('storage_1_b')
 replicas = cfg.sharding[util.replicasets[1]].replicas
  | ---
  | ...
+replicas[util.name_to_uuid.storage_1_a].master = false
+ | ---
+ | ...
 replicas[util.name_to_uuid.storage_1_b].master = true
  | ---
  | ...
@@ -495,6 +542,25 @@ test_run:switch('router_1')
  | ---
  | - true
  | ...
+-- Ensure both replicas are connected. Otherwise the router can go to only one,
+-- find it is master, and won't go to the second one until the first one resigns
+-- or dies.
+rs1 = vshard.router.static.replicasets[util.replicasets[1]]
+ | ---
+ | ...
+replica1 = rs1.replicas[util.name_to_uuid.storage_1_a]
+ | ---
+ | ...
+replica2 = rs1.replicas[util.name_to_uuid.storage_1_b]
+ | ---
+ | ...
+test_run:wait_cond(function() \
+    return replica1:is_connected() and replica2:is_connected() \
+end)
+ | ---
+ | - true
+ | ...
+
 forget_masters()
  | ---
  | ...
diff --git a/test/router/master_discovery.test.lua b/test/router/master_discovery.test.lua
index 6276dc9..8cacd6d 100644
--- a/test/router/master_discovery.test.lua
+++ b/test/router/master_discovery.test.lua
@@ -181,24 +181,27 @@ end)
 -- Call tries to wait for master if has enough time left.
 --
 start_aggressive_master_search()
+test_run:cmd('stop server storage_1_b')
+rs1 = vshard.router.static.replicasets[util.replicasets[1]]
+replica = rs1.replicas[util.name_to_uuid.storage_1_b]
+-- Ensure the replica is not available. Otherwise RO requests sneak into it
+-- instead of waiting for master.
+test_run:wait_cond(function() return not replica:is_connected() end)

 forget_masters()
 vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)

 forget_masters()
--- XXX: this should not depend on master so much. RO requests should be able to
--- go to replicas.
 vshard.router.callro(1501, 'echo', {1}, opts_big_timeout)

 forget_masters()
 vshard.router.route(1501):callrw('echo', {1}, opts_big_timeout)

 forget_masters()
--- XXX: the same as above - should not really wait for master. Regardless of it
--- being auto or not.
 vshard.router.route(1501):callro('echo', {1}, opts_big_timeout)

 stop_aggressive_master_search()
+test_run:cmd('start server storage_1_b')

 test_run:switch('storage_1_a')
 assert(echo_count == 4)
@@ -218,23 +221,30 @@ replicas = cfg.sharding[util.replicasets[1]].replicas
 replicas[util.name_to_uuid.storage_1_a].master = false
 vshard.storage.cfg(cfg, instance_uuid)

--- Try to make RW and RO requests but then turn of the auto search.
+-- Try to make an RW request but then turn of the auto search.
 test_run:switch('router_1')
 forget_masters()
 f1 = fiber.create(function() \
     fiber.self():set_joinable(true) \
     return vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout) \
 end)
--- XXX: should not really wait for master since this is an RO request. It could
--- use a replica.
+fiber.sleep(0.01)
+disable_auto_masters()
+f1:join()
+
+-- Try to make an RO request but then turn of the auto search.
+test_run:cmd('stop server storage_1_a')
+test_run:cmd('stop server storage_1_b')
+forget_masters()
 f2 = fiber.create(function() \
     fiber.self():set_joinable(true) \
     return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \
 end)
 fiber.sleep(0.01)
 disable_auto_masters()
-f1:join()
 f2:join()
+test_run:cmd('start server storage_1_a')
+test_run:cmd('start server storage_1_b')
 --
 -- Multiple masters logging.
 --
@@ -242,14 +252,26 @@ f2:join()
 test_run:switch('storage_1_a')
 replicas = cfg.sharding[util.replicasets[1]].replicas
 replicas[util.name_to_uuid.storage_1_a].master = true
+replicas[util.name_to_uuid.storage_1_b].master = false
 vshard.storage.cfg(cfg, instance_uuid)

 test_run:switch('storage_1_b')
 replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = false
 replicas[util.name_to_uuid.storage_1_b].master = true
 vshard.storage.cfg(cfg, instance_uuid)

 test_run:switch('router_1')
+-- Ensure both replicas are connected. Otherwise the router can go to only one,
+-- find it is master, and won't go to the second one until the first one resigns
+-- or dies.
+rs1 = vshard.router.static.replicasets[util.replicasets[1]]
+replica1 = rs1.replicas[util.name_to_uuid.storage_1_a]
+replica2 = rs1.replicas[util.name_to_uuid.storage_1_b]
+test_run:wait_cond(function() \
+    return replica1:is_connected() and replica2:is_connected() \
+end)
+
 forget_masters()
 start_aggressive_master_search()
 test_run:wait_log('router_1', 'Found more than one master', nil, 10)
diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
index 174c761..93f210e 100644
--- a/vshard/replicaset.lua
+++ b/vshard/replicaset.lua
@@ -455,18 +455,25 @@ local function replicaset_template_multicallro(prefer_replica, balance)
                     return r
                 end
             end
-        elseif prefer_replica then
-            r = replicaset.replica
+        else
+            local start_r = replicaset.replica
+            r = start_r
             while r do
-                if r:is_connected() and r ~= master then
+                if r:is_connected() and (not prefer_replica or r ~= master) then
                     return r
                 end
                 r = r.next_by_priority
             end
-        else
-            r = replicaset.replica
-            if r and r:is_connected() then
-                return r
+            -- Iteration above could start not from the best prio replica.
+            -- Check the beginning of the list too.
+            for _, r in pairs(replicaset.priority_list) do
+                if r == start_r then
+                    -- Reached already checked part.
+                    break
+                end
+                if r:is_connected() and (not prefer_replica or r ~= master) then
+                    return r
+                end
             end
         end
     end
-- 
2.24.3 (Apple Git-128)
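
Editor's addendum (not part of the patch, placed after the signature so it does
not affect `git am`): below is a minimal standalone Lua sketch of the lookup
order the new pick_next_replica branch implements - walk down the priority list
from the currently selected replica, then retry from the head of the list up to
the starting point. The helpers new_replica/new_replicaset and their fields are
hypothetical mocks that only mirror the names used in vshard's replicaset
(replica, next_by_priority, priority_list, is_connected); the prefer_replica /
master handling from the real hunk is omitted for brevity.

-- Mock replica object: is_connected() reflects a plain boolean flag.
local function new_replica(name, connected)
    return {
        name = name,
        connected = connected,
        is_connected = function(self) return self.connected end,
    }
end

-- Mock replicaset: priority_list is ordered best-first and next_by_priority
-- links each replica to the next worse one, like in vshard.
local function new_replicaset(replicas, current_idx)
    for i = 1, #replicas - 1 do
        replicas[i].next_by_priority = replicas[i + 1]
    end
    return {priority_list = replicas, replica = replicas[current_idx]}
end

-- The two-pass lookup described in the commit message.
local function pick_ro_replica(replicaset)
    local start_r = replicaset.replica
    local r = start_r
    -- Pass 1: from the currently selected replica down the priority list.
    while r do
        if r:is_connected() then
            return r
        end
        r = r.next_by_priority
    end
    -- Pass 2: from the head of the list up to the already checked part.
    for _, cand in ipairs(replicaset.priority_list) do
        if cand == start_r then
            break
        end
        if cand:is_connected() then
            return cand
        end
    end
    return nil
end

-- Usage: the current selection 'b' is down, so the first pass falls through
-- to 'c' without touching the master at all.
local a = new_replica('a', true)
local b = new_replica('b', false)
local c = new_replica('c', true)
local rs = new_replicaset({a, b, c}, 2)
assert(pick_ro_replica(rs) == c)
-- If 'c' is also down, the second pass finds 'a' at the head of the list.
c.connected = false
assert(pick_ro_replica(rs) == a)
print('ok')

Stopping the second pass at start_r means no replica is probed twice per
lookup, which keeps the fallback O(number of replicas) in the worst case.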