* [Tarantool-patches] [PATCH vshard 0/2] VShard replica backoff, part 1 @ 2021-12-04 0:19 Vladislav Shpilevoy via Tarantool-patches 2021-12-04 0:19 ` [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery Vladislav Shpilevoy via Tarantool-patches ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-12-04 0:19 UTC (permalink / raw) To: tarantool-patches, olegrok The patchset makes a couple of changes needed to properly implement the replica backoff feature. However, they are useful regardless of the backoff, so I am sending them right away. Branch: http://github.com/tarantool/vshard/tree/gerold103/gh-298-replica-backoff-part-1 Issue: https://github.com/tarantool/vshard/issues/298 Vladislav Shpilevoy (2): router: drop wait_connected from master discovery router: don't fallback RO to master right away test/failover/failover.result | 64 +++++++++++++------ test/failover/failover.test.lua | 30 +++++---- test/router/master_discovery.result | 92 +++++++++++++++++++++++---- test/router/master_discovery.test.lua | 38 ++++++++--- vshard/replicaset.lua | 27 ++++---- 5 files changed, 186 insertions(+), 65 deletions(-) -- 2.24.3 (Apple Git-128) ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery 2021-12-04 0:19 [Tarantool-patches] [PATCH vshard 0/2] VShard replica backoff, part 1 Vladislav Shpilevoy via Tarantool-patches @ 2021-12-04 0:19 ` Vladislav Shpilevoy via Tarantool-patches 2021-12-05 17:44 ` Oleg Babin via Tarantool-patches 2021-12-04 0:19 ` [Tarantool-patches] [PATCH vshard 2/2] router: don't fallback RO to master right away Vladislav Shpilevoy via Tarantool-patches 2021-12-07 22:38 ` [Tarantool-patches] [PATCH vshard 0/2] VShard replica backoff, part 1 Vladislav Shpilevoy via Tarantool-patches 2 siblings, 1 reply; 10+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-12-04 0:19 UTC (permalink / raw) To: tarantool-patches, olegrok Master discovery used to wait for connection establishment for up to the discovery timeout for each instance in the replicaset. The problem is that if one of the replicas is dead, the discovery wastes its entire timeout on just this waiting. All the requests sent to connected replicas after this one are left with a 0 timeout and won't properly wait for their results. For example, this is how master discovery could work: send requests: replica1 wait_connected + send, replica2 wait_connected fails on timeout replica3 wait_connected works if it was connected + send collect responses: replica1 wait_result(0 timeout) replica2 skip replica3 wait_result(0 timeout) The entire timeout was wasted on 'replica2 wait_connected' during request sending. Replica1's result could still be delivered fine, because its request was in progress while replica2 was waiting. So having a 0 timeout for it is not a problem: it had time to be executed. But replica3's request has very little chance of being delivered in time, since it was just sent and is collected almost immediately. The worst case is when the first replica is dead: then it is very likely that none of the requests will be delivered, because all result wait timeouts are 0. Still, there is a certain chance that the next requests will be extra quick, so writing a stable test for this does not seem possible. The bug was discovered while working on #288: its testing required stopping one instance, and the master_discovery test started failing. --- vshard/replicaset.lua | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua index 55028bd..174c761 100644 --- a/vshard/replicaset.lua +++ b/vshard/replicaset.lua @@ -682,11 +682,7 @@ local function replicaset_locate_master(replicaset) local replicaset_uuid = replicaset.uuid for replica_uuid, replica in pairs(replicaset.replicas) do local conn = replica.conn - timeout, err = netbox_wait_connected(conn, timeout) - if not timeout then - last_err = err - timeout = deadline - fiber_clock() - else + if conn:is_connected() then ok, f = pcall(conn.call, conn, func, args, async_opts) if not ok then last_err = lerror.make(f) -- 2.24.3 (Apple Git-128) ^ permalink raw reply [flat|nested] 10+ messages in thread
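[For illustration, here is a minimal standalone sketch of the send/collect pattern the commit message describes. The function wrapper, the replicas table layout, and the helper names are simplified assumptions for the example, not the actual vshard internals:]

local fiber = require('fiber')

-- Hypothetical simplification of replicaset_locate_master(): send an async
-- request to every *connected* replica, then collect the replies within one
-- shared deadline. Skipping dead connections up front keeps the whole
-- timeout budget for the replicas which can actually answer.
local function locate_master(replicas, func, args, timeout)
    local deadline = fiber.clock() + timeout
    local futures = {}
    for uuid, replica in pairs(replicas) do
        local conn = replica.conn
        -- Before the patch, a netbox_wait_connected(conn, timeout) call
        -- here could eat the entire timeout on a single dead replica.
        if conn:is_connected() then
            local ok, future = pcall(conn.call, conn, func, args,
                                     {is_async = true})
            if ok then
                futures[uuid] = future
            end
        end
    end
    local last_err
    for _, future in pairs(futures) do
        -- Each wait_result() gets whatever is left of the shared budget.
        local rest = math.max(deadline - fiber.clock(), 0)
        local res, err = future:wait_result(rest)
        if res ~= nil then
            return res[1]
        end
        last_err = err
    end
    return nil, last_err
end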
* Re: [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery 2021-12-04 0:19 ` [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery Vladislav Shpilevoy via Tarantool-patches @ 2021-12-05 17:44 ` Oleg Babin via Tarantool-patches 2021-12-06 23:34 ` Vladislav Shpilevoy via Tarantool-patches 0 siblings, 1 reply; 10+ messages in thread From: Oleg Babin via Tarantool-patches @ 2021-12-05 17:44 UTC (permalink / raw) To: Vladislav Shpilevoy, tarantool-patches Thanks for your patch. Looks reasonable. However, I have one question: before this patch, if some connection was down, we performed wait_connected, and in some cases it could lead to a successful reconnection. Currently we just skip broken connections and don't try to reconnect. Could this be a problem, or do we perform reconnection in another place? What will happen if all connections are down? On 04.12.2021 03:19, Vladislav Shpilevoy wrote: > Master discovery used to wait for connection establishment for up to the discovery timeout for each instance in the replicaset. > > The problem is that if one of the replicas is dead, the discovery wastes its entire timeout on just this waiting. All the requests sent to connected replicas after this one are left with a 0 timeout and won't properly wait for their results. > > For example, this is how master discovery could work: > > send requests: > replica1 wait_connected + send, > replica2 wait_connected fails on timeout > replica3 wait_connected works if it was connected + send > > collect responses: > replica1 wait_result(0 timeout) > replica2 skip > replica3 wait_result(0 timeout) > > The entire timeout was wasted on 'replica2 wait_connected' during request sending. Replica1's result could still be delivered fine, because its request was in progress while replica2 was waiting. So having a 0 timeout for it is not a problem: it had time to be executed. But replica3's request has very little chance of being delivered in time, since it was just sent and is collected almost immediately. > > The worst case is when the first replica is dead: then it is very likely that none of the requests will be delivered, because all result wait timeouts are 0. > > Still, there is a certain chance that the next requests will be extra quick, so writing a stable test for this does not seem possible. > > The bug was discovered while working on #288: its testing required stopping one instance, and the master_discovery test started failing. > --- > vshard/replicaset.lua | 6 +----- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua > index 55028bd..174c761 100644 > --- a/vshard/replicaset.lua > +++ b/vshard/replicaset.lua > @@ -682,11 +682,7 @@ local function replicaset_locate_master(replicaset) > local replicaset_uuid = replicaset.uuid > for replica_uuid, replica in pairs(replicaset.replicas) do > local conn = replica.conn > - timeout, err = netbox_wait_connected(conn, timeout) > - if not timeout then > - last_err = err > - timeout = deadline - fiber_clock() > - else > + if conn:is_connected() then > ok, f = pcall(conn.call, conn, func, args, async_opts) > if not ok then > last_err = lerror.make(f) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery 2021-12-05 17:44 ` Oleg Babin via Tarantool-patches @ 2021-12-06 23:34 ` Vladislav Shpilevoy via Tarantool-patches 2021-12-07 6:49 ` Oleg Babin via Tarantool-patches 0 siblings, 1 reply; 10+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-12-06 23:34 UTC (permalink / raw) To: Oleg Babin, tarantool-patches Hi! Thanks for the review! On 05.12.2021 18:44, Oleg Babin wrote: > Thanks for your patch. Looks reasonable. > > However, I have one question: before this patch, if some connection was down, we performed wait_connected, and in some cases it could lead to a successful reconnection. > > Currently we just skip broken connections and don't try to reconnect. > > Could this be a problem, or do we perform reconnection in another place? > > What will happen if all connections are down? All connections use the reconnect_after option in netbox, so netbox handles the reconnect itself. If all connections are down, netbox_wait_connected won't help either: you will need several master discovery iterations anyway until a master is found. Also, before this patch netbox_wait_connected() couldn't speed up a successful reconnection anyway. It was just waiting; it didn't affect the connection state. After this patch the waiting is dropped, and the disconnected nodes will be tried again on the next master discovery iteration. ^ permalink raw reply [flat|nested] 10+ messages in thread
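[For context, the option mentioned above is part of the standard Tarantool net.box API; a minimal sketch follows, where the URI and credentials are placeholders:]

local netbox = require('net.box')

-- With reconnect_after set, net.box retries the connection in the
-- background every 0.5 seconds after a failure, so callers never have to
-- wait for the connection explicitly: they can simply check its state.
local conn = netbox.connect('storage:secret@127.0.0.1:3301', {
    reconnect_after = 0.5,
    wait_connected = false,
})
if conn:is_connected() then
    -- The connection is alive, safe to send a request right away.
end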
* Re: [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery 2021-12-06 23:34 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-12-07 6:49 ` Oleg Babin via Tarantool-patches 0 siblings, 0 replies; 10+ messages in thread From: Oleg Babin via Tarantool-patches @ 2021-12-07 6:49 UTC (permalink / raw) To: Vladislav Shpilevoy, tarantool-patches Thanks for the explanation. LGTM. On 07.12.2021 02:34, Vladislav Shpilevoy wrote: > Hi! Thanks for the review! > > On 05.12.2021 18:44, Oleg Babin wrote: >> Thanks for your patch. Looks reasonable. >> >> However, I have one question: before this patch, if some connection was down, we performed wait_connected, and in some cases it could lead to a successful reconnection. >> >> Currently we just skip broken connections and don't try to reconnect. >> >> Could this be a problem, or do we perform reconnection in another place? >> >> What will happen if all connections are down? > All connections use the reconnect_after option in netbox, so netbox handles the reconnect itself. If all connections are down, netbox_wait_connected won't help either: you will need several master discovery iterations anyway until a master is found. > > Also, before this patch netbox_wait_connected() couldn't speed up a successful reconnection anyway. It was just waiting; it didn't affect the connection state. > > After this patch the waiting is dropped, and the disconnected nodes will be tried again on the next master discovery iteration. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Tarantool-patches] [PATCH vshard 2/2] router: don't fallback RO to master right away 2021-12-04 0:19 [Tarantool-patches] [PATCH vshard 0/2] VShard replica backoff, part 1 Vladislav Shpilevoy via Tarantool-patches 2021-12-04 0:19 ` [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery Vladislav Shpilevoy via Tarantool-patches @ 2021-12-04 0:19 ` Vladislav Shpilevoy via Tarantool-patches 2021-12-05 17:53 ` Oleg Babin via Tarantool-patches 2021-12-07 22:38 ` [Tarantool-patches] [PATCH vshard 0/2] VShard replica backoff, part 1 Vladislav Shpilevoy via Tarantool-patches 2 siblings, 1 reply; 10+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-12-04 0:19 UTC (permalink / raw) To: tarantool-patches, olegrok RO requests use the replica with the highest prio as specified in the weights matrix. If the best replica was not available at the moment and failover hadn't happened yet, then RO requests used to fall back to the master, even if there were other RO replicas with better prio. This patch makes an RO call first try the currently selected replica with the highest prio. If it is not available (no other connected replicas at all, or failover didn't happen yet), the call walks the prio list starting from this replica until it finds an available one. If that also fails, the call walks the list from the beginning, hoping that the unavailable replica wasn't the best one and there might be a better option on the other side of the prio list. The patch was done in the scope of the task about replica backoff (#298), because the problem would also exist when the best replica is in backoff, not only when it is disconnected. It would get worse. Closes #288 Needed for #298 --- test/failover/failover.result | 64 +++++++++++++------ test/failover/failover.test.lua | 30 +++++---- test/router/master_discovery.result | 92 +++++++++++++++++++++++---- test/router/master_discovery.test.lua | 38 ++++++++--- vshard/replicaset.lua | 21 ++++-- 5 files changed, 185 insertions(+), 60 deletions(-) diff --git a/test/failover/failover.result b/test/failover/failover.result index bae57fa..31eda0c 100644 --- a/test/failover/failover.result +++ b/test/failover/failover.result @@ -85,8 +85,8 @@ test_run:cmd('switch default') -- available; -- * up nearest replica priority if the best one is available -- again; --- * replicaset uses master connection, if the nearest's one is --- not available before call(); +-- * replicaset uses next prio available connection, if the +-- nearest's one is not available before call(); -- * current nearest connection is not down, when trying to -- connect to the replica with less weight. -- @@ -199,30 +199,17 @@ test_run:cmd('stop server box_1_d') - true ... -- Down_ts must be set in on_disconnect() trigger. -while rs1.replica.down_ts == nil do fiber.sleep(0.1) end +box_1_d = rs1.replicas[names.replica_uuid.box_1_d] --- ... --- Try to execute read-only request - it must use master --- connection, because a replica's one is not available. -vshard.router.call(1, 'read', 'echo', {123}) ---- -- 123 -... -test_run:switch('box_1_a') ---- -- true -... -echo_count ---- -- 2 -... -test_run:switch('router_1') +test_run:wait_cond(function() return box_1_d.down_ts ~= nil end) --- - true ... -- New replica is box_1_b. -while rs1.replica.name ~= 'box_1_b' do fiber.sleep(0.1) end +test_run:wait_cond(function() return rs1.replica.name == 'box_1_b' end) --- +- true ...
-- Ensure the 'read' echo was executed on box_1_b - nearest -- available replica. -echo_count +assert(echo_count == 1) --- -- 1 +- true +... +test_run:switch('router_1') +--- +- true +... +-- +-- Kill the best replica. Don't need to wait for failover to happen for the +-- router to start using the next best available replica. +-- +test_run:cmd('stop server box_1_b') +--- +- true +... +box_1_b = rs1.replicas[names.replica_uuid.box_1_b] +--- +... +test_run:wait_cond(function() return box_1_b.down_ts ~= nil end) +--- +- true +... +vshard.router.callro(1, 'echo', {123}) +--- +- 123 +... +test_run:switch('box_1_c') +--- +- true +... +assert(echo_count == 1) +--- +- true ... test_run:switch('router_1') --- - true ... +test_run:cmd('start server box_1_b') +--- +- true +... -- Revive the best replica. A router must reconnect to it in -- FAILOVER_UP_TIMEOUT seconds. test_run:cmd('start server box_1_d') diff --git a/test/failover/failover.test.lua b/test/failover/failover.test.lua index a969e0e..b713319 100644 --- a/test/failover/failover.test.lua +++ b/test/failover/failover.test.lua @@ -43,8 +43,8 @@ test_run:cmd('switch default') -- available; -- * up nearest replica priority if the best one is available -- again; --- * replicaset uses master connection, if the nearest's one is --- not available before call(); +-- * replicaset uses next prio available connection, if the +-- nearest's one is not available before call(); -- * current nearest connection is not down, when trying to -- connect to the replica with less weight. -- @@ -86,15 +86,10 @@ rs1.replica_up_ts - old_up_ts >= vshard.consts.FAILOVER_UP_TIMEOUT -- box_1_d. test_run:cmd('stop server box_1_d') -- Down_ts must be set in on_disconnect() trigger. -while rs1.replica.down_ts == nil do fiber.sleep(0.1) end --- Try to execute read-only request - it must use master --- connection, because a replica's one is not available. -vshard.router.call(1, 'read', 'echo', {123}) -test_run:switch('box_1_a') -echo_count -test_run:switch('router_1') +box_1_d = rs1.replicas[names.replica_uuid.box_1_d] +test_run:wait_cond(function() return box_1_d.down_ts ~= nil end) -- New replica is box_1_b. -while rs1.replica.name ~= 'box_1_b' do fiber.sleep(0.1) end +test_run:wait_cond(function() return rs1.replica.name == 'box_1_b' end) rs1.replica.down_ts == nil rs1.replica_up_ts ~= nil test_run:grep_log('router_1', 'New replica box_1_b%(storage%@') @@ -103,8 +98,21 @@ vshard.router.callro(1, 'echo', {123}) test_run:cmd('switch box_1_b') -- Ensure the 'read' echo was executed on box_1_b - nearest -- available replica. -echo_count +assert(echo_count == 1) +test_run:switch('router_1') + +-- +-- gh-288: kill the best replica. Don't need to wait for failover to happen for +-- the router to start using the next best available replica. +-- +test_run:cmd('stop server box_1_b') +box_1_b = rs1.replicas[names.replica_uuid.box_1_b] +test_run:wait_cond(function() return box_1_b.down_ts ~= nil end) +vshard.router.callro(1, 'echo', {123}) +test_run:switch('box_1_c') +assert(echo_count == 1) test_run:switch('router_1') +test_run:cmd('start server box_1_b') -- Revive the best replica. A router must reconnect to it in -- FAILOVER_UP_TIMEOUT seconds. diff --git a/test/router/master_discovery.result b/test/router/master_discovery.result index 7ebb67d..94a848b 100644 --- a/test/router/master_discovery.result +++ b/test/router/master_discovery.result @@ -325,6 +325,22 @@ end) start_aggressive_master_search() | --- | ... +test_run:cmd('stop server storage_1_b') + | --- + | - true + | ... 
+rs1 = vshard.router.static.replicasets[util.replicasets[1]] + | --- + | ... +replica = rs1.replicas[util.name_to_uuid.storage_1_b] + | --- + | ... +-- Ensure the replica is not available. Otherwise RO requests sneak into it +-- instead of waiting for master. +test_run:wait_cond(function() return not replica:is_connected() end) + | --- + | - true + | ... forget_masters() | --- @@ -337,8 +353,6 @@ vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout) forget_masters() | --- | ... --- XXX: this should not depend on master so much. RO requests should be able to --- go to replicas. vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) | --- | - 1 @@ -357,8 +371,6 @@ vshard.router.route(1501):callrw('echo', {1}, opts_big_timeout) forget_masters() | --- | ... --- XXX: the same as above - should not really wait for master. Regardless of it --- being auto or not. vshard.router.route(1501):callro('echo', {1}, opts_big_timeout) | --- | - 1 @@ -369,6 +381,10 @@ vshard.router.route(1501):callro('echo', {1}, opts_big_timeout) stop_aggressive_master_search() | --- | ... +test_run:cmd('start server storage_1_b') + | --- + | - true + | ... test_run:switch('storage_1_a') | --- @@ -411,7 +427,7 @@ vshard.storage.cfg(cfg, instance_uuid) | --- | ... --- Try to make RW and RO requests but then turn of the auto search. +-- Try to make an RW request but then turn of the auto search. test_run:switch('router_1') | --- | - true @@ -425,14 +441,6 @@ f1 = fiber.create(function() end) | --- | ... --- XXX: should not really wait for master since this is an RO request. It could --- use a replica. -f2 = fiber.create(function() \ - fiber.self():set_joinable(true) \ - return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \ -end) - | --- - | ... fiber.sleep(0.01) | --- | ... @@ -449,6 +457,31 @@ f1:join() | replicaset_uuid: <replicaset_1> | message: Master is not configured for replicaset <replicaset_1> | ... + +-- Try to make an RO request but then turn of the auto search. +test_run:cmd('stop server storage_1_a') + | --- + | - true + | ... +test_run:cmd('stop server storage_1_b') + | --- + | - true + | ... +forget_masters() + | --- + | ... +f2 = fiber.create(function() \ + fiber.self():set_joinable(true) \ + return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \ +end) + | --- + | ... +fiber.sleep(0.01) + | --- + | ... +disable_auto_masters() + | --- + | ... f2:join() | --- | - true @@ -459,6 +492,14 @@ f2:join() | replicaset_uuid: <replicaset_1> | message: Master is not configured for replicaset <replicaset_1> | ... +test_run:cmd('start server storage_1_a') + | --- + | - true + | ... +test_run:cmd('start server storage_1_b') + | --- + | - true + | ... -- -- Multiple masters logging. @@ -473,6 +514,9 @@ replicas = cfg.sharding[util.replicasets[1]].replicas replicas[util.name_to_uuid.storage_1_a].master = true | --- | ... +replicas[util.name_to_uuid.storage_1_b].master = false + | --- + | ... vshard.storage.cfg(cfg, instance_uuid) | --- | ... @@ -484,6 +528,9 @@ test_run:switch('storage_1_b') replicas = cfg.sharding[util.replicasets[1]].replicas | --- | ... +replicas[util.name_to_uuid.storage_1_a].master = false + | --- + | ... replicas[util.name_to_uuid.storage_1_b].master = true | --- | ... @@ -495,6 +542,25 @@ test_run:switch('router_1') | --- | - true | ... +-- Ensure both replicas are connected. Otherwise the router can go to only one, +-- find it is master, and won't go to the second one until the first one resigns +-- or dies. 
+rs1 = vshard.router.static.replicasets[util.replicasets[1]] + | --- + | ... +replica1 = rs1.replicas[util.name_to_uuid.storage_1_a] + | --- + | ... +replica2 = rs1.replicas[util.name_to_uuid.storage_1_b] + | --- + | ... +test_run:wait_cond(function() \ + return replica1:is_connected() and replica2:is_connected() \ +end) + | --- + | - true + | ... + forget_masters() | --- | ... diff --git a/test/router/master_discovery.test.lua b/test/router/master_discovery.test.lua index 6276dc9..8cacd6d 100644 --- a/test/router/master_discovery.test.lua +++ b/test/router/master_discovery.test.lua @@ -181,24 +181,27 @@ end) -- Call tries to wait for master if has enough time left. -- start_aggressive_master_search() +test_run:cmd('stop server storage_1_b') +rs1 = vshard.router.static.replicasets[util.replicasets[1]] +replica = rs1.replicas[util.name_to_uuid.storage_1_b] +-- Ensure the replica is not available. Otherwise RO requests sneak into it +-- instead of waiting for master. +test_run:wait_cond(function() return not replica:is_connected() end) forget_masters() vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout) forget_masters() --- XXX: this should not depend on master so much. RO requests should be able to --- go to replicas. vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) forget_masters() vshard.router.route(1501):callrw('echo', {1}, opts_big_timeout) forget_masters() --- XXX: the same as above - should not really wait for master. Regardless of it --- being auto or not. vshard.router.route(1501):callro('echo', {1}, opts_big_timeout) stop_aggressive_master_search() +test_run:cmd('start server storage_1_b') test_run:switch('storage_1_a') assert(echo_count == 4) @@ -218,23 +221,30 @@ replicas = cfg.sharding[util.replicasets[1]].replicas replicas[util.name_to_uuid.storage_1_a].master = false vshard.storage.cfg(cfg, instance_uuid) --- Try to make RW and RO requests but then turn of the auto search. +-- Try to make an RW request but then turn of the auto search. test_run:switch('router_1') forget_masters() f1 = fiber.create(function() \ fiber.self():set_joinable(true) \ return vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout) \ end) --- XXX: should not really wait for master since this is an RO request. It could --- use a replica. +fiber.sleep(0.01) +disable_auto_masters() +f1:join() + +-- Try to make an RO request but then turn of the auto search. +test_run:cmd('stop server storage_1_a') +test_run:cmd('stop server storage_1_b') +forget_masters() f2 = fiber.create(function() \ fiber.self():set_joinable(true) \ return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \ end) fiber.sleep(0.01) disable_auto_masters() -f1:join() f2:join() +test_run:cmd('start server storage_1_a') +test_run:cmd('start server storage_1_b') -- -- Multiple masters logging. @@ -242,14 +252,26 @@ f2:join() test_run:switch('storage_1_a') replicas = cfg.sharding[util.replicasets[1]].replicas replicas[util.name_to_uuid.storage_1_a].master = true +replicas[util.name_to_uuid.storage_1_b].master = false vshard.storage.cfg(cfg, instance_uuid) test_run:switch('storage_1_b') replicas = cfg.sharding[util.replicasets[1]].replicas +replicas[util.name_to_uuid.storage_1_a].master = false replicas[util.name_to_uuid.storage_1_b].master = true vshard.storage.cfg(cfg, instance_uuid) test_run:switch('router_1') +-- Ensure both replicas are connected. Otherwise the router can go to only one, +-- find it is master, and won't go to the second one until the first one resigns +-- or dies. 
+rs1 = vshard.router.static.replicasets[util.replicasets[1]] +replica1 = rs1.replicas[util.name_to_uuid.storage_1_a] +replica2 = rs1.replicas[util.name_to_uuid.storage_1_b] +test_run:wait_cond(function() \ + return replica1:is_connected() and replica2:is_connected() \ +end) + forget_masters() start_aggressive_master_search() test_run:wait_log('router_1', 'Found more than one master', nil, 10) diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua index 174c761..93f210e 100644 --- a/vshard/replicaset.lua +++ b/vshard/replicaset.lua @@ -455,18 +455,25 @@ local function replicaset_template_multicallro(prefer_replica, balance) return r end end - elseif prefer_replica then - r = replicaset.replica + else + local start_r = replicaset.replica + r = start_r while r do - if r:is_connected() and r ~= master then + if r:is_connected() and (not prefer_replica or r ~= master) then return r end r = r.next_by_priority end - else - r = replicaset.replica - if r and r:is_connected() then - return r + -- Iteration above could start not from the best prio replica. + -- Check the beginning of the list too. + for _, r in pairs(replicaset.priority_list) do + if r == start_r then + -- Reached already checked part. + break + end + if r:is_connected() and (not prefer_replica or r ~= master) then + return r + end end end end -- 2.24.3 (Apple Git-128) ^ permalink raw reply [flat|nested] 10+ messages in thread
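[To make the new selection order easier to follow, here is a standalone sketch of the two-phase walk the patch adds to replicaset_template_multicallro(). The field names (replica, master, priority_list, next_by_priority) mirror the patch; the function wrapper itself is a simplified assumption for illustration, and ipairs here reflects the follow-up fix discussed in the review below:]

-- Model of the new RO replica selection: first walk from the currently
-- selected replica towards lower priorities, then wrap around and recheck
-- the higher-priority prefix that the first pass skipped.
local function pick_ro_replica(replicaset, prefer_replica)
    local master = replicaset.master
    local start_r = replicaset.replica
    local r = start_r
    -- Phase 1: from the current replica down the priority list.
    while r do
        if r:is_connected() and (not prefer_replica or r ~= master) then
            return r
        end
        r = r.next_by_priority
    end
    -- Phase 2: the walk above may have started in the middle of the list,
    -- so recheck the better-priority part from the beginning.
    for _, r2 in ipairs(replicaset.priority_list) do
        if r2 == start_r then
            -- Reached the already checked part.
            break
        end
        if r2:is_connected() and (not prefer_replica or r2 ~= master) then
            return r2
        end
    end
    -- No connected replica satisfies the constraints.
    return nil
end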
* Re: [Tarantool-patches] [PATCH vshard 2/2] router: don't fallback RO to master right away 2021-12-04 0:19 ` [Tarantool-patches] [PATCH vshard 2/2] router: don't fallback RO to master right away Vladislav Shpilevoy via Tarantool-patches @ 2021-12-05 17:53 ` Oleg Babin via Tarantool-patches 2021-12-06 23:56 ` Vladislav Shpilevoy via Tarantool-patches 0 siblings, 1 reply; 10+ messages in thread From: Oleg Babin via Tarantool-patches @ 2021-12-05 17:53 UTC (permalink / raw) To: Vladislav Shpilevoy, tarantool-patches Thanks for your patch. See one comment below. Also as I see CI is red on your branch. Is it related to your changes? On 04.12.2021 03:19, Vladislav Shpilevoy wrote: > RO requests use the replica with the highest prio as specified in the weights matrix. > > If the best replica was not available at the moment and failover hadn't happened yet, then RO requests used to fall back to the master, even if there were other RO replicas with better prio. > > This patch makes an RO call first try the currently selected replica with the highest prio. If it is not available (no other connected replicas at all, or failover didn't happen yet), the call walks the prio list starting from this replica until it finds an available one. > > If that also fails, the call walks the list from the beginning, hoping that the unavailable replica wasn't the best one and there might be a better option on the other side of the prio list. > > The patch was done in the scope of the task about replica backoff (#298), because the problem would also exist when the best replica is in backoff, not only when it is disconnected. It would get worse. > > Closes #288 > Needed for #298 > --- > test/failover/failover.result | 64 +++++++++++++------ > test/failover/failover.test.lua | 30 +++++---- > test/router/master_discovery.result | 92 +++++++++++++++++++++++---- > test/router/master_discovery.test.lua | 38 ++++++++--- > vshard/replicaset.lua | 21 ++++-- > 5 files changed, 185 insertions(+), 60 deletions(-) > > diff --git a/test/failover/failover.result b/test/failover/failover.result > index bae57fa..31eda0c 100644 > --- a/test/failover/failover.result > +++ b/test/failover/failover.result > @@ -85,8 +85,8 @@ test_run:cmd('switch default') > -- available; > -- * up nearest replica priority if the best one is available > -- again; > --- * replicaset uses master connection, if the nearest's one is > --- not available before call(); > +-- * replicaset uses next prio available connection, if the > +-- nearest's one is not available before call(); > -- * current nearest connection is not down, when trying to > -- connect to the replica with less weight. > -- > @@ -199,30 +199,17 @@ test_run:cmd('stop server box_1_d') > - true > ... > -- Down_ts must be set in on_disconnect() trigger. > -while rs1.replica.down_ts == nil do fiber.sleep(0.1) end > +box_1_d = rs1.replicas[names.replica_uuid.box_1_d] > --- > ... > --- Try to execute read-only request - it must use master > --- connection, because a replica's one is not available. > -vshard.router.call(1, 'read', 'echo', {123}) > ---- > -- 123 > -... > -test_run:switch('box_1_a') > ---- > -- true > -... > -echo_count > ---- > -- 2 > -... > -test_run:switch('router_1') > +test_run:wait_cond(function() return box_1_d.down_ts ~= nil end) > --- > - true > ... > -- New replica is box_1_b. > -while rs1.replica.name ~= 'box_1_b' do fiber.sleep(0.1) end > +test_run:wait_cond(function() return rs1.replica.name == 'box_1_b' end) > --- > +- true > ...
> rs1.replica.down_ts == nil > --- > @@ -247,14 +234,49 @@ test_run:cmd('switch box_1_b') > ... > -- Ensure the 'read' echo was executed on box_1_b - nearest > -- available replica. > -echo_count > +assert(echo_count == 1) > --- > -- 1 > +- true > +... > +test_run:switch('router_1') > +--- > +- true > +... > +-- > +-- Kill the best replica. Don't need to wait for failover to happen for the > +-- router to start using the next best available replica. > +-- > +test_run:cmd('stop server box_1_b') > +--- > +- true > +... > +box_1_b = rs1.replicas[names.replica_uuid.box_1_b] > +--- > +... > +test_run:wait_cond(function() return box_1_b.down_ts ~= nil end) > +--- > +- true > +... > +vshard.router.callro(1, 'echo', {123}) > +--- > +- 123 > +... > +test_run:switch('box_1_c') > +--- > +- true > +... > +assert(echo_count == 1) > +--- > +- true > ... > test_run:switch('router_1') > --- > - true > ... > +test_run:cmd('start server box_1_b') > +--- > +- true > +... > -- Revive the best replica. A router must reconnect to it in > -- FAILOVER_UP_TIMEOUT seconds. > test_run:cmd('start server box_1_d') > diff --git a/test/failover/failover.test.lua b/test/failover/failover.test.lua > index a969e0e..b713319 100644 > --- a/test/failover/failover.test.lua > +++ b/test/failover/failover.test.lua > @@ -43,8 +43,8 @@ test_run:cmd('switch default') > -- available; > -- * up nearest replica priority if the best one is available > -- again; > --- * replicaset uses master connection, if the nearest's one is > --- not available before call(); > +-- * replicaset uses next prio available connection, if the > +-- nearest's one is not available before call(); > -- * current nearest connection is not down, when trying to > -- connect to the replica with less weight. > -- > @@ -86,15 +86,10 @@ rs1.replica_up_ts - old_up_ts >= vshard.consts.FAILOVER_UP_TIMEOUT > -- box_1_d. > test_run:cmd('stop server box_1_d') > -- Down_ts must be set in on_disconnect() trigger. > -while rs1.replica.down_ts == nil do fiber.sleep(0.1) end > --- Try to execute read-only request - it must use master > --- connection, because a replica's one is not available. > -vshard.router.call(1, 'read', 'echo', {123}) > -test_run:switch('box_1_a') > -echo_count > -test_run:switch('router_1') > +box_1_d = rs1.replicas[names.replica_uuid.box_1_d] > +test_run:wait_cond(function() return box_1_d.down_ts ~= nil end) > -- New replica is box_1_b. > -while rs1.replica.name ~= 'box_1_b' do fiber.sleep(0.1) end > +test_run:wait_cond(function() return rs1.replica.name == 'box_1_b' end) > rs1.replica.down_ts == nil > rs1.replica_up_ts ~= nil > test_run:grep_log('router_1', 'New replica box_1_b%(storage%@') > @@ -103,8 +98,21 @@ vshard.router.callro(1, 'echo', {123}) > test_run:cmd('switch box_1_b') > -- Ensure the 'read' echo was executed on box_1_b - nearest > -- available replica. > -echo_count > +assert(echo_count == 1) > +test_run:switch('router_1') > + > +-- > +-- gh-288: kill the best replica. Don't need to wait for failover to happen for > +-- the router to start using the next best available replica. > +-- > +test_run:cmd('stop server box_1_b') > +box_1_b = rs1.replicas[names.replica_uuid.box_1_b] > +test_run:wait_cond(function() return box_1_b.down_ts ~= nil end) > +vshard.router.callro(1, 'echo', {123}) > +test_run:switch('box_1_c') > +assert(echo_count == 1) > test_run:switch('router_1') > +test_run:cmd('start server box_1_b') > > -- Revive the best replica. A router must reconnect to it in > -- FAILOVER_UP_TIMEOUT seconds. 
> diff --git a/test/router/master_discovery.result b/test/router/master_discovery.result > index 7ebb67d..94a848b 100644 > --- a/test/router/master_discovery.result > +++ b/test/router/master_discovery.result > @@ -325,6 +325,22 @@ end) > start_aggressive_master_search() > | --- > | ... > +test_run:cmd('stop server storage_1_b') > + | --- > + | - true > + | ... > +rs1 = vshard.router.static.replicasets[util.replicasets[1]] > + | --- > + | ... > +replica = rs1.replicas[util.name_to_uuid.storage_1_b] > + | --- > + | ... > +-- Ensure the replica is not available. Otherwise RO requests sneak into it > +-- instead of waiting for master. > +test_run:wait_cond(function() return not replica:is_connected() end) > + | --- > + | - true > + | ... > > forget_masters() > | --- > @@ -337,8 +353,6 @@ vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout) > forget_masters() > | --- > | ... > --- XXX: this should not depend on master so much. RO requests should be able to > --- go to replicas. > vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) > | --- > | - 1 > @@ -357,8 +371,6 @@ vshard.router.route(1501):callrw('echo', {1}, opts_big_timeout) > forget_masters() > | --- > | ... > --- XXX: the same as above - should not really wait for master. Regardless of it > --- being auto or not. > vshard.router.route(1501):callro('echo', {1}, opts_big_timeout) > | --- > | - 1 > @@ -369,6 +381,10 @@ vshard.router.route(1501):callro('echo', {1}, opts_big_timeout) > stop_aggressive_master_search() > | --- > | ... > +test_run:cmd('start server storage_1_b') > + | --- > + | - true > + | ... > > test_run:switch('storage_1_a') > | --- > @@ -411,7 +427,7 @@ vshard.storage.cfg(cfg, instance_uuid) > | --- > | ... > > --- Try to make RW and RO requests but then turn of the auto search. > +-- Try to make an RW request but then turn of the auto search. > test_run:switch('router_1') > | --- > | - true > @@ -425,14 +441,6 @@ f1 = fiber.create(function() > end) > | --- > | ... > --- XXX: should not really wait for master since this is an RO request. It could > --- use a replica. > -f2 = fiber.create(function() \ > - fiber.self():set_joinable(true) \ > - return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \ > -end) > - | --- > - | ... > fiber.sleep(0.01) > | --- > | ... > @@ -449,6 +457,31 @@ f1:join() > | replicaset_uuid: <replicaset_1> > | message: Master is not configured for replicaset <replicaset_1> > | ... > + > +-- Try to make an RO request but then turn of the auto search. > +test_run:cmd('stop server storage_1_a') > + | --- > + | - true > + | ... > +test_run:cmd('stop server storage_1_b') > + | --- > + | - true > + | ... > +forget_masters() > + | --- > + | ... > +f2 = fiber.create(function() \ > + fiber.self():set_joinable(true) \ > + return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \ > +end) > + | --- > + | ... > +fiber.sleep(0.01) > + | --- > + | ... > +disable_auto_masters() > + | --- > + | ... > f2:join() > | --- > | - true > @@ -459,6 +492,14 @@ f2:join() > | replicaset_uuid: <replicaset_1> > | message: Master is not configured for replicaset <replicaset_1> > | ... > +test_run:cmd('start server storage_1_a') > + | --- > + | - true > + | ... > +test_run:cmd('start server storage_1_b') > + | --- > + | - true > + | ... > > -- > -- Multiple masters logging. > @@ -473,6 +514,9 @@ replicas = cfg.sharding[util.replicasets[1]].replicas > replicas[util.name_to_uuid.storage_1_a].master = true > | --- > | ... > +replicas[util.name_to_uuid.storage_1_b].master = false > + | --- > + | ... 
> vshard.storage.cfg(cfg, instance_uuid) > | --- > | ... > @@ -484,6 +528,9 @@ test_run:switch('storage_1_b') > replicas = cfg.sharding[util.replicasets[1]].replicas > | --- > | ... > +replicas[util.name_to_uuid.storage_1_a].master = false > + | --- > + | ... > replicas[util.name_to_uuid.storage_1_b].master = true > | --- > | ... > @@ -495,6 +542,25 @@ test_run:switch('router_1') > | --- > | - true > | ... > +-- Ensure both replicas are connected. Otherwise the router can go to only one, > +-- find it is master, and won't go to the second one until the first one resigns > +-- or dies. > +rs1 = vshard.router.static.replicasets[util.replicasets[1]] > + | --- > + | ... > +replica1 = rs1.replicas[util.name_to_uuid.storage_1_a] > + | --- > + | ... > +replica2 = rs1.replicas[util.name_to_uuid.storage_1_b] > + | --- > + | ... > +test_run:wait_cond(function() \ > + return replica1:is_connected() and replica2:is_connected() \ > +end) > + | --- > + | - true > + | ... > + > forget_masters() > | --- > | ... > diff --git a/test/router/master_discovery.test.lua b/test/router/master_discovery.test.lua > index 6276dc9..8cacd6d 100644 > --- a/test/router/master_discovery.test.lua > +++ b/test/router/master_discovery.test.lua > @@ -181,24 +181,27 @@ end) > -- Call tries to wait for master if has enough time left. > -- > start_aggressive_master_search() > +test_run:cmd('stop server storage_1_b') > +rs1 = vshard.router.static.replicasets[util.replicasets[1]] > +replica = rs1.replicas[util.name_to_uuid.storage_1_b] > +-- Ensure the replica is not available. Otherwise RO requests sneak into it > +-- instead of waiting for master. > +test_run:wait_cond(function() return not replica:is_connected() end) > > forget_masters() > vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout) > > forget_masters() > --- XXX: this should not depend on master so much. RO requests should be able to > --- go to replicas. > vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) > > forget_masters() > vshard.router.route(1501):callrw('echo', {1}, opts_big_timeout) > > forget_masters() > --- XXX: the same as above - should not really wait for master. Regardless of it > --- being auto or not. > vshard.router.route(1501):callro('echo', {1}, opts_big_timeout) > > stop_aggressive_master_search() > +test_run:cmd('start server storage_1_b') > > test_run:switch('storage_1_a') > assert(echo_count == 4) > @@ -218,23 +221,30 @@ replicas = cfg.sharding[util.replicasets[1]].replicas > replicas[util.name_to_uuid.storage_1_a].master = false > vshard.storage.cfg(cfg, instance_uuid) > > --- Try to make RW and RO requests but then turn of the auto search. > +-- Try to make an RW request but then turn of the auto search. > test_run:switch('router_1') > forget_masters() > f1 = fiber.create(function() \ > fiber.self():set_joinable(true) \ > return vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout) \ > end) > --- XXX: should not really wait for master since this is an RO request. It could > --- use a replica. > +fiber.sleep(0.01) > +disable_auto_masters() > +f1:join() > + > +-- Try to make an RO request but then turn of the auto search. 
> +test_run:cmd('stop server storage_1_a') > +test_run:cmd('stop server storage_1_b') > +forget_masters() > f2 = fiber.create(function() \ > fiber.self():set_joinable(true) \ > return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \ > end) > fiber.sleep(0.01) > disable_auto_masters() > -f1:join() > f2:join() > +test_run:cmd('start server storage_1_a') > +test_run:cmd('start server storage_1_b') > > -- > -- Multiple masters logging. > @@ -242,14 +252,26 @@ f2:join() > test_run:switch('storage_1_a') > replicas = cfg.sharding[util.replicasets[1]].replicas > replicas[util.name_to_uuid.storage_1_a].master = true > +replicas[util.name_to_uuid.storage_1_b].master = false > vshard.storage.cfg(cfg, instance_uuid) > > test_run:switch('storage_1_b') > replicas = cfg.sharding[util.replicasets[1]].replicas > +replicas[util.name_to_uuid.storage_1_a].master = false > replicas[util.name_to_uuid.storage_1_b].master = true > vshard.storage.cfg(cfg, instance_uuid) > > test_run:switch('router_1') > +-- Ensure both replicas are connected. Otherwise the router can go to only one, > +-- find it is master, and won't go to the second one until the first one resigns > +-- or dies. > +rs1 = vshard.router.static.replicasets[util.replicasets[1]] > +replica1 = rs1.replicas[util.name_to_uuid.storage_1_a] > +replica2 = rs1.replicas[util.name_to_uuid.storage_1_b] > +test_run:wait_cond(function() \ > + return replica1:is_connected() and replica2:is_connected() \ > +end) > + > forget_masters() > start_aggressive_master_search() > test_run:wait_log('router_1', 'Found more than one master', nil, 10) > diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua > index 174c761..93f210e 100644 > --- a/vshard/replicaset.lua > +++ b/vshard/replicaset.lua > @@ -455,18 +455,25 @@ local function replicaset_template_multicallro(prefer_replica, balance) > return r > end > end > - elseif prefer_replica then > - r = replicaset.replica > + else > + local start_r = replicaset.replica > + r = start_r > while r do > - if r:is_connected() and r ~= master then > + if r:is_connected() and (not prefer_replica or r ~= master) then > return r > end > r = r.next_by_priority > end > - else > - r = replicaset.replica > - if r and r:is_connected() then > - return r > + -- Iteration above could start not from the best prio replica. > + -- Check the beginning of the list too. > + for _, r in pairs(replicaset.priority_list) do pairs can't be jitted (at least now until https://github.com/tarantool/tarantool/issues/6475). Also pairs don't give any guarantees about iteration order. Could it be replaced with ipairs? > + if r == start_r then > + -- Reached already checked part. > + break > + end > + if r:is_connected() and (not prefer_replica or r ~= master) then > + return r > + end > end > end > end ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Tarantool-patches] [PATCH vshard 2/2] router: don't fallback RO to master right away 2021-12-05 17:53 ` Oleg Babin via Tarantool-patches @ 2021-12-06 23:56 ` Vladislav Shpilevoy via Tarantool-patches 2021-12-07 6:49 ` Oleg Babin via Tarantool-patches 0 siblings, 1 reply; 10+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-12-06 23:56 UTC (permalink / raw) To: Oleg Babin, tarantool-patches Thanks for the review! On 05.12.2021 18:53, Oleg Babin wrote: > Thanks for your patch. See one comment below. > > Also as I see CI is red on your branch. Is it related to your changes? Yes, I didn't update .result file in one of the tests. Done now. >> diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua >> index 174c761..93f210e 100644 >> --- a/vshard/replicaset.lua >> +++ b/vshard/replicaset.lua >> @@ -455,18 +455,25 @@ local function replicaset_template_multicallro(prefer_replica, balance) >> return r >> end >> end >> - elseif prefer_replica then >> - r = replicaset.replica >> + else >> + local start_r = replicaset.replica >> + r = start_r >> while r do >> - if r:is_connected() and r ~= master then >> + if r:is_connected() and (not prefer_replica or r ~= master) then >> return r >> end >> r = r.next_by_priority >> end >> - else >> - r = replicaset.replica >> - if r and r:is_connected() then >> - return r >> + -- Iteration above could start not from the best prio replica. >> + -- Check the beginning of the list too. >> + for _, r in pairs(replicaset.priority_list) do > > pairs can't be jitted (at least now until https://github.com/tarantool/tarantool/issues/6475). > > Also pairs don't give any guarantees about iteration order. Could it be replaced with ipairs? It can be. Replaced with ipairs on the branch. ^ permalink raw reply [flat|nested] 10+ messages in thread
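[The pairs/ipairs point above in a nutshell, as a generic Lua/LuaJIT note rather than anything vshard-specific:]

local list = {'replica_a', 'replica_b', 'replica_c'}

-- ipairs() iterates 1..#list in a guaranteed order, and LuaJIT can
-- compile such loops.
for i, replica in ipairs(list) do
    print(i, replica)
end

-- pairs() guarantees no particular iteration order in general, and a
-- pairs()/next() loop is not compiled by LuaJIT (the trace aborts and the
-- loop runs in the interpreter), which is why ipairs is preferable for an
-- array-like table such as priority_list.
for k, replica in pairs(list) do
    print(k, replica)
end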
* Re: [Tarantool-patches] [PATCH vshard 2/2] router: don't fallback RO to master right away 2021-12-06 23:56 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-12-07 6:49 ` Oleg Babin via Tarantool-patches 0 siblings, 0 replies; 10+ messages in thread From: Oleg Babin via Tarantool-patches @ 2021-12-07 6:49 UTC (permalink / raw) To: Vladislav Shpilevoy, tarantool-patches Thanks for your fixes. LGTM. On 07.12.2021 02:56, Vladislav Shpilevoy wrote: > Thanks for the review! > > On 05.12.2021 18:53, Oleg Babin wrote: >> Thanks for your patch. See one comment below. >> >> Also as I see CI is red on your branch. Is it related to your changes? > Yes, I didn't update .result file in one of the tests. Done now. > >>> diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua >>> index 174c761..93f210e 100644 >>> --- a/vshard/replicaset.lua >>> +++ b/vshard/replicaset.lua >>> @@ -455,18 +455,25 @@ local function replicaset_template_multicallro(prefer_replica, balance) >>> return r >>> end >>> end >>> - elseif prefer_replica then >>> - r = replicaset.replica >>> + else >>> + local start_r = replicaset.replica >>> + r = start_r >>> while r do >>> - if r:is_connected() and r ~= master then >>> + if r:is_connected() and (not prefer_replica or r ~= master) then >>> return r >>> end >>> r = r.next_by_priority >>> end >>> - else >>> - r = replicaset.replica >>> - if r and r:is_connected() then >>> - return r >>> + -- Iteration above could start not from the best prio replica. >>> + -- Check the beginning of the list too. >>> + for _, r in pairs(replicaset.priority_list) do >> pairs can't be jitted (at least now until https://github.com/tarantool/tarantool/issues/6475). >> >> Also pairs don't give any guarantees about iteration order. Could it be replaced with ipairs? > It can be. Replaced with ipairs on the branch. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Tarantool-patches] [PATCH vshard 0/2] VShard replica backoff, part 1 2021-12-04 0:19 [Tarantool-patches] [PATCH vshard 0/2] VShard replica backoff, part 1 Vladislav Shpilevoy via Tarantool-patches 2021-12-04 0:19 ` [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery Vladislav Shpilevoy via Tarantool-patches 2021-12-04 0:19 ` [Tarantool-patches] [PATCH vshard 2/2] router: don't fallback RO to master right away Vladislav Shpilevoy via Tarantool-patches @ 2021-12-07 22:38 ` Vladislav Shpilevoy via Tarantool-patches 2 siblings, 0 replies; 10+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-12-07 22:38 UTC (permalink / raw) To: tarantool-patches, olegrok Pushed to master. ^ permalink raw reply [flat|nested] 10+ messages in thread