Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: tarantool-patches@dev.tarantool.org, olegrok@tarantool.org
Subject: [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery
Date: Sat,  4 Dec 2021 01:19:37 +0100	[thread overview]
Message-ID: <3274e75ae8fc8a996645991aa2f663a72319aa1f.1638577114.git.v.shpilevoy@tarantool.org> (raw)
In-Reply-To: <cover.1638577114.git.v.shpilevoy@tarantool.org>

Master discovery tried to wait for connection establishment for
the discovery timeout for each instance in the replicaset.

The problem is that if one of replicas is dead, the discovery will
waste its entire timeout on just this waiting. For all the
requests sent to connected replicas after this one it will have 0
timeout and won't properly wait for their results.

For example, this is how master discovery could work:

    send requests:
        replica1 wait_connected + send,
        replica2 wait_connected fails on timeout
        replica3 wait_connected works if was connected + send

    collect responses:
        replica1 wait_result(0 timeout)
        replica2 skip
        replica3 wait_result(0 timeout)

The entire timeout was wasted on 'replica2 wait_connected' during
request sending. Replica1 result could be delivered fine because
it was in progress while replica2 was waiting. So having 0 timeout
in it is not a problem. It had time to be executed. But replica3's
request has very few chances to be delivered in time. It was just
sent and is collected almost immediately.

The worst case is when the first replica is dead. Then it is very
likely neither of requests will be delivered. Due to all result
wait timeouts being 0.

Although there is a certain chance that the next requests will be
extra quick, so writing a stable test for that does not seem
possible.

The bug was discovered while working on #288. For its testing it
was needed to stop one instance and master_discovery test started
failing.
---
 vshard/replicaset.lua | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
index 55028bd..174c761 100644
--- a/vshard/replicaset.lua
+++ b/vshard/replicaset.lua
@@ -682,11 +682,7 @@ local function replicaset_locate_master(replicaset)
     local replicaset_uuid = replicaset.uuid
     for replica_uuid, replica in pairs(replicaset.replicas) do
         local conn = replica.conn
-        timeout, err = netbox_wait_connected(conn, timeout)
-        if not timeout then
-            last_err = err
-            timeout = deadline - fiber_clock()
-        else
+        if conn:is_connected() then
             ok, f = pcall(conn.call, conn, func, args, async_opts)
             if not ok then
                 last_err = lerror.make(f)
-- 
2.24.3 (Apple Git-128)


  reply	other threads:[~2021-12-04  0:20 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-04  0:19 [Tarantool-patches] [PATCH vshard 0/2] VShard replica backoff, part 1 Vladislav Shpilevoy via Tarantool-patches
2021-12-04  0:19 ` Vladislav Shpilevoy via Tarantool-patches [this message]
2021-12-05 17:44   ` [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery Oleg Babin via Tarantool-patches
2021-12-06 23:34     ` Vladislav Shpilevoy via Tarantool-patches
2021-12-07  6:49       ` Oleg Babin via Tarantool-patches
2021-12-04  0:19 ` [Tarantool-patches] [PATCH vshard 2/2] router: don't fallback RO to master right away Vladislav Shpilevoy via Tarantool-patches
2021-12-05 17:53   ` Oleg Babin via Tarantool-patches
2021-12-06 23:56     ` Vladislav Shpilevoy via Tarantool-patches
2021-12-07  6:49       ` Oleg Babin via Tarantool-patches
2021-12-07 22:38 ` [Tarantool-patches] [PATCH vshard 0/2] VShard replica backoff, part 1 Vladislav Shpilevoy via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3274e75ae8fc8a996645991aa2f663a72319aa1f.1638577114.git.v.shpilevoy@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=olegrok@tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox