From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: tarantool-patches@dev.tarantool.org, olegrok@tarantool.org Subject: [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery Date: Sat, 4 Dec 2021 01:19:37 +0100 [thread overview] Message-ID: <3274e75ae8fc8a996645991aa2f663a72319aa1f.1638577114.git.v.shpilevoy@tarantool.org> (raw) In-Reply-To: <cover.1638577114.git.v.shpilevoy@tarantool.org> Master discovery tried to wait for connection establishment for the discovery timeout for each instance in the replicaset. The problem is that if one of replicas is dead, the discovery will waste its entire timeout on just this waiting. For all the requests sent to connected replicas after this one it will have 0 timeout and won't properly wait for their results. For example, this is how master discovery could work: send requests: replica1 wait_connected + send, replica2 wait_connected fails on timeout replica3 wait_connected works if was connected + send collect responses: replica1 wait_result(0 timeout) replica2 skip replica3 wait_result(0 timeout) The entire timeout was wasted on 'replica2 wait_connected' during request sending. Replica1 result could be delivered fine because it was in progress while replica2 was waiting. So having 0 timeout in it is not a problem. It had time to be executed. But replica3's request has very few chances to be delivered in time. It was just sent and is collected almost immediately. The worst case is when the first replica is dead. Then it is very likely neither of requests will be delivered. Due to all result wait timeouts being 0. Although there is a certain chance that the next requests will be extra quick, so writing a stable test for that does not seem possible. The bug was discovered while working on #288. For its testing it was needed to stop one instance and master_discovery test started failing. --- vshard/replicaset.lua | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua index 55028bd..174c761 100644 --- a/vshard/replicaset.lua +++ b/vshard/replicaset.lua @@ -682,11 +682,7 @@ local function replicaset_locate_master(replicaset) local replicaset_uuid = replicaset.uuid for replica_uuid, replica in pairs(replicaset.replicas) do local conn = replica.conn - timeout, err = netbox_wait_connected(conn, timeout) - if not timeout then - last_err = err - timeout = deadline - fiber_clock() - else + if conn:is_connected() then ok, f = pcall(conn.call, conn, func, args, async_opts) if not ok then last_err = lerror.make(f) -- 2.24.3 (Apple Git-128)
next prev parent reply other threads:[~2021-12-04 0:20 UTC|newest] Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-12-04 0:19 [Tarantool-patches] [PATCH vshard 0/2] VShard replica backoff, part 1 Vladislav Shpilevoy via Tarantool-patches 2021-12-04 0:19 ` Vladislav Shpilevoy via Tarantool-patches [this message] 2021-12-05 17:44 ` [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery Oleg Babin via Tarantool-patches 2021-12-06 23:34 ` Vladislav Shpilevoy via Tarantool-patches 2021-12-07 6:49 ` Oleg Babin via Tarantool-patches 2021-12-04 0:19 ` [Tarantool-patches] [PATCH vshard 2/2] router: don't fallback RO to master right away Vladislav Shpilevoy via Tarantool-patches 2021-12-05 17:53 ` Oleg Babin via Tarantool-patches 2021-12-06 23:56 ` Vladislav Shpilevoy via Tarantool-patches 2021-12-07 6:49 ` Oleg Babin via Tarantool-patches 2021-12-07 22:38 ` [Tarantool-patches] [PATCH vshard 0/2] VShard replica backoff, part 1 Vladislav Shpilevoy via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=3274e75ae8fc8a996645991aa2f663a72319aa1f.1638577114.git.v.shpilevoy@tarantool.org \ --to=tarantool-patches@dev.tarantool.org \ --cc=olegrok@tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox