From: Oleg Babin via Tarantool-patches
Reply-To: Oleg Babin
To: Vladislav Shpilevoy, tarantool-patches@dev.tarantool.org
Date: Sun, 5 Dec 2021 20:44:26 +0300
Subject: Re: [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery
Message-ID: <61db4417-1dc2-a31a-e3e7-4e4d4d10c8ee@tarantool.org>
In-Reply-To: <3274e75ae8fc8a996645991aa2f663a72319aa1f.1638577114.git.v.shpilevoy@tarantool.org>
List-Id: Tarantool development patches

Thanks for your patch. Looks reasonable. However, I have one question.

Before this patch, if some connection was down, we performed
wait_connected, and in some cases that could lead to a successful
reconnection. Now we just skip broken connections and don't try to
reconnect. Could that be a problem, or do we perform reconnection in
another place? What will happen if all connections are down?

On 04.12.2021 03:19, Vladislav Shpilevoy wrote:
> Master discovery tried to wait for connection establishment for
> the discovery timeout for each instance in the replicaset.
>
> The problem is that if one of replicas is dead, the discovery will
> waste its entire timeout on just this waiting. For all the
> requests sent to connected replicas after this one it will have 0
> timeout and won't properly wait for their results.
>
> For example, this is how master discovery could work:
>
>     send requests:
>         replica1 wait_connected + send,
>         replica2 wait_connected fails on timeout
>         replica3 wait_connected works if was connected + send
>
>     collect responses:
>         replica1 wait_result(0 timeout)
>         replica2 skip
>         replica3 wait_result(0 timeout)
>
> The entire timeout was wasted on 'replica2 wait_connected' during
> request sending. Replica1 result could be delivered fine because
> it was in progress while replica2 was waiting. So having 0 timeout
> in it is not a problem. It had time to be executed. But replica3's
> request has very few chances to be delivered in time. It was just
> sent and is collected almost immediately.
>
> The worst case is when the first replica is dead. Then it is very
> likely neither of requests will be delivered. Due to all result
> wait timeouts being 0.
>
> Although there is a certain chance that the next requests will be
> extra quick, so writing a stable test for that does not seem
> possible.
>
> The bug was discovered while working on #288. For its testing it
> was needed to stop one instance and master_discovery test started
> failing.
> ---
>  vshard/replicaset.lua | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
> index 55028bd..174c761 100644
> --- a/vshard/replicaset.lua
> +++ b/vshard/replicaset.lua
> @@ -682,11 +682,7 @@ local function replicaset_locate_master(replicaset)
>      local replicaset_uuid = replicaset.uuid
>      for replica_uuid, replica in pairs(replicaset.replicas) do
>          local conn = replica.conn
> -        timeout, err = netbox_wait_connected(conn, timeout)
> -        if not timeout then
> -            last_err = err
> -            timeout = deadline - fiber_clock()
> -        else
> +        if conn:is_connected() then
>              ok, f = pcall(conn.call, conn, func, args, async_opts)
>              if not ok then
>                  last_err = lerror.make(f)
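
To make the timeout accounting from the commit message easier to follow,
here is a rough, self-contained model of the two phases (send requests,
then collect responses) in plain Lua. It is only a sketch and not vshard
code: the simulate() function, the connected and latency fields, and the
simulated clock are all hypothetical, and it assumes that waiting on a
dead replica always consumes the whole remaining timeout.

```lua
-- Hypothetical model of master discovery timing. Nothing here is vshard API.
-- replicas: array of { connected = bool, latency = seconds until a reply }.
-- timeout: total discovery timeout in seconds.
-- wait_for_connect: true models the old code (block on a dead replica),
--                   false models the patched code (skip it at once).
local function simulate(replicas, timeout, wait_for_connect)
    local now = 0                  -- simulated clock, in seconds
    local deadline = now + timeout
    local arrives_at = {}          -- per replica: when its reply arrives

    -- Phase 1: send requests.
    for i, r in ipairs(replicas) do
        if r.connected then
            arrives_at[i] = now + r.latency
        elseif wait_for_connect then
            -- Assumption: the wait fails only when the deadline is hit.
            now = math.max(now, deadline)
        end
    end

    -- Phase 2: collect responses with whatever time is left.
    local delivered = {}
    for i = 1, #replicas do
        local arrival = arrives_at[i]
        if arrival == nil then
            delivered[i] = false   -- request was never sent
        else
            -- With no time left this degenerates into a 0-timeout wait.
            local wait_until = math.max(now, deadline)
            delivered[i] = arrival <= wait_until
            now = math.max(now, math.min(arrival, wait_until))
        end
    end
    return delivered
end

-- The scenario from the commit message: replica2 is dead.
local replicas = {
    { connected = true, latency = 0.01 },
    { connected = false },
    { connected = true, latency = 0.01 },
}
local old = simulate(replicas, 0.5, true)
local new = simulate(replicas, 0.5, false)
print('old:', old[1], old[2], old[3])   -- old: true false false
print('new:', new[1], new[2], new[3])   -- new: true false true
```

In this model the old behaviour loses replica3's reply because the request
is sent only once the deadline has already passed, while skipping the
disconnected replica leaves the whole timeout for collecting results. The
model says nothing about whether or where reconnection happens, which is
the question raised above.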