From mboxrd@z Thu Jan 1 00:00:00 1970
To: tarantool-patches@dev.tarantool.org, olegrok@tarantool.org
Date: Sat, 4 Dec 2021 01:19:37 +0100
Message-Id: <3274e75ae8fc8a996645991aa2f663a72319aa1f.1638577114.git.v.shpilevoy@tarantool.org>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH vshard 1/2] router: drop wait_connected from master discovery
From: Vladislav Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy

For each instance in the replicaset, master discovery waited for the
connection to be established, spending the shared discovery timeout on
it. The problem is that if one of the replicas is dead, discovery
wastes its entire timeout on this single wait. All requests sent to
connected replicas after that one then get a 0 timeout, and their
results are not properly waited for.

For example, this is how master discovery could work:

    send requests:
        replica1 wait_connected + send
        replica2 wait_connected fails on timeout
        replica3 wait_connected works if already connected + send

    collect responses:
        replica1 wait_result(0 timeout)
        replica2 skip
        replica3 wait_result(0 timeout)

The entire timeout was wasted on 'replica2 wait_connected' during
request sending. Replica1's result could still be delivered, because
its request was in progress while replica2 was being waited for. The
0 timeout is not a problem for it: the request had time to execute.
But replica3's request has very little chance of being delivered in
time. It was only just sent and is collected almost immediately.

The worst case is when the first replica is dead. Then it is very
likely that none of the requests will be delivered, because all
result wait timeouts are 0. However, there is some chance that the
subsequent requests are fast enough, so writing a stable test for
this does not seem possible.

The patch drops wait_connected from master discovery: replicas that
are not connected are now skipped instead of being waited for.

The bug was discovered while working on #288. Testing it required
stopping one instance, and the master_discovery test started failing.
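The timeline above can be reproduced with a small simulation. This is
a sketch in plain Lua, not vshard code. `collect`, `discover_old` and
`discover_new` are hypothetical helpers, and time is a simulated
number rather than `fiber.clock()`. Each replica is assumed to be
either already connected or dead, and a live replica is assumed to
answer exactly `latency` seconds after its request is sent.
`discover_old` shares one deadline between waiting for connections
and collecting results. `discover_new` skips replicas that are not
connected, as the patched `if conn:is_connected() then` branch does.

```lua
-- Simulation of the master discovery timeout budget (plain Lua, not vshard).
-- Assumptions: `now` is a simulated clock, each replica is either already
-- connected (`alive = true`) or dead, and a live replica answers exactly
-- `latency` seconds after its request is sent.

-- Wait for the sent requests in order, each with the remaining budget.
-- Returns the names of replicas whose responses arrived by the deadline.
local function collect(sent, now, deadline)
    local delivered = {}
    for _, req in ipairs(sent) do
        local wait = math.max(0, deadline - now)
        if req.ready_at <= now + wait then
            table.insert(delivered, req.name)
            -- The wait ends as soon as the response is there.
            now = math.max(now, req.ready_at)
        else
            -- The wait times out at the deadline.
            now = now + wait
        end
    end
    return delivered
end

-- Old behaviour: wait_connected for every replica, sharing one deadline.
local function discover_old(replicas, timeout)
    local now, sent = 0, {}
    local deadline = now + timeout
    for _, r in ipairs(replicas) do
        if r.alive then
            -- Already connected: the wait returns at once and the request is sent.
            table.insert(sent, {name = r.name, ready_at = now + r.latency})
        else
            -- Dead replica: the wait consumes the whole remaining budget.
            now = deadline
        end
    end
    return collect(sent, now, deadline)
end

-- New behaviour: skip replicas that are not connected, without waiting.
local function discover_new(replicas, timeout)
    local now, sent = 0, {}
    local deadline = now + timeout
    for _, r in ipairs(replicas) do
        if r.alive then
            table.insert(sent, {name = r.name, ready_at = now + r.latency})
        end
    end
    return collect(sent, now, deadline)
end

local replicas = {
    {name = 'replica1', alive = true, latency = 0.1},
    {name = 'replica2', alive = false},
    {name = 'replica3', alive = true, latency = 0.1},
}
print(table.concat(discover_old(replicas, 1), ', '))  -- replica1
print(table.concat(discover_new(replicas, 1), ', '))  -- replica1, replica3
```

Putting the dead replica first (`{replicas[2], replicas[1],
replicas[3]}`) makes `discover_old` return no responses at all, which
is the worst case described above. In the same ordering,
`discover_new` still returns both live replicas. The model has no
state for a replica that is alive but still connecting.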
---
 vshard/replicaset.lua | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
index 55028bd..174c761 100644
--- a/vshard/replicaset.lua
+++ b/vshard/replicaset.lua
@@ -682,11 +682,7 @@ local function replicaset_locate_master(replicaset)
     local replicaset_uuid = replicaset.uuid
     for replica_uuid, replica in pairs(replicaset.replicas) do
         local conn = replica.conn
-        timeout, err = netbox_wait_connected(conn, timeout)
-        if not timeout then
-            last_err = err
-            timeout = deadline - fiber_clock()
-        else
+        if conn:is_connected() then
             ok, f = pcall(conn.call, conn, func, args, async_opts)
             if not ok then
                 last_err = lerror.make(f)
-- 
2.24.3 (Apple Git-128)