From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 8C4B323D2B for ; Mon, 25 Jun 2018 08:48:27 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Oo1vLrG-t1xN for ; Mon, 25 Jun 2018 08:48:27 -0400 (EDT) Received: from smtp52.i.mail.ru (smtp52.i.mail.ru [94.100.177.112]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 4A39E23D18 for ; Mon, 25 Jun 2018 08:48:27 -0400 (EDT) Subject: [tarantool-patches] Re: [PATCH 2/2] Fix discovery/reconfigure race References: From: Alex Khatskevich Message-ID: Date: Mon, 25 Jun 2018 15:48:24 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Transfer-Encoding: 8bit Content-Language: en-US Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: Vladislav Shpilevoy , tarantool-patches@freelists.org >> +-- >> +-- Check route_map is not filled with old replica objects after >> +-- recpnfigure. > > 1. Typo. fixed > >> >> +-- Simulate long `callro`. >> +-- Stuck on first rs in replicasets. >> +vshard.router.internal.errinj.LONG_DISCOVERY = true; > > 2. I do not see this error injection in the M.errinj > declaration. Fixed +Renamed to ERRINJ_LONG_DISCOVERY > >> +--- >> +... >> +for _, __ in pairs(vshard.router.internal.replicasets) do >> +    vshard.router.discovery_wakeup() >> +    fiber.sleep(0.02) >> +end; > > 3. This cycle makes no sense. With set LONG_DISCOVERY it is > equivalent to calling router.discovery_wakeup() once. Disagree. I need to stuck on the first routemap element. However I need to wakeup it #replicasets times, because the discovery_fiber sleeps after each iteration. > >> +--- >> +... >> +vshard.router.cfg(cfg); >> +--- >> +... >> +vshard.router.internal.errinj.LONG_DISCOVERY = nil; >> +--- >> +... >> +-- Do discovery iteration. >> +vshard.router.discovery_wakeup() >> +fiber.sleep(0.02) > > 4. Concrete timeouts are the way to create an unstable test. > Please, get rid of them and replace with 'while not cond do wait end' > where necessary. It is important that discovery fiber did only one iteration of discovery. So, If I want some `while` mechanism I want to introduce several synchronization and block/unblock phases. I get you point and suggest compromise here. 1. I have places an assert here, which checks that expected discovery has occurred. 2. I have launched the test 100 times and it did not fail > >>   - true >> diff --git a/vshard/router/init.lua b/vshard/router/init.lua >> index 7e765fa..df5b343 100644 >> --- a/vshard/router/init.lua >> +++ b/vshard/router/init.lua >> @@ -127,10 +127,23 @@ local function discovery_f(module_version) >> +            -- Renew replicasets object in case of reconfigure >> +            -- and reload events. > > 5. You do not renew here anything. > >> +            if M.replicasets ~= old_replicasets then >> +                break >> +            end I have rewritten the comment. The replicasets variable captured by the for loop should be updated by calling `break` and exiting the for loop.