From: Serge Petrenko via Tarantool-patches
To: Vladislav Shpilevoy, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Date: Tue, 5 Oct 2021 16:09:40 +0300
Subject: Re: [Tarantool-patches] [PATCH 2/2] replication: fix replica
 disconnect upon reconfiguration
05.10.2021 00:04, Vladislav Shpilevoy wrote:
> Hi! Thanks for working on this!
>
>>>> diff --git a/src/box/box.cc b/src/box/box.cc
>>>> index 219ffa38d..89cda5599 100644
>>>> --- a/src/box/box.cc
>>>> +++ b/src/box/box.cc
>>>> @@ -1261,7 +1261,9 @@ box_sync_replication(bool connect_quorum)
>>>>  		applier_delete(appliers[i]); /* doesn't affect diag */
>>>>  	});
>>>>
>>>> -	replicaset_connect(appliers, count, connect_quorum);
>>>> +	bool connect_quorum = strict;
>>>> +	bool keep_connect = !strict;
>>>> +	replicaset_connect(appliers, count, connect_quorum, keep_connect);
>>> 1. How about passing both these parameters explicitly to
>>> box_sync_replication? I don't understand the link between them that
>>> lets them be one.
>>>
>>> It seems the only case when you need to drop the old connections is
>>> when you turn anon to normal. Why should they be fully reset
>>> otherwise?
>> Yes, that's true: anon to normal is the only place where existing
>> connections should be reset.
>>
>> For both bootstrap and local recovery (the first ever box.cfg)
>> keep_connect doesn't make sense at all, because there are no previous
>> connections to keep.
>>
>> So the only two (out of 5) box_sync_replication() calls that need
>> keep_connect are replication reconfiguration (keep_connect = true) and
>> anon replica reconfiguration (keep_connect = false).
>>
>> Speaking of the relation between keep_connect and connect_quorum:
>> we don't care about keep_connect in 3 calls (bootstrap and recovery),
>> and when keep_connect is important, it's equal to !connect_quorum.
>> I thought it might be nice to replace them with a single parameter.
>>
>> I tried to pass both parameters to box_sync_replication() at first.
>> This looked rather ugly IMO: box_sync_replication(true, false),
>> box_sync_replication(false, true). Two boolean parameters that are
>> responsible for God knows what are worse than one parameter.
>>
>> I'm not 100% happy with my solution, but it at least hides the second
>> parameter. And IMO box_sync_replication(strict) is rather easy to
>> understand: when strict = true, you want to connect to quorum, and
>> you want to reset the connections. And vice versa when strict = false.
> This can be resolved with a couple of wrappers, like in this diff:
>
> ====================
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 89cda5599..c1216172d 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1249,7 +1249,7 @@ cfg_get_replication(int *p_count)
>   * don't start appliers.
>   */
>  static void
> -box_sync_replication(bool strict)
> +box_sync_replication(bool do_quorum, bool do_reuse)
>  {
>  	int count = 0;
>  	struct applier **appliers = cfg_get_replication(&count);
> @@ -1260,14 +1260,27 @@ box_sync_replication(bool strict)
>  	for (int i = 0; i < count; i++)
>  		applier_delete(appliers[i]); /* doesn't affect diag */
>  	});
> -
> -	bool connect_quorum = strict;
> -	bool keep_connect = !strict;
> -	replicaset_connect(appliers, count, connect_quorum, keep_connect);
> +	replicaset_connect(appliers, count, do_quorum, do_reuse);
>
>  	guard.is_active = false;
>  }
>
> +static inline void
> +box_reset_replication(void)
> +{
> +	const bool do_quorum = true;
> +	const bool do_reuse = false;
> +	box_sync_replication(do_quorum, do_reuse);
> +}
> +
> +static inline void
> +box_update_replication(void)
> +{
> +	const bool do_quorum = false;
> +	const bool do_reuse = true;
> +	box_sync_replication(do_quorum, do_reuse);
> +}
> +
>  void
>  box_set_replication(void)
>  {
> @@ -1286,7 +1299,7 @@ box_set_replication(void)
>  	 * Stay in orphan mode in case we fail to connect to at least
>  	 * 'replication_connect_quorum' remote instances.
>  	 */
> -	box_sync_replication(false);
> +	box_update_replication();
>  	/* Follow replica */
>  	replicaset_follow();
>  	/* Wait until appliers are in sync */
> @@ -1406,7 +1419,7 @@ box_set_replication_anon(void)
>  	 * them can register and others resend a
>  	 * non-anonymous subscribe.
>  	 */
> -	box_sync_replication(true);
> +	box_reset_replication();
>  	/*
>  	 * Wait until the master has registered this
>  	 * instance.
> @@ -3260,7 +3273,7 @@ bootstrap(const struct tt_uuid *instance_uuid,
>  	 * with connecting to 'replication_connect_quorum' masters.
>  	 * If this also fails, throw an error.
>  	 */
> -	box_sync_replication(true);
> +	box_update_replication();
>
>  	struct replica *master = replicaset_find_join_master();
>  	assert(master == NULL || master->applier != NULL);
> @@ -3337,7 +3350,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
>  	if (wal_dir_lock >= 0) {
>  		if (box_listen() != 0)
>  			diag_raise();
> -		box_sync_replication(false);
> +		box_update_replication();
>
>  		struct replica *master;
>  		if (replicaset_needs_rejoin(&master)) {
> @@ -3416,7 +3429,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
>  		vclock_copy(&replicaset.vclock, &recovery->vclock);
>  		if (box_listen() != 0)
>  			diag_raise();
> -		box_sync_replication(false);
> +		box_update_replication();
>  	}
>  	stream_guard.is_active = false;
>  	recovery_finalize(recovery);
> ====================
>
> Feel free to discard it if you don't like it. I am fine with the
> current solution too.
>
> Now that I've sent this diff, I realized box_restart_replication()
> would be a better name than reset. Up to you as well.

Your version looks better, thanks! Applied, with box_reset_replication()
renamed to box_restart_replication(). I also replaced
box_update_replication() with box_restart_replication() in bootstrap().

>> diff --git a/test/instance_files/base_instance.lua b/test/instance_files/base_instance.lua
>> index 45bdbc7e8..e579c3843 100755
>> --- a/test/instance_files/base_instance.lua
>> +++ b/test/instance_files/base_instance.lua
>> @@ -5,7 +5,8 @@ local listen = os.getenv('TARANTOOL_LISTEN')
>>  box.cfg({
>>      work_dir = workdir,
>>      -- listen = 'localhost:3310'
>> -    listen = listen
>> +    listen = listen,
>> +    log = workdir..'/tarantool.log',
> Do you really need it in this patch?

Yep, I need it for grep_log. Looks like luatest doesn't set the log
file to anything by default.
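For reference, this is roughly how the test consumes the log — a minimal
sketch, assuming luatest's Server class and its grep_log() helper; the
group name, command path and grepped message below are illustrative, not
the actual test:

==================================
-- Minimal sketch: grep_log() scans the file configured via
-- box.cfg{log = ...} on the instance, so base_instance.lua must set
-- an explicit log file, or there is nothing to grep.
-- Assumptions: luatest's Server class with a grep_log() helper;
-- the alias, command path and grepped message are illustrative.
local t = require('luatest')
local Server = require('luatest.server')

local g = t.group('log-grep-sketch')

g.before_all(function()
    g.server = Server:new({
        alias = 'base_instance',
        command = 'test/instance_files/base_instance.lua',
    })
    g.server:start()
end)

g.after_all(function()
    g.server:stop()
end)

g.test_log_is_greppable = function()
    -- 'entering the event loop' is printed by every started instance;
    -- the assertion only works because box.cfg{log = ...} gave
    -- grep_log() a file to read.
    t.assert(g.server:grep_log('entering the event loop'))
end
==================================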
> Other than that LGTM. You can send the next version to the next
> reviewer. I suppose it can be Yan now.

Here's the full diff:

==================================
diff --git a/src/box/box.cc b/src/box/box.cc
index 89cda5599..cc4ada47e 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1249,7 +1249,7 @@ cfg_get_replication(int *p_count)
  * don't start appliers.
  */
 static void
-box_sync_replication(bool strict)
+box_sync_replication(bool do_quorum, bool do_reuse)
 {
 	int count = 0;
 	struct applier **appliers = cfg_get_replication(&count);
@@ -1260,14 +1260,27 @@ box_sync_replication(bool strict)
 	for (int i = 0; i < count; i++)
 		applier_delete(appliers[i]); /* doesn't affect diag */
 	});
-
-	bool connect_quorum = strict;
-	bool keep_connect = !strict;
-	replicaset_connect(appliers, count, connect_quorum, keep_connect);
+	replicaset_connect(appliers, count, do_quorum, do_reuse);

 	guard.is_active = false;
 }

+static inline void
+box_restart_replication(void)
+{
+	const bool do_quorum = true;
+	const bool do_reuse = false;
+	box_sync_replication(do_quorum, do_reuse);
+}
+
+static inline void
+box_update_replication(void)
+{
+	const bool do_quorum = false;
+	const bool do_reuse = true;
+	box_sync_replication(do_quorum, do_reuse);
+}
+
 void
 box_set_replication(void)
 {
@@ -1286,7 +1299,7 @@ box_set_replication(void)
 	 * Stay in orphan mode in case we fail to connect to at least
 	 * 'replication_connect_quorum' remote instances.
 	 */
-	box_sync_replication(false);
+	box_update_replication();
 	/* Follow replica */
 	replicaset_follow();
 	/* Wait until appliers are in sync */
@@ -1406,7 +1419,7 @@ box_set_replication_anon(void)
 	 * them can register and others resend a
 	 * non-anonymous subscribe.
 	 */
-	box_sync_replication(true);
+	box_restart_replication();
 	/*
 	 * Wait until the master has registered this
 	 * instance.
@@ -3260,7 +3273,7 @@ bootstrap(const struct tt_uuid *instance_uuid,
 	 * with connecting to 'replication_connect_quorum' masters.
 	 * If this also fails, throw an error.
 	 */
-	box_sync_replication(true);
+	box_restart_replication();

 	struct replica *master = replicaset_find_join_master();
 	assert(master == NULL || master->applier != NULL);
@@ -3337,7 +3350,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
 	if (wal_dir_lock >= 0) {
 		if (box_listen() != 0)
 			diag_raise();
-		box_sync_replication(false);
+		box_update_replication();

 		struct replica *master;
 		if (replicaset_needs_rejoin(&master)) {
@@ -3416,7 +3429,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
 		vclock_copy(&replicaset.vclock, &recovery->vclock);
 		if (box_listen() != 0)
 			diag_raise();
-		box_sync_replication(false);
+		box_update_replication();
 	}
 	stream_guard.is_active = false;
 	recovery_finalize(recovery);
diff --git a/test/replication-luatest/gh-4669-applier-reconnect_test.lua b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
similarity index 100%
rename from test/replication-luatest/gh-4669-applier-reconnect_test.lua
rename to test/replication-luatest/gh_4669_applier_reconnect_test.lua
==================================

-- 
Serge Petrenko