From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 4 Oct 2021 23:04:10 +0200
To: Serge Petrenko, gorcunov@gmail.com
References: <203dd4c5c23e717861a4952510882904323e10a0.1633028320.git.sergepetrenko@tarantool.org> <71d4862c-768a-3768-a36b-03487cfb5115@tarantool.org> <91aca699-9fbc-845f-0cc8-d4d2b09b8902@tarantool.org>
In-Reply-To: <91aca699-9fbc-845f-0cc8-d4d2b09b8902@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH 2/2] replication: fix replica disconnect upon reconfiguration
List-Id: Tarantool development patches
From: Vladislav
Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org

Hi! Thanks for working on this!

>>> diff --git a/src/box/box.cc b/src/box/box.cc
>>> index 219ffa38d..89cda5599 100644
>>> --- a/src/box/box.cc
>>> +++ b/src/box/box.cc
>>> @@ -1261,7 +1261,9 @@ box_sync_replication(bool connect_quorum)
>>>               applier_delete(appliers[i]); /* doesn't affect diag */
>>>       });
>>>
>>> -    replicaset_connect(appliers, count, connect_quorum);
>>> +    bool connect_quorum = strict;
>>> +    bool keep_connect = !strict;
>>> +    replicaset_connect(appliers, count, connect_quorum, keep_connect);
>>
>> 1. How about passing both these parameters explicitly to
>> box_sync_replication? I don't understand the link between them so that
>> they could be one.
>>
>> It seems the only case when you need to drop the old connections is when
>> you turn anon to normal. Why should they be fully reset otherwise?
>
> Yes, it's true. anon to normal is the only place where existing
> connections should be reset.
>
> For both bootstrap and local recovery (first ever box.cfg) keep_connect
> doesn't make sense at all, because there are no previous connections to
> keep.
>
> So the only two (out of 5) box_sync_replication calls that need
> keep_connect are replication reconfiguration (keep_connect = true) and
> anon replica reconfiguration (keep_connect = false).
>
> Speaking of the relation between keep_connect and connect_quorum:
> we don't care about keep_connect in 3 calls (bootstrap and recovery),
> and when keep_connect is important, it's equal to !connect_quorum.
> I thought it might be nice to replace them with a single parameter.
>
> I tried to pass both parameters to box_sync_replication() at first.
> This looked rather ugly IMO:
> box_sync_replication(true, false), box_sync_replication(false, true);
> Two boolean parameters which are responsible for God knows what are
> worse than one parameter.
>
> I'm not 100% happy with my solution, but it at least hides the second
> parameter. And IMO box_sync_replication(strict) is rather easy to
> understand: when strict = true, you want to connect to quorum, and
> you want to reset the connections. And vice versa when strict = false.

This can be resolved with a couple of wrappers, like in this diff:

====================
diff --git a/src/box/box.cc b/src/box/box.cc
index 89cda5599..c1216172d 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1249,7 +1249,7 @@ cfg_get_replication(int *p_count)
  * don't start appliers.
  */
 static void
-box_sync_replication(bool strict)
+box_sync_replication(bool do_quorum, bool do_reuse)
 {
 	int count = 0;
 	struct applier **appliers = cfg_get_replication(&count);
@@ -1260,14 +1260,27 @@ box_sync_replication(bool strict)
 		for (int i = 0; i < count; i++)
 			applier_delete(appliers[i]); /* doesn't affect diag */
 	});
-
-	bool connect_quorum = strict;
-	bool keep_connect = !strict;
-	replicaset_connect(appliers, count, connect_quorum, keep_connect);
+	replicaset_connect(appliers, count, do_quorum, do_reuse);
 	guard.is_active = false;
 }
 
+static inline void
+box_reset_replication(void)
+{
+	const bool do_quorum = true;
+	const bool do_reuse = false;
+	box_sync_replication(do_quorum, do_reuse);
+}
+
+static inline void
+box_update_replication(void)
+{
+	const bool do_quorum = false;
+	const bool do_reuse = true;
+	box_sync_replication(do_quorum, do_reuse);
+}
+
 void
 box_set_replication(void)
 {
@@ -1286,7 +1299,7 @@ box_set_replication(void)
 	 * Stay in orphan mode in case we fail to connect to at least
 	 * 'replication_connect_quorum' remote instances.
 	 */
-	box_sync_replication(false);
+	box_update_replication();
 	/* Follow replica */
 	replicaset_follow();
 	/* Wait until appliers are in sync */
@@ -1406,7 +1419,7 @@ box_set_replication_anon(void)
 	 * them can register and others resend a
 	 * non-anonymous subscribe.
 	 */
-	box_sync_replication(true);
+	box_reset_replication();
 	/*
 	 * Wait until the master has registered this
 	 * instance.
@@ -3260,7 +3273,7 @@ bootstrap(const struct tt_uuid *instance_uuid,
 	 * with connecting to 'replication_connect_quorum' masters.
 	 * If this also fails, throw an error.
 	 */
-	box_sync_replication(true);
+	box_update_replication();
 
 	struct replica *master = replicaset_find_join_master();
 	assert(master == NULL || master->applier != NULL);
@@ -3337,7 +3350,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
 	if (wal_dir_lock >= 0) {
 		if (box_listen() != 0)
 			diag_raise();
-		box_sync_replication(false);
+		box_update_replication();
 
 		struct replica *master;
 		if (replicaset_needs_rejoin(&master)) {
@@ -3416,7 +3429,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
 		vclock_copy(&replicaset.vclock, &recovery->vclock);
 		if (box_listen() != 0)
 			diag_raise();
-		box_sync_replication(false);
+		box_update_replication();
 	}
 	stream_guard.is_active = false;
 	recovery_finalize(recovery);
====================

Feel free to discard it if you don't like it. I am fine with the current
solution too.

Now that I have sent this diff, I realize box_restart_replication() would
be a better name than reset. Up to you as well.

> diff --git a/test/instance_files/base_instance.lua b/test/instance_files/base_instance.lua
> index 45bdbc7e8..e579c3843 100755
> --- a/test/instance_files/base_instance.lua
> +++ b/test/instance_files/base_instance.lua
> @@ -5,7 +5,8 @@ local listen = os.getenv('TARANTOOL_LISTEN')
> box.cfg({
>     work_dir = workdir,
>     -- listen = 'localhost:3310'
> -    listen = listen
> +    listen = listen,
> +    log = workdir..'/tarantool.log',

Do you really need it in this patch?

Other than that LGTM.
You can send the next version to the next reviewer. I suppose it can be Yan now.