From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 950116FC87; Fri, 1 Oct 2021 01:15:56 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 950116FC87 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1633040156; bh=DAsgleWLQ6R04jUbpKvIBh37Znr2VZty852HZ7eKIGE=; h=To:Cc:References:Date:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=IBiaG5C9/+tWSB77kaBd20PdKdKWVSx4WCiLkWjydaBI0RZjZ0lhVwOVHQ9ESR5zs kUo1QM3gnbbC/XVEgKcgxmdxpnD3EyUpguXHmIyN0M2ylNUsGw8zB+yU6V1vQjix3v G63DQGXijxKnPB2mwbXjf88XqtBOomW3npEuoVrE= Received: from smtpng3.i.mail.ru (smtpng3.i.mail.ru [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id E9D4E6FC87 for ; Fri, 1 Oct 2021 01:15:50 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org E9D4E6FC87 Received: by smtpng3.m.smailru.net with esmtpa (envelope-from ) id 1mW4LF-0001IJ-ED; Fri, 01 Oct 2021 01:15:50 +0300 To: Serge Petrenko , gorcunov@gmail.com Cc: tarantool-patches@dev.tarantool.org References: <203dd4c5c23e717861a4952510882904323e10a0.1633028320.git.sergepetrenko@tarantool.org> Message-ID: <71d4862c-768a-3768-a36b-03487cfb5115@tarantool.org> Date: Fri, 1 Oct 2021 00:15:48 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 MIME-Version: 1.0 In-Reply-To: <203dd4c5c23e717861a4952510882904323e10a0.1633028320.git.sergepetrenko@tarantool.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD96A58C36AA2E99649C8C87BF8DECC9FC9CB42A33403F58D3C182A05F5380850405BF789EF60B5430764EB64C62F0D33BE56A247D1D954D666B1EFFBCBF8E49B2A X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE74323F140F3EE5B6AC2099A533E45F2D0395957E7521B51C2CFCAF695D4D8E9FCEA1F7E6F0F101C6778DA827A17800CE7AE28AD3C7270E4528F08D7030A58E5AD1A62830130A00468AEEEE3FBA3A834EE7353EFBB55337566BBB17C150BCA67933E23F5F24CA067D5C9D0F0DA03912313A471835C12D1D9774AD6D5ED66289B5278DA827A17800CE70F3DDF2BBF19B93A9FA2833FD35BB23D2EF20D2F80756B5F868A13BD56FB6657A471835C12D1D977725E5C173C3A84C3BCA4DA3BE1BC1572CC7F00164DA146DA6F5DAA56C3B73B237318B6A418E8EAB86D1867E19FE14079C09775C1D3CA48CF3D321E7403792E342EB15956EA79C166A417C69337E82CC275ECD9A6C639B01B78DA827A17800CE74ABCC139FF3F849B731C566533BA786AA5CC5B56E945C8DA X-B7AD71C0: AC4F5C86D027EB782CDD5689AFBDA7A213B5FB47DCBC3458834459D11680B5059C4011E44820E2896436C7671BC3A517 X-C1DE0DAB: 0D63561A33F958A51CBCDD31ADD1B0E4B1330DEF85C9E6537BB08989DD32835AD59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA757AF5085B7B0228D6410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D343FB425EC7F4D4A4B7CD0D6E0D0807FBAD70A1A3397D8435FDBEF8C75EFE6282A7C3BE92124578D231D7E09C32AA3244C96569FD015A9F1183978FB673A54F118725D5B54B2FE4575FACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojJNmX3owDPmGLhbT0m+Pgpg== X-Mailru-Sender: 689FA8AB762F7393C37E3C1AEC41BA5D239B616645192E3BB3D2E11C01DE698D3841015FED1DE5223CC9A89AB576DD93FB559BB5D741EB963CF37A108A312F5C27E8A8C3839CE0E267EA787935ED9F1B X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH 2/2] replication: fix replica disconnect upon reconfiguration X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Vladislav Shpilevoy via Tarantool-patches Reply-To: Vladislav Shpilevoy Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Hi! Thanks for the patch! See 6 comments below. > diff --git a/src/box/box.cc b/src/box/box.cc > index 219ffa38d..89cda5599 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc> @@ -1261,7 +1261,9 @@ box_sync_replication(bool connect_quorum) > applier_delete(appliers[i]); /* doesn't affect diag */ > }); > > - replicaset_connect(appliers, count, connect_quorum); > + bool connect_quorum = strict; > + bool keep_connect = !strict; > + replicaset_connect(appliers, count, connect_quorum, keep_connect); 1. How about passing both these parameters explicitly to box_sync_replication? I don't understand the link between them so that they could be one. It seems the only case when you need to drop the old connections is when you turn anon to normal. Why should they be fully reset otherwise? > diff --git a/src/box/replication.cc b/src/box/replication.cc > index 1288bc9b1..e5fce6c8c 100644 > --- a/src/box/replication.cc > +++ b/src/box/replication.cc > @@ -664,11 +681,11 @@ applier_on_connect_f(struct trigger *trigger, void *event) > > void > replicaset_connect(struct applier **appliers, int count, > - bool connect_quorum) > + bool connect_quorum, bool keep_connect) > { > if (count == 0) { > /* Cleanup the replica set. */ > - replicaset_update(appliers, count); > + replicaset_update(appliers, count, keep_connect); 2. In case of count 0 it means all the appliers must be terminated, mustn't they? So you could pass always false here. Up to you. > return; > } 3. A few lines below I see that all the appliers are started via applier_start and gather connections. Even if you have them already. Wasn't the point of this patchset not to create connections when you already have them? You could find matches by URI even before you try to create a connection. Otherwise when I do this: box.cfg{replication = {3313, 3314}} box.cfg{replication = {3313, 3314, 3315}} you create 3 connections in the second box.cfg. While you could create just 1. > diff --git a/src/box/replication.h b/src/box/replication.h > index 4c82cd839..a8fed45e8 100644 > --- a/src/box/replication.h > +++ b/src/box/replication.h > @@ -439,10 +439,12 @@ replicaset_add_anon(const struct tt_uuid *replica_uuid); > * \param connect_quorum if this flag is set, fail unless at > * least replication_connect_quorum > * appliers have successfully connected. > + * \param keep_connect if this flag is set do not force a reconnect if the > + * old connection to the replica is fine. 4. Why do you need to touch it even if the replica is not fine? Shouldn't it reconnect automatically anyway? You could maybe force it to reconnect right now if you don't want to wait until its reconnect timeout expires. > */ > void > replicaset_connect(struct applier **appliers, int count, > - bool connect_quorum); > + bool connect_quorum, bool keep_connect); > > diff --git a/test/replication-luatest/gh_4669_applier_reconnect_test.lua b/test/replication-luatest/gh_4669_applier_reconnect_test.lua > new file mode 100644 > index 000000000..62adff716 > --- /dev/null > +++ b/test/replication-luatest/gh_4669_applier_reconnect_test.lua > @@ -0,0 +1,42 @@ > +local t = require('luatest') > +local fio = require('fio') > +local Server = t.Server > +local Cluster = require('test.luatest_helpers.cluster') 5. Are we using first capital letters for variable names now? Maybe stick to the guidelines and use lower case letters? > + > +local g = t.group('gh-4669-applier-reconnect') > + > +local function check_follow_master(server) > + return t.assert_equals( > + server:eval('return box.info.replication[1].upstream.status'), 'follow') > +end > + > +g.before_each(function() > + g.cluster = Cluster:new({}) > + g.master = g.cluster:build_server( > + {}, {alias = 'master'}, 'base_instance.lua') > + g.replica = g.cluster:build_server( > + {args={'master'}}, {alias = 'replica'}, 'replica.lua') > + > + g.cluster:join_server(g.master) > + g.cluster:join_server(g.replica) > + g.cluster:start() > + check_follow_master(g.replica) > +end) > + > +g.after_each(function() > + g.cluster:stop() > +end) > + > +-- Test that appliers aren't recreated upon replication reconfiguration. > +g.test_applier_connection_on_reconfig = function(g) > + g.replica:eval( > + 'box.cfg{'.. > + 'replication = {'.. > + 'os.getenv("TARANTOOL_LISTEN"),'.. > + 'box.cfg.replication[1],'.. > + '}'.. > + '}' > + ) 6. Are we really supposed to write Lua code as Lua strings now? You could use here [[ ... ]] btw, but the question still remains. Not a comment for your patch. But it looks unusable. What if the test would be a bit more complicated? Look at qsync tests for instance. Imagine writing all of them as Lua strings with quotes and all. Especially if you want to pass into the strings some parameters not known in advance. Could you maybe make it on top of master as a normal .test.lua test? > + check_follow_master(g.replica) > + t.assert_equals(g.master:grep_log("exiting the relay loop"), nil) > +end >