From mboxrd@z Thu Jan 1 00:00:00 1970
Message-Id: <31EB9C74-5BAD-4D52-A5BB-EE347C4104A5@tarantool.org>
Date: Wed, 13 Oct 2021 19:00:29 +0300
In-Reply-To: <8e453790-534e-be84-b4b2-fd30c9740eb7@tarantool.org>
References: <5f68cfcac8c44346a690e0f48ad0da1d7646ac83.1633439400.git.sergepetrenko@tarantool.org>
 <648688AC-03B8-48B3-B644-8F3B9A9A9212@tarantool.org>
 <8e453790-534e-be84-b4b2-fd30c9740eb7@tarantool.org>
To: Serge Petrenko
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v2 2/2] replication: fix replica disconnect upon reconfiguration
From: sergos via Tarantool-patches
Reply-To: sergos
List-Id: Tarantool development patches

LGTM.

Sergos

> On 12 Oct 2021, at 17:14, Serge Petrenko <sergepetrenko@tarantool.org> wrote:
> 
> 
> 
> 12.10.2021 13:14, sergos wrote:
>> Hi!
>> 
>> Thanks for the patch!
>> 
>> Just a request for some addition to the test, otherwise LGTM.
>> 
>> Sergos
>> 
>> 
>>> On 5 Oct 2021, at 16:18, Serge Petrenko <sergepetrenko@tarantool.org> wrote:
>>> 
>>> Replication reconfiguration used to work as follows: upon receiving a
>>> new config disconnect from all the existing masters and try to connect
>>> to all the masters from the new config.
>>> 
>>> This led to instances doing extra work when the old config and the new config
>>> had the same nodes in them: instead of doing nothing, tarantool
>>> reinitialized the connection.
>>> 
>>> There was another problem: when an existing connection is broken, master
>>> takes some time to notice that. So, when replica resets the connection,
>>> it may try to reconnect faster than the master is able to notice its
>>> absence. In this case replica wouldn't be able to reconnect due to
>>> `duplicate connection with the same replica UUID`. So replication would
>>> stop for a replication_timeout, which may be quite large (seconds or
>>> tens of seconds).
>>> 
>>> Let's prevent tarantool from reconnecting to the same instance, if there
>>> already is a working connection.
>>> 
>>> Closes #4669
>>> ---
>>>  .../gh-4669-applier-reconfig-reconnect.md     |  5 ++
>>>  src/box/box.cc                                | 31 ++++++---
>>>  src/box/replication.cc                        | 41 ++++++++----
>>>  src/box/replication.h                         |  4 +-
>>>  test/instance_files/base_instance.lua         |  3 +-
>>>  test/instance_files/replica.lua               | 17 +++++
>>>  test/luatest_helpers/server.lua               | 65 +++++++++++++++++++
>>>  .../gh_4669_applier_reconnect_test.lua        | 40 ++++++++++++
>>>  8 files changed, 184 insertions(+), 22 deletions(-)
>>>  create mode 100644 changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md
>>>  create mode 100755 test/instance_files/replica.lua
>>>  create mode 100644 test/replication-luatest/gh_4669_applier_reconnect_test.lua
>>> 
>>> diff --git a/changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md b/changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md
>>> new file mode 100644
>>> index 000000000..2a776872b
>>> --- /dev/null
>>> +++ b/changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md
>>> @@ -0,0 +1,5 @@
>>> +## bugfix/replication
>>> +
>>> +* Fixed replica reconnecting to a living master on any `box.cfg{replication=...}`
>>> +  change. Such reconnects could lead to replica failing to restore connection
>>> +  for `replication_timeout` seconds (gh-4669).
>>> diff --git a/src/box/box.cc b/src/box/box.cc
>>> index 219ffa38d..cc4ada47e 100644
>>> --- a/src/box/box.cc
>>> +++ b/src/box/box.cc
>>> @@ -1249,7 +1249,7 @@ cfg_get_replication(int *p_count)
>>>   * don't start appliers.
>>>   */
>>>  static void
>>> -box_sync_replication(bool connect_quorum)
>>> +box_sync_replication(bool do_quorum, bool do_reuse)
>>>  {
>>>  	int count = 0;
>>>  	struct applier **appliers = cfg_get_replication(&count);
>>> @@ -1260,12 +1260,27 @@ box_sync_replication(bool connect_quorum)
>>>  		for (int i = 0; i < count; i++)
>>>  			applier_delete(appliers[i]); /* doesn't affect diag */
>>>  	});
>>> -
>>> -	replicaset_connect(appliers, count, connect_quorum);
>>> +	replicaset_connect(appliers, count, do_quorum, do_reuse);
>>> 
>>>  	guard.is_active = false;
>>>  }
>>> 
>>> +static inline void
>>> +box_restart_replication(void)
>>> +{
>>> +	const bool do_quorum = true;
>>> +	const bool do_reuse = false;
>>> +	box_sync_replication(do_quorum, do_reuse);
>>> +}
>>> +
>>> +static inline void
>>> +box_update_replication(void)
>>> +{
>>> +	const bool do_quorum = false;
>>> +	const bool do_reuse = true;
>>> +	box_sync_replication(do_quorum, do_reuse);
>>> +}
>>> +
>>>  void
>>>  box_set_replication(void)
>>>  {
>>> @@ -1284,7 +1299,7 @@ box_set_replication(void)
>>>  	 * Stay in orphan mode in case we fail to connect to at least
>>>  	 * 'replication_connect_quorum' remote instances.
>>>  	 */
>>> -	box_sync_replication(false);
>>> +	box_update_replication();
>>>  	/* Follow replica */
>>>  	replicaset_follow();
>>>  	/* Wait until appliers are in sync */
>>> @@ -1404,7 +1419,7 @@ box_set_replication_anon(void)
>>>  	 * them can register and others resend a
>>>  	 * non-anonymous subscribe.
>>>  	 */
>>> -	box_sync_replication(true);
>>> +	box_restart_replication();
>>>  	/*
>>>  	 * Wait until the master has registered this
>>>  	 * instance.
>>> @@ -3258,7 +3273,7 @@ bootstrap(const struct tt_uuid *instance_uuid,
>>>  	 * with connecting to 'replication_connect_quorum' masters.
>>>  	 * If this also fails, throw an error.
>>>  	 */
>>> -	box_sync_replication(true);
>>> +	box_restart_replication();
>>> 
>>>  	struct replica *master = replicaset_find_join_master();
>>>  	assert(master == NULL || master->applier != NULL);
>>> @@ -3335,7 +3350,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
>>>  	if (wal_dir_lock >= 0) {
>>>  		if (box_listen() != 0)
>>>  			diag_raise();
>>> -		box_sync_replication(false);
>>> +		box_update_replication();
>>> 
>>>  		struct replica *master;
>>>  		if (replicaset_needs_rejoin(&master)) {
>>> @@ -3414,7 +3429,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
>>>  		vclock_copy(&replicaset.vclock, &recovery->vclock);
>>>  		if (box_listen() != 0)
>>>  			diag_raise();
>>> -		box_sync_replication(false);
>>> +		box_update_replication();
>>>  	}
>>>  	stream_guard.is_active = false;
>>>  	recovery_finalize(recovery);
>>> diff --git a/src/box/replication.cc b/src/box/replication.cc
>>> index 1288bc9b1..10b4ac915 100644
>>> --- a/src/box/replication.cc
>>> +++ b/src/box/replication.cc
>>> @@ -507,7 +507,7 @@ replica_on_applier_state_f(struct trigger *trigger, void *event)
>>>   * upon reconfiguration of box.cfg.replication.
>>>   */
>>>  static void
>>> -replicaset_update(struct applier **appliers, int count)
>>> +replicaset_update(struct applier **appliers, int count, bool keep_connect)
>>>  {
>>>  	replica_hash_t uniq;
>>>  	memset(&uniq, 0, sizeof(uniq));
>>> @@ -572,22 +572,39 @@ replicaset_update(struct applier **appliers, int count)
>>>  		applier_stop(applier);
>>>  		applier_delete(applier);
>>>  	}
>>> +
>>> +	replicaset.applier.total = count;
>>> +	replicaset.applier.connected = 0;
>>> +	replicaset.applier.loading = 0;
>>> +	replicaset.applier.synced = 0;
>>>  	replicaset_foreach(replica) {
>>>  		if (replica->applier == NULL)
>>>  			continue;
>>> -		applier = replica->applier;
>>> -		replica_clear_applier(replica);
>>> -		replica->applier_sync_state = APPLIER_DISCONNECTED;
>>> +		struct replica *other = replica_hash_search(&uniq, replica);
>>> +		if (keep_connect && other != NULL &&
>>> +		    (replica->applier->state == APPLIER_FOLLOW ||
>>> +		     replica->applier->state == APPLIER_SYNC)) {
>>> +			/*
>>> +			 * Try not to interrupt working appliers upon
>>> +			 * reconfiguration.
>>> +			 */
>>> +			replicaset.applier.connected++;
>>> +			replicaset.applier.synced++;
>>> +			replica_hash_remove(&uniq, other);
>>> +			applier = other->applier;
>>> +			replica_clear_applier(other);
>>> +			replica_delete(other);
>>> +		} else {
>>> +			applier = replica->applier;
>>> +			replica_clear_applier(replica);
>>> +			replica->applier_sync_state = APPLIER_DISCONNECTED;
>>> +		}
>>>  		applier_stop(applier);
>>>  		applier_delete(applier);
>>>  	}
>>> 
>>> -	/* Save new appliers */
>>> -	replicaset.applier.total = count;
>>> -	replicaset.applier.connected = 0;
>>> -	replicaset.applier.loading = 0;
>>> -	replicaset.applier.synced = 0;
>>> 
>>> +	/* Save new appliers */
>>>  	replica_hash_foreach_safe(&uniq, replica, next) {
>>>  		replica_hash_remove(&uniq, replica);
>>> 
>>> @@ -664,11 +681,11 @@ applier_on_connect_f(struct trigger *trigger, void *event)
>>> 
>>>  void
>>>  replicaset_connect(struct applier **appliers, int count,
>>> -		   bool connect_quorum)
>>> +		   bool connect_quorum, bool keep_connect)
>>>  {
>>>  	if (count == 0) {
>>>  		/* Cleanup the replica set. */
>>> -		replicaset_update(appliers, count);
>>> +		replicaset_update(appliers, 0, false);
>>>  		return;
>>>  	}
>>> 
>>> @@ -772,7 +789,7 @@ replicaset_connect(struct applier **appliers, int count,
>>> 
>>>  	/* Now all the appliers are connected, update the replica set. */
>>>  	try {
>>> -		replicaset_update(appliers, count);
>>> +		replicaset_update(appliers, count, keep_connect);
>>>  	} catch (Exception *e) {
>>>  		goto error;
>>>  	}
>>> diff --git a/src/box/replication.h b/src/box/replication.h
>>> index 4c82cd839..a8fed45e8 100644
>>> --- a/src/box/replication.h
>>> +++ b/src/box/replication.h
>>> @@ -439,10 +439,12 @@ replicaset_add_anon(const struct tt_uuid *replica_uuid);
>>>   * \param connect_quorum if this flag is set, fail unless at
>>>   *                       least replication_connect_quorum
>>>   *                       appliers have successfully connected.
>>> + * \param keep_connect   if this flag is set do not force a reconnect if the
>>> + *                       old connection to the replica is fine.
>>>   */
>>>  void
>>>  replicaset_connect(struct applier **appliers, int count,
>>> -		   bool connect_quorum);
>>> +		   bool connect_quorum, bool keep_connect);
>>> 
>>>  /**
>>>   * Check if the current instance fell too much behind its
>>> diff --git a/test/instance_files/base_instance.lua b/test/instance_files/base_instance.lua
>>> index 45bdbc7e8..e579c3843 100755
>>> --- a/test/instance_files/base_instance.lua
>>> +++ b/test/instance_files/base_instance.lua
>>> @@ -5,7 +5,8 @@ local listen = os.getenv('TARANTOOL_LISTEN')
>>>  box.cfg({
>>>      work_dir = workdir,
>>>  --     listen = 'localhost:3310'
>>> -    listen = listen
>>> +    listen = listen,
>>> +    log = workdir..'/tarantool.log',
>>>  })
>>> 
>>>  box.schema.user.grant('guest', 'read,write,execute,create', 'universe')
>>> diff --git a/test/instance_files/replica.lua b/test/instance_files/replica.lua
>>> new file mode 100755
>>> index 000000000..587345f24
>>> --- /dev/null
>>> +++ b/test/instance_files/replica.lua
>>> @@ -0,0 +1,17 @@
>>> +#!/usr/bin/env tarantool
>>> +local workdir = os.getenv('TARANTOOL_WORKDIR')
>>> +local listen = os.getenv('TARANTOOL_LISTEN')
>>> +local SOCKET_DIR = os.getenv('VARDIR')
>>> +local MASTER = arg[1] or "master"
>>> +
>>> +local function master_uri()
>>> +    return SOCKET_DIR..'/'..MASTER..'.sock'
>>> +end
>>> +
>>> +box.cfg({
>>> +    work_dir = workdir,
>>> +    listen = listen,
>>> +    replication = master_uri(),
>>> +})
>>> +
>>> +_G.ready = true
>>> diff --git a/test/luatest_helpers/server.lua b/test/luatest_helpers/server.lua
>>> index 018ea4f7d..0306a24ef 100644
>>> --- a/test/luatest_helpers/server.lua
>>> +++ b/test/luatest_helpers/server.lua
>>> @@ -5,6 +5,7 @@ local fiber = require('fiber')
>>>  local fio = require('fio')
>>>  local fun = require('fun')
>>>  local json = require('json')
>>> +local errno = require('errno')
>>> 
>>>  local checks = require('checks')
>>>  local luatest = require('luatest')
>>> @@ -59,6 +60,70 @@ function Server:initialize()
>>>      getmetatable(getmetatable(self)).initialize(self)
>>>  end
>>> 
>>> +-- A copy of test_run:grep_log.
>>> +function Server:grep_log(what, bytes, opts)
>>> +    local opts = opts or {}
>>> +    local noreset = opts.noreset or false
>>> +    -- if instance has crashed provide filename to use grep_log
>>> +    local filename = opts.filename or self:eval('return box.cfg.log')
>>> +    local file = fio.open(filename, {'O_RDONLY', 'O_NONBLOCK'})
>>> +
>>> +    local function fail(msg)
>>> +        local err = errno.strerror()
>>> +        if file ~= nil then
>>> +            file:close()
>>> +        end
>>> +        error(string.format("%s: %s: %s", msg, filename, err))
>>> +    end
>>> +
>>> +    if file == nil then
>>> +        fail("Failed to open log file")
>>> +    end
>>> +    io.flush() -- attempt to flush stdout == log fd
>>> +    local filesize = file:seek(0, 'SEEK_END')
>>> +    if filesize == nil then
>>> +        fail("Failed to get log file size")
>>> +    end
>>> +    local bytes = bytes or 65536 -- don't read whole log - it can be huge
>>> +    bytes = bytes > filesize and filesize or bytes
>>> +    if file:seek(-bytes, 'SEEK_END') == nil then
>>> +        fail("Failed to seek log file")
>>> +    end
>>> +    local found, buf
>>> +    repeat -- read file in chunks
>>> +        local s = file:read(2048)
>>> +        if s == nil then
>>> +            fail("Failed to read log file")
>>> +        end
>>> +        local pos = 1
>>> +        repeat -- split read string in lines
>>> +            local endpos = string.find(s, '\n', pos)
>>> +            endpos = endpos and endpos - 1 -- strip terminating \n
>>> +            local line = string.sub(s, pos, endpos)
>>> +            if endpos == nil and s ~= '' then
>>> +                -- line doesn't end with \n or eof, append it to buffer
>>> +                -- to be checked on next iteration
>>> +                buf = buf or {}
>>> +                table.insert(buf, line)
>>> +            else
>>> +                if buf ~= nil then -- prepend line with buffered data
>>> +                    table.insert(buf, line)
>>> +                    line = table.concat(buf)
>>> +                    buf = nil
>>> +                end
>>> +                if string.match(line, "Starting instance") and not noreset then
>>> +                    found = nil -- server was restarted, reset search
>>> +                else
>>> +                    found = string.match(line, what) or found
>>> +                end
>>> +            end
>>> +            pos = endpos and endpos + 2 -- jump to char after \n
>>> +        until pos == nil
>>> +    until s == ''
>>> +    file:close()
>>> +    return found
>>> +end
>>> +
>>>  --- Generates environment to run process with.
>>>  -- The result is merged into os.environ().
>>>  -- @return map
>>> diff --git a/test/replication-luatest/gh_4669_applier_reconnect_test.lua b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
>>> new file mode 100644
>>> index 000000000..19e10b136
>>> --- /dev/null
>>> +++ b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
>>> @@ -0,0 +1,40 @@
>>> +local t = require('luatest')
>>> +local cluster = require('test.luatest_helpers.cluster')
>>> +
>>> +local g = t.group('gh-4669-applier-reconnect')
>>> +
>>> +local function check_follow_master(server)
>>> +    return t.assert_equals(
>>> +        server:eval('return box.info.replication[1].upstream.status'), 'follow')
>>> +end
>>> +
>>> +g.before_each(function()
>>> +    g.cluster = cluster:new({})
>>> +    g.master = g.cluster:build_server(
>>> +        {}, {alias = 'master'}, 'base_instance.lua')
>>> +    g.replica = g.cluster:build_server(
>>> +        {args={'master'}}, {alias = 'replica'}, 'replica.lua')
>>> +
>>> +    g.cluster:join_server(g.master)
>>> +    g.cluster:join_server(g.replica)
>>> +    g.cluster:start()
>>> +    check_follow_master(g.replica)
>>> +end)
>>> +
>>> +g.after_each(function()
>>> +    g.cluster:stop()
>>> +end)
>>> +
>>> +-- Test that appliers aren't recreated upon replication reconfiguration.
>>> +g.test_applier_connection_on_reconfig = function(g)
>>> +    g.replica:eval([[
>>> +        box.cfg{
>>> +            replication = {
>>> +                os.getenv("TARANTOOL_LISTEN"),
>>> +                box.cfg.replication[1],
>>> +            }
>>> +        }
>>> +    ]])
>> Should we test this with dynamic add/remove of a 2nd replica?
> 
> No problem. Here's the diff:
> 
> ============================
> 
> diff --git a/test/replication-luatest/gh_4669_applier_reconnect_test.lua b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
> index 19e10b136..e082515fd 100644
> --- a/test/replication-luatest/gh_4669_applier_reconnect_test.lua
> +++ b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
> @@ -14,9 +14,12 @@ g.before_each(function()
>          {}, {alias = 'master'}, 'base_instance.lua')
>      g.replica = g.cluster:build_server(
>          {args={'master'}}, {alias = 'replica'}, 'replica.lua')
> +    g.replica2 = g.cluster:build_server(
> +        {args={'master'}}, {alias = 'replica2'}, 'replica.lua')
> 
>      g.cluster:join_server(g.master)
>      g.cluster:join_server(g.replica)
> +    g.cluster:join_server(g.replica2)
>      g.cluster:start()
>      check_follow_master(g.replica)
>  end)
> @@ -26,12 +29,20 @@ g.after_each(function()
>  end)
> 
>  -- Test that appliers aren't recreated upon replication reconfiguration.
> +-- Add and then remove two extra replicas to the configuration. The master
> +-- connection should stay intact.
>  g.test_applier_connection_on_reconfig = function(g)
>      g.replica:eval([[
>          box.cfg{
>              replication = {
>                  os.getenv("TARANTOOL_LISTEN"),
>                  box.cfg.replication[1],
> +                os.getenv('VARDIR')..'/replica2.sock'
> +            }
> +        }
> +        box.cfg{
> +            replication = {
> +                box.cfg.replication[2]
>              }
>          }
>      ]])
> 
> ============================
> 
>> 
>>> +    check_follow_master(g.replica)
>>> +    t.assert_equals(g.master:grep_log("exiting the relay loop"), nil)
>>> +end
>>> -- 
>>> 2.30.1 (Apple Git-130)
>>> 
> 
> -- 
> Serge Petrenko
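
P.S. In box.cfg terms, the scenario the patch optimizes is a reconfiguration
that keeps an already working master in the replication list. A minimal
sketch (the URIs below are illustrative only, not taken from the patch):

    -- A replica already following one master.
    box.cfg{replication = {'localhost:3301'}}
    -- Add a second upstream while keeping the first one. Before the fix
    -- the applier to localhost:3301 was torn down and recreated (risking
    -- a "duplicate connection with the same replica UUID" stall); with
    -- the fix the working connection stays intact and only the new peer
    -- is connected.
    box.cfg{replication = {'localhost:3301', 'localhost:3302'}}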