From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 12 Oct 2021 13:14:27 +0300
From: sergos via Tarantool-patches
Reply-To: sergos
To: Serge Petrenko
Cc: tarantool-patches@dev.tarantool.org
Message-Id: <648688AC-03B8-48B3-B644-8F3B9A9A9212@tarantool.org>
In-Reply-To: <5f68cfcac8c44346a690e0f48ad0da1d7646ac83.1633439400.git.sergepetrenko@tarantool.org>
References: <5f68cfcac8c44346a690e0f48ad0da1d7646ac83.1633439400.git.sergepetrenko@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v2 2/2] replication: fix replica
 disconnect upon reconfiguration

Hi!

Thanks for the patch!
Just a request for some addition to the test, otherwise LGTM.

Sergos

> On 5 Oct 2021, at 16:18, Serge Petrenko wrote:
>
> Replication reconfiguration used to work as follows: upon receiving a
> new config, disconnect from all the existing masters and try to
> connect to all the masters from the new config.
>
> This led to instances doing extra work when the old and new configs
> had the same nodes in them: instead of doing nothing, tarantool
> reinitialized the connection.
>
> There was another problem: when an existing connection is broken, the
> master takes some time to notice that. So, when the replica resets the
> connection, it may try to reconnect faster than the master is able to
> notice its absence. In this case the replica wouldn't be able to
> reconnect due to `duplicate connection with the same replica UUID`. So
> replication would stop for replication_timeout, which may be quite
> large (seconds or tens of seconds).
>
> Let's prevent tarantool from reconnecting to the same instance if
> there already is a working connection.
>
> Closes #4669
> ---
>  .../gh-4669-applier-reconfig-reconnect.md     |  5 ++
>  src/box/box.cc                                | 31 ++++++---
>  src/box/replication.cc                        | 41 ++++++++----
>  src/box/replication.h                         |  4 +-
>  test/instance_files/base_instance.lua         |  3 +-
>  test/instance_files/replica.lua               | 17 +++++
>  test/luatest_helpers/server.lua               | 65 +++++++++++++++++++
>  .../gh_4669_applier_reconnect_test.lua        | 40 ++++++++++++
>  8 files changed, 184 insertions(+), 22 deletions(-)
>  create mode 100644 changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md
>  create mode 100755 test/instance_files/replica.lua
>  create mode 100644 test/replication-luatest/gh_4669_applier_reconnect_test.lua
>
> diff --git a/changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md b/changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md
> new file mode 100644
> index 000000000..2a776872b
> --- /dev/null
> +++ b/changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md
> @@ -0,0 +1,5 @@
> +## bugfix/replication
> +
> +* Fixed replica reconnecting to a living master on any `box.cfg{replication=...}`
> +  change. Such reconnects could lead to the replica failing to restore the
> +  connection for `replication_timeout` seconds (gh-4669).
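
Just to make the failure mode concrete for changelog readers — a
minimal sketch; the URI below is mine, not from the patch:

    box.cfg{replication = {'unix/:/tmp/master.sock'}}  -- initial connect
    box.cfg{replication = {'unix/:/tmp/master.sock'}}  -- same config again
    -- Before the fix the second call dropped and re-established the
    -- applier, risking "duplicate connection with the same replica UUID"
    -- and a stall of up to replication_timeout; with it, the working
    -- connection is simply kept.
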
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 219ffa38d..cc4ada47e 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1249,7 +1249,7 @@ cfg_get_replication(int *p_count)
>   * don't start appliers.
>   */
>  static void
> -box_sync_replication(bool connect_quorum)
> +box_sync_replication(bool do_quorum, bool do_reuse)
>  {
>  	int count = 0;
>  	struct applier **appliers = cfg_get_replication(&count);
> @@ -1260,12 +1260,27 @@ box_sync_replication(bool connect_quorum)
>  	for (int i = 0; i < count; i++)
>  		applier_delete(appliers[i]); /* doesn't affect diag */
>  	});
> -
> -	replicaset_connect(appliers, count, connect_quorum);
> +	replicaset_connect(appliers, count, do_quorum, do_reuse);
>
>  	guard.is_active = false;
>  }
>
> +static inline void
> +box_restart_replication(void)
> +{
> +	const bool do_quorum = true;
> +	const bool do_reuse = false;
> +	box_sync_replication(do_quorum, do_reuse);
> +}
> +
> +static inline void
> +box_update_replication(void)
> +{
> +	const bool do_quorum = false;
> +	const bool do_reuse = true;
> +	box_sync_replication(do_quorum, do_reuse);
> +}
> +
>  void
>  box_set_replication(void)
>  {
> @@ -1284,7 +1299,7 @@ box_set_replication(void)
>  	 * Stay in orphan mode in case we fail to connect to at least
>  	 * 'replication_connect_quorum' remote instances.
>  	 */
> -	box_sync_replication(false);
> +	box_update_replication();
>  	/* Follow replica */
>  	replicaset_follow();
>  	/* Wait until appliers are in sync */
> @@ -1404,7 +1419,7 @@ box_set_replication_anon(void)
>  	 * them can register and others resend a
>  	 * non-anonymous subscribe.
>  	 */
> -	box_sync_replication(true);
> +	box_restart_replication();
>  	/*
>  	 * Wait until the master has registered this
>  	 * instance.
> @@ -3258,7 +3273,7 @@ bootstrap(const struct tt_uuid *instance_uuid,
>  	 * with connecting to 'replication_connect_quorum' masters.
>  	 * If this also fails, throw an error.
>  	 */
> -	box_sync_replication(true);
> +	box_restart_replication();
>
>  	struct replica *master = replicaset_find_join_master();
>  	assert(master == NULL || master->applier != NULL);
> @@ -3335,7 +3350,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
>  	if (wal_dir_lock >= 0) {
>  		if (box_listen() != 0)
>  			diag_raise();
> -		box_sync_replication(false);
> +		box_update_replication();
>
>  		struct replica *master;
>  		if (replicaset_needs_rejoin(&master)) {
> @@ -3414,7 +3429,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
>  		vclock_copy(&replicaset.vclock, &recovery->vclock);
>  		if (box_listen() != 0)
>  			diag_raise();
> -		box_sync_replication(false);
> +		box_update_replication();
>  	}
>  	stream_guard.is_active = false;
>  	recovery_finalize(recovery);
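
The pair of wrappers reads well. In box.cfg terms (a sketch, the URIs
are mine): an update keeps live appliers, a restart recreates all of
them.

    -- Goes through box_update_replication(): the applier to an already
    -- connected upstream is reused, only the new one is connected.
    box.cfg{replication = {'unix/:/tmp/master.sock'}}
    box.cfg{replication = {'unix/:/tmp/master.sock', 'unix/:/tmp/new.sock'}}

    -- Goes through box_restart_replication(): every applier is
    -- recreated (an anonymous replica must also be read_only).
    box.cfg{replication_anon = true, read_only = true}
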
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index 1288bc9b1..10b4ac915 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -507,7 +507,7 @@ replica_on_applier_state_f(struct trigger *trigger, void *event)
>   * upon reconfiguration of box.cfg.replication.
>   */
>  static void
> -replicaset_update(struct applier **appliers, int count)
> +replicaset_update(struct applier **appliers, int count, bool keep_connect)
>  {
>  	replica_hash_t uniq;
>  	memset(&uniq, 0, sizeof(uniq));
> @@ -572,22 +572,39 @@ replicaset_update(struct applier **appliers, int count)
>  		applier_stop(applier);
>  		applier_delete(applier);
>  	}
> +
> +	replicaset.applier.total = count;
> +	replicaset.applier.connected = 0;
> +	replicaset.applier.loading = 0;
> +	replicaset.applier.synced = 0;
>  	replicaset_foreach(replica) {
>  		if (replica->applier == NULL)
>  			continue;
> -		applier = replica->applier;
> -		replica_clear_applier(replica);
> -		replica->applier_sync_state = APPLIER_DISCONNECTED;
> +		struct replica *other = replica_hash_search(&uniq, replica);
> +		if (keep_connect && other != NULL &&
> +		    (replica->applier->state == APPLIER_FOLLOW ||
> +		     replica->applier->state == APPLIER_SYNC)) {
> +			/*
> +			 * Try not to interrupt working appliers upon
> +			 * reconfiguration.
> +			 */
> +			replicaset.applier.connected++;
> +			replicaset.applier.synced++;
> +			replica_hash_remove(&uniq, other);
> +			applier = other->applier;
> +			replica_clear_applier(other);
> +			replica_delete(other);
> +		} else {
> +			applier = replica->applier;
> +			replica_clear_applier(replica);
> +			replica->applier_sync_state = APPLIER_DISCONNECTED;
> +		}
>  		applier_stop(applier);
>  		applier_delete(applier);
>  	}
>
> -	/* Save new appliers */
> -	replicaset.applier.total = count;
> -	replicaset.applier.connected = 0;
> -	replicaset.applier.loading = 0;
> -	replicaset.applier.synced = 0;
>
> +	/* Save new appliers */
>  	replica_hash_foreach_safe(&uniq, replica, next) {
>  		replica_hash_remove(&uniq, replica);
>
> @@ -664,11 +681,11 @@ applier_on_connect_f(struct trigger *trigger, void *event)
>
>  void
>  replicaset_connect(struct applier **appliers, int count,
> -		   bool connect_quorum)
> +		   bool connect_quorum, bool keep_connect)
>  {
>  	if (count == 0) {
>  		/* Cleanup the replica set. */
> -		replicaset_update(appliers, count);
> +		replicaset_update(appliers, 0, false);
>  		return;
>  	}
>
> @@ -772,7 +789,7 @@ replicaset_connect(struct applier **appliers, int count,
>
>  	/* Now all the appliers are connected, update the replica set. */
>  	try {
> -		replicaset_update(appliers, count);
> +		replicaset_update(appliers, count, keep_connect);
>  	} catch (Exception *e) {
>  		goto error;
>  	}
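
A nice property of the keep_connect branch: only appliers in FOLLOW or
SYNC are reused, so a reconfiguration still repairs a broken upstream.
This is observable from Lua, roughly like so (the helper name is mine):

    local function upstream_statuses()
        -- After box.cfg{replication = box.cfg.replication} a healthy
        -- upstream must still report 'follow'/'sync'; a dead one gets
        -- a fresh applier and walks through connect/auth again.
        local res = {}
        for id, r in pairs(box.info.replication) do
            if r.upstream ~= nil then
                res[id] = r.upstream.status
            end
        end
        return res
    end
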
> diff --git a/src/box/replication.h b/src/box/replication.h
> index 4c82cd839..a8fed45e8 100644
> --- a/src/box/replication.h
> +++ b/src/box/replication.h
> @@ -439,10 +439,12 @@ replicaset_add_anon(const struct tt_uuid *replica_uuid);
>   * \param connect_quorum if this flag is set, fail unless at
>   *                       least replication_connect_quorum
>   *                       appliers have successfully connected.
> + * \param keep_connect   if this flag is set, do not force a reconnect
> + *                       if the old connection to the replica is fine.
>   */
>  void
>  replicaset_connect(struct applier **appliers, int count,
> -		   bool connect_quorum);
> +		   bool connect_quorum, bool keep_connect);
>
>  /**
>   * Check if the current instance fell too much behind its
> diff --git a/test/instance_files/base_instance.lua b/test/instance_files/base_instance.lua
> index 45bdbc7e8..e579c3843 100755
> --- a/test/instance_files/base_instance.lua
> +++ b/test/instance_files/base_instance.lua
> @@ -5,7 +5,8 @@ local listen = os.getenv('TARANTOOL_LISTEN')
>  box.cfg({
>      work_dir = workdir,
>      -- listen = 'localhost:3310'
> -    listen = listen
> +    listen = listen,
> +    log = workdir..'/tarantool.log',
>  })
>
>  box.schema.user.grant('guest', 'read,write,execute,create', 'universe')
> diff --git a/test/instance_files/replica.lua b/test/instance_files/replica.lua
> new file mode 100755
> index 000000000..587345f24
> --- /dev/null
> +++ b/test/instance_files/replica.lua
> @@ -0,0 +1,17 @@
> +#!/usr/bin/env tarantool
> +local workdir = os.getenv('TARANTOOL_WORKDIR')
> +local listen = os.getenv('TARANTOOL_LISTEN')
> +local SOCKET_DIR = os.getenv('VARDIR')
> +local MASTER = arg[1] or "master"
> +
> +local function master_uri()
> +    return SOCKET_DIR..'/'..MASTER..'.sock'
> +end
> +
> +box.cfg({
> +    work_dir = workdir,
> +    listen = listen,
> +    replication = master_uri(),
> +})
> +
> +_G.ready = true
> diff --git a/test/luatest_helpers/server.lua b/test/luatest_helpers/server.lua
> index 018ea4f7d..0306a24ef 100644
> --- a/test/luatest_helpers/server.lua
> +++ b/test/luatest_helpers/server.lua
> @@ -5,6 +5,7 @@ local fiber = require('fiber')
>  local fio = require('fio')
>  local fun = require('fun')
>  local json = require('json')
> +local errno = require('errno')
>
>  local checks = require('checks')
>  local luatest = require('luatest')
> @@ -59,6 +60,70 @@ function Server:initialize()
>      getmetatable(getmetatable(self)).initialize(self)
>  end
>
> +-- A copy of test_run:grep_log.
> +function Server:grep_log(what, bytes, opts)
> +    local opts = opts or {}
> +    local noreset = opts.noreset or false
> +    -- if instance has crashed provide filename to use grep_log
> +    local filename = opts.filename or self:eval('return box.cfg.log')
> +    local file = fio.open(filename, {'O_RDONLY', 'O_NONBLOCK'})
> +
> +    local function fail(msg)
> +        local err = errno.strerror()
> +        if file ~= nil then
> +            file:close()
> +        end
> +        error(string.format("%s: %s: %s", msg, filename, err))
> +    end
> +
> +    if file == nil then
> +        fail("Failed to open log file")
> +    end
> +    io.flush() -- attempt to flush stdout == log fd
> +    local filesize = file:seek(0, 'SEEK_END')
> +    if filesize == nil then
> +        fail("Failed to get log file size")
> +    end
> +    local bytes = bytes or 65536 -- don't read the whole log - it can be huge
> +    bytes = bytes > filesize and filesize or bytes
> +    if file:seek(-bytes, 'SEEK_END') == nil then
> +        fail("Failed to seek log file")
> +    end
> +    local found, buf
> +    repeat -- read file in chunks
> +        local s = file:read(2048)
> +        if s == nil then
> +            fail("Failed to read log file")
> +        end
> +        local pos = 1
> +        repeat -- split read string in lines
> +            local endpos = string.find(s, '\n', pos)
> +            endpos = endpos and endpos - 1 -- strip terminating \n
> +            local line = string.sub(s, pos, endpos)
> +            if endpos == nil and s ~= '' then
> +                -- line doesn't end with \n or eof, append it to buffer
> +                -- to be checked on next iteration
> +                buf = buf or {}
> +                table.insert(buf, line)
> +            else
> +                if buf ~= nil then -- prepend line with buffered data
> +                    table.insert(buf, line)
> +                    line = table.concat(buf)
> +                    buf = nil
> +                end
> +                if string.match(line, "Starting instance") and not noreset then
> +                    found = nil -- server was restarted, reset search
> +                else
> +                    found = string.match(line, what) or found
> +                end
> +            end
> +            pos = endpos and endpos + 2 -- jump to char after \n
> +        until pos == nil
> +    until s == ''
> +    file:close()
> +    return found
> +end
> +
>  --- Generates environment to run process with.
>  -- The result is merged into os.environ().
>  -- @return map
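
Since the helper is copied verbatim from test_run, just a usage
reminder for new readers — this works for any server built by these
helpers, and the pattern is the one the test below greps for:

    -- Returns the matched substring or nil; reads at most `bytes`
    -- (64 KiB by default) from the tail of box.cfg.log.
    local hit = g.master:grep_log("exiting the relay loop", 1024)
    assert(hit == nil, "master must not drop the relay on reconfig")
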
> diff --git a/test/replication-luatest/gh_4669_applier_reconnect_test.lua b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
> new file mode 100644
> index 000000000..19e10b136
> --- /dev/null
> +++ b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
> @@ -0,0 +1,40 @@
> +local t = require('luatest')
> +local cluster = require('test.luatest_helpers.cluster')
> +
> +local g = t.group('gh-4669-applier-reconnect')
> +
> +local function check_follow_master(server)
> +    return t.assert_equals(
> +        server:eval('return box.info.replication[1].upstream.status'),
> +        'follow')
> +end
> +
> +g.before_each(function()
> +    g.cluster = cluster:new({})
> +    g.master = g.cluster:build_server(
> +        {}, {alias = 'master'}, 'base_instance.lua')
> +    g.replica = g.cluster:build_server(
> +        {args={'master'}}, {alias = 'replica'}, 'replica.lua')
> +
> +    g.cluster:join_server(g.master)
> +    g.cluster:join_server(g.replica)
> +    g.cluster:start()
> +    check_follow_master(g.replica)
> +end)
> +
> +g.after_each(function()
> +    g.cluster:stop()
> +end)
> +
> +-- Test that appliers aren't recreated upon replication reconfiguration.
> +g.test_applier_connection_on_reconfig = function(g)
> +    g.replica:eval([[
> +        box.cfg{
> +            replication = {
> +                os.getenv("TARANTOOL_LISTEN"),
> +                box.cfg.replication[1],
> +            }
> +        }
> +    ]])

Should we test this with dynamic add/remove of a 2nd replica?

> +    check_follow_master(g.replica)
> +    t.assert_equals(g.master:grep_log("exiting the relay loop"), nil)
> +end
> -- 
> 2.30.1 (Apple Git-130)
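
P.S. To make the 2nd-replica suggestion concrete, an untested sketch of
what I have in mind — it reuses the helpers from before_each; the
'replica2' alias, Server:start() and the VARDIR socket naming (which
follows replica.lua's convention) are my assumptions:

    g.test_applier_connection_on_replica_add_remove = function(g)
        local replica2 = g.cluster:build_server(
            {args={'master'}}, {alias = 'replica2'}, 'replica.lua')
        g.cluster:join_server(replica2)
        replica2:start()
        -- Add replica2 as a second upstream: the master connection
        -- must survive the reconfiguration.
        g.replica:eval(([[
            box.cfg{replication = {box.cfg.replication[1], '%s'}}
        ]]):format(os.getenv('VARDIR')..'/replica2.sock'))
        check_follow_master(g.replica)
        -- Remove it again: still no reconnect to the master.
        g.replica:eval('box.cfg{replication = {box.cfg.replication[1]}}')
        check_follow_master(g.replica)
        t.assert_equals(g.master:grep_log("exiting the relay loop"), nil)
    end
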