From mboxrd@z Thu Jan 1 00:00:00 1970
To: sergos
Cc: tarantool-patches@dev.tarantool.org
References: <5f68cfcac8c44346a690e0f48ad0da1d7646ac83.1633439400.git.sergepetrenko@tarantool.org>
 <648688AC-03B8-48B3-B644-8F3B9A9A9212@tarantool.org>
Message-ID: <8e453790-534e-be84-b4b2-fd30c9740eb7@tarantool.org>
Date: Tue, 12 Oct 2021 17:14:34 +0300
In-Reply-To: <648688AC-03B8-48B3-B644-8F3B9A9A9212@tarantool.org>
Subject: Re:
 [Tarantool-patches] [PATCH v2 2/2] replication: fix replica disconnect
 upon reconfiguration
From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko

12.10.2021 13:14, sergos wrote:
> Hi!
>
> Thanks for the patch!
>
> Just request for some addition to the test, otherwise LGTM.
>
> Sergos
>
>
>> On 5 Oct 2021, at 16:18, Serge Petrenko wrote:
>>
>> Replication reconfiguration used to work as follows: upon receiving a
>> new config, disconnect from all the existing masters and try to connect
>> to all the masters from the new config.
>>
>> This led to instances doing extra work when the old config and the new
>> config had the same nodes in them: instead of doing nothing, tarantool
>> reinitialized the connection.
>>
>> There was another problem: when an existing connection is broken, the
>> master takes some time to notice that. So, when the replica resets the
>> connection, it may try to reconnect faster than the master is able to
>> notice its absence. In this case the replica wouldn't be able to
>> reconnect due to `duplicate connection with the same replica UUID`. So
>> replication would stop for a replication_timeout, which may be quite
>> large (seconds or tens of seconds).
>>
>> Let's prevent tarantool from reconnecting to the same instance if there
>> already is a working connection.
>>
>> Closes #4669
>> ---
>>  .../gh-4669-applier-reconfig-reconnect.md     |  5 ++
>>  src/box/box.cc                                | 31 ++++++---
>>  src/box/replication.cc                        | 41 ++++++++----
>>  src/box/replication.h                         |  4 +-
>>  test/instance_files/base_instance.lua         |  3 +-
>>  test/instance_files/replica.lua               | 17 +++++
>>  test/luatest_helpers/server.lua               | 65 +++++++++++++++++++
>>  .../gh_4669_applier_reconnect_test.lua        | 40 ++++++++++++
>>  8 files changed, 184 insertions(+), 22 deletions(-)
>>  create mode 100644 changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md
>>  create mode 100755 test/instance_files/replica.lua
>>  create mode 100644 test/replication-luatest/gh_4669_applier_reconnect_test.lua
>>
>> diff --git a/changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md b/changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md
>> new file mode 100644
>> index 000000000..2a776872b
>> --- /dev/null
>> +++ b/changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md
>> @@ -0,0 +1,5 @@
>> +## bugfix/replication
>> +
>> +* Fixed replica reconnecting to a living master on any `box.cfg{replication=...}`
>> +  change. Such reconnects could lead to replica failing to restore connection
>> +  for `replication_timeout` seconds (gh-4669).
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 219ffa38d..cc4ada47e 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1249,7 +1249,7 @@ cfg_get_replication(int *p_count)
>>   * don't start appliers.
>>   */
>>  static void
>> -box_sync_replication(bool connect_quorum)
>> +box_sync_replication(bool do_quorum, bool do_reuse)
>>  {
>>  	int count = 0;
>>  	struct applier **appliers = cfg_get_replication(&count);
>> @@ -1260,12 +1260,27 @@ box_sync_replication(bool connect_quorum)
>>  		for (int i = 0; i < count; i++)
>>  			applier_delete(appliers[i]); /* doesn't affect diag */
>>  	});
>> -
>> -	replicaset_connect(appliers, count, connect_quorum);
>> +	replicaset_connect(appliers, count, do_quorum, do_reuse);
>>
>>  	guard.is_active = false;
>>  }
>>
>> +static inline void
>> +box_restart_replication(void)
>> +{
>> +	const bool do_quorum = true;
>> +	const bool do_reuse = false;
>> +	box_sync_replication(do_quorum, do_reuse);
>> +}
>> +
>> +static inline void
>> +box_update_replication(void)
>> +{
>> +	const bool do_quorum = false;
>> +	const bool do_reuse = true;
>> +	box_sync_replication(do_quorum, do_reuse);
>> +}
>> +
>>  void
>>  box_set_replication(void)
>>  {
>> @@ -1284,7 +1299,7 @@ box_set_replication(void)
>>  	 * Stay in orphan mode in case we fail to connect to at least
>>  	 * 'replication_connect_quorum' remote instances.
>>  	 */
>> -	box_sync_replication(false);
>> +	box_update_replication();
>>  	/* Follow replica */
>>  	replicaset_follow();
>>  	/* Wait until appliers are in sync */
>> @@ -1404,7 +1419,7 @@ box_set_replication_anon(void)
>>  		 * them can register and others resend a
>>  		 * non-anonymous subscribe.
>>  		 */
>> -		box_sync_replication(true);
>> +		box_restart_replication();
>>  		/*
>>  		 * Wait until the master has registered this
>>  		 * instance.
>> @@ -3258,7 +3273,7 @@ bootstrap(const struct tt_uuid *instance_uuid,
>>  	 * with connecting to 'replication_connect_quorum' masters.
>>  	 * If this also fails, throw an error.
>>  	 */
>> -	box_sync_replication(true);
>> +	box_restart_replication();
>>
>>  	struct replica *master = replicaset_find_join_master();
>>  	assert(master == NULL || master->applier != NULL);
>> @@ -3335,7 +3350,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
>>  	if (wal_dir_lock >= 0) {
>>  		if (box_listen() != 0)
>>  			diag_raise();
>> -		box_sync_replication(false);
>> +		box_update_replication();
>>
>>  		struct replica *master;
>>  		if (replicaset_needs_rejoin(&master)) {
>> @@ -3414,7 +3429,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
>>  		vclock_copy(&replicaset.vclock, &recovery->vclock);
>>  		if (box_listen() != 0)
>>  			diag_raise();
>> -		box_sync_replication(false);
>> +		box_update_replication();
>>  	}
>>  	stream_guard.is_active = false;
>>  	recovery_finalize(recovery);
>> diff --git a/src/box/replication.cc b/src/box/replication.cc
>> index 1288bc9b1..10b4ac915 100644
>> --- a/src/box/replication.cc
>> +++ b/src/box/replication.cc
>> @@ -507,7 +507,7 @@ replica_on_applier_state_f(struct trigger *trigger, void *event)
>>   * upon reconfiguration of box.cfg.replication.
>>   */
>>  static void
>> -replicaset_update(struct applier **appliers, int count)
>> +replicaset_update(struct applier **appliers, int count, bool keep_connect)
>>  {
>>  	replica_hash_t uniq;
>>  	memset(&uniq, 0, sizeof(uniq));
>> @@ -572,22 +572,39 @@ replicaset_update(struct applier **appliers, int count)
>>  		applier_stop(applier);
>>  		applier_delete(applier);
>>  	}
>> +
>> +	replicaset.applier.total = count;
>> +	replicaset.applier.connected = 0;
>> +	replicaset.applier.loading = 0;
>> +	replicaset.applier.synced = 0;
>>  	replicaset_foreach(replica) {
>>  		if (replica->applier == NULL)
>>  			continue;
>> -		applier = replica->applier;
>> -		replica_clear_applier(replica);
>> -		replica->applier_sync_state = APPLIER_DISCONNECTED;
>> +		struct replica *other = replica_hash_search(&uniq, replica);
>> +		if (keep_connect && other != NULL &&
>> +		    (replica->applier->state == APPLIER_FOLLOW ||
>> +		     replica->applier->state == APPLIER_SYNC)) {
>> +			/*
>> +			 * Try not to interrupt working appliers upon
>> +			 * reconfiguration.
>> +			 */
>> +			replicaset.applier.connected++;
>> +			replicaset.applier.synced++;
>> +			replica_hash_remove(&uniq, other);
>> +			applier = other->applier;
>> +			replica_clear_applier(other);
>> +			replica_delete(other);
>> +		} else {
>> +			applier = replica->applier;
>> +			replica_clear_applier(replica);
>> +			replica->applier_sync_state = APPLIER_DISCONNECTED;
>> +		}
>>  		applier_stop(applier);
>>  		applier_delete(applier);
>>  	}
>>
>> -	/* Save new appliers */
>> -	replicaset.applier.total = count;
>> -	replicaset.applier.connected = 0;
>> -	replicaset.applier.loading = 0;
>> -	replicaset.applier.synced = 0;
>>
>> +	/* Save new appliers */
>>  	replica_hash_foreach_safe(&uniq, replica, next) {
>>  		replica_hash_remove(&uniq, replica);
>>
>> @@ -664,11 +681,11 @@ applier_on_connect_f(struct trigger *trigger, void *event)
>>
>>  void
>>  replicaset_connect(struct applier **appliers, int count,
>> -		   bool connect_quorum)
>> +		   bool connect_quorum, bool keep_connect)
>>  {
>>  	if (count == 0) {
>>  		/* Cleanup the replica set. */
>> -		replicaset_update(appliers, count);
>> +		replicaset_update(appliers, 0, false);
>>  		return;
>>  	}
>>
>> @@ -772,7 +789,7 @@ replicaset_connect(struct applier **appliers, int count,
>>
>>  	/* Now all the appliers are connected, update the replica set. */
>>  	try {
>> -		replicaset_update(appliers, count);
>> +		replicaset_update(appliers, count, keep_connect);
>>  	} catch (Exception *e) {
>>  		goto error;
>>  	}
>> diff --git a/src/box/replication.h b/src/box/replication.h
>> index 4c82cd839..a8fed45e8 100644
>> --- a/src/box/replication.h
>> +++ b/src/box/replication.h
>> @@ -439,10 +439,12 @@ replicaset_add_anon(const struct tt_uuid *replica_uuid);
>>   * \param connect_quorum if this flag is set, fail unless at
>>   *                       least replication_connect_quorum
>>   *                       appliers have successfully connected.
>> + * \param keep_connect   if this flag is set do not force a reconnect if the
>> + *                       old connection to the replica is fine.
>>   */
>>  void
>>  replicaset_connect(struct applier **appliers, int count,
>> -		   bool connect_quorum);
>> +		   bool connect_quorum, bool keep_connect);
>>
>>  /**
>>   * Check if the current instance fell too much behind its
>> diff --git a/test/instance_files/base_instance.lua b/test/instance_files/base_instance.lua
>> index 45bdbc7e8..e579c3843 100755
>> --- a/test/instance_files/base_instance.lua
>> +++ b/test/instance_files/base_instance.lua
>> @@ -5,7 +5,8 @@ local listen = os.getenv('TARANTOOL_LISTEN')
>>  box.cfg({
>>      work_dir = workdir,
>>  --     listen = 'localhost:3310'
>> -    listen = listen
>> +    listen = listen,
>> +    log = workdir..'/tarantool.log',
>>  })
>>
>>  box.schema.user.grant('guest', 'read,write,execute,create', 'universe')
>> diff --git a/test/instance_files/replica.lua b/test/instance_files/replica.lua
>> new file mode 100755
>> index 000000000..587345f24
>> --- /dev/null
>> +++ b/test/instance_files/replica.lua
>> @@ -0,0 +1,17 @@
>> +#!/usr/bin/env tarantool
>> +local workdir = os.getenv('TARANTOOL_WORKDIR')
>> +local listen = os.getenv('TARANTOOL_LISTEN')
>> +local SOCKET_DIR = os.getenv('VARDIR')
>> +local MASTER = arg[1] or "master"
>> +
>> +local function master_uri()
>> +    return SOCKET_DIR..'/'..MASTER..'.sock'
>> +end
>> +
>> +box.cfg({
>> +    work_dir = workdir,
>> +    listen = listen,
>> +    replication = master_uri(),
>> +})
>> +
>> +_G.ready = true
>> diff --git a/test/luatest_helpers/server.lua b/test/luatest_helpers/server.lua
>> index 018ea4f7d..0306a24ef 100644
>> --- a/test/luatest_helpers/server.lua
>> +++ b/test/luatest_helpers/server.lua
>> @@ -5,6 +5,7 @@ local fiber = require('fiber')
>>  local fio = require('fio')
>>  local fun = require('fun')
>>  local json = require('json')
>> +local errno = require('errno')
>>
>>  local checks = require('checks')
>>  local luatest = require('luatest')
>> @@ -59,6 +60,70 @@ function Server:initialize()
>>      getmetatable(getmetatable(self)).initialize(self)
>>  end
>>
>> +-- A copy of test_run:grep_log.
>> +function Server:grep_log(what, bytes, opts)
>> +    local opts = opts or {}
>> +    local noreset = opts.noreset or false
>> +    -- if instance has crashed provide filename to use grep_log
>> +    local filename = opts.filename or self:eval('return box.cfg.log')
>> +    local file = fio.open(filename, {'O_RDONLY', 'O_NONBLOCK'})
>> +
>> +    local function fail(msg)
>> +        local err = errno.strerror()
>> +        if file ~= nil then
>> +            file:close()
>> +        end
>> +        error(string.format("%s: %s: %s", msg, filename, err))
>> +    end
>> +
>> +    if file == nil then
>> +        fail("Failed to open log file")
>> +    end
>> +    io.flush() -- attempt to flush stdout == log fd
>> +    local filesize = file:seek(0, 'SEEK_END')
>> +    if filesize == nil then
>> +        fail("Failed to get log file size")
>> +    end
>> +    local bytes = bytes or 65536 -- don't read whole log - it can be huge
>> +    bytes = bytes > filesize and filesize or bytes
>> +    if file:seek(-bytes, 'SEEK_END') == nil then
>> +        fail("Failed to seek log file")
>> +    end
>> +    local found, buf
>> +    repeat -- read file in chunks
>> +        local s = file:read(2048)
>> +        if s == nil then
>> +            fail("Failed to read log file")
>> +        end
>> +        local pos = 1
>> +        repeat -- split read string in lines
>> +            local endpos = string.find(s, '\n', pos)
>> +            endpos = endpos and endpos - 1 -- strip terminating \n
>> +            local line = string.sub(s, pos, endpos)
>> +            if endpos == nil and s ~= '' then
>> +                -- line doesn't end with \n or eof, append it to buffer
>> +                -- to be checked on next iteration
>> +                buf = buf or {}
>> +                table.insert(buf, line)
>> +            else
>> +                if buf ~= nil then -- prepend line with buffered data
>> +                    table.insert(buf, line)
>> +                    line = table.concat(buf)
>> +                    buf = nil
>> +                end
>> +                if string.match(line, "Starting instance") and not noreset then
>> +                    found = nil -- server was restarted, reset search
>> +                else
>> +                    found = string.match(line, what) or found
>> +                end
>> +            end
>> +            pos = endpos and endpos + 2 -- jump to char after \n
>> +        until pos == nil
>> +    until s == ''
>> +    file:close()
>> +    return found
>> +end
>> +
>>  --- Generates environment to run process with.
>>  -- The result is merged into os.environ().
>>  -- @return map
>> diff --git a/test/replication-luatest/gh_4669_applier_reconnect_test.lua b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
>> new file mode 100644
>> index 000000000..19e10b136
>> --- /dev/null
>> +++ b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
>> @@ -0,0 +1,40 @@
>> +local t = require('luatest')
>> +local cluster = require('test.luatest_helpers.cluster')
>> +
>> +local g = t.group('gh-4669-applier-reconnect')
>> +
>> +local function check_follow_master(server)
>> +    return t.assert_equals(
>> +        server:eval('return box.info.replication[1].upstream.status'), 'follow')
>> +end
>> +
>> +g.before_each(function()
>> +    g.cluster = cluster:new({})
>> +    g.master = g.cluster:build_server(
>> +        {}, {alias = 'master'}, 'base_instance.lua')
>> +    g.replica = g.cluster:build_server(
>> +        {args={'master'}}, {alias = 'replica'}, 'replica.lua')
>> +
>> +    g.cluster:join_server(g.master)
>> +    g.cluster:join_server(g.replica)
>> +    g.cluster:start()
>> +    check_follow_master(g.replica)
>> +end)
>> +
>> +g.after_each(function()
>> +    g.cluster:stop()
>> +end)
>> +
>> +-- Test that appliers aren't recreated upon replication reconfiguration.
>> +g.test_applier_connection_on_reconfig = function(g)
>> +    g.replica:eval([[
>> +        box.cfg{
>> +            replication = {
>> +                os.getenv("TARANTOOL_LISTEN"),
>> +                box.cfg.replication[1],
>> +            }
>> +        }
>> +    ]])
> Should we test this with dynamic add/remove of a 2nd replica?

No problem. Here's the diff:

============================

diff --git a/test/replication-luatest/gh_4669_applier_reconnect_test.lua b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
index 19e10b136..e082515fd 100644
--- a/test/replication-luatest/gh_4669_applier_reconnect_test.lua
+++ b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
@@ -14,9 +14,12 @@ g.before_each(function()
         {}, {alias = 'master'}, 'base_instance.lua')
     g.replica = g.cluster:build_server(
         {args={'master'}}, {alias = 'replica'}, 'replica.lua')
+    g.replica2 = g.cluster:build_server(
+        {args={'master'}}, {alias = 'replica2'}, 'replica.lua')
 
     g.cluster:join_server(g.master)
     g.cluster:join_server(g.replica)
+    g.cluster:join_server(g.replica2)
     g.cluster:start()
     check_follow_master(g.replica)
 end)
@@ -26,12 +29,20 @@ g.after_each(function()
 end)
 
 -- Test that appliers aren't recreated upon replication reconfiguration.
+-- Add and then remove two extra replicas to the configuration. The master
+-- connection should stay intact.
 g.test_applier_connection_on_reconfig = function(g)
     g.replica:eval([[
         box.cfg{
             replication = {
                 os.getenv("TARANTOOL_LISTEN"),
                 box.cfg.replication[1],
+                os.getenv('VARDIR')..'/replica2.sock'
+            }
+        }
+        box.cfg{
+            replication = {
+                box.cfg.replication[2]
             }
         }
     ]])

============================

>
>> +    check_follow_master(g.replica)
>> +    t.assert_equals(g.master:grep_log("exiting the relay loop"), nil)
>> +end
>> --
>> 2.30.1 (Apple Git-130)
>>

--
Serge Petrenko
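
For readers skimming the archive, below is a minimal sketch of the user-visible
scenario the patch optimizes, as seen from a replica's console. It is not part
of the patch: the URIs 'localhost:3301' (master) and 'localhost:3302' (an extra
peer) are illustrative placeholders, and, as in the test above, replica id 1 is
assumed to belong to the master.

============================

-- Sketch only: illustrative URIs, run on an already-bootstrapped replica.
local master_uri = 'localhost:3301' -- assumed master listen URI
local extra_uri = 'localhost:3302'  -- assumed URI of another peer

-- Follow a single master.
box.cfg{replication = {master_uri}}

-- Reconfigure with a list that still contains the master. Before the fix the
-- applier to the master was stopped and recreated; with keep_connect an
-- applier already in FOLLOW or SYNC state is left untouched.
box.cfg{replication = {master_uri, extra_uri}}

-- The upstream to the master should still report 'follow', with no reconnect
-- attempt in between (replica id 1 is assumed to be the master, as above).
assert(box.info.replication[1].upstream.status == 'follow')

============================

Without the fix, the second box.cfg{} call reinitialized the master connection,
which is what could trigger the `duplicate connection with the same replica
UUID` error and a replication_timeout-long pause described in the commit
message.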