From mboxrd@z Thu Jan 1 00:00:00 1970
To: v.shpilevoy@tarantool.org, gorcunov@gmail.com
Date: Thu, 30 Sep 2021 22:17:40 +0300
Message-Id: <203dd4c5c23e717861a4952510882904323e10a0.1633028320.git.sergepetrenko@tarantool.org>
X-Mailer: git-send-email 2.30.1 (Apple Git-130)
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH 2/2] replication: fix replica disconnect upon reconfiguration
List-Id: Tarantool development patches
From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko
Cc: tarantool-patches@dev.tarantool.org

Replication reconfiguration used to work as follows: upon receiving a
new config, disconnect from all the existing masters and try to connect
to all the masters from the new config.

This led to instances doing extra work when the old and new configs
contained the same nodes: instead of doing nothing, tarantool
reinitialized the connection.

There was another problem: when an existing connection is broken, the
master takes some time to notice that. So, when the replica resets the
connection, it may try to reconnect faster than the master notices its
absence. In this case the replica wouldn't be able to reconnect due to
a `duplicate connection with the same replica UUID` error, and
replication would stop for replication_timeout seconds, which may be
quite long (seconds or tens of seconds).

Let's prevent tarantool from reconnecting to the same instance if there
already is a working connection.

Closes #4669
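For illustration, the scenario the new test exercises looks roughly like
this (a minimal sketch; it assumes the replica is already following a
single master listed in box.cfg.replication):

    -- Executed on the replica. The new replication list still contains
    -- the current master, so with this patch the existing applier is
    -- left intact instead of being stopped and recreated.
    box.cfg{
        replication = {
            box.cfg.listen,            -- the replica's own URI
            box.cfg.replication[1],    -- the same master as before
        },
    }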
---
 .../gh-4669-applier-reconfig-reconnect.md     |  5 ++
 src/box/box.cc                                |  6 +-
 src/box/replication.cc                        | 41 ++++++++----
 src/box/replication.h                         |  4 +-
 test/instance_files/base_instance.lua         |  3 +-
 test/instance_files/replica.lua               | 17 +++++
 test/luatest_helpers/server.lua               | 64 +++++++++++++++++++
 .../gh_4669_applier_reconnect_test.lua        | 42 ++++++++++++
 8 files changed, 166 insertions(+), 16 deletions(-)
 create mode 100644 changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md
 create mode 100755 test/instance_files/replica.lua
 create mode 100644 test/replication-luatest/gh_4669_applier_reconnect_test.lua

diff --git a/changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md b/changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md
new file mode 100644
index 000000000..2a776872b
--- /dev/null
+++ b/changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md
@@ -0,0 +1,5 @@
+## bugfix/replication
+
+* Fixed replica reconnecting to a living master on any `box.cfg{replication=...}`
+  change. Such reconnects could lead to replica failing to restore connection
+  for `replication_timeout` seconds (gh-4669).
diff --git a/src/box/box.cc b/src/box/box.cc
index 219ffa38d..89cda5599 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1249,7 +1249,7 @@ cfg_get_replication(int *p_count)
  * don't start appliers.
  */
 static void
-box_sync_replication(bool connect_quorum)
+box_sync_replication(bool strict)
 {
         int count = 0;
         struct applier **appliers = cfg_get_replication(&count);
@@ -1261,7 +1261,9 @@ box_sync_replication(bool connect_quorum)
                         applier_delete(appliers[i]); /* doesn't affect diag */
         });
 
-        replicaset_connect(appliers, count, connect_quorum);
+        bool connect_quorum = strict;
+        bool keep_connect = !strict;
+        replicaset_connect(appliers, count, connect_quorum, keep_connect);
 
         guard.is_active = false;
 }
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 1288bc9b1..e5fce6c8c 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -507,7 +507,7 @@ replica_on_applier_state_f(struct trigger *trigger, void *event)
  * upon reconfiguration of box.cfg.replication.
  */
 static void
-replicaset_update(struct applier **appliers, int count)
+replicaset_update(struct applier **appliers, int count, bool keep_connect)
 {
         replica_hash_t uniq;
         memset(&uniq, 0, sizeof(uniq));
@@ -572,22 +572,39 @@ replicaset_update(struct applier **appliers, int count)
                 applier_stop(applier);
                 applier_delete(applier);
         }
+
+        replicaset.applier.total = count;
+        replicaset.applier.connected = 0;
+        replicaset.applier.loading = 0;
+        replicaset.applier.synced = 0;
         replicaset_foreach(replica) {
                 if (replica->applier == NULL)
                         continue;
-                applier = replica->applier;
-                replica_clear_applier(replica);
-                replica->applier_sync_state = APPLIER_DISCONNECTED;
+                struct replica *other = replica_hash_search(&uniq, replica);
+                if (keep_connect && other != NULL &&
+                    (replica->applier->state == APPLIER_FOLLOW ||
+                     replica->applier->state == APPLIER_SYNC)) {
+                        /*
+                         * Try not to interrupt working appliers upon
+                         * reconfiguration.
+                         */
+                        replicaset.applier.connected++;
+                        replicaset.applier.synced++;
+                        replica_hash_remove(&uniq, other);
+                        applier = other->applier;
+                        replica_clear_applier(other);
+                        replica_delete(other);
+                } else {
+                        applier = replica->applier;
+                        replica_clear_applier(replica);
+                        replica->applier_sync_state = APPLIER_DISCONNECTED;
+                }
                 applier_stop(applier);
                 applier_delete(applier);
         }
 
-        /* Save new appliers */
-        replicaset.applier.total = count;
-        replicaset.applier.connected = 0;
-        replicaset.applier.loading = 0;
-        replicaset.applier.synced = 0;
+        /* Save new appliers */
 
         replica_hash_foreach_safe(&uniq, replica, next) {
                 replica_hash_remove(&uniq, replica);
@@ -664,11 +681,11 @@ applier_on_connect_f(struct trigger *trigger, void *event)
 
 void
 replicaset_connect(struct applier **appliers, int count,
-                   bool connect_quorum)
+                   bool connect_quorum, bool keep_connect)
 {
         if (count == 0) {
                 /* Cleanup the replica set. */
-                replicaset_update(appliers, count);
+                replicaset_update(appliers, count, keep_connect);
                 return;
         }
 
@@ -772,7 +789,7 @@ replicaset_connect(struct applier **appliers, int count,
 
         /* Now all the appliers are connected, update the replica set. */
         try {
-                replicaset_update(appliers, count);
+                replicaset_update(appliers, count, keep_connect);
         } catch (Exception *e) {
                 goto error;
         }
diff --git a/src/box/replication.h b/src/box/replication.h
index 4c82cd839..a8fed45e8 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -439,10 +439,12 @@ replicaset_add_anon(const struct tt_uuid *replica_uuid);
  * \param connect_quorum if this flag is set, fail unless at
  *                       least replication_connect_quorum
  *                       appliers have successfully connected.
+ * \param keep_connect   if this flag is set do not force a reconnect if the
+ *                       old connection to the replica is fine.
  */
 void
 replicaset_connect(struct applier **appliers, int count,
-                   bool connect_quorum);
+                   bool connect_quorum, bool keep_connect);
 
 /**
  * Check if the current instance fell too much behind its
diff --git a/test/instance_files/base_instance.lua b/test/instance_files/base_instance.lua
index 45bdbc7e8..e579c3843 100755
--- a/test/instance_files/base_instance.lua
+++ b/test/instance_files/base_instance.lua
@@ -5,7 +5,8 @@ local listen = os.getenv('TARANTOOL_LISTEN')
 box.cfg({
     work_dir = workdir,
     -- listen = 'localhost:3310'
-    listen = listen
+    listen = listen,
+    log = workdir..'/tarantool.log',
 })
 
 box.schema.user.grant('guest', 'read,write,execute,create', 'universe')
diff --git a/test/instance_files/replica.lua b/test/instance_files/replica.lua
new file mode 100755
index 000000000..587345f24
--- /dev/null
+++ b/test/instance_files/replica.lua
@@ -0,0 +1,17 @@
+#!/usr/bin/env tarantool
+local workdir = os.getenv('TARANTOOL_WORKDIR')
+local listen = os.getenv('TARANTOOL_LISTEN')
+local SOCKET_DIR = os.getenv('VARDIR')
+local MASTER = arg[1] or "master"
+
+local function master_uri()
+    return SOCKET_DIR..'/'..MASTER..'.sock'
+end
+
+box.cfg({
+    work_dir = workdir,
+    listen = listen,
+    replication = master_uri(),
+})
+
+_G.ready = true
diff --git a/test/luatest_helpers/server.lua b/test/luatest_helpers/server.lua
index 018ea4f7d..f204dc794 100644
--- a/test/luatest_helpers/server.lua
+++ b/test/luatest_helpers/server.lua
@@ -59,6 +59,70 @@ function Server:initialize()
     getmetatable(getmetatable(self)).initialize(self)
 end
 
+-- A copy of test_run:grep_log.
+function Server:grep_log(what, bytes, opts)
+    local opts = opts or {}
+    local noreset = opts.noreset or false
+    -- if instance has crashed provide filename to use grep_log
+    local filename = opts.filename or self:eval('return box.cfg.log')
+    local file = fio.open(filename, {'O_RDONLY', 'O_NONBLOCK'})
+
+    local function fail(msg)
+        local err = errno.strerror()
+        if file ~= nil then
+            file:close()
+        end
+        error(string.format("%s: %s: %s", msg, filename, err))
+    end
+
+    if file == nil then
+        fail("Failed to open log file")
+    end
+    io.flush() -- attempt to flush stdout == log fd
+    local filesize = file:seek(0, 'SEEK_END')
+    if filesize == nil then
+        fail("Failed to get log file size")
+    end
+    local bytes = bytes or 65536 -- don't read whole log - it can be huge
+    bytes = bytes > filesize and filesize or bytes
+    if file:seek(-bytes, 'SEEK_END') == nil then
+        fail("Failed to seek log file")
+    end
+    local found, buf
+    repeat -- read file in chunks
+        local s = file:read(2048)
+        if s == nil then
+            fail("Failed to read log file")
+        end
+        local pos = 1
+        repeat -- split read string in lines
+            local endpos = string.find(s, '\n', pos)
+            endpos = endpos and endpos - 1 -- strip terminating \n
+            local line = string.sub(s, pos, endpos)
+            if endpos == nil and s ~= '' then
+                -- line doesn't end with \n or eof, append it to buffer
+                -- to be checked on next iteration
+                buf = buf or {}
+                table.insert(buf, line)
+            else
+                if buf ~= nil then -- prepend line with buffered data
+                    table.insert(buf, line)
+                    line = table.concat(buf)
+                    buf = nil
+                end
+                if string.match(line, "Starting instance") and not noreset then
+                    found = nil -- server was restarted, reset search
+                else
+                    found = string.match(line, what) or found
+                end
+            end
+            pos = endpos and endpos + 2 -- jump to char after \n
+        until pos == nil
+    until s == ''
+    file:close()
+    return found
+end
+
 --- Generates environment to run process with.
 -- The result is merged into os.environ().
 -- @return map
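The helper above mirrors test_run:grep_log. A minimal usage sketch
(hypothetical server handle; the pattern is the one the new test greps
the master log for):

    -- Search the tail (by default the last 64 KiB) of the instance's log
    -- for a Lua pattern. Returns the matched string or nil.
    local hit = server:grep_log('exiting the relay loop')
    -- opts.filename points at a log file explicitly, e.g. when the
    -- instance is no longer running; opts.noreset keeps matches found
    -- before a 'Starting instance' line (i.e. before a restart).
    local hit2 = server:grep_log('some pattern', 1024, {noreset = true})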
diff --git a/test/replication-luatest/gh_4669_applier_reconnect_test.lua b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
new file mode 100644
index 000000000..62adff716
--- /dev/null
+++ b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
@@ -0,0 +1,42 @@
+local t = require('luatest')
+local fio = require('fio')
+local Server = t.Server
+local Cluster = require('test.luatest_helpers.cluster')
+
+local g = t.group('gh-4669-applier-reconnect')
+
+local function check_follow_master(server)
+    return t.assert_equals(
+        server:eval('return box.info.replication[1].upstream.status'), 'follow')
+end
+
+g.before_each(function()
+    g.cluster = Cluster:new({})
+    g.master = g.cluster:build_server(
+        {}, {alias = 'master'}, 'base_instance.lua')
+    g.replica = g.cluster:build_server(
+        {args={'master'}}, {alias = 'replica'}, 'replica.lua')
+
+    g.cluster:join_server(g.master)
+    g.cluster:join_server(g.replica)
+    g.cluster:start()
+    check_follow_master(g.replica)
+end)
+
+g.after_each(function()
+    g.cluster:stop()
+end)
+
+-- Test that appliers aren't recreated upon replication reconfiguration.
+g.test_applier_connection_on_reconfig = function(g)
+    g.replica:eval(
+        'box.cfg{'..
+            'replication = {'..
+                'os.getenv("TARANTOOL_LISTEN"),'..
+                'box.cfg.replication[1],'..
+            '}'..
+        '}'
+    )
+    check_follow_master(g.replica)
+    t.assert_equals(g.master:grep_log("exiting the relay loop"), nil)
+end
-- 
2.30.1 (Apple Git-130)