Why didn't you use

asserts:assert_server_follow_upstream(server, id)

from asserts.lua here?

+local function check_follow_master(server)
+ return t.assert_equals(
+ server:eval('return box.info.replication[1].upstream.status'), 'follow')
+end

Also, do we want to check this patch only for memtx? I'm asking because you haven't provided any engine for the test.
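Something along these lines is what I had in mind. This is only a sketch: the require path for the asserts helper and the parametrized t.group() form are assumptions on my side, and the upstream id is hardcoded to 1 (the master) exactly like in the original eval:

local t = require('luatest')
-- Assumed location of the shared helpers; adjust to wherever asserts.lua lives.
local asserts = require('test.luatest_helpers.asserts')

-- Run the same scenario for both engines instead of implicitly memtx-only.
local g = t.group('gh-4669-applier-reconnect',
                  {{engine = 'memtx'}, {engine = 'vinyl'}})

local function check_follow_master(server)
    -- Shared assertion instead of a hand-rolled eval; 1 is the master's id here.
    asserts:assert_server_follow_upstream(server, 1)
end

The engine parameter only starts to matter once the test creates a space, but declaring it makes the intent explicit.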
--
Vitaliia Ioffe


Wednesday, October 27, 2021, 10:20 +03:00 from Serge Petrenko <sergepetrenko@tarantool.org>:

Replication reconfiguration used to work as follows: upon receiving a
new config disconnect from all the existing masters and try to connect
to all the masters from the new config.

This led to instances doing extra work when old config and new config
had the same nodes in them: instead of doing nothing, tarantool
reinitialized the connection.

There was another problem: when an existing connection is broken, master
takes some time to notice that. So, when replica resets the connection,
it may try to reconnect faster than the master is able to notice its
absence. In this case replica wouldn't be able to reconnect due to
`duplicate connection with the same replica UUID`. So replication would
stop for a replication_timeout, which may be quite large (seconds or
tens of seconds).

Let's prevent tarantool from reconnecting to the same instance, if there
already is a working connection.

Closes #4669
---
 .../gh-4669-applier-reconfig-reconnect.md | 5 ++
 src/box/box.cc | 31 ++++++++---
 src/box/replication.cc | 41 +++++++++-----
 src/box/replication.h | 4 +-
 .../gh_4669_applier_reconnect_test.lua | 53 +++++++++++++++++++
 5 files changed, 113 insertions(+), 21 deletions(-)
 create mode 100644 changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md
 create mode 100644 test/replication-luatest/gh_4669_applier_reconnect_test.lua

diff --git a/changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md b/changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md
new file mode 100644
index 000000000..2a776872b
--- /dev/null
+++ b/changelogs/unreleased/gh-4669-applier-reconfig-reconnect.md
@@ -0,0 +1,5 @@
+## bugfix/replication
+
+* Fixed replica reconnecting to a living master on any `box.cfg{replication=...}`
+ change. Such reconnects could lead to replica failing to restore connection
+ for `replication_timeout` seconds (gh-4669).
diff --git a/src/box/box.cc b/src/box/box.cc
index b1154affa..c2e4a55a7 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1261,7 +1261,7 @@ cfg_get_replication(int *p_count)
  * don't start appliers.
  */
 static void
-box_sync_replication(bool connect_quorum)
+box_sync_replication(bool do_quorum, bool do_reuse)
 {
  int count = 0;
  struct applier **appliers = cfg_get_replication(&count);
@@ -1272,12 +1272,27 @@ box_sync_replication(bool connect_quorum)
  for (int i = 0; i < count; i++)
  applier_delete(appliers[i]); /* doesn't affect diag */
  });
-
- replicaset_connect(appliers, count, connect_quorum);
+ replicaset_connect(appliers, count, do_quorum, do_reuse);
 
  guard.is_active = false;
 }
 
+static inline void
+box_restart_replication(void)
+{
+ const bool do_quorum = true;
+ const bool do_reuse = false;
+ box_sync_replication(do_quorum, do_reuse);
+}
+
+static inline void
+box_update_replication(void)
+{
+ const bool do_quorum = false;
+ const bool do_reuse = true;
+ box_sync_replication(do_quorum, do_reuse);
+}
+
 void
 box_set_replication(void)
 {
@@ -1296,7 +1311,7 @@ box_set_replication(void)
  * Stay in orphan mode in case we fail to connect to at least
  * 'replication_connect_quorum' remote instances.
  */
- box_sync_replication(false);
+ box_update_replication();
  /* Follow replica */
  replicaset_follow();
  /* Wait until appliers are in sync */
@@ -1416,7 +1431,7 @@ box_set_replication_anon(void)
  * them can register and others resend a
  * non-anonymous subscribe.
  */
- box_sync_replication(true);
+ box_restart_replication();
  /*
  * Wait until the master has registered this
  * instance.
@@ -3279,7 +3294,7 @@ bootstrap(const struct tt_uuid *instance_uuid,
  * with connecting to 'replication_connect_quorum' masters.
  * If this also fails, throw an error.
  */
- box_sync_replication(true);
+ box_restart_replication();
 
  struct replica *master = replicaset_find_join_master();
  assert(master == NULL || master->applier != NULL);
@@ -3356,7 +3371,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
  if (wal_dir_lock >= 0) {
  if (box_listen() != 0)
  diag_raise();
- box_sync_replication(false);
+ box_update_replication();
 
  struct replica *master;
  if (replicaset_needs_rejoin(&master)) {
@@ -3435,7 +3450,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
  vclock_copy(&replicaset.vclock, &recovery->vclock);
  if (box_listen() != 0)
  diag_raise();
- box_sync_replication(false);
+ box_update_replication();
  }
  stream_guard.is_active = false;
  recovery_finalize(recovery);
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 1288bc9b1..10b4ac915 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -507,7 +507,7 @@ replica_on_applier_state_f(struct trigger *trigger, void *event)
  * upon reconfiguration of box.cfg.replication.
  */
 static void
-replicaset_update(struct applier **appliers, int count)
+replicaset_update(struct applier **appliers, int count, bool keep_connect)
 {
  replica_hash_t uniq;
  memset(&uniq, 0, sizeof(uniq));
@@ -572,22 +572,39 @@ replicaset_update(struct applier **appliers, int count)
  applier_stop(applier);
  applier_delete(applier);
  }
+
+ replicaset.applier.total = count;
+ replicaset.applier.connected = 0;
+ replicaset.applier.loading = 0;
+ replicaset.applier.synced = 0;
  replicaset_foreach(replica) {
  if (replica->applier == NULL)
  continue;
- applier = replica->applier;
- replica_clear_applier(replica);
- replica->applier_sync_state = APPLIER_DISCONNECTED;
+ struct replica *other = replica_hash_search(&uniq, replica);
+ if (keep_connect && other != NULL &&
+ (replica->applier->state == APPLIER_FOLLOW ||
+ replica->applier->state == APPLIER_SYNC)) {
+ /*
+ * Try not to interrupt working appliers upon
+ * reconfiguration.
+ */
+ replicaset.applier.connected++;
+ replicaset.applier.synced++;
+ replica_hash_remove(&uniq, other);
+ applier = other->applier;
+ replica_clear_applier(other);
+ replica_delete(other);
+ } else {
+ applier = replica->applier;
+ replica_clear_applier(replica);
+ replica->applier_sync_state = APPLIER_DISCONNECTED;
+ }
  applier_stop(applier);
  applier_delete(applier);
  }
 
- /* Save new appliers */
- replicaset.applier.total = count;
- replicaset.applier.connected = 0;
- replicaset.applier.loading = 0;
- replicaset.applier.synced = 0;
 
+ /* Save new appliers */
  replica_hash_foreach_safe(&uniq, replica, next) {
  replica_hash_remove(&uniq, replica);
 
@@ -664,11 +681,11 @@ applier_on_connect_f(struct trigger *trigger, void *event)
 
 void
 replicaset_connect(struct applier **appliers, int count,
- bool connect_quorum)
+ bool connect_quorum, bool keep_connect)
 {
  if (count == 0) {
  /* Cleanup the replica set. */
- replicaset_update(appliers, count);
+ replicaset_update(appliers, 0, false);
  return;
  }
 
@@ -772,7 +789,7 @@ replicaset_connect(struct applier **appliers, int count,
 
  /* Now all the appliers are connected, update the replica set. */
  try {
- replicaset_update(appliers, count);
+ replicaset_update(appliers, count, keep_connect);
  } catch (Exception *e) {
  goto error;
  }
diff --git a/src/box/replication.h b/src/box/replication.h
index 4c82cd839..a8fed45e8 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -439,10 +439,12 @@ replicaset_add_anon(const struct tt_uuid *replica_uuid);
  * \param connect_quorum if this flag is set, fail unless at
  * least replication_connect_quorum
  * appliers have successfully connected.
+ * \param keep_connect if this flag is set do not force a reconnect if the
+ * old connection to the replica is fine.
  */
 void
 replicaset_connect(struct applier **appliers, int count,
- bool connect_quorum);
+ bool connect_quorum, bool keep_connect);
 
 /**
  * Check if the current instance fell too much behind its
diff --git a/test/replication-luatest/gh_4669_applier_reconnect_test.lua b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
new file mode 100644
index 000000000..9c6d26036
--- /dev/null
+++ b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
@@ -0,0 +1,53 @@
+local t = require('luatest')
+local cluster = require('test.luatest_helpers.cluster')
+local helpers = require('test.luatest_helpers')
+
+local g = t.group('gh-4669-applier-reconnect')
+
+local function check_follow_master(server)
+ return t.assert_equals(
+ server:eval('return box.info.replication[1].upstream.status'), 'follow')
+end
+
+g.before_each(function()
+ g.cluster = cluster:new({})
+ g.master = g.cluster:build_server({alias = 'master'})
+ local box_cfg = {
+ replication = ('%s/master.iproto'):format(helpers.SOCKET_DIR),
+ }
+ g.replica = g.cluster:build_server({alias = 'replica', box_cfg = box_cfg})
+ g.replica2 = g.cluster:build_server({alias = 'replica2', box_cfg = box_cfg})
+
+ g.cluster:add_server(g.master)
+ g.cluster:add_server(g.replica)
+ g.cluster:add_server(g.replica2)
+ g.cluster:start()
+ check_follow_master(g.replica)
+end)
+
+g.after_each(function()
+ g.cluster:stop()
+end)
+
+-- Test that appliers aren't recreated upon replication reconfiguration.
+-- Add and then remove two extra replicas to the configuration. The master
+-- connection should stay intact.
+g.test_applier_connection_on_reconfig = function(g)
+ g.replica:eval(([[
+ box.cfg{
+ replication = {
+ box.cfg.listen,
+ box.cfg.replication[1],
+ "%s/replica2.iproto",
+ }
+ }]]):format(helpers.SOCKET_DIR))
+ g.replica:eval([[
+ box.cfg{
+ replication = {
+ box.cfg.replication[2]
+ }
+ }
+ ]])
+ check_follow_master(g.replica)
+ t.assert_equals(g.master:grep_log("exiting the relay loop"), nil)
+end
--
2.30.1 (Apple Git-130)