<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi! Thank you for the review!<br class=""><br class="">I addressed your remarks. The new diff is below. Please also see my comments inline.<br class=""><br class=""><blockquote type="cite" class="">On Aug 7, 2018, at 20:28, Vladimir Davydov <<a href="mailto:vdavydov.dev@gmail.com" class="">vdavydov.dev@gmail.com</a>> wrote:<br class=""><br class="">On Tue, Aug 07, 2018 at 03:44:13PM +0300, Serge Petrenko wrote:<br class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">On bootstrap and after replication reconfiguration<br class="">replication_connect_quorum was ignored. The instance tried to connect to<br class="">every replica listed in replication parameter, and errored if it wasn't<br class="">possible.<br class="">The patch alters this behaviour. 
The instance still tries to connect to<br class="">every node listed in replication, but does not raise an error if it was<br class="">able to connect to at least replication_connect_quorum instances.<br class=""></blockquote><br class="">Please append a documentation request (@TarantoolBot) to the commit<br class="">message.<br class=""><br class=""></blockquote>Done.<br class=""><br class=""><blockquote type="cite" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br class="">Closes #3428<br class="">---<br class=""><a href="https://github.com/tarantool/tarantool/issues/3428" class="">https://github.com/tarantool/tarantool/issues/3428</a><br class="">https://github.com/tarantool/tarantool/tree/sergepetrenko/gh-3428-replication-connect-quorum<br class=""><br class="">Changes in v2:<br class=""> - change test/replication/ddl.lua instance file to fix<br class="">   test failure on Travis.<br class=""><br class="">src/box/box.cc                           | 6 +++---<br class="">src/box/replication.cc                   | 8 +++-----<br class="">src/box/replication.h                    | 7 ++++---<br class="">test/replication-py/init_storage.test.py | 2 +-<br class="">test/replication-py/master.lua           | 2 ++<br class="">test/replication-py/replica.lua          | 2 ++<br class="">test/replication/autobootstrap.lua       | 3 ++-<br class="">test/replication/autobootstrap_guest.lua | 2 +-<br class="">test/replication/ddl.lua                 | 3 ++-<br class="">test/replication/errinj.result           | 6 +++---<br class="">test/replication/errinj.test.lua         | 6 +++---<br class="">test/replication/master.lua             
 | 1 +<br class="">test/replication/master_quorum.lua       | 3 ++-<br class="">test/replication/on_replace.lua          | 3 ++-<br class="">test/replication/quorum.lua              | 4 ++--<br class="">test/replication/rebootstrap.lua         | 2 +-<br class="">test/replication/replica_no_quorum.lua   | 3 ++-<br class="">test/replication/replica_timeout.lua     | 3 ++-<br class="">test/replication/replica_uuid_ro.lua     | 2 +-<br class="">19 files changed, 39 insertions(+), 29 deletions(-)<br class=""><br class="">diff --git a/src/box/box.cc b/src/box/box.cc<br class="">index e3eb2738f..f8731f464 100644<br class="">--- a/src/box/box.cc<br class="">+++ b/src/box/box.cc<br class="">@@ -595,7 +595,7 @@ cfg_get_replication(int *p_count)<br class=""> * don't start appliers.<br class=""> */<br class="">static void<br class="">-box_sync_replication(double timeout, bool connect_all)<br class="">+box_sync_replication(double timeout, bool reach_quorum)<br class=""></blockquote><br class="">After this patch, 'timeout' always equals replication_connect_timeout<br class="">so you don't need to pass it explicitly anymore. Please remove it from<br class="">this function and from replicaset_connect.<br class=""></blockquote><br class="">Done.<br class=""><br class=""><blockquote type="cite" class=""><br class="">Also, I don't like 'reach_quorum' name. Would 'connect_quorum' sound<br class="">better? Or may be we should pass the minimal number of masters to<br class="">connect instead?<br class=""></blockquote><br class="">Let it be connect_quorum. I don’t like the idea to pass a minimal number of<br class="">replicas to connect. 
Looks like we would always pass either<br class="">replication_connect_quorum or count.<br class=""><br class=""><blockquote type="cite" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">{<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span>int count = 0;<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>struct applier **appliers = cfg_get_replication(&count);<br class="">@@ -607,7 +607,7 @@ box_sync_replication(double timeout, bool connect_all)<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>applier_delete(appliers[i]); /* doesn't affect diag */<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>});<br class=""><br class="">-<span class="Apple-tab-span" style="white-space:pre">      </span>replicaset_connect(appliers, count, timeout, connect_all);<br class="">+<span class="Apple-tab-span" style="white-space:pre">    </span>replicaset_connect(appliers, count, timeout, reach_quorum);<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre">       </span>guard.is_active = false;<br class="">}<br class="">@@ -1888,7 +1888,7 @@ box_cfg_xc(void)<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span> * receive the same replica set UUID when a new cluster<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span> * is 
deployed.<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span> */<br class="">-<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>box_sync_replication(TIMEOUT_INFINITY, true);<br class="">+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span>box_sync_replication(replication_connect_timeout, true);<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span>/* Bootstrap a new master */<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span>bootstrap(&replicaset_uuid, &is_bootstrap_leader);<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span>}<br class="">diff --git a/src/box/<a href="http://replication.cc" class="">replication.cc</a> b/src/box/<a href="http://replication.cc" class="">replication.cc</a><br class="">index 528fe4459..a6c60220f 100644<br class="">--- a/src/box/<a href="http://replication.cc" class="">replication.cc</a><br class="">+++ b/src/box/<a href="http://replication.cc" class="">replication.cc</a><br class="">@@ -46,7 +46,7 @@ struct tt_uuid INSTANCE_UUID;<br class="">struct tt_uuid REPLICASET_UUID;<br class=""><br class="">double replication_timeout = 1.0; /* seconds */<br class="">-double replication_connect_timeout = 4.0; /* seconds */<br class="">+double replication_connect_timeout = 10.0; /* seconds */<br class=""></blockquote><br class="">Why? 
BTW, this isn't enough - replication_connect_timeout is actually<br class="">set in load_cfg.lua (I've no idea why they differ).<br class=""></blockquote><br class=""><br class="">Changed the default value to 30 seconds, to match the one in load_cfg.lua.<br class="">I don’t see a reason for them to differ either.<br class=""><blockquote type="cite" class=""><br class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">int replication_connect_quorum = REPLICATION_CONNECT_QUORUM_ALL;<br class="">double replication_sync_lag = 10.0; /* seconds */<br class=""><br class="">@@ -540,7 +540,7 @@ applier_on_connect_f(struct trigger *trigger, void *event)<br class=""><br class="">void<br class="">replicaset_connect(struct applier **appliers, int count,<br class="">-<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>   double timeout, bool connect_all)<br class="">+<span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span>   double timeout, bool reach_quorum)<br class="">{<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>if (count == 0) {<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>/* Cleanup the replica set. 
*/<br class="">@@ -587,15 +587,13 @@ replicaset_connect(struct applier **appliers, int count,<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>double wait_start = ev_monotonic_now(loop());<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (fiber_cond_wait_timeout(&state.wakeup, timeout) != 0)<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>break;<br class="">-<span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (state.failed > 0 && connect_all)<br class="">-<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>break;<br class=""></blockquote><br class="">I guess you should break the loop if<br class=""><br class=""> state.failed > count - min_count_to_connect<br class=""></blockquote><br class="">AFAIU, if replication_timeout is less than replication_connect_timeout, the appliers, which have<br class="">failed, will have time to try and reconnect during replicaset_connect(). 
So failing here is essentially ignoring<br class="">replication_connect_timeout.<br class=""><br class=""><blockquote type="cite" class=""><br class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span>timeout -= ev_monotonic_now(loop()) - wait_start;<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span>}<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span>if (state.connected < count) {<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>say_crit("failed to connect to %d out of %d replicas",<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> count - state.connected, count);<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span>/* Timeout or connection failure. 
*/<br class="">-<span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (connect_all)<br class="">+<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (reach_quorum && state.connected < replication_connect_quorum)<br class=""></blockquote><br class="">replication_connect_quorum can be greater than the number of configured<br class="">replicas. I think you should use MIN(count, replication_connect_quorum).<br class=""></blockquote><br class="">Fixed.<br class=""><br class=""><blockquote type="cite" class=""><br class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>goto error;<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>} else {<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span>say_verbose("connected to %d replicas", state.connected);<br class="">diff --git a/src/box/replication.h b/src/box/replication.h<br class="">index 95122eb45..c16c8b56c 100644<br class="">--- a/src/box/replication.h<br class="">+++ b/src/box/replication.h<br class="">@@ -357,12 +357,13 @@ replicaset_add(uint32_t replica_id, const struct tt_uuid *instance_uuid);<br class=""> * \param appliers the array of appliers<br class=""> * \param count size of appliers array<br class=""> * \param timeout 
connection timeout<br class="">- * \param connect_all if this flag is set, fail unless all<br class="">- *                    appliers have successfully connected<br class="">+ * \param reach_quorum if this flag is set, fail unless at<br class="">+ *<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>       least replication_connect_quorum<br class="">+ *<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span>       appliers have successfully connected.<br class=""> */<br class="">void<br class="">replicaset_connect(struct applier **appliers, int count,<br class="">-<span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span>   double timeout, bool connect_all);<br class="">+<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span>   double timeout, bool reach_quorum);<br class=""><br class="">/**<br class=""> * Resume all appliers registered with the replica set.<br class="">diff --git a/test/replication-py/init_storage.test.py b/test/replication-py/init_storage.test.py<br class="">index 0911a02c0..32b4639f1 100644<br class="">--- a/test/replication-py/init_storage.test.py<br class="">+++ b/test/replication-py/init_storage.test.py<br class="">@@ -57,7 +57,7 @@ print '-------------------------------------------------------------'<br class=""><br class="">server.stop()<br class="">replica = TarantoolServer(server.ini)<br class="">-replica.script = 'replication/replica.lua'<br class="">+replica.script = 'replication-py/replica.lua'<br class="">replica.vardir = server.vardir #os.path.join(server.vardir, 'replica')<br class="">replica.rpl_master = master<br class="">replica.deploy(wait=False)<br class="">diff --git a/test/replication-py/master.lua b/test/replication-py/master.lua<br 
class="">index 0f9f7a6f0..51283efdf 100644<br class="">--- a/test/replication-py/master.lua<br class="">+++ b/test/replication-py/master.lua<br class="">@@ -3,6 +3,8 @@ os = require('os')<br class="">box.cfg({<br class="">    listen              = os.getenv("LISTEN"),<br class="">    memtx_memory        = 107374182,<br class="">+    replication_connect_timeout = 1.0,<br class="">+    replication_timeout = 0.3<br class=""></blockquote><br class="">Why do you need to adjust the timeouts?<br class=""></blockquote><br class="">The timeouts set in all cfg files in all the tests had no effect, just like the<br class="">replication_connect_quorum option, cos both of the options were ignored<br class="">During bootstrap. We wanted every replica to connect, and the timeout was<br class="">Set to TIMEOUT_INFINITY. Now when we actually start passing<br class="">replication_connect_timeout, all these timeouts become too small.<br class=""><br class=""><blockquote type="cite" class=""><br class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">})<br class=""><br class="">require('console').listen(os.getenv('ADMIN'))<br class="">diff --git a/test/replication-py/replica.lua b/test/replication-py/replica.lua<br class="">index 278291bba..b9d193b70 100644<br class="">--- a/test/replication-py/replica.lua<br class="">+++ b/test/replication-py/replica.lua<br class="">@@ -7,6 +7,8 @@ box.cfg({<br class="">    listen              = os.getenv("LISTEN"),<br class="">    replication         = os.getenv("MASTER"),<br class="">    memtx_memory        = 107374182,<br class="">+    replication_connect_timeout = 1.0,<br class="">+    replication_timeout = 
0.3<br class="">})<br class=""><br class="">box_cfg_done = true<br class="">diff --git a/test/replication/autobootstrap.lua b/test/replication/autobootstrap.lua<br class="">index 4f55417ae..8fc6809de 100644<br class="">--- a/test/replication/autobootstrap.lua<br class="">+++ b/test/replication/autobootstrap.lua<br class="">@@ -21,7 +21,8 @@ box.cfg({<br class="">        USER..':'..PASSWORD..'@'..instance_uri(2);<br class="">        USER..':'..PASSWORD..'@'..instance_uri(3);<br class="">    };<br class="">-    replication_connect_timeout = 0.5,<br class="">+    replication_connect_timeout = 3.0;<br class="">+    replication_timeout = 0.5;<br class="">})<br class=""><br class="">box.once("bootstrap", function()<br class="">diff --git a/test/replication/autobootstrap_guest.lua b/test/replication/autobootstrap_guest.lua<br class="">index 40fef2c7a..7cd921e3c 100644<br class="">--- a/test/replication/autobootstrap_guest.lua<br class="">+++ b/test/replication/autobootstrap_guest.lua<br class="">@@ -20,7 +20,7 @@ box.cfg({<br class="">        instance_uri(2);<br class="">        instance_uri(3);<br class="">    };<br class="">-    replication_connect_timeout = 0.5,<br class="">+    replication_connect_timeout = 5,<br class=""></blockquote><br class="">Why do you use different timeouts in different tests?<br class=""></blockquote><br class="">I tried to find the lowest possible boundary in every test. 
It just so happens that they differ.<br class="">I believe any finite timeout is better than an infinite one.<br class=""><br class="">Here’s the new diff:<br class=""><br class=""> src/box/<a href="http://box.cc" class="">box.cc</a>                           | 10 +++++-----<br class=""> src/box/<a href="http://replication.cc" class="">replication.cc</a>                   | 13 +++++++------<br class=""> src/box/replication.h                    |  8 ++++----<br class=""> test/replication-py/init_storage.test.py |  2 +-<br class=""> test/replication-py/master.lua           |  2 ++<br class=""> test/replication-py/replica.lua          |  2 ++<br class=""> test/replication/autobootstrap.lua       |  3 ++-<br class=""> test/replication/autobootstrap_guest.lua |  2 +-<br class=""> test/replication/ddl.lua                 |  3 ++-<br class=""> test/replication/errinj.result           |  6 +++---<br class=""> test/replication/errinj.test.lua         |  6 +++---<br class=""> test/replication/master.lua              |  1 +<br class=""> test/replication/master_quorum.lua       |  3 ++-<br class=""> test/replication/on_replace.lua          |  3 ++-<br class=""> test/replication/quorum.lua              |  4 ++--<br class=""> test/replication/rebootstrap.lua         |  2 +-<br class=""> test/replication/replica_no_quorum.lua   |  3 ++-<br class=""> test/replication/replica_timeout.lua     |  3 ++-<br class=""> test/replication/replica_uuid_ro.lua     |  2 +-<br class=""> 19 files changed, 45 insertions(+), 33 deletions(-)<br class=""><br class="">diff --git a/src/box/<a href="http://box.cc" class="">box.cc</a> b/src/box/<a href="http://box.cc" class="">box.cc</a><br class="">index e3eb2738f..8cf43a6ad 100644<br class="">--- a/src/box/<a href="http://box.cc" class="">box.cc</a><br class="">+++ b/src/box/<a href="http://box.cc" class="">box.cc</a><br class="">@@ -595,7 +595,7 @@ cfg_get_replication(int *p_count)<br class="">  * don't start appliers.<br class="">  */<br class=""> 
static void<br class="">-box_sync_replication(double timeout, bool connect_all)<br class="">+box_sync_replication(bool connect_quorum)<br class=""> {<br class=""> <span class="Apple-tab-span" style="white-space:pre">      </span>int count = 0;<br class=""> <span class="Apple-tab-span" style="white-space:pre">   </span>struct applier **appliers = cfg_get_replication(&count);<br class="">@@ -607,7 +607,7 @@ box_sync_replication(double timeout, bool connect_all)<br class=""> <span class="Apple-tab-span" style="white-space:pre">                      </span>applier_delete(appliers[i]); /* doesn't affect diag */<br class=""> <span class="Apple-tab-span" style="white-space:pre">   </span>});<br class=""> <br class="">-<span class="Apple-tab-span" style="white-space:pre">        </span>replicaset_connect(appliers, count, timeout, connect_all);<br class="">+<span class="Apple-tab-span" style="white-space:pre">    </span>replicaset_connect(appliers, count, connect_quorum);<br class=""> <br class=""> <span class="Apple-tab-span" style="white-space:pre">  </span>guard.is_active = false;<br class=""> }<br class="">@@ -626,7 +626,7 @@ box_set_replication(void)<br class=""> <br class=""> <span class="Apple-tab-span" style="white-space:pre">        </span>box_check_replication();<br class=""> <span class="Apple-tab-span" style="white-space:pre"> </span>/* Try to connect to all replicas within the timeout period */<br class="">-<span class="Apple-tab-span" style="white-space:pre">        </span>box_sync_replication(replication_connect_timeout, true);<br class="">+<span class="Apple-tab-span" style="white-space:pre">      </span>box_sync_replication(true);<br class=""> <span class="Apple-tab-span" style="white-space:pre">      </span>/* Follow replica */<br class=""> <span class="Apple-tab-span" style="white-space:pre">     </span>replicaset_follow();<br class=""> }<br class="">@@ -1866,7 +1866,7 @@ box_cfg_xc(void)<br class=""> <span class="Apple-tab-span" style="white-space:pre">  
            </span>title("orphan");<br class=""> <br class=""> <span class="Apple-tab-span" style="white-space:pre">            </span>/* Wait for the cluster to start up */<br class="">-<span class="Apple-tab-span" style="white-space:pre">                </span>box_sync_replication(replication_connect_timeout, false);<br class="">+<span class="Apple-tab-span" style="white-space:pre">             </span>box_sync_replication(false);<br class=""> <span class="Apple-tab-span" style="white-space:pre">     </span>} else {<br class=""> <span class="Apple-tab-span" style="white-space:pre">         </span>if (!tt_uuid_is_nil(&instance_uuid))<br class=""> <span class="Apple-tab-span" style="white-space:pre">                 </span>INSTANCE_UUID = instance_uuid;<br class="">@@ -1888,7 +1888,7 @@ box_cfg_xc(void)<br class=""> <span class="Apple-tab-span" style="white-space:pre">                </span> * receive the same replica set UUID when a new cluster<br class=""> <span class="Apple-tab-span" style="white-space:pre">             </span> * is deployed.<br class=""> <span class="Apple-tab-span" style="white-space:pre">             </span> */<br class="">-<span class="Apple-tab-span" style="white-space:pre">              </span>box_sync_replication(TIMEOUT_INFINITY, true);<br class="">+<span class="Apple-tab-span" style="white-space:pre">         </span>box_sync_replication(true);<br class=""> <span class="Apple-tab-span" style="white-space:pre">              </span>/* Bootstrap a new master */<br class=""> <span class="Apple-tab-span" style="white-space:pre">             </span>bootstrap(&replicaset_uuid, &is_bootstrap_leader);<br class=""> <span class="Apple-tab-span" style="white-space:pre">       </span>}<br class="">diff --git a/src/box/<a href="http://replication.cc" class="">replication.cc</a> b/src/box/<a href="http://replication.cc" class="">replication.cc</a><br class="">index 528fe4459..fa3b6afb4 100644<br class="">--- a/src/box/<a href="http://replication.cc" 
class="">replication.cc</a><br class="">+++ b/src/box/<a href="http://replication.cc" class="">replication.cc</a><br class="">@@ -46,7 +46,7 @@ struct tt_uuid INSTANCE_UUID;<br class=""> struct tt_uuid REPLICASET_UUID;<br class=""> <br class=""> double replication_timeout = 1.0; /* seconds */<br class="">-double replication_connect_timeout = 4.0; /* seconds */<br class="">+double replication_connect_timeout = 30.0; /* seconds */<br class=""> int replication_connect_quorum = REPLICATION_CONNECT_QUORUM_ALL;<br class=""> double replication_sync_lag = 10.0; /* seconds */<br class=""> <br class="">@@ -540,7 +540,7 @@ applier_on_connect_f(struct trigger *trigger, void *event)<br class=""> <br class=""> void<br class=""> replicaset_connect(struct applier **appliers, int count,<br class="">-<span class="Apple-tab-span" style="white-space:pre">                </span>   double timeout, bool connect_all)<br class="">+<span class="Apple-tab-span" style="white-space:pre">           </span>   bool connect_quorum)<br class=""> {<br class=""> <span class="Apple-tab-span" style="white-space:pre">       </span>if (count == 0) {<br class=""> <span class="Apple-tab-span" style="white-space:pre">                </span>/* Cleanup the replica set. 
*/<br class="">@@ -557,7 +557,7 @@ replicaset_connect(struct applier **appliers, int count,<br class=""> <span class="Apple-tab-span" style="white-space:pre">  </span> * - register a trigger in each applier to wake up our<br class=""> <span class="Apple-tab-span" style="white-space:pre">      </span> *   fiber via this channel when the remote peer becomes<br class=""> <span class="Apple-tab-span" style="white-space:pre">     </span> *   connected and a UUID is received;<br class="">-<span class="Apple-tab-span" style="white-space:pre">    </span> * - wait up to CONNECT_TIMEOUT seconds for `count` messages;<br class="">+<span class="Apple-tab-span" style="white-space:pre">    </span> * - wait up to REPLICATION_CONNECT_TIMEOUT seconds for `count` messages;<br class=""> <span class="Apple-tab-span" style="white-space:pre">   </span> * - on timeout, raise a CFG error, cancel and destroy<br class=""> <span class="Apple-tab-span" style="white-space:pre">      </span> *   the freshly created appliers (done in a guard);<br class=""> <span class="Apple-tab-span" style="white-space:pre"> </span> * - an success, unregister the trigger, check the UUID set<br class="">@@ -571,6 +571,8 @@ replicaset_connect(struct applier **appliers, int count,<br class=""> <span class="Apple-tab-span" style="white-space:pre">        </span>state.connected = state.failed = 0;<br class=""> <span class="Apple-tab-span" style="white-space:pre">      </span>fiber_cond_create(&state.wakeup);<br class=""> <br class="">+<span class="Apple-tab-span" style="white-space:pre">      </span>double timeout = replication_connect_timeout;<br class="">+<br class=""> <span class="Apple-tab-span" style="white-space:pre">      </span>/* Add triggers and start simulations connection to remote peers */<br class=""> <span class="Apple-tab-span" style="white-space:pre">      </span>for (int i = 0; i < count; i++) {<br class=""> <span class="Apple-tab-span" style="white-space:pre">             </span>struct applier 
*applier = appliers[i];<br class="">@@ -587,15 +589,14 @@ replicaset_connect(struct applier **appliers, int count,<br class=""> <span class="Apple-tab-span" style="white-space:pre">                </span>double wait_start = ev_monotonic_now(loop());<br class=""> <span class="Apple-tab-span" style="white-space:pre">            </span>if (fiber_cond_wait_timeout(&state.wakeup, timeout) != 0)<br class=""> <span class="Apple-tab-span" style="white-space:pre">                    </span>break;<br class="">-<span class="Apple-tab-span" style="white-space:pre">                </span>if (state.failed > 0 && connect_all)<br class="">-<span class="Apple-tab-span" style="white-space:pre">                       </span>break;<br class=""> <span class="Apple-tab-span" style="white-space:pre">           </span>timeout -= ev_monotonic_now(loop()) - wait_start;<br class=""> <span class="Apple-tab-span" style="white-space:pre">        </span>}<br class=""> <span class="Apple-tab-span" style="white-space:pre">        </span>if (state.connected < count) {<br class=""> <span class="Apple-tab-span" style="white-space:pre">                </span>say_crit("failed to connect to %d out of %d replicas",<br class=""> <span class="Apple-tab-span" style="white-space:pre">                 </span> count - state.connected, count);<br class=""> <span class="Apple-tab-span" style="white-space:pre">           </span>/* Timeout or connection failure. 
*/<br class="">-<span class="Apple-tab-span" style="white-space:pre">          </span>if (connect_all)<br class="">+<span class="Apple-tab-span" style="white-space:pre">              </span>if (connect_quorum && state.connected <<br class="">+<span class="Apple-tab-span" style="white-space:pre">            </span>    MIN(count, replication_connect_quorum))<br class=""> <span class="Apple-tab-span" style="white-space:pre">                       </span>goto error;<br class=""> <span class="Apple-tab-span" style="white-space:pre">      </span>} else {<br class=""> <span class="Apple-tab-span" style="white-space:pre">         </span>say_verbose("connected to %d replicas", state.connected);<br class="">diff --git a/src/box/replication.h b/src/box/replication.h<br class="">index 95122eb45..9ce9910f8 100644<br class="">--- a/src/box/replication.h<br class="">+++ b/src/box/replication.h<br class="">@@ -356,13 +356,13 @@ replicaset_add(uint32_t replica_id, const struct tt_uuid *instance_uuid);<br class="">  *<br class="">  * \param appliers the array of appliers<br class="">  * \param count size of appliers array<br class="">- * \param timeout connection timeout<br class="">- * \param connect_all if this flag is set, fail unless all<br class="">- *                    appliers have successfully connected<br class="">+ * \param connect_quorum if this flag is set, fail unless at<br class="">+ *<span class="Apple-tab-span" style="white-space:pre">             </span>       least replication_connect_quorum<br class="">+ *<span class="Apple-tab-span" style="white-space:pre">            </span>       appliers have successfully connected.<br class="">  */<br class=""> void<br class=""> replicaset_connect(struct applier **appliers, int count,<br class="">-<span class="Apple-tab-span" style="white-space:pre">            </span>   double timeout, bool connect_all);<br class="">+<span class="Apple-tab-span" style="white-space:pre">          </span>   bool connect_quorum);<br class=""> 
<br class=""> /**<br class="">  * Resume all appliers registered with the replica set.<br class="">diff --git a/test/replication-py/init_storage.test.py b/test/replication-py/init_storage.test.py<br class="">index 0911a02c0..32b4639f1 100644<br class="">--- a/test/replication-py/init_storage.test.py<br class="">+++ b/test/replication-py/init_storage.test.py<br class="">@@ -57,7 +57,7 @@ print '-------------------------------------------------------------'<br class=""> <br class=""> server.stop()<br class=""> replica = TarantoolServer(server.ini)<br class="">-replica.script = 'replication/replica.lua'<br class="">+replica.script = 'replication-py/replica.lua'<br class=""> replica.vardir = server.vardir #os.path.join(server.vardir, 'replica')<br class=""> replica.rpl_master = master<br class=""> replica.deploy(wait=False)<br class="">diff --git a/test/replication-py/master.lua b/test/replication-py/master.lua<br class="">index 0f9f7a6f0..51283efdf 100644<br class="">--- a/test/replication-py/master.lua<br class="">+++ b/test/replication-py/master.lua<br class="">@@ -3,6 +3,8 @@ os = require('os')<br class=""> box.cfg({<br class="">     listen              = os.getenv("LISTEN"),<br class="">     memtx_memory        = 107374182,<br class="">+    replication_connect_timeout = 1.0,<br class="">+    replication_timeout = 0.3<br class=""> })<br class=""> <br class=""> require('console').listen(os.getenv('ADMIN'))<br class="">diff --git a/test/replication-py/replica.lua b/test/replication-py/replica.lua<br class="">index 278291bba..b9d193b70 100644<br class="">--- a/test/replication-py/replica.lua<br class="">+++ b/test/replication-py/replica.lua<br class="">@@ -7,6 +7,8 @@ box.cfg({<br class="">     listen              = os.getenv("LISTEN"),<br class="">     replication         = os.getenv("MASTER"),<br class="">     memtx_memory        = 107374182,<br class="">+    replication_connect_timeout = 1.0,<br class="">+    replication_timeout = 0.3<br class=""> })<br class=""> 
<br class=""> box_cfg_done = true<br class="">diff --git a/test/replication/autobootstrap.lua b/test/replication/autobootstrap.lua<br class="">index 4f55417ae..8fc6809de 100644<br class="">--- a/test/replication/autobootstrap.lua<br class="">+++ b/test/replication/autobootstrap.lua<br class="">@@ -21,7 +21,8 @@ box.cfg({<br class="">         USER..':'..PASSWORD..'@'..instance_uri(2);<br class="">         USER..':'..PASSWORD..'@'..instance_uri(3);<br class="">     };<br class="">-    replication_connect_timeout = 0.5,<br class="">+    replication_connect_timeout = 3.0;<br class="">+    replication_timeout = 0.5;<br class=""> })<br class=""> <br class=""> box.once("bootstrap", function()<br class="">diff --git a/test/replication/autobootstrap_guest.lua b/test/replication/autobootstrap_guest.lua<br class="">index 40fef2c7a..7cd921e3c 100644<br class="">--- a/test/replication/autobootstrap_guest.lua<br class="">+++ b/test/replication/autobootstrap_guest.lua<br class="">@@ -20,7 +20,7 @@ box.cfg({<br class="">         instance_uri(2);<br class="">         instance_uri(3);<br class="">     };<br class="">-    replication_connect_timeout = 0.5,<br class="">+    replication_connect_timeout = 5,<br class=""> })<br class=""> <br class=""> box.once("bootstrap", function()<br class="">diff --git a/test/replication/ddl.lua b/test/replication/ddl.lua<br class="">index 694f40eac..85403e35b 100644<br class="">--- a/test/replication/ddl.lua<br class="">+++ b/test/replication/ddl.lua<br class="">@@ -22,7 +22,8 @@ box.cfg({<br class="">         USER..':'..PASSWORD..'@'..instance_uri(3);<br class="">         USER..':'..PASSWORD..'@'..instance_uri(4);<br class="">     };<br class="">-    replication_connect_timeout = 0.5,<br class="">+    replication_timeout = 0.1,<br class="">+    replication_connect_timeout = 2.0,<br class=""> })<br class=""> <br class=""> box.once("bootstrap", function()<br class="">diff --git a/test/replication/errinj.result b/test/replication/errinj.result<br 
class="">index ca8af2988..19d7d9a05 100644<br class="">--- a/test/replication/errinj.result<br class="">+++ b/test/replication/errinj.result<br class="">@@ -418,7 +418,7 @@ test_run:cmd("create server replica_timeout with rpl_master=default, script='rep<br class=""> ---<br class=""> - true<br class=""> ...<br class="">-test_run:cmd("start server replica_timeout with args='0.01'")<br class="">+test_run:cmd("start server replica_timeout with args='0.1, 0.5'")<br class=""> ---<br class=""> - true<br class=""> ...<br class="">@@ -474,7 +474,7 @@ errinj.set("ERRINJ_RELAY_REPORT_INTERVAL", 0)<br class=""> ...<br class=""> -- Check replica's ACKs don't prevent the master from sending<br class=""> -- heartbeat messages (gh-3160).<br class="">-test_run:cmd("start server replica_timeout with args='0.009'")<br class="">+test_run:cmd("start server replica_timeout with args='0.009, 0.5'")<br class=""> ---<br class=""> - true<br class=""> ...<br class="">@@ -522,7 +522,7 @@ for i = 0, 9999 do box.space.test:replace({i, 4, 5, 'test'}) end<br class=""> -- during the join stage, i.e. 
a replica with a minuscule<br class=""> -- timeout successfully bootstraps and breaks connection only<br class=""> -- after subscribe.<br class="">-test_run:cmd("start server replica_timeout with args='0.00001'")<br class="">+test_run:cmd("start server replica_timeout with args='0.00001, 0.5'")<br class=""> ---<br class=""> - true<br class=""> ...<br class="">diff --git a/test/replication/errinj.test.lua b/test/replication/errinj.test.lua<br class="">index 463d89a8f..f00b98eed 100644<br class="">--- a/test/replication/errinj.test.lua<br class="">+++ b/test/replication/errinj.test.lua<br class="">@@ -173,7 +173,7 @@ errinj.set("ERRINJ_RELAY_EXIT_DELAY", 0)<br class=""> box.cfg{replication_timeout = 0.01}<br class=""> <br class=""> test_run:cmd("create server replica_timeout with rpl_master=default, script='replication/replica_timeout.lua'")<br class="">-test_run:cmd("start server replica_timeout with args='0.01'")<br class="">+test_run:cmd("start server replica_timeout with args='0.1, 0.5'")<br class=""> test_run:cmd("switch replica_timeout")<br class=""> <br class=""> fiber = require('fiber')<br class="">@@ -199,7 +199,7 @@ errinj.set("ERRINJ_RELAY_REPORT_INTERVAL", 0)<br class=""> -- Check replica's ACKs don't prevent the master from sending<br class=""> -- heartbeat messages (gh-3160).<br class=""> <br class="">-test_run:cmd("start server replica_timeout with args='0.009'")<br class="">+test_run:cmd("start server replica_timeout with args='0.009, 0.5'")<br class=""> test_run:cmd("switch replica_timeout")<br class=""> <br class=""> fiber = require('fiber')<br class="">@@ -219,7 +219,7 @@ for i = 0, 9999 do box.space.test:replace({i, 4, 5, 'test'}) end<br class=""> -- during the join stage, i.e. 
a replica with a minuscule<br class=""> -- timeout successfully bootstraps and breaks connection only<br class=""> -- after subscribe.<br class="">-test_run:cmd("start server replica_timeout with args='0.00001'")<br class="">+test_run:cmd("start server replica_timeout with args='0.00001, 0.5'")<br class=""> test_run:cmd("switch replica_timeout")<br class=""> fiber = require('fiber')<br class=""> while box.info.replication[1].upstream.message ~= 'timed out' do fiber.sleep(0.0001) end<br class="">diff --git a/test/replication/master.lua b/test/replication/master.lua<br class="">index 6d431aaeb..9b96b7891 100644<br class="">--- a/test/replication/master.lua<br class="">+++ b/test/replication/master.lua<br class="">@@ -4,6 +4,7 @@ box.cfg({<br class="">     listen              = os.getenv("LISTEN"),<br class="">     memtx_memory        = 107374182,<br class="">     replication_connect_timeout = 0.5,<br class="">+    replication_timeout = 0.1<br class=""> })<br class=""> <br class=""> require('console').listen(os.getenv('ADMIN'))<br class="">diff --git a/test/replication/master_quorum.lua b/test/replication/master_quorum.lua<br class="">index fb5f7ec2b..6e0429f65 100644<br class="">--- a/test/replication/master_quorum.lua<br class="">+++ b/test/replication/master_quorum.lua<br class="">@@ -20,7 +20,8 @@ box.cfg({<br class="">         instance_uri(2);<br class="">     };<br class="">     replication_connect_quorum = 0;<br class="">-    replication_connect_timeout = 0.1;<br class="">+    replication_timeout = 0.5;<br class="">+    replication_connect_timeout = 2.0;<br class=""> })<br class=""> <br class=""> test_run = require('test_run').new()<br class="">diff --git a/test/replication/on_replace.lua b/test/replication/on_replace.lua<br class="">index 03f15d94c..bafead48d 100644<br class="">--- a/test/replication/on_replace.lua<br class="">+++ b/test/replication/on_replace.lua<br class="">@@ -20,7 +20,8 @@ box.cfg({<br class="">         
USER..':'..PASSWORD..'@'..instance_uri(1);<br class="">         USER..':'..PASSWORD..'@'..instance_uri(2);<br class="">     };<br class="">-    replication_connect_timeout = 0.5,<br class="">+    replication_timeout = 0.5,<br class="">+    replication_connect_timeout = 1.0,<br class=""> })<br class=""> <br class=""> env = require('test_run')<br class="">diff --git a/test/replication/quorum.lua b/test/replication/quorum.lua<br class="">index 9c7bf5c93..7f85d7b13 100644<br class="">--- a/test/replication/quorum.lua<br class="">+++ b/test/replication/quorum.lua<br class="">@@ -15,8 +15,8 @@ require('console').listen(os.getenv('ADMIN'))<br class=""> box.cfg({<br class="">     listen = instance_uri(INSTANCE_ID);<br class="">     replication_timeout = 0.05;<br class="">-    replication_sync_lag = 0.01;<br class="">-    replication_connect_timeout = 0.1;<br class="">+    replication_sync_lag = 0.1;<br class="">+    replication_connect_timeout = 3.0;<br class="">     replication_connect_quorum = 3;<br class="">     replication = {<br class="">         instance_uri(1);<br class="">diff --git a/test/replication/rebootstrap.lua b/test/replication/rebootstrap.lua<br class="">index e743577e4..f1e8d69e9 100644<br class="">--- a/test/replication/rebootstrap.lua<br class="">+++ b/test/replication/rebootstrap.lua<br class="">@@ -15,7 +15,7 @@ box.cfg({<br class="">     listen = instance_uri(INSTANCE_ID),<br class="">     instance_uuid = '12345678-abcd-1234-abcd-123456789ef' .. 
INSTANCE_ID,<br class="">     replication_timeout = 0.1,<br class="">-    replication_connect_timeout = 0.5,<br class="">+    replication_connect_timeout = 2.0,<br class="">     replication = {<br class="">         instance_uri(1);<br class="">         instance_uri(2);<br class="">diff --git a/test/replication/replica_no_quorum.lua b/test/replication/replica_no_quorum.lua<br class="">index b9edeea94..c30c043cc 100644<br class="">--- a/test/replication/replica_no_quorum.lua<br class="">+++ b/test/replication/replica_no_quorum.lua<br class="">@@ -5,7 +5,8 @@ box.cfg({<br class="">     replication         = os.getenv("MASTER"),<br class="">     memtx_memory        = 107374182,<br class="">     replication_connect_quorum = 0,<br class="">-    replication_connect_timeout = 0.1,<br class="">+    replication_timeout = 0.1,<br class="">+    replication_connect_timeout = 0.5,<br class=""> })<br class=""> <br class=""> require('console').listen(os.getenv('ADMIN'))<br class="">diff --git a/test/replication/replica_timeout.lua b/test/replication/replica_timeout.lua<br class="">index 64f119763..51c718360 100644<br class="">--- a/test/replication/replica_timeout.lua<br class="">+++ b/test/replication/replica_timeout.lua<br class="">@@ -1,13 +1,14 @@<br class=""> #!/usr/bin/env tarantool<br class=""> <br class=""> local TIMEOUT = tonumber(arg[1])<br class="">+local CON_TIMEOUT = arg[2] and tonumber(arg[2]) or TIMEOUT * 3<br class=""> <br class=""> box.cfg({<br class="">     listen              = os.getenv("LISTEN"),<br class="">     replication         = os.getenv("MASTER"),<br class="">     memtx_memory        = 107374182,<br class="">     replication_timeout = TIMEOUT,<br class="">-    replication_connect_timeout = TIMEOUT * 3,<br class="">+    replication_connect_timeout = CON_TIMEOUT,<br class=""> })<br class=""> <br class=""> require('console').listen(os.getenv('ADMIN'))<br class="">diff --git a/test/replication/replica_uuid_ro.lua b/test/replication/replica_uuid_ro.lua<br 
class="">index 8e1c6cc47..ff70da144 100644<br class="">--- a/test/replication/replica_uuid_ro.lua<br class="">+++ b/test/replication/replica_uuid_ro.lua<br class="">@@ -22,7 +22,7 @@ box.cfg({<br class="">         USER..':'..PASSWORD..'@'..instance_uri(2);<br class="">     };<br class="">     read_only = (INSTANCE_ID ~= '1' and true or false);<br class="">-    replication_connect_timeout = 0.5,<br class="">+    replication_connect_timeout = 5,<br class=""> })<br class=""> <br class=""> box.once("bootstrap", function()<br class="">-- <br class="">2.15.2 (Apple Git-101.1)<br class=""><br class=""></body></html>