<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <b style="font-weight:normal;"
      id="docs-internal-guid-fbc296c2-7fff-f3c3-d05b-cf1e7064ad4b">
      <p dir="ltr"
        style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Hi! Thanks for the review. I will fix most of the comments and observations.</span></p>
      <p dir="ltr"
        style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">However, I guess I need to know that we are on the same page.</span></p>
      <p dir="ltr"
        style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">When we do </span><span style="font-size:9pt;font-family:Verdana;color:#24292e;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">box.cfg{replication={44441, 44441}}</span></p>
      <p dir="ltr"
        style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">we have two replicas, each with its own applier, etc. In </span><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">replicaset_update()</span><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;"> we are able to</span></p>
      <p dir="ltr"
        style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">identify that the replicas are the same. At this point we raise an exception. The problem occurs when we try to delete the second one. For proper deletion we need to stop the applier, clear it, and then delete the replica. As I understand it, we need to:</span></p>
      <p dir="ltr"
        style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">applier_stop(replica->applier);</span></p>
      <p dir="ltr"
        style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">replica_clear_applier(replica);</span><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">
</span><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">replica_delete(replica);</span></p>
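      <p dir="ltr"
        style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">To make sure we mean the same order, here is a minimal sketch of it. The types are stand-ins, not the real applier and replica structs; the listener field, the trace vector and drop_duplicate_replica() are made up for illustration only. It shows just one thing: if the applier is stopped while the replica is still attached, the replica's state trigger sees the stop.</span></p>
      <pre>#include &lt;cassert&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

/* Stand-in types for illustration only, not the real structs. */
enum applier_state { APPLIER_CONNECTED, APPLIER_OFF };

struct replica;

struct applier {
    enum applier_state state = APPLIER_CONNECTED;
    /* Replica subscribed to state changes, if any (hypothetical field). */
    struct replica *listener = nullptr;
};

struct replica {
    struct applier *applier = nullptr;
};

/* Records what happened, in order. */
static std::vector&lt;std::string&gt; trace;

static void
applier_stop(struct applier *applier)
{
    applier-&gt;state = APPLIER_OFF;
    /* If a replica is still attached, it sees APPLIER_OFF. */
    if (applier-&gt;listener != nullptr)
        trace.push_back("trigger: APPLIER_OFF");
    trace.push_back("applier_stop");
}

static void
replica_clear_applier(struct replica *replica)
{
    /* Unsubscribe from the applier, then detach it. */
    replica-&gt;applier-&gt;listener = nullptr;
    replica-&gt;applier = nullptr;
    trace.push_back("replica_clear_applier");
}

static void
replica_delete(struct replica *replica)
{
    /* A replica must not own an applier when it is freed. */
    assert(replica-&gt;applier == nullptr);
    delete replica;
    trace.push_back("replica_delete");
}

/* The order from this mail: stop, detach, delete. */
static void
drop_duplicate_replica(struct replica *replica)
{
    applier_stop(replica-&gt;applier);
    replica_clear_applier(replica);
    replica_delete(replica);
}</pre>
      <p dir="ltr"
        style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">In this sketch, calling drop_duplicate_replica() on an attached replica records the trigger entry first, then the three calls. If the replica were detached before the applier is stopped, the trigger entry would not appear at all.</span></p>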
      <p dir="ltr"
        style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">The reason I added replica_on_applier_off(replica) is that when an applier is stopped, its state is set to APPLIER_OFF. The state-change trigger reacts to this change in </span></p>
      <p dir="ltr"
        style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">replica_on_applier_state_f. That leads us to replica_on_applier_sync. Instead we should react to APPLIER_DISCONNECTED. And the only way we react to this state is to try to load the applier again. </span></p>
      <p dir="ltr"
        style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">So </span><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">replica_on_applier_off </span><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">is used when we want to stop the applier for good, before the replica is deleted. I think we do need this function, maybe in some other form than the one I wrote. </span></p>
      <p dir="ltr"
        style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">The other remarks (where things are placed, the code comments, the test) I will definitely fix. But I need to know that the way I see the problem and the way I am fixing it are correct. So let’s discuss it.</span></p>
    </b><br>
    <div class="moz-cite-prefix">17/09/2018 18:05, Vladimir Davydov
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:20180917150512.fizr47fac4zks26l@esperanza">
      <pre wrap="">On Tue, Sep 11, 2018 at 10:11:05AM +0300, Olga Arkhangelskaia wrote:
</pre>
      <blockquote type="cite">
        <pre wrap="">In case of duplicated connection to the same master we are not able
to check it at the very beginning due to quorum option. So we allow such
configuration to proceed till it is obvious. If it the initial
configuration - replica won't start and in case of configuration change
we will be notified about duplicated replication source.
</pre>
      </blockquote>
      <pre wrap="">
This comment isn't exactly helpful: this is how duplicate connections
are supposed to be handled. It'd be nice to read what exactly you're
fixing in this patch.

Nit: please use 'duplicate' instead of 'duplicated'.

</pre>
      <blockquote type="cite">
        <pre wrap="">---
<a class="moz-txt-link-freetext" href="https://github.com/tarantool/tarantool/issues/3610">https://github.com/tarantool/tarantool/issues/3610</a>
<a class="moz-txt-link-freetext" href="https://github.com/tarantool/tarantool/tree/OKriw/gh-3610-assert_fail_when_connect_master_twice-1.10">https://github.com/tarantool/tarantool/tree/OKriw/gh-3610-assert_fail_when_connect_master_twice-1.10</a>

v1:

Changes in v2:
- changed completely, now we let duplicated params to proceed
- stop the applier at the moment when replica has the same hash that other one
- changed test

 src/box/box.cc                                 |  4 ++-
 src/box/replication.cc                         | 21 +++++++++++++--
 test/replication/duplicate_connection.result   | 37 ++++++++++++++++++++++++++
 test/replication/duplicate_connection.test.lua | 16 +++++++++++
 4 files changed, 75 insertions(+), 3 deletions(-)
 create mode 100644 test/replication/duplicate_connection.result
 create mode 100644 test/replication/duplicate_connection.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index f25146d01..d3aeb5de0 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -667,8 +667,10 @@ box_sync_replication(bool connect_quorum)
                diag_raise();
 
        auto guard = make_scoped_guard([=]{
-               for (int i = 0; i < count; i++)
+               for (int i = 0; i < count; i++) {
+                       applier_stop(appliers[i]);
</pre>
      </blockquote>
      <pre wrap="">
Appliers are started by replicaset_connect. On some errors they are
stopped by replicaset_connect, but on other errors they are stopped by
this guard here, in box_sync_replication. This looks confusing. Please
always stop appliers in the same place where they are started, i.e. in
replicaset_connect.

</pre>
      <blockquote type="cite">
        <pre wrap="">                   applier_delete(appliers[i]); /* doesn't affect diag */
+               }
        });
 
        replicaset_connect(appliers, count, connect_quorum);
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 5755ad45e..d6b65e0a1 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -261,6 +261,21 @@ replica_on_applier_sync(struct replica *replica)
        replicaset_check_quorum();
 }
 
+static void
+replica_on_applier_off(struct replica *replica)
+{
+       switch (replica->applier_sync_state) {
+       case APPLIER_CONNECTED:
+               replica_on_applier_sync(replica);
+               break;
+       case APPLIER_DISCONNECTED:
+               break;
+       default:
+               unreachable();
+       }
+
+}
+
 static void
 replica_on_applier_connect(struct replica *replica)
 {
@@ -396,9 +411,10 @@ replica_on_applier_state_f(struct trigger *trigger, void *event)
                /*
                 * Connection to self, duplicate connection
                 * to the same master, or the applier fiber
-                * has been cancelled. Assume synced.
+                * has been cancelled. In case of duplicated connection
+                * will be left in this state, otherwise assume synced.
</pre>
      </blockquote>
      <pre wrap="">
Nit: when updating a comment, please make sure it stays aligned.

</pre>
      <blockquote type="cite">
        <pre wrap="">            */
-               replica_on_applier_sync(replica);
+               replica_on_applier_off(replica);
</pre>
      </blockquote>
      <pre wrap="">
I don't understand replica_on_applier_off(). How's it possible?

Replacing this function with replica_on_applier_sync() results in an
assertion failure:

  void replica_on_applier_sync(replica*): Assertion `replica->applier_sync_state == APPLIER_CONNECTED' failed.

I guess replica_on_applier_off() is supposed to plug it.

The assertion is triggered by box_sync_replication() calling
applier_stop(). However, the replica should've been deleted by that time
and the trigger should've been cleared and hence never called. Looks
like instead of plugging this hole with replica_on_applier_off(), you
should fix a replica struct leak in replicaset_update():

433 static void
434 replicaset_update(struct applier **appliers, int count)
435 {
...
473                 if (replica_hash_search(&uniq, replica) != NULL) {
474                         tnt_raise(ClientError, ER_CFG, "replication",
475                                   "duplicate connection to the same replica");

^^^ replica leaks here and stays attached to the applier

476                 }
477                 replica_hash_insert(&uniq, replica);

</pre>
      <blockquote type="cite">
        <pre wrap="">           break;
        case APPLIER_STOPPED:
                /* Unrecoverable error. */
@@ -427,6 +443,7 @@ replicaset_update(struct applier **appliers, int count)
        auto uniq_guard = make_scoped_guard([&]{
                replica_hash_foreach_safe(&uniq, replica, next) {
                        replica_hash_remove(&uniq, replica);
+                       replica_clear_applier(replica);
                        replica_delete(replica);
                }
        });
</pre>
      </blockquote>
      <pre wrap="">
</pre>
      <blockquote type="cite">
        <pre wrap="">diff --git a/test/replication/duplicate_connection.test.lua b/test/replication/duplicate_connection.test.lua
new file mode 100644
index 000000000..b9bc573ca
--- /dev/null
+++ b/test/replication/duplicate_connection.test.lua
@@ -0,0 +1,16 @@
+test_run = require('test_run').new()
+engine = test_run:get_cfg('engine')
</pre>
      </blockquote>
      <pre wrap="">
engine isn't used in this test

Anyway, I think that the test case is small enough to be moved to
misc.test.lua

Also, you only test one case described in #3610. The other one (async
duplicate connection detection) remains untested. Please test it as
well.

</pre>
      <blockquote type="cite">
        <pre wrap="">+
+box.schema.user.grant('guest', 'replication')
+
+-- Deploy a replica.
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+test_run:cmd("start server replica")
+test_run:cmd("switch replica")
+
+replication = box.cfg.replication
+box.cfg{replication = {replication, replication}}
+
+
+test_run:cmd("switch default")
+box.schema.user.revoke('guest', 'replication') 
</pre>
      </blockquote>
    </blockquote>
    <br>
  </body>
</html>