[Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica.

Sergey Petrenko sergepetrenko at tarantool.org
Fri Dec 27 18:27:17 MSK 2019


>Среда, 25 декабря 2019, 21:22 +03:00 от Vladislav Shpilevoy <v.shpilevoy at tarantool.org>:
>
>Thanks for the patch!

Hi! Thanks for your answer! My comments and incremental diff are below.
Also please check out the discussion with Kostja Osipov in this thread
( [Tarantool-patches] [PATCH 4/5] vclock: ignore 0th component in comparisons.),
there are some new changes to the commit regarding 0 vclock component comparison.

>
>I've pushed my review fixes on top of this commit. See it in the
>end of the email, and on the branch.
Thanks! Looks good.
>
>
>We should document new iproto codes: IPROTO_FETCH_SNAPSHOT,
>IPROTO_REGISTER, IPROTO_REPLICA_ANON, like here:
>https://www.tarantool.io/en/doc/2.2/dev_guide/internals/box_protocol/

Opened a ticket. https://github.com/tarantool/doc/issues/1046

>
>
>See 4 comments below.
>
>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>> index f4f9d0670..2939a0f61 100644
>> --- a/src/box/applier.cc
>> +++ b/src/box/applier.cc
>> @@ -996,10 +1035,24 @@ applier_f(va_list ap)
>>  if (tt_uuid_is_nil(&REPLICASET_UUID)) {
>>  /*
>>   * Execute JOIN if this is a bootstrap.
>> + * In case of anonymous replication, don't
>> + * join but just fetch master's snapshot.
>> + *
>>   * The join will pause the applier
>>   * until WAL is created.
>>   */
>> -applier_join(applier);
>> +if (replication_anon)
>> +applier_fetch_snapshot(applier);
>> +else
>> +applier_join(applier);
>> +}
>> +if (applier->version_id >= version_id(1, 7, 0) &&
>> +    !replication_anon && instance_id == REPLICA_ID_NIL) {
>> +/* anonymity was turned off while we were
>> + * fetching a snapshot or following master.
>> + * Register the replica now.
>> + */
>> +applier_register(applier);
>>  }
>
>1. Why do we check for version here? Wouldn't it be more straight
>to do something like this?
>
>================================================================================
>
>diff --git a/src/box/applier.cc b/src/box/applier.cc
>index 3efa2b765..32851506c 100644
>--- a/src/box/applier.cc
>+++ b/src/box/applier.cc
>@@ -1032,6 +1032,7 @@ applier_f(va_list ap)
> while (!fiber_is_cancelled()) {
> try {
> applier_connect(applier);
>+bool was_anon = replication_anon;
> if (tt_uuid_is_nil(&REPLICASET_UUID)) {
> /*
>  * Execute JOIN if this is a bootstrap.
>@@ -1046,8 +1047,8 @@ applier_f(va_list ap)
> else
> applier_join(applier);
> }
>-if (applier->version_id >= version_id(1, 7, 0) &&
>-    !replication_anon && instance_id == REPLICA_ID_NIL) {
>+if (was_anon && !replication_anon &&
>+    instance_id == REPLICA_ID_NIL) {
> /*
>  * Anonymity was turned off while
>  * we were fetching a snapshot or
>
>================================================================================
>
>Also I don't understand, why is it needed? From what I see in
>box_set_replication_anon(), we stop existing appliers, when
>change `replication_anon`. So in case `replication_anon` is
>changed, the fiber should be canceled by that moment.
>applier_connect() does fiber_testcancel() via coio after each
>yield.
>
>A register could even harm here, because this instance will do
>register twice - from there, and from a new applier fiber, started
>in box_set_replication_anon().

True, the version check is pointless and the comment is misleading.
But we need to call applier_register here. In case of a transition
from anon to normal, we have already fetched a snapshot, so we
shouldn't execute join in a new applier, we just execute register and
proceed to subscribe. Fixed.

>
>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 56d07e634..a93e2503e 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -740,6 +770,64 @@ box_set_replication_skip_conflict(void)
>>  replication_skip_conflict = cfg_geti("replication_skip_conflict");
>>  }
>> 
>> +void
>> +box_set_replication_anon(void)
>> +{
>> +bool anon = box_check_replication_anon();
>> +if (anon == replication_anon)
>> +return;
>> +
>> +if (!anon) {
>> +/* Turn anonymous instance into a normal one. */
>> +replication_anon = anon;
>
>2. In case of any exception `replication_anon` will contain a not
>applied value.
>
>But there is a worse problem - we stop the appliers, and don't
>restart them in case of an exception. So box_set_replication_anon()
>is not atomic, it may leave the instance in a half configured
>state. box_set_replication() deals with that by creation of a new
>array of appliers. It connects them, and only then atomically
>replaces the old appliers.
>
>Also looks like deanonymisation does not respect quorum. I did
>this test:
>
>    Master:
>        box.cfg{listen=3301}
>        box.schema.user.grant('guest', 'read,write,execute', 'universe')
>
>
>    Replica:
>        box.cfg{replication_anon=true,
>                read_only=true,
>                replication_connect_quorum=0,
>                replication=3301}
>
>    Master:
>        Killed.
>
>    Replica:
>        box.cfg{replication_anon=false, read_only=false}
>
>The replica hangs. Even though the quorum is 0.
>
>I think, we somehow need to reuse box_set_replication() for all
>of this. Looks like it does all we need except does not send the
>register requests.

True.

>
>> +/*
>> + * Reset all appliers. This will interrupt
>> + * anonymous follow they're in so that one of
>> + * them can register and others resend a
>> + * non-anonymous subscribe.
>> + */
>> +replicaset_foreach(replica) {
>> +struct applier *applier = replica->applier;
>> +if (applier == NULL)
>> +continue;
>> +replica_clear_applier(replica);
>> +applier_stop(applier);
>> +replica->applier_sync_state = APPLIER_DISCONNECTED;
>> +replica_set_applier(replica, applier);
>> +}
>> +/* Choose a master to send register request to. */
>> +struct replica *master = replicaset_leader();
>> +assert(master != NULL && master->applier != NULL);
>> +struct applier *master_applier = master->applier;
>> +
>> +applier_start(master_applier);
>> +
>> +applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY);
>> +applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY);
>> +applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY);
>> +applier_resume(master_applier);
>> +/**
>> + * Restart other appliers to
>> + * resend non-anonymous subscribe.
>> + */
>> +replicaset_foreach(replica) {
>> +if (replica != master && replica->applier)
>> +applier_start(replica->applier);
>> +}
>> +} else if (!is_box_configured) {
>> +replication_anon = anon;
>> +} else {
>> +/*
>> + * It is forbidden to turn a normal replica into
>> + * an anonymous one.
>> + */
>> +tnt_raise(ClientError, ER_CFG, "replication_anon",
>> +  "cannot be turned on after bootstrap"
>> +  " has finished");
>> +}
>> +
>> +}
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index e849fcf4f..22029751f 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -691,7 +697,10 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>>  relay_start(relay, fd, sync, relay_send_row);
>>  auto relay_guard = make_scoped_guard([=] {
>>  relay_stop(relay);
>> -replica_on_relay_stop(replica);
>> +if (replica->anon)
>> +replica_anon_delete(replica);
>> +else
>> +replica_on_relay_stop(replica);
>
>3. Why can't replica_on_relay_stop() be used for an anon replica?

replica_on_relay_stop() asserts the replica is registered with gc and
removes it from consumers. Anonymous replicas aren't registered with gc.
I can rewrite replica_on_relay_stop() appropriately, but why?

>
>>  });
>> 
>> @@ -741,6 +750,13 @@ relay_send_row(struct xstream *stream, struct xrow_header *packet)
>>  {
>>  struct relay *relay = container_of(stream, struct relay, stream);
>>  assert(iproto_type_is_dml(packet->type));
>> +/*
>> + * Replica-local requests generated while replica was
>> + * anonymous have a zero instance id. Just skip all
>> + * these rows.
>> + */
>> +if (packet->replica_id == REPLICA_ID_NIL)
>> +return;
>
>4. From what I remember about local requests in an anon replica,
>this is possible only with packet->group_id == GROUP_LOCAL,
>right? Then can you move it into the check about GROUP_LOCAL
>below? To avoid checking replica_id, when the group is not local
>anyway. Like this:
>
>if (packet->group_id == GROUP_LOCAL) {
>if (packet->replica_id == REPLICA_ID_NIL)
>return;
>packet->type = IPROTO_NOP;
>packet->group_id = GROUP_DEFAULT;
>packet->bodycnt = 0;
>}

Good point, thanks!

My changes are below.

>
>================================================================================
>
>commit 3e8fded091fae8abacc876bf6c899750b595d72d
>Author: Vladislav Shpilevoy < v.shpilevoy at tarantool.org >
>Date:   Wed Dec 25 17:20:45 2019 +0100
>
>    Review fixes
>
>diff --git a/src/box/applier.cc b/src/box/applier.cc
>index 2939a0f61..3efa2b765 100644
>--- a/src/box/applier.cc
>+++ b/src/box/applier.cc
>@@ -1048,9 +1048,11 @@ applier_f(va_list ap)
> }
> if (applier->version_id >= version_id(1, 7, 0) &&
>     !replication_anon && instance_id == REPLICA_ID_NIL) {
>-/* anonymity was turned off while we were
>- * fetching a snapshot or following master.
>- * Register the replica now.
>+/*
>+ * Anonymity was turned off while
>+ * we were fetching a snapshot or
>+ * following master. Register the
>+ * replica now.
>  */
> applier_register(applier);
> }
>diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
>index 9dee75b7d..51d157aa3 100644
>--- a/src/box/lua/load_cfg.lua
>+++ b/src/box/lua/load_cfg.lua
>@@ -279,6 +279,7 @@ local dynamic_cfg_order = {
>     replication_sync_timeout    = 150,
>     replication_connect_timeout = 150,
>     replication_connect_quorum  = 150,
>+    replication_anon        = 175,
>================================================================================
>
>I am not sure about this change.
>
>The only way how can it be changed now - from true to false.
>Not back. The only dubious scenario: we change `replication_anon`
>to false, and change `replication` at the same time.
>
>If we set `replication_anon` false before new `replication`,
>we will try to register the old `replication`. It means
>`replication_anon` update should be done after `replication`.
>So as the new configuration could connect as anon, and then be
>upgraded to normal.
>
>But when we will support anonymization of normal replicas, the
>config update could look like this:
>
>    `replication_anon` false -> true
>    `replication` old -> new
>
>And here `replication_anon` should be set before `replication`.
>So as we would connect anonymously to the new replication set.
>
>In both cases we do unnecessary actions with the old
>replication.
>
>Seems like it can't be properly solved by dynamic_cfg_order
>table. We need to configure `replication_anon` and `replication`
>at the same time. So as in box_set_replication() you always know
>if the new `replication` setting is anon or not.
>
>Maybe we need to extend dynamic_cfg_order idea, and introduce
>depending parameters. Or parameter groups. Or something like
>that. So as we could, for example, group `replication_anon` and
>`replication`, and pass anon value to box_set_replication() when
>they are updated both at the same box.cfg.
>
>Or we need to forbid update of some parameters in one box.cfg.
>
>Or we could make that order of parameters depends on their
>values. Although it does not solve the problem that when we
>change both `replication_anon` and `replication`, we do
>unnecessary actions to the old `replication` in some cases.

Well, I reused the box_set_replication code in replication_anon change, so
now if replication_anon is changed together with replication, it will trigger
duplicate box_set_replication action (almost). This is ok, I guess.

>
>
>================================================================================
>     replication             = 200,
> }
>
>diff --git a/src/box/replication.cc b/src/box/replication.cc
>index ce707811a..d9848debe 100644
>--- a/src/box/replication.cc
>+++ b/src/box/replication.cc
>@@ -916,7 +916,6 @@ replica_anon_delete(struct replica *replica)
> replica_delete(replica);
> }
>
>-
> struct replica *
> replicaset_first(void)
> {


diff --git a/src/box/applier.cc b/src/box/applier.cc
index 10725ed2c..ae3d281a5 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -1046,13 +1046,12 @@ applier_f(va_list ap)
 else
 applier_join(applier);
 }
-if (applier->version_id >= version_id(1, 7, 0) &&
-    !replication_anon && instance_id == REPLICA_ID_NIL) {
+if (instance_id == REPLICA_ID_NIL &&
+    !replication_anon) {
 /*
- * Anonymity was turned off while
- * we were fetching a snapshot or
- * following master. Register the
- * replica now.
+ * The instance transitioned
+ * from anonymous. Register it
+ * now.
  */
 applier_register(applier);
 }
diff --git a/src/box/box.cc b/src/box/box.cc
index a93e2503e..f99aeebc6 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -778,6 +778,9 @@ box_set_replication_anon(void)
 return;
 
 if (!anon) {
+auto guard = make_scoped_guard([&]{
+replication_anon = !anon;
+});
 /* Turn anonymous instance into a normal one. */
 replication_anon = anon;
 /*
@@ -786,34 +789,30 @@ box_set_replication_anon(void)
  * them can register and others resend a
  * non-anonymous subscribe.
  */
-replicaset_foreach(replica) {
-struct applier *applier = replica->applier;
-if (applier == NULL)
-continue;
-replica_clear_applier(replica);
-applier_stop(applier);
-replica->applier_sync_state = APPLIER_DISCONNECTED;
-replica_set_applier(replica, applier);
-}
+box_sync_replication(false);
 /* Choose a master to send register request to. */
 struct replica *master = replicaset_leader();
-assert(master != NULL && master->applier != NULL);
+if (master == NULL || master->applier == NULL ||
+    master->applier->state != APPLIER_CONNECTED) {
+tnt_raise(ClientError, ER_CANNOT_REGISTER);
+}
 struct applier *master_applier = master->applier;
 
-applier_start(master_applier);
-
-applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY);
-applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY);
-applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY);
-applier_resume(master_applier);
+applier_resume_to_state(master_applier, APPLIER_READY,
+TIMEOUT_INFINITY);
+applier_resume_to_state(master_applier, APPLIER_REGISTER,
+TIMEOUT_INFINITY);
+applier_resume_to_state(master_applier, APPLIER_REGISTERED,
+TIMEOUT_INFINITY);
+applier_resume_to_state(master_applier, APPLIER_READY,
+TIMEOUT_INFINITY);
 /**
- * Restart other appliers to
+ * Resume other appliers to
  * resend non-anonymous subscribe.
  */
-replicaset_foreach(replica) {
-if (replica != master && replica->applier)
-applier_start(replica->applier);
-}
+replicaset_follow();
+replicaset_sync();
+guard.is_active = false;
 } else if (!is_box_configured) {
 replication_anon = anon;
 } else {
diff --git a/src/box/errcode.h b/src/box/errcode.h
index dce69c9c0..0f7563e64 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -259,6 +259,7 @@ struct errcode_record {
 /*204 */_(ER_SQL_FUNC_WRONG_RET_COUNT,"SQL expects exactly one argument returned from %s, got %d")\
 /*205 */_(ER_FUNC_INVALID_RETURN_TYPE,"Function '%s' returned value of invalid type: expected %s got %s") \
 /*206 */_(ER_REPLICA_NOT_ANON, "Replica '%s' is not anonymous and cannot register.") \
+/*207 */_(ER_CANNOT_REGISTER, "Couldn't find an instance to register this replica on.") \
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 22029751f..4258edfcd 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -750,19 +750,19 @@ relay_send_row(struct xstream *stream, struct xrow_header *packet)
 {
 struct relay *relay = container_of(stream, struct relay, stream);
 assert(iproto_type_is_dml(packet->type));
-/*
- * Replica-local requests generated while replica was
- * anonymous have a zero instance id. Just skip all
- * these rows.
- */
-if (packet->replica_id == REPLICA_ID_NIL)
-return;
 /*
  * Transform replica local requests to IPROTO_NOP so as to
  * promote vclock on the replica without actually modifying
  * any data.
  */
 if (packet->group_id == GROUP_LOCAL) {
+/*
+ * Replica-local requests generated while replica
+ * was anonymous have a zero instance id. Just
+ * skip all these rows.
+ */
+if (packet->replica_id == REPLICA_ID_NIL)
+return;
 packet->type = IPROTO_NOP;
 packet->group_id = GROUP_DEFAULT;
 packet->bodycnt = 0;
diff --git a/test/box/misc.result b/test/box/misc.result
index 134caef63..51ad29cdd 100644
--- a/test/box/misc.result
+++ b/test/box/misc.result
@@ -555,6 +555,7 @@ t;
   204: box.error.SQL_FUNC_WRONG_RET_COUNT
   205: box.error.FUNC_INVALID_RETURN_TYPE
   206: box.error.REPLICA_NOT_ANON
+  207: box.error.CANNOT_REGISTER
 ...
 test_run:cmd("setopt delimiter ''");
 ---
 

-- 
Sergey Petrenko


More information about the Tarantool-patches mailing list