[Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica.

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Dec 27 21:42:11 MSK 2019


Thanks for the fixes!

I've pushed my review fixes on top of this commit. See it below
and on the branch. If you agree, then squash. Otherwise lets
discuss.

> Also please check out the discussion with Kostja Osipov in this thread
> ( [Tarantool-patches] [PATCH 4/5] vclock: ignore 0th component in comparisons.),
> there are some new changes to the commit regarding 0 vclock component comparison.

Yeah, I saw. Honestly, I don't understand how is 0 component used anywhere
before your patch. From what I see, it was just unused always. I've left a
small amendment in v2 thread to this commit. See answer to
"[Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons."

>> We should document new iproto codes: IPROTO_FETCH_SNAPSHOT,
>> IPROTO_REGISTER, IPROTO_REPLICA_ANON, like here:
>> https://www.tarantool.io/en/doc/2.2/dev_guide/internals/box_protocol/
> 
> Opened a ticket. https://github.com/tarantool/doc/issues/1046

I rather meant to add them to the docbot request, but this is
ok as well.

>> See 4 comments below.
>>
>>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>>> index f4f9d0670..2939a0f61 100644
>>> --- a/src/box/applier.cc
>>> +++ b/src/box/applier.cc
>>> @@ -996,10 +1035,24 @@ applier_f(va_list ap)
>>>  if (tt_uuid_is_nil(&REPLICASET_UUID)) {
>>>  /*
>>>   * Execute JOIN if this is a bootstrap.
>>> + * In case of anonymous replication, don't
>>> + * join but just fetch master's snapshot.
>>> + *
>>>   * The join will pause the applier
>>>   * until WAL is created.
>>>   */
>>> -applier_join(applier);
>>> +if (replication_anon)
>>> +applier_fetch_snapshot(applier);
>>> +else
>>> +applier_join(applier);
>>> +}
>>> +if (applier->version_id >= version_id(1, 7, 0) &&
>>> +    !replication_anon && instance_id == REPLICA_ID_NIL) {
>>> +/* anonymity was turned off while we were
>>> + * fetching a snapshot or following master.
>>> + * Register the replica now.
>>> + */
>>> +applier_register(applier);
>>>  }
>>
>> 1. Why do we check for version here? Wouldn't it be more straight
>> to do something like this?
>>
>> ================================================================================
>>
>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>> index 3efa2b765..32851506c 100644
>> --- a/src/box/applier.cc
>> +++ b/src/box/applier.cc
>> @@ -1032,6 +1032,7 @@ applier_f(va_list ap)
>>  while (!fiber_is_cancelled()) {
>>  try {
>>  applier_connect(applier);
>> +bool was_anon = replication_anon;
>>  if (tt_uuid_is_nil(&REPLICASET_UUID)) {
>>  /*
>>   * Execute JOIN if this is a bootstrap.
>> @@ -1046,8 +1047,8 @@ applier_f(va_list ap)
>>  else
>>  applier_join(applier);
>>  }
>> -if (applier->version_id >= version_id(1, 7, 0) &&
>> -    !replication_anon && instance_id == REPLICA_ID_NIL) {
>> +if (was_anon && !replication_anon &&
>> +    instance_id == REPLICA_ID_NIL) {
>>  /*
>>   * Anonymity was turned off while
>>   * we were fetching a snapshot or
>>
>> ================================================================================
>>
>> Also I don't understand, why is it needed? From what I see in
>> box_set_replication_anon(), we stop existing appliers, when
>> change `replication_anon`. So in case `replication_anon` is
>> changed, the fiber should be canceled by that moment.
>> applier_connect() does fiber_testcancel() via coio after each
>> yield.
>>
>> A register could even harm here, because this instance will do
>> register twice - from there, and from a new applier fiber, started
>> in box_set_replication_anon().
> 
> True, the version check is pointless and the comment is misleading.
> But we need to call applier_register here. In case of a transition
> from anon to normal, we have already fetched a snapshot, so we
> shouldn't execute join in a new applier, we just execute register and
> proceed to subscribe. Fixed.

Thanks for the fix. Although I still don't understand how is this
code reachable (I tried to add assert(false), and it crashed). It
is even covered with a test, what is cool btw.

As I see, when we change anon setting, the existing appliers are
cancelled. Why does this applier_f fiber still works, if any change
in anon and replication cancels all the existing appliers?

Apparently, I miss something.

>>> diff --git a/src/box/box.cc b/src/box/box.cc
>>> index 56d07e634..a93e2503e 100644
>>> --- a/src/box/box.cc
>>> +++ b/src/box/box.cc
>>> @@ -740,6 +770,64 @@ box_set_replication_skip_conflict(void)
>>>  replication_skip_conflict = cfg_geti("replication_skip_conflict");
>>>  }
>>>
>>> +void
>>> +box_set_replication_anon(void)
>>> +{
>>> +bool anon = box_check_replication_anon();
>>> +if (anon == replication_anon)
>>> +return;
>>> +
>>> +if (!anon) {
>>> +/* Turn anonymous instance into a normal one. */
>>> +replication_anon = anon;
>>
>> 2. In case of any exception `replication_anon` will contain a not
>> applied value.
>>
>> But there is a worse problem - we stop the appliers, and don't
>> restart them in case of an exception. So box_set_replication_anon()
>> is not atomic, it may leave the instance in a half configured
>> state. box_set_replication() deals with that by creation of a new
>> array of appliers. It connects them, and only then atomically
>> replaces the old appliers.
>>
>> Also looks like deanonymisation does not respect quorum. I did
>> this test:
>>
>>     Master:
>>         box.cfg{listen=3301}
>>         box.schema.user.grant('guest', 'read,write,execute', 'universe')
>>
>>
>>     Replica:
>>         box.cfg{replication_anon=true,
>>                 read_only=true,
>>                 replication_connect_quorum=0,
>>                 replication=3301}
>>
>>     Master:
>>         Killed.
>>
>>     Replica:
>>         box.cfg{replication_anon=false, read_only=false}
>>
>> The replica hangs. Even though the quorum is 0.
>>
>> I think, we somehow need to reuse box_set_replication() for all
>> of this. Looks like it does all we need except does not send the
>> register requests.
> 
> True.

Cool, I am happy that it was fixed so easy! Although this test
should probably be added to anon.test.lua.

Now I've discovered another possible problem. Are anon replicas
going to be used as a failover master in case the old master
dies? Because the example above shows that it is not possible to
make an anon replica read-write without a master. This is what
I get, even with 0 quorum:
---
- error: Couldn't find an instance to register this replica on.
...

Is that intended? Likely yes, but I ask just in case.

However the thing below looks like a bug:

    Master:
      box.cfg{listen=3301}
      box.schema.user.grant('guest', 'read,write,execute', 'universe')
      box.schema.user.grant('guest', 'replication')

    Replica:
      box.cfg{replication_anon=true,
              read_only=true,
              replication_connect_quorum=0,
              replication=3301}

      box.cfg{replication_anon=false}
      ---
      - error: Couldn't find an instance to register this replica on.
      ...

This is because of quorum 0. When the quorum is 0, box_set_replication_anon()
tries to find a leader immediately after creation of appliers. Note, that the
master is alive. I tried to make a simple fix, but failed. Temporary change
to quorum 1 won't help, btw, because it can connect to a read-only replica
instead of a master.

Everything works, when I change quorum to 1, and then try deanon again.

Also it works when I do this:

@@ -789,7 +789,7 @@ box_set_replication_anon(void)
 		 * them can register and others resend a
 		 * non-anonymous subscribe.
 		 */
-		box_sync_replication(false);
+		box_sync_replication(true);
 		/*
 		 * Wait until the master has registered this
 		 * instance.

But I am not sure that it is legal to do so.

>>> +/*
>>> + * Reset all appliers. This will interrupt
>>> + * anonymous follow they're in so that one of
>>> + * them can register and others resend a
>>> + * non-anonymous subscribe.
>>> + */
>>> +replicaset_foreach(replica) {
>>> +struct applier *applier = replica->applier;
>>> +if (applier == NULL)
>>> +continue;
>>> +replica_clear_applier(replica);
>>> +applier_stop(applier);
>>> +replica->applier_sync_state = APPLIER_DISCONNECTED;
>>> +replica_set_applier(replica, applier);
>>> +}
>>> +/* Choose a master to send register request to. */
>>> +struct replica *master = replicaset_leader();
>>> +assert(master != NULL && master->applier != NULL);
>>> +struct applier *master_applier = master->applier;
>>> +
>>> +applier_start(master_applier);
>>> +
>>> +applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY);
>>> +applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY);
>>> +applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY);

Is it possible to always wait for APPLIER_READY and not to
pay attention to the prior states?

Or at least wait for REGISTERED + READY only? Like this:

================================================================================
 		}
 		struct applier *master_applier = master->applier;
 
-		applier_resume_to_state(master_applier, APPLIER_READY,
-					TIMEOUT_INFINITY);
-		applier_resume_to_state(master_applier, APPLIER_REGISTER,
-					TIMEOUT_INFINITY);
 		applier_resume_to_state(master_applier, APPLIER_REGISTERED,
 					TIMEOUT_INFINITY);
================================================================================

>>> +applier_resume(master_applier);
>>> +/**
>>> + * Restart other appliers to
>>> + * resend non-anonymous subscribe.
>>> + */
>>> +replicaset_foreach(replica) {
>>> +if (replica != master && replica->applier)
>>> +applier_start(replica->applier);
>>> +}
>>> +} else if (!is_box_configured) {
>>> +replication_anon = anon;
>>> +} else {
>>> +/*
>>> + * It is forbidden to turn a normal replica into
>>> + * an anonymous one.
>>> + */
>>> +tnt_raise(ClientError, ER_CFG, "replication_anon",
>>> +  "cannot be turned on after bootstrap"
>>> +  " has finished");
>>> +}
>>> +
>>> +}
>>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>>> index e849fcf4f..22029751f 100644
>>> --- a/src/box/relay.cc
>>> +++ b/src/box/relay.cc
>>> @@ -691,7 +697,10 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>>>  relay_start(relay, fd, sync, relay_send_row);
>>>  auto relay_guard = make_scoped_guard([=] {
>>>  relay_stop(relay);
>>> -replica_on_relay_stop(replica);
>>> +if (replica->anon)
>>> +replica_anon_delete(replica);
>>> +else
>>> +replica_on_relay_stop(replica);
>>
>> 3. Why can't replica_on_relay_stop() be used for an anon replica?
> 
> replica_on_relay_stop() asserts the replica is registered with gc and
> removes it from consumers. Anonymous replicas aren't registered with gc.
> I can rewrite replica_on_relay_stop() appropriately, but why?

The problem is that replica_on_relay_stop() is not really a destructor.
It is an event. What to do on that event is encapsulated inside
replica_on_relay_stop(). When you added branching on anon setting, the
encapsulation was broken.

>> diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
>> index 9dee75b7d..51d157aa3 100644
>> --- a/src/box/lua/load_cfg.lua
>> +++ b/src/box/lua/load_cfg.lua
>> @@ -279,6 +279,7 @@ local dynamic_cfg_order = {
>>      replication_sync_timeout    = 150,
>>      replication_connect_timeout = 150,
>>      replication_connect_quorum  = 150,
>> +    replication_anon        = 175,
>> ================================================================================
>>
>> I am not sure about this change.
>>
>> The only way how can it be changed now - from true to false.
>> Not back. The only dubious scenario: we change `replication_anon`
>> to false, and change `replication` at the same time.
>>
>> If we set `replication_anon` false before new `replication`,
>> we will try to register the old `replication`. It means
>> `replication_anon` update should be done after `replication`.
>> So as the new configuration could connect as anon, and then be
>> upgraded to normal.
>>
>> But when we will support anonymization of normal replicas, the
>> config update could look like this:
>>
>>     `replication_anon` false -> true
>>     `replication` old -> new
>>
>> And here `replication_anon` should be set before `replication`.
>> So as we would connect anonymously to the new replication set.
>>
>> In both cases we do unnecessary actions with the old
>> replication.
>>
>> Seems like it can't be properly solved by dynamic_cfg_order
>> table. We need to configure `replication_anon` and `replication`
>> at the same time. So as in box_set_replication() you always know
>> if the new `replication` setting is anon or not.
>>
>> Maybe we need to extend dynamic_cfg_order idea, and introduce
>> depending parameters. Or parameter groups. Or something like
>> that. So as we could, for example, group `replication_anon` and
>> `replication`, and pass anon value to box_set_replication() when
>> they are updated both at the same box.cfg.
>>
>> Or we need to forbid update of some parameters in one box.cfg.
>>
>> Or we could make that order of parameters depends on their
>> values. Although it does not solve the problem that when we
>> change both `replication_anon` and `replication`, we do
>> unnecessary actions to the old `replication` in some cases.
> 
> Well, I reused the box_set_replication code in replication_anon change, so
> now if replication_anon is changed together with replication, it will trigger
> duplicate box_set_replication action (almost). This is ok, I guess.

No, looks like the same problem still exists. The settings
are applied one by one. So when we change anon and replication,
the anon is changed first, and replication is changed second.

When anon is changed, it still sees the old replication setting.
Check this code in load_cfg.lua:

    local function reload_cfg(oldcfg, cfg)
        cfg = upgrade_cfg(cfg, translate_cfg)
        local newcfg = prepare_cfg(cfg, default_cfg, template_cfg, modify_cfg)
        local ordered_cfg = {}
        -- iterate over original table because prepare_cfg() may store NILs
        for key, val in pairs(cfg) do
            if dynamic_cfg[key] == nil and oldcfg[key] ~= val then
                box.error(box.error.RELOAD_CFG, key);
            end
            table.insert(ordered_cfg, key)
        end
        table.sort(ordered_cfg, sort_cfg_cb)
        for _, key in pairs(ordered_cfg) do
            local val = newcfg[key]
            local oldval = oldcfg[key]
            if not compare_cfg(val, oldval) then
                rawset(oldcfg, key, val)
                if not pcall(dynamic_cfg[key]) then
                    rawset(oldcfg, key, oldval) -- revert the old value
                    return box.error() -- re-throw
                end
                if log_cfg_option[key] ~= nil then
                    val = log_cfg_option[key](val)
                end
                log.info("set '%s' configuration option to %s", key,
                    json.encode(val))
            end
        end
        ...

Here rawset(oldcfg, key, val) is the actual update of what we
see in cfg_get*() functions. So when anon is changed before
replication, it will try to deanon the old replication. And
only then will configure the new replication.

In that case lets configure anon after replication in this
patch. It partially solves the problem for now. And I added
a ticket to improve this part.

https://github.com/tarantool/tarantool/issues/4714

Below are my review fixes:

================================================================================

commit ed559c8f18d5c8606410bfbb76a380a43e2e163f
Author: Vladislav Shpilevoy <v.shpilevoy at tarantool.org>
Date:   Fri Dec 27 19:56:11 2019 +0300

    Review fixes 5/5

diff --git a/src/box/box.cc b/src/box/box.cc
index f99aeebc6..5a8e4d3b9 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -790,7 +790,10 @@ box_set_replication_anon(void)
 		 * non-anonymous subscribe.
 		 */
 		box_sync_replication(false);
-		/* Choose a master to send register request to. */
+		/*
+		 * Wait until the master has registered this
+		 * instance.
+		 */
 		struct replica *master = replicaset_leader();
 		if (master == NULL || master->applier == NULL ||
 		    master->applier->state != APPLIER_CONNECTED) {
@@ -798,10 +801,6 @@ box_set_replication_anon(void)
 		}
 		struct applier *master_applier = master->applier;
 
-		applier_resume_to_state(master_applier, APPLIER_READY,
-					TIMEOUT_INFINITY);
-		applier_resume_to_state(master_applier, APPLIER_REGISTER,
-					TIMEOUT_INFINITY);
 		applier_resume_to_state(master_applier, APPLIER_REGISTERED,
 					TIMEOUT_INFINITY);
 		applier_resume_to_state(master_applier, APPLIER_READY,
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 51d157aa3..955f50061 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -279,8 +279,16 @@ local dynamic_cfg_order = {
     replication_sync_timeout    = 150,
     replication_connect_timeout = 150,
     replication_connect_quorum  = 150,
-    replication_anon        = 175,
     replication             = 200,
+    -- Anon is set after `replication` as a temporary workaround
+    -- for the problem, that `replication` and `replication_anon`
+    -- depend on each other. If anon would be configured before
+    -- `replication`, it could lead to a bug, when anon is changed
+    -- from true to false together with `replication`, and it
+    -- would try to deanon the old `replication` before applying
+    -- the new one. This should be fixed when box.cfg is able to
+    -- apply some parameters together and atomically.
+    replication_anon        = 250,
 }
 
 local function sort_cfg_cb(l, r)
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 4258edfcd..7ffc79285 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -697,10 +697,7 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
 	relay_start(relay, fd, sync, relay_send_row);
 	auto relay_guard = make_scoped_guard([=] {
 		relay_stop(relay);
-		if (replica->anon)
-			replica_anon_delete(replica);
-		else
-			replica_on_relay_stop(replica);
+		replica_on_relay_stop(replica);
 	});
 
 	vclock_copy(&relay->local_vclock_at_subscribe, &replicaset.vclock);
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 11096cc79..1476c7472 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -890,14 +890,24 @@ void
 replica_on_relay_stop(struct replica *replica)
 {
 	/*
-	 * If the replica was evicted from the cluster, we don't
-	 * need to keep WALs for it anymore. Unregister it with
-	 * the garbage collector then. See also replica_clear_id.
+	 * If the replica was evicted from the cluster, or was not
+	 * even added there (anon replica), we don't need to keep
+	 * WALs for it anymore. Unregister it with the garbage
+	 * collector then. See also replica_clear_id.
 	 */
-	assert(replica->gc != NULL);
 	if (replica->id == REPLICA_ID_NIL) {
-		gc_consumer_unregister(replica->gc);
-		replica->gc = NULL;
+		if (!replica->anon) {
+			gc_consumer_unregister(replica->gc);
+			replica->gc = NULL;
+		} else {
+			assert(replica->gc == NULL);
+			assert(replica->id == REPLICA_ID_NIL);
+			/*
+			 * We do not replicate from anonymous
+			 * replicas.
+			 */
+			assert(replica->applier == NULL);
+		}
 	}
 	if (replica_is_orphan(replica)) {
 		replica_hash_remove(&replicaset.hash, replica);
@@ -905,17 +915,6 @@ replica_on_relay_stop(struct replica *replica)
 	}
 }
 
-void
-replica_anon_delete(struct replica *replica)
-{
-	assert(replica->gc == NULL);
-	assert(replica->id == REPLICA_ID_NIL);
-	/* We do not replicate from anonymous replicas */
-	assert(replica->applier == NULL);
-	replica_hash_remove(&replicaset.hash, replica);
-	replica_delete(replica);
-}
-
 struct replica *
 replicaset_first(void)
 {
diff --git a/src/box/replication.h b/src/box/replication.h
index 978a09d41..2ef1255b3 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -367,9 +367,6 @@ replica_set_applier(struct replica * replica, struct applier * applier);
 void
 replica_on_relay_stop(struct replica *replica);
 
-void
-replica_anon_delete(struct replica *replica);
-
 #if defined(__cplusplus)
 } /* extern "C" */
 
diff --git a/test/replication/anon.result b/test/replication/anon.result
index f6a85e7d0..88061569f 100644
--- a/test/replication/anon.result
+++ b/test/replication/anon.result
@@ -324,3 +324,62 @@ box.schema.user.revoke('guest', 'replication')
 test_run:cleanup_cluster()
  | ---
  | ...
+
+--
+-- Check that in case of a master absence an anon replica can't
+-- deanonymize itself, regardless of quorum value.
+--
+test_run:cmd('create server master with script="replication/master1.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server master')
+ | ---
+ | - true
+ | ...
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+test_run:cmd('create server replica_anon with rpl_master=master, script="replication/anon1.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica_anon')
+ | ---
+ | - true
+ | ...
+test_run:switch('replica_anon')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_connect_quorum = 0}
+ | ---
+ | ...
+test_run:cmd('stop server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server master')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_anon = false}
+ | ---
+ | - error: Couldn't find an instance to register this replica on.
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica_anon')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica_anon')
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/anon.test.lua b/test/replication/anon.test.lua
index 1877ec8bb..8a8d15c18 100644
--- a/test/replication/anon.test.lua
+++ b/test/replication/anon.test.lua
@@ -116,3 +116,22 @@ box.space.temp:drop()
 box.space.loc:drop()
 box.schema.user.revoke('guest', 'replication')
 test_run:cleanup_cluster()
+
+--
+-- Check that in case of a master absence an anon replica can't
+-- deanonymize itself, regardless of quorum value.
+--
+test_run:cmd('create server master with script="replication/master1.lua"')
+test_run:cmd('start server master')
+test_run:switch('master')
+box.schema.user.grant('guest', 'replication')
+test_run:cmd('create server replica_anon with rpl_master=master, script="replication/anon1.lua"')
+test_run:cmd('start server replica_anon')
+test_run:switch('replica_anon')
+box.cfg{replication_connect_quorum = 0}
+test_run:cmd('stop server master')
+test_run:cmd('delete server master')
+box.cfg{replication_anon = false}
+test_run:switch('default')
+test_run:cmd('stop server replica_anon')
+test_run:cmd('delete server replica_anon')
diff --git a/test/replication/master1.lua b/test/replication/master1.lua
new file mode 120000
index 000000000..1c7debd2f
--- /dev/null
+++ b/test/replication/master1.lua
@@ -0,0 +1 @@
+master.lua
\ No newline at end of file


More information about the Tarantool-patches mailing list