[Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica.

Sergey Petrenko sergepetrenko at tarantool.org
Sat Dec 28 14:48:15 MSK 2019


>Пятница, 27 декабря 2019, 21:42 +03:00 от Vladislav Shpilevoy <v.shpilevoy at tarantool.org>:
>
>Thanks for the fixes!
>
>I've pushed my review fixes on top of this commit. See it below
>and on the branch. If you agree, then squash. Otherwise lets
>discuss.
>
>> Also please check out the discussion with Kostja Osipov in this thread
>> ( [Tarantool-patches] [PATCH 4/5] vclock: ignore 0th component in comparisons.),
>> there are some new changes to the commit regarding 0 vclock component comparison.
>
>Yeah, I saw. Honestly, I don't understand how is 0 component used anywhere
>before your patch. From what I see, it was just unused always. I've left a
>small amendment in v2 thread to this commit. See answer to
>"[Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons."
>
>>> We should document new iproto codes: IPROTO_FETCH_SNAPSHOT,
>>> IPROTO_REGISTER, IPROTO_REPLICA_ANON, like here:
>>>  https://www.tarantool.io/en/doc/2.2/dev_guide/internals/box_protocol/
>> 
>> Opened a ticket. https://github.com/tarantool/doc/issues/1046
>
>I rather meant to add them to the docbot request, but this is
>ok as well.
>
>>> See 4 comments below.
>>>
>>>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>>>> index f4f9d0670..2939a0f61 100644
>>>> --- a/src/box/applier.cc
>>>> +++ b/src/box/applier.cc
>>>> @@ -996,10 +1035,24 @@ applier_f(va_list ap)
>>>>  if (tt_uuid_is_nil(&REPLICASET_UUID)) {
>>>>  /*
>>>>   * Execute JOIN if this is a bootstrap.
>>>> + * In case of anonymous replication, don't
>>>> + * join but just fetch master's snapshot.
>>>> + *
>>>>   * The join will pause the applier
>>>>   * until WAL is created.
>>>>   */
>>>> -applier_join(applier);
>>>> +if (replication_anon)
>>>> +applier_fetch_snapshot(applier);
>>>> +else
>>>> +applier_join(applier);
>>>> +}
>>>> +if (applier->version_id >= version_id(1, 7, 0) &&
>>>> +    !replication_anon && instance_id == REPLICA_ID_NIL) {
>>>> +/* anonymity was turned off while we were
>>>> + * fetching a snapshot or following master.
>>>> + * Register the replica now.
>>>> + */
>>>> +applier_register(applier);
>>>>  }
>>>
>>> 1. Why do we check for version here? Wouldn't it be more straight
>>> to do something like this?
>>>
>>> ================================================================================
>>>
>>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>>> index 3efa2b765..32851506c 100644
>>> --- a/src/box/applier.cc
>>> +++ b/src/box/applier.cc
>>> @@ -1032,6 +1032,7 @@ applier_f(va_list ap)
>>>  while (!fiber_is_cancelled()) {
>>>  try {
>>>  applier_connect(applier);
>>> +bool was_anon = replication_anon;
>>>  if (tt_uuid_is_nil(&REPLICASET_UUID)) {
>>>  /*
>>>   * Execute JOIN if this is a bootstrap.
>>> @@ -1046,8 +1047,8 @@ applier_f(va_list ap)
>>>  else
>>>  applier_join(applier);
>>>  }
>>> -if (applier->version_id >= version_id(1, 7, 0) &&
>>> -    !replication_anon && instance_id == REPLICA_ID_NIL) {
>>> +if (was_anon && !replication_anon &&
>>> +    instance_id == REPLICA_ID_NIL) {
>>>  /*
>>>   * Anonymity was turned off while
>>>   * we were fetching a snapshot or
>>>
>>> ================================================================================
>>>
>>> Also I don't understand, why is it needed? From what I see in
>>> box_set_replication_anon(), we stop existing appliers, when
>>> change `replication_anon`. So in case `replication_anon` is
>>> changed, the fiber should be canceled by that moment.
>>> applier_connect() does fiber_testcancel() via coio after each
>>> yield.
>>>
>>> A register could even harm here, because this instance will do
>>> register twice - from there, and from a new applier fiber, started
>>> in box_set_replication_anon().
>> 
>> True, the version check is pointless and the comment is misleading.
>> But we need to call applier_register here. In case of a transition
>> from anon to normal, we have already fetched a snapshot, so we
>> shouldn't execute join in a new applier, we just execute register and
>> proceed to subscribe. Fixed.
>
>Thanks for the fix. Although I still don't understand how is this
>code reachable (I tried to add assert(false), and it crashed). It
>is even covered with a test, what is cool btw.
>
>
>As I see, when we change anon setting, the existing appliers are
>cancelled. Why does this applier_f fiber still works, if any change
>in anon and replication cancels all the existing appliers?
On replication_anon reconfiguration old appliers are killed.
The new applier is started, it connects, checks for replicaset_uuid, which is
not nil, since the previous applier has fetched a snapshot,
proceeds to checking whether the instance_uuid is zero.
If it is (it is, since we were anonymous), and replication_anon is false
(and it is, since we have reconfigured it), it proceeds to register.
Then it subscribes.

I guess the previous comment that we fixed here was misleading.
This was how I understood it when I started working on this and
tried to reuse the same applier without killing it. So, sorry for the
confusion.

>
>
>Apparently, I miss something.
>
>>>> diff --git a/src/box/box.cc b/src/box/box.cc
>>>> index 56d07e634..a93e2503e 100644
>>>> --- a/src/box/box.cc
>>>> +++ b/src/box/box.cc
>>>> @@ -740,6 +770,64 @@ box_set_replication_skip_conflict(void)
>>>>  replication_skip_conflict = cfg_geti("replication_skip_conflict");
>>>>  }
>>>>
>>>> +void
>>>> +box_set_replication_anon(void)
>>>> +{
>>>> +bool anon = box_check_replication_anon();
>>>> +if (anon == replication_anon)
>>>> +return;
>>>> +
>>>> +if (!anon) {
>>>> +/* Turn anonymous instance into a normal one. */
>>>> +replication_anon = anon;
>>>
>>> 2. In case of any exception `replication_anon` will contain a not
>>> applied value.
>>>
>>> But there is a worse problem - we stop the appliers, and don't
>>> restart them in case of an exception. So box_set_replication_anon()
>>> is not atomic, it may leave the instance in a half configured
>>> state. box_set_replication() deals with that by creation of a new
>>> array of appliers. It connects them, and only then atomically
>>> replaces the old appliers.
>>>
>>> Also looks like deanonymisation does not respect quorum. I did
>>> this test:
>>>
>>>     Master:
>>>         box.cfg{listen=3301}
>>>         box.schema.user.grant('guest', 'read,write,execute', 'universe')
>>>
>>>
>>>     Replica:
>>>         box.cfg{replication_anon=true,
>>>                 read_only=true,
>>>                 replication_connect_quorum=0,
>>>                 replication=3301}
>>>
>>>     Master:
>>>         Killed.
>>>
>>>     Replica:
>>>         box.cfg{replication_anon=false, read_only=false}
>>>
>>> The replica hangs. Even though the quorum is 0.
>>>
>>> I think, we somehow need to reuse box_set_replication() for all
>>> of this. Looks like it does all we need except does not send the
>>> register requests.
>> 
>> True.
>
>Cool, I am happy that it was fixed so easy! Although this test
>should probably be added to anon.test.lua.
>
>Now I've discovered another possible problem. Are anon replicas
>going to be used as a failover master in case the old master
>dies? Because the example above shows that it is not possible to
>make an anon replica read-write without a master. This is what
>I get, even with 0 quorum:
>---
>- error: Couldn't find an instance to register this replica on.
>...
>
>Is that intended? Likely yes, but I ask just in case. 
True, this is intended.

>
>
>However the thing below looks like a bug:
>
>    Master:
>      box.cfg{listen=3301}
>      box.schema.user.grant('guest', 'read,write,execute', 'universe')
>      box.schema.user.grant('guest', 'replication')
>
>    Replica:
>      box.cfg{replication_anon=true,
>              read_only=true,
>              replication_connect_quorum=0,
>              replication=3301}
>
>      box.cfg{replication_anon=false}
>      ---
>      - error: Couldn't find an instance to register this replica on.
>      ...
>
>This is because of quorum 0. When the quorum is 0, box_set_replication_anon()
>tries to find a leader immediately after creation of appliers. Note, that the
>master is alive. I tried to make a simple fix, but failed. Temporary change
>to quorum 1 won't help, btw, because it can connect to a read-only replica
>instead of a master.
>
>Everything works, when I change quorum to 1, and then try deanon again.
>
>Also it works when I do this:
>
>@@ -789,7 +789,7 @@ box_set_replication_anon(void)
> 		 * them can register and others resend a
> 		 * non-anonymous subscribe.
> 		 */
>-		box_sync_replication(false);
>+		box_sync_replication(true);
> 		/*
> 		 * Wait until the master has registered this
> 		 * instance.
>
>But I am not sure that it is legal to do so.
Maybe leave it as is for now? I can't see what the consequences
will be. If you want to promote an anon replica to master this
probably means that one of the masters is dead. So if you do
box_sync_replication(true), your instance will hang on replication_anon
configuration for replication timeout or whatever (replication_connect_timeout?).
You'll need to remove the dead master from box.cfg.replication first.
So this will still require additional configuration mangling.

If you think it's fine to wait for a timeout on replication_anon reconfiguration,
feel free to apply the change above. I'm not against it.

>
>
>>>> +/*
>>>> + * Reset all appliers. This will interrupt
>>>> + * anonymous follow they're in so that one of
>>>> + * them can register and others resend a
>>>> + * non-anonymous subscribe.
>>>> + */
>>>> +replicaset_foreach(replica) {
>>>> +struct applier *applier = replica->applier;
>>>> +if (applier == NULL)
>>>> +continue;
>>>> +replica_clear_applier(replica);
>>>> +applier_stop(applier);
>>>> +replica->applier_sync_state = APPLIER_DISCONNECTED;
>>>> +replica_set_applier(replica, applier);
>>>> +}
>>>> +/* Choose a master to send register request to. */
>>>> +struct replica *master = replicaset_leader();
>>>> +assert(master != NULL && master->applier != NULL);
>>>> +struct applier *master_applier = master->applier;
>>>> +
>>>> +applier_start(master_applier);
>>>> +
>>>> +applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY);
>>>> +applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY);
>>>> +applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY);
>
>Is it possible to always wait for APPLIER_READY and not to
>pay attention to the prior states? 
No, applier transitions to ready right after CONNECTED.
>
>
>Or at least wait for REGISTERED + READY only? Like this:
This is ok. I just liked the verbosity of waiting for all the states step by step.
>
>
>================================================================================
> 		}
> 		struct applier *master_applier = master->applier;
>
>-		applier_resume_to_state(master_applier, APPLIER_READY,
>-					TIMEOUT_INFINITY);
>-		applier_resume_to_state(master_applier, APPLIER_REGISTER,
>-					TIMEOUT_INFINITY);
> 		applier_resume_to_state(master_applier, APPLIER_REGISTERED,
> 					TIMEOUT_INFINITY);
>================================================================================
>
>>>> +applier_resume(master_applier);
>>>> +/**
>>>> + * Restart other appliers to
>>>> + * resend non-anonymous subscribe.
>>>> + */
>>>> +replicaset_foreach(replica) {
>>>> +if (replica != master && replica->applier)
>>>> +applier_start(replica->applier);
>>>> +}
>>>> +} else if (!is_box_configured) {
>>>> +replication_anon = anon;
>>>> +} else {
>>>> +/*
>>>> + * It is forbidden to turn a normal replica into
>>>> + * an anonymous one.
>>>> + */
>>>> +tnt_raise(ClientError, ER_CFG, "replication_anon",
>>>> +  "cannot be turned on after bootstrap"
>>>> +  " has finished");
>>>> +}
>>>> +
>>>> +}
>>>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>>>> index e849fcf4f..22029751f 100644
>>>> --- a/src/box/relay.cc
>>>> +++ b/src/box/relay.cc
>>>> @@ -691,7 +697,10 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>>>>  relay_start(relay, fd, sync, relay_send_row);
>>>>  auto relay_guard = make_scoped_guard([=] {
>>>>  relay_stop(relay);
>>>> -replica_on_relay_stop(replica);
>>>> +if (replica->anon)
>>>> +replica_anon_delete(replica);
>>>> +else
>>>> +replica_on_relay_stop(replica);
>>>
>>> 3. Why can't replica_on_relay_stop() be used for an anon replica?
>> 
>> replica_on_relay_stop() asserts the replica is registered with gc and
>> removes it from consumers. Anonymous replicas aren't registered with gc.
>> I can rewrite replica_on_relay_stop() appropriately, but why?
>
>The problem is that replica_on_relay_stop() is not really a destructor.
>It is an event. What to do on that event is encapsulated inside
>replica_on_relay_stop(). When you added branching on anon setting, the
>encapsulation was broken.
Ok, I see, thanks.
>
>
>>> diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
>>> index 9dee75b7d..51d157aa3 100644
>>> --- a/src/box/lua/load_cfg.lua
>>> +++ b/src/box/lua/load_cfg.lua
>>> @@ -279,6 +279,7 @@ local dynamic_cfg_order = {
>>>      replication_sync_timeout    = 150,
>>>      replication_connect_timeout = 150,
>>>      replication_connect_quorum  = 150,
>>> +    replication_anon        = 175,
>>> ================================================================================
>>>
>>> I am not sure about this change.
>>>
>>> The only way how can it be changed now - from true to false.
>>> Not back. The only dubious scenario: we change `replication_anon`
>>> to false, and change `replication` at the same time.
>>>
>>> If we set `replication_anon` false before new `replication`,
>>> we will try to register the old `replication`. It means
>>> `replication_anon` update should be done after `replication`.
>>> So as the new configuration could connect as anon, and then be
>>> upgraded to normal.
>>>
>>> But when we will support anonymization of normal replicas, the
>>> config update could look like this:
>>>
>>>     `replication_anon` false -> true
>>>     `replication` old -> new
>>>
>>> And here `replication_anon` should be set before `replication`.
>>> So as we would connect anonymously to the new replication set.
>>>
>>> In both cases we do unnecessary actions with the old
>>> replication.
>>>
>>> Seems like it can't be properly solved by dynamic_cfg_order
>>> table. We need to configure `replication_anon` and `replication`
>>> at the same time. So as in box_set_replication() you always know
>>> if the new `replication` setting is anon or not.
>>>
>>> Maybe we need to extend dynamic_cfg_order idea, and introduce
>>> depending parameters. Or parameter groups. Or something like
>>> that. So as we could, for example, group `replication_anon` and
>>> `replication`, and pass anon value to box_set_replication() when
>>> they are updated both at the same box.cfg.
>>>
>>> Or we need to forbid update of some parameters in one box.cfg.
>>>
>>> Or we could make that order of parameters depends on their
>>> values. Although it does not solve the problem that when we
>>> change both `replication_anon` and `replication`, we do
>>> unnecessary actions to the old `replication` in some cases.
>> 
>> Well, I reused the box_set_replication code in replication_anon change, so
>> now if replication_anon is changed together with replication, it will trigger
>> duplicate box_set_replication action (almost). This is ok, I guess.
>
>No, looks like the same problem still exists. The settings
>are applied one by one. So when we change anon and replication,
>the anon is changed first, and replication is changed second.
>
>When anon is changed, it still sees the old replication setting.
>Check this code in load_cfg.lua:
>
>    local function reload_cfg(oldcfg, cfg)
>        cfg = upgrade_cfg(cfg, translate_cfg)
>        local newcfg = prepare_cfg(cfg, default_cfg, template_cfg, modify_cfg)
>        local ordered_cfg = {}
>        -- iterate over original table because prepare_cfg() may store NILs
>        for key, val in pairs(cfg) do
>            if dynamic_cfg[key] == nil and oldcfg[key] ~= val then
>                box.error(box.error.RELOAD_CFG, key);
>            end
>            table.insert(ordered_cfg, key)
>        end
>        table.sort(ordered_cfg, sort_cfg_cb)
>        for _, key in pairs(ordered_cfg) do
>            local val = newcfg[key]
>            local oldval = oldcfg[key]
>            if not compare_cfg(val, oldval) then
>                rawset(oldcfg, key, val)
>                if not pcall(dynamic_cfg[key]) then
>                    rawset(oldcfg, key, oldval) -- revert the old value
>                    return box.error() -- re-throw
>                end
>                if log_cfg_option[key] ~= nil then
>                    val = log_cfg_option[key](val)
>                end
>                log.info("set '%s' configuration option to %s", key,
>                    json.encode(val))
>            end
>        end
>        ...
>
>Here rawset(oldcfg, key, val) is the actual update of what we
>see in cfg_get*() functions. So when anon is changed before
>replication, it will try to deanon the old replication. And
>only then will configure the new replication.
>
>In that case lets configure anon after replication in this
>patch. It partially solves the problem for now. And I added
>a ticket to improve this part.
Thanks for the clarification and fixes, I missed it.
>
>
>https://github.com/tarantool/tarantool/issues/4714
>
>Below are my review fixes:
>
>================================================================================
>
>commit ed559c8f18d5c8606410bfbb76a380a43e2e163f
>Author: Vladislav Shpilevoy < v.shpilevoy at tarantool.org >
>Date:   Fri Dec 27 19:56:11 2019 +0300
>
>    Review fixes 5/5
>
>diff --git a/src/box/box.cc b/src/box/box.cc
>index f99aeebc6..5a8e4d3b9 100644
>--- a/src/box/box.cc
>+++ b/src/box/box.cc
>@@ -790,7 +790,10 @@ box_set_replication_anon(void)
> 		 * non-anonymous subscribe.
> 		 */
> 		box_sync_replication(false);
>-		/* Choose a master to send register request to. */
>+		/*
>+		 * Wait until the master has registered this
>+		 * instance.
>+		 */
> 		struct replica *master = replicaset_leader();
> 		if (master == NULL || master->applier == NULL ||
> 		    master->applier->state != APPLIER_CONNECTED) {
>@@ -798,10 +801,6 @@ box_set_replication_anon(void)
> 		}
> 		struct applier *master_applier = master->applier;
>
>-		applier_resume_to_state(master_applier, APPLIER_READY,
>-					TIMEOUT_INFINITY);
>-		applier_resume_to_state(master_applier, APPLIER_REGISTER,
>-					TIMEOUT_INFINITY);
> 		applier_resume_to_state(master_applier, APPLIER_REGISTERED,
> 					TIMEOUT_INFINITY);
> 		applier_resume_to_state(master_applier, APPLIER_READY,
>diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
>index 51d157aa3..955f50061 100644
>--- a/src/box/lua/load_cfg.lua
>+++ b/src/box/lua/load_cfg.lua
>@@ -279,8 +279,16 @@ local dynamic_cfg_order = {
>     replication_sync_timeout    = 150,
>     replication_connect_timeout = 150,
>     replication_connect_quorum  = 150,
>-    replication_anon        = 175,
>     replication             = 200,
>+    -- Anon is set after `replication` as a temporary workaround
>+    -- for the problem, that `replication` and `replication_anon`
>+    -- depend on each other. If anon would be configured before
>+    -- `replication`, it could lead to a bug, when anon is changed
>+    -- from true to false together with `replication`, and it
>+    -- would try to deanon the old `replication` before applying
>+    -- the new one. This should be fixed when box.cfg is able to
>+    -- apply some parameters together and atomically.
>+    replication_anon        = 250,
> }
>
> local function sort_cfg_cb(l, r)
>diff --git a/src/box/relay.cc b/src/box/relay.cc
>index 4258edfcd..7ffc79285 100644
>--- a/src/box/relay.cc
>+++ b/src/box/relay.cc
>@@ -697,10 +697,7 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
> 	relay_start(relay, fd, sync, relay_send_row);
> 	auto relay_guard = make_scoped_guard([=] {
> 		relay_stop(relay);
>-		if (replica->anon)
>-			replica_anon_delete(replica);
>-		else
>-			replica_on_relay_stop(replica);
>+		replica_on_relay_stop(replica);
> 	});
>
> 	vclock_copy(&relay->local_vclock_at_subscribe, &replicaset.vclock);
>diff --git a/src/box/replication.cc b/src/box/replication.cc
>index 11096cc79..1476c7472 100644
>--- a/src/box/replication.cc
>+++ b/src/box/replication.cc
>@@ -890,14 +890,24 @@ void
> replica_on_relay_stop(struct replica *replica)
> {
> 	/*
>-	 * If the replica was evicted from the cluster, we don't
>-	 * need to keep WALs for it anymore. Unregister it with
>-	 * the garbage collector then. See also replica_clear_id.
>+	 * If the replica was evicted from the cluster, or was not
>+	 * even added there (anon replica), we don't need to keep
>+	 * WALs for it anymore. Unregister it with the garbage
>+	 * collector then. See also replica_clear_id.
> 	 */
>-	assert(replica->gc != NULL);
> 	if (replica->id == REPLICA_ID_NIL) {
>-		gc_consumer_unregister(replica->gc);
>-		replica->gc = NULL;
>+		if (!replica->anon) {
>+			gc_consumer_unregister(replica->gc);
>+			replica->gc = NULL;
>+		} else {
>+			assert(replica->gc == NULL);
>+			assert(replica->id == REPLICA_ID_NIL);
>+			/*
>+			 * We do not replicate from anonymous
>+			 * replicas.
>+			 */
>+			assert(replica->applier == NULL);
>+		}
> 	}
> 	if (replica_is_orphan(replica)) {
> 		replica_hash_remove(&replicaset.hash, replica);
>@@ -905,17 +915,6 @@ replica_on_relay_stop(struct replica *replica)
> 	}
> }
>
>-void
>-replica_anon_delete(struct replica *replica)
>-{
>-	assert(replica->gc == NULL);
>-	assert(replica->id == REPLICA_ID_NIL);
>-	/* We do not replicate from anonymous replicas */
>-	assert(replica->applier == NULL);
>-	replica_hash_remove(&replicaset.hash, replica);
>-	replica_delete(replica);
>-}
>-
> struct replica *
> replicaset_first(void)
> {
>diff --git a/src/box/replication.h b/src/box/replication.h
>index 978a09d41..2ef1255b3 100644
>--- a/src/box/replication.h
>+++ b/src/box/replication.h
>@@ -367,9 +367,6 @@ replica_set_applier(struct replica * replica, struct applier * applier);
> void
> replica_on_relay_stop(struct replica *replica);
>
>-void
>-replica_anon_delete(struct replica *replica);
>-
> #if defined(__cplusplus)
> } /* extern "C" */
>
>diff --git a/test/replication/anon.result b/test/replication/anon.result
>index f6a85e7d0..88061569f 100644
>--- a/test/replication/anon.result
>+++ b/test/replication/anon.result
>@@ -324,3 +324,62 @@ box.schema.user.revoke('guest', 'replication')
> test_run:cleanup_cluster()
>  | ---
>  | ...
>+
>+--
>+-- Check that in case of a master absence an anon replica can't
>+-- deanonymize itself, regardless of quorum value.
>+--
>+test_run:cmd('create server master with script="replication/master1.lua"')
>+ | ---
>+ | - true
>+ | ...
>+test_run:cmd('start server master')
>+ | ---
>+ | - true
>+ | ...
>+test_run:switch('master')
>+ | ---
>+ | - true
>+ | ...
>+box.schema.user.grant('guest', 'replication')
>+ | ---
>+ | ...
>+test_run:cmd('create server replica_anon with rpl_master=master, script="replication/anon1.lua"')
>+ | ---
>+ | - true
>+ | ...
>+test_run:cmd('start server replica_anon')
>+ | ---
>+ | - true
>+ | ...
>+test_run:switch('replica_anon')
>+ | ---
>+ | - true
>+ | ...
>+box.cfg{replication_connect_quorum = 0}
>+ | ---
>+ | ...
>+test_run:cmd('stop server master')
>+ | ---
>+ | - true
>+ | ...
>+test_run:cmd('delete server master')
>+ | ---
>+ | - true
>+ | ...
>+box.cfg{replication_anon = false}
>+ | ---
>+ | - error: Couldn't find an instance to register this replica on.
>+ | ...
>+test_run:switch('default')
>+ | ---
>+ | - true
>+ | ...
>+test_run:cmd('stop server replica_anon')
>+ | ---
>+ | - true
>+ | ...
>+test_run:cmd('delete server replica_anon')
>+ | ---
>+ | - true
>+ | ...
>diff --git a/test/replication/anon.test.lua b/test/replication/anon.test.lua
>index 1877ec8bb..8a8d15c18 100644
>--- a/test/replication/anon.test.lua
>+++ b/test/replication/anon.test.lua
>@@ -116,3 +116,22 @@ box.space.temp:drop()
> box.space.loc:drop()
> box.schema.user.revoke('guest', 'replication')
> test_run:cleanup_cluster()
>+
>+--
>+-- Check that in case of a master absence an anon replica can't
>+-- deanonymize itself, regardless of quorum value.
>+--
>+test_run:cmd('create server master with script="replication/master1.lua"')
>+test_run:cmd('start server master')
>+test_run:switch('master')
>+box.schema.user.grant('guest', 'replication')
>+test_run:cmd('create server replica_anon with rpl_master=master, script="replication/anon1.lua"')
>+test_run:cmd('start server replica_anon')
>+test_run:switch('replica_anon')
>+box.cfg{replication_connect_quorum = 0}
>+test_run:cmd('stop server master')
>+test_run:cmd('delete server master')
>+box.cfg{replication_anon = false}
>+test_run:switch('default')
>+test_run:cmd('stop server replica_anon')
>+test_run:cmd('delete server replica_anon')
>diff --git a/test/replication/master1.lua b/test/replication/master1.lua
>new file mode 120000
>index 000000000..1c7debd2f
>--- /dev/null
>+++ b/test/replication/master1.lua
>@@ -0,0 +1 @@
>+master.lua
>\ No newline at end of file

LGTM, squashed with a tiny change:

diff --git a/src/box/replication.cc b/src/box/replication.cc
index 1476c7472..e7bfa22ab 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -901,7 +901,6 @@ replica_on_relay_stop(struct replica *replica)
 replica->gc = NULL;
 } else {
 assert(replica->gc == NULL);
-assert(replica->id == REPLICA_ID_NIL);
 /*
  * We do not replicate from anonymous
  * replicas.

The assertion was inside if (replica->id == REPLICA_ID_NIL) branch.

-- 
Sergey Petrenko
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.tarantool.org/pipermail/tarantool-patches/attachments/20191228/b4a90dcd/attachment.html>


More information about the Tarantool-patches mailing list