From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp51.i.mail.ru (smtp51.i.mail.ru [94.100.177.111]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id AA6B946970E for ; Fri, 27 Dec 2019 21:42:13 +0300 (MSK) References: <590158f6-a650-97c8-94fb-6f8c530e5c88@tarantool.org> <1577460437.511789779@f420.i.mail.ru> From: Vladislav Shpilevoy Message-ID: <42e12eeb-00b3-00dc-789f-a851459937ba@tarantool.org> Date: Fri, 27 Dec 2019 21:42:11 +0300 MIME-Version: 1.0 In-Reply-To: <1577460437.511789779@f420.i.mail.ru> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica. List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Sergey Petrenko Cc: tarantool-patches@dev.tarantool.org Thanks for the fixes! I've pushed my review fixes on top of this commit. See it below and on the branch. If you agree, then squash. Otherwise lets discuss. > Also please check out the discussion with Kostja Osipov in this thread > ( [Tarantool-patches] [PATCH 4/5] vclock: ignore 0th component in comparisons.), > there are some new changes to the commit regarding 0 vclock component comparison. Yeah, I saw. Honestly, I don't understand how is 0 component used anywhere before your patch. From what I see, it was just unused always. I've left a small amendment in v2 thread to this commit. See answer to "[Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons." >> We should document new iproto codes: IPROTO_FETCH_SNAPSHOT, >> IPROTO_REGISTER, IPROTO_REPLICA_ANON, like here: >> https://www.tarantool.io/en/doc/2.2/dev_guide/internals/box_protocol/ > > Opened a ticket. https://github.com/tarantool/doc/issues/1046 I rather meant to add them to the docbot request, but this is ok as well. >> See 4 comments below. >> >>> diff --git a/src/box/applier.cc b/src/box/applier.cc >>> index f4f9d0670..2939a0f61 100644 >>> --- a/src/box/applier.cc >>> +++ b/src/box/applier.cc >>> @@ -996,10 +1035,24 @@ applier_f(va_list ap) >>> if (tt_uuid_is_nil(&REPLICASET_UUID)) { >>> /* >>> * Execute JOIN if this is a bootstrap. >>> + * In case of anonymous replication, don't >>> + * join but just fetch master's snapshot. >>> + * >>> * The join will pause the applier >>> * until WAL is created. >>> */ >>> -applier_join(applier); >>> +if (replication_anon) >>> +applier_fetch_snapshot(applier); >>> +else >>> +applier_join(applier); >>> +} >>> +if (applier->version_id >= version_id(1, 7, 0) && >>> + !replication_anon && instance_id == REPLICA_ID_NIL) { >>> +/* anonymity was turned off while we were >>> + * fetching a snapshot or following master. >>> + * Register the replica now. >>> + */ >>> +applier_register(applier); >>> } >> >> 1. Why do we check for version here? Wouldn't it be more straight >> to do something like this? >> >> ================================================================================ >> >> diff --git a/src/box/applier.cc b/src/box/applier.cc >> index 3efa2b765..32851506c 100644 >> --- a/src/box/applier.cc >> +++ b/src/box/applier.cc >> @@ -1032,6 +1032,7 @@ applier_f(va_list ap) >>  while (!fiber_is_cancelled()) { >>  try { >>  applier_connect(applier); >> +bool was_anon = replication_anon; >>  if (tt_uuid_is_nil(&REPLICASET_UUID)) { >>  /* >>   * Execute JOIN if this is a bootstrap. >> @@ -1046,8 +1047,8 @@ applier_f(va_list ap) >>  else >>  applier_join(applier); >>  } >> -if (applier->version_id >= version_id(1, 7, 0) && >> - !replication_anon && instance_id == REPLICA_ID_NIL) { >> +if (was_anon && !replication_anon && >> + instance_id == REPLICA_ID_NIL) { >>  /* >>   * Anonymity was turned off while >>   * we were fetching a snapshot or >> >> ================================================================================ >> >> Also I don't understand, why is it needed? From what I see in >> box_set_replication_anon(), we stop existing appliers, when >> change `replication_anon`. So in case `replication_anon` is >> changed, the fiber should be canceled by that moment. >> applier_connect() does fiber_testcancel() via coio after each >> yield. >> >> A register could even harm here, because this instance will do >> register twice - from there, and from a new applier fiber, started >> in box_set_replication_anon(). > > True, the version check is pointless and the comment is misleading. > But we need to call applier_register here. In case of a transition > from anon to normal, we have already fetched a snapshot, so we > shouldn't execute join in a new applier, we just execute register and > proceed to subscribe. Fixed. Thanks for the fix. Although I still don't understand how is this code reachable (I tried to add assert(false), and it crashed). It is even covered with a test, what is cool btw. As I see, when we change anon setting, the existing appliers are cancelled. Why does this applier_f fiber still works, if any change in anon and replication cancels all the existing appliers? Apparently, I miss something. >>> diff --git a/src/box/box.cc b/src/box/box.cc >>> index 56d07e634..a93e2503e 100644 >>> --- a/src/box/box.cc >>> +++ b/src/box/box.cc >>> @@ -740,6 +770,64 @@ box_set_replication_skip_conflict(void) >>> replication_skip_conflict = cfg_geti("replication_skip_conflict"); >>> } >>> >>> +void >>> +box_set_replication_anon(void) >>> +{ >>> +bool anon = box_check_replication_anon(); >>> +if (anon == replication_anon) >>> +return; >>> + >>> +if (!anon) { >>> +/* Turn anonymous instance into a normal one. */ >>> +replication_anon = anon; >> >> 2. In case of any exception `replication_anon` will contain a not >> applied value. >> >> But there is a worse problem - we stop the appliers, and don't >> restart them in case of an exception. So box_set_replication_anon() >> is not atomic, it may leave the instance in a half configured >> state. box_set_replication() deals with that by creation of a new >> array of appliers. It connects them, and only then atomically >> replaces the old appliers. >> >> Also looks like deanonymisation does not respect quorum. I did >> this test: >> >>     Master: >>         box.cfg{listen=3301} >>         box.schema.user.grant('guest', 'read,write,execute', 'universe') >> >> >>     Replica: >>         box.cfg{replication_anon=true, >>                 read_only=true, >>                 replication_connect_quorum=0, >>                 replication=3301} >> >>     Master: >>         Killed. >> >>     Replica: >>         box.cfg{replication_anon=false, read_only=false} >> >> The replica hangs. Even though the quorum is 0. >> >> I think, we somehow need to reuse box_set_replication() for all >> of this. Looks like it does all we need except does not send the >> register requests. > > True. Cool, I am happy that it was fixed so easy! Although this test should probably be added to anon.test.lua. Now I've discovered another possible problem. Are anon replicas going to be used as a failover master in case the old master dies? Because the example above shows that it is not possible to make an anon replica read-write without a master. This is what I get, even with 0 quorum: --- - error: Couldn't find an instance to register this replica on. ... Is that intended? Likely yes, but I ask just in case. However the thing below looks like a bug: Master: box.cfg{listen=3301} box.schema.user.grant('guest', 'read,write,execute', 'universe') box.schema.user.grant('guest', 'replication') Replica: box.cfg{replication_anon=true, read_only=true, replication_connect_quorum=0, replication=3301} box.cfg{replication_anon=false} --- - error: Couldn't find an instance to register this replica on. ... This is because of quorum 0. When the quorum is 0, box_set_replication_anon() tries to find a leader immediately after creation of appliers. Note, that the master is alive. I tried to make a simple fix, but failed. Temporary change to quorum 1 won't help, btw, because it can connect to a read-only replica instead of a master. Everything works, when I change quorum to 1, and then try deanon again. Also it works when I do this: @@ -789,7 +789,7 @@ box_set_replication_anon(void) * them can register and others resend a * non-anonymous subscribe. */ - box_sync_replication(false); + box_sync_replication(true); /* * Wait until the master has registered this * instance. But I am not sure that it is legal to do so. >>> +/* >>> + * Reset all appliers. This will interrupt >>> + * anonymous follow they're in so that one of >>> + * them can register and others resend a >>> + * non-anonymous subscribe. >>> + */ >>> +replicaset_foreach(replica) { >>> +struct applier *applier = replica->applier; >>> +if (applier == NULL) >>> +continue; >>> +replica_clear_applier(replica); >>> +applier_stop(applier); >>> +replica->applier_sync_state = APPLIER_DISCONNECTED; >>> +replica_set_applier(replica, applier); >>> +} >>> +/* Choose a master to send register request to. */ >>> +struct replica *master = replicaset_leader(); >>> +assert(master != NULL && master->applier != NULL); >>> +struct applier *master_applier = master->applier; >>> + >>> +applier_start(master_applier); >>> + >>> +applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY); >>> +applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY); >>> +applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY); Is it possible to always wait for APPLIER_READY and not to pay attention to the prior states? Or at least wait for REGISTERED + READY only? Like this: ================================================================================ } struct applier *master_applier = master->applier; - applier_resume_to_state(master_applier, APPLIER_READY, - TIMEOUT_INFINITY); - applier_resume_to_state(master_applier, APPLIER_REGISTER, - TIMEOUT_INFINITY); applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY); ================================================================================ >>> +applier_resume(master_applier); >>> +/** >>> + * Restart other appliers to >>> + * resend non-anonymous subscribe. >>> + */ >>> +replicaset_foreach(replica) { >>> +if (replica != master && replica->applier) >>> +applier_start(replica->applier); >>> +} >>> +} else if (!is_box_configured) { >>> +replication_anon = anon; >>> +} else { >>> +/* >>> + * It is forbidden to turn a normal replica into >>> + * an anonymous one. >>> + */ >>> +tnt_raise(ClientError, ER_CFG, "replication_anon", >>> + "cannot be turned on after bootstrap" >>> + " has finished"); >>> +} >>> + >>> +} >>> diff --git a/src/box/relay.cc b/src/box/relay.cc >>> index e849fcf4f..22029751f 100644 >>> --- a/src/box/relay.cc >>> +++ b/src/box/relay.cc >>> @@ -691,7 +697,10 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync, >>> relay_start(relay, fd, sync, relay_send_row); >>> auto relay_guard = make_scoped_guard([=] { >>> relay_stop(relay); >>> -replica_on_relay_stop(replica); >>> +if (replica->anon) >>> +replica_anon_delete(replica); >>> +else >>> +replica_on_relay_stop(replica); >> >> 3. Why can't replica_on_relay_stop() be used for an anon replica? > > replica_on_relay_stop() asserts the replica is registered with gc and > removes it from consumers. Anonymous replicas aren't registered with gc. > I can rewrite replica_on_relay_stop() appropriately, but why? The problem is that replica_on_relay_stop() is not really a destructor. It is an event. What to do on that event is encapsulated inside replica_on_relay_stop(). When you added branching on anon setting, the encapsulation was broken. >> diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua >> index 9dee75b7d..51d157aa3 100644 >> --- a/src/box/lua/load_cfg.lua >> +++ b/src/box/lua/load_cfg.lua >> @@ -279,6 +279,7 @@ local dynamic_cfg_order = { >>      replication_sync_timeout = 150, >>      replication_connect_timeout = 150, >>      replication_connect_quorum = 150, >> + replication_anon = 175, >> ================================================================================ >> >> I am not sure about this change. >> >> The only way how can it be changed now - from true to false. >> Not back. The only dubious scenario: we change `replication_anon` >> to false, and change `replication` at the same time. >> >> If we set `replication_anon` false before new `replication`, >> we will try to register the old `replication`. It means >> `replication_anon` update should be done after `replication`. >> So as the new configuration could connect as anon, and then be >> upgraded to normal. >> >> But when we will support anonymization of normal replicas, the >> config update could look like this: >> >>     `replication_anon` false -> true >>     `replication` old -> new >> >> And here `replication_anon` should be set before `replication`. >> So as we would connect anonymously to the new replication set. >> >> In both cases we do unnecessary actions with the old >> replication. >> >> Seems like it can't be properly solved by dynamic_cfg_order >> table. We need to configure `replication_anon` and `replication` >> at the same time. So as in box_set_replication() you always know >> if the new `replication` setting is anon or not. >> >> Maybe we need to extend dynamic_cfg_order idea, and introduce >> depending parameters. Or parameter groups. Or something like >> that. So as we could, for example, group `replication_anon` and >> `replication`, and pass anon value to box_set_replication() when >> they are updated both at the same box.cfg. >> >> Or we need to forbid update of some parameters in one box.cfg. >> >> Or we could make that order of parameters depends on their >> values. Although it does not solve the problem that when we >> change both `replication_anon` and `replication`, we do >> unnecessary actions to the old `replication` in some cases. > > Well, I reused the box_set_replication code in replication_anon change, so > now if replication_anon is changed together with replication, it will trigger > duplicate box_set_replication action (almost). This is ok, I guess. No, looks like the same problem still exists. The settings are applied one by one. So when we change anon and replication, the anon is changed first, and replication is changed second. When anon is changed, it still sees the old replication setting. Check this code in load_cfg.lua: local function reload_cfg(oldcfg, cfg) cfg = upgrade_cfg(cfg, translate_cfg) local newcfg = prepare_cfg(cfg, default_cfg, template_cfg, modify_cfg) local ordered_cfg = {} -- iterate over original table because prepare_cfg() may store NILs for key, val in pairs(cfg) do if dynamic_cfg[key] == nil and oldcfg[key] ~= val then box.error(box.error.RELOAD_CFG, key); end table.insert(ordered_cfg, key) end table.sort(ordered_cfg, sort_cfg_cb) for _, key in pairs(ordered_cfg) do local val = newcfg[key] local oldval = oldcfg[key] if not compare_cfg(val, oldval) then rawset(oldcfg, key, val) if not pcall(dynamic_cfg[key]) then rawset(oldcfg, key, oldval) -- revert the old value return box.error() -- re-throw end if log_cfg_option[key] ~= nil then val = log_cfg_option[key](val) end log.info("set '%s' configuration option to %s", key, json.encode(val)) end end ... Here rawset(oldcfg, key, val) is the actual update of what we see in cfg_get*() functions. So when anon is changed before replication, it will try to deanon the old replication. And only then will configure the new replication. In that case lets configure anon after replication in this patch. It partially solves the problem for now. And I added a ticket to improve this part. https://github.com/tarantool/tarantool/issues/4714 Below are my review fixes: ================================================================================ commit ed559c8f18d5c8606410bfbb76a380a43e2e163f Author: Vladislav Shpilevoy Date: Fri Dec 27 19:56:11 2019 +0300 Review fixes 5/5 diff --git a/src/box/box.cc b/src/box/box.cc index f99aeebc6..5a8e4d3b9 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -790,7 +790,10 @@ box_set_replication_anon(void) * non-anonymous subscribe. */ box_sync_replication(false); - /* Choose a master to send register request to. */ + /* + * Wait until the master has registered this + * instance. + */ struct replica *master = replicaset_leader(); if (master == NULL || master->applier == NULL || master->applier->state != APPLIER_CONNECTED) { @@ -798,10 +801,6 @@ box_set_replication_anon(void) } struct applier *master_applier = master->applier; - applier_resume_to_state(master_applier, APPLIER_READY, - TIMEOUT_INFINITY); - applier_resume_to_state(master_applier, APPLIER_REGISTER, - TIMEOUT_INFINITY); applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY); applier_resume_to_state(master_applier, APPLIER_READY, diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua index 51d157aa3..955f50061 100644 --- a/src/box/lua/load_cfg.lua +++ b/src/box/lua/load_cfg.lua @@ -279,8 +279,16 @@ local dynamic_cfg_order = { replication_sync_timeout = 150, replication_connect_timeout = 150, replication_connect_quorum = 150, - replication_anon = 175, replication = 200, + -- Anon is set after `replication` as a temporary workaround + -- for the problem, that `replication` and `replication_anon` + -- depend on each other. If anon would be configured before + -- `replication`, it could lead to a bug, when anon is changed + -- from true to false together with `replication`, and it + -- would try to deanon the old `replication` before applying + -- the new one. This should be fixed when box.cfg is able to + -- apply some parameters together and atomically. + replication_anon = 250, } local function sort_cfg_cb(l, r) diff --git a/src/box/relay.cc b/src/box/relay.cc index 4258edfcd..7ffc79285 100644 --- a/src/box/relay.cc +++ b/src/box/relay.cc @@ -697,10 +697,7 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync, relay_start(relay, fd, sync, relay_send_row); auto relay_guard = make_scoped_guard([=] { relay_stop(relay); - if (replica->anon) - replica_anon_delete(replica); - else - replica_on_relay_stop(replica); + replica_on_relay_stop(replica); }); vclock_copy(&relay->local_vclock_at_subscribe, &replicaset.vclock); diff --git a/src/box/replication.cc b/src/box/replication.cc index 11096cc79..1476c7472 100644 --- a/src/box/replication.cc +++ b/src/box/replication.cc @@ -890,14 +890,24 @@ void replica_on_relay_stop(struct replica *replica) { /* - * If the replica was evicted from the cluster, we don't - * need to keep WALs for it anymore. Unregister it with - * the garbage collector then. See also replica_clear_id. + * If the replica was evicted from the cluster, or was not + * even added there (anon replica), we don't need to keep + * WALs for it anymore. Unregister it with the garbage + * collector then. See also replica_clear_id. */ - assert(replica->gc != NULL); if (replica->id == REPLICA_ID_NIL) { - gc_consumer_unregister(replica->gc); - replica->gc = NULL; + if (!replica->anon) { + gc_consumer_unregister(replica->gc); + replica->gc = NULL; + } else { + assert(replica->gc == NULL); + assert(replica->id == REPLICA_ID_NIL); + /* + * We do not replicate from anonymous + * replicas. + */ + assert(replica->applier == NULL); + } } if (replica_is_orphan(replica)) { replica_hash_remove(&replicaset.hash, replica); @@ -905,17 +915,6 @@ replica_on_relay_stop(struct replica *replica) } } -void -replica_anon_delete(struct replica *replica) -{ - assert(replica->gc == NULL); - assert(replica->id == REPLICA_ID_NIL); - /* We do not replicate from anonymous replicas */ - assert(replica->applier == NULL); - replica_hash_remove(&replicaset.hash, replica); - replica_delete(replica); -} - struct replica * replicaset_first(void) { diff --git a/src/box/replication.h b/src/box/replication.h index 978a09d41..2ef1255b3 100644 --- a/src/box/replication.h +++ b/src/box/replication.h @@ -367,9 +367,6 @@ replica_set_applier(struct replica * replica, struct applier * applier); void replica_on_relay_stop(struct replica *replica); -void -replica_anon_delete(struct replica *replica); - #if defined(__cplusplus) } /* extern "C" */ diff --git a/test/replication/anon.result b/test/replication/anon.result index f6a85e7d0..88061569f 100644 --- a/test/replication/anon.result +++ b/test/replication/anon.result @@ -324,3 +324,62 @@ box.schema.user.revoke('guest', 'replication') test_run:cleanup_cluster() | --- | ... + +-- +-- Check that in case of a master absence an anon replica can't +-- deanonymize itself, regardless of quorum value. +-- +test_run:cmd('create server master with script="replication/master1.lua"') + | --- + | - true + | ... +test_run:cmd('start server master') + | --- + | - true + | ... +test_run:switch('master') + | --- + | - true + | ... +box.schema.user.grant('guest', 'replication') + | --- + | ... +test_run:cmd('create server replica_anon with rpl_master=master, script="replication/anon1.lua"') + | --- + | - true + | ... +test_run:cmd('start server replica_anon') + | --- + | - true + | ... +test_run:switch('replica_anon') + | --- + | - true + | ... +box.cfg{replication_connect_quorum = 0} + | --- + | ... +test_run:cmd('stop server master') + | --- + | - true + | ... +test_run:cmd('delete server master') + | --- + | - true + | ... +box.cfg{replication_anon = false} + | --- + | - error: Couldn't find an instance to register this replica on. + | ... +test_run:switch('default') + | --- + | - true + | ... +test_run:cmd('stop server replica_anon') + | --- + | - true + | ... +test_run:cmd('delete server replica_anon') + | --- + | - true + | ... diff --git a/test/replication/anon.test.lua b/test/replication/anon.test.lua index 1877ec8bb..8a8d15c18 100644 --- a/test/replication/anon.test.lua +++ b/test/replication/anon.test.lua @@ -116,3 +116,22 @@ box.space.temp:drop() box.space.loc:drop() box.schema.user.revoke('guest', 'replication') test_run:cleanup_cluster() + +-- +-- Check that in case of a master absence an anon replica can't +-- deanonymize itself, regardless of quorum value. +-- +test_run:cmd('create server master with script="replication/master1.lua"') +test_run:cmd('start server master') +test_run:switch('master') +box.schema.user.grant('guest', 'replication') +test_run:cmd('create server replica_anon with rpl_master=master, script="replication/anon1.lua"') +test_run:cmd('start server replica_anon') +test_run:switch('replica_anon') +box.cfg{replication_connect_quorum = 0} +test_run:cmd('stop server master') +test_run:cmd('delete server master') +box.cfg{replication_anon = false} +test_run:switch('default') +test_run:cmd('stop server replica_anon') +test_run:cmd('delete server replica_anon') diff --git a/test/replication/master1.lua b/test/replication/master1.lua new file mode 120000 index 000000000..1c7debd2f --- /dev/null +++ b/test/replication/master1.lua @@ -0,0 +1 @@ +master.lua \ No newline at end of file