From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> To: Sergey Petrenko <sergepetrenko@tarantool.org>, Kirill Yukhin <kyukhin@tarantool.org> Cc: tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica. Date: Sat, 28 Dec 2019 15:15:19 +0300 [thread overview] Message-ID: <40e33eff-8893-5917-edeb-1f5a8aee8777@tarantool.org> (raw) In-Reply-To: <1577533695.773222909@f469.i.mail.ru> Thanks for the fixes! The whole patchset LGTM. With some of my answers below. > >> 1. Why do we check for version here? Wouldn't it be more straight > >> to do something like this? > >> > >> ================================================================================ > >> > >> diff --git a/src/box/applier.cc b/src/box/applier.cc > >> index 3efa2b765..32851506c 100644 > >> --- a/src/box/applier.cc > >> +++ b/src/box/applier.cc > >> @@ -1032,6 +1032,7 @@ applier_f(va_list ap) > >> while (!fiber_is_cancelled()) { > >> try { > >> applier_connect(applier); > >> +bool was_anon = replication_anon; > >> if (tt_uuid_is_nil(&REPLICASET_UUID)) { > >> /* > >> * Execute JOIN if this is a bootstrap. > >> @@ -1046,8 +1047,8 @@ applier_f(va_list ap) > >> else > >> applier_join(applier); > >> } > >> -if (applier->version_id >= version_id(1, 7, 0) && > >> - !replication_anon && instance_id == REPLICA_ID_NIL) { > >> +if (was_anon && !replication_anon && > >> + instance_id == REPLICA_ID_NIL) { > >> /* > >> * Anonymity was turned off while > >> * we were fetching a snapshot or > >> > >> ================================================================================ > >> > >> Also I don't understand, why is it needed? From what I see in > >> box_set_replication_anon(), we stop existing appliers, when > >> change `replication_anon`. So in case `replication_anon` is > >> changed, the fiber should be canceled by that moment. > >> applier_connect() does fiber_testcancel() via coio after each > >> yield. > >> > >> A register could even harm here, because this instance will do > >> register twice - from there, and from a new applier fiber, started > >> in box_set_replication_anon(). > > > > True, the version check is pointless and the comment is misleading. > > But we need to call applier_register here. In case of a transition > > from anon to normal, we have already fetched a snapshot, so we > > shouldn't execute join in a new applier, we just execute register and > > proceed to subscribe. Fixed. > > Thanks for the fix. Although I still don't understand how is this > code reachable (I tried to add assert(false), and it crashed). It > is even covered with a test, what is cool btw. > > > > > As I see, when we change anon setting, the existing appliers are > cancelled. Why does this applier_f fiber still works, if any change > in anon and replication cancels all the existing appliers? > > > On replication_anon reconfiguration old appliers are killed. > The new applier is started, it connects, checks for replicaset_uuid, which is > not nil, since the previous applier has fetched a snapshot, > proceeds to checking whether the instance_uuid is zero. > If it is (it is, since we were anonymous), and replication_anon is false > (and it is, since we have reconfigured it), it proceeds to register. > Then it subscribes. > > I guess the previous comment that we fixed here was misleading. > This was how I understood it when I started working on this and > tried to reuse the same applier without killing it. So, sorry for the > confusion. Aha, now I understand. Then looks ok. > However the thing below looks like a bug: > > Master: > box.cfg{listen=3301} > box.schema.user.grant('guest', 'read,write,execute', 'universe') > box.schema.user.grant('guest', 'replication') > > Replica: > box.cfg{replication_anon=true, > read_only=true, > replication_connect_quorum=0, > replication=3301} > > box.cfg{replication_anon=false} > --- > - error: Couldn't find an instance to register this replica on. > ... > > This is because of quorum 0. When the quorum is 0, box_set_replication_anon() > tries to find a leader immediately after creation of appliers. Note, that the > master is alive. I tried to make a simple fix, but failed. Temporary change > to quorum 1 won't help, btw, because it can connect to a read-only replica > instead of a master. > > Everything works, when I change quorum to 1, and then try deanon again. > > Also it works when I do this: > > @@ -789,7 +789,7 @@ box_set_replication_anon(void) > * them can register and others resend a > * non-anonymous subscribe. > */ > - box_sync_replication(false); > + box_sync_replication(true); > /* > * Wait until the master has registered this > * instance. > > But I am not sure that it is legal to do so. > > > Maybe leave it as is for now? I can't see what the consequences > will be. If you want to promote an anon replica to master this > probably means that one of the masters is dead. So if you do > box_sync_replication(true), your instance will hang on replication_anon > configuration for replication timeout or whatever (replication_connect_timeout?). > You'll need to remove the dead master from box.cfg.replication first. > So this will still require additional configuration mangling. > > If you think it's fine to wait for a timeout on replication_anon reconfiguration, > feel free to apply the change above. I'm not against it. I don't know what is more correct to do here. Yes, lets keep it as is, and look at user feedback. > >>> +/* > >>> + * Reset all appliers. This will interrupt > >>> + * anonymous follow they're in so that one of > >>> + * them can register and others resend a > >>> + * non-anonymous subscribe. > >>> + */ > >>> +replicaset_foreach(replica) { > >>> +struct applier *applier = replica->applier; > >>> +if (applier == NULL) > >>> +continue; > >>> +replica_clear_applier(replica); > >>> +applier_stop(applier); > >>> +replica->applier_sync_state = APPLIER_DISCONNECTED; > >>> +replica_set_applier(replica, applier); > >>> +} > >>> +/* Choose a master to send register request to. */ > >>> +struct replica *master = replicaset_leader(); > >>> +assert(master != NULL && master->applier != NULL); > >>> +struct applier *master_applier = master->applier; > >>> + > >>> +applier_start(master_applier); > >>> + > >>> +applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY); > >>> +applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY); > >>> +applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY); > > Is it possible to always wait for APPLIER_READY and not to > pay attention to the prior states? > > No, applier transitions to ready right after CONNECTED. > > > > Or at least wait for REGISTERED + READY only? Like this: > > > This is ok. I just liked the verbosity of waiting for all the states step by step. You are free to revert this part of my change, I don't mind.
next prev parent reply other threads:[~2019-12-28 12:15 UTC|newest] Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-12-25 12:46 [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas sergepetrenko 2019-12-25 12:46 ` [Tarantool-patches] [PATCH v2 1/5] box: update comment describing join protocol sergepetrenko 2019-12-25 12:46 ` [Tarantool-patches] [PATCH v2 2/5] replication: do not decode replicaset uuid when processing a subscribe sergepetrenko 2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 3/5] applier: split join processing into two stages sergepetrenko 2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons sergepetrenko 2019-12-25 16:00 ` Vladislav Shpilevoy 2019-12-27 18:42 ` Vladislav Shpilevoy 2019-12-28 11:21 ` Sergey Petrenko 2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica sergepetrenko 2019-12-25 18:22 ` Vladislav Shpilevoy 2019-12-27 15:27 ` Sergey Petrenko 2019-12-27 18:42 ` Vladislav Shpilevoy 2019-12-28 11:48 ` Sergey Petrenko 2019-12-28 12:15 ` Vladislav Shpilevoy [this message] 2019-12-30 5:12 ` [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas Kirill Yukhin
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=40e33eff-8893-5917-edeb-1f5a8aee8777@tarantool.org \ --to=v.shpilevoy@tarantool.org \ --cc=kyukhin@tarantool.org \ --cc=sergepetrenko@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica.' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox