Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Sergey Petrenko <sergepetrenko@tarantool.org>,
	Kirill Yukhin <kyukhin@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica.
Date: Sat, 28 Dec 2019 15:15:19 +0300	[thread overview]
Message-ID: <40e33eff-8893-5917-edeb-1f5a8aee8777@tarantool.org> (raw)
In-Reply-To: <1577533695.773222909@f469.i.mail.ru>

Thanks for the fixes!

The whole patchset LGTM. With some of my answers below.

>     >> 1. Why do we check for version here? Wouldn't it be more straight
>     >> to do something like this?
>     >>
>     >> ================================================================================
>     >>
>     >> diff --git a/src/box/applier.cc b/src/box/applier.cc
>     >> index 3efa2b765..32851506c 100644
>     >> --- a/src/box/applier.cc
>     >> +++ b/src/box/applier.cc
>     >> @@ -1032,6 +1032,7 @@ applier_f(va_list ap)
>     >>  while (!fiber_is_cancelled()) {
>     >>  try {
>     >>  applier_connect(applier);
>     >> +bool was_anon = replication_anon;
>     >>  if (tt_uuid_is_nil(&REPLICASET_UUID)) {
>     >>  /*
>     >>   * Execute JOIN if this is a bootstrap.
>     >> @@ -1046,8 +1047,8 @@ applier_f(va_list ap)
>     >>  else
>     >>  applier_join(applier);
>     >>  }
>     >> -if (applier->version_id >= version_id(1, 7, 0) &&
>     >> - !replication_anon && instance_id == REPLICA_ID_NIL) {
>     >> +if (was_anon && !replication_anon &&
>     >> + instance_id == REPLICA_ID_NIL) {
>     >>  /*
>     >>   * Anonymity was turned off while
>     >>   * we were fetching a snapshot or
>     >>
>     >> ================================================================================
>     >>
>     >> Also I don't understand, why is it needed? From what I see in
>     >> box_set_replication_anon(), we stop existing appliers, when
>     >> change `replication_anon`. So in case `replication_anon` is
>     >> changed, the fiber should be canceled by that moment.
>     >> applier_connect() does fiber_testcancel() via coio after each
>     >> yield.
>     >>
>     >> A register could even harm here, because this instance will do
>     >> register twice - from there, and from a new applier fiber, started
>     >> in box_set_replication_anon().
>     >
>     > True, the version check is pointless and the comment is misleading.
>     > But we need to call applier_register here. In case of a transition
>     > from anon to normal, we have already fetched a snapshot, so we
>     > shouldn't execute join in a new applier, we just execute register and
>     > proceed to subscribe. Fixed.
> 
>     Thanks for the fix. Although I still don't understand how is this
>     code reachable (I tried to add assert(false), and it crashed). It
>     is even covered with a test, what is cool btw.
> 
> 
> 
> 
>     As I see, when we change anon setting, the existing appliers are
>     cancelled. Why does this applier_f fiber still works, if any change
>     in anon and replication cancels all the existing appliers?
> 
> 
> On replication_anon reconfiguration old appliers are killed.
> The new applier is started, it connects, checks for replicaset_uuid, which is
> not nil, since the previous applier has fetched a snapshot,
> proceeds to checking whether the instance_uuid is zero.
> If it is (it is, since we were anonymous), and replication_anon is false
> (and it is, since we have reconfigured it), it proceeds to register.
> Then it subscribes.
> 
> I guess the previous comment that we fixed here was misleading.
> This was how I understood it when I started working on this and
> tried to reuse the same applier without killing it. So, sorry for the
> confusion.

Aha, now I understand. Then looks ok.

>     However the thing below looks like a bug:
> 
>         Master:
>           box.cfg{listen=3301}
>           box.schema.user.grant('guest', 'read,write,execute', 'universe')
>           box.schema.user.grant('guest', 'replication')
> 
>         Replica:
>           box.cfg{replication_anon=true,
>                   read_only=true,
>                   replication_connect_quorum=0,
>                   replication=3301}
> 
>           box.cfg{replication_anon=false}
>           ---
>           - error: Couldn't find an instance to register this replica on.
>           ...
> 
>     This is because of quorum 0. When the quorum is 0, box_set_replication_anon()
>     tries to find a leader immediately after creation of appliers. Note, that the
>     master is alive. I tried to make a simple fix, but failed. Temporary change
>     to quorum 1 won't help, btw, because it can connect to a read-only replica
>     instead of a master.
> 
>     Everything works, when I change quorum to 1, and then try deanon again.
> 
>     Also it works when I do this:
> 
>     @@ -789,7 +789,7 @@ box_set_replication_anon(void)
>       * them can register and others resend a
>       * non-anonymous subscribe.
>       */
>     - box_sync_replication(false);
>     + box_sync_replication(true);
>       /*
>       * Wait until the master has registered this
>       * instance.
> 
>     But I am not sure that it is legal to do so.
> 
> 
> Maybe leave it as is for now? I can't see what the consequences
> will be. If you want to promote an anon replica to master this
> probably means that one of the masters is dead. So if you do
> box_sync_replication(true), your instance will hang on replication_anon
> configuration for replication timeout or whatever (replication_connect_timeout?).
> You'll need to remove the dead master from box.cfg.replication first.
> So this will still require additional configuration mangling.
> 
> If you think it's fine to wait for a timeout on replication_anon reconfiguration,
> feel free to apply the change above. I'm not against it.

I don't know what is more correct to do here. Yes, lets keep it
as is, and look at user feedback.

>     >>> +/*
>     >>> + * Reset all appliers. This will interrupt
>     >>> + * anonymous follow they're in so that one of
>     >>> + * them can register and others resend a
>     >>> + * non-anonymous subscribe.
>     >>> + */
>     >>> +replicaset_foreach(replica) {
>     >>> +struct applier *applier = replica->applier;
>     >>> +if (applier == NULL)
>     >>> +continue;
>     >>> +replica_clear_applier(replica);
>     >>> +applier_stop(applier);
>     >>> +replica->applier_sync_state = APPLIER_DISCONNECTED;
>     >>> +replica_set_applier(replica, applier);
>     >>> +}
>     >>> +/* Choose a master to send register request to. */
>     >>> +struct replica *master = replicaset_leader();
>     >>> +assert(master != NULL && master->applier != NULL);
>     >>> +struct applier *master_applier = master->applier;
>     >>> +
>     >>> +applier_start(master_applier);
>     >>> +
>     >>> +applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY);
>     >>> +applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY);
>     >>> +applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY);
> 
>     Is it possible to always wait for APPLIER_READY and not to
>     pay attention to the prior states?
> 
> No, applier transitions to ready right after CONNECTED.
> 
> 
> 
>     Or at least wait for REGISTERED + READY only? Like this:
> 
> 
> This is ok. I just liked the verbosity of waiting for all the states step by step.

You are free to revert this part of my change, I don't mind.

  reply	other threads:[~2019-12-28 12:15 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-25 12:46 [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas sergepetrenko
2019-12-25 12:46 ` [Tarantool-patches] [PATCH v2 1/5] box: update comment describing join protocol sergepetrenko
2019-12-25 12:46 ` [Tarantool-patches] [PATCH v2 2/5] replication: do not decode replicaset uuid when processing a subscribe sergepetrenko
2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 3/5] applier: split join processing into two stages sergepetrenko
2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons sergepetrenko
2019-12-25 16:00   ` Vladislav Shpilevoy
2019-12-27 18:42   ` Vladislav Shpilevoy
2019-12-28 11:21     ` Sergey Petrenko
2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica sergepetrenko
2019-12-25 18:22   ` Vladislav Shpilevoy
2019-12-27 15:27     ` Sergey Petrenko
2019-12-27 18:42       ` Vladislav Shpilevoy
2019-12-28 11:48         ` Sergey Petrenko
2019-12-28 12:15           ` Vladislav Shpilevoy [this message]
2019-12-30  5:12 ` [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas Kirill Yukhin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=40e33eff-8893-5917-edeb-1f5a8aee8777@tarantool.org \
    --to=v.shpilevoy@tarantool.org \
    --cc=kyukhin@tarantool.org \
    --cc=sergepetrenko@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica.' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox