From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: Konstantin Osipov <kostja@tarantool.org>
Cc: tarantool-patches@freelists.org
Subject: Re: [PATCH v3 07/11] replication: rebootstrap instance on startup if it fell behind
Date: Thu, 19 Jul 2018 13:04:39 +0300 [thread overview]
Message-ID: <20180719100439.bmcm4txbbjb2xjjf@esperanza> (raw)
In-Reply-To: <20180719071903.GG11373@chai>
On Thu, Jul 19, 2018 at 10:19:03AM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [18/07/14 23:50]:
>
> > diff --git a/src/box/box.cc b/src/box/box.cc
> > index b629a4d8..baf30fce 100644
> > --- a/src/box/box.cc
> > +++ b/src/box/box.cc
> > @@ -1797,6 +1797,9 @@ bootstrap(const struct tt_uuid *instance_uuid,
> > /**
> > * Recover the instance from the local directory.
> > * Enter hot standby if the directory is locked.
> > + * Invoke rebootstrap if the instance fell too much
> > + * behind its peers in the replica set and needs
> > + * to be rebootstrapped.
> > */
> > static void
> > local_recovery(const struct tt_uuid *instance_uuid,
> > @@ -1832,6 +1835,12 @@ local_recovery(const struct tt_uuid *instance_uuid,
> > if (wal_dir_lock >= 0) {
> > box_listen();
> > box_sync_replication(replication_connect_timeout, false);
> > +
> > + struct replica *master;
> > + if (replicaset_needs_rejoin(&master)) {
> > + say_info("replica is too old, initiating rejoin");
> > + return bootstrap_from_master(master);
>
> This is say_crit() IMHO.
>
> > +bool
> > +replicaset_needs_rejoin(struct replica **master)
> > +{
> > + replicaset_foreach(replica) {
> > + /*
> > + * Rebootstrap this instance from a master if:
> > + * - the oldest vclock stored on the master is greater
> > + * than or incomparable with the instance vclock
> > + * (so that the instance can't follow the master) and
> > + * - the instance is strictly behind the master (so
> > + * that we won't lose any data by rebootstrapping
> > + * this instance)
> > + */
> > + struct applier *applier = replica->applier;
> > + if (applier != NULL &&
> > + vclock_compare(&applier->remote_status.gc_vclock,
> > + &replicaset.vclock) > 0 &&
> > + vclock_compare(&replicaset.vclock,
> > + &applier->remote_status.vclock) < 0) {
> > + *master = replica;
> > + return true;
>
> I'd love to see a bit more clarity in the log about this decision
> making process. Imagine this function returns 'false' because
> vclocks are incomparable and then replication breaks - it would be
> very hard to diagnose why this happened. You could add some
> logging to this function, but this would change its contract, since
> currently this function has no side effects.
>
> Should it set the diagnostics area in case of error? Log the
> error? Return an extra status code? Please feel free to choose the
> option you think is best.
What about this?
: 2018-07-19 12:53:45.195 [19455] main/101/replica I> can't follow [::1]:47383: required {1: 8} available {1: 12}
: 2018-07-19 12:53:45.195 [19455] main/101/replica C> replica is too old, initiating rebootstrap
: 2018-07-19 12:53:45.195 [19455] main/101/replica I> bootstrapping replica from [::1]:47383
and
: 2018-07-19 12:53:46.546 [19493] main/101/replica I> can't follow [::1]:47383: required {1: 17, 2: 1} available {1: 20}
: 2018-07-19 12:53:46.546 [19493] main/101/replica I> can't rebootstrap from [::1]:47383: replica has local rows: local {1: 17, 2: 1} remote {1: 23}
: 2018-07-19 12:53:46.546 [19493] main/101/replica I> recovery start
The diff is below. Note, apart from logging I also changed the logic
of the rebootstrap trigger a bit: now we proceed to rebootstrap only
if none of the configured masters has rows needed by the instance.
I haven't pushed the patch to the branch yet. If you're okay with it,
I'll squash it into the original patch, rebase the whole series, and
then update the remote branch.
diff --git a/src/box/box.cc b/src/box/box.cc
index baf30fce..7d752c59 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1838,7 +1838,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
struct replica *master;
if (replicaset_needs_rejoin(&master)) {
- say_info("replica is too old, initiating rejoin");
+ say_crit("replica is too old, initiating rebootstrap");
return bootstrap_from_master(master);
}
}
diff --git a/src/box/replication.cc b/src/box/replication.cc
index d61a984f..3ea4f578 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -41,6 +41,7 @@
#include "error.h"
#include "relay.h"
#include "vclock.h" /* VCLOCK_MAX */
+#include "sio.h"
uint32_t instance_id = REPLICA_ID_NIL;
struct tt_uuid INSTANCE_UUID;
@@ -628,28 +629,58 @@ error:
bool
replicaset_needs_rejoin(struct replica **master)
{
+ struct replica *leader = NULL;
replicaset_foreach(replica) {
- /*
- * Rebootstrap this instance from a master if:
- * - the oldest vclock stored on the master is greater
- * than or incomparable with the instance vclock
- * (so that the instance can't follow the master) and
- * - the instance is strictly behind the master (so
- * that we won't lose any data by rebootstrapping
- * this instance)
- */
struct applier *applier = replica->applier;
- if (applier != NULL &&
- vclock_compare(&applier->remote_status.gc_vclock,
- &replicaset.vclock) > 0 &&
- vclock_compare(&replicaset.vclock,
- &applier->remote_status.vclock) < 0) {
- *master = replica;
- return true;
+ if (applier == NULL)
+ continue;
+
+ const struct status *status = &applier->remote_status;
+ if (vclock_compare(&status->gc_vclock, &replicaset.vclock) < 0) {
+ /*
+ * There's at least one master that still stores
+ * WALs needed by this instance. Proceed to local
+ * recovery.
+ */
+ return false;
}
+
+ const char *addr_str = sio_strfaddr(&applier->addr,
+ applier->addr_len);
+ char *local_vclock_str = vclock_to_string(&replicaset.vclock);
+ char *remote_vclock_str = vclock_to_string(&status->vclock);
+ char *gc_vclock_str = vclock_to_string(&status->gc_vclock);
+
+ say_info("can't follow %s: required %s available %s",
+ addr_str, local_vclock_str, gc_vclock_str);
+
+ if (vclock_compare(&replicaset.vclock, &status->vclock) > 0) {
+ /*
+ * Replica has some rows that are not present on
+ * the master. Don't rebootstrap as we don't want
+ * to lose any data.
+ */
+ say_info("can't rebootstrap from %s: "
+ "replica has local rows: local %s remote %s",
+ addr_str, local_vclock_str, remote_vclock_str);
+ goto next;
+ }
+
+ /* Prefer a master with the max vclock. */
+ if (leader == NULL ||
+ vclock_sum(&applier->remote_status.vclock) >
+ vclock_sum(&leader->applier->remote_status.vclock))
+ leader = replica;
+next:
+ free(local_vclock_str);
+ free(remote_vclock_str);
+ free(gc_vclock_str);
}
- *master = NULL;
- return false;
+ if (leader == NULL)
+ return false;
+
+ *master = leader;
+ return true;
}
void
next prev parent reply other threads:[~2018-07-19 10:04 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-07-14 20:49 [PATCH v3 00/11] Replica rejoin Vladimir Davydov
2018-07-14 20:49 ` [PATCH v3 01/11] recovery: clean up WAL dir scan code Vladimir Davydov
2018-07-19 7:08 ` Konstantin Osipov
2018-07-14 20:49 ` [PATCH v3 02/11] xrow: factor out function for decoding vclock Vladimir Davydov
2018-07-19 7:08 ` Konstantin Osipov
2018-07-14 20:49 ` [PATCH v3 03/11] Introduce IPROTO_REQUEST_STATUS command Vladimir Davydov
2018-07-19 7:10 ` Konstantin Osipov
2018-07-19 8:17 ` Vladimir Davydov
2018-07-21 10:25 ` Vladimir Davydov
2018-07-14 20:49 ` [PATCH v3 04/11] Get rid of IPROTO_SERVER_IS_RO Vladimir Davydov
2018-07-19 7:10 ` Konstantin Osipov
2018-07-21 12:07 ` Vladimir Davydov
2018-07-14 20:49 ` [PATCH v3 05/11] gc: keep track of vclocks instead of signatures Vladimir Davydov
2018-07-19 7:11 ` Konstantin Osipov
2018-07-14 20:49 ` [PATCH v3 06/11] Include oldest vclock available on the instance in IPROTO_STATUS Vladimir Davydov
2018-07-19 7:12 ` Konstantin Osipov
2018-07-21 12:07 ` Vladimir Davydov
2018-07-14 20:49 ` [PATCH v3 07/11] replication: rebootstrap instance on startup if it fell behind Vladimir Davydov
2018-07-19 7:19 ` Konstantin Osipov
2018-07-19 10:04 ` Vladimir Davydov [this message]
2018-07-23 20:19 ` Konstantin Osipov
2018-07-27 16:13 ` [PATCH] replication: print master uuid when (re)bootstrapping Vladimir Davydov
2018-07-31 8:34 ` Vladimir Davydov
2018-07-14 20:49 ` [PATCH v3 08/11] vinyl: simplify vylog recovery from backup Vladimir Davydov
2018-07-31 8:21 ` Vladimir Davydov
2018-07-14 20:49 ` [PATCH v3 09/11] vinyl: pass flags to vy_recovery_new Vladimir Davydov
2018-07-21 11:12 ` Vladimir Davydov
2018-07-14 20:49 ` [PATCH v3 10/11] Update test-run Vladimir Davydov
2018-07-21 11:13 ` Vladimir Davydov
2018-07-14 20:49 ` [PATCH v3 11/11] vinyl: implement rebootstrap support Vladimir Davydov
2018-07-31 8:23 ` Vladimir Davydov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180719100439.bmcm4txbbjb2xjjf@esperanza \
--to=vdavydov.dev@gmail.com \
--cc=kostja@tarantool.org \
--cc=tarantool-patches@freelists.org \
--subject='Re: [PATCH v3 07/11] replication: rebootstrap instance on startup if it fell behind' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox