[PATCH v3 07/11] replication: rebootstrap instance on startup if it fell behind

Vladimir Davydov vdavydov.dev at gmail.com
Thu Jul 19 13:04:39 MSK 2018


On Thu, Jul 19, 2018 at 10:19:03AM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev at gmail.com> [18/07/14 23:50]:
> 
> > diff --git a/src/box/box.cc b/src/box/box.cc
> > index b629a4d8..baf30fce 100644
> > --- a/src/box/box.cc
> > +++ b/src/box/box.cc
> > @@ -1797,6 +1797,9 @@ bootstrap(const struct tt_uuid *instance_uuid,
> >  /**
> >   * Recover the instance from the local directory.
> >   * Enter hot standby if the directory is locked.
> > + * Invoke rebootstrap if the instance fell too much
> > + * behind its peers in the replica set and needs
> > + * to be rebootstrapped.
> >   */
> >  static void
> >  local_recovery(const struct tt_uuid *instance_uuid,
> > @@ -1832,6 +1835,12 @@ local_recovery(const struct tt_uuid *instance_uuid,
> >  	if (wal_dir_lock >= 0) {
> >  		box_listen();
> >  		box_sync_replication(replication_connect_timeout, false);
> > +
> > +		struct replica *master;
> > +		if (replicaset_needs_rejoin(&master)) {
> > +			say_info("replica is too old, initiating rejoin");
> > +			return bootstrap_from_master(master);
> 
> This is say_crit() IMHO.
> 
> > +bool
> > +replicaset_needs_rejoin(struct replica **master)
> > +{
> > +	replicaset_foreach(replica) {
> > +		/*
> > +		 * Rebootstrap this instance from a master if:
> > +		 * - the oldest vclock stored on the master is greater
> > +		 *   than or incomparable with the instance vclock
> > +		 *   (so that the instance can't follow the master) and
> > +		 * - the instance is strictly behind the master (so
> > +		 *   that we won't lose any data by rebootstrapping
> > +		 *   this instance)
> > +		 */
> > +		struct applier *applier = replica->applier;
> > +		if (applier != NULL &&
> > +		    vclock_compare(&applier->remote_status.gc_vclock,
> > +				   &replicaset.vclock) > 0 &&
> > +		    vclock_compare(&replicaset.vclock,
> > +				   &applier->remote_status.vclock) < 0) {
> > +			*master = replica;
> > +			return true;
> 
> I'd love to see a bit more clarity in the log about this decision
> making process. Imagine this function returns 'false' because
> vclocks are incomparable and then replication breaks - it would be
> very hard to diagnose why this happened. You could add some
> logging to this function, but this would change its contract, since
> currently this function has no side effects.
> 
> Should it set the diagnostics area in case of error? Log the
> error? Return an extra status code? Please feel free to choose the
> option you think is best.

What about this?

: 2018-07-19 12:53:45.195 [19455] main/101/replica I> can't follow [::1]:47383: required {1: 8} available {1: 12}
: 2018-07-19 12:53:45.195 [19455] main/101/replica C> replica is too old, initiating rebootstrap
: 2018-07-19 12:53:45.195 [19455] main/101/replica I> bootstrapping replica from [::1]:47383

and

: 2018-07-19 12:53:46.546 [19493] main/101/replica I> can't follow [::1]:47383: required {1: 17, 2: 1} available {1: 20}
: 2018-07-19 12:53:46.546 [19493] main/101/replica I> can't rebootstrap from [::1]:47383: replica has local rows: local {1: 17, 2: 1} remote {1: 23}
: 2018-07-19 12:53:46.546 [19493] main/101/replica I> recovery start

The diff is below. Note, apart from logging I also changed the logic
of the rebootstrap trigger a bit: now we proceed to rebootstrap only
if none of the configured masters has rows needed by the instance.

I haven't pushed the patch to the branch yet. If you're okay with it,
I'll squash it into the original patch, rebase the whole series, and
then update the remote branch.

diff --git a/src/box/box.cc b/src/box/box.cc
index baf30fce..7d752c59 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1838,7 +1838,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
 
 		struct replica *master;
 		if (replicaset_needs_rejoin(&master)) {
-			say_info("replica is too old, initiating rejoin");
+			say_crit("replica is too old, initiating rebootstrap");
 			return bootstrap_from_master(master);
 		}
 	}
diff --git a/src/box/replication.cc b/src/box/replication.cc
index d61a984f..3ea4f578 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -41,6 +41,7 @@
 #include "error.h"
 #include "relay.h"
 #include "vclock.h" /* VCLOCK_MAX */
+#include "sio.h"
 
 uint32_t instance_id = REPLICA_ID_NIL;
 struct tt_uuid INSTANCE_UUID;
@@ -628,28 +629,58 @@ error:
 bool
 replicaset_needs_rejoin(struct replica **master)
 {
+	struct replica *leader = NULL;
 	replicaset_foreach(replica) {
-		/*
-		 * Rebootstrap this instance from a master if:
-		 * - the oldest vclock stored on the master is greater
-		 *   than or incomparable with the instance vclock
-		 *   (so that the instance can't follow the master) and
-		 * - the instance is strictly behind the master (so
-		 *   that we won't lose any data by rebootstrapping
-		 *   this instance)
-		 */
 		struct applier *applier = replica->applier;
-		if (applier != NULL &&
-		    vclock_compare(&applier->remote_status.gc_vclock,
-				   &replicaset.vclock) > 0 &&
-		    vclock_compare(&replicaset.vclock,
-				   &applier->remote_status.vclock) < 0) {
-			*master = replica;
-			return true;
+		if (applier == NULL)
+			continue;
+
+		const struct status *status = &applier->remote_status;
+		if (vclock_compare(&status->gc_vclock, &replicaset.vclock) < 0) {
+			/*
+			 * There's at least one master that still stores
+			 * WALs needed by this instance. Proceed to local
+			 * recovery.
+			 */
+			return false;
 		}
+
+		const char *addr_str = sio_strfaddr(&applier->addr,
+						applier->addr_len);
+		char *local_vclock_str = vclock_to_string(&replicaset.vclock);
+		char *remote_vclock_str = vclock_to_string(&status->vclock);
+		char *gc_vclock_str = vclock_to_string(&status->gc_vclock);
+
+		say_info("can't follow %s: required %s available %s",
+			 addr_str, local_vclock_str, gc_vclock_str);
+
+		if (vclock_compare(&replicaset.vclock, &status->vclock) > 0) {
+			/*
+			 * Replica has some rows that are not present on
+			 * the master. Don't rebootstrap as we don't want
+			 * to lose any data.
+			 */
+			say_info("can't rebootstrap from %s: "
+				 "replica has local rows: local %s remote %s",
+				 addr_str, local_vclock_str, remote_vclock_str);
+			goto next;
+		}
+
+		/* Prefer a master with the max vclock. */
+		if (leader == NULL ||
+		    vclock_sum(&applier->remote_status.vclock) >
+		    vclock_sum(&leader->applier->remote_status.vclock))
+			leader = replica;
+next:
+		free(local_vclock_str);
+		free(remote_vclock_str);
+		free(gc_vclock_str);
 	}
-	*master = NULL;
-	return false;
+	if (leader == NULL)
+		return false;
+
+	*master = leader;
+	return true;
 }
 
 void



More information about the Tarantool-patches mailing list