Tarantool development patches archive
* [Tarantool-patches] [PATCH v4 0/4] replication: fix local space tracking
@ 2020-03-27 10:20 Serge Petrenko
  2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 1/4] vclock: add an ability to reset individual clock components Serge Petrenko
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Serge Petrenko @ 2020-03-27 10:20 UTC (permalink / raw)
  To: v.shpilevoy, kostja.osipov; +Cc: tarantool-patches

https://github.com/tarantool/tarantool/issues/4114
https://github.com/tarantool/tarantool/tree/sp/gh-4114-local-space-replication

Changes in v4:
 - review fixes as per Kostja's comments

Changes in v3:
 - rewrote the patches on top of the WAL GC rework
   to avoid the matrix clock

Changes in v2:
 - used Georgy's patches introducing the matrix
   clock and the WAL GC rework to fix the problem
   with ordering GC consumers by vclock signature

Serge Petrenko (4):
  vclock: add an ability to reset individual clock components
  replication: hide 0-th vclock components in replication responses
  gc: rely on minimal vclock components instead of signatures
  box: start counting local space requests separately

 src/box/applier.cc                            |   5 +-
 src/box/box.cc                                |  32 ++++-
 src/box/gc.c                                  |  41 +++---
 src/box/recovery.cc                           |   8 +-
 src/box/relay.cc                              |  23 ++--
 src/box/vclock.c                              |  15 +++
 src/box/vclock.h                              |  91 +++++++++++++
 src/box/wal.c                                 |  16 ++-
 test/replication/anon.result                  |   5 +
 test/replication/anon.test.lua                |   2 +
 test/replication/autobootstrap.result         |   6 +
 test/replication/autobootstrap.test.lua       |   2 +
 test/replication/before_replace.result        |   8 +-
 test/replication/before_replace.test.lua      |   4 +-
 .../gh-4114-local-space-replication.result    | 125 ++++++++++++++++++
 .../gh-4114-local-space-replication.test.lua  |  48 +++++++
 test/replication/local_spaces.result          |   4 +
 test/replication/local_spaces.test.lua        |   3 +
 test/replication/misc.result                  |   6 +
 test/replication/misc.test.lua                |   2 +
 test/replication/quorum.result                |   6 +
 test/replication/quorum.test.lua              |   2 +
 test/replication/replica_rejoin.result        |   9 ++
 test/replication/replica_rejoin.test.lua      |   3 +
 test/replication/skip_conflict_row.result     |   3 +
 test/replication/skip_conflict_row.test.lua   |   1 +
 test/replication/suite.cfg                    |   1 +
 test/vinyl/errinj.result                      |   5 +
 test/vinyl/errinj.test.lua                    |   4 +
 29 files changed, 440 insertions(+), 40 deletions(-)
 create mode 100644 test/replication/gh-4114-local-space-replication.result
 create mode 100644 test/replication/gh-4114-local-space-replication.test.lua

-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v4 1/4] vclock: add an ability to reset individual clock components
  2020-03-27 10:20 [Tarantool-patches] [PATCH v4 0/4] replication: fix local space tracking Serge Petrenko
@ 2020-03-27 10:20 ` Serge Petrenko
  2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 2/4] replication: hide 0-th vclock components in replication responses Serge Petrenko
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Serge Petrenko @ 2020-03-27 10:20 UTC (permalink / raw)
  To: v.shpilevoy, kostja.osipov; +Cc: tarantool-patches

Anonymous replicas use the 0th vclock component to sign local rows.
vclock_reset() allows zeroing out the 0th vclock component when an
anonymous replica is promoted to a normal one and sends its vclock to
other joining instances, so as not to pollute their own 0th vclock
components.
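
A minimal usage sketch (the caller is hypothetical, mirroring what the
next patch does) of clearing the 0th component before a vclock leaves
the instance:

    /* Send our vclock without the anonymous-replica component. */
    struct vclock out;
    vclock_copy(&out, &replicaset.vclock);
    /* lsn == 0 drops the component and clears its bit in the map. */
    vclock_reset(&out, 0, 0);
    xrow_encode_vclock(&xrow, &out);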

Follow-up #3186
Prerequisite #4114
---
 src/box/vclock.c | 15 +++++++++++++++
 src/box/vclock.h | 13 +++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/src/box/vclock.c b/src/box/vclock.c
index 90ae27591..ac3e9fccd 100644
--- a/src/box/vclock.c
+++ b/src/box/vclock.c
@@ -37,6 +37,21 @@
 #include "diag.h"
 #include "tt_static.h"
 
+void
+vclock_reset(struct vclock *vclock, uint32_t replica_id, int64_t lsn)
+{
+	assert(lsn >= 0);
+	assert(replica_id < VCLOCK_MAX);
+	vclock->signature -= vclock_get(vclock, replica_id);
+	if (lsn == 0) {
+		vclock->map &= ~(1 << replica_id);
+		return;
+	}
+	vclock->lsn[replica_id] = lsn;
+	vclock->map |= 1 << replica_id;
+	vclock->signature += lsn;
+}
+
 int64_t
 vclock_follow(struct vclock *vclock, uint32_t replica_id, int64_t lsn)
 {
diff --git a/src/box/vclock.h b/src/box/vclock.h
index 79e5a1bc0..aeb8ca66b 100644
--- a/src/box/vclock.h
+++ b/src/box/vclock.h
@@ -211,6 +211,19 @@ vclock_calc_sum(const struct vclock *vclock)
 	return sum;
 }
 
+/**
+ * Set vclock component represented by replica id to the desired
+ * value. Can be used to decrease stored LSN value for the given
+ * replica id while maintaining a valid signature or in the same
+ * manner as vclock_follow.
+ *
+ * @param vclock Vector clock.
+ * @param replica_id Replica identifier.
+ * @param lsn Lsn to set
+ */
+void
+vclock_reset(struct vclock *vclock, uint32_t replica_id, int64_t lsn);
+
 static inline int64_t
 vclock_sum(const struct vclock *vclock)
 {
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v4 2/4] replication: hide 0-th vclock components in replication responses
  2020-03-27 10:20 [Tarantool-patches] [PATCH v4 0/4] replication: fix local space tracking Serge Petrenko
  2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 1/4] vclock: add an ability to reset individual clock components Serge Petrenko
@ 2020-03-27 10:20 ` Serge Petrenko
  2020-03-28  5:57   ` Konstantin Osipov
  2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 3/4] gc: rely on minimal vclock components instead of signatures Serge Petrenko
  2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 4/4] box: start counting local space requests separately Serge Petrenko
  3 siblings, 1 reply; 14+ messages in thread
From: Serge Petrenko @ 2020-03-27 10:20 UTC (permalink / raw)
  To: v.shpilevoy, kostja.osipov; +Cc: tarantool-patches

If an anonymous replica is promoted to a normal one and later becomes a
replication master, its vclock contains a non-empty zero component
tracking the local changes made while the replica was anonymous. There
is no need to pollute the joining instance's vclock with our non-empty
0th component.
When an anonymous replica reports its status to a remote instance, it
should also hide its 0th vclock component.

This is needed for backward compatibility with old instances, which by
default don't ignore the 0th vclock component coming from a remote
instance.
Also make sure that new instances ignore the 0th vclock component.
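
A short sketch (LSN values assumed for illustration, helpers from
src/box/vclock.h) of how the 0th component would otherwise leak into
comparisons:

    struct vclock a, b;
    vclock_create(&a);
    vclock_create(&b);
    vclock_follow(&a, 0, 7); /* rows written while 'a' was anonymous */
    vclock_follow(&a, 1, 5);
    vclock_follow(&b, 1, 5);
    /* 'a' looks ahead of 'b' only because of its local component: */
    assert(vclock_compare(&a, &b) == 1);
    /* ...while the replicated data is actually identical: */
    assert(vclock_compare_ignore0(&a, &b) == 0);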

Follow-up #3186
Prerequisite #4114
---
 src/box/applier.cc             |  5 ++++-
 src/box/box.cc                 | 12 ++++++++++--
 src/box/relay.cc               |  8 ++++++--
 test/replication/anon.result   |  5 +++++
 test/replication/anon.test.lua |  2 ++
 5 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 47a26c366..2e765ca12 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -173,7 +173,10 @@ applier_writer_f(va_list ap)
 			continue;
 		try {
 			struct xrow_header xrow;
-			xrow_encode_vclock(&xrow, &replicaset.vclock);
+			struct vclock vclock;
+			vclock_copy(&vclock, &replicaset.vclock);
+			vclock_reset(&vclock, 0, 0);
+			xrow_encode_vclock(&xrow, &vclock);
 			coio_write_xrow(&io, &xrow);
 		} catch (SocketError *e) {
 			/*
diff --git a/src/box/box.cc b/src/box/box.cc
index 765d64678..87e836432 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1540,6 +1540,7 @@ box_process_fetch_snapshot(struct ev_io *io, struct xrow_header *header)
 	/* Remember master's vclock after the last request */
 	struct vclock stop_vclock;
 	vclock_copy(&stop_vclock, &replicaset.vclock);
+	vclock_reset(&stop_vclock, 0, 0);
 
 	/* Send end of snapshot data marker */
 	struct xrow_header row;
@@ -1621,7 +1622,9 @@ box_process_register(struct ev_io *io, struct xrow_header *header)
 
 	struct xrow_header row;
 	/* Send end of WAL stream marker */
-	xrow_encode_vclock_xc(&row, &replicaset.vclock);
+	vclock_copy(&vclock, &replicaset.vclock);
+	vclock_reset(&vclock, 0, 0);
+	xrow_encode_vclock_xc(&row, &vclock);
 	row.sync = header->sync;
 	coio_write_xrow(io, &row);
 
@@ -1773,7 +1776,11 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
 	say_info("final data sent.");
 
 	/* Send end of WAL stream marker */
-	xrow_encode_vclock_xc(&row, &replicaset.vclock);
+	struct vclock vclock;
+	vclock_copy(&vclock, &replicaset.vclock);
+	vclock_reset(&vclock, 0, 0);
+	xrow_encode_vclock_xc(&row, &vclock);
+
 	row.sync = header->sync;
 	coio_write_xrow(io, &row);
 
@@ -1905,6 +1912,7 @@ box_process_vote(struct ballot *ballot)
 	ballot->is_loading = is_ro;
 	vclock_copy(&ballot->vclock, &replicaset.vclock);
 	vclock_copy(&ballot->gc_vclock, &gc.vclock);
+	vclock_reset(&ballot->gc_vclock, 0, 0);
 }
 
 /** Insert a new cluster into _schema */
diff --git a/src/box/relay.cc b/src/box/relay.cc
index c634348a4..9abf35fdf 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -322,7 +322,11 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock)
 
 	/* Respond to the JOIN request with the current vclock. */
 	struct xrow_header row;
-	xrow_encode_vclock_xc(&row, vclock);
+	struct vclock subst_vclock;
+	vclock_copy(&subst_vclock, vclock);
+	/* Zero out the 0-th vclock component. */
+	vclock_reset(&subst_vclock, 0, 0);
+	xrow_encode_vclock_xc(&row, &subst_vclock);
 	row.sync = sync;
 	coio_write_xrow(&relay->io, &row);
 
@@ -464,7 +468,7 @@ relay_schedule_pending_gc(struct relay *relay, const struct vclock *vclock)
 		 * the greater signatures is due to changes pulled
 		 * from other members of the cluster.
 		 */
-		if (vclock_compare(&curr->vclock, vclock) > 0)
+		if (vclock_compare_ignore0(&curr->vclock, vclock) > 0)
 			break;
 		stailq_shift(&relay->pending_gc);
 		free(gc_msg);
diff --git a/test/replication/anon.result b/test/replication/anon.result
index 88061569f..cbbeeef09 100644
--- a/test/replication/anon.result
+++ b/test/replication/anon.result
@@ -187,6 +187,11 @@ a > 0
  | ---
  | - true
  | ...
+-- 0-th vclock component isn't propagated across the cluster.
+box.info.vclock[0]
+ | ---
+ | - null
+ | ...
 test_run:cmd('switch default')
  | ---
  | - true
diff --git a/test/replication/anon.test.lua b/test/replication/anon.test.lua
index 8a8d15c18..627dc5c8e 100644
--- a/test/replication/anon.test.lua
+++ b/test/replication/anon.test.lua
@@ -66,6 +66,8 @@ test_run:cmd('switch replica_anon2')
 a = box.info.vclock[1]
 -- The instance did fetch a snapshot.
 a > 0
+-- 0-th vclock component isn't propagated across the cluster.
+box.info.vclock[0]
 test_run:cmd('switch default')
 box.space.test:insert{2}
 test_run:cmd("switch replica_anon2")
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v4 3/4] gc: rely on minimal vclock components instead of signatures
  2020-03-27 10:20 [Tarantool-patches] [PATCH v4 0/4] replication: fix local space tracking Serge Petrenko
  2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 1/4] vclock: add an ability to reset individual clock components Serge Petrenko
  2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 2/4] replication: hide 0-th vclock components in replication responses Serge Petrenko
@ 2020-03-27 10:20 ` Serge Petrenko
  2020-03-28  6:03   ` Konstantin Osipov
  2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 4/4] box: start counting local space requests separately Serge Petrenko
  3 siblings, 1 reply; 14+ messages in thread
From: Serge Petrenko @ 2020-03-27 10:20 UTC (permalink / raw)
  To: v.shpilevoy, kostja.osipov; +Cc: tarantool-patches

The current WAL GC implementation tracks consumers (i.e. remote
replicas) by their vclock signature, which is the sum of all vclock
components. This approach is flawed, which can be shown with a small,
admittedly synthetic, example.
Say you have 2 masters, A and B, with ids 1 and 2 respectively, and a
replica C with id 3. C replicates from both A and B, and there is no
replication between A and B (say, the instances were reconfigured to
not replicate from each other). Now say replica C has followed A and B
to vclock {1:5, 2:13}. At the same time, A has lsn 10 and B has lsn 15.
A and B do not know about each other's changes, so A's vclock is {1:10}
and B's vclock is {2:15}. Now imagine A makes a snapshot and creates a
new xlog with signature 10. A's directory will look like:
00…000.xlog 00…010.snap 00…010.xlog
Replica C reports its vclock {1:5, 2:13} to A, and A uses this vclock
to update the corresponding GC consumer. Since signatures are used, the
GC consumer is assigned signature 5 + 13 = 18. This is greater than the
signature of the last xlog on A (which is 10), so A assumes the
previous xlog (00…000.xlog) can be deleted. Actually, the replica still
needs 00…000.xlog, because it contains the rows corresponding to
vclocks {1:6} - {1:10}, which haven't been replicated yet.

If GC consumers used vclocks instead of vclock signatures, no such
problem would arise. The replica would report its vclock {1:5, 2:13}.
This vclock is NOT strictly greater than A's most recent xlog vclock
({1:10}), so the previous log would be kept until the replica reports a
vclock {1:10, 2:something}, or {1:11, …}, and so on.

Rewrite GC to perform cleanup based on the per-component minimum over
all consumer vclocks instead of just comparing vclock signatures.
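
The example above, restated as a sketch against the vclock API (helpers
from src/box/vclock.h; the numbers are the assumed ones from the
description):

    struct vclock xlog_vclock, consumer;
    vclock_create(&xlog_vclock);
    vclock_follow(&xlog_vclock, 1, 10); /* A's newest xlog: {1: 10} */
    vclock_create(&consumer);
    vclock_follow(&consumer, 1, 5);     /* C's report: {1: 5, 2: 13} */
    vclock_follow(&consumer, 2, 13);
    /* The signature check wrongly allows dropping the older xlog: */
    assert(vclock_sum(&consumer) == 18 && vclock_sum(&xlog_vclock) == 10);
    /* A per-component check shows rows {1:6}..{1:10} are still needed: */
    assert(vclock_get(&consumer, 1) < vclock_get(&xlog_vclock, 1));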

Prerequisite #4114
---
 src/box/gc.c     | 41 +++++++++++++++----------
 src/box/vclock.h | 78 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 104 insertions(+), 15 deletions(-)

diff --git a/src/box/gc.c b/src/box/gc.c
index f5c387f9d..4eae6ef3b 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -66,16 +66,15 @@ static int
 gc_checkpoint_fiber_f(va_list);
 
 /**
- * Comparator used for ordering gc_consumer objects by signature
- * in a binary tree.
+ * Comparator used for ordering gc_consumer objects
+ * lexicographically by their vclock in a binary tree.
  */
 static inline int
 gc_consumer_cmp(const struct gc_consumer *a, const struct gc_consumer *b)
 {
-	if (vclock_sum(&a->vclock) < vclock_sum(&b->vclock))
-		return -1;
-	if (vclock_sum(&a->vclock) > vclock_sum(&b->vclock))
-		return 1;
+	int rc = vclock_lex_compare(&a->vclock, &b->vclock);
+	if (rc != 0)
+		return rc;
 	if ((intptr_t)a < (intptr_t)b)
 		return -1;
 	if ((intptr_t)a > (intptr_t)b)
@@ -192,14 +191,26 @@ gc_run_cleanup(void)
 	 * Note, we must keep all WALs created after the
 	 * oldest checkpoint, even if no consumer needs them.
 	 */
-	const struct vclock *vclock = (gc_tree_empty(&gc.consumers) ? NULL :
-				       &gc_tree_first(&gc.consumers)->vclock);
-	if (vclock == NULL ||
-	    vclock_sum(vclock) > vclock_sum(&checkpoint->vclock))
-		vclock = &checkpoint->vclock;
-
-	if (vclock_sum(vclock) > vclock_sum(&gc.vclock)) {
-		vclock_copy(&gc.vclock, vclock);
+	struct vclock min_vclock;
+	struct gc_consumer *consumer = gc_tree_first(&gc.consumers);
+	/*
+	 * Vclock of the oldest WAL row to keep is a by-component
+	 * minimum of all consumer vclocks and the oldest
+	 * checkpoint vclock. This ensures that all rows needed by
+	 * at least one consumer are kept.
+	 */
+	vclock_copy(&min_vclock, &checkpoint->vclock);
+	while (consumer != NULL) {
+		/*
+		 * Consumers will never need rows signed
+		 * with a zero instance id (local rows).
+		 */
+		vclock_min_ignore0(&min_vclock, &consumer->vclock);
+		consumer = gc_tree_next(&gc.consumers, consumer);
+	}
+
+	if (vclock_sum(&min_vclock) > vclock_sum(&gc.vclock)) {
+		vclock_copy(&gc.vclock, &min_vclock);
 		run_wal_gc = true;
 	}
 
@@ -222,7 +233,7 @@ gc_run_cleanup(void)
 	if (run_engine_gc)
 		engine_collect_garbage(&checkpoint->vclock);
 	if (run_wal_gc)
-		wal_collect_garbage(vclock);
+		wal_collect_garbage(&min_vclock);
 }
 
 static int
diff --git a/src/box/vclock.h b/src/box/vclock.h
index aeb8ca66b..722a8d6c0 100644
--- a/src/box/vclock.h
+++ b/src/box/vclock.h
@@ -337,6 +337,84 @@ vclock_compare_ignore0(const struct vclock *a, const struct vclock *b)
 	return vclock_compare_generic(a, b, true);
 }
 
+/**
+ * Compare vclocks lexicographically.
+ * The following affirmation holds: if a < b in terms of
+ * vclock_compare, then a < b in terms of vclock_lex_compare.
+ * However, a < b in terms of vclock_lex_compare doesn't mean
+ * that a < b in terms of vclock_compare. Example:
+ * {1:5, 2:10} < {1:7, 2:3} (vclock_lex_compare), but the vclocks
+ * are incomparable in terms of vclock_compare.
+ * All vclocks are comparable lexicographically.
+ *
+ * \param a vclock
+ * \param b vclock
+ * \retval 1 if \a a is lexicographically greater than \a b
+ * \retval -1 if \a a is lexicographically less than \a b
+ * \retval 0 if vclocks are equal
+ */
+static inline int
+vclock_lex_compare(const struct vclock *a, const struct vclock *b)
+{
+	vclock_map_t map = a->map | b->map;
+	struct bit_iterator it;
+	bit_iterator_init(&it, &map, sizeof(map), true);
+	for(size_t replica_id = bit_iterator_next(&it); replica_id < VCLOCK_MAX;
+	    replica_id = bit_iterator_next(&it)) {
+		int64_t lsn_a = vclock_get(a, replica_id);
+		int64_t lsn_b = vclock_get(b, replica_id);
+		if (lsn_a < lsn_b) return -1;
+		if (lsn_a > lsn_b) return 1;
+	}
+	return 0;
+}
+
+/**
+ * Compute the componentwise minimum of vclocks \a a and \a b,
+ * storing the result in \a a.
+ *
+ * \param[in,out] a vclock containing the minimum components.
+ * \param b vclock to compare with.
+ * \param ignore_zero whether to ignore 0th vclock component
+ *                    or not.
+ */
+static inline void
+vclock_min_generic(struct vclock *a, const struct vclock *b, bool ignore_zero)
+{
+	vclock_map_t map = a->map | b->map;
+	struct bit_iterator it;
+	bit_iterator_init(&it, &map, sizeof(map), true);
+	size_t replica_id = bit_iterator_next(&it);
+	if (replica_id == 0 && ignore_zero)
+		replica_id = bit_iterator_next(&it);
+
+	for( ; replica_id < VCLOCK_MAX; replica_id = bit_iterator_next(&it)) {
+		int64_t lsn_a = vclock_get(a, replica_id);
+		int64_t lsn_b = vclock_get(b, replica_id);
+		if (lsn_a <= lsn_b)
+			continue;
+		vclock_reset(a, replica_id, lsn_b);
+	}
+}
+
+/**
+ * \sa vclock_min_generic
+ */
+static inline void
+vclock_min(struct vclock *a, const struct vclock *b)
+{
+	vclock_min_generic(a, b, false);
+}
+
+/**
+ * \sa vclock_min_generic
+ */
+static inline void
+vclock_min_ignore0(struct vclock *a, const struct vclock *b)
+{
+	vclock_min_generic(a, b, true);
+}
+
 /**
  * @brief vclockset - a set of vclocks
  */
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v4 4/4] box: start counting local space requests separately
  2020-03-27 10:20 [Tarantool-patches] [PATCH v4 0/4] replication: fix local space tracking Serge Petrenko
                   ` (2 preceding siblings ...)
  2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 3/4] gc: rely on minimal vclock components instead of signatures Serge Petrenko
@ 2020-03-27 10:20 ` Serge Petrenko
  2020-03-28  6:17   ` Konstantin Osipov
  2020-03-28 16:23   ` Konstantin Osipov
  3 siblings, 2 replies; 14+ messages in thread
From: Serge Petrenko @ 2020-03-27 10:20 UTC (permalink / raw)
  To: v.shpilevoy, kostja.osipov; +Cc: tarantool-patches

Sign local space requests with a zero instance id. This makes it
possible to keep local changes separate from the changes that should be
visible to the whole cluster, and to stop sending NOPs to replicas just
to make them follow the local vclock.

Moreover, it fixes the following bug with local spaces and replication.
With a master and a replica set up, the replica may still write to
local spaces even if it is read-only. Before this patch, local space
operations promoted the instance's lsn. Eventually, the master would
have vclock {1:x} and the replica would have vclock {1:x, 2:y}, where
y > 0, due to local space requests performed by the replica.
If a snapshot happens on the replica, the replica deletes the .xlog
files preceding the snapshot, since no one replicates from it and thus
it doesn't have any registered GC consumers.
From this point on, every attempt to configure replication from the
replica to the master fails: the master tries to fetch the records that
account for the difference between the two vclocks, {1:x} vs
{1:x, 2:y}, even though it will never need the rows in the range
{2:1} - {2:y}, since they would be turned into NOPs during replication.

Starting from this patch, in the situation described above, the
replica's clock will be {0:y, 1:x}, and, since local space requests are
no longer replicated at all, the master will be able to follow the
replica, turning the configuration into master-master.
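
A sketch of the resulting clocks (x = 10 and y = 3 are assumed values,
helpers from src/box/vclock.h):

    struct vclock master, replica;
    vclock_create(&master);
    vclock_follow(&master, 1, 10);  /* x replicated rows          */
    vclock_create(&replica);
    vclock_follow(&replica, 0, 3);  /* y local-space rows         */
    vclock_follow(&replica, 1, 10); /* everything from the master */
    /* Ignoring component 0 the instances are in sync, so the master
     * can follow the replica without requesting any missing rows: */
    assert(vclock_compare_ignore0(&replica, &master) == 0);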

Closes #4114
---
 src/box/box.cc                                |  20 +++
 src/box/recovery.cc                           |   8 +-
 src/box/relay.cc                              |  15 +--
 src/box/wal.c                                 |  16 ++-
 test/replication/autobootstrap.result         |   6 +
 test/replication/autobootstrap.test.lua       |   2 +
 test/replication/before_replace.result        |   8 +-
 test/replication/before_replace.test.lua      |   4 +-
 .../gh-4114-local-space-replication.result    | 125 ++++++++++++++++++
 .../gh-4114-local-space-replication.test.lua  |  48 +++++++
 test/replication/local_spaces.result          |   4 +
 test/replication/local_spaces.test.lua        |   3 +
 test/replication/misc.result                  |   6 +
 test/replication/misc.test.lua                |   2 +
 test/replication/quorum.result                |   6 +
 test/replication/quorum.test.lua              |   2 +
 test/replication/replica_rejoin.result        |   9 ++
 test/replication/replica_rejoin.test.lua      |   3 +
 test/replication/skip_conflict_row.result     |   3 +
 test/replication/skip_conflict_row.test.lua   |   1 +
 test/replication/suite.cfg                    |   1 +
 test/vinyl/errinj.result                      |   5 +
 test/vinyl/errinj.test.lua                    |   4 +
 23 files changed, 281 insertions(+), 20 deletions(-)
 create mode 100644 test/replication/gh-4114-local-space-replication.result
 create mode 100644 test/replication/gh-4114-local-space-replication.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index 87e836432..85c6cd72a 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1882,6 +1882,26 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
 	say_info("remote vclock %s local vclock %s",
 		 vclock_to_string(&replica_clock), vclock_to_string(&vclock));
 
+	/*
+	 * Replica clock is used in gc state and recovery
+	 * initialization, so we need to replace the remote 0-th
+	 * component with our own one. This makes recovery work
+	 * correctly: we're trying to recover from an xlog whose
+	 * vclock is smaller than remote replica clock in each
+	 * component, excluding the 0-th component. This leads to
+	 * finding the correct WAL, if it exists, since we do not
+	 * need to recover local rows (the ones that contribute
+	 * to the 0-th vclock component). It would be bad to set
+	 * 0-th vclock component to a smaller value, since it
+	 * would unnecessarily require additional WALs, which may
+	 * have already been deleted.
+	 * Same stands for gc. Remote instances do not need this
+	 * instance's local rows, and after gc was reworked to
+	 * track individual vclock components instead of
+	 * signatures it's safe to set the local component to the
+	 * most recent value.
+	 */
+	vclock_reset(&replica_clock, 0, vclock_get(&replicaset.vclock, 0));
 	/*
 	 * Process SUBSCRIBE request via replication relay
 	 * Send current recovery vector clock as a marker
diff --git a/src/box/recovery.cc b/src/box/recovery.cc
index 64aa467b1..d1a503cfc 100644
--- a/src/box/recovery.cc
+++ b/src/box/recovery.cc
@@ -261,11 +261,9 @@ recover_xlog(struct recovery *r, struct xstream *stream,
 			continue; /* already applied, skip */
 
 		/*
-		 * All rows in xlog files have an assigned
-		 * replica id. The only exception is anonymous
-		 * replica, which has a zero instance id.
-		 * In this case the only rows from such an instance
-		 * can be for the local spaces.
+		 * All rows in xlog files have an assigned replica
+		 * id. The only exception is local rows, which
+		 * are signed with a zero replica id.
 		 */
 		assert(row.replica_id != 0 || row.group_id == GROUP_LOCAL);
 		/*
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 9abf35fdf..aa17528bb 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -760,16 +760,15 @@ relay_send_row(struct xstream *stream, struct xrow_header *packet)
 {
 	struct relay *relay = container_of(stream, struct relay, stream);
 	assert(iproto_type_is_dml(packet->type));
-	/*
-	 * Transform replica local requests to IPROTO_NOP so as to
-	 * promote vclock on the replica without actually modifying
-	 * any data.
-	 */
 	if (packet->group_id == GROUP_LOCAL) {
 		/*
-		 * Replica-local requests generated while replica
-		 * was anonymous have a zero instance id. Just
-		 * skip all these rows.
+		 * We do not relay replica-local rows to other
+		 * instances, since we started signing them with
+		 * a zero instance id. However, if replica-local
+		 * rows signed with a non-zero id are present in
+		 * our WAL, we still need to relay them as NOPs in
+		 * order to correctly promote the vclock on the
+		 * replica.
 		 */
 		if (packet->replica_id == REPLICA_ID_NIL)
 			return;
diff --git a/src/box/wal.c b/src/box/wal.c
index 3b094b0e8..a74bdecd9 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -953,13 +953,19 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
 	/** Assign LSN to all local rows. */
 	for ( ; row < end; row++) {
 		if ((*row)->replica_id == 0) {
-			(*row)->lsn = vclock_inc(vclock_diff, instance_id) +
-				      vclock_get(base, instance_id);
 			/*
-			 * Note, an anonymous replica signs local
-			 * rows whith a zero instance id.
+			 * All rows representing local space data
+			 * manipulations are signed with a zero
+			 * instance id. This is also true for
+			 * anonymous replicas, since they are
+			 * only capable of writing to local and
+			 * temporary spaces.
 			 */
-			(*row)->replica_id = instance_id;
+			if ((*row)->group_id != GROUP_LOCAL)
+				(*row)->replica_id = instance_id;
+
+			(*row)->lsn = vclock_inc(vclock_diff, (*row)->replica_id) +
+						 vclock_get(base, (*row)->replica_id);
 			/* Use lsn of the first local row as transaction id. */
 			tsn = tsn == 0 ? (*row)->lsn : tsn;
 			(*row)->tsn = tsn;
diff --git a/test/replication/autobootstrap.result b/test/replication/autobootstrap.result
index 743982d47..6918e23ea 100644
--- a/test/replication/autobootstrap.result
+++ b/test/replication/autobootstrap.result
@@ -162,6 +162,9 @@ box.schema.user.revoke('test_u', 'create', 'space')
 vclock = test_run:get_vclock('autobootstrap1')
 ---
 ...
+vclock[0] = nil
+---
+...
 _ = test_run:wait_vclock("autobootstrap2", vclock)
 ---
 ...
@@ -206,6 +209,9 @@ test_run:wait_fullmesh(SERVERS)
 vclock = test_run:get_vclock("autobootstrap1")
 ---
 ...
+vclock[0] = nil
+---
+...
 _ = test_run:wait_vclock("autobootstrap2", vclock)
 ---
 ...
diff --git a/test/replication/autobootstrap.test.lua b/test/replication/autobootstrap.test.lua
index 055ea4277..f8bb1c74a 100644
--- a/test/replication/autobootstrap.test.lua
+++ b/test/replication/autobootstrap.test.lua
@@ -74,6 +74,7 @@ box.schema.user.revoke('test_u', 'create', 'space')
 
 -- Synchronize
 vclock = test_run:get_vclock('autobootstrap1')
+vclock[0] = nil
 _ = test_run:wait_vclock("autobootstrap2", vclock)
 _ = test_run:wait_vclock("autobootstrap3", vclock)
 
@@ -95,6 +96,7 @@ _ = test_run:cmd("switch default")
 test_run:wait_fullmesh(SERVERS)
 
 vclock = test_run:get_vclock("autobootstrap1")
+vclock[0] = nil
 _ = test_run:wait_vclock("autobootstrap2", vclock)
 _ = test_run:wait_vclock("autobootstrap3", vclock)
 
diff --git a/test/replication/before_replace.result b/test/replication/before_replace.result
index ced40547e..61a552e84 100644
--- a/test/replication/before_replace.result
+++ b/test/replication/before_replace.result
@@ -266,7 +266,13 @@ box.space.test:replace{1, 1}
 ---
 - [1, 1]
 ...
-_ = test_run:wait_vclock('replica', test_run:get_vclock('default'))
+vclock = test_run:get_vclock('default')
+---
+...
+vclock[0] = nil
+---
+...
+_ = test_run:wait_vclock('replica', vclock)
 ---
 ...
 -- Check that replace{1, 2} coming from the master was suppressed
diff --git a/test/replication/before_replace.test.lua b/test/replication/before_replace.test.lua
index bcc6dc00d..ecd8ff044 100644
--- a/test/replication/before_replace.test.lua
+++ b/test/replication/before_replace.test.lua
@@ -101,7 +101,9 @@ _ = box.space.test:before_replace(function(old, new) return new:update{{'+', 2,
 test_run:cmd("switch default")
 box.space.test:replace{1, 1}
 
-_ = test_run:wait_vclock('replica', test_run:get_vclock('default'))
+vclock = test_run:get_vclock('default')
+vclock[0] = nil
+_ = test_run:wait_vclock('replica', vclock)
 
 -- Check that replace{1, 2} coming from the master was suppressed
 -- by the before_replace trigger on the replica.
diff --git a/test/replication/gh-4114-local-space-replication.result b/test/replication/gh-4114-local-space-replication.result
new file mode 100644
index 000000000..e524c9a1b
--- /dev/null
+++ b/test/replication/gh-4114-local-space-replication.result
@@ -0,0 +1,125 @@
+-- test-run result file version 2
+env = require('test_run')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+
+--
+-- gh-4114. Account local space changes in a separate vclock
+-- component. Do not replicate local space changes, even as NOPs.
+--
+
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+_ = box.schema.space.create('test', {is_local=true})
+ | ---
+ | ...
+_ = box.space.test:create_index("pk")
+ | ---
+ | ...
+
+test_run:cmd('create server replica with rpl_master=default, script "replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+a = box.info.vclock[0] or 0
+ | ---
+ | ...
+for i = 1,10 do box.space.test:insert{i} end
+ | ---
+ | ...
+box.info.vclock[0] == a + 10 or box.info.vclock[0] - a
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('switch replica')
+ | ---
+ | - true
+ | ...
+box.info.vclock[0]
+ | ---
+ | - null
+ | ...
+box.cfg{checkpoint_count=1}
+ | ---
+ | ...
+box.space.test:select{}
+ | ---
+ | - []
+ | ...
+box.space.test:insert{1}
+ | ---
+ | - [1]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+box.space.test:insert{2}
+ | ---
+ | - [2]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+box.space.test:insert{3}
+ | ---
+ | - [3]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+
+box.info.vclock[0]
+ | ---
+ | - 3
+ | ...
+
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('set variable repl_source to "replica.listen"')
+ | ---
+ | - true
+ | ...
+
+box.cfg{replication=repl_source}
+ | ---
+ | ...
+test_run:wait_cond(function()\
+                       return box.info.replication[2].upstream and\
+                       box.info.replication[2].upstream.status == 'follow'\
+                   end,\
+                   10)
+ | ---
+ | - true
+ | ...
+
+-- Cleanup.
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+box.space.test:drop()
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
diff --git a/test/replication/gh-4114-local-space-replication.test.lua b/test/replication/gh-4114-local-space-replication.test.lua
new file mode 100644
index 000000000..26dccee68
--- /dev/null
+++ b/test/replication/gh-4114-local-space-replication.test.lua
@@ -0,0 +1,48 @@
+env = require('test_run')
+test_run = env.new()
+
+--
+-- gh-4114. Account local space changes in a separate vclock
+-- component. Do not replicate local space changes, even as NOPs.
+--
+
+box.schema.user.grant('guest', 'replication')
+_ = box.schema.space.create('test', {is_local=true})
+_ = box.space.test:create_index("pk")
+
+test_run:cmd('create server replica with rpl_master=default, script "replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+a = box.info.vclock[0] or 0
+for i = 1,10 do box.space.test:insert{i} end
+box.info.vclock[0] == a + 10 or box.info.vclock[0] - a
+
+test_run:cmd('switch replica')
+box.info.vclock[0]
+box.cfg{checkpoint_count=1}
+box.space.test:select{}
+box.space.test:insert{1}
+box.snapshot()
+box.space.test:insert{2}
+box.snapshot()
+box.space.test:insert{3}
+box.snapshot()
+
+box.info.vclock[0]
+
+test_run:cmd('switch default')
+
+test_run:cmd('set variable repl_source to "replica.listen"')
+
+box.cfg{replication=repl_source}
+test_run:wait_cond(function()\
+                       return box.info.replication[2].upstream and\
+                       box.info.replication[2].upstream.status == 'follow'\
+                   end,\
+                   10)
+
+-- Cleanup.
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+box.space.test:drop()
+box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/local_spaces.result b/test/replication/local_spaces.result
index cf2c52010..4855d8a88 100644
--- a/test/replication/local_spaces.result
+++ b/test/replication/local_spaces.result
@@ -288,6 +288,10 @@ _ = s3:insert{3}
 vclock = test_run:get_vclock('default')
 ---
 ...
+-- Ignore 0-th component when waiting. They don't match.
+vclock[0] = nil
+---
+...
 _ = test_run:wait_vclock('replica', vclock)
 ---
 ...
diff --git a/test/replication/local_spaces.test.lua b/test/replication/local_spaces.test.lua
index 373e2cd20..c5e224030 100644
--- a/test/replication/local_spaces.test.lua
+++ b/test/replication/local_spaces.test.lua
@@ -112,6 +112,9 @@ _ = s1:insert{3}
 _ = s2:insert{3}
 _ = s3:insert{3}
 vclock = test_run:get_vclock('default')
+
+-- Ignore 0-th component when waiting. They don't match.
+vclock[0] = nil
 _ = test_run:wait_vclock('replica', vclock)
 
 test_run:cmd("switch replica")
diff --git a/test/replication/misc.result b/test/replication/misc.result
index b63d72846..e5d1f560e 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -214,6 +214,9 @@ box.space.space1:select{}
 vclock = test_run:get_vclock("autobootstrap1")
 ---
 ...
+vclock[0] = nil
+---
+...
 _ = test_run:wait_vclock("autobootstrap2", vclock)
 ---
 ...
@@ -414,6 +417,9 @@ while box.info.replication[2] == nil do fiber.sleep(0.01) end
 vclock = test_run:get_vclock('default')
 ---
 ...
+vclock[0] = nil
+---
+...
 _ = test_run:wait_vclock('replica_auth', vclock)
 ---
 ...
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index c454a0992..d285b014a 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -88,6 +88,7 @@ c.space.space1:insert{box.NULL, "data"} -- fails, but bumps sequence value
 c.space.space1:insert{box.NULL, 1, "data"}
 box.space.space1:select{}
 vclock = test_run:get_vclock("autobootstrap1")
+vclock[0] = nil
 _ = test_run:wait_vclock("autobootstrap2", vclock)
 test_run:cmd("switch autobootstrap2")
 box.space.space1:select{}
@@ -172,6 +173,7 @@ box.schema.user.grant('cluster', 'replication')
 
 while box.info.replication[2] == nil do fiber.sleep(0.01) end
 vclock = test_run:get_vclock('default')
+vclock[0] = nil
 _ = test_run:wait_vclock('replica_auth', vclock)
 
 test_run:cmd("stop server replica_auth")
diff --git a/test/replication/quorum.result b/test/replication/quorum.result
index 07abe7f2a..5ef66bf0a 100644
--- a/test/replication/quorum.result
+++ b/test/replication/quorum.result
@@ -325,6 +325,9 @@ space:insert{2}
 vclock = test_run:get_vclock("default")
 ---
 ...
+vclock[0] = nil
+---
+...
 _ = test_run:wait_vclock("replica", vclock)
 ---
 ...
@@ -390,6 +393,9 @@ box.cfg{replication = repl}
 vclock = test_run:get_vclock("master_quorum1")
 ---
 ...
+vclock[0] = nil
+---
+...
 _ = test_run:wait_vclock("master_quorum2", vclock)
 ---
 ...
diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
index 5f2872675..da24f34a0 100644
--- a/test/replication/quorum.test.lua
+++ b/test/replication/quorum.test.lua
@@ -125,6 +125,7 @@ test_run:cmd("switch default")
 box.cfg{listen = listen}
 space:insert{2}
 vclock = test_run:get_vclock("default")
+vclock[0] = nil
 _ = test_run:wait_vclock("replica", vclock)
 test_run:cmd("switch replica")
 box.info.status -- running
@@ -145,6 +146,7 @@ box.cfg{replication = ""}
 box.space.test:insert{1}
 box.cfg{replication = repl}
 vclock = test_run:get_vclock("master_quorum1")
+vclock[0] = nil
 _ = test_run:wait_vclock("master_quorum2", vclock)
 test_run:cmd("switch master_quorum2")
 box.space.test:select()
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index b8ed79f14..dd04ae297 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -144,6 +144,9 @@ for i = 10, 30, 10 do box.space.test:update(i, {{'!', 1, i}}) end
 vclock = test_run:get_vclock('default')
 ---
 ...
+vclock[0] = nil
+---
+...
 _ = test_run:wait_vclock('replica', vclock)
 ---
 ...
@@ -295,6 +298,9 @@ for i = 1, 10 do box.space.test:replace{2} end
 vclock = test_run:get_vclock('replica')
 ---
 ...
+vclock[0] = nil
+---
+...
 _ = test_run:wait_vclock('default', vclock)
 ---
 ...
@@ -345,6 +351,9 @@ for i = 1, 10 do box.space.test:replace{2} end
 vclock = test_run:get_vclock('replica')
 ---
 ...
+vclock[0] = nil
+---
+...
 _ = test_run:wait_vclock('default', vclock)
 ---
 ...
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index 25c0849ec..410ef44d7 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -55,6 +55,7 @@ test_run:cmd("switch default")
 -- Make sure the replica follows new changes.
 for i = 10, 30, 10 do box.space.test:update(i, {{'!', 1, i}}) end
 vclock = test_run:get_vclock('default')
+vclock[0] = nil
 _ = test_run:wait_vclock('replica', vclock)
 test_run:cmd("switch replica")
 box.space.test:select()
@@ -109,6 +110,7 @@ box.space.test:replace{1}
 test_run:cmd("switch replica")
 for i = 1, 10 do box.space.test:replace{2} end
 vclock = test_run:get_vclock('replica')
+vclock[0] = nil
 _ = test_run:wait_vclock('default', vclock)
 -- Restart the master and force garbage collection.
 test_run:cmd("switch default")
@@ -126,6 +128,7 @@ test_run:wait_cond(function() return #fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.
 test_run:cmd("switch replica")
 for i = 1, 10 do box.space.test:replace{2} end
 vclock = test_run:get_vclock('replica')
+vclock[0] = nil
 _ = test_run:wait_vclock('default', vclock)
 -- Restart the replica. It should successfully rebootstrap.
 test_run:cmd("restart server replica with args='true'")
diff --git a/test/replication/skip_conflict_row.result b/test/replication/skip_conflict_row.result
index 9b2777872..d70ac8e2a 100644
--- a/test/replication/skip_conflict_row.result
+++ b/test/replication/skip_conflict_row.result
@@ -54,6 +54,9 @@ box.info.status
 vclock = test_run:get_vclock('default')
 ---
 ...
+vclock[0] = nil
+---
+...
 _ = test_run:wait_vclock("replica", vclock)
 ---
 ...
diff --git a/test/replication/skip_conflict_row.test.lua b/test/replication/skip_conflict_row.test.lua
index 2982c730a..04fd08136 100644
--- a/test/replication/skip_conflict_row.test.lua
+++ b/test/replication/skip_conflict_row.test.lua
@@ -19,6 +19,7 @@ space:insert{2}
 box.info.status
 
 vclock = test_run:get_vclock('default')
+vclock[0] = nil
 _ = test_run:wait_vclock("replica", vclock)
 test_run:cmd("switch replica")
 box.info.replication[1].upstream.message
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 90fd53ca6..0907ac17f 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -12,6 +12,7 @@
     "on_schema_init.test.lua": {},
     "long_row_timeout.test.lua": {},
     "join_without_snap.test.lua": {},
+    "gh-4114-local-space-replication.test.lua": {},
     "gh-4402-info-errno.test.lua": {},
     "gh-4605-empty-password.test.lua": {},
     "gh-4606-admin-creds.test.lua": {},
diff --git a/test/vinyl/errinj.result b/test/vinyl/errinj.result
index 2635da265..2bd701f70 100644
--- a/test/vinyl/errinj.result
+++ b/test/vinyl/errinj.result
@@ -1241,6 +1241,11 @@ test_run:cmd("switch default")
 vclock = test_run:get_vclock("default")
 ---
 ...
+-- Ignore 0-th vclock component. They don't match between
+-- replicas.
+vclock[0] = nil
+---
+...
 _ = test_run:wait_vclock("replica", vclock)
 ---
 ...
diff --git a/test/vinyl/errinj.test.lua b/test/vinyl/errinj.test.lua
index 4230cfae3..750d3bfe8 100644
--- a/test/vinyl/errinj.test.lua
+++ b/test/vinyl/errinj.test.lua
@@ -455,6 +455,10 @@ box.cfg{read_only = true}
 
 test_run:cmd("switch default")
 vclock = test_run:get_vclock("default")
+
+-- Ignore 0-th vclock component. They don't match between
+-- replicas.
+vclock[0] = nil
 _ = test_run:wait_vclock("replica", vclock)
 
 test_run:cmd("stop server replica")
-- 
2.21.1 (Apple Git-122.3)


* Re: [Tarantool-patches] [PATCH v4 2/4] replication: hide 0-th vclock components in replication responses
  2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 2/4] replication: hide 0-th vclock components in replication responses Serge Petrenko
@ 2020-03-28  5:57   ` Konstantin Osipov
  2020-03-30 11:02     ` Serge Petrenko
  0 siblings, 1 reply; 14+ messages in thread
From: Konstantin Osipov @ 2020-03-28  5:57 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, v.shpilevoy

* Serge Petrenko <sergepetrenko@tarantool.org> [20/03/27 18:08]:
> If an anonymous replica is promoted to a normal one and later becomes a
> replication master, its vclock contains a non-empty zero component
> tracking the local changes made while the replica was anonymous. There
> is no need to pollute the joining instance's vclock with our non-empty
> 0th component.
> When an anonymous replica reports its status to a remote instance, it
> should also hide its 0th vclock component.
> 
> This is needed for backward compatibility with old instances, which by
> default don't ignore the 0th vclock component coming from a remote
> instance.
> Also make sure that new instances ignore the 0th vclock component.

Since you added vclock_compare_ignore0, I suggest introducing and using
vclock_copy_ignore0 instead of the vclock_copy + vclock_reset pair.

Where was vclock_compare_ignore0 introduced? Did I miss that
patch?

-- 
Konstantin Osipov, Moscow, Russia


* Re: [Tarantool-patches] [PATCH v4 3/4] gc: rely on minimal vclock components instead of signatures
  2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 3/4] gc: rely on minimal vclock components instead of signatures Serge Petrenko
@ 2020-03-28  6:03   ` Konstantin Osipov
  2020-03-30 11:02     ` Serge Petrenko
  0 siblings, 1 reply; 14+ messages in thread
From: Konstantin Osipov @ 2020-03-28  6:03 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, v.shpilevoy

* Serge Petrenko <sergepetrenko@tarantool.org> [20/03/27 18:08]:
> +	struct vclock min_vclock;
> +	struct gc_consumer *consumer = gc_tree_first(&gc.consumers);

The code would be easier to follow if the entire vclock API used
here were ignore0:

> +	/*
> +	 * Vclock of the oldest WAL row to keep is a by-component
> +	 * minimum of all consumer vclocks and the oldest
> +	 * checkpoint vclock. This ensures that all rows needed by
> +	 * at least one consumer are kept.
> +	 */
> +	vclock_copy(&min_vclock, &checkpoint->vclock);

E.g. use vclock_copy_ignore0 here

> +	while (consumer != NULL) {
> +		/*
> +		 * Consumers will never need rows signed
> +		 * with a zero instance id (local rows).
> +		 */
> +		vclock_min_ignore0(&min_vclock, &consumer->vclock);
> +		consumer = gc_tree_next(&gc.consumers, consumer);
> +	}
> +
> +	if (vclock_sum(&min_vclock) > vclock_sum(&gc.vclock)) {

Please use vclock_sum_ignore0
> +		vclock_copy(&gc.vclock, &min_vclock);

Please use vclock_copy_ignore0

The goal is to switch as many places as possible to the ignore0 API,
then look at the few cases left where we don't, and flip things around:
rename the ignore0 API to the default naming scheme, and the
non-ignore0 API to vclock_copy_local(), vclock_compare_local(), etc.

This doesn't have to be part of your patch set, but 
would be nice to get to.

Ideally ignore0 vclock should be a distinct data type
with an explicit conversion to and from non-ignore0 vclock
and no implicit assignment (using vclock_copy already more
or less ensures that).

Other than that you seem to be on track with the patch.

-- 
Konstantin Osipov, Moscow, Russia


* Re: [Tarantool-patches] [PATCH v4 4/4] box: start counting local space requests separately
  2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 4/4] box: start counting local space requests separately Serge Petrenko
@ 2020-03-28  6:17   ` Konstantin Osipov
  2020-03-30 11:02     ` Serge Petrenko
  2020-03-28 16:23   ` Konstantin Osipov
  1 sibling, 1 reply; 14+ messages in thread
From: Konstantin Osipov @ 2020-03-28  6:17 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, v.shpilevoy

* Serge Petrenko <sergepetrenko@tarantool.org> [20/03/27 18:08]:

One thing I don't understand is why this extra patch is necessary
to close #4114. Aren't the first 3 patches enough?

This is a nice optimization and perhaps a prerequisite for a fix
for some anonymous replica bug, but how does it affect vinyl?

> Sign local space requests with a zero instance id. This makes it
> possible to keep local changes separate from the changes that should be
> visible to the whole cluster, and to stop sending NOPs to replicas just
> to make them follow the local vclock.
> 
> Moreover, it fixes the following bug with local spaces and replication.
> With a master and a replica set up, the replica may still write to
> local spaces even if it is read-only. Before this patch, local space
> operations promoted the instance's lsn. Eventually, the master would
> have vclock {1:x} and the replica would have vclock {1:x, 2:y}, where
> y > 0, due to local space requests performed by the replica.
> If a snapshot happens on the replica, the replica deletes the .xlog
> files preceding the snapshot, since no one replicates from it and thus
> it doesn't have any registered GC consumers.
> From this point on, every attempt to configure replication from the
> replica to the master fails: the master tries to fetch the records that
> account for the difference between the two vclocks, {1:x} vs
> {1:x, 2:y}, even though it will never need the rows in the range
> {2:1} - {2:y}, since they would be turned into NOPs during replication.

I think the issue above is fixed by the lexicographical search for the
minimal vclock in GC.

> 
> Starting from this patch, in the situation described above, the
> replica's clock will be {0:y, 1:x}, and, since local space requests are
> no longer replicated at all, the master will be able to follow the
> replica, turning the configuration into master-master.


> @@ -1882,6 +1882,26 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
>  	say_info("remote vclock %s local vclock %s",
>  		 vclock_to_string(&replica_clock), vclock_to_string(&vclock));
>  
> +	/*
> +	 * Replica clock is used in gc state and recovery
> +	 * initialization, so we need to replace the remote 0-th
> +	 * component with our own one. This makes recovery work
> +	 * correctly: we're trying to recover from an xlog whose
> +	 * vclock is smaller than remote replica clock in each
> +	 * component, excluding the 0-th component. This leads to
> +	 * finding the correct WAL, if it exists, since we do not
> +	 * need to recover local rows (the ones that contribute
> +	 * to the 0-th vclock component). It would be bad to set
> +	 * 0-th vclock component to a smaller value, since it
> +	 * would unnecessarily require additional WALs, which may
> +	 * have already been deleted.
> +	 * Same stands for gc. Remote instances do not need this
> +	 * instance's local rows, and after gc was reworked to
> +	 * track individual vclock components instead of
> +	 * signatures it's safe to set the local component to the
> +	 * most recent value.
> +	 */
> +	vclock_reset(&replica_clock, 0, vclock_get(&replicaset.vclock, 0));

As I mentioned above, it seems unnecessary for 4114, but it also seems
insufficient once you change to using replica id 0 for local requests
later in this patch. relay_subscribe will use a signature-based search
for the xlog, so playing around with the components may not be enough.
E.g., if the local component is huge, we may skip over some important
changes the replica is missing.

Apparently relay has to use a lexicographical search in the xlog_dir
index as well!

Please provide an explanation in the commit message or a code comment,
or let's discuss this if you disagree.


-- 
Konstantin Osipov, Moscow, Russia


* Re: [Tarantool-patches] [PATCH v4 4/4] box: start counting local space requests separately
  2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 4/4] box: start counting local space requests separately Serge Petrenko
  2020-03-28  6:17   ` Konstantin Osipov
@ 2020-03-28 16:23   ` Konstantin Osipov
  1 sibling, 0 replies; 14+ messages in thread
From: Konstantin Osipov @ 2020-03-28 16:23 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, v.shpilevoy

* Serge Petrenko <sergepetrenko@tarantool.org> [20/03/27 18:08]:
> Moreover, it fixes the following bug with local spaces and replication.
> With a master and a replica set up, the replica may still write to
> local spaces even if it is read-only. Before this patch, local space
> operations promoted the instance's lsn. Eventually, the master would
> have vclock {1:x} and the replica would have vclock {1:x, 2:y}, where
> y > 0, due to local space requests performed by the replica.
> If a snapshot happens on the replica, the replica deletes the .xlog
> files preceding the snapshot, since no one replicates from it and thus
> it doesn't have any registered GC consumers.
> From this point on, every attempt to configure replication from the
> replica to the master fails: the master tries to fetch the records that
> account for the difference between the two vclocks, {1:x} vs
> {1:x, 2:y}, even though it will never need the rows in the range
> {2:1} - {2:y}, since they would be turned into NOPs during replication.

OK, got it.

Makes sense.

Then, I'm afraid, relay has to be using the lexicographical vclock as
well.

Waiting for a fix or clarification here.


-- 
Konstantin Osipov, Moscow, Russia


* Re: [Tarantool-patches] [PATCH v4 2/4] replication: hide 0-th vclock components in replication responses
  2020-03-28  5:57   ` Konstantin Osipov
@ 2020-03-30 11:02     ` Serge Petrenko
  2020-03-30 12:52       ` Konstantin Osipov
  0 siblings, 1 reply; 14+ messages in thread
From: Serge Petrenko @ 2020-03-30 11:02 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches, Vladislav Shpilevoy


> On 28 Mar 2020, at 08:57, Konstantin Osipov <kostja.osipov@gmail.com> wrote:
> 
> * Serge Petrenko <sergepetrenko@tarantool.org> [20/03/27 18:08]:
>> If an anonymous replica is promoted to a normal one and later becomes a
>> replication master, its vclock contains a non-empty zero component
>> tracking the local changes made while the replica was anonymous. There
>> is no need to pollute the joining instance's vclock with our non-empty
>> 0th component.
>> When an anonymous replica reports its status to a remote instance, it
>> should also hide its 0th vclock component.
>> 
>> This is needed for backward compatibility with old instances, which by
>> default don't ignore the 0th vclock component coming from a remote
>> instance.
>> Also make sure that new instances ignore the 0th vclock component.
> 
> Since you added vclock_compare_ignore0, I suggest introducing and using
> vclock_copy_ignore0 instead of the vclock_copy + vclock_reset pair.

OK, the diffs are below.

> 
> Where was vclock_compare_ignore0 introduced? Did I miss that
> patch?

I added it in one of the anonymous replica patches.

=============================================================
1st patch:

diff --git a/src/box/vclock.h b/src/box/vclock.h
index aeb8ca66b..2a3a29020 100644
--- a/src/box/vclock.h
+++ b/src/box/vclock.h
@@ -182,6 +182,19 @@ vclock_inc(struct vclock *vclock, uint32_t replica_id)
 	return ++vclock->lsn[replica_id];
 }
 
+/**
+ * Set vclock component represented by replica id to the desired
+ * value. Can be used to decrease stored LSN value for the given
+ * replica id while maintaining a valid signature or in the same
+ * manner as vclock_follow.
+ *
+ * @param vclock Vector clock.
+ * @param replica_id Replica identifier.
+ * @param lsn Lsn to set
+ */
+void
+vclock_reset(struct vclock *vclock, uint32_t replica_id, int64_t lsn);
+
 static inline void
 vclock_copy(struct vclock *dst, const struct vclock *src)
 {
@@ -194,6 +207,17 @@ vclock_copy(struct vclock *dst, const struct vclock *src)
 			 sizeof(*dst->lsn) * max_pos);
 }
 
+/**
+ * A shortcut for vclock_copy() + vclock_reset() for 0th clock
+ * component.
+ */
+static inline void
+vclock_copy_ignore0(struct vclock *dst, const struct vclock *src)
+{
+	vclock_copy(dst, src);
+	vclock_reset(dst, 0, 0);
+}
+
 static inline uint32_t
 vclock_size(const struct vclock *vclock)
 {
@@ -211,19 +235,6 @@ vclock_calc_sum(const struct vclock *vclock)
 	return sum;
 }
 
-/**
- * Set vclock component represented by replica id to the desired
- * value. Can be used to decrease stored LSN value for the given
- * replica id while maintaining a valid signature or in the same
- * manner as vclock_follow.
- *
- * @param vclock Vector clock.
- * @param replica_id Replica identifier.
- * @param lsn Lsn to set
- */
-void
-vclock_reset(struct vclock *vclock, uint32_t replica_id, int64_t lsn);
-
 static inline int64_t
 vclock_sum(const struct vclock *vclock)
 {

============================================================================
2nd patch:

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 2e765ca12..f5f67b6a9 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -174,8 +174,7 @@ applier_writer_f(va_list ap)
 		try {
 			struct xrow_header xrow;
 			struct vclock vclock;
-			vclock_copy(&vclock, &replicaset.vclock);
-			vclock_reset(&vclock, 0, 0);
+			vclock_copy_ignore0(&vclock, &replicaset.vclock);
 			xrow_encode_vclock(&xrow, &vclock);
 			coio_write_xrow(&io, &xrow);
 		} catch (SocketError *e) {
diff --git a/src/box/box.cc b/src/box/box.cc
index 87e836432..b49fdbe21 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1622,8 +1622,7 @@ box_process_register(struct ev_io *io, struct xrow_header *header)
 
 	struct xrow_header row;
 	/* Send end of WAL stream marker */
-	vclock_copy(&vclock, &replicaset.vclock);
-	vclock_reset(&vclock, 0, 0);
+	vclock_copy_ignore0(&vclock, &replicaset.vclock);
 	xrow_encode_vclock_xc(&row, &vclock);
 	row.sync = header->sync;
 	coio_write_xrow(io, &row);
@@ -1777,8 +1776,7 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
 
 	/* Send end of WAL stream marker */
 	struct vclock vclock;
-	vclock_copy(&vclock, &replicaset.vclock);
-	vclock_reset(&vclock, 0, 0);
+	vclock_copy_ignore0(&vclock, &replicaset.vclock);
 	xrow_encode_vclock_xc(&row, &vclock);
 
 	row.sync = header->sync;
@@ -1911,8 +1909,7 @@ box_process_vote(struct ballot *ballot)
 	 */
 	ballot->is_loading = is_ro;
 	vclock_copy(&ballot->vclock, &replicaset.vclock);
-	vclock_copy(&ballot->gc_vclock, &gc.vclock);
-	vclock_reset(&ballot->gc_vclock, 0, 0);
+	vclock_copy_ignore0(&ballot->gc_vclock, &gc.vclock);
 }
 
 /** Insert a new cluster into _schema */
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 9abf35fdf..5ed8f5dae 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -323,9 +323,7 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock)
 	/* Respond to the JOIN request with the current vclock. */
 	struct xrow_header row;
 	struct vclock subst_vclock;
-	vclock_copy(&subst_vclock, vclock);
-	/* zero - out 0-th vclock component. */
-	vclock_reset(&subst_vclock, 0, 0);
+	vclock_copy_ignore0(&subst_vclock, vclock);
 	xrow_encode_vclock_xc(&row, &subst_vclock);
 	row.sync = sync;
 	coio_write_xrow(&relay->io, &row);
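
The pattern in these hunks is the same: wherever a vclock_copy() was
followed by zeroing out the 0-th component before the clock is sent
out, the pair collapses into a single vclock_copy_ignore0() call.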



--
Serge Petrenko
sergepetrenko@tarantool.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 3/4] gc: rely on minimal vclock components instead of signatures
  2020-03-28  6:03   ` Konstantin Osipov
@ 2020-03-30 11:02     ` Serge Petrenko
  2020-03-30 12:54       ` Konstantin Osipov
  0 siblings, 1 reply; 14+ messages in thread
From: Serge Petrenko @ 2020-03-30 11:02 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches, Vladislav Shpilevoy



> On 28 March 2020, at 09:03, Konstantin Osipov <kostja.osipov@gmail.com> wrote:
> 
> * Serge Petrenko <sergepetrenko@tarantool.org> [20/03/27 18:08]:
>> +	struct vclock min_vclock;
>> +	struct gc_consumer *consumer = gc_tree_first(&gc.consumers);
> 
> The code would be easier to follow if the entire vclock API used
> here were ignore0:
> 
>> +	/*
>> +	 * Vclock of the oldest WAL row to keep is a by-component
>> +	 * minimum of all consumer vclocks and the oldest
>> +	 * checkpoint vclock. This ensures that all rows needed by
>> +	 * at least one consumer are kept.
>> +	 */
>> +	vclock_copy(&min_vclock, &checkpoint->vclock);
> 
> E.g. use vclock_copy_ignore0 here
> 
>> +	while (consumer != NULL) {
>> +		/*
>> +		 * Consumers will never need rows signed
>> +		 * with a zero instance id (local rows).
>> +		 */
>> +		vclock_min_ignore0(&min_vclock, &consumer->vclock);
>> +		consumer = gc_tree_next(&gc.consumers, consumer);
>> +	}
>> +
>> +	if (vclock_sum(&min_vclock) > vclock_sum(&gc.vclock)) {
> 
> Please use vclock_sum_ignore0
>> +		vclock_copy(&gc.vclock, &min_vclock);
> 
> Please use vclock_copy_ignore0

I can’t. wal_collect_garbage() searches for the WAL file with a vclock
strictly less than or equal to the one provided, so the 0-th clock
component cannot be zeroed out; otherwise no logs would ever be deleted.
While all the other components are taken as a minimum over all the
consumers and the oldest snapshot, the 0-th component is copied directly
from the oldest snapshot, since we have to keep all WALs starting from
the oldest snapshot. This is stated in the comment a few lines above;
I have now moved it to a more relevant place.

So gc.vclock must contain a valid 0-th component. That’s why I use the
‘local’ vclock_sum() and vclock_copy() here.

> 
> The goal is to switch as many places as possible to the ignore0 api,
> then see the few cases left where we don't, and flip around:
> rename the ignore0 api to the default naming scheme, and rename
> no-ignore0 to vclock_copy_local(), vclock_compare_local(), and so on.
> 
> This doesn't have to be part of your patch set, but 
> would be nice to get to.

Sounds good, though I don’t want to do it now. Let it be part of a
follow-up patch.

> 
> Ideally ignore0 vclock should be a distinct data type
> with an explicit conversion to and from non-ignore0 vclock
> and no implicit assignment (using vclock_copy already more
> or less ensures that).
> 
> Other than that you seem to be on track with the patch.

=============================================
diff --git a/src/box/gc.c b/src/box/gc.c
index 4eae6ef3b..a2d0a515c 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -186,11 +186,7 @@ gc_run_cleanup(void)
 	/* At least one checkpoint must always be available. */
 	assert(checkpoint != NULL);
 
-	/*
-	 * Find the vclock of the oldest WAL row to keep.
-	 * Note, we must keep all WALs created after the
-	 * oldest checkpoint, even if no consumer needs them.
-	 */
+	/* Find the vclock of the oldest WAL row to keep. */
 	struct vclock min_vclock;
 	struct gc_consumer *consumer = gc_tree_first(&gc.consumers);
 	/*
@@ -198,6 +194,8 @@ gc_run_cleanup(void)
 	 * minimum of all consumer vclocks and the oldest
 	 * checkpoint vclock. This ensures that all rows needed by
 	 * at least one consumer are kept.
+	 * Note, we must keep all WALs created after the
+	 * oldest checkpoint, even if no consumer needs them.
 	 */
 	vclock_copy(&min_vclock, &checkpoint->vclock);
 	while (consumer != NULL) {
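
To spell out what the by-component minimum does here (the loop quoted
earlier in this message calls vclock_min_ignore0()), a simplified
sketch in terms of the patch-1 primitives; the real helper may walk
the component bitmap instead of iterating over all VCLOCK_MAX slots:

static inline void
vclock_min_ignore0(struct vclock *dst, const struct vclock *src)
{
	/* Per-component minimum, skipping the 0-th (local)
	 * component, which gc tracks separately. */
	for (uint32_t id = 1; id < VCLOCK_MAX; id++) {
		int64_t lsn = vclock_get(src, id);
		if (lsn < vclock_get(dst, id))
			vclock_reset(dst, id, lsn);
	}
}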


--
Serge Petrenko
sergepetrenko@tarantool.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 4/4] box: start counting local space requests separately
  2020-03-28  6:17   ` Konstantin Osipov
@ 2020-03-30 11:02     ` Serge Petrenko
  0 siblings, 0 replies; 14+ messages in thread
From: Serge Petrenko @ 2020-03-30 11:02 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches, Vladislav Shpilevoy


> On 28 March 2020, at 09:17, Konstantin Osipov <kostja.osipov@gmail.com> wrote:
> 
> * Serge Petrenko <sergepetrenko@tarantool.org> [20/03/27 18:08]:
> 
> One thing I don't understand is why this extra patch is necessary
> to close #4114. Aren't the first 3 patches enough?
> 
> This is a nice optimization and perhaps a prerequisite for a fix for
> some anonymous replica bug, but how does it affect vinyl?
> 
>> Sign local space requests with a zero instance id. This makes it possible
>> to split local changes off from the changes which should be visible to the
>> whole cluster, and to stop sending NOPs to replicas to follow the local vclock.
>> 
>> Moreover, it fixes the following bug with local spaces and replication.
>> In a setup with a master and a replica, the replica may still write to
>> local spaces even though it is read-only. Local space operations used to
>> promote the instance's LSN before this patch. Eventually, the master
>> would have vclock {1:x} and the replica would have vclock {1:x, 2:y},
>> where y > 0, due to local space requests performed by the replica.
>> If a snapshot happens on the replica side, the replica will delete its
>> .xlog files prior to the snapshot, since no one replicates from it and
>> thus it doesn't have any registered GC consumers.
>> From this point on, every attempt to configure replication from the
>> replica to the master will fail, since the master will try to fetch the
>> records which account for the difference between the vclocks: {1:x} vs
>> {1:x,2:y}, even though the master will never need the rows in the range
>> {2:1} - {2:y}, since they'll be turned into NOPs during replication.
> 
> I think the issue above is fixed by lexicographical search for min
> vclock in gc.
> 
>> 
>> Starting from this patch, in the situation described above, the replica's
>> clock will be {0:y, 1:x}, and, since local space requests are no longer
>> replicated at all, the master will be able to follow the replica, turning
>> the configuration into master-master.
> 
> 
>> @@ -1882,6 +1882,26 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
>> 	say_info("remote vclock %s local vclock %s",
>> 		 vclock_to_string(&replica_clock), vclock_to_string(&vclock));
>> 
>> +	/*
>> +	 * Replica clock is used in gc state and recovery
>> +	 * initialization, so we need to replace the remote 0-th
>> +	 * component with our own one. This makes recovery work
>> +	 * correctly: we're trying to recover from an xlog whose
>> +	 * vclock is smaller than remote replica clock in each
>> +	 * component, excluding the 0-th component. This leads to
>> +	 * finding the correct WAL, if it exists, since we do not
>> +	 * need to recover local rows (the ones that contribute
>> +	 * to the 0-th vclock component). It would be bad to set
>> +	 * 0-th vclock component to a smaller value, since it
>> +	 * would unnecessarily require additional WALs, which may
>> +	 * have already been deleted.
>> +	 * Same stands for gc. Remote instances do not need this
>> +	 * instance's local rows, and after gc was reworked to
>> +	 * track individual vclock components instead of
>> +	 * signatures it's safe to set the local component to the
>> +	 * most recent value.
>> +	 */
>> +	vclock_reset(&replica_clock, 0, vclock_get(&replicaset.vclock, 0));
> 
> As I mentioned above, it seems unnecessary for #4114, but it also
> seems insufficient, since you change to using replica id 0 for local
> requests later in this patch. relay_subscribe will use a
> signature-based search of the xlog, so playing around with components
> may not be enough. E.g. if the local component is huge, we may skip
> over some important changes the replica is missing.
> 
> Apparently relay has to use a lexicographical search in the xlog_dir
> index as well!

But relay (recovery) doesn’t perform a signature-based xlog search. It uses
vclockset_match(), which relies on vclock_compare(). vclockset_match() finds
the clock which is strictly less than the one provided.
Signatures are used in xdir_scan(), but xdir_scan() still updates the
vclockset. If my patch mangled the vclocks the WALs are signed with, there
would be some cause for concern. Since it doesn’t do that, and only changes
the key we search with, everything’s fine.
I tried to explain this in the comment right above, but apparently didn’t
emphasize it enough.
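
To make this concrete, the ordering vclockset_match() relies on is
component-wise, roughly as in the sketch below (simplified; the real
vclock_compare() walks the component bitmaps, and VCLOCK_ORDER_UNDEFINED
denotes incomparable clocks):

static int
vclock_compare_sketch(const struct vclock *a, const struct vclock *b)
{
	bool le = true, ge = true;
	for (uint32_t id = 0; id < VCLOCK_MAX; id++) {
		int64_t la = vclock_get(a, id);
		int64_t lb = vclock_get(b, id);
		le = le && la <= lb;
		ge = ge && la >= lb;
	}
	if (le && ge)
		return 0;
	if (le)
		return -1;
	if (ge)
		return 1;
	/* Neither clock dominates the other. */
	return VCLOCK_ORDER_UNDEFINED;
}

This is why replacing the 0-th component of the search key doesn’t
change which WAL the replicated components select.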
 
I’ll also send v5 with the changes mentioned so that it’s easier for Vlad to review.

Here’s a new comment:

diff --git a/src/box/box.cc b/src/box/box.cc
index a37a95185..0762266b0 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1882,21 +1882,18 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
 	/*
 	 * Replica clock is used in gc state and recovery
 	 * initialization, so we need to replace the remote 0-th
-	 * component with our own one. This makes recovery work
-	 * correctly: we're trying to recover from an xlog whose
-	 * vclock is smaller than remote replica clock in each
-	 * component, excluding the 0-th component. This leads to
-	 * finding the correct WAL, if it exists, since we do not
-	 * need to recover local rows (the ones that contribute
-	 * to the 0-th vclock component). It would be bad to set
-	 * 0-th vclock component to a smaller value, since it
-	 * would unnecessarily require additional WALs, which may
-	 * have already been deleted.
-	 * Same stands for gc. Remote instances do not need this
-	 * instance's local rows, and after gc was reworked to
-	 * track individual vclock components instead of
-	 * signatures it's safe to set the local component to the
-	 * most recent value.
+	 * component with our own one. This doesn't break
+	 * recovery: it finds the WAL with a vclock strictly less
+	 * than the replica clock in all components except the 0th one.
+	 * This leads to finding the correct WAL, if it exists,
+	 * since we do not need to recover local rows (the ones
+	 * that contribute to the 0-th vclock component).
+	 * Note that it would be bad to set the 0-th vclock component
+	 * to a smaller value, since it would unnecessarily
+	 * require additional WALs, which may have already been
+	 * deleted.
+	 * Speaking of gc, remote instances' local vclock
+	 * components are not used by consumers at all.
 	 */
 	vclock_reset(&replica_clock, 0, vclock_get(&replicaset.vclock, 0));
 	/*
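
A toy example (numbers invented for illustration): if the replica sends
{0:25, 1:100} and our own vclock is {0:7, 1:150}, the reset above turns
the search key into {0:7, 1:100}. The remote 0-th component counts the
replica's local rows and means nothing in our WAL history, while our own
0-th component is the most recent local value, so recovery matches the
right xlog by the replicated components without demanding WALs that only
contain our local rows.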
 

> 
> Please provide an explanation in a CS/code comment, or let's discuss
> this if you disagree.


--
Serge Petrenko
sergepetrenko@tarantool.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 2/4] replication: hide 0-th vclock components in replication responses
  2020-03-30 11:02     ` Serge Petrenko
@ 2020-03-30 12:52       ` Konstantin Osipov
  0 siblings, 0 replies; 14+ messages in thread
From: Konstantin Osipov @ 2020-03-30 12:52 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, Vladislav Shpilevoy

* Serge Petrenko <sergepetrenko@tarantool.org> [20/03/30 14:03]:
> > If you added vclock_compare_ignore0, I suggest introducing and using
> > vclock_copy_ignore0 instead of the pair vclock_copy + vclock_reset.
> Ok. The diff’s below.

The diff is fine. You could make vclock_copy_ignore0 a tad faster,
I guess.
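
E.g. something along these lines (an untested sketch assuming the field
layout of struct vclock; it just avoids the extra component lookup that
vclock_reset(dst, 0, 0) would do internally):

static inline void
vclock_copy_ignore0(struct vclock *dst, const struct vclock *src)
{
	vclock_copy(dst, src);
	/* Drop component 0 in place. */
	dst->signature -= dst->lsn[0];
	dst->lsn[0] = 0;
	dst->map &= ~1U;
}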

Anyway it's already lgtm.


-- 
Konstantin Osipov, Moscow, Russia
https://scylladb.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 3/4] gc: rely on minimal vclock components instead of signatures
  2020-03-30 11:02     ` Serge Petrenko
@ 2020-03-30 12:54       ` Konstantin Osipov
  0 siblings, 0 replies; 14+ messages in thread
From: Konstantin Osipov @ 2020-03-30 12:54 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, Vladislav Shpilevoy

* Serge Petrenko <sergepetrenko@tarantool.org> [20/03/30 14:03]:
> >> +	while (consumer != NULL) {
> >> +		/*
> >> +		 * Consumers will never need rows signed
> >> +		 * with a zero instance id (local rows).
> >> +		 */
> >> +		vclock_min_ignore0(&min_vclock, &consumer->vclock);
> >> +		consumer = gc_tree_next(&gc.consumers, consumer);
> >> +	}
> >> +
> >> +	if (vclock_sum(&min_vclock) > vclock_sum(&gc.vclock)) {
> > 
> > Please use vclock_sum_ignore0
> >> +		vclock_copy(&gc.vclock, &min_vclock);
> > 
> > Please use vclock_copy_ignore0
> 
> I can’t. wal_collect_garbage() searches for the WAL file with a vclock
> strictly less than or equal to the one provided, so the 0-th clock
> component cannot be zeroed out; otherwise no logs would ever be deleted.
> While all the other components are taken as a minimum over all the
> consumers and the oldest snapshot, the 0-th component is copied directly
> from the oldest snapshot, since we have to keep all WALs starting from
> the oldest snapshot. This is stated in the comment a few lines above;
> I have now moved it to a more relevant place.

Wow. Please add a comment and resend an updated patch so that I
can check/lgtm it.

-- 
Konstantin Osipov, Moscow, Russia
https://scylladb.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-03-30 12:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-27 10:20 [Tarantool-patches] [PATCH v4 0/4] replication: fix local space tracking Serge Petrenko
2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 1/4] vclock: add an ability to reset individual clock components Serge Petrenko
2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 2/4] replication: hide 0-th vclock components in replication responses Serge Petrenko
2020-03-28  5:57   ` Konstantin Osipov
2020-03-30 11:02     ` Serge Petrenko
2020-03-30 12:52       ` Konstantin Osipov
2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 3/4] gc: rely on minimal vclock components instead of signatures Serge Petrenko
2020-03-28  6:03   ` Konstantin Osipov
2020-03-30 11:02     ` Serge Petrenko
2020-03-30 12:54       ` Konstantin Osipov
2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 4/4] box: start counting local space requests separately Serge Petrenko
2020-03-28  6:17   ` Konstantin Osipov
2020-03-30 11:02     ` Serge Petrenko
2020-03-28 16:23   ` Konstantin Osipov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox