[Tarantool-patches] [PATCH v4 4/4] box: start counting local space requests separately

Serge Petrenko sergepetrenko at tarantool.org
Mon Mar 30 14:02:54 MSK 2020


> 28 марта 2020 г., в 09:17, Konstantin Osipov <kostja.osipov at gmail.com> написал(а):
> 
> * Serge Petrenko <sergepetrenko at tarantool.org> [20/03/27 18:08]:
> 
> One thing I don't understand is how is this extra patch necessary
> to close #4114? Aren't the first 3 patches enough?
> 
> This is a nice optimization and is perhaps a pre-requisite for a
> fix for some anonymous replica bug, but how does it affect vinyl?
> 
>> Sign local space requests with a zero instance id. This allows us to
>> separate local changes from the changes which should be visible to the
>> whole cluster, and to stop sending NOPs to replicas to follow the local
>> vclock.
>> 
>> Moreover, it fixes the following bug with local spaces and replication.
>> In a setup with a master and a replica, the replica may still write to
>> local spaces even if it is read-only. Local space operations used to
>> promote the instance's lsn before this patch. Eventually, the master
>> would have vclock {1:x} and the replica would have vclock {1:x, 2:y},
>> where y > 0, due to local space requests performed by the replica.
>> If a snapshot happens on the replica side, the replica will delete its
>> .xlog files prior to the snapshot, since no one replicates from it and
>> thus it doesn't have any registered GC consumers.
>> From this point, every attempt to configure replication from the replica
>> to the master will fail, since the master will try to fetch records
>> which account for the difference between the master's and the replica's
>> vclocks: {1:x} vs {1:x, 2:y}, even though the master will never need the
>> rows in range {2:1} - {2:y}, since they will be turned into NOPs during
>> replication.
> 
> I think the issue above is fixed by lexicographical search for min
> vclock in gc.
> 
>> 
>> Starting from this patch, in the situation described above, the
>> replica's clock will be {0:y, 1:x}, and, since local space requests are
>> no longer replicated at all, the master will be able to follow the
>> replica, turning the configuration into master-master.
> 
> 
>> @@ -1882,6 +1882,26 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
>> 	say_info("remote vclock %s local vclock %s",
>> 		 vclock_to_string(&replica_clock), vclock_to_string(&vclock));
>> 
>> +	/*
>> +	 * Replica clock is used in gc state and recovery
>> +	 * initialization, so we need to replace the remote 0-th
>> +	 * component with our own one. This makes recovery work
>> +	 * correctly: we're trying to recover from an xlog whose
>> +	 * vclock is smaller than remote replica clock in each
>> +	 * component, exluding the 0-th component. This leads to
>> +	 * finding the correct WAL, if it exists, since we do not
>> +	 * need to recover local rows (the ones, that contribute
>> +	 * to the 0-th vclock component). It would be bad to set
>> +	 * 0-th vclock component to a smaller value, since it
>> +	 * would unnecessarily require additional WALs, which may
>> +	 * have already been deleted.
>> +	 * Same stands for gc. Remote instances do not need this
>> +	 * instance's local rows, and after gc was reworked to
>> +	 * track individual vclock components instead of
>> +	 * signatures it's safe to set the local component to the
>> +	 * most recent value.
>> +	 */
>> +	vclock_reset(&replica_clock, 0, vclock_get(&replicaset.vclock, 0));
> 
> As I mentioned above, it seems unnecessary for 4114, but it also
> seems insufficient you change to use replica id 0 for local
> requests later in this patch. relay_subscribe will use
> signature-based search of xlog so playing around with components
> may be not enough. E.g. if local component is huge, we may skip
> over some important changes the replica is missing.
> 
> Apparently relay has to use lexicographical search in xlog_dir index as well!

But relay (recovery) doesn’t perform a signature-based xlog search. It uses
vclockset_match(), which relies on vclock_compare(). vclockset_match() finds
the clock which is strictly less than the one provided.
Signatures are used in xdir_scan(), but xdir_scan() still updates the
vclockset.
If my patch mangled the vclocks the WALs are signed with, there would be
cause for concern. Since it doesn’t do that, and only changes the key we
search with, everything’s fine.
I tried to explain this in the comment right above, but apparently didn’t
emphasize it enough.
 
I’ll also send v5 with the changes mentioned so that it’s easier for Vlad to review.

Here’s a new comment:

diff --git a/src/box/box.cc b/src/box/box.cc
index a37a95185..0762266b0 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1882,21 +1882,18 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
 	/*
 	 * Replica clock is used in gc state and recovery
 	 * initialization, so we need to replace the remote 0-th
-	 * component with our own one. This makes recovery work
-	 * correctly: we're trying to recover from an xlog whose
-	 * vclock is smaller than remote replica clock in each
-	 * component, exluding the 0-th component. This leads to
-	 * finding the correct WAL, if it exists, since we do not
-	 * need to recover local rows (the ones, that contribute
-	 * to the 0-th vclock component). It would be bad to set
-	 * 0-th vclock component to a smaller value, since it
-	 * would unnecessarily require additional WALs, which may
-	 * have already been deleted.
-	 * Same stands for gc. Remote instances do not need this
-	 * instance's local rows, and after gc was reworked to
-	 * track individual vclock components instead of
-	 * signatures it's safe to set the local component to the
-	 * most recent value.
+	 * component with our own one. This doesn't break
+	 * recovery: it finds the WAL with a vclock strictly less
+	 * than the replica clock in all components except the 0th.
+	 * This leads to finding the correct WAL, if it exists,
+	 * since we do not need to recover local rows (the ones,
+	 * that contribute to the 0-th vclock component).
+	 * Note that it would be bad to set the 0-th component
+	 * to a smaller value, since it would unnecessarily
+	 * require additional WALs, which may have already been
+	 * deleted.
+	 * Speaking of gc, remote instances' local vclock
+	 * components are not used by consumers at all.
 	 */
 	vclock_reset(&replica_clock, 0, vclock_get(&replicaset.vclock, 0));
 	/*
 

> 
> Please provide an explanation in CS/code comment or lets discuss
> this if you disagree.
> 
> 
> -- 
> Konstantin Osipov, Moscow, Russia


--
Serge Petrenko
sergepetrenko at tarantool.org
