[Tarantool-patches] [PATCH v4 4/4] box: start counting local space requests separately

Konstantin Osipov kostja.osipov at gmail.com
Sat Mar 28 09:17:22 MSK 2020


* Serge Petrenko <sergepetrenko at tarantool.org> [20/03/27 18:08]:

One thing I don't understand is how is this extra patch necessary
to close #4114? Aren't the first 3 patches enough?

This is a nice optimization and is perhaps a pre-requisite for a
fix for some anonymous replica bug, but how does it affect vinyl?

> Sign local space requests with a zero instance id. This allows to split
> local changes aside from the changes, which should be visible to the whole
> cluster, and stop sending NOPs to replicas to follow local vclock.
> 
> Moreover, it fixes the following bug with local spaces and replication.
> In a situation when there are a master and a replica set up, replica may
> still write to local spaces even if it's read-only. Local space
> operations used to promote instance's lsn before this patch. Eventually,
> master would have vclock {1:x} and replica'd have vclock {1:x, 2:y},
> where y > 0, due to local space requests performed by the replica.
> If a snapshot happens on replica side, replica will delete it's .xlog
> files prior to the snapshot, since no one replicates from it and thus it
> doesn't have any registered GC consumers.
> From this point, every attempt to configure replication from replica to
> master will fail, since master will try to fetch records which account
> for the difference in master's and replica's vclocks: {1:x} vs {1:x,2:y},
> even though master will never need the rows in range {2:1} - {2:y},
> since they'll be turned to NOPs during replication.

I think the issue above is fixed by lexicographical search for min
vclock in gc.

> 
> Starting from this patch, in the situation described above, replica's
> clock will be {0:y, 1:x}, and, since local space requests are now not
> replicated at all, master will be able to follow replica, turning the
> configuration to master-master.


> @@ -1882,6 +1882,26 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
>  	say_info("remote vclock %s local vclock %s",
>  		 vclock_to_string(&replica_clock), vclock_to_string(&vclock));
>  
> +	/*
> +	 * Replica clock is used in gc state and recovery
> +	 * initialization, so we need to replace the remote 0-th
> +	 * component with our own one. This makes recovery work
> +	 * correctly: we're trying to recover from an xlog whose
> +	 * vclock is smaller than remote replica clock in each
> +	 * component, exluding the 0-th component. This leads to
> +	 * finding the correct WAL, if it exists, since we do not
> +	 * need to recover local rows (the ones, that contribute
> +	 * to the 0-th vclock component). It would be bad to set
> +	 * 0-th vclock component to a smaller value, since it
> +	 * would unnecessarily require additional WALs, which may
> +	 * have already been deleted.
> +	 * Same stands for gc. Remote instances do not need this
> +	 * instance's local rows, and after gc was reworked to
> +	 * track individual vclock components instead of
> +	 * signatures it's safe to set the local component to the
> +	 * most recent value.
> +	 */
> +	vclock_reset(&replica_clock, 0, vclock_get(&replicaset.vclock, 0));

As I mentioned above, it seems unnecessary for 4114, but it also
seems insufficient you change to use replica id 0 for local
requests later in this patch. relay_subscribe will use
signature-based search of xlog so playing around with components
may be not enough. E.g. if local component is huge, we may skip
over some important changes the replica is missing.

Apparently relay has to use lexigraphical search in xlog_dir index as well!

Please provide an explanation in CS/code comment or lets discuss
this if you disagree.


-- 
Konstantin Osipov, Moscow, Russia


More information about the Tarantool-patches mailing list