Tarantool development patches archive
From: Serge Petrenko <sergepetrenko@tarantool.org>
To: Konstantin Osipov <kostja.osipov@gmail.com>
Cc: tarantool-patches@dev.tarantool.org,
	Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v4 4/4] box: start counting local space requests separately
Date: Mon, 30 Mar 2020 14:02:54 +0300	[thread overview]
Message-ID: <FD12A07B-757B-4416-AF92-42F14540755E@tarantool.org> (raw)
In-Reply-To: <20200328061722.GC23207@atlas>


> On 28 Mar 2020, at 09:17, Konstantin Osipov <kostja.osipov@gmail.com> wrote:
> 
> * Serge Petrenko <sergepetrenko@tarantool.org> [20/03/27 18:08]:
> 
> One thing I don't understand is how is this extra patch necessary
> to close #4114? Aren't the first 3 patches enough?
> 
> This is a nice optimization and is perhaps a pre-requisite for a
> fix for some anonymous replica bug, but how does it affect vinyl?
> 
>> Sign local space requests with a zero instance id. This allows separating
>> local changes from the changes that should be visible to the whole
>> cluster, and lets us stop sending NOPs to replicas to follow local vclock.
>> 
>> Moreover, it fixes the following bug with local spaces and replication.
>> When a master and a replica are set up, the replica may still write to
>> local spaces even if it is read-only. Local space operations used to
>> promote the instance's lsn before this patch. Eventually, the master
>> would have vclock {1:x} and the replica would have vclock {1:x, 2:y},
>> where y > 0, due to local space requests performed by the replica.
>> If a snapshot happens on the replica side, the replica will delete its
>> .xlog files prior to the snapshot, since no one replicates from it and
>> thus it doesn't have any registered GC consumers.
>> From this point, every attempt to configure replication from replica to
>> master will fail, since the master will try to fetch records which
>> account for the difference in the master's and replica's vclocks:
>> {1:x} vs {1:x,2:y}, even though the master will never need the rows in
>> range {2:1} - {2:y}, since they'll be turned into NOPs during
>> replication.
> 
> I think the issue above is fixed by lexicographical search for min
> vclock in gc.
> 
>> 
>> Starting from this patch, in the situation described above, replica's
>> clock will be {0:y, 1:x}, and, since local space requests are now not
>> replicated at all, master will be able to follow replica, turning the
>> configuration to master-master.
> 
> 
>> @@ -1882,6 +1882,26 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
>> 	say_info("remote vclock %s local vclock %s",
>> 		 vclock_to_string(&replica_clock), vclock_to_string(&vclock));
>> 
>> +	/*
>> +	 * Replica clock is used in gc state and recovery
>> +	 * initialization, so we need to replace the remote 0-th
>> +	 * component with our own one. This makes recovery work
>> +	 * correctly: we're trying to recover from an xlog whose
>> +	 * vclock is smaller than remote replica clock in each
>> +	 * component, excluding the 0-th component. This leads to
>> +	 * finding the correct WAL, if it exists, since we do not
>> +	 * need to recover local rows (the ones, that contribute
>> +	 * to the 0-th vclock component). It would be bad to set
>> +	 * 0-th vclock component to a smaller value, since it
>> +	 * would unnecessarily require additional WALs, which may
>> +	 * have already been deleted.
>> +	 * Same stands for gc. Remote instances do not need this
>> +	 * instance's local rows, and after gc was reworked to
>> +	 * track individual vclock components instead of
>> +	 * signatures it's safe to set the local component to the
>> +	 * most recent value.
>> +	 */
>> +	vclock_reset(&replica_clock, 0, vclock_get(&replicaset.vclock, 0));
> 
> As I mentioned above, it seems unnecessary for 4114, but it also
> seems insufficient you change to use replica id 0 for local
> requests later in this patch. relay_subscribe will use
> signature-based search of xlog so playing around with components
> may be not enough. E.g. if local component is huge, we may skip
> over some important changes the replica is missing.
> 
> Apparently relay has to use lexicographical search in the xlog_dir index as well!

But relay (recovery) doesn't perform a signature-based xlog search. It uses
vclockset_match(), which relies on vclock_compare(). vclockset_match() finds
the clock which is strictly less than the one provided.
Signatures are used in xdir_scan(), but xdir_scan() still updates the
vclockset. If my patch mangled the vclocks the WALs are signed with, there
would be some concern. Since it doesn't, and only changes the key we search
with, everything's fine.
I tried to explain this in the comment right above, but apparently didn't
emphasize it enough.
 
I’ll also send v5 with the changes mentioned so that it’s easier for Vlad to review.

Here’s a new comment:

diff --git a/src/box/box.cc b/src/box/box.cc
index a37a95185..0762266b0 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1882,21 +1882,18 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
 	/*
 	 * Replica clock is used in gc state and recovery
 	 * initialization, so we need to replace the remote 0-th
-	 * component with our own one. This makes recovery work
-	 * correctly: we're trying to recover from an xlog whose
-	 * vclock is smaller than remote replica clock in each
-	 * component, exluding the 0-th component. This leads to
-	 * finding the correct WAL, if it exists, since we do not
-	 * need to recover local rows (the ones, that contribute
-	 * to the 0-th vclock component). It would be bad to set
-	 * 0-th vclock component to a smaller value, since it
-	 * would unnecessarily require additional WALs, which may
-	 * have already been deleted.
-	 * Same stands for gc. Remote instances do not need this
-	 * instance's local rows, and after gc was reworked to
-	 * track individual vclock components instead of
-	 * signatures it's safe to set the local component to the
-	 * most recent value.
+	 * component with our own one. This doesn't break
+	 * recovery: it finds the WAL with a vclock strictly less
+	 * than the replica clock in all components except the 0th
+	 * one. This leads to finding the correct WAL, if it
+	 * exists, since we do not need to recover local rows (the
+	 * ones that contribute to the 0-th vclock component).
+	 * Note that it would be bad to set the 0-th vclock
+	 * component to a smaller value, since it would
+	 * unnecessarily require additional WALs, which may have
+	 * already been deleted.
+	 * Speaking of gc, remote instances' local vclock
+	 * components are not used by consumers at all.
 	 */
 	vclock_reset(&replica_clock, 0, vclock_get(&replicaset.vclock, 0));
 	/*
 

> 
> Please provide an explanation in CS/code comment or lets discuss
> this if you disagree.
> 
> 
> -- 
> Konstantin Osipov, Moscow, Russia


--
Serge Petrenko
sergepetrenko@tarantool.org


Thread overview: 14+ messages
2020-03-27 10:20 [Tarantool-patches] [PATCH v4 0/4] replication: fix local space tracking Serge Petrenko
2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 1/4] vclock: add an ability to reset individual clock components Serge Petrenko
2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 2/4] replication: hide 0-th vclock components in replication responses Serge Petrenko
2020-03-28  5:57   ` Konstantin Osipov
2020-03-30 11:02     ` Serge Petrenko
2020-03-30 12:52       ` Konstantin Osipov
2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 3/4] gc: rely on minimal vclock components instead of signatures Serge Petrenko
2020-03-28  6:03   ` Konstantin Osipov
2020-03-30 11:02     ` Serge Petrenko
2020-03-30 12:54       ` Konstantin Osipov
2020-03-27 10:20 ` [Tarantool-patches] [PATCH v4 4/4] box: start counting local space requests separately Serge Petrenko
2020-03-28  6:17   ` Konstantin Osipov
2020-03-30 11:02     ` Serge Petrenko [this message]
2020-03-28 16:23   ` Konstantin Osipov
