From: Serge Petrenko <sergepetrenko@tarantool.org>
To: Konstantin Osipov
Cc: tarantool-patches@dev.tarantool.org, Vladislav Shpilevoy
Date: Mon, 30 Mar 2020 14:02:54 +0300
Subject: Re: [Tarantool-patches] [PATCH v4 4/4] box: start counting local space requests separately
In-Reply-To: <20200328061722.GC23207@atlas>

> On 28 March 2020, at 09:17, Konstantin Osipov wrote:
>
> * Serge Petrenko [20/03/27 18:08]:
>
> One thing I don't understand is how this extra patch is necessary
> to close #4114. Aren't the first 3 patches enough?
>
> This is a nice optimization and is perhaps a prerequisite for a
> fix for some anonymous replica bug, but how does it affect vinyl?
>
>> Sign local space requests with a zero instance id. This allows
>> splitting local changes apart from the changes which should be
>> visible to the whole cluster, and stops sending NOPs to replicas
>> to follow the local vclock.
>>
>> Moreover, it fixes the following bug with local spaces and
>> replication. With a master and a replica set up, the replica may
>> still write to local spaces even if it's read-only. Local space
>> operations used to promote the instance's lsn before this patch.
>> Eventually, the master would have vclock {1:x} and the replica
>> would have vclock {1:x, 2:y}, where y > 0, due to local space
>> requests performed by the replica.
>> If a snapshot happens on the replica side, the replica will delete
>> its .xlog files prior to the snapshot, since no one replicates
>> from it and thus it doesn't have any registered GC consumers.
>> From this point, every attempt to configure replication from the
>> replica to the master will fail, since the master will try to
>> fetch records which account for the difference between the
>> master's and the replica's vclocks: {1:x} vs {1:x, 2:y}, even
>> though the master will never need the rows in range {2:1} - {2:y},
>> since they'll be turned into NOPs during replication.
>
> I think the issue above is fixed by lexicographical search for the
> min vclock in gc.
>
>> Starting from this patch, in the situation described above, the
>> replica's clock will be {0:y, 1:x}, and, since local space
>> requests are now not replicated at all, the master will be able to
>> follow the replica, turning the configuration into master-master.
>
>> @@ -1882,6 +1882,26 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
>>  	say_info("remote vclock %s local vclock %s",
>>  		 vclock_to_string(&replica_clock), vclock_to_string(&vclock));
>>
>> +	/*
>> +	 * Replica clock is used in gc state and recovery
>> +	 * initialization, so we need to replace the remote 0-th
>> +	 * component with our own one. This makes recovery work
>> +	 * correctly: we're trying to recover from an xlog whose
>> +	 * vclock is smaller than the remote replica clock in each
>> +	 * component, excluding the 0-th component. This leads to
>> +	 * finding the correct WAL, if it exists, since we do not
>> +	 * need to recover local rows (the ones that contribute
>> +	 * to the 0-th vclock component). It would be bad to set
>> +	 * the 0-th vclock component to a smaller value, since it
>> +	 * would unnecessarily require additional WALs, which may
>> +	 * have already been deleted.
>> +	 * The same stands for gc. Remote instances do not need this
>> +	 * instance's local rows, and after gc was reworked to
>> +	 * track individual vclock components instead of
>> +	 * signatures, it's safe to set the local component to the
>> +	 * most recent value.
>> +	 */
>> +	vclock_reset(&replica_clock, 0, vclock_get(&replicaset.vclock, 0));
>
> As I mentioned above, it seems unnecessary for 4114, but it also
> seems insufficient: you change to using replica id 0 for local
> requests later in this patch. relay_subscribe will use
> signature-based search of the xlog, so playing around with
> components may not be enough. E.g. if the local component is huge,
> we may skip over some important changes the replica is missing.
>
> Apparently relay has to use lexicographical search in the xlog_dir
> index as well!

But relay (recovery) doesn't perform a signature-based xlog search.
It uses vclockset_match(), which uses vclock_compare().
vclockset_match() finds the clock which is strictly less than the one
provided. Signatures are used in xdir_scan(), but xdir_scan() still
updates the vclockset.
If my patch messed with the vclocks the WALs are signed with, there
would be some cause for concern. Since it doesn't do that, and only
changes the key we search with, everything's fine.
I tried to explain this in the comment right above, but apparently
didn't emphasize it enough.
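To make that concrete, here's a toy, self-contained sketch of a
per-component search. This is NOT the real src/box/vclock.h code; the
struct and function names (toy_vclock, toy_vclock_le, toy_match) and
all the numbers are made up for illustration. The point is that the
search key is compared component by component, so a huge 0-th
component cannot make the search skip rows of another instance, unlike
a summed-signature comparison:

/*
 * Toy model of a per-component vclock search (not the real
 * vclockset_match() implementation). Picks the newest WAL start
 * clock that does not exceed the search key in any component.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_ID 3

struct toy_vclock {
	long long lsn[MAX_ID];
};

/* True if a <= b in every component. */
static bool
toy_vclock_le(const struct toy_vclock *a, const struct toy_vclock *b)
{
	for (int id = 0; id < MAX_ID; id++)
		if (a->lsn[id] > b->lsn[id])
			return false;
	return true;
}

/* Pick the last clock in set[] (oldest to newest) that fits under key. */
static const struct toy_vclock *
toy_match(const struct toy_vclock *set, int count,
	  const struct toy_vclock *key)
{
	const struct toy_vclock *best = NULL;
	for (int i = 0; i < count; i++)
		if (toy_vclock_le(&set[i], key))
			best = &set[i];
	return best;
}

int
main(void)
{
	/* Start vclocks of two WALs, oldest first. */
	struct toy_vclock wals[] = {
		{{0, 10, 0}},	/* {1:10} */
		{{5, 20, 0}},	/* {0:5, 1:20} */
	};
	/*
	 * The subscribe key after the 0-th component was reset to the
	 * local value (5 here), as box_process_subscribe() now does.
	 */
	struct toy_vclock key = {{5, 25, 0}};
	const struct toy_vclock *wal = toy_match(wals, 2, &key);
	printf("recover from WAL {0:%lld, 1:%lld}\n",
	       wal->lsn[0], wal->lsn[1]);
	return 0;
}

Compiled with plain gcc, this picks the WAL {0:5, 1:20}, since every
component of its start vclock fits under the key; no signature is ever
summed.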
I'll also send v5 with the changes mentioned, so that it's easier for
Vlad to review.

Here's the new comment:

diff --git a/src/box/box.cc b/src/box/box.cc
index a37a95185..0762266b0 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1882,21 +1882,18 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
 	/*
 	 * Replica clock is used in gc state and recovery
 	 * initialization, so we need to replace the remote 0-th
-	 * component with our own one. This makes recovery work
-	 * correctly: we're trying to recover from an xlog whose
-	 * vclock is smaller than the remote replica clock in each
-	 * component, excluding the 0-th component. This leads to
-	 * finding the correct WAL, if it exists, since we do not
-	 * need to recover local rows (the ones that contribute
-	 * to the 0-th vclock component). It would be bad to set
-	 * the 0-th vclock component to a smaller value, since it
-	 * would unnecessarily require additional WALs, which may
-	 * have already been deleted.
-	 * The same stands for gc. Remote instances do not need this
-	 * instance's local rows, and after gc was reworked to
-	 * track individual vclock components instead of
-	 * signatures, it's safe to set the local component to the
-	 * most recent value.
+	 * component with our own one. This doesn't break
+	 * recovery: it finds the WAL with a vclock strictly less
+	 * than the replica clock in all components except the 0-th
+	 * one. This leads to finding the correct WAL, if it exists,
+	 * since we do not need to recover local rows (the ones
+	 * that contribute to the 0-th vclock component).
+	 * Note that it would be bad to set the 0-th vclock
+	 * component to a smaller value, since it would
+	 * unnecessarily require additional WALs, which may have
+	 * already been deleted.
+	 * Speaking of gc, remote instances' local vclock
+	 * components are not used by consumers at all.
 	 */
 	vclock_reset(&replica_clock, 0, vclock_get(&replicaset.vclock, 0));
 	/*
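And here's a toy illustration of the reset itself, with made-up
numbers and the same simplified struct idea as above (again, not the
real vclock API), showing why the 0-th component is set to the local
value rather than anything smaller:

/*
 * Toy illustration of the 0-th component reset in
 * box_process_subscribe(). Hypothetical values throughout.
 */
#include <stdio.h>

struct toy_vclock {
	long long lsn[3];
};

int
main(void)
{
	/*
	 * The replica subscribes with {0:1000, 1:30}: it made 1000
	 * local space writes the master will never need.
	 */
	struct toy_vclock replica_clock = {{1000, 30, 0}};
	/* The master's own clock, say {0:7, 1:40}. */
	struct toy_vclock local = {{7, 40, 0}};

	/*
	 * vclock_reset(&replica_clock, 0, vclock_get(...)) boils
	 * down to overwriting a single component.
	 */
	replica_clock.lsn[0] = local.lsn[0];

	/*
	 * The key is now {0:7, 1:30}: recovery starts from a WAL the
	 * master actually has, and gc is neither pinned by a smaller
	 * 0-th value nor confused by the remote instance's local rows.
	 */
	printf("search key: {0:%lld, 1:%lld}\n",
	       replica_clock.lsn[0], replica_clock.lsn[1]);
	return 0;
}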
> Please provide an explanation in the CS/code comment, or let's
> discuss this if you disagree.
>
> --
> Konstantin Osipov, Moscow, Russia

--
Serge Petrenko
sergepetrenko@tarantool.org