From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp41.i.mail.ru (smtp41.i.mail.ru [94.100.177.101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id E1E0C469719 for ; Fri, 14 Feb 2020 13:46:20 +0300 (MSK) From: Serge Petrenko Message-Id: Content-Type: multipart/alternative; boundary="Apple-Mail=_09E64DF6-4AD2-498A-B30D-E37A4E3C79FB" Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3608.40.2.2.4\)) Date: Fri, 14 Feb 2020 13:46:18 +0300 In-Reply-To: <20200214072526.GC15237@atlas> References: <20200214072526.GC15237@atlas> Subject: Re: [Tarantool-patches] [PATCH v2 4/4] replication: do not promote local_vclock_at_subscribe unnecessarily List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Konstantin Osipov Cc: tarantool-patches@dev.tarantool.org, v.shpilevoy@tarantool.org --Apple-Mail=_09E64DF6-4AD2-498A-B30D-E37A4E3C79FB Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > 14 =D1=84=D0=B5=D0=B2=D1=80. 2020 =D0=B3., =D0=B2 10:25, Konstantin = Osipov =D0=BD=D0=B0=D0=BF=D0=B8=D1=81=D0=B0=D0=BB= (=D0=B0): >=20 > * sergepetrenko > [20/02/14 09:46]: >> From: Serge Petrenko > >>=20 >> When master processes a subscribe response, it responds with its = vclock >> at the moment of receiving the request. However, the fiber processing >> the request may yield on coio_write_xrow, when sending the response = to >> the replica. In the meantime, master may apply additional rows coming >> from the replica after it has issued SUBSCRIBE. >> Then in relay_subscribe master sets its local vclock_at_subscribe to >> a possibly updated value of replicaset.vclock >> So, set local_vclock_at_subscribe to a remembered value, rather than = an >> updated one. >=20 > I don't fully understand the explanation and what this fix > achieves. The fix allows to stop relaying rows that have just came from the = replica back to it. It is not necessary, since we=E2=80=99ve fixed applier in a different = way, but I think there=E2=80=99s no need to resend the replica=E2=80=99s rows back and forth. It just looks = more correct, or consistent, if you wish. If master responds that its vclock is such and such, it = should use the same vclock to judge whether to send the replica its own rows, or not, in my = opinion. Before the patch master judges by replicaset vclock, which may get = updated while master responds to subscribe (coio_write_xrow yields). >=20 > local_vclock_at_subscribe is used to leave orphan mode. It > basically tells the applier when it more or less has fully caught > up with the relay.=20 Yes, master sends it to replica, and later master uses its own = replicaset vclock (previously) or the same vclock it sent to replica (after my patch) to = filter replica=E2=80=99s rows to send back to it (see the piece of code you=E2=80=99ve shown me = yesterday). Imagine a situation: you have master-master configuration with instances = 1 and 2 2 is subscribed to 1, and 1 resubscribes to 2 (maybe 2 just restarted = and was the first one to successfully subscribe). 2 yields on writing subscribe responce. In the meantime 1 writes = something new to WAL and relays it to 2. 2 writes it to WAL and increments its replicaset = vclock. Later it=E2=80=99ll resend these rows back to 1, because `relay->local_vclock_at_subscribe` holds = an updated vclock value. Hope I made it more clear this time. >=20 > It should not impact replication correctness in any other way. >=20 > I.e. it shouldn't matter whether or not it's accurate - while the > applier is reading rows from the master, the master can get new > rows anyway. >=20 > If local_vclock_at_subscribe has any other meaning/impact, it's a bug.=20= >=20 >=20 >>=20 >> Follow-up #4739 >> --- >> src/box/box.cc | 2 +- >> src/box/relay.cc | 13 +++++++++++-- >> src/box/relay.h | 3 ++- >> 3 files changed, 14 insertions(+), 4 deletions(-) >>=20 >> diff --git a/src/box/box.cc b/src/box/box.cc >> index 952d60ad1..7dec1ae6b 100644 >> --- a/src/box/box.cc >> +++ b/src/box/box.cc >> @@ -1871,7 +1871,7 @@ box_process_subscribe(struct ev_io *io, struct = xrow_header *header) >> * indefinitely). >> */ >> relay_subscribe(replica, io->fd, header->sync, &replica_clock, >> - replica_version_id); >> + replica_version_id, &vclock); >> } >>=20 >> void >> diff --git a/src/box/relay.cc b/src/box/relay.cc >> index b89632273..b69646446 100644 >> --- a/src/box/relay.cc >> +++ b/src/box/relay.cc >> @@ -676,7 +676,8 @@ relay_subscribe_f(va_list ap) >> /** Replication acceptor fiber handler. */ >> void >> relay_subscribe(struct replica *replica, int fd, uint64_t sync, >> - struct vclock *replica_clock, uint32_t = replica_version_id) >> + struct vclock *replica_clock, uint32_t = replica_version_id, >> + struct vclock *clock_at_subscribe) >> { >> assert(replica->anon || replica->id !=3D REPLICA_ID_NIL); >> struct relay *relay =3D replica->relay; >> @@ -699,7 +700,15 @@ relay_subscribe(struct replica *replica, int fd, = uint64_t sync, >> replica_on_relay_stop(replica); >> }); >>=20 >> - vclock_copy(&relay->local_vclock_at_subscribe, = &replicaset.vclock); >> + /* >> + * It's too late to remember replicaset.vclock as local >> + * vclock at subscribe. It might have incremented while we >> + * were writing a subscribe response, and we don't want to >> + * replicate back rows originating from the replica and >> + * having arrived later than replica has issued >> + * SUBSCRIBE. >=20 >=20 > I still don't und >> + */ >> + vclock_copy(&relay->local_vclock_at_subscribe, = clock_at_subscribe); >> relay->r =3D recovery_new(cfg_gets("wal_dir"), false, >> replica_clock); >> vclock_copy(&relay->tx.vclock, replica_clock); >> diff --git a/src/box/relay.h b/src/box/relay.h >> index e1782d78f..54ebd6731 100644 >> --- a/src/box/relay.h >> +++ b/src/box/relay.h >> @@ -124,6 +124,7 @@ relay_final_join(int fd, uint64_t sync, struct = vclock *start_vclock, >> */ >> void >> relay_subscribe(struct replica *replica, int fd, uint64_t sync, >> - struct vclock *replica_vclock, uint32_t = replica_version_id); >> + struct vclock *replica_vclock, uint32_t = replica_version_id, >> + struct vclock *clock_at_subscribe); >>=20 >> #endif /* TARANTOOL_REPLICATION_RELAY_H_INCLUDED */ >> --=20 >> 2.20.1 (Apple Git-117) >>=20 >=20 > --=20 > Konstantin Osipov, Moscow, Russia > https://scylladb.com --Apple-Mail=_09E64DF6-4AD2-498A-B30D-E37A4E3C79FB Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8
14 = =D1=84=D0=B5=D0=B2=D1=80. 2020 =D0=B3., =D0=B2 10:25, Konstantin Osipov = <kostja.osipov@gmail.com> =D0=BD=D0=B0=D0=BF=D0=B8=D1=81=D0= =B0=D0=BB(=D0=B0):

* = sergepetrenko <sergepetrenko@tarantool.org> [20/02/14 09:46]:
From: Serge Petrenko <sergepetrenko@tarantool.org>

When master processes a subscribe response, it responds with = its vclock
at the moment of receiving the request. = However, the fiber processing
the request may yield on = coio_write_xrow, when sending the response to
the replica. = In the meantime, master may apply additional rows coming
from the replica after it has issued SUBSCRIBE.
Then in relay_subscribe master sets its local = vclock_at_subscribe to
a possibly updated value of = replicaset.vclock
So, set local_vclock_at_subscribe to a = remembered value, rather than an
updated one.

I don't fully understand the explanation and what this = fix
achieves.

The fix allows to stop relaying rows that have = just came from the replica back to it.
It is not necessary, = since we=E2=80=99ve fixed applier in a different way, but I think = there=E2=80=99s no
need to resend the replica=E2=80=99s rows = back and forth. It just looks more correct, or consistent,
if = you wish. If master responds that its vclock is such and such, it should = use the same
vclock to judge whether to send the replica its = own rows, or not, in my opinion.
Before the patch master = judges by replicaset vclock, which may get updated while = master
responds to subscribe (coio_write_xrow = yields).


local_vclock_at_subscribe is used to leave orphan mode. = It
basically = tells the applier when it more or less has fully caught
up with the relay. 

Yes, master = sends it to replica, and later master uses its own replicaset = vclock
(previously) or the same vclock it sent to replica = (after my patch) to filter replica=E2=80=99s
rows to send back = to it (see the piece of code you=E2=80=99ve shown me = yesterday).
Imagine a situation: you have master-master = configuration with instances 1 and 2
2 is subscribed to 1, and = 1 resubscribes to 2 (maybe 2 just restarted and was the first = one
to successfully subscribe).
2 yields on writing = subscribe responce. In the meantime 1 writes something new to = WAL
and relays it to 2. 2 writes it to WAL and increments its = replicaset vclock. Later it=E2=80=99ll resend
these rows back = to 1, because `relay->local_vclock_at_subscribe` holds an updated = vclock
value.
Hope I made it more clear this = time.


It should not = impact replication correctness in any other way.

I.e. it shouldn't matter whether = or not it's accurate - while the
applier is reading rows from the master, the master can get = new
rows = anyway.

If = local_vclock_at_subscribe has any other meaning/impact, it's a bug. 



Follow-up #4739
---
src/box/box.cc   |  2 +-
src/box/relay.cc | = 13 +++++++++++--
src/box/relay.h  |  3 ++-
3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 952d60ad1..7dec1ae6b 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1871,7 = +1871,7 @@ box_process_subscribe(struct ev_io *io, struct xrow_header = *header)
 * indefinitely).
=  */
= relay_subscribe(replica, io->fd, header->sync, = &replica_clock,
- replica_version_id);
+ = = = replica_version_id, &vclock);
}
void
diff --git a/src/box/relay.cc b/src/box/relay.cc
index = b89632273..b69646446 100644
--- a/src/box/relay.cc
+++ = b/src/box/relay.cc
@@ -676,7 +676,8 @@ relay_subscribe_f(va_list ap)
/** Replication acceptor fiber handler. */
void
relay_subscribe(struct replica *replica, = int fd, uint64_t sync,
- struct vclock *replica_clock, = uint32_t replica_version_id)
+ struct vclock *replica_clock, = uint32_t replica_version_id,
+ struct vclock = *clock_at_subscribe)
{
= assert(replica->anon || replica->id !=3D = REPLICA_ID_NIL);
struct relay *relay =3D = replica->relay;
@@ -699,7 +700,15 @@ = relay_subscribe(struct replica *replica, int fd, uint64_t sync,
= = replica_on_relay_stop(replica);
});

- = vclock_copy(&relay->local_vclock_at_subscribe, = &replicaset.vclock);
+ /*
+  * It's too late to remember = replicaset.vclock as local
+  * vclock at subscribe. It = might have incremented while we
+  * were writing a subscribe = response, and we don't want to
+  * replicate back rows = originating from the replica and
+  * having arrived later than = replica has issued
+  * SUBSCRIBE.


I still don't und
+  */
+ = vclock_copy(&relay->local_vclock_at_subscribe, = clock_at_subscribe);
relay->r =3D = recovery_new(cfg_gets("wal_dir"), false,
      = ;  replica_clock);
= vclock_copy(&relay->tx.vclock, replica_clock);
diff --git a/src/box/relay.h b/src/box/relay.h
index e1782d78f..54ebd6731 100644
--- = a/src/box/relay.h
+++ b/src/box/relay.h
@@ = -124,6 +124,7 @@ relay_final_join(int fd, uint64_t sync, struct vclock = *start_vclock,
 */
void
relay_subscribe(struct replica *replica, int fd, uint64_t = sync,
- struct vclock *replica_vclock, uint32_t = replica_version_id);
+ struct vclock *replica_vclock, = uint32_t replica_version_id,
+ struct vclock = *clock_at_subscribe);

#endif /* = TARANTOOL_REPLICATION_RELAY_H_INCLUDED */
-- 
2.20.1 = (Apple Git-117)


-- 
Konstantin Osipov, Moscow, = Russia
https://scylladb.com

= --Apple-Mail=_09E64DF6-4AD2-498A-B30D-E37A4E3C79FB--