On 14 Feb 2020, at 10:25, Konstantin Osipov <kostja.osipov@gmail.com> wrote:

* sergepetrenko <sergepetrenko@tarantool.org> [20/02/14 09:46]:
From: Serge Petrenko <sergepetrenko@tarantool.org>

When master processes a subscribe request, it responds with its vclock
at the moment of receiving the request. However, the fiber processing
the request may yield on coio_write_xrow when sending the response to
the replica. In the meantime, master may apply additional rows coming
from the replica after it has issued SUBSCRIBE.
Then in relay_subscribe master sets its local_vclock_at_subscribe to
a possibly updated value of replicaset.vclock.
So, set local_vclock_at_subscribe to the remembered value rather than
the updated one.

I don't fully understand the explanation and what this fix
achieves.

The fix lets master stop relaying rows that have just come from the replica back to it.
It is not strictly necessary, since we’ve fixed the applier in a different way, but I think
there’s no need to send the replica’s rows back and forth. It just looks more correct, or
more consistent, if you wish. In my opinion, if master responds that its vclock is such and
such, it should use that same vclock to decide whether to send the replica its own rows.
Before the patch, master judges by the replicaset vclock, which may get updated while master
responds to subscribe (coio_write_xrow yields).


local_vclock_at_subscribe is used to leave orphan mode. It
basically tells the applier when it more or less has fully caught
up with the relay. 

Yes, master sends it to the replica, and later master uses either its own replicaset vclock
(before my patch) or the same vclock it sent to the replica (after my patch) to filter the
replica’s rows to send back to it (see the piece of code you showed me yesterday).
Imagine a situation: you have a master-master configuration with instances 1 and 2.
2 is subscribed to 1, and 1 resubscribes to 2 (maybe 2 just restarted and was the first one
to successfully subscribe).
2 yields on writing the subscribe response. In the meantime 1 writes something new to WAL
and relays it to 2. 2 writes it to WAL and increments its replicaset vclock. Later it’ll resend
these rows back to 1, because `relay->local_vclock_at_subscribe` holds an updated vclock
value.
I hope this makes it clearer.
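
To make the scenario concrete, here is a minimal standalone sketch of the filtering
decision. It is not the relay code: toy_vclock and toy_should_relay are hypothetical
names, and the rule (a row originating from the peer is sent back only if its LSN does
not exceed the peer's component of the vclock captured at subscribe) is my simplified
reading of the filter mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy vclock: one LSN per instance id (not the real struct vclock). */
enum { TOY_VCLOCK_MAX = 4 };

struct toy_vclock {
	int64_t lsn[TOY_VCLOCK_MAX];
};

/*
 * Assumed, simplified filter: rows that did not originate from the peer
 * are always relayed. Rows that originated from the peer are relayed
 * back only if they are already covered by the vclock captured at
 * subscribe time.
 */
static bool
toy_should_relay(const struct toy_vclock *at_subscribe, uint32_t peer_id,
		 uint32_t row_replica_id, int64_t row_lsn)
{
	assert(row_replica_id < TOY_VCLOCK_MAX);
	if (row_replica_id != peer_id)
		return true;
	return row_lsn <= at_subscribe->lsn[row_replica_id];
}
```

Say instance 2 remembered {1: 10, 2: 5} when it received SUBSCRIBE from instance 1, and
its live vclock became {1: 11, 2: 5} while it was yielding. For the row (replica_id 1,
lsn 11), toy_should_relay() returns true with the live vclock, so the row goes back to
instance 1, and false with the remembered one.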


It should not impact replication correctness in any other way.

I.e. it shouldn't matter whether or not it's accurate - while the
applier is reading rows from the master, the master can get new
rows anyway.

If local_vclock_at_subscribe has any other meaning/impact, it's a bug. 



Follow-up #4739
---
src/box/box.cc   |  2 +-
src/box/relay.cc | 13 +++++++++++--
src/box/relay.h  |  3 ++-
3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 952d60ad1..7dec1ae6b 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1871,7 +1871,7 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
 * indefinitely).
 */
relay_subscribe(replica, io->fd, header->sync, &replica_clock,
- replica_version_id);
+ replica_version_id, &vclock);
}

void
diff --git a/src/box/relay.cc b/src/box/relay.cc
index b89632273..b69646446 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -676,7 +676,8 @@ relay_subscribe_f(va_list ap)
/** Replication acceptor fiber handler. */
void
relay_subscribe(struct replica *replica, int fd, uint64_t sync,
- struct vclock *replica_clock, uint32_t replica_version_id)
+ struct vclock *replica_clock, uint32_t replica_version_id,
+ struct vclock *clock_at_subscribe)
{
assert(replica->anon || replica->id != REPLICA_ID_NIL);
struct relay *relay = replica->relay;
@@ -699,7 +700,15 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
replica_on_relay_stop(replica);
});

- vclock_copy(&relay->local_vclock_at_subscribe, &replicaset.vclock);
+ /*
+  * It's too late to remember replicaset.vclock as local
+  * vclock at subscribe. It might have incremented while we
+  * were writing a subscribe response, and we don't want to
+  * replicate back rows originating from the replica and
+  * having arrived later than replica has issued
+  * SUBSCRIBE.


I still don't und
+  */
+ vclock_copy(&relay->local_vclock_at_subscribe, clock_at_subscribe);
relay->r = recovery_new(cfg_gets("wal_dir"), false,
        replica_clock);
vclock_copy(&relay->tx.vclock, replica_clock);
diff --git a/src/box/relay.h b/src/box/relay.h
index e1782d78f..54ebd6731 100644
--- a/src/box/relay.h
+++ b/src/box/relay.h
@@ -124,6 +124,7 @@ relay_final_join(int fd, uint64_t sync, struct vclock *start_vclock,
 */
void
relay_subscribe(struct replica *replica, int fd, uint64_t sync,
- struct vclock *replica_vclock, uint32_t replica_version_id);
+ struct vclock *replica_vclock, uint32_t replica_version_id,
+ struct vclock *clock_at_subscribe);

#endif /* TARANTOOL_REPLICATION_RELAY_H_INCLUDED */
-- 
2.20.1 (Apple Git-117)


-- 
Konstantin Osipov, Moscow, Russia
https://scylladb.com