[Tarantool-patches] [PATCH v2 4/4] replication: do not promote local_vclock_at_subscribe unnecessarily
Serge Petrenko
sergepetrenko at tarantool.org
Fri Feb 14 13:46:18 MSK 2020
> 14 февр. 2020 г., в 10:25, Konstantin Osipov <kostja.osipov at gmail.com> написал(а):
>
> * sergepetrenko <sergepetrenko at tarantool.org <mailto:sergepetrenko at tarantool.org>> [20/02/14 09:46]:
>> From: Serge Petrenko <sergepetrenko at tarantool.org <mailto:sergepetrenko at tarantool.org>>
>>
>> When master processes a subscribe response, it responds with its vclock
>> at the moment of receiving the request. However, the fiber processing
>> the request may yield on coio_write_xrow, when sending the response to
>> the replica. In the meantime, master may apply additional rows coming
>> from the replica after it has issued SUBSCRIBE.
>> Then in relay_subscribe master sets its local vclock_at_subscribe to
>> a possibly updated value of replicaset.vclock
>> So, set local_vclock_at_subscribe to a remembered value, rather than an
>> updated one.
>
> I don't fully understand the explanation and what this fix
> achieves.
The fix allows to stop relaying rows that have just came from the replica back to it.
It is not necessary, since we’ve fixed applier in a different way, but I think there’s no
need to resend the replica’s rows back and forth. It just looks more correct, or consistent,
if you wish. If master responds that its vclock is such and such, it should use the same
vclock to judge whether to send the replica its own rows, or not, in my opinion.
Before the patch master judges by replicaset vclock, which may get updated while master
responds to subscribe (coio_write_xrow yields).
>
> local_vclock_at_subscribe is used to leave orphan mode. It
> basically tells the applier when it more or less has fully caught
> up with the relay.
Yes, master sends it to replica, and later master uses its own replicaset vclock
(previously) or the same vclock it sent to replica (after my patch) to filter replica’s
rows to send back to it (see the piece of code you’ve shown me yesterday).
Imagine a situation: you have master-master configuration with instances 1 and 2
2 is subscribed to 1, and 1 resubscribes to 2 (maybe 2 just restarted and was the first one
to successfully subscribe).
2 yields on writing subscribe responce. In the meantime 1 writes something new to WAL
and relays it to 2. 2 writes it to WAL and increments its replicaset vclock. Later it’ll resend
these rows back to 1, because `relay->local_vclock_at_subscribe` holds an updated vclock
value.
Hope I made it more clear this time.
>
> It should not impact replication correctness in any other way.
>
> I.e. it shouldn't matter whether or not it's accurate - while the
> applier is reading rows from the master, the master can get new
> rows anyway.
>
> If local_vclock_at_subscribe has any other meaning/impact, it's a bug.
>
>
>>
>> Follow-up #4739
>> ---
>> src/box/box.cc | 2 +-
>> src/box/relay.cc | 13 +++++++++++--
>> src/box/relay.h | 3 ++-
>> 3 files changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 952d60ad1..7dec1ae6b 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1871,7 +1871,7 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
>> * indefinitely).
>> */
>> relay_subscribe(replica, io->fd, header->sync, &replica_clock,
>> - replica_version_id);
>> + replica_version_id, &vclock);
>> }
>>
>> void
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index b89632273..b69646446 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -676,7 +676,8 @@ relay_subscribe_f(va_list ap)
>> /** Replication acceptor fiber handler. */
>> void
>> relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>> - struct vclock *replica_clock, uint32_t replica_version_id)
>> + struct vclock *replica_clock, uint32_t replica_version_id,
>> + struct vclock *clock_at_subscribe)
>> {
>> assert(replica->anon || replica->id != REPLICA_ID_NIL);
>> struct relay *relay = replica->relay;
>> @@ -699,7 +700,15 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>> replica_on_relay_stop(replica);
>> });
>>
>> - vclock_copy(&relay->local_vclock_at_subscribe, &replicaset.vclock);
>> + /*
>> + * It's too late to remember replicaset.vclock as local
>> + * vclock at subscribe. It might have incremented while we
>> + * were writing a subscribe response, and we don't want to
>> + * replicate back rows originating from the replica and
>> + * having arrived later than replica has issued
>> + * SUBSCRIBE.
>
>
> I still don't und
>> + */
>> + vclock_copy(&relay->local_vclock_at_subscribe, clock_at_subscribe);
>> relay->r = recovery_new(cfg_gets("wal_dir"), false,
>> replica_clock);
>> vclock_copy(&relay->tx.vclock, replica_clock);
>> diff --git a/src/box/relay.h b/src/box/relay.h
>> index e1782d78f..54ebd6731 100644
>> --- a/src/box/relay.h
>> +++ b/src/box/relay.h
>> @@ -124,6 +124,7 @@ relay_final_join(int fd, uint64_t sync, struct vclock *start_vclock,
>> */
>> void
>> relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>> - struct vclock *replica_vclock, uint32_t replica_version_id);
>> + struct vclock *replica_vclock, uint32_t replica_version_id,
>> + struct vclock *clock_at_subscribe);
>>
>> #endif /* TARANTOOL_REPLICATION_RELAY_H_INCLUDED */
>> --
>> 2.20.1 (Apple Git-117)
>>
>
> --
> Konstantin Osipov, Moscow, Russia
> https://scylladb.com <https://scylladb.com/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.tarantool.org/pipermail/tarantool-patches/attachments/20200214/bc0db932/attachment.html>
More information about the Tarantool-patches
mailing list