Tarantool development patches archive
From: Serge Petrenko <sergepetrenko@tarantool.org>
To: Konstantin Osipov <kostja.osipov@gmail.com>
Cc: tarantool-patches@dev.tarantool.org, v.shpilevoy@tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v2 4/4] replication: do not promote local_vclock_at_subscribe unnecessarily
Date: Fri, 14 Feb 2020 13:46:18 +0300
Message-ID: <A2EF2902-12AE-42C6-ACA0-0A2D7FF45B31@tarantool.org>
In-Reply-To: <20200214072526.GC15237@atlas>



> On 14 Feb 2020, at 10:25, Konstantin Osipov <kostja.osipov@gmail.com> wrote:
> 
> * sergepetrenko <sergepetrenko@tarantool.org> [20/02/14 09:46]:
>> From: Serge Petrenko <sergepetrenko@tarantool.org>
>> 
>> When master processes a subscribe request, it responds with its vclock
>> at the moment of receiving the request. However, the fiber processing
>> the request may yield on coio_write_xrow when sending the response to
>> the replica. In the meantime, master may apply additional rows coming
>> from the replica after it has issued SUBSCRIBE.
>> Then, in relay_subscribe, master sets its local_vclock_at_subscribe to
>> a possibly updated value of replicaset.vclock.
>> So, set local_vclock_at_subscribe to the remembered value rather than the
>> updated one.
> 
> I don't fully understand the explanation and what this fix
> achieves.

The fix stops the master from relaying rows that have just come from the replica back to it.
It is not strictly necessary, since we've already fixed the applier in a different way, but I think
there is no need to send the replica's rows back and forth. It just looks more correct, or more
consistent, if you wish. In my opinion, if the master responds that its vclock is such and such,
it should use the same vclock to decide whether to send the replica its own rows or not.
Before the patch, the master decides by replicaset.vclock, which may get updated while the master
responds to SUBSCRIBE (coio_write_xrow yields).
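
To make the ordering concrete, here is a minimal C model of the race. It is a sketch under stated assumptions, not Tarantool code: every `model_*` name is hypothetical, the vclock is reduced to a plain array of LSNs indexed by replica id, and the yield inside coio_write_xrow is simulated by an ordinary function call. The clock reported in the response is captured before the yield; the old behaviour re-reads the live clock afterwards, while the patched one reuses the captured copy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified vclock: one LSN per replica id. */
#define MODEL_VCLOCK_MAX 32

struct model_vclock {
	int64_t lsn[MODEL_VCLOCK_MAX];
};

/* Stands in for replicaset.vclock on the master. */
static struct model_vclock model_replicaset_vclock;

/*
 * Stands in for the yield inside coio_write_xrow(): while the
 * SUBSCRIBE response is being written, the applier commits one
 * more row that originated on replica `replica_id`.
 */
static void
model_yield_while_writing_response(uint32_t replica_id)
{
	model_replicaset_vclock.lsn[replica_id]++;
}

/*
 * Models the order of operations during SUBSCRIBE and stores in
 * `at_subscribe` the clock the relay would later keep as
 * local_vclock_at_subscribe.
 *   use_remembered == 0: old behaviour, re-read the live clock.
 *   use_remembered != 0: patched behaviour, reuse the clock that
 *                        was sent in the response.
 */
static void
model_subscribe(uint32_t replica_id, int use_remembered,
		struct model_vclock *at_subscribe)
{
	/* Clock reported to the replica in the SUBSCRIBE response. */
	struct model_vclock sent = model_replicaset_vclock;
	model_yield_while_writing_response(replica_id);
	*at_subscribe = use_remembered ? sent : model_replicaset_vclock;
}
```

Starting from an LSN of 10 for the subscribing replica, the old path ends up with 11 in its local_vclock_at_subscribe, while the patched path keeps the 10 that was actually reported.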

> 
> local_vclock_at_subscribe is used to leave orphan mode. It
> basically tells the applier when it more or less has fully caught
> up with the relay. 

Yes, the master sends it to the replica, and later the master uses either its own replicaset.vclock
(before my patch) or the same vclock it sent to the replica (after my patch) to filter the replica's
rows when sending them back to it (see the piece of code you showed me yesterday).
Imagine the following situation: a master-master configuration with instances 1 and 2.
2 is subscribed to 1, and 1 resubscribes to 2 (maybe 2 has just restarted and was the first one
to subscribe successfully).
2 yields while writing the subscribe response. In the meantime, 1 writes something new to its WAL
and relays it to 2. 2 writes it to its WAL and increments its replicaset.vclock. Later, 2 will resend
these rows back to 1, because `relay->local_vclock_at_subscribe` holds the updated vclock
value.
I hope this makes it clearer.
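
The relay-side filter referred to here can be approximated as follows. This is a hypothetical sketch with `model_*` names, not the actual relay code: a row is sent to the subscriber unless it originated on that subscriber and its LSN is newer than what the master had from it at SUBSCRIBE time. Older rows from the subscriber still pass, presumably because the subscriber may have lost them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified vclock: one LSN per replica id. */
#define MODEL_VCLOCK_MAX 32

struct model_vclock {
	int64_t lsn[MODEL_VCLOCK_MAX];
};

/*
 * Returns true if a row should be relayed to the subscriber.
 * Rows from other instances are always sent. Rows that originated
 * on the subscriber itself are sent only if their LSN does not
 * exceed the clock remembered at SUBSCRIBE time.
 */
static bool
model_relay_should_send(uint32_t subscriber_id, uint32_t row_replica_id,
			int64_t row_lsn,
			const struct model_vclock *at_subscribe)
{
	return row_replica_id != subscriber_id ||
	       row_lsn <= at_subscribe->lsn[row_replica_id];
}
```

In the scenario above, the row {1, 11} that 2 applied during the yield passes the check against the updated clock and is echoed back to 1, but is filtered out when the check uses the clock 2 actually reported.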

> 
> It should not impact replication correctness in any other way.
> 
> I.e. it shouldn't matter whether or not it's accurate - while the
> applier is reading rows from the master, the master can get new
> rows anyway.
> 
> If local_vclock_at_subscribe has any other meaning/impact, it's a bug. 
> 
> 
>> 
>> Follow-up #4739
>> ---
>> src/box/box.cc   |  2 +-
>> src/box/relay.cc | 13 +++++++++++--
>> src/box/relay.h  |  3 ++-
>> 3 files changed, 14 insertions(+), 4 deletions(-)
>> 
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 952d60ad1..7dec1ae6b 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1871,7 +1871,7 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
>> 	 * indefinitely).
>> 	 */
>> 	relay_subscribe(replica, io->fd, header->sync, &replica_clock,
>> -			replica_version_id);
>> +			replica_version_id, &vclock);
>> }
>> 
>> void
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index b89632273..b69646446 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -676,7 +676,8 @@ relay_subscribe_f(va_list ap)
>> /** Replication acceptor fiber handler. */
>> void
>> relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>> -		struct vclock *replica_clock, uint32_t replica_version_id)
>> +		struct vclock *replica_clock, uint32_t replica_version_id,
>> +		struct vclock *clock_at_subscribe)
>> {
>> 	assert(replica->anon || replica->id != REPLICA_ID_NIL);
>> 	struct relay *relay = replica->relay;
>> @@ -699,7 +700,15 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>> 		replica_on_relay_stop(replica);
>> 	});
>> 
>> -	vclock_copy(&relay->local_vclock_at_subscribe, &replicaset.vclock);
>> +	/*
>> +	 * It's too late to remember replicaset.vclock as local
>> +	 * vclock at subscribe. It might have incremented while we
>> +	 * were writing a subscribe response, and we don't want to
>> +	 * replicate back rows originating from the replica and
>> +	 * having arrived later than replica has issued
>> +	 * SUBSCRIBE.
> 
> 
> I still don't und
>> +	 */
>> +	vclock_copy(&relay->local_vclock_at_subscribe, clock_at_subscribe);
>> 	relay->r = recovery_new(cfg_gets("wal_dir"), false,
>> 			        replica_clock);
>> 	vclock_copy(&relay->tx.vclock, replica_clock);
>> diff --git a/src/box/relay.h b/src/box/relay.h
>> index e1782d78f..54ebd6731 100644
>> --- a/src/box/relay.h
>> +++ b/src/box/relay.h
>> @@ -124,6 +124,7 @@ relay_final_join(int fd, uint64_t sync, struct vclock *start_vclock,
>>  */
>> void
>> relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>> -		struct vclock *replica_vclock, uint32_t replica_version_id);
>> +		struct vclock *replica_vclock, uint32_t replica_version_id,
>> +		struct vclock *clock_at_subscribe);
>> 
>> #endif /* TARANTOOL_REPLICATION_RELAY_H_INCLUDED */
>> -- 
>> 2.20.1 (Apple Git-117)
>> 
> 
> -- 
> Konstantin Osipov, Moscow, Russia
> https://scylladb.com



Thread overview: 16+ messages
2020-02-13 21:52 [Tarantool-patches] [PATCH v2 0/4] replication: fix applying of rows originating from local instance sergepetrenko
2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 1/4] box: expose box_is_orphan method sergepetrenko
2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 2/4] replication: check for rows to skip in applier correctly sergepetrenko
2020-02-14  7:19   ` Konstantin Osipov
2020-02-14  7:29     ` Konstantin Osipov
2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 3/4] wal: wart when trying to write a record with a broken lsn sergepetrenko
2020-02-14  7:20   ` Konstantin Osipov
2020-02-14 10:46     ` Serge Petrenko
2020-02-16 16:15   ` Vladislav Shpilevoy
2020-02-18 17:28     ` Serge Petrenko
2020-02-18 21:15       ` Vladislav Shpilevoy
2020-02-19  8:46         ` Serge Petrenko
2020-02-13 21:53 ` [Tarantool-patches] [PATCH v2 4/4] replication: do not promote local_vclock_at_subscribe unnecessarily sergepetrenko
2020-02-14  7:25   ` Konstantin Osipov
2020-02-14 10:46     ` Serge Petrenko [this message]
2020-02-14 10:52       ` Konstantin Osipov
