Tarantool development patches archive
From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Cyrill Gorcunov <gorcunov@gmail.com>,
	tml <tarantool-patches@dev.tarantool.org>
Cc: Mons Anderson <v.perepelitsa@corp.mail.ru>
Subject: Re: [Tarantool-patches] [RFC v3 3/3] relay: provide information about downstream lag
Date: Fri, 30 Apr 2021 22:50:29 +0200
Message-ID: <24ead094-55e5-322a-3ab6-3229252b5e5a@tarantool.org>
In-Reply-To: <20210430153940.121271-4-gorcunov@gmail.com>

I appreciate the work you did here!

See 4 comments below.

On 30.04.2021 17:39, Cyrill Gorcunov via Tarantool-patches wrote:
> We already have the `box.replication.upstream.lag` entry for monitoring
> purposes. At the same time, in synchronous replication timeouts are key
> properties of the quorum gathering procedure. Thus we would like to know
> how long it takes a transaction to traverse the `initiator WAL ->
> network -> remote applier -> ACK` path.
> 
> In this patch we use new applier's functionality which returns us
> the timestamp of first written xrow in a transaction such that we
> can calculate the downstream lag.
> 
> Closes #5447

1. There must be a docbot request and a changelog file.

> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index ff43c2fc7..6d880932a 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -158,6 +158,12 @@ struct relay {
>  	struct stailq pending_gc;
>  	/** Time when last row was sent to peer. */
>  	double last_row_time;
> +	/**
> +	 * A time difference between moment when we
> +	 * wrote the row in a local WAL and peer wrote
> +	 * it in own WAL.
> +	 */
> +	double peer_wal_lag;

2. This is not quite accurate. The value also includes 2 network hops,
so I wouldn't call it 'wal'. It is simply lag, or latency.

> @@ -614,15 +626,41 @@ relay_reader_f(va_list ap)
>  	coio_create(&io, relay->io.fd);
>  	ibuf_create(&ibuf, &cord()->slabc, 1024);
>  	try {
> -		while (!fiber_is_cancelled()) {
> -			struct xrow_header xrow;
> +		struct xrow_header xrow;
> +		double prev_tm;
> +
> +		/*
> +		 * Make a first read as a separate action because
> +		 * we need previous timestamp from the xrow to
> +		 * calculate delta from (to eliminate branching
> +		 * in next reads).
> +		 */
> +		if (!fiber_is_cancelled()) {
>  			coio_read_xrow_timeout_xc(&io, &ibuf, &xrow,
> -					replication_disconnect_timeout());
> +				replication_disconnect_timeout());

3. You didn't have to change this line.

> +			prev_tm = xrow.tm;
> +		}
> +
> +		do {
>  			/* vclock is followed while decoding, zeroing it. */
>  			vclock_create(&relay->recv_vclock);
>  			xrow_decode_vclock_xc(&xrow, &relay->recv_vclock);
> +			/*
> +			 * Old instances do not report the timestamp.
> +			 * Same time in case of idle cycles the xrow.tm
> +			 * is the same so update lag only when new data
> +			 * been acked.
> +			 */
> +			if (prev_tm != xrow.tm) {

4. It could also be reset to zero when there are no rows and the sent and
received vclocks match, because that means there are no rows to ack, and
therefore there can't be any latency. Or you can use relay heartbeats to
update it even when there are no rows.
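
A minimal sketch of the reset-to-zero variant, not code from the patch.
Everything here is hypothetical: `struct lag_state` stands in for the
relevant relay fields, and the two `int64_t` arguments are simplified
scalar stand-ins for the sent and received vclocks (the assumption being
that equal values mean the peer has acked everything that was sent).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relay fields discussed above. */
struct lag_state {
	/* Timestamp carried by the previously processed ACK. */
	double prev_tm;
	/* Last computed lag, in seconds. */
	double lag;
};

/*
 * Process one ACK.
 *
 * now       - current time on the relay side.
 * ack_tm    - timestamp reported in the ACK (stays the same on
 *             idle cycles and on old instances).
 * sent_sum  - scalar stand-in for the sent vclock (assumption).
 * acked_sum - scalar stand-in for the received vclock (assumption).
 */
static void
lag_state_on_ack(struct lag_state *st, double now, double ack_tm,
		 int64_t sent_sum, int64_t acked_sum)
{
	assert(st != NULL);
	if (ack_tm != st->prev_tm) {
		/* New data was acked: measure the full round trip. */
		st->lag = now - ack_tm;
		st->prev_tm = ack_tm;
	} else if (sent_sum == acked_sum) {
		/* Nothing is in flight, so there is no latency to report. */
		st->lag = 0;
	}
	/* Otherwise rows are still in flight: keep the last value. */
}
```

With this shape the lag stays at its last measured value while rows are
still unacknowledged, and drops to zero only once the peer has caught up.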

> +				double delta = ev_now(loop()) - xrow.tm;
> +				relay->peer_wal_lag = delta;
> +				prev_tm = xrow.tm;
> +			}
