[Tarantool-patches] [RFC v3 3/3] relay: provide information about downstream lag

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Apr 30 23:50:29 MSK 2021


I appreciate the work you did here!

See 4 comments below.

On 30.04.2021 17:39, Cyrill Gorcunov via Tarantool-patches wrote:
> We already have the `box.replication.upstream.lag` entry for
> monitoring. At the same time, in synchronous replication timeouts are
> key properties of the quorum gathering procedure. Thus we would like
> to know how long it takes a transaction to traverse the `initiator
> WAL -> network -> remote applier -> ACK` path.
> 
> In this patch we use the applier's new functionality, which returns
> the timestamp of the first written xrow in a transaction, so that we
> can calculate the downstream lag.
> 
> Closes #5447

1. There must be a docbot request and a changelog file.

> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index ff43c2fc7..6d880932a 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -158,6 +158,12 @@ struct relay {
>  	struct stailq pending_gc;
>  	/** Time when last row was sent to peer. */
>  	double last_row_time;
> +	/**
> +	 * A time difference between moment when we
> +	 * wrote the row in a local WAL and peer wrote
> +	 * it in own WAL.
> +	 */
> +	double peer_wal_lag;

2. This is not quite accurate: the value also includes 2 network hops.
I wouldn't call it 'wal'. It is simply a lag, or latency.

> @@ -614,15 +626,41 @@ relay_reader_f(va_list ap)
>  	coio_create(&io, relay->io.fd);
>  	ibuf_create(&ibuf, &cord()->slabc, 1024);
>  	try {
> -		while (!fiber_is_cancelled()) {
> -			struct xrow_header xrow;
> +		struct xrow_header xrow;
> +		double prev_tm;
> +
> +		/*
> +		 * Make a first read as a separate action because
> +		 * we need previous timestamp from the xrow to
> +		 * calculate delta from (to eliminate branching
> +		 * in next reads).
> +		 */
> +		if (!fiber_is_cancelled()) {
>  			coio_read_xrow_timeout_xc(&io, &ibuf, &xrow,
> -					replication_disconnect_timeout());
> +				replication_disconnect_timeout());

3. You didn't have to change this line.

> +			prev_tm = xrow.tm;
> +		}
> +
> +		do {
>  			/* vclock is followed while decoding, zeroing it. */
>  			vclock_create(&relay->recv_vclock);
>  			xrow_decode_vclock_xc(&xrow, &relay->recv_vclock);
> +			/*
> +			 * Old instances do not report the timestamp.
> +			 * Same time in case of idle cycles the xrow.tm
> +			 * is the same so update lag only when new data
> +			 * been acked.
> +			 */
> +			if (prev_tm != xrow.tm) {

4. It could also be reset to zero when there are no rows and the sent and
received vclocks match, because that means there are no rows to ack, and
therefore there can't be any latency. Or you can use relay heartbeats to
update it even when there are no rows.
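
A minimal standalone sketch of the first option in comment 4, not the
actual relay code. The names `lag_state` and `lag_state_on_ack` are
hypothetical, and the two LSN arguments are a simplified stand-in for
comparing the sent and received vclocks:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the real struct relay. */
struct lag_state {
	/** Last computed downstream lag, in seconds. */
	double lag;
	/** Timestamp carried by the previously processed ACK. */
	double prev_tm;
};

/*
 * now       - current time on the sending side.
 * ack_tm    - timestamp echoed back in the ACK (assumed to stay
 *             unchanged when the peer has nothing new to report).
 * sent_lsn  - stand-in for the vclock of the last sent row.
 * acked_lsn - stand-in for the vclock received in the ACK.
 */
static void
lag_state_on_ack(struct lag_state *s, double now, double ack_tm,
		 int64_t sent_lsn, int64_t acked_lsn)
{
	assert(s != NULL);
	if (ack_tm != s->prev_tm) {
		/* New data was acked: measure the whole round trip. */
		s->lag = now - ack_tm;
		s->prev_tm = ack_tm;
	} else if (sent_lsn == acked_lsn) {
		/* Nothing is in flight, so there is nothing to wait for. */
		s->lag = 0;
	}
}
```

With this shape an idle ACK keeps the old value only while some rows are
still unacknowledged, and drops the lag to zero once the peer has caught up.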

> +				double delta = ev_now(loop()) - xrow.tm;
> +				relay->peer_wal_lag = delta;
> +				prev_tm = xrow.tm;
> +			}

