Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Cyrill Gorcunov <gorcunov@gmail.com>,
	tml <tarantool-patches@dev.tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v9 2/2] relay: provide information about downstream lag
Date: Sun, 20 Jun 2021 16:37:21 +0200	[thread overview]
Message-ID: <909ea97e-9b56-d327-860b-65aba685fce3@tarantool.org> (raw)
In-Reply-To: <20210617154835.315576-3-gorcunov@gmail.com>

Hi! Thanks for the patch!

The test fails when I run it multiple times:

[014] Test failed! Result content mismatch:
[014] --- replication/gh-5447-downstream-lag.result	Sun Jun 20 16:10:26 2021
[014] +++ var/rejects/replication/gh-5447-downstream-lag.reject	Sun Jun 20 16:33:01 2021
[014] @@ -37,7 +37,7 @@
[014]  -- Upon replica startup there is no ACKs to process.
[014]  assert(box.info.replication[replica_id].downstream.lag == 0)
[014]   | ---
[014] - | - true
[014] + | - error: assertion failed!

See 4 comments below.

> @@ -629,6 +673,19 @@ relay_reader_f(va_list ap)
>  			/* vclock is followed while decoding, zeroing it. */
>  			vclock_create(&relay->recv_vclock);
>  			xrow_decode_vclock_xc(&xrow, &relay->recv_vclock);
> +			/*
> +			 * Replica send us last replicated transaction
> +			 * timestamp which is needed for relay lag
> +			 * monitoring. Note that this transaction has
> +			 * been written to WAL with our current realtime
> +			 * clock value, thus when it get reported back we
> +			 * can compute time spent regardless of the clock
> +			 * value on remote replica.
> +			 */
> +			if (relay->txn_acked_tm < xrow.tm) {

1. Why do you need this `if`? Why xrow.tm != 0 is not enough? (It is 0
when replicate to old versions.) ACKs are sent in the same order as the
rows, so there can't be any reordering.

If it is intended to be used against time changes, this check won't work
it seems. If time is moved far into the future, the check passes and the
lag will be huge for some time. No protection. And there can't be.

If the time is moved far into the past, the check will freeze for the
time shift size. Even when all the old transactions are acked and new
ones are coming. Because you cached txn_acked_tm in the old time system.
No protection either. Looks even like a bug, because the lag freezes
regardless of whether there are new transactions ACKed with the new time
system or not. It will wait for the new time system to catch up with the
old txn_acked_tm.

If the timestamp is not needed, you can drop txn_acked_tm member from
struct relay.

> diff --git a/test/replication/gh-5447-downstream-lag.result b/test/replication/gh-5447-downstream-lag.result
> new file mode 100644
> index 000000000..2cc020451
> --- /dev/null
> +++ b/test/replication/gh-5447-downstream-lag.result
> @@ -0,0 +1,128 @@

<...>

> +-- Insert a record and wakeup replica's WAL to process data.
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +lsn = box.info.lsn
> + | ---
> + | ...
> +box.space.test:insert({1})
> + | ---
> + | - [1]
> + | ...
> +test_run:wait_cond(function() return box.info.lsn > lsn end)

2. You don't need it. You did blocking insert(), which returns only
after the WAL write is done.

> + | ---
> + | - true
> + | ...
> +-- The record is written on the master node.
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +box.error.injection.set("ERRINJ_WAL_DELAY", false)
> + | ---
> + | - ok
> + | ...
> +
> +--
> +-- Wait the record to be ACKed, the lag value should be nonzero.
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +test_run:wait_lsn('replica', 'default')
> + | ---
> + | ...
> +assert(box.info.replication[replica_id].downstream.lag > 0)
> + | ---
> + | - true
> + | ...
> +
> +--
> +-- Cleanup everything.
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.schema.user.revoke('guest', 'replication')

3. You didn't drop the space.

> + | ---
> + | ...
> +test_run:cmd('stop server replica')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('cleanup server replica')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica')
> + | ---
> + | - true
> + | ...
4. Your test uses error injections. It means it must be configured
not to run in Release build. See replication/suite.ini.


  parent reply	other threads:[~2021-06-20 14:37 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-17 15:48 [Tarantool-patches] [PATCH v9 0/2] relay: provide downstream lag information Cyrill Gorcunov via Tarantool-patches
2021-06-17 15:48 ` [Tarantool-patches] [PATCH v9 1/2] applier: send transaction's first row WAL time in the applier_writer_f Cyrill Gorcunov via Tarantool-patches
2021-06-18  9:51   ` Serge Petrenko via Tarantool-patches
2021-06-18 18:06     ` Cyrill Gorcunov via Tarantool-patches
2021-06-21  8:35       ` Serge Petrenko via Tarantool-patches
2021-06-17 15:48 ` [Tarantool-patches] [PATCH v9 2/2] relay: provide information about downstream lag Cyrill Gorcunov via Tarantool-patches
2021-06-18  9:50   ` Serge Petrenko via Tarantool-patches
2021-06-20 14:37   ` Vladislav Shpilevoy via Tarantool-patches [this message]
2021-06-21  8:44     ` Cyrill Gorcunov via Tarantool-patches
2021-06-21 16:17     ` Cyrill Gorcunov via Tarantool-patches
2021-06-21 21:16       ` Vladislav Shpilevoy via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=909ea97e-9b56-d327-860b-65aba685fce3@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=gorcunov@gmail.com \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v9 2/2] relay: provide information about downstream lag' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox