From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Cyrill Gorcunov <gorcunov@gmail.com>, tml <tarantool-patches@dev.tarantool.org> Subject: Re: [Tarantool-patches] [PATCH v9 2/2] relay: provide information about downstream lag Date: Sun, 20 Jun 2021 16:37:21 +0200 [thread overview] Message-ID: <909ea97e-9b56-d327-860b-65aba685fce3@tarantool.org> (raw) In-Reply-To: <20210617154835.315576-3-gorcunov@gmail.com> Hi! Thanks for the patch! The test fails when I run it multiple times: [014] Test failed! Result content mismatch: [014] --- replication/gh-5447-downstream-lag.result Sun Jun 20 16:10:26 2021 [014] +++ var/rejects/replication/gh-5447-downstream-lag.reject Sun Jun 20 16:33:01 2021 [014] @@ -37,7 +37,7 @@ [014] -- Upon replica startup there is no ACKs to process. [014] assert(box.info.replication[replica_id].downstream.lag == 0) [014] | --- [014] - | - true [014] + | - error: assertion failed! See 4 comments below. > @@ -629,6 +673,19 @@ relay_reader_f(va_list ap) > /* vclock is followed while decoding, zeroing it. */ > vclock_create(&relay->recv_vclock); > xrow_decode_vclock_xc(&xrow, &relay->recv_vclock); > + /* > + * Replica send us last replicated transaction > + * timestamp which is needed for relay lag > + * monitoring. Note that this transaction has > + * been written to WAL with our current realtime > + * clock value, thus when it get reported back we > + * can compute time spent regardless of the clock > + * value on remote replica. > + */ > + if (relay->txn_acked_tm < xrow.tm) { 1. Why do you need this `if`? Why xrow.tm != 0 is not enough? (It is 0 when replicate to old versions.) ACKs are sent in the same order as the rows, so there can't be any reordering. If it is intended to be used against time changes, this check won't work it seems. If time is moved far into the future, the check passes and the lag will be huge for some time. No protection. And there can't be. If the time is moved far into the past, the check will freeze for the time shift size. Even when all the old transactions are acked and new ones are coming. Because you cached txn_acked_tm in the old time system. No protection either. Looks even like a bug, because the lag freezes regardless of whether there are new transactions ACKed with the new time system or not. It will wait for the new time system to catch up with the old txn_acked_tm. If the timestamp is not needed, you can drop txn_acked_tm member from struct relay. > diff --git a/test/replication/gh-5447-downstream-lag.result b/test/replication/gh-5447-downstream-lag.result > new file mode 100644 > index 000000000..2cc020451 > --- /dev/null > +++ b/test/replication/gh-5447-downstream-lag.result > @@ -0,0 +1,128 @@ <...> > +-- Insert a record and wakeup replica's WAL to process data. > +test_run:switch('default') > + | --- > + | - true > + | ... > +lsn = box.info.lsn > + | --- > + | ... > +box.space.test:insert({1}) > + | --- > + | - [1] > + | ... > +test_run:wait_cond(function() return box.info.lsn > lsn end) 2. You don't need it. You did blocking insert(), which returns only after the WAL write is done. > + | --- > + | - true > + | ... > +-- The record is written on the master node. > +test_run:switch('replica') > + | --- > + | - true > + | ... > +box.error.injection.set("ERRINJ_WAL_DELAY", false) > + | --- > + | - ok > + | ... > + > +-- > +-- Wait the record to be ACKed, the lag value should be nonzero. > +test_run:switch('default') > + | --- > + | - true > + | ... > +test_run:wait_lsn('replica', 'default') > + | --- > + | ... > +assert(box.info.replication[replica_id].downstream.lag > 0) > + | --- > + | - true > + | ... > + > +-- > +-- Cleanup everything. > +test_run:switch('default') > + | --- > + | - true > + | ... > +box.schema.user.revoke('guest', 'replication') 3. You didn't drop the space. > + | --- > + | ... > +test_run:cmd('stop server replica') > + | --- > + | - true > + | ... > +test_run:cmd('cleanup server replica') > + | --- > + | - true > + | ... > +test_run:cmd('delete server replica') > + | --- > + | - true > + | ... 4. Your test uses error injections. It means it must be configured not to run in Release build. See replication/suite.ini.
next prev parent reply other threads:[~2021-06-20 14:37 UTC|newest] Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-06-17 15:48 [Tarantool-patches] [PATCH v9 0/2] relay: provide downstream lag information Cyrill Gorcunov via Tarantool-patches 2021-06-17 15:48 ` [Tarantool-patches] [PATCH v9 1/2] applier: send transaction's first row WAL time in the applier_writer_f Cyrill Gorcunov via Tarantool-patches 2021-06-18 9:51 ` Serge Petrenko via Tarantool-patches 2021-06-18 18:06 ` Cyrill Gorcunov via Tarantool-patches 2021-06-21 8:35 ` Serge Petrenko via Tarantool-patches 2021-06-17 15:48 ` [Tarantool-patches] [PATCH v9 2/2] relay: provide information about downstream lag Cyrill Gorcunov via Tarantool-patches 2021-06-18 9:50 ` Serge Petrenko via Tarantool-patches 2021-06-20 14:37 ` Vladislav Shpilevoy via Tarantool-patches [this message] 2021-06-21 8:44 ` Cyrill Gorcunov via Tarantool-patches 2021-06-21 16:17 ` Cyrill Gorcunov via Tarantool-patches 2021-06-21 21:16 ` Vladislav Shpilevoy via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=909ea97e-9b56-d327-860b-65aba685fce3@tarantool.org \ --to=tarantool-patches@dev.tarantool.org \ --cc=gorcunov@gmail.com \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH v9 2/2] relay: provide information about downstream lag' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox