Re: [Tarantool-patches] [PATCH v8 2/2] relay: provide information about downstream lag - Vladislav Shpilevoy via Tarantool-patches

From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: tml <tarantool-patches@dev.tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v8 2/2] relay: provide information about downstream lag
Date: Tue, 8 Jun 2021 20:15:09 +0200
Message-ID: <3a0cf108-090f-e29c-b168-d00d3266fc7d@tarantool.org>
In-Reply-To: <YL8tAey0KgDwf6Wd@grain>

On 08.06.2021 10:40, Cyrill Gorcunov wrote:
> On Mon, Jun 07, 2021 at 09:21:09PM +0200, Vladislav Shpilevoy wrote:
>>>  
>>> +double
>>> +relay_txn_lag(const struct relay *relay)
>>> +{
>>> +	return relay->txn_lag;
>>
>> 1. As I said in the previous review, you can't read a variable from another
>> thread without any protection.
> 
> Let me explain why I did so - I really don't like that we have to add another
> variable into relay structure: we already have the lag keeper in replica
> structure and since the lag value is not any kind of sync point or some flag
> the value of which changes program flow logic, we can use parallel read from
> another thread. Moreover we could use guaranteed atomic read operation, at
> least on x86 (via return *(int64_t *)relay->txn_lag, though we must be sure
> the member is qword aligned).

It is not only x86 anymore. Data correctness must not depend on
assumptions like this one.
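
To illustrate the point about protected cross-thread reads, here is a
minimal sketch using C11 atomics. It is not the patch and not
Tarantool's code: struct lag_box, lag_box_set() and lag_box_get() are
hypothetical names made up for the example, and relaxed ordering is
assumed to be enough because the value is only displayed and does not
synchronize anything else. A real fix could just as well deliver the
value to the reading thread by a message.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical holder for the lag value; not Tarantool's struct relay. */
struct lag_box {
	_Atomic double txn_lag;
};

/* Writer side: the thread which computes the lag publishes it. */
static inline void
lag_box_set(struct lag_box *box, double lag)
{
	assert(lag >= 0);
	atomic_store_explicit(&box->txn_lag, lag, memory_order_relaxed);
}

/* Reader side: another thread takes a consistent (non-torn) copy. */
static inline double
lag_box_get(struct lag_box *box)
{
	return atomic_load_explicit(&box->txn_lag, memory_order_relaxed);
}
```

With this, the getter does not rely on the member alignment or on the
CPU architecture. The compiler picks whatever instructions are needed
to make the access atomic on the target.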

>>> @@ -629,6 +659,26 @@ relay_reader_f(va_list ap)
>>>  			/* vclock is followed while decoding, zeroing it. */
>>>  			vclock_create(&relay->recv_vclock);
>>>  			xrow_decode_vclock_xc(&xrow, &relay->recv_vclock);
>>> +			/*
>>> +			 * Replica send us last replicated transaction
>>> +			 * timestamp which is needed for relay lag
>>> +			 * monitoring. Note that this transaction has
>>> +			 * been written to WAL with our current realtime
>>> +			 * clock value, thus when it get reported back we
>>> +			 * can compute time spent regardless of the clock
>>> +			 * value on remote replica.
>>> +			 *
>>> +			 * An interesting moment is replica restart - it will
>>> +			 * send us value 0 after that but we can preserve
>>> +			 * old reported value here since we *assume* that
>>> +			 * timestamp is not going backwards on properly
>>> +			 * set up nodes, otherwise the lag get raised.
>>> +			 * After all this is a not tamper-proof value.
>>
>> 2. I don't understand. Why does it send value 0? And if it does, why
>> can't you ignore only zeros? The non-0 values must be valid anyway.
> 
> IOW, the real situation is the following:
> 
>  - if replica restarted, but main node is alive, the lag report on the
>    main node is dropped to 0
> 
>  - if main node get restarted, then lag report is dropped to 0 as well
> 
> I suppose this is expected? I'll update the comment above.

Dropping to 0 on any reconnect, until data starts being replicated
again, is fine.
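
The behaviour agreed on above could be sketched roughly as follows.
This is only an illustration under assumptions: struct lag_tracker and
both functions are hypothetical names, the timestamps are plain
doubles in seconds, and a zero timestamp is assumed to mean "nothing
replicated since the (re)connect".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lag keeper; not a structure from the patch. */
struct lag_tracker {
	double txn_lag;
};

/* Any (re)connect drops the reported lag to 0. */
static inline void
lag_tracker_on_connect(struct lag_tracker *t)
{
	assert(t != NULL);
	t->txn_lag = 0;
}

/*
 * An ACK carries the WAL write time of the last replicated
 * transaction, taken from the master's own clock. The lag is the
 * time which passed since then, also by the master's clock.
 */
static inline void
lag_tracker_on_ack(struct lag_tracker *t, double acked_tm, double now)
{
	assert(t != NULL);
	if (acked_tm == 0)
		return; /* Nothing replicated yet, keep the 0. */
	t->txn_lag = now - acked_tm;
}
```

Both timestamps come from the same node, so the clock on the replica
does not affect the result.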

>>> +++ b/test/replication/gh-5447-downstream-lag.result
>>> @@ -0,0 +1,93 @@
>>> +-- test-run result file version 2
>>> +--
>>> +-- gh-5447: Test for box.info.replication[n].downstream.lag.
>>> +-- We need to be sure that if replica start been back of
>>> +-- master node reports own lagging and cluster admin would
>>> +-- be able to detect such situation.
>>
>> 3. I couldn't parse the last sentence. Could you use some
>> punctuation? It might help.
> 
> Would the following be better? "We need to be sure that slow
> ACKs delivery might be catched by monitoring tools".

Yes. Except 'catched' -> 'caught'.

>>> +
>>> +--
>>> +-- The replica should wait some time (wal delay is 1 second
>>> +-- by default) so we would be able to detect the lag, since
>>> +-- on local instances the lag is minimal and usually transactions
>>> +-- are handled instantly.
>>
>> 4. But it is not 1 second. usleep(1000) means 1 millisecond, and it
> 
> No, usleep(1000) means exactly 1 second, this system call works with
> microseconds, I think you misread it with nanosleep().

1 second has 1 000 000 microseconds. 1000us is 1ms. Not 1s.
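
The arithmetic, spelled out as a small sketch. The constant and helper
names are made up for the example and are not taken from any header.

```c
#include <assert.h>
#include <stdint.h>

enum {
	USEC_PER_MSEC = 1000,
	USEC_PER_SEC = 1000000,
};

/* 1 second is 1000 milliseconds, each of 1000 microseconds. */
static_assert(USEC_PER_SEC == 1000 * USEC_PER_MSEC,
	      "1s == 1000ms == 1 000 000us");

/* Whole milliseconds in the given number of microseconds. */
static inline uint64_t
usec_to_msec(uint64_t usec)
{
	return usec / USEC_PER_MSEC;
}

/* Microseconds in the given number of seconds. */
static inline uint64_t
sec_to_usec(uint64_t sec)
{
	return sec * USEC_PER_SEC;
}
```

So the argument of usleep(1000) is usec_to_msec(1000) == 1
millisecond, and a 1 second sleep would need sec_to_usec(1) ==
1000000.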

>> happens in a loop, so it does not matter much. It works until you
>> set the delay back to false. That makes WAL thread blocked until
>> you free it. It is not a fixed delay.
> 
> Not sure I follow you here. We force wal engine to slow down _each_
> write to take at least 1 second long, in turn this will delay the
> ACK delivery and calculated lag won't be zero.

No, this is not how it works. When you turn on the delay, you block
the WAL thread from doing anything. It is not a delay per transaction.
Please, open ERROR_INJECT_SLEEP() (which is used for ERRINJ_WAL_DELAY)
and see how it works. It is built on ERROR_INJECT_WHILE(), which I
paste below.

	#  define ERROR_INJECT_WHILE(ID, CODE) \
		do { \
			while (errinj(ID, ERRINJ_BOOL)->bparam) \
				CODE; \
		} while (0)

It is a busy loop until you drop the injection back to false.
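
A standalone model of that loop, to show that the blocking time
depends only on when the flag is cleared and not on the 1000us step.
Everything here is a stand-in invented for the example: wal_delay
plays the role of the injection's bparam, fake_usleep() only counts
simulated microseconds instead of sleeping, and it clears the flag
itself to imitate the test turning the injection off.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for errinj(ID, ERRINJ_BOOL)->bparam. */
static bool wal_delay;
/* Simulated time spent in the loop, in microseconds. */
static long slept_us;
/* Simulated moment when the test clears the injection. */
static long release_after_us;

/* Same shape as the macro above, with a plain flag. */
#define INJECT_WHILE(FLAG, CODE) \
	do { \
		while (FLAG) \
			CODE; \
	} while (0)

/* Counts time instead of sleeping; clears the flag when it is due. */
static void
fake_usleep(long usec)
{
	slept_us += usec;
	if (slept_us >= release_after_us)
		wal_delay = false;
}

/* Returns how long the "WAL thread" stayed blocked, in us. */
static long
blocked_for_us(long release_us)
{
	assert(release_us > 0);
	wal_delay = true;
	slept_us = 0;
	release_after_us = release_us;
	INJECT_WHILE(wal_delay, fake_usleep(1000));
	return slept_us;
}
```

If the flag is cleared after 5ms, the loop blocks for 5ms. If it is
cleared after 3 seconds, the loop blocks for 3 seconds. The 1000us
only sets how often the flag is polled.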
