From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tarantool-patches-bounces@dev.tarantool.org>
Received: from [87.239.111.99] (localhost [127.0.0.1])
	by dev.tarantool.org (Postfix) with ESMTP id F21156EC40;
	Tue,  8 Jun 2021 11:40:38 +0300 (MSK)
DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org F21156EC40
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev;
	t=1623141639; bh=7BK8L13hPTN/ZITPjf8skY4MoXzlB8qPWBtHvpTLOH0=;
	h=Date:To:Cc:References:In-Reply-To:Subject:List-Id:
	 List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe:
	 From:Reply-To:From;
	b=pVPQYr5vuTq/VemLdhzU4O/phe1xiBzK69CJrvf4oHNeVvwJMk1n5YCRaZyF+bgGp
	 97VBBxl1UlugWTzaXt9Ft4tmR+voH1iOsYYvrkmEFqIi1XmFITK3qthLTyxY37uWDB
	 V0+MIe7DzLZ9jROZ6oRTOJyhWmv/o5pgVPX6AmIg=
Received: from mail-lj1-f174.google.com (mail-lj1-f174.google.com
 [209.85.208.174])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id 399446EC40
 for <tarantool-patches@dev.tarantool.org>;
 Tue,  8 Jun 2021 11:40:37 +0300 (MSK)
DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 399446EC40
Received: by mail-lj1-f174.google.com with SMTP id d2so21779775ljj.11
 for <tarantool-patches@dev.tarantool.org>;
 Tue, 08 Jun 2021 01:40:37 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:references
 :mime-version:content-disposition:in-reply-to:user-agent;
 bh=EgKCa3kC0NP8thSdoSsT1dUh7QvNjlQEbxAIjt9rx80=;
 b=FyD45G4MAhzil2pfOSn19FJxDP72lYg3D8lVQBMRMINqe1Pj9loGPm/ETc3AYVgIbf
 EU8vU6SZK2MW6fun+sOQb9jitfxQjRh6ch039kjIS29oz5+5At1yxYaj8n0PJhaODK6R
 5PDSQY4FYBHGzYrKossCajx32nNbJ/H9z9VKAVhMCq0Z9rQNhqUNEbmETjOC28o3q36f
 TDkhI0zzwkM9O1MQ09ICBmlFz+1zVkbEFz3tFLMEU0R66HxCXxUDHlGHYRzIvAa/29tc
 e0OVGAZpfZly1w2pD3j/gPVn9zl/inWti2P4FDoabxlwWH+esn62rPVoQYJiXrsZ4qNF
 /Aog==
X-Gm-Message-State: AOAM533ymC3RjcV9w2eZ7P9l8aDkbYqDzM2EtfJQvA6JQOcaEuGNqbaG
 u9xo75UuV1uKNnfkbBp4ZE8EQCWeSl8=
X-Google-Smtp-Source: ABdhPJzbwS+Z3FqPQBoLAkuLUv4xmlCL2oNu6pHm+0OC0BbGQd+d+hv3o1eqJxoceFd+/p6jGEwiDw==
X-Received: by 2002:a2e:9f47:: with SMTP id v7mr18438430ljk.333.1623141635843; 
 Tue, 08 Jun 2021 01:40:35 -0700 (PDT)
Received: from grain.localdomain ([5.18.171.94])
 by smtp.gmail.com with ESMTPSA id v10sm754941lfg.224.2021.06.08.01.40.34
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 08 Jun 2021 01:40:34 -0700 (PDT)
Received: by grain.localdomain (Postfix, from userid 1000)
 id E201A5A0042; Tue,  8 Jun 2021 11:40:33 +0300 (MSK)
Date: Tue, 8 Jun 2021 11:40:33 +0300
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tml <tarantool-patches@dev.tarantool.org>
Message-ID: <YL8tAey0KgDwf6Wd@grain>
References: <20210607155519.109626-1-gorcunov@gmail.com>
 <20210607155519.109626-3-gorcunov@gmail.com>
 <decaacd4-f2ff-42f6-7c9d-7628b89284d9@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <decaacd4-f2ff-42f6-7c9d-7628b89284d9@tarantool.org>
User-Agent: Mutt/2.0.7 (2021-05-04)
Subject: Re: [Tarantool-patches] [PATCH v8 2/2] relay: provide information
 about downstream lag
X-BeenThere: tarantool-patches@dev.tarantool.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: Tarantool development patches <tarantool-patches.dev.tarantool.org>
List-Unsubscribe: <https://lists.tarantool.org/mailman/options/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=unsubscribe>
List-Archive: <https://lists.tarantool.org/pipermail/tarantool-patches/>
List-Post: <mailto:tarantool-patches@dev.tarantool.org>
List-Help: <mailto:tarantool-patches-request@dev.tarantool.org?subject=help>
List-Subscribe: <https://lists.tarantool.org/mailman/listinfo/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=subscribe>
From: Cyrill Gorcunov via Tarantool-patches
 <tarantool-patches@dev.tarantool.org>
Reply-To: Cyrill Gorcunov <gorcunov@gmail.com>
Errors-To: tarantool-patches-bounces@dev.tarantool.org
Sender: "Tarantool-patches" <tarantool-patches-bounces@dev.tarantool.org>

On Mon, Jun 07, 2021 at 09:21:09PM +0200, Vladislav Shpilevoy wrote:
> >  
> > +double
> > +relay_txn_lag(const struct relay *relay)
> > +{
> > +	return relay->txn_lag;
> 
> 1. As I said in the previous review, you can't read a variable from another
> thread without any protection.

Let me explain why I did so - I really don't like that we have to add another
variable into the relay structure: we already have the lag keeper in the replica
structure, and since the lag value is neither a sync point nor a flag
whose value changes program flow, we can read it in parallel from
another thread. Moreover, we could use a guaranteed atomic read, at
least on x86 (via return *(int64_t *)&relay->txn_lag, though we must be sure
the member is qword aligned). But I presume this trick will confuse
code readers in the future because it is not obvious, and without deep knowledge
of arch internals it might give the wrong impression that such a read is a bug,
especially since there are no comments in the code.
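
For the record, a portable way to spell such a read would be C11 atomics
rather than a raw cast. This is only a rough sketch: struct lag_holder and
both helpers are hypothetical names, not the real struct relay, and relaxed
ordering is assumed to be enough because the value is purely informational.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the relay; not the real struct relay. */
struct lag_holder {
	/* Bit pattern of a double: written by one thread, read by another. */
	_Atomic uint64_t txn_lag_bits;
};

static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Writer side (assumed: relay thread) publishes a new lag value. */
static void
lag_holder_store(struct lag_holder *h, double lag)
{
	uint64_t bits;
	memcpy(&bits, &lag, sizeof(bits));
	atomic_store_explicit(&h->txn_lag_bits, bits, memory_order_relaxed);
}

/* Reader side (assumed: tx thread) fetches the last published value. */
static double
lag_holder_load(struct lag_holder *h)
{
	uint64_t bits = atomic_load_explicit(&h->txn_lag_bits,
					     memory_order_relaxed);
	double lag;
	memcpy(&lag, &bits, sizeof(lag));
	return lag;
}
```

The memcpy round trip avoids the aliasing problem of the pointer cast, and
the atomic type makes the intent visible to readers without arch knowledge.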

> 
> Please, use the way I proposed last time. Relay has 'tx' struct inside,
> which is updated on each received ACK. You need to deliver the lag value
> to TX thread in the same way as the acked vclock is delivered. In the
> same message preferably.

Sure, will do.
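
Roughly, I imagine something along these lines. It is a sketch only: all
sketch_* names and the simplified vclock are made up for illustration, the
real code would go through the existing status message and struct vclock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SKETCH_VCLOCK_MAX = 32 };

/* Simplified, hypothetical vclock: one LSN per replica id. */
struct sketch_vclock {
	int64_t lsn[SKETCH_VCLOCK_MAX];
};

/* Message travelling from the relay thread to the tx thread. */
struct sketch_status_msg {
	struct sketch_vclock vclock;
	double txn_lag;
};

/* State owned by the tx thread; only tx reads and writes it. */
struct sketch_tx_state {
	struct sketch_vclock vclock;
	double txn_lag;
};

/* Relay side: pack the acked vclock and the lag into one message. */
static void
sketch_status_msg_fill(struct sketch_status_msg *msg,
		       const struct sketch_vclock *acked, double lag)
{
	assert(msg != NULL && acked != NULL);
	msg->vclock = *acked;
	msg->txn_lag = lag;
}

/* Tx side: apply both values together when the message arrives. */
static void
sketch_tx_status_update(struct sketch_tx_state *tx,
			const struct sketch_status_msg *msg)
{
	assert(tx != NULL && msg != NULL);
	tx->vclock = msg->vclock;
	tx->txn_lag = msg->txn_lag;
}
```

This way the lag is never read across threads: the tx thread sees it only
through its own copy, updated together with the acked vclock.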

> > @@ -629,6 +659,26 @@ relay_reader_f(va_list ap)
> >  			/* vclock is followed while decoding, zeroing it. */
> >  			vclock_create(&relay->recv_vclock);
> >  			xrow_decode_vclock_xc(&xrow, &relay->recv_vclock);
> > +			/*
> > +			 * Replica send us last replicated transaction
> > +			 * timestamp which is needed for relay lag
> > +			 * monitoring. Note that this transaction has
> > +			 * been written to WAL with our current realtime
> > +			 * clock value, thus when it get reported back we
> > +			 * can compute time spent regardless of the clock
> > +			 * value on remote replica.
> > +			 *
> > +			 * An interesting moment is replica restart - it will
> > +			 * send us value 0 after that but we can preserve
> > +			 * old reported value here since we *assume* that
> > +			 * timestamp is not going backwards on properly
> > +			 * set up nodes, otherwise the lag get raised.
> > +			 * After all this is a not tamper-proof value.
> 
> 2. I don't understand. Why does it send value 0? And if it does, why
> can't you ignore only zeros? The non-0 values must be valid anyway.

When a replica node gets restarted, applier_txn_start_tm is initialized to
zero inside relay structure creation, and since there are no new transactions
applier_txn_start_tm remains set to 0, which the replica sends out. Also I
just realized that keeping the lag inside the relay structure seems to be not
very good: on reconnection the relay is recreated from scratch, so I zap the
previously read timestamp to 0.

IOW, the real situation is the following:

 - if the replica is restarted, but the main node is alive, the lag report
   on the main node is dropped to 0

 - if the main node gets restarted, then the lag report is dropped to 0 as well

I suppose this is expected? I'll update the comment above.
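
As for ignoring only zeros, the check itself could look like the sketch
below. The function name and parameters are hypothetical, and it only helps
while the object holding the previous lag survives; it does nothing for the
case where the relay is recreated.

```c
#include <assert.h>

/*
 * Hypothetical helper: compute the new lag from the timestamp the
 * replica reported. A zero timestamp means "nothing applied yet"
 * (for example right after a replica restart), so the previously
 * known lag is kept instead of being recomputed.
 */
static double
sketch_update_lag(double prev_lag, double txn_start_tm, double now)
{
	assert(now >= 0);
	if (txn_start_tm == 0)
		return prev_lag;
	return now - txn_start_tm;
}
```

For example, sketch_update_lag(0.5, 0, 100.0) keeps the old lag, while a
non-zero timestamp yields the difference to the current clock value.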

> > +++ b/test/replication/gh-5447-downstream-lag.result
> > @@ -0,0 +1,93 @@
> > +-- test-run result file version 2
> > +--
> > +-- gh-5447: Test for box.info.replication[n].downstream.lag.
> > +-- We need to be sure that if replica start been back of
> > +-- master node reports own lagging and cluster admin would
> > +-- be able to detect such situation.
> 
> 3. I couldn't parse the last sentence. Could you use some
> punctuation? It might help.

Would the following be better? "We need to be sure that slow
ACK delivery can be caught by monitoring tools".

> > +
> > +--
> > +-- The replica should wait some time (wal delay is 1 second
> > +-- by default) so we would be able to detect the lag, since
> > +-- on local instances the lag is minimal and usually transactions
> > +-- are handled instantly.
> 
> 4. But it is not 1 second. usleep(1000) means 1 millisecond, and it

Right, usleep() works with microseconds, so usleep(1000) is 1 millisecond,
not 1 second; the "1 second" in the comment is wrong.

> happens in a loop, so it does not matter much. It works until you
> set the delay back to false. That makes WAL thread blocked until
> you free it. It is not a fixed delay.

Not sure I follow you here. My understanding was that we force the WAL
engine to slow down _each_ write, which in turn delays the ACK
delivery so the calculated lag won't be zero.
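
For reference on the units: usleep() takes microseconds, so a single
usleep(1000) call sleeps for about one millisecond. A small sketch to
measure it (the helper name is made up, and CLOCK_MONOTONIC is assumed to
be available):

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: sleep for usec microseconds via usleep()
 * and return the elapsed wall time in seconds.
 */
static double
sketch_usleep_elapsed_sec(useconds_t usec)
{
	struct timespec start, end;
	int rc = clock_gettime(CLOCK_MONOTONIC, &start);
	assert(rc == 0);
	usleep(usec);
	rc = clock_gettime(CLOCK_MONOTONIC, &end);
	assert(rc == 0);
	(void)rc;
	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling sketch_usleep_elapsed_sec(1000) should return a value in the
millisecond range, far below one second.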

> > +box.space.test:insert({1})
> > + | ---
> > + | - [1]
> > + | ...
> > +test_run:wait_cond(function() return box.info.replication[2].downstream.lag ~= 0 end, 10)
> 
> 5. This condition is true even before you did the insert.

Indeed, because of space replication.

> And it couldn't change during insert, because there are no
> ACKs - the replica can't write to WAL because of the delay,
> it is blocked in a busy loop.

Hmm, need to think, thanks!

> 
> > + | ---
> > + | - true
> > + | ...
> > +
> > +test_run:switch('replica')
> > + | ---
> > + | - true
> > + | ...
> > +box.error.injection.set("ERRINJ_WAL_DELAY", false)
> > + | ---
> > + | - ok
> > + | ...
> > +--
> > +-- Cleanup everything.
> 
> 6. You need to revoke the granted rights and drop the space.

+1, thanks!