Tarantool development patches archive
* [Tarantool-patches] [PATCH] applier: fix upstream.lag calculations
@ 2021-08-13 14:25 Serge Petrenko via Tarantool-patches
  2021-08-14  6:42 ` Vitaliia Ioffe via Tarantool-patches
  0 siblings, 1 reply; 3+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-08-13 14:25 UTC
  To: v.ioffe, kyukhin; +Cc: tarantool-patches

upstream.lag is the delta between the moment when a row was written to
master's journal and the moment when it was received by the replica.
It's an important metric to check whether the replica has fallen too far
behind master.

Not all the rows coming from master have a valid time of creation. For
example, RAFT system messages don't have one, and we can't assign
correct time to them: these messages do not originate from the journal,
and assigning current time to them would lead to jumps in upstream.lag
results.

Stop updating upstream.lag for rows which don't have creation time
assigned.

This also fixes the flaky replication/errinj.test.lua
---
https://github.com/tarantool/tarantool/tree/sp/applier-lag-fix

 src/box/applier.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 902d0bc72..9256078e1 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -664,7 +664,8 @@ applier_read_tx_row(struct applier *applier, double timeout)
 
 	coio_read_xrow_timeout_xc(coio, ibuf, row, timeout);
 
-	applier->lag = ev_now(loop()) - row->tm;
+	if (row->tm > 0)
+		applier->lag = ev_now(loop()) - row->tm;
 	applier->last_row_time = ev_monotonic_now(loop());
 	return tx_row;
 }
-- 
2.30.1 (Apple Git-130)
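
For readers skimming the thread, the guard added by the patch can be sketched in isolation. This is only an illustration under stated assumptions: `ApplierSketch`, `update_lag` and `lag_after_untimed_row` are hypothetical names, not part of applier.cc, and plain doubles stand in for `ev_now(loop())` and `row->tm`.

```cpp
#include <cassert>

// Hypothetical stand-in for the applier state; only the lag field is modeled.
struct ApplierSketch {
	double lag = 0.0;
};

// Update the lag only for rows that carry a creation time (tm > 0).
// `now` plays the role of ev_now(loop()), `row_tm` the role of row->tm.
inline void
update_lag(ApplierSketch &applier, double now, double row_tm)
{
	if (row_tm > 0)
		applier.lag = now - row_tm;
}

// Replays a journal row followed by a row without a creation time.
inline double
lag_after_untimed_row()
{
	ApplierSketch applier;
	update_lag(applier, 1000.5, 1000.0); // journal row: lag becomes 0.5
	update_lag(applier, 1001.0, 0.0);    // untimed row: lag is left alone
	assert(applier.lag < 1.0);
	return applier.lag;
}
```

Without the `row_tm > 0` check, the second call would set the lag to `1001.0 - 0.0`, which is the kind of jump the commit message describes.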



* Re: [Tarantool-patches]  [PATCH] applier: fix upstream.lag calculations
  2021-08-13 14:25 [Tarantool-patches] [PATCH] applier: fix upstream.lag calculations Serge Petrenko via Tarantool-patches
@ 2021-08-14  6:42 ` Vitaliia Ioffe via Tarantool-patches
  2021-08-14  8:03   ` Serge Petrenko via Tarantool-patches
  0 siblings, 1 reply; 3+ messages in thread
From: Vitaliia Ioffe via Tarantool-patches @ 2021-08-14  6:42 UTC
  To: Serge Petrenko; +Cc: tarantool-patches



Hi Sergey, 
 
I’m sorry to say it, but this fix is not a fix. I have to point out that there were failed tests:
[037] replication/errinj.test.lua                     memtx           [ fail ]
[037] replication/errinj.test.lua                     vinyl              [ fail ]
 
You can find it here: 
https://github.com/tarantool/tarantool/runs/3322606890
 
 
--
Vitaliia Ioffe
 
  
>Friday, 13 August 2021, 17:25 +03:00 from Serge Petrenko <sergepetrenko@tarantool.org>:
> 
>upstream.lag is the delta between the moment when a row was written to
>master's journal and the moment when it was received by the replica.
>It's an important metric to check whether the replica has fallen too far
>behind master.
>
>Not all the rows coming from master have a valid time of creation. For
>example, RAFT system messages don't have one, and we can't assign
>correct time to them: these messages do not originate from the journal,
>and assigning current time to them would lead to jumps in upstream.lag
>results.
>
>Stop updating upstream.lag for rows which don't have creation time
>assigned.
>
>This also fixes the flaky replication/errinj.test.lua
>---
>https://github.com/tarantool/tarantool/tree/sp/applier-lag-fix
>
> src/box/applier.cc | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/src/box/applier.cc b/src/box/applier.cc
>index 902d0bc72..9256078e1 100644
>--- a/src/box/applier.cc
>+++ b/src/box/applier.cc
>@@ -664,7 +664,8 @@ applier_read_tx_row(struct applier *applier, double timeout)
> 
> 	coio_read_xrow_timeout_xc(coio, ibuf, row, timeout);
> 
>-	applier->lag = ev_now(loop()) - row->tm;
>+	if (row->tm > 0)
>+		applier->lag = ev_now(loop()) - row->tm;
> 	applier->last_row_time = ev_monotonic_now(loop());
> 	return tx_row;
> }
>--
>2.30.1 (Apple Git-130)
 


* Re: [Tarantool-patches] [PATCH] applier: fix upstream.lag calculations
  2021-08-14  6:42 ` Vitaliia Ioffe via Tarantool-patches
@ 2021-08-14  8:03   ` Serge Petrenko via Tarantool-patches
  0 siblings, 0 replies; 3+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-08-14  8:03 UTC
  To: Vitaliia Ioffe; +Cc: tarantool-patches



14.08.2021 09:42, Vitaliia Ioffe wrote:
> Hi Sergey,
> I’m sorry to say it, but this fix is not a fix. I have to
> point out that there were failed tests:
>
> [037] replication/errinj.test.lua                     memtx           [ fail ]
> [037] replication/errinj.test.lua                     vinyl           [ fail ]
>
> You can find it here:
> https://github.com/tarantool/tarantool/runs/3322606890
> --
> Vitaliia Ioffe
Don't be sorry, I didn't check the patch thoroughly enough.

Applied the following diff and reworded the patch a bit.
Everything should be fine now.

===================================

diff --git a/test/replication/errinj.result b/test/replication/errinj.result
index 9d13f6aa7..ec251182f 100644
--- a/test/replication/errinj.result
+++ b/test/replication/errinj.result
@@ -308,7 +308,10 @@ box.info.replication[1].upstream.lag > 0
  ---
  - true
  ...
-box.info.replication[1].upstream.lag < 1
+-- Upstream lag is huge until the first row is received.
+test_run:wait_cond(function()\
+    return box.info.replication[1].upstream.lag < 1\
+end)
  ---
  - true
  ...
diff --git a/test/replication/errinj.test.lua b/test/replication/errinj.test.lua
index 19234ab35..7f6535ec1 100644
--- a/test/replication/errinj.test.lua
+++ b/test/replication/errinj.test.lua
@@ -130,7 +130,10 @@ test_run:cmd("switch replica")
  while box.info.replication[1].upstream.status ~= 'follow' do fiber.sleep(0.0001) end
  box.info.replication[1].upstream.status
  box.info.replication[1].upstream.lag > 0
-box.info.replication[1].upstream.lag < 1
+-- Upstream lag is huge until the first row is received.
+test_run:wait_cond(function()\
+    return box.info.replication[1].upstream.lag < 1\
+end)
  -- wait for ack timeout
  test_run:wait_upstream(1, {status='disconnected', message_re='unexpected EOF'})


===================================
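
The test change above swaps a one-shot comparison for polling, since, per the added comment, the lag is huge until the first row is received. As a rough sketch of that polling idea only (not of how `test_run:wait_cond` is actually implemented), a C++ analogue might look like the following. The names `wait_cond` and `lag_settles`, the default timeout and the poll interval are all assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical analogue of a wait_cond helper: poll `cond` until it returns
// true or `timeout` elapses. Returns whether the condition was ever met.
inline bool
wait_cond(const std::function<bool()> &cond,
	  std::chrono::milliseconds timeout = std::chrono::milliseconds(1000),
	  std::chrono::milliseconds delay = std::chrono::milliseconds(1))
{
	auto deadline = std::chrono::steady_clock::now() + timeout;
	for (;;) {
		if (cond())
			return true;
		if (std::chrono::steady_clock::now() >= deadline)
			return false;
		std::this_thread::sleep_for(delay);
	}
}

// Simulated lag source: huge for the first two polls, small afterwards.
inline bool
lag_settles()
{
	int polls = 0;
	auto lag = [&polls]() { return ++polls < 3 ? 1e9 : 0.001; };
	return wait_cond([&lag]() { return lag() < 1; });
}
```

A single check of `lag() < 1` on the first poll would fail here, while the polling version succeeds once the simulated value settles, which mirrors why the assertion in the test was wrapped in a wait.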
>
>     Friday, 13 August 2021, 17:25 +03:00 from Serge Petrenko
>     <sergepetrenko@tarantool.org>:
>     upstream.lag is the delta between the moment when a row was written to
>     master's journal and the moment when it was received by the replica.
>     It's an important metric to check whether the replica has fallen
>     too far
>     behind master.
>
>     Not all the rows coming from master have a valid time of creation. For
>     example, RAFT system messages don't have one, and we can't assign
>     correct time to them: these messages do not originate from the
>     journal,
>     and assigning current time to them would lead to jumps in upstream.lag
>     results.
>
>     Stop updating upstream.lag for rows which don't have creation time
>     assigned.
>
>     This also fixes the flaky replication/errinj.test.lua
>     ---
>     https://github.com/tarantool/tarantool/tree/sp/applier-lag-fix
>     <https://github.com/tarantool/tarantool/tree/sp/applier-lag-fix>
>
>      src/box/applier.cc | 3 ++-
>      1 file changed, 2 insertions(+), 1 deletion(-)
>
>     diff --git a/src/box/applier.cc b/src/box/applier.cc
>     index 902d0bc72..9256078e1 100644
>     --- a/src/box/applier.cc
>     +++ b/src/box/applier.cc
>     @@ -664,7 +664,8 @@ applier_read_tx_row(struct applier *applier, double timeout)
>
>      	coio_read_xrow_timeout_xc(coio, ibuf, row, timeout);
>
>     -	applier->lag = ev_now(loop()) - row->tm;
>     +	if (row->tm > 0)
>     +		applier->lag = ev_now(loop()) - row->tm;
>      	applier->last_row_time = ev_monotonic_now(loop());
>      	return tx_row;
>      }
>     --
>     2.30.1 (Apple Git-130)
>

-- 
Serge Petrenko

