[Tarantool-patches] [PATCH v8 2/2] relay: provide information about downstream lag

Serge Petrenko sergepetrenko at tarantool.org
Tue Jun 15 13:03:59 MSK 2021



07.06.2021 18:55, Cyrill Gorcunov wrote:
> We already have the `box.replication.upstream.lag` entry for monitoring
> purposes. At the same time, in synchronous replication, timeouts are key
> properties of the quorum gathering procedure. Thus we would like to know
> how long it takes a transaction to traverse the `initiator WAL -> network ->
> remote applier -> ACK` path.

Hi! Thanks for the patch! Please find a couple of comments below.

>
> Typical output is
>
>   | tarantool> box.info.replication[2].downstream
>   | ---
>   | - status: follow
>   |   idle: 0.61753897101153
>   |   vclock: {1: 147}
>   |   lag: 0
>   | ...
>   | tarantool> box.space.sync:insert{69}
>   | ---
>   | - [69]
>   | ...
>   |
>   | tarantool> box.info.replication[2].downstream
>   | ---
>   | - status: follow
>   |   idle: 0.75324084801832
>   |   vclock: {1: 151}
>   |   lag: 0.0011014938354492
>   | ...
>
> Closes #5447
>
> Signed-off-by: Cyrill Gorcunov <gorcunov at gmail.com>
>
> @TarantoolBot document
> Title: Add `box.info.replication[n].downstream.lag` entry
>
> `replication[n].downstream.lag` is the time difference between the
> last transaction being written to the WAL journal of the transaction
> initiator and that transaction being written to the WAL on replica `n`.
>
> In other words, this is the lag in seconds between the main node writing
> data to its own WAL and replica `n` getting this data replicated to its
> own WAL journal.

This is not true: this paragraph describes `upstream.lag`.
Downstream lag is the time difference between the WAL write on the master
side and the receipt of an ack (confirmation of a WAL write on the replica)
for this transaction. The ack receipt is also timed on the master side.
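
To make the distinction concrete, here is a minimal sketch of the
master-side bookkeeping. It is written in Python for illustration only and
is not the code from `relay.cc`; the class and method names are
hypothetical. It assumes the master records its own clock at WAL write time
and reads the same clock again when the ack arrives, so no replica clock is
involved.

```python
class DownstreamLagTracker:
    """Hypothetical master-side tracker; not Tarantool's actual API."""

    def __init__(self):
        # Zero until the first ack arrives (e.g. right after a restart).
        self.lag = 0.0
        # lsn -> master clock value taken when the row hit the master WAL.
        self._wal_write_time = {}

    def on_wal_write(self, lsn, now):
        """Remember when the master wrote this transaction to its WAL."""
        self._wal_write_time[lsn] = now

    def on_ack(self, lsn, now):
        """Update the lag when the replica's ack reaches the master."""
        written_at = self._wal_write_time.pop(lsn, None)
        if written_at is None:
            # Ack for a transaction we did not record: keep the old value.
            return
        self.lag = now - written_at


tracker = DownstreamLagTracker()
tracker.on_wal_write(lsn=151, now=100.0)
tracker.on_ack(lsn=151, now=100.25)
print(tracker.lag)  # 0.25
```

Both timestamps come from the master's clock, so the value covers the whole
`WAL -> network -> remote applier -> ACK` round trip. A transaction that is
never acked leaves the previous value untouched.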

>
> If a transaction fails to replicate, the lag value is not modified,
> because only successfully applied transactions are accounted for.
> Likewise, if the main node or a replica is restarted, the lag value
> is zero until the next successful transaction.
> ---
>   .../unreleased/gh-5447-downstream-lag.md      |  6 ++
>   src/box/lua/info.c                            |  3 +
>   src/box/relay.cc                              | 50 ++++++++++
>   src/box/relay.h                               |  6 ++
>   .../replication/gh-5447-downstream-lag.result | 93 +++++++++++++++++++
>   .../gh-5447-downstream-lag.test.lua           | 41 ++++++++
>   6 files changed, 199 insertions(+)
>   create mode 100644 changelogs/unreleased/gh-5447-downstream-lag.md
>   create mode 100644 test/replication/gh-5447-downstream-lag.result
>   create mode 100644 test/replication/gh-5447-downstream-lag.test.lua
>
> diff --git a/changelogs/unreleased/gh-5447-downstream-lag.md b/changelogs/unreleased/gh-5447-downstream-lag.md
> new file mode 100644
> index 000000000..726175c6c
> --- /dev/null
> +++ b/changelogs/unreleased/gh-5447-downstream-lag.md
> @@ -0,0 +1,6 @@
> +#feature/replication
> +
> + * Introduced `box.info.replication[n].downstream.lag` field to monitor
> +   state of replication. This member represents the time between a
> +   transaction being written to the initiator's WAL file and reaching
> +   the WAL file of a replica (gh-5447).

Same here: this describes `upstream.lag`, not the downstream lag.

-- 
Serge Petrenko


