[Tarantool-patches] [PATCH v2] qsync: provide box.info interface for monitoring

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Wed Apr 7 23:53:27 MSK 2021


Hi! Thanks for the patch!

See 7 comments below.

>     qsync: provide box.info interface for monitoring
>     
>     Since commit 14fa5fd82 we support symbolic evaluation of

1. Please, after a commit hash provide the referenced commit
title in parentheses and quotes.
https://github.com/tarantool/tarantool/wiki/Code-review-procedure#commit-message

>     `replication_synchro_quorum` parameter and there is no easy
>     way to obtain it current run-time value, ie evaluated one.
>     Moreover we would like to fetch queue length on transaction
>     limbo for tests and extend this statistics in future.
>     
>     Thus we introduce the "synchro" leaf in box.info interface.
>     For now only a few entries are printed out
>     
>      | tarantool> box.info.synchro
>      | ---
>      | - queue:
>      |     1:
>      |       len: 0
>      |   quorum: 1
>      | ...

2. AFAIK, it was decided not to return an array, and return only
the currently used queue. But even if it would be an array, as I
said in the previous review, the indexing must be by box.info.id,
not by any order. Assume there are multiple limbos, and the master is
not instance_id = 1. How is a user supposed to find what is the
queue in the currently used limbo? And what is the currently used
limbo?

Besides, the commit message and the docbot request are almost the
same. You could drop the message and leave only the docbot request
if you don't want to support both places. Also it would save time
on reading that, anyway the information is the same.

>     The `queue` represents limbo instances (since we support only
>     one limbo for now the sole entry is printed) and `len` member
>     shows the number of entries in the queue.
>     
>     The `quorum` member shows the evaluated value of
>     `replication_synchro_quorum` parameter.
>     
>     Closes #5191
>     
>     Signed-off-by: Cyrill Gorcunov <gorcunov at gmail.com>
>     
>     @TarantoolBot document
>     Title: Provice `box.info.synchro` interface

3. Provice -> Provide.

>     The `box.info.synchro` leaf provides information about details of
>     synchronous replication.
>     
>     In particular `quorum` represent the current value of synchronous
>     replication quorum defined by `replication_synchro_quorum`
>     configuration parameter since it can be set as dynamic formula
>     such as `N/2+1` and the value depends on current number
>     of replicas.
>     
>     Since synchronous replication does not commit data immediately
>     but waits for its propagation to replicas such data sits in
>     a queue gathering `commit` responses from remote nodes. Current
>     number of entries sitting in the queue is shown by `queue.len`
>     member.
>     
>     A typical output is the following
>     
>     ``` Lua
>     tarantool> box.info.synchro
>     ---
>     - queue:
>         1:
>           len: 0
>       quorum: 1
>     ...
>     ```
>     
>     For now only one `queue` is supported so the output is filled with
>     one array entry. In future the multiple queues might be implemented.
> 
> diff --git a/src/box/lua/info.c b/src/box/lua/info.c
> index 8cd379756..18b18cc90 100644
> --- a/src/box/lua/info.c
> +++ b/src/box/lua/info.c
> @@ -599,6 +600,47 @@ lbox_info_election(struct lua_State *L)
>  	return 1;
>  }
>  
> +static void
> +lbox_push_synchro_queue(struct lua_State *L)
> +{
> +	/*
> +	 * For fancy formatting.
> +	 */
> +	lua_newtable(L);
> +	lua_pushliteral(L, "mapping");
> +	lua_setfield(L, -2, "__serialize");
> +	lua_setmetatable(L, -2);
> +
> +	struct txn_limbo *queue = &txn_limbo;
> +	const int queue_nr = 1;
> +
> +	lua_createtable(L, 0, 1);
> +	lua_pushstring(L, "len");
> +	lua_pushnumber(L, queue->len);
> +	lua_settable(L, -3);

4. There is a function setfield(), which you
can use instead of pushstring() + settable(). You even
did use it above and below, but somewhy not here.

> +	lua_rawseti(L, -2, queue_nr);
> +}
> diff --git a/test/replication/gh-5446-qsync-eval-quorum.result b/test/replication/gh-5446-qsync-eval-quorum.result
> index 1d13f26db..33d64b5b1 100644
> --- a/test/replication/gh-5446-qsync-eval-quorum.result
> +++ b/test/replication/gh-5446-qsync-eval-quorum.result
> @@ -2,6 +2,9 @@
>  test_run = require('test_run').new()
>   | ---
>   | ...
> +fiber = require('fiber')
> + | ---
> + | ...
>  engine = test_run:get_cfg('engine')
>   | ---
>   | ...
> @@ -123,6 +126,11 @@ s:insert{3} -- should pass
>   | - [3]
>   | ...
>  
> +assert(box.info.synchro.quorum == 2)
> + | ---
> + | - true
> + | ...
> +
>  -- 6 replicas, 7 nodes -> replication_synchro_quorum = 7/2 + 1 = 4
>  test_run:cmd('create server replica2 with rpl_master=default,\
>                script="replication/replica-quorum-2.lua"')
> @@ -174,6 +182,11 @@ test_run:cmd('start server replica6 with wait=True, wait_load=True')
>   | - true
>   | ...
>  
> +assert(box.info.synchro.quorum == 4)
> + | ---
> + | - true
> + | ...
> +
>  -- All replicas are up and running
>  s:insert{4} -- should pass
>   | ---
> @@ -254,49 +267,76 @@ s:insert{11} -- should pass
>   | - [11]
>   | ...
>  
> --- cleanup
> -
> +-- To test queue lentgh tracking we should enter a state

5. lentgh -> length.

> +-- where attempt to write data stucks waiting for replication

6. stuck is past from stick, it does not conjugate. But I also
often forget that, so up to you. Still understandable.

But a more important comment - please, don't alter the old tests
just to get them cover your new feature whenever it feels like it.
The test is called "gh-5446-qsync-eval-quorum", therefore it must
be about quorum. Not about the queue length.

Testing box.info.synchro.quorum is fine here, it is related. But
not the queue length.

For testing the length you can add some new cases to qsync_basic.test.lua,
or even qsync_advanced.test.lua, if you don't want to create a
new file.

7. box/info.test.lua fails on your branch.



More information about the Tarantool-patches mailing list