[Tarantool-patches] [PATCH 2/2] test: add a test for wal_cleanup_delay option

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Mar 19 02:04:11 MSK 2021


Thanks for the patch!

See 8 comments below.

On 18.03.2021 19:41, Cyrill Gorcunov via Tarantool-patches wrote:
> Part-of #5806
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov at gmail.com>
> ---
>  test/replication/gh-5806-master.lua           |  13 +
>  test/replication/gh-5806-slave.lua            |  13 +
>  test/replication/gh-5806-xlog-cleanup.result  | 336 ++++++++++++++++++
>  .../replication/gh-5806-xlog-cleanup.test.lua | 148 ++++++++
>  4 files changed, 510 insertions(+)
>  create mode 100644 test/replication/gh-5806-master.lua
>  create mode 100644 test/replication/gh-5806-slave.lua
>  create mode 100644 test/replication/gh-5806-xlog-cleanup.result
>  create mode 100644 test/replication/gh-5806-xlog-cleanup.test.lua
> 
> diff --git a/test/replication/gh-5806-master.lua b/test/replication/gh-5806-master.lua
> new file mode 100644
> index 000000000..0404965d3
> --- /dev/null
> +++ b/test/replication/gh-5806-master.lua
> @@ -0,0 +1,13 @@
> +#!/usr/bin/env tarantool
> +
> +require('console').listen(os.getenv('ADMIN'))
> +
> +function func_xlog_snap(space, value)

1. This fails luacheck (which I don't like having in the tests,
but we still must keep it green):

Checking test/replication/gh-5806-master.lua      1 warning

    test/replication/gh-5806-master.lua:5:10: (W111) setting non-standard global variable func_xlog_snap

Checking test/replication/gh-5806-slave.lua       1 warning

    test/replication/gh-5806-slave.lua:5:10: (W111) setting non-standard global variable func_xlog_snap

Also, why does it have the 'func' prefix? It is obviously
a function; we don't add a 'func' prefix to every function.

> +    space:insert(value)
> +    box.snapshot()
> +end
> diff --git a/test/replication/gh-5806-xlog-cleanup.result b/test/replication/gh-5806-xlog-cleanup.result
> new file mode 100644
> index 000000000..97355a8bf
> --- /dev/null
> +++ b/test/replication/gh-5806-xlog-cleanup.result
> @@ -0,0 +1,336 @@
> +-- test-run result file version 2
> +--
> +-- gh-5806: defer xlog cleanup to keep xlogs until
> +-- replicas present in "_cluster" are connected.
> +-- Otherwise we are getting XlogGapError since
> +-- master might go far forwad from replica and

2. forwad -> forward.

> +-- replica won't be able to connect without full
> +-- rebootstrap.
> +--
> +
> +fiber = require('fiber')
> + | ---
> + | ...
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +engine = test_run:get_cfg('engine')
> + | ---
> + | ...
> +
> +--
> +-- Case 1.
> +--
> +-- First lets make sure we're getting XlogGapError in
> +-- case if wal_cleanup_delay is not used.
> +--
> +
> +test_run:cmd('create server master with script="replication/gh-5806-master.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server master with wait=True, wait_load=True')
> + | ---
> + | - true
> + | ...
> +
> +test_run:switch('master')
> + | ---
> + | - true
> + | ...
> +box.schema.user.grant('guest', 'replication')
> + | ---
> + | ...
> +
> +--
> +-- Keep small number of snaps to force cleanup
> +-- procedure be more intensive.
> +box.cfg{checkpoint_count = 1}
> + | ---
> + | ...
> +
> +engine = test_run:get_cfg('engine')
> + | ---
> + | ...
> +s = box.schema.space.create('test', {engine = engine})
> + | ---
> + | ...
> +_ = s:create_index('pk')
> + | ---
> + | ...
> +
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('create server replica with rpl_master=master,\
> +              script="replication/gh-5806-slave.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica with wait=True, wait_load=True')
> + | ---
> + | - true
> + | ...
> +
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +box.cfg{checkpoint_count = 1}
> + | ---
> + | ...
> +s = box.schema.space.create('testtemp', {temporary = true})
> + | ---
> + | ...
> +_ = s:create_index('pk')
> + | ---
> + | ...
> +for i=1,2 do func_xlog_snap(box.space.testtemp, {i}) end

3. Honestly, it would look much simpler as just 4 lines:
2 inserts and 2 snapshots.
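For example, the master-side loop could be written out directly. This is a sketch that assumes the `test` space created earlier in this test. If the replica side is inlined the same way, the global helper in gh-5806-master.lua and gh-5806-slave.lua would no longer be needed, and that would also clear the luacheck warnings from comment 1.

```lua
-- Two inserts, each followed by a snapshot, so that with
-- checkpoint_count = 1 the older xlogs become eligible for
-- garbage collection.
box.space.test:insert({1})
box.snapshot()
box.space.test:insert({2})
box.snapshot()
```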

4. Why do you make rw requests on both the replica and the master?
And why do you need 2 spaces?

> + | ---
> + | ...
> +
> +--
> +-- Stop the replica node and generate
> +-- first range of xlogs on the master.
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server replica')
> + | ---
> + | - true
> + | ...
> +
> +test_run:switch('master')
> + | ---
> + | - true
> + | ...
> +for i=1,2 do func_xlog_snap(box.space.test, {i}) end
> + | ---
> + | ...
> +
> +--
> +-- Restart the masted and generate the

5. masted -> master.

> +-- next range of xlogs.
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server master')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server master with wait_load=True')

6. Does the 'restart server master' command work?
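If it does, the quoted stop/start pair could presumably collapse into a single call. This is a sketch that has not been verified against this test. It assumes that test-run's `restart server` command stops the instance and starts it again with the same script, and that it waits for the instance to load, as the explicit `wait_load=True` does now.

```lua
-- Intended to replace 'stop server master' followed by
-- 'start server master with wait_load=True'.
test_run:cmd('restart server master')
```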

> + | ---
> + | - true
> + | ...
> +test_run:switch('master')
> + | ---
> + | - true
> + | ...
> +for i=3,4 do func_xlog_snap(box.space.test, {i}) end
> + | ---
> + | ...
> +
> +--
> +-- Restart master node and the replica then.

7. Why do you need to restart the master 2 times?

> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server master')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server master with wait_load=True')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica with wait=False, wait_load=False')
> + | ---
> + | - true
> + | ...
> +
> +--
> +-- Wait error to appear.
> +while test_run:grep_log("master", "XlogGapError") == nil do fiber.sleep(0.01) end

8. We have test_run:wait_log().
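That is, the busy loop around grep_log() and fiber.sleep() could be replaced with a single call. This is a sketch that assumes test-run's `wait_log(node, what, bytes, timeout)` argument order. The 10-second timeout is an arbitrary choice.

```lua
-- Poll the master's log until "XlogGapError" appears or the
-- timeout (in seconds) expires; the bytes argument is left nil
-- so the helper's default is used.
test_run:wait_log('master', 'XlogGapError', nil, 10)
```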

