[Tarantool-patches] [PATCH 2/3] replication: test clear_synchro_queue function

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Tue Jul 21 01:00:56 MSK 2020


Thanks for the patch!

See 8 comments below.

On 09.07.2020 19:16, sergeyb at tarantool.org wrote:
> From: Sergey Bronnikov <sergeyb at tarantool.org>
> 
> Part of #5055
> Part of #4849
> ---
>  test/replication/qsync_basic.result   | 85 +++++++++++++++++++++++++++
>  test/replication/qsync_basic.test.lua | 31 ++++++++++
>  2 files changed, 116 insertions(+)
> 
> diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
> index ab4be0c7e..464df75a7 100644
> --- a/test/replication/qsync_basic.result
> +++ b/test/replication/qsync_basic.result
> @@ -32,6 +32,14 @@ s2.is_sync
>   | - false
>   | ...
>  
> +--
> +-- gh-4849: clear synchro queue with unconfigured box
> +--
> +box.ctl.clear_synchro_queue()

1. Enough to test that it does not crash here. No need to
check result.

> + | ---
> + | - -1
> + | ...
> +
>  -- Net.box takes sync into account.
>  box.schema.user.grant('guest', 'super')
>   | ---
> @@ -553,6 +561,82 @@ box.space.sync:select{7}
>   | - - [7]
>   | ...
>  
> +--
> +-- gh-4849: clear synchro queue on a replica
> +--
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 2}

2. If you want this timeout to fail, it is too big. The test will be too
long. If it is supposed not to fail, then it is way too small. If you want
it to fail, it should be something like 0.001. If you want it to hold, it
should be 1000 to be sure. Keep in mind that you can change the timeout on
fly. That allows to build quite complex reactive tests completely event-based.

> + | ---
> + | ...
> +f1 = fiber.create(box.space.sync.replace, box.space.sync, {9})

3. You need to extract the exact result value. Use pcall for that, and print
its result after the fiber is dead. See other examples with fiber.create() in
qsync test suite. The same for the next test case.

> + | ---
> + | ...
> +f1:status()
> + | ---
> + | - suspended
> + | ...
> +test_run:switch('replica')
> + | ---
> + | - true

4. Better wait until the value is delivered. Otherwise you can switch to
replica before master finishes WAL write, and the queue will be empty here.
Try

    test_run:wait_cond(function() return box.space.sync:get{9} ~= nil end)

> + | ...
> +box.ctl.clear_synchro_queue()
> + | ---
> + | - 0
> + | ...
> +box.space.sync:select{9}
> + | ---
> + | - []

5. If select returns 9 before queue clean, and returns empty afterwards,
it means the queue was cleared. So here the queue size is not really needed,
as you can see.

> + | ...
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.space.sync:select{9}
> + | ---
> + | - []
> + | ...
> +f1:status()
> + | ---
> + | - dead
> + | ...
> +
> +--
> +-- gh-4849: clear synchro queue on a master

6. Since the previous test the replica's WAL is different from master's.
I am not sure the replication is still alive.

> +--
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 2}
> + | ---
> + | ...
> +f1 = fiber.create(box.space.sync.replace, box.space.sync, {10})
> + | ---
> + | ...
> +f1:status()
> + | ---
> + | - suspended
> + | ...
> +box.ctl.clear_synchro_queue()
> + | ---
> + | - -2
> + | ...
> +box.space.sync:select{10}
> + | ---
> + | - - [10]

7. Why is it 10? The quorum is not reached, it should have been rolled back.

> + | ...
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +box.space.sync:select{10}
> + | ---
> + | - - [10]
> + | ...
> +
>  -- Cleanup.
>  test_run:cmd('switch default')
>   | ---
> @@ -576,6 +660,7 @@ test_run:cmd('delete server replica')
>   | ...
>  box.space.test:drop()
>   | ---
> + | - error: A rollback for a synchronous transaction is received

8. Why is it changed?

>   | ...
>  box.space.sync:drop()
>   | ---


More information about the Tarantool-patches mailing list