[Tarantool-patches] [PATCH v1] test: flaky replication/gh-5195-qsync-*

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Thu Sep 10 00:12:40 MSK 2020


Thanks for the patch!

See 3 comments below.

On 07.09.2020 14:13, Alexander V. Tikhonov wrote:
> On heavy loaded hosts found the following issue:
> 
>    box.cfg{replication_synchro_quorum = 2}
>     | ---
>   + | - error: '[string "test_run:wait_cond(function()                ..."]:1: attempt to
>   + |     index field ''vclock'' (a nil value)'

1. How is that output possible? box.cfg has nothing to do with wait_cond
nor with vclocks. The diff looks broken, misplaced.

>     | ...
>    test_run:wait_cond(function() return f:status() == 'dead' end)
>     | ---
>   - | - true
>   + | - false
>     | ...
>    ok, err
>     | ---
>   - | - true
>   - | - [2]
>   + | - null
>   + | - null
>     | ...
>    box.space.sync:select{}
>     | ---
> 
> It happened because replication vclock field was not exist at the moment
> of its check. To fix the issue, vclock field had to be waited to be
> available using test_run:wait_cond() routine.

2. But in the fix you don't wait for it. You just added one more level
of checking inside the existing wait_cond.

> Closes #5230
> ---
> 
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5230-fix-qsync-write-5195
> Issue: https://github.com/tarantool/tarantool/issues/5230
> 
>  test/replication/gh-5195-qsync-replica-write.result   | 10 ++++++----
>  test/replication/gh-5195-qsync-replica-write.test.lua |  7 +++----
>  2 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/test/replication/gh-5195-qsync-replica-write.result b/test/replication/gh-5195-qsync-replica-write.result
> index 3999e8f5e..b47359fde 100644
> --- a/test/replication/gh-5195-qsync-replica-write.result
> +++ b/test/replication/gh-5195-qsync-replica-write.result
> @@ -96,10 +96,12 @@ test_run:wait_downstream(replica_id, {status='follow'})
>   | - true
>   | ...
>  test_run:wait_cond(function()                                                   \
> -        local info = box.info.replication[replica_id]                           \
> -        local lsn = info.downstream.vclock[replica_id]                          \
> -        return lsn and lsn >= replica_lsn                                       \
> -end)                                                                            \
> +        local lsn = box.info.replication[replica_id].downstream.vclock          \

3. It is not lsn, it is vclock now. Also I don't think you need to inline 'info'
variable, it was perfectly fine.

> +        return (lsn and lsn[replica_id] and lsn[replica_id] >= replica_lsn)     \
> +    end) or box.info
> + | ---
> + | - true
> + | ...


More information about the Tarantool-patches mailing list