[Tarantool-patches] [PATCH v1] test: fix flaky replication/skip_conflict_row test

Sergey Bronnikov sergeyb at tarantool.org
Wed Apr 29 14:53:50 MSK 2020


LGTM.

The test passed 100 out of 100 iterations.
See comments inline.

On 02:08 Mon 27 Apr, Alexander V. Tikhonov wrote:
> From: "Aleander V. Tikhonov" <avtikhon at tarantool.org>
> 
> Fixed flaky upstream checks in the replication/skip_conflict_row test.
> 
> Errors fixed:
> 
> [038] @@ -174,7 +174,7 @@
> [038]  ...
> [038]  box.info.replication[1].upstream.status
> [038]  ---
> [038] -- follow
> [038] +- disconnected
> [038]  ...
> [038]  -- write some conflicting records on slave
> [038]  for i = 1, 10 do box.space.test:insert({i, 'r'}) end
> 
> [030] @@ -201,12 +201,12 @@
> [030]  -- lsn should be incremented
> [030]  v1 == box.info.vclock[1] - 10
> [030]  ---
> [030] -- true
> [030] +- false
> [030]  ...
> [030]  -- and state is follow
> [030]  box.info.replication[1].upstream.status
> [030]  ---
> [030] -- follow
> [030] +- disconnected
> [030]  ...
> [030]  -- restart server and check replication continues from nop-ed vclock
> [030]  test_run:cmd("switch default")
> 
> [022] @@ -230,7 +230,7 @@
> [022]  ...
> [022]  box.info.replication[1].upstream.status
> [022]  ---
> [022] -- follow
> [022] +- disconnected
> [022]  ...
> [022]  box.space.test:select({11}, {iterator = "GE"})
> [022]  ---
> [022]

It is not clear from this output what problem you have fixed.
I propose removing the output and adding a description of the problem instead.
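
For example, the commit message could say that the old checks read
box.info.replication[1].upstream.status (or the vclock) right after a
restart or a batch of writes, so the applier may still be catching up
when the check runs, which is what the 'disconnected' results above
look like. The fix polls instead of reading once; a minimal sketch of
the pattern, reusing the test-run helpers already present in the diff:

    -- racy: a one-shot read taken right after restart/reconnect may
    -- observe 'disconnected' before the applier resubscribes
    box.info.replication[1].upstream.status
    -- robust: poll until the applier reaches the expected state
    test_run:wait_upstream(1, {status = 'follow'})
    -- same idea for the lsn check: wait until the vclock catches up
    test_run:wait_cond(function() return v1 == box.info.vclock[1] - 10 end)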

> 
> Closes #4425
> ---
>  test/replication/skip_conflict_row.result   | 34 ++++++++-------------
>  test/replication/skip_conflict_row.test.lua | 16 +++++-----
>  test/replication/suite.ini                  |  1 -
>  3 files changed, 20 insertions(+), 31 deletions(-)
> 
> diff --git a/test/replication/skip_conflict_row.result b/test/replication/skip_conflict_row.result
> index d70ac8e2a..737522b8b 100644
> --- a/test/replication/skip_conflict_row.result
> +++ b/test/replication/skip_conflict_row.result
> @@ -64,13 +64,9 @@ test_run:cmd("switch replica")
>  ---
>  - true
>  ...
> -box.info.replication[1].upstream.message
> +test_run:wait_upstream(1, {status = 'follow', message_re = box.NULL})
>  ---
> -- null
> -...
> -box.info.replication[1].upstream.status
> ----
> -- follow
> +- true
>  ...
>  box.space.test:select()
>  ---
> @@ -123,13 +119,9 @@ lsn1 == box.info.vclock[1]
>  ---
>  - true
>  ...
> -box.info.replication[1].upstream.message
> ----
> -- Duplicate key exists in unique index 'primary' in space 'test'
> -...
> -box.info.replication[1].upstream.status
> +test_run:wait_upstream(1, {status = 'stopped', message_re = "Duplicate key exists in unique index 'primary' in space 'test'"})
>  ---
> -- stopped
> +- true
>  ...
>  test_run:cmd("switch default")
>  ---
> @@ -140,9 +132,9 @@ test_run:cmd("restart server replica")
>  - true
>  ...
>  -- applier is not in follow state
> -box.info.replication[1].upstream.message
> +test_run:wait_upstream(1, {status = 'stopped', message_re = "Duplicate key exists in unique index 'primary' in space 'test'"})
>  ---
> -- Duplicate key exists in unique index 'primary' in space 'test'
> +- true
>  ...
>  --
>  -- gh-3977: check that NOP is written instead of conflicting row.
> @@ -172,9 +164,9 @@ test_run:cmd("switch replica")
>  ---
>  - true
>  ...
> -box.info.replication[1].upstream.status
> +test_run:wait_upstream(1, {status = 'follow'})
>  ---
> -- follow
> +- true
>  ...
>  -- write some conflicting records on slave
>  for i = 1, 10 do box.space.test:insert({i, 'r'}) end
> @@ -199,14 +191,14 @@ test_run:cmd("switch replica")
>  - true
>  ...
>  -- lsn should be incremented
> -v1 == box.info.vclock[1] - 10
> +test_run:wait_cond(function() return v1 == box.info.vclock[1] - 10 end)
>  ---
>  - true
>  ...
>  -- and state is follow
> -box.info.replication[1].upstream.status
> +test_run:wait_upstream(1, {status = 'follow'})
>  ---
> -- follow
> +- true
>  ...
>  -- restart server and check replication continues from nop-ed vclock
>  test_run:cmd("switch default")
> @@ -228,9 +220,9 @@ test_run:cmd("switch replica")
>  ---
>  - true
>  ...
> -box.info.replication[1].upstream.status
> +test_run:wait_upstream(1, {status = 'follow'})
>  ---
> -- follow
> +- true
>  ...
>  box.space.test:select({11}, {iterator = "GE"})
>  ---
> diff --git a/test/replication/skip_conflict_row.test.lua b/test/replication/skip_conflict_row.test.lua
> index 04fd08136..e7a93cc74 100644
> --- a/test/replication/skip_conflict_row.test.lua
> +++ b/test/replication/skip_conflict_row.test.lua
> @@ -22,8 +22,7 @@ vclock = test_run:get_vclock('default')
>  vclock[0] = nil
>  _ = test_run:wait_vclock("replica", vclock)
>  test_run:cmd("switch replica")
> -box.info.replication[1].upstream.message
> -box.info.replication[1].upstream.status
> +test_run:wait_upstream(1, {status = 'follow', message_re = box.NULL})
>  box.space.test:select()
>  
>  test_run:cmd("switch default")
> @@ -41,12 +40,11 @@ box.space.test:insert{4}
>  test_run:cmd("switch replica")
>  -- lsn is not promoted
>  lsn1 == box.info.vclock[1]
> -box.info.replication[1].upstream.message
> -box.info.replication[1].upstream.status
> +test_run:wait_upstream(1, {status = 'stopped', message_re = "Duplicate key exists in unique index 'primary' in space 'test'"})
>  test_run:cmd("switch default")
>  test_run:cmd("restart server replica")
>  -- applier is not in follow state
> -box.info.replication[1].upstream.message
> +test_run:wait_upstream(1, {status = 'stopped', message_re = "Duplicate key exists in unique index 'primary' in space 'test'"})
>  
>  --
>  -- gh-3977: check that NOP is written instead of conflicting row.
> @@ -60,7 +58,7 @@ test_run:cmd("switch default")
>  box.space.test:truncate()
>  test_run:cmd("restart server replica")
>  test_run:cmd("switch replica")
> -box.info.replication[1].upstream.status
> +test_run:wait_upstream(1, {status = 'follow'})
>  -- write some conflicting records on slave
>  for i = 1, 10 do box.space.test:insert({i, 'r'}) end
>  box.cfg{replication_skip_conflict = true}
> @@ -72,9 +70,9 @@ for i = 1, 10 do box.space.test:insert({i, 'm'}) end
>  
>  test_run:cmd("switch replica")
>  -- lsn should be incremented
> -v1 == box.info.vclock[1] - 10
> +test_run:wait_cond(function() return v1 == box.info.vclock[1] - 10 end)
>  -- and state is follow
> -box.info.replication[1].upstream.status
> +test_run:wait_upstream(1, {status = 'follow'})
>  
>  -- restart server and check replication continues from nop-ed vclock
>  test_run:cmd("switch default")
> @@ -82,7 +80,7 @@ test_run:cmd("stop server replica")
>  for i = 11, 20 do box.space.test:insert({i, 'm'}) end
>  test_run:cmd("start server replica")
>  test_run:cmd("switch replica")
> -box.info.replication[1].upstream.status
> +test_run:wait_upstream(1, {status = 'follow'})
>  box.space.test:select({11}, {iterator = "GE"})
>  
>  test_run:cmd("switch default")
> diff --git a/test/replication/suite.ini b/test/replication/suite.ini
> index ac413669d..572dd47fe 100644
> --- a/test/replication/suite.ini
> +++ b/test/replication/suite.ini
> @@ -13,7 +13,6 @@ is_parallel = True
>  pretest_clean = True
>  fragile = errinj.test.lua            ; gh-3870
>            long_row_timeout.test.lua  ; gh-4351
> -          skip_conflict_row.test.lua ; gh-4457
>            sync.test.lua              ; gh-3835 gh-3877
>            transaction.test.lua       ; gh-4312
>            wal_off.test.lua           ; gh-4355
> -- 
> 2.17.1
> 

-- 
sergeyb@