Tarantool development patches archive
* [Tarantool-patches] [PATCH 2/2] test: stabilize quorum test conditions
       [not found] <cover.1574290043.git.i.kosarev@tarantool.org>
@ 2019-11-20 22:47 ` Ilya Kosarev
  2019-11-20 23:07   ` Vladislav Shpilevoy
  0 siblings, 1 reply; 5+ messages in thread
From: Ilya Kosarev @ 2019-11-20 22:47 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy

Some pass conditions in the quorum test could take some time to be
satisfied. They are now wrapped in test_run:wait_cond to make the
test stable.

Part of #4586
---
 test/replication/quorum.result   | 30 +++++++++++++++++-------------
 test/replication/quorum.test.lua | 18 +++++++++---------
 2 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/test/replication/quorum.result b/test/replication/quorum.result
index ff5fa0150..12604c8de 100644
--- a/test/replication/quorum.result
+++ b/test/replication/quorum.result
@@ -115,15 +115,15 @@ box.info.status -- running
 - running
 ...
 -- Check that the replica follows all masters.
-box.info.id == 1 or box.info.replication[1].upstream.status == 'follow'
+box.info.id == 1 or test_run:wait_cond(function() return box.info.replication[1].upstream.status == 'follow' end, 20)
 ---
 - true
 ...
-box.info.id == 2 or box.info.replication[2].upstream.status == 'follow'
+box.info.id == 2 or test_run:wait_cond(function() return box.info.replication[2].upstream.status == 'follow' end, 20)
 ---
 - true
 ...
-box.info.id == 3 or box.info.replication[3].upstream.status == 'follow'
+box.info.id == 3 or test_run:wait_cond(function() return box.info.replication[3].upstream.status == 'follow' end, 20)
 ---
 - true
 ...
@@ -149,6 +149,10 @@ test_run:cmd('stop server quorum1')
 ---
 - true
 ...
+test_run:wait_cond(function() return box.space.test.index.primary ~= nil end, 20)
+---
+- true
+...
 for i = 1, 100 do box.space.test:insert{i} end
 ---
 ...
@@ -166,9 +170,9 @@ test_run:cmd('switch quorum1')
 ---
 - true
 ...
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 ---
-- 100
+- true
 ...
 -- Rebootstrap one node of the cluster and check that others follow.
 -- Note, due to ERRINJ_RELAY_TIMEOUT there is a substantial delay
@@ -197,9 +201,9 @@ test_run:cmd('switch quorum1')
 - true
 ...
 test_run:cmd('restart server quorum1 with cleanup=1, args="0.1 0.5"')
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 ---
-- 100
+- true
 ...
 -- The rebootstrapped replica will be assigned id = 4,
 -- because ids 1..3 are busy.
@@ -207,11 +211,9 @@ test_run:cmd('switch quorum2')
 ---
 - true
 ...
-fiber = require('fiber')
----
-...
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
 ---
+- true
 ...
 box.info.replication[4].upstream.status
 ---
@@ -221,11 +223,13 @@ test_run:cmd('switch quorum3')
 ---
 - true
 ...
-fiber = require('fiber')
+test_run:wait_cond(function() return box.info.replication ~= nil end, 20)
 ---
+- true
 ...
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
 ---
+- true
 ...
 box.info.replication[4].upstream.status
 ---
diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
index 98febb367..be23200d3 100644
--- a/test/replication/quorum.test.lua
+++ b/test/replication/quorum.test.lua
@@ -47,9 +47,9 @@ box.info.ro -- false
 box.info.status -- running
 
 -- Check that the replica follows all masters.
-box.info.id == 1 or box.info.replication[1].upstream.status == 'follow'
-box.info.id == 2 or box.info.replication[2].upstream.status == 'follow'
-box.info.id == 3 or box.info.replication[3].upstream.status == 'follow'
+box.info.id == 1 or test_run:wait_cond(function() return box.info.replication[1].upstream.status == 'follow' end, 20)
+box.info.id == 2 or test_run:wait_cond(function() return box.info.replication[2].upstream.status == 'follow' end, 20)
+box.info.id == 3 or test_run:wait_cond(function() return box.info.replication[3].upstream.status == 'follow' end, 20)
 
 -- Check that box.cfg() doesn't return until the instance
 -- catches up with all configured replicas.
@@ -59,13 +59,14 @@ test_run:cmd('switch quorum2')
 box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.001)
 test_run:cmd('stop server quorum1')
 
+test_run:wait_cond(function() return box.space.test.index.primary ~= nil end, 20)
 for i = 1, 100 do box.space.test:insert{i} end
 fiber = require('fiber')
 fiber.sleep(0.1)
 
 test_run:cmd('start server quorum1 with args="0.1  0.5"')
 test_run:cmd('switch quorum1')
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 
 -- Rebootstrap one node of the cluster and check that others follow.
 -- Note, due to ERRINJ_RELAY_TIMEOUT there is a substantial delay
@@ -81,17 +82,16 @@ box.snapshot()
 test_run:cmd('switch quorum1')
 test_run:cmd('restart server quorum1 with cleanup=1, args="0.1 0.5"')
 
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 
 -- The rebootstrapped replica will be assigned id = 4,
 -- because ids 1..3 are busy.
 test_run:cmd('switch quorum2')
-fiber = require('fiber')
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
 box.info.replication[4].upstream.status
 test_run:cmd('switch quorum3')
-fiber = require('fiber')
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication ~= nil end, 20)
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
 box.info.replication[4].upstream.status
 
 -- Cleanup.
-- 
2.17.1
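
The polling pattern the patch above relies on can be sketched in plain Lua. This is only an illustration, not the test-run implementation of test_run:wait_cond: the function below is a hypothetical stand-in, and the `now` and `sleep` parameters are assumptions introduced so the example runs without Tarantool (inside the test they would correspond to a monotonic clock and something like fiber.sleep).

```lua
-- Hypothetical stand-in for test_run:wait_cond (NOT the real implementation).
-- cond:    function returning a truthy value once the condition holds
-- timeout: maximum time to wait, in the units returned by now()
-- delay:   pause between two polls
-- now, sleep: injected clock and sleep functions (assumptions of this sketch)
local function wait_cond(cond, timeout, delay, now, sleep)
    local deadline = now() + timeout
    while true do
        if cond() then
            return true
        end
        if now() >= deadline then
            return false
        end
        sleep(delay)
    end
end

-- Fake clock so the example runs instantly in plain Lua.
local t = 0
local function now() return t end
local function sleep(d) t = t + d end

-- A condition that becomes true on the third poll.
local polls = 0
local function ready()
    polls = polls + 1
    return polls >= 3
end

local ok = wait_cond(ready, 20, 0.001, now, sleep)
-- A condition that never holds: the loop gives up at the deadline.
local timed_out = wait_cond(function() return false end, 20, 0.5, now, sleep)
print(ok, polls, timed_out)
```

Unlike the removed `while ... do fiber.sleep(0.001) end` loops, a bounded wait of this shape returns a value on timeout instead of spinning forever, which is why the .result file now records `- true` for each wrapped check.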

* Re: [Tarantool-patches] [PATCH 2/2] test: stabilize quorum test conditions
  2019-11-20 22:47 ` [Tarantool-patches] [PATCH 2/2] test: stabilize quorum test conditions Ilya Kosarev
@ 2019-11-20 23:07   ` Vladislav Shpilevoy
  2019-11-21  0:12     ` Ilya Kosarev
  0 siblings, 1 reply; 5+ messages in thread
From: Vladislav Shpilevoy @ 2019-11-20 23:07 UTC (permalink / raw)
  To: Ilya Kosarev, tarantool-patches

Hm. So you are not going to fix the flaky error I mentioned
in the previous thread about this commit?

Seems like it is also about 'conditions which need time to be
satisfied'.

On 20/11/2019 23:47, Ilya Kosarev wrote:
> Some pass conditions in the quorum test could take some time to be
> satisfied. They are now wrapped in test_run:wait_cond to make the
> test stable.
> 
> Part of #4586

* Re: [Tarantool-patches] [PATCH 2/2] test: stabilize quorum test conditions
  2019-11-20 23:07   ` Vladislav Shpilevoy
@ 2019-11-21  0:12     ` Ilya Kosarev
  0 siblings, 0 replies; 5+ messages in thread
From: Ilya Kosarev @ 2019-11-21  0:12 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

I am going to fix it as soon as I am able to test it, since it doesn't
seem practical to fix something I can't reproduce. For now I have just
brought the patch to its current state.
>Thursday, 21 November 2019, 2:01 +03:00 from Vladislav Shpilevoy <v.shpilevoy@tarantool.org>:
>
>Hm. So you are not going to fix the flaky error I mentioned
>in the previous thread about this commit?
>
>Seems like it is also about 'conditions which need time to be
>satisfied'.


-- 
Ilya Kosarev


* [Tarantool-patches] [PATCH 2/2] test: stabilize quorum test conditions
       [not found] <cover.1574159473.git.i.kosarev@tarantool.org>
@ 2019-11-19 10:31 ` Ilya Kosarev
  0 siblings, 0 replies; 5+ messages in thread
From: Ilya Kosarev @ 2019-11-19 10:31 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy

Some pass conditions in the quorum test could take some time to be
satisfied. They are now wrapped in test_run:wait_cond to make the
test stable.

Closes #4586
---
 test/replication/quorum.result   | 30 +++++++++++++++++-------------
 test/replication/quorum.test.lua | 18 +++++++++---------
 2 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/test/replication/quorum.result b/test/replication/quorum.result
index ff5fa0150..12604c8de 100644
--- a/test/replication/quorum.result
+++ b/test/replication/quorum.result
@@ -115,15 +115,15 @@ box.info.status -- running
 - running
 ...
 -- Check that the replica follows all masters.
-box.info.id == 1 or box.info.replication[1].upstream.status == 'follow'
+box.info.id == 1 or test_run:wait_cond(function() return box.info.replication[1].upstream.status == 'follow' end, 20)
 ---
 - true
 ...
-box.info.id == 2 or box.info.replication[2].upstream.status == 'follow'
+box.info.id == 2 or test_run:wait_cond(function() return box.info.replication[2].upstream.status == 'follow' end, 20)
 ---
 - true
 ...
-box.info.id == 3 or box.info.replication[3].upstream.status == 'follow'
+box.info.id == 3 or test_run:wait_cond(function() return box.info.replication[3].upstream.status == 'follow' end, 20)
 ---
 - true
 ...
@@ -149,6 +149,10 @@ test_run:cmd('stop server quorum1')
 ---
 - true
 ...
+test_run:wait_cond(function() return box.space.test.index.primary ~= nil end, 20)
+---
+- true
+...
 for i = 1, 100 do box.space.test:insert{i} end
 ---
 ...
@@ -166,9 +170,9 @@ test_run:cmd('switch quorum1')
 ---
 - true
 ...
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 ---
-- 100
+- true
 ...
 -- Rebootstrap one node of the cluster and check that others follow.
 -- Note, due to ERRINJ_RELAY_TIMEOUT there is a substantial delay
@@ -197,9 +201,9 @@ test_run:cmd('switch quorum1')
 - true
 ...
 test_run:cmd('restart server quorum1 with cleanup=1, args="0.1 0.5"')
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 ---
-- 100
+- true
 ...
 -- The rebootstrapped replica will be assigned id = 4,
 -- because ids 1..3 are busy.
@@ -207,11 +211,9 @@ test_run:cmd('switch quorum2')
 ---
 - true
 ...
-fiber = require('fiber')
----
-...
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
 ---
+- true
 ...
 box.info.replication[4].upstream.status
 ---
@@ -221,11 +223,13 @@ test_run:cmd('switch quorum3')
 ---
 - true
 ...
-fiber = require('fiber')
+test_run:wait_cond(function() return box.info.replication ~= nil end, 20)
 ---
+- true
 ...
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
 ---
+- true
 ...
 box.info.replication[4].upstream.status
 ---
diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
index 98febb367..be23200d3 100644
--- a/test/replication/quorum.test.lua
+++ b/test/replication/quorum.test.lua
@@ -47,9 +47,9 @@ box.info.ro -- false
 box.info.status -- running
 
 -- Check that the replica follows all masters.
-box.info.id == 1 or box.info.replication[1].upstream.status == 'follow'
-box.info.id == 2 or box.info.replication[2].upstream.status == 'follow'
-box.info.id == 3 or box.info.replication[3].upstream.status == 'follow'
+box.info.id == 1 or test_run:wait_cond(function() return box.info.replication[1].upstream.status == 'follow' end, 20)
+box.info.id == 2 or test_run:wait_cond(function() return box.info.replication[2].upstream.status == 'follow' end, 20)
+box.info.id == 3 or test_run:wait_cond(function() return box.info.replication[3].upstream.status == 'follow' end, 20)
 
 -- Check that box.cfg() doesn't return until the instance
 -- catches up with all configured replicas.
@@ -59,13 +59,14 @@ test_run:cmd('switch quorum2')
 box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.001)
 test_run:cmd('stop server quorum1')
 
+test_run:wait_cond(function() return box.space.test.index.primary ~= nil end, 20)
 for i = 1, 100 do box.space.test:insert{i} end
 fiber = require('fiber')
 fiber.sleep(0.1)
 
 test_run:cmd('start server quorum1 with args="0.1  0.5"')
 test_run:cmd('switch quorum1')
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 
 -- Rebootstrap one node of the cluster and check that others follow.
 -- Note, due to ERRINJ_RELAY_TIMEOUT there is a substantial delay
@@ -81,17 +82,16 @@ box.snapshot()
 test_run:cmd('switch quorum1')
 test_run:cmd('restart server quorum1 with cleanup=1, args="0.1 0.5"')
 
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 
 -- The rebootstrapped replica will be assigned id = 4,
 -- because ids 1..3 are busy.
 test_run:cmd('switch quorum2')
-fiber = require('fiber')
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
 box.info.replication[4].upstream.status
 test_run:cmd('switch quorum3')
-fiber = require('fiber')
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication ~= nil end, 20)
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
 box.info.replication[4].upstream.status
 
 -- Cleanup.
-- 
2.17.1
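
In the quorum3 hunk above, the test first waits for box.info.replication to be non-nil and only then indexes `[4].upstream.status`. A predicate that indexes a missing entry would raise an error rather than return false, so one possible way to write the condition defensively is sketched below. The helper name `upstream_follows` and the mock tables are hypothetical; only the field names `upstream` and `status` and the value 'follow' come from the patch, and 'stopped' is used merely as a placeholder for any other status.

```lua
-- Hypothetical helper (not part of the patch): returns true only when
-- replication[id] exists, has an upstream, and that upstream reports
-- status 'follow'. It never raises on missing entries.
local function upstream_follows(replication, id)
    local entry = replication and replication[id]
    local upstream = entry and entry.upstream
    return upstream ~= nil and upstream.status == 'follow'
end

-- Mock snapshots shaped like box.info.replication; contents are made up.
local not_registered = { [1] = { upstream = { status = 'follow' } } }
local no_upstream    = { [4] = {} }
local not_following  = { [4] = { upstream = { status = 'stopped' } } }
local following      = { [4] = { upstream = { status = 'follow' } } }

local r_nil      = upstream_follows(nil, 4)
local r_missing  = upstream_follows(not_registered, 4)
local r_no_up    = upstream_follows(no_upstream, 4)
local r_other    = upstream_follows(not_following, 4)
local r_follow   = upstream_follows(following, 4)
print(r_nil, r_missing, r_no_up, r_other, r_follow)
```

With a helper like this, a single wait on `upstream_follows(box.info.replication, 4)` could cover both waits in the hunk, assuming the intermediate states are the only reason for the extra guard.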

* [Tarantool-patches] [PATCH 2/2] test: stabilize quorum test conditions
  2019-11-18 13:19 [Tarantool-patches] [PATCH 0/2] fix replica iteration issue & stabilize quorum test Ilya Kosarev
@ 2019-11-18 13:19 ` Ilya Kosarev
  0 siblings, 0 replies; 5+ messages in thread
From: Ilya Kosarev @ 2019-11-18 13:19 UTC (permalink / raw)
  To: tarantool-patches

Some pass conditions in the quorum test could take some time to be
satisfied. They are now wrapped in test_run:wait_cond to make the
test stable.

Closes #4586
---
 test/replication/quorum.result   | 28 +++++++++++++++++++---------
 test/replication/quorum.test.lua | 16 +++++++++-------
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/test/replication/quorum.result b/test/replication/quorum.result
index ff5fa0150..73def113f 100644
--- a/test/replication/quorum.result
+++ b/test/replication/quorum.result
@@ -115,15 +115,15 @@ box.info.status -- running
 - running
 ...
 -- Check that the replica follows all masters.
-box.info.id == 1 or box.info.replication[1].upstream.status == 'follow'
+box.info.id == 1 or test_run:wait_cond(function() return box.info.replication[1].upstream.status == 'follow' end, 20)
 ---
 - true
 ...
-box.info.id == 2 or box.info.replication[2].upstream.status == 'follow'
+box.info.id == 2 or test_run:wait_cond(function() return box.info.replication[2].upstream.status == 'follow' end, 20)
 ---
 - true
 ...
-box.info.id == 3 or box.info.replication[3].upstream.status == 'follow'
+box.info.id == 3 or test_run:wait_cond(function() return box.info.replication[3].upstream.status == 'follow' end, 20)
 ---
 - true
 ...
@@ -149,6 +149,10 @@ test_run:cmd('stop server quorum1')
 ---
 - true
 ...
+test_run:wait_cond(function() return box.space.test.index.primary ~= nil end, 20)
+---
+- true
+...
 for i = 1, 100 do box.space.test:insert{i} end
 ---
 ...
@@ -166,9 +170,9 @@ test_run:cmd('switch quorum1')
 ---
 - true
 ...
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 ---
-- 100
+- true
 ...
 -- Rebootstrap one node of the cluster and check that others follow.
 -- Note, due to ERRINJ_RELAY_TIMEOUT there is a substantial delay
@@ -197,9 +201,9 @@ test_run:cmd('switch quorum1')
 - true
 ...
 test_run:cmd('restart server quorum1 with cleanup=1, args="0.1 0.5"')
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 ---
-- 100
+- true
 ...
 -- The rebootstrapped replica will be assigned id = 4,
 -- because ids 1..3 are busy.
@@ -210,8 +214,9 @@ test_run:cmd('switch quorum2')
 fiber = require('fiber')
 ---
 ...
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
 ---
+- true
 ...
 box.info.replication[4].upstream.status
 ---
@@ -224,8 +229,13 @@ test_run:cmd('switch quorum3')
 fiber = require('fiber')
 ---
 ...
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication ~= nil end, 20)
 ---
+- true
+...
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
+---
+- true
 ...
 box.info.replication[4].upstream.status
 ---
diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
index 98febb367..1842cdffe 100644
--- a/test/replication/quorum.test.lua
+++ b/test/replication/quorum.test.lua
@@ -47,9 +47,9 @@ box.info.ro -- false
 box.info.status -- running
 
 -- Check that the replica follows all masters.
-box.info.id == 1 or box.info.replication[1].upstream.status == 'follow'
-box.info.id == 2 or box.info.replication[2].upstream.status == 'follow'
-box.info.id == 3 or box.info.replication[3].upstream.status == 'follow'
+box.info.id == 1 or test_run:wait_cond(function() return box.info.replication[1].upstream.status == 'follow' end, 20)
+box.info.id == 2 or test_run:wait_cond(function() return box.info.replication[2].upstream.status == 'follow' end, 20)
+box.info.id == 3 or test_run:wait_cond(function() return box.info.replication[3].upstream.status == 'follow' end, 20)
 
 -- Check that box.cfg() doesn't return until the instance
 -- catches up with all configured replicas.
@@ -59,13 +59,14 @@ test_run:cmd('switch quorum2')
 box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.001)
 test_run:cmd('stop server quorum1')
 
+test_run:wait_cond(function() return box.space.test.index.primary ~= nil end, 20)
 for i = 1, 100 do box.space.test:insert{i} end
 fiber = require('fiber')
 fiber.sleep(0.1)
 
 test_run:cmd('start server quorum1 with args="0.1  0.5"')
 test_run:cmd('switch quorum1')
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 
 -- Rebootstrap one node of the cluster and check that others follow.
 -- Note, due to ERRINJ_RELAY_TIMEOUT there is a substantial delay
@@ -81,17 +82,18 @@ box.snapshot()
 test_run:cmd('switch quorum1')
 test_run:cmd('restart server quorum1 with cleanup=1, args="0.1 0.5"')
 
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 
 -- The rebootstrapped replica will be assigned id = 4,
 -- because ids 1..3 are busy.
 test_run:cmd('switch quorum2')
 fiber = require('fiber')
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
 box.info.replication[4].upstream.status
 test_run:cmd('switch quorum3')
 fiber = require('fiber')
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication ~= nil end, 20)
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
 box.info.replication[4].upstream.status
 
 -- Cleanup.
-- 
2.17.1
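
The first hunk of the patch above repeats `box.info.id == N or ...` for N = 1..3, so that each instance skips the check for its own id (presumably because an instance has no upstream entry for itself). The same logic can be sketched as a loop; the function name `follows_all_masters`, its parameters, and the mock status table are hypothetical and exist only for this illustration.

```lua
-- Hypothetical helper: returns true when every id other than self_id
-- satisfies follows(id); otherwise returns false and the first failing id.
local function follows_all_masters(self_id, ids, follows)
    for _, id in ipairs(ids) do
        -- Skip our own id, mirroring `box.info.id == N or ...` in the test.
        if id ~= self_id and not follows(id) then
            return false, id
        end
    end
    return true
end

-- Mock upstream statuses as seen from instance 2 (values are made up).
local statuses = { [1] = 'follow', [3] = 'follow' }
local function follows(id)
    return statuses[id] == 'follow'
end

local all_ok = follows_all_masters(2, {1, 2, 3}, follows)

-- Simulate instance 3 dropping out of 'follow'.
statuses[3] = 'stopped'
local ok_after, lagging_id = follows_all_masters(2, {1, 2, 3}, follows)
print(all_ok, ok_after, lagging_id)
```

In the test itself, a predicate like this would be the function passed to test_run:wait_cond, so that all three checks share one timeout instead of three separate ones; whether that is preferable to three separate result lines is a judgment call.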
