Tarantool development patches archive
* [Tarantool-patches] [PATCH 0/4] Flaky qsync tests
From: Vladislav Shpilevoy @ 2020-07-16 20:38 UTC (permalink / raw)
  To: tarantool-patches, avtikhon

Branch: http://github.com/tarantool/tarantool/tree/gerold103/qsync_flaky
Issue: https://github.com/tarantool/tarantool/issues/5162
Issue: https://github.com/tarantool/tarantool/issues/5165
Issue: https://github.com/tarantool/tarantool/issues/5167
Issue: https://github.com/tarantool/tarantool/issues/5168

Vladislav Shpilevoy (4):
  test: fix flaky qsync_with_anon.test.lua
  test: fix flaky qsync_advanced.test.lua
  test: fix flaky qsync_snapshots.test.lua
  test: fix flaky qsync_basic.test.lua

 test/replication/qsync_advanced.result    | 42 ++++++++++++++++-------
 test/replication/qsync_advanced.test.lua  | 32 ++++++++++-------
 test/replication/qsync_basic.result       |  6 ++--
 test/replication/qsync_basic.test.lua     |  6 ++--
 test/replication/qsync_snapshots.result   |  4 +++
 test/replication/qsync_snapshots.test.lua |  1 +
 test/replication/qsync_with_anon.result   |  4 +--
 test/replication/qsync_with_anon.test.lua |  4 +--
 8 files changed, 65 insertions(+), 34 deletions(-)

-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH 1/4] test: fix flaky qsync_with_anon.test.lua
From: Vladislav Shpilevoy @ 2020-07-16 20:38 UTC (permalink / raw)
  To: tarantool-patches, avtikhon

The test used a very small timeout, 0.1 seconds, to check that a
sync transaction works fine. On a heavily loaded machine that may
not be enough. Now the cases where the timeout is not supposed to
expire use 1000 seconds, which is more than enough.

Closes #5165
---
 test/replication/qsync_with_anon.result   | 4 ++--
 test/replication/qsync_with_anon.test.lua | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)
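
Note (not part of the commit): a minimal plain-Lua sketch of why
0.1 seconds flakes under load while 1000 seconds does not. The
function sync_commit and the ack delays are hypothetical stand-ins
chosen for illustration, not Tarantool APIs or measured values.

```lua
-- Hypothetical model: a sync transaction commits only if the replica's
-- ack arrives within the timeout (both values are in seconds).
local function sync_commit(ack_delay, timeout)
    if ack_delay <= timeout then
        return true
    end
    return false, 'quorum collection timed out'
end

-- Assumed ack delays: an idle machine and a heavily loaded CI machine.
local idle_delay, loaded_delay = 0.01, 0.25

local old_idle = sync_commit(idle_delay, 0.1)       -- commits
local old_loaded = sync_commit(loaded_delay, 0.1)   -- spurious failure
local new_loaded = sync_commit(loaded_delay, 1000)  -- commits
```

The transaction is expected to complete as soon as the quorum is
collected, so the large value should not make the passing cases
slower.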

diff --git a/test/replication/qsync_with_anon.result b/test/replication/qsync_with_anon.result
index 59506a608..51f02bcdb 100644
--- a/test/replication/qsync_with_anon.result
+++ b/test/replication/qsync_with_anon.result
@@ -48,7 +48,7 @@ test_run:switch('default')
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
  | ---
  | ...
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
@@ -126,7 +126,7 @@ test_run:switch('default')
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum=NUM_INSTANCES}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
  | ---
  | ...
 box.space.sync:insert{1} -- success
diff --git a/test/replication/qsync_with_anon.test.lua b/test/replication/qsync_with_anon.test.lua
index aed62775e..5bc7c8be4 100644
--- a/test/replication/qsync_with_anon.test.lua
+++ b/test/replication/qsync_with_anon.test.lua
@@ -19,7 +19,7 @@ test_run:cmd('switch replica_anon')
 -- replica.
 -- Testcase setup.
 test_run:switch('default')
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
 -- Testcase body.
@@ -45,7 +45,7 @@ box.space.sync:insert{1} -- failure
 test_run:cmd('switch replica_anon')
 box.space.sync:select{} -- none
 test_run:switch('default')
-box.cfg{replication_synchro_quorum=NUM_INSTANCES}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 box.space.sync:insert{1} -- success
 test_run:cmd('switch replica_anon')
 box.space.sync:select{} -- 1
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH 2/4] test: fix flaky qsync_advanced.test.lua
From: Vladislav Shpilevoy @ 2020-07-16 20:38 UTC (permalink / raw)
  To: tarantool-patches, avtikhon

There were multiple problems:

- Some timeouts were too small. A timeout of 0.1 seconds is so
  small that the test is guaranteed to flake sooner or later.

- One timeout was too big: the test waited 5 seconds, whereas less
  than a second would easily do. The insert was expected to fail
  anyway, so there is no need to wait that long.

- os.time() was used to check that the whole timeout had passed,
  which is incorrect: its precision is one second. The elapsed
  time was also checked with duration == timeout, which is wrong
  too. When something blocks on a timeout and the system is not
  real-time, the actually elapsed time is always >= timeout, not
  == timeout.

- The failover test had no fullmesh. As a result, when the replica
  was promoted and wrote something into the sync space, it was not
  replicated to the master. But the test passed because
  1) the incorrect behaviour was recorded in the .result file;
  2) the quorum on the replica was the default, i.e. 1, so the
     replica didn't wait for the master and successfully wrote
     data into the sync space.

The initial problem of the test was that in the last case one of
the test jobs somehow got the old master seeing the replica's
data. But that is impossible, since there was no replication from
the replica to the master. Anyway, the test case is reworked now,
and even if it fails, that will be a new failure.

Closes #5168
---
 test/replication/qsync_advanced.result   | 42 +++++++++++++++++-------
 test/replication/qsync_advanced.test.lua | 32 +++++++++++-------
 2 files changed, 50 insertions(+), 24 deletions(-)
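
Note (not part of the commit): a minimal plain-Lua sketch of the
third point above. It uses a simulated clock so that it runs
without Tarantool; precise_clock stands in for fiber.clock(),
coarse_clock for os.time(), and blocked_call for an insert that
blocks until the timeout fires. All three names and the 0.3 ms
scheduling delay are assumptions made for illustration.

```lua
-- Simulated wall clock, in seconds. It advances only in blocked_call().
local now = 100.2

local function precise_clock() return now end             -- sub-second precision
local function coarse_clock() return math.floor(now) end  -- whole seconds only

-- Blocks for the timeout plus a small scheduling delay, as happens on
-- any system that is not real-time.
local function blocked_call(timeout)
    now = now + timeout + 0.0003
end

-- Returns how long blocked_call() took according to the given clock.
local function measure(clock, timeout)
    local start = clock()
    blocked_call(timeout)
    return clock() - start
end

local timeout = 0.001
local coarse = measure(coarse_clock, timeout)
local precise = measure(precise_clock, timeout)

local old_check = (coarse == timeout)   -- the old test's comparison
local new_check = (precise >= timeout)  -- the reworked comparison
```

With a whole-second clock the measured duration can only be an
integer, so the equality check cannot hold for a sub-second
timeout. The >= check holds regardless of how large the scheduling
delay is.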

diff --git a/test/replication/qsync_advanced.result b/test/replication/qsync_advanced.result
index 3a288e0ca..94b19b1f2 100644
--- a/test/replication/qsync_advanced.result
+++ b/test/replication/qsync_advanced.result
@@ -63,7 +63,7 @@ test_run:switch('default')
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
  | ---
  | ...
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
@@ -100,7 +100,7 @@ test_run:switch('default')
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.001}
  | ---
  | ...
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
@@ -138,7 +138,7 @@ test_run:switch('default')
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
  | ---
  | ...
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
@@ -191,7 +191,7 @@ test_run:switch('default')
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=orig_synchro_timeout}
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.001}
  | ---
  | ...
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
@@ -201,14 +201,17 @@ _ = box.space.sync:create_index('pk')
  | ---
  | ...
 -- Testcase body.
-start = os.time()
+start = fiber.clock()
  | ---
  | ...
 box.space.sync:insert{1}
  | ---
  | - error: Quorum collection for a synchronous transaction is timed out
  | ...
-(os.time() - start) == box.cfg.replication_synchro_timeout -- true
+duration = fiber.clock() - start
+ | ---
+ | ...
+duration >= box.cfg.replication_synchro_timeout or duration -- true
  | ---
  | - true
  | ...
@@ -298,7 +301,7 @@ test_run:switch('default')
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
  | ---
  | ...
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
@@ -326,7 +329,7 @@ test_run:switch('default')
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
  | ---
  | ...
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
@@ -428,7 +431,15 @@ test_run:switch('default')
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+test_run:cmd("set variable replica_url to 'replica.listen'")
+ | ---
+ | - true
+ | ...
+box.cfg{                                                                        \
+    replication_synchro_quorum = NUM_INSTANCES,                                 \
+    replication_synchro_timeout = 1000,                                         \
+    replication = replica_url,                                                  \
+}
  | ---
  | ...
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
@@ -468,6 +479,9 @@ test_run:switch('replica')
  | ---
  | - true
  | ...
+box.cfg{replication_synchro_quorum = 2, replication_synchro_timeout = 1000}
+ | ---
+ | ...
 box.space.sync:insert{2}
  | ---
  | - [2]
@@ -484,6 +498,7 @@ test_run:switch('default')
 box.space.sync:select{} -- 1, 2
  | ---
  | - - [1]
+ |   - [2]
  | ...
 -- Revert cluster configuration.
 test_run:switch('default')
@@ -508,6 +523,9 @@ test_run:switch('default')
 box.space.sync:drop()
  | ---
  | ...
+box.cfg{replication = {}}
+ | ---
+ | ...
 
 -- Check behaviour with failed write to WAL on master (ERRINJ_WAL_IO).
 -- Testcase setup.
@@ -515,7 +533,7 @@ test_run:switch('default')
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
  | ---
  | ...
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
@@ -699,14 +717,14 @@ disable_sync_mode()
  | ---
  | ...
 -- Space is in sync mode now.
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
  | ---
  | ...
 box.space.sync:insert{2} -- success
  | ---
  | - [2]
  | ...
-box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=1000}
  | ---
  | ...
 box.space.sync:insert{3} -- success
diff --git a/test/replication/qsync_advanced.test.lua b/test/replication/qsync_advanced.test.lua
index 4b62c6fb4..058ece602 100644
--- a/test/replication/qsync_advanced.test.lua
+++ b/test/replication/qsync_advanced.test.lua
@@ -27,7 +27,7 @@ test_run:cmd('start server replica with wait=True, wait_load=True')
 -- Successful write.
 -- Testcase setup.
 test_run:switch('default')
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
 -- Testcase body.
@@ -41,7 +41,7 @@ box.space.sync:drop()
 -- Unsuccessfull write.
 -- Testcase setup.
 test_run:switch('default')
-box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.001}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
 -- Testcase body.
@@ -56,7 +56,7 @@ box.space.sync:drop()
 -- same order as on client in case of achieved quorum.
 -- Testcase setup.
 test_run:switch('default')
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
 -- Testcase body.
@@ -73,13 +73,14 @@ box.space.sync:drop()
 -- Synchro timeout is not bigger than replication_synchro_timeout value.
 -- Testcase setup.
 test_run:switch('default')
-box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=orig_synchro_timeout}
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.001}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
 -- Testcase body.
-start = os.time()
+start = fiber.clock()
 box.space.sync:insert{1}
-(os.time() - start) == box.cfg.replication_synchro_timeout -- true
+duration = fiber.clock() - start
+duration >= box.cfg.replication_synchro_timeout or duration -- true
 -- Testcase cleanup.
 test_run:switch('default')
 box.space.sync:drop()
@@ -108,7 +109,7 @@ box.cfg.replication_synchro_timeout -- old value
 -- TX is in synchronous replication.
 -- Testcase setup.
 test_run:switch('default')
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
 -- Testcase body.
@@ -121,7 +122,7 @@ box.space.sync:drop()
 -- data consistency on a leader and replicas.
 -- Testcase setup.
 test_run:switch('default')
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
 -- Testcase body.
@@ -155,7 +156,12 @@ box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- warning
 -- success and data consistency on a leader and replicas (gh-5124).
 -- Testcase setup.
 test_run:switch('default')
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+test_run:cmd("set variable replica_url to 'replica.listen'")
+box.cfg{                                                                        \
+    replication_synchro_quorum = NUM_INSTANCES,                                 \
+    replication_synchro_timeout = 1000,                                         \
+    replication = replica_url,                                                  \
+}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
 -- Testcase body.
@@ -167,6 +173,7 @@ box.cfg{read_only=false} -- promote replica to master
 test_run:switch('default')
 box.cfg{read_only=true} -- demote master to replica
 test_run:switch('replica')
+box.cfg{replication_synchro_quorum = 2, replication_synchro_timeout = 1000}
 box.space.sync:insert{2}
 box.space.sync:select{} -- 1, 2
 test_run:switch('default')
@@ -179,11 +186,12 @@ box.cfg{read_only=true}
 -- Testcase cleanup.
 test_run:switch('default')
 box.space.sync:drop()
+box.cfg{replication = {}}
 
 -- Check behaviour with failed write to WAL on master (ERRINJ_WAL_IO).
 -- Testcase setup.
 test_run:switch('default')
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
 -- Testcase body.
@@ -250,9 +258,9 @@ test_run:switch('default')
 -- Enable synchronous mode.
 disable_sync_mode()
 -- Space is in sync mode now.
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 box.space.sync:insert{2} -- success
-box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=1000}
 box.space.sync:insert{3} -- success
 box.space.sync:select{} -- 1, 2, 3
 test_run:cmd('switch replica')
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH 3/4] test: fix flaky qsync_snapshots.test.lua
From: Vladislav Shpilevoy @ 2020-07-16 20:38 UTC (permalink / raw)
  To: tarantool-patches, avtikhon

There was a test case about the master writing a transaction, the
replica starting a snapshot, the master writing a rollback, and
the replica canceling the snapshot because its data was spoiled
with rolled back data.

Sometimes the master started writing the transaction to WAL in a
newly created fiber, then the test switched to the replica and
successfully created the snapshot, and only then the data from the
master was received. As a result, the following rollback didn't
affect the already finished snapshot.

The patch forces the replica to wait for receipt of the dirty data
from the master before starting the snapshot.

Closes #5167
---
 test/replication/qsync_snapshots.result   | 4 ++++
 test/replication/qsync_snapshots.test.lua | 1 +
 2 files changed, 5 insertions(+)
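
Note (not part of the commit): a minimal plain-Lua sketch of the
race and of the wait that closes it. The replica state, tick() and
this wait_cond() are hypothetical stand-ins written for
illustration; the real test relies on test_run:wait_cond().

```lua
-- Simulated replica state: the dirty row arrives after a few ticks.
local replica_rows = 0
local ticks_until_delivery = 3

-- Stands in for one scheduler tick during which replication may
-- deliver the row from the master.
local function tick()
    ticks_until_delivery = ticks_until_delivery - 1
    if ticks_until_delivery == 0 then
        replica_rows = 1
    end
end

-- Hypothetical polling helper: re-check the condition until it holds
-- or the poll budget is exhausted.
local function wait_cond(cond, max_polls)
    for _ = 1, max_polls do
        if cond() then
            return true
        end
        tick()
    end
    return cond()
end

-- A snapshot started right away would not contain the dirty row.
local rows_before_wait = replica_rows

-- Wait for the row first, then it is safe to start the snapshot.
local ok = wait_cond(function() return replica_rows == 1 end, 10)
local rows_at_snapshot = replica_rows
```

Waiting on the observable state, rather than sleeping for a fixed
time, keeps the test fast on an idle machine and correct on a
loaded one.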

diff --git a/test/replication/qsync_snapshots.result b/test/replication/qsync_snapshots.result
index 2a126087a..782ffd482 100644
--- a/test/replication/qsync_snapshots.result
+++ b/test/replication/qsync_snapshots.result
@@ -204,6 +204,10 @@ fiber = require('fiber')
 box.cfg{replication_synchro_timeout=1000}
  | ---
  | ...
+test_run:wait_cond(function() return box.space.sync:count() == 1 end)
+ | ---
+ | - true
+ | ...
 ok, err = nil
  | ---
  | ...
diff --git a/test/replication/qsync_snapshots.test.lua b/test/replication/qsync_snapshots.test.lua
index 0db61da95..979f04d5f 100644
--- a/test/replication/qsync_snapshots.test.lua
+++ b/test/replication/qsync_snapshots.test.lua
@@ -97,6 +97,7 @@ end)
 test_run:switch('replica')
 fiber = require('fiber')
 box.cfg{replication_synchro_timeout=1000}
+test_run:wait_cond(function() return box.space.sync:count() == 1 end)
 ok, err = nil
 f = fiber.create(function() ok, err = pcall(box.snapshot) end)
 
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH 4/4] test: fix flaky qsync_basic.test.lua
From: Vladislav Shpilevoy @ 2020-07-16 20:38 UTC (permalink / raw)
  To: tarantool-patches, avtikhon

Too small timeouts were used for testing that synchronous
transactions succeed.

Follow-up #5162
---
 test/replication/qsync_basic.result   | 6 +++---
 test/replication/qsync_basic.test.lua | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
index b355a5c06..bd3c3cce1 100644
--- a/test/replication/qsync_basic.result
+++ b/test/replication/qsync_basic.result
@@ -96,7 +96,7 @@ old_synchro_quorum = box.cfg.replication_synchro_quorum
 old_synchro_timeout = box.cfg.replication_synchro_timeout
  | ---
  | ...
-box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum = 2, replication_synchro_timeout = 1000}
  | ---
  | ...
 
@@ -130,7 +130,7 @@ box.info.lsn - lsn
  | - 2
  | ...
 -- Raise quorum so that master has to issue a ROLLBACK.
-box.cfg{replication_synchro_quorum=3}
+box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 0.001}
  | ---
  | ...
 t = fiber.time()
@@ -145,7 +145,7 @@ fiber.time() - t > box.cfg.replication_synchro_timeout
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum=2}
+box.cfg{replication_synchro_quorum = 2, replication_synchro_timeout = 1000}
  | ---
  | ...
 box.space.sync:insert{3}
diff --git a/test/replication/qsync_basic.test.lua b/test/replication/qsync_basic.test.lua
index 205fb0664..94235547d 100644
--- a/test/replication/qsync_basic.test.lua
+++ b/test/replication/qsync_basic.test.lua
@@ -40,7 +40,7 @@ box.schema.user.grant('guest', 'replication')
 -- Set up synchronous replication options.
 old_synchro_quorum = box.cfg.replication_synchro_quorum
 old_synchro_timeout = box.cfg.replication_synchro_timeout
-box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum = 2, replication_synchro_timeout = 1000}
 
 test_run:cmd('create server replica with rpl_master=default,\
                                          script="replication/replica.lua"')
@@ -54,12 +54,12 @@ box.space.sync:insert{1}
 -- 1 for insertion, 1 for CONFIRM message.
 box.info.lsn - lsn
 -- Raise quorum so that master has to issue a ROLLBACK.
-box.cfg{replication_synchro_quorum=3}
+box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 0.001}
 t = fiber.time()
 box.space.sync:insert{2}
 -- Check that master waited for acks.
 fiber.time() - t > box.cfg.replication_synchro_timeout
-box.cfg{replication_synchro_quorum=2}
+box.cfg{replication_synchro_quorum = 2, replication_synchro_timeout = 1000}
 box.space.sync:insert{3}
 box.space.sync:select{}
 
-- 
2.21.1 (Apple Git-122.3)


* Re: [Tarantool-patches] [PATCH 0/4] Flaky qsync tests
From: Alexander V. Tikhonov @ 2020-07-17 13:19 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi Vlad, thanks a lot for the fast fixes. I've tried them, and it
looks like they really helped to avoid the flaky fails. The patches LGTM.



* Re: [Tarantool-patches] [PATCH 0/4] Flaky qsync tests
From: Vladislav Shpilevoy @ 2020-07-20 19:23 UTC (permalink / raw)
  To: Alexander V. Tikhonov; +Cc: tarantool-patches

Pushed to master.
