* [Tarantool-patches] [PATCH 1/2] test: fix flaky qsync_snapshots.test.lua again
2020-07-22 23:57 [Tarantool-patches] [PATCH 0/2] Qsync flaky tests, next iteration Vladislav Shpilevoy
@ 2020-07-22 23:57 ` Vladislav Shpilevoy
2020-07-22 23:57 ` [Tarantool-patches] [PATCH 2/2] test: fix flaky qsync_with_anon.test.lua again Vladislav Shpilevoy
2020-07-28 8:09 ` [Tarantool-patches] [PATCH 0/2] Qsync flaky tests, next iteration Alexander V. Tikhonov
2 siblings, 0 replies; 7+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-22 23:57 UTC (permalink / raw)
To: tarantool-patches, avtikhon
One of the test cases started a sync transaction on master,
switched to replica, and tried to do some actions assuming that
the latest master data has arrived here.
But in fact the replica could be far behind the master. It could
still contain data from the previous test case. That made the
test flaky: it looked as if the replica had some data committed
on it but not on master, while in fact this was just data from
the previous test case.
The issue is solved by flushing master's state to the replica
with a successful sync transaction before the test case body.
Closes #5167
---
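Note for readers: the reasoning in the commit message can be
sketched with a toy model. This is not Tarantool code. The names
Master, Replica, write_async, write_sync and run_flush_scenario
are hypothetical, and the model assumes only that the replica
applies the master's journal strictly in order and that a
successful sync write returns after the replica has acknowledged
it.

```python
class Replica:
    """Applies the master's journal strictly in order, possibly with a lag."""

    def __init__(self):
        self.applied = 0   # how many journal rows have been applied
        self.data = set()

    def catch_up(self, journal):
        for op, key in journal[self.applied:]:
            if op == "insert":
                self.data.add(key)
            else:  # "delete"
                self.data.discard(key)
        self.applied = len(journal)


class Master:
    def __init__(self, replica):
        self.replica = replica
        self.journal = []
        self.data = set()

    def write_async(self, op, key):
        """Commit locally; the replica receives the row at some later time."""
        if op == "insert":
            self.data.add(key)
        else:
            self.data.discard(key)
        self.journal.append((op, key))

    def write_sync(self, op, key):
        """Commit only after the replica has acknowledged the row.

        Rows are applied in order, so the ack implies that everything
        written earlier has been applied on the replica as well.
        """
        self.write_async(op, key)
        self.replica.catch_up(self.journal)


def run_flush_scenario():
    replica = Replica()
    master = Master(replica)

    # Previous test case: a row that reached the replica ...
    master.write_sync("insert", 1)
    # ... and its cleanup, which the replica has not received yet.
    master.write_async("delete", 1)
    stale_view = set(replica.data)

    # The flush from the patch: a successful sync insert + delete.
    master.write_sync("insert", 1)
    master.write_sync("delete", 1)
    return stale_view, set(replica.data), set(master.data)
```

In this model the first returned set is what the replica shows
before the flush (leftover data from the previous case), and the
other two are the replica and master contents after the flush,
which agree.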
test/replication/qsync_snapshots.result | 8 +++++++-
test/replication/qsync_snapshots.test.lua | 4 +++-
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/test/replication/qsync_snapshots.result b/test/replication/qsync_snapshots.result
index 782ffd482..cafdd63c8 100644
--- a/test/replication/qsync_snapshots.result
+++ b/test/replication/qsync_snapshots.result
@@ -176,8 +176,14 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
_ = box.space.sync:create_index('pk')
| ---
| ...
+-- Write something to flush the current master's state to replica.
+_ = box.space.sync:insert{1}
+ | ---
+ | ...
+_ = box.space.sync:delete{1}
+ | ---
+ | ...
--- Testcase body.
test_run:switch('default')
| ---
| - true
diff --git a/test/replication/qsync_snapshots.test.lua b/test/replication/qsync_snapshots.test.lua
index 979f04d5f..590610974 100644
--- a/test/replication/qsync_snapshots.test.lua
+++ b/test/replication/qsync_snapshots.test.lua
@@ -85,8 +85,10 @@ test_run:switch('default')
box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
_ = box.space.sync:create_index('pk')
+-- Write something to flush the current master's state to replica.
+_ = box.space.sync:insert{1}
+_ = box.space.sync:delete{1}
--- Testcase body.
test_run:switch('default')
box.cfg{replication_synchro_quorum=BROKEN_QUORUM}
ok, err = nil
--
2.21.1 (Apple Git-122.3)
* [Tarantool-patches] [PATCH 2/2] test: fix flaky qsync_with_anon.test.lua again
2020-07-22 23:57 [Tarantool-patches] [PATCH 0/2] Qsync flaky tests, next iteration Vladislav Shpilevoy
2020-07-22 23:57 ` [Tarantool-patches] [PATCH 1/2] test: fix flaky qsync_snapshots.test.lua again Vladislav Shpilevoy
@ 2020-07-22 23:57 ` Vladislav Shpilevoy
2020-07-28 8:09 ` [Tarantool-patches] [PATCH 0/2] Qsync flaky tests, next iteration Alexander V. Tikhonov
2 siblings, 0 replies; 7+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-22 23:57 UTC (permalink / raw)
To: tarantool-patches, avtikhon
One of the test cases had 2 problems.
- The same as in the previous commit - it started a sync
transaction on master, switched to replica assuming it sees
everything up to this sync transaction, but it could still see
data from the previous test case;
- The test case tried to write a sync transaction on master, got
a timeout, switched to replica to ensure the data was removed
there too. But since dirty reads are possible, the data could
have been delivered to the replica while ROLLBACK had not
arrived yet, so the rolled back data could still be visible on
the replica.
The first issue is solved by flushing master's state to replica
via making a successful sync transaction.
The second issue is fixed by splitting the test case into more
steps that do not depend on timeouts (1000 is considered infinity).
Closes #5196
---
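Note for readers: the second fix replaces "switch and read
immediately" with "switch and wait for a condition". A minimal
sketch of that idea follows. It is a toy model and not Tarantool
code: wait_cond here is a hypothetical Python stand-in that only
mirrors the polling idea of test_run:wait_cond, and ReplicaModel
and run_rollback_scenario are made up for illustration.

```python
import threading
import time


def wait_cond(predicate, timeout=5.0, interval=0.01):
    """Poll predicate until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()


class ReplicaModel:
    """Stores rows as they arrive; a ROLLBACK entry removes them."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = []

    def apply(self, entry):
        with self._lock:
            if entry == "ROLLBACK":
                self._rows.clear()
            else:
                self._rows.append(entry)

    def count(self):
        with self._lock:
            return len(self._rows)


def run_rollback_scenario(rollback_delay=0.05):
    replica = ReplicaModel()

    # The row of the pending sync transaction arrives first, so a
    # dirty read on the replica can see it.
    replica.apply((1,))
    seen_before = replica.count()

    # ROLLBACK arrives later, at a moment the test does not control.
    timer = threading.Timer(rollback_delay, replica.apply, args=("ROLLBACK",))
    timer.start()

    # Instead of reading right away, wait until the row is gone.
    became_empty = wait_cond(lambda: replica.count() == 0)
    timer.join()
    return seen_before, became_empty, replica.count()
```

A read taken right after the row arrives sees one row, which is
the situation the old test tripped over. Waiting on the condition
makes the check independent of when ROLLBACK actually arrives.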
test/replication/qsync_with_anon.result | 58 +++++++++++++++++++++--
test/replication/qsync_with_anon.test.lua | 27 +++++++++--
2 files changed, 76 insertions(+), 9 deletions(-)
diff --git a/test/replication/qsync_with_anon.result b/test/replication/qsync_with_anon.result
index 51f02bcdb..6a2952a32 100644
--- a/test/replication/qsync_with_anon.result
+++ b/test/replication/qsync_with_anon.result
@@ -96,7 +96,7 @@ box.space.sync:drop()
-- [RFC, Asynchronous replication] failed transaction rolled back on async
-- replica.
-- Testcase setup.
-box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum = NUM_INSTANCES, replication_synchro_timeout = 1000}
| ---
| ...
_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
@@ -105,23 +105,71 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
_ = box.space.sync:create_index('pk')
| ---
| ...
--- Testcase body.
+-- Write something to flush the current master's state to replica.
+_ = box.space.sync:insert{1}
+ | ---
+ | ...
+_ = box.space.sync:delete{1}
+ | ---
+ | ...
+
+box.cfg{replication_synchro_quorum = BROKEN_QUORUM, replication_synchro_timeout = 1000}
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+ok, err = nil
+ | ---
+ | ...
+f = fiber.create(function() \
+ ok, err = pcall(box.space.sync.insert, box.space.sync, {1}) \
+end)
+ | ---
+ | ...
+
+test_run:cmd('switch replica_anon')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.space.sync:count() == 1 end)
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ | ...
+
test_run:switch('default')
| ---
| - true
| ...
-box.space.sync:insert{1} -- failure
+box.cfg{replication_synchro_timeout = 0.001}
+ | ---
+ | ...
+test_run:wait_cond(function() return f:status() == 'dead' end)
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
| ---
- | - error: Quorum collection for a synchronous transaction is timed out
+ | - []
| ...
+
test_run:cmd('switch replica_anon')
| ---
| - true
| ...
-box.space.sync:select{} -- none
+test_run:wait_cond(function() return box.space.sync:count() == 0 end)
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
| ---
| - []
| ...
+
test_run:switch('default')
| ---
| - true
diff --git a/test/replication/qsync_with_anon.test.lua b/test/replication/qsync_with_anon.test.lua
index 5bc7c8be4..d7ecaa107 100644
--- a/test/replication/qsync_with_anon.test.lua
+++ b/test/replication/qsync_with_anon.test.lua
@@ -36,14 +36,33 @@ box.space.sync:drop()
-- [RFC, Asynchronous replication] failed transaction rolled back on async
-- replica.
-- Testcase setup.
-box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum = NUM_INSTANCES, replication_synchro_timeout = 1000}
_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
_ = box.space.sync:create_index('pk')
--- Testcase body.
+-- Write something to flush the current master's state to replica.
+_ = box.space.sync:insert{1}
+_ = box.space.sync:delete{1}
+
+box.cfg{replication_synchro_quorum = BROKEN_QUORUM, replication_synchro_timeout = 1000}
+fiber = require('fiber')
+ok, err = nil
+f = fiber.create(function() \
+ ok, err = pcall(box.space.sync.insert, box.space.sync, {1}) \
+end)
+
+test_run:cmd('switch replica_anon')
+test_run:wait_cond(function() return box.space.sync:count() == 1 end)
+box.space.sync:select{}
+
test_run:switch('default')
-box.space.sync:insert{1} -- failure
+box.cfg{replication_synchro_timeout = 0.001}
+test_run:wait_cond(function() return f:status() == 'dead' end)
+box.space.sync:select{}
+
test_run:cmd('switch replica_anon')
-box.space.sync:select{} -- none
+test_run:wait_cond(function() return box.space.sync:count() == 0 end)
+box.space.sync:select{}
+
test_run:switch('default')
box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
box.space.sync:insert{1} -- success
--
2.21.1 (Apple Git-122.3)