Tarantool development patches archive
* [Tarantool-patches] [PATCH 0/2] Qsync flaky tests, next iteration
@ 2020-07-22 23:57 Vladislav Shpilevoy
  2020-07-22 23:57 ` [Tarantool-patches] [PATCH 1/2] test: fix flaky qsync_snapshots.test.lua again Vladislav Shpilevoy
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-22 23:57 UTC (permalink / raw)
  To: tarantool-patches, avtikhon

The patchset attempts to fix more flaky test cases discovered since the last
fixes.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/qsync-flaky-tests
Issue: https://github.com/tarantool/tarantool/issues/5196
Issue: https://github.com/tarantool/tarantool/issues/5167

Vladislav Shpilevoy (2):
  test: fix flaky qsync_snapshots.test.lua again
  test: fix flaky qsync_with_anon.test.lua again

 test/replication/qsync_snapshots.result   |  8 +++-
 test/replication/qsync_snapshots.test.lua |  4 +-
 test/replication/qsync_with_anon.result   | 58 +++++++++++++++++++++--
 test/replication/qsync_with_anon.test.lua | 27 +++++++++--
 4 files changed, 86 insertions(+), 11 deletions(-)

-- 
2.21.1 (Apple Git-122.3)

* [Tarantool-patches] [PATCH 1/2] test: fix flaky qsync_snapshots.test.lua again
  2020-07-22 23:57 [Tarantool-patches] [PATCH 0/2] Qsync flaky tests, next iteration Vladislav Shpilevoy
@ 2020-07-22 23:57 ` Vladislav Shpilevoy
  2020-07-22 23:57 ` [Tarantool-patches] [PATCH 2/2] test: fix flaky qsync_with_anon.test.lua again Vladislav Shpilevoy
  2020-07-28  8:09 ` [Tarantool-patches] [PATCH 0/2] Qsync flaky tests, next iteration Alexander V. Tikhonov
  2 siblings, 0 replies; 7+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-22 23:57 UTC (permalink / raw)
  To: tarantool-patches, avtikhon

One of the test cases started a sync transaction on master,
switched to replica, and tried to do some actions assuming that
the latest master data had already arrived there.

But in fact the replica could be far behind the master. It could
still contain data from the previous test case. That made it
look as if the replica had some data committed on it that was
not committed on master, while it was just data from the
previous test case.

The issue is solved by flushing the master's state to the
replica with a successful sync transaction.
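
Why the flush works can be sketched with a toy model in plain Lua.
This is not Tarantool code: master_write, master_sync_write and
replica_step are hypothetical names. The model assumes only that the
replica applies the master's journal strictly in order, and that a
sync write with a quorum equal to the number of instances returns
only after the replica has applied it.

```lua
-- Toy model, NOT Tarantool code: all names here are hypothetical.
local master = {log = {}}
local replica = {applied = 0, data = {}}

-- Append an entry to the master's journal and return its position.
local function master_write(op, key)
    table.insert(master.log, {op = op, key = key})
    return #master.log
end

-- The replica applies journal entries strictly in order, one per call.
local function replica_step()
    local entry = master.log[replica.applied + 1]
    if entry == nil then
        return false
    end
    if entry.op == 'insert' then
        replica.data[entry.key] = true
    else
        replica.data[entry.key] = nil
    end
    replica.applied = replica.applied + 1
    return true
end

-- Assumption of the model: a sync write returns only after the
-- replica has applied it (quorum == number of instances).
local function master_sync_write(op, key)
    local pos = master_write(op, key)
    while replica.applied < pos do
        replica_step()
    end
    return pos
end

-- Leftovers of a "previous test case": written on master, but the
-- replica has not applied them yet.
master_write('insert', 'old')
master_write('delete', 'old')
assert(replica.applied == 0)

-- The flush: a successful sync insert + delete. Because the journal
-- is applied in order, the replica must also apply the leftovers.
master_sync_write('insert', 1)
local last = master_sync_write('delete', 1)

assert(replica.applied == last)
assert(next(replica.data) == nil)
```

In the test itself the same role is played by the insert{1} /
delete{1} pair executed while replication_synchro_quorum is set to
NUM_INSTANCES.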

Closes #5167
---
 test/replication/qsync_snapshots.result   | 8 +++++++-
 test/replication/qsync_snapshots.test.lua | 4 +++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/test/replication/qsync_snapshots.result b/test/replication/qsync_snapshots.result
index 782ffd482..cafdd63c8 100644
--- a/test/replication/qsync_snapshots.result
+++ b/test/replication/qsync_snapshots.result
@@ -176,8 +176,14 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+-- Write something to flush the current master's state to replica.
+_ = box.space.sync:insert{1}
+ | ---
+ | ...
+_ = box.space.sync:delete{1}
+ | ---
+ | ...
 
--- Testcase body.
 test_run:switch('default')
  | ---
  | - true
diff --git a/test/replication/qsync_snapshots.test.lua b/test/replication/qsync_snapshots.test.lua
index 979f04d5f..590610974 100644
--- a/test/replication/qsync_snapshots.test.lua
+++ b/test/replication/qsync_snapshots.test.lua
@@ -85,8 +85,10 @@ test_run:switch('default')
 box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
+-- Write something to flush the current master's state to replica.
+_ = box.space.sync:insert{1}
+_ = box.space.sync:delete{1}
 
--- Testcase body.
 test_run:switch('default')
 box.cfg{replication_synchro_quorum=BROKEN_QUORUM}
 ok, err = nil
-- 
2.21.1 (Apple Git-122.3)

* [Tarantool-patches] [PATCH 2/2] test: fix flaky qsync_with_anon.test.lua again
  2020-07-22 23:57 [Tarantool-patches] [PATCH 0/2] Qsync flaky tests, next iteration Vladislav Shpilevoy
  2020-07-22 23:57 ` [Tarantool-patches] [PATCH 1/2] test: fix flaky qsync_snapshots.test.lua again Vladislav Shpilevoy
@ 2020-07-22 23:57 ` Vladislav Shpilevoy
  2020-07-28  8:09 ` [Tarantool-patches] [PATCH 0/2] Qsync flaky tests, next iteration Alexander V. Tikhonov
  2 siblings, 0 replies; 7+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-22 23:57 UTC (permalink / raw)
  To: tarantool-patches, avtikhon

One of the test cases had 2 problems.

- The same as in the previous commit: it started a sync
  transaction on master and switched to replica assuming it sees
  everything up to this sync transaction, but the replica could
  still see data from the previous test case;

- The test case tried to write a sync transaction on master, got
  a timeout, and switched to replica to ensure the data was
  removed there too. But since dirty reads are possible, the data
  could have been delivered to the replica while ROLLBACK had not
  arrived yet. On the replica the rolled back data could still be
  visible.

The first issue is solved by flushing the master's state to the
replica with a successful sync transaction.

The second issue is fixed by splitting the test case into more
steps that do not depend on timeouts (1000 is considered
infinity).
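
The idea behind the second fix, waiting for an observable state
instead of relying on a timeout, can be sketched in plain Lua. This
is a toy model and not Tarantool code: the wait_cond below is a
hypothetical step-driven stand-in for the test_run:wait_cond() helper
used in the patch, and channel, deliver_one and count are names
invented for the illustration.

```lua
-- Toy model, NOT Tarantool code: all names here are hypothetical.

-- Poll cond(); while it is false, advance the simulated world by
-- one step. Gives up after max_steps steps.
local function wait_cond(cond, step, max_steps)
    for _ = 1, max_steps do
        if cond() then
            return true
        end
        step()
    end
    return cond()
end

local channel = {}           -- events in flight from master to replica
local replica = {rows = {}}

local function count(rows)
    local n = 0
    for _ in pairs(rows) do
        n = n + 1
    end
    return n
end

-- Deliver at most one in-flight event per step, in order.
local function deliver_one()
    local ev = table.remove(channel, 1)
    if ev == nil then
        return
    end
    if ev.type == 'insert' then
        -- Visible before confirmation: this is the dirty read.
        replica.rows[ev.key] = true
    elseif ev.type == 'rollback' then
        replica.rows[ev.key] = nil
    end
end

-- Step 1: master starts a sync insert that cannot gather a quorum.
-- Wait until the replica actually sees the row.
table.insert(channel, {type = 'insert', key = 1})
local seen = wait_cond(function()
    return count(replica.rows) == 1
end, deliver_one, 100)

-- Step 2: master gives up and sends ROLLBACK. Wait until the
-- replica actually drops the row.
table.insert(channel, {type = 'rollback', key = 1})
local gone = wait_cond(function()
    return count(replica.rows) == 0
end, deliver_one, 100)

assert(seen and gone)
```

Each step checks a state that must eventually be reached, so the
outcome does not depend on how quickly the insert or the ROLLBACK is
delivered.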

Closes #5196
---
 test/replication/qsync_with_anon.result   | 58 +++++++++++++++++++++--
 test/replication/qsync_with_anon.test.lua | 27 +++++++++--
 2 files changed, 76 insertions(+), 9 deletions(-)

diff --git a/test/replication/qsync_with_anon.result b/test/replication/qsync_with_anon.result
index 51f02bcdb..6a2952a32 100644
--- a/test/replication/qsync_with_anon.result
+++ b/test/replication/qsync_with_anon.result
@@ -96,7 +96,7 @@ box.space.sync:drop()
 -- [RFC, Asynchronous replication] failed transaction rolled back on async
 -- replica.
 -- Testcase setup.
-box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum = NUM_INSTANCES, replication_synchro_timeout = 1000}
  | ---
  | ...
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
@@ -105,23 +105,71 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
--- Testcase body.
+-- Write something to flush the current master's state to replica.
+_ = box.space.sync:insert{1}
+ | ---
+ | ...
+_ = box.space.sync:delete{1}
+ | ---
+ | ...
+
+box.cfg{replication_synchro_quorum = BROKEN_QUORUM, replication_synchro_timeout = 1000}
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+ok, err = nil
+ | ---
+ | ...
+f = fiber.create(function()                                                     \
+    ok, err = pcall(box.space.sync.insert, box.space.sync, {1})                 \
+end)
+ | ---
+ | ...
+
+test_run:cmd('switch replica_anon')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.space.sync:count() == 1 end)
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ | ...
+
 test_run:switch('default')
  | ---
  | - true
  | ...
-box.space.sync:insert{1} -- failure
+box.cfg{replication_synchro_timeout = 0.001}
+ | ---
+ | ...
+test_run:wait_cond(function() return f:status() == 'dead' end)
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
  | ---
- | - error: Quorum collection for a synchronous transaction is timed out
+ | - []
  | ...
+
 test_run:cmd('switch replica_anon')
  | ---
  | - true
  | ...
-box.space.sync:select{} -- none
+test_run:wait_cond(function() return box.space.sync:count() == 0 end)
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
  | ---
  | - []
  | ...
+
 test_run:switch('default')
  | ---
  | - true
diff --git a/test/replication/qsync_with_anon.test.lua b/test/replication/qsync_with_anon.test.lua
index 5bc7c8be4..d7ecaa107 100644
--- a/test/replication/qsync_with_anon.test.lua
+++ b/test/replication/qsync_with_anon.test.lua
@@ -36,14 +36,33 @@ box.space.sync:drop()
 -- [RFC, Asynchronous replication] failed transaction rolled back on async
 -- replica.
 -- Testcase setup.
-box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum = NUM_INSTANCES, replication_synchro_timeout = 1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
--- Testcase body.
+-- Write something to flush the current master's state to replica.
+_ = box.space.sync:insert{1}
+_ = box.space.sync:delete{1}
+
+box.cfg{replication_synchro_quorum = BROKEN_QUORUM, replication_synchro_timeout = 1000}
+fiber = require('fiber')
+ok, err = nil
+f = fiber.create(function()                                                     \
+    ok, err = pcall(box.space.sync.insert, box.space.sync, {1})                 \
+end)
+
+test_run:cmd('switch replica_anon')
+test_run:wait_cond(function() return box.space.sync:count() == 1 end)
+box.space.sync:select{}
+
 test_run:switch('default')
-box.space.sync:insert{1} -- failure
+box.cfg{replication_synchro_timeout = 0.001}
+test_run:wait_cond(function() return f:status() == 'dead' end)
+box.space.sync:select{}
+
 test_run:cmd('switch replica_anon')
-box.space.sync:select{} -- none
+test_run:wait_cond(function() return box.space.sync:count() == 0 end)
+box.space.sync:select{}
+
 test_run:switch('default')
 box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 box.space.sync:insert{1} -- success
-- 
2.21.1 (Apple Git-122.3)

* Re: [Tarantool-patches] [PATCH 0/2] Qsync flaky tests, next iteration
  2020-07-22 23:57 [Tarantool-patches] [PATCH 0/2] Qsync flaky tests, next iteration Vladislav Shpilevoy
  2020-07-22 23:57 ` [Tarantool-patches] [PATCH 1/2] test: fix flaky qsync_snapshots.test.lua again Vladislav Shpilevoy
  2020-07-22 23:57 ` [Tarantool-patches] [PATCH 2/2] test: fix flaky qsync_with_anon.test.lua again Vladislav Shpilevoy
@ 2020-07-28  8:09 ` Alexander V. Tikhonov
  2020-07-28 20:37   ` Vladislav Shpilevoy
  2 siblings, 1 reply; 7+ messages in thread
From: Alexander V. Tikhonov @ 2020-07-28  8:09 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi Vlad, thanks for the fixes. I've checked them and it seems that they
help to avoid the issues - the patches LGTM.

On Thu, Jul 23, 2020 at 01:57:00AM +0200, Vladislav Shpilevoy wrote:
> The patchset attempts to fix more flaky test cases discovered since the last
> fixes.
> 
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/qsync-flaky-tests
> Issue: https://github.com/tarantool/tarantool/issues/5196
> Issue: https://github.com/tarantool/tarantool/issues/5167
> 
> Vladislav Shpilevoy (2):
>   test: fix flaky qsync_snapshots.test.lua again
>   test: fix flaky qsync_with_anon.test.lua again
> 
>  test/replication/qsync_snapshots.result   |  8 +++-
>  test/replication/qsync_snapshots.test.lua |  4 +-
>  test/replication/qsync_with_anon.result   | 58 +++++++++++++++++++++--
>  test/replication/qsync_with_anon.test.lua | 27 +++++++++--
>  4 files changed, 86 insertions(+), 11 deletions(-)
> 
> -- 
> 2.21.1 (Apple Git-122.3)
> 

* Re: [Tarantool-patches] [PATCH 0/2] Qsync flaky tests, next iteration
  2020-07-28  8:09 ` [Tarantool-patches] [PATCH 0/2] Qsync flaky tests, next iteration Alexander V. Tikhonov
@ 2020-07-28 20:37   ` Vladislav Shpilevoy
  0 siblings, 0 replies; 7+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-28 20:37 UTC (permalink / raw)
  To: Alexander V. Tikhonov; +Cc: tarantool-patches

Pushed to master and 2.5.

* Re: [Tarantool-patches] [PATCH 0/2] Qsync flaky tests, next iteration
  2020-07-14 22:44 Vladislav Shpilevoy
@ 2020-07-17 10:56 ` Sergey Bronnikov
  0 siblings, 0 replies; 7+ messages in thread
From: Sergey Bronnikov @ 2020-07-17 10:56 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Thanks for the patch!

I have spent some time trying to reproduce the original issue described
in the commit message and failed. On the other hand, both tests with the
patches applied passed 1000 iterations in concurrent mode (-j 10) without
failures.

LGTM

On 00:44 Wed 15 Jul , Vladislav Shpilevoy wrote:
> The tests keep failing, each time in a new way. The patchset
> attempts to fix them again. It is worth mentioning that I
> couldn't reproduce the failures from the issues, and the fixes
> are based on my assumptions + on the passed CI (failed, but in
> other tests).
> 
> I can't even imagine how 5168 managed to happen, but the flaky
> test case is reworked in this patchset anyway, as it was
> incorrect.
> 
> I suspect these failures depend on disk speed somehow, not on
> CPU. Especially looking at how 5167 failed. On my machine
> reproducibility seems to be so low that I couldn't get it, even
> with tens of workers.
> 
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5167-5168-qsync-flaky
> Issue: https://github.com/tarantool/tarantool/issues/5167
> Issue: https://github.com/tarantool/tarantool/issues/5168

* [Tarantool-patches] [PATCH 0/2] Qsync flaky tests, next iteration
@ 2020-07-14 22:44 Vladislav Shpilevoy
  2020-07-17 10:56 ` Sergey Bronnikov
  0 siblings, 1 reply; 7+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-14 22:44 UTC (permalink / raw)
  To: tarantool-patches, avtikhon

The tests keep failing, each time in a new way. The patchset
attempts to fix them again. It is worth mentioning that I
couldn't reproduce the failures from the issues, and the fixes
are based on my assumptions + on the passed CI (failed, but in
other tests).

I can't even imagine how 5168 managed to happen, but the flaky
test case is reworked in this patchset anyway, as it was
incorrect.

I suspect these failures depend on disk speed somehow, not on
CPU. Especially looking at how 5167 failed. On my machine
reproducibility seems to be so low that I couldn't get it, even
with tens of workers.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5167-5168-qsync-flaky
Issue: https://github.com/tarantool/tarantool/issues/5167
Issue: https://github.com/tarantool/tarantool/issues/5168

Vladislav Shpilevoy (2):
  test: fix flaky qsync_advanced.test.lua
  test: fix flaky qsync_snapshots.test.lua

 test/replication/qsync_advanced.result    | 42 ++++++++++++++++-------
 test/replication/qsync_advanced.test.lua  | 32 ++++++++++-------
 test/replication/qsync_snapshots.result   |  4 +++
 test/replication/qsync_snapshots.test.lua |  1 +
 4 files changed, 55 insertions(+), 24 deletions(-)

-- 
2.21.1 (Apple Git-122.3)
