* [Tarantool-patches] [PATCH v1] test: fix issue on first replica in drop_cluster()
@ 2020-08-16 20:01 Alexander V. Tikhonov
From: Alexander V. Tikhonov @ 2020-08-16 20:01 UTC (permalink / raw)
To: Kirill Yukhin, Alexander Turenko; +Cc: tarantool-patches
The test replication/box_set_replication_stress.test.lua fails
intermittently in the drop_cluster() routine, for example:
--- replication/box_set_replication_stress.result Fri Aug 14 18:28:41 2020
+++ var/004_replication/box_set_replication_stress.result Sat Aug 15 15:19:44 2020
@@ -34,5 +34,3 @@
-- Cleanup.
test_run:drop_cluster(SERVERS)
- | ---
- | ...
The drop_cluster() routine from the test-run repository failed in the
stop() routine of the TarantoolServer class (lib/tarantool_server.py).
It failed to stop the first replica, which the test uses to switch
replication on and off 1000 times. This happened because stop() sends
SIGTERM by default, which could not kill the first replica in some
situations: both replica processes were alive and tried to read from
and write to their sockets, but the sockets of the first replica were
already unreachable while the second replica was still alive. In this
situation SIGTERM was not enough to stop the first replica, and
test-run hung in wait_stop() of the same TarantoolServer class until
it aborted the test by its general timeout of 2 minutes.
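The hang described above can be reproduced in miniature outside of
test-run. The sketch below is not the test-run code: the child process
that ignores SIGTERM is only a hypothetical stand-in for a replica whose
shutdown is stuck, and the names CHILD_CODE and sigterm_then_wait() are
invented for this illustration. It shows that waiting after SIGTERM
never completes for such a process, while SIGKILL always ends it
(POSIX only).

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a stuck replica: a child that ignores SIGTERM.
CHILD_CODE = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def sigterm_then_wait(timeout):
    """Send SIGTERM, wait up to `timeout` seconds, then force-kill.

    Returns (stopped_by_sigterm, final_return_code).
    """
    proc = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                            stdout=subprocess.PIPE)
    proc.stdout.readline()  # wait until the child ignores SIGTERM
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
        stopped = True
    except subprocess.TimeoutExpired:
        stopped = False  # a wait without a timeout would hang here
    finally:
        proc.kill()  # SIGKILL cannot be caught or ignored
        proc.wait()
        proc.stdout.close()
    return stopped, proc.returncode


if __name__ == "__main__":
    print(sigterm_then_wait(0.5))
```

In this sketch the bounded wait plays the role of the general test
timeout: without it, the caller would block for as long as the child
stays alive.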
To fix the issue the only possible way was to use SIGKILL instead of
SIGTERM, so that the process is killed without waiting for its sockets
to close. SIGKILL could be made the default in drop_cluster(), but this
change seems bad for detecting issues in other tests. So it was decided
to use SIGKILL only in this test, as an additional option to the
"stop server" test-run command.
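The design choice, SIGTERM by default with SIGKILL as a per-call
option, can be sketched as follows. This is only an illustration under
assumptions: stop_process() and spawn_sleeper() are hypothetical helpers
and do not reflect the actual signatures in lib/tarantool_server.py
(POSIX only).

```python
import signal
import subprocess
import sys


def spawn_sleeper():
    """Start a throwaway child process that sleeps for a minute."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])


def stop_process(proc, sig=signal.SIGTERM, timeout=None):
    """Send `sig` to `proc`, wait for it to exit, return its exit code.

    The default stays SIGTERM so that shutdown problems in other
    callers remain visible; a caller opts into SIGKILL explicitly.
    """
    proc.send_signal(sig)
    return proc.wait(timeout=timeout)


if __name__ == "__main__":
    # Default path: graceful termination request.
    print(stop_process(spawn_sleeper(), timeout=5))
    # Opt-in path: forced kill, as this test does for both replicas.
    print(stop_process(spawn_sleeper(), sig=signal.SIGKILL, timeout=5))
```

On POSIX, subprocess reports a child killed by a signal as a negative
return code equal to the signal number, so the two calls are
distinguishable by their results.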
Closes #5244
---
Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5244-replication-box-stress-drop-replica
Issue: https://github.com/tarantool/tarantool/issues/5244
.../replication/box_set_replication_stress.result | 15 ++++++++++++++-
.../box_set_replication_stress.test.lua | 5 ++++-
2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/test/replication/box_set_replication_stress.result b/test/replication/box_set_replication_stress.result
index e683c0643..225f33ecb 100644
--- a/test/replication/box_set_replication_stress.result
+++ b/test/replication/box_set_replication_stress.result
@@ -33,6 +33,19 @@ test_run:cmd("switch default")
| ...
-- Cleanup.
-test_run:drop_cluster(SERVERS)
+test_run:cmd('stop server master_quorum1 with signal=SIGKILL')
| ---
+ | - true
+ | ...
+test_run:cmd('delete server master_quorum1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server master_quorum2 with signal=SIGKILL')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server master_quorum2')
+ | ---
+ | - true
| ...
diff --git a/test/replication/box_set_replication_stress.test.lua b/test/replication/box_set_replication_stress.test.lua
index 407e91e0f..88652b0b4 100644
--- a/test/replication/box_set_replication_stress.test.lua
+++ b/test/replication/box_set_replication_stress.test.lua
@@ -14,4 +14,7 @@ end
test_run:cmd("switch default")
-- Cleanup.
-test_run:drop_cluster(SERVERS)
+test_run:cmd('stop server master_quorum1 with signal=SIGKILL')
+test_run:cmd('delete server master_quorum1')
+test_run:cmd('stop server master_quorum2 with signal=SIGKILL')
+test_run:cmd('delete server master_quorum2')
--
2.17.1