[Tarantool-patches] [PATCH v1] test: fix issue on first replica in drop_cluster()

Alexander V. Tikhonov avtikhon at tarantool.org
Sun Aug 16 23:01:34 MSK 2020


The test replication/box_set_replication_stress.test.lua failed flakily
in the drop_cluster() routine, for example:

  --- replication/box_set_replication_stress.result	Fri Aug 14 18:28:41 2020
  +++ var/004_replication/box_set_replication_stress.result	Sat Aug 15 15:19:44 2020
  @@ -34,5 +34,3 @@

   -- Cleanup.
   test_run:drop_cluster(SERVERS)
  - | ---
  - | ...

The drop_cluster() routine from the test-run repository failed in the
stop() routine of the lib/tarantool_server.py:TarantoolServer class.
It failed to stop the first replica, which the test uses to switch
replication on and off 1000 times. This happened because stop() sends
SIGTERM by default, which could not kill the first replica in some
situations: both replica processes were alive and tried to read from
and write to their sockets, but the sockets of the first replica were
already unreachable while the second replica was still alive. In this
situation SIGTERM was not enough to stop the first replica, and test-run
hung in wait_stop() of the lib/tarantool_server.py:TarantoolServer class
until test-run stopped the test by its general timeout of 2 minutes.

The only possible way to fix the issue was to use SIGKILL instead of
SIGTERM, so that the process is killed without waiting for its sockets
to close. SIGKILL could be used by default in the drop_cluster()
routine, but it seems that such a change would hinder detecting other
issues in other tests. So it was decided to use SIGKILL only in this
test, as an additional option of the "stop server" test-run command.

Closes #5244
---

Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5244-replication-box-stress-drop-replica
Issue: https://github.com/tarantool/tarantool/issues/5244
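
For illustration only, here is a minimal Python sketch of the behaviour
described in the commit message: a process that does not exit on SIGTERM
makes the waiter time out, while SIGKILL always terminates it. This is not
the test-run code. The CHILD script, stop() and demo() are hypothetical
names, and a child that simply ignores SIGTERM is an assumed stand-in for
a replica stuck during shutdown. POSIX only.

```python
import signal
import subprocess
import sys

# Hypothetical child: ignores SIGTERM, standing in for a replica that
# does not finish shutting down after receiving it.
CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def stop(proc, sig=signal.SIGTERM, timeout=1.0):
    """Send sig and wait; return True if the process exited in time."""
    proc.send_signal(sig)
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        return False


def demo():
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE)
    proc.stdout.readline()  # wait until the child has set up SIGTERM handling
    term_ok = stop(proc, signal.SIGTERM)  # times out: the child keeps running
    kill_ok = stop(proc, signal.SIGKILL)  # cannot be caught or ignored
    proc.stdout.close()
    return term_ok, kill_ok, proc.returncode
```

In this sketch the SIGTERM attempt is bounded by a short timeout; in the
failure described above the wait was only ended by the general test timeout.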

 .../replication/box_set_replication_stress.result | 15 ++++++++++++++-
 .../box_set_replication_stress.test.lua           |  5 ++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/test/replication/box_set_replication_stress.result b/test/replication/box_set_replication_stress.result
index e683c0643..225f33ecb 100644
--- a/test/replication/box_set_replication_stress.result
+++ b/test/replication/box_set_replication_stress.result
@@ -33,6 +33,19 @@ test_run:cmd("switch default")
  | ...
 
 -- Cleanup.
-test_run:drop_cluster(SERVERS)
+test_run:cmd('stop server master_quorum1 with signal=SIGKILL')
  | ---
+ | - true
+ | ...
+test_run:cmd('delete server master_quorum1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server master_quorum2 with signal=SIGKILL')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server master_quorum2')
+ | ---
+ | - true
  | ...
diff --git a/test/replication/box_set_replication_stress.test.lua b/test/replication/box_set_replication_stress.test.lua
index 407e91e0f..88652b0b4 100644
--- a/test/replication/box_set_replication_stress.test.lua
+++ b/test/replication/box_set_replication_stress.test.lua
@@ -14,4 +14,7 @@ end
 test_run:cmd("switch default")
 
 -- Cleanup.
-test_run:drop_cluster(SERVERS)
+test_run:cmd('stop server master_quorum1 with signal=SIGKILL')
+test_run:cmd('delete server master_quorum1')
+test_run:cmd('stop server master_quorum2 with signal=SIGKILL')
+test_run:cmd('delete server master_quorum2')
-- 
2.17.1


