From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp29.i.mail.ru (smtp29.i.mail.ru [94.100.177.89])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 1458942F4AD;
	Mon, 15 Jun 2020 17:34:17 +0300 (MSK)
From: "Alexander V. Tikhonov"
Date: Mon, 15 Jun 2020 17:34:12 +0300
Message-Id: <2074a5617eb0da1c16830aab2f64f51f22ecb9bf.1592231572.git.avtikhon@tarantool.org>
Subject: [Tarantool-patches] [PATCH v1] test: fix flaky replication/wal_rw_stress.test.lua
List-Id: Tarantool development patches
To: Sergey Bronnikov, Alexander Turenko, Kirill Yukhin
Cc: tarantool-patches@dev.tarantool.org

Found issue:

 [016] --- replication/wal_rw_stress.result	Fri Feb 21 11:53:21 2020
 [016] +++ replication/wal_rw_stress.reject	Fri May  8 08:23:56 2020
 [016] @@ -73,7 +73,42 @@
 [016]  ...
 [016]  box.info.replication[1].downstream.status ~= 'stopped' or box.info
 [016]  ---
 [016] -- true
 [016] +- version: 2.5.0-27-g32f59756a
 [016] +  id: 2
 [016] +  ro: false
 [016] +  uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +  package: Tarantool
 [016] +  cluster:
 [016] +    uuid: 397c196f-9105-11ea-96ab-08002739cbd6
 [016] +  listen: unix/:/home/vagrant/tarantool/test/var/016_replication/replica.socket-iproto
 [016] +  replication:
 [016] +    1:
 [016] +      id: 1
 [016] +      uuid: 397a1886-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 10005
 [016] +      upstream:
 [016] +        status: follow
 [016] +        idle: 0.46353673400017
 [016] +        peer: unix/:/home/vagrant/tarantool/test/var/016_replication/master.socket-iproto
 [016] +        lag: -0.45732522010803
 [016] +      downstream:
 [016] +        status: stopped
 [016] +        message: writev(1), called on fd 24, aka unix/:/home/vagrant/tarantool/test/var/016_replicati
 [016] +        system_message: Broken pipe
 [016] +    2:
 [016] +      id: 2
 [016] +      uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 0
 [016] +  signature: 10005
 [016] +  status: running
 [016] +  vinyl: []
 [016] +  uptime: 2
 [016] +  lsn: 0
 [016] +  sql: []
 [016] +  gc: []
 [016] +  pid: 41231
 [016] +  memory: []
 [016] +  vclock: {1: 10005}
 [016]  ...
 [016]  test_run:cmd("switch default")
 [016]  ---

To check the downstream status and its message, the test needs to wait
until the downstream appears. This prevents an attempt to index a nil
value when one of those functions is called before a record about a
peer appears in box.info.replication. It was observed in the test
replication/show_error_on_disconnect after commit
c6bea65f8ef5f6c737cf70c0127189d0ebcbc36e ('replication: recfg with 0
quorum returns immediately').

Closes #4977
---

Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-4977-replication-wal_rw_stress
Issue: https://github.com/tarantool/tarantool/issues/4977

 test/replication/suite.ini              | 1 -
 test/replication/wal_rw_stress.result   | 2 +-
 test/replication/wal_rw_stress.test.lua | 2 +-
 3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 6a8944020..231bedaaf 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -18,7 +18,6 @@ fragile = errinj.test.lua ; gh-3870
           sync.test.lua ; gh-3835 gh-3877
           transaction.test.lua ; gh-4312
           wal_off.test.lua ; gh-4355
-          wal_rw_stress.test.lua ; gh-4977
           replica_rejoin.test.lua ; gh-4985
           recover_missing_xlog.test.lua ; gh-4989
           box_set_replication_stress ; gh-4992
diff --git a/test/replication/wal_rw_stress.result b/test/replication/wal_rw_stress.result
index cc68877b0..cfb2f8a9e 100644
--- a/test/replication/wal_rw_stress.result
+++ b/test/replication/wal_rw_stress.result
@@ -71,7 +71,7 @@ test_run:cmd("switch replica")
 box.cfg{replication = replication}
 ---
 ...
-box.info.replication[1].downstream.status ~= 'stopped' or box.info
+test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
 ---
 - true
 ...
diff --git a/test/replication/wal_rw_stress.test.lua b/test/replication/wal_rw_stress.test.lua
index 08570b285..48d68c5ac 100644
--- a/test/replication/wal_rw_stress.test.lua
+++ b/test/replication/wal_rw_stress.test.lua
@@ -38,7 +38,7 @@ test_run:cmd("setopt delimiter ''");
 -- are running in different threads, there shouldn't be any rw errors.
 test_run:cmd("switch replica")
 box.cfg{replication = replication}
-box.info.replication[1].downstream.status ~= 'stopped' or box.info
+test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
 test_run:cmd("switch default")
 
 -- Cleanup.
-- 
2.17.1
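
The commit message mentions indexing a nil value when the record about
a peer has not yet appeared in box.info.replication. A minimal sketch
of a nil-safe predicate for that case follows. It is not part of the
patch: downstream_not_stopped is a hypothetical helper, and it assumes
only that its first argument has the shape of box.info.

```lua
-- Hypothetical helper (not part of the patch): returns true only once
-- the downstream record for peer `id` exists and is not 'stopped'.
-- `info` is assumed to have the shape of box.info.
local function downstream_not_stopped(info, id)
    local peer = info.replication and info.replication[id]
    local downstream = peer and peer.downstream
    if downstream == nil then
        -- The peer or its downstream is not registered yet: report
        -- "not ready" so that the caller keeps polling instead of
        -- failing on an attempt to index a nil value.
        return false
    end
    return downstream.status ~= 'stopped'
end

-- Possible use inside the test, with wait_cond as in the patch:
-- test_run:wait_cond(function()
--     return downstream_not_stopped(box.info, 1)
-- end) or box.info
```

With such a predicate a missing record is treated the same way as a
'stopped' downstream, so the wait simply continues until the record
appears or the wait times out.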