[Tarantool-patches] [PATCH v1] test: fix flaky replication/wal_rw_stress.test.lua

Alexander V. Tikhonov avtikhon at tarantool.org
Mon Jun 15 17:34:12 MSK 2020


Found issue:

 [016] --- replication/wal_rw_stress.result	Fri Feb 21 11:53:21 2020
 [016] +++ replication/wal_rw_stress.reject	Fri May  8 08:23:56 2020
 [016] @@ -73,7 +73,42 @@
 [016]  ...
 [016]  box.info.replication[1].downstream.status ~= 'stopped' or box.info
 [016]  ---
 [016] -- true
 [016] +- version: 2.5.0-27-g32f59756a
 [016] +  id: 2
 [016] +  ro: false
 [016] +  uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +  package: Tarantool
 [016] +  cluster:
 [016] +    uuid: 397c196f-9105-11ea-96ab-08002739cbd6
 [016] +  listen: unix/:/home/vagrant/tarantool/test/var/016_replication/replica.socket-iproto
 [016] +  replication:
 [016] +    1:
 [016] +      id: 1
 [016] +      uuid: 397a1886-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 10005
 [016] +      upstream:
 [016] +        status: follow
 [016] +        idle: 0.46353673400017
 [016] +        peer: unix/:/home/vagrant/tarantool/test/var/016_replication/master.socket-iproto
 [016] +        lag: -0.45732522010803
 [016] +      downstream:
 [016] +        status: stopped
 [016] +        message: writev(1), called on fd 24, aka unix/:/home/vagrant/tarantool/test/var/016_replicati
 [016] +        system_message: Broken pipe
 [016] +    2:
 [016] +      id: 2
 [016] +      uuid: 41cbebcc-9105-11ea-96ab-08002739cbd6
 [016] +      lsn: 0
 [016] +  signature: 10005
 [016] +  status: running
 [016] +  vinyl: []
 [016] +  uptime: 2
 [016] +  lsn: 0
 [016] +  sql: []
 [016] +  gc: []
 [016] +  pid: 41231
 [016] +  memory: []
 [016] +  vclock: {1: 10005}
 [016]  ...
 [016]  test_run:cmd("switch default")
 [016]  ---

To check the downstream status and its message, the test needs to wait
until a downstream appears. This prevents an attempt to index a nil value
when one of those functions is called before a record about a peer appears
in box.info.replication. It was observed on the test
  replication/show_error_on_disconnect
after commit
  c6bea65f8ef5f6c737cf70c0127189d0ebcbc36e ('replication: recfg with 0
quorum returns immediately').

Closes #4977
---

Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-4977-replication-wal_rw_stress
Issue: https://github.com/tarantool/tarantool/issues/4977
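
Note for reviewers: a minimal plain-Lua sketch of the polling idea, runnable
without Tarantool. Everything in it is hypothetical and only illustrative:
wait_cond_sketch stands in for test_run:wait_cond() (it counts attempts
instead of using a timeout), and fake_info() stands in for box.info. The
nil-guard in the condition is an assumption about how one might avoid
indexing a missing downstream record; it is not part of this patch.

```lua
-- Hypothetical stand-in for test_run:wait_cond(): call cond() until it
-- returns a truthy value or max_attempts is exhausted.
local function wait_cond_sketch(cond, max_attempts)
    for _ = 1, max_attempts do
        local res = cond()
        if res then
            return res
        end
    end
    return false
end

-- Hypothetical stand-in for box.info: the downstream record for peer 1
-- only appears on the third poll.
local polls = 0
local function fake_info()
    polls = polls + 1
    if polls < 3 then
        return { replication = { [1] = { id = 1 } } }
    end
    return {
        replication = {
            [1] = { id = 1, downstream = { status = 'follow' } },
        },
    }
end

-- Condition with explicit nil checks, so a missing peer or downstream
-- record yields false instead of an "attempt to index a nil value" error.
local function downstream_not_stopped()
    local peer = fake_info().replication[1]
    local downstream = peer and peer.downstream
    return downstream ~= nil and downstream.status ~= 'stopped'
end

local result = wait_cond_sketch(downstream_not_stopped, 10)
print(result, polls)
```

In the sketch the first two polls see no downstream record and simply
return false, so the loop keeps waiting rather than failing.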

 test/replication/suite.ini              | 1 -
 test/replication/wal_rw_stress.result   | 2 +-
 test/replication/wal_rw_stress.test.lua | 2 +-
 3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 6a8944020..231bedaaf 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -18,7 +18,6 @@ fragile = errinj.test.lua            ; gh-3870
           sync.test.lua              ; gh-3835 gh-3877
           transaction.test.lua       ; gh-4312
           wal_off.test.lua           ; gh-4355
-          wal_rw_stress.test.lua     ; gh-4977
           replica_rejoin.test.lua    ; gh-4985
           recover_missing_xlog.test.lua ; gh-4989
           box_set_replication_stress ; gh-4992
diff --git a/test/replication/wal_rw_stress.result b/test/replication/wal_rw_stress.result
index cc68877b0..cfb2f8a9e 100644
--- a/test/replication/wal_rw_stress.result
+++ b/test/replication/wal_rw_stress.result
@@ -71,7 +71,7 @@ test_run:cmd("switch replica")
 box.cfg{replication = replication}
 ---
 ...
-box.info.replication[1].downstream.status ~= 'stopped' or box.info
+test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
 ---
 - true
 ...
diff --git a/test/replication/wal_rw_stress.test.lua b/test/replication/wal_rw_stress.test.lua
index 08570b285..48d68c5ac 100644
--- a/test/replication/wal_rw_stress.test.lua
+++ b/test/replication/wal_rw_stress.test.lua
@@ -38,7 +38,7 @@ test_run:cmd("setopt delimiter ''");
 -- are running in different threads, there shouldn't be any rw errors.
 test_run:cmd("switch replica")
 box.cfg{replication = replication}
-box.info.replication[1].downstream.status ~= 'stopped' or box.info
+test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
 test_run:cmd("switch default")
 
 -- Cleanup.
-- 
2.17.1


