Date: Fri, 11 Sep 2020 13:36:44 +0300
From: Kirill Yukhin
Message-ID: <20200911103644.tthafcu6iried62o@tarantool.org>
In-Reply-To: <87a0a297a6c954f37bed6a79f81949a87196a218.1599440334.git.avtikhon@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v1] test: flaky replication/status.test.lua status
To: "Alexander V. Tikhonov"
Cc: tarantool-patches@dev.tarantool.org

Hello,

On 07 Sep 04:00, Alexander V. Tikhonov wrote:
> On heavily loaded hosts, the following 3 issues were found:
>
> line 174:
>
> [026] --- replication/status.result Thu Jun 11 12:07:39 2020
> [026] +++ replication/status.reject Sun Jun 14 03:20:21 2020
> [026] @@ -174,15 +174,17 @@
> [026] ...
> [026] replica.downstream.status == 'follow'
> [026] ---
> [026] -- true
> [026] +- false
> [026] ...
>
> It happened because the replication downstream status check occurred
> too early. To give the replication time to reach the needed 'follow'
> state, the test needs to wait for it using the
> test_run:wait_downstream() routine.
>
> line 178:
>
> [024] --- replication/status.result Mon Sep 7 00:22:52 2020
> [024] +++ replication/status.reject Mon Sep 7 00:36:01 2020
> [024] @@ -178,11 +178,13 @@
> [024] ...
> [024] replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> [024] ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[master_id] =..."]:1: attempt to
> [024] + index field ''vclock'' (a nil value)'
> [024] ...
> [024] replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> [024] ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[replica_id] ..."]:1: attempt to
> [024] + index field ''vclock'' (a nil value)'
> [024] ...
> [024] --
> [024] -- Replica
>
> It happened because the replication vclock field did not exist at the
> moment of its check. To fix the issue, the test has to wait for the
> vclock field to become available using the test_run:wait_cond()
> routine. The replication downstream data also has to be read at that
> same moment.
>
> line 224:
>
> [014] --- replication/status.result Fri Jul 3 04:29:56 2020
> [014] +++ replication/status.reject Mon Sep 7 00:17:30 2020
> [014] @@ -224,7 +224,7 @@
> [014] ...
> [014] master.upstream.status == "follow"
> [014] ---
> [014] -- true
> [014] +- false
> [014] ...
> [014] master.upstream.lag < 1
> [014] ---
>
> It happened because the replication upstream status check occurred
> too early. To give the replication time to reach the needed 'follow'
> state, the test needs to wait for it using the
> test_run:wait_upstream() routine.
>
> Removed the test from the 'fragile' list of the test_run tool so that
> it can run in parallel.
>
> Closes #5110

I've checked your patch into 1.10, 2.4, 2.5 and master.

--
Regards, Kirill Yukhin
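
For readers of the archive, the wait-then-check pattern described in the quoted commit message can be sketched roughly as follows. This is an illustration in Python, not the test_run implementation and not the code from the patch: the `wait_cond` helper below only mimics the general idea of polling with a timeout, and `FakeReplicaInfo`, `downstream_ready` and the sample vclock values are hypothetical stand-ins for a replica's replication info. The point it shows is that the downstream data is read once per poll and all checks run against that same snapshot.

```python
import time


def wait_cond(cond, timeout=60.0, delay=0.01):
    """Poll cond() until it returns a truthy value or the timeout expires.

    Returns the truthy value on success and False on timeout.
    (Hypothetical helper; the signature is an assumption for this sketch.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = cond()
        if result:
            return result
        if time.monotonic() >= deadline:
            return False
        time.sleep(delay)


class FakeReplicaInfo:
    """Hypothetical stand-in for replication info that fills in over time."""

    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def downstream(self):
        self.calls += 1
        if self.calls < self.ready_after:
            # Early on, neither the status nor the vclock is present yet.
            return {}
        return {"status": "follow", "vclock": {1: 5, 2: 3}}


info = FakeReplicaInfo(ready_after=3)


def downstream_ready():
    # Read the downstream data once and check every field on that snapshot,
    # so the status and the vclock come from the same moment.
    snapshot = info.downstream()
    if snapshot.get("status") == "follow" and snapshot.get("vclock") is not None:
        return snapshot
    return None


snapshot = wait_cond(downstream_ready, timeout=5.0)
print(snapshot)
```

Comparing fields of the returned snapshot, instead of re-reading the replication info for each assertion, avoids the case where the status is already 'follow' but the vclock is read a moment earlier or later.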