From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id C09386EC40; Mon, 16 Aug 2021 18:15:58 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org C09386EC40 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1629126958; bh=t1qKVH8xQ3gnp6iSMiUoc8p9OPhHoCXxxd7xSUU6byw=; h=To:Date:In-Reply-To:References:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=F9zvPesjziB+ZQyUbO0xeAgJe9J+ML7iiKjakq8a74iCOPYTeM2q9sZQH1l777YXR 9uoDFDeJFkMzv/1wpypyH3wPYPfSJSHBhi46ZBDt5Bu7N9PB1Zh/KlNC5bss/bwon2 Ll7kP7EFvFZcKjNGG4d9YGd8C74drtxHFHwwlhVo= Received: from smtp48.i.mail.ru (smtp48.i.mail.ru [94.100.177.108]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 578846EC40 for ; Mon, 16 Aug 2021 18:15:27 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 578846EC40 Received: by smtp48.i.mail.ru with esmtpa (envelope-from ) id 1mFeKk-0006G4-4O; Mon, 16 Aug 2021 18:15:26 +0300 To: v.ioffe@tarantool.org, kyukhin@tarantool.org Date: Mon, 16 Aug 2021 18:15:22 +0300 Message-Id: X-Mailer: git-send-email 2.30.1 (Apple Git-130) In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-4EC0790: 10 X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD92087353F0EC44DD972FF4A7D76DB5E242D14FEF1BD8BF4AC182A05F538085040F9035A49682B8BAD867E414C89B194709783F9D5083C91ACA7E7235D3FE98B0A X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7E9A0F80F179600C6EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637BBCE7257090F96C9EA1F7E6F0F101C6723150C8DA25C47586E58E00D9D99D84E1BDDB23E98D2D38BBCA57AF85F7723F25C47BB19D6E8431453A6D695EB6A97DECC7F00164DA146DAFE8445B8C89999728AA50765F7900637F3E38EE449E3E2AE389733CBF5DBD5E9C8A9BA7A39EFB766F5D81C698A659EA7CC7F00164DA146DA9985D098DBDEAEC8A9FF340AA05FB58CF6B57BC7E6449061A352F6E88A58FB86F5D81C698A659EA7E827F84554CEF5019E625A9149C048EE9ECD01F8117BC8BEE2021AF6380DFAD18AA50765F790063735872C767BF85DA227C277FBC8AE2E8B3241B8C6B78F28F975ECD9A6C639B01B4E70A05D1297E1BBCB5012B2E24CD356 X-C1DE0DAB: C20DE7B7AB408E4181F030C43753B8186998911F362727C4C7A0BC55FA0FE5FC6E47B1ECF46E4C15F1034B2E5EE9138D111494D15AA4CBEBB1881A6453793CE9C32612AADDFBE061C61BE10805914D3804EBA3D8E7E5B87ABF8C51168CD8EBDB3173B6A67958DD8015E2725BA614EAEA1EF972C1F679AE1C X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D3473688ED311681BF0A3284DAB8BE7D853C573F0C73CF1BD48E425BC0AA0523AAA22A564692DDB5F211D7E09C32AA3244CC66C04A777F95DC6B86FB56B0956DED064EE5813BBCA3A9D927AC6DF5659F194 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojIrFL/N5KnVEYoI+Cclkufw== X-Mailru-Sender: 3B9A0136629DC9125D61937A2360A44619A12318E85B8A6FB2766B9427F7C5D47B8E9BC31E217953424AE0EB1F3D1D21E2978F233C3FAE6EE63DB1732555E4A8EE80603BA4A5B0BC112434F685709FCF0DA7A0AF5A3A8387 X-Mras: Ok Subject: [Tarantool-patches] [PATCH v2 1/2] applier: fix upstream.lag calculations X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Serge Petrenko via Tarantool-patches Reply-To: Serge Petrenko Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" upstream.lag is the delta between the moment when a row was written to master's journal and the moment when it was received by the replica. It's an important metric to check whether the replica has fallen too far behind master. Not all the rows coming from master have a valid time of creation. For example, RAFT system messages don't have one, and we can't assign correct time to them: these messages do not originate from the journal, and assigning current time to them would lead to jumps in upstream.lag results. Stop updating upstream.lag for rows which don't have creation time assigned. The upstream.lag calculation changes were meant to fix the flaky replication/errinj.test: Test failed! Result content mismatch: --- replication/errinj.result Fri Aug 13 15:15:35 2021 +++ /tmp/tnt/rejects/replication/errinj.reject Fri Aug 13 15:40:39 2021 @@ -310,7 +310,7 @@ ... box.info.replication[1].upstream.lag < 1 --- -- true +- false ... But the changes were not enough, because now the test may see the initial lag value (TIMEOUT_INFINITY). So fix the test as well by waiting until upstream.lag becomes < 1. --- src/box/applier.cc | 3 ++- test/replication/errinj.result | 5 ++++- test/replication/errinj.test.lua | 5 ++++- 3 files changed, 10 insertions(+), 3 deletions(-) diff --git a/src/box/applier.cc b/src/box/applier.cc index 902d0bc72..9256078e1 100644 --- a/src/box/applier.cc +++ b/src/box/applier.cc @@ -664,7 +664,8 @@ applier_read_tx_row(struct applier *applier, double timeout) coio_read_xrow_timeout_xc(coio, ibuf, row, timeout); - applier->lag = ev_now(loop()) - row->tm; + if (row->tm > 0) + applier->lag = ev_now(loop()) - row->tm; applier->last_row_time = ev_monotonic_now(loop()); return tx_row; } diff --git a/test/replication/errinj.result b/test/replication/errinj.result index 9d13f6aa7..ec251182f 100644 --- a/test/replication/errinj.result +++ b/test/replication/errinj.result @@ -308,7 +308,10 @@ box.info.replication[1].upstream.lag > 0 --- - true ... -box.info.replication[1].upstream.lag < 1 +-- Upstream lag is huge until the first row is received. +test_run:wait_cond(function()\ + return box.info.replication[1].upstream.lag < 1\ +end) --- - true ... diff --git a/test/replication/errinj.test.lua b/test/replication/errinj.test.lua index 19234ab35..7f6535ec1 100644 --- a/test/replication/errinj.test.lua +++ b/test/replication/errinj.test.lua @@ -130,7 +130,10 @@ test_run:cmd("switch replica") while box.info.replication[1].upstream.status ~= 'follow' do fiber.sleep(0.0001) end box.info.replication[1].upstream.status box.info.replication[1].upstream.lag > 0 -box.info.replication[1].upstream.lag < 1 +-- Upstream lag is huge until the first row is received. +test_run:wait_cond(function()\ + return box.info.replication[1].upstream.lag < 1\ +end) -- wait for ack timeout test_run:wait_upstream(1, {status='disconnected', message_re='unexpected EOF'}) -- 2.30.1 (Apple Git-130)