From mboxrd@z Thu Jan 1 00:00:00 1970
To: v.shpilevoy@tarantool.org, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
References: <65333f9e19bc2ab095ec53810f40b7b779aad032.1626287002.git.sergepetrenko@tarantool.org>
Date: Thu, 15 Jul 2021 23:11:04 +0300
In-Reply-To: <65333f9e19bc2ab095ec53810f40b7b779aad032.1626287002.git.sergepetrenko@tarantool.org>
Subject: [Tarantool-patches] [PATCH v4 17/16] replication: fix flaky election_qsync.test
From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko

Fix the test failing occasionally with the following result mismatch:

[001] replication/election_qsync.test.lua memtx           [ fail ]
[001]
[001] Test failed! Result content mismatch:
[001] --- replication/election_qsync.result    Thu Jul 15 17:15:48 2021
[001] +++ var/rejects/replication/election_qsync.reject    Thu Jul 15 20:46:51 2021
[001] @@ -145,8 +145,7 @@
[001]   | ...
[001]  box.space.test:select{}
[001]   | ---
[001] - | - - [1]
[001] - |   - [2]
[001] + | - - [2]
[001]   | ...
[001]  box.space.test:drop()
[001]   | ---
[001]

The issue happened because row [1] wasn't delivered to the 'default'
instance from the 'replica' at all.

The test does try to wait for [1] to be written to WAL and replicated,
but sometimes the wait finishes before that actually happens:
box.ctl.promote() is issued asynchronously once the instance becomes the
Raft leader. So issuing `box.ctl.wait_rw()` doesn't guarantee that the
replica has already written the PROMOTE (the limbo is initially
unclaimed, so the replica becomes writable as soon as it becomes the
Raft leader).

Right after `wait_rw()` we wait for lsn propagation and for the
'default' instance to reach the replica's lsn.

It may happen that the lsn advances because PROMOTE was written to WAL,
and not because of row [1]. When this is the case, the 'default'
instance doesn't receive row [1] at all, resulting in the test error
shown above.

Fix the issue by waiting for the promotion to happen explicitly.

Part of #5430
---
 test/replication/election_qsync.result   | 8 +++++++-
 test/replication/election_qsync.test.lua | 7 ++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
index 2402c8578..c6ec5e352 100644
--- a/test/replication/election_qsync.result
+++ b/test/replication/election_qsync.result
@@ -75,13 +75,19 @@ box.cfg{
   | ---
   | ...
 
-box.ctl.wait_rw()
+-- Promote is written asynchronously to the instance becoming the leader, so
+-- wait for it. As soon as it's written, the instance's definitely a leader.
+test_run:wait_cond(function()                                                   \
+    return box.info.synchro.queue.owner == box.info.id                          \
+end)
   | ---
+ | - true
   | ...
 assert(box.info.election.state == 'leader')
   | ---
   | - true
   | ...
+
 lsn = box.info.lsn
   | ---
   | ...
diff --git a/test/replication/election_qsync.test.lua b/test/replication/election_qsync.test.lua
index e1aca8351..f3c7c290b 100644
--- a/test/replication/election_qsync.test.lua
+++ b/test/replication/election_qsync.test.lua
@@ -39,8 +39,13 @@ box.cfg{
     replication_timeout = 0.1,                                                  \
 }
 
-box.ctl.wait_rw()
+-- Promote is written asynchronously to the instance becoming the leader, so
+-- wait for it. As soon as it's written, the instance's definitely a leader.
+test_run:wait_cond(function()                                                   \
+    return box.info.synchro.queue.owner == box.info.id                          \
+end)
 assert(box.info.election.state == 'leader')
+
 lsn = box.info.lsn
 _ = fiber.create(function()                                                     \
     ok, err = pcall(box.space.test.replace, box.space.test, {1})                \
-- 
2.30.1 (Apple Git-130)
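
For illustration only, the race described in the commit message can be modelled in plain Lua without a Tarantool runtime. Everything below is a hypothetical stand-in: the state table loosely mirrors box.info, `wait_cond` is a toy polling loop rather than the test-run helper, and the event order and the writability rule are assumptions taken from the commit message. It only shows why the lsn sampled after the old wait can still be bumped by PROMOTE, while the lsn sampled after the new wait cannot.

```lua
-- Toy model of the race; none of these names are the Tarantool or test-run API.

-- Assumption from the commit message: an unclaimed limbo (owner == 0)
-- does not block writes, so a fresh Raft leader is writable immediately.
local function is_writable(s)
    return s.is_leader and (s.queue_owner == 0 or s.queue_owner == s.id)
end

-- The condition used by the patch: the limbo is owned by this instance.
local function owns_queue(s)
    return s.queue_owner == s.id
end

-- Poll cond(), applying one pending event per iteration
-- (a stand-in for yielding to other fibers).
local function wait_cond(cond, s, pending)
    while not cond(s) do
        local ev = table.remove(pending, 1)
        if ev == nil then return false end
        ev(s)
    end
    return true
end

-- Run one scenario; return the lsn sampled right after the wait and the
-- number of events that have not happened yet.
local function run(cond)
    local s = {id = 2, queue_owner = 0, is_leader = false, lsn = 10}
    local pending = {
        -- The instance wins the Raft election.
        function(st) st.is_leader = true end,
        -- PROMOTE is written to WAL asynchronously and bumps the lsn.
        function(st) st.queue_owner = st.id; st.lsn = st.lsn + 1 end,
    }
    wait_cond(cond, s, pending)
    return s.lsn, #pending
end

local old_lsn, old_pending = run(is_writable)
local new_lsn, new_pending = run(owns_queue)

-- Old wait: PROMOTE is still pending, so "lsn > 10" can later become
-- true without row [1] ever being written.
print(old_lsn, old_pending)  -- 10  1
-- New wait: PROMOTE is already accounted for in the sampled lsn.
print(new_lsn, new_pending)  -- 11  0
```

In this model, the only thing that can advance the lsn past the value sampled after the `owns_queue` wait is a later write such as row [1], which is what the test intends to wait for.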