From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp18.mail.ru (smtp18.mail.ru [94.100.176.155]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 3F361445320 for ; Tue, 21 Jul 2020 01:00:59 +0300 (MSK) References: <2e5a2d3948eafebd1488ab120ad0458838661d8a.1594314820.git.sergeyb@tarantool.org> From: Vladislav Shpilevoy Message-ID: <8c3b7d4c-22e8-108a-b15e-ef3c89510a63@tarantool.org> Date: Tue, 21 Jul 2020 00:00:56 +0200 MIME-Version: 1.0 In-Reply-To: <2e5a2d3948eafebd1488ab120ad0458838661d8a.1594314820.git.sergeyb@tarantool.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [Tarantool-patches] [PATCH 2/3] replication: test clear_synchro_queue function List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: sergeyb@tarantool.org, tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org, gorcunov@gmail.com, lvasiliev@tarantool.org Thanks for the patch! See 8 comments below. On 09.07.2020 19:16, sergeyb@tarantool.org wrote: > From: Sergey Bronnikov > > Part of #5055 > Part of #4849 > --- > test/replication/qsync_basic.result | 85 +++++++++++++++++++++++++++ > test/replication/qsync_basic.test.lua | 31 ++++++++++ > 2 files changed, 116 insertions(+) > > diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result > index ab4be0c7e..464df75a7 100644 > --- a/test/replication/qsync_basic.result > +++ b/test/replication/qsync_basic.result > @@ -32,6 +32,14 @@ s2.is_sync > | - false > | ... > > +-- > +-- gh-4849: clear synchro queue with unconfigured box > +-- > +box.ctl.clear_synchro_queue() 1. Enough to test that it does not crash here. No need to check result. > + | --- > + | - -1 > + | ... > + > -- Net.box takes sync into account. > box.schema.user.grant('guest', 'super') > | --- > @@ -553,6 +561,82 @@ box.space.sync:select{7} > | - - [7] > | ... > > +-- > +-- gh-4849: clear synchro queue on a replica > +-- > +test_run:switch('default') > + | --- > + | - true > + | ... > +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 2} 2. If you want this timeout to fail, it is too big. The test will be too long. If it is supposed not to fail, then it is way too small. If you want it to fail, it should be something like 0.001. If you want it to hold, it should be 1000 to be sure. Keep in mind that you can change the timeout on fly. That allows to build quite complex reactive tests completely event-based. > + | --- > + | ... > +f1 = fiber.create(box.space.sync.replace, box.space.sync, {9}) 3. You need to extract the exact result value. Use pcall for that, and print its result after the fiber is dead. See other examples with fiber.create() in qsync test suite. The same for the next test case. > + | --- > + | ... > +f1:status() > + | --- > + | - suspended > + | ... > +test_run:switch('replica') > + | --- > + | - true 4. Better wait until the value is delivered. Otherwise you can switch to replica before master finishes WAL write, and the queue will be empty here. Try test_run:wait_cond(function() return box.space.sync:get{9} ~= nil end) > + | ... > +box.ctl.clear_synchro_queue() > + | --- > + | - 0 > + | ... > +box.space.sync:select{9} > + | --- > + | - [] 5. If select returns 9 before queue clean, and returns empty afterwards, it means the queue was cleared. So here the queue size is not really needed, as you can see. > + | ... > +test_run:switch('default') > + | --- > + | - true > + | ... > +box.space.sync:select{9} > + | --- > + | - [] > + | ... > +f1:status() > + | --- > + | - dead > + | ... > + > +-- > +-- gh-4849: clear synchro queue on a master 6. Since the previous test the replica's WAL is different from master's. I am not sure the replication is still alive. > +-- > +test_run:switch('default') > + | --- > + | - true > + | ... > +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 2} > + | --- > + | ... > +f1 = fiber.create(box.space.sync.replace, box.space.sync, {10}) > + | --- > + | ... > +f1:status() > + | --- > + | - suspended > + | ... > +box.ctl.clear_synchro_queue() > + | --- > + | - -2 > + | ... > +box.space.sync:select{10} > + | --- > + | - - [10] 7. Why is it 10? The quorum is not reached, it should have been rolled back. > + | ... > +test_run:switch('replica') > + | --- > + | - true > + | ... > +box.space.sync:select{10} > + | --- > + | - - [10] > + | ... > + > -- Cleanup. > test_run:cmd('switch default') > | --- > @@ -576,6 +660,7 @@ test_run:cmd('delete server replica') > | ... > box.space.test:drop() > | --- > + | - error: A rollback for a synchronous transaction is received 8. Why is it changed? > | ... > box.space.sync:drop() > | ---