From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 6B6A36EC5B; Tue, 30 Mar 2021 15:33:36 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 6B6A36EC5B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1617107616; bh=GNY7g4qdSPsijd1h+TPYpoLFjE9UcrfEkHA6YnZuFrs=; h=To:Cc:References:Date:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=RSU67CezDjKLjcEGDPLcs9oQQNZ+spAFiX28IttY/XK/7k8nIVW5S/pjsQ3oXsEPI gtX9VJ4ki5xxPTniAnGY5N5J88wtkAuP6lpbRFDXoMgcmmfsJO6lKxYZyi9wXDtiak aE9/HAKmzlpO2htX61+QTKBWMnORYA4W3XwMLgY0= Received: from smtp17.mail.ru (smtp17.mail.ru [94.100.176.154]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 56ECB6EC5B for ; Tue, 30 Mar 2021 15:33:35 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 56ECB6EC5B Received: by smtp17.mail.ru with esmtpa (envelope-from ) id 1lRDYs-0002jD-Li; Tue, 30 Mar 2021 15:33:35 +0300 To: Vladislav Shpilevoy , gorcunov@gmail.com Cc: tarantool-patches@dev.tarantool.org References: <9bc5d0c0-39b1-eedd-9651-2f9c0b259f5a@tarantool.org> <618c1086-2bcd-ea5e-1d4c-54da8b6fa36b@tarantool.org> Message-ID: Date: Tue, 30 Mar 2021 15:33:34 +0300 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.9.0 MIME-Version: 1.0 In-Reply-To: <618c1086-2bcd-ea5e-1d4c-54da8b6fa36b@tarantool.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-GB X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD9ED7173E37F4E32941B7C4A78AC10F96A7797F60C25BD4B06182A05F5380850403200E624F2775D2F5130CFD0C7FDF5CC899A60BDF7A127F64220ACD3637DE31D X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7586684DD3F99AA20EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637129C704593A46970EA1F7E6F0F101C67CDEEF6D7F21E0D1D174C73DBBBFC7664E2FBBBD4E8E0D64DCCD031121C6D1EC05D1C57F9CAC16F64389733CBF5DBD5E913377AFFFEAFD269A417C69337E82CC2CC7F00164DA146DAFE8445B8C89999729449624AB7ADAF37F6B57BC7E64490611E7FA7ABCAF51C92A417C69337E82CC2CC7F00164DA146DA6F5DAA56C3B73B237318B6A418E8EAB8D32BA5DBAC0009BE9E8FC8737B5C224952E1F31393F6C55F76E601842F6C81A12EF20D2F80756B5F7E9C4E3C761E06A776E601842F6C81A127C277FBC8AE2E8BBDBBAF15C9C580393AA81AA40904B5D9DBF02ECDB25306B2201CA6A4E26CD07C3BBE47FD9DD3FB595F5C1EE8F4F765FCA83251EDC214901ED5E8D9A59859A8B645423645B6F85954089D37D7C0E48F6C5571747095F342E88FB05168BE4CE3AF X-C1DE0DAB: 0D63561A33F958A5664C7877E3FDB39A3F396B20FE18CC18161E6A9D86ABCDB8D59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA7502E6951B79FF9A3F410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D3457FA942CB4462B4C13E48089D4C2902160920E5BA18D4AB93CDC41E8039131FA2DDDDA2DABD507A91D7E09C32AA3244CD05117153B38787E9B5A6CA0A4F7BD0539C99C45E8D137E9FACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojfQIxm2xDGoZS+ka4HNly5w== X-Mailru-Sender: 11C2EC085EDE56FAC07928AF2646A769B3A2AA99033588005130CFD0C7FDF5CCF43CCAED39DA92F132C2A64043BFB05F66FEC6BF5C9C28D9D2F9AC31ED4B18F0B80F102789B70DF27402F9BA4338D657ED14614B50AE0675 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH v2 7/7] replication: do not ignore replica vclock on register X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Serge Petrenko via Tarantool-patches Reply-To: Serge Petrenko Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" 27.03.2021 23:13, Serge Petrenko via Tarantool-patches пишет: > > > 26.03.2021 23:50, Vladislav Shpilevoy пишет: >> Thanks for the patch! >> >> See 2 comments below. >> >> On 24.03.2021 13:24, Serge Petrenko wrote: >>> There was a bug in box_process_register. It decoded replica's vclock >>> but >>> never used it when sending the registration stream. So the replica >>> might >>> lose the data in range (replica_vclock, start_vclock). >> 1. Could you please add a test? > > Yes, sure. > >> >>> Follow-up #5566 >>> --- >>>   src/box/box.cc | 14 ++++++-------- >>>   1 file changed, 6 insertions(+), 8 deletions(-) >>> >>> diff --git a/src/box/box.cc b/src/box/box.cc >>> index 292a54213..0bcb505a8 100644 >>> --- a/src/box/box.cc >>> +++ b/src/box/box.cc >>> @@ -2154,7 +2154,8 @@ box_process_register(struct ev_io *io, struct >>> xrow_header *header) >>>                 "wal_mode = 'none'"); >>>       } >>>   -    struct gc_consumer *gc = >>> gc_consumer_register(&replicaset.vclock, >>> +    vclock_reset(&replica_vclock, 0, vclock_get(&replicaset.vclock, >>> 0)); >> 2. xrow_decode_register_xc() already returns the vclock with empty 0 >> part. Why do you need this reset? > > It sets vclock[0] to master's vclock[0]. But it's not needed here, > indeed. > Thanks for noticing! > > I thought having vclock[0] = 0 would harm gc somehow, but that's not > true. > I'm returning this vclock_reset, and here's why. It doesn't set vclock[0] to 0. It sets it to match master's vclock[0]. This is needed, because recovery tries to find xlog with vclock strictly less or equal to the vclock provided. If we leave vclock[0] = 0, recovery will try to open the last xlog which had vclock[0] = 0. This log may be already deleted, which was the reason for test failures in CI. Strangely enough, I couldn't reproduce the issue locally. Anyway, here you go: ================================================ diff --git a/src/box/box.cc b/src/box/box.cc index ecec8df27..4da274976 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -2154,6 +2154,8 @@ box_process_register(struct ev_io *io, struct xrow_header *header)                           "wal_mode = 'none'");         } +       /* @sa box_process_subscribe(). */ +       vclock_reset(&replica_vclock, 0, vclock_get(&replicaset.vclock, 0));         struct gc_consumer *gc = gc_consumer_register(&replica_vclock,                                 "replica %s", tt_uuid_str(&instance_uuid));         if (gc == NULL) ================================================ >> >>> +    struct gc_consumer *gc = gc_consumer_register(&replica_vclock, >>>                   "replica %s", tt_uuid_str(&instance_uuid)); >>>       if (gc == NULL) >>>           diag_raise(); > > ================================= > > diff --git a/src/box/box.cc b/src/box/box.cc > index 0bcb505a8..ecec8df27 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -2154,7 +2154,6 @@ box_process_register(struct ev_io *io, struct > xrow_header *header) >                "wal_mode = 'none'"); >      } > > -    vclock_reset(&replica_vclock, 0, vclock_get(&replicaset.vclock, 0)); >      struct gc_consumer *gc = gc_consumer_register(&replica_vclock, >                  "replica %s", tt_uuid_str(&instance_uuid)); >      if (gc == NULL) > diff --git a/test/replication/anon_register_gap.result > b/test/replication/anon_register_gap.result > new file mode 100644 > index 000000000..24a3548c8 > --- /dev/null > +++ b/test/replication/anon_register_gap.result > @@ -0,0 +1,116 @@ > +-- test-run result file version 2 > +test_run = require('test_run').new() > + | --- > + | ... > + > +-- > +-- When master's registering an anonymous replica, it might ignore > the replica's > +-- current vclock, and skip the data in range (replica_clock, > master_clock). > +-- > +box.schema.user.grant('guest', 'replication') > + | --- > + | ... > +_ = box.schema.space.create('test') > + | --- > + | ... > +_ = box.space.test:create_index('pk') > + | --- > + | ... > + > +test_run:cmd('create server replica with rpl_master=default,\ > +              script="replication/anon1.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('start server replica') > + | --- > + | - true > + | ... > + > +test_run:wait_lsn('replica', 'default') > + | --- > + | ... > +box.error.injection.set('ERRINJ_RELAY_SEND_DELAY', true) > + | --- > + | - ok > + | ... > + > +box.space.test:insert{1} > + | --- > + | - [1] > + | ... > + > +test_run:switch('replica') > + | --- > + | - true > + | ... > + > +test_run:wait_upstream(1, {status='disconnected'}) > + | --- > + | - true > + | ... > +box.space.test:select{} > + | --- > + | - [] > + | ... > + > +fiber = require('fiber') > + | --- > + | ... > +f = fiber.new(function() box.cfg{replication_anon=false} end) > + | --- > + | ... > +test_run:wait_upstream(1, {status='register'}) > + | --- > + | - true > + | ... > + > +test_run:switch('default') > + | --- > + | - true > + | ... > +box.error.injection.set('ERRINJ_RELAY_SEND_DELAY', false) > + | --- > + | - ok > + | ... > +box.space.test:insert{2} > + | --- > + | - [2] > + | ... > + > +test_run:switch('replica') > + | --- > + | - true > + | ... > +test_run:wait_lsn('replica', 'default') > + | --- > + | ... > +f:status() > + | --- > + | - dead > + | ... > +box.space.test:select{} > + | --- > + | - - [1] > + |   - [2] > + | ... > + > +-- Cleanup > +test_run:switch('default') > + | --- > + | - true > + | ... > +test_run:cmd('stop server replica') > + | --- > + | - true > + | ... > +test_run:cmd('delete server replica') > + | --- > + | - true > + | ... > +box.space.test:drop() > + | --- > + | ... > +box.schema.user.revoke('guest', 'replication') > + | --- > + | ... > diff --git a/test/replication/anon_register_gap.test.lua > b/test/replication/anon_register_gap.test.lua > new file mode 100644 > index 000000000..c71576a23 > --- /dev/null > +++ b/test/replication/anon_register_gap.test.lua > @@ -0,0 +1,43 @@ > +test_run = require('test_run').new() > + > +-- > +-- When master's registering an anonymous replica, it might ignore > the replica's > +-- current vclock, and skip the data in range (replica_clock, > master_clock). > +-- > +box.schema.user.grant('guest', 'replication') > +_ = box.schema.space.create('test') > +_ = box.space.test:create_index('pk') > + > +test_run:cmd('create server replica with rpl_master=default,\ > +              script="replication/anon1.lua"') > +test_run:cmd('start server replica') > + > +test_run:wait_lsn('replica', 'default') > +box.error.injection.set('ERRINJ_RELAY_SEND_DELAY', true) > + > +box.space.test:insert{1} > + > +test_run:switch('replica') > + > +test_run:wait_upstream(1, {status='disconnected'}) > +box.space.test:select{} > + > +fiber = require('fiber') > +f = fiber.new(function() box.cfg{replication_anon=false} end) > +test_run:wait_upstream(1, {status='register'}) > + > +test_run:switch('default') > +box.error.injection.set('ERRINJ_RELAY_SEND_DELAY', false) > +box.space.test:insert{2} > + > +test_run:switch('replica') > +test_run:wait_lsn('replica', 'default') > +f:status() > +box.space.test:select{} > + > +-- Cleanup > +test_run:switch('default') > +test_run:cmd('stop server replica') > +test_run:cmd('delete server replica') > +box.space.test:drop() > +box.schema.user.revoke('guest', 'replication') > diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg > index aea8b6157..aff5fda26 100644 > --- a/test/replication/suite.cfg > +++ b/test/replication/suite.cfg > @@ -1,5 +1,6 @@ >  { >      "anon.test.lua": {}, > +    "anon_register_gap.test.lua": {}, >      "gh-2991-misc-asserts-on-update.test.lua": {}, >      "gh-3111-misc-rebootstrap-from-ro-master.test.lua": {}, >      "gh-3160-misc-heartbeats-on-master-changes.test.lua": {}, > diff --git a/test/replication/suite.ini b/test/replication/suite.ini > index fc161700a..a9e44e8cf 100644 > --- a/test/replication/suite.ini > +++ b/test/replication/suite.ini > @@ -3,7 +3,7 @@ core = tarantool >  script =  master.lua >  description = tarantool/box, replication >  disabled = consistent.test.lua > -release_disabled = catch.test.lua errinj.test.lua gc.test.lua > gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua > qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua > sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua > gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua > gh-5144-qsync-dup-confirm.test.lua > gh-5167-qsync-rollback-snap.test.lua gh-5506-election-on-off.test.lua > gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua > +release_disabled = catch.test.lua errinj.test.lua gc.test.lua > gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua > qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua > sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua > gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua > gh-5144-qsync-dup-confirm.test.lua > gh-5167-qsync-rollback-snap.test.lua gh-5506-election-on-off.test.lua > gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua > anon_register_gap.test.lua >  config = suite.cfg >  lua_libs = lua/fast_replica.lua lua/rlimit.lua >  use_unix_sockets = True > -- Serge Petrenko