From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 7C15E6EC55; Tue, 15 Jun 2021 17:26:09 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 7C15E6EC55 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1623767169; bh=Zp591K2512RWqRyZajIYohU8ulohoaIHuCjIYABlroI=; h=To:Cc:References:Date:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=zMpdbg+4P6muc/A3ND76N2sUUWKX3Swf14KUVwg0Iv2Fc4eT64KCHR0Xmqe75mBMy p29k9yq5TifBtZwWXPyEdpW77a0Q3cCsw+A7ciiE2AZPI/lsNvFM299FUakjRAGD9F IX++J5l97rsgBPQkdo/OVTZl5iTDTPBbE2WtymHU= Received: from smtp51.i.mail.ru (smtp51.i.mail.ru [94.100.177.111]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 29C4E6EC55 for ; Tue, 15 Jun 2021 17:26:08 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 29C4E6EC55 Received: by smtp51.i.mail.ru with esmtpa (envelope-from ) id 1ltA11-0000qD-7e; Tue, 15 Jun 2021 17:26:07 +0300 To: Cyrill Gorcunov , tml Cc: Vladislav Shpilevoy References: <20210615135630.63465-1-gorcunov@gmail.com> <20210615135630.63465-2-gorcunov@gmail.com> Message-ID: Date: Tue, 15 Jun 2021 17:26:06 +0300 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: <20210615135630.63465-2-gorcunov@gmail.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-GB X-4EC0790: 10 X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD9D5B0DA836B685C54907A7AE9C1BA82BC67C1327DFB87C6A6182A05F53808504081985CB2880E97F3DDD538B31918F0E04E732FDCA7DA3B8C03227BFAA76E8199 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7CEF52BA550D6CAADEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637E6006D770ADD73CF8638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D82814AFE59DEE19628A44E6C37A421D62117882F4460429724CE54428C33FAD305F5C1EE8F4F765FCAA867293B0326636D2E47CDBA5A96583BD4B6F7A4D31EC0BC014FD901B82EE079FA2833FD35BB23D27C277FBC8AE2E8BF1175FABE1C0F9B6A471835C12D1D977C4224003CC836476EB9C4185024447017B076A6E789B0E975F5C1EE8F4F765FCFD4895D4057426303AA81AA40904B5D9CF19DD082D7633A078D18283394535A93AA81AA40904B5D98AA50765F790063772B997ABA695E58ED81D268191BDAD3D698AB9A7B718F8C4D1B931868CE1C5781A620F70A64A45A98AA50765F79006372E808ACE2090B5E1725E5C173C3A84C3C5EA940A35A165FF2DBA43225CD8A89F890A246B268E114E57739F23D657EF2BB5C8C57E37DE458BEDA766A37F9254B7 X-C1DE0DAB: 0D63561A33F958A58E8003D4A0D2B271AB6F0AB597CC37527E44862DB2CA9F6DD59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA75FBC5FED0552DA851410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34A5970F8A4685C8AB3CA5DE5BEAE6C539551B4994836E06AC33C3A0D0BFA04D72302F9225D94816C71D7E09C32AA3244CE017A407A45F57407494105E6D21DD94A995755A1445935EFACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2bioj6OL1iHTyIM081xxWClPp6A== X-Mailru-Sender: 583F1D7ACE8F49BD9DF7A8DAE6E2B08A6DE85298C5CF16B4834295E36CE59F5A65A204933B50BE7F424AE0EB1F3D1D21E2978F233C3FAE6EE63DB1732555E4A8EE80603BA4A5B0BC112434F685709FCF0DA7A0AF5A3A8387 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Serge Petrenko via Tarantool-patches Reply-To: Serge Petrenko Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" 15.06.2021 16:56, Cyrill Gorcunov пишет: > Currently we use synchro packets filtration based on their contents, > in particular by their xrow->replica_id value. Still there was a > question if we can optimize this moment and rather filter out all > packets coming from non-leader replica. > > Raft specification requires that only data from a current leader > should be applied to local WAL but doesn't put a concrete claim on > the data transport, ie how exactly rows are reaching replicas. This > implies that data propagation may reach replicas indirectly via transit > hops. Thus we drop applier->instance_id filtering and rely on > xrow->replica_id matching instead. > > In the test (inspired by Serge Petrenko's test) we recreate the situation > where replica3 obtains master's node data (which is a raft leader) > indirectly via replica2 node. > > Closes #6035 > > Co-developed-by: Serge Petrenko > Signed-off-by: Cyrill Gorcunov > --- Thanks for the fixes! Please, find a couple of finalization comments below. > src/box/applier.cc | 27 ++-- > src/lib/raft/raft.h | 7 - > .../gh-5445-leader-inconsistency.result | 15 ++ > .../gh-5445-leader-inconsistency.test.lua | 5 + > .../replication/gh-6035-applier-filter.result | 144 ++++++++++++++++++ > .../gh-6035-applier-filter.test.lua | 68 +++++++++ > test/replication/gh-6035-master.lua | 1 + > test/replication/gh-6035-node.lua | 35 +++++ > test/replication/gh-6035-replica1.lua | 1 + > test/replication/gh-6035-replica2.lua | 1 + > test/replication/suite.cfg | 3 + > 11 files changed, 288 insertions(+), 19 deletions(-) > create mode 100644 test/replication/gh-6035-applier-filter.result > create mode 100644 test/replication/gh-6035-applier-filter.test.lua > create mode 120000 test/replication/gh-6035-master.lua > create mode 100644 test/replication/gh-6035-node.lua > create mode 120000 test/replication/gh-6035-replica1.lua > create mode 120000 test/replication/gh-6035-replica2.lua > > diff --git a/src/box/applier.cc b/src/box/applier.cc > index 33181fdbf..d3430f582 100644 > --- a/src/box/applier.cc > +++ b/src/box/applier.cc > @@ -1027,21 +1027,24 @@ nopify:; > * Return 0 for success or -1 in case of an error. > */ > static int > -applier_apply_tx(struct applier *applier, struct stailq *rows) > +applier_apply_tx(struct stailq *rows) > { > /* > - * Rows received not directly from a leader are ignored. That is a > - * protection against the case when an old leader keeps sending data > - * around not knowing yet that it is not a leader anymore. > + * Initially we've been filtering out data if it came from > + * an applier which instance_id doesn't match raft->leader, > + * but this prevents from obtaining valid leader's data when > + * it comes from intermediate node. For example a series of > + * replica hops > * > - * XXX: it may be that this can be fine to apply leader transactions by > - * looking at their replica_id field if it is equal to leader id. That > - * can be investigated as an 'optimization'. Even though may not give > - * anything, because won't change total number of rows sent in the > - * network anyway. > + * master -> replica 1 -> replica 2 > + * > + * where each replica carries master's initiated transaction > + * in xrow->replica_id field and master's data get propagated > + * indirectly. > + * > + * Finally we dropped such "sender" filtration and use transaction > + * "initiator" filtration via xrow->replica_id only. > */ > - if (!raft_is_source_allowed(box_raft(), applier->instance_id)) > - return 0; > struct xrow_header *first_row = &stailq_first_entry(rows, > struct applier_tx_row, next)->row; > struct xrow_header *last_row; > @@ -1312,7 +1315,7 @@ applier_subscribe(struct applier *applier) > diag_raise(); > } > applier_signal_ack(applier); > - } else if (applier_apply_tx(applier, &rows) != 0) { > + } else if (applier_apply_tx(&rows) != 0) { > diag_raise(); > } > > diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h > index a8da564b0..fae30b03d 100644 > --- a/src/lib/raft/raft.h > +++ b/src/lib/raft/raft.h > @@ -236,13 +236,6 @@ raft_is_ro(const struct raft *raft) > return raft->is_enabled && raft->state != RAFT_STATE_LEADER; > } > > -/** See if the instance can accept rows from an instance with the given ID. */ > -static inline bool > -raft_is_source_allowed(const struct raft *raft, uint32_t source_id) > -{ > - return !raft->is_enabled || raft->leader == source_id; > -} > - > /** Check if Raft is enabled. */ > static inline bool > raft_is_enabled(const struct raft *raft) > diff --git a/test/replication/gh-5445-leader-inconsistency.result b/test/replication/gh-5445-leader-inconsistency.result > index 5c6169f50..8b9a4051a 100644 > --- a/test/replication/gh-5445-leader-inconsistency.result > +++ b/test/replication/gh-5445-leader-inconsistency.result > @@ -178,6 +178,14 @@ test_run:cmd('stop server '..leader) > is_possible_leader[leader_nr] = false > | --- > | ... > +-- And other node as well. > +test_run:cmd('stop server '..other) > + | --- > + | - true > + | ... > +is_possible_leader[other_nr] = false > + | --- > + | ... > > -- Emulate a situation when next_leader wins the elections. It can't do that in > -- this configuration, obviously, because it's behind the 'other' node, so set > @@ -195,6 +203,13 @@ assert(get_leader(is_possible_leader) == next_leader_nr) > | --- > | - true > | ... > +test_run:cmd('start server '..other..' with args="1 0.4 voter 2"') > + | --- > + | - true > + | ... > +is_possible_leader[other_nr] = true > + | --- > + | ... > test_run:switch(other) > | --- > | - true > diff --git a/test/replication/gh-5445-leader-inconsistency.test.lua b/test/replication/gh-5445-leader-inconsistency.test.lua > index e7952f5fa..b0b8baf36 100644 > --- a/test/replication/gh-5445-leader-inconsistency.test.lua > +++ b/test/replication/gh-5445-leader-inconsistency.test.lua > @@ -82,6 +82,9 @@ test_run:wait_cond(function() return box.space.test:get{2} ~= nil end) > test_run:switch('default') > test_run:cmd('stop server '..leader) > is_possible_leader[leader_nr] = false > +-- And other node as well. > +test_run:cmd('stop server '..other) > +is_possible_leader[other_nr] = false > > -- Emulate a situation when next_leader wins the elections. It can't do that in > -- this configuration, obviously, because it's behind the 'other' node, so set > @@ -93,6 +96,8 @@ is_possible_leader[leader_nr] = false > -- a situation when some rows from the old leader were not received). > test_run:cmd('start server '..next_leader..' with args="1 0.4 candidate 1"') > assert(get_leader(is_possible_leader) == next_leader_nr) > +test_run:cmd('start server '..other..' with args="1 0.4 voter 2"') > +is_possible_leader[other_nr] = true > test_run:switch(other) > -- New leader didn't know about the unconfirmed rows but still rolled them back. > test_run:wait_cond(function() return box.space.test:get{2} == nil end) > diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result > new file mode 100644 > index 000000000..2620e7b6f > --- /dev/null > +++ b/test/replication/gh-6035-applier-filter.result > @@ -0,0 +1,144 @@ > +-- test-run result file version 2 > +-- > +-- gh-6035: verify synchronous rows filtration in applier, > +-- we need to be sure that filtering synchronous rows is > +-- done via transaction initiator not sender (iow via > +-- xrow->replica_id). > +-- > +test_run = require('test_run').new() > + | --- > + | ... > + > +-- > +-- Prepare a scheme with transitional node > +-- > +-- master <=> replica1 => replica2 > +-- > +-- such as transaction initiated on the master node would > +-- be replicated to the replica2 via interim replica1 node. > +-- > + > +test_run:cmd('create server master with script="replication/gh-6035-master.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"') > + | --- > + | - true > + | ... > + > +test_run:cmd('start server master') > + | --- > + | - true > + | ... > +test_run:cmd('start server replica1') > + | --- > + | - true > + | ... > +test_run:cmd('start server replica2') > + | --- > + | - true > + | ... > + > +test_run:switch('replica2') > + | --- > + | - true > + | ... > +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}} > + | --- > + | ... > + > +-- > +-- Make the master to be RAFT leader, this drops connection > +-- to the replica2. > +test_run:switch('master') > + | --- > + | - true > + | ... > +box.cfg({ \ > + replication = { \ > + require('fio').cwd() .. "/master.sock", \ > + require('fio').cwd() .. "/replica1.sock", \ > + }, \ > + replication_synchro_quorum = 2, \ > + election_mode = 'manual', \ > +}) > + | --- > + | ... > + > +box.ctl.promote() > + | --- > + | ... > +_ = box.schema.space.create("sync", {is_sync = true}) > + | --- > + | ... > +_ = box.space.sync:create_index("pk") > + | --- > + | ... > +box.space.sync:insert{1} > + | --- > + | - [1] > + | ... > + > +-- > +-- The first hop is replica1. > +test_run:switch('replica1') > + | --- > + | - true > + | ... > +box.space.sync:select{} > + | --- > + | - - [1] > + | ... You need to wait for the space creation, just like you do below. Otherwise the test'll be flaky. Also, please see a comment regarding wait_lsn vs wait_cond below. > + > +-- > +-- And the second hop is replica2 where > +-- replica1 replicated the data to us. > +test_run:switch('replica2') > + | --- > + | - true > + | ... > +test_run:wait_cond(function() return \ > + box.space.sync ~= nil and \ > + box.space.sync:get{1} ~= nil and \ > + box.space.sync:get{1}[1] == 1 end, 100) > + | --- > + | - true > + | ... I suggest you use wait_lsn('replica2', 'master') here instead of this bulky wait_cond. First of all, it takes a single line, instead of 4 lines. Secondly, you forgot to test `box.space.sync.index.pk ~= nil`, meaning the test will still fail occasionally, when index creation doesn't replicate in time. > +box.space.sync:select{} > + | --- > + | - - [1] > + | ... > + > +test_run:switch('default') > + | --- > + | - true > + | ... > +test_run:cmd('stop server master') > + | --- > + | - true > + | ... > +test_run:cmd('delete server master') > + | --- > + | - true > + | ... > +test_run:cmd('stop server replica1') > + | --- > + | - true > + | ... > +test_run:cmd('delete server replica1') > + | --- > + | - true > + | ... > +test_run:cmd('stop server replica2') > + | --- > + | - true > + | ... > +test_run:cmd('delete server replica2') > + | --- > + | - true > + | ... > diff --git a/test/replication/gh-6035-applier-filter.test.lua b/test/replication/gh-6035-applier-filter.test.lua > new file mode 100644 > index 000000000..9bfd91288 > --- /dev/null > +++ b/test/replication/gh-6035-applier-filter.test.lua > @@ -0,0 +1,68 @@ > +-- > +-- gh-6035: verify synchronous rows filtration in applier, > +-- we need to be sure that filtering synchronous rows is > +-- done via transaction initiator not sender (iow via > +-- xrow->replica_id). > +-- > +test_run = require('test_run').new() > + > +-- > +-- Prepare a scheme with transitional node > +-- > +-- master <=> replica1 => replica2 > +-- > +-- such as transaction initiated on the master node would > +-- be replicated to the replica2 via interim replica1 node. > +-- > + > +test_run:cmd('create server master with script="replication/gh-6035-master.lua"') > +test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"') > +test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"') > + > +test_run:cmd('start server master') > +test_run:cmd('start server replica1') > +test_run:cmd('start server replica2') > + > +test_run:switch('replica2') > +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}} > + > +-- > +-- Make the master to be RAFT leader, this drops connection > +-- to the replica2. > +test_run:switch('master') > +box.cfg({ \ > + replication = { \ > + require('fio').cwd() .. "/master.sock", \ > + require('fio').cwd() .. "/replica1.sock", \ > + }, \ > + replication_synchro_quorum = 2, \ > + election_mode = 'manual', \ > +}) > + > +box.ctl.promote() > +_ = box.schema.space.create("sync", {is_sync = true}) > +_ = box.space.sync:create_index("pk") > +box.space.sync:insert{1} > + > +-- > +-- The first hop is replica1. > +test_run:switch('replica1') > +box.space.sync:select{} > + > +-- > +-- And the second hop is replica2 where > +-- replica1 replicated the data to us. > +test_run:switch('replica2') > +test_run:wait_cond(function() return \ > + box.space.sync ~= nil and \ > + box.space.sync:get{1} ~= nil and \ > + box.space.sync:get{1}[1] == 1 end, 100) > +box.space.sync:select{} > + > +test_run:switch('default') > +test_run:cmd('stop server master') > +test_run:cmd('delete server master') > +test_run:cmd('stop server replica1') > +test_run:cmd('delete server replica1') > +test_run:cmd('stop server replica2') > +test_run:cmd('delete server replica2') > diff --git a/test/replication/gh-6035-master.lua b/test/replication/gh-6035-master.lua > new file mode 120000 > index 000000000..f7ede7ef2 > --- /dev/null > +++ b/test/replication/gh-6035-master.lua > @@ -0,0 +1 @@ > +gh-6035-node.lua > \ No newline at end of file > diff --git a/test/replication/gh-6035-node.lua b/test/replication/gh-6035-node.lua > new file mode 100644 > index 000000000..e3819471a > --- /dev/null > +++ b/test/replication/gh-6035-node.lua > @@ -0,0 +1,35 @@ > +local SOCKET_DIR = require('fio').cwd() > +local INSTANCE_ID = string.match(arg[0], "gh%-6035%-(.+)%.lua") > + > +local function unix_socket(name) > + return SOCKET_DIR .. "/" .. name .. '.sock'; > +end > + > +require('console').listen(os.getenv('ADMIN')) > + > +if INSTANCE_ID == "master" then > + box.cfg({ > + listen = unix_socket("master"), > + }) > +elseif INSTANCE_ID == "replica1" then > + box.cfg({ > + listen = unix_socket("replica1"), > + replication = { > + unix_socket("master"), > + unix_socket("replica1") > + }, > + election_mode = 'voter' > + }) > +else > + assert(INSTANCE_ID == "replica2") > + box.cfg({ > + replication = { > + unix_socket("master"), > + }, > + election_mode = 'voter' > + }) > +end > + > +box.once("bootstrap", function() > + box.schema.user.grant('guest', 'super') > +end) > diff --git a/test/replication/gh-6035-replica1.lua b/test/replication/gh-6035-replica1.lua > new file mode 120000 > index 000000000..f7ede7ef2 > --- /dev/null > +++ b/test/replication/gh-6035-replica1.lua > @@ -0,0 +1 @@ > +gh-6035-node.lua > \ No newline at end of file > diff --git a/test/replication/gh-6035-replica2.lua b/test/replication/gh-6035-replica2.lua > new file mode 120000 > index 000000000..f7ede7ef2 > --- /dev/null > +++ b/test/replication/gh-6035-replica2.lua > @@ -0,0 +1 @@ > +gh-6035-node.lua > \ No newline at end of file > diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg > index 27eab20c2..55ec022ff 100644 > --- a/test/replication/suite.cfg > +++ b/test/replication/suite.cfg > @@ -47,6 +47,9 @@ > "gh-6032-promote-wal-write.test.lua": {}, > "gh-6057-qsync-confirm-async-no-wal.test.lua": {}, > "gh-6094-rs-uuid-mismatch.test.lua": {}, > + "gh-6035-applier-filter.test.lua": { > + "memtx": {"engine": "memtx"} > + }, > "*": { > "memtx": {"engine": "memtx"}, > "vinyl": {"engine": "vinyl"} -- Serge Petrenko