From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id CFB346EC55; Tue, 15 Jun 2021 13:36:57 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org CFB346EC55 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1623753417; bh=t3LuIliYN2rRrHm4sxupE44vS9ddp414Y62PJ4hhN8Q=; h=To:Cc:References:Date:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=a66TE1awVAHq+495T4LBToBtzNEIwyhzYj034tYVEjKv4K1i/2VC2/FPihN7Pk2o2 rJb1X4bRpxZodZ4r0W7DtAjmkfelkfus0+PNLgNjXkVWbrZGtjFUvqENSOyk5upp+C K7q8hjXo3Tmx4QizFCvWtnuXKfm9DJ86B8TE7u5A= Received: from smtp56.i.mail.ru (smtp56.i.mail.ru [217.69.128.36]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 51B826EC55 for ; Tue, 15 Jun 2021 13:36:56 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 51B826EC55 Received: by smtp56.i.mail.ru with esmtpa (envelope-from ) id 1lt6RC-0005t5-Jd; Tue, 15 Jun 2021 13:36:55 +0300 To: Cyrill Gorcunov , tml Cc: Vladislav Shpilevoy References: <20210611152259.205627-1-gorcunov@gmail.com> <20210611152259.205627-2-gorcunov@gmail.com> Message-ID: <814a002a-6cde-1ec9-e201-c13637c58207@tarantool.org> Date: Tue, 15 Jun 2021 13:36:54 +0300 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: <20210611152259.205627-2-gorcunov@gmail.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: ru X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD9D5B0DA836B685C54907A7AE9C1BA82BC67C1327DFB87C6A6182A05F53808504061692E74C5D1C025DBC7F3577E9E2CB8633086DC670503B4533CFB4889C9202F X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE796AA83661EF29BCEEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637B23888C9749F3CAC8638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D8F3C31A1A1C901ECE00D01F7CA6D9F86D117882F4460429724CE54428C33FAD305F5C1EE8F4F765FCAA867293B0326636D2E47CDBA5A96583BD4B6F7A4D31EC0BC014FD901B82EE079FA2833FD35BB23D27C277FBC8AE2E8BAA867293B0326636D2E47CDBA5A96583BA9C0B312567BB231DD303D21008E29813377AFFFEAFD269A417C69337E82CC2E827F84554CEF50127C277FBC8AE2E8BA83251EDC214901ED5E8D9A59859A8B6B67FC62909E22F84089D37D7C0E48F6C5571747095F342E88FB05168BE4CE3AF X-C1DE0DAB: 0D63561A33F958A564ECCA29BDA2359626C0A050DE3EBB5A83052D2AB8174CE3D59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA75FBC5FED0552DA851410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D349A401E2B4D763A4AEE70FE8B203F9A9EF842987B634EE8112A41E4699AC07D473D1D1412CC54E2CE1D7E09C32AA3244C1CC7ABB73406B6C97BA80D34D96FB8A1D9ADFF0C0BDB8D1FFACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2bioj6OL1iHTyIM2pFV8TVWExCQ== X-Mailru-Sender: 3B9A0136629DC9125D61937A2360A4468BAD4BE1AC841F65D2D38C443AD6BA23A4B7C1C391C1FF39424AE0EB1F3D1D21E2978F233C3FAE6EE63DB1732555E4A8EE80603BA4A5B0BC112434F685709FCF0DA7A0AF5A3A8387 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH v8 1/2] applier: filter incoming synchro packets via transaction initiator X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Serge Petrenko via Tarantool-patches Reply-To: Serge Petrenko Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" 11.06.2021 18:22, Cyrill Gorcunov пишет: > Currently we use synchro packets filtration based on their contents, > in particular by their xrow->replica_id value. Still there was a > question if we can optimize this moment and rather filter out all > packets coming from non-leader replica. > > Raft specification requires that only data from a current leader > should be applied to local WAL but doesn't put a concrete claim on > the data transport, ie how exactly rows are reaching replicas. This > implies that data propagation may reach replicas indirectly via transit > hops. Thus we drop applier->instance_id filtering and rely on > xrow->replica_id matching instead. > > In the test (inspired by Serge Petrenko's test) we recreate the situation > where replica3 obtains master's node data (which is a raft leader) > indirectly via replica2 node. > > Closes #6035 > > Signed-off-by: Cyrill Gorcunov > --- > src/box/applier.cc | 27 ++-- > src/lib/raft/raft.h | 7 - > .../gh-5445-leader-inconsistency.result | 15 ++ > .../gh-5445-leader-inconsistency.test.lua | 5 + > .../replication/gh-6035-applier-filter.result | 137 ++++++++++++++++++ > .../gh-6035-applier-filter.test.lua | 64 ++++++++ > test/replication/gh-6035-master.lua | 1 + > test/replication/gh-6035-node.lua | 35 +++++ > test/replication/gh-6035-replica1.lua | 1 + > test/replication/gh-6035-replica2.lua | 1 + > test/replication/suite.cfg | 3 + > 11 files changed, 277 insertions(+), 19 deletions(-) > create mode 100644 test/replication/gh-6035-applier-filter.result > create mode 100644 test/replication/gh-6035-applier-filter.test.lua > create mode 120000 test/replication/gh-6035-master.lua > create mode 100644 test/replication/gh-6035-node.lua > create mode 120000 test/replication/gh-6035-replica1.lua > create mode 120000 test/replication/gh-6035-replica2.lua > > diff --git a/src/box/applier.cc b/src/box/applier.cc > index 33181fdbf..d3430f582 100644 > --- a/src/box/applier.cc > +++ b/src/box/applier.cc > @@ -1027,21 +1027,24 @@ nopify:; > * Return 0 for success or -1 in case of an error. > */ > static int > -applier_apply_tx(struct applier *applier, struct stailq *rows) > +applier_apply_tx(struct stailq *rows) > { > /* > - * Rows received not directly from a leader are ignored. That is a > - * protection against the case when an old leader keeps sending data > - * around not knowing yet that it is not a leader anymore. > + * Initially we've been filtering out data if it came from > + * an applier which instance_id doesn't match raft->leader, > + * but this prevents from obtaining valid leader's data when > + * it comes from intermediate node. For example a series of > + * replica hops > * > - * XXX: it may be that this can be fine to apply leader transactions by > - * looking at their replica_id field if it is equal to leader id. That > - * can be investigated as an 'optimization'. Even though may not give > - * anything, because won't change total number of rows sent in the > - * network anyway. > + * master -> replica 1 -> replica 2 > + * > + * where each replica carries master's initiated transaction > + * in xrow->replica_id field and master's data get propagated > + * indirectly. > + * > + * Finally we dropped such "sender" filtration and use transaction > + * "initiator" filtration via xrow->replica_id only. > */ > - if (!raft_is_source_allowed(box_raft(), applier->instance_id)) > - return 0; > struct xrow_header *first_row = &stailq_first_entry(rows, > struct applier_tx_row, next)->row; > struct xrow_header *last_row; > @@ -1312,7 +1315,7 @@ applier_subscribe(struct applier *applier) > diag_raise(); > } > applier_signal_ack(applier); > - } else if (applier_apply_tx(applier, &rows) != 0) { > + } else if (applier_apply_tx(&rows) != 0) { > diag_raise(); > } > > diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h > index a8da564b0..fae30b03d 100644 > --- a/src/lib/raft/raft.h > +++ b/src/lib/raft/raft.h > @@ -236,13 +236,6 @@ raft_is_ro(const struct raft *raft) > return raft->is_enabled && raft->state != RAFT_STATE_LEADER; > } > > -/** See if the instance can accept rows from an instance with the given ID. */ > -static inline bool > -raft_is_source_allowed(const struct raft *raft, uint32_t source_id) > -{ > - return !raft->is_enabled || raft->leader == source_id; > -} > - > /** Check if Raft is enabled. */ > static inline bool > raft_is_enabled(const struct raft *raft) > diff --git a/test/replication/gh-5445-leader-inconsistency.result b/test/replication/gh-5445-leader-inconsistency.result > index 5c6169f50..38d0b097c 100644 > --- a/test/replication/gh-5445-leader-inconsistency.result > +++ b/test/replication/gh-5445-leader-inconsistency.result > @@ -175,9 +175,17 @@ test_run:cmd('stop server '..leader) > | --- > | - true > | ... > +-- And other node as well so it would notice new term on restart. > is_possible_leader[leader_nr] = false > | --- > | ... > +test_run:cmd('stop server '..other) > + | --- > + | - true > + | ... > +is_possible_leader[other_nr] = false > + | --- > + | ... > > -- Emulate a situation when next_leader wins the elections. It can't do that in > -- this configuration, obviously, because it's behind the 'other' node, so set > @@ -195,6 +203,13 @@ assert(get_leader(is_possible_leader) == next_leader_nr) > | --- > | - true > | ... > +test_run:cmd('start server '..other..' with args="1 0.4 voter 2"') > + | --- > + | - true > + | ... > +is_possible_leader[other_nr] = true > + | --- > + | ... > test_run:switch(other) > | --- > | - true > diff --git a/test/replication/gh-5445-leader-inconsistency.test.lua b/test/replication/gh-5445-leader-inconsistency.test.lua > index e7952f5fa..fad101881 100644 > --- a/test/replication/gh-5445-leader-inconsistency.test.lua > +++ b/test/replication/gh-5445-leader-inconsistency.test.lua > @@ -81,7 +81,10 @@ test_run:wait_cond(function() return box.space.test:get{2} ~= nil end) > -- Old leader is gone. > test_run:switch('default') > test_run:cmd('stop server '..leader) > +-- And other node as well so it would notice new term on restart. > is_possible_leader[leader_nr] = false > +test_run:cmd('stop server '..other) As I said in the previous review, let's move the comment one line below. And you shouldn't mention term here. The node is stopped so that it doesn't replicate [2] to next_leader. > +is_possible_leader[other_nr] = false > > -- Emulate a situation when next_leader wins the elections. It can't do that in > -- this configuration, obviously, because it's behind the 'other' node, so set > @@ -93,6 +96,8 @@ is_possible_leader[leader_nr] = false > -- a situation when some rows from the old leader were not received). > test_run:cmd('start server '..next_leader..' with args="1 0.4 candidate 1"') > assert(get_leader(is_possible_leader) == next_leader_nr) > +test_run:cmd('start server '..other..' with args="1 0.4 voter 2"') > +is_possible_leader[other_nr] = true > test_run:switch(other) > -- New leader didn't know about the unconfirmed rows but still rolled them back. > test_run:wait_cond(function() return box.space.test:get{2} == nil end) > diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result > new file mode 100644 > index 000000000..077cd46e5 > --- /dev/null > +++ b/test/replication/gh-6035-applier-filter.result > @@ -0,0 +1,137 @@ > +-- test-run result file version 2 > +-- > +-- gh-6035: verify synchronous rows filtration in applier, > +-- we need to be sure that filtering synchronous rows is > +-- done via transaction initiator not sender (iow via > +-- xrow->replica_id). > +-- > +test_run = require('test_run').new() > + | --- > + | ... > + > +-- > +-- Prepare a scheme with transitional node > +-- > +-- master <=> replica1 => replica2 > +-- > +-- such as transaction initiated on the master node would > +-- be replicated to the replica2 via interim replica1 node. > +-- > + > +test_run:cmd('create server master with script="replication/gh-6035-master.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"') > + | --- > + | - true > + | ... > + > +test_run:cmd('start server master') > + | --- > + | - true > + | ... > +test_run:cmd('start server replica1') > + | --- > + | - true > + | ... > +test_run:cmd('start server replica2') > + | --- > + | - true > + | ... > + > +test_run:switch('replica2') > + | --- > + | - true > + | ... > +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}} > + | --- > + | ... > + > +-- > +-- Make the master to be RAFT leader. > +test_run:switch('master') > + | --- > + | - true > + | ... > +box.cfg({ \ > + replication = { \ > + require('fio').cwd() .. "/master.sock", \ > + require('fio').cwd() .. "/replica1.sock", \ > + }, \ > + replication_synchro_quorum = 2, \ > + election_mode = 'manual', \ > +}) > + | --- > + | ... > + > +box.ctl.promote() > + | --- > + | ... > +_ = box.schema.space.create("sync", {is_sync = true}) > + | --- > + | ... > +_ = box.space.sync:create_index("pk") > + | --- > + | ... > +box.space.sync:insert{1} > + | --- > + | - [1] > + | ... > + > +-- > +-- The first hop is replica1. > +test_run:switch('replica1') > + | --- > + | - true > + | ... > +box.space.sync:select{} > + | --- > + | - - [1] > + | ... > + > +-- > +-- And the second hop is replica2 where > +-- replica1 replicated the data to us. > +test_run:switch('replica2') > + | --- > + | - true > + | ... > +--test_run:wait_cond(function() return box.space.sync:get{1} ~= nil end, 10) > +box.space.sync:select{} > + | --- > + | - [] > + | ... > + > +test_run:switch('default') > + | --- > + | - true > + | ... > +test_run:cmd('stop server master') > + | --- > + | - true > + | ... > +test_run:cmd('delete server master') > + | --- > + | - true > + | ... > +test_run:cmd('stop server replica1') > + | --- > + | - true > + | ... > +test_run:cmd('delete server replica1') > + | --- > + | - true > + | ... > +test_run:cmd('stop server replica2') > + | --- > + | - true > + | ... > +test_run:cmd('delete server replica2') > + | --- > + | - true > + | ... > diff --git a/test/replication/gh-6035-applier-filter.test.lua b/test/replication/gh-6035-applier-filter.test.lua > new file mode 100644 > index 000000000..4e72abe5f > --- /dev/null > +++ b/test/replication/gh-6035-applier-filter.test.lua > @@ -0,0 +1,64 @@ > +-- > +-- gh-6035: verify synchronous rows filtration in applier, > +-- we need to be sure that filtering synchronous rows is > +-- done via transaction initiator not sender (iow via > +-- xrow->replica_id). > +-- > +test_run = require('test_run').new() > + > +-- > +-- Prepare a scheme with transitional node > +-- > +-- master <=> replica1 => replica2 > +-- > +-- such as transaction initiated on the master node would > +-- be replicated to the replica2 via interim replica1 node. > +-- > + > +test_run:cmd('create server master with script="replication/gh-6035-master.lua"') > +test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"') > +test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"') > + > +test_run:cmd('start server master') > +test_run:cmd('start server replica1') > +test_run:cmd('start server replica2') > + > +test_run:switch('replica2') > +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}} > + > +-- > +-- Make the master to be RAFT leader. > +test_run:switch('master') > +box.cfg({ \ > + replication = { \ > + require('fio').cwd() .. "/master.sock", \ > + require('fio').cwd() .. "/replica1.sock", \ > + }, \ > + replication_synchro_quorum = 2, \ > + election_mode = 'manual', \ > +}) > + > +box.ctl.promote() > +_ = box.schema.space.create("sync", {is_sync = true}) > +_ = box.space.sync:create_index("pk") > +box.space.sync:insert{1} > + > +-- > +-- The first hop is replica1. > +test_run:switch('replica1') > +box.space.sync:select{} > + > +-- > +-- And the second hop is replica2 where > +-- replica1 replicated the data to us. > +test_run:switch('replica2') > +--test_run:wait_cond(function() return box.space.sync:get{1} ~= nil end, 10) > +box.space.sync:select{} > + > +test_run:switch('default') > +test_run:cmd('stop server master') > +test_run:cmd('delete server master') > +test_run:cmd('stop server replica1') > +test_run:cmd('delete server replica1') > +test_run:cmd('stop server replica2') > +test_run:cmd('delete server replica2') > diff --git a/test/replication/gh-6035-master.lua b/test/replication/gh-6035-master.lua > new file mode 120000 > index 000000000..f7ede7ef2 > --- /dev/null > +++ b/test/replication/gh-6035-master.lua > @@ -0,0 +1 @@ > +gh-6035-node.lua > \ No newline at end of file > diff --git a/test/replication/gh-6035-node.lua b/test/replication/gh-6035-node.lua > new file mode 100644 > index 000000000..e3819471a > --- /dev/null > +++ b/test/replication/gh-6035-node.lua > @@ -0,0 +1,35 @@ > +local SOCKET_DIR = require('fio').cwd() > +local INSTANCE_ID = string.match(arg[0], "gh%-6035%-(.+)%.lua") > + > +local function unix_socket(name) > + return SOCKET_DIR .. "/" .. name .. '.sock'; > +end > + > +require('console').listen(os.getenv('ADMIN')) > + > +if INSTANCE_ID == "master" then > + box.cfg({ > + listen = unix_socket("master"), > + }) > +elseif INSTANCE_ID == "replica1" then > + box.cfg({ > + listen = unix_socket("replica1"), > + replication = { > + unix_socket("master"), > + unix_socket("replica1") > + }, > + election_mode = 'voter' > + }) > +else > + assert(INSTANCE_ID == "replica2") > + box.cfg({ > + replication = { > + unix_socket("master"), > + }, > + election_mode = 'voter' > + }) > +end > + > +box.once("bootstrap", function() > + box.schema.user.grant('guest', 'super') > +end) > diff --git a/test/replication/gh-6035-replica1.lua b/test/replication/gh-6035-replica1.lua > new file mode 120000 > index 000000000..f7ede7ef2 > --- /dev/null > +++ b/test/replication/gh-6035-replica1.lua > @@ -0,0 +1 @@ > +gh-6035-node.lua > \ No newline at end of file > diff --git a/test/replication/gh-6035-replica2.lua b/test/replication/gh-6035-replica2.lua > new file mode 120000 > index 000000000..f7ede7ef2 > --- /dev/null > +++ b/test/replication/gh-6035-replica2.lua > @@ -0,0 +1 @@ > +gh-6035-node.lua > \ No newline at end of file > diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg > index 27eab20c2..55ec022ff 100644 > --- a/test/replication/suite.cfg > +++ b/test/replication/suite.cfg > @@ -47,6 +47,9 @@ > "gh-6032-promote-wal-write.test.lua": {}, > "gh-6057-qsync-confirm-async-no-wal.test.lua": {}, > "gh-6094-rs-uuid-mismatch.test.lua": {}, > + "gh-6035-applier-filter.test.lua": { > + "memtx": {"engine": "memtx"} > + }, > "*": { > "memtx": {"engine": "memtx"}, > "vinyl": {"engine": "vinyl"} -- Serge Petrenko