* [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator @ 2021-06-15 13:56 Cyrill Gorcunov via Tarantool-patches 2021-06-15 13:56 ` [Tarantool-patches] [PATCH v9 1/1] " Cyrill Gorcunov via Tarantool-patches ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-15 13:56 UTC (permalink / raw) To: tml; +Cc: Vladislav Shpilevoy Guys, here is a patch which implements the filtration we are interested in. I switched back to the former version with a few updates: - fixed comment in gh-5445-leader-inconsistency test as Serge pointed - added comments to the new patch pointing what's going on - we tried to rework the test with Serge but it didn't make thnigs much easier so I left the former approach - we use only memtx engine as Vlad asked Please give it a shot. issue https://github.com/tarantool/tarantool/issues/6035 branch gorcunov/gh-6035-applier-filter-9 Cyrill Gorcunov (1): applier: filter incoming synchro packets via transaction initiator src/box/applier.cc | 27 ++-- src/lib/raft/raft.h | 7 - .../gh-5445-leader-inconsistency.result | 15 ++ .../gh-5445-leader-inconsistency.test.lua | 5 + .../replication/gh-6035-applier-filter.result | 144 ++++++++++++++++++ .../gh-6035-applier-filter.test.lua | 68 +++++++++ test/replication/gh-6035-master.lua | 1 + test/replication/gh-6035-node.lua | 35 +++++ test/replication/gh-6035-replica1.lua | 1 + test/replication/gh-6035-replica2.lua | 1 + test/replication/suite.cfg | 3 + 11 files changed, 288 insertions(+), 19 deletions(-) create mode 100644 test/replication/gh-6035-applier-filter.result create mode 100644 test/replication/gh-6035-applier-filter.test.lua create mode 120000 test/replication/gh-6035-master.lua create mode 100644 test/replication/gh-6035-node.lua create mode 120000 test/replication/gh-6035-replica1.lua create mode 120000 test/replication/gh-6035-replica2.lua base-commit: 9fb95cfd78403b33cacbe765c35036a0532b2c02 -- 2.31.1 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator 2021-06-15 13:56 [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator Cyrill Gorcunov via Tarantool-patches @ 2021-06-15 13:56 ` Cyrill Gorcunov via Tarantool-patches 2021-06-15 14:26 ` Serge Petrenko via Tarantool-patches 2021-06-18 21:49 ` [Tarantool-patches] [PATCH v9 0/1] " Vladislav Shpilevoy via Tarantool-patches 2021-06-21 21:06 ` Vladislav Shpilevoy via Tarantool-patches 2 siblings, 1 reply; 15+ messages in thread From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-15 13:56 UTC (permalink / raw) To: tml; +Cc: Vladislav Shpilevoy Currently we use synchro packets filtration based on their contents, in particular by their xrow->replica_id value. Still there was a question if we can optimize this moment and rather filter out all packets coming from non-leader replica. Raft specification requires that only data from a current leader should be applied to local WAL but doesn't put a concrete claim on the data transport, ie how exactly rows are reaching replicas. This implies that data propagation may reach replicas indirectly via transit hops. Thus we drop applier->instance_id filtering and rely on xrow->replica_id matching instead. In the test (inspired by Serge Petrenko's test) we recreate the situation where replica3 obtains master's node data (which is a raft leader) indirectly via replica2 node. Closes #6035 Co-developed-by: Serge Petrenko <sergepetrenko@tarantool.org> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> --- src/box/applier.cc | 27 ++-- src/lib/raft/raft.h | 7 - .../gh-5445-leader-inconsistency.result | 15 ++ .../gh-5445-leader-inconsistency.test.lua | 5 + .../replication/gh-6035-applier-filter.result | 144 ++++++++++++++++++ .../gh-6035-applier-filter.test.lua | 68 +++++++++ test/replication/gh-6035-master.lua | 1 + test/replication/gh-6035-node.lua | 35 +++++ test/replication/gh-6035-replica1.lua | 1 + test/replication/gh-6035-replica2.lua | 1 + test/replication/suite.cfg | 3 + 11 files changed, 288 insertions(+), 19 deletions(-) create mode 100644 test/replication/gh-6035-applier-filter.result create mode 100644 test/replication/gh-6035-applier-filter.test.lua create mode 120000 test/replication/gh-6035-master.lua create mode 100644 test/replication/gh-6035-node.lua create mode 120000 test/replication/gh-6035-replica1.lua create mode 120000 test/replication/gh-6035-replica2.lua diff --git a/src/box/applier.cc b/src/box/applier.cc index 33181fdbf..d3430f582 100644 --- a/src/box/applier.cc +++ b/src/box/applier.cc @@ -1027,21 +1027,24 @@ nopify:; * Return 0 for success or -1 in case of an error. */ static int -applier_apply_tx(struct applier *applier, struct stailq *rows) +applier_apply_tx(struct stailq *rows) { /* - * Rows received not directly from a leader are ignored. That is a - * protection against the case when an old leader keeps sending data - * around not knowing yet that it is not a leader anymore. + * Initially we've been filtering out data if it came from + * an applier which instance_id doesn't match raft->leader, + * but this prevents from obtaining valid leader's data when + * it comes from intermediate node. For example a series of + * replica hops * - * XXX: it may be that this can be fine to apply leader transactions by - * looking at their replica_id field if it is equal to leader id. That - * can be investigated as an 'optimization'. Even though may not give - * anything, because won't change total number of rows sent in the - * network anyway. + * master -> replica 1 -> replica 2 + * + * where each replica carries master's initiated transaction + * in xrow->replica_id field and master's data get propagated + * indirectly. + * + * Finally we dropped such "sender" filtration and use transaction + * "initiator" filtration via xrow->replica_id only. */ - if (!raft_is_source_allowed(box_raft(), applier->instance_id)) - return 0; struct xrow_header *first_row = &stailq_first_entry(rows, struct applier_tx_row, next)->row; struct xrow_header *last_row; @@ -1312,7 +1315,7 @@ applier_subscribe(struct applier *applier) diag_raise(); } applier_signal_ack(applier); - } else if (applier_apply_tx(applier, &rows) != 0) { + } else if (applier_apply_tx(&rows) != 0) { diag_raise(); } diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h index a8da564b0..fae30b03d 100644 --- a/src/lib/raft/raft.h +++ b/src/lib/raft/raft.h @@ -236,13 +236,6 @@ raft_is_ro(const struct raft *raft) return raft->is_enabled && raft->state != RAFT_STATE_LEADER; } -/** See if the instance can accept rows from an instance with the given ID. */ -static inline bool -raft_is_source_allowed(const struct raft *raft, uint32_t source_id) -{ - return !raft->is_enabled || raft->leader == source_id; -} - /** Check if Raft is enabled. */ static inline bool raft_is_enabled(const struct raft *raft) diff --git a/test/replication/gh-5445-leader-inconsistency.result b/test/replication/gh-5445-leader-inconsistency.result index 5c6169f50..8b9a4051a 100644 --- a/test/replication/gh-5445-leader-inconsistency.result +++ b/test/replication/gh-5445-leader-inconsistency.result @@ -178,6 +178,14 @@ test_run:cmd('stop server '..leader) is_possible_leader[leader_nr] = false | --- | ... +-- And other node as well. +test_run:cmd('stop server '..other) + | --- + | - true + | ... +is_possible_leader[other_nr] = false + | --- + | ... -- Emulate a situation when next_leader wins the elections. It can't do that in -- this configuration, obviously, because it's behind the 'other' node, so set @@ -195,6 +203,13 @@ assert(get_leader(is_possible_leader) == next_leader_nr) | --- | - true | ... +test_run:cmd('start server '..other..' with args="1 0.4 voter 2"') + | --- + | - true + | ... +is_possible_leader[other_nr] = true + | --- + | ... test_run:switch(other) | --- | - true diff --git a/test/replication/gh-5445-leader-inconsistency.test.lua b/test/replication/gh-5445-leader-inconsistency.test.lua index e7952f5fa..b0b8baf36 100644 --- a/test/replication/gh-5445-leader-inconsistency.test.lua +++ b/test/replication/gh-5445-leader-inconsistency.test.lua @@ -82,6 +82,9 @@ test_run:wait_cond(function() return box.space.test:get{2} ~= nil end) test_run:switch('default') test_run:cmd('stop server '..leader) is_possible_leader[leader_nr] = false +-- And other node as well. +test_run:cmd('stop server '..other) +is_possible_leader[other_nr] = false -- Emulate a situation when next_leader wins the elections. It can't do that in -- this configuration, obviously, because it's behind the 'other' node, so set @@ -93,6 +96,8 @@ is_possible_leader[leader_nr] = false -- a situation when some rows from the old leader were not received). test_run:cmd('start server '..next_leader..' with args="1 0.4 candidate 1"') assert(get_leader(is_possible_leader) == next_leader_nr) +test_run:cmd('start server '..other..' with args="1 0.4 voter 2"') +is_possible_leader[other_nr] = true test_run:switch(other) -- New leader didn't know about the unconfirmed rows but still rolled them back. test_run:wait_cond(function() return box.space.test:get{2} == nil end) diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result new file mode 100644 index 000000000..2620e7b6f --- /dev/null +++ b/test/replication/gh-6035-applier-filter.result @@ -0,0 +1,144 @@ +-- test-run result file version 2 +-- +-- gh-6035: verify synchronous rows filtration in applier, +-- we need to be sure that filtering synchronous rows is +-- done via transaction initiator not sender (iow via +-- xrow->replica_id). +-- +test_run = require('test_run').new() + | --- + | ... + +-- +-- Prepare a scheme with transitional node +-- +-- master <=> replica1 => replica2 +-- +-- such as transaction initiated on the master node would +-- be replicated to the replica2 via interim replica1 node. +-- + +test_run:cmd('create server master with script="replication/gh-6035-master.lua"') + | --- + | - true + | ... +test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"') + | --- + | - true + | ... +test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"') + | --- + | - true + | ... + +test_run:cmd('start server master') + | --- + | - true + | ... +test_run:cmd('start server replica1') + | --- + | - true + | ... +test_run:cmd('start server replica2') + | --- + | - true + | ... + +test_run:switch('replica2') + | --- + | - true + | ... +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}} + | --- + | ... + +-- +-- Make the master to be RAFT leader, this drops connection +-- to the replica2. +test_run:switch('master') + | --- + | - true + | ... +box.cfg({ \ + replication = { \ + require('fio').cwd() .. "/master.sock", \ + require('fio').cwd() .. "/replica1.sock", \ + }, \ + replication_synchro_quorum = 2, \ + election_mode = 'manual', \ +}) + | --- + | ... + +box.ctl.promote() + | --- + | ... +_ = box.schema.space.create("sync", {is_sync = true}) + | --- + | ... +_ = box.space.sync:create_index("pk") + | --- + | ... +box.space.sync:insert{1} + | --- + | - [1] + | ... + +-- +-- The first hop is replica1. +test_run:switch('replica1') + | --- + | - true + | ... +box.space.sync:select{} + | --- + | - - [1] + | ... + +-- +-- And the second hop is replica2 where +-- replica1 replicated the data to us. +test_run:switch('replica2') + | --- + | - true + | ... +test_run:wait_cond(function() return \ + box.space.sync ~= nil and \ + box.space.sync:get{1} ~= nil and \ + box.space.sync:get{1}[1] == 1 end, 100) + | --- + | - true + | ... +box.space.sync:select{} + | --- + | - - [1] + | ... + +test_run:switch('default') + | --- + | - true + | ... +test_run:cmd('stop server master') + | --- + | - true + | ... +test_run:cmd('delete server master') + | --- + | - true + | ... +test_run:cmd('stop server replica1') + | --- + | - true + | ... +test_run:cmd('delete server replica1') + | --- + | - true + | ... +test_run:cmd('stop server replica2') + | --- + | - true + | ... +test_run:cmd('delete server replica2') + | --- + | - true + | ... diff --git a/test/replication/gh-6035-applier-filter.test.lua b/test/replication/gh-6035-applier-filter.test.lua new file mode 100644 index 000000000..9bfd91288 --- /dev/null +++ b/test/replication/gh-6035-applier-filter.test.lua @@ -0,0 +1,68 @@ +-- +-- gh-6035: verify synchronous rows filtration in applier, +-- we need to be sure that filtering synchronous rows is +-- done via transaction initiator not sender (iow via +-- xrow->replica_id). +-- +test_run = require('test_run').new() + +-- +-- Prepare a scheme with transitional node +-- +-- master <=> replica1 => replica2 +-- +-- such as transaction initiated on the master node would +-- be replicated to the replica2 via interim replica1 node. +-- + +test_run:cmd('create server master with script="replication/gh-6035-master.lua"') +test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"') +test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"') + +test_run:cmd('start server master') +test_run:cmd('start server replica1') +test_run:cmd('start server replica2') + +test_run:switch('replica2') +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}} + +-- +-- Make the master to be RAFT leader, this drops connection +-- to the replica2. +test_run:switch('master') +box.cfg({ \ + replication = { \ + require('fio').cwd() .. "/master.sock", \ + require('fio').cwd() .. "/replica1.sock", \ + }, \ + replication_synchro_quorum = 2, \ + election_mode = 'manual', \ +}) + +box.ctl.promote() +_ = box.schema.space.create("sync", {is_sync = true}) +_ = box.space.sync:create_index("pk") +box.space.sync:insert{1} + +-- +-- The first hop is replica1. +test_run:switch('replica1') +box.space.sync:select{} + +-- +-- And the second hop is replica2 where +-- replica1 replicated the data to us. +test_run:switch('replica2') +test_run:wait_cond(function() return \ + box.space.sync ~= nil and \ + box.space.sync:get{1} ~= nil and \ + box.space.sync:get{1}[1] == 1 end, 100) +box.space.sync:select{} + +test_run:switch('default') +test_run:cmd('stop server master') +test_run:cmd('delete server master') +test_run:cmd('stop server replica1') +test_run:cmd('delete server replica1') +test_run:cmd('stop server replica2') +test_run:cmd('delete server replica2') diff --git a/test/replication/gh-6035-master.lua b/test/replication/gh-6035-master.lua new file mode 120000 index 000000000..f7ede7ef2 --- /dev/null +++ b/test/replication/gh-6035-master.lua @@ -0,0 +1 @@ +gh-6035-node.lua \ No newline at end of file diff --git a/test/replication/gh-6035-node.lua b/test/replication/gh-6035-node.lua new file mode 100644 index 000000000..e3819471a --- /dev/null +++ b/test/replication/gh-6035-node.lua @@ -0,0 +1,35 @@ +local SOCKET_DIR = require('fio').cwd() +local INSTANCE_ID = string.match(arg[0], "gh%-6035%-(.+)%.lua") + +local function unix_socket(name) + return SOCKET_DIR .. "/" .. name .. '.sock'; +end + +require('console').listen(os.getenv('ADMIN')) + +if INSTANCE_ID == "master" then + box.cfg({ + listen = unix_socket("master"), + }) +elseif INSTANCE_ID == "replica1" then + box.cfg({ + listen = unix_socket("replica1"), + replication = { + unix_socket("master"), + unix_socket("replica1") + }, + election_mode = 'voter' + }) +else + assert(INSTANCE_ID == "replica2") + box.cfg({ + replication = { + unix_socket("master"), + }, + election_mode = 'voter' + }) +end + +box.once("bootstrap", function() + box.schema.user.grant('guest', 'super') +end) diff --git a/test/replication/gh-6035-replica1.lua b/test/replication/gh-6035-replica1.lua new file mode 120000 index 000000000..f7ede7ef2 --- /dev/null +++ b/test/replication/gh-6035-replica1.lua @@ -0,0 +1 @@ +gh-6035-node.lua \ No newline at end of file diff --git a/test/replication/gh-6035-replica2.lua b/test/replication/gh-6035-replica2.lua new file mode 120000 index 000000000..f7ede7ef2 --- /dev/null +++ b/test/replication/gh-6035-replica2.lua @@ -0,0 +1 @@ +gh-6035-node.lua \ No newline at end of file diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg index 27eab20c2..55ec022ff 100644 --- a/test/replication/suite.cfg +++ b/test/replication/suite.cfg @@ -47,6 +47,9 @@ "gh-6032-promote-wal-write.test.lua": {}, "gh-6057-qsync-confirm-async-no-wal.test.lua": {}, "gh-6094-rs-uuid-mismatch.test.lua": {}, + "gh-6035-applier-filter.test.lua": { + "memtx": {"engine": "memtx"} + }, "*": { "memtx": {"engine": "memtx"}, "vinyl": {"engine": "vinyl"} -- 2.31.1 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator 2021-06-15 13:56 ` [Tarantool-patches] [PATCH v9 1/1] " Cyrill Gorcunov via Tarantool-patches @ 2021-06-15 14:26 ` Serge Petrenko via Tarantool-patches 2021-06-15 17:02 ` Cyrill Gorcunov via Tarantool-patches 0 siblings, 1 reply; 15+ messages in thread From: Serge Petrenko via Tarantool-patches @ 2021-06-15 14:26 UTC (permalink / raw) To: Cyrill Gorcunov, tml; +Cc: Vladislav Shpilevoy 15.06.2021 16:56, Cyrill Gorcunov пишет: > Currently we use synchro packets filtration based on their contents, > in particular by their xrow->replica_id value. Still there was a > question if we can optimize this moment and rather filter out all > packets coming from non-leader replica. > > Raft specification requires that only data from a current leader > should be applied to local WAL but doesn't put a concrete claim on > the data transport, ie how exactly rows are reaching replicas. This > implies that data propagation may reach replicas indirectly via transit > hops. Thus we drop applier->instance_id filtering and rely on > xrow->replica_id matching instead. > > In the test (inspired by Serge Petrenko's test) we recreate the situation > where replica3 obtains master's node data (which is a raft leader) > indirectly via replica2 node. > > Closes #6035 > > Co-developed-by: Serge Petrenko <sergepetrenko@tarantool.org> > Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> > --- Thanks for the fixes! Please, find a couple of finalization comments below. > src/box/applier.cc | 27 ++-- > src/lib/raft/raft.h | 7 - > .../gh-5445-leader-inconsistency.result | 15 ++ > .../gh-5445-leader-inconsistency.test.lua | 5 + > .../replication/gh-6035-applier-filter.result | 144 ++++++++++++++++++ > .../gh-6035-applier-filter.test.lua | 68 +++++++++ > test/replication/gh-6035-master.lua | 1 + > test/replication/gh-6035-node.lua | 35 +++++ > test/replication/gh-6035-replica1.lua | 1 + > test/replication/gh-6035-replica2.lua | 1 + > test/replication/suite.cfg | 3 + > 11 files changed, 288 insertions(+), 19 deletions(-) > create mode 100644 test/replication/gh-6035-applier-filter.result > create mode 100644 test/replication/gh-6035-applier-filter.test.lua > create mode 120000 test/replication/gh-6035-master.lua > create mode 100644 test/replication/gh-6035-node.lua > create mode 120000 test/replication/gh-6035-replica1.lua > create mode 120000 test/replication/gh-6035-replica2.lua > > diff --git a/src/box/applier.cc b/src/box/applier.cc > index 33181fdbf..d3430f582 100644 > --- a/src/box/applier.cc > +++ b/src/box/applier.cc > @@ -1027,21 +1027,24 @@ nopify:; > * Return 0 for success or -1 in case of an error. > */ > static int > -applier_apply_tx(struct applier *applier, struct stailq *rows) > +applier_apply_tx(struct stailq *rows) > { > /* > - * Rows received not directly from a leader are ignored. That is a > - * protection against the case when an old leader keeps sending data > - * around not knowing yet that it is not a leader anymore. > + * Initially we've been filtering out data if it came from > + * an applier which instance_id doesn't match raft->leader, > + * but this prevents from obtaining valid leader's data when > + * it comes from intermediate node. For example a series of > + * replica hops > * > - * XXX: it may be that this can be fine to apply leader transactions by > - * looking at their replica_id field if it is equal to leader id. That > - * can be investigated as an 'optimization'. Even though may not give > - * anything, because won't change total number of rows sent in the > - * network anyway. > + * master -> replica 1 -> replica 2 > + * > + * where each replica carries master's initiated transaction > + * in xrow->replica_id field and master's data get propagated > + * indirectly. > + * > + * Finally we dropped such "sender" filtration and use transaction > + * "initiator" filtration via xrow->replica_id only. > */ > - if (!raft_is_source_allowed(box_raft(), applier->instance_id)) > - return 0; > struct xrow_header *first_row = &stailq_first_entry(rows, > struct applier_tx_row, next)->row; > struct xrow_header *last_row; > @@ -1312,7 +1315,7 @@ applier_subscribe(struct applier *applier) > diag_raise(); > } > applier_signal_ack(applier); > - } else if (applier_apply_tx(applier, &rows) != 0) { > + } else if (applier_apply_tx(&rows) != 0) { > diag_raise(); > } > > diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h > index a8da564b0..fae30b03d 100644 > --- a/src/lib/raft/raft.h > +++ b/src/lib/raft/raft.h > @@ -236,13 +236,6 @@ raft_is_ro(const struct raft *raft) > return raft->is_enabled && raft->state != RAFT_STATE_LEADER; > } > > -/** See if the instance can accept rows from an instance with the given ID. */ > -static inline bool > -raft_is_source_allowed(const struct raft *raft, uint32_t source_id) > -{ > - return !raft->is_enabled || raft->leader == source_id; > -} > - > /** Check if Raft is enabled. */ > static inline bool > raft_is_enabled(const struct raft *raft) > diff --git a/test/replication/gh-5445-leader-inconsistency.result b/test/replication/gh-5445-leader-inconsistency.result > index 5c6169f50..8b9a4051a 100644 > --- a/test/replication/gh-5445-leader-inconsistency.result > +++ b/test/replication/gh-5445-leader-inconsistency.result > @@ -178,6 +178,14 @@ test_run:cmd('stop server '..leader) > is_possible_leader[leader_nr] = false > | --- > | ... > +-- And other node as well. > +test_run:cmd('stop server '..other) > + | --- > + | - true > + | ... > +is_possible_leader[other_nr] = false > + | --- > + | ... > > -- Emulate a situation when next_leader wins the elections. It can't do that in > -- this configuration, obviously, because it's behind the 'other' node, so set > @@ -195,6 +203,13 @@ assert(get_leader(is_possible_leader) == next_leader_nr) > | --- > | - true > | ... > +test_run:cmd('start server '..other..' with args="1 0.4 voter 2"') > + | --- > + | - true > + | ... > +is_possible_leader[other_nr] = true > + | --- > + | ... > test_run:switch(other) > | --- > | - true > diff --git a/test/replication/gh-5445-leader-inconsistency.test.lua b/test/replication/gh-5445-leader-inconsistency.test.lua > index e7952f5fa..b0b8baf36 100644 > --- a/test/replication/gh-5445-leader-inconsistency.test.lua > +++ b/test/replication/gh-5445-leader-inconsistency.test.lua > @@ -82,6 +82,9 @@ test_run:wait_cond(function() return box.space.test:get{2} ~= nil end) > test_run:switch('default') > test_run:cmd('stop server '..leader) > is_possible_leader[leader_nr] = false > +-- And other node as well. > +test_run:cmd('stop server '..other) > +is_possible_leader[other_nr] = false > > -- Emulate a situation when next_leader wins the elections. It can't do that in > -- this configuration, obviously, because it's behind the 'other' node, so set > @@ -93,6 +96,8 @@ is_possible_leader[leader_nr] = false > -- a situation when some rows from the old leader were not received). > test_run:cmd('start server '..next_leader..' with args="1 0.4 candidate 1"') > assert(get_leader(is_possible_leader) == next_leader_nr) > +test_run:cmd('start server '..other..' with args="1 0.4 voter 2"') > +is_possible_leader[other_nr] = true > test_run:switch(other) > -- New leader didn't know about the unconfirmed rows but still rolled them back. > test_run:wait_cond(function() return box.space.test:get{2} == nil end) > diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result > new file mode 100644 > index 000000000..2620e7b6f > --- /dev/null > +++ b/test/replication/gh-6035-applier-filter.result > @@ -0,0 +1,144 @@ > +-- test-run result file version 2 > +-- > +-- gh-6035: verify synchronous rows filtration in applier, > +-- we need to be sure that filtering synchronous rows is > +-- done via transaction initiator not sender (iow via > +-- xrow->replica_id). > +-- > +test_run = require('test_run').new() > + | --- > + | ... > + > +-- > +-- Prepare a scheme with transitional node > +-- > +-- master <=> replica1 => replica2 > +-- > +-- such as transaction initiated on the master node would > +-- be replicated to the replica2 via interim replica1 node. > +-- > + > +test_run:cmd('create server master with script="replication/gh-6035-master.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"') > + | --- > + | - true > + | ... > + > +test_run:cmd('start server master') > + | --- > + | - true > + | ... > +test_run:cmd('start server replica1') > + | --- > + | - true > + | ... > +test_run:cmd('start server replica2') > + | --- > + | - true > + | ... > + > +test_run:switch('replica2') > + | --- > + | - true > + | ... > +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}} > + | --- > + | ... > + > +-- > +-- Make the master to be RAFT leader, this drops connection > +-- to the replica2. > +test_run:switch('master') > + | --- > + | - true > + | ... > +box.cfg({ \ > + replication = { \ > + require('fio').cwd() .. "/master.sock", \ > + require('fio').cwd() .. "/replica1.sock", \ > + }, \ > + replication_synchro_quorum = 2, \ > + election_mode = 'manual', \ > +}) > + | --- > + | ... > + > +box.ctl.promote() > + | --- > + | ... > +_ = box.schema.space.create("sync", {is_sync = true}) > + | --- > + | ... > +_ = box.space.sync:create_index("pk") > + | --- > + | ... > +box.space.sync:insert{1} > + | --- > + | - [1] > + | ... > + > +-- > +-- The first hop is replica1. > +test_run:switch('replica1') > + | --- > + | - true > + | ... > +box.space.sync:select{} > + | --- > + | - - [1] > + | ... You need to wait for the space creation, just like you do below. Otherwise the test'll be flaky. Also, please see a comment regarding wait_lsn vs wait_cond below. > + > +-- > +-- And the second hop is replica2 where > +-- replica1 replicated the data to us. > +test_run:switch('replica2') > + | --- > + | - true > + | ... > +test_run:wait_cond(function() return \ > + box.space.sync ~= nil and \ > + box.space.sync:get{1} ~= nil and \ > + box.space.sync:get{1}[1] == 1 end, 100) > + | --- > + | - true > + | ... I suggest you use wait_lsn('replica2', 'master') here instead of this bulky wait_cond. First of all, it takes a single line, instead of 4 lines. Secondly, you forgot to test `box.space.sync.index.pk ~= nil`, meaning the test will still fail occasionally, when index creation doesn't replicate in time. > +box.space.sync:select{} > + | --- > + | - - [1] > + | ... > + > +test_run:switch('default') > + | --- > + | - true > + | ... > +test_run:cmd('stop server master') > + | --- > + | - true > + | ... > +test_run:cmd('delete server master') > + | --- > + | - true > + | ... > +test_run:cmd('stop server replica1') > + | --- > + | - true > + | ... > +test_run:cmd('delete server replica1') > + | --- > + | - true > + | ... > +test_run:cmd('stop server replica2') > + | --- > + | - true > + | ... > +test_run:cmd('delete server replica2') > + | --- > + | - true > + | ... > diff --git a/test/replication/gh-6035-applier-filter.test.lua b/test/replication/gh-6035-applier-filter.test.lua > new file mode 100644 > index 000000000..9bfd91288 > --- /dev/null > +++ b/test/replication/gh-6035-applier-filter.test.lua > @@ -0,0 +1,68 @@ > +-- > +-- gh-6035: verify synchronous rows filtration in applier, > +-- we need to be sure that filtering synchronous rows is > +-- done via transaction initiator not sender (iow via > +-- xrow->replica_id). > +-- > +test_run = require('test_run').new() > + > +-- > +-- Prepare a scheme with transitional node > +-- > +-- master <=> replica1 => replica2 > +-- > +-- such as transaction initiated on the master node would > +-- be replicated to the replica2 via interim replica1 node. > +-- > + > +test_run:cmd('create server master with script="replication/gh-6035-master.lua"') > +test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"') > +test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"') > + > +test_run:cmd('start server master') > +test_run:cmd('start server replica1') > +test_run:cmd('start server replica2') > + > +test_run:switch('replica2') > +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}} > + > +-- > +-- Make the master to be RAFT leader, this drops connection > +-- to the replica2. > +test_run:switch('master') > +box.cfg({ \ > + replication = { \ > + require('fio').cwd() .. "/master.sock", \ > + require('fio').cwd() .. "/replica1.sock", \ > + }, \ > + replication_synchro_quorum = 2, \ > + election_mode = 'manual', \ > +}) > + > +box.ctl.promote() > +_ = box.schema.space.create("sync", {is_sync = true}) > +_ = box.space.sync:create_index("pk") > +box.space.sync:insert{1} > + > +-- > +-- The first hop is replica1. > +test_run:switch('replica1') > +box.space.sync:select{} > + > +-- > +-- And the second hop is replica2 where > +-- replica1 replicated the data to us. > +test_run:switch('replica2') > +test_run:wait_cond(function() return \ > + box.space.sync ~= nil and \ > + box.space.sync:get{1} ~= nil and \ > + box.space.sync:get{1}[1] == 1 end, 100) > +box.space.sync:select{} > + > +test_run:switch('default') > +test_run:cmd('stop server master') > +test_run:cmd('delete server master') > +test_run:cmd('stop server replica1') > +test_run:cmd('delete server replica1') > +test_run:cmd('stop server replica2') > +test_run:cmd('delete server replica2') > diff --git a/test/replication/gh-6035-master.lua b/test/replication/gh-6035-master.lua > new file mode 120000 > index 000000000..f7ede7ef2 > --- /dev/null > +++ b/test/replication/gh-6035-master.lua > @@ -0,0 +1 @@ > +gh-6035-node.lua > \ No newline at end of file > diff --git a/test/replication/gh-6035-node.lua b/test/replication/gh-6035-node.lua > new file mode 100644 > index 000000000..e3819471a > --- /dev/null > +++ b/test/replication/gh-6035-node.lua > @@ -0,0 +1,35 @@ > +local SOCKET_DIR = require('fio').cwd() > +local INSTANCE_ID = string.match(arg[0], "gh%-6035%-(.+)%.lua") > + > +local function unix_socket(name) > + return SOCKET_DIR .. "/" .. name .. '.sock'; > +end > + > +require('console').listen(os.getenv('ADMIN')) > + > +if INSTANCE_ID == "master" then > + box.cfg({ > + listen = unix_socket("master"), > + }) > +elseif INSTANCE_ID == "replica1" then > + box.cfg({ > + listen = unix_socket("replica1"), > + replication = { > + unix_socket("master"), > + unix_socket("replica1") > + }, > + election_mode = 'voter' > + }) > +else > + assert(INSTANCE_ID == "replica2") > + box.cfg({ > + replication = { > + unix_socket("master"), > + }, > + election_mode = 'voter' > + }) > +end > + > +box.once("bootstrap", function() > + box.schema.user.grant('guest', 'super') > +end) > diff --git a/test/replication/gh-6035-replica1.lua b/test/replication/gh-6035-replica1.lua > new file mode 120000 > index 000000000..f7ede7ef2 > --- /dev/null > +++ b/test/replication/gh-6035-replica1.lua > @@ -0,0 +1 @@ > +gh-6035-node.lua > \ No newline at end of file > diff --git a/test/replication/gh-6035-replica2.lua b/test/replication/gh-6035-replica2.lua > new file mode 120000 > index 000000000..f7ede7ef2 > --- /dev/null > +++ b/test/replication/gh-6035-replica2.lua > @@ -0,0 +1 @@ > +gh-6035-node.lua > \ No newline at end of file > diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg > index 27eab20c2..55ec022ff 100644 > --- a/test/replication/suite.cfg > +++ b/test/replication/suite.cfg > @@ -47,6 +47,9 @@ > "gh-6032-promote-wal-write.test.lua": {}, > "gh-6057-qsync-confirm-async-no-wal.test.lua": {}, > "gh-6094-rs-uuid-mismatch.test.lua": {}, > + "gh-6035-applier-filter.test.lua": { > + "memtx": {"engine": "memtx"} > + }, > "*": { > "memtx": {"engine": "memtx"}, > "vinyl": {"engine": "vinyl"} -- Serge Petrenko ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator 2021-06-15 14:26 ` Serge Petrenko via Tarantool-patches @ 2021-06-15 17:02 ` Cyrill Gorcunov via Tarantool-patches 2021-06-15 18:08 ` Cyrill Gorcunov via Tarantool-patches 2021-06-16 8:16 ` Serge Petrenko via Tarantool-patches 0 siblings, 2 replies; 15+ messages in thread From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-15 17:02 UTC (permalink / raw) To: Serge Petrenko; +Cc: tml, Vladislav Shpilevoy On Tue, Jun 15, 2021 at 05:26:06PM +0300, Serge Petrenko wrote: > > +box.space.sync:select{} > > + | --- > > + | - - [1] > > + | ... > > You need to wait for the space creation, just like > you do below. Otherwise the test'll be flaky. > Also, please see a comment regarding wait_lsn vs wait_cond > below. As being discussed due to quorum=2 and sync space we don't need to wait. > > +test_run:wait_cond(function() return \ > > + box.space.sync ~= nil and \ > > + box.space.sync:get{1} ~= nil and \ > > + box.space.sync:get{1}[1] == 1 end, 100) > > + | --- > > + | - true > > + | ... > > I suggest you use wait_lsn('replica2', 'master') here > instead of this bulky wait_cond. > First of all, it takes a single line, instead of 4 lines. > > Secondly, you forgot to test `box.space.sync.index.pk ~= nil`, meaning > the test will still fail occasionally, when index creation doesn't replicate > in time. You mean something like below? --- diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result index 2620e7b6f..7345a19f7 100644 --- a/test/replication/gh-6035-applier-filter.result +++ b/test/replication/gh-6035-applier-filter.result @@ -102,12 +102,8 @@ test_run:switch('replica2') | --- | - true | ... -test_run:wait_cond(function() return \ - box.space.sync ~= nil and \ - box.space.sync:get{1} ~= nil and \ - box.space.sync:get{1}[1] == 1 end, 100) +test_run:wait_lsn('replica2', 'master') | --- - | - true | ... box.space.sync:select{} | --- diff --git a/test/replication/gh-6035-applier-filter.test.lua b/test/replication/gh-6035-applier-filter.test.lua index 9bfd91288..beca5258e 100644 --- a/test/replication/gh-6035-applier-filter.test.lua +++ b/test/replication/gh-6035-applier-filter.test.lua @@ -53,10 +53,7 @@ box.space.sync:select{} -- And the second hop is replica2 where -- replica1 replicated the data to us. test_run:switch('replica2') -test_run:wait_cond(function() return \ - box.space.sync ~= nil and \ - box.space.sync:get{1} ~= nil and \ - box.space.sync:get{1}[1] == 1 end, 100) +test_run:wait_lsn('replica2', 'master') box.space.sync:select{} test_run:switch('default') ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator 2021-06-15 17:02 ` Cyrill Gorcunov via Tarantool-patches @ 2021-06-15 18:08 ` Cyrill Gorcunov via Tarantool-patches 2021-06-16 8:31 ` Serge Petrenko via Tarantool-patches 2021-06-16 8:16 ` Serge Petrenko via Tarantool-patches 1 sibling, 1 reply; 15+ messages in thread From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-15 18:08 UTC (permalink / raw) To: tml; +Cc: Vladislav Shpilevoy On Tue, Jun 15, 2021 at 08:02:59PM +0300, Cyrill Gorcunov wrote: > On Tue, Jun 15, 2021 at 05:26:06PM +0300, Serge Petrenko wrote: > > > +box.space.sync:select{} > > > + | --- > > > + | - - [1] > > > + | ... > > > > You need to wait for the space creation, just like > > you do below. Otherwise the test'll be flaky. > > Also, please see a comment regarding wait_lsn vs wait_cond > > below. > > As being discussed due to quorum=2 and sync space we don't need > to wait. > > > > +test_run:wait_cond(function() return \ > > > + box.space.sync ~= nil and \ > > > + box.space.sync:get{1} ~= nil and \ > > > + box.space.sync:get{1}[1] == 1 end, 100) > > > + | --- > > > + | - true > > > + | ... > > > > I suggest you use wait_lsn('replica2', 'master') here > > instead of this bulky wait_cond. > > First of all, it takes a single line, instead of 4 lines. > > > > Secondly, you forgot to test `box.space.sync.index.pk ~= nil`, meaning > > the test will still fail occasionally, when index creation doesn't replicate > > in time. > > You mean something like below? > --- I force pushed an update. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator 2021-06-15 18:08 ` Cyrill Gorcunov via Tarantool-patches @ 2021-06-16 8:31 ` Serge Petrenko via Tarantool-patches 2021-06-16 8:43 ` Cyrill Gorcunov via Tarantool-patches 0 siblings, 1 reply; 15+ messages in thread From: Serge Petrenko via Tarantool-patches @ 2021-06-16 8:31 UTC (permalink / raw) To: Cyrill Gorcunov, tml; +Cc: Vladislav Shpilevoy 15.06.2021 21:08, Cyrill Gorcunov пишет: > On Tue, Jun 15, 2021 at 08:02:59PM +0300, Cyrill Gorcunov wrote: >> On Tue, Jun 15, 2021 at 05:26:06PM +0300, Serge Petrenko wrote: >>>> +box.space.sync:select{} >>>> + | --- >>>> + | - - [1] >>>> + | ... >>> You need to wait for the space creation, just like >>> you do below. Otherwise the test'll be flaky. >>> Also, please see a comment regarding wait_lsn vs wait_cond >>> below. >> As being discussed due to quorum=2 and sync space we don't need >> to wait. >> >>>> +test_run:wait_cond(function() return \ >>>> + box.space.sync ~= nil and \ >>>> + box.space.sync:get{1} ~= nil and \ >>>> + box.space.sync:get{1}[1] == 1 end, 100) >>>> + | --- >>>> + | - true >>>> + | ... >>> I suggest you use wait_lsn('replica2', 'master') here >>> instead of this bulky wait_cond. >>> First of all, it takes a single line, instead of 4 lines. >>> >>> Secondly, you forgot to test `box.space.sync.index.pk ~= nil`, meaning >>> the test will still fail occasionally, when index creation doesn't replicate >>> in time. >> You mean something like below? >> --- > I force pushed an update. Thanks for the fixes! I assume the new branch is gorcunov/gh-6035-applier-filter-9 LGTM. -- Serge Petrenko ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator 2021-06-16 8:31 ` Serge Petrenko via Tarantool-patches @ 2021-06-16 8:43 ` Cyrill Gorcunov via Tarantool-patches 0 siblings, 0 replies; 15+ messages in thread From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-16 8:43 UTC (permalink / raw) To: Serge Petrenko; +Cc: tml, Vladislav Shpilevoy On Wed, Jun 16, 2021 at 11:31:35AM +0300, Serge Petrenko wrote: > > Thanks for the fixes! > > I assume the new branch is > gorcunov/gh-6035-applier-filter-9 Thanks fore review and overall help! Yes, I pushed there and waiting the tests to proceed. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator 2021-06-15 17:02 ` Cyrill Gorcunov via Tarantool-patches 2021-06-15 18:08 ` Cyrill Gorcunov via Tarantool-patches @ 2021-06-16 8:16 ` Serge Petrenko via Tarantool-patches 1 sibling, 0 replies; 15+ messages in thread From: Serge Petrenko via Tarantool-patches @ 2021-06-16 8:16 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: tml, Vladislav Shpilevoy 15.06.2021 20:02, Cyrill Gorcunov пишет: > On Tue, Jun 15, 2021 at 05:26:06PM +0300, Serge Petrenko wrote: >>> +box.space.sync:select{} >>> + | --- >>> + | - - [1] >>> + | ... >> You need to wait for the space creation, just like >> you do below. Otherwise the test'll be flaky. >> Also, please see a comment regarding wait_lsn vs wait_cond >> below. > As being discussed due to quorum=2 and sync space we don't need > to wait. Yes, indeed. I missed that. >>> +test_run:wait_cond(function() return \ >>> + box.space.sync ~= nil and \ >>> + box.space.sync:get{1} ~= nil and \ >>> + box.space.sync:get{1}[1] == 1 end, 100) >>> + | --- >>> + | - true >>> + | ... >> I suggest you use wait_lsn('replica2', 'master') here >> instead of this bulky wait_cond. >> First of all, it takes a single line, instead of 4 lines. >> >> Secondly, you forgot to test `box.space.sync.index.pk ~= nil`, meaning >> the test will still fail occasionally, when index creation doesn't replicate >> in time. > You mean something like below? > --- > diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result > index 2620e7b6f..7345a19f7 100644 > --- a/test/replication/gh-6035-applier-filter.result > +++ b/test/replication/gh-6035-applier-filter.result > @@ -102,12 +102,8 @@ test_run:switch('replica2') > | --- > | - true > | ... > -test_run:wait_cond(function() return \ > - box.space.sync ~= nil and \ > - box.space.sync:get{1} ~= nil and \ > - box.space.sync:get{1}[1] == 1 end, 100) > +test_run:wait_lsn('replica2', 'master') > | --- > - | - true > | ... > box.space.sync:select{} > | --- > diff --git a/test/replication/gh-6035-applier-filter.test.lua b/test/replication/gh-6035-applier-filter.test.lua > index 9bfd91288..beca5258e 100644 > --- a/test/replication/gh-6035-applier-filter.test.lua > +++ b/test/replication/gh-6035-applier-filter.test.lua > @@ -53,10 +53,7 @@ box.space.sync:select{} > -- And the second hop is replica2 where > -- replica1 replicated the data to us. > test_run:switch('replica2') > -test_run:wait_cond(function() return \ > - box.space.sync ~= nil and \ > - box.space.sync:get{1} ~= nil and \ > - box.space.sync:get{1}[1] == 1 end, 100) > +test_run:wait_lsn('replica2', 'master') > box.space.sync:select{} > > test_run:switch('default') Exactly. -- Serge Petrenko ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator 2021-06-15 13:56 [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator Cyrill Gorcunov via Tarantool-patches 2021-06-15 13:56 ` [Tarantool-patches] [PATCH v9 1/1] " Cyrill Gorcunov via Tarantool-patches @ 2021-06-18 21:49 ` Vladislav Shpilevoy via Tarantool-patches 2021-06-18 22:16 ` Cyrill Gorcunov via Tarantool-patches 2021-06-21 21:06 ` Vladislav Shpilevoy via Tarantool-patches 2 siblings, 1 reply; 15+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-18 21:49 UTC (permalink / raw) To: Cyrill Gorcunov, tml Hi! Thanks for the patch! See 4 comments below. > diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result > new file mode 100644 > index 000000000..7345a19f7 > --- /dev/null > +++ b/test/replication/gh-6035-applier-filter.result 1. Maybe better rename it to gh-6035-election-filter. Because otherwise this test is not included when I run `python test-run.py election`. And I do that quite often when make a patch which affects election only. > @@ -0,0 +1,140 @@ > +-- test-run result file version 2 > +-- > +-- gh-6035: verify synchronous rows filtration in applier, > +-- we need to be sure that filtering synchronous rows is > +-- done via transaction initiator not sender (iow via > +-- xrow->replica_id). > +-- > +test_run = require('test_run').new() > + | --- > + | ... > + > +-- > +-- Prepare a scheme with transitional node > +-- > +-- master <=> replica1 => replica2 > +-- > +-- such as transaction initiated on the master node would > +-- be replicated to the replica2 via interim replica1 node. > +-- > + > +test_run:cmd('create server master with script="replication/gh-6035-master.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"') > + | --- > + | - true > + | ... > + > +test_run:cmd('start server master') > + | --- > + | - true > + | ... > +test_run:cmd('start server replica1') > + | --- > + | - true > + | ... > +test_run:cmd('start server replica2') > + | --- > + | - true > + | ... > + > +test_run:switch('replica2') > + | --- > + | - true > + | ... > +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}} 2. You don't need require('fio').cwd(). "unix/:./replica1.sock" works just fine. The same for the other socket paths. > + | --- > + | ... > + > +-- > +-- Make the master to be RAFT leader, this drops connection > +-- to the replica2. 3. There was no connection to replica2 from master. > +test_run:switch('master') > + | --- > + | - true > + | ... > +box.cfg({ \ > + replication = { \ > + require('fio').cwd() .. "/master.sock", \ > + require('fio').cwd() .. "/replica1.sock", \ > + }, \ > + replication_synchro_quorum = 2, \ > + election_mode = 'manual', \ > +}) > diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg > index 3a0a8649f..3b5cee75b 100644 > --- a/test/replication/suite.cfg > +++ b/test/replication/suite.cfg > @@ -50,6 +50,9 @@ > "gh-6057-qsync-confirm-async-no-wal.test.lua": {}, > "gh-6094-rs-uuid-mismatch.test.lua": {}, > "gh-6127-election-join-new.test.lua": {}, > + "gh-6035-applier-filter.test.lua": { > + "memtx": {"engine": "memtx"} > + }, 4. You don't need to specify the engine. You don't use the engine variable in the test. Just leave it empty {}. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator 2021-06-18 21:49 ` [Tarantool-patches] [PATCH v9 0/1] " Vladislav Shpilevoy via Tarantool-patches @ 2021-06-18 22:16 ` Cyrill Gorcunov via Tarantool-patches 2021-06-18 22:58 ` Vladislav Shpilevoy via Tarantool-patches 0 siblings, 1 reply; 15+ messages in thread From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-18 22:16 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tml On Fri, Jun 18, 2021 at 11:49:39PM +0200, Vladislav Shpilevoy wrote: > Hi! Thanks for the patch! > > See 4 comments below. > > > diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result > > new file mode 100644 > > index 000000000..7345a19f7 > > --- /dev/null > > +++ b/test/replication/gh-6035-applier-filter.result > > 1. Maybe better rename it to gh-6035-election-filter. Because otherwise > this test is not included when I run `python test-run.py election`. And I > do that quite often when make a patch which affects election only. Sure, I don't mind. > > +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}} > > 2. You don't need require('fio').cwd(). "unix/:./replica1.sock" works > just fine. The same for the other socket paths. Actually I too this from other examples since I suspect the use of absolute path might be critical for test run engine. It is not a problem to rename but other our tests do use this trick so meaybe we should stick with same approach? Again, I don't mind to use relative path if this won't cause problems in future. > > +-- Make the master to be RAFT leader, this drops connection > > +-- to the replica2. > > 3. There was no connection to replica2 from master. > I'll update the comment, thanks! (actually replica2 connected to the master initialy and this is full duplex connection which we close on reconfig, that's what I meant saying "dropping" connection, but this seems to be confusing). > > + "gh-6035-applier-filter.test.lua": { > > + "memtx": {"engine": "memtx"} > > + }, > > 4. You don't need to specify the engine. You don't use the > engine variable in the test. Just leave it empty {}. OK ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator 2021-06-18 22:16 ` Cyrill Gorcunov via Tarantool-patches @ 2021-06-18 22:58 ` Vladislav Shpilevoy via Tarantool-patches 2021-06-19 11:01 ` Cyrill Gorcunov via Tarantool-patches 0 siblings, 1 reply; 15+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-18 22:58 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: tml >>> +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}} >> >> 2. You don't need require('fio').cwd(). "unix/:./replica1.sock" works >> just fine. The same for the other socket paths. > > Actually I too this from other examples since I suspect the use of > absolute path might be critical for test run engine. It is not a problem > to rename but other our tests do use this trick so meaybe we should stick > with same approach? Again, I don't mind to use relative path if this > won't cause problems in future. Cargo cult does not work. If something is done somewhere, it does not mean it is correct. I see no issues with using the relative path. Anyway if it won't work someday, your way won't either, because cwd() is the same as "./". >>> +-- Make the master to be RAFT leader, this drops connection >>> +-- to the replica2. >> >> 3. There was no connection to replica2 from master. >> > > I'll update the comment, thanks! (actually replica2 > connected to the master initialy and this is full > duplex connection which we close on reconfig, that's > what I meant saying "dropping" connection, but this > seems to be confusing). In the comment you said that this reconfig drops the connection. But it still does not, even in the updated definition above. You dropped this connection a few lines before this comment. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator 2021-06-18 22:58 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-06-19 11:01 ` Cyrill Gorcunov via Tarantool-patches 2021-06-20 14:50 ` Vladislav Shpilevoy via Tarantool-patches 0 siblings, 1 reply; 15+ messages in thread From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-19 11:01 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tml On Sat, Jun 19, 2021 at 12:58:03AM +0200, Vladislav Shpilevoy wrote: > >>> +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}} > >> > >> 2. You don't need require('fio').cwd(). "unix/:./replica1.sock" works > >> just fine. The same for the other socket paths. > > > > Actually I too this from other examples since I suspect the use of > > absolute path might be critical for test run engine. It is not a problem > > to rename but other our tests do use this trick so meaybe we should stick > > with same approach? Again, I don't mind to use relative path if this > > won't cause problems in future. > > Cargo cult does not work. If something is done somewhere, it does not > mean it is correct. I see no issues with using the relative path. This has nothing to do with cargo cult, but rather with code unification. Either we start using relative paths everywhere, or we continue using abs paths. I'll change to relatve paths since you prefer. > Anyway if it won't work someday, your way won't either, because > cwd() is the same as "./". This is a bit more complex when mount namespaces step in, because open("cwd()" + "./entry") is not the same as open("./entry"). > >>> +-- Make the master to be RAFT leader, this drops connection > >>> +-- to the replica2. > >> > >> 3. There was no connection to replica2 from master. > >> > > > > I'll update the comment, thanks! (actually replica2 > > connected to the master initialy and this is full > > duplex connection which we close on reconfig, that's > > what I meant saying "dropping" connection, but this > > seems to be confusing). > > In the comment you said that this reconfig > drops the connection. But it still does not, even in > the updated definition above. > > You dropped this connection a few lines before this > comment. Vlad, here is an update, does it look better? I force pushed it into the same branch. --- diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result index 7345a19f7..2fa593267 100644 --- a/test/replication/gh-6035-applier-filter.result +++ b/test/replication/gh-6035-applier-filter.result @@ -48,24 +48,23 @@ test_run:switch('replica2') | --- | - true | ... -box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}} +box.cfg({replication = {"unix/:./replica1.sock"}}) | --- | ... -- --- Make the master to be RAFT leader, this drops connection --- to the replica2. +-- Make the master to be RAFT leader. test_run:switch('master') | --- | - true | ... -box.cfg({ \ - replication = { \ - require('fio').cwd() .. "/master.sock", \ - require('fio').cwd() .. "/replica1.sock", \ - }, \ - replication_synchro_quorum = 2, \ - election_mode = 'manual', \ +box.cfg({ \ + replication = { \ + "unix/:./master.sock", \ + "unix/:./replica1.sock", \ + }, \ + replication_synchro_quorum = 2, \ + election_mode = 'manual', \ }) | --- | ... diff --git a/test/replication/gh-6035-applier-filter.test.lua b/test/replication/gh-6035-applier-filter.test.lua index beca5258e..716c84bb6 100644 --- a/test/replication/gh-6035-applier-filter.test.lua +++ b/test/replication/gh-6035-applier-filter.test.lua @@ -24,19 +24,18 @@ test_run:cmd('start server replica1') test_run:cmd('start server replica2') test_run:switch('replica2') -box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}} +box.cfg({replication = {"unix/:./replica1.sock"}}) -- --- Make the master to be RAFT leader, this drops connection --- to the replica2. +-- Make the master to be RAFT leader. test_run:switch('master') -box.cfg({ \ - replication = { \ - require('fio').cwd() .. "/master.sock", \ - require('fio').cwd() .. "/replica1.sock", \ - }, \ - replication_synchro_quorum = 2, \ - election_mode = 'manual', \ +box.cfg({ \ + replication = { \ + "unix/:./master.sock", \ + "unix/:./replica1.sock", \ + }, \ + replication_synchro_quorum = 2, \ + election_mode = 'manual', \ }) box.ctl.promote() diff --git a/test/replication/gh-6035-node.lua b/test/replication/gh-6035-node.lua index e3819471a..819a20332 100644 --- a/test/replication/gh-6035-node.lua +++ b/test/replication/gh-6035-node.lua @@ -1,8 +1,7 @@ -local SOCKET_DIR = require('fio').cwd() local INSTANCE_ID = string.match(arg[0], "gh%-6035%-(.+)%.lua") local function unix_socket(name) - return SOCKET_DIR .. "/" .. name .. '.sock'; + return "unix/:./" .. name .. '.sock'; end require('console').listen(os.getenv('ADMIN')) diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg index 3b5cee75b..69f2f3511 100644 --- a/test/replication/suite.cfg +++ b/test/replication/suite.cfg @@ -50,9 +50,7 @@ "gh-6057-qsync-confirm-async-no-wal.test.lua": {}, "gh-6094-rs-uuid-mismatch.test.lua": {}, "gh-6127-election-join-new.test.lua": {}, - "gh-6035-applier-filter.test.lua": { - "memtx": {"engine": "memtx"} - }, + "gh-6035-applier-filter.test.lua": {}, "*": { "memtx": {"engine": "memtx"}, "vinyl": {"engine": "vinyl"} ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator 2021-06-19 11:01 ` Cyrill Gorcunov via Tarantool-patches @ 2021-06-20 14:50 ` Vladislav Shpilevoy via Tarantool-patches 2021-06-20 17:17 ` Cyrill Gorcunov via Tarantool-patches 0 siblings, 1 reply; 15+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-20 14:50 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: tml Thanks for the fixes! >>>>> +-- Make the master to be RAFT leader, this drops connection >>>>> +-- to the replica2. >>>> >>>> 3. There was no connection to replica2 from master. >>>> >>> >>> I'll update the comment, thanks! (actually replica2 >>> connected to the master initialy and this is full >>> duplex connection which we close on reconfig, that's >>> what I meant saying "dropping" connection, but this >>> seems to be confusing). >> >> In the comment you said that this reconfig >> drops the connection. But it still does not, even in >> the updated definition above. >> >> You dropped this connection a few lines before this >> comment. > > Vlad, here is an update, does it look better? It looks better, but why did you ignore my first comment? I remind it here: Maybe better rename it to gh-6035-election-filter. Because otherwise this test is not included when I run `python test-run.py election`. And I do that quite often when make a patch which affects election only. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator 2021-06-20 14:50 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-06-20 17:17 ` Cyrill Gorcunov via Tarantool-patches 0 siblings, 0 replies; 15+ messages in thread From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-20 17:17 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tml On Sun, Jun 20, 2021 at 04:50:10PM +0200, Vladislav Shpilevoy wrote: > > It looks better, but why did you ignore my first comment? > I remind it here: > > Maybe better rename it to gh-6035-election-filter. Because otherwise > this test is not included when I run `python test-run.py election`. And I > do that quite often when make a patch which affects election only. I didn't ignore it. Renamed it locally after I've sent you the fixes, simply because if was more convenient for me to make the rename as a last step. Everything is pushed into the branch. Thanks. src/box/applier.cc | 27 ++++++++++--------- src/lib/raft/raft.h | 7 ----- test/replication/gh-5445-leader-inconsistency.result | 15 +++++++++++ test/replication/gh-5445-leader-inconsistency.test.lua | 5 ++++ test/replication/gh-6035-election-filter.result | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ test/replication/gh-6035-election-filter.test.lua | 64 ++++++++++++++++++++++++++++++++++++++++++++ test/replication/gh-6035-master.lua | 1 + test/replication/gh-6035-node.lua | 34 +++++++++++++++++++++++ test/replication/gh-6035-replica1.lua | 1 + test/replication/gh-6035-replica2.lua | 1 + test/replication/suite.cfg | 1 + (I just looked into our repo and force pushed it again just to be sure) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator 2021-06-15 13:56 [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator Cyrill Gorcunov via Tarantool-patches 2021-06-15 13:56 ` [Tarantool-patches] [PATCH v9 1/1] " Cyrill Gorcunov via Tarantool-patches 2021-06-18 21:49 ` [Tarantool-patches] [PATCH v9 0/1] " Vladislav Shpilevoy via Tarantool-patches @ 2021-06-21 21:06 ` Vladislav Shpilevoy via Tarantool-patches 2 siblings, 0 replies; 15+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-21 21:06 UTC (permalink / raw) To: Cyrill Gorcunov, tml Hi! Thanks for the patch! I pushed it to master and 2.8. On 2.7 the test gh-5445-... does not pass. Please, cherry-pick it on 2.7 locally, investigate, and send a new patchset specifically for 2.7. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2021-06-21 21:06 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-06-15 13:56 [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator Cyrill Gorcunov via Tarantool-patches 2021-06-15 13:56 ` [Tarantool-patches] [PATCH v9 1/1] " Cyrill Gorcunov via Tarantool-patches 2021-06-15 14:26 ` Serge Petrenko via Tarantool-patches 2021-06-15 17:02 ` Cyrill Gorcunov via Tarantool-patches 2021-06-15 18:08 ` Cyrill Gorcunov via Tarantool-patches 2021-06-16 8:31 ` Serge Petrenko via Tarantool-patches 2021-06-16 8:43 ` Cyrill Gorcunov via Tarantool-patches 2021-06-16 8:16 ` Serge Petrenko via Tarantool-patches 2021-06-18 21:49 ` [Tarantool-patches] [PATCH v9 0/1] " Vladislav Shpilevoy via Tarantool-patches 2021-06-18 22:16 ` Cyrill Gorcunov via Tarantool-patches 2021-06-18 22:58 ` Vladislav Shpilevoy via Tarantool-patches 2021-06-19 11:01 ` Cyrill Gorcunov via Tarantool-patches 2021-06-20 14:50 ` Vladislav Shpilevoy via Tarantool-patches 2021-06-20 17:17 ` Cyrill Gorcunov via Tarantool-patches 2021-06-21 21:06 ` Vladislav Shpilevoy via Tarantool-patches
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox