Tarantool development patches archive
* [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator
@ 2021-06-15 13:56 Cyrill Gorcunov via Tarantool-patches
  2021-06-15 13:56 ` [Tarantool-patches] [PATCH v9 1/1] " Cyrill Gorcunov via Tarantool-patches
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-15 13:56 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Guys, here is a patch which implements the filtration we are
interested in. I switched back to the former version with a few
updates:

 - fixed comment in gh-5445-leader-inconsistency test as Serge pointed out
 - added comments to the new patch explaining what's going on
 - we tried to rework the test with Serge but it didn't make things
   much easier so I left the former approach
 - we use only the memtx engine as Vlad asked

Please give it a shot.

issue https://github.com/tarantool/tarantool/issues/6035
branch gorcunov/gh-6035-applier-filter-9

Cyrill Gorcunov (1):
  applier: filter incoming synchro packets via transaction initiator

 src/box/applier.cc                            |  27 ++--
 src/lib/raft/raft.h                           |   7 -
 .../gh-5445-leader-inconsistency.result       |  15 ++
 .../gh-5445-leader-inconsistency.test.lua     |   5 +
 .../replication/gh-6035-applier-filter.result | 144 ++++++++++++++++++
 .../gh-6035-applier-filter.test.lua           |  68 +++++++++
 test/replication/gh-6035-master.lua           |   1 +
 test/replication/gh-6035-node.lua             |  35 +++++
 test/replication/gh-6035-replica1.lua         |   1 +
 test/replication/gh-6035-replica2.lua         |   1 +
 test/replication/suite.cfg                    |   3 +
 11 files changed, 288 insertions(+), 19 deletions(-)
 create mode 100644 test/replication/gh-6035-applier-filter.result
 create mode 100644 test/replication/gh-6035-applier-filter.test.lua
 create mode 120000 test/replication/gh-6035-master.lua
 create mode 100644 test/replication/gh-6035-node.lua
 create mode 120000 test/replication/gh-6035-replica1.lua
 create mode 120000 test/replication/gh-6035-replica2.lua


base-commit: 9fb95cfd78403b33cacbe765c35036a0532b2c02
-- 
2.31.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator
  2021-06-15 13:56 [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator Cyrill Gorcunov via Tarantool-patches
@ 2021-06-15 13:56 ` Cyrill Gorcunov via Tarantool-patches
  2021-06-15 14:26   ` Serge Petrenko via Tarantool-patches
  2021-06-18 21:49 ` [Tarantool-patches] [PATCH v9 0/1] " Vladislav Shpilevoy via Tarantool-patches
  2021-06-21 21:06 ` Vladislav Shpilevoy via Tarantool-patches
  2 siblings, 1 reply; 15+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-15 13:56 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Currently we filter synchro packets by their contents, in
particular by their xrow->replica_id value. Still, there was a
question whether we could instead filter out all packets coming
from a non-leader replica.

The Raft specification requires that only data from the current
leader be applied to the local WAL, but it makes no claim about
the data transport, i.e. how exactly rows reach the replicas.
This implies that data may reach a replica indirectly, via
transit hops. Thus we drop the applier->instance_id filtering
and rely on xrow->replica_id matching instead.

In the test (inspired by Serge Petrenko's test) we recreate the
situation where replica2 obtains the master's data (the master
being the Raft leader) indirectly via the replica1 node.
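The difference between the two policies can be illustrated with a minimal standalone sketch. The structs below are hypothetical stand-ins trimmed to the fields the decision needs; `sender_allowed()` mirrors the removed `raft_is_source_allowed()` check against the applier's instance_id, while `initiator_allowed()` models the xrow->replica_id check this patch relies on:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-ins for Tarantool's real structs. */
struct xrow_header {
	uint32_t replica_id;	/* instance that initiated the transaction */
};

struct raft {
	bool is_enabled;
	uint32_t leader;	/* instance id of the current leader */
};

/* The dropped "sender" policy: accept rows only when the peer that
 * sent them (applier->instance_id) is the current leader. */
bool
sender_allowed(const struct raft *raft, uint32_t sender_id)
{
	return !raft->is_enabled || raft->leader == sender_id;
}

/* The kept "initiator" policy: accept rows whose originating
 * instance (xrow->replica_id) is the current leader, no matter
 * which peer relayed them. */
bool
initiator_allowed(const struct raft *raft, const struct xrow_header *row)
{
	return !raft->is_enabled || raft->leader == row->replica_id;
}
```

A leader-initiated row relayed through an intermediate replica fails the sender check but passes the initiator check, which is exactly the master -> replica1 -> replica2 scenario the test below exercises.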

Closes #6035

Co-developed-by: Serge Petrenko <sergepetrenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/applier.cc                            |  27 ++--
 src/lib/raft/raft.h                           |   7 -
 .../gh-5445-leader-inconsistency.result       |  15 ++
 .../gh-5445-leader-inconsistency.test.lua     |   5 +
 .../replication/gh-6035-applier-filter.result | 144 ++++++++++++++++++
 .../gh-6035-applier-filter.test.lua           |  68 +++++++++
 test/replication/gh-6035-master.lua           |   1 +
 test/replication/gh-6035-node.lua             |  35 +++++
 test/replication/gh-6035-replica1.lua         |   1 +
 test/replication/gh-6035-replica2.lua         |   1 +
 test/replication/suite.cfg                    |   3 +
 11 files changed, 288 insertions(+), 19 deletions(-)
 create mode 100644 test/replication/gh-6035-applier-filter.result
 create mode 100644 test/replication/gh-6035-applier-filter.test.lua
 create mode 120000 test/replication/gh-6035-master.lua
 create mode 100644 test/replication/gh-6035-node.lua
 create mode 120000 test/replication/gh-6035-replica1.lua
 create mode 120000 test/replication/gh-6035-replica2.lua

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 33181fdbf..d3430f582 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -1027,21 +1027,24 @@ nopify:;
  * Return 0 for success or -1 in case of an error.
  */
 static int
-applier_apply_tx(struct applier *applier, struct stailq *rows)
+applier_apply_tx(struct stailq *rows)
 {
 	/*
-	 * Rows received not directly from a leader are ignored. That is a
-	 * protection against the case when an old leader keeps sending data
-	 * around not knowing yet that it is not a leader anymore.
+	 * Initially we filtered out data coming from an applier
+	 * whose instance_id doesn't match raft->leader, but this
+	 * prevents us from obtaining valid leader data when it
+	 * arrives via an intermediate node. For example, a series
+	 * of replica hops
 	 *
-	 * XXX: it may be that this can be fine to apply leader transactions by
-	 * looking at their replica_id field if it is equal to leader id. That
-	 * can be investigated as an 'optimization'. Even though may not give
-	 * anything, because won't change total number of rows sent in the
-	 * network anyway.
+	 *  master -> replica 1 -> replica 2
+	 *
+	 * where each replica carries the master-initiated transaction
+	 * in the xrow->replica_id field, so the master's data gets
+	 * propagated indirectly.
+	 *
+	 * Thus we dropped such "sender" filtration and use transaction
+	 * "initiator" filtration via xrow->replica_id only.
 	 */
-	if (!raft_is_source_allowed(box_raft(), applier->instance_id))
-		return 0;
 	struct xrow_header *first_row = &stailq_first_entry(rows,
 					struct applier_tx_row, next)->row;
 	struct xrow_header *last_row;
@@ -1312,7 +1315,7 @@ applier_subscribe(struct applier *applier)
 					diag_raise();
 			}
 			applier_signal_ack(applier);
-		} else if (applier_apply_tx(applier, &rows) != 0) {
+		} else if (applier_apply_tx(&rows) != 0) {
 			diag_raise();
 		}
 
diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h
index a8da564b0..fae30b03d 100644
--- a/src/lib/raft/raft.h
+++ b/src/lib/raft/raft.h
@@ -236,13 +236,6 @@ raft_is_ro(const struct raft *raft)
 	return raft->is_enabled && raft->state != RAFT_STATE_LEADER;
 }
 
-/** See if the instance can accept rows from an instance with the given ID. */
-static inline bool
-raft_is_source_allowed(const struct raft *raft, uint32_t source_id)
-{
-	return !raft->is_enabled || raft->leader == source_id;
-}
-
 /** Check if Raft is enabled. */
 static inline bool
 raft_is_enabled(const struct raft *raft)
diff --git a/test/replication/gh-5445-leader-inconsistency.result b/test/replication/gh-5445-leader-inconsistency.result
index 5c6169f50..8b9a4051a 100644
--- a/test/replication/gh-5445-leader-inconsistency.result
+++ b/test/replication/gh-5445-leader-inconsistency.result
@@ -178,6 +178,14 @@ test_run:cmd('stop server '..leader)
 is_possible_leader[leader_nr] = false
  | ---
  | ...
+-- And other node as well.
+test_run:cmd('stop server '..other)
+ | ---
+ | - true
+ | ...
+is_possible_leader[other_nr] = false
+ | ---
+ | ...
 
 -- Emulate a situation when next_leader wins the elections. It can't do that in
 -- this configuration, obviously, because it's behind the 'other' node, so set
@@ -195,6 +203,13 @@ assert(get_leader(is_possible_leader) == next_leader_nr)
  | ---
  | - true
  | ...
+test_run:cmd('start server '..other..' with args="1 0.4 voter 2"')
+ | ---
+ | - true
+ | ...
+is_possible_leader[other_nr] = true
+ | ---
+ | ...
 test_run:switch(other)
  | ---
  | - true
diff --git a/test/replication/gh-5445-leader-inconsistency.test.lua b/test/replication/gh-5445-leader-inconsistency.test.lua
index e7952f5fa..b0b8baf36 100644
--- a/test/replication/gh-5445-leader-inconsistency.test.lua
+++ b/test/replication/gh-5445-leader-inconsistency.test.lua
@@ -82,6 +82,9 @@ test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
 test_run:switch('default')
 test_run:cmd('stop server '..leader)
 is_possible_leader[leader_nr] = false
+-- And other node as well.
+test_run:cmd('stop server '..other)
+is_possible_leader[other_nr] = false
 
 -- Emulate a situation when next_leader wins the elections. It can't do that in
 -- this configuration, obviously, because it's behind the 'other' node, so set
@@ -93,6 +96,8 @@ is_possible_leader[leader_nr] = false
 -- a situation when some rows from the old leader were not received).
 test_run:cmd('start server '..next_leader..' with args="1 0.4 candidate 1"')
 assert(get_leader(is_possible_leader) == next_leader_nr)
+test_run:cmd('start server '..other..' with args="1 0.4 voter 2"')
+is_possible_leader[other_nr] = true
 test_run:switch(other)
 -- New leader didn't know about the unconfirmed rows but still rolled them back.
 test_run:wait_cond(function() return box.space.test:get{2} == nil end)
diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result
new file mode 100644
index 000000000..2620e7b6f
--- /dev/null
+++ b/test/replication/gh-6035-applier-filter.result
@@ -0,0 +1,144 @@
+-- test-run result file version 2
+--
+-- gh-6035: verify synchronous row filtering in the applier;
+-- we need to be sure that synchronous rows are filtered
+-- via the transaction initiator, not the sender (i.e. via
+-- xrow->replica_id).
+--
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+--
+-- Prepare a scheme with a transitional node
+--
+--  master <=> replica1 => replica2
+--
+-- such that a transaction initiated on the master node gets
+-- replicated to replica2 via the intermediate replica1 node.
+--
+
+test_run:cmd('create server master with script="replication/gh-6035-master.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('start server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica2')
+ | ---
+ | - true
+ | ...
+
+test_run:switch('replica2')
+ | ---
+ | - true
+ | ...
+box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}}
+ | ---
+ | ...
+
+--
+-- Make the master the Raft leader; this drops the
+-- connection to replica2.
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.cfg({                                               \
+    replication = {                                     \
+            require('fio').cwd() .. "/master.sock",     \
+            require('fio').cwd() .. "/replica1.sock",   \
+        },                                              \
+    replication_synchro_quorum = 2,                     \
+    election_mode = 'manual',                           \
+})
+ | ---
+ | ...
+
+box.ctl.promote()
+ | ---
+ | ...
+_ = box.schema.space.create("sync", {is_sync = true})
+ | ---
+ | ...
+_ = box.space.sync:create_index("pk")
+ | ---
+ | ...
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+
+--
+-- The first hop is replica1.
+test_run:switch('replica1')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ | ...
+
+--
+-- And the second hop is replica2 where
+-- replica1 replicated the data to us.
+test_run:switch('replica2')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return                    \
+                   box.space.sync ~= nil and            \
+                   box.space.sync:get{1} ~= nil and     \
+                   box.space.sync:get{1}[1] == 1 end, 100)
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica2')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica2')
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-6035-applier-filter.test.lua b/test/replication/gh-6035-applier-filter.test.lua
new file mode 100644
index 000000000..9bfd91288
--- /dev/null
+++ b/test/replication/gh-6035-applier-filter.test.lua
@@ -0,0 +1,68 @@
+--
+-- gh-6035: verify synchronous row filtering in the applier;
+-- we need to be sure that synchronous rows are filtered
+-- via the transaction initiator, not the sender (i.e. via
+-- xrow->replica_id).
+--
+test_run = require('test_run').new()
+
+--
+-- Prepare a scheme with a transitional node
+--
+--  master <=> replica1 => replica2
+--
+-- such that a transaction initiated on the master node gets
+-- replicated to replica2 via the intermediate replica1 node.
+--
+
+test_run:cmd('create server master with script="replication/gh-6035-master.lua"')
+test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"')
+test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"')
+
+test_run:cmd('start server master')
+test_run:cmd('start server replica1')
+test_run:cmd('start server replica2')
+
+test_run:switch('replica2')
+box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}}
+
+--
+-- Make the master the Raft leader; this drops the
+-- connection to replica2.
+test_run:switch('master')
+box.cfg({                                               \
+    replication = {                                     \
+            require('fio').cwd() .. "/master.sock",     \
+            require('fio').cwd() .. "/replica1.sock",   \
+        },                                              \
+    replication_synchro_quorum = 2,                     \
+    election_mode = 'manual',                           \
+})
+
+box.ctl.promote()
+_ = box.schema.space.create("sync", {is_sync = true})
+_ = box.space.sync:create_index("pk")
+box.space.sync:insert{1}
+
+--
+-- The first hop is replica1.
+test_run:switch('replica1')
+box.space.sync:select{}
+
+--
+-- And the second hop is replica2 where
+-- replica1 replicated the data to us.
+test_run:switch('replica2')
+test_run:wait_cond(function() return                    \
+                   box.space.sync ~= nil and            \
+                   box.space.sync:get{1} ~= nil and     \
+                   box.space.sync:get{1}[1] == 1 end, 100)
+box.space.sync:select{}
+
+test_run:switch('default')
+test_run:cmd('stop server master')
+test_run:cmd('delete server master')
+test_run:cmd('stop server replica1')
+test_run:cmd('delete server replica1')
+test_run:cmd('stop server replica2')
+test_run:cmd('delete server replica2')
diff --git a/test/replication/gh-6035-master.lua b/test/replication/gh-6035-master.lua
new file mode 120000
index 000000000..f7ede7ef2
--- /dev/null
+++ b/test/replication/gh-6035-master.lua
@@ -0,0 +1 @@
+gh-6035-node.lua
\ No newline at end of file
diff --git a/test/replication/gh-6035-node.lua b/test/replication/gh-6035-node.lua
new file mode 100644
index 000000000..e3819471a
--- /dev/null
+++ b/test/replication/gh-6035-node.lua
@@ -0,0 +1,35 @@
+local SOCKET_DIR = require('fio').cwd()
+local INSTANCE_ID = string.match(arg[0], "gh%-6035%-(.+)%.lua")
+
+local function unix_socket(name)
+    return SOCKET_DIR .. "/" .. name .. '.sock';
+end
+
+require('console').listen(os.getenv('ADMIN'))
+
+if INSTANCE_ID == "master" then
+    box.cfg({
+        listen = unix_socket("master"),
+    })
+elseif INSTANCE_ID == "replica1" then
+    box.cfg({
+        listen = unix_socket("replica1"),
+        replication = {
+            unix_socket("master"),
+            unix_socket("replica1")
+        },
+        election_mode = 'voter'
+    })
+else
+    assert(INSTANCE_ID == "replica2")
+    box.cfg({
+        replication = {
+            unix_socket("master"),
+        },
+        election_mode = 'voter'
+    })
+end
+
+box.once("bootstrap", function()
+    box.schema.user.grant('guest', 'super')
+end)
diff --git a/test/replication/gh-6035-replica1.lua b/test/replication/gh-6035-replica1.lua
new file mode 120000
index 000000000..f7ede7ef2
--- /dev/null
+++ b/test/replication/gh-6035-replica1.lua
@@ -0,0 +1 @@
+gh-6035-node.lua
\ No newline at end of file
diff --git a/test/replication/gh-6035-replica2.lua b/test/replication/gh-6035-replica2.lua
new file mode 120000
index 000000000..f7ede7ef2
--- /dev/null
+++ b/test/replication/gh-6035-replica2.lua
@@ -0,0 +1 @@
+gh-6035-node.lua
\ No newline at end of file
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 27eab20c2..55ec022ff 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -47,6 +47,9 @@
     "gh-6032-promote-wal-write.test.lua": {},
     "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
     "gh-6094-rs-uuid-mismatch.test.lua": {},
+    "gh-6035-applier-filter.test.lua": {
+        "memtx": {"engine": "memtx"}
+    },
     "*": {
         "memtx": {"engine": "memtx"},
         "vinyl": {"engine": "vinyl"}
-- 
2.31.1



* Re: [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator
  2021-06-15 13:56 ` [Tarantool-patches] [PATCH v9 1/1] " Cyrill Gorcunov via Tarantool-patches
@ 2021-06-15 14:26   ` Serge Petrenko via Tarantool-patches
  2021-06-15 17:02     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 15+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-15 14:26 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml; +Cc: Vladislav Shpilevoy



On 15.06.2021 16:56, Cyrill Gorcunov wrote:
> Currently we filter synchro packets by their contents, in
> particular by their xrow->replica_id value. Still, there was a
> question whether we could instead filter out all packets coming
> from a non-leader replica.
>
> The Raft specification requires that only data from the current
> leader be applied to the local WAL, but it makes no claim about
> the data transport, i.e. how exactly rows reach the replicas.
> This implies that data may reach a replica indirectly, via
> transit hops. Thus we drop the applier->instance_id filtering
> and rely on xrow->replica_id matching instead.
>
> In the test (inspired by Serge Petrenko's test) we recreate the
> situation where replica2 obtains the master's data (the master
> being the Raft leader) indirectly via the replica1 node.
>
> Closes #6035
>
> Co-developed-by: Serge Petrenko <sergepetrenko@tarantool.org>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---

Thanks for the fixes!

Please, find a couple of finalization comments below.

>   src/box/applier.cc                            |  27 ++--
>   src/lib/raft/raft.h                           |   7 -
>   .../gh-5445-leader-inconsistency.result       |  15 ++
>   .../gh-5445-leader-inconsistency.test.lua     |   5 +
>   .../replication/gh-6035-applier-filter.result | 144 ++++++++++++++++++
>   .../gh-6035-applier-filter.test.lua           |  68 +++++++++
>   test/replication/gh-6035-master.lua           |   1 +
>   test/replication/gh-6035-node.lua             |  35 +++++
>   test/replication/gh-6035-replica1.lua         |   1 +
>   test/replication/gh-6035-replica2.lua         |   1 +
>   test/replication/suite.cfg                    |   3 +
>   11 files changed, 288 insertions(+), 19 deletions(-)
>   create mode 100644 test/replication/gh-6035-applier-filter.result
>   create mode 100644 test/replication/gh-6035-applier-filter.test.lua
>   create mode 120000 test/replication/gh-6035-master.lua
>   create mode 100644 test/replication/gh-6035-node.lua
>   create mode 120000 test/replication/gh-6035-replica1.lua
>   create mode 120000 test/replication/gh-6035-replica2.lua
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 33181fdbf..d3430f582 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -1027,21 +1027,24 @@ nopify:;
>    * Return 0 for success or -1 in case of an error.
>    */
>   static int
> -applier_apply_tx(struct applier *applier, struct stailq *rows)
> +applier_apply_tx(struct stailq *rows)
>   {
>   	/*
> -	 * Rows received not directly from a leader are ignored. That is a
> -	 * protection against the case when an old leader keeps sending data
> -	 * around not knowing yet that it is not a leader anymore.
> +	 * Initially we filtered out data coming from an applier
> +	 * whose instance_id doesn't match raft->leader, but this
> +	 * prevents us from obtaining valid leader data when it
> +	 * arrives via an intermediate node. For example, a series
> +	 * of replica hops
>   	 *
> -	 * XXX: it may be that this can be fine to apply leader transactions by
> -	 * looking at their replica_id field if it is equal to leader id. That
> -	 * can be investigated as an 'optimization'. Even though may not give
> -	 * anything, because won't change total number of rows sent in the
> -	 * network anyway.
> +	 *  master -> replica 1 -> replica 2
> +	 *
> +	 * where each replica carries the master-initiated transaction
> +	 * in the xrow->replica_id field, so the master's data gets
> +	 * propagated indirectly.
> +	 *
> +	 * Thus we dropped such "sender" filtration and use transaction
> +	 * "initiator" filtration via xrow->replica_id only.
>   	 */
> -	if (!raft_is_source_allowed(box_raft(), applier->instance_id))
> -		return 0;
>   	struct xrow_header *first_row = &stailq_first_entry(rows,
>   					struct applier_tx_row, next)->row;
>   	struct xrow_header *last_row;
> @@ -1312,7 +1315,7 @@ applier_subscribe(struct applier *applier)
>   					diag_raise();
>   			}
>   			applier_signal_ack(applier);
> -		} else if (applier_apply_tx(applier, &rows) != 0) {
> +		} else if (applier_apply_tx(&rows) != 0) {
>   			diag_raise();
>   		}
>   
> diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h
> index a8da564b0..fae30b03d 100644
> --- a/src/lib/raft/raft.h
> +++ b/src/lib/raft/raft.h
> @@ -236,13 +236,6 @@ raft_is_ro(const struct raft *raft)
>   	return raft->is_enabled && raft->state != RAFT_STATE_LEADER;
>   }
>   
> -/** See if the instance can accept rows from an instance with the given ID. */
> -static inline bool
> -raft_is_source_allowed(const struct raft *raft, uint32_t source_id)
> -{
> -	return !raft->is_enabled || raft->leader == source_id;
> -}
> -
>   /** Check if Raft is enabled. */
>   static inline bool
>   raft_is_enabled(const struct raft *raft)
> diff --git a/test/replication/gh-5445-leader-inconsistency.result b/test/replication/gh-5445-leader-inconsistency.result
> index 5c6169f50..8b9a4051a 100644
> --- a/test/replication/gh-5445-leader-inconsistency.result
> +++ b/test/replication/gh-5445-leader-inconsistency.result
> @@ -178,6 +178,14 @@ test_run:cmd('stop server '..leader)
>   is_possible_leader[leader_nr] = false
>    | ---
>    | ...
> +-- And other node as well.
> +test_run:cmd('stop server '..other)
> + | ---
> + | - true
> + | ...
> +is_possible_leader[other_nr] = false
> + | ---
> + | ...
>   
>   -- Emulate a situation when next_leader wins the elections. It can't do that in
>   -- this configuration, obviously, because it's behind the 'other' node, so set
> @@ -195,6 +203,13 @@ assert(get_leader(is_possible_leader) == next_leader_nr)
>    | ---
>    | - true
>    | ...
> +test_run:cmd('start server '..other..' with args="1 0.4 voter 2"')
> + | ---
> + | - true
> + | ...
> +is_possible_leader[other_nr] = true
> + | ---
> + | ...
>   test_run:switch(other)
>    | ---
>    | - true
> diff --git a/test/replication/gh-5445-leader-inconsistency.test.lua b/test/replication/gh-5445-leader-inconsistency.test.lua
> index e7952f5fa..b0b8baf36 100644
> --- a/test/replication/gh-5445-leader-inconsistency.test.lua
> +++ b/test/replication/gh-5445-leader-inconsistency.test.lua
> @@ -82,6 +82,9 @@ test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
>   test_run:switch('default')
>   test_run:cmd('stop server '..leader)
>   is_possible_leader[leader_nr] = false
> +-- And other node as well.
> +test_run:cmd('stop server '..other)
> +is_possible_leader[other_nr] = false
>   
>   -- Emulate a situation when next_leader wins the elections. It can't do that in
>   -- this configuration, obviously, because it's behind the 'other' node, so set
> @@ -93,6 +96,8 @@ is_possible_leader[leader_nr] = false
>   -- a situation when some rows from the old leader were not received).
>   test_run:cmd('start server '..next_leader..' with args="1 0.4 candidate 1"')
>   assert(get_leader(is_possible_leader) == next_leader_nr)
> +test_run:cmd('start server '..other..' with args="1 0.4 voter 2"')
> +is_possible_leader[other_nr] = true
>   test_run:switch(other)
>   -- New leader didn't know about the unconfirmed rows but still rolled them back.
>   test_run:wait_cond(function() return box.space.test:get{2} == nil end)
> diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result
> new file mode 100644
> index 000000000..2620e7b6f
> --- /dev/null
> +++ b/test/replication/gh-6035-applier-filter.result
> @@ -0,0 +1,144 @@
> +-- test-run result file version 2
> +--
> +-- gh-6035: verify synchronous row filtering in the applier;
> +-- we need to be sure that synchronous rows are filtered
> +-- via the transaction initiator, not the sender (i.e. via
> +-- xrow->replica_id).
> +--
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +--
> +-- Prepare a scheme with a transitional node
> +--
> +--  master <=> replica1 => replica2
> +--
> +-- such that a transaction initiated on the master node gets
> +-- replicated to replica2 via the intermediate replica1 node.
> +--
> +
> +test_run:cmd('create server master with script="replication/gh-6035-master.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"')
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('start server master')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica2')
> + | ---
> + | - true
> + | ...
> +
> +test_run:switch('replica2')
> + | ---
> + | - true
> + | ...
> +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}}
> + | ---
> + | ...
> +
> +--
> +-- Make the master the Raft leader; this drops the
> +-- connection to replica2.
> +test_run:switch('master')
> + | ---
> + | - true
> + | ...
> +box.cfg({                                               \
> +    replication = {                                     \
> +            require('fio').cwd() .. "/master.sock",     \
> +            require('fio').cwd() .. "/replica1.sock",   \
> +        },                                              \
> +    replication_synchro_quorum = 2,                     \
> +    election_mode = 'manual',                           \
> +})
> + | ---
> + | ...
> +
> +box.ctl.promote()
> + | ---
> + | ...
> +_ = box.schema.space.create("sync", {is_sync = true})
> + | ---
> + | ...
> +_ = box.space.sync:create_index("pk")
> + | ---
> + | ...
> +box.space.sync:insert{1}
> + | ---
> + | - [1]
> + | ...
> +
> +--
> +-- The first hop is replica1.
> +test_run:switch('replica1')
> + | ---
> + | - true
> + | ...
> +box.space.sync:select{}
> + | ---
> + | - - [1]
> + | ...

You need to wait for the space creation, just like
you do below. Otherwise the test'll be flaky.
Also, please see a comment regarding wait_lsn vs wait_cond
below.

> +
> +--
> +-- And the second hop is replica2 where
> +-- replica1 replicated the data to us.
> +test_run:switch('replica2')
> + | ---
> + | - true
> + | ...
> +test_run:wait_cond(function() return                    \
> +                   box.space.sync ~= nil and            \
> +                   box.space.sync:get{1} ~= nil and     \
> +                   box.space.sync:get{1}[1] == 1 end, 100)
> + | ---
> + | - true
> + | ...

I suggest you use wait_lsn('replica2', 'master') here
instead of this bulky wait_cond.
First of all, it takes a single line, instead of 4 lines.

Secondly, you forgot to test `box.space.sync.index.pk ~= nil`, meaning
the test will still fail occasionally, when index creation doesn't 
replicate in time.
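Concretely, the suggestion above would shrink the replica2 wait to something like the following test-script fragment (a sketch only; it relies on the harness-provided `test_run:wait_lsn` and does not run outside the test-run environment):

```lua
-- On replica2: wait until everything the master wrote, including
-- the space/index DDL and the insert, has been applied locally.
test_run:wait_lsn('replica2', 'master')
box.space.sync:select{}
```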

> +box.space.sync:select{}
> + | ---
> + | - - [1]
> + | ...
> +
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server master')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server master')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server replica2')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica2')
> + | ---
> + | - true
> + | ...
> diff --git a/test/replication/gh-6035-applier-filter.test.lua b/test/replication/gh-6035-applier-filter.test.lua
> new file mode 100644
> index 000000000..9bfd91288
> --- /dev/null
> +++ b/test/replication/gh-6035-applier-filter.test.lua
> @@ -0,0 +1,68 @@
> +--
> +-- gh-6035: verify synchronous row filtering in the applier;
> +-- we need to be sure that synchronous rows are filtered
> +-- via the transaction initiator, not the sender (i.e. via
> +-- xrow->replica_id).
> +--
> +test_run = require('test_run').new()
> +
> +--
> +-- Prepare a scheme with a transitional node
> +--
> +--  master <=> replica1 => replica2
> +--
> +-- such that a transaction initiated on the master node gets
> +-- replicated to replica2 via the intermediate replica1 node.
> +--
> +
> +test_run:cmd('create server master with script="replication/gh-6035-master.lua"')
> +test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"')
> +test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"')
> +
> +test_run:cmd('start server master')
> +test_run:cmd('start server replica1')
> +test_run:cmd('start server replica2')
> +
> +test_run:switch('replica2')
> +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}}
> +
> +--
> +-- Make the master the Raft leader; this drops the
> +-- connection to replica2.
> +test_run:switch('master')
> +box.cfg({                                               \
> +    replication = {                                     \
> +            require('fio').cwd() .. "/master.sock",     \
> +            require('fio').cwd() .. "/replica1.sock",   \
> +        },                                              \
> +    replication_synchro_quorum = 2,                     \
> +    election_mode = 'manual',                           \
> +})
> +
> +box.ctl.promote()
> +_ = box.schema.space.create("sync", {is_sync = true})
> +_ = box.space.sync:create_index("pk")
> +box.space.sync:insert{1}
> +
> +--
> +-- The first hop is replica1.
> +test_run:switch('replica1')
> +box.space.sync:select{}
> +
> +--
> +-- And the second hop is replica2 where
> +-- replica1 replicated the data to us.
> +test_run:switch('replica2')
> +test_run:wait_cond(function() return                    \
> +                   box.space.sync ~= nil and            \
> +                   box.space.sync:get{1} ~= nil and     \
> +                   box.space.sync:get{1}[1] == 1 end, 100)
> +box.space.sync:select{}
> +
> +test_run:switch('default')
> +test_run:cmd('stop server master')
> +test_run:cmd('delete server master')
> +test_run:cmd('stop server replica1')
> +test_run:cmd('delete server replica1')
> +test_run:cmd('stop server replica2')
> +test_run:cmd('delete server replica2')
> diff --git a/test/replication/gh-6035-master.lua b/test/replication/gh-6035-master.lua
> new file mode 120000
> index 000000000..f7ede7ef2
> --- /dev/null
> +++ b/test/replication/gh-6035-master.lua
> @@ -0,0 +1 @@
> +gh-6035-node.lua
> \ No newline at end of file
> diff --git a/test/replication/gh-6035-node.lua b/test/replication/gh-6035-node.lua
> new file mode 100644
> index 000000000..e3819471a
> --- /dev/null
> +++ b/test/replication/gh-6035-node.lua
> @@ -0,0 +1,35 @@
> +local SOCKET_DIR = require('fio').cwd()
> +local INSTANCE_ID = string.match(arg[0], "gh%-6035%-(.+)%.lua")
> +
> +local function unix_socket(name)
> +    return SOCKET_DIR .. "/" .. name .. '.sock';
> +end
> +
> +require('console').listen(os.getenv('ADMIN'))
> +
> +if INSTANCE_ID == "master" then
> +    box.cfg({
> +        listen = unix_socket("master"),
> +    })
> +elseif INSTANCE_ID == "replica1" then
> +    box.cfg({
> +        listen = unix_socket("replica1"),
> +        replication = {
> +            unix_socket("master"),
> +            unix_socket("replica1")
> +        },
> +        election_mode = 'voter'
> +    })
> +else
> +    assert(INSTANCE_ID == "replica2")
> +    box.cfg({
> +        replication = {
> +            unix_socket("master"),
> +        },
> +        election_mode = 'voter'
> +    })
> +end
> +
> +box.once("bootstrap", function()
> +    box.schema.user.grant('guest', 'super')
> +end)
> diff --git a/test/replication/gh-6035-replica1.lua b/test/replication/gh-6035-replica1.lua
> new file mode 120000
> index 000000000..f7ede7ef2
> --- /dev/null
> +++ b/test/replication/gh-6035-replica1.lua
> @@ -0,0 +1 @@
> +gh-6035-node.lua
> \ No newline at end of file
> diff --git a/test/replication/gh-6035-replica2.lua b/test/replication/gh-6035-replica2.lua
> new file mode 120000
> index 000000000..f7ede7ef2
> --- /dev/null
> +++ b/test/replication/gh-6035-replica2.lua
> @@ -0,0 +1 @@
> +gh-6035-node.lua
> \ No newline at end of file
> diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
> index 27eab20c2..55ec022ff 100644
> --- a/test/replication/suite.cfg
> +++ b/test/replication/suite.cfg
> @@ -47,6 +47,9 @@
>       "gh-6032-promote-wal-write.test.lua": {},
>       "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
>       "gh-6094-rs-uuid-mismatch.test.lua": {},
> +    "gh-6035-applier-filter.test.lua": {
> +        "memtx": {"engine": "memtx"}
> +    },
>       "*": {
>           "memtx": {"engine": "memtx"},
>           "vinyl": {"engine": "vinyl"}

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator
  2021-06-15 14:26   ` Serge Petrenko via Tarantool-patches
@ 2021-06-15 17:02     ` Cyrill Gorcunov via Tarantool-patches
  2021-06-15 18:08       ` Cyrill Gorcunov via Tarantool-patches
  2021-06-16  8:16       ` Serge Petrenko via Tarantool-patches
  0 siblings, 2 replies; 15+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-15 17:02 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tml, Vladislav Shpilevoy

On Tue, Jun 15, 2021 at 05:26:06PM +0300, Serge Petrenko wrote:
> > +box.space.sync:select{}
> > + | ---
> > + | - - [1]
> > + | ...
> 
> You need to wait for the space creation, just like
> you do below. Otherwise the test'll be flaky.
> Also, please see a comment regarding wait_lsn vs wait_cond
> below.

As discussed, due to quorum=2 and the sync space we don't need
to wait.
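
For intuition, here is a minimal sketch of why the quorum makes an extra
wait unnecessary (the option name is the real box.cfg one from the patch;
the three-node layout is assumed from the test):

```lua
-- With replication_synchro_quorum = 2, a commit on the master needs
-- confirmation from the master itself plus one more node, and replica1
-- is the only other node the master replicates to in this scheme.
box.cfg({replication_synchro_quorum = 2})

-- So this insert into a synchronous space returns only after replica1
-- has received and confirmed the row; switching to replica1 right
-- afterwards is guaranteed to see the data without any wait_cond.
box.space.sync:insert{1}
```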

> > +test_run:wait_cond(function() return                    \
> > +                   box.space.sync ~= nil and            \
> > +                   box.space.sync:get{1} ~= nil and     \
> > +                   box.space.sync:get{1}[1] == 1 end, 100)
> > + | ---
> > + | - true
> > + | ...
> 
> I suggest you use wait_lsn('replica2', 'master') here
> instead of this bulky wait_cond.
> First of all, it takes a single line, instead of 4 lines.
> 
> Secondly, you forgot to test `box.space.sync.index.pk ~= nil`, meaning
> the test will still fail occasionally, when index creation doesn't replicate
> in time.

You mean something like below?
---
diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result
index 2620e7b6f..7345a19f7 100644
--- a/test/replication/gh-6035-applier-filter.result
+++ b/test/replication/gh-6035-applier-filter.result
@@ -102,12 +102,8 @@ test_run:switch('replica2')
  | ---
  | - true
  | ...
-test_run:wait_cond(function() return                    \
-                   box.space.sync ~= nil and            \
-                   box.space.sync:get{1} ~= nil and     \
-                   box.space.sync:get{1}[1] == 1 end, 100)
+test_run:wait_lsn('replica2', 'master')
  | ---
- | - true
  | ...
 box.space.sync:select{}
  | ---
diff --git a/test/replication/gh-6035-applier-filter.test.lua b/test/replication/gh-6035-applier-filter.test.lua
index 9bfd91288..beca5258e 100644
--- a/test/replication/gh-6035-applier-filter.test.lua
+++ b/test/replication/gh-6035-applier-filter.test.lua
@@ -53,10 +53,7 @@ box.space.sync:select{}
 -- And the second hop is replica2 where
 -- replica1 replicated the data to us.
 test_run:switch('replica2')
-test_run:wait_cond(function() return                    \
-                   box.space.sync ~= nil and            \
-                   box.space.sync:get{1} ~= nil and     \
-                   box.space.sync:get{1}[1] == 1 end, 100)
+test_run:wait_lsn('replica2', 'master')
 box.space.sync:select{}
 
 test_run:switch('default')


* Re: [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator
  2021-06-15 17:02     ` Cyrill Gorcunov via Tarantool-patches
@ 2021-06-15 18:08       ` Cyrill Gorcunov via Tarantool-patches
  2021-06-16  8:31         ` Serge Petrenko via Tarantool-patches
  2021-06-16  8:16       ` Serge Petrenko via Tarantool-patches
  1 sibling, 1 reply; 15+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-15 18:08 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

On Tue, Jun 15, 2021 at 08:02:59PM +0300, Cyrill Gorcunov wrote:
> On Tue, Jun 15, 2021 at 05:26:06PM +0300, Serge Petrenko wrote:
> > > +box.space.sync:select{}
> > > + | ---
> > > + | - - [1]
> > > + | ...
> > 
> > You need to wait for the space creation, just like
> > you do below. Otherwise the test'll be flaky.
> > Also, please see a comment regarding wait_lsn vs wait_cond
> > below.
> 
> As discussed, due to quorum=2 and the sync space we don't need
> to wait.
> 
> > > +test_run:wait_cond(function() return                    \
> > > +                   box.space.sync ~= nil and            \
> > > +                   box.space.sync:get{1} ~= nil and     \
> > > +                   box.space.sync:get{1}[1] == 1 end, 100)
> > > + | ---
> > > + | - true
> > > + | ...
> > 
> > I suggest you use wait_lsn('replica2', 'master') here
> > instead of this bulky wait_cond.
> > First of all, it takes a single line, instead of 4 lines.
> > 
> > Secondly, you forgot to test `box.space.sync.index.pk ~= nil`, meaning
> > the test will still fail occasionally, when index creation doesn't replicate
> > in time.
> 
> You mean something like below?
> ---

I force pushed an update.


* Re: [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator
  2021-06-15 17:02     ` Cyrill Gorcunov via Tarantool-patches
  2021-06-15 18:08       ` Cyrill Gorcunov via Tarantool-patches
@ 2021-06-16  8:16       ` Serge Petrenko via Tarantool-patches
  1 sibling, 0 replies; 15+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-16  8:16 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tml, Vladislav Shpilevoy



15.06.2021 20:02, Cyrill Gorcunov writes:
> On Tue, Jun 15, 2021 at 05:26:06PM +0300, Serge Petrenko wrote:
>>> +box.space.sync:select{}
>>> + | ---
>>> + | - - [1]
>>> + | ...
>> You need to wait for the space creation, just like
>> you do below. Otherwise the test'll be flaky.
>> Also, please see a comment regarding wait_lsn vs wait_cond
>> below.
> As discussed, due to quorum=2 and the sync space we don't need
> to wait.

Yes, indeed. I missed that.

>>> +test_run:wait_cond(function() return                    \
>>> +                   box.space.sync ~= nil and            \
>>> +                   box.space.sync:get{1} ~= nil and     \
>>> +                   box.space.sync:get{1}[1] == 1 end, 100)
>>> + | ---
>>> + | - true
>>> + | ...
>> I suggest you use wait_lsn('replica2', 'master') here
>> instead of this bulky wait_cond.
>> First of all, it takes a single line, instead of 4 lines.
>>
>> Secondly, you forgot to test `box.space.sync.index.pk ~= nil`, meaning
>> the test will still fail occasionally, when index creation doesn't replicate
>> in time.
> You mean something like below?
> ---
> diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result
> index 2620e7b6f..7345a19f7 100644
> --- a/test/replication/gh-6035-applier-filter.result
> +++ b/test/replication/gh-6035-applier-filter.result
> @@ -102,12 +102,8 @@ test_run:switch('replica2')
>    | ---
>    | - true
>    | ...
> -test_run:wait_cond(function() return                    \
> -                   box.space.sync ~= nil and            \
> -                   box.space.sync:get{1} ~= nil and     \
> -                   box.space.sync:get{1}[1] == 1 end, 100)
> +test_run:wait_lsn('replica2', 'master')
>    | ---
> - | - true
>    | ...
>   box.space.sync:select{}
>    | ---
> diff --git a/test/replication/gh-6035-applier-filter.test.lua b/test/replication/gh-6035-applier-filter.test.lua
> index 9bfd91288..beca5258e 100644
> --- a/test/replication/gh-6035-applier-filter.test.lua
> +++ b/test/replication/gh-6035-applier-filter.test.lua
> @@ -53,10 +53,7 @@ box.space.sync:select{}
>   -- And the second hop is replica2 where
>   -- replica1 replicated the data to us.
>   test_run:switch('replica2')
> -test_run:wait_cond(function() return                    \
> -                   box.space.sync ~= nil and            \
> -                   box.space.sync:get{1} ~= nil and     \
> -                   box.space.sync:get{1}[1] == 1 end, 100)
> +test_run:wait_lsn('replica2', 'master')
>   box.space.sync:select{}
>   
>   test_run:switch('default')

Exactly.

-- 
Serge Petrenko



* Re: [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator
  2021-06-15 18:08       ` Cyrill Gorcunov via Tarantool-patches
@ 2021-06-16  8:31         ` Serge Petrenko via Tarantool-patches
  2021-06-16  8:43           ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 15+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-16  8:31 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml; +Cc: Vladislav Shpilevoy



15.06.2021 21:08, Cyrill Gorcunov writes:
> On Tue, Jun 15, 2021 at 08:02:59PM +0300, Cyrill Gorcunov wrote:
>> On Tue, Jun 15, 2021 at 05:26:06PM +0300, Serge Petrenko wrote:
>>>> +box.space.sync:select{}
>>>> + | ---
>>>> + | - - [1]
>>>> + | ...
>>> You need to wait for the space creation, just like
>>> you do below. Otherwise the test'll be flaky.
>>> Also, please see a comment regarding wait_lsn vs wait_cond
>>> below.
>> As discussed, due to quorum=2 and the sync space we don't need
>> to wait.
>>
>>>> +test_run:wait_cond(function() return                    \
>>>> +                   box.space.sync ~= nil and            \
>>>> +                   box.space.sync:get{1} ~= nil and     \
>>>> +                   box.space.sync:get{1}[1] == 1 end, 100)
>>>> + | ---
>>>> + | - true
>>>> + | ...
>>> I suggest you use wait_lsn('replica2', 'master') here
>>> instead of this bulky wait_cond.
>>> First of all, it takes a single line, instead of 4 lines.
>>>
>>> Secondly, you forgot to test `box.space.sync.index.pk ~= nil`, meaning
>>> the test will still fail occasionally, when index creation doesn't replicate
>>> in time.
>> You mean something like below?
>> ---
> I force pushed an update.

Thanks for the fixes!

I assume the new branch is
gorcunov/gh-6035-applier-filter-9

LGTM.


-- 
Serge Petrenko



* Re: [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator
  2021-06-16  8:31         ` Serge Petrenko via Tarantool-patches
@ 2021-06-16  8:43           ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 15+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-16  8:43 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tml, Vladislav Shpilevoy

On Wed, Jun 16, 2021 at 11:31:35AM +0300, Serge Petrenko wrote:
> 
> Thanks for the fixes!
> 
> I assume the new branch is
> gorcunov/gh-6035-applier-filter-9

Thanks for the review and overall help! Yes, I pushed there
and am waiting for the tests to finish.


* Re: [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator
  2021-06-15 13:56 [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator Cyrill Gorcunov via Tarantool-patches
  2021-06-15 13:56 ` [Tarantool-patches] [PATCH v9 1/1] " Cyrill Gorcunov via Tarantool-patches
@ 2021-06-18 21:49 ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-18 22:16   ` Cyrill Gorcunov via Tarantool-patches
  2021-06-21 21:06 ` Vladislav Shpilevoy via Tarantool-patches
  2 siblings, 1 reply; 15+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-18 21:49 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Hi! Thanks for the patch!

See 4 comments below.

> diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result
> new file mode 100644
> index 000000000..7345a19f7
> --- /dev/null
> +++ b/test/replication/gh-6035-applier-filter.result

1. Maybe better rename it to gh-6035-election-filter. Because otherwise
this test is not included when I run `python test-run.py election`. And I
do that quite often when I make a patch which affects election only.

> @@ -0,0 +1,140 @@
> +-- test-run result file version 2
> +--
> +-- gh-6035: verify synchronous rows filtration in applier,
> +-- we need to be sure that filtering synchronous rows is
> +-- done via transaction initiator not sender (iow via
> +-- xrow->replica_id).
> +--
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +--
> +-- Prepare a scheme with transitional node
> +--
> +--  master <=> replica1 => replica2
> +--
> +-- such as transaction initiated on the master node would
> +-- be replicated to the replica2 via interim replica1 node.
> +--
> +
> +test_run:cmd('create server master with script="replication/gh-6035-master.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"')
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('start server master')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica2')
> + | ---
> + | - true
> + | ...
> +
> +test_run:switch('replica2')
> + | ---
> + | - true
> + | ...
> +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}}

2. You don't need require('fio').cwd(). "unix/:./replica1.sock" works
just fine. The same for the other socket paths.

> + | ---
> + | ...
> +
> +--
> +-- Make the master to be RAFT leader, this drops connection
> +-- to the replica2.

3. There was no connection to replica2 from master.

> +test_run:switch('master')
> + | ---
> + | - true
> + | ...
> +box.cfg({                                               \
> +    replication = {                                     \
> +            require('fio').cwd() .. "/master.sock",     \
> +            require('fio').cwd() .. "/replica1.sock",   \
> +        },                                              \
> +    replication_synchro_quorum = 2,                     \
> +    election_mode = 'manual',                           \
> +})
> diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
> index 3a0a8649f..3b5cee75b 100644
> --- a/test/replication/suite.cfg
> +++ b/test/replication/suite.cfg
> @@ -50,6 +50,9 @@
>      "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
>      "gh-6094-rs-uuid-mismatch.test.lua": {},
>      "gh-6127-election-join-new.test.lua": {},
> +    "gh-6035-applier-filter.test.lua": {
> +        "memtx": {"engine": "memtx"}
> +    },

4. You don't need to specify the engine. You don't use the
engine variable in the test. Just leave it empty {}.



* Re: [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator
  2021-06-18 21:49 ` [Tarantool-patches] [PATCH v9 0/1] " Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-18 22:16   ` Cyrill Gorcunov via Tarantool-patches
  2021-06-18 22:58     ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 15+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-18 22:16 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Fri, Jun 18, 2021 at 11:49:39PM +0200, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> See 4 comments below.
> 
> > diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result
> > new file mode 100644
> > index 000000000..7345a19f7
> > --- /dev/null
> > +++ b/test/replication/gh-6035-applier-filter.result
> 
> 1. Maybe better rename it to gh-6035-election-filter. Because otherwise
> this test is not included when I run `python test-run.py election`. And I
> do that quite often when I make a patch which affects election only.

Sure, I don't mind.

> > +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}}
> 
> 2. You don't need require('fio').cwd(). "unix/:./replica1.sock" works
> just fine. The same for the other socket paths.

Actually I took this from other examples, since I suspect the use of
an absolute path might be critical for the test-run engine. It is not
a problem to rename, but our other tests do use this trick, so maybe
we should stick with the same approach? Again, I don't mind using a
relative path if it won't cause problems in the future.

> > +-- Make the master to be RAFT leader, this drops connection
> > +-- to the replica2.
> 
> 3. There was no connection to replica2 from master.
>

I'll update the comment, thanks! (Actually, replica2
connected to the master initially, and this is a
full-duplex connection which we close on reconfig;
that's what I meant by "dropping" the connection,
but this seems to be confusing.)
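
For reference, the reconfig that actually closes the master link happens
earlier, on replica2 (sketch, with the socket path exactly as in the
current version of the test):

```lua
-- replica2 bootstraps with the master in its replication list (see
-- gh-6035-node.lua); this reconfig replaces the list with replica1
-- only, which is what tears down the master<->replica2 link:
box.cfg({replication = {require('fio').cwd() .. "/replica1.sock"}})
```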

> > +    "gh-6035-applier-filter.test.lua": {
> > +        "memtx": {"engine": "memtx"}
> > +    },
> 
> 4. You don't need to specify the engine. You don't use the
> engine variable in the test. Just leave it empty {}.

OK


* Re: [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator
  2021-06-18 22:16   ` Cyrill Gorcunov via Tarantool-patches
@ 2021-06-18 22:58     ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-19 11:01       ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 15+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-18 22:58 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tml

>>> +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}}
>>
>> 2. You don't need require('fio').cwd(). "unix/:./replica1.sock" works
>> just fine. The same for the other socket paths.
> 
> Actually I took this from other examples, since I suspect the use of
> an absolute path might be critical for the test-run engine. It is not
> a problem to rename, but our other tests do use this trick, so maybe
> we should stick with the same approach? Again, I don't mind using a
> relative path if it won't cause problems in the future.

Cargo cult does not work. If something is done somewhere, it does not
mean it is correct. I see no issues with using the relative path.
Anyway if it won't work someday, your way won't either, because
cwd() is the same as "./".

>>> +-- Make the master to be RAFT leader, this drops connection
>>> +-- to the replica2.
>>
>> 3. There was no connection to replica2 from master.
>>
> 
> I'll update the comment, thanks! (Actually, replica2
> connected to the master initially, and this is a
> full-duplex connection which we close on reconfig;
> that's what I meant by "dropping" the connection,
> but this seems to be confusing.)

In the comment you said that this reconfig
drops the connection. But it still does not, even in
the updated definition above.

You dropped this connection a few lines before this
comment.


* Re: [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator
  2021-06-18 22:58     ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-19 11:01       ` Cyrill Gorcunov via Tarantool-patches
  2021-06-20 14:50         ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 15+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-19 11:01 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Sat, Jun 19, 2021 at 12:58:03AM +0200, Vladislav Shpilevoy wrote:
> >>> +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}}
> >>
> >> 2. You don't need require('fio').cwd(). "unix/:./replica1.sock" works
> >> just fine. The same for the other socket paths.
> > 
> > Actually I took this from other examples, since I suspect the use of
> > an absolute path might be critical for the test-run engine. It is not
> > a problem to rename, but our other tests do use this trick, so maybe
> > we should stick with the same approach? Again, I don't mind using a
> > relative path if it won't cause problems in the future.
> 
> Cargo cult does not work. If something is done somewhere, it does not
> mean it is correct. I see no issues with using the relative path.

This has nothing to do with cargo cult, but rather with code unification:
either we start using relative paths everywhere, or we keep using
absolute paths. I'll change to relative paths since you prefer them.

> Anyway if it won't work someday, your way won't either, because
> cwd() is the same as "./".

This is a bit more complex when mount namespaces step in,
because open(cwd() .. "/entry") is not the same as
open("./entry").
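
A hypothetical illustration of the difference (the socket names are from
the test, everything else is made up for the example):

```lua
local fio = require('fio')

-- The absolute form freezes the directory at construction time,
-- whatever cwd() happens to return at that moment:
local abs = fio.cwd() .. "/replica1.sock"

-- The relative form is resolved only when the connection is opened,
-- against whatever "." means to the process at that point, which is
-- not necessarily the same directory after a chdir or when seen from
-- inside a different mount namespace:
local rel = "unix/:./replica1.sock"
```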

> >>> +-- Make the master to be RAFT leader, this drops connection
> >>> +-- to the replica2.
> >>
> >> 3. There was no connection to replica2 from master.
> >>
> > 
> > I'll update the comment, thanks! (actually replica2
> > connected to the master initialy and this is full
> > duplex connection which we close on reconfig, that's
> > what I meant saying "dropping" connection, but this
> > seems to be confusing).
> 
> In the comment you said that this reconfig
> drops the connection. But it still does not, even in
> the updated definition above.
> 
> You dropped this connection a few lines before this
> comment.

Vlad, here is an update, does it look better?
I force pushed it into the same branch.
---
diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result
index 7345a19f7..2fa593267 100644
--- a/test/replication/gh-6035-applier-filter.result
+++ b/test/replication/gh-6035-applier-filter.result
@@ -48,24 +48,23 @@ test_run:switch('replica2')
  | ---
  | - true
  | ...
-box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}}
+box.cfg({replication = {"unix/:./replica1.sock"}})
  | ---
  | ...
 
 --
--- Make the master to be RAFT leader, this drops connection
--- to the replica2.
+-- Make the master to be RAFT leader.
 test_run:switch('master')
  | ---
  | - true
  | ...
-box.cfg({                                               \
-    replication = {                                     \
-            require('fio').cwd() .. "/master.sock",     \
-            require('fio').cwd() .. "/replica1.sock",   \
-        },                                              \
-    replication_synchro_quorum = 2,                     \
-    election_mode = 'manual',                           \
+box.cfg({                                       \
+    replication = {                             \
+            "unix/:./master.sock",              \
+            "unix/:./replica1.sock",            \
+    },                                          \
+    replication_synchro_quorum = 2,             \
+    election_mode = 'manual',                   \
 })
  | ---
  | ...
diff --git a/test/replication/gh-6035-applier-filter.test.lua b/test/replication/gh-6035-applier-filter.test.lua
index beca5258e..716c84bb6 100644
--- a/test/replication/gh-6035-applier-filter.test.lua
+++ b/test/replication/gh-6035-applier-filter.test.lua
@@ -24,19 +24,18 @@ test_run:cmd('start server replica1')
 test_run:cmd('start server replica2')
 
 test_run:switch('replica2')
-box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}}
+box.cfg({replication = {"unix/:./replica1.sock"}})
 
 --
--- Make the master to be RAFT leader, this drops connection
--- to the replica2.
+-- Make the master to be RAFT leader.
 test_run:switch('master')
-box.cfg({                                               \
-    replication = {                                     \
-            require('fio').cwd() .. "/master.sock",     \
-            require('fio').cwd() .. "/replica1.sock",   \
-        },                                              \
-    replication_synchro_quorum = 2,                     \
-    election_mode = 'manual',                           \
+box.cfg({                                       \
+    replication = {                             \
+            "unix/:./master.sock",              \
+            "unix/:./replica1.sock",            \
+    },                                          \
+    replication_synchro_quorum = 2,             \
+    election_mode = 'manual',                   \
 })
 
 box.ctl.promote()
diff --git a/test/replication/gh-6035-node.lua b/test/replication/gh-6035-node.lua
index e3819471a..819a20332 100644
--- a/test/replication/gh-6035-node.lua
+++ b/test/replication/gh-6035-node.lua
@@ -1,8 +1,7 @@
-local SOCKET_DIR = require('fio').cwd()
 local INSTANCE_ID = string.match(arg[0], "gh%-6035%-(.+)%.lua")
 
 local function unix_socket(name)
-    return SOCKET_DIR .. "/" .. name .. '.sock';
+    return "unix/:./" .. name .. '.sock';
 end
 
 require('console').listen(os.getenv('ADMIN'))
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 3b5cee75b..69f2f3511 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -50,9 +50,7 @@
     "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
     "gh-6094-rs-uuid-mismatch.test.lua": {},
     "gh-6127-election-join-new.test.lua": {},
-    "gh-6035-applier-filter.test.lua": {
-        "memtx": {"engine": "memtx"}
-    },
+    "gh-6035-applier-filter.test.lua": {},
     "*": {
         "memtx": {"engine": "memtx"},
         "vinyl": {"engine": "vinyl"}


* Re: [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator
  2021-06-19 11:01       ` Cyrill Gorcunov via Tarantool-patches
@ 2021-06-20 14:50         ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-20 17:17           ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 15+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-20 14:50 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tml

Thanks for the fixes!

>>>>> +-- Make the master to be RAFT leader, this drops connection
>>>>> +-- to the replica2.
>>>>
>>>> 3. There was no connection to replica2 from master.
>>>>
>>>
>>> I'll update the comment, thanks! (Actually, replica2
>>> connected to the master initially, and this is a
>>> full-duplex connection which we close on reconfig;
>>> that's what I meant by "dropping" the connection,
>>> but this seems to be confusing.)
>>
>> In the comment you said that this reconfig
>> drops the connection. But it still does not, even in
>> the updated definition above.
>>
>> You dropped this connection a few lines before this
>> comment.
> 
> Vlad, here is an update, does it look better?

It looks better, but why did you ignore my first comment?
I remind it here:

	Maybe better rename it to gh-6035-election-filter. Because otherwise
	this test is not included when I run `python test-run.py election`. And I
	do that quite often when I make a patch which affects election only.


* Re: [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator
  2021-06-20 14:50         ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-20 17:17           ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 15+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-20 17:17 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Sun, Jun 20, 2021 at 04:50:10PM +0200, Vladislav Shpilevoy wrote:
> 
> It looks better, but why did you ignore my first comment?
> I remind it here:
> 
> 	Maybe better rename it to gh-6035-election-filter. Because otherwise
> 	this test is not included when I run `python test-run.py election`. And I
> 	do that quite often when make a patch which affects election only.

I didn't ignore it. I renamed it locally after I sent you the fixes,
simply because it was more convenient for me to make the rename
the last step. Everything is pushed to the branch. Thanks.

 src/box/applier.cc                                     |  27 ++++++++++---------
 src/lib/raft/raft.h                                    |   7 -----
 test/replication/gh-5445-leader-inconsistency.result   |  15 +++++++++++
 test/replication/gh-5445-leader-inconsistency.test.lua |   5 ++++
 test/replication/gh-6035-election-filter.result        | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 test/replication/gh-6035-election-filter.test.lua      |  64 ++++++++++++++++++++++++++++++++++++++++++++
 test/replication/gh-6035-master.lua                    |   1 +
 test/replication/gh-6035-node.lua                      |  34 +++++++++++++++++++++++
 test/replication/gh-6035-replica1.lua                  |   1 +
 test/replication/gh-6035-replica2.lua                  |   1 +
 test/replication/suite.cfg                             |   1 +

(I just looked into our repo and force pushed it again just to be sure)


* Re: [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator
  2021-06-15 13:56 [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator Cyrill Gorcunov via Tarantool-patches
  2021-06-15 13:56 ` [Tarantool-patches] [PATCH v9 1/1] " Cyrill Gorcunov via Tarantool-patches
  2021-06-18 21:49 ` [Tarantool-patches] [PATCH v9 0/1] " Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-21 21:06 ` Vladislav Shpilevoy via Tarantool-patches
  2 siblings, 0 replies; 15+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-21 21:06 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Hi! Thanks for the patch!

I pushed it to master and 2.8.

On 2.7 the test gh-5445-... does not pass. Please,
cherry-pick it on 2.7 locally, investigate, and
send a new patchset specifically for 2.7.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-06-21 21:06 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-15 13:56 [Tarantool-patches] [PATCH v9 0/1] applier: filter incoming synchro packets via transaction initiator Cyrill Gorcunov via Tarantool-patches
2021-06-15 13:56 ` [Tarantool-patches] [PATCH v9 1/1] " Cyrill Gorcunov via Tarantool-patches
2021-06-15 14:26   ` Serge Petrenko via Tarantool-patches
2021-06-15 17:02     ` Cyrill Gorcunov via Tarantool-patches
2021-06-15 18:08       ` Cyrill Gorcunov via Tarantool-patches
2021-06-16  8:31         ` Serge Petrenko via Tarantool-patches
2021-06-16  8:43           ` Cyrill Gorcunov via Tarantool-patches
2021-06-16  8:16       ` Serge Petrenko via Tarantool-patches
2021-06-18 21:49 ` [Tarantool-patches] [PATCH v9 0/1] " Vladislav Shpilevoy via Tarantool-patches
2021-06-18 22:16   ` Cyrill Gorcunov via Tarantool-patches
2021-06-18 22:58     ` Vladislav Shpilevoy via Tarantool-patches
2021-06-19 11:01       ` Cyrill Gorcunov via Tarantool-patches
2021-06-20 14:50         ` Vladislav Shpilevoy via Tarantool-patches
2021-06-20 17:17           ` Cyrill Gorcunov via Tarantool-patches
2021-06-21 21:06 ` Vladislav Shpilevoy via Tarantool-patches

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://lists.tarantool.org/tarantool-patches/0 tarantool-patches/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 tarantool-patches tarantool-patches/ https://lists.tarantool.org/tarantool-patches \
		tarantool-patches@dev.tarantool.org.
	public-inbox-index tarantool-patches

