Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v5] applier: filter incoming synchro packets via transaction initiator
@ 2021-06-08 18:08 Cyrill Gorcunov via Tarantool-patches
  2021-06-08 18:52 ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 2+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-08 18:08 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Currently we use synchro packets filtration based on their contents,
in particular by their xrow->replica_id value. Still there was a
question if we can optimize this moment and rather filter our all
packets coming from non-leader replica.

Raft specification requires that only data from a current leader
should be applied to local WAL but doesn't put a concrete claim on
the data transport, ie how exactly rows are reaching replicas. This
implies that data propagation may reach replicas indirectly via transit
hops. Thus we drop applier->instance_id filtering and rely on
xrow->replica_id matching instead.

In the test (inspired by Serge Petrenko's test) we recreate the situation
where replica3 obtains master's node data (which is a raft leader)
indirectly via replica2 node.

Closes #6035

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
issue https://github.com/tarantool/tarantool/issues/6035
branch gorcunov/gh-6035-applier-filter-5

Guys, take a look please, would such description be verbose enough?
Should we allow such configurations at all?

 src/box/applier.cc                            |  27 ++--
 src/lib/raft/raft.h                           |   7 -
 .../replication/gh-6035-applier-filter.result | 147 ++++++++++++++++++
 .../gh-6035-applier-filter.test.lua           |  66 ++++++++
 test/replication/gh6035master.lua             |   1 +
 test/replication/gh6035node.lua               |  35 +++++
 test/replication/gh6035replica1.lua           |   1 +
 test/replication/gh6035replica2.lua           |   1 +
 8 files changed, 266 insertions(+), 19 deletions(-)
 create mode 100644 test/replication/gh-6035-applier-filter.result
 create mode 100644 test/replication/gh-6035-applier-filter.test.lua
 create mode 120000 test/replication/gh6035master.lua
 create mode 100644 test/replication/gh6035node.lua
 create mode 120000 test/replication/gh6035replica1.lua
 create mode 120000 test/replication/gh6035replica2.lua

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 33181fdbf..d3430f582 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -1027,21 +1027,24 @@ nopify:;
  * Return 0 for success or -1 in case of an error.
  */
 static int
-applier_apply_tx(struct applier *applier, struct stailq *rows)
+applier_apply_tx(struct stailq *rows)
 {
 	/*
-	 * Rows received not directly from a leader are ignored. That is a
-	 * protection against the case when an old leader keeps sending data
-	 * around not knowing yet that it is not a leader anymore.
+	 * Initially we've been filtering out data if it came from
+	 * an applier which instance_id doesn't match raft->leader,
+	 * but this prevents from obtaining valid leader's data when
+	 * it comes from intermediate node. For example a series of
+	 * replica hops
 	 *
-	 * XXX: it may be that this can be fine to apply leader transactions by
-	 * looking at their replica_id field if it is equal to leader id. That
-	 * can be investigated as an 'optimization'. Even though may not give
-	 * anything, because won't change total number of rows sent in the
-	 * network anyway.
+	 *  master -> replica 1 -> replica 2
+	 *
+	 * where each replica carries master's initiated transaction
+	 * in xrow->replica_id field and master's data get propagated
+	 * indirectly.
+	 *
+	 * Finally we dropped such "sender" filtration and use transaction
+	 * "initiator" filtration via xrow->replica_id only.
 	 */
-	if (!raft_is_source_allowed(box_raft(), applier->instance_id))
-		return 0;
 	struct xrow_header *first_row = &stailq_first_entry(rows,
 					struct applier_tx_row, next)->row;
 	struct xrow_header *last_row;
@@ -1312,7 +1315,7 @@ applier_subscribe(struct applier *applier)
 					diag_raise();
 			}
 			applier_signal_ack(applier);
-		} else if (applier_apply_tx(applier, &rows) != 0) {
+		} else if (applier_apply_tx(&rows) != 0) {
 			diag_raise();
 		}
 
diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h
index a8da564b0..fae30b03d 100644
--- a/src/lib/raft/raft.h
+++ b/src/lib/raft/raft.h
@@ -236,13 +236,6 @@ raft_is_ro(const struct raft *raft)
 	return raft->is_enabled && raft->state != RAFT_STATE_LEADER;
 }
 
-/** See if the instance can accept rows from an instance with the given ID. */
-static inline bool
-raft_is_source_allowed(const struct raft *raft, uint32_t source_id)
-{
-	return !raft->is_enabled || raft->leader == source_id;
-}
-
 /** Check if Raft is enabled. */
 static inline bool
 raft_is_enabled(const struct raft *raft)
diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result
new file mode 100644
index 000000000..cf032c332
--- /dev/null
+++ b/test/replication/gh-6035-applier-filter.result
@@ -0,0 +1,147 @@
+-- test-run result file version 2
+--
+-- gh-6035: verify synchronous rows filtration in applier,
+-- we need to be sure that filtering synchronous rows is
+-- done via transaction initiator not sender (iow via
+-- xrow->replica_id).
+--
+
+local SOCKET_DIR = require('fio').cwd()
+ | ---
+ | ...
+local function unix_socket(name)                \
+    return SOCKET_DIR .. "/" .. name .. '.sock' \
+end
+ | ---
+ | ...
+
+env = require('test_run')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+
+test_run:cmd('create server gh6035master with script="replication/gh6035master.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server gh6035master with args="gh6035master"')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('create server gh6035replica1 with script="replication/gh6035replica1.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server gh6035replica1 with args="gh6035replica1"')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('create server gh6035replica2 with script="replication/gh6035replica2.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server gh6035replica2 with args="gh6035replica2"')
+ | ---
+ | - true
+ | ...
+
+--
+-- The gh6035master node connected to gh6035replica1 and gh6035replica2,
+-- where each of gh6035replicaX connected to the gh6035master only.
+--
+-- Lets reroute gh6035replica2 to gh6035replica1 so gh6035replica1 gonna
+-- be sending data to gh6035replica2 as a transit hop from gh6035master.
+test_run:cmd('switch gh6035replica2')
+ | ---
+ | - true
+ | ...
+box.cfg{replication = {unix_socket("gh6035replica1")}}
+ | ---
+ | ...
+
+--
+-- Make the gh6035master to be RAFT leader.
+test_run:cmd('switch gh6035master')
+ | ---
+ | - true
+ | ...
+box.cfg{                                    \
+    replication = {                         \
+        unix_socket("gh6035master"),        \
+        unix_socket("gh6035replica1"),      \
+    },                                      \
+    replication_synchro_quorum = 2,         \
+    election_mode = 'manual',               \
+}
+ | ---
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+_ = box.schema.space.create("sync", {is_sync = true})
+ | ---
+ | ...
+_ = box.space.sync:create_index("pk")
+ | ---
+ | ...
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+
+-- The first hop is gh6035replica1.
+test_run:cmd('switch gh6035replica1')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ | ...
+
+-- And the second hop is gh6035replica2 where
+-- gh6035replica1 replicated the data to us.
+test_run:cmd('switch gh6035replica2')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ | ...
+
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server gh6035master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server gh6035master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server gh6035replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server gh6035replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server gh6035replica2')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server gh6035replica2')
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-6035-applier-filter.test.lua b/test/replication/gh-6035-applier-filter.test.lua
new file mode 100644
index 000000000..280ae4bae
--- /dev/null
+++ b/test/replication/gh-6035-applier-filter.test.lua
@@ -0,0 +1,66 @@
+--
+-- gh-6035: verify synchronous rows filtration in applier,
+-- we need to be sure that filtering synchronous rows is
+-- done via transaction initiator not sender (iow via
+-- xrow->replica_id).
+--
+
+local SOCKET_DIR = require('fio').cwd()
+local function unix_socket(name)                \
+    return SOCKET_DIR .. "/" .. name .. '.sock' \
+end
+
+env = require('test_run')
+test_run = env.new()
+engine = test_run:get_cfg('engine')
+
+test_run:cmd('create server gh6035master with script="replication/gh6035master.lua"')
+test_run:cmd('start server gh6035master with args="gh6035master"')
+
+test_run:cmd('create server gh6035replica1 with script="replication/gh6035replica1.lua"')
+test_run:cmd('start server gh6035replica1 with args="gh6035replica1"')
+
+test_run:cmd('create server gh6035replica2 with script="replication/gh6035replica2.lua"')
+test_run:cmd('start server gh6035replica2 with args="gh6035replica2"')
+
+--
+-- The gh6035master node connected to gh6035replica1 and gh6035replica2,
+-- where each of gh6035replicaX connected to the gh6035master only.
+--
+-- Lets reroute gh6035replica2 to gh6035replica1 so gh6035replica1 gonna
+-- be sending data to gh6035replica2 as a transit hop from gh6035master.
+test_run:cmd('switch gh6035replica2')
+box.cfg{replication = {unix_socket("gh6035replica1")}}
+
+--
+-- Make the gh6035master to be RAFT leader.
+test_run:cmd('switch gh6035master')
+box.cfg{                                    \
+    replication = {                         \
+        unix_socket("gh6035master"),        \
+        unix_socket("gh6035replica1"),      \
+    },                                      \
+    replication_synchro_quorum = 2,         \
+    election_mode = 'manual',               \
+}
+box.ctl.promote()
+_ = box.schema.space.create("sync", {is_sync = true})
+_ = box.space.sync:create_index("pk")
+box.space.sync:insert{1}
+
+-- The first hop is gh6035replica1.
+test_run:cmd('switch gh6035replica1')
+box.space.sync:select{}
+
+-- And the second hop is gh6035replica2 where
+-- gh6035replica1 replicated the data to us.
+test_run:cmd('switch gh6035replica2')
+box.space.sync:select{}
+
+test_run:cmd('switch default')
+test_run:cmd('stop server gh6035master')
+test_run:cmd('delete server gh6035master')
+test_run:cmd('stop server gh6035replica1')
+test_run:cmd('delete server gh6035replica1')
+test_run:cmd('stop server gh6035replica2')
+test_run:cmd('delete server gh6035replica2')
diff --git a/test/replication/gh6035master.lua b/test/replication/gh6035master.lua
new file mode 120000
index 000000000..43b20c2cc
--- /dev/null
+++ b/test/replication/gh6035master.lua
@@ -0,0 +1 @@
+gh6035node.lua
\ No newline at end of file
diff --git a/test/replication/gh6035node.lua b/test/replication/gh6035node.lua
new file mode 100644
index 000000000..2fb1ce86f
--- /dev/null
+++ b/test/replication/gh6035node.lua
@@ -0,0 +1,35 @@
+local SOCKET_DIR = require('fio').cwd()
+local node_name = arg[1]
+
+function unix_socket(name)
+    return SOCKET_DIR .. "/" .. name .. '.sock';
+end
+
+require('console').listen(os.getenv('ADMIN'))
+
+if node_name == "gh6035master" then
+    box.cfg({
+        listen = unix_socket("gh6035master"),
+    })
+elseif node_name == "gh6035replica1" then
+    box.cfg({
+        listen = unix_socket("gh6035replica1"),
+        replication = {
+            unix_socket("gh6035master"),
+            unix_socket("gh6035replica1")
+        },
+        election_mode = 'voter'
+    })
+else
+    assert(node_name == "gh6035replica2")
+    box.cfg({
+        replication = {
+            unix_socket("gh6035master"),
+        },
+        election_mode = 'voter'
+    })
+end
+
+box.once("bootstrap", function()
+    box.schema.user.grant('guest', 'super')
+end)
diff --git a/test/replication/gh6035replica1.lua b/test/replication/gh6035replica1.lua
new file mode 120000
index 000000000..43b20c2cc
--- /dev/null
+++ b/test/replication/gh6035replica1.lua
@@ -0,0 +1 @@
+gh6035node.lua
\ No newline at end of file
diff --git a/test/replication/gh6035replica2.lua b/test/replication/gh6035replica2.lua
new file mode 120000
index 000000000..43b20c2cc
--- /dev/null
+++ b/test/replication/gh6035replica2.lua
@@ -0,0 +1 @@
+gh6035node.lua
\ No newline at end of file
-- 
2.31.1


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [Tarantool-patches] [PATCH v5] applier: filter incoming synchro packets via transaction initiator
  2021-06-08 18:08 [Tarantool-patches] [PATCH v5] applier: filter incoming synchro packets via transaction initiator Cyrill Gorcunov via Tarantool-patches
@ 2021-06-08 18:52 ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 0 replies; 2+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-08 18:52 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Hi! Good job on the patch!

See 7 comments below.

On 08.06.2021 20:08, Cyrill Gorcunov wrote:
> Currently we use synchro packets filtration based on their contents,
> in particular by their xrow->replica_id value. Still there was a
> question if we can optimize this moment and rather filter our all

1. our -> out.

> packets coming from non-leader replica.
> > diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result
> new file mode 100644
> index 000000000..cf032c332
> --- /dev/null
> +++ b/test/replication/gh-6035-applier-filter.result
> @@ -0,0 +1,147 @@
> +-- test-run result file version 2
> +--
> +-- gh-6035: verify synchronous rows filtration in applier,
> +-- we need to be sure that filtering synchronous rows is
> +-- done via transaction initiator not sender (iow via
> +-- xrow->replica_id).
> +--
> +
> +local SOCKET_DIR = require('fio').cwd()
> + | ---
> + | ...
> +local function unix_socket(name)                \
> +    return SOCKET_DIR .. "/" .. name .. '.sock' \
> +end

2. Both this function and SOCKET_DIR are not needed here. Because you
declared them as 'local', and they go out of scope on the next
line, and are deleted. In test-run's non-tap tests each line is a
visibility scope. Like in the console.

I deleted them from the test's code and the test passed.

It works below because unix_socket() is already declared as global in
your scripts calling box.cfg.

Because it is global, you get luacheck warnings:

	Checking test/replication/gh6035node.lua          6 warnings

	    test/replication/gh6035node.lua:4:10: (W111) setting non-standard global variable unix_socket
	    test/replication/gh6035node.lua:12:18: (W113) accessing undefined variable unix_socket
	    test/replication/gh6035node.lua:16:18: (W113) accessing undefined variable unix_socket
	    test/replication/gh6035node.lua:18:13: (W113) accessing undefined variable unix_socket
	    test/replication/gh6035node.lua:19:13: (W113) accessing undefined variable unix_socket
	    test/replication/gh6035node.lua:27:13: (W113) accessing undefined variable unix_socket

Maybe you could make the test not change box.cfg.replication?
You could build the master <-> replica1 -> replica2 topology
from the beginning.

You start them as read-write, no elections. Replica1 would register
on master. Replica2 would register on replcia1.

Then you make master 'manual', 'replica1' voter, and 'replica2'
voter. Then you promote the master and do the main part of the test.
No box.cfg.replication was altered.

> + | ---
> + | ...
> +
> +env = require('test_run')
> + | ---
> + | ...
> +test_run = env.new()

3. You don't need the env. You can do

	test_run = require('test_run').new()

> + | ---
> + | ...
> +engine = test_run:get_cfg('engine')

4. You don't use the engine. Please, drop it, and make the test
run only once, without 2 engines.

> + | ---
> + | ...
> +
> +test_run:cmd('create server gh6035master with script="replication/gh6035master.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server gh6035master with args="gh6035master"')

5. Filename is already passed as arg[0] into the script. Look at
autobootstrap.lua files to see how they extract the instance name
from the file name.

> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('create server gh6035replica1 with script="replication/gh6035replica1.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server gh6035replica1 with args="gh6035replica1"')
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('create server gh6035replica2 with script="replication/gh6035replica2.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server gh6035replica2 with args="gh6035replica2"')
> + | ---
> + | - true
> + | ...
> +
> +--
> +-- The gh6035master node connected to gh6035replica1 and gh6035replica2,
> +-- where each of gh6035replicaX connected to the gh6035master only.
> +--
> +-- Lets reroute gh6035replica2 to gh6035replica1 so gh6035replica1 gonna
> +-- be sending data to gh6035replica2 as a transit hop from gh6035master.
> +test_run:cmd('switch gh6035replica2')

6. There is a shortcut 'test_run:switch()'.

I also propose to drop gh6035 prefix from the instance names (but keep
it in the script file names). It is hard to read otherwise.

Another proposal: change gh6035 to gh-6035 in the file names. Otherwise
they are too far from the test itself in the file explorer.

> + | ---
> + | - true
> + | ...
> +box.cfg{replication = {unix_socket("gh6035replica1")}}
> + | ---
> + | ...
> +
> +--
> +-- Make the gh6035master to be RAFT leader.
> +test_run:cmd('switch gh6035master')
> + | ---
> + | - true
> + | ...
> +box.cfg{                                    \
> +    replication = {                         \
> +        unix_socket("gh6035master"),        \
> +        unix_socket("gh6035replica1"),      \
> +    },                                      \
> +    replication_synchro_quorum = 2,         \
> +    election_mode = 'manual',               \
> +}
> + | ---
> + | ...
> +box.ctl.promote()
> + | ---
> + | ...
> +_ = box.schema.space.create("sync", {is_sync = true})
> + | ---
> + | ...
> +_ = box.space.sync:create_index("pk")
> + | ---
> + | ...
> +box.space.sync:insert{1}
> + | ---
> + | - [1]
> + | ...
> +
> +-- The first hop is gh6035replica1.
> +test_run:cmd('switch gh6035replica1')
> + | ---
> + | - true
> + | ...
> +box.space.sync:select{}
> + | ---
> + | - - [1]
> + | ...
> +
> +-- And the second hop is gh6035replica2 where
> +-- gh6035replica1 replicated the data to us.
> +test_run:cmd('switch gh6035replica2')
> + | ---
> + | - true
> + | ...
> +box.space.sync:select{}
> + | ---
> + | - - [1]

7. You can't be sure it is replicated to replica2 yet. If
replica1's disk is slow, it might need more time to write
it to WAL. You need to use test_run:wait_lsn() to be sure
that replica2 got the transaction. Wait_lsn() must make
replica2 wait for all data from the master node, not from
replica2. Because replica2 didn't write anything own.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-06-08 18:52 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-08 18:08 [Tarantool-patches] [PATCH v5] applier: filter incoming synchro packets via transaction initiator Cyrill Gorcunov via Tarantool-patches
2021-06-08 18:52 ` Vladislav Shpilevoy via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox