Tarantool development patches archive
From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Cyrill Gorcunov <gorcunov@gmail.com>,
	tml <tarantool-patches@dev.tarantool.org>
Cc: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v9 1/1] applier: filter incoming synchro packets via transaction initiator
Date: Tue, 15 Jun 2021 17:26:06 +0300
Message-ID: <c692d3a3-f430-8982-1d31-fc4670d59098@tarantool.org> (raw)
In-Reply-To: <20210615135630.63465-2-gorcunov@gmail.com>



15.06.2021 16:56, Cyrill Gorcunov wrote:
> Currently we filter synchro packets based on their contents,
> in particular on their xrow->replica_id value. Still, there was a
> question whether we could optimize this and instead filter out all
> packets coming from a non-leader replica.
>
> The Raft specification requires that only data from the current leader
> be applied to the local WAL, but it doesn't constrain the data
> transport, i.e. how exactly rows reach replicas. This implies that
> data may reach replicas indirectly, via transit hops. Thus we drop
> the applier->instance_id filtering and rely on xrow->replica_id
> matching instead.
>
> In the test (inspired by Serge Petrenko's test) we recreate a situation
> where replica2 obtains the master's data (the master being the raft
> leader) indirectly, via the replica1 node.
>
> Closes #6035
>
> Co-developed-by: Serge Petrenko <sergepetrenko@tarantool.org>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---

Thanks for the fixes!

Please, find a couple of finalization comments below.

>   src/box/applier.cc                            |  27 ++--
>   src/lib/raft/raft.h                           |   7 -
>   .../gh-5445-leader-inconsistency.result       |  15 ++
>   .../gh-5445-leader-inconsistency.test.lua     |   5 +
>   .../replication/gh-6035-applier-filter.result | 144 ++++++++++++++++++
>   .../gh-6035-applier-filter.test.lua           |  68 +++++++++
>   test/replication/gh-6035-master.lua           |   1 +
>   test/replication/gh-6035-node.lua             |  35 +++++
>   test/replication/gh-6035-replica1.lua         |   1 +
>   test/replication/gh-6035-replica2.lua         |   1 +
>   test/replication/suite.cfg                    |   3 +
>   11 files changed, 288 insertions(+), 19 deletions(-)
>   create mode 100644 test/replication/gh-6035-applier-filter.result
>   create mode 100644 test/replication/gh-6035-applier-filter.test.lua
>   create mode 120000 test/replication/gh-6035-master.lua
>   create mode 100644 test/replication/gh-6035-node.lua
>   create mode 120000 test/replication/gh-6035-replica1.lua
>   create mode 120000 test/replication/gh-6035-replica2.lua
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 33181fdbf..d3430f582 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -1027,21 +1027,24 @@ nopify:;
>    * Return 0 for success or -1 in case of an error.
>    */
>   static int
> -applier_apply_tx(struct applier *applier, struct stailq *rows)
> +applier_apply_tx(struct stailq *rows)
>   {
>   	/*
> -	 * Rows received not directly from a leader are ignored. That is a
> -	 * protection against the case when an old leader keeps sending data
> -	 * around not knowing yet that it is not a leader anymore.
> +	 * Initially we filtered out data if it came from an applier
> +	 * whose instance_id didn't match raft->leader, but that
> +	 * prevented us from obtaining valid leader data arriving via
> +	 * an intermediate node. For example, in a chain of replica
> +	 * hops
>   	 *
> -	 * XXX: it may be that this can be fine to apply leader transactions by
> -	 * looking at their replica_id field if it is equal to leader id. That
> -	 * can be investigated as an 'optimization'. Even though may not give
> -	 * anything, because won't change total number of rows sent in the
> -	 * network anyway.
> +	 *  master -> replica 1 -> replica 2
> +	 *
> +	 * where each replica carries the transaction initiator's id
> +	 * (the master's) in the xrow->replica_id field, so the
> +	 * master's data gets propagated indirectly.
> +	 *
> +	 * Thus we dropped such "sender" filtering and now filter only
> +	 * by the transaction "initiator" via xrow->replica_id.
>   	 */
> -	if (!raft_is_source_allowed(box_raft(), applier->instance_id))
> -		return 0;
>   	struct xrow_header *first_row = &stailq_first_entry(rows,
>   					struct applier_tx_row, next)->row;
>   	struct xrow_header *last_row;
> @@ -1312,7 +1315,7 @@ applier_subscribe(struct applier *applier)
>   					diag_raise();
>   			}
>   			applier_signal_ack(applier);
> -		} else if (applier_apply_tx(applier, &rows) != 0) {
> +		} else if (applier_apply_tx(&rows) != 0) {
>   			diag_raise();
>   		}
>   
> diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h
> index a8da564b0..fae30b03d 100644
> --- a/src/lib/raft/raft.h
> +++ b/src/lib/raft/raft.h
> @@ -236,13 +236,6 @@ raft_is_ro(const struct raft *raft)
>   	return raft->is_enabled && raft->state != RAFT_STATE_LEADER;
>   }
>   
> -/** See if the instance can accept rows from an instance with the given ID. */
> -static inline bool
> -raft_is_source_allowed(const struct raft *raft, uint32_t source_id)
> -{
> -	return !raft->is_enabled || raft->leader == source_id;
> -}
> -
>   /** Check if Raft is enabled. */
>   static inline bool
>   raft_is_enabled(const struct raft *raft)
> diff --git a/test/replication/gh-5445-leader-inconsistency.result b/test/replication/gh-5445-leader-inconsistency.result
> index 5c6169f50..8b9a4051a 100644
> --- a/test/replication/gh-5445-leader-inconsistency.result
> +++ b/test/replication/gh-5445-leader-inconsistency.result
> @@ -178,6 +178,14 @@ test_run:cmd('stop server '..leader)
>   is_possible_leader[leader_nr] = false
>    | ---
>    | ...
> +-- And other node as well.
> +test_run:cmd('stop server '..other)
> + | ---
> + | - true
> + | ...
> +is_possible_leader[other_nr] = false
> + | ---
> + | ...
>   
>   -- Emulate a situation when next_leader wins the elections. It can't do that in
>   -- this configuration, obviously, because it's behind the 'other' node, so set
> @@ -195,6 +203,13 @@ assert(get_leader(is_possible_leader) == next_leader_nr)
>    | ---
>    | - true
>    | ...
> +test_run:cmd('start server '..other..' with args="1 0.4 voter 2"')
> + | ---
> + | - true
> + | ...
> +is_possible_leader[other_nr] = true
> + | ---
> + | ...
>   test_run:switch(other)
>    | ---
>    | - true
> diff --git a/test/replication/gh-5445-leader-inconsistency.test.lua b/test/replication/gh-5445-leader-inconsistency.test.lua
> index e7952f5fa..b0b8baf36 100644
> --- a/test/replication/gh-5445-leader-inconsistency.test.lua
> +++ b/test/replication/gh-5445-leader-inconsistency.test.lua
> @@ -82,6 +82,9 @@ test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
>   test_run:switch('default')
>   test_run:cmd('stop server '..leader)
>   is_possible_leader[leader_nr] = false
> +-- And other node as well.
> +test_run:cmd('stop server '..other)
> +is_possible_leader[other_nr] = false
>   
>   -- Emulate a situation when next_leader wins the elections. It can't do that in
>   -- this configuration, obviously, because it's behind the 'other' node, so set
> @@ -93,6 +96,8 @@ is_possible_leader[leader_nr] = false
>   -- a situation when some rows from the old leader were not received).
>   test_run:cmd('start server '..next_leader..' with args="1 0.4 candidate 1"')
>   assert(get_leader(is_possible_leader) == next_leader_nr)
> +test_run:cmd('start server '..other..' with args="1 0.4 voter 2"')
> +is_possible_leader[other_nr] = true
>   test_run:switch(other)
>   -- New leader didn't know about the unconfirmed rows but still rolled them back.
>   test_run:wait_cond(function() return box.space.test:get{2} == nil end)
> diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result
> new file mode 100644
> index 000000000..2620e7b6f
> --- /dev/null
> +++ b/test/replication/gh-6035-applier-filter.result
> @@ -0,0 +1,144 @@
> +-- test-run result file version 2
> +--
> +-- gh-6035: verify synchronous row filtering in the applier:
> +-- we need to be sure that synchronous rows are filtered by
> +-- the transaction initiator, not the sender (i.e. via
> +-- xrow->replica_id).
> +--
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +--
> +-- Prepare a scheme with a transit node
> +--
> +--  master <=> replica1 => replica2
> +--
> +-- such that a transaction initiated on the master node is
> +-- replicated to replica2 via the intermediate replica1 node.
> +--
> +
> +test_run:cmd('create server master with script="replication/gh-6035-master.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"')
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('start server master')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica2')
> + | ---
> + | - true
> + | ...
> +
> +test_run:switch('replica2')
> + | ---
> + | - true
> + | ...
> +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}}
> + | ---
> + | ...
> +
> +--
> +-- Make the master the Raft leader; this drops the connection
> +-- to replica2.
> +test_run:switch('master')
> + | ---
> + | - true
> + | ...
> +box.cfg({                                               \
> +    replication = {                                     \
> +            require('fio').cwd() .. "/master.sock",     \
> +            require('fio').cwd() .. "/replica1.sock",   \
> +        },                                              \
> +    replication_synchro_quorum = 2,                     \
> +    election_mode = 'manual',                           \
> +})
> + | ---
> + | ...
> +
> +box.ctl.promote()
> + | ---
> + | ...
> +_ = box.schema.space.create("sync", {is_sync = true})
> + | ---
> + | ...
> +_ = box.space.sync:create_index("pk")
> + | ---
> + | ...
> +box.space.sync:insert{1}
> + | ---
> + | - [1]
> + | ...
> +
> +--
> +-- The first hop is replica1.
> +test_run:switch('replica1')
> + | ---
> + | - true
> + | ...
> +box.space.sync:select{}
> + | ---
> + | - - [1]
> + | ...

You need to wait for the space creation here, just like
you do below. Otherwise the test will be flaky.
Also, please see the comment regarding wait_lsn vs wait_cond
below.
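Something along these lines should do, assuming the same 100-second
timeout you use on replica2 (a sketch, not tested):

```lua
-- Wait for the master's schema (space and index) to replicate
-- before selecting, so the select doesn't race with DDL.
test_run:wait_cond(function()
    return box.space.sync ~= nil and box.space.sync.index.pk ~= nil
end, 100)
box.space.sync:select{}
```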

> +
> +--
> +-- And the second hop is replica2, to which
> +-- replica1 replicated the data.
> +test_run:switch('replica2')
> + | ---
> + | - true
> + | ...
> +test_run:wait_cond(function() return                    \
> +                   box.space.sync ~= nil and            \
> +                   box.space.sync:get{1} ~= nil and     \
> +                   box.space.sync:get{1}[1] == 1 end, 100)
> + | ---
> + | - true
> + | ...

I suggest you use wait_lsn('replica2', 'master') here
instead of this bulky wait_cond.
First of all, it takes a single line instead of four.

Secondly, you forgot to test `box.space.sync.index.pk ~= nil`, meaning
the test will still fail occasionally when index creation doesn't
replicate in time.
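I.e. something like this (a sketch; wait_lsn waits until replica2
catches up with the master's vclock component, which covers the space,
the index and the insert at once):

```lua
-- Wait until replica2 has applied everything the master wrote,
-- then the select can't race with the space/index creation.
test_run:wait_lsn('replica2', 'master')
box.space.sync:select{}
```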

> +box.space.sync:select{}
> + | ---
> + | - - [1]
> + | ...
> +
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server master')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server master')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server replica2')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica2')
> + | ---
> + | - true
> + | ...
> diff --git a/test/replication/gh-6035-applier-filter.test.lua b/test/replication/gh-6035-applier-filter.test.lua
> new file mode 100644
> index 000000000..9bfd91288
> --- /dev/null
> +++ b/test/replication/gh-6035-applier-filter.test.lua
> @@ -0,0 +1,68 @@
> +--
> +-- gh-6035: verify synchronous row filtering in the applier:
> +-- we need to be sure that synchronous rows are filtered by
> +-- the transaction initiator, not the sender (i.e. via
> +-- xrow->replica_id).
> +--
> +test_run = require('test_run').new()
> +
> +--
> +-- Prepare a scheme with a transit node
> +--
> +--  master <=> replica1 => replica2
> +--
> +-- such that a transaction initiated on the master node is
> +-- replicated to replica2 via the intermediate replica1 node.
> +--
> +
> +test_run:cmd('create server master with script="replication/gh-6035-master.lua"')
> +test_run:cmd('create server replica1 with script="replication/gh-6035-replica1.lua"')
> +test_run:cmd('create server replica2 with script="replication/gh-6035-replica2.lua"')
> +
> +test_run:cmd('start server master')
> +test_run:cmd('start server replica1')
> +test_run:cmd('start server replica2')
> +
> +test_run:switch('replica2')
> +box.cfg{replication = {require('fio').cwd() .. "/replica1.sock"}}
> +
> +--
> +-- Make the master the Raft leader; this drops the connection
> +-- to replica2.
> +test_run:switch('master')
> +box.cfg({                                               \
> +    replication = {                                     \
> +            require('fio').cwd() .. "/master.sock",     \
> +            require('fio').cwd() .. "/replica1.sock",   \
> +        },                                              \
> +    replication_synchro_quorum = 2,                     \
> +    election_mode = 'manual',                           \
> +})
> +
> +box.ctl.promote()
> +_ = box.schema.space.create("sync", {is_sync = true})
> +_ = box.space.sync:create_index("pk")
> +box.space.sync:insert{1}
> +
> +--
> +-- The first hop is replica1.
> +test_run:switch('replica1')
> +box.space.sync:select{}
> +
> +--
> +-- And the second hop is replica2, to which
> +-- replica1 replicated the data.
> +test_run:switch('replica2')
> +test_run:wait_cond(function() return                    \
> +                   box.space.sync ~= nil and            \
> +                   box.space.sync:get{1} ~= nil and     \
> +                   box.space.sync:get{1}[1] == 1 end, 100)
> +box.space.sync:select{}
> +
> +test_run:switch('default')
> +test_run:cmd('stop server master')
> +test_run:cmd('delete server master')
> +test_run:cmd('stop server replica1')
> +test_run:cmd('delete server replica1')
> +test_run:cmd('stop server replica2')
> +test_run:cmd('delete server replica2')
> diff --git a/test/replication/gh-6035-master.lua b/test/replication/gh-6035-master.lua
> new file mode 120000
> index 000000000..f7ede7ef2
> --- /dev/null
> +++ b/test/replication/gh-6035-master.lua
> @@ -0,0 +1 @@
> +gh-6035-node.lua
> \ No newline at end of file
> diff --git a/test/replication/gh-6035-node.lua b/test/replication/gh-6035-node.lua
> new file mode 100644
> index 000000000..e3819471a
> --- /dev/null
> +++ b/test/replication/gh-6035-node.lua
> @@ -0,0 +1,35 @@
> +local SOCKET_DIR = require('fio').cwd()
> +local INSTANCE_ID = string.match(arg[0], "gh%-6035%-(.+)%.lua")
> +
> +local function unix_socket(name)
> +    return SOCKET_DIR .. "/" .. name .. '.sock';
> +end
> +
> +require('console').listen(os.getenv('ADMIN'))
> +
> +if INSTANCE_ID == "master" then
> +    box.cfg({
> +        listen = unix_socket("master"),
> +    })
> +elseif INSTANCE_ID == "replica1" then
> +    box.cfg({
> +        listen = unix_socket("replica1"),
> +        replication = {
> +            unix_socket("master"),
> +            unix_socket("replica1")
> +        },
> +        election_mode = 'voter'
> +    })
> +else
> +    assert(INSTANCE_ID == "replica2")
> +    box.cfg({
> +        replication = {
> +            unix_socket("master"),
> +        },
> +        election_mode = 'voter'
> +    })
> +end
> +
> +box.once("bootstrap", function()
> +    box.schema.user.grant('guest', 'super')
> +end)
> diff --git a/test/replication/gh-6035-replica1.lua b/test/replication/gh-6035-replica1.lua
> new file mode 120000
> index 000000000..f7ede7ef2
> --- /dev/null
> +++ b/test/replication/gh-6035-replica1.lua
> @@ -0,0 +1 @@
> +gh-6035-node.lua
> \ No newline at end of file
> diff --git a/test/replication/gh-6035-replica2.lua b/test/replication/gh-6035-replica2.lua
> new file mode 120000
> index 000000000..f7ede7ef2
> --- /dev/null
> +++ b/test/replication/gh-6035-replica2.lua
> @@ -0,0 +1 @@
> +gh-6035-node.lua
> \ No newline at end of file
> diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
> index 27eab20c2..55ec022ff 100644
> --- a/test/replication/suite.cfg
> +++ b/test/replication/suite.cfg
> @@ -47,6 +47,9 @@
>       "gh-6032-promote-wal-write.test.lua": {},
>       "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
>       "gh-6094-rs-uuid-mismatch.test.lua": {},
> +    "gh-6035-applier-filter.test.lua": {
> +        "memtx": {"engine": "memtx"}
> +    },
>       "*": {
>           "memtx": {"engine": "memtx"},
>           "vinyl": {"engine": "vinyl"}

-- 
Serge Petrenko

