From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Cyrill Gorcunov <gorcunov@gmail.com>, tml <tarantool-patches@dev.tarantool.org> Subject: Re: [Tarantool-patches] [PATCH v5] applier: filter incoming synchro packets via transaction initiator Date: Tue, 8 Jun 2021 20:52:46 +0200 [thread overview] Message-ID: <55c38f4a-7c07-4b15-f990-1d2965aedab6@tarantool.org> (raw) In-Reply-To: <20210608180833.211678-1-gorcunov@gmail.com> Hi! Good job on the patch! See 7 comments below. On 08.06.2021 20:08, Cyrill Gorcunov wrote: > Currently we use synchro packets filtration based on their contents, > in particular by their xrow->replica_id value. Still there was a > question if we can optimize this moment and rather filter our all 1. our -> out. > packets coming from non-leader replica. > > diff --git a/test/replication/gh-6035-applier-filter.result b/test/replication/gh-6035-applier-filter.result > new file mode 100644 > index 000000000..cf032c332 > --- /dev/null > +++ b/test/replication/gh-6035-applier-filter.result > @@ -0,0 +1,147 @@ > +-- test-run result file version 2 > +-- > +-- gh-6035: verify synchronous rows filtration in applier, > +-- we need to be sure that filtering synchronous rows is > +-- done via transaction initiator not sender (iow via > +-- xrow->replica_id). > +-- > + > +local SOCKET_DIR = require('fio').cwd() > + | --- > + | ... > +local function unix_socket(name) \ > + return SOCKET_DIR .. "/" .. name .. '.sock' \ > +end 2. Both this function and SOCKET_DIR are not needed here. Because you declared them as 'local', and they go out of scope on the next line, and are deleted. In test-run's non-tap tests each line is a visibility scope. Like in the console. I deleted them from the test's code and the test passed. It works below because unix_socket() is already declared as global in your scripts calling box.cfg. Because it is global, you get luacheck warnings: Checking test/replication/gh6035node.lua 6 warnings test/replication/gh6035node.lua:4:10: (W111) setting non-standard global variable unix_socket test/replication/gh6035node.lua:12:18: (W113) accessing undefined variable unix_socket test/replication/gh6035node.lua:16:18: (W113) accessing undefined variable unix_socket test/replication/gh6035node.lua:18:13: (W113) accessing undefined variable unix_socket test/replication/gh6035node.lua:19:13: (W113) accessing undefined variable unix_socket test/replication/gh6035node.lua:27:13: (W113) accessing undefined variable unix_socket Maybe you could make the test not change box.cfg.replication? You could build the master <-> replica1 -> replica2 topology from the beginning. You start them as read-write, no elections. Replica1 would register on master. Replica2 would register on replcia1. Then you make master 'manual', 'replica1' voter, and 'replica2' voter. Then you promote the master and do the main part of the test. No box.cfg.replication was altered. > + | --- > + | ... > + > +env = require('test_run') > + | --- > + | ... > +test_run = env.new() 3. You don't need the env. You can do test_run = require('test_run').new() > + | --- > + | ... > +engine = test_run:get_cfg('engine') 4. You don't use the engine. Please, drop it, and make the test run only once, without 2 engines. > + | --- > + | ... > + > +test_run:cmd('create server gh6035master with script="replication/gh6035master.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('start server gh6035master with args="gh6035master"') 5. Filename is already passed as arg[0] into the script. Look at autobootstrap.lua files to see how they extract the instance name from the file name. > + | --- > + | - true > + | ... > + > +test_run:cmd('create server gh6035replica1 with script="replication/gh6035replica1.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('start server gh6035replica1 with args="gh6035replica1"') > + | --- > + | - true > + | ... > + > +test_run:cmd('create server gh6035replica2 with script="replication/gh6035replica2.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('start server gh6035replica2 with args="gh6035replica2"') > + | --- > + | - true > + | ... > + > +-- > +-- The gh6035master node connected to gh6035replica1 and gh6035replica2, > +-- where each of gh6035replicaX connected to the gh6035master only. > +-- > +-- Lets reroute gh6035replica2 to gh6035replica1 so gh6035replica1 gonna > +-- be sending data to gh6035replica2 as a transit hop from gh6035master. > +test_run:cmd('switch gh6035replica2') 6. There is a shortcut 'test_run:switch()'. I also propose to drop gh6035 prefix from the instance names (but keep it in the script file names). It is hard to read otherwise. Another proposal: change gh6035 to gh-6035 in the file names. Otherwise they are too far from the test itself in the file explorer. > + | --- > + | - true > + | ... > +box.cfg{replication = {unix_socket("gh6035replica1")}} > + | --- > + | ... > + > +-- > +-- Make the gh6035master to be RAFT leader. > +test_run:cmd('switch gh6035master') > + | --- > + | - true > + | ... > +box.cfg{ \ > + replication = { \ > + unix_socket("gh6035master"), \ > + unix_socket("gh6035replica1"), \ > + }, \ > + replication_synchro_quorum = 2, \ > + election_mode = 'manual', \ > +} > + | --- > + | ... > +box.ctl.promote() > + | --- > + | ... > +_ = box.schema.space.create("sync", {is_sync = true}) > + | --- > + | ... > +_ = box.space.sync:create_index("pk") > + | --- > + | ... > +box.space.sync:insert{1} > + | --- > + | - [1] > + | ... > + > +-- The first hop is gh6035replica1. > +test_run:cmd('switch gh6035replica1') > + | --- > + | - true > + | ... > +box.space.sync:select{} > + | --- > + | - - [1] > + | ... > + > +-- And the second hop is gh6035replica2 where > +-- gh6035replica1 replicated the data to us. > +test_run:cmd('switch gh6035replica2') > + | --- > + | - true > + | ... > +box.space.sync:select{} > + | --- > + | - - [1] 7. You can't be sure it is replicated to replica2 yet. If replica1's disk is slow, it might need more time to write it to WAL. You need to use test_run:wait_lsn() to be sure that replica2 got the transaction. Wait_lsn() must make replica2 wait for all data from the master node, not from replica2. Because replica2 didn't write anything own.
prev parent reply other threads:[~2021-06-08 18:52 UTC|newest] Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-06-08 18:08 Cyrill Gorcunov via Tarantool-patches 2021-06-08 18:52 ` Vladislav Shpilevoy via Tarantool-patches [this message]
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=55c38f4a-7c07-4b15-f990-1d2965aedab6@tarantool.org \ --to=tarantool-patches@dev.tarantool.org \ --cc=gorcunov@gmail.com \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH v5] applier: filter incoming synchro packets via transaction initiator' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox