From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Sergey Bronnikov <sergeyb@tarantool.org>,
	tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 3/4] replication: add test with random leaders promotion and demotion
Date: Wed, 25 Nov 2020 23:02:23 +0100	[thread overview]
Message-ID: <966e3ab6-036f-bb70-d35a-ab9a37d6ed28@tarantool.org> (raw)
In-Reply-To: <7164d167-4386-4ea7-ae33-e19ca1f8b326@tarantool.org>

Hi! Thanks for the patch!

See 3 comments below.

> diff --git a/test/replication/qsync_random_leader.result b/test/replication/qsync_random_leader.result
> new file mode 100644
> index 000000000..2d242e802
> --- /dev/null
> +++ b/test/replication/qsync_random_leader.result
> @@ -0,0 +1,137 @@
> +-- test-run result file version 2
> +os = require('os')
> + | ---
> + | ...
> +env = require('test_run')
> + | ---
> + | ...
> +math = require('math')
> + | ---
> + | ...
> +fiber = require('fiber')
> + | ---
> + | ...
> +test_run = env.new()
> + | ---
> + | ...
> +netbox = require('net.box')
> + | ---
> + | ...
> +
> +NUM_INSTANCES = 5
> + | ---
> + | ...
> +SERVERS = {}
> + | ---
> + | ...
> +for i=1,NUM_INSTANCES do                                                       \

1. Here are some more whitespaces: '                              ',
'               ', '               ', '               '. Should be
enough to align the trailing backslashes in this loop and the loops
below.

> +    SERVERS[i] = 'qsync' .. i                                                  \
> +end
> + | ---
> + | ...
> +SERVERS -- print instance names
> + | ---
> + | - - qsync1
> + |   - qsync2
> + |   - qsync3
> + |   - qsync4
> + |   - qsync5
> + | ...
> +
> +math.randomseed(os.time())
> + | ---
> + | ...
> +function random(excluded_num, total)                                           \
> +    local r = math.random(1, total)                                            \
> +    if (r == excluded_num) then                                                \

2. In Lua we never wrap an if condition in parentheses.
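I.e. simply:

    if r == excluded_num then
        return random(excluded_num, total)
    end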

> +        return random(excluded_num, total)                                     \
> +    end                                                                        \
> +    return r                                                                   \
> +end
> + | ---
> + | ...
> +
> +-- Set 'broken' quorum on current leader.
> +-- Write value on current leader.
> +-- Pick a random replica in a cluster.
> +-- Set 'good' quorum on it and promote to a leader.
> +-- Make sure value is there and on an old leader.
> +
> +-- Testcase setup.
> +test_run:create_cluster(SERVERS)
> + | ---
> + | ...
> +test_run:wait_fullmesh(SERVERS)
> + | ---
> + | ...
> +test_run:switch('qsync1')
> + | ---
> + | - true
> + | ...
> +_ = box.schema.space.create('sync', {is_sync=true, engine = test_run:get_cfg('engine')})
> + | ---
> + | ...
> +_ = box.space.sync:create_index('primary')
> + | ---
> + | ...
> +box.schema.user.grant('guest', 'write', 'space', 'sync')
> + | ---
> + | ...
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +current_leader_id = 1
> + | ---
> + | ...
> +test_run:eval(SERVERS[current_leader_id], "box.ctl.clear_synchro_queue()")
> + | ---
> + | - []
> + | ...
> +
> +SOCKET_DIR = require('fio').cwd()
> + | ---
> + | ...
> +
> +-- Testcase body.
> +for i=1,30 do                                                                  \
> +    test_run:eval(SERVERS[current_leader_id],                                  \
> +        "box.cfg{replication_synchro_quorum=6, replication_synchro_timeout=1000}") \
> +    c = netbox.connect(SOCKET_DIR..'/'..SERVERS[current_leader_id]..'.sock')   \
> +    fiber.create(function() c.space.sync:insert{i} end)                        \
> +    new_leader_id = random(current_leader_id, #SERVERS)                        \
> +    test_run:eval(SERVERS[new_leader_id],                                      \
> +        "box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=0.01}") \
> +    test_run:eval(SERVERS[new_leader_id], "box.ctl.clear_synchro_queue()")     \

3. There is a little issue which, I suspect, makes this code test not what
it is supposed to test. When you do c.space.sync:insert{i} in a new fiber,
you don't wait until the insertion is replicated to the new leader.

So it can happen that when you call box.ctl.clear_synchro_queue() on the
new leader, the queue there is empty or contains only old records from
previous iterations, not the value from the current iteration.

If the test is supposed to commit the old leader's record on the new
leader in the same iteration, then it is likely working by luck.

I suggest you turn on the MVCC engine to disable dirty reads, so that you
don't mistake a dirty read for a committed read. And rely entirely on LSNs
to wait for replication: we have the wait_lsn and get_lsn test_run methods
for that. In fact, it is always better to rely on LSNs, because dirty
reads may be turned off by default someday, and then such tests would
have to be rewritten.
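
To illustrate what I mean, roughly (only a sketch, not tested; the
old_leader/new_leader variables are just for readability here, and the
exact wait_lsn()/get_lsn() signatures and the MVCC option name are from
memory, so double-check them):

    -- In the instance files enable MVCC, so a pending (not yet confirmed)
    -- row can not be read as if it was committed. As far as I remember
    -- the option is:
    --     box.cfg{memtx_use_mvcc_engine = true}

    -- In the loop body, instead of hoping replication is fast enough:
    old_leader = SERVERS[current_leader_id]
    test_run:eval(old_leader, "box.cfg{replication_synchro_quorum=6, replication_synchro_timeout=1000}")
    c = netbox.connect(SOCKET_DIR..'/'..old_leader..'.sock')
    fiber.create(function() c.space.sync:insert{i} end)

    new_leader_id = random(current_leader_id, #SERVERS)
    new_leader = SERVERS[new_leader_id]

    -- Wait until the new leader has received everything the old leader
    -- wrote, so clear_synchro_queue() sees the row from this iteration
    -- and not an empty queue or leftovers from previous iterations.
    test_run:wait_lsn(new_leader, old_leader)

    test_run:eval(new_leader, "box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=0.01}")
    test_run:eval(new_leader, "box.ctl.clear_synchro_queue()")

Note that since the insert runs in a background fiber, wait_lsn() might
sample the old leader's LSN before the row reaches its WAL. A stricter
variant would remember the LSN via get_lsn() before the insert and wait
until the new leader goes past it.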

> +    c:close()                                                                  \
> +    replica = random(new_leader_id, #SERVERS)                                  \
> +    test_run:wait_cond(function() return test_run:eval(SERVERS[replica],       \
> +                       string.format("box.space.sync:get{%d}", i))[1] ~= nil end)  \
> +    test_run:wait_cond(function() return test_run:eval(SERVERS[current_leader_id], \
> +                       string.format("box.space.sync:get{%d}", i))[1] ~= nil end)  \
> +    new_leader_id = random(current_leader_id, #SERVERS)                        \
> +    current_leader_id = new_leader_id                                          \
> +end
> + | ---
> + | ...
> +
> +for i=1,NUM_INSTANCES do                                                       \
> +    instance = string.format('qsync%d', i)                                     \
> +    test_run:wait_cond(function() return test_run:eval(instance,               \
> +                       "box.space.sync:count()")[1] == 30 end)                 \
> +end

