Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: sergeyb@tarantool.org, tarantool-patches@dev.tarantool.org,
	sergepetrenko@tarantool.org, gorcunov@gmail.com,
	lvasiliev@tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 3/3] replication: add test with random leaders promotion and demotion
Date: Tue, 21 Jul 2020 00:01:05 +0200	[thread overview]
Message-ID: <3c7b2274-8443-2be7-c181-2c7026ab0fec@tarantool.org> (raw)
In-Reply-To: <b6a0c7b42d71fd263c51d44c96748587092dda1e.1594314820.git.sergeyb@tarantool.org>

Thanks for the patch!

See 11 comments below.

> diff --git a/test/replication/qsync.lua b/test/replication/qsync.lua
> new file mode 100644
> index 000000000..383aa5272
> --- /dev/null
> +++ b/test/replication/qsync.lua
> @@ -0,0 +1,62 @@
> +#!/usr/bin/env tarantool
> +
> +-- get instance name from filename (qsync1.lua => qsync1)
> +local INSTANCE_ID = string.match(arg[0], "%d")
> +
> +local SOCKET_DIR = require('fio').cwd()
> +
> +local TIMEOUT = tonumber(arg[1])
> +
> +local function instance_uri(instance_id)
> +    return SOCKET_DIR..'/qsync'..instance_id..'.sock';
> +end
> +
> +-- start console first
> +require('console').listen(os.getenv('ADMIN'))
> +
> +box.cfg({
> +    listen = instance_uri(INSTANCE_ID);
> +    replication_timeout = TIMEOUT;

1. Why do you need the custom replication_timeout?

> +    replication_sync_lag = 0.01;

2. Why do you need the lag setting?

> +    replication_connect_quorum = 3;
> +    replication = {
> +        instance_uri(1);
> +        instance_uri(2);
> +        instance_uri(3);
> +        instance_uri(4);
> +        instance_uri(5);
> +        instance_uri(6);
> +        instance_uri(7);
> +        instance_uri(8);
> +        instance_uri(9);
> +        instance_uri(10);
> +        instance_uri(11);
> +        instance_uri(12);
> +        instance_uri(13);
> +        instance_uri(14);
> +        instance_uri(15);
> +        instance_uri(16);
> +        instance_uri(17);
> +        instance_uri(18);
> +        instance_uri(19);
> +        instance_uri(20);
> +        instance_uri(21);
> +        instance_uri(22);
> +        instance_uri(23);
> +        instance_uri(24);
> +        instance_uri(25);
> +        instance_uri(26);
> +        instance_uri(27);
> +        instance_uri(28);
> +        instance_uri(29);
> +        instance_uri(30);
> +        instance_uri(31);

3. Seems like in the test you use only 3 instances, not 32. Also the
quorum is set to 3.

> +    };
> +})
> +
> +box.once("bootstrap", function()
> +    local test_run = require('test_run').new()
> +    box.schema.user.grant("guest", 'replication')
> +    box.schema.space.create('test', {engine = test_run:get_cfg('engine')})
> +    box.space.test:create_index('primary')

4. Where do you use this space?

> +end)
> diff --git a/test/replication/qsync_random_leader.result b/test/replication/qsync_random_leader.result
> new file mode 100644
> index 000000000..cb1b5e232
> --- /dev/null
> +++ b/test/replication/qsync_random_leader.result
> @@ -0,0 +1,123 @@
> +-- test-run result file version 2
> +os = require('os')
> + | ---
> + | ...
> +env = require('test_run')
> + | ---
> + | ...
> +math = require('math')
> + | ---
> + | ...
> +fiber = require('fiber')
> + | ---
> + | ...
> +test_run = env.new()
> + | ---
> + | ...
> +engine = test_run:get_cfg('engine')
> + | ---
> + | ...
> +
> +NUM_INSTANCES = 3
> + | ---
> + | ...
> +BROKEN_QUORUM = NUM_INSTANCES + 1
> + | ---
> + | ...
> +
> +SERVERS = {}
> + | ---
> + | ...
> +test_run:cmd("setopt delimiter ';'")
> + | ---
> + | - true
> + | ...
> +for i=1,NUM_INSTANCES do
> +    SERVERS[i] = 'qsync' .. i
> +end;
> + | ---
> + | ...
> +test_run:cmd("setopt delimiter ''");

5. Please, lets be consistent and use either \ or the delimiter. Currently
it is irrational - you use \ for big code blocks, and a custom delimiter for
tiny blocks which could even be one line. Personally, I would use \
everywhere.

> + | ---
> + | - true
> + | ...
> +SERVERS -- print instance names
> + | ---
> + | - - qsync1
> + |   - qsync2
> + |   - qsync3
> + | ...
> +
> +random = function(excluded_num, min, max)       \

6. Would be better to align all \ by 80 in this file. Makes easier to add
new longer lines in future without moving all the old \.

> +    math.randomseed(os.time())                  \
> +    local r = math.random(min, max)             \
> +    if (r == excluded_num) then                 \
> +        return random(excluded_num, min, max)   \
> +    end                                         \
> +    return r                                    \
> +end
> + | ---
> + | ...
> +
> +test_run:create_cluster(SERVERS, "replication", {args="0.1"})
> + | ---
> + | ...
> +test_run:wait_fullmesh(SERVERS)
> + | ---
> + | ...
> +current_leader_id = 1
> + | ---
> + | ...
> +test_run:switch(SERVERS[current_leader_id])
> + | ---
> + | - true
> + | ...
> +box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=0.1}

7. The timeout is tiny. It will lead to flakiness sooner or later, 100%.

> + | ---
> + | ...
> +_ = box.schema.space.create('sync', {is_sync=true})
> + | ---
> + | ...
> +_ = box.space.sync:create_index('pk')
> + | ---
> + | ...
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +
> +-- Testcase body.
> +for i=1,10 do                                                 \
> +    new_leader_id = random(current_leader_id, 1, #SERVERS)    \
> +    test_run:switch(SERVERS[new_leader_id])                   \
> +    box.cfg{read_only=false}                                  \
> +    fiber = require('fiber')                                  \
> +    f1 = fiber.create(function() box.space.sync:delete{} end) \

8. Delete without a key will fail. You would notice it if you
would check results of the DML operations. Please, do that via pcall.

> +    f2 = fiber.create(function() for i=1,10000 do box.space.sync:insert{i} end end) \

9. You have \ exactly to avoid such long lines.

> +    f1.status()                                               \
> +    f2.status()                                               \

10. Output is not printed inside one statement. This whole cycle is
one statement because of \, so these status() calls are useless.

> +    test_run:switch('default')                                \
> +    test_run:switch(SERVERS[current_leader_id])               \
> +    box.cfg{read_only=true}                                   \
> +    test_run:switch('default')                                \
> +    current_leader_id = new_leader_id                         \
> +    fiber.sleep(0.1)                                          \

11. Why do you need this fiber.sleep()?

> +end
> + | ---
> + | ...
> +
> +-- Teardown.
> +test_run:switch(SERVERS[current_leader_id])
> + | ---
> + | - true
> + | ...
> +box.space.sync:drop()
> + | ---
> + | ...
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +test_run:drop_cluster(SERVERS)
> + | ---
> + | ...

  parent reply	other threads:[~2020-07-20 22:01 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1594314820.git.sergeyb@tarantool.org>
2020-07-09 17:16 ` [Tarantool-patches] [PATCH 1/3] replication: print number of txs in limbo before its clear sergeyb
2020-07-09 20:07   ` Vladislav Shpilevoy
2020-07-10 12:55     ` Sergey Bronnikov
2020-07-20 21:59   ` Vladislav Shpilevoy
2020-07-09 17:16 ` [Tarantool-patches] [PATCH 2/3] replication: test clear_synchro_queue function sergeyb
2020-07-20 22:00   ` Vladislav Shpilevoy
2020-08-25 12:49     ` Sergey Bronnikov
2020-08-26  7:31       ` Serge Petrenko
2020-08-26 14:48         ` Sergey Bronnikov
2020-07-09 17:16 ` [Tarantool-patches] [PATCH 3/3] replication: add test with random leaders promotion and demotion sergeyb
2020-07-09 20:07   ` Vladislav Shpilevoy
2020-07-10  8:05     ` Sergey Bronnikov
2020-07-20 22:01   ` Vladislav Shpilevoy [this message]
2020-08-26 14:45     ` Sergey Bronnikov
2020-08-27 10:49       ` Serge Petrenko
2020-08-31 10:05         ` Sergey Bronnikov
2020-09-01 13:03           ` Serge Petrenko
2020-11-12  9:32             ` Sergey Bronnikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3c7b2274-8443-2be7-c181-2c7026ab0fec@tarantool.org \
    --to=v.shpilevoy@tarantool.org \
    --cc=gorcunov@gmail.com \
    --cc=lvasiliev@tarantool.org \
    --cc=sergepetrenko@tarantool.org \
    --cc=sergeyb@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 3/3] replication: add test with random leaders promotion and demotion' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox