Tarantool development patches archive
From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Yan Shtunder <ya.shtunder@gmail.com>,
	tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum
Date: Mon, 25 Oct 2021 16:32:01 +0300
Message-ID: <665ef444-80de-4848-f9f0-a3ccc6e7c059@tarantool.org>
In-Reply-To: <20211025095223.22521-1-ya.shtunder@gmail.com>



On 25.10.2021 12:52, Yan Shtunder via Tarantool-patches wrote:

Hi! Good job on porting the test to the current luatest version!
Please find a couple of comments below.

> Transactions have to committed after they reaches quorum of "real"

Nit: better to say "Transactions should be committed".
Also, reaches -> reach.

> cluster members. Therefore, anonymous replicas don't have to
> participate in the quorum.
>
> Closes #5418
> ---
> Issue: https://github.com/tarantool/tarantool/issues/5418
> Patch: https://github.com/tarantool/tarantool/tree/yshtunder/gh-5418-qsync-with-anon-replicas
>
>   src/box/relay.cc                          |  3 +-
>   test/replication-luatest/gh_5418_test.lua | 82 +++++++++++++++++++++++
>   2 files changed, 84 insertions(+), 1 deletion(-)
>   create mode 100644 test/replication-luatest/gh_5418_test.lua
>
> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index f5852df7b..cf569e8e2 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -543,6 +543,7 @@ tx_status_update(struct cmsg *msg)
>   	struct replication_ack ack;
>   	ack.source = status->relay->replica->id;
>   	ack.vclock = &status->vclock;
> +	bool anon = status->relay->replica->anon;
>   	/*
>   	 * Let pending synchronous transactions know, which of
>   	 * them were successfully sent to the replica. Acks are
> @@ -550,7 +551,7 @@ tx_status_update(struct cmsg *msg)
>   	 * the single master in 100% so far). Other instances wait
>   	 * for master's CONFIRM message instead.
>   	 */
> -	if (txn_limbo.owner_id == instance_id) {
> +	if (txn_limbo.owner_id == instance_id && !anon) {
>   		txn_limbo_ack(&txn_limbo, ack.source,
>   			      vclock_get(ack.vclock, instance_id));
>   	}
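
For anyone following along, the effect of the `!anon` check can be modelled
in isolation. The sketch below is a toy, not Tarantool code: `ToyLimbo`,
`on_replica_ack` and their fields are hypothetical names, and the real
txn_limbo_ack() takes a replica id and an LSN rather than tracking a plain
set of ids. It also assumes that the master's own write counts toward the
quorum. It only illustrates that an ack from an anonymous replica never
moves a transaction closer to the quorum.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Toy model (NOT Tarantool's txn_limbo): one pending synchronous
// transaction plus the set of instance ids that have acked it.
struct ToyLimbo {
    int quorum;                    // plays the role of replication_synchro_quorum
    std::set<uint32_t> acked_ids;  // ids of instances that confirmed the write

    explicit ToyLimbo(int q) : quorum(q) {
        assert(q > 0);  // a quorum of zero would confirm everything
    }

    // Record an ack; repeated acks from the same id are counted once.
    void ack(uint32_t id) { acked_ids.insert(id); }

    // The transaction is confirmed once enough distinct instances acked.
    bool is_confirmed() const {
        return static_cast<int>(acked_ids.size()) >= quorum;
    }
};

// Hypothetical stand-in for the condition added to tx_status_update():
// an ack coming from an anonymous replica is dropped before it can
// reach the limbo, so it never contributes to the quorum.
inline void on_replica_ack(ToyLimbo &limbo, uint32_t replica_id, bool anon) {
    if (!anon)
        limbo.ack(replica_id);
}
```

With a quorum of 2, an ack from the master plus an ack from an anonymous
replica leaves the transaction unconfirmed in this model. It is confirmed
only after a registered replica acks, which is the behaviour the new test
expects when the insert times out.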

I can't build your patch to test it manually: compilation fails with
some ERRINJ-related errors.

It seems that the commit "replication: fill replicaset.applier.vclock after
local recovery" on your branch is extraneous, and it is what causes the error.

Please remove it.

> diff --git a/test/replication-luatest/gh_5418_test.lua b/test/replication-luatest/gh_5418_test.lua
> new file mode 100644
> index 000000000..265d28ccb
> --- /dev/null
> +++ b/test/replication-luatest/gh_5418_test.lua

Please find a more informative test name.
For example, "gh_5418_qsync_with_anon_test.lua".

> @@ -0,0 +1,82 @@
> +local fio = require('fio')
> +local log = require('log')
> +local fiber = require('fiber')
> +local t = require('luatest')
> +local cluster = require('test.luatest_helpers.cluster')
> +local helpers = require('test.luatest_helpers.helpers')
> +
> +local g = t.group('gh-5418')
> +
> +g.before_test('test_qsync_with_anon', function()
> +    g.cluster = cluster:new({})
> +
> +    local box_cfg = {
> +        replication         = {helpers.instance_uri('master')},
> +        replication_synchro_quorum = 2,
> +        replication_timeout = 0.1
> +    }
> +
> +    g.master = g.cluster:build_server({alias = 'master'}, engine, box_cfg)
> +
> +    local box_cfg = {
> +        replication         = {
> +            helpers.instance_uri('master'),
> +            helpers.instance_uri('replica')
> +        },
> +        replication_timeout = 0.1,
> +        replication_connect_timeout = 0.5,
> +        read_only           = true,
> +        replication_anon    = true
> +    }
> +
> +    g.replica = g.cluster:build_server({alias = 'replica'}, engine, box_cfg)
> +
> +    g.cluster:join_server(g.master)
> +    g.cluster:join_server(g.replica)
> +    g.cluster:start()
> +    log.info('Everything is started')
> +end)
> +
> +g.after_test('test_qsync_with_anon', function()
> +    g.cluster:stop()
> +    fio.rmtree(g.master.workdir)
> +    fio.rmtree(g.replica.workdir)
> +end)
> +
> +local function wait_vclock(timeout)
> +    local started_at = fiber.clock()
> +    local lsn = g.master:eval("return box.info.vclock[1]")
> +
> +    local _, tbl = g.master:eval("return next(box.info.replication_anon())")
> +    local to_lsn = tbl.downstream.vclock[1]
> +
> +    while to_lsn == nil or to_lsn < lsn do
> +        fiber.sleep(0.001)
> +
> +        if (fiber.clock() - started_at) > timeout then
> +            return false
> +        end
> +
> +        _, tbl = g.master:eval("return next(box.info.replication_anon())")
> +        to_lsn = tbl.downstream.vclock[1]
> +
> +        log.info(string.format("master lsn: %d; replica_anon lsn: %d",
> +            lsn, to_lsn))
> +    end
> +
> +    return true
> +end
> +
> +g.test_qsync_with_anon = function()
> +    g.master:eval("box.schema.space.create('sync', {is_sync = true})")
> +    g.master:eval("box.space.sync:create_index('pk')")
> +
> +    t.assert_error_msg_content_equals("Quorum collection for a synchronous transaction is timed out",
> +        function() g.master:eval("return box.space.sync:insert{1}") end)
> +
> +    -- Wait until everything is replicated from the master to the replica
> +    t.assert(wait_vclock(1))

Please use `t.helpers.retrying()` here.
It receives a timeout and a function to call,
like `t.helpers.retrying({timeout=5}, wait_vclock)`.
wait_vclock should then simply return true or false based on
whether the replica has reached the master's vclock.

Also, please choose a bigger timeout, like 5 or 10 seconds.
Otherwise the test will be flaky on slow testing machines in our CI.

> +
> +    t.assert_equals(g.master:eval("return box.space.sync:select()"), {})
> +    t.assert_equals(g.replica:eval("return box.space.sync:select()"), {})
> +end
> --
> 2.25.1
>

-- 
Serge Petrenko



Thread overview: 6+ messages
2021-10-25  9:52 Yan Shtunder via Tarantool-patches
2021-10-25 13:32 ` Serge Petrenko via Tarantool-patches [this message]
     [not found]   ` <CAP94r39r6HMBxDDShO5qTYVBPz9kLVgRvaSBq8n6F+BUn1m4xw@mail.gmail.com>
2021-10-29  8:06     ` Serge Petrenko via Tarantool-patches
     [not found]       ` <CAP94r3_CkdY5QFJ543XsL-wGU+m3K0CBXaOpznL72jpzgXWGEQ@mail.gmail.com>
2021-11-03 15:01         ` Serge Petrenko via Tarantool-patches
2021-11-11 15:04           ` sergos via Tarantool-patches
2021-11-29 15:17 ` Kirill Yukhin via Tarantool-patches
