Tarantool development patches archive
* [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum
@ 2021-10-25  9:52 Yan Shtunder via Tarantool-patches
  2021-10-25 13:32 ` Serge Petrenko via Tarantool-patches
  2021-11-29 15:17 ` Kirill Yukhin via Tarantool-patches
  0 siblings, 2 replies; 6+ messages in thread
From: Yan Shtunder via Tarantool-patches @ 2021-10-25  9:52 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Yan Shtunder

Transactions have to committed after they reaches quorum of "real"
cluster members. Therefore, anonymous replicas don't have to
participate in the quorum.

Closes #5418
---
Issue: https://github.com/tarantool/tarantool/issues/5418
Patch: https://github.com/tarantool/tarantool/tree/yshtunder/gh-5418-qsync-with-anon-replicas

 src/box/relay.cc                          |  3 +-
 test/replication-luatest/gh_5418_test.lua | 82 +++++++++++++++++++++++
 2 files changed, 84 insertions(+), 1 deletion(-)
 create mode 100644 test/replication-luatest/gh_5418_test.lua

diff --git a/src/box/relay.cc b/src/box/relay.cc
index f5852df7b..cf569e8e2 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -543,6 +543,7 @@ tx_status_update(struct cmsg *msg)
 	struct replication_ack ack;
 	ack.source = status->relay->replica->id;
 	ack.vclock = &status->vclock;
+	bool anon = status->relay->replica->anon;
 	/*
 	 * Let pending synchronous transactions know, which of
 	 * them were successfully sent to the replica. Acks are
@@ -550,7 +551,7 @@ tx_status_update(struct cmsg *msg)
 	 * the single master in 100% so far). Other instances wait
 	 * for master's CONFIRM message instead.
 	 */
-	if (txn_limbo.owner_id == instance_id) {
+	if (txn_limbo.owner_id == instance_id && !anon) {
 		txn_limbo_ack(&txn_limbo, ack.source,
 			      vclock_get(ack.vclock, instance_id));
 	}
diff --git a/test/replication-luatest/gh_5418_test.lua b/test/replication-luatest/gh_5418_test.lua
new file mode 100644
index 000000000..265d28ccb
--- /dev/null
+++ b/test/replication-luatest/gh_5418_test.lua
@@ -0,0 +1,82 @@
+local fio = require('fio')
+local log = require('log')
+local fiber = require('fiber')
+local t = require('luatest')
+local cluster = require('test.luatest_helpers.cluster')
+local helpers = require('test.luatest_helpers.helpers')
+
+local g = t.group('gh-5418')
+
+g.before_test('test_qsync_with_anon', function()
+    g.cluster = cluster:new({})
+
+    local box_cfg = {
+        replication         = {helpers.instance_uri('master')},
+        replication_synchro_quorum = 2,
+        replication_timeout = 0.1
+    }
+
+    g.master = g.cluster:build_server({alias = 'master'}, engine, box_cfg)
+
+    local box_cfg = {
+        replication         = {
+            helpers.instance_uri('master'),
+            helpers.instance_uri('replica')
+        },
+        replication_timeout = 0.1,
+        replication_connect_timeout = 0.5,
+        read_only           = true,
+        replication_anon    = true
+    }
+
+    g.replica = g.cluster:build_server({alias = 'replica'}, engine, box_cfg)
+
+    g.cluster:join_server(g.master)
+    g.cluster:join_server(g.replica)
+    g.cluster:start()
+    log.info('Everything is started')
+end)
+
+g.after_test('test_qsync_with_anon', function()
+    g.cluster:stop()
+    fio.rmtree(g.master.workdir)
+    fio.rmtree(g.replica.workdir)
+end)
+
+local function wait_vclock(timeout)
+    local started_at = fiber.clock()
+    local lsn = g.master:eval("return box.info.vclock[1]")
+
+    local _, tbl = g.master:eval("return next(box.info.replication_anon())")
+    local to_lsn = tbl.downstream.vclock[1]
+
+    while to_lsn == nil or to_lsn < lsn do
+        fiber.sleep(0.001)
+
+        if (fiber.clock() - started_at) > timeout then
+            return false
+        end
+
+        _, tbl = g.master:eval("return next(box.info.replication_anon())")
+        to_lsn = tbl.downstream.vclock[1]
+
+        log.info(string.format("master lsn: %d; replica_anon lsn: %d",
+            lsn, to_lsn))
+    end
+
+    return true
+end
+
+g.test_qsync_with_anon = function()
+    g.master:eval("box.schema.space.create('sync', {is_sync = true})")
+    g.master:eval("box.space.sync:create_index('pk')")
+
+    t.assert_error_msg_content_equals("Quorum collection for a synchronous transaction is timed out",
+        function() g.master:eval("return box.space.sync:insert{1}") end)
+
+    -- Wait until everything is replicated from the master to the replica
+    t.assert(wait_vclock(1))
+
+    t.assert_equals(g.master:eval("return box.space.sync:select()"), {})
+    t.assert_equals(g.replica:eval("return box.space.sync:select()"), {})
+end
--
2.25.1
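
As a reading aid for the relay.cc hunk, the rule from the commit message
can be sketched in plain C as below. This is a simplified model, not
Tarantool's limbo code: `struct replica_sketch`, `count_quorum_acks()` and
`quorum_reached()` are hypothetical names, and the sketch assumes the
master counts its own write as the first ack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a replica descriptor. */
struct replica_sketch {
	uint32_t id;       /* replica id (illustrative only) */
	bool anon;         /* true for an anonymous replica */
	int64_t acked_lsn; /* highest master LSN this replica acknowledged */
};

/*
 * Count the instances confirming `lsn`: the master itself (assumed to
 * count as one ack) plus every non-anonymous replica whose acknowledged
 * LSN has reached `lsn`.
 */
static int
count_quorum_acks(const struct replica_sketch *replicas, size_t count,
		  int64_t lsn)
{
	assert(replicas != NULL || count == 0);
	int acks = 1;
	for (size_t i = 0; i < count; i++) {
		if (replicas[i].anon)
			continue; /* mirrors the `!anon` check in the patch */
		if (replicas[i].acked_lsn >= lsn)
			acks++;
	}
	return acks;
}

/* A synchronous transaction at `lsn` may be confirmed once this holds. */
static bool
quorum_reached(const struct replica_sketch *replicas, size_t count,
	       int64_t lsn, int quorum)
{
	return count_quorum_acks(replicas, count, lsn) >= quorum;
}
```

With one master, one anonymous replica and a quorum of 2, this model
never collects more than one ack, which is the situation the test
provokes when it expects the quorum timeout error.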



* Re: [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum
  2021-10-25  9:52 [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum Yan Shtunder via Tarantool-patches
@ 2021-10-25 13:32 ` Serge Petrenko via Tarantool-patches
       [not found]   ` <CAP94r39r6HMBxDDShO5qTYVBPz9kLVgRvaSBq8n6F+BUn1m4xw@mail.gmail.com>
  2021-11-29 15:17 ` Kirill Yukhin via Tarantool-patches
  1 sibling, 1 reply; 6+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-10-25 13:32 UTC (permalink / raw)
  To: Yan Shtunder, tarantool-patches



On 25.10.2021 12:52, Yan Shtunder via Tarantool-patches wrote:

Hi! Good job on porting the test to the current luatest version!
Please, find a couple of comments below.

> Transactions have to committed after they reaches quorum of "real"

Nit: better say "Transactions should be committed".
reaches -> reach.

> cluster members. Therefore, anonymous replicas don't have to
> participate in the quorum.
>
> Closes #5418
> ---
> Issue: https://github.com/tarantool/tarantool/issues/5418
> Patch: https://github.com/tarantool/tarantool/tree/yshtunder/gh-5418-qsync-with-anon-replicas
>
>   src/box/relay.cc                          |  3 +-
>   test/replication-luatest/gh_5418_test.lua | 82 +++++++++++++++++++++++
>   2 files changed, 84 insertions(+), 1 deletion(-)
>   create mode 100644 test/replication-luatest/gh_5418_test.lua
>
> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index f5852df7b..cf569e8e2 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -543,6 +543,7 @@ tx_status_update(struct cmsg *msg)
>   	struct replication_ack ack;
>   	ack.source = status->relay->replica->id;
>   	ack.vclock = &status->vclock;
> +	bool anon = status->relay->replica->anon;
>   	/*
>   	 * Let pending synchronous transactions know, which of
>   	 * them were successfully sent to the replica. Acks are
> @@ -550,7 +551,7 @@ tx_status_update(struct cmsg *msg)
>   	 * the single master in 100% so far). Other instances wait
>   	 * for master's CONFIRM message instead.
>   	 */
> -	if (txn_limbo.owner_id == instance_id) {
> +	if (txn_limbo.owner_id == instance_id && !anon) {
>   		txn_limbo_ack(&txn_limbo, ack.source,
>   			      vclock_get(ack.vclock, instance_id));
>   	}

I can't build your patch to test it manually: compilation fails with
some ERRINJ-related errors.

It seems the commit "replication: fill replicaset.applier.vclock after
local recovery" that you have on the branch is extraneous, and it is
what causes the errors.

Please remove it.

> diff --git a/test/replication-luatest/gh_5418_test.lua b/test/replication-luatest/gh_5418_test.lua
> new file mode 100644
> index 000000000..265d28ccb
> --- /dev/null
> +++ b/test/replication-luatest/gh_5418_test.lua

Please choose a more informative test name.
For example, "gh_5418_qsync_with_anon_test.lua".

> @@ -0,0 +1,82 @@
> +local fio = require('fio')
> +local log = require('log')
> +local fiber = require('fiber')
> +local t = require('luatest')
> +local cluster = require('test.luatest_helpers.cluster')
> +local helpers = require('test.luatest_helpers.helpers')
> +
> +local g = t.group('gh-5418')
> +
> +g.before_test('test_qsync_with_anon', function()
> +    g.cluster = cluster:new({})
> +
> +    local box_cfg = {
> +        replication         = {helpers.instance_uri('master')},
> +        replication_synchro_quorum = 2,
> +        replication_timeout = 0.1
> +    }
> +
> +    g.master = g.cluster:build_server({alias = 'master'}, engine, box_cfg)
> +
> +    local box_cfg = {
> +        replication         = {
> +            helpers.instance_uri('master'),
> +            helpers.instance_uri('replica')
> +        },
> +        replication_timeout = 0.1,
> +        replication_connect_timeout = 0.5,
> +        read_only           = true,
> +        replication_anon    = true
> +    }
> +
> +    g.replica = g.cluster:build_server({alias = 'replica'}, engine, box_cfg)
> +
> +    g.cluster:join_server(g.master)
> +    g.cluster:join_server(g.replica)
> +    g.cluster:start()
> +    log.info('Everything is started')
> +end)
> +
> +g.after_test('test_qsync_with_anon', function()
> +    g.cluster:stop()
> +    fio.rmtree(g.master.workdir)
> +    fio.rmtree(g.replica.workdir)
> +end)
> +
> +local function wait_vclock(timeout)
> +    local started_at = fiber.clock()
> +    local lsn = g.master:eval("return box.info.vclock[1]")
> +
> +    local _, tbl = g.master:eval("return next(box.info.replication_anon())")
> +    local to_lsn = tbl.downstream.vclock[1]
> +
> +    while to_lsn == nil or to_lsn < lsn do
> +        fiber.sleep(0.001)
> +
> +        if (fiber.clock() - started_at) > timeout then
> +            return false
> +        end
> +
> +        _, tbl = g.master:eval("return next(box.info.replication_anon())")
> +        to_lsn = tbl.downstream.vclock[1]
> +
> +        log.info(string.format("master lsn: %d; replica_anon lsn: %d",
> +            lsn, to_lsn))
> +    end
> +
> +    return true
> +end
> +
> +g.test_qsync_with_anon = function()
> +    g.master:eval("box.schema.space.create('sync', {is_sync = true})")
> +    g.master:eval("box.space.sync:create_index('pk')")
> +
> +    t.assert_error_msg_content_equals("Quorum collection for a synchronous transaction is timed out",
> +        function() g.master:eval("return box.space.sync:insert{1}") end)
> +
> +    -- Wait until everything is replicated from the master to the replica
> +    t.assert(wait_vclock(1))

Please use `t.helpers.retrying()` here.
It receives a timeout and a function to call,
like `t.helpers.retrying({timeout = 5}, wait_vclock)`.
wait_vclock should then simply return true or false based on
whether the replica has reached master's vclock.

Also, please choose a bigger timeout. Like 5 or 10 seconds.
Otherwise the test will be flaky on slow testing machines in our CI.
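
To illustrate the retry-until-deadline pattern meant here, a minimal
sketch in C follows. It is not luatest's implementation:
`retry_until()`, `monotonic_seconds()` and the `poll_counter` example
condition are hypothetical names, and the sketch assumes a POSIX system
for `clock_gettime()` and `nanosleep()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Condition callback: returns true once the awaited state is reached. */
typedef bool (*retry_cond_f)(void *arg);

static double
monotonic_seconds(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Poll `cond` every `delay` seconds until it returns true or `timeout`
 * seconds have passed. Returns whether the condition was met in time.
 */
static bool
retry_until(double timeout, double delay, retry_cond_f cond, void *arg)
{
	assert(timeout >= 0 && delay >= 0);
	double deadline = monotonic_seconds() + timeout;
	for (;;) {
		if (cond(arg))
			return true;
		if (monotonic_seconds() >= deadline)
			return false;
		struct timespec pause;
		pause.tv_sec = (time_t)delay;
		pause.tv_nsec = (long)((delay - (double)pause.tv_sec) * 1e9);
		nanosleep(&pause, NULL);
	}
}

/* Example condition: succeeds once it has been polled `needed` times. */
struct poll_counter {
	int calls;
	int needed;
};

static bool
poll_counter_ready(void *arg)
{
	struct poll_counter *c = arg;
	return ++c->calls >= c->needed;
}
```

The point of a generous timeout is that the loop returns as soon as the
condition holds, so a larger limit only costs time when the test is
going to fail anyway.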

> +
> +    t.assert_equals(g.master:eval("return box.space.sync:select()"), {})
> +    t.assert_equals(g.replica:eval("return box.space.sync:select()"), {})
> +end
> --
> 2.25.1
>

-- 
Serge Petrenko



* Re: [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum
       [not found]   ` <CAP94r39r6HMBxDDShO5qTYVBPz9kLVgRvaSBq8n6F+BUn1m4xw@mail.gmail.com>
@ 2021-10-29  8:06     ` Serge Petrenko via Tarantool-patches
       [not found]       ` <CAP94r3_CkdY5QFJ543XsL-wGU+m3K0CBXaOpznL72jpzgXWGEQ@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-10-29  8:06 UTC (permalink / raw)
  To: Ян Штундер, tml



On 28.10.2021 18:56, Ян Штундер wrote:
> Hi! Thank you for the review!
> I have fixed the errors
>
>     Nit: better say "Transactions should be committed".
>     reaches -> reach.
>
>
> Transactions should be committed after they reach quorum of "real" 
> cluster members.
>
>     Please, find a more informative test name.
>     For example, "gh_5418_qsync_with_anon_test.lua*
>
>
> gh_5418_test.lua -> gh_5418_qsync_with_anon_test.lua
>
>     Please, use `t.helpers.retrying()` here.
>
>
>  I used the wait_vclock function from the luatest_helpers.lua file
>
> --
> Yan Shtunder

Good job on the fixes!
LGTM.


-- 
Serge Petrenko



* Re: [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum
       [not found]       ` <CAP94r3_CkdY5QFJ543XsL-wGU+m3K0CBXaOpznL72jpzgXWGEQ@mail.gmail.com>
@ 2021-11-03 15:01         ` Serge Petrenko via Tarantool-patches
  2021-11-11 15:04           ` sergos via Tarantool-patches
  0 siblings, 1 reply; 6+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-11-03 15:01 UTC (permalink / raw)
  To: Ян Штундер, tml



On 03.11.2021 13:05, Ян Штундер wrote:
> Thanks for the comments!
> I have addressed your remarks.

Thanks for the changes!
LGTM.
>
> Diff:
>
> +++ b/test/replication-luatest/gh_5418_qsync_with_anon_test.lua
> @@ -0,0 +1,62 @@
> +local t = require('luatest')
> +local cluster = require('test.luatest_helpers.cluster')
> +local helpers = require('test.luatest_helpers')
> +
> +local g = t.group('gh-5418', {{engine = 'memtx'}, {engine = 'vinyl'}})
> +
> +g.before_each(function(cg)
> +    local engine = cg.params.engine
> +
> +    cg.cluster = cluster:new({})
> +
> +    local box_cfg = {
> +        replication         = {
> +            helpers.instance_uri('master')
> +        },
> +        replication_synchro_quorum = 2,
> +        replication_timeout = 1
> +    }
> +
> +    cg.master = cg.cluster:build_server({alias = 'master', engine = engine, box_cfg = box_cfg})
> +
> +    local box_cfg = {
> +        replication         = {
> +            helpers.instance_uri('master'),
> +            helpers.instance_uri('replica')
> +        },
> +        replication_timeout = 1,
> +        replication_connect_timeout = 4,
> +        read_only           = true,
> +        replication_anon    = true
> +    }
> +
> +    cg.replica = cg.cluster:build_server({alias = 'replica', engine = engine, box_cfg = box_cfg})
> +
> +    cg.cluster:add_server(cg.master)
> +    cg.cluster:add_server(cg.replica)
> +    cg.cluster:start()
> +end)
> +
> +
> +g.after_each(function(cg)
> +    cg.cluster.servers = nil
> +    cg.cluster:drop()
> +end)
> +
> +
> +g.test_qsync_with_anon = function(cg)
> +    cg.master:eval("box.schema.space.create('sync', {is_sync = true})")
> +    cg.master:eval("box.space.sync:create_index('pk')")
> +    cg.master:eval("box.ctl.promote()")
> +
> +    t.assert_error_msg_content_equals("Quorum collection for a synchronous transaction is timed out",
> +        function() cg.master:eval("return box.space.sync:insert{1}") end)
> +
> +    -- Wait until everything is replicated from the master to the replica
> +    local vclock = cg.master:eval("return box.info.vclock")
> +    vclock[0] = nil
> +    helpers:wait_vclock(cg.replica, vclock)
> +
> +    t.assert_equals(cg.master:eval("return box.space.sync:select()"), {})
> +    t.assert_equals(cg.replica:eval("return box.space.sync:select()"), {})
> +end
> diff --git a/test/replication/qsync_with_anon.result b/test/replication/qsync_with_anon.result
> deleted file mode 100644
> index 99c6fb902..000000000
> --- a/test/replication/qsync_with_anon.result
> +++ /dev/null
> @@ -1,231 +0,0 @@
> --- test-run result file version 2
> -env = require('test_run')
> - | ---
> - | ...
> -test_run = env.new()
> - | ---
> - | ...
> -engine = test_run:get_cfg('engine')
> - | ---
> - | ...
> -
> -orig_synchro_quorum = box.cfg.replication_synchro_quorum
> - | ---
> - | ...
> -orig_synchro_timeout = box.cfg.replication_synchro_timeout
> - | ---
> - | ...
> -
> -NUM_INSTANCES = 2
> - | ---
> - | ...
> -BROKEN_QUORUM = NUM_INSTANCES + 1
> - | ---
> - | ...
> -
> -box.schema.user.grant('guest', 'replication')
> - | ---
> - | ...
> -
> --- Setup a cluster with anonymous replica.
> -test_run:cmd('create server replica_anon with rpl_master=default, script="replication/anon1.lua"')
> - | ---
> - | - true
> - | ...
> -test_run:cmd('start server replica_anon')
> - | ---
> - | - true
> - | ...
> -test_run:cmd('switch replica_anon')
> - | ---
> - | - true
> - | ...
> -
> --- [RFC, Asynchronous replication] successful transaction applied on async
> --- replica.
> --- Testcase setup.
> -test_run:switch('default')
> - | ---
> - | - true
> - | ...
> -box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
> - | ---
> - | ...
> -_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> - | ---
> - | ...
> -_ = box.space.sync:create_index('pk')
> - | ---
> - | ...
> -box.ctl.promote()
> - | ---
> - | ...
> --- Testcase body.
> -test_run:switch('default')
> - | ---
> - | - true
> - | ...
> -box.space.sync:insert{1} -- success
> - | ---
> - | - [1]
> - | ...
> -box.space.sync:insert{2} -- success
> - | ---
> - | - [2]
> - | ...
> -box.space.sync:insert{3} -- success
> - | ---
> - | - [3]
> - | ...
> -test_run:cmd('switch replica_anon')
> - | ---
> - | - true
> - | ...
> -box.space.sync:select{} -- 1, 2, 3
> - | ---
> - | - - [1]
> - |   - [2]
> - |   - [3]
> - | ...
> --- Testcase cleanup.
> -test_run:switch('default')
> - | ---
> - | - true
> - | ...
> -box.space.sync:drop()
> - | ---
> - | ...
> -
> --- [RFC, Asynchronous replication] failed transaction rolled back on async
> --- replica.
> --- Testcase setup.
> -box.cfg{replication_synchro_quorum = NUM_INSTANCES, replication_synchro_timeout = 1000}
> - | ---
> - | ...
> -_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> - | ---
> - | ...
> -_ = box.space.sync:create_index('pk')
> - | ---
> - | ...
> --- Write something to flush the current master's state to replica.
> -_ = box.space.sync:insert{1}
> - | ---
> - | ...
> -_ = box.space.sync:delete{1}
> - | ---
> - | ...
> -
> -box.cfg{replication_synchro_quorum = BROKEN_QUORUM, replication_synchro_timeout = 1000}
> - | ---
> - | ...
> -fiber = require('fiber')
> - | ---
> - | ...
> -ok, err = nil
> - | ---
> - | ...
> -f = fiber.create(function()                 \
> -    ok, err = pcall(box.space.sync.insert, box.space.sync, {1})                 \
> -end)
> - | ---
> - | ...
> -
> -test_run:cmd('switch replica_anon')
> - | ---
> - | - true
> - | ...
> -test_run:wait_cond(function() return box.space.sync:count() == 1 end)
> - | ---
> - | - true
> - | ...
> -box.space.sync:select{}
> - | ---
> - | - - [1]
> - | ...
> -
> -test_run:switch('default')
> - | ---
> - | - true
> - | ...
> -box.cfg{replication_synchro_timeout = 0.001}
> - | ---
> - | ...
> -test_run:wait_cond(function() return f:status() == 'dead' end)
> - | ---
> - | - true
> - | ...
> -box.space.sync:select{}
> - | ---
> - | - []
> - | ...
> -
> -test_run:cmd('switch replica_anon')
> - | ---
> - | - true
> - | ...
> -test_run:wait_cond(function() return box.space.sync:count() == 0 end)
> - | ---
> - | - true
> - | ...
> -box.space.sync:select{}
> - | ---
> - | - []
> - | ...
> -
> -test_run:switch('default')
> - | ---
> - | - true
> - | ...
> -box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
> - | ---
> - | ...
> -box.space.sync:insert{1} -- success
> - | ---
> - | - [1]
> - | ...
> -test_run:cmd('switch replica_anon')
> - | ---
> - | - true
> - | ...
> -box.space.sync:select{} -- 1
> - | ---
> - | - - [1]
> - | ...
> --- Testcase cleanup.
> -test_run:switch('default')
> - | ---
> - | - true
> - | ...
> -box.space.sync:drop()
> - | ---
> - | ...
> -
> --- Teardown.
> -test_run:switch('default')
> - | ---
> - | - true
> - | ...
> -test_run:cmd('stop server replica_anon')
> - | ---
> - | - true
> - | ...
> -test_run:cmd('delete server replica_anon')
> - | ---
> - | - true
> - | ...
> -box.schema.user.revoke('guest', 'replication')
> - | ---
> - | ...
> -box.cfg{                  \
> -    replication_synchro_quorum = orig_synchro_quorum,                 \
> -    replication_synchro_timeout = orig_synchro_timeout,                 \
> -}
> - | ---
> - | ...
> -box.ctl.demote()
> - | ---
> - | ...
> -test_run:cleanup_cluster()
> - | ---
> - | ...
> diff --git a/test/replication/qsync_with_anon.test.lua b/test/replication/qsync_with_anon.test.lua
> deleted file mode 100644
> index e73880ec7..000000000
> --- a/test/replication/qsync_with_anon.test.lua
> +++ /dev/null
> @@ -1,86 +0,0 @@
> -env = require('test_run')
> -test_run = env.new()
> -engine = test_run:get_cfg('engine')
> -
> -orig_synchro_quorum = box.cfg.replication_synchro_quorum
> -orig_synchro_timeout = box.cfg.replication_synchro_timeout
> -
> -NUM_INSTANCES = 2
> -BROKEN_QUORUM = NUM_INSTANCES + 1
> -
> -box.schema.user.grant('guest', 'replication')
> -
> --- Setup a cluster with anonymous replica.
> -test_run:cmd('create server replica_anon with rpl_master=default, script="replication/anon1.lua"')
> -test_run:cmd('start server replica_anon')
> -test_run:cmd('switch replica_anon')
> -
> --- [RFC, Asynchronous replication] successful transaction applied on async
> --- replica.
> --- Testcase setup.
> -test_run:switch('default')
> -box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
> -_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> -_ = box.space.sync:create_index('pk')
> -box.ctl.promote()
> --- Testcase body.
> -test_run:switch('default')
> -box.space.sync:insert{1} -- success
> -box.space.sync:insert{2} -- success
> -box.space.sync:insert{3} -- success
> -test_run:cmd('switch replica_anon')
> -box.space.sync:select{} -- 1, 2, 3
> --- Testcase cleanup.
> -test_run:switch('default')
> -box.space.sync:drop()
> -
> --- [RFC, Asynchronous replication] failed transaction rolled back on async
> --- replica.
> --- Testcase setup.
> -box.cfg{replication_synchro_quorum = NUM_INSTANCES, replication_synchro_timeout = 1000}
> -_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> -_ = box.space.sync:create_index('pk')
> --- Write something to flush the current master's state to replica.
> -_ = box.space.sync:insert{1}
> -_ = box.space.sync:delete{1}
> -
> -box.cfg{replication_synchro_quorum = BROKEN_QUORUM, replication_synchro_timeout = 1000}
> -fiber = require('fiber')
> -ok, err = nil
> -f = fiber.create(function()                 \
> -    ok, err = pcall(box.space.sync.insert, box.space.sync, {1})                 \
> -end)
> -
> -test_run:cmd('switch replica_anon')
> -test_run:wait_cond(function() return box.space.sync:count() == 1 end)
> -box.space.sync:select{}
> -
> -test_run:switch('default')
> -box.cfg{replication_synchro_timeout = 0.001}
> -test_run:wait_cond(function() return f:status() == 'dead' end)
> -box.space.sync:select{}
> -
> -test_run:cmd('switch replica_anon')
> -test_run:wait_cond(function() return box.space.sync:count() == 0 end)
> -box.space.sync:select{}
> -
> -test_run:switch('default')
> -box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
> -box.space.sync:insert{1} -- success
> -test_run:cmd('switch replica_anon')
> -box.space.sync:select{} -- 1
> --- Testcase cleanup.
> -test_run:switch('default')
> -box.space.sync:drop()
> -
> --- Teardown.
> -test_run:switch('default')
> -test_run:cmd('stop server replica_anon')
> -test_run:cmd('delete server replica_anon')
> -box.schema.user.revoke('guest', 'replication')
> -box.cfg{                  \
> -    replication_synchro_quorum = orig_synchro_quorum,                 \
> -    replication_synchro_timeout = orig_synchro_timeout,                 \
> -}
> -box.ctl.demote()
> -test_run:cleanup_cluster()
>
> --
> Yan Shtunder
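
The quoted test clears component 0 of the master's vclock before calling
wait_vclock. Assuming component 0 tracks instance-local changes that are
never replicated, the comparison it relies on can be sketched in C as
below. `struct vclock_sketch` and `vclock_reached_ignore0()` are
hypothetical names for illustration, not Tarantool's vclock API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { VCLOCK_SKETCH_MAX = 32 };

/* Hypothetical simplified vclock: lsn[i] is the LSN seen from id i. */
struct vclock_sketch {
	int64_t lsn[VCLOCK_SKETCH_MAX];
};

/*
 * Return true when `replica` has caught up with `target` on every
 * component except 0, which is assumed to hold local-only changes and
 * therefore cannot be expected to match across instances.
 */
static bool
vclock_reached_ignore0(const struct vclock_sketch *replica,
		       const struct vclock_sketch *target)
{
	assert(replica != NULL && target != NULL);
	for (int id = 1; id < VCLOCK_SKETCH_MAX; id++) {
		if (replica->lsn[id] < target->lsn[id])
			return false;
	}
	return true;
}
```

Under that assumption, leaving component 0 in the comparison could make
the wait time out even though all replicated data has arrived.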
>
> пт, 29 окт. 2021 г. в 11:06, Serge Petrenko <sergepetrenko@tarantool.org>:
>
>
>
>     28.10.2021 18:56, Ян Штундер пишет:
>     > Hi! Thank you for the review!
>     > I have fixed the errors
>     >
>     >     Nit: better say "Transactions should be committed".
>     >     reaches -> reach.
>     >
>     >
>     > Transactions should be committed after they reach quorum of "real"
>     > cluster members.
>     >
>     >     Please, find a more informative test name.
>     >     For example, "gh_5418_qsync_with_anon_test.lua*
>     >
>     >
>     > gh_5418_test.lua -> gh_5418_qsync_with_anon_test.lua
>     >
>     >     Please, use `t.helpers.retrying()` here.
>     >
>     >
>     >  I used the wait_vclock function from the luatest_helpers.lua file
>     >
>     > --
>     > Yan Shtunder
>
>     Good job on the fixes!
>     LGTM.
>
>
>     >
>     > пн, 25 окт. 2021 г. в 16:32, Serge Petrenko
>     <sergepetrenko@tarantool.org>:
>     >
>     >
>     >
>     >     25.10.2021 12:52, Yan Shtunder via Tarantool-patches пишет:
>     >
>     >     Hi! Good job on porting the test to the current luatest version!
>     >     Please, find a couple of comments below.
>     >
>     >     > Transactions have to committed after they reaches quorum
>     of "real"
>     >
>     >     Nit: better say "Transactions should be committed".
>     >     reaches -> reach.
>     >
>     >     > cluster members. Therefore, anonymous replicas don't have to
>     >     > participate in the quorum.
>     >     >
>     >     > Closes #5418
>     >     > ---
>     >     > Issue: https://github.com/tarantool/tarantool/issues/5418
>     >     > Patch:
>     >
>     https://github.com/tarantool/tarantool/tree/yshtunder/gh-5418-qsync-with-anon-replicas
>     >     >
>     >     >   src/box/relay.cc                          |  3 +-
>     >     >   test/replication-luatest/gh_5418_test.lua | 82
>     >     +++++++++++++++++++++++
>     >     >   2 files changed, 84 insertions(+), 1 deletion(-)
>     >     >   create mode 100644 test/replication-luatest/gh_5418_test.lua
>     >     >
>     >     > diff --git a/src/box/relay.cc b/src/box/relay.cc
>     >     > index f5852df7b..cf569e8e2 100644
>     >     > --- a/src/box/relay.cc
>     >     > +++ b/src/box/relay.cc
>     >     > @@ -543,6 +543,7 @@ tx_status_update(struct cmsg *msg)
>     >     >       struct replication_ack ack;
>     >     >       ack.source = status->relay->replica->id;
>     >     >       ack.vclock = &status->vclock;
>     >     > +     bool anon = status->relay->replica->anon;
>     >     >       /*
>     >     >        * Let pending synchronous transactions know, which of
>     >     >        * them were successfully sent to the replica. Acks are
>     >     > @@ -550,7 +551,7 @@ tx_status_update(struct cmsg *msg)
>     >     >        * the single master in 100% so far). Other instances wait
>     >     >        * for master's CONFIRM message instead.
>     >     >        */
>     >     > -     if (txn_limbo.owner_id == instance_id) {
>     >     > +     if (txn_limbo.owner_id == instance_id && !anon) {
>     >     >               txn_limbo_ack(&txn_limbo, ack.source,
>     >     >  vclock_get(ack.vclock, instance_id));
>     >     >       }
>     >
>     >     I can't build your patch to test it manually: compilation fails with
>     >     some ERRINJ-related errors.
>     >
>     >     Seems like the commit "replication: fill replicaset.applier.vclock
>     >     after local recovery" you have on the branch is extraneous, and it
>     >     causes the error.
>     >
>     >     Please remove it.
>     >
>     >     > diff --git a/test/replication-luatest/gh_5418_test.lua b/test/replication-luatest/gh_5418_test.lua
>     >     > new file mode 100644
>     >     > index 000000000..265d28ccb
>     >     > --- /dev/null
>     >     > +++ b/test/replication-luatest/gh_5418_test.lua
>     >
>     >     Please, find a more informative test name.
>     >     For example, "gh_5418_qsync_with_anon_test.lua"
>     >
>     >     > @@ -0,0 +1,82 @@
>     >     > +local fio = require('fio')
>     >     > +local log = require('log')
>     >     > +local fiber = require('fiber')
>     >     > +local t = require('luatest')
>     >     > +local cluster = require('test.luatest_helpers.cluster')
>     >     > +local helpers = require('test.luatest_helpers.helpers')
>     >     > +
>     >     > +local g = t.group('gh-5418')
>     >     > +
>     >     > +g.before_test('test_qsync_with_anon', function()
>     >     > +    g.cluster = cluster:new({})
>     >     > +
>     >     > +    local box_cfg = {
>     >     > +        replication         = {helpers.instance_uri('master')},
>     >     > +        replication_synchro_quorum = 2,
>     >     > +        replication_timeout = 0.1
>     >     > +    }
>     >     > +
>     >     > +    g.master = g.cluster:build_server({alias = 'master'}, engine, box_cfg)
>     >     > +
>     >     > +    local box_cfg = {
>     >     > +        replication         = {
>     >     > +            helpers.instance_uri('master'),
>     >     > +            helpers.instance_uri('replica')
>     >     > +        },
>     >     > +        replication_timeout = 0.1,
>     >     > +        replication_connect_timeout = 0.5,
>     >     > +        read_only           = true,
>     >     > +        replication_anon    = true
>     >     > +    }
>     >     > +
>     >     > +    g.replica = g.cluster:build_server({alias = 'replica'}, engine, box_cfg)
>     >     > +
>     >     > +    g.cluster:join_server(g.master)
>     >     > +    g.cluster:join_server(g.replica)
>     >     > +    g.cluster:start()
>     >     > +    log.info('Everything is started')
>     >     > +end)
>     >     > +
>     >     > +g.after_test('test_qsync_with_anon', function()
>     >     > +    g.cluster:stop()
>     >     > +    fio.rmtree(g.master.workdir)
>     >     > +    fio.rmtree(g.replica.workdir)
>     >     > +end)
>     >     > +
>     >     > +local function wait_vclock(timeout)
>     >     > +    local started_at = fiber.clock()
>     >     > +    local lsn = g.master:eval("return box.info.vclock[1]")
>     >     > +
>     >     > +    local _, tbl = g.master:eval("return next(box.info.replication_anon())")
>     >     > +    local to_lsn = tbl.downstream.vclock[1]
>     >     > +
>     >     > +    while to_lsn == nil or to_lsn < lsn do
>     >     > +        fiber.sleep(0.001)
>     >     > +
>     >     > +        if (fiber.clock() - started_at) > timeout then
>     >     > +            return false
>     >     > +        end
>     >     > +
>     >     > +        _, tbl = g.master:eval("return next(box.info.replication_anon())")
>     >     > +        to_lsn = tbl.downstream.vclock[1]
>     >     > +
>     >     > +        log.info(string.format("master lsn: %d; replica_anon lsn: %d",
>     >     > +            lsn, to_lsn))
>     >     > +    end
>     >     > +
>     >     > +    return true
>     >     > +end
>     >     > +
>     >     > +g.test_qsync_with_anon = function()
>     >     > +    g.master:eval("box.schema.space.create('sync', {is_sync = true})")
>     >     > +    g.master:eval("box.space.sync:create_index('pk')")
>     >     > +
>     >     > +    t.assert_error_msg_content_equals("Quorum collection for a synchronous transaction is timed out",
>     >     > +        function() g.master:eval("return box.space.sync:insert{1}") end)
>     >     > +
>     >     > +    -- Wait until everything is replicated from the master to the replica
>     >     > +    t.assert(wait_vclock(1))
>     >
>     >     Please, use `t.helpers.retrying()` here.
>     >     It receives a timeout and a function to call.
>     >     Like `t.helpers.retrying({timeout=5}, wait_vclock)`
>     >     And wait_vclock should simply return true or false based on
>     >     whether the replica has reached master's vclock.
>     >
>     >     Also, please choose a bigger timeout. Like 5 or 10 seconds.
>     >     Otherwise the test will be flaky on slow testing machines in our CI.
>     >
>     >     > +
>     >     > +    t.assert_equals(g.master:eval("return box.space.sync:select()"), {})
>     >     > +    t.assert_equals(g.replica:eval("return box.space.sync:select()"), {})
>     >     > +end
>     >     > --
>     >     > 2.25.1
>     >     >
>     >
>     >     --
>     >     Serge Petrenko
>     >
>
>     -- 
>     Serge Petrenko
>

-- 
Serge Petrenko



* Re: [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum
  2021-11-03 15:01         ` Serge Petrenko via Tarantool-patches
@ 2021-11-11 15:04           ` sergos via Tarantool-patches
  0 siblings, 0 replies; 6+ messages in thread
From: sergos via Tarantool-patches @ 2021-11-11 15:04 UTC (permalink / raw)
  To: Ян Штундер; +Cc: tml

Hi!

Thanks for the patch! 

Just a nit in the commit message and a question about the test below.

Sergos.


The changelog is from https://github.com/tarantool/tarantool/commit/b9feb3853fbce389471ad3022307942aa92e8ea7

> Transactions should be committed after they reach quorum of "real"
> cluster members. Therefore, anonymous replicas don't have to

I would rephrase it as ‘replicas shouldn’t participate’
		
> participate in the quorum.
> 
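
To make the intent of the message concrete, here is a minimal sketch of the
rule the patch enforces. It is not the actual Tarantool code: `Replica`,
`SyncTxn`, `on_ack` and `is_confirmed` are hypothetical names, and the sketch
assumes that the master's own write counts as one ack and that an anonymous
replica carries id 0. Only the condition inside `on_ack` mirrors the patched
check in tx_status_update().

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical, simplified stand-in for a relay's view of its peer.
struct Replica {
    uint32_t id;  // replica id (assumed to be 0 for an anonymous replica)
    bool anon;    // true if the peer is an anonymous replica
};

// Hypothetical model of one pending synchronous transaction.
class SyncTxn {
public:
    explicit SyncTxn(int quorum) : quorum_(quorum) { assert(quorum > 0); }

    // Mirrors the patched condition: an ack is counted only when this
    // instance owns the limbo AND the sender is not anonymous.
    void on_ack(const Replica &from, bool is_limbo_owner) {
        if (is_limbo_owner && !from.anon)
            acked_.insert(from.id);
    }

    // The transaction may be confirmed once enough distinct
    // non-anonymous instances have acknowledged it.
    bool is_confirmed() const {
        return static_cast<int>(acked_.size()) >= quorum_;
    }

private:
    int quorum_;
    std::set<uint32_t> acked_;
};
```

With a quorum of 2, acks from the master and from an anonymous replica leave
the transaction unconfirmed, which is why the insert in the test is expected
to fail with a quorum timeout. An ack from a second registered replica would
confirm it.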

>> +++ b/test/replication-luatest/gh_5418_qsync_with_anon_test.lua
>> @@ -0,0 +1,62 @@
>> +local t = require('luatest')
>> +local cluster = require('test.luatest_helpers.cluster')
>> +local helpers = require('test.luatest_helpers')
>> +
>> +local g = t.group('gh-5418', {{engine = 'memtx'}, {engine = 'vinyl'}})
>> +
>> +g.before_each(function(cg)
>> +    local engine = cg.params.engine
>> +
>> +    cg.cluster = cluster:new({})
>> +
>> +    local box_cfg = {
>> +        replication         = {
>> +            helpers.instance_uri('master')
>> +        },
>> +        replication_synchro_quorum = 2,
>> +        replication_timeout = 1
>> +    }
>> +
>> +    cg.master = cg.cluster:build_server({alias = 'master', engine = engine, box_cfg = box_cfg})
>> +
>> +    local box_cfg = {
>> +        replication         = {
>> +            helpers.instance_uri('master'),
>> +            helpers.instance_uri('replica')
>> +        },
>> +        replication_timeout = 1,
>> +        replication_connect_timeout = 4,
>> +        read_only           = true,
>> +        replication_anon    = true
>> +    }
>> +
>> +    cg.replica = cg.cluster:build_server({alias = 'replica', engine = engine, box_cfg = box_cfg})
>> +
>> +    cg.cluster:add_server(cg.master)
>> +    cg.cluster:add_server(cg.replica)
>> +    cg.cluster:start()
>> +end)
>> +
>> +
>> +g.after_each(function(cg)
>> +    cg.cluster.servers = nil
>> +    cg.cluster:drop()
>> +end)
>> +
>> +
>> +g.test_qsync_with_anon = function(cg)
>> +    cg.master:eval("box.schema.space.create('sync', {is_sync = true})")
>> +    cg.master:eval("box.space.sync:create_index('pk')")
>> +    cg.master:eval("box.ctl.promote()")
>> +
>> +    t.assert_error_msg_content_equals("Quorum collection for a synchronous transaction is timed out",
>> +        function() cg.master:eval("return box.space.sync:insert{1}") end)
>> +
>> +    -- Wait until everything is replicated from the master to the replica
>> +    local vclock = cg.master:eval("return box.info.vclock")
>> +    vclock[0] = nil
>> +    helpers:wait_vclock(cg.replica, vclock)
>> +

By this point the insert() is replicated from the master. I wonder if ROLLBACK
will be delivered to the replica by the time of select()?

>> +    t.assert_equals(cg.master:eval("return box.space.sync:select()"), {})
>> +    t.assert_equals(cg.replica:eval("return box.space.sync:select()"), {})
>> +end



* Re: [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum
  2021-10-25  9:52 [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum Yan Shtunder via Tarantool-patches
  2021-10-25 13:32 ` Serge Petrenko via Tarantool-patches
@ 2021-11-29 15:17 ` Kirill Yukhin via Tarantool-patches
  1 sibling, 0 replies; 6+ messages in thread
From: Kirill Yukhin via Tarantool-patches @ 2021-11-29 15:17 UTC (permalink / raw)
  To: Yan Shtunder; +Cc: tarantool-patches

Hello,

On 25 Oct 12:52, Yan Shtunder via Tarantool-patches wrote:
> Transactions have to committed after they reaches quorum of "real"
> cluster members. Therefore, anonymous replicas don't have to
> participate in the quorum.
> 
> Closes #5418
> ---
> Issue: https://github.com/tarantool/tarantool/issues/5418
> Patch: https://github.com/tarantool/tarantool/tree/yshtunder/gh-5418-qsync-with-anon-replicas

I've checked your patch into master.

--
Regards, Kirill Yukhin


Thread overview: 6+ messages
2021-10-25  9:52 [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum Yan Shtunder via Tarantool-patches
2021-10-25 13:32 ` Serge Petrenko via Tarantool-patches
     [not found]   ` <CAP94r39r6HMBxDDShO5qTYVBPz9kLVgRvaSBq8n6F+BUn1m4xw@mail.gmail.com>
2021-10-29  8:06     ` Serge Petrenko via Tarantool-patches
     [not found]       ` <CAP94r3_CkdY5QFJ543XsL-wGU+m3K0CBXaOpznL72jpzgXWGEQ@mail.gmail.com>
2021-11-03 15:01         ` Serge Petrenko via Tarantool-patches
2021-11-11 15:04           ` sergos via Tarantool-patches
2021-11-29 15:17 ` Kirill Yukhin via Tarantool-patches
