* [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum
@ 2021-10-25  9:52 Yan Shtunder via Tarantool-patches
  2021-10-25 13:32 ` Serge Petrenko via Tarantool-patches
  2021-11-29 15:17 ` Kirill Yukhin via Tarantool-patches
  0 siblings, 2 replies; 6+ messages in thread
From: Yan Shtunder via Tarantool-patches @ 2021-10-25  9:52 UTC (permalink / raw)
To: tarantool-patches; +Cc: Yan Shtunder

Transactions have to committed after they reaches quorum of "real"
cluster members. Therefore, anonymous replicas don't have to
participate in the quorum.

Closes #5418
---
Issue: https://github.com/tarantool/tarantool/issues/5418
Patch: https://github.com/tarantool/tarantool/tree/yshtunder/gh-5418-qsync-with-anon-replicas

 src/box/relay.cc                          |  3 +-
 test/replication-luatest/gh_5418_test.lua | 82 +++++++++++++++++++++++
 2 files changed, 84 insertions(+), 1 deletion(-)
 create mode 100644 test/replication-luatest/gh_5418_test.lua

diff --git a/src/box/relay.cc b/src/box/relay.cc
index f5852df7b..cf569e8e2 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -543,6 +543,7 @@ tx_status_update(struct cmsg *msg)
 	struct replication_ack ack;
 	ack.source = status->relay->replica->id;
 	ack.vclock = &status->vclock;
+	bool anon = status->relay->replica->anon;
 	/*
 	 * Let pending synchronous transactions know, which of
 	 * them were successfully sent to the replica. Acks are
@@ -550,7 +551,7 @@ tx_status_update(struct cmsg *msg)
 	 * the single master in 100% so far). Other instances wait
 	 * for master's CONFIRM message instead.
 	 */
-	if (txn_limbo.owner_id == instance_id) {
+	if (txn_limbo.owner_id == instance_id && !anon) {
 		txn_limbo_ack(&txn_limbo, ack.source,
 			      vclock_get(ack.vclock, instance_id));
 	}

diff --git a/test/replication-luatest/gh_5418_test.lua b/test/replication-luatest/gh_5418_test.lua
new file mode 100644
index 000000000..265d28ccb
--- /dev/null
+++ b/test/replication-luatest/gh_5418_test.lua
@@ -0,0 +1,82 @@
+local fio = require('fio')
+local log = require('log')
+local fiber = require('fiber')
+local t = require('luatest')
+local cluster = require('test.luatest_helpers.cluster')
+local helpers = require('test.luatest_helpers.helpers')
+
+local g = t.group('gh-5418')
+
+g.before_test('test_qsync_with_anon', function()
+    g.cluster = cluster:new({})
+
+    local box_cfg = {
+        replication = {helpers.instance_uri('master')},
+        replication_synchro_quorum = 2,
+        replication_timeout = 0.1
+    }
+
+    g.master = g.cluster:build_server({alias = 'master'}, engine, box_cfg)
+
+    local box_cfg = {
+        replication = {
+            helpers.instance_uri('master'),
+            helpers.instance_uri('replica')
+        },
+        replication_timeout = 0.1,
+        replication_connect_timeout = 0.5,
+        read_only = true,
+        replication_anon = true
+    }
+
+    g.replica = g.cluster:build_server({alias = 'replica'}, engine, box_cfg)
+
+    g.cluster:join_server(g.master)
+    g.cluster:join_server(g.replica)
+    g.cluster:start()
+    log.info('Everything is started')
+end)
+
+g.after_test('test_qsync_with_anon', function()
+    g.cluster:stop()
+    fio.rmtree(g.master.workdir)
+    fio.rmtree(g.replica.workdir)
+end)
+
+local function wait_vclock(timeout)
+    local started_at = fiber.clock()
+    local lsn = g.master:eval("return box.info.vclock[1]")
+
+    local _, tbl = g.master:eval("return next(box.info.replication_anon())")
+    local to_lsn = tbl.downstream.vclock[1]
+
+    while to_lsn == nil or to_lsn < lsn do
+        fiber.sleep(0.001)
+
+        if (fiber.clock() - started_at) > timeout then
+            return false
+        end
+
+        _, tbl = g.master:eval("return next(box.info.replication_anon())")
+        to_lsn = tbl.downstream.vclock[1]
+
+        log.info(string.format("master lsn: %d; replica_anon lsn: %d",
+                               lsn, to_lsn))
+    end
+
+    return true
+end
+
+g.test_qsync_with_anon = function()
+    g.master:eval("box.schema.space.create('sync', {is_sync = true})")
+    g.master:eval("box.space.sync:create_index('pk')")
+
+    t.assert_error_msg_content_equals("Quorum collection for a synchronous transaction is timed out",
+        function() g.master:eval("return box.space.sync:insert{1}") end)
+
+    -- Wait until everything is replicated from the master to the replica
+    t.assert(wait_vclock(1))
+
+    t.assert_equals(g.master:eval("return box.space.sync:select()"), {})
+    t.assert_equals(g.replica:eval("return box.space.sync:select()"), {})
+end
--
2.25.1

^ permalink raw reply	[flat|nested] 6+ messages in thread
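[Editor's note] The gist of the relay.cc change above can be modeled in a few lines of plain Lua (a toy sketch, not Tarantool's actual limbo code — the table shapes and function name here are invented for illustration): an ack counts toward `replication_synchro_quorum` only when it comes from a registered, non-anonymous replica.

```lua
-- Toy model of the quorum check the patch fixes: acks from anonymous
-- replicas must not count toward replication_synchro_quorum.
local function quorum_reached(acks, quorum)
    local n = 1  -- the master itself always confirms its own write
    for _, replica in ipairs(acks) do
        if not replica.anon then
            n = n + 1
        end
    end
    return n >= quorum
end

-- With quorum = 2, an ack from an anonymous replica alone is not enough,
-- while an ack from a regular replica is.
assert(not quorum_reached({{anon = true}}, 2))
assert(quorum_reached({{anon = false}}, 2))
```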
* Re: [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum
  2021-10-25  9:52 [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum Yan Shtunder via Tarantool-patches
@ 2021-10-25 13:32 ` Serge Petrenko via Tarantool-patches
  [not found]       ` <CAP94r39r6HMBxDDShO5qTYVBPz9kLVgRvaSBq8n6F+BUn1m4xw@mail.gmail.com>
  2021-11-29 15:17 ` Kirill Yukhin via Tarantool-patches
  1 sibling, 1 reply; 6+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-10-25 13:32 UTC (permalink / raw)
To: Yan Shtunder, tarantool-patches

25.10.2021 12:52, Yan Shtunder via Tarantool-patches writes:

Hi! Good job on porting the test to the current luatest version!
Please, find a couple of comments below.

> Transactions have to committed after they reaches quorum of "real"

Nit: better say "Transactions should be committed".
reaches -> reach.

> cluster members. Therefore, anonymous replicas don't have to
> participate in the quorum.
>
> Closes #5418
> ---
> Issue: https://github.com/tarantool/tarantool/issues/5418
> Patch: https://github.com/tarantool/tarantool/tree/yshtunder/gh-5418-qsync-with-anon-replicas

[... diffstat and relay.cc hunk quoted in full, trimmed here ...]

> -	if (txn_limbo.owner_id == instance_id) {
> +	if (txn_limbo.owner_id == instance_id && !anon) {
> 		txn_limbo_ack(&txn_limbo, ack.source,
> 			      vclock_get(ack.vclock, instance_id));
> 	}

I can't build your patch to test it manually, compilation fails with
some ERRINJ-related errors.

Seems like the commit "replication: fill replicaset.applier.vclock after
local recovery" you have on the branch is extraneous. And it causes the
error.

Please remove it.

> diff --git a/test/replication-luatest/gh_5418_test.lua b/test/replication-luatest/gh_5418_test.lua
> new file mode 100644
> index 000000000..265d28ccb
> --- /dev/null
> +++ b/test/replication-luatest/gh_5418_test.lua

Please, find a more informative test name.
For example, "gh_5418_qsync_with_anon_test.lua"

[... test body quoted in full, trimmed here ...]

> +    -- Wait until everything is replicated from the master to the replica
> +    t.assert(wait_vclock(1))

Please, use `t.helpers.retrying()` here.
It receives a timeout and a function to call.
Like `t.helpers.retrying({timeout=5}, wait_vclock)`.
And wait_vclock should simply return true or false based on
whether the replica has reached master's vclock.

Also, please choose a bigger timeout. Like 5 or 10 seconds.
Otherwise the test will be flaky on slow testing machines in our CI.

> +
> +    t.assert_equals(g.master:eval("return box.space.sync:select()"), {})
> +    t.assert_equals(g.replica:eval("return box.space.sync:select()"), {})
> +end
> --
> 2.25.1

--
Serge Petrenko

^ permalink raw reply	[flat|nested] 6+ messages in thread
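[Editor's note] The `t.helpers.retrying()` pattern suggested in the review can be sketched in plain Lua. The semantics below are an assumption based on the reviewer's description (re-run the callback, swallowing its errors, until it succeeds or the timeout expires); luatest's real helper may differ in details, and the "replica caught up" condition is stubbed with a counter.

```lua
-- Sketch of assumed t.helpers.retrying({timeout = <s>}, fn) semantics:
-- keep calling fn until it stops raising, or fail after the timeout.
local clock = os.clock

local function retrying(opts, fn, ...)
    local started = clock()
    while true do
        local ok, err = pcall(fn, ...)
        if ok then
            return
        end
        if clock() - started > opts.timeout then
            error(err)
        end
    end
end

-- Toy condition standing in for "replica vclock reached master's vclock":
-- it starts failing and succeeds from the third call on, so retrying()
-- must invoke the callback several times before returning.
local attempts = 0
retrying({timeout = 5}, function()
    attempts = attempts + 1
    assert(attempts >= 3, "replica not caught up yet")
end)
assert(attempts >= 3)
```

With this shape, the test's wait helper no longer needs its own sleep loop: it just asserts the condition and lets the retry wrapper handle pacing and the deadline.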
[parent not found: <CAP94r39r6HMBxDDShO5qTYVBPz9kLVgRvaSBq8n6F+BUn1m4xw@mail.gmail.com>]
* Re: [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum
  [not found] ` <CAP94r39r6HMBxDDShO5qTYVBPz9kLVgRvaSBq8n6F+BUn1m4xw@mail.gmail.com>
@ 2021-10-29  8:06   ` Serge Petrenko via Tarantool-patches
  [not found]     ` <CAP94r3_CkdY5QFJ543XsL-wGU+m3K0CBXaOpznL72jpzgXWGEQ@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-10-29  8:06 UTC (permalink / raw)
To: Yan Shtunder, tml

28.10.2021 18:56, Yan Shtunder writes:
> Hi! Thank you for the review!
> I have fixed the errors
>
>     Nit: better say "Transactions should be committed".
>     reaches -> reach.
>
> Transactions should be committed after they reach quorum of "real"
> cluster members.
>
>     Please, find a more informative test name.
>     For example, "gh_5418_qsync_with_anon_test.lua"
>
> gh_5418_test.lua -> gh_5418_qsync_with_anon_test.lua
>
>     Please, use `t.helpers.retrying()` here.
>
> I used the wait_vclock function from the luatest_helpers.lua file
>
> --
> Yan Shtunder

Good job on the fixes!
LGTM.

> Mon, Oct 25, 2021 at 16:32, Serge Petrenko <sergepetrenko@tarantool.org>:
>
[... the previous review, quoted in full, trimmed here ...]

--
Serge Petrenko

^ permalink raw reply	[flat|nested] 6+ messages in thread
[parent not found: <CAP94r3_CkdY5QFJ543XsL-wGU+m3K0CBXaOpznL72jpzgXWGEQ@mail.gmail.com>]
* Re: [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum [not found] ` <CAP94r3_CkdY5QFJ543XsL-wGU+m3K0CBXaOpznL72jpzgXWGEQ@mail.gmail.com> @ 2021-11-03 15:01 ` Serge Petrenko via Tarantool-patches 2021-11-11 15:04 ` sergos via Tarantool-patches 0 siblings, 1 reply; 6+ messages in thread From: Serge Petrenko via Tarantool-patches @ 2021-11-03 15:01 UTC (permalink / raw) To: Ян Штундер, tml 03.11.2021 13:05, Ян Штундер пишет: > Thanks for the comments! > I corrected your remarks. Thanks for the changes! LGTM. > > Diff: > > +++ b/test/replication-luatest/gh_5418_qsync_with_anon_test.lua > @@ -0,0 +1,62 @@ > +local t = require('luatest') > +local cluster = require('test.luatest_helpers.cluster') > +local helpers = require('test.luatest_helpers') > + > +local g = t.group('gh-5418', {{engine = 'memtx'}, {engine = 'vinyl'}}) > + > +g.before_each(function(cg) > + local engine = cg.params.engine > + > + cg.cluster = cluster:new({}) > + > + local box_cfg = { > + replication = { > + helpers.instance_uri('master') > + }, > + replication_synchro_quorum = 2, > + replication_timeout = 1 > + } > + > + cg.master = cg.cluster:build_server({alias = 'master', engine = > engine, box_cfg = box_cfg}) > + > + local box_cfg = { > + replication = { > + helpers.instance_uri('master'), > + helpers.instance_uri('replica') > + }, > + replication_timeout = 1, > + replication_connect_timeout = 4, > + read_only = true, > + replication_anon = true > + } > + > + cg.replica = cg.cluster:build_server({alias = 'replica', engine = > engine, box_cfg = box_cfg}) > + > + cg.cluster:add_server(cg.master) > + cg.cluster:add_server(cg.replica) > + cg.cluster:start() > +end) > + > + > +g.after_each(function(cg) > + cg.cluster.servers = nil > + cg.cluster:drop() > +end) > + > + > +g.test_qsync_with_anon = function(cg) > + cg.master:eval("box.schema.space.create('sync', {is_sync = true})") > + cg.master:eval("box.space.sync:create_index('pk')") > + 
cg.master:eval("box.ctl.promote()") > + > + t.assert_error_msg_content_equals("Quorum collection for a > synchronous transaction is timed out", > + function() cg.master:eval("return box.space.sync:insert{1}") end) > + > + -- Wait until everything is replicated from the master to the replica > + local vclock = cg.master:eval("return box.info.vclock") > + vclock[0] = nil > + helpers:wait_vclock(cg.replica, vclock) > + > + t.assert_equals(cg.master:eval("return box.space.sync:select()"), {}) > + t.assert_equals(cg.replica:eval("return > box.space.sync:select()"), {}) > +end > diff --git a/test/replication/qsync_with_anon.result > b/test/replication/qsync_with_anon.result > deleted file mode 100644 > index 99c6fb902..000000000 > --- a/test/replication/qsync_with_anon.result > +++ /dev/null > @@ -1,231 +0,0 @@ > --- test-run result file version 2 > -env = require('test_run') > - | --- > - | ... > -test_run = env.new() > - | --- > - | ... > -engine = test_run:get_cfg('engine') > - | --- > - | ... > - > -orig_synchro_quorum = box.cfg.replication_synchro_quorum > - | --- > - | ... > -orig_synchro_timeout = box.cfg.replication_synchro_timeout > - | --- > - | ... > - > -NUM_INSTANCES = 2 > - | --- > - | ... > -BROKEN_QUORUM = NUM_INSTANCES + 1 > - | --- > - | ... > - > -box.schema.user.grant('guest', 'replication') > - | --- > - | ... > - > --- Setup a cluster with anonymous replica. > -test_run:cmd('create server replica_anon with rpl_master=default, > script="replication/anon1.lua"') > - | --- > - | - true > - | ... > -test_run:cmd('start server replica_anon') > - | --- > - | - true > - | ... > -test_run:cmd('switch replica_anon') > - | --- > - | - true > - | ... > - > --- [RFC, Asynchronous replication] successful transaction applied on > async > --- replica. > --- Testcase setup. > -test_run:switch('default') > - | --- > - | - true > - | ... > -box.cfg{replication_synchro_quorum=NUM_INSTANCES, > replication_synchro_timeout=1000} > - | --- > - | ... 
> -_ = box.schema.space.create('sync', {is_sync=true, engine=engine}) > - | --- > - | ... > -_ = box.space.sync:create_index('pk') > - | --- > - | ... > -box.ctl.promote() > - | --- > - | ... > --- Testcase body. > -test_run:switch('default') > - | --- > - | - true > - | ... > -box.space.sync:insert{1} -- success > - | --- > - | - [1] > - | ... > -box.space.sync:insert{2} -- success > - | --- > - | - [2] > - | ... > -box.space.sync:insert{3} -- success > - | --- > - | - [3] > - | ... > -test_run:cmd('switch replica_anon') > - | --- > - | - true > - | ... > -box.space.sync:select{} -- 1, 2, 3 > - | --- > - | - - [1] > - | - [2] > - | - [3] > - | ... > --- Testcase cleanup. > -test_run:switch('default') > - | --- > - | - true > - | ... > -box.space.sync:drop() > - | --- > - | ... > - > --- [RFC, Asynchronous replication] failed transaction rolled back on > async > --- replica. > --- Testcase setup. > -box.cfg{replication_synchro_quorum = NUM_INSTANCES, > replication_synchro_timeout = 1000} > - | --- > - | ... > -_ = box.schema.space.create('sync', {is_sync=true, engine=engine}) > - | --- > - | ... > -_ = box.space.sync:create_index('pk') > - | --- > - | ... > --- Write something to flush the current master's state to replica. > -_ = box.space.sync:insert{1} > - | --- > - | ... > -_ = box.space.sync:delete{1} > - | --- > - | ... > - > -box.cfg{replication_synchro_quorum = BROKEN_QUORUM, > replication_synchro_timeout = 1000} > - | --- > - | ... > -fiber = require('fiber') > - | --- > - | ... > -ok, err = nil > - | --- > - | ... > -f = fiber.create(function() \ > - ok, err = pcall(box.space.sync.insert, box.space.sync, {1}) > \ > -end) > - | --- > - | ... > - > -test_run:cmd('switch replica_anon') > - | --- > - | - true > - | ... > -test_run:wait_cond(function() return box.space.sync:count() == 1 end) > - | --- > - | - true > - | ... > -box.space.sync:select{} > - | --- > - | - - [1] > - | ... > - > -test_run:switch('default') > - | --- > - | - true > - | ... 
> -box.cfg{replication_synchro_timeout = 0.001} > - | --- > - | ... > -test_run:wait_cond(function() return f:status() == 'dead' end) > - | --- > - | - true > - | ... > -box.space.sync:select{} > - | --- > - | - [] > - | ... > - > -test_run:cmd('switch replica_anon') > - | --- > - | - true > - | ... > -test_run:wait_cond(function() return box.space.sync:count() == 0 end) > - | --- > - | - true > - | ... > -box.space.sync:select{} > - | --- > - | - [] > - | ... > - > -test_run:switch('default') > - | --- > - | - true > - | ... > -box.cfg{replication_synchro_quorum=NUM_INSTANCES, > replication_synchro_timeout=1000} > - | --- > - | ... > -box.space.sync:insert{1} -- success > - | --- > - | - [1] > - | ... > -test_run:cmd('switch replica_anon') > - | --- > - | - true > - | ... > -box.space.sync:select{} -- 1 > - | --- > - | - - [1] > - | ... > --- Testcase cleanup. > -test_run:switch('default') > - | --- > - | - true > - | ... > -box.space.sync:drop() > - | --- > - | ... > - > --- Teardown. > -test_run:switch('default') > - | --- > - | - true > - | ... > -test_run:cmd('stop server replica_anon') > - | --- > - | - true > - | ... > -test_run:cmd('delete server replica_anon') > - | --- > - | - true > - | ... > -box.schema.user.revoke('guest', 'replication') > - | --- > - | ... > -box.cfg{ \ > - replication_synchro_quorum = orig_synchro_quorum, \ > - replication_synchro_timeout = orig_synchro_timeout, \ > -} > - | --- > - | ... > -box.ctl.demote() > - | --- > - | ... > -test_run:cleanup_cluster() > - | --- > - | ... 
> diff --git a/test/replication/qsync_with_anon.test.lua > b/test/replication/qsync_with_anon.test.lua > deleted file mode 100644 > index e73880ec7..000000000 > --- a/test/replication/qsync_with_anon.test.lua > +++ /dev/null > @@ -1,86 +0,0 @@ > -env = require('test_run') > -test_run = env.new() > -engine = test_run:get_cfg('engine') > - > -orig_synchro_quorum = box.cfg.replication_synchro_quorum > -orig_synchro_timeout = box.cfg.replication_synchro_timeout > - > -NUM_INSTANCES = 2 > -BROKEN_QUORUM = NUM_INSTANCES + 1 > - > -box.schema.user.grant('guest', 'replication') > - > --- Setup a cluster with anonymous replica. > -test_run:cmd('create server replica_anon with rpl_master=default, > script="replication/anon1.lua"') > -test_run:cmd('start server replica_anon') > -test_run:cmd('switch replica_anon') > - > --- [RFC, Asynchronous replication] successful transaction applied on > async > --- replica. > --- Testcase setup. > -test_run:switch('default') > -box.cfg{replication_synchro_quorum=NUM_INSTANCES, > replication_synchro_timeout=1000} > -_ = box.schema.space.create('sync', {is_sync=true, engine=engine}) > -_ = box.space.sync:create_index('pk') > -box.ctl.promote() > --- Testcase body. > -test_run:switch('default') > -box.space.sync:insert{1} -- success > -box.space.sync:insert{2} -- success > -box.space.sync:insert{3} -- success > -test_run:cmd('switch replica_anon') > -box.space.sync:select{} -- 1, 2, 3 > --- Testcase cleanup. > -test_run:switch('default') > -box.space.sync:drop() > - > --- [RFC, Asynchronous replication] failed transaction rolled back on > async > --- replica. > --- Testcase setup. > -box.cfg{replication_synchro_quorum = NUM_INSTANCES, > replication_synchro_timeout = 1000} > -_ = box.schema.space.create('sync', {is_sync=true, engine=engine}) > -_ = box.space.sync:create_index('pk') > --- Write something to flush the current master's state to replica. 
> -_ = box.space.sync:insert{1} > -_ = box.space.sync:delete{1} > - > -box.cfg{replication_synchro_quorum = BROKEN_QUORUM, > replication_synchro_timeout = 1000} > -fiber = require('fiber') > -ok, err = nil > -f = fiber.create(function() \ > - ok, err = pcall(box.space.sync.insert, box.space.sync, {1}) > \ > -end) > - > -test_run:cmd('switch replica_anon') > -test_run:wait_cond(function() return box.space.sync:count() == 1 end) > -box.space.sync:select{} > - > -test_run:switch('default') > -box.cfg{replication_synchro_timeout = 0.001} > -test_run:wait_cond(function() return f:status() == 'dead' end) > -box.space.sync:select{} > - > -test_run:cmd('switch replica_anon') > -test_run:wait_cond(function() return box.space.sync:count() == 0 end) > -box.space.sync:select{} > - > -test_run:switch('default') > -box.cfg{replication_synchro_quorum=NUM_INSTANCES, > replication_synchro_timeout=1000} > -box.space.sync:insert{1} -- success > -test_run:cmd('switch replica_anon') > -box.space.sync:select{} -- 1 > --- Testcase cleanup. > -test_run:switch('default') > -box.space.sync:drop() > - > --- Teardown. > -test_run:switch('default') > -test_run:cmd('stop server replica_anon') > -test_run:cmd('delete server replica_anon') > -box.schema.user.revoke('guest', 'replication') > -box.cfg{ \ > - replication_synchro_quorum = orig_synchro_quorum, \ > - replication_synchro_timeout = orig_synchro_timeout, \ > -} > -box.ctl.demote() > -test_run:cleanup_cluster() > > -- > Yan Shtunder > > пт, 29 окт. 2021 г. в 11:06, Serge Petrenko <sergepetrenko@tarantool.org>: > > > > 28.10.2021 18:56, Ян Штундер пишет: > > Hi! Thank you for the review! > > I have fixed the errors > > > > Nit: better say "Transactions should be committed". > > reaches -> reach. > > > > > > Transactions should be committed after they reach quorum of "real" > > cluster members. > > > > Please, find a more informative test name. 
> > For example, "gh_5418_qsync_with_anon_test.lua* > > > > > > gh_5418_test.lua -> gh_5418_qsync_with_anon_test.lua > > > > Please, use `t.helpers.retrying()` here. > > > > > > I used the wait_vclock function from the luatest_helpers.lua file > > > > -- > > Yan Shtunder > > Good job on the fixes! > LGTM. > > > > > > пн, 25 окт. 2021 г. в 16:32, Serge Petrenko > <sergepetrenko@tarantool.org>: > > > > > > > > 25.10.2021 12:52, Yan Shtunder via Tarantool-patches пишет: > > > > Hi! Good job on porting the test to the current luatest version! > > Please, find a couple of comments below. > > > > > Transactions have to committed after they reaches quorum > of "real" > > > > Nit: better say "Transactions should be committed". > > reaches -> reach. > > > > > cluster members. Therefore, anonymous replicas don't have to > > > participate in the quorum. > > > > > > Closes #5418 > > > --- > > > Issue: https://github.com/tarantool/tarantool/issues/5418 > > > Patch: > > > https://github.com/tarantool/tarantool/tree/yshtunder/gh-5418-qsync-with-anon-replicas > > > > > > src/box/relay.cc | 3 +- > > > test/replication-luatest/gh_5418_test.lua | 82 > > +++++++++++++++++++++++ > > > 2 files changed, 84 insertions(+), 1 deletion(-) > > > create mode 100644 test/replication-luatest/gh_5418_test.lua > > > > > > diff --git a/src/box/relay.cc b/src/box/relay.cc > > > index f5852df7b..cf569e8e2 100644 > > > --- a/src/box/relay.cc > > > +++ b/src/box/relay.cc > > > @@ -543,6 +543,7 @@ tx_status_update(struct cmsg *msg) > > > struct replication_ack ack; > > > ack.source = status->relay->replica->id; > > > ack.vclock = &status->vclock; > > > + bool anon = status->relay->replica->anon; > > > /* > > > * Let pending synchronous transactions know, which of > > > * them were successfully sent to the replica. Acks are > > > @@ -550,7 +551,7 @@ tx_status_update(struct cmsg *msg) > > > * the single master in 100% so far). Other > instances wait > > > * for master's CONFIRM message instead. 
> > > */ > > > - if (txn_limbo.owner_id == instance_id) { > > > + if (txn_limbo.owner_id == instance_id && !anon) { > > > txn_limbo_ack(&txn_limbo, ack.source, > > > vclock_get(ack.vclock, instance_id)); > > > } > > > > I can't build your patch to test it manually, compilation > fails with > > some ERRINJ-related errors. > > > > Seems like the commit "replication: fill > replicaset.applier.vclock > > after > > local recovery" > > you have on the branch is extraneous. And it causes the error. > > > > Please remove it. > > > > > diff --git a/test/replication-luatest/gh_5418_test.lua > > b/test/replication-luatest/gh_5418_test.lua > > > new file mode 100644 > > > index 000000000..265d28ccb > > > --- /dev/null > > > +++ b/test/replication-luatest/gh_5418_test.lua > > > > Please, find a more informative test name. > > For example, "gh_5418_qsync_with_anon_test.lua* > > > > > @@ -0,0 +1,82 @@ > > > +local fio = require('fio') > > > +local log = require('log') > > > +local fiber = require('fiber') > > > +local t = require('luatest') > > > +local cluster = require('test.luatest_helpers.cluster') > > > +local helpers = require('test.luatest_helpers.helpers') > > > + > > > +local g = t.group('gh-5418') > > > + > > > +g.before_test('test_qsync_with_anon', function() > > > + g.cluster = cluster:new({}) > > > + > > > + local box_cfg = { > > > + replication = > {helpers.instance_uri('master')}, > > > + replication_synchro_quorum = 2, > > > + replication_timeout = 0.1 > > > + } > > > + > > > + g.master = g.cluster:build_server({alias = 'master'}, > > engine, box_cfg) > > > + > > > + local box_cfg = { > > > + replication = { > > > + helpers.instance_uri('master'), > > > + helpers.instance_uri('replica') > > > + }, > > > + replication_timeout = 0.1, > > > + replication_connect_timeout = 0.5, > > > + read_only = true, > > > + replication_anon = true > > > + } > > > + > > > + g.replica = g.cluster:build_server({alias = 'replica'}, > > engine, box_cfg) > > > + > > > + 
> > > +    g.cluster:join_server(g.master)
> > > +    g.cluster:join_server(g.replica)
> > > +    g.cluster:start()
> > > +    log.info('Everything is started')
> > > +end)
> > > +
> > > +g.after_test('test_qsync_with_anon', function()
> > > +    g.cluster:stop()
> > > +    fio.rmtree(g.master.workdir)
> > > +    fio.rmtree(g.replica.workdir)
> > > +end)
> > > +
> > > +local function wait_vclock(timeout)
> > > +    local started_at = fiber.clock()
> > > +    local lsn = g.master:eval("return box.info.vclock[1]")
> > > +
> > > +    local _, tbl = g.master:eval("return next(box.info.replication_anon())")
> > > +    local to_lsn = tbl.downstream.vclock[1]
> > > +
> > > +    while to_lsn == nil or to_lsn < lsn do
> > > +        fiber.sleep(0.001)
> > > +
> > > +        if (fiber.clock() - started_at) > timeout then
> > > +            return false
> > > +        end
> > > +
> > > +        _, tbl = g.master:eval("return next(box.info.replication_anon())")
> > > +        to_lsn = tbl.downstream.vclock[1]
> > > +
> > > +        log.info(string.format("master lsn: %d; replica_anon lsn: %d",
> > > +                               lsn, to_lsn))
> > > +    end
> > > +
> > > +    return true
> > > +end
> > > +
> > > +g.test_qsync_with_anon = function()
> > > +    g.master:eval("box.schema.space.create('sync', {is_sync = true})")
> > > +    g.master:eval("box.space.sync:create_index('pk')")
> > > +
> > > +    t.assert_error_msg_content_equals("Quorum collection for a synchronous transaction is timed out",
> > > +        function() g.master:eval("return box.space.sync:insert{1}") end)
> > > +
> > > +    -- Wait until everything is replicated from the master to the replica
> > > +    t.assert(wait_vclock(1))
> >
> > Please, use `t.helpers.retrying()` here.
> > It receives a timeout and a function to call.
> > Like `t.helpers.retrying({timeout=5}, wait_vclock)`
> > And wait_vclock should simply return true or false based on
> > whether the replica has reached master's vclock.
> >
> > Also, please choose a bigger timeout.
> > Like 5 or 10 seconds.
> > Otherwise the test will be flaky on slow testing machines in our CI.
> >
> > > +
> > > +    t.assert_equals(g.master:eval("return box.space.sync:select()"), {})
> > > +    t.assert_equals(g.replica:eval("return box.space.sync:select()"), {})
> > > +end
> > > --
> > > 2.25.1
> > >
> >
> > --
> > Serge Petrenko
> >
>
> --
> Serge Petrenko
>

--
Serge Petrenko

^ permalink raw reply	[flat|nested] 6+ messages in thread
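[Editor's note] Serge's suggestion above boils down to replacing the hand-rolled polling loop with a generic retry helper. Below is a minimal pure-Lua sketch of that pattern; the name `retrying` and the `{timeout = ...}` option mirror luatest's `t.helpers.retrying()`, but this standalone version is only an illustration, not the library's actual implementation.

```lua
-- Minimal retry helper in the spirit of t.helpers.retrying():
-- call `fn` repeatedly until it returns a truthy value or the
-- timeout (in seconds) expires. Returns true on success, false
-- on timeout.
local function retrying(opts, fn, ...)
    local timeout = opts.timeout or 5
    local delay = opts.delay or 0.001
    local deadline = os.clock() + timeout
    while true do
        if fn(...) then
            return true
        end
        if os.clock() > deadline then
            return false
        end
        -- Busy-wait between attempts; a real tarantool test would
        -- yield with fiber.sleep(delay) instead.
        local next_try = os.clock() + delay
        while os.clock() < next_try do end
    end
end

-- In the test from the thread, the call would then look like:
--   t.assert(retrying({timeout = 5}, wait_vclock))
```

The point of the reviewer's comment is that `wait_vclock` only has to report "reached / not reached"; the retry-with-deadline logic lives in one shared helper, with a timeout large enough (5-10 s) not to flake on slow CI machines.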
* Re: [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum
  2021-11-03 15:01 ` Serge Petrenko via Tarantool-patches
@ 2021-11-11 15:04   ` sergos via Tarantool-patches
  0 siblings, 0 replies; 6+ messages in thread
From: sergos via Tarantool-patches @ 2021-11-11 15:04 UTC (permalink / raw)
  To: Yan Shtunder; +Cc: tml

Hi!

Thanks for the patch!

Just a nit in the message and a question re the test below.

Sergos.

The changelog is from
https://github.com/tarantool/tarantool/commit/b9feb3853fbce389471ad3022307942aa92e8ea7

> Transactions should be committed after they reach quorum of "real"
> cluster members. Therefore, anonymous replicas don't have to

I would rephrase it as ‘replicas shouldn’t participate’

> participate in the quorum.
>
>> +++ b/test/replication-luatest/gh_5418_qsync_with_anon_test.lua
>> @@ -0,0 +1,62 @@
>> +local t = require('luatest')
>> +local cluster = require('test.luatest_helpers.cluster')
>> +local helpers = require('test.luatest_helpers')
>> +
>> +local g = t.group('gh-5418', {{engine = 'memtx'}, {engine = 'vinyl'}})
>> +
>> +g.before_each(function(cg)
>> +    local engine = cg.params.engine
>> +
>> +    cg.cluster = cluster:new({})
>> +
>> +    local box_cfg = {
>> +        replication = {
>> +            helpers.instance_uri('master')
>> +        },
>> +        replication_synchro_quorum = 2,
>> +        replication_timeout = 1
>> +    }
>> +
>> +    cg.master = cg.cluster:build_server({alias = 'master', engine = engine, box_cfg = box_cfg})
>> +
>> +    local box_cfg = {
>> +        replication = {
>> +            helpers.instance_uri('master'),
>> +            helpers.instance_uri('replica')
>> +        },
>> +        replication_timeout = 1,
>> +        replication_connect_timeout = 4,
>> +        read_only = true,
>> +        replication_anon = true
>> +    }
>> +
>> +    cg.replica = cg.cluster:build_server({alias = 'replica', engine = engine, box_cfg = box_cfg})
>> +
>> +    cg.cluster:add_server(cg.master)
>> +    cg.cluster:add_server(cg.replica)
>> +    cg.cluster:start()
>> +end)
>> +
>> +
>> +g.after_each(function(cg)
>> +    cg.cluster.servers = nil
>> +    cg.cluster:drop()
>> +end)
>> +
>> +
>> +g.test_qsync_with_anon = function(cg)
>> +    cg.master:eval("box.schema.space.create('sync', {is_sync = true})")
>> +    cg.master:eval("box.space.sync:create_index('pk')")
>> +    cg.master:eval("box.ctl.promote()")
>> +
>> +    t.assert_error_msg_content_equals("Quorum collection for a synchronous transaction is timed out",
>> +        function() cg.master:eval("return box.space.sync:insert{1}") end)
>> +
>> +    -- Wait until everything is replicated from the master to the replica
>> +    local vclock = cg.master:eval("return box.info.vclock")
>> +    vclock[0] = nil
>> +    helpers:wait_vclock(cg.replica, vclock)
>> +

By this point the insert() is replicated from the master. I wonder if
ROLLBACK will be delivered to the replica by the time of select()?

>> +    t.assert_equals(cg.master:eval("return box.space.sync:select()"), {})
>> +    t.assert_equals(cg.replica:eval("return box.space.sync:select()"), {})
>> +end

^ permalink raw reply	[flat|nested] 6+ messages in thread
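[Editor's note] The wait in the test above is a component-wise vclock comparison: the master's vclock is taken, component 0 (local, never-replicated writes) is dropped, and the test waits until the replica's vclock dominates it. Since the ROLLBACK record itself goes through the master's WAL and bumps its vclock, waiting for the full vclock should also cover the rollback sergos asks about. A standalone pure-Lua sketch of that comparison (the helper name `vclock_ge` is illustrative; the real `luatest_helpers` implementation may differ):

```lua
-- Return true when vclock `a` has seen everything in vclock `b`,
-- i.e. every component present in `b` is <= the matching component
-- of `a`. Missing components count as 0.
local function vclock_ge(a, b)
    for id, lsn in pairs(b) do
        if (a[id] or 0) < lsn then
            return false
        end
    end
    return true
end

-- Mirroring the test: drop component 0 (local-only writes) before
-- comparing, since the replica will never receive it.
local master_vclock = {[0] = 7, [1] = 42}  -- hypothetical values
master_vclock[0] = nil
```

`wait_vclock(replica, master_vclock)` would then just poll `box.info.vclock` on the replica until `vclock_ge(replica_vclock, master_vclock)` holds.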
* Re: [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum
  2021-10-25  9:52 [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum Yan Shtunder via Tarantool-patches
  2021-10-25 13:32 ` Serge Petrenko via Tarantool-patches
@ 2021-11-29 15:17 ` Kirill Yukhin via Tarantool-patches
  1 sibling, 0 replies; 6+ messages in thread
From: Kirill Yukhin via Tarantool-patches @ 2021-11-29 15:17 UTC (permalink / raw)
  To: Yan Shtunder; +Cc: tarantool-patches

Hello,

On Oct 25, 12:52, Yan Shtunder via Tarantool-patches wrote:
> Transactions have to committed after they reaches quorum of "real"
> cluster members. Therefore, anonymous replicas don't have to
> participate in the quorum.
>
> Closes #5418
> ---
> Issue: https://github.com/tarantool/tarantool/issues/5418
> Patch: https://github.com/tarantool/tarantool/tree/yshtunder/gh-5418-qsync-with-anon-replicas

I've checked your patch into master.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-11-29 15:17 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-25  9:52 [Tarantool-patches] [PATCH v3] replication: removing anonymous replicas from synchro quorum Yan Shtunder via Tarantool-patches
2021-10-25 13:32 ` Serge Petrenko via Tarantool-patches
     [not found] ` <CAP94r39r6HMBxDDShO5qTYVBPz9kLVgRvaSBq8n6F+BUn1m4xw@mail.gmail.com>
2021-10-29  8:06   ` Serge Petrenko via Tarantool-patches
     [not found]     ` <CAP94r3_CkdY5QFJ543XsL-wGU+m3K0CBXaOpznL72jpzgXWGEQ@mail.gmail.com>
2021-11-03 15:01       ` Serge Petrenko via Tarantool-patches
2021-11-11 15:04         ` sergos via Tarantool-patches
2021-11-29 15:17 ` Kirill Yukhin via Tarantool-patches
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox