From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko
To: Yan Shtunder
Cc: tarantool-patches@dev.tarantool.org, shemeneval@gmail.com
Date: Mon, 27 Sep 2021 09:20:20 +0300
Subject: Re: [Tarantool-patches] [PATCH] replication: removing anonymous replicas from synchro quorum
In-Reply-To: <20210921100003.16244-1-ya.shtunder@gmail.com>

21.09.2021 13:00, Yan Shtunder writes:
> Transactions have to be committed after they reach a quorum of "real"
> cluster members. Therefore, anonymous replicas don't have to
> participate in the quorum.
>
> Closes #5418

Hi! Thanks for the patch!

> ---
> Issue: https://github.com/tarantool/tarantool/issues/5418
> Patch: https://github.com/tarantool/tarantool/tree/yshtunder/gh-5418-qsync-with-anon-replicas

Please rebase the branch on top of the current master.

>
>  src/box/relay.cc                          |   3 +-
>  test/replication-luatest/gh_5418_test.lua |  81 ++++++++
>  .../instance_files/master.lua             |  19 ++
>  .../instance_files/replica.lua            |  22 +++
>  test/replication-luatest/suite.ini        |   3 +
>  test/replication/qsync_with_anon.result   | 187 ++---------------
>  test/replication/qsync_with_anon.test.lua |  83 ++------
>  7 files changed, 161 insertions(+), 237 deletions(-)
>  create mode 100644 test/replication-luatest/gh_5418_test.lua
>  create mode 100755 test/replication-luatest/instance_files/master.lua
>  create mode 100755 test/replication-luatest/instance_files/replica.lua
>  create mode 100644 test/replication-luatest/suite.ini
>
> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index 115037fc3..4d5f9a625 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -515,6 +515,7 @@ tx_status_update(struct cmsg *msg)
>      struct replication_ack ack;
>      ack.source = status->relay->replica->id;
>      ack.vclock = &status->vclock;
> +    bool anon = status->relay->replica->anon;
>      /*
>       * Let pending synchronous transactions know, which of
>       * them were successfully sent to the replica. Acks are
> @@ -522,7 +523,7 @@ tx_status_update(struct cmsg *msg)
>       * the single master in 100% so far). Other instances wait
>       * for master's CONFIRM message instead.
>       */
> -    if (txn_limbo.owner_id == instance_id) {
> +    if (txn_limbo.owner_id == instance_id && !anon) {
>          txn_limbo_ack(&txn_limbo, ack.source,
>                        vclock_get(ack.vclock, instance_id));
>      }
> diff --git a/test/replication-luatest/gh_5418_test.lua b/test/replication-luatest/gh_5418_test.lua
> new file mode 100644
> index 000000000..858fe342f
> --- /dev/null
> +++ b/test/replication-luatest/gh_5418_test.lua
> @@ -0,0 +1,81 @@
> +local fio = require('fio')
> +local log = require('log')
> +local fiber = require('fiber')
> +
> +local t = require('luatest')
> +local g = t.group()
> +
> +local Server = t.Server
> +
> +g.before_all(function()
> +    g.master = Server:new({
> +        alias = 'master',
> +        command = './test/replication-luatest/instance_files/master.lua',

Can the command be shortened to `./master.lua`?

> +        workdir = fio.tempdir(),
> +        http_port = 8081,

Can you omit the http_port? It's unused in the test.

> +        net_box_port = 13301,
> +    })
> +
> +    g.replica = Server:new({
> +        alias = 'replica',
> +        command = './test/replication-luatest/instance_files/replica.lua',
> +        workdir = fio.tempdir(),
> +        env = {TARANTOOL_MASTER = '13301'},
> +        http_port = 8082,
> +        net_box_port = 13302,
> +    })
> +
> +
> +    g.master:start()
> +    g.replica:start()
> +
> +    t.helpers.retrying({}, function() g.master:connect_net_box() end)
> +    t.helpers.retrying({}, function() g.replica:connect_net_box() end)
> +
> +    log.info('Everything is started')
> +end)
> +
> +g.after_all(function()
> +    g.replica:stop()
> +    g.master:stop()
> +    fio.rmtree(g.master.workdir)
> +    fio.rmtree(g.replica.workdir)
> +end)
> +
> +local function wait_vclock(timeout)
> +    local started_at = fiber.clock()
> +    local lsn = g.master:eval("return box.info.vclock[1]")
> +
> +    local _, tbl = g.master:eval("return next(box.info.replication_anon())")
> +    local to_lsn = tbl.downstream.vclock[1]
> +
> +    while to_lsn == nil or to_lsn < lsn do
> +        fiber.sleep(0.001)
> +
> +        if (fiber.clock() - started_at) > timeout then
> +            return false
> +        end
> +
> +        _, tbl = g.master:eval("return next(box.info.replication_anon())")
> +        to_lsn = tbl.downstream.vclock[1]
> +
> +        log.info(string.format("master lsn: %d; replica_anon lsn: %d",
> +                               lsn, to_lsn))
> +    end
> +
> +    return true
> +end
> +
> +g.test_qsync_with_anon = function()
> +    g.master:eval("box.schema.space.create('sync', {is_sync = true})")
> +    g.master:eval("box.space.sync:create_index('pk')")
> +
> +    t.assert_error_msg_content_equals("Quorum collection for a synchronous transaction is timed out",
> +        function() g.master:eval("return box.space.sync:insert{1}") end)
> +
> +    -- Wait until everything is replicated from the master to the replica
> +    t.assert(wait_vclock(1))
> +
> +    t.assert_equals(g.master:eval("return box.space.sync:select()"), {})
> +    t.assert_equals(g.replica:eval("return box.space.sync:select()"), {})
> +end
> diff --git a/test/replication-luatest/instance_files/master.lua b/test/replication-luatest/instance_files/master.lua
> new file mode 100755
> index 000000000..f2af4b529
> --- /dev/null
> +++ b/test/replication-luatest/instance_files/master.lua
> @@ -0,0 +1,19 @@
> +#!/usr/bin/env tarantool
> +
> +local function instance_uri(instance)
> +    local port = os.getenv(instance)
> +    return 'localhost:' .. port
> +end
> +
> +box.cfg({
> +    --log_level = 7,
> +    work_dir = os.getenv('TARANTOOL_WORKDIR'),
> +    listen = os.getenv('TARANTOOL_LISTEN'),
> +    replication = {instance_uri('TARANTOOL_LISTEN')},
> +    memtx_memory = 107374182,
> +    replication_synchro_quorum = 2,
> +    replication_timeout = 0.1
> +})
> +
> +box.schema.user.grant('guest', 'read, write, execute, create', 'universe', nil, {if_not_exists=true})
> +require('log').warn("master is ready")
> diff --git a/test/replication-luatest/instance_files/replica.lua b/test/replication-luatest/instance_files/replica.lua
> new file mode 100755
> index 000000000..ed830cde4
> --- /dev/null
> +++ b/test/replication-luatest/instance_files/replica.lua
> @@ -0,0 +1,22 @@
> +#!/usr/bin/env tarantool
> +
> +local function instance_uri(instance)
> +    local port = os.getenv(instance)
> +    return 'localhost:' .. port
> +end
> +
> +box.cfg({
> +    work_dir = os.getenv('TARANTOOL_WORKDIR'),
> +    listen = os.getenv('TARANTOOL_LISTEN'),
> +    replication = {
> +        instance_uri('TARANTOOL_MASTER'),
> +        instance_uri('TARANTOOL_LISTEN')
> +    },
> +    memtx_memory = 107374182,
> +    replication_timeout = 0.1,
> +    replication_connect_timeout = 0.5,
> +    read_only = true,
> +    replication_anon = true
> +})
> +
> +require('log').warn("replica is ready")
> diff --git a/test/replication-luatest/suite.ini b/test/replication-luatest/suite.ini
> new file mode 100644
> index 000000000..ccd624099
> --- /dev/null
> +++ b/test/replication-luatest/suite.ini
> @@ -0,0 +1,3 @@
> +[default]
> +core = luatest
> +description = first try of using luatest
> diff --git a/test/replication/qsync_with_anon.result b/test/replication/qsync_with_anon.result
> index 6a2952a32..d847a77aa 100644
> --- a/test/replication/qsync_with_anon.result
> +++ b/test/replication/qsync_with_anon.result

Since you've rewritten the test in luatest, you may remove the diff-based test altogether.

> @@ -1,195 +1,57 @@
>  -- test-run result file version 2
> -env = require('test_run')
> - | ---
> - | ...
> -test_run = env.new()
> - | ---
> - | ...
> -engine = test_run:get_cfg('engine')
> - | ---
> - | ...
> -
> -orig_synchro_quorum = box.cfg.replication_synchro_quorum
> - | ---
> - | ...
> -orig_synchro_timeout = box.cfg.replication_synchro_timeout
> +test_run = require('test_run').new()
>   | ---
>   | ...
>
>  NUM_INSTANCES = 2
>   | ---
>   | ...
> -BROKEN_QUORUM = NUM_INSTANCES + 1
> - | ---
> - | ...
>
>  box.schema.user.grant('guest', 'replication')
>   | ---
>   | ...
> -
> --- Setup a cluster with anonymous replica.
> -test_run:cmd('create server replica_anon with rpl_master=default, script="replication/anon1.lua"')
> - | ---
> - | - true
> - | ...
> -test_run:cmd('start server replica_anon')
> - | ---
> - | - true
> - | ...
> -test_run:cmd('switch replica_anon')
> - | ---
> - | - true
> - | ...
> -
> --- [RFC, Asynchronous replication] successful transaction applied on async
> --- replica.
> --- Testcase setup.
> -test_run:switch('default')
> - | ---
> - | - true
> - | ...
> -box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
> - | ---
> - | ...
> -_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> - | ---
> - | ...
> -_ = box.space.sync:create_index('pk')
> - | ---
> - | ...
> --- Testcase body.
> -test_run:switch('default')
> - | ---
> - | - true
> - | ...
> -box.space.sync:insert{1} -- success
> - | ---
> - | - [1]
> - | ...
> -box.space.sync:insert{2} -- success
> - | ---
> - | - [2]
> - | ...
> -box.space.sync:insert{3} -- success
> - | ---
> - | - [3]
> - | ...
> -test_run:cmd('switch replica_anon')
> - | ---
> - | - true
> - | ...
> -box.space.sync:select{} -- 1, 2, 3
> - | ---
> - | - - [1]
> - |   - [2]
> - |   - [3]
> - | ...
> --- Testcase cleanup.
> -test_run:switch('default')
> - | ---
> - | - true
> - | ...
> -box.space.sync:drop()
> - | ---
> - | ...
> -
> --- [RFC, Asynchronous replication] failed transaction rolled back on async
> --- replica.
> --- Testcase setup.
> -box.cfg{replication_synchro_quorum = NUM_INSTANCES, replication_synchro_timeout = 1000}
> +box.cfg{replication_synchro_quorum=NUM_INSTANCES}
>   | ---
>   | ...
> -_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> +_ = box.schema.space.create('sync', {is_sync=true})
>   | ---
>   | ...
>  _ = box.space.sync:create_index('pk')
>   | ---
>   | ...
> --- Write something to flush the current master's state to replica.
> -_ = box.space.sync:insert{1}
> - | ---
> - | ...
> -_ = box.space.sync:delete{1}
> - | ---
> - | ...
>
> -box.cfg{replication_synchro_quorum = BROKEN_QUORUM, replication_synchro_timeout = 1000}
> - | ---
> - | ...
> -fiber = require('fiber')
> - | ---
> - | ...
> -ok, err = nil
> - | ---
> - | ...
> -f = fiber.create(function() \
> -    ok, err = pcall(box.space.sync.insert, box.space.sync, {1}) \
> -end)
> - | ---
> - | ...
>
> -test_run:cmd('switch replica_anon')
> +-- Setup a cluster with anonymous replica
> +test_run:cmd('create server replica_anon with rpl_master=default,\
> +    script="replication/anon1.lua"')
>   | ---
>   | - true
>   | ...
> -test_run:wait_cond(function() return box.space.sync:count() == 1 end)
> +test_run:cmd('start server replica_anon')
>   | ---
>   | - true
>   | ...
> -box.space.sync:select{}
> - | ---
> - | - - [1]
> - | ...
>
> -test_run:switch('default')
> - | ---
> - | - true
> - | ...
> -box.cfg{replication_synchro_timeout = 0.001}
> - | ---
> - | ...
> -test_run:wait_cond(function() return f:status() == 'dead' end)
> +
> +-- Testcase
> +box.space.sync:insert{1} -- error
>   | ---
> - | - true
> + | - error: Quorum collection for a synchronous transaction is timed out
>   | ...
> -box.space.sync:select{}
> +box.space.sync:insert{3} -- error
>   | ---
> - | - []
> + | - error: Quorum collection for a synchronous transaction is timed out
>   | ...
> -
>  test_run:cmd('switch replica_anon')
>   | ---
>   | - true
>   | ...
> -test_run:wait_cond(function() return box.space.sync:count() == 0 end)
> - | ---
> - | - true
> - | ...
> -box.space.sync:select{}
> +box.space.sync:select{} -- []
>   | ---
>   | - []
>   | ...
>
> -test_run:switch('default')
> - | ---
> - | - true
> - | ...
> -box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
> - | ---
> - | ...
> -box.space.sync:insert{1} -- success
> - | ---
> - | - [1]
> - | ...
> -test_run:cmd('switch replica_anon')
> - | ---
> - | - true
> - | ...
> -box.space.sync:select{} -- 1
> - | ---
> - | - - [1]
> - | ...
> --- Testcase cleanup.
> +-- Testcase cleanup
>  test_run:switch('default')
>   | ---
>   | - true
> @@ -198,12 +60,13 @@ box.space.sync:drop()
>   | ---
>   | ...
>
> --- Teardown.
> -test_run:switch('default')
> +
> +-- Teardown
> +test_run:cmd('stop server replica_anon')
>   | ---
>   | - true
>   | ...
> -test_run:cmd('stop server replica_anon')
> +test_run:cmd('cleanup server replica_anon')
>   | ---
>   | - true
>   | ...
> @@ -211,15 +74,3 @@ test_run:cmd('delete server replica_anon')
>   | ---
>   | - true
>   | ...
> -box.schema.user.revoke('guest', 'replication')
> - | ---
> - | ...
> -box.cfg{ \
> -    replication_synchro_quorum = orig_synchro_quorum, \
> -    replication_synchro_timeout = orig_synchro_timeout, \
> -}
> - | ---
> - | ...
> -test_run:cleanup_cluster()
> - | ---
> - | ...
> diff --git a/test/replication/qsync_with_anon.test.lua b/test/replication/qsync_with_anon.test.lua
> index d7ecaa107..2d92f08aa 100644
> --- a/test/replication/qsync_with_anon.test.lua
> +++ b/test/replication/qsync_with_anon.test.lua
> @@ -1,84 +1,31 @@
> -env = require('test_run')
> -test_run = env.new()
> -engine = test_run:get_cfg('engine')
> -
> -orig_synchro_quorum = box.cfg.replication_synchro_quorum
> -orig_synchro_timeout = box.cfg.replication_synchro_timeout
> +test_run = require('test_run').new()
>
>  NUM_INSTANCES = 2
> -BROKEN_QUORUM = NUM_INSTANCES + 1
>
>  box.schema.user.grant('guest', 'replication')
> -
> --- Setup a cluster with anonymous replica.
> -test_run:cmd('create server replica_anon with rpl_master=default, script="replication/anon1.lua"')
> -test_run:cmd('start server replica_anon')
> -test_run:cmd('switch replica_anon')
> -
> --- [RFC, Asynchronous replication] successful transaction applied on async
> --- replica.
> --- Testcase setup.
> -test_run:switch('default')
> -box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
> -_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> +box.cfg{replication_synchro_quorum=NUM_INSTANCES}
> +_ = box.schema.space.create('sync', {is_sync=true})
>  _ = box.space.sync:create_index('pk')
> --- Testcase body.
> -test_run:switch('default')
> -box.space.sync:insert{1} -- success
> -box.space.sync:insert{2} -- success
> -box.space.sync:insert{3} -- success
> -test_run:cmd('switch replica_anon')
> -box.space.sync:select{} -- 1, 2, 3
> --- Testcase cleanup.
> -test_run:switch('default')
> -box.space.sync:drop()
>
> --- [RFC, Asynchronous replication] failed transaction rolled back on async
> --- replica.
> --- Testcase setup.
> -box.cfg{replication_synchro_quorum = NUM_INSTANCES, replication_synchro_timeout = 1000}
> -_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> -_ = box.space.sync:create_index('pk')
> --- Write something to flush the current master's state to replica.
> -_ = box.space.sync:insert{1}
> -_ = box.space.sync:delete{1}
> -
> -box.cfg{replication_synchro_quorum = BROKEN_QUORUM, replication_synchro_timeout = 1000}
> -fiber = require('fiber')
> -ok, err = nil
> -f = fiber.create(function() \
> -    ok, err = pcall(box.space.sync.insert, box.space.sync, {1}) \
> -end)
>
> -test_run:cmd('switch replica_anon')
> -test_run:wait_cond(function() return box.space.sync:count() == 1 end)
> -box.space.sync:select{}
> +-- Setup a cluster with anonymous replica
> +test_run:cmd('create server replica_anon with rpl_master=default,\
> +    script="replication/anon1.lua"')
> +test_run:cmd('start server replica_anon')
>
> -test_run:switch('default')
> -box.cfg{replication_synchro_timeout = 0.001}
> -test_run:wait_cond(function() return f:status() == 'dead' end)
> -box.space.sync:select{}
>
> +-- Testcase
> +box.space.sync:insert{1} -- error
> +box.space.sync:insert{3} -- error
>  test_run:cmd('switch replica_anon')
> -test_run:wait_cond(function() return box.space.sync:count() == 0 end)
> -box.space.sync:select{}
> +box.space.sync:select{} -- []
>
> -test_run:switch('default')
> -box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
> -box.space.sync:insert{1} -- success
> -test_run:cmd('switch replica_anon')
> -box.space.sync:select{} -- 1
> --- Testcase cleanup.
> +-- Testcase cleanup
>  test_run:switch('default')
>  box.space.sync:drop()
>
> --- Teardown.
> -test_run:switch('default')
> +
> +-- Teardown
>  test_run:cmd('stop server replica_anon')
> +test_run:cmd('cleanup server replica_anon')
>  test_run:cmd('delete server replica_anon')
> -box.schema.user.revoke('guest', 'replication')
> -box.cfg{ \
> -    replication_synchro_quorum = orig_synchro_quorum, \
> -    replication_synchro_timeout = orig_synchro_timeout, \
> -}
> -test_run:cleanup_cluster()
> -- 
> 2.25.1
>

-- 
Serge Petrenko
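The functional change in this patch is the extra `!anon` condition in `tx_status_update()`. With it, acks from anonymous replicas are no longer passed to `txn_limbo_ack()`.

Below is a minimal, hypothetical C++ model of that rule, not Tarantool's implementation. `ReplicaAck` and `QuorumTracker` are illustrative names rather than the `txn_limbo` API. The sketch also assumes that the limbo owner's own write counts as one vote toward `replication_synchro_quorum`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical ack message; NOT Tarantool's struct replication_ack.
struct ReplicaAck {
    uint32_t replica_id;  // id of the replica that sent the ack
    bool anon;            // true if the replica runs with replication_anon = true
    int64_t lsn;          // highest master LSN the replica reports as received
};

// Hypothetical quorum collector modelling the patched condition.
class QuorumTracker {
public:
    QuorumTracker(uint32_t owner_id, uint32_t self_id, int quorum)
        : owner_id_(owner_id), self_id_(self_id), quorum_(quorum) {
        assert(quorum >= 1);
    }

    // Same shape as `txn_limbo.owner_id == instance_id && !anon`:
    // only the limbo owner collects acks, and anonymous replicas
    // never contribute to the quorum.
    void on_ack(const ReplicaAck &ack) {
        if (owner_id_ != self_id_ || ack.anon)
            return;
        int64_t &known = acked_lsn_[ack.replica_id];
        if (ack.lsn > known)
            known = ack.lsn;
    }

    // Assumption: the owner's local write is one vote, so quorum = 2
    // needs exactly one ack from a non-anonymous replica.
    bool is_confirmed(int64_t lsn) const {
        int votes = 1;
        for (const auto &entry : acked_lsn_) {
            if (entry.second >= lsn)
                ++votes;
        }
        return votes >= quorum_;
    }

private:
    uint32_t owner_id_;
    uint32_t self_id_;
    int quorum_;
    std::map<uint32_t, int64_t> acked_lsn_;  // best acked LSN per replica id
};
```

With `quorum = 2` and only an anonymous replica acking, `is_confirmed()` stays false. That is the situation the new luatest expects to end in a quorum collection timeout.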