From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 7 Sep 2020 17:46:30 +0300
From: Mergen Imeev
Message-ID: <20200907144630.GA194562@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Subject: Re: [Tarantool-patches] [PATCH v2] Divide replication/misc.test.lua
List-Id: Tarantool development patches
To: "Alexander V. Tikhonov"
Cc: tarantool-patches@dev.tarantool.org

Hi! Thank you for the patch. See 4 comments below.

On Fri, Sep 04, 2020 at 09:13:58PM +0300, Alexander V. Tikhonov wrote:
> To fix flaky issues of replication/misc.test.lua, the test had to be
> divided into smaller tests so that the flaky results could be
> localized:
>
> gh-2991-misc-assert-on-server-die.test.lua
> gh-3111-misc-rebootstrap-from-ro-master.test.lua
> gh-3160-misc-heartbeats-on-master-changes.test.lua
> gh-3247-misc-value-not-replicated-on-iproto-request.test.lua
> gh-3510-misc-assert-replica-on-applier-disconnect.test.lua
> gh-3606-misc-crash-on-box-concurrent-update.test.lua
> gh-3610-misc-assert-connecting-master-twice.test.lua
> gh-3637-misc-no-panic-on-connected.test.lua
> gh-3642-misc-no-socket-leak-on-replica-disconnect.test.lua
> gh-3704-misc-replica-checks-cluster-id.test.lua
> gh-3711-misc-no-restart-on-same-configuration.test.lua
> gh-3760-misc-return-on-quorum-0.test.lua
> gh-4399-misc-no-failure-on-error-reading-wal.test.lua
> gh-4424-misc-orphan-on-reconfiguration-error.test.lua

1. I don't think 'misc' is needed in the test names; e.g.
gh-2991-assert-on-server-die.test.lua reads just as well.
> > Needed for #4940 > --- > > Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-4940-replication-misc > Issue: https://github.com/tarantool/tarantool/issues/4940 > > .../gh-2991-misc-assert-on-server-die.result | 29 + > ...gh-2991-misc-assert-on-server-die.test.lua | 11 + > ...111-misc-rebootstrap-from-ro-master.result | 58 ++ > ...1-misc-rebootstrap-from-ro-master.test.lua | 20 + > ...0-misc-heartbeats-on-master-changes.result | 67 ++ > ...misc-heartbeats-on-master-changes.test.lua | 37 + > ...ue-not-replicated-on-iproto-request.result | 87 ++ > ...-not-replicated-on-iproto-request.test.lua | 32 + > ...ssert-replica-on-applier-disconnect.result | 46 + > ...ert-replica-on-applier-disconnect.test.lua | 16 + > ...misc-crash-on-box-concurrent-update.result | 45 + > ...sc-crash-on-box-concurrent-update.test.lua | 17 + > ...misc-assert-connecting-master-twice.result | 83 ++ > ...sc-assert-connecting-master-twice.test.lua | 33 + > .../gh-3637-misc-no-panic-on-connected.result | 69 ++ > ...h-3637-misc-no-panic-on-connected.test.lua | 32 + > ...o-socket-leak-on-replica-disconnect.result | 95 ++ > ...socket-leak-on-replica-disconnect.test.lua | 43 + > ...3704-misc-replica-checks-cluster-id.result | 68 ++ > ...04-misc-replica-checks-cluster-id.test.lua | 25 + > ...sc-no-restart-on-same-configuration.result | 101 ++ > ...-no-restart-on-same-configuration.test.lua | 39 + > .../gh-3760-misc-return-on-quorum-0.result | 23 + > .../gh-3760-misc-return-on-quorum-0.test.lua | 14 + > ...isc-no-failure-on-error-reading-wal.result | 94 ++ > ...c-no-failure-on-error-reading-wal.test.lua | 38 + > ...isc-orphan-on-reconfiguration-error.result | 82 ++ > ...c-orphan-on-reconfiguration-error.test.lua | 35 + > test/replication/misc.result | 866 ------------------ > test/replication/misc.skipcond | 7 - > test/replication/misc.test.lua | 356 ------- > test/replication/suite.cfg | 15 +- > test/replication/suite.ini | 2 +- > 33 files changed, 1354 insertions(+), 1231 deletions(-) > create mode 100644 test/replication/gh-2991-misc-assert-on-server-die.result > create mode 100644 test/replication/gh-2991-misc-assert-on-server-die.test.lua > create mode 100644 test/replication/gh-3111-misc-rebootstrap-from-ro-master.result > create mode 100644 test/replication/gh-3111-misc-rebootstrap-from-ro-master.test.lua > create mode 100644 test/replication/gh-3160-misc-heartbeats-on-master-changes.result > create mode 100644 test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua > create mode 100644 test/replication/gh-3247-misc-value-not-replicated-on-iproto-request.result > create mode 100644 test/replication/gh-3247-misc-value-not-replicated-on-iproto-request.test.lua > create mode 100644 test/replication/gh-3510-misc-assert-replica-on-applier-disconnect.result > create mode 100644 test/replication/gh-3510-misc-assert-replica-on-applier-disconnect.test.lua > create mode 100644 test/replication/gh-3606-misc-crash-on-box-concurrent-update.result > create mode 100644 test/replication/gh-3606-misc-crash-on-box-concurrent-update.test.lua > create mode 100644 test/replication/gh-3610-misc-assert-connecting-master-twice.result > create mode 100644 test/replication/gh-3610-misc-assert-connecting-master-twice.test.lua > create mode 100644 test/replication/gh-3637-misc-no-panic-on-connected.result > create mode 100644 test/replication/gh-3637-misc-no-panic-on-connected.test.lua > create mode 100644 test/replication/gh-3642-misc-no-socket-leak-on-replica-disconnect.result > create mode 100644 
test/replication/gh-3642-misc-no-socket-leak-on-replica-disconnect.test.lua > create mode 100644 test/replication/gh-3704-misc-replica-checks-cluster-id.result > create mode 100644 test/replication/gh-3704-misc-replica-checks-cluster-id.test.lua > create mode 100644 test/replication/gh-3711-misc-no-restart-on-same-configuration.result > create mode 100644 test/replication/gh-3711-misc-no-restart-on-same-configuration.test.lua > create mode 100644 test/replication/gh-3760-misc-return-on-quorum-0.result > create mode 100644 test/replication/gh-3760-misc-return-on-quorum-0.test.lua > create mode 100644 test/replication/gh-4399-misc-no-failure-on-error-reading-wal.result > create mode 100644 test/replication/gh-4399-misc-no-failure-on-error-reading-wal.test.lua > create mode 100644 test/replication/gh-4424-misc-orphan-on-reconfiguration-error.result > create mode 100644 test/replication/gh-4424-misc-orphan-on-reconfiguration-error.test.lua > delete mode 100644 test/replication/misc.result > delete mode 100644 test/replication/misc.skipcond > delete mode 100644 test/replication/misc.test.lua > > diff --git a/test/replication/gh-2991-misc-assert-on-server-die.result b/test/replication/gh-2991-misc-assert-on-server-die.result > new file mode 100644 > index 000000000..ffae6e44a > --- /dev/null > +++ b/test/replication/gh-2991-misc-assert-on-server-die.result > @@ -0,0 +1,29 @@ > +-- gh-2991 - Tarantool asserts on box.cfg.replication update if one of > +-- servers is dead > +replication_timeout = box.cfg.replication_timeout > +--- > +... > +replication_connect_timeout = box.cfg.replication_connect_timeout > +--- > +... > +box.cfg{replication_timeout=0.05, replication_connect_timeout=0.05, replication={}} > +--- > +... > +box.cfg{replication_connect_quorum=2} > +--- > +... > +box.cfg{replication = {'127.0.0.1:12345', box.cfg.listen}} > +--- > +... > +box.info.status > +--- > +- orphan > +... > +box.info.ro > +--- > +- true > +... > +box.cfg{replication = "", replication_timeout = replication_timeout, \ > + replication_connect_timeout = replication_connect_timeout} > +--- > +... > diff --git a/test/replication/gh-2991-misc-assert-on-server-die.test.lua b/test/replication/gh-2991-misc-assert-on-server-die.test.lua > new file mode 100644 > index 000000000..b9f217cfa > --- /dev/null > +++ b/test/replication/gh-2991-misc-assert-on-server-die.test.lua > @@ -0,0 +1,11 @@ > +-- gh-2991 - Tarantool asserts on box.cfg.replication update if one of > +-- servers is dead > +replication_timeout = box.cfg.replication_timeout > +replication_connect_timeout = box.cfg.replication_connect_timeout > +box.cfg{replication_timeout=0.05, replication_connect_timeout=0.05, replication={}} > +box.cfg{replication_connect_quorum=2} > +box.cfg{replication = {'127.0.0.1:12345', box.cfg.listen}} > +box.info.status > +box.info.ro > +box.cfg{replication = "", replication_timeout = replication_timeout, \ > + replication_connect_timeout = replication_connect_timeout} > diff --git a/test/replication/gh-3111-misc-rebootstrap-from-ro-master.result b/test/replication/gh-3111-misc-rebootstrap-from-ro-master.result > new file mode 100644 > index 000000000..7ffca1585 > --- /dev/null > +++ b/test/replication/gh-3111-misc-rebootstrap-from-ro-master.result > @@ -0,0 +1,58 @@ > +test_run = require('test_run').new() > +--- > +... > +test_run:cmd("restart server default") > +uuid = require('uuid') > +--- > +... > +box.schema.user.grant('guest', 'replication') > +--- > +... 
> +-- gh-3111 - Allow to rebootstrap a replica from a read-only master > +replica_uuid = uuid.new() > +--- > +... > +test_run:cmd('create server test with rpl_master=default, script="replication/replica_uuid.lua"') > +--- > +- true > +... > +test_run:cmd(string.format('start server test with args="%s"', replica_uuid)) > +--- > +- true > +... > +test_run:cmd('stop server test') > +--- > +- true > +... > +test_run:cmd('cleanup server test') > +--- > +- true > +... > +box.cfg{read_only = true} > +--- > +... > +test_run:cmd(string.format('start server test with args="%s"', replica_uuid)) > +--- > +- true > +... > +test_run:cmd('stop server test') > +--- > +- true > +... > +test_run:cmd('cleanup server test') > +--- > +- true > +... > +box.cfg{read_only = false} > +--- > +... > +test_run:cmd('delete server test') > +--- > +- true > +... > +test_run:cleanup_cluster() > +--- > +... > +box.schema.user.revoke('guest', 'replication') > +--- > +... > diff --git a/test/replication/gh-3111-misc-rebootstrap-from-ro-master.test.lua b/test/replication/gh-3111-misc-rebootstrap-from-ro-master.test.lua > new file mode 100644 > index 000000000..bb9b4a80f > --- /dev/null > +++ b/test/replication/gh-3111-misc-rebootstrap-from-ro-master.test.lua > @@ -0,0 +1,20 @@ > +test_run = require('test_run').new() > +test_run:cmd("restart server default") > +uuid = require('uuid') > + > +box.schema.user.grant('guest', 'replication') > + > +-- gh-3111 - Allow to rebootstrap a replica from a read-only master > +replica_uuid = uuid.new() > +test_run:cmd('create server test with rpl_master=default, script="replication/replica_uuid.lua"') > +test_run:cmd(string.format('start server test with args="%s"', replica_uuid)) > +test_run:cmd('stop server test') > +test_run:cmd('cleanup server test') > +box.cfg{read_only = true} > +test_run:cmd(string.format('start server test with args="%s"', replica_uuid)) > +test_run:cmd('stop server test') > +test_run:cmd('cleanup server test') > +box.cfg{read_only = false} > +test_run:cmd('delete server test') > +test_run:cleanup_cluster() > +box.schema.user.revoke('guest', 'replication') > diff --git a/test/replication/gh-3160-misc-heartbeats-on-master-changes.result b/test/replication/gh-3160-misc-heartbeats-on-master-changes.result > new file mode 100644 > index 000000000..9bce55ae1 > --- /dev/null > +++ b/test/replication/gh-3160-misc-heartbeats-on-master-changes.result > @@ -0,0 +1,67 @@ > +test_run = require('test_run').new() > +--- > +... > +-- gh-3160 - Send heartbeats if there are changes from a remote master only > +SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' } > +--- > +... > +-- Deploy a cluster. > +test_run:create_cluster(SERVERS, "replication", {args="0.03"}) > +--- > +... > +test_run:wait_fullmesh(SERVERS) > +--- > +... > +test_run:cmd("switch autobootstrap3") > +--- > +- true > +... > +test_run = require('test_run').new() > +--- > +... > +_ = box.schema.space.create('test_timeout'):create_index('pk') > +--- > +... > +test_run:cmd("setopt delimiter ';'") > +--- > +- true > +... > +function wait_not_follow(replicaA, replicaB) > + return test_run:wait_cond(function() > + return replicaA.status ~= 'follow' or replicaB.status ~= 'follow' > + end, box.cfg.replication_timeout) > +end; > +--- > +... 
> +function test_timeout() > + local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream > + local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream > + local follows = test_run:wait_cond(function() > + return replicaA.status == 'follow' or replicaB.status == 'follow' > + end) > + if not follows then error('replicas are not in the follow status') end > + for i = 0, 99 do > + box.space.test_timeout:replace({1}) > + if wait_not_follow(replicaA, replicaB) then > + return error(box.info.replication) > + end > + end > + return true > +end; > +--- > +... > +test_run:cmd("setopt delimiter ''"); > +--- > +- true > +... > +test_timeout() > +--- > +- true > +... > +test_run:cmd("switch default") > +--- > +- true > +... > +test_run:drop_cluster(SERVERS) > +--- > +... > diff --git a/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua b/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua > new file mode 100644 > index 000000000..b3d8d2d54 > --- /dev/null > +++ b/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua > @@ -0,0 +1,37 @@ > +test_run = require('test_run').new() > + > +-- gh-3160 - Send heartbeats if there are changes from a remote master only > +SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' } > + > +-- Deploy a cluster. > +test_run:create_cluster(SERVERS, "replication", {args="0.03"}) > +test_run:wait_fullmesh(SERVERS) > +test_run:cmd("switch autobootstrap3") > +test_run = require('test_run').new() > +_ = box.schema.space.create('test_timeout'):create_index('pk') > +test_run:cmd("setopt delimiter ';'") > +function wait_not_follow(replicaA, replicaB) > + return test_run:wait_cond(function() > + return replicaA.status ~= 'follow' or replicaB.status ~= 'follow' > + end, box.cfg.replication_timeout) > +end; > +function test_timeout() > + local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream > + local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream > + local follows = test_run:wait_cond(function() > + return replicaA.status == 'follow' or replicaB.status == 'follow' > + end) > + if not follows then error('replicas are not in the follow status') end > + for i = 0, 99 do > + box.space.test_timeout:replace({1}) > + if wait_not_follow(replicaA, replicaB) then > + return error(box.info.replication) > + end > + end > + return true > +end; > +test_run:cmd("setopt delimiter ''"); > +test_timeout() > + > +test_run:cmd("switch default") > +test_run:drop_cluster(SERVERS) > diff --git a/test/replication/gh-3247-misc-value-not-replicated-on-iproto-request.result b/test/replication/gh-3247-misc-value-not-replicated-on-iproto-request.result > new file mode 100644 > index 000000000..39f9b5763 > --- /dev/null > +++ b/test/replication/gh-3247-misc-value-not-replicated-on-iproto-request.result > @@ -0,0 +1,87 @@ > +test_run = require('test_run').new() > +--- > +... > +test_run:cmd("restart server default") > +-- Deploy a cluster. > +SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' } > +--- > +... > +test_run:create_cluster(SERVERS, "replication", {args="0.03"}) > +--- > +... > +test_run:wait_fullmesh(SERVERS) > +--- > +... > +-- gh-3247 - Sequence-generated value is not replicated in case > +-- the request was sent via iproto. > +test_run:cmd("switch autobootstrap1") > +--- > +- true > +... > +net_box = require('net.box') > +--- > +... > +_ = box.schema.space.create('space1') > +--- > +... 
> +_ = box.schema.sequence.create('seq') > +--- > +... > +_ = box.space.space1:create_index('primary', {sequence = true} ) > +--- > +... > +_ = box.space.space1:create_index('secondary', {parts = {2, 'unsigned'}}) > +--- > +... > +box.schema.user.grant('guest', 'read,write', 'space', 'space1') > +--- > +... > +c = net_box.connect(box.cfg.listen) > +--- > +... > +c.space.space1:insert{box.NULL, "data"} -- fails, but bumps sequence value > +--- > +- error: 'Tuple field 2 type does not match one required by operation: expected unsigned' > +... > +c.space.space1:insert{box.NULL, 1, "data"} > +--- > +- [2, 1, 'data'] > +... > +box.space.space1:select{} > +--- > +- - [2, 1, 'data'] > +... > +vclock = test_run:get_vclock("autobootstrap1") > +--- > +... > +vclock[0] = nil > +--- > +... > +_ = test_run:wait_vclock("autobootstrap2", vclock) > +--- > +... > +test_run:cmd("switch autobootstrap2") > +--- > +- true > +... > +box.space.space1:select{} > +--- > +- - [2, 1, 'data'] > +... > +test_run:cmd("switch autobootstrap1") > +--- > +- true > +... > +box.space.space1:drop() > +--- > +... > +test_run:cmd("switch default") > +--- > +- true > +... > +test_run:drop_cluster(SERVERS) > +--- > +... > +test_run:cleanup_cluster() > +--- > +... > diff --git a/test/replication/gh-3247-misc-value-not-replicated-on-iproto-request.test.lua b/test/replication/gh-3247-misc-value-not-replicated-on-iproto-request.test.lua > new file mode 100644 > index 000000000..a703377c3 > --- /dev/null > +++ b/test/replication/gh-3247-misc-value-not-replicated-on-iproto-request.test.lua > @@ -0,0 +1,32 @@ > +test_run = require('test_run').new() > +test_run:cmd("restart server default") > + > +-- Deploy a cluster. > +SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' } > +test_run:create_cluster(SERVERS, "replication", {args="0.03"}) > +test_run:wait_fullmesh(SERVERS) > + > +-- gh-3247 - Sequence-generated value is not replicated in case > +-- the request was sent via iproto. > +test_run:cmd("switch autobootstrap1") > +net_box = require('net.box') > +_ = box.schema.space.create('space1') > +_ = box.schema.sequence.create('seq') > +_ = box.space.space1:create_index('primary', {sequence = true} ) > +_ = box.space.space1:create_index('secondary', {parts = {2, 'unsigned'}}) > +box.schema.user.grant('guest', 'read,write', 'space', 'space1') > +c = net_box.connect(box.cfg.listen) > +c.space.space1:insert{box.NULL, "data"} -- fails, but bumps sequence value > +c.space.space1:insert{box.NULL, 1, "data"} > +box.space.space1:select{} > +vclock = test_run:get_vclock("autobootstrap1") > +vclock[0] = nil > +_ = test_run:wait_vclock("autobootstrap2", vclock) > +test_run:cmd("switch autobootstrap2") > +box.space.space1:select{} > +test_run:cmd("switch autobootstrap1") > +box.space.space1:drop() > + > +test_run:cmd("switch default") > +test_run:drop_cluster(SERVERS) > +test_run:cleanup_cluster() > diff --git a/test/replication/gh-3510-misc-assert-replica-on-applier-disconnect.result b/test/replication/gh-3510-misc-assert-replica-on-applier-disconnect.result > new file mode 100644 > index 000000000..03e4ebfd5 > --- /dev/null > +++ b/test/replication/gh-3510-misc-assert-replica-on-applier-disconnect.result > @@ -0,0 +1,46 @@ > +test_run = require('test_run').new() > +--- > +... > +-- gh-3510 assertion failure in replica_on_applier_disconnect() > +test_run:cmd('create server er_load1 with script="replication/er_load1.lua"') > +--- > +- true > +... 
> +test_run:cmd('create server er_load2 with script="replication/er_load2.lua"') > +--- > +- true > +... > +test_run:cmd('start server er_load1 with wait=False, wait_load=False') > +--- > +- true > +... > +-- Instance er_load2 will fail with error ER_REPLICASET_UUID_MISMATCH. > +-- This is OK since we only test here that er_load1 doesn't assert. > +test_run:cmd('start server er_load2 with wait=True, wait_load=True, crash_expected = True') > +--- > +- false > +... > +test_run:cmd('stop server er_load1') > +--- > +- true > +... > +-- er_load2 exits automatically. > +test_run:cmd('cleanup server er_load1') > +--- > +- true > +... > +test_run:cmd('cleanup server er_load2') > +--- > +- true > +... > +test_run:cmd('delete server er_load1') > +--- > +- true > +... > +test_run:cmd('delete server er_load2') > +--- > +- true > +... > +test_run:cleanup_cluster() > +--- > +... > diff --git a/test/replication/gh-3510-misc-assert-replica-on-applier-disconnect.test.lua b/test/replication/gh-3510-misc-assert-replica-on-applier-disconnect.test.lua > new file mode 100644 > index 000000000..f55d0137b > --- /dev/null > +++ b/test/replication/gh-3510-misc-assert-replica-on-applier-disconnect.test.lua > @@ -0,0 +1,16 @@ > +test_run = require('test_run').new() > + > +-- gh-3510 assertion failure in replica_on_applier_disconnect() > +test_run:cmd('create server er_load1 with script="replication/er_load1.lua"') > +test_run:cmd('create server er_load2 with script="replication/er_load2.lua"') > +test_run:cmd('start server er_load1 with wait=False, wait_load=False') > +-- Instance er_load2 will fail with error ER_REPLICASET_UUID_MISMATCH. > +-- This is OK since we only test here that er_load1 doesn't assert. > +test_run:cmd('start server er_load2 with wait=True, wait_load=True, crash_expected = True') > +test_run:cmd('stop server er_load1') > +-- er_load2 exits automatically. > +test_run:cmd('cleanup server er_load1') > +test_run:cmd('cleanup server er_load2') > +test_run:cmd('delete server er_load1') > +test_run:cmd('delete server er_load2') > +test_run:cleanup_cluster() > diff --git a/test/replication/gh-3606-misc-crash-on-box-concurrent-update.result b/test/replication/gh-3606-misc-crash-on-box-concurrent-update.result > new file mode 100644 > index 000000000..4de4ad35a > --- /dev/null > +++ b/test/replication/gh-3606-misc-crash-on-box-concurrent-update.result > @@ -0,0 +1,45 @@ > +replication_timeout = box.cfg.replication_timeout > +--- > +... > +replication_connect_timeout = box.cfg.replication_connect_timeout > +--- > +... > +box.cfg{replication_timeout=0.05, replication_connect_timeout=0.05, replication={}} > +--- > +... > +-- gh-3606 - Tarantool crashes if box.cfg.replication is updated concurrently > +fiber = require('fiber') > +--- > +... > +c = fiber.channel(2) > +--- > +... > +f = function() fiber.create(function() pcall(box.cfg, {replication = {12345}}) c:put(true) end) end > +--- > +... > +f() > +--- > +... > +f() > +--- > +... > +c:get() > +--- > +- true > +... > +c:get() > +--- > +- true > +... > +box.cfg{replication = "", replication_timeout = replication_timeout, \ > + replication_connect_timeout = replication_connect_timeout} > +--- > +... > +box.info.status > +--- > +- running > +... > +box.info.ro > +--- > +- false > +... 
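A note on this hunk for context: the gh-3606 crash was triggered by two
concurrent box.cfg calls, so the test has to issue them from two
background fibers and then join both before checking box.info. The
channel is what provides the join. Schematically (the same pattern as
the test, with comments added; the numeric URI 12345 is intentionally
unreachable):

    fiber = require('fiber')
    -- Capacity 2: one slot per background fiber, so neither put blocks.
    c = fiber.channel(2)
    f = function()
        fiber.create(function()
            -- Race a reconfiguration; pcall swallows the expected
            -- error from the bogus replication source.
            pcall(box.cfg, {replication = {12345}})
            c:put(true)
        end)
    end
    f()
    f()
    c:get() -- join the first fiber
    c:get() -- join the second fiber
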
> diff --git a/test/replication/gh-3606-misc-crash-on-box-concurrent-update.test.lua b/test/replication/gh-3606-misc-crash-on-box-concurrent-update.test.lua > new file mode 100644 > index 000000000..3792cc9e1 > --- /dev/null > +++ b/test/replication/gh-3606-misc-crash-on-box-concurrent-update.test.lua > @@ -0,0 +1,17 @@ > +replication_timeout = box.cfg.replication_timeout > +replication_connect_timeout = box.cfg.replication_connect_timeout > +box.cfg{replication_timeout=0.05, replication_connect_timeout=0.05, replication={}} > + > +-- gh-3606 - Tarantool crashes if box.cfg.replication is updated concurrently > +fiber = require('fiber') > +c = fiber.channel(2) > +f = function() fiber.create(function() pcall(box.cfg, {replication = {12345}}) c:put(true) end) end > +f() > +f() > +c:get() > +c:get() > + > +box.cfg{replication = "", replication_timeout = replication_timeout, \ > + replication_connect_timeout = replication_connect_timeout} > +box.info.status > +box.info.ro > diff --git a/test/replication/gh-3610-misc-assert-connecting-master-twice.result b/test/replication/gh-3610-misc-assert-connecting-master-twice.result > new file mode 100644 > index 000000000..f2b07f30b > --- /dev/null > +++ b/test/replication/gh-3610-misc-assert-connecting-master-twice.result > @@ -0,0 +1,83 @@ > +test_run = require('test_run').new() > +--- > +... > +test_run:cmd("restart server default") > +fiber = require('fiber') > +--- > +... > +-- > +-- Test case for gh-3610. Before the fix replica would fail with the assertion > +-- when trying to connect to the same master twice. > +-- > +box.schema.user.grant('guest', 'replication') > +--- > +... > +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") > +--- > +- true > +... > +test_run:cmd("start server replica") > +--- > +- true > +... > +test_run:cmd("switch replica") > +--- > +- true > +... > +replication = box.cfg.replication[1] > +--- > +... > +box.cfg{replication = {replication, replication}} > +--- > +- error: 'Incorrect value for option ''replication'': duplicate connection to the > + same replica' > +... > +-- Check the case when duplicate connection is detected in the background. > +test_run:cmd("switch default") > +--- > +- true > +... > +listen = box.cfg.listen > +--- > +... > +box.cfg{listen = ''} > +--- > +... > +test_run:cmd("switch replica") > +--- > +- true > +... > +box.cfg{replication_connect_quorum = 0, replication_connect_timeout = 0.01} > +--- > +... > +box.cfg{replication = {replication, replication}} > +--- > +... > +test_run:cmd("switch default") > +--- > +- true > +... > +box.cfg{listen = listen} > +--- > +... > +while test_run:grep_log('replica', 'duplicate connection') == nil do fiber.sleep(0.01) end > +--- > +... > +test_run:cmd("stop server replica") > +--- > +- true > +... > +test_run:cmd("cleanup server replica") > +--- > +- true > +... > +test_run:cmd("delete server replica") > +--- > +- true > +... > +test_run:cleanup_cluster() > +--- > +... > +box.schema.user.revoke('guest', 'replication') > +--- > +... > diff --git a/test/replication/gh-3610-misc-assert-connecting-master-twice.test.lua b/test/replication/gh-3610-misc-assert-connecting-master-twice.test.lua > new file mode 100644 > index 000000000..5f86eb15b > --- /dev/null > +++ b/test/replication/gh-3610-misc-assert-connecting-master-twice.test.lua > @@ -0,0 +1,33 @@ > +test_run = require('test_run').new() > +test_run:cmd("restart server default") > +fiber = require('fiber') > + > +-- > +-- Test case for gh-3610. 
Before the fix replica would fail with the assertion > +-- when trying to connect to the same master twice. > +-- > +box.schema.user.grant('guest', 'replication') > +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") > +test_run:cmd("start server replica") > +test_run:cmd("switch replica") > +replication = box.cfg.replication[1] > +box.cfg{replication = {replication, replication}} > + > +-- Check the case when duplicate connection is detected in the background. > +test_run:cmd("switch default") > +listen = box.cfg.listen > +box.cfg{listen = ''} > + > +test_run:cmd("switch replica") > +box.cfg{replication_connect_quorum = 0, replication_connect_timeout = 0.01} > +box.cfg{replication = {replication, replication}} > + > +test_run:cmd("switch default") > +box.cfg{listen = listen} > +while test_run:grep_log('replica', 'duplicate connection') == nil do fiber.sleep(0.01) end > + > +test_run:cmd("stop server replica") > +test_run:cmd("cleanup server replica") > +test_run:cmd("delete server replica") > +test_run:cleanup_cluster() > +box.schema.user.revoke('guest', 'replication') > diff --git a/test/replication/gh-3637-misc-no-panic-on-connected.result b/test/replication/gh-3637-misc-no-panic-on-connected.result > new file mode 100644 > index 000000000..98880d8e4 > --- /dev/null > +++ b/test/replication/gh-3637-misc-no-panic-on-connected.result > @@ -0,0 +1,69 @@ > +test_run = require('test_run').new() > +--- > +... > +test_run:cmd("restart server default") > +-- > +-- Test case for gh-3637, gh-4550. Before the fix replica would > +-- exit with an error if a user does not exist or a password is > +-- incorrect. Now check that we don't hang/panic and successfully > +-- connect. > +-- > +fiber = require('fiber') > +--- > +... > +test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'") > +--- > +- true > +... > +test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.05'") > +--- > +- true > +... > +-- Wait a bit to make sure replica waits till user is created. > +fiber.sleep(0.1) > +--- > +... > +box.schema.user.create('cluster') > +--- > +... > +-- The user is created. Let the replica fail auth request due to > +-- a wrong password. > +fiber.sleep(0.1) > +--- > +... > +box.schema.user.passwd('cluster', 'pass') > +--- > +... > +box.schema.user.grant('cluster', 'replication') > +--- > +... > +while box.info.replication[2] == nil do fiber.sleep(0.01) end > +--- > +... > +vclock = test_run:get_vclock('default') > +--- > +... > +vclock[0] = nil > +--- > +... > +_ = test_run:wait_vclock('replica_auth', vclock) > +--- > +... > +test_run:cmd("stop server replica_auth") > +--- > +- true > +... > +test_run:cmd("cleanup server replica_auth") > +--- > +- true > +... > +test_run:cmd("delete server replica_auth") > +--- > +- true > +... > +test_run:cleanup_cluster() > +--- > +... > +box.schema.user.drop('cluster') > +--- > +... > diff --git a/test/replication/gh-3637-misc-no-panic-on-connected.test.lua b/test/replication/gh-3637-misc-no-panic-on-connected.test.lua > new file mode 100644 > index 000000000..c51a2f628 > --- /dev/null > +++ b/test/replication/gh-3637-misc-no-panic-on-connected.test.lua > @@ -0,0 +1,32 @@ > +test_run = require('test_run').new() > +test_run:cmd("restart server default") > + > +-- > +-- Test case for gh-3637, gh-4550. Before the fix replica would > +-- exit with an error if a user does not exist or a password is > +-- incorrect. 
Now check that we don't hang/panic and successfully > +-- connect. > +-- > +fiber = require('fiber') > +test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'") > +test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.05'") > +-- Wait a bit to make sure replica waits till user is created. > +fiber.sleep(0.1) > +box.schema.user.create('cluster') > +-- The user is created. Let the replica fail auth request due to > +-- a wrong password. > +fiber.sleep(0.1) > +box.schema.user.passwd('cluster', 'pass') > +box.schema.user.grant('cluster', 'replication') > + > +while box.info.replication[2] == nil do fiber.sleep(0.01) end > +vclock = test_run:get_vclock('default') > +vclock[0] = nil > +_ = test_run:wait_vclock('replica_auth', vclock) > + > +test_run:cmd("stop server replica_auth") > +test_run:cmd("cleanup server replica_auth") > +test_run:cmd("delete server replica_auth") > +test_run:cleanup_cluster() > + > +box.schema.user.drop('cluster') > diff --git a/test/replication/gh-3642-misc-no-socket-leak-on-replica-disconnect.result b/test/replication/gh-3642-misc-no-socket-leak-on-replica-disconnect.result > new file mode 100644 > index 000000000..d068ad8fc > --- /dev/null > +++ b/test/replication/gh-3642-misc-no-socket-leak-on-replica-disconnect.result > @@ -0,0 +1,95 @@ > +test_run = require('test_run').new() > +--- > +... > +test_run:cmd("restart server default") > +box.schema.user.grant('guest', 'replication') > +--- > +... > +-- gh-3642 - Check that socket file descriptor doesn't leak > +-- when a replica is disconnected. > +rlimit = require('rlimit') > +--- > +... > +lim = rlimit.limit() > +--- > +... > +rlimit.getrlimit(rlimit.RLIMIT_NOFILE, lim) > +--- > +... > +old_fno = lim.rlim_cur > +--- > +... > +lim.rlim_cur = 64 > +--- > +... > +rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim) > +--- > +... > +test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"') > +--- > +- true > +... > +test_run:cmd('start server sock') > +--- > +- true > +... > +test_run:cmd('switch sock') > +--- > +- true > +... > +test_run = require('test_run').new() > +--- > +... > +fiber = require('fiber') > +--- > +... > +test_run:cmd("setopt delimiter ';'") > +--- > +- true > +... > +for i = 1, 64 do > + local replication = box.cfg.replication > + box.cfg{replication = {}} > + box.cfg{replication = replication} > + while box.info.replication[1].upstream.status ~= 'follow' do > + fiber.sleep(0.001) > + end > +end; > +--- > +... > +test_run:cmd("setopt delimiter ''"); > +--- > +- true > +... > +box.info.replication[1].upstream.status > +--- > +- follow > +... > +test_run:cmd('switch default') > +--- > +- true > +... > +lim.rlim_cur = old_fno > +--- > +... > +rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim) > +--- > +... > +test_run:cmd("stop server sock") > +--- > +- true > +... > +test_run:cmd("cleanup server sock") > +--- > +- true > +... > +test_run:cmd("delete server sock") > +--- > +- true > +... > +test_run:cleanup_cluster() > +--- > +... > +box.schema.user.revoke('guest', 'replication') > +--- > +... 
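For readers of this hunk: the fd limit is lowered on the master before
the replica is started, and each iteration of the replica's reconnect
loop makes the master accept a fresh connection. So if a socket leaked
on every round trip, the master would run out of its 64 descriptors
before the 64 iterations finish. The step being looped is just a
drop-and-restore of the replication configuration (a sketch;
'replication' holds the saved box.cfg.replication):

    -- Dropping the configuration closes the connection to the master...
    box.cfg{replication = {}}
    -- ...and restoring it opens a new one. Before the gh-3642 fix a
    -- descriptor could be left behind on each such round trip.
    box.cfg{replication = replication}
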
> diff --git a/test/replication/gh-3642-misc-no-socket-leak-on-replica-disconnect.test.lua b/test/replication/gh-3642-misc-no-socket-leak-on-replica-disconnect.test.lua > new file mode 100644 > index 000000000..9cfbe7214 > --- /dev/null > +++ b/test/replication/gh-3642-misc-no-socket-leak-on-replica-disconnect.test.lua > @@ -0,0 +1,43 @@ > +test_run = require('test_run').new() > +test_run:cmd("restart server default") > + > +box.schema.user.grant('guest', 'replication') > + > +-- gh-3642 - Check that socket file descriptor doesn't leak > +-- when a replica is disconnected. > +rlimit = require('rlimit') > +lim = rlimit.limit() > +rlimit.getrlimit(rlimit.RLIMIT_NOFILE, lim) > +old_fno = lim.rlim_cur > +lim.rlim_cur = 64 > +rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim) > + > +test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"') > +test_run:cmd('start server sock') > +test_run:cmd('switch sock') > +test_run = require('test_run').new() > +fiber = require('fiber') > +test_run:cmd("setopt delimiter ';'") > +for i = 1, 64 do > + local replication = box.cfg.replication > + box.cfg{replication = {}} > + box.cfg{replication = replication} > + while box.info.replication[1].upstream.status ~= 'follow' do > + fiber.sleep(0.001) > + end > +end; > +test_run:cmd("setopt delimiter ''"); > + > +box.info.replication[1].upstream.status > + > +test_run:cmd('switch default') > + > +lim.rlim_cur = old_fno > +rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim) > + > +test_run:cmd("stop server sock") > +test_run:cmd("cleanup server sock") > +test_run:cmd("delete server sock") > +test_run:cleanup_cluster() > + > +box.schema.user.revoke('guest', 'replication') > diff --git a/test/replication/gh-3704-misc-replica-checks-cluster-id.result b/test/replication/gh-3704-misc-replica-checks-cluster-id.result > new file mode 100644 > index 000000000..1ca2913f8 > --- /dev/null > +++ b/test/replication/gh-3704-misc-replica-checks-cluster-id.result > @@ -0,0 +1,68 @@ > +test_run = require('test_run').new() > +--- > +... > +test_run:cmd("restart server default") > +uuid = require('uuid') > +--- > +... > +-- > +-- gh-3704 move cluster id check to replica > +-- > +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") > +--- > +- true > +... > +box.schema.user.grant("guest", "replication") > +--- > +... > +test_run:cmd("start server replica") > +--- > +- true > +... > +test_run:grep_log("replica", "REPLICASET_UUID_MISMATCH") > +--- > +- null > +... > +box.info.replication[2].downstream.status > +--- > +- follow > +... > +-- change master's cluster uuid and check that replica doesn't connect. > +test_run:cmd("stop server replica") > +--- > +- true > +... > +_ = box.space._schema:replace{'cluster', tostring(uuid.new())} > +--- > +... > +-- master believes replica is in cluster, but their cluster UUIDs differ. > +test_run:cmd("start server replica") > +--- > +- true > +... > +test_run:wait_log("replica", "REPLICASET_UUID_MISMATCH", nil, 1.0) > +--- > +- REPLICASET_UUID_MISMATCH > +... > +test_run:wait_downstream(2, {status = 'stopped'}) > +--- > +- true > +... > +test_run:cmd("stop server replica") > +--- > +- true > +... > +test_run:cmd("cleanup server replica") > +--- > +- true > +... > +test_run:cmd("delete server replica") > +--- > +- true > +... > +test_run:cleanup_cluster() > +--- > +... > +box.schema.user.revoke('guest', 'replication') > +--- > +... 
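Context for the hunk above: the replicaset UUID is stored in the
_schema system space under the 'cluster' key, which is why a single
replace on the master is enough to make it look like a foreign
replicaset to its already bootstrapped replica (a sketch, with
comments added):

    uuid = require('uuid')
    -- Overwrite the persistent replicaset UUID. The replica still
    -- carries the old one, so its resubscribe must now stop with
    -- REPLICASET_UUID_MISMATCH instead of following the master.
    _ = box.space._schema:replace{'cluster', tostring(uuid.new())}
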
> diff --git a/test/replication/gh-3704-misc-replica-checks-cluster-id.test.lua b/test/replication/gh-3704-misc-replica-checks-cluster-id.test.lua > new file mode 100644 > index 000000000..00c443a55 > --- /dev/null > +++ b/test/replication/gh-3704-misc-replica-checks-cluster-id.test.lua > @@ -0,0 +1,25 @@ > +test_run = require('test_run').new() > +test_run:cmd("restart server default") > +uuid = require('uuid') > + > +-- > +-- gh-3704 move cluster id check to replica > +-- > +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") > +box.schema.user.grant("guest", "replication") > +test_run:cmd("start server replica") > +test_run:grep_log("replica", "REPLICASET_UUID_MISMATCH") > +box.info.replication[2].downstream.status > +-- change master's cluster uuid and check that replica doesn't connect. > +test_run:cmd("stop server replica") > +_ = box.space._schema:replace{'cluster', tostring(uuid.new())} > +-- master believes replica is in cluster, but their cluster UUIDs differ. > +test_run:cmd("start server replica") > +test_run:wait_log("replica", "REPLICASET_UUID_MISMATCH", nil, 1.0) > +test_run:wait_downstream(2, {status = 'stopped'}) > + > +test_run:cmd("stop server replica") > +test_run:cmd("cleanup server replica") > +test_run:cmd("delete server replica") > +test_run:cleanup_cluster() > +box.schema.user.revoke('guest', 'replication') > diff --git a/test/replication/gh-3711-misc-no-restart-on-same-configuration.result b/test/replication/gh-3711-misc-no-restart-on-same-configuration.result > new file mode 100644 > index 000000000..c1e746f54 > --- /dev/null > +++ b/test/replication/gh-3711-misc-no-restart-on-same-configuration.result > @@ -0,0 +1,101 @@ > +test_run = require('test_run').new() > +--- > +... > +test_run:cmd("restart server default") > +-- > +-- gh-3711 Do not restart replication on box.cfg in case the > +-- configuration didn't change. > +-- > +box.schema.user.grant('guest', 'replication') > +--- > +... > +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") > +--- > +- true > +... > +test_run:cmd("start server replica") > +--- > +- true > +... > +-- Access rights are checked only during reconnect. If the new > +-- and old configurations are equivalent, no reconnect will be > +-- issued and replication should continue working. > +box.schema.user.revoke('guest', 'replication') > +--- > +... > +test_run:cmd("switch replica") > +--- > +- true > +... > +replication = box.cfg.replication[1] > +--- > +... > +box.cfg{replication = {replication}} > +--- > +... > +box.info.status == 'running' > +--- > +- true > +... > +box.cfg{replication = replication} > +--- > +... > +box.info.status == 'running' > +--- > +- true > +... > +-- Check that comparison of tables works as expected as well. > +test_run:cmd("switch default") > +--- > +- true > +... > +box.schema.user.grant('guest', 'replication') > +--- > +... > +test_run:cmd("switch replica") > +--- > +- true > +... > +replication = box.cfg.replication > +--- > +... > +table.insert(replication, box.cfg.listen) > +--- > +... > +test_run:cmd("switch default") > +--- > +- true > +... > +box.schema.user.revoke('guest', 'replication') > +--- > +... > +test_run:cmd("switch replica") > +--- > +- true > +... > +box.cfg{replication = replication} > +--- > +... > +box.info.status == 'running' > +--- > +- true > +... > +test_run:cmd("switch default") > +--- > +- true > +... > +test_run:cmd("stop server replica") > +--- > +- true > +... 
> +test_run:cmd("cleanup server replica") > +--- > +- true > +... > +test_run:cmd("delete server replica") > +--- > +- true > +... > +test_run:cleanup_cluster() > +--- > +... > diff --git a/test/replication/gh-3711-misc-no-restart-on-same-configuration.test.lua b/test/replication/gh-3711-misc-no-restart-on-same-configuration.test.lua > new file mode 100644 > index 000000000..72666c5b7 > --- /dev/null > +++ b/test/replication/gh-3711-misc-no-restart-on-same-configuration.test.lua > @@ -0,0 +1,39 @@ > +test_run = require('test_run').new() > +test_run:cmd("restart server default") > + > +-- > +-- gh-3711 Do not restart replication on box.cfg in case the > +-- configuration didn't change. > +-- > +box.schema.user.grant('guest', 'replication') > +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") > +test_run:cmd("start server replica") > + > +-- Access rights are checked only during reconnect. If the new > +-- and old configurations are equivalent, no reconnect will be > +-- issued and replication should continue working. > +box.schema.user.revoke('guest', 'replication') > +test_run:cmd("switch replica") > +replication = box.cfg.replication[1] > +box.cfg{replication = {replication}} > +box.info.status == 'running' > +box.cfg{replication = replication} > +box.info.status == 'running' > + > +-- Check that comparison of tables works as expected as well. > +test_run:cmd("switch default") > +box.schema.user.grant('guest', 'replication') > +test_run:cmd("switch replica") > +replication = box.cfg.replication > +table.insert(replication, box.cfg.listen) > +test_run:cmd("switch default") > +box.schema.user.revoke('guest', 'replication') > +test_run:cmd("switch replica") > +box.cfg{replication = replication} > +box.info.status == 'running' > + > +test_run:cmd("switch default") > +test_run:cmd("stop server replica") > +test_run:cmd("cleanup server replica") > +test_run:cmd("delete server replica") > +test_run:cleanup_cluster() > diff --git a/test/replication/gh-3760-misc-return-on-quorum-0.result b/test/replication/gh-3760-misc-return-on-quorum-0.result > new file mode 100644 > index 000000000..79295f5c2 > --- /dev/null > +++ b/test/replication/gh-3760-misc-return-on-quorum-0.result > @@ -0,0 +1,23 @@ > +-- > +-- gh-3760: replication quorum 0 on reconfiguration should return > +-- from box.cfg immediately. > +-- > +replication = box.cfg.replication > +--- > +... > +box.cfg{ \ > + replication = {}, \ > + replication_connect_quorum = 0, \ > + replication_connect_timeout = 1000000 \ > +} > +--- > +... > +-- The call below would hang, if quorum 0 is ignored, or checked > +-- too late. > +box.cfg{replication = {'localhost:12345'}} > +--- > +... > +box.info.status > +--- > +- running > +... > diff --git a/test/replication/gh-3760-misc-return-on-quorum-0.test.lua b/test/replication/gh-3760-misc-return-on-quorum-0.test.lua > new file mode 100644 > index 000000000..30089ac23 > --- /dev/null > +++ b/test/replication/gh-3760-misc-return-on-quorum-0.test.lua > @@ -0,0 +1,14 @@ > +-- > +-- gh-3760: replication quorum 0 on reconfiguration should return > +-- from box.cfg immediately. > +-- > +replication = box.cfg.replication > +box.cfg{ \ > + replication = {}, \ > + replication_connect_quorum = 0, \ > + replication_connect_timeout = 1000000 \ > +} > +-- The call below would hang, if quorum 0 is ignored, or checked > +-- too late. > +box.cfg{replication = {'localhost:12345'}} > +box.info.status 2. I think there is something wrong with this test. 
I mean, you change the box.cfg options, but you don't restore them. > diff --git a/test/replication/gh-4399-misc-no-failure-on-error-reading-wal.result b/test/replication/gh-4399-misc-no-failure-on-error-reading-wal.result > new file mode 100644 > index 000000000..46b4f6464 > --- /dev/null > +++ b/test/replication/gh-4399-misc-no-failure-on-error-reading-wal.result > @@ -0,0 +1,94 @@ > +test_run = require('test_run').new() > +--- > +... > +test_run:cmd("restart server default") > +fiber = require('fiber') > +--- > +... > +-- > +-- gh-4399 Check that an error reading WAL directory on subscribe > +-- doesn't lead to a permanent replication failure. > +-- > +box.schema.user.grant("guest", "replication") > +--- > +... > +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") > +--- > +- true > +... > +test_run:cmd("start server replica") > +--- > +- true > +... > +-- Make the WAL directory inaccessible. > +fio = require('fio') > +--- > +... > +path = fio.abspath(box.cfg.wal_dir) > +--- > +... > +fio.chmod(path, 0) > +--- > +- true > +... > +-- Break replication on timeout. > +replication_timeout = box.cfg.replication_timeout > +--- > +... > +box.cfg{replication_timeout = 9000} > +--- > +... > +test_run:cmd("switch replica") > +--- > +- true > +... > +test_run:wait_cond(function() return box.info.replication[1].upstream.status ~= 'follow' end) > +--- > +- true > +... > +require('fiber').sleep(box.cfg.replication_timeout) > +--- > +... > +test_run:cmd("switch default") > +--- > +- true > +... > +box.cfg{replication_timeout = replication_timeout} > +--- > +... > +-- Restore access to the WAL directory. > +-- Wait for replication to be reestablished. > +fio.chmod(path, tonumber('777', 8)) > +--- > +- true > +... > +test_run:cmd("switch replica") > +--- > +- true > +... > +test_run:wait_cond(function() return box.info.replication[1].upstream.status == 'follow' end) > +--- > +- true > +... > +test_run:cmd("switch default") > +--- > +- true > +... > +test_run:cmd("stop server replica") > +--- > +- true > +... > +test_run:cmd("cleanup server replica") > +--- > +- true > +... > +test_run:cmd("delete server replica") > +--- > +- true > +... > +test_run:cleanup_cluster() > +--- > +... > +box.schema.user.revoke('guest', 'replication') > +--- > +... > diff --git a/test/replication/gh-4399-misc-no-failure-on-error-reading-wal.test.lua b/test/replication/gh-4399-misc-no-failure-on-error-reading-wal.test.lua > new file mode 100644 > index 000000000..a926ae590 > --- /dev/null > +++ b/test/replication/gh-4399-misc-no-failure-on-error-reading-wal.test.lua > @@ -0,0 +1,38 @@ > +test_run = require('test_run').new() > +test_run:cmd("restart server default") > +fiber = require('fiber') 3. This is not needed here. > + > +-- > +-- gh-4399 Check that an error reading WAL directory on subscribe > +-- doesn't lead to a permanent replication failure. > +-- > +box.schema.user.grant("guest", "replication") > +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") > +test_run:cmd("start server replica") > + > +-- Make the WAL directory inaccessible. > +fio = require('fio') > +path = fio.abspath(box.cfg.wal_dir) > +fio.chmod(path, 0) > + > +-- Break replication on timeout. 
> +replication_timeout = box.cfg.replication_timeout > +box.cfg{replication_timeout = 9000} > +test_run:cmd("switch replica") > +test_run:wait_cond(function() return box.info.replication[1].upstream.status ~= 'follow' end) > +require('fiber').sleep(box.cfg.replication_timeout) > +test_run:cmd("switch default") > +box.cfg{replication_timeout = replication_timeout} > + > +-- Restore access to the WAL directory. > +-- Wait for replication to be reestablished. > +fio.chmod(path, tonumber('777', 8)) > +test_run:cmd("switch replica") > +test_run:wait_cond(function() return box.info.replication[1].upstream.status == 'follow' end) > +test_run:cmd("switch default") > + > +test_run:cmd("stop server replica") > +test_run:cmd("cleanup server replica") > +test_run:cmd("delete server replica") > +test_run:cleanup_cluster() > +box.schema.user.revoke('guest', 'replication') > diff --git a/test/replication/gh-4424-misc-orphan-on-reconfiguration-error.result b/test/replication/gh-4424-misc-orphan-on-reconfiguration-error.result > new file mode 100644 > index 000000000..c87ef2e05 > --- /dev/null > +++ b/test/replication/gh-4424-misc-orphan-on-reconfiguration-error.result > @@ -0,0 +1,82 @@ > +test_run = require('test_run').new() > +--- > +... > +-- > +-- gh-4424 Always enter orphan mode on error in replication > +-- configuration change. > +-- > +replication_connect_timeout = box.cfg.replication_connect_timeout > +--- > +... > +replication_connect_quorum = box.cfg.replication_connect_quorum > +--- > +... > +box.cfg{replication="12345", replication_connect_timeout=0.1, replication_connect_quorum=1} > +--- > +... > +box.info.status > +--- > +- orphan > +... > +box.info.ro > +--- > +- true > +... > +-- reset replication => leave orphan mode > +box.cfg{replication=""} > +--- > +... > +box.info.status > +--- > +- running > +... > +box.info.ro > +--- > +- false > +... > +-- no switch to orphan when quorum == 0 > +box.cfg{replication="12345", replication_connect_quorum=0} > +--- > +... > +box.info.status > +--- > +- running > +... > +box.info.ro > +--- > +- false > +... > +-- we could connect to one out of two replicas. Set orphan. > +box.cfg{replication_connect_quorum=2} > +--- > +... > +box.cfg{replication={box.cfg.listen, "12345"}} > +--- > +... > +box.info.status > +--- > +- orphan > +... > +box.info.ro > +--- > +- true > +... > +-- lower quorum => leave orphan mode > +box.cfg{replication_connect_quorum=1} > +--- > +... > +box.info.status > +--- > +- running > +... > +box.info.ro > +--- > +- false > +... > +box.cfg{ \ > + replication = {}, \ > + replication_connect_quorum = replication_connect_quorum, \ > + replication_connect_timeout = replication_connect_timeout \ > +} > +--- > +... > diff --git a/test/replication/gh-4424-misc-orphan-on-reconfiguration-error.test.lua b/test/replication/gh-4424-misc-orphan-on-reconfiguration-error.test.lua > new file mode 100644 > index 000000000..6f42863c3 > --- /dev/null > +++ b/test/replication/gh-4424-misc-orphan-on-reconfiguration-error.test.lua > @@ -0,0 +1,35 @@ > +test_run = require('test_run').new() > + > +-- > +-- gh-4424 Always enter orphan mode on error in replication > +-- configuration change. 
> +-- > +replication_connect_timeout = box.cfg.replication_connect_timeout > +replication_connect_quorum = box.cfg.replication_connect_quorum > +box.cfg{replication="12345", replication_connect_timeout=0.1, replication_connect_quorum=1} > +box.info.status > +box.info.ro > +-- reset replication => leave orphan mode > +box.cfg{replication=""} > +box.info.status > +box.info.ro > +-- no switch to orphan when quorum == 0 > +box.cfg{replication="12345", replication_connect_quorum=0} > +box.info.status > +box.info.ro > + > +-- we could connect to one out of two replicas. Set orphan. > +box.cfg{replication_connect_quorum=2} > +box.cfg{replication={box.cfg.listen, "12345"}} > +box.info.status > +box.info.ro > +-- lower quorum => leave orphan mode > +box.cfg{replication_connect_quorum=1} > +box.info.status > +box.info.ro > + > +box.cfg{ \ > + replication = {}, \ > + replication_connect_quorum = replication_connect_quorum, \ > + replication_connect_timeout = replication_connect_timeout \ > +} > diff --git a/test/replication/misc.result b/test/replication/misc.result > deleted file mode 100644 > index e5d1f560e..000000000 > --- a/test/replication/misc.result > +++ /dev/null > @@ -1,866 +0,0 @@ > -uuid = require('uuid') > ---- > -... > -test_run = require('test_run').new() > ---- > -... > -box.schema.user.grant('guest', 'replication') > ---- > -... > --- gh-2991 - Tarantool asserts on box.cfg.replication update if one of > --- servers is dead > -replication_timeout = box.cfg.replication_timeout > ---- > -... > -replication_connect_timeout = box.cfg.replication_connect_timeout > ---- > -... > -box.cfg{replication_timeout=0.05, replication_connect_timeout=0.05, replication={}} > ---- > -... > -box.cfg{replication_connect_quorum=2} > ---- > -... > -box.cfg{replication = {'127.0.0.1:12345', box.cfg.listen}} > ---- > -... > -box.info.status > ---- > -- orphan > -... > -box.info.ro > ---- > -- true > -... > --- gh-3606 - Tarantool crashes if box.cfg.replication is updated concurrently > -fiber = require('fiber') > ---- > -... > -c = fiber.channel(2) > ---- > -... > -f = function() fiber.create(function() pcall(box.cfg, {replication = {12345}}) c:put(true) end) end > ---- > -... > -f() > ---- > -... > -f() > ---- > -... > -c:get() > ---- > -- true > -... > -c:get() > ---- > -- true > -... > -box.cfg{replication = "", replication_timeout = replication_timeout, replication_connect_timeout = replication_connect_timeout} > ---- > -... > -box.info.status > ---- > -- running > -... > -box.info.ro > ---- > -- false > -... > --- gh-3111 - Allow to rebootstrap a replica from a read-only master > -replica_uuid = uuid.new() > ---- > -... > -test_run:cmd('create server test with rpl_master=default, script="replication/replica_uuid.lua"') > ---- > -- true > -... > -test_run:cmd(string.format('start server test with args="%s"', replica_uuid)) > ---- > -- true > -... > -test_run:cmd('stop server test') > ---- > -- true > -... > -test_run:cmd('cleanup server test') > ---- > -- true > -... > -box.cfg{read_only = true} > ---- > -... > -test_run:cmd(string.format('start server test with args="%s"', replica_uuid)) > ---- > -- true > -... > -test_run:cmd('stop server test') > ---- > -- true > -... > -test_run:cmd('cleanup server test') > ---- > -- true > -... > -box.cfg{read_only = false} > ---- > -... > -test_run:cmd('delete server test') > ---- > -- true > -... > -test_run:cleanup_cluster() > ---- > -... 
> --- gh-3160 - Send heartbeats if there are changes from a remote master only > -SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' } > ---- > -... > --- Deploy a cluster. > -test_run:create_cluster(SERVERS, "replication", {args="0.03"}) > ---- > -... > -test_run:wait_fullmesh(SERVERS) > ---- > -... > -test_run:cmd("switch autobootstrap3") > ---- > -- true > -... > -test_run = require('test_run').new() > ---- > -... > -fiber = require('fiber') > ---- > -... > -_ = box.schema.space.create('test_timeout'):create_index('pk') > ---- > -... > -test_run:cmd("setopt delimiter ';'") > ---- > -- true > -... > -function wait_not_follow(replicaA, replicaB) > - return test_run:wait_cond(function() > - return replicaA.status ~= 'follow' or replicaB.status ~= 'follow' > - end, box.cfg.replication_timeout) > -end; > ---- > -... > -function test_timeout() > - local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream > - local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream > - local follows = test_run:wait_cond(function() > - return replicaA.status == 'follow' or replicaB.status == 'follow' > - end) > - if not follows then error('replicas are not in the follow status') end > - for i = 0, 99 do > - box.space.test_timeout:replace({1}) > - if wait_not_follow(replicaA, replicaB) then > - return error(box.info.replication) > - end > - end > - return true > -end; > ---- > -... > -test_run:cmd("setopt delimiter ''"); > ---- > -- true > -... > -test_timeout() > ---- > -- true > -... > --- gh-3247 - Sequence-generated value is not replicated in case > --- the request was sent via iproto. > -test_run:cmd("switch autobootstrap1") > ---- > -- true > -... > -net_box = require('net.box') > ---- > -... > -_ = box.schema.space.create('space1') > ---- > -... > -_ = box.schema.sequence.create('seq') > ---- > -... > -_ = box.space.space1:create_index('primary', {sequence = true} ) > ---- > -... > -_ = box.space.space1:create_index('secondary', {parts = {2, 'unsigned'}}) > ---- > -... > -box.schema.user.grant('guest', 'read,write', 'space', 'space1') > ---- > -... > -c = net_box.connect(box.cfg.listen) > ---- > -... > -c.space.space1:insert{box.NULL, "data"} -- fails, but bumps sequence value > ---- > -- error: 'Tuple field 2 type does not match one required by operation: expected unsigned' > -... > -c.space.space1:insert{box.NULL, 1, "data"} > ---- > -- [2, 1, 'data'] > -... > -box.space.space1:select{} > ---- > -- - [2, 1, 'data'] > -... > -vclock = test_run:get_vclock("autobootstrap1") > ---- > -... > -vclock[0] = nil > ---- > -... > -_ = test_run:wait_vclock("autobootstrap2", vclock) > ---- > -... > -test_run:cmd("switch autobootstrap2") > ---- > -- true > -... > -box.space.space1:select{} > ---- > -- - [2, 1, 'data'] > -... > -test_run:cmd("switch autobootstrap1") > ---- > -- true > -... > -box.space.space1:drop() > ---- > -... > -test_run:cmd("switch default") > ---- > -- true > -... > -test_run:drop_cluster(SERVERS) > ---- > -... > -test_run:cleanup_cluster() > ---- > -... > --- gh-3642 - Check that socket file descriptor doesn't leak > --- when a replica is disconnected. > -rlimit = require('rlimit') > ---- > -... > -lim = rlimit.limit() > ---- > -... > -rlimit.getrlimit(rlimit.RLIMIT_NOFILE, lim) > ---- > -... > -old_fno = lim.rlim_cur > ---- > -... > -lim.rlim_cur = 64 > ---- > -... > -rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim) > ---- > -... 
> -test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"') > ---- > -- true > -... > -test_run:cmd('start server sock') > ---- > -- true > -... > -test_run:cmd('switch sock') > ---- > -- true > -... > -test_run = require('test_run').new() > ---- > -... > -fiber = require('fiber') > ---- > -... > -test_run:cmd("setopt delimiter ';'") > ---- > -- true > -... > -for i = 1, 64 do > - local replication = box.cfg.replication > - box.cfg{replication = {}} > - box.cfg{replication = replication} > - while box.info.replication[1].upstream.status ~= 'follow' do > - fiber.sleep(0.001) > - end > -end; > ---- > -... > -test_run:cmd("setopt delimiter ''"); > ---- > -- true > -... > -box.info.replication[1].upstream.status > ---- > -- follow > -... > -test_run:cmd('switch default') > ---- > -- true > -... > -lim.rlim_cur = old_fno > ---- > -... > -rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim) > ---- > -... > -test_run:cmd("stop server sock") > ---- > -- true > -... > -test_run:cmd("cleanup server sock") > ---- > -- true > -... > -test_run:cmd("delete server sock") > ---- > -- true > -... > -test_run:cleanup_cluster() > ---- > -... > -box.schema.user.revoke('guest', 'replication') > ---- > -... > --- gh-3510 assertion failure in replica_on_applier_disconnect() > -test_run:cmd('create server er_load1 with script="replication/er_load1.lua"') > ---- > -- true > -... > -test_run:cmd('create server er_load2 with script="replication/er_load2.lua"') > ---- > -- true > -... > -test_run:cmd('start server er_load1 with wait=False, wait_load=False') > ---- > -- true > -... > --- Instance er_load2 will fail with error ER_REPLICASET_UUID_MISMATCH. > --- This is OK since we only test here that er_load1 doesn't assert. > -test_run:cmd('start server er_load2 with wait=True, wait_load=True, crash_expected = True') > ---- > -- false > -... > -test_run:cmd('stop server er_load1') > ---- > -- true > -... > --- er_load2 exits automatically. > -test_run:cmd('cleanup server er_load1') > ---- > -- true > -... > -test_run:cmd('cleanup server er_load2') > ---- > -- true > -... > -test_run:cmd('delete server er_load1') > ---- > -- true > -... > -test_run:cmd('delete server er_load2') > ---- > -- true > -... > -test_run:cleanup_cluster() > ---- > -... > --- > --- Test case for gh-3637, gh-4550. Before the fix replica would > --- exit with an error if a user does not exist or a password is > --- incorrect. Now check that we don't hang/panic and successfully > --- connect. > --- > -fiber = require('fiber') > ---- > -... > -test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'") > ---- > -- true > -... > -test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.05'") > ---- > -- true > -... > --- Wait a bit to make sure replica waits till user is created. > -fiber.sleep(0.1) > ---- > -... > -box.schema.user.create('cluster') > ---- > -... > --- The user is created. Let the replica fail auth request due to > --- a wrong password. > -fiber.sleep(0.1) > ---- > -... > -box.schema.user.passwd('cluster', 'pass') > ---- > -... > -box.schema.user.grant('cluster', 'replication') > ---- > -... > -while box.info.replication[2] == nil do fiber.sleep(0.01) end > ---- > -... > -vclock = test_run:get_vclock('default') > ---- > -... > -vclock[0] = nil > ---- > -... > -_ = test_run:wait_vclock('replica_auth', vclock) > ---- > -... > -test_run:cmd("stop server replica_auth") > ---- > -- true > -... 
> -test_run:cmd("cleanup server replica_auth") > ---- > -- true > -... > -test_run:cmd("delete server replica_auth") > ---- > -- true > -... > -test_run:cleanup_cluster() > ---- > -... > -box.schema.user.drop('cluster') > ---- > -... > --- > --- Test case for gh-3610. Before the fix replica would fail with the assertion > --- when trying to connect to the same master twice. > --- > -box.schema.user.grant('guest', 'replication') > ---- > -... > -test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") > ---- > -- true > -... > -test_run:cmd("start server replica") > ---- > -- true > -... > -test_run:cmd("switch replica") > ---- > -- true > -... > -replication = box.cfg.replication[1] > ---- > -... > -box.cfg{replication = {replication, replication}} > ---- > -- error: 'Incorrect value for option ''replication'': duplicate connection to the > - same replica' > -... > --- Check the case when duplicate connection is detected in the background. > -test_run:cmd("switch default") > ---- > -- true > -... > -listen = box.cfg.listen > ---- > -... > -box.cfg{listen = ''} > ---- > -... > -test_run:cmd("switch replica") > ---- > -- true > -... > -box.cfg{replication_connect_quorum = 0, replication_connect_timeout = 0.01} > ---- > -... > -box.cfg{replication = {replication, replication}} > ---- > -... > -test_run:cmd("switch default") > ---- > -- true > -... > -box.cfg{listen = listen} > ---- > -... > -while test_run:grep_log('replica', 'duplicate connection') == nil do fiber.sleep(0.01) end > ---- > -... > -test_run:cmd("stop server replica") > ---- > -- true > -... > -test_run:cmd("cleanup server replica") > ---- > -- true > -... > -test_run:cmd("delete server replica") > ---- > -- true > -... > -test_run:cleanup_cluster() > ---- > -... > -box.schema.user.revoke('guest', 'replication') > ---- > -... > --- > --- gh-3711 Do not restart replication on box.cfg in case the > --- configuration didn't change. > --- > -box.schema.user.grant('guest', 'replication') > ---- > -... > -test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") > ---- > -- true > -... > -test_run:cmd("start server replica") > ---- > -- true > -... > --- Access rights are checked only during reconnect. If the new > --- and old configurations are equivalent, no reconnect will be > --- issued and replication should continue working. > -box.schema.user.revoke('guest', 'replication') > ---- > -... > -test_run:cmd("switch replica") > ---- > -- true > -... > -replication = box.cfg.replication[1] > ---- > -... > -box.cfg{replication = {replication}} > ---- > -... > -box.info.status == 'running' > ---- > -- true > -... > -box.cfg{replication = replication} > ---- > -... > -box.info.status == 'running' > ---- > -- true > -... > --- Check that comparison of tables works as expected as well. > -test_run:cmd("switch default") > ---- > -- true > -... > -box.schema.user.grant('guest', 'replication') > ---- > -... > -test_run:cmd("switch replica") > ---- > -- true > -... > -replication = box.cfg.replication > ---- > -... > -table.insert(replication, box.cfg.listen) > ---- > -... > -test_run:cmd("switch default") > ---- > -- true > -... > -box.schema.user.revoke('guest', 'replication') > ---- > -... > -test_run:cmd("switch replica") > ---- > -- true > -... > -box.cfg{replication = replication} > ---- > -... > -box.info.status == 'running' > ---- > -- true > -... > -test_run:cmd("switch default") > ---- > -- true > -... 
> -test_run:cmd("stop server replica") > ---- > -- true > -... > -test_run:cmd("cleanup server replica") > ---- > -- true > -... > -test_run:cmd("delete server replica") > ---- > -- true > -... > -test_run:cleanup_cluster() > ---- > -... > --- > --- gh-3704 move cluster id check to replica > --- > -test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") > ---- > -- true > -... > -box.schema.user.grant("guest", "replication") > ---- > -... > -test_run:cmd("start server replica") > ---- > -- true > -... > -test_run:grep_log("replica", "REPLICASET_UUID_MISMATCH") > ---- > -- null > -... > -box.info.replication[2].downstream.status > ---- > -- follow > -... > --- change master's cluster uuid and check that replica doesn't connect. > -test_run:cmd("stop server replica") > ---- > -- true > -... > -_ = box.space._schema:replace{'cluster', tostring(uuid.new())} > ---- > -... > --- master believes replica is in cluster, but their cluster UUIDs differ. > -test_run:cmd("start server replica") > ---- > -- true > -... > -test_run:wait_log("replica", "REPLICASET_UUID_MISMATCH", nil, 1.0) > ---- > -- REPLICASET_UUID_MISMATCH > -... > -test_run:wait_downstream(2, {status = 'stopped'}) > ---- > -- true > -... > -test_run:cmd("stop server replica") > ---- > -- true > -... > -test_run:cmd("cleanup server replica") > ---- > -- true > -... > -test_run:cmd("delete server replica") > ---- > -- true > -... > -test_run:cleanup_cluster() > ---- > -... > -box.schema.user.revoke('guest', 'replication') > ---- > -... > --- > --- gh-4399 Check that an error reading WAL directory on subscribe > --- doesn't lead to a permanent replication failure. > --- > -box.schema.user.grant("guest", "replication") > ---- > -... > -test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") > ---- > -- true > -... > -test_run:cmd("start server replica") > ---- > -- true > -... > --- Make the WAL directory inaccessible. > -fio = require('fio') > ---- > -... > -path = fio.abspath(box.cfg.wal_dir) > ---- > -... > -fio.chmod(path, 0) > ---- > -- true > -... > --- Break replication on timeout. > -replication_timeout = box.cfg.replication_timeout > ---- > -... > -box.cfg{replication_timeout = 9000} > ---- > -... > -test_run:cmd("switch replica") > ---- > -- true > -... > -test_run:wait_cond(function() return box.info.replication[1].upstream.status ~= 'follow' end) > ---- > -- true > -... > -require('fiber').sleep(box.cfg.replication_timeout) > ---- > -... > -test_run:cmd("switch default") > ---- > -- true > -... > -box.cfg{replication_timeout = replication_timeout} > ---- > -... > --- Restore access to the WAL directory. > --- Wait for replication to be reestablished. > -fio.chmod(path, tonumber('777', 8)) > ---- > -- true > -... > -test_run:cmd("switch replica") > ---- > -- true > -... > -test_run:wait_cond(function() return box.info.replication[1].upstream.status == 'follow' end) > ---- > -- true > -... > -test_run:cmd("switch default") > ---- > -- true > -... > -test_run:cmd("stop server replica") > ---- > -- true > -... > -test_run:cmd("cleanup server replica") > ---- > -- true > -... > -test_run:cmd("delete server replica") > ---- > -- true > -... > -test_run:cleanup_cluster() > ---- > -... > -box.schema.user.revoke('guest', 'replication') > ---- > -... > --- > --- gh-4424 Always enter orphan mode on error in replication > --- configuration change. > --- > -replication_connect_timeout = box.cfg.replication_connect_timeout > ---- > -... 
> -replication_connect_quorum = box.cfg.replication_connect_quorum
> ----
> -...
> -box.cfg{replication="12345", replication_connect_timeout=0.1, replication_connect_quorum=1}
> ----
> -...
> -box.info.status
> ----
> -- orphan
> -...
> -box.info.ro
> ----
> -- true
> -...
> --- reset replication => leave orphan mode
> -box.cfg{replication=""}
> ----
> -...
> -box.info.status
> ----
> -- running
> -...
> -box.info.ro
> ----
> -- false
> -...
> --- no switch to orphan when quorum == 0
> -box.cfg{replication="12345", replication_connect_quorum=0}
> ----
> -...
> -box.info.status
> ----
> -- running
> -...
> -box.info.ro
> ----
> -- false
> -...
> --- we could connect to one out of two replicas. Set orphan.
> -box.cfg{replication_connect_quorum=2}
> ----
> -...
> -box.cfg{replication={box.cfg.listen, "12345"}}
> ----
> -...
> -box.info.status
> ----
> -- orphan
> -...
> -box.info.ro
> ----
> -- true
> -...
> --- lower quorum => leave orphan mode
> -box.cfg{replication_connect_quorum=1}
> ----
> -...
> -box.info.status
> ----
> -- running
> -...
> -box.info.ro
> ----
> -- false
> -...
> ---
> --- gh-3760: replication quorum 0 on reconfiguration should return
> --- from box.cfg immediately.
> ---
> -replication = box.cfg.replication
> ----
> -...
> -box.cfg{ \
> -    replication = {}, \
> -    replication_connect_quorum = 0, \
> -    replication_connect_timeout = 1000000 \
> -}
> ----
> -...
> --- The call below would hang, if quorum 0 is ignored, or checked
> --- too late.
> -box.cfg{replication = {'localhost:12345'}}
> ----
> -...
> -box.info.status
> ----
> -- running
> -...
> -box.cfg{ \
> -    replication = {}, \
> -    replication_connect_quorum = replication_connect_quorum, \
> -    replication_connect_timeout = replication_connect_timeout \
> -}
> ----
> -...
> diff --git a/test/replication/misc.skipcond b/test/replication/misc.skipcond
> deleted file mode 100644
> index 48e17903e..000000000
> --- a/test/replication/misc.skipcond
> +++ /dev/null
> @@ -1,7 +0,0 @@
> -import platform
> -
> -# Disabled on FreeBSD due to flaky fail #4271.
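(On this hunk: misc.skipcond is deleted, but I do not see the FreeBSD
exclusion for #4271 reappear for any of the new files. If it is still
needed, each affected new test would need its own skipcond reusing the same
body, e.g. a hypothetical gh-3160-misc-heartbeats-on-master-changes.skipcond:

    import platform

    # Disabled on FreeBSD due to flaky fail #4271.
    if platform.system() == 'FreeBSD':
        self.skip = 1

Or was dropping it intentional? Worth a note in the commit message either
way.)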
> -if platform.system() == 'FreeBSD':
> -    self.skip = 1
> -
> -# vim: set ft=python:
> diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
> deleted file mode 100644
> index d285b014a..000000000
> --- a/test/replication/misc.test.lua
> +++ /dev/null
> @@ -1,356 +0,0 @@
> -uuid = require('uuid')
> -test_run = require('test_run').new()
> -
> -box.schema.user.grant('guest', 'replication')
> -
> --- gh-2991 - Tarantool asserts on box.cfg.replication update if one of
> --- servers is dead
> -replication_timeout = box.cfg.replication_timeout
> -replication_connect_timeout = box.cfg.replication_connect_timeout
> -box.cfg{replication_timeout=0.05, replication_connect_timeout=0.05, replication={}}
> -box.cfg{replication_connect_quorum=2}
> -box.cfg{replication = {'127.0.0.1:12345', box.cfg.listen}}
> -box.info.status
> -box.info.ro
> -
> --- gh-3606 - Tarantool crashes if box.cfg.replication is updated concurrently
> -fiber = require('fiber')
> -c = fiber.channel(2)
> -f = function() fiber.create(function() pcall(box.cfg, {replication = {12345}}) c:put(true) end) end
> -f()
> -f()
> -c:get()
> -c:get()
> -
> -box.cfg{replication = "", replication_timeout = replication_timeout, replication_connect_timeout = replication_connect_timeout}
> -box.info.status
> -box.info.ro
> -
> --- gh-3111 - Allow to rebootstrap a replica from a read-only master
> -replica_uuid = uuid.new()
> -test_run:cmd('create server test with rpl_master=default, script="replication/replica_uuid.lua"')
> -test_run:cmd(string.format('start server test with args="%s"', replica_uuid))
> -test_run:cmd('stop server test')
> -test_run:cmd('cleanup server test')
> -box.cfg{read_only = true}
> -test_run:cmd(string.format('start server test with args="%s"', replica_uuid))
> -test_run:cmd('stop server test')
> -test_run:cmd('cleanup server test')
> -box.cfg{read_only = false}
> -test_run:cmd('delete server test')
> -test_run:cleanup_cluster()
> -
> --- gh-3160 - Send heartbeats if there are changes from a remote master only
> -SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
> -
> --- Deploy a cluster.
> -test_run:create_cluster(SERVERS, "replication", {args="0.03"})
> -test_run:wait_fullmesh(SERVERS)
> -test_run:cmd("switch autobootstrap3")
> -test_run = require('test_run').new()
> -fiber = require('fiber')
> -_ = box.schema.space.create('test_timeout'):create_index('pk')
> -test_run:cmd("setopt delimiter ';'")
> -function wait_not_follow(replicaA, replicaB)
> -    return test_run:wait_cond(function()
> -        return replicaA.status ~= 'follow' or replicaB.status ~= 'follow'
> -    end, box.cfg.replication_timeout)
> -end;
> -function test_timeout()
> -    local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
> -    local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
> -    local follows = test_run:wait_cond(function()
> -        return replicaA.status == 'follow' or replicaB.status == 'follow'
> -    end)
> -    if not follows then error('replicas are not in the follow status') end
> -    for i = 0, 99 do
> -        box.space.test_timeout:replace({1})
> -        if wait_not_follow(replicaA, replicaB) then
> -            return error(box.info.replication)
> -        end
> -    end
> -    return true
> -end;
> -test_run:cmd("setopt delimiter ''");
> -test_timeout()
> -
> --- gh-3247 - Sequence-generated value is not replicated in case
> --- the request was sent via iproto.
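(A note on the gh-3606 reproducer above, mostly for whoever reads the split
file later: fiber.channel(2) is sized so that both background fibers can
c:put(true) without blocking, and the two c:get() calls are what actually
synchronize the test with them:

    c = fiber.channel(2) -- capacity 2: both puts return immediately
    -- each f() spawns a fiber that pcall()s box.cfg and then c:put(true)
    c:get() -- blocks until the first fiber finishes
    c:get() -- ... and the second

Might be worth keeping as a comment in the new gh-3606 file.)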
> -test_run:cmd("switch autobootstrap1") > -net_box = require('net.box') > -_ = box.schema.space.create('space1') > -_ = box.schema.sequence.create('seq') > -_ = box.space.space1:create_index('primary', {sequence = true} ) > -_ = box.space.space1:create_index('secondary', {parts = {2, 'unsigned'}}) > -box.schema.user.grant('guest', 'read,write', 'space', 'space1') > -c = net_box.connect(box.cfg.listen) > -c.space.space1:insert{box.NULL, "data"} -- fails, but bumps sequence value > -c.space.space1:insert{box.NULL, 1, "data"} > -box.space.space1:select{} > -vclock = test_run:get_vclock("autobootstrap1") > -vclock[0] = nil > -_ = test_run:wait_vclock("autobootstrap2", vclock) > -test_run:cmd("switch autobootstrap2") > -box.space.space1:select{} > -test_run:cmd("switch autobootstrap1") > -box.space.space1:drop() > - > -test_run:cmd("switch default") > -test_run:drop_cluster(SERVERS) > -test_run:cleanup_cluster() > - > --- gh-3642 - Check that socket file descriptor doesn't leak > --- when a replica is disconnected. > -rlimit = require('rlimit') > -lim = rlimit.limit() > -rlimit.getrlimit(rlimit.RLIMIT_NOFILE, lim) > -old_fno = lim.rlim_cur > -lim.rlim_cur = 64 > -rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim) > - > -test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"') > -test_run:cmd('start server sock') > -test_run:cmd('switch sock') > -test_run = require('test_run').new() > -fiber = require('fiber') > -test_run:cmd("setopt delimiter ';'") > -for i = 1, 64 do > - local replication = box.cfg.replication > - box.cfg{replication = {}} > - box.cfg{replication = replication} > - while box.info.replication[1].upstream.status ~= 'follow' do > - fiber.sleep(0.001) > - end > -end; > -test_run:cmd("setopt delimiter ''"); > - > -box.info.replication[1].upstream.status > - > -test_run:cmd('switch default') > - > -lim.rlim_cur = old_fno > -rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim) > - > -test_run:cmd("stop server sock") > -test_run:cmd("cleanup server sock") > -test_run:cmd("delete server sock") > -test_run:cleanup_cluster() > - > -box.schema.user.revoke('guest', 'replication') > - > --- gh-3510 assertion failure in replica_on_applier_disconnect() > -test_run:cmd('create server er_load1 with script="replication/er_load1.lua"') > -test_run:cmd('create server er_load2 with script="replication/er_load2.lua"') > -test_run:cmd('start server er_load1 with wait=False, wait_load=False') > --- Instance er_load2 will fail with error ER_REPLICASET_UUID_MISMATCH. > --- This is OK since we only test here that er_load1 doesn't assert. > -test_run:cmd('start server er_load2 with wait=True, wait_load=True, crash_expected = True') > -test_run:cmd('stop server er_load1') > --- er_load2 exits automatically. > -test_run:cmd('cleanup server er_load1') > -test_run:cmd('cleanup server er_load2') > -test_run:cmd('delete server er_load1') > -test_run:cmd('delete server er_load2') > -test_run:cleanup_cluster() > - > --- > --- Test case for gh-3637, gh-4550. Before the fix replica would > --- exit with an error if a user does not exist or a password is > --- incorrect. Now check that we don't hang/panic and successfully > --- connect. > --- > -fiber = require('fiber') > -test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'") > -test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.05'") > --- Wait a bit to make sure replica waits till user is created. 
> -fiber.sleep(0.1)
> -box.schema.user.create('cluster')
> --- The user is created. Let the replica fail auth request due to
> --- a wrong password.
> -fiber.sleep(0.1)
> -box.schema.user.passwd('cluster', 'pass')
> -box.schema.user.grant('cluster', 'replication')
> -
> -while box.info.replication[2] == nil do fiber.sleep(0.01) end
> -vclock = test_run:get_vclock('default')
> -vclock[0] = nil
> -_ = test_run:wait_vclock('replica_auth', vclock)
> -
> -test_run:cmd("stop server replica_auth")
> -test_run:cmd("cleanup server replica_auth")
> -test_run:cmd("delete server replica_auth")
> -test_run:cleanup_cluster()
> -
> -box.schema.user.drop('cluster')
> -
> ---
> --- Test case for gh-3610. Before the fix replica would fail with the assertion
> --- when trying to connect to the same master twice.
> ---
> -box.schema.user.grant('guest', 'replication')
> -test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
> -test_run:cmd("start server replica")
> -test_run:cmd("switch replica")
> -replication = box.cfg.replication[1]
> -box.cfg{replication = {replication, replication}}
> -
> --- Check the case when duplicate connection is detected in the background.
> -test_run:cmd("switch default")
> -listen = box.cfg.listen
> -box.cfg{listen = ''}
> -
> -test_run:cmd("switch replica")
> -box.cfg{replication_connect_quorum = 0, replication_connect_timeout = 0.01}
> -box.cfg{replication = {replication, replication}}
> -
> -test_run:cmd("switch default")
> -box.cfg{listen = listen}
> -while test_run:grep_log('replica', 'duplicate connection') == nil do fiber.sleep(0.01) end
> -
> -test_run:cmd("stop server replica")
> -test_run:cmd("cleanup server replica")
> -test_run:cmd("delete server replica")
> -test_run:cleanup_cluster()
> -box.schema.user.revoke('guest', 'replication')
> -
> ---
> --- gh-3711 Do not restart replication on box.cfg in case the
> --- configuration didn't change.
> ---
> -box.schema.user.grant('guest', 'replication')
> -test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
> -test_run:cmd("start server replica")
> -
> --- Access rights are checked only during reconnect. If the new
> --- and old configurations are equivalent, no reconnect will be
> --- issued and replication should continue working.
> -box.schema.user.revoke('guest', 'replication')
> -test_run:cmd("switch replica")
> -replication = box.cfg.replication[1]
> -box.cfg{replication = {replication}}
> -box.info.status == 'running'
> -box.cfg{replication = replication}
> -box.info.status == 'running'
> -
> --- Check that comparison of tables works as expected as well.
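(On the gh-3711 hunk above: the easy-to-miss part is that
box.cfg{replication = {replication}} and box.cfg{replication = replication}
describe the same effective source list, a one-URI table vs. the bare URI
string, which is exactly why no reconnect must happen between them.
Illustration with a made-up URI:

    box.cfg{replication = 'localhost:3301'}
    box.cfg{replication = {'localhost:3301'}} -- equivalent, no restart expected

A comment to that effect in the new file would help.)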
> -test_run:cmd("switch default") > -box.schema.user.grant('guest', 'replication') > -test_run:cmd("switch replica") > -replication = box.cfg.replication > -table.insert(replication, box.cfg.listen) > -test_run:cmd("switch default") > -box.schema.user.revoke('guest', 'replication') > -test_run:cmd("switch replica") > -box.cfg{replication = replication} > -box.info.status == 'running' > - > -test_run:cmd("switch default") > -test_run:cmd("stop server replica") > -test_run:cmd("cleanup server replica") > -test_run:cmd("delete server replica") > -test_run:cleanup_cluster() > - > --- > --- gh-3704 move cluster id check to replica > --- > -test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") > -box.schema.user.grant("guest", "replication") > -test_run:cmd("start server replica") > -test_run:grep_log("replica", "REPLICASET_UUID_MISMATCH") > -box.info.replication[2].downstream.status > --- change master's cluster uuid and check that replica doesn't connect. > -test_run:cmd("stop server replica") > -_ = box.space._schema:replace{'cluster', tostring(uuid.new())} > --- master believes replica is in cluster, but their cluster UUIDs differ. > -test_run:cmd("start server replica") > -test_run:wait_log("replica", "REPLICASET_UUID_MISMATCH", nil, 1.0) > -test_run:wait_downstream(2, {status = 'stopped'}) > - > -test_run:cmd("stop server replica") > -test_run:cmd("cleanup server replica") > -test_run:cmd("delete server replica") > -test_run:cleanup_cluster() > -box.schema.user.revoke('guest', 'replication') > - > --- > --- gh-4399 Check that an error reading WAL directory on subscribe > --- doesn't lead to a permanent replication failure. > --- > -box.schema.user.grant("guest", "replication") > -test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") > -test_run:cmd("start server replica") > - > --- Make the WAL directory inaccessible. > -fio = require('fio') > -path = fio.abspath(box.cfg.wal_dir) > -fio.chmod(path, 0) > - > --- Break replication on timeout. > -replication_timeout = box.cfg.replication_timeout > -box.cfg{replication_timeout = 9000} > -test_run:cmd("switch replica") > -test_run:wait_cond(function() return box.info.replication[1].upstream.status ~= 'follow' end) > -require('fiber').sleep(box.cfg.replication_timeout) > -test_run:cmd("switch default") > -box.cfg{replication_timeout = replication_timeout} > - > --- Restore access to the WAL directory. > --- Wait for replication to be reestablished. > -fio.chmod(path, tonumber('777', 8)) > -test_run:cmd("switch replica") > -test_run:wait_cond(function() return box.info.replication[1].upstream.status == 'follow' end) > -test_run:cmd("switch default") > - > -test_run:cmd("stop server replica") > -test_run:cmd("cleanup server replica") > -test_run:cmd("delete server replica") > -test_run:cleanup_cluster() > -box.schema.user.revoke('guest', 'replication') > - > --- > --- gh-4424 Always enter orphan mode on error in replication > --- configuration change. 
> ---
> -replication_connect_timeout = box.cfg.replication_connect_timeout
> -replication_connect_quorum = box.cfg.replication_connect_quorum
> -box.cfg{replication="12345", replication_connect_timeout=0.1, replication_connect_quorum=1}
> -box.info.status
> -box.info.ro
> --- reset replication => leave orphan mode
> -box.cfg{replication=""}
> -box.info.status
> -box.info.ro
> --- no switch to orphan when quorum == 0
> -box.cfg{replication="12345", replication_connect_quorum=0}
> -box.info.status
> -box.info.ro
> -
> --- we could connect to one out of two replicas. Set orphan.
> -box.cfg{replication_connect_quorum=2}
> -box.cfg{replication={box.cfg.listen, "12345"}}
> -box.info.status
> -box.info.ro
> --- lower quorum => leave orphan mode
> -box.cfg{replication_connect_quorum=1}
> -box.info.status
> -box.info.ro
> -
> ---
> --- gh-3760: replication quorum 0 on reconfiguration should return
> --- from box.cfg immediately.
> ---
> -replication = box.cfg.replication
> -box.cfg{ \
> -    replication = {}, \
> -    replication_connect_quorum = 0, \
> -    replication_connect_timeout = 1000000 \
> -}
> --- The call below would hang, if quorum 0 is ignored, or checked
> --- too late.
> -box.cfg{replication = {'localhost:12345'}}
> -box.info.status
> -box.cfg{ \
> -    replication = {}, \
> -    replication_connect_quorum = replication_connect_quorum, \
> -    replication_connect_timeout = replication_connect_timeout \
> -}
> diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
> index f357b07da..e21daa5ad 100644
> --- a/test/replication/suite.cfg
> +++ b/test/replication/suite.cfg
> @@ -1,6 +1,19 @@
>  {
>      "anon.test.lua": {},
> -    "misc.test.lua": {},
> +    "misc_assert_connecting_master_twice_gh-3610.test.lua": {},
> +    "misc_assert_on_server_die_gh-2991.test.lua": {},
> +    "misc_assert_replica_on_applier_disconnect_gh-3510.test.lua": {},
> +    "misc_crash_on_box_concurrent_update_gh-3606.test.lua": {},
> +    "misc_heartbeats_on_master_changes_gh-3160.test.lua": {},
> +    "misc_no_failure_on_error_reading_wal_gh-4399.test.lua": {},
> +    "misc_no_panic_on_connected_gh-3637.test.lua": {},
> +    "misc_no_restart_on_same_configuration_gh-3711.test.lua": {},
> +    "misc_no_socket_leak_on_replica_disconnect_gh-3642.test.lua": {},
> +    "misc_orphan_on_reconfiguration_error_gh-4424.test.lua": {},
> +    "misc_rebootstrap_from_ro_master_gh-3111.test.lua": {},
> +    "misc_replica_checks_cluster_id_gh-3704.test.lua": {},
> +    "misc_return_on_quorum_0_gh-3760.test.lua": {},
> +    "misc_value_not_replicated_on_iproto_request_gh-3247.test.lua": {},

4. Wrong names of the tests: these suite.cfg entries follow a
misc_*_gh-NNNN pattern, but the files added by this patch are named
gh-NNNN-misc-* (as in the suite.ini hunk below), so test-run will not find
them. Please update the list to the real file names; corrected examples
after the quoted patch.

>      "once.test.lua": {},
>      "on_replace.test.lua": {},
>      "status.test.lua": {},
> diff --git a/test/replication/suite.ini b/test/replication/suite.ini
> index ab9c3dabd..a6d653d3b 100644
> --- a/test/replication/suite.ini
> +++ b/test/replication/suite.ini
> @@ -13,7 +13,7 @@ is_parallel = True
>  pretest_clean = True
>  fragile = errinj.test.lua ; gh-3870
>          long_row_timeout.test.lua ; gh-4351
> -        misc.test.lua ; gh-4940
> +        gh-3160-misc-heartbeats-on-master-changes.test.lua ; gh-4940
>          skip_conflict_row.test.lua ; gh-4958
>          sync.test.lua ; gh-3835
>          transaction.test.lua ; gh-4312
> --
> 2.17.1
>
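Re my comment 4, to be explicit: inverting the misc_*_gh-NNNN names back to
the file names this patch actually adds (the gh-NNNN-misc-* form, the same
one used in the suite.ini hunk), the list should start like:

    "gh-2991-misc-assert-on-server-die.test.lua": {},
    "gh-3111-misc-rebootstrap-from-ro-master.test.lua": {},
    "gh-3160-misc-heartbeats-on-master-changes.test.lua": {},

and so on for the remaining eleven entries.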