From mboxrd@z Thu Jan 1 00:00:00 1970
To: v.shpilevoy@tarantool.org, gorcunov@gmail.com
Date: Tue, 29 Jun 2021 01:12:57 +0300
X-Mailer: git-send-email 2.30.1 (Apple Git-130)
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH v3 11/12] replication: send latest effective promote in initial join
List-Id: Tarantool development patches
From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko
Cc: tarantool-patches@dev.tarantool.org
Errors-To: tarantool-patches-bounces@dev.tarantool.org
Sender: "Tarantool-patches"

A joining instance may never receive the latest PROMOTE request, which
is the only source of information about the limbo owner. Send out the
latest limbo state (e.g. the latest applied PROMOTE request) together
with the initial join snapshot.

Follow-up #6034
---
 src/box/applier.cc                       |  7 ++-
 src/box/relay.cc                         |  9 ++-
 test/replication/replica_rejoin.result   | 77 ++++++++++++++----------
 test/replication/replica_rejoin.test.lua | 50 +++++++--------
 4 files changed, 86 insertions(+), 57 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 7abad3a64..482b9446a 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -454,7 +454,12 @@ applier_wait_snapshot(struct applier *applier)
 			coio_read_xrow(coio, ibuf, &row);
 			if (iproto_type_is_error(row.type))
 				xrow_decode_error_xc(&row);
-			else if (row.type != IPROTO_JOIN_SNAPSHOT) {
+			else if (iproto_type_is_promote_request(row.type)) {
+				struct synchro_request req;
+				if (xrow_decode_synchro(&row, &req) != 0)
+					diag_raise();
+				txn_limbo_process(&txn_limbo, &req);
+			} else if (row.type != IPROTO_JOIN_SNAPSHOT) {
 				tnt_raise(ClientError, ER_UNKNOWN_REQUEST_TYPE,
 					  (uint32_t)row.type);
 			}
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 4ebe0fb06..4b102a777 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -427,6 +427,9 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
 	if (txn_limbo_wait_confirm(&txn_limbo) != 0)
 		diag_raise();
 
+	struct synchro_request req;
+	txn_limbo_checkpoint(&txn_limbo, &req);
+
 	/* Respond to the JOIN request with the current vclock. */
 	struct xrow_header row;
 	xrow_encode_vclock_xc(&row, vclock);
@@ -442,7 +445,11 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
 	row.type = IPROTO_JOIN_META;
 	coio_write_xrow(&relay->io, &row);
 
-	/* Empty at the moment. */
+	char body[XROW_SYNCHRO_BODY_LEN_MAX];
+	xrow_encode_synchro(&row, body, &req);
+	row.replica_id = req.replica_id;
+	row.sync = sync;
+	coio_write_xrow(&relay->io, &row);
 
 	/* Mark the end of the metadata stream. */
 	row.type = IPROTO_JOIN_SNAPSHOT;
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index 843333a19..e489c150a 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -7,10 +7,19 @@ test_run = env.new()
 log = require('log')
 ---
 ...
-engine = test_run:get_cfg('engine')
+test_run:cmd("create server master with script='replication/master1.lua'")
 ---
+- true
 ...
-test_run:cleanup_cluster()
+test_run:cmd("start server master")
+---
+- true
+...
+test_run:switch("master")
+---
+- true
+...
+engine = test_run:get_cfg('engine')
 ---
 ...
 --
@@ -43,7 +52,7 @@ _ = box.space.test:insert{3}
 ---
 ...
 -- Join a replica, then stop it.
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica_rejoin.lua'")
+test_run:cmd("create server replica with rpl_master=master, script='replication/replica_rejoin.lua'")
 ---
 - true
 ...
@@ -65,7 +74,7 @@ box.space.test:select()
   - [2]
   - [3]
 ...
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 ---
 - true
 ...
@@ -75,7 +84,7 @@ test_run:cmd("stop server replica")
 ...
 -- Restart the server to purge the replica from
 -- the garbage collection state.
-test_run:cmd("restart server default")
+test_run:cmd("restart server master")
 box.cfg{wal_cleanup_delay = 0}
 ---
 ...
@@ -146,7 +155,7 @@ box.space.test:select()
   - [20]
   - [30]
 ...
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 ---
 - true
 ...
@@ -154,7 +163,7 @@ test_run:cmd("switch default")
 for i = 10, 30, 10 do box.space.test:update(i, {{'!', 1, i}}) end
 ---
 ...
-vclock = test_run:get_vclock('default')
+vclock = test_run:get_vclock('master')
 ---
 ...
 vclock[0] = nil
@@ -191,7 +200,7 @@ box.space.test:replace{1, 2, 3} -- bumps LSN on the replica
 ---
 - [1, 2, 3]
 ...
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 ---
 - true
 ...
@@ -199,7 +208,7 @@ test_run:cmd("stop server replica")
 ---
 - true
 ...
-test_run:cmd("restart server default")
+test_run:cmd("restart server master")
 box.cfg{wal_cleanup_delay = 0}
 ---
 ...
@@ -253,7 +262,7 @@ box.space.test:select()
 -- from the replica.
 --
 -- Bootstrap a new replica.
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 ---
 - true
 ...
@@ -295,7 +304,7 @@ box.cfg{replication = ''}
 ---
 ...
 -- Bump vclock on the master.
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 ---
 - true
 ...
@@ -317,15 +326,15 @@ vclock = test_run:get_vclock('replica')
 vclock[0] = nil
 ---
 ...
-_ = test_run:wait_vclock('default', vclock)
+_ = test_run:wait_vclock('master', vclock)
 ---
 ...
 -- Restart the master and force garbage collection.
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 ---
 - true
 ...
-test_run:cmd("restart server default")
+test_run:cmd("restart server master")
 box.cfg{wal_cleanup_delay = 0}
 ---
 ...
@@ -373,7 +382,7 @@ vclock = test_run:get_vclock('replica')
 vclock[0] = nil
 ---
 ...
-_ = test_run:wait_vclock('default', vclock)
+_ = test_run:wait_vclock('master', vclock)
 ---
 ...
 -- Restart the replica. It should successfully rebootstrap.
@@ -396,38 +405,42 @@ test_run:cmd("switch default")
 ---
 - true
 ...
-box.cfg{replication = ''}
+test_run:cmd("stop server replica")
 ---
+- true
 ...
-test_run:cmd("stop server replica")
+test_run:cmd("delete server replica")
 ---
 - true
 ...
-test_run:cmd("cleanup server replica")
+test_run:cmd("stop server master")
 ---
 - true
 ...
-test_run:cmd("delete server replica")
+test_run:cmd("delete server master")
 ---
 - true
 ...
-test_run:cleanup_cluster()
+--
+-- gh-4107: rebootstrap fails if the replica was deleted from
+-- the cluster on the master.
+--
+test_run:cmd("create server master with script='replication/master1.lua'")
 ---
+- true
 ...
-box.space.test:drop()
+test_run:cmd("start server master")
 ---
+- true
 ...
-box.schema.user.revoke('guest', 'replication')
+test_run:switch("master")
 ---
+- true
 ...
---
--- gh-4107: rebootstrap fails if the replica was deleted from
--- the cluster on the master.
---
 box.schema.user.grant('guest', 'replication')
 ---
 ...
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica_uuid.lua'")
+test_run:cmd("create server replica with rpl_master=master, script='replication/replica_uuid.lua'")
 ---
 - true
 ...
@@ -462,11 +475,11 @@ box.space._cluster:get(2) ~= nil
 ---
 - true
 ...
-test_run:cmd("stop server replica")
+test_run:switch("default")
 ---
 - true
 ...
-test_run:cmd("cleanup server replica")
+test_run:cmd("stop server replica")
 ---
 - true
 ...
@@ -474,9 +487,11 @@ test_run:cmd("delete server replica")
 ---
 - true
 ...
-box.schema.user.revoke('guest', 'replication')
+test_run:cmd("stop server master")
 ---
+- true
 ...
-test_run:cleanup_cluster()
+test_run:cmd("delete server master")
 ---
+- true
 ...
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index c3ba9bf3f..2563177cf 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -1,9 +1,11 @@
 env = require('test_run')
 test_run = env.new()
 log = require('log')
-engine = test_run:get_cfg('engine')
 
-test_run:cleanup_cluster()
+test_run:cmd("create server master with script='replication/master1.lua'")
+test_run:cmd("start server master")
+test_run:switch("master")
+engine = test_run:get_cfg('engine')
 
 --
 -- gh-5806: this replica_rejoin test relies on the wal cleanup fiber
@@ -23,17 +25,17 @@ _ = box.space.test:insert{2}
 _ = box.space.test:insert{3}
 
 -- Join a replica, then stop it.
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica_rejoin.lua'")
+test_run:cmd("create server replica with rpl_master=master, script='replication/replica_rejoin.lua'")
 test_run:cmd("start server replica")
 test_run:cmd("switch replica")
 box.info.replication[1].upstream.status == 'follow' or log.error(box.info)
 box.space.test:select()
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 test_run:cmd("stop server replica")
 
 -- Restart the server to purge the replica from
 -- the garbage collection state.
-test_run:cmd("restart server default")
+test_run:cmd("restart server master")
 box.cfg{wal_cleanup_delay = 0}
 
 -- Make some checkpoints to remove old xlogs.
@@ -58,11 +60,11 @@ box.info.replication[2].downstream.vclock ~= nil or log.error(box.info)
 test_run:cmd("switch replica")
 box.info.replication[1].upstream.status == 'follow' or log.error(box.info)
 box.space.test:select()
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 
 -- Make sure the replica follows new changes.
 for i = 10, 30, 10 do box.space.test:update(i, {{'!', 1, i}}) end
-vclock = test_run:get_vclock('default')
+vclock = test_run:get_vclock('master')
 vclock[0] = nil
 _ = test_run:wait_vclock('replica', vclock)
 test_run:cmd("switch replica")
@@ -76,9 +78,9 @@ box.space.test:select()
 -- Check that rebootstrap is NOT initiated unless the replica
 -- is strictly behind the master.
 box.space.test:replace{1, 2, 3} -- bumps LSN on the replica
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 test_run:cmd("stop server replica")
-test_run:cmd("restart server default")
+test_run:cmd("restart server master")
 box.cfg{wal_cleanup_delay = 0}
 checkpoint_count = box.cfg.checkpoint_count
 box.cfg{checkpoint_count = 1}
@@ -99,7 +101,7 @@ box.space.test:select()
 --
 
 -- Bootstrap a new replica.
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
 test_run:cleanup_cluster()
@@ -113,17 +115,17 @@ box.cfg{replication = replica_listen}
 test_run:cmd("switch replica")
 box.cfg{replication = ''}
 -- Bump vclock on the master.
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 box.space.test:replace{1}
 -- Bump vclock on the replica.
 test_run:cmd("switch replica")
 for i = 1, 10 do box.space.test:replace{2} end
 vclock = test_run:get_vclock('replica')
 vclock[0] = nil
-_ = test_run:wait_vclock('default', vclock)
+_ = test_run:wait_vclock('master', vclock)
 -- Restart the master and force garbage collection.
-test_run:cmd("switch default")
-test_run:cmd("restart server default")
+test_run:cmd("switch master")
+test_run:cmd("restart server master")
 box.cfg{wal_cleanup_delay = 0}
 replica_listen = test_run:cmd("eval replica 'return box.cfg.listen'")
 replica_listen ~= nil
@@ -139,7 +141,7 @@ test_run:cmd("switch replica")
 for i = 1, 10 do box.space.test:replace{2} end
 vclock = test_run:get_vclock('replica')
 vclock[0] = nil
-_ = test_run:wait_vclock('default', vclock)
+_ = test_run:wait_vclock('master', vclock)
 -- Restart the replica. It should successfully rebootstrap.
 test_run:cmd("restart server replica with args='true'")
 box.space.test:select()
@@ -148,20 +150,20 @@ box.space.test:replace{2}
 
 -- Cleanup.
 test_run:cmd("switch default")
-box.cfg{replication = ''}
 test_run:cmd("stop server replica")
-test_run:cmd("cleanup server replica")
 test_run:cmd("delete server replica")
-test_run:cleanup_cluster()
-box.space.test:drop()
-box.schema.user.revoke('guest', 'replication')
+test_run:cmd("stop server master")
+test_run:cmd("delete server master")
 
 --
 -- gh-4107: rebootstrap fails if the replica was deleted from
 -- the cluster on the master.
 --
+test_run:cmd("create server master with script='replication/master1.lua'")
+test_run:cmd("start server master")
+test_run:switch("master")
 box.schema.user.grant('guest', 'replication')
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica_uuid.lua'")
+test_run:cmd("create server replica with rpl_master=master, script='replication/replica_uuid.lua'")
 start_cmd = string.format("start server replica with args='%s'", require('uuid').new())
 box.space._cluster:get(2) == nil
 test_run:cmd(start_cmd)
@@ -170,8 +172,8 @@ test_run:cmd("cleanup server replica")
 box.space._cluster:delete(2) ~= nil
 test_run:cmd(start_cmd)
 box.space._cluster:get(2) ~= nil
+test_run:switch("default")
 test_run:cmd("stop server replica")
-test_run:cmd("cleanup server replica")
 test_run:cmd("delete server replica")
-box.schema.user.revoke('guest', 'replication')
-test_run:cleanup_cluster()
+test_run:cmd("stop server master")
+test_run:cmd("delete server master")
-- 
2.30.1 (Apple Git-130)
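
Illustration only, not part of the patch: a minimal, self-contained sketch of the metadata exchange described in the commit message, where the master puts its latest limbo state into the join metadata stream and the joining instance applies it before the snapshot marker arrives. Everything named here is a hypothetical stand-in: `Row`, `LimboState`, `make_join_meta`, `wait_snapshot` and the `ROW_*` values do not exist in Tarantool, and the `term` field is an assumption about what a PROMOTE row carries. The real code works with struct xrow_header, struct synchro_request and the IPROTO_* constants.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical row types. The values are illustrative only and do not
// match the real IPROTO_* constants.
enum RowType {
	ROW_PROMOTE,
	ROW_JOIN_SNAPSHOT,
	ROW_UNKNOWN,
};

// Simplified stand-in for a decoded synchro request.
struct Row {
	RowType type;
	uint32_t replica_id; // limbo owner carried by a PROMOTE row
	uint64_t term;       // assumed field: term carried by a PROMOTE row
};

// Simplified stand-in for the limbo state of an instance.
struct LimboState {
	uint32_t owner_id = 0;
	uint64_t term = 0;
};

// Master side: build the rows that follow the JOIN_META marker.
// The checkpointed limbo state goes first, the snapshot marker last.
static std::vector<Row>
make_join_meta(const LimboState &master)
{
	return {
		{ROW_PROMOTE, master.owner_id, master.term},
		{ROW_JOIN_SNAPSHOT, 0, 0},
	};
}

// Joining side: consume rows until the snapshot marker.
// Returns true on success, false on an unexpected row type or
// a stream that ends without the marker.
static bool
wait_snapshot(const std::vector<Row> &rows, LimboState *limbo)
{
	assert(limbo != nullptr);
	for (const Row &row : rows) {
		if (row.type == ROW_PROMOTE) {
			// Adopt the limbo owner announced by the master.
			limbo->owner_id = row.replica_id;
			limbo->term = row.term;
		} else if (row.type == ROW_JOIN_SNAPSHOT) {
			return true;
		} else {
			return false;
		}
	}
	return false;
}
```

In this sketch an unknown row type still aborts the join, mirroring the ER_UNKNOWN_REQUEST_TYPE branch kept in applier_wait_snapshot(); only the PROMOTE row is newly accepted between the metadata marker and the snapshot marker.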