To: tarantool-patches@dev.tarantool.org, gorcunov@gmail.com, sergepetrenko@tarantool.org
Date: Sat, 5 Jun 2021 01:38:00 +0200
Subject: [Tarantool-patches] [PATCH 6/6] replication: prefer to join from booted replicas
From: Vladislav Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy

The algorithm for picking an instance to join the replicaset from didn't
take into account that some of the instances might not be bootstrapped
yet but still be perfectly available.

As a result, a ridiculous situation could happen: an instance could
connect to a cluster consisting only of read-only instances while itself
being configured with box.cfg{read_only = false}. Then, instead of
failing or waiting, it simply booted a brand new cluster, and afterwards
kept complaining that the other nodes had a different replicaset UUID.

The patch makes a new instance always prefer a bootstrapped join-source
over a non-bootstrapped one, including itself. In the situation above
the new instance now terminates with an error. In the future it should
hopefully start a retry loop instead.

Closes #5613

@TarantoolBot document
Title: IPROTO_BALLOT rework and a new field

A couple of fields in `IPROTO_BALLOT 0x29` used to have values that did
not match their names. They are changed.

* `IPROTO_BALLOT_IS_RO 0x01` used to mean "the instance has
  `box.cfg{read_only = true}`". It was renamed in the source code to
  `IPROTO_BALLOT_IS_RO_CFG`. The code `0x01` and the value are the same;
  only the name has changed, and it should be changed in the
  documentation too.

* `IPROTO_BALLOT_IS_LOADING 0x04` used to mean "the instance has
  finished `box.cfg()` and it has `read_only = true`". The name was
  therefore wrong, because the flag was set even for instances which had
  finished loading and merely had `read_only = true`. Such a value is
  also not very suitable for any sane usage. The name was changed to
  `IPROTO_BALLOT_IS_RO`, the code stayed the same, and the value now
  means "the instance is not writable". The reason for not being
  writable can be anything: the node is an orphan, it has
  `read_only = true`, it is a Raft follower, or something else.

And there is a new field: `IPROTO_BALLOT_IS_BOOTED 0x06`. It means the
instance has finished its bootstrap or recovery.
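
To make the preference order easier to see without reading the hunk in
src/box/replication.cc, here is a minimal standalone C sketch. Only the
IPROTO_BALLOT_* key codes are taken from the request above; the
ballot_sketch struct, the ballot_score() helper, and the weight of the
final check (which the hunk does not show) are illustrative assumptions,
not the shipped implementation.

#include <stdbool.h>
#include <stdio.h>

/* Ballot keys inside IPROTO_BALLOT 0x29, as named in the request above. */
enum iproto_ballot_key {
	IPROTO_BALLOT_IS_RO_CFG = 0x01,	/* box.cfg{read_only = true} is set. */
	IPROTO_BALLOT_IS_RO = 0x04,	/* Not writable, for whatever reason. */
	IPROTO_BALLOT_IS_BOOTED = 0x06,	/* Finished bootstrap or recovery. */
};

/* Hypothetical flat view of the fields the join-master choice looks at. */
struct ballot_sketch {
	bool is_ro_cfg;
	bool is_ro;
	bool is_booted;
};

/* Higher score wins; a booted node outranks any non-booted one. */
static int
ballot_score(const struct ballot_sketch *b)
{
	int score = 0;
	if (b->is_booted)
		score += 10;
	if (!b->is_ro_cfg)
		score += 5;
	if (!b->is_ro)
		score += 1;	/* Assumed weight, not shown in the hunk. */
	return score;
}

int
main(void)
{
	struct ballot_sketch booted_ro = {
		.is_ro_cfg = true, .is_ro = true, .is_booted = true};
	struct ballot_sketch fresh_rw = {
		.is_ro_cfg = false, .is_ro = false, .is_booted = false};
	printf("booted read-only: %d, non-booted writable: %d\n",
	       ballot_score(&booted_ro), ballot_score(&fresh_rw));
	return 0;
}

With these weights a booted but read-only node scores 10 while a
non-booted writable one scores at most 6, which is exactly the "prefer a
bootstrapped join-source" behaviour described above.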
---
 .../gh-5613-bootstrap-prefer-booted.md       |  6 ++
 src/box/replication.cc                       | 20 +++---
 .../gh-5613-bootstrap-prefer-booted.result   | 70 +++++++++++++++++++
 .../gh-5613-bootstrap-prefer-booted.test.lua | 21 ++++++
 test/replication/gh-5613-master.lua          | 11 +++
 test/replication/gh-5613-replica1.lua        | 13 ++++
 test/replication/gh-5613-replica2.lua        | 11 +++
 test/replication/suite.cfg                   |  1 +
 8 files changed, 144 insertions(+), 9 deletions(-)
 create mode 100644 changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md
 create mode 100644 test/replication/gh-5613-bootstrap-prefer-booted.result
 create mode 100644 test/replication/gh-5613-bootstrap-prefer-booted.test.lua
 create mode 100644 test/replication/gh-5613-master.lua
 create mode 100644 test/replication/gh-5613-replica1.lua
 create mode 100644 test/replication/gh-5613-replica2.lua

diff --git a/changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md b/changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md
new file mode 100644
index 000000000..c022ee012
--- /dev/null
+++ b/changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md
@@ -0,0 +1,6 @@
+## bugfix/replication
+
+* Fixed an error when a replica, at attempt to join a cluster with exclusively
+  read-only replicas available, instead of failing or retrying just decided to
+  boot its own replicaset. Now it fails with an error about the other nodes
+  being read-only so they can't register it (gh-5613).
diff --git a/src/box/replication.cc b/src/box/replication.cc
index d33e70f28..52086c65e 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -951,15 +951,6 @@ replicaset_next(struct replica *replica)
 	return replica_hash_next(&replicaset.hash, replica);
 }
 
-/**
- * Compare vclock, read only mode and orphan status
- * of all connected replicas and elect a leader.
- * Initiallly, skip read-only replicas, since they
- * can not properly act as bootstrap masters (register
- * new nodes in _cluster table). If there are no read-write
- * replicas, choose a read-only replica with biggest vclock
- * as a leader, in hope it will become read-write soon.
- */
 struct replica *
 replicaset_find_join_master(void)
 {
@@ -972,12 +963,23 @@ replicaset_find_join_master(void)
 		const struct ballot *ballot = &applier->ballot;
 		int score = 0;
 		/*
+		 * First of all try to ignore non-booted instances. Including
+		 * self if not booted yet. For self it is even dangerous as the
+		 * instance might decide to boot its own cluster if, for
+		 * example, the other nodes are available, but read-only. It
+		 * would be a mistake.
+		 *
+		 * For a new cluster it is ok to use a non-booted instance as it
+		 * means the algorithm tries to find an initial "boot-master".
+		 *
 		 * Prefer instances not configured as read-only via box.cfg, and
 		 * not being in read-only state due to any other reason. The
 		 * config is stronger because if it is configured as read-only,
 		 * it is in read-only state for sure, until the config is
 		 * changed.
 		 */
+		if (ballot->is_booted)
+			score += 10;
 		if (!ballot->is_ro_cfg)
 			score += 5;
 		if (!ballot->is_ro)
diff --git a/test/replication/gh-5613-bootstrap-prefer-booted.result b/test/replication/gh-5613-bootstrap-prefer-booted.result
new file mode 100644
index 000000000..e8e7fb792
--- /dev/null
+++ b/test/replication/gh-5613-bootstrap-prefer-booted.result
@@ -0,0 +1,70 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+test_run:cmd('create server master with script="replication/gh-5613-master.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server master with wait=False')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica1 with script="replication/gh-5613-replica1.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica1')
+ | ---
+ | - true
+ | ...
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.cfg{read_only = true}
+ | ---
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('create server replica2 with script="replication/gh-5613-replica2.lua"')
+ | ---
+ | - true
+ | ...
+-- It returns false, but it is expected.
+test_run:cmd('start server replica2 with crash_expected=True')
+ | ---
+ | - false
+ | ...
+opts = {filename = 'gh-5613-replica2.log'}
+ | ---
+ | ...
+assert(test_run:grep_log(nil, 'ER_READONLY', nil, opts) ~= nil)
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('delete server replica2')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server master')
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-5613-bootstrap-prefer-booted.test.lua b/test/replication/gh-5613-bootstrap-prefer-booted.test.lua
new file mode 100644
index 000000000..d3c1c1189
--- /dev/null
+++ b/test/replication/gh-5613-bootstrap-prefer-booted.test.lua
@@ -0,0 +1,21 @@
+test_run = require('test_run').new()
+
+test_run:cmd('create server master with script="replication/gh-5613-master.lua"')
+test_run:cmd('start server master with wait=False')
+test_run:cmd('create server replica1 with script="replication/gh-5613-replica1.lua"')
+test_run:cmd('start server replica1')
+test_run:switch('master')
+box.cfg{read_only = true}
+test_run:switch('default')
+
+test_run:cmd('create server replica2 with script="replication/gh-5613-replica2.lua"')
+-- It returns false, but it is expected.
+test_run:cmd('start server replica2 with crash_expected=True')
+opts = {filename = 'gh-5613-replica2.log'}
+assert(test_run:grep_log(nil, 'ER_READONLY', nil, opts) ~= nil)
+
+test_run:cmd('delete server replica2')
+test_run:cmd('stop server replica1')
+test_run:cmd('delete server replica1')
+test_run:cmd('stop server master')
+test_run:cmd('delete server master')
diff --git a/test/replication/gh-5613-master.lua b/test/replication/gh-5613-master.lua
new file mode 100644
index 000000000..408427315
--- /dev/null
+++ b/test/replication/gh-5613-master.lua
@@ -0,0 +1,11 @@
+#!/usr/bin/env tarantool
+
+require('console').listen(os.getenv('ADMIN'))
+box.cfg({
+    listen = 'unix/:./gh-5613-master.sock',
+    replication = {
+        'unix/:./gh-5613-master.sock',
+        'unix/:./gh-5613-replica1.sock',
+    },
+})
+box.schema.user.grant('guest', 'super')
diff --git a/test/replication/gh-5613-replica1.lua b/test/replication/gh-5613-replica1.lua
new file mode 100644
index 000000000..d0d6e3372
--- /dev/null
+++ b/test/replication/gh-5613-replica1.lua
@@ -0,0 +1,13 @@
+#!/usr/bin/env tarantool
+
+require('console').listen(os.getenv('ADMIN'))
+box.cfg({
+    listen = 'unix/:./gh-5613-replica1.sock',
+    replication = {
+        'unix/:./gh-5613-master.sock',
+        'unix/:./gh-5613-replica1.sock',
+    },
+    -- Set to read_only initially so as the bootstrap-master would be
+    -- known in advance.
+    read_only = true,
+})
diff --git a/test/replication/gh-5613-replica2.lua b/test/replication/gh-5613-replica2.lua
new file mode 100644
index 000000000..8cbd45b61
--- /dev/null
+++ b/test/replication/gh-5613-replica2.lua
@@ -0,0 +1,11 @@
+#!/usr/bin/env tarantool
+
+require('console').listen(os.getenv('ADMIN'))
+box.cfg({
+    listen = 'unix/:./gh-5613-replica2.sock',
+    replication = {
+        'unix/:./gh-5613-master.sock',
+        'unix/:./gh-5613-replica1.sock',
+        'unix/:./gh-5613-replica2.sock',
+    },
+})
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 27eab20c2..f9d5ce1cc 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -44,6 +44,7 @@
     "gh-5435-qsync-clear-synchro-queue-commit-all.test.lua": {},
     "gh-5536-wal-limit.test.lua": {},
     "gh-5566-final-join-synchro.test.lua": {},
+    "gh-5613-bootstrap-prefer-booted.test.lua": {},
     "gh-6032-promote-wal-write.test.lua": {},
     "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
     "gh-6094-rs-uuid-mismatch.test.lua": {},
-- 
2.24.3 (Apple Git-128)