Subject: Re: [Tarantool-patches] [PATCH 6/6] replication: prefer to join from booted replicas
Date: Thu, 10 Jun 2021 17:14:33 +0300
From: Serge Petrenko via Tarantool-patches
To: Vladislav Shpilevoy, tarantool-patches@dev.tarantool.org, gorcunov@gmail.com
List-Id: Tarantool development patches

05.06.2021 02:38, Vladislav Shpilevoy wrote:
> The algorithm of looking for an instance to join the replicaset
> from didn't take into account that some of the instances might be
> not bootstrapped but still perfectly available.
>
> As a result, a ridiculous situation could happen: an instance
> could connect to a cluster consisting only of read-only instances
> while itself having box.cfg{read_only = false}. Then, instead of
> failing or waiting, it just booted a brand new cluster. And after
> that the node just started complaining about the others having a
> different replicaset UUID.
>
> The patch makes a new instance always prefer a bootstrapped
> join-source to a non-bootstrapped one, including itself. In the
> situation above the new instance now terminates with an error.
>
> In the future it should hopefully start a retry loop instead.
>
> Closes #5613

Thanks! LGTM.

>
> @TarantoolBot document
> Title: IPROTO_BALLOT rework and a new field
>
> A couple of fields in `IPROTO_BALLOT 0x29` used to have values
> that did not match their names. They have been changed.
>
> * `IPROTO_BALLOT_IS_RO 0x01` used to mean "the instance has
>   `box.cfg{read_only = true}`". It was renamed in the source code
>   to `IPROTO_BALLOT_IS_RO_CFG`. It has the same code `0x01`, and
>   the value is the same. Only the name has changed, and it should
>   be changed in the docs too.
>
> * `IPROTO_BALLOT_IS_LOADING 0x04` used to mean "the instance has
>   finished `box.cfg()` and it has `read_only = true`". The name
>   was therefore wrong, because even if the instance finished
>   loading, the flag still was false for `read_only = true` nodes.
>   Also such a value is not very suitable for any sane usage.
>   The name was changed to `IPROTO_BALLOT_IS_RO`, the code stayed
>   the same, and the value now is "the instance is not writable".
>   The reason for not being writable can be anything: the node is
>   an orphan; or it has `read_only = true`; or it is a Raft
>   follower; or anything else.
>
> And there is a new field.
>
> `IPROTO_BALLOT_IS_BOOTED 0x06` means the instance has finished its
> bootstrap or recovery.
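Side note for readers of the commit message: the misconfiguration it describes is easy to picture as a plain box.cfg call. Below is a minimal, hypothetical Lua sketch of it; the socket paths and variable names are placeholders of mine, not part of the patch. The fresh instance is writable itself while every peer it can reach is read-only; before this patch it silently bootstrapped its own replicaset, with the patch the join fails (the test further down greps the log for ER_READONLY).

-- Hypothetical repro of the situation described above; the URIs are
-- placeholders and not taken from the patch.
local booted_ro_peer1 = 'unix/:./peer1.sock'
local booted_ro_peer2 = 'unix/:./peer2.sock'

box.cfg({
    listen = 'unix/:./fresh.sock',
    -- The new instance itself is configured as writable...
    read_only = false,
    -- ...but every join source it can reach is read-only.
    replication = {booted_ro_peer1, booted_ro_peer2},
})
-- Before the patch: the instance elected itself, booted a brand new
-- replicaset and then complained about a replicaset UUID mismatch.
-- After the patch: it refuses to boot on its own, because the
-- read-only peers cannot register it in _cluster.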
> ---
>  .../gh-5613-bootstrap-prefer-booted.md | 6 ++
>  src/box/replication.cc | 20 +++---
>  .../gh-5613-bootstrap-prefer-booted.result | 70 +++++++++++++++++++
>  .../gh-5613-bootstrap-prefer-booted.test.lua | 21 ++++++
>  test/replication/gh-5613-master.lua | 11 +++
>  test/replication/gh-5613-replica1.lua | 13 ++++
>  test/replication/gh-5613-replica2.lua | 11 +++
>  test/replication/suite.cfg | 1 +
>  8 files changed, 144 insertions(+), 9 deletions(-)
>  create mode 100644 changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md
>  create mode 100644 test/replication/gh-5613-bootstrap-prefer-booted.result
>  create mode 100644 test/replication/gh-5613-bootstrap-prefer-booted.test.lua
>  create mode 100644 test/replication/gh-5613-master.lua
>  create mode 100644 test/replication/gh-5613-replica1.lua
>  create mode 100644 test/replication/gh-5613-replica2.lua
>
> diff --git a/changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md b/changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md
> new file mode 100644
> index 000000000..c022ee012
> --- /dev/null
> +++ b/changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md
> @@ -0,0 +1,6 @@
> +## bugfix/replication
> +
> +* Fixed an error when a replica, at attempt to join a cluster with exclusively
> +  read-only replicas available, instead of failing or retrying just decided to
> +  boot its own replicaset. Now it fails with an error about the other nodes
> +  being read-only so they can't register it (gh-5613).
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index d33e70f28..52086c65e 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -951,15 +951,6 @@ replicaset_next(struct replica *replica)
>          return replica_hash_next(&replicaset.hash, replica);
>  }
>
> -/**
> - * Compare vclock, read only mode and orphan status
> - * of all connected replicas and elect a leader.
> - * Initiallly, skip read-only replicas, since they
> - * can not properly act as bootstrap masters (register
> - * new nodes in _cluster table). If there are no read-write
> - * replicas, choose a read-only replica with biggest vclock
> - * as a leader, in hope it will become read-write soon.
> - */
>  struct replica *
>  replicaset_find_join_master(void)
>  {
> @@ -972,12 +963,23 @@ replicaset_find_join_master(void)
>                  const struct ballot *ballot = &applier->ballot;
>                  int score = 0;
>                  /*
> +                 * First of all try to ignore non-booted instances. Including
> +                 * self if not booted yet. For self it is even dangerous as the
> +                 * instance might decide to boot its own cluster if, for
> +                 * example, the other nodes are available, but read-only. It
> +                 * would be a mistake.
> +                 *
> +                 * For a new cluster it is ok to use a non-booted instance as it
> +                 * means the algorithm tries to find an initial "boot-master".
> +                 *
>                   * Prefer instances not configured as read-only via box.cfg, and
>                   * not being in read-only state due to any other reason. The
>                   * config is stronger because if it is configured as read-only,
>                   * it is in read-only state for sure, until the config is
>                   * changed.
>                   */
> +                if (ballot->is_booted)
> +                        score += 10;
>                  if (!ballot->is_ro_cfg)
>                          score += 5;
>                  if (!ballot->is_ro)
> diff --git a/test/replication/gh-5613-bootstrap-prefer-booted.result b/test/replication/gh-5613-bootstrap-prefer-booted.result
> new file mode 100644
> index 000000000..e8e7fb792
> --- /dev/null
> +++ b/test/replication/gh-5613-bootstrap-prefer-booted.result
> @@ -0,0 +1,70 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +test_run:cmd('create server master with script="replication/gh-5613-master.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server master with wait=False')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('create server replica1 with script="replication/gh-5613-replica1.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:switch('master')
> + | ---
> + | - true
> + | ...
> +box.cfg{read_only = true}
> + | ---
> + | ...
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('create server replica2 with script="replication/gh-5613-replica2.lua"')
> + | ---
> + | - true
> + | ...
> +-- It returns false, but it is expected.
> +test_run:cmd('start server replica2 with crash_expected=True')
> + | ---
> + | - false
> + | ...
> +opts = {filename = 'gh-5613-replica2.log'}
> + | ---
> + | ...
> +assert(test_run:grep_log(nil, 'ER_READONLY', nil, opts) ~= nil)
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('delete server replica2')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server master')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server master')
> + | ---
> + | - true
> + | ...
> diff --git a/test/replication/gh-5613-bootstrap-prefer-booted.test.lua b/test/replication/gh-5613-bootstrap-prefer-booted.test.lua
> new file mode 100644
> index 000000000..d3c1c1189
> --- /dev/null
> +++ b/test/replication/gh-5613-bootstrap-prefer-booted.test.lua
> @@ -0,0 +1,21 @@
> +test_run = require('test_run').new()
> +
> +test_run:cmd('create server master with script="replication/gh-5613-master.lua"')
> +test_run:cmd('start server master with wait=False')
> +test_run:cmd('create server replica1 with script="replication/gh-5613-replica1.lua"')
> +test_run:cmd('start server replica1')
> +test_run:switch('master')
> +box.cfg{read_only = true}
> +test_run:switch('default')
> +
> +test_run:cmd('create server replica2 with script="replication/gh-5613-replica2.lua"')
> +-- It returns false, but it is expected.
> +test_run:cmd('start server replica2 with crash_expected=True')
> +opts = {filename = 'gh-5613-replica2.log'}
> +assert(test_run:grep_log(nil, 'ER_READONLY', nil, opts) ~= nil)
> +
> +test_run:cmd('delete server replica2')
> +test_run:cmd('stop server replica1')
> +test_run:cmd('delete server replica1')
> +test_run:cmd('stop server master')
> +test_run:cmd('delete server master')
> diff --git a/test/replication/gh-5613-master.lua b/test/replication/gh-5613-master.lua
> new file mode 100644
> index 000000000..408427315
> --- /dev/null
> +++ b/test/replication/gh-5613-master.lua
> @@ -0,0 +1,11 @@
> +#!/usr/bin/env tarantool
> +
> +require('console').listen(os.getenv('ADMIN'))
> +box.cfg({
> +    listen = 'unix/:./gh-5613-master.sock',
> +    replication = {
> +        'unix/:./gh-5613-master.sock',
> +        'unix/:./gh-5613-replica1.sock',
> +    },
> +})
> +box.schema.user.grant('guest', 'super')
> diff --git a/test/replication/gh-5613-replica1.lua b/test/replication/gh-5613-replica1.lua
> new file mode 100644
> index 000000000..d0d6e3372
> --- /dev/null
> +++ b/test/replication/gh-5613-replica1.lua
> @@ -0,0 +1,13 @@
> +#!/usr/bin/env tarantool
> +
> +require('console').listen(os.getenv('ADMIN'))
> +box.cfg({
> +    listen = 'unix/:./gh-5613-replica1.sock',
> +    replication = {
> +        'unix/:./gh-5613-master.sock',
> +        'unix/:./gh-5613-replica1.sock',
> +    },
> +    -- Set to read_only initially so as the bootstrap-master would be
> +    -- known in advance.
> +    read_only = true,
> +})
> diff --git a/test/replication/gh-5613-replica2.lua b/test/replication/gh-5613-replica2.lua
> new file mode 100644
> index 000000000..8cbd45b61
> --- /dev/null
> +++ b/test/replication/gh-5613-replica2.lua
> @@ -0,0 +1,11 @@
> +#!/usr/bin/env tarantool
> +
> +require('console').listen(os.getenv('ADMIN'))
> +box.cfg({
> +    listen = 'unix/:./gh-5613-replica2.sock',
> +    replication = {
> +        'unix/:./gh-5613-master.sock',
> +        'unix/:./gh-5613-replica1.sock',
> +        'unix/:./gh-5613-replica2.sock',
> +    },
> +})
> diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
> index 27eab20c2..f9d5ce1cc 100644
> --- a/test/replication/suite.cfg
> +++ b/test/replication/suite.cfg
> @@ -44,6 +44,7 @@
>      "gh-5435-qsync-clear-synchro-queue-commit-all.test.lua": {},
>      "gh-5536-wal-limit.test.lua": {},
>      "gh-5566-final-join-synchro.test.lua": {},
> +    "gh-5613-bootstrap-prefer-booted.test.lua": {},
>      "gh-6032-promote-wal-write.test.lua": {},
>      "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
>      "gh-6094-rs-uuid-mismatch.test.lua": {},

-- 
Serge Petrenko
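P.S. For anyone skimming the scoring hunk in replicaset_find_join_master() above: here is a rough Lua rendering of the same weights, only to illustrate the resulting preference order. The +1 weight for a currently writable instance is my assumption; the quoted hunk is cut off right before that line.

-- Illustration only, not the actual C implementation.
local function join_master_score(ballot)
    local score = 0
    if ballot.is_booted then        -- finished bootstrap or recovery
        score = score + 10
    end
    if not ballot.is_ro_cfg then    -- not box.cfg{read_only = true}
        score = score + 5
    end
    if not ballot.is_ro then        -- actually writable right now
        score = score + 1           -- assumed weight, not visible in the hunk
    end
    return score
end
-- A booted but read-only instance scores 10, while a non-booted instance
-- that merely looks writable scores at most 6, so a fresh node can no
-- longer pick itself over an existing read-only cluster.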