From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtpng2.m.smailru.net (smtpng2.m.smailru.net [94.100.179.3])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 89B1E469719
	for ; Sat, 12 Sep 2020 20:25:58 +0300 (MSK)
From: Vladislav Shpilevoy
Date: Sat, 12 Sep 2020 19:25:52 +0200
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH 0/4] Boot with anon
List-Id: Tarantool development patches
To: tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org,
	gorcunov@gmail.com

The patchset attempts to address the problem of anonymous replicas
being registered in _cluster if they are present during bootstrap.

The bug was found while working on another issue related to Raft. The
problem is that Raft won't work properly during bootstrap if non-joined
replicas are registered in _cluster. When their auto-registration by
the applier was removed, the anon bug was found.

The auto-registration removal is trivial, but it breaks the cluster
bootstrap in another way, creating false-positive XlogGap errors. See
the next-to-last commit for an explanation.

To solve the issue, quite a radical solution is applied - gap errors
are not considered critical anymore, and can be retried. I am not sure
that is the best option, but I couldn't come up with anything better
after a long struggle with it.

This is a bug, so whatever we come up with in the end should be pushed
to the older versions too.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5287-anon-false-register
Issue: https://github.com/tarantool/tarantool/issues/5287

@ChangeLog
* Anonymous replica could be registered and could prevent WAL files
  removal (gh-5287).
* XlogGapError is not a critical error anymore.
  This means that box.info.replication will show the upstream status as
  'loading' if the error was found. The upstream will be restarted until
  the error is resolved automatically with the help of another instance,
  or until the replica is removed from box.cfg.replication (gh-5287).

Vladislav Shpilevoy (4):
  replication: replace anon flag with enum
  xlog: introduce an error code for XlogGapError
  replication: retry in case of XlogGapError
  replication: do not register outgoing connections

 src/box/applier.cc                          |  40 ++++
 src/box/box.cc                              |  41 +++--
 src/box/errcode.h                           |   2 +
 src/box/error.cc                            |   2 +
 src/box/error.h                             |   1 +
 src/box/lua/info.c                          |   2 +-
 src/box/recovery.h                          |   2 -
 src/box/relay.cc                            |   6 +-
 src/box/replication.cc                      |  23 ++-
 src/box/replication.h                       |  31 +++-
 test/box/error.result                       |   2 +
 test/replication/autobootstrap_anon.lua     |  25 +++
 test/replication/autobootstrap_anon1.lua    |   1 +
 test/replication/autobootstrap_anon2.lua    |   1 +
 test/replication/force_recovery.result      | 110 -----------
 test/replication/force_recovery.test.lua    |  43 -----
 test/replication/gh-5287-boot-anon.result   |  77 ++++++++
 test/replication/gh-5287-boot-anon.test.lua |  29 +++
 test/replication/prune.result               |  18 +-
 test/replication/prune.test.lua             |   7 +-
 test/replication/replica.lua                |   2 +
 test/replication/replica_rejoin.result      |   6 +-
 test/replication/replica_rejoin.test.lua    |   4 +-
 .../show_error_on_disconnect.result         |   2 +-
 .../show_error_on_disconnect.test.lua       |   2 +-
 test/xlog/panic_on_wal_error.result         | 171 ------------------
 test/xlog/panic_on_wal_error.test.lua       |  75 --------
 27 files changed, 286 insertions(+), 439 deletions(-)
 create mode 100644 test/replication/autobootstrap_anon.lua
 create mode 120000 test/replication/autobootstrap_anon1.lua
 create mode 120000 test/replication/autobootstrap_anon2.lua
 delete mode 100644 test/replication/force_recovery.result
 delete mode 100644 test/replication/force_recovery.test.lua
 create mode 100644 test/replication/gh-5287-boot-anon.result
 create mode 100644 test/replication/gh-5287-boot-anon.test.lua
 delete mode 100644 test/xlog/panic_on_wal_error.result
 delete mode 100644 test/xlog/panic_on_wal_error.test.lua

-- 
2.21.1 (Apple Git-122.3)
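
The retry behaviour described in the @ChangeLog entry can be pictured
roughly as in the C sketch below. It is only an illustration and not
code from this patchset: every name in it (sketch_applier_run,
sketch_gap_then_ok, the SKETCH_* constants) is hypothetical, and the
max_attempts bound is an assumption added so that the sketch always
terminates. The series itself describes restarting the upstream until
the error is resolved or the replica is removed from
box.cfg.replication.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error kinds; these are not Tarantool's real error codes. */
enum sketch_error {
	SKETCH_OK = 0,
	SKETCH_XLOG_GAP,	/* the peer lacks rows the replica needs */
	SKETCH_FATAL		/* anything that must still stop the upstream */
};

/* Upstream states loosely modelled on box.info.replication statuses. */
enum sketch_status {
	SKETCH_FOLLOW,
	SKETCH_LOADING,
	SKETCH_STOPPED
};

/* One subscribe attempt; the callback stands in for the real exchange. */
typedef enum sketch_error (*sketch_subscribe_f)(void *ctx);

/*
 * Try to subscribe up to max_attempts times. A gap error is treated as
 * retryable: the status stays "loading" and the loop goes on. Any other
 * error stops the upstream at once. The number of attempts actually made
 * is stored in *attempts_made when that pointer is not NULL.
 */
enum sketch_status
sketch_applier_run(sketch_subscribe_f subscribe, void *ctx, int max_attempts,
		   int *attempts_made)
{
	assert(subscribe != NULL && max_attempts > 0);
	enum sketch_status status = SKETCH_LOADING;
	int made = 0;
	while (made < max_attempts) {
		enum sketch_error err = subscribe(ctx);
		made++;
		if (err == SKETCH_OK) {
			status = SKETCH_FOLLOW;
			break;
		}
		if (err != SKETCH_XLOG_GAP) {
			status = SKETCH_STOPPED;
			break;
		}
		/* Gap error: remain in "loading" and retry. */
	}
	if (attempts_made != NULL)
		*attempts_made = made;
	return status;
}

/* Demo callback: reports a gap until the counter in ctx reaches zero. */
enum sketch_error
sketch_gap_then_ok(void *ctx)
{
	int *remaining_gaps = ctx;
	if (*remaining_gaps > 0) {
		(*remaining_gaps)--;
		return SKETCH_XLOG_GAP;
	}
	return SKETCH_OK;
}
```

Calling sketch_applier_run(sketch_gap_then_ok, &gaps, 5, &made) with
gaps set to 2 models a peer that catches up after two failed attempts:
the sketch ends in SKETCH_FOLLOW after three attempts. If the gap never
goes away, the status stays at SKETCH_LOADING, which corresponds to the
'loading' upstream status mentioned in the changelog.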