From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: tarantool-patches@dev.tarantool.org, gorcunov@gmail.com, sergepetrenko@tarantool.org
Subject: [Tarantool-patches] [PATCH v2 0/4] Boot with anon
Date: Tue, 15 Sep 2020 01:11:26 +0200
Message-ID: <cover.1600124767.git.v.shpilevoy@tarantool.org>

The patchset attempts to address the problem of anonymous replicas being
registered in _cluster if they are present during bootstrap.

The bug was found while working on another issue related to Raft. The problem
is that Raft won't work properly during bootstrap if non-joined replicas are
registered in _cluster. When their auto-registration by applier was removed,
the anon bug was found.

The auto-registration removal is trivial, but it breaks the cluster bootstrap
in another way, creating false-positive XlogGap errors. See the second commit
for an explanation. To solve the issue, quite a radical solution is applied:
gap errors are not considered critical anymore and can be retried. I am not
sure that is the best option, but I couldn't come up with anything better
after a long struggle with it.

This is a bug, so whatever we come up with in the end should be pushed to the
older versions too.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5287-anon-false-register
Issue: https://github.com/tarantool/tarantool/issues/5287

Changes in v2:
- Anon status is stored as a flag again. In v1 it was stored as an enum, but
  an alternative solution was proposed where the enum is not needed.
- Ballot now has a new field is_anon. It helps to avoid the enum and to set
  the replica->anon flag to the correct value as soon as the replica becomes
  connected, whether through relay or applier.

@ChangeLog
* Anonymous replica could be registered and could prevent WAL files removal
  (gh-5287).
* XlogGapError is not a critical error anymore. It means box.info.replication
  will show the upstream status as 'loading' if the error was found. The
  upstream will be restarted until the error is resolved automatically with
  the help of another instance, or until the replica is removed from
  box.cfg.replication (gh-5287).

Vladislav Shpilevoy (4):
  xlog: introduce an error code for XlogGapError
  replication: retry in case of XlogGapError
  replication: add is_anon flag to ballot
  replication: do not register outgoing connections

 src/box/applier.cc                          |  40 ++++
 src/box/box.cc                              |  30 +--
 src/box/errcode.h                           |   2 +
 src/box/error.cc                            |   2 +
 src/box/error.h                             |   1 +
 src/box/iproto_constants.h                  |   1 +
 src/box/recovery.h                          |   2 -
 src/box/replication.cc                      |  14 +-
 src/box/xrow.c                              |  14 +-
 src/box/xrow.h                              |   5 +
 test/box/error.result                       |   2 +
 test/replication/autobootstrap_anon.lua     |  25 +++
 test/replication/autobootstrap_anon1.lua    |   1 +
 test/replication/autobootstrap_anon2.lua    |   1 +
 test/replication/force_recovery.result      | 110 -----------
 test/replication/force_recovery.test.lua    |  43 -----
 test/replication/gh-5287-boot-anon.result   |  81 +++++++++
 test/replication/gh-5287-boot-anon.test.lua |  30 +++
 test/replication/prune.result               |  18 +-
 test/replication/prune.test.lua             |   7 +-
 test/replication/replica.lua                |   2 +
 test/replication/replica_rejoin.result      |   6 +-
 test/replication/replica_rejoin.test.lua    |   4 +-
 .../show_error_on_disconnect.result         |   2 +-
 .../show_error_on_disconnect.test.lua       |   2 +-
 test/xlog/panic_on_wal_error.result         | 171 ------------------
 test/xlog/panic_on_wal_error.test.lua       |  75 --------
 27 files changed, 262 insertions(+), 429 deletions(-)
 create mode 100644 test/replication/autobootstrap_anon.lua
 create mode 120000 test/replication/autobootstrap_anon1.lua
 create mode 120000 test/replication/autobootstrap_anon2.lua
 delete mode 100644 test/replication/force_recovery.result
 delete mode 100644 test/replication/force_recovery.test.lua
 create mode 100644 test/replication/gh-5287-boot-anon.result
 create mode 100644 test/replication/gh-5287-boot-anon.test.lua
 delete mode 100644 test/xlog/panic_on_wal_error.result
 delete mode 100644 test/xlog/panic_on_wal_error.test.lua

-- 
2.21.1 (Apple Git-122.3)
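
The retry behaviour described in the ChangeLog entry for XlogGapError can be pictured with a minimal sketch. This is not code from the series and does not use any Tarantool API: the Python class `XlogGapError` and the helpers `run_upstream` and `make_flaky_fetch` are hypothetical stand-ins, and the status string 'follow' is used here only as a label for an upstream that has recovered. The sketch assumes that a gap error is the only retried failure and that every other exception stays fatal.

```python
class XlogGapError(Exception):
    """Hypothetical stand-in: the upstream lacks the rows the replica needs."""


def make_flaky_fetch(gaps):
    """Build a fetch() that raises XlogGapError `gaps` times, then succeeds."""
    remaining = [gaps]

    def fetch():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise XlogGapError("upstream is missing the requested rows")

    return fetch


def run_upstream(fetch, max_attempts):
    """Call fetch() up to max_attempts times, retrying only on XlogGapError.

    Returns the sequence of statuses the upstream went through. Any other
    exception propagates to the caller, i.e. it is still treated as fatal.
    """
    statuses = []
    for _ in range(max_attempts):
        try:
            fetch()
        except XlogGapError:
            # Non-critical: report 'loading' and restart the upstream.
            statuses.append("loading")
            continue
        statuses.append("follow")
        break
    return statuses


# Two gap errors, then another instance has supplied the missing rows.
print(run_upstream(make_flaky_fetch(2), max_attempts=5))
```

The attempt limit exists only to keep the sketch finite. The ChangeLog describes the upstream being restarted until the gap is resolved or the replica is removed from box.cfg.replication, with no fixed number of attempts.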