From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org, gorcunov@gmail.com
Subject: [Tarantool-patches] [PATCH 0/4] Boot with anon
Date: Sat, 12 Sep 2020 19:25:52 +0200
Message-ID: <cover.1599931123.git.v.shpilevoy@tarantool.org>

The patchset attempts to address the problem of anonymous replicas
being registered in _cluster if they are present during bootstrap.

The bug was found while working on another issue related to Raft. The
problem is that Raft won't work properly during bootstrap if
non-joined replicas are registered in _cluster. When their
auto-registration by the applier was removed, the anon bug was found.

The auto-registration removal is trivial, but it breaks the cluster
bootstrap in another way, creating false-positive XlogGap errors. See
the next-to-last commit for an explanation. To solve the issue, quite
a radical solution is applied: gap errors are no longer considered
critical and can be retried. I am not sure that is the best option,
but I couldn't come up with anything better after a long struggle
with it.

This is a bug, so whatever we come up with in the end should be
pushed to the older versions too.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5287-anon-false-register
Issue: https://github.com/tarantool/tarantool/issues/5287

@ChangeLog
* An anonymous replica could be registered and could prevent WAL file
  removal (gh-5287).
* XlogGapError is not a critical error anymore. This means that
  box.info.replication will show the upstream status as 'loading' if
  the error was found. The upstream will be restarted until the error
  is resolved automatically with the help of another instance, or
  until the replica is removed from box.cfg.replication (gh-5287).
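To make the retry policy from the ChangeLog concrete, here is a minimal
sketch of the idea. It is not the code from this series: ApplierError,
ApplierResult, is_retriable(), run_applier() and the max_attempts bound
are all hypothetical names invented for illustration, and the 'follow'
and 'stopped' strings only stand in for whatever status the real
applier reports. Only the 'loading' status for a pending gap error
comes from the description above.

```cpp
// Illustrative sketch only: none of these names come from the Tarantool sources.
#include <cassert>
#include <functional>
#include <string>

// Hypothetical error classes an applier could get back from its upstream.
enum class ApplierError { None, XlogGap, Other };

// Hypothetical outcome of running the applier loop.
struct ApplierResult {
    std::string upstream_status; // stand-in for the status in box.info.replication
    int attempts;                // how many subscribe attempts were made
};

// Assumed policy, following the cover letter: a gap is retried, not fatal.
inline bool is_retriable(ApplierError e)
{
    return e == ApplierError::XlogGap;
}

// Call `subscribe` until it succeeds, fails with a non-retriable error,
// or the attempt budget runs out. The budget exists only to keep the
// sketch finite; the cover letter describes retrying until the gap is
// resolved or the replica is removed from box.cfg.replication.
inline ApplierResult run_applier(const std::function<ApplierError()> &subscribe,
                                 int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        ApplierError err = subscribe();
        if (err == ApplierError::None)
            return {"follow", attempt};
        if (!is_retriable(err))
            return {"stopped", attempt};
        // Gap error: the upstream stays in 'loading' and is restarted.
    }
    return {"loading", max_attempts};
}
```

With a subscribe callback that reports a gap twice and then succeeds,
run_applier() returns after the third attempt. A callback that keeps
reporting a gap leaves the status at 'loading', while any other error
stops the loop at once.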

Vladislav Shpilevoy (4):
  replication: replace anon flag with enum
  xlog: introduce an error code for XlogGapError
  replication: retry in case of XlogGapError
  replication: do not register outgoing connections

 src/box/applier.cc                          |  40 ++++
 src/box/box.cc                              |  41 +++--
 src/box/errcode.h                           |   2 +
 src/box/error.cc                            |   2 +
 src/box/error.h                             |   1 +
 src/box/lua/info.c                          |   2 +-
 src/box/recovery.h                          |   2 -
 src/box/relay.cc                            |   6 +-
 src/box/replication.cc                      |  23 ++-
 src/box/replication.h                       |  31 +++-
 test/box/error.result                       |   2 +
 test/replication/autobootstrap_anon.lua     |  25 +++
 test/replication/autobootstrap_anon1.lua    |   1 +
 test/replication/autobootstrap_anon2.lua    |   1 +
 test/replication/force_recovery.result      | 110 -----------
 test/replication/force_recovery.test.lua    |  43 -----
 test/replication/gh-5287-boot-anon.result   |  77 ++++++++
 test/replication/gh-5287-boot-anon.test.lua |  29 +++
 test/replication/prune.result               |  18 +-
 test/replication/prune.test.lua             |   7 +-
 test/replication/replica.lua                |   2 +
 test/replication/replica_rejoin.result      |   6 +-
 test/replication/replica_rejoin.test.lua    |   4 +-
 .../show_error_on_disconnect.result         |   2 +-
 .../show_error_on_disconnect.test.lua       |   2 +-
 test/xlog/panic_on_wal_error.result         | 171 ------------------
 test/xlog/panic_on_wal_error.test.lua       |  75 --------
 27 files changed, 286 insertions(+), 439 deletions(-)
 create mode 100644 test/replication/autobootstrap_anon.lua
 create mode 120000 test/replication/autobootstrap_anon1.lua
 create mode 120000 test/replication/autobootstrap_anon2.lua
 delete mode 100644 test/replication/force_recovery.result
 delete mode 100644 test/replication/force_recovery.test.lua
 create mode 100644 test/replication/gh-5287-boot-anon.result
 create mode 100644 test/replication/gh-5287-boot-anon.test.lua
 delete mode 100644 test/xlog/panic_on_wal_error.result
 delete mode 100644 test/xlog/panic_on_wal_error.test.lua

-- 
2.21.1 (Apple Git-122.3)