[Tarantool-patches] [PATCH 0/4] Boot with anon
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Sat Sep 12 20:25:52 MSK 2020
The patchset attempts to address the problem of anonymous replicas being
registered in _cluster if they are present during bootstrap.
The bug was found while working on another issue related to Raft. The problem
is that Raft won't work properly during bootstrap if non-joined replicas are
registered in _cluster.
When their auto-registration by applier was removed, the anon bug was found.
The auto-registration removal is trivial, but it breaks the cluster bootstrap in
another way, creating false-positive XlogGap errors. See the next-to-last commit
for an explanation. To solve the issue, quite a radical solution is applied:
gap errors are not considered critical anymore and can be retried. I am not
sure that is the best option, but I couldn't come up with anything better after
a long struggle with it.
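To illustrate the idea of treating a gap error as retryable rather than critical, here is a minimal self-contained sketch. It is not the code from the patch: GapError, UpstreamStatus, RunResult and run_with_retry are hypothetical names, and the attempt limit exists only so that the example terminates (the real upstream is restarted without such a limit).

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for XlogGapError; not the real Tarantool class.
struct GapError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Simplified upstream states, modelled loosely on box.info.replication.
enum class UpstreamStatus { Follow, Loading, Stopped };

struct RunResult {
    UpstreamStatus status;
    int attempts;
};

// Run a replication session, restarting it while it fails with a gap error.
// Any other exception is treated as critical and stops the upstream.
// max_attempts is an assumption made only to keep the sketch finite.
RunResult run_with_retry(const std::function<void()> &session, int max_attempts)
{
    int attempts = 0;
    while (attempts < max_attempts) {
        ++attempts;
        try {
            session();
            return {UpstreamStatus::Follow, attempts};
        } catch (const GapError &) {
            // Non-critical: another instance may fill the gap later,
            // so stay in 'loading' and try again.
            continue;
        } catch (const std::exception &) {
            // Critical error: no retry.
            return {UpstreamStatus::Stopped, attempts};
        }
    }
    // Still failing with a gap error after the last allowed attempt.
    return {UpstreamStatus::Loading, attempts};
}
```

A session that throws GapError twice and then succeeds ends in Follow after three attempts. One that always throws GapError stays in Loading, and any other exception stops the upstream on the first attempt.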
This is a bug, so whatever we come up with in the end, it should be pushed
to the older versions too.
Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5287-anon-false-register
Issue: https://github.com/tarantool/tarantool/issues/5287
@ChangeLog
* An anonymous replica could be registered in _cluster and could prevent WAL file removal (gh-5287).
* XlogGapError is not a critical error anymore. This means box.info.replication will show the upstream status as 'loading' if the error occurs. The upstream will be restarted until the error is resolved automatically with the help of another instance, or until the replica is removed from box.cfg.replication (gh-5287).
Vladislav Shpilevoy (4):
replication: replace anon flag with enum
xlog: introduce an error code for XlogGapError
replication: retry in case of XlogGapError
replication: do not register outgoing connections
src/box/applier.cc | 40 ++++
src/box/box.cc | 41 +++--
src/box/errcode.h | 2 +
src/box/error.cc | 2 +
src/box/error.h | 1 +
src/box/lua/info.c | 2 +-
src/box/recovery.h | 2 -
src/box/relay.cc | 6 +-
src/box/replication.cc | 23 ++-
src/box/replication.h | 31 +++-
test/box/error.result | 2 +
test/replication/autobootstrap_anon.lua | 25 +++
test/replication/autobootstrap_anon1.lua | 1 +
test/replication/autobootstrap_anon2.lua | 1 +
test/replication/force_recovery.result | 110 -----------
test/replication/force_recovery.test.lua | 43 -----
test/replication/gh-5287-boot-anon.result | 77 ++++++++
test/replication/gh-5287-boot-anon.test.lua | 29 +++
test/replication/prune.result | 18 +-
test/replication/prune.test.lua | 7 +-
test/replication/replica.lua | 2 +
test/replication/replica_rejoin.result | 6 +-
test/replication/replica_rejoin.test.lua | 4 +-
.../show_error_on_disconnect.result | 2 +-
.../show_error_on_disconnect.test.lua | 2 +-
test/xlog/panic_on_wal_error.result | 171 ------------------
test/xlog/panic_on_wal_error.test.lua | 75 --------
27 files changed, 286 insertions(+), 439 deletions(-)
create mode 100644 test/replication/autobootstrap_anon.lua
create mode 120000 test/replication/autobootstrap_anon1.lua
create mode 120000 test/replication/autobootstrap_anon2.lua
delete mode 100644 test/replication/force_recovery.result
delete mode 100644 test/replication/force_recovery.test.lua
create mode 100644 test/replication/gh-5287-boot-anon.result
create mode 100644 test/replication/gh-5287-boot-anon.test.lua
delete mode 100644 test/xlog/panic_on_wal_error.result
delete mode 100644 test/xlog/panic_on_wal_error.test.lua
--
2.21.1 (Apple Git-122.3)