Tarantool development patches archive
* [Tarantool-patches] [PATCH 0/4] Boot with anon
@ 2020-09-12 17:25 Vladislav Shpilevoy
  2020-09-12 17:25 ` [Tarantool-patches] [PATCH 1/4] replication: replace anon flag with enum Vladislav Shpilevoy
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-12 17:25 UTC
  To: tarantool-patches, sergepetrenko, gorcunov

The patchset attempts to address the problem of anonymous replicas being
registered in _cluster if they are present during bootstrap.

The bug was found while working on another issue related to Raft. The problem
is that Raft won't work properly during bootstrap if non-joined replicas are
registered in _cluster.

When their auto-registration by the applier was removed, the anon bug was found.

The auto-registration removal is trivial, but it breaks the cluster bootstrap in
another way, creating false-positive XlogGapError errors. See the next-to-last
commit for an explanation. To solve the issue, quite a radical solution is
applied: gap errors are not considered critical anymore and can be retried. I am
not sure that is the best option, but I couldn't come up with anything better
after a long struggle with that.
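
To illustrate the behavior change described above, here is a toy Python model
of "retry on a gap error instead of treating it as critical". It is only a
sketch and not the applier code from the patch: run_applier,
make_flaky_subscribe, UpstreamStatus and the max_attempts bound are all
hypothetical names made up for this example. Only the 'loading' status comes
from the ChangeLog entry in this letter; the other status names are
assumptions. The patch itself retries until the error is resolved, with no
attempt limit.

```python
import enum


class UpstreamStatus(enum.Enum):
    LOADING = "loading"   # status named in the ChangeLog entry
    FOLLOW = "follow"     # assumed name for a healthy upstream
    STOPPED = "stopped"   # assumed name for an upstream stopped by a fatal error


class XlogGapError(Exception):
    """Toy stand-in for the gap error; not the real C++ class."""


def make_flaky_subscribe(gaps):
    """Return a subscribe() that raises XlogGapError `gaps` times, then succeeds."""
    remaining = [gaps]

    def subscribe():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise XlogGapError("master lacks the requested xlog rows")

    return subscribe


def run_applier(subscribe, max_attempts):
    """Call subscribe() up to max_attempts times and record the status after each try.

    A gap error is treated as retryable: the status becomes LOADING and the
    loop goes on. Any other exception is treated as fatal and stops the loop.
    """
    history = []
    for _ in range(max_attempts):
        try:
            subscribe()
        except XlogGapError:
            history.append(UpstreamStatus.LOADING)
            continue
        except Exception:
            history.append(UpstreamStatus.STOPPED)
            break
        else:
            history.append(UpstreamStatus.FOLLOW)
            break
    return history


if __name__ == "__main__":
    statuses = run_applier(make_flaky_subscribe(2), max_attempts=5)
    print([s.value for s in statuses])  # ['loading', 'loading', 'follow']
```

In this model the gap disappears after two attempts, which stands in for
another instance supplying the missing rows in the meantime.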

This is a bug, so whatever we come up with in the end should be pushed to the
older versions too.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5287-anon-false-register
Issue: https://github.com/tarantool/tarantool/issues/5287

@ChangeLog
* An anonymous replica could be registered and could prevent removal of WAL files (gh-5287).
* XlogGapError is not a critical error anymore. This means box.info.replication will show the upstream status as 'loading' if the error is encountered. The upstream will be restarted until the error is resolved automatically with the help of another instance, or until the replica is removed from box.cfg.replication (gh-5287).

Vladislav Shpilevoy (4):
  replication: replace anon flag with enum
  xlog: introduce an error code for XlogGapError
  replication: retry in case of XlogGapError
  replication: do not register outgoing connections

 src/box/applier.cc                            |  40 ++++
 src/box/box.cc                                |  41 +++--
 src/box/errcode.h                             |   2 +
 src/box/error.cc                              |   2 +
 src/box/error.h                               |   1 +
 src/box/lua/info.c                            |   2 +-
 src/box/recovery.h                            |   2 -
 src/box/relay.cc                              |   6 +-
 src/box/replication.cc                        |  23 ++-
 src/box/replication.h                         |  31 +++-
 test/box/error.result                         |   2 +
 test/replication/autobootstrap_anon.lua       |  25 +++
 test/replication/autobootstrap_anon1.lua      |   1 +
 test/replication/autobootstrap_anon2.lua      |   1 +
 test/replication/force_recovery.result        | 110 -----------
 test/replication/force_recovery.test.lua      |  43 -----
 test/replication/gh-5287-boot-anon.result     |  77 ++++++++
 test/replication/gh-5287-boot-anon.test.lua   |  29 +++
 test/replication/prune.result                 |  18 +-
 test/replication/prune.test.lua               |   7 +-
 test/replication/replica.lua                  |   2 +
 test/replication/replica_rejoin.result        |   6 +-
 test/replication/replica_rejoin.test.lua      |   4 +-
 .../show_error_on_disconnect.result           |   2 +-
 .../show_error_on_disconnect.test.lua         |   2 +-
 test/xlog/panic_on_wal_error.result           | 171 ------------------
 test/xlog/panic_on_wal_error.test.lua         |  75 --------
 27 files changed, 286 insertions(+), 439 deletions(-)
 create mode 100644 test/replication/autobootstrap_anon.lua
 create mode 120000 test/replication/autobootstrap_anon1.lua
 create mode 120000 test/replication/autobootstrap_anon2.lua
 delete mode 100644 test/replication/force_recovery.result
 delete mode 100644 test/replication/force_recovery.test.lua
 create mode 100644 test/replication/gh-5287-boot-anon.result
 create mode 100644 test/replication/gh-5287-boot-anon.test.lua
 delete mode 100644 test/xlog/panic_on_wal_error.result
 delete mode 100644 test/xlog/panic_on_wal_error.test.lua

-- 
2.21.1 (Apple Git-122.3)

Thread overview: 10+ messages
2020-09-12 17:25 [Tarantool-patches] [PATCH 0/4] Boot with anon Vladislav Shpilevoy
2020-09-12 17:25 ` [Tarantool-patches] [PATCH 1/4] replication: replace anon flag with enum Vladislav Shpilevoy
2020-09-14 10:09   ` Cyrill Gorcunov
2020-09-12 17:25 ` [Tarantool-patches] [PATCH 2/4] xlog: introduce an error code for XlogGapError Vladislav Shpilevoy
2020-09-14 10:18   ` Cyrill Gorcunov
2020-09-12 17:25 ` [Tarantool-patches] [PATCH 3/4] replication: retry in case of XlogGapError Vladislav Shpilevoy
2020-09-14 12:27   ` Cyrill Gorcunov
2020-09-12 17:25 ` [Tarantool-patches] [PATCH 4/4] replication: do not register outgoing connections Vladislav Shpilevoy
2020-09-12 17:32 ` [Tarantool-patches] [PATCH 0/4] Boot with anon Vladislav Shpilevoy
2020-09-13 16:03   ` Vladislav Shpilevoy
