[PATCH 0/2] replication: don't stop syncing on configuration errors

Vladimir Davydov vdavydov.dev at gmail.com
Sun Sep 23 18:31:19 MSK 2018


This patch set aims to resolve the issue described in #3636, where
restarting replication with the same replica set configuration leads
to an ER_CFG error and a transient orphan mode:

  replication = box.cfg.replication
  box.cfg{}
  box.cfg{replication = box.cfg.replication} -- success
  box.info.status -- orphan! wait a sec and it will change
                  -- back to running

This issue also results in spurious replication-py/multi test failures
(see #3692):

  replication-py/multi.test.py                                [ fail ]
  
  Test failed! Result content mismatch:
  --- replication-py/multi.result Mon Sep 18 13:55:15 2017
  +++ replication-py/multi.reject Wed Sep 19 13:32:54 2018
  @@ -60,9 +60,9 @@
   done
  
   Check data
  -server 1 is ok
  -server 2 is ok
  -server 3 is ok
  +server 1 is not ok
  +server 2 is not ok
  +server 3 is not ok
   Done

The first patch of the series cleans up the error messages printed in
the above-mentioned case, which turned out to be very confusing,
while the second patch fixes the issue itself.

https://github.com/tarantool/tarantool/issues/3636
https://github.com/tarantool/tarantool/issues/3692
https://github.com/tarantool/tarantool/tree/dv/gh-3636-replication-dont-stop-sync-on-cfg-error

Vladimir Davydov (2):
  replication: fix recoverable error reporting
  replication: don't stop syncing on configuration errors

 src/box/applier.cc                | 69 ++++++++++++++++----------------
 src/box/box.cc                    |  6 +++
 src/box/relay.cc                  |  6 +--
 test/replication/sync.result      | 82 +++++++++++++++++++++++++++++++++++++--
 test/replication/sync.test.lua    | 35 ++++++++++++++++-
 test/replication/wal_off.result   |  9 +++++
 test/replication/wal_off.test.lua |  3 ++
 7 files changed, 167 insertions(+), 43 deletions(-)

-- 
2.11.0



