Tarantool development patches archive
* [Tarantool-patches] [PATCH 0/4] Boot with anon
@ 2020-09-12 17:25 Vladislav Shpilevoy
  2020-09-12 17:25 ` [Tarantool-patches] [PATCH 1/4] replication: replace anon flag with enum Vladislav Shpilevoy
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-12 17:25 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko, gorcunov

The patchset attempts to address the problem of anonymous replicas being
registered in _cluster if they are present during bootstrap.

The bug was found while working on another issue related to Raft. The problem
is that Raft won't work properly during bootstrap if non-joined replicas are
registered in _cluster.

When their auto-registration by the applier was removed, the anon bug surfaced.

The auto-registration removal itself is trivial, but it breaks the cluster
bootstrap in another way, producing false-positive XlogGapError failures. See
the next-to-last commit for an explanation. To solve the issue a rather radical
approach is taken - gap errors are not considered critical anymore and can be
retried. I am not sure this is the best option, but I couldn't come up with
anything better after a long struggle with it.
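
A sketch of how the retried error looks from the outside (the replica index 1
and the message pattern are illustrative, taken from the updated tests in this
series):

    -- On a replica that hit the gap the upstream is no longer 'stopped';
    -- it switches to 'loading' and keeps retrying:
    test_run:wait_upstream(1, {status = 'loading', message_re = 'Missing'})
    -- Or, without the test-run helpers:
    box.info.replication[1].upstream.status   -- 'loading'
    box.info.replication[1].upstream.message  -- 'Missing .xlog file ...'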

This is a bug fix, so whatever we end up with should be backported to the older
versions too.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5287-anon-false-register
Issue: https://github.com/tarantool/tarantool/issues/5287

@ChangeLog
* An anonymous replica could be registered in _cluster and prevent WAL file removal (gh-5287).
* XlogGapError is not a critical error anymore. It means box.info.replication will show the upstream status as 'loading' when the error occurs. The upstream will be restarted until the error is resolved automatically with the help of another instance, or until the replica is removed from box.cfg.replication (gh-5287).

Vladislav Shpilevoy (4):
  replication: replace anon flag with enum
  xlog: introduce an error code for XlogGapError
  replication: retry in case of XlogGapError
  replication: do not register outgoing connections

 src/box/applier.cc                            |  40 ++++
 src/box/box.cc                                |  41 +++--
 src/box/errcode.h                             |   2 +
 src/box/error.cc                              |   2 +
 src/box/error.h                               |   1 +
 src/box/lua/info.c                            |   2 +-
 src/box/recovery.h                            |   2 -
 src/box/relay.cc                              |   6 +-
 src/box/replication.cc                        |  23 ++-
 src/box/replication.h                         |  31 +++-
 test/box/error.result                         |   2 +
 test/replication/autobootstrap_anon.lua       |  25 +++
 test/replication/autobootstrap_anon1.lua      |   1 +
 test/replication/autobootstrap_anon2.lua      |   1 +
 test/replication/force_recovery.result        | 110 -----------
 test/replication/force_recovery.test.lua      |  43 -----
 test/replication/gh-5287-boot-anon.result     |  77 ++++++++
 test/replication/gh-5287-boot-anon.test.lua   |  29 +++
 test/replication/prune.result                 |  18 +-
 test/replication/prune.test.lua               |   7 +-
 test/replication/replica.lua                  |   2 +
 test/replication/replica_rejoin.result        |   6 +-
 test/replication/replica_rejoin.test.lua      |   4 +-
 .../show_error_on_disconnect.result           |   2 +-
 .../show_error_on_disconnect.test.lua         |   2 +-
 test/xlog/panic_on_wal_error.result           | 171 ------------------
 test/xlog/panic_on_wal_error.test.lua         |  75 --------
 27 files changed, 286 insertions(+), 439 deletions(-)
 create mode 100644 test/replication/autobootstrap_anon.lua
 create mode 120000 test/replication/autobootstrap_anon1.lua
 create mode 120000 test/replication/autobootstrap_anon2.lua
 delete mode 100644 test/replication/force_recovery.result
 delete mode 100644 test/replication/force_recovery.test.lua
 create mode 100644 test/replication/gh-5287-boot-anon.result
 create mode 100644 test/replication/gh-5287-boot-anon.test.lua
 delete mode 100644 test/xlog/panic_on_wal_error.result
 delete mode 100644 test/xlog/panic_on_wal_error.test.lua

-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH 1/4] replication: replace anon flag with enum
  2020-09-12 17:25 [Tarantool-patches] [PATCH 0/4] Boot with anon Vladislav Shpilevoy
@ 2020-09-12 17:25 ` Vladislav Shpilevoy
  2020-09-14 10:09   ` Cyrill Gorcunov
  2020-09-12 17:25 ` [Tarantool-patches] [PATCH 2/4] xlog: introduce an error code for XlogGapError Vladislav Shpilevoy
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-12 17:25 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko, gorcunov

Struct replica had a flag saying whether the replica is anonymous.
But a flag is not enough. When a remote instance hasn't told
whether it is anonymous, the flag can be neither true nor false -
it needs a third state until the replica sends its identity flag.

Not knowing whether a replica is actually non-anonymous, or was
just added with a 'false' anon flag locally because it is listed
in box.cfg.replication, makes it impossible to update the flag
properly when the remote instance finally sends it.

For example, when a subscribe request arrives with a flag saying
the remote instance is anonymous, and the local struct replica
object stores a false anon flag, it is impossible to say whether
the flag is false because the replica was added via
box.cfg.replication, or because it really isn't anonymous. So the
flag can't be updated.

In turn, not being able to update the anon flag may lead to an
issue where the replica is considered non-anonymous even though it
is, and it keeps a garbage collection consumer, preventing WAL
file deletion.

This patch replaces the flag with an enum having 3 values, so the
issue can be fixed in the next patch.
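
Purely for illustration (not part of the diff below), a small Lua
model of why a boolean can't express the situation; the state names
are shortened versions of the enum values introduced by this patch,
everything else is hypothetical:

    -- The three identity states a peer can be in.
    local STATUS_UNKNOWN = 'unknown'        -- identity not reported yet
    local STATUS_ANON = 'anon'              -- reported as anonymous
    local STATUS_REGISTERED = 'registered'  -- has an ID in _cluster

    -- A peer taken from box.cfg.replication starts as 'unknown'.
    local replica = {id_status = STATUS_UNKNOWN}

    -- On subscribe the peer finally reports whether it is anonymous.
    -- With a plain boolean 'anon == false' this branch could not tell
    -- 'registered' from 'not reported yet'.
    local function on_subscribe(r, remote_is_anon)
        if r.id_status == STATUS_UNKNOWN then
            r.id_status = remote_is_anon and STATUS_ANON or STATUS_REGISTERED
        end
    end

    on_subscribe(replica, true)
    assert(replica.id_status == STATUS_ANON)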

Part of #5287
---
 src/box/box.cc         |  2 +-
 src/box/lua/info.c     |  2 +-
 src/box/relay.cc       |  6 +++---
 src/box/replication.cc | 11 ++++++-----
 src/box/replication.h  | 23 +++++++++++++++++++----
 5 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index faffd5769..3f2999fc0 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -955,7 +955,7 @@ box_clear_synchro_queue(void)
 		replicaset_foreach(replica) {
 			if (replica->relay != NULL &&
 			    relay_get_state(replica->relay) != RELAY_OFF &&
-			    !replica->anon) {
+			    replica->id_status != REPLICA_ID_STATUS_ANON) {
 				assert(!tt_uuid_is_equal(&INSTANCE_UUID,
 							 &replica->uuid));
 				vclock = relay_vclock(replica->relay);
diff --git a/src/box/lua/info.c b/src/box/lua/info.c
index 1c131caec..6c28ae721 100644
--- a/src/box/lua/info.c
+++ b/src/box/lua/info.c
@@ -226,7 +226,7 @@ lbox_info_replication_anon_call(struct lua_State *L)
 	lua_setmetatable(L, -2);
 
 	replicaset_foreach(replica) {
-		if (!replica->anon)
+		if (replica->id_status != REPLICA_ID_STATUS_ANON)
 			continue;
 
 		lua_pushstring(L, tt_uuid_str(&replica->uuid));
diff --git a/src/box/relay.cc b/src/box/relay.cc
index a7843a8c2..043ae255f 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -601,7 +601,7 @@ relay_subscribe_f(va_list ap)
 	 */
 	struct trigger on_close_log;
 	trigger_create(&on_close_log, relay_on_close_log_f, relay, NULL);
-	if (!relay->replica->anon)
+	if (relay->replica->id_status != REPLICA_ID_STATUS_ANON)
 		trigger_add(&r->on_close_log, &on_close_log);
 
 	/* Setup WAL watcher for sending new rows to the replica. */
@@ -709,7 +709,6 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
 		struct vclock *replica_clock, uint32_t replica_version_id,
 		uint32_t replica_id_filter)
 {
-	assert(replica->anon || replica->id != REPLICA_ID_NIL);
 	struct relay *relay = replica->relay;
 	assert(relay->state != RELAY_FOLLOW);
 	/*
@@ -717,7 +716,8 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
 	 * unless it has already been registered by initial
 	 * join.
 	 */
-	if (replica->gc == NULL && !replica->anon) {
+	if (replica->gc == NULL &&
+	    replica->id_status != REPLICA_ID_STATUS_ANON) {
 		replica->gc = gc_consumer_register(replica_clock, "replica %s",
 						   tt_uuid_str(&replica->uuid));
 		if (replica->gc == NULL)
diff --git a/src/box/replication.cc b/src/box/replication.cc
index ef0e2411d..4cd08e6ff 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -173,7 +173,7 @@ replica_new(void)
 		diag_raise();
 	}
 	replica->id = 0;
-	replica->anon = false;
+	replica->id_status = REPLICA_ID_STATUS_UNKNOWN;
 	replica->uuid = uuid_nil;
 	replica->applier = NULL;
 	replica->gc = NULL;
@@ -206,6 +206,7 @@ replicaset_add(uint32_t replica_id, const struct tt_uuid *replica_uuid)
 	assert(replica_by_uuid(replica_uuid) == NULL);
 	struct replica *replica = replica_new();
 	replica->uuid = *replica_uuid;
+	replica->id_status = REPLICA_ID_STATUS_REGISTERED;
 	replica_hash_insert(&replicaset.hash, replica);
 	replica_set_id(replica, replica_id);
 	return replica;
@@ -220,7 +221,7 @@ replicaset_add_anon(const struct tt_uuid *replica_uuid)
 	struct replica *replica = replica_new();
 	replica->uuid = *replica_uuid;
 	replica_hash_insert(&replicaset.hash, replica);
-	replica->anon = true;
+	replica->id_status = REPLICA_ID_STATUS_ANON;
 	replicaset.anon_count++;
 	return replica;
 }
@@ -236,7 +237,7 @@ replica_set_id(struct replica *replica, uint32_t replica_id)
 		/* Assign local replica id */
 		assert(instance_id == REPLICA_ID_NIL);
 		instance_id = replica_id;
-	} else if (replica->anon) {
+	} else if (replica->id_status == REPLICA_ID_STATUS_ANON) {
 		/*
 		 * Set replica gc on its transition from
 		 * anonymous to a normal one.
@@ -250,7 +251,7 @@ replica_set_id(struct replica *replica, uint32_t replica_id)
 
 	say_info("assigned id %d to replica %s",
 		 replica->id, tt_uuid_str(&replica->uuid));
-	replica->anon = false;
+	replica->id_status = REPLICA_ID_STATUS_REGISTERED;
 }
 
 void
@@ -902,7 +903,7 @@ replica_on_relay_stop(struct replica *replica)
 	 * collector then. See also replica_clear_id.
 	 */
 	if (replica->id == REPLICA_ID_NIL) {
-		if (!replica->anon) {
+		if (replica->id_status != REPLICA_ID_STATUS_ANON) {
 			gc_consumer_unregister(replica->gc);
 			replica->gc = NULL;
 		} else {
diff --git a/src/box/replication.h b/src/box/replication.h
index ddc2bddf4..3e34d4544 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -269,6 +269,21 @@ struct replicaset {
 };
 extern struct replicaset replicaset;
 
+enum replica_id_status {
+	/**
+	 * Replica is connected, but its ID is unknown, and it didn't report yet
+	 * whether it is anonymous.
+	 */
+	REPLICA_ID_STATUS_UNKNOWN,
+	/**
+	 * Anonymous replica doesn't have an id and isn't present in _cluster
+	 * table.
+	 */
+	REPLICA_ID_STATUS_ANON,
+	/** Normal replica having an ID in _cluster. */
+	REPLICA_ID_STATUS_REGISTERED,
+};
+
 /**
  * Summary information about a replica in the replica set.
  */
@@ -286,11 +301,11 @@ struct replica {
 	 */
 	uint32_t id;
 	/**
-	 * Whether this is an anonymous replica, e.g. a read-only
-	 * replica that doesn't have an id and isn't present in
-	 * _cluster table.
+	 * Current status of the replica identity. It is not a flag, because it
+	 * is the replica's decision whether to tell its identity. And if it
+	 * didn't tell it yet, the status may be unknown.
 	 */
-	bool anon;
+	enum replica_id_status id_status;
 	/** Applier fiber. */
 	struct applier *applier;
 	/** Relay thread. */
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH 2/4] xlog: introduce an error code for XlogGapError
  2020-09-12 17:25 [Tarantool-patches] [PATCH 0/4] Boot with anon Vladislav Shpilevoy
  2020-09-12 17:25 ` [Tarantool-patches] [PATCH 1/4] replication: replace anon flag with enum Vladislav Shpilevoy
@ 2020-09-12 17:25 ` Vladislav Shpilevoy
  2020-09-14 10:18   ` Cyrill Gorcunov
  2020-09-12 17:25 ` [Tarantool-patches] [PATCH 3/4] replication: retry in case of XlogGapError Vladislav Shpilevoy
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-12 17:25 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko, gorcunov

The XlogGapError object didn't have a code in the ClientError code
space. Because of that it was not possible to handle the gap error
together with client errors in a switch-case statement.

Now the gap error has a code.

This is going to be used in the applier code to handle
XlogGapError among other errors using its code instead of RTTI.
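
The commit's own consumer is the C applier code changed in the next
patch; as a side effect the new code is also visible from Lua (the
updated test below lists box.error.XLOG_GAP). A sketch, where 'err'
is a hypothetical error object caught e.g. via pcall:

    -- Dispatch by error code instead of checking the error's C++ type.
    local function is_xlog_gap(err)
        return err.code == box.error.XLOG_GAP
    end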

Needed for #5287
---
 src/box/errcode.h     | 1 +
 src/box/error.cc      | 2 ++
 src/box/error.h       | 1 +
 src/box/recovery.h    | 2 --
 test/box/error.result | 1 +
 5 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/box/errcode.h b/src/box/errcode.h
index 3c21375f5..9a240a431 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -271,6 +271,7 @@ struct errcode_record {
         /*216 */_(ER_SYNC_QUORUM_TIMEOUT,       "Quorum collection for a synchronous transaction is timed out") \
         /*217 */_(ER_SYNC_ROLLBACK,             "A rollback for a synchronous transaction is received") \
 	/*218 */_(ER_TUPLE_METADATA_IS_TOO_BIG,	"Can't create tuple: metadata size %u is too big") \
+	/*219 */_(ER_XLOG_GAP,			"%s") \
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/error.cc b/src/box/error.cc
index c3c2af3ab..ca1d73e0c 100644
--- a/src/box/error.cc
+++ b/src/box/error.cc
@@ -221,6 +221,8 @@ ClientError::get_errcode(const struct error *e)
 		return ER_SYSTEM;
 	if (type_cast(CollationError, e))
 		return ER_CANT_CREATE_COLLATION;
+	if (type_cast(XlogGapError, e))
+		return ER_XLOG_GAP;
 	return ER_PROC_LUA;
 }
 
diff --git a/src/box/error.h b/src/box/error.h
index 988b98255..338121dd9 100644
--- a/src/box/error.h
+++ b/src/box/error.h
@@ -178,6 +178,7 @@ box_error_new(const char *file, unsigned line, uint32_t code,
 
 extern const struct type_info type_ClientError;
 extern const struct type_info type_XlogError;
+extern const struct type_info type_XlogGapError;
 extern const struct type_info type_AccessDeniedError;
 extern const struct type_info type_CustomError;
 
diff --git a/src/box/recovery.h b/src/box/recovery.h
index 6e68abc0b..b8d83951a 100644
--- a/src/box/recovery.h
+++ b/src/box/recovery.h
@@ -40,8 +40,6 @@
 extern "C" {
 #endif /* defined(__cplusplus) */
 
-extern const struct type_info type_XlogGapError;
-
 struct xrow_header;
 struct xstream;
 
diff --git a/test/box/error.result b/test/box/error.result
index cdecdb221..600517316 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -437,6 +437,7 @@ t;
  |   216: box.error.SYNC_QUORUM_TIMEOUT
  |   217: box.error.SYNC_ROLLBACK
  |   218: box.error.TUPLE_METADATA_IS_TOO_BIG
+ |   219: box.error.XLOG_GAP
  | ...
 
 test_run:cmd("setopt delimiter ''");
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH 3/4] replication: retry in case of XlogGapError
  2020-09-12 17:25 [Tarantool-patches] [PATCH 0/4] Boot with anon Vladislav Shpilevoy
  2020-09-12 17:25 ` [Tarantool-patches] [PATCH 1/4] replication: replace anon flag with enum Vladislav Shpilevoy
  2020-09-12 17:25 ` [Tarantool-patches] [PATCH 2/4] xlog: introduce an error code for XlogGapError Vladislav Shpilevoy
@ 2020-09-12 17:25 ` Vladislav Shpilevoy
  2020-09-14 12:27   ` Cyrill Gorcunov
  2020-09-12 17:25 ` [Tarantool-patches] [PATCH 4/4] replication: do not register outgoing connections Vladislav Shpilevoy
  2020-09-12 17:32 ` [Tarantool-patches] [PATCH 0/4] Boot with anon Vladislav Shpilevoy
  4 siblings, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-12 17:25 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko, gorcunov

Previously XlogGapError was considered a critical error that stops
the replication. That is not as good an idea as it may look.

XlogGapError is a perfectly fine error, which should not kill the
replication connection. It should be retried instead.

Here is an example where the gap can be recovered on its own.
Consider the case: node1 is a leader, it is booted with
vclock {1: 3}. Node2 connects and fetches snapshot of node1, it
also gets vclock {1: 3}. Then node1 writes something and its
vclock becomes {1: 4}. Now node3 boots from node1, and gets the
same vclock. Vclocks now look like this:

  - node1: {1: 4}, leader, has {1: 3} snap.
  - node2: {1: 3}, booted from node1, has only snap.
  - node3: {1: 4}, booted from node1, has only snap.

If the cluster is a fullmesh, node2 will send subscribe requests
with vclock {1: 3}. If node3 receives it, it will respond with
xlog gap error, because it only has a snap with {1: 4}, nothing
else. In that case node2 should retry connecting to node3, and in
the meantime try to get newer changes from node1.

The example is totally valid. However it is unreachable now
because the master registers all replicas in _cluster before
allowing them to join. So they all bootstrap from a snapshot
containing all their IDs. This is a bug, because such
auto-registration leads to registration of anonymous replicas, if
they are present during bootstrap. Also it blocks Raft, which
can't work if there are registered, but not yet joined nodes.

Once the registration problem is solved in the next commit,
XlogGapError will start striking quite often during bootstrap.
This patch prevents that from happening.

Needed for #5287
---
 src/box/applier.cc                            |  27 +++
 test/replication/force_recovery.result        | 110 -----------
 test/replication/force_recovery.test.lua      |  43 -----
 test/replication/replica.lua                  |   2 +
 test/replication/replica_rejoin.result        |   6 +-
 test/replication/replica_rejoin.test.lua      |   4 +-
 .../show_error_on_disconnect.result           |   2 +-
 .../show_error_on_disconnect.test.lua         |   2 +-
 test/xlog/panic_on_wal_error.result           | 171 ------------------
 test/xlog/panic_on_wal_error.test.lua         |  75 --------
 10 files changed, 36 insertions(+), 406 deletions(-)
 delete mode 100644 test/replication/force_recovery.result
 delete mode 100644 test/replication/force_recovery.test.lua
 delete mode 100644 test/xlog/panic_on_wal_error.result
 delete mode 100644 test/xlog/panic_on_wal_error.test.lua

diff --git a/src/box/applier.cc b/src/box/applier.cc
index c1d07ca54..96dd48c0d 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -106,6 +106,7 @@ applier_log_error(struct applier *applier, struct error *e)
 	case ER_SYSTEM:
 	case ER_UNKNOWN_REPLICA:
 	case ER_PASSWORD_MISMATCH:
+	case ER_XLOG_GAP:
 		say_info("will retry every %.2lf second",
 			 replication_reconnect_interval());
 		break;
@@ -1333,6 +1334,32 @@ applier_f(va_list ap)
 				applier_disconnect(applier, APPLIER_STOPPED);
 				return -1;
 			}
+		} catch (XlogGapError *e) {
+			/*
+			 * Xlog gap error can't be a critical error. Because it
+			 * is totally normal during bootstrap. Consider the
+			 * case: node1 is a leader, it is booted with vclock
+			 * {1: 3}. Node2 connects and fetches snapshot of node1,
+			 * it also gets vclock {1: 3}. Then node1 writes
+			 * something and its vclock becomes {1: 4}. Now node3
+			 * boots from node1, and gets the same vclock. Vclocks
+			 * now look like this:
+			 *
+			 * - node1: {1: 4}, leader, has {1: 3} snap.
+			 * - node2: {1: 3}, booted from node1, has only snap.
+			 * - node3: {1: 4}, booted from node1, has only snap.
+			 *
+			 * If the cluster is a fullmesh, node2 will send
+			 * subscribe requests with vclock {1: 3}. If node3
+			 * receives it, it will respond with xlog gap error,
+			 * because it only has a snap with {1: 4}, nothing else.
+			 * In that case node2 should retry connecting to node3,
+			 * and in the meantime try to get newer changes from
+			 * node1.
+			 */
+			applier_log_error(applier, e);
+			applier_disconnect(applier, APPLIER_LOADING);
+			goto reconnect;
 		} catch (FiberIsCancelled *e) {
 			if (!diag_is_empty(&applier->diag)) {
 				diag_move(&applier->diag, &fiber()->diag);
diff --git a/test/replication/force_recovery.result b/test/replication/force_recovery.result
deleted file mode 100644
index f50452858..000000000
--- a/test/replication/force_recovery.result
+++ /dev/null
@@ -1,110 +0,0 @@
-test_run = require('test_run').new()
----
-...
-fio = require('fio')
----
-...
---
--- Test that box.cfg.force_recovery is ignored by relay threads (gh-3910).
---
-_ = box.schema.space.create('test')
----
-...
-_ = box.space.test:create_index('primary')
----
-...
-box.schema.user.grant('guest', 'replication')
----
-...
--- Deploy a replica.
-test_run:cmd("create server test with rpl_master=default, script='replication/replica.lua'")
----
-- true
-...
-test_run:cmd("start server test")
----
-- true
-...
--- Stop the replica and wait for the relay thread to exit.
-test_run:cmd("stop server test")
----
-- true
-...
-test_run:wait_cond(function() return box.info.replication[2].downstream.status == 'stopped' end, 10)
----
-- true
-...
--- Delete an xlog file that is needed by the replica.
-box.snapshot()
----
-- ok
-...
-xlog = fio.pathjoin(box.cfg.wal_dir, string.format('%020d.xlog', box.info.signature))
----
-...
-box.space.test:replace{1}
----
-- [1]
-...
-box.snapshot()
----
-- ok
-...
-box.space.test:replace{2}
----
-- [2]
-...
-fio.unlink(xlog)
----
-- true
-...
--- Check that even though box.cfg.force_recovery is set,
--- replication will still fail due to LSN gap.
-box.cfg{force_recovery = true}
----
-...
-test_run:cmd("start server test")
----
-- true
-...
-test_run:cmd("switch test")
----
-- true
-...
-box.space.test:select()
----
-- []
-...
-box.info.replication[1].upstream.status == 'stopped' or box.info
----
-- true
-...
-test_run:cmd("switch default")
----
-- true
-...
-box.cfg{force_recovery = false}
----
-...
--- Cleanup.
-test_run:cmd("stop server test")
----
-- true
-...
-test_run:cmd("cleanup server test")
----
-- true
-...
-test_run:cmd("delete server test")
----
-- true
-...
-test_run:cleanup_cluster()
----
-...
-box.schema.user.revoke('guest', 'replication')
----
-...
-box.space.test:drop()
----
-...
diff --git a/test/replication/force_recovery.test.lua b/test/replication/force_recovery.test.lua
deleted file mode 100644
index 54307814b..000000000
--- a/test/replication/force_recovery.test.lua
+++ /dev/null
@@ -1,43 +0,0 @@
-test_run = require('test_run').new()
-fio = require('fio')
-
---
--- Test that box.cfg.force_recovery is ignored by relay threads (gh-3910).
---
-_ = box.schema.space.create('test')
-_ = box.space.test:create_index('primary')
-box.schema.user.grant('guest', 'replication')
-
--- Deploy a replica.
-test_run:cmd("create server test with rpl_master=default, script='replication/replica.lua'")
-test_run:cmd("start server test")
-
--- Stop the replica and wait for the relay thread to exit.
-test_run:cmd("stop server test")
-test_run:wait_cond(function() return box.info.replication[2].downstream.status == 'stopped' end, 10)
-
--- Delete an xlog file that is needed by the replica.
-box.snapshot()
-xlog = fio.pathjoin(box.cfg.wal_dir, string.format('%020d.xlog', box.info.signature))
-box.space.test:replace{1}
-box.snapshot()
-box.space.test:replace{2}
-fio.unlink(xlog)
-
--- Check that even though box.cfg.force_recovery is set,
--- replication will still fail due to LSN gap.
-box.cfg{force_recovery = true}
-test_run:cmd("start server test")
-test_run:cmd("switch test")
-box.space.test:select()
-box.info.replication[1].upstream.status == 'stopped' or box.info
-test_run:cmd("switch default")
-box.cfg{force_recovery = false}
-
--- Cleanup.
-test_run:cmd("stop server test")
-test_run:cmd("cleanup server test")
-test_run:cmd("delete server test")
-test_run:cleanup_cluster()
-box.schema.user.revoke('guest', 'replication')
-box.space.test:drop()
diff --git a/test/replication/replica.lua b/test/replication/replica.lua
index f3a6dfe58..ef0d6d63a 100644
--- a/test/replication/replica.lua
+++ b/test/replication/replica.lua
@@ -1,6 +1,7 @@
 #!/usr/bin/env tarantool
 
 repl_include_self = arg[1] and arg[1] == 'true' or false
+repl_quorum = arg[2] and tonumber(arg[2]) or nil
 repl_list = nil
 
 if repl_include_self then
@@ -14,6 +15,7 @@ box.cfg({
     replication         = repl_list,
     memtx_memory        = 107374182,
     replication_timeout = 0.1,
+    replication_connect_quorum = repl_quorum,
 })
 
 require('console').listen(os.getenv('ADMIN'))
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index dd04ae297..7b64c3813 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -213,7 +213,7 @@ test_run:wait_cond(function() return #fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.
 box.cfg{checkpoint_count = checkpoint_count}
 ---
 ...
-test_run:cmd("start server replica with args='true'")
+test_run:cmd("start server replica with args='true 0'")
 ---
 - true
 ...
@@ -221,9 +221,9 @@ test_run:cmd("switch replica")
 ---
 - true
 ...
-box.info.status -- orphan
+test_run:wait_upstream(1, {message_re = 'Missing %.xlog file'})
 ---
-- orphan
+- true
 ...
 box.space.test:select()
 ---
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index 410ef44d7..dcd0557cb 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -79,9 +79,9 @@ for i = 1, 3 do box.space.test:insert{i * 100} end
 fio = require('fio')
 test_run:wait_cond(function() return #fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.xlog')) == 1 end) or fio.pathjoin(box.cfg.wal_dir, '*.xlog')
 box.cfg{checkpoint_count = checkpoint_count}
-test_run:cmd("start server replica with args='true'")
+test_run:cmd("start server replica with args='true 0'")
 test_run:cmd("switch replica")
-box.info.status -- orphan
+test_run:wait_upstream(1, {message_re = 'Missing %.xlog file'})
 box.space.test:select()
 
 --
diff --git a/test/replication/show_error_on_disconnect.result b/test/replication/show_error_on_disconnect.result
index 48003db06..19d3886e4 100644
--- a/test/replication/show_error_on_disconnect.result
+++ b/test/replication/show_error_on_disconnect.result
@@ -72,7 +72,7 @@ box.space.test:select()
 other_id = box.info.id % 2 + 1
 ---
 ...
-test_run:wait_upstream(other_id, {status = 'stopped', message_re = 'Missing'})
+test_run:wait_upstream(other_id, {status = 'loading', message_re = 'Missing'})
 ---
 - true
 ...
diff --git a/test/replication/show_error_on_disconnect.test.lua b/test/replication/show_error_on_disconnect.test.lua
index 1b0ea4373..dce926a34 100644
--- a/test/replication/show_error_on_disconnect.test.lua
+++ b/test/replication/show_error_on_disconnect.test.lua
@@ -29,7 +29,7 @@ test_run:cmd("switch master_quorum1")
 box.cfg{replication = repl}
 box.space.test:select()
 other_id = box.info.id % 2 + 1
-test_run:wait_upstream(other_id, {status = 'stopped', message_re = 'Missing'})
+test_run:wait_upstream(other_id, {status = 'loading', message_re = 'Missing'})
 test_run:cmd("switch master_quorum2")
 box.space.test:select()
 other_id = box.info.id % 2 + 1
diff --git a/test/xlog/panic_on_wal_error.result b/test/xlog/panic_on_wal_error.result
deleted file mode 100644
index 22f14f912..000000000
--- a/test/xlog/panic_on_wal_error.result
+++ /dev/null
@@ -1,171 +0,0 @@
--- preparatory stuff
-env = require('test_run')
----
-...
-test_run = env.new()
----
-...
-test_run:cmd("restart server default with cleanup=True")
-box.schema.user.grant('guest', 'replication')
----
-...
-_ = box.schema.space.create('test')
----
-...
-_ = box.space.test:create_index('pk')
----
-...
---
--- reopen xlog
---
-test_run:cmd("restart server default")
-box.space.test ~= nil
----
-- true
-...
--- insert some stuff
--- 
-box.space.test:auto_increment{'before snapshot'}
----
-- [1, 'before snapshot']
-...
---
--- this snapshot will go to the replica
---
-box.snapshot()
----
-- ok
-...
--- 
--- create a replica, let it catch up somewhat
---
-test_run:cmd("create server replica with rpl_master=default, script='xlog/replica.lua'")
----
-- true
-...
-test_run:cmd("start server replica")
----
-- true
-...
-test_run:cmd("switch replica")
----
-- true
-...
-box.space.test:select{}
----
-- - [1, 'before snapshot']
-...
--- 
--- stop replica, restart the master, insert more stuff
--- which will make it into an xlog only
---
-test_run:cmd("switch default")
----
-- true
-...
-test_run:cmd("stop server replica")
----
-- true
-...
-test_run:cmd("restart server default")
-box.space.test:auto_increment{'after snapshot'}
----
-- [2, 'after snapshot']
-...
-box.space.test:auto_increment{'after snapshot - one more row'}
----
-- [3, 'after snapshot - one more row']
-...
---
--- save snapshot and remove xlogs
--- 
-box.snapshot()
----
-- ok
-...
-fio = require('fio')
----
-...
-glob = fio.pathjoin(box.cfg.wal_dir, '*.xlog')
----
-...
-files = fio.glob(glob)
----
-...
-for _, file in pairs(files) do fio.unlink(file) end
----
-...
---
--- make sure the server has some xlogs, otherwise the
--- replica doesn't discover the gap in the logs
---
-box.space.test:auto_increment{'after snapshot and restart'}
----
-- [4, 'after snapshot and restart']
-...
-box.space.test:auto_increment{'after snapshot and restart - one more row'}
----
-- [5, 'after snapshot and restart - one more row']
-...
---  
---  check that panic is true
---
-box.cfg{force_recovery=false}
----
-...
-box.cfg.force_recovery
----
-- false
-...
--- 
--- try to start the replica, ha-ha
--- (replication should fail, some rows are missing)
---
-test_run:cmd("start server replica")
----
-- true
-...
-test_run:cmd("switch replica")
----
-- true
-...
-fiber = require('fiber')
----
-...
-while box.info.replication[1].upstream.status ~= "stopped" do fiber.sleep(0.001) end
----
-...
-box.info.replication[1].upstream.status
----
-- stopped
-...
-box.info.replication[1].upstream.message
----
-- 'Missing .xlog file between LSN 6 {1: 6} and 8 {1: 8}'
-...
-box.space.test:select{}
----
-- - [1, 'before snapshot']
-...
---
---
-test_run:cmd("switch default")
----
-- true
-...
-test_run:cmd("stop server replica")
----
-- true
-...
-test_run:cmd("cleanup server replica")
----
-- true
-...
---
--- cleanup
-box.space.test:drop()
----
-...
-box.schema.user.revoke('guest', 'replication')
----
-...
diff --git a/test/xlog/panic_on_wal_error.test.lua b/test/xlog/panic_on_wal_error.test.lua
deleted file mode 100644
index 2e95431c6..000000000
--- a/test/xlog/panic_on_wal_error.test.lua
+++ /dev/null
@@ -1,75 +0,0 @@
--- preparatory stuff
-env = require('test_run')
-test_run = env.new()
-
-test_run:cmd("restart server default with cleanup=True")
-box.schema.user.grant('guest', 'replication')
-_ = box.schema.space.create('test')
-_ = box.space.test:create_index('pk')
---
--- reopen xlog
---
-test_run:cmd("restart server default")
-box.space.test ~= nil
--- insert some stuff
--- 
-box.space.test:auto_increment{'before snapshot'}
---
--- this snapshot will go to the replica
---
-box.snapshot()
--- 
--- create a replica, let it catch up somewhat
---
-test_run:cmd("create server replica with rpl_master=default, script='xlog/replica.lua'")
-test_run:cmd("start server replica")
-test_run:cmd("switch replica")
-box.space.test:select{}
--- 
--- stop replica, restart the master, insert more stuff
--- which will make it into an xlog only
---
-test_run:cmd("switch default")
-test_run:cmd("stop server replica")
-test_run:cmd("restart server default")
-box.space.test:auto_increment{'after snapshot'}
-box.space.test:auto_increment{'after snapshot - one more row'}
---
--- save snapshot and remove xlogs
--- 
-box.snapshot()
-fio = require('fio')
-glob = fio.pathjoin(box.cfg.wal_dir, '*.xlog')
-files = fio.glob(glob)
-for _, file in pairs(files) do fio.unlink(file) end
---
--- make sure the server has some xlogs, otherwise the
--- replica doesn't discover the gap in the logs
---
-box.space.test:auto_increment{'after snapshot and restart'}
-box.space.test:auto_increment{'after snapshot and restart - one more row'}
---  
---  check that panic is true
---
-box.cfg{force_recovery=false}
-box.cfg.force_recovery
--- 
--- try to start the replica, ha-ha
--- (replication should fail, some rows are missing)
---
-test_run:cmd("start server replica")
-test_run:cmd("switch replica")
-fiber = require('fiber')
-while box.info.replication[1].upstream.status ~= "stopped" do fiber.sleep(0.001) end
-box.info.replication[1].upstream.status
-box.info.replication[1].upstream.message
-box.space.test:select{}
---
---
-test_run:cmd("switch default")
-test_run:cmd("stop server replica")
-test_run:cmd("cleanup server replica")
---
--- cleanup
-box.space.test:drop()
-box.schema.user.revoke('guest', 'replication')
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH 4/4] replication: do not register outgoing connections
  2020-09-12 17:25 [Tarantool-patches] [PATCH 0/4] Boot with anon Vladislav Shpilevoy
                   ` (2 preceding siblings ...)
  2020-09-12 17:25 ` [Tarantool-patches] [PATCH 3/4] replication: retry in case of XlogGapError Vladislav Shpilevoy
@ 2020-09-12 17:25 ` Vladislav Shpilevoy
  2020-09-12 17:32 ` [Tarantool-patches] [PATCH 0/4] Boot with anon Vladislav Shpilevoy
  4 siblings, 0 replies; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-12 17:25 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko, gorcunov

The replication protocol's first stage for non-anonymous replicas
is that the replica should be registered in _cluster to get a
unique ID number.

That happens when the replica connects to a writable node, which
performs the registration. So registration always happens on the
master node when an *incoming* request arrives explicitly asking
for it. Only the relay can do that.

That wasn't the case for bootstrap. If box.cfg.replication wasn't
empty on the master node doing the cluster bootstrap, it
registered all the outgoing connections in _cluster. Note, the
target node could even be anonymous, but it was registered anyway.

That breaks the protocol and sometimes leads to registration of
anonymous replicas. The patch drops that behaviour.

After the registration was dropped, it turned out that an
anonymous replica could be stored on the master as non-anonymous,
with an unknown anon status. That was because box.cfg.replication
addresses were added to struct replicaset in order to create
their appliers, even before they managed to connect back.

As a result, when they did connect back, they didn't update the
anon flag, and even got a garbage collection consumer object,
preventing WAL file deletion on the master.
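
For reference, a minimal Lua sketch of the scenario (it mirrors the
new autobootstrap_anon.lua test added below; the socket paths are
placeholders):

    -- Master: lists the anonymous replica in box.cfg.replication
    -- during the cluster bootstrap.
    box.cfg{
        listen = 'master.sock',
        replication = {'master.sock', 'anon.sock'},
    }

    -- Anonymous replica: must never appear in _cluster.
    box.cfg{
        listen = 'anon.sock',
        replication = {'master.sock', 'anon.sock'},
        replication_anon = true,
        read_only = true,
    }

    -- On the master, after the fix:
    box.space._cluster:len()   -- 1, only the master itself
    box.info.gc().consumers    -- empty, no consumer for the anon replica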

Another motivation here is the specifics of Raft cluster
bootstrap. During Raft bootstrap it is very important that
non-joined replicas are not registered in _cluster. A replica can
register only after its JOIN request has been accepted and its
snapshot download has started.

Closes #5287
---
 src/box/applier.cc                          | 13 ++++
 src/box/box.cc                              | 39 +++++++----
 src/box/errcode.h                           |  1 +
 src/box/replication.cc                      | 12 ++++
 src/box/replication.h                       |  8 +++
 test/box/error.result                       |  1 +
 test/replication/autobootstrap_anon.lua     | 25 +++++++
 test/replication/autobootstrap_anon1.lua    |  1 +
 test/replication/autobootstrap_anon2.lua    |  1 +
 test/replication/gh-5287-boot-anon.result   | 77 +++++++++++++++++++++
 test/replication/gh-5287-boot-anon.test.lua | 29 ++++++++
 test/replication/prune.result               | 18 ++++-
 test/replication/prune.test.lua             |  7 +-
 13 files changed, 215 insertions(+), 17 deletions(-)
 create mode 100644 test/replication/autobootstrap_anon.lua
 create mode 120000 test/replication/autobootstrap_anon1.lua
 create mode 120000 test/replication/autobootstrap_anon2.lua
 create mode 100644 test/replication/gh-5287-boot-anon.result
 create mode 100644 test/replication/gh-5287-boot-anon.test.lua

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 96dd48c0d..e272a7af6 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -107,6 +107,7 @@ applier_log_error(struct applier *applier, struct error *e)
 	case ER_UNKNOWN_REPLICA:
 	case ER_PASSWORD_MISMATCH:
 	case ER_XLOG_GAP:
+	case ER_TOO_EARLY_SUBSCRIBE:
 		say_info("will retry every %.2lf second",
 			 replication_reconnect_interval());
 		break;
@@ -1306,6 +1307,18 @@ applier_f(va_list ap)
 				applier_log_error(applier, e);
 				applier_disconnect(applier, APPLIER_LOADING);
 				goto reconnect;
+			} else if (e->errcode() == ER_TOO_EARLY_SUBSCRIBE) {
+				/*
+				 * The instance is not anonymous, and is
+				 * registered, but its ID is not delivered to
+				 * all the nodes in the cluster yet, and some
+				 * nodes may ask to retry connection later,
+				 * until they receive _cluster record of this
+				 * instance. From some third node, for example.
+				 */
+				applier_log_error(applier, e);
+				applier_disconnect(applier, APPLIER_LOADING);
+				goto reconnect;
 			} else if (e->errcode() == ER_SYNC_QUORUM_TIMEOUT ||
 				   e->errcode() == ER_SYNC_ROLLBACK) {
 				/*
diff --git a/src/box/box.cc b/src/box/box.cc
index 3f2999fc0..fed2681f6 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1992,9 +1992,33 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
 	struct replica *replica = replica_by_uuid(&replica_uuid);
 
 	if (!anon && (replica == NULL || replica->id == REPLICA_ID_NIL)) {
-		tnt_raise(ClientError, ER_UNKNOWN_REPLICA,
-			  tt_uuid_str(&replica_uuid),
-			  tt_uuid_str(&REPLICASET_UUID));
+		/*
+		 * The instance is not anonymous, and is registered (at least it
+		 * claims so), but its ID is not delivered to the current
+		 * instance yet. Need to wait until its _cluster record arrives
+		 * from some third node. Likely to happen on bootstrap, when
+		 * there is a fullmesh and 1 leader doing all the _cluster
+		 * registrations. Not all of them are delivered to the other
+		 * nodes yet.
+		 * Also can happen when the replica is deleted from _cluster,
+		 * but still tries to subscribe. It won't have an ID here.
+		 */
+		tnt_raise(ClientError, ER_TOO_EARLY_SUBSCRIBE,
+			  tt_uuid_str(&replica_uuid));
+	}
+	if (anon && replica != NULL && replica->id != REPLICA_ID_NIL) {
+		tnt_raise(ClientError, ER_PROTOCOL, "Can't subscribe an "
+			  "anonymous replica having an ID assigned");
+	}
+	if (anon && replica != NULL &&
+	    replica->id_status == REPLICA_ID_STATUS_UNKNOWN) {
+		/*
+		 * A replica can already exist even before a first subscribe
+		 * request, if it is specified in box.cfg.replication of this
+		 * instance. In that case it was added with an unknown ID
+		 * status to wait for ID or anon flag. Turns out it is anon.
+		 */
+		replica_set_anon(replica);
 	}
 	if (replica == NULL)
 		replica = replicaset_add_anon(&replica_uuid);
@@ -2207,15 +2231,6 @@ bootstrap_master(const struct tt_uuid *replicaset_uuid)
 	box_register_replica(replica_id, &INSTANCE_UUID);
 	assert(replica_by_uuid(&INSTANCE_UUID)->id == 1);
 
-	/* Register other cluster members */
-	replicaset_foreach(replica) {
-		if (tt_uuid_is_equal(&replica->uuid, &INSTANCE_UUID))
-			continue;
-		assert(replica->applier != NULL);
-		box_register_replica(++replica_id, &replica->uuid);
-		assert(replica->id == replica_id);
-	}
-
 	/* Set UUID of a new replica set */
 	box_set_replicaset_uuid(replicaset_uuid);
 
diff --git a/src/box/errcode.h b/src/box/errcode.h
index 9a240a431..e6957d612 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -272,6 +272,7 @@ struct errcode_record {
         /*217 */_(ER_SYNC_ROLLBACK,             "A rollback for a synchronous transaction is received") \
 	/*218 */_(ER_TUPLE_METADATA_IS_TOO_BIG,	"Can't create tuple: metadata size %u is too big") \
 	/*219 */_(ER_XLOG_GAP,			"%s") \
+	/*220 */_(ER_TOO_EARLY_SUBSCRIBE,	"Can't subscribe non-anonymous replica %s until join is done") \
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 4cd08e6ff..3c95fe5df 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -254,6 +254,18 @@ replica_set_id(struct replica *replica, uint32_t replica_id)
 	replica->id_status = REPLICA_ID_STATUS_REGISTERED;
 }
 
+void
+replica_set_anon(struct replica *replica)
+{
+	assert(replica->id == REPLICA_ID_NIL);
+	assert(replica->id_status == REPLICA_ID_STATUS_UNKNOWN);
+	assert(replica->gc == NULL);
+	replica->id_status = REPLICA_ID_STATUS_ANON;
+	replicaset.anon_count++;
+	say_info("replica %s is defined as anonymous",
+		 tt_uuid_str(&replica->uuid));
+}
+
 void
 replica_clear_id(struct replica *replica)
 {
diff --git a/src/box/replication.h b/src/box/replication.h
index 3e34d4544..5fbe39135 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -376,6 +376,14 @@ replicaset_next(struct replica *replica);
 void
 replica_set_id(struct replica *replica, uint32_t id);
 
+/**
+ * Set the replica to be anonymous. That can only happen if the current ID
+ * status of the replica is unknown. A registered replica can't be downgraded to
+ * an anon replica so far.
+ */
+void
+replica_set_anon(struct replica *replica);
+
 /*
  * Clear the numeric replica-set-local id of a replica.
  *
diff --git a/test/box/error.result b/test/box/error.result
index 600517316..4d85b9e55 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -438,6 +438,7 @@ t;
  |   217: box.error.SYNC_ROLLBACK
  |   218: box.error.TUPLE_METADATA_IS_TOO_BIG
  |   219: box.error.XLOG_GAP
+ |   220: box.error.TOO_EARLY_SUBSCRIBE
  | ...
 
 test_run:cmd("setopt delimiter ''");
diff --git a/test/replication/autobootstrap_anon.lua b/test/replication/autobootstrap_anon.lua
new file mode 100644
index 000000000..2e96d9d1a
--- /dev/null
+++ b/test/replication/autobootstrap_anon.lua
@@ -0,0 +1,25 @@
+#!/usr/bin/env tarantool
+
+local INSTANCE_ID = string.match(arg[0], "%d")
+local SOCKET_DIR = require('fio').cwd()
+local ANON = arg[1] == 'true'
+
+local function instance_uri(instance_id)
+    return SOCKET_DIR..'/autobootstrap'..instance_id..'.sock';
+end
+
+require('console').listen(os.getenv('ADMIN'))
+
+box.cfg({
+    listen = instance_uri(INSTANCE_ID),
+    replication = {
+        instance_uri(1),
+        instance_uri(2),
+    };
+    replication_anon = ANON,
+    read_only = ANON,
+})
+
+box.once("bootstrap", function()
+    box.schema.user.grant('guest', 'super')
+end)
diff --git a/test/replication/autobootstrap_anon1.lua b/test/replication/autobootstrap_anon1.lua
new file mode 120000
index 000000000..27e0fec36
--- /dev/null
+++ b/test/replication/autobootstrap_anon1.lua
@@ -0,0 +1 @@
+autobootstrap_anon.lua
\ No newline at end of file
diff --git a/test/replication/autobootstrap_anon2.lua b/test/replication/autobootstrap_anon2.lua
new file mode 120000
index 000000000..27e0fec36
--- /dev/null
+++ b/test/replication/autobootstrap_anon2.lua
@@ -0,0 +1 @@
+autobootstrap_anon.lua
\ No newline at end of file
diff --git a/test/replication/gh-5287-boot-anon.result b/test/replication/gh-5287-boot-anon.result
new file mode 100644
index 000000000..4df1bce8c
--- /dev/null
+++ b/test/replication/gh-5287-boot-anon.result
@@ -0,0 +1,77 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+--
+-- gh-5287: when a cluster contained an anonymous replica during bootstrap, it
+-- could be registered anyway.
+--
+
+test_run:cmd("create server replica1 with script='replication/autobootstrap_anon1.lua'")
+ | ---
+ | - true
+ | ...
+test_run:cmd("start server replica1 with wait=False")
+ | ---
+ | - true
+ | ...
+
+test_run:cmd("create server replica2 with script='replication/autobootstrap_anon2.lua'")
+ | ---
+ | - true
+ | ...
+test_run:cmd("start server replica2 with args='true', wait=False")
+ | ---
+ | - true
+ | ...
+
+test_run:switch('replica2')
+ | ---
+ | - true
+ | ...
+-- Without box.info.replication test-run fails to wait a cond.
+test_run:wait_cond(function() return next(box.info.replication) ~= nil end)
+ | ---
+ | - true
+ | ...
+test_run:wait_upstream(1, {status = 'follow'})
+ | ---
+ | - true
+ | ...
+
+test_run:switch('replica1')
+ | ---
+ | - true
+ | ...
+-- The anonymous replica wasn't registered.
+box.space._cluster:len()
+ | ---
+ | - 1
+ | ...
+box.info.gc().consumers
+ | ---
+ | - []
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd("stop server replica1")
+ | ---
+ | - true
+ | ...
+test_run:cmd("delete server replica1")
+ | ---
+ | - true
+ | ...
+test_run:cmd("stop server replica2")
+ | ---
+ | - true
+ | ...
+test_run:cmd("delete server replica2")
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-5287-boot-anon.test.lua b/test/replication/gh-5287-boot-anon.test.lua
new file mode 100644
index 000000000..eb535a326
--- /dev/null
+++ b/test/replication/gh-5287-boot-anon.test.lua
@@ -0,0 +1,29 @@
+test_run = require('test_run').new()
+
+--
+-- gh-5287: when a cluster contained an anonymous replica during bootstrap, it
+-- could be registered anyway.
+--
+
+test_run:cmd("create server replica1 with script='replication/autobootstrap_anon1.lua'")
+test_run:cmd("start server replica1 with wait=False")
+
+test_run:cmd("create server replica2 with script='replication/autobootstrap_anon2.lua'")
+test_run:cmd("start server replica2 with args='true', wait=False")
+
+test_run:switch('replica2')
+-- Without box.info.replication test-run fails to wait a cond.
+test_run:wait_cond(function() return next(box.info.replication) ~= nil end)
+test_run:wait_upstream(1, {status = 'follow'})
+
+test_run:switch('replica1')
+-- The anonymous replica wasn't registered.
+box.space._cluster:len()
+box.info.gc().consumers
+
+test_run:switch('default')
+
+test_run:cmd("stop server replica1")
+test_run:cmd("delete server replica1")
+test_run:cmd("stop server replica2")
+test_run:cmd("delete server replica2")
diff --git a/test/replication/prune.result b/test/replication/prune.result
index 1a130df40..67fd62990 100644
--- a/test/replication/prune.result
+++ b/test/replication/prune.result
@@ -133,7 +133,19 @@ test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')
 - ['The local instance id 2 is read-only']
 ...
 -- restart replica and check that replica isn't able to join to cluster
-test_run:cmd('restart server replica1')
+test_run:cmd('stop server replica1')
+---
+- true
+...
+test_run:cmd('start server replica1 with args="true 0"')
+---
+- true
+...
+test_run:cmd('switch replica1')
+---
+- true
+...
+test_run:wait_upstream(1, {message_re = "Can't subscribe non%-anonymous replica"})
 ---
 - true
 ...
@@ -147,9 +159,9 @@ box.space._cluster:len() == 1
 ...
 test_run:cmd('eval replica1 "box.info.replication[1].upstream.status"')
 ---
-- ['stopped']
+- ['loading']
 ...
-test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')[1]:match("is not registered with replica set") ~= nil
+test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')[1]:match("Can't subscribe non%-anonymous replica") ~= nil
 ---
 - true
 ...
diff --git a/test/replication/prune.test.lua b/test/replication/prune.test.lua
index 80847325b..ea8b0b3c3 100644
--- a/test/replication/prune.test.lua
+++ b/test/replication/prune.test.lua
@@ -65,11 +65,14 @@ while test_run:cmd('eval replica1 "box.info.replication[1].upstream.status"')[1]
 test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')
 
 -- restart replica and check that replica isn't able to join to cluster
-test_run:cmd('restart server replica1')
+test_run:cmd('stop server replica1')
+test_run:cmd('start server replica1 with args="true 0"')
+test_run:cmd('switch replica1')
+test_run:wait_upstream(1, {message_re = "Can't subscribe non%-anonymous replica"})
 test_run:cmd('switch default')
 box.space._cluster:len() == 1
 test_run:cmd('eval replica1 "box.info.replication[1].upstream.status"')
-test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')[1]:match("is not registered with replica set") ~= nil
+test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')[1]:match("Can't subscribe non%-anonymous replica") ~= nil
 replica_set.delete(test_run, 2)
 
 box.space.test:drop()
-- 
2.21.1 (Apple Git-122.3)


* Re: [Tarantool-patches] [PATCH 0/4] Boot with anon
  2020-09-12 17:25 [Tarantool-patches] [PATCH 0/4] Boot with anon Vladislav Shpilevoy
                   ` (3 preceding siblings ...)
  2020-09-12 17:25 ` [Tarantool-patches] [PATCH 4/4] replication: do not register outgoing connections Vladislav Shpilevoy
@ 2020-09-12 17:32 ` Vladislav Shpilevoy
  2020-09-13 16:03   ` Vladislav Shpilevoy
  4 siblings, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-12 17:32 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko, gorcunov

CI still fails on some tests. I will look into them,
but the patchset is unlikely to change significantly.


* Re: [Tarantool-patches] [PATCH 0/4] Boot with anon
  2020-09-12 17:32 ` [Tarantool-patches] [PATCH 0/4] Boot with anon Vladislav Shpilevoy
@ 2020-09-13 16:03   ` Vladislav Shpilevoy
  0 siblings, 0 replies; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-13 16:03 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko, gorcunov

All is fine. The failing tests are not related to this patchset.

Some of them fail on master too, I don't know why. Some fail
because of this: https://github.com/tarantool/tarantool/issues/5298


* Re: [Tarantool-patches] [PATCH 1/4] replication: replace anon flag with enum
  2020-09-12 17:25 ` [Tarantool-patches] [PATCH 1/4] replication: replace anon flag with enum Vladislav Shpilevoy
@ 2020-09-14 10:09   ` Cyrill Gorcunov
  0 siblings, 0 replies; 10+ messages in thread
From: Cyrill Gorcunov @ 2020-09-14 10:09 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On Sat, Sep 12, 2020 at 07:25:53PM +0200, Vladislav Shpilevoy wrote:
> Struct replica had a flag saying whether the replica is anonymous.
> But a flag is not enough. When a remote instance hasn't told
> whether it is anonymous, the flag can be neither true nor false -
> it needs a third state until the replica sends its identity flag.
>

This change is worth having regardless of anything else.
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>


* Re: [Tarantool-patches] [PATCH 2/4] xlog: introduce an error code for XlogGapError
  2020-09-12 17:25 ` [Tarantool-patches] [PATCH 2/4] xlog: introduce an error code for XlogGapError Vladislav Shpilevoy
@ 2020-09-14 10:18   ` Cyrill Gorcunov
  0 siblings, 0 replies; 10+ messages in thread
From: Cyrill Gorcunov @ 2020-09-14 10:18 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On Sat, Sep 12, 2020 at 07:25:54PM +0200, Vladislav Shpilevoy wrote:
> The XlogGapError object didn't have a code in the ClientError code
> space. Because of that it was not possible to handle the gap error
> together with client errors in a switch-case statement.
> 
> Now the gap error has a code.
> 
> This is going to be used in the applier code to handle
> XlogGapError among other errors using its code instead of RTTI.
> 
> Needed for #5287

And this one too
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>


* Re: [Tarantool-patches] [PATCH 3/4] replication: retry in case of XlogGapError
  2020-09-12 17:25 ` [Tarantool-patches] [PATCH 3/4] replication: retry in case of XlogGapError Vladislav Shpilevoy
@ 2020-09-14 12:27   ` Cyrill Gorcunov
  0 siblings, 0 replies; 10+ messages in thread
From: Cyrill Gorcunov @ 2020-09-14 12:27 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On Sat, Sep 12, 2020 at 07:25:55PM +0200, Vladislav Shpilevoy wrote:
> Previously XlogGapError was considered a critical error that stops
> the replication. That is not as good an idea as it may look.
> 
> XlogGapError is a perfectly fine error, which should not kill the
> replication connection. It should be retried instead.
> 
> Here is an example where the gap can be recovered on its own.
> Consider the case: node1 is a leader, it is booted with
> vclock {1: 3}. Node2 connects and fetches snapshot of node1, it
> also gets vclock {1: 3}. Then node1 writes something and its
> vclock becomes {1: 4}. Now node3 boots from node1, and gets the
> same vclock. Vclocks now look like this:
> 
>   - node1: {1: 4}, leader, has {1: 3} snap.
>   - node2: {1: 3}, booted from node1, has only snap.
>   - node3: {1: 4}, booted from node1, has only snap.
> 
> If the cluster is a fullmesh, node2 will send subscribe requests
> with vclock {1: 3}. If node3 receives it, it will respond with
> xlog gap error, because it only has a snap with {1: 4}, nothing
> else. In that case node2 should retry connecting to node3, and in
> the meantime try to get newer changes from node1.
> 
> The example is totally valid. However it is unreachable now
> because the master registers all replicas in _cluster before
> allowing them to join. So they all bootstrap from a snapshot
> containing all their IDs. This is a bug, because such
> auto-registration leads to registration of anonymous replicas, if
> they are present during bootstrap. Also it blocks Raft, which
> can't work if there are registered, but not yet joined nodes.
> 
> Once the registration problem is solved in the next commit,
> XlogGapError will start striking quite often during bootstrap.
> This patch prevents that from happening.
> 
> Needed for #5287

While this indeed looks like a sane thing to do (i.e. retry the
connection in a cycle) when such an error happens, it breaks backward
compatibility because the state of the node changes.

For me the new behaviour is more sane and valid, but I'm a bit nervous
that it could cause problems for third-party tools which might rely on
the former behaviour...

