Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas
@ 2019-12-25 12:46 sergepetrenko
  2019-12-25 12:46 ` [Tarantool-patches] [PATCH v2 1/5] box: update comment describing join protocol sergepetrenko
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: sergepetrenko @ 2019-12-25 12:46 UTC (permalink / raw)
  To: v.shpilevoy, georgy; +Cc: tarantool-patches

https://github.com/tarantool/tarantool/tree/sp/gh-3186-anon-replica
https://github.com/tarantool/tarantool/issues/3186

The first patch alters the comment regarding join protocol
to be in sync with latest changes: join isn't done off a snapshot,
but from a read-view rather, created at the moment the request is received.

The second patch removes unnecessary decoding of replicaset uuid on
master side, since master just ignores it now.

The third patch is a preparation for anonymous replica. We need to split join
protocol into two stages: first, fetch a snapshot. This is done by both anonymous
and regular replicas. The second stage is registration, i. e. the addition of
replica to _cluster table. It is done right away when joining a regular replica,
or it can be done by a request from an anonymous replica to transition it to
normal one.
The patch itself just splits applier join into appropriate stages.

The fourth patch modifies vclock logic to ignore 0 component in comparisons.
This is also used by anonymous replicas, which will be the first users of the 
0-th vclock component. This component will be used to keep track of anonymous
replica local changes, which are not replicated. These local changes will vary
from instance to instance, and the anonymous instances will all use 0th component
to track them, so no need to compare this component.

The fifth patch does the main job of adding anonymous replicas.
New requests, FETCH_SNAPSHOT and REGISTER are introduced. Anonymous replicas
are allowed to have a 0 id. They aren't registered as gc consumers and aren't
present in _cluster table.

Changes in v2: 
  - review fixes as per reviews from
    Georgy Kirichenko and Vlad Shpilevoy.

Serge Petrenko (5):
  box: update comment describing join protocol
  replication: do not decode replicaset uuid when processing a subscribe
  applier: split join processing into two stages
  vclock: ignore 0th component in comparisons.
  replication: introduce anonymous replica.

 src/box/applier.cc              | 124 ++++++++++--
 src/box/applier.h               |   4 +
 src/box/box.cc                  | 276 +++++++++++++++++++++++++--
 src/box/box.h                   |  11 +-
 src/box/errcode.h               |   1 +
 src/box/iproto.cc               |  16 +-
 src/box/iproto_constants.h      |   6 +
 src/box/lua/cfg.cc              |  14 +-
 src/box/lua/info.c              |   4 +-
 src/box/lua/load_cfg.lua        |   4 +
 src/box/recovery.cc             |   7 +-
 src/box/relay.cc                |  28 ++-
 src/box/replication.cc          |  41 +++-
 src/box/replication.h           |  24 +++
 src/box/vclock.h                |  12 +-
 src/box/wal.c                   |   4 +
 src/box/xrow.c                  |  49 ++++-
 src/box/xrow.h                  |  68 ++++++-
 test/app-tap/init_script.result |  49 ++---
 test/box/admin.result           |   2 +
 test/box/cfg.result             |   4 +
 test/box/misc.result            |   1 +
 test/replication/anon.lua       |  13 ++
 test/replication/anon.result    | 326 ++++++++++++++++++++++++++++++++
 test/replication/anon.test.lua  | 118 ++++++++++++
 test/replication/anon1.lua      |   1 +
 test/replication/anon2.lua      |   1 +
 test/replication/suite.cfg      |   1 +
 test/unit/vclock.cc             |   8 +-
 29 files changed, 1123 insertions(+), 94 deletions(-)
 create mode 100644 test/replication/anon.lua
 create mode 100644 test/replication/anon.result
 create mode 100644 test/replication/anon.test.lua
 create mode 120000 test/replication/anon1.lua
 create mode 120000 test/replication/anon2.lua

-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Tarantool-patches] [PATCH v2 1/5] box: update comment describing join protocol
  2019-12-25 12:46 [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas sergepetrenko
@ 2019-12-25 12:46 ` sergepetrenko
  2019-12-25 12:46 ` [Tarantool-patches] [PATCH v2 2/5] replication: do not decode replicaset uuid when processing a subscribe sergepetrenko
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: sergepetrenko @ 2019-12-25 12:46 UTC (permalink / raw)
  To: v.shpilevoy, georgy; +Cc: tarantool-patches

From: Serge Petrenko <sergepetrenko@tarantool.org>

The comment states that relay sends the latest snapshot to replica
during initial join, however, this was changed in commit
6332aca655ae7f95d391bdc0109e79915f6e6ad0 (relay: join new replicas off
read view).
Now relay sends rows from the read view created at the moment of join.
Update the comment to match.

Follow-up #1271
---
 src/box/box.cc | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index b119c927b..22ac8bda1 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1391,13 +1391,14 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
 	 * => JOIN { INSTANCE_UUID: replica_uuid }
 	 * <= OK { VCLOCK: start_vclock }
 	 *    Replica has enough permissions and master is ready for JOIN.
-	 *     - start_vclock - vclock of the latest master's checkpoint.
+	 *     - start_vclock - master's vclock at the time of join.
 	 *
 	 * <= INSERT
 	 *    ...
 	 *    Initial data: a stream of engine-specifc rows, e.g. snapshot
-	 *    rows for memtx or dirty cursor data for Vinyl. Engine can
-	 *    use REPLICA_ID, LSN and other fields for internal purposes.
+	 *    rows for memtx or dirty cursor data for Vinyl fed from a
+	 *    read-view. Engine can use REPLICA_ID, LSN and other fields
+	 *    for internal purposes.
 	 *    ...
 	 * <= INSERT
 	 * <= OK { VCLOCK: stop_vclock } - end of initial JOIN stage.
-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Tarantool-patches] [PATCH v2 2/5] replication: do not decode replicaset uuid when processing a subscribe
  2019-12-25 12:46 [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas sergepetrenko
  2019-12-25 12:46 ` [Tarantool-patches] [PATCH v2 1/5] box: update comment describing join protocol sergepetrenko
@ 2019-12-25 12:46 ` sergepetrenko
  2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 3/5] applier: split join processing into two stages sergepetrenko
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: sergepetrenko @ 2019-12-25 12:46 UTC (permalink / raw)
  To: v.shpilevoy, georgy; +Cc: tarantool-patches

From: Serge Petrenko <sergepetrenko@tarantool.org>

After moving cluster id check to replica (7f8cbde3555084ad6c41f137aec4faba4648c705)
we do not check it on master side, so no need to decode it.

Prerequisite #3186
---
 src/box/box.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 22ac8bda1..56d07e634 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1533,11 +1533,11 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
 	if (!is_box_configured)
 		tnt_raise(ClientError, ER_LOADING);
 
-	struct tt_uuid replicaset_uuid = uuid_nil, replica_uuid = uuid_nil;
+	struct tt_uuid replica_uuid = uuid_nil;
 	struct vclock replica_clock;
 	uint32_t replica_version_id;
 	vclock_create(&replica_clock);
-	xrow_decode_subscribe_xc(header, &replicaset_uuid, &replica_uuid,
+	xrow_decode_subscribe_xc(header, NULL, &replica_uuid,
 				 &replica_clock, &replica_version_id);
 
 	/* Forbid connection to itself */
-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Tarantool-patches] [PATCH v2 3/5] applier: split join processing into two stages
  2019-12-25 12:46 [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas sergepetrenko
  2019-12-25 12:46 ` [Tarantool-patches] [PATCH v2 1/5] box: update comment describing join protocol sergepetrenko
  2019-12-25 12:46 ` [Tarantool-patches] [PATCH v2 2/5] replication: do not decode replicaset uuid when processing a subscribe sergepetrenko
@ 2019-12-25 12:47 ` sergepetrenko
  2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons sergepetrenko
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: sergepetrenko @ 2019-12-25 12:47 UTC (permalink / raw)
  To: v.shpilevoy, georgy; +Cc: tarantool-patches

From: Serge Petrenko <sergepetrenko@tarantool.org>

We already have 'initial join' and 'final join' stages in applier logic.
The first actually means fetching master's snapshot, and the second one
-- receiving the rows which should contain replica's registration in
_cluster.
These stages will be used separately once anonymous replica is
implemented, so split them as a preparation.

Prerequisite #3186
---
 src/box/applier.cc | 65 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 50 insertions(+), 15 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 42374f886..f4f9d0670 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -202,7 +202,7 @@ applier_writer_f(va_list ap)
 }
 
 static int
-apply_initial_join_row(struct xrow_header *row)
+apply_snapshot_row(struct xrow_header *row)
 {
 	int rc;
 	struct request request;
@@ -388,18 +388,12 @@ done:
 	applier_set_state(applier, APPLIER_READY);
 }
 
-/**
- * Execute and process JOIN request (bootstrap the instance).
- */
-static void
-applier_join(struct applier *applier)
+static uint64_t
+applier_wait_snapshot(struct applier *applier)
 {
-	/* Send JOIN request */
 	struct ev_io *coio = &applier->io;
 	struct ibuf *ibuf = &applier->ibuf;
 	struct xrow_header row;
-	xrow_encode_join_xc(&row, &INSTANCE_UUID);
-	coio_write_xrow(coio, &row);
 
 	/**
 	 * Tarantool < 1.7.0: if JOIN is successful, there is no "OK"
@@ -423,8 +417,6 @@ applier_join(struct applier *applier)
 		xrow_decode_vclock_xc(&row, &replicaset.vclock);
 	}
 
-	applier_set_state(applier, APPLIER_INITIAL_JOIN);
-
 	/*
 	 * Receive initial data.
 	 */
@@ -433,7 +425,7 @@ applier_join(struct applier *applier)
 		coio_read_xrow(coio, ibuf, &row);
 		applier->last_row_time = ev_monotonic_now(loop());
 		if (iproto_type_is_dml(row.type)) {
-			if (apply_initial_join_row(&row) != 0)
+			if (apply_snapshot_row(&row) != 0)
 				diag_raise();
 			if (++row_count % 100000 == 0)
 				say_info("%.1fM rows received", row_count / 1e6);
@@ -456,9 +448,16 @@ applier_join(struct applier *applier)
 				  (uint32_t) row.type);
 		}
 	}
-	say_info("initial data received");
 
-	applier_set_state(applier, APPLIER_FINAL_JOIN);
+	return row_count;
+}
+
+static uint64_t
+applier_wait_register(struct applier *applier, uint64_t row_count)
+{
+	struct ev_io *coio = &applier->io;
+	struct ibuf *ibuf = &applier->ibuf;
+	struct xrow_header row;
 
 	/*
 	 * Tarantool < 1.7.0: there is no "final join" stage.
@@ -466,7 +465,7 @@ applier_join(struct applier *applier)
 	 * until replica id is received.
 	 */
 	if (applier->version_id < version_id(1, 7, 0))
-		return;
+		return row_count;
 
 	/*
 	 * Receive final data.
@@ -485,6 +484,7 @@ applier_join(struct applier *applier)
 			 * Current vclock. This is not used now,
 			 * ignore.
 			 */
+			++row_count;
 			break; /* end of stream */
 		} else if (iproto_type_is_error(row.type)) {
 			xrow_decode_error_xc(&row);  /* rethrow error */
@@ -493,6 +493,41 @@ applier_join(struct applier *applier)
 				  (uint32_t) row.type);
 		}
 	}
+
+	return row_count;
+}
+
+/**
+ * Execute and process JOIN request (bootstrap the instance).
+ */
+static void
+applier_join(struct applier *applier)
+{
+	/* Send JOIN request */
+	struct ev_io *coio = &applier->io;
+	struct xrow_header row;
+	uint64_t row_count;
+
+	xrow_encode_join_xc(&row, &INSTANCE_UUID);
+	coio_write_xrow(coio, &row);
+
+	applier_set_state(applier, APPLIER_INITIAL_JOIN);
+
+	row_count = applier_wait_snapshot(applier);
+
+	say_info("initial data received");
+
+	applier_set_state(applier, APPLIER_FINAL_JOIN);
+
+	if (applier_wait_register(applier, row_count) == row_count) {
+		/*
+		 * We didn't receive any rows during registration.
+		 * Proceed to "subscribe" and do not finish bootstrap
+		 * until replica id is received.
+		 */
+		return;
+	}
+
 	say_info("final data received");
 
 	applier_set_state(applier, APPLIER_JOINED);
-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons.
  2019-12-25 12:46 [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas sergepetrenko
                   ` (2 preceding siblings ...)
  2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 3/5] applier: split join processing into two stages sergepetrenko
@ 2019-12-25 12:47 ` sergepetrenko
  2019-12-25 16:00   ` Vladislav Shpilevoy
  2019-12-27 18:42   ` Vladislav Shpilevoy
  2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica sergepetrenko
  2019-12-30  5:12 ` [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas Kirill Yukhin
  5 siblings, 2 replies; 15+ messages in thread
From: sergepetrenko @ 2019-12-25 12:47 UTC (permalink / raw)
  To: v.shpilevoy, georgy; +Cc: tarantool-patches

From: Serge Petrenko <sergepetrenko@tarantool.org>

0th vclock component will be used to count replica-local rows of an
anonymous replica. These rows won't be replicated and different
instances will have different values in vclock[0]. So ignore 0th
component in comparisons.

Part of #3186
---
 src/box/vclock.h    | 12 ++++++++++--
 test/unit/vclock.cc |  8 ++++----
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/src/box/vclock.h b/src/box/vclock.h
index b5eddcf8b..35ba6284c 100644
--- a/src/box/vclock.h
+++ b/src/box/vclock.h
@@ -279,8 +279,16 @@ vclock_compare(const struct vclock *a, const struct vclock *b)
 	struct bit_iterator it;
 	bit_iterator_init(&it, &map, sizeof(map), true);
 
-	for (size_t replica_id = bit_iterator_next(&it); replica_id < VCLOCK_MAX;
-	     replica_id = bit_iterator_next(&it)) {
+	size_t replica_id = bit_iterator_next(&it);
+	/*
+	 * Ignore 0-th component in comparisons.
+	 * It is empty for normal replicas and should
+	 * be ignored for anonymous ones.
+	 */
+	if (replica_id == 0)
+		replica_id = bit_iterator_next(&it);
+
+	for (; replica_id < VCLOCK_MAX; replica_id = bit_iterator_next(&it)) {
 
 		int64_t lsn_a = vclock_get(a, replica_id);
 		int64_t lsn_b = vclock_get(b, replica_id);
diff --git a/test/unit/vclock.cc b/test/unit/vclock.cc
index 15e9f9379..d9f676a31 100644
--- a/test/unit/vclock.cc
+++ b/test/unit/vclock.cc
@@ -50,11 +50,11 @@ test_compare_one(uint32_t a_count, const int64_t *lsns_a,
 	vclock_create(&b);
 	for (uint32_t node_id = 0; node_id < a_count; node_id++) {
 		if (lsns_a[node_id] > 0)
-			vclock_follow(&a, node_id, lsns_a[node_id]);
+			vclock_follow(&a, node_id + 1, lsns_a[node_id]);
 	}
 	for (uint32_t node_id = 0; node_id < b_count; node_id++) {
 		if (lsns_b[node_id] > 0)
-			vclock_follow(&b, node_id, lsns_b[node_id]);
+			vclock_follow(&b, node_id + 1, lsns_b[node_id]);
 	}
 
 	return vclock_compare(&a, &b);
@@ -119,7 +119,7 @@ testset_create(vclockset_t *set, int64_t *files, int files_n, int node_n)
 			signature += lsn;
 
 			/* Update cluster hash */
-			vclock_follow(vclock, node_id, lsn);
+			vclock_follow(vclock, node_id + 1, lsn);
 		}
 		vclockset_insert(set, vclock);
 	}
@@ -225,7 +225,7 @@ test_isearch()
 			if (lsn <= 0)
 				continue;
 
-			vclock_follow(&vclock, node_id, lsn);
+			vclock_follow(&vclock, node_id + 1, lsn);
 		}
 
 		int64_t check = *(query + NODE_N);
-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica.
  2019-12-25 12:46 [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas sergepetrenko
                   ` (3 preceding siblings ...)
  2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons sergepetrenko
@ 2019-12-25 12:47 ` sergepetrenko
  2019-12-25 18:22   ` Vladislav Shpilevoy
  2019-12-30  5:12 ` [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas Kirill Yukhin
  5 siblings, 1 reply; 15+ messages in thread
From: sergepetrenko @ 2019-12-25 12:47 UTC (permalink / raw)
  To: v.shpilevoy, georgy; +Cc: tarantool-patches

From: Serge Petrenko <sergepetrenko@tarantool.org>

This commit introduces anonymous replicas. Such replicas do not pollute
_cluster table (they can only be read-only and have a zero id in return).
An anonymous replica can be promoted to a normal one if needed.

Closes #3186

@TarantoolBot document
Title: Document anonymous replica

There is a new type of replica in tarantool, anonymous one. Anonymous
replica is read-only (but you still can write to temporary and
replica-local spaces), and it isn't present in _cluster table.

Since anonymous replica isn't registered in _cluster table, there is no
limitation for anonymous replica count in a replicaset. You can have as
many of them as you want.

In order to make a replica anonymous, you have to pass an option
`replication_anon=true` to `box.cfg`. You also have to set 'read_only'
to true.

Let's go through anonymous replica bootstrap.
Suppose we have a master configured with
```
box.cfg{listen=3301}
```
And created a local space called "loc"
```
box.schema.space.create('loc', {is_local=true})
box.space.loc:create_index("pk")
```
Now, to configure an anonymous replica, we have to issue `box.cfg`,
as usual.
```
box.cfg{replication_anon=true, read_only=true, replication=3301}
```
As mentioned above, `replication_anon` may be set to true only together
with `read_only`
The instance will fetch masters snapshot and proceed to following its
changes. It will not receive an id so its id will remain zero.
```
tarantool> box.info.id
---
- 0
...
```
```
tarantool> box.info.replication
---
- 1:
    id: 1
    uuid: 3c84f8d9-e34d-4651-969c-3d0ed214c60f
    lsn: 4
    upstream:
      status: follow
      idle: 0.6912029999985
      peer:
      lag: 0.00014615058898926
...
```
Now we can use the replica.
For example, we may do inserts into the local space:
```
tarantool> for i = 1,10 do
         > box.space.loc:insert{i}
         > end
---
...
```
Note, that while the instance is anonymous, it will increase the 0-th
component of its vclock:
```
tarantool> box.info.vclock
---
- {0: 10, 1: 4}
...
```
Let's now promote the replica to a normal one:
```
tarantool> box.cfg{replication_anon=false}
2019-12-13 20:34:37.423 [71329] main I> assigned id 2 to replica 6a9c2ed2-b9e1-4c57-a0e8-51a46def7661
2019-12-13 20:34:37.424 [71329] main/102/interactive I> set 'replication_anon' configuration option to false
---
...

tarantool> 2019-12-13 20:34:37.424 [71329] main/117/applier/ I> subscribed
2019-12-13 20:34:37.424 [71329] main/117/applier/ I> remote vclock {1: 5} local vclock {0: 10, 1: 5}
2019-12-13 20:34:37.425 [71329] main/118/applierw/ C> leaving orphan mode
```
The replica just received id 2. We can make it read-write now.
```
box.cfg{read_only=false}
2019-12-13 20:35:46.392 [71329] main/102/interactive I> set 'read_only' configuration option to false
---
...

tarantool> box.schema.space.create('test')
---
- engine: memtx
  before_replace: 'function: 0x01109f9dc8'
  on_replace: 'function: 0x01109f9d90'
  ck_constraint: []
  field_count: 0
  temporary: false
  index: []
  is_local: false
  enabled: false
  name: test
  id: 513
- created
...

tarantool> box.info.vclock
---
- {0: 10, 1: 5, 2: 2}
...
```
Now replica tracks its changes in 2nd vclock component, as expected.
It can also become replication master from now on.

Side notes:
  * You cannot replicate from an anonymous instance.
  * To promote an anonymous instance to a regular one,
    you first have to start it as anonymous, and only
    then issue `box.cfg{replication_anon=false}`
  * In order for the deanonymization to succeed, the
    instance must replicate from some read-write instance,
    otherwise noone will be able to add it to _cluster table.
---
 src/box/applier.cc              |  59 +++++-
 src/box/applier.h               |   4 +
 src/box/box.cc                  | 265 ++++++++++++++++++++++++--
 src/box/box.h                   |  11 +-
 src/box/errcode.h               |   1 +
 src/box/iproto.cc               |  16 +-
 src/box/iproto_constants.h      |   6 +
 src/box/lua/cfg.cc              |  14 +-
 src/box/lua/info.c              |   4 +-
 src/box/lua/load_cfg.lua        |   4 +
 src/box/recovery.cc             |   7 +-
 src/box/relay.cc                |  28 ++-
 src/box/replication.cc          |  41 +++-
 src/box/replication.h           |  24 +++
 src/box/wal.c                   |   4 +
 src/box/xrow.c                  |  49 ++++-
 src/box/xrow.h                  |  68 ++++++-
 test/app-tap/init_script.result |  49 ++---
 test/box/admin.result           |   2 +
 test/box/cfg.result             |   4 +
 test/box/misc.result            |   1 +
 test/replication/anon.lua       |  13 ++
 test/replication/anon.result    | 326 ++++++++++++++++++++++++++++++++
 test/replication/anon.test.lua  | 118 ++++++++++++
 test/replication/anon1.lua      |   1 +
 test/replication/anon2.lua      |   1 +
 test/replication/suite.cfg      |   1 +
 27 files changed, 1053 insertions(+), 68 deletions(-)
 create mode 100644 test/replication/anon.lua
 create mode 100644 test/replication/anon.result
 create mode 100644 test/replication/anon.test.lua
 create mode 120000 test/replication/anon1.lua
 create mode 120000 test/replication/anon2.lua

diff --git a/src/box/applier.cc b/src/box/applier.cc
index f4f9d0670..2939a0f61 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -400,7 +400,7 @@ applier_wait_snapshot(struct applier *applier)
 	 * response, but a stream of rows from checkpoint.
 	 */
 	if (applier->version_id >= version_id(1, 7, 0)) {
-		/* Decode JOIN response */
+		/* Decode JOIN/FETCH_SNAPSHOT response */
 		coio_read_xrow(coio, ibuf, &row);
 		if (iproto_type_is_error(row.type)) {
 			xrow_decode_error_xc(&row); /* re-throw error */
@@ -452,6 +452,23 @@ applier_wait_snapshot(struct applier *applier)
 	return row_count;
 }
 
+static void
+applier_fetch_snapshot(struct applier *applier)
+{
+	/* Send FETCH SNAPSHOT request */
+	struct ev_io *coio = &applier->io;
+	struct xrow_header row;
+
+	memset(&row, 0, sizeof(row));
+	row.type = IPROTO_FETCH_SNAPSHOT;
+	coio_write_xrow(coio, &row);
+
+	applier_set_state(applier, APPLIER_FETCH_SNAPSHOT);
+	applier_wait_snapshot(applier);
+	applier_set_state(applier, APPLIER_FETCHED_SNAPSHOT);
+	applier_set_state(applier, APPLIER_READY);
+}
+
 static uint64_t
 applier_wait_register(struct applier *applier, uint64_t row_count)
 {
@@ -497,6 +514,28 @@ applier_wait_register(struct applier *applier, uint64_t row_count)
 	return row_count;
 }
 
+static void
+applier_register(struct applier *applier)
+{
+	/* Send REGISTER request */
+	struct ev_io *coio = &applier->io;
+	struct xrow_header row;
+
+	memset(&row, 0, sizeof(row));
+	/*
+	 * Send this instance's current vclock together
+	 * with REGISTER request.
+	 */
+	xrow_encode_register(&row, &INSTANCE_UUID, box_vclock);
+	row.type = IPROTO_REGISTER;
+	coio_write_xrow(coio, &row);
+
+	applier_set_state(applier, APPLIER_REGISTER);
+	applier_wait_register(applier, 0);
+	applier_set_state(applier, APPLIER_REGISTERED);
+	applier_set_state(applier, APPLIER_READY);
+}
+
 /**
  * Execute and process JOIN request (bootstrap the instance).
  */
@@ -828,7 +867,7 @@ applier_subscribe(struct applier *applier)
 	vclock_create(&vclock);
 	vclock_copy(&vclock, &replicaset.vclock);
 	xrow_encode_subscribe_xc(&row, &REPLICASET_UUID, &INSTANCE_UUID,
-				 &vclock);
+				 &vclock, replication_anon);
 	coio_write_xrow(coio, &row);
 
 	/* Read SUBSCRIBE response */
@@ -996,10 +1035,24 @@ applier_f(va_list ap)
 			if (tt_uuid_is_nil(&REPLICASET_UUID)) {
 				/*
 				 * Execute JOIN if this is a bootstrap.
+				 * In case of anonymous replication, don't
+				 * join but just fetch master's snapshot.
+				 *
 				 * The join will pause the applier
 				 * until WAL is created.
 				 */
-				applier_join(applier);
+				if (replication_anon)
+					applier_fetch_snapshot(applier);
+				else
+					applier_join(applier);
+			}
+			if (applier->version_id >= version_id(1, 7, 0) &&
+			    !replication_anon && instance_id == REPLICA_ID_NIL) {
+				/* anonymity was turned off while we were
+				 * fetching a snapshot or following master.
+				 * Register the replica now.
+				 */
+				applier_register(applier);
 			}
 			applier_subscribe(applier);
 			/*
diff --git a/src/box/applier.h b/src/box/applier.h
index b406e6aaf..c9fdc2955 100644
--- a/src/box/applier.h
+++ b/src/box/applier.h
@@ -61,6 +61,10 @@ enum { APPLIER_SOURCE_MAXLEN = 1024 }; /* enough to fit URI with passwords */
 	_(APPLIER_STOPPED, 10)                                       \
 	_(APPLIER_DISCONNECTED, 11)                                  \
 	_(APPLIER_LOADING, 12)                                       \
+	_(APPLIER_FETCH_SNAPSHOT, 13)                                \
+	_(APPLIER_FETCHED_SNAPSHOT, 14)                              \
+	_(APPLIER_REGISTER, 15)                                      \
+	_(APPLIER_REGISTERED, 16)                                    \
 
 /** States for the applier */
 ENUM(applier_state, applier_STATE);
diff --git a/src/box/box.cc b/src/box/box.cc
index 56d07e634..a93e2503e 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -223,9 +223,13 @@ error:
 	return -1;
 }
 
+static bool
+box_check_ro(void);
+
 void
-box_set_ro(bool ro)
+box_set_ro()
 {
+	bool ro = box_check_ro();
 	if (ro == is_ro)
 		return; /* nothing to do */
 	if (ro)
@@ -486,6 +490,32 @@ box_check_uuid(struct tt_uuid *uuid, const char *name)
 	}
 }
 
+static bool
+box_check_ro()
+{
+	bool ro = cfg_geti("read_only") != 0;
+	bool anon = cfg_geti("replication_anon") != 0;
+	if (anon && !ro) {
+		tnt_raise(ClientError, ER_CFG, "read_only",
+			  "the value may be set to false only when "
+			  "replication_anon is false");
+	}
+	return ro;
+}
+
+static bool
+box_check_replication_anon(void)
+{
+	bool anon = cfg_geti("replication_anon") != 0;
+	bool ro = cfg_geti("read_only") != 0;
+	if (anon && !ro) {
+		tnt_raise(ClientError, ER_CFG, "replication_anon",
+			  "the value may be set to true only when "
+			  "the instance is read-only");
+	}
+	return anon;
+}
+
 static void
 box_check_instance_uuid(struct tt_uuid *uuid)
 {
@@ -740,6 +770,64 @@ box_set_replication_skip_conflict(void)
 	replication_skip_conflict = cfg_geti("replication_skip_conflict");
 }
 
+void
+box_set_replication_anon(void)
+{
+	bool anon = box_check_replication_anon();
+	if (anon == replication_anon)
+		return;
+
+	if (!anon) {
+		/* Turn anonymous instance into a normal one. */
+		replication_anon = anon;
+		/*
+		 * Reset all appliers. This will interrupt
+		 * anonymous follow they're in so that one of
+		 * them can register and others resend a
+		 * non-anonymous subscribe.
+		 */
+		replicaset_foreach(replica) {
+			struct applier *applier = replica->applier;
+			if (applier == NULL)
+				continue;
+			replica_clear_applier(replica);
+			applier_stop(applier);
+			replica->applier_sync_state = APPLIER_DISCONNECTED;
+			replica_set_applier(replica, applier);
+		}
+		/* Choose a master to send register request to. */
+		struct replica *master = replicaset_leader();
+		assert(master != NULL && master->applier != NULL);
+		struct applier *master_applier = master->applier;
+
+		applier_start(master_applier);
+
+		applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY);
+		applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY);
+		applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY);
+		applier_resume(master_applier);
+		/**
+		 * Restart other appliers to
+		 * resend non-anonymous subscribe.
+		 */
+		replicaset_foreach(replica) {
+			if (replica != master && replica->applier)
+				applier_start(replica->applier);
+		}
+	} else if (!is_box_configured) {
+		replication_anon = anon;
+	} else {
+		/*
+		 * It is forbidden to turn a normal replica into
+		 * an anonymous one.
+		 */
+		tnt_raise(ClientError, ER_CFG, "replication_anon",
+			  "cannot be turned on after bootstrap"
+			  " has finished");
+	}
+
+}
+
 void
 box_listen(void)
 {
@@ -1379,6 +1467,132 @@ box_process_auth(struct auth_request *request, const char *salt)
 	authenticate(user, len, salt, request->scramble);
 }
 
+void
+box_process_fetch_snapshot(struct ev_io *io, struct xrow_header *header)
+{
+	assert(header->type == IPROTO_FETCH_SNAPSHOT);
+
+	/* Check that bootstrap has been finished */
+	if (!is_box_configured)
+		tnt_raise(ClientError, ER_LOADING);
+
+	/* Check permissions */
+	access_check_universe_xc(PRIV_R);
+
+	/* Forbid replication with disabled WAL */
+	if (wal_mode() == WAL_NONE) {
+		tnt_raise(ClientError, ER_UNSUPPORTED, "Replication",
+			  "wal_mode = 'none'");
+	}
+
+	say_info("sending current read-view to replica at %s", sio_socketname(io->fd));
+
+	/* Send the snapshot data to the instance. */
+	struct vclock start_vclock;
+	relay_initial_join(io->fd, header->sync, &start_vclock);
+	say_info("read-view sent.");
+
+	/* Remember master's vclock after the last request */
+	struct vclock stop_vclock;
+	vclock_copy(&stop_vclock, &replicaset.vclock);
+
+	/* Send end of snapshot data marker */
+	struct xrow_header row;
+	xrow_encode_vclock_xc(&row, &stop_vclock);
+	row.sync = header->sync;
+	coio_write_xrow(io, &row);
+}
+
+void
+box_process_register(struct ev_io *io, struct xrow_header *header)
+{
+	assert(header->type == IPROTO_REGISTER);
+
+	struct tt_uuid instance_uuid = uuid_nil;
+	struct vclock vclock;
+	xrow_decode_register_xc(header, &instance_uuid, &vclock);
+
+	if (!is_box_configured)
+		tnt_raise(ClientError, ER_LOADING);
+
+	if (tt_uuid_is_equal(&instance_uuid, &INSTANCE_UUID))
+		tnt_raise(ClientError, ER_CONNECTION_TO_SELF);
+
+	/* Forbid replication from an anonymous instance. */
+	if (replication_anon) {
+		tnt_raise(ClientError, ER_UNSUPPORTED, "Replication",
+			  "replicating from an anonymous instance.");
+	}
+
+	access_check_universe_xc(PRIV_R);
+	/* We only get register requests from anonymous instances. */
+	struct replica *replica = replica_by_uuid(&instance_uuid);
+	if (replica && replica->id != REPLICA_ID_NIL) {
+		tnt_raise(ClientError, ER_REPLICA_NOT_ANON,
+			  tt_uuid_str(&instance_uuid));
+	}
+	/* See box_process_join() */
+	box_check_writable_xc();
+	struct space *space = space_cache_find_xc(BOX_CLUSTER_ID);
+	access_check_space_xc(space, PRIV_W);
+
+	/* Forbid replication with disabled WAL */
+	if (wal_mode() == WAL_NONE) {
+		tnt_raise(ClientError, ER_UNSUPPORTED, "Replication",
+			  "wal_mode = 'none'");
+	}
+
+	struct gc_consumer *gc = gc_consumer_register(&replicaset.vclock,
+				"replica %s", tt_uuid_str(&instance_uuid));
+	if (gc == NULL)
+		diag_raise();
+	auto gc_guard = make_scoped_guard([&] { gc_consumer_unregister(gc); });
+
+	say_info("registering replica %s at %s",
+		 tt_uuid_str(&instance_uuid), sio_socketname(io->fd));
+
+	struct vclock start_vclock;
+	vclock_copy(&start_vclock, &replicaset.vclock);
+
+	/**
+	 * Call the server-side hook which stores the replica uuid
+	 * in _cluster space.
+	 */
+	box_on_join(&instance_uuid);
+
+	ERROR_INJECT_YIELD(ERRINJ_REPLICA_JOIN_DELAY);
+
+	/* Remember master's vclock after the last request */
+	struct vclock stop_vclock;
+	vclock_copy(&stop_vclock, &replicaset.vclock);
+
+	/*
+	 * Feed replica with WALs in range
+	 * (start_vclock, stop_vclock) so that it gets its
+	 * registration.
+	 */
+	relay_final_join(io->fd, header->sync, &start_vclock, &stop_vclock);
+	say_info("final data sent.");
+
+	struct xrow_header row;
+	/* Send end of WAL stream marker */
+	xrow_encode_vclock_xc(&row, &replicaset.vclock);
+	row.sync = header->sync;
+	coio_write_xrow(io, &row);
+
+	/*
+	 * Advance the WAL consumer state to the position where
+	 * registration was complete and assign it to the
+	 * replica.
+	 */
+	gc_consumer_advance(gc, &stop_vclock);
+	replica = replica_by_uuid(&instance_uuid);
+	if (replica->gc != NULL)
+		gc_consumer_unregister(replica->gc);
+	replica->gc = gc;
+	gc_guard.is_active = false;
+}
+
 void
 box_process_join(struct ev_io *io, struct xrow_header *header)
 {
@@ -1438,6 +1652,12 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
 	if (tt_uuid_is_equal(&instance_uuid, &INSTANCE_UUID))
 		tnt_raise(ClientError, ER_CONNECTION_TO_SELF);
 
+	/* Forbid replication from an anonymous instance. */
+	if (replication_anon) {
+		tnt_raise(ClientError, ER_UNSUPPORTED, "Replication",
+			  "replicating from an anonymous instance.");
+	}
+
 	/* Check permissions */
 	access_check_universe_xc(PRIV_R);
 
@@ -1537,23 +1757,33 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
 	struct vclock replica_clock;
 	uint32_t replica_version_id;
 	vclock_create(&replica_clock);
+	bool anon;
 	xrow_decode_subscribe_xc(header, NULL, &replica_uuid,
-				 &replica_clock, &replica_version_id);
+				 &replica_clock, &replica_version_id, &anon);
 
 	/* Forbid connection to itself */
 	if (tt_uuid_is_equal(&replica_uuid, &INSTANCE_UUID))
 		tnt_raise(ClientError, ER_CONNECTION_TO_SELF);
 
+	/* Forbid replication from an anonymous instance. */
+	if (replication_anon) {
+		tnt_raise(ClientError, ER_UNSUPPORTED, "Replication",
+			  "replicating from an anonymous instance.");
+	}
+
 	/* Check permissions */
 	access_check_universe_xc(PRIV_R);
 
 	/* Check replica uuid */
 	struct replica *replica = replica_by_uuid(&replica_uuid);
-	if (replica == NULL || replica->id == REPLICA_ID_NIL) {
+
+	if (!anon && (replica == NULL || replica->id == REPLICA_ID_NIL)) {
 		tnt_raise(ClientError, ER_UNKNOWN_REPLICA,
 			  tt_uuid_str(&replica_uuid),
 			  tt_uuid_str(&REPLICASET_UUID));
 	}
+	if (replica == NULL)
+		replica = replicaset_add_anon(&replica_uuid);
 
 	/* Don't allow multiple relays for the same replica */
 	if (relay_get_state(replica->relay) == RELAY_FOLLOW) {
@@ -1774,13 +2004,17 @@ bootstrap_from_master(struct replica *master)
 	 */
 
 	assert(!tt_uuid_is_nil(&INSTANCE_UUID));
-	applier_resume_to_state(applier, APPLIER_INITIAL_JOIN, TIMEOUT_INFINITY);
-
+	enum applier_state wait_state = replication_anon ?
+					APPLIER_FETCH_SNAPSHOT :
+					APPLIER_INITIAL_JOIN;
+	applier_resume_to_state(applier, wait_state, TIMEOUT_INFINITY);
 	/*
 	 * Process initial data (snapshot or dirty disk data).
 	 */
 	engine_begin_initial_recovery_xc(NULL);
-	applier_resume_to_state(applier, APPLIER_FINAL_JOIN, TIMEOUT_INFINITY);
+	wait_state = replication_anon ? APPLIER_FETCHED_SNAPSHOT :
+					APPLIER_FINAL_JOIN;
+	applier_resume_to_state(applier, wait_state, TIMEOUT_INFINITY);
 
 	/*
 	 * Process final data (WALs).
@@ -1790,8 +2024,10 @@ bootstrap_from_master(struct replica *master)
 	recovery_journal_create(&journal, &replicaset.vclock);
 	journal_set(&journal.base);
 
-	applier_resume_to_state(applier, APPLIER_JOINED, TIMEOUT_INFINITY);
-
+	if (!replication_anon) {
+		applier_resume_to_state(applier, APPLIER_JOINED,
+					TIMEOUT_INFINITY);
+	}
 	/* Finalize the new replica */
 	engine_end_recovery_xc();
 
@@ -2106,6 +2342,7 @@ box_cfg_xc(void)
 	box_set_replication_sync_lag();
 	box_set_replication_sync_timeout();
 	box_set_replication_skip_conflict();
+	box_set_replication_anon();
 
 	struct gc_checkpoint *checkpoint = gc_last_checkpoint();
 
@@ -2136,14 +2373,20 @@ box_cfg_xc(void)
 	}
 	fiber_gc();
 
-	/* Check for correct registration of the instance in _cluster */
-	{
-		struct replica *self = replica_by_uuid(&INSTANCE_UUID);
+	/*
+	 * Check for correct registration of the instance in _cluster
+	 * The instance won't exist in _cluster space if it is an
+	 * anonymous replica, add it manually.
+	 */
+	struct replica *self = replica_by_uuid(&INSTANCE_UUID);
+	if (!replication_anon) {
 		if (self == NULL || self->id == REPLICA_ID_NIL) {
 			tnt_raise(ClientError, ER_UNKNOWN_REPLICA,
 				  tt_uuid_str(&INSTANCE_UUID),
 				  tt_uuid_str(&REPLICASET_UUID));
 		}
+	} else if (self == NULL) {
+		replicaset_add_anon(&INSTANCE_UUID);
 	}
 
 	rmean_cleanup(rmean_box);
diff --git a/src/box/box.h b/src/box/box.h
index ccd527bd5..e4088d6b6 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -100,7 +100,7 @@ void
 box_atfork(void);
 
 void
-box_set_ro(bool ro);
+box_set_ro();
 
 bool
 box_is_ro(void);
@@ -179,6 +179,14 @@ box_reset_stat(void);
 void
 box_process_auth(struct auth_request *request, const char *salt);
 
+/** Send current read view to the replica. */
+void
+box_process_fetch_snapshot(struct ev_io *io, struct xrow_header *header);
+
+/** Register a replica */
+void
+box_process_register(struct ev_io *io, struct xrow_header *header);
+
 /**
  * Join a replica.
  *
@@ -234,6 +242,7 @@ void box_set_replication_connect_quorum(void);
 void box_set_replication_sync_lag(void);
 void box_set_replication_sync_timeout(void);
 void box_set_replication_skip_conflict(void);
+void box_set_replication_anon(void);
 void box_set_net_msg_max(void);
 
 extern "C" {
diff --git a/src/box/errcode.h b/src/box/errcode.h
index c660b1c70..dce69c9c0 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -258,6 +258,7 @@ struct errcode_record {
 	/*203 */_(ER_BOOTSTRAP_READONLY,	"Trying to bootstrap a local read-only instance as master") \
 	/*204 */_(ER_SQL_FUNC_WRONG_RET_COUNT,	"SQL expects exactly one argument returned from %s, got %d")\
 	/*205 */_(ER_FUNC_INVALID_RETURN_TYPE,	"Function '%s' returned value of invalid type: expected %s got %s") \
+	/*206 */_(ER_REPLICA_NOT_ANON, "Replica '%s' is not anonymous and cannot register.") \
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/iproto.cc b/src/box/iproto.cc
index c39b8e7bf..9e6bd2dd7 100644
--- a/src/box/iproto.cc
+++ b/src/box/iproto.cc
@@ -1162,7 +1162,7 @@ static void
 net_send_error(struct cmsg *msg);
 
 static void
-tx_process_join_subscribe(struct cmsg *msg);
+tx_process_replication(struct cmsg *msg);
 
 static void
 net_end_join(struct cmsg *msg);
@@ -1212,12 +1212,12 @@ static const struct cmsg_hop *dml_route[IPROTO_TYPE_STAT_MAX] = {
 };
 
 static const struct cmsg_hop join_route[] = {
-	{ tx_process_join_subscribe, &net_pipe },
+	{ tx_process_replication, &net_pipe },
 	{ net_end_join, NULL },
 };
 
 static const struct cmsg_hop subscribe_route[] = {
-	{ tx_process_join_subscribe, &net_pipe },
+	{ tx_process_replication, &net_pipe },
 	{ net_end_subscribe, NULL },
 };
 
@@ -1272,6 +1272,8 @@ iproto_msg_decode(struct iproto_msg *msg, const char **pos, const char *reqend,
 		cmsg_init(&msg->base, misc_route);
 		break;
 	case IPROTO_JOIN:
+	case IPROTO_FETCH_SNAPSHOT:
+	case IPROTO_REGISTER:
 		cmsg_init(&msg->base, join_route);
 		*stop_input = true;
 		break;
@@ -1752,7 +1754,7 @@ error:
 }
 
 static void
-tx_process_join_subscribe(struct cmsg *m)
+tx_process_replication(struct cmsg *m)
 {
 	struct iproto_msg *msg = tx_accept_msg(m);
 	struct iproto_connection *con = msg->connection;
@@ -1768,6 +1770,12 @@ tx_process_join_subscribe(struct cmsg *m)
 			 */
 			box_process_join(&io, &msg->header);
 			break;
+		case IPROTO_FETCH_SNAPSHOT:
+			box_process_fetch_snapshot(&io, &msg->header);
+			break;
+		case IPROTO_REGISTER:
+			box_process_register(&io, &msg->header);
+			break;
 		case IPROTO_SUBSCRIBE:
 			/*
 			 * Subscribe never returns - unless there
diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
index 5e8a7d483..10a9a0102 100644
--- a/src/box/iproto_constants.h
+++ b/src/box/iproto_constants.h
@@ -120,6 +120,8 @@ enum iproto_key {
 	 * }
 	 */
 	IPROTO_SQL_INFO = 0x42,
+	/* Leave a gap between SQL keys and additional request keys */
+	IPROTO_REPLICA_ANON = 0x50,
 	IPROTO_KEY_MAX
 };
 
@@ -216,6 +218,10 @@ enum iproto_type {
 	IPROTO_VOTE_DEPRECATED = 67,
 	/** Vote request command for master election */
 	IPROTO_VOTE = 68,
+	/** Anonymous replication FETCH SNAPSHOT. */
+	IPROTO_FETCH_SNAPSHOT = 69,
+	/** REGISTER request to leave anonymous replication. */
+	IPROTO_REGISTER = 70,
 
 	/** Vinyl run info stored in .index file */
 	VY_INDEX_RUN_INFO = 100,
diff --git a/src/box/lua/cfg.cc b/src/box/lua/cfg.cc
index 4884ce013..f59470774 100644
--- a/src/box/lua/cfg.cc
+++ b/src/box/lua/cfg.cc
@@ -190,7 +190,7 @@ static int
 lbox_cfg_set_read_only(struct lua_State *L)
 {
 	try {
-		box_set_ro(cfg_geti("read_only") != 0);
+		box_set_ro();
 	} catch (Exception *) {
 		luaT_error(L);
 	}
@@ -338,6 +338,17 @@ lbox_cfg_set_replication_sync_timeout(struct lua_State *L)
 	return 0;
 }
 
+static int
+lbox_cfg_set_replication_anon(struct lua_State *L)
+{
+	try {
+		box_set_replication_anon();
+	} catch (Exception *) {
+		luaT_error(L);
+	}
+	return 0;
+}
+
 static int
 lbox_cfg_set_replication_skip_conflict(struct lua_State *L)
 {
@@ -377,6 +388,7 @@ box_lua_cfg_init(struct lua_State *L)
 		{"cfg_set_replication_sync_lag", lbox_cfg_set_replication_sync_lag},
 		{"cfg_set_replication_sync_timeout", lbox_cfg_set_replication_sync_timeout},
 		{"cfg_set_replication_skip_conflict", lbox_cfg_set_replication_skip_conflict},
+		{"cfg_set_replication_anon", lbox_cfg_set_replication_anon},
 		{"cfg_set_net_msg_max", lbox_cfg_set_net_msg_max},
 		{NULL, NULL}
 	};
diff --git a/src/box/lua/info.c b/src/box/lua/info.c
index e029e0e17..b5909a878 100644
--- a/src/box/lua/info.c
+++ b/src/box/lua/info.c
@@ -223,7 +223,7 @@ lbox_info_id(struct lua_State *L)
 	 * at box.info.status.
 	 */
 	struct replica *self = replica_by_uuid(&INSTANCE_UUID);
-	if (self != NULL && self->id != REPLICA_ID_NIL) {
+	if (self != NULL && (self->id != REPLICA_ID_NIL || replication_anon)) {
 		lua_pushinteger(L, self->id);
 	} else {
 		luaL_pushnull(L);
@@ -243,7 +243,7 @@ lbox_info_lsn(struct lua_State *L)
 {
 	/* See comments in lbox_info_id */
 	struct replica *self = replica_by_uuid(&INSTANCE_UUID);
-	if (self != NULL && self->id != REPLICA_ID_NIL) {
+	if (self != NULL && (self->id != REPLICA_ID_NIL || replication_anon)) {
 		luaL_pushint64(L, vclock_get(box_vclock, self->id));
 	} else {
 		luaL_pushint64(L, -1);
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 85617c8f0..9dee75b7d 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -77,6 +77,7 @@ local default_cfg = {
     replication_connect_timeout = 30,
     replication_connect_quorum = nil, -- connect all
     replication_skip_conflict = false,
+    replication_anon      = false,
     feedback_enabled      = true,
     feedback_host         = "https://feedback.tarantool.io",
     feedback_interval     = 3600,
@@ -140,6 +141,7 @@ local template_cfg = {
     replication_connect_timeout = 'number',
     replication_connect_quorum = 'number',
     replication_skip_conflict = 'boolean',
+    replication_anon      = 'boolean',
     feedback_enabled      = 'boolean',
     feedback_host         = 'string',
     feedback_interval     = 'number',
@@ -247,6 +249,7 @@ local dynamic_cfg = {
     replication_sync_lag    = private.cfg_set_replication_sync_lag,
     replication_sync_timeout = private.cfg_set_replication_sync_timeout,
     replication_skip_conflict = private.cfg_set_replication_skip_conflict,
+    replication_anon        = private.cfg_set_replication_anon,
     instance_uuid           = check_instance_uuid,
     replicaset_uuid         = check_replicaset_uuid,
     net_msg_max             = private.cfg_set_net_msg_max,
@@ -301,6 +304,7 @@ local dynamic_cfg_skip_at_load = {
     replication_sync_lag    = true,
     replication_sync_timeout = true,
     replication_skip_conflict = true,
+    replication_anon        = true,
     wal_dir_rescan_delay    = true,
     custom_proc_title       = true,
     force_recovery          = true,
diff --git a/src/box/recovery.cc b/src/box/recovery.cc
index d122d618a..64aa467b1 100644
--- a/src/box/recovery.cc
+++ b/src/box/recovery.cc
@@ -262,9 +262,12 @@ recover_xlog(struct recovery *r, struct xstream *stream,
 
 		/*
 		 * All rows in xlog files have an assigned
-		 * replica id.
+		 * replica id. The only exception is anonymous
+		 * replica, which has a zero instance id.
+		 * In this case the only rows from such an instance
+		 * can be for the local spaces.
 		 */
-		assert(row.replica_id != 0);
+		assert(row.replica_id != 0 || row.group_id == GROUP_LOCAL);
 		/*
 		 * We can promote the vclock either before or
 		 * after xstream_write(): it only makes any impact
diff --git a/src/box/relay.cc b/src/box/relay.cc
index e849fcf4f..22029751f 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -569,11 +569,16 @@ relay_subscribe_f(va_list ap)
 	cbus_pair("tx", relay->endpoint.name, &relay->tx_pipe,
 		  &relay->relay_pipe, NULL, NULL, cbus_process);
 
-	/* Setup garbage collection trigger. */
+	/*
+	 * Setup garbage collection trigger.
+	 * Not needed for anonymous replicas, since they
+	 * aren't registered with gc at all.
+	 */
 	struct trigger on_close_log = {
 		RLIST_LINK_INITIALIZER, relay_on_close_log_f, relay, NULL
 	};
-	trigger_add(&r->on_close_log, &on_close_log);
+	if (!relay->replica->anon)
+		trigger_add(&r->on_close_log, &on_close_log);
 
 	/* Setup WAL watcher for sending new rows to the replica. */
 	wal_set_watcher(&relay->wal_watcher, relay->endpoint.name,
@@ -652,7 +657,8 @@ relay_subscribe_f(va_list ap)
 	say_crit("exiting the relay loop");
 
 	/* Clear garbage collector trigger and WAL watcher. */
-	trigger_clear(&on_close_log);
+	if (!relay->replica->anon)
+		trigger_clear(&on_close_log);
 	wal_clear_watcher(&relay->wal_watcher, cbus_process);
 
 	/* Join ack reader fiber. */
@@ -673,7 +679,7 @@ void
 relay_subscribe(struct replica *replica, int fd, uint64_t sync,
 		struct vclock *replica_clock, uint32_t replica_version_id)
 {
-	assert(replica->id != REPLICA_ID_NIL);
+	assert(replica->anon || replica->id != REPLICA_ID_NIL);
 	struct relay *relay = replica->relay;
 	assert(relay->state != RELAY_FOLLOW);
 	/*
@@ -681,7 +687,7 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
 	 * unless it has already been registered by initial
 	 * join.
 	 */
-	if (replica->gc == NULL) {
+	if (replica->gc == NULL && !replica->anon) {
 		replica->gc = gc_consumer_register(replica_clock, "replica %s",
 						   tt_uuid_str(&replica->uuid));
 		if (replica->gc == NULL)
@@ -691,7 +697,10 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
 	relay_start(relay, fd, sync, relay_send_row);
 	auto relay_guard = make_scoped_guard([=] {
 		relay_stop(relay);
-		replica_on_relay_stop(replica);
+		if (replica->anon)
+			replica_anon_delete(replica);
+		else
+			replica_on_relay_stop(replica);
 	});
 
 	vclock_copy(&relay->local_vclock_at_subscribe, &replicaset.vclock);
@@ -741,6 +750,13 @@ relay_send_row(struct xstream *stream, struct xrow_header *packet)
 {
 	struct relay *relay = container_of(stream, struct relay, stream);
 	assert(iproto_type_is_dml(packet->type));
+	/*
+	 * Replica-local requests generated while replica was
+	 * anonymous have a zero instance id. Just skip all
+	 * these rows.
+	 */
+	if (packet->replica_id == REPLICA_ID_NIL)
+		return;
 	/*
 	 * Transform replica local requests to IPROTO_NOP so as to
 	 * promote vclock on the replica without actually modifying
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 81f19aa07..ce707811a 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -53,6 +53,7 @@ int replication_connect_quorum = REPLICATION_CONNECT_QUORUM_ALL;
 double replication_sync_lag = 10.0; /* seconds */
 double replication_sync_timeout = 300.0; /* seconds */
 bool replication_skip_conflict = false;
+bool replication_anon = false;
 
 struct replicaset replicaset;
 
@@ -172,6 +173,7 @@ replica_new(void)
 		diag_raise();
 	}
 	replica->id = 0;
+	replica->anon = false;
 	replica->uuid = uuid_nil;
 	replica->applier = NULL;
 	replica->gc = NULL;
@@ -209,6 +211,19 @@ replicaset_add(uint32_t replica_id, const struct tt_uuid *replica_uuid)
 	return replica;
 }
 
+struct replica *
+replicaset_add_anon(const struct tt_uuid *replica_uuid)
+{
+	assert(!tt_uuid_is_nil(replica_uuid));
+	assert(replica_by_uuid(replica_uuid) == NULL);
+
+	struct replica *replica = replica_new();
+	replica->uuid = *replica_uuid;
+	replica_hash_insert(&replicaset.hash, replica);
+	replica->anon = true;
+	return replica;
+}
+
 void
 replica_set_id(struct replica *replica, uint32_t replica_id)
 {
@@ -220,11 +235,21 @@ replica_set_id(struct replica *replica, uint32_t replica_id)
 		/* Assign local replica id */
 		assert(instance_id == REPLICA_ID_NIL);
 		instance_id = replica_id;
+	} else if (replica->anon) {
+		/*
+		 * Set replica gc on its transition from
+		 * anonymous to a normal one.
+		 */
+		assert(replica->gc == NULL);
+		replica->gc = gc_consumer_register(&replicaset.vclock,
+						   "replica %s",
+						   tt_uuid_str(&replica->uuid));
 	}
 	replicaset.replica_by_id[replica_id] = replica;
 
 	say_info("assigned id %d to replica %s",
 		 replica->id, tt_uuid_str(&replica->uuid));
+	replica->anon = false;
 }
 
 void
@@ -268,7 +293,7 @@ replica_clear_id(struct replica *replica)
 	}
 }
 
-static void
+void
 replica_set_applier(struct replica *replica, struct applier *applier)
 {
 	assert(replica->applier == NULL);
@@ -277,7 +302,7 @@ replica_set_applier(struct replica *replica, struct applier *applier)
 		    &replica->on_applier_state);
 }
 
-static void
+void
 replica_clear_applier(struct replica *replica)
 {
 	assert(replica->applier != NULL);
@@ -880,6 +905,18 @@ replica_on_relay_stop(struct replica *replica)
 	}
 }
 
+void
+replica_anon_delete(struct replica *replica)
+{
+	assert(replica->gc == NULL);
+	assert(replica->id == REPLICA_ID_NIL);
+	/* We do not replicate from anonymous replicas */
+	assert(replica->applier == NULL);
+	replica_hash_remove(&replicaset.hash, replica);
+	replica_delete(replica);
+}
+
+
 struct replica *
 replicaset_first(void)
 {
diff --git a/src/box/replication.h b/src/box/replication.h
index 470420592..978a09d41 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -137,6 +137,12 @@ extern double replication_sync_timeout;
  */
 extern bool replication_skip_conflict;
 
+/**
+ * Whether this replica will be anonymous or not, e.g. be preset
+ * in _cluster table and have a non-zero id.
+ */
+extern bool replication_anon;
+
 /**
  * Wait for the given period of time before trying to reconnect
  * to a master.
@@ -265,6 +271,12 @@ struct replica {
 	 * registered in the _cluster space yet.
 	 */
 	uint32_t id;
+	/**
+	 * Whether this is an anonymous replica, e.g. a read-only
+	 * replica that doesn't have an id and isn't present in
+	 * _cluster table.
+	 */
+	bool anon;
 	/** Applier fiber. */
 	struct applier *applier;
 	/** Relay thread. */
@@ -343,12 +355,21 @@ replica_set_id(struct replica *replica, uint32_t id);
 void
 replica_clear_id(struct replica *replica);
 
+void
+replica_clear_applier(struct replica *replica);
+
+void
+replica_set_applier(struct replica * replica, struct applier * applier);
+
 /**
  * Unregister \a relay from the \a replica.
  */
 void
 replica_on_relay_stop(struct replica *replica);
 
+void
+replica_anon_delete(struct replica *replica);
+
 #if defined(__cplusplus)
 } /* extern "C" */
 
@@ -364,6 +385,9 @@ replica_check_id(uint32_t replica_id);
 struct replica *
 replicaset_add(uint32_t replica_id, const struct tt_uuid *instance_uuid);
 
+struct replica *
+replicaset_add_anon(const struct tt_uuid *replica_uuid);
+
 /**
  * Try to connect appliers to remote peers and receive UUID.
  * Appliers that did not connect will connect asynchronously.
diff --git a/src/box/wal.c b/src/box/wal.c
index 5e2c13e0e..2b238b743 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -930,6 +930,10 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
 		if ((*row)->replica_id == 0) {
 			(*row)->lsn = vclock_inc(vclock_diff, instance_id) +
 				      vclock_get(base, instance_id);
+			/*
+			 * Note, an anonymous replica signs local
+			 * rows whith a zero instance id.
+			 */
 			(*row)->replica_id = instance_id;
 			/* Use lsn of the first local row as transaction id. */
 			tsn = tsn == 0 ? (*row)->lsn : tsn;
diff --git a/src/box/xrow.c b/src/box/xrow.c
index 18bf08971..27623d5df 100644
--- a/src/box/xrow.c
+++ b/src/box/xrow.c
@@ -1148,11 +1148,40 @@ err:
 	return -1;
 }
 
+int
+xrow_encode_register(struct xrow_header *row,
+		     const struct tt_uuid *instance_uuid,
+		     const struct vclock *vclock)
+{
+	memset(row, 0, sizeof(*row));
+	size_t size = mp_sizeof_map(2) +
+		      mp_sizeof_uint(IPROTO_INSTANCE_UUID) +
+		      mp_sizeof_str(UUID_STR_LEN) +
+		      mp_sizeof_uint(IPROTO_VCLOCK) + mp_sizeof_vclock(vclock);
+	char *buf = (char *) region_alloc(&fiber()->gc, size);
+	if (buf == NULL) {
+		diag_set(OutOfMemory, size, "region_alloc", "buf");
+		return -1;
+	}
+	char *data = buf;
+	data = mp_encode_map(data, 2);
+	data = mp_encode_uint(data, IPROTO_INSTANCE_UUID);
+	data = xrow_encode_uuid(data, instance_uuid);
+	data = mp_encode_uint(data, IPROTO_VCLOCK);
+	data = mp_encode_vclock(data, vclock);
+	assert(data <= buf + size);
+	row->body[0].iov_base = buf;
+	row->body[0].iov_len = (data - buf);
+	row->bodycnt = 1;
+	row->type = IPROTO_REGISTER;
+	return 0;
+}
+
 int
 xrow_encode_subscribe(struct xrow_header *row,
 		      const struct tt_uuid *replicaset_uuid,
 		      const struct tt_uuid *instance_uuid,
-		      const struct vclock *vclock)
+		      const struct vclock *vclock, bool anon)
 {
 	memset(row, 0, sizeof(*row));
 	size_t size = XROW_BODY_LEN_MAX + mp_sizeof_vclock(vclock);
@@ -1162,7 +1191,7 @@ xrow_encode_subscribe(struct xrow_header *row,
 		return -1;
 	}
 	char *data = buf;
-	data = mp_encode_map(data, 4);
+	data = mp_encode_map(data, 5);
 	data = mp_encode_uint(data, IPROTO_CLUSTER_UUID);
 	data = xrow_encode_uuid(data, replicaset_uuid);
 	data = mp_encode_uint(data, IPROTO_INSTANCE_UUID);
@@ -1171,6 +1200,8 @@ xrow_encode_subscribe(struct xrow_header *row,
 	data = mp_encode_vclock(data, vclock);
 	data = mp_encode_uint(data, IPROTO_SERVER_VERSION);
 	data = mp_encode_uint(data, tarantool_version_id());
+	data = mp_encode_uint(data, IPROTO_REPLICA_ANON);
+	data = mp_encode_bool(data, anon);
 	assert(data <= buf + size);
 	row->body[0].iov_base = buf;
 	row->body[0].iov_len = (data - buf);
@@ -1182,7 +1213,7 @@ xrow_encode_subscribe(struct xrow_header *row,
 int
 xrow_decode_subscribe(struct xrow_header *row, struct tt_uuid *replicaset_uuid,
 		      struct tt_uuid *instance_uuid, struct vclock *vclock,
-		      uint32_t *version_id)
+		      uint32_t *version_id, bool *anon)
 {
 	if (row->bodycnt == 0) {
 		diag_set(ClientError, ER_INVALID_MSGPACK, "request body");
@@ -1198,6 +1229,8 @@ xrow_decode_subscribe(struct xrow_header *row, struct tt_uuid *replicaset_uuid,
 		return -1;
 	}
 
+	if (anon)
+		*anon = false;
 	d = data;
 	uint32_t map_size = mp_decode_map(&d);
 	for (uint32_t i = 0; i < map_size; i++) {
@@ -1245,6 +1278,16 @@ xrow_decode_subscribe(struct xrow_header *row, struct tt_uuid *replicaset_uuid,
 			}
 			*version_id = mp_decode_uint(&d);
 			break;
+		case IPROTO_REPLICA_ANON:
+			if (anon == NULL)
+				goto skip;
+			if (mp_typeof(*d) != MP_BOOL) {
+				xrow_on_decode_err(data, end, ER_INVALID_MSGPACK,
+						   "invalid REPLICA_ANON flag");
+				return -1;
+			}
+			*anon = mp_decode_bool(&d);
+			break;
 		default: skip:
 			mp_next(&d); /* value */
 		}
diff --git a/src/box/xrow.h b/src/box/xrow.h
index 60def2d3c..b8da3a0d0 100644
--- a/src/box/xrow.h
+++ b/src/box/xrow.h
@@ -301,12 +301,27 @@ xrow_decode_ballot(struct xrow_header *row, struct ballot *ballot);
 void
 xrow_encode_vote(struct xrow_header *row);
 
+/**
+ * Encode REGISTER command.
+ * @param[out] Row.
+ * @param instance_uuid Instance uuid.
+ * @param vclock Replication clock.
+ *
+ * @retval 0 Success.
+ * @retval -1 Memory error.
+ */
+int
+xrow_encode_register(struct xrow_header *row,
+		     const struct tt_uuid *instance_uuid,
+		     const struct vclock *vclock);
+
 /**
  * Encode SUBSCRIBE command.
  * @param[out] Row.
  * @param replicaset_uuid Replica set uuid.
  * @param instance_uuid Instance uuid.
  * @param vclock Replication clock.
+ * @param anon Whether it is an anonymous subscribe request or not.
  *
  * @retval  0 Success.
  * @retval -1 Memory error.
@@ -315,7 +330,7 @@ int
 xrow_encode_subscribe(struct xrow_header *row,
 		      const struct tt_uuid *replicaset_uuid,
 		      const struct tt_uuid *instance_uuid,
-		      const struct vclock *vclock);
+		      const struct vclock *vclock, bool anon);
 
 /**
  * Decode SUBSCRIBE command.
@@ -324,6 +339,7 @@ xrow_encode_subscribe(struct xrow_header *row,
  * @param[out] instance_uuid.
  * @param[out] vclock.
  * @param[out] version_id.
+ * @param[out] anon Whether it is an anonymous subscribe.
  *
  * @retval  0 Success.
  * @retval -1 Memory or format error.
@@ -331,7 +347,7 @@ xrow_encode_subscribe(struct xrow_header *row,
 int
 xrow_decode_subscribe(struct xrow_header *row, struct tt_uuid *replicaset_uuid,
 		      struct tt_uuid *instance_uuid, struct vclock *vclock,
-		      uint32_t *version_id);
+		      uint32_t *version_id, bool *anon);
 
 /**
  * Encode JOIN command.
@@ -355,7 +371,22 @@ xrow_encode_join(struct xrow_header *row, const struct tt_uuid *instance_uuid);
 static inline int
 xrow_decode_join(struct xrow_header *row, struct tt_uuid *instance_uuid)
 {
-	return xrow_decode_subscribe(row, NULL, instance_uuid, NULL, NULL);
+	return xrow_decode_subscribe(row, NULL, instance_uuid, NULL, NULL, NULL);
+}
+
+/**
+ * Decode REGISTER request.
+ * @param row Row to decode.
+ * @param[out] instance_uuid Instance uuid.
+ * @param[out] vclock Instance vclock.
+ * @retval 0 Success.
+ * @retval -1 Memory or format error.
+ */
+static inline int
+xrow_decode_register(struct xrow_header *row, struct tt_uuid *instance_uuid,
+		     struct vclock *vclock)
+{
+	return xrow_decode_subscribe(row, NULL, instance_uuid, vclock, NULL, NULL);
 }
 
 /**
@@ -380,7 +411,7 @@ xrow_encode_vclock(struct xrow_header *row, const struct vclock *vclock);
 static inline int
 xrow_decode_vclock(struct xrow_header *row, struct vclock *vclock)
 {
-	return xrow_decode_subscribe(row, NULL, NULL, vclock, NULL);
+	return xrow_decode_subscribe(row, NULL, NULL, vclock, NULL, NULL);
 }
 
 /**
@@ -411,7 +442,7 @@ xrow_decode_subscribe_response(struct xrow_header *row,
 			       struct tt_uuid *replicaset_uuid,
 			       struct vclock *vclock)
 {
-	return xrow_decode_subscribe(row, replicaset_uuid, NULL, vclock, NULL);
+	return xrow_decode_subscribe(row, replicaset_uuid, NULL, vclock, NULL, NULL);
 }
 
 /**
@@ -769,15 +800,25 @@ xrow_decode_ballot_xc(struct xrow_header *row, struct ballot *ballot)
 		diag_raise();
 }
 
+/** @copydoc xrow_encode_register. */
+static inline void
+xrow_encode_register_xc(struct xrow_header *row,
+		       const struct tt_uuid *instance_uuid,
+		       const struct vclock *vclock)
+{
+	if (xrow_encode_register(row, instance_uuid, vclock) != 0)
+		diag_raise();
+}
+
 /** @copydoc xrow_encode_subscribe. */
 static inline void
 xrow_encode_subscribe_xc(struct xrow_header *row,
 			 const struct tt_uuid *replicaset_uuid,
 			 const struct tt_uuid *instance_uuid,
-			 const struct vclock *vclock)
+			 const struct vclock *vclock, bool anon)
 {
 	if (xrow_encode_subscribe(row, replicaset_uuid, instance_uuid,
-				  vclock) != 0)
+				  vclock, anon) != 0)
 		diag_raise();
 }
 
@@ -786,10 +827,10 @@ static inline void
 xrow_decode_subscribe_xc(struct xrow_header *row,
 			 struct tt_uuid *replicaset_uuid,
 		         struct tt_uuid *instance_uuid, struct vclock *vclock,
-			 uint32_t *replica_version_id)
+			 uint32_t *replica_version_id, bool *anon)
 {
 	if (xrow_decode_subscribe(row, replicaset_uuid, instance_uuid,
-				  vclock, replica_version_id) != 0)
+				  vclock, replica_version_id, anon) != 0)
 		diag_raise();
 }
 
@@ -810,6 +851,15 @@ xrow_decode_join_xc(struct xrow_header *row, struct tt_uuid *instance_uuid)
 		diag_raise();
 }
 
+/** @copydoc xrow_decode_register. */
+static inline void
+xrow_decode_register_xc(struct xrow_header *row, struct tt_uuid *instance_uuid,
+			struct vclock *vclock)
+{
+	if (xrow_decode_register(row, instance_uuid, vclock) != 0)
+		diag_raise();
+}
+
 /** @copydoc xrow_encode_vclock. */
 static inline void
 xrow_encode_vclock_xc(struct xrow_header *row, const struct vclock *vclock)
diff --git a/test/app-tap/init_script.result b/test/app-tap/init_script.result
index 799297ba0..7aec1d715 100644
--- a/test/app-tap/init_script.result
+++ b/test/app-tap/init_script.result
@@ -25,30 +25,31 @@ box.cfg
 20	pid_file:box.pid
 21	read_only:false
 22	readahead:16320
-23	replication_connect_timeout:30
-24	replication_skip_conflict:false
-25	replication_sync_lag:10
-26	replication_sync_timeout:300
-27	replication_timeout:1
-28	slab_alloc_factor:1.05
-29	strip_core:true
-30	too_long_threshold:0.5
-31	vinyl_bloom_fpr:0.05
-32	vinyl_cache:134217728
-33	vinyl_dir:.
-34	vinyl_max_tuple_size:1048576
-35	vinyl_memory:134217728
-36	vinyl_page_size:8192
-37	vinyl_read_threads:1
-38	vinyl_run_count_per_level:2
-39	vinyl_run_size_ratio:3.5
-40	vinyl_timeout:60
-41	vinyl_write_threads:4
-42	wal_dir:.
-43	wal_dir_rescan_delay:2
-44	wal_max_size:268435456
-45	wal_mode:write
-46	worker_pool_threads:4
+23	replication_anon:false
+24	replication_connect_timeout:30
+25	replication_skip_conflict:false
+26	replication_sync_lag:10
+27	replication_sync_timeout:300
+28	replication_timeout:1
+29	slab_alloc_factor:1.05
+30	strip_core:true
+31	too_long_threshold:0.5
+32	vinyl_bloom_fpr:0.05
+33	vinyl_cache:134217728
+34	vinyl_dir:.
+35	vinyl_max_tuple_size:1048576
+36	vinyl_memory:134217728
+37	vinyl_page_size:8192
+38	vinyl_read_threads:1
+39	vinyl_run_count_per_level:2
+40	vinyl_run_size_ratio:3.5
+41	vinyl_timeout:60
+42	vinyl_write_threads:4
+43	wal_dir:.
+44	wal_dir_rescan_delay:2
+45	wal_max_size:268435456
+46	wal_mode:write
+47	worker_pool_threads:4
 --
 -- Test insert from detached fiber
 --
diff --git a/test/box/admin.result b/test/box/admin.result
index 6126f3a97..5a03a979a 100644
--- a/test/box/admin.result
+++ b/test/box/admin.result
@@ -71,6 +71,8 @@ cfg_filter(box.cfg)
     - false
   - - readahead
     - 16320
+  - - replication_anon
+    - false
   - - replication_connect_timeout
     - 30
   - - replication_skip_conflict
diff --git a/test/box/cfg.result b/test/box/cfg.result
index 5370bb870..d6ce6b621 100644
--- a/test/box/cfg.result
+++ b/test/box/cfg.result
@@ -59,6 +59,8 @@ cfg_filter(box.cfg)
  |     - false
  |   - - readahead
  |     - 16320
+ |   - - replication_anon
+ |     - false
  |   - - replication_connect_timeout
  |     - 30
  |   - - replication_skip_conflict
@@ -158,6 +160,8 @@ cfg_filter(box.cfg)
  |     - false
  |   - - readahead
  |     - 16320
+ |   - - replication_anon
+ |     - false
  |   - - replication_connect_timeout
  |     - 30
  |   - - replication_skip_conflict
diff --git a/test/box/misc.result b/test/box/misc.result
index d2a20307a..134caef63 100644
--- a/test/box/misc.result
+++ b/test/box/misc.result
@@ -554,6 +554,7 @@ t;
   203: box.error.BOOTSTRAP_READONLY
   204: box.error.SQL_FUNC_WRONG_RET_COUNT
   205: box.error.FUNC_INVALID_RETURN_TYPE
+  206: box.error.REPLICA_NOT_ANON
 ...
 test_run:cmd("setopt delimiter ''");
 ---
diff --git a/test/replication/anon.lua b/test/replication/anon.lua
new file mode 100644
index 000000000..2e7ee9983
--- /dev/null
+++ b/test/replication/anon.lua
@@ -0,0 +1,13 @@
+#!/usr/bin/env tarantool
+
+box.cfg({
+    listen              = os.getenv("LISTEN"),
+    replication         = os.getenv("MASTER"),
+    memtx_memory        = 107374182,
+    replication_timeout = 0.1,
+    replication_connect_timeout = 0.5,
+    read_only=true,
+    replication_anon=true,
+})
+
+require('console').listen(os.getenv('ADMIN'))
diff --git a/test/replication/anon.result b/test/replication/anon.result
new file mode 100644
index 000000000..f6a85e7d0
--- /dev/null
+++ b/test/replication/anon.result
@@ -0,0 +1,326 @@
+-- test-run result file version 2
+env = require('test_run')
+ | ---
+ | ...
+vclock_diff = require('fast_replica').vclock_diff
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+
+
+--
+-- gh-3186 Anonymous replicas.
+--
+-- Prepare master.
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+_ = box.schema.space.create('loc', {is_local=true})
+ | ---
+ | ...
+_ = box.schema.space.create('temp', {temporary=true})
+ | ---
+ | ...
+_ = box.schema.space.create('test')
+ | ---
+ | ...
+_ = box.space.loc:create_index('pk')
+ | ---
+ | ...
+_ = box.space.temp:create_index('pk')
+ | ---
+ | ...
+_ = box.space.test:create_index('pk')
+ | ---
+ | ...
+box.space.test:insert{1}
+ | ---
+ | - [1]
+ | ...
+
+test_run:cmd('create server replica_anon with rpl_master=default, script="replication/anon1.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica_anon')
+ | ---
+ | - true
+ | ...
+test_run:cmd('switch replica_anon')
+ | ---
+ | - true
+ | ...
+
+box.info.status
+ | ---
+ | - running
+ | ...
+box.info.id
+ | ---
+ | - 0
+ | ...
+box.info.lsn
+ | ---
+ | - 0
+ | ...
+test_run:wait_upstream(1, {status='follow'})
+ | ---
+ | - true
+ | ...
+
+-- Temporary spaces are accessible as read / write.
+for i = 1,10 do box.space.temp:insert{i} end
+ | ---
+ | ...
+box.space.temp:select{}
+ | ---
+ | - - [1]
+ |   - [2]
+ |   - [3]
+ |   - [4]
+ |   - [5]
+ |   - [6]
+ |   - [7]
+ |   - [8]
+ |   - [9]
+ |   - [10]
+ | ...
+
+box.info.lsn
+ | ---
+ | - 0
+ | ...
+
+-- Same for local spaces.
+for i = 1,10 do box.space.loc:insert{i} end
+ | ---
+ | ...
+box.space.loc:select{}
+ | ---
+ | - - [1]
+ |   - [2]
+ |   - [3]
+ |   - [4]
+ |   - [5]
+ |   - [6]
+ |   - [7]
+ |   - [8]
+ |   - [9]
+ |   - [10]
+ | ...
+
+-- Replica-local changes are accounted for in 0 vclock component.
+box.info.lsn
+ | ---
+ | - 10
+ | ...
+box.info.vclock[0]
+ | ---
+ | - 10
+ | ...
+
+-- Replica is read-only.
+box.cfg.read_only
+ | ---
+ | - true
+ | ...
+box.cfg{read_only=false}
+ | ---
+ | - error: 'Incorrect value for option ''read_only'': the value may be set to false
+ |     only when replication_anon is false'
+ | ...
+
+box.space.test:insert{2}
+ | ---
+ | - error: Can't modify data because this instance is in read-only mode.
+ | ...
+
+box.space.loc:drop()
+ | ---
+ | - error: Can't modify data because this instance is in read-only mode.
+ | ...
+box.space.loc:truncate()
+ | ---
+ | - error: Can't modify data because this instance is in read-only mode.
+ | ...
+
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+
+-- Replica isn't visible on master.
+#box.info.replication
+ | ---
+ | - 1
+ | ...
+
+-- Test that replication (even anonymous) from an anonymous
+-- instance is forbidden. An anonymous replica will fetch
+-- a snapshot though.
+test_run:cmd([[create server replica_anon2 with rpl_master=replica_anon,\
+             script="replication/anon2.lua"]])
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica_anon2')
+ | ---
+ | - true
+ | ...
+test_run:wait_log('replica_anon2',\
+                  'Replication does not support replicating from an anonymous instance',\
+                  nil, 10)
+ | ---
+ | - Replication does not support replicating from an anonymous instance
+ | ...
+test_run:cmd('switch replica_anon2')
+ | ---
+ | - true
+ | ...
+a = box.info.vclock[1]
+ | ---
+ | ...
+-- The instance did fetch a snapshot.
+a > 0
+ | ---
+ | - true
+ | ...
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+box.space.test:insert{2}
+ | ---
+ | - [2]
+ | ...
+test_run:cmd("switch replica_anon2")
+ | ---
+ | - true
+ | ...
+-- Second replica doesn't follow master through the
+-- 1st one. Replication from an anonymous instance
+-- is forbidden indeed.
+box.info.vclock[1] == a or box.info.vclock[1]
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('switch replica_anon')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('stop server replica_anon2')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica_anon2')
+ | ---
+ | - true
+ | ...
+
+-- Promote anonymous replica.
+box.cfg{replication_anon=false}
+ | ---
+ | ...
+-- Cannot switch back after becoming "normal".
+box.cfg{replication_anon=true}
+ | ---
+ | - error: 'Incorrect value for option ''replication_anon'': cannot be turned on after
+ |     bootstrap has finished'
+ | ...
+
+box.info.id
+ | ---
+ | - 2
+ | ...
+#box.info.replication
+ | ---
+ | - 2
+ | ...
+test_run:wait_upstream(1, {status='follow'})
+ | ---
+ | - true
+ | ...
+box.info.replication.downstream
+ | ---
+ | - null
+ | ...
+
+old_lsn = box.info.vclock[2] or 0
+ | ---
+ | ...
+
+-- Now read_only can be turned off.
+box.cfg{read_only=false}
+ | ---
+ | ...
+box.space.test:insert{3}
+ | ---
+ | - [3]
+ | ...
+-- New changes are tracked under freshly assigned id.
+box.info.vclock[2] == old_lsn + 1
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+
+-- Other instances may replicate from a previously-anonymous one.
+test_run:cmd("set variable repl_source to 'replica_anon.listen'")
+ | ---
+ | - true
+ | ...
+box.cfg{replication=repl_source}
+ | ---
+ | ...
+#box.info.replication
+ | ---
+ | - 2
+ | ...
+test_run:wait_upstream(2, {status='follow'})
+ | ---
+ | - true
+ | ...
+test_run:wait_downstream(2, {status='follow'})
+ | ---
+ | - true
+ | ...
+#box.info.vclock
+ | ---
+ | - 2
+ | ...
+
+-- Cleanup.
+box.cfg{replication=""}
+ | ---
+ | ...
+test_run:cmd('stop server replica_anon')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica_anon')
+ | ---
+ | - true
+ | ...
+box.space.test:drop()
+ | ---
+ | ...
+box.space.temp:drop()
+ | ---
+ | ...
+box.space.loc:drop()
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
+test_run:cleanup_cluster()
+ | ---
+ | ...
diff --git a/test/replication/anon.test.lua b/test/replication/anon.test.lua
new file mode 100644
index 000000000..1877ec8bb
--- /dev/null
+++ b/test/replication/anon.test.lua
@@ -0,0 +1,118 @@
+env = require('test_run')
+vclock_diff = require('fast_replica').vclock_diff
+test_run = env.new()
+
+
+--
+-- gh-3186 Anonymous replicas.
+--
+-- Prepare master.
+box.schema.user.grant('guest', 'replication')
+_ = box.schema.space.create('loc', {is_local=true})
+_ = box.schema.space.create('temp', {temporary=true})
+_ = box.schema.space.create('test')
+_ = box.space.loc:create_index('pk')
+_ = box.space.temp:create_index('pk')
+_ = box.space.test:create_index('pk')
+box.space.test:insert{1}
+
+test_run:cmd('create server replica_anon with rpl_master=default, script="replication/anon1.lua"')
+test_run:cmd('start server replica_anon')
+test_run:cmd('switch replica_anon')
+
+box.info.status
+box.info.id
+box.info.lsn
+test_run:wait_upstream(1, {status='follow'})
+
+-- Temporary spaces are accessible as read / write.
+for i = 1,10 do box.space.temp:insert{i} end
+box.space.temp:select{}
+
+box.info.lsn
+
+-- Same for local spaces.
+for i = 1,10 do box.space.loc:insert{i} end
+box.space.loc:select{}
+
+-- Replica-local changes are accounted for in 0 vclock component.
+box.info.lsn
+box.info.vclock[0]
+
+-- Replica is read-only.
+box.cfg.read_only
+box.cfg{read_only=false}
+
+box.space.test:insert{2}
+
+box.space.loc:drop()
+box.space.loc:truncate()
+
+test_run:cmd('switch default')
+
+-- Replica isn't visible on master.
+#box.info.replication
+
+-- Test that replication (even anonymous) from an anonymous
+-- instance is forbidden. An anonymous replica will fetch
+-- a snapshot though.
+test_run:cmd([[create server replica_anon2 with rpl_master=replica_anon,\
+             script="replication/anon2.lua"]])
+test_run:cmd('start server replica_anon2')
+test_run:wait_log('replica_anon2',\
+                  'Replication does not support replicating from an anonymous instance',\
+                  nil, 10)
+test_run:cmd('switch replica_anon2')
+a = box.info.vclock[1]
+-- The instance did fetch a snapshot.
+a > 0
+test_run:cmd('switch default')
+box.space.test:insert{2}
+test_run:cmd("switch replica_anon2")
+-- Second replica doesn't follow master through the
+-- 1st one. Replication from an anonymous instance
+-- is forbidden indeed.
+box.info.vclock[1] == a or box.info.vclock[1]
+
+test_run:cmd('switch replica_anon')
+
+test_run:cmd('stop server replica_anon2')
+test_run:cmd('delete server replica_anon2')
+
+-- Promote anonymous replica.
+box.cfg{replication_anon=false}
+-- Cannot switch back after becoming "normal".
+box.cfg{replication_anon=true}
+
+box.info.id
+#box.info.replication
+test_run:wait_upstream(1, {status='follow'})
+box.info.replication.downstream
+
+old_lsn = box.info.vclock[2] or 0
+
+-- Now read_only can be turned off.
+box.cfg{read_only=false}
+box.space.test:insert{3}
+-- New changes are tracked under freshly assigned id.
+box.info.vclock[2] == old_lsn + 1
+
+test_run:cmd('switch default')
+
+-- Other instances may replicate from a previously-anonymous one.
+test_run:cmd("set variable repl_source to 'replica_anon.listen'")
+box.cfg{replication=repl_source}
+#box.info.replication
+test_run:wait_upstream(2, {status='follow'})
+test_run:wait_downstream(2, {status='follow'})
+#box.info.vclock
+
+-- Cleanup.
+box.cfg{replication=""}
+test_run:cmd('stop server replica_anon')
+test_run:cmd('delete server replica_anon')
+box.space.test:drop()
+box.space.temp:drop()
+box.space.loc:drop()
+box.schema.user.revoke('guest', 'replication')
+test_run:cleanup_cluster()
diff --git a/test/replication/anon1.lua b/test/replication/anon1.lua
new file mode 120000
index 000000000..6638147e5
--- /dev/null
+++ b/test/replication/anon1.lua
@@ -0,0 +1 @@
+anon.lua
\ No newline at end of file
diff --git a/test/replication/anon2.lua b/test/replication/anon2.lua
new file mode 120000
index 000000000..6638147e5
--- /dev/null
+++ b/test/replication/anon2.lua
@@ -0,0 +1 @@
+anon.lua
\ No newline at end of file
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index cd686a0e2..429c64df3 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -1,4 +1,5 @@
 {
+    "anon.test.lua": {},
     "misc.test.lua": {},
     "once.test.lua": {},
     "on_replace.test.lua": {},
-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons.
  2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons sergepetrenko
@ 2019-12-25 16:00   ` Vladislav Shpilevoy
  2019-12-27 18:42   ` Vladislav Shpilevoy
  1 sibling, 0 replies; 15+ messages in thread
From: Vladislav Shpilevoy @ 2019-12-25 16:00 UTC (permalink / raw)
  To: sergepetrenko, Kirill Yukhin; +Cc: tarantool-patches

Thanks for working on this!

The first 4 commits LGTM. I've force pushed a tiny
nit fix below:

diff --git a/src/box/vclock.h b/src/box/vclock.h
index 35ba6284c..5d1a50d3d 100644
--- a/src/box/vclock.h
+++ b/src/box/vclock.h
@@ -289,7 +289,6 @@ vclock_compare(const struct vclock *a, const struct vclock *b)
 		replica_id = bit_iterator_next(&it);
 
 	for (; replica_id < VCLOCK_MAX; replica_id = bit_iterator_next(&it)) {
-
 		int64_t lsn_a = vclock_get(a, replica_id);
 		int64_t lsn_b = vclock_get(b, replica_id);
 		le = le && lsn_a <= lsn_b;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica.
  2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica sergepetrenko
@ 2019-12-25 18:22   ` Vladislav Shpilevoy
  2019-12-27 15:27     ` Sergey Petrenko
  0 siblings, 1 reply; 15+ messages in thread
From: Vladislav Shpilevoy @ 2019-12-25 18:22 UTC (permalink / raw)
  To: sergepetrenko, georgy; +Cc: tarantool-patches

Thanks for the patch!

I've pushed my review fixes on top of this commit. See it in the
end of the email, and on the branch.

We should document new iproto codes: IPROTO_FETCH_SNAPSHOT,
IPROTO_REGISTER, IPROTO_REPLICA_ANON, like here:
https://www.tarantool.io/en/doc/2.2/dev_guide/internals/box_protocol/

See 4 comments below.

> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index f4f9d0670..2939a0f61 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -996,10 +1035,24 @@ applier_f(va_list ap)
>  			if (tt_uuid_is_nil(&REPLICASET_UUID)) {
>  				/*
>  				 * Execute JOIN if this is a bootstrap.
> +				 * In case of anonymous replication, don't
> +				 * join but just fetch master's snapshot.
> +				 *
>  				 * The join will pause the applier
>  				 * until WAL is created.
>  				 */
> -				applier_join(applier);
> +				if (replication_anon)
> +					applier_fetch_snapshot(applier);
> +				else
> +					applier_join(applier);
> +			}
> +			if (applier->version_id >= version_id(1, 7, 0) &&
> +			    !replication_anon && instance_id == REPLICA_ID_NIL) {
> +				/* anonymity was turned off while we were
> +				 * fetching a snapshot or following master.
> +				 * Register the replica now.
> +				 */
> +				applier_register(applier);
>  			}

1. Why do we check for version here? Wouldn't it be more straight
to do something like this?

================================================================================

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 3efa2b765..32851506c 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -1032,6 +1032,7 @@ applier_f(va_list ap)
 	while (!fiber_is_cancelled()) {
 		try {
 			applier_connect(applier);
+			bool was_anon = replication_anon;
 			if (tt_uuid_is_nil(&REPLICASET_UUID)) {
 				/*
 				 * Execute JOIN if this is a bootstrap.
@@ -1046,8 +1047,8 @@ applier_f(va_list ap)
 				else
 					applier_join(applier);
 			}
-			if (applier->version_id >= version_id(1, 7, 0) &&
-			    !replication_anon && instance_id == REPLICA_ID_NIL) {
+			if (was_anon && !replication_anon &&
+			    instance_id == REPLICA_ID_NIL) {
 				/*
 				 * Anonymity was turned off while
 				 * we were fetching a snapshot or

================================================================================

Also I don't understand, why is it needed? From what I see in
box_set_replication_anon(), we stop existing appliers, when
change `replication_anon`. So in case `replication_anon` is
changed, the fiber should be canceled by that moment.
applier_connect() does fiber_testcancel() via coio after each
yield.

A register could even harm here, because this instance will do
register twice - from there, and from a new applier fiber, started
in box_set_replication_anon().

> diff --git a/src/box/box.cc b/src/box/box.cc
> index 56d07e634..a93e2503e 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -740,6 +770,64 @@ box_set_replication_skip_conflict(void)
>  	replication_skip_conflict = cfg_geti("replication_skip_conflict");
>  }
>  
> +void
> +box_set_replication_anon(void)
> +{
> +	bool anon = box_check_replication_anon();
> +	if (anon == replication_anon)
> +		return;
> +
> +	if (!anon) {
> +		/* Turn anonymous instance into a normal one. */
> +		replication_anon = anon;

2. In case of any exception `replication_anon` will contain a not
applied value.

But there is a worse problem - we stop the appliers, and don't
restart them in case of an exception. So box_set_replication_anon()
is not atomic, it may leave the instance in a half configured
state. box_set_replication() deals with that by creation of a new
array of appliers. It connects them, and only then atomically
replaces the old appliers.

Also looks like deanonymisation does not respect quorum. I did
this test:

    Master:
        box.cfg{listen=3301}
        box.schema.user.grant('guest', 'read,write,execute', 'universe')


    Replica:
        box.cfg{replication_anon=true,
                read_only=true,
                replication_connect_quorum=0,
                replication=3301}

    Master:
        Killed.

    Replica:
        box.cfg{replication_anon=false, read_only=false}

The replica hangs. Even though the quorum is 0.

I think, we somehow need to reuse box_set_replication() for all
of this. Looks like it does all we need except does not send the
register requests.

> +		/*
> +		 * Reset all appliers. This will interrupt
> +		 * anonymous follow they're in so that one of
> +		 * them can register and others resend a
> +		 * non-anonymous subscribe.
> +		 */
> +		replicaset_foreach(replica) {
> +			struct applier *applier = replica->applier;
> +			if (applier == NULL)
> +				continue;
> +			replica_clear_applier(replica);
> +			applier_stop(applier);
> +			replica->applier_sync_state = APPLIER_DISCONNECTED;
> +			replica_set_applier(replica, applier);
> +		}
> +		/* Choose a master to send register request to. */
> +		struct replica *master = replicaset_leader();
> +		assert(master != NULL && master->applier != NULL);
> +		struct applier *master_applier = master->applier;
> +
> +		applier_start(master_applier);
> +
> +		applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY);
> +		applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY);
> +		applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY);
> +		applier_resume(master_applier);
> +		/**
> +		 * Restart other appliers to
> +		 * resend non-anonymous subscribe.
> +		 */
> +		replicaset_foreach(replica) {
> +			if (replica != master && replica->applier)
> +				applier_start(replica->applier);
> +		}
> +	} else if (!is_box_configured) {
> +		replication_anon = anon;
> +	} else {
> +		/*
> +		 * It is forbidden to turn a normal replica into
> +		 * an anonymous one.
> +		 */
> +		tnt_raise(ClientError, ER_CFG, "replication_anon",
> +			  "cannot be turned on after bootstrap"
> +			  " has finished");
> +	}
> +
> +}
> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index e849fcf4f..22029751f 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -691,7 +697,10 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>  	relay_start(relay, fd, sync, relay_send_row);
>  	auto relay_guard = make_scoped_guard([=] {
>  		relay_stop(relay);
> -		replica_on_relay_stop(replica);
> +		if (replica->anon)
> +			replica_anon_delete(replica);
> +		else
> +			replica_on_relay_stop(replica);

3. Why can't replica_on_relay_stop() be used for an anon replica?

>  	});
>  
> @@ -741,6 +750,13 @@ relay_send_row(struct xstream *stream, struct xrow_header *packet)
>  {
>  	struct relay *relay = container_of(stream, struct relay, stream);
>  	assert(iproto_type_is_dml(packet->type));
> +	/*
> +	 * Replica-local requests generated while replica was
> +	 * anonymous have a zero instance id. Just skip all
> +	 * these rows.
> +	 */
> +	if (packet->replica_id == REPLICA_ID_NIL)
> +		return;

4. From what I remember about local requests in an anon replica,
this is possible only with packet->group_id == GROUP_LOCAL,
right? Then can you move it into the check about GROUP_LOCAL
below? To avoid checking replica_id, when the group is not local
anyway. Like this:

	if (packet->group_id == GROUP_LOCAL) {
		if (packet->replica_id == REPLICA_ID_NIL)
			return;
		packet->type = IPROTO_NOP;
		packet->group_id = GROUP_DEFAULT;
		packet->bodycnt = 0;
	}

================================================================================

commit 3e8fded091fae8abacc876bf6c899750b595d72d
Author: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Date:   Wed Dec 25 17:20:45 2019 +0100

    Review fixes

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 2939a0f61..3efa2b765 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -1048,9 +1048,11 @@ applier_f(va_list ap)
 			}
 			if (applier->version_id >= version_id(1, 7, 0) &&
 			    !replication_anon && instance_id == REPLICA_ID_NIL) {
-				/* anonymity was turned off while we were
-				 * fetching a snapshot or following master.
-				 * Register the replica now.
+				/*
+				 * Anonymity was turned off while
+				 * we were fetching a snapshot or
+				 * following master. Register the
+				 * replica now.
 				 */
 				applier_register(applier);
 			}
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 9dee75b7d..51d157aa3 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -279,6 +279,7 @@ local dynamic_cfg_order = {
     replication_sync_timeout    = 150,
     replication_connect_timeout = 150,
     replication_connect_quorum  = 150,
+    replication_anon        = 175,
================================================================================

I am not sure about this change.

The only way how can it be changed now - from true to false.
Not back. The only dubious scenario: we change `replication_anon`
to false, and change `replication` at the same time.

If we set `replication_anon` false before new `replication`,
we will try to register the old `replication`. It means
`replication_anon` update should be done after `replication`.
So as the new configuration could connect as anon, and then be
upgraded to normal.

But when we will support anonymization of normal replicas, the
config update could look like this:

    `replication_anon` false -> true
    `replication` old -> new

And here `replication_anon` should be set before `replication`.
So as we would connect anonymously to the new replication set.

In both cases we do unnecessary actions with the old
replication.

Seems like it can't be properly solved by dynamic_cfg_order
table. We need to configure `replication_anon` and `replication`
at the same time. So as in box_set_replication() you always know
if the new `replication` setting is anon or not.

Maybe we need to extend dynamic_cfg_order idea, and introduce
depending parameters. Or parameter groups. Or something like
that. So as we could, for example, group `replication_anon` and
`replication`, and pass anon value to box_set_replication() when
they are updated both at the same box.cfg.

Or we need to forbid update of some parameters in one box.cfg.

Or we could make that order of parameters depends on their
values. Although it does not solve the problem that when we
change both `replication_anon` and `replication`, we do
unnecessary actions to the old `replication` in some cases.

================================================================================
     replication             = 200,
 }
 
diff --git a/src/box/replication.cc b/src/box/replication.cc
index ce707811a..d9848debe 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -916,7 +916,6 @@ replica_anon_delete(struct replica *replica)
 	replica_delete(replica);
 }
 
-
 struct replica *
 replicaset_first(void)
 {

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica.
  2019-12-25 18:22   ` Vladislav Shpilevoy
@ 2019-12-27 15:27     ` Sergey Petrenko
  2019-12-27 18:42       ` Vladislav Shpilevoy
  0 siblings, 1 reply; 15+ messages in thread
From: Sergey Petrenko @ 2019-12-27 15:27 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches


>Среда, 25 декабря 2019, 21:22 +03:00 от Vladislav Shpilevoy <v.shpilevoy@tarantool.org>:
>
>Thanks for the patch!

Hi! Thanks for your answer! My comments and incremental diff are below.
Also please check out the discussion with Kostja Osipov in this thread
( [Tarantool-patches] [PATCH 4/5] vclock: ignore 0th component in comparisons.),
there are some new changes to the commit regarding 0 vclock component comparison.

>
>I've pushed my review fixes on top of this commit. See it in the
>end of the email, and on the branch.
Thanks! Looks good.
>
>
>We should document new iproto codes: IPROTO_FETCH_SNAPSHOT,
>IPROTO_REGISTER, IPROTO_REPLICA_ANON, like here:
>https://www.tarantool.io/en/doc/2.2/dev_guide/internals/box_protocol/

Opened a ticket. https://github.com/tarantool/doc/issues/1046

>
>
>See 4 comments below.
>
>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>> index f4f9d0670..2939a0f61 100644
>> --- a/src/box/applier.cc
>> +++ b/src/box/applier.cc
>> @@ -996,10 +1035,24 @@ applier_f(va_list ap)
>>  if (tt_uuid_is_nil(&REPLICASET_UUID)) {
>>  /*
>>   * Execute JOIN if this is a bootstrap.
>> + * In case of anonymous replication, don't
>> + * join but just fetch master's snapshot.
>> + *
>>   * The join will pause the applier
>>   * until WAL is created.
>>   */
>> -applier_join(applier);
>> +if (replication_anon)
>> +applier_fetch_snapshot(applier);
>> +else
>> +applier_join(applier);
>> +}
>> +if (applier->version_id >= version_id(1, 7, 0) &&
>> +    !replication_anon && instance_id == REPLICA_ID_NIL) {
>> +/* anonymity was turned off while we were
>> + * fetching a snapshot or following master.
>> + * Register the replica now.
>> + */
>> +applier_register(applier);
>>  }
>
>1. Why do we check for version here? Wouldn't it be more straight
>to do something like this?
>
>================================================================================
>
>diff --git a/src/box/applier.cc b/src/box/applier.cc
>index 3efa2b765..32851506c 100644
>--- a/src/box/applier.cc
>+++ b/src/box/applier.cc
>@@ -1032,6 +1032,7 @@ applier_f(va_list ap)
> while (!fiber_is_cancelled()) {
> try {
> applier_connect(applier);
>+bool was_anon = replication_anon;
> if (tt_uuid_is_nil(&REPLICASET_UUID)) {
> /*
>  * Execute JOIN if this is a bootstrap.
>@@ -1046,8 +1047,8 @@ applier_f(va_list ap)
> else
> applier_join(applier);
> }
>-if (applier->version_id >= version_id(1, 7, 0) &&
>-    !replication_anon && instance_id == REPLICA_ID_NIL) {
>+if (was_anon && !replication_anon &&
>+    instance_id == REPLICA_ID_NIL) {
> /*
>  * Anonymity was turned off while
>  * we were fetching a snapshot or
>
>================================================================================
>
>Also I don't understand, why is it needed? From what I see in
>box_set_replication_anon(), we stop existing appliers, when
>change `replication_anon`. So in case `replication_anon` is
>changed, the fiber should be canceled by that moment.
>applier_connect() does fiber_testcancel() via coio after each
>yield.
>
>A register could even harm here, because this instance will do
>register twice - from there, and from a new applier fiber, started
>in box_set_replication_anon().

True, the version check is pointless and the comment is misleading.
But we need to call applier_register here. In case of a transition
from anon to normal, we have already fetched a snapshot, so we
shouldn't execute join in a new applier, we just execute register and
proceed to subscribe. Fixed.

>
>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 56d07e634..a93e2503e 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -740,6 +770,64 @@ box_set_replication_skip_conflict(void)
>>  replication_skip_conflict = cfg_geti("replication_skip_conflict");
>>  }
>> 
>> +void
>> +box_set_replication_anon(void)
>> +{
>> +bool anon = box_check_replication_anon();
>> +if (anon == replication_anon)
>> +return;
>> +
>> +if (!anon) {
>> +/* Turn anonymous instance into a normal one. */
>> +replication_anon = anon;
>
>2. In case of any exception `replication_anon` will contain a not
>applied value.
>
>But there is a worse problem - we stop the appliers, and don't
>restart them in case of an exception. So box_set_replication_anon()
>is not atomic, it may leave the instance in a half configured
>state. box_set_replication() deals with that by creation of a new
>array of appliers. It connects them, and only then atomically
>replaces the old appliers.
>
>Also looks like deanonymisation does not respect quorum. I did
>this test:
>
>    Master:
>        box.cfg{listen=3301}
>        box.schema.user.grant('guest', 'read,write,execute', 'universe')
>
>
>    Replica:
>        box.cfg{replication_anon=true,
>                read_only=true,
>                replication_connect_quorum=0,
>                replication=3301}
>
>    Master:
>        Killed.
>
>    Replica:
>        box.cfg{replication_anon=false, read_only=false}
>
>The replica hangs. Even though the quorum is 0.
>
>I think, we somehow need to reuse box_set_replication() for all
>of this. Looks like it does all we need except does not send the
>register requests.

True.

>
>> +/*
>> + * Reset all appliers. This will interrupt
>> + * anonymous follow they're in so that one of
>> + * them can register and others resend a
>> + * non-anonymous subscribe.
>> + */
>> +replicaset_foreach(replica) {
>> +struct applier *applier = replica->applier;
>> +if (applier == NULL)
>> +continue;
>> +replica_clear_applier(replica);
>> +applier_stop(applier);
>> +replica->applier_sync_state = APPLIER_DISCONNECTED;
>> +replica_set_applier(replica, applier);
>> +}
>> +/* Choose a master to send register request to. */
>> +struct replica *master = replicaset_leader();
>> +assert(master != NULL && master->applier != NULL);
>> +struct applier *master_applier = master->applier;
>> +
>> +applier_start(master_applier);
>> +
>> +applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY);
>> +applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY);
>> +applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY);
>> +applier_resume(master_applier);
>> +/**
>> + * Restart other appliers to
>> + * resend non-anonymous subscribe.
>> + */
>> +replicaset_foreach(replica) {
>> +if (replica != master && replica->applier)
>> +applier_start(replica->applier);
>> +}
>> +} else if (!is_box_configured) {
>> +replication_anon = anon;
>> +} else {
>> +/*
>> + * It is forbidden to turn a normal replica into
>> + * an anonymous one.
>> + */
>> +tnt_raise(ClientError, ER_CFG, "replication_anon",
>> +  "cannot be turned on after bootstrap"
>> +  " has finished");
>> +}
>> +
>> +}
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index e849fcf4f..22029751f 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -691,7 +697,10 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>>  relay_start(relay, fd, sync, relay_send_row);
>>  auto relay_guard = make_scoped_guard([=] {
>>  relay_stop(relay);
>> -replica_on_relay_stop(replica);
>> +if (replica->anon)
>> +replica_anon_delete(replica);
>> +else
>> +replica_on_relay_stop(replica);
>
>3. Why can't replica_on_relay_stop() be used for an anon replica?

replica_on_relay_stop() asserts the replica is registered with gc and
removes it from consumers. Anonymous replicas aren't registered with gc.
I can rewrite replica_on_relay_stop() appropriately, but why?

>
>>  });
>> 
>> @@ -741,6 +750,13 @@ relay_send_row(struct xstream *stream, struct xrow_header *packet)
>>  {
>>  struct relay *relay = container_of(stream, struct relay, stream);
>>  assert(iproto_type_is_dml(packet->type));
>> +/*
>> + * Replica-local requests generated while replica was
>> + * anonymous have a zero instance id. Just skip all
>> + * these rows.
>> + */
>> +if (packet->replica_id == REPLICA_ID_NIL)
>> +return;
>
>4. From what I remember about local requests in an anon replica,
>this is possible only with packet->group_id == GROUP_LOCAL,
>right? Then can you move it into the check about GROUP_LOCAL
>below? To avoid checking replica_id, when the group is not local
>anyway. Like this:
>
>if (packet->group_id == GROUP_LOCAL) {
>if (packet->replica_id == REPLICA_ID_NIL)
>return;
>packet->type = IPROTO_NOP;
>packet->group_id = GROUP_DEFAULT;
>packet->bodycnt = 0;
>}

Good point, thanks!

My changes are below.

>
>================================================================================
>
>commit 3e8fded091fae8abacc876bf6c899750b595d72d
>Author: Vladislav Shpilevoy < v.shpilevoy@tarantool.org >
>Date:   Wed Dec 25 17:20:45 2019 +0100
>
>    Review fixes
>
>diff --git a/src/box/applier.cc b/src/box/applier.cc
>index 2939a0f61..3efa2b765 100644
>--- a/src/box/applier.cc
>+++ b/src/box/applier.cc
>@@ -1048,9 +1048,11 @@ applier_f(va_list ap)
> }
> if (applier->version_id >= version_id(1, 7, 0) &&
>     !replication_anon && instance_id == REPLICA_ID_NIL) {
>-/* anonymity was turned off while we were
>- * fetching a snapshot or following master.
>- * Register the replica now.
>+/*
>+ * Anonymity was turned off while
>+ * we were fetching a snapshot or
>+ * following master. Register the
>+ * replica now.
>  */
> applier_register(applier);
> }
>diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
>index 9dee75b7d..51d157aa3 100644
>--- a/src/box/lua/load_cfg.lua
>+++ b/src/box/lua/load_cfg.lua
>@@ -279,6 +279,7 @@ local dynamic_cfg_order = {
>     replication_sync_timeout    = 150,
>     replication_connect_timeout = 150,
>     replication_connect_quorum  = 150,
>+    replication_anon        = 175,
>================================================================================
>
>I am not sure about this change.
>
>The only way how can it be changed now - from true to false.
>Not back. The only dubious scenario: we change `replication_anon`
>to false, and change `replication` at the same time.
>
>If we set `replication_anon` false before new `replication`,
>we will try to register the old `replication`. It means
>`replication_anon` update should be done after `replication`.
>So as the new configuration could connect as anon, and then be
>upgraded to normal.
>
>But when we will support anonymization of normal replicas, the
>config update could look like this:
>
>    `replication_anon` false -> true
>    `replication` old -> new
>
>And here `replication_anon` should be set before `replication`.
>So as we would connect anonymously to the new replication set.
>
>In both cases we do unnecessary actions with the old
>replication.
>
>Seems like it can't be properly solved by dynamic_cfg_order
>table. We need to configure `replication_anon` and `replication`
>at the same time. So as in box_set_replication() you always know
>if the new `replication` setting is anon or not.
>
>Maybe we need to extend dynamic_cfg_order idea, and introduce
>depending parameters. Or parameter groups. Or something like
>that. So as we could, for example, group `replication_anon` and
>`replication`, and pass anon value to box_set_replication() when
>they are updated both at the same box.cfg.
>
>Or we need to forbid update of some parameters in one box.cfg.
>
>Or we could make that order of parameters depends on their
>values. Although it does not solve the problem that when we
>change both `replication_anon` and `replication`, we do
>unnecessary actions to the old `replication` in some cases.

Well, I reused the box_set_replication code in replication_anon change, so
now if replication_anon is changed together with replication, it will trigger
duplicate box_set_replication action (almost). This is ok, I guess.

>
>
>================================================================================
>     replication             = 200,
> }
>
>diff --git a/src/box/replication.cc b/src/box/replication.cc
>index ce707811a..d9848debe 100644
>--- a/src/box/replication.cc
>+++ b/src/box/replication.cc
>@@ -916,7 +916,6 @@ replica_anon_delete(struct replica *replica)
> replica_delete(replica);
> }
>
>-
> struct replica *
> replicaset_first(void)
> {


diff --git a/src/box/applier.cc b/src/box/applier.cc
index 10725ed2c..ae3d281a5 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -1046,13 +1046,12 @@ applier_f(va_list ap)
 else
 applier_join(applier);
 }
-if (applier->version_id >= version_id(1, 7, 0) &&
-    !replication_anon && instance_id == REPLICA_ID_NIL) {
+if (instance_id == REPLICA_ID_NIL &&
+    !replication_anon) {
 /*
- * Anonymity was turned off while
- * we were fetching a snapshot or
- * following master. Register the
- * replica now.
+ * The instance transitioned
+ * from anonymous. Register it
+ * now.
  */
 applier_register(applier);
 }
diff --git a/src/box/box.cc b/src/box/box.cc
index a93e2503e..f99aeebc6 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -778,6 +778,9 @@ box_set_replication_anon(void)
 return;
 
 if (!anon) {
+auto guard = make_scoped_guard([&]{
+replication_anon = !anon;
+});
 /* Turn anonymous instance into a normal one. */
 replication_anon = anon;
 /*
@@ -786,34 +789,30 @@ box_set_replication_anon(void)
  * them can register and others resend a
  * non-anonymous subscribe.
  */
-replicaset_foreach(replica) {
-struct applier *applier = replica->applier;
-if (applier == NULL)
-continue;
-replica_clear_applier(replica);
-applier_stop(applier);
-replica->applier_sync_state = APPLIER_DISCONNECTED;
-replica_set_applier(replica, applier);
-}
+box_sync_replication(false);
 /* Choose a master to send register request to. */
 struct replica *master = replicaset_leader();
-assert(master != NULL && master->applier != NULL);
+if (master == NULL || master->applier == NULL ||
+    master->applier->state != APPLIER_CONNECTED) {
+tnt_raise(ClientError, ER_CANNOT_REGISTER);
+}
 struct applier *master_applier = master->applier;
 
-applier_start(master_applier);
-
-applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY);
-applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY);
-applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY);
-applier_resume(master_applier);
+applier_resume_to_state(master_applier, APPLIER_READY,
+TIMEOUT_INFINITY);
+applier_resume_to_state(master_applier, APPLIER_REGISTER,
+TIMEOUT_INFINITY);
+applier_resume_to_state(master_applier, APPLIER_REGISTERED,
+TIMEOUT_INFINITY);
+applier_resume_to_state(master_applier, APPLIER_READY,
+TIMEOUT_INFINITY);
 /**
- * Restart other appliers to
+ * Resume other appliers to
  * resend non-anonymous subscribe.
  */
-replicaset_foreach(replica) {
-if (replica != master && replica->applier)
-applier_start(replica->applier);
-}
+replicaset_follow();
+replicaset_sync();
+guard.is_active = false;
 } else if (!is_box_configured) {
 replication_anon = anon;
 } else {
diff --git a/src/box/errcode.h b/src/box/errcode.h
index dce69c9c0..0f7563e64 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -259,6 +259,7 @@ struct errcode_record {
 /*204 */_(ER_SQL_FUNC_WRONG_RET_COUNT,"SQL expects exactly one argument returned from %s, got %d")\
 /*205 */_(ER_FUNC_INVALID_RETURN_TYPE,"Function '%s' returned value of invalid type: expected %s got %s") \
 /*206 */_(ER_REPLICA_NOT_ANON, "Replica '%s' is not anonymous and cannot register.") \
+/*207 */_(ER_CANNOT_REGISTER, "Couldn't find an instance to register this replica on.") \
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 22029751f..4258edfcd 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -750,19 +750,19 @@ relay_send_row(struct xstream *stream, struct xrow_header *packet)
 {
 struct relay *relay = container_of(stream, struct relay, stream);
 assert(iproto_type_is_dml(packet->type));
-/*
- * Replica-local requests generated while replica was
- * anonymous have a zero instance id. Just skip all
- * these rows.
- */
-if (packet->replica_id == REPLICA_ID_NIL)
-return;
 /*
  * Transform replica local requests to IPROTO_NOP so as to
  * promote vclock on the replica without actually modifying
  * any data.
  */
 if (packet->group_id == GROUP_LOCAL) {
+/*
+ * Replica-local requests generated while replica
+ * was anonymous have a zero instance id. Just
+ * skip all these rows.
+ */
+if (packet->replica_id == REPLICA_ID_NIL)
+return;
 packet->type = IPROTO_NOP;
 packet->group_id = GROUP_DEFAULT;
 packet->bodycnt = 0;
diff --git a/test/box/misc.result b/test/box/misc.result
index 134caef63..51ad29cdd 100644
--- a/test/box/misc.result
+++ b/test/box/misc.result
@@ -555,6 +555,7 @@ t;
   204: box.error.SQL_FUNC_WRONG_RET_COUNT
   205: box.error.FUNC_INVALID_RETURN_TYPE
   206: box.error.REPLICA_NOT_ANON
+  207: box.error.CANNOT_REGISTER
 ...
 test_run:cmd("setopt delimiter ''");
 ---
 

-- 
Sergey Petrenko

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica.
  2019-12-27 15:27     ` Sergey Petrenko
@ 2019-12-27 18:42       ` Vladislav Shpilevoy
  2019-12-28 11:48         ` Sergey Petrenko
  0 siblings, 1 reply; 15+ messages in thread
From: Vladislav Shpilevoy @ 2019-12-27 18:42 UTC (permalink / raw)
  To: Sergey Petrenko; +Cc: tarantool-patches

Thanks for the fixes!

I've pushed my review fixes on top of this commit. See it below
and on the branch. If you agree, then squash. Otherwise lets
discuss.

> Also please check out the discussion with Kostja Osipov in this thread
> ( [Tarantool-patches] [PATCH 4/5] vclock: ignore 0th component in comparisons.),
> there are some new changes to the commit regarding 0 vclock component comparison.

Yeah, I saw. Honestly, I don't understand how is 0 component used anywhere
before your patch. From what I see, it was just unused always. I've left a
small amendment in v2 thread to this commit. See answer to
"[Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons."

>> We should document new iproto codes: IPROTO_FETCH_SNAPSHOT,
>> IPROTO_REGISTER, IPROTO_REPLICA_ANON, like here:
>> https://www.tarantool.io/en/doc/2.2/dev_guide/internals/box_protocol/
> 
> Opened a ticket. https://github.com/tarantool/doc/issues/1046

I rather meant to add them to the docbot request, but this is
ok as well.

>> See 4 comments below.
>>
>>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>>> index f4f9d0670..2939a0f61 100644
>>> --- a/src/box/applier.cc
>>> +++ b/src/box/applier.cc
>>> @@ -996,10 +1035,24 @@ applier_f(va_list ap)
>>>  if (tt_uuid_is_nil(&REPLICASET_UUID)) {
>>>  /*
>>>   * Execute JOIN if this is a bootstrap.
>>> + * In case of anonymous replication, don't
>>> + * join but just fetch master's snapshot.
>>> + *
>>>   * The join will pause the applier
>>>   * until WAL is created.
>>>   */
>>> -applier_join(applier);
>>> +if (replication_anon)
>>> +applier_fetch_snapshot(applier);
>>> +else
>>> +applier_join(applier);
>>> +}
>>> +if (applier->version_id >= version_id(1, 7, 0) &&
>>> +    !replication_anon && instance_id == REPLICA_ID_NIL) {
>>> +/* anonymity was turned off while we were
>>> + * fetching a snapshot or following master.
>>> + * Register the replica now.
>>> + */
>>> +applier_register(applier);
>>>  }
>>
>> 1. Why do we check for version here? Wouldn't it be more straight
>> to do something like this?
>>
>> ================================================================================
>>
>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>> index 3efa2b765..32851506c 100644
>> --- a/src/box/applier.cc
>> +++ b/src/box/applier.cc
>> @@ -1032,6 +1032,7 @@ applier_f(va_list ap)
>>  while (!fiber_is_cancelled()) {
>>  try {
>>  applier_connect(applier);
>> +bool was_anon = replication_anon;
>>  if (tt_uuid_is_nil(&REPLICASET_UUID)) {
>>  /*
>>   * Execute JOIN if this is a bootstrap.
>> @@ -1046,8 +1047,8 @@ applier_f(va_list ap)
>>  else
>>  applier_join(applier);
>>  }
>> -if (applier->version_id >= version_id(1, 7, 0) &&
>> -    !replication_anon && instance_id == REPLICA_ID_NIL) {
>> +if (was_anon && !replication_anon &&
>> +    instance_id == REPLICA_ID_NIL) {
>>  /*
>>   * Anonymity was turned off while
>>   * we were fetching a snapshot or
>>
>> ================================================================================
>>
>> Also I don't understand, why is it needed? From what I see in
>> box_set_replication_anon(), we stop existing appliers, when
>> change `replication_anon`. So in case `replication_anon` is
>> changed, the fiber should be canceled by that moment.
>> applier_connect() does fiber_testcancel() via coio after each
>> yield.
>>
>> A register could even harm here, because this instance will do
>> register twice - from there, and from a new applier fiber, started
>> in box_set_replication_anon().
> 
> True, the version check is pointless and the comment is misleading.
> But we need to call applier_register here. In case of a transition
> from anon to normal, we have already fetched a snapshot, so we
> shouldn't execute join in a new applier, we just execute register and
> proceed to subscribe. Fixed.

Thanks for the fix. Although I still don't understand how is this
code reachable (I tried to add assert(false), and it crashed). It
is even covered with a test, what is cool btw.

As I see, when we change anon setting, the existing appliers are
cancelled. Why does this applier_f fiber still works, if any change
in anon and replication cancels all the existing appliers?

Apparently, I miss something.

>>> diff --git a/src/box/box.cc b/src/box/box.cc
>>> index 56d07e634..a93e2503e 100644
>>> --- a/src/box/box.cc
>>> +++ b/src/box/box.cc
>>> @@ -740,6 +770,64 @@ box_set_replication_skip_conflict(void)
>>>  replication_skip_conflict = cfg_geti("replication_skip_conflict");
>>>  }
>>>
>>> +void
>>> +box_set_replication_anon(void)
>>> +{
>>> +bool anon = box_check_replication_anon();
>>> +if (anon == replication_anon)
>>> +return;
>>> +
>>> +if (!anon) {
>>> +/* Turn anonymous instance into a normal one. */
>>> +replication_anon = anon;
>>
>> 2. In case of any exception `replication_anon` will contain a not
>> applied value.
>>
>> But there is a worse problem - we stop the appliers, and don't
>> restart them in case of an exception. So box_set_replication_anon()
>> is not atomic, it may leave the instance in a half configured
>> state. box_set_replication() deals with that by creation of a new
>> array of appliers. It connects them, and only then atomically
>> replaces the old appliers.
>>
>> Also looks like deanonymisation does not respect quorum. I did
>> this test:
>>
>>     Master:
>>         box.cfg{listen=3301}
>>         box.schema.user.grant('guest', 'read,write,execute', 'universe')
>>
>>
>>     Replica:
>>         box.cfg{replication_anon=true,
>>                 read_only=true,
>>                 replication_connect_quorum=0,
>>                 replication=3301}
>>
>>     Master:
>>         Killed.
>>
>>     Replica:
>>         box.cfg{replication_anon=false, read_only=false}
>>
>> The replica hangs. Even though the quorum is 0.
>>
>> I think, we somehow need to reuse box_set_replication() for all
>> of this. Looks like it does all we need except does not send the
>> register requests.
> 
> True.

Cool, I am happy that it was fixed so easy! Although this test
should probably be added to anon.test.lua.

Now I've discovered another possible problem. Are anon replicas
going to be used as a failover master in case the old master
dies? Because the example above shows that it is not possible to
make an anon replica read-write without a master. This is what
I get, even with 0 quorum:
---
- error: Couldn't find an instance to register this replica on.
...

Is that intended? Likely yes, but I ask just in case.

However the thing below looks like a bug:

    Master:
      box.cfg{listen=3301}
      box.schema.user.grant('guest', 'read,write,execute', 'universe')
      box.schema.user.grant('guest', 'replication')

    Replica:
      box.cfg{replication_anon=true,
              read_only=true,
              replication_connect_quorum=0,
              replication=3301}

      box.cfg{replication_anon=false}
      ---
      - error: Couldn't find an instance to register this replica on.
      ...

This is because of quorum 0. When the quorum is 0, box_set_replication_anon()
tries to find a leader immediately after creation of appliers. Note, that the
master is alive. I tried to make a simple fix, but failed. Temporary change
to quorum 1 won't help, btw, because it can connect to a read-only replica
instead of a master.

Everything works, when I change quorum to 1, and then try deanon again.

Also it works when I do this:

@@ -789,7 +789,7 @@ box_set_replication_anon(void)
 		 * them can register and others resend a
 		 * non-anonymous subscribe.
 		 */
-		box_sync_replication(false);
+		box_sync_replication(true);
 		/*
 		 * Wait until the master has registered this
 		 * instance.

But I am not sure that it is legal to do so.

>>> +/*
>>> + * Reset all appliers. This will interrupt
>>> + * anonymous follow they're in so that one of
>>> + * them can register and others resend a
>>> + * non-anonymous subscribe.
>>> + */
>>> +replicaset_foreach(replica) {
>>> +struct applier *applier = replica->applier;
>>> +if (applier == NULL)
>>> +continue;
>>> +replica_clear_applier(replica);
>>> +applier_stop(applier);
>>> +replica->applier_sync_state = APPLIER_DISCONNECTED;
>>> +replica_set_applier(replica, applier);
>>> +}
>>> +/* Choose a master to send register request to. */
>>> +struct replica *master = replicaset_leader();
>>> +assert(master != NULL && master->applier != NULL);
>>> +struct applier *master_applier = master->applier;
>>> +
>>> +applier_start(master_applier);
>>> +
>>> +applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY);
>>> +applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY);
>>> +applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY);

Is it possible to always wait for APPLIER_READY and not to
pay attention to the prior states?

Or at least wait for REGISTERED + READY only? Like this:

================================================================================
 		}
 		struct applier *master_applier = master->applier;
 
-		applier_resume_to_state(master_applier, APPLIER_READY,
-					TIMEOUT_INFINITY);
-		applier_resume_to_state(master_applier, APPLIER_REGISTER,
-					TIMEOUT_INFINITY);
 		applier_resume_to_state(master_applier, APPLIER_REGISTERED,
 					TIMEOUT_INFINITY);
================================================================================

>>> +applier_resume(master_applier);
>>> +/**
>>> + * Restart other appliers to
>>> + * resend non-anonymous subscribe.
>>> + */
>>> +replicaset_foreach(replica) {
>>> +if (replica != master && replica->applier)
>>> +applier_start(replica->applier);
>>> +}
>>> +} else if (!is_box_configured) {
>>> +replication_anon = anon;
>>> +} else {
>>> +/*
>>> + * It is forbidden to turn a normal replica into
>>> + * an anonymous one.
>>> + */
>>> +tnt_raise(ClientError, ER_CFG, "replication_anon",
>>> +  "cannot be turned on after bootstrap"
>>> +  " has finished");
>>> +}
>>> +
>>> +}
>>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>>> index e849fcf4f..22029751f 100644
>>> --- a/src/box/relay.cc
>>> +++ b/src/box/relay.cc
>>> @@ -691,7 +697,10 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>>>  relay_start(relay, fd, sync, relay_send_row);
>>>  auto relay_guard = make_scoped_guard([=] {
>>>  relay_stop(relay);
>>> -replica_on_relay_stop(replica);
>>> +if (replica->anon)
>>> +replica_anon_delete(replica);
>>> +else
>>> +replica_on_relay_stop(replica);
>>
>> 3. Why can't replica_on_relay_stop() be used for an anon replica?
> 
> replica_on_relay_stop() asserts the replica is registered with gc and
> removes it from consumers. Anonymous replicas aren't registered with gc.
> I can rewrite replica_on_relay_stop() appropriately, but why?

The problem is that replica_on_relay_stop() is not really a destructor.
It is an event. What to do on that event is encapsulated inside
replica_on_relay_stop(). When you added branching on anon setting, the
encapsulation was broken.

>> diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
>> index 9dee75b7d..51d157aa3 100644
>> --- a/src/box/lua/load_cfg.lua
>> +++ b/src/box/lua/load_cfg.lua
>> @@ -279,6 +279,7 @@ local dynamic_cfg_order = {
>>      replication_sync_timeout    = 150,
>>      replication_connect_timeout = 150,
>>      replication_connect_quorum  = 150,
>> +    replication_anon        = 175,
>> ================================================================================
>>
>> I am not sure about this change.
>>
>> The only way how can it be changed now - from true to false.
>> Not back. The only dubious scenario: we change `replication_anon`
>> to false, and change `replication` at the same time.
>>
>> If we set `replication_anon` false before new `replication`,
>> we will try to register the old `replication`. It means
>> `replication_anon` update should be done after `replication`.
>> So as the new configuration could connect as anon, and then be
>> upgraded to normal.
>>
>> But when we will support anonymization of normal replicas, the
>> config update could look like this:
>>
>>     `replication_anon` false -> true
>>     `replication` old -> new
>>
>> And here `replication_anon` should be set before `replication`.
>> So as we would connect anonymously to the new replication set.
>>
>> In both cases we do unnecessary actions with the old
>> replication.
>>
>> Seems like it can't be properly solved by dynamic_cfg_order
>> table. We need to configure `replication_anon` and `replication`
>> at the same time. So as in box_set_replication() you always know
>> if the new `replication` setting is anon or not.
>>
>> Maybe we need to extend dynamic_cfg_order idea, and introduce
>> depending parameters. Or parameter groups. Or something like
>> that. So as we could, for example, group `replication_anon` and
>> `replication`, and pass anon value to box_set_replication() when
>> they are updated both at the same box.cfg.
>>
>> Or we need to forbid update of some parameters in one box.cfg.
>>
>> Or we could make that order of parameters depends on their
>> values. Although it does not solve the problem that when we
>> change both `replication_anon` and `replication`, we do
>> unnecessary actions to the old `replication` in some cases.
> 
> Well, I reused the box_set_replication code in replication_anon change, so
> now if replication_anon is changed together with replication, it will trigger
> duplicate box_set_replication action (almost). This is ok, I guess.

No, looks like the same problem still exists. The settings
are applied one by one. So when we change anon and replication,
the anon is changed first, and replication is changed second.

When anon is changed, it still sees the old replication setting.
Check this code in load_cfg.lua:

    local function reload_cfg(oldcfg, cfg)
        cfg = upgrade_cfg(cfg, translate_cfg)
        local newcfg = prepare_cfg(cfg, default_cfg, template_cfg, modify_cfg)
        local ordered_cfg = {}
        -- iterate over original table because prepare_cfg() may store NILs
        for key, val in pairs(cfg) do
            if dynamic_cfg[key] == nil and oldcfg[key] ~= val then
                box.error(box.error.RELOAD_CFG, key);
            end
            table.insert(ordered_cfg, key)
        end
        table.sort(ordered_cfg, sort_cfg_cb)
        for _, key in pairs(ordered_cfg) do
            local val = newcfg[key]
            local oldval = oldcfg[key]
            if not compare_cfg(val, oldval) then
                rawset(oldcfg, key, val)
                if not pcall(dynamic_cfg[key]) then
                    rawset(oldcfg, key, oldval) -- revert the old value
                    return box.error() -- re-throw
                end
                if log_cfg_option[key] ~= nil then
                    val = log_cfg_option[key](val)
                end
                log.info("set '%s' configuration option to %s", key,
                    json.encode(val))
            end
        end
        ...

Here rawset(oldcfg, key, val) is the actual update of what we
see in cfg_get*() functions. So when anon is changed before
replication, it will try to deanon the old replication. And
only then will configure the new replication.

In that case lets configure anon after replication in this
patch. It partially solves the problem for now. And I added
a ticket to improve this part.

https://github.com/tarantool/tarantool/issues/4714

Below are my review fixes:

================================================================================

commit ed559c8f18d5c8606410bfbb76a380a43e2e163f
Author: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Date:   Fri Dec 27 19:56:11 2019 +0300

    Review fixes 5/5

diff --git a/src/box/box.cc b/src/box/box.cc
index f99aeebc6..5a8e4d3b9 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -790,7 +790,10 @@ box_set_replication_anon(void)
 		 * non-anonymous subscribe.
 		 */
 		box_sync_replication(false);
-		/* Choose a master to send register request to. */
+		/*
+		 * Wait until the master has registered this
+		 * instance.
+		 */
 		struct replica *master = replicaset_leader();
 		if (master == NULL || master->applier == NULL ||
 		    master->applier->state != APPLIER_CONNECTED) {
@@ -798,10 +801,6 @@ box_set_replication_anon(void)
 		}
 		struct applier *master_applier = master->applier;
 
-		applier_resume_to_state(master_applier, APPLIER_READY,
-					TIMEOUT_INFINITY);
-		applier_resume_to_state(master_applier, APPLIER_REGISTER,
-					TIMEOUT_INFINITY);
 		applier_resume_to_state(master_applier, APPLIER_REGISTERED,
 					TIMEOUT_INFINITY);
 		applier_resume_to_state(master_applier, APPLIER_READY,
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 51d157aa3..955f50061 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -279,8 +279,16 @@ local dynamic_cfg_order = {
     replication_sync_timeout    = 150,
     replication_connect_timeout = 150,
     replication_connect_quorum  = 150,
-    replication_anon        = 175,
     replication             = 200,
+    -- Anon is set after `replication` as a temporary workaround
+    -- for the problem, that `replication` and `replication_anon`
+    -- depend on each other. If anon would be configured before
+    -- `replication`, it could lead to a bug, when anon is changed
+    -- from true to false together with `replication`, and it
+    -- would try to deanon the old `replication` before applying
+    -- the new one. This should be fixed when box.cfg is able to
+    -- apply some parameters together and atomically.
+    replication_anon        = 250,
 }
 
 local function sort_cfg_cb(l, r)
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 4258edfcd..7ffc79285 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -697,10 +697,7 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
 	relay_start(relay, fd, sync, relay_send_row);
 	auto relay_guard = make_scoped_guard([=] {
 		relay_stop(relay);
-		if (replica->anon)
-			replica_anon_delete(replica);
-		else
-			replica_on_relay_stop(replica);
+		replica_on_relay_stop(replica);
 	});
 
 	vclock_copy(&relay->local_vclock_at_subscribe, &replicaset.vclock);
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 11096cc79..1476c7472 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -890,14 +890,24 @@ void
 replica_on_relay_stop(struct replica *replica)
 {
 	/*
-	 * If the replica was evicted from the cluster, we don't
-	 * need to keep WALs for it anymore. Unregister it with
-	 * the garbage collector then. See also replica_clear_id.
+	 * If the replica was evicted from the cluster, or was not
+	 * even added there (anon replica), we don't need to keep
+	 * WALs for it anymore. Unregister it with the garbage
+	 * collector then. See also replica_clear_id.
 	 */
-	assert(replica->gc != NULL);
 	if (replica->id == REPLICA_ID_NIL) {
-		gc_consumer_unregister(replica->gc);
-		replica->gc = NULL;
+		if (!replica->anon) {
+			gc_consumer_unregister(replica->gc);
+			replica->gc = NULL;
+		} else {
+			assert(replica->gc == NULL);
+			assert(replica->id == REPLICA_ID_NIL);
+			/*
+			 * We do not replicate from anonymous
+			 * replicas.
+			 */
+			assert(replica->applier == NULL);
+		}
 	}
 	if (replica_is_orphan(replica)) {
 		replica_hash_remove(&replicaset.hash, replica);
@@ -905,17 +915,6 @@ replica_on_relay_stop(struct replica *replica)
 	}
 }
 
-void
-replica_anon_delete(struct replica *replica)
-{
-	assert(replica->gc == NULL);
-	assert(replica->id == REPLICA_ID_NIL);
-	/* We do not replicate from anonymous replicas */
-	assert(replica->applier == NULL);
-	replica_hash_remove(&replicaset.hash, replica);
-	replica_delete(replica);
-}
-
 struct replica *
 replicaset_first(void)
 {
diff --git a/src/box/replication.h b/src/box/replication.h
index 978a09d41..2ef1255b3 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -367,9 +367,6 @@ replica_set_applier(struct replica * replica, struct applier * applier);
 void
 replica_on_relay_stop(struct replica *replica);
 
-void
-replica_anon_delete(struct replica *replica);
-
 #if defined(__cplusplus)
 } /* extern "C" */
 
diff --git a/test/replication/anon.result b/test/replication/anon.result
index f6a85e7d0..88061569f 100644
--- a/test/replication/anon.result
+++ b/test/replication/anon.result
@@ -324,3 +324,62 @@ box.schema.user.revoke('guest', 'replication')
 test_run:cleanup_cluster()
  | ---
  | ...
+
+--
+-- Check that in case of a master absence an anon replica can't
+-- deanonymize itself, regardless of quorum value.
+--
+test_run:cmd('create server master with script="replication/master1.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server master')
+ | ---
+ | - true
+ | ...
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+test_run:cmd('create server replica_anon with rpl_master=master, script="replication/anon1.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica_anon')
+ | ---
+ | - true
+ | ...
+test_run:switch('replica_anon')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_connect_quorum = 0}
+ | ---
+ | ...
+test_run:cmd('stop server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server master')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_anon = false}
+ | ---
+ | - error: Couldn't find an instance to register this replica on.
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica_anon')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica_anon')
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/anon.test.lua b/test/replication/anon.test.lua
index 1877ec8bb..8a8d15c18 100644
--- a/test/replication/anon.test.lua
+++ b/test/replication/anon.test.lua
@@ -116,3 +116,22 @@ box.space.temp:drop()
 box.space.loc:drop()
 box.schema.user.revoke('guest', 'replication')
 test_run:cleanup_cluster()
+
+--
+-- Check that in case of a master absence an anon replica can't
+-- deanonymize itself, regardless of quorum value.
+--
+test_run:cmd('create server master with script="replication/master1.lua"')
+test_run:cmd('start server master')
+test_run:switch('master')
+box.schema.user.grant('guest', 'replication')
+test_run:cmd('create server replica_anon with rpl_master=master, script="replication/anon1.lua"')
+test_run:cmd('start server replica_anon')
+test_run:switch('replica_anon')
+box.cfg{replication_connect_quorum = 0}
+test_run:cmd('stop server master')
+test_run:cmd('delete server master')
+box.cfg{replication_anon = false}
+test_run:switch('default')
+test_run:cmd('stop server replica_anon')
+test_run:cmd('delete server replica_anon')
diff --git a/test/replication/master1.lua b/test/replication/master1.lua
new file mode 120000
index 000000000..1c7debd2f
--- /dev/null
+++ b/test/replication/master1.lua
@@ -0,0 +1 @@
+master.lua
\ No newline at end of file

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons.
  2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons sergepetrenko
  2019-12-25 16:00   ` Vladislav Shpilevoy
@ 2019-12-27 18:42   ` Vladislav Shpilevoy
  2019-12-28 11:21     ` Sergey Petrenko
  1 sibling, 1 reply; 15+ messages in thread
From: Vladislav Shpilevoy @ 2019-12-27 18:42 UTC (permalink / raw)
  To: sergepetrenko, georgy; +Cc: tarantool-patches

I've just noticed that we have '.' in the end of the commit
title. I dropped it force pushed this commit, as an obvious
fix.

Also see some other changes below. They are pushed on top of
this commit. Squash if you agree. Or change it, if I miss
something.

================================================================================

commit 77455d8aeb1f5f26920bbc6d213b9d32edcfd6bb
Author: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Date:   Fri Dec 27 19:08:54 2019 +0300

    Review fix 4/5

diff --git a/src/box/vclock.h b/src/box/vclock.h
index e1625410e..eb0fb5d8b 100644
--- a/src/box/vclock.h
+++ b/src/box/vclock.h
@@ -266,7 +266,10 @@ enum { VCLOCK_ORDER_UNDEFINED = INT_MAX };
  * \brief Compare vclocks
  * \param a vclock
  * \param b vclock
- * \param ignore_zero Whether to order by 0-th component or not
+ * \param ignore_zero Whether to order by 0-th component or not.
+ *        The 0 component is ignored on anonymous replicas when
+ *        they apply rows from remote master. Because anon
+ *        replicas use the component for local purposes.
  *
  * \retval 1 if \a vclock is ordered after \a other
  * \retval -1 if \a vclock is ordered before than \a other
@@ -275,7 +278,7 @@ enum { VCLOCK_ORDER_UNDEFINED = INT_MAX };
  */
 static inline int
 vclock_compare_generic(const struct vclock *a, const struct vclock *b,
-	       bool ignore_zero)
+		       bool ignore_zero)
 {
 	bool le = true, ge = true;
 	unsigned int map = a->map | b->map;
@@ -283,11 +286,6 @@ vclock_compare_generic(const struct vclock *a, const struct vclock *b,
 	bit_iterator_init(&it, &map, sizeof(map), true);
 
 	size_t replica_id = bit_iterator_next(&it);
-	/*
-	 * Ignore 0-th component in comparisons.
-	 * It is empty for normal replicas and should
-	 * be ignored for anonymous ones.
-	 */
 	if (replica_id == 0 && ignore_zero)
 		replica_id = bit_iterator_next(&it);
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons.
  2019-12-27 18:42   ` Vladislav Shpilevoy
@ 2019-12-28 11:21     ` Sergey Petrenko
  0 siblings, 0 replies; 15+ messages in thread
From: Sergey Petrenko @ 2019-12-28 11:21 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 2136 bytes --]


>Пятница, 27 декабря 2019, 21:42 +03:00 от Vladislav Shpilevoy <v.shpilevoy@tarantool.org>:
>
>I've just noticed that we have '.' in the end of the commit
>title. I dropped it force pushed this commit, as an obvious
>fix.
>
>Also see some other changes below. They are pushed on top of
>this commit. Squash if you agree. Or change it, if I miss
>something.
Hi! Looks good, thanks. Squashed.
>
>
>================================================================================
>
>commit 77455d8aeb1f5f26920bbc6d213b9d32edcfd6bb
>Author: Vladislav Shpilevoy < v.shpilevoy@tarantool.org >
>Date:   Fri Dec 27 19:08:54 2019 +0300
>
>    Review fix 4/5
>
>diff --git a/src/box/vclock.h b/src/box/vclock.h
>index e1625410e..eb0fb5d8b 100644
>--- a/src/box/vclock.h
>+++ b/src/box/vclock.h
>@@ -266,7 +266,10 @@ enum { VCLOCK_ORDER_UNDEFINED = INT_MAX };
>  * \brief Compare vclocks
>  * \param a vclock
>  * \param b vclock
>- * \param ignore_zero Whether to order by 0-th component or not
>+ * \param ignore_zero Whether to order by 0-th component or not.
>+ *        The 0 component is ignored on anonymous replicas when
>+ *        they apply rows from remote master. Because anon
>+ *        replicas use the component for local purposes.
>  *
>  * \retval 1 if \a vclock is ordered after \a other
>  * \retval -1 if \a vclock is ordered before than \a other
>@@ -275,7 +278,7 @@ enum { VCLOCK_ORDER_UNDEFINED = INT_MAX };
>  */
> static inline int
> vclock_compare_generic(const struct vclock *a, const struct vclock *b,
>-	       bool ignore_zero)
>+		       bool ignore_zero)
> {
> 	bool le = true, ge = true;
> 	unsigned int map = a->map | b->map;
>@@ -283,11 +286,6 @@ vclock_compare_generic(const struct vclock *a, const struct vclock *b,
> 	bit_iterator_init(&it, &map, sizeof(map), true);
>
> 	size_t replica_id = bit_iterator_next(&it);
>-	/*
>-	 * Ignore 0-th component in comparisons.
>-	 * It is empty for normal replicas and should
>-	 * be ignored for anonymous ones.
>-	 */
> 	if (replica_id == 0 && ignore_zero)
> 		replica_id = bit_iterator_next(&it);
>


-- 
Sergey Petrenko

[-- Attachment #2: Type: text/html, Size: 2972 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica.
  2019-12-27 18:42       ` Vladislav Shpilevoy
@ 2019-12-28 11:48         ` Sergey Petrenko
  2019-12-28 12:15           ` Vladislav Shpilevoy
  0 siblings, 1 reply; 15+ messages in thread
From: Sergey Petrenko @ 2019-12-28 11:48 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 26367 bytes --]


>Пятница, 27 декабря 2019, 21:42 +03:00 от Vladislav Shpilevoy <v.shpilevoy@tarantool.org>:
>
>Thanks for the fixes!
>
>I've pushed my review fixes on top of this commit. See it below
>and on the branch. If you agree, then squash. Otherwise lets
>discuss.
>
>> Also please check out the discussion with Kostja Osipov in this thread
>> ( [Tarantool-patches] [PATCH 4/5] vclock: ignore 0th component in comparisons.),
>> there are some new changes to the commit regarding 0 vclock component comparison.
>
>Yeah, I saw. Honestly, I don't understand how is 0 component used anywhere
>before your patch. From what I see, it was just unused always. I've left a
>small amendment in v2 thread to this commit. See answer to
>"[Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons."
>
>>> We should document new iproto codes: IPROTO_FETCH_SNAPSHOT,
>>> IPROTO_REGISTER, IPROTO_REPLICA_ANON, like here:
>>>  https://www.tarantool.io/en/doc/2.2/dev_guide/internals/box_protocol/
>> 
>> Opened a ticket. https://github.com/tarantool/doc/issues/1046
>
>I rather meant to add them to the docbot request, but this is
>ok as well.
>
>>> See 4 comments below.
>>>
>>>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>>>> index f4f9d0670..2939a0f61 100644
>>>> --- a/src/box/applier.cc
>>>> +++ b/src/box/applier.cc
>>>> @@ -996,10 +1035,24 @@ applier_f(va_list ap)
>>>>  if (tt_uuid_is_nil(&REPLICASET_UUID)) {
>>>>  /*
>>>>   * Execute JOIN if this is a bootstrap.
>>>> + * In case of anonymous replication, don't
>>>> + * join but just fetch master's snapshot.
>>>> + *
>>>>   * The join will pause the applier
>>>>   * until WAL is created.
>>>>   */
>>>> -applier_join(applier);
>>>> +if (replication_anon)
>>>> +applier_fetch_snapshot(applier);
>>>> +else
>>>> +applier_join(applier);
>>>> +}
>>>> +if (applier->version_id >= version_id(1, 7, 0) &&
>>>> +    !replication_anon && instance_id == REPLICA_ID_NIL) {
>>>> +/* anonymity was turned off while we were
>>>> + * fetching a snapshot or following master.
>>>> + * Register the replica now.
>>>> + */
>>>> +applier_register(applier);
>>>>  }
>>>
>>> 1. Why do we check for version here? Wouldn't it be more straight
>>> to do something like this?
>>>
>>> ================================================================================
>>>
>>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>>> index 3efa2b765..32851506c 100644
>>> --- a/src/box/applier.cc
>>> +++ b/src/box/applier.cc
>>> @@ -1032,6 +1032,7 @@ applier_f(va_list ap)
>>>  while (!fiber_is_cancelled()) {
>>>  try {
>>>  applier_connect(applier);
>>> +bool was_anon = replication_anon;
>>>  if (tt_uuid_is_nil(&REPLICASET_UUID)) {
>>>  /*
>>>   * Execute JOIN if this is a bootstrap.
>>> @@ -1046,8 +1047,8 @@ applier_f(va_list ap)
>>>  else
>>>  applier_join(applier);
>>>  }
>>> -if (applier->version_id >= version_id(1, 7, 0) &&
>>> -    !replication_anon && instance_id == REPLICA_ID_NIL) {
>>> +if (was_anon && !replication_anon &&
>>> +    instance_id == REPLICA_ID_NIL) {
>>>  /*
>>>   * Anonymity was turned off while
>>>   * we were fetching a snapshot or
>>>
>>> ================================================================================
>>>
>>> Also I don't understand, why is it needed? From what I see in
>>> box_set_replication_anon(), we stop existing appliers, when
>>> change `replication_anon`. So in case `replication_anon` is
>>> changed, the fiber should be canceled by that moment.
>>> applier_connect() does fiber_testcancel() via coio after each
>>> yield.
>>>
>>> A register could even harm here, because this instance will do
>>> register twice - from there, and from a new applier fiber, started
>>> in box_set_replication_anon().
>> 
>> True, the version check is pointless and the comment is misleading.
>> But we need to call applier_register here. In case of a transition
>> from anon to normal, we have already fetched a snapshot, so we
>> shouldn't execute join in a new applier, we just execute register and
>> proceed to subscribe. Fixed.
>
>Thanks for the fix. Although I still don't understand how is this
>code reachable (I tried to add assert(false), and it crashed). It
>is even covered with a test, what is cool btw.
>
>
>As I see, when we change anon setting, the existing appliers are
>cancelled. Why does this applier_f fiber still works, if any change
>in anon and replication cancels all the existing appliers?
On replication_anon reconfiguration old appliers are killed.
The new applier is started, it connects, checks for replicaset_uuid, which is
not nil, since the previous applier has fetched a snapshot,
proceeds to checking whether the instance_uuid is zero.
If it is (it is, since we were anonymous), and replication_anon is false
(and it is, since we have reconfigured it), it proceeds to register.
Then it subscribes.

I guess the previous comment that we fixed here was misleading.
This was how I understood it when I started working on this and
tried to reuse the same applier without killing it. So, sorry for the
confusion.

>
>
>Apparently, I miss something.
>
>>>> diff --git a/src/box/box.cc b/src/box/box.cc
>>>> index 56d07e634..a93e2503e 100644
>>>> --- a/src/box/box.cc
>>>> +++ b/src/box/box.cc
>>>> @@ -740,6 +770,64 @@ box_set_replication_skip_conflict(void)
>>>>  replication_skip_conflict = cfg_geti("replication_skip_conflict");
>>>>  }
>>>>
>>>> +void
>>>> +box_set_replication_anon(void)
>>>> +{
>>>> +bool anon = box_check_replication_anon();
>>>> +if (anon == replication_anon)
>>>> +return;
>>>> +
>>>> +if (!anon) {
>>>> +/* Turn anonymous instance into a normal one. */
>>>> +replication_anon = anon;
>>>
>>> 2. In case of any exception `replication_anon` will contain a not
>>> applied value.
>>>
>>> But there is a worse problem - we stop the appliers, and don't
>>> restart them in case of an exception. So box_set_replication_anon()
>>> is not atomic, it may leave the instance in a half configured
>>> state. box_set_replication() deals with that by creation of a new
>>> array of appliers. It connects them, and only then atomically
>>> replaces the old appliers.
>>>
>>> Also looks like deanonymisation does not respect quorum. I did
>>> this test:
>>>
>>>     Master:
>>>         box.cfg{listen=3301}
>>>         box.schema.user.grant('guest', 'read,write,execute', 'universe')
>>>
>>>
>>>     Replica:
>>>         box.cfg{replication_anon=true,
>>>                 read_only=true,
>>>                 replication_connect_quorum=0,
>>>                 replication=3301}
>>>
>>>     Master:
>>>         Killed.
>>>
>>>     Replica:
>>>         box.cfg{replication_anon=false, read_only=false}
>>>
>>> The replica hangs. Even though the quorum is 0.
>>>
>>> I think, we somehow need to reuse box_set_replication() for all
>>> of this. Looks like it does all we need except does not send the
>>> register requests.
>> 
>> True.
>
>Cool, I am happy that it was fixed so easy! Although this test
>should probably be added to anon.test.lua.
>
>Now I've discovered another possible problem. Are anon replicas
>going to be used as a failover master in case the old master
>dies? Because the example above shows that it is not possible to
>make an anon replica read-write without a master. This is what
>I get, even with 0 quorum:
>---
>- error: Couldn't find an instance to register this replica on.
>...
>
>Is that intended? Likely yes, but I ask just in case. 
True, this is intended.

>
>
>However the thing below looks like a bug:
>
>    Master:
>      box.cfg{listen=3301}
>      box.schema.user.grant('guest', 'read,write,execute', 'universe')
>      box.schema.user.grant('guest', 'replication')
>
>    Replica:
>      box.cfg{replication_anon=true,
>              read_only=true,
>              replication_connect_quorum=0,
>              replication=3301}
>
>      box.cfg{replication_anon=false}
>      ---
>      - error: Couldn't find an instance to register this replica on.
>      ...
>
>This is because of quorum 0. When the quorum is 0, box_set_replication_anon()
>tries to find a leader immediately after creation of appliers. Note, that the
>master is alive. I tried to make a simple fix, but failed. Temporary change
>to quorum 1 won't help, btw, because it can connect to a read-only replica
>instead of a master.
>
>Everything works, when I change quorum to 1, and then try deanon again.
>
>Also it works when I do this:
>
>@@ -789,7 +789,7 @@ box_set_replication_anon(void)
> 		 * them can register and others resend a
> 		 * non-anonymous subscribe.
> 		 */
>-		box_sync_replication(false);
>+		box_sync_replication(true);
> 		/*
> 		 * Wait until the master has registered this
> 		 * instance.
>
>But I am not sure that it is legal to do so.
Maybe leave it as is for now? I can't see what the consequences
will be. If you want to promote an anon replica to master this
probably means that one of the masters is dead. So if you do
box_sync_replication(true), your instance will hang on replication_anon
configuration for replication timeout or whatever (replication_connect_timeout?).
You'll need to remove the dead master from box.cfg.replication first.
So this will still require additional configuration mangling.

If you think it's fine to wait for a timeout on replication_anon reconfiguration,
feel free to apply the change above. I'm not against it.

>
>
>>>> +/*
>>>> + * Reset all appliers. This will interrupt
>>>> + * anonymous follow they're in so that one of
>>>> + * them can register and others resend a
>>>> + * non-anonymous subscribe.
>>>> + */
>>>> +replicaset_foreach(replica) {
>>>> +struct applier *applier = replica->applier;
>>>> +if (applier == NULL)
>>>> +continue;
>>>> +replica_clear_applier(replica);
>>>> +applier_stop(applier);
>>>> +replica->applier_sync_state = APPLIER_DISCONNECTED;
>>>> +replica_set_applier(replica, applier);
>>>> +}
>>>> +/* Choose a master to send register request to. */
>>>> +struct replica *master = replicaset_leader();
>>>> +assert(master != NULL && master->applier != NULL);
>>>> +struct applier *master_applier = master->applier;
>>>> +
>>>> +applier_start(master_applier);
>>>> +
>>>> +applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY);
>>>> +applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY);
>>>> +applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY);
>
>Is it possible to always wait for APPLIER_READY and not to
>pay attention to the prior states? 
No, applier transitions to ready right after CONNECTED.
>
>
>Or at least wait for REGISTERED + READY only? Like this:
This is ok. I just liked the verbosity of waiting for all the states step by step.
>
>
>================================================================================
> 		}
> 		struct applier *master_applier = master->applier;
>
>-		applier_resume_to_state(master_applier, APPLIER_READY,
>-					TIMEOUT_INFINITY);
>-		applier_resume_to_state(master_applier, APPLIER_REGISTER,
>-					TIMEOUT_INFINITY);
> 		applier_resume_to_state(master_applier, APPLIER_REGISTERED,
> 					TIMEOUT_INFINITY);
>================================================================================
>
>>>> +applier_resume(master_applier);
>>>> +/**
>>>> + * Restart other appliers to
>>>> + * resend non-anonymous subscribe.
>>>> + */
>>>> +replicaset_foreach(replica) {
>>>> +if (replica != master && replica->applier)
>>>> +applier_start(replica->applier);
>>>> +}
>>>> +} else if (!is_box_configured) {
>>>> +replication_anon = anon;
>>>> +} else {
>>>> +/*
>>>> + * It is forbidden to turn a normal replica into
>>>> + * an anonymous one.
>>>> + */
>>>> +tnt_raise(ClientError, ER_CFG, "replication_anon",
>>>> +  "cannot be turned on after bootstrap"
>>>> +  " has finished");
>>>> +}
>>>> +
>>>> +}
>>>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>>>> index e849fcf4f..22029751f 100644
>>>> --- a/src/box/relay.cc
>>>> +++ b/src/box/relay.cc
>>>> @@ -691,7 +697,10 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>>>>  relay_start(relay, fd, sync, relay_send_row);
>>>>  auto relay_guard = make_scoped_guard([=] {
>>>>  relay_stop(relay);
>>>> -replica_on_relay_stop(replica);
>>>> +if (replica->anon)
>>>> +replica_anon_delete(replica);
>>>> +else
>>>> +replica_on_relay_stop(replica);
>>>
>>> 3. Why can't replica_on_relay_stop() be used for an anon replica?
>> 
>> replica_on_relay_stop() asserts the replica is registered with gc and
>> removes it from consumers. Anonymous replicas aren't registered with gc.
>> I can rewrite replica_on_relay_stop() appropriately, but why?
>
>The problem is that replica_on_relay_stop() is not really a destructor.
>It is an event. What to do on that event is encapsulated inside
>replica_on_relay_stop(). When you added branching on anon setting, the
>encapsulation was broken.
Ok, I see, thanks.
>
>
>>> diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
>>> index 9dee75b7d..51d157aa3 100644
>>> --- a/src/box/lua/load_cfg.lua
>>> +++ b/src/box/lua/load_cfg.lua
>>> @@ -279,6 +279,7 @@ local dynamic_cfg_order = {
>>>      replication_sync_timeout    = 150,
>>>      replication_connect_timeout = 150,
>>>      replication_connect_quorum  = 150,
>>> +    replication_anon        = 175,
>>> ================================================================================
>>>
>>> I am not sure about this change.
>>>
>>> The only way how can it be changed now - from true to false.
>>> Not back. The only dubious scenario: we change `replication_anon`
>>> to false, and change `replication` at the same time.
>>>
>>> If we set `replication_anon` false before new `replication`,
>>> we will try to register the old `replication`. It means
>>> `replication_anon` update should be done after `replication`.
>>> So as the new configuration could connect as anon, and then be
>>> upgraded to normal.
>>>
>>> But when we will support anonymization of normal replicas, the
>>> config update could look like this:
>>>
>>>     `replication_anon` false -> true
>>>     `replication` old -> new
>>>
>>> And here `replication_anon` should be set before `replication`.
>>> So as we would connect anonymously to the new replication set.
>>>
>>> In both cases we do unnecessary actions with the old
>>> replication.
>>>
>>> Seems like it can't be properly solved by dynamic_cfg_order
>>> table. We need to configure `replication_anon` and `replication`
>>> at the same time. So as in box_set_replication() you always know
>>> if the new `replication` setting is anon or not.
>>>
>>> Maybe we need to extend dynamic_cfg_order idea, and introduce
>>> depending parameters. Or parameter groups. Or something like
>>> that. So as we could, for example, group `replication_anon` and
>>> `replication`, and pass anon value to box_set_replication() when
>>> they are updated both at the same box.cfg.
>>>
>>> Or we need to forbid update of some parameters in one box.cfg.
>>>
>>> Or we could make that order of parameters depends on their
>>> values. Although it does not solve the problem that when we
>>> change both `replication_anon` and `replication`, we do
>>> unnecessary actions to the old `replication` in some cases.
>> 
>> Well, I reused the box_set_replication code in replication_anon change, so
>> now if replication_anon is changed together with replication, it will trigger
>> duplicate box_set_replication action (almost). This is ok, I guess.
>
>No, looks like the same problem still exists. The settings
>are applied one by one. So when we change anon and replication,
>the anon is changed first, and replication is changed second.
>
>When anon is changed, it still sees the old replication setting.
>Check this code in load_cfg.lua:
>
>    local function reload_cfg(oldcfg, cfg)
>        cfg = upgrade_cfg(cfg, translate_cfg)
>        local newcfg = prepare_cfg(cfg, default_cfg, template_cfg, modify_cfg)
>        local ordered_cfg = {}
>        -- iterate over original table because prepare_cfg() may store NILs
>        for key, val in pairs(cfg) do
>            if dynamic_cfg[key] == nil and oldcfg[key] ~= val then
>                box.error(box.error.RELOAD_CFG, key);
>            end
>            table.insert(ordered_cfg, key)
>        end
>        table.sort(ordered_cfg, sort_cfg_cb)
>        for _, key in pairs(ordered_cfg) do
>            local val = newcfg[key]
>            local oldval = oldcfg[key]
>            if not compare_cfg(val, oldval) then
>                rawset(oldcfg, key, val)
>                if not pcall(dynamic_cfg[key]) then
>                    rawset(oldcfg, key, oldval) -- revert the old value
>                    return box.error() -- re-throw
>                end
>                if log_cfg_option[key] ~= nil then
>                    val = log_cfg_option[key](val)
>                end
>                log.info("set '%s' configuration option to %s", key,
>                    json.encode(val))
>            end
>        end
>        ...
>
>Here rawset(oldcfg, key, val) is the actual update of what we
>see in cfg_get*() functions. So when anon is changed before
>replication, it will try to deanon the old replication. And
>only then will configure the new replication.
>
>In that case lets configure anon after replication in this
>patch. It partially solves the problem for now. And I added
>a ticket to improve this part.
Thanks for the clarification and fixes, I missed it.
>
>
>https://github.com/tarantool/tarantool/issues/4714
>
>Below are my review fixes:
>
>================================================================================
>
>commit ed559c8f18d5c8606410bfbb76a380a43e2e163f
>Author: Vladislav Shpilevoy < v.shpilevoy@tarantool.org >
>Date:   Fri Dec 27 19:56:11 2019 +0300
>
>    Review fixes 5/5
>
>diff --git a/src/box/box.cc b/src/box/box.cc
>index f99aeebc6..5a8e4d3b9 100644
>--- a/src/box/box.cc
>+++ b/src/box/box.cc
>@@ -790,7 +790,10 @@ box_set_replication_anon(void)
> 		 * non-anonymous subscribe.
> 		 */
> 		box_sync_replication(false);
>-		/* Choose a master to send register request to. */
>+		/*
>+		 * Wait until the master has registered this
>+		 * instance.
>+		 */
> 		struct replica *master = replicaset_leader();
> 		if (master == NULL || master->applier == NULL ||
> 		    master->applier->state != APPLIER_CONNECTED) {
>@@ -798,10 +801,6 @@ box_set_replication_anon(void)
> 		}
> 		struct applier *master_applier = master->applier;
>
>-		applier_resume_to_state(master_applier, APPLIER_READY,
>-					TIMEOUT_INFINITY);
>-		applier_resume_to_state(master_applier, APPLIER_REGISTER,
>-					TIMEOUT_INFINITY);
> 		applier_resume_to_state(master_applier, APPLIER_REGISTERED,
> 					TIMEOUT_INFINITY);
> 		applier_resume_to_state(master_applier, APPLIER_READY,
>diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
>index 51d157aa3..955f50061 100644
>--- a/src/box/lua/load_cfg.lua
>+++ b/src/box/lua/load_cfg.lua
>@@ -279,8 +279,16 @@ local dynamic_cfg_order = {
>     replication_sync_timeout    = 150,
>     replication_connect_timeout = 150,
>     replication_connect_quorum  = 150,
>-    replication_anon        = 175,
>     replication             = 200,
>+    -- Anon is set after `replication` as a temporary workaround
>+    -- for the problem, that `replication` and `replication_anon`
>+    -- depend on each other. If anon would be configured before
>+    -- `replication`, it could lead to a bug, when anon is changed
>+    -- from true to false together with `replication`, and it
>+    -- would try to deanon the old `replication` before applying
>+    -- the new one. This should be fixed when box.cfg is able to
>+    -- apply some parameters together and atomically.
>+    replication_anon        = 250,
> }
>
> local function sort_cfg_cb(l, r)
>diff --git a/src/box/relay.cc b/src/box/relay.cc
>index 4258edfcd..7ffc79285 100644
>--- a/src/box/relay.cc
>+++ b/src/box/relay.cc
>@@ -697,10 +697,7 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
> 	relay_start(relay, fd, sync, relay_send_row);
> 	auto relay_guard = make_scoped_guard([=] {
> 		relay_stop(relay);
>-		if (replica->anon)
>-			replica_anon_delete(replica);
>-		else
>-			replica_on_relay_stop(replica);
>+		replica_on_relay_stop(replica);
> 	});
>
> 	vclock_copy(&relay->local_vclock_at_subscribe, &replicaset.vclock);
>diff --git a/src/box/replication.cc b/src/box/replication.cc
>index 11096cc79..1476c7472 100644
>--- a/src/box/replication.cc
>+++ b/src/box/replication.cc
>@@ -890,14 +890,24 @@ void
> replica_on_relay_stop(struct replica *replica)
> {
> 	/*
>-	 * If the replica was evicted from the cluster, we don't
>-	 * need to keep WALs for it anymore. Unregister it with
>-	 * the garbage collector then. See also replica_clear_id.
>+	 * If the replica was evicted from the cluster, or was not
>+	 * even added there (anon replica), we don't need to keep
>+	 * WALs for it anymore. Unregister it with the garbage
>+	 * collector then. See also replica_clear_id.
> 	 */
>-	assert(replica->gc != NULL);
> 	if (replica->id == REPLICA_ID_NIL) {
>-		gc_consumer_unregister(replica->gc);
>-		replica->gc = NULL;
>+		if (!replica->anon) {
>+			gc_consumer_unregister(replica->gc);
>+			replica->gc = NULL;
>+		} else {
>+			assert(replica->gc == NULL);
>+			assert(replica->id == REPLICA_ID_NIL);
>+			/*
>+			 * We do not replicate from anonymous
>+			 * replicas.
>+			 */
>+			assert(replica->applier == NULL);
>+		}
> 	}
> 	if (replica_is_orphan(replica)) {
> 		replica_hash_remove(&replicaset.hash, replica);
>@@ -905,17 +915,6 @@ replica_on_relay_stop(struct replica *replica)
> 	}
> }
>
>-void
>-replica_anon_delete(struct replica *replica)
>-{
>-	assert(replica->gc == NULL);
>-	assert(replica->id == REPLICA_ID_NIL);
>-	/* We do not replicate from anonymous replicas */
>-	assert(replica->applier == NULL);
>-	replica_hash_remove(&replicaset.hash, replica);
>-	replica_delete(replica);
>-}
>-
> struct replica *
> replicaset_first(void)
> {
>diff --git a/src/box/replication.h b/src/box/replication.h
>index 978a09d41..2ef1255b3 100644
>--- a/src/box/replication.h
>+++ b/src/box/replication.h
>@@ -367,9 +367,6 @@ replica_set_applier(struct replica * replica, struct applier * applier);
> void
> replica_on_relay_stop(struct replica *replica);
>
>-void
>-replica_anon_delete(struct replica *replica);
>-
> #if defined(__cplusplus)
> } /* extern "C" */
>
>diff --git a/test/replication/anon.result b/test/replication/anon.result
>index f6a85e7d0..88061569f 100644
>--- a/test/replication/anon.result
>+++ b/test/replication/anon.result
>@@ -324,3 +324,62 @@ box.schema.user.revoke('guest', 'replication')
> test_run:cleanup_cluster()
>  | ---
>  | ...
>+
>+--
>+-- Check that in case of a master absence an anon replica can't
>+-- deanonymize itself, regardless of quorum value.
>+--
>+test_run:cmd('create server master with script="replication/master1.lua"')
>+ | ---
>+ | - true
>+ | ...
>+test_run:cmd('start server master')
>+ | ---
>+ | - true
>+ | ...
>+test_run:switch('master')
>+ | ---
>+ | - true
>+ | ...
>+box.schema.user.grant('guest', 'replication')
>+ | ---
>+ | ...
>+test_run:cmd('create server replica_anon with rpl_master=master, script="replication/anon1.lua"')
>+ | ---
>+ | - true
>+ | ...
>+test_run:cmd('start server replica_anon')
>+ | ---
>+ | - true
>+ | ...
>+test_run:switch('replica_anon')
>+ | ---
>+ | - true
>+ | ...
>+box.cfg{replication_connect_quorum = 0}
>+ | ---
>+ | ...
>+test_run:cmd('stop server master')
>+ | ---
>+ | - true
>+ | ...
>+test_run:cmd('delete server master')
>+ | ---
>+ | - true
>+ | ...
>+box.cfg{replication_anon = false}
>+ | ---
>+ | - error: Couldn't find an instance to register this replica on.
>+ | ...
>+test_run:switch('default')
>+ | ---
>+ | - true
>+ | ...
>+test_run:cmd('stop server replica_anon')
>+ | ---
>+ | - true
>+ | ...
>+test_run:cmd('delete server replica_anon')
>+ | ---
>+ | - true
>+ | ...
>diff --git a/test/replication/anon.test.lua b/test/replication/anon.test.lua
>index 1877ec8bb..8a8d15c18 100644
>--- a/test/replication/anon.test.lua
>+++ b/test/replication/anon.test.lua
>@@ -116,3 +116,22 @@ box.space.temp:drop()
> box.space.loc:drop()
> box.schema.user.revoke('guest', 'replication')
> test_run:cleanup_cluster()
>+
>+--
>+-- Check that in case of a master absence an anon replica can't
>+-- deanonymize itself, regardless of quorum value.
>+--
>+test_run:cmd('create server master with script="replication/master1.lua"')
>+test_run:cmd('start server master')
>+test_run:switch('master')
>+box.schema.user.grant('guest', 'replication')
>+test_run:cmd('create server replica_anon with rpl_master=master, script="replication/anon1.lua"')
>+test_run:cmd('start server replica_anon')
>+test_run:switch('replica_anon')
>+box.cfg{replication_connect_quorum = 0}
>+test_run:cmd('stop server master')
>+test_run:cmd('delete server master')
>+box.cfg{replication_anon = false}
>+test_run:switch('default')
>+test_run:cmd('stop server replica_anon')
>+test_run:cmd('delete server replica_anon')
>diff --git a/test/replication/master1.lua b/test/replication/master1.lua
>new file mode 120000
>index 000000000..1c7debd2f
>--- /dev/null
>+++ b/test/replication/master1.lua
>@@ -0,0 +1 @@
>+master.lua
>\ No newline at end of file

LGTM, squashed with a tiny change:

diff --git a/src/box/replication.cc b/src/box/replication.cc
index 1476c7472..e7bfa22ab 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -901,7 +901,6 @@ replica_on_relay_stop(struct replica *replica)
 replica->gc = NULL;
 } else {
 assert(replica->gc == NULL);
-assert(replica->id == REPLICA_ID_NIL);
 /*
  * We do not replicate from anonymous
  * replicas.

The assertion was inside if (replica->id == REPLICA_ID_NIL) branch.

-- 
Sergey Petrenko

[-- Attachment #2: Type: text/html, Size: 39765 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica.
  2019-12-28 11:48         ` Sergey Petrenko
@ 2019-12-28 12:15           ` Vladislav Shpilevoy
  0 siblings, 0 replies; 15+ messages in thread
From: Vladislav Shpilevoy @ 2019-12-28 12:15 UTC (permalink / raw)
  To: Sergey Petrenko, Kirill Yukhin; +Cc: tarantool-patches

Thanks for the fixes!

The whole patchset LGTM. With some of my answers below.

>     >> 1. Why do we check for version here? Wouldn't it be more straight
>     >> to do something like this?
>     >>
>     >> ================================================================================
>     >>
>     >> diff --git a/src/box/applier.cc b/src/box/applier.cc
>     >> index 3efa2b765..32851506c 100644
>     >> --- a/src/box/applier.cc
>     >> +++ b/src/box/applier.cc
>     >> @@ -1032,6 +1032,7 @@ applier_f(va_list ap)
>     >>  while (!fiber_is_cancelled()) {
>     >>  try {
>     >>  applier_connect(applier);
>     >> +bool was_anon = replication_anon;
>     >>  if (tt_uuid_is_nil(&REPLICASET_UUID)) {
>     >>  /*
>     >>   * Execute JOIN if this is a bootstrap.
>     >> @@ -1046,8 +1047,8 @@ applier_f(va_list ap)
>     >>  else
>     >>  applier_join(applier);
>     >>  }
>     >> -if (applier->version_id >= version_id(1, 7, 0) &&
>     >> - !replication_anon && instance_id == REPLICA_ID_NIL) {
>     >> +if (was_anon && !replication_anon &&
>     >> + instance_id == REPLICA_ID_NIL) {
>     >>  /*
>     >>   * Anonymity was turned off while
>     >>   * we were fetching a snapshot or
>     >>
>     >> ================================================================================
>     >>
>     >> Also I don't understand, why is it needed? From what I see in
>     >> box_set_replication_anon(), we stop existing appliers, when
>     >> change `replication_anon`. So in case `replication_anon` is
>     >> changed, the fiber should be canceled by that moment.
>     >> applier_connect() does fiber_testcancel() via coio after each
>     >> yield.
>     >>
>     >> A register could even harm here, because this instance will do
>     >> register twice - from there, and from a new applier fiber, started
>     >> in box_set_replication_anon().
>     >
>     > True, the version check is pointless and the comment is misleading.
>     > But we need to call applier_register here. In case of a transition
>     > from anon to normal, we have already fetched a snapshot, so we
>     > shouldn't execute join in a new applier, we just execute register and
>     > proceed to subscribe. Fixed.
> 
>     Thanks for the fix. Although I still don't understand how is this
>     code reachable (I tried to add assert(false), and it crashed). It
>     is even covered with a test, what is cool btw.
> 
> 
> 
> 
>     As I see, when we change anon setting, the existing appliers are
>     cancelled. Why does this applier_f fiber still works, if any change
>     in anon and replication cancels all the existing appliers?
> 
> 
> On replication_anon reconfiguration old appliers are killed.
> The new applier is started, it connects, checks for replicaset_uuid, which is
> not nil, since the previous applier has fetched a snapshot,
> proceeds to checking whether the instance_uuid is zero.
> If it is (it is, since we were anonymous), and replication_anon is false
> (and it is, since we have reconfigured it), it proceeds to register.
> Then it subscribes.
> 
> I guess the previous comment that we fixed here was misleading.
> This was how I understood it when I started working on this and
> tried to reuse the same applier without killing it. So, sorry for the
> confusion.

Aha, now I understand. Then looks ok.

>     However the thing below looks like a bug:
> 
>         Master:
>           box.cfg{listen=3301}
>           box.schema.user.grant('guest', 'read,write,execute', 'universe')
>           box.schema.user.grant('guest', 'replication')
> 
>         Replica:
>           box.cfg{replication_anon=true,
>                   read_only=true,
>                   replication_connect_quorum=0,
>                   replication=3301}
> 
>           box.cfg{replication_anon=false}
>           ---
>           - error: Couldn't find an instance to register this replica on.
>           ...
> 
>     This is because of quorum 0. When the quorum is 0, box_set_replication_anon()
>     tries to find a leader immediately after creation of appliers. Note, that the
>     master is alive. I tried to make a simple fix, but failed. Temporary change
>     to quorum 1 won't help, btw, because it can connect to a read-only replica
>     instead of a master.
> 
>     Everything works, when I change quorum to 1, and then try deanon again.
> 
>     Also it works when I do this:
> 
>     @@ -789,7 +789,7 @@ box_set_replication_anon(void)
>       * them can register and others resend a
>       * non-anonymous subscribe.
>       */
>     - box_sync_replication(false);
>     + box_sync_replication(true);
>       /*
>       * Wait until the master has registered this
>       * instance.
> 
>     But I am not sure that it is legal to do so.
> 
> 
> Maybe leave it as is for now? I can't see what the consequences
> will be. If you want to promote an anon replica to master this
> probably means that one of the masters is dead. So if you do
> box_sync_replication(true), your instance will hang on replication_anon
> configuration for replication timeout or whatever (replication_connect_timeout?).
> You'll need to remove the dead master from box.cfg.replication first.
> So this will still require additional configuration mangling.
> 
> If you think it's fine to wait for a timeout on replication_anon reconfiguration,
> feel free to apply the change above. I'm not against it.

I don't know what is more correct to do here. Yes, lets keep it
as is, and look at user feedback.

>     >>> +/*
>     >>> + * Reset all appliers. This will interrupt
>     >>> + * anonymous follow they're in so that one of
>     >>> + * them can register and others resend a
>     >>> + * non-anonymous subscribe.
>     >>> + */
>     >>> +replicaset_foreach(replica) {
>     >>> +struct applier *applier = replica->applier;
>     >>> +if (applier == NULL)
>     >>> +continue;
>     >>> +replica_clear_applier(replica);
>     >>> +applier_stop(applier);
>     >>> +replica->applier_sync_state = APPLIER_DISCONNECTED;
>     >>> +replica_set_applier(replica, applier);
>     >>> +}
>     >>> +/* Choose a master to send register request to. */
>     >>> +struct replica *master = replicaset_leader();
>     >>> +assert(master != NULL && master->applier != NULL);
>     >>> +struct applier *master_applier = master->applier;
>     >>> +
>     >>> +applier_start(master_applier);
>     >>> +
>     >>> +applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY);
>     >>> +applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY);
>     >>> +applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY);
> 
>     Is it possible to always wait for APPLIER_READY and not to
>     pay attention to the prior states?
> 
> No, applier transitions to ready right after CONNECTED.
> 
> 
> 
>     Or at least wait for REGISTERED + READY only? Like this:
> 
> 
> This is ok. I just liked the verbosity of waiting for all the states step by step.

You are free to revert this part of my change, I don't mind.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas
  2019-12-25 12:46 [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas sergepetrenko
                   ` (4 preceding siblings ...)
  2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica sergepetrenko
@ 2019-12-30  5:12 ` Kirill Yukhin
  5 siblings, 0 replies; 15+ messages in thread
From: Kirill Yukhin @ 2019-12-30  5:12 UTC (permalink / raw)
  To: sergepetrenko; +Cc: v.shpilevoy, tarantool-patches

Hello,

On 25 дек 15:46, sergepetrenko wrote:
> https://github.com/tarantool/tarantool/tree/sp/gh-3186-anon-replica
> https://github.com/tarantool/tarantool/issues/3186
> 
> The first patch alters the comment regarding join protocol
> to be in sync with latest changes: join isn't done off a snapshot,
> but from a read-view rather, created at the moment the request is received.
> 
> The second patch removes unnecessary decoding of replicaset uuid on
> master side, since master just ignores it now.
> 
> The third patch is a preparation for anonymous replica. We need to split join
> protocol into two stages: first, fetch a snapshot. This is done by both anonymous
> and regular replicas. The second stage is registration, i. e. the addition of
> replica to _cluster table. It is done right away when joining a regular replica,
> or it can be done by a request from an anonymous replica to transition it to
> normal one.
> The patch itself just splits applier join into appropriate stages.
> 
> The fourth patch modifies vclock logic to ignore 0 component in comparisons.
> This is also used by anonymous replicas, which will be the first users of the 
> 0-th vclock component. This component will be used to keep track of anonymous
> replica local changes, which are not replicated. These local changes will vary
> from instance to instance, and the anonymous instances will all use 0th component
> to track them, so no need to compare this component.
> 
> The fifth patch does the main job of adding anonymous replicas.
> New requests, FETCH_SNAPSHOT and REGISTER are introduced. Anonymous replicas
> are allowed to have a 0 id. They aren't registered as gc consumers and aren't
> present in _cluster table.
> 
> Changes in v2: 
>   - review fixes as per reviews from
>     Georgy Kirichenko and Vlad Shpilevoy.

I've checked your patchset into master.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-12-30  5:12 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-25 12:46 [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas sergepetrenko
2019-12-25 12:46 ` [Tarantool-patches] [PATCH v2 1/5] box: update comment describing join protocol sergepetrenko
2019-12-25 12:46 ` [Tarantool-patches] [PATCH v2 2/5] replication: do not decode replicaset uuid when processing a subscribe sergepetrenko
2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 3/5] applier: split join processing into two stages sergepetrenko
2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 4/5] vclock: ignore 0th component in comparisons sergepetrenko
2019-12-25 16:00   ` Vladislav Shpilevoy
2019-12-27 18:42   ` Vladislav Shpilevoy
2019-12-28 11:21     ` Sergey Petrenko
2019-12-25 12:47 ` [Tarantool-patches] [PATCH v2 5/5] replication: introduce anonymous replica sergepetrenko
2019-12-25 18:22   ` Vladislav Shpilevoy
2019-12-27 15:27     ` Sergey Petrenko
2019-12-27 18:42       ` Vladislav Shpilevoy
2019-12-28 11:48         ` Sergey Petrenko
2019-12-28 12:15           ` Vladislav Shpilevoy
2019-12-30  5:12 ` [Tarantool-patches] [PATCH v2 0/5] introduce anonymous replicas Kirill Yukhin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox