[Tarantool-patches] [PATCH 5/5] replication: introduce anonymous replica.

Sergey Petrenko sergepetrenko at tarantool.org
Wed Dec 25 15:40:13 MSK 2019




>Воскресенье, 22 декабря 2019, 20:58 +03:00 от Vladislav Shpilevoy <v.shpilevoy at tarantool.org>:
>
>Thanks for the patch!
Thanks for review!

>
>
>See 16 comments/questions below.
>
>On 15/12/2019 21:58, sergepetrenko wrote:
>> This commit introduces anonymous replicas. Such replicas do not pollute
>> _cluster table (they can only be read-only and have a zero id in return).
>> An anonymous replica can be promoted to a normal one if needed.
>
>1. Do we need the promotion? Seems like it was not asked for, but we will
>need to support it forever in case we allow this now. Also it looks not
>even, when I can promote, but can't demote.

Well, one of the use cases for anonymous replica, in my opinion, is a backup
instance, which can be promoted once one of the normal instances gets down.
(Almost like a hot standby instance).

Let's ask Gosha whether we should keep promotion or get rid of it.
Regarding demotion, what if I add it later after giving it some thought?

>
>
>> 
>> Closes #3186
>> 
>> @TarantoolBot document
>> Title: Document anonymous replica
>> 
>> There is a new type of replica in tarantool, anonymous one. Anonymous
>> replica is read-only (but you still can write to temporary and
>> replica-local spaces), and it isn't present in _cluster table.
>
>2. Replica-local and temporary spaces' schema is replicated. From
>_space, _index, and other system spaces. Does not it break anything?
>If it does not, then why can't a user create non-local and non-temporary
>spaces on an anon replica?
>
>Or do you mean, that the master should create the spaces, and the
>replica can fill them with data, and can't create these spaces by
>itself?

I mean the latter. Spaces are created on master and can be used on anon replicas.
Such replicas cannot create spaces. Even local/temporary ones

>
>
>> 
>> Since anonymous replica isn't registered in _cluster table, there is no
>> limitation for anonymous replica count in a replicaset. You can have as
>> many of them as you want.
>> 
>> In order to make a replica anonymous, you have to pass an option
>> `replication_anon=true` to `box.cfg`. You also have to set 'read_only'
>> to true.
>> 
>> Let's go through anonymous replica bootstrap.
>> Suppose we have a master configured with
>> ```
>> box.cfg{listen=3301}
>> ```
>> And created a local space called "loc"
>> ```
>> box.schema.space.create('loc', {is_local=true})
>> box.space.loc:create_index("pk")
>> ```
>> Now, to configure an anonymous replica, we have to issue `box.cfg`,
>> as usual.
>> ```
>> box.cfg{replication_anon=true, read_only=true, replication=3301}
>> ```
>> As mentioned above, `replication_anon` may be set to true only together
>> with `read_only`
>> The instance will fetch masters snapshot and proceed to following its
>> changes. It will not receive an id so its id will remain zero.
>> ```
>> tarantool> box.info.id
>> ---
>> - 0
>> ...
>> ```
>> ```
>> tarantool> box.info.replication
>> ---
>> - 1:
>>     id: 1
>>     uuid: 3c84f8d9-e34d-4651-969c-3d0ed214c60f
>>     lsn: 4
>>     upstream:
>>       status: follow
>>       idle: 0.6912029999985
>>       peer:
>>       lag: 0.00014615058898926
>> ...
>> ```
>> Now we can use the replica.
>> For example, we may do inserts into the local space:
>> ```
>> tarantool> for i = 1,10 do
>>          > box.space.loc:insert{i}
>>          > end
>> ---
>> ...
>> ```
>> Note, that while the instance is anonymous, it will increase the 0-th
>> component of its vclock:
>> ```
>> tarantool> box.info.vclock
>> ---
>> - {0: 10, 1: 4}
>> ...
>> ```
>> Let's now promote the replica to a normal one:
>> ```
>> tarantool> box.cfg{replication_anon=false}
>> 2019-12-13 20:34:37.423 [71329] main I> assigned id 2 to replica 6a9c2ed2-b9e1-4c57-a0e8-51a46def7661
>> 2019-12-13 20:34:37.424 [71329] main/102/interactive I> set 'replication_anon' configuration option to false
>> ---
>> ...
>> 
>> tarantool> 2019-12-13 20:34:37.424 [71329] main/117/applier/ I> subscribed
>> 2019-12-13 20:34:37.424 [71329] main/117/applier/ I> remote vclock {1: 5} local vclock {0: 10, 1: 5}
>> 2019-12-13 20:34:37.425 [71329] main/118/applierw/ C> leaving orphan mode
>> ```
>> The replica just received id 2. We can make it read-write now.
>> ```
>> box.cfg{read_only=false}
>> 2019-12-13 20:35:46.392 [71329] main/102/interactive I> set 'read_only' configuration option to false
>> ---
>> ...
>> 
>> tarantool> box.schema.space.create('test')
>> ---
>> - engine: memtx
>>   before_replace: 'function: 0x01109f9dc8'
>>   on_replace: 'function: 0x01109f9d90'
>>   ck_constraint: []
>>   field_count: 0
>>   temporary: false
>>   index: []
>>   is_local: false
>>   enabled: false
>>   name: test
>>   id: 513
>> - created
>> ...
>> 
>> tarantool> box.info.vclock
>> ---
>> - {0: 10, 1: 5, 2: 2}
>> ...
>> ```
>> Now replica tracks its changes in 2nd vclock component, as expected.
>> It can also become replication master from now on.
>> 
>> Side notes:
>>   * You cannot replicate from an anonymous instance.
>>   * To promote an anonymous instance to a regular one,
>>     you first have to start it as anonymous, ano only
>
>3. ano -> and.

Fixed, thanks!

>
>
>>     then issue `box.cfg{replication_anon=false}`
>>   * In order for the deanonymization to succeed, the
>>     instance must replicate from some read-write instance,
>>     otherwise noone will be able to add it to _cluster table.
>> ---
>>  src/box/applier.cc              |  58 ++++++-
>>  src/box/applier.h               |   4 +
>>  src/box/box.cc                  | 267 ++++++++++++++++++++++++++++++--
>>  src/box/box.h                   |  11 +-
>>  src/box/iproto.cc               |  16 +-
>>  src/box/iproto_constants.h      |   6 +
>>  src/box/lua/cfg.cc              |  14 +-
>>  src/box/lua/info.c              |   4 +-
>>  src/box/lua/load_cfg.lua        |   4 +
>>  src/box/recovery.cc             |   7 +-
>>  src/box/relay.cc                |  32 +++-
>>  src/box/replication.cc          |  41 ++++-
>>  src/box/replication.h           |  24 +++
>>  src/box/wal.c                   |   4 +
>>  src/box/xrow.c                  |  47 +++++-
>>  src/box/xrow.h                  |  68 ++++++--
>>  test/app-tap/init_script.result |  49 +++---
>>  test/box/admin.result           |   2 +
>>  test/box/cfg.result             |   4 +
>>  test/replication/anon.lua       |  13 ++
>>  test/replication/anon.result    | 259 +++++++++++++++++++++++++++++++
>>  test/replication/anon.test.lua  |  89 +++++++++++
>>  test/replication/suite.cfg      |   1 +
>>  23 files changed, 957 insertions(+), 67 deletions(-)
>>  create mode 100644 test/replication/anon.lua
>>  create mode 100644 test/replication/anon.result
>>  create mode 100644 test/replication/anon.test.lua
>> 
>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>> index 357369025..1445dd4d1 100644
>> --- a/src/box/applier.cc
>> +++ b/src/box/applier.cc
>> @@ -452,6 +452,23 @@ applier_do_fetch_snapshot(struct applier *applier)
>>  return row_count;
>>  }
>> 
>> +static void
>> +applier_fetch_snapshot(struct applier *applier)
>> +{
>> +/* Send FETCH SNAPSHOT request */
>> +struct ev_io *coio = &applier->io;
>> +struct xrow_header row;
>> +
>> +memset(&row, 0, sizeof(row));
>> +row.type = IPROTO_FETCH_SNAPSHOT;
>> +coio_write_xrow(coio, &row);
>> +
>> +applier_set_state(applier, APPLIER_FETCH_SNAPSHOT);
>> +applier_do_fetch_snapshot(applier);
>
>4. This is time to mention FETCH_SNAPSHOT request in the
>applier_do_fetch_snapshot() comments, which now mention
>only JOIN.

Done.

>
>
>> +applier_set_state(applier, APPLIER_FETCHED_SNAPSHOT);
>> +applier_set_state(applier, APPLIER_READY);
>> +}
>> +
>>  static uint64_t
>>  applier_do_register(struct applier *applier, uint64_t row_count)
>>  {
>>  /** States for the applier */
>>  ENUM(applier_state, applier_STATE);
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 981a5bac1..4c39e4971 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -740,6 +770,65 @@ box_set_replication_skip_conflict(void)
>>  replication_skip_conflict = cfg_geti("replication_skip_conflict");
>>  }
>> 
>> +void
>> +box_set_replication_anon(void)
>> +{> +bool anon = box_check_replication_anon();
>> +if (anon == replication_anon)
>> +return;
>> +
>> +if (!anon) {
>> +/* Turn anonymous instance into a normal one. */
>> +replication_anon = anon;
>> +/*
>> + * Reset all appliers. This will interrupt
>> + * anonymous follow they're in and also update
>> + * corresponding instance ballots so that we can
>> + * use the latest info when choosing a replica to
>> + * register on.
>> + */
>> +replicaset_foreach(replica) {
>> +struct applier *applier = replica->applier;
>> +if (applier == NULL)
>> +continue;
>> +replica_clear_applier(replica);
>> +replica->applier_sync_state = APPLIER_DISCONNECTED;
>> +applier_stop(applier);
>> +applier_start(applier);
>> +replica_set_applier(replica, applier);
>> +applier_resume_to_state(applier, APPLIER_CONNECTED, TIMEOUT_INFINITY);
>> +}
>> +/* Choose a master to send register request to. */
>> +struct replica *master = replicaset_leader();
>> +assert(master != NULL && master->applier != NULL);
>> +struct applier *master_applier = master->applier;
>> +applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY);
>> +applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY);
>> +applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY);
>> +applier_resume(master_applier);
>> +/**
>> + * Restart other appliers to
>> + * resend non-anonymous subscribe.
>> + */
>> +replicaset_foreach(replica) {
>> +if (replica == master || replica->applier == NULL)
>> +continue;
>> +applier_resume(replica->applier);
>> +}
>
>5. Consider this refactoring:
>
>================================================================================
>
>diff --git a/src/box/box.cc b/src/box/box.cc
>index efffa654f..aa9221aaa 100644
>--- a/src/box/box.cc
>+++ b/src/box/box.cc
>@@ -804,15 +804,13 @@ box_set_replication_anon(void)
> applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY);
> applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY);
> applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY);
>-applier_resume(master_applier);
> /**
>- * Restart other appliers to
>- * resend non-anonymous subscribe.
>+ * Restart appliers to resend non-anonymous
>+ * subscribe.
>  */
> replicaset_foreach(replica) {
>-if (replica == master || replica->applier == NULL)
>-continue;
>-applier_resume(replica->applier);
>+if (replica->applier != NULL)
>+applier_resume(replica->applier);
> }
> } else if (!is_box_configured) {
> replication_anon = anon;
>
>================================================================================

I've found an error in this piece of code during review. I assumed that
appliers paused themselves on successful connect, hence the applier_resume
for every one of them below. This wasn't true. The trigger to pause appliers on
connect is set only during bootstrap, which isn't our case. So I've rewritten this part.
Remote instance ballots are not updated anymore, unfortunately, but I think we can
live with that.

===========================================================
diff --git a/src/box/box.cc b/src/box/box.cc
index 4e41f6b76..d7aa7fb49 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -782,10 +782,9 @@ box_set_replication_anon(void)
                replication_anon = anon;
                /*
                 * Reset all appliers. This will interrupt
-                * anonymous follow they're in and also update
-                * corresponding instance ballots so that we can
-                * use the latest info when choosing a replica to
-                * register on.
+                 * anonymous follow they're in so that one of
+                 * them can register and others resend a
+                 * non-anonymous subscribe.
                 */
                replicaset_foreach(replica) {
                        struct applier *applier = replica->applier;
@@ -795,12 +794,14 @@ box_set_replication_anon(void)
                        applier_stop(applier);
                        replica->applier_sync_state = APPLIER_DISCONNECTED;
                        replica_set_applier(replica, applier);
-                       applier_start(applier);
                }
                /* Choose a master to send register request to. */
                struct replica *master = replicaset_leader();
                assert(master != NULL && master->applier != NULL);
                struct applier *master_applier = master->applier;
+
+                applier_start(master_applier);
+
                applier_resume_to_state(master_applier, APPLIER_REGISTER, TIMEOUT_INFINITY);
                applier_resume_to_state(master_applier, APPLIER_REGISTERED, TIMEOUT_INFINITY);
                applier_resume_to_state(master_applier, APPLIER_READY, TIMEOUT_INFINITY);
@@ -810,9 +811,8 @@ box_set_replication_anon(void)
                 * resend non-anonymous subscribe.
                 */
                replicaset_foreach(replica) {
-                       if (replica == master || replica->applier == NULL)
-                               continue;
-                       applier_resume(replica->applier);
+                        if (replica != master && replica->applier)
+                                applier_start(replica->applier);
                }
        } else if (!is_box_configured) {
                replication_anon = anon;

===========================================================

>
>
>Why do you call resume on all of them, if you just called
>applier_start() and didn't call pause? 

No need to call applier_resume, true. Mentioned above.


>
>
>> +} else if (!is_box_configured) {
>> +replication_anon = anon;
>> +} else {
>> +/*
>> + * It is forbidden to turn a normal replica into
>> + * an anonymous one.
>> + */
>> +tnt_raise(ClientError, ER_CFG, "replication_anon",
>> +  "cannot be turned on after bootstrap"
>> +  " has finished");
>> +}
>> +
>> +}
>> +
>>  void
>>  box_listen(void)
>>  {
>> @@ -1379,6 +1468,132 @@ box_process_auth(struct auth_request *request, const char *salt)
>>  authenticate(user, len, salt, request->scramble);
>>  }
>> 
>> +void
>> +box_process_fetch_snapshot(struct ev_io *io, struct xrow_header *header)
>> +{
>> +
>
>6. Extra empty line.

Fixed.

>
>> +assert(header->type == IPROTO_FETCH_SNAPSHOT);
>> +
>> +/* Check that bootstrap has been finished */
>> +if (!is_box_configured)
>> +tnt_raise(ClientError, ER_LOADING);
>> +
>> +/* Check permissions */
>> +access_check_universe_xc(PRIV_R);
>> +
>> +/* Forbid replication with disabled WAL */
>> +if (wal_mode() == WAL_NONE) {
>> +tnt_raise(ClientError, ER_UNSUPPORTED, "Replication",
>> +  "wal_mode = 'none'");
>> +}
>> +
>> +say_info("sending current read-view to replica at %s", sio_socketname(io->fd));
>> +
>> +/* Send the snapshot data to the instance. */
>> +struct vclock start_vclock;
>> +relay_initial_join(io->fd, header->sync, &start_vclock);
>> +say_info("read-view sent.");
>> +
>> +/* Remember master's vclock after the last request */
>> +struct vclock stop_vclock;
>> +vclock_copy(&stop_vclock, &replicaset.vclock);
>> +
>> +/* Send end of snapshot data marker */
>> +struct xrow_header row;
>> +xrow_encode_vclock_xc(&row, &stop_vclock);
>> +row.sync = header->sync;
>> +coio_write_xrow(io, &row);
>> +}
>> +
>> +void
>> +box_process_register(struct ev_io *io, struct xrow_header *header)
>> +{
>> +assert(header->type == IPROTO_REGISTER);
>> +
>> +struct tt_uuid instance_uuid = uuid_nil;
>> +struct vclock vclock;
>> +xrow_decode_register_xc(header, &instance_uuid, &vclock);
>> +
>> +if (!is_box_configured)
>> +tnt_raise(ClientError, ER_LOADING);
>> +
>> +if (tt_uuid_is_equal(&instance_uuid, &INSTANCE_UUID))
>> +tnt_raise(ClientError, ER_CONNECTION_TO_SELF);
>> +
>> +/* Forbid replication from an anonymous instance. */
>> +if (replication_anon) {
>> +tnt_raise(ClientError, ER_UNSUPPORTED, "Replication",
>> +  "replicating from an anonymous instance.");
>> +}
>> +
>> +access_check_universe_xc(PRIV_R);
>> +/* We only get register requests from anonymous instances. */
>> +struct replica *replica = replica_by_uuid(&instance_uuid);
>> +assert(replica == NULL || replica->id == REPLICA_ID_NIL);
>
>7. What if I send a malicious packet with a not existing instance
>UUID?

Fixed, will throw an error now.

>
>> +/* See box_process_join() */
>> +box_check_writable_xc();
>> +struct space *space = space_cache_find_xc(BOX_CLUSTER_ID);
>> +access_check_space_xc(space, PRIV_W);
>> +
>> +/* Forbid replication with disabled WAL */
>> +if (wal_mode() == WAL_NONE) {
>> +tnt_raise(ClientError, ER_UNSUPPORTED, "Replication",
>> +  "wal_mode = 'none'");
>> +}
>> +
>> +/*
>> + * Register the replica as a WAL consumer so that
>> + * it can resume FINAL JOIN where INITIAL JOIN ends.
>> + */
>> +struct gc_consumer *gc = gc_consumer_register(&replicaset.vclock,
>> +"replica %s", tt_uuid_str(&instance_uuid));
>> +if (gc == NULL)
>> +diag_raise();
>> +auto gc_guard = make_scoped_guard([&] { gc_consumer_unregister(gc); });
>> +
>> +say_info("registering replica %s at %s",
>> + tt_uuid_str(&instance_uuid), sio_socketname(io->fd));
>> +
>> +struct vclock start_vclock;
>> +vclock_copy(&start_vclock, &replicaset.vclock);
>> +
>> +/**
>> + * Call the server-side hook which stores the replica uuid
>> + * in _cluster space.
>> + */
>> +box_on_join(&instance_uuid);
>> +
>> +ERROR_INJECT_YIELD(ERRINJ_REPLICA_JOIN_DELAY);
>> +
>> +/* Remember master's vclock after the last request */
>> +struct vclock stop_vclock;
>> +vclock_copy(&stop_vclock, &replicaset.vclock);
>> +
>> +/*
>> + * Feed replica with WALs in range (start_vclock, stop_vclock)
>> + * so that it gets its registration.
>> + */
>
>8. Lets keep 66 border.

Fixed.

>
>> +relay_final_join(io->fd, header->sync, &start_vclock, &stop_vclock);
>> +say_info("final data sent.");
>> +
>> +struct xrow_header row;
>> +/* Send end of WAL stream marker */
>> +xrow_encode_vclock_xc(&row, &replicaset.vclock);
>> +row.sync = header->sync;
>> +coio_write_xrow(io, &row);
>> +
>> +/*
>> + * Advance the WAL consumer state to the position where
>> + * FINAL JOIN ended and assign it to the replica.
>> + */
>> +gc_consumer_advance(gc, &stop_vclock);
>> +replica = replica_by_uuid(&instance_uuid);
>> +if (replica->gc != NULL)
>> +gc_consumer_unregister(replica->gc);
>
>9. How is that possible that the former anon replica has gc?

Anon replica receives a gc registration in replica_set_id, which is called
above. So I can remove this piece of code altogether. But I don't know whether
I should do it or not. It seems more verbose if I leave this part consistent with
box_process_join. Otherwise it's not clear where does the replica receive the
gc registration.
Another option is to remove the piece and add a comment stating that previously
anon replica is registered with gc in _cluster on_commit trigger.
Which one do you prefer? I've left the current variant intact for now.

>
>> +replica->gc = gc;
>> +gc_guard.is_active = false;
>> +}
>> +
>>  void
>>  box_process_join(struct ev_io *io, struct xrow_header *header)
>>  {
>> @@ -1533,27 +1754,39 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
>>  if (!is_box_configured)
>>  tnt_raise(ClientError, ER_LOADING);
>> 
>> +
>
>10. Extra empty line.

Fixed.

>
>>  struct tt_uuid replica_uuid = uuid_nil;
>>  struct vclock replica_clock;
>>  uint32_t replica_version_id;
>>  vclock_create(&replica_clock);
>> +bool anon;
>>  xrow_decode_subscribe_xc(header, NULL, &replica_uuid,
>> - &replica_clock, &replica_version_id);
>> + &replica_clock, &replica_version_id, &anon);
>> 
>>  /* Forbid connection to itself */
>>  if (tt_uuid_is_equal(&replica_uuid, &INSTANCE_UUID))
>>  tnt_raise(ClientError, ER_CONNECTION_TO_SELF);
>> 
>> +/* Forbid replication from an anonymous instance. */
>> +if (replication_anon) {
>> +tnt_raise(ClientError, ER_UNSUPPORTED, "Replication",
>> +  "replicating from an anonymous instance.");
>> +}
>> +
>>  /* Check permissions */
>>  access_check_universe_xc(PRIV_R);
>> 
>>  /* Check replica uuid */
>>  struct replica *replica = replica_by_uuid(&replica_uuid);
>> -if (replica == NULL || replica->id == REPLICA_ID_NIL) {
>> +
>> +if (!anon && (replica == NULL || replica->id == REPLICA_ID_NIL)) {
>>  tnt_raise(ClientError, ER_UNKNOWN_REPLICA,
>>    tt_uuid_str(&replica_uuid),
>>    tt_uuid_str(&REPLICASET_UUID));
>>  }
>> +if (replica == NULL) {
>> +replica = replicaset_add_anon(&replica_uuid);
>> +}
>
>11. I propose to omit {} when 'if' body consists of one line.
>In all places, in all commits.

No problem.

>
>> 
>>  /* Don't allow multiple relays for the same replica */
>>  if (relay_get_state(replica->relay) == RELAY_FOLLOW) {
>> @@ -1774,13 +2007,16 @@ bootstrap_from_master(struct replica *master)
>>   */
>> 
>>  assert(!tt_uuid_is_nil(&INSTANCE_UUID));
>> -applier_resume_to_state(applier, APPLIER_INITIAL_JOIN, TIMEOUT_INFINITY);
>> -
>> +enum applier_state wait_state = replication_anon ? APPLIER_FETCH_SNAPSHOT :
>> +      APPLIER_INITIAL_JOIN;
>> +applier_resume_to_state(applier, wait_state, TIMEOUT_INFINITY);
>>  /*
>>   * Process initial data (snapshot or dirty disk data).
>>   */
>>  engine_begin_initial_recovery_xc(NULL);
>> -applier_resume_to_state(applier, APPLIER_FINAL_JOIN, TIMEOUT_INFINITY);
>> +wait_state = replication_anon ? APPLIER_FETCHED_SNAPSHOT :
>> +   APPLIER_FINAL_JOIN;
>> +applier_resume_to_state(applier, wait_state, TIMEOUT_INFINITY);
>
>12. Please, fix indentation and keep 80 border in this
>hunk.

Done.

>
>> 
>>  /*
>>   * Process final data (WALs).> diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
>> index 5e8a7d483..cc8dd7cd7 100644
>> --- a/src/box/iproto_constants.h
>> +++ b/src/box/iproto_constants.h
>> @@ -216,6 +218,10 @@ enum iproto_type {
>>  IPROTO_VOTE_DEPRECATED = 67,
>>  /** Vote request command for master election */
>>  IPROTO_VOTE = 68,
>> +/** Anonymous replication FETCH SNAPSHOT */
>> +IPROTO_FETCH_SNAPSHOT = 69,
>> +/** REGISTER request to leave anonymous replication */
>
>13. Please, keep 66 and put a dot in the end of sentence.

Done.

>
>> +IPROTO_REGISTER = 70,
>> 
>>  /** Vinyl run info stored in .index file */
>>  VY_INDEX_RUN_INFO = 100,
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index e849fcf4f..14644716d 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -569,11 +569,17 @@ relay_subscribe_f(va_list ap)
>>  cbus_pair("tx", relay->endpoint.name, &relay->tx_pipe,
>>    &relay->relay_pipe, NULL, NULL, cbus_process);
>> 
>> -/* Setup garbage collection trigger. */
>> +/*
>> + * Setup garbage collection trigger.
>> + * Not needed for anonymous replicas, since they
>> + * aren't registered with gc at all.
>> + */
>
>14. If a master does not register an anon replica as a gc consumer,
>it will remove xlogs even if the replica didn't get them yet. Not
>sure, if we want that behaviour. AFAIU, purpose of anon replicas
>is to break the limit on vlock size about 32 instances only.

True, but if we do register replicas as gc consumers there is no way
to exclude them, so such a replica, in case of failure, will stall gc on master
forever. We need some mechanism, to remove anonymous replicas from gc
consumers. Normal replicas can be removed by deleting their entries from
_cluster, we cannot do this for anonymous replicas. And if we just remove
replica from gc consumer on every disconnect, there is no point in registering
it with gc at all, since the point of gc consumer, among others, is to wait for
dead replicas to reconnect and collect xlogs.

>
>>  struct trigger on_close_log = {
>>  RLIST_LINK_INITIALIZER, relay_on_close_log_f, relay, NULL
>>  };
>> -trigger_add(&r->on_close_log, &on_close_log);
>> +if (!relay->replica->anon) {
>> +trigger_add(&r->on_close_log, &on_close_log);
>> +}
>> 
>>  /* Setup WAL watcher for sending new rows to the replica. */
>>  wal_set_watcher(&relay->wal_watcher, relay->endpoint.name,
>> diff --git a/src/box/xrow.c b/src/box/xrow.c
>> index 18bf08971..37a565bcb 100644
>> --- a/src/box/xrow.c
>> +++ b/src/box/xrow.c
>> @@ -1182,7 +1213,7 @@ xrow_encode_subscribe(struct xrow_header *row,
>>  int
>>  xrow_decode_subscribe(struct xrow_header *row, struct tt_uuid *replicaset_uuid,
>>        struct tt_uuid *instance_uuid, struct vclock *vclock,
>> -      uint32_t *version_id)
>> +      uint32_t *version_id, bool *anon)
>>  {
>>  if (row->bodycnt == 0) {
>>  diag_set(ClientError, ER_INVALID_MSGPACK, "request body");
>> @@ -1245,6 +1276,16 @@ xrow_decode_subscribe(struct xrow_header *row, struct tt_uuid *replicaset_uuid,
>>  }
>>  *version_id = mp_decode_uint(&d);
>>  break;
>> +case IPROTO_REPLICA_ANON:
>> +if (anon == NULL)
>> +goto skip;
>> +if (mp_typeof(*d) != MP_BOOL) {
>> +xrow_on_decode_err(data, end, ER_INVALID_MSGPACK,
>> +   "invalid REPLICA_ANON flag");
>> +return -1;
>> +}
>> +*anon = mp_decode_bool(&d);
>
>15. Anon is not initialized in case IPROTO_REPLICA_ANON was not specified.
>box_process_subscribe() can fail on that, because it passes a not initialized
>anon variable to xrow_decode_subscribe().

Fixed.

>
>> +break;
>>  default: skip:
>>  mp_next(&d); /* value */
>>  }
>> diff --git a/test/replication/anon.result b/test/replication/anon.result
>> new file mode 100644
>> index 000000000..df84484b2
>> --- /dev/null
>> +++ b/test/replication/anon.result
>> @@ -0,0 +1,259 @@
>> +-- test-run result file version 2
>> +env = require('test_run')
>> + | ---
>> + | ...
>> +vclock_diff = require('fast_replica').vclock_diff
>> + | ---
>> + | ...
>> +test_run = env.new()
>> + | ---
>> + | ...
>> +
>> +-- prepare master
>
>16. Lets give a reference to the ticket, and use capital letters
>+ dots in all the sentences.

Done.

>
>> +box.schema.user.grant('guest', 'replication')
>> + | ---
>> + | ...
>> +_ = box.schema.space.create('loc', {is_local=true})
>> + | ---
>> + | ...
>> +_ = box.schema.space.create('temp', {temporary=true})
>> + | ---
>> + | ...
>> +_ = box.schema.space.create('test')
>> + | ---
>> + | ...
>> +_ = box.space.loc:create_index('pk')
>> + | ---
>> + | ...
>> +_ = box.space.temp:create_index('pk')
>> + | ---
>> + | ...
>> +_ = box.space.test:create_index('pk')
>> + | ---
>> + | ...
>> +box.space.test:insert{1}
>> + | ---
>> + | - [1]
>> + | ...
>> +
>> +test_run:cmd('create server replica_anon with rpl_master=default, script="replication/anon.lua"')
>> + | ---
>> + | - true
>> + | ...
>> +test_run:cmd('start server replica_anon')
>> + | ---
>> + | - true
>> + | ...
>> +test_run:cmd('switch replica_anon')
>> + | ---
>> + | - true
>> + | ...
>> +
>> +box.info.status
>> + | ---
>> + | - running
>> + | ...
>> +box.info.id
>> + | ---
>> + | - 0
>> + | ...
>> +box.info.lsn
>> + | ---
>> + | - 0
>> + | ...
>> +test_run:wait_upstream(1, {status='follow'})
>> + | ---
>> + | - true
>> + | ...
>> +
>> +-- Temporary spaces are accessible as read / write.
>> +for i = 1,10 do box.space.temp:insert{i} end
>> + | ---
>> + | ...
>> +box.space.temp:select{}
>> + | ---
>> + | - - [1]
>> + |   - [2]
>> + |   - [3]
>> + |   - [4]
>> + |   - [5]
>> + |   - [6]
>> + |   - [7]
>> + |   - [8]
>> + |   - [9]
>> + |   - [10]
>> + | ...
>> +
>> +box.info.lsn
>> + | ---
>> + | - 0
>> + | ...
>> +
>> +-- Same for local spaces.
>> +for i = 1,10 do box.space.loc:insert{i} end
>> + | ---
>> + | ...
>> +box.space.loc:select{}
>> + | ---
>> + | - - [1]
>> + |   - [2]
>> + |   - [3]
>> + |   - [4]
>> + |   - [5]
>> + |   - [6]
>> + |   - [7]
>> + |   - [8]
>> + |   - [9]
>> + |   - [10]
>> + | ...
>> +
>> +-- Replica-local changes are accounted for in 0 vclock component.
>> +box.info.lsn
>> + | ---
>> + | - 10
>> + | ...
>> +box.info.vclock[0]
>> + | ---
>> + | - 10
>> + | ...
>> +
>> +-- Replica is read-only.
>> +box.cfg.read_only
>> + | ---
>> + | - true
>> + | ...
>> +box.cfg{read_only=false}
>> + | ---
>> + | - error: 'Incorrect value for option ''read_only'': the value may be set to false
>> + |     only when replication_anon is false'
>> + | ...
>> +
>> +box.space.test:insert{2}
>> + | ---
>> + | - error: Can't modify data because this instance is in read-only mode.
>> + | ...
>> +
>> +box.space.loc:drop()
>> + | ---
>> + | - error: Can't modify data because this instance is in read-only mode.
>> + | ...
>> +box.space.loc:truncate()
>> + | ---
>> + | - error: Can't modify data because this instance is in read-only mode.
>> + | ...
>> +
>> +test_run:cmd('switch default')
>> + | ---
>> + | - true
>> + | ...
>> +
>> +-- Replica isn't visible on master.
>> +#box.info.replication
>> + | ---
>> + | - 1
>> + | ...
>> +
>> +test_run:cmd('switch replica_anon')
>> + | ---
>> + | - true
>> + | ...
>> +
>> +-- Promote anonymous replica.
>> +box.cfg{replication_anon=false}
>> + | ---
>> + | ...
>> +-- Cannot switch back after becoming "normal".
>> +box.cfg{replication_anon=true}
>> + | ---
>> + | - error: 'Incorrect value for option ''replication_anon'': cannot be turned on after
>> + |     bootstrap has finished'
>> + | ...
>> +
>> +box.info.id
>> + | ---
>> + | - 2
>> + | ...
>> +#box.info.replication
>> + | ---
>> + | - 2
>> + | ...
>> +test_run:wait_upstream(1, {status='follow'})
>> + | ---
>> + | - true
>> + | ...
>> +box.info.replication.downstream
>> + | ---
>> + | - null
>> + | ...
>> +
>> +old_lsn = box.info.vclock[2] or 0
>> + | ---
>> + | ...
>> +
>> +-- Now read_only can be turned off.
>> +box.cfg{read_only=false}
>> + | ---
>> + | ...
>> +box.space.test:insert{2}
>> + | ---
>> + | - [2]
>> + | ...
>> +-- New changes are tracked under freshly assigned id.
>> +box.info.vclock[2] == old_lsn + 1
>> + | ---
>> + | - true
>> + | ...
>> +
>> +test_run:cmd('switch default')
>> + | ---
>> + | - true
>> + | ...
>> +
>> +-- Other instances may replicate from a previously-anonymous one.
>> +test_run:cmd("set variable repl_source to 'replica_anon.listen'")
>> + | ---
>> + | - true
>> + | ...
>> +box.cfg{replication=repl_source}
>> + | ---
>> + | ...
>> +#box.info.replication
>> + | ---
>> + | - 2
>> + | ...
>> +test_run:wait_upstream(2, {status='follow'})
>> + | ---
>> + | - true
>> + | ...
>> +test_run:wait_downstream(2, {status='follow'})
>> + | ---
>> + | - true
>> + | ...
>> +#box.info.vclock
>> + | ---
>> + | - 2
>> + | ...
>> +
>> +-- cleanup
>> +box.cfg{replication=""}
>> + | ---
>> + | ...
>> +test_run:cmd('stop server replica_anon with cleanup=1')
>> + | ---
>> + | - true
>> + | ...
>> +box.space.test:drop()
>> + | ---
>> + | ...
>> +box.space.temp:drop()
>> + | ---
>> + | ...
>> +box.space.loc:drop()
>> + | ---
>> + | ...
>> +box.schema.user.revoke('guest', 'replication')
>> + | ---
>> + | ...
>> +test_run:cleanup_cluster()
>> + | ---
>> + | ...

I guess it's getting pretty hard to track all the changes here, so I'll resend
v2.

-- 
Sergey Petrenko


More information about the Tarantool-patches mailing list