Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: Olga Arkhangelskaia <krishtal.olja@gmail.com>
Cc: tarantool-patches@freelists.org
Subject: Re: [tarantool-patches] [PATCH v6 3/3] box: adds replication sync after cfg. update
Date: Thu, 30 Aug 2018 19:41:49 +0300	[thread overview]
Message-ID: <20180830164149.7kmvyo4xncgdjpxu@esperanza> (raw)
In-Reply-To: <20180830141114.4531-3-krishtal.olja@gmail.com>

On Thu, Aug 30, 2018 at 05:11:14PM +0300, Olga Arkhangelskaia wrote:
> diff --git a/src/box/box.cc b/src/box/box.cc
> index dcedfd002..e54a79467 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -646,6 +646,12 @@ box_set_replication(void)
>  	box_sync_replication(true);
>  	/* Follow replica */
>  	replicaset_follow();
> +	/* Set orphan and sync replica up to quorum.
> +	 * If we fail to sync up, replica will be left in orphan state.
> +	 */
> +	is_orphan = true;
> +	title("orphan");

It isn't enough. You should also wake up fibers waiting on ro_cond, take
a look at box_clear_orphan(). Also, I think that it's worth
encapsulating this code in function box_set_orphan().

> +	replicaset_sync();

If the instance happens to be synchronized when this function is called,
it will still enter orphan state briefly and then leave it in a while.
While in orphan state, it might reject some rw requests. I asked Kostja
and Georgy about that. Both of them agreed that it isn't good. Ideally
we shouldn't enter orphan state, even briefly, if the instance doesn't
need to wait for appliers to synchronize. In other words, if vclock
received by at least replication_connect_quorum appliers in reply to
SUBSCRIBE command is less than replicaset.vclock we shouldn't enter
orphan state.

May be, I'm missing something. You might want to talk to Kostja and/or
Georgy about that.

>  }
>  
>  void
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index be58b0225..d85700b78 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -700,6 +700,7 @@ replicaset_sync(void)
>  
>  	say_crit("replica set sync complete, quorum of %d "
>  		 "replicas formed", quorum);
> +	return;

Pointless change.

>  }
>  
>  void
> diff --git a/src/box/replication.h b/src/box/replication.h
> index a6f1dbf69..64f6e7f97 100644
> --- a/src/box/replication.h
> +++ b/src/box/replication.h
> @@ -378,10 +378,9 @@ void
>  replicaset_follow(void);
>  
>  /**
> - * Wait until a replication quorum is formed.
> - * Return immediately if a quorum cannot be
> - * formed because of errors.
> + * Wait until a replication quorum is formed and sync up with it.
>   */
> +

Why did you change the comment?

>  void
>  replicaset_sync(void);
>  

> diff --git a/test/replication/sync.test.lua b/test/replication/sync.test.lua

Tests didn't pass on Travis CI:

https://travis-ci.org/tarantool/tarantool/builds/422570554?utm_source=github_status&utm_medium=notification

> new file mode 100644
> index 000000000..0c4fff483
> --- /dev/null
> +++ b/test/replication/sync.test.lua
> @@ -0,0 +1,71 @@
> +fiber = require('fiber')
> +--
> +-- gh-3427: no sync after configuration update
> +--
> +
> +--
> +-- successful sync
> +--
> +
> +env = require('test_run')
> +test_run = env.new()
> +engine = test_run:get_cfg('engine')
> +
> +box.schema.user.grant('guest', 'replication')
> +
> +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
> +test_run:cmd("start server replica")
> +
> +s = box.schema.space.create('test', {engine = engine})
> +index = s:create_index('primary')
> +
> +-- change replica configuration
> +test_run:cmd("switch replica")
> +replication = box.cfg.replication
> +box.cfg{replication={}}
> +
> +test_run:cmd("switch default")
> +-- insert values on the master while replica is unconfigured
> +box.begin() for i = 1, 100 do box.space.test:insert{i, i} end box.commit()
> +box.space.test:count()
> +
> +test_run:cmd("switch replica")
> +box.cfg{replication = replication}
> +box.space.test:count() == 100
> +
> +--
> +-- unsuccessful sync entering orphan state
> +--
> +box.cfg{replication={}}
> +box.cfg{replication_sync_timeout = 0.000001}

This looks flimsy. Should test-run stall for a millisecond, and the test
would fail. I'd use error injections to make sure it doesn't happen.

> +
> +test_run:cmd("switch default")
> +-- insert values on the master while replica is unconfigured
> +box.begin() for i = 101, 200 do box.space.test:insert{i, i} end box.commit()
> +
> +test_run:cmd("switch replica")
> +box.cfg{replication = replication}
> +box.info.status
> +require'fiber'.sleep(0.1)
> +box.info.status
> +
> +--
> +-- replication_sync_lag is too big
> +--
> +
> +box.cfg{replication_sync_lag = 100}

If I change the lag to 1 or 0.1 or 0.0000001, the test still passes,
i.e. it doesn't check that this option actually works.

> +
> +test_run:cmd("switch default")
> +
> +function f () box.begin() for i = 201, 500 do box.space.test:insert{i, i} end box.commit(); end

These numbers (100, 200, 500) look random. Using error injection would
let you make them look reasonable.

> +_=fiber.create(f)
> +
> +test_run:cmd("switch replica")
> +box.space.test:count() < 500
> +
> +test_run:cmd("switch default")
> +-- cleanup
> +test_run:cmd("stop server replica")
> +test_run:cmd("cleanup server replica")
> +box.space.test:drop()
> +box.schema.user.revoke('guest', 'replication')

You don't test that the instance enters orphan mode and that it
eventually leaves it.

      reply	other threads:[~2018-08-30 16:41 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-30 14:11 [tarantool-patches] [PATCH 1/3] box: make replication_sync_lag option dynamic Olga Arkhangelskaia
2018-08-30 14:11 ` [tarantool-patches] [PATCH v2 2/3] box: add replication_sync_timeout Olga Arkhangelskaia
2018-08-30 16:11   ` Vladimir Davydov
2018-08-30 14:11 ` [tarantool-patches] [PATCH v6 3/3] box: adds replication sync after cfg. update Olga Arkhangelskaia
2018-08-30 16:41   ` Vladimir Davydov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180830164149.7kmvyo4xncgdjpxu@esperanza \
    --to=vdavydov.dev@gmail.com \
    --cc=krishtal.olja@gmail.com \
    --cc=tarantool-patches@freelists.org \
    --subject='Re: [tarantool-patches] [PATCH v6 3/3] box: adds replication sync after cfg. update' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox