Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: Sergey Petrenko <sergepetrenko@tarantool.org>
Cc: Georgy Kirichenko <georgy@tarantool.org>,
	tarantool-patches@freelists.org
Subject: Re: [tarantool-patches] Re: [PATCH v2] replication: do not ignore replication_connect_quorum.
Date: Thu, 9 Aug 2018 12:11:01 +0300	[thread overview]
Message-ID: <20180809091101.hzsxrym3zgndgmgz@esperanza> (raw)
In-Reply-To: <34F1BCA9-2C61-4C3A-939D-F90E745B349D@tarantool.org>

On Thu, Aug 09, 2018 at 10:50:46AM +0300, Sergey Petrenko wrote:
> >> @@ -587,15 +587,13 @@ replicaset_connect(struct applier **appliers, int count,
> >> 		double wait_start = ev_monotonic_now(loop());
> >> 		if (fiber_cond_wait_timeout(&state.wakeup, timeout) != 0)
> >> 			break;
> >> -		if (state.failed > 0 && connect_all)
> >> -			break;
> > 
> > I guess you should break the loop if
> > 
> >  state.failed > count - min_count_to_connect
> 
> AFAIU, if replication_timeout is less than replication_connect_timeout, the appliers, which have
> failed, will have time to try and reconnect during replicaset_connect(). So failing here is essentially ignoring
> replication_connect_timeout.

No, replicaset_connect_state.failed is only incremented on unrecoverable
errors, when an applier stops - see applier_on_connect_f, applier_f.

> >> diff --git a/test/replication-py/master.lua b/test/replication-py/master.lua
> >> index 0f9f7a6f0..51283efdf 100644
> >> --- a/test/replication-py/master.lua
> >> +++ b/test/replication-py/master.lua
> >> @@ -3,6 +3,8 @@ os = require('os')
> >> box.cfg({
> >>     listen              = os.getenv("LISTEN"),
> >>     memtx_memory        = 107374182,
> >> +    replication_connect_timeout = 1.0,
> >> +    replication_timeout = 0.3
> > 
> > Why do you need to adjust the timeouts?
> 
> The timeouts set in all cfg files in all the tests had no effect, just like the
> replication_connect_quorum option, cos both of the options were ignored
> During bootstrap. We wanted every replica to connect, and the timeout was
> Set to TIMEOUT_INFINITY. Now when we actually start passing
> replication_connect_timeout, all these timeouts become too small.

Some tests now take too long because of this. E.g. replication/quorum
runs for 15 seconds for memtx+vinyl without this patch, which is already
long enough, but with this patch it takes more than 30 seconds. We need
to do something about that.

> >> diff --git a/test/replication/autobootstrap_guest.lua b/test/replication/autobootstrap_guest.lua
> >> index 40fef2c7a..7cd921e3c 100644
> >> --- a/test/replication/autobootstrap_guest.lua
> >> +++ b/test/replication/autobootstrap_guest.lua
> >> @@ -20,7 +20,7 @@ box.cfg({
> >>         instance_uri(2);
> >>         instance_uri(3);
> >>     };
> >> -    replication_connect_timeout = 0.5,
> >> +    replication_connect_timeout = 5,
> > 
> > Why do you use different timeouts in different tests?
> 
> I tried to find the lowest possible boundary in every test. It happens so that they differ.
> I believe, any finite timeout is better that the infinite one.

I'd like to see some system behind choosing timeouts.
Currently, they look random :-(

> diff --git a/src/box/replication.h b/src/box/replication.h
> index 95122eb45..9ce9910f8 100644
> --- a/src/box/replication.h
> +++ b/src/box/replication.h
> @@ -356,13 +356,13 @@ replicaset_add(uint32_t replica_id, const struct tt_uuid *instance_uuid);
>   *
>   * \param appliers the array of appliers
>   * \param count size of appliers array
> - * \param timeout connection timeout
> - * \param connect_all if this flag is set, fail unless all
> - *                    appliers have successfully connected
> + * \param connect_quorum if this flag is set, fail unless at
> + *		       least replication_connect_quorum

Malformed indentation: this line should start exactly below the
parameter description ('if this flag is set ...') - it simply looks
better that way. Please fix.

> + *		       appliers have successfully connected.
>   */
>  void
>  replicaset_connect(struct applier **appliers, int count,
> -		   double timeout, bool connect_all);
> +		   bool connect_quorum);
>  
>  /**
>   * Resume all appliers registered with the replica set.

> diff --git a/test/replication/errinj.test.lua b/test/replication/errinj.test.lua
> index 463d89a8f..f00b98eed 100644
> --- a/test/replication/errinj.test.lua
> +++ b/test/replication/errinj.test.lua
> @@ -173,7 +173,7 @@ errinj.set("ERRINJ_RELAY_EXIT_DELAY", 0)
>  box.cfg{replication_timeout = 0.01}
>  
>  test_run:cmd("create server replica_timeout with rpl_master=default, script='replication/replica_timeout.lua'")
> -test_run:cmd("start server replica_timeout with args='0.01'")
> +test_run:cmd("start server replica_timeout with args='0.1, 0.5'")

This won't work as expected - arguments are supposed to be delimited by
space, not by comma. That is, TIMEOUT will turn into null in the script.

Also, I think you should write some tests that check that the quorum is
not ignored on bootstrap and on replication reconfiguration anymore.

  reply	other threads:[~2018-08-09  9:11 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-07 12:44 [tarantool-patches] " Serge Petrenko
2018-08-07 14:18 ` [tarantool-patches] " Georgy Kirichenko
2018-08-07 17:28 ` [tarantool-patches] " Vladimir Davydov
2018-08-09  7:50   ` [tarantool-patches] " Sergey Petrenko
2018-08-09  9:11     ` Vladimir Davydov [this message]
2018-08-09 12:20       ` Vladimir Davydov
2018-08-09 12:39         ` Serge Petrenko
2018-08-10  9:14           ` [PATCH v3] " Serge Petrenko
2018-08-10 10:48             ` Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180809091101.hzsxrym3zgndgmgz@esperanza \
    --to=vdavydov.dev@gmail.com \
    --cc=georgy@tarantool.org \
    --cc=sergepetrenko@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='Re: [tarantool-patches] Re: [PATCH v2] replication: do not ignore replication_connect_quorum.' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox