From: Vladimir Davydov <vdavydov.dev@gmail.com> To: Sergey Petrenko <sergepetrenko@tarantool.org> Cc: Georgy Kirichenko <georgy@tarantool.org>, tarantool-patches@freelists.org Subject: Re: [tarantool-patches] Re: [PATCH v2] replication: do not ignore replication_connect_quorum. Date: Thu, 9 Aug 2018 12:11:01 +0300 [thread overview] Message-ID: <20180809091101.hzsxrym3zgndgmgz@esperanza> (raw) In-Reply-To: <34F1BCA9-2C61-4C3A-939D-F90E745B349D@tarantool.org> On Thu, Aug 09, 2018 at 10:50:46AM +0300, Sergey Petrenko wrote: > >> @@ -587,15 +587,13 @@ replicaset_connect(struct applier **appliers, int count, > >> double wait_start = ev_monotonic_now(loop()); > >> if (fiber_cond_wait_timeout(&state.wakeup, timeout) != 0) > >> break; > >> - if (state.failed > 0 && connect_all) > >> - break; > > > > I guess you should break the loop if > > > > state.failed > count - min_count_to_connect > > AFAIU, if replication_timeout is less than replication_connect_timeout, the appliers, which have > failed, will have time to try and reconnect during replicaset_connect(). So failing here is essentially ignoring > replication_connect_timeout. No, replicaset_connect_state.failed is only incremented on unrecoverable errors, when an applier stops - see applier_on_connect_f, applier_f. > >> diff --git a/test/replication-py/master.lua b/test/replication-py/master.lua > >> index 0f9f7a6f0..51283efdf 100644 > >> --- a/test/replication-py/master.lua > >> +++ b/test/replication-py/master.lua > >> @@ -3,6 +3,8 @@ os = require('os') > >> box.cfg({ > >> listen = os.getenv("LISTEN"), > >> memtx_memory = 107374182, > >> + replication_connect_timeout = 1.0, > >> + replication_timeout = 0.3 > > > > Why do you need to adjust the timeouts? > > The timeouts set in all cfg files in all the tests had no effect, just like the > replication_connect_quorum option, cos both of the options were ignored > During bootstrap. We wanted every replica to connect, and the timeout was > Set to TIMEOUT_INFINITY. Now when we actually start passing > replication_connect_timeout, all these timeouts become too small. Some tests now take too long because of this. E.g. replication/quorum runs for 15 seconds for memtx+vinyl without this patch, which is already long enough, but with this patch it takes more than 30 seconds. We need to do something about that. > >> diff --git a/test/replication/autobootstrap_guest.lua b/test/replication/autobootstrap_guest.lua > >> index 40fef2c7a..7cd921e3c 100644 > >> --- a/test/replication/autobootstrap_guest.lua > >> +++ b/test/replication/autobootstrap_guest.lua > >> @@ -20,7 +20,7 @@ box.cfg({ > >> instance_uri(2); > >> instance_uri(3); > >> }; > >> - replication_connect_timeout = 0.5, > >> + replication_connect_timeout = 5, > > > > Why do you use different timeouts in different tests? > > I tried to find the lowest possible boundary in every test. It happens so that they differ. > I believe, any finite timeout is better that the infinite one. I'd like to see some system behind choosing timeouts. Currently, they look random :-( > diff --git a/src/box/replication.h b/src/box/replication.h > index 95122eb45..9ce9910f8 100644 > --- a/src/box/replication.h > +++ b/src/box/replication.h > @@ -356,13 +356,13 @@ replicaset_add(uint32_t replica_id, const struct tt_uuid *instance_uuid); > * > * \param appliers the array of appliers > * \param count size of appliers array > - * \param timeout connection timeout > - * \param connect_all if this flag is set, fail unless all > - * appliers have successfully connected > + * \param connect_quorum if this flag is set, fail unless at > + * least replication_connect_quorum Malformed indentation: this line should start exactly below the parameter description ('if this flag is set ...') - it simply looks better that way. Please fix. > + * appliers have successfully connected. > */ > void > replicaset_connect(struct applier **appliers, int count, > - double timeout, bool connect_all); > + bool connect_quorum); > > /** > * Resume all appliers registered with the replica set. > diff --git a/test/replication/errinj.test.lua b/test/replication/errinj.test.lua > index 463d89a8f..f00b98eed 100644 > --- a/test/replication/errinj.test.lua > +++ b/test/replication/errinj.test.lua > @@ -173,7 +173,7 @@ errinj.set("ERRINJ_RELAY_EXIT_DELAY", 0) > box.cfg{replication_timeout = 0.01} > > test_run:cmd("create server replica_timeout with rpl_master=default, script='replication/replica_timeout.lua'") > -test_run:cmd("start server replica_timeout with args='0.01'") > +test_run:cmd("start server replica_timeout with args='0.1, 0.5'") This won't work as expected - arguments are supposed to be delimited by space, not by comma. That is, TIMEOUT will turn into null in the script. Also, I think you should write some tests that check that the quorum is not ignored on bootstrap and on replication reconfiguration anymore.
next prev parent reply other threads:[~2018-08-09 9:11 UTC|newest] Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-08-07 12:44 [tarantool-patches] " Serge Petrenko 2018-08-07 14:18 ` [tarantool-patches] " Georgy Kirichenko 2018-08-07 17:28 ` [tarantool-patches] " Vladimir Davydov 2018-08-09 7:50 ` [tarantool-patches] " Sergey Petrenko 2018-08-09 9:11 ` Vladimir Davydov [this message] 2018-08-09 12:20 ` Vladimir Davydov 2018-08-09 12:39 ` Serge Petrenko 2018-08-10 9:14 ` [PATCH v3] " Serge Petrenko 2018-08-10 10:48 ` Vladimir Davydov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20180809091101.hzsxrym3zgndgmgz@esperanza \ --to=vdavydov.dev@gmail.com \ --cc=georgy@tarantool.org \ --cc=sergepetrenko@tarantool.org \ --cc=tarantool-patches@freelists.org \ --subject='Re: [tarantool-patches] Re: [PATCH v2] replication: do not ignore replication_connect_quorum.' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox