From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Fri, 24 Aug 2018 15:54:18 +0300
From: Vladimir Davydov
Subject: Re: [PATCH v3] replication: fix exit with ER_NO_SUCH_USER during bootstrap
Message-ID: <20180824125418.7wm7ylwr3j4nsgyt@esperanza>
References: <20180824115645.43531-1-sergepetrenko@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180824115645.43531-1-sergepetrenko@tarantool.org>
To: Serge Petrenko
Cc: tarantool-patches@freelists.org
List-ID:

On Fri, Aug 24, 2018 at 02:56:45PM +0300, Serge Petrenko wrote:
> When replication is configured via some user created in box.once()
> function and box.once() takes more than replication_timeout seconds
> to execute, appliers recieve ER_NO_SUCH_USER error, which they don't
> handle. This leads to occasional test failures in replication suite.
> Fix this by handling the aforementioned case in applier_f() and add a
> test case.
>
> Closes #3637
> ---
> https://github.com/tarantool/tarantool/issues/3637
> https://github.com/tarantool/tarantool/tree/sp/gh-3637-replication-tests-fix

The issue was reassigned to 1.9. Please rebase.

> diff --git a/test/replication/autobootstrap.test.lua b/test/replication/autobootstrap.test.lua
> index 752d5f317..21417a738 100644
> --- a/test/replication/autobootstrap.test.lua
> +++ b/test/replication/autobootstrap.test.lua

IMHO we'd better put this small test case in replication/misc, because
it's not really about cluster autobootstrap.

> @@ -108,3 +108,31 @@ _ = test_run:cmd("switch default")
>  -- Stop servers
>  --
>  test_run:drop_cluster(SERVERS)
> +
> +--
> +-- Test case for gh-3637. Before the fix replica would exit with
> +-- an error. Now check that we don't hang and successfully connect.
> +--
> +fiber = require("fiber")
> +
> +test_run:cmd("setopt delimiter ';'")
> +function wait_replica()
> +    while box.info.replication[2] == nil do
> +        fiber.sleep(0.01)
> +    end
> +end;
> +test_run:cmd("setopt delimiter ''");

Please use test_run:wait_vclock instead.

> +
> +test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
> +test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.1'")

I'd set the timeout to 0.01 to make sure replica tries to reconnect
several times before it succeeds.

> +-- Wait a bit to make sure replica waits till user is created.
> +fiber.sleep(0.1)
> +box.schema.user.create('cluster', {password='pass'})
> +box.schema.user.grant('cluster', 'replication')
> +wait_replica()
> +
> +test_run:cmd("stop server replica_auth")
> +test_run:cmd("cleanup server replica_auth")
> +test_run:cmd("delete server replica_auth")
> +
> +box.schema.user.drop('cluster')
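
To put rough numbers on the timeout remark above, here is a minimal
sketch in plain Lua. It is a simulated model, not Tarantool code: it
assumes the replica retries exactly once per timeout interval, that an
attempt takes no time, and that the user appears 0.1 s after the replica
starts (the fiber.sleep(0.1) in the test). The function name
count_failed_attempts and its parameters are hypothetical. Times are in
integer milliseconds to avoid floating-point drift.

```lua
-- Simulated model only: counts connection attempts that fail before the
-- user exists. Nothing here calls into Tarantool or test_run.
-- count_failed_attempts and its parameters are hypothetical names.
function count_failed_attempts(retry_interval_ms, user_created_at_ms)
    local now_ms = 0
    local failed = 0
    -- An attempt fails (think ER_NO_SUCH_USER) while the user is missing.
    while now_ms < user_created_at_ms do
        failed = failed + 1
        now_ms = now_ms + retry_interval_ms
    end
    return failed
end

-- Timeout 0.1 s, user created after 0.1 s.
print(count_failed_attempts(100, 100))
-- Timeout 0.01 s, user created after 0.1 s.
print(count_failed_attempts(10, 100))
```

Under these assumptions the first call prints 1 and the second prints 10,
so with the shorter timeout the retry path is exercised repeatedly rather
than once.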