From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 23 Aug 2018 22:26:32 +0300 From: Vladimir Davydov Subject: Re: [PATCH v2] replication: fix exit with ER_NO_SUCH_USER during bootstrap Message-ID: <20180823192632.5ctuewsg6r6krexe@esperanza> References: <20180823145616.97810-1-sergepetrenko@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180823145616.97810-1-sergepetrenko@tarantool.org> To: Serge Petrenko Cc: tarantool-patches@freelists.org List-ID: On Thu, Aug 23, 2018 at 05:56:16PM +0300, Serge Petrenko wrote: > diff --git a/src/box/applier.cc b/src/box/applier.cc > index dbb4d05f9..02ec729bc 100644 > --- a/src/box/applier.cc > +++ b/src/box/applier.cc > @@ -602,8 +602,12 @@ applier_f(va_list ap) > applier_log_error(applier, e); > applier_disconnect(applier, APPLIER_LOADING); > goto reconnect; > - } else if (e->errcode() == ER_ACCESS_DENIED) { > - /* Invalid configuration */ > + } else if (e->errcode() == ER_ACCESS_DENIED || > + e->errcode() == ER_NO_SUCH_USER) { > + /* > + * Invalid configuration or box.once() > + * hasn't finished on bootstrap master. > + */ Please leave the old comment. box.once() is just one way to configure a cluster. No need to mention it explicitly here. > applier_log_error(applier, e); > applier_disconnect(applier, APPLIER_DISCONNECTED); > goto reconnect; > diff --git a/test/replication/autobootstrap.test.lua b/test/replication/autobootstrap.test.lua > index 752d5f317..bbf6f6512 100644 > --- a/test/replication/autobootstrap.test.lua > +++ b/test/replication/autobootstrap.test.lua > @@ -108,3 +108,13 @@ _ = test_run:cmd("switch default") > -- Stop servers > -- > test_run:drop_cluster(SERVERS) > + > +-- > +-- Test case for gh-3637. Before the fix replica would exit with > +-- an error. Now check that we don't hang and successfully connect. > +-- > +SERVERS = {'replica_bootstrap1', 'replica_bootstrap2'} > +test_run:create_cluster(SERVERS, "replication", {args="0.1"}) > +test_run:wait_fullmesh(SERVERS) > +print("ok") This is redundant, obviously. > +test_run:drop_cluster(SERVERS) > diff --git a/test/replication/replica_bootstrap.lua b/test/replication/replica_bootstrap.lua > new file mode 100644 > index 000000000..9a6c5f4a0 > --- /dev/null > +++ b/test/replication/replica_bootstrap.lua > @@ -0,0 +1,30 @@ > +#!/usr/bin/env tarantool > + > +local INSTANCE_ID = string.match(arg[0], "%d") > +local USER = 'cluster' > +local PASSWORD = 'pass' > +local SOCKET_DIR = require('fio').cwd() > +local TIMEOUT = tonumber(arg[1]) > +local CON_TIMEOUT = arg[2] and tonumber(arg[2]) or 100 * 365 * 86400 > + > +local function instance_uri(instance_id) > + return SOCKET_DIR..'/replica_bootstrap'..instance_id..'.sock' > +end > + > +require('console').listen(os.getenv('ADMIN')) > + > +box.cfg({ > + listen=instance_uri(INSTANCE_ID), > + replication={ > + USER..':'..PASSWORD..'@'..instance_uri(1), > + USER..':'..PASSWORD..'@'..instance_uri(2) > + }, > + replication_timeout=TIMEOUT, > + replication_connect_timeout=CON_TIMEOUT > +}) > + > +box.once('bootstrap', function() > + require('fiber').sleep(3 * TIMEOUT) > + box.schema.user.create(USER, {password=PASSWORD}) > + box.schema.user.grant(USER, 'replication') > +end) I don't like this new script, because, despite its generic name and arguments, we are unlikely to be able to ever reuse it for any other test. I suggest you try to rewrite the test without creating a cluster of two nodes. I think you could instead connect a single replica to the default server (the one that runs the test). Then you would call the script 'replica_auth.lua' and pass the user:password pair in the first argument and the timeouts in the second and the third arguments. It would look more generic and reusable then, similar to 'replica_quorum.lua'.