From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Thu, 23 Aug 2018 16:29:05 +0300
From: Vladimir Davydov
Subject: Re: [PATCH] replication: fix exit with ER_NO_SUCH_USER during bootstrap
Message-ID: <20180823132905.qpye5zntutsmznv2@esperanza>
References: <20180823125743.56072-1-sergepetrenko@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180823125743.56072-1-sergepetrenko@tarantool.org>
To: Serge Petrenko
Cc: tarantool-patches@freelists.org
List-ID:

On Thu, Aug 23, 2018 at 03:57:43PM +0300, Serge Petrenko wrote:
> When replication is configured via some user created in box.once()
> function and box.once() takes more than replication_timeout seconds
> to execute, appliers receive ER_NO_SUCH_USER error, which they don't
> handle. This leads to occasional test failures in replication suite.
> Fix this by handling the aforementioned case in applier_f().
>
> Closes #3637
> ---
> https://github.com/tarantool/tarantool/issues/3637
> https://github.com/tarantool/tarantool/tree/sp/gh-3637-replication-tests-fix
>
>  src/box/applier.cc | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index dbb4d05f9..740778e80 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -607,6 +607,14 @@ applier_f(va_list ap)
>  				applier_log_error(applier, e);
>  				applier_disconnect(applier, APPLIER_DISCONNECTED);
>  				goto reconnect;
> +			} else if (e->errcode() == ER_NO_SUCH_USER) {
> +				/*
> +				 * Probably box.once() hasn't finished
> +				 * on bootstrap leader yet.
> +				 */
> +				applier_log_error(applier, e);
> +				applier_disconnect(applier, APPLIER_DISCONNECTED);
> +				goto reconnect;

This piece of code isn't covered by any test, see

https://coveralls.io/builds/18630789/source?filename=src/box/applier.cc#L610

Please add a test case.
I think it should be pretty easy to do: start a replica with a small
value of box.cfg.replication_timeout which tries to connect to the
default instance as a non-existent user, then wait a bit, create the
user, and make sure it finally connects.

Also, I think that you should share the code with the ER_ACCESS_DENIED
case, because these two errors have the same nature - missing or invalid
configuration:

	if (e->errcode() == ER_ACCESS_DENIED ||
	    e->errcode() == ER_NO_SUCH_USER) {
		/* Invalid configuration */
		applier_log_error(applier, e);
		applier_disconnect(applier, APPLIER_DISCONNECTED);
		goto reconnect;
	}
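
To illustrate the idea outside of applier.cc, here is a minimal
standalone sketch of the shared handling together with the scenario the
test should exercise (user missing at first, created later). It is not
the applier code: ErrCode and its values, is_config_error(),
try_connect() and connect_with_retry() are all hypothetical names made
up for this example, and the "master" is just a counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the error codes discussed above; the
// numeric values are placeholders, not the ones Tarantool assigns.
enum ErrCode : uint32_t {
	ERR_OK = 0,
	ERR_ACCESS_DENIED = 1,
	ERR_NO_SUCH_USER = 2,
	ERR_OTHER = 3,
};

// True for errors caused by missing or invalid configuration on the
// master: a later attempt may succeed once the configuration is fixed.
static bool
is_config_error(ErrCode code)
{
	return code == ERR_ACCESS_DENIED || code == ERR_NO_SUCH_USER;
}

// Toy master: authentication fails with ERR_NO_SUCH_USER until the
// 0-based attempt number reaches user_created_at.
static ErrCode
try_connect(int attempt, int user_created_at)
{
	return attempt < user_created_at ? ERR_NO_SUCH_USER : ERR_OK;
}

// Returns the number of attempts it took to connect, or -1 if a
// non-retriable error was hit or max_attempts was exhausted.
static int
connect_with_retry(int user_created_at, int max_attempts)
{
	assert(max_attempts > 0);
	for (int attempt = 0; attempt < max_attempts; attempt++) {
		ErrCode rc = try_connect(attempt, user_created_at);
		if (rc == ERR_OK)
			return attempt + 1;
		if (!is_config_error(rc))
			return -1;
		// Configuration error: this is where the real code would
		// log, disconnect and take the "goto reconnect" path.
	}
	return -1;
}
```

With the user created before the fourth attempt,
connect_with_retry(3, 10) keeps retrying through three
ERR_NO_SUCH_USER failures and then connects, which mirrors what the
requested test case should check on a real replica.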