* [PATCH v3] replication: fix exit with ER_NO_SUCH_USER during bootstrap
@ 2018-08-24 11:56 Serge Petrenko
2018-08-24 12:54 ` Vladimir Davydov
2018-08-24 16:32 ` Vladimir Davydov
0 siblings, 2 replies; 4+ messages in thread
From: Serge Petrenko @ 2018-08-24 11:56 UTC
To: vdavydov.dev; +Cc: tarantool-patches, Serge Petrenko
When replication is configured via some user created in box.once()
function and box.once() takes more than replication_timeout seconds
to execute, appliers receive ER_NO_SUCH_USER error, which they don't
handle. This leads to occasional test failures in replication suite.
Fix this by handling the aforementioned case in applier_f() and add a
test case.
Closes #3637
---
https://github.com/tarantool/tarantool/issues/3637
https://github.com/tarantool/tarantool/tree/sp/gh-3637-replication-tests-fix
Changes in v3:
- rewrite test case to be more versatile.
- go back to old comments in applier_f().
Changes in v2:
- add a test and ensure new relevant lines are covered.
- merge the ER_NO_SUCH_USER case with ER_ACCESS_DENIED due to similarity.
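For readers skimming the thread, the merged condition can be pictured as a small predicate. This is only a sketch: `ErrCode` and `is_retryable_auth_error` are hypothetical names that do not exist in the tree, and it assumes the ER_ACCESS_DENIED branch ends in a reconnect like the branch above it, which the hunk context does not show.

```cpp
// Hypothetical stand-ins for the server's error codes. The names mirror
// the patch, but this enum is illustrative and not taken from the tree.
enum class ErrCode { AccessDenied, NoSuchUser, Other };

// Returns true when the applier would log the error, disconnect, and
// retry (assumed behaviour of the branch) rather than propagate it.
bool is_retryable_auth_error(ErrCode code) {
    return code == ErrCode::AccessDenied || code == ErrCode::NoSuchUser;
}
```

With the patch applied, a missing user is handled the same way as a denied one, so the replica keeps retrying until the master has created the user.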
src/box/applier.cc | 3 +-
test/replication/autobootstrap.result | 58 +++++++++++++++++++++++++++++++++
test/replication/autobootstrap.test.lua | 28 ++++++++++++++++
test/replication/replica_auth.lua | 14 ++++++++
4 files changed, 102 insertions(+), 1 deletion(-)
create mode 100644 test/replication/replica_auth.lua
diff --git a/src/box/applier.cc b/src/box/applier.cc
index dbb4d05f9..28df8f7ca 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -602,7 +602,8 @@ applier_f(va_list ap)
applier_log_error(applier, e);
applier_disconnect(applier, APPLIER_LOADING);
goto reconnect;
- } else if (e->errcode() == ER_ACCESS_DENIED) {
+ } else if (e->errcode() == ER_ACCESS_DENIED ||
+ e->errcode() == ER_NO_SUCH_USER) {
/* Invalid configuration */
applier_log_error(applier, e);
applier_disconnect(applier, APPLIER_DISCONNECTED);
diff --git a/test/replication/autobootstrap.result b/test/replication/autobootstrap.result
index 91badc1f1..ed904672d 100644
--- a/test/replication/autobootstrap.result
+++ b/test/replication/autobootstrap.result
@@ -231,3 +231,61 @@ _ = test_run:cmd("switch default")
test_run:drop_cluster(SERVERS)
---
...
+--
+-- Test case for gh-3637. Before the fix replica would exit with
+-- an error. Now check that we don't hang and successfully connect.
+--
+fiber = require("fiber")
+---
+...
+test_run:cmd("setopt delimiter ';'")
+---
+- true
+...
+function wait_replica()
+ while box.info.replication[2] == nil do
+ fiber.sleep(0.01)
+ end
+end;
+---
+...
+test_run:cmd("setopt delimiter ''");
+---
+- true
+...
+test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
+---
+- true
+...
+test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.1'")
+---
+- true
+...
+-- Wait a bit to make sure replica waits till user is created.
+fiber.sleep(0.1)
+---
+...
+box.schema.user.create('cluster', {password='pass'})
+---
+...
+box.schema.user.grant('cluster', 'replication')
+---
+...
+wait_replica()
+---
+...
+test_run:cmd("stop server replica_auth")
+---
+- true
+...
+test_run:cmd("cleanup server replica_auth")
+---
+- true
+...
+test_run:cmd("delete server replica_auth")
+---
+- true
+...
+box.schema.user.drop('cluster')
+---
+...
diff --git a/test/replication/autobootstrap.test.lua b/test/replication/autobootstrap.test.lua
index 752d5f317..21417a738 100644
--- a/test/replication/autobootstrap.test.lua
+++ b/test/replication/autobootstrap.test.lua
@@ -108,3 +108,31 @@ _ = test_run:cmd("switch default")
-- Stop servers
--
test_run:drop_cluster(SERVERS)
+
+--
+-- Test case for gh-3637. Before the fix replica would exit with
+-- an error. Now check that we don't hang and successfully connect.
+--
+fiber = require("fiber")
+
+test_run:cmd("setopt delimiter ';'")
+function wait_replica()
+ while box.info.replication[2] == nil do
+ fiber.sleep(0.01)
+ end
+end;
+test_run:cmd("setopt delimiter ''");
+
+test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
+test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.1'")
+-- Wait a bit to make sure replica waits till user is created.
+fiber.sleep(0.1)
+box.schema.user.create('cluster', {password='pass'})
+box.schema.user.grant('cluster', 'replication')
+wait_replica()
+
+test_run:cmd("stop server replica_auth")
+test_run:cmd("cleanup server replica_auth")
+test_run:cmd("delete server replica_auth")
+
+box.schema.user.drop('cluster')
diff --git a/test/replication/replica_auth.lua b/test/replication/replica_auth.lua
new file mode 100644
index 000000000..22ba9146c
--- /dev/null
+++ b/test/replication/replica_auth.lua
@@ -0,0 +1,14 @@
+#!/usr/bin/env tarantool
+
+local USER_PASS = arg[1]
+local TIMEOUT = arg[2] and tonumber(arg[2]) or 0.1
+local CON_TIMEOUT = arg[3] and tonumber(arg[3]) or 30.0
+
+require('console').listen(os.getenv('ADMIN'))
+
+box.cfg({
+ listen = os.getenv("LISTEN"),
+ replication = USER_PASS .. "@" .. os.getenv("MASTER"),
+ replication_timeout = TIMEOUT,
+ replication_connect_timeout = CON_TIMEOUT
+})
--
2.15.2 (Apple Git-101.1)
* Re: [PATCH v3] replication: fix exit with ER_NO_SUCH_USER during bootstrap
From: Vladimir Davydov @ 2018-08-24 12:54 UTC
To: Serge Petrenko; +Cc: tarantool-patches
On Fri, Aug 24, 2018 at 02:56:45PM +0300, Serge Petrenko wrote:
> When replication is configured via some user created in box.once()
> function and box.once() takes more than replication_timeout seconds
> to execute, appliers receive ER_NO_SUCH_USER error, which they don't
> handle. This leads to occasional test failures in replication suite.
> Fix this by handling the aforementioned case in applier_f() and add a
> test case.
>
> Closes #3637
> ---
> https://github.com/tarantool/tarantool/issues/3637
> https://github.com/tarantool/tarantool/tree/sp/gh-3637-replication-tests-fix
The issue was reassigned to 1.9. Please rebase.
> diff --git a/test/replication/autobootstrap.test.lua b/test/replication/autobootstrap.test.lua
> index 752d5f317..21417a738 100644
> --- a/test/replication/autobootstrap.test.lua
> +++ b/test/replication/autobootstrap.test.lua
IMHO we'd better put this small test case in replication/misc, because
it's not really about cluster autobootstrap.
> @@ -108,3 +108,31 @@ _ = test_run:cmd("switch default")
> -- Stop servers
> --
> test_run:drop_cluster(SERVERS)
> +
> +--
> +-- Test case for gh-3637. Before the fix replica would exit with
> +-- an error. Now check that we don't hang and successfully connect.
> +--
> +fiber = require("fiber")
> +
> +test_run:cmd("setopt delimiter ';'")
> +function wait_replica()
> + while box.info.replication[2] == nil do
> + fiber.sleep(0.01)
> + end
> +end;
> +test_run:cmd("setopt delimiter ''");
Please use test_run:wait_vclock instead.
> +
> +test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
> +test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.1'")
I'd set the timeout to 0.01 to make sure replica tries to reconnect
several times before it succeeds.
> +-- Wait a bit to make sure replica waits till user is created.
> +fiber.sleep(0.1)
> +box.schema.user.create('cluster', {password='pass'})
> +box.schema.user.grant('cluster', 'replication')
> +wait_replica()
> +
> +test_run:cmd("stop server replica_auth")
> +test_run:cmd("cleanup server replica_auth")
> +test_run:cmd("delete server replica_auth")
> +
> +box.schema.user.drop('cluster')
* Re: [PATCH v3] replication: fix exit with ER_NO_SUCH_USER during bootstrap
From: Serge Petrenko @ 2018-08-24 16:15 UTC
To: Vladimir Davydov; +Cc: tarantool-patches
Hi, I pushed all fixes to the branch.
> On 24 Aug 2018, at 15:54, Vladimir Davydov <vdavydov.dev@gmail.com> wrote:
>
> On Fri, Aug 24, 2018 at 02:56:45PM +0300, Serge Petrenko wrote:
>> When replication is configured via some user created in box.once()
>> function and box.once() takes more than replication_timeout seconds
>> to execute, appliers receive ER_NO_SUCH_USER error, which they don't
>> handle. This leads to occasional test failures in replication suite.
>> Fix this by handling the aforementioned case in applier_f() and add a
>> test case.
>>
>> Closes #3637
>> ---
>> https://github.com/tarantool/tarantool/issues/3637
>> https://github.com/tarantool/tarantool/tree/sp/gh-3637-replication-tests-fix
>
> The issue was reassigned to 1.9. Please rebase.
Done.
>
>> diff --git a/test/replication/autobootstrap.test.lua b/test/replication/autobootstrap.test.lua
>> index 752d5f317..21417a738 100644
>> --- a/test/replication/autobootstrap.test.lua
>> +++ b/test/replication/autobootstrap.test.lua
>
> IMHO we'd better put this small test case in replication/misc, because
> it's not really about cluster autobootstrap.
Ok, I moved the case to replication/misc.
>
>> @@ -108,3 +108,31 @@ _ = test_run:cmd("switch default")
>> -- Stop servers
>> --
>> test_run:drop_cluster(SERVERS)
>> +
>> +--
>> +-- Test case for gh-3637. Before the fix replica would exit with
>> +-- an error. Now check that we don't hang and successfully connect.
>> +--
>> +fiber = require("fiber")
>> +
>> +test_run:cmd("setopt delimiter ';'")
>> +function wait_replica()
>> + while box.info.replication[2] == nil do
>> + fiber.sleep(0.01)
>> + end
>> +end;
>> +test_run:cmd("setopt delimiter ''");
>
> Please use test_run:wait_vclock instead.
It doesn't work: we cannot start replica_auth with wait_load=True, because it
won't load until we create the user and grant it privileges. Since we don't wait
for the replica to load, test_run:wait_vclock() sometimes fails with an error,
because the instance hasn't had time to load and has no vclock yet.
So I inlined the old while loop and removed the function declaration. I also added
a small delay afterwards to make sure the replica fully loads.
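As an aside, the polling pattern used by the inlined loop can be written with an upper bound on the wait. The following is a standalone C++ sketch of that idea, not code from the test suite; `wait_until` is a hypothetical helper, and the deadline is an addition that the loop in the test does not have.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Polls `ready` every `step` until it returns true or `timeout` elapses.
// Returns true if the condition was met, false if the deadline passed.
bool wait_until(const std::function<bool()> &ready,
                std::chrono::milliseconds timeout,
                std::chrono::milliseconds step = std::chrono::milliseconds(10)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!ready()) {
        if (std::chrono::steady_clock::now() >= deadline)
            return false;
        std::this_thread::sleep_for(step);
    }
    return true;
}
```

A bounded wait turns a hang into an explicit failure, at the cost of having to pick a timeout that is generous enough for slow test machines.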
>
>> +
>> +test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
>> +test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.1'")
>
> I'd set the timeout to 0.01 to make sure replica tries to reconnect
> several times before it succeeds.
>
Ok. Set to 0.03; with 0.01 the replica reaches the timeout before applying any rows and never loads.
>> +-- Wait a bit to make sure replica waits till user is created.
>> +fiber.sleep(0.1)
>> +box.schema.user.create('cluster', {password='pass'})
>> +box.schema.user.grant('cluster', 'replication')
>> +wait_replica()
>> +
>> +test_run:cmd("stop server replica_auth")
>> +test_run:cmd("cleanup server replica_auth")
>> +test_run:cmd("delete server replica_auth")
>> +
>> +box.schema.user.drop('cluster')
* Re: [PATCH v3] replication: fix exit with ER_NO_SUCH_USER during bootstrap
From: Vladimir Davydov @ 2018-08-24 16:32 UTC
To: Serge Petrenko; +Cc: tarantool-patches
Pushed to 1.9, here's the final version:
From 33950162f3e766d413567cd75aaa7e6c384831bd Mon Sep 17 00:00:00 2001
From: Serge Petrenko <sergepetrenko@tarantool.org>
Date: Thu, 23 Aug 2018 14:08:51 +0300
Subject: [PATCH] replication: fix exit with ER_NO_SUCH_USER during bootstrap
When replication is configured via some user created in box.once()
function and box.once() takes more than replication_timeout seconds
to execute, appliers receive ER_NO_SUCH_USER error, which they don't
handle. This leads to occasional test failures in replication suite.
Fix this by handling the aforementioned case in applier_f() and add a
test case.
Closes #3637
diff --git a/src/box/applier.cc b/src/box/applier.cc
index b9f041d8..16a87389 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -596,7 +596,8 @@ applier_f(va_list ap)
applier_log_error(applier, e);
applier_disconnect(applier, APPLIER_LOADING);
goto reconnect;
- } else if (e->errcode() == ER_ACCESS_DENIED) {
+ } else if (e->errcode() == ER_ACCESS_DENIED ||
+ e->errcode() == ER_NO_SUCH_USER) {
/* Invalid configuration */
applier_log_error(applier, e);
applier_disconnect(applier, APPLIER_DISCONNECTED);
diff --git a/test/replication/misc.result b/test/replication/misc.result
index 9df2a2c4..76e7fd5e 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -232,3 +232,55 @@ test_run:drop_cluster(SERVERS)
box.schema.user.revoke('guest', 'replication')
---
...
+--
+-- Test case for gh-3637. Before the fix replica would exit with
+-- an error. Now check that we don't hang and successfully connect.
+--
+fiber = require('fiber')
+---
+...
+test_run:cleanup_cluster()
+---
+...
+test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
+---
+- true
+...
+test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.05'")
+---
+- true
+...
+-- Wait a bit to make sure replica waits till user is created.
+fiber.sleep(0.1)
+---
+...
+box.schema.user.create('cluster', {password='pass'})
+---
+...
+box.schema.user.grant('cluster', 'replication')
+---
+...
+while box.info.replication[2] == nil do fiber.sleep(0.01) end
+---
+...
+vclock = test_run:get_vclock('default')
+---
+...
+_ = test_run:wait_vclock('replica_auth', vclock)
+---
+...
+test_run:cmd("stop server replica_auth")
+---
+- true
+...
+test_run:cmd("cleanup server replica_auth")
+---
+- true
+...
+test_run:cmd("delete server replica_auth")
+---
+- true
+...
+box.schema.user.drop('cluster')
+---
+...
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index 979c5d58..c60adf5a 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -91,3 +91,28 @@ test_run:cmd("switch default")
test_run:drop_cluster(SERVERS)
box.schema.user.revoke('guest', 'replication')
+
+--
+-- Test case for gh-3637. Before the fix replica would exit with
+-- an error. Now check that we don't hang and successfully connect.
+--
+fiber = require('fiber')
+
+test_run:cleanup_cluster()
+
+test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
+test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.05'")
+-- Wait a bit to make sure replica waits till user is created.
+fiber.sleep(0.1)
+box.schema.user.create('cluster', {password='pass'})
+box.schema.user.grant('cluster', 'replication')
+
+while box.info.replication[2] == nil do fiber.sleep(0.01) end
+vclock = test_run:get_vclock('default')
+_ = test_run:wait_vclock('replica_auth', vclock)
+
+test_run:cmd("stop server replica_auth")
+test_run:cmd("cleanup server replica_auth")
+test_run:cmd("delete server replica_auth")
+
+box.schema.user.drop('cluster')
diff --git a/test/replication/replica_auth.lua b/test/replication/replica_auth.lua
new file mode 100644
index 00000000..22ba9146
--- /dev/null
+++ b/test/replication/replica_auth.lua
@@ -0,0 +1,14 @@
+#!/usr/bin/env tarantool
+
+local USER_PASS = arg[1]
+local TIMEOUT = arg[2] and tonumber(arg[2]) or 0.1
+local CON_TIMEOUT = arg[3] and tonumber(arg[3]) or 30.0
+
+require('console').listen(os.getenv('ADMIN'))
+
+box.cfg({
+ listen = os.getenv("LISTEN"),
+ replication = USER_PASS .. "@" .. os.getenv("MASTER"),
+ replication_timeout = TIMEOUT,
+ replication_connect_timeout = CON_TIMEOUT
+})