From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 99C5843D67A
	for ; Fri, 18 Oct 2019 00:11:32 +0300 (MSK)
From: Vladislav Shpilevoy
Date: Thu, 17 Oct 2019 23:16:37 +0200
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH 1/1] replication: auto reconnect if password is invalid
To: tarantool-patches@dev.tarantool.org
Cc: tarantool-patches@freelists.org

Before the patch there was a race in replication password
configuration. A replica could connect to a master with a custom
password before that password was actually set on the master. The
replica treated the resulting error as critical and exited.

In fact the error is not critical. A replica can already withstand
the absence of a user and keeps reconnecting. The wrong password
situation arises from the same problem of non-atomic configuration
and is fixed the same way: the replica keeps making reconnect
attempts if the password was wrong.

Closes #4550
---
Branch: https://github.com/tarantool/tarantool/tree/gerold103/gh-4550-replication-password-cfg

 src/box/applier.cc             |  4 +++-
 test/replication/misc.result   | 16 +++++++++++++---
 test/replication/misc.test.lua | 12 +++++++++---
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 6239fcfd3..7d4a670d7 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -101,6 +101,7 @@ applier_log_error(struct applier *applier, struct error *e)
 	case ER_NO_SUCH_USER:
 	case ER_SYSTEM:
 	case ER_UNKNOWN_REPLICA:
+	case ER_PASSWORD_MISMATCH:
 		say_info("will retry every %.2lf second",
 			 replication_reconnect_interval());
 		break;
@@ -979,7 +980,8 @@ applier_f(va_list ap)
 				goto reconnect;
 			} else if (e->errcode() == ER_CFG ||
 				   e->errcode() == ER_ACCESS_DENIED ||
-				   e->errcode() == ER_NO_SUCH_USER) {
+				   e->errcode() == ER_NO_SUCH_USER ||
+				   e->errcode() == ER_PASSWORD_MISMATCH) {
 				/* Invalid configuration */
 				applier_log_error(applier, e);
 				applier_disconnect(applier, APPLIER_LOADING);
diff --git a/test/replication/misc.result b/test/replication/misc.result
index f7098aac8..b63d72846 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -374,8 +374,10 @@ test_run:cleanup_cluster()
 ---
 ...
 --
--- Test case for gh-3637. Before the fix replica would exit with
--- an error. Now check that we don't hang and successfully connect.
+-- Test case for gh-3637, gh-4550. Before the fix replica would
+-- exit with an error if a user does not exist or a password is
+-- incorrect. Now check that we don't hang/panic and successfully
+-- connect.
 --
 fiber = require('fiber')
 ---
@@ -392,7 +394,15 @@ test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='
 fiber.sleep(0.1)
 ---
 ...
-box.schema.user.create('cluster', {password='pass'})
+box.schema.user.create('cluster')
+---
+...
+-- The user is created. Let the replica fail auth request due to
+-- a wrong password.
+fiber.sleep(0.1)
+---
+...
+box.schema.user.passwd('cluster', 'pass')
 ---
 ...
 box.schema.user.grant('cluster', 'replication')
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index c4ddbdb47..c454a0992 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -153,15 +153,21 @@ test_run:cmd('delete server er_load2')
 test_run:cleanup_cluster()
 
 --
--- Test case for gh-3637. Before the fix replica would exit with
--- an error. Now check that we don't hang and successfully connect.
+-- Test case for gh-3637, gh-4550. Before the fix replica would
+-- exit with an error if a user does not exist or a password is
+-- incorrect. Now check that we don't hang/panic and successfully
+-- connect.
 --
 fiber = require('fiber')
 test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
 test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.05'")
 -- Wait a bit to make sure replica waits till user is created.
 fiber.sleep(0.1)
-box.schema.user.create('cluster', {password='pass'})
+box.schema.user.create('cluster')
+-- The user is created. Let the replica fail auth request due to
+-- a wrong password.
+fiber.sleep(0.1)
+box.schema.user.passwd('cluster', 'pass')
 box.schema.user.grant('cluster', 'replication')
 
 while box.info.replication[2] == nil do fiber.sleep(0.01) end
-- 
2.21.0 (Apple Git-122)
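
For readers who want the idea in isolation, the error classification
that the applier.cc hunks extend can be sketched outside of Tarantool.
This is only an illustration under stated assumptions: ErrCode,
is_retriable_config_error() and attempts_until_connected() are
hypothetical names invented for the sketch, not Tarantool APIs, and
treating every other error code as fatal is a simplification that does
not reflect the remaining branches of applier_f().

```cpp
#include <vector>

// Hypothetical stand-ins for the error codes named in the patch.
// The real constants (ER_CFG, ER_ACCESS_DENIED, ...) are defined by
// Tarantool; this enum exists only to keep the sketch self-contained.
enum class ErrCode {
    Cfg,
    AccessDenied,
    NoSuchUser,
    PasswordMismatch,
    Other,
};

// Returns true if the error can be a transient effect of non-atomic
// configuration on the master, so the replica should keep reconnecting.
// Assumption of this sketch: any other code is treated as fatal.
bool is_retriable_config_error(ErrCode code)
{
    switch (code) {
    case ErrCode::Cfg:
    case ErrCode::AccessDenied:
    case ErrCode::NoSuchUser:
    case ErrCode::PasswordMismatch: // the case added by this patch
        return true;
    default:
        return false;
    }
}

// Replays a scripted list of connection failures and assumes that the
// attempt following the last failure succeeds. Returns the total number
// of attempts made, or -1 if a non-retriable error stops the replica.
int attempts_until_connected(const std::vector<ErrCode> &failures)
{
    int attempts = 0;
    for (ErrCode code : failures) {
        ++attempts;
        if (!is_retriable_config_error(code))
            return -1;
    }
    // One more attempt, which is assumed to succeed.
    return attempts + 1;
}
```

A sequence resembling the updated test (user missing, then wrong
password, then possibly no 'replication' grant yet) would be
{NoSuchUser, PasswordMismatch, AccessDenied}: under the sketch's
assumptions each failure leads to another attempt instead of an exit.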