Tarantool development patches archive
* [PATCH] replication: fix broken cases with quorum=0
@ 2018-04-17 14:07 Konstantin Belyavskiy
  2018-04-17 18:09 ` [tarantool-patches] " Vladislav Shpilevoy
  2018-04-18 10:33 ` Vladimir Davydov
  0 siblings, 2 replies; 4+ messages in thread
From: Konstantin Belyavskiy @ 2018-04-17 14:07 UTC (permalink / raw)
  To: vdavydov, georgy; +Cc: tarantool-patches

Ticket: https://github.com/tarantool/tarantool/issues/3278
branch: https://github.com/tarantool/tarantool/compare/gh-3278-quorum-fix

This commit is a follow-up to 6d81fa99.
With replication_connect_quorum = 0 set, that commit broke replication,
since the applier_resume() and applier_start() parts were skipped.
Fix it and add more test cases.
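
For illustration, the broken scenario amounts to a configuration like the
following (a hedged sketch; the URIs and values are made up, the actual
tests below use unix sockets):

```lua
-- Before this fix: with quorum = 0, replicaset_follow() returned early,
-- so appliers that had failed to connect were never resumed/restarted
-- and the instance could not rejoin its peers.
box.cfg({
    listen = 'localhost:3301',          -- illustrative URI
    replication = {
        'localhost:3301',               -- self
        'localhost:3302',               -- peer, possibly still down
    },
    replication_connect_quorum = 0,     -- do not wait for peers
    replication_connect_timeout = 0.1,
})
-- After the fix, the appliers are resumed/restarted first, and orphan
-- mode is cleared afterwards because the quorum is zero.
```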

Close #3278
---
 src/box/replication.cc              | 13 +++--
 test/replication/master_quorum.lua  | 33 +++++++++++++
 test/replication/master_quorum1.lua |  1 +
 test/replication/master_quorum2.lua |  1 +
 test/replication/quorum.result      | 98 ++++++++++++++++++++++++++++++++++++-
 test/replication/quorum.test.lua    | 33 ++++++++++++-
 6 files changed, 173 insertions(+), 6 deletions(-)
 create mode 100644 test/replication/master_quorum.lua
 create mode 120000 test/replication/master_quorum1.lua
 create mode 120000 test/replication/master_quorum2.lua

diff --git a/src/box/replication.cc b/src/box/replication.cc
index 760f83751..b4d5cc2a2 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -600,11 +600,9 @@ error:
 void
 replicaset_follow(void)
 {
-	if (replicaset.applier.total == 0 || replicaset_quorum() == 0) {
+	if (replicaset.applier.total == 0) {
 		/*
-		 * Replication is not configured or quorum is set to
-		 * zero so in the latter case we have no need to wait
-		 * for others.
+		 * Replication is not configured.
 		 */
 		box_clear_orphan();
 		return;
@@ -619,6 +617,13 @@ replicaset_follow(void)
 		/* Restart appliers that failed to connect. */
 		applier_start(replica->applier);
 	}
+	if (replicaset_quorum() == 0) {
+		/*
+		 * Leaving orphan mode, since
+		 * replication_connect_quorum is set to 0.
+		 */
+		box_clear_orphan();
+	}
 }
 
 void
diff --git a/test/replication/master_quorum.lua b/test/replication/master_quorum.lua
new file mode 100644
index 000000000..fb5f7ec2b
--- /dev/null
+++ b/test/replication/master_quorum.lua
@@ -0,0 +1,33 @@
+#!/usr/bin/env tarantool
+
+-- get instance name from filename (master_quorum1.lua => master_quorum1)
+local INSTANCE_ID = string.match(arg[0], "%d")
+
+local SOCKET_DIR = require('fio').cwd()
+local function instance_uri(instance_id)
+    --return 'localhost:'..(3310 + instance_id)
+    return SOCKET_DIR..'/master_quorum'..instance_id..'.sock';
+end
+
+-- start console first
+require('console').listen(os.getenv('ADMIN'))
+
+box.cfg({
+    listen = instance_uri(INSTANCE_ID);
+--    log_level = 7;
+    replication = {
+        instance_uri(1);
+        instance_uri(2);
+    };
+    replication_connect_quorum = 0;
+    replication_connect_timeout = 0.1;
+})
+
+test_run = require('test_run').new()
+engine = test_run:get_cfg('engine')
+
+box.once("bootstrap", function()
+    box.schema.user.grant("guest", 'replication')
+    box.schema.space.create('test', {engine = engine})
+    box.space.test:create_index('primary')
+end)
diff --git a/test/replication/master_quorum1.lua b/test/replication/master_quorum1.lua
new file mode 120000
index 000000000..07096d4b7
--- /dev/null
+++ b/test/replication/master_quorum1.lua
@@ -0,0 +1 @@
+master_quorum.lua
\ No newline at end of file
diff --git a/test/replication/master_quorum2.lua b/test/replication/master_quorum2.lua
new file mode 120000
index 000000000..07096d4b7
--- /dev/null
+++ b/test/replication/master_quorum2.lua
@@ -0,0 +1 @@
+master_quorum.lua
\ No newline at end of file
diff --git a/test/replication/quorum.result b/test/replication/quorum.result
index 909bfb55b..8f6e7a070 100644
--- a/test/replication/quorum.result
+++ b/test/replication/quorum.result
@@ -245,6 +245,17 @@ test_run:drop_cluster(SERVERS)
 box.schema.user.grant('guest', 'replication')
 ---
 ...
+space = box.schema.space.create('test', {engine = test_run:get_cfg('engine')});
+---
+...
+index = box.space.test:create_index('primary')
+---
+...
+-- Insert something just to check that replica with quorum = 0 works as expected.
+space:insert{1}
+---
+- [1]
+...
 test_run:cmd("create server replica with rpl_master=default, script='replication/replica_no_quorum.lua'")
 ---
 - true
@@ -261,6 +272,10 @@ box.info.status -- running
 ---
 - running
 ...
+box.space.test:select()
+---
+- - [1]
+...
 test_run:cmd("switch default")
 ---
 - true
@@ -291,6 +306,37 @@ test_run:cmd("switch default")
 ---
 - true
 ...
+-- Check that replica is able to reconnect, case was broken with earlier quorum "fix".
+box.cfg{listen = listen}
+---
+...
+space:insert{2}
+---
+- [2]
+...
+vclock = test_run:get_vclock("default")
+---
+...
+_ = test_run:wait_vclock("replica", vclock)
+---
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.info.status -- running
+---
+- running
+...
+box.space.test:select()
+---
+- - [1]
+  - [2]
+...
+test_run:cmd("switch default")
+---
+- true
+...
 test_run:cmd("stop server replica")
 ---
 - true
@@ -299,9 +345,59 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+space:drop()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
-box.cfg{listen = listen}
+-- Second case, check that master-master works.
+SERVERS = {'master_quorum1', 'master_quorum2'}
+---
+...
+-- Deploy a cluster.
+test_run:create_cluster(SERVERS)
+---
+...
+test_run:wait_fullmesh(SERVERS)
+---
+...
+test_run:cmd("switch master_quorum1")
+---
+- true
+...
+repl = box.cfg.replication
+---
+...
+box.cfg{replication = ""}
+---
+...
+box.space.test:insert{1}
+---
+- [1]
+...
+box.cfg{replication = repl}
+---
+...
+vclock = test_run:get_vclock("master_quorum1")
+---
+...
+_ = test_run:wait_vclock("master_quorum2", vclock)
+---
+...
+test_run:cmd("switch master_quorum2")
+---
+- true
+...
+box.space.test:select()
+---
+- - [1]
+...
+test_run:cmd("switch default")
+---
+- true
+...
+-- Cleanup.
+test_run:drop_cluster(SERVERS)
 ---
 ...
diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
index a96dec759..1df0ae1e7 100644
--- a/test/replication/quorum.test.lua
+++ b/test/replication/quorum.test.lua
@@ -103,10 +103,15 @@ test_run:drop_cluster(SERVERS)
 --
 
 box.schema.user.grant('guest', 'replication')
+space = box.schema.space.create('test', {engine = test_run:get_cfg('engine')});
+index = box.space.test:create_index('primary')
+-- Insert something just to check that replica with quorum = 0 works as expected.
+space:insert{1}
 test_run:cmd("create server replica with rpl_master=default, script='replication/replica_no_quorum.lua'")
 test_run:cmd("start server replica")
 test_run:cmd("switch replica")
 box.info.status -- running
+box.space.test:select()
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 listen = box.cfg.listen
@@ -115,7 +120,33 @@ test_run:cmd("start server replica")
 test_run:cmd("switch replica")
 box.info.status -- running
 test_run:cmd("switch default")
+-- Check that replica is able to reconnect, case was broken with earlier quorum "fix".
+box.cfg{listen = listen}
+space:insert{2}
+vclock = test_run:get_vclock("default")
+_ = test_run:wait_vclock("replica", vclock)
+test_run:cmd("switch replica")
+box.info.status -- running
+box.space.test:select()
+test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+space:drop()
 box.schema.user.revoke('guest', 'replication')
-box.cfg{listen = listen}
+-- Second case, check that master-master works.
+SERVERS = {'master_quorum1', 'master_quorum2'}
+-- Deploy a cluster.
+test_run:create_cluster(SERVERS)
+test_run:wait_fullmesh(SERVERS)
+test_run:cmd("switch master_quorum1")
+repl = box.cfg.replication
+box.cfg{replication = ""}
+box.space.test:insert{1}
+box.cfg{replication = repl}
+vclock = test_run:get_vclock("master_quorum1")
+_ = test_run:wait_vclock("master_quorum2", vclock)
+test_run:cmd("switch master_quorum2")
+box.space.test:select()
+test_run:cmd("switch default")
+-- Cleanup.
+test_run:drop_cluster(SERVERS)
-- 
2.14.3 (Apple Git-98)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [tarantool-patches] [PATCH] replication: fix broken cases with quorum=0
  2018-04-17 14:07 [PATCH] replication: fix broken cases with quorum=0 Konstantin Belyavskiy
@ 2018-04-17 18:09 ` Vladislav Shpilevoy
  2018-04-18  8:07   ` [tarantool-patches] " Konstantin Belyavskiy
  2018-04-18 10:33 ` Vladimir Davydov
  1 sibling, 1 reply; 4+ messages in thread
From: Vladislav Shpilevoy @ 2018-04-17 18:09 UTC (permalink / raw)
  To: tarantool-patches, Konstantin Belyavskiy, vdavydov, georgy

Hello, see my comments below.

On 17/04/2018 17:07, Konstantin Belyavskiy wrote:
> Ticket: https://github.com/tarantool/tarantool/issues/3278
> branch: https://github.com/tarantool/tarantool/compare/gh-3278-quorum-fix

1. Please, put branch and issue links after '---'.

2. The patch still does not work in the case of two nodes when one
of them is not available. See the test in the issue comments.
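
A hedged sketch of that two-node scenario, with illustrative URIs (see
the issue comments for the actual test):

```lua
-- Two-node full mesh where the peer is unreachable at startup
-- (illustrative URIs).
box.cfg({
    listen = 'localhost:3301',
    replication = {
        'localhost:3301',            -- self
        'localhost:3302',            -- unreachable node
    },
    replication_connect_quorum = 0,
    replication_connect_timeout = 0.1,
})
-- Expected with quorum = 0: the instance comes up writable
-- (box.info.status == 'running') instead of staying orphan.
```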


* Re: [tarantool-patches] Re: [PATCH] replication: fix broken cases with quorum=0
  2018-04-17 18:09 ` [tarantool-patches] " Vladislav Shpilevoy
@ 2018-04-18  8:07   ` Konstantin Belyavskiy
  0 siblings, 0 replies; 4+ messages in thread
From: Konstantin Belyavskiy @ 2018-04-18  8:07 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov, georgy


Hello.
Vlad, could you please explain the second case? If the node is not available
during bootstrap, that is a different case and will not be fixed by this patch.
If it happens after bootstrap, how can I reproduce it?

I performed a simple check here:
1. Bootstrap: run two master instances in a full mesh.
2. Shut down both.
3a. Run the second one with replication_connect_quorum = 0.
      It has status "running" and is read-write.
3b. Run the second one without replication_connect_quorum.
      It has status "orphan" and is read-only. Isn't this the expected
      behaviour, and/or how did you manage to get a different result?
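
A minimal sketch of steps 3a/3b, assuming illustrative URIs:

```lua
-- 3a. Restart the second instance with quorum = 0:
box.cfg({
    listen = 'localhost:3302',
    replication = { 'localhost:3301', 'localhost:3302' },
    replication_connect_quorum = 0,     -- do not wait for the dead peer
    replication_connect_timeout = 0.1,
})
-- observed: box.info.status == 'running', box.info.ro == false

-- 3b. The same box.cfg() call but without replication_connect_quorum:
-- observed: box.info.status == 'orphan', box.info.ro == true
```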

>Tuesday, April 17, 2018, 21:10 +03:00 from Vladislav Shpilevoy <v.shpilevoy@tarantool.org>:
>
>Hello, see my comments below.
>
>On 17/04/2018 17:07, Konstantin Belyavskiy wrote:
>> Ticket:  https://github.com/tarantool/tarantool/issues/3278
>> branch:  https://github.com/tarantool/tarantool/compare/gh-3278-quorum-fix
>
>1. Please, put branch and issue links after '---'.
>
>2. The patch still does not work in the case of two nodes when one
>of them is not available. See the test in the issue comments.
>
>


Best regards,
Konstantin Belyavskiy
k.belyavskiy@tarantool.org



* Re: [PATCH] replication: fix broken cases with quorum=0
  2018-04-17 14:07 [PATCH] replication: fix broken cases with quorum=0 Konstantin Belyavskiy
  2018-04-17 18:09 ` [tarantool-patches] " Vladislav Shpilevoy
@ 2018-04-18 10:33 ` Vladimir Davydov
  1 sibling, 0 replies; 4+ messages in thread
From: Vladimir Davydov @ 2018-04-18 10:33 UTC (permalink / raw)
  To: Konstantin Belyavskiy; +Cc: georgy, tarantool-patches

On Tue, Apr 17, 2018 at 05:07:26PM +0300, Konstantin Belyavskiy wrote:
> Ticket: https://github.com/tarantool/tarantool/issues/3278
> branch: https://github.com/tarantool/tarantool/compare/gh-3278-quorum-fix
> 
> This commit is a follow-up to 6d81fa99.
> With replication_connect_quorum = 0 set, that commit broke replication,
> since the applier_resume() and applier_start() parts were skipped.
> Fix it and add more test cases.
> 
> Close #3278
> ---
>  src/box/replication.cc              | 13 +++--
>  test/replication/master_quorum.lua  | 33 +++++++++++++
>  test/replication/master_quorum1.lua |  1 +
>  test/replication/master_quorum2.lua |  1 +
>  test/replication/quorum.result      | 98 ++++++++++++++++++++++++++++++++++++-
>  test/replication/quorum.test.lua    | 33 ++++++++++++-
>  6 files changed, 173 insertions(+), 6 deletions(-)
>  create mode 100644 test/replication/master_quorum.lua
>  create mode 120000 test/replication/master_quorum1.lua
>  create mode 120000 test/replication/master_quorum2.lua

Pushed to 1.9.

