Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v3 0/4] fix box_set_replication issues & stabilize quorum test
@ 2019-11-25 11:50 Ilya Kosarev
  2019-11-25 11:50 ` [Tarantool-patches] [PATCH v3 1/4] test: update test-run Ilya Kosarev
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Ilya Kosarev @ 2019-11-25 11:50 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy

This patchset fixes appliers pruning in replicaset_update, anon
replicas iteration issues in replicaset_follow and stabilizes quorum
test. It also stabilizes tcp_connect in test_run:cmd().

Changes in v2:
- fixed mac os specific instability in quorum test
- added appropriate test case for anon replicas iteration
- fixed appliers pruning (fail discovered with test case mentioned above)
- bumped test-run to stabilize tcp_connect in test_run:cmd()

Changes in v3:
- simplified mac os specific instability fix in quorum test
- improved appliers pruning fix

Branch: https://github.com/tarantool/tarantool/tree/i.kosarev/gh-4586-fix-quorum-test
Issues: https://github.com/tarantool/tarantool/issues/4586
        https://github.com/tarantool/tarantool/issues/4643
        https://github.com/tarantool/tarantool/issues/4576
        https://github.com/tarantool/tarantool/issues/4440
        https://github.com/tarantool/test-run/issues/193

Ilya Kosarev (4):
  test: update test-run
  replication: fix appliers pruning
  replication: make anon replicas iteration safe
  test: stabilize quorum test conditions

 src/box/replication.cc                        | 19 ++++++----
 test-run                                      |  2 +-
 .../box_set_replication_stress.result         | 38 +++++++++++++++++++
 .../box_set_replication_stress.test.lua       | 17 +++++++++
 test/replication/quorum.result                | 34 ++++++++++-------
 test/replication/quorum.test.lua              | 19 +++++-----
 6 files changed, 98 insertions(+), 31 deletions(-)
 create mode 100644 test/replication/box_set_replication_stress.result
 create mode 100644 test/replication/box_set_replication_stress.test.lua

-- 
2.17.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Tarantool-patches] [PATCH v3 1/4] test: update test-run
  2019-11-25 11:50 [Tarantool-patches] [PATCH v3 0/4] fix box_set_replication issues & stabilize quorum test Ilya Kosarev
@ 2019-11-25 11:50 ` Ilya Kosarev
  2019-11-25 11:50 ` [Tarantool-patches] [PATCH v3 2/4] replication: fix appliers pruning Ilya Kosarev
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Ilya Kosarev @ 2019-11-25 11:50 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy

Stabilize tcp_connect in test_run:cmd() (tarantool/test-run#193)
---
 test-run | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/test-run b/test-run
index e665a6601..b9dafc8ca 160000
--- a/test-run
+++ b/test-run
@@ -1 +1 @@
-Subproject commit e665a66013dc1d2edf11f53a35fbdd1128b2274b
+Subproject commit b9dafc8cadf9edf3c76146d5a289ea90055962ee
-- 
2.17.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Tarantool-patches] [PATCH v3 2/4] replication: fix appliers pruning
  2019-11-25 11:50 [Tarantool-patches] [PATCH v3 0/4] fix box_set_replication issues & stabilize quorum test Ilya Kosarev
  2019-11-25 11:50 ` [Tarantool-patches] [PATCH v3 1/4] test: update test-run Ilya Kosarev
@ 2019-11-25 11:50 ` Ilya Kosarev
  2019-11-25 11:50 ` [Tarantool-patches] [PATCH v3 3/4] replication: make anon replicas iteration safe Ilya Kosarev
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Ilya Kosarev @ 2019-11-25 11:50 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy

During pruning of appliers some anon replicas might connect
from replicaset_follow called in another fiber. Therefore we need to
prune appliers of anon replicas first and, moreover, prune them one by
one instead of iterating them, as far as any of them might connect
while we are stopping the other one and it will break iteration.

Part of #4586
Closes #4643
---
 src/box/replication.cc | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/src/box/replication.cc b/src/box/replication.cc
index 509187cd3..f15e51f4d 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -516,23 +516,25 @@ replicaset_update(struct applier **appliers, int count)
 	 */
 
 	/* Prune old appliers */
-	replicaset_foreach(replica) {
-		if (replica->applier == NULL)
-			continue;
+	while (!rlist_empty(&replicaset.anon)) {
+		replica = rlist_first_entry(&replicaset.anon,
+					    typeof(*replica), in_anon);
 		applier = replica->applier;
 		replica_clear_applier(replica);
-		replica->applier_sync_state = APPLIER_DISCONNECTED;
+		rlist_del_entry(replica, in_anon);
+		replica_delete(replica);
 		applier_stop(applier);
 		applier_delete(applier);
 	}
-	rlist_foreach_entry_safe(replica, &replicaset.anon, in_anon, next) {
+	replicaset_foreach(replica) {
+		if (replica->applier == NULL)
+			continue;
 		applier = replica->applier;
 		replica_clear_applier(replica);
-		replica_delete(replica);
+		replica->applier_sync_state = APPLIER_DISCONNECTED;
 		applier_stop(applier);
 		applier_delete(applier);
 	}
-	rlist_create(&replicaset.anon);
 
 	/* Save new appliers */
 	replicaset.applier.total = count;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Tarantool-patches] [PATCH v3 3/4] replication: make anon replicas iteration safe
  2019-11-25 11:50 [Tarantool-patches] [PATCH v3 0/4] fix box_set_replication issues & stabilize quorum test Ilya Kosarev
  2019-11-25 11:50 ` [Tarantool-patches] [PATCH v3 1/4] test: update test-run Ilya Kosarev
  2019-11-25 11:50 ` [Tarantool-patches] [PATCH v3 2/4] replication: fix appliers pruning Ilya Kosarev
@ 2019-11-25 11:50 ` Ilya Kosarev
  2019-11-25 11:50 ` [Tarantool-patches] [PATCH v3 4/4] test: stabilize quorum test conditions Ilya Kosarev
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Ilya Kosarev @ 2019-11-25 11:50 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy

In replicaset_follow we iterate anon replicas list: list of replicas
that haven't received an UUID. In case of successful connect replica
link is being removed from anon list. If it happens immediately,
without yield in applier, iteration breaks. Now it is fixed by
rlist_foreach_entry_safe instead of common rlist_foreach_entry.
Relevant test case is added.

Part of #4586
Closes #4576
Closes #4440
---
 src/box/replication.cc                        |  3 +-
 .../box_set_replication_stress.result         | 38 +++++++++++++++++++
 .../box_set_replication_stress.test.lua       | 17 +++++++++
 3 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 test/replication/box_set_replication_stress.result
 create mode 100644 test/replication/box_set_replication_stress.test.lua

diff --git a/src/box/replication.cc b/src/box/replication.cc
index f15e51f4d..81f19aa07 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -802,7 +802,8 @@ replicaset_follow(void)
 		if (replica->applier != NULL)
 			applier_resume(replica->applier);
 	}
-	rlist_foreach_entry(replica, &replicaset.anon, in_anon) {
+	struct replica *tmp;
+	rlist_foreach_entry_safe(replica, &replicaset.anon, in_anon, tmp) {
 		/* Restart appliers that failed to connect. */
 		applier_start(replica->applier);
 	}
diff --git a/test/replication/box_set_replication_stress.result b/test/replication/box_set_replication_stress.result
new file mode 100644
index 000000000..e683c0643
--- /dev/null
+++ b/test/replication/box_set_replication_stress.result
@@ -0,0 +1,38 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+SERVERS = {'master_quorum1', 'master_quorum2'}
+ | ---
+ | ...
+
+-- Deploy a cluster.
+test_run:create_cluster(SERVERS)
+ | ---
+ | ...
+test_run:wait_fullmesh(SERVERS)
+ | ---
+ | ...
+
+test_run:cmd("switch master_quorum1")
+ | ---
+ | - true
+ | ...
+repl = box.cfg.replication
+ | ---
+ | ...
+for i = 1, 1000 do              \
+    box.cfg{replication = ""}   \
+    box.cfg{replication = repl} \
+end
+ | ---
+ | ...
+test_run:cmd("switch default")
+ | ---
+ | - true
+ | ...
+
+-- Cleanup.
+test_run:drop_cluster(SERVERS)
+ | ---
+ | ...
diff --git a/test/replication/box_set_replication_stress.test.lua b/test/replication/box_set_replication_stress.test.lua
new file mode 100644
index 000000000..407e91e0f
--- /dev/null
+++ b/test/replication/box_set_replication_stress.test.lua
@@ -0,0 +1,17 @@
+test_run = require('test_run').new()
+SERVERS = {'master_quorum1', 'master_quorum2'}
+
+-- Deploy a cluster.
+test_run:create_cluster(SERVERS)
+test_run:wait_fullmesh(SERVERS)
+
+test_run:cmd("switch master_quorum1")
+repl = box.cfg.replication
+for i = 1, 1000 do              \
+    box.cfg{replication = ""}   \
+    box.cfg{replication = repl} \
+end
+test_run:cmd("switch default")
+
+-- Cleanup.
+test_run:drop_cluster(SERVERS)
-- 
2.17.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Tarantool-patches] [PATCH v3 4/4] test: stabilize quorum test conditions
  2019-11-25 11:50 [Tarantool-patches] [PATCH v3 0/4] fix box_set_replication issues & stabilize quorum test Ilya Kosarev
                   ` (2 preceding siblings ...)
  2019-11-25 11:50 ` [Tarantool-patches] [PATCH v3 3/4] replication: make anon replicas iteration safe Ilya Kosarev
@ 2019-11-25 11:50 ` Ilya Kosarev
  2019-11-25 23:04 ` [Tarantool-patches] [PATCH v3 0/4] fix box_set_replication issues & stabilize quorum test Vladislav Shpilevoy
  2019-12-02 11:47 ` Kirill Yukhin
  5 siblings, 0 replies; 7+ messages in thread
From: Ilya Kosarev @ 2019-11-25 11:50 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy

There were some pass conditions in quorum test which could take some
time to be satisfied. Now they are wrapped using test_run:wait_cond to
make the test stable.

Closes #4586
---
 test/replication/quorum.result   | 34 ++++++++++++++++++++------------
 test/replication/quorum.test.lua | 19 +++++++++---------
 2 files changed, 31 insertions(+), 22 deletions(-)

diff --git a/test/replication/quorum.result b/test/replication/quorum.result
index ff5fa0150..07abe7f2a 100644
--- a/test/replication/quorum.result
+++ b/test/replication/quorum.result
@@ -40,6 +40,10 @@ box.info.ro -- true
 ---
 - true
 ...
+test_run:wait_cond(function() return box.space.test ~= nil end, 20)
+---
+- true
+...
 box.space.test:replace{100} -- error
 ---
 - error: Can't modify data because this instance is in read-only mode.
@@ -115,15 +119,15 @@ box.info.status -- running
 - running
 ...
 -- Check that the replica follows all masters.
-box.info.id == 1 or box.info.replication[1].upstream.status == 'follow'
+box.info.id == 1 or test_run:wait_cond(function() return box.info.replication[1].upstream.status == 'follow' end, 20)
 ---
 - true
 ...
-box.info.id == 2 or box.info.replication[2].upstream.status == 'follow'
+box.info.id == 2 or test_run:wait_cond(function() return box.info.replication[2].upstream.status == 'follow' end, 20)
 ---
 - true
 ...
-box.info.id == 3 or box.info.replication[3].upstream.status == 'follow'
+box.info.id == 3 or test_run:wait_cond(function() return box.info.replication[3].upstream.status == 'follow' end, 20)
 ---
 - true
 ...
@@ -149,6 +153,10 @@ test_run:cmd('stop server quorum1')
 ---
 - true
 ...
+test_run:wait_cond(function() return box.space.test.index.primary ~= nil end, 20)
+---
+- true
+...
 for i = 1, 100 do box.space.test:insert{i} end
 ---
 ...
@@ -166,9 +174,9 @@ test_run:cmd('switch quorum1')
 ---
 - true
 ...
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 ---
-- 100
+- true
 ...
 -- Rebootstrap one node of the cluster and check that others follow.
 -- Note, due to ERRINJ_RELAY_TIMEOUT there is a substantial delay
@@ -197,9 +205,9 @@ test_run:cmd('switch quorum1')
 - true
 ...
 test_run:cmd('restart server quorum1 with cleanup=1, args="0.1 0.5"')
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 ---
-- 100
+- true
 ...
 -- The rebootstrapped replica will be assigned id = 4,
 -- because ids 1..3 are busy.
@@ -207,11 +215,9 @@ test_run:cmd('switch quorum2')
 ---
 - true
 ...
-fiber = require('fiber')
----
-...
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
 ---
+- true
 ...
 box.info.replication[4].upstream.status
 ---
@@ -221,11 +227,13 @@ test_run:cmd('switch quorum3')
 ---
 - true
 ...
-fiber = require('fiber')
+test_run:wait_cond(function() return box.info.replication ~= nil end, 20)
 ---
+- true
 ...
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
 ---
+- true
 ...
 box.info.replication[4].upstream.status
 ---
diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
index 98febb367..5f2872675 100644
--- a/test/replication/quorum.test.lua
+++ b/test/replication/quorum.test.lua
@@ -22,6 +22,7 @@ test_run:cmd('restart server quorum2 with args="0.1 0.5"')
 box.info.status -- orphan
 box.ctl.wait_rw(0.001) -- timeout
 box.info.ro -- true
+test_run:wait_cond(function() return box.space.test ~= nil end, 20)
 box.space.test:replace{100} -- error
 box.cfg{replication={}}
 box.info.status -- running
@@ -47,9 +48,9 @@ box.info.ro -- false
 box.info.status -- running
 
 -- Check that the replica follows all masters.
-box.info.id == 1 or box.info.replication[1].upstream.status == 'follow'
-box.info.id == 2 or box.info.replication[2].upstream.status == 'follow'
-box.info.id == 3 or box.info.replication[3].upstream.status == 'follow'
+box.info.id == 1 or test_run:wait_cond(function() return box.info.replication[1].upstream.status == 'follow' end, 20)
+box.info.id == 2 or test_run:wait_cond(function() return box.info.replication[2].upstream.status == 'follow' end, 20)
+box.info.id == 3 or test_run:wait_cond(function() return box.info.replication[3].upstream.status == 'follow' end, 20)
 
 -- Check that box.cfg() doesn't return until the instance
 -- catches up with all configured replicas.
@@ -59,13 +60,14 @@ test_run:cmd('switch quorum2')
 box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.001)
 test_run:cmd('stop server quorum1')
 
+test_run:wait_cond(function() return box.space.test.index.primary ~= nil end, 20)
 for i = 1, 100 do box.space.test:insert{i} end
 fiber = require('fiber')
 fiber.sleep(0.1)
 
 test_run:cmd('start server quorum1 with args="0.1  0.5"')
 test_run:cmd('switch quorum1')
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 
 -- Rebootstrap one node of the cluster and check that others follow.
 -- Note, due to ERRINJ_RELAY_TIMEOUT there is a substantial delay
@@ -81,17 +83,16 @@ box.snapshot()
 test_run:cmd('switch quorum1')
 test_run:cmd('restart server quorum1 with cleanup=1, args="0.1 0.5"')
 
-box.space.test:count() -- 100
+test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
 
 -- The rebootstrapped replica will be assigned id = 4,
 -- because ids 1..3 are busy.
 test_run:cmd('switch quorum2')
-fiber = require('fiber')
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
 box.info.replication[4].upstream.status
 test_run:cmd('switch quorum3')
-fiber = require('fiber')
-while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
+test_run:wait_cond(function() return box.info.replication ~= nil end, 20)
+test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
 box.info.replication[4].upstream.status
 
 -- Cleanup.
-- 
2.17.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 0/4] fix box_set_replication issues & stabilize quorum test
  2019-11-25 11:50 [Tarantool-patches] [PATCH v3 0/4] fix box_set_replication issues & stabilize quorum test Ilya Kosarev
                   ` (3 preceding siblings ...)
  2019-11-25 11:50 ` [Tarantool-patches] [PATCH v3 4/4] test: stabilize quorum test conditions Ilya Kosarev
@ 2019-11-25 23:04 ` Vladislav Shpilevoy
  2019-12-02 11:47 ` Kirill Yukhin
  5 siblings, 0 replies; 7+ messages in thread
From: Vladislav Shpilevoy @ 2019-11-25 23:04 UTC (permalink / raw)
  To: Ilya Kosarev, tarantool-patches

Thanks for the patchset!

LGTM.

On 25/11/2019 12:50, Ilya Kosarev wrote:
> This patchset fixes appliers pruning in replicaset_update, anon
> replicas iteration issues in replicaset_follow and stabilizes quorum
> test. It also stabilizes tcp_connect in test_run:cmd().
> 
> Changes in v2:
> - fixed mac os specific instability in quorum test
> - added appropriate test case for anon replicas iteration
> - fixed appliers pruning (fail discovered with test case mentioned above)
> - bumped test-run to stabilize tcp_connect in test_run:cmd()
> 
> Changes in v3:
> - simplified mac os specific instability fix in quorum test
> - improved appliers pruning fix
> 
> Branch: https://github.com/tarantool/tarantool/tree/i.kosarev/gh-4586-fix-quorum-test
> Issues: https://github.com/tarantool/tarantool/issues/4586
>         https://github.com/tarantool/tarantool/issues/4643
>         https://github.com/tarantool/tarantool/issues/4576
>         https://github.com/tarantool/tarantool/issues/4440
>         https://github.com/tarantool/test-run/issues/193
> 
> Ilya Kosarev (4):
>   test: update test-run
>   replication: fix appliers pruning
>   replication: make anon replicas iteration safe
>   test: stabilize quorum test conditions
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH v3 0/4] fix box_set_replication issues & stabilize quorum test
  2019-11-25 11:50 [Tarantool-patches] [PATCH v3 0/4] fix box_set_replication issues & stabilize quorum test Ilya Kosarev
                   ` (4 preceding siblings ...)
  2019-11-25 23:04 ` [Tarantool-patches] [PATCH v3 0/4] fix box_set_replication issues & stabilize quorum test Vladislav Shpilevoy
@ 2019-12-02 11:47 ` Kirill Yukhin
  5 siblings, 0 replies; 7+ messages in thread
From: Kirill Yukhin @ 2019-12-02 11:47 UTC (permalink / raw)
  To: Ilya Kosarev; +Cc: tarantool-patches, v.shpilevoy

Helllo,

On 25 ноя 14:50, Ilya Kosarev wrote:
> This patchset fixes appliers pruning in replicaset_update, anon
> replicas iteration issues in replicaset_follow and stabilizes quorum
> test. It also stabilizes tcp_connect in test_run:cmd().
> 
> Changes in v2:
> - fixed mac os specific instability in quorum test
> - added appropriate test case for anon replicas iteration
> - fixed appliers pruning (fail discovered with test case mentioned above)
> - bumped test-run to stabilize tcp_connect in test_run:cmd()
> 
> Changes in v3:
> - simplified mac os specific instability fix in quorum test
> - improved appliers pruning fix
> 
> Branch: https://github.com/tarantool/tarantool/tree/i.kosarev/gh-4586-fix-quorum-test
> Issues: https://github.com/tarantool/tarantool/issues/4586
>         https://github.com/tarantool/tarantool/issues/4643
>         https://github.com/tarantool/tarantool/issues/4576
>         https://github.com/tarantool/tarantool/issues/4440
>         https://github.com/tarantool/test-run/issues/193
> 
> Ilya Kosarev (4):
>   test: update test-run

Sasha, could you please take a look and if OK - commit
to test-run & bump new version?

>   replication: fix appliers pruning
>   replication: make anon replicas iteration safe
>   test: stabilize quorum test conditions

I've checked 3 patches into master, 2.2 and 1.10.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-12-02 11:47 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-25 11:50 [Tarantool-patches] [PATCH v3 0/4] fix box_set_replication issues & stabilize quorum test Ilya Kosarev
2019-11-25 11:50 ` [Tarantool-patches] [PATCH v3 1/4] test: update test-run Ilya Kosarev
2019-11-25 11:50 ` [Tarantool-patches] [PATCH v3 2/4] replication: fix appliers pruning Ilya Kosarev
2019-11-25 11:50 ` [Tarantool-patches] [PATCH v3 3/4] replication: make anon replicas iteration safe Ilya Kosarev
2019-11-25 11:50 ` [Tarantool-patches] [PATCH v3 4/4] test: stabilize quorum test conditions Ilya Kosarev
2019-11-25 23:04 ` [Tarantool-patches] [PATCH v3 0/4] fix box_set_replication issues & stabilize quorum test Vladislav Shpilevoy
2019-12-02 11:47 ` Kirill Yukhin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox