* [Tarantool-patches] [PATCH v2 0/4] Boot with anon
@ 2020-09-14 23:11 Vladislav Shpilevoy
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 1/4] xlog: introduce an error code for XlogGapError Vladislav Shpilevoy
` (4 more replies)
0 siblings, 5 replies; 17+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-14 23:11 UTC (permalink / raw)
To: tarantool-patches, gorcunov, sergepetrenko
The patchset attempts to address the problem of anonymous replicas being
registered in _cluster if they are present during bootstrap.
The bug was found while working on another issue related to Raft. The problem
is that Raft won't work properly during bootstrap if non-joined replicas are
registered in _cluster.
When their auto-registration by the applier was removed, the anon bug was found.
The auto-registration removal is trivial, but it breaks the cluster bootstrap in
another way, creating false-positive XlogGap errors. See the second commit for
an explanation. To solve the issue a rather radical solution is applied: gap
errors are not considered critical anymore and can be retried. I am not sure
that is the best option, but I couldn't come up with anything better after a
long struggle with it.
Since this is a bug fix, whatever we end up with should be pushed
to the older versions too.
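For reference, a minimal sketch of a setup that can hit the bug (the socket
paths below are illustrative; the new autobootstrap_anon test script in the
last commit does essentially this):

    -- The anonymous instance lists the whole replica set in
    -- box.cfg.replication during the initial bootstrap. An anonymous
    -- replica must also be read-only.
    box.cfg({
        listen = 'instance2.sock',
        replication = {'instance1.sock', 'instance2.sock'},
        replication_anon = true,
        read_only = true,
    })
    -- Before this fix the bootstrap leader could still insert such an
    -- instance into box.space._cluster.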
Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5287-anon-false-register
Issue: https://github.com/tarantool/tarantool/issues/5287
Changes in v2:
- Anon status is stored as a flag again. In v1 it was stored as an enum, but an
alternative solution was proposed where the enum is not needed.
- Ballot now has a new field is_anon. It helps to avoid the enum and to set the
replica->anon flag to the correct value right when the replica becomes
connected, through either relay or applier.
@ChangeLog
* An anonymous replica could be registered in _cluster and thus prevent WAL file removal (gh-5287).
* XlogGapError is not a critical error anymore. It means box.info.replication will show the upstream status as 'loading' when this error occurs. The upstream will be restarted until the error is resolved automatically with the help of another instance, or until the replica is removed from box.cfg.replication (gh-5287).
Vladislav Shpilevoy (4):
xlog: introduce an error code for XlogGapError
replication: retry in case of XlogGapError
replication: add is_anon flag to ballot
replication: do not register outgoing connections
src/box/applier.cc | 40 ++++
src/box/box.cc | 30 +--
src/box/errcode.h | 2 +
src/box/error.cc | 2 +
src/box/error.h | 1 +
src/box/iproto_constants.h | 1 +
src/box/recovery.h | 2 -
src/box/replication.cc | 14 +-
src/box/xrow.c | 14 +-
src/box/xrow.h | 5 +
test/box/error.result | 2 +
test/replication/autobootstrap_anon.lua | 25 +++
test/replication/autobootstrap_anon1.lua | 1 +
test/replication/autobootstrap_anon2.lua | 1 +
test/replication/force_recovery.result | 110 -----------
test/replication/force_recovery.test.lua | 43 -----
test/replication/gh-5287-boot-anon.result | 81 +++++++++
test/replication/gh-5287-boot-anon.test.lua | 30 +++
test/replication/prune.result | 18 +-
test/replication/prune.test.lua | 7 +-
test/replication/replica.lua | 2 +
test/replication/replica_rejoin.result | 6 +-
test/replication/replica_rejoin.test.lua | 4 +-
.../show_error_on_disconnect.result | 2 +-
.../show_error_on_disconnect.test.lua | 2 +-
test/xlog/panic_on_wal_error.result | 171 ------------------
test/xlog/panic_on_wal_error.test.lua | 75 --------
27 files changed, 262 insertions(+), 429 deletions(-)
create mode 100644 test/replication/autobootstrap_anon.lua
create mode 120000 test/replication/autobootstrap_anon1.lua
create mode 120000 test/replication/autobootstrap_anon2.lua
delete mode 100644 test/replication/force_recovery.result
delete mode 100644 test/replication/force_recovery.test.lua
create mode 100644 test/replication/gh-5287-boot-anon.result
create mode 100644 test/replication/gh-5287-boot-anon.test.lua
delete mode 100644 test/xlog/panic_on_wal_error.result
delete mode 100644 test/xlog/panic_on_wal_error.test.lua
--
2.21.1 (Apple Git-122.3)
^ permalink raw reply [flat|nested] 17+ messages in thread
* [Tarantool-patches] [PATCH v2 1/4] xlog: introduce an error code for XlogGapError
2020-09-14 23:11 [Tarantool-patches] [PATCH v2 0/4] Boot with anon Vladislav Shpilevoy
@ 2020-09-14 23:11 ` Vladislav Shpilevoy
2020-09-15 7:53 ` Serge Petrenko
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 2/4] replication: retry in case of XlogGapError Vladislav Shpilevoy
` (3 subsequent siblings)
4 siblings, 1 reply; 17+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-14 23:11 UTC (permalink / raw)
To: tarantool-patches, gorcunov, sergepetrenko
The XlogGapError object didn't have a code in the ClientError code
space. Because of that it was not possible to handle the gap error
together with client errors in a switch-case statement.
Now the gap error has a code.
This is going to be used in the applier code to handle XlogGapError
among other errors by its code instead of via RTTI.
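On the Lua level the new code is visible as box.error.XLOG_GAP (219), so, as
a hedged sketch, error handling can now look like this (err stands for a
caught error object):

    -- Match the gap error by its code, like any other ClientError code,
    -- instead of checking the error type.
    if err.code == box.error.XLOG_GAP then
        -- not fatal, can be retried
    end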
Needed for #5287
---
src/box/errcode.h | 1 +
src/box/error.cc | 2 ++
src/box/error.h | 1 +
src/box/recovery.h | 2 --
test/box/error.result | 1 +
5 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/src/box/errcode.h b/src/box/errcode.h
index 3c21375f5..9a240a431 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -271,6 +271,7 @@ struct errcode_record {
/*216 */_(ER_SYNC_QUORUM_TIMEOUT, "Quorum collection for a synchronous transaction is timed out") \
/*217 */_(ER_SYNC_ROLLBACK, "A rollback for a synchronous transaction is received") \
/*218 */_(ER_TUPLE_METADATA_IS_TOO_BIG, "Can't create tuple: metadata size %u is too big") \
+ /*219 */_(ER_XLOG_GAP, "%s") \
/*
* !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/error.cc b/src/box/error.cc
index c3c2af3ab..ca1d73e0c 100644
--- a/src/box/error.cc
+++ b/src/box/error.cc
@@ -221,6 +221,8 @@ ClientError::get_errcode(const struct error *e)
return ER_SYSTEM;
if (type_cast(CollationError, e))
return ER_CANT_CREATE_COLLATION;
+ if (type_cast(XlogGapError, e))
+ return ER_XLOG_GAP;
return ER_PROC_LUA;
}
diff --git a/src/box/error.h b/src/box/error.h
index 988b98255..338121dd9 100644
--- a/src/box/error.h
+++ b/src/box/error.h
@@ -178,6 +178,7 @@ box_error_new(const char *file, unsigned line, uint32_t code,
extern const struct type_info type_ClientError;
extern const struct type_info type_XlogError;
+extern const struct type_info type_XlogGapError;
extern const struct type_info type_AccessDeniedError;
extern const struct type_info type_CustomError;
diff --git a/src/box/recovery.h b/src/box/recovery.h
index 6e68abc0b..b8d83951a 100644
--- a/src/box/recovery.h
+++ b/src/box/recovery.h
@@ -40,8 +40,6 @@
extern "C" {
#endif /* defined(__cplusplus) */
-extern const struct type_info type_XlogGapError;
-
struct xrow_header;
struct xstream;
diff --git a/test/box/error.result b/test/box/error.result
index cdecdb221..600517316 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -437,6 +437,7 @@ t;
| 216: box.error.SYNC_QUORUM_TIMEOUT
| 217: box.error.SYNC_ROLLBACK
| 218: box.error.TUPLE_METADATA_IS_TOO_BIG
+ | 219: box.error.XLOG_GAP
| ...
test_run:cmd("setopt delimiter ''");
--
2.21.1 (Apple Git-122.3)
^ permalink raw reply [flat|nested] 17+ messages in thread
* [Tarantool-patches] [PATCH v2 2/4] replication: retry in case of XlogGapError
2020-09-14 23:11 [Tarantool-patches] [PATCH v2 0/4] Boot with anon Vladislav Shpilevoy
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 1/4] xlog: introduce an error code for XlogGapError Vladislav Shpilevoy
@ 2020-09-14 23:11 ` Vladislav Shpilevoy
2020-09-15 7:35 ` Serge Petrenko
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 3/4] replication: add is_anon flag to ballot Vladislav Shpilevoy
` (2 subsequent siblings)
4 siblings, 1 reply; 17+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-14 23:11 UTC (permalink / raw)
To: tarantool-patches, gorcunov, sergepetrenko
Previously XlogGapError was considered a critical error stopping
the replication. That may not be as good an idea as it looks.
XlogGapError is a perfectly fine error, which should not kill the
replication connection. It should be retried instead.
Here is an example where the gap can be recovered on its own.
Consider the case: node1 is a leader, it is booted with
vclock {1: 3}. Node2 connects and fetches snapshot of node1, it
also gets vclock {1: 3}. Then node1 writes something and its
vclock becomes {1: 4}. Now node3 boots from node1, and gets the
same vclock. Vclocks now look like this:
- node1: {1: 4}, leader, has {1: 3} snap.
- node2: {1: 3}, booted from node1, has only snap.
- node3: {1: 4}, booted from node1, has only snap.
If the cluster is a fullmesh, node2 will send subscribe requests
with vclock {1: 3}. If node3 receives it, it will respond with
xlog gap error, because it only has a snap with {1: 4}, nothing
else. In that case node2 should retry connecting to node3, and in
the meantime try to get newer changes from node1.
The example is totally valid. However, it is unreachable now,
because the master registers all replicas in _cluster before
allowing them to join. So they all bootstrap from a snapshot
containing all their IDs. This is a bug, because such
auto-registration leads to the registration of anonymous replicas
if they are present during bootstrap. It also blocks Raft, which
can't work if there are registered but not yet joined nodes.
Once the registration problem is solved in the next commit,
XlogGapError will strike quite often during bootstrap. This patch
won't let that happen.
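From the user's point of view the change is visible in box.info.replication.
A hedged sketch, reusing the node numbers from the example above (the index
is illustrative):

    -- On node2, the upstream pointing at node3 no longer stops. It keeps
    -- retrying and reports the 'loading' status instead:
    box.info.replication[3].upstream.status  -- 'loading'
    box.info.replication[3].upstream.message -- 'Missing .xlog file ...'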
Needed for #5287
---
src/box/applier.cc | 27 +++
test/replication/force_recovery.result | 110 -----------
test/replication/force_recovery.test.lua | 43 -----
test/replication/replica.lua | 2 +
test/replication/replica_rejoin.result | 6 +-
test/replication/replica_rejoin.test.lua | 4 +-
.../show_error_on_disconnect.result | 2 +-
.../show_error_on_disconnect.test.lua | 2 +-
test/xlog/panic_on_wal_error.result | 171 ------------------
test/xlog/panic_on_wal_error.test.lua | 75 --------
10 files changed, 36 insertions(+), 406 deletions(-)
delete mode 100644 test/replication/force_recovery.result
delete mode 100644 test/replication/force_recovery.test.lua
delete mode 100644 test/xlog/panic_on_wal_error.result
delete mode 100644 test/xlog/panic_on_wal_error.test.lua
diff --git a/src/box/applier.cc b/src/box/applier.cc
index c1d07ca54..96dd48c0d 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -106,6 +106,7 @@ applier_log_error(struct applier *applier, struct error *e)
case ER_SYSTEM:
case ER_UNKNOWN_REPLICA:
case ER_PASSWORD_MISMATCH:
+ case ER_XLOG_GAP:
say_info("will retry every %.2lf second",
replication_reconnect_interval());
break;
@@ -1333,6 +1334,32 @@ applier_f(va_list ap)
applier_disconnect(applier, APPLIER_STOPPED);
return -1;
}
+ } catch (XlogGapError *e) {
+ /*
+ * Xlog gap error can't be a critical error. Because it
+ * is totally normal during bootstrap. Consider the
+ * case: node1 is a leader, it is booted with vclock
+ * {1: 3}. Node2 connects and fetches snapshot of node1,
+ * it also gets vclock {1: 3}. Then node1 writes
+ * something and its vclock becomes {1: 4}. Now node3
+ * boots from node1, and gets the same vclock. Vclocks
+ * now look like this:
+ *
+ * - node1: {1: 4}, leader, has {1: 3} snap.
+ * - node2: {1: 3}, booted from node1, has only snap.
+ * - node3: {1: 4}, booted from node1, has only snap.
+ *
+ * If the cluster is a fullmesh, node2 will send
+ * subscribe requests with vclock {1: 3}. If node3
+ * receives it, it will respond with xlog gap error,
+ * because it only has a snap with {1: 4}, nothing else.
+ * In that case node2 should retry connecting to node3,
+ * and in the meantime try to get newer changes from
+ * node1.
+ */
+ applier_log_error(applier, e);
+ applier_disconnect(applier, APPLIER_LOADING);
+ goto reconnect;
} catch (FiberIsCancelled *e) {
if (!diag_is_empty(&applier->diag)) {
diag_move(&applier->diag, &fiber()->diag);
diff --git a/test/replication/force_recovery.result b/test/replication/force_recovery.result
deleted file mode 100644
index f50452858..000000000
--- a/test/replication/force_recovery.result
+++ /dev/null
@@ -1,110 +0,0 @@
-test_run = require('test_run').new()
----
-...
-fio = require('fio')
----
-...
---
--- Test that box.cfg.force_recovery is ignored by relay threads (gh-3910).
---
-_ = box.schema.space.create('test')
----
-...
-_ = box.space.test:create_index('primary')
----
-...
-box.schema.user.grant('guest', 'replication')
----
-...
--- Deploy a replica.
-test_run:cmd("create server test with rpl_master=default, script='replication/replica.lua'")
----
-- true
-...
-test_run:cmd("start server test")
----
-- true
-...
--- Stop the replica and wait for the relay thread to exit.
-test_run:cmd("stop server test")
----
-- true
-...
-test_run:wait_cond(function() return box.info.replication[2].downstream.status == 'stopped' end, 10)
----
-- true
-...
--- Delete an xlog file that is needed by the replica.
-box.snapshot()
----
-- ok
-...
-xlog = fio.pathjoin(box.cfg.wal_dir, string.format('%020d.xlog', box.info.signature))
----
-...
-box.space.test:replace{1}
----
-- [1]
-...
-box.snapshot()
----
-- ok
-...
-box.space.test:replace{2}
----
-- [2]
-...
-fio.unlink(xlog)
----
-- true
-...
--- Check that even though box.cfg.force_recovery is set,
--- replication will still fail due to LSN gap.
-box.cfg{force_recovery = true}
----
-...
-test_run:cmd("start server test")
----
-- true
-...
-test_run:cmd("switch test")
----
-- true
-...
-box.space.test:select()
----
-- []
-...
-box.info.replication[1].upstream.status == 'stopped' or box.info
----
-- true
-...
-test_run:cmd("switch default")
----
-- true
-...
-box.cfg{force_recovery = false}
----
-...
--- Cleanup.
-test_run:cmd("stop server test")
----
-- true
-...
-test_run:cmd("cleanup server test")
----
-- true
-...
-test_run:cmd("delete server test")
----
-- true
-...
-test_run:cleanup_cluster()
----
-...
-box.schema.user.revoke('guest', 'replication')
----
-...
-box.space.test:drop()
----
-...
diff --git a/test/replication/force_recovery.test.lua b/test/replication/force_recovery.test.lua
deleted file mode 100644
index 54307814b..000000000
--- a/test/replication/force_recovery.test.lua
+++ /dev/null
@@ -1,43 +0,0 @@
-test_run = require('test_run').new()
-fio = require('fio')
-
---
--- Test that box.cfg.force_recovery is ignored by relay threads (gh-3910).
---
-_ = box.schema.space.create('test')
-_ = box.space.test:create_index('primary')
-box.schema.user.grant('guest', 'replication')
-
--- Deploy a replica.
-test_run:cmd("create server test with rpl_master=default, script='replication/replica.lua'")
-test_run:cmd("start server test")
-
--- Stop the replica and wait for the relay thread to exit.
-test_run:cmd("stop server test")
-test_run:wait_cond(function() return box.info.replication[2].downstream.status == 'stopped' end, 10)
-
--- Delete an xlog file that is needed by the replica.
-box.snapshot()
-xlog = fio.pathjoin(box.cfg.wal_dir, string.format('%020d.xlog', box.info.signature))
-box.space.test:replace{1}
-box.snapshot()
-box.space.test:replace{2}
-fio.unlink(xlog)
-
--- Check that even though box.cfg.force_recovery is set,
--- replication will still fail due to LSN gap.
-box.cfg{force_recovery = true}
-test_run:cmd("start server test")
-test_run:cmd("switch test")
-box.space.test:select()
-box.info.replication[1].upstream.status == 'stopped' or box.info
-test_run:cmd("switch default")
-box.cfg{force_recovery = false}
-
--- Cleanup.
-test_run:cmd("stop server test")
-test_run:cmd("cleanup server test")
-test_run:cmd("delete server test")
-test_run:cleanup_cluster()
-box.schema.user.revoke('guest', 'replication')
-box.space.test:drop()
diff --git a/test/replication/replica.lua b/test/replication/replica.lua
index f3a6dfe58..ef0d6d63a 100644
--- a/test/replication/replica.lua
+++ b/test/replication/replica.lua
@@ -1,6 +1,7 @@
#!/usr/bin/env tarantool
repl_include_self = arg[1] and arg[1] == 'true' or false
+repl_quorum = arg[2] and tonumber(arg[2]) or nil
repl_list = nil
if repl_include_self then
@@ -14,6 +15,7 @@ box.cfg({
replication = repl_list,
memtx_memory = 107374182,
replication_timeout = 0.1,
+ replication_connect_quorum = repl_quorum,
})
require('console').listen(os.getenv('ADMIN'))
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index dd04ae297..7b64c3813 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -213,7 +213,7 @@ test_run:wait_cond(function() return #fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.
box.cfg{checkpoint_count = checkpoint_count}
---
...
-test_run:cmd("start server replica with args='true'")
+test_run:cmd("start server replica with args='true 0'")
---
- true
...
@@ -221,9 +221,9 @@ test_run:cmd("switch replica")
---
- true
...
-box.info.status -- orphan
+test_run:wait_upstream(1, {message_re = 'Missing %.xlog file'})
---
-- orphan
+- true
...
box.space.test:select()
---
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index 410ef44d7..dcd0557cb 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -79,9 +79,9 @@ for i = 1, 3 do box.space.test:insert{i * 100} end
fio = require('fio')
test_run:wait_cond(function() return #fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.xlog')) == 1 end) or fio.pathjoin(box.cfg.wal_dir, '*.xlog')
box.cfg{checkpoint_count = checkpoint_count}
-test_run:cmd("start server replica with args='true'")
+test_run:cmd("start server replica with args='true 0'")
test_run:cmd("switch replica")
-box.info.status -- orphan
+test_run:wait_upstream(1, {message_re = 'Missing %.xlog file'})
box.space.test:select()
--
diff --git a/test/replication/show_error_on_disconnect.result b/test/replication/show_error_on_disconnect.result
index 48003db06..19d3886e4 100644
--- a/test/replication/show_error_on_disconnect.result
+++ b/test/replication/show_error_on_disconnect.result
@@ -72,7 +72,7 @@ box.space.test:select()
other_id = box.info.id % 2 + 1
---
...
-test_run:wait_upstream(other_id, {status = 'stopped', message_re = 'Missing'})
+test_run:wait_upstream(other_id, {status = 'loading', message_re = 'Missing'})
---
- true
...
diff --git a/test/replication/show_error_on_disconnect.test.lua b/test/replication/show_error_on_disconnect.test.lua
index 1b0ea4373..dce926a34 100644
--- a/test/replication/show_error_on_disconnect.test.lua
+++ b/test/replication/show_error_on_disconnect.test.lua
@@ -29,7 +29,7 @@ test_run:cmd("switch master_quorum1")
box.cfg{replication = repl}
box.space.test:select()
other_id = box.info.id % 2 + 1
-test_run:wait_upstream(other_id, {status = 'stopped', message_re = 'Missing'})
+test_run:wait_upstream(other_id, {status = 'loading', message_re = 'Missing'})
test_run:cmd("switch master_quorum2")
box.space.test:select()
other_id = box.info.id % 2 + 1
diff --git a/test/xlog/panic_on_wal_error.result b/test/xlog/panic_on_wal_error.result
deleted file mode 100644
index 22f14f912..000000000
--- a/test/xlog/panic_on_wal_error.result
+++ /dev/null
@@ -1,171 +0,0 @@
--- preparatory stuff
-env = require('test_run')
----
-...
-test_run = env.new()
----
-...
-test_run:cmd("restart server default with cleanup=True")
-box.schema.user.grant('guest', 'replication')
----
-...
-_ = box.schema.space.create('test')
----
-...
-_ = box.space.test:create_index('pk')
----
-...
---
--- reopen xlog
---
-test_run:cmd("restart server default")
-box.space.test ~= nil
----
-- true
-...
--- insert some stuff
---
-box.space.test:auto_increment{'before snapshot'}
----
-- [1, 'before snapshot']
-...
---
--- this snapshot will go to the replica
---
-box.snapshot()
----
-- ok
-...
---
--- create a replica, let it catch up somewhat
---
-test_run:cmd("create server replica with rpl_master=default, script='xlog/replica.lua'")
----
-- true
-...
-test_run:cmd("start server replica")
----
-- true
-...
-test_run:cmd("switch replica")
----
-- true
-...
-box.space.test:select{}
----
-- - [1, 'before snapshot']
-...
---
--- stop replica, restart the master, insert more stuff
--- which will make it into an xlog only
---
-test_run:cmd("switch default")
----
-- true
-...
-test_run:cmd("stop server replica")
----
-- true
-...
-test_run:cmd("restart server default")
-box.space.test:auto_increment{'after snapshot'}
----
-- [2, 'after snapshot']
-...
-box.space.test:auto_increment{'after snapshot - one more row'}
----
-- [3, 'after snapshot - one more row']
-...
---
--- save snapshot and remove xlogs
---
-box.snapshot()
----
-- ok
-...
-fio = require('fio')
----
-...
-glob = fio.pathjoin(box.cfg.wal_dir, '*.xlog')
----
-...
-files = fio.glob(glob)
----
-...
-for _, file in pairs(files) do fio.unlink(file) end
----
-...
---
--- make sure the server has some xlogs, otherwise the
--- replica doesn't discover the gap in the logs
---
-box.space.test:auto_increment{'after snapshot and restart'}
----
-- [4, 'after snapshot and restart']
-...
-box.space.test:auto_increment{'after snapshot and restart - one more row'}
----
-- [5, 'after snapshot and restart - one more row']
-...
---
--- check that panic is true
---
-box.cfg{force_recovery=false}
----
-...
-box.cfg.force_recovery
----
-- false
-...
---
--- try to start the replica, ha-ha
--- (replication should fail, some rows are missing)
---
-test_run:cmd("start server replica")
----
-- true
-...
-test_run:cmd("switch replica")
----
-- true
-...
-fiber = require('fiber')
----
-...
-while box.info.replication[1].upstream.status ~= "stopped" do fiber.sleep(0.001) end
----
-...
-box.info.replication[1].upstream.status
----
-- stopped
-...
-box.info.replication[1].upstream.message
----
-- 'Missing .xlog file between LSN 6 {1: 6} and 8 {1: 8}'
-...
-box.space.test:select{}
----
-- - [1, 'before snapshot']
-...
---
---
-test_run:cmd("switch default")
----
-- true
-...
-test_run:cmd("stop server replica")
----
-- true
-...
-test_run:cmd("cleanup server replica")
----
-- true
-...
---
--- cleanup
-box.space.test:drop()
----
-...
-box.schema.user.revoke('guest', 'replication')
----
-...
diff --git a/test/xlog/panic_on_wal_error.test.lua b/test/xlog/panic_on_wal_error.test.lua
deleted file mode 100644
index 2e95431c6..000000000
--- a/test/xlog/panic_on_wal_error.test.lua
+++ /dev/null
@@ -1,75 +0,0 @@
--- preparatory stuff
-env = require('test_run')
-test_run = env.new()
-
-test_run:cmd("restart server default with cleanup=True")
-box.schema.user.grant('guest', 'replication')
-_ = box.schema.space.create('test')
-_ = box.space.test:create_index('pk')
---
--- reopen xlog
---
-test_run:cmd("restart server default")
-box.space.test ~= nil
--- insert some stuff
---
-box.space.test:auto_increment{'before snapshot'}
---
--- this snapshot will go to the replica
---
-box.snapshot()
---
--- create a replica, let it catch up somewhat
---
-test_run:cmd("create server replica with rpl_master=default, script='xlog/replica.lua'")
-test_run:cmd("start server replica")
-test_run:cmd("switch replica")
-box.space.test:select{}
---
--- stop replica, restart the master, insert more stuff
--- which will make it into an xlog only
---
-test_run:cmd("switch default")
-test_run:cmd("stop server replica")
-test_run:cmd("restart server default")
-box.space.test:auto_increment{'after snapshot'}
-box.space.test:auto_increment{'after snapshot - one more row'}
---
--- save snapshot and remove xlogs
---
-box.snapshot()
-fio = require('fio')
-glob = fio.pathjoin(box.cfg.wal_dir, '*.xlog')
-files = fio.glob(glob)
-for _, file in pairs(files) do fio.unlink(file) end
---
--- make sure the server has some xlogs, otherwise the
--- replica doesn't discover the gap in the logs
---
-box.space.test:auto_increment{'after snapshot and restart'}
-box.space.test:auto_increment{'after snapshot and restart - one more row'}
---
--- check that panic is true
---
-box.cfg{force_recovery=false}
-box.cfg.force_recovery
---
--- try to start the replica, ha-ha
--- (replication should fail, some rows are missing)
---
-test_run:cmd("start server replica")
-test_run:cmd("switch replica")
-fiber = require('fiber')
-while box.info.replication[1].upstream.status ~= "stopped" do fiber.sleep(0.001) end
-box.info.replication[1].upstream.status
-box.info.replication[1].upstream.message
-box.space.test:select{}
---
---
-test_run:cmd("switch default")
-test_run:cmd("stop server replica")
-test_run:cmd("cleanup server replica")
---
--- cleanup
-box.space.test:drop()
-box.schema.user.revoke('guest', 'replication')
--
2.21.1 (Apple Git-122.3)
^ permalink raw reply [flat|nested] 17+ messages in thread
* [Tarantool-patches] [PATCH v2 3/4] replication: add is_anon flag to ballot
2020-09-14 23:11 [Tarantool-patches] [PATCH v2 0/4] Boot with anon Vladislav Shpilevoy
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 1/4] xlog: introduce an error code for XlogGapError Vladislav Shpilevoy
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 2/4] replication: retry in case of XlogGapError Vladislav Shpilevoy
@ 2020-09-14 23:11 ` Vladislav Shpilevoy
2020-09-15 7:46 ` Serge Petrenko
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 4/4] replication: do not register outgoing connections Vladislav Shpilevoy
2020-09-17 12:08 ` [Tarantool-patches] [PATCH v2 0/4] Boot with anon Kirill Yukhin
4 siblings, 1 reply; 17+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-14 23:11 UTC (permalink / raw)
To: tarantool-patches, gorcunov, sergepetrenko
Ballot is a message sent in response to a vote request, which the
applier sends first thing after establishing a connection.
It contains basic info about the remote instance, such as whether
it is read-only, whether it is still loading, and more.
The ballot didn't contain a flag saying whether the instance is
anonymous. That led to a problem: when an applier was connected to
a remote instance and added to struct replicaset inside a struct
replica object, it was unknown whether that instance is anonymous.
It was added as not anonymous by default.
If the remote instance was in fact anonymous and sent a subscribe
response back to the first instance with the anon flag set to true,
then it looked like the remote instance was not anonymous and
suddenly became anonymous without even a reconnect. That could
trigger an assertion failure.
The bug is hidden behind another bug, because of which the leader
instance on bootstrap registers all replicas listed in its
box.cfg.replication, even anonymous ones.
The patch makes the ballot contain the anon flag. Now both relay
and applier report whether their host is anonymous. Relay does it
by sending the ballot, and applier sends it as part of the
subscribe request. By the time a replica gets a UUID and is added
into struct replicaset, its anon flag is determined.
The patch also makes anon_count get updated on each replica hash
table change. Previously it was only updated when something
related to relay was done. Now the anon flag is updated by applier
actions too, so it is no longer correct to update the counter only
on relay-specific actions.
The early registration bug is the subject of the next patch.
Part of #5287
@TarantoolBot document
Title: IPROTO_BALLOT_IS_ANON flag
There is a request type IPROTO_BALLOT, with code 0x29. It has
fields IPROTO_BALLOT_IS_RO (0x01), IPROTO_BALLOT_VCLOCK (0x02),
IPROTO_BALLOT_GC_VCLOCK (0x03), IPROTO_BALLOT_IS_LOADING (0x04).
Now it gets a new field IPROTO_BALLOT_IS_ANON (0x05). The field
is a boolean and equals box.cfg.replication_anon of the
sender.
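Schematically the ballot body now looks like this (the key order in the
actual encoding may differ):

    IPROTO_BALLOT (0x29): {
        IPROTO_BALLOT_IS_RO      (0x01): <boolean>,
        IPROTO_BALLOT_VCLOCK     (0x02): <vclock map>,
        IPROTO_BALLOT_GC_VCLOCK  (0x03): <vclock map>,
        IPROTO_BALLOT_IS_LOADING (0x04): <boolean>,
        IPROTO_BALLOT_IS_ANON    (0x05): <boolean>,  <- new field
    }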
---
src/box/box.cc | 1 +
src/box/iproto_constants.h | 1 +
src/box/replication.cc | 14 ++++++++++++--
src/box/xrow.c | 14 ++++++++++++--
src/box/xrow.h | 5 +++++
5 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index faffd5769..145b53ec8 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -2080,6 +2080,7 @@ void
box_process_vote(struct ballot *ballot)
{
ballot->is_ro = cfg_geti("read_only") != 0;
+ ballot->is_anon = replication_anon;
/*
* is_ro is true on initial load and is set to box.cfg.read_only
* after box_cfg() returns, during dynamic box.cfg parameters setting.
diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
index 4f5a2b195..9c2fb6058 100644
--- a/src/box/iproto_constants.h
+++ b/src/box/iproto_constants.h
@@ -149,6 +149,7 @@ enum iproto_ballot_key {
IPROTO_BALLOT_VCLOCK = 0x02,
IPROTO_BALLOT_GC_VCLOCK = 0x03,
IPROTO_BALLOT_IS_LOADING = 0x04,
+ IPROTO_BALLOT_IS_ANON = 0x05,
};
#define bit(c) (1ULL<<IPROTO_##c)
diff --git a/src/box/replication.cc b/src/box/replication.cc
index ef0e2411d..8408f395e 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -290,6 +290,8 @@ replica_clear_id(struct replica *replica)
}
if (replica_is_orphan(replica)) {
replica_hash_remove(&replicaset.hash, replica);
+ replicaset.anon_count -= replica->anon;
+ assert(replicaset.anon_count >= 0);
replica_delete(replica);
}
}
@@ -332,6 +334,7 @@ replica_on_applier_connect(struct replica *replica)
assert(replica->applier_sync_state == APPLIER_DISCONNECTED);
replica->uuid = applier->uuid;
+ replica->anon = applier->ballot.is_anon;
replica->applier_sync_state = APPLIER_CONNECTED;
replicaset.applier.connected++;
@@ -362,6 +365,7 @@ replica_on_applier_connect(struct replica *replica)
} else {
/* Add a new struct replica */
replica_hash_insert(&replicaset.hash, replica);
+ replicaset.anon_count += replica->anon;
}
}
@@ -390,7 +394,9 @@ replica_on_applier_reconnect(struct replica *replica)
if (orig == NULL) {
orig = replica_new();
orig->uuid = applier->uuid;
+ orig->anon = applier->ballot.is_anon;
replica_hash_insert(&replicaset.hash, orig);
+ replicaset.anon_count += orig->anon;
}
if (orig->applier != NULL) {
@@ -526,6 +532,7 @@ replicaset_update(struct applier **appliers, int count)
assert(!tt_uuid_is_nil(&applier->uuid));
replica->uuid = applier->uuid;
+ replica->anon = applier->ballot.is_anon;
if (replica_hash_search(&uniq, replica) != NULL) {
replica_clear_applier(replica);
@@ -582,6 +589,7 @@ replicaset_update(struct applier **appliers, int count)
} else {
/* Add a new struct replica */
replica_hash_insert(&replicaset.hash, replica);
+ replicaset.anon_count += replica->anon;
}
replica->applier_sync_state = APPLIER_CONNECTED;
@@ -593,6 +601,8 @@ replicaset_update(struct applier **appliers, int count)
replica_hash_foreach_safe(&replicaset.hash, replica, next) {
if (replica_is_orphan(replica)) {
replica_hash_remove(&replicaset.hash, replica);
+ replicaset.anon_count -= replica->anon;
+ assert(replicaset.anon_count >= 0);
replica_delete(replica);
}
}
@@ -907,12 +917,12 @@ replica_on_relay_stop(struct replica *replica)
replica->gc = NULL;
} else {
assert(replica->gc == NULL);
- assert(replicaset.anon_count > 0);
- replicaset.anon_count--;
}
}
if (replica_is_orphan(replica)) {
replica_hash_remove(&replicaset.hash, replica);
+ replicaset.anon_count -= replica->anon;
+ assert(replicaset.anon_count >= 0);
replica_delete(replica);
}
}
diff --git a/src/box/xrow.c b/src/box/xrow.c
index 95ddb1fe7..2edcb2f8f 100644
--- a/src/box/xrow.c
+++ b/src/box/xrow.c
@@ -447,9 +447,11 @@ iproto_reply_vote(struct obuf *out, const struct ballot *ballot,
uint64_t sync, uint32_t schema_version)
{
size_t max_size = IPROTO_HEADER_LEN + mp_sizeof_map(1) +
- mp_sizeof_uint(UINT32_MAX) + mp_sizeof_map(4) +
+ mp_sizeof_uint(UINT32_MAX) + mp_sizeof_map(5) +
mp_sizeof_uint(UINT32_MAX) + mp_sizeof_bool(ballot->is_ro) +
mp_sizeof_uint(UINT32_MAX) + mp_sizeof_bool(ballot->is_loading) +
+ mp_sizeof_uint(IPROTO_BALLOT_IS_ANON) +
+ mp_sizeof_bool(ballot->is_anon) +
mp_sizeof_uint(UINT32_MAX) +
mp_sizeof_vclock_ignore0(&ballot->vclock) +
mp_sizeof_uint(UINT32_MAX) +
@@ -465,11 +467,13 @@ iproto_reply_vote(struct obuf *out, const struct ballot *ballot,
char *data = buf + IPROTO_HEADER_LEN;
data = mp_encode_map(data, 1);
data = mp_encode_uint(data, IPROTO_BALLOT);
- data = mp_encode_map(data, 4);
+ data = mp_encode_map(data, 5);
data = mp_encode_uint(data, IPROTO_BALLOT_IS_RO);
data = mp_encode_bool(data, ballot->is_ro);
data = mp_encode_uint(data, IPROTO_BALLOT_IS_LOADING);
data = mp_encode_bool(data, ballot->is_loading);
+ data = mp_encode_uint(data, IPROTO_BALLOT_IS_ANON);
+ data = mp_encode_bool(data, ballot->is_anon);
data = mp_encode_uint(data, IPROTO_BALLOT_VCLOCK);
data = mp_encode_vclock_ignore0(data, &ballot->vclock);
data = mp_encode_uint(data, IPROTO_BALLOT_GC_VCLOCK);
@@ -1227,6 +1231,7 @@ xrow_decode_ballot(struct xrow_header *row, struct ballot *ballot)
{
ballot->is_ro = false;
ballot->is_loading = false;
+ ballot->is_anon = false;
vclock_create(&ballot->vclock);
const char *start = NULL;
@@ -1276,6 +1281,11 @@ xrow_decode_ballot(struct xrow_header *row, struct ballot *ballot)
goto err;
ballot->is_loading = mp_decode_bool(&data);
break;
+ case IPROTO_BALLOT_IS_ANON:
+ if (mp_typeof(*data) != MP_BOOL)
+ goto err;
+ ballot->is_anon = mp_decode_bool(&data);
+ break;
case IPROTO_BALLOT_VCLOCK:
if (mp_decode_vclock_ignore0(&data,
&ballot->vclock) != 0)
diff --git a/src/box/xrow.h b/src/box/xrow.h
index 58d47b12d..0dc9eb71a 100644
--- a/src/box/xrow.h
+++ b/src/box/xrow.h
@@ -332,6 +332,11 @@ xrow_encode_auth(struct xrow_header *row, const char *salt, size_t salt_len,
struct ballot {
/** Set if the instance is running in read-only mode. */
bool is_ro;
+ /**
+ * A flag whether the instance is anonymous, not having an
+ * ID, and not going to request it.
+ */
+ bool is_anon;
/**
* Set if the instance hasn't finished bootstrap or recovery, or
* is syncing with other replicas in the replicaset.
--
2.21.1 (Apple Git-122.3)
^ permalink raw reply [flat|nested] 17+ messages in thread
* [Tarantool-patches] [PATCH v2 4/4] replication: do not register outgoing connections
2020-09-14 23:11 [Tarantool-patches] [PATCH v2 0/4] Boot with anon Vladislav Shpilevoy
` (2 preceding siblings ...)
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 3/4] replication: add is_anon flag to ballot Vladislav Shpilevoy
@ 2020-09-14 23:11 ` Vladislav Shpilevoy
2020-09-15 7:50 ` Serge Petrenko
2020-09-17 12:08 ` [Tarantool-patches] [PATCH v2 0/4] Boot with anon Kirill Yukhin
4 siblings, 1 reply; 17+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-14 23:11 UTC (permalink / raw)
To: tarantool-patches, gorcunov, sergepetrenko
Replication protocol's first stage for non-anonymous replicas is
that the replica should be registered in _cluster to get a unique
ID number.
That happens when the replica connects to a writable node, which
performs the registration. It means registration always happens on
the master node when an *incoming* request arrives, explicitly
asking for registration. Only relay can do that.
That wasn't the case for bootstrap. If box.cfg.replication wasn't
empty on the master node doing the cluster bootstrap, it
registered all the outgoing connections in _cluster. Note that the
target node could even be anonymous, but was still registered.
That breaks the protocol and sometimes leads to the registration
of anonymous replicas. The patch drops that behavior.
Another motivation here is the specifics of Raft cluster
bootstrap. During Raft bootstrap it is going to be very important
that non-joined replicas are not registered in _cluster. A replica
can only register after its JOIN request has been accepted and its
snapshot download has started.
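A hedged sketch of what can be checked after bootstrap to see the new
behavior (this is essentially what the new gh-5287-boot-anon test does):

    -- On the node which performed the bootstrap:
    box.space._cluster:len()        -- 1, only the bootstrap leader is registered
    box.info.replication_anon.count -- 1, the anonymous replica is known,
                                    -- but not registered in _cluster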
Closes #5287
Needed for #1146
---
src/box/applier.cc | 13 ++++
src/box/box.cc | 29 +++++---
src/box/errcode.h | 1 +
test/box/error.result | 1 +
test/replication/autobootstrap_anon.lua | 25 +++++++
test/replication/autobootstrap_anon1.lua | 1 +
test/replication/autobootstrap_anon2.lua | 1 +
test/replication/gh-5287-boot-anon.result | 81 +++++++++++++++++++++
test/replication/gh-5287-boot-anon.test.lua | 30 ++++++++
test/replication/prune.result | 18 ++++-
test/replication/prune.test.lua | 7 +-
11 files changed, 190 insertions(+), 17 deletions(-)
create mode 100644 test/replication/autobootstrap_anon.lua
create mode 120000 test/replication/autobootstrap_anon1.lua
create mode 120000 test/replication/autobootstrap_anon2.lua
create mode 100644 test/replication/gh-5287-boot-anon.result
create mode 100644 test/replication/gh-5287-boot-anon.test.lua
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 96dd48c0d..e272a7af6 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -107,6 +107,7 @@ applier_log_error(struct applier *applier, struct error *e)
case ER_UNKNOWN_REPLICA:
case ER_PASSWORD_MISMATCH:
case ER_XLOG_GAP:
+ case ER_TOO_EARLY_SUBSCRIBE:
say_info("will retry every %.2lf second",
replication_reconnect_interval());
break;
@@ -1306,6 +1307,18 @@ applier_f(va_list ap)
applier_log_error(applier, e);
applier_disconnect(applier, APPLIER_LOADING);
goto reconnect;
+ } else if (e->errcode() == ER_TOO_EARLY_SUBSCRIBE) {
+ /*
+ * The instance is not anonymous, and is
+ * registered, but its ID is not delivered to
+ * all the nodes in the cluster yet, and some
+ * nodes may ask to retry connection later,
+ * until they receive _cluster record of this
+ * instance. From some third node, for example.
+ */
+ applier_log_error(applier, e);
+ applier_disconnect(applier, APPLIER_LOADING);
+ goto reconnect;
} else if (e->errcode() == ER_SYNC_QUORUM_TIMEOUT ||
e->errcode() == ER_SYNC_ROLLBACK) {
/*
diff --git a/src/box/box.cc b/src/box/box.cc
index 145b53ec8..0b1f6c237 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1992,9 +1992,23 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
struct replica *replica = replica_by_uuid(&replica_uuid);
if (!anon && (replica == NULL || replica->id == REPLICA_ID_NIL)) {
- tnt_raise(ClientError, ER_UNKNOWN_REPLICA,
- tt_uuid_str(&replica_uuid),
- tt_uuid_str(&REPLICASET_UUID));
+ /*
+ * The instance is not anonymous, and is registered (at least it
+ * claims so), but its ID is not delivered to the current
+ * instance yet. Need to wait until its _cluster record arrives
+ * from some third node. Likely to happen on bootstrap, when
+ * there is a fullmesh and 1 leader doing all the _cluster
+ * registrations. Not all of them are delivered to the other
+ * nodes yet.
+ * Also can happen when the replica is deleted from _cluster,
+ * but still tries to subscribe. It won't have an ID here.
+ */
+ tnt_raise(ClientError, ER_TOO_EARLY_SUBSCRIBE,
+ tt_uuid_str(&replica_uuid));
+ }
+ if (anon && replica != NULL && replica->id != REPLICA_ID_NIL) {
+ tnt_raise(ClientError, ER_PROTOCOL, "Can't subscribe an "
+ "anonymous replica having an ID assigned");
}
if (replica == NULL)
replica = replicaset_add_anon(&replica_uuid);
@@ -2208,15 +2222,6 @@ bootstrap_master(const struct tt_uuid *replicaset_uuid)
box_register_replica(replica_id, &INSTANCE_UUID);
assert(replica_by_uuid(&INSTANCE_UUID)->id == 1);
- /* Register other cluster members */
- replicaset_foreach(replica) {
- if (tt_uuid_is_equal(&replica->uuid, &INSTANCE_UUID))
- continue;
- assert(replica->applier != NULL);
- box_register_replica(++replica_id, &replica->uuid);
- assert(replica->id == replica_id);
- }
-
/* Set UUID of a new replica set */
box_set_replicaset_uuid(replicaset_uuid);
diff --git a/src/box/errcode.h b/src/box/errcode.h
index 9a240a431..e6957d612 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -272,6 +272,7 @@ struct errcode_record {
/*217 */_(ER_SYNC_ROLLBACK, "A rollback for a synchronous transaction is received") \
/*218 */_(ER_TUPLE_METADATA_IS_TOO_BIG, "Can't create tuple: metadata size %u is too big") \
/*219 */_(ER_XLOG_GAP, "%s") \
+ /*220 */_(ER_TOO_EARLY_SUBSCRIBE, "Can't subscribe non-anonymous replica %s until join is done") \
/*
* !IMPORTANT! Please follow instructions at start of the file
diff --git a/test/box/error.result b/test/box/error.result
index 600517316..4d85b9e55 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -438,6 +438,7 @@ t;
| 217: box.error.SYNC_ROLLBACK
| 218: box.error.TUPLE_METADATA_IS_TOO_BIG
| 219: box.error.XLOG_GAP
+ | 220: box.error.TOO_EARLY_SUBSCRIBE
| ...
test_run:cmd("setopt delimiter ''");
diff --git a/test/replication/autobootstrap_anon.lua b/test/replication/autobootstrap_anon.lua
new file mode 100644
index 000000000..2e96d9d1a
--- /dev/null
+++ b/test/replication/autobootstrap_anon.lua
@@ -0,0 +1,25 @@
+#!/usr/bin/env tarantool
+
+local INSTANCE_ID = string.match(arg[0], "%d")
+local SOCKET_DIR = require('fio').cwd()
+local ANON = arg[1] == 'true'
+
+local function instance_uri(instance_id)
+ return SOCKET_DIR..'/autobootstrap'..instance_id..'.sock';
+end
+
+require('console').listen(os.getenv('ADMIN'))
+
+box.cfg({
+ listen = instance_uri(INSTANCE_ID),
+ replication = {
+ instance_uri(1),
+ instance_uri(2),
+ };
+ replication_anon = ANON,
+ read_only = ANON,
+})
+
+box.once("bootstrap", function()
+ box.schema.user.grant('guest', 'super')
+end)
diff --git a/test/replication/autobootstrap_anon1.lua b/test/replication/autobootstrap_anon1.lua
new file mode 120000
index 000000000..27e0fec36
--- /dev/null
+++ b/test/replication/autobootstrap_anon1.lua
@@ -0,0 +1 @@
+autobootstrap_anon.lua
\ No newline at end of file
diff --git a/test/replication/autobootstrap_anon2.lua b/test/replication/autobootstrap_anon2.lua
new file mode 120000
index 000000000..27e0fec36
--- /dev/null
+++ b/test/replication/autobootstrap_anon2.lua
@@ -0,0 +1 @@
+autobootstrap_anon.lua
\ No newline at end of file
diff --git a/test/replication/gh-5287-boot-anon.result b/test/replication/gh-5287-boot-anon.result
new file mode 100644
index 000000000..bf6660ae5
--- /dev/null
+++ b/test/replication/gh-5287-boot-anon.result
@@ -0,0 +1,81 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+--
+-- gh-5287: when a cluster contained an anonymous replica during bootstrap, it
+-- could be registered anyway.
+--
+
+test_run:cmd("create server replica1 with script='replication/autobootstrap_anon1.lua'")
+ | ---
+ | - true
+ | ...
+test_run:cmd("start server replica1 with wait=False")
+ | ---
+ | - true
+ | ...
+
+test_run:cmd("create server replica2 with script='replication/autobootstrap_anon2.lua'")
+ | ---
+ | - true
+ | ...
+test_run:cmd("start server replica2 with args='true', wait=False")
+ | ---
+ | - true
+ | ...
+
+test_run:switch('replica2')
+ | ---
+ | - true
+ | ...
+-- Without box.info.replication test-run fails to wait a cond.
+test_run:wait_cond(function() return next(box.info.replication) ~= nil end)
+ | ---
+ | - true
+ | ...
+test_run:wait_upstream(1, {status = 'follow'})
+ | ---
+ | - true
+ | ...
+
+test_run:switch('replica1')
+ | ---
+ | - true
+ | ...
+-- The anonymous replica wasn't registered.
+assert(box.space._cluster:len() == 1)
+ | ---
+ | - true
+ | ...
+box.info.gc().consumers
+ | ---
+ | - []
+ | ...
+box.info.replication_anon.count == 1
+ | ---
+ | - true
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd("stop server replica1")
+ | ---
+ | - true
+ | ...
+test_run:cmd("delete server replica1")
+ | ---
+ | - true
+ | ...
+test_run:cmd("stop server replica2")
+ | ---
+ | - true
+ | ...
+test_run:cmd("delete server replica2")
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-5287-boot-anon.test.lua b/test/replication/gh-5287-boot-anon.test.lua
new file mode 100644
index 000000000..94ad81af7
--- /dev/null
+++ b/test/replication/gh-5287-boot-anon.test.lua
@@ -0,0 +1,30 @@
+test_run = require('test_run').new()
+
+--
+-- gh-5287: when a cluster contained an anonymous replica during bootstrap, it
+-- could be registered anyway.
+--
+
+test_run:cmd("create server replica1 with script='replication/autobootstrap_anon1.lua'")
+test_run:cmd("start server replica1 with wait=False")
+
+test_run:cmd("create server replica2 with script='replication/autobootstrap_anon2.lua'")
+test_run:cmd("start server replica2 with args='true', wait=False")
+
+test_run:switch('replica2')
+-- Without box.info.replication test-run fails to wait a cond.
+test_run:wait_cond(function() return next(box.info.replication) ~= nil end)
+test_run:wait_upstream(1, {status = 'follow'})
+
+test_run:switch('replica1')
+-- The anonymous replica wasn't registered.
+assert(box.space._cluster:len() == 1)
+box.info.gc().consumers
+box.info.replication_anon.count == 1
+
+test_run:switch('default')
+
+test_run:cmd("stop server replica1")
+test_run:cmd("delete server replica1")
+test_run:cmd("stop server replica2")
+test_run:cmd("delete server replica2")
diff --git a/test/replication/prune.result b/test/replication/prune.result
index 1a130df40..67fd62990 100644
--- a/test/replication/prune.result
+++ b/test/replication/prune.result
@@ -133,7 +133,19 @@ test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')
- ['The local instance id 2 is read-only']
...
-- restart replica and check that replica isn't able to join to cluster
-test_run:cmd('restart server replica1')
+test_run:cmd('stop server replica1')
+---
+- true
+...
+test_run:cmd('start server replica1 with args="true 0"')
+---
+- true
+...
+test_run:cmd('switch replica1')
+---
+- true
+...
+test_run:wait_upstream(1, {message_re = "Can't subscribe non%-anonymous replica"})
---
- true
...
@@ -147,9 +159,9 @@ box.space._cluster:len() == 1
...
test_run:cmd('eval replica1 "box.info.replication[1].upstream.status"')
---
-- ['stopped']
+- ['loading']
...
-test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')[1]:match("is not registered with replica set") ~= nil
+test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')[1]:match("Can't subscribe non%-anonymous replica") ~= nil
---
- true
...
diff --git a/test/replication/prune.test.lua b/test/replication/prune.test.lua
index 80847325b..ea8b0b3c3 100644
--- a/test/replication/prune.test.lua
+++ b/test/replication/prune.test.lua
@@ -65,11 +65,14 @@ while test_run:cmd('eval replica1 "box.info.replication[1].upstream.status"')[1]
test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')
-- restart replica and check that replica isn't able to join to cluster
-test_run:cmd('restart server replica1')
+test_run:cmd('stop server replica1')
+test_run:cmd('start server replica1 with args="true 0"')
+test_run:cmd('switch replica1')
+test_run:wait_upstream(1, {message_re = "Can't subscribe non%-anonymous replica"})
test_run:cmd('switch default')
box.space._cluster:len() == 1
test_run:cmd('eval replica1 "box.info.replication[1].upstream.status"')
-test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')[1]:match("is not registered with replica set") ~= nil
+test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')[1]:match("Can't subscribe non%-anonymous replica") ~= nil
replica_set.delete(test_run, 2)
box.space.test:drop()
--
2.21.1 (Apple Git-122.3)
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 2/4] replication: retry in case of XlogGapError
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 2/4] replication: retry in case of XlogGapError Vladislav Shpilevoy
@ 2020-09-15 7:35 ` Serge Petrenko
2020-09-15 21:23 ` Vladislav Shpilevoy
0 siblings, 1 reply; 17+ messages in thread
From: Serge Petrenko @ 2020-09-15 7:35 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, gorcunov
15.09.2020 02:11, Vladislav Shpilevoy wrote:
> Previously XlogGapError was considered a critical error stopping
> the replication. That may be not so good as it looks.
>
> XlogGapError is a perfectly fine error, which should not kill the
> replication connection. It should be retried instead.
>
> Because here is an example, when the gap can be recovered on its
> own. Consider the case: node1 is a leader, it is booted with
> vclock {1: 3}. Node2 connects and fetches snapshot of node1, it
> also gets vclock {1: 3}. Then node1 writes something and its
> vclock becomes {1: 4}. Now node3 boots from node1, and gets the
> same vclock. Vclocks now look like this:
>
> - node1: {1: 4}, leader, has {1: 3} snap.
> - node2: {1: 3}, booted from node1, has only snap.
> - node3: {1: 4}, booted from node1, has only snap.
>
> If the cluster is a fullmesh, node2 will send subscribe requests
> with vclock {1: 3}. If node3 receives it, it will respond with
> xlog gap error, because it only has a snap with {1: 4}, nothing
> else. In that case node2 should retry connecting to node3, and in
> the meantime try to get newer changes from node1.
>
> The example is totally valid. However it is unreachable now
> because master registers all replicas in _cluster before allowing
> them to make a join. So they all bootstrap from a snapshot
> containing all their IDs. This is a bug, because such
> auto-registration leads to registration of anonymous replicas, if
> they are present during bootstrap. Also it blocks Raft, which
> can't work if there are registered, but not yet joined nodes.
>
> Once the registration problem will be solved in a next commit, the
> XlogGapError will strike quite often during bootstrap. This patch
> won't allow that happen.
>
> Needed for #5287
> ---
> src/box/applier.cc | 27 +++
> test/replication/force_recovery.result | 110 -----------
> test/replication/force_recovery.test.lua | 43 -----
> test/replication/replica.lua | 2 +
> test/replication/replica_rejoin.result | 6 +-
> test/replication/replica_rejoin.test.lua | 4 +-
> .../show_error_on_disconnect.result | 2 +-
> .../show_error_on_disconnect.test.lua | 2 +-
> test/xlog/panic_on_wal_error.result | 171 ------------------
> test/xlog/panic_on_wal_error.test.lua | 75 --------
> 10 files changed, 36 insertions(+), 406 deletions(-)
> delete mode 100644 test/replication/force_recovery.result
> delete mode 100644 test/replication/force_recovery.test.lua
> delete mode 100644 test/xlog/panic_on_wal_error.result
> delete mode 100644 test/xlog/panic_on_wal_error.test.lua
Hi! Thanks for the patch!
I propose you rework the tests so that they expect upstream.status ==
'loading' instead of deleting them altogether.
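For example, something along these lines (a sketch in the spirit of what the
patch already does for show_error_on_disconnect):

    test_run:wait_upstream(1, {status = 'loading', message_re = 'Missing'})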
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index c1d07ca54..96dd48c0d 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -106,6 +106,7 @@ applier_log_error(struct applier *applier, struct error *e)
> case ER_SYSTEM:
> case ER_UNKNOWN_REPLICA:
> case ER_PASSWORD_MISMATCH:
> + case ER_XLOG_GAP:
> say_info("will retry every %.2lf second",
> replication_reconnect_interval());
> break;
> @@ -1333,6 +1334,32 @@ applier_f(va_list ap)
> applier_disconnect(applier, APPLIER_STOPPED);
> return -1;
> }
> + } catch (XlogGapError *e) {
> + /*
> + * Xlog gap error can't be a critical error. Because it
> + * is totally normal during bootstrap. Consider the
> + * case: node1 is a leader, it is booted with vclock
> + * {1: 3}. Node2 connects and fetches snapshot of node1,
> + * it also gets vclock {1: 3}. Then node1 writes
> + * something and its vclock becomes {1: 4}. Now node3
> + * boots from node1, and gets the same vclock. Vclocks
> + * now look like this:
> + *
> + * - node1: {1: 4}, leader, has {1: 3} snap.
> + * - node2: {1: 3}, booted from node1, has only snap.
> + * - node3: {1: 4}, booted from node1, has only snap.
> + *
> + * If the cluster is a fullmesh, node2 will send
> + * subscribe requests with vclock {1: 3}. If node3
> + * receives it, it will respond with xlog gap error,
> + * because it only has a snap with {1: 4}, nothing else.
> + * In that case node2 should retry connecting to node3,
> + * and in the meantime try to get newer changes from
> + * node1.
> + */
> + applier_log_error(applier, e);
> + applier_disconnect(applier, APPLIER_LOADING);
> + goto reconnect;
> } catch (FiberIsCancelled *e) {
> if (!diag_is_empty(&applier->diag)) {
> diag_move(&applier->diag, &fiber()->diag);
> diff --git a/test/replication/force_recovery.result b/test/replication/force_recovery.result
> deleted file mode 100644
> index f50452858..000000000
> --- a/test/replication/force_recovery.result
> +++ /dev/null
> @@ -1,110 +0,0 @@
> -test_run = require('test_run').new()
> ----
> -...
> -fio = require('fio')
> ----
> -...
> ---
> --- Test that box.cfg.force_recovery is ignored by relay threads (gh-3910).
> ---
> -_ = box.schema.space.create('test')
> ----
> -...
> -_ = box.space.test:create_index('primary')
> ----
> -...
> -box.schema.user.grant('guest', 'replication')
> ----
> -...
> --- Deploy a replica.
> -test_run:cmd("create server test with rpl_master=default, script='replication/replica.lua'")
> ----
> -- true
> -...
> -test_run:cmd("start server test")
> ----
> -- true
> -...
> --- Stop the replica and wait for the relay thread to exit.
> -test_run:cmd("stop server test")
> ----
> -- true
> -...
> -test_run:wait_cond(function() return box.info.replication[2].downstream.status == 'stopped' end, 10)
> ----
> -- true
> -...
> --- Delete an xlog file that is needed by the replica.
> -box.snapshot()
> ----
> -- ok
> -...
> -xlog = fio.pathjoin(box.cfg.wal_dir, string.format('%020d.xlog', box.info.signature))
> ----
> -...
> -box.space.test:replace{1}
> ----
> -- [1]
> -...
> -box.snapshot()
> ----
> -- ok
> -...
> -box.space.test:replace{2}
> ----
> -- [2]
> -...
> -fio.unlink(xlog)
> ----
> -- true
> -...
> --- Check that even though box.cfg.force_recovery is set,
> --- replication will still fail due to LSN gap.
> -box.cfg{force_recovery = true}
> ----
> -...
> -test_run:cmd("start server test")
> ----
> -- true
> -...
> -test_run:cmd("switch test")
> ----
> -- true
> -...
> -box.space.test:select()
> ----
> -- []
> -...
> -box.info.replication[1].upstream.status == 'stopped' or box.info
> ----
> -- true
> -...
> -test_run:cmd("switch default")
> ----
> -- true
> -...
> -box.cfg{force_recovery = false}
> ----
> -...
> --- Cleanup.
> -test_run:cmd("stop server test")
> ----
> -- true
> -...
> -test_run:cmd("cleanup server test")
> ----
> -- true
> -...
> -test_run:cmd("delete server test")
> ----
> -- true
> -...
> -test_run:cleanup_cluster()
> ----
> -...
> -box.schema.user.revoke('guest', 'replication')
> ----
> -...
> -box.space.test:drop()
> ----
> -...
> diff --git a/test/replication/force_recovery.test.lua b/test/replication/force_recovery.test.lua
> deleted file mode 100644
> index 54307814b..000000000
> --- a/test/replication/force_recovery.test.lua
> +++ /dev/null
> @@ -1,43 +0,0 @@
> -test_run = require('test_run').new()
> -fio = require('fio')
> -
> ---
> --- Test that box.cfg.force_recovery is ignored by relay threads (gh-3910).
> ---
> -_ = box.schema.space.create('test')
> -_ = box.space.test:create_index('primary')
> -box.schema.user.grant('guest', 'replication')
> -
> --- Deploy a replica.
> -test_run:cmd("create server test with rpl_master=default, script='replication/replica.lua'")
> -test_run:cmd("start server test")
> -
> --- Stop the replica and wait for the relay thread to exit.
> -test_run:cmd("stop server test")
> -test_run:wait_cond(function() return box.info.replication[2].downstream.status == 'stopped' end, 10)
> -
> --- Delete an xlog file that is needed by the replica.
> -box.snapshot()
> -xlog = fio.pathjoin(box.cfg.wal_dir, string.format('%020d.xlog', box.info.signature))
> -box.space.test:replace{1}
> -box.snapshot()
> -box.space.test:replace{2}
> -fio.unlink(xlog)
> -
> --- Check that even though box.cfg.force_recovery is set,
> --- replication will still fail due to LSN gap.
> -box.cfg{force_recovery = true}
> -test_run:cmd("start server test")
> -test_run:cmd("switch test")
> -box.space.test:select()
> -box.info.replication[1].upstream.status == 'stopped' or box.info
> -test_run:cmd("switch default")
> -box.cfg{force_recovery = false}
> -
> --- Cleanup.
> -test_run:cmd("stop server test")
> -test_run:cmd("cleanup server test")
> -test_run:cmd("delete server test")
> -test_run:cleanup_cluster()
> -box.schema.user.revoke('guest', 'replication')
> -box.space.test:drop()
> diff --git a/test/replication/replica.lua b/test/replication/replica.lua
> index f3a6dfe58..ef0d6d63a 100644
> --- a/test/replication/replica.lua
> +++ b/test/replication/replica.lua
> @@ -1,6 +1,7 @@
> #!/usr/bin/env tarantool
>
> repl_include_self = arg[1] and arg[1] == 'true' or false
> +repl_quorum = arg[2] and tonumber(arg[2]) or nil
> repl_list = nil
>
> if repl_include_self then
> @@ -14,6 +15,7 @@ box.cfg({
> replication = repl_list,
> memtx_memory = 107374182,
> replication_timeout = 0.1,
> + replication_connect_quorum = repl_quorum,
> })
>
> require('console').listen(os.getenv('ADMIN'))
> diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
> index dd04ae297..7b64c3813 100644
> --- a/test/replication/replica_rejoin.result
> +++ b/test/replication/replica_rejoin.result
> @@ -213,7 +213,7 @@ test_run:wait_cond(function() return #fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.
> box.cfg{checkpoint_count = checkpoint_count}
> ---
> ...
> -test_run:cmd("start server replica with args='true'")
> +test_run:cmd("start server replica with args='true 0'")
> ---
> - true
> ...
> @@ -221,9 +221,9 @@ test_run:cmd("switch replica")
> ---
> - true
> ...
> -box.info.status -- orphan
> +test_run:wait_upstream(1, {message_re = 'Missing %.xlog file'})
> ---
> -- orphan
> +- true
> ...
> box.space.test:select()
> ---
> diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
> index 410ef44d7..dcd0557cb 100644
> --- a/test/replication/replica_rejoin.test.lua
> +++ b/test/replication/replica_rejoin.test.lua
> @@ -79,9 +79,9 @@ for i = 1, 3 do box.space.test:insert{i * 100} end
> fio = require('fio')
> test_run:wait_cond(function() return #fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.xlog')) == 1 end) or fio.pathjoin(box.cfg.wal_dir, '*.xlog')
> box.cfg{checkpoint_count = checkpoint_count}
> -test_run:cmd("start server replica with args='true'")
> +test_run:cmd("start server replica with args='true 0'")
> test_run:cmd("switch replica")
> -box.info.status -- orphan
> +test_run:wait_upstream(1, {message_re = 'Missing %.xlog file'})
> box.space.test:select()
>
> --
> diff --git a/test/replication/show_error_on_disconnect.result b/test/replication/show_error_on_disconnect.result
> index 48003db06..19d3886e4 100644
> --- a/test/replication/show_error_on_disconnect.result
> +++ b/test/replication/show_error_on_disconnect.result
> @@ -72,7 +72,7 @@ box.space.test:select()
> other_id = box.info.id % 2 + 1
> ---
> ...
> -test_run:wait_upstream(other_id, {status = 'stopped', message_re = 'Missing'})
> +test_run:wait_upstream(other_id, {status = 'loading', message_re = 'Missing'})
> ---
> - true
> ...
> diff --git a/test/replication/show_error_on_disconnect.test.lua b/test/replication/show_error_on_disconnect.test.lua
> index 1b0ea4373..dce926a34 100644
> --- a/test/replication/show_error_on_disconnect.test.lua
> +++ b/test/replication/show_error_on_disconnect.test.lua
> @@ -29,7 +29,7 @@ test_run:cmd("switch master_quorum1")
> box.cfg{replication = repl}
> box.space.test:select()
> other_id = box.info.id % 2 + 1
> -test_run:wait_upstream(other_id, {status = 'stopped', message_re = 'Missing'})
> +test_run:wait_upstream(other_id, {status = 'loading', message_re = 'Missing'})
> test_run:cmd("switch master_quorum2")
> box.space.test:select()
> other_id = box.info.id % 2 + 1
> diff --git a/test/xlog/panic_on_wal_error.result b/test/xlog/panic_on_wal_error.result
> deleted file mode 100644
> index 22f14f912..000000000
> --- a/test/xlog/panic_on_wal_error.result
> +++ /dev/null
> @@ -1,171 +0,0 @@
> --- preparatory stuff
> -env = require('test_run')
> ----
> -...
> -test_run = env.new()
> ----
> -...
> -test_run:cmd("restart server default with cleanup=True")
> -box.schema.user.grant('guest', 'replication')
> ----
> -...
> -_ = box.schema.space.create('test')
> ----
> -...
> -_ = box.space.test:create_index('pk')
> ----
> -...
> ---
> --- reopen xlog
> ---
> -test_run:cmd("restart server default")
> -box.space.test ~= nil
> ----
> -- true
> -...
> --- insert some stuff
> ---
> -box.space.test:auto_increment{'before snapshot'}
> ----
> -- [1, 'before snapshot']
> -...
> ---
> --- this snapshot will go to the replica
> ---
> -box.snapshot()
> ----
> -- ok
> -...
> ---
> --- create a replica, let it catch up somewhat
> ---
> -test_run:cmd("create server replica with rpl_master=default, script='xlog/replica.lua'")
> ----
> -- true
> -...
> -test_run:cmd("start server replica")
> ----
> -- true
> -...
> -test_run:cmd("switch replica")
> ----
> -- true
> -...
> -box.space.test:select{}
> ----
> -- - [1, 'before snapshot']
> -...
> ---
> --- stop replica, restart the master, insert more stuff
> --- which will make it into an xlog only
> ---
> -test_run:cmd("switch default")
> ----
> -- true
> -...
> -test_run:cmd("stop server replica")
> ----
> -- true
> -...
> -test_run:cmd("restart server default")
> -box.space.test:auto_increment{'after snapshot'}
> ----
> -- [2, 'after snapshot']
> -...
> -box.space.test:auto_increment{'after snapshot - one more row'}
> ----
> -- [3, 'after snapshot - one more row']
> -...
> ---
> --- save snapshot and remove xlogs
> ---
> -box.snapshot()
> ----
> -- ok
> -...
> -fio = require('fio')
> ----
> -...
> -glob = fio.pathjoin(box.cfg.wal_dir, '*.xlog')
> ----
> -...
> -files = fio.glob(glob)
> ----
> -...
> -for _, file in pairs(files) do fio.unlink(file) end
> ----
> -...
> ---
> --- make sure the server has some xlogs, otherwise the
> --- replica doesn't discover the gap in the logs
> ---
> -box.space.test:auto_increment{'after snapshot and restart'}
> ----
> -- [4, 'after snapshot and restart']
> -...
> -box.space.test:auto_increment{'after snapshot and restart - one more row'}
> ----
> -- [5, 'after snapshot and restart - one more row']
> -...
> ---
> --- check that panic is true
> ---
> -box.cfg{force_recovery=false}
> ----
> -...
> -box.cfg.force_recovery
> ----
> -- false
> -...
> ---
> --- try to start the replica, ha-ha
> --- (replication should fail, some rows are missing)
> ---
> -test_run:cmd("start server replica")
> ----
> -- true
> -...
> -test_run:cmd("switch replica")
> ----
> -- true
> -...
> -fiber = require('fiber')
> ----
> -...
> -while box.info.replication[1].upstream.status ~= "stopped" do fiber.sleep(0.001) end
> ----
> -...
> -box.info.replication[1].upstream.status
> ----
> -- stopped
> -...
> -box.info.replication[1].upstream.message
> ----
> -- 'Missing .xlog file between LSN 6 {1: 6} and 8 {1: 8}'
> -...
> -box.space.test:select{}
> ----
> -- - [1, 'before snapshot']
> -...
> ---
> ---
> -test_run:cmd("switch default")
> ----
> -- true
> -...
> -test_run:cmd("stop server replica")
> ----
> -- true
> -...
> -test_run:cmd("cleanup server replica")
> ----
> -- true
> -...
> ---
> --- cleanup
> -box.space.test:drop()
> ----
> -...
> -box.schema.user.revoke('guest', 'replication')
> ----
> -...
> diff --git a/test/xlog/panic_on_wal_error.test.lua b/test/xlog/panic_on_wal_error.test.lua
> deleted file mode 100644
> index 2e95431c6..000000000
> --- a/test/xlog/panic_on_wal_error.test.lua
> +++ /dev/null
> @@ -1,75 +0,0 @@
> --- preparatory stuff
> -env = require('test_run')
> -test_run = env.new()
> -
> -test_run:cmd("restart server default with cleanup=True")
> -box.schema.user.grant('guest', 'replication')
> -_ = box.schema.space.create('test')
> -_ = box.space.test:create_index('pk')
> ---
> --- reopen xlog
> ---
> -test_run:cmd("restart server default")
> -box.space.test ~= nil
> --- insert some stuff
> ---
> -box.space.test:auto_increment{'before snapshot'}
> ---
> --- this snapshot will go to the replica
> ---
> -box.snapshot()
> ---
> --- create a replica, let it catch up somewhat
> ---
> -test_run:cmd("create server replica with rpl_master=default, script='xlog/replica.lua'")
> -test_run:cmd("start server replica")
> -test_run:cmd("switch replica")
> -box.space.test:select{}
> ---
> --- stop replica, restart the master, insert more stuff
> --- which will make it into an xlog only
> ---
> -test_run:cmd("switch default")
> -test_run:cmd("stop server replica")
> -test_run:cmd("restart server default")
> -box.space.test:auto_increment{'after snapshot'}
> -box.space.test:auto_increment{'after snapshot - one more row'}
> ---
> --- save snapshot and remove xlogs
> ---
> -box.snapshot()
> -fio = require('fio')
> -glob = fio.pathjoin(box.cfg.wal_dir, '*.xlog')
> -files = fio.glob(glob)
> -for _, file in pairs(files) do fio.unlink(file) end
> ---
> --- make sure the server has some xlogs, otherwise the
> --- replica doesn't discover the gap in the logs
> ---
> -box.space.test:auto_increment{'after snapshot and restart'}
> -box.space.test:auto_increment{'after snapshot and restart - one more row'}
> ---
> --- check that panic is true
> ---
> -box.cfg{force_recovery=false}
> -box.cfg.force_recovery
> ---
> --- try to start the replica, ha-ha
> --- (replication should fail, some rows are missing)
> ---
> -test_run:cmd("start server replica")
> -test_run:cmd("switch replica")
> -fiber = require('fiber')
> -while box.info.replication[1].upstream.status ~= "stopped" do fiber.sleep(0.001) end
> -box.info.replication[1].upstream.status
> -box.info.replication[1].upstream.message
> -box.space.test:select{}
> ---
> ---
> -test_run:cmd("switch default")
> -test_run:cmd("stop server replica")
> -test_run:cmd("cleanup server replica")
> ---
> --- cleanup
> -box.space.test:drop()
> -box.schema.user.revoke('guest', 'replication')
--
Serge Petrenko
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 3/4] replication: add is_anon flag to ballot
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 3/4] replication: add is_anon flag to ballot Vladislav Shpilevoy
@ 2020-09-15 7:46 ` Serge Petrenko
2020-09-15 21:22 ` Vladislav Shpilevoy
0 siblings, 1 reply; 17+ messages in thread
From: Serge Petrenko @ 2020-09-15 7:46 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, gorcunov
15.09.2020 02:11, Vladislav Shpilevoy wrote:
> Ballot is a message sent in response to a vote request, which is
> sent by the applier first thing after connection establishment.
>
> It contains basic info about the remote instance such as whether
> it is read only, if it is still loading, and more.
>
> The ballot didn't contain a flag telling whether the instance is
> anonymous. That led to a problem: when an applier was connected to a
> remote instance and added to struct replicaset inside a struct
> replica object, it was unknown whether that instance was anonymous.
> It was added as not anonymous by default.
>
> If the remote instance was in fact anonymous and sent a subscribe
> response back to the first instance with the anon flag = true,
> then it looked like the remote instance was not anonymous, and
> suddenly became such, without even a reconnect. It could lead to
> an assertion failure.
>
> The bug is hidden behind another bug, because of which the leader
> instance on bootstrap registers all replicas listed in its
> box.cfg.replication, even anonymous ones.
>
> The patch makes the ballot contain the anon flag. Now both relay
> and applier send whether their host is anonymous. Relay does it by
> sending the ballot, the applier sends it as part of the subscribe
> request. By the time a replica gets a UUID and is added into struct
> replicaset, its anon flag is determined.
>
> Also the patch makes anon_count get updated on each replica hash table
> change. Previously it was only updated when something related to
> relay was done. Now the anon flag is updated by applier actions too,
> so it is no longer correct to update the counter only on relay-specific actions.
>
> The early registration bug is a subject for a next patch.
>
> Part of #5287
>
> @TarantoolBot document
> Title: IPROTO_BALLOT_IS_ANON flag
>
> There is a request type IPROTO_BALLOT, with code 0x29. It has
> fields IPROTO_BALLOT_IS_RO (0x01), IPROTO_BALLOT_VCLOCK (0x02),
> IPROTO_BALLOT_GC_VCLOCK (0x03), IPROTO_BALLOT_IS_LOADING (0x04).
>
> Now it gets a new field IPROTO_BALLOT_IS_ANON (0x05). The field
> is a boolean and is equal to box.cfg.replication_anon of the
> sender.
> ---
Hi! Thanks for the patch!
Please see one comment below.
> src/box/box.cc | 1 +
> src/box/iproto_constants.h | 1 +
> src/box/replication.cc | 14 ++++++++++++--
> src/box/xrow.c | 14 ++++++++++++--
> src/box/xrow.h | 5 +++++
> 5 files changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/src/box/box.cc b/src/box/box.cc
> index faffd5769..145b53ec8 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -2080,6 +2080,7 @@ void
> box_process_vote(struct ballot *ballot)
> {
> ballot->is_ro = cfg_geti("read_only") != 0;
> + ballot->is_anon = replication_anon;
> /*
> * is_ro is true on initial load and is set to box.cfg.read_only
> * after box_cfg() returns, during dynamic box.cfg parameters setting.
> diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
> index 4f5a2b195..9c2fb6058 100644
> --- a/src/box/iproto_constants.h
> +++ b/src/box/iproto_constants.h
> @@ -149,6 +149,7 @@ enum iproto_ballot_key {
> IPROTO_BALLOT_VCLOCK = 0x02,
> IPROTO_BALLOT_GC_VCLOCK = 0x03,
> IPROTO_BALLOT_IS_LOADING = 0x04,
> + IPROTO_BALLOT_IS_ANON = 0x05,
> };
>
> #define bit(c) (1ULL<<IPROTO_##c)
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index ef0e2411d..8408f395e 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -290,6 +290,8 @@ replica_clear_id(struct replica *replica)
> }
> if (replica_is_orphan(replica)) {
> replica_hash_remove(&replicaset.hash, replica);
> + replicaset.anon_count -= replica->anon;
> + assert(replicaset.anon_count >= 0);
replica_clear_id() is only called on _cluster entry deletion.
So you're surely working with a normal replica, not anonymous.
We may add an assert(!replica->anon) somewhere.
Don't know if it's really needed though.
> replica_delete(replica);
> }
> }
> @@ -332,6 +334,7 @@ replica_on_applier_connect(struct replica *replica)
> assert(replica->applier_sync_state == APPLIER_DISCONNECTED);
>
> replica->uuid = applier->uuid;
> + replica->anon = applier->ballot.is_anon;
> replica->applier_sync_state = APPLIER_CONNECTED;
> replicaset.applier.connected++;
>
> @@ -362,6 +365,7 @@ replica_on_applier_connect(struct replica *replica)
> } else {
> /* Add a new struct replica */
> replica_hash_insert(&replicaset.hash, replica);
> + replicaset.anon_count += replica->anon;
> }
> }
>
> @@ -390,7 +394,9 @@ replica_on_applier_reconnect(struct replica *replica)
> if (orig == NULL) {
> orig = replica_new();
> orig->uuid = applier->uuid;
> + orig->anon = applier->ballot.is_anon;
> replica_hash_insert(&replicaset.hash, orig);
> + replicaset.anon_count += orig->anon;
> }
>
> if (orig->applier != NULL) {
> @@ -526,6 +532,7 @@ replicaset_update(struct applier **appliers, int count)
>
> assert(!tt_uuid_is_nil(&applier->uuid));
> replica->uuid = applier->uuid;
> + replica->anon = applier->ballot.is_anon;
>
> if (replica_hash_search(&uniq, replica) != NULL) {
> replica_clear_applier(replica);
> @@ -582,6 +589,7 @@ replicaset_update(struct applier **appliers, int count)
> } else {
> /* Add a new struct replica */
> replica_hash_insert(&replicaset.hash, replica);
> + replicaset.anon_count += replica->anon;
> }
>
> replica->applier_sync_state = APPLIER_CONNECTED;
> @@ -593,6 +601,8 @@ replicaset_update(struct applier **appliers, int count)
> replica_hash_foreach_safe(&replicaset.hash, replica, next) {
> if (replica_is_orphan(replica)) {
> replica_hash_remove(&replicaset.hash, replica);
> + replicaset.anon_count -= replica->anon;
> + assert(replicaset.anon_count >= 0);
> replica_delete(replica);
> }
> }
> @@ -907,12 +917,12 @@ replica_on_relay_stop(struct replica *replica)
> replica->gc = NULL;
> } else {
> assert(replica->gc == NULL);
> - assert(replicaset.anon_count > 0);
> - replicaset.anon_count--;
> }
> }
> if (replica_is_orphan(replica)) {
> replica_hash_remove(&replicaset.hash, replica);
> + replicaset.anon_count -= replica->anon;
> + assert(replicaset.anon_count >= 0);
> replica_delete(replica);
> }
> }
> diff --git a/src/box/xrow.c b/src/box/xrow.c
> index 95ddb1fe7..2edcb2f8f 100644
> --- a/src/box/xrow.c
> +++ b/src/box/xrow.c
> @@ -447,9 +447,11 @@ iproto_reply_vote(struct obuf *out, const struct ballot *ballot,
> uint64_t sync, uint32_t schema_version)
> {
> size_t max_size = IPROTO_HEADER_LEN + mp_sizeof_map(1) +
> - mp_sizeof_uint(UINT32_MAX) + mp_sizeof_map(4) +
> + mp_sizeof_uint(UINT32_MAX) + mp_sizeof_map(5) +
> mp_sizeof_uint(UINT32_MAX) + mp_sizeof_bool(ballot->is_ro) +
> mp_sizeof_uint(UINT32_MAX) + mp_sizeof_bool(ballot->is_loading) +
> + mp_sizeof_uint(IPROTO_BALLOT_IS_ANON) +
> + mp_sizeof_bool(ballot->is_anon) +
> mp_sizeof_uint(UINT32_MAX) +
> mp_sizeof_vclock_ignore0(&ballot->vclock) +
> mp_sizeof_uint(UINT32_MAX) +
> @@ -465,11 +467,13 @@ iproto_reply_vote(struct obuf *out, const struct ballot *ballot,
> char *data = buf + IPROTO_HEADER_LEN;
> data = mp_encode_map(data, 1);
> data = mp_encode_uint(data, IPROTO_BALLOT);
> - data = mp_encode_map(data, 4);
> + data = mp_encode_map(data, 5);
> data = mp_encode_uint(data, IPROTO_BALLOT_IS_RO);
> data = mp_encode_bool(data, ballot->is_ro);
> data = mp_encode_uint(data, IPROTO_BALLOT_IS_LOADING);
> data = mp_encode_bool(data, ballot->is_loading);
> + data = mp_encode_uint(data, IPROTO_BALLOT_IS_ANON);
> + data = mp_encode_bool(data, ballot->is_anon);
> data = mp_encode_uint(data, IPROTO_BALLOT_VCLOCK);
> data = mp_encode_vclock_ignore0(data, &ballot->vclock);
> data = mp_encode_uint(data, IPROTO_BALLOT_GC_VCLOCK);
> @@ -1227,6 +1231,7 @@ xrow_decode_ballot(struct xrow_header *row, struct ballot *ballot)
> {
> ballot->is_ro = false;
> ballot->is_loading = false;
> + ballot->is_anon = false;
> vclock_create(&ballot->vclock);
>
> const char *start = NULL;
> @@ -1276,6 +1281,11 @@ xrow_decode_ballot(struct xrow_header *row, struct ballot *ballot)
> goto err;
> ballot->is_loading = mp_decode_bool(&data);
> break;
> + case IPROTO_BALLOT_IS_ANON:
> + if (mp_typeof(*data) != MP_BOOL)
> + goto err;
> + ballot->is_anon = mp_decode_bool(&data);
> + break;
> case IPROTO_BALLOT_VCLOCK:
> if (mp_decode_vclock_ignore0(&data,
> &ballot->vclock) != 0)
> diff --git a/src/box/xrow.h b/src/box/xrow.h
> index 58d47b12d..0dc9eb71a 100644
> --- a/src/box/xrow.h
> +++ b/src/box/xrow.h
> @@ -332,6 +332,11 @@ xrow_encode_auth(struct xrow_header *row, const char *salt, size_t salt_len,
> struct ballot {
> /** Set if the instance is running in read-only mode. */
> bool is_ro;
> + /**
> + * A flag whether the instance is anonymous, not having an
> + * ID, and not going to request it.
> + */
> + bool is_anon;
> /**
> * Set if the instance hasn't finished bootstrap or recovery, or
> * is syncing with other replicas in the replicaset.
--
Serge Petrenko
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 4/4] replication: do not register outgoing connections
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 4/4] replication: do not register outgoing connections Vladislav Shpilevoy
@ 2020-09-15 7:50 ` Serge Petrenko
0 siblings, 0 replies; 17+ messages in thread
From: Serge Petrenko @ 2020-09-15 7:50 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, gorcunov
15.09.2020 02:11, Vladislav Shpilevoy wrote:
> Replication protocol's first stage for non-anonymous replicas is
> that the replica should be registered in _cluster to get a unique
> ID number.
>
> That happens when a replica connects to a writable node, which
> performs the registration. So it means that registration always
> happens on the master node when an *incoming* request for it
> appears, explicitly asking for a registration. Only relay can do that.
>
> That wasn't the case for bootstrap. If box.cfg.replication wasn't
> empty on the master node doing the cluster bootstrap, it
> registered all the outgoing connections in _cluster. Note, the
> target node could even be anonymous, but it was still registered.
>
> That breaks the protocol, and leads to registration of anon
> replicas sometimes. The patch drops it.
>
> Another motivation here is Raft cluster bootstrap specifics.
> During Raft bootstrap it is going to be very important that
> non-joined replicas should not be registered in _cluster. A
> replica can only register after its JOIN request was accepted, and
> its snapshot download has started.
>
> Closes #5287
> Needed for #1146
> ---
> src/box/applier.cc | 13 ++++
> src/box/box.cc | 29 +++++---
> src/box/errcode.h | 1 +
> test/box/error.result | 1 +
> test/replication/autobootstrap_anon.lua | 25 +++++++
> test/replication/autobootstrap_anon1.lua | 1 +
> test/replication/autobootstrap_anon2.lua | 1 +
> test/replication/gh-5287-boot-anon.result | 81 +++++++++++++++++++++
> test/replication/gh-5287-boot-anon.test.lua | 30 ++++++++
> test/replication/prune.result | 18 ++++-
> test/replication/prune.test.lua | 7 +-
> 11 files changed, 190 insertions(+), 17 deletions(-)
> create mode 100644 test/replication/autobootstrap_anon.lua
> create mode 120000 test/replication/autobootstrap_anon1.lua
> create mode 120000 test/replication/autobootstrap_anon2.lua
> create mode 100644 test/replication/gh-5287-boot-anon.result
> create mode 100644 test/replication/gh-5287-boot-anon.test.lua
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 96dd48c0d..e272a7af6 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -107,6 +107,7 @@ applier_log_error(struct applier *applier, struct error *e)
> case ER_UNKNOWN_REPLICA:
> case ER_PASSWORD_MISMATCH:
> case ER_XLOG_GAP:
> + case ER_TOO_EARLY_SUBSCRIBE:
> say_info("will retry every %.2lf second",
> replication_reconnect_interval());
> break;
> @@ -1306,6 +1307,18 @@ applier_f(va_list ap)
> applier_log_error(applier, e);
> applier_disconnect(applier, APPLIER_LOADING);
> goto reconnect;
> + } else if (e->errcode() == ER_TOO_EARLY_SUBSCRIBE) {
> + /*
> + * The instance is not anonymous, and is
> + * registered, but its ID is not delivered to
> + * all the nodes in the cluster yet, and some
> + * nodes may ask to retry connection later,
> + * until they receive _cluster record of this
> + * instance. From some third node, for example.
> + */
> + applier_log_error(applier, e);
> + applier_disconnect(applier, APPLIER_LOADING);
> + goto reconnect;
> } else if (e->errcode() == ER_SYNC_QUORUM_TIMEOUT ||
> e->errcode() == ER_SYNC_ROLLBACK) {
> /*
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 145b53ec8..0b1f6c237 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1992,9 +1992,23 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
> struct replica *replica = replica_by_uuid(&replica_uuid);
>
> if (!anon && (replica == NULL || replica->id == REPLICA_ID_NIL)) {
> - tnt_raise(ClientError, ER_UNKNOWN_REPLICA,
> - tt_uuid_str(&replica_uuid),
> - tt_uuid_str(&REPLICASET_UUID));
> + /*
> + * The instance is not anonymous, and is registered (at least it
> + * claims so), but its ID is not delivered to the current
> + * instance yet. Need to wait until its _cluster record arrives
> + * from some third node. Likely to happen on bootstrap, when
> + * there is a fullmesh and 1 leader doing all the _cluster
> + * registrations. Not all of them are delivered to the other
> + * nodes yet.
> + * Also can happen when the replica is deleted from _cluster,
> + * but still tries to subscribe. It won't have an ID here.
> + */
> + tnt_raise(ClientError, ER_TOO_EARLY_SUBSCRIBE,
> + tt_uuid_str(&replica_uuid));
> + }
> + if (anon && replica != NULL && replica->id != REPLICA_ID_NIL) {
> + tnt_raise(ClientError, ER_PROTOCOL, "Can't subscribe an "
> + "anonymous replica having an ID assigned");
> }
> if (replica == NULL)
> replica = replicaset_add_anon(&replica_uuid);
> @@ -2208,15 +2222,6 @@ bootstrap_master(const struct tt_uuid *replicaset_uuid)
> box_register_replica(replica_id, &INSTANCE_UUID);
> assert(replica_by_uuid(&INSTANCE_UUID)->id == 1);
>
> - /* Register other cluster members */
> - replicaset_foreach(replica) {
> - if (tt_uuid_is_equal(&replica->uuid, &INSTANCE_UUID))
> - continue;
> - assert(replica->applier != NULL);
> - box_register_replica(++replica_id, &replica->uuid);
> - assert(replica->id == replica_id);
> - }
> -
> /* Set UUID of a new replica set */
> box_set_replicaset_uuid(replicaset_uuid);
>
> diff --git a/src/box/errcode.h b/src/box/errcode.h
> index 9a240a431..e6957d612 100644
> --- a/src/box/errcode.h
> +++ b/src/box/errcode.h
> @@ -272,6 +272,7 @@ struct errcode_record {
> /*217 */_(ER_SYNC_ROLLBACK, "A rollback for a synchronous transaction is received") \
> /*218 */_(ER_TUPLE_METADATA_IS_TOO_BIG, "Can't create tuple: metadata size %u is too big") \
> /*219 */_(ER_XLOG_GAP, "%s") \
> + /*220 */_(ER_TOO_EARLY_SUBSCRIBE, "Can't subscribe non-anonymous replica %s until join is done") \
>
> /*
> * !IMPORTANT! Please follow instructions at start of the file
> diff --git a/test/box/error.result b/test/box/error.result
> index 600517316..4d85b9e55 100644
> --- a/test/box/error.result
> +++ b/test/box/error.result
> @@ -438,6 +438,7 @@ t;
> | 217: box.error.SYNC_ROLLBACK
> | 218: box.error.TUPLE_METADATA_IS_TOO_BIG
> | 219: box.error.XLOG_GAP
> + | 220: box.error.TOO_EARLY_SUBSCRIBE
> | ...
>
> test_run:cmd("setopt delimiter ''");
> diff --git a/test/replication/autobootstrap_anon.lua b/test/replication/autobootstrap_anon.lua
> new file mode 100644
> index 000000000..2e96d9d1a
> --- /dev/null
> +++ b/test/replication/autobootstrap_anon.lua
> @@ -0,0 +1,25 @@
> +#!/usr/bin/env tarantool
> +
> +local INSTANCE_ID = string.match(arg[0], "%d")
> +local SOCKET_DIR = require('fio').cwd()
> +local ANON = arg[1] == 'true'
> +
> +local function instance_uri(instance_id)
> + return SOCKET_DIR..'/autobootstrap'..instance_id..'.sock';
> +end
> +
> +require('console').listen(os.getenv('ADMIN'))
> +
> +box.cfg({
> + listen = instance_uri(INSTANCE_ID),
> + replication = {
> + instance_uri(1),
> + instance_uri(2),
> + };
> + replication_anon = ANON,
> + read_only = ANON,
> +})
> +
> +box.once("bootstrap", function()
> + box.schema.user.grant('guest', 'super')
> +end)
> diff --git a/test/replication/autobootstrap_anon1.lua b/test/replication/autobootstrap_anon1.lua
> new file mode 120000
> index 000000000..27e0fec36
> --- /dev/null
> +++ b/test/replication/autobootstrap_anon1.lua
> @@ -0,0 +1 @@
> +autobootstrap_anon.lua
> \ No newline at end of file
> diff --git a/test/replication/autobootstrap_anon2.lua b/test/replication/autobootstrap_anon2.lua
> new file mode 120000
> index 000000000..27e0fec36
> --- /dev/null
> +++ b/test/replication/autobootstrap_anon2.lua
> @@ -0,0 +1 @@
> +autobootstrap_anon.lua
> \ No newline at end of file
> diff --git a/test/replication/gh-5287-boot-anon.result b/test/replication/gh-5287-boot-anon.result
> new file mode 100644
> index 000000000..bf6660ae5
> --- /dev/null
> +++ b/test/replication/gh-5287-boot-anon.result
> @@ -0,0 +1,81 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +--
> +-- gh-5287: when a cluster contained an anonymous replica during bootstrap, it
> +-- could be registered anyway.
> +--
> +
> +test_run:cmd("create server replica1 with script='replication/autobootstrap_anon1.lua'")
> + | ---
> + | - true
> + | ...
> +test_run:cmd("start server replica1 with wait=False")
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd("create server replica2 with script='replication/autobootstrap_anon2.lua'")
> + | ---
> + | - true
> + | ...
> +test_run:cmd("start server replica2 with args='true', wait=False")
> + | ---
> + | - true
> + | ...
> +
> +test_run:switch('replica2')
> + | ---
> + | - true
> + | ...
> +-- Without box.info.replication test-run fails to wait a cond.
> +test_run:wait_cond(function() return next(box.info.replication) ~= nil end)
> + | ---
> + | - true
> + | ...
> +test_run:wait_upstream(1, {status = 'follow'})
> + | ---
> + | - true
> + | ...
> +
> +test_run:switch('replica1')
> + | ---
> + | - true
> + | ...
> +-- The anonymous replica wasn't registered.
> +assert(box.space._cluster:len() == 1)
> + | ---
> + | - true
> + | ...
> +box.info.gc().consumers
> + | ---
> + | - []
> + | ...
> +box.info.replication_anon.count == 1
> + | ---
> + | - true
> + | ...
> +
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd("stop server replica1")
> + | ---
> + | - true
> + | ...
> +test_run:cmd("delete server replica1")
> + | ---
> + | - true
> + | ...
> +test_run:cmd("stop server replica2")
> + | ---
> + | - true
> + | ...
> +test_run:cmd("delete server replica2")
> + | ---
> + | - true
> + | ...
> diff --git a/test/replication/gh-5287-boot-anon.test.lua b/test/replication/gh-5287-boot-anon.test.lua
> new file mode 100644
> index 000000000..94ad81af7
> --- /dev/null
> +++ b/test/replication/gh-5287-boot-anon.test.lua
> @@ -0,0 +1,30 @@
> +test_run = require('test_run').new()
> +
> +--
> +-- gh-5287: when a cluster contained an anonymous replica during bootstrap, it
> +-- could be registered anyway.
> +--
> +
> +test_run:cmd("create server replica1 with script='replication/autobootstrap_anon1.lua'")
> +test_run:cmd("start server replica1 with wait=False")
> +
> +test_run:cmd("create server replica2 with script='replication/autobootstrap_anon2.lua'")
> +test_run:cmd("start server replica2 with args='true', wait=False")
> +
> +test_run:switch('replica2')
> +-- Without box.info.replication test-run fails to wait a cond.
> +test_run:wait_cond(function() return next(box.info.replication) ~= nil end)
> +test_run:wait_upstream(1, {status = 'follow'})
> +
> +test_run:switch('replica1')
> +-- The anonymous replica wasn't registered.
> +assert(box.space._cluster:len() == 1)
> +box.info.gc().consumers
> +box.info.replication_anon.count == 1
> +
> +test_run:switch('default')
> +
> +test_run:cmd("stop server replica1")
> +test_run:cmd("delete server replica1")
> +test_run:cmd("stop server replica2")
> +test_run:cmd("delete server replica2")
> diff --git a/test/replication/prune.result b/test/replication/prune.result
> index 1a130df40..67fd62990 100644
> --- a/test/replication/prune.result
> +++ b/test/replication/prune.result
> @@ -133,7 +133,19 @@ test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')
> - ['The local instance id 2 is read-only']
> ...
> -- restart replica and check that replica isn't able to join to cluster
> -test_run:cmd('restart server replica1')
> +test_run:cmd('stop server replica1')
> +---
> +- true
> +...
> +test_run:cmd('start server replica1 with args="true 0"')
> +---
> +- true
> +...
> +test_run:cmd('switch replica1')
> +---
> +- true
> +...
> +test_run:wait_upstream(1, {message_re = "Can't subscribe non%-anonymous replica"})
> ---
> - true
> ...
> @@ -147,9 +159,9 @@ box.space._cluster:len() == 1
> ...
> test_run:cmd('eval replica1 "box.info.replication[1].upstream.status"')
> ---
> -- ['stopped']
> +- ['loading']
> ...
> -test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')[1]:match("is not registered with replica set") ~= nil
> +test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')[1]:match("Can't subscribe non%-anonymous replica") ~= nil
> ---
> - true
> ...
> diff --git a/test/replication/prune.test.lua b/test/replication/prune.test.lua
> index 80847325b..ea8b0b3c3 100644
> --- a/test/replication/prune.test.lua
> +++ b/test/replication/prune.test.lua
> @@ -65,11 +65,14 @@ while test_run:cmd('eval replica1 "box.info.replication[1].upstream.status"')[1]
> test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')
>
> -- restart replica and check that replica isn't able to join to cluster
> -test_run:cmd('restart server replica1')
> +test_run:cmd('stop server replica1')
> +test_run:cmd('start server replica1 with args="true 0"')
> +test_run:cmd('switch replica1')
> +test_run:wait_upstream(1, {message_re = "Can't subscribe non%-anonymous replica"})
> test_run:cmd('switch default')
> box.space._cluster:len() == 1
> test_run:cmd('eval replica1 "box.info.replication[1].upstream.status"')
> -test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')[1]:match("is not registered with replica set") ~= nil
> +test_run:cmd('eval replica1 "box.info.replication[1].upstream.message"')[1]:match("Can't subscribe non%-anonymous replica") ~= nil
> replica_set.delete(test_run, 2)
>
> box.space.test:drop()
LGTM
--
Serge Petrenko
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 1/4] xlog: introduce an error code for XlogGapError
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 1/4] xlog: introduce an error code for XlogGapError Vladislav Shpilevoy
@ 2020-09-15 7:53 ` Serge Petrenko
0 siblings, 0 replies; 17+ messages in thread
From: Serge Petrenko @ 2020-09-15 7:53 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, gorcunov
15.09.2020 02:11, Vladislav Shpilevoy wrote:
> XlogGapError object didn't have a code in ClientError code space.
> Because of that it was not possible to handle the gap error
> together with client errors in some switch-case statement.
>
> Now the gap error has a code.
>
> This is going to be used in applier code to handle XlogGapError
> among other errors using its code instead of RTTI.
>
> Needed for #5287
> ---
> src/box/errcode.h | 1 +
> src/box/error.cc | 2 ++
> src/box/error.h | 1 +
> src/box/recovery.h | 2 --
> test/box/error.result | 1 +
> 5 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/src/box/errcode.h b/src/box/errcode.h
> index 3c21375f5..9a240a431 100644
> --- a/src/box/errcode.h
> +++ b/src/box/errcode.h
> @@ -271,6 +271,7 @@ struct errcode_record {
> /*216 */_(ER_SYNC_QUORUM_TIMEOUT, "Quorum collection for a synchronous transaction is timed out") \
> /*217 */_(ER_SYNC_ROLLBACK, "A rollback for a synchronous transaction is received") \
> /*218 */_(ER_TUPLE_METADATA_IS_TOO_BIG, "Can't create tuple: metadata size %u is too big") \
> + /*219 */_(ER_XLOG_GAP, "%s") \
>
> /*
> * !IMPORTANT! Please follow instructions at start of the file
> diff --git a/src/box/error.cc b/src/box/error.cc
> index c3c2af3ab..ca1d73e0c 100644
> --- a/src/box/error.cc
> +++ b/src/box/error.cc
> @@ -221,6 +221,8 @@ ClientError::get_errcode(const struct error *e)
> return ER_SYSTEM;
> if (type_cast(CollationError, e))
> return ER_CANT_CREATE_COLLATION;
> + if (type_cast(XlogGapError, e))
> + return ER_XLOG_GAP;
> return ER_PROC_LUA;
> }
>
> diff --git a/src/box/error.h b/src/box/error.h
> index 988b98255..338121dd9 100644
> --- a/src/box/error.h
> +++ b/src/box/error.h
> @@ -178,6 +178,7 @@ box_error_new(const char *file, unsigned line, uint32_t code,
>
> extern const struct type_info type_ClientError;
> extern const struct type_info type_XlogError;
> +extern const struct type_info type_XlogGapError;
> extern const struct type_info type_AccessDeniedError;
> extern const struct type_info type_CustomError;
>
> diff --git a/src/box/recovery.h b/src/box/recovery.h
> index 6e68abc0b..b8d83951a 100644
> --- a/src/box/recovery.h
> +++ b/src/box/recovery.h
> @@ -40,8 +40,6 @@
> extern "C" {
> #endif /* defined(__cplusplus) */
>
> -extern const struct type_info type_XlogGapError;
> -
> struct xrow_header;
> struct xstream;
>
> diff --git a/test/box/error.result b/test/box/error.result
> index cdecdb221..600517316 100644
> --- a/test/box/error.result
> +++ b/test/box/error.result
> @@ -437,6 +437,7 @@ t;
> | 216: box.error.SYNC_QUORUM_TIMEOUT
> | 217: box.error.SYNC_ROLLBACK
> | 218: box.error.TUPLE_METADATA_IS_TOO_BIG
> + | 219: box.error.XLOG_GAP
> | ...
>
> test_run:cmd("setopt delimiter ''");
LGTM
--
Serge Petrenko
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 3/4] replication: add is_anon flag to ballot
2020-09-15 7:46 ` Serge Petrenko
@ 2020-09-15 21:22 ` Vladislav Shpilevoy
2020-09-16 10:59 ` Serge Petrenko
0 siblings, 1 reply; 17+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-15 21:22 UTC (permalink / raw)
To: Serge Petrenko, tarantool-patches, gorcunov
Thanks for the review!
>> diff --git a/src/box/replication.cc b/src/box/replication.cc
>> index ef0e2411d..8408f395e 100644
>> --- a/src/box/replication.cc
>> +++ b/src/box/replication.cc
>> @@ -290,6 +290,8 @@ replica_clear_id(struct replica *replica)
>> }
>> if (replica_is_orphan(replica)) {
>> replica_hash_remove(&replicaset.hash, replica);
>> + replicaset.anon_count -= replica->anon;
>> + assert(replicaset.anon_count >= 0);
>
> replica_clear_id() is only called on _cluster entry deletion.
> So you're surely working with a normal replica, not anonymous.
>
> We may add an assert(!replica->anon) somewhere.
Indeed:
====================
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 8408f395e..c1fcdb660 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -290,8 +290,11 @@ replica_clear_id(struct replica *replica)
}
if (replica_is_orphan(replica)) {
replica_hash_remove(&replicaset.hash, replica);
- replicaset.anon_count -= replica->anon;
- assert(replicaset.anon_count >= 0);
+ /*
+ * The replica had an ID, it couldn't be anon by
+ * definition.
+ */
+ assert(!replica->anon);
replica_delete(replica);
}
}
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 2/4] replication: retry in case of XlogGapError
2020-09-15 7:35 ` Serge Petrenko
@ 2020-09-15 21:23 ` Vladislav Shpilevoy
2020-09-16 10:59 ` Serge Petrenko
0 siblings, 1 reply; 17+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-15 21:23 UTC (permalink / raw)
To: Serge Petrenko, tarantool-patches, gorcunov
Hi! Thanks for the review!
> I propose you rework the tests so that they expect upstream.status == 'loading'
> instead of deleting them altogether.
I've brought them back for now with a few amendments. However, here is
the motivation why I dropped them earlier:
- Their functionality was fully covered by some other tests doing exactly the same;
- Both removed tests were also doing the same;
- xlog/panic_on_wal_error is in the wrong folder; we should have such tests in the
replication/ folder. Also the test is close to useless because it seems to check
what happens with recovery when box.cfg{force_recovery = false} is set on the
replica, while the setting is actually installed on the master. So it does not mean
anything.
Anyway, now they are back. Maybe we will drop them later someday.
To make them work with the 'loading' status I needed to move one line in
xlog/replica.lua and replication/replica.lua so that the admin console is started
before box.cfg. That way test-run can attach before box.cfg is finished.
Otherwise it hangs because of 'loading'.
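Schematically, the replica script ends up structured like this (a simplified
sketch, not the exact file contents; see the diffs below):

    #!/usr/bin/env tarantool

    -- Start the console first to allow test-run to attach even before
    -- box.cfg is finished.
    require('console').listen(os.getenv('ADMIN'))

    box.cfg({
        listen = os.getenv("LISTEN"),
        replication = os.getenv("MASTER"),
        replication_timeout = 0.1,
    })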
See the full new patch below. The diff was too big.
====================
replication: retry in case of XlogGapError
Previously XlogGapError was considered a critical error stopping
the replication. That may not be as good as it looks.
XlogGapError is a perfectly fine error, which should not kill the
replication connection. It should be retried instead.
Here is an example where the gap can be recovered on its
own. Consider the case: node1 is a leader, it is booted with
vclock {1: 3}. Node2 connects and fetches snapshot of node1, it
also gets vclock {1: 3}. Then node1 writes something and its
vclock becomes {1: 4}. Now node3 boots from node1, and gets the
same vclock. Vclocks now look like this:
- node1: {1: 4}, leader, has {1: 3} snap.
- node2: {1: 3}, booted from node1, has only snap.
- node3: {1: 4}, booted from node1, has only snap.
If the cluster is a fullmesh, node2 will send subscribe requests
with vclock {1: 3}. If node3 receives it, it will respond with
xlog gap error, because it only has a snap with {1: 4}, nothing
else. In that case node2 should retry connecting to node3, and in
the meantime try to get newer changes from node1.
The example is totally valid. However, it is unreachable now
because the master registers all replicas in _cluster before allowing
them to join. So they all bootstrap from a snapshot
containing all their IDs. This is a bug, because such
auto-registration leads to registration of anonymous replicas, if
they are present during bootstrap. Also it blocks Raft, which
can't work if there are registered, but not yet joined nodes.
Once the registration problem is solved in the next commit, the
XlogGapError will strike quite often during bootstrap. This patch
won't allow that to happen.
Needed for #5287
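In the tests the change shows up as the upstream staying in 'loading' while the
applier retries, instead of going to 'stopped'. A rough sketch of what the
updated tests below check:

    -- Delete an xlog the replica needs, then restart it. Replication is
    -- not stopped, the applier keeps retrying:
    test_run:cmd("start server test with wait=False")
    test_run:cmd("switch test")
    test_run:wait_upstream(1, {status = 'loading', message_re = 'Missing %.xlog file'})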
diff --git a/src/box/applier.cc b/src/box/applier.cc
index c1d07ca54..96dd48c0d 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -106,6 +106,7 @@ applier_log_error(struct applier *applier, struct error *e)
case ER_SYSTEM:
case ER_UNKNOWN_REPLICA:
case ER_PASSWORD_MISMATCH:
+ case ER_XLOG_GAP:
say_info("will retry every %.2lf second",
replication_reconnect_interval());
break;
@@ -1333,6 +1334,32 @@ applier_f(va_list ap)
applier_disconnect(applier, APPLIER_STOPPED);
return -1;
}
+ } catch (XlogGapError *e) {
+ /*
+ * Xlog gap error can't be a critical error. Because it
+ * is totally normal during bootstrap. Consider the
+ * case: node1 is a leader, it is booted with vclock
+ * {1: 3}. Node2 connects and fetches snapshot of node1,
+ * it also gets vclock {1: 3}. Then node1 writes
+ * something and its vclock becomes {1: 4}. Now node3
+ * boots from node1, and gets the same vclock. Vclocks
+ * now look like this:
+ *
+ * - node1: {1: 4}, leader, has {1: 3} snap.
+ * - node2: {1: 3}, booted from node1, has only snap.
+ * - node3: {1: 4}, booted from node1, has only snap.
+ *
+ * If the cluster is a fullmesh, node2 will send
+ * subscribe requests with vclock {1: 3}. If node3
+ * receives it, it will respond with xlog gap error,
+ * because it only has a snap with {1: 4}, nothing else.
+ * In that case node2 should retry connecting to node3,
+ * and in the meantime try to get newer changes from
+ * node1.
+ */
+ applier_log_error(applier, e);
+ applier_disconnect(applier, APPLIER_LOADING);
+ goto reconnect;
} catch (FiberIsCancelled *e) {
if (!diag_is_empty(&applier->diag)) {
diag_move(&applier->diag, &fiber()->diag);
diff --git a/test/replication/force_recovery.result b/test/replication/force_recovery.result
index f50452858..e142e829a 100644
--- a/test/replication/force_recovery.result
+++ b/test/replication/force_recovery.result
@@ -63,7 +63,7 @@ fio.unlink(xlog)
box.cfg{force_recovery = true}
---
...
-test_run:cmd("start server test")
+test_run:cmd("start server test with wait=False")
---
- true
...
@@ -71,13 +71,13 @@ test_run:cmd("switch test")
---
- true
...
-box.space.test:select()
+test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
---
-- []
+- true
...
-box.info.replication[1].upstream.status == 'stopped' or box.info
+box.space.test:select()
---
-- true
+- []
...
test_run:cmd("switch default")
---
diff --git a/test/replication/force_recovery.test.lua b/test/replication/force_recovery.test.lua
index 54307814b..bd3b439d2 100644
--- a/test/replication/force_recovery.test.lua
+++ b/test/replication/force_recovery.test.lua
@@ -27,10 +27,10 @@ fio.unlink(xlog)
-- Check that even though box.cfg.force_recovery is set,
-- replication will still fail due to LSN gap.
box.cfg{force_recovery = true}
-test_run:cmd("start server test")
+test_run:cmd("start server test with wait=False")
test_run:cmd("switch test")
+test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
box.space.test:select()
-box.info.replication[1].upstream.status == 'stopped' or box.info
test_run:cmd("switch default")
box.cfg{force_recovery = false}
diff --git a/test/replication/replica.lua b/test/replication/replica.lua
index f3a6dfe58..76949a568 100644
--- a/test/replication/replica.lua
+++ b/test/replication/replica.lua
@@ -9,11 +9,13 @@ else
repl_list = os.getenv("MASTER")
end
+-- Start the console first to allow test-run to attach even before
+-- box.cfg is finished.
+require('console').listen(os.getenv('ADMIN'))
+
box.cfg({
listen = os.getenv("LISTEN"),
replication = repl_list,
memtx_memory = 107374182,
replication_timeout = 0.1,
})
-
-require('console').listen(os.getenv('ADMIN'))
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index dd04ae297..f6e74eae1 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -213,7 +213,7 @@ test_run:wait_cond(function() return #fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.
box.cfg{checkpoint_count = checkpoint_count}
---
...
-test_run:cmd("start server replica with args='true'")
+test_run:cmd("start server replica with args='true', wait=False")
---
- true
...
@@ -221,9 +221,17 @@ test_run:cmd("switch replica")
---
- true
...
-box.info.status -- orphan
+-- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
+-- wait for the upstream status sometimes.
+test_run:wait_cond(function() \
+ return box.info ~= nil and box.info.replication[1] ~= nil \
+end)
---
-- orphan
+- true
+...
+test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
+---
+- true
...
box.space.test:select()
---
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index 410ef44d7..0feea152e 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -79,9 +79,14 @@ for i = 1, 3 do box.space.test:insert{i * 100} end
fio = require('fio')
test_run:wait_cond(function() return #fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.xlog')) == 1 end) or fio.pathjoin(box.cfg.wal_dir, '*.xlog')
box.cfg{checkpoint_count = checkpoint_count}
-test_run:cmd("start server replica with args='true'")
+test_run:cmd("start server replica with args='true', wait=False")
test_run:cmd("switch replica")
-box.info.status -- orphan
+-- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
+-- wait for the upstream status sometimes.
+test_run:wait_cond(function() \
+ return box.info ~= nil and box.info.replication[1] ~= nil \
+end)
+test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
box.space.test:select()
--
diff --git a/test/replication/show_error_on_disconnect.result b/test/replication/show_error_on_disconnect.result
index 48003db06..19d3886e4 100644
--- a/test/replication/show_error_on_disconnect.result
+++ b/test/replication/show_error_on_disconnect.result
@@ -72,7 +72,7 @@ box.space.test:select()
other_id = box.info.id % 2 + 1
---
...
-test_run:wait_upstream(other_id, {status = 'stopped', message_re = 'Missing'})
+test_run:wait_upstream(other_id, {status = 'loading', message_re = 'Missing'})
---
- true
...
diff --git a/test/replication/show_error_on_disconnect.test.lua b/test/replication/show_error_on_disconnect.test.lua
index 1b0ea4373..dce926a34 100644
--- a/test/replication/show_error_on_disconnect.test.lua
+++ b/test/replication/show_error_on_disconnect.test.lua
@@ -29,7 +29,7 @@ test_run:cmd("switch master_quorum1")
box.cfg{replication = repl}
box.space.test:select()
other_id = box.info.id % 2 + 1
-test_run:wait_upstream(other_id, {status = 'stopped', message_re = 'Missing'})
+test_run:wait_upstream(other_id, {status = 'loading', message_re = 'Missing'})
test_run:cmd("switch master_quorum2")
box.space.test:select()
other_id = box.info.id % 2 + 1
diff --git a/test/xlog/panic_on_wal_error.result b/test/xlog/panic_on_wal_error.result
index 22f14f912..c4494ac87 100644
--- a/test/xlog/panic_on_wal_error.result
+++ b/test/xlog/panic_on_wal_error.result
@@ -121,7 +121,7 @@ box.cfg.force_recovery
-- try to start the replica, ha-ha
-- (replication should fail, some rows are missing)
--
-test_run:cmd("start server replica")
+test_run:cmd("start server replica with wait=False")
---
- true
...
@@ -129,19 +129,17 @@ test_run:cmd("switch replica")
---
- true
...
-fiber = require('fiber')
----
-...
-while box.info.replication[1].upstream.status ~= "stopped" do fiber.sleep(0.001) end
+-- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
+-- wait for the upstream status sometimes.
+test_run:wait_cond(function() \
+ return box.info ~= nil and box.info.replication[1] ~= nil \
+end)
---
+- true
...
-box.info.replication[1].upstream.status
----
-- stopped
-...
-box.info.replication[1].upstream.message
+test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
---
-- 'Missing .xlog file between LSN 6 {1: 6} and 8 {1: 8}'
+- true
...
box.space.test:select{}
---
diff --git a/test/xlog/panic_on_wal_error.test.lua b/test/xlog/panic_on_wal_error.test.lua
index 2e95431c6..eea6aad30 100644
--- a/test/xlog/panic_on_wal_error.test.lua
+++ b/test/xlog/panic_on_wal_error.test.lua
@@ -57,12 +57,14 @@ box.cfg.force_recovery
-- try to start the replica, ha-ha
-- (replication should fail, some rows are missing)
--
-test_run:cmd("start server replica")
+test_run:cmd("start server replica with wait=False")
test_run:cmd("switch replica")
-fiber = require('fiber')
-while box.info.replication[1].upstream.status ~= "stopped" do fiber.sleep(0.001) end
-box.info.replication[1].upstream.status
-box.info.replication[1].upstream.message
+-- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
+-- wait for the upstream status sometimes.
+test_run:wait_cond(function() \
+ return box.info ~= nil and box.info.replication[1] ~= nil \
+end)
+test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
box.space.test:select{}
--
--
diff --git a/test/xlog/replica.lua b/test/xlog/replica.lua
index b76c2f776..3268023dc 100644
--- a/test/xlog/replica.lua
+++ b/test/xlog/replica.lua
@@ -1,5 +1,9 @@
#!/usr/bin/env tarantool
+-- Start the console first to allow test-run to attach even before
+-- box.cfg is finished.
+require('console').listen(os.getenv('ADMIN'))
+
box.cfg({
listen = os.getenv("LISTEN"),
replication = os.getenv("MASTER"),
@@ -7,5 +11,3 @@ box.cfg({
-- pid_file = "tarantool.pid",
-- logger = "tarantool.log",
})
-
-require('console').listen(os.getenv('ADMIN'))
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 2/4] replication: retry in case of XlogGapError
2020-09-15 21:23 ` Vladislav Shpilevoy
@ 2020-09-16 10:59 ` Serge Petrenko
0 siblings, 0 replies; 17+ messages in thread
From: Serge Petrenko @ 2020-09-16 10:59 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, gorcunov
16.09.2020 00:23, Vladislav Shpilevoy wrote:
> Hi! Thanks for the review!
>
>> I propose you rework the tests so that they expect upstream.status == 'loading'
>> instead of deleting them altogether.
> I've brought them back for now with a few amendments. However, here is
> the motivation why I dropped them earlier:
>
> - Their functionality was fully covered by some other tests doing exactly the same;
> - Both removed tests were also doing the same;
> - xlog/panic_on_wal_error is in the wrong folder; we should have such tests in the
> replication/ folder. Also the test is close to useless because it seems to check
> what happens with recovery when box.cfg{force_recovery = false} is set on the
> replica, while the setting is actually installed on the master. So it does not mean
> anything.
>
> Anyway, now they are back. Maybe we will drop them later someday.
Thanks for your answer!
Still, I believe we should delete the tests as part of some separate
activity.
The patch LGTM.
>
> To make them work with the 'loading' status I needed to move one line in
> xlog/replica.lua and replication/replica.lua so that the admin console is started
> before box.cfg. That way test-run can attach before box.cfg is finished.
> Otherwise it hangs because of 'loading'.
>
> See the full new patch below. The diff was too big.
>
> ====================
> replication: retry in case of XlogGapError
>
> Previously XlogGapError was considered a critical error stopping
> the replication. That may not be as good as it looks.
>
> XlogGapError is a perfectly fine error, which should not kill the
> replication connection. It should be retried instead.
>
> Here is an example where the gap can be recovered on its
> own. Consider the case: node1 is a leader, it is booted with
> vclock {1: 3}. Node2 connects and fetches snapshot of node1, it
> also gets vclock {1: 3}. Then node1 writes something and its
> vclock becomes {1: 4}. Now node3 boots from node1, and gets the
> same vclock. Vclocks now look like this:
>
> - node1: {1: 4}, leader, has {1: 3} snap.
> - node2: {1: 3}, booted from node1, has only snap.
> - node3: {1: 4}, booted from node1, has only snap.
>
> If the cluster is a fullmesh, node2 will send subscribe requests
> with vclock {1: 3}. If node3 receives it, it will respond with
> xlog gap error, because it only has a snap with {1: 4}, nothing
> else. In that case node2 should retry connecting to node3, and in
> the meantime try to get newer changes from node1.
>
> The example is totally valid. However, it is unreachable now
> because the master registers all replicas in _cluster before allowing
> them to join. So they all bootstrap from a snapshot
> containing all their IDs. This is a bug, because such
> auto-registration leads to registration of anonymous replicas, if
> they are present during bootstrap. Also it blocks Raft, which
> can't work if there are registered, but not yet joined nodes.
>
> Once the registration problem is solved in the next commit, the
> XlogGapError will strike quite often during bootstrap. This patch
> won't allow that to happen.
>
> Needed for #5287
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index c1d07ca54..96dd48c0d 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -106,6 +106,7 @@ applier_log_error(struct applier *applier, struct error *e)
> case ER_SYSTEM:
> case ER_UNKNOWN_REPLICA:
> case ER_PASSWORD_MISMATCH:
> + case ER_XLOG_GAP:
> say_info("will retry every %.2lf second",
> replication_reconnect_interval());
> break;
> @@ -1333,6 +1334,32 @@ applier_f(va_list ap)
> applier_disconnect(applier, APPLIER_STOPPED);
> return -1;
> }
> + } catch (XlogGapError *e) {
> + /*
> + * Xlog gap error can't be a critical error. Because it
> + * is totally normal during bootstrap. Consider the
> + * case: node1 is a leader, it is booted with vclock
> + * {1: 3}. Node2 connects and fetches snapshot of node1,
> + * it also gets vclock {1: 3}. Then node1 writes
> + * something and its vclock becomes {1: 4}. Now node3
> + * boots from node1, and gets the same vclock. Vclocks
> + * now look like this:
> + *
> + * - node1: {1: 4}, leader, has {1: 3} snap.
> + * - node2: {1: 3}, booted from node1, has only snap.
> + * - node3: {1: 4}, booted from node1, has only snap.
> + *
> + * If the cluster is a fullmesh, node2 will send
> + * subscribe requests with vclock {1: 3}. If node3
> + * receives it, it will respond with xlog gap error,
> + * because it only has a snap with {1: 4}, nothing else.
> + * In that case node2 should retry connecting to node3,
> + * and in the meantime try to get newer changes from
> + * node1.
> + */
> + applier_log_error(applier, e);
> + applier_disconnect(applier, APPLIER_LOADING);
> + goto reconnect;
> } catch (FiberIsCancelled *e) {
> if (!diag_is_empty(&applier->diag)) {
> diag_move(&applier->diag, &fiber()->diag);
> diff --git a/test/replication/force_recovery.result b/test/replication/force_recovery.result
> index f50452858..e142e829a 100644
> --- a/test/replication/force_recovery.result
> +++ b/test/replication/force_recovery.result
> @@ -63,7 +63,7 @@ fio.unlink(xlog)
> box.cfg{force_recovery = true}
> ---
> ...
> -test_run:cmd("start server test")
> +test_run:cmd("start server test with wait=False")
> ---
> - true
> ...
> @@ -71,13 +71,13 @@ test_run:cmd("switch test")
> ---
> - true
> ...
> -box.space.test:select()
> +test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
> ---
> -- []
> +- true
> ...
> -box.info.replication[1].upstream.status == 'stopped' or box.info
> +box.space.test:select()
> ---
> -- true
> +- []
> ...
> test_run:cmd("switch default")
> ---
> diff --git a/test/replication/force_recovery.test.lua b/test/replication/force_recovery.test.lua
> index 54307814b..bd3b439d2 100644
> --- a/test/replication/force_recovery.test.lua
> +++ b/test/replication/force_recovery.test.lua
> @@ -27,10 +27,10 @@ fio.unlink(xlog)
> -- Check that even though box.cfg.force_recovery is set,
> -- replication will still fail due to LSN gap.
> box.cfg{force_recovery = true}
> -test_run:cmd("start server test")
> +test_run:cmd("start server test with wait=False")
> test_run:cmd("switch test")
> +test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
> box.space.test:select()
> -box.info.replication[1].upstream.status == 'stopped' or box.info
> test_run:cmd("switch default")
> box.cfg{force_recovery = false}
>
> diff --git a/test/replication/replica.lua b/test/replication/replica.lua
> index f3a6dfe58..76949a568 100644
> --- a/test/replication/replica.lua
> +++ b/test/replication/replica.lua
> @@ -9,11 +9,13 @@ else
> repl_list = os.getenv("MASTER")
> end
>
> +-- Start the console first to allow test-run to attach even before
> +-- box.cfg is finished.
> +require('console').listen(os.getenv('ADMIN'))
> +
> box.cfg({
> listen = os.getenv("LISTEN"),
> replication = repl_list,
> memtx_memory = 107374182,
> replication_timeout = 0.1,
> })
> -
> -require('console').listen(os.getenv('ADMIN'))
> diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
> index dd04ae297..f6e74eae1 100644
> --- a/test/replication/replica_rejoin.result
> +++ b/test/replication/replica_rejoin.result
> @@ -213,7 +213,7 @@ test_run:wait_cond(function() return #fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.
> box.cfg{checkpoint_count = checkpoint_count}
> ---
> ...
> -test_run:cmd("start server replica with args='true'")
> +test_run:cmd("start server replica with args='true', wait=False")
> ---
> - true
> ...
> @@ -221,9 +221,17 @@ test_run:cmd("switch replica")
> ---
> - true
> ...
> -box.info.status -- orphan
> +-- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
> +-- wait for the upstream status sometimes.
> +test_run:wait_cond(function() \
> + return box.info ~= nil and box.info.replication[1] ~= nil \
> +end)
> ---
> -- orphan
> +- true
> +...
> +test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
> +---
> +- true
> ...
> box.space.test:select()
> ---
> diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
> index 410ef44d7..0feea152e 100644
> --- a/test/replication/replica_rejoin.test.lua
> +++ b/test/replication/replica_rejoin.test.lua
> @@ -79,9 +79,14 @@ for i = 1, 3 do box.space.test:insert{i * 100} end
> fio = require('fio')
> test_run:wait_cond(function() return #fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.xlog')) == 1 end) or fio.pathjoin(box.cfg.wal_dir, '*.xlog')
> box.cfg{checkpoint_count = checkpoint_count}
> -test_run:cmd("start server replica with args='true'")
> +test_run:cmd("start server replica with args='true', wait=False")
> test_run:cmd("switch replica")
> -box.info.status -- orphan
> +-- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
> +-- wait for the upstream status sometimes.
> +test_run:wait_cond(function() \
> + return box.info ~= nil and box.info.replication[1] ~= nil \
> +end)
> +test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
> box.space.test:select()
>
> --
> diff --git a/test/replication/show_error_on_disconnect.result b/test/replication/show_error_on_disconnect.result
> index 48003db06..19d3886e4 100644
> --- a/test/replication/show_error_on_disconnect.result
> +++ b/test/replication/show_error_on_disconnect.result
> @@ -72,7 +72,7 @@ box.space.test:select()
> other_id = box.info.id % 2 + 1
> ---
> ...
> -test_run:wait_upstream(other_id, {status = 'stopped', message_re = 'Missing'})
> +test_run:wait_upstream(other_id, {status = 'loading', message_re = 'Missing'})
> ---
> - true
> ...
> diff --git a/test/replication/show_error_on_disconnect.test.lua b/test/replication/show_error_on_disconnect.test.lua
> index 1b0ea4373..dce926a34 100644
> --- a/test/replication/show_error_on_disconnect.test.lua
> +++ b/test/replication/show_error_on_disconnect.test.lua
> @@ -29,7 +29,7 @@ test_run:cmd("switch master_quorum1")
> box.cfg{replication = repl}
> box.space.test:select()
> other_id = box.info.id % 2 + 1
> -test_run:wait_upstream(other_id, {status = 'stopped', message_re = 'Missing'})
> +test_run:wait_upstream(other_id, {status = 'loading', message_re = 'Missing'})
> test_run:cmd("switch master_quorum2")
> box.space.test:select()
> other_id = box.info.id % 2 + 1
> diff --git a/test/xlog/panic_on_wal_error.result b/test/xlog/panic_on_wal_error.result
> index 22f14f912..c4494ac87 100644
> --- a/test/xlog/panic_on_wal_error.result
> +++ b/test/xlog/panic_on_wal_error.result
> @@ -121,7 +121,7 @@ box.cfg.force_recovery
> -- try to start the replica, ha-ha
> -- (replication should fail, some rows are missing)
> --
> -test_run:cmd("start server replica")
> +test_run:cmd("start server replica with wait=False")
> ---
> - true
> ...
> @@ -129,19 +129,17 @@ test_run:cmd("switch replica")
> ---
> - true
> ...
> -fiber = require('fiber')
> ----
> -...
> -while box.info.replication[1].upstream.status ~= "stopped" do fiber.sleep(0.001) end
> +-- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
> +-- wait for the upstream status sometimes.
> +test_run:wait_cond(function() \
> + return box.info ~= nil and box.info.replication[1] ~= nil \
> +end)
> ---
> +- true
> ...
> -box.info.replication[1].upstream.status
> ----
> -- stopped
> -...
> -box.info.replication[1].upstream.message
> +test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
> ---
> -- 'Missing .xlog file between LSN 6 {1: 6} and 8 {1: 8}'
> +- true
> ...
> box.space.test:select{}
> ---
> diff --git a/test/xlog/panic_on_wal_error.test.lua b/test/xlog/panic_on_wal_error.test.lua
> index 2e95431c6..eea6aad30 100644
> --- a/test/xlog/panic_on_wal_error.test.lua
> +++ b/test/xlog/panic_on_wal_error.test.lua
> @@ -57,12 +57,14 @@ box.cfg.force_recovery
> -- try to start the replica, ha-ha
> -- (replication should fail, some rows are missing)
> --
> -test_run:cmd("start server replica")
> +test_run:cmd("start server replica with wait=False")
> test_run:cmd("switch replica")
> -fiber = require('fiber')
> -while box.info.replication[1].upstream.status ~= "stopped" do fiber.sleep(0.001) end
> -box.info.replication[1].upstream.status
> -box.info.replication[1].upstream.message
> +-- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
> +-- wait for the upstream status sometimes.
> +test_run:wait_cond(function() \
> + return box.info ~= nil and box.info.replication[1] ~= nil \
> +end)
> +test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
> box.space.test:select{}
> --
> --
> diff --git a/test/xlog/replica.lua b/test/xlog/replica.lua
> index b76c2f776..3268023dc 100644
> --- a/test/xlog/replica.lua
> +++ b/test/xlog/replica.lua
> @@ -1,5 +1,9 @@
> #!/usr/bin/env tarantool
>
> +-- Start the console first to allow test-run to attach even before
> +-- box.cfg is finished.
> +require('console').listen(os.getenv('ADMIN'))
> +
> box.cfg({
> listen = os.getenv("LISTEN"),
> replication = os.getenv("MASTER"),
> @@ -7,5 +11,3 @@ box.cfg({
> -- pid_file = "tarantool.pid",
> -- logger = "tarantool.log",
> })
> -
> -require('console').listen(os.getenv('ADMIN'))
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH v2 3/4] replication: add is_anon flag to ballot
2020-09-15 21:22 ` Vladislav Shpilevoy
@ 2020-09-16 10:59 ` Serge Petrenko
0 siblings, 0 replies; 17+ messages in thread
From: Serge Petrenko @ 2020-09-16 10:59 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, gorcunov
16.09.2020 00:22, Vladislav Shpilevoy wrote:
> Thanks for the review!
>
>>> diff --git a/src/box/replication.cc b/src/box/replication.cc
>>> index ef0e2411d..8408f395e 100644
>>> --- a/src/box/replication.cc
>>> +++ b/src/box/replication.cc
>>> @@ -290,6 +290,8 @@ replica_clear_id(struct replica *replica)
>>> }
>>> if (replica_is_orphan(replica)) {
>>> replica_hash_remove(&replicaset.hash, replica);
>>> + replicaset.anon_count -= replica->anon;
>>> + assert(replicaset.anon_count >= 0);
>> replica_clear_id() is only called on _cluster entry deletion.
>> So you're surely working with a normal replica, not anonymous.
>>
>> We may add an assert(!replica->anon) somewhere.
> Indeed:
>
> ====================
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index 8408f395e..c1fcdb660 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -290,8 +290,11 @@ replica_clear_id(struct replica *replica)
> }
> if (replica_is_orphan(replica)) {
> replica_hash_remove(&replicaset.hash, replica);
> - replicaset.anon_count -= replica->anon;
> - assert(replicaset.anon_count >= 0);
> + /*
> + * The replica had an ID, it couldn't be anon by
> + * definition.
> + */
> + assert(!replica->anon);
> replica_delete(replica);
> }
> }
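A minimal Lua sketch of the invariant behind the assert, assuming a standard anonymous-replica setup (master_uri is a placeholder): an anonymous replica never gets a _cluster row, so a _cluster deletion, which is what ends up calling replica_clear_id(), can only ever see a non-anonymous replica.

-- Anonymous replica: no row is ever created for it in _cluster.
box.cfg{replication = master_uri, replication_anon = true, read_only = true}
-- On the master, pruning _cluster therefore only touches registered,
-- non-anonymous replicas.
box.space._cluster:select()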
Thanks for your answer!
Now LGTM.
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH v2 0/4] Boot with anon
2020-09-14 23:11 [Tarantool-patches] [PATCH v2 0/4] Boot with anon Vladislav Shpilevoy
` (3 preceding siblings ...)
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 4/4] replication: do not register outgoing connections Vladislav Shpilevoy
@ 2020-09-17 12:08 ` Kirill Yukhin
2020-09-17 13:00 ` Vladislav Shpilevoy
4 siblings, 1 reply; 17+ messages in thread
From: Kirill Yukhin @ 2020-09-17 12:08 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: tarantool-patches
Hello,
On 15 Sep 01:11, Vladislav Shpilevoy wrote:
> The patch attempts to address with problem of anonymous replicas being
> registered in _cluster, if they are present during bootstrap.
>
> The bug was found during working on another issue related to Raft. The problem
> is that Raft won't work properly during bootstrap if non-joined replicas are
> registered in _cluster.
>
> When their auto-registration by applier was removed, the anon bug was found.
>
> The auto-registration removal is trivial, but it breaks the cluster bootstrap in
> another way creating false-positive XlogGap errors. See the second commit with
> an explanation. To solve the issue quite a radical solution is applied - gap
> errors are not considered critical anymore, and can be retried. I am not sure
> that is the best option, but couldn't come up with anything better after a long
> struggle with that.
>
> This is a bug, so whatever we will come up with after all, it should be pushed
> to the older versions too.
>
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5287-anon-false-register
> Issue: https://github.com/tarantool/tarantool/issues/5287
>
> Changes in v2:
> - Anon status is stored as a flag again. In v1 it was stored as enum, but an
> alternative solution was proposed, where the enum is not needed.
> - Ballot now has a new field is_anon. It helps to avoid the enum, and set
> replica->anon flag to a correct value right when it becomes connected. Through
> relay or applier, either.
>
> @ChangeLog
> * Anonymous replica could be registered and could prevent WAL files removal (gh-5287).
> * XlogGapError is not a critical error anymore. It means, box.info.replication will show upstream status as 'loading' if the error was found. The upstream will be restarted until the error is resolved automatically with a help of another instance, or until the replica is removed from box.cfg.replication (gh-5287).
I've checked your patchset into 2.5 and master.
--
Regards, Kirill Yukhin
* Re: [Tarantool-patches] [PATCH v2 0/4] Boot with anon
2020-09-17 12:08 ` [Tarantool-patches] [PATCH v2 0/4] Boot with anon Kirill Yukhin
@ 2020-09-17 13:00 ` Vladislav Shpilevoy
2020-09-17 15:04 ` Kirill Yukhin
0 siblings, 1 reply; 17+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-17 13:00 UTC (permalink / raw)
To: Kirill Yukhin; +Cc: tarantool-patches
The bug also exists on 2.4.
* Re: [Tarantool-patches] [PATCH v2 0/4] Boot with anon
2020-09-17 13:00 ` Vladislav Shpilevoy
@ 2020-09-17 15:04 ` Kirill Yukhin
2020-09-17 16:42 ` Vladislav Shpilevoy
0 siblings, 1 reply; 17+ messages in thread
From: Kirill Yukhin @ 2020-09-17 15:04 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: tarantool-patches
Hello,
On 17 Sep 15:00, Vladislav Shpilevoy wrote:
> The bug also exists on 2.4.
Could you please cherry-pick it to 2.4 then?
--
Regards, Kirill Yukhin
* Re: [Tarantool-patches] [PATCH v2 0/4] Boot with anon
2020-09-17 15:04 ` Kirill Yukhin
@ 2020-09-17 16:42 ` Vladislav Shpilevoy
0 siblings, 0 replies; 17+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-17 16:42 UTC (permalink / raw)
To: Kirill Yukhin; +Cc: tarantool-patches
Done.
Thread overview: 17+ messages
2020-09-14 23:11 [Tarantool-patches] [PATCH v2 0/4] Boot with anon Vladislav Shpilevoy
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 1/4] xlog: introduce an error code for XlogGapError Vladislav Shpilevoy
2020-09-15 7:53 ` Serge Petrenko
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 2/4] replication: retry in case of XlogGapError Vladislav Shpilevoy
2020-09-15 7:35 ` Serge Petrenko
2020-09-15 21:23 ` Vladislav Shpilevoy
2020-09-16 10:59 ` Serge Petrenko
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 3/4] replication: add is_anon flag to ballot Vladislav Shpilevoy
2020-09-15 7:46 ` Serge Petrenko
2020-09-15 21:22 ` Vladislav Shpilevoy
2020-09-16 10:59 ` Serge Petrenko
2020-09-14 23:11 ` [Tarantool-patches] [PATCH v2 4/4] replication: do not register outgoing connections Vladislav Shpilevoy
2020-09-15 7:50 ` Serge Petrenko
2020-09-17 12:08 ` [Tarantool-patches] [PATCH v2 0/4] Boot with anon Kirill Yukhin
2020-09-17 13:00 ` Vladislav Shpilevoy
2020-09-17 15:04 ` Kirill Yukhin
2020-09-17 16:42 ` Vladislav Shpilevoy