* [PATCH v2 1/2] replication: stay in orphan mode until replica is synced by vclock
2018-03-30 14:03 [PATCH v2 0/2] recover missing local data from replica Konstantin Belyavskiy
@ 2018-03-30 14:03 ` Konstantin Belyavskiy
2018-03-30 14:03 ` [PATCH v2 2/2] replication: recover missing local data from replica Konstantin Belyavskiy
2018-03-30 16:52 ` [PATCH v2 0/2] " Vladimir Davydov
From: Konstantin Belyavskiy @ 2018-03-30 14:03 UTC
To: tarantool-patches, vdavydov
Stay in orphan (read-only) mode while the local vclock is lower than
the master's, to make sure that datasets are the same across the
replicaset.
Also revert and slightly update the catch test.
Needed for #3210
---
src/box/applier.cc | 16 +++++++++++-----
test/replication/catch.result | 15 ++++++++++-----
test/replication/catch.test.lua | 7 ++++---
3 files changed, 25 insertions(+), 13 deletions(-)
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 6bfe5a99a..12bf1f0d2 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -305,7 +305,7 @@ applier_join(struct applier *applier)
* server is 1.6. Since we have
* not initialized replication
* vclock yet, do it now. In 1.7+
- * this vlcock is not used.
+ * this vclock is not used.
*/
xrow_decode_vclock_xc(&row, &replicaset.vclock);
}
@@ -370,6 +370,7 @@ applier_subscribe(struct applier *applier)
struct ev_io *coio = &applier->io;
struct ibuf *ibuf = &applier->ibuf;
struct xrow_header row;
+ struct vclock remote_vclock_at_subscribe;
xrow_encode_subscribe_xc(&row, &REPLICASET_UUID, &INSTANCE_UUID,
&replicaset.vclock);
@@ -411,9 +412,8 @@ applier_subscribe(struct applier *applier)
* In case of successful subscribe, the server
* responds with its current vclock.
*/
- struct vclock vclock;
- vclock_create(&vclock);
- xrow_decode_vclock_xc(&row, &vclock);
+ vclock_create(&remote_vclock_at_subscribe);
+ xrow_decode_vclock_xc(&row, &remote_vclock_at_subscribe);
}
/**
* Tarantool < 1.6.7:
@@ -452,8 +452,14 @@ applier_subscribe(struct applier *applier)
applier_set_state(applier, APPLIER_FOLLOW);
}
+ /*
+ * Must stay in read-only mode until synchronized:
+ * check the lag and compare the local vclock with the remote one.
+ */
if (applier->state == APPLIER_SYNC &&
- applier->lag <= replication_sync_lag) {
+ applier->lag <= replication_sync_lag &&
+ vclock_compare(&remote_vclock_at_subscribe,
+ &replicaset.vclock) <= 0) {
/* Applier is synced, switch to "follow". */
applier_set_state(applier, APPLIER_FOLLOW);
}
diff --git a/test/replication/catch.result b/test/replication/catch.result
index 7d61ad26f..681cd77ac 100644
--- a/test/replication/catch.result
+++ b/test/replication/catch.result
@@ -19,11 +19,11 @@ errinj = box.error.injection
box.schema.user.grant('guest', 'replication')
---
...
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica_timeout.lua'")
---
- true
...
-test_run:cmd("start server replica")
+test_run:cmd("start server replica with args='0.1'")
---
- true
...
@@ -69,7 +69,7 @@ errinj.set("ERRINJ_RELAY_TIMEOUT", 1000.0)
---
- ok
...
-test_run:cmd("start server replica")
+test_run:cmd("start server replica with args='0.1'")
---
- true
...
@@ -99,10 +99,11 @@ box.space.test ~= nil
...
d = box.space.test:delete{1}
---
+- error: Can't modify data because this instance is in read-only mode.
...
box.space.test:get(1) == nil
---
-- true
+- false
...
-- case #2: delete tuple by net.box
test_run:cmd("switch default")
@@ -116,9 +117,13 @@ test_run:cmd("set variable r_uri to 'replica.listen'")
c = net_box.connect(r_uri)
---
...
+d = c.space.test:delete{1}
+---
+- error: Can't modify data because this instance is in read-only mode.
+...
c.space.test:get(1) == nil
---
-- true
+- false
...
-- check sync
errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
index cb865aa3c..cbfa1c19a 100644
--- a/test/replication/catch.test.lua
+++ b/test/replication/catch.test.lua
@@ -8,8 +8,8 @@ net_box = require('net.box')
errinj = box.error.injection
box.schema.user.grant('guest', 'replication')
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
-test_run:cmd("start server replica")
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica_timeout.lua'")
+test_run:cmd("start server replica with args='0.1'")
test_run:cmd("switch replica")
test_run:cmd("switch default")
@@ -29,7 +29,7 @@ for i=1,100 do s:insert{i, 'this is test message12345'} end
-- sleep after every tuple
errinj.set("ERRINJ_RELAY_TIMEOUT", 1000.0)
-test_run:cmd("start server replica")
+test_run:cmd("start server replica with args='0.1'")
test_run:cmd("switch replica")
fiber = require('fiber')
@@ -53,6 +53,7 @@ box.space.test:get(1) == nil
test_run:cmd("switch default")
test_run:cmd("set variable r_uri to 'replica.listen'")
c = net_box.connect(r_uri)
+d = c.space.test:delete{1}
c.space.test:get(1) == nil
-- check sync
--
2.14.3 (Apple Git-98)
* [PATCH v2 2/2] replication: recover missing local data from replica
2018-03-30 14:03 [PATCH v2 0/2] recover missing local data from replica Konstantin Belyavskiy
2018-03-30 14:03 ` [PATCH v2 1/2] replication: stay in orphan mode until replica is synced by vclock Konstantin Belyavskiy
@ 2018-03-30 14:03 ` Konstantin Belyavskiy
2018-03-30 16:52 ` [PATCH v2 0/2] " Vladimir Davydov
From: Konstantin Belyavskiy @ 2018-03-30 14:03 UTC
To: tarantool-patches, vdavydov
In case of a sudden power loss, if data was not written to the WAL but
had already been sent to a remote replica, the local instance can't
recover properly and the datasets diverge.
Fix it by using the remote replica's data and an LSN comparison.
Based on @GeorgyKirichenko's proposal and @locker's race-free check.
Closes #3210
---
src/box/relay.cc | 16 ++++-
src/box/wal.cc | 15 +++-
test/replication/recover_missing.result | 116 ++++++++++++++++++++++++++++++
test/replication/recover_missing.test.lua | 41 +++++++++++
test/replication/suite.ini | 2 +-
5 files changed, 185 insertions(+), 5 deletions(-)
create mode 100644 test/replication/recover_missing.result
create mode 100644 test/replication/recover_missing.test.lua
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 2bd05ad5f..88de2a32b 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -110,6 +110,11 @@ struct relay {
struct vclock recv_vclock;
/** Replication slave version. */
uint32_t version_id;
+ /**
+ * Local vclock at the moment of subscribe, used to check
+ * dataset on the other side and send missing data rows if any.
+ */
+ struct vclock local_vclock_at_subscribe;
/** Relay endpoint */
struct cbus_endpoint endpoint;
@@ -541,6 +546,7 @@ relay_subscribe(int fd, uint64_t sync, struct replica *replica,
relay.version_id = replica_version_id;
relay.replica = replica;
replica_set_relay(replica, &relay);
+ vclock_copy(&relay.local_vclock_at_subscribe, &replicaset.vclock);
int rc = cord_costart(&relay.cord, tt_sprintf("relay_%p", &relay),
relay_subscribe_f, &relay);
@@ -583,10 +589,16 @@ relay_send_row(struct xstream *stream, struct xrow_header *packet)
/*
* We're feeding a WAL, thus responding to SUBSCRIBE request.
* In that case, only send a row if it is not from the same replica
- * (i.e. don't send replica's own rows back).
+ * (i.e. don't send replica's own rows back) or if this row is
+ * missing on the other side (i.e. in case of sudden power-loss,
+ * data was not written to WAL, so remote master can't recover
+ * it). In this case the packet's LSN is less than or equal to the
+ * local master's LSN at the moment it issued the 'SUBSCRIBE' request.
*/
if (relay->replica == NULL ||
- packet->replica_id != relay->replica->id) {
+ packet->replica_id != relay->replica->id ||
+ packet->lsn <= vclock_get(&relay->local_vclock_at_subscribe,
+ packet->replica_id)) {
relay_send(relay, packet);
}
}
diff --git a/src/box/wal.cc b/src/box/wal.cc
index 4576cfe09..7ea8fe20b 100644
--- a/src/box/wal.cc
+++ b/src/box/wal.cc
@@ -770,8 +770,19 @@ wal_write(struct journal *journal, struct journal_entry *entry)
* and promote vclock.
*/
if ((*last)->replica_id == instance_id) {
- vclock_follow(&replicaset.vclock, instance_id,
- (*last)->lsn);
+ /*
+ * In a master-master configuration, after a sudden
+ * power loss, if the data has not been written to
+ * the WAL but has already been sent to others, they
+ * will send it back. In this case the vclock has
+ * already been promoted in the applier.
+ */
+ if (vclock_get(&replicaset.vclock,
+ instance_id) < (*last)->lsn) {
+ vclock_follow(&replicaset.vclock,
+ instance_id,
+ (*last)->lsn);
+ }
break;
}
--last;
diff --git a/test/replication/recover_missing.result b/test/replication/recover_missing.result
new file mode 100644
index 000000000..4c9c9b195
--- /dev/null
+++ b/test/replication/recover_missing.result
@@ -0,0 +1,116 @@
+env = require('test_run')
+---
+...
+test_run = env.new()
+---
+...
+SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
+---
+...
+-- Start servers
+test_run:create_cluster(SERVERS)
+---
+...
+-- Wait for full mesh
+test_run:wait_fullmesh(SERVERS)
+---
+...
+test_run:cmd("switch autobootstrap1")
+---
+- true
+...
+for i = 0, 9 do box.space.test:insert{i, 'test' .. i} end
+---
+...
+box.space.test:count()
+---
+- 10
+...
+test_run:cmd('switch default')
+---
+- true
+...
+vclock1 = test_run:get_vclock('autobootstrap1')
+---
+...
+vclock2 = test_run:wait_cluster_vclock(SERVERS, vclock1)
+---
+...
+test_run:cmd("switch autobootstrap2")
+---
+- true
+...
+box.space.test:count()
+---
+- 10
+...
+box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.01)
+---
+- ok
+...
+test_run:cmd("stop server autobootstrap1")
+---
+- true
+...
+fio = require('fio')
+---
+...
+-- This test checks the ability to recover missing local data
+-- from a remote replica. See #3210.
+-- Delete data on the first master and check that after restart,
+-- due to the difference in vclock, it is able to recover
+-- all missing data from the replica.
+-- Also check that there is no concurrent access, i.e. the master
+-- stays in 'read-only' mode until it receives all data.
+fio.unlink(fio.pathjoin(fio.abspath("."), string.format('autobootstrap1/%020d.xlog', 8)))
+---
+- true
+...
+test_run:cmd("start server autobootstrap1")
+---
+- true
+...
+test_run:cmd("switch autobootstrap1")
+---
+- true
+...
+for i = 10, 19 do box.space.test:insert{i, 'test' .. i} end
+---
+...
+fiber = require('fiber')
+---
+...
+fiber.sleep(0.1)
+---
+...
+box.space.test:select()
+---
+- - [0, 'test0']
+ - [1, 'test1']
+ - [2, 'test2']
+ - [3, 'test3']
+ - [4, 'test4']
+ - [5, 'test5']
+ - [6, 'test6']
+ - [7, 'test7']
+ - [8, 'test8']
+ - [9, 'test9']
+ - [10, 'test10']
+ - [11, 'test11']
+ - [12, 'test12']
+ - [13, 'test13']
+ - [14, 'test14']
+ - [15, 'test15']
+ - [16, 'test16']
+ - [17, 'test17']
+ - [18, 'test18']
+ - [19, 'test19']
+...
+-- Cleanup.
+test_run:cmd('switch default')
+---
+- true
+...
+test_run:drop_cluster(SERVERS)
+---
+...
diff --git a/test/replication/recover_missing.test.lua b/test/replication/recover_missing.test.lua
new file mode 100644
index 000000000..775d23a0b
--- /dev/null
+++ b/test/replication/recover_missing.test.lua
@@ -0,0 +1,41 @@
+env = require('test_run')
+test_run = env.new()
+
+SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
+-- Start servers
+test_run:create_cluster(SERVERS)
+-- Wait for full mesh
+test_run:wait_fullmesh(SERVERS)
+
+test_run:cmd("switch autobootstrap1")
+for i = 0, 9 do box.space.test:insert{i, 'test' .. i} end
+box.space.test:count()
+
+test_run:cmd('switch default')
+vclock1 = test_run:get_vclock('autobootstrap1')
+vclock2 = test_run:wait_cluster_vclock(SERVERS, vclock1)
+
+test_run:cmd("switch autobootstrap2")
+box.space.test:count()
+box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.01)
+test_run:cmd("stop server autobootstrap1")
+fio = require('fio')
+-- This test checks the ability to recover missing local data
+-- from a remote replica. See #3210.
+-- Delete data on the first master and check that after restart,
+-- due to the difference in vclock, it is able to recover
+-- all missing data from the replica.
+-- Also check that there is no concurrent access, i.e. the master
+-- stays in 'read-only' mode until it receives all data.
+fio.unlink(fio.pathjoin(fio.abspath("."), string.format('autobootstrap1/%020d.xlog', 8)))
+test_run:cmd("start server autobootstrap1")
+
+test_run:cmd("switch autobootstrap1")
+for i = 10, 19 do box.space.test:insert{i, 'test' .. i} end
+fiber = require('fiber')
+fiber.sleep(0.1)
+box.space.test:select()
+
+-- Cleanup.
+test_run:cmd('switch default')
+test_run:drop_cluster(SERVERS)
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index ee76a3b00..b538f9625 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -3,7 +3,7 @@ core = tarantool
script = master.lua
description = tarantool/box, replication
disabled = consistent.test.lua
-release_disabled = catch.test.lua errinj.test.lua gc.test.lua before_replace.test.lua quorum.test.lua
+release_disabled = catch.test.lua errinj.test.lua gc.test.lua before_replace.test.lua quorum.test.lua recover_missing.test.lua
config = suite.cfg
lua_libs = lua/fast_replica.lua
long_run = prune.test.lua
--
2.14.3 (Apple Git-98)