Tarantool development patches archive
* [tarantool-patches] [PATCH 0/2] [replication] [recovery] recover missing data
@ 2018-03-29 16:15 Konstantin Belyavskiy
  2018-03-29 16:15 ` [tarantool-patches] [PATCH 1/2] " Konstantin Belyavskiy
  2018-03-29 16:15 ` [tarantool-patches] [PATCH 2/2] " Konstantin Belyavskiy
  0 siblings, 2 replies; 5+ messages in thread
From: Konstantin Belyavskiy @ 2018-03-29 16:15 UTC (permalink / raw)
  To: tarantool-patches

These changes fix an issue with recovering missing local data
by using the data from a replica,
and add a vclock comparison as a necessary sync condition.
Split across two patches:
 - Add vclock comparison as a sync condition and update the test.
 - All changes required to make this recovery possible, plus a
   new test.

Konstantin Belyavskiy (2):
  [replication] [recovery] recover missing data
  [replication] [recovery] recover missing data

 src/box/applier.cc                        |  16 +++--
 src/box/relay.cc                          |  16 ++++-
 src/box/wal.cc                            |  15 +++-
 test/replication/catch.result             |   5 +-
 test/replication/recover_missing.result   | 116 ++++++++++++++++++++++++++++++
 test/replication/recover_missing.test.lua |  41 +++++++++++
 test/replication/suite.ini                |   2 +-
 7 files changed, 199 insertions(+), 12 deletions(-)
 create mode 100644 test/replication/recover_missing.result
 create mode 100644 test/replication/recover_missing.test.lua

-- 
2.14.3 (Apple Git-98)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [tarantool-patches] [PATCH 1/2] [replication] [recovery] recover missing data
  2018-03-29 16:15 [tarantool-patches] [PATCH 0/2] [replication] [recovery] recover missing data Konstantin Belyavskiy
@ 2018-03-29 16:15 ` Konstantin Belyavskiy
  2018-03-30 11:17   ` Vladimir Davydov
  2018-03-29 16:15 ` [tarantool-patches] [PATCH 2/2] " Konstantin Belyavskiy
  1 sibling, 1 reply; 5+ messages in thread
From: Konstantin Belyavskiy @ 2018-03-29 16:15 UTC (permalink / raw)
  To: tarantool-patches

Part 1 of 2.
Add vclock comparison as a new synchronization condition,
and update the catch test, which now works as expected.

Closes #3210
---
 branch: gh-3210-recover-missing-local-data-master-master
 src/box/applier.cc            | 16 +++++++++++-----
 test/replication/catch.result |  5 +++--
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 6bfe5a99a..12bf1f0d2 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -305,7 +305,7 @@ applier_join(struct applier *applier)
 				 * server is 1.6. Since we have
 				 * not initialized replication
 				 * vclock yet, do it now. In 1.7+
-				 * this vlcock is not used.
+				 * this vclock is not used.
 				 */
 				xrow_decode_vclock_xc(&row, &replicaset.vclock);
 			}
@@ -370,6 +370,7 @@ applier_subscribe(struct applier *applier)
 	struct ev_io *coio = &applier->io;
 	struct ibuf *ibuf = &applier->ibuf;
 	struct xrow_header row;
+	struct vclock remote_vclock_at_subscribe;
 
 	xrow_encode_subscribe_xc(&row, &REPLICASET_UUID, &INSTANCE_UUID,
 				 &replicaset.vclock);
@@ -411,9 +412,8 @@ applier_subscribe(struct applier *applier)
 		 * In case of successful subscribe, the server
 		 * responds with its current vclock.
 		 */
-		struct vclock vclock;
-		vclock_create(&vclock);
-		xrow_decode_vclock_xc(&row, &vclock);
+		vclock_create(&remote_vclock_at_subscribe);
+		xrow_decode_vclock_xc(&row, &remote_vclock_at_subscribe);
 	}
 	/**
 	 * Tarantool < 1.6.7:
@@ -452,8 +452,14 @@ applier_subscribe(struct applier *applier)
 			applier_set_state(applier, APPLIER_FOLLOW);
 		}
 
+		/*
+		 * Must stay in read-only mode until synchronized.
+		 * Check lag and compare local vclock with remote one.
+		 */
 		if (applier->state == APPLIER_SYNC &&
-		    applier->lag <= replication_sync_lag) {
+		    applier->lag <= replication_sync_lag &&
+		    vclock_compare(&remote_vclock_at_subscribe,
+				   &replicaset.vclock) <= 0) {
 			/* Applier is synced, switch to "follow". */
 			applier_set_state(applier, APPLIER_FOLLOW);
 		}
diff --git a/test/replication/catch.result b/test/replication/catch.result
index 7d61ad26f..05c1c243e 100644
--- a/test/replication/catch.result
+++ b/test/replication/catch.result
@@ -99,10 +99,11 @@ box.space.test ~= nil
 ...
 d = box.space.test:delete{1}
 ---
+- error: Can't modify data because this instance is in read-only mode.
 ...
 box.space.test:get(1) == nil
 ---
-- true
+- false
 ...
 -- case #2: delete tuple by net.box
 test_run:cmd("switch default")
@@ -118,7 +119,7 @@ c = net_box.connect(r_uri)
 ...
 c.space.test:get(1) == nil
 ---
-- true
+- false
 ...
 -- check sync
 errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
-- 
2.14.3 (Apple Git-98)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [tarantool-patches] [PATCH 2/2] [replication] [recovery] recover missing data
  2018-03-29 16:15 [tarantool-patches] [PATCH 0/2] [replication] [recovery] recover missing data Konstantin Belyavskiy
  2018-03-29 16:15 ` [tarantool-patches] [PATCH 1/2] " Konstantin Belyavskiy
@ 2018-03-29 16:15 ` Konstantin Belyavskiy
  2018-03-30 11:33   ` Vladimir Davydov
  1 sibling, 1 reply; 5+ messages in thread
From: Konstantin Belyavskiy @ 2018-03-29 16:15 UTC (permalink / raw)
  To: tarantool-patches

Part 2 of 2.
Recover missing local data from a replica.
In case of a sudden power loss, if data was not written to the WAL
but had already been sent to a remote replica, the local instance
cannot recover properly and the datasets diverge.
Fix this by using the remote replica's data and an LSN comparison.
Based on @GeorgyKirichenko's proposal and @locker's race-free check.

Closes #3210
---
 branch: gh-3210-recover-missing-local-data-master-master
 src/box/relay.cc                          |  16 ++++-
 src/box/wal.cc                            |  15 +++-
 test/replication/recover_missing.result   | 116 ++++++++++++++++++++++++++++++
 test/replication/recover_missing.test.lua |  41 +++++++++++
 test/replication/suite.ini                |   2 +-
 5 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 test/replication/recover_missing.result
 create mode 100644 test/replication/recover_missing.test.lua

diff --git a/src/box/relay.cc b/src/box/relay.cc
index 2bd05ad5f..88de2a32b 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -110,6 +110,11 @@ struct relay {
 	struct vclock recv_vclock;
 	/** Replication slave version. */
 	uint32_t version_id;
+	/**
+	 * Local vclock at the moment of subscribe, used to check
+	 * dataset on the other side and send missing data rows if any.
+	 */
+	struct vclock local_vclock_at_subscribe;
 
 	/** Relay endpoint */
 	struct cbus_endpoint endpoint;
@@ -541,6 +546,7 @@ relay_subscribe(int fd, uint64_t sync, struct replica *replica,
 	relay.version_id = replica_version_id;
 	relay.replica = replica;
 	replica_set_relay(replica, &relay);
+	vclock_copy(&relay.local_vclock_at_subscribe, &replicaset.vclock);
 
 	int rc = cord_costart(&relay.cord, tt_sprintf("relay_%p", &relay),
 			      relay_subscribe_f, &relay);
@@ -583,10 +589,16 @@ relay_send_row(struct xstream *stream, struct xrow_header *packet)
 	/*
 	 * We're feeding a WAL, thus responding to SUBSCRIBE request.
 	 * In that case, only send a row if it is not from the same replica
-	 * (i.e. don't send replica's own rows back).
+	 * (i.e. don't send replica's own rows back) or if this row is
+	 * missing on the other side (i.e. in case of sudden power-loss,
+	 * data was not written to WAL, so remote master can't recover
+	 * it). In that case the packet's LSN is at most the local
+	 * master's LSN at the moment the 'SUBSCRIBE' request was issued.
 	 */
 	if (relay->replica == NULL ||
-	    packet->replica_id != relay->replica->id) {
+	    packet->replica_id != relay->replica->id ||
+	    packet->lsn <= vclock_get(&relay->local_vclock_at_subscribe,
+				      packet->replica_id)) {
 		relay_send(relay, packet);
 	}
 }
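The relay-side change above can be modeled with a small stand-alone sketch (the `Row` struct and function name below are illustrative assumptions, not the real relay API): a replica's own rows are normally never echoed back, except when their LSN does not exceed the master's vclock component recorded at SUBSCRIBE time, meaning the peer may have lost them before they reached its WAL.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

/* Hypothetical, simplified model of a replicated row. */
struct Row {
	uint32_t replica_id; /* originator of the row */
	int64_t lsn;
};

/*
 * Normally the relay skips the peer's own rows. The exception added
 * by the patch: rows with LSN at or below the master's vclock
 * component taken at SUBSCRIBE time may be missing on the peer
 * (lost in a power failure), so they are sent back for recovery.
 */
static bool relay_should_send(const Row &row, uint32_t peer_id,
			      const std::vector<int64_t> &vclock_at_subscribe)
{
	return row.replica_id != peer_id ||
	       row.lsn <= vclock_at_subscribe[row.replica_id];
}
```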
diff --git a/src/box/wal.cc b/src/box/wal.cc
index 4576cfe09..7ea8fe20b 100644
--- a/src/box/wal.cc
+++ b/src/box/wal.cc
@@ -770,8 +770,19 @@ wal_write(struct journal *journal, struct journal_entry *entry)
 			 * and promote vclock.
 			 */
 			if ((*last)->replica_id == instance_id) {
-				vclock_follow(&replicaset.vclock, instance_id,
-					      (*last)->lsn);
+				/*
+				 * In master-master configuration, during sudden
+				 * power-loss, if the data has not been written
+				 * to WAL but has already been sent to others,
+				 * they will send the data back. In this case
+				 * vclock has already been promoted in applier.
+				 */
+				if (vclock_get(&replicaset.vclock,
+					       instance_id) < (*last)->lsn) {
+					vclock_follow(&replicaset.vclock,
+						      instance_id,
+						      (*last)->lsn);
+				}
 				break;
 			}
 			--last;
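The guard in wal_write() above exists because, when a lost row comes back from a peer, the applier may already have promoted the local vclock component; promoting it again with a non-increasing LSN would be invalid. A minimal sketch of the idea, with hypothetical names (not Tarantool's actual vclock API):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

/* Hypothetical simplified vclock: one LSN per replica id. */
using Vclock = std::vector<int64_t>;

/*
 * Promote the component only when the freshly written LSN is newer
 * than what the vclock already records; otherwise the applier has
 * already accounted for this row and promotion must be skipped.
 */
static void follow_if_newer(Vclock &vclock, uint32_t replica_id,
			    int64_t lsn)
{
	if (vclock[replica_id] < lsn)
		vclock[replica_id] = lsn; /* vclock_follow() equivalent */
}
```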
diff --git a/test/replication/recover_missing.result b/test/replication/recover_missing.result
new file mode 100644
index 000000000..4c9c9b195
--- /dev/null
+++ b/test/replication/recover_missing.result
@@ -0,0 +1,116 @@
+env = require('test_run')
+---
+...
+test_run = env.new()
+---
+...
+SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
+---
+...
+-- Start servers
+test_run:create_cluster(SERVERS)
+---
+...
+-- Wait for full mesh
+test_run:wait_fullmesh(SERVERS)
+---
+...
+test_run:cmd("switch autobootstrap1")
+---
+- true
+...
+for i = 0, 9 do box.space.test:insert{i, 'test' .. i} end
+---
+...
+box.space.test:count()
+---
+- 10
+...
+test_run:cmd('switch default')
+---
+- true
+...
+vclock1 = test_run:get_vclock('autobootstrap1')
+---
+...
+vclock2 = test_run:wait_cluster_vclock(SERVERS, vclock1)
+---
+...
+test_run:cmd("switch autobootstrap2")
+---
+- true
+...
+box.space.test:count()
+---
+- 10
+...
+box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.01)
+---
+- ok
+...
+test_run:cmd("stop server autobootstrap1")
+---
+- true
+...
+fio = require('fio')
+---
+...
+-- This test checks the ability to recover missing local data
+-- from a remote replica. See #3210.
+-- Delete data on the first master and check that after restart,
+-- thanks to the vclock difference, it is able to recover
+-- all missing data from the replica.
+-- Also check that there is no race, i.e. the master stays in
+-- 'read-only' mode until it receives all the data.
+fio.unlink(fio.pathjoin(fio.abspath("."), string.format('autobootstrap1/%020d.xlog', 8)))
+---
+- true
+...
+test_run:cmd("start server autobootstrap1")
+---
+- true
+...
+test_run:cmd("switch autobootstrap1")
+---
+- true
+...
+for i = 10, 19 do box.space.test:insert{i, 'test' .. i} end
+---
+...
+fiber = require('fiber')
+---
+...
+fiber.sleep(0.1)
+---
+...
+box.space.test:select()
+---
+- - [0, 'test0']
+  - [1, 'test1']
+  - [2, 'test2']
+  - [3, 'test3']
+  - [4, 'test4']
+  - [5, 'test5']
+  - [6, 'test6']
+  - [7, 'test7']
+  - [8, 'test8']
+  - [9, 'test9']
+  - [10, 'test10']
+  - [11, 'test11']
+  - [12, 'test12']
+  - [13, 'test13']
+  - [14, 'test14']
+  - [15, 'test15']
+  - [16, 'test16']
+  - [17, 'test17']
+  - [18, 'test18']
+  - [19, 'test19']
+...
+-- Cleanup.
+test_run:cmd('switch default')
+---
+- true
+...
+test_run:drop_cluster(SERVERS)
+---
+...
diff --git a/test/replication/recover_missing.test.lua b/test/replication/recover_missing.test.lua
new file mode 100644
index 000000000..775d23a0b
--- /dev/null
+++ b/test/replication/recover_missing.test.lua
@@ -0,0 +1,41 @@
+env = require('test_run')
+test_run = env.new()
+
+SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
+-- Start servers
+test_run:create_cluster(SERVERS)
+-- Wait for full mesh
+test_run:wait_fullmesh(SERVERS)
+
+test_run:cmd("switch autobootstrap1")
+for i = 0, 9 do box.space.test:insert{i, 'test' .. i} end
+box.space.test:count()
+
+test_run:cmd('switch default')
+vclock1 = test_run:get_vclock('autobootstrap1')
+vclock2 = test_run:wait_cluster_vclock(SERVERS, vclock1)
+
+test_run:cmd("switch autobootstrap2")
+box.space.test:count()
+box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.01)
+test_run:cmd("stop server autobootstrap1")
+fio = require('fio')
+-- This test checks the ability to recover missing local data
+-- from a remote replica. See #3210.
+-- Delete data on the first master and check that after restart,
+-- thanks to the vclock difference, it is able to recover
+-- all missing data from the replica.
+-- Also check that there is no race, i.e. the master stays in
+-- 'read-only' mode until it receives all the data.
+fio.unlink(fio.pathjoin(fio.abspath("."), string.format('autobootstrap1/%020d.xlog', 8)))
+test_run:cmd("start server autobootstrap1")
+
+test_run:cmd("switch autobootstrap1")
+for i = 10, 19 do box.space.test:insert{i, 'test' .. i} end
+fiber = require('fiber')
+fiber.sleep(0.1)
+box.space.test:select()
+
+-- Cleanup.
+test_run:cmd('switch default')
+test_run:drop_cluster(SERVERS)
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index ee76a3b00..b538f9625 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -3,7 +3,7 @@ core = tarantool
 script =  master.lua
 description = tarantool/box, replication
 disabled = consistent.test.lua
-release_disabled = catch.test.lua errinj.test.lua gc.test.lua before_replace.test.lua quorum.test.lua
+release_disabled = catch.test.lua errinj.test.lua gc.test.lua before_replace.test.lua quorum.test.lua recover_missing.test.lua
 config = suite.cfg
 lua_libs = lua/fast_replica.lua
 long_run = prune.test.lua
-- 
2.14.3 (Apple Git-98)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [tarantool-patches] [PATCH 1/2] [replication] [recovery] recover missing data
  2018-03-29 16:15 ` [tarantool-patches] [PATCH 1/2] " Konstantin Belyavskiy
@ 2018-03-30 11:17   ` Vladimir Davydov
  0 siblings, 0 replies; 5+ messages in thread
From: Vladimir Davydov @ 2018-03-30 11:17 UTC (permalink / raw)
  To: Konstantin Belyavskiy; +Cc: tarantool-patches

The subject line doesn't conform to our guidelines. Please fix.

There's no version tag in the subject prefix.

Change log is missing (should be in the cover letter in this case).

On Thu, Mar 29, 2018 at 07:15:15PM +0300, Konstantin Belyavskiy wrote:
> Part 1 of 2.

Please remove this.

> Add vclock comparison as a new synchronization condition.
> And update catch test as now it works as expected.

The comment is awful. Please improve.

> 
> Closes #3210

This patch doesn't close the issue, but it's needed for it, so you
should use "Needed for" instead of "Closes".

> ---
>  branch: gh-3210-recover-missing-local-data-master-master

The branch should be given as a hyperlink to GitHub. Also, a hyperlink
to the GitHub issue you're fixing should be here as well.

Please read the guidelines carefully and follow them to the letter,
as others already do:

  https://tarantool.io/en/doc/1.9/dev_guide/developer_guidelines.html#how-to-write-a-commit-message
  https://tarantool.io/en/doc/1.9/dev_guide/developer_guidelines.html#how-to-submit-a-patch-for-review

> diff --git a/test/replication/catch.result b/test/replication/catch.result
> index 7d61ad26f..05c1c243e 100644
> --- a/test/replication/catch.result
> +++ b/test/replication/catch.result
> @@ -99,10 +99,11 @@ box.space.test ~= nil
>  ...
>  d = box.space.test:delete{1}
>  ---
> +- error: Can't modify data because this instance is in read-only mode.
>  ...
>  box.space.test:get(1) == nil
>  ---
> -- true
> +- false
>  ...
>  -- case #2: delete tuple by net.box
>  test_run:cmd("switch default")
> @@ -118,7 +119,7 @@ c = net_box.connect(r_uri)
>  ...
>  c.space.test:get(1) == nil
>  ---
> -- true
> +- false
>  ...
>  -- check sync
>  errinj.set("ERRINJ_RELAY_TIMEOUT", 0)

From replication/catch.test.lua:

> -- case #2: delete tuple by net.box
> 
> test_run:cmd("switch default")
> test_run:cmd("set variable r_uri to 'replica.listen'")
> c = net_box.connect(r_uri)
> c.space.test:get(1) == nil

But the 'delete' command mentioned in the comment is missing.
It was removed by

  commit 73cbb4a6368a7c5bb3e0176a5d7f8b25bbce3493
  Author: Ilya <markovilya197@gmail.com>
  Date:   Thu Mar 30 14:08:18 2017 +0300
  
      Fix buggy replication/catch.test.lua
  
      Fixes #2092

which was obviously wrong (the author didn't even care to fix
the comment). Now, we can resurrect the missing 'delete' command.
Actually, we can revert all changes done by this commit altogether.
Please do.

Also, now the test takes too long to complete (> 4 seconds).
This happens, because the replica disconnects from the master
by timeout, which is set to 4 times box.cfg.replication_timeout.
Please use replica_timeout.lua instead of replica.lua to speed
up the test execution time.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [tarantool-patches] [PATCH 2/2] [replication] [recovery] recover missing data
  2018-03-29 16:15 ` [tarantool-patches] [PATCH 2/2] " Konstantin Belyavskiy
@ 2018-03-30 11:33   ` Vladimir Davydov
  0 siblings, 0 replies; 5+ messages in thread
From: Vladimir Davydov @ 2018-03-30 11:33 UTC (permalink / raw)
  To: Konstantin Belyavskiy; +Cc: tarantool-patches

On Thu, Mar 29, 2018 at 07:15:16PM +0300, Konstantin Belyavskiy wrote:
> Part 2 of 2.
> Recover missing local data from replica.
> In case of sudden power-loss, if data was not written to WAL but
> already sent to remote replica, local can't recover properly and
> we have different datasets.
> Fix it by using remote replica's data and LSN comparison.
> Based on @GeorgyKirichenko proposal and @locker race free check.
> 
> Closes #3210
> ---
>  branch: gh-3210-recover-missing-local-data-master-master
>  src/box/relay.cc                          |  16 ++++-
>  src/box/wal.cc                            |  15 +++-
>  test/replication/recover_missing.result   | 116 ++++++++++++++++++++++++++++++
>  test/replication/recover_missing.test.lua |  41 +++++++++++
>  test/replication/suite.ini                |   2 +-
>  5 files changed, 185 insertions(+), 5 deletions(-)
>  create mode 100644 test/replication/recover_missing.result
>  create mode 100644 test/replication/recover_missing.test.lua

Nit: please rename the test to recover_missing_xlog.test.lua

> diff --git a/test/replication/recover_missing.test.lua b/test/replication/recover_missing.test.lua
> new file mode 100644
> index 000000000..775d23a0b
> --- /dev/null
> +++ b/test/replication/recover_missing.test.lua
> @@ -0,0 +1,41 @@
> +env = require('test_run')
> +test_run = env.new()
> +
> +SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
> +-- Start servers
> +test_run:create_cluster(SERVERS)
> +-- Wait for full mesh
> +test_run:wait_fullmesh(SERVERS)
> +
> +test_run:cmd("switch autobootstrap1")
> +for i = 0, 9 do box.space.test:insert{i, 'test' .. i} end
> +box.space.test:count()
> +
> +test_run:cmd('switch default')
> +vclock1 = test_run:get_vclock('autobootstrap1')
> +vclock2 = test_run:wait_cluster_vclock(SERVERS, vclock1)
> +
> +test_run:cmd("switch autobootstrap2")
> +box.space.test:count()
> +box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.01)
> +test_run:cmd("stop server autobootstrap1")
> +fio = require('fio')
> +-- This test checks ability to recover missing local data
> +-- from remote replica. See #3210.
> +-- Delete data on first master and test that after restart,
> +-- due to difference in vclock it will be able to recover
> +-- all missing data from replica.
> +-- Also check that there is no concurrency, i.e. master is
> +-- in 'read-only' mode unless it receives all data.
> +fio.unlink(fio.pathjoin(fio.abspath("."), string.format('autobootstrap1/%020d.xlog', 8)))
> +test_run:cmd("start server autobootstrap1")
> +
> +test_run:cmd("switch autobootstrap1")
> +for i = 10, 19 do box.space.test:insert{i, 'test' .. i} end

> +fiber = require('fiber')
> +fiber.sleep(0.1)

I don't think you still need this 'sleep', not after patch 1.

> +box.space.test:select()
> +
> +-- Cleanup.
> +test_run:cmd('switch default')
> +test_run:drop_cluster(SERVERS)

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-03-30 11:33 UTC | newest]

Thread overview: 5+ messages
-- links below jump to the message on this page --
2018-03-29 16:15 [tarantool-patches] [PATCH 0/2] [replication] [recovery] recover missing data Konstantin Belyavskiy
2018-03-29 16:15 ` [tarantool-patches] [PATCH 1/2] " Konstantin Belyavskiy
2018-03-30 11:17   ` Vladimir Davydov
2018-03-29 16:15 ` [tarantool-patches] [PATCH 2/2] " Konstantin Belyavskiy
2018-03-30 11:33   ` Vladimir Davydov
