Tarantool development patches archive
 help / color / mirror / Atom feed
From: Olga Arkhangelskaia <krishtal.olja@gmail.com>
To: tarantool-patches@freelists.org
Cc: Olga Arkhangelskaia <krishtal.olja@gmail.com>
Subject: [tarantool-patches] [PATCH v6 3/3] box: adds replication sync after cfg. update
Date: Thu, 30 Aug 2018 17:11:14 +0300	[thread overview]
Message-ID: <20180830141114.4531-3-krishtal.olja@gmail.com> (raw)
In-Reply-To: <20180830141114.4531-1-krishtal.olja@gmail.com>

When replica reconnects to replica set not for the first time, we
suffer from absence of synchronization. Such behavior leads to giving
away outdated data.

Closes #3427

@TarantoolBot document
Title: Orphan status after configuration update or initial bootstrap.
In case of initial bootstrap or after configuration update we can get
an orphan status in two cases. If we synced up with number of replicas
that is smaller than quorum or if we failed to sync up during the time
specified in replication_sync_timeout.
---
https://github.com/tarantool/tarantool/issues/3427
https://github.com/tarantool/tarantool/tree/OKriw/gh-3427-replication-no-sync-1.9

v1:
https://www.freelists.org/post/tarantool-patches/PATCH-replication-adds-replication-sync-after-cfg-update

v2:
https://www.freelists.org/post/tarantool-patches/PATCH-v2-replication-adds-replication-sync-after-cfg-update

v3:
https://www.freelists.org/post/tarantool-patches/PATCH-v3-box-adds-replication-sync-after-cfg-update

v4:
https://www.freelists.org/post/tarantool-patches/PATCH-v4-22-box-adds-replication-sync-after-cfg-update

v5:
https://www.freelists.org/post/tarantool-patches/PATCH-v5-33-box-adds-replication-sync-after-cfg-update

Changes in v2:
- fixed test
- changed replicaset_sync

Changes in v3:
- now we raise the exception when sync is not successful.
- fixed test
- renamed test 

Changes in v4:
- fixed test
- replication_sync_lag is made dynamicall in separate patch
- removed unnecessary error type
- moved say_crit to another place
- in case of sync error we rollback to prev. config

Changes in v5:
- added test case
- now we don't roll back to prev. cfg

Changes in v6:
- set orphan
- added testcases

 src/box/box.cc                 |   6 ++
 src/box/replication.cc         |   1 +
 src/box/replication.h          |   5 +-
 test/replication/sync.result   | 147 +++++++++++++++++++++++++++++++++++++++++
 test/replication/sync.test.lua |  71 ++++++++++++++++++++
 5 files changed, 227 insertions(+), 3 deletions(-)
 create mode 100644 test/replication/sync.result
 create mode 100644 test/replication/sync.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index dcedfd002..e54a79467 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -646,6 +646,12 @@ box_set_replication(void)
 	box_sync_replication(true);
 	/* Follow replica */
 	replicaset_follow();
+	/* Set orphan and sync replica up to quorum.
+	 * If we fail to sync up, replica will be left in orphan state.
+	 */
+	is_orphan = true;
+	title("orphan");
+	replicaset_sync();
 }
 
 void
diff --git a/src/box/replication.cc b/src/box/replication.cc
index be58b0225..d85700b78 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -700,6 +700,7 @@ replicaset_sync(void)
 
 	say_crit("replica set sync complete, quorum of %d "
 		 "replicas formed", quorum);
+	return;
 }
 
 void
diff --git a/src/box/replication.h b/src/box/replication.h
index a6f1dbf69..64f6e7f97 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -378,10 +378,9 @@ void
 replicaset_follow(void);
 
 /**
- * Wait until a replication quorum is formed.
- * Return immediately if a quorum cannot be
- * formed because of errors.
+ * Wait until a replication quorum is formed and sync up with it.
  */
+
 void
 replicaset_sync(void);
 
diff --git a/test/replication/sync.result b/test/replication/sync.result
new file mode 100644
index 000000000..d0a0eb5eb
--- /dev/null
+++ b/test/replication/sync.result
@@ -0,0 +1,147 @@
+fiber = require('fiber')
+---
+...
+--
+-- gh-3427: no sync after configuration update
+--
+--
+-- successful sync
+--
+env = require('test_run')
+---
+...
+test_run = env.new()
+---
+...
+engine = test_run:get_cfg('engine')
+---
+...
+box.schema.user.grant('guest', 'replication')
+---
+...
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+---
+- true
+...
+test_run:cmd("start server replica")
+---
+- true
+...
+s = box.schema.space.create('test', {engine = engine})
+---
+...
+index = s:create_index('primary')
+---
+...
+-- change replica configuration
+test_run:cmd("switch replica")
+---
+- true
+...
+replication = box.cfg.replication
+---
+...
+box.cfg{replication={}}
+---
+...
+test_run:cmd("switch default")
+---
+- true
+...
+-- insert values on the master while replica is unconfigured
+box.begin() for i = 1, 100 do box.space.test:insert{i, i} end box.commit()
+---
+...
+box.space.test:count()
+---
+- 100
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.cfg{replication = replication}
+---
+...
+box.space.test:count() == 100
+---
+- true
+...
+--
+-- unsuccessful sync entering orphan state
+--
+box.cfg{replication={}}
+---
+...
+box.cfg{replication_sync_timeout = 0.000001}
+---
+...
+test_run:cmd("switch default")
+---
+- true
+...
+-- insert values on the master while replica is unconfigured
+box.begin() for i = 101, 200 do box.space.test:insert{i, i} end box.commit()
+---
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.cfg{replication = replication}
+---
+...
+box.info.status
+---
+- orphan
+...
+require'fiber'.sleep(0.1)
+---
+...
+box.info.status
+---
+- running
+...
+--
+-- replication_sync_lag is too big
+--
+box.cfg{replication_sync_lag = 100}
+---
+...
+test_run:cmd("switch default")
+---
+- true
+...
+function f () box.begin() for i = 201, 500 do box.space.test:insert{i, i} end box.commit(); end
+---
+...
+_=fiber.create(f)
+---
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.space.test:count() < 500
+---
+- true
+...
+test_run:cmd("switch default")
+---
+- true
+...
+-- cleanup
+test_run:cmd("stop server replica")
+---
+- true
+...
+test_run:cmd("cleanup server replica")
+---
+- true
+...
+box.space.test:drop()
+---
+...
+box.schema.user.revoke('guest', 'replication')
+---
+...
diff --git a/test/replication/sync.test.lua b/test/replication/sync.test.lua
new file mode 100644
index 000000000..0c4fff483
--- /dev/null
+++ b/test/replication/sync.test.lua
@@ -0,0 +1,71 @@
+fiber = require('fiber')
+--
+-- gh-3427: no sync after configuration update
+--
+
+--
+-- successful sync
+--
+
+env = require('test_run')
+test_run = env.new()
+engine = test_run:get_cfg('engine')
+
+box.schema.user.grant('guest', 'replication')
+
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+test_run:cmd("start server replica")
+
+s = box.schema.space.create('test', {engine = engine})
+index = s:create_index('primary')
+
+-- change replica configuration
+test_run:cmd("switch replica")
+replication = box.cfg.replication
+box.cfg{replication={}}
+
+test_run:cmd("switch default")
+-- insert values on the master while replica is unconfigured
+box.begin() for i = 1, 100 do box.space.test:insert{i, i} end box.commit()
+box.space.test:count()
+
+test_run:cmd("switch replica")
+box.cfg{replication = replication}
+box.space.test:count() == 100
+
+--
+-- unsuccessful sync entering orphan state
+--
+box.cfg{replication={}}
+box.cfg{replication_sync_timeout = 0.000001}
+
+test_run:cmd("switch default")
+-- insert values on the master while replica is unconfigured
+box.begin() for i = 101, 200 do box.space.test:insert{i, i} end box.commit()
+
+test_run:cmd("switch replica")
+box.cfg{replication = replication}
+box.info.status
+require'fiber'.sleep(0.1)
+box.info.status
+
+--
+-- replication_sync_lag is too big
+--
+
+box.cfg{replication_sync_lag = 100}
+
+test_run:cmd("switch default")
+
+function f () box.begin() for i = 201, 500 do box.space.test:insert{i, i} end box.commit(); end
+_=fiber.create(f)
+
+test_run:cmd("switch replica")
+box.space.test:count() < 500
+
+test_run:cmd("switch default")
+-- cleanup
+test_run:cmd("stop server replica")
+test_run:cmd("cleanup server replica")
+box.space.test:drop()
+box.schema.user.revoke('guest', 'replication')
-- 
2.14.3 (Apple Git-98)

  parent reply	other threads:[~2018-08-30 14:11 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-30 14:11 [tarantool-patches] [PATCH 1/3] box: make replication_sync_lag option dynamic Olga Arkhangelskaia
2018-08-30 14:11 ` [tarantool-patches] [PATCH v2 2/3] box: add replication_sync_timeout Olga Arkhangelskaia
2018-08-30 16:11   ` Vladimir Davydov
2018-08-30 14:11 ` Olga Arkhangelskaia [this message]
2018-08-30 16:41   ` [tarantool-patches] [PATCH v6 3/3] box: adds replication sync after cfg. update Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180830141114.4531-3-krishtal.olja@gmail.com \
    --to=krishtal.olja@gmail.com \
    --cc=tarantool-patches@freelists.org \
    --subject='Re: [tarantool-patches] [PATCH v6 3/3] box: adds replication sync after cfg. update' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox