Tarantool development patches archive
 help / color / mirror / Atom feed
* [tarantool-patches] [PATCH 1/2] box: make replication_sync_lag option dynamic
@ 2018-08-28 11:43 Olga Arkhangelskaia
  2018-08-28 11:43 ` [tarantool-patches] [PATCH v4 2/2] box: adds replication sync after cfg. update Olga Arkhangelskaia
  2018-08-28 14:03 ` [tarantool-patches] [PATCH 1/2] box: make replication_sync_lag option dynamic Vladimir Davydov
  0 siblings, 2 replies; 6+ messages in thread
From: Olga Arkhangelskaia @ 2018-08-28 11:43 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Olga Arkhangelskaia

In gh-3427 replication_sync_lag should be taken into account during
replication reconfiguration. In order to configure replication properly
this parameter is made dynamical and can be changed on demand.

@TarantoolBot document
Title: recation_sync_lag option can be set dynamically
recation_sync_lag now can be set at any time.
---
 src/box/box.cc            |  6 ++++++
 src/box/box.h             |  1 +
 src/box/lua/cfg.cc        | 12 ++++++++++++
 src/box/lua/load_cfg.lua  |  1 +
 test/box-tap/cfg.test.lua |  4 +++-
 5 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 8d7454d1f..be5077da8 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -656,6 +656,12 @@ box_set_replication_connect_quorum(void)
 		replicaset_check_quorum();
 }
 
+void
+box_set_replication_sync_lag(void)
+{
+	replication_sync_lag = box_check_replication_sync_lag();
+}
+
 void
 box_bind(void)
 {
diff --git a/src/box/box.h b/src/box/box.h
index 9dfb3fd2a..3090fdcdb 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -176,6 +176,7 @@ void box_set_vinyl_timeout(void);
 void box_set_replication_timeout(void);
 void box_set_replication_connect_timeout(void);
 void box_set_replication_connect_quorum(void);
+void box_set_replication_sync_lag(void);
 
 extern "C" {
 #endif /* defined(__cplusplus) */
diff --git a/src/box/lua/cfg.cc b/src/box/lua/cfg.cc
index 0ca150877..5442723b5 100644
--- a/src/box/lua/cfg.cc
+++ b/src/box/lua/cfg.cc
@@ -262,6 +262,17 @@ lbox_cfg_set_replication_connect_quorum(struct lua_State *L)
 	return 0;
 }
 
+static int
+lbox_cfg_set_replication_sync_lag(struct lua_State *L)
+{
+	try {
+		box_set_replication_sync_lag();
+	} catch (Exception *) {
+		luaT_error(L);
+	}
+	return 0;
+}
+
 void
 box_lua_cfg_init(struct lua_State *L)
 {
@@ -286,6 +297,7 @@ box_lua_cfg_init(struct lua_State *L)
 		{"cfg_set_replication_timeout", lbox_cfg_set_replication_timeout},
 		{"cfg_set_replication_connect_timeout", lbox_cfg_set_replication_connect_timeout},
 		{"cfg_set_replication_connect_quorum", lbox_cfg_set_replication_connect_quorum},
+		{"cfg_set_replication_sync_lag", lbox_cfg_set_replication_sync_lag},
 		{NULL, NULL}
 	};
 
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 2a7142de6..cdee475d5 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -211,6 +211,7 @@ local dynamic_cfg = {
                       'Can\'t change replicaset uuid')
         end
     end,
+    replication_sync_lag    = private.cfg_set_replication_sync_lag,
 }
 
 local dynamic_cfg_skip_at_load = {
diff --git a/test/box-tap/cfg.test.lua b/test/box-tap/cfg.test.lua
index 5e72004ca..7cb2ed9a3 100755
--- a/test/box-tap/cfg.test.lua
+++ b/test/box-tap/cfg.test.lua
@@ -6,7 +6,7 @@ local socket = require('socket')
 local fio = require('fio')
 local uuid = require('uuid')
 local msgpack = require('msgpack')
-test:plan(90)
+test:plan(92)
 
 --------------------------------------------------------------------------------
 -- Invalid values
@@ -32,6 +32,8 @@ invalid('replication_sync_lag', 0)
 invalid('replication_connect_timeout', -1)
 invalid('replication_connect_timeout', 0)
 invalid('replication_connect_quorum', -1)
+invalid('replication_sync_lag', -1)
+invalid('replication_sync_lag', -1)
 invalid('wal_mode', 'invalid')
 invalid('rows_per_wal', -1)
 invalid('listen', '//!')
-- 
2.14.3 (Apple Git-98)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tarantool-patches] [PATCH v4 2/2] box: adds replication sync after cfg. update
  2018-08-28 11:43 [tarantool-patches] [PATCH 1/2] box: make replication_sync_lag option dynamic Olga Arkhangelskaia
@ 2018-08-28 11:43 ` Olga Arkhangelskaia
  2018-08-28 15:58   ` Vladimir Davydov
  2018-08-28 14:03 ` [tarantool-patches] [PATCH 1/2] box: make replication_sync_lag option dynamic Vladimir Davydov
  1 sibling, 1 reply; 6+ messages in thread
From: Olga Arkhangelskaia @ 2018-08-28 11:43 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Olga Arkhangelskaia

When replica reconnects to replica set not for the first time, we
suffer from absence of synchronization. Such behavior leads to giving
away outdated data.

Closes #3427
---
https://github.com/tarantool/tarantool/issues/3427
https://github.com/tarantool/tarantool/tree/OKriw/replication_no_sync-1.9

v1:
https://www.freelists.org/post/tarantool-patches/PATCH-replication-adds-replication-sync-after-cfg-update

v2:
https://www.freelists.org/post/tarantool-patches/PATCH-v2-replication-adds-replication-sync-after-cfg-update

v3:
https://www.freelists.org/post/tarantool-patches/PATCH-v3-box-adds-replication-sync-after-cfg-update

Changes in v2:
- fixed test
- changed replicaset_sync

Changes in v3:
- now we raise the exception when sync is not successful.
- fixed test
- renamed test 

Changes in v4:
- fixed test
- replication_sync_lag is made dynamicall in separate patch
- removed unnecessary error type
- moved say_crit to another place
- in case of sync error we rollback to prev. config

 src/box/box.cc                 |  8 ++++-
 src/box/replication.cc         |  8 ++---
 src/box/replication.h          |  6 ++--
 test/replication/sync.result   | 81 ++++++++++++++++++++++++++++++++++++++++++
 test/replication/sync.test.lua | 38 ++++++++++++++++++++
 5 files changed, 133 insertions(+), 8 deletions(-)
 create mode 100644 test/replication/sync.result
 create mode 100644 test/replication/sync.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index be5077da8..aaae4219f 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -634,6 +634,11 @@ box_set_replication(void)
 	box_sync_replication(true);
 	/* Follow replica */
 	replicaset_follow();
+	/* Sync replica up to quorum */
+	if (!replicaset_sync()) {
+		tnt_raise(ClientError, ER_CFG, "replication",
+			  "failed to connect to one or more replicas");
+	}
 }
 
 void
@@ -1948,7 +1953,8 @@ box_cfg_xc(void)
 	is_box_configured = true;
 
 	if (!is_bootstrap_leader)
-		replicaset_sync();
+		if (!replicaset_sync())
+			say_crit("entering orphan mode");
 
 	say_info("ready to accept requests");
 }
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 861ce34ea..9d3b1094c 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -661,13 +661,13 @@ replicaset_follow(void)
 	}
 }
 
-void
+bool
 replicaset_sync(void)
 {
 	int quorum = replicaset_quorum();
 
 	if (quorum == 0)
-		return;
+		return true;
 
 	say_verbose("synchronizing with %d replicas", quorum);
 
@@ -686,12 +686,12 @@ replicaset_sync(void)
 		 * Do not stall configuration, leave the instance
 		 * in 'orphan' state.
 		 */
-		say_crit("entering orphan mode");
-		return;
+		return false;
 	}
 
 	say_crit("replica set sync complete, quorum of %d "
 		 "replicas formed", quorum);
+	return true;
 }
 
 void
diff --git a/src/box/replication.h b/src/box/replication.h
index 06a2867b6..d4e6f7e3e 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -373,10 +373,10 @@ replicaset_follow(void);
 
 /**
  * Wait until a replication quorum is formed.
- * Return immediately if a quorum cannot be
- * formed because of errors.
+ * @return true in case of success.
+ * @return false if a quorum cannot be formed because of errors.
  */
-void
+bool
 replicaset_sync(void);
 
 /**
diff --git a/test/replication/sync.result b/test/replication/sync.result
new file mode 100644
index 000000000..f6ddb02e0
--- /dev/null
+++ b/test/replication/sync.result
@@ -0,0 +1,81 @@
+--
+-- gh-3427: no sync after configuration update
+--
+env = require('test_run')
+---
+...
+test_run = env.new()
+---
+...
+engine = test_run:get_cfg('engine')
+---
+...
+box.schema.user.grant('guest', 'replication')
+---
+...
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+---
+- true
+...
+test_run:cmd("start server replica")
+---
+- true
+...
+s = box.schema.space.create('test', {engine = engine})
+---
+...
+index = s:create_index('primary')
+---
+...
+-- change replica configuration
+test_run:cmd("switch replica")
+---
+- true
+...
+box.cfg{replication_sync_lag = 0.1}
+---
+...
+replication = box.cfg.replication
+---
+...
+box.cfg{replication={}}
+---
+...
+test_run:cmd("switch default")
+---
+- true
+...
+-- insert values on the master while replica is unconfigured
+a = 3000 box.begin() while a > 0 do a = a-1 box.space.test:insert{a,a} end box.commit()
+---
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.cfg{replication = replication}
+---
+...
+box.space.test:count() == 3000
+---
+- true
+...
+test_run:cmd("switch default")
+---
+- true
+...
+-- cleanup
+test_run:cmd("stop server replica")
+---
+- true
+...
+test_run:cmd("cleanup server replica")
+---
+- true
+...
+box.space.test:drop()
+---
+...
+box.schema.user.revoke('guest', 'replication')
+---
+...
diff --git a/test/replication/sync.test.lua b/test/replication/sync.test.lua
new file mode 100644
index 000000000..4c2b55af8
--- /dev/null
+++ b/test/replication/sync.test.lua
@@ -0,0 +1,38 @@
+--
+-- gh-3427: no sync after configuration update
+--
+
+env = require('test_run')
+test_run = env.new()
+engine = test_run:get_cfg('engine')
+
+box.schema.user.grant('guest', 'replication')
+
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+test_run:cmd("start server replica")
+
+s = box.schema.space.create('test', {engine = engine})
+index = s:create_index('primary')
+
+-- change replica configuration
+test_run:cmd("switch replica")
+box.cfg{replication_sync_lag = 0.1}
+replication = box.cfg.replication
+box.cfg{replication={}}
+
+test_run:cmd("switch default")
+-- insert values on the master while replica is unconfigured
+a = 3000 box.begin() while a > 0 do a = a-1 box.space.test:insert{a,a} end box.commit()
+
+test_run:cmd("switch replica")
+box.cfg{replication = replication}
+
+box.space.test:count() == 3000
+
+test_run:cmd("switch default")
+
+-- cleanup
+test_run:cmd("stop server replica")
+test_run:cmd("cleanup server replica")
+box.space.test:drop()
+box.schema.user.revoke('guest', 'replication')
-- 
2.14.3 (Apple Git-98)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [tarantool-patches] [PATCH 1/2] box: make replication_sync_lag option dynamic
  2018-08-28 11:43 [tarantool-patches] [PATCH 1/2] box: make replication_sync_lag option dynamic Olga Arkhangelskaia
  2018-08-28 11:43 ` [tarantool-patches] [PATCH v4 2/2] box: adds replication sync after cfg. update Olga Arkhangelskaia
@ 2018-08-28 14:03 ` Vladimir Davydov
  1 sibling, 0 replies; 6+ messages in thread
From: Vladimir Davydov @ 2018-08-28 14:03 UTC (permalink / raw)
  To: Olga Arkhangelskaia; +Cc: tarantool-patches

On Tue, Aug 28, 2018 at 02:43:27PM +0300, Olga Arkhangelskaia wrote:
> In gh-3427 replication_sync_lag should be taken into account during
> replication reconfiguration. In order to configure replication properly
> this parameter is made dynamical and can be changed on demand.
> 
> @TarantoolBot document
> Title: recation_sync_lag option can be set dynamically
> recation_sync_lag now can be set at any time.
> ---
>  src/box/box.cc            |  6 ++++++
>  src/box/box.h             |  1 +
>  src/box/lua/cfg.cc        | 12 ++++++++++++
>  src/box/lua/load_cfg.lua  |  1 +
>  test/box-tap/cfg.test.lua |  4 +++-
>  5 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 8d7454d1f..be5077da8 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -656,6 +656,12 @@ box_set_replication_connect_quorum(void)
>  		replicaset_check_quorum();
>  }
>  
> +void
> +box_set_replication_sync_lag(void)
> +{
> +	replication_sync_lag = box_check_replication_sync_lag();
> +}
> +

Nit: you can now use this function in box_cfg_xc() instead of

	replication_sync_lag = box_check_replication_sync_lag();

Please do.

>  void
>  box_bind(void)
>  {
> diff --git a/src/box/box.h b/src/box/box.h
> index 9dfb3fd2a..3090fdcdb 100644
> --- a/src/box/box.h
> +++ b/src/box/box.h
> @@ -176,6 +176,7 @@ void box_set_vinyl_timeout(void);
>  void box_set_replication_timeout(void);
>  void box_set_replication_connect_timeout(void);
>  void box_set_replication_connect_quorum(void);
> +void box_set_replication_sync_lag(void);
>  
>  extern "C" {
>  #endif /* defined(__cplusplus) */
> diff --git a/src/box/lua/cfg.cc b/src/box/lua/cfg.cc
> index 0ca150877..5442723b5 100644
> --- a/src/box/lua/cfg.cc
> +++ b/src/box/lua/cfg.cc
> @@ -262,6 +262,17 @@ lbox_cfg_set_replication_connect_quorum(struct lua_State *L)
>  	return 0;
>  }
>  
> +static int
> +lbox_cfg_set_replication_sync_lag(struct lua_State *L)
> +{
> +	try {
> +		box_set_replication_sync_lag();
> +	} catch (Exception *) {
> +		luaT_error(L);
> +	}
> +	return 0;
> +}
> +
>  void
>  box_lua_cfg_init(struct lua_State *L)
>  {
> @@ -286,6 +297,7 @@ box_lua_cfg_init(struct lua_State *L)
>  		{"cfg_set_replication_timeout", lbox_cfg_set_replication_timeout},
>  		{"cfg_set_replication_connect_timeout", lbox_cfg_set_replication_connect_timeout},
>  		{"cfg_set_replication_connect_quorum", lbox_cfg_set_replication_connect_quorum},
> +		{"cfg_set_replication_sync_lag", lbox_cfg_set_replication_sync_lag},
>  		{NULL, NULL}
>  	};
>  
> diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
> index 2a7142de6..cdee475d5 100644
> --- a/src/box/lua/load_cfg.lua
> +++ b/src/box/lua/load_cfg.lua
> @@ -211,6 +211,7 @@ local dynamic_cfg = {
>                        'Can\'t change replicaset uuid')
>          end
>      end,
> +    replication_sync_lag    = private.cfg_set_replication_sync_lag,

Nit: please move this option definition closer to replication_timeout
and friends.

Also, you should add this option to dynamic_cfg_skip_at_load, because it
is initialized by box_cfg_xc().

>  }
>  
>  local dynamic_cfg_skip_at_load = {
> diff --git a/test/box-tap/cfg.test.lua b/test/box-tap/cfg.test.lua
> index 5e72004ca..7cb2ed9a3 100755
> --- a/test/box-tap/cfg.test.lua
> +++ b/test/box-tap/cfg.test.lua
> @@ -6,7 +6,7 @@ local socket = require('socket')
>  local fio = require('fio')
>  local uuid = require('uuid')
>  local msgpack = require('msgpack')
> -test:plan(90)
> +test:plan(92)
>  
>  --------------------------------------------------------------------------------
>  -- Invalid values
> @@ -32,6 +32,8 @@ invalid('replication_sync_lag', 0)
>  invalid('replication_connect_timeout', -1)
>  invalid('replication_connect_timeout', 0)
>  invalid('replication_connect_quorum', -1)
> +invalid('replication_sync_lag', -1)
> +invalid('replication_sync_lag', -1)

The same value is tested twice. Besides, this is already tested a few
lines above.

What you should test is that replication_sync_lag can now be changed
after initial configuration. Please do.

>  invalid('wal_mode', 'invalid')
>  invalid('rows_per_wal', -1)
>  invalid('listen', '//!')

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [tarantool-patches] [PATCH v4 2/2] box: adds replication sync after cfg. update
  2018-08-28 11:43 ` [tarantool-patches] [PATCH v4 2/2] box: adds replication sync after cfg. update Olga Arkhangelskaia
@ 2018-08-28 15:58   ` Vladimir Davydov
  2018-08-28 16:19     ` Olga Krishtal
  0 siblings, 1 reply; 6+ messages in thread
From: Vladimir Davydov @ 2018-08-28 15:58 UTC (permalink / raw)
  To: Olga Arkhangelskaia; +Cc: tarantool-patches

On Tue, Aug 28, 2018 at 02:43:28PM +0300, Olga Arkhangelskaia wrote:
> When replica reconnects to replica set not for the first time, we
> suffer from absence of synchronization. Such behavior leads to giving
> away outdated data.
> 
> Closes #3427

Please write a documentation request.

> diff --git a/src/box/box.cc b/src/box/box.cc
> index be5077da8..aaae4219f 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -634,6 +634,11 @@ box_set_replication(void)
>  	box_sync_replication(true);
>  	/* Follow replica */
>  	replicaset_follow();
> +	/* Sync replica up to quorum */
> +	if (!replicaset_sync()) {
> +		tnt_raise(ClientError, ER_CFG, "replication",
> +			  "failed to connect to one or more replicas");
> +	}

Throwing ER_CFG error from box.cfg() and still applying the new
replication configuration looks weird. We should either revert the
configuration back to what we had before box.cfg() was called or not
throw exceptions.

Reverting configuration seems to be unreasonable, because we could've
applied some rows from the new replicas.

We discussed the matter with Georgy and Kostja and agreed that instead
an instance should enter the orphan mode, just like it does on initial
configuration.

Sorry, we didn't come to an agreement earlier.

Please rework and add a test case.

> diff --git a/test/replication/sync.test.lua b/test/replication/sync.test.lua
> new file mode 100644
> index 000000000..4c2b55af8
> --- /dev/null
> +++ b/test/replication/sync.test.lua
> @@ -0,0 +1,38 @@
> +--
> +-- gh-3427: no sync after configuration update
> +--
> +
> +env = require('test_run')
> +test_run = env.new()
> +engine = test_run:get_cfg('engine')
> +
> +box.schema.user.grant('guest', 'replication')
> +
> +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
> +test_run:cmd("start server replica")
> +
> +s = box.schema.space.create('test', {engine = engine})
> +index = s:create_index('primary')
> +
> +-- change replica configuration
> +test_run:cmd("switch replica")
> +box.cfg{replication_sync_lag = 0.1}
> +replication = box.cfg.replication
> +box.cfg{replication={}}
> +
> +test_run:cmd("switch default")
> +-- insert values on the master while replica is unconfigured
> +a = 3000 box.begin() while a > 0 do a = a-1 box.space.test:insert{a,a} end box.commit()

Nit: for i = 1, 100 do ... end

Anyway, why 3000? When I change it to 1000 or even 100 the test still
passes with this patch and fails without it.

Also, I'd like to see a test case that checks that in case
box.cfg.replication_sync_lag is big, not all records arrive
by the time box.cfg{replication} returns.

And a test case that checks that tarantool enters the orphan mode
if it fails to sync.

Please add.

> +
> +test_run:cmd("switch replica")
> +box.cfg{replication = replication}
> +
> +box.space.test:count() == 3000

Nit: better do

box.space.test:count() -- 3000

The reject file will be more informative in case of error then.

> +
> +test_run:cmd("switch default")
> +
> +-- cleanup
> +test_run:cmd("stop server replica")
> +test_run:cmd("cleanup server replica")
> +box.space.test:drop()
> +box.schema.user.revoke('guest', 'replication')

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [tarantool-patches] [PATCH v4 2/2] box: adds replication sync after cfg. update
  2018-08-28 15:58   ` Vladimir Davydov
@ 2018-08-28 16:19     ` Olga Krishtal
  2018-08-28 16:36       ` Vladimir Davydov
  0 siblings, 1 reply; 6+ messages in thread
From: Olga Krishtal @ 2018-08-28 16:19 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 4190 bytes --]

Thanks for the review!

вт, 28 авг. 2018 г. в 18:58, Vladimir Davydov <vdavydov.dev@gmail.com>:

> On Tue, Aug 28, 2018 at 02:43:28PM +0300, Olga Arkhangelskaia wrote:
> > When replica reconnects to replica set not for the first time, we
> > suffer from absence of synchronization. Such behavior leads to giving
> > away outdated data.
> >
> > Closes #3427
>
> Please write a documentation request.
>

Ok


>
> > diff --git a/src/box/box.cc b/src/box/box.cc
> > index be5077da8..aaae4219f 100644
> > --- a/src/box/box.cc
> > +++ b/src/box/box.cc
> > @@ -634,6 +634,11 @@ box_set_replication(void)
> >       box_sync_replication(true);
> >       /* Follow replica */
> >       replicaset_follow();
> > +     /* Sync replica up to quorum */
> > +     if (!replicaset_sync()) {
> > +             tnt_raise(ClientError, ER_CFG, "replication",
> > +                       "failed to connect to one or more replicas");
> > +     }
>
> Throwing ER_CFG error from box.cfg() and still applying the new
> replication configuration looks weird. We should either revert the
> configuration back to what we had before box.cfg() was called or not
> throw exceptions.
>
> Reverting configuration seems to be unreasonable, because we could've
> applied some rows from the new replicas.
>
> We discussed the matter with Georgy and Kostja and agreed that instead
> an instance should enter the orphan mode, just like it does on initial
> configuration.
>
>

Just curious, why?  How can we applied changes if box.cfg throws an error?
Or I miss smth?
Ok



> Sorry, we didn't come to an agreement earlier.
>
> Please rework and add a test case.
>
> > diff --git a/test/replication/sync.test.lua
> b/test/replication/sync.test.lua
> > new file mode 100644
> > index 000000000..4c2b55af8
> > --- /dev/null
> > +++ b/test/replication/sync.test.lua
> > @@ -0,0 +1,38 @@
> > +--
> > +-- gh-3427: no sync after configuration update
> > +--
> > +
> > +env = require('test_run')
> > +test_run = env.new()
> > +engine = test_run:get_cfg('engine')
> > +
> > +box.schema.user.grant('guest', 'replication')
> > +
> > +test_run:cmd("create server replica with rpl_master=default,
> script='replication/replica.lua'")
> > +test_run:cmd("start server replica")
> > +
> > +s = box.schema.space.create('test', {engine = engine})
> > +index = s:create_index('primary')
> > +
> > +-- change replica configuration
> > +test_run:cmd("switch replica")
> > +box.cfg{replication_sync_lag = 0.1}
> > +replication = box.cfg.replication
> > +box.cfg{replication={}}
> > +
> > +test_run:cmd("switch default")
> > +-- insert values on the master while replica is unconfigured
> > +a = 3000 box.begin() while a > 0 do a = a-1 box.space.test:insert{a,a}
> end box.commit()
>
> Nit: for i = 1, 100 do ... end


> Anyway, why 3000? When I change it to 1000 or even 100 the test still
> passes with this patch and fails without it.
>
>
I used 3000 because when there is no patch and I put replica into sleep for
replication sync lag (0.1) arrives nearly 2500 tuples.



> Also, I'd like to see a test case that checks that in case
> box.cfg.replication_sync_lag is big, not all records arrive
> by the time box.cfg{replication} returns.
>
>
You mean see difference in tuples count in case when replicas are synced,
however due to lag, but not due to data has arrived?


> And a test case that checks that tarantool enters the orphan mode
> if it fails to sync.
>
> Please add.
>


Ok


>
> > +
> > +test_run:cmd("switch replica")
> > +box.cfg{replication = replication}
> > +
> > +box.space.test:count() == 3000
>
> Nit: better do
>
> box.space.test:count() -- 3000
>
> The reject file will be more informative in case of error then.
>

So I need 3 test case
Test that we are synced.
Test with sync and big lag.
Test with failed sync - orphan mode?



>
> > +
> > +test_run:cmd("switch default")
> > +
> > +-- cleanup
> > +test_run:cmd("stop server replica")
> > +test_run:cmd("cleanup server replica")
> > +box.space.test:drop()
> > +box.schema.user.revoke('guest', 'replication')
>

[-- Attachment #2: Type: text/html, Size: 6315 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [tarantool-patches] [PATCH v4 2/2] box: adds replication sync after cfg. update
  2018-08-28 16:19     ` Olga Krishtal
@ 2018-08-28 16:36       ` Vladimir Davydov
  0 siblings, 0 replies; 6+ messages in thread
From: Vladimir Davydov @ 2018-08-28 16:36 UTC (permalink / raw)
  To: Olga Krishtal; +Cc: tarantool-patches

On Tue, Aug 28, 2018 at 07:19:55PM +0300, Olga Krishtal wrote:
> > > diff --git a/src/box/box.cc b/src/box/box.cc
> > > index be5077da8..aaae4219f 100644
> > > --- a/src/box/box.cc
> > > +++ b/src/box/box.cc
> > > @@ -634,6 +634,11 @@ box_set_replication(void)
> > >       box_sync_replication(true);
> > >       /* Follow replica */
> > >       replicaset_follow();
> > > +     /* Sync replica up to quorum */
> > > +     if (!replicaset_sync()) {
> > > +             tnt_raise(ClientError, ER_CFG, "replication",
> > > +                       "failed to connect to one or more replicas");
> > > +     }
> >
> > Throwing ER_CFG error from box.cfg() and still applying the new
> > replication configuration looks weird. We should either revert the
> > configuration back to what we had before box.cfg() was called or not
> > throw exceptions.
> >
> > Reverting configuration seems to be unreasonable, because we could've
> > applied some rows from the new replicas.
> >
> > We discussed the matter with Georgy and Kostja and agreed that instead
> > an instance should enter the orphan mode, just like it does on initial
> > configuration.
> >
> >
> 
> Just curious, why?  How can we applied changes if box.cfg throws an error?
> Or I miss smth?

box.cfg{replication} successfully connected appliers and updated
replica set. Then it started to wait for appliers to synchronize.
Some appliers synchronized. Others not. Failure. We can't revert
to old replication configuration, because the appliers applied some
rows from the new configuration. That's why we should return from
box.cfg{} without an error, but enter the orphan mode.

> > > +a = 3000 box.begin() while a > 0 do a = a-1 box.space.test:insert{a,a}
> > end box.commit()
> >
> > Nit: for i = 1, 100 do ... end
> 
> 
> > Anyway, why 3000? When I change it to 1000 or even 100 the test still
> > passes with this patch and fails without it.
> >
> >
> I used 3000 because when there is no patch and I put replica into sleep for
> replication sync lag (0.1) arrives nearly 2500 tuples.

Better use error injections.

> 
> 
> 
> > Also, I'd like to see a test case that checks that in case
> > box.cfg.replication_sync_lag is big, not all records arrive
> > by the time box.cfg{replication} returns.
> >
> >
> You mean see difference in tuples count in case when replicas are synced,
> however due to lag, but not due to data has arrived?

Yes, box.cfg{} returns, because the lag is within the target range, even
though not all rows have arrived.

> So I need 3 test case
> Test that we are synced.
> Test with sync and big lag.
> Test with failed sync - orphan mode?

Correct.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-08-28 16:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-28 11:43 [tarantool-patches] [PATCH 1/2] box: make replication_sync_lag option dynamic Olga Arkhangelskaia
2018-08-28 11:43 ` [tarantool-patches] [PATCH v4 2/2] box: adds replication sync after cfg. update Olga Arkhangelskaia
2018-08-28 15:58   ` Vladimir Davydov
2018-08-28 16:19     ` Olga Krishtal
2018-08-28 16:36       ` Vladimir Davydov
2018-08-28 14:03 ` [tarantool-patches] [PATCH 1/2] box: make replication_sync_lag option dynamic Vladimir Davydov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox