[PATCH v2] vinyl: ignore quota timeout on replication

Konstantin Osipov kostja at tarantool.org
Mon Jan 29 21:15:27 MSK 2018


* Vladimir Davydov <vdavydov.dev at gmail.com> [18/01/29 17:07]:

OK to push.

> If vinyl fails to dump memory in time on a replica (e.g. because it
> ran out of disk space), replication stops with an error and never
> resumes on its own: the admin has to call box.cfg() to restart it.
> Since replication is asynchronous anyway, we shouldn't stop it on a
> vinyl quota timeout. The timeout isn't critical, because the replica
> recovers as soon as the admin fixes the problem (e.g. frees up some
> disk space). Let's ignore the vinyl timeout altogether for appliers
> (currently, we ignore it only on join); the admin can monitor how far
> a replica lags behind the master via the lag and idle fields reported
> in box.info.replication.
> 
> Closes #3087
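The rule described above (keep vinyl_timeout for ordinary sessions, wait without a limit for appliers) can be reduced to one small function. This is a minimal sketch, not Tarantool code: the enum, its values, sketch_quota_timeout() and SKETCH_TIMEOUT_INFINITY are hypothetical stand-ins for current_session()->type, SESSION_TYPE_APPLIER and TIMEOUT_INFINITY used in the patch.

```c
#include <math.h> /* INFINITY */

/* Hypothetical, simplified subset of session types (not Tarantool's enum). */
enum sketch_session_type {
	SKETCH_SESSION_CONSOLE,
	SKETCH_SESSION_BINARY,
	SKETCH_SESSION_APPLIER,
};

/* Hypothetical stand-in for the patch's TIMEOUT_INFINITY. */
#define SKETCH_TIMEOUT_INFINITY INFINITY

/*
 * Mirrors the ternary in vinyl_engine_prepare(): applier sessions
 * (join and subscribe alike) wait for quota indefinitely, every
 * other session gives up after vinyl_timeout seconds.
 */
double
sketch_quota_timeout(enum sketch_session_type type, double vinyl_timeout)
{
	return type != SKETCH_SESSION_APPLIER ? vinyl_timeout
					      : SKETCH_TIMEOUT_INFINITY;
}
```

With vinyl_timeout = 60, a binary-protocol session gets a 60 s limit while an applier gets an unbounded wait. Keying the decision on the session type instead of env->status (which is only not VINYL_ONLINE during bootstrap) is what extends the exemption from initial join to subscribe.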
> ---
> Branch: gh-3087-vy-ignore-quota-timeout-on-replication
> 
> Changes in v2:
>  - Use session->type instead of fiber->type (kostja)
> 
>  src/box/vinyl.c                   | 11 ++++++-----
>  test/vinyl/replica_quota.result   | 13 +++++++++++++
>  test/vinyl/replica_quota.test.lua |  7 +++++++
>  3 files changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/src/box/vinyl.c b/src/box/vinyl.c
> index 59bd6e7d..b2331081 100644
> --- a/src/box/vinyl.c
> +++ b/src/box/vinyl.c
> @@ -70,6 +70,7 @@
>  #include "column_mask.h"
>  #include "trigger.h"
>  #include "checkpoint.h"
> +#include "session.h"
>  #include "wal.h" /* wal_mode() */
>  
>  /**
> @@ -2324,12 +2325,12 @@ vinyl_engine_prepare(struct engine *engine, struct txn *txn)
>  		return -1;
>  
>  	/*
> -	 * A replica receives a lot of data during initial join.
> -	 * If the network connection is fast enough, it might fail
> -	 * to keep up with dumps. To avoid replication failure due
> -	 * to this, we ignore the quota timeout during bootstrap.
> +	 * Do not abort join/subscribe on quota timeout - replication
> +	 * is asynchronous anyway and there's box.info.replication
> +	 * available for the admin to track the lag so let the applier
> +	 * wait as long as necessary for memory dump to complete.
>  	 */
> -	double timeout = (env->status == VINYL_ONLINE ?
> +	double timeout = (current_session()->type != SESSION_TYPE_APPLIER ?
>  			  env->timeout : TIMEOUT_INFINITY);
>  
>  	/*
>  	 * Reserve quota needed by the transaction before allocating
> diff --git a/test/vinyl/replica_quota.result b/test/vinyl/replica_quota.result
> index 485efde7..b85c7398 100644
> --- a/test/vinyl/replica_quota.result
> +++ b/test/vinyl/replica_quota.result
> @@ -45,6 +45,19 @@ _ = test_run:cmd("start server replica")
>  _ = test_run:wait_lsn('replica', 'default')
>  ---
>  ...
> +-- Check vinyl_timeout is ignored on 'subscribe' (gh-3087).
> +_ = test_run:cmd("stop server replica")
> +---
> +...
> +for i = 2001,3000 do s:insert{i, pad} end
> +---
> +...
> +_ = test_run:cmd("start server replica")
> +---
> +...
> +_ = test_run:wait_lsn('replica', 'default')
> +---
> +...
>  _ = test_run:cmd("stop server replica")
>  ---
>  ...
> diff --git a/test/vinyl/replica_quota.test.lua b/test/vinyl/replica_quota.test.lua
> index bc6cfb0d..ab89c1bc 100644
> --- a/test/vinyl/replica_quota.test.lua
> +++ b/test/vinyl/replica_quota.test.lua
> @@ -24,6 +24,13 @@ for i = 1001,2000 do s:insert{i, pad} end
>  _ = test_run:cmd("create server replica with rpl_master=default, script='vinyl/join_quota.lua'")
>  _ = test_run:cmd("start server replica")
>  _ = test_run:wait_lsn('replica', 'default')
> +
> +-- Check vinyl_timeout is ignored on 'subscribe' (gh-3087).
> +_ = test_run:cmd("stop server replica")
> +for i = 2001,3000 do s:insert{i, pad} end
> +_ = test_run:cmd("start server replica")
> +_ = test_run:wait_lsn('replica', 'default')
> +
>  _ = test_run:cmd("stop server replica")
>  _ = test_run:cmd("cleanup server replica")
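
To see why an unbounded wait lets a replica recover instead of failing, here is a toy, single-threaded model of quota reservation: a request either fits right away, or waits for a pending dump that frees memory at some simulated time (for a stuck dump, the moment the admin fixes the disk). Everything here (struct sketch_quota, struct sketch_dump, sketch_quota_reserve(), SKETCH_TIMEOUT_INFINITY) is hypothetical and much simpler than vinyl's real quota code, which blocks a fiber rather than simulating time; only one pending dump is modeled.

```c
#include <math.h>    /* INFINITY */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's TIMEOUT_INFINITY. */
#define SKETCH_TIMEOUT_INFINITY INFINITY

/* Hypothetical memory quota: bytes in use and the configured limit. */
struct sketch_quota {
	size_t used;
	size_t limit;
};

/* Hypothetical pending dump: frees `bytes` at simulated time `done_at` (s). */
struct sketch_dump {
	double done_at;
	size_t bytes;
};

/*
 * Try to reserve `size` bytes, waiting at most `timeout` simulated
 * seconds for the pending dump. Returns false on quota timeout.
 */
bool
sketch_quota_reserve(struct sketch_quota *q, const struct sketch_dump *dump,
		     size_t size, double timeout)
{
	if (q->used + size <= q->limit) {
		q->used += size; /* fits right away, no wait needed */
		return true;
	}
	if (dump->done_at > timeout)
		return false; /* gave up before the dump finished */
	/* The dump completed within the wait and released its memory. */
	q->used = dump->bytes < q->used ? q->used - dump->bytes : 0;
	if (q->used + size > q->limit)
		return false; /* still no room: only one dump is modeled */
	q->used += size;
	return true;
}
```

With a 100-byte quota, 90 bytes used and a dump that frees 50 bytes after 120 s, a 20-byte reservation fails with a 60 s timeout (the applier would stop), but succeeds when the timeout is SKETCH_TIMEOUT_INFINITY, leaving 60 bytes used.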

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.org - www.twitter.com/kostja_osipov



More information about the Tarantool-patches mailing list