[PATCH v2] vinyl: ignore quota timeout on replication
Konstantin Osipov
kostja at tarantool.org
Mon Jan 29 21:15:27 MSK 2018
* Vladimir Davydov <vdavydov.dev at gmail.com> [18/01/29 17:07]:
OK to push.
> If vinyl fails to do memory dumps in time on a replica (e.g. it ran
> out of disk space), replication will stop forever with an error, and
> the admin will have to call box.cfg() to restart replication. Since
> replication is asynchronous anyway, we shouldn't stop it on vinyl
> timeout - it isn't critical as the replica will recover as soon as
> the admin fixes the problem (e.g. frees up some disk space). Let's
> ignore vinyl timeout altogether for applier fibers (currently, we
> ignore it only on join) - the admin can monitor how badly a replica
> lags behind the master via box.info.replication lag/idle.
>
> Closes #3087
> ---
> Branch: gh-3087-vy-ignore-quota-timeout-on-replication
>
> Changes in v2:
> - Use session->type instead of fiber->type (kostja)
>
> src/box/vinyl.c | 11 ++++++-----
> test/vinyl/replica_quota.result | 13 +++++++++++++
> test/vinyl/replica_quota.test.lua | 7 +++++++
> 3 files changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/src/box/vinyl.c b/src/box/vinyl.c
> index 59bd6e7d..b2331081 100644
> --- a/src/box/vinyl.c
> +++ b/src/box/vinyl.c
> @@ -70,6 +70,7 @@
> #include "column_mask.h"
> #include "trigger.h"
> #include "checkpoint.h"
> +#include "session.h"
> #include "wal.h" /* wal_mode() */
>
> /**
> @@ -2324,12 +2325,12 @@ vinyl_engine_prepare(struct engine *engine, struct txn *txn)
> 		return -1;
>
> 	/*
> -	 * A replica receives a lot of data during initial join.
> -	 * If the network connection is fast enough, it might fail
> -	 * to keep up with dumps. To avoid replication failure due
> -	 * to this, we ignore the quota timeout during bootstrap.
> +	 * Do not abort join/subscribe on quota timeout - replication
> +	 * is asynchronous anyway and there's box.info.replication
> +	 * available for the admin to track the lag so let the applier
> +	 * wait as long as necessary for memory dump to complete.
> 	 */
> -	double timeout = (env->status == VINYL_ONLINE ?
> +	double timeout = (current_session()->type != SESSION_TYPE_APPLIER ?
> 			  env->timeout : TIMEOUT_INFINITY);
> 	/*
> 	 * Reserve quota needed by the transaction before allocating
> diff --git a/test/vinyl/replica_quota.result b/test/vinyl/replica_quota.result
> index 485efde7..b85c7398 100644
> --- a/test/vinyl/replica_quota.result
> +++ b/test/vinyl/replica_quota.result
> @@ -45,6 +45,19 @@ _ = test_run:cmd("start server replica")
> _ = test_run:wait_lsn('replica', 'default')
> ---
> ...
> +-- Check vinyl_timeout is ignored on 'subscribe' (gh-3087).
> +_ = test_run:cmd("stop server replica")
> +---
> +...
> +for i = 2001,3000 do s:insert{i, pad} end
> +---
> +...
> +_ = test_run:cmd("start server replica")
> +---
> +...
> +_ = test_run:wait_lsn('replica', 'default')
> +---
> +...
> _ = test_run:cmd("stop server replica")
> ---
> ...
> diff --git a/test/vinyl/replica_quota.test.lua b/test/vinyl/replica_quota.test.lua
> index bc6cfb0d..ab89c1bc 100644
> --- a/test/vinyl/replica_quota.test.lua
> +++ b/test/vinyl/replica_quota.test.lua
> @@ -24,6 +24,13 @@ for i = 1001,2000 do s:insert{i, pad} end
> _ = test_run:cmd("create server replica with rpl_master=default, script='vinyl/join_quota.lua'")
> _ = test_run:cmd("start server replica")
> _ = test_run:wait_lsn('replica', 'default')
> +
> +-- Check vinyl_timeout is ignored on 'subscribe' (gh-3087).
> +_ = test_run:cmd("stop server replica")
> +for i = 2001,3000 do s:insert{i, pad} end
> +_ = test_run:cmd("start server replica")
> +_ = test_run:wait_lsn('replica', 'default')
> +
> _ = test_run:cmd("stop server replica")
> _ = test_run:cmd("cleanup server replica")
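
For anyone skimming the thread, the timeout choice in vinyl_engine_prepare()
after this patch boils down to roughly the following. This is only a
standalone sketch, not the code from the tree: every sketch_* name is made up
for illustration, and the value used for the "infinite" timeout is an
assumption rather than something taken from the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the session types used by the patch. */
enum sketch_session_type {
	SKETCH_SESSION_CONSOLE,
	SKETCH_SESSION_BINARY,
	SKETCH_SESSION_APPLIER,
};

/* Assumed "wait forever" value, in seconds (about 100 years). */
#define SKETCH_TIMEOUT_INFINITY (365 * 86400 * 100.0)

/*
 * Pick the quota wait timeout for a transaction: applier sessions
 * wait as long as necessary for a memory dump to complete, every
 * other session type honours the configured timeout.
 */
double
sketch_quota_timeout(enum sketch_session_type type, double configured)
{
	return type != SKETCH_SESSION_APPLIER ?
	       configured : SKETCH_TIMEOUT_INFINITY;
}

/* Usage: returns 0 if both cases behave as described above. */
int
sketch_self_check(void)
{
	assert(sketch_quota_timeout(SKETCH_SESSION_BINARY, 60.0) == 60.0);
	assert(sketch_quota_timeout(SKETCH_SESSION_APPLIER, 60.0) ==
	       SKETCH_TIMEOUT_INFINITY);
	return 0;
}
```

Keying the decision on the session type rather than on engine status is what
lets the same rule cover both join and subscribe.
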
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.org - www.twitter.com/kostja_osipov