[PATCH v2] vinyl: ignore quota timeout on replication

Vladimir Davydov vdavydov.dev at gmail.com
Mon Jan 29 17:02:16 MSK 2018


If vinyl fails to dump memory in time on a replica (e.g. because it
ran out of disk space), replication stops with an error and does not
resume on its own: the admin has to call box.cfg() to restart it.
Since replication is asynchronous anyway, we shouldn't stop it on a
vinyl quota timeout. The timeout isn't critical, because the replica
will catch up as soon as the admin fixes the problem (e.g. frees up
some disk space). Let's ignore the vinyl timeout altogether for
applier fibers (currently, we ignore it only on join). The admin can
monitor how far a replica lags behind the master via the lag and idle
fields of box.info.replication.

Closes #3087
---
Branch: gh-3087-vy-ignore-quota-timeout-on-replication

Changes in v2:
 - Use session->type instead of fiber->type (kostja)
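
For reviewers who want to see the rule in isolation, here is a minimal
standalone sketch of the timeout selection introduced by this patch.
It is not the vinyl code: the enum, the SKETCH_TIMEOUT_INFINITY macro
and quota_timeout_for() are hypothetical stand-ins for
session->type, TIMEOUT_INFINITY and the expression in
vinyl_engine_prepare(), and the use of INFINITY as the "wait forever"
value is an assumption made only for this illustration.

```c
#include <assert.h>
#include <math.h> /* INFINITY, isinf() */

/* Hypothetical stand-in for the session type checked by the patch. */
enum sketch_session_type {
	SKETCH_SESSION_CLIENT,
	SKETCH_SESSION_APPLIER,
};

/* Assumed "wait forever" value; the real TIMEOUT_INFINITY may differ. */
#define SKETCH_TIMEOUT_INFINITY ((double)INFINITY)

/*
 * Pick the quota wait timeout for a transaction. Applier sessions
 * wait as long as necessary for a memory dump to complete; every
 * other session uses the configured timeout.
 */
double
quota_timeout_for(enum sketch_session_type type, double configured_timeout)
{
	return type != SKETCH_SESSION_APPLIER ?
	       configured_timeout : SKETCH_TIMEOUT_INFINITY;
}

/* Small self-check of the two branches described above. */
void
quota_timeout_selfcheck(void)
{
	assert(quota_timeout_for(SKETCH_SESSION_CLIENT, 60.0) == 60.0);
	assert(isinf(quota_timeout_for(SKETCH_SESSION_APPLIER, 60.0)));
}
```

Keying the decision on the session type rather than on env->status
means the infinite wait applies to both join and subscribe, and only
to transactions applied on behalf of the master.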

 src/box/vinyl.c                   | 11 ++++++-----
 test/vinyl/replica_quota.result   | 13 +++++++++++++
 test/vinyl/replica_quota.test.lua |  7 +++++++
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 59bd6e7d..b2331081 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -70,6 +70,7 @@
 #include "column_mask.h"
 #include "trigger.h"
 #include "checkpoint.h"
+#include "session.h"
 #include "wal.h" /* wal_mode() */
 
 /**
@@ -2324,12 +2325,12 @@ vinyl_engine_prepare(struct engine *engine, struct txn *txn)
 		return -1;
 
 	/*
-	 * A replica receives a lot of data during initial join.
-	 * If the network connection is fast enough, it might fail
-	 * to keep up with dumps. To avoid replication failure due
-	 * to this, we ignore the quota timeout during bootstrap.
+	 * Do not abort join/subscribe on quota timeout - replication
+	 * is asynchronous anyway and there's box.info.replication
+	 * available for the admin to track the lag so let the applier
+	 * wait as long as necessary for memory dump to complete.
 	 */
-	double timeout = (env->status == VINYL_ONLINE ?
+	double timeout = (current_session()->type != SESSION_TYPE_APPLIER ?
 			  env->timeout : TIMEOUT_INFINITY);
 	/*
 	 * Reserve quota needed by the transaction before allocating
diff --git a/test/vinyl/replica_quota.result b/test/vinyl/replica_quota.result
index 485efde7..b85c7398 100644
--- a/test/vinyl/replica_quota.result
+++ b/test/vinyl/replica_quota.result
@@ -45,6 +45,19 @@ _ = test_run:cmd("start server replica")
 _ = test_run:wait_lsn('replica', 'default')
 ---
 ...
+-- Check vinyl_timeout is ignored on 'subscribe' (gh-3087).
+_ = test_run:cmd("stop server replica")
+---
+...
+for i = 2001,3000 do s:insert{i, pad} end
+---
+...
+_ = test_run:cmd("start server replica")
+---
+...
+_ = test_run:wait_lsn('replica', 'default')
+---
+...
 _ = test_run:cmd("stop server replica")
 ---
 ...
diff --git a/test/vinyl/replica_quota.test.lua b/test/vinyl/replica_quota.test.lua
index bc6cfb0d..ab89c1bc 100644
--- a/test/vinyl/replica_quota.test.lua
+++ b/test/vinyl/replica_quota.test.lua
@@ -24,6 +24,13 @@ for i = 1001,2000 do s:insert{i, pad} end
 _ = test_run:cmd("create server replica with rpl_master=default, script='vinyl/join_quota.lua'")
 _ = test_run:cmd("start server replica")
 _ = test_run:wait_lsn('replica', 'default')
+
+-- Check vinyl_timeout is ignored on 'subscribe' (gh-3087).
+_ = test_run:cmd("stop server replica")
+for i = 2001,3000 do s:insert{i, pad} end
+_ = test_run:cmd("start server replica")
+_ = test_run:wait_lsn('replica', 'default')
+
 _ = test_run:cmd("stop server replica")
 _ = test_run:cmd("cleanup server replica")
 
-- 
2.11.0



