Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: kostja@tarantool.org
Cc: tarantool-patches@freelists.org
Subject: [PATCH 7/8] vinyl: don't start scheduler fiber until local recovery is complete
Date: Tue,  4 Sep 2018 20:23:50 +0300	[thread overview]
Message-ID: <8dd7a2f1d46aae68e3509311826f9f6efc8bc49e.1536080993.git.vdavydov.dev@gmail.com> (raw)
In-Reply-To: <cover.1536080993.git.vdavydov.dev@gmail.com>
In-Reply-To: <cover.1536080993.git.vdavydov.dev@gmail.com>

We must not schedule any background jobs during local recovery, because
they may disrupt yet to be recovered data stored on disk. Since we start
the scheduler fiber as soon as the engine is initialized, we have to
pull some tricks to make sure it doesn't schedule any tasks: the
scheduler fiber function yields immediately upon startup; we assume
that it won't be woken up until local recovery is complete, because we
don't set the memory limit until then.

This looks rather flimsy, because the logic is spread among several
seemingly unrelated functions: the scheduler fiber (vy_scheduler_f),
the quota watermark callback (vy_env_quota_exceeded_cb), and the engine
recovery callback (vinyl_engine_begin_initial_recovery), where we leave
the memory limit unset until recovery is complete. The latter isn't even
mentioned in comments, which makes the code difficult to follow. Think
how everything would fall apart should we try to wake up the scheduler
fiber somewhere else for some reason.

This patch attempts to make the code more straightforward by postponing
startup of the scheduler fiber until recovery completion. It also moves
the comment explaining why we can't schedule tasks during local recovery
from vy_env_quota_exceeded_cb to vinyl_engine_begin_initial_recovery,
because this is where we actually omit the scheduler fiber startup.

Note, since now the scheduler fiber goes straight to business once
started, we can't start worker threads in the fiber function as we used
to, because then workers threads would be running even if vinyl was
unused. So we move this code to vy_worker_pool_get, which is called when
a worker is actually needed to run a task.
---
 src/box/vinyl.c        | 39 ++++++++++++++++++++-------------------
 src/box/vy_scheduler.c | 27 ++++++++++++---------------
 src/box/vy_scheduler.h |  6 ++++++
 3 files changed, 38 insertions(+), 34 deletions(-)

diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 416c9824..1cf9ad16 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -2420,21 +2420,6 @@ vy_env_quota_exceeded_cb(struct vy_quota *quota)
 {
 	struct vy_env *env = container_of(quota, struct vy_env, quota);
 
-	/*
-	 * The scheduler must be disabled during local recovery so as
-	 * not to distort data stored on disk. Not that we really need
-	 * it anyway, because the memory footprint is limited by the
-	 * memory limit from the previous run.
-	 *
-	 * On the contrary, remote recovery does require the scheduler
-	 * to be up and running, because the amount of data received
-	 * when bootstrapping from a remote master is only limited by
-	 * its disk size, which can exceed the size of available
-	 * memory by orders of magnitude.
-	 */
-	assert(env->status != VINYL_INITIAL_RECOVERY_LOCAL &&
-	       env->status != VINYL_FINAL_RECOVERY_LOCAL);
-
 	if (lsregion_used(&env->mem_env.allocator) == 0) {
 		/*
 		 * The memory limit has been exceeded, but there's
@@ -2710,13 +2695,15 @@ vy_set_deferred_delete_trigger(void)
 static int
 vinyl_engine_bootstrap(struct engine *engine)
 {
+	vy_set_deferred_delete_trigger();
+
 	struct vy_env *e = vy_env(engine);
 	assert(e->status == VINYL_OFFLINE);
 	if (vy_log_bootstrap() != 0)
 		return -1;
+	vy_scheduler_start(&e->scheduler);
 	vy_quota_set_limit(&e->quota, e->memory);
 	e->status = VINYL_ONLINE;
-	vy_set_deferred_delete_trigger();
 	return 0;
 }
 
@@ -2724,6 +2711,8 @@ static int
 vinyl_engine_begin_initial_recovery(struct engine *engine,
 				    const struct vclock *recovery_vclock)
 {
+	vy_set_deferred_delete_trigger();
+
 	struct vy_env *e = vy_env(engine);
 	assert(e->status == VINYL_OFFLINE);
 	if (recovery_vclock != NULL) {
@@ -2732,14 +2721,25 @@ vinyl_engine_begin_initial_recovery(struct engine *engine,
 		e->recovery = vy_log_begin_recovery(recovery_vclock);
 		if (e->recovery == NULL)
 			return -1;
+		/*
+		 * We can't schedule any background tasks until
+		 * local recovery is complete, because they would
+		 * disrupt yet to be recovered data stored on disk.
+		 * So we don't start the scheduler fiber or enable
+		 * memory limit until then.
+		 *
+		 * This is OK, because during recovery an instance
+		 * can't consume more memory than it used before
+		 * restart and hence memory dumps are not necessary.
+		 */
 		e->status = VINYL_INITIAL_RECOVERY_LOCAL;
 	} else {
 		if (vy_log_bootstrap() != 0)
 			return -1;
+		vy_scheduler_start(&e->scheduler);
 		vy_quota_set_limit(&e->quota, e->memory);
 		e->status = VINYL_INITIAL_RECOVERY_REMOTE;
 	}
-	vy_set_deferred_delete_trigger();
 	return 0;
 }
 
@@ -2780,11 +2780,10 @@ vinyl_engine_end_recovery(struct engine *engine)
 		vy_recovery_delete(e->recovery);
 		e->recovery = NULL;
 		e->recovery_vclock = NULL;
-		e->status = VINYL_ONLINE;
+		vy_scheduler_start(&e->scheduler);
 		vy_quota_set_limit(&e->quota, e->memory);
 		break;
 	case VINYL_FINAL_RECOVERY_REMOTE:
-		e->status = VINYL_ONLINE;
 		break;
 	default:
 		unreachable();
@@ -2796,6 +2795,8 @@ vinyl_engine_end_recovery(struct engine *engine)
 	 */
 	if (e->lsm_env.lsm_count > 0)
 		vy_run_env_enable_coio(&e->run_env, e->read_threads);
+
+	e->status = VINYL_ONLINE;
 	return 0;
 }
 
diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c
index 4f57c0a0..5daaf4f1 100644
--- a/src/box/vy_scheduler.c
+++ b/src/box/vy_scheduler.c
@@ -399,6 +399,14 @@ vy_worker_pool_destroy(struct vy_worker_pool *pool)
 static struct vy_worker *
 vy_worker_pool_get(struct vy_worker_pool *pool)
 {
+	/*
+	 * Start worker threads only when a task is scheduled
+	 * so that they are not dangling around if vinyl is
+	 * not used.
+	 */
+	if (pool->workers == NULL)
+		vy_worker_pool_start(pool);
+
 	struct vy_worker *worker = NULL;
 	if (!stailq_empty(&pool->idle_workers)) {
 		worker = stailq_shift_entry(&pool->idle_workers,
@@ -465,7 +473,11 @@ vy_scheduler_create(struct vy_scheduler *scheduler, int write_threads,
 
 	diag_create(&scheduler->diag);
 	fiber_cond_create(&scheduler->dump_cond);
+}
 
+void
+vy_scheduler_start(struct vy_scheduler *scheduler)
+{
 	fiber_start(scheduler->scheduler_fiber, scheduler);
 }
 
@@ -1947,21 +1959,6 @@ vy_scheduler_f(va_list va)
 {
 	struct vy_scheduler *scheduler = va_arg(va, struct vy_scheduler *);
 
-	/*
-	 * Yield immediately, until the quota watermark is reached
-	 * for the first time or a checkpoint is made.
-	 * Then start the worker threads: we know they will be
-	 * needed. If quota watermark is never reached, workers
-	 * are not started and the scheduler is idle until
-	 * shutdown or checkpoint.
-	 */
-	fiber_cond_wait(&scheduler->scheduler_cond);
-	if (scheduler->scheduler_fiber == NULL)
-		return 0; /* destroyed */
-
-	vy_worker_pool_start(&scheduler->dump_pool);
-	vy_worker_pool_start(&scheduler->compact_pool);
-
 	while (scheduler->scheduler_fiber != NULL) {
 		struct stailq processed_tasks;
 		struct vy_task *task, *next;
diff --git a/src/box/vy_scheduler.h b/src/box/vy_scheduler.h
index 841d5a53..96ce721b 100644
--- a/src/box/vy_scheduler.h
+++ b/src/box/vy_scheduler.h
@@ -172,6 +172,12 @@ vy_scheduler_create(struct vy_scheduler *scheduler, int write_threads,
 		    struct vy_run_env *run_env, struct rlist *read_views);
 
 /**
+ * Start a scheduler fiber.
+ */
+void
+vy_scheduler_start(struct vy_scheduler *scheduler);
+
+/**
  * Destroy a scheduler instance.
  */
 void
-- 
2.11.0

  parent reply	other threads:[~2018-09-04 17:23 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-02 20:18 [PATCH 0/7] vinyl: improve stats for throttling Vladimir Davydov
2018-09-02 20:18 ` [PATCH 1/7] vinyl: fix accounting of secondary index cache statements Vladimir Davydov
2018-09-02 22:26   ` [tarantool-patches] " Konstantin Osipov
2018-09-02 20:18 ` [PATCH 2/7] vinyl: add global memory stats Vladimir Davydov
2018-09-02 22:27   ` [tarantool-patches] " Konstantin Osipov
2018-09-02 22:27   ` Konstantin Osipov
2018-09-03  8:10     ` Vladimir Davydov
2018-09-02 20:18 ` [PATCH 3/7] vinyl: add global disk stats Vladimir Davydov
2018-09-02 22:30   ` [tarantool-patches] " Konstantin Osipov
2018-09-02 20:18 ` [PATCH 4/7] vinyl: fix force compaction logic Vladimir Davydov
2018-09-02 20:18 ` [PATCH 5/7] vinyl: update compact priority usual way on range split/coalesce Vladimir Davydov
2018-09-02 20:18 ` [PATCH 6/7] vinyl: keep track of compaction queue length and debt Vladimir Davydov
2018-09-02 20:19 ` [PATCH 7/7] vinyl: keep track of disk idle time Vladimir Davydov
2018-09-04 11:54   ` Vladimir Davydov
2018-09-04 17:23     ` Vladimir Davydov
2018-09-04 17:23       ` [PATCH 1/8] vinyl: add helper to check whether dump is in progress Vladimir Davydov
2018-09-06  7:33         ` Konstantin Osipov
2018-09-04 17:23       ` [PATCH 2/8] vinyl: don't use mempool for allocating background tasks Vladimir Davydov
2018-09-06  7:33         ` Konstantin Osipov
2018-09-04 17:23       ` [PATCH 3/8] vinyl: factor out worker pool from scheduler struct Vladimir Davydov
2018-09-06  7:34         ` Konstantin Osipov
2018-09-04 17:23       ` [PATCH 4/8] vinyl: move worker allocation closer to task creation Vladimir Davydov
2018-09-06  7:35         ` Konstantin Osipov
2018-09-04 17:23       ` [PATCH 5/8] vinyl: use separate thread pools for dump and compaction tasks Vladimir Davydov
2018-09-06  7:37         ` Konstantin Osipov
2018-09-06  9:48           ` Vladimir Davydov
2018-09-06 10:32             ` Konstantin Osipov
2018-09-04 17:23       ` [PATCH 6/8] vinyl: zap vy_worker_pool::idle_worker_count Vladimir Davydov
2018-09-06  7:38         ` Konstantin Osipov
2018-09-04 17:23       ` Vladimir Davydov [this message]
2018-09-06  7:39         ` [PATCH 7/8] vinyl: don't start scheduler fiber until local recovery is complete Konstantin Osipov
2018-09-04 17:23       ` [PATCH 8/8] vinyl: keep track of thread pool idle ratio Vladimir Davydov
2018-09-06  7:49         ` Konstantin Osipov
2018-09-06  8:18           ` Vladimir Davydov
2018-09-06 10:26             ` Konstantin Osipov
2018-09-06 10:52               ` Vladimir Davydov
2018-09-06 10:57                 ` Konstantin Osipov
2018-09-06 11:59                   ` Vladimir Davydov
2018-09-09 11:41 ` [PATCH 0/7] vinyl: improve stats for throttling Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8dd7a2f1d46aae68e3509311826f9f6efc8bc49e.1536080993.git.vdavydov.dev@gmail.com \
    --to=vdavydov.dev@gmail.com \
    --cc=kostja@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='Re: [PATCH 7/8] vinyl: don'\''t start scheduler fiber until local recovery is complete' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox