Tarantool development patches archive
 help / color / mirror / Atom feed
* [PATCH 0/4] Prepare vylog for space alter
@ 2018-03-17 17:56 Vladimir Davydov
  2018-03-17 17:56 ` [PATCH 1/4] vinyl: refactor vylog recovery Vladimir Davydov
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Vladimir Davydov @ 2018-03-17 17:56 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

The vinyl metadata log (vylog) has two idiosyncrasies that thwart
implementation of ALTER of vinyl spaces. The first one is the
callback-based recovery procedure (see vy_recovery_iterate): an
attempt to purge incomplete indexes from vylog after restart using the
recovery callback would result in extremely ugly and obfuscated code.
The second problem is the way indexes are identified in vylog: we
currently use LSN as a unique identifier, but it won't be unique if
more than one index is altered in one operation, which is the case
when the primary key definition is changed.

The patch set solves both these problems as follows:
 - Patch 1 refactors the recovery procedure so that it uses plain
   iterators instead of callbacks.
 - Patch 2 replaces LSN with space and index id as index identifier
   in vylog.
 - Patches 3 and 4 do some cleanup that becomes possible after
   patch 2 is applied.

https://github.com/tarantool/tarantool/tree/vy-prepare-vylog-for-alter

Vladimir Davydov (4):
  vinyl: refactor vylog recovery
  vinyl: do not use index lsn to identify indexes in vylog
  index: remove lsn from commit_create arguments
  alter: rewrite space truncation using alter infrastructure

 src/box/alter.cc         | 135 ++--------
 src/box/index.cc         |   2 +-
 src/box/index.h          |  12 +-
 src/box/memtx_space.c    |  29 +--
 src/box/space.h          |  44 ----
 src/box/sysview_engine.c |  17 --
 src/box/vinyl.c          | 593 ++++++++++++++----------------------------
 src/box/vy_index.c       | 404 ++++++++++++++---------------
 src/box/vy_index.h       |  34 +--
 src/box/vy_log.c         | 651 +++++++++++++++++++----------------------------
 src/box/vy_log.h         | 297 +++++++++++++--------
 src/box/vy_scheduler.c   |  12 +-
 test/unit/vy_log_stub.c  |   8 +-
 test/vinyl/layout.result |  36 +--
 14 files changed, 902 insertions(+), 1372 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/4] vinyl: refactor vylog recovery
  2018-03-17 17:56 [PATCH 0/4] Prepare vylog for space alter Vladimir Davydov
@ 2018-03-17 17:56 ` Vladimir Davydov
  2018-03-17 17:56 ` [PATCH 2/4] vinyl: do not use index lsn to identify indexes in vylog Vladimir Davydov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Vladimir Davydov @ 2018-03-17 17:56 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

The vy_recovery structure was initially designed as opaque to the
outside world - to iterate over objects stored in it, one is supposed to
use vy_recovery_iterate(), which invokes the given callback for each
recovered object encoded as vy_log_record that was used to create it.

Such a design gets extremely difficult to use when we need to preserve
some context between callback invocations - e.g. see how ugly backup and
garbage collection procedures look. And it is going to become even more
obfuscated once we introduce the notion of incomplete indexes (indexes
that are currently being built by ALTER).

So let's refactor vylog recovery procedure: let's make the vy_recovery
structure transparent and allow to iterate over internal representations
of recovered objects directly, without callbacks.
---
 src/box/vinyl.c          | 399 +++++++++++++++++++++--------------------------
 src/box/vy_index.c       | 375 ++++++++++++++++++++++----------------------
 src/box/vy_log.c         | 340 ++++++++++------------------------------
 src/box/vy_log.h         | 197 +++++++++++++++--------
 test/unit/vy_log_stub.c  |   8 +-
 test/vinyl/layout.result |   4 +-
 6 files changed, 582 insertions(+), 741 deletions(-)

diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index e0c30757..d3659b0b 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -3011,8 +3011,6 @@ struct vy_join_ctx {
 	struct cbus_call_msg cmsg;
 	/** ID of the space currently being relayed. */
 	uint32_t space_id;
-	/** Ordinal number of the index. */
-	uint32_t index_id;
 	/**
 	 * Index key definition, as defined by the user.
 	 * We only send the primary key, so the definition
@@ -3036,6 +3034,55 @@ struct vy_join_ctx {
 	struct rlist slices;
 };
 
+/**
+ * Recover a slice and add it to the list of slices.
+ * Newer slices are supposed to be recovered first.
+ * Returns 0 on success, -1 on failure.
+ */
+static int
+vy_prepare_send_slice(struct vy_join_ctx *ctx,
+		      struct vy_slice_recovery_info *slice_info)
+{
+	int rc = -1;
+	struct vy_run *run = NULL;
+	struct tuple *begin = NULL, *end = NULL;
+
+	run = vy_run_new(&ctx->env->run_env, slice_info->run->id);
+	if (run == NULL)
+		goto out;
+	if (vy_run_recover(run, ctx->env->path, ctx->space_id, 0) != 0)
+		goto out;
+
+	if (slice_info->begin != NULL) {
+		begin = vy_key_from_msgpack(ctx->env->index_env.key_format,
+					    slice_info->begin);
+		if (begin == NULL)
+			goto out;
+	}
+	if (slice_info->end != NULL) {
+		end = vy_key_from_msgpack(ctx->env->index_env.key_format,
+					  slice_info->end);
+		if (end == NULL)
+			goto out;
+	}
+
+	struct vy_slice *slice = vy_slice_new(slice_info->id, run,
+					      begin, end, ctx->key_def);
+	if (slice == NULL)
+		goto out;
+
+	rlist_add_tail_entry(&ctx->slices, slice, in_join);
+	rc = 0;
+out:
+	if (run != NULL)
+		vy_run_unref(run);
+	if (begin != NULL)
+		tuple_unref(begin);
+	if (end != NULL)
+		tuple_unref(end);
+	return rc;
+}
+
 static int
 vy_send_range_f(struct cbus_call_msg *cmsg)
 {
@@ -3068,28 +3115,38 @@ err:
 	return rc;
 }
 
-/**
- * Merge and send all runs from the given relay context.
- * On success, delete runs.
- */
+/** Merge and send all runs of the given range. */
 static int
-vy_send_range(struct vy_join_ctx *ctx)
+vy_send_range(struct vy_join_ctx *ctx,
+	      struct vy_range_recovery_info *range_info)
 {
-	if (rlist_empty(&ctx->slices))
+	int rc;
+	struct vy_slice *slice, *tmp;
+
+	if (rlist_empty(&range_info->slices))
 		return 0; /* nothing to do */
 
-	int rc = -1;
+	/* Recover slices. */
+	struct vy_slice_recovery_info *slice_info;
+	rlist_foreach_entry(slice_info, &range_info->slices, in_range) {
+		rc = vy_prepare_send_slice(ctx, slice_info);
+		if (rc != 0)
+			goto out_delete_slices;
+	}
+
+	/* Create a write iterator. */
 	struct rlist fake_read_views;
 	rlist_create(&fake_read_views);
 	ctx->wi = vy_write_iterator_new(ctx->key_def,
 					ctx->format, ctx->upsert_format,
 					true, true, &fake_read_views);
-	if (ctx->wi == NULL)
+	if (ctx->wi == NULL) {
+		rc = -1;
 		goto out;
-
-	struct vy_slice *slice;
+	}
 	rlist_foreach_entry(slice, &ctx->slices, in_join) {
-		if (vy_write_iterator_new_slice(ctx->wi, slice) != 0)
+		rc = vy_write_iterator_new_slice(ctx->wi, slice);
+		if (rc != 0)
 			goto out_delete_wi;
 	}
 
@@ -3099,11 +3156,10 @@ vy_send_range(struct vy_join_ctx *ctx)
 		       vy_send_range_f, NULL, TIMEOUT_INFINITY);
 	fiber_set_cancellable(cancellable);
 
-	struct vy_slice *tmp;
+out_delete_slices:
 	rlist_foreach_entry_safe(slice, &ctx->slices, in_join, tmp)
 		vy_slice_delete(slice);
 	rlist_create(&ctx->slices);
-
 out_delete_wi:
 	ctx->wi->iface->close(ctx->wi);
 	ctx->wi = NULL;
@@ -3111,96 +3167,59 @@ out:
 	return rc;
 }
 
-/** Relay callback, passed to vy_recovery_iterate(). */
+/** Send all tuples stored in the given index. */
 static int
-vy_join_cb(const struct vy_log_record *record, void *arg)
+vy_send_index(struct vy_join_ctx *ctx,
+	      struct vy_index_recovery_info *index_info)
 {
-	struct vy_join_ctx *ctx = arg;
-
-	if (record->type == VY_LOG_CREATE_INDEX ||
-	    record->type == VY_LOG_INSERT_RANGE) {
-		/*
-		 * All runs of the current range have been recovered,
-		 * so send them to the replica.
-		 */
-		if (vy_send_range(ctx) != 0)
-			return -1;
-	}
+	int rc = -1;
 
-	if (record->type == VY_LOG_CREATE_INDEX) {
-		ctx->space_id = record->space_id;
-		ctx->index_id = record->index_id;
-		if (ctx->key_def != NULL)
-			free(ctx->key_def);
-		ctx->key_def = key_def_new_with_parts(record->key_parts,
-						      record->key_part_count);
-		if (ctx->key_def == NULL)
-			return -1;
-		if (ctx->format != NULL)
-			tuple_format_unref(ctx->format);
-		ctx->format = tuple_format_new(&vy_tuple_format_vtab,
-					       &ctx->key_def, 1, 0, NULL, 0,
-					       NULL);
-		if (ctx->format == NULL)
-			return -1;
-		tuple_format_ref(ctx->format);
-		if (ctx->upsert_format != NULL)
-			tuple_format_unref(ctx->upsert_format);
-		ctx->upsert_format = vy_tuple_format_new_upsert(ctx->format);
-		if (ctx->upsert_format == NULL)
-			return -1;
-		tuple_format_ref(ctx->upsert_format);
-	}
+	if (index_info->is_dropped)
+		return 0;
 
 	/*
 	 * We are only interested in the primary index.
 	 * Secondary keys will be rebuilt on the destination.
 	 */
-	if (ctx->index_id != 0)
+	if (index_info->index_id != 0)
 		return 0;
 
-	if (record->type == VY_LOG_INSERT_SLICE) {
-		struct tuple_format *key_format = ctx->env->index_env.key_format;
-		struct tuple *begin = NULL, *end = NULL;
-		bool success = false;
-
-		struct vy_run *run = vy_run_new(&ctx->env->run_env,
-						record->run_id);
-		if (run == NULL)
-			goto done_slice;
-		if (vy_run_recover(run, ctx->env->path,
-				   ctx->space_id, ctx->index_id) != 0)
-			goto done_slice;
-
-		if (record->begin != NULL) {
-			begin = vy_key_from_msgpack(key_format, record->begin);
-			if (begin == NULL)
-				goto done_slice;
-		}
-		if (record->end != NULL) {
-			end = vy_key_from_msgpack(key_format, record->end);
-			if (end == NULL)
-				goto done_slice;
-		}
+	ctx->space_id = index_info->space_id;
 
-		struct vy_slice *slice = vy_slice_new(record->slice_id,
-						run, begin, end, ctx->key_def);
-		if (slice == NULL)
-			goto done_slice;
-
-		rlist_add_entry(&ctx->slices, slice, in_join);
-		success = true;
-done_slice:
-		if (run != NULL)
-			vy_run_unref(run);
-		if (begin != NULL)
-			tuple_unref(begin);
-		if (end != NULL)
-			tuple_unref(end);
-		if (!success)
-			return -1;
+	/* Create key definition and tuple format. */
+	ctx->key_def = key_def_new_with_parts(index_info->key_parts,
+					      index_info->key_part_count);
+	if (ctx->key_def == NULL)
+		goto out;
+	ctx->format = tuple_format_new(&vy_tuple_format_vtab, &ctx->key_def,
+				       1, 0, NULL, 0, NULL);
+	if (ctx->format == NULL)
+		goto out_free_key_def;
+	tuple_format_ref(ctx->format);
+	ctx->upsert_format = vy_tuple_format_new_upsert(ctx->format);
+	if (ctx->upsert_format == NULL)
+		goto out_free_format;
+	tuple_format_ref(ctx->upsert_format);
+
+	/* Send ranges. */
+	struct vy_range_recovery_info *range_info;
+	assert(!rlist_empty(&index_info->ranges));
+	rlist_foreach_entry(range_info, &index_info->ranges, in_index) {
+		rc = vy_send_range(ctx, range_info);
+		if (rc != 0)
+			break;
 	}
-	return 0;
+
+	tuple_format_unref(ctx->upsert_format);
+	ctx->upsert_format = NULL;
+out_free_format:
+	tuple_format_unref(ctx->format);
+	ctx->format = NULL;
+out_free_key_def:
+	free(ctx->key_def);
+	ctx->key_def = NULL;
+out:
+	return rc;
 }
 
 /** Relay cord function. */
@@ -3260,22 +3279,15 @@ vinyl_engine_join(struct engine *engine, struct vclock *vclock,
 		say_error("failed to recover vylog to join a replica");
 		goto out_join_cord;
 	}
-	rc = vy_recovery_iterate(recovery, vy_join_cb, ctx);
+	rc = 0;
+	struct vy_index_recovery_info *index_info;
+	rlist_foreach_entry(index_info, &recovery->indexes, in_recovery) {
+		rc = vy_send_index(ctx, index_info);
+		if (rc != 0)
+			break;
+	}
 	vy_recovery_delete(recovery);
-	/* Send the last range. */
-	if (rc == 0)
-		rc = vy_send_range(ctx);
-
-	/* Cleanup. */
-	if (ctx->key_def != NULL)
-		free(ctx->key_def);
-	if (ctx->format != NULL)
-		tuple_format_unref(ctx->format);
-	if (ctx->upsert_format != NULL)
-		tuple_format_unref(ctx->upsert_format);
-	struct vy_slice *slice, *tmp;
-	rlist_foreach_entry_safe(slice, &ctx->slices, in_join, tmp)
-		vy_slice_delete(slice);
+
 out_join_cord:
 	cbus_stop_loop(&ctx->relay_pipe);
 	cpipe_destroy(&ctx->relay_pipe);
@@ -3355,70 +3367,29 @@ vinyl_space_apply_initial_join_row(struct space *space, struct request *request)
 
 /* {{{ Garbage collection */
 
-/** Argument passed to vy_gc_cb(). */
-struct vy_gc_arg {
-	/** Vinyl environment. */
-	struct vy_env *env;
-	/**
-	 * Specifies what kinds of runs to delete.
-	 * See VY_GC_*.
-	 */
-	unsigned int gc_mask;
-	/** LSN of the oldest checkpoint to save. */
-	int64_t gc_lsn;
-	/**
-	 * ID of the current space and index.
-	 * Needed for file name formatting.
-	 */
-	uint32_t space_id;
-	uint32_t index_id;
-	/** Number of times the callback has been called. */
-	int loops;
-};
-
 /**
- * Garbage collection callback, passed to vy_recovery_iterate().
- *
  * Given a record encoding information about a vinyl run, try to
  * delete the corresponding files. On success, write a "forget" record
  * to the log so that all information about the run is deleted on the
  * next log rotation.
  */
-static int
-vy_gc_cb(const struct vy_log_record *record, void *cb_arg)
+static void
+vy_gc_run(struct vy_env *env,
+	  struct vy_index_recovery_info *index_info,
+	  struct vy_run_recovery_info *run_info)
 {
-	struct vy_gc_arg *arg = cb_arg;
-
-	switch (record->type) {
-	case VY_LOG_CREATE_INDEX:
-		arg->space_id = record->space_id;
-		arg->index_id = record->index_id;
-		goto out;
-	case VY_LOG_PREPARE_RUN:
-		if ((arg->gc_mask & VY_GC_INCOMPLETE) == 0)
-			goto out;
-		break;
-	case VY_LOG_DROP_RUN:
-		if ((arg->gc_mask & VY_GC_DROPPED) == 0 ||
-		    record->gc_lsn >= arg->gc_lsn)
-			goto out;
-		break;
-	default:
-		goto out;
-	}
-
 	ERROR_INJECT(ERRINJ_VY_GC,
 		     {say_error("error injection: vinyl run %lld not deleted",
-				(long long)record->run_id); goto out;});
+				(long long)run_info->id); return;});
 
 	/* Try to delete files. */
-	if (vy_run_remove_files(arg->env->path, arg->space_id,
-				arg->index_id, record->run_id) != 0)
-		goto out;
+	if (vy_run_remove_files(env->path, index_info->space_id,
+				index_info->index_id, run_info->id) != 0)
+		return;
 
 	/* Forget the run on success. */
 	vy_log_tx_begin();
-	vy_log_forget_run(record->run_id);
+	vy_log_forget_run(run_info->id);
 	/*
 	 * Leave the record in the vylog buffer on disk error.
 	 * If we fail to flush it before restart, we will retry
@@ -3426,23 +3397,35 @@ vy_gc_cb(const struct vy_log_record *record, void *cb_arg)
 	 * is invoked, which is harmless.
 	 */
 	vy_log_tx_try_commit();
-out:
-	if (++arg->loops % VY_YIELD_LOOPS == 0)
-		fiber_sleep(0);
-	return 0;
 }
 
-/** Delete unused run files, see vy_gc_arg for more details. */
+/**
+ * Delete unused run files stored in the recovery context.
+ * @param env      Vinyl environment.
+ * @param recovery Recovery context.
+ * @param gc_mask  Specifies what kinds of runs to delete (see VY_GC_*).
+ * @param gc_lsn   LSN of the oldest checkpoint to save.
+ */
 static void
 vy_gc(struct vy_env *env, struct vy_recovery *recovery,
       unsigned int gc_mask, int64_t gc_lsn)
 {
-	struct vy_gc_arg arg = {
-		.env = env,
-		.gc_mask = gc_mask,
-		.gc_lsn = gc_lsn,
-	};
-	vy_recovery_iterate(recovery, vy_gc_cb, &arg);
+	int loops = 0;
+	struct vy_index_recovery_info *index_info;
+	rlist_foreach_entry(index_info, &recovery->indexes, in_recovery) {
+		struct vy_run_recovery_info *run_info;
+		rlist_foreach_entry(run_info, &index_info->runs, in_index) {
+			if ((run_info->is_dropped &&
+			     run_info->gc_lsn < gc_lsn &&
+			     (gc_mask & VY_GC_DROPPED) != 0) ||
+			    (run_info->is_incomplete &&
+			     (gc_mask & VY_GC_INCOMPLETE) != 0)) {
+				vy_gc_run(env, index_info, run_info);
+			}
+			if (loops % VY_YIELD_LOOPS == 0)
+				fiber_sleep(0);
+		}
+	}
 }
 
 static int
@@ -3469,52 +3452,6 @@ vinyl_engine_collect_garbage(struct engine *engine, int64_t lsn)
 
 /* {{{ Backup */
 
-/** Argument passed to vy_backup_cb(). */
-struct vy_backup_arg {
-	/** Vinyl environment. */
-	struct vy_env *env;
-	/** Backup callback. */
-	int (*cb)(const char *, void *);
-	/** Argument passed to @cb. */
-	void *cb_arg;
-	/**
-	 * ID of the current space and index.
-	 * Needed for file name formatting.
-	 */
-	uint32_t space_id;
-	uint32_t index_id;
-	/** Number of times the callback has been called. */
-	int loops;
-};
-
-/** Backup callback, passed to vy_recovery_iterate(). */
-static int
-vy_backup_cb(const struct vy_log_record *record, void *cb_arg)
-{
-	struct vy_backup_arg *arg = cb_arg;
-
-	if (record->type == VY_LOG_CREATE_INDEX) {
-		arg->space_id = record->space_id;
-		arg->index_id = record->index_id;
-	}
-
-	if (record->type != VY_LOG_CREATE_RUN || record->is_dropped)
-		goto out;
-
-	char path[PATH_MAX];
-	for (int type = 0; type < vy_file_MAX; type++) {
-		vy_run_snprint_path(path, sizeof(path), arg->env->path,
-				    arg->space_id, arg->index_id,
-				    record->run_id, type);
-		if (arg->cb(path, arg->cb_arg) != 0)
-			return -1;
-	}
-out:
-	if (++arg->loops % VY_YIELD_LOOPS == 0)
-		fiber_sleep(0);
-	return 0;
-}
-
 static int
 vinyl_engine_backup(struct engine *engine, struct vclock *vclock,
 		    engine_backup_cb cb, void *cb_arg)
@@ -3535,12 +3472,32 @@ vinyl_engine_backup(struct engine *engine, struct vclock *vclock,
 		say_error("failed to recover vylog for backup");
 		return -1;
 	}
-	struct vy_backup_arg arg = {
-		.env = env,
-		.cb = cb,
-		.cb_arg = cb_arg,
-	};
-	int rc = vy_recovery_iterate(recovery, vy_backup_cb, &arg);
+	int rc = 0;
+	int loops = 0;
+	struct vy_index_recovery_info *index_info;
+	rlist_foreach_entry(index_info, &recovery->indexes, in_recovery) {
+		if (index_info->is_dropped)
+			continue;
+		struct vy_run_recovery_info *run_info;
+		rlist_foreach_entry(run_info, &index_info->runs, in_index) {
+			if (run_info->is_dropped || run_info->is_incomplete)
+				continue;
+			char path[PATH_MAX];
+			for (int type = 0; type < vy_file_MAX; type++) {
+				vy_run_snprint_path(path, sizeof(path),
+						    env->path,
+						    index_info->space_id,
+						    index_info->index_id,
+						    run_info->id, type);
+				rc = cb(path, cb_arg);
+				if (rc != 0)
+					goto out;
+			}
+			if (loops % VY_YIELD_LOOPS == 0)
+				fiber_sleep(0);
+		}
+	}
+out:
 	vy_recovery_delete(recovery);
 	return rc;
 }
diff --git a/src/box/vy_index.c b/src/box/vy_index.c
index 68fccab5..9c199ddd 100644
--- a/src/box/vy_index.c
+++ b/src/box/vy_index.c
@@ -36,7 +36,6 @@
 #include <sys/stat.h>
 #include <sys/types.h>
 
-#include "assoc.h"
 #include "diag.h"
 #include "errcode.h"
 #include "histogram.h"
@@ -386,156 +385,150 @@ vy_index_create(struct vy_index *index)
 	return vy_index_init_range_tree(index);
 }
 
-/** vy_index_recovery_cb() argument. */
-struct vy_index_recovery_cb_arg {
-	/** Index being recovered. */
-	struct vy_index *index;
-	/** Last recovered range. */
-	struct vy_range *range;
-	/** Vinyl run environment. */
-	struct vy_run_env *run_env;
-	/**
-	 * All recovered runs hashed by ID.
-	 * It is needed in order not to load the same
-	 * run each time a slice is created for it.
-	 */
-	struct mh_i64ptr_t *run_hash;
-	/**
-	 * True if force_recovery mode is enabled.
+static struct vy_run *
+vy_index_recover_run(struct vy_index *index,
+		     struct vy_run_recovery_info *run_info,
+		     struct vy_run_env *run_env, bool force_recovery)
+{
+	assert(!run_info->is_dropped);
+	assert(!run_info->is_incomplete);
+
+	if (run_info->data != NULL) {
+		/* Already recovered. */
+		return run_info->data;
+	}
+
+	struct vy_run *run = vy_run_new(run_env, run_info->id);
+	if (run == NULL)
+		return NULL;
+
+	run->dump_lsn = run_info->dump_lsn;
+	if (vy_run_recover(run, index->env->path,
+			   index->space_id, index->id) != 0 &&
+	    (!force_recovery ||
+	     vy_run_rebuild_index(run, index->env->path,
+				  index->space_id, index->id,
+				  index->cmp_def, index->key_def,
+				  index->mem_format, index->upsert_format,
+				  &index->opts) != 0)) {
+		vy_run_unref(run);
+		return NULL;
+	}
+	vy_index_add_run(index, run);
+
+	/*
+	 * The same run can be referenced by more than one slice
+	 * so we cache recovered runs in run_info to avoid loading
+	 * the same run multiple times.
+	 *
+	 * Runs are stored with their reference counters elevated.
+	 * We drop the extra references as soon as index recovery
+	 * is complete (see vy_index_recover()).
 	 */
-	bool force_recovery;
-};
+	run_info->data = run;
+	return run;
+}
 
-/** Index recovery callback, passed to vy_recovery_load_index(). */
-static int
-vy_index_recovery_cb(const struct vy_log_record *record, void *cb_arg)
+static struct vy_slice *
+vy_index_recover_slice(struct vy_index *index, struct vy_range *range,
+		       struct vy_slice_recovery_info *slice_info,
+		       struct vy_run_env *run_env, bool force_recovery)
 {
-	struct vy_index_recovery_cb_arg *arg = cb_arg;
-	struct vy_index *index = arg->index;
-	struct vy_range *range = arg->range;
-	struct vy_run_env *run_env = arg->run_env;
-	struct mh_i64ptr_t *run_hash = arg->run_hash;
-	bool force_recovery = arg->force_recovery;
-	struct tuple_format *key_format = index->env->key_format;
 	struct tuple *begin = NULL, *end = NULL;
+	struct vy_slice *slice = NULL;
 	struct vy_run *run;
-	struct vy_slice *slice;
-	bool success = false;
-
-	assert(record->type == VY_LOG_CREATE_INDEX || index->commit_lsn >= 0);
 
-	if (record->type == VY_LOG_INSERT_RANGE ||
-	    record->type == VY_LOG_INSERT_SLICE) {
-		if (record->begin != NULL) {
-			begin = vy_key_from_msgpack(key_format, record->begin);
-			if (begin == NULL)
-				goto out;
-		}
-		if (record->end != NULL) {
-			end = vy_key_from_msgpack(key_format, record->end);
-			if (end == NULL)
-				goto out;
-		}
-	}
-
-	switch (record->type) {
-	case VY_LOG_CREATE_INDEX:
-		assert(record->index_id == index->id);
-		assert(record->space_id == index->space_id);
-		assert(index->commit_lsn < 0);
-		assert(record->index_lsn >= 0);
-		index->commit_lsn = record->index_lsn;
-		break;
-	case VY_LOG_DUMP_INDEX:
-		assert(record->index_lsn == index->commit_lsn);
-		index->dump_lsn = record->dump_lsn;
-		break;
-	case VY_LOG_TRUNCATE_INDEX:
-		assert(record->index_lsn == index->commit_lsn);
-		index->truncate_count = record->truncate_count;
-		break;
-	case VY_LOG_DROP_INDEX:
-		assert(record->index_lsn == index->commit_lsn);
-		index->is_dropped = true;
-		/*
-		 * If the index was dropped, we don't need to replay
-		 * truncate (see vy_prepare_truncate_space()).
-		 */
-		index->truncate_count = UINT64_MAX;
-		break;
-	case VY_LOG_PREPARE_RUN:
-		break;
-	case VY_LOG_CREATE_RUN:
-		if (record->is_dropped)
-			break;
-		assert(record->index_lsn == index->commit_lsn);
-		run = vy_run_new(run_env, record->run_id);
-		if (run == NULL)
+	if (slice_info->begin != NULL) {
+		begin = vy_key_from_msgpack(index->env->key_format,
+					    slice_info->begin);
+		if (begin == NULL)
 			goto out;
-		run->dump_lsn = record->dump_lsn;
-		if (vy_run_recover(run, index->env->path,
-				   index->space_id, index->id) != 0 &&
-		     (!force_recovery ||
-		     vy_run_rebuild_index(run, index->env->path,
-					  index->space_id, index->id,
-					  index->cmp_def, index->key_def,
-					  index->mem_format,
-					  index->upsert_format,
-					  &index->opts) != 0)) {
-			vy_run_unref(run);
+	}
+	if (slice_info->end != NULL) {
+		end = vy_key_from_msgpack(index->env->key_format,
+					  slice_info->end);
+		if (end == NULL)
 			goto out;
-		}
-		struct mh_i64ptr_node_t node = { run->id, run };
-		if (mh_i64ptr_put(run_hash, &node,
-				  NULL, NULL) == mh_end(run_hash)) {
-			diag_set(OutOfMemory, 0,
-				 "mh_i64ptr_put", "mh_i64ptr_node_t");
-			vy_run_unref(run);
+	}
+	if (begin != NULL && end != NULL &&
+	    vy_key_compare(begin, end, index->cmp_def) >= 0) {
+		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
+			 tt_sprintf("begin >= end for slice %lld",
+				    (long long)slice_info->id));
+		goto out;
+	}
+
+	run = vy_index_recover_run(index, slice_info->run,
+				   run_env, force_recovery);
+	if (run == NULL)
+		goto out;
+
+	slice = vy_slice_new(slice_info->id, run, begin, end, index->cmp_def);
+	if (slice == NULL)
+		goto out;
+
+	vy_range_add_slice(range, slice);
+out:
+	if (begin != NULL)
+		tuple_unref(begin);
+	if (end != NULL)
+		tuple_unref(end);
+	return slice;
+}
+
+static struct vy_range *
+vy_index_recover_range(struct vy_index *index,
+		       struct vy_range_recovery_info *range_info,
+		       struct vy_run_env *run_env, bool force_recovery)
+{
+	struct tuple *begin = NULL, *end = NULL;
+	struct vy_range *range = NULL;
+
+	if (range_info->begin != NULL) {
+		begin = vy_key_from_msgpack(index->env->key_format,
+					    range_info->begin);
+		if (begin == NULL)
 			goto out;
-		}
-		break;
-	case VY_LOG_DROP_RUN:
-		break;
-	case VY_LOG_INSERT_RANGE:
-		assert(record->index_lsn == index->commit_lsn);
-		range = vy_range_new(record->range_id, begin, end,
-				     index->cmp_def);
-		if (range == NULL)
+	}
+	if (range_info->end != NULL) {
+		end = vy_key_from_msgpack(index->env->key_format,
+					  range_info->end);
+		if (end == NULL)
 			goto out;
-		if (range->begin != NULL && range->end != NULL &&
-		    vy_key_compare(range->begin, range->end,
-				   index->cmp_def) >= 0) {
-			diag_set(ClientError, ER_INVALID_VYLOG_FILE,
-				 tt_sprintf("begin >= end for range id %lld",
-					    (long long)range->id));
+	}
+	if (begin != NULL && end != NULL &&
+	    vy_key_compare(begin, end, index->cmp_def) >= 0) {
+		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
+			 tt_sprintf("begin >= end for range %lld",
+				    (long long)range_info->id));
+		goto out;
+	}
+
+	range = vy_range_new(range_info->id, begin, end, index->cmp_def);
+	if (range == NULL)
+		goto out;
+
+	/*
+	 * Newer slices are stored closer to the head of the list,
+	 * while we are supposed to add slices in chronological
+	 * order, so use reverse iterator.
+	 */
+	struct vy_slice_recovery_info *slice_info;
+	rlist_foreach_entry_reverse(slice_info, &range_info->slices, in_range) {
+		if (vy_index_recover_slice(index, range, slice_info,
+					   run_env, force_recovery) == NULL) {
 			vy_range_delete(range);
+			range = NULL;
 			goto out;
 		}
-		vy_index_add_range(index, range);
-		arg->range = range;
-		break;
-	case VY_LOG_INSERT_SLICE:
-		assert(range != NULL);
-		assert(range->id == record->range_id);
-		mh_int_t k = mh_i64ptr_find(run_hash, record->run_id, NULL);
-		assert(k != mh_end(run_hash));
-		run = mh_i64ptr_node(run_hash, k)->val;
-		slice = vy_slice_new(record->slice_id, run, begin, end,
-				     index->cmp_def);
-		if (slice == NULL)
-			goto out;
-		vy_range_add_slice(range, slice);
-		break;
-	default:
-		unreachable();
 	}
-	success = true;
+	vy_index_add_range(index, range);
 out:
 	if (begin != NULL)
 		tuple_unref(begin);
 	if (end != NULL)
 		tuple_unref(end);
-	return success ? 0 : -1;
+	return range;
 }
 
 int
@@ -545,19 +538,6 @@ vy_index_recover(struct vy_index *index, struct vy_recovery *recovery,
 {
 	assert(index->range_count == 0);
 
-	struct vy_index_recovery_cb_arg arg = {
-		.index = index,
-		.range = NULL,
-		.run_env = run_env,
-		.run_hash = NULL,
-		.force_recovery = force_recovery,
-	};
-	arg.run_hash = mh_i64ptr_new();
-	if (arg.run_hash == NULL) {
-		diag_set(OutOfMemory, 0, "mh_i64ptr_new", "mh_i64ptr_t");
-		return -1;
-	}
-
 	/*
 	 * Backward compatibility fixup: historically, we used
 	 * box.info.signature for LSN of index creation, which
@@ -568,39 +548,14 @@ vy_index_recover(struct vy_index *index, struct vy_recovery *recovery,
 	if (index->opts.lsn != 0)
 		lsn = index->opts.lsn;
 
-	int rc = vy_recovery_load_index(recovery, index->space_id, index->id,
-					lsn, is_checkpoint_recovery,
-					vy_index_recovery_cb, &arg);
-
-	mh_int_t k;
-	mh_foreach(arg.run_hash, k) {
-		struct vy_run *run = mh_i64ptr_node(arg.run_hash, k)->val;
-		if (run->refs > 1)
-			vy_index_add_run(index, run);
-		if (run->refs == 1 && rc == 0) {
-			diag_set(ClientError, ER_INVALID_VYLOG_FILE,
-				 tt_sprintf("Unused run %lld in index %lld",
-					    (long long)run->id,
-					    (long long)index->commit_lsn));
-			rc = -1;
-			/*
-			 * Continue the loop to unreference
-			 * all runs in the hash.
-			 */
-		}
-		/* Drop the reference held by the hash. */
-		vy_run_unref(run);
-	}
-	mh_i64ptr_delete(arg.run_hash);
-
-	if (rc != 0) {
-		/* Recovery callback failed. */
-		return -1;
-	}
-
-	if (index->commit_lsn < 0) {
-		/* Index was not found in the metadata log. */
-		if (is_checkpoint_recovery) {
+	/*
+	 * Look up the last incarnation of the index in vylog.
+	 */
+	struct vy_index_recovery_info *index_info;
+	index_info = vy_recovery_lookup_index(recovery,
+					      index->space_id, index->id);
+	if (is_checkpoint_recovery) {
+		if (index_info == NULL) {
 			/*
 			 * All indexes created from snapshot rows must
 			 * be present in vylog, because snapshot can
@@ -608,10 +563,21 @@ vy_index_recover(struct vy_index *index, struct vy_recovery *recovery,
 			 * flushed.
 			 */
 			diag_set(ClientError, ER_INVALID_VYLOG_FILE,
-				 tt_sprintf("Index %lld not found",
-					    (long long)index->commit_lsn));
+				 tt_sprintf("Index %u/%u not found",
+					    (unsigned)index->space_id,
+					    (unsigned)index->id));
 			return -1;
 		}
+		if (lsn > index_info->index_lsn) {
+			/*
+			 * The last incarnation of the index was created
+			 * before the last checkpoint, load it now.
+			 */
+			lsn = index_info->index_lsn;
+		}
+	}
+
+	if (index_info == NULL || lsn > index_info->index_lsn) {
 		/*
 		 * If we failed to log index creation before restart,
 		 * we won't find it in the log on recovery. This is
@@ -622,15 +588,58 @@ vy_index_recover(struct vy_index *index, struct vy_recovery *recovery,
 		return vy_index_init_range_tree(index);
 	}
 
-	if (index->is_dropped) {
+	index->commit_lsn = lsn;
+
+	if (lsn < index_info->index_lsn || index_info->is_dropped) {
 		/*
-		 * Initial range is not stored in the metadata log
-		 * for dropped indexes, but we need it for recovery.
+		 * Loading a past incarnation of the index, i.e.
+		 * the index is going to dropped during final
+		 * recovery. Mark it as such.
+		 */
+		index->is_dropped = true;
+		/*
+		 * If the index was dropped, we don't need to replay
+		 * truncate (see vinyl_space_prepare_truncate()).
+		 */
+		index->truncate_count = UINT64_MAX;
+		/*
+		 * We need range tree initialized for all indexes,
+		 * even for dropped ones.
 		 */
 		return vy_index_init_range_tree(index);
 	}
 
 	/*
+	 * Loading the last incarnation of the index from vylog.
+	 */
+	index->dump_lsn = index_info->dump_lsn;
+	index->truncate_count = index_info->truncate_count;
+
+	int rc = 0;
+	struct vy_range_recovery_info *range_info;
+	rlist_foreach_entry(range_info, &index_info->ranges, in_index) {
+		if (vy_index_recover_range(index, range_info,
+					   run_env, force_recovery) == NULL) {
+			rc = -1;
+			break;
+		}
+	}
+
+	/*
+	 * vy_index_recover_run() elevates reference counter
+	 * of each recovered run. We need to drop the extra
+	 * references once we are done.
+	 */
+	struct vy_run *run;
+	rlist_foreach_entry(run, &index->runs, in_index) {
+		assert(run->refs > 1);
+		vy_run_unref(run);
+	}
+
+	if (rc != 0)
+		return -1;
+
+	/*
 	 * Account ranges to the index and check that the range tree
 	 * does not have holes or overlaps.
 	 */
diff --git a/src/box/vy_log.c b/src/box/vy_log.c
index c31a588e..8b95282b 100644
--- a/src/box/vy_log.c
+++ b/src/box/vy_log.c
@@ -115,8 +115,6 @@ static const char *vy_log_type_name[] = {
 	[VY_LOG_TRUNCATE_INDEX]		= "truncate_index",
 };
 
-struct vy_recovery;
-
 /** Metadata log object. */
 struct vy_log {
 	/**
@@ -170,111 +168,6 @@ struct vy_log {
 };
 static struct vy_log vy_log;
 
-/** Recovery context. */
-struct vy_recovery {
-	/** space_id, index_id -> vy_index_recovery_info. */
-	struct mh_i64ptr_t *index_id_hash;
-	/** index_lsn -> vy_index_recovery_info. */
-	struct mh_i64ptr_t *index_lsn_hash;
-	/** ID -> vy_range_recovery_info. */
-	struct mh_i64ptr_t *range_hash;
-	/** ID -> vy_run_recovery_info. */
-	struct mh_i64ptr_t *run_hash;
-	/** ID -> vy_slice_recovery_info. */
-	struct mh_i64ptr_t *slice_hash;
-	/**
-	 * Maximal vinyl object ID, according to the metadata log,
-	 * or -1 in case no vinyl objects were recovered.
-	 */
-	int64_t max_id;
-};
-
-/** Vinyl index info stored in a recovery context. */
-struct vy_index_recovery_info {
-	/** LSN of the index creation. */
-	int64_t index_lsn;
-	/** Ordinal index number in the space. */
-	uint32_t index_id;
-	/** Space ID. */
-	uint32_t space_id;
-	/** Array of key part definitions. */
-	struct key_part_def *key_parts;
-	/** Number of key parts. */
-	uint32_t key_part_count;
-	/** True if the index was dropped. */
-	bool is_dropped;
-	/** LSN of the last index dump. */
-	int64_t dump_lsn;
-	/** Truncate count. */
-	int64_t truncate_count;
-	/**
-	 * List of all ranges in the index, linked by
-	 * vy_range_recovery_info::in_index.
-	 */
-	struct rlist ranges;
-	/**
-	 * List of all runs created for the index
-	 * (both committed and not), linked by
-	 * vy_run_recovery_info::in_index.
-	 */
-	struct rlist runs;
-};
-
-/** Vinyl range info stored in a recovery context. */
-struct vy_range_recovery_info {
-	/** Link in vy_index_recovery_info::ranges. */
-	struct rlist in_index;
-	/** ID of the range. */
-	int64_t id;
-	/** Start of the range, stored in MsgPack array. */
-	char *begin;
-	/** End of the range, stored in MsgPack array. */
-	char *end;
-	/**
-	 * List of all slices in the range, linked by
-	 * vy_slice_recovery_info::in_range.
-	 *
-	 * Newer slices are closer to the head.
-	 */
-	struct rlist slices;
-};
-
-/** Run info stored in a recovery context. */
-struct vy_run_recovery_info {
-	/** Link in vy_index_recovery_info::runs. */
-	struct rlist in_index;
-	/** ID of the run. */
-	int64_t id;
-	/** Max LSN stored on disk. */
-	int64_t dump_lsn;
-	/**
-	 * For deleted runs: LSN of the last checkpoint
-	 * that uses this run.
-	 */
-	int64_t gc_lsn;
-	/**
-	 * True if the run was not committed (there's
-	 * VY_LOG_PREPARE_RUN, but no VY_LOG_CREATE_RUN).
-	 */
-	bool is_incomplete;
-	/** True if the run was dropped (VY_LOG_DROP_RUN). */
-	bool is_dropped;
-};
-
-/** Slice info stored in a recovery context. */
-struct vy_slice_recovery_info {
-	/** Link in vy_range_recovery_info::slices. */
-	struct rlist in_range;
-	/** ID of the slice. */
-	int64_t id;
-	/** Run this slice was created for. */
-	struct vy_run_recovery_info *run;
-	/** Start of the slice, stored in MsgPack array. */
-	char *begin;
-	/** End of the slice, stored in MsgPack array. */
-	char *end;
-};
-
 static struct vy_recovery *
 vy_recovery_new_locked(int64_t signature, bool only_checkpoint);
 
@@ -977,91 +870,6 @@ vy_log_end_recovery(void)
 	return 0;
 }
 
-/** Argument passed to vy_log_rotate_cb_func(). */
-struct vy_log_rotate_cb_arg {
-	struct xdir *dir;
-	struct xlog *xlog;
-	const struct vclock *vclock;
-};
-
-/** Callback passed to vy_recovery_iterate() for log rotation. */
-static int
-vy_log_rotate_cb_func(const struct vy_log_record *record, void *cb_arg)
-{
-	struct vy_log_rotate_cb_arg *arg = cb_arg;
-	struct xlog *xlog = arg->xlog;
-	struct xrow_header row;
-
-	say_verbose("save vylog record: %s", vy_log_record_str(record));
-
-	/* Create the log file on the first write. */
-	if (!xlog_is_open(xlog) &&
-	    xdir_create_xlog(arg->dir, xlog, arg->vclock) < 0)
-		return -1;
-
-	if (vy_log_record_encode(record, &row) < 0 ||
-	    xlog_write_row(xlog, &row) < 0)
-		return -1;
-	return 0;
-}
-
-/**
- * Create an vy_log file from a recovery context.
- */
-static int
-vy_log_create(const struct vclock *vclock, struct vy_recovery *recovery)
-{
-	/*
-	 * Only create the log file if we have something
-	 * to write to it.
-	 */
-	struct xlog xlog;
-	xlog_clear(&xlog);
-
-	say_verbose("saving vylog %lld", (long long)vclock_sum(vclock));
-
-	struct vy_log_rotate_cb_arg arg = {
-		.xlog = &xlog,
-		.dir = &vy_log.dir,
-		.vclock = vclock,
-	};
-	if (vy_recovery_iterate(recovery, vy_log_rotate_cb_func, &arg) < 0)
-		goto err_write_xlog;
-
-	if (!xlog_is_open(&xlog))
-		goto done; /* nothing written */
-
-	/* Mark the end of the snapshot. */
-	struct xrow_header row;
-	struct vy_log_record record;
-	vy_log_record_init(&record);
-	record.type = VY_LOG_SNAPSHOT;
-	if (vy_log_record_encode(&record, &row) < 0 ||
-	    xlog_write_row(&xlog, &row) < 0)
-		goto err_write_xlog;
-
-	/* Finalize the new xlog. */
-	if (xlog_flush(&xlog) < 0 ||
-	    xlog_sync(&xlog) < 0 ||
-	    xlog_rename(&xlog) < 0)
-		goto err_write_xlog;
-
-	xlog_close(&xlog, false);
-done:
-	say_verbose("done saving vylog");
-	return 0;
-
-err_write_xlog:
-	/* Delete the unfinished xlog. */
-	if (xlog_is_open(&xlog)) {
-		if (unlink(xlog.filename) < 0)
-			say_syserror("failed to delete file '%s'",
-				     xlog.filename);
-		xlog_close(&xlog, false);
-	}
-	return -1;
-}
-
 static ssize_t
 vy_log_rotate_f(va_list ap)
 {
@@ -1285,9 +1093,9 @@ vy_recovery_index_id_hash(uint32_t space_id, uint32_t index_id)
 }
 
 /** Lookup a vinyl index in vy_recovery::index_id_hash map. */
-static struct vy_index_recovery_info *
-vy_recovery_lookup_index_by_id(struct vy_recovery *recovery,
-			       uint32_t space_id, uint32_t index_id)
+struct vy_index_recovery_info *
+vy_recovery_lookup_index(struct vy_recovery *recovery,
+			 uint32_t space_id, uint32_t index_id)
 {
 	int64_t key = vy_recovery_index_id_hash(space_id, index_id);
 	struct mh_i64ptr_t *h = recovery->index_id_hash;
@@ -1412,6 +1220,7 @@ vy_recovery_create_index(struct vy_recovery *recovery, int64_t index_lsn,
 			free(index);
 			return -1;
 		}
+		rlist_add_entry(&recovery->indexes, index, in_recovery);
 	} else {
 		/*
 		 * The index was dropped and recreated with the
@@ -1583,6 +1392,7 @@ vy_recovery_do_create_run(struct vy_recovery *recovery, int64_t run_id)
 	run->gc_lsn = -1;
 	run->is_incomplete = false;
 	run->is_dropped = false;
+	run->data = NULL;
 	rlist_create(&run->in_index);
 	if (recovery->max_id < run_id)
 		recovery->max_id = run_id;
@@ -2024,6 +1834,7 @@ vy_recovery_new_f(va_list ap)
 		goto fail;
 	}
 
+	rlist_create(&recovery->indexes);
 	recovery->index_id_hash = NULL;
 	recovery->index_lsn_hash = NULL;
 	recovery->range_hash = NULL;
@@ -2176,9 +1987,23 @@ vy_recovery_delete(struct vy_recovery *recovery)
 	free(recovery);
 }
 
+/** Write a record to vylog. */
+static int
+vy_log_append_record(struct xlog *xlog, struct vy_log_record *record)
+{
+	say_verbose("save vylog record: %s", vy_log_record_str(record));
+
+	struct xrow_header row;
+	if (vy_log_record_encode(record, &row) < 0)
+		return -1;
+	if (xlog_write_row(xlog, &row) < 0)
+		return -1;
+	return 0;
+}
+
+/** Write all records corresponding to an index to vylog. */
 static int
-vy_recovery_iterate_index(struct vy_index_recovery_info *index,
-			  vy_recovery_cb cb, void *cb_arg)
+vy_log_append_index(struct xlog *xlog, struct vy_index_recovery_info *index)
 {
 	struct vy_range_recovery_info *range;
 	struct vy_slice_recovery_info *slice;
@@ -2192,7 +2017,7 @@ vy_recovery_iterate_index(struct vy_index_recovery_info *index,
 	record.space_id = index->space_id;
 	record.key_parts = index->key_parts;
 	record.key_part_count = index->key_part_count;
-	if (cb(&record, cb_arg) != 0)
+	if (vy_log_append_record(xlog, &record) != 0)
 		return -1;
 
 	if (index->truncate_count > 0) {
@@ -2200,7 +2025,7 @@ vy_recovery_iterate_index(struct vy_index_recovery_info *index,
 		record.type = VY_LOG_TRUNCATE_INDEX;
 		record.index_lsn = index->index_lsn;
 		record.truncate_count = index->truncate_count;
-		if (cb(&record, cb_arg) != 0)
+		if (vy_log_append_record(xlog, &record) != 0)
 			return -1;
 	}
 
@@ -2209,7 +2034,7 @@ vy_recovery_iterate_index(struct vy_index_recovery_info *index,
 		record.type = VY_LOG_DUMP_INDEX;
 		record.index_lsn = index->index_lsn;
 		record.dump_lsn = index->dump_lsn;
-		if (cb(&record, cb_arg) != 0)
+		if (vy_log_append_record(xlog, &record) != 0)
 			return -1;
 	}
 
@@ -2223,8 +2048,7 @@ vy_recovery_iterate_index(struct vy_index_recovery_info *index,
 		}
 		record.index_lsn = index->index_lsn;
 		record.run_id = run->id;
-		record.is_dropped = run->is_dropped;
-		if (cb(&record, cb_arg) != 0)
+		if (vy_log_append_record(xlog, &record) != 0)
 			return -1;
 
 		if (!run->is_dropped)
@@ -2234,7 +2058,7 @@ vy_recovery_iterate_index(struct vy_index_recovery_info *index,
 		record.type = VY_LOG_DROP_RUN;
 		record.run_id = run->id;
 		record.gc_lsn = run->gc_lsn;
-		if (cb(&record, cb_arg) != 0)
+		if (vy_log_append_record(xlog, &record) != 0)
 			return -1;
 	}
 
@@ -2245,7 +2069,7 @@ vy_recovery_iterate_index(struct vy_index_recovery_info *index,
 		record.range_id = range->id;
 		record.begin = range->begin;
 		record.end = range->end;
-		if (cb(&record, cb_arg) != 0)
+		if (vy_log_append_record(xlog, &record) != 0)
 			return -1;
 		/*
 		 * Newer slices are stored closer to the head of the list,
@@ -2260,7 +2084,7 @@ vy_recovery_iterate_index(struct vy_index_recovery_info *index,
 			record.run_id = slice->run->id;
 			record.begin = slice->begin;
 			record.end = slice->end;
-			if (cb(&record, cb_arg) != 0)
+			if (vy_log_append_record(xlog, &record) != 0)
 				return -1;
 		}
 	}
@@ -2269,16 +2093,25 @@ vy_recovery_iterate_index(struct vy_index_recovery_info *index,
 		vy_log_record_init(&record);
 		record.type = VY_LOG_DROP_INDEX;
 		record.index_lsn = index->index_lsn;
-		if (cb(&record, cb_arg) != 0)
+		if (vy_log_append_record(xlog, &record) != 0)
 			return -1;
 	}
 	return 0;
 }
 
-int
-vy_recovery_iterate(struct vy_recovery *recovery,
-		    vy_recovery_cb cb, void *cb_arg)
+/** Create vylog from a recovery context. */
+static int
+vy_log_create(const struct vclock *vclock, struct vy_recovery *recovery)
 {
+	say_verbose("saving vylog %lld", (long long)vclock_sum(vclock));
+
+	/*
+	 * Only create the log file if we have something
+	 * to write to it.
+	 */
+	struct xlog xlog;
+	xlog_clear(&xlog);
+
 	mh_int_t i;
 	mh_foreach(recovery->index_id_hash, i) {
 		struct vy_index_recovery_info *index;
@@ -2290,59 +2123,44 @@ vy_recovery_iterate(struct vy_recovery *recovery,
 		 */
 		if (index->is_dropped && rlist_empty(&index->runs))
 			continue;
-		if (vy_recovery_iterate_index(index, cb, cb_arg) < 0)
-			return -1;
+
+		/* Create the log file on the first write. */
+		if (!xlog_is_open(&xlog) &&
+		    xdir_create_xlog(&vy_log.dir, &xlog, vclock) != 0)
+			goto err_create_xlog;
+
+		if (vy_log_append_index(&xlog, index) != 0)
+			goto err_write_xlog;
 	}
+	if (!xlog_is_open(&xlog))
+		goto done; /* nothing written */
+
+	/* Mark the end of the snapshot. */
+	struct vy_log_record record;
+	vy_log_record_init(&record);
+	record.type = VY_LOG_SNAPSHOT;
+	if (vy_log_append_record(&xlog, &record) != 0)
+		goto err_write_xlog;
+
+	/* Finalize the new xlog. */
+	if (xlog_flush(&xlog) < 0 ||
+	    xlog_sync(&xlog) < 0 ||
+	    xlog_rename(&xlog) < 0)
+		goto err_write_xlog;
+
+	xlog_close(&xlog, false);
+done:
+	say_verbose("done saving vylog");
 	return 0;
-}
 
-int
-vy_recovery_load_index(struct vy_recovery *recovery,
-		       uint32_t space_id, uint32_t index_id,
-		       int64_t index_lsn, bool is_checkpoint_recovery,
-		       vy_recovery_cb cb, void *cb_arg)
-{
-	struct vy_index_recovery_info *index;
-	index = vy_recovery_lookup_index_by_id(recovery, space_id, index_id);
-	if (index == NULL)
-		return 0;
-	/* See the comment to the function declaration. */
-	if (index_lsn < index->index_lsn) {
-		/*
-		 * Loading a past incarnation of the index.
-		 * Emit create/drop records to indicate that
-		 * it is going to be dropped by a WAL statement
-		 * and hence doesn't need to be recovered.
-		 */
-		struct vy_log_record record;
-		vy_log_record_init(&record);
-		record.type = VY_LOG_CREATE_INDEX;
-		record.index_id = index->index_id;
-		record.space_id = index->space_id;
-		record.index_lsn = index_lsn;
-		if (cb(&record, cb_arg) != 0)
-			return -1;
-		vy_log_record_init(&record);
-		record.type = VY_LOG_DROP_INDEX;
-		record.index_lsn = index_lsn;
-		if (cb(&record, cb_arg) != 0)
-			return -1;
-		return 0;
-	} else if (is_checkpoint_recovery || index_lsn == index->index_lsn) {
-		/*
-		 * Loading the last incarnation of the index.
-		 * Replay all records we have recovered from
-		 * the log for this index.
-		 */
-		return vy_recovery_iterate_index(index, cb, cb_arg);
-	} else {
-		/*
-		 * The requested incarnation is missing in the recovery
-		 * context, because we failed to log it before restart.
-		 * Do nothing and let the caller retry logging.
-		 */
-		assert(!is_checkpoint_recovery);
-		assert(index_lsn > index->index_lsn);
-		return 0;
-	}
+err_write_xlog:
+	/* Delete the unfinished xlog. */
+	assert(xlog_is_open(&xlog));
+	if (unlink(xlog.filename) < 0)
+		say_syserror("failed to delete file '%s'",
+			     xlog.filename);
+	xlog_close(&xlog, false);
+
+err_create_xlog:
+	return -1;
 }
diff --git a/src/box/vy_log.h b/src/box/vy_log.h
index f17b122a..8fbacd0f 100644
--- a/src/box/vy_log.h
+++ b/src/box/vy_log.h
@@ -34,6 +34,7 @@
 #include <stdbool.h>
 #include <stdint.h>
 #include <string.h>
+#include <small/rlist.h>
 
 #include "salad/stailq.h"
 
@@ -58,8 +59,7 @@ struct xlog;
 struct vclock;
 struct key_def;
 struct key_part_def;
-
-struct vy_recovery;
+struct mh_i64ptr_t;
 
 /** Type of a metadata log record. */
 enum vy_log_record_type {
@@ -175,12 +175,6 @@ struct vy_log_record {
 	/** Unique ID of the run slice. */
 	int64_t slice_id;
 	/**
-	 * For VY_LOG_CREATE_RUN record: hint that the run
-	 * is dropped, i.e. there is a VY_LOG_DROP_RUN record
-	 * following this one.
-	 */
-	bool is_dropped;
-	/**
 	 * Msgpack key for start of the range/slice.
 	 * NULL if the range/slice starts from -inf.
 	 */
@@ -213,6 +207,127 @@ struct vy_log_record {
 	struct stailq_entry in_tx;
 };
 
+/** Recovery context. */
+struct vy_recovery {
+	/**
+	 * List of all indexes stored in the recovery context,
+	 * linked by vy_index_recovery_info::in_recovery.
+	 */
+	struct rlist indexes;
+	/** space_id, index_id -> vy_index_recovery_info. */
+	struct mh_i64ptr_t *index_id_hash;
+	/** index_lsn -> vy_index_recovery_info. */
+	struct mh_i64ptr_t *index_lsn_hash;
+	/** ID -> vy_range_recovery_info. */
+	struct mh_i64ptr_t *range_hash;
+	/** ID -> vy_run_recovery_info. */
+	struct mh_i64ptr_t *run_hash;
+	/** ID -> vy_slice_recovery_info. */
+	struct mh_i64ptr_t *slice_hash;
+	/**
+	 * Maximal vinyl object ID, according to the metadata log,
+	 * or -1 in case no vinyl objects were recovered.
+	 */
+	int64_t max_id;
+};
+
+/** Vinyl index info stored in a recovery context. */
+struct vy_index_recovery_info {
+	/** Link in vy_recovery::indexes. */
+	struct rlist in_recovery;
+	/** LSN of the index creation. */
+	int64_t index_lsn;
+	/** Ordinal index number in the space. */
+	uint32_t index_id;
+	/** Space ID. */
+	uint32_t space_id;
+	/** Array of key part definitions. */
+	struct key_part_def *key_parts;
+	/** Number of key parts. */
+	uint32_t key_part_count;
+	/** True if the index was dropped. */
+	bool is_dropped;
+	/** LSN of the last index dump. */
+	int64_t dump_lsn;
+	/** Truncate count. */
+	int64_t truncate_count;
+	/**
+	 * List of all ranges in the index, linked by
+	 * vy_range_recovery_info::in_index.
+	 */
+	struct rlist ranges;
+	/**
+	 * List of all runs created for the index
+	 * (both committed and not), linked by
+	 * vy_run_recovery_info::in_index.
+	 */
+	struct rlist runs;
+};
+
+/** Vinyl range info stored in a recovery context. */
+struct vy_range_recovery_info {
+	/** Link in vy_index_recovery_info::ranges. */
+	struct rlist in_index;
+	/** ID of the range. */
+	int64_t id;
+	/** Start of the range, stored in MsgPack array. */
+	char *begin;
+	/** End of the range, stored in MsgPack array. */
+	char *end;
+	/**
+	 * List of all slices in the range, linked by
+	 * vy_slice_recovery_info::in_range.
+	 *
+	 * Newer slices are closer to the head.
+	 */
+	struct rlist slices;
+};
+
+/** Run info stored in a recovery context. */
+struct vy_run_recovery_info {
+	/** Link in vy_index_recovery_info::runs. */
+	struct rlist in_index;
+	/** ID of the run. */
+	int64_t id;
+	/** Max LSN stored on disk. */
+	int64_t dump_lsn;
+	/**
+	 * For deleted runs: LSN of the last checkpoint
+	 * that uses this run.
+	 */
+	int64_t gc_lsn;
+	/**
+	 * True if the run was not committed (there's
+	 * VY_LOG_PREPARE_RUN, but no VY_LOG_CREATE_RUN).
+	 */
+	bool is_incomplete;
+	/** True if the run was dropped (VY_LOG_DROP_RUN). */
+	bool is_dropped;
+	/*
+	 * The following field is initialized to NULL and
+	 * ignored by vy_log subsystem. It may be used by
+	 * the caller to store some extra information.
+	 *
+	 * During recovery, we store a pointer to vy_run
+	 * corresponding to this object.
+	 */
+	void *data;
+};
+
+/** Slice info stored in a recovery context. */
+struct vy_slice_recovery_info {
+	/** Link in vy_range_recovery_info::slices. */
+	struct rlist in_range;
+	/** ID of the slice. */
+	int64_t id;
+	/** Run this slice was created for. */
+	struct vy_run_recovery_info *run;
+	/** Start of the slice, stored in MsgPack array. */
+	char *begin;
+	/** End of the slice, stored in MsgPack array. */
+	char *end;
+};
+
 /**
  * Initialize the metadata log.
  * @dir is the directory where log files are stored.
@@ -359,70 +474,14 @@ vy_recovery_new(int64_t signature, bool only_checkpoint);
 void
 vy_recovery_delete(struct vy_recovery *recovery);
 
-typedef int
-(*vy_recovery_cb)(const struct vy_log_record *record, void *arg);
-
-/**
- * Iterate over all objects stored in a recovery context.
- *
- * This function invokes callback @cb for each object (index, run, etc)
- * stored in the given recovery context. The callback is passed a record
- * used to log the object and optional argument @cb_arg. If the callback
- * returns a value different from 0, iteration stops and -1 is returned,
- * otherwise the function returns 0.
- *
- * To ease the work done by the callback, records corresponding to
- * slices of a range always go right after the range, in the
- * chronological order, while an index's runs go after the index
- * and before its ranges.
- */
-int
-vy_recovery_iterate(struct vy_recovery *recovery,
-		    vy_recovery_cb cb, void *cb_arg);
-
 /**
- * Load an index from a recovery context.
- *
- * Call @cb for each object related to the index. Break the loop and
- * return -1 if @cb returned a non-zero value, otherwise return 0.
- * Objects are loaded in the same order as by vy_recovery_iterate().
+ * Look up the last incarnation of an index stored in a recovery context.
  *
- * Note, this function returns 0 if there's no index with the requested
- * id in the recovery context. In this case, @cb isn't called at all.
- *
- * The @is_checkpoint_recovery flag indicates that the row that created
- * the index was loaded from a snapshot, in which case @index_lsn is
- * the snapshot signature. Otherwise @index_lsn is the LSN of the WAL
- * row that created the index.
- *
- * The index is looked up by @space_id and @index_id while @index_lsn
- * is used to discern different incarnations of the same index as
- * follows. Let @record denote the vylog record corresponding to the
- * last incarnation of the index. Then
- *
- * - If @is_checkpoint_recovery is set and @index_lsn >= @record->index_lsn,
- *   the last index incarnation was created before the snapshot and we
- *   need to load it right now.
- *
- * - If @is_checkpoint_recovery is set and @index_lsn < @record->index_lsn,
- *   the last index incarnation was created after the snapshot, i.e.
- *   the index loaded now is going to be dropped so load a dummy.
- *
- * - If @is_checkpoint_recovery is unset and @index_lsn < @record->index_lsn,
- *   the last index incarnation is created further in WAL, load a dummy.
- *
- * - If @is_checkpoint_recovery is unset and @index_lsn == @record->index_lsn,
- *   load the last index incarnation.
- *
- * - If @is_checkpoint_recovery is unset and @index_lsn > @record->index_lsn,
- *   it seems we failed to log index creation before restart. In this
- *   case don't do anything. The caller is supposed to retry logging.
+ * Returns NULL if the index was not found.
  */
-int
-vy_recovery_load_index(struct vy_recovery *recovery,
-		       uint32_t space_id, uint32_t index_id,
-		       int64_t index_lsn, bool is_checkpoint_recovery,
-		       vy_recovery_cb cb, void *cb_arg);
+struct vy_index_recovery_info *
+vy_recovery_lookup_index(struct vy_recovery *recovery,
+			 uint32_t space_id, uint32_t index_id);
 
 /**
  * Initialize a log record with default values.
diff --git a/test/unit/vy_log_stub.c b/test/unit/vy_log_stub.c
index daabf3f9..1fda0a6b 100644
--- a/test/unit/vy_log_stub.c
+++ b/test/unit/vy_log_stub.c
@@ -51,11 +51,9 @@ vy_log_tx_commit(void)
 void
 vy_log_write(const struct vy_log_record *record) {}
 
-int
-vy_recovery_load_index(struct vy_recovery *recovery,
-		       uint32_t space_id, uint32_t index_id,
-		       int64_t index_lsn, bool snapshot_recovery,
-		       vy_recovery_cb cb, void *cb_arg)
+struct vy_index_recovery_info *
+vy_recovery_lookup_index(struct vy_recovery *recovery,
+			 uint32_t space_id, uint32_t index_id)
 {
 	unreachable();
 }
diff --git a/test/vinyl/layout.result b/test/vinyl/layout.result
index 5c78babf..603d2865 100644
--- a/test/vinyl/layout.result
+++ b/test/vinyl/layout.result
@@ -189,12 +189,12 @@ result
           timestamp: <timestamp>
           type: INSERT
         BODY:
-          tuple: [7, {2: 3}]
+          tuple: [7, {2: 2}]
       - HEADER:
           timestamp: <timestamp>
           type: INSERT
         BODY:
-          tuple: [7, {2: 2}]
+          tuple: [7, {2: 3}]
       - HEADER:
           timestamp: <timestamp>
           type: INSERT
-- 
2.11.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 2/4] vinyl: do not use index lsn to identify indexes in vylog
  2018-03-17 17:56 [PATCH 0/4] Prepare vylog for space alter Vladimir Davydov
  2018-03-17 17:56 ` [PATCH 1/4] vinyl: refactor vylog recovery Vladimir Davydov
@ 2018-03-17 17:56 ` Vladimir Davydov
  2018-03-19 15:01   ` Vladimir Davydov
  2018-03-17 17:56 ` [PATCH 3/4] index: remove lsn from commit_create arguments Vladimir Davydov
  2018-03-17 17:56 ` [PATCH 4/4] alter: rewrite space truncation using alter infrastructure Vladimir Davydov
  3 siblings, 1 reply; 6+ messages in thread
From: Vladimir Davydov @ 2018-03-17 17:56 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

An index can be dropped and then recreated with the same space/index id.
To discriminate between different incarnations of the same index during
recovery, we use LSN from the time of index creation, as it is supposed
to be unique. However, the uniqueness property won't hold once ALTER is
implemented for vinyl spaces: the problem is we have to rebuild all
indexes of a space and commit them simultaneously in case the primary
index is ALTERED.

Actually, we don't really need a unique index ID stored both in vylog
and in WAL to recover all incarnations of vinyl indexes. Since WAL is
replayed in the chronological order, we can find the index corresponding
to index creation statement replayed from WAL by its space_id/index_id
and the number of times the index has been dropped since we started to
replay WAL. As we don't actually recover indexes that are dropped during
WAL recovery, we don't need to know their properties, we just need to
know that they existed. So we don't need to store a separate object for
each index incarnation in the recovery context, it's enough to add a
counter representing the number of incarnations the index has had since
the last checkpoint to the vy_index_recovery_info structure to be able
to recover all incarnations of the same vinyl index. Then on recovery,
vy_index_recover() would check the incarnation counter of the index and

 - recover the last incarnation if it is 1
 - emit create/drop record if it is > 1
 - do nothing if it is 0

and decrement the counter on success so that the next call to
vy_index_recover() with the same space and index id proceeds to
the next incarnation of this index. And this is what this patch
does in a nutshell.

The tricky part is preserving backward compatibility. We can't just
replace index_lsn with space_id/index_id everywhere, because there
still can be records that use index_lsn to identify an index and we
need to process them. For example, VY_LOG_CREATE_INDEX is fine in
this regard, because it contains both space_id/index_id and index_lsn,
but VY_LOG_DROP_INDEX isn't, as it only has index_lsn. So we have to
maintain two index hash tables during vylog recovery instead of just
one: one for new records where index is referenced by space_id/index_id
and another for legacy records where index_lsn is used instead.
---
 src/box/vinyl.c          |  39 ++-----
 src/box/vy_index.c       |  54 ++++-----
 src/box/vy_index.h       |  12 +-
 src/box/vy_log.c         | 286 ++++++++++++++++++++++++++++-------------------
 src/box/vy_log.h         |  97 +++++++++++-----
 src/box/vy_scheduler.c   |  12 +-
 test/unit/vy_log_stub.c  |   2 +-
 test/vinyl/layout.result |  32 +++---
 8 files changed, 300 insertions(+), 234 deletions(-)

diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index d3659b0b..244360fa 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -765,7 +765,6 @@ vy_index_open(struct vy_env *env, struct vy_index *index)
 		 * the index files from it.
 		 */
 		rc = vy_index_recover(index, env->recovery, &env->run_env,
-				vclock_sum(env->recovery_vclock),
 				env->status == VINYL_INITIAL_RECOVERY_LOCAL,
 				env->force_recovery);
 		break;
@@ -778,6 +777,8 @@ vy_index_open(struct vy_env *env, struct vy_index *index)
 static void
 vinyl_index_commit_create(struct index *base, int64_t lsn)
 {
+	(void)lsn;
+
 	struct vy_env *env = vy_env(base->engine);
 	struct vy_index *index = vy_index(base);
 
@@ -791,32 +792,13 @@ vinyl_index_commit_create(struct index *base, int64_t lsn)
 		 * the index isn't in the recovery context and we
 		 * need to retry to log it now.
 		 */
-		if (index->commit_lsn >= 0) {
+		if (index->is_committed) {
 			vy_scheduler_add_index(&env->scheduler, index);
 			return;
 		}
 	}
 
-	if (env->status == VINYL_INITIAL_RECOVERY_REMOTE) {
-		/*
-		 * Records received during initial join do not
-		 * have LSNs so we use a fake one to identify
-		 * the index in vylog.
-		 */
-		lsn = ++env->join_lsn;
-	}
-
-	/*
-	 * Backward compatibility fixup: historically, we used
-	 * box.info.signature for LSN of index creation, which
-	 * lags behind the LSN of the record that created the
-	 * index by 1. So for legacy indexes use the LSN from
-	 * index options.
-	 */
-	if (index->opts.lsn != 0)
-		lsn = index->opts.lsn;
-
-	index->commit_lsn = lsn;
+	index->is_committed = true;
 
 	assert(index->range_count == 1);
 	struct vy_range *range = vy_range_tree_first(index->tree);
@@ -830,9 +812,8 @@ vinyl_index_commit_create(struct index *base, int64_t lsn)
 	 * recovery.
 	 */
 	vy_log_tx_begin();
-	vy_log_create_index(index->commit_lsn, index->id,
-			    index->space_id, index->key_def);
-	vy_log_insert_range(index->commit_lsn, range->id, NULL, NULL);
+	vy_log_create_index(index->space_id, index->id, index->key_def);
+	vy_log_insert_range(index->space_id, index->id, range->id, NULL, NULL);
 	vy_log_tx_try_commit();
 	/*
 	 * After we committed the index in the log, we can schedule
@@ -889,7 +870,7 @@ vinyl_index_commit_drop(struct index *base)
 
 	vy_log_tx_begin();
 	vy_log_index_prune(index, checkpoint_last(NULL));
-	vy_log_drop_index(index->commit_lsn);
+	vy_log_drop_index(index->space_id, index->id);
 	vy_log_tx_try_commit();
 }
 
@@ -938,7 +919,7 @@ vinyl_space_prepare_truncate(struct space *old_space, struct space *new_space)
 		struct vy_index *old_index = vy_index(old_space->index[i]);
 		struct vy_index *new_index = vy_index(new_space->index[i]);
 
-		new_index->commit_lsn = old_index->commit_lsn;
+		new_index->is_committed = old_index->is_committed;
 
 		if (truncate_done) {
 			/*
@@ -1015,9 +996,9 @@ vinyl_space_commit_truncate(struct space *old_space, struct space *new_space)
 		assert(new_index->range_count == 1);
 
 		vy_log_index_prune(old_index, gc_lsn);
-		vy_log_insert_range(new_index->commit_lsn,
+		vy_log_insert_range(new_index->space_id, new_index->id,
 				    range->id, NULL, NULL);
-		vy_log_truncate_index(new_index->commit_lsn,
+		vy_log_truncate_index(new_index->space_id, new_index->id,
 				      new_index->truncate_count);
 	}
 	vy_log_tx_try_commit();
diff --git a/src/box/vy_index.c b/src/box/vy_index.c
index 9c199ddd..5a994c47 100644
--- a/src/box/vy_index.c
+++ b/src/box/vy_index.c
@@ -221,7 +221,6 @@ vy_index_new(struct vy_index_env *index_env, struct vy_cache_env *cache_env,
 		goto fail_mem;
 
 	index->refs = 1;
-	index->commit_lsn = -1;
 	index->dump_lsn = -1;
 	vy_cache_create(&index->cache, cache_env, cmp_def);
 	rlist_create(&index->sealed);
@@ -533,29 +532,26 @@ out:
 
 int
 vy_index_recover(struct vy_index *index, struct vy_recovery *recovery,
-		 struct vy_run_env *run_env, int64_t lsn,
+		 struct vy_run_env *run_env,
 		 bool is_checkpoint_recovery, bool force_recovery)
 {
 	assert(index->range_count == 0);
 
 	/*
-	 * Backward compatibility fixup: historically, we used
-	 * box.info.signature for LSN of index creation, which
-	 * lags behind the LSN of the record that created the
-	 * index by 1. So for legacy indexes use the LSN from
-	 * index options.
-	 */
-	if (index->opts.lsn != 0)
-		lsn = index->opts.lsn;
-
-	/*
 	 * Look up the last incarnation of the index in vylog.
 	 */
+	struct vy_index_recovery_id index_id = {
+		.index_id = index->id,
+		.space_id = index->space_id,
+	};
 	struct vy_index_recovery_info *index_info;
-	index_info = vy_recovery_lookup_index(recovery,
-					      index->space_id, index->id);
-	if (is_checkpoint_recovery) {
-		if (index_info == NULL) {
+	index_info = vy_recovery_lookup_index(recovery, &index_id);
+	if (index_info == NULL || index_info->incarnation_count == 0) {
+		/*
+		 * The index was either not found in vylog or
+		 * all incarnations have already been loaded.
+		 */
+		if (is_checkpoint_recovery) {
 			/*
 			 * All indexes created from snapshot rows must
 			 * be present in vylog, because snapshot can
@@ -568,16 +564,6 @@ vy_index_recover(struct vy_index *index, struct vy_recovery *recovery,
 					    (unsigned)index->id));
 			return -1;
 		}
-		if (lsn > index_info->index_lsn) {
-			/*
-			 * The last incarnation of the index was created
-			 * before the last checkpoint, load it now.
-			 */
-			lsn = index_info->index_lsn;
-		}
-	}
-
-	if (index_info == NULL || lsn > index_info->index_lsn) {
 		/*
 		 * If we failed to log index creation before restart,
 		 * we won't find it in the log on recovery. This is
@@ -588,9 +574,12 @@ vy_index_recover(struct vy_index *index, struct vy_recovery *recovery,
 		return vy_index_init_range_tree(index);
 	}
 
-	index->commit_lsn = lsn;
+	index->is_committed = true;
+
+	/* Advance to the next incarnation. */
+	index_info->incarnation_count--;
 
-	if (lsn < index_info->index_lsn || index_info->is_dropped) {
+	if (index_info->incarnation_count > 0 || index_info->is_dropped) {
 		/*
 		 * Loading a past incarnation of the index, i.e.
 		 * the index is going to dropped during final
@@ -671,8 +660,9 @@ vy_index_recover(struct vy_index *index, struct vy_recovery *recovery,
 	}
 	if (prev == NULL) {
 		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
-			 tt_sprintf("Index %lld has empty range tree",
-				    (long long)index->commit_lsn));
+			 tt_sprintf("Index %u/%u has empty range tree",
+				    (unsigned)index->space_id,
+				    (unsigned)index->id));
 		return -1;
 	}
 	if (prev->end != NULL) {
@@ -1049,7 +1039,7 @@ vy_index_split_range(struct vy_index *index, struct vy_range *range)
 	vy_log_delete_range(range->id);
 	for (int i = 0; i < n_parts; i++) {
 		part = parts[i];
-		vy_log_insert_range(index->commit_lsn, part->id,
+		vy_log_insert_range(index->space_id, index->id, part->id,
 				    tuple_data_or_null(part->begin),
 				    tuple_data_or_null(part->end));
 		rlist_foreach_entry(slice, &part->slices, in_range)
@@ -1115,7 +1105,7 @@ vy_index_coalesce_range(struct vy_index *index, struct vy_range *range)
 	 * Log change in metadata.
 	 */
 	vy_log_tx_begin();
-	vy_log_insert_range(index->commit_lsn, result->id,
+	vy_log_insert_range(index->space_id, index->id, result->id,
 			    tuple_data_or_null(result->begin),
 			    tuple_data_or_null(result->end));
 	for (it = first; it != end; it = vy_range_tree_next(index->tree, it)) {
diff --git a/src/box/vy_index.h b/src/box/vy_index.h
index 4368f6a5..5a4aa29b 100644
--- a/src/box/vy_index.h
+++ b/src/box/vy_index.h
@@ -254,10 +254,10 @@ struct vy_index {
 	 */
 	int64_t dump_lsn;
 	/**
-	 * LSN of the row that committed the index or -1 if
-	 * the index was not committed to the metadata log.
+	 * This flag is set if the index creation was
+	 * committed to the metadata log.
 	 */
-	int64_t commit_lsn;
+	bool is_committed;
 	/**
 	 * This flag is set if the index was dropped.
 	 * It is also set on local recovery if the index
@@ -381,14 +381,10 @@ vy_index_create(struct vy_index *index);
  * WAL, in which case a missing index is OK - it just means we
  * failed to log it before restart and have to retry during
  * WAL replay.
- *
- * @lsn is the LSN of the row that created the index.
- * If the index is recovered from a snapshot, it is set
- * to the snapshot signature.
  */
 int
 vy_index_recover(struct vy_index *index, struct vy_recovery *recovery,
-		 struct vy_run_env *run_env, int64_t lsn,
+		 struct vy_run_env *run_env,
 		 bool is_checkpoint_recovery, bool force_recovery);
 
 /**
diff --git a/src/box/vy_log.c b/src/box/vy_log.c
index 8b95282b..cbd6dc16 100644
--- a/src/box/vy_log.c
+++ b/src/box/vy_log.c
@@ -304,11 +304,6 @@ vy_log_record_encode(const struct vy_log_record *record,
 	size += mp_sizeof_array(2);
 	size += mp_sizeof_uint(record->type);
 	size_t n_keys = 0;
-	if (record->index_lsn > 0) {
-		size += mp_sizeof_uint(VY_LOG_KEY_INDEX_LSN);
-		size += mp_sizeof_uint(record->index_lsn);
-		n_keys++;
-	}
 	if (record->range_id > 0) {
 		size += mp_sizeof_uint(VY_LOG_KEY_RANGE_ID);
 		size += mp_sizeof_uint(record->range_id);
@@ -386,10 +381,6 @@ vy_log_record_encode(const struct vy_log_record *record,
 	pos = mp_encode_array(pos, 2);
 	pos = mp_encode_uint(pos, record->type);
 	pos = mp_encode_map(pos, n_keys);
-	if (record->index_lsn > 0) {
-		pos = mp_encode_uint(pos, VY_LOG_KEY_INDEX_LSN);
-		pos = mp_encode_uint(pos, record->index_lsn);
-	}
 	if (record->range_id > 0) {
 		pos = mp_encode_uint(pos, VY_LOG_KEY_RANGE_ID);
 		pos = mp_encode_uint(pos, record->range_id);
@@ -1087,30 +1078,70 @@ vy_log_write(const struct vy_log_record *record)
  * vy_recovery::index_id_hash map.
  */
 static inline int64_t
-vy_recovery_index_id_hash(uint32_t space_id, uint32_t index_id)
+vy_index_recovery_id_hash(const struct vy_index_recovery_id *id)
 {
-	return ((uint64_t)space_id << 32) + index_id;
+	return ((uint64_t)id->space_id << 32) + id->index_id;
 }
 
-/** Lookup a vinyl index in vy_recovery::index_id_hash map. */
-struct vy_index_recovery_info *
-vy_recovery_lookup_index(struct vy_recovery *recovery,
-			 uint32_t space_id, uint32_t index_id)
+/** An snprint-style function to print an index id. */
+static int
+vy_index_recovery_id_snprint(char *buf, int size,
+			     const struct vy_index_recovery_id *id)
 {
-	int64_t key = vy_recovery_index_id_hash(space_id, index_id);
-	struct mh_i64ptr_t *h = recovery->index_id_hash;
-	mh_int_t k = mh_i64ptr_find(h, key, NULL);
-	if (k == mh_end(h))
-		return NULL;
-	return mh_i64ptr_node(h, k)->val;
+	int total = 0;
+	if (id->index_lsn != 0) {
+		/*
+		 * Legacy code: index_lsn is used as index ID.
+		 * Print the lsn and space_id/index_id if available.
+		 */
+		SNPRINT(total, snprintf, buf, size, "%lld",
+			(long long)id->index_lsn);
+		if (id->space_id != 0)
+			SNPRINT(total, snprintf, buf, size, " (%u/%u)",
+				(unsigned)id->space_id, (unsigned)id->index_id);
+	} else {
+		SNPRINT(total, snprintf, buf, size, "%u/%u",
+			(unsigned)id->space_id, (unsigned)id->index_id);
+	}
+	return total;
+}
+
+/**
+ * Return a string containing a human readable representation
+ * of an index id.
+ */
+static const char *
+vy_index_recovery_id_str(const struct vy_index_recovery_id *id)
+{
+	char *buf = tt_static_buf();
+	vy_index_recovery_id_snprint(buf, TT_STATIC_BUF_LEN, id);
+	return buf;
 }
 
-/** Lookup a vinyl index in vy_recovery::index_lsn_hash map. */
-static struct vy_index_recovery_info *
-vy_recovery_lookup_index_by_lsn(struct vy_recovery *recovery, int64_t index_lsn)
+/**
+ * Lookup a vinyl index in a recovery context.
+ *
+ * If the index id contains LSN (legacy), we look it up
+ * in index_lsn_hash, otherwise in index_id_hash.
+ */
+struct vy_index_recovery_info *
+vy_recovery_lookup_index(struct vy_recovery *recovery,
+			 const struct vy_index_recovery_id *id)
 {
-	struct mh_i64ptr_t *h = recovery->index_lsn_hash;
-	mh_int_t k = mh_i64ptr_find(h, index_lsn, NULL);
+	int64_t key;
+	struct mh_i64ptr_t *h;
+	if (id->index_lsn != 0) {
+		/*
+		 * Legacy code: index_lsn is used as index ID.
+		 * Look up the index in the corresponding hash.
+		 */
+		key = id->index_lsn;
+		h = recovery->index_lsn_hash;
+	} else {
+		key = vy_index_recovery_id_hash(id);
+		h = recovery->index_id_hash;
+	}
+	mh_int_t k = mh_i64ptr_find(h, key, NULL);
 	if (k == mh_end(h))
 		return NULL;
 	return mh_i64ptr_node(h, k)->val;
@@ -1151,13 +1182,13 @@ vy_recovery_lookup_slice(struct vy_recovery *recovery, int64_t slice_id)
 
 /**
  * Handle a VY_LOG_CREATE_INDEX log record.
- * This function allocates a new vinyl index with ID @index_lsn
+ * This function allocates a new vinyl index with ID @id
  * and inserts it to the hash.
  * Return 0 on success, -1 on failure (ID collision or OOM).
  */
 static int
-vy_recovery_create_index(struct vy_recovery *recovery, int64_t index_lsn,
-			 uint32_t index_id, uint32_t space_id,
+vy_recovery_create_index(struct vy_recovery *recovery,
+			 const struct vy_index_recovery_id *id,
 			 const struct key_part_def *key_parts,
 			 uint32_t key_part_count)
 {
@@ -1173,8 +1204,8 @@ vy_recovery_create_index(struct vy_recovery *recovery, int64_t index_lsn,
 	 */
 	if (key_parts == NULL) {
 		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
-			 tt_sprintf("Missing key definition for index %lld",
-				    (long long)index_lsn));
+			 tt_sprintf("Missing key definition for index %s",
+				    vy_index_recovery_id_str(id)));
 		return -1;
 	}
 	key_parts_copy = malloc(sizeof(*key_parts) * key_part_count);
@@ -1189,7 +1220,7 @@ vy_recovery_create_index(struct vy_recovery *recovery, int64_t index_lsn,
 	 * Look up the index in the hash.
 	 */
 	h = recovery->index_id_hash;
-	node.key = vy_recovery_index_id_hash(space_id, index_id);
+	node.key = vy_index_recovery_id_hash(id);
 	k = mh_i64ptr_find(h, node.key, NULL);
 	index = (k != mh_end(h)) ? mh_i64ptr_node(h, k)->val : NULL;
 
@@ -1207,10 +1238,11 @@ vy_recovery_create_index(struct vy_recovery *recovery, int64_t index_lsn,
 			free(key_parts_copy);
 			return -1;
 		}
-		index->index_id = index_id;
-		index->space_id = space_id;
+		index->index_id = id->index_id;
+		index->space_id = id->space_id;
 		rlist_create(&index->ranges);
 		rlist_create(&index->runs);
+		index->incarnation_count = 1;
 
 		node.val = index;
 		if (mh_i64ptr_put(h, &node, NULL, NULL) == mh_end(h)) {
@@ -1226,112 +1258,124 @@ vy_recovery_create_index(struct vy_recovery *recovery, int64_t index_lsn,
 		 * The index was dropped and recreated with the
 		 * same ID. Update its key definition (because it
 		 * could have changed since the last time it was
-		 * used) and reset its state.
+		 * used), reset its state, and bump the incarnation
+		 * counter.
 		 */
 		if (!index->is_dropped) {
 			diag_set(ClientError, ER_INVALID_VYLOG_FILE,
-				 tt_sprintf("Index %u/%u created twice",
-					    (unsigned)space_id,
-					    (unsigned)index_id));
+				 tt_sprintf("Index %s created twice",
+					    vy_index_recovery_id_str(id)));
 			free(key_parts_copy);
 			return -1;
 		}
-		assert(index->index_id == index_id);
-		assert(index->space_id == space_id);
+		assert(index->index_id == id->index_id);
+		assert(index->space_id == id->space_id);
+		index->incarnation_count++;
 		free(index->key_parts);
 	}
 
-	index->index_lsn = index_lsn;
 	index->key_parts = key_parts_copy;
 	index->key_part_count = key_part_count;
 	index->is_dropped = false;
 	index->dump_lsn = -1;
 	index->truncate_count = 0;
 
-	/*
-	 * Add the index to the LSN hash.
-	 */
-	h = recovery->index_lsn_hash;
-	node.key = index_lsn;
-	node.val = index;
-	if (mh_i64ptr_find(h, index_lsn, NULL) != mh_end(h)) {
-		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
-			 tt_sprintf("Duplicate index id %lld",
-				    (long long)index_lsn));
-		return -1;
-	}
-	if (mh_i64ptr_put(h, &node, NULL, NULL) == mh_end(h)) {
-		diag_set(OutOfMemory, 0, "mh_i64ptr_put",
-			 "mh_i64ptr_node_t");
-		return -1;
+	if (id->index_lsn != 0) {
+		/*
+		 * Legacy code: index_lsn is used as index ID.
+		 * Add the index to the corresponding hash.
+		 */
+		h = recovery->index_lsn_hash;
+		node.key = id->index_lsn;
+		node.val = index;
+		if (mh_i64ptr_find(h, id->index_lsn, NULL) != mh_end(h)) {
+			diag_set(ClientError, ER_INVALID_VYLOG_FILE,
+				 tt_sprintf("Duplicate index id %s",
+					    vy_index_recovery_id_str(id)));
+			return -1;
+		}
+		if (mh_i64ptr_put(h, &node, NULL, NULL) == mh_end(h)) {
+			diag_set(OutOfMemory, 0, "mh_i64ptr_put",
+				 "mh_i64ptr_node_t");
+			return -1;
+		}
 	}
 	return 0;
 }
 
 /**
  * Handle a VY_LOG_DROP_INDEX log record.
- * This function marks the vinyl index with ID @index_lsn as dropped.
+ * This function marks the vinyl index with ID @id as dropped.
  * All ranges and runs of the index must have been deleted by now.
  * Returns 0 on success, -1 if ID not found or index is already marked.
  */
 static int
-vy_recovery_drop_index(struct vy_recovery *recovery, int64_t index_lsn)
+vy_recovery_drop_index(struct vy_recovery *recovery,
+		       const struct vy_index_recovery_id *id)
 {
 	struct vy_index_recovery_info *index;
-	index = vy_recovery_lookup_index_by_lsn(recovery, index_lsn);
+	index = vy_recovery_lookup_index(recovery, id);
 	if (index == NULL) {
 		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
-			 tt_sprintf("Index %lld deleted but not registered",
-				    (long long)index_lsn));
+			 tt_sprintf("Index %s deleted but not registered",
+				    vy_index_recovery_id_str(id)));
 		return -1;
 	}
 	if (index->is_dropped) {
 		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
-			 tt_sprintf("Index %lld deleted twice",
-				    (long long)index_lsn));
+			 tt_sprintf("Index %s deleted twice",
+				    vy_index_recovery_id_str(id)));
 		return -1;
 	}
 	if (!rlist_empty(&index->ranges)) {
 		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
-			 tt_sprintf("Dropped index %lld has ranges",
-				    (long long)index_lsn));
+			 tt_sprintf("Dropped index %s has ranges",
+				    vy_index_recovery_id_str(id)));
 		return -1;
 	}
 	struct vy_run_recovery_info *run;
 	rlist_foreach_entry(run, &index->runs, in_index) {
 		if (!run->is_dropped && !run->is_incomplete) {
 			diag_set(ClientError, ER_INVALID_VYLOG_FILE,
-				 tt_sprintf("Dropped index %lld has active "
-					    "runs", (long long)index_lsn));
+				 tt_sprintf("Dropped index %s has active runs",
+					    vy_index_recovery_id_str(id)));
 			return -1;
 		}
 	}
 	index->is_dropped = true;
+	if (!recovery->seen_snapshot) {
+		/*
+		 * The index was dropped before the last checkpoint.
+		 * Reset the incarnation counter, because we won't
+		 * load this incarnation during WAL recovery.
+		 */
+		index->incarnation_count = 0;
+	}
 	return 0;
 }
 
 /**
  * Handle a VY_LOG_DUMP_INDEX log record.
  * This function updates LSN of the last dump of the vinyl index
- * with ID @index_lsn.
+ * with ID @id.
  * Returns 0 on success, -1 if ID not found or index is dropped.
  */
 static int
 vy_recovery_dump_index(struct vy_recovery *recovery,
-		       int64_t index_lsn, int64_t dump_lsn)
+		       const struct vy_index_recovery_id *id, int64_t dump_lsn)
 {
 	struct vy_index_recovery_info *index;
-	index = vy_recovery_lookup_index_by_lsn(recovery, index_lsn);
+	index = vy_recovery_lookup_index(recovery, id);
 	if (index == NULL) {
 		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
-			 tt_sprintf("Dump of unregistered index %lld",
-				    (long long)index_lsn));
+			 tt_sprintf("Dump of unregistered index %s",
+				    vy_index_recovery_id_str(id)));
 		return -1;
 	}
 	if (index->is_dropped) {
 		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
-			 tt_sprintf("Dump of deleted index %lld",
-				    (long long)index_lsn));
+			 tt_sprintf("Dump of deleted index %s",
+				    vy_index_recovery_id_str(id)));
 		return -1;
 	}
 	index->dump_lsn = dump_lsn;
@@ -1340,25 +1384,26 @@ vy_recovery_dump_index(struct vy_recovery *recovery,
 
 /**
  * Handle a VY_LOG_TRUNCATE_INDEX log record.
- * This function updates truncate_count of the index with ID @index_lsn.
+ * This function updates truncate_count of the index with ID @id.
  * Returns 0 on success, -1 if ID not found or index is dropped.
  */
 static int
 vy_recovery_truncate_index(struct vy_recovery *recovery,
-			   int64_t index_lsn, int64_t truncate_count)
+			   const struct vy_index_recovery_id *id,
+			   int64_t truncate_count)
 {
 	struct vy_index_recovery_info *index;
-	index = vy_recovery_lookup_index_by_lsn(recovery, index_lsn);
+	index = vy_recovery_lookup_index(recovery, id);
 	if (index == NULL) {
 		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
-			 tt_sprintf("Truncation of unregistered index %lld",
-				    (long long)index_lsn));
+			 tt_sprintf("Truncation of unregistered index %s",
+				    vy_index_recovery_id_str(id)));
 		return -1;
 	}
 	if (index->is_dropped) {
 		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
-			 tt_sprintf("Truncation of deleted index %lld",
-				    (long long)index_lsn));
+			 tt_sprintf("Truncation of deleted index %s",
+				    vy_index_recovery_id_str(id)));
 		return -1;
 	}
 	index->truncate_count = truncate_count;
@@ -1402,21 +1447,21 @@ vy_recovery_do_create_run(struct vy_recovery *recovery, int64_t run_id)
 /**
  * Handle a VY_LOG_PREPARE_RUN log record.
  * This function creates a new incomplete vinyl run with ID @run_id
- * and adds it to the list of runs of the index with ID @index_lsn.
+ * and adds it to the list of runs of the index with ID @id.
  * Return 0 on success, -1 if run already exists, index not found,
  * or OOM.
  */
 static int
-vy_recovery_prepare_run(struct vy_recovery *recovery, int64_t index_lsn,
-			int64_t run_id)
+vy_recovery_prepare_run(struct vy_recovery *recovery,
+			const struct vy_index_recovery_id *id, int64_t run_id)
 {
 	struct vy_index_recovery_info *index;
-	index = vy_recovery_lookup_index_by_lsn(recovery, index_lsn);
+	index = vy_recovery_lookup_index(recovery, id);
 	if (index == NULL) {
 		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
 			 tt_sprintf("Run %lld created for unregistered "
-				    "index %lld", (long long)run_id,
-				    (long long)index_lsn));
+				    "index %s", (long long)run_id,
+				    vy_index_recovery_id_str(id)));
 		return -1;
 	}
 	if (vy_recovery_lookup_run(recovery, run_id) != NULL) {
@@ -1437,29 +1482,30 @@ vy_recovery_prepare_run(struct vy_recovery *recovery, int64_t index_lsn,
 /**
  * Handle a VY_LOG_CREATE_RUN log record.
  * This function adds the vinyl run with ID @run_id to the list
- * of runs of the index with ID @index_lsn and marks it committed.
+ * of runs of the index with ID @id and marks it committed.
  * If the run does not exist, it will be created.
  * Return 0 on success, -1 if index not found, run or index
  * is dropped, or OOM.
  */
 static int
-vy_recovery_create_run(struct vy_recovery *recovery, int64_t index_lsn,
+vy_recovery_create_run(struct vy_recovery *recovery,
+		       const struct vy_index_recovery_id *id,
 		       int64_t run_id, int64_t dump_lsn)
 {
 	struct vy_index_recovery_info *index;
-	index = vy_recovery_lookup_index_by_lsn(recovery, index_lsn);
+	index = vy_recovery_lookup_index(recovery, id);
 	if (index == NULL) {
 		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
 			 tt_sprintf("Run %lld created for unregistered "
-				    "index %lld", (long long)run_id,
-				    (long long)index_lsn));
+				    "index %s", (long long)run_id,
+				    vy_index_recovery_id_str(id)));
 		return -1;
 	}
 	if (index->is_dropped) {
 		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
 			 tt_sprintf("Run %lld created for deleted "
-				    "index %lld", (long long)run_id,
-				    (long long)index_lsn));
+				    "index %s", (long long)run_id,
+				    vy_index_recovery_id_str(id)));
 		return -1;
 	}
 	struct vy_run_recovery_info *run;
@@ -1538,11 +1584,12 @@ vy_recovery_forget_run(struct vy_recovery *recovery, int64_t run_id)
  * Handle a VY_LOG_INSERT_RANGE log record.
  * This function allocates a new vinyl range with ID @range_id,
  * inserts it to the hash, and adds it to the list of ranges of the
- * index with ID @index_lsn.
+ * index with ID @id.
  * Return 0 on success, -1 on failure (ID collision or OOM).
  */
 static int
-vy_recovery_insert_range(struct vy_recovery *recovery, int64_t index_lsn,
+vy_recovery_insert_range(struct vy_recovery *recovery,
+			 const struct vy_index_recovery_id *id,
 			 int64_t range_id, const char *begin, const char *end)
 {
 	if (vy_recovery_lookup_range(recovery, range_id) != NULL) {
@@ -1552,12 +1599,12 @@ vy_recovery_insert_range(struct vy_recovery *recovery, int64_t index_lsn,
 		return -1;
 	}
 	struct vy_index_recovery_info *index;
-	index = vy_recovery_lookup_index_by_lsn(recovery, index_lsn);
+	index = vy_recovery_lookup_index(recovery, id);
 	if (index == NULL) {
 		diag_set(ClientError, ER_INVALID_VYLOG_FILE,
 			 tt_sprintf("Range %lld created for unregistered "
-				    "index %lld", (long long)range_id,
-				    (long long)index_lsn));
+				    "index %s", (long long)range_id,
+				    vy_index_recovery_id_str(id)));
 		return -1;
 	}
 
@@ -1761,29 +1808,34 @@ static int
 vy_recovery_process_record(struct vy_recovery *recovery,
 			   const struct vy_log_record *record)
 {
+	struct vy_index_recovery_id index_id = {
+		.space_id = record->space_id,
+		.index_id = record->index_id,
+		.index_lsn = record->index_lsn,
+	};
+
 	int rc;
 	switch (record->type) {
 	case VY_LOG_CREATE_INDEX:
-		rc = vy_recovery_create_index(recovery, record->index_lsn,
-				record->index_id, record->space_id,
+		rc = vy_recovery_create_index(recovery, &index_id,
 				record->key_parts, record->key_part_count);
 		break;
 	case VY_LOG_DROP_INDEX:
-		rc = vy_recovery_drop_index(recovery, record->index_lsn);
+		rc = vy_recovery_drop_index(recovery, &index_id);
 		break;
 	case VY_LOG_INSERT_RANGE:
-		rc = vy_recovery_insert_range(recovery, record->index_lsn,
+		rc = vy_recovery_insert_range(recovery, &index_id,
 				record->range_id, record->begin, record->end);
 		break;
 	case VY_LOG_DELETE_RANGE:
 		rc = vy_recovery_delete_range(recovery, record->range_id);
 		break;
 	case VY_LOG_PREPARE_RUN:
-		rc = vy_recovery_prepare_run(recovery, record->index_lsn,
+		rc = vy_recovery_prepare_run(recovery, &index_id,
 					     record->run_id);
 		break;
 	case VY_LOG_CREATE_RUN:
-		rc = vy_recovery_create_run(recovery, record->index_lsn,
+		rc = vy_recovery_create_run(recovery, &index_id,
 					    record->run_id, record->dump_lsn);
 		break;
 	case VY_LOG_DROP_RUN:
@@ -1802,11 +1854,11 @@ vy_recovery_process_record(struct vy_recovery *recovery,
 		rc = vy_recovery_delete_slice(recovery, record->slice_id);
 		break;
 	case VY_LOG_DUMP_INDEX:
-		rc = vy_recovery_dump_index(recovery, record->index_lsn,
+		rc = vy_recovery_dump_index(recovery, &index_id,
 					    record->dump_lsn);
 		break;
 	case VY_LOG_TRUNCATE_INDEX:
-		rc = vy_recovery_truncate_index(recovery, record->index_lsn,
+		rc = vy_recovery_truncate_index(recovery, &index_id,
 						record->truncate_count);
 		break;
 	default:
@@ -1841,6 +1893,7 @@ vy_recovery_new_f(va_list ap)
 	recovery->run_hash = NULL;
 	recovery->slice_hash = NULL;
 	recovery->max_id = -1;
+	recovery->seen_snapshot = false;
 
 	recovery->index_id_hash = mh_i64ptr_new();
 	recovery->index_lsn_hash = mh_i64ptr_new();
@@ -1879,6 +1932,7 @@ vy_recovery_new_f(va_list ap)
 		say_verbose("load vylog record: %s",
 			    vy_log_record_str(&record));
 		if (record.type == VY_LOG_SNAPSHOT) {
+			recovery->seen_snapshot = true;
 			if (only_checkpoint)
 				break;
 			continue;
@@ -2012,7 +2066,6 @@ vy_log_append_index(struct xlog *xlog, struct vy_index_recovery_info *index)
 
 	vy_log_record_init(&record);
 	record.type = VY_LOG_CREATE_INDEX;
-	record.index_lsn = index->index_lsn;
 	record.index_id = index->index_id;
 	record.space_id = index->space_id;
 	record.key_parts = index->key_parts;
@@ -2023,7 +2076,8 @@ vy_log_append_index(struct xlog *xlog, struct vy_index_recovery_info *index)
 	if (index->truncate_count > 0) {
 		vy_log_record_init(&record);
 		record.type = VY_LOG_TRUNCATE_INDEX;
-		record.index_lsn = index->index_lsn;
+		record.index_id = index->index_id;
+		record.space_id = index->space_id;
 		record.truncate_count = index->truncate_count;
 		if (vy_log_append_record(xlog, &record) != 0)
 			return -1;
@@ -2032,7 +2086,8 @@ vy_log_append_index(struct xlog *xlog, struct vy_index_recovery_info *index)
 	if (index->dump_lsn >= 0) {
 		vy_log_record_init(&record);
 		record.type = VY_LOG_DUMP_INDEX;
-		record.index_lsn = index->index_lsn;
+		record.index_id = index->index_id;
+		record.space_id = index->space_id;
 		record.dump_lsn = index->dump_lsn;
 		if (vy_log_append_record(xlog, &record) != 0)
 			return -1;
@@ -2046,7 +2101,8 @@ vy_log_append_index(struct xlog *xlog, struct vy_index_recovery_info *index)
 			record.type = VY_LOG_CREATE_RUN;
 			record.dump_lsn = run->dump_lsn;
 		}
-		record.index_lsn = index->index_lsn;
+		record.index_id = index->index_id;
+		record.space_id = index->space_id;
 		record.run_id = run->id;
 		if (vy_log_append_record(xlog, &record) != 0)
 			return -1;
@@ -2065,7 +2121,8 @@ vy_log_append_index(struct xlog *xlog, struct vy_index_recovery_info *index)
 	rlist_foreach_entry(range, &index->ranges, in_index) {
 		vy_log_record_init(&record);
 		record.type = VY_LOG_INSERT_RANGE;
-		record.index_lsn = index->index_lsn;
+		record.index_id = index->index_id;
+		record.space_id = index->space_id;
 		record.range_id = range->id;
 		record.begin = range->begin;
 		record.end = range->end;
@@ -2092,7 +2149,8 @@ vy_log_append_index(struct xlog *xlog, struct vy_index_recovery_info *index)
 	if (index->is_dropped) {
 		vy_log_record_init(&record);
 		record.type = VY_LOG_DROP_INDEX;
-		record.index_lsn = index->index_lsn;
+		record.index_id = index->index_id;
+		record.space_id = index->space_id;
 		if (vy_log_append_record(xlog, &record) != 0)
 			return -1;
 	}
diff --git a/src/box/vy_log.h b/src/box/vy_log.h
index 8fbacd0f..67dcd418 100644
--- a/src/box/vy_log.h
+++ b/src/box/vy_log.h
@@ -65,18 +65,19 @@ struct mh_i64ptr_t;
 enum vy_log_record_type {
 	/**
 	 * Create a new vinyl index.
-	 * Requires vy_log_record::index_lsn, index_id, space_id,
+	 * Requires vy_log_record::space_id, index_id,
 	 * key_def (with primary key parts).
 	 */
 	VY_LOG_CREATE_INDEX		= 0,
 	/**
 	 * Drop an index.
-	 * Requires vy_log_record::index_lsn.
+	 * Requires vy_log_record::space_id, index_id.
 	 */
 	VY_LOG_DROP_INDEX		= 1,
 	/**
 	 * Insert a new range into a vinyl index.
-	 * Requires vy_log_record::index_lsn, range_id, begin, end.
+	 * Requires vy_log_record::space_id, index_id, range_id,
+	 * begin, end.
 	 */
 	VY_LOG_INSERT_RANGE		= 2,
 	/**
@@ -86,7 +87,7 @@ enum vy_log_record_type {
 	VY_LOG_DELETE_RANGE		= 3,
 	/**
 	 * Prepare a vinyl run file.
-	 * Requires vy_log_record::index_lsn, run_id.
+	 * Requires vy_log_record::space_id, index_id, run_id.
 	 *
 	 * Record of this type is written before creating a run file.
 	 * It is needed to keep track of unfinished due to errors run
@@ -95,7 +96,8 @@ enum vy_log_record_type {
 	VY_LOG_PREPARE_RUN		= 4,
 	/**
 	 * Commit a vinyl run file creation.
-	 * Requires vy_log_record::index_lsn, run_id, dump_lsn.
+	 * Requires vy_log_record::space_id, index_id, run_id,
+	 * dump_lsn.
 	 *
 	 * Written after a run file was successfully created.
 	 */
@@ -126,7 +128,8 @@ enum vy_log_record_type {
 	VY_LOG_FORGET_RUN		= 7,
 	/**
 	 * Insert a run slice into a range.
-	 * Requires vy_log_record::range_id, run_id, slice_id, begin, end.
+	 * Requires vy_log_record::range_id, run_id, slice_id,
+	 * begin, end.
 	 */
 	VY_LOG_INSERT_SLICE		= 8,
 	/**
@@ -136,7 +139,7 @@ enum vy_log_record_type {
 	VY_LOG_DELETE_SLICE		= 9,
 	/**
 	 * Update LSN of the last index dump.
-	 * Requires vy_log_record::index_lsn, dump_lsn.
+	 * Requires vy_log_record::space_id, index_id, dump_lsn.
 	 */
 	VY_LOG_DUMP_INDEX		= 10,
 	/**
@@ -152,7 +155,7 @@ enum vy_log_record_type {
 	VY_LOG_SNAPSHOT			= 11,
 	/**
 	 * Update truncate count of a vinyl index.
-	 * Requires vy_log_record::index_lsn, truncate_count.
+	 * Requires vy_log_record::space_id, index_id, truncate_count.
 	 */
 	VY_LOG_TRUNCATE_INDEX		= 12,
 
@@ -165,7 +168,11 @@ struct vy_log_record {
 	enum vy_log_record_type type;
 	/**
 	 * LSN from the time of index creation.
-	 * Used to identify indexes in vylog.
+
+	 * We used to identify different incarnations of
+	 * the same index by LSN. Now it isn't needed, but
+	 * we have to maintain it to handle records created
+	 * by older versions.
 	 */
 	int64_t index_lsn;
 	/** Unique ID of the vinyl range. */
@@ -216,7 +223,13 @@ struct vy_recovery {
 	struct rlist indexes;
 	/** space_id, index_id -> vy_index_recovery_info. */
 	struct mh_i64ptr_t *index_id_hash;
-	/** index_lsn -> vy_index_recovery_info. */
+	/**
+	 * index_lsn -> vy_index_recovery_info.
+	 *
+	 * This is a legacy from the time when index_lsn was
+	 * used as a unique index ID. We need it to process
+	 * records written by older versions.
+	 */
 	struct mh_i64ptr_t *index_lsn_hash;
 	/** ID -> vy_range_recovery_info. */
 	struct mh_i64ptr_t *range_hash;
@@ -225,18 +238,34 @@ struct vy_recovery {
 	/** ID -> vy_slice_recovery_info. */
 	struct mh_i64ptr_t *slice_hash;
 	/**
+	 * The following flag is set once the VY_LOG_SNAPSHOT
+	 * record is processed.
+	 */
+	bool seen_snapshot;
+	/**
 	 * Maximal vinyl object ID, according to the metadata log,
 	 * or -1 in case no vinyl objects were recovered.
 	 */
 	int64_t max_id;
 };
 
+/**
+ * A helper structure that consists of vy_log_record
+ * fields needed to identify an index.
+ */
+struct vy_index_recovery_id {
+	/** vy_log_record::space_id */
+	uint32_t space_id;
+	/** vy_log_record::index_id */
+	uint32_t index_id;
+	/** vy_log_record::index_lsn (legacy) */
+	int64_t index_lsn;
+};
+
 /** Vinyl index info stored in a recovery context. */
 struct vy_index_recovery_info {
 	/** Link in vy_recovery::indexes. */
 	struct rlist in_recovery;
-	/** LSN of the index creation. */
-	int64_t index_lsn;
 	/** Ordinal index number in the space. */
 	uint32_t index_id;
 	/** Space ID. */
@@ -252,6 +281,11 @@ struct vy_index_recovery_info {
 	/** Truncate count. */
 	int64_t truncate_count;
 	/**
+	 * Number of incarnations the index has had
+	 * since the last checkpoint.
+	 */
+	int incarnation_count;
+	/**
 	 * List of all ranges in the index, linked by
 	 * vy_range_recovery_info::in_index.
 	 */
@@ -481,7 +515,7 @@ vy_recovery_delete(struct vy_recovery *recovery);
  */
 struct vy_index_recovery_info *
 vy_recovery_lookup_index(struct vy_recovery *recovery,
-			 uint32_t space_id, uint32_t index_id);
+			 const struct vy_index_recovery_id *id);
 
 /**
  * Initialize a log record with default values.
@@ -496,39 +530,40 @@ vy_log_record_init(struct vy_log_record *record)
 
 /** Helper to log a vinyl index creation. */
 static inline void
-vy_log_create_index(int64_t index_lsn, uint32_t index_id, uint32_t space_id,
+vy_log_create_index(uint32_t space_id, uint32_t index_id,
 		    const struct key_def *key_def)
 {
 	struct vy_log_record record;
 	vy_log_record_init(&record);
 	record.type = VY_LOG_CREATE_INDEX;
-	record.index_lsn = index_lsn;
-	record.index_id = index_id;
 	record.space_id = space_id;
+	record.index_id = index_id;
 	record.key_def = key_def;
 	vy_log_write(&record);
 }
 
 /** Helper to log a vinyl index drop. */
 static inline void
-vy_log_drop_index(int64_t index_lsn)
+vy_log_drop_index(uint32_t space_id, uint32_t index_id)
 {
 	struct vy_log_record record;
 	vy_log_record_init(&record);
 	record.type = VY_LOG_DROP_INDEX;
-	record.index_lsn = index_lsn;
+	record.space_id = space_id;
+	record.index_id = index_id;
 	vy_log_write(&record);
 }
 
 /** Helper to log a vinyl range insertion. */
 static inline void
-vy_log_insert_range(int64_t index_lsn, int64_t range_id,
+vy_log_insert_range(uint32_t space_id, uint32_t index_id, int64_t range_id,
 		    const char *begin, const char *end)
 {
 	struct vy_log_record record;
 	vy_log_record_init(&record);
 	record.type = VY_LOG_INSERT_RANGE;
-	record.index_lsn = index_lsn;
+	record.space_id = space_id;
+	record.index_id = index_id;
 	record.range_id = range_id;
 	record.begin = begin;
 	record.end = end;
@@ -548,24 +583,27 @@ vy_log_delete_range(int64_t range_id)
 
 /** Helper to log a vinyl run file creation. */
 static inline void
-vy_log_prepare_run(int64_t index_lsn, int64_t run_id)
+vy_log_prepare_run(uint32_t space_id, uint32_t index_id, int64_t run_id)
 {
 	struct vy_log_record record;
 	vy_log_record_init(&record);
 	record.type = VY_LOG_PREPARE_RUN;
-	record.index_lsn = index_lsn;
+	record.space_id = space_id;
+	record.index_id = index_id;
 	record.run_id = run_id;
 	vy_log_write(&record);
 }
 
 /** Helper to log a vinyl run creation. */
 static inline void
-vy_log_create_run(int64_t index_lsn, int64_t run_id, int64_t dump_lsn)
+vy_log_create_run(uint32_t space_id, uint32_t index_id,
+		  int64_t run_id, int64_t dump_lsn)
 {
 	struct vy_log_record record;
 	vy_log_record_init(&record);
 	record.type = VY_LOG_CREATE_RUN;
-	record.index_lsn = index_lsn;
+	record.space_id = space_id;
+	record.index_id = index_id;
 	record.run_id = run_id;
 	record.dump_lsn = dump_lsn;
 	vy_log_write(&record);
@@ -623,24 +661,27 @@ vy_log_delete_slice(int64_t slice_id)
 
 /** Helper to log index dump. */
 static inline void
-vy_log_dump_index(int64_t index_lsn, int64_t dump_lsn)
+vy_log_dump_index(uint32_t space_id, uint32_t index_id, int64_t dump_lsn)
 {
 	struct vy_log_record record;
 	vy_log_record_init(&record);
 	record.type = VY_LOG_DUMP_INDEX;
-	record.index_lsn = index_lsn;
+	record.space_id = space_id;
+	record.index_id = index_id;
 	record.dump_lsn = dump_lsn;
 	vy_log_write(&record);
 }
 
 /** Helper to log index truncation. */
 static inline void
-vy_log_truncate_index(int64_t index_lsn, int64_t truncate_count)
+vy_log_truncate_index(uint32_t space_id, uint32_t index_id,
+		      int64_t truncate_count)
 {
 	struct vy_log_record record;
 	vy_log_record_init(&record);
 	record.type = VY_LOG_TRUNCATE_INDEX;
-	record.index_lsn = index_lsn;
+	record.space_id = space_id;
+	record.index_id = index_id;
 	record.truncate_count = truncate_count;
 	vy_log_write(&record);
 }
diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c
index 382cf071..0fde3138 100644
--- a/src/box/vy_scheduler.c
+++ b/src/box/vy_scheduler.c
@@ -583,7 +583,7 @@ vy_run_prepare(struct vy_run_env *run_env, struct vy_index *index)
 	if (run == NULL)
 		return NULL;
 	vy_log_tx_begin();
-	vy_log_prepare_run(index->commit_lsn, run->id);
+	vy_log_prepare_run(index->space_id, index->id, run->id);
 	if (vy_log_tx_commit() < 0) {
 		vy_run_unref(run);
 		return NULL;
@@ -706,7 +706,7 @@ vy_task_dump_complete(struct vy_scheduler *scheduler, struct vy_task *task)
 		 * to log index dump anyway.
 		 */
 		vy_log_tx_begin();
-		vy_log_dump_index(index->commit_lsn, dump_lsn);
+		vy_log_dump_index(index->space_id, index->id, dump_lsn);
 		if (vy_log_tx_commit() < 0)
 			goto fail;
 		vy_run_discard(new_run);
@@ -766,7 +766,7 @@ vy_task_dump_complete(struct vy_scheduler *scheduler, struct vy_task *task)
 	 * Log change in metadata.
 	 */
 	vy_log_tx_begin();
-	vy_log_create_run(index->commit_lsn, new_run->id, dump_lsn);
+	vy_log_create_run(index->space_id, index->id, new_run->id, dump_lsn);
 	for (range = begin_range, i = 0; range != end_range;
 	     range = vy_range_tree_next(index->tree, range), i++) {
 		assert(i < index->range_count);
@@ -778,7 +778,7 @@ vy_task_dump_complete(struct vy_scheduler *scheduler, struct vy_task *task)
 		if (++loops % VY_YIELD_LOOPS == 0)
 			fiber_sleep(0); /* see comment above */
 	}
-	vy_log_dump_index(index->commit_lsn, dump_lsn);
+	vy_log_dump_index(index->space_id, index->id, dump_lsn);
 	if (vy_log_tx_commit() < 0)
 		goto fail_free_slices;
 
@@ -1103,8 +1103,8 @@ vy_task_compact_complete(struct vy_scheduler *scheduler, struct vy_task *task)
 	rlist_foreach_entry(run, &unused_runs, in_unused)
 		vy_log_drop_run(run->id, gc_lsn);
 	if (new_slice != NULL) {
-		vy_log_create_run(index->commit_lsn, new_run->id,
-				  new_run->dump_lsn);
+		vy_log_create_run(index->space_id, index->id,
+				  new_run->id, new_run->dump_lsn);
 		vy_log_insert_slice(range->id, new_run->id, new_slice->id,
 				    tuple_data_or_null(new_slice->begin),
 				    tuple_data_or_null(new_slice->end));
diff --git a/test/unit/vy_log_stub.c b/test/unit/vy_log_stub.c
index 1fda0a6b..f3387846 100644
--- a/test/unit/vy_log_stub.c
+++ b/test/unit/vy_log_stub.c
@@ -53,7 +53,7 @@ vy_log_write(const struct vy_log_record *record) {}
 
 struct vy_index_recovery_info *
 vy_recovery_lookup_index(struct vy_recovery *recovery,
-			 uint32_t space_id, uint32_t index_id)
+			 const struct vy_index_recovery_id *id)
 {
 	unreachable();
 }
diff --git a/test/vinyl/layout.result b/test/vinyl/layout.result
index 603d2865..c59d8c5f 100644
--- a/test/vinyl/layout.result
+++ b/test/vinyl/layout.result
@@ -128,19 +128,19 @@ result
     - - HEADER:
           type: INSERT
         BODY:
-          tuple: [0, {0: 3, 7: [{'field': 0, 'collation': 1, 'type': 'string'}], 6: 512}]
+          tuple: [0, {7: [{'field': 0, 'collation': 1, 'type': 'string'}], 6: 512}]
       - HEADER:
           type: INSERT
         BODY:
-          tuple: [10, {0: 3, 9: 9}]
+          tuple: [10, {9: 9, 6: 512}]
       - HEADER:
           type: INSERT
         BODY:
-          tuple: [5, {0: 3, 2: 6, 9: 9}]
+          tuple: [5, {2: 6, 9: 9, 6: 512}]
       - HEADER:
           type: INSERT
         BODY:
-          tuple: [4, {0: 3, 2: 3}]
+          tuple: [4, {2: 3, 6: 512}]
       - HEADER:
           type: INSERT
         BODY:
@@ -148,7 +148,7 @@ result
       - HEADER:
           type: INSERT
         BODY:
-          tuple: [2, {0: 3}]
+          tuple: [2, {6: 512}]
       - HEADER:
           type: INSERT
         BODY:
@@ -156,19 +156,19 @@ result
       - HEADER:
           type: INSERT
         BODY:
-          tuple: [0, {0: 4, 5: 1, 6: 512, 7: [{'field': 1, 'is_nullable': true, 'type': 'unsigned'}]}]
+          tuple: [0, {5: 1, 6: 512, 7: [{'field': 1, 'is_nullable': true, 'type': 'unsigned'}]}]
       - HEADER:
           type: INSERT
         BODY:
-          tuple: [10, {0: 4, 9: 9}]
+          tuple: [10, {9: 9, 5: 1, 6: 512}]
       - HEADER:
           type: INSERT
         BODY:
-          tuple: [5, {0: 4, 2: 4, 9: 9}]
+          tuple: [5, {2: 4, 5: 1, 6: 512, 9: 9}]
       - HEADER:
           type: INSERT
         BODY:
-          tuple: [4, {0: 4, 2: 2}]
+          tuple: [4, {2: 2, 5: 1, 6: 512}]
       - HEADER:
           type: INSERT
         BODY:
@@ -176,7 +176,7 @@ result
       - HEADER:
           type: INSERT
         BODY:
-          tuple: [2, {0: 4, 1: 1}]
+          tuple: [2, {1: 1, 5: 1, 6: 512}]
       - HEADER:
           type: INSERT
         BODY:
@@ -199,12 +199,12 @@ result
           timestamp: <timestamp>
           type: INSERT
         BODY:
-          tuple: [4, {0: 4, 2: 8}]
+          tuple: [4, {2: 8, 5: 1, 6: 512}]
       - HEADER:
           timestamp: <timestamp>
           type: INSERT
         BODY:
-          tuple: [5, {0: 4, 2: 8, 9: 12}]
+          tuple: [5, {2: 8, 5: 1, 6: 512, 9: 12}]
       - HEADER:
           timestamp: <timestamp>
           type: INSERT
@@ -214,17 +214,17 @@ result
           timestamp: <timestamp>
           type: INSERT
         BODY:
-          tuple: [10, {0: 4, 9: 12}]
+          tuple: [10, {9: 12, 5: 1, 6: 512}]
       - HEADER:
           timestamp: <timestamp>
           type: INSERT
         BODY:
-          tuple: [4, {0: 3, 2: 10}]
+          tuple: [4, {2: 10, 6: 512}]
       - HEADER:
           timestamp: <timestamp>
           type: INSERT
         BODY:
-          tuple: [5, {0: 3, 2: 10, 9: 12}]
+          tuple: [5, {2: 10, 9: 12, 6: 512}]
       - HEADER:
           timestamp: <timestamp>
           type: INSERT
@@ -234,7 +234,7 @@ result
           timestamp: <timestamp>
           type: INSERT
         BODY:
-          tuple: [10, {0: 3, 9: 12}]
+          tuple: [10, {9: 12, 6: 512}]
   - - 00000000000000000006.index
     - - HEADER:
           type: RUNINFO
-- 
2.11.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 3/4] index: remove lsn from commit_create arguments
  2018-03-17 17:56 [PATCH 0/4] Prepare vylog for space alter Vladimir Davydov
  2018-03-17 17:56 ` [PATCH 1/4] vinyl: refactor vylog recovery Vladimir Davydov
  2018-03-17 17:56 ` [PATCH 2/4] vinyl: do not use index lsn to identify indexes in vylog Vladimir Davydov
@ 2018-03-17 17:56 ` Vladimir Davydov
  2018-03-17 17:56 ` [PATCH 4/4] alter: rewrite space truncation using alter infrastructure Vladimir Davydov
  3 siblings, 0 replies; 6+ messages in thread
From: Vladimir Davydov @ 2018-03-17 17:56 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

The lsn argument of index_vtab::commit_create was introduced for the
sake of vinyl, which used it to identify indexes in vylog. Now vinyl
doesn't need it anymore so we can remove it.
---
 src/box/alter.cc | 28 +++++++++++++---------------
 src/box/index.cc |  2 +-
 src/box/index.h  | 12 ++++--------
 src/box/vinyl.c  |  4 +---
 4 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/src/box/alter.cc b/src/box/alter.cc
index 8455373b..809fb4ed 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -616,8 +616,7 @@ public:
 	struct rlist link;
 	virtual void alter_def(struct alter_space * /* alter */) {}
 	virtual void alter(struct alter_space * /* alter */) {}
-	virtual void commit(struct alter_space * /* alter */,
-			    int64_t /* signature */) {}
+	virtual void commit(struct alter_space * /* alter */) {}
 	virtual void rollback(struct alter_space * /* alter */) {}
 	virtual ~AlterSpaceOp() {}
 
@@ -717,7 +716,7 @@ AlterSpaceOp::AlterSpaceOp(struct alter_space *alter)
 static void
 alter_space_commit(struct trigger *trigger, void *event)
 {
-	struct txn *txn = (struct txn *) event;
+	(void) event;
 	struct alter_space *alter = (struct alter_space *) trigger->data;
 	/*
 	 * Commit alter ops, this will move the changed
@@ -725,7 +724,7 @@ alter_space_commit(struct trigger *trigger, void *event)
 	 */
 	class AlterSpaceOp *op;
 	rlist_foreach_entry(op, &alter->ops, link) {
-		op->commit(alter, txn->signature);
+		op->commit(alter);
 	}
 
 	trigger_run_xc(&on_alter_space, alter->new_space);
@@ -919,7 +918,7 @@ public:
 		  new_def(new_def) {}
 	virtual void alter(struct alter_space *alter);
 	virtual void alter_def(struct alter_space *alter);
-	virtual void commit(struct alter_space *alter, int64_t lsn);
+	virtual void commit(struct alter_space *alter);
 	virtual ~ModifySpaceFormat();
 };
 
@@ -954,10 +953,9 @@ ModifySpaceFormat::alter(struct alter_space *alter)
 }
 
 void
-ModifySpaceFormat::commit(struct alter_space *alter, int64_t lsn)
+ModifySpaceFormat::commit(struct alter_space *alter)
 {
 	(void) alter;
-	(void) lsn;
 	old_dict = NULL;
 }
 
@@ -1009,7 +1007,7 @@ public:
 	struct index_def *old_index_def;
 	virtual void alter_def(struct alter_space *alter);
 	virtual void alter(struct alter_space *alter);
-	virtual void commit(struct alter_space *alter, int64_t lsn);
+	virtual void commit(struct alter_space *alter);
 };
 
 /*
@@ -1046,7 +1044,7 @@ DropIndex::alter(struct alter_space *alter)
 }
 
 void
-DropIndex::commit(struct alter_space *alter, int64_t /* signature */)
+DropIndex::commit(struct alter_space *alter)
 {
 	struct index *index = index_find_xc(alter->old_space,
 					    old_index_def->iid);
@@ -1177,7 +1175,7 @@ public:
 	struct index_def *new_index_def;
 	virtual void alter_def(struct alter_space *alter);
 	virtual void alter(struct alter_space *alter);
-	virtual void commit(struct alter_space *alter, int64_t lsn);
+	virtual void commit(struct alter_space *alter);
 	virtual ~CreateIndex();
 };
 
@@ -1225,11 +1223,11 @@ CreateIndex::alter(struct alter_space *alter)
 }
 
 void
-CreateIndex::commit(struct alter_space *alter, int64_t signature)
+CreateIndex::commit(struct alter_space *alter)
 {
 	struct index *new_index = index_find_xc(alter->new_space,
 						new_index_def->iid);
-	index_commit_create(new_index, signature);
+	index_commit_create(new_index);
 }
 
 CreateIndex::~CreateIndex()
@@ -1263,7 +1261,7 @@ public:
 	struct index_def *old_index_def;
 	virtual void alter_def(struct alter_space *alter);
 	virtual void alter(struct alter_space *alter);
-	virtual void commit(struct alter_space *alter, int64_t signature);
+	virtual void commit(struct alter_space *alter);
 	virtual ~RebuildIndex();
 };
 
@@ -1288,14 +1286,14 @@ RebuildIndex::alter(struct alter_space *alter)
 }
 
 void
-RebuildIndex::commit(struct alter_space *alter, int64_t signature)
+RebuildIndex::commit(struct alter_space *alter)
 {
 	struct index *old_index = space_index(alter->old_space,
 					      old_index_def->iid);
 	struct index *new_index = space_index(alter->new_space,
 					      new_index_def->iid);
 	index_commit_drop(old_index);
-	index_commit_create(new_index, signature);
+	index_commit_create(new_index);
 }
 
 RebuildIndex::~RebuildIndex()
diff --git a/src/box/index.cc b/src/box/index.cc
index 21c768ce..006b2e9c 100644
--- a/src/box/index.cc
+++ b/src/box/index.cc
@@ -536,7 +536,7 @@ index_build(struct index *index, struct index *pk)
 /* {{{ Virtual method stubs */
 
 void
-generic_index_commit_create(struct index *, int64_t)
+generic_index_commit_create(struct index *)
 {
 }
 
diff --git a/src/box/index.h b/src/box/index.h
index 81d32a8b..07a2ba8b 100644
--- a/src/box/index.h
+++ b/src/box/index.h
@@ -344,12 +344,8 @@ struct index_vtab {
 	/**
 	 * Called after WAL write to commit index creation.
 	 * Must not fail.
-	 *
-	 * @signature is the LSN that was assigned to the row
-	 * that created the index. If the index was created by
-	 * a snapshot row, it is set to the snapshot signature.
 	 */
-	void (*commit_create)(struct index *index, int64_t signature);
+	void (*commit_create)(struct index *index);
 	/**
 	 * Called after WAL write to commit index drop.
 	 * Must not fail.
@@ -463,9 +459,9 @@ int
 index_build(struct index *index, struct index *pk);
 
 static inline void
-index_commit_create(struct index *index, int64_t signature)
+index_commit_create(struct index *index)
 {
-	index->vtab->commit_create(index, signature);
+	index->vtab->commit_create(index);
 }
 
 static inline void
@@ -586,7 +582,7 @@ index_end_build(struct index *index)
 /*
  * Virtual method stubs.
  */
-void generic_index_commit_create(struct index *, int64_t);
+void generic_index_commit_create(struct index *);
 void generic_index_commit_drop(struct index *);
 void generic_index_update_def(struct index *);
 ssize_t generic_index_size(struct index *);
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 244360fa..9b5c7e08 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -775,10 +775,8 @@ vy_index_open(struct vy_env *env, struct vy_index *index)
 }
 
 static void
-vinyl_index_commit_create(struct index *base, int64_t lsn)
+vinyl_index_commit_create(struct index *base)
 {
-	(void)lsn;
-
 	struct vy_env *env = vy_env(base->engine);
 	struct vy_index *index = vy_index(base);
 
-- 
2.11.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 4/4] alter: rewrite space truncation using alter infrastructure
  2018-03-17 17:56 [PATCH 0/4] Prepare vylog for space alter Vladimir Davydov
                   ` (2 preceding siblings ...)
  2018-03-17 17:56 ` [PATCH 3/4] index: remove lsn from commit_create arguments Vladimir Davydov
@ 2018-03-17 17:56 ` Vladimir Davydov
  3 siblings, 0 replies; 6+ messages in thread
From: Vladimir Davydov @ 2018-03-17 17:56 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

Truncation of a space is equivalent to recreation of all space indexes
with the same definition. The reason why we use a special system space
to trigger space truncation (_truncate) is that we don't have
transactional DDL while space truncation has to be done atomically.
However, apart from the new system space, implementation of truncation
entailed a new vylog record (VY_LOG_TRUNCATE_INDEX) and quite a few
lines of code to handle it. So why couldn't we just invoke ALTER that
would recreate all indexes?

To answer this question, one needs to recall that back then vinyl used
LSN to identify indexes in vylog. As a result, we couldn't recreate more
than one index in one operation - if we did that, they would all have
the same LSN and hence wouldn't be distinguishable in vylog. So we had
to introduce a special vylog operation (VY_LOG_TRUNCATE_INDEX) that
bump the truncation counter of an index instead of just dropping and
recreating it. We also had to introduce a pair of new virtual space
methods, prepare_truncate and commit_truncate so that we could write
this new command to vylog in vinyl. Putting it all together, it becomes
obvious why we couldn't reuse ALTER code for space truncation.

Fortunately, things have changed since then. Now, vylog identifies
indexes by space_id/index_id. That means that now we can simplify
space truncation implementation a great deal by

 - reusing alter_space_do() for space truncation,
 - dropping space_vtab::prepare_truncate and commit_truncate,
 - removing truncate_count from space, index, and vylog.
---
 src/box/alter.cc         | 107 ++++---------------------------
 src/box/memtx_space.c    |  29 +++------
 src/box/space.h          |  44 -------------
 src/box/sysview_engine.c |  17 -----
 src/box/vinyl.c          | 161 ++++-------------------------------------------
 src/box/vy_index.c       |  25 +-------
 src/box/vy_index.h       |  22 -------
 src/box/vy_log.c         |  45 +++++--------
 src/box/vy_log.h         |  25 ++------
 9 files changed, 53 insertions(+), 422 deletions(-)

diff --git a/src/box/alter.cc b/src/box/alter.cc
index 809fb4ed..6fcc36b5 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -825,7 +825,6 @@ alter_space_do(struct txn *txn, struct alter_space *alter)
 	space_prepare_alter_xc(alter->old_space, alter->new_space);
 
 	alter->new_space->sequence = alter->old_space->sequence;
-	alter->new_space->truncate_count = alter->old_space->truncate_count;
 	memcpy(alter->new_space->access, alter->old_space->access,
 	       sizeof(alter->old_space->access));
 
@@ -1801,48 +1800,6 @@ on_replace_dd_index(struct trigger * /* trigger */, void *event)
 	scoped_guard.is_active = false;
 }
 
-/* {{{ space truncate */
-
-struct truncate_space {
-	/** Space being truncated. */
-	struct space *old_space;
-	/** Space created as a result of truncation. */
-	struct space *new_space;
-	/** Trigger executed to commit truncation. */
-	struct trigger on_commit;
-	/** Trigger executed to rollback truncation. */
-	struct trigger on_rollback;
-};
-
-/**
- * Call the engine specific method to commit truncation
- * and delete the old space.
- */
-static void
-truncate_space_commit(struct trigger *trigger, void * /* event */)
-{
-	struct truncate_space *truncate =
-		(struct truncate_space *) trigger->data;
-	space_commit_truncate(truncate->old_space, truncate->new_space);
-	space_delete(truncate->old_space);
-}
-
-/**
- * Move the old space back to the cache and delete
- * the new space.
- */
-static void
-truncate_space_rollback(struct trigger *trigger, void * /* event */)
-{
-	struct truncate_space *truncate =
-		(struct truncate_space *) trigger->data;
-	if (space_cache_replace(truncate->old_space) != truncate->new_space)
-		unreachable();
-
-	space_swap_triggers(truncate->new_space, truncate->old_space);
-	space_delete(truncate->new_space);
-}
-
 /**
  * A trigger invoked on replace in space _truncate.
  *
@@ -1869,16 +1826,13 @@ on_replace_dd_truncate(struct trigger * /* trigger */, void *event)
 
 	uint32_t space_id =
 		tuple_field_u32_xc(new_tuple, BOX_TRUNCATE_FIELD_SPACE_ID);
-	uint64_t truncate_count =
-		tuple_field_u64_xc(new_tuple, BOX_TRUNCATE_FIELD_COUNT);
 	struct space *old_space = space_cache_find_xc(space_id);
 
 	if (stmt->row->type == IPROTO_INSERT) {
 		/*
 		 * Space creation during initial recovery -
-		 * initialize truncate_count.
+		 * nothing to do.
 		 */
-		old_space->truncate_count = truncate_count;
 		return;
 	}
 
@@ -1896,59 +1850,24 @@ on_replace_dd_truncate(struct trigger * /* trigger */, void *event)
 	 */
 	access_check_space_xc(old_space, PRIV_W);
 
-	/*
-	 * Truncate counter is updated - truncate the space.
-	 */
-	struct truncate_space *truncate =
-		region_calloc_object_xc(&fiber()->gc, struct truncate_space);
-
-	/* Create an empty copy of the old space. */
-	struct rlist key_list;
-	space_dump_def(old_space, &key_list);
-	struct space *new_space = space_new_xc(old_space->def, &key_list);
-	new_space->truncate_count = truncate_count;
-	auto space_guard = make_scoped_guard([=] { space_delete(new_space); });
-
-	/* Notify the engine about upcoming space truncation. */
-	space_prepare_truncate_xc(old_space, new_space);
-
-	space_guard.is_active = false;
-
-	/* Preserve the access control lists during truncate. */
-	memcpy(new_space->access, old_space->access, sizeof(old_space->access));
-
-	/* Truncate does not affect space sequence. */
-	new_space->sequence = old_space->sequence;
-
-	/*
-	 * Replace the old space with the new one in the space
-	 * cache. Requests processed after this point will see
-	 * the space as truncated.
-	 */
-	if (space_cache_replace(new_space) != old_space)
-		unreachable();
+	struct alter_space *alter = alter_space_new(old_space);
+	auto scoped_guard =
+		make_scoped_guard([=] { alter_space_delete(alter); });
 
 	/*
-	 * Register the trigger that will commit or rollback
-	 * truncation depending on whether WAL write succeeds
-	 * or fails.
+	 * Recreate all indexes of the truncated space.
 	 */
-	truncate->old_space = old_space;
-	truncate->new_space = new_space;
-
-	trigger_create(&truncate->on_commit,
-		       truncate_space_commit, truncate, NULL);
-	txn_on_commit(txn, &truncate->on_commit);
-
-	trigger_create(&truncate->on_rollback,
-		       truncate_space_rollback, truncate, NULL);
-	txn_on_rollback(txn, &truncate->on_rollback);
+	for (uint32_t i = 0; i < old_space->index_count; i++) {
+		struct index *old_index = old_space->index[i];
+		(void) new DropIndex(alter, old_index->def);
+		auto create_index = new CreateIndex(alter);
+		create_index->new_index_def = index_def_dup_xc(old_index->def);
+	}
 
-	space_swap_triggers(truncate->new_space, truncate->old_space);
+	alter_space_do(txn, alter);
+	scoped_guard.is_active = false;
 }
 
-/* }}} */
-
 /* {{{ access control */
 
 bool
diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
index 2d94597a..c7e58946 100644
--- a/src/box/memtx_space.c
+++ b/src/box/memtx_space.c
@@ -818,16 +818,6 @@ memtx_space_build_secondary_key(struct space *old_space,
 	return rc;
 }
 
-static int
-memtx_space_prepare_truncate(struct space *old_space,
-			     struct space *new_space)
-{
-	struct memtx_space *old_memtx_space = (struct memtx_space *)old_space;
-	struct memtx_space *new_memtx_space = (struct memtx_space *)new_space;
-	new_memtx_space->replace = old_memtx_space->replace;
-	return 0;
-}
-
 static void
 memtx_space_prune(struct space *space)
 {
@@ -858,14 +848,6 @@ fail:
 	panic("failed to prune space");
 }
 
-static void
-memtx_space_commit_truncate(struct space *old_space,
-			    struct space *new_space)
-{
-	(void)new_space;
-	memtx_space_prune(old_space);
-}
-
 static int
 memtx_space_prepare_alter(struct space *old_space, struct space *new_space)
 {
@@ -883,9 +865,14 @@ memtx_space_commit_alter(struct space *old_space, struct space *new_space)
 {
 	struct memtx_space *old_memtx_space = (struct memtx_space *)old_space;
 	struct memtx_space *new_memtx_space = (struct memtx_space *)new_space;
+	bool is_empty = new_space->index_count == 0 ||
+			index_size(new_space->index[0]) == 0;
 
-	/* Delete all tuples when the last index is dropped. */
-	if (new_space->index_count == 0)
+	/*
+	 * Delete all tuples when the last index is dropped
+	 * or the space is truncated.
+	 */
+	if (is_empty)
 		memtx_space_prune(old_space);
 	else
 		new_memtx_space->bsize = old_memtx_space->bsize;
@@ -908,8 +895,6 @@ static const struct space_vtab memtx_space_vtab = {
 	/* .drop_primary_key = */ memtx_space_drop_primary_key,
 	/* .check_format  = */ memtx_space_check_format,
 	/* .build_secondary_key = */ memtx_space_build_secondary_key,
-	/* .prepare_truncate = */ memtx_space_prepare_truncate,
-	/* .commit_truncate = */ memtx_space_commit_truncate,
 	/* .prepare_alter = */ memtx_space_prepare_alter,
 	/* .commit_alter = */ memtx_space_commit_alter,
 };
diff --git a/src/box/space.h b/src/box/space.h
index 6408eedc..65f1531d 100644
--- a/src/box/space.h
+++ b/src/box/space.h
@@ -104,23 +104,6 @@ struct space_vtab {
 				   struct space *new_space,
 				   struct index *new_index);
 	/**
-	 * Notify the enigne about upcoming space truncation
-	 * so that it can prepare new_space object.
-	 */
-	int (*prepare_truncate)(struct space *old_space,
-				struct space *new_space);
-	/**
-	 * Commit space truncation. Called after space truncate
-	 * record was written to WAL hence must not fail.
-	 *
-	 * The old_space is the space that was replaced with the
-	 * new_space as a result of truncation. The callback is
-	 * supposed to release resources associated with the
-	 * old_space and commit the new_space.
-	 */
-	void (*commit_truncate)(struct space *old_space,
-				struct space *new_space);
-	/**
 	 * Notify the engine about the changed space,
 	 * before it's done, to prepare 'new_space' object.
 	 */
@@ -167,12 +150,6 @@ struct space {
 	struct space_def *def;
 	/** Sequence attached to this space or NULL. */
 	struct sequence *sequence;
-	/**
-	 * Number of times the space has been truncated.
-	 * Updating this counter via _truncate space triggers
-	 * space truncation.
-	 */
-	uint64_t truncate_count;
 	/** Enable/disable triggers. */
 	bool run_triggers;
 	/**
@@ -354,20 +331,6 @@ space_build_secondary_key(struct space *old_space,
 }
 
 static inline int
-space_prepare_truncate(struct space *old_space, struct space *new_space)
-{
-	assert(old_space->vtab == new_space->vtab);
-	return new_space->vtab->prepare_truncate(old_space, new_space);
-}
-
-static inline void
-space_commit_truncate(struct space *old_space, struct space *new_space)
-{
-	assert(old_space->vtab == new_space->vtab);
-	new_space->vtab->commit_truncate(old_space, new_space);
-}
-
-static inline int
 space_prepare_alter(struct space *old_space, struct space *new_space)
 {
 	assert(old_space->vtab == new_space->vtab);
@@ -525,13 +488,6 @@ space_build_secondary_key_xc(struct space *old_space,
 }
 
 static inline void
-space_prepare_truncate_xc(struct space *old_space, struct space *new_space)
-{
-	if (space_prepare_truncate(old_space, new_space) != 0)
-		diag_raise();
-}
-
-static inline void
 space_prepare_alter_xc(struct space *old_space, struct space *new_space)
 {
 	if (space_prepare_alter(old_space, new_space) != 0)
diff --git a/src/box/sysview_engine.c b/src/box/sysview_engine.c
index 27d9263a..f6122645 100644
--- a/src/box/sysview_engine.c
+++ b/src/box/sysview_engine.c
@@ -147,21 +147,6 @@ sysview_space_build_secondary_key(struct space *old_space,
 }
 
 static int
-sysview_space_prepare_truncate(struct space *old_space, struct space *new_space)
-{
-	(void)old_space;
-	(void)new_space;
-	return 0;
-}
-
-static void
-sysview_space_commit_truncate(struct space *old_space, struct space *new_space)
-{
-	(void)old_space;
-	(void)new_space;
-}
-
-static int
 sysview_space_prepare_alter(struct space *old_space, struct space *new_space)
 {
 	(void)old_space;
@@ -200,8 +185,6 @@ static const struct space_vtab sysview_space_vtab = {
 	/* .drop_primary_key = */ sysview_space_drop_primary_key,
 	/* .check_format = */ sysview_space_check_format,
 	/* .build_secondary_key = */ sysview_space_build_secondary_key,
-	/* .prepare_truncate = */ sysview_space_prepare_truncate,
-	/* .commit_truncate = */ sysview_space_commit_truncate,
 	/* .prepare_alter = */ sysview_space_prepare_alter,
 	/* .commit_alter = */ sysview_space_commit_alter,
 };
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 9b5c7e08..6736c781 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -880,138 +880,6 @@ vinyl_init_system_space(struct space *space)
 }
 
 static int
-vinyl_space_prepare_truncate(struct space *old_space, struct space *new_space)
-{
-	struct vy_env *env = vy_env(old_space->engine);
-
-	if (vinyl_check_wal(env, "DDL") != 0)
-		return -1;
-
-	assert(old_space->index_count == new_space->index_count);
-	uint32_t index_count = new_space->index_count;
-	if (index_count == 0)
-		return 0;
-
-	struct vy_index *pk = vy_index(old_space->index[0]);
-
-	/*
-	 * On local recovery, we need to handle the following
-	 * scenarios:
-	 *
-	 * - Space truncation was successfully logged before restart.
-	 *   In this case indexes of the old space contain data added
-	 *   after truncation (recovered by vy_index_recover()) and
-	 *   hence we just need to swap contents between old and new
-	 *   spaces.
-	 *
-	 * - We failed to log space truncation before restart.
-	 *   In this case we have to replay space truncation the
-	 *   same way we handle it during normal operation.
-	 *
-	 * See also vy_commit_truncate_space().
-	 */
-	bool truncate_done = (env->status == VINYL_FINAL_RECOVERY_LOCAL &&
-			      pk->truncate_count > old_space->truncate_count);
-
-	for (uint32_t i = 0; i < index_count; i++) {
-		struct vy_index *old_index = vy_index(old_space->index[i]);
-		struct vy_index *new_index = vy_index(new_space->index[i]);
-
-		new_index->is_committed = old_index->is_committed;
-
-		if (truncate_done) {
-			/*
-			 * We are replaying truncate from WAL and the
-			 * old space already contains data added after
-			 * truncate (recovered from vylog). Avoid
-			 * reloading the space content from vylog,
-			 * simply swap the contents of old and new
-			 * spaces instead.
-			 */
-			vy_index_swap(old_index, new_index);
-			new_index->is_dropped = old_index->is_dropped;
-			new_index->truncate_count = old_index->truncate_count;
-			vy_scheduler_remove_index(&env->scheduler, old_index);
-			vy_scheduler_add_index(&env->scheduler, new_index);
-			continue;
-		}
-
-		if (vy_index_init_range_tree(new_index) != 0)
-			return -1;
-
-		new_index->truncate_count = new_space->truncate_count;
-	}
-	return 0;
-}
-
-static void
-vinyl_space_commit_truncate(struct space *old_space, struct space *new_space)
-{
-	struct vy_env *env = vy_env(old_space->engine);
-
-	assert(old_space->index_count == new_space->index_count);
-	uint32_t index_count = new_space->index_count;
-	if (index_count == 0)
-		return;
-
-	struct vy_index *pk = vy_index(old_space->index[0]);
-
-	/*
-	 * See the comment in vy_prepare_truncate_space().
-	 */
-	if (env->status == VINYL_FINAL_RECOVERY_LOCAL &&
-	    pk->truncate_count > old_space->truncate_count)
-		return;
-
-	/*
-	 * Mark old indexes as dropped and remove them from the scheduler.
-	 * After this point no task can be scheduled or completed for any
-	 * of them (only aborted).
-	 */
-	for (uint32_t i = 0; i < index_count; i++) {
-		struct vy_index *index = vy_index(old_space->index[i]);
-		index->is_dropped = true;
-		vy_scheduler_remove_index(&env->scheduler, index);
-	}
-
-	/*
-	 * Log change in metadata.
-	 *
-	 * Since we can't fail here, in case of vylog write failure
-	 * we leave records we failed to write in vylog buffer so
-	 * that they get flushed along with the next write. If they
-	 * don't, we will replay them during WAL recovery.
-	 */
-	vy_log_tx_begin();
-	int64_t gc_lsn = checkpoint_last(NULL);
-	for (uint32_t i = 0; i < index_count; i++) {
-		struct vy_index *old_index = vy_index(old_space->index[i]);
-		struct vy_index *new_index = vy_index(new_space->index[i]);
-		struct vy_range *range = vy_range_tree_first(new_index->tree);
-
-		assert(!new_index->is_dropped);
-		assert(new_index->truncate_count == new_space->truncate_count);
-		assert(new_index->range_count == 1);
-
-		vy_log_index_prune(old_index, gc_lsn);
-		vy_log_insert_range(new_index->space_id, new_index->id,
-				    range->id, NULL, NULL);
-		vy_log_truncate_index(new_index->space_id, new_index->id,
-				      new_index->truncate_count);
-	}
-	vy_log_tx_try_commit();
-
-	/*
-	 * After we committed space truncation in the metadata log,
-	 * we can make new indexes eligible for dump and compaction.
-	 */
-	for (uint32_t i = 0; i < index_count; i++) {
-		struct vy_index *index = vy_index(new_space->index[i]);
-		vy_scheduler_add_index(&env->scheduler, index);
-	}
-}
-
-static int
 vinyl_space_prepare_alter(struct space *old_space, struct space *new_space)
 {
 	struct vy_env *env = vy_env(old_space->engine);
@@ -1306,15 +1174,12 @@ vinyl_index_bsize(struct index *base)
  * either.
  */
 static inline bool
-vy_is_committed_one(struct vy_env *env, struct space *space,
-		    struct vy_index *index)
+vy_is_committed_one(struct vy_env *env, struct vy_index *index)
 {
 	if (likely(env->status != VINYL_FINAL_RECOVERY_LOCAL))
 		return false;
 	if (index->is_dropped)
 		return true;
-	if (index->truncate_count > space->truncate_count)
-		return true;
 	if (vclock_sum(env->recovery_vclock) <= index->dump_lsn)
 		return true;
 	return false;
@@ -1331,7 +1196,7 @@ vy_is_committed(struct vy_env *env, struct space *space)
 		return false;
 	for (uint32_t iid = 0; iid < space->index_count; iid++) {
 		struct vy_index *index = vy_index(space->index[iid]);
-		if (!vy_is_committed_one(env, space, index))
+		if (!vy_is_committed_one(env, index))
 			return false;
 	}
 	return true;
@@ -1567,7 +1432,7 @@ vy_replace_impl(struct vy_env *env, struct vy_tx *tx, struct space *space,
 	if (pk == NULL) /* space has no primary key */
 		return -1;
 	/* Primary key is dumped last. */
-	assert(!vy_is_committed_one(env, space, pk));
+	assert(!vy_is_committed_one(env, pk));
 	assert(pk->id == 0);
 	if (tuple_validate_raw(pk->mem_format, request->tuple))
 		return -1;
@@ -1606,7 +1471,7 @@ vy_replace_impl(struct vy_env *env, struct vy_tx *tx, struct space *space,
 	for (uint32_t iid = 1; iid < space->index_count; ++iid) {
 		struct vy_index *index;
 		index = vy_index(space->index[iid]);
-		if (vy_is_committed_one(env, space, index))
+		if (vy_is_committed_one(env, index))
 			continue;
 		/*
 		 * Delete goes first, so if old and new keys
@@ -1739,7 +1604,7 @@ vy_delete_impl(struct vy_env *env, struct vy_tx *tx, struct space *space,
 	if (pk == NULL)
 		return -1;
 	/* Primary key is dumped last. */
-	assert(!vy_is_committed_one(env, space, pk));
+	assert(!vy_is_committed_one(env, pk));
 	struct tuple *delete =
 		vy_stmt_new_surrogate_delete(pk->mem_format, tuple);
 	if (delete == NULL)
@@ -1751,7 +1616,7 @@ vy_delete_impl(struct vy_env *env, struct vy_tx *tx, struct space *space,
 	struct vy_index *index;
 	for (uint32_t i = 1; i < space->index_count; ++i) {
 		index = vy_index(space->index[i]);
-		if (vy_is_committed_one(env, space, index))
+		if (vy_is_committed_one(env, index))
 			continue;
 		if (vy_tx_set(tx, index, delete) != 0)
 			goto error;
@@ -1899,7 +1764,7 @@ vy_update(struct vy_env *env, struct vy_tx *tx, struct txn_stmt *stmt,
 	assert(pk != NULL);
 	assert(pk->id == 0);
 	/* Primary key is dumped last. */
-	assert(!vy_is_committed_one(env, space, pk));
+	assert(!vy_is_committed_one(env, pk));
 	uint64_t column_mask = 0;
 	const char *new_tuple, *new_tuple_end;
 	uint32_t new_size, old_size;
@@ -1953,7 +1818,7 @@ vy_update(struct vy_env *env, struct vy_tx *tx, struct txn_stmt *stmt,
 
 	for (uint32_t i = 1; i < space->index_count; ++i) {
 		index = vy_index(space->index[i]);
-		if (vy_is_committed_one(env, space, index))
+		if (vy_is_committed_one(env, index))
 			continue;
 		if (vy_tx_set(tx, index, delete) != 0)
 			goto error;
@@ -2137,7 +2002,7 @@ vy_upsert(struct vy_env *env, struct vy_tx *tx, struct txn_stmt *stmt,
 	if (pk == NULL)
 		return -1;
 	/* Primary key is dumped last. */
-	assert(!vy_is_committed_one(env, space, pk));
+	assert(!vy_is_committed_one(env, pk));
 	if (tuple_validate_raw(pk->mem_format, tuple))
 		return -1;
 
@@ -2233,7 +2098,7 @@ vy_upsert(struct vy_env *env, struct vy_tx *tx, struct txn_stmt *stmt,
 
 	for (uint32_t i = 1; i < space->index_count; ++i) {
 		index = vy_index(space->index[i]);
-		if (vy_is_committed_one(env, space, index))
+		if (vy_is_committed_one(env, index))
 			continue;
 		if (vy_tx_set(tx, index, delete) != 0)
 			goto error;
@@ -2273,7 +2138,7 @@ vy_insert(struct vy_env *env, struct vy_tx *tx, struct txn_stmt *stmt,
 		return -1;
 	assert(pk->id == 0);
 	/* Primary key is dumped last. */
-	assert(!vy_is_committed_one(env, space, pk));
+	assert(!vy_is_committed_one(env, pk));
 	if (tuple_validate_raw(pk->mem_format, request->tuple))
 		return -1;
 	/* First insert into the primary index. */
@@ -2286,7 +2151,7 @@ vy_insert(struct vy_env *env, struct vy_tx *tx, struct txn_stmt *stmt,
 
 	for (uint32_t iid = 1; iid < space->index_count; ++iid) {
 		struct vy_index *index = vy_index(space->index[iid]);
-		if (vy_is_committed_one(env, space, index))
+		if (vy_is_committed_one(env, index))
 			continue;
 		if (vy_insert_secondary(env, tx, space, index,
 					stmt->new_tuple) != 0)
@@ -4055,8 +3920,6 @@ static const struct space_vtab vinyl_space_vtab = {
 	/* .drop_primary_key = */ vinyl_space_drop_primary_key,
 	/* .check_format = */ vinyl_space_check_format,
 	/* .build_secondary_key = */ vinyl_space_build_secondary_key,
-	/* .prepare_truncate = */ vinyl_space_prepare_truncate,
-	/* .commit_truncate = */ vinyl_space_commit_truncate,
 	/* .prepare_alter = */ vinyl_space_prepare_alter,
 	/* .commit_alter = */ vinyl_space_commit_alter,
 };
diff --git a/src/box/vy_index.c b/src/box/vy_index.c
index 5a994c47..50c2bff9 100644
--- a/src/box/vy_index.c
+++ b/src/box/vy_index.c
@@ -317,23 +317,8 @@ vy_index_delete(struct vy_index *index)
 	free(index);
 }
 
-void
-vy_index_swap(struct vy_index *old_index, struct vy_index *new_index)
-{
-	assert(old_index->stat.memory.count.rows == 0);
-	assert(new_index->stat.memory.count.rows == 0);
-
-	SWAP(old_index->dump_lsn, new_index->dump_lsn);
-	SWAP(old_index->range_count, new_index->range_count);
-	SWAP(old_index->run_count, new_index->run_count);
-	SWAP(old_index->stat, new_index->stat);
-	SWAP(old_index->run_hist, new_index->run_hist);
-	SWAP(old_index->tree, new_index->tree);
-	SWAP(old_index->range_heap, new_index->range_heap);
-	rlist_swap(&old_index->runs, &new_index->runs);
-}
-
-int
+/** Initialize the range tree of a new index. */
+static int
 vy_index_init_range_tree(struct vy_index *index)
 {
 	struct vy_range *range = vy_range_new(vy_log_next_id(), NULL, NULL,
@@ -587,11 +572,6 @@ vy_index_recover(struct vy_index *index, struct vy_recovery *recovery,
 		 */
 		index->is_dropped = true;
 		/*
-		 * If the index was dropped, we don't need to replay
-		 * truncate (see vinyl_space_prepare_truncate()).
-		 */
-		index->truncate_count = UINT64_MAX;
-		/*
 		 * We need range tree initialized for all indexes,
 		 * even for dropped ones.
 		 */
@@ -602,7 +582,6 @@ vy_index_recover(struct vy_index *index, struct vy_recovery *recovery,
 	 * Loading the last incarnation of the index from vylog.
 	 */
 	index->dump_lsn = index_info->dump_lsn;
-	index->truncate_count = index_info->truncate_count;
 
 	int rc = 0;
 	struct vy_range_recovery_info *range_info;
diff --git a/src/box/vy_index.h b/src/box/vy_index.h
index 5a4aa29b..1f1f7236 100644
--- a/src/box/vy_index.h
+++ b/src/box/vy_index.h
@@ -265,15 +265,6 @@ struct vy_index {
 	 */
 	bool is_dropped;
 	/**
-	 * Number of times the index was truncated.
-	 *
-	 * After recovery is complete, it equals space->truncate_count.
-	 * On local recovery, it is loaded from the metadata log and may
-	 * be greater than space->truncate_count, which indicates that
-	 * the space is truncated in WAL.
-	 */
-	uint64_t truncate_count;
-	/**
 	 * If pin_count > 0 the index can't be scheduled for dump.
 	 * Used to make sure that the primary index is dumped last.
 	 */
@@ -344,19 +335,6 @@ vy_index_unref(struct vy_index *index)
 }
 
 /**
- * Swap disk contents (ranges, runs, and corresponding stats)
- * between two indexes. Used only on recovery, to skip reloading
- * indexes of a truncated space. The in-memory tree of the index
- * can't be populated - see vy_is_committed_one().
- */
-void
-vy_index_swap(struct vy_index *old_index, struct vy_index *new_index);
-
-/** Initialize the range tree of a new index. */
-int
-vy_index_init_range_tree(struct vy_index *index);
-
-/**
  * Create a new vinyl index.
  *
  * This function is called when an index is created after recovery
diff --git a/src/box/vy_log.c b/src/box/vy_log.c
index cbd6dc16..d96a154c 100644
--- a/src/box/vy_log.c
+++ b/src/box/vy_log.c
@@ -257,10 +257,6 @@ vy_log_record_snprint(char *buf, int size, const struct vy_log_record *record)
 		SNPRINT(total, snprintf, buf, size, "%s=%"PRIi64", ",
 			vy_log_key_name[VY_LOG_KEY_GC_LSN],
 			record->gc_lsn);
-	if (record->truncate_count > 0)
-		SNPRINT(total, snprintf, buf, size, "%s=%"PRIi64", ",
-			vy_log_key_name[VY_LOG_KEY_TRUNCATE_COUNT],
-			record->truncate_count);
 	SNPRINT(total, snprintf, buf, size, "}");
 	return total;
 }
@@ -362,11 +358,6 @@ vy_log_record_encode(const struct vy_log_record *record,
 		size += mp_sizeof_uint(record->gc_lsn);
 		n_keys++;
 	}
-	if (record->truncate_count > 0) {
-		size += mp_sizeof_uint(VY_LOG_KEY_TRUNCATE_COUNT);
-		size += mp_sizeof_uint(record->truncate_count);
-		n_keys++;
-	}
 	size += mp_sizeof_map(n_keys);
 
 	/*
@@ -429,10 +420,6 @@ vy_log_record_encode(const struct vy_log_record *record,
 		pos = mp_encode_uint(pos, VY_LOG_KEY_GC_LSN);
 		pos = mp_encode_uint(pos, record->gc_lsn);
 	}
-	if (record->truncate_count > 0) {
-		pos = mp_encode_uint(pos, VY_LOG_KEY_TRUNCATE_COUNT);
-		pos = mp_encode_uint(pos, record->truncate_count);
-	}
 	assert(pos == tuple + size);
 
 	/*
@@ -549,7 +536,7 @@ vy_log_record_decode(struct vy_log_record *record,
 			record->gc_lsn = mp_decode_uint(&pos);
 			break;
 		case VY_LOG_KEY_TRUNCATE_COUNT:
-			record->truncate_count = mp_decode_uint(&pos);
+			/* Not used anymore, ignore. */
 			break;
 		default:
 			diag_set(ClientError, ER_INVALID_VYLOG_FILE,
@@ -1278,7 +1265,6 @@ vy_recovery_create_index(struct vy_recovery *recovery,
 	index->key_part_count = key_part_count;
 	index->is_dropped = false;
 	index->dump_lsn = -1;
-	index->truncate_count = 0;
 
 	if (id->index_lsn != 0) {
 		/*
@@ -1384,14 +1370,22 @@ vy_recovery_dump_index(struct vy_recovery *recovery,
 
 /**
  * Handle a VY_LOG_TRUNCATE_INDEX log record.
- * This function updates truncate_count of the index with ID @id.
+ * This function bumps the incarnation counter of the index with ID @id.
  * Returns 0 on success, -1 if ID not found or index is dropped.
  */
 static int
 vy_recovery_truncate_index(struct vy_recovery *recovery,
-			   const struct vy_index_recovery_id *id,
-			   int64_t truncate_count)
+			   const struct vy_index_recovery_id *id)
 {
+	if (!recovery->seen_snapshot) {
+		/*
+		 * This isn't a real truncation - it can't be
+		 * as we are recovering the snapshot. The record
+		 * must have been written to initialize truncation
+		 * counter, which is not used anymore, so ignore.
+		 */
+		return 0;
+	}
 	struct vy_index_recovery_info *index;
 	index = vy_recovery_lookup_index(recovery, id);
 	if (index == NULL) {
@@ -1406,7 +1400,7 @@ vy_recovery_truncate_index(struct vy_recovery *recovery,
 				    vy_index_recovery_id_str(id)));
 		return -1;
 	}
-	index->truncate_count = truncate_count;
+	index->incarnation_count++;
 	return 0;
 }
 
@@ -1858,8 +1852,7 @@ vy_recovery_process_record(struct vy_recovery *recovery,
 					    record->dump_lsn);
 		break;
 	case VY_LOG_TRUNCATE_INDEX:
-		rc = vy_recovery_truncate_index(recovery, &index_id,
-						record->truncate_count);
+		rc = vy_recovery_truncate_index(recovery, &index_id);
 		break;
 	default:
 		unreachable();
@@ -2073,16 +2066,6 @@ vy_log_append_index(struct xlog *xlog, struct vy_index_recovery_info *index)
 	if (vy_log_append_record(xlog, &record) != 0)
 		return -1;
 
-	if (index->truncate_count > 0) {
-		vy_log_record_init(&record);
-		record.type = VY_LOG_TRUNCATE_INDEX;
-		record.index_id = index->index_id;
-		record.space_id = index->space_id;
-		record.truncate_count = index->truncate_count;
-		if (vy_log_append_record(xlog, &record) != 0)
-			return -1;
-	}
-
 	if (index->dump_lsn >= 0) {
 		vy_log_record_init(&record);
 		record.type = VY_LOG_DUMP_INDEX;
diff --git a/src/box/vy_log.h b/src/box/vy_log.h
index 67dcd418..723f4d8e 100644
--- a/src/box/vy_log.h
+++ b/src/box/vy_log.h
@@ -154,8 +154,11 @@ enum vy_log_record_type {
 	 */
 	VY_LOG_SNAPSHOT			= 11,
 	/**
-	 * Update truncate count of a vinyl index.
-	 * Requires vy_log_record::space_id, index_id, truncate_count.
+	 * Truncate an index.
+	 * Requires vy_log_record::space_id, index_id.
+	 *
+	 * This is basically a shortcut for DROP + CREATE.
+	 * Not used anymore, left for backward compatibility.
 	 */
 	VY_LOG_TRUNCATE_INDEX		= 12,
 
@@ -208,8 +211,6 @@ struct vy_log_record {
 	 * that uses this run.
 	 */
 	int64_t gc_lsn;
-	/** Index truncate count. */
-	int64_t truncate_count;
 	/** Link in vy_log::tx. */
 	struct stailq_entry in_tx;
 };
@@ -278,8 +279,6 @@ struct vy_index_recovery_info {
 	bool is_dropped;
 	/** LSN of the last index dump. */
 	int64_t dump_lsn;
-	/** Truncate count. */
-	int64_t truncate_count;
 	/**
 	 * Number of incarnations the index has had
 	 * since the last checkpoint.
@@ -672,20 +671,6 @@ vy_log_dump_index(uint32_t space_id, uint32_t index_id, int64_t dump_lsn)
 	vy_log_write(&record);
 }
 
-/** Helper to log index truncation. */
-static inline void
-vy_log_truncate_index(uint32_t space_id, uint32_t index_id,
-		      int64_t truncate_count)
-{
-	struct vy_log_record record;
-	vy_log_record_init(&record);
-	record.type = VY_LOG_TRUNCATE_INDEX;
-	record.space_id = space_id;
-	record.index_id = index_id;
-	record.truncate_count = truncate_count;
-	vy_log_write(&record);
-}
-
 #if defined(__cplusplus)
 } /* extern "C" */
 #endif /* defined(__cplusplus) */
-- 
2.11.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/4] vinyl: do not use index lsn to identify indexes in vylog
  2018-03-17 17:56 ` [PATCH 2/4] vinyl: do not use index lsn to identify indexes in vylog Vladimir Davydov
@ 2018-03-19 15:01   ` Vladimir Davydov
  0 siblings, 0 replies; 6+ messages in thread
From: Vladimir Davydov @ 2018-03-19 15:01 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

Please ignore this patch for now. We can't use space_id/index_id for
identifying indexes in vylog after all. The problem is during ALTER we
have to maintain two indexes with the same space_id/index_id. It seems
that we need a unique index identifier in vylog after all, and it can't
be LSN. I'll rework the patch.

On Sat, Mar 17, 2018 at 08:56:35PM +0300, Vladimir Davydov wrote:
> An index can be dropped and then recreated with the same space/index id.
> To discriminate between different incarnations of the same index during
> recovery, we use LSN from the time of index creation, as it is supposed
> to be unique. However, the uniqueness property won't hold once ALTER is
> implemented for vinyl spaces: the problem is we have to rebuild all
> indexes of a space and commit them simultaneously in case the primary
> index is ALTERED.
> 
> Actually, we don't really need a unique index ID stored both in vylog
> and in WAL to recover all incarnations of vinyl indexes. Since WAL is
> replayed in the chronological order, we can find the index corresponding
> to index creation statement replayed from WAL by its space_id/index_id
> and the number of times the index has been dropped since we started to
> replay WAL. As we don't actually recover indexes that are dropped during
> WAL recovery, we don't need to know their properties, we just need to
> know that they existed. So we don't need to store a separate object for
> each index incarnation in the recovery context, it's enough to add a
> counter representing the number of incarnations the index has had since
> the last checkpoint to the vy_index_recovery_info structure to be able
> to recover all incarnations of the same vinyl index. Then on recovery,
> vy_index_recover() would check the incarnation counter of the index and
> 
>  - recover the last incarnation if it is 1
>  - emit create/drop record if it is > 1
>  - do nothing if it is 0
> 
> and decrement the counter on success so that the next call to
> vy_index_recover() with the same space and index id proceeds to
> the next incarnation of this index. And this is what this patch
> does in a nutshell.
> 
> The tricky part is preserving backward compatibility. We can't just
> replace index_lsn with space_id/index_id everywhere, because there
> still can be records that use index_lsn to identify an index and we
> need to process them. For example, VY_LOG_CREATE_INDEX is fine in
> this regard, because it contains both space_id/index_id and index_lsn,
> but VY_LOG_DROP_INDEX isn't, as it only has index_lsn. So we have to
> maintain two index hash tables during vylog recovery instead of just
> one: one for new records where index is referenced by space_id/index_id
> and another for legacy records where index_lsn is used instead.
> ---
>  src/box/vinyl.c          |  39 ++-----
>  src/box/vy_index.c       |  54 ++++-----
>  src/box/vy_index.h       |  12 +-
>  src/box/vy_log.c         | 286 ++++++++++++++++++++++++++++-------------------
>  src/box/vy_log.h         |  97 +++++++++++-----
>  src/box/vy_scheduler.c   |  12 +-
>  test/unit/vy_log_stub.c  |   2 +-
>  test/vinyl/layout.result |  32 +++---
>  8 files changed, 300 insertions(+), 234 deletions(-)

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-03-19 15:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-17 17:56 [PATCH 0/4] Prepare vylog for space alter Vladimir Davydov
2018-03-17 17:56 ` [PATCH 1/4] vinyl: refactor vylog recovery Vladimir Davydov
2018-03-17 17:56 ` [PATCH 2/4] vinyl: do not use index lsn to identify indexes in vylog Vladimir Davydov
2018-03-19 15:01   ` Vladimir Davydov
2018-03-17 17:56 ` [PATCH 3/4] index: remove lsn from commit_create arguments Vladimir Davydov
2018-03-17 17:56 ` [PATCH 4/4] alter: rewrite space truncation using alter infrastructure Vladimir Davydov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox