Tarantool development patches archive
 help / color / mirror / Atom feed
* [PATCH 0/5] alter: fix WAL error handling
@ 2018-04-03 17:37 Vladimir Davydov
  2018-04-03 17:37 ` [PATCH 1/5] memtx: rtree: remove pointless index_vtab::begin_build implementation Vladimir Davydov
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Vladimir Davydov @ 2018-04-03 17:37 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

This patch set fixes the use-after-free error that may occur if DDL
fails due to WAL error, see [1]. The code is available by the link [2].
It is based on top of [3, 4].

Note, the first three patches are trivial cleanups. They are not
directly connected to the issue in question, but they are all related to
the problem of vinyl space alter [5], which I'm currently working on.
The real work is done by patches 4 and 5.

[1] https://github.com/tarantool/tarantool/issues/3289
[2] https://github.com/tarantool/tarantool/commits/gh-3289-alter-fix-wal-error-handling
[3] https://github.com/tarantool/tarantool/commits/vy-allow-to-extend-key-def
[4] https://www.freelists.org/post/tarantool-patches/PATCH-0012-vinyl-allow-to-extend-key-def-of-nonempty-index
[5] https://github.com/tarantool/tarantool/issues/1653

Vladimir Davydov (5):
  memtx: rtree: remove pointless index_vtab::begin_build implementation
  memtx: don't call begin_buid and end_build for new pk after recovery
  vinyl: use disk_format in vy_run_rebuild_index
  vinyl: do not use space_vtab::commit_alter for preparing new indexes
  alter: call space_vtab::commit_alter after WAL write

 src/box/alter.cc         | 23 +++-----------
 src/box/memtx_rtree.c    |  9 +-----
 src/box/memtx_space.c    | 38 ++++++++---------------
 src/box/memtx_space.h    |  6 ++++
 src/box/vinyl.c          | 79 +++++++++++++++++++++---------------------------
 src/box/vy_lsm.c         |  2 +-
 src/box/vy_lsm.h         | 17 +++++++++++
 src/box/vy_run.c         |  4 +--
 src/box/vy_run.h         | 14 +++++----
 test/box/errinj.result   | 66 ++++++++++++++++++++++++++++++++++++++++
 test/box/errinj.test.lua | 18 +++++++++++
 11 files changed, 170 insertions(+), 106 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/5] memtx: rtree: remove pointless index_vtab::begin_build implementation
  2018-04-03 17:37 [PATCH 0/5] alter: fix WAL error handling Vladimir Davydov
@ 2018-04-03 17:37 ` Vladimir Davydov
  2018-04-05 20:25   ` Konstantin Osipov
  2018-04-03 17:37 ` [PATCH 2/5] memtx: don't call begin_buid and end_build for new pk after recovery Vladimir Davydov
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Vladimir Davydov @ 2018-04-03 17:37 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

The rtree is empty when this function is called (in fact, it is called
right after creating the index), there's no need to purge it.
---
 src/box/memtx_rtree.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/src/box/memtx_rtree.c b/src/box/memtx_rtree.c
index 7cd3ac30..ff213922 100644
--- a/src/box/memtx_rtree.c
+++ b/src/box/memtx_rtree.c
@@ -285,13 +285,6 @@ memtx_rtree_index_create_iterator(struct index *base,  enum iterator_type type,
 	return (struct iterator *)it;
 }
 
-static void
-memtx_rtree_index_begin_build(struct index *base)
-{
-	struct memtx_rtree_index *index = (struct memtx_rtree_index *)base;
-	rtree_purge(&index->tree);
-}
-
 static const struct index_vtab memtx_rtree_index_vtab = {
 	/* .destroy = */ memtx_rtree_index_destroy,
 	/* .commit_create = */ generic_index_commit_create,
@@ -313,7 +306,7 @@ static const struct index_vtab memtx_rtree_index_vtab = {
 		generic_index_create_snapshot_iterator,
 	/* .info = */ generic_index_info,
 	/* .reset_stat = */ generic_index_reset_stat,
-	/* .begin_build = */ memtx_rtree_index_begin_build,
+	/* .begin_build = */ generic_index_begin_build,
 	/* .reserve = */ generic_index_reserve,
 	/* .build_next = */ generic_index_build_next,
 	/* .end_build = */ generic_index_end_build,
-- 
2.11.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/5] memtx: don't call begin_buid and end_build for new pk after recovery
  2018-04-03 17:37 [PATCH 0/5] alter: fix WAL error handling Vladimir Davydov
  2018-04-03 17:37 ` [PATCH 1/5] memtx: rtree: remove pointless index_vtab::begin_build implementation Vladimir Davydov
@ 2018-04-03 17:37 ` Vladimir Davydov
  2018-04-03 17:37 ` [PATCH 3/5] vinyl: use disk_format in vy_run_rebuild_index Vladimir Davydov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Vladimir Davydov @ 2018-04-03 17:37 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

Basically, index_begin_build() followed by index_end_build() is a no-op.
There's absolutely no point in calling it for primary indexes after
initial recovery has completed.
---
 src/box/memtx_space.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
index e0fa3e2c..ec6f6db6 100644
--- a/src/box/memtx_space.c
+++ b/src/box/memtx_space.c
@@ -680,12 +680,12 @@ memtx_space_create_index(struct space *space, struct index_def *index_def)
  * then the primary key is not built, otherwise it's created
  * right away.
  */
-static void
-memtx_space_do_add_primary_key(struct space *space,
-			       enum memtx_recovery_state state)
+static int
+memtx_space_add_primary_key(struct space *space)
 {
 	struct memtx_space *memtx_space = (struct memtx_space *)space;
-	switch (state) {
+	struct memtx_engine *memtx = (struct memtx_engine *)space->engine;
+	switch (memtx->state) {
 	case MEMTX_INITIALIZED:
 		panic("can't create a new space before snapshot recovery");
 		break;
@@ -694,23 +694,12 @@ memtx_space_do_add_primary_key(struct space *space,
 		memtx_space->replace = memtx_space_replace_build_next;
 		break;
 	case MEMTX_FINAL_RECOVERY:
-		index_begin_build(space->index[0]);
-		index_end_build(space->index[0]);
 		memtx_space->replace = memtx_space_replace_primary_key;
 		break;
 	case MEMTX_OK:
-		index_begin_build(space->index[0]);
-		index_end_build(space->index[0]);
 		memtx_space->replace = memtx_space_replace_all_keys;
 		break;
 	}
-}
-
-static int
-memtx_space_add_primary_key(struct space *space)
-{
-	struct memtx_engine *memtx = (struct memtx_engine *)space->engine;
-	memtx_space_do_add_primary_key(space, memtx->state);
 	return 0;
 }
 
@@ -752,7 +741,8 @@ memtx_space_drop_primary_key(struct space *space)
 static void
 memtx_init_system_space(struct space *space)
 {
-	memtx_space_do_add_primary_key(space, MEMTX_OK);
+	struct memtx_space *memtx_space = (struct memtx_space *)space;
+	memtx_space->replace = memtx_space_replace_all_keys;
 }
 
 static int
-- 
2.11.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 3/5] vinyl: use disk_format in vy_run_rebuild_index
  2018-04-03 17:37 [PATCH 0/5] alter: fix WAL error handling Vladimir Davydov
  2018-04-03 17:37 ` [PATCH 1/5] memtx: rtree: remove pointless index_vtab::begin_build implementation Vladimir Davydov
  2018-04-03 17:37 ` [PATCH 2/5] memtx: don't call begin_buid and end_build for new pk after recovery Vladimir Davydov
@ 2018-04-03 17:37 ` Vladimir Davydov
  2018-04-05 20:25   ` Konstantin Osipov
  2018-04-03 17:37 ` [PATCH 4/5] vinyl: do not use space_vtab::commit_alter for preparing new indexes Vladimir Davydov
  2018-04-03 17:37 ` [PATCH 5/5] alter: call space_vtab::commit_alter after WAL write Vladimir Davydov
  4 siblings, 1 reply; 10+ messages in thread
From: Vladimir Davydov @ 2018-04-03 17:37 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

We read tuples from disk hence we should use disk_format, not
mem_format. Fix it. While we are at it, let's also update the
outdated comment to vy_run_rebuild_index.
---
 src/box/vy_lsm.c |  2 +-
 src/box/vy_run.c |  4 ++--
 src/box/vy_run.h | 14 ++++++++------
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
index a8ed39ad..73196b10 100644
--- a/src/box/vy_lsm.c
+++ b/src/box/vy_lsm.c
@@ -353,7 +353,7 @@ vy_lsm_recover_run(struct vy_lsm *lsm, struct vy_run_recovery_info *run_info,
 	     vy_run_rebuild_index(run, lsm->env->path,
 				  lsm->space_id, lsm->index_id,
 				  lsm->cmp_def, lsm->key_def,
-				  lsm->mem_format, &lsm->opts) != 0)) {
+				  lsm->disk_format, &lsm->opts) != 0)) {
 		vy_run_unref(run);
 		return NULL;
 	}
diff --git a/src/box/vy_run.c b/src/box/vy_run.c
index 637f63fa..c12fb9c5 100644
--- a/src/box/vy_run.c
+++ b/src/box/vy_run.c
@@ -2201,7 +2201,7 @@ vy_run_rebuild_index(struct vy_run *run, const char *dir,
 		     uint32_t space_id, uint32_t iid,
 		     const struct key_def *cmp_def,
 		     const struct key_def *key_def,
-		     struct tuple_format *mem_format,
+		     struct tuple_format *format,
 		     const struct index_opts *opts)
 {
 	assert(run->info.bloom == NULL);
@@ -2255,7 +2255,7 @@ vy_run_rebuild_index(struct vy_run *run, const char *dir,
 			}
 			++page_row_count;
 			struct tuple *tuple = vy_stmt_decode(&xrow, cmp_def,
-							mem_format, iid == 0);
+							     format, iid == 0);
 			if (tuple == NULL)
 				goto close_err;
 			if (bloom_builder != NULL) {
diff --git a/src/box/vy_run.h b/src/box/vy_run.h
index 1e8bf740..d31dd645 100644
--- a/src/box/vy_run.h
+++ b/src/box/vy_run.h
@@ -350,21 +350,23 @@ vy_run_recover(struct vy_run *run, const char *dir,
 	       uint32_t space_id, uint32_t iid);
 
 /**
- * Rebuild vy_run index
- * @param run - run to laod
+ * Rebuild run index
+ * @param run - run to rebuild index for
  * @param dir - path to the vinyl directory
  * @param space_id - space id
  * @param iid - index id
- * @param key_def index key definition
- * @param bloom_fpr bloom filter param
+ * @param cmp_def - key definition with primary key parts
+ * @param key_def - user defined key definition
+ * @param format - format for allocating tuples read from disk
+ * @param opts - index options
  * @return - 0 on sucess, -1 on fail
  */
 int
 vy_run_rebuild_index(struct vy_run *run, const char *dir,
 		     uint32_t space_id, uint32_t iid,
+		     const struct key_def *cmp_def,
 		     const struct key_def *key_def,
-		     const struct key_def *user_key_def,
-		     struct tuple_format *mem_format,
+		     struct tuple_format *format,
 		     const struct index_opts *opts);
 
 enum vy_file_type {
-- 
2.11.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 4/5] vinyl: do not use space_vtab::commit_alter for preparing new indexes
  2018-04-03 17:37 [PATCH 0/5] alter: fix WAL error handling Vladimir Davydov
                   ` (2 preceding siblings ...)
  2018-04-03 17:37 ` [PATCH 3/5] vinyl: use disk_format in vy_run_rebuild_index Vladimir Davydov
@ 2018-04-03 17:37 ` Vladimir Davydov
  2018-04-03 17:37 ` [PATCH 5/5] alter: call space_vtab::commit_alter after WAL write Vladimir Davydov
  4 siblings, 0 replies; 10+ messages in thread
From: Vladimir Davydov @ 2018-04-03 17:37 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

Currently, space_vtab::commit_alter is called before WAL write so we can
use it for preparing new indexes in vinyl. However, this is going to
change soon, because actually space_vtab::commit_alter should be called
after WAL write, like index_vtab::commit_drop or commit_create. Calling
it before WAL write may result in use-after-free in memtx (see #3289).
Besides, using this function for iterating over space indexes just feels
wrong, as we have index methods invoked by AlterSpaceOp descendants for
this.

So let's move all the operations performed by vinyl_space_commit_alter
somewhere else. Fortunately, it's easy to do without damaging code
readability or efficiency:

 - Update of vy_lsm::pk can be done by vinyl_space_swap_index and
   vinyl_build_secondary_key.

 - vy_lsm->check_is_unique can be computed by vinyl_engine_create_space
   and then set by vinyl_space_swap_index.

 - Call to key_def_update_optionality is implied by key_def_swap, which
   is already called by vinyl_space_swap_index, hence it can be removed.

Needed for #3289
---
 src/box/vinyl.c  | 79 ++++++++++++++++++++++++--------------------------------
 src/box/vy_lsm.h | 17 ++++++++++++
 2 files changed, 51 insertions(+), 45 deletions(-)

diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index d503704e..01755d7d 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -643,6 +643,27 @@ vinyl_engine_create_space(struct engine *engine, struct space_def *def,
 
 	/* Format is now referenced by the space. */
 	tuple_format_unref(format);
+
+	/*
+	 * Check if there are unique indexes that are contained
+	 * by other unique indexes. For them, we can skip check
+	 * for duplicates on INSERT. Prefer indexes with higher
+	 * ids for uniqueness check optimization as they are
+	 * likelier to have a "colder" cache.
+	 */
+	for (int i = space->index_count - 1; i >= 0; i--) {
+		struct vy_lsm *lsm = vy_lsm(space->index[i]);
+		if (!lsm->check_is_unique)
+			continue;
+		for (int j = 0; j < (int)space->index_count; j++) {
+			struct vy_lsm *other = vy_lsm(space->index[j]);
+			if (other != lsm && other->check_is_unique &&
+			    key_def_contains(lsm->key_def, other->key_def)) {
+				lsm->check_is_unique = false;
+				break;
+			}
+		}
+	}
 	return space;
 }
 
@@ -1035,49 +1056,7 @@ static void
 vinyl_space_commit_alter(struct space *old_space, struct space *new_space)
 {
 	(void)old_space;
-	if (new_space == NULL || new_space->index_count == 0)
-		return; /* space drop */
-
-	struct tuple_format *new_format = new_space->format;
-	struct vy_lsm *pk = vy_lsm(new_space->index[0]);
-	assert(pk->pk == NULL);
-
-	pk->check_is_unique = true;
-	key_def_update_optionality(pk->key_def, new_format->min_field_count);
-	key_def_update_optionality(pk->cmp_def, new_format->min_field_count);
-
-	for (uint32_t i = 1; i < new_space->index_count; ++i) {
-		struct vy_lsm *lsm = vy_lsm(new_space->index[i]);
-		vy_lsm_unref(lsm->pk);
-		vy_lsm_ref(pk);
-		lsm->pk = pk;
-		lsm->check_is_unique = lsm->opts.is_unique;
-		key_def_update_optionality(lsm->key_def,
-					   new_format->min_field_count);
-		key_def_update_optionality(lsm->cmp_def,
-					   new_format->min_field_count);
-	}
-
-	/*
-	 * Check if there are unique indexes that are contained
-	 * by other unique indexes. For them, we can skip check
-	 * for duplicates on INSERT. Prefer indexes with higher
-	 * ids for uniqueness check optimization as they are
-	 * likelier to have a "colder" cache.
-	 */
-	for (int i = new_space->index_count - 1; i >= 0; i--) {
-		struct vy_lsm *lsm = vy_lsm(new_space->index[i]);
-		if (!lsm->check_is_unique)
-			continue;
-		for (int j = 0; j < (int)new_space->index_count; j++) {
-			struct vy_lsm *other = vy_lsm(new_space->index[j]);
-			if (other != lsm && other->check_is_unique &&
-			    key_def_contains(lsm->key_def, other->key_def)) {
-				lsm->check_is_unique = false;
-				break;
-			}
-		}
-	}
+	(void)new_space;
 }
 
 static void
@@ -1094,6 +1073,8 @@ vinyl_space_swap_index(struct space *old_space, struct space *new_space,
 	generic_space_swap_index(old_space, new_space,
 				 old_index_id, new_index_id);
 
+	SWAP(old_lsm, new_lsm);
+	SWAP(old_lsm->check_is_unique, new_lsm->check_is_unique);
 	SWAP(old_lsm->mem_format, new_lsm->mem_format);
 	SWAP(old_lsm->mem_format_with_colmask,
 	     new_lsm->mem_format_with_colmask);
@@ -1101,6 +1082,10 @@ vinyl_space_swap_index(struct space *old_space, struct space *new_space,
 	SWAP(old_lsm->opts, new_lsm->opts);
 	key_def_swap(old_lsm->key_def, new_lsm->key_def);
 	key_def_swap(old_lsm->cmp_def, new_lsm->cmp_def);
+
+	/* Update pointer to the primary key. */
+	vy_lsm_update_pk(old_lsm, vy_lsm(old_space->index_map[0]));
+	vy_lsm_update_pk(new_lsm, vy_lsm(new_space->index_map[0]));
 }
 
 static int
@@ -1121,7 +1106,6 @@ vinyl_space_build_secondary_key(struct space *old_space,
 				struct index *new_index)
 {
 	(void)old_space;
-	(void)new_space;
 	/*
 	 * Unlike Memtx, Vinyl does not need building of a secondary index.
 	 * This is true because of two things:
@@ -1142,7 +1126,12 @@ vinyl_space_build_secondary_key(struct space *old_space,
 	 *   Engine::buildSecondaryKey(old_space, new_space, new_index_arg);
 	 *  but aware of three cases mentioned above.
 	 */
-	return vinyl_index_open(new_index);
+	if (vinyl_index_open(new_index) != 0)
+		return -1;
+
+	/* Set pointer to the primary key for the new index. */
+	vy_lsm_update_pk(vy_lsm(new_index), vy_lsm(space_index(new_space, 0)));
+	return 0;
 }
 
 static size_t
diff --git a/src/box/vy_lsm.h b/src/box/vy_lsm.h
index 201b5a56..2ef190ba 100644
--- a/src/box/vy_lsm.h
+++ b/src/box/vy_lsm.h
@@ -332,6 +332,23 @@ vy_lsm_unref(struct vy_lsm *lsm)
 }
 
 /**
+ * Update pointer to the primary key for an LSM tree.
+ * If called for an LSM tree corresponding to a primary
+ * index, this function does nothing.
+ */
+static inline void
+vy_lsm_update_pk(struct vy_lsm *lsm, struct vy_lsm *pk)
+{
+	if (lsm->index_id == 0) {
+		assert(lsm->pk == NULL);
+		return;
+	}
+	vy_lsm_unref(lsm->pk);
+	vy_lsm_ref(pk);
+	lsm->pk = pk;
+}
+
+/**
  * Create a new LSM tree.
  *
  * This function is called when an LSM tree is created
-- 
2.11.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 5/5] alter: call space_vtab::commit_alter after WAL write
  2018-04-03 17:37 [PATCH 0/5] alter: fix WAL error handling Vladimir Davydov
                   ` (3 preceding siblings ...)
  2018-04-03 17:37 ` [PATCH 4/5] vinyl: do not use space_vtab::commit_alter for preparing new indexes Vladimir Davydov
@ 2018-04-03 17:37 ` Vladimir Davydov
  2018-04-05 20:37   ` Konstantin Osipov
  4 siblings, 1 reply; 10+ messages in thread
From: Vladimir Davydov @ 2018-04-03 17:37 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

space_vtab::commit_alter is implemented only by memtx, which uses it to
delete tuples stored in the space and reset bsize when the primary index
is dropped. Currently, it's called before WAL write and so this can
result in use-after-free in memtx if WAL write fails. To avoid that,
this patch moves invocation of this method after WAL write.

A few things to note about this patch:

 - Space bsize should still be updated before WAL write, when the space
   becomes visible. So now we set it in space_vtab::prepare_alter and
   reset it in drop_primary_key, along with memtx_space::replace.

 - Before this patch, space_vtab::drop_primary_key was not called on
   space truncation, because DropIndex::alter only called it if the
   primary key was absent in the new space, which isn't true in case of
   space truncation. So this patch fixed this check: now we check the id
   of the dropped index directly, and call drop_primary_key if it's 0.

 - To discriminate between primary index rebuild and space truncation in
   space_vtab::commit_alter and make the right decision about whether we
   need to delete tuples stored in the space or not, we use a version
   counter attached to memtx_space. The counter is bumped every time the
   primary index is dropped, so if it differs between the old and new
   spaces in commit_alter, it was either primary index drop or space
   truncation and hence we need to purge the space.

Closes #3289
---
 src/box/alter.cc         | 23 +++--------------
 src/box/memtx_space.c    | 16 +++++-------
 src/box/memtx_space.h    |  6 +++++
 test/box/errinj.result   | 66 ++++++++++++++++++++++++++++++++++++++++++++++++
 test/box/errinj.test.lua | 18 +++++++++++++
 5 files changed, 101 insertions(+), 28 deletions(-)

diff --git a/src/box/alter.cc b/src/box/alter.cc
index 174d53fa..06ef2923 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -729,6 +729,8 @@ alter_space_commit(struct trigger *trigger, void *event)
 		op->commit(alter, txn->signature);
 	}
 
+	space_commit_alter(alter->old_space, alter->new_space);
+
 	trigger_run_xc(&on_alter_space, alter->new_space);
 
 	alter->new_space = NULL; /* for alter_space_delete(). */
@@ -864,8 +866,6 @@ alter_space_do(struct txn *txn, struct alter_space *alter)
 	 * The new space is ready. Time to update the space
 	 * cache with it.
 	 */
-	space_commit_alter(alter->old_space, alter->new_space);
-
 	struct space *old_space = space_cache_replace(alter->new_space);
 	(void) old_space;
 	assert(old_space == alter->old_space);
@@ -1026,23 +1026,8 @@ DropIndex::alter_def(struct alter_space * /* alter */)
 void
 DropIndex::alter(struct alter_space *alter)
 {
-	/*
-	 * If it's not the primary key, nothing to do --
-	 * the dropped index didn't exist in the new space
-	 * definition, so does not exist in the created space.
-	 */
-	if (space_index(alter->new_space, 0) != NULL)
-		return;
-	/*
-	 * OK to drop the primary key. Inform the engine about it,
-	 * since it may have to reset handler->replace function,
-	 * so that:
-	 * - DML returns proper errors rather than crashes the
-	 *   program
-	 * - when a new primary key is finally added, the space
-	 *   can be put back online properly.
-	 */
-	space_drop_primary_key(alter->new_space);
+	if (old_index_def->iid == 0)
+		space_drop_primary_key(alter->new_space);
 }
 
 void
diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
index ec6f6db6..adc81430 100644
--- a/src/box/memtx_space.c
+++ b/src/box/memtx_space.c
@@ -736,6 +736,8 @@ memtx_space_drop_primary_key(struct space *space)
 {
 	struct memtx_space *memtx_space = (struct memtx_space *)space;
 	memtx_space->replace = memtx_space_replace_no_keys;
+	memtx_space->bsize = 0;
+	memtx_space->version++;
 }
 
 static void
@@ -844,6 +846,8 @@ memtx_space_prepare_alter(struct space *old_space, struct space *new_space)
 	struct memtx_space *old_memtx_space = (struct memtx_space *)old_space;
 	struct memtx_space *new_memtx_space = (struct memtx_space *)new_space;
 	new_memtx_space->replace = old_memtx_space->replace;
+	new_memtx_space->bsize = old_memtx_space->bsize;
+	new_memtx_space->version = old_memtx_space->version;
 	bool is_empty = old_space->index_count == 0 ||
 			index_size(old_space->index[0]) == 0;
 	return space_def_check_compatibility(old_space->def,
@@ -855,17 +859,10 @@ memtx_space_commit_alter(struct space *old_space, struct space *new_space)
 {
 	struct memtx_space *old_memtx_space = (struct memtx_space *)old_space;
 	struct memtx_space *new_memtx_space = (struct memtx_space *)new_space;
-	bool is_empty = new_space->index_count == 0 ||
-			index_size(new_space->index[0]) == 0;
 
-	/*
-	 * Delete all tuples when the last index is dropped
-	 * or the space is truncated.
-	 */
-	if (is_empty)
+	/* Delete all tuples if the primary index was dropped. */
+	if (old_memtx_space->version != new_memtx_space->version)
 		memtx_space_prune(old_space);
-	else
-		new_memtx_space->bsize = old_memtx_space->bsize;
 }
 
 /* }}} DDL */
@@ -937,6 +934,7 @@ memtx_space_new(struct memtx_engine *memtx,
 	tuple_format_unref(format);
 
 	memtx_space->bsize = 0;
+	memtx_space->version = 0;
 	memtx_space->replace = memtx_space_replace_no_keys;
 	return (struct space *)memtx_space;
 }
diff --git a/src/box/memtx_space.h b/src/box/memtx_space.h
index 26d33349..0c4176bc 100644
--- a/src/box/memtx_space.h
+++ b/src/box/memtx_space.h
@@ -43,6 +43,12 @@ struct memtx_space {
 	/* Number of bytes used in memory by tuples in the space. */
 	size_t bsize;
 	/**
+	 * Version of data stored in the space. It is bumped every
+	 * time the primary index is dropped. We use it to detect
+	 * if we need to delete tuples when the space is altered.
+	 */
+	int version;
+	/**
 	 * A pointer to replace function, set to different values
 	 * at different stages of recovery.
 	 */
diff --git a/test/box/errinj.result b/test/box/errinj.result
index 8e4d5742..269de663 100644
--- a/test/box/errinj.result
+++ b/test/box/errinj.result
@@ -1093,3 +1093,69 @@ s:drop()
 box.schema.user.revoke('guest', 'read,write,execute','universe')
 ---
 ...
+--
+-- gh-3289: drop/truncate leaves the space in inconsistent
+-- state if WAL write fails.
+--
+s = box.schema.space.create('test')
+---
+...
+_ = s:create_index('pk')
+---
+...
+for i = 1, 10 do s:replace{i} end
+---
+...
+errinj.set('ERRINJ_WAL_IO', true)
+---
+- ok
+...
+s:drop()
+---
+- error: Failed to write to disk
+...
+s:truncate()
+---
+- error: Failed to write to disk
+...
+s:drop()
+---
+- error: Failed to write to disk
+...
+s:truncate()
+---
+- error: Failed to write to disk
+...
+errinj.set('ERRINJ_WAL_IO', false)
+---
+- ok
+...
+for i = 1, 10 do s:replace{i + 10} end
+---
+...
+s:select()
+---
+- - [1]
+  - [2]
+  - [3]
+  - [4]
+  - [5]
+  - [6]
+  - [7]
+  - [8]
+  - [9]
+  - [10]
+  - [11]
+  - [12]
+  - [13]
+  - [14]
+  - [15]
+  - [16]
+  - [17]
+  - [18]
+  - [19]
+  - [20]
+...
+s:drop()
+---
+...
diff --git a/test/box/errinj.test.lua b/test/box/errinj.test.lua
index d97cd81f..c12f61a4 100644
--- a/test/box/errinj.test.lua
+++ b/test/box/errinj.test.lua
@@ -366,3 +366,21 @@ errinj.set("ERRINJ_IPROTO_TX_DELAY", false)
 
 s:drop()
 box.schema.user.revoke('guest', 'read,write,execute','universe')
+
+--
+-- gh-3289: drop/truncate leaves the space in inconsistent
+-- state if WAL write fails.
+--
+s = box.schema.space.create('test')
+_ = s:create_index('pk')
+for i = 1, 10 do s:replace{i} end
+errinj.set('ERRINJ_WAL_IO', true)
+s:drop()
+s:truncate()
+s:drop()
+s:truncate()
+errinj.set('ERRINJ_WAL_IO', false)
+for i = 1, 10 do s:replace{i + 10} end
+s:select()
+
+s:drop()
-- 
2.11.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/5] memtx: rtree: remove pointless index_vtab::begin_build implementation
  2018-04-03 17:37 ` [PATCH 1/5] memtx: rtree: remove pointless index_vtab::begin_build implementation Vladimir Davydov
@ 2018-04-05 20:25   ` Konstantin Osipov
  0 siblings, 0 replies; 10+ messages in thread
From: Konstantin Osipov @ 2018-04-05 20:25 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [18/04/04 08:18]:
> The rtree is empty when this function is called (in fact, it is called
> right after creating the index), there's no need to purge it.

Pushed.


-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/5] vinyl: use disk_format in vy_run_rebuild_index
  2018-04-03 17:37 ` [PATCH 3/5] vinyl: use disk_format in vy_run_rebuild_index Vladimir Davydov
@ 2018-04-05 20:25   ` Konstantin Osipov
  0 siblings, 0 replies; 10+ messages in thread
From: Konstantin Osipov @ 2018-04-05 20:25 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [18/04/04 08:18]:
> We read tuples from disk hence we should use disk_format, not
> mem_format. Fix it. While we are at it, let's also update the
> outdated comment to vy_run_rebuild_index.

Pushed.

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 5/5] alter: call space_vtab::commit_alter after WAL write
  2018-04-03 17:37 ` [PATCH 5/5] alter: call space_vtab::commit_alter after WAL write Vladimir Davydov
@ 2018-04-05 20:37   ` Konstantin Osipov
  2018-04-06 10:59     ` Vladimir Davydov
  0 siblings, 1 reply; 10+ messages in thread
From: Konstantin Osipov @ 2018-04-05 20:37 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [18/04/04 08:18]:
>  void
>  DropIndex::alter(struct alter_space *alter)
>  {
> -	/*
> -	 * If it's not the primary key, nothing to do --
> -	 * the dropped index didn't exist in the new space
> -	 * definition, so does not exist in the created space.
> -	 */
> -	if (space_index(alter->new_space, 0) != NULL)
> -		return;
> -	/*
> -	 * OK to drop the primary key. Inform the engine about it,
> -	 * since it may have to reset handler->replace function,
> -	 * so that:
> -	 * - DML returns proper errors rather than crashes the
> -	 *   program
> -	 * - when a new primary key is finally added, the space
> -	 *   can be put back online properly.
> -	 */
> -	space_drop_primary_key(alter->new_space);
> +	if (old_index_def->iid == 0)
> +		space_drop_primary_key(alter->new_space);
>  }

Why did you drop the comments?

> @@ -736,6 +736,8 @@ memtx_space_drop_primary_key(struct space *space)
>  {
>  	struct memtx_space *memtx_space = (struct memtx_space *)space;
>  	memtx_space->replace = memtx_space_replace_no_keys;
> +	memtx_space->bsize = 0;
> +	memtx_space->version++;
>  }

Why don't you call it truncate_count?-))
I haven't grown accustomed to the approach, can't we use a
different one?-)

> +--
> +-- gh-3289: drop/truncate leaves the space in inconsistent
> +-- state if WAL write fails.
> +--
> +s = box.schema.space.create('test')
> +---
> +...
> +_ = s:create_index('pk')
> +---
> +...
> +for i = 1, 10 do s:replace{i} end
> +---
> +...
> +errinj.set('ERRINJ_WAL_IO', true)
> +---
> +- ok
> +...
> +s:drop()
> +---
> +- error: Failed to write to disk
> +...
> +s:truncate()
> +---
> +- error: Failed to write to disk
> +...
> +s:drop()
> +---
> +- error: Failed to write to disk
> +...
> +s:truncate()
> +---
> +- error: Failed to write to disk
> +...
> +errinj.set('ERRINJ_WAL_IO', false)
> +---
> +- ok
> +...
> +for i = 1, 10 do s:replace{i + 10} end
> +---
> +...
> +s:select()
> +---
> +- - [1]
> +  - [2]
> +  - [3]
> +  - [4]
> +  - [5]
> +  - [6]
> +  - [7]
> +  - [8]
> +  - [9]
> +  - [10]
> +  - [11]
> +  - [12]
> +  - [13]
> +  - [14]
> +  - [15]
> +  - [16]
> +  - [17]
> +  - [18]
> +  - [19]
> +  - [20]
> +...
> +s:drop()
> +---
> +...
> diff --git a/test/box/errinj.test.lua b/test/box/errinj.test.lua
> index d97cd81f..c12f61a4 100644
> --- a/test/box/errinj.test.lua
> +++ b/test/box/errinj.test.lua
> @@ -366,3 +366,21 @@ errinj.set("ERRINJ_IPROTO_TX_DELAY", false)
>  
>  s:drop()
>  box.schema.user.revoke('guest', 'read,write,execute','universe')
> +
> +--
> +-- gh-3289: drop/truncate leaves the space in inconsistent
> +-- state if WAL write fails.
> +--
> +s = box.schema.space.create('test')
> +_ = s:create_index('pk')
> +for i = 1, 10 do s:replace{i} end
> +errinj.set('ERRINJ_WAL_IO', true)
> +s:drop()
> +s:truncate()
> +s:drop()
> +s:truncate()
> +errinj.set('ERRINJ_WAL_IO', false)
> +for i = 1, 10 do s:replace{i + 10} end
> +s:select()
> +
> +s:drop()

Thank you for a test case.


-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 5/5] alter: call space_vtab::commit_alter after WAL write
  2018-04-05 20:37   ` Konstantin Osipov
@ 2018-04-06 10:59     ` Vladimir Davydov
  0 siblings, 0 replies; 10+ messages in thread
From: Vladimir Davydov @ 2018-04-06 10:59 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches

On Thu, Apr 05, 2018 at 11:37:28PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [18/04/04 08:18]:
> >  void
> >  DropIndex::alter(struct alter_space *alter)
> >  {
> > -	/*
> > -	 * If it's not the primary key, nothing to do --
> > -	 * the dropped index didn't exist in the new space
> > -	 * definition, so does not exist in the created space.
> > -	 */
> > -	if (space_index(alter->new_space, 0) != NULL)
> > -		return;
> > -	/*
> > -	 * OK to drop the primary key. Inform the engine about it,
> > -	 * since it may have to reset handler->replace function,
> > -	 * so that:
> > -	 * - DML returns proper errors rather than crashes the
> > -	 *   program
> > -	 * - when a new primary key is finally added, the space
> > -	 *   can be put back online properly.
> > -	 */
> > -	space_drop_primary_key(alter->new_space);
> > +	if (old_index_def->iid == 0)
> > +		space_drop_primary_key(alter->new_space);
> >  }
> 
> Why did you drop the comments?

Because it's pertinent to memtx only.

> 
> > @@ -736,6 +736,8 @@ memtx_space_drop_primary_key(struct space *space)
> >  {
> >  	struct memtx_space *memtx_space = (struct memtx_space *)space;
> >  	memtx_space->replace = memtx_space_replace_no_keys;
> > +	memtx_space->bsize = 0;
> > +	memtx_space->version++;
> >  }
> 
> Why don't you call it truncate_count?-))

No particular reason. I've nothing against truncate_count.

> I haven't grown accustomed to the approach, can't we use a
> different one?-)

Guess, we need to discuss it f2f.

In short, I don't want to introduce yet another callback for handling
space truncation - it seems natural to do it in space's commit_alter.
And in commit_alter, we need a way to differentiate space truncation vs
rebuild of the primary key. We can't use space bsize for this, as we
currently do, because it is racy to check it after WAL write (there may
have been new insertions into the space after truncation, while we were
waiting for WAL).

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-04-06 10:59 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-03 17:37 [PATCH 0/5] alter: fix WAL error handling Vladimir Davydov
2018-04-03 17:37 ` [PATCH 1/5] memtx: rtree: remove pointless index_vtab::begin_build implementation Vladimir Davydov
2018-04-05 20:25   ` Konstantin Osipov
2018-04-03 17:37 ` [PATCH 2/5] memtx: don't call begin_buid and end_build for new pk after recovery Vladimir Davydov
2018-04-03 17:37 ` [PATCH 3/5] vinyl: use disk_format in vy_run_rebuild_index Vladimir Davydov
2018-04-05 20:25   ` Konstantin Osipov
2018-04-03 17:37 ` [PATCH 4/5] vinyl: do not use space_vtab::commit_alter for preparing new indexes Vladimir Davydov
2018-04-03 17:37 ` [PATCH 5/5] alter: call space_vtab::commit_alter after WAL write Vladimir Davydov
2018-04-05 20:37   ` Konstantin Osipov
2018-04-06 10:59     ` Vladimir Davydov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox