Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion
@ 2020-02-14 19:40 Ilya Kosarev
  2020-02-14 19:40 ` [Tarantool-patches] [PATCH 1/4] small: bump small version Ilya Kosarev
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Ilya Kosarev @ 2020-02-14 19:40 UTC (permalink / raw)
  To: tarantool-patches

This patchset is an experiment on reserve tech used for safe truncation
and the demonstration of the possible space:delete() fail on reserve.
The idea of the 3rd patch (memtx: use reserved slab for truncation) was
to prealloc a slab on arena to be used only for truncation tuples.
This approach didn't really solve the problem. After we are getting the
reserved slab with slab_map, we have a choice to put or not to put it
on slab lists. In case we do put it, the problem is that any next
operation, using mempool_alloc, will be able to use our slab, not only
truncation. This will lead to it's fast exhaustion. On the other hand,
if we skip all this lists (as it is done), we will fail at the garbage
collection when trying to free those tuples, as far as their slab won't
be found. See backtrace at
https://gitlab.com/tarantool/tarantool/-/jobs/438082894. It doesn't
look like there is any acceptable way to tune garbage collector
behavior.
In the 4th patch the stress_delete test with errinj is introduced to
show how it may fail on reserve. It uses errinj to simulate the case
where the num of reserved extents for memtx_index_extent_reserve won't
be enough, setting target amount as current amount + 1.

Not aimed to be pushed to master!

Branch: https://github.com/tarantool/tarantool/tree/i.kosarev/gh-3807-safe-alloc-on-truncation
Issue: https://github.com/tarantool/tarantool/issues/3807

Ilya Kosarev (4):
  small: bump small version
  b-tree: return NULL on matras_alloc fail
  memtx: use reserved slab for truncation
  memtx: space:delete() might fail on reserve

 src/box/box.cc                       |  27 ++++++-
 src/box/memtx_space.c                |  15 +++-
 src/box/tuple.c                      |   4 +
 src/lib/core/errinj.h                |   1 +
 src/lib/core/memory.c                |   1 +
 src/lib/salad/bps_tree.h             |  22 +++++-
 src/lib/small                        |   2 +-
 test/box/errinj.result               |   1 +
 test/engine/engine.cfg               |   6 ++
 test/engine/low_memory.lua           |   8 ++
 test/engine/stress_delete.result     | 111 +++++++++++++++++++++++++++
 test/engine/stress_delete.test.lua   |  55 +++++++++++++
 test/engine/stress_truncate.result   | 103 +++++++++++++++++++++++++
 test/engine/stress_truncate.test.lua |  52 +++++++++++++
 14 files changed, 399 insertions(+), 9 deletions(-)
 create mode 100644 test/engine/low_memory.lua
 create mode 100644 test/engine/stress_delete.result
 create mode 100644 test/engine/stress_delete.test.lua
 create mode 100644 test/engine/stress_truncate.result
 create mode 100644 test/engine/stress_truncate.test.lua

-- 
2.17.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Tarantool-patches] [PATCH 1/4] small: bump small version
  2020-02-14 19:40 [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion Ilya Kosarev
@ 2020-02-14 19:40 ` Ilya Kosarev
  2020-02-14 19:40 ` [Tarantool-patches] [PATCH 2/4] b-tree: return NULL on matras_alloc fail Ilya Kosarev
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Ilya Kosarev @ 2020-02-14 19:40 UTC (permalink / raw)
  To: tarantool-patches

---
 src/lib/small | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/lib/small b/src/lib/small
index 3df505017..e9f041589 160000
--- a/src/lib/small
+++ b/src/lib/small
@@ -1 +1 @@
-Subproject commit 3df5050171d040bfe5717b4bddf57da3b312ffe4
+Subproject commit e9f0415891de16b0625fc9d27dde4693a812527d
-- 
2.17.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Tarantool-patches] [PATCH 2/4] b-tree: return NULL on matras_alloc fail
  2020-02-14 19:40 [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion Ilya Kosarev
  2020-02-14 19:40 ` [Tarantool-patches] [PATCH 1/4] small: bump small version Ilya Kosarev
@ 2020-02-14 19:40 ` Ilya Kosarev
  2020-02-14 19:40 ` [Tarantool-patches] [PATCH 3/4] memtx: use reserved slab for truncation Ilya Kosarev
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Ilya Kosarev @ 2020-02-14 19:40 UTC (permalink / raw)
  To: tarantool-patches

In bps_tree_create_leaf we use matras_alloc in case
bps_tree_garbage_pop didn't work out. However it also might not
succeed. Then we need to return NULL instead of dereferencing NULL
pointer.

Part of #3807
---
 src/lib/salad/bps_tree.h | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/src/lib/salad/bps_tree.h b/src/lib/salad/bps_tree.h
index d28b53f53..d71340143 100644
--- a/src/lib/salad/bps_tree.h
+++ b/src/lib/salad/bps_tree.h
@@ -2147,8 +2147,11 @@ bps_tree_create_leaf(struct bps_tree *tree, bps_tree_block_id_t *id)
 {
 	struct bps_leaf *res = (struct bps_leaf *)
 			       bps_tree_garbage_pop(tree, id);
-	if (!res)
-		res = (struct bps_leaf *)matras_alloc(&tree->matras, id);
+	if (!res) {
+		res = (struct bps_leaf *) matras_alloc(&tree->matras, id);
+		if (!res)
+			return NULL;
+	}
 	res->header.type = BPS_TREE_BT_LEAF;
 	tree->leaf_count++;
 	return res;
@@ -2162,8 +2165,11 @@ bps_tree_create_inner(struct bps_tree *tree, bps_tree_block_id_t *id)
 {
 	struct bps_inner *res = (struct bps_inner *)
 				bps_tree_garbage_pop(tree, id);
-	if (!res)
-		res = (struct bps_inner *)matras_alloc(&tree->matras, id);
+	if (!res) {
+		res = (struct bps_inner *) matras_alloc(&tree->matras, id);
+		if (!res)
+			return NULL;
+	}
 	res->header.type = BPS_TREE_BT_INNER;
 	tree->inner_count++;
 	return res;
@@ -3472,6 +3478,8 @@ bps_tree_process_insert_leaf(struct bps_tree *tree,
 	}
 	bps_tree_block_id_t new_block_id = (bps_tree_block_id_t)(-1);
 	struct bps_leaf *new_leaf = bps_tree_create_leaf(tree, &new_block_id);
+	if (!new_leaf)
+		return -1;
 
 	leaf_path_elem->block = (struct bps_leaf *)
 	bps_tree_touch_block(tree, leaf_path_elem->block_id);
@@ -3710,6 +3718,8 @@ bps_tree_process_insert_leaf(struct bps_tree *tree,
 		bps_tree_block_id_t new_root_id = (bps_tree_block_id_t)(-1);
 		struct bps_inner *new_root = bps_tree_create_inner(tree,
 				&new_root_id);
+		if (!new_root)
+			return -1;
 		new_root->header.size = 2;
 		new_root->child_ids[0] = tree->root_id;
 		new_root->child_ids[1] = new_block_id;
@@ -3835,6 +3845,8 @@ bps_tree_process_insert_inner(struct bps_tree *tree,
 	bps_tree_block_id_t new_block_id = (bps_tree_block_id_t)(-1);
 	struct bps_inner *new_inner = bps_tree_create_inner(tree,
 			&new_block_id);
+	if (!new_inner)
+		return -1;
 
 	new_inner->header.size = 0;
 	struct bps_inner_path_elem new_path_elem;
@@ -4030,6 +4042,8 @@ bps_tree_process_insert_inner(struct bps_tree *tree,
 		bps_tree_block_id_t new_root_id = (bps_tree_block_id_t)(-1);
 		struct bps_inner *new_root =
 			bps_tree_create_inner(tree, &new_root_id);
+		if (!new_root)
+			return -1;
 		new_root->header.size = 2;
 		new_root->child_ids[0] = tree->root_id;
 		new_root->child_ids[1] = new_block_id;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Tarantool-patches] [PATCH 3/4] memtx: use reserved slab for truncation
  2020-02-14 19:40 [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion Ilya Kosarev
  2020-02-14 19:40 ` [Tarantool-patches] [PATCH 1/4] small: bump small version Ilya Kosarev
  2020-02-14 19:40 ` [Tarantool-patches] [PATCH 2/4] b-tree: return NULL on matras_alloc fail Ilya Kosarev
@ 2020-02-14 19:40 ` Ilya Kosarev
  2020-02-14 19:40 ` [Tarantool-patches] [PATCH 4/4] memtx: space:delete() might fail on reserve Ilya Kosarev
  2020-02-14 22:48 ` [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion Vladislav Shpilevoy
  4 siblings, 0 replies; 9+ messages in thread
From: Ilya Kosarev @ 2020-02-14 19:40 UTC (permalink / raw)
  To: tarantool-patches

Trying to perform space:truncate() while reaching memtx_memory limit we
could experience slab allocator failure. This behavior seems to be
quite surprising for users. Now we are using specifically preallocated
on arena slab for truncation tuples in case we are running out of
limit.
---
 src/box/box.cc                       |  27 ++++++-
 src/box/tuple.c                      |   4 ++
 src/lib/core/memory.c                |   1 +
 test/engine/engine.cfg               |   3 +
 test/engine/low_memory.lua           |   8 +++
 test/engine/stress_truncate.result   | 103 +++++++++++++++++++++++++++
 test/engine/stress_truncate.test.lua |  52 ++++++++++++++
 7 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 test/engine/low_memory.lua
 create mode 100644 test/engine/stress_truncate.result
 create mode 100644 test/engine/stress_truncate.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index 1b2b27d61..eea4269e7 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1321,9 +1321,34 @@ space_truncate(struct space *space)
 	ops_buf_end = mp_encode_uint(ops_buf_end, 1);
 	assert(ops_buf_end < buf + buf_size);
 
+	struct txn *txn = NULL;
+	struct txn_savepoint *txn_svp = NULL;
+	if (!box_txn()) {
+		txn = txn_begin();
+		if (txn == NULL)
+			diag_raise();
+	} else {
+		txn_svp = box_txn_savepoint();
+		if (txn_svp == NULL)
+			diag_raise();
+	}
+	struct space *truncate_space = space_cache_find_xc(BOX_TRUNCATE_ID);
+	((struct memtx_engine *)truncate_space->engine)->arena.truncating = true;
 	if (box_upsert(BOX_TRUNCATE_ID, 0, tuple_buf, tuple_buf_end,
-		       ops_buf, ops_buf_end, 0, NULL) != 0)
+		       ops_buf, ops_buf_end, 0, NULL) != 0) {
+		((struct memtx_engine *)truncate_space->engine)->arena.truncating = false;
+		if (txn != NULL)
+			txn_rollback(txn);
+		else
+			box_txn_rollback_to_savepoint(txn_svp);
+		fiber_gc();
 		diag_raise();
+	}
+	((struct memtx_engine *)truncate_space->engine)->arena.truncating = false;
+	if (txn != NULL && txn_commit(txn) != 0) {
+		fiber_gc();
+		diag_raise();
+	}
 }
 
 int
diff --git a/src/box/tuple.c b/src/box/tuple.c
index 4d676b090..0e8aeb92e 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -342,6 +342,10 @@ tuple_arena_create(struct slab_arena *arena, struct quota *quota,
 	say_info("mapping %zu bytes for %s tuple arena...", prealloc,
 		 arena_name);
 
+	if (!strncmp(arena_name, "memtx", 5))
+		arena->trunc_alloc = slab_size;
+	else
+		arena->trunc_alloc = 0;
 	if (slab_arena_create(arena, quota, prealloc, slab_size, flags) != 0) {
 		if (errno == ENOMEM) {
 			panic("failed to preallocate %zu bytes: Cannot "\
diff --git a/src/lib/core/memory.c b/src/lib/core/memory.c
index f236c1887..0ce977d38 100644
--- a/src/lib/core/memory.c
+++ b/src/lib/core/memory.c
@@ -42,6 +42,7 @@ memory_init()
 	quota_init(&runtime_quota, QUOTA_MAX);
 
 	/* No limit on the runtime memory. */
+	runtime.trunc_alloc = 0;
 	slab_arena_create(&runtime, &runtime_quota, 0,
 			  SLAB_SIZE, SLAB_ARENA_PRIVATE);
 }
diff --git a/test/engine/engine.cfg b/test/engine/engine.cfg
index f1e7de274..4b9eeb5f5 100644
--- a/test/engine/engine.cfg
+++ b/test/engine/engine.cfg
@@ -5,6 +5,9 @@
     },
     "func_index.test.lua": {
         "memtx": {"engine": "memtx"}
+     },
+    "stress_truncate.test.lua": {
+        "memtx": {"engine": "memtx"}
      }
 }
 
diff --git a/test/engine/low_memory.lua b/test/engine/low_memory.lua
new file mode 100644
index 000000000..46fd26d1b
--- /dev/null
+++ b/test/engine/low_memory.lua
@@ -0,0 +1,8 @@
+#!/usr/bin/env tarantool
+os = require('os')
+box.cfg({
+    listen              = os.getenv("LISTEN"),
+    memtx_memory        = 32 * 1024 * 1024,
+})
+
+require('console').listen(os.getenv('ADMIN'))
diff --git a/test/engine/stress_truncate.result b/test/engine/stress_truncate.result
new file mode 100644
index 000000000..c5308a0ac
--- /dev/null
+++ b/test/engine/stress_truncate.result
@@ -0,0 +1,103 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+
+test_run:cmd("create server master with script='engine/low_memory.lua'")
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd("switch master")
+ | ---
+ | - true
+ | ...
+
+
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+function create_space(name)
+    local space = box.schema.create_space(name)
+    space:format({
+        { name = "id",  type = "unsigned" },
+        { name = "val", type = "str" }
+    })
+    space:create_index('primary', { parts = { 'id' } })
+    return space
+end;
+ | ---
+ | ...
+
+function insert(space, i)
+    space:insert({ i, string.rep(string.char(32 + math.random(127-32)), math.random(1024)) })
+end;
+ | ---
+ | ...
+
+function fill_space(space, start)
+    local _, err = nil
+    local i = start
+    while err == nil do _, err = pcall(insert, space, i) i = i + 1 end
+end;
+ | ---
+ | ...
+
+function stress_truncation(i, spaces)
+    local res, space = pcall(create_space, 'test' .. tostring(i))
+    if res then spaces[i] = space return end
+    fill_space(box.space.test, box.space.test:len())
+    for _, s in pairs(spaces) do s:truncate() end
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+
+
+_ = create_space('test')
+ | ---
+ | ...
+for i = 0, 27000 do insert(box.space.test, i) end
+ | ---
+ | ...
+
+spaces = {}
+ | ---
+ | ...
+counter = 0
+ | ---
+ | ...
+status = true
+ | ---
+ | ...
+res = nil
+ | ---
+ | ...
+while counter < 80000 do status, res = pcall(stress_truncation, counter, spaces) counter = counter + 1 end
+ | ---
+ | ...
+status
+ | ---
+ | - true
+ | ...
+res
+ | ---
+ | - null
+ | ...
+
+-- Cleanup.
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+test_run:drop_cluster({'master'})
+ | ---
+ | ...
diff --git a/test/engine/stress_truncate.test.lua b/test/engine/stress_truncate.test.lua
new file mode 100644
index 000000000..761b2a9c2
--- /dev/null
+++ b/test/engine/stress_truncate.test.lua
@@ -0,0 +1,52 @@
+test_run = require('test_run').new()
+
+
+test_run:cmd("create server master with script='engine/low_memory.lua'")
+test_run:cmd('start server master')
+test_run:cmd("switch master")
+
+
+test_run:cmd("setopt delimiter ';'")
+function create_space(name)
+    local space = box.schema.create_space(name)
+    space:format({
+        { name = "id",  type = "unsigned" },
+        { name = "val", type = "str" }
+    })
+    space:create_index('primary', { parts = { 'id' } })
+    return space
+end;
+
+function insert(space, i)
+    space:insert({ i, string.rep(string.char(32 + math.random(127-32)), math.random(1024)) })
+end;
+
+function fill_space(space, start)
+    local _, err = nil
+    local i = start
+    while err == nil do _, err = pcall(insert, space, i) i = i + 1 end
+end;
+
+function stress_truncation(i, spaces)
+    local res, space = pcall(create_space, 'test' .. tostring(i))
+    if res then spaces[i] = space return end
+    fill_space(box.space.test, box.space.test:len())
+    for _, s in pairs(spaces) do s:truncate() end
+end;
+test_run:cmd("setopt delimiter ''");
+
+
+_ = create_space('test')
+for i = 0, 27000 do insert(box.space.test, i) end
+
+spaces = {}
+counter = 0
+status = true
+res = nil
+while counter < 80000 do status, res = pcall(stress_truncation, counter, spaces) counter = counter + 1 end
+status
+res
+
+-- Cleanup.
+test_run:cmd('switch default')
+test_run:drop_cluster({'master'})
-- 
2.17.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Tarantool-patches] [PATCH 4/4] memtx: space:delete() might fail on reserve
  2020-02-14 19:40 [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion Ilya Kosarev
                   ` (2 preceding siblings ...)
  2020-02-14 19:40 ` [Tarantool-patches] [PATCH 3/4] memtx: use reserved slab for truncation Ilya Kosarev
@ 2020-02-14 19:40 ` Ilya Kosarev
  2020-02-14 22:48 ` [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion Vladislav Shpilevoy
  4 siblings, 0 replies; 9+ messages in thread
From: Ilya Kosarev @ 2020-02-14 19:40 UTC (permalink / raw)
  To: tarantool-patches

---
 src/box/memtx_space.c              |  15 +++-
 src/lib/core/errinj.h              |   1 +
 test/box/errinj.result             |   1 +
 test/engine/engine.cfg             |   3 +
 test/engine/stress_delete.result   | 111 +++++++++++++++++++++++++++++
 test/engine/stress_delete.test.lua |  55 ++++++++++++++
 6 files changed, 183 insertions(+), 3 deletions(-)
 create mode 100644 test/engine/stress_delete.result
 create mode 100644 test/engine/stress_delete.test.lua

diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
index 6ef84e045..7dc0a3282 100644
--- a/src/box/memtx_space.c
+++ b/src/box/memtx_space.c
@@ -260,9 +260,18 @@ memtx_space_replace_all_keys(struct space *space, struct tuple *old_tuple,
 	 * Ensure we have enough slack memory to guarantee
 	 * successful statement-level rollback.
 	 */
-	if (memtx_index_extent_reserve(memtx, new_tuple != NULL ?
-				       RESERVE_EXTENTS_BEFORE_REPLACE :
-				       RESERVE_EXTENTS_BEFORE_DELETE) != 0)
+	int reserve_extents_num = new_tuple != NULL ?
+				  RESERVE_EXTENTS_BEFORE_REPLACE :
+				  RESERVE_EXTENTS_BEFORE_DELETE;
+	ERROR_INJECT(ERRINJ_RESERVE_EXTENTS_BEFORE_DELETE, {
+		/**
+		 * Set huge number of needed reserved extents to
+		 * provoke delete failure.
+		 */
+		if (new_tuple == NULL)
+			reserve_extents_num = memtx->num_reserved_extents + 1;
+	});
+	if (memtx_index_extent_reserve(memtx, reserve_extents_num) != 0)
 		return -1;
 
 	uint32_t i = 0;
diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h
index 672da2119..2d3331cfa 100644
--- a/src/lib/core/errinj.h
+++ b/src/lib/core/errinj.h
@@ -135,6 +135,7 @@ struct errinj {
 	_(ERRINJ_COIO_SENDFILE_CHUNK, ERRINJ_INT, {.iparam = -1}) \
 	_(ERRINJ_SWIM_FD_ONLY, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_DYN_MODULE_COUNT, ERRINJ_INT, {.iparam = 0}) \
+	_(ERRINJ_RESERVE_EXTENTS_BEFORE_DELETE, ERRINJ_BOOL, {.bparam = false}) \
 
 ENUM0(errinj_id, ERRINJ_LIST);
 extern struct errinj errinjs[];
diff --git a/test/box/errinj.result b/test/box/errinj.result
index f043c6689..1f99470e3 100644
--- a/test/box/errinj.result
+++ b/test/box/errinj.result
@@ -63,6 +63,7 @@ evals
   - ERRINJ_RELAY_SEND_DELAY: false
   - ERRINJ_RELAY_TIMEOUT: 0
   - ERRINJ_REPLICA_JOIN_DELAY: false
+  - ERRINJ_RESERVE_EXTENTS_BEFORE_DELETE: false
   - ERRINJ_SIO_READ_MAX: -1
   - ERRINJ_SNAP_COMMIT_DELAY: false
   - ERRINJ_SNAP_WRITE_DELAY: false
diff --git a/test/engine/engine.cfg b/test/engine/engine.cfg
index 4b9eeb5f5..c8c6fb130 100644
--- a/test/engine/engine.cfg
+++ b/test/engine/engine.cfg
@@ -6,6 +6,9 @@
     "func_index.test.lua": {
         "memtx": {"engine": "memtx"}
      },
+    "stress_delete.test.lua": {
+        "memtx": {"engine": "memtx"}
+     },
     "stress_truncate.test.lua": {
         "memtx": {"engine": "memtx"}
      }
diff --git a/test/engine/stress_delete.result b/test/engine/stress_delete.result
new file mode 100644
index 000000000..bf92c780b
--- /dev/null
+++ b/test/engine/stress_delete.result
@@ -0,0 +1,111 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+
+test_run:cmd("create server master with script='engine/low_memory.lua'")
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd("switch master")
+ | ---
+ | - true
+ | ...
+
+
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+function create_space(name)
+    local space = box.schema.create_space(name)
+    space:format({
+        { name = "id",  type = "unsigned" },
+        { name = "val", type = "str" }
+    })
+    space:create_index('primary', { parts = { 'id' } })
+    return space
+end;
+ | ---
+ | ...
+
+function insert(space, i)
+    space:insert({ i, string.rep(string.char(32 + math.random(127-32)), math.random(1024)) })
+end;
+ | ---
+ | ...
+
+function fill_space(space, start)
+    local _, err = nil
+    local i = start
+    while err == nil do _, err = pcall(insert, space, i) i = i + 1 end
+end;
+ | ---
+ | ...
+
+function stress_deletion(i, spaces)
+    local res, space = pcall(create_space, 'test' .. tostring(i))
+    if res then spaces[i] = space return end
+    fill_space(box.space.test, box.space.test:len())
+    for _, s in pairs(spaces) do fill_space(s, s:len()) end
+    box.space.test:delete(box.space.test:len() - 1)
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+
+
+_ = create_space('test')
+ | ---
+ | ...
+for i = 0, 27000 do insert(box.space.test, i) end
+ | ---
+ | ...
+
+spaces = {}
+ | ---
+ | ...
+counter = 0
+ | ---
+ | ...
+status = true
+ | ---
+ | ...
+res = nil
+ | ---
+ | ...
+errinj = box.error.injection
+ | ---
+ | ...
+errinj.set('ERRINJ_RESERVE_EXTENTS_BEFORE_DELETE', true)
+ | ---
+ | - ok
+ | ...
+while counter < 1400 do status, res = pcall(stress_deletion, counter, spaces) counter = counter + 1 end
+ | ---
+ | ...
+status
+ | ---
+ | - true
+ | ...
+res
+ | ---
+ | - null
+ | ...
+
+-- Cleanup.
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+test_run:drop_cluster({'master'})
+ | ---
+ | ...
diff --git a/test/engine/stress_delete.test.lua b/test/engine/stress_delete.test.lua
new file mode 100644
index 000000000..b9014ece5
--- /dev/null
+++ b/test/engine/stress_delete.test.lua
@@ -0,0 +1,55 @@
+test_run = require('test_run').new()
+
+
+test_run:cmd("create server master with script='engine/low_memory.lua'")
+test_run:cmd('start server master')
+test_run:cmd("switch master")
+
+
+test_run:cmd("setopt delimiter ';'")
+function create_space(name)
+    local space = box.schema.create_space(name)
+    space:format({
+        { name = "id",  type = "unsigned" },
+        { name = "val", type = "str" }
+    })
+    space:create_index('primary', { parts = { 'id' } })
+    return space
+end;
+
+function insert(space, i)
+    space:insert({ i, string.rep(string.char(32 + math.random(127-32)), math.random(1024)) })
+end;
+
+function fill_space(space, start)
+    local _, err = nil
+    local i = start
+    while err == nil do _, err = pcall(insert, space, i) i = i + 1 end
+end;
+
+function stress_deletion(i, spaces)
+    local res, space = pcall(create_space, 'test' .. tostring(i))
+    if res then spaces[i] = space return end
+    fill_space(box.space.test, box.space.test:len())
+    for _, s in pairs(spaces) do fill_space(s, s:len()) end
+    box.space.test:delete(box.space.test:len() - 1)
+end;
+test_run:cmd("setopt delimiter ''");
+
+
+_ = create_space('test')
+for i = 0, 27000 do insert(box.space.test, i) end
+
+spaces = {}
+counter = 0
+status = true
+res = nil
+errinj = box.error.injection
+errinj.set('ERRINJ_RESERVE_EXTENTS_BEFORE_DELETE', true)
+while counter < 1400 do status, res = pcall(stress_deletion, counter, spaces) counter = counter + 1 end
+status
+res
+
+-- Cleanup.
+test_run:cmd('switch default')
+test_run:drop_cluster({'master'})
-- 
2.17.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion
  2020-02-14 19:40 [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion Ilya Kosarev
                   ` (3 preceding siblings ...)
  2020-02-14 19:40 ` [Tarantool-patches] [PATCH 4/4] memtx: space:delete() might fail on reserve Ilya Kosarev
@ 2020-02-14 22:48 ` Vladislav Shpilevoy
  2020-02-15 15:33   ` Vladislav Shpilevoy
  4 siblings, 1 reply; 9+ messages in thread
From: Vladislav Shpilevoy @ 2020-02-14 22:48 UTC (permalink / raw)
  To: Ilya Kosarev, tarantool-patches

Thanks for the patchset!

On 14/02/2020 20:40, Ilya Kosarev wrote:
> This patchset is an experiment on reserve tech used for safe truncation
> and the demonstration of the possible space:delete() fail on reserve.
> The idea of the 3rd patch (memtx: use reserved slab for truncation) was
> to prealloc a slab on arena to be used only for truncation tuples.
> This approach didn't really solve the problem. After we are getting the
> reserved slab with slab_map, we have a choice to put or not to put it
> on slab lists. In case we do put it, the problem is that any next
> operation, using mempool_alloc, will be able to use our slab, not only
> truncation. This will lead to it's fast exhaustion. On the other hand,
> if we skip all this lists (as it is done), we will fail at the garbage
> collection when trying to free those tuples, as far as their slab won't
> be found. See backtrace at
> https://gitlab.com/tarantool/tarantool/-/jobs/438082894. It doesn't
> look like there is any acceptable way to tune garbage collector
> behavior.
> In the 4th patch the stress_delete test with errinj is introduced to
> show how it may fail on reserve. It uses errinj to simulate the case
> where the num of reserved extents for memtx_index_extent_reserve won't
> be enough, setting target amount as current amount + 1.
> 
> Not aimed to be pushed to master!
> 
> Branch: https://github.com/tarantool/tarantool/tree/i.kosarev/gh-3807-safe-alloc-on-truncation
> Issue: https://github.com/tarantool/tarantool/issues/3807

This is not a correct link. You pushed this patchset into
https://github.com/tarantool/tarantool/tree/i.kosarev/gh-3807-reserve-on-truncation

Overall the patchset looks ... ghm, I would say not really good. It
consists of crutches more than completely. I don't think that approach
works, if this is what Kostja proposed.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion
  2020-02-14 22:48 ` [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion Vladislav Shpilevoy
@ 2020-02-15 15:33   ` Vladislav Shpilevoy
  2020-02-15 16:34     ` Konstantin Osipov
  0 siblings, 1 reply; 9+ messages in thread
From: Vladislav Shpilevoy @ 2020-02-15 15:33 UTC (permalink / raw)
  To: Ilya Kosarev, tarantool-patches

Here is what Kostja said, but somewhy without CCing the
mailing list, and my answers to it inlined:

> This is the first approximation of what I have proposed at best. 
> First of all, the mechanism should not be truncation specific. It should be usable by all subsystems that require emergency memory.
> 
> Second, there is no reason to patch small. All the patch needs to do is something along these lines:
> 
> 
> 1) in memtx_init, reserve the emergency slab.
> 
> 2) in memtx_tuple_alloc, refuse with error if there is no emergency slab
> 
> 3) in memtx_tuple_free, try to reserve emergency slab (replenish it) if it is empty

All of this adds code and conditions to a hot path. The way with quota
enabling/disabling works only in a rare case, when delete() fails due
to OOM, or when truncate is called, which is not often.

> 4) at start of truncate, or wherever we need emergency memory, release emergency slab. Simply return it to arena.

Once you returned it, all the other operations will be able to take
and fill it. Such as insertions. In the first truncate or delete
didn't free anything, or freed not enough to fit a new truncate
tuple here, there is no more a reserved slab for a next delete/truncate.
So your proposal does not seem to change anything.

> No need for special members or code for this outside small.
> 
> We could generalize this idea by moving all system spaces to its own arena.
> 
> On Sat, Feb 15, 2020, 01:48 Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
>     Thanks for the patchset!
> 
>     On 14/02/2020 20:40, Ilya Kosarev wrote:
>     > This patchset is an experiment on reserve tech used for safe truncation
>     > and the demonstration of the possible space:delete() fail on reserve.
>     > The idea of the 3rd patch (memtx: use reserved slab for truncation) was
>     > to prealloc a slab on arena to be used only for truncation tuples.
>     > This approach didn't really solve the problem. After we are getting the
>     > reserved slab with slab_map, we have a choice to put or not to put it
>     > on slab lists. In case we do put it, the problem is that any next
>     > operation, using mempool_alloc, will be able to use our slab, not only
>     > truncation. This will lead to it's fast exhaustion. On the other hand,
>     > if we skip all this lists (as it is done), we will fail at the garbage
>     > collection when trying to free those tuples, as far as their slab won't
>     > be found. See backtrace at
>     > https://gitlab.com/tarantool/tarantool/-/jobs/438082894. It doesn't
>     > look like there is any acceptable way to tune garbage collector
>     > behavior.
>     > In the 4th patch the stress_delete test with errinj is introduced to
>     > show how it may fail on reserve. It uses errinj to simulate the case
>     > where the num of reserved extents for memtx_index_extent_reserve won't
>     > be enough, setting target amount as current amount + 1.
>     >
>     > Not aimed to be pushed to master!
>     >
>     > Branch: https://github.com/tarantool/tarantool/tree/i.kosarev/gh-3807-safe-alloc-on-truncation
>     > Issue: https://github.com/tarantool/tarantool/issues/3807
> 
>     This is not a correct link. You pushed this patchset into
>     https://github.com/tarantool/tarantool/tree/i.kosarev/gh-3807-reserve-on-truncation
> 
>     Overall the patchset looks ... ghm, I would say not really good. It
>     consists of crutches more than completely. I don't think that approach
>     works, if this is what Kostja proposed.
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion
  2020-02-15 15:33   ` Vladislav Shpilevoy
@ 2020-02-15 16:34     ` Konstantin Osipov
  2020-02-16 15:41       ` Vladislav Shpilevoy
  0 siblings, 1 reply; 9+ messages in thread
From: Konstantin Osipov @ 2020-02-15 16:34 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [20/02/15 19:24]:
> Here is what Kostja said, but somewhy without CCing the
> mailing list, and my answers to it inlined:
> 
> > This is the first approximation of what I have proposed at best. 
> > First of all, the mechanism should not be truncation specific. It should be usable by all subsystems that require emergency memory.
> > 
> > Second, there is no reason to patch small. All the patch needs to do is something along these lines:
> > 
> > 
> > 1) in memtx_init, reserve the emergency slab.
> > 
> > 2) in memtx_tuple_alloc, refuse with error if there is no emergency slab
> > 
> > 3) in memtx_tuple_free, try to reserve emergency slab (replenish it) if it is empty
> 
> All of this adds code and conditions to a hot path. The way with quota
> enabling/disabling works only in a rare case, when delete() fails due
> to OOM, or when truncate is called, which is not often.

First of all, the way with quota simply doesn't work. You expect
the quota to shrink back, but it never does. So you simply run out
of all existing memory. 

If it worked, the fix would to reduce the amount of available
quota by 1 slab at start, and change quota before truncate, 
and then return the quota back in its place. 

My main point: under no circumstances tarantool should go beyond
the amount of memory set by quota. If we need to reserve
memory/quota for emergency, it's fine, let's do it at start,
but going beyond is not acceptable.

Second, yes, it does add a branch to the hot path. Same as in
memtx_index_extent_reserve().  This will have no impact on performance profile -
feel free to check. I think, however, given the 10% performance
regression in tuple_format, your time optimizing performance will
be better spent elsewhere.

> > 4) at start of truncate, or wherever we need emergency memory, release emergency slab. Simply return it to arena.
> 
> Once you returned it, all the other operations will be able to take
> and fill it. Such as insertions. In the first truncate or delete
> didn't free anything, or freed not enough to fit a new truncate
> tuple here, there is no more a reserved slab for a next delete/truncate.
> So your proposal does not seem to change anything.

Ehm, I did not fully explain the proposal. It also assumes there is 
"emergency mode" flag set at start of truncate and cleared at end,
and if there is no reserve slab and there is an emergence flag
set, nothing can allocate memory.

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion
  2020-02-15 16:34     ` Konstantin Osipov
@ 2020-02-16 15:41       ` Vladislav Shpilevoy
  0 siblings, 0 replies; 9+ messages in thread
From: Vladislav Shpilevoy @ 2020-02-16 15:41 UTC (permalink / raw)
  To: Konstantin Osipov, Ilya Kosarev, tarantool-patches

We discussed the ticket and the patches verbally.

On 15/02/2020 17:34, Konstantin Osipov wrote:
> * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [20/02/15 19:24]:
>> Here is what Kostja said, but somewhy without CCing the
>> mailing list, and my answers to it inlined:
>>
>>> This is the first approximation of what I have proposed at best. 
>>> First of all, the mechanism should not be truncation specific. It should be usable by all subsystems that require emergency memory.
>>>
>>> Second, there is no reason to patch small. All the patch needs to do is something along these lines:
>>>
>>>
>>> 1) in memtx_init, reserve the emergency slab.
>>>
>>> 2) in memtx_tuple_alloc, refuse with error if there is no emergency slab
>>>
>>> 3) in memtx_tuple_free, try to reserve emergency slab (replenish it) if it is empty
>>
>> All of this adds code and conditions to a hot path. The way with quota
>> enabling/disabling works only in a rare case, when delete() fails due
>> to OOM, or when truncate is called, which is not often.
> 
> First of all, the way with quota simply doesn't work. You expect
> the quota to shrink back, but it never does. So you simply run out
> of all existing memory. 

This is solvable. Only normal slabs are never returned. 'Huge' slabs
are. When quota is disabled, we can allocate all new data in 'huge'
slabs. Then quota should shrink back when these slabs are freed.

> If it worked, the fix would to reduce the amount of available
> quota by 1 slab at start, and change quota before truncate, 
> and then return the quota back in its place. 

This does not really work as simple as you wanted. If truncate would
not free anything, you won't be able to return the quota back. Because
it is occupied by a tuple in _truncate.

> My main point: under no circumstances tarantool should go beyond
> the amount of memory set by quota. If we need to reserve
> memory/quota for emergency, it's fine, let's do it at start,
> but going beyond is not acceptable.

Ok, we also discussed with you that we could allocate one special
slab in the beginning, and use it only for special cases like
_truncate tuples and :delete() reservations. But this also does
not fix the issue completely, only makes it harder to stumble into
it. Besides, this adds notable complexity to memtx tuple allocation
and freeing.

> Second, yes, it does add a branch to the hot path. Same as in
> memtx_index_extent_reserve().  This will have no impact on performance profile -
> feel free to check. I think, however, given the 10% performance
> regression in tuple_format, your time optimizing performance will
> be better spent elsewhere.

Having one condition on a hot path is not a justification for
adding more conditions and complexity.

>>> 4) at start of truncate, or wherever we need emergency memory, release emergency slab. Simply return it to arena.
>>
>> Once you returned it, all the other operations will be able to take
>> and fill it. Such as insertions. In the first truncate or delete
>> didn't free anything, or freed not enough to fit a new truncate
>> tuple here, there is no more a reserved slab for a next delete/truncate.
>> So your proposal does not seem to change anything.
> 
> Ehm, I did not fully explain the proposal. It also assumes there is 
> "emergency mode" flag set at start of truncate and cleared at end,
> and if there is no reserve slab and there is an emergence flag
> set, nothing can allocate memory.

Yeah, the problem is the same as with reserving quota instead of a
slab. If truncate/delete didn't free anything, the 'emergency'
slab is already taken, and can't be freed.

Here is a summary of what we discussed.

Lets don't change anything about truncate(). Or make it work using
deletes + periodically try to execute a real truncate again. Or
drop _truncate system space. Or use 'huge' slabs idea for _truncate
tuples.

Talking of delete, it looks like the problem is related to a
leak/bug somewhere in btree. (Probably this is it:
https://github.com/tarantool/tarantool/issues/4780).

The thing is that :delete() should never fail in memtx. It is
guaranteed by memtx_index_extent_reserve() in
memtx_space_replace_all_keys(). Indeed, consider an example. Assume,
that we insert tuples into a tree. It is getting filled, and calls
memtx_index_extent_reserve() with 16 blocks each time. When there is
not enough memory for 16 blocks, it is assumed, that there is enough
for 8 blocks.
So memtx_index_extent_reserve() for REPLACE stops working, but
memtx_index_extent_reserve() for DELETE works, because needs only 8
blocks. Moreover, even if DELETE didn't free anything, the tree is
not changed, and these 8 blocks are still available for a next DELETE
attempt.

This was the initial idea with that reserve() call. A problem may be
with a leak, or with the assumption, that not enough for 16 blocks !=
enough for 8 blocks. And it is possible to reach a situation, when both
REPLACE and DELETE can't reserve anything.

I think that firstly we need a stable reproducer. Because the tests
on the branch seems to fail only at insertion. On deletion they fail
artificially via errinj. Perhaps. Because I tried to delete that
errinj, and nothing changed. It means, that they don't really depend
on why memtx_index_extent_reserve() can fail. And this is bad.

Once we have a reproducer, without errinj, we can see at the real
reason why does not it work. Maybe it would be enough just to increase
RESERVE_EXTENTS_BEFORE_REPLACE. So as after any insertion/update there
would be still reserved at least RESERVE_EXTENTS_BEFORE_DELETE.
Maybe we will find a leak in bps. A fair reproducer will show.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-02-16 15:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-14 19:40 [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion Ilya Kosarev
2020-02-14 19:40 ` [Tarantool-patches] [PATCH 1/4] small: bump small version Ilya Kosarev
2020-02-14 19:40 ` [Tarantool-patches] [PATCH 2/4] b-tree: return NULL on matras_alloc fail Ilya Kosarev
2020-02-14 19:40 ` [Tarantool-patches] [PATCH 3/4] memtx: use reserved slab for truncation Ilya Kosarev
2020-02-14 19:40 ` [Tarantool-patches] [PATCH 4/4] memtx: space:delete() might fail on reserve Ilya Kosarev
2020-02-14 22:48 ` [Tarantool-patches] [PATCH 0/4] Safe truncation and deletion Vladislav Shpilevoy
2020-02-15 15:33   ` Vladislav Shpilevoy
2020-02-15 16:34     ` Konstantin Osipov
2020-02-16 15:41       ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox