[Tarantool-patches] [PATCH v6] memtx: fix out of memory handling for rtree

Olga Arkhangelskaia arkholga at tarantool.org
Mon Feb 10 10:47:12 MSK 2020


When tarantool tries to recover rtree from a snapshot and memtx_memory value
is lower than it has been when the snapshot was created, server suffers from
segmentation fault. This happens because there is no out of memory error
handling in rtree lib. In another words, we do not check the result of
malloc operation.
The execution flow in case of recovery uses different way and the secondary
keys are build in batches. That way has no checks and reservations.
The patch adds memtx_rtree_index_reserve implementation to make sure that any
memory allocation in rtree will fail. Although this gives us no additional
optimization as in case of memtx_tree, the memory reservation prevents
tarantool from segmentation fault. If there is not enough memory to be reserved
server will fail gently with the "Failed to allocate" error message.

Closes #4619
---
Branch: https://github.com/tarantool/tarantool/tree/OKriw/gh-RTREE-doesnt-handle-OOM-properly
Issue:https://github.com/tarantool/tarantool/issues/4619

v1:https://lists.tarantool.org/pipermail/tarantool-patches/2019-November/012391.html
v2:https://lists.tarantool.org/pipermail/tarantool-patches/2019-December/012958.html
v3:https://lists.tarantool.org/pipermail/tarantool-patches/2019-December/013182.html
v4:https://lists.tarantool.org/pipermail/tarantool-patches/2019-December/013230.html
v5:https://lists.tarantool.org/pipermail/tarantool-patches/2019-December/013340.html

Changes in v2:
- changed way of error handling, now we reserve pages before go to rtree lib
- changed test
- changed commit msg

Changes in v3:
- added memtx_rtree_build_next function
- memory reservation is moved to memtx_rtree_build_next

Changes in v4:
- added index reservation in build_next
- added memtx_rtree_reserve implementation

Changes in v5:
- added error injection
- test is placed in errorinj.test.lua
- test is much faster now
- fixed indentation errors 

Changes in v6:
- fixed indentation
- fixed test output
 
 src/box/index.cc           |   2 +
 src/box/memtx_engine.h     |  12 ++
 src/box/memtx_rtree.c      |  19 ++-
 src/box/memtx_space.c      |  12 --
 src/lib/core/errinj.h      |   1 +
 test/box/errinj.result     | 231 ++++++++++++++++++++++++++-----------
 test/box/errinj.test.lua   |  27 +++++
 test/box/lua/cfg_rtree.lua |   8 ++
 8 files changed, 233 insertions(+), 79 deletions(-)
 create mode 100644 test/box/lua/cfg_rtree.lua

diff --git a/src/box/index.cc b/src/box/index.cc
index 4e4867118..750838bef 100644
--- a/src/box/index.cc
+++ b/src/box/index.cc
@@ -733,6 +733,8 @@ int
 generic_index_build_next(struct index *index, struct tuple *tuple)
 {
 	struct tuple *unused;
+	if (index_reserve(index, 0) != 0)
+		return -1;
 	return index_replace(index, NULL, tuple, DUP_INSERT, &unused);
 }
 
diff --git a/src/box/memtx_engine.h b/src/box/memtx_engine.h
index f562c66df..8b380bf3c 100644
--- a/src/box/memtx_engine.h
+++ b/src/box/memtx_engine.h
@@ -87,6 +87,18 @@ enum memtx_recovery_state {
 /** Memtx extents pool, available to statistics. */
 extern struct mempool memtx_index_extent_pool;
 
+enum memtx_reserve_extents_num {
+	/**
+	 * This number is calculated based on the
+	 * max (realistic) number of insertions
+	 * a deletion from a B-tree or an R-tree
+	 * can lead to, and, as a result, the max
+	 * number of new block allocations.
+	 */
+	RESERVE_EXTENTS_BEFORE_DELETE = 8,
+	RESERVE_EXTENTS_BEFORE_REPLACE = 16
+};
+
 /**
  * The size of the biggest memtx iterator. Used with
  * mempool_create. This is the size of the block that will be
diff --git a/src/box/memtx_rtree.c b/src/box/memtx_rtree.c
index 8badad797..a5cae8f45 100644
--- a/src/box/memtx_rtree.c
+++ b/src/box/memtx_rtree.c
@@ -242,6 +242,23 @@ memtx_rtree_index_replace(struct index *base, struct tuple *old_tuple,
 	return 0;
 }
 
+static int
+memtx_rtree_index_reserve(struct index *base, uint32_t size_hint)
+{
+	/*
+	 * In case of rtree we use reserve to make sure that memory allocation
+	 * will not fail durig any operationin rtree, because there is no
+	 * error handling in rtree lib.
+	 */
+	(void)size_hint;
+	ERROR_INJECT(ERRINJ_INDEX_RESERVE, {
+		diag_set(OutOfMemory, MEMTX_EXTENT_SIZE, "slab allocator", "memtx extent");
+		return -1;
+	});
+	struct memtx_engine *memtx = (struct memtx_engine *)base->engine;
+	return memtx_index_extent_reserve(memtx, RESERVE_EXTENTS_BEFORE_REPLACE);
+}
+
 static struct iterator *
 memtx_rtree_index_create_iterator(struct index *base,  enum iterator_type type,
 				  const char *key, uint32_t part_count)
@@ -333,7 +350,7 @@ static const struct index_vtab memtx_rtree_index_vtab = {
 	/* .compact = */ generic_index_compact,
 	/* .reset_stat = */ generic_index_reset_stat,
 	/* .begin_build = */ generic_index_begin_build,
-	/* .reserve = */ generic_index_reserve,
+	/* .reserve = */ memtx_rtree_index_reserve,
 	/* .build_next = */ generic_index_build_next,
 	/* .end_build = */ generic_index_end_build,
 };
diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
index 6ef84e045..da20b9196 100644
--- a/src/box/memtx_space.c
+++ b/src/box/memtx_space.c
@@ -103,18 +103,6 @@ memtx_space_replace_no_keys(struct space *space, struct tuple *old_tuple,
 	return -1;
 }
 
-enum {
-	/**
-	 * This number is calculated based on the
-	 * max (realistic) number of insertions
-	 * a deletion from a B-tree or an R-tree
-	 * can lead to, and, as a result, the max
-	 * number of new block allocations.
-	 */
-	RESERVE_EXTENTS_BEFORE_DELETE = 8,
-	RESERVE_EXTENTS_BEFORE_REPLACE = 16
-};
-
 /**
  * A short-cut version of replace() used during bulk load
  * from snapshot.
diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h
index 672da2119..9ee9b5699 100644
--- a/src/lib/core/errinj.h
+++ b/src/lib/core/errinj.h
@@ -135,6 +135,7 @@ struct errinj {
 	_(ERRINJ_COIO_SENDFILE_CHUNK, ERRINJ_INT, {.iparam = -1}) \
 	_(ERRINJ_SWIM_FD_ONLY, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_DYN_MODULE_COUNT, ERRINJ_INT, {.iparam = 0}) \
+	_(ERRINJ_INDEX_RESERVE, ERRINJ_BOOL, {.bparam = false})\
 
 ENUM0(errinj_id, ERRINJ_LIST);
 extern struct errinj errinjs[];
diff --git a/test/box/errinj.result b/test/box/errinj.result
index babe36b1b..c73fe3456 100644
--- a/test/box/errinj.result
+++ b/test/box/errinj.result
@@ -23,132 +23,134 @@ errinj.info()
 ---
 - ERRINJ_VY_RUN_WRITE_STMT_TIMEOUT:
     state: 0
-  ERRINJ_WAL_WRITE:
-    state: false
-  ERRINJ_RELAY_BREAK_LSN:
+  ERRINJ_WAL_BREAK_LSN:
     state: -1
-  ERRINJ_HTTPC_EXECUTE:
-    state: false
   ERRINJ_VYRUN_DATA_READ:
     state: false
-  ERRINJ_SWIM_FD_ONLY:
-    state: false
-  ERRINJ_SQL_NAME_NORMALIZATION:
-    state: false
   ERRINJ_VY_SCHED_TIMEOUT:
     state: 0
-  ERRINJ_COIO_SENDFILE_CHUNK:
-    state: -1
   ERRINJ_HTTP_RESPONSE_ADD_WAIT:
     state: false
-  ERRINJ_WAL_WRITE_PARTIAL:
-    state: -1
-  ERRINJ_VY_GC:
-    state: false
-  ERRINJ_WAL_DELAY:
-    state: false
-  ERRINJ_INDEX_ALLOC:
-    state: false
   ERRINJ_WAL_WRITE_EOF:
     state: false
-  ERRINJ_WAL_SYNC:
-    state: false
-  ERRINJ_BUILD_INDEX:
-    state: -1
   ERRINJ_BUILD_INDEX_DELAY:
     state: false
-  ERRINJ_VY_RUN_FILE_RENAME:
-    state: false
-  ERRINJ_VY_COMPACTION_DELAY:
-    state: false
-  ERRINJ_VY_DUMP_DELAY:
-    state: false
   ERRINJ_VY_DELAY_PK_LOOKUP:
     state: false
-  ERRINJ_VY_TASK_COMPLETE:
-    state: false
-  ERRINJ_PORT_DUMP:
+  ERRINJ_VY_POINT_ITER_WAIT:
     state: false
-  ERRINJ_WAL_BREAK_LSN:
-    state: -1
   ERRINJ_WAL_IO:
     state: false
-  ERRINJ_WAL_FALLOCATE:
-    state: 0
-  ERRINJ_DYN_MODULE_COUNT:
-    state: 0
   ERRINJ_VY_INDEX_FILE_RENAME:
     state: false
   ERRINJ_TUPLE_FORMAT_COUNT:
     state: -1
   ERRINJ_TUPLE_ALLOC:
     state: false
-  ERRINJ_VY_RUN_WRITE_DELAY:
+  ERRINJ_VY_RUN_FILE_RENAME:
     state: false
   ERRINJ_VY_READ_PAGE:
     state: false
   ERRINJ_RELAY_REPORT_INTERVAL:
     state: 0
-  ERRINJ_VY_LOG_FILE_RENAME:
-    state: false
-  ERRINJ_VY_READ_PAGE_TIMEOUT:
-    state: 0
+  ERRINJ_RELAY_BREAK_LSN:
+    state: -1
   ERRINJ_XLOG_META:
     state: false
-  ERRINJ_SIO_READ_MAX:
-    state: -1
   ERRINJ_SNAP_COMMIT_DELAY:
     state: false
-  ERRINJ_WAL_WRITE_DISK:
+  ERRINJ_VY_RUN_WRITE:
     state: false
-  ERRINJ_SNAP_WRITE_DELAY:
+  ERRINJ_BUILD_INDEX:
+    state: -1
+  ERRINJ_RELAY_FINAL_JOIN:
+    state: false
+  ERRINJ_REPLICA_JOIN_DELAY:
     state: false
   ERRINJ_LOG_ROTATE:
     state: false
-  ERRINJ_VY_RUN_WRITE:
+  ERRINJ_MEMTX_DELAY_GC:
     state: false
-  ERRINJ_CHECK_FORMAT_DELAY:
+  ERRINJ_XLOG_GARBAGE:
+    state: false
+  ERRINJ_VY_READ_PAGE_DELAY:
+    state: false
+  ERRINJ_INDEX_RESERVE:
+    state: false
+  ERRINJ_SWIM_FD_ONLY:
+    state: false
+  ERRINJ_WAL_WRITE:
     state: false
+  ERRINJ_HTTPC_EXECUTE:
+    state: false
+  ERRINJ_SQL_NAME_NORMALIZATION:
+    state: false
+  ERRINJ_WAL_WRITE_PARTIAL:
+    state: -1
+  ERRINJ_VY_GC:
+    state: false
+  ERRINJ_WAL_DELAY:
+    state: false
+  ERRINJ_XLOG_READ:
+    state: -1
+  ERRINJ_WAL_SYNC:
+    state: false
+  ERRINJ_VY_TASK_COMPLETE:
+    state: false
+  ERRINJ_PORT_DUMP:
+    state: false
+  ERRINJ_COIO_SENDFILE_CHUNK:
+    state: -1
+  ERRINJ_DYN_MODULE_COUNT:
+    state: 0
+  ERRINJ_SIO_READ_MAX:
+    state: -1
+  ERRINJ_RELAY_TIMEOUT:
+    state: 0
+  ERRINJ_VY_DUMP_DELAY:
+    state: false
+  ERRINJ_VY_SQUASH_TIMEOUT:
+    state: 0
   ERRINJ_VY_LOG_FLUSH_DELAY:
     state: false
-  ERRINJ_RELAY_FINAL_JOIN:
+  ERRINJ_RELAY_SEND_DELAY:
     state: false
-  ERRINJ_REPLICA_JOIN_DELAY:
+  ERRINJ_VY_COMPACTION_DELAY:
     state: false
-  ERRINJ_RELAY_FINAL_SLEEP:
+  ERRINJ_VY_LOG_FILE_RENAME:
     state: false
   ERRINJ_VY_RUN_DISCARD:
     state: false
   ERRINJ_WAL_ROTATE:
     state: false
-  ERRINJ_RELAY_EXIT_DELAY:
+  ERRINJ_VY_READ_PAGE_TIMEOUT:
     state: 0
-  ERRINJ_VY_POINT_ITER_WAIT:
+  ERRINJ_VY_INDEX_DUMP:
+    state: -1
+  ERRINJ_TUPLE_FIELD:
     state: false
-  ERRINJ_MEMTX_DELAY_GC:
+  ERRINJ_SNAP_WRITE_DELAY:
     state: false
   ERRINJ_IPROTO_TX_DELAY:
     state: false
-  ERRINJ_XLOG_READ:
-    state: -1
-  ERRINJ_TUPLE_FIELD:
+  ERRINJ_RELAY_EXIT_DELAY:
+    state: 0
+  ERRINJ_RELAY_FINAL_SLEEP:
     state: false
-  ERRINJ_XLOG_GARBAGE:
+  ERRINJ_WAL_WRITE_DISK:
     state: false
-  ERRINJ_VY_INDEX_DUMP:
-    state: -1
-  ERRINJ_VY_READ_PAGE_DELAY:
+  ERRINJ_CHECK_FORMAT_DELAY:
     state: false
   ERRINJ_TESTING:
     state: false
-  ERRINJ_RELAY_SEND_DELAY:
+  ERRINJ_VY_RUN_WRITE_DELAY:
     state: false
-  ERRINJ_VY_SQUASH_TIMEOUT:
+  ERRINJ_WAL_FALLOCATE:
     state: 0
   ERRINJ_VY_LOG_FLUSH:
     state: false
-  ERRINJ_RELAY_TIMEOUT:
-    state: 0
+  ERRINJ_INDEX_ALLOC:
+    state: false
 ...
 errinj.set("some-injection", true)
 ---
@@ -1781,3 +1783,100 @@ box.error.injection.get('ERRINJ_RELAY_TIMEOUT')
 ---
 - 0
 ...
+--
+-- gh-4619-RTREE-doesn't-handle-OOM-properly
+--
+test_run:cmd('create server rtree with script = "box/lua/cfg_rtree.lua"')
+---
+- true
+...
+test_run:cmd("start server rtree")
+---
+- true
+...
+test_run:cmd('switch rtree')
+---
+- true
+...
+math = require("math")
+---
+...
+rtreespace = box.schema.create_space('rtree', {if_not_exists = true})
+---
+...
+rtreespace:create_index('pk', {if_not_exists = true})
+---
+- unique: true
+  parts:
+  - type: unsigned
+    is_nullable: false
+    fieldno: 1
+  id: 0
+  space_id: 512
+  type: TREE
+  name: pk
+...
+rtreespace:create_index('target', {type='rtree', dimension = 3, parts={2, 'array'},unique = false, if_not_exists = true,})
+---
+- parts:
+  - type: array
+    is_nullable: false
+    fieldno: 2
+  dimension: 3
+  id: 1
+  type: RTREE
+  space_id: 512
+  name: target
+...
+count = 10
+---
+...
+for i = 1, count do box.space.rtree:insert{i, {(i + 1) -\
+    math.floor((i + 1)/7000) * 7000, (i + 2) - math.floor((i + 2)/7000) * 7000,\
+    (i + 3) - math.floor((i + 3)/7000) * 7000}} end
+---
+...
+rtreespace:count()
+---
+- 10
+...
+box.snapshot()
+---
+- ok
+...
+test_run:cmd('switch default')
+---
+- true
+...
+test_run:cmd("stop server rtree")
+---
+- true
+...
+test_run:cmd("start server rtree with crash_expected=True")
+---
+- false
+...
+fio = require('fio')
+---
+...
+fh = fio.open(fio.pathjoin(fio.cwd(), 'cfg_rtree.log'), {'O_RDONLY'})
+---
+...
+size = fh:seek(0, 'SEEK_END')
+---
+...
+fh:seek(-256, 'SEEK_END') ~= nil
+---
+- true
+...
+line = fh:read(256)
+---
+...
+fh:close()
+---
+- true
+...
+string.match(line, 'Failed to allocate') ~= nil
+---
+- true
+...
diff --git a/test/box/errinj.test.lua b/test/box/errinj.test.lua
index 03c088677..598e7da33 100644
--- a/test/box/errinj.test.lua
+++ b/test/box/errinj.test.lua
@@ -620,3 +620,30 @@ box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.5)
 box.error.injection.get('ERRINJ_RELAY_TIMEOUT')
 box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0)
 box.error.injection.get('ERRINJ_RELAY_TIMEOUT')
+
+--
+-- gh-4619-RTREE-doesn't-handle-OOM-properly
+--
+test_run:cmd('create server rtree with script = "box/lua/cfg_rtree.lua"')
+test_run:cmd("start server rtree")
+test_run:cmd('switch rtree')
+math = require("math")
+rtreespace = box.schema.create_space('rtree', {if_not_exists = true})
+rtreespace:create_index('pk', {if_not_exists = true})
+rtreespace:create_index('target', {type='rtree', dimension = 3, parts={2, 'array'},unique = false, if_not_exists = true,})
+count = 10
+for i = 1, count do box.space.rtree:insert{i, {(i + 1) -\
+    math.floor((i + 1)/7000) * 7000, (i + 2) - math.floor((i + 2)/7000) * 7000,\
+    (i + 3) - math.floor((i + 3)/7000) * 7000}} end
+rtreespace:count()
+box.snapshot()
+test_run:cmd('switch default')
+test_run:cmd("stop server rtree")
+test_run:cmd("start server rtree with crash_expected=True")
+fio = require('fio')
+fh = fio.open(fio.pathjoin(fio.cwd(), 'cfg_rtree.log'), {'O_RDONLY'})
+size = fh:seek(0, 'SEEK_END')
+fh:seek(-256, 'SEEK_END') ~= nil
+line = fh:read(256)
+fh:close()
+string.match(line, 'Failed to allocate') ~= nil
diff --git a/test/box/lua/cfg_rtree.lua b/test/box/lua/cfg_rtree.lua
new file mode 100644
index 000000000..f2d32ef7d
--- /dev/null
+++ b/test/box/lua/cfg_rtree.lua
@@ -0,0 +1,8 @@
+#!/usr/bin/env tarantool
+os = require('os')
+box.error.injection.set("ERRINJ_INDEX_RESERVE", true)
+box.cfg{
+    listen              = os.getenv("LISTEN"),
+}
+require('console').listen(os.getenv('ADMIN'))
+box.schema.user.grant('guest', 'read,write,execute', 'universe')
-- 
2.20.1 (Apple Git-117)



More information about the Tarantool-patches mailing list