[Tarantool-patches] [PATCH v5] memtx: fix out of memory handling for rtree

Olga Arkhangelskaia arkholga at tarantool.org
Wed Dec 25 12:16:31 MSK 2019


On 25/12/2019 11:57, Olga Arkhangelskaia wrote:
> When tarantool tries to recover rtree from a snapshot and memtx_memory value
> is lower than it has been when the snapshot was created, server suffers from
> segmentation fault. This happens because there is no out of memory error
> handling in rtree lib. In another words, we do not check the result of
> malloc operation.
> The execution flow in case of recovery uses different way and the secondary
> keys are build in batches. That way has no checks and reservations.
> The patch adds memtx_rtree_index_reserve implementation to make sure that any
> memory allocation in rtree will fail. Although this gives us no additional
> optimization as in case of memtx_tree, the memory reservation prevents
> tarantool from segmentation fault. If there is not enough memory to be reserved
> server will fail gently with the "Failed to allocate" error message.
>
> Closes #4619
> ---
> Branch: https://github.com/tarantool/tarantool/tree/OKriw/gh-RTREE-doesnt-handle-OOM-properly
> Issue:https://github.com/tarantool/tarantool/issues/4619
>
> v1:https://lists.tarantool.org/pipermail/tarantool-patches/2019-November/012391.html
> v2:https://lists.tarantool.org/pipermail/tarantool-patches/2019-December/012958.html
> v3:https://lists.tarantool.org/pipermail/tarantool-patches/2019-December/013182.html
> v4:https://lists.tarantool.org/pipermail/tarantool-patches/2019-December/013230.html
>
> Changes in v2:
> - changed way of error handling, now we reserve pages before go to rtree lib
> - changed test
> - changed commit msg
>
> Changes in v3:
> - added memtx_rtree_build_next function
> - memory reservation is moved to memtx_rtree_build_next
>
> Changes in v4:
> - added index reservation in build_next
> - added memtx_rtree_reserve implementation
>
> Changes in v5:
> - added error injection
> - test is much faster now
> - fixed indentation errors
>
>   src/box/index.cc           |   2 +
>   src/box/memtx_engine.h     |  12 +++++
>   src/box/memtx_rtree.c      |  19 ++++++-
>   src/box/memtx_space.c      |  12 -----
>   src/lib/core/errinj.h      |   1 +
>   test/box/cfg.result        | 105 +++++++++++++++++++++++++++++++++++++
>   test/box/cfg.test.lua      |  28 ++++++++++
>   test/box/lua/cfg_rtree.lua |   8 +++
>   8 files changed, 174 insertions(+), 13 deletions(-)
>   create mode 100644 test/box/lua/cfg_rtree.lua
>
> diff --git a/src/box/index.cc b/src/box/index.cc
> index 4e4867118..62cb2ab96 100644
> --- a/src/box/index.cc
> +++ b/src/box/index.cc
> @@ -733,6 +733,8 @@ int
>   generic_index_build_next(struct index *index, struct tuple *tuple)
>   {
>   	struct tuple *unused;
> +	if (index_reserve(index, 0) != 0)
> +        return -1;
>   	return index_replace(index, NULL, tuple, DUP_INSERT, &unused);
>   }
>   
> diff --git a/src/box/memtx_engine.h b/src/box/memtx_engine.h
> index f562c66df..8b380bf3c 100644
> --- a/src/box/memtx_engine.h
> +++ b/src/box/memtx_engine.h
> @@ -87,6 +87,18 @@ enum memtx_recovery_state {
>   /** Memtx extents pool, available to statistics. */
>   extern struct mempool memtx_index_extent_pool;
>   
> +enum memtx_reserve_extents_num {
> +	/**
> +	 * This number is calculated based on the
> +	 * max (realistic) number of insertions
> +	 * a deletion from a B-tree or an R-tree
> +	 * can lead to, and, as a result, the max
> +	 * number of new block allocations.
> +	 */
> +	RESERVE_EXTENTS_BEFORE_DELETE = 8,
> +	RESERVE_EXTENTS_BEFORE_REPLACE = 16
> +};
> +
>   /**
>    * The size of the biggest memtx iterator. Used with
>    * mempool_create. This is the size of the block that will be
> diff --git a/src/box/memtx_rtree.c b/src/box/memtx_rtree.c
> index 8badad797..fe8802ef0 100644
> --- a/src/box/memtx_rtree.c
> +++ b/src/box/memtx_rtree.c
> @@ -242,6 +242,23 @@ memtx_rtree_index_replace(struct index *base, struct tuple *old_tuple,
>   	return 0;
>   }
>   
> +static int
> +memtx_rtree_index_reserve(struct index *base, uint32_t size_hint)
> +{
> +	/*
> +	 * In case of rtree we use reserve to make sure that memory allocation
> +	 * will not fail durig any operationin rtree, because there is no
> +	 * error handling in rtree lib.
> +	 */
> +	(void)size_hint;
> +	(ERROR_INJECT(ERRINJ_INDEX_RESERVE, {
> +		diag_set(OutOfMemory, MEMTX_EXTENT_SIZE, "slab allocator", "memtx extent");
> +		return -1;
> +	});
> +	struct memtx_engine *memtx = (struct memtx_engine *)base->engine;
> +	return memtx_index_extent_reserve(memtx, RESERVE_EXTENTS_BEFORE_REPLACE);
> +}
> +
>   static struct iterator *
>   memtx_rtree_index_create_iterator(struct index *base,  enum iterator_type type,
>   				  const char *key, uint32_t part_count)
> @@ -333,7 +350,7 @@ static const struct index_vtab memtx_rtree_index_vtab = {
>   	/* .compact = */ generic_index_compact,
>   	/* .reset_stat = */ generic_index_reset_stat,
>   	/* .begin_build = */ generic_index_begin_build,
> -	/* .reserve = */ generic_index_reserve,
> +	/* .reserve = */ memtx_rtree_index_reserve,
>   	/* .build_next = */ generic_index_build_next,
>   	/* .end_build = */ generic_index_end_build,
>   };
> diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
> index 6ef84e045..da20b9196 100644
> --- a/src/box/memtx_space.c
> +++ b/src/box/memtx_space.c
> @@ -103,18 +103,6 @@ memtx_space_replace_no_keys(struct space *space, struct tuple *old_tuple,
>   	return -1;
>   }
>   
> -enum {
> -	/**
> -	 * This number is calculated based on the
> -	 * max (realistic) number of insertions
> -	 * a deletion from a B-tree or an R-tree
> -	 * can lead to, and, as a result, the max
> -	 * number of new block allocations.
> -	 */
> -	RESERVE_EXTENTS_BEFORE_DELETE = 8,
> -	RESERVE_EXTENTS_BEFORE_REPLACE = 16
> -};
> -
>   /**
>    * A short-cut version of replace() used during bulk load
>    * from snapshot.
> diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h
> index 672da2119..9ee9b5699 100644
> --- a/src/lib/core/errinj.h
> +++ b/src/lib/core/errinj.h
> @@ -135,6 +135,7 @@ struct errinj {
>   	_(ERRINJ_COIO_SENDFILE_CHUNK, ERRINJ_INT, {.iparam = -1}) \
>   	_(ERRINJ_SWIM_FD_ONLY, ERRINJ_BOOL, {.bparam = false}) \
>   	_(ERRINJ_DYN_MODULE_COUNT, ERRINJ_INT, {.iparam = 0}) \
> +	_(ERRINJ_INDEX_RESERVE, ERRINJ_BOOL, {.bparam = false})\
>   
>   ENUM0(errinj_id, ERRINJ_LIST);
>   extern struct errinj errinjs[];
> diff --git a/test/box/cfg.result b/test/box/cfg.result
> index 5370bb870..389c93724 100644
> --- a/test/box/cfg.result
> +++ b/test/box/cfg.result
> @@ -580,3 +580,108 @@ test_run:cmd("cleanup server cfg_tester6")
>    | ---
>    | - true
>    | ...
> +
> +--
> +-- gh-4619-RTREE-doesn't-handle-OOM-properly
> +--
> +test_run:cmd('create server rtree with script = "box/lua/cfg_rtree.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd("start server rtree")
> + | ---
> + | - true
> + | ...
> +test_run:cmd('switch rtree')
> + | ---
> + | - true
> + | ...
> +box.cfg{memtx_memory = 3221225472}
> + | ---
> + | ...
> +math = require("math")
> + | ---
> + | ...
> +rtreespace = box.schema.create_space('rtree', {if_not_exists = true})
> + | ---
> + | ...
> +rtreespace:create_index('pk', {if_not_exists = true})
> + | ---
> + | - unique: true
> + |   parts:
> + |   - type: unsigned
> + |     is_nullable: false
> + |     fieldno: 1
> + |   id: 0
> + |   space_id: 512
> + |   type: TREE
> + |   name: pk
> + | ...
> +rtreespace:create_index('target', {type='rtree', dimension = 3, parts={2, 'array'},unique = false, if_not_exists = true,})
> + | ---
> + | - parts:
> + |   - type: array
> + |     is_nullable: false
> + |     fieldno: 2
> + |   dimension: 3
> + |   id: 1
> + |   type: RTREE
> + |   space_id: 512
> + |   name: target
> + | ...
> +count = 10
> + | ---
> + | ...
> +for i = 1, count do box.space.rtree:insert{i, {(i + 1) -\
> +    math.floor((i + 1)/7000) * 7000, (i + 2) - math.floor((i + 2)/7000) * 7000,\
> +    (i + 3) - math.floor((i + 3)/7000) * 7000}} end
> + | ---
> + | ...
> +rtreespace:count()
> + | ---
> + | - 10
> + | ...
> +box.error.injection.set("ERRINJ_INDEX_RESERVE", true)
> + | ---
> + | - ok
> + | ...
> +box.snapshot()
> + | ---
> + | - ok
> + | ...
> +test_run:cmd('switch default')
> + | ---
> + | - true
> + | ...
> +test_run:cmd("stop server rtree")
> + | ---
> + | - true
> + | ...
> +test_run:cmd("start server rtree with crash_expected=True")
> + | ---
> + | - false
> + | ...
> +fio = require('fio')
> + | ---
> + | ...
> +fh = fio.open(fio.pathjoin(fio.cwd(), 'cfg_rtree.log'), {'O_RDONLY'})
> + | ---
> + | ...
> +size = fh:seek(0, 'SEEK_END')
> + | ---
> + | ...
> +fh:seek(-256, 'SEEK_END') ~= nil
> + | ---
> + | - true
> + | ...
> +line = fh:read(256)
> + | ---
> + | ...
> +fh:close()
> + | ---
> + | - true
> + | ...
> +string.match(line, 'Failed to allocate') ~= nil
> + | ---
> + | - true
> + | ...
> diff --git a/test/box/cfg.test.lua b/test/box/cfg.test.lua
> index 56ccb6767..49b97486a 100644
> --- a/test/box/cfg.test.lua
> +++ b/test/box/cfg.test.lua
> @@ -141,3 +141,31 @@ test_run:cmd("start server cfg_tester6")
>   test_run:grep_log('cfg_tester6', 'set \'vinyl_memory\' configuration option to 1073741824', 1000)
>   test_run:cmd("stop server cfg_tester6")
>   test_run:cmd("cleanup server cfg_tester6")
> +
> +--
> +-- gh-4619-RTREE-doesn't-handle-OOM-properly
> +--
> +test_run:cmd('create server rtree with script = "box/lua/cfg_rtree.lua"')
> +test_run:cmd("start server rtree")
> +test_run:cmd('switch rtree')
> +box.cfg{memtx_memory = 3221225472}
> +math = require("math")
> +rtreespace = box.schema.create_space('rtree', {if_not_exists = true})
> +rtreespace:create_index('pk', {if_not_exists = true})
> +rtreespace:create_index('target', {type='rtree', dimension = 3, parts={2, 'array'},unique = false, if_not_exists = true,})
> +count = 10
> +for i = 1, count do box.space.rtree:insert{i, {(i + 1) -\
> +    math.floor((i + 1)/7000) * 7000, (i + 2) - math.floor((i + 2)/7000) * 7000,\
> +    (i + 3) - math.floor((i + 3)/7000) * 7000}} end
> +rtreespace:count()
> +box.snapshot()
> +test_run:cmd('switch default')
> +test_run:cmd("stop server rtree")
> +test_run:cmd("start server rtree with crash_expected=True")
> +fio = require('fio')
> +fh = fio.open(fio.pathjoin(fio.cwd(), 'cfg_rtree.log'), {'O_RDONLY'})
> +size = fh:seek(0, 'SEEK_END')
> +fh:seek(-256, 'SEEK_END') ~= nil
> +line = fh:read(256)
> +fh:close()
> +string.match(line, 'Failed to allocate') ~= nil
> diff --git a/test/box/lua/cfg_rtree.lua b/test/box/lua/cfg_rtree.lua
> new file mode 100644
> index 000000000..f2d32ef7d
> --- /dev/null
> +++ b/test/box/lua/cfg_rtree.lua
> @@ -0,0 +1,8 @@
> +#!/usr/bin/env tarantool
> +os = require('os')
> +box.error.injection.set("ERRINJ_INDEX_RESERVE", true)
> +box.cfg{
> +    listen              = os.getenv("LISTEN"),
> +}
> +require('console').listen(os.getenv('ADMIN'))
> +box.schema.user.grant('guest', 'read,write,execute', 'universe')

Pls, do not look, it is broken



More information about the Tarantool-patches mailing list