[Tarantool-patches] [PATCH v5] memtx: fix out of memory handling for rtree
Olga Arkhangelskaia
arkholga at tarantool.org
Wed Dec 25 12:16:31 MSK 2019
On 25/12/2019 11:57, Olga Arkhangelskaia wrote:
> When tarantool tries to recover an rtree index from a snapshot and the
> memtx_memory value is lower than it was when the snapshot was created, the
> server crashes with a segmentation fault. This happens because there is no
> out-of-memory error handling in the rtree lib. In other words, we do not
> check the result of the malloc operation.
> During recovery the execution flow takes a different path: the secondary
> keys are built in batches. That path has no checks and no reservations.
> The patch adds a memtx_rtree_index_reserve implementation to make sure that
> no memory allocation in rtree can fail. Although this gives us no additional
> optimization, unlike in the case of memtx_tree, the memory reservation
> prevents tarantool from crashing with a segmentation fault. If there is not
> enough memory to reserve, the server fails gracefully with the "Failed to
> allocate" error message.
>
> Closes #4619
> ---
> Branch: https://github.com/tarantool/tarantool/tree/OKriw/gh-RTREE-doesnt-handle-OOM-properly
> Issue: https://github.com/tarantool/tarantool/issues/4619
>
> v1: https://lists.tarantool.org/pipermail/tarantool-patches/2019-November/012391.html
> v2: https://lists.tarantool.org/pipermail/tarantool-patches/2019-December/012958.html
> v3: https://lists.tarantool.org/pipermail/tarantool-patches/2019-December/013182.html
> v4: https://lists.tarantool.org/pipermail/tarantool-patches/2019-December/013230.html
>
> Changes in v2:
> - changed the way errors are handled: now we reserve pages before going into the rtree lib
> - changed test
> - changed commit msg
>
> Changes in v3:
> - added memtx_rtree_build_next function
> - memory reservation is moved to memtx_rtree_build_next
>
> Changes in v4:
> - added index reservation in build_next
> - added memtx_rtree_reserve implementation
>
> Changes in v5:
> - added error injection
> - test is much faster now
> - fixed indentation errors
>
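
For readers who have not seen the reserve-before-operate idea described in
the commit message above, here is a minimal standalone sketch in C. It is
not the memtx code: the names extent_pool, extent_pool_reserve,
extent_pool_alloc and extent_pool_destroy, the quota field and the sizes
are all hypothetical and only illustrate the pattern. The caller reserves a
fixed number of extents while it can still report an error; the library that
cannot handle allocation failure then takes memory only from that reserve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sizes, chosen for illustration only. */
enum { EXTENT_SIZE = 16 * 1024, RESERVE_BEFORE_REPLACE = 16 };

/* A reserved extent; the list link lives inside the free block itself. */
struct extent {
	struct extent *next;
};

/* Hypothetical pool: a byte quota plus a list of pre-allocated extents. */
struct extent_pool {
	size_t quota_left;
	struct extent *reserved;
	int num_reserved;
};

/*
 * Make sure at least `num` extents are held in reserve.
 * Returns 0 on success, -1 if the quota or malloc() is exhausted.
 * This is the only place where an out-of-memory error can surface.
 */
static int
extent_pool_reserve(struct extent_pool *pool, int num)
{
	while (pool->num_reserved < num) {
		if (pool->quota_left < EXTENT_SIZE)
			return -1;
		struct extent *ext = malloc(EXTENT_SIZE);
		if (ext == NULL)
			return -1;
		pool->quota_left -= EXTENT_SIZE;
		ext->next = pool->reserved;
		pool->reserved = ext;
		pool->num_reserved++;
	}
	return 0;
}

/*
 * Allocation callback for a library with no error handling.
 * It never calls malloc(); the caller must have reserved beforehand.
 */
static void *
extent_pool_alloc(struct extent_pool *pool)
{
	assert(pool->reserved != NULL);
	struct extent *ext = pool->reserved;
	pool->reserved = ext->next;
	pool->num_reserved--;
	return ext;
}

/* Release everything that is still held in reserve. */
static void
extent_pool_destroy(struct extent_pool *pool)
{
	while (pool->reserved != NULL) {
		struct extent *ext = pool->reserved;
		pool->reserved = ext->next;
		free(ext);
	}
	pool->num_reserved = 0;
}
```

In this sketch a caller would run extent_pool_reserve(&pool,
RESERVE_BEFORE_REPLACE) before every insert and return an error if it
fails, so the infallible code path is entered only when enough extents are
already in hand.
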
> src/box/index.cc | 2 +
> src/box/memtx_engine.h | 12 +++++
> src/box/memtx_rtree.c | 19 ++++++-
> src/box/memtx_space.c | 12 -----
> src/lib/core/errinj.h | 1 +
> test/box/cfg.result | 105 +++++++++++++++++++++++++++++++++++++
> test/box/cfg.test.lua | 28 ++++++++++
> test/box/lua/cfg_rtree.lua | 8 +++
> 8 files changed, 174 insertions(+), 13 deletions(-)
> create mode 100644 test/box/lua/cfg_rtree.lua
>
> diff --git a/src/box/index.cc b/src/box/index.cc
> index 4e4867118..62cb2ab96 100644
> --- a/src/box/index.cc
> +++ b/src/box/index.cc
> @@ -733,6 +733,8 @@ int
> generic_index_build_next(struct index *index, struct tuple *tuple)
> {
> struct tuple *unused;
> + if (index_reserve(index, 0) != 0)
> + return -1;
> return index_replace(index, NULL, tuple, DUP_INSERT, &unused);
> }
>
> diff --git a/src/box/memtx_engine.h b/src/box/memtx_engine.h
> index f562c66df..8b380bf3c 100644
> --- a/src/box/memtx_engine.h
> +++ b/src/box/memtx_engine.h
> @@ -87,6 +87,18 @@ enum memtx_recovery_state {
> /** Memtx extents pool, available to statistics. */
> extern struct mempool memtx_index_extent_pool;
>
> +enum memtx_reserve_extents_num {
> + /**
> + * This number is calculated based on the
> + * max (realistic) number of insertions
> + * a deletion from a B-tree or an R-tree
> + * can lead to, and, as a result, the max
> + * number of new block allocations.
> + */
> + RESERVE_EXTENTS_BEFORE_DELETE = 8,
> + RESERVE_EXTENTS_BEFORE_REPLACE = 16
> +};
> +
> /**
> * The size of the biggest memtx iterator. Used with
> * mempool_create. This is the size of the block that will be
> diff --git a/src/box/memtx_rtree.c b/src/box/memtx_rtree.c
> index 8badad797..fe8802ef0 100644
> --- a/src/box/memtx_rtree.c
> +++ b/src/box/memtx_rtree.c
> @@ -242,6 +242,23 @@ memtx_rtree_index_replace(struct index *base, struct tuple *old_tuple,
> return 0;
> }
>
> +static int
> +memtx_rtree_index_reserve(struct index *base, uint32_t size_hint)
> +{
> + /*
> + * In case of rtree we use reserve to make sure that memory allocation
> + * will not fail during any operation in rtree, because there is no
> + * error handling in the rtree lib.
> + */
> + (void)size_hint;
> + ERROR_INJECT(ERRINJ_INDEX_RESERVE, {
> + diag_set(OutOfMemory, MEMTX_EXTENT_SIZE, "slab allocator", "memtx extent");
> + return -1;
> + });
> + struct memtx_engine *memtx = (struct memtx_engine *)base->engine;
> + return memtx_index_extent_reserve(memtx, RESERVE_EXTENTS_BEFORE_REPLACE);
> +}
> +
> static struct iterator *
> memtx_rtree_index_create_iterator(struct index *base, enum iterator_type type,
> const char *key, uint32_t part_count)
> @@ -333,7 +350,7 @@ static const struct index_vtab memtx_rtree_index_vtab = {
> /* .compact = */ generic_index_compact,
> /* .reset_stat = */ generic_index_reset_stat,
> /* .begin_build = */ generic_index_begin_build,
> - /* .reserve = */ generic_index_reserve,
> + /* .reserve = */ memtx_rtree_index_reserve,
> /* .build_next = */ generic_index_build_next,
> /* .end_build = */ generic_index_end_build,
> };
> diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
> index 6ef84e045..da20b9196 100644
> --- a/src/box/memtx_space.c
> +++ b/src/box/memtx_space.c
> @@ -103,18 +103,6 @@ memtx_space_replace_no_keys(struct space *space, struct tuple *old_tuple,
> return -1;
> }
>
> -enum {
> - /**
> - * This number is calculated based on the
> - * max (realistic) number of insertions
> - * a deletion from a B-tree or an R-tree
> - * can lead to, and, as a result, the max
> - * number of new block allocations.
> - */
> - RESERVE_EXTENTS_BEFORE_DELETE = 8,
> - RESERVE_EXTENTS_BEFORE_REPLACE = 16
> -};
> -
> /**
> * A short-cut version of replace() used during bulk load
> * from snapshot.
> diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h
> index 672da2119..9ee9b5699 100644
> --- a/src/lib/core/errinj.h
> +++ b/src/lib/core/errinj.h
> @@ -135,6 +135,7 @@ struct errinj {
> _(ERRINJ_COIO_SENDFILE_CHUNK, ERRINJ_INT, {.iparam = -1}) \
> _(ERRINJ_SWIM_FD_ONLY, ERRINJ_BOOL, {.bparam = false}) \
> _(ERRINJ_DYN_MODULE_COUNT, ERRINJ_INT, {.iparam = 0}) \
> + _(ERRINJ_INDEX_RESERVE, ERRINJ_BOOL, {.bparam = false}) \
>
> ENUM0(errinj_id, ERRINJ_LIST);
> extern struct errinj errinjs[];
> diff --git a/test/box/cfg.result b/test/box/cfg.result
> index 5370bb870..389c93724 100644
> --- a/test/box/cfg.result
> +++ b/test/box/cfg.result
> @@ -580,3 +580,108 @@ test_run:cmd("cleanup server cfg_tester6")
> | ---
> | - true
> | ...
> +
> +--
> +-- gh-4619-RTREE-doesn't-handle-OOM-properly
> +--
> +test_run:cmd('create server rtree with script = "box/lua/cfg_rtree.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd("start server rtree")
> + | ---
> + | - true
> + | ...
> +test_run:cmd('switch rtree')
> + | ---
> + | - true
> + | ...
> +box.cfg{memtx_memory = 3221225472}
> + | ---
> + | ...
> +math = require("math")
> + | ---
> + | ...
> +rtreespace = box.schema.create_space('rtree', {if_not_exists = true})
> + | ---
> + | ...
> +rtreespace:create_index('pk', {if_not_exists = true})
> + | ---
> + | - unique: true
> + | parts:
> + | - type: unsigned
> + | is_nullable: false
> + | fieldno: 1
> + | id: 0
> + | space_id: 512
> + | type: TREE
> + | name: pk
> + | ...
> +rtreespace:create_index('target', {type='rtree', dimension = 3, parts={2, 'array'},unique = false, if_not_exists = true,})
> + | ---
> + | - parts:
> + | - type: array
> + | is_nullable: false
> + | fieldno: 2
> + | dimension: 3
> + | id: 1
> + | type: RTREE
> + | space_id: 512
> + | name: target
> + | ...
> +count = 10
> + | ---
> + | ...
> +for i = 1, count do box.space.rtree:insert{i, {(i + 1) -\
> + math.floor((i + 1)/7000) * 7000, (i + 2) - math.floor((i + 2)/7000) * 7000,\
> + (i + 3) - math.floor((i + 3)/7000) * 7000}} end
> + | ---
> + | ...
> +rtreespace:count()
> + | ---
> + | - 10
> + | ...
> +box.error.injection.set("ERRINJ_INDEX_RESERVE", true)
> + | ---
> + | - ok
> + | ...
> +box.snapshot()
> + | ---
> + | - ok
> + | ...
> +test_run:cmd('switch default')
> + | ---
> + | - true
> + | ...
> +test_run:cmd("stop server rtree")
> + | ---
> + | - true
> + | ...
> +test_run:cmd("start server rtree with crash_expected=True")
> + | ---
> + | - false
> + | ...
> +fio = require('fio')
> + | ---
> + | ...
> +fh = fio.open(fio.pathjoin(fio.cwd(), 'cfg_rtree.log'), {'O_RDONLY'})
> + | ---
> + | ...
> +size = fh:seek(0, 'SEEK_END')
> + | ---
> + | ...
> +fh:seek(-256, 'SEEK_END') ~= nil
> + | ---
> + | - true
> + | ...
> +line = fh:read(256)
> + | ---
> + | ...
> +fh:close()
> + | ---
> + | - true
> + | ...
> +string.match(line, 'Failed to allocate') ~= nil
> + | ---
> + | - true
> + | ...
> diff --git a/test/box/cfg.test.lua b/test/box/cfg.test.lua
> index 56ccb6767..49b97486a 100644
> --- a/test/box/cfg.test.lua
> +++ b/test/box/cfg.test.lua
> @@ -141,3 +141,31 @@ test_run:cmd("start server cfg_tester6")
> test_run:grep_log('cfg_tester6', 'set \'vinyl_memory\' configuration option to 1073741824', 1000)
> test_run:cmd("stop server cfg_tester6")
> test_run:cmd("cleanup server cfg_tester6")
> +
> +--
> +-- gh-4619-RTREE-doesn't-handle-OOM-properly
> +--
> +test_run:cmd('create server rtree with script = "box/lua/cfg_rtree.lua"')
> +test_run:cmd("start server rtree")
> +test_run:cmd('switch rtree')
> +box.cfg{memtx_memory = 3221225472}
> +math = require("math")
> +rtreespace = box.schema.create_space('rtree', {if_not_exists = true})
> +rtreespace:create_index('pk', {if_not_exists = true})
> +rtreespace:create_index('target', {type='rtree', dimension = 3, parts={2, 'array'},unique = false, if_not_exists = true,})
> +count = 10
> +for i = 1, count do box.space.rtree:insert{i, {(i + 1) -\
> + math.floor((i + 1)/7000) * 7000, (i + 2) - math.floor((i + 2)/7000) * 7000,\
> + (i + 3) - math.floor((i + 3)/7000) * 7000}} end
> +rtreespace:count()
> +box.snapshot()
> +test_run:cmd('switch default')
> +test_run:cmd("stop server rtree")
> +test_run:cmd("start server rtree with crash_expected=True")
> +fio = require('fio')
> +fh = fio.open(fio.pathjoin(fio.cwd(), 'cfg_rtree.log'), {'O_RDONLY'})
> +size = fh:seek(0, 'SEEK_END')
> +fh:seek(-256, 'SEEK_END') ~= nil
> +line = fh:read(256)
> +fh:close()
> +string.match(line, 'Failed to allocate') ~= nil
> diff --git a/test/box/lua/cfg_rtree.lua b/test/box/lua/cfg_rtree.lua
> new file mode 100644
> index 000000000..f2d32ef7d
> --- /dev/null
> +++ b/test/box/lua/cfg_rtree.lua
> @@ -0,0 +1,8 @@
> +#!/usr/bin/env tarantool
> +os = require('os')
> +box.error.injection.set("ERRINJ_INDEX_RESERVE", true)
> +box.cfg{
> + listen = os.getenv("LISTEN"),
> +}
> +require('console').listen(os.getenv('ADMIN'))
> +box.schema.user.grant('guest', 'read,write,execute', 'universe')
Please do not review this version, it is broken.