[Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Sun Jun 7 18:51:23 MSK 2020
Hi! Thanks for the patch!
On 04/06/2020 19:49, Nikita Pettik wrote:
> Meanwhile in 17f6af7dc the similar problem has been fixed, still it may
> appear that in-memory level of secondary index being constructed has
> lagging memory generation (in other words, it's values is less than
> the value of current global generation). Such situation can be achieved
> if yield which is required to check uniqueness of value being inserted
> is too long. In this time gap other space may trigger dump process
> bumping global memory generation counter, but dump itself is still not
> yet scheduled.
Why is it so important in this bug to trigger the dump, but stop the
dump from actual scheduling? Why all the needed actions, whatever they
are, can't be done at the trigger moment? And what are these actions,
which are not done at trigger moment, but done at scheduling, and lead
to this bug?
See 10 comments below.
> So to get rid of generations mismatch, let's rotate in-memory level
> after yield on demand.
>
> Closes #5042
> ---
> Branch: https://github.com/tarantool/tarantool/tree/np/gh-5042-rotate-mem-after-yield
> Issue: https://github.com/tarantool/tarantool/issues/5042
>
> @ChangeLog:
> * Fix crash due to triggered dump process during secondary index creation (gh-5042).
>
> src/box/vinyl.c | 14 ++
> src/errinj.h | 1 +
> test/box/errinj.result | 1 +
> .../gh-5042-dump-during-index-build-2.result | 189 ++++++++++++++++++
> ...gh-5042-dump-during-index-build-2.test.lua | 98 +++++++++
> test/vinyl/suite.ini | 2 +-
> 6 files changed, 304 insertions(+), 1 deletion(-)
> create mode 100644 test/vinyl/gh-5042-dump-during-index-build-2.result
> create mode 100644 test/vinyl/gh-5042-dump-during-index-build-2.test.lua
>
> diff --git a/src/box/vinyl.c b/src/box/vinyl.c
> index a90b786a7..19e37947a 100644
> --- a/src/box/vinyl.c
> +++ b/src/box/vinyl.c
> @@ -4097,6 +4105,12 @@ vy_build_insert_tuple(struct vy_env *env, struct vy_lsm *lsm,
> * Hence, to get right mem (during mem rotation it becomes
> * sealed i.e. read-only) we should fetch it from lsm again.
> */
> + if (lsm->mem->generation != *lsm->env->p_generation) {
> + if (vy_lsm_rotate_mem(lsm) != 0) {
> + tuple_unref(stmt);
> + return -1;
> + }
> + }
1. This check + rotation now are done in 3 places. In all three there
is a comment explaining that again and again. Isn't it time to extract
it into a function?
> mem = lsm->mem;
>
> /* Insert the new tuple into the in-memory index. */
> diff --git a/test/vinyl/gh-5042-dump-during-index-build-2.result b/test/vinyl/gh-5042-dump-during-index-build-2.result
> new file mode 100644
> index 000000000..a4fbc56ec
> --- /dev/null
> +++ b/test/vinyl/gh-5042-dump-during-index-build-2.result
> @@ -0,0 +1,189 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +test_run:cmd("create server test with script='vinyl/low_quota.lua'")
> + | ---
> + | - true
> + | ...
> +test_run:cmd("start server test with args='13421772'")
2. Why is it 13421772?
> + | ---
> + | - true
> + | ...
> +test_run:cmd('switch test')
> + | ---
> + | - true
> + | ...
> +
> +fiber = require 'fiber'
> + | ---
> + | ...
> +errinj = box.error.injection
> + | ---
> + | ...
> +
> +--
> +-- Test is quite similar to gh-4810, but data is inserted to
> +-- the space different from one which builds the new index.
> +-- Consider following scenario:
> +-- 1. during building secondary index, we yield to scan disk
> +-- in order to check tuple uniqueness;
> +-- 2. insertion to another space bumps memory generation (owing to
> +-- exceeding memory limit), but dump itself is not yet scheduled
> +-- (e.g. scheduler fiber does not get time to execute, or it fails
> +-- due for whatever reason);
> +-- 3. we return from yield point and now generation of mem does not
> +-- match with global one (which was bumped during second step). It may
> +-- lead to unpredictable behavior (starting from fired assertions in
> +-- debug mode to segfaults or lost data in release).
> +-- To equalize memory generations, we must rotate mem of corresponding
> +-- LSM tree before processing further operations with it.
> +--
> +s1 = box.schema.space.create('insert_to', {engine = 'vinyl'})
> + | ---
> + | ...
> +_ = s1:create_index('pk', {parts = {1, 'unsigned'}})
> + | ---
> + | ...
> +
> +s2 = box.schema.space.create('build_index', {engine = 'vinyl'})
> + | ---
> + | ...
> +_ = s2:create_index('pk', {parts = {1, 'unsigned'}})
> + | ---
> + | ...
> +
> +val = 0
> + | ---
> + | ...
> +PADDING = string.rep('x', 100)
3. Why is it 100?
> + | ---
> + | ...
> +
> +test_run:cmd("setopt delimiter ';'")
> + | ---
> + | - true
> + | ...
> +
> +function gen_insert(space)
> + pcall(space.insert, space, {val, val, val, val, PADDING})
4. Why is it important to have 4 vals, and not just one? Why is
it in pcall?
> + val = val + 1
> +end;
> + | ---
> + | ...
> +
> +function fill_L0_without_dump(space, watermark)
> + while box.stat.vinyl().memory.level0 < watermark do
> + gen_insert(space)
> + end
5. Is it possible to check that the dump really didn't happen?
Since the function is called 'without_dump'.
> +end;
> + | ---
> + | ...
> +
> +function insert_loop()
> + while not stop do
6. Variable 'stop' is not declared here. I propose to declare it
before this function.
> + gen_insert(s1)
> + end
> + ch:put(true)
> +end;
> + | ---
> + | ...
> +
> +function idx_build()
> + _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
> + ch:put(true)
7. ch is not declared anywhere above. It would be better to declare
things before usage. Easier to read.
8. Is it important to use fields {2, 3} and not just {2}?
> +end;
> + | ---
> + | ...
> +
> +dump_watermark = box.cfg.vinyl_memory / 2;
9. Why / 2? Is it some internal vinyl formula, that
the dump is triggered, when the memory is half full?
In this case won't 'fill_L0_without_dump(s2, dump_watermark);'
this line trigger the dump? Even though it is called
'without_dump'.
> + | ---
> + | ...
> +fill_L0_without_dump(s1, dump_watermark / 2);
> + | ---
> + | ...
> +fill_L0_without_dump(s2, dump_watermark);
> + | ---
> + | ...
> +
> +box.snapshot();
> + | ---
> + | - ok
> + | ...
> +
> +fill_L0_without_dump(s1, dump_watermark / 2 * 3);
> + | ---
> + | ...
> +
> +function idx_build()
> + errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", true)
> + _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
> + ch:put(true)
> +end;
> + | ---
> + | ...
> +
> +stop = false;
> + | ---
> + | ...
> +ch = fiber.channel(2);
> + | ---
> + | ...
> +
> +_ = fiber.create(idx_build);
> + | ---
> + | ...
> +_ = fiber.create(insert_loop, s1);
> + | ---
> + | ...
> +errinj.set("ERRINJ_VY_INDEX_DUMP", 1);
> + | ---
> + | - ok
> + | ...
> +
> +fiber.sleep(2);
10. Is it possible to avoid the fixed timeout? As you know,
having a fixed timeout very often makes the test flaky.
Also the test works more than 10 seconds - I always get the
message:
No output during 10 seconds. Will abort after 120 seconds without output. List of workers not reporting the status:
- 001_vinyl [vinyl/gh-5042-dump-during-index-build-2.test.lua, None] at var/001_vinyl/gh-5042-dump-during-index-build-2.result:162
I suggest to try to find a way to speed it up.
Overall I didn't understand almost anything in this test,
despite the big comment in the beginning. So I can't
validate whether it is optimal/correct. All I could check
is that it really crashes without the patch. Probably fixing
the comments above, and adding more explanations into the
test context would help.
I checked all the other similar places and propose you to take
a look at some of them too:
- vy_build_recover_stmt() calls vy_build_insert_stmt() without
checking mem generation and rotation;
- vy_squash_process() calls vy_lsm_set() without these actions;
- vy_lsm_commit_upsert() calls vy_lsm_set, ditto.
More information about the Tarantool-patches
mailing list