From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp54.i.mail.ru (smtp54.i.mail.ru [217.69.128.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 42A10469710 for ; Sun, 7 Jun 2020 18:51:26 +0300 (MSK) References: <7407b62139349ee3904a674490a6222b0c960a1c.1591292549.git.korablev@tarantool.org> From: Vladislav Shpilevoy Message-ID: Date: Sun, 7 Jun 2020 17:51:23 +0200 MIME-Version: 1.0 In-Reply-To: <7407b62139349ee3904a674490a6222b0c960a1c.1591292549.git.korablev@tarantool.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Nikita Pettik , tarantool-patches@dev.tarantool.org Hi! Thanks for the patch! On 04/06/2020 19:49, Nikita Pettik wrote: > Meanwhile in 17f6af7dc the similar problem has been fixed, still it may > appear that in-memory level of secondary index being constructed has > lagging memory generation (in other words, it's values is less than > the value of current global generation). Such situation can be achieved > if yield which is required to check uniqueness of value being inserted > is too long. In this time gap other space may trigger dump process > bumping global memory generation counter, but dump itself is still not > yet scheduled. Why is it so important in this bug to trigger the dump, but stop the dump from actual scheduling? Why all the needed actions, whatever they are, can't be done at the trigger moment? And what are these actions, which are not done at trigger moment, but done at scheduling, and lead to this bug? See 10 comments below. > So to get rid of generations mismatch, let's rotate in-memory level > after yield on demand. > > Closes #5042 > --- > Branch: https://github.com/tarantool/tarantool/tree/np/gh-5042-rotate-mem-after-yield > Issue: https://github.com/tarantool/tarantool/issues/5042 > > @ChangeLog: > * Fix crash due to triggered dump process during secondary index creation (gh-5042). > > src/box/vinyl.c | 14 ++ > src/errinj.h | 1 + > test/box/errinj.result | 1 + > .../gh-5042-dump-during-index-build-2.result | 189 ++++++++++++++++++ > ...gh-5042-dump-during-index-build-2.test.lua | 98 +++++++++ > test/vinyl/suite.ini | 2 +- > 6 files changed, 304 insertions(+), 1 deletion(-) > create mode 100644 test/vinyl/gh-5042-dump-during-index-build-2.result > create mode 100644 test/vinyl/gh-5042-dump-during-index-build-2.test.lua > > diff --git a/src/box/vinyl.c b/src/box/vinyl.c > index a90b786a7..19e37947a 100644 > --- a/src/box/vinyl.c > +++ b/src/box/vinyl.c > @@ -4097,6 +4105,12 @@ vy_build_insert_tuple(struct vy_env *env, struct vy_lsm *lsm, > * Hence, to get right mem (during mem rotation it becomes > * sealed i.e. read-only) we should fetch it from lsm again. > */ > + if (lsm->mem->generation != *lsm->env->p_generation) { > + if (vy_lsm_rotate_mem(lsm) != 0) { > + tuple_unref(stmt); > + return -1; > + } > + } 1. This check + rotation now are done in 3 places. In all three there is a comment explaining that again and again. Isn't it time to extract it into a function? > mem = lsm->mem; > > /* Insert the new tuple into the in-memory index. */ > diff --git a/test/vinyl/gh-5042-dump-during-index-build-2.result b/test/vinyl/gh-5042-dump-during-index-build-2.result > new file mode 100644 > index 000000000..a4fbc56ec > --- /dev/null > +++ b/test/vinyl/gh-5042-dump-during-index-build-2.result > @@ -0,0 +1,189 @@ > +-- test-run result file version 2 > +test_run = require('test_run').new() > + | --- > + | ... > + > +test_run:cmd("create server test with script='vinyl/low_quota.lua'") > + | --- > + | - true > + | ... > +test_run:cmd("start server test with args='13421772'") 2. Why is it 13421772? > + | --- > + | - true > + | ... > +test_run:cmd('switch test') > + | --- > + | - true > + | ... > + > +fiber = require 'fiber' > + | --- > + | ... > +errinj = box.error.injection > + | --- > + | ... > + > +-- > +-- Test is quite similar to gh-4810, but data is inserted to > +-- the space different from one which builds the new index. > +-- Consider following scenario: > +-- 1. during building secondary index, we yield to scan disk > +-- in order to check tuple uniqueness; > +-- 2. insertion to another space bumps memory generation (owing to > +-- exceeding memory limit), but dump itself is not yet scheduled > +-- (e.g. scheduler fiber does not get time to execute, or it fails > +-- due for whatever reason); > +-- 3. we return from yield point and now generation of mem does not > +-- match with global one (which was bumped during second step). It may > +-- lead to unpredictable behavior (starting from fired assertions in > +-- debug mode to segfaults or lost data in release). > +-- To equalize memory generations, we must rotate mem of corresponding > +-- LSM tree before processing further operations with it. > +-- > +s1 = box.schema.space.create('insert_to', {engine = 'vinyl'}) > + | --- > + | ... > +_ = s1:create_index('pk', {parts = {1, 'unsigned'}}) > + | --- > + | ... > + > +s2 = box.schema.space.create('build_index', {engine = 'vinyl'}) > + | --- > + | ... > +_ = s2:create_index('pk', {parts = {1, 'unsigned'}}) > + | --- > + | ... > + > +val = 0 > + | --- > + | ... > +PADDING = string.rep('x', 100) 3. Why is it 100? > + | --- > + | ... > + > +test_run:cmd("setopt delimiter ';'") > + | --- > + | - true > + | ... > + > +function gen_insert(space) > + pcall(space.insert, space, {val, val, val, val, PADDING}) 4. Why is it important to have 4 vals, and not just one? Why is it in pcall? > + val = val + 1 > +end; > + | --- > + | ... > + > +function fill_L0_without_dump(space, watermark) > + while box.stat.vinyl().memory.level0 < watermark do > + gen_insert(space) > + end 5. Is it possible to check that the dump really didn't happen? Since the function is called 'without_dump'. > +end; > + | --- > + | ... > + > +function insert_loop() > + while not stop do 6. Variable 'stop' is not declared here. I propose to declare it before this function. > + gen_insert(s1) > + end > + ch:put(true) > +end; > + | --- > + | ... > + > +function idx_build() > + _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}}) > + ch:put(true) 7. ch is not declared anywhere above. It would be better to declare things before usage. Easier to read. 8. Is it important to use fields {2, 3} and not just {2}? > +end; > + | --- > + | ... > + > +dump_watermark = box.cfg.vinyl_memory / 2; 9. Why / 2? Is it some internal vinyl formula, that the dump is triggered, when the memory is half full? In this case won't 'fill_L0_without_dump(s2, dump_watermark);' this line trigger the dump? Even though it is called 'without_dump'. > + | --- > + | ... > +fill_L0_without_dump(s1, dump_watermark / 2); > + | --- > + | ... > +fill_L0_without_dump(s2, dump_watermark); > + | --- > + | ... > + > +box.snapshot(); > + | --- > + | - ok > + | ... > + > +fill_L0_without_dump(s1, dump_watermark / 2 * 3); > + | --- > + | ... > + > +function idx_build() > + errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", true) > + _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}}) > + ch:put(true) > +end; > + | --- > + | ... > + > +stop = false; > + | --- > + | ... > +ch = fiber.channel(2); > + | --- > + | ... > + > +_ = fiber.create(idx_build); > + | --- > + | ... > +_ = fiber.create(insert_loop, s1); > + | --- > + | ... > +errinj.set("ERRINJ_VY_INDEX_DUMP", 1); > + | --- > + | - ok > + | ... > + > +fiber.sleep(2); 10. Is it possible to avoid the fixed timeout? As you know, having a fixed timeout very often makes the test flaky. Also the test works more than 10 seconds - I always get the message: No output during 10 seconds. Will abort after 120 seconds without output. List of workers not reporting the status: - 001_vinyl [vinyl/gh-5042-dump-during-index-build-2.test.lua, None] at var/001_vinyl/gh-5042-dump-during-index-build-2.result:162 I suggest to try to find a way to speed it up. Overall I didn't understand almost anything in this test, despite the big comment in the beginning. So I can't validate whether it is optimal/correct. All I could check is that it really crashes without the patch. Probably fixing the comments above, and adding more explanations into the test context would help. I checked all the other similar places and propose you to take a look at some of them too: - vy_build_recover_stmt() calls vy_build_insert_stmt() without checking mem generation and rotation; - vy_squash_process() calls vy_lsm_set() without these actions; - vy_lsm_commit_upsert() calls vy_lsm_set, ditto.