Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand
@ 2020-06-04 17:49 Nikita Pettik
  2020-06-04 20:23 ` Konstantin Osipov
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Nikita Pettik @ 2020-06-04 17:49 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy

Meanwhile in 17f6af7dc the similar problem has been fixed, still it may
appear that in-memory level of secondary index being constructed has
lagging memory generation (in other words, it's values is less than
the value of current global generation). Such situation can be achieved
if yield which is required to check uniqueness of value being inserted
is too long. In this time gap other space may trigger dump process
bumping global memory generation counter, but dump itself is still not
yet scheduled.

So to get rid of generations mismatch, let's rotate in-memory level
after yield on demand.

Closes #5042
---
Branch: https://github.com/tarantool/tarantool/tree/np/gh-5042-rotate-mem-after-yield
Issue: https://github.com/tarantool/tarantool/issues/5042

@ChangeLog:
 * Fix crash due to triggered dump process during secondary index creation (gh-5042).

 src/box/vinyl.c                               |  14 ++
 src/errinj.h                                  |   1 +
 test/box/errinj.result                        |   1 +
 .../gh-5042-dump-during-index-build-2.result  | 189 ++++++++++++++++++
 ...gh-5042-dump-during-index-build-2.test.lua |  98 +++++++++
 test/vinyl/suite.ini                          |   2 +-
 6 files changed, 304 insertions(+), 1 deletion(-)
 create mode 100644 test/vinyl/gh-5042-dump-during-index-build-2.result
 create mode 100644 test/vinyl/gh-5042-dump-during-index-build-2.test.lua

diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index a90b786a7..19e37947a 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -4083,6 +4083,14 @@ vy_build_insert_tuple(struct vy_env *env, struct vy_lsm *lsm,
 	vy_mem_pin(mem);
 	rc = vy_check_is_unique_secondary(NULL, &env->xm->p_committed_read_view,
 					  space_name, index_name, lsm, stmt);
+#ifndef NDEBUG
+	struct errinj *inj = errinj(ERRINJ_BUILD_UNIQUE_IDX_DELAY, ERRINJ_BOOL);
+	if (inj != NULL) {
+		while (inj->bparam) {
+			fiber_sleep(0.01);
+		}
+	}
+#endif
 	vy_mem_unpin(mem);
 	if (rc != 0) {
 		tuple_unref(stmt);
@@ -4097,6 +4105,12 @@ vy_build_insert_tuple(struct vy_env *env, struct vy_lsm *lsm,
 	 * Hence, to get right mem (during mem rotation it becomes
 	 * sealed i.e. read-only) we should fetch it from lsm again.
 	 */
+	if (lsm->mem->generation != *lsm->env->p_generation) {
+		if (vy_lsm_rotate_mem(lsm) != 0) {
+			tuple_unref(stmt);
+			return -1;
+		}
+	}
 	mem = lsm->mem;
 
 	/* Insert the new tuple into the in-memory index. */
diff --git a/src/errinj.h b/src/errinj.h
index 92bff5e4d..1adbb9e05 100644
--- a/src/errinj.h
+++ b/src/errinj.h
@@ -131,6 +131,7 @@ struct errinj {
 	_(ERRINJ_VY_READ_VIEW_MERGE_FAIL, ERRINJ_BOOL, {.bparam = false})\
 	_(ERRINJ_VY_WRITE_ITERATOR_START_FAIL, ERRINJ_BOOL, {.bparam = false})\
 	_(ERRINJ_VY_RUN_OPEN, ERRINJ_INT, {.iparam = -1})\
+	_(ERRINJ_BUILD_UNIQUE_IDX_DELAY, ERRINJ_BOOL, {.bparam = false})\
 
 ENUM0(errinj_id, ERRINJ_LIST);
 extern struct errinj errinjs[];
diff --git a/test/box/errinj.result b/test/box/errinj.result
index 3f075a2d0..cf1e81072 100644
--- a/test/box/errinj.result
+++ b/test/box/errinj.result
@@ -33,6 +33,7 @@ end
 evals
 ---
 - - ERRINJ_BUILD_INDEX: -1
+  - ERRINJ_BUILD_UNIQUE_IDX_DELAY: false
   - ERRINJ_DYN_MODULE_COUNT: 0
   - ERRINJ_HTTPC_EXECUTE: false
   - ERRINJ_HTTP_RESPONSE_ADD_WAIT: false
diff --git a/test/vinyl/gh-5042-dump-during-index-build-2.result b/test/vinyl/gh-5042-dump-during-index-build-2.result
new file mode 100644
index 000000000..a4fbc56ec
--- /dev/null
+++ b/test/vinyl/gh-5042-dump-during-index-build-2.result
@@ -0,0 +1,189 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+test_run:cmd("create server test with script='vinyl/low_quota.lua'")
+ | ---
+ | - true
+ | ...
+test_run:cmd("start server test with args='13421772'")
+ | ---
+ | - true
+ | ...
+test_run:cmd('switch test')
+ | ---
+ | - true
+ | ...
+
+fiber = require 'fiber'
+ | ---
+ | ...
+errinj = box.error.injection
+ | ---
+ | ...
+
+--
+-- Test is quite similar to gh-4810, but data is inserted to
+-- the space different from one which builds the new index.
+-- Consider following scenario:
+-- 1. during building secondary index, we yield to scan disk
+-- in order to check tuple uniqueness;
+-- 2. insertion to another space bumps memory generation (owing to
+-- exceeding memory limit), but dump itself is not yet scheduled
+-- (e.g. scheduler fiber does not get time to execute, or it fails
+-- due for whatever reason);
+-- 3. we return from yield point and now generation of mem does not
+-- match with global one (which was bumped during second step). It may
+-- lead to unpredictable behavior (starting from fired assertions in
+-- debug mode to segfaults or lost data in release).
+-- To equalize memory generations, we must rotate mem of corresponding
+-- LSM tree before processing further operations with it.
+--
+s1 = box.schema.space.create('insert_to', {engine = 'vinyl'})
+ | ---
+ | ...
+_ = s1:create_index('pk', {parts = {1, 'unsigned'}})
+ | ---
+ | ...
+
+s2 = box.schema.space.create('build_index', {engine = 'vinyl'})
+ | ---
+ | ...
+_ = s2:create_index('pk', {parts = {1, 'unsigned'}})
+ | ---
+ | ...
+
+val = 0
+ | ---
+ | ...
+PADDING = string.rep('x', 100)
+ | ---
+ | ...
+
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+
+function gen_insert(space)
+    pcall(space.insert, space, {val, val, val, val, PADDING})
+    val = val + 1
+end;
+ | ---
+ | ...
+
+function fill_L0_without_dump(space, watermark)
+    while box.stat.vinyl().memory.level0 < watermark do
+        gen_insert(space)
+    end
+end;
+ | ---
+ | ...
+
+function insert_loop()
+    while not stop do
+        gen_insert(s1)
+    end
+    ch:put(true)
+end;
+ | ---
+ | ...
+
+function idx_build()
+    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
+    ch:put(true)
+end;
+ | ---
+ | ...
+
+dump_watermark = box.cfg.vinyl_memory / 2;
+ | ---
+ | ...
+fill_L0_without_dump(s1, dump_watermark / 2);
+ | ---
+ | ...
+fill_L0_without_dump(s2, dump_watermark);
+ | ---
+ | ...
+
+box.snapshot();
+ | ---
+ | - ok
+ | ...
+
+fill_L0_without_dump(s1, dump_watermark / 2 * 3);
+ | ---
+ | ...
+
+function idx_build()
+    errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", true)
+    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
+    ch:put(true)
+end;
+ | ---
+ | ...
+
+stop = false;
+ | ---
+ | ...
+ch = fiber.channel(2);
+ | ---
+ | ...
+
+_ = fiber.create(idx_build);
+ | ---
+ | ...
+_ = fiber.create(insert_loop, s1);
+ | ---
+ | ...
+errinj.set("ERRINJ_VY_INDEX_DUMP", 1);
+ | ---
+ | - ok
+ | ...
+
+fiber.sleep(2);
+ | ---
+ | ...
+errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", false);
+ | ---
+ | - ok
+ | ...
+errinj.set("ERRINJ_VY_INDEX_DUMP", -1);
+ | ---
+ | - ok
+ | ...
+
+stop = true;
+ | ---
+ | ...
+for i = 1, ch:size() do
+    ch:get()
+end;
+ | ---
+ | ...
+
+s1:drop();
+ | ---
+ | ...
+s2:drop();
+ | ---
+ | ...
+
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+test_run:cmd("stop server test")
+ | ---
+ | - true
+ | ...
+test_run:cmd("cleanup server test")
+ | ---
+ | - true
+ | ...
diff --git a/test/vinyl/gh-5042-dump-during-index-build-2.test.lua b/test/vinyl/gh-5042-dump-during-index-build-2.test.lua
new file mode 100644
index 000000000..16cdfa668
--- /dev/null
+++ b/test/vinyl/gh-5042-dump-during-index-build-2.test.lua
@@ -0,0 +1,98 @@
+test_run = require('test_run').new()
+
+test_run:cmd("create server test with script='vinyl/low_quota.lua'")
+test_run:cmd("start server test with args='13421772'")
+test_run:cmd('switch test')
+
+fiber = require 'fiber'
+errinj = box.error.injection
+
+--
+-- Test is quite similar to gh-4810, but data is inserted to
+-- the space different from one which builds the new index.
+-- Consider following scenario:
+-- 1. during building secondary index, we yield to scan disk
+-- in order to check tuple uniqueness;
+-- 2. insertion to another space bumps memory generation (owing to
+-- exceeding memory limit), but dump itself is not yet scheduled
+-- (e.g. scheduler fiber does not get time to execute, or it fails
+-- due for whatever reason);
+-- 3. we return from yield point and now generation of mem does not
+-- match with global one (which was bumped during second step). It may
+-- lead to unpredictable behavior (starting from fired assertions in
+-- debug mode to segfaults or lost data in release).
+-- To equalize memory generations, we must rotate mem of corresponding
+-- LSM tree before processing further operations with it.
+--
+s1 = box.schema.space.create('insert_to', {engine = 'vinyl'})
+_ = s1:create_index('pk', {parts = {1, 'unsigned'}})
+
+s2 = box.schema.space.create('build_index', {engine = 'vinyl'})
+_ = s2:create_index('pk', {parts = {1, 'unsigned'}})
+
+val = 0
+PADDING = string.rep('x', 100)
+
+test_run:cmd("setopt delimiter ';'")
+
+function gen_insert(space)
+    pcall(space.insert, space, {val, val, val, val, PADDING})
+    val = val + 1
+end;
+
+function fill_L0_without_dump(space, watermark)
+    while box.stat.vinyl().memory.level0 < watermark do
+        gen_insert(space)
+    end
+end;
+
+function insert_loop()
+    while not stop do
+        gen_insert(s1)
+    end
+    ch:put(true)
+end;
+
+function idx_build()
+    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
+    ch:put(true)
+end;
+
+dump_watermark = box.cfg.vinyl_memory / 2;
+fill_L0_without_dump(s1, dump_watermark / 2);
+fill_L0_without_dump(s2, dump_watermark);
+
+box.snapshot();
+
+fill_L0_without_dump(s1, dump_watermark / 2 * 3);
+
+function idx_build()
+    errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", true)
+    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
+    ch:put(true)
+end;
+
+stop = false;
+ch = fiber.channel(2);
+
+_ = fiber.create(idx_build);
+_ = fiber.create(insert_loop, s1);
+errinj.set("ERRINJ_VY_INDEX_DUMP", 1);
+
+fiber.sleep(2);
+errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", false);
+errinj.set("ERRINJ_VY_INDEX_DUMP", -1);
+
+stop = true;
+for i = 1, ch:size() do
+    ch:get()
+end;
+
+s1:drop();
+s2:drop();
+
+test_run:cmd("setopt delimiter ''");
+
+test_run:cmd('switch default')
+test_run:cmd("stop server test")
+test_run:cmd("cleanup server test")
diff --git a/test/vinyl/suite.ini b/test/vinyl/suite.ini
index 4de29fccf..7f2a2896f 100644
--- a/test/vinyl/suite.ini
+++ b/test/vinyl/suite.ini
@@ -2,7 +2,7 @@
 core = tarantool
 description = vinyl integration tests
 script = vinyl.lua
-release_disabled = errinj.test.lua errinj_ddl.test.lua errinj_gc.test.lua errinj_stat.test.lua errinj_tx.test.lua errinj_vylog.test.lua partial_dump.test.lua quota_timeout.test.lua recovery_quota.test.lua replica_rejoin.test.lua gh-4864-stmt-alloc-fail-compact.test.lua gh-4805-open-run-err-recovery.test.lua
+release_disabled = errinj.test.lua errinj_ddl.test.lua errinj_gc.test.lua errinj_stat.test.lua errinj_tx.test.lua errinj_vylog.test.lua partial_dump.test.lua quota_timeout.test.lua recovery_quota.test.lua replica_rejoin.test.lua gh-4864-stmt-alloc-fail-compact.test.lua gh-4805-open-run-err-recovery.test.lua gh-5042-dump-during-index-build-2.test.lua
 config = suite.cfg
 lua_libs = suite.lua stress.lua large.lua txn_proxy.lua ../box/lua/utils.lua
 use_unix_sockets = True
-- 
2.17.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand
  2020-06-04 17:49 [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand Nikita Pettik
@ 2020-06-04 20:23 ` Konstantin Osipov
  2020-06-04 22:32   ` Nikita Pettik
  2020-06-04 22:52 ` Konstantin Osipov
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Konstantin Osipov @ 2020-06-04 20:23 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-patches, v.shpilevoy

* Nikita Pettik <korablev@tarantool.org> [20/06/04 20:49]:
> Meanwhile in 17f6af7dc the similar problem has been fixed, still it may
> appear that in-memory level of secondary index being constructed has
> lagging memory generation (in other words, it's values is less than
> the value of current global generation). Such situation can be achieved
> if yield which is required to check uniqueness of value being inserted
> is too long. In this time gap other space may trigger dump process
> bumping global memory generation counter, but dump itself is still not
> yet scheduled.

It's hard for me to understand this comment, perhaps
 you discussed the problem verbally, but I'm a bit out of context.

Could you write it using an event diagram, something like:

user1         user2            vy_scheduler

insert ...     
             create_index()


yield                            
                               dump

???


-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand
  2020-06-04 20:23 ` Konstantin Osipov
@ 2020-06-04 22:32   ` Nikita Pettik
  0 siblings, 0 replies; 14+ messages in thread
From: Nikita Pettik @ 2020-06-04 22:32 UTC (permalink / raw)
  To: Konstantin Osipov, tarantool-patches, v.shpilevoy, alyapunov

On 04 Jun 23:23, Konstantin Osipov wrote:
> * Nikita Pettik <korablev@tarantool.org> [20/06/04 20:49]:
> > Meanwhile in 17f6af7dc the similar problem has been fixed, still it may
> > appear that in-memory level of secondary index being constructed has
> > lagging memory generation (in other words, it's values is less than
> > the value of current global generation). Such situation can be achieved
> > if yield which is required to check uniqueness of value being inserted
> > is too long. In this time gap other space may trigger dump process
> > bumping global memory generation counter, but dump itself is still not
> > yet scheduled.
> 
> It's hard for me to understand this comment, perhaps
>  you discussed the problem verbally, but I'm a bit out of context.
> 
> Could you write it using an event diagram, something like:
> 
> user1         user2            vy_scheduler
> 
> insert ...     
>              create_index()
> 
> 
> yield                            
>                                dump
> 
> ???
> 

Sure. In the first case vy_scheduler fiber is not waked up during yield in
vy_build_insert_tuple():

       user1                user2            scheduler         generation

s1:create_index              ---               idle               n

yield(*_insert_tuple)       s2:insert()        idle               n

yield                     *_trigger_dump()     idle              n+1

yield                      s2:rotate_mem()     idle              n+1

yield                         ---              idle              n+1

s1:*_insert_tuple continue     ---             idle              n+1

^
|

s1 still has mem generation n, meanwhile global and s2 generations is n+1.

Since we can't schedule fibers (and can't provide guarantee that
vy_scheduler fiber won't wake up), to reproduce the problem let's
assume that vy_task_dump_new fails for whatever reason:

       user1                user2            scheduler         generation

s1:create_index              ---               idle               n

yield(*_insert_tuple)       s2:insert()        idle               n

yield                     *_trigger_dump()     idle              n+1

yield                          ---         vy_task_dump_new      n+1

yield                          ---            throttled          n+1

s1:*_insert_tuple continue      ---           throttled          n+1

Again s1 has mem generation n, meanwhile global and s2 generations is n+1.


 
> -- 
> Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand
  2020-06-04 17:49 [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand Nikita Pettik
  2020-06-04 20:23 ` Konstantin Osipov
@ 2020-06-04 22:52 ` Konstantin Osipov
  2020-06-05 11:18   ` Nikita Pettik
  2020-06-06  8:38 ` Aleksandr Lyapunov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Konstantin Osipov @ 2020-06-04 22:52 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-patches, v.shpilevoy

* Nikita Pettik <korablev@tarantool.org> [20/06/04 20:49]:
> So to get rid of generations mismatch, let's rotate in-memory level
> after yield on demand.

Could you please also elaborate why generation mismatch is bad
here? What does it lead to specifically?


-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand
  2020-06-04 22:52 ` Konstantin Osipov
@ 2020-06-05 11:18   ` Nikita Pettik
  2020-06-05 11:33     ` Konstantin Osipov
  0 siblings, 1 reply; 14+ messages in thread
From: Nikita Pettik @ 2020-06-05 11:18 UTC (permalink / raw)
  To: Konstantin Osipov, tarantool-patches, v.shpilevoy, alyapunov

On 05 Jun 01:52, Konstantin Osipov wrote:
> * Nikita Pettik <korablev@tarantool.org> [20/06/04 20:49]:
> > So to get rid of generations mismatch, let's rotate in-memory level
> > after yield on demand.
> 
> Could you please also elaborate why generation mismatch is bad
> here? What does it lead to specifically?

Because right after yield, vy_stmt is allocated using ls region
(in vy_build_insert_stmt()) and inserted into mem. Ls region does
not tolerate non-decreasing ids: otherwise region gc may truncate
data which is still accessible.
 
> 
> -- 
> Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand
  2020-06-05 11:18   ` Nikita Pettik
@ 2020-06-05 11:33     ` Konstantin Osipov
  0 siblings, 0 replies; 14+ messages in thread
From: Konstantin Osipov @ 2020-06-05 11:33 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-patches, v.shpilevoy

* Nikita Pettik <korablev@tarantool.org> [20/06/05 14:23]:
> > > So to get rid of generations mismatch, let's rotate in-memory level
> > > after yield on demand.
> > 
> > Could you please also elaborate why generation mismatch is bad
> > here? What does it lead to specifically?
> 
> Because right after yield, vy_stmt is allocated using ls region
> (in vy_build_insert_stmt()) and inserted into mem. Ls region does
> not tolerate non-decreasing ids: otherwise region gc may truncate
> data which is still accessible.

Right, this is a gotcha I forgot about. The entire ls_region was a
temporary crutch to get dumps working.

Thanks for explanation.

-- 
Konstantin Osipov, Moscow, Russia
https://scylladb.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand
  2020-06-04 17:49 [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand Nikita Pettik
  2020-06-04 20:23 ` Konstantin Osipov
  2020-06-04 22:52 ` Konstantin Osipov
@ 2020-06-06  8:38 ` Aleksandr Lyapunov
  2020-06-07 15:51 ` Vladislav Shpilevoy
  2020-06-10 15:27 ` Nikita Pettik
  4 siblings, 0 replies; 14+ messages in thread
From: Aleksandr Lyapunov @ 2020-06-06  8:38 UTC (permalink / raw)
  To: Nikita Pettik, tarantool-patches; +Cc: v.shpilevoy

Thanks for the patch, lgtm.

On 6/4/20 8:49 PM, Nikita Pettik wrote:
> Meanwhile in 17f6af7dc the similar problem has been fixed, still it may
> appear that in-memory level of secondary index being constructed has
> lagging memory generation (in other words, it's values is less than
> the value of current global generation). Such situation can be achieved
> if yield which is required to check uniqueness of value being inserted
> is too long. In this time gap other space may trigger dump process
> bumping global memory generation counter, but dump itself is still not
> yet scheduled.
>
> So to get rid of generations mismatch, let's rotate in-memory level
> after yield on demand.
>
> Closes #5042
> ---
> Branch: https://github.com/tarantool/tarantool/tree/np/gh-5042-rotate-mem-after-yield
> Issue: https://github.com/tarantool/tarantool/issues/5042
>
> @ChangeLog:
>   * Fix crash due to triggered dump process during secondary index creation (gh-5042).
>
>   src/box/vinyl.c                               |  14 ++
>   src/errinj.h                                  |   1 +
>   test/box/errinj.result                        |   1 +
>   .../gh-5042-dump-during-index-build-2.result  | 189 ++++++++++++++++++
>   ...gh-5042-dump-during-index-build-2.test.lua |  98 +++++++++
>   test/vinyl/suite.ini                          |   2 +-
>   6 files changed, 304 insertions(+), 1 deletion(-)
>   create mode 100644 test/vinyl/gh-5042-dump-during-index-build-2.result
>   create mode 100644 test/vinyl/gh-5042-dump-during-index-build-2.test.lua
>
> diff --git a/src/box/vinyl.c b/src/box/vinyl.c
> index a90b786a7..19e37947a 100644
> --- a/src/box/vinyl.c
> +++ b/src/box/vinyl.c
> @@ -4083,6 +4083,14 @@ vy_build_insert_tuple(struct vy_env *env, struct vy_lsm *lsm,
>   	vy_mem_pin(mem);
>   	rc = vy_check_is_unique_secondary(NULL, &env->xm->p_committed_read_view,
>   					  space_name, index_name, lsm, stmt);
> +#ifndef NDEBUG
> +	struct errinj *inj = errinj(ERRINJ_BUILD_UNIQUE_IDX_DELAY, ERRINJ_BOOL);
> +	if (inj != NULL) {
> +		while (inj->bparam) {
> +			fiber_sleep(0.01);
> +		}
> +	}
> +#endif
>   	vy_mem_unpin(mem);
>   	if (rc != 0) {
>   		tuple_unref(stmt);
> @@ -4097,6 +4105,12 @@ vy_build_insert_tuple(struct vy_env *env, struct vy_lsm *lsm,
>   	 * Hence, to get right mem (during mem rotation it becomes
>   	 * sealed i.e. read-only) we should fetch it from lsm again.
>   	 */
> +	if (lsm->mem->generation != *lsm->env->p_generation) {
> +		if (vy_lsm_rotate_mem(lsm) != 0) {
> +			tuple_unref(stmt);
> +			return -1;
> +		}
> +	}
>   	mem = lsm->mem;
>   
>   	/* Insert the new tuple into the in-memory index. */
> diff --git a/src/errinj.h b/src/errinj.h
> index 92bff5e4d..1adbb9e05 100644
> --- a/src/errinj.h
> +++ b/src/errinj.h
> @@ -131,6 +131,7 @@ struct errinj {
>   	_(ERRINJ_VY_READ_VIEW_MERGE_FAIL, ERRINJ_BOOL, {.bparam = false})\
>   	_(ERRINJ_VY_WRITE_ITERATOR_START_FAIL, ERRINJ_BOOL, {.bparam = false})\
>   	_(ERRINJ_VY_RUN_OPEN, ERRINJ_INT, {.iparam = -1})\
> +	_(ERRINJ_BUILD_UNIQUE_IDX_DELAY, ERRINJ_BOOL, {.bparam = false})\
>   
>   ENUM0(errinj_id, ERRINJ_LIST);
>   extern struct errinj errinjs[];
> diff --git a/test/box/errinj.result b/test/box/errinj.result
> index 3f075a2d0..cf1e81072 100644
> --- a/test/box/errinj.result
> +++ b/test/box/errinj.result
> @@ -33,6 +33,7 @@ end
>   evals
>   ---
>   - - ERRINJ_BUILD_INDEX: -1
> +  - ERRINJ_BUILD_UNIQUE_IDX_DELAY: false
>     - ERRINJ_DYN_MODULE_COUNT: 0
>     - ERRINJ_HTTPC_EXECUTE: false
>     - ERRINJ_HTTP_RESPONSE_ADD_WAIT: false
> diff --git a/test/vinyl/gh-5042-dump-during-index-build-2.result b/test/vinyl/gh-5042-dump-during-index-build-2.result
> new file mode 100644
> index 000000000..a4fbc56ec
> --- /dev/null
> +++ b/test/vinyl/gh-5042-dump-during-index-build-2.result
> @@ -0,0 +1,189 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +test_run:cmd("create server test with script='vinyl/low_quota.lua'")
> + | ---
> + | - true
> + | ...
> +test_run:cmd("start server test with args='13421772'")
> + | ---
> + | - true
> + | ...
> +test_run:cmd('switch test')
> + | ---
> + | - true
> + | ...
> +
> +fiber = require 'fiber'
> + | ---
> + | ...
> +errinj = box.error.injection
> + | ---
> + | ...
> +
> +--
> +-- Test is quite similar to gh-4810, but data is inserted to
> +-- the space different from one which builds the new index.
> +-- Consider following scenario:
> +-- 1. during building secondary index, we yield to scan disk
> +-- in order to check tuple uniqueness;
> +-- 2. insertion to another space bumps memory generation (owing to
> +-- exceeding memory limit), but dump itself is not yet scheduled
> +-- (e.g. scheduler fiber does not get time to execute, or it fails
> +-- due for whatever reason);
> +-- 3. we return from yield point and now generation of mem does not
> +-- match with global one (which was bumped during second step). It may
> +-- lead to unpredictable behavior (starting from fired assertions in
> +-- debug mode to segfaults or lost data in release).
> +-- To equalize memory generations, we must rotate mem of corresponding
> +-- LSM tree before processing further operations with it.
> +--
> +s1 = box.schema.space.create('insert_to', {engine = 'vinyl'})
> + | ---
> + | ...
> +_ = s1:create_index('pk', {parts = {1, 'unsigned'}})
> + | ---
> + | ...
> +
> +s2 = box.schema.space.create('build_index', {engine = 'vinyl'})
> + | ---
> + | ...
> +_ = s2:create_index('pk', {parts = {1, 'unsigned'}})
> + | ---
> + | ...
> +
> +val = 0
> + | ---
> + | ...
> +PADDING = string.rep('x', 100)
> + | ---
> + | ...
> +
> +test_run:cmd("setopt delimiter ';'")
> + | ---
> + | - true
> + | ...
> +
> +function gen_insert(space)
> +    pcall(space.insert, space, {val, val, val, val, PADDING})
> +    val = val + 1
> +end;
> + | ---
> + | ...
> +
> +function fill_L0_without_dump(space, watermark)
> +    while box.stat.vinyl().memory.level0 < watermark do
> +        gen_insert(space)
> +    end
> +end;
> + | ---
> + | ...
> +
> +function insert_loop()
> +    while not stop do
> +        gen_insert(s1)
> +    end
> +    ch:put(true)
> +end;
> + | ---
> + | ...
> +
> +function idx_build()
> +    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
> +    ch:put(true)
> +end;
> + | ---
> + | ...
> +
> +dump_watermark = box.cfg.vinyl_memory / 2;
> + | ---
> + | ...
> +fill_L0_without_dump(s1, dump_watermark / 2);
> + | ---
> + | ...
> +fill_L0_without_dump(s2, dump_watermark);
> + | ---
> + | ...
> +
> +box.snapshot();
> + | ---
> + | - ok
> + | ...
> +
> +fill_L0_without_dump(s1, dump_watermark / 2 * 3);
> + | ---
> + | ...
> +
> +function idx_build()
> +    errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", true)
> +    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
> +    ch:put(true)
> +end;
> + | ---
> + | ...
> +
> +stop = false;
> + | ---
> + | ...
> +ch = fiber.channel(2);
> + | ---
> + | ...
> +
> +_ = fiber.create(idx_build);
> + | ---
> + | ...
> +_ = fiber.create(insert_loop, s1);
> + | ---
> + | ...
> +errinj.set("ERRINJ_VY_INDEX_DUMP", 1);
> + | ---
> + | - ok
> + | ...
> +
> +fiber.sleep(2);
> + | ---
> + | ...
> +errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", false);
> + | ---
> + | - ok
> + | ...
> +errinj.set("ERRINJ_VY_INDEX_DUMP", -1);
> + | ---
> + | - ok
> + | ...
> +
> +stop = true;
> + | ---
> + | ...
> +for i = 1, ch:size() do
> +    ch:get()
> +end;
> + | ---
> + | ...
> +
> +s1:drop();
> + | ---
> + | ...
> +s2:drop();
> + | ---
> + | ...
> +
> +test_run:cmd("setopt delimiter ''");
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('switch default')
> + | ---
> + | - true
> + | ...
> +test_run:cmd("stop server test")
> + | ---
> + | - true
> + | ...
> +test_run:cmd("cleanup server test")
> + | ---
> + | - true
> + | ...
> diff --git a/test/vinyl/gh-5042-dump-during-index-build-2.test.lua b/test/vinyl/gh-5042-dump-during-index-build-2.test.lua
> new file mode 100644
> index 000000000..16cdfa668
> --- /dev/null
> +++ b/test/vinyl/gh-5042-dump-during-index-build-2.test.lua
> @@ -0,0 +1,98 @@
> +test_run = require('test_run').new()
> +
> +test_run:cmd("create server test with script='vinyl/low_quota.lua'")
> +test_run:cmd("start server test with args='13421772'")
> +test_run:cmd('switch test')
> +
> +fiber = require 'fiber'
> +errinj = box.error.injection
> +
> +--
> +-- Test is quite similar to gh-4810, but data is inserted to
> +-- the space different from one which builds the new index.
> +-- Consider following scenario:
> +-- 1. during building secondary index, we yield to scan disk
> +-- in order to check tuple uniqueness;
> +-- 2. insertion to another space bumps memory generation (owing to
> +-- exceeding memory limit), but dump itself is not yet scheduled
> +-- (e.g. scheduler fiber does not get time to execute, or it fails
> +-- due for whatever reason);
> +-- 3. we return from yield point and now generation of mem does not
> +-- match with global one (which was bumped during second step). It may
> +-- lead to unpredictable behavior (starting from fired assertions in
> +-- debug mode to segfaults or lost data in release).
> +-- To equalize memory generations, we must rotate mem of corresponding
> +-- LSM tree before processing further operations with it.
> +--
> +s1 = box.schema.space.create('insert_to', {engine = 'vinyl'})
> +_ = s1:create_index('pk', {parts = {1, 'unsigned'}})
> +
> +s2 = box.schema.space.create('build_index', {engine = 'vinyl'})
> +_ = s2:create_index('pk', {parts = {1, 'unsigned'}})
> +
> +val = 0
> +PADDING = string.rep('x', 100)
> +
> +test_run:cmd("setopt delimiter ';'")
> +
> +function gen_insert(space)
> +    pcall(space.insert, space, {val, val, val, val, PADDING})
> +    val = val + 1
> +end;
> +
> +function fill_L0_without_dump(space, watermark)
> +    while box.stat.vinyl().memory.level0 < watermark do
> +        gen_insert(space)
> +    end
> +end;
> +
> +function insert_loop()
> +    while not stop do
> +        gen_insert(s1)
> +    end
> +    ch:put(true)
> +end;
> +
> +function idx_build()
> +    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
> +    ch:put(true)
> +end;
> +
> +dump_watermark = box.cfg.vinyl_memory / 2;
> +fill_L0_without_dump(s1, dump_watermark / 2);
> +fill_L0_without_dump(s2, dump_watermark);
> +
> +box.snapshot();
> +
> +fill_L0_without_dump(s1, dump_watermark / 2 * 3);
> +
> +function idx_build()
> +    errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", true)
> +    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
> +    ch:put(true)
> +end;
> +
> +stop = false;
> +ch = fiber.channel(2);
> +
> +_ = fiber.create(idx_build);
> +_ = fiber.create(insert_loop, s1);
> +errinj.set("ERRINJ_VY_INDEX_DUMP", 1);
> +
> +fiber.sleep(2);
> +errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", false);
> +errinj.set("ERRINJ_VY_INDEX_DUMP", -1);
> +
> +stop = true;
> +for i = 1, ch:size() do
> +    ch:get()
> +end;
> +
> +s1:drop();
> +s2:drop();
> +
> +test_run:cmd("setopt delimiter ''");
> +
> +test_run:cmd('switch default')
> +test_run:cmd("stop server test")
> +test_run:cmd("cleanup server test")
> diff --git a/test/vinyl/suite.ini b/test/vinyl/suite.ini
> index 4de29fccf..7f2a2896f 100644
> --- a/test/vinyl/suite.ini
> +++ b/test/vinyl/suite.ini
> @@ -2,7 +2,7 @@
>   core = tarantool
>   description = vinyl integration tests
>   script = vinyl.lua
> -release_disabled = errinj.test.lua errinj_ddl.test.lua errinj_gc.test.lua errinj_stat.test.lua errinj_tx.test.lua errinj_vylog.test.lua partial_dump.test.lua quota_timeout.test.lua recovery_quota.test.lua replica_rejoin.test.lua gh-4864-stmt-alloc-fail-compact.test.lua gh-4805-open-run-err-recovery.test.lua
> +release_disabled = errinj.test.lua errinj_ddl.test.lua errinj_gc.test.lua errinj_stat.test.lua errinj_tx.test.lua errinj_vylog.test.lua partial_dump.test.lua quota_timeout.test.lua recovery_quota.test.lua replica_rejoin.test.lua gh-4864-stmt-alloc-fail-compact.test.lua gh-4805-open-run-err-recovery.test.lua gh-5042-dump-during-index-build-2.test.lua
>   config = suite.cfg
>   lua_libs = suite.lua stress.lua large.lua txn_proxy.lua ../box/lua/utils.lua
>   use_unix_sockets = True

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand
  2020-06-04 17:49 [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand Nikita Pettik
                   ` (2 preceding siblings ...)
  2020-06-06  8:38 ` Aleksandr Lyapunov
@ 2020-06-07 15:51 ` Vladislav Shpilevoy
  2020-06-09 22:34   ` Nikita Pettik
  2020-06-10 15:27 ` Nikita Pettik
  4 siblings, 1 reply; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-07 15:51 UTC (permalink / raw)
  To: Nikita Pettik, tarantool-patches

Hi! Thanks for the patch!

On 04/06/2020 19:49, Nikita Pettik wrote:
> Meanwhile in 17f6af7dc the similar problem has been fixed, still it may
> appear that in-memory level of secondary index being constructed has
> lagging memory generation (in other words, it's values is less than
> the value of current global generation). Such situation can be achieved
> if yield which is required to check uniqueness of value being inserted
> is too long. In this time gap other space may trigger dump process
> bumping global memory generation counter, but dump itself is still not
> yet scheduled.

Why is it so important in this bug to trigger the dump, but stop the
dump from actual scheduling? Why all the needed actions, whatever they
are, can't be done at the trigger moment? And what are these actions,
which are not done at trigger moment, but done at scheduling, and lead
to this bug?

See 10 comments below.

> So to get rid of generations mismatch, let's rotate in-memory level
> after yield on demand.
> 
> Closes #5042
> ---
> Branch: https://github.com/tarantool/tarantool/tree/np/gh-5042-rotate-mem-after-yield
> Issue: https://github.com/tarantool/tarantool/issues/5042
> 
> @ChangeLog:
>  * Fix crash due to triggered dump process during secondary index creation (gh-5042).
> 
>  src/box/vinyl.c                               |  14 ++
>  src/errinj.h                                  |   1 +
>  test/box/errinj.result                        |   1 +
>  .../gh-5042-dump-during-index-build-2.result  | 189 ++++++++++++++++++
>  ...gh-5042-dump-during-index-build-2.test.lua |  98 +++++++++
>  test/vinyl/suite.ini                          |   2 +-
>  6 files changed, 304 insertions(+), 1 deletion(-)
>  create mode 100644 test/vinyl/gh-5042-dump-during-index-build-2.result
>  create mode 100644 test/vinyl/gh-5042-dump-during-index-build-2.test.lua
> 
> diff --git a/src/box/vinyl.c b/src/box/vinyl.c
> index a90b786a7..19e37947a 100644
> --- a/src/box/vinyl.c
> +++ b/src/box/vinyl.c
> @@ -4097,6 +4105,12 @@ vy_build_insert_tuple(struct vy_env *env, struct vy_lsm *lsm,
>  	 * Hence, to get right mem (during mem rotation it becomes
>  	 * sealed i.e. read-only) we should fetch it from lsm again.
>  	 */
> +	if (lsm->mem->generation != *lsm->env->p_generation) {
> +		if (vy_lsm_rotate_mem(lsm) != 0) {
> +			tuple_unref(stmt);
> +			return -1;
> +		}
> +	}

1. This check + rotation now are done in 3 places. In all three there
is a comment explaining that again and again. Isn't it time to extract
it into a function?

>  	mem = lsm->mem;
>  
>  	/* Insert the new tuple into the in-memory index. */
> diff --git a/test/vinyl/gh-5042-dump-during-index-build-2.result b/test/vinyl/gh-5042-dump-during-index-build-2.result
> new file mode 100644
> index 000000000..a4fbc56ec
> --- /dev/null
> +++ b/test/vinyl/gh-5042-dump-during-index-build-2.result
> @@ -0,0 +1,189 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +test_run:cmd("create server test with script='vinyl/low_quota.lua'")
> + | ---
> + | - true
> + | ...
> +test_run:cmd("start server test with args='13421772'")

2. Why is it 13421772?

> + | ---
> + | - true
> + | ...
> +test_run:cmd('switch test')
> + | ---
> + | - true
> + | ...
> +
> +fiber = require 'fiber'
> + | ---
> + | ...
> +errinj = box.error.injection
> + | ---
> + | ...
> +
> +--
> +-- Test is quite similar to gh-4810, but data is inserted to
> +-- the space different from one which builds the new index.
> +-- Consider following scenario:
> +-- 1. during building secondary index, we yield to scan disk
> +-- in order to check tuple uniqueness;
> +-- 2. insertion to another space bumps memory generation (owing to
> +-- exceeding memory limit), but dump itself is not yet scheduled
> +-- (e.g. scheduler fiber does not get time to execute, or it fails
> +-- due for whatever reason);
> +-- 3. we return from yield point and now generation of mem does not
> +-- match with global one (which was bumped during second step). It may
> +-- lead to unpredictable behavior (starting from fired assertions in
> +-- debug mode to segfaults or lost data in release).
> +-- To equalize memory generations, we must rotate mem of corresponding
> +-- LSM tree before processing further operations with it.
> +--
> +s1 = box.schema.space.create('insert_to', {engine = 'vinyl'})
> + | ---
> + | ...
> +_ = s1:create_index('pk', {parts = {1, 'unsigned'}})
> + | ---
> + | ...
> +
> +s2 = box.schema.space.create('build_index', {engine = 'vinyl'})
> + | ---
> + | ...
> +_ = s2:create_index('pk', {parts = {1, 'unsigned'}})
> + | ---
> + | ...
> +
> +val = 0
> + | ---
> + | ...
> +PADDING = string.rep('x', 100)

3. Why is it 100?

> + | ---
> + | ...
> +
> +test_run:cmd("setopt delimiter ';'")
> + | ---
> + | - true
> + | ...
> +
> +function gen_insert(space)
> +    pcall(space.insert, space, {val, val, val, val, PADDING})

4. Why is it important to have 4 vals, and not just one? Why is
it in pcall?

> +    val = val + 1
> +end;
> + | ---
> + | ...
> +
> +function fill_L0_without_dump(space, watermark)
> +    while box.stat.vinyl().memory.level0 < watermark do
> +        gen_insert(space)
> +    end

5. Is it possible to check that the dump really didn't happen?
Since the function is called 'without_dump'.

> +end;
> + | ---
> + | ...
> +
> +function insert_loop()
> +    while not stop do

6. Variable 'stop' is not declared here. I propose to declare it
before this function.

> +        gen_insert(s1)
> +    end
> +    ch:put(true)
> +end;
> + | ---
> + | ...
> +
> +function idx_build()
> +    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
> +    ch:put(true)

7. ch is not declared anywhere above. It would be better to declare
things before usage. Easier to read.

8. Is it important to use fields {2, 3} and not just {2}?

> +end;
> + | ---
> + | ...
> +
> +dump_watermark = box.cfg.vinyl_memory / 2;

9. Why / 2? Is it some internal vinyl formula, that
the dump is triggered, when the memory is half full?

In this case won't 'fill_L0_without_dump(s2, dump_watermark);'
this line trigger the dump? Even though it is called
'without_dump'.

> + | ---
> + | ...
> +fill_L0_without_dump(s1, dump_watermark / 2);
> + | ---
> + | ...
> +fill_L0_without_dump(s2, dump_watermark);
> + | ---
> + | ...
> +
> +box.snapshot();
> + | ---
> + | - ok
> + | ...
> +
> +fill_L0_without_dump(s1, dump_watermark / 2 * 3);
> + | ---
> + | ...
> +
> +function idx_build()
> +    errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", true)
> +    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
> +    ch:put(true)
> +end;
> + | ---
> + | ...
> +
> +stop = false;
> + | ---
> + | ...
> +ch = fiber.channel(2);
> + | ---
> + | ...
> +
> +_ = fiber.create(idx_build);
> + | ---
> + | ...
> +_ = fiber.create(insert_loop, s1);
> + | ---
> + | ...
> +errinj.set("ERRINJ_VY_INDEX_DUMP", 1);
> + | ---
> + | - ok
> + | ...
> +
> +fiber.sleep(2);

10. Is it possible to avoid the fixed timeout? As you know,
having a fixed timeout very often makes the test flaky.

Also the test works more than 10 seconds - I always get the
message:

	No output during 10 seconds. Will abort after 120 seconds without output. List of workers not reporting the status:
	- 001_vinyl [vinyl/gh-5042-dump-during-index-build-2.test.lua, None] at var/001_vinyl/gh-5042-dump-during-index-build-2.result:162

I suggest to try to find a way to speed it up.

Overall I didn't understand almost anything in this test,
despite the big comment in the beginning. So I can't
validate whether it is optimal/correct. All I could check
is that it really crashes without the patch. Probably fixing
the comments above, and adding more explanations into the
test context would help.

I checked all the other similar places and propose you to take
a look at some of them too:

- vy_build_recover_stmt() calls vy_build_insert_stmt() without
  checking mem generation and rotation;

- vy_squash_process() calls vy_lsm_set() without these actions;

- vy_lsm_commit_upsert() calls vy_lsm_set, ditto.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand
  2020-06-07 15:51 ` Vladislav Shpilevoy
@ 2020-06-09 22:34   ` Nikita Pettik
  2020-06-15 13:54     ` Nikita Pettik
                       ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Nikita Pettik @ 2020-06-09 22:34 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On 07 Jun 17:51, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> On 04/06/2020 19:49, Nikita Pettik wrote:
> > Meanwhile in 17f6af7dc the similar problem has been fixed, still it may
> > appear that in-memory level of secondary index being constructed has
> > lagging memory generation (in other words, it's values is less than
> > the value of current global generation). Such situation can be achieved
> > if yield which is required to check uniqueness of value being inserted
> > is too long. In this time gap other space may trigger dump process
> > bumping global memory generation counter, but dump itself is still not
> > yet scheduled.
> 
> Why is it so important in this bug to trigger the dump, but stop the
> dump from actual scheduling?

Triggering the dump process bumps global memory generation. After
this event each L0 should be rotated before processing further
using of lsregion with old generation (another fiber could insert
newer tuple using lsregion).

> Why all the needed actions, whatever they
> are, can't be done at the trigger moment? 

Rotating all L0s at once may stall execution for a while.
It is done later in scheduler's fiber.

> And what are these actions,
> which are not done at trigger moment, but done at scheduling, and lead
> to this bug?

It's memory roratiton in vy_task_dump_new().

> > diff --git a/src/box/vinyl.c b/src/box/vinyl.c
> > index a90b786a7..19e37947a 100644
> > --- a/src/box/vinyl.c
> > +++ b/src/box/vinyl.c
> > @@ -4097,6 +4105,12 @@ vy_build_insert_tuple(struct vy_env *env, struct vy_lsm *lsm,
> >  	 * Hence, to get right mem (during mem rotation it becomes
> >  	 * sealed i.e. read-only) we should fetch it from lsm again.
> >  	 */
> > +	if (lsm->mem->generation != *lsm->env->p_generation) {
> > +		if (vy_lsm_rotate_mem(lsm) != 0) {
> > +			tuple_unref(stmt);
> > +			return -1;
> > +		}
> > +	}
> 
> 1. This check + rotation now are done in 3 places. In all three there
> is a comment explaining that again and again. Isn't it time to extract
> it into a function?

IMHO, check+rotations doesn't worth another one separate wrapper.
Moreover, in-place comments always describe situation better than
one general comment.
 
> >  	mem = lsm->mem;
> >  
> >  	/* Insert the new tuple into the in-memory index. */
> > diff --git a/test/vinyl/gh-5042-dump-during-index-build-2.result b/test/vinyl/gh-5042-dump-during-index-build-2.result
> > new file mode 100644
> > index 000000000..a4fbc56ec
> > --- /dev/null
> > +++ b/test/vinyl/gh-5042-dump-during-index-build-2.result
> > @@ -0,0 +1,189 @@
> > +-- test-run result file version 2
> > +test_run = require('test_run').new()
> > + | ---
> > + | ...
> > +
> > +test_run:cmd("create server test with script='vinyl/low_quota.lua'")
> > + | ---
> > + | - true
> > + | ...
> > +test_run:cmd("start server test with args='13421772'")
> 
> 2. Why is it 13421772?

Just borrowed it from vinyl/gh-4810-dump-during-index.build.test.lua
This value obtained from empirical research. Using other values
leads to unstable test results for some reason.

> > +_ = s2:create_index('pk', {parts = {1, 'unsigned'}})
> > + | ---
> > + | ...
> > +
> > +val = 0
> > + | ---
> > + | ...
> > +PADDING = string.rep('x', 100)
> 
> 3. Why is it 100?

Again, current test is just a modification of gh-4810, which in turn
is a compilation of patterns used in different vinyl tests. For
instance, this padding I took from select_consistency.test.lua
In general, padding is used to increase tuple payload, so that
less tuples are required to force dump.

> > + | ---
> > + | ...
> > +
> > +test_run:cmd("setopt delimiter ';'")
> > + | ---
> > + | - true
> > + | ...
> > +
> > +function gen_insert(space)
> > +    pcall(space.insert, space, {val, val, val, val, PADDING})
> 
> 4. Why is it important to have 4 vals, and not just one? Why is
> it in pcall?

IDK, it is just a test case. Removed pcall; left only 3 fields - see diff below.
 
> > +    val = val + 1
> > +end;
> > + | ---
> > + | ...
> > +
> > +function fill_L0_without_dump(space, watermark)
> > +    while box.stat.vinyl().memory.level0 < watermark do
> > +        gen_insert(space)
> > +    end
> 
> 5. Is it possible to check that the dump really didn't happen?
> Since the function is called 'without_dump'.

Yes. Original test (gh-4810) contains assertion, but I forgot to
copy it. Fixed.

> > +end;
> > + | ---
> > + | ...
> > +
> > +function insert_loop()
> > +    while not stop do
> 
> 6. Variable 'stop' is not declared here. I propose to declare it
> before this function.

Ok, fixed (diff below).

> > +        gen_insert(s1)
> > +    end
> > +    ch:put(true)
> > +end;
> > + | ---
> > + | ...
> > +
> > +function idx_build()
> > +    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
> > +    ch:put(true)
> 
> 7. ch is not declared anywhere above. It would be better to declare
> things before usage. Easier to read.

Fixed.

> 8. Is it important to use fields {2, 3} and not just {2}?

IDK. Only second one part is left.

> > +end;
> > + | ---
> > + | ...
> > +
> > +dump_watermark = box.cfg.vinyl_memory / 2;
> 
> 9. Why / 2? Is it some internal vinyl formula, that
> the dump is triggered, when the memory is half full?

Yes, dump watermarks is always >= vinyl_memory/2. Numbers may
vary, but for this vinyl_memory value on my local machine
first/second dumps are triggered before reaching 2/3 of vinyl_memory.

> In this case won't 'fill_L0_without_dump(s2, dump_watermark);'
> this line trigger the dump? Even though it is called
> 'without_dump'.

Oh god, let's rename it to fill_L0(space, size).

> > +errinj.set("ERRINJ_VY_INDEX_DUMP", 1);
> > + | ---
> > + | - ok
> > + | ...
> > +
> > +fiber.sleep(2);
> 
> 10. Is it possible to avoid the fixed timeout? As you know,
> having a fixed timeout very often makes the test flaky.

I've failed to come up with condition which can be used as
indicator of triggered but not yet scheduled dump. 2 seconds
seems to be enough to reach quota limit and trigger the dump
even on slow machines. In new version I used 1 second - please
check if this test will fail on your PC.
 
> Also the test works more than 10 seconds - I always get the
> message:
> 
> 	No output during 10 seconds. Will abort after 120 seconds without output. List of workers not reporting the status:
> 	- 001_vinyl [vinyl/gh-5042-dump-during-index-build-2.test.lua, None] at var/001_vinyl/gh-5042-dump-during-index-build-2.result:162
> 
> I suggest to try to find a way to speed it up.

Here's whole diff:

diff --git a/test/vinyl/gh-5042-dump-during-index-build-2.test.lua b/test/vinyl/gh-5042-dump-during-index-build-2.test.lua
index 16cdfa668..52170fc76 100644
--- a/test/vinyl/gh-5042-dump-during-index-build-2.test.lua
+++ b/test/vinyl/gh-5042-dump-during-index-build-2.test.lua
@@ -36,16 +36,28 @@ PADDING = string.rep('x', 100)
 test_run:cmd("setopt delimiter ';'")
 
 function gen_insert(space)
-    pcall(space.insert, space, {val, val, val, val, PADDING})
+    space:insert{val, val, PADDING}
     val = val + 1
 end;
 
-function fill_L0_without_dump(space, watermark)
-    while box.stat.vinyl().memory.level0 < watermark do
+function fill_L0(space, size)
+    while box.stat.vinyl().memory.level0 < size do
         gen_insert(space)
     end
 end;
 
+dump_watermark = box.cfg.vinyl_memory / 2;
+fill_L0(s2, dump_watermark / 2);
+
+box.snapshot();
+
+dump_cnt_before = box.stat.vinyl().scheduler.dump_count;
+fill_L0(s1, dump_watermark / 2 * 3);
+assert(dump_cnt_before == box.stat.vinyl().scheduler.dump_count);
+
+stop = false;
+ch = fiber.channel(2);
+
 function insert_loop()
     while not stop do
         gen_insert(s1)
@@ -53,33 +65,17 @@ function insert_loop()
     ch:put(true)
 end;
 
-function idx_build()
-    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
-    ch:put(true)
-end;
-
-dump_watermark = box.cfg.vinyl_memory / 2;
-fill_L0_without_dump(s1, dump_watermark / 2);
-fill_L0_without_dump(s2, dump_watermark);
-
-box.snapshot();
-
-fill_L0_without_dump(s1, dump_watermark / 2 * 3);
-
 function idx_build()
     errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", true)
-    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
+    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned'}})
     ch:put(true)
 end;
 
-stop = false;
-ch = fiber.channel(2);
-
 _ = fiber.create(idx_build);
-_ = fiber.create(insert_loop, s1);
+_ = fiber.create(insert_loop);
 errinj.set("ERRINJ_VY_INDEX_DUMP", 1);
 
-fiber.sleep(2);
+fiber.sleep(1);
 errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", false);
 errinj.set("ERRINJ_VY_INDEX_DUMP", -1);
 
> Overall I didn't understand almost anything in this test,
> despite the big comment in the beginning. So I can't
> validate whether it is optimal/correct. All I could check
> is that it really crashes without the patch. Probably fixing
> the comments above, and adding more explanations into the
> test context would help.

Ок, давай объясню по шагам. Все что идет до этого момента - подготовка.
У s2 (по которому мы будем строить вторичный индекс) должны быть таплы
на диске; у s1 заполнен уровень L0, но так, чтобы дамп еще не произошел
(т.е. чтобы минимальное кол-во вставок после заполнения приводили к дампу).

-- Это тупо луп, который вставляет таплы до упора, чтобы стригерить
-- дамп. Тут нельзя использовать box.snapshot() так как он ставит
-- латч на ддл и мы тогда не сможем запустить построение индекса.
--
function insert_loop()
    while not stop do
        gen_insert(s1)
    end
    ch:put(true)
end;

-- Перед тем как начать строить индекс, выставляем эту задержку.
-- Так как индекс уникальный, то для мы свалимся в чтение рана
-- для проверки уникальности тапла. Можно было бы использовать
-- ERRINJ_VY_READ_PAGE_DELAY, но тогда может зависнуть параллельный
-- инсерт в спейс s1. Поэтому тут мы имитируем делей+илд сразу после
-- vy_check_is_unique_secondary() в vy_build_insert_tuple().
-- Таким образом, после выполнения этой функции мы зависнем в
-- vy_build_insert_tuple(), которая в свою очередь вызывается
-- в главном цикли при построении вторичного индекса (см. vinyl_space_build_index)
--
function idx_build()
    errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", true)
    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned'}})
    ch:put(true)
end;

_ = fiber.create(idx_build);
_ = fiber.create(insert_loop);
-- В этот момент у нас будет такое состояние: s2:create_index()
-- зависает в vy_build_insert_tuple(); таплы продолжают вставляться
-- в s1. В какой-то момент их станет достаточно, чтобы был превышен
-- лимит по памяти и сработал дамп (vy_scheduler_trigger_dump).
-- В этот момент значение глобальной генерации увеличивается, а 
-- файбер отвечающий за планировку дампа просыпается. Чтобы не дать
-- сделать ротацию вторичного индекса который мы строим (s2:i1),
-- мы ставим ERRINJ_VY_INDEX_DUMP. В теории, это может произойти не
-- потому что vy_task_dump_new() завершился с ошибкой, а потому что
-- файбер не был попросту разбужен в текущее временное окно (то есть
-- управление возвращается сразу к построению индекса).
-- С другой стороны, ротация других индексов может иметь место
-- (например, в vy_tx_write_prepare()). Соответственно, в lsregion
-- могут уже появиться блоки с id > s2:i1->generation.
-- Задержка же нужна, чтобы insert_loop() успел стригерить дамп.
--
errinj.set("ERRINJ_VY_INDEX_DUMP", 1);
-- После того, как планировщик не смог или не успел выполнить
-- vy_task_dump_new() (в которой происходит ротация памяти),
-- мы возращаемся к построению индекса и продолжаем выполнение.
-- Как было сказано, глобальная генерация памяти увеличилась,
-- и могла произойти аллокация на lsregion с новым id (увеличенным).
-- Когда мы попытаемся в vy_build_insert_stmt() аллоцировать память
-- со старым id, то сработает ассерт: slab->max_id <= id.
--
fiber.sleep(2);
errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", false);
errinj.set("ERRINJ_VY_INDEX_DUMP", -1);

> I checked all the other similar places and propose you to take
> a look at some of them too:
> 
> - vy_build_recover_stmt() calls vy_build_insert_stmt() without
>   checking mem generation and rotation;

Во время рекавери ротации памяти не могут происходить..
vy_point_lookup() может, конечно, илдить, но дампов быть не может.
Вот коммент в vinyl_engine_begin_initial_recovery() подтверждающий это:

		 * We can't schedule any background tasks until
		 * local recovery is complete, because they would
		 * disrupt yet to be recovered data stored on disk.
		 * So we don't start the scheduler fiber or enable
		 * memory limit until then.
		 *
		 * This is OK, because during recovery an instance
		 * can't consume more memory than it used before
		 * restart and hence memory dumps are not necessary.

> - vy_squash_process() calls vy_lsm_set() without these actions;

В теории потенциально тут такая ситуация может произойти (так как сквош
происходит в отдельном файбере), но там выделяется память всего под один
тапл (в отличие от построения вторичного индекса, где lsregion_alloc
зовется для каждого тапла) и поймать такую ситуацию на практике,
наверное, очень маловероятно.

> - vy_lsm_commit_upsert() calls vy_lsm_set, ditto.

Тут тоже в принципе реально: между engine_prepare() и engine_commit()
есть илд. При этом дамп может стригериться в конце vinyl_engine_prepare()
(хотя и в vy_tx_prepare() есть ротация). Для этого случая сценарий
придумать будет проще.

В любом случае, это я уже отдельными патчами пошлю.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand
  2020-06-04 17:49 [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand Nikita Pettik
                   ` (3 preceding siblings ...)
  2020-06-07 15:51 ` Vladislav Shpilevoy
@ 2020-06-10 15:27 ` Nikita Pettik
  4 siblings, 0 replies; 14+ messages in thread
From: Nikita Pettik @ 2020-06-10 15:27 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy

On 04 Jun 20:49, Nikita Pettik wrote:
> Meanwhile in 17f6af7dc the similar problem has been fixed, still it may
> appear that in-memory level of secondary index being constructed has
> lagging memory generation (in other words, it's values is less than
> the value of current global generation). Such situation can be achieved
> if yield which is required to check uniqueness of value being inserted
> is too long. In this time gap other space may trigger dump process
> bumping global memory generation counter, but dump itself is still not
> yet scheduled.
> 
> So to get rid of generations mismatch, let's rotate in-memory level
> after yield on demand.
> 
> Closes #5042
> ---
> Branch: https://github.com/tarantool/tarantool/tree/np/gh-5042-rotate-mem-after-yield
> Issue: https://github.com/tarantool/tarantool/issues/5042
> 
> @ChangeLog:
>  * Fix crash due to triggered dump process during secondary index creation (gh-5042).
>

Pushed without test to master, 2.4, 2.3 and 1.10. Changelogs are updated
correspondingly. Test will be pushed as a follow-up as soon as it gets
second ack.
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand
  2020-06-09 22:34   ` Nikita Pettik
@ 2020-06-15 13:54     ` Nikita Pettik
  2020-06-23 22:43     ` Vladislav Shpilevoy
  2020-07-22 22:54     ` Vladislav Shpilevoy
  2 siblings, 0 replies; 14+ messages in thread
From: Nikita Pettik @ 2020-06-15 13:54 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On 09 Jun 22:34, Nikita Pettik wrote:
> On 07 Jun 17:51, Vladislav Shpilevoy wrote:
> 
> > I checked all the other similar places and propose you to take
> > a look at some of them too:
> > 
> > - vy_build_recover_stmt() calls vy_build_insert_stmt() without
> >   checking mem generation and rotation;
> 
> Во время рекавери ротации памяти не могут происходить..
> vy_point_lookup() может, конечно, илдить, но дампов быть не может.
> Вот коммент в vinyl_engine_begin_initial_recovery() подтверждающий это:
> 
> 		 * We can't schedule any background tasks until
> 		 * local recovery is complete, because they would
> 		 * disrupt yet to be recovered data stored on disk.
> 		 * So we don't start the scheduler fiber or enable
> 		 * memory limit until then.
> 		 *
> 		 * This is OK, because during recovery an instance
> 		 * can't consume more memory than it used before
> 		 * restart and hence memory dumps are not necessary.
> 
> > - vy_squash_process() calls vy_lsm_set() without these actions;
> 
> В теории потенциально тут такая ситуация может произойти (так как сквош
> происходит в отдельном файбере), но там выделяется память всего под один
> тапл (в отличие от построения вторичного индекса, где lsregion_alloc
> зовется для каждого тапла) и поймать такую ситуацию на практике,
> наверное, очень маловероятно.
> 
> > - vy_lsm_commit_upsert() calls vy_lsm_set, ditto.
> 
> Тут тоже в принципе реально: между engine_prepare() и engine_commit()
> есть илд. При этом дамп может стригериться в конце vinyl_engine_prepare()
> (хотя и в vy_tx_prepare() есть ротация). Для этого случая сценарий
> придумать будет проще.
> 
> В любом случае, это я уже отдельными патчами пошлю.
>

По мотивам открыл тикет: https://github.com/tarantool/tarantool/issues/5080
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand
  2020-06-09 22:34   ` Nikita Pettik
  2020-06-15 13:54     ` Nikita Pettik
@ 2020-06-23 22:43     ` Vladislav Shpilevoy
  2020-06-23 23:11       ` Nikita Pettik
  2020-07-22 22:54     ` Vladislav Shpilevoy
  2 siblings, 1 reply; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-23 22:43 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-patches

Hi! JFYI, I didn't forget. Just decided to postpone it
until there are no urgent tasks needed for the release.

If you don't mind.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand
  2020-06-23 22:43     ` Vladislav Shpilevoy
@ 2020-06-23 23:11       ` Nikita Pettik
  0 siblings, 0 replies; 14+ messages in thread
From: Nikita Pettik @ 2020-06-23 23:11 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On 24 Jun 00:43, Vladislav Shpilevoy wrote:
> Hi! JFYI, I didn't forget. Just decided to postpone it
> until there are no urgent tasks needed for the release.
> 
> If you don't mind.

Sure, NP.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand
  2020-06-09 22:34   ` Nikita Pettik
  2020-06-15 13:54     ` Nikita Pettik
  2020-06-23 22:43     ` Vladislav Shpilevoy
@ 2020-07-22 22:54     ` Vladislav Shpilevoy
  2 siblings, 0 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-22 22:54 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-patches

Привет!

On 10.06.2020 00:34, Nikita Pettik wrote:
> On 07 Jun 17:51, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the patch!
>>
>> On 04/06/2020 19:49, Nikita Pettik wrote:
>>> Meanwhile in 17f6af7dc the similar problem has been fixed, still it may
>>> appear that in-memory level of secondary index being constructed has
>>> lagging memory generation (in other words, it's values is less than
>>> the value of current global generation). Such situation can be achieved
>>> if yield which is required to check uniqueness of value being inserted
>>> is too long. In this time gap other space may trigger dump process
>>> bumping global memory generation counter, but dump itself is still not
>>> yet scheduled.
>>
>> Why is it so important in this bug to trigger the dump, but stop the
>> dump from actual scheduling?
> 
> Triggering the dump process bumps global memory generation. After
> this event each L0 should be rotated before processing further
> using of lsregion with old generation (another fiber could insert
> newer tuple using lsregion).

Все равно не понял. Почему нельзя взять и выполнить дамп, пока строитель
индекса там копается? Зачем именно планировать, но не выполнять дамп?

>> Why all the needed actions, whatever they
>> are, can't be done at the trigger moment? 
> 
> Rotating all L0s at once may stall execution for a while.
> It is done later in scheduler's fiber.

А в чем разница? Эти действия все равно надо сделать. Какая разница, в
каком файбере? Раньше - лучше, нет?

>> And what are these actions,
>> which are not done at trigger moment, but done at scheduling, and lead
>> to this bug?
> 
> It's memory roratiton in vy_task_dump_new().
> 
>>>  	mem = lsm->mem;
>>>  
>>>  	/* Insert the new tuple into the in-memory index. */
>>> diff --git a/test/vinyl/gh-5042-dump-during-index-build-2.result b/test/vinyl/gh-5042-dump-during-index-build-2.result
>>> new file mode 100644
>>> index 000000000..a4fbc56ec
>>> --- /dev/null
>>> +++ b/test/vinyl/gh-5042-dump-during-index-build-2.result
>>> @@ -0,0 +1,189 @@
>>> +-- test-run result file version 2
>>> +test_run = require('test_run').new()
>>> + | ---
>>> + | ...
>>> +
>>> +test_run:cmd("create server test with script='vinyl/low_quota.lua'")
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:cmd("start server test with args='13421772'")
>>
>> 2. Why is it 13421772?
> 
> Just borrowed it from vinyl/gh-4810-dump-during-index.build.test.lua
> This value obtained from empirical research. Using other values
> leads to unstable test results for some reason.

Звучит не очень хорошо. Есть какая-то константа волшебная, без которой
все плохо, и с которой все норм, но непонятно, почему. Надо бы выяснить.

>>> +end;
>>> + | ---
>>> + | ...
>>> +
>>> +dump_watermark = box.cfg.vinyl_memory / 2;
>>
>> 9. Why / 2? Is it some internal vinyl formula, that
>> the dump is triggered, when the memory is half full?
> 
> Yes, dump watermarks is always >= vinyl_memory/2. Numbers may
> vary, but for this vinyl_memory value on my local machine
> first/second dumps are triggered before reaching 2/3 of vinyl_memory.
> 
>> In this case won't 'fill_L0_without_dump(s2, dump_watermark);'
>> this line trigger the dump? Even though it is called
>> 'without_dump'.
> 
> Oh god, let's rename it to fill_L0(space, size).

Хочешь - я могу сразу дать лгтм хоть сейчас. Мне не сложно.

ЛГТМ. Можно пушать и все комменты здесь проигнорить.

Но если хочется нормально сделать, то надо, чтоб имена отражали происходящее.
Имя говорило буквально противоположное тому, что происходило.

>>> +errinj.set("ERRINJ_VY_INDEX_DUMP", 1);
>>> + | ---
>>> + | - ok
>>> + | ...
>>> +
>>> +fiber.sleep(2);
>>
>> 10. Is it possible to avoid the fixed timeout? As you know,
>> having a fixed timeout very often makes the test flaky.
> 
> I've failed to come up with condition which can be used as
> indicator of triggered but not yet scheduled dump. 2 seconds
> seems to be enough to reach quota limit and trigger the dump
> even on slow machines.

Фиксированный таймаут порядка единиц секунд - это всегда потенциально
нестабильный тест.

Отловить любое событие можно через новый errinj, если использовать его
не как вставку ошибки, а как сигнал. См ERRINJ_DYN_MODULE_COUNT для
примера. Можно увеличивать счетчик зашедуленых дампов в каком-нибудь
errinj, и в тесте его поллить, пока он не увеличится.

> @@ -53,33 +65,17 @@ function insert_loop()
>      ch:put(true)
>  end;
>  
> -function idx_build()
> -    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
> -    ch:put(true)
> -end;
> -
> -dump_watermark = box.cfg.vinyl_memory / 2;
> -fill_L0_without_dump(s1, dump_watermark / 2);
> -fill_L0_without_dump(s2, dump_watermark);
> -
> -box.snapshot();
> -
> -fill_L0_without_dump(s1, dump_watermark / 2 * 3);
> -
>  function idx_build()

Эта функция два раза определена, и отличается только первой строкой. Мб
дать ей значение инджекшена в аргументе и определить один раз?

>      errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", true)
> -    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned', 3, 'unsigned'}})
> +    _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned'}})
>      ch:put(true)
>  end;
>  
> -stop = false;
> -ch = fiber.channel(2);
> -
>  _ = fiber.create(idx_build);
> -_ = fiber.create(insert_loop, s1);
> +_ = fiber.create(insert_loop);
>  errinj.set("ERRINJ_VY_INDEX_DUMP", 1);
>  
> -fiber.sleep(2);
> +fiber.sleep(1);
>  errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", false);
>  errinj.set("ERRINJ_VY_INDEX_DUMP", -1);
>  
>> Overall I didn't understand almost anything in this test,
>> despite the big comment in the beginning. So I can't
>> validate whether it is optimal/correct. All I could check
>> is that it really crashes without the patch. Probably fixing
>> the comments above, and adding more explanations into the
>> test context would help.
> 
> Ок, давай объясню по шагам. Все что идет до этого момента - подготовка.
> У s2 (по которому мы будем строить вторичный индекс) должны быть таплы
> на диске; у s1 заполнен уровень L0, но так, чтобы дамп еще не произошел
> (т.е. чтобы минимальное кол-во вставок после заполнения приводили к дампу).
> 
> -- Это тупо луп, который вставляет таплы до упора, чтобы стригерить
> -- дамп. Тут нельзя использовать box.snapshot() так как он ставит
> -- латч на ддл и мы тогда не сможем запустить построение индекса.
> --
> function insert_loop()
>     while not stop do
>         gen_insert(s1)
>     end
>     ch:put(true)
> end;
> 
> -- Перед тем как начать строить индекс, выставляем эту задержку.
> -- Так как индекс уникальный, то для мы свалимся в чтение рана
> -- для проверки уникальности тапла. Можно было бы использовать
> -- ERRINJ_VY_READ_PAGE_DELAY, но тогда может зависнуть параллельный
> -- инсерт в спейс s1. Поэтому тут мы имитируем делей+илд сразу после
> -- vy_check_is_unique_secondary() в vy_build_insert_tuple().
> -- Таким образом, после выполнения этой функции мы зависнем в
> -- vy_build_insert_tuple(), которая в свою очередь вызывается
> -- в главном цикли при построении вторичного индекса (см. vinyl_space_build_index)
> --
> function idx_build()
>     errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", true)
>     _ = s2:create_index('i1', {unique = true, parts = {2, 'unsigned'}})
>     ch:put(true)
> end;
> 
> _ = fiber.create(idx_build);
> _ = fiber.create(insert_loop);
> -- В этот момент у нас будет такое состояние: s2:create_index()
> -- зависает в vy_build_insert_tuple(); таплы продолжают вставляться
> -- в s1. В какой-то момент их станет достаточно, чтобы был превышен
> -- лимит по памяти и сработал дамп (vy_scheduler_trigger_dump).
> -- В этот момент значение глобальной генерации увеличивается, а 
> -- файбер отвечающий за планировку дампа просыпается. Чтобы не дать
> -- сделать ротацию вторичного индекса который мы строим (s2:i1),
> -- мы ставим ERRINJ_VY_INDEX_DUMP. В теории, это может произойти не
> -- потому что vy_task_dump_new() завершился с ошибкой, а потому что
> -- файбер не был попросту разбужен в текущее временное окно (то есть
> -- управление возвращается сразу к построению индекса).
> -- С другой стороны, ротация других индексов может иметь место
> -- (например, в vy_tx_write_prepare()). Соответственно, в lsregion
> -- могут уже появиться блоки с id > s2:i1->generation.
> -- Задержка же нужна, чтобы insert_loop() успел стригерить дамп.
> --
> errinj.set("ERRINJ_VY_INDEX_DUMP", 1);
> -- После того, как планировщик не смог или не успел выполнить
> -- vy_task_dump_new() (в которой происходит ротация памяти),
> -- мы возращаемся к построению индекса и продолжаем выполнение.
> -- Как было сказано, глобальная генерация памяти увеличилась,
> -- и могла произойти аллокация на lsregion с новым id (увеличенным).
> -- Когда мы попытаемся в vy_build_insert_stmt() аллоцировать память
> -- со старым id, то сработает ассерт: slab->max_id <= id.
> --
> fiber.sleep(2);
> errinj.set("ERRINJ_BUILD_UNIQUE_IDX_DELAY", false);
> errinj.set("ERRINJ_VY_INDEX_DUMP", -1);

Схематически понятно. Но реальный тест выглядит почему-то сложнее.
Наверное из самого непонятного все еще то, почему дамп не должен
произойти.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-07-22 22:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-04 17:49 [Tarantool-patches] [PATCH] vinyl: rotate mem during index build on demand Nikita Pettik
2020-06-04 20:23 ` Konstantin Osipov
2020-06-04 22:32   ` Nikita Pettik
2020-06-04 22:52 ` Konstantin Osipov
2020-06-05 11:18   ` Nikita Pettik
2020-06-05 11:33     ` Konstantin Osipov
2020-06-06  8:38 ` Aleksandr Lyapunov
2020-06-07 15:51 ` Vladislav Shpilevoy
2020-06-09 22:34   ` Nikita Pettik
2020-06-15 13:54     ` Nikita Pettik
2020-06-23 22:43     ` Vladislav Shpilevoy
2020-06-23 23:11       ` Nikita Pettik
2020-07-22 22:54     ` Vladislav Shpilevoy
2020-06-10 15:27 ` Nikita Pettik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox