Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
@ 2020-01-29 17:03 Alexander Turenko
  2020-01-29 18:46 ` Nikita Pettik
  2020-02-27 20:22 ` Kirill Yukhin
  0 siblings, 2 replies; 7+ messages in thread
From: Alexander Turenko @ 2020-01-29 17:03 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-patches

After #4736 regression fix (in fact it just reverts the new logic in
small) it is possible again that a fiber's region may hold a memory for
a while, but release it eventually. When the used memory exceeds 128 KiB
threshold, fiber_gc() puts 'garbage' slabs back to slab_cache and
subtracts them from region_used() metric. But until this point those
slabs are accounted in region_used() and so in fiber.info() metrics.

This commit fixes flakiness of test cases of the following kind:

 | fiber.info()[fiber.self().id()].memory.used -- should be zero
 | <...workload...>
 | fiber.info()[fiber.self().id()].memory.used -- should be zero

The problem is that the first `<...>.memory.used` value may be non-zero.
It depends of previous tests that were executed on this tarantool
instance. It is resolved by restarting of a tarantool instance before
such test cases to ensure that there are no 'garbage' slabs in a current
fiber's region.

Note: This works only if a test case reserves only one slab at the
moment: otherwise some memory may be hold after the case (and so a
memory check after a workload will fail). However it seems that our
cases are small enough to don't trigger this situation.

Call of region_free() would be enough, but we have no Lua API for it.

Fixes #4750.
---

https://github.com/tarantool/tarantool/issues/4750
https://github.com/tarantool/tarantool/tree/Totktonada/gh-4750-stabilize-flaky-memleak-checks

 test/engine/snapshot.result            | 3 +++
 test/engine/snapshot.test.lua          | 4 ++++
 test/sql/gh-3199-no-mem-leaks.result   | 9 ++++++---
 test/sql/gh-3199-no-mem-leaks.test.lua | 5 ++++-
 4 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/test/engine/snapshot.result b/test/engine/snapshot.result
index 9b4824ba3..f4ac7287a 100644
--- a/test/engine/snapshot.result
+++ b/test/engine/snapshot.result
@@ -238,6 +238,9 @@ box.space.test.index.secondary:select{}
 box.space.test:drop()
 ---
 ...
+-- Hard way to flush garbage slabs in the fiber's region. See
+-- gh-4750.
+test_run:cmd('restart server default')
 -- Check that box.snapshot() doesn't leave garbage one the region.
 -- https://github.com/tarantool/tarantool/issues/3732
 fiber = require('fiber')
diff --git a/test/engine/snapshot.test.lua b/test/engine/snapshot.test.lua
index 13eecccc2..3f5a89e0b 100644
--- a/test/engine/snapshot.test.lua
+++ b/test/engine/snapshot.test.lua
@@ -42,6 +42,10 @@ box.space.test.index.primary:select{}
 box.space.test.index.secondary:select{}
 box.space.test:drop()
 
+-- Hard way to flush garbage slabs in the fiber's region. See
+-- gh-4750.
+test_run:cmd('restart server default')
+
 -- Check that box.snapshot() doesn't leave garbage one the region.
 -- https://github.com/tarantool/tarantool/issues/3732
 fiber = require('fiber')
diff --git a/test/sql/gh-3199-no-mem-leaks.result b/test/sql/gh-3199-no-mem-leaks.result
index 35d8572b4..d8590779a 100644
--- a/test/sql/gh-3199-no-mem-leaks.result
+++ b/test/sql/gh-3199-no-mem-leaks.result
@@ -7,13 +7,16 @@ engine = test_run:get_cfg('engine')
 _ = box.space._session_settings:update('sql_default_engine', {{'=', 2, engine}})
 ---
 ...
-fiber = require('fiber')
----
-...
 -- This test checks that no leaks of region memory happens during
 -- executing SQL queries.
 --
 -- box.cfg()
+-- Hard way to flush garbage slabs in the fiber's region. See
+-- gh-4750.
+test_run:cmd('restart server default')
+fiber = require('fiber')
+---
+...
 box.execute('CREATE TABLE test (id INT PRIMARY KEY, x INTEGER, y INTEGER)')
 ---
 - row_count: 1
diff --git a/test/sql/gh-3199-no-mem-leaks.test.lua b/test/sql/gh-3199-no-mem-leaks.test.lua
index 41648d0fc..d517c3373 100644
--- a/test/sql/gh-3199-no-mem-leaks.test.lua
+++ b/test/sql/gh-3199-no-mem-leaks.test.lua
@@ -1,7 +1,6 @@
 test_run = require('test_run').new()
 engine = test_run:get_cfg('engine')
 _ = box.space._session_settings:update('sql_default_engine', {{'=', 2, engine}})
-fiber = require('fiber')
 
 -- This test checks that no leaks of region memory happens during
 -- executing SQL queries.
@@ -9,6 +8,10 @@ fiber = require('fiber')
 
 -- box.cfg()
 
+-- Hard way to flush garbage slabs in the fiber's region. See
+-- gh-4750.
+test_run:cmd('restart server default')
+fiber = require('fiber')
 
 box.execute('CREATE TABLE test (id INT PRIMARY KEY, x INTEGER, y INTEGER)')
 box.execute('INSERT INTO test VALUES (1, 1, 1), (2, 2, 2)')
-- 
2.22.0

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
  2020-01-29 17:03 [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection Alexander Turenko
@ 2020-01-29 18:46 ` Nikita Pettik
  2020-01-29 20:27   ` Alexander Turenko
  2020-02-27 20:22 ` Kirill Yukhin
  1 sibling, 1 reply; 7+ messages in thread
From: Nikita Pettik @ 2020-01-29 18:46 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: tarantool-patches

On 29 Jan 20:03, Alexander Turenko wrote:
> After #4736 regression fix (in fact it just reverts the new logic in
> small) it is possible again that a fiber's region may hold a memory for
> a while, but release it eventually. When the used memory exceeds 128 KiB
> threshold, fiber_gc() puts 'garbage' slabs back to slab_cache and
> subtracts them from region_used() metric. But until this point those
> slabs are accounted in region_used() and so in fiber.info() metrics.
> 
> This commit fixes flakiness of test cases of the following kind:
> 
>  | fiber.info()[fiber.self().id()].memory.used -- should be zero
>  | <...workload...>
>  | fiber.info()[fiber.self().id()].memory.used -- should be zero
> 
> The problem is that the first `<...>.memory.used` value may be non-zero.
> It depends of previous tests that were executed on this tarantool
> instance. It is resolved by restarting of a tarantool instance before
> such test cases to ensure that there are no 'garbage' slabs in a current
> fiber's region.

Hm, why not simply save value of ..memory.used before workload, value
after workload and compare them:

reg_sz_before = fiber.info()[fiber.self().id()].memory.used
...
reg_sz_after = fiber.info()[fiber.self().id()].memory.used

assert(reg_sz_before == reg_sz_after);

So that make sure workload returns all occupied memory. This may fail
only in case allocated memory > 4kb, but examples in this particular
case definitely don't require so many memory (as you noted below).

> Note: This works only if a test case reserves only one slab at the
> moment: otherwise some memory may be hold after the case (and so a
> memory check after a workload will fail). However it seems that our
> cases are small enough to don't trigger this situation.
> 
> Call of region_free() would be enough, but we have no Lua API for it.
> 
> Fixes #4750.
> ---
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
  2020-01-29 18:46 ` Nikita Pettik
@ 2020-01-29 20:27   ` Alexander Turenko
  2020-02-05 15:36     ` Nikita Pettik
  0 siblings, 1 reply; 7+ messages in thread
From: Alexander Turenko @ 2020-01-29 20:27 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-patches

On Wed, Jan 29, 2020 at 09:46:30PM +0300, Nikita Pettik wrote:
> On 29 Jan 20:03, Alexander Turenko wrote:
> > After #4736 regression fix (in fact it just reverts the new logic in
> > small) it is possible again that a fiber's region may hold a memory for
> > a while, but release it eventually. When the used memory exceeds 128 KiB
> > threshold, fiber_gc() puts 'garbage' slabs back to slab_cache and
> > subtracts them from region_used() metric. But until this point those
> > slabs are accounted in region_used() and so in fiber.info() metrics.
> > 
> > This commit fixes flakiness of test cases of the following kind:
> > 
> >  | fiber.info()[fiber.self().id()].memory.used -- should be zero
> >  | <...workload...>
> >  | fiber.info()[fiber.self().id()].memory.used -- should be zero
> > 
> > The problem is that the first `<...>.memory.used` value may be non-zero.
> > It depends of previous tests that were executed on this tarantool
> > instance. It is resolved by restarting of a tarantool instance before
> > such test cases to ensure that there are no 'garbage' slabs in a current
> > fiber's region.
> 
> Hm, why not simply save value of ..memory.used before workload, value
> after workload and compare them:
> 
> reg_sz_before = fiber.info()[fiber.self().id()].memory.used
> ...
> reg_sz_after = fiber.info()[fiber.self().id()].memory.used
> 
> assert(reg_sz_before == reg_sz_after);
> 
> So that make sure workload returns all occupied memory. This may fail
> only in case allocated memory > 4kb, but examples in this particular
> case definitely don't require so many memory (as you noted below).

I forgot to add the reason why this approach does not work to the commit
message. Added the following paragraph:

 | The obvious way to solve it would be print differences between
 | `<...>.memory.used` values before and after a workload instead of
 | absolute values. This however does not work, because a first slab in a
 | region can be almost used at the point where a test case starts and a
 | next slab will be acquired from a slab_cache. This means that the
 | previous slab will become a 'garbage' and will not be collected until
 | 128 KiB threshold will exceed: the latter `<...>.memory.used` check will
 | return a bigger value than the former one. However, if the threshold
 | will be reached during the workload, the latter check may show lesser
 | value than the former one. In short, the test case would be unstable
 | after this change.

It can be checked using the snipped from the issue:

 | diff --git a/test/sql/gh-3199-no-mem-leaks.test.lua b/test/sql/gh-3199-no-mem-leaks.test.lua
 | index 41648d0fc..2f998e65f 100644
 | --- a/test/sql/gh-3199-no-mem-leaks.test.lua
 | +++ b/test/sql/gh-3199-no-mem-leaks.test.lua
 | @@ -9,6 +9,15 @@ fiber = require('fiber')
 |  
 |  -- box.cfg()
 |  
 | +box.once('init', function()        \
 | +    box.schema.space.create('s')   \
 | +    box.space.s:create_index('pk') \
 | +end)
 | +
 | +data = string.rep('x', 2048)                        \
 | +for i = 1, 1000 do                                  \
 | +    box.space.s:upsert({i, data}, {{'=', 2, data}}) \
 | +end
 |  
 |  box.execute('CREATE TABLE test (id INT PRIMARY KEY, x INTEGER, y INTEGER)')
 |  box.execute('INSERT INTO test VALUES (1, 1, 1), (2, 2, 2)')

After this change, values from
`fiber.info()[fiber.self().id()].memory.used` expressions from the test
will be the following:

32888
32888
32888
32888
26726

If we'll change the absolute values to a differences, then the last
value will not be zero anyway.

> 
> > Note: This works only if a test case reserves only one slab at the
> > moment: otherwise some memory may be hold after the case (and so a
> > memory check after a workload will fail). However it seems that our
> > cases are small enough to don't trigger this situation.
> > 
> > Call of region_free() would be enough, but we have no Lua API for it.
> > 
> > Fixes #4750.
> > ---
> > 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
  2020-01-29 20:27   ` Alexander Turenko
@ 2020-02-05 15:36     ` Nikita Pettik
  0 siblings, 0 replies; 7+ messages in thread
From: Nikita Pettik @ 2020-02-05 15:36 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: tarantool-patches

On 29 Jan 23:27, Alexander Turenko wrote:
> On Wed, Jan 29, 2020 at 09:46:30PM +0300, Nikita Pettik wrote:
> > On 29 Jan 20:03, Alexander Turenko wrote:
> > > The problem is that the first `<...>.memory.used` value may be non-zero.
> > > It depends of previous tests that were executed on this tarantool
> > > instance. It is resolved by restarting of a tarantool instance before
> > > such test cases to ensure that there are no 'garbage' slabs in a current
> > > fiber's region.
> > 
> > Hm, why not simply save value of ..memory.used before workload, value
> > after workload and compare them:
> > 
> > reg_sz_before = fiber.info()[fiber.self().id()].memory.used
> > ...
> > reg_sz_after = fiber.info()[fiber.self().id()].memory.used
> > 
> > assert(reg_sz_before == reg_sz_after);
> > 
> > So that make sure workload returns all occupied memory. This may fail
> > only in case allocated memory > 4kb, but examples in this particular
> > case definitely don't require so many memory (as you noted below).
> 
> I forgot to add the reason why this approach does not work to the commit
> message. Added the following paragraph:
> 
>  | The obvious way to solve it would be print differences between
>  | `<...>.memory.used` values before and after a workload instead of
>  | absolute values. This however does not work, because a first slab in a
>  | region can be almost used at the point where a test case starts and a
>  | next slab will be acquired from a slab_cache. This means that the

I checked these particular SQL queries - it's not the case you mentioned.
Slab (first one i.e. head of list) is empty at the start of query processing;
meanwhile query itself requires only a few bytes allocated on region.
Region memory (...memory.used) changes only after executing last query:

box.execute('SELECT x, y + 3 * b, b FROM test2, test WHERE b = x')

Here's brief region memory use log for this query:

region alloc size 0 //First region allocation of 0 bytes 
getting new slab
region alloc size 0 
current slab unused 4040 //56 bytes takes slab structure itself
current slab used 0 
current slab size 4096 
region alloc size 0 
... //same values
region free
region alloc size 1 
current slab unused 4040 
current slab used 0 
current slab size 4096 
region alloc size 1 
...
region alloc size 1 
current slab unused 4038 
current slab used 2 
current slab size 4096 
region alloc size 1 
...
region join 
region truncate 
cut size 0 //nothing to truncate 
current slab used 4 
slabs used 26730 //total region memory in use
region truncate
cut size 4 //cut size matches with memory in use 
slab used 4
removing current slab
// slab is empty and put back to cache
// at this point we observe slab rotation.
// But new slab (i.e. new head of list) can be not empty
// (that is slab.used != 0) since we reverted Georgy's patch which
// zeroed whole list of slabs.
// So that used memory in first slab has increased (which looks extremely
// contradictory). Also, at the end of execution we call fibre_gc() which
// will nullify slab->used memory consuption and ergo reduce whole
// ...fiber.memory.used consumption. That's why amount of memory in usage at
// the end of query execution does not match with initial value.
// As a workaround we can remove slab rotation in region_truncate().
// Moreover, it may even result in performance gain.
// So, instead of pushing this patch, let's consider another one fix
// for small library. I'll send a patch.

...
cut size 0 
slab used 2060 
slabs used 26726 
region alloc size 1 
current slab unused 1980 
current slab used 2060 
current slab size 4096 
region alloc size 1 
current slab unused 1979 
current slab used 2061 
current slab size 4096 
region alloc size 1 
current slab unused 1978 
current slab used 2062 
current slab size 4096 
region alloc size 1 
current slab unused 1977 
current slab used 2063 
current slab size 4096 
...
fiber gc 
region reset 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
  2020-01-29 17:03 [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection Alexander Turenko
  2020-01-29 18:46 ` Nikita Pettik
@ 2020-02-27 20:22 ` Kirill Yukhin
  2020-02-27 22:58   ` Nikita Pettik
  1 sibling, 1 reply; 7+ messages in thread
From: Kirill Yukhin @ 2020-02-27 20:22 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: tarantool-patches

Hello,

On 29 янв 20:03, Alexander Turenko wrote:
> After #4736 regression fix (in fact it just reverts the new logic in
> small) it is possible again that a fiber's region may hold a memory for
> a while, but release it eventually. When the used memory exceeds 128 KiB
> threshold, fiber_gc() puts 'garbage' slabs back to slab_cache and
> subtracts them from region_used() metric. But until this point those
> slabs are accounted in region_used() and so in fiber.info() metrics.
> 
> This commit fixes flakiness of test cases of the following kind:
> 
>  | fiber.info()[fiber.self().id()].memory.used -- should be zero
>  | <...workload...>
>  | fiber.info()[fiber.self().id()].memory.used -- should be zero
> 
> The problem is that the first `<...>.memory.used` value may be non-zero.
> It depends of previous tests that were executed on this tarantool
> instance. It is resolved by restarting of a tarantool instance before
> such test cases to ensure that there are no 'garbage' slabs in a current
> fiber's region.
> 
> Note: This works only if a test case reserves only one slab at the
> moment: otherwise some memory may be hold after the case (and so a
> memory check after a workload will fail). However it seems that our
> cases are small enough to don't trigger this situation.
> 
> Call of region_free() would be enough, but we have no Lua API for it.
> 
> Fixes #4750.

I've checked your patch into 2.2, 2.3 and master as temporary
workaround.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
  2020-02-27 20:22 ` Kirill Yukhin
@ 2020-02-27 22:58   ` Nikita Pettik
  2020-02-27 23:07     ` Kirill Yukhin
  0 siblings, 1 reply; 7+ messages in thread
From: Nikita Pettik @ 2020-02-27 22:58 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: tarantool-patches

On 27 Feb 23:22, Kirill Yukhin wrote:
> Hello,
> 
> > 
> > Fixes #4750.
> 
> I've checked your patch into 2.2, 2.3 and master as temporary
> workaround.

Patch got no lgtms btw. Just for the record.
 
> --
> Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
  2020-02-27 22:58   ` Nikita Pettik
@ 2020-02-27 23:07     ` Kirill Yukhin
  0 siblings, 0 replies; 7+ messages in thread
From: Kirill Yukhin @ 2020-02-27 23:07 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-patches

On 28 фев 01:58, Nikita Pettik wrote:
> On 27 Feb 23:22, Kirill Yukhin wrote:
> > Hello,
> > 
> > > 
> > > Fixes #4750.
> > 
> > I've checked your patch into 2.2, 2.3 and master as temporary
> > workaround.
> 
> Patch got no lgtms btw. Just for the record.

This is temporary, to stabilize testing.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-02-27 23:07 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-29 17:03 [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection Alexander Turenko
2020-01-29 18:46 ` Nikita Pettik
2020-01-29 20:27   ` Alexander Turenko
2020-02-05 15:36     ` Nikita Pettik
2020-02-27 20:22 ` Kirill Yukhin
2020-02-27 22:58   ` Nikita Pettik
2020-02-27 23:07     ` Kirill Yukhin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox