* [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
@ 2020-01-29 17:03 Alexander Turenko
2020-01-29 18:46 ` Nikita Pettik
2020-02-27 20:22 ` Kirill Yukhin
0 siblings, 2 replies; 7+ messages in thread
From: Alexander Turenko @ 2020-01-29 17:03 UTC (permalink / raw)
To: Nikita Pettik; +Cc: tarantool-patches
After #4736 regression fix (in fact it just reverts the new logic in
small) it is possible again that a fiber's region may hold a memory for
a while, but release it eventually. When the used memory exceeds 128 KiB
threshold, fiber_gc() puts 'garbage' slabs back to slab_cache and
subtracts them from region_used() metric. But until this point those
slabs are accounted in region_used() and so in fiber.info() metrics.
This commit fixes flakiness of test cases of the following kind:
| fiber.info()[fiber.self().id()].memory.used -- should be zero
| <...workload...>
| fiber.info()[fiber.self().id()].memory.used -- should be zero
The problem is that the first `<...>.memory.used` value may be non-zero.
It depends of previous tests that were executed on this tarantool
instance. It is resolved by restarting of a tarantool instance before
such test cases to ensure that there are no 'garbage' slabs in a current
fiber's region.
Note: This works only if a test case reserves only one slab at the
moment: otherwise some memory may be hold after the case (and so a
memory check after a workload will fail). However it seems that our
cases are small enough to don't trigger this situation.
Call of region_free() would be enough, but we have no Lua API for it.
Fixes #4750.
---
https://github.com/tarantool/tarantool/issues/4750
https://github.com/tarantool/tarantool/tree/Totktonada/gh-4750-stabilize-flaky-memleak-checks
test/engine/snapshot.result | 3 +++
test/engine/snapshot.test.lua | 4 ++++
test/sql/gh-3199-no-mem-leaks.result | 9 ++++++---
test/sql/gh-3199-no-mem-leaks.test.lua | 5 ++++-
4 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/test/engine/snapshot.result b/test/engine/snapshot.result
index 9b4824ba3..f4ac7287a 100644
--- a/test/engine/snapshot.result
+++ b/test/engine/snapshot.result
@@ -238,6 +238,9 @@ box.space.test.index.secondary:select{}
box.space.test:drop()
---
...
+-- Hard way to flush garbage slabs in the fiber's region. See
+-- gh-4750.
+test_run:cmd('restart server default')
-- Check that box.snapshot() doesn't leave garbage one the region.
-- https://github.com/tarantool/tarantool/issues/3732
fiber = require('fiber')
diff --git a/test/engine/snapshot.test.lua b/test/engine/snapshot.test.lua
index 13eecccc2..3f5a89e0b 100644
--- a/test/engine/snapshot.test.lua
+++ b/test/engine/snapshot.test.lua
@@ -42,6 +42,10 @@ box.space.test.index.primary:select{}
box.space.test.index.secondary:select{}
box.space.test:drop()
+-- Hard way to flush garbage slabs in the fiber's region. See
+-- gh-4750.
+test_run:cmd('restart server default')
+
-- Check that box.snapshot() doesn't leave garbage one the region.
-- https://github.com/tarantool/tarantool/issues/3732
fiber = require('fiber')
diff --git a/test/sql/gh-3199-no-mem-leaks.result b/test/sql/gh-3199-no-mem-leaks.result
index 35d8572b4..d8590779a 100644
--- a/test/sql/gh-3199-no-mem-leaks.result
+++ b/test/sql/gh-3199-no-mem-leaks.result
@@ -7,13 +7,16 @@ engine = test_run:get_cfg('engine')
_ = box.space._session_settings:update('sql_default_engine', {{'=', 2, engine}})
---
...
-fiber = require('fiber')
----
-...
-- This test checks that no leaks of region memory happens during
-- executing SQL queries.
--
-- box.cfg()
+-- Hard way to flush garbage slabs in the fiber's region. See
+-- gh-4750.
+test_run:cmd('restart server default')
+fiber = require('fiber')
+---
+...
box.execute('CREATE TABLE test (id INT PRIMARY KEY, x INTEGER, y INTEGER)')
---
- row_count: 1
diff --git a/test/sql/gh-3199-no-mem-leaks.test.lua b/test/sql/gh-3199-no-mem-leaks.test.lua
index 41648d0fc..d517c3373 100644
--- a/test/sql/gh-3199-no-mem-leaks.test.lua
+++ b/test/sql/gh-3199-no-mem-leaks.test.lua
@@ -1,7 +1,6 @@
test_run = require('test_run').new()
engine = test_run:get_cfg('engine')
_ = box.space._session_settings:update('sql_default_engine', {{'=', 2, engine}})
-fiber = require('fiber')
-- This test checks that no leaks of region memory happens during
-- executing SQL queries.
@@ -9,6 +8,10 @@ fiber = require('fiber')
-- box.cfg()
+-- Hard way to flush garbage slabs in the fiber's region. See
+-- gh-4750.
+test_run:cmd('restart server default')
+fiber = require('fiber')
box.execute('CREATE TABLE test (id INT PRIMARY KEY, x INTEGER, y INTEGER)')
box.execute('INSERT INTO test VALUES (1, 1, 1), (2, 2, 2)')
--
2.22.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
2020-01-29 17:03 [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection Alexander Turenko
@ 2020-01-29 18:46 ` Nikita Pettik
2020-01-29 20:27 ` Alexander Turenko
2020-02-27 20:22 ` Kirill Yukhin
1 sibling, 1 reply; 7+ messages in thread
From: Nikita Pettik @ 2020-01-29 18:46 UTC (permalink / raw)
To: Alexander Turenko; +Cc: tarantool-patches
On 29 Jan 20:03, Alexander Turenko wrote:
> After #4736 regression fix (in fact it just reverts the new logic in
> small) it is possible again that a fiber's region may hold a memory for
> a while, but release it eventually. When the used memory exceeds 128 KiB
> threshold, fiber_gc() puts 'garbage' slabs back to slab_cache and
> subtracts them from region_used() metric. But until this point those
> slabs are accounted in region_used() and so in fiber.info() metrics.
>
> This commit fixes flakiness of test cases of the following kind:
>
> | fiber.info()[fiber.self().id()].memory.used -- should be zero
> | <...workload...>
> | fiber.info()[fiber.self().id()].memory.used -- should be zero
>
> The problem is that the first `<...>.memory.used` value may be non-zero.
> It depends of previous tests that were executed on this tarantool
> instance. It is resolved by restarting of a tarantool instance before
> such test cases to ensure that there are no 'garbage' slabs in a current
> fiber's region.
Hm, why not simply save value of ..memory.used before workload, value
after workload and compare them:
reg_sz_before = fiber.info()[fiber.self().id()].memory.used
...
reg_sz_after = fiber.info()[fiber.self().id()].memory.used
assert(reg_sz_before == reg_sz_after);
So that make sure workload returns all occupied memory. This may fail
only in case allocated memory > 4kb, but examples in this particular
case definitely don't require so many memory (as you noted below).
> Note: This works only if a test case reserves only one slab at the
> moment: otherwise some memory may be hold after the case (and so a
> memory check after a workload will fail). However it seems that our
> cases are small enough to don't trigger this situation.
>
> Call of region_free() would be enough, but we have no Lua API for it.
>
> Fixes #4750.
> ---
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
2020-01-29 18:46 ` Nikita Pettik
@ 2020-01-29 20:27 ` Alexander Turenko
2020-02-05 15:36 ` Nikita Pettik
0 siblings, 1 reply; 7+ messages in thread
From: Alexander Turenko @ 2020-01-29 20:27 UTC (permalink / raw)
To: Nikita Pettik; +Cc: tarantool-patches
On Wed, Jan 29, 2020 at 09:46:30PM +0300, Nikita Pettik wrote:
> On 29 Jan 20:03, Alexander Turenko wrote:
> > After #4736 regression fix (in fact it just reverts the new logic in
> > small) it is possible again that a fiber's region may hold a memory for
> > a while, but release it eventually. When the used memory exceeds 128 KiB
> > threshold, fiber_gc() puts 'garbage' slabs back to slab_cache and
> > subtracts them from region_used() metric. But until this point those
> > slabs are accounted in region_used() and so in fiber.info() metrics.
> >
> > This commit fixes flakiness of test cases of the following kind:
> >
> > | fiber.info()[fiber.self().id()].memory.used -- should be zero
> > | <...workload...>
> > | fiber.info()[fiber.self().id()].memory.used -- should be zero
> >
> > The problem is that the first `<...>.memory.used` value may be non-zero.
> > It depends of previous tests that were executed on this tarantool
> > instance. It is resolved by restarting of a tarantool instance before
> > such test cases to ensure that there are no 'garbage' slabs in a current
> > fiber's region.
>
> Hm, why not simply save value of ..memory.used before workload, value
> after workload and compare them:
>
> reg_sz_before = fiber.info()[fiber.self().id()].memory.used
> ...
> reg_sz_after = fiber.info()[fiber.self().id()].memory.used
>
> assert(reg_sz_before == reg_sz_after);
>
> So that make sure workload returns all occupied memory. This may fail
> only in case allocated memory > 4kb, but examples in this particular
> case definitely don't require so many memory (as you noted below).
I forgot to add the reason why this approach does not work to the commit
message. Added the following paragraph:
| The obvious way to solve it would be print differences between
| `<...>.memory.used` values before and after a workload instead of
| absolute values. This however does not work, because a first slab in a
| region can be almost used at the point where a test case starts and a
| next slab will be acquired from a slab_cache. This means that the
| previous slab will become a 'garbage' and will not be collected until
| 128 KiB threshold will exceed: the latter `<...>.memory.used` check will
| return a bigger value than the former one. However, if the threshold
| will be reached during the workload, the latter check may show lesser
| value than the former one. In short, the test case would be unstable
| after this change.
It can be checked using the snipped from the issue:
| diff --git a/test/sql/gh-3199-no-mem-leaks.test.lua b/test/sql/gh-3199-no-mem-leaks.test.lua
| index 41648d0fc..2f998e65f 100644
| --- a/test/sql/gh-3199-no-mem-leaks.test.lua
| +++ b/test/sql/gh-3199-no-mem-leaks.test.lua
| @@ -9,6 +9,15 @@ fiber = require('fiber')
|
| -- box.cfg()
|
| +box.once('init', function() \
| + box.schema.space.create('s') \
| + box.space.s:create_index('pk') \
| +end)
| +
| +data = string.rep('x', 2048) \
| +for i = 1, 1000 do \
| + box.space.s:upsert({i, data}, {{'=', 2, data}}) \
| +end
|
| box.execute('CREATE TABLE test (id INT PRIMARY KEY, x INTEGER, y INTEGER)')
| box.execute('INSERT INTO test VALUES (1, 1, 1), (2, 2, 2)')
After this change, values from
`fiber.info()[fiber.self().id()].memory.used` expressions from the test
will be the following:
32888
32888
32888
32888
26726
If we'll change the absolute values to a differences, then the last
value will not be zero anyway.
>
> > Note: This works only if a test case reserves only one slab at the
> > moment: otherwise some memory may be hold after the case (and so a
> > memory check after a workload will fail). However it seems that our
> > cases are small enough to don't trigger this situation.
> >
> > Call of region_free() would be enough, but we have no Lua API for it.
> >
> > Fixes #4750.
> > ---
> >
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
2020-01-29 20:27 ` Alexander Turenko
@ 2020-02-05 15:36 ` Nikita Pettik
0 siblings, 0 replies; 7+ messages in thread
From: Nikita Pettik @ 2020-02-05 15:36 UTC (permalink / raw)
To: Alexander Turenko; +Cc: tarantool-patches
On 29 Jan 23:27, Alexander Turenko wrote:
> On Wed, Jan 29, 2020 at 09:46:30PM +0300, Nikita Pettik wrote:
> > On 29 Jan 20:03, Alexander Turenko wrote:
> > > The problem is that the first `<...>.memory.used` value may be non-zero.
> > > It depends of previous tests that were executed on this tarantool
> > > instance. It is resolved by restarting of a tarantool instance before
> > > such test cases to ensure that there are no 'garbage' slabs in a current
> > > fiber's region.
> >
> > Hm, why not simply save value of ..memory.used before workload, value
> > after workload and compare them:
> >
> > reg_sz_before = fiber.info()[fiber.self().id()].memory.used
> > ...
> > reg_sz_after = fiber.info()[fiber.self().id()].memory.used
> >
> > assert(reg_sz_before == reg_sz_after);
> >
> > So that make sure workload returns all occupied memory. This may fail
> > only in case allocated memory > 4kb, but examples in this particular
> > case definitely don't require so many memory (as you noted below).
>
> I forgot to add the reason why this approach does not work to the commit
> message. Added the following paragraph:
>
> | The obvious way to solve it would be print differences between
> | `<...>.memory.used` values before and after a workload instead of
> | absolute values. This however does not work, because a first slab in a
> | region can be almost used at the point where a test case starts and a
> | next slab will be acquired from a slab_cache. This means that the
I checked these particular SQL queries - it's not the case you mentioned.
Slab (first one i.e. head of list) is empty at the start of query processing;
meanwhile query itself requires only a few bytes allocated on region.
Region memory (...memory.used) changes only after executing last query:
box.execute('SELECT x, y + 3 * b, b FROM test2, test WHERE b = x')
Here's brief region memory use log for this query:
region alloc size 0 //First region allocation of 0 bytes
getting new slab
region alloc size 0
current slab unused 4040 //56 bytes takes slab structure itself
current slab used 0
current slab size 4096
region alloc size 0
... //same values
region free
region alloc size 1
current slab unused 4040
current slab used 0
current slab size 4096
region alloc size 1
...
region alloc size 1
current slab unused 4038
current slab used 2
current slab size 4096
region alloc size 1
...
region join
region truncate
cut size 0 //nothing to truncate
current slab used 4
slabs used 26730 //total region memory in use
region truncate
cut size 4 //cut size matches with memory in use
slab used 4
removing current slab
// slab is empty and put back to cache
// at this point we observe slab rotation.
// But new slab (i.e. new head of list) can be not empty
// (that is slab.used != 0) since we reverted Georgy's patch which
// zeroed whole list of slabs.
// So that used memory in first slab has increased (which looks extremely
// contradictory). Also, at the end of execution we call fibre_gc() which
// will nullify slab->used memory consuption and ergo reduce whole
// ...fiber.memory.used consumption. That's why amount of memory in usage at
// the end of query execution does not match with initial value.
// As a workaround we can remove slab rotation in region_truncate().
// Moreover, it may even result in performance gain.
// So, instead of pushing this patch, let's consider another one fix
// for small library. I'll send a patch.
...
cut size 0
slab used 2060
slabs used 26726
region alloc size 1
current slab unused 1980
current slab used 2060
current slab size 4096
region alloc size 1
current slab unused 1979
current slab used 2061
current slab size 4096
region alloc size 1
current slab unused 1978
current slab used 2062
current slab size 4096
region alloc size 1
current slab unused 1977
current slab used 2063
current slab size 4096
...
fiber gc
region reset
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
2020-01-29 17:03 [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection Alexander Turenko
2020-01-29 18:46 ` Nikita Pettik
@ 2020-02-27 20:22 ` Kirill Yukhin
2020-02-27 22:58 ` Nikita Pettik
1 sibling, 1 reply; 7+ messages in thread
From: Kirill Yukhin @ 2020-02-27 20:22 UTC (permalink / raw)
To: Alexander Turenko; +Cc: tarantool-patches
Hello,
On 29 янв 20:03, Alexander Turenko wrote:
> After #4736 regression fix (in fact it just reverts the new logic in
> small) it is possible again that a fiber's region may hold a memory for
> a while, but release it eventually. When the used memory exceeds 128 KiB
> threshold, fiber_gc() puts 'garbage' slabs back to slab_cache and
> subtracts them from region_used() metric. But until this point those
> slabs are accounted in region_used() and so in fiber.info() metrics.
>
> This commit fixes flakiness of test cases of the following kind:
>
> | fiber.info()[fiber.self().id()].memory.used -- should be zero
> | <...workload...>
> | fiber.info()[fiber.self().id()].memory.used -- should be zero
>
> The problem is that the first `<...>.memory.used` value may be non-zero.
> It depends of previous tests that were executed on this tarantool
> instance. It is resolved by restarting of a tarantool instance before
> such test cases to ensure that there are no 'garbage' slabs in a current
> fiber's region.
>
> Note: This works only if a test case reserves only one slab at the
> moment: otherwise some memory may be hold after the case (and so a
> memory check after a workload will fail). However it seems that our
> cases are small enough to don't trigger this situation.
>
> Call of region_free() would be enough, but we have no Lua API for it.
>
> Fixes #4750.
I've checked your patch into 2.2, 2.3 and master as temporary
workaround.
--
Regards, Kirill Yukhin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
2020-02-27 20:22 ` Kirill Yukhin
@ 2020-02-27 22:58 ` Nikita Pettik
2020-02-27 23:07 ` Kirill Yukhin
0 siblings, 1 reply; 7+ messages in thread
From: Nikita Pettik @ 2020-02-27 22:58 UTC (permalink / raw)
To: Kirill Yukhin; +Cc: tarantool-patches
On 27 Feb 23:22, Kirill Yukhin wrote:
> Hello,
>
> >
> > Fixes #4750.
>
> I've checked your patch into 2.2, 2.3 and master as temporary
> workaround.
Patch got no lgtms btw. Just for the record.
> --
> Regards, Kirill Yukhin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
2020-02-27 22:58 ` Nikita Pettik
@ 2020-02-27 23:07 ` Kirill Yukhin
0 siblings, 0 replies; 7+ messages in thread
From: Kirill Yukhin @ 2020-02-27 23:07 UTC (permalink / raw)
To: Nikita Pettik; +Cc: tarantool-patches
On 28 фев 01:58, Nikita Pettik wrote:
> On 27 Feb 23:22, Kirill Yukhin wrote:
> > Hello,
> >
> > >
> > > Fixes #4750.
> >
> > I've checked your patch into 2.2, 2.3 and master as temporary
> > workaround.
>
> Patch got no lgtms btw. Just for the record.
This is temporary, to stabilize testing.
--
Regards, Kirill Yukhin
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-02-27 23:07 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-29 17:03 [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection Alexander Turenko
2020-01-29 18:46 ` Nikita Pettik
2020-01-29 20:27 ` Alexander Turenko
2020-02-05 15:36 ` Nikita Pettik
2020-02-27 20:22 ` Kirill Yukhin
2020-02-27 22:58 ` Nikita Pettik
2020-02-27 23:07 ` Kirill Yukhin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox