[Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
Alexander Turenko
alexander.turenko at tarantool.org
Wed Jan 29 23:27:48 MSK 2020
On Wed, Jan 29, 2020 at 09:46:30PM +0300, Nikita Pettik wrote:
> On 29 Jan 20:03, Alexander Turenko wrote:
> > After #4736 regression fix (in fact it just reverts the new logic in
> > small) it is possible again that a fiber's region may hold a memory for
> > a while, but release it eventually. When the used memory exceeds 128 KiB
> > threshold, fiber_gc() puts 'garbage' slabs back to slab_cache and
> > subtracts them from region_used() metric. But until this point those
> > slabs are accounted in region_used() and so in fiber.info() metrics.
> >
> > This commit fixes flakiness of test cases of the following kind:
> >
> > | fiber.info()[fiber.self().id()].memory.used -- should be zero
> > | <...workload...>
> > | fiber.info()[fiber.self().id()].memory.used -- should be zero
> >
> > The problem is that the first `<...>.memory.used` value may be non-zero.
> > It depends of previous tests that were executed on this tarantool
> > instance. It is resolved by restarting of a tarantool instance before
> > such test cases to ensure that there are no 'garbage' slabs in a current
> > fiber's region.
>
> Hm, why not simply save value of ..memory.used before workload, value
> after workload and compare them:
>
> reg_sz_before = fiber.info()[fiber.self().id()].memory.used
> ...
> reg_sz_after = fiber.info()[fiber.self().id()].memory.used
>
> assert(reg_sz_before == reg_sz_after);
>
> So that make sure workload returns all occupied memory. This may fail
> only in case allocated memory > 4kb, but examples in this particular
> case definitely don't require so many memory (as you noted below).
I forgot to add the reason why this approach does not work to the commit
message. Added the following paragraph:
| The obvious way to solve it would be print differences between
| `<...>.memory.used` values before and after a workload instead of
| absolute values. This however does not work, because a first slab in a
| region can be almost used at the point where a test case starts and a
| next slab will be acquired from a slab_cache. This means that the
| previous slab will become a 'garbage' and will not be collected until
| 128 KiB threshold will exceed: the latter `<...>.memory.used` check will
| return a bigger value than the former one. However, if the threshold
| will be reached during the workload, the latter check may show lesser
| value than the former one. In short, the test case would be unstable
| after this change.
It can be checked using the snipped from the issue:
| diff --git a/test/sql/gh-3199-no-mem-leaks.test.lua b/test/sql/gh-3199-no-mem-leaks.test.lua
| index 41648d0fc..2f998e65f 100644
| --- a/test/sql/gh-3199-no-mem-leaks.test.lua
| +++ b/test/sql/gh-3199-no-mem-leaks.test.lua
| @@ -9,6 +9,15 @@ fiber = require('fiber')
|
| -- box.cfg()
|
| +box.once('init', function() \
| + box.schema.space.create('s') \
| + box.space.s:create_index('pk') \
| +end)
| +
| +data = string.rep('x', 2048) \
| +for i = 1, 1000 do \
| + box.space.s:upsert({i, data}, {{'=', 2, data}}) \
| +end
|
| box.execute('CREATE TABLE test (id INT PRIMARY KEY, x INTEGER, y INTEGER)')
| box.execute('INSERT INTO test VALUES (1, 1, 1), (2, 2, 2)')
After this change, values from
`fiber.info()[fiber.self().id()].memory.used` expressions from the test
will be the following:
32888
32888
32888
32888
26726
If we'll change the absolute values to a differences, then the last
value will not be zero anyway.
>
> > Note: This works only if a test case reserves only one slab at the
> > moment: otherwise some memory may be hold after the case (and so a
> > memory check after a workload will fail). However it seems that our
> > cases are small enough to don't trigger this situation.
> >
> > Call of region_free() would be enough, but we have no Lua API for it.
> >
> > Fixes #4750.
> > ---
> >
More information about the Tarantool-patches
mailing list