[Tarantool-patches] [PATCH] region: do not rotate slabs in case of single slab purification

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Sat Feb 8 03:28:35 MSK 2020


Hi! Thanks for the explanation!

On 07/02/2020 14:58, Nikita Pettik wrote:
> On 06 Feb 21:48, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the patch!
>>
>> On 06/02/2020 18:14, Nikita Pettik wrote:
>>> Imagine following situation at the moment of region truncation:
>>>
>>>        1 slab      2 slab/HEAD
>>>      x of s1 used  y of s2 used
>>>      +---------+   +---------+
>>> ...->|xx|      |-->|yyyyyy|  |--> NULL
>>>      |xx|      |   |yyyyyy|  |
>>>      +---------+   +---------+
>>>
>>> And region is going to be truncated by size y (i.g.
>>> region_truncate(&region, region_used() - y)). After this operation
>>> second slab will remain empty (slab.used == 0). Currently, this slab is
>>> returned back to the slab cache, which leads to slab rotation. This may
>>> not be useful in terms of performance, especially for requests which
>>> don't abuse region memory (for instance, SQL queries) but do call
>>> region_truncate() intensively. For example, during transfer tuples to
>>> ephemeral table truncate() is called for each tuple. What is more,
>>> truncate is also called on each returned tuple in sql_row_to_port()
>>> (see runtime_tuple_new()).
>>
>> I understand the point about perf and slab oscillation, but
>> don't see how is it related to the tests.
> 
> See below.
> 
>>> Also, taking into consideration revert of 67d7ab4 (see 81dda5b) this may
>>> lead to memory usage fluctuations: region memory in use before query
>>> execution and after may not be the same.
>>
>> Yeah, but your patch does not fix that. It is a bigger problem.
>> Region does not free its slabs even at fiber_gc() if their total
>> size is small enough. So how is this related to memory fluctuations?
> 
> Let's look at particular example taken from sql/gh-3199-no-mem-leaks.test.lua:
> 
> box.execute('SELECT x, y + 3 * b, b FROM test2, test WHERE b = x')
> 
> Here's brief region memory use log for this query:
> 
> region alloc size 0 //First region allocation of 0 bytes
> getting new slab
> region alloc size 0
> current slab unused 4040 //56 bytes takes slab structure itself
> current slab used 0
> current slab size 4096
> region alloc size 0
> ... //same values
> region free
> region alloc size 1
> current slab unused 4040
> current slab used 0
> current slab size 4096
> region alloc size 1
> ...
> region alloc size 1
> current slab unused 4038
> current slab used 2
> current slab size 4096
> region alloc size 1
> ...
> region join
> region truncate
> cut size 0 //nothing to truncate
> current slab used 4
> slabs used 26730 //total region memory in use, before and during execution
> region truncate
> cut size 4 //cut size matches with memory in use
> slab used 4
> removing current slab
> 
> Slab is purified and put back to the slab cache. At this point we observe
> slab rotation. But new slab (i.e. head of list) can be initialized with
> non-zero used memory since we reverted Georgy's patch (which zeroed whole
> list of slabs). As a result, used memory in first slab has increased (which
> looks extremely  contradictory). It can be seen in further logs:
> ...
> cut size 0
> slab used 2060 //used memory has bumped from 4 to 2060
> slabs used 26726 //total memory in use hasn't changed tho
> ...
> region alloc size 1
> current slab unused 1978
> current slab used 2062
> current slab size 4096
> region alloc size 1
> current slab unused 1977
> current slab used 2063
> current slab size 4096
> fiber_gc
> fiber reset //at this step total memory is 26730 - 2063 = 24667
> 
> At the end of execution fibre_gc() is called, which in turn will nullify
> slab->used memory consuption and ergo reduce whole fiber.memory.used
> consumption by 2063. That's why amount of memory in usage at the end of
> query execution does not match with initial value.
> 
> Now, if truncate didn't rotate slab, region memory before and after
> query execution would be the same. So, it would make tests like this
> more stable, since region leaks could be easily detected as difference
> between initial (i.e. before query execution) and final
> (i.e. after execution) occupied region memory.

Aha, now I see. region_reset() does not nullify all the
slabs. I missed that the revert is already pushed. Yes, then
looks fine. Except that if a request uses more than 128Kb, then
it will call region_free() in fiber_gc() and will reduce memory
consumption anyway. But for small tests may help.

The patch LGTM.


More information about the Tarantool-patches mailing list