From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <v.shpilevoy@tarantool.org>
Received: from smtp63.i.mail.ru (smtp63.i.mail.ru [217.69.128.43])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id 0AE3046970E
 for <tarantool-patches@dev.tarantool.org>;
 Sat,  8 Feb 2020 03:28:37 +0300 (MSK)
References: <d4db4ccd7c65911eb17b7d6816ed8f8c663b5940.1581008805.git.korablev@tarantool.org>
 <6c6d81d1-d9d2-5248-3edd-f56aefea9e76@tarantool.org>
 <20200207135859.GA51051@tarantool.org>
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Message-ID: <218075cd-d38d-1134-efab-6c713e34cef4@tarantool.org>
Date: Sat, 8 Feb 2020 01:28:35 +0100
MIME-Version: 1.0
In-Reply-To: <20200207135859.GA51051@tarantool.org>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [Tarantool-patches] [PATCH] region: do not rotate slabs in case
 of single slab purification
List-Id: Tarantool development patches <tarantool-patches.dev.tarantool.org>
List-Unsubscribe: <https://lists.tarantool.org/mailman/options/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=unsubscribe>
List-Archive: <https://lists.tarantool.org/pipermail/tarantool-patches/>
List-Post: <mailto:tarantool-patches@dev.tarantool.org>
List-Help: <mailto:tarantool-patches-request@dev.tarantool.org?subject=help>
List-Subscribe: <https://lists.tarantool.org/mailman/listinfo/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=subscribe>
To: Nikita Pettik <korablev@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org

Hi! Thanks for the explanation!

On 07/02/2020 14:58, Nikita Pettik wrote:
> On 06 Feb 21:48, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the patch!
>>
>> On 06/02/2020 18:14, Nikita Pettik wrote:
>>> Imagine following situation at the moment of region truncation:
>>>
>>>        1 slab      2 slab/HEAD
>>>      x of s1 used  y of s2 used
>>>      +---------+   +---------+
>>> ...->|xx|      |-->|yyyyyy|  |--> NULL
>>>      |xx|      |   |yyyyyy|  |
>>>      +---------+   +---------+
>>>
>>> And region is going to be truncated by size y (i.g.
>>> region_truncate(&region, region_used() - y)). After this operation
>>> second slab will remain empty (slab.used == 0). Currently, this slab is
>>> returned back to the slab cache, which leads to slab rotation. This may
>>> not be useful in terms of performance, especially for requests which
>>> don't abuse region memory (for instance, SQL queries) but do call
>>> region_truncate() intensively. For example, during transfer tuples to
>>> ephemeral table truncate() is called for each tuple. What is more,
>>> truncate is also called on each returned tuple in sql_row_to_port()
>>> (see runtime_tuple_new()).
>>
>> I understand the point about perf and slab oscillation, but
>> don't see how is it related to the tests.
> 
> See below.
> 
>>> Also, taking into consideration revert of 67d7ab4 (see 81dda5b) this may
>>> lead to memory usage fluctuations: region memory in use before query
>>> execution and after may not be the same.
>>
>> Yeah, but your patch does not fix that. It is a bigger problem.
>> Region does not free its slabs even at fiber_gc() if their total
>> size is small enough. So how is this related to memory fluctuations?
> 
> Let's look at particular example taken from sql/gh-3199-no-mem-leaks.test.lua:
> 
> box.execute('SELECT x, y + 3 * b, b FROM test2, test WHERE b = x')
> 
> Here's brief region memory use log for this query:
> 
> region alloc size 0 //First region allocation of 0 bytes
> getting new slab
> region alloc size 0
> current slab unused 4040 //56 bytes takes slab structure itself
> current slab used 0
> current slab size 4096
> region alloc size 0
> ... //same values
> region free
> region alloc size 1
> current slab unused 4040
> current slab used 0
> current slab size 4096
> region alloc size 1
> ...
> region alloc size 1
> current slab unused 4038
> current slab used 2
> current slab size 4096
> region alloc size 1
> ...
> region join
> region truncate
> cut size 0 //nothing to truncate
> current slab used 4
> slabs used 26730 //total region memory in use, before and during execution
> region truncate
> cut size 4 //cut size matches with memory in use
> slab used 4
> removing current slab
> 
> Slab is purified and put back to the slab cache. At this point we observe
> slab rotation. But new slab (i.e. head of list) can be initialized with
> non-zero used memory since we reverted Georgy's patch (which zeroed whole
> list of slabs). As a result, used memory in first slab has increased (which
> looks extremely  contradictory). It can be seen in further logs:
> ...
> cut size 0
> slab used 2060 //used memory has bumped from 4 to 2060
> slabs used 26726 //total memory in use hasn't changed tho
> ...
> region alloc size 1
> current slab unused 1978
> current slab used 2062
> current slab size 4096
> region alloc size 1
> current slab unused 1977
> current slab used 2063
> current slab size 4096
> fiber_gc
> fiber reset //at this step total memory is 26730 - 2063 = 24667
> 
> At the end of execution fibre_gc() is called, which in turn will nullify
> slab->used memory consuption and ergo reduce whole fiber.memory.used
> consumption by 2063. That's why amount of memory in usage at the end of
> query execution does not match with initial value.
> 
> Now, if truncate didn't rotate slab, region memory before and after
> query execution would be the same. So, it would make tests like this
> more stable, since region leaks could be easily detected as difference
> between initial (i.e. before query execution) and final
> (i.e. after execution) occupied region memory.

Aha, now I see. region_reset() does not nullify all the
slabs. I missed that the revert is already pushed. Yes, then
looks fine. Except that if a request uses more than 128Kb, then
it will call region_free() in fiber_gc() and will reduce memory
consumption anyway. But for small tests may help.

The patch LGTM.