From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp63.i.mail.ru (smtp63.i.mail.ru [217.69.128.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 0AE3046970E for ; Sat, 8 Feb 2020 03:28:37 +0300 (MSK) References: <6c6d81d1-d9d2-5248-3edd-f56aefea9e76@tarantool.org> <20200207135859.GA51051@tarantool.org> From: Vladislav Shpilevoy Message-ID: <218075cd-d38d-1134-efab-6c713e34cef4@tarantool.org> Date: Sat, 8 Feb 2020 01:28:35 +0100 MIME-Version: 1.0 In-Reply-To: <20200207135859.GA51051@tarantool.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [Tarantool-patches] [PATCH] region: do not rotate slabs in case of single slab purification List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Nikita Pettik Cc: tarantool-patches@dev.tarantool.org Hi! Thanks for the explanation! On 07/02/2020 14:58, Nikita Pettik wrote: > On 06 Feb 21:48, Vladislav Shpilevoy wrote: >> Hi! Thanks for the patch! >> >> On 06/02/2020 18:14, Nikita Pettik wrote: >>> Imagine following situation at the moment of region truncation: >>> >>> 1 slab 2 slab/HEAD >>> x of s1 used y of s2 used >>> +---------+ +---------+ >>> ...->|xx| |-->|yyyyyy| |--> NULL >>> |xx| | |yyyyyy| | >>> +---------+ +---------+ >>> >>> And region is going to be truncated by size y (i.g. >>> region_truncate(®ion, region_used() - y)). After this operation >>> second slab will remain empty (slab.used == 0). Currently, this slab is >>> returned back to the slab cache, which leads to slab rotation. This may >>> not be useful in terms of performance, especially for requests which >>> don't abuse region memory (for instance, SQL queries) but do call >>> region_truncate() intensively. For example, during transfer tuples to >>> ephemeral table truncate() is called for each tuple. What is more, >>> truncate is also called on each returned tuple in sql_row_to_port() >>> (see runtime_tuple_new()). >> >> I understand the point about perf and slab oscillation, but >> don't see how is it related to the tests. > > See below. > >>> Also, taking into consideration revert of 67d7ab4 (see 81dda5b) this may >>> lead to memory usage fluctuations: region memory in use before query >>> execution and after may not be the same. >> >> Yeah, but your patch does not fix that. It is a bigger problem. >> Region does not free its slabs even at fiber_gc() if their total >> size is small enough. So how is this related to memory fluctuations? > > Let's look at particular example taken from sql/gh-3199-no-mem-leaks.test.lua: > > box.execute('SELECT x, y + 3 * b, b FROM test2, test WHERE b = x') > > Here's brief region memory use log for this query: > > region alloc size 0 //First region allocation of 0 bytes > getting new slab > region alloc size 0 > current slab unused 4040 //56 bytes takes slab structure itself > current slab used 0 > current slab size 4096 > region alloc size 0 > ... //same values > region free > region alloc size 1 > current slab unused 4040 > current slab used 0 > current slab size 4096 > region alloc size 1 > ... > region alloc size 1 > current slab unused 4038 > current slab used 2 > current slab size 4096 > region alloc size 1 > ... > region join > region truncate > cut size 0 //nothing to truncate > current slab used 4 > slabs used 26730 //total region memory in use, before and during execution > region truncate > cut size 4 //cut size matches with memory in use > slab used 4 > removing current slab > > Slab is purified and put back to the slab cache. At this point we observe > slab rotation. But new slab (i.e. head of list) can be initialized with > non-zero used memory since we reverted Georgy's patch (which zeroed whole > list of slabs). As a result, used memory in first slab has increased (which > looks extremely contradictory). It can be seen in further logs: > ... > cut size 0 > slab used 2060 //used memory has bumped from 4 to 2060 > slabs used 26726 //total memory in use hasn't changed tho > ... > region alloc size 1 > current slab unused 1978 > current slab used 2062 > current slab size 4096 > region alloc size 1 > current slab unused 1977 > current slab used 2063 > current slab size 4096 > fiber_gc > fiber reset //at this step total memory is 26730 - 2063 = 24667 > > At the end of execution fibre_gc() is called, which in turn will nullify > slab->used memory consuption and ergo reduce whole fiber.memory.used > consumption by 2063. That's why amount of memory in usage at the end of > query execution does not match with initial value. > > Now, if truncate didn't rotate slab, region memory before and after > query execution would be the same. So, it would make tests like this > more stable, since region leaks could be easily detected as difference > between initial (i.e. before query execution) and final > (i.e. after execution) occupied region memory. Aha, now I see. region_reset() does not nullify all the slabs. I missed that the revert is already pushed. Yes, then looks fine. Except that if a request uses more than 128Kb, then it will call region_free() in fiber_gc() and will reduce memory consumption anyway. But for small tests may help. The patch LGTM.