From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from smtpng2.m.smailru.net (smtpng2.m.smailru.net [94.100.179.3])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id AA6CE46970E
	for ; Wed, 5 Feb 2020 18:36:19 +0300 (MSK)
Date: Wed, 5 Feb 2020 18:36:12 +0300
From: Nikita Pettik
Message-ID: <20200205153609.GB78702@tarantool.org>
References: <20200129184630.GB16149@tarantool.org>
 <20200129202748.ermvumaerp746s2s@tkn_work_nb>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20200129202748.ermvumaerp746s2s@tkn_work_nb>
Subject: Re: [Tarantool-patches] [PATCH] test: stabilize flaky fiber memory leak detection
List-Id: Tarantool development patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Alexander Turenko
Cc: tarantool-patches@dev.tarantool.org

On 29 Jan 23:27, Alexander Turenko wrote:
> On Wed, Jan 29, 2020 at 09:46:30PM +0300, Nikita Pettik wrote:
> > On 29 Jan 20:03, Alexander Turenko wrote:
> > > The problem is that the first `<...>.memory.used` value may be non-zero.
> > > It depends on the previous tests that were executed on this tarantool
> > > instance. It is resolved by restarting the tarantool instance before
> > > such test cases to ensure that there are no 'garbage' slabs in the current
> > > fiber's region.
> >
> > Hm, why not simply save the value of <...>.memory.used before the workload
> > and the value after the workload, and compare them:
> >
> > reg_sz_before = fiber.info()[fiber.self().id()].memory.used
> > ...
> > reg_sz_after = fiber.info()[fiber.self().id()].memory.used
> >
> > assert(reg_sz_before == reg_sz_after);
> >
> > This makes sure the workload returns all the memory it occupied. This may
> > fail only if the allocated memory is > 4kb, but the examples in this particular
> > case definitely don't require that much memory (as you noted below).
>
> I forgot to add the reason why this approach does not work to the commit
> message. Added the following paragraph:
>
> | The obvious way to solve it would be print differences between
> | `<...>.memory.used` values before and after a workload instead of
> | absolute values. This however does not work, because a first slab in a
> | region can be almost used at the point where a test case starts and a
> | next slab will be acquired from a slab_cache. This means that the

I checked these particular SQL queries, and this is not the case you
mentioned. The slab (the first one, i.e. the head of the list) is empty at
the start of query processing, while the query itself requires only a few
bytes allocated on the region. Region memory (...memory.used) changes only
after executing the last query:

box.execute('SELECT x, y + 3 * b, b FROM test2, test WHERE b = x')

Here's a brief region memory usage log for this query:

region alloc size 0 // first region allocation, of 0 bytes
getting new slab
region alloc size 0
current slab unused 4040 // the slab structure itself takes 56 bytes
current slab used 0
current slab size 4096
region alloc size 0
... // same values
region free
region alloc size 1
current slab unused 4040
current slab used 0
current slab size 4096
region alloc size 1
...
region alloc size 1
current slab unused 4038
current slab used 2
current slab size 4096
region alloc size 1
...
region join
region truncate
cut size 0 // nothing to truncate
current slab used 4
slabs used 26730 // total region memory in use
region truncate
cut size 4 // cut size matches the memory in use in the current slab
slab used 4
removing current slab // the slab is empty and is put back to the cache
// At this point we observe slab rotation.
// But the new slab (i.e. the new head of the list) may be non-empty
// (that is, slab.used != 0), since we reverted Georgy's patch which
// zeroed the whole list of slabs.
// So the used memory in the first slab has increased (which looks extremely
// contradictory).
// Also, at the end of execution we call fiber_gc(), which
// will zero the head slab's slab->used and therefore reduce the whole
// ...fiber.memory.used value. That's why the amount of memory in use at
// the end of query execution does not match the initial value.
// As a workaround we can remove slab rotation in region_truncate().
// Moreover, it may even result in a performance gain.
// So, instead of pushing this patch, let's consider a different fix
// in the small library. I'll send a patch.
...
cut size 0
slab used 2060
slabs used 26726
region alloc size 1
current slab unused 1980
current slab used 2060
current slab size 4096
region alloc size 1
current slab unused 1979
current slab used 2061
current slab size 4096
region alloc size 1
current slab unused 1978
current slab used 2062
current slab size 4096
region alloc size 1
current slab unused 1977
current slab used 2063
current slab size 4096
...
fiber gc
region reset
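
To make the effect in the log easier to follow, here is a minimal toy
model in C. It is only a sketch under stated assumptions, not the small
library's code: every name (toy_region, toy_alloc, toy_truncate,
toy_reset, toy_run_workload, toy_make) is hypothetical, slab capacity is
ignored, and all slabs behind the head one are lumped into a single
counter. The numbers 2060 and 26726 are taken from the log; the `rotate`
flag switches between the rotating truncate described above and the
proposed workaround.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_MAX_SLABS = 8 };

/* Toy region: used[head] is the head slab, later entries are older slabs. */
struct toy_region {
	size_t used[TOY_MAX_SLABS];
	size_t head;  /* index of the head slab */
	size_t count; /* one past the last slab */
	int rotate;   /* non-zero: truncate drops a head slab it empties */
};

/* Sum of `used` over all slabs still in the region. */
static size_t
toy_total_used(const struct toy_region *r)
{
	size_t total = 0;
	for (size_t i = r->head; i < r->count; i++)
		total += r->used[i];
	return total;
}

/* Allocate from the head slab (capacity is not modelled). */
static void
toy_alloc(struct toy_region *r, size_t size)
{
	if (r->head == r->count) {
		/* No slabs left: take a fresh empty one. */
		assert(r->count < TOY_MAX_SLABS);
		r->used[r->count++] = 0;
	}
	r->used[r->head] += size;
}

/* Cut the region back to `target_used` bytes, starting from the head. */
static void
toy_truncate(struct toy_region *r, size_t target_used)
{
	size_t total = toy_total_used(r);
	assert(target_used <= total);
	size_t cut = total - target_used;
	while (r->head < r->count) {
		size_t *used = &r->used[r->head];
		/* Without rotation an exactly emptied head slab stays in place. */
		if (*used > cut || (!r->rotate && *used == cut)) {
			*used -= cut;
			cut = 0;
			break;
		}
		cut -= *used;
		*used = 0;
		r->head++; /* drop the head; the next slab becomes the head */
	}
	assert(cut == 0);
}

/* Model of the gc step: only the head slab's counter is zeroed. */
static void
toy_reset(struct toy_region *r)
{
	if (r->head < r->count)
		r->used[r->head] = 0;
}

/* Allocate a few bytes, truncate to the savepoint, allocate again, gc. */
static size_t
toy_run_workload(struct toy_region *r)
{
	size_t svp = toy_total_used(r);
	for (int i = 0; i < 4; i++)
		toy_alloc(r, 1);
	toy_truncate(r, svp);
	for (int i = 0; i < 3; i++)
		toy_alloc(r, 1);
	toy_reset(r);
	return toy_total_used(r);
}

/* Empty head slab, then a slab with 2060 bytes used, then the rest. */
static struct toy_region
toy_make(int rotate)
{
	struct toy_region r = {
		.used = {0, 2060, 24666},
		.head = 0,
		.count = 3,
		.rotate = rotate,
	};
	return r;
}
```

In this model, toy_run_workload() on toy_make(1) starts at 26726 bytes
and ends at 24666, because the reset zeroes the rotated-in slab that
already held 2060 bytes. On toy_make(0) it starts and ends at 26726, so
the before/after comparison would hold.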