From: Sergey Kaplun via Tarantool-discussions
Reply-To: Sergey Kaplun
Date: Wed, 20 Jan 2021 11:19:57 +0300
To: Igor Munkin
Subject: Re: [Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler
Message-ID: <20210120081957.GA3034@root>
In-Reply-To: <20210115131424.GA5460@tarantool.org>
References: <20201225113431.9538-1-skaplun@tarantool.org> <20210115131424.GA5460@tarantool.org>
List-Id: Tarantool development process
Cc: tarantool-discussions@dev.tarantool.org

Hi, Igor!

Thanks for the review!

On 15.01.21, Igor Munkin wrote:
> Sergey,
>
> Thanks for the changes. There is a bit of nitpicking below and I
> believe we'll push the next version doc to the trunk.

I've fixed all your comments, plus added some insignificant fixes.
See two iterative patches below. Branch is force pushed.

>
> On 25.12.20, Sergey Kaplun wrote:
> > Part of #5442
> > ---
> >
> > RFC on branch: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md

Side note: branch name is updated. New RFC version:
https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md

> >
> > Changes in v3:
> > * More comments in example.
> > * More verbose benchmark information.
> > * Grammar and spelling fixes.
> >
> > Changes in v2:
> > * Removed C API, Tarantool integration and description of additional
> >   features -- they will be added in another RFC if necessary.
> > * Removed checking profile is running from the public API.
> > * Added benchmarks and more meaningful example.
> > * Grammar fixes.
> >
> >  doc/rfc/5442-luajit-memory-profiler.md | 314 +++++++++++++++++++++++++
> >  1 file changed, 314 insertions(+)
> >  create mode 100644 doc/rfc/5442-luajit-memory-profiler.md
> >
> > diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
> > new file mode 100644
> > index 000000000..85a61462a
> > --- /dev/null
> > +++ b/doc/rfc/5442-luajit-memory-profiler.md
> > @@ -0,0 +1,314 @@
>
> > +### Prerequisites
> > +
> > +This section describes additional changes in LuaJIT required for the feature
> > +implementation. This version of LuaJIT memory profiler does not support verbose
> > +reporting allocations from traces. All allocation from traces are reported as
>
> Typo: s/reporting allocations from/reporting for allocations made on/.

Fixed, thanks!

>
> > +internal. But trace code semantics should be totally the same as for the Lua
> > +interpreter (excluding sink optimizations). Also all deallocations reported as
>
> Typo: s/deallocations reported/deallocation are reported/.

Fixed, thanks!

>
> > +internal too.
> > +
> > +There are two different representations of functions in LuaJIT: the function's
> > +prototype (`GCproto`) and the function object so called closure (`GCfunc`).
> > +The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
> > +correspondingly. Also LuaJIT has a special function's type aka Fast Function.
>
> Typo: s/correspondingly/respectively/.
>
> > +It is used for LuaJIT builtins.
>
> It's better to not split this sentence. Consider the rewording:
> | Besides LuaJIT has a special function type a.k.a. Fast Function that
> | is used for LuaJIT builtins.

Applied! Thanks!

>
> > +
>
> > +Usually developers are not interested in information about allocations inside
> > +builtins. So if fast function was called from a Lua function all
> > +allocations are attributed to this Lua function. Otherwise attribute this event
> > +to a C function.
>
> I propose the following rewording:
> | Lua developers can do nothing with allocations made inside the
> | builtins except reducing its usage. So if fast function is called from
> | a Lua function all allocations made in its scope are attributed to this
> | Lua function (i.e. the builtin caller). Otherwise this event is
> | attributed to a C function.
>

Applied, thanks!

> > +
>
> > +If one run the chunk above the profiler reports approximately the following
>
> Typo: s/run/runs/.

Fixed.

>
> > +(see legend [here](#reading-and-displaying-saved-data)):
>
> > +So we need to know a type of function being executed by the virtual machine
> > +(VM). Currently VM state identifies C function execution only, so Fast and Lua
> > +functions states will be added.
>
> Typo: s/will be/are/.

Sure, thanks!

>
> > +
> > +To determine currently allocating coroutine (that may not be equal to currently
> > +executed one) a new field called `mem_L` is added to `global_State` structure
> > +to keep the coroutine address. This field is set at each reallocation to
>
> Typo: /at each reallocation to/on each reallocation to the/.

Fixed.

>
> > +corresponding `L` with which it was called.
>
> Typo: s/it was/it is/.

Thanks, fixed!

>
> > +
>
> > +When the profiling is stopped the `fclose()` is called. If it is impossible to
>
> Typo: s/the `fclose()`/`fclose()`/.

Fixed.

>
> > +open a file for writing or profiler fails to start, returns `nil` on failure
>
> Typo: s/returns `nil`/`nil` is returned/.

Fixed.

>
> > +(plus an error message as a second result and a system-dependent error code as
> > +a third result). Otherwise returns some true value.
>
> It would be nice to mention that the function contract is similar to
> other standart io.* interfaces.
>
> I glanced the source code: it's not "some" true value; it is exactly the
> *true* value.

All right! Fixed.

>
> > +
>
> > +Memory profiler is expected to be thread safe, so it has a corresponding
> > +lock/unlock at internal mutex whenever you call corresponding memprof
> > +functions. If you want to build LuaJIT without thread safety use
> > +`-DLUAJIT_DISABLE_THREAD_SAFE`.
>
> This is not implemented in scope of the MVP, so drop this part.

Done.

>
> > +
> > +### Reading and displaying saved data
> > +
> > +Binary data can be read by `lj-parse-memprof` utility. It parses the binary
>
> Typo: s/lj-parse-memprof/luajit-parse-memprof/.

Fixed, thanks!

>
> > +format provided by memory profiler and render it on human-readable format.
>
> Typo: s/it on/it to/.

Fixed, thanks!

>
> > +
>
> > +This table shows performance deviation in relation to REFerence value (before
> > +commit) with stopped and running profiler. The table shows the average value
> > +for 11 runs. The first field of the column indicates the change in the average
> > +time in seconds (less is better). The second field is the standard deviation
> > +for the found difference.
> > +
> > +```
> > +  Name          | REF  | AFTER, memprof off | AFTER, memprof on
> > +----------------+------+--------------------+------------------
> > +array3d         | 0.21 |    +0.00 (0.01)    |    +0.00 (0.01)
> > +binary-trees    | 3.25 |    -0.01 (0.06)    |    +0.53 (0.10)
> > +chameneos       | 2.97 |    +0.14 (0.04)    |    +0.13 (0.06)
> > +coroutine-ring  | 1.00 |    +0.01 (0.04)    |    +0.01 (0.04)
> > +euler14-bit     | 1.03 |    +0.01 (0.02)    |    +0.00 (0.02)
> > +fannkuch        | 6.81 |    -0.21 (0.06)    |    -0.20 (0.06)
> > +fasta           | 8.20 |    -0.07 (0.05)    |    -0.08 (0.03)
>
> Side note: Still curious how this can happen. It looks OK when this is
> negative difference in within its deviation. But this is sorta magic.

Yes, me too. Unfortunately, we have neither any benchmark tests nor
performance analysis for LuaJIT for now.

>
> > +life            | 0.46 |    +0.00 (0.01)    |    +0.35 (0.01)
> > +mandelbrot      | 2.65 |    +0.00 (0.01)    |    +0.01 (0.01)
> > +mandelbrot-bit  | 1.97 |    +0.00 (0.01)    |    +0.01 (0.02)
> > +md5             | 1.58 |    -0.01 (0.04)    |    -0.04 (0.04)
> > +nbody           | 1.34 |    +0.00 (0.01)    |    -0.02 (0.01)
> > +nsieve          | 2.07 |    -0.03 (0.03)    |    -0.01 (0.04)
> > +nsieve-bit      | 1.50 |    -0.02 (0.04)    |    +0.00 (0.04)
> > +nsieve-bit-fp   | 4.44 |    -0.03 (0.07)    |    -0.01 (0.07)
> > +partialsums     | 0.54 |    +0.00 (0.01)    |    +0.00 (0.01)
> > +pidigits-nogmp  | 3.47 |    -0.01 (0.02)    |    -0.10 (0.02)
> > +ray             | 1.62 |    -0.02 (0.03)    |    +0.00 (0.02)
> > +recursive-ack   | 0.20 |    +0.00 (0.01)    |    +0.00 (0.01)
> > +recursive-fib   | 1.63 |    +0.00 (0.01)    |    +0.01 (0.02)
> > +scimark-fft     | 5.72 |    +0.06 (0.09)    |    -0.01 (0.10)
> > +scimark-lu      | 3.47 |    +0.02 (0.27)    |    -0.03 (0.26)
> > +scimark-sor     | 2.34 |    +0.00 (0.01)    |    -0.01 (0.01)
> > +scimark-sparse  | 4.95 |    -0.02 (0.04)    |    -0.02 (0.04)
> > +series          | 0.95 |    +0.00 (0.02)    |    +0.00 (0.01)
> > +spectral-norm   | 0.96 |    +0.00 (0.02)    |    -0.01 (0.02)
> > +```
> > --
> > 2.28.0
> >
>
> --
> Best regards,
> IM

===================================================================
diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
index 85a61462a..2721f1cc1 100644
--- a/doc/rfc/5442-luajit-memory-profiler.md
+++ b/doc/rfc/5442-luajit-memory-profiler.md
@@ -30,39 +30,39 @@ The whole toolchain of memory profiling will be divided into several parts:
 
 This section describes additional changes in LuaJIT required for the feature
 implementation. This version of LuaJIT memory profiler does not support verbose
-reporting allocations from traces. All allocation from traces are reported as
-internal. But trace code semantics should be totally the same as for the Lua
-interpreter (excluding sink optimizations). Also all deallocations reported as
-internal too.
+reporting for allocations made on traces. All allocation from traces are
+reported as internal. But trace code semantics should be totally the same as
+for the Lua interpreter (excluding sink optimizations). Also all, deallocations
+are reported as internal too.
 
 There are two different representations of functions in LuaJIT: the function's
 prototype (`GCproto`) and the function object so called closure (`GCfunc`).
 The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
-correspondingly. Also LuaJIT has a special function's type aka Fast Function.
-It is used for LuaJIT builtins.
+respectively. Besides LuaJIT has a special function type, a.k.a. Fast Function
+that is used for LuaJIT built-ins
 
 Tail call optimization does not create a new call frame, so all allocations
 inside the function called via `CALLT`/`CALLMT` are attributed to its caller.
 
-Usually developers are not interested in information about allocations inside
-builtins. So if fast function was called from a Lua function all
-allocations are attributed to this Lua function. Otherwise attribute this event
-to a C function.
+Lua developers can do nothing with allocations made inside the built-ins except
+reducing its usage. So if fast function is called from a Lua function all
+allocations made in its scope are attributed to this Lua function (i.e. the
+built-in caller). Otherwise, this event is attributed to a C function.
 
 Assume we have the following Lua chunk named :
-```
+```lua
 1 jit.off()
 2 misc.memprof.start("memprof_new.bin")
-3 -- Lua does not create a new frame to call string.rep and all allocations are
-4 -- attributed not to `append()` function but to the parent scope.
+3 -- Lua does not create a new frame to call string.rep() and all allocations
+4 -- are attributed not to append() function but to the parent scope.
 5 local function append(str, rep)
 6   return string.rep(str, rep)
 7 end
 8
 9 local t = {}
 10 for _ = 1, 1e5 do
-11   -- table.insert is a builtin and all corresponding allocations
+11   -- table.insert() is a built-in and all corresponding allocations
 12   -- are reported in the scope of main chunk
 13   table.insert(t,
 14     append('q', _)
@@ -71,7 +71,7 @@ Assume we have the following Lua chunk named :
 17 misc.memprof.stop()
 ```
 
-If one run the chunk above the profiler reports approximately the following
+If one runs the chunk above the profiler reports approximately the following
 (see legend [here](#reading-and-displaying-saved-data)):
 ```
 ALLOCATIONS
@@ -99,15 +99,15 @@ INTERNAL: 20 0 1481
 
 So we need to know a type of function being executed by the virtual machine
 (VM). Currently VM state identifies C function execution only, so Fast and Lua
-functions states will be added.
+functions states are added.
 
 To determine currently allocating coroutine (that may not be equal to currently
 executed one) a new field called `mem_L` is added to `global_State` structure
-to keep the coroutine address. This field is set at each reallocation to
-corresponding `L` with which it was called.
+to keep the coroutine address. This field is set on each reallocation to the
+corresponding `L` with which it is called.
 
 There is a static function (`lj_debug_getframeline`) that returns line number
-for current `BCPos` in `lj_debug.c` already. It will be added to the debug
+for current `BCPos` in `lj_debug.c` already. It is added to the debug
 module API to be used in memory profiler.
 
 ### Information recording
@@ -211,10 +211,11 @@ local started, err, errno = misc.memprof.start(fname)
 ```
 where `fname` is name of the file where profile events are written. Writer for
 this function perform `fwrite()` for each call retrying in case of `EINTR`.
-When the profiling is stopped the `fclose()` is called. If it is impossible to
-open a file for writing or profiler fails to start, returns `nil` on failure
+When the profiling is stopped `fclose()` is called. The profiler's function's
+contract is similar to standard `io.*` interfaces. If it is impossible to open
+a file for writing or profiler fails to start, `nil` is returned on failure
 (plus an error message as a second result and a system-dependent error code as
-a third result). Otherwise returns some true value.
+a third result). Otherwise, returns `true` value.
 
 Stopping profiler from Lua is simple too:
 ```lua
@@ -230,17 +231,12 @@ If you want to build LuaJIT without memory profiler, you should build it with
 `-DLUAJIT_DISABLE_MEMPROF`. If it is disabled `misc.memprof.start()` and
 `misc.memprof.stop()` always return `false`.
 
-Memory profiler is expected to be thread safe, so it has a corresponding
-lock/unlock at internal mutex whenever you call corresponding memprof
-functions. If you want to build LuaJIT without thread safety use
-`-DLUAJIT_DISABLE_THREAD_SAFE`.
-
 ### Reading and displaying saved data
 
-Binary data can be read by `lj-parse-memprof` utility. It parses the binary
-format provided by memory profiler and render it on human-readable format.
+Binary data can be read by `luajit-parse-memprof` utility. It parses the binary
+format provided by memory profiler and render it to human-readable format.
 
-The usage is very simple:
+The usage for LuaJIT itself is very simple:
 ```
 $ ./luajit-parse-memprof --help
 luajit-parse-memprof - parser of the memory usage profile collected
@@ -266,6 +262,12 @@ structures. Note that events are sorted from the most often to the least.
 
 `Overrides` means what allocation this reallocation overrides.
 
+If you want to parse binary data via Tarantool only, use the following
+command (dash is important):
+```bash
+$ tarantool -e 'require("memprof")(arg[1])' - memprof.bin
+```
+
 ## Benchmarks
 
 Benchmarks were taken from repo:
===================================================================

And one more iterative patch (over the previous one). Branch is force pushed.

===================================================================
diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
index 2721f1cc1..f9c43f91f 100644
--- a/doc/rfc/5442-luajit-memory-profiler.md
+++ b/doc/rfc/5442-luajit-memory-profiler.md
@@ -5,7 +5,7 @@
 * **Authors**: Sergey Kaplun @Buristan skaplun@tarantool.org,
                Igor Munkin @igormunkin imun@tarantool.org,
                Sergey Ostanevich @sergos sergos@tarantool.org
-* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442)
+* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442), [#5490](https://github.com/tarantool/tarantool/issues/5490)
 
 ## Summary
 
===================================================================

-- 
Best regards,
Sergey Kaplun