From: Sergey Ostanevich via Tarantool-discussions
Reply-To: Sergey Ostanevich
To: Sergey Kaplun
Cc: tarantool-discussions@dev.tarantool.org
Date: Wed, 20 Jan 2021 17:26:33 +0300
Subject: Re: [Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler
Message-Id: <215048A7-04C8-42E3-B142-4856DA9EC56E@tarantool.org>
In-Reply-To: <20210120081957.GA3034@root>
References: <20201225113431.9538-1-skaplun@tarantool.org> <20210115131424.GA5460@tarantool.org> <20210120081957.GA3034@root>

Hi!

Thanks for the patch, I've looked into
https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md

in ‘Prerequisites’:

> Also all, deallocations are reported as internal too.

the comma is not needed

> Lua developers can do nothing with allocations made inside the built-ins except reducing its usage.

‘its’ doesn’t explain the exact matter. I would rephrase: "As for allocations made inside the built-ins, the user can do nothing but reduce the use of these built-ins."

> Currently VM state identifies C function execution only, so Fast and Lua functions states are added.

‘Currently’ -> ‘Originally’

Otherwise LGTM.

Sergos

> On 20 Jan 2021, at 11:19, Sergey Kaplun <skaplun@tarantool.org> wrote:
>
> Hi, Igor!
>
> Thanks for the review!
>
> On 15.01.21, Igor Munkin wrote:
>> Sergey,
>>
>> Thanks for the changes. There is a bit of nitpicking below and I
>> believe we'll push the next version doc to the trunk.
>
> I've fixed all your comments, plus added some insignificant fixes.
> See two iterative patches below. Branch is force pushed.
>
>>
>> On 25.12.20, Sergey Kaplun wrote:
>>> Part of #5442
>>> ---
>>>
>>> RFC on branch: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md
>
> Side note: branch name is updated.
> New RFC version: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md
>
>>>
>>> Changes in v3:
>>> * More comments in example.
>>> * More verbose benchmark information.
>>> * Grammar and spelling fixes.
>>>
>>> Changes in v2:
>>> * Removed C API, Tarantool integration and description of additional
>>>   features -- they will be added in another RFC if necessary.
>>> * Removed checking profile is running from the public API.
>>> * Added benchmarks and more meaningful example.
>>> * Grammar fixes.
>>>
>>>  doc/rfc/5442-luajit-memory-profiler.md | 314 +++++++++++++++++++++++++
>>>  1 file changed, 314 insertions(+)
>>>  create mode 100644 doc/rfc/5442-luajit-memory-profiler.md
>>>
>>> diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
>>> new file mode 100644
>>> index 000000000..85a61462a
>>> --- /dev/null
>>> +++ b/doc/rfc/5442-luajit-memory-profiler.md
>>> @@ -0,0 +1,314 @@
>>
>> <snipped>
>>
>>> +### Prerequisites
>>> +
>>> +This section describes additional changes in LuaJIT required for the feature
>>> +implementation. This version of LuaJIT memory profiler does not support verbose
>>> +reporting allocations from traces. All allocation from traces are reported as
>>
>> Typo: s/reporting allocations from/reporting for allocations made on/.
>
> Fixed, thanks!
>
>>
>>> +internal. But trace code semantics should be totally the same as for the Lua
>>> +interpreter (excluding sink optimizations). Also all deallocations reported as
>>
>> Typo: s/deallocations reported/deallocations are reported/.
>
> Fixed, thanks!
>
>>
>>> +internal too.
>>> +
>>> +There are two different representations of functions in LuaJIT: the function's
>>> +prototype (`GCproto`) and the function object so called closure (`GCfunc`).
>>> +The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
>>> +correspondingly. Also LuaJIT has a special function's type aka Fast Function.
>>
>> Typo: s/correspondingly/respectively/.
>>
>>> +It is used for LuaJIT builtins.
>>
>> It's better to not split this sentence. Consider the rewording:
>> | Besides LuaJIT has a special function type a.k.a. Fast Function that
>> | is used for LuaJIT builtins.
>
> Applied! Thanks!
>
>>
>>> +
>>
>> <snipped>
>>
>>> +Usually developers are not interested in information about allocations inside
>>> +builtins. So if fast function was called from a Lua function all
>>> +allocations are attributed to this Lua function. Otherwise attribute this event
>>> +to a C function.
>>
>> I propose the following rewording:
>> | Lua developers can do nothing with allocations made inside the
>> | builtins except reducing its usage. So if fast function is called from
>> | a Lua function all allocations made in its scope are attributed to this
>> | Lua function (i.e. the builtin caller). Otherwise this event is
>> | attributed to a C function.
>>
>
> Applied, thanks!
>
>>> +
>>
>> <snipped>
>>
>>> +If one run the chunk above the profiler reports approximately the following
>>
>> Typo: s/run/runs/.
>
> Fixed.
>
>>
>>> +(see legend [here](#reading-and-displaying-saved-data)):
>>
>> <snipped>
>>
>>> +So we need to know a type of function being executed by the virtual machine
>>> +(VM). Currently VM state identifies C function execution only, so Fast and Lua
>>> +functions states will be added.
>>
>> Typo: s/will be/are/.
>
> Sure, thanks!
>
>>
>>> +
>>> +To determine currently allocating coroutine (that may not be equal to currently
>>> +executed one) a new field called `mem_L` is added to `global_State` structure
>>> +to keep the coroutine address. This field is set at each reallocation to
>>
>> Typo: s/at each reallocation to/on each reallocation to the/.
>
> Fixed.
>
>>
>>> +corresponding `L` with which it was called.
>>
>> Typo: s/it was/it is/.
>
> Thanks, fixed!
>
>>
>>> +
>>
>> <snipped>
>>
>>> +When the profiling is stopped the `fclose()` is called. If it is impossible to
>>
>> Typo: s/the `fclose()`/`fclose()`/.
>
> Fixed.
>
>>
>>> +open a file for writing or profiler fails to start, returns `nil` on failure
>>
>> Typo: s/returns `nil`/`nil` is returned/.
>
> Fixed.
>
>>
>>> +(plus an error message as a second result and a system-dependent error code as
>>> +a third result). Otherwise returns some true value.
>>
>> It would be nice to mention that the function contract is similar to
>> other standard io.* interfaces.
>>
>> I glanced at the source code: it's not "some" true value; it is exactly the
>> *true* value.
>
> All right! Fixed.
>
>>
>>> +
>>
>> <snipped>
>>
>>> +Memory profiler is expected to be thread safe, so it has a corresponding
>>> +lock/unlock at internal mutex whenever you call corresponding memprof
>>> +functions. If you want to build LuaJIT without thread safety use
>>> +`-DLUAJIT_DISABLE_THREAD_SAFE`.
>>
>> This is not implemented in scope of the MVP, so drop this part.
>
> Done.
>
>>
>>> +
>>> +### Reading and displaying saved data
>>> +
>>> +Binary data can be read by `lj-parse-memprof` utility. It parses the binary
>>
>> Typo: s/lj-parse-memprof/luajit-parse-memprof/.
>
> Fixed, thanks!
>
>>
>>> +format provided by memory profiler and render it on human-readable format.
>>
>> Typo: s/it on/it to/.
>
> Fixed, thanks!
>
>>
>>> +
>>
>> <snipped>
>>
>>> +This table shows performance deviation in relation to REFerence value (before
>>> +commit) with stopped and running profiler. The table shows the average value
>>> +for 11 runs. The first field of the column indicates the change in the average
>>> +time in seconds (less is better). The second field is the standard deviation
>>> +for the found difference.
>>> +
>>> +```
>>> +     Name       | REF  | AFTER, memprof off | AFTER, memprof on
>>> +----------------+------+--------------------+------------------
>>> +array3d         | 0.21 |    +0.00 (0.01)    |    +0.00 (0.01)
>>> +binary-trees    | 3.25 |    -0.01 (0.06)    |    +0.53 (0.10)
>>> +chameneos       | 2.97 |    +0.14 (0.04)    |    +0.13 (0.06)
>>> +coroutine-ring  | 1.00 |    +0.01 (0.04)    |    +0.01 (0.04)
>>> +euler14-bit     | 1.03 |    +0.01 (0.02)    |    +0.00 (0.02)
>>> +fannkuch        | 6.81 |    -0.21 (0.06)    |    -0.20 (0.06)
>>> +fasta           | 8.20 |    -0.07 (0.05)    |    -0.08 (0.03)
>>
>> Side note: Still curious how this can happen. It looks OK when the
>> negative difference is within its deviation. But this is sorta magic.
>
> Yes, me too. Unfortunately, we have neither any benchmark tests nor
> performance analysis for LuaJIT for now.
>
>>
>>> +life            | 0.46 |    +0.00 (0.01)    |    +0.35 (0.01)
>>> +mandelbrot      | 2.65 |    +0.00 (0.01)    |    +0.01 (0.01)
>>> +mandelbrot-bit  | 1.97 |    +0.00 (0.01)    |    +0.01 (0.02)
>>> +md5             | 1.58 |    -0.01 (0.04)    |    -0.04 (0.04)
>>> +nbody           | 1.34 |    +0.00 (0.01)    |    -0.02 (0.01)
>>> +nsieve          | 2.07 |    -0.03 (0.03)    |    -0.01 (0.04)
>>> +nsieve-bit      | 1.50 |    -0.02 (0.04)    |    +0.00 (0.04)
>>> +nsieve-bit-fp   | 4.44 |    -0.03 (0.07)    |    -0.01 (0.07)
>>> +partialsums     | 0.54 |    +0.00 (0.01)    |    +0.00 (0.01)
>>> +pidigits-nogmp  | 3.47 |    -0.01 (0.02)    |    -0.10 (0.02)
>>> +ray             | 1.62 |    -0.02 (0.03)    |    +0.00 (0.02)
>>> +recursive-ack   | 0.20 |    +0.00 (0.01)    |    +0.00 (0.01)
>>> +recursive-fib   | 1.63 |    +0.00 (0.01)    |    +0.01 (0.02)
>>> +scimark-fft     | 5.72 |    +0.06 (0.09)    |    -0.01 (0.10)
>>> +scimark-lu      | 3.47 |    +0.02 (0.27)    |    -0.03 (0.26)
>>> +scimark-sor     | 2.34 |    +0.00 (0.01)    |    -0.01 (0.01)
>>> +scimark-sparse  | 4.95 |    -0.02 (0.04)    |    -0.02 (0.04)
>>> +series          | 0.95 |    +0.00 (0.02)    |    +0.00 (0.01)
>>> +spectral-norm   | 0.96 |    +0.00 (0.02)    |    -0.01 (0.02)
>>> +```
>>> --
>>> 2.28.0
>>>
>>
>> --
>> Best regards,
>> IM
>
> ===================================================================
> diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
> index 85a61462a..2721f1cc1 100644
> --- a/doc/rfc/5442-luajit-memory-profiler.md
> +++ b/doc/rfc/5442-luajit-memory-profiler.md
> @@ -30,39 +30,39 @@ The whole toolchain of memory profiling will be divided into several parts:
>
>  This section describes additional changes in LuaJIT required for the feature
>  implementation. This version of LuaJIT memory profiler does not support verbose
> -reporting allocations from traces. All allocation from traces are reported as
> -internal. But trace code semantics should be totally the same as for the Lua
> -interpreter (excluding sink optimizations). Also all deallocations reported as
> -internal too.
> +reporting for allocations made on traces. All allocation from traces are
> +reported as internal. But trace code semantics should be totally the same as
> +for the Lua interpreter (excluding sink optimizations). Also all, deallocations
> +are reported as internal too.
>
>  There are two different representations of functions in LuaJIT: the function's
>  prototype (`GCproto`) and the function object so called closure (`GCfunc`).
>  The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
> -correspondingly. Also LuaJIT has a special function's type aka Fast Function.
> -It is used for LuaJIT builtins.
> +respectively. Besides LuaJIT has a special function type, a.k.a. Fast Function
> +that is used for LuaJIT built-ins
>
>  Tail call optimization does not create a new call frame, so all allocations
>  inside the function called via `CALLT`/`CALLMT` are attributed to its caller.
>
> -Usually developers are not interested in information about allocations inside
> -builtins. So if fast function was called from a Lua function all
> -allocations are attributed to this Lua function. Otherwise attribute this event
> -to a C function.
> +Lua developers can do nothing with allocations made inside the built-ins except
> +reducing its usage. So if fast function is called from a Lua function all
> +allocations made in its scope are attributed to this Lua function (i.e. the
> +built-in caller). Otherwise, this event is attributed to a C function.
>
>  Assume we have the following Lua chunk named <test.lua>:
>
> -```
> +```lua
>  1  jit.off()
>  2  misc.memprof.start("memprof_new.bin")
> -3  -- Lua does not create a new frame to call string.rep and all allocations are
> -4  -- attributed not to `append()` function but to the parent scope.
> +3  -- Lua does not create a new frame to call string.rep() and all allocations
> +4  -- are attributed not to append() function but to the parent scope.
>  5  local function append(str, rep)
>  6    return string.rep(str, rep)
>  7  end
>  8
>  9  local t = {}
>  10 for _ = 1, 1e5 do
> -11   -- table.insert is a builtin and all corresponding allocations
> +11   -- table.insert() is a built-in and all corresponding allocations
>  12   -- are reported in the scope of main chunk
>  13   table.insert(t,
>  14     append('q', _)
> @@ -71,7 +71,7 @@ Assume we have the following Lua chunk named <test.lua>:
>  17 misc.memprof.stop()
>  ```
>
> -If one run the chunk above the profiler reports approximately the following
> +If one runs the chunk above the profiler reports approximately the following
>  (see legend [here](#reading-and-displaying-saved-data)):
>  ```
>  ALLOCATIONS
> @@ -99,15 +99,15 @@ INTERNAL: 20 0 1481
>
>  So we need to know a type of function being executed by the virtual machine
>  (VM). Currently VM state identifies C function execution only, so Fast and Lua
> -functions states will be added.
> +functions states are added.
>
>  To determine currently allocating coroutine (that may not be equal to currently
>  executed one) a new field called `mem_L` is added to `global_State` structure
> -to keep the coroutine address. This field is set at each reallocation to
> -corresponding `L` with which it was called.
> +to keep the coroutine address. This field is set on each reallocation to the
> +corresponding `L` with which it is called.
>
>  There is a static function (`lj_debug_getframeline`) that returns line number
> -for current `BCPos` in `lj_debug.c` already. It will be added to the debug
> +for current `BCPos` in `lj_debug.c` already. It is added to the debug
>  module API to be used in memory profiler.
>
>  ### Information recording
> @@ -211,10 +211,11 @@ local started, err, errno = misc.memprof.start(fname)
>  ```
>  where `fname` is name of the file where profile events are written. Writer for
>  this function perform `fwrite()` for each call retrying in case of `EINTR`.
> -When the profiling is stopped the `fclose()` is called. If it is impossible to
> -open a file for writing or profiler fails to start, returns `nil` on failure
> +When the profiling is stopped `fclose()` is called. The profiler's function's
> +contract is similar to standard `io.*` interfaces. If it is impossible to open
> +a file for writing or profiler fails to start, `nil` is returned on failure
>  (plus an error message as a second result and a system-dependent error code as
> -a third result). Otherwise returns some true value.
> +a third result). Otherwise, returns `true` value.
>
>  Stopping profiler from Lua is simple too:
>  ```lua
> @@ -230,17 +231,12 @@ If you want to build LuaJIT without memory profiler, you should build it with
>  `-DLUAJIT_DISABLE_MEMPROF`. If it is disabled `misc.memprof.start()` and
>  `misc.memprof.stop()` always return `false`.
>
> -Memory profiler is expected to be thread safe, so it has a corresponding
> -lock/unlock at internal mutex whenever you call corresponding memprof
> -functions. If you want to build LuaJIT without thread safety use
> -`-DLUAJIT_DISABLE_THREAD_SAFE`.
> -
>  ### Reading and displaying saved data
>
> -Binary data can be read by `lj-parse-memprof` utility. It parses the binary
> -format provided by memory profiler and render it on human-readable format.
> +Binary data can be read by `luajit-parse-memprof` utility. It parses the binary
> +format provided by memory profiler and render it to human-readable format.
>
> -The usage is very simple:
> +The usage for LuaJIT itself is very simple:
>  ```
>  $ ./luajit-parse-memprof --help
>  luajit-parse-memprof - parser of the memory usage profile collected
> @@ -266,6 +262,12 @@ structures. Note that events are sorted from the most often to the least.
>
>  `Overrides` means what allocation this reallocation overrides.
>
> +If you want to parse binary data via Tarantool only, use the following
> +command (dash is important):
> +```bash
> +$ tarantool -e 'require("memprof")(arg[1])' - memprof.bin
> +```
> +
>  ## Benchmarks
>
>  Benchmarks were taken from repo:
> ===================================================================
>
> And one more iterative patch (over the previous one). Branch is
> force pushed.
> ===================================================================
> diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
> index 2721f1cc1..f9c43f91f 100644
> --- a/doc/rfc/5442-luajit-memory-profiler.md
> +++ b/doc/rfc/5442-luajit-memory-profiler.md
> @@ -5,7 +5,7 @@
>  * **Authors**: Sergey Kaplun @Buristan skaplun@tarantool.org,
>                 Igor Munkin @igormunkin imun@tarantool.org,
>                 Sergey Ostanevich @sergos sergos@tarantool.org
> -* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442)
> +* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442), [#5490](https://github.com/tarantool/tarantool/issues/5490)
>
>  ## Summary
> ===================================================================
> --
> Best regards,
> Sergey Kaplun
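
For reference, the `misc.memprof.start()` contract from the quoted patch (`true` on success; otherwise `nil`, an error message and a system-dependent error code, similar to the standard `io.*` functions) could be exercised roughly as below. This is only a sketch under assumptions: the `profile()` helper and its workload are hypothetical names, and `misc.memprof` is assumed to exist only in a LuaJIT build that ships the profiler, so the helper returns an error tuple on any other runtime.

```lua
-- Sketch only: `misc.memprof` is assumed to be provided by the runtime.
-- rawget() avoids an error on runtimes where the global `misc` is absent.
local memprof = rawget(_G, "misc") and misc.memprof

-- Hypothetical helper: run `workload` under the profiler, writing to `fname`.
-- Returns true, or nil plus an error message (plus errno when available).
local function profile(fname, workload)
  if not memprof then
    return nil, "misc.memprof is not available in this runtime"
  end
  -- Per the quoted RFC text: `true` on success; nil, message, errno on failure;
  -- `false` when LuaJIT is built with -DLUAJIT_DISABLE_MEMPROF.
  local started, err, errno = memprof.start(fname)
  if not started then
    return nil, err or "profiler failed to start", errno
  end
  local ok, werr = pcall(workload)
  memprof.stop()
  if not ok then
    return nil, tostring(werr)
  end
  return true
end

-- Usage: a small workload in the spirit of the chunk from the RFC.
local ok, err, errno = profile("memprof_new.bin", function()
  local t = {}
  for i = 1, 100 do
    table.insert(t, string.rep('q', i))
  end
end)
if not ok then
  io.stderr:write(("memprof: %s (errno %s)\n"):format(tostring(err), tostring(errno)))
end
```

Treating `nil` and `false` alike in the `not started` check keeps the caller working for both the failure case and the build with the profiler disabled.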
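
The side note about negative deltas in the benchmark table can also be checked mechanically. A minimal sketch, assuming a delta counts as noise when its magnitude does not exceed the reported standard deviation (this threshold is an assumption for illustration, the RFC does not define one); the rows are copied from the table in the quoted patch and `within_noise()` is a hypothetical helper:

```lua
-- Rows copied from the benchmark table:
-- {name, delta with memprof off, its deviation, delta with memprof on, its deviation}
local rows = {
  {"binary-trees", -0.01, 0.06,  0.53, 0.10},
  {"fannkuch",     -0.21, 0.06, -0.20, 0.06},
  {"fasta",        -0.07, 0.05, -0.08, 0.03},
  {"life",          0.00, 0.01,  0.35, 0.01},
  {"mandelbrot",    0.00, 0.01,  0.01, 0.01},
}

-- Assumption: |delta| <= standard deviation means the change is noise.
local function within_noise(delta, sd)
  return math.abs(delta) <= sd
end

for _, r in ipairs(rows) do
  print(("%-14s off: %-5s on: %s"):format(
    r[1],
    within_noise(r[2], r[3]) and "noise" or "shift",
    within_noise(r[4], r[5]) and "noise" or "shift"))
end
```

Under that assumption `fannkuch` and `fasta` are flagged as shifts even with the profiler off, which is the "magic" mentioned in the side note, while `binary-trees` and `life` shift only with the profiler running.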
Hi! 

Thanks for the patch, I've = looked into 

in = =E2=80=98Prerequisites=E2=80=99: 
Also = all, deallocations are reported as internal too.

the = comma is not needed

Lua developers can do = nothing with allocations made inside the built-ins except reducing its = usage.

=E2=80=98its=E2=80=99 doesn=E2=80=99t = explain exact matter. I would rephrase: "As for allocations made inside = the built-ins user can do nothing but reduce use of these = built-ins."

Currently VM state = identifies C function execution only, so Fast and Lua functions states = are added.

=E2=80=98Currently=E2=80=99 -> = =E2=80=98Originally=E2=80=99

Otherwise = LGTM.
Sergos


On 20 = Jan 2021, at 11:19, Sergey Kaplun <skaplun@tarantool.org> wrote:

Hi, Igor!

Thanks for the review!

On 15.01.21, Igor Munkin wrote:
Sergey,

Thanks for the changes. There is a bit of nitpicking below = and I
believe we'll push the next version doc to the = trunk.

I've fixed all your comments, plus added some insignificant = fixes.
See two = iterative patches below. Branch is force pushed.


On 25.12.20, Sergey Kaplun wrote:
Part of #5442
---
RFC on branch: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-lua= jit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md

Side note: branch name is updated.
New RFC = version: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-lua= jit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md


Changes = in v3:
* More comments in example.
* More = verbose benchmark information.
* Grammar and spelling = fixes.

Changes in v2:
* = Removed C API, Tarantool integration and description of additional
 features -- they will be added in another RFC if = necessary.
* Removed checking profile is running from the = public API.
* Added benchmarks and more meaningful = example.
* Grammar fixes.

doc/rfc/5442-luajit-memory-profiler.md | 314 = +++++++++++++++++++++++++
1 file changed, 314 = insertions(+)
create mode 100644 = doc/rfc/5442-luajit-memory-profiler.md

diff = --git a/doc/rfc/5442-luajit-memory-profiler.md = b/doc/rfc/5442-luajit-memory-profiler.md
new file mode = 100644
index 000000000..85a61462a
--- = /dev/null
+++ b/doc/rfc/5442-luajit-memory-profiler.md
@@ -0,0 +1,314 @@

<snipped>

+### Prerequisites
+
+This section describes additional changes in LuaJIT required = for the feature
+implementation. This version of LuaJIT = memory profiler does not support verbose
+reporting = allocations from traces. All allocation from traces are reported as

Typo: s/reporting allocations = from/reporting for allocations made on/.

Fixed, = thanks!


+internal. But trace = code semantics should be totally the same as for the Lua
+interpreter (excluding sink optimizations). Also all = deallocations reported as

Typo: = s/deallocations reported/deallocation are reported/.

Fixed, thanks!


+internal too.
+
+There are two different representations of functions in = LuaJIT: the function's
+prototype (`GCproto`) and the = function object so called closure (`GCfunc`).
+The = closures are represented as `GCfuncL` and `GCfuncC` for Lua and C = closures
+correspondingly. Also LuaJIT has a special = function's type aka Fast Function.

Typo: s/correspondingly/respectively/.

+It is used for LuaJIT = builtins.

It's better to not = split this sentence. Consider the rewording:
| Besides = LuaJIT has a special function type a.k.a. Fast Function that
| is used for LuaJIT builtins.

Applied! = Thanks!


+

<snipped>

+Usually developers are = not interested in information about allocations inside
+builtins. So if fast function was called from a Lua function = all
+allocations are attributed to this Lua function. = Otherwise attribute this event
+to a C function.

I propose the following = rewording:
| Lua developers can do nothing with = allocations made inside the
| builtins except reducing its = usage. So if fast function is called from
| a Lua function = all allocations made in its scope are attributed to this
| = Lua function (i.e. the builtin caller). Otherwise this event is
| attributed to a C function.


Applied, thanks!

+

<snipped>

+If one = run the chunk above the profiler reports approximately the following

Typo: s/run/runs/.

Fixed.


+(see legend = [here](#reading-and-displaying-saved-data)):

<snipped>

+So we need to know a = type of function being executed by the virtual machine
+(VM). Currently VM state identifies C function execution = only, so Fast and Lua
+functions states will be added.

Typo: s/will be/are/.

Sure, thanks!


+
+To determine currently = allocating coroutine (that may not be equal to currently
+executed one) a new field called `mem_L` is added to = `global_State` structure
+to keep the coroutine address. = This field is set at each reallocation to

Typo: /at each reallocation to/on each reallocation to = the/.

Fixed.


+corresponding `L` with which it was called.

Typo: s/it was/it is/.

Thanks, fixed!


+

<snipped>

+When the profiling is stopped the `fclose()` = is called. If it is impossible to

Typo: s/the `fclose()`/`fclose()`/.

Fixed.


+open a file for writing or profiler fails to = start, returns `nil` on failure

Typo: s/returns `nil`/`nil` is returned/.

Fixed.


+(plus an error message as a second result and = a system-dependent error code as
+a third result). = Otherwise returns some true value.

It would be nice to mention that the function contract is = similar to
other standart io.* interfaces.
I glanced the source code: it's not "some" true value; it is = exactly the
*true* value.

All right! = Fixed.


+

<snipped>

+Memory profiler is = expected to be thread safe, so it has a corresponding
+lock/unlock at internal mutex whenever you call = corresponding memprof
+functions. If you want to build = LuaJIT without thread safety use
+`-DLUAJIT_DISABLE_THREAD_SAFE`.
This is not implemented in scope of the MVP, so drop this = part.

Done.


+
+### Reading and displaying = saved data
+
+Binary data can be read by = `lj-parse-memprof` utility. It parses the binary

Typo: = s/lj-parse-memprof/luajit-parse-memprof/.

Fixed, = thanks!


+format provided by = memory profiler and render it on human-readable format.

Typo: s/it on/it to/.

Fixed, thanks!


+

<snipped>

+This table shows performance deviation in = relation to REFerence value (before
+commit) with stopped = and running profiler. The table shows the average value
+for= 11 runs. The first field of the column indicates the change in the = average
+time in seconds (less is better). The second = field is the standard deviation
+for the found = difference.
+
+```
+ =     Name       | REF =  | AFTER, memprof off | AFTER, memprof on
+----------------+------+--------------------+-----------------= -
+array3d =         | 0.21 | =    +0.00 (0.01)    | =    +0.00 (0.01)
+binary-trees =    | 3.25 |    -0.01 (0.06) =    |    +0.53 (0.10)
+chameneos       | 2.97 | =    +0.14 (0.04)    | =    +0.13 (0.06)
+coroutine-ring  | = 1.00 |    +0.01 (0.04)    | =    +0.01 (0.04)
+euler14-bit =     | 1.03 |    +0.01 (0.02) =    |    +0.00 (0.02)
+fannkuch=        | 6.81 | =    -0.21 (0.06)    | =    -0.20 (0.06)
+fasta =           | 8.20 | =    -0.07 (0.05)    | =    -0.08 (0.03)

Side note: Still curious how this can happen. It looks OK = when this is
negative difference in within its deviation. = But this is sorta magic.

Yes, me too. = Unfortunately, we have neither any benchmark tests nor
performance = analisis for LuaJIT for now.


+life =            | 0.46 = |    +0.00 (0.01)    | =    +0.35 (0.01)
+mandelbrot =      | 2.65 |    +0.00 (0.01) =    |    +0.01 (0.01)
+mandelbrot-bit  | 1.97 |    +0.00 (0.01) =    |    +0.01 (0.02)
+md5 =             | = 1.58 |    -0.01 (0.04)    | =    -0.04 (0.04)
+nbody =           | 1.34 | =    +0.00 (0.01)    | =    -0.02 (0.01)
+nsieve =          | 2.07 | =    -0.03 (0.03)    | =    -0.01 (0.04)
+nsieve-bit =      | 1.50 |    -0.02 (0.04) =    |    +0.00 (0.04)
+nsieve-bit-fp   | 4.44 |    -0.03 = (0.07)    |    -0.01 (0.07)
+partialsums     | 0.54 | =    +0.00 (0.01)    | =    +0.00 (0.01)
+pidigits-nogmp  | = 3.47 |    -0.01 (0.02)    | =    -0.10 (0.02)
+ray =             | = 1.62 |    -0.02 (0.03)    | =    +0.00 (0.02)
+recursive-ack =   | 0.20 |    +0.00 (0.01)    | =    +0.00 (0.01)
+recursive-fib =   | 1.63 |    +0.00 (0.01)    | =    +0.01 (0.02)
+scimark-fft =     | 5.72 |    +0.06 (0.09) =    |    -0.01 (0.10)
+scimark-lu      | 3.47 | =    +0.02 (0.27)    | =    -0.03 (0.26)
+scimark-sor =     | 2.34 |    +0.00 (0.01) =    |    -0.01 (0.01)
+scimark-sparse  | 4.95 |    -0.02 (0.04) =    |    -0.02 (0.04)
+series =          | 0.95 | =    +0.00 (0.02)    | =    +0.00 (0.01)
+spectral-norm =   | 0.96 |    +0.00 (0.02)    | =    -0.01 (0.02)
+```
-- 
2.28.0


-- 
Best = regards,
IM

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
diff --git = a/doc/rfc/5442-luajit-memory-profiler.md = b/doc/rfc/5442-luajit-memory-profiler.md
index 85a61462a..2721f1cc1 100644
--- = a/doc/rfc/5442-luajit-memory-profiler.md
+++ b/doc/rfc/5442-luajit-memory-profiler.md
@@ -30,39 = +30,39 @@ The whole toolchain of memory profiling will be divided into = several parts:

This section describes additional changes in LuaJIT required = for the feature
implementation. This version of LuaJIT memory profiler does = not support verbose
-reporting allocations from traces. All allocation from = traces are reported as
-internal. But trace code semantics should be totally the = same as for the Lua
-interpreter (excluding sink optimizations). Also all = deallocations reported as
-internal too.
+reporting for allocations made on traces. All allocation = from traces are
+reported as internal. But trace code semantics should be = totally the same as
+for the Lua interpreter (excluding sink optimizations). Also = all, deallocations
+are reported as internal too.

There are two different representations of functions in LuaJIT: the function's
prototype (`GCproto`) and the function object so called closure (`GCfunc`).
The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
-correspondingly. Also LuaJIT has a special function's type aka Fast Function.
-It is used for LuaJIT builtins.
+respectively. Besides LuaJIT has a special function type, a.k.a. Fast Function
+that is used for LuaJIT built-ins

Tail call optimization does not create a new call frame, so all allocations
inside the function called via `CALLT`/`CALLMT` are attributed to its caller.

-Usually developers are not interested in information about allocations inside
-builtins. So if fast function was called from a Lua function all
-allocations are attributed to this Lua function. Otherwise attribute this event
-to a C function.
+Lua developers can do nothing with allocations made inside the built-ins except
+reducing its usage. So if fast function is called from a Lua function all
+allocations made in its scope are attributed to this Lua function (i.e. the
+built-in caller). Otherwise, this event is attributed to a C function.

Assume we have the following Lua chunk named <test.lua>:

-```
+```lua
1  jit.off()
2  misc.memprof.start("memprof_new.bin")
-3  -- Lua does not create a new frame to call string.rep and all allocations are
-4  -- attributed not to `append()` function but to the parent scope.
+3  -- Lua does not create a new frame to call string.rep() and all allocations
+4  -- are attributed not to append() function but to the parent scope.
5  local function append(str, rep)
6    return string.rep(str, rep)
7  end
8
9  local t = {}
10 for _ = 1, 1e5 do
-11   -- table.insert is a builtin and all corresponding allocations
+11   -- table.insert() is a built-in and all corresponding allocations
12   -- are reported in the scope of main chunk
13   table.insert(t,
14     append('q', _)
@@ -71,7 +71,7 @@ Assume we have the following Lua chunk named <test.lua>:
17 misc.memprof.stop()
```

-If one run the chunk above the profiler reports approximately the following
+If one runs the chunk above the profiler reports approximately the following
(see legend [here](#reading-and-displaying-saved-data)):
```
ALLOCATIONS
@@ -99,15 +99,15 @@ INTERNAL: 20    0       1481

So we need to know a type of function being executed by the virtual machine
(VM). Currently VM state identifies C function execution only, so Fast and Lua
-functions states will be added.
+functions states are added.

To determine currently allocating coroutine (that may not be equal to currently
executed one) a new field called `mem_L` is added to `global_State` structure
-to keep the coroutine address. This field is set at each reallocation to
-corresponding `L` with which it was called.
+to keep the coroutine address. This field is set on each reallocation to the
+corresponding `L` with which it is called.

There is a static function (`lj_debug_getframeline`) that returns line number
-for current `BCPos` in `lj_debug.c` already. It will be added to the debug
+for current `BCPos` in `lj_debug.c` already. It is added to the debug
module API to be used in memory profiler.

### Information recording
@@ -211,10 +211,11 @@ local started, err, errno = misc.memprof.start(fname)
```
where `fname` is name of the file where profile events are written. Writer for
this function perform `fwrite()` for each call retrying in case of `EINTR`.
-When the profiling is stopped the `fclose()` is called. If it is impossible to
-open a file for writing or profiler fails to start, returns `nil` on failure
+When the profiling is stopped `fclose()` is called. The profiler's function's
+contract is similar to standard `io.*` interfaces. If it is impossible to open
+a file for writing or profiler fails to start, `nil` is returned on failure
(plus an error message as a second result and a system-dependent error code as
-a third result). Otherwise returns some true value.
+a third result). Otherwise, returns `true` value.

Stopping profiler from Lua is simple too:
```lua
@@ -230,17 +231,12 @@ If you want to build LuaJIT without memory profiler, you should build it with
`-DLUAJIT_DISABLE_MEMPROF`. If it is disabled `misc.memprof.start()` and
`misc.memprof.stop()` always return `false`.

-Memory profiler is expected to be thread safe, so it has a corresponding
-lock/unlock at internal mutex whenever you call corresponding memprof
-functions. If you want to build LuaJIT without thread safety use
-`-DLUAJIT_DISABLE_THREAD_SAFE`.
-
### Reading and displaying saved data

-Binary data can be read by `lj-parse-memprof` utility. It parses the binary
-format provided by memory profiler and render it on human-readable format.
+Binary data can be read by `luajit-parse-memprof` utility. It parses the binary
+format provided by memory profiler and render it to human-readable format.

-The usage is very simple:
+The usage for LuaJIT itself is very simple:
```
$ ./luajit-parse-memprof --help
luajit-parse-memprof - parser of the memory usage profile collected
@@ -266,6 +262,12 @@ structures. Note that events are sorted from the most often to the least.

`Overrides` means what allocation this reallocation overrides.

+If you want to parse binary data via Tarantool only, use the following
+command (dash is important):
+```bash
+$ tarantool -e 'require("memprof")(arg[1])' - memprof.bin
+```
+
## Benchmarks

Benchmarks were taken from repo:
===================================================================

And one more iterative patch (over the previous one). The branch is
force-pushed.
===================================================================
diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
index 2721f1cc1..f9c43f91f 100644
--- a/doc/rfc/5442-luajit-memory-profiler.md
+++ b/doc/rfc/5442-luajit-memory-profiler.md
@@ -5,7 +5,7 @@
* **Authors**: Sergey Kaplun @Buristan skaplun@tarantool.org,
               Igor Munkin @igormunkin imun@tarantool.org,
               Sergey Ostanevich @sergos sergos@tarantool.org
-* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442)
+* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442), [#5490](https://github.com/tarantool/tarantool/issues/5490)

## Summary
===================================================================
-- 
Best regards,
Sergey Kaplun

--Apple-Mail=_C9BFB249-5FEA-4E32-83E2-E0942FDD1C78--