Hi! Thanks for the patch, I've looked into https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md in ‘Prerequisites’: > Also all, deallocations are reported as internal too. the comma is not needed > Lua developers can do nothing with allocations made inside the built-ins except reducing its usage. ‘its’ doesn’t explain exact matter. I would rephrase: "As for allocations made inside the built-ins user can do nothing but reduce use of these built-ins." > Currently VM state identifies C function execution only, so Fast and Lua functions states are added. ‘Currently’ -> ‘Originally’ Otherwise LGTM. Sergos > On 20 Jan 2021, at 11:19, Sergey Kaplun wrote: > > Hi, Igor! > > Thanks for the review! > > On 15.01.21, Igor Munkin wrote: >> Sergey, >> >> Thanks for the changes. There is a bit of nitpicking below and I >> believe we'll push the next version doc to the trunk. > > I've fixed all your comments, plus added some insignificant fixes. > See two iterative patches below. Branch is force pushed. > >> >> On 25.12.20, Sergey Kaplun wrote: >>> Part of #5442 >>> --- >>> >>> RFC on branch: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md > > Side note: branch name is updated. > New RFC version: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md > >>> >>> Changes in v3: >>> * More comments in example. >>> * More verbose benchmark information. >>> * Grammar and spelling fixes. >>> >>> Changes in v2: >>> * Removed C API, Tarantool integration and description of additional >>> features -- they will be added in another RFC if necessary. >>> * Removed checking profile is running from the public API. >>> * Added benchmarks and more meaningful example. >>> * Grammar fixes. >>> >>> doc/rfc/5442-luajit-memory-profiler.md | 314 +++++++++++++++++++++++++ >>> 1 file changed, 314 insertions(+) >>> create mode 100644 doc/rfc/5442-luajit-memory-profiler.md >>> >>> diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md >>> new file mode 100644 >>> index 000000000..85a61462a >>> --- /dev/null >>> +++ b/doc/rfc/5442-luajit-memory-profiler.md >>> @@ -0,0 +1,314 @@ >> >> >> >>> +### Prerequisites >>> + >>> +This section describes additional changes in LuaJIT required for the feature >>> +implementation. This version of LuaJIT memory profiler does not support verbose >>> +reporting allocations from traces. All allocation from traces are reported as >> >> Typo: s/reporting allocations from/reporting for allocations made on/. > > Fixed, thanks! > >> >>> +internal. But trace code semantics should be totally the same as for the Lua >>> +interpreter (excluding sink optimizations). Also all deallocations reported as >> >> Typo: s/deallocations reported/deallocation are reported/. > > Fixed, thanks! > >> >>> +internal too. >>> + >>> +There are two different representations of functions in LuaJIT: the function's >>> +prototype (`GCproto`) and the function object so called closure (`GCfunc`). >>> +The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures >>> +correspondingly. Also LuaJIT has a special function's type aka Fast Function. >> >> Typo: s/correspondingly/respectively/. >> >>> +It is used for LuaJIT builtins. >> >> It's better to not split this sentence. Consider the rewording: >> | Besides LuaJIT has a special function type a.k.a. Fast Function that >> | is used for LuaJIT builtins. > > Applied! Thanks! > >> >>> + >> >> >> >>> +Usually developers are not interested in information about allocations inside >>> +builtins. So if fast function was called from a Lua function all >>> +allocations are attributed to this Lua function. Otherwise attribute this event >>> +to a C function. >> >> I propose the following rewording: >> | Lua developers can do nothing with allocations made inside the >> | builtins except reducing its usage. So if fast function is called from >> | a Lua function all allocations made in its scope are attributed to this >> | Lua function (i.e. the builtin caller). Otherwise this event is >> | attributed to a C function. >> > > Applied, thanks! > >>> + >> >> >> >>> +If one run the chunk above the profiler reports approximately the following >> >> Typo: s/run/runs/. > > Fixed. > >> >>> +(see legend [here](#reading-and-displaying-saved-data)): >> >> >> >>> +So we need to know a type of function being executed by the virtual machine >>> +(VM). Currently VM state identifies C function execution only, so Fast and Lua >>> +functions states will be added. >> >> Typo: s/will be/are/. > > Sure, thanks! > >> >>> + >>> +To determine currently allocating coroutine (that may not be equal to currently >>> +executed one) a new field called `mem_L` is added to `global_State` structure >>> +to keep the coroutine address. This field is set at each reallocation to >> >> Typo: /at each reallocation to/on each reallocation to the/. > > Fixed. > >> >>> +corresponding `L` with which it was called. >> >> Typo: s/it was/it is/. > > Thanks, fixed! > >> >>> + >> >> >> >>> +When the profiling is stopped the `fclose()` is called. If it is impossible to >> >> Typo: s/the `fclose()`/`fclose()`/. > > Fixed. > >> >>> +open a file for writing or profiler fails to start, returns `nil` on failure >> >> Typo: s/returns `nil`/`nil` is returned/. > > Fixed. > >> >>> +(plus an error message as a second result and a system-dependent error code as >>> +a third result). Otherwise returns some true value. >> >> It would be nice to mention that the function contract is similar to >> other standart io.* interfaces. >> >> I glanced the source code: it's not "some" true value; it is exactly the >> *true* value. > > All right! Fixed. > >> >>> + >> >> >> >>> +Memory profiler is expected to be thread safe, so it has a corresponding >>> +lock/unlock at internal mutex whenever you call corresponding memprof >>> +functions. If you want to build LuaJIT without thread safety use >>> +`-DLUAJIT_DISABLE_THREAD_SAFE`. >> >> This is not implemented in scope of the MVP, so drop this part. > > Done. > >> >>> + >>> +### Reading and displaying saved data >>> + >>> +Binary data can be read by `lj-parse-memprof` utility. It parses the binary >> >> Typo: s/lj-parse-memprof/luajit-parse-memprof/. > > Fixed, thanks! > >> >>> +format provided by memory profiler and render it on human-readable format. >> >> Typo: s/it on/it to/. > > Fixed, thanks! > >> >>> + >> >> >> >>> +This table shows performance deviation in relation to REFerence value (before >>> +commit) with stopped and running profiler. The table shows the average value >>> +for 11 runs. The first field of the column indicates the change in the average >>> +time in seconds (less is better). The second field is the standard deviation >>> +for the found difference. >>> + >>> +``` >>> + Name | REF | AFTER, memprof off | AFTER, memprof on >>> +----------------+------+--------------------+------------------ >>> +array3d | 0.21 | +0.00 (0.01) | +0.00 (0.01) >>> +binary-trees | 3.25 | -0.01 (0.06) | +0.53 (0.10) >>> +chameneos | 2.97 | +0.14 (0.04) | +0.13 (0.06) >>> +coroutine-ring | 1.00 | +0.01 (0.04) | +0.01 (0.04) >>> +euler14-bit | 1.03 | +0.01 (0.02) | +0.00 (0.02) >>> +fannkuch | 6.81 | -0.21 (0.06) | -0.20 (0.06) >>> +fasta | 8.20 | -0.07 (0.05) | -0.08 (0.03) >> >> Side note: Still curious how this can happen. It looks OK when this is >> negative difference in within its deviation. But this is sorta magic. > > Yes, me too. Unfortunately, we have neither any benchmark tests nor > performance analisis for LuaJIT for now. > >> >>> +life | 0.46 | +0.00 (0.01) | +0.35 (0.01) >>> +mandelbrot | 2.65 | +0.00 (0.01) | +0.01 (0.01) >>> +mandelbrot-bit | 1.97 | +0.00 (0.01) | +0.01 (0.02) >>> +md5 | 1.58 | -0.01 (0.04) | -0.04 (0.04) >>> +nbody | 1.34 | +0.00 (0.01) | -0.02 (0.01) >>> +nsieve | 2.07 | -0.03 (0.03) | -0.01 (0.04) >>> +nsieve-bit | 1.50 | -0.02 (0.04) | +0.00 (0.04) >>> +nsieve-bit-fp | 4.44 | -0.03 (0.07) | -0.01 (0.07) >>> +partialsums | 0.54 | +0.00 (0.01) | +0.00 (0.01) >>> +pidigits-nogmp | 3.47 | -0.01 (0.02) | -0.10 (0.02) >>> +ray | 1.62 | -0.02 (0.03) | +0.00 (0.02) >>> +recursive-ack | 0.20 | +0.00 (0.01) | +0.00 (0.01) >>> +recursive-fib | 1.63 | +0.00 (0.01) | +0.01 (0.02) >>> +scimark-fft | 5.72 | +0.06 (0.09) | -0.01 (0.10) >>> +scimark-lu | 3.47 | +0.02 (0.27) | -0.03 (0.26) >>> +scimark-sor | 2.34 | +0.00 (0.01) | -0.01 (0.01) >>> +scimark-sparse | 4.95 | -0.02 (0.04) | -0.02 (0.04) >>> +series | 0.95 | +0.00 (0.02) | +0.00 (0.01) >>> +spectral-norm | 0.96 | +0.00 (0.02) | -0.01 (0.02) >>> +``` >>> -- >>> 2.28.0 >>> >> >> -- >> Best regards, >> IM > > =================================================================== > diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md > index 85a61462a..2721f1cc1 100644 > --- a/doc/rfc/5442-luajit-memory-profiler.md > +++ b/doc/rfc/5442-luajit-memory-profiler.md > @@ -30,39 +30,39 @@ The whole toolchain of memory profiling will be divided into several parts: > > This section describes additional changes in LuaJIT required for the feature > implementation. This version of LuaJIT memory profiler does not support verbose > -reporting allocations from traces. All allocation from traces are reported as > -internal. But trace code semantics should be totally the same as for the Lua > -interpreter (excluding sink optimizations). Also all deallocations reported as > -internal too. > +reporting for allocations made on traces. All allocation from traces are > +reported as internal. But trace code semantics should be totally the same as > +for the Lua interpreter (excluding sink optimizations). Also all, deallocations > +are reported as internal too. > > There are two different representations of functions in LuaJIT: the function's > prototype (`GCproto`) and the function object so called closure (`GCfunc`). > The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures > -correspondingly. Also LuaJIT has a special function's type aka Fast Function. > -It is used for LuaJIT builtins. > +respectively. Besides LuaJIT has a special function type, a.k.a. Fast Function > +that is used for LuaJIT built-ins > > Tail call optimization does not create a new call frame, so all allocations > inside the function called via `CALLT`/`CALLMT` are attributed to its caller. > > -Usually developers are not interested in information about allocations inside > -builtins. So if fast function was called from a Lua function all > -allocations are attributed to this Lua function. Otherwise attribute this event > -to a C function. > +Lua developers can do nothing with allocations made inside the built-ins except > +reducing its usage. So if fast function is called from a Lua function all > +allocations made in its scope are attributed to this Lua function (i.e. the > +built-in caller). Otherwise, this event is attributed to a C function. > > Assume we have the following Lua chunk named : > > -``` > +```lua > 1 jit.off() > 2 misc.memprof.start("memprof_new.bin") > -3 -- Lua does not create a new frame to call string.rep and all allocations are > -4 -- attributed not to `append()` function but to the parent scope. > +3 -- Lua does not create a new frame to call string.rep() and all allocations > +4 -- are attributed not to append() function but to the parent scope. > 5 local function append(str, rep) > 6 return string.rep(str, rep) > 7 end > 8 > 9 local t = {} > 10 for _ = 1, 1e5 do > -11 -- table.insert is a builtin and all corresponding allocations > +11 -- table.insert() is a built-in and all corresponding allocations > 12 -- are reported in the scope of main chunk > 13 table.insert(t, > 14 append('q', _) > @@ -71,7 +71,7 @@ Assume we have the following Lua chunk named : > 17 misc.memprof.stop() > ``` > > -If one run the chunk above the profiler reports approximately the following > +If one runs the chunk above the profiler reports approximately the following > (see legend [here](#reading-and-displaying-saved-data)): > ``` > ALLOCATIONS > @@ -99,15 +99,15 @@ INTERNAL: 20 0 1481 > > So we need to know a type of function being executed by the virtual machine > (VM). Currently VM state identifies C function execution only, so Fast and Lua > -functions states will be added. > +functions states are added. > > To determine currently allocating coroutine (that may not be equal to currently > executed one) a new field called `mem_L` is added to `global_State` structure > -to keep the coroutine address. This field is set at each reallocation to > -corresponding `L` with which it was called. > +to keep the coroutine address. This field is set on each reallocation to the > +corresponding `L` with which it is called. > > There is a static function (`lj_debug_getframeline`) that returns line number > -for current `BCPos` in `lj_debug.c` already. It will be added to the debug > +for current `BCPos` in `lj_debug.c` already. It is added to the debug > module API to be used in memory profiler. > > ### Information recording > @@ -211,10 +211,11 @@ local started, err, errno = misc.memprof.start(fname) > ``` > where `fname` is name of the file where profile events are written. Writer for > this function perform `fwrite()` for each call retrying in case of `EINTR`. > -When the profiling is stopped the `fclose()` is called. If it is impossible to > -open a file for writing or profiler fails to start, returns `nil` on failure > +When the profiling is stopped `fclose()` is called. The profiler's function's > +contract is similar to standard `io.*` interfaces. If it is impossible to open > +a file for writing or profiler fails to start, `nil` is returned on failure > (plus an error message as a second result and a system-dependent error code as > -a third result). Otherwise returns some true value. > +a third result). Otherwise, returns `true` value. > > Stopping profiler from Lua is simple too: > ```lua > @@ -230,17 +231,12 @@ If you want to build LuaJIT without memory profiler, you should build it with > `-DLUAJIT_DISABLE_MEMPROF`. If it is disabled `misc.memprof.start()` and > `misc.memprof.stop()` always return `false`. > > -Memory profiler is expected to be thread safe, so it has a corresponding > -lock/unlock at internal mutex whenever you call corresponding memprof > -functions. If you want to build LuaJIT without thread safety use > -`-DLUAJIT_DISABLE_THREAD_SAFE`. > - > ### Reading and displaying saved data > > -Binary data can be read by `lj-parse-memprof` utility. It parses the binary > -format provided by memory profiler and render it on human-readable format. > +Binary data can be read by `luajit-parse-memprof` utility. It parses the binary > +format provided by memory profiler and render it to human-readable format. > > -The usage is very simple: > +The usage for LuaJIT itself is very simple: > ``` > $ ./luajit-parse-memprof --help > luajit-parse-memprof - parser of the memory usage profile collected > @@ -266,6 +262,12 @@ structures. Note that events are sorted from the most often to the least. > > `Overrides` means what allocation this reallocation overrides. > > +If you want to parse binary data via Tarantool only, use the following > +command (dash is important): > +```bash > +$ tarantool -e 'require("memprof")(arg[1])' - memprof.bin > +``` > + > ## Benchmarks > > Benchmarks were taken from repo: > =================================================================== > > And one more iterative patch (over the previous one). Branch is > force pushed. > =================================================================== > diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md > index 2721f1cc1..f9c43f91f 100644 > --- a/doc/rfc/5442-luajit-memory-profiler.md > +++ b/doc/rfc/5442-luajit-memory-profiler.md > @@ -5,7 +5,7 @@ > * **Authors**: Sergey Kaplun @Buristan skaplun@tarantool.org , > Igor Munkin @igormunkin imun@tarantool.org , > Sergey Ostanevich @sergos sergos@tarantool.org > -* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442 ) > +* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442 ), [#5490](https://github.com/tarantool/tarantool/issues/5490 ) > > ## Summary > =================================================================== > -- > Best regards, > Sergey Kaplun