[Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler

Sergey Ostanevich sergos at tarantool.org
Wed Jan 20 17:26:33 MSK 2021


Hi! 

Thanks for the patch, I've looked into 
https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md <https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md>

in ‘Prerequisites’: 
> Also all, deallocations are reported as internal too.

the comma is not needed

> Lua developers can do nothing with allocations made inside the built-ins except reducing its usage.

‘its’ doesn’t explain exact matter. I would rephrase: "As for allocations made inside the built-ins user can do nothing but reduce use of these built-ins."

> Currently VM state identifies C function execution only, so Fast and Lua functions states are added.

‘Currently’ -> ‘Originally’

Otherwise LGTM.
Sergos


> On 20 Jan 2021, at 11:19, Sergey Kaplun <skaplun at tarantool.org> wrote:
> 
> Hi, Igor!
> 
> Thanks for the review!
> 
> On 15.01.21, Igor Munkin wrote:
>> Sergey,
>> 
>> Thanks for the changes. There is a bit of nitpicking below and I
>> believe we'll push the next version doc to the trunk.
> 
> I've fixed all your comments, plus added some insignificant fixes.
> See two iterative patches below. Branch is force pushed.
> 
>> 
>> On 25.12.20, Sergey Kaplun wrote:
>>> Part of #5442
>>> ---
>>> 
>>> RFC on branch: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md <https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md>
> 
> Side note: branch name is updated.
> New RFC version: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md <https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md>
> 
>>> 
>>> Changes in v3:
>>> * More comments in example.
>>> * More verbose benchmark information.
>>> * Grammar and spelling fixes.
>>> 
>>> Changes in v2:
>>> * Removed C API, Tarantool integration and description of additional
>>>  features -- they will be added in another RFC if necessary.
>>> * Removed checking profile is running from the public API.
>>> * Added benchmarks and more meaningful example.
>>> * Grammar fixes.
>>> 
>>> doc/rfc/5442-luajit-memory-profiler.md | 314 +++++++++++++++++++++++++
>>> 1 file changed, 314 insertions(+)
>>> create mode 100644 doc/rfc/5442-luajit-memory-profiler.md
>>> 
>>> diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
>>> new file mode 100644
>>> index 000000000..85a61462a
>>> --- /dev/null
>>> +++ b/doc/rfc/5442-luajit-memory-profiler.md
>>> @@ -0,0 +1,314 @@
>> 
>> <snipped>
>> 
>>> +### Prerequisites
>>> +
>>> +This section describes additional changes in LuaJIT required for the feature
>>> +implementation. This version of LuaJIT memory profiler does not support verbose
>>> +reporting allocations from traces. All allocation from traces are reported as
>> 
>> Typo: s/reporting allocations from/reporting for allocations made on/.
> 
> Fixed, thanks!
> 
>> 
>>> +internal. But trace code semantics should be totally the same as for the Lua
>>> +interpreter (excluding sink optimizations). Also all deallocations reported as
>> 
>> Typo: s/deallocations reported/deallocation are reported/.
> 
> Fixed, thanks!
> 
>> 
>>> +internal too.
>>> +
>>> +There are two different representations of functions in LuaJIT: the function's
>>> +prototype (`GCproto`) and the function object so called closure (`GCfunc`).
>>> +The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
>>> +correspondingly. Also LuaJIT has a special function's type aka Fast Function.
>> 
>> Typo: s/correspondingly/respectively/.
>> 
>>> +It is used for LuaJIT builtins.
>> 
>> It's better to not split this sentence. Consider the rewording:
>> | Besides LuaJIT has a special function type a.k.a. Fast Function that
>> | is used for LuaJIT builtins.
> 
> Applied! Thanks!
> 
>> 
>>> +
>> 
>> <snipped>
>> 
>>> +Usually developers are not interested in information about allocations inside
>>> +builtins. So if fast function was called from a Lua function all
>>> +allocations are attributed to this Lua function. Otherwise attribute this event
>>> +to a C function.
>> 
>> I propose the following rewording:
>> | Lua developers can do nothing with allocations made inside the
>> | builtins except reducing its usage. So if fast function is called from
>> | a Lua function all allocations made in its scope are attributed to this
>> | Lua function (i.e. the builtin caller). Otherwise this event is
>> | attributed to a C function.
>> 
> 
> Applied, thanks!
> 
>>> +
>> 
>> <snipped>
>> 
>>> +If one run the chunk above the profiler reports approximately the following
>> 
>> Typo: s/run/runs/.
> 
> Fixed.
> 
>> 
>>> +(see legend [here](#reading-and-displaying-saved-data)):
>> 
>> <snipped>
>> 
>>> +So we need to know a type of function being executed by the virtual machine
>>> +(VM). Currently VM state identifies C function execution only, so Fast and Lua
>>> +functions states will be added.
>> 
>> Typo: s/will be/are/.
> 
> Sure, thanks!
> 
>> 
>>> +
>>> +To determine currently allocating coroutine (that may not be equal to currently
>>> +executed one) a new field called `mem_L` is added to `global_State` structure
>>> +to keep the coroutine address. This field is set at each reallocation to
>> 
>> Typo: /at each reallocation to/on each reallocation to the/.
> 
> Fixed.
> 
>> 
>>> +corresponding `L` with which it was called.
>> 
>> Typo: s/it was/it is/.
> 
> Thanks, fixed!
> 
>> 
>>> +
>> 
>> <snipped>
>> 
>>> +When the profiling is stopped the `fclose()` is called. If it is impossible to
>> 
>> Typo: s/the `fclose()`/`fclose()`/.
> 
> Fixed.
> 
>> 
>>> +open a file for writing or profiler fails to start, returns `nil` on failure
>> 
>> Typo: s/returns `nil`/`nil` is returned/.
> 
> Fixed.
> 
>> 
>>> +(plus an error message as a second result and a system-dependent error code as
>>> +a third result). Otherwise returns some true value.
>> 
>> It would be nice to mention that the function contract is similar to
>> other standart io.* interfaces.
>> 
>> I glanced the source code: it's not "some" true value; it is exactly the
>> *true* value.
> 
> All right! Fixed.
> 
>> 
>>> +
>> 
>> <snipped>
>> 
>>> +Memory profiler is expected to be thread safe, so it has a corresponding
>>> +lock/unlock at internal mutex whenever you call corresponding memprof
>>> +functions. If you want to build LuaJIT without thread safety use
>>> +`-DLUAJIT_DISABLE_THREAD_SAFE`.
>> 
>> This is not implemented in scope of the MVP, so drop this part.
> 
> Done.
> 
>> 
>>> +
>>> +### Reading and displaying saved data
>>> +
>>> +Binary data can be read by `lj-parse-memprof` utility. It parses the binary
>> 
>> Typo: s/lj-parse-memprof/luajit-parse-memprof/.
> 
> Fixed, thanks!
> 
>> 
>>> +format provided by memory profiler and render it on human-readable format.
>> 
>> Typo: s/it on/it to/.
> 
> Fixed, thanks!
> 
>> 
>>> +
>> 
>> <snipped>
>> 
>>> +This table shows performance deviation in relation to REFerence value (before
>>> +commit) with stopped and running profiler. The table shows the average value
>>> +for 11 runs. The first field of the column indicates the change in the average
>>> +time in seconds (less is better). The second field is the standard deviation
>>> +for the found difference.
>>> +
>>> +```
>>> +     Name       | REF  | AFTER, memprof off | AFTER, memprof on
>>> +----------------+------+--------------------+------------------
>>> +array3d         | 0.21 |    +0.00 (0.01)    |    +0.00 (0.01)
>>> +binary-trees    | 3.25 |    -0.01 (0.06)    |    +0.53 (0.10)
>>> +chameneos       | 2.97 |    +0.14 (0.04)    |    +0.13 (0.06)
>>> +coroutine-ring  | 1.00 |    +0.01 (0.04)    |    +0.01 (0.04)
>>> +euler14-bit     | 1.03 |    +0.01 (0.02)    |    +0.00 (0.02)
>>> +fannkuch        | 6.81 |    -0.21 (0.06)    |    -0.20 (0.06)
>>> +fasta           | 8.20 |    -0.07 (0.05)    |    -0.08 (0.03)
>> 
>> Side note: Still curious how this can happen. It looks OK when this is
>> negative difference in within its deviation. But this is sorta magic.
> 
> Yes, me too. Unfortunately, we have neither any benchmark tests nor
> performance analisis for LuaJIT for now.
> 
>> 
>>> +life            | 0.46 |    +0.00 (0.01)    |    +0.35 (0.01)
>>> +mandelbrot      | 2.65 |    +0.00 (0.01)    |    +0.01 (0.01)
>>> +mandelbrot-bit  | 1.97 |    +0.00 (0.01)    |    +0.01 (0.02)
>>> +md5             | 1.58 |    -0.01 (0.04)    |    -0.04 (0.04)
>>> +nbody           | 1.34 |    +0.00 (0.01)    |    -0.02 (0.01)
>>> +nsieve          | 2.07 |    -0.03 (0.03)    |    -0.01 (0.04)
>>> +nsieve-bit      | 1.50 |    -0.02 (0.04)    |    +0.00 (0.04)
>>> +nsieve-bit-fp   | 4.44 |    -0.03 (0.07)    |    -0.01 (0.07)
>>> +partialsums     | 0.54 |    +0.00 (0.01)    |    +0.00 (0.01)
>>> +pidigits-nogmp  | 3.47 |    -0.01 (0.02)    |    -0.10 (0.02)
>>> +ray             | 1.62 |    -0.02 (0.03)    |    +0.00 (0.02)
>>> +recursive-ack   | 0.20 |    +0.00 (0.01)    |    +0.00 (0.01)
>>> +recursive-fib   | 1.63 |    +0.00 (0.01)    |    +0.01 (0.02)
>>> +scimark-fft     | 5.72 |    +0.06 (0.09)    |    -0.01 (0.10)
>>> +scimark-lu      | 3.47 |    +0.02 (0.27)    |    -0.03 (0.26)
>>> +scimark-sor     | 2.34 |    +0.00 (0.01)    |    -0.01 (0.01)
>>> +scimark-sparse  | 4.95 |    -0.02 (0.04)    |    -0.02 (0.04)
>>> +series          | 0.95 |    +0.00 (0.02)    |    +0.00 (0.01)
>>> +spectral-norm   | 0.96 |    +0.00 (0.02)    |    -0.01 (0.02)
>>> +```
>>> -- 
>>> 2.28.0
>>> 
>> 
>> -- 
>> Best regards,
>> IM
> 
> ===================================================================
> diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
> index 85a61462a..2721f1cc1 100644
> --- a/doc/rfc/5442-luajit-memory-profiler.md
> +++ b/doc/rfc/5442-luajit-memory-profiler.md
> @@ -30,39 +30,39 @@ The whole toolchain of memory profiling will be divided into several parts:
> 
> This section describes additional changes in LuaJIT required for the feature
> implementation. This version of LuaJIT memory profiler does not support verbose
> -reporting allocations from traces. All allocation from traces are reported as
> -internal. But trace code semantics should be totally the same as for the Lua
> -interpreter (excluding sink optimizations). Also all deallocations reported as
> -internal too.
> +reporting for allocations made on traces. All allocation from traces are
> +reported as internal. But trace code semantics should be totally the same as
> +for the Lua interpreter (excluding sink optimizations). Also all, deallocations
> +are reported as internal too.
> 
> There are two different representations of functions in LuaJIT: the function's
> prototype (`GCproto`) and the function object so called closure (`GCfunc`).
> The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
> -correspondingly. Also LuaJIT has a special function's type aka Fast Function.
> -It is used for LuaJIT builtins.
> +respectively. Besides LuaJIT has a special function type, a.k.a. Fast Function
> +that is used for LuaJIT built-ins
> 
> Tail call optimization does not create a new call frame, so all allocations
> inside the function called via `CALLT`/`CALLMT` are attributed to its caller.
> 
> -Usually developers are not interested in information about allocations inside
> -builtins. So if fast function was called from a Lua function all
> -allocations are attributed to this Lua function. Otherwise attribute this event
> -to a C function.
> +Lua developers can do nothing with allocations made inside the built-ins except
> +reducing its usage. So if fast function is called from a Lua function all
> +allocations made in its scope are attributed to this Lua function (i.e. the
> +built-in caller). Otherwise, this event is attributed to a C function.
> 
> Assume we have the following Lua chunk named <test.lua>:
> 
> -```
> +```lua
> 1  jit.off()
> 2  misc.memprof.start("memprof_new.bin")
> -3  -- Lua does not create a new frame to call string.rep and all allocations are
> -4  -- attributed not to `append()` function but to the parent scope.
> +3  -- Lua does not create a new frame to call string.rep() and all allocations
> +4  -- are attributed not to append() function but to the parent scope.
> 5  local function append(str, rep)
> 6    return string.rep(str, rep)
> 7  end
> 8
> 9  local t = {}
> 10 for _ = 1, 1e5 do
> -11   -- table.insert is a builtin and all corresponding allocations
> +11   -- table.insert() is a built-in and all corresponding allocations
> 12   -- are reported in the scope of main chunk
> 13   table.insert(t,
> 14     append('q', _)
> @@ -71,7 +71,7 @@ Assume we have the following Lua chunk named <test.lua>:
> 17 misc.memprof.stop()
> ```
> 
> -If one run the chunk above the profiler reports approximately the following
> +If one runs the chunk above the profiler reports approximately the following
> (see legend [here](#reading-and-displaying-saved-data)):
> ```
> ALLOCATIONS
> @@ -99,15 +99,15 @@ INTERNAL: 20    0       1481
> 
> So we need to know a type of function being executed by the virtual machine
> (VM). Currently VM state identifies C function execution only, so Fast and Lua
> -functions states will be added.
> +functions states are added.
> 
> To determine currently allocating coroutine (that may not be equal to currently
> executed one) a new field called `mem_L` is added to `global_State` structure
> -to keep the coroutine address. This field is set at each reallocation to
> -corresponding `L` with which it was called.
> +to keep the coroutine address. This field is set on each reallocation to the
> +corresponding `L` with which it is called.
> 
> There is a static function (`lj_debug_getframeline`) that returns line number
> -for current `BCPos` in `lj_debug.c` already. It will be added to the debug
> +for current `BCPos` in `lj_debug.c` already. It is added to the debug
> module API to be used in memory profiler.
> 
> ### Information recording
> @@ -211,10 +211,11 @@ local started, err, errno = misc.memprof.start(fname)
> ```
> where `fname` is name of the file where profile events are written. Writer for
> this function perform `fwrite()` for each call retrying in case of `EINTR`.
> -When the profiling is stopped the `fclose()` is called. If it is impossible to
> -open a file for writing or profiler fails to start, returns `nil` on failure
> +When the profiling is stopped `fclose()` is called. The profiler's function's
> +contract is similar to standard `io.*` interfaces. If it is impossible to open
> +a file for writing or profiler fails to start, `nil` is returned on failure
> (plus an error message as a second result and a system-dependent error code as
> -a third result). Otherwise returns some true value.
> +a third result). Otherwise, returns `true` value.
> 
> Stopping profiler from Lua is simple too:
> ```lua
> @@ -230,17 +231,12 @@ If you want to build LuaJIT without memory profiler, you should build it with
> `-DLUAJIT_DISABLE_MEMPROF`. If it is disabled `misc.memprof.start()` and
> `misc.memprof.stop()` always return `false`.
> 
> -Memory profiler is expected to be thread safe, so it has a corresponding
> -lock/unlock at internal mutex whenever you call corresponding memprof
> -functions. If you want to build LuaJIT without thread safety use
> -`-DLUAJIT_DISABLE_THREAD_SAFE`.
> -
> ### Reading and displaying saved data
> 
> -Binary data can be read by `lj-parse-memprof` utility. It parses the binary
> -format provided by memory profiler and render it on human-readable format.
> +Binary data can be read by `luajit-parse-memprof` utility. It parses the binary
> +format provided by memory profiler and render it to human-readable format.
> 
> -The usage is very simple:
> +The usage for LuaJIT itself is very simple:
> ```
> $ ./luajit-parse-memprof --help
> luajit-parse-memprof - parser of the memory usage profile collected
> @@ -266,6 +262,12 @@ structures. Note that events are sorted from the most often to the least.
> 
> `Overrides` means what allocation this reallocation overrides.
> 
> +If you want to parse binary data via Tarantool only, use the following
> +command (dash is important):
> +```bash
> +$ tarantool -e 'require("memprof")(arg[1])' - memprof.bin
> +```
> +
> ## Benchmarks
> 
> Benchmarks were taken from repo:
> ===================================================================
> 
> And one more iterative patch (over the previous one). Branch is
> force pushed.
> ===================================================================
> diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
> index 2721f1cc1..f9c43f91f 100644
> --- a/doc/rfc/5442-luajit-memory-profiler.md
> +++ b/doc/rfc/5442-luajit-memory-profiler.md
> @@ -5,7 +5,7 @@
> * **Authors**: Sergey Kaplun @Buristan skaplun at tarantool.org <mailto:skaplun at tarantool.org>,
>                Igor Munkin @igormunkin imun at tarantool.org <mailto:imun at tarantool.org>,
>                Sergey Ostanevich @sergos sergos at tarantool.org <mailto:sergos at tarantool.org>
> -* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442 <https://github.com/tarantool/tarantool/issues/5442>)
> +* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442 <https://github.com/tarantool/tarantool/issues/5442>), [#5490](https://github.com/tarantool/tarantool/issues/5490 <https://github.com/tarantool/tarantool/issues/5490>)
> 
> ## Summary
> ===================================================================
> -- 
> Best regards,
> Sergey Kaplun

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.tarantool.org/pipermail/tarantool-discussions/attachments/20210120/83a3841d/attachment.htm>


More information about the Tarantool-discussions mailing list