[Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler
Sergey Kaplun
skaplun at tarantool.org
Wed Jan 20 11:19:57 MSK 2021
Hi, Igor!
Thanks for the review!
On 15.01.21, Igor Munkin wrote:
> Sergey,
>
> Thanks for the changes. There is a bit of nitpicking below and I
> believe we'll push the next version doc to the trunk.
I've fixed all your comments, plus added some insignificant fixes.
See two iterative patches below. Branch is force pushed.
>
> On 25.12.20, Sergey Kaplun wrote:
> > Part of #5442
> > ---
> >
> > RFC on branch: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md
Side note: branch name is updated.
New RFC version: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md
> >
> > Changes in v3:
> > * More comments in example.
> > * More verbose benchmark information.
> > * Grammar and spelling fixes.
> >
> > Changes in v2:
> > * Removed C API, Tarantool integration and description of additional
> > features -- they will be added in another RFC if necessary.
> > * Removed checking profile is running from the public API.
> > * Added benchmarks and more meaningful example.
> > * Grammar fixes.
> >
> > doc/rfc/5442-luajit-memory-profiler.md | 314 +++++++++++++++++++++++++
> > 1 file changed, 314 insertions(+)
> > create mode 100644 doc/rfc/5442-luajit-memory-profiler.md
> >
> > diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
> > new file mode 100644
> > index 000000000..85a61462a
> > --- /dev/null
> > +++ b/doc/rfc/5442-luajit-memory-profiler.md
> > @@ -0,0 +1,314 @@
>
> <snipped>
>
> > +### Prerequisites
> > +
> > +This section describes additional changes in LuaJIT required for the feature
> > +implementation. This version of LuaJIT memory profiler does not support verbose
> > +reporting allocations from traces. All allocation from traces are reported as
>
> Typo: s/reporting allocations from/reporting for allocations made on/.
Fixed, thanks!
>
> > +internal. But trace code semantics should be totally the same as for the Lua
> > +interpreter (excluding sink optimizations). Also all deallocations reported as
>
> Typo: s/deallocations reported/deallocation are reported/.
Fixed, thanks!
>
> > +internal too.
> > +
> > +There are two different representations of functions in LuaJIT: the function's
> > +prototype (`GCproto`) and the function object so called closure (`GCfunc`).
> > +The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
> > +correspondingly. Also LuaJIT has a special function's type aka Fast Function.
>
> Typo: s/correspondingly/respectively/.
>
> > +It is used for LuaJIT builtins.
>
> It's better to not split this sentence. Consider the rewording:
> | Besides LuaJIT has a special function type a.k.a. Fast Function that
> | is used for LuaJIT builtins.
Applied! Thanks!
>
> > +
>
> <snipped>
>
> > +Usually developers are not interested in information about allocations inside
> > +builtins. So if fast function was called from a Lua function all
> > +allocations are attributed to this Lua function. Otherwise attribute this event
> > +to a C function.
>
> I propose the following rewording:
> | Lua developers can do nothing with allocations made inside the
> | builtins except reducing its usage. So if fast function is called from
> | a Lua function all allocations made in its scope are attributed to this
> | Lua function (i.e. the builtin caller). Otherwise this event is
> | attributed to a C function.
>
Applied, thanks!
> > +
>
> <snipped>
>
> > +If one run the chunk above the profiler reports approximately the following
>
> Typo: s/run/runs/.
Fixed.
>
> > +(see legend [here](#reading-and-displaying-saved-data)):
>
> <snipped>
>
> > +So we need to know a type of function being executed by the virtual machine
> > +(VM). Currently VM state identifies C function execution only, so Fast and Lua
> > +functions states will be added.
>
> Typo: s/will be/are/.
Sure, thanks!
>
> > +
> > +To determine currently allocating coroutine (that may not be equal to currently
> > +executed one) a new field called `mem_L` is added to `global_State` structure
> > +to keep the coroutine address. This field is set at each reallocation to
>
> Typo: /at each reallocation to/on each reallocation to the/.
Fixed.
>
> > +corresponding `L` with which it was called.
>
> Typo: s/it was/it is/.
Thanks, fixed!
>
> > +
>
> <snipped>
>
> > +When the profiling is stopped the `fclose()` is called. If it is impossible to
>
> Typo: s/the `fclose()`/`fclose()`/.
Fixed.
>
> > +open a file for writing or profiler fails to start, returns `nil` on failure
>
> Typo: s/returns `nil`/`nil` is returned/.
Fixed.
>
> > +(plus an error message as a second result and a system-dependent error code as
> > +a third result). Otherwise returns some true value.
>
> It would be nice to mention that the function contract is similar to
> other standart io.* interfaces.
>
> I glanced the source code: it's not "some" true value; it is exactly the
> *true* value.
All right! Fixed.
>
> > +
>
> <snipped>
>
> > +Memory profiler is expected to be thread safe, so it has a corresponding
> > +lock/unlock at internal mutex whenever you call corresponding memprof
> > +functions. If you want to build LuaJIT without thread safety use
> > +`-DLUAJIT_DISABLE_THREAD_SAFE`.
>
> This is not implemented in scope of the MVP, so drop this part.
Done.
>
> > +
> > +### Reading and displaying saved data
> > +
> > +Binary data can be read by `lj-parse-memprof` utility. It parses the binary
>
> Typo: s/lj-parse-memprof/luajit-parse-memprof/.
Fixed, thanks!
>
> > +format provided by memory profiler and render it on human-readable format.
>
> Typo: s/it on/it to/.
Fixed, thanks!
>
> > +
>
> <snipped>
>
> > +This table shows performance deviation in relation to REFerence value (before
> > +commit) with stopped and running profiler. The table shows the average value
> > +for 11 runs. The first field of the column indicates the change in the average
> > +time in seconds (less is better). The second field is the standard deviation
> > +for the found difference.
> > +
> > +```
> > + Name | REF | AFTER, memprof off | AFTER, memprof on
> > +----------------+------+--------------------+------------------
> > +array3d | 0.21 | +0.00 (0.01) | +0.00 (0.01)
> > +binary-trees | 3.25 | -0.01 (0.06) | +0.53 (0.10)
> > +chameneos | 2.97 | +0.14 (0.04) | +0.13 (0.06)
> > +coroutine-ring | 1.00 | +0.01 (0.04) | +0.01 (0.04)
> > +euler14-bit | 1.03 | +0.01 (0.02) | +0.00 (0.02)
> > +fannkuch | 6.81 | -0.21 (0.06) | -0.20 (0.06)
> > +fasta | 8.20 | -0.07 (0.05) | -0.08 (0.03)
>
> Side note: Still curious how this can happen. It looks OK when this is
> negative difference in within its deviation. But this is sorta magic.
Yes, me too. Unfortunately, we have neither any benchmark tests nor
performance analisis for LuaJIT for now.
>
> > +life | 0.46 | +0.00 (0.01) | +0.35 (0.01)
> > +mandelbrot | 2.65 | +0.00 (0.01) | +0.01 (0.01)
> > +mandelbrot-bit | 1.97 | +0.00 (0.01) | +0.01 (0.02)
> > +md5 | 1.58 | -0.01 (0.04) | -0.04 (0.04)
> > +nbody | 1.34 | +0.00 (0.01) | -0.02 (0.01)
> > +nsieve | 2.07 | -0.03 (0.03) | -0.01 (0.04)
> > +nsieve-bit | 1.50 | -0.02 (0.04) | +0.00 (0.04)
> > +nsieve-bit-fp | 4.44 | -0.03 (0.07) | -0.01 (0.07)
> > +partialsums | 0.54 | +0.00 (0.01) | +0.00 (0.01)
> > +pidigits-nogmp | 3.47 | -0.01 (0.02) | -0.10 (0.02)
> > +ray | 1.62 | -0.02 (0.03) | +0.00 (0.02)
> > +recursive-ack | 0.20 | +0.00 (0.01) | +0.00 (0.01)
> > +recursive-fib | 1.63 | +0.00 (0.01) | +0.01 (0.02)
> > +scimark-fft | 5.72 | +0.06 (0.09) | -0.01 (0.10)
> > +scimark-lu | 3.47 | +0.02 (0.27) | -0.03 (0.26)
> > +scimark-sor | 2.34 | +0.00 (0.01) | -0.01 (0.01)
> > +scimark-sparse | 4.95 | -0.02 (0.04) | -0.02 (0.04)
> > +series | 0.95 | +0.00 (0.02) | +0.00 (0.01)
> > +spectral-norm | 0.96 | +0.00 (0.02) | -0.01 (0.02)
> > +```
> > --
> > 2.28.0
> >
>
> --
> Best regards,
> IM
===================================================================
diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
index 85a61462a..2721f1cc1 100644
--- a/doc/rfc/5442-luajit-memory-profiler.md
+++ b/doc/rfc/5442-luajit-memory-profiler.md
@@ -30,39 +30,39 @@ The whole toolchain of memory profiling will be divided into several parts:
This section describes additional changes in LuaJIT required for the feature
implementation. This version of LuaJIT memory profiler does not support verbose
-reporting allocations from traces. All allocation from traces are reported as
-internal. But trace code semantics should be totally the same as for the Lua
-interpreter (excluding sink optimizations). Also all deallocations reported as
-internal too.
+reporting for allocations made on traces. All allocation from traces are
+reported as internal. But trace code semantics should be totally the same as
+for the Lua interpreter (excluding sink optimizations). Also all, deallocations
+are reported as internal too.
There are two different representations of functions in LuaJIT: the function's
prototype (`GCproto`) and the function object so called closure (`GCfunc`).
The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
-correspondingly. Also LuaJIT has a special function's type aka Fast Function.
-It is used for LuaJIT builtins.
+respectively. Besides LuaJIT has a special function type, a.k.a. Fast Function
+that is used for LuaJIT built-ins
Tail call optimization does not create a new call frame, so all allocations
inside the function called via `CALLT`/`CALLMT` are attributed to its caller.
-Usually developers are not interested in information about allocations inside
-builtins. So if fast function was called from a Lua function all
-allocations are attributed to this Lua function. Otherwise attribute this event
-to a C function.
+Lua developers can do nothing with allocations made inside the built-ins except
+reducing its usage. So if fast function is called from a Lua function all
+allocations made in its scope are attributed to this Lua function (i.e. the
+built-in caller). Otherwise, this event is attributed to a C function.
Assume we have the following Lua chunk named <test.lua>:
-```
+```lua
1 jit.off()
2 misc.memprof.start("memprof_new.bin")
-3 -- Lua does not create a new frame to call string.rep and all allocations are
-4 -- attributed not to `append()` function but to the parent scope.
+3 -- Lua does not create a new frame to call string.rep() and all allocations
+4 -- are attributed not to append() function but to the parent scope.
5 local function append(str, rep)
6 return string.rep(str, rep)
7 end
8
9 local t = {}
10 for _ = 1, 1e5 do
-11 -- table.insert is a builtin and all corresponding allocations
+11 -- table.insert() is a built-in and all corresponding allocations
12 -- are reported in the scope of main chunk
13 table.insert(t,
14 append('q', _)
@@ -71,7 +71,7 @@ Assume we have the following Lua chunk named <test.lua>:
17 misc.memprof.stop()
```
-If one run the chunk above the profiler reports approximately the following
+If one runs the chunk above the profiler reports approximately the following
(see legend [here](#reading-and-displaying-saved-data)):
```
ALLOCATIONS
@@ -99,15 +99,15 @@ INTERNAL: 20 0 1481
So we need to know a type of function being executed by the virtual machine
(VM). Currently VM state identifies C function execution only, so Fast and Lua
-functions states will be added.
+functions states are added.
To determine currently allocating coroutine (that may not be equal to currently
executed one) a new field called `mem_L` is added to `global_State` structure
-to keep the coroutine address. This field is set at each reallocation to
-corresponding `L` with which it was called.
+to keep the coroutine address. This field is set on each reallocation to the
+corresponding `L` with which it is called.
There is a static function (`lj_debug_getframeline`) that returns line number
-for current `BCPos` in `lj_debug.c` already. It will be added to the debug
+for current `BCPos` in `lj_debug.c` already. It is added to the debug
module API to be used in memory profiler.
### Information recording
@@ -211,10 +211,11 @@ local started, err, errno = misc.memprof.start(fname)
```
where `fname` is name of the file where profile events are written. Writer for
this function perform `fwrite()` for each call retrying in case of `EINTR`.
-When the profiling is stopped the `fclose()` is called. If it is impossible to
-open a file for writing or profiler fails to start, returns `nil` on failure
+When the profiling is stopped `fclose()` is called. The profiler's function's
+contract is similar to standard `io.*` interfaces. If it is impossible to open
+a file for writing or profiler fails to start, `nil` is returned on failure
(plus an error message as a second result and a system-dependent error code as
-a third result). Otherwise returns some true value.
+a third result). Otherwise, returns `true` value.
Stopping profiler from Lua is simple too:
```lua
@@ -230,17 +231,12 @@ If you want to build LuaJIT without memory profiler, you should build it with
`-DLUAJIT_DISABLE_MEMPROF`. If it is disabled `misc.memprof.start()` and
`misc.memprof.stop()` always return `false`.
-Memory profiler is expected to be thread safe, so it has a corresponding
-lock/unlock at internal mutex whenever you call corresponding memprof
-functions. If you want to build LuaJIT without thread safety use
-`-DLUAJIT_DISABLE_THREAD_SAFE`.
-
### Reading and displaying saved data
-Binary data can be read by `lj-parse-memprof` utility. It parses the binary
-format provided by memory profiler and render it on human-readable format.
+Binary data can be read by `luajit-parse-memprof` utility. It parses the binary
+format provided by memory profiler and render it to human-readable format.
-The usage is very simple:
+The usage for LuaJIT itself is very simple:
```
$ ./luajit-parse-memprof --help
luajit-parse-memprof - parser of the memory usage profile collected
@@ -266,6 +262,12 @@ structures. Note that events are sorted from the most often to the least.
`Overrides` means what allocation this reallocation overrides.
+If you want to parse binary data via Tarantool only, use the following
+command (dash is important):
+```bash
+$ tarantool -e 'require("memprof")(arg[1])' - memprof.bin
+```
+
## Benchmarks
Benchmarks were taken from repo:
===================================================================
And one more iterative patch (over the previous one). Branch is
force pushed.
===================================================================
diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
index 2721f1cc1..f9c43f91f 100644
--- a/doc/rfc/5442-luajit-memory-profiler.md
+++ b/doc/rfc/5442-luajit-memory-profiler.md
@@ -5,7 +5,7 @@
* **Authors**: Sergey Kaplun @Buristan skaplun at tarantool.org,
Igor Munkin @igormunkin imun at tarantool.org,
Sergey Ostanevich @sergos sergos at tarantool.org
-* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442)
+* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442), [#5490](https://github.com/tarantool/tarantool/issues/5490)
## Summary
===================================================================
--
Best regards,
Sergey Kaplun
More information about the Tarantool-discussions
mailing list