Hi!
Thanks for the patch, I've looked into
in ‘Prerequisites’:
Also all, deallocations are reported as internal too.
the comma is not needed
Lua developers can do nothing with allocations made inside the built-ins except reducing its usage.
‘its’ doesn’t explain exact matter. I would rephrase: "As for allocations made inside the built-ins user can do nothing but reduce use of these built-ins."
Currently VM state identifies C function execution only, so Fast and Lua functions states are added.
‘Currently’ -> ‘Originally’
Otherwise LGTM.
Sergos
Hi, Igor!Thanks for the review!On 15.01.21, Igor Munkin wrote:Sergey,
Thanks for the changes. There is a bit of nitpicking below and I
believe we'll push the next version doc to the trunk.
I've fixed all your comments, plus added some insignificant fixes.See two iterative patches below. Branch is force pushed.
On 25.12.20, Sergey Kaplun wrote:
Part of #5442
---
RFC on branch: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md
Side note: branch name is updated.New RFC version: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md
Changes in v3:
* More comments in example.
* More verbose benchmark information.
* Grammar and spelling fixes.
Changes in v2:
* Removed C API, Tarantool integration and description of additional
features -- they will be added in another RFC if necessary.
* Removed checking profile is running from the public API.
* Added benchmarks and more meaningful example.
* Grammar fixes.
doc/rfc/5442-luajit-memory-profiler.md | 314 +++++++++++++++++++++++++
1 file changed, 314 insertions(+)
create mode 100644 doc/rfc/5442-luajit-memory-profiler.md
diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
new file mode 100644
index 000000000..85a61462a
--- /dev/null
+++ b/doc/rfc/5442-luajit-memory-profiler.md
@@ -0,0 +1,314 @@
<snipped>
+### Prerequisites
+
+This section describes additional changes in LuaJIT required for the feature
+implementation. This version of LuaJIT memory profiler does not support verbose
+reporting allocations from traces. All allocation from traces are reported as
Typo: s/reporting allocations from/reporting for allocations made on/.
Fixed, thanks!
+internal. But trace code semantics should be totally the same as for the Lua
+interpreter (excluding sink optimizations). Also all deallocations reported as
Typo: s/deallocations reported/deallocation are reported/.
Fixed, thanks!
+internal too.
+
+There are two different representations of functions in LuaJIT: the function's
+prototype (`GCproto`) and the function object so called closure (`GCfunc`).
+The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
+correspondingly. Also LuaJIT has a special function's type aka Fast Function.
Typo: s/correspondingly/respectively/.
+It is used for LuaJIT builtins.
It's better to not split this sentence. Consider the rewording:
| Besides LuaJIT has a special function type a.k.a. Fast Function that
| is used for LuaJIT builtins.
Applied! Thanks!
+
<snipped>
+Usually developers are not interested in information about allocations inside
+builtins. So if fast function was called from a Lua function all
+allocations are attributed to this Lua function. Otherwise attribute this event
+to a C function.
I propose the following rewording:
| Lua developers can do nothing with allocations made inside the
| builtins except reducing its usage. So if fast function is called from
| a Lua function all allocations made in its scope are attributed to this
| Lua function (i.e. the builtin caller). Otherwise this event is
| attributed to a C function.
Applied, thanks!+
<snipped>
+If one run the chunk above the profiler reports approximately the following
Typo: s/run/runs/.
Fixed.
+(see legend [here](#reading-and-displaying-saved-data)):
<snipped>
+So we need to know a type of function being executed by the virtual machine
+(VM). Currently VM state identifies C function execution only, so Fast and Lua
+functions states will be added.
Typo: s/will be/are/.
Sure, thanks!
+
+To determine currently allocating coroutine (that may not be equal to currently
+executed one) a new field called `mem_L` is added to `global_State` structure
+to keep the coroutine address. This field is set at each reallocation to
Typo: /at each reallocation to/on each reallocation to the/.
Fixed.
+corresponding `L` with which it was called.
Typo: s/it was/it is/.
Thanks, fixed!
+
<snipped>
+When the profiling is stopped the `fclose()` is called. If it is impossible to
Typo: s/the `fclose()`/`fclose()`/.
Fixed.
+open a file for writing or profiler fails to start, returns `nil` on failure
Typo: s/returns `nil`/`nil` is returned/.
Fixed.
+(plus an error message as a second result and a system-dependent error code as
+a third result). Otherwise returns some true value.
It would be nice to mention that the function contract is similar to
other standart io.* interfaces.
I glanced the source code: it's not "some" true value; it is exactly the
*true* value.
All right! Fixed.
+
<snipped>
+Memory profiler is expected to be thread safe, so it has a corresponding
+lock/unlock at internal mutex whenever you call corresponding memprof
+functions. If you want to build LuaJIT without thread safety use
+`-DLUAJIT_DISABLE_THREAD_SAFE`.
This is not implemented in scope of the MVP, so drop this part.
Done.
+
+### Reading and displaying saved data
+
+Binary data can be read by `lj-parse-memprof` utility. It parses the binary
Typo: s/lj-parse-memprof/luajit-parse-memprof/.
Fixed, thanks!
+format provided by memory profiler and render it on human-readable format.
Typo: s/it on/it to/.
Fixed, thanks!
+
<snipped>
+This table shows performance deviation in relation to REFerence value (before
+commit) with stopped and running profiler. The table shows the average value
+for 11 runs. The first field of the column indicates the change in the average
+time in seconds (less is better). The second field is the standard deviation
+for the found difference.
+
+```
+ Name | REF | AFTER, memprof off | AFTER, memprof on
+----------------+------+--------------------+------------------
+array3d | 0.21 | +0.00 (0.01) | +0.00 (0.01)
+binary-trees | 3.25 | -0.01 (0.06) | +0.53 (0.10)
+chameneos | 2.97 | +0.14 (0.04) | +0.13 (0.06)
+coroutine-ring | 1.00 | +0.01 (0.04) | +0.01 (0.04)
+euler14-bit | 1.03 | +0.01 (0.02) | +0.00 (0.02)
+fannkuch | 6.81 | -0.21 (0.06) | -0.20 (0.06)
+fasta | 8.20 | -0.07 (0.05) | -0.08 (0.03)
Side note: Still curious how this can happen. It looks OK when this is
negative difference in within its deviation. But this is sorta magic.
Yes, me too. Unfortunately, we have neither any benchmark tests norperformance analisis for LuaJIT for now.
+life | 0.46 | +0.00 (0.01) | +0.35 (0.01)
+mandelbrot | 2.65 | +0.00 (0.01) | +0.01 (0.01)
+mandelbrot-bit | 1.97 | +0.00 (0.01) | +0.01 (0.02)
+md5 | 1.58 | -0.01 (0.04) | -0.04 (0.04)
+nbody | 1.34 | +0.00 (0.01) | -0.02 (0.01)
+nsieve | 2.07 | -0.03 (0.03) | -0.01 (0.04)
+nsieve-bit | 1.50 | -0.02 (0.04) | +0.00 (0.04)
+nsieve-bit-fp | 4.44 | -0.03 (0.07) | -0.01 (0.07)
+partialsums | 0.54 | +0.00 (0.01) | +0.00 (0.01)
+pidigits-nogmp | 3.47 | -0.01 (0.02) | -0.10 (0.02)
+ray | 1.62 | -0.02 (0.03) | +0.00 (0.02)
+recursive-ack | 0.20 | +0.00 (0.01) | +0.00 (0.01)
+recursive-fib | 1.63 | +0.00 (0.01) | +0.01 (0.02)
+scimark-fft | 5.72 | +0.06 (0.09) | -0.01 (0.10)
+scimark-lu | 3.47 | +0.02 (0.27) | -0.03 (0.26)
+scimark-sor | 2.34 | +0.00 (0.01) | -0.01 (0.01)
+scimark-sparse | 4.95 | -0.02 (0.04) | -0.02 (0.04)
+series | 0.95 | +0.00 (0.02) | +0.00 (0.01)
+spectral-norm | 0.96 | +0.00 (0.02) | -0.01 (0.02)
+```
--
2.28.0
--
Best regards,
IM
===================================================================diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.mdindex 85a61462a..2721f1cc1 100644--- a/doc/rfc/5442-luajit-memory-profiler.md+++ b/doc/rfc/5442-luajit-memory-profiler.md@@ -30,39 +30,39 @@ The whole toolchain of memory profiling will be divided into several parts:This section describes additional changes in LuaJIT required for the featureimplementation. This version of LuaJIT memory profiler does not support verbose-reporting allocations from traces. All allocation from traces are reported as-internal. But trace code semantics should be totally the same as for the Lua-interpreter (excluding sink optimizations). Also all deallocations reported as-internal too.+reporting for allocations made on traces. All allocation from traces are+reported as internal. But trace code semantics should be totally the same as+for the Lua interpreter (excluding sink optimizations). Also all, deallocations+are reported as internal too.There are two different representations of functions in LuaJIT: the function'sprototype (`GCproto`) and the function object so called closure (`GCfunc`).The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures-correspondingly. Also LuaJIT has a special function's type aka Fast Function.-It is used for LuaJIT builtins.+respectively. Besides LuaJIT has a special function type, a.k.a. Fast Function+that is used for LuaJIT built-insTail call optimization does not create a new call frame, so all allocationsinside the function called via `CALLT`/`CALLMT` are attributed to its caller.-Usually developers are not interested in information about allocations inside-builtins. So if fast function was called from a Lua function all-allocations are attributed to this Lua function. Otherwise attribute this event-to a C function.+Lua developers can do nothing with allocations made inside the built-ins except+reducing its usage. So if fast function is called from a Lua function all+allocations made in its scope are attributed to this Lua function (i.e. the+built-in caller). Otherwise, this event is attributed to a C function.Assume we have the following Lua chunk named <test.lua>:-```+```lua1 jit.off()2 misc.memprof.start("memprof_new.bin")-3 -- Lua does not create a new frame to call string.rep and all allocations are-4 -- attributed not to `append()` function but to the parent scope.+3 -- Lua does not create a new frame to call string.rep() and all allocations+4 -- are attributed not to append() function but to the parent scope.5 local function append(str, rep)6 return string.rep(str, rep)7 end89 local t = {}10 for _ = 1, 1e5 do-11 -- table.insert is a builtin and all corresponding allocations+11 -- table.insert() is a built-in and all corresponding allocations12 -- are reported in the scope of main chunk13 table.insert(t,14 append('q', _)@@ -71,7 +71,7 @@ Assume we have the following Lua chunk named <test.lua>:17 misc.memprof.stop()```-If one run the chunk above the profiler reports approximately the following+If one runs the chunk above the profiler reports approximately the following(see legend [here](#reading-and-displaying-saved-data)):```ALLOCATIONS@@ -99,15 +99,15 @@ INTERNAL: 20 0 1481So we need to know a type of function being executed by the virtual machine(VM). Currently VM state identifies C function execution only, so Fast and Lua-functions states will be added.+functions states are added.To determine currently allocating coroutine (that may not be equal to currentlyexecuted one) a new field called `mem_L` is added to `global_State` structure-to keep the coroutine address. This field is set at each reallocation to-corresponding `L` with which it was called.+to keep the coroutine address. This field is set on each reallocation to the+corresponding `L` with which it is called.There is a static function (`lj_debug_getframeline`) that returns line number-for current `BCPos` in `lj_debug.c` already. It will be added to the debug+for current `BCPos` in `lj_debug.c` already. It is added to the debugmodule API to be used in memory profiler.### Information recording@@ -211,10 +211,11 @@ local started, err, errno = misc.memprof.start(fname)```where `fname` is name of the file where profile events are written. Writer forthis function perform `fwrite()` for each call retrying in case of `EINTR`.-When the profiling is stopped the `fclose()` is called. If it is impossible to-open a file for writing or profiler fails to start, returns `nil` on failure+When the profiling is stopped `fclose()` is called. The profiler's function's+contract is similar to standard `io.*` interfaces. If it is impossible to open+a file for writing or profiler fails to start, `nil` is returned on failure(plus an error message as a second result and a system-dependent error code as-a third result). Otherwise returns some true value.+a third result). Otherwise, returns `true` value.Stopping profiler from Lua is simple too:```lua@@ -230,17 +231,12 @@ If you want to build LuaJIT without memory profiler, you should build it with`-DLUAJIT_DISABLE_MEMPROF`. If it is disabled `misc.memprof.start()` and`misc.memprof.stop()` always return `false`.-Memory profiler is expected to be thread safe, so it has a corresponding-lock/unlock at internal mutex whenever you call corresponding memprof-functions. If you want to build LuaJIT without thread safety use-`-DLUAJIT_DISABLE_THREAD_SAFE`.-### Reading and displaying saved data-Binary data can be read by `lj-parse-memprof` utility. It parses the binary-format provided by memory profiler and render it on human-readable format.+Binary data can be read by `luajit-parse-memprof` utility. It parses the binary+format provided by memory profiler and render it to human-readable format.-The usage is very simple:+The usage for LuaJIT itself is very simple:```$ ./luajit-parse-memprof --helpluajit-parse-memprof - parser of the memory usage profile collected@@ -266,6 +262,12 @@ structures. Note that events are sorted from the most often to the least.`Overrides` means what allocation this reallocation overrides.+If you want to parse binary data via Tarantool only, use the following+command (dash is important):+```bash+$ tarantool -e 'require("memprof")(arg[1])' - memprof.bin+```+## BenchmarksBenchmarks were taken from repo:===================================================================And one more iterative patch (over the previous one). Branch isforce pushed.===================================================================diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.mdindex 2721f1cc1..f9c43f91f 100644--- a/doc/rfc/5442-luajit-memory-profiler.md+++ b/doc/rfc/5442-luajit-memory-profiler.md@@ -5,7 +5,7 @@* **Authors**: Sergey Kaplun @Buristan skaplun@tarantool.org, Igor Munkin @igormunkin imun@tarantool.org, Sergey Ostanevich @sergos sergos@tarantool.org-* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442)+* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442), [#5490](https://github.com/tarantool/tarantool/issues/5490)## Summary===================================================================-- Best regards,Sergey Kaplun