[Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler
Igor Munkin
imun at tarantool.org
Fri Jan 15 16:14:24 MSK 2021
Sergey,
Thanks for the changes. There is a bit of nitpicking below and I
believe we'll push the next version doc to the trunk.
On 25.12.20, Sergey Kaplun wrote:
> Part of #5442
> ---
>
> RFC on branch: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md
>
> Changes in v3:
> * More comments in example.
> * More verbose benchmark information.
> * Grammar and spelling fixes.
>
> Changes in v2:
> * Removed C API, Tarantool integration and description of additional
> features -- they will be added in another RFC if necessary.
> * Removed checking profile is running from the public API.
> * Added benchmarks and more meaningful example.
> * Grammar fixes.
>
> doc/rfc/5442-luajit-memory-profiler.md | 314 +++++++++++++++++++++++++
> 1 file changed, 314 insertions(+)
> create mode 100644 doc/rfc/5442-luajit-memory-profiler.md
>
> diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
> new file mode 100644
> index 000000000..85a61462a
> --- /dev/null
> +++ b/doc/rfc/5442-luajit-memory-profiler.md
> @@ -0,0 +1,314 @@
<snipped>
> +### Prerequisites
> +
> +This section describes additional changes in LuaJIT required for the feature
> +implementation. This version of LuaJIT memory profiler does not support verbose
> +reporting allocations from traces. All allocation from traces are reported as
Typo: s/reporting allocations from/reporting for allocations made on/.
> +internal. But trace code semantics should be totally the same as for the Lua
> +interpreter (excluding sink optimizations). Also all deallocations reported as
Typo: s/deallocations reported/deallocation are reported/.
> +internal too.
> +
> +There are two different representations of functions in LuaJIT: the function's
> +prototype (`GCproto`) and the function object so called closure (`GCfunc`).
> +The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
> +correspondingly. Also LuaJIT has a special function's type aka Fast Function.
Typo: s/correspondingly/respectively/.
> +It is used for LuaJIT builtins.
It's better to not split this sentence. Consider the rewording:
| Besides LuaJIT has a special function type a.k.a. Fast Function that
| is used for LuaJIT builtins.
> +
<snipped>
> +Usually developers are not interested in information about allocations inside
> +builtins. So if fast function was called from a Lua function all
> +allocations are attributed to this Lua function. Otherwise attribute this event
> +to a C function.
I propose the following rewording:
| Lua developers can do nothing with allocations made inside the
| builtins except reducing its usage. So if fast function is called from
| a Lua function all allocations made in its scope are attributed to this
| Lua function (i.e. the builtin caller). Otherwise this event is
| attributed to a C function.
> +
<snipped>
> +If one run the chunk above the profiler reports approximately the following
Typo: s/run/runs/.
> +(see legend [here](#reading-and-displaying-saved-data)):
<snipped>
> +So we need to know a type of function being executed by the virtual machine
> +(VM). Currently VM state identifies C function execution only, so Fast and Lua
> +functions states will be added.
Typo: s/will be/are/.
> +
> +To determine currently allocating coroutine (that may not be equal to currently
> +executed one) a new field called `mem_L` is added to `global_State` structure
> +to keep the coroutine address. This field is set at each reallocation to
Typo: /at each reallocation to/on each reallocation to the/.
> +corresponding `L` with which it was called.
Typo: s/it was/it is/.
> +
<snipped>
> +When the profiling is stopped the `fclose()` is called. If it is impossible to
Typo: s/the `fclose()`/`fclose()`/.
> +open a file for writing or profiler fails to start, returns `nil` on failure
Typo: s/returns `nil`/`nil` is returned/.
> +(plus an error message as a second result and a system-dependent error code as
> +a third result). Otherwise returns some true value.
It would be nice to mention that the function contract is similar to
other standart io.* interfaces.
I glanced the source code: it's not "some" true value; it is exactly the
*true* value.
> +
<snipped>
> +Memory profiler is expected to be thread safe, so it has a corresponding
> +lock/unlock at internal mutex whenever you call corresponding memprof
> +functions. If you want to build LuaJIT without thread safety use
> +`-DLUAJIT_DISABLE_THREAD_SAFE`.
This is not implemented in scope of the MVP, so drop this part.
> +
> +### Reading and displaying saved data
> +
> +Binary data can be read by `lj-parse-memprof` utility. It parses the binary
Typo: s/lj-parse-memprof/luajit-parse-memprof/.
> +format provided by memory profiler and render it on human-readable format.
Typo: s/it on/it to/.
> +
<snipped>
> +This table shows performance deviation in relation to REFerence value (before
> +commit) with stopped and running profiler. The table shows the average value
> +for 11 runs. The first field of the column indicates the change in the average
> +time in seconds (less is better). The second field is the standard deviation
> +for the found difference.
> +
> +```
> + Name | REF | AFTER, memprof off | AFTER, memprof on
> +----------------+------+--------------------+------------------
> +array3d | 0.21 | +0.00 (0.01) | +0.00 (0.01)
> +binary-trees | 3.25 | -0.01 (0.06) | +0.53 (0.10)
> +chameneos | 2.97 | +0.14 (0.04) | +0.13 (0.06)
> +coroutine-ring | 1.00 | +0.01 (0.04) | +0.01 (0.04)
> +euler14-bit | 1.03 | +0.01 (0.02) | +0.00 (0.02)
> +fannkuch | 6.81 | -0.21 (0.06) | -0.20 (0.06)
> +fasta | 8.20 | -0.07 (0.05) | -0.08 (0.03)
Side note: Still curious how this can happen. It looks OK when this is
negative difference in within its deviation. But this is sorta magic.
> +life | 0.46 | +0.00 (0.01) | +0.35 (0.01)
> +mandelbrot | 2.65 | +0.00 (0.01) | +0.01 (0.01)
> +mandelbrot-bit | 1.97 | +0.00 (0.01) | +0.01 (0.02)
> +md5 | 1.58 | -0.01 (0.04) | -0.04 (0.04)
> +nbody | 1.34 | +0.00 (0.01) | -0.02 (0.01)
> +nsieve | 2.07 | -0.03 (0.03) | -0.01 (0.04)
> +nsieve-bit | 1.50 | -0.02 (0.04) | +0.00 (0.04)
> +nsieve-bit-fp | 4.44 | -0.03 (0.07) | -0.01 (0.07)
> +partialsums | 0.54 | +0.00 (0.01) | +0.00 (0.01)
> +pidigits-nogmp | 3.47 | -0.01 (0.02) | -0.10 (0.02)
> +ray | 1.62 | -0.02 (0.03) | +0.00 (0.02)
> +recursive-ack | 0.20 | +0.00 (0.01) | +0.00 (0.01)
> +recursive-fib | 1.63 | +0.00 (0.01) | +0.01 (0.02)
> +scimark-fft | 5.72 | +0.06 (0.09) | -0.01 (0.10)
> +scimark-lu | 3.47 | +0.02 (0.27) | -0.03 (0.26)
> +scimark-sor | 2.34 | +0.00 (0.01) | -0.01 (0.01)
> +scimark-sparse | 4.95 | -0.02 (0.04) | -0.02 (0.04)
> +series | 0.95 | +0.00 (0.02) | +0.00 (0.01)
> +spectral-norm | 0.96 | +0.00 (0.02) | -0.01 (0.02)
> +```
> --
> 2.28.0
>
--
Best regards,
IM
More information about the Tarantool-discussions
mailing list