[Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler

Igor Munkin imun at tarantool.org
Fri Jan 15 16:14:24 MSK 2021


Sergey,

Thanks for the changes. There is a bit of nitpicking below, and I
believe we'll push the next version of the doc to the trunk.

On 25.12.20, Sergey Kaplun wrote:
> Part of #5442
> ---
> 
> RFC on branch: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md
> 
> Changes in v3:
> * More comments in example.
> * More verbose benchmark information.
> * Grammar and spelling fixes.
> 
> Changes in v2:
> * Removed C API, Tarantool integration and description of additional
>   features -- they will be added in another RFC if necessary.
> * Removed checking profile is running from the public API.
> * Added benchmarks and more meaningful example.
> * Grammar fixes.
> 
>  doc/rfc/5442-luajit-memory-profiler.md | 314 +++++++++++++++++++++++++
>  1 file changed, 314 insertions(+)
>  create mode 100644 doc/rfc/5442-luajit-memory-profiler.md
> 
> diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
> new file mode 100644
> index 000000000..85a61462a
> --- /dev/null
> +++ b/doc/rfc/5442-luajit-memory-profiler.md
> @@ -0,0 +1,314 @@

<snipped>

> +### Prerequisites
> +
> +This section describes additional changes in LuaJIT required for the feature
> +implementation. This version of LuaJIT memory profiler does not support verbose
> +reporting allocations from traces. All allocation from traces are reported as

Typo: s/reporting allocations from/reporting for allocations made on/.

> +internal. But trace code semantics should be totally the same as for the Lua
> +interpreter (excluding sink optimizations). Also all deallocations reported as

Typo: s/deallocations reported/deallocations are reported/.

> +internal too.
> +
> +There are two different representations of functions in LuaJIT: the function's
> +prototype (`GCproto`) and the function object so called closure (`GCfunc`).
> +The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
> +correspondingly. Also LuaJIT has a special function's type aka Fast Function.

Typo: s/correspondingly/respectively/.

> +It is used for LuaJIT builtins.

It's better not to split this sentence. Consider the rewording:
| Besides, LuaJIT has a special function type a.k.a. Fast Function that
| is used for LuaJIT builtins.

> +

<snipped>

> +Usually developers are not interested in information about allocations inside
> +builtins. So if fast function was called from a Lua function all
> +allocations are attributed to this Lua function. Otherwise attribute this event
> +to a C function.

I propose the following rewording:
| Lua developers can do nothing with allocations made inside the
| builtins except reducing their usage. So if a fast function is called
| from a Lua function, all allocations made in its scope are attributed to this
| Lua function (i.e. the builtin caller). Otherwise this event is
| attributed to a C function.
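
To make the rule above concrete, here is a minimal sketch in plain C.
The names (`func_kind`, `frame`, `attribute_alloc`) are hypothetical and
are not the actual LuaJIT definitions; the sketch only models the
"builtin called from Lua is charged to the Lua caller" decision.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function kinds; not the actual LuaJIT definitions. */
enum func_kind { FUNC_LUA, FUNC_C, FUNC_FAST };

/* Hypothetical call frame: the function kind plus a link to the caller. */
struct frame {
  enum func_kind kind;
  const struct frame *caller; /* NULL for the bottom frame */
};

/*
 * Pick the frame an allocation event is attributed to.
 * A fast function (builtin) called from a Lua function is attributed to
 * that Lua caller; in any other case a builtin is reported as a C function.
 */
static const struct frame *attribute_alloc(const struct frame *f,
                                           enum func_kind *reported)
{
  assert(f != NULL && reported != NULL);
  if (f->kind == FUNC_FAST) {
    if (f->caller != NULL && f->caller->kind == FUNC_LUA) {
      *reported = FUNC_LUA;
      return f->caller;
    }
    *reported = FUNC_C;
    return f;
  }
  /* Lua and C functions are reported as themselves. */
  *reported = f->kind;
  return f;
}
```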

> +

<snipped>

> +If one run the chunk above the profiler reports approximately the following

Typo: s/run/runs/.

> +(see legend [here](#reading-and-displaying-saved-data)):

<snipped>

> +So we need to know a type of function being executed by the virtual machine
> +(VM). Currently VM state identifies C function execution only, so Fast and Lua
> +functions states will be added.

Typo: s/will be/are/.

> +
> +To determine currently allocating coroutine (that may not be equal to currently
> +executed one) a new field called `mem_L` is added to `global_State` structure
> +to keep the coroutine address. This field is set at each reallocation to

Typo: s/at each reallocation to/on each reallocation to the/.

> +corresponding `L` with which it was called.

Typo: s/it was/it is/.
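
For readers of the RFC, the `mem_L` bookkeeping described above might
look roughly like the sketch below. The structures are heavily
simplified, hypothetical stand-ins for `lua_State` and `global_State`,
and `mem_realloc` is an assumed name, not the real LuaJIT entry point.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for lua_State and global_State. */
struct state;
struct global {
  struct state *mem_L; /* coroutine that issued the latest reallocation */
};
struct state {
  struct global *g;
};

/*
 * Reallocation entry point: record the calling coroutine in mem_L before
 * touching memory, so a profiler hook can tell who is allocating.
 * nsize == 0 frees the block and returns NULL.
 */
static void *mem_realloc(struct state *L, void *p, size_t nsize)
{
  assert(L != NULL && L->g != NULL);
  L->g->mem_L = L;
  if (nsize == 0) {
    free(p);
    return NULL;
  }
  return realloc(p, nsize);
}
```

The point of the sketch is only that `mem_L` follows the `L` passed to
the allocator, which may differ from the currently executed coroutine.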

> +

<snipped>

> +When the profiling is stopped the `fclose()` is called. If it is impossible to

Typo: s/the `fclose()`/`fclose()`/.

> +open a file for writing or profiler fails to start, returns `nil` on failure

Typo: s/returns `nil`/`nil` is returned/.

> +(plus an error message as a second result and a system-dependent error code as
> +a third result). Otherwise returns some true value.

It would be nice to mention that the function contract is similar to
other standard io.* interfaces.

I glanced at the source code: it's not "some" true value; it is exactly
the *true* value.
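
As an illustration of that contract, here is a small sketch in plain C.
It is not the memprof implementation: `start_result` and
`profile_start` are hypothetical names, and the Lua-side triple
(`nil`, message, code) is modeled with a struct.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result triple mirroring the Lua-side return values. */
struct start_result {
  int ok;          /* 1 maps to `true`, 0 maps to `nil` */
  const char *msg; /* error message, NULL on success */
  int code;        /* system-dependent error code, 0 on success */
};

/*
 * Open the output file for a profiling session.
 * On failure nothing is opened and the errno value is reported together
 * with its message, i.e. the nil/message/code shape.
 */
static struct start_result profile_start(const char *path, FILE **out)
{
  struct start_result r = { 1, NULL, 0 };
  assert(path != NULL && out != NULL);
  *out = fopen(path, "wb");
  if (*out == NULL) {
    r.ok = 0;
    r.code = errno;
    r.msg = strerror(r.code);
  }
  return r;
}
```

On success the caller owns the `FILE *` and is expected to `fclose()`
it when the profiling is stopped.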

> +

<snipped>

> +Memory profiler is expected to be thread safe, so it has a corresponding
> +lock/unlock at internal mutex whenever you call corresponding memprof
> +functions. If you want to build LuaJIT without thread safety use
> +`-DLUAJIT_DISABLE_THREAD_SAFE`.

This is not implemented in the scope of the MVP, so drop this part.

> +
> +### Reading and displaying saved data
> +
> +Binary data can be read by `lj-parse-memprof` utility. It parses the binary

Typo: s/lj-parse-memprof/luajit-parse-memprof/.

> +format provided by memory profiler and render it on human-readable format.

Typo: s/it on/it to/.

> +

<snipped>

> +This table shows performance deviation in relation to REFerence value (before
> +commit) with stopped and running profiler. The table shows the average value
> +for 11 runs. The first field of the column indicates the change in the average
> +time in seconds (less is better). The second field is the standard deviation
> +for the found difference.
> +
> +```
> +     Name       | REF  | AFTER, memprof off | AFTER, memprof on
> +----------------+------+--------------------+------------------
> +array3d         | 0.21 |    +0.00 (0.01)    |    +0.00 (0.01)
> +binary-trees    | 3.25 |    -0.01 (0.06)    |    +0.53 (0.10)
> +chameneos       | 2.97 |    +0.14 (0.04)    |    +0.13 (0.06)
> +coroutine-ring  | 1.00 |    +0.01 (0.04)    |    +0.01 (0.04)
> +euler14-bit     | 1.03 |    +0.01 (0.02)    |    +0.00 (0.02)
> +fannkuch        | 6.81 |    -0.21 (0.06)    |    -0.20 (0.06)
> +fasta           | 8.20 |    -0.07 (0.05)    |    -0.08 (0.03)

Side note: Still curious how this can happen. It looks OK when the
negative difference is within its deviation. But this is sorta magic.

> +life            | 0.46 |    +0.00 (0.01)    |    +0.35 (0.01)
> +mandelbrot      | 2.65 |    +0.00 (0.01)    |    +0.01 (0.01)
> +mandelbrot-bit  | 1.97 |    +0.00 (0.01)    |    +0.01 (0.02)
> +md5             | 1.58 |    -0.01 (0.04)    |    -0.04 (0.04)
> +nbody           | 1.34 |    +0.00 (0.01)    |    -0.02 (0.01)
> +nsieve          | 2.07 |    -0.03 (0.03)    |    -0.01 (0.04)
> +nsieve-bit      | 1.50 |    -0.02 (0.04)    |    +0.00 (0.04)
> +nsieve-bit-fp   | 4.44 |    -0.03 (0.07)    |    -0.01 (0.07)
> +partialsums     | 0.54 |    +0.00 (0.01)    |    +0.00 (0.01)
> +pidigits-nogmp  | 3.47 |    -0.01 (0.02)    |    -0.10 (0.02)
> +ray             | 1.62 |    -0.02 (0.03)    |    +0.00 (0.02)
> +recursive-ack   | 0.20 |    +0.00 (0.01)    |    +0.00 (0.01)
> +recursive-fib   | 1.63 |    +0.00 (0.01)    |    +0.01 (0.02)
> +scimark-fft     | 5.72 |    +0.06 (0.09)    |    -0.01 (0.10)
> +scimark-lu      | 3.47 |    +0.02 (0.27)    |    -0.03 (0.26)
> +scimark-sor     | 2.34 |    +0.00 (0.01)    |    -0.01 (0.01)
> +scimark-sparse  | 4.95 |    -0.02 (0.04)    |    -0.02 (0.04)
> +series          | 0.95 |    +0.00 (0.02)    |    +0.00 (0.01)
> +spectral-norm   | 0.96 |    +0.00 (0.02)    |    -0.01 (0.02)
> +```
> -- 
> 2.28.0
> 
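
Regarding the side note on the table above: the check I have in mind
is simply whether the magnitude of the difference exceeds the reported
deviation. A tiny sketch (the `delta` struct and `within_deviation`
helper are hypothetical names used for illustration only):

```c
#include <assert.h>

/* One cell of the table: change in average time and its deviation. */
struct delta {
  double diff; /* seconds, negative means faster than REF */
  double sd;   /* standard deviation of the difference */
};

/* Return 1 if |diff| does not exceed the deviation, i.e. it may be noise. */
static int within_deviation(struct delta d)
{
  double mag = d.diff < 0 ? -d.diff : d.diff;
  assert(d.sd >= 0);
  return mag <= d.sd;
}
```

With the "memprof off" values from the table, md5 (-0.01, 0.04) passes
this check, while fasta (-0.07, 0.05) and fannkuch (-0.21, 0.06) do not.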

-- 
Best regards,
IM

