Tarantool discussions archive
* [Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler
@ 2020-12-25 11:34 Sergey Kaplun
  2021-01-15 13:14 ` Igor Munkin via Tarantool-discussions
  2021-01-21 18:42 ` Igor Munkin via Tarantool-discussions
  0 siblings, 2 replies; 7+ messages in thread
From: Sergey Kaplun @ 2020-12-25 11:34 UTC
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-discussions

Part of #5442
---

RFC on branch: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md

Changes in v3:
* More comments in example.
* More verbose benchmark information.
* Grammar and spelling fixes.

Changes in v2:
* Removed C API, Tarantool integration and description of additional
  features -- they will be added in another RFC if necessary.
* Removed checking profile is running from the public API.
* Added benchmarks and more meaningful example.
* Grammar fixes.

 doc/rfc/5442-luajit-memory-profiler.md | 314 +++++++++++++++++++++++++
 1 file changed, 314 insertions(+)
 create mode 100644 doc/rfc/5442-luajit-memory-profiler.md

diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
new file mode 100644
index 000000000..85a61462a
--- /dev/null
+++ b/doc/rfc/5442-luajit-memory-profiler.md
@@ -0,0 +1,314 @@
+# LuaJIT memory profiler
+
+* **Status**: In progress
+* **Start date**: 24-10-2020
+* **Authors**: Sergey Kaplun @Buristan skaplun@tarantool.org,
+               Igor Munkin @igormunkin imun@tarantool.org,
+               Sergey Ostanevich @sergos sergos@tarantool.org
+* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442)
+
+## Summary
+
+LuaJIT memory profiler is a toolchain for analysis of memory usage by user's
+application.
+
+## Background and motivation
+
+Garbage collector (GC) is a curse of performance for most of Lua applications.
+Memory usage of Lua application should be profiled to locate various
+memory-unoptimized code blocks. If the application has memory leaks they can be
+found with the profiler also.
+
+## Detailed design
+
+The whole toolchain of memory profiling will be divided into several parts:
+1) [Prerequisites](#prerequisites).
+2) [Recording information about memory usage and saving it](#information-recording).
+3) [Reading saved data and display it in human-readable format](#reading-and-displaying-saved-data).
+
+### Prerequisites
+
+This section describes additional changes in LuaJIT required for the feature
+implementation. This version of LuaJIT memory profiler does not support verbose
+reporting allocations from traces. All allocation from traces are reported as
+internal. But trace code semantics should be totally the same as for the Lua
+interpreter (excluding sink optimizations). Also all deallocations reported as
+internal too.
+
+There are two different representations of functions in LuaJIT: the function's
+prototype (`GCproto`) and the function object so called closure (`GCfunc`).
+The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
+correspondingly. Also LuaJIT has a special function's type aka Fast Function.
+It is used for LuaJIT builtins.
+
+Tail call optimization does not create a new call frame, so all allocations
+inside the function called via `CALLT`/`CALLMT` are attributed to its caller.
+
+Usually developers are not interested in information about allocations inside
+builtins. So if fast function was called from a Lua function all
+allocations are attributed to this Lua function. Otherwise attribute this event
+to a C function.
+
+Assume we have the following Lua chunk named <test.lua>:
+
+```
+1  jit.off()
+2  misc.memprof.start("memprof_new.bin")
+3  -- Lua does not create a new frame to call string.rep and all allocations are
+4  -- attributed not to `append()` function but to the parent scope.
+5  local function append(str, rep)
+6    return string.rep(str, rep)
+7  end
+8
+9  local t = {}
+10 for _ = 1, 1e5 do
+11   -- table.insert is a builtin and all corresponding allocations
+12   -- are reported in the scope of main chunk
+13   table.insert(t,
+14     append('q', _)
+15   )
+16 end
+17 misc.memprof.stop()
+```
+
+If one run the chunk above the profiler reports approximately the following
+(see legend [here](#reading-and-displaying-saved-data)):
+```
+ALLOCATIONS
+@test.lua:0, line 14: 1002      531818  0
+@test.lua:0, line 13: 1 24      0
+@test.lua:0, line 9: 1  32      0
+@test.lua:0, line 7: 1  20      0
+
+REALLOCATIONS
+@test.lua:0, line 13: 9 16424   8248
+        Overrides:
+                @test.lua:0, line 13
+
+@test.lua:0, line 14: 5 1984    992
+        Overrides:
+                @test.lua:0, line 14
+
+
+DEALLOCATIONS
+INTERNAL: 20    0       1481
+@test.lua:0, line 14: 3 0       7168
+        Overrides:
+                @test.lua:0, line 14
+```
+
+So we need to know a type of function being executed by the virtual machine
+(VM). Currently VM state identifies C function execution only, so Fast and Lua
+functions states will be added.
+
+To determine currently allocating coroutine (that may not be equal to currently
+executed one) a new field called `mem_L` is added to `global_State` structure
+to keep the coroutine address. This field is set at each reallocation to
+corresponding `L` with which it was called.
+
+There is a static function (`lj_debug_getframeline`) that returns line number
+for current `BCPos` in `lj_debug.c` already. It will be added to the debug
+module API to be used in memory profiler.
+
+### Information recording
+
+Each allocate/reallocate/free is considered as a type of event that are
+reported. Event stream has the following format:
+
+```c
+/*
+** Event stream format:
+**
+** stream         := symtab memprof
+** symtab         := see symtab description
+** memprof        := prologue event* epilogue
+** prologue       := 'l' 'j' 'm' version reserved
+** version        := <BYTE>
+** reserved       := <BYTE> <BYTE> <BYTE>
+** event          := event-alloc | event-realloc | event-free
+** event-alloc    := event-header loc? naddr nsize
+** event-realloc  := event-header loc? oaddr osize naddr nsize
+** event-free     := event-header loc? oaddr osize
+** event-header   := <BYTE>
+** loc            := loc-lua | loc-c
+** loc-lua        := sym-addr line-no
+** loc-c          := sym-addr
+** sym-addr       := <ULEB128>
+** line-no        := <ULEB128>
+** oaddr          := <ULEB128>
+** naddr          := <ULEB128>
+** osize          := <ULEB128>
+** nsize          := <ULEB128>
+** epilogue       := event-header
+**
+** <BYTE>   :  A single byte (no surprises here)
+** <ULEB128>:  Unsigned integer represented in ULEB128 encoding
+**
+** (Order of bits below is hi -> lo)
+**
+** version: [VVVVVVVV]
+**  * VVVVVVVV: Byte interpreted as a plain integer version number
+**
+** event-header: [FUUUSSEE]
+**  * EE   : 2 bits for representing allocation event type (AEVENT_*)
+**  * SS   : 2 bits for representing allocation source type (ASOURCE_*)
+**  * UUU  : 3 unused bits
+**  * F    : 0 for regular events, 1 for epilogue's *F*inal header
+**           (if F is set to 1, all other bits are currently ignored)
+*/
+```
+
+It is enough to know the address of LUA/C function to determine it. Symbolic
+table (symtab) is dumped at the start of profiling to avoid dumping function
+location on each memory event for saving both CPU usage and binary profile
+size.
+
+Each line contains the address, Lua chunk definition as the filename and line
+number of the function's declaration. This table of symbols has the following
+format described at <lj_memprof.h>:
+
+```c
+/*
+** symtab format:
+**
+** symtab         := prologue sym*
+** prologue       := 'l' 'j' 's' version reserved
+** version        := <BYTE>
+** reserved       := <BYTE> <BYTE> <BYTE>
+** sym            := sym-lua | sym-final
+** sym-lua        := sym-header sym-addr sym-chunk sym-line
+** sym-header     := <BYTE>
+** sym-addr       := <ULEB128>
+** sym-chunk      := string
+** sym-line       := <ULEB128>
+** sym-final      := sym-header
+** string         := string-len string-payload
+** string-len     := <ULEB128>
+** string-payload := <BYTE> {string-len}
+**
+** <BYTE>   :  A single byte (no surprises here)
+** <ULEB128>:  Unsigned integer represented in ULEB128 encoding
+**
+** (Order of bits below is hi -> lo)
+**
+** version: [VVVVVVVV]
+**  * VVVVVVVV: Byte interpreted as a plain numeric version number
+**
+** sym-header: [FUUUUUTT]
+**  * TT    : 2 bits for representing symbol type
+**  * UUUUU : 5 unused bits
+**  * F     : 1 bit marking the end of the symtab (final symbol)
+*/
+```
+
+So when memory profiling starts the current allocation function is replaced by
+the new allocation function additionally wrapped to write the profiling events.
+When profiler stops the previous allocation function is restored.
+
+Starting profiler from Lua is quite simple:
+```lua
+local started, err, errno = misc.memprof.start(fname)
+```
+where `fname` is name of the file where profile events are written. Writer for
+this function perform `fwrite()` for each call retrying in case of `EINTR`.
+When the profiling is stopped the `fclose()` is called. If it is impossible to
+open a file for writing or profiler fails to start, returns `nil` on failure
+(plus an error message as a second result and a system-dependent error code as
+a third result). Otherwise returns some true value.
+
+Stopping profiler from Lua is simple too:
+```lua
+local stopped, err, errno = misc.memprof.stop()
+```
+
+If there is any error occurred at profiling stopping (an error when file
+descriptor was closed) `memprof.stop()` returns `nil` (plus an error message as
+a second result and a system-dependent error code as a third result). Returns
+`true` otherwise.
+
+If you want to build LuaJIT without memory profiler, you should build it with
+`-DLUAJIT_DISABLE_MEMPROF`. If it is disabled `misc.memprof.start()` and
+`misc.memprof.stop()` always return `false`.
+
+Memory profiler is expected to be thread safe, so it has a corresponding
+lock/unlock at internal mutex whenever you call corresponding memprof
+functions. If you want to build LuaJIT without thread safety use
+`-DLUAJIT_DISABLE_THREAD_SAFE`.
+
+### Reading and displaying saved data
+
+Binary data can be read by `lj-parse-memprof` utility. It parses the binary
+format provided by memory profiler and render it on human-readable format.
+
+The usage is very simple:
+```
+$ ./luajit-parse-memprof --help
+luajit-parse-memprof - parser of the memory usage profile collected
+                       with LuaJIT's memprof.
+
+SYNOPSIS
+
+luajit-parse-memprof [options] memprof.bin
+
+Supported options are:
+
+  --help                            Show this help and exit
+```
+
+Plain text of profiled info has the following format:
+```
+@<filename>:<function_line>, line <line where event was detected>: <number of events>	<allocated>	<freed>
+```
+See the example [above](#prerequisites).
+
+`INTERNAL` means that these allocations are caused by internal LuaJIT
+structures. Note that events are sorted from the most often to the least.
+
+`Overrides` means what allocation this reallocation overrides.
+
+## Benchmarks
+
+Benchmarks were taken from repo:
+[LuaJIT-test-cleanup](https://github.com/LuaJIT/LuaJIT-test-cleanup).
+
+Example of measuring:
+```bash
+/usr/bin/time -f"array3d %U" ./luajit $BENCH_DIR/array3d.lua 300 >/dev/null
+```
+
+This table shows performance deviation in relation to REFerence value (before
+commit) with stopped and running profiler. The table shows the average value
+for 11 runs. The first field of the column indicates the change in the average
+time in seconds (less is better). The second field is the standard deviation
+for the found difference.
+
+```
+     Name       | REF  | AFTER, memprof off | AFTER, memprof on
+----------------+------+--------------------+------------------
+array3d         | 0.21 |    +0.00 (0.01)    |    +0.00 (0.01)
+binary-trees    | 3.25 |    -0.01 (0.06)    |    +0.53 (0.10)
+chameneos       | 2.97 |    +0.14 (0.04)    |    +0.13 (0.06)
+coroutine-ring  | 1.00 |    +0.01 (0.04)    |    +0.01 (0.04)
+euler14-bit     | 1.03 |    +0.01 (0.02)    |    +0.00 (0.02)
+fannkuch        | 6.81 |    -0.21 (0.06)    |    -0.20 (0.06)
+fasta           | 8.20 |    -0.07 (0.05)    |    -0.08 (0.03)
+life            | 0.46 |    +0.00 (0.01)    |    +0.35 (0.01)
+mandelbrot      | 2.65 |    +0.00 (0.01)    |    +0.01 (0.01)
+mandelbrot-bit  | 1.97 |    +0.00 (0.01)    |    +0.01 (0.02)
+md5             | 1.58 |    -0.01 (0.04)    |    -0.04 (0.04)
+nbody           | 1.34 |    +0.00 (0.01)    |    -0.02 (0.01)
+nsieve          | 2.07 |    -0.03 (0.03)    |    -0.01 (0.04)
+nsieve-bit      | 1.50 |    -0.02 (0.04)    |    +0.00 (0.04)
+nsieve-bit-fp   | 4.44 |    -0.03 (0.07)    |    -0.01 (0.07)
+partialsums     | 0.54 |    +0.00 (0.01)    |    +0.00 (0.01)
+pidigits-nogmp  | 3.47 |    -0.01 (0.02)    |    -0.10 (0.02)
+ray             | 1.62 |    -0.02 (0.03)    |    +0.00 (0.02)
+recursive-ack   | 0.20 |    +0.00 (0.01)    |    +0.00 (0.01)
+recursive-fib   | 1.63 |    +0.00 (0.01)    |    +0.01 (0.02)
+scimark-fft     | 5.72 |    +0.06 (0.09)    |    -0.01 (0.10)
+scimark-lu      | 3.47 |    +0.02 (0.27)    |    -0.03 (0.26)
+scimark-sor     | 2.34 |    +0.00 (0.01)    |    -0.01 (0.01)
+scimark-sparse  | 4.95 |    -0.02 (0.04)    |    -0.02 (0.04)
+series          | 0.95 |    +0.00 (0.02)    |    +0.00 (0.01)
+spectral-norm   | 0.96 |    +0.00 (0.02)    |    -0.01 (0.02)
+```
-- 
2.28.0
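
For readers following the event stream format described in the RFC above, here
is a minimal sketch in C of the two low-level pieces it relies on: ULEB128
integers and the `[FUUUSSEE]` event-header byte. The function and struct names
(`uleb128_write`, `uleb128_read`, `event_header_decode`, `struct event_header`)
are hypothetical and chosen for illustration only; they are not taken from the
LuaJIT sources. The numeric values of `AEVENT_*` and `ASOURCE_*` are not given
in the RFC, so the sketch only extracts the raw bit fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode `value` as ULEB128 into `out` and return the number of bytes written.
** `out` must have room for 10 bytes, the maximum for a 64-bit value. */
size_t uleb128_write(uint8_t *out, uint64_t value)
{
  size_t n = 0;
  assert(out != NULL);
  do {
    uint8_t byte = (uint8_t)(value & 0x7f); /* Low 7 bits of the payload. */
    value >>= 7;
    if (value != 0)
      byte |= 0x80; /* Continuation bit: more bytes follow. */
    out[n++] = byte;
  } while (value != 0);
  return n;
}

/* Decode a ULEB128 integer from `buf` (at most `len` bytes) into `*value`.
** Returns the number of bytes consumed, or 0 if the input is truncated or
** longer than a 64-bit value allows. */
size_t uleb128_read(const uint8_t *buf, size_t len, uint64_t *value)
{
  uint64_t result = 0;
  unsigned shift = 0;
  size_t n = 0;
  while (n < len && shift < 64) {
    uint8_t byte = buf[n++];
    result |= (uint64_t)(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) { /* Last byte of this integer. */
      *value = result;
      return n;
    }
    shift += 7;
  }
  return 0;
}

/* Raw fields of the event-header byte, laid out as [FUUUSSEE] (hi -> lo). */
struct event_header {
  unsigned type;   /* EE: allocation event type bits. */
  unsigned source; /* SS: allocation source type bits. */
  int is_final;    /* F: 1 for the epilogue header. */
};

struct event_header event_header_decode(uint8_t h)
{
  struct event_header eh;
  eh.type = h & 0x3;
  eh.source = (h >> 2) & 0x3;
  eh.is_final = (h >> 7) & 0x1;
  return eh;
}
```

A parser built on these helpers would read one header byte, stop if `is_final`
is set, and otherwise read the ULEB128 fields that the grammar lists for that
event type. The same ULEB128 routines would also apply to the `sym-addr`,
`sym-line` and `string-len` fields of the symtab.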


* Re: [Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler
  2020-12-25 11:34 [Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler Sergey Kaplun
@ 2021-01-15 13:14 ` Igor Munkin via Tarantool-discussions
  2021-01-20  8:19   ` Sergey Kaplun via Tarantool-discussions
  2021-01-21 18:42 ` Igor Munkin via Tarantool-discussions
  1 sibling, 1 reply; 7+ messages in thread
From: Igor Munkin via Tarantool-discussions @ 2021-01-15 13:14 UTC
  To: Sergey Kaplun; +Cc: tarantool-discussions

Sergey,

Thanks for the changes. There is a bit of nitpicking below, and I
believe we'll push the next version of the doc to the trunk.

On 25.12.20, Sergey Kaplun wrote:
> Part of #5442
> ---
> 
> RFC on branch: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md
> 
> Changes in v3:
> * More comments in example.
> * More verbose benchmark information.
> * Grammar and spelling fixes.
> 
> Changes in v2:
> * Removed C API, Tarantool integration and description of additional
>   features -- they will be added in another RFC if necessary.
> * Removed checking profile is running from the public API.
> * Added benchmarks and more meaningful example.
> * Grammar fixes.
> 
>  doc/rfc/5442-luajit-memory-profiler.md | 314 +++++++++++++++++++++++++
>  1 file changed, 314 insertions(+)
>  create mode 100644 doc/rfc/5442-luajit-memory-profiler.md
> 
> diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
> new file mode 100644
> index 000000000..85a61462a
> --- /dev/null
> +++ b/doc/rfc/5442-luajit-memory-profiler.md
> @@ -0,0 +1,314 @@

<snipped>

> +### Prerequisites
> +
> +This section describes additional changes in LuaJIT required for the feature
> +implementation. This version of LuaJIT memory profiler does not support verbose
> +reporting allocations from traces. All allocation from traces are reported as

Typo: s/reporting allocations from/reporting for allocations made on/.

> +internal. But trace code semantics should be totally the same as for the Lua
> +interpreter (excluding sink optimizations). Also all deallocations reported as

Typo: s/deallocations reported/deallocations are reported/.

> +internal too.
> +
> +There are two different representations of functions in LuaJIT: the function's
> +prototype (`GCproto`) and the function object so called closure (`GCfunc`).
> +The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
> +correspondingly. Also LuaJIT has a special function's type aka Fast Function.

Typo: s/correspondingly/respectively/.

> +It is used for LuaJIT builtins.

It's better to not split this sentence. Consider the rewording:
| Besides LuaJIT has a special function type a.k.a. Fast Function that
| is used for LuaJIT builtins.

> +

<snipped>

> +Usually developers are not interested in information about allocations inside
> +builtins. So if fast function was called from a Lua function all
> +allocations are attributed to this Lua function. Otherwise attribute this event
> +to a C function.

I propose the following rewording:
| Lua developers can do nothing with allocations made inside the
| builtins except reducing its usage. So if fast function is called from
| a Lua function all allocations made in its scope are attributed to this
| Lua function (i.e. the builtin caller). Otherwise this event is
| attributed to a C function.

> +

<snipped>

> +If one run the chunk above the profiler reports approximately the following

Typo: s/run/runs/.

> +(see legend [here](#reading-and-displaying-saved-data)):

<snipped>

> +So we need to know a type of function being executed by the virtual machine
> +(VM). Currently VM state identifies C function execution only, so Fast and Lua
> +functions states will be added.

Typo: s/will be/are/.

> +
> +To determine currently allocating coroutine (that may not be equal to currently
> +executed one) a new field called `mem_L` is added to `global_State` structure
> +to keep the coroutine address. This field is set at each reallocation to

Typo: s/at each reallocation to/on each reallocation to the/.

> +corresponding `L` with which it was called.

Typo: s/it was/it is/.

> +

<snipped>

> +When the profiling is stopped the `fclose()` is called. If it is impossible to

Typo: s/the `fclose()`/`fclose()`/.

> +open a file for writing or profiler fails to start, returns `nil` on failure

Typo: s/returns `nil`/`nil` is returned/.

> +(plus an error message as a second result and a system-dependent error code as
> +a third result). Otherwise returns some true value.

It would be nice to mention that the function contract is similar to
other standard io.* interfaces.

I glanced at the source code: it's not "some" true value; it is exactly the
*true* value.

> +

<snipped>

> +Memory profiler is expected to be thread safe, so it has a corresponding
> +lock/unlock at internal mutex whenever you call corresponding memprof
> +functions. If you want to build LuaJIT without thread safety use
> +`-DLUAJIT_DISABLE_THREAD_SAFE`.

This is not implemented in the scope of the MVP, so drop this part.

> +
> +### Reading and displaying saved data
> +
> +Binary data can be read by `lj-parse-memprof` utility. It parses the binary

Typo: s/lj-parse-memprof/luajit-parse-memprof/.

> +format provided by memory profiler and render it on human-readable format.

Typo: s/it on/it to/.

> +

<snipped>

> +This table shows performance deviation in relation to REFerence value (before
> +commit) with stopped and running profiler. The table shows the average value
> +for 11 runs. The first field of the column indicates the change in the average
> +time in seconds (less is better). The second field is the standard deviation
> +for the found difference.
> +
> +```
> +     Name       | REF  | AFTER, memprof off | AFTER, memprof on
> +----------------+------+--------------------+------------------
> +array3d         | 0.21 |    +0.00 (0.01)    |    +0.00 (0.01)
> +binary-trees    | 3.25 |    -0.01 (0.06)    |    +0.53 (0.10)
> +chameneos       | 2.97 |    +0.14 (0.04)    |    +0.13 (0.06)
> +coroutine-ring  | 1.00 |    +0.01 (0.04)    |    +0.01 (0.04)
> +euler14-bit     | 1.03 |    +0.01 (0.02)    |    +0.00 (0.02)
> +fannkuch        | 6.81 |    -0.21 (0.06)    |    -0.20 (0.06)
> +fasta           | 8.20 |    -0.07 (0.05)    |    -0.08 (0.03)

Side note: I'm still curious how this can happen. A negative difference
looks OK when it is within its deviation. But this is sorta magic.

> +life            | 0.46 |    +0.00 (0.01)    |    +0.35 (0.01)
> +mandelbrot      | 2.65 |    +0.00 (0.01)    |    +0.01 (0.01)
> +mandelbrot-bit  | 1.97 |    +0.00 (0.01)    |    +0.01 (0.02)
> +md5             | 1.58 |    -0.01 (0.04)    |    -0.04 (0.04)
> +nbody           | 1.34 |    +0.00 (0.01)    |    -0.02 (0.01)
> +nsieve          | 2.07 |    -0.03 (0.03)    |    -0.01 (0.04)
> +nsieve-bit      | 1.50 |    -0.02 (0.04)    |    +0.00 (0.04)
> +nsieve-bit-fp   | 4.44 |    -0.03 (0.07)    |    -0.01 (0.07)
> +partialsums     | 0.54 |    +0.00 (0.01)    |    +0.00 (0.01)
> +pidigits-nogmp  | 3.47 |    -0.01 (0.02)    |    -0.10 (0.02)
> +ray             | 1.62 |    -0.02 (0.03)    |    +0.00 (0.02)
> +recursive-ack   | 0.20 |    +0.00 (0.01)    |    +0.00 (0.01)
> +recursive-fib   | 1.63 |    +0.00 (0.01)    |    +0.01 (0.02)
> +scimark-fft     | 5.72 |    +0.06 (0.09)    |    -0.01 (0.10)
> +scimark-lu      | 3.47 |    +0.02 (0.27)    |    -0.03 (0.26)
> +scimark-sor     | 2.34 |    +0.00 (0.01)    |    -0.01 (0.01)
> +scimark-sparse  | 4.95 |    -0.02 (0.04)    |    -0.02 (0.04)
> +series          | 0.95 |    +0.00 (0.02)    |    +0.00 (0.01)
> +spectral-norm   | 0.96 |    +0.00 (0.02)    |    -0.01 (0.02)
> +```
> -- 
> 2.28.0
> 

-- 
Best regards,
IM


* Re: [Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler
  2021-01-15 13:14 ` Igor Munkin via Tarantool-discussions
@ 2021-01-20  8:19   ` Sergey Kaplun via Tarantool-discussions
  2021-01-20 14:26     ` Sergey Ostanevich via Tarantool-discussions
  2021-01-21 18:41     ` Igor Munkin via Tarantool-discussions
  0 siblings, 2 replies; 7+ messages in thread
From: Sergey Kaplun via Tarantool-discussions @ 2021-01-20  8:19 UTC
  To: Igor Munkin; +Cc: tarantool-discussions

Hi, Igor!

Thanks for the review!

On 15.01.21, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the changes. There is a bit of nitpicking below and I
> believe we'll push the next version doc to the trunk.

I've addressed all your comments and added some minor fixes.
See the two iterative patches below. The branch is force-pushed.

> 
> On 25.12.20, Sergey Kaplun wrote:
> > Part of #5442
> > ---
> > 
> > RFC on branch: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md

Side note: branch name is updated.
New RFC version: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md

> > 
> > Changes in v3:
> > * More comments in example.
> > * More verbose benchmark information.
> > * Grammar and spelling fixes.
> > 
> > Changes in v2:
> > * Removed C API, Tarantool integration and description of additional
> >   features -- they will be added in another RFC if necessary.
> > * Removed checking profile is running from the public API.
> > * Added benchmarks and more meaningful example.
> > * Grammar fixes.
> > 
> >  doc/rfc/5442-luajit-memory-profiler.md | 314 +++++++++++++++++++++++++
> >  1 file changed, 314 insertions(+)
> >  create mode 100644 doc/rfc/5442-luajit-memory-profiler.md
> > 
> > diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
> > new file mode 100644
> > index 000000000..85a61462a
> > --- /dev/null
> > +++ b/doc/rfc/5442-luajit-memory-profiler.md
> > @@ -0,0 +1,314 @@
> 
> <snipped>
> 
> > +### Prerequisites
> > +
> > +This section describes additional changes in LuaJIT required for the feature
> > +implementation. This version of LuaJIT memory profiler does not support verbose
> > +reporting allocations from traces. All allocation from traces are reported as
> 
> Typo: s/reporting allocations from/reporting for allocations made on/.

Fixed, thanks!

> 
> > +internal. But trace code semantics should be totally the same as for the Lua
> > +interpreter (excluding sink optimizations). Also all deallocations reported as
> 
> Typo: s/deallocations reported/deallocations are reported/.

Fixed, thanks!

> 
> > +internal too.
> > +
> > +There are two different representations of functions in LuaJIT: the function's
> > +prototype (`GCproto`) and the function object so called closure (`GCfunc`).
> > +The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
> > +correspondingly. Also LuaJIT has a special function's type aka Fast Function.
> 
> Typo: s/correspondingly/respectively/.
> 
> > +It is used for LuaJIT builtins.
> 
> It's better to not split this sentence. Consider the rewording:
> | Besides LuaJIT has a special function type a.k.a. Fast Function that
> | is used for LuaJIT builtins.

Applied! Thanks!

> 
> > +
> 
> <snipped>
> 
> > +Usually developers are not interested in information about allocations inside
> > +builtins. So if fast function was called from a Lua function all
> > +allocations are attributed to this Lua function. Otherwise attribute this event
> > +to a C function.
> 
> I propose the following rewording:
> | Lua developers can do nothing with allocations made inside the
> | builtins except reducing its usage. So if fast function is called from
> | a Lua function all allocations made in its scope are attributed to this
> | Lua function (i.e. the builtin caller). Otherwise this event is
> | attributed to a C function.
> 

Applied, thanks!

> > +
> 
> <snipped>
> 
> > +If one run the chunk above the profiler reports approximately the following
> 
> Typo: s/run/runs/.

Fixed.

> 
> > +(see legend [here](#reading-and-displaying-saved-data)):
> 
> <snipped>
> 
> > +So we need to know a type of function being executed by the virtual machine
> > +(VM). Currently VM state identifies C function execution only, so Fast and Lua
> > +functions states will be added.
> 
> Typo: s/will be/are/.

Sure, thanks!

> 
> > +
> > +To determine currently allocating coroutine (that may not be equal to currently
> > +executed one) a new field called `mem_L` is added to `global_State` structure
> > +to keep the coroutine address. This field is set at each reallocation to
> 
> Typo: s/at each reallocation to/on each reallocation to the/.

Fixed.

> 
> > +corresponding `L` with which it was called.
> 
> Typo: s/it was/it is/.

Thanks, fixed!

> 
> > +
> 
> <snipped>
> 
> > +When the profiling is stopped the `fclose()` is called. If it is impossible to
> 
> Typo: s/the `fclose()`/`fclose()`/.

Fixed.

> 
> > +open a file for writing or profiler fails to start, returns `nil` on failure
> 
> Typo: s/returns `nil`/`nil` is returned/.

Fixed.

> 
> > +(plus an error message as a second result and a system-dependent error code as
> > +a third result). Otherwise returns some true value.
> 
> It would be nice to mention that the function contract is similar to
> other standard io.* interfaces.
> 
> I glanced at the source code: it's not "some" true value; it is exactly the
> *true* value.

All right! Fixed.

> 
> > +
> 
> <snipped>
> 
> > +Memory profiler is expected to be thread safe, so it has a corresponding
> > +lock/unlock at internal mutex whenever you call corresponding memprof
> > +functions. If you want to build LuaJIT without thread safety use
> > +`-DLUAJIT_DISABLE_THREAD_SAFE`.
> 
> This is not implemented in scope of the MVP, so drop this part.

Done.

> 
> > +
> > +### Reading and displaying saved data
> > +
> > +Binary data can be read by `lj-parse-memprof` utility. It parses the binary
> 
> Typo: s/lj-parse-memprof/luajit-parse-memprof/.

Fixed, thanks!

> 
> > +format provided by memory profiler and render it on human-readable format.
> 
> Typo: s/it on/it to/.

Fixed, thanks!

> 
> > +
> 
> <snipped>
> 
> > +This table shows performance deviation in relation to REFerence value (before
> > +commit) with stopped and running profiler. The table shows the average value
> > +for 11 runs. The first field of the column indicates the change in the average
> > +time in seconds (less is better). The second field is the standard deviation
> > +for the found difference.
> > +
> > +```
> > +     Name       | REF  | AFTER, memprof off | AFTER, memprof on
> > +----------------+------+--------------------+------------------
> > +array3d         | 0.21 |    +0.00 (0.01)    |    +0.00 (0.01)
> > +binary-trees    | 3.25 |    -0.01 (0.06)    |    +0.53 (0.10)
> > +chameneos       | 2.97 |    +0.14 (0.04)    |    +0.13 (0.06)
> > +coroutine-ring  | 1.00 |    +0.01 (0.04)    |    +0.01 (0.04)
> > +euler14-bit     | 1.03 |    +0.01 (0.02)    |    +0.00 (0.02)
> > +fannkuch        | 6.81 |    -0.21 (0.06)    |    -0.20 (0.06)
> > +fasta           | 8.20 |    -0.07 (0.05)    |    -0.08 (0.03)
> 
> Side note: I'm still curious how this can happen. A negative difference
> looks OK when it is within its deviation. But this is sorta magic.

Yes, me too. Unfortunately, we have neither benchmark tests nor
performance analysis for LuaJIT for now.

> 
> > +life            | 0.46 |    +0.00 (0.01)    |    +0.35 (0.01)
> > +mandelbrot      | 2.65 |    +0.00 (0.01)    |    +0.01 (0.01)
> > +mandelbrot-bit  | 1.97 |    +0.00 (0.01)    |    +0.01 (0.02)
> > +md5             | 1.58 |    -0.01 (0.04)    |    -0.04 (0.04)
> > +nbody           | 1.34 |    +0.00 (0.01)    |    -0.02 (0.01)
> > +nsieve          | 2.07 |    -0.03 (0.03)    |    -0.01 (0.04)
> > +nsieve-bit      | 1.50 |    -0.02 (0.04)    |    +0.00 (0.04)
> > +nsieve-bit-fp   | 4.44 |    -0.03 (0.07)    |    -0.01 (0.07)
> > +partialsums     | 0.54 |    +0.00 (0.01)    |    +0.00 (0.01)
> > +pidigits-nogmp  | 3.47 |    -0.01 (0.02)    |    -0.10 (0.02)
> > +ray             | 1.62 |    -0.02 (0.03)    |    +0.00 (0.02)
> > +recursive-ack   | 0.20 |    +0.00 (0.01)    |    +0.00 (0.01)
> > +recursive-fib   | 1.63 |    +0.00 (0.01)    |    +0.01 (0.02)
> > +scimark-fft     | 5.72 |    +0.06 (0.09)    |    -0.01 (0.10)
> > +scimark-lu      | 3.47 |    +0.02 (0.27)    |    -0.03 (0.26)
> > +scimark-sor     | 2.34 |    +0.00 (0.01)    |    -0.01 (0.01)
> > +scimark-sparse  | 4.95 |    -0.02 (0.04)    |    -0.02 (0.04)
> > +series          | 0.95 |    +0.00 (0.02)    |    +0.00 (0.01)
> > +spectral-norm   | 0.96 |    +0.00 (0.02)    |    -0.01 (0.02)
> > +```
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Best regards,
> IM

===================================================================
diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
index 85a61462a..2721f1cc1 100644
--- a/doc/rfc/5442-luajit-memory-profiler.md
+++ b/doc/rfc/5442-luajit-memory-profiler.md
@@ -30,39 +30,39 @@ The whole toolchain of memory profiling will be divided into several parts:
 
 This section describes additional changes in LuaJIT required for the feature
 implementation. This version of LuaJIT memory profiler does not support verbose
-reporting allocations from traces. All allocation from traces are reported as
-internal. But trace code semantics should be totally the same as for the Lua
-interpreter (excluding sink optimizations). Also all deallocations reported as
-internal too.
+reporting for allocations made on traces. All allocation from traces are
+reported as internal. But trace code semantics should be totally the same as
+for the Lua interpreter (excluding sink optimizations). Also all, deallocations
+are reported as internal too.
 
 There are two different representations of functions in LuaJIT: the function's
 prototype (`GCproto`) and the function object so called closure (`GCfunc`).
 The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
-correspondingly. Also LuaJIT has a special function's type aka Fast Function.
-It is used for LuaJIT builtins.
+respectively. Besides LuaJIT has a special function type, a.k.a. Fast Function
+that is used for LuaJIT built-ins
 
 Tail call optimization does not create a new call frame, so all allocations
 inside the function called via `CALLT`/`CALLMT` are attributed to its caller.
 
-Usually developers are not interested in information about allocations inside
-builtins. So if fast function was called from a Lua function all
-allocations are attributed to this Lua function. Otherwise attribute this event
-to a C function.
+Lua developers can do nothing with allocations made inside the built-ins except
+reducing its usage. So if fast function is called from a Lua function all
+allocations made in its scope are attributed to this Lua function (i.e. the
+built-in caller). Otherwise, this event is attributed to a C function.
 
 Assume we have the following Lua chunk named <test.lua>:
 
-```
+```lua
 1  jit.off()
 2  misc.memprof.start("memprof_new.bin")
-3  -- Lua does not create a new frame to call string.rep and all allocations are
-4  -- attributed not to `append()` function but to the parent scope.
+3  -- Lua does not create a new frame to call string.rep() and all allocations
+4  -- are attributed not to append() function but to the parent scope.
 5  local function append(str, rep)
 6    return string.rep(str, rep)
 7  end
 8
 9  local t = {}
 10 for _ = 1, 1e5 do
-11   -- table.insert is a builtin and all corresponding allocations
+11   -- table.insert() is a built-in and all corresponding allocations
 12   -- are reported in the scope of main chunk
 13   table.insert(t,
 14     append('q', _)
@@ -71,7 +71,7 @@ Assume we have the following Lua chunk named <test.lua>:
 17 misc.memprof.stop()
 ```
 
-If one run the chunk above the profiler reports approximately the following
+If one runs the chunk above the profiler reports approximately the following
 (see legend [here](#reading-and-displaying-saved-data)):
 ```
 ALLOCATIONS
@@ -99,15 +99,15 @@ INTERNAL: 20    0       1481
 
 So we need to know a type of function being executed by the virtual machine
 (VM). Currently VM state identifies C function execution only, so Fast and Lua
-functions states will be added.
+functions states are added.
 
 To determine currently allocating coroutine (that may not be equal to currently
 executed one) a new field called `mem_L` is added to `global_State` structure
-to keep the coroutine address. This field is set at each reallocation to
-corresponding `L` with which it was called.
+to keep the coroutine address. This field is set on each reallocation to the
+corresponding `L` with which it is called.
 
 There is a static function (`lj_debug_getframeline`) that returns line number
-for current `BCPos` in `lj_debug.c` already. It will be added to the debug
+for current `BCPos` in `lj_debug.c` already. It is added to the debug
 module API to be used in memory profiler.
 
 ### Information recording
@@ -211,10 +211,11 @@ local started, err, errno = misc.memprof.start(fname)
 ```
 where `fname` is name of the file where profile events are written. Writer for
 this function perform `fwrite()` for each call retrying in case of `EINTR`.
-When the profiling is stopped the `fclose()` is called. If it is impossible to
-open a file for writing or profiler fails to start, returns `nil` on failure
+When the profiling is stopped `fclose()` is called. The profiler's function's
+contract is similar to standard `io.*` interfaces. If it is impossible to open
+a file for writing or profiler fails to start, `nil` is returned on failure
 (plus an error message as a second result and a system-dependent error code as
-a third result). Otherwise returns some true value.
+a third result). Otherwise, returns `true` value.
 
 Stopping profiler from Lua is simple too:
 ```lua
@@ -230,17 +231,12 @@ If you want to build LuaJIT without memory profiler, you should build it with
 `-DLUAJIT_DISABLE_MEMPROF`. If it is disabled `misc.memprof.start()` and
 `misc.memprof.stop()` always return `false`.
 
-Memory profiler is expected to be thread safe, so it has a corresponding
-lock/unlock at internal mutex whenever you call corresponding memprof
-functions. If you want to build LuaJIT without thread safety use
-`-DLUAJIT_DISABLE_THREAD_SAFE`.
-
 ### Reading and displaying saved data
 
-Binary data can be read by `lj-parse-memprof` utility. It parses the binary
-format provided by memory profiler and render it on human-readable format.
+Binary data can be read by `luajit-parse-memprof` utility. It parses the binary
+format provided by memory profiler and render it to human-readable format.
 
-The usage is very simple:
+The usage for LuaJIT itself is very simple:
 ```
 $ ./luajit-parse-memprof --help
 luajit-parse-memprof - parser of the memory usage profile collected
@@ -266,6 +262,12 @@ structures. Note that events are sorted from the most often to the least.
 
 `Overrides` means what allocation this reallocation overrides.
 
+If you want to parse binary data via Tarantool only, use the following
+command (dash is important):
+```bash
+$ tarantool -e 'require("memprof")(arg[1])' - memprof.bin
+```
+
 ## Benchmarks
 
 Benchmarks were taken from repo:
===================================================================
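
For illustration, the start/stop contract described in the hunk above can be
sketched as follows. This is only a minimal sketch, not code from the patch:
`profile` and its optional `memprof` argument are hypothetical names, and
`misc.memprof` is assumed to exist only in a LuaJIT build that ships the
profiler.

```lua
-- Hypothetical helper (not part of the RFC): runs `workload` with the
-- memory profiler writing to `fname`. The optional `memprof` argument is
-- an assumption added here so that a stub can be injected.
local function profile(fname, workload, memprof)
  -- Look the module up defensively: plain Lua has no global `misc`.
  memprof = memprof or (rawget(_G, "misc") or {}).memprof
  if not memprof then
    return nil, "memprof is not available in this build"
  end
  -- io.*-like contract: nil, an error message and an errno on failure.
  -- `not started` also covers the `false` returned by a build made with
  -- -DLUAJIT_DISABLE_MEMPROF.
  local started, err, errno = memprof.start(fname)
  if not started then
    return nil, err, errno
  end
  -- Run the workload under pcall() so the profiler is stopped even if
  -- the workload raises an error.
  local ok, werr = pcall(workload)
  memprof.stop()
  if not ok then
    return nil, werr
  end
  return true
end
```

Calling `profile("memprof_new.bin", fn)` returns `true` on success, or `nil`
plus the error message (and the error code, when `start()` provides one), so
the caller can handle it the same way as a failed `io.open()`.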

And one more iterative patch (over the previous one). Branch is
force pushed.
===================================================================
diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
index 2721f1cc1..f9c43f91f 100644
--- a/doc/rfc/5442-luajit-memory-profiler.md
+++ b/doc/rfc/5442-luajit-memory-profiler.md
@@ -5,7 +5,7 @@
 * **Authors**: Sergey Kaplun @Buristan skaplun@tarantool.org,
                Igor Munkin @igormunkin imun@tarantool.org,
                Sergey Ostanevich @sergos sergos@tarantool.org
-* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442)
+* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442), [#5490](https://github.com/tarantool/tarantool/issues/5490)
 
 ## Summary
===================================================================
-- 
Best regards,
Sergey Kaplun

* Re: [Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler
  2021-01-20  8:19   ` Sergey Kaplun via Tarantool-discussions
@ 2021-01-20 14:26     ` Sergey Ostanevich via Tarantool-discussions
  2021-01-20 14:57       ` Sergey Kaplun via Tarantool-discussions
  2021-01-21 18:41     ` Igor Munkin via Tarantool-discussions
  1 sibling, 1 reply; 7+ messages in thread
From: Sergey Ostanevich via Tarantool-discussions @ 2021-01-20 14:26 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-discussions

Hi! 

Thanks for the patch, I've looked into
https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md

in ‘Prerequisites’: 
> Also all, deallocations are reported as internal too.

the comma is not needed

> Lua developers can do nothing with allocations made inside the built-ins except reducing its usage.

‘its’ is ambiguous here. I would rephrase: "As for allocations made inside the built-ins user can do nothing but reduce use of these built-ins."

> Currently VM state identifies C function execution only, so Fast and Lua functions states are added.

‘Currently’ -> ‘Originally’

Otherwise LGTM.
Sergos


> On 20 Jan 2021, at 11:19, Sergey Kaplun <skaplun@tarantool.org> wrote:
> 
> Hi, Igor!
> 
> Thanks for the review!
> 
> On 15.01.21, Igor Munkin wrote:
>> Sergey,
>> 
>> Thanks for the changes. There is a bit of nitpicking below and I
>> believe we'll push the next version of the doc to the trunk.
> 
> I've fixed all your comments, plus added some insignificant fixes.
> See two iterative patches below. Branch is force pushed.
> 
>> 
>> On 25.12.20, Sergey Kaplun wrote:
>>> Part of #5442
>>> ---
>>> 
>>> RFC on branch: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md
> 
> Side note: branch name is updated.
> New RFC version: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md
> 
>>> 
>>> Changes in v3:
>>> * More comments in example.
>>> * More verbose benchmark information.
>>> * Grammar and spelling fixes.
>>> 
>>> Changes in v2:
>>> * Removed C API, Tarantool integration and description of additional
>>>  features -- they will be added in another RFC if necessary.
>>> * Removed checking profile is running from the public API.
>>> * Added benchmarks and more meaningful example.
>>> * Grammar fixes.
>>> 
>>> doc/rfc/5442-luajit-memory-profiler.md | 314 +++++++++++++++++++++++++
>>> 1 file changed, 314 insertions(+)
>>> create mode 100644 doc/rfc/5442-luajit-memory-profiler.md
>>> 
>>> diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
>>> new file mode 100644
>>> index 000000000..85a61462a
>>> --- /dev/null
>>> +++ b/doc/rfc/5442-luajit-memory-profiler.md
>>> @@ -0,0 +1,314 @@
>> 
>> <snipped>
>> 
>>> +### Prerequisites
>>> +
>>> +This section describes additional changes in LuaJIT required for the feature
>>> +implementation. This version of LuaJIT memory profiler does not support verbose
>>> +reporting allocations from traces. All allocation from traces are reported as
>> 
>> Typo: s/reporting allocations from/reporting for allocations made on/.
> 
> Fixed, thanks!
> 
>> 
>>> +internal. But trace code semantics should be totally the same as for the Lua
>>> +interpreter (excluding sink optimizations). Also all deallocations reported as
>> 
>> Typo: s/deallocations reported/deallocations are reported/.
> 
> Fixed, thanks!
> 
>> 
>>> +internal too.
>>> +
>>> +There are two different representations of functions in LuaJIT: the function's
>>> +prototype (`GCproto`) and the function object so called closure (`GCfunc`).
>>> +The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
>>> +correspondingly. Also LuaJIT has a special function's type aka Fast Function.
>> 
>> Typo: s/correspondingly/respectively/.
>> 
>>> +It is used for LuaJIT builtins.
>> 
>> It's better to not split this sentence. Consider the rewording:
>> | Besides LuaJIT has a special function type a.k.a. Fast Function that
>> | is used for LuaJIT builtins.
> 
> Applied! Thanks!
> 
>> 
>>> +
>> 
>> <snipped>
>> 
>>> +Usually developers are not interested in information about allocations inside
>>> +builtins. So if fast function was called from a Lua function all
>>> +allocations are attributed to this Lua function. Otherwise attribute this event
>>> +to a C function.
>> 
>> I propose the following rewording:
>> | Lua developers can do nothing with allocations made inside the
>> | builtins except reducing its usage. So if fast function is called from
>> | a Lua function all allocations made in its scope are attributed to this
>> | Lua function (i.e. the builtin caller). Otherwise this event is
>> | attributed to a C function.
>> 
> 
> Applied, thanks!
> 
>>> +
>> 
>> <snipped>
>> 
>>> +If one run the chunk above the profiler reports approximately the following
>> 
>> Typo: s/run/runs/.
> 
> Fixed.
> 
>> 
>>> +(see legend [here](#reading-and-displaying-saved-data)):
>> 
>> <snipped>
>> 
>>> +So we need to know a type of function being executed by the virtual machine
>>> +(VM). Currently VM state identifies C function execution only, so Fast and Lua
>>> +functions states will be added.
>> 
>> Typo: s/will be/are/.
> 
> Sure, thanks!
> 
>> 
>>> +
>>> +To determine currently allocating coroutine (that may not be equal to currently
>>> +executed one) a new field called `mem_L` is added to `global_State` structure
>>> +to keep the coroutine address. This field is set at each reallocation to
>> 
>> Typo: s/at each reallocation to/on each reallocation to the/.
> 
> Fixed.
> 
>> 
>>> +corresponding `L` with which it was called.
>> 
>> Typo: s/it was/it is/.
> 
> Thanks, fixed!
> 
>> 
>>> +
>> 
>> <snipped>
>> 
>>> +When the profiling is stopped the `fclose()` is called. If it is impossible to
>> 
>> Typo: s/the `fclose()`/`fclose()`/.
> 
> Fixed.
> 
>> 
>>> +open a file for writing or profiler fails to start, returns `nil` on failure
>> 
>> Typo: s/returns `nil`/`nil` is returned/.
> 
> Fixed.
> 
>> 
>>> +(plus an error message as a second result and a system-dependent error code as
>>> +a third result). Otherwise returns some true value.
>> 
>> It would be nice to mention that the function contract is similar to
>> other standard io.* interfaces.
>> 
>> I glanced at the source code: it's not "some" true value; it is exactly the
>> *true* value.
> 
> All right! Fixed.
> 
>> 
>>> +
>> 
>> <snipped>
>> 
>>> +Memory profiler is expected to be thread safe, so it has a corresponding
>>> +lock/unlock at internal mutex whenever you call corresponding memprof
>>> +functions. If you want to build LuaJIT without thread safety use
>>> +`-DLUAJIT_DISABLE_THREAD_SAFE`.
>> 
>> This is not implemented in scope of the MVP, so drop this part.
> 
> Done.
> 
>> 
>>> +
>>> +### Reading and displaying saved data
>>> +
>>> +Binary data can be read by `lj-parse-memprof` utility. It parses the binary
>> 
>> Typo: s/lj-parse-memprof/luajit-parse-memprof/.
> 
> Fixed, thanks!
> 
>> 
>>> +format provided by memory profiler and render it on human-readable format.
>> 
>> Typo: s/it on/it to/.
> 
> Fixed, thanks!
> 
>> 
>>> +
>> 
>> <snipped>
>> 
>>> +This table shows performance deviation in relation to REFerence value (before
>>> +commit) with stopped and running profiler. The table shows the average value
>>> +for 11 runs. The first field of the column indicates the change in the average
>>> +time in seconds (less is better). The second field is the standard deviation
>>> +for the found difference.
>>> +
>>> +```
>>> +     Name       | REF  | AFTER, memprof off | AFTER, memprof on
>>> +----------------+------+--------------------+------------------
>>> +array3d         | 0.21 |    +0.00 (0.01)    |    +0.00 (0.01)
>>> +binary-trees    | 3.25 |    -0.01 (0.06)    |    +0.53 (0.10)
>>> +chameneos       | 2.97 |    +0.14 (0.04)    |    +0.13 (0.06)
>>> +coroutine-ring  | 1.00 |    +0.01 (0.04)    |    +0.01 (0.04)
>>> +euler14-bit     | 1.03 |    +0.01 (0.02)    |    +0.00 (0.02)
>>> +fannkuch        | 6.81 |    -0.21 (0.06)    |    -0.20 (0.06)
>>> +fasta           | 8.20 |    -0.07 (0.05)    |    -0.08 (0.03)
>> 
>> Side note: Still curious how this can happen. It looks OK when the
>> negative difference is within its deviation. But this is sorta magic.
> 
> Yes, me too. Unfortunately, we have neither benchmark tests nor
> performance analysis for LuaJIT for now.
> 
>> 
>>> +life            | 0.46 |    +0.00 (0.01)    |    +0.35 (0.01)
>>> +mandelbrot      | 2.65 |    +0.00 (0.01)    |    +0.01 (0.01)
>>> +mandelbrot-bit  | 1.97 |    +0.00 (0.01)    |    +0.01 (0.02)
>>> +md5             | 1.58 |    -0.01 (0.04)    |    -0.04 (0.04)
>>> +nbody           | 1.34 |    +0.00 (0.01)    |    -0.02 (0.01)
>>> +nsieve          | 2.07 |    -0.03 (0.03)    |    -0.01 (0.04)
>>> +nsieve-bit      | 1.50 |    -0.02 (0.04)    |    +0.00 (0.04)
>>> +nsieve-bit-fp   | 4.44 |    -0.03 (0.07)    |    -0.01 (0.07)
>>> +partialsums     | 0.54 |    +0.00 (0.01)    |    +0.00 (0.01)
>>> +pidigits-nogmp  | 3.47 |    -0.01 (0.02)    |    -0.10 (0.02)
>>> +ray             | 1.62 |    -0.02 (0.03)    |    +0.00 (0.02)
>>> +recursive-ack   | 0.20 |    +0.00 (0.01)    |    +0.00 (0.01)
>>> +recursive-fib   | 1.63 |    +0.00 (0.01)    |    +0.01 (0.02)
>>> +scimark-fft     | 5.72 |    +0.06 (0.09)    |    -0.01 (0.10)
>>> +scimark-lu      | 3.47 |    +0.02 (0.27)    |    -0.03 (0.26)
>>> +scimark-sor     | 2.34 |    +0.00 (0.01)    |    -0.01 (0.01)
>>> +scimark-sparse  | 4.95 |    -0.02 (0.04)    |    -0.02 (0.04)
>>> +series          | 0.95 |    +0.00 (0.02)    |    +0.00 (0.01)
>>> +spectral-norm   | 0.96 |    +0.00 (0.02)    |    -0.01 (0.02)
>>> +```
>>> -- 
>>> 2.28.0
>>> 
>> 
>> -- 
>> Best regards,
>> IM
> 
> ===================================================================
> diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
> index 85a61462a..2721f1cc1 100644
> --- a/doc/rfc/5442-luajit-memory-profiler.md
> +++ b/doc/rfc/5442-luajit-memory-profiler.md
> @@ -30,39 +30,39 @@ The whole toolchain of memory profiling will be divided into several parts:
> 
> This section describes additional changes in LuaJIT required for the feature
> implementation. This version of LuaJIT memory profiler does not support verbose
> -reporting allocations from traces. All allocation from traces are reported as
> -internal. But trace code semantics should be totally the same as for the Lua
> -interpreter (excluding sink optimizations). Also all deallocations reported as
> -internal too.
> +reporting for allocations made on traces. All allocation from traces are
> +reported as internal. But trace code semantics should be totally the same as
> +for the Lua interpreter (excluding sink optimizations). Also all, deallocations
> +are reported as internal too.
> 
> There are two different representations of functions in LuaJIT: the function's
> prototype (`GCproto`) and the function object so called closure (`GCfunc`).
> The closures are represented as `GCfuncL` and `GCfuncC` for Lua and C closures
> -correspondingly. Also LuaJIT has a special function's type aka Fast Function.
> -It is used for LuaJIT builtins.
> +respectively. Besides LuaJIT has a special function type, a.k.a. Fast Function
> +that is used for LuaJIT built-ins
> 
> Tail call optimization does not create a new call frame, so all allocations
> inside the function called via `CALLT`/`CALLMT` are attributed to its caller.
> 
> -Usually developers are not interested in information about allocations inside
> -builtins. So if fast function was called from a Lua function all
> -allocations are attributed to this Lua function. Otherwise attribute this event
> -to a C function.
> +Lua developers can do nothing with allocations made inside the built-ins except
> +reducing its usage. So if fast function is called from a Lua function all
> +allocations made in its scope are attributed to this Lua function (i.e. the
> +built-in caller). Otherwise, this event is attributed to a C function.
> 
> Assume we have the following Lua chunk named <test.lua>:
> 
> -```
> +```lua
> 1  jit.off()
> 2  misc.memprof.start("memprof_new.bin")
> -3  -- Lua does not create a new frame to call string.rep and all allocations are
> -4  -- attributed not to `append()` function but to the parent scope.
> +3  -- Lua does not create a new frame to call string.rep() and all allocations
> +4  -- are attributed not to append() function but to the parent scope.
> 5  local function append(str, rep)
> 6    return string.rep(str, rep)
> 7  end
> 8
> 9  local t = {}
> 10 for _ = 1, 1e5 do
> -11   -- table.insert is a builtin and all corresponding allocations
> +11   -- table.insert() is a built-in and all corresponding allocations
> 12   -- are reported in the scope of main chunk
> 13   table.insert(t,
> 14     append('q', _)
> @@ -71,7 +71,7 @@ Assume we have the following Lua chunk named <test.lua>:
> 17 misc.memprof.stop()
> ```
> 
> -If one run the chunk above the profiler reports approximately the following
> +If one runs the chunk above the profiler reports approximately the following
> (see legend [here](#reading-and-displaying-saved-data)):
> ```
> ALLOCATIONS
> @@ -99,15 +99,15 @@ INTERNAL: 20    0       1481
> 
> So we need to know a type of function being executed by the virtual machine
> (VM). Currently VM state identifies C function execution only, so Fast and Lua
> -functions states will be added.
> +functions states are added.
> 
> To determine currently allocating coroutine (that may not be equal to currently
> executed one) a new field called `mem_L` is added to `global_State` structure
> -to keep the coroutine address. This field is set at each reallocation to
> -corresponding `L` with which it was called.
> +to keep the coroutine address. This field is set on each reallocation to the
> +corresponding `L` with which it is called.
> 
> There is a static function (`lj_debug_getframeline`) that returns line number
> -for current `BCPos` in `lj_debug.c` already. It will be added to the debug
> +for current `BCPos` in `lj_debug.c` already. It is added to the debug
> module API to be used in memory profiler.
> 
> ### Information recording
> @@ -211,10 +211,11 @@ local started, err, errno = misc.memprof.start(fname)
> ```
> where `fname` is name of the file where profile events are written. Writer for
> this function perform `fwrite()` for each call retrying in case of `EINTR`.
> -When the profiling is stopped the `fclose()` is called. If it is impossible to
> -open a file for writing or profiler fails to start, returns `nil` on failure
> +When the profiling is stopped `fclose()` is called. The profiler's function's
> +contract is similar to standard `io.*` interfaces. If it is impossible to open
> +a file for writing or profiler fails to start, `nil` is returned on failure
> (plus an error message as a second result and a system-dependent error code as
> -a third result). Otherwise returns some true value.
> +a third result). Otherwise, returns `true` value.
> 
> Stopping profiler from Lua is simple too:
> ```lua
> @@ -230,17 +231,12 @@ If you want to build LuaJIT without memory profiler, you should build it with
> `-DLUAJIT_DISABLE_MEMPROF`. If it is disabled `misc.memprof.start()` and
> `misc.memprof.stop()` always return `false`.
> 
> -Memory profiler is expected to be thread safe, so it has a corresponding
> -lock/unlock at internal mutex whenever you call corresponding memprof
> -functions. If you want to build LuaJIT without thread safety use
> -`-DLUAJIT_DISABLE_THREAD_SAFE`.
> -
> ### Reading and displaying saved data
> 
> -Binary data can be read by `lj-parse-memprof` utility. It parses the binary
> -format provided by memory profiler and render it on human-readable format.
> +Binary data can be read by `luajit-parse-memprof` utility. It parses the binary
> +format provided by memory profiler and render it to human-readable format.
> 
> -The usage is very simple:
> +The usage for LuaJIT itself is very simple:
> ```
> $ ./luajit-parse-memprof --help
> luajit-parse-memprof - parser of the memory usage profile collected
> @@ -266,6 +262,12 @@ structures. Note that events are sorted from the most often to the least.
> 
> `Overrides` means what allocation this reallocation overrides.
> 
> +If you want to parse binary data via Tarantool only, use the following
> +command (dash is important):
> +```bash
> +$ tarantool -e 'require("memprof")(arg[1])' - memprof.bin
> +```
> +
> ## Benchmarks
> 
> Benchmarks were taken from repo:
> ===================================================================
> 
> And one more iterative patch (over the previous one). Branch is
> force pushed.
> ===================================================================
> diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
> index 2721f1cc1..f9c43f91f 100644
> --- a/doc/rfc/5442-luajit-memory-profiler.md
> +++ b/doc/rfc/5442-luajit-memory-profiler.md
> @@ -5,7 +5,7 @@
> * **Authors**: Sergey Kaplun @Buristan skaplun@tarantool.org,
>                Igor Munkin @igormunkin imun@tarantool.org,
>                Sergey Ostanevich @sergos sergos@tarantool.org
> -* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442)
> +* **Issues**: [#5442](https://github.com/tarantool/tarantool/issues/5442), [#5490](https://github.com/tarantool/tarantool/issues/5490)
> 
> ## Summary
> ===================================================================
> -- 
> Best regards,
> Sergey Kaplun


* Re: [Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler
  2021-01-20 14:26     ` Sergey Ostanevich via Tarantool-discussions
@ 2021-01-20 14:57       ` Sergey Kaplun via Tarantool-discussions
  0 siblings, 0 replies; 7+ messages in thread
From: Sergey Kaplun via Tarantool-discussions @ 2021-01-20 14:57 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-discussions

Hi, Sergos!

Thanks for the review!

On 20.01.21, Sergey Ostanevich wrote:
> Hi! 
> 
> Thanks for the patch, I've looked into 
> https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md
> 
> in ‘Prerequisites’: 
> > Also all, deallocations are reported as internal too.
> 
> the comma is not needed

Indeed! Fixed!

> 
> > Lua developers can do nothing with allocations made inside the built-ins except reducing its usage.
> 
> ‘its’ is ambiguous here. I would rephrase: "As for allocations made inside the built-ins user can do nothing but reduce use of these built-ins."

Thanks, applied!

> 
> > Currently VM state identifies C function execution only, so Fast and Lua functions states are added.
> 
> ‘Currently’ -> ‘Originally’

Fixed, thanks!

See the iterative patch below. Branch is force pushed.

> 
> Otherwise LGTM.
> Sergos
> 

<snipped>

> 

===================================================================
diff --git a/doc/rfc/5442-luajit-memory-profiler.md b/doc/rfc/5442-luajit-memory-profiler.md
index f9c43f91f..cb8adab79 100644
--- a/doc/rfc/5442-luajit-memory-profiler.md
+++ b/doc/rfc/5442-luajit-memory-profiler.md
@@ -32,7 +32,7 @@ This section describes additional changes in LuaJIT required for the feature
 implementation. This version of LuaJIT memory profiler does not support verbose
 reporting for allocations made on traces. All allocation from traces are
 reported as internal. But trace code semantics should be totally the same as
-for the Lua interpreter (excluding sink optimizations). Also all, deallocations
+for the Lua interpreter (excluding sink optimizations). Also all deallocations
 are reported as internal too.
 
 There are two different representations of functions in LuaJIT: the function's
@@ -44,8 +44,8 @@ that is used for LuaJIT built-ins
 Tail call optimization does not create a new call frame, so all allocations
 inside the function called via `CALLT`/`CALLMT` are attributed to its caller.
 
-Lua developers can do nothing with allocations made inside the built-ins except
-reducing its usage. So if fast function is called from a Lua function all
+As for allocations made inside the built-ins user can do nothing but reduce use
+of these built-ins. So if fast function is called from a Lua function all
 allocations made in its scope are attributed to this Lua function (i.e. the
 built-in caller). Otherwise, this event is attributed to a C function.
 
@@ -98,7 +98,7 @@ INTERNAL: 20    0       1481
 ```
 
 So we need to know a type of function being executed by the virtual machine
-(VM). Currently VM state identifies C function execution only, so Fast and Lua
+(VM). Originally VM state identifies C function execution only, so Fast and Lua
 functions states are added.
 
 To determine currently allocating coroutine (that may not be equal to currently
===================================================================

-- 
Best regards,
Sergey Kaplun

* Re: [Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler
  2021-01-20  8:19   ` Sergey Kaplun via Tarantool-discussions
  2021-01-20 14:26     ` Sergey Ostanevich via Tarantool-discussions
@ 2021-01-21 18:41     ` Igor Munkin via Tarantool-discussions
  1 sibling, 0 replies; 7+ messages in thread
From: Igor Munkin via Tarantool-discussions @ 2021-01-21 18:41 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-discussions

Sergey,

Thanks for the fixes, LGTM.

On 20.01.21, Sergey Kaplun wrote:
> Hi, Igor!
> 
> Thanks for the review!
> 
> On 15.01.21, Igor Munkin wrote:
> > Sergey,
> > 
> > Thanks for the changes. There is a bit of nitpicking below and I
> > believe we'll push the next version of the doc to the trunk.
> 
> I've fixed all your comments, plus added some insignificant fixes.
> See two iterative patches below. Branch is force pushed.

Great, thanks! I also changed the commit subject to the following:
| rfc: describe a LuaJIT memory profiler toolchain

> 
> > 
> > On 25.12.20, Sergey Kaplun wrote:
> > > Part of #5442
> > > ---
> > > 
> > > RFC on branch: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md
> 
> Side note: branch name is updated.
> New RFC version: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler-rfc/doc/rfc/5442-luajit-memory-profiler.md
> 
> > > 
> > > Changes in v3:
> > > * More comments in example.
> > > * More verbose benchmark information.
> > > * Grammar and spelling fixes.
> > > 
> > > Changes in v2:
> > > * Removed C API, Tarantool integration and description of additional
> > >   features -- they will be added in another RFC if necessary.
> > > * Removed checking profile is running from the public API.
> > > * Added benchmarks and more meaningful example.
> > > * Grammar fixes.
> > > 
> > >  doc/rfc/5442-luajit-memory-profiler.md | 314 +++++++++++++++++++++++++
> > >  1 file changed, 314 insertions(+)
> > >  create mode 100644 doc/rfc/5442-luajit-memory-profiler.md
> > > 

<snipped>

> -- 
> Best regards,
> Sergey Kaplun

-- 
Best regards,
IM

* Re: [Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler
  2020-12-25 11:34 [Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler Sergey Kaplun
  2021-01-15 13:14 ` Igor Munkin via Tarantool-discussions
@ 2021-01-21 18:42 ` Igor Munkin via Tarantool-discussions
  1 sibling, 0 replies; 7+ messages in thread
From: Igor Munkin via Tarantool-discussions @ 2021-01-21 18:42 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-discussions

Sergey,

On 25.12.20, Sergey Kaplun wrote:
> Part of #5442
> ---
> 
> RFC on branch: https://github.com/tarantool/tarantool/blob/skaplun/gh-5442-luajit-memory-profiler/doc/rfc/5442-luajit-memory-profiler.md
> 
> Changes in v3:
> * More comments in example.
> * More verbose benchmark information.
> * Grammar and spelling fixes.
> 
> Changes in v2:
> * Removed C API, Tarantool integration and description of additional
>   features -- they will be added in another RFC if necessary.
> * Removed checking profile is running from the public API.
> * Added benchmarks and more meaningful example.
> * Grammar fixes.
> 
>  doc/rfc/5442-luajit-memory-profiler.md | 314 +++++++++++++++++++++++++
>  1 file changed, 314 insertions(+)
>  create mode 100644 doc/rfc/5442-luajit-memory-profiler.md

I've checked your patch into 2.7 and master.

> 

<snipped>

> -- 
> 2.28.0
> 

-- 
Best regards,
IM

Thread overview: 7+ messages
2020-12-25 11:34 [Tarantool-discussions] [RFC luajit v3] rfc: describe a LuaJIT memory profiler Sergey Kaplun
2021-01-15 13:14 ` Igor Munkin via Tarantool-discussions
2021-01-20  8:19   ` Sergey Kaplun via Tarantool-discussions
2021-01-20 14:26     ` Sergey Ostanevich via Tarantool-discussions
2021-01-20 14:57       ` Sergey Kaplun via Tarantool-discussions
2021-01-21 18:41     ` Igor Munkin via Tarantool-discussions
2021-01-21 18:42 ` Igor Munkin via Tarantool-discussions
