[Tarantool-patches] [RFC v4] rfc: luajit metrics

Thu Oct 8 22:29:54 MSK 2020

Hi, Igor! Thanks for the review!

On 08.10.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the updates! Considering the changes in the follow-up reply,
> there are still several comments below.
> 
> On 05.10.20, Sergey Kaplun wrote:
> > Part of #5187
> > ---
> > 
> > This patch adds RFC to LuaJIT metrics interfaces. Nevertheless name
> > `misc` for builtin library is not good and should be discussed, because
> > tons of user modules can use that name for their own libraries.
> > 
> > Branch: https://github.com/tarantool/tarantool/tree/skaplun/5187-luajit-metrics
> > Issue: https://github.com/tarantool/tarantool/issues/5187
> > 
> > Changes in v2:
> > - Fixed typos
> > - Made comments more verbose
> > - Avoided flushing any of metrics after each call of luaM_metrics()
> > Changes in v3:
> > - Added colors count metrics description
> > - Added description about how metrics are collected
> > - Added benchmarks
> > Changes in v3:
> > - Removed colors count metrics
> > - Added code block language
> > - Added how to use section
> 
> Minor: ChangeLog is misordered (the latest changes are the first entry).

Thanks!

Forgot to add ChangeLog to patch:
@ChangeLog:
 * Add Lua and C API for LuaJIT platform metrics about:
   - overall amount of allocated tables, cdata and udata objects
   - number of incremental GC steps grouped by GC state
   - number of string hashes hits and misses
   - amount of allocated and freed memory
   - number of trace aborts, number of traces and restored snapshots

> 
> > 
> >  doc/rfc/5187-luajit-metrics.md | 299 +++++++++++++++++++++++++++++++++
> >  1 file changed, 299 insertions(+)
> >  create mode 100644 doc/rfc/5187-luajit-metrics.md
> > 
> > diff --git a/doc/rfc/5187-luajit-metrics.md b/doc/rfc/5187-luajit-metrics.md
> > new file mode 100644
> > index 000000000..02f5b559f
> > --- /dev/null
> > +++ b/doc/rfc/5187-luajit-metrics.md
> > @@ -0,0 +1,299 @@
> 
> <snipped>
> 
> > +Couple of words about how metrics are collected:
> > +- `strhash_*` -- whenever existing string is returned after attempt to
> > +  create new string there is incremented `strhash_hit` counter, if new string
> > +  created then `strhash_miss` is incremented instead.
> 
> Minor: I propose to reword it the way similar you've updated it in the
> luam_Metrics comments.

Yep, sure!

> 
> > +- `gc_*num`, `jit_trace_num` -- corresponding counter incremented whenever new
> 
> Typo: s/whenever new/whenever a new/.

Hawkeye!

> 
> > +  object is allocated. When object become garbage collected its counter is
> 
> Typo: s/become garbage collected/is collected by GC/.

Thanks! Fixed.

> 
> > +  decremented.
> > +- `gc_total`, `gc_allocated`, `gc_freed` -- any time when allocation function
> > +  is called `gc_allocated` and/or `gc_freed` is increased and `gc_total`
> > +  increase when memory is allocated or reallocated, decrease when memory is
> 
> Typo: s/increase/is increased/.
> Typo: s/decrease/is decreased/.

Thanks! Fixed.

> 
> > +  freed.
> > +- `gc_steps_*` -- corresponding counter increments whenever Garbage Collector
> 
> Typo: s/increments/is incremented/.

Thanks!

> 
> > +  starts to execute 1 step of garbage collection.
> 
> Minor: I propose s/1/an incremental/.

No problem!

> 
> > +- `jit_snap_restore` -- whenever JIT machine exits from the trace and restores
> > +  interpreter state `jit_snap_restore` counter is incremented.
> > +- `jit_trace_abort` -- whenever JIT compiler can't record the trace in case NYI
> > +  BC this counter is incremented.
> 
> Minor: NYI relates also to builtins, not only to bytecodes.

Ok. Fixed.

> 
> > +- `jit_mcode_size` -- whenever new MCode area is allocated `jit_mcode_size` is
> > +  increased at corresponding size in bytes. Sets to 0 when all mcode area is
> > +  freed.
> 
> How does it change, when a trace is collected as a result of its flush?

It doesn't. IINM, this area will be reused later for other traces.
MCode area is linked with jit_State not with trace by itself. Trace just
reserve MCode area that needed.

> 
> > +
> > +All metrics are collected throughout the platform uptime. These metrics
> > +increase monotonically and can overflow:
> > +  - `strhash_hit`
> > +  - `strhash_miss`
> > +  - `gc_freed`
> > +  - `gc_allocated`,
> 
> Typo: Excess comma.

Hawkeye!

> 
> > +  - `gc_steps_pause`
> > +  - `gc_steps_propagate`
> > +  - `gc_steps_atomic`
> > +  - `gc_steps_sweepstring`
> > +  - `gc_steps_sweep`
> > +  - `gc_steps_finalize`
> > +  - `jit_snap_restore`
> > +  - `jit_trace_abort`
> 
> <snipped>
> 
> > +## How to use
> > +
> > +This section describes small example of metrics usage.
> > +
> > +For example amount of `strhash_misses` can be shown for tracking of new string
> > +objects allocations. For example if we add code like:
> > +```lua
> > +local function sharded_storage_func(storage_name, func_name)
> > +    return 'sharded_storage.storages.' .. storage_name .. '.' .. func_name
> > +end
> > +```
> > +increase in slope curve of `strhash_misses` means, that after your changes
> > +there are more new strings allocating at runtime. Of course slope curve of
> > +`strhash_misses` _should_ be less than slope curve of `strhash_hits`. If it
> > +is not, you should refactor your code.
> 
> Minor: I guess it's not a good idea to suggest 'code refactoring'. This
> section is good to describe the values being observed, so the first part
> about tilt angle is enough.

OK, dropped it!

> 
> > +
> > +Slope curves of `gc_freed` and `gc_allocated` can be used for analysis of GC
> > +pressure of your application (less is better).
> > +
> > +Also we can check some hacky optimization with these metrics. For example let's
> > +assume that we have this code snippet:
> > +```lua
> > +local old_metrics = misc.getmetrics()
> > +local t = {}
> > +for i = 1, 513 do
> > +    t[i] = i
> > +end
> > +local new_metrics = misc.getmetrics()
> > +local diff =  new_metrics.gc_allocated - old_metrics.gc_allocated

Found double space prior new_metrics.gc_allocated. Fixed.

> > +```
> > +`diff` equals to 18879 after running of this chunk.
> > +
> > +But if we change table initialization to
> > +```lua
> > +local table_new = require "table.new"
> > +local old_metrics = misc.getmetrics()
> > +local t = table_new(513,0)
> > +for i = 1, 513 do
> > +    t[i] = i
> > +end
> > +local new_metrics = misc.getmetrics()
> > +local diff =  new_metrics.gc_allocated - old_metrics.gc_allocated

Here too.

> > +```
> > +`diff` shows us only 5895.
> > +
> > +Slope curves of `gc_steps_*` can be used for tracking of GC pressure too. For
> > +long time observations you will see periodic increment for `gc_steps_*` metrics
> > +-- for example longer period of `gc_steps_atomic` increment is better. Also
> > +additional amount of `gc_steps_propagate` in one period can be used to
> > +indirectly estimate amount of objects.
> 
> These values also correlate with the <stepmul> value. The amount of
> incremental steps can grow, but one step can process a small amount of
> GCobjects. So these metrics should be considered together with GC setup.

Thanks! Add this note to RFC!

> 
> > +
> > +Amount of `gc_*num` is useful for control of memory leaks -- totally amount of
> 
> Typo: s/totally/total/.

Fixed! Thanks!

> 
> > +these objects should not growth nonstop (you can also track `gc_total` for
> > +this).  Also `jit_mcode_size` can be used for tracking amount of allocated
> 
> Typo: double space prior to "Also".

Yep. Fixed!
Side note: Too unpredictable autoformatting by `gq` :\

> 
> > +memory for traces machine code.
> > +
> > +Slope curves of `jit_trace_abort` shows how many times trace hasn't been
> > +compiled when the attempt was made (less is better).
> > +
> > +Amount of `gc_trace_num` is shown how much traces was generated (_usually_
> > +more is better).
> > +
> > +And the last one -- `gc_snap_restores` can be used for estimation when LuaJIT
> > +is stop trace execution. If slope curves of this metric growth after changing
> > +old code it can mean performance degradation.
> > +
> > +Assumes that we have code like this:
> > +```lua
> > +local function foo(i)
> > +    return i <= 5 and i or tostring(i)
> > +end
> > +-- minstitch option needs to emulate nonstitching behaviour
> > +jit.opt.start(0, "hotloop=2", "hotexit=2", "minstitch=15")
> > +
> > +local sum = 0
> > +local old_metrics = misc.getmetrics()
> > +for i = 1, 10 do
> > +    sum = sum + foo(i)
> > +end
> > +local new_metrics = misc.getmetrics()
> > +local diff = new_metrics.jit_snap_restore - old_metrics.jit_snap_restore
> > +```
> > +`diff` equals 3 (1 side exit on loop end, 2 side exits to the interpreter
> > +before trace gets hot and compiled) after this chunk of code.
> > +
> > +And now we decide to change `foo` function like this:
> > +```lua
> > +local function foo(i)
> > +    -- math.fmod is not yet compiled!
> > +    return i <= 5 and i or math.fmod(i, 11)
> > +end
> > +```
> > +`diff` equals 6 (1 side exit on loop end, 2 side exits to the interpreter
> > +before trace gets hot and compiled an 3 side exits from the root trace could
> > +not get compiled) after the same chunk of code.
> > +
> > +## Benchmarks
> > +
> > +Benchmarks was taken from repo:
> 
> Typo: s/was/were/.

Thanks, fixed!

> 
> > +[LuaJIT-test-cleanup](https://github.com/LuaJIT/LuaJIT-test-cleanup).
> > +
> > +Example of usage:
> > +```bash
> > +/usr/bin/time -f"array3d %U" ./luajit $BENCH_DIR/array3d.lua  300 >/dev/null
> 
> Typo: double space prior to "300".

Thanks, fixed!

> 
> > +```
> 
> <snipped>
> 
> > -- 
> > 2.28.0
> > 
> 
> -- 
> Best regards,
> IM

See iterative patch in the bottom. Branch force-pushed.

===================================================================

diff --git a/doc/rfc/5187-luajit-metrics.md b/doc/rfc/5187-luajit-metrics.md
index 988b049fb..0c1df6901 100644
--- a/doc/rfc/5187-luajit-metrics.md
+++ b/doc/rfc/5187-luajit-metrics.md
@@ -89,22 +89,23 @@ struct luam_Metrics {
 ```
 
 Couple of words about how metrics are collected:
-- `strhash_*` -- whenever existing string is returned after attempt to
-  create new string there is incremented `strhash_hit` counter, if new string
-  created then `strhash_miss` is incremented instead.
-- `gc_*num`, `jit_trace_num` -- corresponding counter incremented whenever new
-  object is allocated. When object become garbage collected its counter is
+- `strhash_*` -- whenever the string with the same payload is found, so a new
+  one is not created/allocated, there is incremented `strhash_hit` counter, if
+  a new one string created/allocated then `strhash_miss` is incremented
+  instead.
+- `gc_*num`, `jit_trace_num` -- corresponding counter is incremented whenever a
+  new object is allocated. When object is collected by GC its counter is
   decremented.
 - `gc_total`, `gc_allocated`, `gc_freed` -- any time when allocation function
-  is called `gc_allocated` and/or `gc_freed` is increased and `gc_total`
-  increase when memory is allocated or reallocated, decrease when memory is
-  freed.
-- `gc_steps_*` -- corresponding counter increments whenever Garbage Collector
-  starts to execute 1 step of garbage collection.
+  is called `gc_allocated` and/or `gc_freed` is increased and `gc_total` is
+  increased when memory is allocated or reallocated, is decreased when memory
+  is freed.
+- `gc_steps_*` -- corresponding counter is incremented whenever Garbage
+  Collector starts to execute an incremental step of garbage collection.
 - `jit_snap_restore` -- whenever JIT machine exits from the trace and restores
   interpreter state `jit_snap_restore` counter is incremented.
 - `jit_trace_abort` -- whenever JIT compiler can't record the trace in case NYI
-  BC this counter is incremented.
+  BC or builtins this counter is incremented.
 - `jit_mcode_size` -- whenever new MCode area is allocated `jit_mcode_size` is
   increased at corresponding size in bytes. Sets to 0 when all mcode area is
   freed.
@@ -114,7 +115,7 @@ increase monotonically and can overflow:
   - `strhash_hit`
   - `strhash_miss`
   - `gc_freed`
-  - `gc_allocated`,
+  - `gc_allocated`
   - `gc_steps_pause`
   - `gc_steps_propagate`
   - `gc_steps_atomic`
@@ -173,8 +174,7 @@ end
 ```
 increase in slope curve of `strhash_misses` means, that after your changes
 there are more new strings allocating at runtime. Of course slope curve of
-`strhash_misses` _should_ be less than slope curve of `strhash_hits`. If it
-is not, you should refactor your code.
+`strhash_misses` _should_ be less than slope curve of `strhash_hits`.
 
 Slope curves of `gc_freed` and `gc_allocated` can be used for analysis of GC
 pressure of your application (less is better).
@@ -188,7 +188,7 @@ for i = 1, 513 do
     t[i] = i
 end
 local new_metrics = misc.getmetrics()
-local diff =  new_metrics.gc_allocated - old_metrics.gc_allocated
+local diff = new_metrics.gc_allocated - old_metrics.gc_allocated
 ```
 `diff` equals to 18879 after running of this chunk.
 
@@ -201,7 +201,7 @@ for i = 1, 513 do
     t[i] = i
 end
 local new_metrics = misc.getmetrics()
-local diff =  new_metrics.gc_allocated - old_metrics.gc_allocated
+local diff = new_metrics.gc_allocated - old_metrics.gc_allocated
 ```
 `diff` shows us only 5895.
 
@@ -209,11 +209,14 @@ Slope curves of `gc_steps_*` can be used for tracking of GC pressure too. For
 long time observations you will see periodic increment for `gc_steps_*` metrics
 -- for example longer period of `gc_steps_atomic` increment is better. Also
 additional amount of `gc_steps_propagate` in one period can be used to
-indirectly estimate amount of objects.
+indirectly estimate amount of objects. These values also correlate with the
+step multiplier of the GC. The amount of incremental steps can grow, but
+one step can process a small amount of objects. So these metrics should be
+considered together with GC setup.
 
-Amount of `gc_*num` is useful for control of memory leaks -- totally amount of
+Amount of `gc_*num` is useful for control of memory leaks -- total amount of
 these objects should not growth nonstop (you can also track `gc_total` for
-this).  Also `jit_mcode_size` can be used for tracking amount of allocated
+this). Also `jit_mcode_size` can be used for tracking amount of allocated
 memory for traces machine code.
 
 Slope curves of `jit_trace_abort` shows how many times trace hasn't been
@@ -258,12 +261,12 @@ not get compiled) after the same chunk of code.
 
 ## Benchmarks
 
-Benchmarks was taken from repo:
+Benchmarks were taken from repo:
 [LuaJIT-test-cleanup](https://github.com/LuaJIT/LuaJIT-test-cleanup).
 
 Example of usage:
 ```bash
-/usr/bin/time -f"array3d %U" ./luajit $BENCH_DIR/array3d.lua  300 >/dev/null
+/usr/bin/time -f"array3d %U" ./luajit $BENCH_DIR/array3d.lua 300 >/dev/null
 ```
 Taking into account the measurement error ~ 2%, it can be said that there is no
 difference in the performance.
===================================================================

-- 
Best regards,
Sergey Kaplun