[Tarantool-patches] [RFC v4] rfc: luajit metrics

Igor Munkin imun at tarantool.org
Thu Oct 8 20:25:12 MSK 2020


Sergey,

Thanks for the updates! Considering the changes in the follow-up reply,
there are still several comments below.

On 05.10.20, Sergey Kaplun wrote:
> Part of #5187
> ---
> 
> This patch adds RFC to LuaJIT metrics interfaces. Nevertheless name
> `misc` for builtin library is not good and should be discussed, because
> tons of user modules can use that name for their own libraries.
> 
> Branch: https://github.com/tarantool/tarantool/tree/skaplun/5187-luajit-metrics
> Issue: https://github.com/tarantool/tarantool/issues/5187
> 
> Changes in v2:
> - Fixed typos
> - Made comments more verbose
> - Avoided flushing any of metrics after each call of luaM_metrics()
> Changes in v3:
> - Added colors count metrics description
> - Added description about how metrics are collected
> - Added benchmarks
> Changes in v3:
> - Removed colors count metrics
> - Added code block language
> - Added how to use section

Minor: ChangeLog is misordered (the latest changes should be the first entry).

> 
>  doc/rfc/5187-luajit-metrics.md | 299 +++++++++++++++++++++++++++++++++
>  1 file changed, 299 insertions(+)
>  create mode 100644 doc/rfc/5187-luajit-metrics.md
> 
> diff --git a/doc/rfc/5187-luajit-metrics.md b/doc/rfc/5187-luajit-metrics.md
> new file mode 100644
> index 000000000..02f5b559f
> --- /dev/null
> +++ b/doc/rfc/5187-luajit-metrics.md
> @@ -0,0 +1,299 @@

<snipped>

> +Couple of words about how metrics are collected:
> +- `strhash_*` -- whenever existing string is returned after attempt to
> +  create new string there is incremented `strhash_hit` counter, if new string
> +  created then `strhash_miss` is incremented instead.

Minor: I propose rewording it similarly to the way you've updated the
luam_Metrics comments.

> +- `gc_*num`, `jit_trace_num` -- corresponding counter incremented whenever new

Typo: s/whenever new/whenever a new/.

> +  object is allocated. When object become garbage collected its counter is

Typo: s/become garbage collected/is collected by GC/.

> +  decremented.
> +- `gc_total`, `gc_allocated`, `gc_freed` -- any time when allocation function
> +  is called `gc_allocated` and/or `gc_freed` is increased and `gc_total`
> +  increase when memory is allocated or reallocated, decrease when memory is

Typo: s/increase/is increased/.
Typo: s/decrease/is decreased/.

> +  freed.
> +- `gc_steps_*` -- corresponding counter increments whenever Garbage Collector

Typo: s/increments/is incremented/.

> +  starts to execute 1 step of garbage collection.

Minor: I propose s/1/an incremental/.

> +- `jit_snap_restore` -- whenever JIT machine exits from the trace and restores
> +  interpreter state `jit_snap_restore` counter is incremented.
> +- `jit_trace_abort` -- whenever JIT compiler can't record the trace in case NYI
> +  BC this counter is incremented.

Minor: NYI relates also to builtins, not only to bytecodes.

> +- `jit_mcode_size` -- whenever new MCode area is allocated `jit_mcode_size` is
> +  increased at corresponding size in bytes. Sets to 0 when all mcode area is
> +  freed.

How does it change when a trace is collected as a result of a flush?
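
A minimal sketch of how this could be checked, assuming `misc.getmetrics()`
from this RFC is available. The helper name `mcode_size_around_flush` and
its `getmetrics` parameter are hypothetical and exist only for
illustration; I make no claim here about what the values will be.

```lua
-- Hypothetical helper: samples jit_mcode_size before and after
-- flushing all traces. <getmetrics> is assumed to be
-- misc.getmetrics from this RFC (passed in, so it can be stubbed).
local function mcode_size_around_flush(getmetrics)
    local before = getmetrics().jit_mcode_size
    -- Flush the whole cache of compiled code.
    jit.flush()
    -- Run a full GC cycle, so the flushed trace objects are collected.
    collectgarbage()
    local after = getmetrics().jit_mcode_size
    return before, after
end
```

Usage would be something like `print(mcode_size_around_flush(misc.getmetrics))`.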

> +
> +All metrics are collected throughout the platform uptime. These metrics
> +increase monotonically and can overflow:
> +  - `strhash_hit`
> +  - `strhash_miss`
> +  - `gc_freed`
> +  - `gc_allocated`,

Typo: Excess comma.

> +  - `gc_steps_pause`
> +  - `gc_steps_propagate`
> +  - `gc_steps_atomic`
> +  - `gc_steps_sweepstring`
> +  - `gc_steps_sweep`
> +  - `gc_steps_finalize`
> +  - `jit_snap_restore`
> +  - `jit_trace_abort`

<snipped>

> +## How to use
> +
> +This section describes small example of metrics usage.
> +
> +For example amount of `strhash_misses` can be shown for tracking of new string
> +objects allocations. For example if we add code like:
> +```lua
> +local function sharded_storage_func(storage_name, func_name)
> +    return 'sharded_storage.storages.' .. storage_name .. '.' .. func_name
> +end
> +```
> +increase in slope curve of `strhash_misses` means, that after your changes
> +there are more new strings allocating at runtime. Of course slope curve of
> +`strhash_misses` _should_ be less than slope curve of `strhash_hits`. If it
> +is not, you should refactor your code.

Minor: I guess it's not a good idea to suggest 'code refactoring'. This
section should describe the values being observed, so the first part
about the tilt angle is enough.
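
For illustration, the observed value could be derived from two snapshots
as sketched below. `strhash_miss_ratio` is a hypothetical helper, not part
of the proposed API; the field names are the ones listed in the RFC.

```lua
-- Hypothetical helper: share of string interning misses among all
-- lookups made between two snapshots returned by misc.getmetrics().
local function strhash_miss_ratio(old, new)
    local miss = new.strhash_miss - old.strhash_miss
    local hit = new.strhash_hit - old.strhash_hit
    local total = miss + hit
    -- No string lookups happened between the snapshots.
    if total == 0 then
        return 0
    end
    return miss / total
end
```

Plotting this ratio over time describes the observation without
prescribing what the user should do about it.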

> +
> +Slope curves of `gc_freed` and `gc_allocated` can be used for analysis of GC
> +pressure of your application (less is better).
> +
> +Also we can check some hacky optimization with these metrics. For example let's
> +assume that we have this code snippet:
> +```lua
> +local old_metrics = misc.getmetrics()
> +local t = {}
> +for i = 1, 513 do
> +    t[i] = i
> +end
> +local new_metrics = misc.getmetrics()
> +local diff =  new_metrics.gc_allocated - old_metrics.gc_allocated
> +```
> +`diff` equals to 18879 after running of this chunk.
> +
> +But if we change table initialization to
> +```lua
> +local table_new = require "table.new"
> +local old_metrics = misc.getmetrics()
> +local t = table_new(513,0)
> +for i = 1, 513 do
> +    t[i] = i
> +end
> +local new_metrics = misc.getmetrics()
> +local diff =  new_metrics.gc_allocated - old_metrics.gc_allocated
> +```
> +`diff` shows us only 5895.
> +
> +Slope curves of `gc_steps_*` can be used for tracking of GC pressure too. For
> +long time observations you will see periodic increment for `gc_steps_*` metrics
> +-- for example longer period of `gc_steps_atomic` increment is better. Also
> +additional amount of `gc_steps_propagate` in one period can be used to
> +indirectly estimate amount of objects.

These values also correlate with the <stepmul> value. The number of
incremental steps can grow while each step processes only a small number
of GCobjects. So these metrics should be considered together with the GC
setup.
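
A rough sketch of such a comparison, assuming `misc.getmetrics()` from
this RFC. `propagate_steps`, its `getmetrics` parameter and `workload` are
hypothetical names used only for illustration.

```lua
-- Hypothetical helper: counts the propagate steps spent while running
-- <workload> with the given <stepmul>. <getmetrics> is assumed to be
-- misc.getmetrics from this RFC.
local function propagate_steps(getmetrics, stepmul, workload)
    -- collectgarbage("setstepmul") returns the previous value.
    local old_stepmul = collectgarbage("setstepmul", stepmul)
    local before = getmetrics().gc_steps_propagate
    workload()
    local after = getmetrics().gc_steps_propagate
    -- Restore the previous GC setup.
    collectgarbage("setstepmul", old_stepmul)
    return after - before
end
```

Running the same workload with different <stepmul> values makes the
numbers comparable only within one GC setup.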

> +
> +Amount of `gc_*num` is useful for control of memory leaks -- totally amount of

Typo: s/totally/total/.

> +these objects should not growth nonstop (you can also track `gc_total` for
> +this).  Also `jit_mcode_size` can be used for tracking amount of allocated

Typo: double space prior to "Also".

> +memory for traces machine code.
> +
> +Slope curves of `jit_trace_abort` shows how many times trace hasn't been
> +compiled when the attempt was made (less is better).
> +
> +Amount of `gc_trace_num` is shown how much traces was generated (_usually_
> +more is better).
> +
> +And the last one -- `gc_snap_restores` can be used for estimation when LuaJIT
> +is stop trace execution. If slope curves of this metric growth after changing
> +old code it can mean performance degradation.
> +
> +Assumes that we have code like this:
> +```lua
> +local function foo(i)
> +    return i <= 5 and i or tostring(i)
> +end
> +-- minstitch option needs to emulate nonstitching behaviour
> +jit.opt.start(0, "hotloop=2", "hotexit=2", "minstitch=15")
> +
> +local sum = 0
> +local old_metrics = misc.getmetrics()
> +for i = 1, 10 do
> +    sum = sum + foo(i)
> +end
> +local new_metrics = misc.getmetrics()
> +local diff = new_metrics.jit_snap_restore - old_metrics.jit_snap_restore
> +```
> +`diff` equals 3 (1 side exit on loop end, 2 side exits to the interpreter
> +before trace gets hot and compiled) after this chunk of code.
> +
> +And now we decide to change `foo` function like this:
> +```lua
> +local function foo(i)
> +    -- math.fmod is not yet compiled!
> +    return i <= 5 and i or math.fmod(i, 11)
> +end
> +```
> +`diff` equals 6 (1 side exit on loop end, 2 side exits to the interpreter
> +before trace gets hot and compiled an 3 side exits from the root trace could
> +not get compiled) after the same chunk of code.
> +
> +## Benchmarks
> +
> +Benchmarks was taken from repo:

Typo: s/was/were/.

> +[LuaJIT-test-cleanup](https://github.com/LuaJIT/LuaJIT-test-cleanup).
> +
> +Example of usage:
> +```bash
> +/usr/bin/time -f"array3d %U" ./luajit $BENCH_DIR/array3d.lua  300 >/dev/null

Typo: double space prior to "300".

> +```

<snipped>

> -- 
> 2.28.0
> 

-- 
Best regards,
IM

