[Tarantool-patches] [RFC v4] rfc: luajit metrics
Igor Munkin
imun at tarantool.org
Thu Oct 8 20:25:12 MSK 2020
Sergey,
Thanks for the updates! Considering the changes in the follow-up reply,
there are still several comments below.
On 05.10.20, Sergey Kaplun wrote:
> Part of #5187
> ---
>
> This patch adds RFC to LuaJIT metrics interfaces. Nevertheless name
> `misc` for builtin library is not good and should be discussed, because
> tons of user modules can use that name for their own libraries.
>
> Branch: https://github.com/tarantool/tarantool/tree/skaplun/5187-luajit-metrics
> Issue: https://github.com/tarantool/tarantool/issues/5187
>
> Changes in v2:
> - Fixed typos
> - Made comments more verbose
> - Avoided flushing any of metrics after each call of luaM_metrics()
> Changes in v3:
> - Added colors count metrics description
> - Added description about how metrics are collected
> - Added benchmarks
> Changes in v3:
> - Removed colors count metrics
> - Added code block language
> - Added how to use section
Minor: ChangeLog is misordered (the latest changes should be the first entry).
>
> doc/rfc/5187-luajit-metrics.md | 299 +++++++++++++++++++++++++++++++++
> 1 file changed, 299 insertions(+)
> create mode 100644 doc/rfc/5187-luajit-metrics.md
>
> diff --git a/doc/rfc/5187-luajit-metrics.md b/doc/rfc/5187-luajit-metrics.md
> new file mode 100644
> index 000000000..02f5b559f
> --- /dev/null
> +++ b/doc/rfc/5187-luajit-metrics.md
> @@ -0,0 +1,299 @@
<snipped>
> +Couple of words about how metrics are collected:
> +- `strhash_*` -- whenever existing string is returned after attempt to
> + create new string there is incremented `strhash_hit` counter, if new string
> + created then `strhash_miss` is incremented instead.
Minor: I propose rewording this the same way you updated it in the
luam_Metrics comments.
> +- `gc_*num`, `jit_trace_num` -- corresponding counter incremented whenever new
Typo: s/whenever new/whenever a new/.
> + object is allocated. When object become garbage collected its counter is
Typo: s/become garbage collected/is collected by GC/.
> + decremented.
> +- `gc_total`, `gc_allocated`, `gc_freed` -- any time when allocation function
> + is called `gc_allocated` and/or `gc_freed` is increased and `gc_total`
> + increase when memory is allocated or reallocated, decrease when memory is
Typo: s/increase/is increased/.
Typo: s/decrease/is decreased/.
> + freed.
> +- `gc_steps_*` -- corresponding counter increments whenever Garbage Collector
Typo: s/increments/is incremented/.
> + starts to execute 1 step of garbage collection.
Minor: I propose s/1/an incremental/.
> +- `jit_snap_restore` -- whenever JIT machine exits from the trace and restores
> + interpreter state `jit_snap_restore` counter is incremented.
> +- `jit_trace_abort` -- whenever JIT compiler can't record the trace in case NYI
> + BC this counter is incremented.
Minor: NYI also relates to builtins, not only to bytecodes.
> +- `jit_mcode_size` -- whenever new MCode area is allocated `jit_mcode_size` is
> + increased at corresponding size in bytes. Sets to 0 when all mcode area is
> + freed.
How does it change when a trace is collected as a result of a flush?
> +
> +All metrics are collected throughout the platform uptime. These metrics
> +increase monotonically and can overflow:
> + - `strhash_hit`
> + - `strhash_miss`
> + - `gc_freed`
> + - `gc_allocated`,
Typo: Excess comma.
> + - `gc_steps_pause`
> + - `gc_steps_propagate`
> + - `gc_steps_atomic`
> + - `gc_steps_sweepstring`
> + - `gc_steps_sweep`
> + - `gc_steps_finalize`
> + - `jit_snap_restore`
> + - `jit_trace_abort`
<snipped>
> +## How to use
> +
> +This section describes small example of metrics usage.
> +
> +For example amount of `strhash_misses` can be shown for tracking of new string
> +objects allocations. For example if we add code like:
> +```lua
> +local function sharded_storage_func(storage_name, func_name)
> + return 'sharded_storage.storages.' .. storage_name .. '.' .. func_name
> +end
> +```
> +increase in slope curve of `strhash_misses` means, that after your changes
> +there are more new strings allocating at runtime. Of course slope curve of
> +`strhash_misses` _should_ be less than slope curve of `strhash_hits`. If it
> +is not, you should refactor your code.
Minor: I guess it's not a good idea to suggest 'code refactoring'. This
section is a good place to describe the values being observed, so the
first part about the slope is enough.
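To illustrate what "describing the observed values" could look like, here
is a rough sketch that only computes the slopes and leaves the conclusion
to the reader. It assumes two snapshots shaped like the table returned by
the proposed misc.getmetrics() and uses the field names from the metrics
list above (`strhash_hit`, `strhash_miss`). The helper name and the sample
numbers are made up for the example and are not measurements.

```lua
-- Sketch: per-second growth ("slope") of the monotonic strhash counters.
-- `prev` and `cur` are assumed to look like the result of the proposed
-- misc.getmetrics(); only the two fields used below are required.
-- `strhash_slopes` is a hypothetical helper, not a part of the RFC.
local function strhash_slopes(prev, cur, seconds)
  local hit = (cur.strhash_hit - prev.strhash_hit) / seconds
  local miss = (cur.strhash_miss - prev.strhash_miss) / seconds
  return hit, miss
end

-- Hand-made snapshots taken 60 seconds apart (illustrative values only).
local prev = { strhash_hit = 1000, strhash_miss = 200 }
local cur = { strhash_hit = 1600, strhash_miss = 260 }

local hit, miss = strhash_slopes(prev, cur, 60)
print("strhash_hit per second:", hit)
print("strhash_miss per second:", miss)
```

The sketch ignores counter overflow, which the RFC says is possible for
these metrics, so a real consumer would have to handle the wraparound.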
> +
> +Slope curves of `gc_freed` and `gc_allocated` can be used for analysis of GC
> +pressure of your application (less is better).
> +
> +Also we can check some hacky optimization with these metrics. For example let's
> +assume that we have this code snippet:
> +```lua
> +local old_metrics = misc.getmetrics()
> +local t = {}
> +for i = 1, 513 do
> + t[i] = i
> +end
> +local new_metrics = misc.getmetrics()
> +local diff = new_metrics.gc_allocated - old_metrics.gc_allocated
> +```
> +`diff` equals to 18879 after running of this chunk.
> +
> +But if we change table initialization to
> +```lua
> +local table_new = require "table.new"
> +local old_metrics = misc.getmetrics()
> +local t = table_new(513,0)
> +for i = 1, 513 do
> + t[i] = i
> +end
> +local new_metrics = misc.getmetrics()
> +local diff = new_metrics.gc_allocated - old_metrics.gc_allocated
> +```
> +`diff` shows us only 5895.
> +
> +Slope curves of `gc_steps_*` can be used for tracking of GC pressure too. For
> +long time observations you will see periodic increment for `gc_steps_*` metrics
> +-- for example longer period of `gc_steps_atomic` increment is better. Also
> +additional amount of `gc_steps_propagate` in one period can be used to
> +indirectly estimate amount of objects.
These values also correlate with the <stepmul> value. The number of
incremental steps can grow while each step processes only a small number
of GCobjects, so these metrics should be considered together with the GC
setup.
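A rough way to see the dependency is to run the same workload under
several <stepmul> values and record the `gc_steps_propagate` delta next to
the setting. The sketch below assumes the misc.getmetrics() interface
proposed in this RFC and returns nil on a build without it. The
`propagate_steps` helper and the workload are made up for the example, and
no particular numbers are implied.

```lua
-- `misc.getmetrics()` is the interface proposed in the RFC. It is present
-- only in a LuaJIT build that carries the patch, so look it up carefully.
local getmetrics = rawget(_G, "misc") and misc.getmetrics

-- Hypothetical helper: run `workload` with the given stepmul and return
-- the number of propagate steps performed meanwhile, or nil when the
-- metrics interface is not available.
local function propagate_steps(stepmul, workload)
  if not getmetrics then return nil end
  -- collectgarbage("setstepmul", n) returns the previous value.
  local old_stepmul = collectgarbage("setstepmul", stepmul)
  local before = getmetrics().gc_steps_propagate
  workload()
  local after = getmetrics().gc_steps_propagate
  collectgarbage("setstepmul", old_stepmul)
  return after - before
end

-- Allocate plenty of small tables to keep the collector busy.
local function workload()
  local t = {}
  for i = 1, 1e5 do t[i] = { i } end
  return t
end

for _, stepmul in ipairs({ 100, 200, 400 }) do
  print(stepmul, propagate_steps(stepmul, workload))
end
```

Reporting the delta together with the stepmul in effect keeps the metric
interpretable when the GC setup changes between runs.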
> +
> +Amount of `gc_*num` is useful for control of memory leaks -- totally amount of
Typo: s/totally/total/.
> +these objects should not growth nonstop (you can also track `gc_total` for
> +this). Also `jit_mcode_size` can be used for tracking amount of allocated
Typo: double space prior to "Also".
> +memory for traces machine code.
> +
> +Slope curves of `jit_trace_abort` shows how many times trace hasn't been
> +compiled when the attempt was made (less is better).
> +
> +Amount of `gc_trace_num` is shown how much traces was generated (_usually_
> +more is better).
> +
> +And the last one -- `gc_snap_restores` can be used for estimation when LuaJIT
> +is stop trace execution. If slope curves of this metric growth after changing
> +old code it can mean performance degradation.
> +
> +Assumes that we have code like this:
> +```lua
> +local function foo(i)
> + return i <= 5 and i or tostring(i)
> +end
> +-- minstitch option needs to emulate nonstitching behaviour
> +jit.opt.start(0, "hotloop=2", "hotexit=2", "minstitch=15")
> +
> +local sum = 0
> +local old_metrics = misc.getmetrics()
> +for i = 1, 10 do
> + sum = sum + foo(i)
> +end
> +local new_metrics = misc.getmetrics()
> +local diff = new_metrics.jit_snap_restore - old_metrics.jit_snap_restore
> +```
> +`diff` equals 3 (1 side exit on loop end, 2 side exits to the interpreter
> +before trace gets hot and compiled) after this chunk of code.
> +
> +And now we decide to change `foo` function like this:
> +```lua
> +local function foo(i)
> + -- math.fmod is not yet compiled!
> + return i <= 5 and i or math.fmod(i, 11)
> +end
> +```
> +`diff` equals 6 (1 side exit on loop end, 2 side exits to the interpreter
> +before trace gets hot and compiled an 3 side exits from the root trace could
> +not get compiled) after the same chunk of code.
> +
> +## Benchmarks
> +
> +Benchmarks was taken from repo:
Typo: s/was/were/.
> +[LuaJIT-test-cleanup](https://github.com/LuaJIT/LuaJIT-test-cleanup).
> +
> +Example of usage:
> +```bash
> +/usr/bin/time -f"array3d %U" ./luajit $BENCH_DIR/array3d.lua 300 >/dev/null
Typo: double space prior to "300".
> +```
<snipped>
> --
> 2.28.0
>
--
Best regards,
IM