[Tarantool-patches] [RFC v4] rfc: luajit metrics
Sergey Kaplun
skaplun at tarantool.org
Thu Oct 8 22:29:54 MSK 2020
Hi, Igor! Thanks for the review!
On 08.10.20, Igor Munkin wrote:
> Sergey,
>
> Thanks for the updates! Considering the changes in the follow-up reply,
> there are still several comments below.
>
> On 05.10.20, Sergey Kaplun wrote:
> > Part of #5187
> > ---
> >
> > This patch adds RFC to LuaJIT metrics interfaces. Nevertheless name
> > `misc` for builtin library is not good and should be discussed, because
> > tons of user modules can use that name for their own libraries.
> >
> > Branch: https://github.com/tarantool/tarantool/tree/skaplun/5187-luajit-metrics
> > Issue: https://github.com/tarantool/tarantool/issues/5187
> >
> > Changes in v2:
> > - Fixed typos
> > - Made comments more verbose
> > - Avoided flushing any of metrics after each call of luaM_metrics()
> > Changes in v3:
> > - Added colors count metrics description
> > - Added description about how metrics are collected
> > - Added benchmarks
> > Changes in v3:
> > - Removed colors count metrics
> > - Added code block language
> > - Added how to use section
>
> Minor: ChangeLog is misordered (the latest changes are the first entry).
Thanks!
Forgot to add ChangeLog to patch:
@ChangeLog:
* Add Lua and C API for LuaJIT platform metrics about:
- overall amount of allocated tables, cdata and udata objects
- number of incremental GC steps grouped by GC state
- number of string hashes hits and misses
- amount of allocated and freed memory
- number of trace aborts, number of traces and restored snapshots
>
> >
> > doc/rfc/5187-luajit-metrics.md | 299 +++++++++++++++++++++++++++++++++
> > 1 file changed, 299 insertions(+)
> > create mode 100644 doc/rfc/5187-luajit-metrics.md
> >
> > diff --git a/doc/rfc/5187-luajit-metrics.md b/doc/rfc/5187-luajit-metrics.md
> > new file mode 100644
> > index 000000000..02f5b559f
> > --- /dev/null
> > +++ b/doc/rfc/5187-luajit-metrics.md
> > @@ -0,0 +1,299 @@
>
> <snipped>
>
> > +Couple of words about how metrics are collected:
> > +- `strhash_*` -- whenever existing string is returned after attempt to
> > + create new string there is incremented `strhash_hit` counter, if new string
> > + created then `strhash_miss` is incremented instead.
>
> Minor: I propose to reword it the way similar you've updated it in the
> luam_Metrics comments.
Yep, sure!
>
> > +- `gc_*num`, `jit_trace_num` -- corresponding counter incremented whenever new
>
> Typo: s/whenever new/whenever a new/.
Hawkeye!
>
> > + object is allocated. When object become garbage collected its counter is
>
> Typo: s/become garbage collected/is collected by GC/.
Thanks! Fixed.
>
> > + decremented.
> > +- `gc_total`, `gc_allocated`, `gc_freed` -- any time when allocation function
> > + is called `gc_allocated` and/or `gc_freed` is increased and `gc_total`
> > + increase when memory is allocated or reallocated, decrease when memory is
>
> Typo: s/increase/is increased/.
> Typo: s/decrease/is decreased/.
Thanks! Fixed.
>
> > + freed.
> > +- `gc_steps_*` -- corresponding counter increments whenever Garbage Collector
>
> Typo: s/increments/is incremented/.
Thanks!
>
> > + starts to execute 1 step of garbage collection.
>
> Minor: I propose s/1/an incremental/.
No problem!
>
> > +- `jit_snap_restore` -- whenever JIT machine exits from the trace and restores
> > + interpreter state `jit_snap_restore` counter is incremented.
> > +- `jit_trace_abort` -- whenever JIT compiler can't record the trace in case NYI
> > + BC this counter is incremented.
>
> Minor: NYI relates also to builtins, not only to bytecodes.
Ok. Fixed.
>
> > +- `jit_mcode_size` -- whenever new MCode area is allocated `jit_mcode_size` is
> > + increased at corresponding size in bytes. Sets to 0 when all mcode area is
> > + freed.
>
> How does it change, when a trace is collected as a result of its flush?
It doesn't. IINM, this area will be reused later for other traces.
MCode area is linked with jit_State not with trace by itself. Trace just
reserve MCode area that needed.
>
> > +
> > +All metrics are collected throughout the platform uptime. These metrics
> > +increase monotonically and can overflow:
> > + - `strhash_hit`
> > + - `strhash_miss`
> > + - `gc_freed`
> > + - `gc_allocated`,
>
> Typo: Excess comma.
Hawkeye!
>
> > + - `gc_steps_pause`
> > + - `gc_steps_propagate`
> > + - `gc_steps_atomic`
> > + - `gc_steps_sweepstring`
> > + - `gc_steps_sweep`
> > + - `gc_steps_finalize`
> > + - `jit_snap_restore`
> > + - `jit_trace_abort`
>
> <snipped>
>
> > +## How to use
> > +
> > +This section describes small example of metrics usage.
> > +
> > +For example amount of `strhash_misses` can be shown for tracking of new string
> > +objects allocations. For example if we add code like:
> > +```lua
> > +local function sharded_storage_func(storage_name, func_name)
> > + return 'sharded_storage.storages.' .. storage_name .. '.' .. func_name
> > +end
> > +```
> > +increase in slope curve of `strhash_misses` means, that after your changes
> > +there are more new strings allocating at runtime. Of course slope curve of
> > +`strhash_misses` _should_ be less than slope curve of `strhash_hits`. If it
> > +is not, you should refactor your code.
>
> Minor: I guess it's not a good idea to suggest 'code refactoring'. This
> section is good to describe the values being observed, so the first part
> about tilt angle is enough.
OK, dropped it!
>
> > +
> > +Slope curves of `gc_freed` and `gc_allocated` can be used for analysis of GC
> > +pressure of your application (less is better).
> > +
> > +Also we can check some hacky optimization with these metrics. For example let's
> > +assume that we have this code snippet:
> > +```lua
> > +local old_metrics = misc.getmetrics()
> > +local t = {}
> > +for i = 1, 513 do
> > + t[i] = i
> > +end
> > +local new_metrics = misc.getmetrics()
> > +local diff = new_metrics.gc_allocated - old_metrics.gc_allocated
Found double space prior new_metrics.gc_allocated. Fixed.
> > +```
> > +`diff` equals to 18879 after running of this chunk.
> > +
> > +But if we change table initialization to
> > +```lua
> > +local table_new = require "table.new"
> > +local old_metrics = misc.getmetrics()
> > +local t = table_new(513,0)
> > +for i = 1, 513 do
> > + t[i] = i
> > +end
> > +local new_metrics = misc.getmetrics()
> > +local diff = new_metrics.gc_allocated - old_metrics.gc_allocated
Here too.
> > +```
> > +`diff` shows us only 5895.
> > +
> > +Slope curves of `gc_steps_*` can be used for tracking of GC pressure too. For
> > +long time observations you will see periodic increment for `gc_steps_*` metrics
> > +-- for example longer period of `gc_steps_atomic` increment is better. Also
> > +additional amount of `gc_steps_propagate` in one period can be used to
> > +indirectly estimate amount of objects.
>
> These values also correlate with the <stepmul> value. The amount of
> incremental steps can grow, but one step can process a small amount of
> GCobjects. So these metrics should be considered together with GC setup.
Thanks! Add this note to RFC!
>
> > +
> > +Amount of `gc_*num` is useful for control of memory leaks -- totally amount of
>
> Typo: s/totally/total/.
Fixed! Thanks!
>
> > +these objects should not growth nonstop (you can also track `gc_total` for
> > +this). Also `jit_mcode_size` can be used for tracking amount of allocated
>
> Typo: double space prior to "Also".
Yep. Fixed!
Side note: Too unpredictable autoformatting by `gq` :\
>
> > +memory for traces machine code.
> > +
> > +Slope curves of `jit_trace_abort` shows how many times trace hasn't been
> > +compiled when the attempt was made (less is better).
> > +
> > +Amount of `gc_trace_num` is shown how much traces was generated (_usually_
> > +more is better).
> > +
> > +And the last one -- `gc_snap_restores` can be used for estimation when LuaJIT
> > +is stop trace execution. If slope curves of this metric growth after changing
> > +old code it can mean performance degradation.
> > +
> > +Assumes that we have code like this:
> > +```lua
> > +local function foo(i)
> > + return i <= 5 and i or tostring(i)
> > +end
> > +-- minstitch option needs to emulate nonstitching behaviour
> > +jit.opt.start(0, "hotloop=2", "hotexit=2", "minstitch=15")
> > +
> > +local sum = 0
> > +local old_metrics = misc.getmetrics()
> > +for i = 1, 10 do
> > + sum = sum + foo(i)
> > +end
> > +local new_metrics = misc.getmetrics()
> > +local diff = new_metrics.jit_snap_restore - old_metrics.jit_snap_restore
> > +```
> > +`diff` equals 3 (1 side exit on loop end, 2 side exits to the interpreter
> > +before trace gets hot and compiled) after this chunk of code.
> > +
> > +And now we decide to change `foo` function like this:
> > +```lua
> > +local function foo(i)
> > + -- math.fmod is not yet compiled!
> > + return i <= 5 and i or math.fmod(i, 11)
> > +end
> > +```
> > +`diff` equals 6 (1 side exit on loop end, 2 side exits to the interpreter
> > +before trace gets hot and compiled an 3 side exits from the root trace could
> > +not get compiled) after the same chunk of code.
> > +
> > +## Benchmarks
> > +
> > +Benchmarks was taken from repo:
>
> Typo: s/was/were/.
Thanks, fixed!
>
> > +[LuaJIT-test-cleanup](https://github.com/LuaJIT/LuaJIT-test-cleanup).
> > +
> > +Example of usage:
> > +```bash
> > +/usr/bin/time -f"array3d %U" ./luajit $BENCH_DIR/array3d.lua 300 >/dev/null
>
> Typo: double space prior to "300".
Thanks, fixed!
>
> > +```
>
> <snipped>
>
> > --
> > 2.28.0
> >
>
> --
> Best regards,
> IM
See iterative patch in the bottom. Branch force-pushed.
===================================================================
diff --git a/doc/rfc/5187-luajit-metrics.md b/doc/rfc/5187-luajit-metrics.md
index 988b049fb..0c1df6901 100644
--- a/doc/rfc/5187-luajit-metrics.md
+++ b/doc/rfc/5187-luajit-metrics.md
@@ -89,22 +89,23 @@ struct luam_Metrics {
```
Couple of words about how metrics are collected:
-- `strhash_*` -- whenever existing string is returned after attempt to
- create new string there is incremented `strhash_hit` counter, if new string
- created then `strhash_miss` is incremented instead.
-- `gc_*num`, `jit_trace_num` -- corresponding counter incremented whenever new
- object is allocated. When object become garbage collected its counter is
+- `strhash_*` -- whenever the string with the same payload is found, so a new
+ one is not created/allocated, there is incremented `strhash_hit` counter, if
+ a new one string created/allocated then `strhash_miss` is incremented
+ instead.
+- `gc_*num`, `jit_trace_num` -- corresponding counter is incremented whenever a
+ new object is allocated. When object is collected by GC its counter is
decremented.
- `gc_total`, `gc_allocated`, `gc_freed` -- any time when allocation function
- is called `gc_allocated` and/or `gc_freed` is increased and `gc_total`
- increase when memory is allocated or reallocated, decrease when memory is
- freed.
-- `gc_steps_*` -- corresponding counter increments whenever Garbage Collector
- starts to execute 1 step of garbage collection.
+ is called `gc_allocated` and/or `gc_freed` is increased and `gc_total` is
+ increased when memory is allocated or reallocated, is decreased when memory
+ is freed.
+- `gc_steps_*` -- corresponding counter is incremented whenever Garbage
+ Collector starts to execute an incremental step of garbage collection.
- `jit_snap_restore` -- whenever JIT machine exits from the trace and restores
interpreter state `jit_snap_restore` counter is incremented.
- `jit_trace_abort` -- whenever JIT compiler can't record the trace in case NYI
- BC this counter is incremented.
+ BC or builtins this counter is incremented.
- `jit_mcode_size` -- whenever new MCode area is allocated `jit_mcode_size` is
increased at corresponding size in bytes. Sets to 0 when all mcode area is
freed.
@@ -114,7 +115,7 @@ increase monotonically and can overflow:
- `strhash_hit`
- `strhash_miss`
- `gc_freed`
- - `gc_allocated`,
+ - `gc_allocated`
- `gc_steps_pause`
- `gc_steps_propagate`
- `gc_steps_atomic`
@@ -173,8 +174,7 @@ end
```
increase in slope curve of `strhash_misses` means, that after your changes
there are more new strings allocating at runtime. Of course slope curve of
-`strhash_misses` _should_ be less than slope curve of `strhash_hits`. If it
-is not, you should refactor your code.
+`strhash_misses` _should_ be less than slope curve of `strhash_hits`.
Slope curves of `gc_freed` and `gc_allocated` can be used for analysis of GC
pressure of your application (less is better).
@@ -188,7 +188,7 @@ for i = 1, 513 do
t[i] = i
end
local new_metrics = misc.getmetrics()
-local diff = new_metrics.gc_allocated - old_metrics.gc_allocated
+local diff = new_metrics.gc_allocated - old_metrics.gc_allocated
```
`diff` equals to 18879 after running of this chunk.
@@ -201,7 +201,7 @@ for i = 1, 513 do
t[i] = i
end
local new_metrics = misc.getmetrics()
-local diff = new_metrics.gc_allocated - old_metrics.gc_allocated
+local diff = new_metrics.gc_allocated - old_metrics.gc_allocated
```
`diff` shows us only 5895.
@@ -209,11 +209,14 @@ Slope curves of `gc_steps_*` can be used for tracking of GC pressure too. For
long time observations you will see periodic increment for `gc_steps_*` metrics
-- for example longer period of `gc_steps_atomic` increment is better. Also
additional amount of `gc_steps_propagate` in one period can be used to
-indirectly estimate amount of objects.
+indirectly estimate amount of objects. These values also correlate with the
+step multiplier of the GC. The amount of incremental steps can grow, but
+one step can process a small amount of objects. So these metrics should be
+considered together with GC setup.
-Amount of `gc_*num` is useful for control of memory leaks -- totally amount of
+Amount of `gc_*num` is useful for control of memory leaks -- total amount of
these objects should not growth nonstop (you can also track `gc_total` for
-this). Also `jit_mcode_size` can be used for tracking amount of allocated
+this). Also `jit_mcode_size` can be used for tracking amount of allocated
memory for traces machine code.
Slope curves of `jit_trace_abort` shows how many times trace hasn't been
@@ -258,12 +261,12 @@ not get compiled) after the same chunk of code.
## Benchmarks
-Benchmarks was taken from repo:
+Benchmarks were taken from repo:
[LuaJIT-test-cleanup](https://github.com/LuaJIT/LuaJIT-test-cleanup).
Example of usage:
```bash
-/usr/bin/time -f"array3d %U" ./luajit $BENCH_DIR/array3d.lua 300 >/dev/null
+/usr/bin/time -f"array3d %U" ./luajit $BENCH_DIR/array3d.lua 300 >/dev/null
```
Taking into account the measurement error ~ 2%, it can be said that there is no
difference in the performance.
===================================================================
--
Best regards,
Sergey Kaplun
More information about the Tarantool-patches
mailing list