From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id ECA67469719 for ; Thu, 8 Oct 2020 20:35:49 +0300 (MSK) Date: Thu, 8 Oct 2020 20:25:12 +0300 From: Igor Munkin Message-ID: <20201008172512.GU18920@tarantool.org> References: <20201005063029.31737-1-skaplun@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20201005063029.31737-1-skaplun@tarantool.org> Subject: Re: [Tarantool-patches] [RFC v4] rfc: luajit metrics List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Sergey Kaplun Cc: tarantool-patches@dev.tarantool.org Sergey, Thanks for the updates! Considering the changes in the follow-up reply, there are still several comments below. On 05.10.20, Sergey Kaplun wrote: > Part of #5187 > --- > > This patch adds RFC to LuaJIT metrics interfaces. Nevertheless name > `misc` for builtin library is not good and should be discussed, because > tons of user modules can use that name for their own libraries. > > Branch: https://github.com/tarantool/tarantool/tree/skaplun/5187-luajit-metrics > Issue: https://github.com/tarantool/tarantool/issues/5187 > > Changes in v2: > - Fixed typos > - Made comments more verbose > - Avoided flushing any of metrics after each call of luaM_metrics() > Changes in v3: > - Added colors count metrics description > - Added description about how metrics are collected > - Added benchmarks > Changes in v3: > - Removed colors count metrics > - Added code block language > - Added how to use section Minor: ChangeLog is misordered (the latest changes are the first entry). > > doc/rfc/5187-luajit-metrics.md | 299 +++++++++++++++++++++++++++++++++ > 1 file changed, 299 insertions(+) > create mode 100644 doc/rfc/5187-luajit-metrics.md > > diff --git a/doc/rfc/5187-luajit-metrics.md b/doc/rfc/5187-luajit-metrics.md > new file mode 100644 > index 000000000..02f5b559f > --- /dev/null > +++ b/doc/rfc/5187-luajit-metrics.md > @@ -0,0 +1,299 @@ > +Couple of words about how metrics are collected: > +- `strhash_*` -- whenever existing string is returned after attempt to > + create new string there is incremented `strhash_hit` counter, if new string > + created then `strhash_miss` is incremented instead. Minor: I propose to reword it the way similar you've updated it in the luam_Metrics comments. > +- `gc_*num`, `jit_trace_num` -- corresponding counter incremented whenever new Typo: s/whenever new/whenever a new/. > + object is allocated. When object become garbage collected its counter is Typo: s/become garbage collected/is collected by GC/. > + decremented. > +- `gc_total`, `gc_allocated`, `gc_freed` -- any time when allocation function > + is called `gc_allocated` and/or `gc_freed` is increased and `gc_total` > + increase when memory is allocated or reallocated, decrease when memory is Typo: s/increase/is increased/. Typo: s/decrease/is decreased/. > + freed. > +- `gc_steps_*` -- corresponding counter increments whenever Garbage Collector Typo: s/increments/is incremented/. > + starts to execute 1 step of garbage collection. Minor: I propose s/1/an incremental/. > +- `jit_snap_restore` -- whenever JIT machine exits from the trace and restores > + interpreter state `jit_snap_restore` counter is incremented. > +- `jit_trace_abort` -- whenever JIT compiler can't record the trace in case NYI > + BC this counter is incremented. Minor: NYI relates also to builtins, not only to bytecodes. > +- `jit_mcode_size` -- whenever new MCode area is allocated `jit_mcode_size` is > + increased at corresponding size in bytes. Sets to 0 when all mcode area is > + freed. How does it change, when a trace is collected as a result of its flush? > + > +All metrics are collected throughout the platform uptime. These metrics > +increase monotonically and can overflow: > + - `strhash_hit` > + - `strhash_miss` > + - `gc_freed` > + - `gc_allocated`, Typo: Excess comma. > + - `gc_steps_pause` > + - `gc_steps_propagate` > + - `gc_steps_atomic` > + - `gc_steps_sweepstring` > + - `gc_steps_sweep` > + - `gc_steps_finalize` > + - `jit_snap_restore` > + - `jit_trace_abort` > +## How to use > + > +This section describes small example of metrics usage. > + > +For example amount of `strhash_misses` can be shown for tracking of new string > +objects allocations. For example if we add code like: > +```lua > +local function sharded_storage_func(storage_name, func_name) > + return 'sharded_storage.storages.' .. storage_name .. '.' .. func_name > +end > +``` > +increase in slope curve of `strhash_misses` means, that after your changes > +there are more new strings allocating at runtime. Of course slope curve of > +`strhash_misses` _should_ be less than slope curve of `strhash_hits`. If it > +is not, you should refactor your code. Minor: I guess it's not a good idea to suggest 'code refactoring'. This section is good to describe the values being observed, so the first part about tilt angle is enough. > + > +Slope curves of `gc_freed` and `gc_allocated` can be used for analysis of GC > +pressure of your application (less is better). > + > +Also we can check some hacky optimization with these metrics. For example let's > +assume that we have this code snippet: > +```lua > +local old_metrics = misc.getmetrics() > +local t = {} > +for i = 1, 513 do > + t[i] = i > +end > +local new_metrics = misc.getmetrics() > +local diff = new_metrics.gc_allocated - old_metrics.gc_allocated > +``` > +`diff` equals to 18879 after running of this chunk. > + > +But if we change table initialization to > +```lua > +local table_new = require "table.new" > +local old_metrics = misc.getmetrics() > +local t = table_new(513,0) > +for i = 1, 513 do > + t[i] = i > +end > +local new_metrics = misc.getmetrics() > +local diff = new_metrics.gc_allocated - old_metrics.gc_allocated > +``` > +`diff` shows us only 5895. > + > +Slope curves of `gc_steps_*` can be used for tracking of GC pressure too. For > +long time observations you will see periodic increment for `gc_steps_*` metrics > +-- for example longer period of `gc_steps_atomic` increment is better. Also > +additional amount of `gc_steps_propagate` in one period can be used to > +indirectly estimate amount of objects. These values also correlate with the value. The amount of incremental steps can grow, but one step can process a small amount of GCobjects. So these metrics should be considered together with GC setup. > + > +Amount of `gc_*num` is useful for control of memory leaks -- totally amount of Typo: s/totally/total/. > +these objects should not growth nonstop (you can also track `gc_total` for > +this). Also `jit_mcode_size` can be used for tracking amount of allocated Typo: double space prior to "Also". > +memory for traces machine code. > + > +Slope curves of `jit_trace_abort` shows how many times trace hasn't been > +compiled when the attempt was made (less is better). > + > +Amount of `gc_trace_num` is shown how much traces was generated (_usually_ > +more is better). > + > +And the last one -- `gc_snap_restores` can be used for estimation when LuaJIT > +is stop trace execution. If slope curves of this metric growth after changing > +old code it can mean performance degradation. > + > +Assumes that we have code like this: > +```lua > +local function foo(i) > + return i <= 5 and i or tostring(i) > +end > +-- minstitch option needs to emulate nonstitching behaviour > +jit.opt.start(0, "hotloop=2", "hotexit=2", "minstitch=15") > + > +local sum = 0 > +local old_metrics = misc.getmetrics() > +for i = 1, 10 do > + sum = sum + foo(i) > +end > +local new_metrics = misc.getmetrics() > +local diff = new_metrics.jit_snap_restore - old_metrics.jit_snap_restore > +``` > +`diff` equals 3 (1 side exit on loop end, 2 side exits to the interpreter > +before trace gets hot and compiled) after this chunk of code. > + > +And now we decide to change `foo` function like this: > +```lua > +local function foo(i) > + -- math.fmod is not yet compiled! > + return i <= 5 and i or math.fmod(i, 11) > +end > +``` > +`diff` equals 6 (1 side exit on loop end, 2 side exits to the interpreter > +before trace gets hot and compiled an 3 side exits from the root trace could > +not get compiled) after the same chunk of code. > + > +## Benchmarks > + > +Benchmarks was taken from repo: Typo: s/was/were/. > +[LuaJIT-test-cleanup](https://github.com/LuaJIT/LuaJIT-test-cleanup). > + > +Example of usage: > +```bash > +/usr/bin/time -f"array3d %U" ./luajit $BENCH_DIR/array3d.lua 300 >/dev/null Typo: double space prior to "300". > +``` > -- > 2.28.0 > -- Best regards, IM