[Tarantool-patches] [RFC] rfc: luajit metrics

Sergey Kaplun skaplun at tarantool.org
Mon Jul 27 05:18:05 MSK 2020


Hi!

On 24.07.20, Sergey Ostanevich wrote:
> Hi!
> 
> 
> On 24 Jul 14:18, Sergey Kaplun wrote:
> > Hi! Thanks for the review!
> > 
> > Mons makes no bones of mailing list so I cite his comments from
> > [1] here.
> > 
> > On 23.07.20, Sergey Ostanevich wrote:
> > > Hi!
> > > 
> > > Thanks for the patch! See my comments below.
> > > Otherwise LGTM.
> > > 
> > > Sergos
> > > 
> ...
> > > > +	/*
> > > > +	 * Overall number of snap restores for all traces
> > > 
> > > Snap restores needs some explanation.
> > 
> > Ok, sounds reasonable. AFAIK number of snap restores equals to amount of
> > trace exits. May be this will be more informative for users.
> > 
> 
> Well. What this will tell to you as a user running a Lua code? Consider
> the number is grows - ??? And what if it shrinks??
> 
> I believe it is tighly bound to the total number of traces and the size
> of code. But what should it disclose to the user - how should it react,
> refactor its code?

The more time you spend away from traces the worse it is for the
performance of your application. So fast growing number of snap restores is
good reason to refactor your code.

> > > > +	 * "belonging" to the given jit_State since the last call
> > > > +	 * to luaM_metrics().
> > > > +	 */
> > > > +	size_t jit_snap_restore;
> > > > +	/*
> > > > +	 * Overall number of abort traces "belonging" to the given
> > > > +	 * jit_State since the last call to luaM_metrics().
> > > > +	 */
> > > > +	size_t jit_trace_abort;
> > > > +	/* Total size of all allocated machine code areas. */
> > > > +	size_t jit_mcode_size;
> > > > +	/* Current amount of JIT traces. */
> > > > +	unsigned int jit_trace_num;
> > > > +};
> > > > +```
> > > > +
> > > > +One can see metrics are divided by the two types: global and incremental.
> > > > +Global metrics are collected throughout the platform uptime. Incremental
> > > > +metrics are reset after each `luaM_metrics()` call.
> > 
> > As Mons said:
> > | Incremental metrics must not reset after each call (see practice from
> > | Linux kernel, for ex.). Rationale is simple: 2 different modules, both
> > | using this function, will ruin metrics to each other
> > 
> > It's very important comment. With two modules that use metrics our
> > counters become just garbage. I suppose that we should use monotonic
> > increase counters for all "incremental" metrics. Users can count delta
> > between calls of `getmetrics` and analise a result.
> > 
> > We can use uint64_t for storage of these metrics. If we astimate amount
> > of allocated objects as 40Gb per second it will be necessary 14 years of
> > instance uptime to overflow this counter. Also all logic corresponding
> > to overflow of the counter can be realized inside user code.
> > 
> > > > +
> > > > +There is also a complement introduced for Lua space -- `misc.getmetrics()`.
> > > > +This function is just a wrapper for `luaM_metrics()` returning a Lua table with
> > > > +the similar metrics. Its usage is quite simple:
> > > > +```
> > > > +$ ./src/tarantool
> > > > +Tarantool 2.5.0-267-gbf047ad44
> > > > +type 'help' for interactive help
> > > > +tarantool> misc.getmetrics()
> > > > +---
> > > > +- global:
> > > > +    tabnum: 1812
> > > > +    gc_total: 1369927
> > > > +    strnum: 5767
> > > > +    jit_trace_num: 0
> > > > +    cdatanum: 89
> > > > +    jit_mcode_size: 0
> > > > +    udatanum: 17
> > > > +  incremental:
> > > 
> > > incremental_from_last_call will be more descriptive?
> > 
> > Mons:
> > | I'd consider rename global to absolute. All metrics are global (not
> > | scoped). But some are incremental counters and others are absolute
> > | current values.
> > 
> > And also as far as we mustn't use updatable by call metrics, should we join all
> > metrics inside one table, shouldn't we?
> 
> Sure, both comments are correct - we should report just absolute values,
> but mention this fact in documentation. 

I've fixed your comments in [1].

> > 
> > > > +    jit_snap_restore: 0
> > > > +    gc_freed: 2239391
> > > > +    strhash_hit: 53759
> > > > +    gc_steps_finalize: 0
> > > > +    gc_allocated: 3609318
> > > > +    gc_steps_atomic: 6
> > > > +    gc_steps_sweep: 296
> > > > +    gc_steps_sweepstring: 17920
> > > > +    jit_trace_abort: 0
> > > > +    strhash_miss: 6874
> > > 
> > > I wonder if those keys can be sorted? Consider, I want to find
> > > something by the name. Sorting helps greatly. 
> > 
> > It can be implement by user as simple as:
> > 
> > | local metrics = xjit.getmetrics()
> > | local sorted = {}
> > | for k, v in pairs(metrics.incremental) do table.insert(sorted, k) end
> > | table.sort(sorted)
> > | for _, k in ipairs(sorted) do print(k, metrics.incremental[k]) end
> > |
> > | --[[
> > | gc_allocated    8920
> > | gc_freed        3528
> > | gc_steps_atomic 0
> > | gc_steps_finalize       0
> > | gc_steps_pause  0
> > | gc_steps_propagate      62
> > | gc_steps_sweep  0
> > | gc_steps_sweepstring    0
> > | jit_snap_restore        0
> > | jit_trace_abort 0
> > | strhash_hit     118
> > | strhash_miss    7
> > | ]]
> > 
> 
> For me it is not that simple, frankly. Not in understanding - in use.
> Can we just incorporate the code into the getmetrics()?

Unfortunately, there is no way in Lua to return key-sorted table.  But
for better user experience we can add new serializer (if it doesn't
already exist) in tarantool console :)

> 
> Thanks,
> Sergos
> 
> > 
> > > > +    gc_steps_propagate: 10106
> > > > +    gc_steps_pause: 7
> > > > +...
> > > > +```
> > > > -- 
> > > > 2.24.1
> > > > 
> > 
> > Thoughts?
> > 
> > -- 
> > Best regards,
> > Sergey Kaplun

[1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-July/018869.html

-- 
Best regards,
Sergey Kaplun


More information about the Tarantool-patches mailing list