[Tarantool-patches] [RFC] rfc: luajit metrics

Sergey Ostanevich sergos at tarantool.org
Fri Jul 24 20:20:03 MSK 2020


Hi!


On 24 Jul 14:18, Sergey Kaplun wrote:
> Hi! Thanks for the review!
> 
> Mons makes no bones of mailing list so I cite his comments from
> [1] here.
> 
> On 23.07.20, Sergey Ostanevich wrote:
> > Hi!
> > 
> > Thanks for the patch! See my comments below.
> > Otherwise LGTM.
> > 
> > Sergos
> > 
...
> > > +	/*
> > > +	 * Overall number of snap restores for all traces
> > 
> > Snap restores needs some explanation.
> 
> Ok, sounds reasonable. AFAIK number of snap restores equals to amount of
> trace exits. May be this will be more informative for users.
> 

Well. What this will tell to you as a user running a Lua code? Consider
the number is grows - ??? And what if it shrinks??

I believe it is tighly bound to the total number of traces and the size
of code. But what should it disclose to the user - how should it react,
refactor its code?

> > > +	 * "belonging" to the given jit_State since the last call
> > > +	 * to luaM_metrics().
> > > +	 */
> > > +	size_t jit_snap_restore;
> > > +	/*
> > > +	 * Overall number of abort traces "belonging" to the given
> > > +	 * jit_State since the last call to luaM_metrics().
> > > +	 */
> > > +	size_t jit_trace_abort;
> > > +	/* Total size of all allocated machine code areas. */
> > > +	size_t jit_mcode_size;
> > > +	/* Current amount of JIT traces. */
> > > +	unsigned int jit_trace_num;
> > > +};
> > > +```
> > > +
> > > +One can see metrics are divided by the two types: global and incremental.
> > > +Global metrics are collected throughout the platform uptime. Incremental
> > > +metrics are reset after each `luaM_metrics()` call.
> 
> As Mons said:
> | Incremental metrics must not reset after each call (see practice from
> | Linux kernel, for ex.). Rationale is simple: 2 different modules, both
> | using this function, will ruin metrics to each other
> 
> It's very important comment. With two modules that use metrics our
> counters become just garbage. I suppose that we should use monotonic
> increase counters for all "incremental" metrics. Users can count delta
> between calls of `getmetrics` and analise a result.
> 
> We can use uint64_t for storage of these metrics. If we astimate amount
> of allocated objects as 40Gb per second it will be necessary 14 years of
> instance uptime to overflow this counter. Also all logic corresponding
> to overflow of the counter can be realized inside user code.
> 
> > > +
> > > +There is also a complement introduced for Lua space -- `misc.getmetrics()`.
> > > +This function is just a wrapper for `luaM_metrics()` returning a Lua table with
> > > +the similar metrics. Its usage is quite simple:
> > > +```
> > > +$ ./src/tarantool
> > > +Tarantool 2.5.0-267-gbf047ad44
> > > +type 'help' for interactive help
> > > +tarantool> misc.getmetrics()
> > > +---
> > > +- global:
> > > +    tabnum: 1812
> > > +    gc_total: 1369927
> > > +    strnum: 5767
> > > +    jit_trace_num: 0
> > > +    cdatanum: 89
> > > +    jit_mcode_size: 0
> > > +    udatanum: 17
> > > +  incremental:
> > 
> > incremental_from_last_call will be more descriptive?
> 
> Mons:
> | I'd consider rename global to absolute. All metrics are global (not
> | scoped). But some are incremental counters and others are absolute
> | current values.
> 
> And also as far as we mustn't use updatable by call metrics, should we join all
> metrics inside one table, shouldn't we?

Sure, both comments are correct - we should report just absolute values,
but mention this fact in documentation. 

> 
> > > +    jit_snap_restore: 0
> > > +    gc_freed: 2239391
> > > +    strhash_hit: 53759
> > > +    gc_steps_finalize: 0
> > > +    gc_allocated: 3609318
> > > +    gc_steps_atomic: 6
> > > +    gc_steps_sweep: 296
> > > +    gc_steps_sweepstring: 17920
> > > +    jit_trace_abort: 0
> > > +    strhash_miss: 6874
> > 
> > I wonder if those keys can be sorted? Consider, I want to find
> > something by the name. Sorting helps greatly. 
> 
> It can be implement by user as simple as:
> 
> | local metrics = xjit.getmetrics()
> | local sorted = {}
> | for k, v in pairs(metrics.incremental) do table.insert(sorted, k) end
> | table.sort(sorted)
> | for _, k in ipairs(sorted) do print(k, metrics.incremental[k]) end
> |
> | --[[
> | gc_allocated    8920
> | gc_freed        3528
> | gc_steps_atomic 0
> | gc_steps_finalize       0
> | gc_steps_pause  0
> | gc_steps_propagate      62
> | gc_steps_sweep  0
> | gc_steps_sweepstring    0
> | jit_snap_restore        0
> | jit_trace_abort 0
> | strhash_hit     118
> | strhash_miss    7
> | ]]
> 

For me it is not that simple, frankly. Not in understanding - in use.
Can we just incorporate the code into the getmetrics()?

Thanks,
Sergos

> 
> > > +    gc_steps_propagate: 10106
> > > +    gc_steps_pause: 7
> > > +...
> > > +```
> > > -- 
> > > 2.24.1
> > > 
> 
> Thoughts?
> 
> -- 
> Best regards,
> Sergey Kaplun


More information about the Tarantool-patches mailing list