[Tarantool-patches] [RFC] rfc: luajit metrics
Sergey Ostanevich
sergos at tarantool.org
Fri Jul 24 20:20:03 MSK 2020
Hi!
On 24 Jul 14:18, Sergey Kaplun wrote:
> Hi! Thanks for the review!
>
> Mons makes no bones of mailing list so I cite his comments from
> [1] here.
>
> On 23.07.20, Sergey Ostanevich wrote:
> > Hi!
> >
> > Thanks for the patch! See my comments below.
> > Otherwise LGTM.
> >
> > Sergos
> >
...
> > > + /*
> > > + * Overall number of snap restores for all traces
> >
> > Snap restores needs some explanation.
>
> Ok, sounds reasonable. AFAIK number of snap restores equals to amount of
> trace exits. May be this will be more informative for users.
>
Well. What this will tell to you as a user running a Lua code? Consider
the number is grows - ??? And what if it shrinks??
I believe it is tighly bound to the total number of traces and the size
of code. But what should it disclose to the user - how should it react,
refactor its code?
> > > + * "belonging" to the given jit_State since the last call
> > > + * to luaM_metrics().
> > > + */
> > > + size_t jit_snap_restore;
> > > + /*
> > > + * Overall number of abort traces "belonging" to the given
> > > + * jit_State since the last call to luaM_metrics().
> > > + */
> > > + size_t jit_trace_abort;
> > > + /* Total size of all allocated machine code areas. */
> > > + size_t jit_mcode_size;
> > > + /* Current amount of JIT traces. */
> > > + unsigned int jit_trace_num;
> > > +};
> > > +```
> > > +
> > > +One can see metrics are divided by the two types: global and incremental.
> > > +Global metrics are collected throughout the platform uptime. Incremental
> > > +metrics are reset after each `luaM_metrics()` call.
>
> As Mons said:
> | Incremental metrics must not reset after each call (see practice from
> | Linux kernel, for ex.). Rationale is simple: 2 different modules, both
> | using this function, will ruin metrics to each other
>
> It's very important comment. With two modules that use metrics our
> counters become just garbage. I suppose that we should use monotonic
> increase counters for all "incremental" metrics. Users can count delta
> between calls of `getmetrics` and analise a result.
>
> We can use uint64_t for storage of these metrics. If we astimate amount
> of allocated objects as 40Gb per second it will be necessary 14 years of
> instance uptime to overflow this counter. Also all logic corresponding
> to overflow of the counter can be realized inside user code.
>
> > > +
> > > +There is also a complement introduced for Lua space -- `misc.getmetrics()`.
> > > +This function is just a wrapper for `luaM_metrics()` returning a Lua table with
> > > +the similar metrics. Its usage is quite simple:
> > > +```
> > > +$ ./src/tarantool
> > > +Tarantool 2.5.0-267-gbf047ad44
> > > +type 'help' for interactive help
> > > +tarantool> misc.getmetrics()
> > > +---
> > > +- global:
> > > + tabnum: 1812
> > > + gc_total: 1369927
> > > + strnum: 5767
> > > + jit_trace_num: 0
> > > + cdatanum: 89
> > > + jit_mcode_size: 0
> > > + udatanum: 17
> > > + incremental:
> >
> > incremental_from_last_call will be more descriptive?
>
> Mons:
> | I'd consider rename global to absolute. All metrics are global (not
> | scoped). But some are incremental counters and others are absolute
> | current values.
>
> And also as far as we mustn't use updatable by call metrics, should we join all
> metrics inside one table, shouldn't we?
Sure, both comments are correct - we should report just absolute values,
but mention this fact in documentation.
>
> > > + jit_snap_restore: 0
> > > + gc_freed: 2239391
> > > + strhash_hit: 53759
> > > + gc_steps_finalize: 0
> > > + gc_allocated: 3609318
> > > + gc_steps_atomic: 6
> > > + gc_steps_sweep: 296
> > > + gc_steps_sweepstring: 17920
> > > + jit_trace_abort: 0
> > > + strhash_miss: 6874
> >
> > I wonder if those keys can be sorted? Consider, I want to find
> > something by the name. Sorting helps greatly.
>
> It can be implement by user as simple as:
>
> | local metrics = xjit.getmetrics()
> | local sorted = {}
> | for k, v in pairs(metrics.incremental) do table.insert(sorted, k) end
> | table.sort(sorted)
> | for _, k in ipairs(sorted) do print(k, metrics.incremental[k]) end
> |
> | --[[
> | gc_allocated 8920
> | gc_freed 3528
> | gc_steps_atomic 0
> | gc_steps_finalize 0
> | gc_steps_pause 0
> | gc_steps_propagate 62
> | gc_steps_sweep 0
> | gc_steps_sweepstring 0
> | jit_snap_restore 0
> | jit_trace_abort 0
> | strhash_hit 118
> | strhash_miss 7
> | ]]
>
For me it is not that simple, frankly. Not in understanding - in use.
Can we just incorporate the code into the getmetrics()?
Thanks,
Sergos
>
> > > + gc_steps_propagate: 10106
> > > + gc_steps_pause: 7
> > > +...
> > > +```
> > > --
> > > 2.24.1
> > >
>
> Thoughts?
>
> --
> Best regards,
> Sergey Kaplun
More information about the Tarantool-patches
mailing list