[Tarantool-patches] [PATCH 1/2] metrics: add counters for metrics interested in

Sergey Kaplun skaplun at tarantool.org
Fri Jul 24 15:00:00 MSK 2020


Hi Igor! Thanks for the review!

On 23.07.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! Please consider my comments below.
> 
> On 21.07.20, Sergey Kaplun wrote:
> 
> We still don't have strict prefixes, so I propose to use 'core' for this
> patch. Does it look fine to you? Feel free to propose your own one.

Maybe we should use 'misc' for this kind of patches?

> > This patch adds counters for:
> >   - amount of table, cdata and udata objects,
> >   - GC invokations by GC states,
> >   - strhash misses or hits,
> >   - amount of freed or allocated memory,
> >   - amount of aborted traces and restored snapshots.
> > 
> > C and Lua API will be added in the next patch.
> 
> OK, let's make the commit message a bit clearer:
> | core: introduce various platform metrics
> |
> | This patch introduces the following counters:
> | - overall amount of allocated tables, cdata and udata objects
> | - number of incremental GC steps grouped by GC state
> | - number of string hashes hits and misses
> | - amount of allocated and freed memory
> | - number of trace aborts and restored snapshots
> |
> | Interfaces to obtain these metrics via both Lua and C API are
> | introduced in the next patch.
> 
> Feel free to change it on your own.

Thanks! It looks fine for me.

> > 
> > Part of tarantool/tarantool#5187
> > ---
> >  src/lj_cdata.c |  2 ++
> >  src/lj_cdata.h |  2 ++
> >  src/lj_gc.c    |  4 ++++
> >  src/lj_gc.h    |  6 +-----
> >  src/lj_jit.h   |  7 +++++++
> >  src/lj_obj.h   | 25 +++++++++++++++++++++++++
> >  src/lj_snap.c  |  1 +
> >  src/lj_state.c | 13 ++++++++++++-
> >  src/lj_str.c   |  5 +++++
> >  src/lj_tab.c   |  2 ++
> >  src/lj_trace.c |  5 ++++-
> >  src/lj_udata.c |  2 ++
> >  12 files changed, 67 insertions(+), 7 deletions(-)
> > 
> 
> <snipped>
> 
> > diff --git a/src/lj_jit.h b/src/lj_jit.h
> > index 7eb3d2a..960a02b 100644
> > --- a/src/lj_jit.h
> > +++ b/src/lj_jit.h
> > @@ -475,6 +475,13 @@ typedef struct jit_State {
> >    size_t szmcarea;	/* Size of current mcode area. */
> >    size_t szallmcarea;	/* Total size of all allocated mcode areas. */
> >  
> > +  size_t nsnaprestore;	/* Overall number of snap restores for all traces
> > +                        ** "belonging" to the given jit_State
> > +                        ** since the last call to luaM_metrics(). */
> > +  size_t ntraceabort;	/* Overall number of abort traces
> > +                        ** "belonging" to the given jit_State
> > +                        ** since the last call to luaM_metrics(). */
> 
> Code indentation is really strange here, like you mixed LuaJIT and uJIT
> styles. Besides, I guess the remark regarding the field reset is wrong:
> * These fields are not reset in this patch.
> * <luaM_metrics> is introduced only in the following patch but you've
>   already mentioned it here.

My bad! Thanks for this catch. I will fix indentation in the next
patch.

Naming-wise, as you can see at [1] we shouldn't use metrics since last
call. So comment should gone.

> > +
> >    TValue errinfo;	/* Additional info element for trace errors. */
> >  
> >  #if LJ_HASPROFILE
> > diff --git a/src/lj_obj.h b/src/lj_obj.h
> > index f368578..1fee04f 100644
> > --- a/src/lj_obj.h
> > +++ b/src/lj_obj.h
> > @@ -571,13 +571,28 @@ typedef enum {
> >  #define basemt_obj(g, o)	((g)->gcroot[GCROOT_BASEMT+itypemap(o)])
> >  #define mmname_str(g, mm)	(strref((g)->gcroot[GCROOT_MMNAME+(mm)]))
> >  
> > +/* Garbage collector states. Order matters. */
> > +enum {
> > +  GCSpause,
> > +  GCSpropagate,
> > +  GCSatomic,
> > +  GCSsweepstring,
> > +  GCSsweep,
> > +  GCSfinalize,
> > +  GCSlast
> 
> Minor: GCSMAX looks better, doesn't it? Feel free to ignore.

Really, GCSlast is unreachable. GCSmax for this kind of naming is good
enough. GCSunreachable also looks nice. What do you think?

> > +};
> > +
> >  typedef struct GCState {
> >    GCSize total;		/* Memory currently allocated. */
> > +  size_t freed;		/* Memory freed since last luaM_metrics() call. */
> > +  size_t allocated;	/* Memory allocated since last luaM_metrics() call. */
> 
> Well, why <total> field type differs from <freed> and <allocated> fields
> type? And again, remark about <luaM_metrics> can be dropped (see above).

GCSize for total memfory usage is defines by LJ_GC64 macro definition.
| #if LJ_GC64
| typedef uint64_t GCSize;
| #else
| typedef uint32_t GCSize;
| #endif

But summarized allocation size can be many times over maximum memory
size). So these field types should be independet.

> >    GCSize threshold;	/* Memory threshold. */
> >    uint8_t currentwhite;	/* Current white color. */
> >    uint8_t state;	/* GC state. */
> >    uint8_t nocdatafin;	/* No cdata finalizer called. */
> >    uint8_t unused2;
> > +  size_t state_count[GCSlast]; /* Count of GC invocations with different states
> > +			       ** since previous call of luaM_metrics().  */
> 
> Again, a mess with whitespace. Furthermore, I propose to reword this
> comment the following way:
> | /* Count of incremental GC steps per state */

Ok, no problems!

> >    MSize sweepstr;	/* Sweep position in string table. */
> >    GCRef root;		/* List of all collectable objects. */
> >    MRef sweep;		/* Sweep position in root list. */
> > @@ -589,6 +604,12 @@ typedef struct GCState {
> >    GCSize estimate;	/* Estimate of memory actually in use. */
> >    MSize stepmul;	/* Incremental GC step granularity. */
> >    MSize pause;		/* Pause between successive GC cycles. */
> > +
> > +  size_t tabnum;	/* Current amount of table objects. */
> > +  size_t udatanum;	/* Current amount of udata objects. */
> 
> Minor: These comments are a bit misleading. "Current" sounds like the
> metric is reset somewhen. But you're talking about "alive" objects of
> the particular type. Yes, formally (in GC terms) some of them are not
> alive, but are still allocated (so called "dead") objects. So the term
> "alive" is also a misleading one. I guess you can freely reword the
> comments the following way:
> | /* Number of allocated <GCtype> objects */
> 

I think that "allocated" is unnecessarily. We can omit it easily.

I should reword the comments as
| /* Number of <GCtype> objects. */
or
| /* Amount of <GCtype> objects. */
and it seems enough.

> > +#ifdef LJ_HASFFI
> > +  size_t cdatanum;	/* Current amount of cdata objects. */
> 
> This counter is not incremented if cdata is allocated on trace. Consider
> the following test:
> | $ LUA_PATH="$HOME/.luarocks/share/lua/5.1/?.lua;;" ./luajit -joff <<EOF
> | heredoc> local t = { }
> | heredoc> for i = 1, 1000 do
> | heredoc>   table.insert(t, require('ffi').new('uint64_t', i))
> | heredoc> end
> | heredoc> t = nil
> | heredoc> collectgarbage('collect')
> | heredoc> print(require('inspect')(misc.getmetrics()))
> | heredoc> EOF
> | {
> |   global = {
> |     cdatanum = 0,
> |     gc_total = 74722,
> |     jit_mcode_size = 0,
> |     jit_trace_num = 0,
> |     strnum = 534,
> |     tabnum = 77,
> |     udatanum = 5
> |   },
> |   incremental = {
> |     gc_allocated = 173092,
> |     gc_freed = 98370,
> |     gc_steps_atomic = 1,
> |     gc_steps_finalize = 0,
> |     gc_steps_pause = 3,
> |     gc_steps_propagate = 553,
> |     gc_steps_sweep = 64,
> |     gc_steps_sweepstring = 1024,
> |     jit_snap_restore = 0,
> |     jit_trace_abort = 0,
> |     strhash_hit = 4021,
> |     strhash_miss = 538
> |   }
> | }
> | $ LUA_PATH="$HOME/.luarocks/share/lua/5.1/?.lua;;" ./luajit -jon <<EOF
> | heredoc> local t = { }
> | heredoc> for i = 1, 1000 do
> | heredoc>   table.insert(t, require('ffi').new('uint64_t', i))
> | heredoc> end
> | heredoc> t = nil
> | heredoc> collectgarbage('collect')
> | heredoc> print(require('inspect')(misc.getmetrics()))
> | heredoc> EOF
> | {
> |   global = {
> |     cdatanum = -941,
> |     gc_total = 76561,
> |     jit_mcode_size = 65536,
> |     jit_trace_num = 3,
> |     strnum = 534,
> |     tabnum = 77,
> |     udatanum = 5
> |   },
> |   incremental = {
> |     gc_allocated = 145299,
> |     gc_freed = 68738,
> |     gc_steps_atomic = 3,
> |     gc_steps_finalize = 0,
> |     gc_steps_pause = 2,
> |     gc_steps_propagate = 505,
> |     gc_steps_sweep = 64,
> |     gc_steps_sweepstring = 1024,
> |     jit_snap_restore = 6,
> |     jit_trace_abort = 0,
> |     strhash_hit = 3085,
> |     strhash_miss = 538
> |   }
> | }
> 
> As you can see global.cdatanum is negative when JIT is on. You can take
> a look on the compiled trace by yourself. The root cause is CNEW IR
> semantics: for VLA/VLS cdata types <lj_cdata_newv> call is emitted and
> the counters are fine, but for other cdata types JIT compiler emits just
> a <lj_mem_newgco> call that so this object is not counted. Thereby JIT
> has to emit the corresponding instructions while assembling CNEW IR.
> I will elaborate on this later.

My fault!
Ok, I will try to implement it at this week.

> > +#endif
> >  } GCState;
> >  
> >  /* Global state, shared by all threads of a Lua universe. */
> > @@ -602,6 +623,10 @@ typedef struct global_State {
> >      BloomFilter next[2];
> >    } strbloom;
> >  #endif
> > +  size_t strhash_hit;	/* New string has been found in the storage
> > +			** since last luaM_metrics() call. */
> > +  size_t strhash_miss;	/* New string has been added to the storage
> > +			** since last luaM_metrics() call. */
> 
> Well, such indentation is closer to the desired one, but I would use two
> spaces prior to these tabs. What do you think? Also please, don't forget
> about <luaM_metrics> remark.

Sorry, but ^\s+\t+\s+ stylistics looks very bad. In *.c files Mike use
exactly ^\t+ style for code, so we have to use the same for comments.

Thoughts?

> >    lua_Alloc allocf;	/* Memory allocator. */
> >    void *allocd;		/* Memory allocator data. */
> >    GCState gc;		/* Garbage collector. */
> 
> <snipped>
> 
> > diff --git a/src/lj_state.c b/src/lj_state.c
> > index 632dd07..b71ebae 100644
> > --- a/src/lj_state.c
> > +++ b/src/lj_state.c
> > @@ -204,6 +204,16 @@ LUA_API lua_State *lua_newstate(lua_Alloc f, void *ud)
> >    setgcref(g->uvhead.prev, obj2gco(&g->uvhead));
> >    setgcref(g->uvhead.next, obj2gco(&g->uvhead));
> >    g->strmask = ~(MSize)0;
> > +
> > +  memset(g->gc.state_count, 0, GCSlast * sizeof(g->gc.state_count[0]));
> > +  g->strhash_hit = 0;
> > +  g->strhash_miss = 0;
> > +  g->gc.tabnum = 0;
> > +  g->gc.udatanum = 0;
> > +#if LJ_HASFFI
> > +  g->gc.cdatanum = 0;
> > +#endif
> > +
> 
> It looks like all assignments above are excess: <GG_State> is filled
> with zeros, <g> points to the corresponding <GG_State> chunk, <gc> is
> allocated within <g>, ergo also initialized with zeros. Feel free to
> drop this hunk.

Nice catch! Thanks!

> >    setnilV(registry(L));
> >    setnilV(&g->nilnode.val);
> >    setnilV(&g->nilnode.key);
> > @@ -214,7 +224,8 @@ LUA_API lua_State *lua_newstate(lua_Alloc f, void *ud)
> >    g->gc.state = GCSpause;
> >    setgcref(g->gc.root, obj2gco(L));
> >    setmref(g->gc.sweep, &g->gc.root);
> > -  g->gc.total = sizeof(GG_State);
> > +  g->gc.allocated = g->gc.total = sizeof(GG_State);
> > +  g->gc.freed = 0;
> 
> Ditto for g->gc.freed.

Sure!

> 
> >    g->gc.pause = LUAI_GCPAUSE;
> >    g->gc.stepmul = LUAI_GCMUL;
> >    lj_dispatch_init((GG_State *)L);
> 
> <snipped>
> 
> > -- 
> > 2.24.1
> > 
> 
> -- 
> Best regards,
> IM

[1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-July/018845.html

-- 
Best regards,
Sergey Kaplun


More information about the Tarantool-patches mailing list