[Tarantool-patches] [PATCH v4 1/2] core: introduce various platform metrics

Sergey Kaplun skaplun at tarantool.org
Wed Oct 7 22:55:58 MSK 2020


On 07.10.20, Igor Munkin wrote:
> Sergey,
> 
> Thanks for the patch! Please consider my comments below.
> 
> On 05.10.20, Sergey Kaplun wrote:
> > This patch introduces the following counters:
> >   - overall amount of allocated tables, cdata and udata objects
> >   - number of incremental GC steps grouped by GC state
> >   - number of string hashes hits and misses
> >   - amount of allocated and freed memory
> >   - number of trace aborts and restored snapshots
> 
> Looks like you forgot a number of alive traces.
> 
> > 
> > Interfaces to obtain these metrics via both Lua and C API are
> > introduced in the next patch.
> > 
> > Part of tarantool/tarantool#5187
> > ---
> >  src/lj_asm.c       |  2 ++
> >  src/lj_asm_arm.h   |  7 +++++++
> >  src/lj_asm_arm64.h |  7 +++++++
> >  src/lj_asm_mips.h  |  4 ++++
> >  src/lj_asm_ppc.h   |  3 +++
> >  src/lj_asm_x86.h   |  4 ++++
> >  src/lj_cdata.c     |  2 ++
> >  src/lj_cdata.h     |  2 ++
> >  src/lj_gc.c        |  4 ++++
> >  src/lj_gc.h        |  6 +-----
> >  src/lj_jit.h       |  3 +++
> >  src/lj_obj.h       | 28 +++++++++++++++++++++++++---
> >  src/lj_snap.c      |  1 +
> >  src/lj_state.c     |  2 +-
> >  src/lj_str.c       |  5 +++++
> >  src/lj_tab.c       |  2 ++
> >  src/lj_trace.c     |  6 +++++-
> >  src/lj_udata.c     |  2 ++
> >  18 files changed, 80 insertions(+), 10 deletions(-)
> > 
> > diff --git a/src/lj_asm.c b/src/lj_asm.c
> > index c2cf5a9..8fb3ccd 100644
> > --- a/src/lj_asm.c
> > +++ b/src/lj_asm.c
> > @@ -2273,6 +2273,7 @@ void lj_asm_trace(jit_State *J, GCtrace *T)
> >    as->J = J;
> >    as->T = T;
> >    J->curfinal = lj_trace_alloc(J->L, T);  /* This copies the IR, too. */
> > +  J->tracenum++;
> 
> Why didn't you simply put this increment right to <lj_trace_alloc> as
> you did for decrement and <lj_trace_free>?

I was confused by L argument. Fixed

> 
> >    as->flags = J->flags;
> >    as->loopref = J->loopref;
> >    as->realign = NULL;
> > @@ -2379,6 +2380,7 @@ void lj_asm_trace(jit_State *J, GCtrace *T)
> >      lj_trace_free(J2G(J), J->curfinal);
> >      J->curfinal = NULL;  /* In case lj_trace_alloc() OOMs. */
> >      J->curfinal = lj_trace_alloc(J->L, T);
> > +    J->tracenum++;
> 
> Ditto.

And here too.

> 

<snipped>

> > +  /* Code incrementing cdatanum is sparse to avoid mips data hazards. */
> 
> Side note: LOL, here you are :)

Finally :)

> 
> > +  emit_setgl(as, RID_RET+2, gc.cdatanum);
> 
> Well, I glanced a MIPS register-usage convention and AFAICS $4 register
> (RID_RET + 2) is a general-purpose (i.e. doesn't store 0 or preserved by
> kernel) caller-safe one. Ergo it should be allocated it in a proper way
> from scratch set, shouldn't it?
> 

AFAIK, $a0 - $a3 ($4 - $7) registers are arguments to functions - not
preserved by subprograms.
But anyway explicit allocation is better here. Added.

> >    /* Initialize gct and ctypeid. lj_mem_newgco() already sets marked. */

<snipped>

> > +/* Garbage collector states. Order matters. */
> > +enum {
> > +  GCSpause, /* Start a GC cycle and mark the root set.*/
> > +  GCSpropagate, /* One gray object is processed. */
> > +  GCSatomic, /* Atomic transition from mark to sweep phase. */
> > +  GCSsweepstring, /* Sweep one chain of strings. */
> > +  GCSsweep, /* Sweep few objects from root. */
> > +  GCSfinalize, /* Finalize one userdata or cdata object. */
> > +  GCSmax
> 
> Adjust the comments considering the code nearby.

Sure! Thanks!

> 
> > +};
> > +
> >  typedef struct GCState {
> >    GCSize total;		/* Memory currently allocated. */
> >    GCSize threshold;	/* Memory threshold. */
> 
> <snipped>
> 
> > @@ -602,22 +622,24 @@ typedef struct global_State {
> >      BloomFilter next[2];
> >    } strbloom;
> >  #endif
> > +  size_t strhash_hit;	/* Strings amount found in string hash. */
> > +  size_t strhash_miss;	/* Strings amount allocated and put into string hash. */
> >    lua_Alloc allocf;	/* Memory allocator. */
> >    void *allocd;		/* Memory allocator data. */
> >    GCState gc;		/* Garbage collector. */
> > -  volatile int32_t vmstate;  /* VM state or current JIT code trace number. */
> >    SBuf tmpbuf;		/* Temporary string buffer. */
> >    GCstr strempty;	/* Empty string. */
> >    uint8_t stremptyz;	/* Zero terminator of empty string. */
> >    uint8_t hookmask;	/* Hook mask. */
> >    uint8_t dispatchmode;	/* Dispatch mode. */
> >    uint8_t vmevmask;	/* VM event mask. */
> > +  int32_t hookcount;	/* Instruction hook countdown. */
> >    GCRef mainthref;	/* Link to main thread. */
> >    TValue registrytv;	/* Anchor for registry. */
> > -  TValue tmptv, tmptv2;	/* Temporary TValues. */
> > +  TValue tmptv2, tmptv;	/* Temporary TValues. */
> >    Node nilnode;		/* Fallback 1-element hash part (nil key and value). */
> >    GCupval uvhead;	/* Head of double-linked list of all open upvalues. */
> > -  int32_t hookcount;	/* Instruction hook countdown. */
> > +  volatile int32_t vmstate;  /* VM state or current JIT code trace number. */
> 
> There is no a single word in the commit message regarding this unclear
> change. Please drop a sentence about the rationale for this reordering.

I've changed commit message as follows:

===================================================================
core: introduce various platform metrics

This patch introduces the following counters:
  - overall amount of allocated tables, cdata and udata objects
  - number of incremental GC steps grouped by GC state
  - number of string hashes hits and misses
  - amount of allocated and freed memory
  - number of trace aborts, number of traces and restored snapshots

Also this patch fixes alignment for 64-bit architectures.

NB: MSize and BCIns are the only fixed types that equal 32 bits. GCRef,
MRef and GCSize sizes depend on LJ_GC64 define.

struct GCState is terminated by three fields: GCSize estimate, MSize
stepmul and MSize pause, which are aligned. The introduces size_t
fields do not violate the alignment too.

vmstate 32-bit field goes right after GCState field within global_State
structure. The next field tmpbuf consists of several MRef fields that
have 64-bit size each. This issue can be solved by moving vmstate field
below. However DynASM doesn't work well with unaligned memory access on
64-bit bigendian MIPS, so vmstate should be aligned to a 64-bit
boundary.

Furthermore field order has been changed to be able to compile code by
DynASM for 32-bit ARM too (see also
https://github.com/openresty/luajit2/issues/37#issuecomment-459145226).

Interfaces to obtain these metrics via both Lua and C API are
introduced in the next patch.

Part of tarantool/tarantool#5187
===================================================================

Side note: If you want read a little bit more about ARM immediate value
encoding (and play with it) see also [1].

> 

<snipped>

> > -- 
> > 2.28.0
> > 
> 
> -- 
> Best regards,
> IM

See iterative patch in the bottom. Branch force-pushed.

===================================================================
diff --git a/src/lj_asm.c b/src/lj_asm.c
index 8fb3ccd..c2cf5a9 100644
--- a/src/lj_asm.c
+++ b/src/lj_asm.c
@@ -2273,7 +2273,6 @@ void lj_asm_trace(jit_State *J, GCtrace *T)
   as->J = J;
   as->T = T;
   J->curfinal = lj_trace_alloc(J->L, T);  /* This copies the IR, too. */
-  J->tracenum++;
   as->flags = J->flags;
   as->loopref = J->loopref;
   as->realign = NULL;
@@ -2380,7 +2379,6 @@ void lj_asm_trace(jit_State *J, GCtrace *T)
     lj_trace_free(J2G(J), J->curfinal);
     J->curfinal = NULL;  /* In case lj_trace_alloc() OOMs. */
     J->curfinal = lj_trace_alloc(J->L, T);
-    J->tracenum++;
     as->realign = NULL;
   }
 
diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
index f4b4b5d..0341701 100644
--- a/src/lj_asm_mips.h
+++ b/src/lj_asm_mips.h
@@ -1430,7 +1430,9 @@ static void asm_cnew(ASMState *as, IRIns *ir)
   CTInfo info = lj_ctype_info(cts, id, &sz);
   const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_mem_newgco];
   IRRef args[4];
+  RegSet allow = (RSET_GPR & ~RSET_SCRATCH);
   RegSet drop = RSET_SCRATCH;
+  Reg tmp;
   lua_assert(sz != CTSIZE_INVALID || (ir->o == IR_CNEW && ir->op2 != REF_NIL));
 
   as->gcsteps++;
@@ -1442,7 +1444,6 @@ static void asm_cnew(ASMState *as, IRIns *ir)
 
   /* Initialize immutable cdata object. */
   if (ir->o == IR_CNEWI) {
-    RegSet allow = (RSET_GPR & ~RSET_SCRATCH);
 #if LJ_32
     int32_t ofs = sizeof(GCcdata);
     if (sz == 8) {
@@ -1473,15 +1474,16 @@ static void asm_cnew(ASMState *as, IRIns *ir)
     return;
   }
 
+  tmp = ra_scratch(as, allow);
   /* Code incrementing cdatanum is sparse to avoid mips data hazards. */
-  emit_setgl(as, RID_RET+2, gc.cdatanum);
+  emit_setgl(as, tmp, gc.cdatanum);
   /* Initialize gct and ctypeid. lj_mem_newgco() already sets marked. */
   emit_tsi(as, MIPSI_SB, RID_RET+1, RID_RET, offsetof(GCcdata, gct));
   emit_tsi(as, MIPSI_SH, RID_TMP, RID_RET, offsetof(GCcdata, ctypeid));
-  emit_tsi(as, MIPSI_AADDIU, RID_RET+2, RID_RET+2, 1);
+  emit_tsi(as, MIPSI_AADDIU, tmp, tmp, 1);
   emit_ti(as, MIPSI_LI, RID_RET+1, ~LJ_TCDATA);
   emit_ti(as, MIPSI_LI, RID_TMP, id); /* Lower 16 bit used. Sign-ext ok. */
-  emit_getgl(as, RID_RET+2, gc.cdatanum);
+  emit_getgl(as, tmp, gc.cdatanum);
   args[0] = ASMREF_L;     /* lua_State *L */
   args[1] = ASMREF_TMP1;  /* MSize size   */
   asm_gencall(as, ci, args);
diff --git a/src/lj_obj.h b/src/lj_obj.h
index 600b68d..927b347 100644
--- a/src/lj_obj.h
+++ b/src/lj_obj.h
@@ -573,12 +573,12 @@ typedef enum {
 
 /* Garbage collector states. Order matters. */
 enum {
-  GCSpause, /* Start a GC cycle and mark the root set.*/
-  GCSpropagate, /* One gray object is processed. */
-  GCSatomic, /* Atomic transition from mark to sweep phase. */
-  GCSsweepstring, /* Sweep one chain of strings. */
-  GCSsweep, /* Sweep few objects from root. */
-  GCSfinalize, /* Finalize one userdata or cdata object. */
+  GCSpause,		/* Start a GC cycle and mark the root set.*/
+  GCSpropagate,		/* One gray object is processed. */
+  GCSatomic,		/* Atomic transition from mark to sweep phase. */
+  GCSsweepstring,	/* Sweep one chain of strings. */
+  GCSsweep,		/* Sweep few objects from root. */
+  GCSfinalize,		/* Finalize one userdata or cdata object. */
   GCSmax
 };
 
diff --git a/src/lj_trace.c b/src/lj_trace.c
index 9bcbce6..86563cd 100644
--- a/src/lj_trace.c
+++ b/src/lj_trace.c
@@ -136,6 +136,7 @@ GCtrace * LJ_FASTCALL lj_trace_alloc(lua_State *L, GCtrace *T)
   T2->nsnap = T->nsnap;
   T2->nsnapmap = T->nsnapmap;
   memcpy(p, T->ir + T->nk, szins);
+  L2J(L)->tracenum++;
   return T2;
 }
 
===================================================================

[1]: https://alisdair.mcdiarmid.org/arm-immediate-value-encoding/

-- 
Best regards,
Sergey Kaplun


More information about the Tarantool-patches mailing list