From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp16.mail.ru (smtp16.mail.ru [94.100.176.153]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id E4466469719 for ; Wed, 7 Oct 2020 22:56:23 +0300 (MSK) Date: Wed, 7 Oct 2020 22:55:58 +0300 From: Sergey Kaplun Message-ID: <20201007195558.GA20188@root> References: <2280bc3a2e32356455c3aebae711bafe2c4332f5.1601878708.git.skaplun@tarantool.org> <20201007141106.GP18920@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20201007141106.GP18920@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH v4 1/2] core: introduce various platform metrics List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Igor Munkin Cc: tarantool-patches@dev.tarantool.org On 07.10.20, Igor Munkin wrote: > Sergey, > > Thanks for the patch! Please consider my comments below. > > On 05.10.20, Sergey Kaplun wrote: > > This patch introduces the following counters: > > - overall amount of allocated tables, cdata and udata objects > > - number of incremental GC steps grouped by GC state > > - number of string hashes hits and misses > > - amount of allocated and freed memory > > - number of trace aborts and restored snapshots > > Looks like you forgot a number of alive traces. > > > > > Interfaces to obtain these metrics via both Lua and C API are > > introduced in the next patch. > > > > Part of tarantool/tarantool#5187 > > --- > > src/lj_asm.c | 2 ++ > > src/lj_asm_arm.h | 7 +++++++ > > src/lj_asm_arm64.h | 7 +++++++ > > src/lj_asm_mips.h | 4 ++++ > > src/lj_asm_ppc.h | 3 +++ > > src/lj_asm_x86.h | 4 ++++ > > src/lj_cdata.c | 2 ++ > > src/lj_cdata.h | 2 ++ > > src/lj_gc.c | 4 ++++ > > src/lj_gc.h | 6 +----- > > src/lj_jit.h | 3 +++ > > src/lj_obj.h | 28 +++++++++++++++++++++++++--- > > src/lj_snap.c | 1 + > > src/lj_state.c | 2 +- > > src/lj_str.c | 5 +++++ > > src/lj_tab.c | 2 ++ > > src/lj_trace.c | 6 +++++- > > src/lj_udata.c | 2 ++ > > 18 files changed, 80 insertions(+), 10 deletions(-) > > > > diff --git a/src/lj_asm.c b/src/lj_asm.c > > index c2cf5a9..8fb3ccd 100644 > > --- a/src/lj_asm.c > > +++ b/src/lj_asm.c > > @@ -2273,6 +2273,7 @@ void lj_asm_trace(jit_State *J, GCtrace *T) > > as->J = J; > > as->T = T; > > J->curfinal = lj_trace_alloc(J->L, T); /* This copies the IR, too. */ > > + J->tracenum++; > > Why didn't you simply put this increment right to as > you did for decrement and ? I was confused by L argument. Fixed > > > as->flags = J->flags; > > as->loopref = J->loopref; > > as->realign = NULL; > > @@ -2379,6 +2380,7 @@ void lj_asm_trace(jit_State *J, GCtrace *T) > > lj_trace_free(J2G(J), J->curfinal); > > J->curfinal = NULL; /* In case lj_trace_alloc() OOMs. */ > > J->curfinal = lj_trace_alloc(J->L, T); > > + J->tracenum++; > > Ditto. And here too. > > > + /* Code incrementing cdatanum is sparse to avoid mips data hazards. */ > > Side note: LOL, here you are :) Finally :) > > > + emit_setgl(as, RID_RET+2, gc.cdatanum); > > Well, I glanced a MIPS register-usage convention and AFAICS $4 register > (RID_RET + 2) is a general-purpose (i.e. doesn't store 0 or preserved by > kernel) caller-safe one. Ergo it should be allocated it in a proper way > from scratch set, shouldn't it? > AFAIK, $a0 - $a3 ($4 - $7) registers are arguments to functions - not preserved by subprograms. But anyway explicit allocation is better here. Added. > > /* Initialize gct and ctypeid. lj_mem_newgco() already sets marked. */ > > +/* Garbage collector states. Order matters. */ > > +enum { > > + GCSpause, /* Start a GC cycle and mark the root set.*/ > > + GCSpropagate, /* One gray object is processed. */ > > + GCSatomic, /* Atomic transition from mark to sweep phase. */ > > + GCSsweepstring, /* Sweep one chain of strings. */ > > + GCSsweep, /* Sweep few objects from root. */ > > + GCSfinalize, /* Finalize one userdata or cdata object. */ > > + GCSmax > > Adjust the comments considering the code nearby. Sure! Thanks! > > > +}; > > + > > typedef struct GCState { > > GCSize total; /* Memory currently allocated. */ > > GCSize threshold; /* Memory threshold. */ > > > > > @@ -602,22 +622,24 @@ typedef struct global_State { > > BloomFilter next[2]; > > } strbloom; > > #endif > > + size_t strhash_hit; /* Strings amount found in string hash. */ > > + size_t strhash_miss; /* Strings amount allocated and put into string hash. */ > > lua_Alloc allocf; /* Memory allocator. */ > > void *allocd; /* Memory allocator data. */ > > GCState gc; /* Garbage collector. */ > > - volatile int32_t vmstate; /* VM state or current JIT code trace number. */ > > SBuf tmpbuf; /* Temporary string buffer. */ > > GCstr strempty; /* Empty string. */ > > uint8_t stremptyz; /* Zero terminator of empty string. */ > > uint8_t hookmask; /* Hook mask. */ > > uint8_t dispatchmode; /* Dispatch mode. */ > > uint8_t vmevmask; /* VM event mask. */ > > + int32_t hookcount; /* Instruction hook countdown. */ > > GCRef mainthref; /* Link to main thread. */ > > TValue registrytv; /* Anchor for registry. */ > > - TValue tmptv, tmptv2; /* Temporary TValues. */ > > + TValue tmptv2, tmptv; /* Temporary TValues. */ > > Node nilnode; /* Fallback 1-element hash part (nil key and value). */ > > GCupval uvhead; /* Head of double-linked list of all open upvalues. */ > > - int32_t hookcount; /* Instruction hook countdown. */ > > + volatile int32_t vmstate; /* VM state or current JIT code trace number. */ > > There is no a single word in the commit message regarding this unclear > change. Please drop a sentence about the rationale for this reordering. I've changed commit message as follows: =================================================================== core: introduce various platform metrics This patch introduces the following counters: - overall amount of allocated tables, cdata and udata objects - number of incremental GC steps grouped by GC state - number of string hashes hits and misses - amount of allocated and freed memory - number of trace aborts, number of traces and restored snapshots Also this patch fixes alignment for 64-bit architectures. NB: MSize and BCIns are the only fixed types that equal 32 bits. GCRef, MRef and GCSize sizes depend on LJ_GC64 define. struct GCState is terminated by three fields: GCSize estimate, MSize stepmul and MSize pause, which are aligned. The introduces size_t fields do not violate the alignment too. vmstate 32-bit field goes right after GCState field within global_State structure. The next field tmpbuf consists of several MRef fields that have 64-bit size each. This issue can be solved by moving vmstate field below. However DynASM doesn't work well with unaligned memory access on 64-bit bigendian MIPS, so vmstate should be aligned to a 64-bit boundary. Furthermore field order has been changed to be able to compile code by DynASM for 32-bit ARM too (see also https://github.com/openresty/luajit2/issues/37#issuecomment-459145226). Interfaces to obtain these metrics via both Lua and C API are introduced in the next patch. Part of tarantool/tarantool#5187 =================================================================== Side note: If you want read a little bit more about ARM immediate value encoding (and play with it) see also [1]. > > > -- > > 2.28.0 > > > > -- > Best regards, > IM See iterative patch in the bottom. Branch force-pushed. =================================================================== diff --git a/src/lj_asm.c b/src/lj_asm.c index 8fb3ccd..c2cf5a9 100644 --- a/src/lj_asm.c +++ b/src/lj_asm.c @@ -2273,7 +2273,6 @@ void lj_asm_trace(jit_State *J, GCtrace *T) as->J = J; as->T = T; J->curfinal = lj_trace_alloc(J->L, T); /* This copies the IR, too. */ - J->tracenum++; as->flags = J->flags; as->loopref = J->loopref; as->realign = NULL; @@ -2380,7 +2379,6 @@ void lj_asm_trace(jit_State *J, GCtrace *T) lj_trace_free(J2G(J), J->curfinal); J->curfinal = NULL; /* In case lj_trace_alloc() OOMs. */ J->curfinal = lj_trace_alloc(J->L, T); - J->tracenum++; as->realign = NULL; } diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h index f4b4b5d..0341701 100644 --- a/src/lj_asm_mips.h +++ b/src/lj_asm_mips.h @@ -1430,7 +1430,9 @@ static void asm_cnew(ASMState *as, IRIns *ir) CTInfo info = lj_ctype_info(cts, id, &sz); const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_mem_newgco]; IRRef args[4]; + RegSet allow = (RSET_GPR & ~RSET_SCRATCH); RegSet drop = RSET_SCRATCH; + Reg tmp; lua_assert(sz != CTSIZE_INVALID || (ir->o == IR_CNEW && ir->op2 != REF_NIL)); as->gcsteps++; @@ -1442,7 +1444,6 @@ static void asm_cnew(ASMState *as, IRIns *ir) /* Initialize immutable cdata object. */ if (ir->o == IR_CNEWI) { - RegSet allow = (RSET_GPR & ~RSET_SCRATCH); #if LJ_32 int32_t ofs = sizeof(GCcdata); if (sz == 8) { @@ -1473,15 +1474,16 @@ static void asm_cnew(ASMState *as, IRIns *ir) return; } + tmp = ra_scratch(as, allow); /* Code incrementing cdatanum is sparse to avoid mips data hazards. */ - emit_setgl(as, RID_RET+2, gc.cdatanum); + emit_setgl(as, tmp, gc.cdatanum); /* Initialize gct and ctypeid. lj_mem_newgco() already sets marked. */ emit_tsi(as, MIPSI_SB, RID_RET+1, RID_RET, offsetof(GCcdata, gct)); emit_tsi(as, MIPSI_SH, RID_TMP, RID_RET, offsetof(GCcdata, ctypeid)); - emit_tsi(as, MIPSI_AADDIU, RID_RET+2, RID_RET+2, 1); + emit_tsi(as, MIPSI_AADDIU, tmp, tmp, 1); emit_ti(as, MIPSI_LI, RID_RET+1, ~LJ_TCDATA); emit_ti(as, MIPSI_LI, RID_TMP, id); /* Lower 16 bit used. Sign-ext ok. */ - emit_getgl(as, RID_RET+2, gc.cdatanum); + emit_getgl(as, tmp, gc.cdatanum); args[0] = ASMREF_L; /* lua_State *L */ args[1] = ASMREF_TMP1; /* MSize size */ asm_gencall(as, ci, args); diff --git a/src/lj_obj.h b/src/lj_obj.h index 600b68d..927b347 100644 --- a/src/lj_obj.h +++ b/src/lj_obj.h @@ -573,12 +573,12 @@ typedef enum { /* Garbage collector states. Order matters. */ enum { - GCSpause, /* Start a GC cycle and mark the root set.*/ - GCSpropagate, /* One gray object is processed. */ - GCSatomic, /* Atomic transition from mark to sweep phase. */ - GCSsweepstring, /* Sweep one chain of strings. */ - GCSsweep, /* Sweep few objects from root. */ - GCSfinalize, /* Finalize one userdata or cdata object. */ + GCSpause, /* Start a GC cycle and mark the root set.*/ + GCSpropagate, /* One gray object is processed. */ + GCSatomic, /* Atomic transition from mark to sweep phase. */ + GCSsweepstring, /* Sweep one chain of strings. */ + GCSsweep, /* Sweep few objects from root. */ + GCSfinalize, /* Finalize one userdata or cdata object. */ GCSmax }; diff --git a/src/lj_trace.c b/src/lj_trace.c index 9bcbce6..86563cd 100644 --- a/src/lj_trace.c +++ b/src/lj_trace.c @@ -136,6 +136,7 @@ GCtrace * LJ_FASTCALL lj_trace_alloc(lua_State *L, GCtrace *T) T2->nsnap = T->nsnap; T2->nsnapmap = T->nsnapmap; memcpy(p, T->ir + T->nk, szins); + L2J(L)->tracenum++; return T2; } =================================================================== [1]: https://alisdair.mcdiarmid.org/arm-immediate-value-encoding/ -- Best regards, Sergey Kaplun