From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 06875469719
	for ; Wed, 7 Oct 2020 23:26:37 +0300 (MSK)
Date: Wed, 7 Oct 2020 23:16:01 +0300
From: Igor Munkin
Message-ID: <20201007201601.GR18920@tarantool.org>
References: <2280bc3a2e32356455c3aebae711bafe2c4332f5.1601878708.git.skaplun@tarantool.org>
	<20201007141106.GP18920@tarantool.org>
	<20201007195558.GA20188@root>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20201007195558.GA20188@root>
Subject: Re: [Tarantool-patches] [PATCH v4 1/2] core: introduce various platform metrics
List-Id: Tarantool development patches
To: Sergey Kaplun
Cc: tarantool-patches@dev.tarantool.org

Sergey,

Thanks for your fixes! There is still a comment regarding CNEW
assembling and a couple of minor ones below.

On 07.10.20, Sergey Kaplun wrote:
> On 07.10.20, Igor Munkin wrote:
> > Sergey,
> >
> > Thanks for the patch! Please consider my comments below.
> >
> > On 05.10.20, Sergey Kaplun wrote:
> > >
> > > + emit_setgl(as, RID_RET+2, gc.cdatanum);
> >
> > Well, I glanced at a MIPS register-usage convention and AFAICS $4 register
> > (RID_RET + 2) is a general-purpose (i.e. doesn't store 0 or preserved by
> > kernel) caller-safe one. Ergo it should be allocated in a proper way
> > from the scratch set, shouldn't it?
> >
>
> AFAIK, $a0 - $a3 ($4 - $7) registers are arguments to functions - not
> preserved by subprograms.

Yes, but there is e.g. $8, which is a temporary one, isn't it? Anyway, you
can't just pick a particular register, since it can be already
allocated by RA. So it *has* to be explicitly allocated to avoid a data
clash on the trace.
I strongly believe the reason you see no failures in the tests is simply a
lucky coincidence (or tiny traces).

> But anyway explicit allocation is better here. Added.
>
> > >  /* Initialize gct and ctypeid. lj_mem_newgco() already sets marked. */
> >
> I've changed commit message as follows:
>
> ===================================================================
> core: introduce various platform metrics
>
> This patch introduces the following counters:
> - overall amount of allocated tables, cdata and udata objects
> - number of incremental GC steps grouped by GC state
> - number of string hashes hits and misses
> - amount of allocated and freed memory
> - number of trace aborts, number of traces and restored snapshots
>
> Also this patch fixes alignment for 64-bit architectures.
>
> NB: MSize and BCIns are the only fixed types that equal 32 bits. GCRef,
> MRef and GCSize sizes depend on LJ_GC64 define.
>
> struct GCState is terminated by three fields: GCSize estimate, MSize
> stepmul and MSize pause, which are aligned. The introduces size_t

Typo: s/introduces/introduced/.

> fields do not violate the alignment too.
>
> vmstate 32-bit field goes right after GCState field within global_State
> structure. The next field tmpbuf consists of several MRef fields that
> have 64-bit size each. This issue can be solved by moving vmstate field
> below. However DynASM doesn't work well with unaligned memory access on
> 64-bit bigendian MIPS, so vmstate should be aligned to a 64-bit
> boundary.
>
> Furthermore field order has been changed to be able to compile code by
> DynASM for 32-bit ARM too (see also
> https://github.com/openresty/luajit2/issues/37#issuecomment-459145226).
>
> Interfaces to obtain these metrics via both Lua and C API are
> introduced in the next patch.
>
> Part of tarantool/tarantool#5187
> ===================================================================
>
> Side note: If you want to read a little bit more about ARM immediate value
> encoding (and play with it) see also [1].

Thanks.

> >
> See iterative patch in the bottom. Branch force-pushed.
>
> ===================================================================
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index f4b4b5d..0341701 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -1430,7 +1430,9 @@ static void asm_cnew(ASMState *as, IRIns *ir)
>    CTInfo info = lj_ctype_info(cts, id, &sz);
>    const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_mem_newgco];
>    IRRef args[4];
> +  RegSet allow = (RSET_GPR & ~RSET_SCRATCH);
>    RegSet drop = RSET_SCRATCH;
> +  Reg tmp;
>    lua_assert(sz != CTSIZE_INVALID || (ir->o == IR_CNEW && ir->op2 != REF_NIL));
>
>    as->gcsteps++;
> @@ -1442,7 +1444,6 @@ static void asm_cnew(ASMState *as, IRIns *ir)
>
>    /* Initialize immutable cdata object. */
>    if (ir->o == IR_CNEWI) {
> -    RegSet allow = (RSET_GPR & ~RSET_SCRATCH);
>  #if LJ_32
>      int32_t ofs = sizeof(GCcdata);
>      if (sz == 8) {
> @@ -1473,15 +1474,16 @@ static void asm_cnew(ASMState *as, IRIns *ir)
>      return;
>    }
>
> +  tmp = ra_scratch(as, allow);

Since there are registers allocated in the scope of the IR_CNEWI
assembling above, you need to exclude those registers from the set prior
to scratching a new one.

>    /* Code incrementing cdatanum is sparse to avoid mips data hazards. */
> -  emit_setgl(as, RID_RET+2, gc.cdatanum);
> +  emit_setgl(as, tmp, gc.cdatanum);
>    /* Initialize gct and ctypeid. lj_mem_newgco() already sets marked. */
>    emit_tsi(as, MIPSI_SB, RID_RET+1, RID_RET, offsetof(GCcdata, gct));
>    emit_tsi(as, MIPSI_SH, RID_TMP, RID_RET, offsetof(GCcdata, ctypeid));
> -  emit_tsi(as, MIPSI_AADDIU, RID_RET+2, RID_RET+2, 1);
> +  emit_tsi(as, MIPSI_AADDIU, tmp, tmp, 1);
>    emit_ti(as, MIPSI_LI, RID_RET+1, ~LJ_TCDATA);
>    emit_ti(as, MIPSI_LI, RID_TMP, id);  /* Lower 16 bit used. Sign-ext ok. */
> -  emit_getgl(as, RID_RET+2, gc.cdatanum);
> +  emit_getgl(as, tmp, gc.cdatanum);
>    args[0] = ASMREF_L;     /* lua_State *L */
>    args[1] = ASMREF_TMP1;  /* MSize size */
>    asm_gencall(as, ci, args);
> ===================================================================
>
> [1]: https://alisdair.mcdiarmid.org/arm-immediate-value-encoding/
>
> --
> Best regards,
> Sergey Kaplun

--
Best regards,
IM
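
The remark about excluding already allocated registers from the set
before scratching a new one can be sketched with a toy model. This is
only an illustration under stated assumptions, not LuaJIT code: every
toy_* / Toy* name is hypothetical, a register set is modelled as a plain
bitmask, and the "allocator" simply picks the lowest register left in
the allow set, ignoring all other bookkeeping a real register allocator
does.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: these names are hypothetical and are not LuaJIT's API. */
typedef uint32_t ToyRegSet;  /* One bit per register number. */
typedef unsigned ToyReg;

#define TOY_RSET(r)  ((ToyRegSet)1u << (r))

/* Remove register r from the candidate set. */
static ToyRegSet toy_exclude(ToyRegSet rs, ToyReg r)
{
  return rs & ~TOY_RSET(r);
}

/* Pick the lowest-numbered register still present in the allow set. */
static ToyReg toy_scratch(ToyRegSet allow)
{
  ToyReg r = 0;
  assert(allow != 0);  /* The caller must leave at least one candidate. */
  while (!(allow & TOY_RSET(r)))
    r++;
  return r;
}

/*
** Take two registers from the same allow set. Returns 1 if the second
** pick differs from the first one, 0 if both picks clash.
*/
static int toy_second_pick_is_distinct(int exclude_first)
{
  ToyRegSet allow = TOY_RSET(8) | TOY_RSET(9) | TOY_RSET(10);
  ToyReg first = toy_scratch(allow);  /* Assume its value is still live. */
  if (exclude_first)
    allow = toy_exclude(allow, first);  /* Narrow the set before reuse. */
  return toy_scratch(allow) != first;
}
```

In this model toy_second_pick_is_distinct(0) reports a clash, because
the second pick sees the same allow set and lands on the same register,
while toy_second_pick_is_distinct(1) yields two different registers.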