[Tarantool-patches] [PATCH v4 1/2] core: introduce various platform metrics
Igor Munkin
imun at tarantool.org
Wed Oct 7 23:16:01 MSK 2020
Sergey,
Thanks for your fixes! There is still a comment regarding CNEW
assembling and a couple of minor ones below.
On 07.10.20, Sergey Kaplun wrote:
> On 07.10.20, Igor Munkin wrote:
> > Sergey,
> >
> > Thanks for the patch! Please consider my comments below.
> >
> > On 05.10.20, Sergey Kaplun wrote:
<snipped>
>
> >
> > > + emit_setgl(as, RID_RET+2, gc.cdatanum);
> >
> > Well, I glanced at the MIPS register-usage convention and AFAICS the $4
> > register (RID_RET + 2) is a general-purpose (i.e. doesn't store 0 or
> > preserved by kernel) caller-safe one. Ergo it should be allocated in a
> > proper way from the scratch set, shouldn't it?
> >
>
> AFAIK, $a0 - $a3 ($4 - $7) registers are arguments to functions - not
> preserved by subprograms.
Yes, but there is e.g. $8, which is a temporary one, isn't it? Anyway, you
can't just pick the particular register, since it can be already
allocated by RA. So it *has* to be explicitly allocated to avoid data
clash on the trace. I strongly believe the reason you see no failure on
tests is simply a lucky coincidence (or tiny traces).
> But anyway explicit allocation is better here. Added.
>
> > > /* Initialize gct and ctypeid. lj_mem_newgco() already sets marked. */
>
<snipped>
>
> I've changed commit message as follows:
>
> ===================================================================
> core: introduce various platform metrics
>
> This patch introduces the following counters:
> - overall amount of allocated tables, cdata and udata objects
> - number of incremental GC steps grouped by GC state
> - number of string hashes hits and misses
> - amount of allocated and freed memory
> - number of trace aborts, number of traces and restored snapshots
>
> Also this patch fixes alignment for 64-bit architectures.
>
> NB: MSize and BCIns are the only fixed types that equal 32 bits. GCRef,
> MRef and GCSize sizes depend on LJ_GC64 define.
>
> struct GCState is terminated by three fields: GCSize estimate, MSize
> stepmul and MSize pause, which are aligned. The introduces size_t
Typo: s/introduces/introduced/.
> fields do not violate the alignment too.
>
> vmstate 32-bit field goes right after GCState field within global_State
> structure. The next field tmpbuf consists of several MRef fields that
> have 64-bit size each. This issue can be solved by moving vmstate field
> below. However DynASM doesn't work well with unaligned memory access on
> 64-bit bigendian MIPS, so vmstate should be aligned to a 64-bit
> boundary.
>
> Furthermore field order has been changed to be able to compile code by
> DynASM for 32-bit ARM too (see also
> https://github.com/openresty/luajit2/issues/37#issuecomment-459145226).
>
> Interfaces to obtain these metrics via both Lua and C API are
> introduced in the next patch.
>
> Part of tarantool/tarantool#5187
> ===================================================================
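For readers following the alignment reasoning in the commit message above, here is a minimal sketch of the padding arithmetic when a 32-bit field is followed by a 64-bit one. The helper name toy_padding_before and the offsets used with it are hypothetical illustrations, not taken from the LuaJIT sources or from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of LuaJIT): number of padding bytes
 * needed so that a field placed at <offset> starts on an <align>-byte
 * boundary. <align> is assumed to be non-zero.
 */
static size_t toy_padding_before(size_t offset, size_t align)
{
  assert(align != 0);
  return (align - offset % align) % align;
}

/*
 * Hypothetical helper: offset at which the next field really starts
 * once the padding has been accounted for.
 */
static size_t toy_aligned_offset(size_t offset, size_t align)
{
  return offset + toy_padding_before(offset, align);
}
```

For example, if a 32-bit field ends at offset 20 and the next field needs 8-byte alignment, toy_padding_before(20, 8) reports a 4-byte gap, so the next field starts at offset 24. Reordering fields so that no such gap appears in front of a 64-bit field is the kind of change the commit message describes.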
>
> Side note: If you want to read a little bit more about ARM immediate value
> encoding (and play with it) see also [1].
Thanks.
>
<snipped>
>
> See iterative patch in the bottom. Branch force-pushed.
>
> ===================================================================
<snipped>
> diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> index f4b4b5d..0341701 100644
> --- a/src/lj_asm_mips.h
> +++ b/src/lj_asm_mips.h
> @@ -1430,7 +1430,9 @@ static void asm_cnew(ASMState *as, IRIns *ir)
> CTInfo info = lj_ctype_info(cts, id, &sz);
> const CCallInfo *ci = &lj_ir_callinfo[IRCALL_lj_mem_newgco];
> IRRef args[4];
> + RegSet allow = (RSET_GPR & ~RSET_SCRATCH);
> RegSet drop = RSET_SCRATCH;
> + Reg tmp;
> lua_assert(sz != CTSIZE_INVALID || (ir->o == IR_CNEW && ir->op2 != REF_NIL));
>
> as->gcsteps++;
> @@ -1442,7 +1444,6 @@ static void asm_cnew(ASMState *as, IRIns *ir)
>
> /* Initialize immutable cdata object. */
> if (ir->o == IR_CNEWI) {
> - RegSet allow = (RSET_GPR & ~RSET_SCRATCH);
> #if LJ_32
> int32_t ofs = sizeof(GCcdata);
> if (sz == 8) {
> @@ -1473,15 +1474,16 @@ static void asm_cnew(ASMState *as, IRIns *ir)
> return;
> }
>
> + tmp = ra_scratch(as, allow);
Since there are registers allocated in scope of IR_CNEWI assembling
above, you need to exclude those registers from <allow> set prior to
scratching a new one.
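
To illustrate the point, here is a toy model of the problem. It is only a sketch: the ToyRegSet type and the toy_* helpers are hypothetical and do not reproduce LuaJIT's register allocator, they just show why a register that is already held has to be dropped from the candidate set before the next one is picked.

```c
#include <assert.h>
#include <stdint.h>

/* Toy bitmask: bit r set means register r may be handed out. */
typedef uint32_t ToyRegSet;
typedef unsigned ToyReg;

#define TOY_RSET(r)  ((ToyRegSet)1u << (r))

/* Return <rs> without register <r> (hypothetical helper). */
static ToyRegSet toy_exclude(ToyRegSet rs, ToyReg r)
{
  return rs & ~TOY_RSET(r);
}

/* Pick the lowest-numbered register from a non-empty <allow> set. */
static ToyReg toy_pick(ToyRegSet allow)
{
  ToyReg r = 0;
  assert(allow != 0);
  while (!(allow & TOY_RSET(r)))
    r++;
  return r;
}
```

With allow = TOY_RSET(16) | TOY_RSET(17), two consecutive toy_pick(allow) calls both return 16, i.e. the second value clobbers the first. Calling allow = toy_exclude(allow, 16) after the first pick makes the second pick return 17 instead.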
> /* Code incrementing cdatanum is sparse to avoid mips data hazards. */
> - emit_setgl(as, RID_RET+2, gc.cdatanum);
> + emit_setgl(as, tmp, gc.cdatanum);
> /* Initialize gct and ctypeid. lj_mem_newgco() already sets marked. */
> emit_tsi(as, MIPSI_SB, RID_RET+1, RID_RET, offsetof(GCcdata, gct));
> emit_tsi(as, MIPSI_SH, RID_TMP, RID_RET, offsetof(GCcdata, ctypeid));
> - emit_tsi(as, MIPSI_AADDIU, RID_RET+2, RID_RET+2, 1);
> + emit_tsi(as, MIPSI_AADDIU, tmp, tmp, 1);
> emit_ti(as, MIPSI_LI, RID_RET+1, ~LJ_TCDATA);
> emit_ti(as, MIPSI_LI, RID_TMP, id); /* Lower 16 bit used. Sign-ext ok. */
> - emit_getgl(as, RID_RET+2, gc.cdatanum);
> + emit_getgl(as, tmp, gc.cdatanum);
> args[0] = ASMREF_L; /* lua_State *L */
> args[1] = ASMREF_TMP1; /* MSize size */
> asm_gencall(as, ci, args);
<snipped>
> ===================================================================
>
> [1]: https://alisdair.mcdiarmid.org/arm-immediate-value-encoding/
>
> --
> Best regards,
> Sergey Kaplun
--
Best regards,
IM