[Tarantool-patches] [PATCH luajit] x86/x64: Check for jcc when using xor r, r in emit_loadi().

Igor Munkin imun at tarantool.org
Fri Nov 11 13:20:32 MSK 2022


Sergos,

On 06.07.22, sergos wrote:
> Hi!
> 
> Thanks for the patch!
> 
> I won’t veto on the patch, since it’s from upstream. Still I have got
> a number of questions on it’s crutch nature. Perhaps we need to follow
> up with some test cases to luajit.

Thanks for your precise review. I would like to shed some light onto the
patch regarding LuaJIT semantics. Even if you're familiar with all
specifics, it might be useful for other guys *still* reading ML and some
parts can be used either in commit message or in test comments.

> 
> Some description fixes are needed also.
> 
> Sergos
> 
> 
> > On 4 Jul 2022, at 12:33, Sergey Kaplun <skaplun at tarantool.org> wrote:
> > 
> > From: Mike Pall <mike>
> > 
> > Thanks to Peter Cawley.
> > 
> > (cherry picked from commit fb5e522fbc0750c838ef6a926b11c5d870826183)
> > 
> > To reproduce this issue, we need:
> > 1) a register which contains the constant zero value
> 
> > 2) a floating point comparison operation
> 
> The integer one will set the same flags (CF for the case of JNB) and XOR will
> clear it anyways. 

Strictly saying, yes, I see no reason for "float" requirement here. I
believe this can be changed just to "comparison operation occurs", but
Sergey can correct me if I'm wrong.

> 
> > 3) the comparison operation to perform a fused load, which in
> >   turn needs to allocate a register, and for there to be no
> >   free registers at that moment, and for the register chosen
> >   for sacrifice to be holding the constant zero.
> > 
> Unfortunately, it’s not clear what this register (I suppose it’s r14d) is used for.
> Is it an argument, needed after the fall-through in the same trace?
> Why not it sank down below the branch? 
> IIRC it is a dedicated register used for dispatch, so why is it used for sacrificing?

Like many things in LuaJIT register allocation is "state-of-the-art" (or
as I called this "ad-hoc") entity. Since it's implemented as a linear
scan RA working in a reverse direction of the trace being recorded,
register is "allocated" for IR slot being used as a source and "freed"
at the moment it becomes a destination.

To describe the process Sergey fit in a three-bullet list above, we have
the following.
0) IR for either "internal" (e.g. type check, hmask check) or "external"
   (e.g. branch or loop condition) guard is begin emitted to mcode.
1) JCC to side exit is emitted to the trace mcode at the beginning.
2) Condition (i.e. comparison) is going to be emitted.
3) Fuse optimization takes its place, that ought to allocate a register
   for the load base.
4) There is no free registers at this point.
5) The one storing the constant zero is chosen to be sacrificed and
   reallocated (consider allocation cost in ra_alloc for constant
   materialization).
6) Before (or in the sense of trace execution, after) register is
   being used in the aforementioned comparison, register (r14 in our
   case) is reset by XOR emitted right after (before) jump instruction.
7) The comparison with fused load within is emitted.

As a result flags set by comparison are reset by XOR emitted in between
of condition and jump instructions.

> 
> > This leads to assembly code like the following:
> > | ucomisd xmm7, [r14+0x18]
> > | xor r14d, r14d
> > | jnb 0x12a0e001c ->3
> > 
> > That xor is a big problem, as it modifies flags between the
> > ucomisd and the jnb, thereby causing the jnb to do the wrong
> > thing.
> > 

<snipped>

> > 
> > diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h
> > index 5207f9da..6b58306b 100644
> > --- a/src/lj_emit_x86.h
> > +++ b/src/lj_emit_x86.h
> > @@ -274,10 +274,12 @@ static void emit_movmroi(ASMState *as, Reg base, int32_t ofs, int32_t i)
> > /* mov r, i / xor r, r */
> > static void emit_loadi(ASMState *as, Reg r, int32_t i)
> > {
> > -  /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP. */
> > +  /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP/jcc. */
> >   if (i == 0 && !(LJ_32 && (IR(as->curins)->o == IR_HIOP ||
> > 			    (as->curins+1 < as->T->nins &&
> > -			     IR(as->curins+1)->o == IR_HIOP)))) {
> > +			     IR(as->curins+1)->o == IR_HIOP))) &&
> > +		!((*as->mcp == 0x0f && (as->mcp[1] & 0xf0) == XI_JCCn) ||
> > +		  (*as->mcp & 0xf0) == XI_JCCs)) {
> 
> I wonder if there can be a case with two instructions emitted between the comparison
> and the jump. In such a case the xor still can sneak in? 
> Another idea: there could be different instructions that relies on the flags, such as
> x86Arith:XOg_ADC or x86Op:XO_CMOV, so the if condition above could be bigger?

Nice question indeed! I believe, we need to re-check this case and send
the patch to upstream if the issue occurs.

> 
> >     emit_rr(as, XO_ARITH(XOg_XOR), r, r);
> >   } else {
> >     MCode *p = as->mcp;

<snipped>

> > -- 
> > 2.34.1
> > 
> 

-- 
Best regards,
IM


More information about the Tarantool-patches mailing list