[Tarantool-patches] [PATCH luajit] x86/x64: Check for jcc when using xor r, r in emit_loadi().

Sergey Kaplun skaplun at tarantool.org
Thu Sep 1 13:32:15 MSK 2022


Hi!

Fixed your comments.

On 31.08.22, sergos wrote:
> Hi!
> 
> Thanks and with last two nits marked with #1 and #2 - LGTM.
> 
> Sergos
> 
> > On 31 Aug 2022, at 11:48, Sergey Kaplun <skaplun at tarantool.org> wrote:
> > 
> > Sergos, Igor!
> > 
> > Thanks for your review!
> > 
> > I've updated test with more comments considering your suggestions.
> > See the iterative patch below:
> > 
> > Branch is force-pushed.
> > ===================================================================
> > diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
> > index 7c6ab2b9..39551184 100644
> > --- a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
> > +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
> > @@ -13,9 +13,9 @@ test:plan(1)
> > --    for sacrifice to be holding the constant zero.
> > --
> > -- This leads to assembly code like the following:
> > ---   ucomisd xmm7, [r14+0x18]
> > +--   ucomisd xmm7, [r14]
> > --   xor r14d, r14d
> > ---   jnb 0x12a0e001c ->3
> > +--   jnb 0x555d3d250014        ->1
> > --
> > -- That xor is a big problem, as it modifies flags between the
> > -- ucomisd and the jnb, thereby causing the jnb to do the wrong
> 
> #1.1 Still missing the correct assembly below in the message. 
> 
> > @@ -34,36 +34,53 @@ local handler = setmetatable({}, {
> >   end
> > })
> > 
> <skipped>
> 
> >> There’s no trace after the fix, so it’s hard to grasp the resolution
> >> used.
> > 
> > The trace is still compiled, but mov is used instead of xor.
> > | ucomisd xmm7, [r14]
> > | mov r14d, 0x0
> > | jnb 0x55c95c330014 ->1
> 
> #1.2 That what I expected to see in the message, right after ‘forbids emitting’

Added the correct assembly to the commit message:

===================================================================
x86/x64: Check for jcc when using xor r,r in emit_loadi().

Thanks to Peter Cawley.

(cherry picked from commit fb5e522fbc0750c838ef6a926b11c5d870826183)

To reproduce this issue, we need:
1) a register which contains the constant zero value
2) a floating point comparison operation
3) the comparison operation to perform a fused load, which in
   turn needs to allocate a register, and for there to be no
   free registers at that moment, and for the register chosen
   for sacrifice to be holding the constant zero.

This leads to assembly code like the following:
| ucomisd xmm7, [r14]
| xor r14d, r14d
| jnb 0x555d3d250014 ->1

That xor is a big problem, as it modifies flags between the
ucomisd and the jnb, thereby causing the jnb to do the wrong
thing.

This patch forbids emitting xor in `emit_loadi()` for jcc operations, so
mov is emmited instead. Now assembly code looks like the following:
| ucomisd xmm7, [r14]
| mov r14d, 0x0
| jnb 0x55c95c330014 ->1

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#7230
===================================================================

> 
> > 
> >>> 
> >>> This patch forbids emitting xor in `emit_loadi()` for jcc operations.
> > 
> >>> 
> >>> Sergey Kaplun:
> >>> * added the description and the test for the problem
> >>> 
> >>> Part of tarantool/tarantool#7230
> >>> ---
> >>> 
> 
> <skipped>
> 
> >>> --- a/src/lj_emit_x86.h
> >>> +++ b/src/lj_emit_x86.h
> >>> @@ -274,10 +274,12 @@ static void emit_movmroi(ASMState *as, Reg base, int32_t ofs, int32_t i)
> >>> /* mov r, i / xor r, r */
> >>> static void emit_loadi(ASMState *as, Reg r, int32_t i)
> >>> {
> >>> -  /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP. */
> >>> +  /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP/jcc. */
> >>>   if (i == 0 && !(LJ_32 && (IR(as->curins)->o == IR_HIOP ||
> >>> 			    (as->curins+1 < as->T->nins &&
> >>> -			     IR(as->curins+1)->o == IR_HIOP)))) {
> >>> +			     IR(as->curins+1)->o == IR_HIOP))) &&
> >>> +		!((*as->mcp == 0x0f && (as->mcp[1] & 0xf0) == XI_JCCn) ||
> >>> +		  (*as->mcp & 0xf0) == XI_JCCs)) {
> > 
> >> I wonder if there can be a case with two instructions emitted between the comparison
> >> and the jump. In such a case the xor still can sneak in?
> > 
> > To be honest I'm nott sure that this is possible due to self-written
> > assembler. So we must keep it in mind when assembly the correspodning
> > IRs.
> > 
> >> Another idea: there could be different instructions that relies on the flags, such as
> >> x86Arith:XOg_ADC or x86Op:XO_CMOV, so the if condition above could be bigger?
> > 
> > AFAIK, it is imposible due to close attention to the assembly.
> 
> So, the answer for both Qs: ’not at the moment’. Which, in turn, implies ‘we will be
> running around in search for all places our new optimization leads to a wrong code
> generation’. Well, fine!

Yes, you are right.

> 
> > 
> >>>     emit_rr(as, XO_ARITH(XOg_XOR), r, r);
> >>>   } else {
> >>>     MCode *p = as->mcp;
> >>> diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt
> >>> index 5708822e..ad69e2cc 100644
> >>> --- a/test/tarantool-tests/CMakeLists.txt
> >>> +++ b/test/tarantool-tests/CMakeLists.txt

<snipped>

> >>> +local testxor = ffi.load('libtestxor')
> > 
> >> Should have a test for x86 platform, since changes are x86 only?
> > 
> > We have too small amount of tests for LuaJIT anyway, so I prefer to keep
> > tests for all architectures for now.
> 
> But... I bet the aarch64 will require `a little bit` different register pressure
> preparations, will it? IOW - this test is not applicable to other architectures
> ‘as is’.

Yes, of corse. It doesn't show litterally the same issue, but may show
another one.

> 

<snipped>

> >>> +test:ok(true, 'imposible branch is not taken')
> 
> #2 s/imposible/impossible/

Fixed. Branch is force-pushed.

===================================================================
diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
index 39551184..9a1c96ef 100644
--- a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
+++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
@@ -82,6 +82,6 @@ for _ = 1, 3 do
   -- Don't use any `test` functions here to freeze the trace.
   assert(not testf())
 end
-test:ok(true, 'imposible branch is not taken')
+test:ok(true, 'impossible branch is not taken')
 
 os.exit(test:check() and 0 or 1)
===================================================================

> 
> >>> +

<snipped>

> > 
> >>> -- 
> >>> 2.34.1
> >>> 
> > 
> > -- 
> > Best regards,
> > Sergey Kaplun
> 

-- 
Best regards,
Sergey Kaplun


More information about the Tarantool-patches mailing list