[Tarantool-patches] [PATCH luajit] x86/x64: Check for jcc when using xor r, r in emit_loadi().

sergos sergos at tarantool.org
Wed Aug 31 19:04:08 MSK 2022


Hi!

Thanks and with last two nits marked with #1 and #2 - LGTM.

Sergos

> On 31 Aug 2022, at 11:48, Sergey Kaplun <skaplun at tarantool.org> wrote:
> 
> Sergos, Igor!
> 
> Thanks for your review!
> 
> I've updated test with more comments considering your suggestions.
> See the iterative patch below:
> 
> Branch is force-pushed.
> ===================================================================
> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
> index 7c6ab2b9..39551184 100644
> --- a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
> +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
> @@ -13,9 +13,9 @@ test:plan(1)
> --    for sacrifice to be holding the constant zero.
> --
> -- This leads to assembly code like the following:
> ---   ucomisd xmm7, [r14+0x18]
> +--   ucomisd xmm7, [r14]
> --   xor r14d, r14d
> ---   jnb 0x12a0e001c ->3
> +--   jnb 0x555d3d250014        ->1
> --
> -- That xor is a big problem, as it modifies flags between the
> -- ucomisd and the jnb, thereby causing the jnb to do the wrong

#1.1 Still missing the correct assembly below in the message. 

> @@ -34,36 +34,53 @@ local handler = setmetatable({}, {
>   end
> })
> 
<skipped>

>> There’s no trace after the fix, so it’s hard to grasp the resolution
>> used.
> 
> The trace is still compiled, but mov is used instead of xor.
> | ucomisd xmm7, [r14]
> | mov r14d, 0x0
> | jnb 0x55c95c330014 ->1

#1.2 That what I expected to see in the message, right after ‘forbids emitting’

> 
>>> 
>>> This patch forbids emitting xor in `emit_loadi()` for jcc operations.
> 
>>> 
>>> Sergey Kaplun:
>>> * added the description and the test for the problem
>>> 
>>> Part of tarantool/tarantool#7230
>>> ---
>>> 

<skipped>

>>> --- a/src/lj_emit_x86.h
>>> +++ b/src/lj_emit_x86.h
>>> @@ -274,10 +274,12 @@ static void emit_movmroi(ASMState *as, Reg base, int32_t ofs, int32_t i)
>>> /* mov r, i / xor r, r */
>>> static void emit_loadi(ASMState *as, Reg r, int32_t i)
>>> {
>>> -  /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP. */
>>> +  /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP/jcc. */
>>>   if (i == 0 && !(LJ_32 && (IR(as->curins)->o == IR_HIOP ||
>>> 			    (as->curins+1 < as->T->nins &&
>>> -			     IR(as->curins+1)->o == IR_HIOP)))) {
>>> +			     IR(as->curins+1)->o == IR_HIOP))) &&
>>> +		!((*as->mcp == 0x0f && (as->mcp[1] & 0xf0) == XI_JCCn) ||
>>> +		  (*as->mcp & 0xf0) == XI_JCCs)) {
> 
>> I wonder if there can be a case with two instructions emitted between the comparison
>> and the jump. In such a case the xor still can sneak in?
> 
> To be honest I'm nott sure that this is possible due to self-written
> assembler. So we must keep it in mind when assembly the correspodning
> IRs.
> 
>> Another idea: there could be different instructions that relies on the flags, such as
>> x86Arith:XOg_ADC or x86Op:XO_CMOV, so the if condition above could be bigger?
> 
> AFAIK, it is imposible due to close attention to the assembly.

So, the answer for both Qs: ’not at the moment’. Which, in turn, implies ‘we will be
running around in search for all places our new optimization leads to a wrong code
generation’. Well, fine!

> 
>>>     emit_rr(as, XO_ARITH(XOg_XOR), r, r);
>>>   } else {
>>>     MCode *p = as->mcp;
>>> diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt
>>> index 5708822e..ad69e2cc 100644
>>> --- a/test/tarantool-tests/CMakeLists.txt
>>> +++ b/test/tarantool-tests/CMakeLists.txt
>>> @@ -65,6 +65,7 @@ add_subdirectory(gh-5813-resolving-of-c-symbols/stripped)
>>> add_subdirectory(gh-6098-fix-side-exit-patching-on-arm64)
>>> add_subdirectory(gh-6189-cur_L)
>>> add_subdirectory(lj-49-bad-lightuserdata)
>>> +add_subdirectory(lj-416-xor-before-jcc)
>>> add_subdirectory(lj-601-fix-gc-finderrfunc)
>>> add_subdirectory(lj-727-lightuserdata-itern)
>>> add_subdirectory(lj-flush-on-trace)
>>> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
>>> new file mode 100644
>>> index 00000000..7c6ab2b9
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
>>> @@ -0,0 +1,70 @@
>>> +local ffi = require('ffi')
>>> +local tap = require('tap')
>>> +
>>> +local test = tap.test('lj-416-xor-before-jcc')
>>> +test:plan(1)
>>> +
>>> +-- To reproduce this issue, we need:
>>> +-- 1) a register which contains the constant zero value
>>> +-- 2) a floating point comparison operation
>>> +-- 3) the comparison operation to perform a fused load, which in
>>> +--    turn needs to allocate a register, and for there to be no
>>> +--    free registers at that moment, and for the register chosen
>>> +--    for sacrifice to be holding the constant zero.
>>> +--
>>> +-- This leads to assembly code like the following:
>>> +--   ucomisd xmm7, [r14+0x18]
>>> +--   xor r14d, r14d
>>> +--   jnb 0x12a0e001c ->3
>>> +--
>>> +-- That xor is a big problem, as it modifies flags between the
>>> +-- ucomisd and the jnb, thereby causing the jnb to do the wrong
>>> +-- thing.
>>> +
>>> +ffi.cdef[[
>>> +  int test_xor_func(int a, int b, int c, int d, int e, int f, void * g, int h);
>>> +]]
>>> +local testxor = ffi.load('libtestxor')
> 
>> Should have a test for x86 platform, since changes are x86 only?
> 
> We have too small amount of tests for LuaJIT anyway, so I prefer to keep
> tests for all architectures for now.

But... I bet the aarch64 will require `a little bit` different register pressure
preparations, will it? IOW - this test is not applicable to other architectures
‘as is’.

> 
>>> +
>>> +local handler = setmetatable({}, {
>>> +  __newindex = function ()
>>> +    -- 0 and nil are suggested as differnt constant-zero values
>>> +    -- for the call and occupied different registers.
>>> +    testxor.test_xor_func(0, 0, 0, 0, 0, 0, nil, 0)
>>> +  end
>>> +})
>>> +
>>> +local mconf = {
>>> +  { use = false, value = 100 },
>>> +  { use = true,  value = 100 },
>>> +}
>>> +
>>> +local function testf()
>>> +  -- Generate register pressure.
>>> +  local value = 50
>>> +  for _, rule in ipairs(mconf) do
>>> +    if rule.use then
>>> +      value = rule.value
>>> +      break
>>> +    end
>>> +  end
>>> +
>>> +  -- This branch shouldn't be taken.
>>> +  if value <= 42 then
>>> +    return true
>>> +  end
>>> +
>>> +  -- Nothing to do, just call testxor with many arguments.
>>> +  handler[4] = 4
> 
>> If it is needed to use the metatable here then why?
> 
> It is needed for suitable registers layout. Unfortunately, I gave up to
> create test without it.
> 
>>> +end
>>> +
>>> +-- We need to create long side trace to generate register
>>> +-- pressure.
>>> +jit.opt.start('hotloop=1', 'hotexit=1')
>>> +for _ = 1, 3 do
>>> +  -- Don't use any `test` functions here to freeze the trace.
>>> +  assert (not testf())
>>> +end
>>> +test:ok(true, 'imposible branch is not taken')

#2 s/imposible/impossible/

>>> +
>>> +os.exit(test:check() and 0 or 1)
>>> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt
>>> new file mode 100644
>>> index 00000000..17aa9f9b
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt
>>> @@ -0,0 +1 @@
>>> +BuildTestCLib(libtestxor testxor.c)
>>> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c
>>> new file mode 100644
>>> index 00000000..32436d42
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c
> 
> <snipped>
> 
>>> -- 
>>> 2.34.1
>>> 
> 
> -- 
> Best regards,
> Sergey Kaplun



More information about the Tarantool-patches mailing list