[Tarantool-patches] [PATCH luajit] x86/x64: Check for jcc when using xor r, r in emit_loadi().
sergos
sergos at tarantool.org
Wed Aug 31 19:04:08 MSK 2022
Hi!
Thanks and with last two nits marked with #1 and #2 - LGTM.
Sergos
> On 31 Aug 2022, at 11:48, Sergey Kaplun <skaplun at tarantool.org> wrote:
>
> Sergos, Igor!
>
> Thanks for your review!
>
> I've updated test with more comments considering your suggestions.
> See the iterative patch below:
>
> Branch is force-pushed.
> ===================================================================
> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
> index 7c6ab2b9..39551184 100644
> --- a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
> +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
> @@ -13,9 +13,9 @@ test:plan(1)
> -- for sacrifice to be holding the constant zero.
> --
> -- This leads to assembly code like the following:
> --- ucomisd xmm7, [r14+0x18]
> +-- ucomisd xmm7, [r14]
> -- xor r14d, r14d
> --- jnb 0x12a0e001c ->3
> +-- jnb 0x555d3d250014 ->1
> --
> -- That xor is a big problem, as it modifies flags between the
> -- ucomisd and the jnb, thereby causing the jnb to do the wrong
#1.1 Still missing the correct assembly below in the message.
> @@ -34,36 +34,53 @@ local handler = setmetatable({}, {
> end
> })
>
<skipped>
>> There’s no trace after the fix, so it’s hard to grasp the resolution
>> used.
>
> The trace is still compiled, but mov is used instead of xor.
> | ucomisd xmm7, [r14]
> | mov r14d, 0x0
> | jnb 0x55c95c330014 ->1
#1.2 That what I expected to see in the message, right after ‘forbids emitting’
>
>>>
>>> This patch forbids emitting xor in `emit_loadi()` for jcc operations.
>
>>>
>>> Sergey Kaplun:
>>> * added the description and the test for the problem
>>>
>>> Part of tarantool/tarantool#7230
>>> ---
>>>
<skipped>
>>> --- a/src/lj_emit_x86.h
>>> +++ b/src/lj_emit_x86.h
>>> @@ -274,10 +274,12 @@ static void emit_movmroi(ASMState *as, Reg base, int32_t ofs, int32_t i)
>>> /* mov r, i / xor r, r */
>>> static void emit_loadi(ASMState *as, Reg r, int32_t i)
>>> {
>>> - /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP. */
>>> + /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP/jcc. */
>>> if (i == 0 && !(LJ_32 && (IR(as->curins)->o == IR_HIOP ||
>>> (as->curins+1 < as->T->nins &&
>>> - IR(as->curins+1)->o == IR_HIOP)))) {
>>> + IR(as->curins+1)->o == IR_HIOP))) &&
>>> + !((*as->mcp == 0x0f && (as->mcp[1] & 0xf0) == XI_JCCn) ||
>>> + (*as->mcp & 0xf0) == XI_JCCs)) {
>
>> I wonder if there can be a case with two instructions emitted between the comparison
>> and the jump. In such a case the xor still can sneak in?
>
> To be honest I'm nott sure that this is possible due to self-written
> assembler. So we must keep it in mind when assembly the correspodning
> IRs.
>
>> Another idea: there could be different instructions that relies on the flags, such as
>> x86Arith:XOg_ADC or x86Op:XO_CMOV, so the if condition above could be bigger?
>
> AFAIK, it is imposible due to close attention to the assembly.
So, the answer for both Qs: ’not at the moment’. Which, in turn, implies ‘we will be
running around in search for all places our new optimization leads to a wrong code
generation’. Well, fine!
>
>>> emit_rr(as, XO_ARITH(XOg_XOR), r, r);
>>> } else {
>>> MCode *p = as->mcp;
>>> diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt
>>> index 5708822e..ad69e2cc 100644
>>> --- a/test/tarantool-tests/CMakeLists.txt
>>> +++ b/test/tarantool-tests/CMakeLists.txt
>>> @@ -65,6 +65,7 @@ add_subdirectory(gh-5813-resolving-of-c-symbols/stripped)
>>> add_subdirectory(gh-6098-fix-side-exit-patching-on-arm64)
>>> add_subdirectory(gh-6189-cur_L)
>>> add_subdirectory(lj-49-bad-lightuserdata)
>>> +add_subdirectory(lj-416-xor-before-jcc)
>>> add_subdirectory(lj-601-fix-gc-finderrfunc)
>>> add_subdirectory(lj-727-lightuserdata-itern)
>>> add_subdirectory(lj-flush-on-trace)
>>> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
>>> new file mode 100644
>>> index 00000000..7c6ab2b9
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
>>> @@ -0,0 +1,70 @@
>>> +local ffi = require('ffi')
>>> +local tap = require('tap')
>>> +
>>> +local test = tap.test('lj-416-xor-before-jcc')
>>> +test:plan(1)
>>> +
>>> +-- To reproduce this issue, we need:
>>> +-- 1) a register which contains the constant zero value
>>> +-- 2) a floating point comparison operation
>>> +-- 3) the comparison operation to perform a fused load, which in
>>> +-- turn needs to allocate a register, and for there to be no
>>> +-- free registers at that moment, and for the register chosen
>>> +-- for sacrifice to be holding the constant zero.
>>> +--
>>> +-- This leads to assembly code like the following:
>>> +-- ucomisd xmm7, [r14+0x18]
>>> +-- xor r14d, r14d
>>> +-- jnb 0x12a0e001c ->3
>>> +--
>>> +-- That xor is a big problem, as it modifies flags between the
>>> +-- ucomisd and the jnb, thereby causing the jnb to do the wrong
>>> +-- thing.
>>> +
>>> +ffi.cdef[[
>>> + int test_xor_func(int a, int b, int c, int d, int e, int f, void * g, int h);
>>> +]]
>>> +local testxor = ffi.load('libtestxor')
>
>> Should have a test for x86 platform, since changes are x86 only?
>
> We have too small amount of tests for LuaJIT anyway, so I prefer to keep
> tests for all architectures for now.
But... I bet the aarch64 will require `a little bit` different register pressure
preparations, will it? IOW - this test is not applicable to other architectures
‘as is’.
>
>>> +
>>> +local handler = setmetatable({}, {
>>> + __newindex = function ()
>>> + -- 0 and nil are suggested as differnt constant-zero values
>>> + -- for the call and occupied different registers.
>>> + testxor.test_xor_func(0, 0, 0, 0, 0, 0, nil, 0)
>>> + end
>>> +})
>>> +
>>> +local mconf = {
>>> + { use = false, value = 100 },
>>> + { use = true, value = 100 },
>>> +}
>>> +
>>> +local function testf()
>>> + -- Generate register pressure.
>>> + local value = 50
>>> + for _, rule in ipairs(mconf) do
>>> + if rule.use then
>>> + value = rule.value
>>> + break
>>> + end
>>> + end
>>> +
>>> + -- This branch shouldn't be taken.
>>> + if value <= 42 then
>>> + return true
>>> + end
>>> +
>>> + -- Nothing to do, just call testxor with many arguments.
>>> + handler[4] = 4
>
>> If it is needed to use the metatable here then why?
>
> It is needed for suitable registers layout. Unfortunately, I gave up to
> create test without it.
>
>>> +end
>>> +
>>> +-- We need to create long side trace to generate register
>>> +-- pressure.
>>> +jit.opt.start('hotloop=1', 'hotexit=1')
>>> +for _ = 1, 3 do
>>> + -- Don't use any `test` functions here to freeze the trace.
>>> + assert (not testf())
>>> +end
>>> +test:ok(true, 'imposible branch is not taken')
#2 s/imposible/impossible/
>>> +
>>> +os.exit(test:check() and 0 or 1)
>>> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt
>>> new file mode 100644
>>> index 00000000..17aa9f9b
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt
>>> @@ -0,0 +1 @@
>>> +BuildTestCLib(libtestxor testxor.c)
>>> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c
>>> new file mode 100644
>>> index 00000000..32436d42
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c
>
> <snipped>
>
>>> --
>>> 2.34.1
>>>
>
> --
> Best regards,
> Sergey Kaplun
More information about the Tarantool-patches
mailing list