From: Sergey Kaplun via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Sergey Ostanevich <sergos@tarantool.org>, Igor Munkin <imun@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH luajit] x86/x64: Check for jcc when using xor r, r in emit_loadi().
Date: Wed, 31 Aug 2022 11:48:02 +0300
Message-ID: <Yw8gQv211bjgGtRw@root>
In-Reply-To: <20220704093344.13522-1-skaplun@tarantool.org>

Sergos, Igor!

Thanks for your review!

I've updated the test with more comments, taking your suggestions into
account. See the iterative patch below. The branch is force-pushed.

===================================================================
diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
index 7c6ab2b9..39551184 100644
--- a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
+++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
@@ -13,9 +13,9 @@ test:plan(1)
 -- for sacrifice to be holding the constant zero.
 --
 -- This leads to assembly code like the following:
--- ucomisd xmm7, [r14+0x18]
+-- ucomisd xmm7, [r14]
 -- xor r14d, r14d
--- jnb 0x12a0e001c ->3
+-- jnb 0x555d3d250014 ->1
 --
 -- That xor is a big problem, as it modifies flags between the
 -- ucomisd and the jnb, thereby causing the jnb to do the wrong
@@ -34,36 +34,53 @@ local handler = setmetatable({}, {
   end
 })
 
+local MAGIC = 42
 local mconf = {
-  { use = false, value = 100 },
-  { use = true, value = 100 },
+  { use = false, value = MAGIC + 1 },
+  { use = true, value = MAGIC + 1 },
 }
 
 local function testf()
-  -- Generate register pressure.
-  local value = 50
+  -- All code below is needed for generating register pressure.
+  local value
+  -- The first trace to compile is this for loop, with `rule.use`
+  -- values is `true`.
   for _, rule in ipairs(mconf) do
+    -- The side trace starts here, when `rule.use` value is
+    -- `false`, returns to the `for` loop, where the function was
+    -- called, starts another iteration, calls `testf()` again and
+    -- ends at JITERL bytecode for the loop in this function.
     if rule.use then
       value = rule.value
       break
     end
   end
 
-  -- This branch shouldn't be taken.
-  if value <= 42 then
+  -- The code below is recorded with the following IRs:
+  -- .... SNAP #1 [ lj-416-xor-before-jcc.test.lua:44|---- ]
+  -- 0012 > num UGT 0009 +42
+  --
+  -- That leads to the following assembly:
+  -- ucomisd xmm7, [r14]
+  -- xor r14d, r14d
+  -- jnb 0x555d3d250014 ->1
+  --
+  -- As a result, this branch is taken due to the emitted `xor`
+  -- instruction until the issue is not resolved.
+  if value <= MAGIC then
     return true
   end
 
   -- Nothing to do, just call testxor with many arguments.
-  handler[4] = 4
+  handler.nothing = 'to do'
 end
 
 -- We need to create long side trace to generate register
--- pressure.
+-- pressure (see the comment in `testf()`).
 jit.opt.start('hotloop=1', 'hotexit=1')
 for _ = 1, 3 do
   -- Don't use any `test` functions here to freeze the trace.
-  assert (not testf())
+  assert(not testf())
 end
 test:ok(true, 'imposible branch is not taken')
===================================================================

On 04.07.22, Sergey Kaplun wrote:
>> From: Mike Pall <mike>
>>
>> Thanks to Peter Cawley.
>>
>> (cherry picked from commit fb5e522fbc0750c838ef6a926b11c5d870826183)
>>
>> To reproduce this issue, we need:
>> 1) a register which contains the constant zero value
>> 2) a floating point comparison operation
> The integer one will set the same flags (CF for the case of JNB) and XOR will
> clear it anyways.

I suppose that this is just not our case: for integer comparisons we
would have to use cdata objects. I failed to create a test case with
the corresponding behaviour.

>> 3) the comparison operation to perform a fused load, which in
>>    turn needs to allocate a register, and for there to be no
>>    free registers at that moment, and for the register chosen
>>    for sacrifice to be holding the constant zero.
> Unfortunately, it’s not clear what this register (I suppose it’s r14d) is used for.
> Is it an argument, needed after the fall-through in the same trace?

Yes, it is the argument for the future call.

| mov [rsp+0x8], r14
| mov [rsp], r15
| xor r9d, r9d
| xor r8d, r8d
| xor ecx, ecx
| xor edx, edx
| xor esi, esi
| xor edi, edi
| call rbx

> Why not it sank down below the branch?

I suppose that this is related to the self-written assembler, which
emits code from the bottom of a trace to its start. First the branch is
encoded, and only after that do we assemble the comparison operation.

> IIRC it is a dedicated register used for dispatch, so why is it used for sacrificing?

r14d is the DISPATCH register only for the LuaJIT VM.

>>
>> This leads to assembly code like the following:
>> | ucomisd xmm7, [r14]
>> | xor r14d, r14d
>> | jnb 0x55c95c330014 ->1
>>
>> That xor is a big problem, as it modifies flags between the
>> ucomisd and the jnb, thereby causing the jnb to do the wrong
>> thing.
> There’s no trace after the fix, so it’s hard to grasp the resolution
> used.

The trace is still compiled, but mov is used instead of xor.

| ucomisd xmm7, [r14]
| mov r14d, 0x0
| jnb 0x55c95c330014 ->1

>>
>> This patch forbids emitting xor in `emit_loadi()` for jcc operations.
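
To make the flag interaction concrete, here is a minimal C model of the
three instructions from the dumps above. It is only a sketch: the `Flags`
struct and all helper names are made up for illustration, and the usage
note assumes that xmm7 holds the constant 42 while the memory operand
holds `value`, which the dump itself does not state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the flags that matter here. */
typedef struct { bool cf, zf, pf; } Flags;

/*
** ucomisd a, b:
**   unordered -> ZF=PF=CF=1, a < b -> CF=1, a == b -> ZF=1, a > b -> all 0.
*/
Flags ucomisd(double a, double b)
{
  Flags f = { false, false, false };
  if (a != a || b != b)  /* At least one operand is NaN. */
    f.cf = f.zf = f.pf = true;
  else if (a < b)
    f.cf = true;
  else if (a == b)
    f.zf = true;
  return f;
}

/* xor r, r: the result is zero, so CF is cleared, ZF and PF are set. */
Flags xor_rr(Flags f)
{
  f.cf = false;
  f.zf = true;
  f.pf = true;
  return f;
}

/* mov r, imm: the flags are left untouched. */
Flags mov_ri(Flags f)
{
  return f;
}

/* jnb: jump if CF == 0. */
bool jnb_taken(Flags f)
{
  return !f.cf;
}

/* Models "ucomisd; xor or mov; jnb" and tells whether the jump is taken. */
bool exit_taken(double lhs, double rhs, bool use_xor)
{
  Flags f = ucomisd(lhs, rhs);
  f = use_xor ? xor_rr(f) : mov_ri(f);
  return jnb_taken(f);
}
```

Under that assumption, `exit_taken(42.0, 43.0, false)` models the fixed
trace: CF stays set and the jnb falls through. With `use_xor` set to
true, CF is cleared in between and the jump is taken regardless of the
comparison result.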
>>
>> Sergey Kaplun:
>> * added the description and the test for the problem
>>
>> Part of tarantool/tarantool#7230
>> ---
>>
>> Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-416-xor-before-jcc-full-ci
>> Issues:
>> * https://github.com/LuaJIT/LuaJIT/issues/416
>> * https://github.com/tarantool/tarantool/issues/7230
>>
>> Changelog entry (I suggest to update this entry with the each
>> corresponding bump):
>> ===================================================================
>> ## bugfix/luajit
>>
>> Backported patches from vanilla LuaJIT trunk (gh-7230). In the scope of this
>> activity, the following issues have been resolved:
>> * Fixed `emit_loadi()` on x86/x64 emitting xor between condition check
>>   and jump instructions.
>> ===================================================================
>>
>>  src/lj_emit_x86.h                         |  6 +-
>>  test/tarantool-tests/CMakeLists.txt       |  1 +
>>  .../lj-416-xor-before-jcc.test.lua        | 70 +++++++++++++++++++
>>  .../lj-416-xor-before-jcc/CMakeLists.txt  |  1 +
>>  .../lj-416-xor-before-jcc/testxor.c       | 14 ++++
>>  5 files changed, 90 insertions(+), 2 deletions(-)
>>  create mode 100644 test/tarantool-tests/lj-416-xor-before-jcc.test.lua
>>  create mode 100644 test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt
>>  create mode 100644 test/tarantool-tests/lj-416-xor-before-jcc/testxor.c
>>
>> diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h
>> index 5207f9da..6b58306b 100644
>> --- a/src/lj_emit_x86.h
>> +++ b/src/lj_emit_x86.h
>> @@ -274,10 +274,12 @@ static void emit_movmroi(ASMState *as, Reg base, int32_t ofs, int32_t i)
>>  /* mov r, i / xor r, r */
>>  static void emit_loadi(ASMState *as, Reg r, int32_t i)
>>  {
>> -  /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP. */
>> +  /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP/jcc. */
>>    if (i == 0 && !(LJ_32 && (IR(as->curins)->o == IR_HIOP ||
>>                              (as->curins+1 < as->T->nins &&
>> -                             IR(as->curins+1)->o == IR_HIOP)))) {
>> +                             IR(as->curins+1)->o == IR_HIOP))) &&
>> +      !((*as->mcp == 0x0f && (as->mcp[1] & 0xf0) == XI_JCCn) ||
>> +        (*as->mcp & 0xf0) == XI_JCCs)) {
> I wonder if there can be a case with two instructions emitted between the comparison
> and the jump. In such a case the xor still can sneak in?

To be honest, I'm not sure that this is possible, given how the
self-written assembler works. So we must keep it in mind when
assembling the corresponding IRs.

> Another idea: there could be different instructions that relies on the flags, such as
> x86Arith:XOg_ADC or x86Op:XO_CMOV, so the if condition above could be bigger?

AFAIK, it is impossible due to the close attention paid to the assembly.

>>      emit_rr(as, XO_ARITH(XOg_XOR), r, r);
>>    } else {
>>      MCode *p = as->mcp;
>> diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt
>> index 5708822e..ad69e2cc 100644
>> --- a/test/tarantool-tests/CMakeLists.txt
>> +++ b/test/tarantool-tests/CMakeLists.txt
>> @@ -65,6 +65,7 @@ add_subdirectory(gh-5813-resolving-of-c-symbols/stripped)
>>  add_subdirectory(gh-6098-fix-side-exit-patching-on-arm64)
>>  add_subdirectory(gh-6189-cur_L)
>>  add_subdirectory(lj-49-bad-lightuserdata)
>> +add_subdirectory(lj-416-xor-before-jcc)
>>  add_subdirectory(lj-601-fix-gc-finderrfunc)
>>  add_subdirectory(lj-727-lightuserdata-itern)
>>  add_subdirectory(lj-flush-on-trace)
>> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
>> new file mode 100644
>> index 00000000..7c6ab2b9
>> --- /dev/null
>> +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua
>> @@ -0,0 +1,70 @@
>> +local ffi = require('ffi')
>> +local tap = require('tap')
>> +
>> +local test = tap.test('lj-416-xor-before-jcc')
>> +test:plan(1)
>> +
>> +-- To reproduce this issue, we need:
>> +-- 1) a register which contains the constant zero value
>> +-- 2) a floating point comparison operation
>> +-- 3) the comparison operation to perform a fused load, which in
>> +-- turn needs to allocate a register, and for there to be no
>> +-- free registers at that moment, and for the register chosen
>> +-- for sacrifice to be holding the constant zero.
>> +--
>> +-- This leads to assembly code like the following:
>> +-- ucomisd xmm7, [r14+0x18]
>> +-- xor r14d, r14d
>> +-- jnb 0x12a0e001c ->3
>> +--
>> +-- That xor is a big problem, as it modifies flags between the
>> +-- ucomisd and the jnb, thereby causing the jnb to do the wrong
>> +-- thing.
>> +
>> +ffi.cdef[[
>> +  int test_xor_func(int a, int b, int c, int d, int e, int f, void * g, int h);
>> +]]
>> +local testxor = ffi.load('libtestxor')
> Should have a test for x86 platform, since changes are x86 only?

We have too few tests for LuaJIT anyway, so I prefer to keep the test
for all architectures for now.

>> +
>> +local handler = setmetatable({}, {
>> +  __newindex = function ()
>> +    -- 0 and nil are suggested as differnt constant-zero values
>> +    -- for the call and occupied different registers.
>> +    testxor.test_xor_func(0, 0, 0, 0, 0, 0, nil, 0)
>> +  end
>> +})
>> +
>> +local mconf = {
>> +  { use = false, value = 100 },
>> +  { use = true, value = 100 },
>> +}
>> +
>> +local function testf()
>> +  -- Generate register pressure.
>> +  local value = 50
>> +  for _, rule in ipairs(mconf) do
>> +    if rule.use then
>> +      value = rule.value
>> +      break
>> +    end
>> +  end
>> +
>> +  -- This branch shouldn't be taken.
>> +  if value <= 42 then
>> +    return true
>> +  end
>> +
>> +  -- Nothing to do, just call testxor with many arguments.
>> +  handler[4] = 4
> If it is needed to use the metatable here then why?

It is needed to get a suitable register layout. Unfortunately, I gave
up on creating a test without it.

>> +end
>> +
>> +-- We need to create long side trace to generate register
>> +-- pressure.
>> +jit.opt.start('hotloop=1', 'hotexit=1')
>> +for _ = 1, 3 do
>> +  -- Don't use any `test` functions here to freeze the trace.
>> +  assert (not testf())
>> +end
>> +test:ok(true, 'imposible branch is not taken')
>> +
>> +os.exit(test:check() and 0 or 1)
>> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt
>> new file mode 100644
>> index 00000000..17aa9f9b
>> --- /dev/null
>> +++ b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt
>> @@ -0,0 +1 @@
>> +BuildTestCLib(libtestxor testxor.c)
>> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c
>> new file mode 100644
>> index 00000000..32436d42
>> --- /dev/null
>> +++ b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c

<snipped>

>> --
>> 2.34.1
>>

--
Best regards,
Sergey Kaplun
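
For readers following the `emit_loadi()` hunk quoted above: the added
check works because machine code is emitted backwards, so `as->mcp`
points at the instruction that will run right after the one being
emitted. Below is a minimal standalone sketch of that idea. The helpers
`next_is_jcc()` and `emit_load_zero()` are hypothetical and are not
LuaJIT functions; the opcode values are the standard x86 Jcc encodings
(7x for the short form, 0f 8x for the near form); only registers 0..7
are handled, so no REX prefix is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Names follow the patch; values are the standard x86 Jcc opcode ranges. */
enum { XI_JCCs = 0x70, XI_JCCn = 0x80 };

/* Nonzero if the code at p starts with a conditional jump (short or near). */
int next_is_jcc(const uint8_t *p)
{
  return (p[0] == 0x0f && (p[1] & 0xf0) == XI_JCCn) ||
         (p[0] & 0xf0) == XI_JCCs;
}

/*
** Hypothetical helper: emit "load zero into reg" in front of the code at
** mcp (the code grows downwards) and return the new start of the code.
*/
uint8_t *emit_load_zero(uint8_t *mcp, unsigned reg)
{
  assert(reg < 8);  /* Sketch only: no REX prefix handling. */
  if (!next_is_jcc(mcp)) {
    /* Flags are not consumed by the next instruction: use the short xor. */
    mcp -= 2;
    mcp[0] = 0x33;  /* xor r32, r/m32 */
    mcp[1] = (uint8_t)(0xc0 | (reg << 3) | reg);  /* ModRM: reg, reg */
  } else {
    /* A jcc follows: use mov, which leaves the flags untouched. */
    mcp -= 5;
    mcp[0] = (uint8_t)(0xb8 + reg);  /* mov r32, imm32 */
    mcp[1] = mcp[2] = mcp[3] = mcp[4] = 0;  /* imm32 = 0 */
  }
  return mcp;
}
```

If the buffer already starts with `0f 83` (jnb rel32) or `73` (jnb rel8),
the sketch emits the 5-byte mov; in front of any other instruction it
emits the 2-byte xor.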