From: sergos via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Sergey Kaplun <skaplun@tarantool.org> Cc: tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH luajit] x86/x64: Check for jcc when using xor r, r in emit_loadi(). Date: Wed, 31 Aug 2022 19:04:08 +0300 [thread overview] Message-ID: <07C1C422-04A5-46D5-9D61-ED6B9BC90556@tarantool.org> (raw) In-Reply-To: <Yw8gQv211bjgGtRw@root> Hi! Thanks and with last two nits marked with #1 and #2 - LGTM. Sergos > On 31 Aug 2022, at 11:48, Sergey Kaplun <skaplun@tarantool.org> wrote: > > Sergos, Igor! > > Thanks for your review! > > I've updated test with more comments considering your suggestions. > See the iterative patch below: > > Branch is force-pushed. > =================================================================== > diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua > index 7c6ab2b9..39551184 100644 > --- a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua > +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua > @@ -13,9 +13,9 @@ test:plan(1) > -- for sacrifice to be holding the constant zero. > -- > -- This leads to assembly code like the following: > --- ucomisd xmm7, [r14+0x18] > +-- ucomisd xmm7, [r14] > -- xor r14d, r14d > --- jnb 0x12a0e001c ->3 > +-- jnb 0x555d3d250014 ->1 > -- > -- That xor is a big problem, as it modifies flags between the > -- ucomisd and the jnb, thereby causing the jnb to do the wrong #1.1 Still missing the correct assembly below in the message. > @@ -34,36 +34,53 @@ local handler = setmetatable({}, { > end > }) > <skipped> >> There’s no trace after the fix, so it’s hard to grasp the resolution >> used. > > The trace is still compiled, but mov is used instead of xor. > | ucomisd xmm7, [r14] > | mov r14d, 0x0 > | jnb 0x55c95c330014 ->1 #1.2 That what I expected to see in the message, right after ‘forbids emitting’ > >>> >>> This patch forbids emitting xor in `emit_loadi()` for jcc operations. > >>> >>> Sergey Kaplun: >>> * added the description and the test for the problem >>> >>> Part of tarantool/tarantool#7230 >>> --- >>> <skipped> >>> --- a/src/lj_emit_x86.h >>> +++ b/src/lj_emit_x86.h >>> @@ -274,10 +274,12 @@ static void emit_movmroi(ASMState *as, Reg base, int32_t ofs, int32_t i) >>> /* mov r, i / xor r, r */ >>> static void emit_loadi(ASMState *as, Reg r, int32_t i) >>> { >>> - /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP. */ >>> + /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP/jcc. */ >>> if (i == 0 && !(LJ_32 && (IR(as->curins)->o == IR_HIOP || >>> (as->curins+1 < as->T->nins && >>> - IR(as->curins+1)->o == IR_HIOP)))) { >>> + IR(as->curins+1)->o == IR_HIOP))) && >>> + !((*as->mcp == 0x0f && (as->mcp[1] & 0xf0) == XI_JCCn) || >>> + (*as->mcp & 0xf0) == XI_JCCs)) { > >> I wonder if there can be a case with two instructions emitted between the comparison >> and the jump. In such a case the xor still can sneak in? > > To be honest I'm nott sure that this is possible due to self-written > assembler. So we must keep it in mind when assembly the correspodning > IRs. > >> Another idea: there could be different instructions that relies on the flags, such as >> x86Arith:XOg_ADC or x86Op:XO_CMOV, so the if condition above could be bigger? > > AFAIK, it is imposible due to close attention to the assembly. So, the answer for both Qs: ’not at the moment’. Which, in turn, implies ‘we will be running around in search for all places our new optimization leads to a wrong code generation’. Well, fine! > >>> emit_rr(as, XO_ARITH(XOg_XOR), r, r); >>> } else { >>> MCode *p = as->mcp; >>> diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt >>> index 5708822e..ad69e2cc 100644 >>> --- a/test/tarantool-tests/CMakeLists.txt >>> +++ b/test/tarantool-tests/CMakeLists.txt >>> @@ -65,6 +65,7 @@ add_subdirectory(gh-5813-resolving-of-c-symbols/stripped) >>> add_subdirectory(gh-6098-fix-side-exit-patching-on-arm64) >>> add_subdirectory(gh-6189-cur_L) >>> add_subdirectory(lj-49-bad-lightuserdata) >>> +add_subdirectory(lj-416-xor-before-jcc) >>> add_subdirectory(lj-601-fix-gc-finderrfunc) >>> add_subdirectory(lj-727-lightuserdata-itern) >>> add_subdirectory(lj-flush-on-trace) >>> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua >>> new file mode 100644 >>> index 00000000..7c6ab2b9 >>> --- /dev/null >>> +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua >>> @@ -0,0 +1,70 @@ >>> +local ffi = require('ffi') >>> +local tap = require('tap') >>> + >>> +local test = tap.test('lj-416-xor-before-jcc') >>> +test:plan(1) >>> + >>> +-- To reproduce this issue, we need: >>> +-- 1) a register which contains the constant zero value >>> +-- 2) a floating point comparison operation >>> +-- 3) the comparison operation to perform a fused load, which in >>> +-- turn needs to allocate a register, and for there to be no >>> +-- free registers at that moment, and for the register chosen >>> +-- for sacrifice to be holding the constant zero. >>> +-- >>> +-- This leads to assembly code like the following: >>> +-- ucomisd xmm7, [r14+0x18] >>> +-- xor r14d, r14d >>> +-- jnb 0x12a0e001c ->3 >>> +-- >>> +-- That xor is a big problem, as it modifies flags between the >>> +-- ucomisd and the jnb, thereby causing the jnb to do the wrong >>> +-- thing. >>> + >>> +ffi.cdef[[ >>> + int test_xor_func(int a, int b, int c, int d, int e, int f, void * g, int h); >>> +]] >>> +local testxor = ffi.load('libtestxor') > >> Should have a test for x86 platform, since changes are x86 only? > > We have too small amount of tests for LuaJIT anyway, so I prefer to keep > tests for all architectures for now. But... I bet the aarch64 will require `a little bit` different register pressure preparations, will it? IOW - this test is not applicable to other architectures ‘as is’. > >>> + >>> +local handler = setmetatable({}, { >>> + __newindex = function () >>> + -- 0 and nil are suggested as differnt constant-zero values >>> + -- for the call and occupied different registers. >>> + testxor.test_xor_func(0, 0, 0, 0, 0, 0, nil, 0) >>> + end >>> +}) >>> + >>> +local mconf = { >>> + { use = false, value = 100 }, >>> + { use = true, value = 100 }, >>> +} >>> + >>> +local function testf() >>> + -- Generate register pressure. >>> + local value = 50 >>> + for _, rule in ipairs(mconf) do >>> + if rule.use then >>> + value = rule.value >>> + break >>> + end >>> + end >>> + >>> + -- This branch shouldn't be taken. >>> + if value <= 42 then >>> + return true >>> + end >>> + >>> + -- Nothing to do, just call testxor with many arguments. >>> + handler[4] = 4 > >> If it is needed to use the metatable here then why? > > It is needed for suitable registers layout. Unfortunately, I gave up to > create test without it. > >>> +end >>> + >>> +-- We need to create long side trace to generate register >>> +-- pressure. >>> +jit.opt.start('hotloop=1', 'hotexit=1') >>> +for _ = 1, 3 do >>> + -- Don't use any `test` functions here to freeze the trace. >>> + assert (not testf()) >>> +end >>> +test:ok(true, 'imposible branch is not taken') #2 s/imposible/impossible/ >>> + >>> +os.exit(test:check() and 0 or 1) >>> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt >>> new file mode 100644 >>> index 00000000..17aa9f9b >>> --- /dev/null >>> +++ b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt >>> @@ -0,0 +1 @@ >>> +BuildTestCLib(libtestxor testxor.c) >>> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c >>> new file mode 100644 >>> index 00000000..32436d42 >>> --- /dev/null >>> +++ b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c > > <snipped> > >>> -- >>> 2.34.1 >>> > > -- > Best regards, > Sergey Kaplun
next prev parent reply other threads:[~2022-08-31 16:04 UTC|newest] Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top 2022-07-04 9:33 Sergey Kaplun via Tarantool-patches 2022-07-05 15:10 ` Igor Munkin via Tarantool-patches 2022-07-06 10:30 ` sergos via Tarantool-patches 2022-11-11 10:20 ` Igor Munkin via Tarantool-patches 2022-11-22 4:36 ` Sergey Kaplun via Tarantool-patches 2022-11-22 5:45 ` Igor Munkin via Tarantool-patches 2022-08-31 8:48 ` Sergey Kaplun via Tarantool-patches 2022-08-31 16:04 ` sergos via Tarantool-patches [this message] 2022-09-01 10:32 ` Sergey Kaplun via Tarantool-patches 2022-11-23 7:50 ` Igor Munkin via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=07C1C422-04A5-46D5-9D61-ED6B9BC90556@tarantool.org \ --to=tarantool-patches@dev.tarantool.org \ --cc=sergos@tarantool.org \ --cc=skaplun@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH luajit] x86/x64: Check for jcc when using xor r, r in emit_loadi().' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox