From: sergos via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Sergey Kaplun <skaplun@tarantool.org> Cc: tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH luajit] x86/x64: Check for jcc when using xor r, r in emit_loadi(). Date: Wed, 6 Jul 2022 13:30:05 +0300 [thread overview] Message-ID: <813CA8F4-3B19-4370-A859-12B956ACCF3C@tarantool.org> (raw) In-Reply-To: <20220704093344.13522-1-skaplun@tarantool.org> Hi! Thanks for the patch! I won’t veto on the patch, since it’s from upstream. Still I have got a number of questions on it’s crutch nature. Perhaps we need to follow up with some test cases to luajit. Some description fixes are needed also. Sergos > On 4 Jul 2022, at 12:33, Sergey Kaplun <skaplun@tarantool.org> wrote: > > From: Mike Pall <mike> > > Thanks to Peter Cawley. > > (cherry picked from commit fb5e522fbc0750c838ef6a926b11c5d870826183) > > To reproduce this issue, we need: > 1) a register which contains the constant zero value > 2) a floating point comparison operation The integer one will set the same flags (CF for the case of JNB) and XOR will clear it anyways. > 3) the comparison operation to perform a fused load, which in > turn needs to allocate a register, and for there to be no > free registers at that moment, and for the register chosen > for sacrifice to be holding the constant zero. > Unfortunately, it’s not clear what this register (I suppose it’s r14d) is used for. Is it an argument, needed after the fall-through in the same trace? Why not it sank down below the branch? IIRC it is a dedicated register used for dispatch, so why is it used for sacrificing? > This leads to assembly code like the following: > | ucomisd xmm7, [r14+0x18] > | xor r14d, r14d > | jnb 0x12a0e001c ->3 > > That xor is a big problem, as it modifies flags between the > ucomisd and the jnb, thereby causing the jnb to do the wrong > thing. > There’s no trace after the fix, so it’s hard to grasp the resolution used. > This patch forbids emitting xor in `emit_loadi()` for jcc operations. > > Sergey Kaplun: > * added the description and the test for the problem > > Part of tarantool/tarantool#7230 > --- > > Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-416-xor-before-jcc-full-ci > Issues: > * https://github.com/LuaJIT/LuaJIT/issues/416 > * https://github.com/tarantool/tarantool/issues/7230 > > Changelog entry (I suggest to update this entry with the each > corresponding bump): > =================================================================== > ## bugfix/luajit > > Backported patches from vanilla LuaJIT trunk (gh-7230). In the scope of this > activity, the following issues have been resolved: > * Fixed `emit_loadi()` on x86/x64 emitting xor between condition check > and jump instructions. > =================================================================== > > src/lj_emit_x86.h | 6 +- > test/tarantool-tests/CMakeLists.txt | 1 + > .../lj-416-xor-before-jcc.test.lua | 70 +++++++++++++++++++ > .../lj-416-xor-before-jcc/CMakeLists.txt | 1 + > .../lj-416-xor-before-jcc/testxor.c | 14 ++++ > 5 files changed, 90 insertions(+), 2 deletions(-) > create mode 100644 test/tarantool-tests/lj-416-xor-before-jcc.test.lua > create mode 100644 test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt > create mode 100644 test/tarantool-tests/lj-416-xor-before-jcc/testxor.c > > diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h > index 5207f9da..6b58306b 100644 > --- a/src/lj_emit_x86.h > +++ b/src/lj_emit_x86.h > @@ -274,10 +274,12 @@ static void emit_movmroi(ASMState *as, Reg base, int32_t ofs, int32_t i) > /* mov r, i / xor r, r */ > static void emit_loadi(ASMState *as, Reg r, int32_t i) > { > - /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP. */ > + /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP/jcc. */ > if (i == 0 && !(LJ_32 && (IR(as->curins)->o == IR_HIOP || > (as->curins+1 < as->T->nins && > - IR(as->curins+1)->o == IR_HIOP)))) { > + IR(as->curins+1)->o == IR_HIOP))) && > + !((*as->mcp == 0x0f && (as->mcp[1] & 0xf0) == XI_JCCn) || > + (*as->mcp & 0xf0) == XI_JCCs)) { I wonder if there can be a case with two instructions emitted between the comparison and the jump. In such a case the xor still can sneak in? Another idea: there could be different instructions that relies on the flags, such as x86Arith:XOg_ADC or x86Op:XO_CMOV, so the if condition above could be bigger? > emit_rr(as, XO_ARITH(XOg_XOR), r, r); > } else { > MCode *p = as->mcp; > diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt > index 5708822e..ad69e2cc 100644 > --- a/test/tarantool-tests/CMakeLists.txt > +++ b/test/tarantool-tests/CMakeLists.txt > @@ -65,6 +65,7 @@ add_subdirectory(gh-5813-resolving-of-c-symbols/stripped) > add_subdirectory(gh-6098-fix-side-exit-patching-on-arm64) > add_subdirectory(gh-6189-cur_L) > add_subdirectory(lj-49-bad-lightuserdata) > +add_subdirectory(lj-416-xor-before-jcc) > add_subdirectory(lj-601-fix-gc-finderrfunc) > add_subdirectory(lj-727-lightuserdata-itern) > add_subdirectory(lj-flush-on-trace) > diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua > new file mode 100644 > index 00000000..7c6ab2b9 > --- /dev/null > +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua > @@ -0,0 +1,70 @@ > +local ffi = require('ffi') > +local tap = require('tap') > + > +local test = tap.test('lj-416-xor-before-jcc') > +test:plan(1) > + > +-- To reproduce this issue, we need: > +-- 1) a register which contains the constant zero value > +-- 2) a floating point comparison operation > +-- 3) the comparison operation to perform a fused load, which in > +-- turn needs to allocate a register, and for there to be no > +-- free registers at that moment, and for the register chosen > +-- for sacrifice to be holding the constant zero. > +-- > +-- This leads to assembly code like the following: > +-- ucomisd xmm7, [r14+0x18] > +-- xor r14d, r14d > +-- jnb 0x12a0e001c ->3 > +-- > +-- That xor is a big problem, as it modifies flags between the > +-- ucomisd and the jnb, thereby causing the jnb to do the wrong > +-- thing. Same problems with explanation, copy from <fixed> message above > + > +ffi.cdef[[ > + int test_xor_func(int a, int b, int c, int d, int e, int f, void * g, int h); > +]] > +local testxor = ffi.load('libtestxor') Should have a test for x86 platform, since changes are x86 only? > + > +local handler = setmetatable({}, { > + __newindex = function () > + -- 0 and nil are suggested as differnt constant-zero values > + -- for the call and occupied different registers. > + testxor.test_xor_func(0, 0, 0, 0, 0, 0, nil, 0) > + end > +}) > + > +local mconf = { > + { use = false, value = 100 }, > + { use = true, value = 100 }, > +} > + > +local function testf() > + -- Generate register pressure. > + local value = 50 > + for _, rule in ipairs(mconf) do > + if rule.use then > + value = rule.value > + break > + end > + end > + > + -- This branch shouldn't be taken. > + if value <= 42 then > + return true > + end > + > + -- Nothing to do, just call testxor with many arguments. > + handler[4] = 4 If it is needed to use the metatable here then why? > +end > + > +-- We need to create long side trace to generate register > +-- pressure. > +jit.opt.start('hotloop=1', 'hotexit=1') > +for _ = 1, 3 do > + -- Don't use any `test` functions here to freeze the trace. > + assert (not testf()) > +end > +test:ok(true, 'imposible branch is not taken') > + > +os.exit(test:check() and 0 or 1) > diff --git a/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt > new file mode 100644 > index 00000000..17aa9f9b > --- /dev/null > +++ b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt > @@ -0,0 +1 @@ > +BuildTestCLib(libtestxor testxor.c) > diff --git a/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c > new file mode 100644 > index 00000000..32436d42 > --- /dev/null > +++ b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c > @@ -0,0 +1,14 @@ > +#define UNUSED(x) ((void)x) > + > +int test_xor_func(int a, int b, int c, int d, int e, int f, void *g, int h) > +{ > + UNUSED(a); > + UNUSED(b); > + UNUSED(c); > + UNUSED(d); > + UNUSED(e); > + UNUSED(f); > + UNUSED(g); > + UNUSED(h); > + return 0; > +} > -- > 2.34.1 >
next prev parent reply other threads:[~2022-07-06 10:30 UTC|newest] Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top 2022-07-04 9:33 Sergey Kaplun via Tarantool-patches 2022-07-05 15:10 ` Igor Munkin via Tarantool-patches 2022-07-06 10:30 ` sergos via Tarantool-patches [this message] 2022-11-11 10:20 ` Igor Munkin via Tarantool-patches 2022-11-22 4:36 ` Sergey Kaplun via Tarantool-patches 2022-11-22 5:45 ` Igor Munkin via Tarantool-patches 2022-08-31 8:48 ` Sergey Kaplun via Tarantool-patches 2022-08-31 16:04 ` sergos via Tarantool-patches 2022-09-01 10:32 ` Sergey Kaplun via Tarantool-patches 2022-11-23 7:50 ` Igor Munkin via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=813CA8F4-3B19-4370-A859-12B956ACCF3C@tarantool.org \ --to=tarantool-patches@dev.tarantool.org \ --cc=sergos@tarantool.org \ --cc=skaplun@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH luajit] x86/x64: Check for jcc when using xor r, r in emit_loadi().' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox