From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 241FC6ECE3; Wed, 6 Jul 2022 13:30:13 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 241FC6ECE3 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1657103413; bh=q5LD9bdSNH6+Mec9ZNs0vFRzbtTvcdg4zOgTz//8288=; h=In-Reply-To:Date:References:To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=rmGQR3PRAwUAjxFD94Z+ZNEG45NHTQH4umhLZlwQCHWAsumpn9ktuGEzbKUe0WsVL AaIy7nhGZHAmZcgjmmjcCiKvEk9BUp4qwqBqNsThrQovqfzqcUWPHmA6xEVC8PQ3t7 evVU4Xt6Ne0b6AvSMA9Wa4r6zHQ2OGFXfKhVLlB4= Received: from smtp51.i.mail.ru (smtp51.i.mail.ru [94.100.177.111]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 3185D6ECE3 for ; Wed, 6 Jul 2022 13:30:11 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 3185D6ECE3 Received: by smtp51.i.mail.ru with esmtpa (envelope-from ) id 1o92IM-0004Vs-9i; Wed, 06 Jul 2022 13:30:10 +0300 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.100.31\)) In-Reply-To: <20220704093344.13522-1-skaplun@tarantool.org> Date: Wed, 6 Jul 2022 13:30:05 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <813CA8F4-3B19-4370-A859-12B956ACCF3C@tarantool.org> References: <20220704093344.13522-1-skaplun@tarantool.org> To: Sergey Kaplun X-Mailer: Apple Mail (2.3696.100.31) X-Mailru-Src: smtp X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD987C0EE6E7F0A597DF6D449F86FFBCF60688355514E92E4BCCD62213F67905E7ACB0B9623667C58E38C5BC623DDE2B9ECF6A22358239157774787F701E02FBF3D X-8FC586DF: 6EFBBC1D9D64D975 X-C1DE0DAB: 9604B64F49C60606AD91A466A1DEF99B296C473AB1E142185AC9E3593CE4B31AB1881A6453793CE9274300E5CE05BD4401A9E91200F654B060581190B556F3F52025898894730043B3ECD83EDF7B792FA354D7CE677587339C2B6934AE262D3EE7EAB7254005DCED8DA55E71E02F9FC08E8E86DC7131B365E7726E8460B7C23C X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34ECB3E21D3CD9CB4F063ACBBD6ABDE3F6F7D5C4D48E72937EE0CFE58E293048ED1E43BF598D4814711D7E09C32AA3244CB19CEDB2961DFA1CFFD425BBAFA0C44339C99C45E8D137E9FACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojb+SzFQ2YIi/My9TSUefNuw== X-Mailru-Sender: 11C2EC085EDE56FA38FD4C59F7EFE4079A64FD38D472CF87398A7C895BA9D536B41993B72DD239B219381EE24192DF5555834048F03EF5D4C9A814A92B2E3B1BA4250FC3964EA4964198E0F3ECE9B5443453F38A29522196 X-Mras: OK Subject: Re: [Tarantool-patches] [PATCH luajit] x86/x64: Check for jcc when using xor r, r in emit_loadi(). X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: sergos via Tarantool-patches Reply-To: sergos Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Hi! Thanks for the patch! I won=E2=80=99t veto on the patch, since it=E2=80=99s from upstream. = Still I have got a number of questions on it=E2=80=99s crutch nature. Perhaps we need to = follow up with some test cases to luajit. Some description fixes are needed also. Sergos > On 4 Jul 2022, at 12:33, Sergey Kaplun wrote: >=20 > From: Mike Pall >=20 > Thanks to Peter Cawley. >=20 > (cherry picked from commit fb5e522fbc0750c838ef6a926b11c5d870826183) >=20 > To reproduce this issue, we need: > 1) a register which contains the constant zero value > 2) a floating point comparison operation The integer one will set the same flags (CF for the case of JNB) and XOR = will clear it anyways.=20 > 3) the comparison operation to perform a fused load, which in > turn needs to allocate a register, and for there to be no > free registers at that moment, and for the register chosen > for sacrifice to be holding the constant zero. >=20 Unfortunately, it=E2=80=99s not clear what this register (I suppose = it=E2=80=99s r14d) is used for. Is it an argument, needed after the fall-through in the same trace? Why not it sank down below the branch?=20 IIRC it is a dedicated register used for dispatch, so why is it used for = sacrificing? > This leads to assembly code like the following: > | ucomisd xmm7, [r14+0x18] > | xor r14d, r14d > | jnb 0x12a0e001c ->3 >=20 > That xor is a big problem, as it modifies flags between the > ucomisd and the jnb, thereby causing the jnb to do the wrong > thing. >=20 There=E2=80=99s no trace after the fix, so it=E2=80=99s hard to grasp = the resolution used. =20 > This patch forbids emitting xor in `emit_loadi()` for jcc operations. >=20 > Sergey Kaplun: > * added the description and the test for the problem >=20 > Part of tarantool/tarantool#7230 > --- >=20 > Branch: = https://github.com/tarantool/luajit/tree/skaplun/lj-416-xor-before-jcc-ful= l-ci > Issues: > * https://github.com/LuaJIT/LuaJIT/issues/416 > * https://github.com/tarantool/tarantool/issues/7230 >=20 > Changelog entry (I suggest to update this entry with the each > corresponding bump): > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > ## bugfix/luajit >=20 > Backported patches from vanilla LuaJIT trunk (gh-7230). In the scope = of this > activity, the following issues have been resolved: > * Fixed `emit_loadi()` on x86/x64 emitting xor between condition check > and jump instructions. > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >=20 > src/lj_emit_x86.h | 6 +- > test/tarantool-tests/CMakeLists.txt | 1 + > .../lj-416-xor-before-jcc.test.lua | 70 +++++++++++++++++++ > .../lj-416-xor-before-jcc/CMakeLists.txt | 1 + > .../lj-416-xor-before-jcc/testxor.c | 14 ++++ > 5 files changed, 90 insertions(+), 2 deletions(-) > create mode 100644 test/tarantool-tests/lj-416-xor-before-jcc.test.lua > create mode 100644 = test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt > create mode 100644 = test/tarantool-tests/lj-416-xor-before-jcc/testxor.c >=20 > diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h > index 5207f9da..6b58306b 100644 > --- a/src/lj_emit_x86.h > +++ b/src/lj_emit_x86.h > @@ -274,10 +274,12 @@ static void emit_movmroi(ASMState *as, Reg base, = int32_t ofs, int32_t i) > /* mov r, i / xor r, r */ > static void emit_loadi(ASMState *as, Reg r, int32_t i) > { > - /* XOR r,r is shorter, but modifies the flags. This is bad for = HIOP. */ > + /* XOR r,r is shorter, but modifies the flags. This is bad for = HIOP/jcc. */ > if (i =3D=3D 0 && !(LJ_32 && (IR(as->curins)->o =3D=3D IR_HIOP || > (as->curins+1 < as->T->nins && > - IR(as->curins+1)->o =3D=3D IR_HIOP)))) { > + IR(as->curins+1)->o =3D=3D IR_HIOP))) && > + !((*as->mcp =3D=3D 0x0f && (as->mcp[1] & 0xf0) =3D=3D = XI_JCCn) || > + (*as->mcp & 0xf0) =3D=3D XI_JCCs)) { I wonder if there can be a case with two instructions emitted between = the comparison and the jump. In such a case the xor still can sneak in?=20 Another idea: there could be different instructions that relies on the = flags, such as x86Arith:XOg_ADC or x86Op:XO_CMOV, so the if condition above could be = bigger? > emit_rr(as, XO_ARITH(XOg_XOR), r, r); > } else { > MCode *p =3D as->mcp; > diff --git a/test/tarantool-tests/CMakeLists.txt = b/test/tarantool-tests/CMakeLists.txt > index 5708822e..ad69e2cc 100644 > --- a/test/tarantool-tests/CMakeLists.txt > +++ b/test/tarantool-tests/CMakeLists.txt > @@ -65,6 +65,7 @@ = add_subdirectory(gh-5813-resolving-of-c-symbols/stripped) > add_subdirectory(gh-6098-fix-side-exit-patching-on-arm64) > add_subdirectory(gh-6189-cur_L) > add_subdirectory(lj-49-bad-lightuserdata) > +add_subdirectory(lj-416-xor-before-jcc) > add_subdirectory(lj-601-fix-gc-finderrfunc) > add_subdirectory(lj-727-lightuserdata-itern) > add_subdirectory(lj-flush-on-trace) > diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua = b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua > new file mode 100644 > index 00000000..7c6ab2b9 > --- /dev/null > +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua > @@ -0,0 +1,70 @@ > +local ffi =3D require('ffi') > +local tap =3D require('tap') > + > +local test =3D tap.test('lj-416-xor-before-jcc') > +test:plan(1) > + > +-- To reproduce this issue, we need: > +-- 1) a register which contains the constant zero value > +-- 2) a floating point comparison operation > +-- 3) the comparison operation to perform a fused load, which in > +-- turn needs to allocate a register, and for there to be no > +-- free registers at that moment, and for the register chosen > +-- for sacrifice to be holding the constant zero. > +-- > +-- This leads to assembly code like the following: > +-- ucomisd xmm7, [r14+0x18] > +-- xor r14d, r14d > +-- jnb 0x12a0e001c ->3 > +-- > +-- That xor is a big problem, as it modifies flags between the > +-- ucomisd and the jnb, thereby causing the jnb to do the wrong > +-- thing. Same problems with explanation, copy from message above > + > +ffi.cdef[[ > + int test_xor_func(int a, int b, int c, int d, int e, int f, void * = g, int h); > +]] > +local testxor =3D ffi.load('libtestxor') Should have a test for x86 platform, since changes are x86 only? > + > +local handler =3D setmetatable({}, { > + __newindex =3D function () > + -- 0 and nil are suggested as differnt constant-zero values > + -- for the call and occupied different registers. > + testxor.test_xor_func(0, 0, 0, 0, 0, 0, nil, 0) > + end > +}) > + > +local mconf =3D { > + { use =3D false, value =3D 100 }, > + { use =3D true, value =3D 100 }, > +} > + > +local function testf() > + -- Generate register pressure. > + local value =3D 50 > + for _, rule in ipairs(mconf) do > + if rule.use then > + value =3D rule.value > + break > + end > + end > + > + -- This branch shouldn't be taken. > + if value <=3D 42 then > + return true > + end > + > + -- Nothing to do, just call testxor with many arguments. > + handler[4] =3D 4 If it is needed to use the metatable here then why? > +end > + > +-- We need to create long side trace to generate register > +-- pressure. > +jit.opt.start('hotloop=3D1', 'hotexit=3D1') > +for _ =3D 1, 3 do > + -- Don't use any `test` functions here to freeze the trace. > + assert (not testf()) > +end > +test:ok(true, 'imposible branch is not taken') > + > +os.exit(test:check() and 0 or 1) > diff --git a/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt = b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt > new file mode 100644 > index 00000000..17aa9f9b > --- /dev/null > +++ b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt > @@ -0,0 +1 @@ > +BuildTestCLib(libtestxor testxor.c) > diff --git a/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c = b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c > new file mode 100644 > index 00000000..32436d42 > --- /dev/null > +++ b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c > @@ -0,0 +1,14 @@ > +#define UNUSED(x) ((void)x) > + > +int test_xor_func(int a, int b, int c, int d, int e, int f, void *g, = int h) > +{ > + UNUSED(a); > + UNUSED(b); > + UNUSED(c); > + UNUSED(d); > + UNUSED(e); > + UNUSED(f); > + UNUSED(g); > + UNUSED(h); > + return 0; > +} > --=20 > 2.34.1 >=20