From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id D8E3C6F153; Wed, 31 Aug 2022 19:04:11 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org D8E3C6F153 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1661961852; bh=7KUZ5P711X+0cx5KRJfFsL+kC/D+w83DOHzxXUkOxQE=; h=In-Reply-To:Date:References:To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=HbytW+SIrhTesHGt9ZwujLojXgq1faHv0TZa3wSIpzdxNDxFACyhPfshcDlIyyGmi IMAiLtPOoZr9E/8OuayEqQK6gise/2yeEfdYzJ3ki0H2DxvC9zptF8ec4PtqcUrJS+ lhxOQL9xMYP3JOkxHsi8iCxVozgVNHr5v2uvwiiQ= Received: from smtp43.i.mail.ru (smtp43.i.mail.ru [94.100.177.103]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 2A2106F153 for ; Wed, 31 Aug 2022 19:04:10 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 2A2106F153 Received: by smtp43.i.mail.ru with esmtpa (envelope-from ) id 1oTQCH-00024k-7O; Wed, 31 Aug 2022 19:04:09 +0300 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.120.41.1.1\)) In-Reply-To: Date: Wed, 31 Aug 2022 19:04:08 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <07C1C422-04A5-46D5-9D61-ED6B9BC90556@tarantool.org> References: <20220704093344.13522-1-skaplun@tarantool.org> To: Sergey Kaplun X-Mailer: Apple Mail (2.3696.120.41.1.1) X-Mailru-Src: smtp X-7564579A: B8F34718100C35BD X-77F55803: 4F1203BC0FB41BD9BF9AC82A1D2D72376AD1D50FE522166510756AD903D3B97E182A05F538085040691A7621C4988195F9EBB236A55E2AF7B2680002455A47CFAF6A035C479D321F X-C1DE0DAB: 9604B64F49C60606AD91A466A1DEF99B296C473AB1E142185AC9E3593CE4B31AB1881A6453793CE9274300E5CE05BD4401A9E91200F654B0DDDEAE67FA777B9C523897933F9F838773536C001D84045A8D5F18506AF5FF689C2B6934AE262D3EE7EAB7254005DCED8DA55E71E02F9FC08E8E86DC7131B365E7726E8460B7C23C X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D3475FE4AA98865E2354B2222F0E537BC9D9324BEE40CB029F02DA1C0F9ED1E67734F9A050677435FA81D7E09C32AA3244C9FA0CADCE9EA2813D6B0D27C0AD0098464EE5813BBCA3A9DFACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojxzT94hBWyprpuwS8bJ0EQQ== X-Mailru-Sender: 11C2EC085EDE56FA38FD4C59F7EFE407E517621DFE9A1DAC1560B07E80F4A2DB81AC99488AB30D5C19381EE24192DF5555834048F03EF5D4C9A814A92B2E3B1BA4250FC3964EA4964198E0F3ECE9B5443453F38A29522196 X-Mras: OK Subject: Re: [Tarantool-patches] [PATCH luajit] x86/x64: Check for jcc when using xor r, r in emit_loadi(). X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: sergos via Tarantool-patches Reply-To: sergos Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Hi! Thanks and with last two nits marked with #1 and #2 - LGTM. Sergos > On 31 Aug 2022, at 11:48, Sergey Kaplun wrote: >=20 > Sergos, Igor! >=20 > Thanks for your review! >=20 > I've updated test with more comments considering your suggestions. > See the iterative patch below: >=20 > Branch is force-pushed. > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua = b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua > index 7c6ab2b9..39551184 100644 > --- a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua > +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua > @@ -13,9 +13,9 @@ test:plan(1) > -- for sacrifice to be holding the constant zero. > -- > -- This leads to assembly code like the following: > --- ucomisd xmm7, [r14+0x18] > +-- ucomisd xmm7, [r14] > -- xor r14d, r14d > --- jnb 0x12a0e001c ->3 > +-- jnb 0x555d3d250014 ->1 > -- > -- That xor is a big problem, as it modifies flags between the > -- ucomisd and the jnb, thereby causing the jnb to do the wrong #1.1 Still missing the correct assembly below in the message.=20 > @@ -34,36 +34,53 @@ local handler =3D setmetatable({}, { > end > }) >=20 >> There=E2=80=99s no trace after the fix, so it=E2=80=99s hard to grasp = the resolution >> used. >=20 > The trace is still compiled, but mov is used instead of xor. > | ucomisd xmm7, [r14] > | mov r14d, 0x0 > | jnb 0x55c95c330014 ->1 #1.2 That what I expected to see in the message, right after =E2=80=98forb= ids emitting=E2=80=99 >=20 >>>=20 >>> This patch forbids emitting xor in `emit_loadi()` for jcc = operations. >=20 >>>=20 >>> Sergey Kaplun: >>> * added the description and the test for the problem >>>=20 >>> Part of tarantool/tarantool#7230 >>> --- >>>=20 >>> --- a/src/lj_emit_x86.h >>> +++ b/src/lj_emit_x86.h >>> @@ -274,10 +274,12 @@ static void emit_movmroi(ASMState *as, Reg = base, int32_t ofs, int32_t i) >>> /* mov r, i / xor r, r */ >>> static void emit_loadi(ASMState *as, Reg r, int32_t i) >>> { >>> - /* XOR r,r is shorter, but modifies the flags. This is bad for = HIOP. */ >>> + /* XOR r,r is shorter, but modifies the flags. This is bad for = HIOP/jcc. */ >>> if (i =3D=3D 0 && !(LJ_32 && (IR(as->curins)->o =3D=3D IR_HIOP || >>> (as->curins+1 < as->T->nins && >>> - IR(as->curins+1)->o =3D=3D IR_HIOP)))) { >>> + IR(as->curins+1)->o =3D=3D IR_HIOP))) && >>> + !((*as->mcp =3D=3D 0x0f && (as->mcp[1] & 0xf0) =3D=3D = XI_JCCn) || >>> + (*as->mcp & 0xf0) =3D=3D XI_JCCs)) { >=20 >> I wonder if there can be a case with two instructions emitted between = the comparison >> and the jump. In such a case the xor still can sneak in? >=20 > To be honest I'm nott sure that this is possible due to self-written > assembler. So we must keep it in mind when assembly the correspodning > IRs. >=20 >> Another idea: there could be different instructions that relies on = the flags, such as >> x86Arith:XOg_ADC or x86Op:XO_CMOV, so the if condition above could be = bigger? >=20 > AFAIK, it is imposible due to close attention to the assembly. So, the answer for both Qs: =E2=80=99not at the moment=E2=80=99. Which, = in turn, implies =E2=80=98we will be running around in search for all places our new optimization leads to a = wrong code generation=E2=80=99. Well, fine! >=20 >>> emit_rr(as, XO_ARITH(XOg_XOR), r, r); >>> } else { >>> MCode *p =3D as->mcp; >>> diff --git a/test/tarantool-tests/CMakeLists.txt = b/test/tarantool-tests/CMakeLists.txt >>> index 5708822e..ad69e2cc 100644 >>> --- a/test/tarantool-tests/CMakeLists.txt >>> +++ b/test/tarantool-tests/CMakeLists.txt >>> @@ -65,6 +65,7 @@ = add_subdirectory(gh-5813-resolving-of-c-symbols/stripped) >>> add_subdirectory(gh-6098-fix-side-exit-patching-on-arm64) >>> add_subdirectory(gh-6189-cur_L) >>> add_subdirectory(lj-49-bad-lightuserdata) >>> +add_subdirectory(lj-416-xor-before-jcc) >>> add_subdirectory(lj-601-fix-gc-finderrfunc) >>> add_subdirectory(lj-727-lightuserdata-itern) >>> add_subdirectory(lj-flush-on-trace) >>> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua = b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua >>> new file mode 100644 >>> index 00000000..7c6ab2b9 >>> --- /dev/null >>> +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua >>> @@ -0,0 +1,70 @@ >>> +local ffi =3D require('ffi') >>> +local tap =3D require('tap') >>> + >>> +local test =3D tap.test('lj-416-xor-before-jcc') >>> +test:plan(1) >>> + >>> +-- To reproduce this issue, we need: >>> +-- 1) a register which contains the constant zero value >>> +-- 2) a floating point comparison operation >>> +-- 3) the comparison operation to perform a fused load, which in >>> +-- turn needs to allocate a register, and for there to be no >>> +-- free registers at that moment, and for the register chosen >>> +-- for sacrifice to be holding the constant zero. >>> +-- >>> +-- This leads to assembly code like the following: >>> +-- ucomisd xmm7, [r14+0x18] >>> +-- xor r14d, r14d >>> +-- jnb 0x12a0e001c ->3 >>> +-- >>> +-- That xor is a big problem, as it modifies flags between the >>> +-- ucomisd and the jnb, thereby causing the jnb to do the wrong >>> +-- thing. >>> + >>> +ffi.cdef[[ >>> + int test_xor_func(int a, int b, int c, int d, int e, int f, void = * g, int h); >>> +]] >>> +local testxor =3D ffi.load('libtestxor') >=20 >> Should have a test for x86 platform, since changes are x86 only? >=20 > We have too small amount of tests for LuaJIT anyway, so I prefer to = keep > tests for all architectures for now. But... I bet the aarch64 will require `a little bit` different register = pressure preparations, will it? IOW - this test is not applicable to other = architectures =E2=80=98as is=E2=80=99. >=20 >>> + >>> +local handler =3D setmetatable({}, { >>> + __newindex =3D function () >>> + -- 0 and nil are suggested as differnt constant-zero values >>> + -- for the call and occupied different registers. >>> + testxor.test_xor_func(0, 0, 0, 0, 0, 0, nil, 0) >>> + end >>> +}) >>> + >>> +local mconf =3D { >>> + { use =3D false, value =3D 100 }, >>> + { use =3D true, value =3D 100 }, >>> +} >>> + >>> +local function testf() >>> + -- Generate register pressure. >>> + local value =3D 50 >>> + for _, rule in ipairs(mconf) do >>> + if rule.use then >>> + value =3D rule.value >>> + break >>> + end >>> + end >>> + >>> + -- This branch shouldn't be taken. >>> + if value <=3D 42 then >>> + return true >>> + end >>> + >>> + -- Nothing to do, just call testxor with many arguments. >>> + handler[4] =3D 4 >=20 >> If it is needed to use the metatable here then why? >=20 > It is needed for suitable registers layout. Unfortunately, I gave up = to > create test without it. >=20 >>> +end >>> + >>> +-- We need to create long side trace to generate register >>> +-- pressure. >>> +jit.opt.start('hotloop=3D1', 'hotexit=3D1') >>> +for _ =3D 1, 3 do >>> + -- Don't use any `test` functions here to freeze the trace. >>> + assert (not testf()) >>> +end >>> +test:ok(true, 'imposible branch is not taken') #2 s/imposible/impossible/ >>> + >>> +os.exit(test:check() and 0 or 1) >>> diff --git = a/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt = b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt >>> new file mode 100644 >>> index 00000000..17aa9f9b >>> --- /dev/null >>> +++ b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt >>> @@ -0,0 +1 @@ >>> +BuildTestCLib(libtestxor testxor.c) >>> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c = b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c >>> new file mode 100644 >>> index 00000000..32436d42 >>> --- /dev/null >>> +++ b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c >=20 > >=20 >>> --=20 >>> 2.34.1 >>>=20 >=20 > --=20 > Best regards, > Sergey Kaplun