From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 232696F153; Thu, 1 Sep 2022 13:34:55 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 232696F153 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1662028495; bh=48uAvgxM7Tf4uXOzmJ710YUCnJ6pMV9OiMJFAz9wcK0=; h=Date:To:References:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=L4iePndrWX4PLKP4QO1BzEQp07GOfZMsjxrXQKo2W+ivZenqrROLuOgf1dwyVc1gN GUolsYy3pBYtTXeqLBq6day/P+jJirdU7CFXBKPtXO6fWXHjF94bv8JRAFMf71j1As h/q23qOeA1ywuAxz6oy7r+Y2EEIIQwhGQnWFOquk= Received: from smtpng3.i.mail.ru (smtpng3.i.mail.ru [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 485D76F153 for ; Thu, 1 Sep 2022 13:34:54 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 485D76F153 Received: by smtpng3.m.smailru.net with esmtpa (envelope-from ) id 1oThXB-0007Y4-AV; Thu, 01 Sep 2022 13:34:53 +0300 Date: Thu, 1 Sep 2022 13:32:15 +0300 To: sergos Message-ID: References: <20220704093344.13522-1-skaplun@tarantool.org> <07C1C422-04A5-46D5-9D61-ED6B9BC90556@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <07C1C422-04A5-46D5-9D61-ED6B9BC90556@tarantool.org> X-Mailru-Src: smtp X-4EC0790: 10 X-7564579A: B8F34718100C35BD X-77F55803: 4F1203BC0FB41BD9BF9AC82A1D2D7237348FBDDF4BD7B3F5AB3D3A126D7E1DA2182A05F538085040D8529D3731C0045EE562D7C2B506F773A7A3D962E45BED5F4377C52109CA8204 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE76D24A1449B9F25A2EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F790063706922F90966A37BA8638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D87007D7F5FCFC2273557C2E46F785D4C7117882F4460429724CE54428C33FAD305F5C1EE8F4F765FCAA867293B0326636D2E47CDBA5A96583BD4B6F7A4D31EC0BC014FD901B82EE079FA2833FD35BB23D27C277FBC8AE2E8BECADA55FE5B58BB7A471835C12D1D977C4224003CC8364762BB6847A3DEAEFB0F43C7A68FF6260569E8FC8737B5C2249EC8D19AE6D49635B68655334FD4449CB9ECD01F8117BC8BEAAAE862A0553A39223F8577A6DFFEA7CDDB9BF3B882869D543847C11F186F3C59DAA53EE0834AAEE X-C1DE0DAB: 9604B64F49C60606AD91A466A1DEF99B296C473AB1E142185AC9E3593CE4B31AB1881A6453793CE9274300E5CE05BD4401A9E91200F654B06604ACF793451CC5523897933F9F8387BDCD8F31E025DE4FD34518BC9E5A4C3B9C2B6934AE262D3EE7EAB7254005DCED8DA55E71E02F9FC08E8E86DC7131B365E7726E8460B7C23C X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D3435B1F14A93C4073F46381D47190CCC3D65E50E9BB8A21735669CCBD9EEFF260F9F7C99F1608CFF221D7E09C32AA3244C0B4CE70B4E7712A60B2EC241F6575A3FF2F5F14F68F1805BFACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojCtRLoEB0IE2JagHwbSr9HA== X-DA7885C5: 1DD6EEB33004EAC98E1B1E1B35405069699A712B982DF01443954A273A8F5768262E2D401490A4A0DB037EFA58388B346E8BC1A9835FDE71 X-Mailru-Sender: 689FA8AB762F7393CC2E0F076E87284E340CA87C42778507BBFE9D065FD9C1200FBE9A32752B8C9C2AA642CC12EC09F1FB559BB5D741EB962F61BD320559CF1EFD657A8799238ED55FEEDEB644C299C0ED14614B50AE0675 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH luajit] x86/x64: Check for jcc when using xor r, r in emit_loadi(). X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Sergey Kaplun via Tarantool-patches Reply-To: Sergey Kaplun Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Hi! Fixed your comments. On 31.08.22, sergos wrote: > Hi! > > Thanks and with last two nits marked with #1 and #2 - LGTM. > > Sergos > > > On 31 Aug 2022, at 11:48, Sergey Kaplun wrote: > > > > Sergos, Igor! > > > > Thanks for your review! > > > > I've updated test with more comments considering your suggestions. > > See the iterative patch below: > > > > Branch is force-pushed. > > =================================================================== > > diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua > > index 7c6ab2b9..39551184 100644 > > --- a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua > > +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua > > @@ -13,9 +13,9 @@ test:plan(1) > > -- for sacrifice to be holding the constant zero. > > -- > > -- This leads to assembly code like the following: > > --- ucomisd xmm7, [r14+0x18] > > +-- ucomisd xmm7, [r14] > > -- xor r14d, r14d > > --- jnb 0x12a0e001c ->3 > > +-- jnb 0x555d3d250014 ->1 > > -- > > -- That xor is a big problem, as it modifies flags between the > > -- ucomisd and the jnb, thereby causing the jnb to do the wrong > > #1.1 Still missing the correct assembly below in the message. > > > @@ -34,36 +34,53 @@ local handler = setmetatable({}, { > > end > > }) > > > > > >> There’s no trace after the fix, so it’s hard to grasp the resolution > >> used. > > > > The trace is still compiled, but mov is used instead of xor. > > | ucomisd xmm7, [r14] > > | mov r14d, 0x0 > > | jnb 0x55c95c330014 ->1 > > #1.2 That what I expected to see in the message, right after ‘forbids emitting’ Added the correct assembly to the commit message: =================================================================== x86/x64: Check for jcc when using xor r,r in emit_loadi(). Thanks to Peter Cawley. (cherry picked from commit fb5e522fbc0750c838ef6a926b11c5d870826183) To reproduce this issue, we need: 1) a register which contains the constant zero value 2) a floating point comparison operation 3) the comparison operation to perform a fused load, which in turn needs to allocate a register, and for there to be no free registers at that moment, and for the register chosen for sacrifice to be holding the constant zero. This leads to assembly code like the following: | ucomisd xmm7, [r14] | xor r14d, r14d | jnb 0x555d3d250014 ->1 That xor is a big problem, as it modifies flags between the ucomisd and the jnb, thereby causing the jnb to do the wrong thing. This patch forbids emitting xor in `emit_loadi()` for jcc operations, so mov is emmited instead. Now assembly code looks like the following: | ucomisd xmm7, [r14] | mov r14d, 0x0 | jnb 0x55c95c330014 ->1 Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#7230 =================================================================== > > > > >>> > >>> This patch forbids emitting xor in `emit_loadi()` for jcc operations. > > > >>> > >>> Sergey Kaplun: > >>> * added the description and the test for the problem > >>> > >>> Part of tarantool/tarantool#7230 > >>> --- > >>> > > > > >>> --- a/src/lj_emit_x86.h > >>> +++ b/src/lj_emit_x86.h > >>> @@ -274,10 +274,12 @@ static void emit_movmroi(ASMState *as, Reg base, int32_t ofs, int32_t i) > >>> /* mov r, i / xor r, r */ > >>> static void emit_loadi(ASMState *as, Reg r, int32_t i) > >>> { > >>> - /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP. */ > >>> + /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP/jcc. */ > >>> if (i == 0 && !(LJ_32 && (IR(as->curins)->o == IR_HIOP || > >>> (as->curins+1 < as->T->nins && > >>> - IR(as->curins+1)->o == IR_HIOP)))) { > >>> + IR(as->curins+1)->o == IR_HIOP))) && > >>> + !((*as->mcp == 0x0f && (as->mcp[1] & 0xf0) == XI_JCCn) || > >>> + (*as->mcp & 0xf0) == XI_JCCs)) { > > > >> I wonder if there can be a case with two instructions emitted between the comparison > >> and the jump. In such a case the xor still can sneak in? > > > > To be honest I'm nott sure that this is possible due to self-written > > assembler. So we must keep it in mind when assembly the correspodning > > IRs. > > > >> Another idea: there could be different instructions that relies on the flags, such as > >> x86Arith:XOg_ADC or x86Op:XO_CMOV, so the if condition above could be bigger? > > > > AFAIK, it is imposible due to close attention to the assembly. > > So, the answer for both Qs: ’not at the moment’. Which, in turn, implies ‘we will be > running around in search for all places our new optimization leads to a wrong code > generation’. Well, fine! Yes, you are right. > > > > >>> emit_rr(as, XO_ARITH(XOg_XOR), r, r); > >>> } else { > >>> MCode *p = as->mcp; > >>> diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt > >>> index 5708822e..ad69e2cc 100644 > >>> --- a/test/tarantool-tests/CMakeLists.txt > >>> +++ b/test/tarantool-tests/CMakeLists.txt > >>> +local testxor = ffi.load('libtestxor') > > > >> Should have a test for x86 platform, since changes are x86 only? > > > > We have too small amount of tests for LuaJIT anyway, so I prefer to keep > > tests for all architectures for now. > > But... I bet the aarch64 will require `a little bit` different register pressure > preparations, will it? IOW - this test is not applicable to other architectures > ‘as is’. Yes, of corse. It doesn't show litterally the same issue, but may show another one. > > >>> +test:ok(true, 'imposible branch is not taken') > > #2 s/imposible/impossible/ Fixed. Branch is force-pushed. =================================================================== diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua index 39551184..9a1c96ef 100644 --- a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua @@ -82,6 +82,6 @@ for _ = 1, 3 do -- Don't use any `test` functions here to freeze the trace. assert(not testf()) end -test:ok(true, 'imposible branch is not taken') +test:ok(true, 'impossible branch is not taken') os.exit(test:check() and 0 or 1) =================================================================== > > >>> + > > > >>> -- > >>> 2.34.1 > >>> > > > > -- > > Best regards, > > Sergey Kaplun > -- Best regards, Sergey Kaplun