From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 97B866F153; Wed, 31 Aug 2022 11:50:42 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 97B866F153 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1661935842; bh=J/VuaC1HJ/1fFSHbXZX+4QsWEwL7VUr7sD9As4QFLlc=; h=Date:To:Cc:References:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=Mku7h9gOlM1lT+1mr7wgw1m3xFnBUfEceSL9yoSov+Yu3ITL8UkLNKCun7uaW19FL xK7F8owrA3tFKTSU8P1bmP6WJ1Lye82norh3haLQbVArzvjCZaFLHLSNbO2st4U0zc QctFN//5e0oWsBT5kAG2hf36suvn5GJtBhZLAhBc= Received: from smtp42.i.mail.ru (smtp42.i.mail.ru [94.100.177.102]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 6AE556F153 for ; Wed, 31 Aug 2022 11:50:40 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 6AE556F153 Received: by smtp42.i.mail.ru with esmtpa (envelope-from ) id 1oTJQl-0003a3-Df; Wed, 31 Aug 2022 11:50:39 +0300 Date: Wed, 31 Aug 2022 11:48:02 +0300 To: Sergey Ostanevich , Igor Munkin Cc: tarantool-patches@dev.tarantool.org Message-ID: References: <20220704093344.13522-1-skaplun@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20220704093344.13522-1-skaplun@tarantool.org> X-Mailru-Src: smtp X-4EC0790: 10 X-7564579A: EEAE043A70213CC8 X-77F55803: 4F1203BC0FB41BD9D8E1125CB37184E1C7694BA8CA54B48E14788A9DC669AB301867C24CE74E72BB5FE18E9CBFBE7C07CD5E8FD608EED2789A12BAE31DC9072A2A4E7DEF3480137D42B98C014DDF718D X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE71107D7B19CDFFE90EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F79006377F69ABDCCC31D2058638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D8CE6D30EDAD342F620402395C9F3BFD18117882F4460429724CE54428C33FAD305F5C1EE8F4F765FCAA867293B0326636D2E47CDBA5A96583BD4B6F7A4D31EC0BC014FD901B82EE079FA2833FD35BB23D27C277FBC8AE2E8BAE9A1BBD95851C5BA471835C12D1D977C4224003CC836476EB9C4185024447017B076A6E789B0E975F5C1EE8F4F765FC092BEC5799CCA3073AA81AA40904B5D9CF19DD082D7633A078D18283394535A93AA81AA40904B5D98AA50765F79006374A711A1C54DA1D5FD81D268191BDAD3D698AB9A7B718F8C4D1B931868CE1C5781A620F70A64A45A98AA50765F79006372E808ACE2090B5E1725E5C173C3A84C3C5EA940A35A165FF2DBA43225CD8A89FB26E97DCB74E6252A91E23F1B6B78B78B5C8C57E37DE458BEDA766A37F9254B7 X-C1DE0DAB: 9604B64F49C60606AD91A466A1DEF99B296C473AB1E142185AC9E3593CE4B31AB1881A6453793CE9274300E5CE05BD4401A9E91200F654B0D4C601CEA03E70EE523897933F9F8387B2B87986872DE5B3F7BB38B7105CF4D19C2B6934AE262D3EE7EAB7254005DCED8DA55E71E02F9FC08E8E86DC7131B365E7726E8460B7C23C X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34BC3EEE75EF3BACCF6311E428C97E1BBF47DC7FDFFACC7A95FCD02AF74662314997F2269FDE0C1AE81D7E09C32AA3244CCC962F7B66987F69EFB84CD413CA34301DD47778AE04E04DFACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojxzT94hBWyppjXx8v1tubkg== X-Mailru-Sender: 07FBBCF39629D1142254247A6196FF9B2E407934FA71EE341C4DF6F6122E77D2C479C2914A57FB1FDEDBA653FF35249392D99EB8CC7091A70E183A470755BFD208F19895AA18418972D6B4FCE48DF648AE208404248635DF X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH luajit] x86/x64: Check for jcc when using xor r, r in emit_loadi(). X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Sergey Kaplun via Tarantool-patches Reply-To: Sergey Kaplun Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Sergos, Igor! Thanks for your review! I've updated test with more comments considering your suggestions. See the iterative patch below: Branch is force-pushed. =================================================================== diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua index 7c6ab2b9..39551184 100644 --- a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua @@ -13,9 +13,9 @@ test:plan(1) -- for sacrifice to be holding the constant zero. -- -- This leads to assembly code like the following: --- ucomisd xmm7, [r14+0x18] +-- ucomisd xmm7, [r14] -- xor r14d, r14d --- jnb 0x12a0e001c ->3 +-- jnb 0x555d3d250014 ->1 -- -- That xor is a big problem, as it modifies flags between the -- ucomisd and the jnb, thereby causing the jnb to do the wrong @@ -34,36 +34,53 @@ local handler = setmetatable({}, { end }) +local MAGIC = 42 local mconf = { - { use = false, value = 100 }, - { use = true, value = 100 }, + { use = false, value = MAGIC + 1 }, + { use = true, value = MAGIC + 1 }, } local function testf() - -- Generate register pressure. - local value = 50 + -- All code below is needed for generating register pressure. + local value + -- The first trace to compile is this for loop, with `rule.use` + -- values is `true`. for _, rule in ipairs(mconf) do + -- The side trace starts here, when `rule.use` value is + -- `false`, returns to the `for` loop, where the function was + -- called, starts another iteration, calls `testf()` again and + -- ends at JITERL bytecode for the loop in this function. if rule.use then value = rule.value break end end - -- This branch shouldn't be taken. - if value <= 42 then + -- The code below is recorded with the following IRs: + -- .... SNAP #1 [ lj-416-xor-before-jcc.test.lua:44|---- ] + -- 0012 > num UGT 0009 +42 + -- + -- That leads to the following assembly: + -- ucomisd xmm7, [r14] + -- xor r14d, r14d + -- jnb 0x555d3d250014 ->1 + -- + -- As a result, this branch is taken due to the emitted `xor` + -- instruction until the issue is not resolved. + if value <= MAGIC then return true end -- Nothing to do, just call testxor with many arguments. - handler[4] = 4 + handler.nothing = 'to do' end -- We need to create long side trace to generate register --- pressure. +-- pressure (see the comment in `testf()`). jit.opt.start('hotloop=1', 'hotexit=1') for _ = 1, 3 do -- Don't use any `test` functions here to freeze the trace. - assert (not testf()) + assert(not testf()) end test:ok(true, 'imposible branch is not taken') =================================================================== On 04.07.22, Sergey Kaplun wrote: >> From: Mike Pall >> >> Thanks to Peter Cawley. >> >> (cherry picked from commit fb5e522fbc0750c838ef6a926b11c5d870826183) >> >> To reproduce this issue, we need: >> 1) a register which contains the constant zero value >> 2) a floating point comparison operation > The integer one will set the same flags (CF for the case of JNB) and XOR will > clear it anyways. I suppose that this is just not our case -- for integer comparasions we should use cdata objects. I failed to create test case with the corresponding behaviour. >> 3) the comparison operation to perform a fused load, which in >> turn needs to allocate a register, and for there to be no >> free registers at that moment, and for the register chosen >> for sacrifice to be holding the constant zero. > Unfortunately, it’s not clear what this register (I suppose it’s r14d) is used for. > Is it an argument, needed after the fall-through in the same trace? Yes, it is the argument for the future call. | mov [rsp+0x8], r14 | mov [rsp], r15 | xor r9d, r9d | xor r8d, r8d | xor ecx, ecx | xor edx, edx | xor esi, esi | xor edi, edi | call rbx > Why not it sank down below the branch? I supose that this is issue related to self-written assembly from bottom of a trace to its start. First the branch is encoded, and only after that, we assembly the comparision operation. > IIRC it is a dedicated register used for dispatch, so why is it used for sacrificing? r14d is DISPATCH register only for LuaJIT VM. >> >> This leads to assembly code like the following: >> | ucomisd xmm7, [r14] >> | xor r14d, r14d >> | jnb 0x55c95c330014 ->1 >> >> That xor is a big problem, as it modifies flags between the >> ucomisd and the jnb, thereby causing the jnb to do the wrong >> thing. > There’s no trace after the fix, so it’s hard to grasp the resolution > used. The trace is still compiled, but mov is used instead of xor. | ucomisd xmm7, [r14] | mov r14d, 0x0 | jnb 0x55c95c330014 ->1 >> >> This patch forbids emitting xor in `emit_loadi()` for jcc operations. >> >> Sergey Kaplun: >> * added the description and the test for the problem >> >> Part of tarantool/tarantool#7230 >> --- >> >> Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-416-xor-before-jcc-full-ci >> Issues: >> * https://github.com/LuaJIT/LuaJIT/issues/416 >> * https://github.com/tarantool/tarantool/issues/7230 >> >> Changelog entry (I suggest to update this entry with the each >> corresponding bump): >> =================================================================== >> ## bugfix/luajit >> >> Backported patches from vanilla LuaJIT trunk (gh-7230). In the scope of this >> activity, the following issues have been resolved: >> * Fixed `emit_loadi()` on x86/x64 emitting xor between condition check >> and jump instructions. >> =================================================================== >> >> src/lj_emit_x86.h | 6 +- >> test/tarantool-tests/CMakeLists.txt | 1 + >> .../lj-416-xor-before-jcc.test.lua | 70 +++++++++++++++++++ >> .../lj-416-xor-before-jcc/CMakeLists.txt | 1 + >> .../lj-416-xor-before-jcc/testxor.c | 14 ++++ >> 5 files changed, 90 insertions(+), 2 deletions(-) >> create mode 100644 test/tarantool-tests/lj-416-xor-before-jcc.test.lua >> create mode 100644 test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt >> create mode 100644 test/tarantool-tests/lj-416-xor-before-jcc/testxor.c >> >> diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h >> index 5207f9da..6b58306b 100644 >> --- a/src/lj_emit_x86.h >> +++ b/src/lj_emit_x86.h >> @@ -274,10 +274,12 @@ static void emit_movmroi(ASMState *as, Reg base, int32_t ofs, int32_t i) >> /* mov r, i / xor r, r */ >> static void emit_loadi(ASMState *as, Reg r, int32_t i) >> { >> - /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP. */ >> + /* XOR r,r is shorter, but modifies the flags. This is bad for HIOP/jcc. */ >> if (i == 0 && !(LJ_32 && (IR(as->curins)->o == IR_HIOP || >> (as->curins+1 < as->T->nins && >> - IR(as->curins+1)->o == IR_HIOP)))) { >> + IR(as->curins+1)->o == IR_HIOP))) && >> + !((*as->mcp == 0x0f && (as->mcp[1] & 0xf0) == XI_JCCn) || >> + (*as->mcp & 0xf0) == XI_JCCs)) { > I wonder if there can be a case with two instructions emitted between the comparison > and the jump. In such a case the xor still can sneak in? To be honest I'm nott sure that this is possible due to self-written assembler. So we must keep it in mind when assembly the correspodning IRs. > Another idea: there could be different instructions that relies on the flags, such as > x86Arith:XOg_ADC or x86Op:XO_CMOV, so the if condition above could be bigger? AFAIK, it is imposible due to close attention to the assembly. >> emit_rr(as, XO_ARITH(XOg_XOR), r, r); >> } else { >> MCode *p = as->mcp; >> diff --git a/test/tarantool-tests/CMakeLists.txt b/test/tarantool-tests/CMakeLists.txt >> index 5708822e..ad69e2cc 100644 >> --- a/test/tarantool-tests/CMakeLists.txt >> +++ b/test/tarantool-tests/CMakeLists.txt >> @@ -65,6 +65,7 @@ add_subdirectory(gh-5813-resolving-of-c-symbols/stripped) >> add_subdirectory(gh-6098-fix-side-exit-patching-on-arm64) >> add_subdirectory(gh-6189-cur_L) >> add_subdirectory(lj-49-bad-lightuserdata) >> +add_subdirectory(lj-416-xor-before-jcc) >> add_subdirectory(lj-601-fix-gc-finderrfunc) >> add_subdirectory(lj-727-lightuserdata-itern) >> add_subdirectory(lj-flush-on-trace) >> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc.test.lua b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua >> new file mode 100644 >> index 00000000..7c6ab2b9 >> --- /dev/null >> +++ b/test/tarantool-tests/lj-416-xor-before-jcc.test.lua >> @@ -0,0 +1,70 @@ >> +local ffi = require('ffi') >> +local tap = require('tap') >> + >> +local test = tap.test('lj-416-xor-before-jcc') >> +test:plan(1) >> + >> +-- To reproduce this issue, we need: >> +-- 1) a register which contains the constant zero value >> +-- 2) a floating point comparison operation >> +-- 3) the comparison operation to perform a fused load, which in >> +-- turn needs to allocate a register, and for there to be no >> +-- free registers at that moment, and for the register chosen >> +-- for sacrifice to be holding the constant zero. >> +-- >> +-- This leads to assembly code like the following: >> +-- ucomisd xmm7, [r14+0x18] >> +-- xor r14d, r14d >> +-- jnb 0x12a0e001c ->3 >> +-- >> +-- That xor is a big problem, as it modifies flags between the >> +-- ucomisd and the jnb, thereby causing the jnb to do the wrong >> +-- thing. >> + >> +ffi.cdef[[ >> + int test_xor_func(int a, int b, int c, int d, int e, int f, void * g, int h); >> +]] >> +local testxor = ffi.load('libtestxor') > Should have a test for x86 platform, since changes are x86 only? We have too small amount of tests for LuaJIT anyway, so I prefer to keep tests for all architectures for now. >> + >> +local handler = setmetatable({}, { >> + __newindex = function () >> + -- 0 and nil are suggested as differnt constant-zero values >> + -- for the call and occupied different registers. >> + testxor.test_xor_func(0, 0, 0, 0, 0, 0, nil, 0) >> + end >> +}) >> + >> +local mconf = { >> + { use = false, value = 100 }, >> + { use = true, value = 100 }, >> +} >> + >> +local function testf() >> + -- Generate register pressure. >> + local value = 50 >> + for _, rule in ipairs(mconf) do >> + if rule.use then >> + value = rule.value >> + break >> + end >> + end >> + >> + -- This branch shouldn't be taken. >> + if value <= 42 then >> + return true >> + end >> + >> + -- Nothing to do, just call testxor with many arguments. >> + handler[4] = 4 > If it is needed to use the metatable here then why? It is needed for suitable registers layout. Unfortunately, I gave up to create test without it. >> +end >> + >> +-- We need to create long side trace to generate register >> +-- pressure. >> +jit.opt.start('hotloop=1', 'hotexit=1') >> +for _ = 1, 3 do >> + -- Don't use any `test` functions here to freeze the trace. >> + assert (not testf()) >> +end >> +test:ok(true, 'imposible branch is not taken') >> + >> +os.exit(test:check() and 0 or 1) >> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt >> new file mode 100644 >> index 00000000..17aa9f9b >> --- /dev/null >> +++ b/test/tarantool-tests/lj-416-xor-before-jcc/CMakeLists.txt >> @@ -0,0 +1 @@ >> +BuildTestCLib(libtestxor testxor.c) >> diff --git a/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c >> new file mode 100644 >> index 00000000..32436d42 >> --- /dev/null >> +++ b/test/tarantool-tests/lj-416-xor-before-jcc/testxor.c >> -- >> 2.34.1 >> -- Best regards, Sergey Kaplun