From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id BA0F36F153; Wed, 31 Aug 2022 12:55:17 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org BA0F36F153 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1661939717; bh=d2C34Bbyu9KynmdR5PUa37Gs3GHw6M+PQ0DvUqnE1u8=; h=To:Date:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=HLdXSyu3uaR4NWVD8rTFGgK2pEky1bNVOKm2kLHLDqtkahQqKvvt+cGB6a/QnRd3O iik1TiqmaO0vOPHsraV0Yrn2iT/4HJv2nBX82kwP6FRTt78vxdrzm75m85vziV8QRs SXWh0R5JYdJQKmL22Hyij5KRoE0sxG4kM7K7JCao= Received: from smtpng1.i.mail.ru (smtpng1.i.mail.ru [94.100.181.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 7295C6F153 for ; Wed, 31 Aug 2022 12:55:16 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 7295C6F153 Received: by smtpng1.m.smailru.net with esmtpa (envelope-from ) id 1oTKRH-0003YU-4Y; Wed, 31 Aug 2022 12:55:15 +0300 To: Igor Munkin , Sergey Ostanevich Date: Wed, 31 Aug 2022 12:52:37 +0300 Message-Id: <20220831095237.18440-1-skaplun@tarantool.org> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Mailru-Src: smtp X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD9BF9AC82A1D2D7237DB83007AD4F69C3E49B92567B68D2AA3182A05F538085040643F40E1D7288E0C2DDB5ACE6772C1F859081596A0117AC9EB76253965EF8463 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE721B3E54BB37EA0B4EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637A9CECC865C7A6E308638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D810C7A7FD36191A1A00A1B2F76190BE96117882F4460429724CE54428C33FAD305F5C1EE8F4F765FC47272755C61AA17BA471835C12D1D9774AD6D5ED66289B52BA9C0B312567BB23117882F446042972877693876707352026055571C92BF10FF04B652EEC242312D2E47CDBA5A96583BA9C0B312567BB231DD303D21008E29813377AFFFEAFD269A417C69337E82CC2E827F84554CEF50127C277FBC8AE2E8BA83251EDC214901ED5E8D9A59859A8B642416645EBD45DC2089D37D7C0E48F6C5571747095F342E88FB05168BE4CE3AF X-C1DE0DAB: C20DE7B7AB408E4181F030C43753B8186998911F362727C41E93BD56E7067354307CAA32FF218580205367B2BCC23E5B4D8E366ADB5A3BB6E7E65F06B2D3BF1CAD91A466A1DEF99B6ED91DBE5ABE359ADBCB5631A0A9D21F4E9387F6EE0124019A9C80E2F2BA5CFF6FC035106A00343404AE236257269DA4B1881A6453793CE9C32612AADDFBE0614DF969DE7B7BA855A71A35648BE338CE9510FB958DCE06DB58C12E6D310A6D5333F7A9E5587C79A693EDB24507CE13387DFF0A840B692CF8 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D343DCFC3BEDFB0242D5C3041CC428666E27FAD0BED416FD7B669AA9A67D910800B46E3BEF01AC6382A1D7E09C32AA3244CF1F5D7F8BF368FAC06C5293772F7EFF59CA7333006C390A0927AC6DF5659F194 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojxzT94hBWypoQ1j7TcaXYvw== X-DA7885C5: F918799AB1B0FF4C15AFD4371B15264E79C4E7F89340391AEB75E6898FF797DB262E2D401490A4A0DB037EFA58388B346E8BC1A9835FDE71 X-Mailru-Sender: 689FA8AB762F7393CC2E0F076E87284E27C4569F4D2946125F8B4D7367C4EB9E0FBE9A32752B8C9C2AA642CC12EC09F1FB559BB5D741EB962F61BD320559CF1EFD657A8799238ED55FEEDEB644C299C0ED14614B50AE0675 X-Mras: Ok Subject: [Tarantool-patches] [PATCH luajit] ARM64: Avoid side-effects of constant rematerialization. X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Sergey Kaplun via Tarantool-patches Reply-To: Sergey Kaplun Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" From: Mike Pall Thanks to Patrick Galizia. (cherry picked from commit b33e3f2d441590f4de0d189bd9a65661824a48f6) Constant rematerialization must not use other registers that contain constants, if the register is in-flight. When we have the high regitster pressure we can face the following issue: The assembly of an IR instruction allocates a constant into a free register. Then it spills another register (due to high register pressure), which is rematerialized using the same constant (which it assumes is now in the allocated register). In case when the first register also happens to be the destination register, the constant value is modified before the rematerialization. For the code in the test for this commit we get the following register allocation order (read from top to bottom (DBG RA reversed)): | current IR | operation | IR ref | register | 0048 alloc 0038 x0 | 0048 remat K038 x0 | 0048 alloc K023 x4 Which leads to the following asembly: | ... | add x4, x4, x0 # x4 modified before x0 rematerialization | ldrb w4, [x4, #24] | add x0, x4, #24 # constant x0 rematerialization | ... As a result, the value register x0 holding is incorrect. This patch moves allocation of constants for earlier to be sure that the rematerialization can not make use of the same constant as one of the sources of the IR instruction. After the patch register allocation order is the following: | current IR | operation | IR ref | register | 0048 alloc K023 x4 | 0048 alloc 0038 x0 | 0048 remat K038 x0 Also, this patch fixes the `asm_fusexref()` logic for the `IR_STRREF` in case, when both operands don't fit in 32-bit constants (`asm_isk32()` fails). We want to use the IR operand holds the referenced value in `ra_alloc1()` as one having the hint set (`ra_hashint()` check passes). It is set for the operand with a non constant value (`irref_isk()` fails). The code assumes that this is always the `ir->op1` operand, so for cases when this value holds `ir->op2` operand register allocator misses the aforementioned hint in `ir->op2`. As the result the wrong register is selected. This patch adds the corresponding `irref_isk()` check for the `ir->op1` to detect which operand contains the value with the hint. After the patch the resulting assembly is the following: | ... | add x4, x0, x4 | ldrb w4, [x4, #24] | add x0, x1, #112 | ... As we can see the constant is rematerialized from another, non-modified register. Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#7230 --- The test case leads to the coredump when compile with -DCMAKE_BUILD_TYPE=[Release, RelWithDebInfo]. Issue: https://github.com/tarantool/tarantool/issues/7230 PRs: * https://github.com/LuaJIT/LuaJIT/pull/438 * https://github.com/LuaJIT/LuaJIT/pull/479 Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-438-arm64-constant-rematerialization-full-ci Tarantool PR: https://github.com/tarantool/tarantool/pull/7628 src/lj_asm_arm64.h | 46 +++++--- ...-arm64-constant-rematerialization.test.lua | 102 ++++++++++++++++++ 2 files changed, 131 insertions(+), 17 deletions(-) create mode 100644 test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h index da0ee4bb..a4de187f 100644 --- a/src/lj_asm_arm64.h +++ b/src/lj_asm_arm64.h @@ -295,8 +295,10 @@ static void asm_fusexref(ASMState *as, A64Ins ai, Reg rd, IRRef ref, } else if (asm_isk32(as, ir->op1, &ofs)) { ref = ir->op2; } else { - Reg rn = ra_alloc1(as, ir->op1, allow); - IRIns *irr = IR(ir->op2); + Reg refk = irref_isk(ir->op1) ? ir->op1 : ir->op2; + Reg refv = irref_isk(ir->op1) ? ir->op2 : ir->op1; + Reg rn = ra_alloc1(as, refv, allow); + IRIns *irr = IR(refk); uint32_t m; if (irr+1 == ir && !ra_used(irr) && irr->o == IR_ADD && irref_isk(irr->op2)) { @@ -307,7 +309,7 @@ static void asm_fusexref(ASMState *as, A64Ins ai, Reg rd, IRRef ref, goto skipopm; } } - m = asm_fuseopm(as, 0, ir->op2, rset_exclude(allow, rn)); + m = asm_fuseopm(as, 0, refk, rset_exclude(allow, rn)); ofs = sizeof(GCstr); skipopm: emit_lso(as, ai, rd, rd, ofs); @@ -722,6 +724,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge) Reg dest = ra_dest(as, ir, allow); Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest)); Reg key = 0, tmp = RID_TMP; + Reg ftmp = RID_NONE, type = RID_NONE, scr = RID_NONE, tisnum = RID_NONE; IRRef refkey = ir->op2; IRIns *irkey = IR(refkey); int isk = irref_isk(ir->op2); @@ -751,6 +754,28 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge) } } + /* Allocate constants early. */ + if (irt_isnum(kt)) { + if (!isk) { + tisnum = ra_allock(as, LJ_TISNUM << 15, allow); + ftmp = ra_scratch(as, rset_exclude(RSET_FPR, key)); + rset_clear(allow, tisnum); + } + } else if (irt_isaddr(kt)) { + if (isk) { + int64_t kk = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64; + scr = ra_allock(as, kk, allow); + } else { + scr = ra_scratch(as, allow); + } + rset_clear(allow, scr); + } else { + lua_assert(irt_ispri(kt) && !irt_isnil(kt)); + type = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow); + scr = ra_scratch(as, rset_clear(allow, type)); + rset_clear(allow, scr); + } + /* Key not found in chain: jump to exit (if merged) or load niltv. */ l_end = emit_label(as); as->invmcp = NULL; @@ -780,9 +805,6 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge) emit_nm(as, A64I_CMPx, key, tmp); emit_lso(as, A64I_LDRx, tmp, dest, offsetof(Node, key.u64)); } else { - Reg tisnum = ra_allock(as, LJ_TISNUM << 15, allow); - Reg ftmp = ra_scratch(as, rset_exclude(RSET_FPR, key)); - rset_clear(allow, tisnum); emit_nm(as, A64I_FCMPd, key, ftmp); emit_dn(as, A64I_FMOV_D_R, (ftmp & 31), (tmp & 31)); emit_cond_branch(as, CC_LO, l_next); @@ -790,31 +812,21 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge) emit_lso(as, A64I_LDRx, tmp, dest, offsetof(Node, key.n)); } } else if (irt_isaddr(kt)) { - Reg scr; if (isk) { - int64_t kk = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64; - scr = ra_allock(as, kk, allow); emit_nm(as, A64I_CMPx, scr, tmp); emit_lso(as, A64I_LDRx, tmp, dest, offsetof(Node, key.u64)); } else { - scr = ra_scratch(as, allow); emit_nm(as, A64I_CMPx, tmp, scr); emit_lso(as, A64I_LDRx, scr, dest, offsetof(Node, key.u64)); } - rset_clear(allow, scr); } else { - Reg type, scr; - lua_assert(irt_ispri(kt) && !irt_isnil(kt)); - type = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow); - scr = ra_scratch(as, rset_clear(allow, type)); - rset_clear(allow, scr); emit_nm(as, A64I_CMPw, scr, type); emit_lso(as, A64I_LDRx, scr, dest, offsetof(Node, key)); } *l_loop = A64I_BCC | A64F_S19(as->mcp - l_loop) | CC_NE; if (!isk && irt_isaddr(kt)) { - Reg type = ra_allock(as, (int32_t)irt_toitype(kt), allow); + type = ra_allock(as, (int32_t)irt_toitype(kt), allow); emit_dnm(as, A64I_ADDx | A64F_SH(A64SH_LSL, 47), tmp, key, type); rset_clear(allow, type); } diff --git a/test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua b/test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua new file mode 100644 index 00000000..ffc449bc --- /dev/null +++ b/test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua @@ -0,0 +1,102 @@ +local tap = require('tap') + +-- Test file to demonstrate LuaJIT bug with constant +-- rematerialization on arm64. +-- See also https://github.com/LuaJIT/LuaJIT/pull/438. +local test = tap.test('lj-438-arm64-constant-rematerialization') +test:plan(1) + +-- This test file demonstrates the following problem: +-- The assembly of an IR instruction allocates a constant into a +-- free register. Then it spills another register (due to high +-- register pressure), which is rematerialized using the same +-- constant (which it assumes is now in the allocated register). +-- In case when the first register also happens to be the +-- destination register, the constant value is modified before the +-- rematerialization. +-- +-- For the code below we get the following register allocation +-- order (read from top to bottom (DBG RA reversed)): +-- | current IR | operation | IR ref | register +-- | 0048 alloc 0038 x0 +-- | 0048 remat K038 x0 +-- | 0048 alloc K023 x4 +-- +-- Which leads to the following asembly: +-- | ... +-- | add x4, x4, x0 # x4 modified before x0 rematerialization +-- | ldrb w4, [x4, #24] +-- | add x0, x4, #24 # constant x0 rematerialization +-- | ... +-- As a result, the value register x0 holding is incorrect. + +local empty = {} + +jit.off() +jit.flush() + +-- XXX: The example below is very fragile. Even the names of +-- the variables matter. +local function scan(vs) + -- The code below is needed to generate high register pressure + -- and specific register allocations. + for _, v in ipairs(vs) do + -- XXX: Just more usage of registers. Nothing significant. + local sep = v:find('@') + -- Recording of yielding `string.byte()` result encodes XLOAD + -- IR. Its assembly modifies x4 register, that is chosen as + -- a destination register. + -- IR_NE, that using `asm_href()` uses the modified x4 + -- register as a source for constant x0 rematerialization. + -- As far as it is modified before, the result value is + -- incorrect. + -- luacheck: ignore + if v:sub(sep + 2, -2):byte() == 0x3f then -- 0x3f == '?' + end + + -- XXX: Just more usage of registers. Nothing significant. + local _ = empty[v] + + -- Here the `str` strdata value (rematerialized x0 register) + -- given to the `lj_str_find()` is invalid on the trace, + -- that as a result leading to the core dump. + v:find(':') + end +end + +jit.on() +jit.opt.start('hotloop=1', 'loopunroll=1') + +-- This wrapper function is needed to avoid excess errors 'leaving +-- loop in the root trace'. +local function wrap() + -- XXX: There are four failing attemts to compile trace for this + -- code: + -- * The first trace trying to record starts with the ITERL BC + -- in `scan()` function. The compilation failed, because + -- recording starts at the second iteration, when the loop is + -- left. + -- * The second trace starts with UGET (scan) in the cycle + -- below. Entering calling the `scan` function compilation + -- failed, when sees the inner ITERL loop. + -- * The third trace starts with GGET (ipairs) in the `scan()` + -- function trying to record the hot function. The compilation + -- is failed due to facing the inner ITERL loop. + -- * At 19th iteration the ITERL trying to be recorded again + -- after this instruction become hot again. + -- + -- And, finally, at 39th iteration the `for` loop below is + -- recorded after becoming hot again. Now the compiler inlining + -- the inner loop and recording doesn't fail. + -- The 40th iteration is needed to be sure the compiled mcode is + -- correct. + for _ = 1, 40 do + scan({'ab@xyz'}) + end +end + +wrap() + +test:ok(true, 'the resulting trace is correct') + +os.exit(test:check() and 0 or 1) -- 2.34.1