From mboxrd@z Thu Jan 1 00:00:00 1970
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.120.41.1.1\))
In-Reply-To: <20220831095237.18440-1-skaplun@tarantool.org>
Date: Tue, 6 Sep 2022 11:43:56 +0300
References: <20220831095237.18440-1-skaplun@tarantool.org>
To: Sergey Kaplun
Subject: Re: [Tarantool-patches] [PATCH luajit] ARM64: Avoid side-effects of constant rematerialization.
List-Id: Tarantool development patches
From: sergos via Tarantool-patches
Reply-To: sergos
Cc: tarantool-patches@dev.tarantool.org

Hi!

Thanks for the patch!
I can’t say much about the patch from Mike itself, so LGTM.
Just some nits in the commit message.

Sergos

> On 31 Aug 2022, at 12:52, Sergey Kaplun wrote:
>
> From: Mike Pall
>
> Thanks to Patrick Galizia.
>
> (cherry picked from commit b33e3f2d441590f4de0d189bd9a65661824a48f6)
>
> Constant rematerialization must not use other registers that contain
> constants, if the register is in-flight. When we have the high
                                ^^^^^^ in use?
> register pressure we can face the following issue:
>
> The assembly of an IR instruction allocates a constant into a free
> register. Then it spills another register (due to high register
> pressure), which is rematerialized using the same constant (which it
> assumes is now in the allocated register). In case when the first
> register also happens to be the destination register, the constant value
> is modified before the rematerialization.
>
> For the code in the test for this commit we get the following register
> allocation order (read from top to bottom (DBG RA reversed)):
> | current IR | operation | IR ref | register
> | 0048         alloc       0038     x0
> | 0048         remat       K038     x0
> | 0048         alloc       K023     x4
>
> Which leads to the following assembly:
> | ...
> | add x4, x4, x0 # x4 modified before x0 rematerialization
> | ldrb w4, [x4, #24]
> | add x0, x4, #24 # constant x0 rematerialization
> | ...
> As a result, the value register x0 holding is incorrect.
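
A side note that may help readers of the commit message: the hazard
described above can be modelled in a few lines of C. This is only a toy
sketch, not LuaJIT code. The constant values and the names `K023`,
`K038`, `remat_buggy`, `remat_fixed` and `remat_selfcheck` are made up
for illustration, and the `ldrb` is modelled as a byte passed in by the
caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant values, chosen only for illustration. */
#define K023 UINT64_C(0x1000)   /* constant the allocator placed in x4 */
#define K038 (K023 + 24)        /* constant that can be rebuilt as K023 + 24 */

/* Order from the broken trace: x4 is clobbered, then used to rebuild K038. */
static uint64_t remat_buggy(uint64_t x0, uint8_t loaded)
{
  uint64_t x4 = K023;
  x4 = x4 + x0;      /* add  x4, x4, x0 */
  x4 = loaded;       /* ldrb w4, [x4, #24]; the load is modelled as `loaded` */
  x0 = x4 + 24;      /* add  x0, x4, #24; x4 no longer holds K023 */
  return x0;
}

/* Order after the patch: K038 is rebuilt from an untouched register x1. */
static uint64_t remat_fixed(uint64_t x0, uint8_t loaded)
{
  uint64_t x1 = K038 - 112;  /* assumed: another constant still live in x1 */
  uint64_t x4 = K023;
  x4 = x0 + x4;      /* add  x4, x0, x4 */
  x4 = loaded;       /* ldrb w4, [x4, #24] */
  x0 = x1 + 112;     /* add  x0, x1, #112 */
  (void)x4;
  return x0;
}

/* Small self-check: the fixed order yields K038, the buggy one does not. */
static void remat_selfcheck(void)
{
  assert(remat_fixed(5, 0x3f) == K038);
  assert(remat_buggy(5, 0x3f) == 0x3f + 24);
}
```

Calling `remat_selfcheck()` shows the point: in the buggy order the
"constant" ends up depending on the loaded byte, while in the fixed
order it does not.
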
>
> This patch moves allocation of constants for earlier to be sure that the
                                           ^^^ remove it
> rematerialization can not make use of the same constant as one of the
> sources of the IR instruction.
>
> After the patch register allocation order is the following:
> | current IR | operation | IR ref | register
> | 0048         alloc       K023     x4
> | 0048         alloc       0038     x0
> | 0048         remat       K038     x0
>
> Also, this patch fixes the `asm_fusexref()` logic for the `IR_STRREF` in
> case, when both operands don't fit in 32-bit constants (`asm_isk32()`
> fails). We want to use the IR operand holds the referenced value in
                                        holding
> `ra_alloc1()` as one having the hint set (`ra_hashint()` check passes).
> It is set for the operand with a non constant value (`irref_isk()`
> fails). The code assumes that this is always the `ir->op1` operand, so
                                it
> for cases when this value holds `ir->op2` operand register allocator
      the case                                      the
> misses the aforementioned hint in `ir->op2`. As the result the wrong
> register is selected. This patch adds the corresponding `irref_isk()`
> check for the `ir->op1` to detect which operand contains the value with
> the hint.
>
> After the patch the resulting assembly is the following:
> | ...
> | add x4, x0, x4
> | ldrb w4, [x4, #24]
> | add x0, x1, #112
> | ...
>
> As we can see the constant is rematerialized from another, non-modified
> register.
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#7230
> ---
>
> The test case leads to the coredump when compile with
> -DCMAKE_BUILD_TYPE=[Release, RelWithDebInfo].
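
For the `asm_fusexref()` part of the commit message above, the selection
logic reduces to: whichever operand is a constant becomes `refk`, and the
other one becomes `refv`, which is the one passed to `ra_alloc1()`. A
minimal stand-alone sketch, assuming that constant refs are the ones
below a bias value (the way `irref_isk()` tests it in LuaJIT). The
`Toy*`/`toy_*` names and the concrete ref numbers are hypothetical and
exist only in this example.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the IR reference bias: refs below it are constants. */
enum { TOY_REF_BIAS = 0x8000 };

typedef uint16_t ToyRef;

/* Toy analogue of irref_isk(): is this reference a constant? */
static int toy_ref_isk(ToyRef ref)
{
  return ref < TOY_REF_BIAS;
}

typedef struct ToyOperands {
  ToyRef refk;  /* the constant operand */
  ToyRef refv;  /* the non-constant operand, i.e. the one carrying the hint */
} ToyOperands;

/* Pick the operands the way the patched code does, regardless of order. */
static ToyOperands toy_pick(ToyRef op1, ToyRef op2)
{
  ToyOperands r;
  if (toy_ref_isk(op1)) {
    r.refk = op1;
    r.refv = op2;
  } else {
    r.refk = op2;
    r.refv = op1;
  }
  return r;
}
```

With this selection, `toy_pick(0x7ff0, 0x8010)` and
`toy_pick(0x8010, 0x7ff0)` give the same `refk`/`refv` pair, so the
result no longer depends on which slot the non-constant operand sits in.
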
>
> Issue: https://github.com/tarantool/tarantool/issues/7230
> PRs:
> * https://github.com/LuaJIT/LuaJIT/pull/438
> * https://github.com/LuaJIT/LuaJIT/pull/479
> Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-438-arm64-constant-rematerialization-full-ci
> Tarantool PR: https://github.com/tarantool/tarantool/pull/7628
>
> src/lj_asm_arm64.h                            |  46 +++++---
> ...-arm64-constant-rematerialization.test.lua | 102 ++++++++++++++++++
> 2 files changed, 131 insertions(+), 17 deletions(-)
> create mode 100644 test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua
>
> diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h
> index da0ee4bb..a4de187f 100644
> --- a/src/lj_asm_arm64.h
> +++ b/src/lj_asm_arm64.h
> @@ -295,8 +295,10 @@ static void asm_fusexref(ASMState *as, A64Ins ai, Reg rd, IRRef ref,
>        } else if (asm_isk32(as, ir->op1, &ofs)) {
>          ref = ir->op2;
>        } else {
> -        Reg rn = ra_alloc1(as, ir->op1, allow);
> -        IRIns *irr = IR(ir->op2);
> +        Reg refk = irref_isk(ir->op1) ? ir->op1 : ir->op2;
> +        Reg refv = irref_isk(ir->op1) ? ir->op2 : ir->op1;
> +        Reg rn = ra_alloc1(as, refv, allow);
> +        IRIns *irr = IR(refk);
>          uint32_t m;
>          if (irr+1 == ir && !ra_used(irr) &&
>              irr->o == IR_ADD && irref_isk(irr->op2)) {
> @@ -307,7 +309,7 @@ static void asm_fusexref(ASMState *as, A64Ins ai, Reg rd, IRRef ref,
>              goto skipopm;
>            }
>          }
> -        m = asm_fuseopm(as, 0, ir->op2, rset_exclude(allow, rn));
> +        m = asm_fuseopm(as, 0, refk, rset_exclude(allow, rn));
>          ofs = sizeof(GCstr);
>        skipopm:
>          emit_lso(as, ai, rd, rd, ofs);
> @@ -722,6 +724,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>    Reg dest = ra_dest(as, ir, allow);
>    Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest));
>    Reg key = 0, tmp = RID_TMP;
> +  Reg ftmp = RID_NONE, type = RID_NONE, scr = RID_NONE, tisnum = RID_NONE;
>    IRRef refkey = ir->op2;
>    IRIns *irkey = IR(refkey);
>    int isk = irref_isk(ir->op2);
> @@ -751,6 +754,28 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>      }
>    }
>
> +  /* Allocate constants early. */
> +  if (irt_isnum(kt)) {
> +    if (!isk) {
> +      tisnum = ra_allock(as, LJ_TISNUM << 15, allow);
> +      ftmp = ra_scratch(as, rset_exclude(RSET_FPR, key));
> +      rset_clear(allow, tisnum);
> +    }
> +  } else if (irt_isaddr(kt)) {
> +    if (isk) {
> +      int64_t kk = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
> +      scr = ra_allock(as, kk, allow);
> +    } else {
> +      scr = ra_scratch(as, allow);
> +    }
> +    rset_clear(allow, scr);
> +  } else {
> +    lua_assert(irt_ispri(kt) && !irt_isnil(kt));
> +    type = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow);
> +    scr = ra_scratch(as, rset_clear(allow, type));
> +    rset_clear(allow, scr);
> +  }
> +
>    /* Key not found in chain: jump to exit (if merged) or load niltv. */
>    l_end = emit_label(as);
>    as->invmcp = NULL;
> @@ -780,9 +805,6 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>          emit_nm(as, A64I_CMPx, key, tmp);
>        emit_lso(as, A64I_LDRx, tmp, dest, offsetof(Node, key.u64));
>      } else {
> -      Reg tisnum = ra_allock(as, LJ_TISNUM << 15, allow);
> -      Reg ftmp = ra_scratch(as, rset_exclude(RSET_FPR, key));
> -      rset_clear(allow, tisnum);
>        emit_nm(as, A64I_FCMPd, key, ftmp);
>        emit_dn(as, A64I_FMOV_D_R, (ftmp & 31), (tmp & 31));
>        emit_cond_branch(as, CC_LO, l_next);
> @@ -790,31 +812,21 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>        emit_lso(as, A64I_LDRx, tmp, dest, offsetof(Node, key.n));
>      }
>    } else if (irt_isaddr(kt)) {
> -    Reg scr;
>      if (isk) {
> -      int64_t kk = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
> -      scr = ra_allock(as, kk, allow);
>        emit_nm(as, A64I_CMPx, scr, tmp);
>        emit_lso(as, A64I_LDRx, tmp, dest, offsetof(Node, key.u64));
>      } else {
> -      scr = ra_scratch(as, allow);
>        emit_nm(as, A64I_CMPx, tmp, scr);
>        emit_lso(as, A64I_LDRx, scr, dest, offsetof(Node, key.u64));
>      }
> -    rset_clear(allow, scr);
>    } else {
> -    Reg type, scr;
> -    lua_assert(irt_ispri(kt) && !irt_isnil(kt));
> -    type = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow);
> -    scr = ra_scratch(as, rset_clear(allow, type));
> -    rset_clear(allow, scr);
>      emit_nm(as, A64I_CMPw, scr, type);
>      emit_lso(as, A64I_LDRx, scr, dest, offsetof(Node, key));
>    }
>
>    *l_loop = A64I_BCC | A64F_S19(as->mcp - l_loop) | CC_NE;
>    if (!isk && irt_isaddr(kt)) {
> -    Reg type = ra_allock(as, (int32_t)irt_toitype(kt), allow);
> +    type = ra_allock(as, (int32_t)irt_toitype(kt), allow);
>      emit_dnm(as, A64I_ADDx | A64F_SH(A64SH_LSL, 47), tmp, key, type);
>      rset_clear(allow, type);
>    }
> diff --git a/test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua b/test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua
> new file mode 100644
> index 00000000..ffc449bc
> --- /dev/null
> +++ b/test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua
> @@ -0,0 +1,102 @@
> +local tap = require('tap')
> +
> +-- Test file to demonstrate LuaJIT bug with constant
> +-- rematerialization on arm64.
> +-- See also https://github.com/LuaJIT/LuaJIT/pull/438.
> +local test = tap.test('lj-438-arm64-constant-rematerialization')
> +test:plan(1)
> +
> +-- This test file demonstrates the following problem:
> +-- The assembly of an IR instruction allocates a constant into a
> +-- free register. Then it spills another register (due to high
> +-- register pressure), which is rematerialized using the same
> +-- constant (which it assumes is now in the allocated register).
> +-- In case when the first register also happens to be the
> +-- destination register, the constant value is modified before the
> +-- rematerialization.
> +--
> +-- For the code below we get the following register allocation
> +-- order (read from top to bottom (DBG RA reversed)):
> +-- | current IR | operation | IR ref | register
> +-- | 0048         alloc       0038     x0
> +-- | 0048         remat       K038     x0
> +-- | 0048         alloc       K023     x4
> +--
> +-- Which leads to the following assembly:
> +-- | ...
> +-- | add x4, x4, x0 # x4 modified before x0 rematerialization
> +-- | ldrb w4, [x4, #24]
> +-- | add x0, x4, #24 # constant x0 rematerialization
> +-- | ...
> +-- As a result, the value register x0 holding is incorrect.
> +
> +local empty = {}
> +
> +jit.off()
> +jit.flush()
> +
> +-- XXX: The example below is very fragile. Even the names of
> +-- the variables matter.
> +local function scan(vs)
> +  -- The code below is needed to generate high register pressure
> +  -- and specific register allocations.
> +  for _, v in ipairs(vs) do
> +    -- XXX: Just more usage of registers. Nothing significant.
> +    local sep = v:find('@')
> +    -- Recording of yielding `string.byte()` result encodes XLOAD
> +    -- IR. Its assembly modifies x4 register, that is chosen as
> +    -- a destination register.
> +    -- IR_NE, that using `asm_href()` uses the modified x4
> +    -- register as a source for constant x0 rematerialization.
> +    -- As far as it is modified before, the result value is
> +    -- incorrect.
> +    -- luacheck: ignore
> +    if v:sub(sep + 2, -2):byte() == 0x3f then -- 0x3f == '?'
> +    end
> +
> +    -- XXX: Just more usage of registers. Nothing significant.
> +    local _ = empty[v]
> +
> +    -- Here the `str` strdata value (rematerialized x0 register)
> +    -- given to the `lj_str_find()` is invalid on the trace,
> +    -- that as a result leading to the core dump.
> +    v:find(':')
> +  end
> +end
> +
> +jit.on()
> +jit.opt.start('hotloop=1', 'loopunroll=1')
> +
> +-- This wrapper function is needed to avoid excess errors 'leaving
> +-- loop in the root trace'.
> +local function wrap()
> +  -- XXX: There are four failing attempts to compile trace for this
> +  -- code:
> +  -- * The first trace trying to record starts with the ITERL BC
> +  --   in `scan()` function. The compilation failed, because
> +  --   recording starts at the second iteration, when the loop is
> +  --   left.
> +  -- * The second trace starts with UGET (scan) in the cycle
> +  --   below. Entering calling the `scan` function compilation
> +  --   failed, when sees the inner ITERL loop.
> +  -- * The third trace starts with GGET (ipairs) in the `scan()`
> +  --   function trying to record the hot function. The compilation
> +  --   is failed due to facing the inner ITERL loop.
> +  -- * At 19th iteration the ITERL trying to be recorded again
> +  --   after this instruction become hot again.
> +  --
> +  -- And, finally, at 39th iteration the `for` loop below is
> +  -- recorded after becoming hot again. Now the compiler inlining
> +  -- the inner loop and recording doesn't fail.
> +  -- The 40th iteration is needed to be sure the compiled mcode is
> +  -- correct.
> +  for _ = 1, 40 do
> +    scan({'ab@xyz'})
> +  end
> +end
> +
> +wrap()
> +
> +test:ok(true, 'the resulting trace is correct')
> +
> +os.exit(test:check() and 0 or 1)
> -- 
> 2.34.1
>