From: sergos via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Sergey Kaplun <skaplun@tarantool.org> Cc: tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH luajit] ARM64: Avoid side-effects of constant rematerialization. Date: Tue, 6 Sep 2022 11:43:56 +0300 [thread overview] Message-ID: <DEB757EB-6D1B-4B7D-ACAE-A91EF7A267B9@tarantool.org> (raw) In-Reply-To: <20220831095237.18440-1-skaplun@tarantool.org> Hi! Thanks for the patch! As I can’t say much about the patch from Mike, LGTM. Just some nits in the comment. Sergos > On 31 Aug 2022, at 12:52, Sergey Kaplun <skaplun@tarantool.org> wrote: > > From: Mike Pall <mike> > > Thanks to Patrick Galizia. > > (cherry picked from commit b33e3f2d441590f4de0d189bd9a65661824a48f6) > > Constant rematerialization must not use other registers that contain > constants, if the register is in-flight. When we have the high ^^^^^^ in use? > regitster pressure we can face the following issue: > > The assembly of an IR instruction allocates a constant into a free > register. Then it spills another register (due to high register > pressure), which is rematerialized using the same constant (which it > assumes is now in the allocated register). In case when the first > register also happens to be the destination register, the constant value > is modified before the rematerialization. > > For the code in the test for this commit we get the following register > allocation order (read from top to bottom (DBG RA reversed)): > | current IR | operation | IR ref | register > | 0048 alloc 0038 x0 > | 0048 remat K038 x0 > | 0048 alloc K023 x4 > > Which leads to the following asembly: > | ... > | add x4, x4, x0 # x4 modified before x0 rematerialization > | ldrb w4, [x4, #24] > | add x0, x4, #24 # constant x0 rematerialization > | ... > As a result, the value register x0 holding is incorrect. > > This patch moves allocation of constants for earlier to be sure that the ^^^ remove it > rematerialization can not make use of the same constant as one of the > sources of the IR instruction. > > After the patch register allocation order is the following: > | current IR | operation | IR ref | register > | 0048 alloc K023 x4 > | 0048 alloc 0038 x0 > | 0048 remat K038 x0 > > Also, this patch fixes the `asm_fusexref()` logic for the `IR_STRREF` in > case, when both operands don't fit in 32-bit constants (`asm_isk32()` > fails). We want to use the IR operand holds the referenced value in holding > `ra_alloc1()` as one having the hint set (`ra_hashint()` check passes). > It is set for the operand with a non constant value (`irref_isk()` > fails). The code assumes that this is always the `ir->op1` operand, so it > for cases when this value holds `ir->op2` operand register allocator the case the > misses the aforementioned hint in `ir->op2`. As the result the wrong > register is selected. This patch adds the corresponding `irref_isk()` > check for the `ir->op1` to detect which operand contains the value with > the hint. > > After the patch the resulting assembly is the following: > | ... > | add x4, x0, x4 > | ldrb w4, [x4, #24] > | add x0, x1, #112 > | ... > > As we can see the constant is rematerialized from another, non-modified > register. > > Sergey Kaplun: > * added the description and the test for the problem > > Part of tarantool/tarantool#7230 > --- > > The test case leads to the coredump when compile with > -DCMAKE_BUILD_TYPE=[Release, RelWithDebInfo]. > > Issue: https://github.com/tarantool/tarantool/issues/7230 > PRs: > * https://github.com/LuaJIT/LuaJIT/pull/438 > * https://github.com/LuaJIT/LuaJIT/pull/479 > Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-438-arm64-constant-rematerialization-full-ci > Tarantool PR: https://github.com/tarantool/tarantool/pull/7628 > > src/lj_asm_arm64.h | 46 +++++--- > ...-arm64-constant-rematerialization.test.lua | 102 ++++++++++++++++++ > 2 files changed, 131 insertions(+), 17 deletions(-) > create mode 100644 test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua > > diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h > index da0ee4bb..a4de187f 100644 > --- a/src/lj_asm_arm64.h > +++ b/src/lj_asm_arm64.h > @@ -295,8 +295,10 @@ static void asm_fusexref(ASMState *as, A64Ins ai, Reg rd, IRRef ref, > } else if (asm_isk32(as, ir->op1, &ofs)) { > ref = ir->op2; > } else { > - Reg rn = ra_alloc1(as, ir->op1, allow); > - IRIns *irr = IR(ir->op2); > + Reg refk = irref_isk(ir->op1) ? ir->op1 : ir->op2; > + Reg refv = irref_isk(ir->op1) ? ir->op2 : ir->op1; > + Reg rn = ra_alloc1(as, refv, allow); > + IRIns *irr = IR(refk); > uint32_t m; > if (irr+1 == ir && !ra_used(irr) && > irr->o == IR_ADD && irref_isk(irr->op2)) { > @@ -307,7 +309,7 @@ static void asm_fusexref(ASMState *as, A64Ins ai, Reg rd, IRRef ref, > goto skipopm; > } > } > - m = asm_fuseopm(as, 0, ir->op2, rset_exclude(allow, rn)); > + m = asm_fuseopm(as, 0, refk, rset_exclude(allow, rn)); > ofs = sizeof(GCstr); > skipopm: > emit_lso(as, ai, rd, rd, ofs); > @@ -722,6 +724,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge) > Reg dest = ra_dest(as, ir, allow); > Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest)); > Reg key = 0, tmp = RID_TMP; > + Reg ftmp = RID_NONE, type = RID_NONE, scr = RID_NONE, tisnum = RID_NONE; > IRRef refkey = ir->op2; > IRIns *irkey = IR(refkey); > int isk = irref_isk(ir->op2); > @@ -751,6 +754,28 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge) > } > } > > + /* Allocate constants early. */ > + if (irt_isnum(kt)) { > + if (!isk) { > + tisnum = ra_allock(as, LJ_TISNUM << 15, allow); > + ftmp = ra_scratch(as, rset_exclude(RSET_FPR, key)); > + rset_clear(allow, tisnum); > + } > + } else if (irt_isaddr(kt)) { > + if (isk) { > + int64_t kk = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64; > + scr = ra_allock(as, kk, allow); > + } else { > + scr = ra_scratch(as, allow); > + } > + rset_clear(allow, scr); > + } else { > + lua_assert(irt_ispri(kt) && !irt_isnil(kt)); > + type = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow); > + scr = ra_scratch(as, rset_clear(allow, type)); > + rset_clear(allow, scr); > + } > + > /* Key not found in chain: jump to exit (if merged) or load niltv. */ > l_end = emit_label(as); > as->invmcp = NULL; > @@ -780,9 +805,6 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge) > emit_nm(as, A64I_CMPx, key, tmp); > emit_lso(as, A64I_LDRx, tmp, dest, offsetof(Node, key.u64)); > } else { > - Reg tisnum = ra_allock(as, LJ_TISNUM << 15, allow); > - Reg ftmp = ra_scratch(as, rset_exclude(RSET_FPR, key)); > - rset_clear(allow, tisnum); > emit_nm(as, A64I_FCMPd, key, ftmp); > emit_dn(as, A64I_FMOV_D_R, (ftmp & 31), (tmp & 31)); > emit_cond_branch(as, CC_LO, l_next); > @@ -790,31 +812,21 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge) > emit_lso(as, A64I_LDRx, tmp, dest, offsetof(Node, key.n)); > } > } else if (irt_isaddr(kt)) { > - Reg scr; > if (isk) { > - int64_t kk = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64; > - scr = ra_allock(as, kk, allow); > emit_nm(as, A64I_CMPx, scr, tmp); > emit_lso(as, A64I_LDRx, tmp, dest, offsetof(Node, key.u64)); > } else { > - scr = ra_scratch(as, allow); > emit_nm(as, A64I_CMPx, tmp, scr); > emit_lso(as, A64I_LDRx, scr, dest, offsetof(Node, key.u64)); > } > - rset_clear(allow, scr); > } else { > - Reg type, scr; > - lua_assert(irt_ispri(kt) && !irt_isnil(kt)); > - type = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow); > - scr = ra_scratch(as, rset_clear(allow, type)); > - rset_clear(allow, scr); > emit_nm(as, A64I_CMPw, scr, type); > emit_lso(as, A64I_LDRx, scr, dest, offsetof(Node, key)); > } > > *l_loop = A64I_BCC | A64F_S19(as->mcp - l_loop) | CC_NE; > if (!isk && irt_isaddr(kt)) { > - Reg type = ra_allock(as, (int32_t)irt_toitype(kt), allow); > + type = ra_allock(as, (int32_t)irt_toitype(kt), allow); > emit_dnm(as, A64I_ADDx | A64F_SH(A64SH_LSL, 47), tmp, key, type); > rset_clear(allow, type); > } > diff --git a/test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua b/test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua > new file mode 100644 > index 00000000..ffc449bc > --- /dev/null > +++ b/test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua > @@ -0,0 +1,102 @@ > +local tap = require('tap') > + > +-- Test file to demonstrate LuaJIT bug with constant > +-- rematerialization on arm64. > +-- See also https://github.com/LuaJIT/LuaJIT/pull/438. > +local test = tap.test('lj-438-arm64-constant-rematerialization') > +test:plan(1) > + > +-- This test file demonstrates the following problem: > +-- The assembly of an IR instruction allocates a constant into a > +-- free register. Then it spills another register (due to high > +-- register pressure), which is rematerialized using the same > +-- constant (which it assumes is now in the allocated register). > +-- In case when the first register also happens to be the > +-- destination register, the constant value is modified before the > +-- rematerialization. > +-- > +-- For the code below we get the following register allocation > +-- order (read from top to bottom (DBG RA reversed)): > +-- | current IR | operation | IR ref | register > +-- | 0048 alloc 0038 x0 > +-- | 0048 remat K038 x0 > +-- | 0048 alloc K023 x4 > +-- > +-- Which leads to the following asembly: > +-- | ... > +-- | add x4, x4, x0 # x4 modified before x0 rematerialization > +-- | ldrb w4, [x4, #24] > +-- | add x0, x4, #24 # constant x0 rematerialization > +-- | ... > +-- As a result, the value register x0 holding is incorrect. > + > +local empty = {} > + > +jit.off() > +jit.flush() > + > +-- XXX: The example below is very fragile. Even the names of > +-- the variables matter. > +local function scan(vs) > + -- The code below is needed to generate high register pressure > + -- and specific register allocations. > + for _, v in ipairs(vs) do > + -- XXX: Just more usage of registers. Nothing significant. > + local sep = v:find('@') > + -- Recording of yielding `string.byte()` result encodes XLOAD > + -- IR. Its assembly modifies x4 register, that is chosen as > + -- a destination register. > + -- IR_NE, that using `asm_href()` uses the modified x4 > + -- register as a source for constant x0 rematerialization. > + -- As far as it is modified before, the result value is > + -- incorrect. > + -- luacheck: ignore > + if v:sub(sep + 2, -2):byte() == 0x3f then -- 0x3f == '?' > + end > + > + -- XXX: Just more usage of registers. Nothing significant. > + local _ = empty[v] > + > + -- Here the `str` strdata value (rematerialized x0 register) > + -- given to the `lj_str_find()` is invalid on the trace, > + -- that as a result leading to the core dump. > + v:find(':') > + end > +end > + > +jit.on() > +jit.opt.start('hotloop=1', 'loopunroll=1') > + > +-- This wrapper function is needed to avoid excess errors 'leaving > +-- loop in the root trace'. > +local function wrap() > + -- XXX: There are four failing attemts to compile trace for this > + -- code: > + -- * The first trace trying to record starts with the ITERL BC > + -- in `scan()` function. The compilation failed, because > + -- recording starts at the second iteration, when the loop is > + -- left. > + -- * The second trace starts with UGET (scan) in the cycle > + -- below. Entering calling the `scan` function compilation > + -- failed, when sees the inner ITERL loop. > + -- * The third trace starts with GGET (ipairs) in the `scan()` > + -- function trying to record the hot function. The compilation > + -- is failed due to facing the inner ITERL loop. > + -- * At 19th iteration the ITERL trying to be recorded again > + -- after this instruction become hot again. > + -- > + -- And, finally, at 39th iteration the `for` loop below is > + -- recorded after becoming hot again. Now the compiler inlining > + -- the inner loop and recording doesn't fail. > + -- The 40th iteration is needed to be sure the compiled mcode is > + -- correct. > + for _ = 1, 40 do > + scan({'ab@xyz'}) > + end > +end > + > +wrap() > + > +test:ok(true, 'the resulting trace is correct') > + > +os.exit(test:check() and 0 or 1) > -- > 2.34.1 >
next prev parent reply other threads:[~2022-09-06 8:44 UTC|newest] Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top 2022-08-31 9:52 Sergey Kaplun via Tarantool-patches 2022-09-06 8:43 ` sergos via Tarantool-patches [this message] 2022-09-28 8:44 ` Sergey Kaplun via Tarantool-patches 2023-03-10 18:00 ` Igor Munkin via Tarantool-patches 2023-03-30 17:38 ` Igor Munkin via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=DEB757EB-6D1B-4B7D-ACAE-A91EF7A267B9@tarantool.org \ --to=tarantool-patches@dev.tarantool.org \ --cc=sergos@tarantool.org \ --cc=skaplun@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH luajit] ARM64: Avoid side-effects of constant rematerialization.' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox