Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH luajit] ARM64: Avoid side-effects of constant rematerialization.
@ 2022-08-31  9:52 Sergey Kaplun via Tarantool-patches
  2022-09-06  8:43 ` sergos via Tarantool-patches
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2022-08-31  9:52 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

From: Mike Pall <mike>

Thanks to Patrick Galizia.

(cherry picked from commit b33e3f2d441590f4de0d189bd9a65661824a48f6)

Constant rematerialization must not use other registers that contain
constants, if the register is in-flight. When we have the high
regitster pressure we can face the following issue:

The assembly of an IR instruction allocates a constant into a free
register. Then it spills another register (due to high register
pressure), which is rematerialized using the same constant (which it
assumes is now in the allocated register). In case when the first
register also happens to be the destination register, the constant value
is modified before the rematerialization.

For the code in the test for this commit we get the following register
allocation order (read from top to bottom (DBG RA reversed)):
| current IR | operation | IR ref | register
|  0048         alloc       0038     x0
|  0048         remat       K038     x0
|  0048         alloc       K023     x4

Which leads to the following asembly:
| ...
| add   x4, x4, x0    # x4 modified before x0 rematerialization
| ldrb  w4, [x4, #24]
| add   x0, x4, #24   # constant x0 rematerialization
| ...
As a result, the value register x0 holding is incorrect.

This patch moves allocation of constants for earlier to be sure that the
rematerialization can not make use of the same constant as one of the
sources of the IR instruction.

After the patch register allocation order is the following:
| current IR | operation | IR ref | register
|  0048         alloc       K023     x4
|  0048         alloc       0038     x0
|  0048         remat       K038     x0

Also, this patch fixes the `asm_fusexref()` logic for the `IR_STRREF` in
case, when both operands don't fit in 32-bit constants (`asm_isk32()`
fails). We want to use the IR operand holds the referenced value in
`ra_alloc1()` as one having the hint set (`ra_hashint()` check passes).
It is set for the operand with a non constant value (`irref_isk()`
fails). The code assumes that this is always the `ir->op1` operand, so
for cases when this value holds `ir->op2` operand register allocator
misses the aforementioned hint in `ir->op2`. As the result the wrong
register is selected. This patch adds the corresponding `irref_isk()`
check for the `ir->op1` to detect which operand contains the value with
the hint.

After the patch the resulting assembly is the following:
| ...
| add   x4, x0, x4
| ldrb  w4, [x4, #24]
| add   x0, x1, #112
| ...

As we can see the constant is rematerialized from another, non-modified
register.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#7230
---

The test case leads to the coredump when compile with
-DCMAKE_BUILD_TYPE=[Release, RelWithDebInfo].

Issue: https://github.com/tarantool/tarantool/issues/7230
PRs:
* https://github.com/LuaJIT/LuaJIT/pull/438
* https://github.com/LuaJIT/LuaJIT/pull/479
Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-438-arm64-constant-rematerialization-full-ci
Tarantool PR: https://github.com/tarantool/tarantool/pull/7628

 src/lj_asm_arm64.h                            |  46 +++++---
 ...-arm64-constant-rematerialization.test.lua | 102 ++++++++++++++++++
 2 files changed, 131 insertions(+), 17 deletions(-)
 create mode 100644 test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua

diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h
index da0ee4bb..a4de187f 100644
--- a/src/lj_asm_arm64.h
+++ b/src/lj_asm_arm64.h
@@ -295,8 +295,10 @@ static void asm_fusexref(ASMState *as, A64Ins ai, Reg rd, IRRef ref,
       } else if (asm_isk32(as, ir->op1, &ofs)) {
 	ref = ir->op2;
       } else {
-	Reg rn = ra_alloc1(as, ir->op1, allow);
-	IRIns *irr = IR(ir->op2);
+	Reg refk = irref_isk(ir->op1) ? ir->op1 : ir->op2;
+	Reg refv = irref_isk(ir->op1) ? ir->op2 : ir->op1;
+	Reg rn = ra_alloc1(as, refv, allow);
+	IRIns *irr = IR(refk);
 	uint32_t m;
 	if (irr+1 == ir && !ra_used(irr) &&
 	    irr->o == IR_ADD && irref_isk(irr->op2)) {
@@ -307,7 +309,7 @@ static void asm_fusexref(ASMState *as, A64Ins ai, Reg rd, IRRef ref,
 	    goto skipopm;
 	  }
 	}
-	m = asm_fuseopm(as, 0, ir->op2, rset_exclude(allow, rn));
+	m = asm_fuseopm(as, 0, refk, rset_exclude(allow, rn));
 	ofs = sizeof(GCstr);
       skipopm:
 	emit_lso(as, ai, rd, rd, ofs);
@@ -722,6 +724,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
   Reg dest = ra_dest(as, ir, allow);
   Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest));
   Reg key = 0, tmp = RID_TMP;
+  Reg ftmp = RID_NONE, type = RID_NONE, scr = RID_NONE, tisnum = RID_NONE;
   IRRef refkey = ir->op2;
   IRIns *irkey = IR(refkey);
   int isk = irref_isk(ir->op2);
@@ -751,6 +754,28 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
     }
   }
 
+  /* Allocate constants early. */
+  if (irt_isnum(kt)) {
+    if (!isk) {
+      tisnum = ra_allock(as, LJ_TISNUM << 15, allow);
+      ftmp = ra_scratch(as, rset_exclude(RSET_FPR, key));
+      rset_clear(allow, tisnum);
+    }
+  } else if (irt_isaddr(kt)) {
+    if (isk) {
+      int64_t kk = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
+      scr = ra_allock(as, kk, allow);
+    } else {
+      scr = ra_scratch(as, allow);
+    }
+    rset_clear(allow, scr);
+  } else {
+    lua_assert(irt_ispri(kt) && !irt_isnil(kt));
+    type = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow);
+    scr = ra_scratch(as, rset_clear(allow, type));
+    rset_clear(allow, scr);
+  }
+
   /* Key not found in chain: jump to exit (if merged) or load niltv. */
   l_end = emit_label(as);
   as->invmcp = NULL;
@@ -780,9 +805,6 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
 	emit_nm(as, A64I_CMPx, key, tmp);
       emit_lso(as, A64I_LDRx, tmp, dest, offsetof(Node, key.u64));
     } else {
-      Reg tisnum = ra_allock(as, LJ_TISNUM << 15, allow);
-      Reg ftmp = ra_scratch(as, rset_exclude(RSET_FPR, key));
-      rset_clear(allow, tisnum);
       emit_nm(as, A64I_FCMPd, key, ftmp);
       emit_dn(as, A64I_FMOV_D_R, (ftmp & 31), (tmp & 31));
       emit_cond_branch(as, CC_LO, l_next);
@@ -790,31 +812,21 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
       emit_lso(as, A64I_LDRx, tmp, dest, offsetof(Node, key.n));
     }
   } else if (irt_isaddr(kt)) {
-    Reg scr;
     if (isk) {
-      int64_t kk = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
-      scr = ra_allock(as, kk, allow);
       emit_nm(as, A64I_CMPx, scr, tmp);
       emit_lso(as, A64I_LDRx, tmp, dest, offsetof(Node, key.u64));
     } else {
-      scr = ra_scratch(as, allow);
       emit_nm(as, A64I_CMPx, tmp, scr);
       emit_lso(as, A64I_LDRx, scr, dest, offsetof(Node, key.u64));
     }
-    rset_clear(allow, scr);
   } else {
-    Reg type, scr;
-    lua_assert(irt_ispri(kt) && !irt_isnil(kt));
-    type = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow);
-    scr = ra_scratch(as, rset_clear(allow, type));
-    rset_clear(allow, scr);
     emit_nm(as, A64I_CMPw, scr, type);
     emit_lso(as, A64I_LDRx, scr, dest, offsetof(Node, key));
   }
 
   *l_loop = A64I_BCC | A64F_S19(as->mcp - l_loop) | CC_NE;
   if (!isk && irt_isaddr(kt)) {
-    Reg type = ra_allock(as, (int32_t)irt_toitype(kt), allow);
+    type = ra_allock(as, (int32_t)irt_toitype(kt), allow);
     emit_dnm(as, A64I_ADDx | A64F_SH(A64SH_LSL, 47), tmp, key, type);
     rset_clear(allow, type);
   }
diff --git a/test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua b/test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua
new file mode 100644
index 00000000..ffc449bc
--- /dev/null
+++ b/test/tarantool-tests/lj-438-arm64-constant-rematerialization.test.lua
@@ -0,0 +1,102 @@
+local tap = require('tap')
+
+-- Test file to demonstrate LuaJIT bug with constant
+-- rematerialization on arm64.
+-- See also https://github.com/LuaJIT/LuaJIT/pull/438.
+local test = tap.test('lj-438-arm64-constant-rematerialization')
+test:plan(1)
+
+-- This test file demonstrates the following problem:
+-- The assembly of an IR instruction allocates a constant into a
+-- free register. Then it spills another register (due to high
+-- register pressure), which is rematerialized using the same
+-- constant (which it assumes is now in the allocated register).
+-- In case when the first register also happens to be the
+-- destination register, the constant value is modified before the
+-- rematerialization.
+--
+-- For the code below we get the following register allocation
+-- order (read from top to bottom (DBG RA reversed)):
+-- | current IR | operation | IR ref | register
+-- |  0048         alloc       0038     x0
+-- |  0048         remat       K038     x0
+-- |  0048         alloc       K023     x4
+--
+-- Which leads to the following asembly:
+-- | ...
+-- | add   x4, x4, x0    # x4 modified before x0 rematerialization
+-- | ldrb  w4, [x4, #24]
+-- | add   x0, x4, #24   # constant x0 rematerialization
+-- | ...
+-- As a result, the value register x0 holding is incorrect.
+
+local empty = {}
+
+jit.off()
+jit.flush()
+
+-- XXX: The example below is very fragile. Even the names of
+-- the variables matter.
+local function scan(vs)
+  -- The code below is needed to generate high register pressure
+  -- and specific register allocations.
+  for _, v in ipairs(vs) do
+    -- XXX: Just more usage of registers. Nothing significant.
+    local sep = v:find('@')
+    -- Recording of yielding `string.byte()` result encodes XLOAD
+    -- IR. Its assembly modifies x4 register, that is chosen as
+    -- a destination register.
+    -- IR_NE, that using `asm_href()` uses the modified x4
+    -- register as a source for constant x0 rematerialization.
+    -- As far as it is modified before, the result value is
+    -- incorrect.
+    -- luacheck: ignore
+    if v:sub(sep + 2, -2):byte() == 0x3f then -- 0x3f == '?'
+    end
+
+    -- XXX: Just more usage of registers. Nothing significant.
+    local _ = empty[v]
+
+    -- Here the `str` strdata value (rematerialized x0 register)
+    -- given to the `lj_str_find()` is invalid on the trace,
+    -- that as a result leading to the core dump.
+    v:find(':')
+  end
+end
+
+jit.on()
+jit.opt.start('hotloop=1', 'loopunroll=1')
+
+-- This wrapper function is needed to avoid excess errors 'leaving
+-- loop in the root trace'.
+local function wrap()
+  -- XXX: There are four failing attemts to compile trace for this
+  -- code:
+  -- * The first trace trying to record starts with the ITERL BC
+  --   in `scan()` function. The compilation failed, because
+  --   recording starts at the second iteration, when the loop is
+  --   left.
+  -- * The second trace starts with UGET (scan) in the cycle
+  --   below. Entering calling the `scan` function compilation
+  --   failed, when sees the inner ITERL loop.
+  -- * The third trace starts with GGET (ipairs) in the `scan()`
+  --   function trying to record the hot function. The compilation
+  --   is failed due to facing the inner ITERL loop.
+  -- * At 19th iteration the ITERL trying to be recorded again
+  --   after this instruction become hot again.
+  --
+  -- And, finally, at 39th iteration the `for` loop below is
+  -- recorded after becoming hot again. Now the compiler inlining
+  -- the inner loop and recording doesn't fail.
+  -- The 40th iteration is needed to be sure the compiled mcode is
+  -- correct.
+  for _ = 1, 40 do
+    scan({'ab@xyz'})
+  end
+end
+
+wrap()
+
+test:ok(true, 'the resulting trace is correct')
+
+os.exit(test:check() and 0 or 1)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-03-30 17:46 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-31  9:52 [Tarantool-patches] [PATCH luajit] ARM64: Avoid side-effects of constant rematerialization Sergey Kaplun via Tarantool-patches
2022-09-06  8:43 ` sergos via Tarantool-patches
2022-09-28  8:44   ` Sergey Kaplun via Tarantool-patches
2023-03-10 18:00 ` Igor Munkin via Tarantool-patches
2023-03-30 17:38 ` Igor Munkin via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox