[Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma().

Wed Mar 22 11:27:39 MSK 2023

From: Mike Pall <mike>

(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)

The accessing of memory address for some operation `emit_rma()` may be
encoded in one of the following ways:
 a. If the offset of the accessing address from the dispatch table
    (pinned to r14 that is not changed while trace execution) fits into
    32-bit, then encode this as an access to 32-bit displacement
    relative to r14.
 b. If the offset of the accessing address from the mcode (i.e. rip)
    fits into 32-bit, then encode this as an access to 32-bit
    displacement relative to rip (considering long mode specifics and
    `RID_RIP` hack).
 c. If the address doesn't fit into 32-bit one and we use `mov` or
    `movsd`, then encode 64-bit load from this address.
 d. Elsewhere, encode it as an access to 32-bit (the address should fit
    into 32-bit one) displacement (the only option for non-GC64 mode).

So, each instruction in GC64 mode differs from `mov` or `movsd` should
be encoded via the last option. But if we got a 64-bit address with a
big enough offset it can't be encoded and the assertion in `ptr2addr()`
will fail.

There are several cases, when `emit_rma()` is used with non `mov`
instruction:
* `IR_LDEXP` with `fld` instruction for loading constant
   number `TValue` by address.
* `IR_OBAR` with the corresponding `test` instruction on
  `marked` field of `GCobj`.
All these instructions require an additional register to store value by
address. We can't truly allocate a register here due to possibility to
break IR assembling which depends on specific register usage. So, we use
and restore r14 here for emitting.

Also, this patch removes `movsd` from condition from the `x86Op` type
check, as far as it never uses for the `emit_rma()` routine (see also
`emit_loadk64()` for details).

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#8069
---

Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
PR: https://github.com/tarantool/tarantool/pull/8477
Related issue: https://github.com/tarantool/tarantool/issues/8069

AFAICS, other places with `emit_rma()` usage are not related to the
patch as far as they take an offset for the address of JIT constants
stored in `jit_State`, so it always be near enough to dispatch.

Side note: you may check test-correctness of the last check with GC by
changing the corresponding condition check on `GC_WHITES` in asm_obar to
CC_NZ (like it will be treated for incorrect check). Be carefull, member
that instructions are emitted from bottom to top!

 src/lj_emit_x86.h                          |  24 ++++-
 test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
 2 files changed, 123 insertions(+), 3 deletions(-)
 create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua

diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h
index 6b58306b..b3dc4ea5 100644
--- a/src/lj_emit_x86.h
+++ b/src/lj_emit_x86.h
@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr)
     emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr));
   } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) {
     emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr));
-  } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) {
-    emit_rmro(as, xo, rr, rr, 0);
-    emit_loadu64(as, rr, (uintptr_t)addr);
+  } else if (!checki32((intptr_t)addr)) {
+    Reg ra = (rr & 15);
+    if (xo != XO_MOV) {
+      /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */
+      uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch;
+      uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0;
+      ra = RID_DISPATCH;
+      if (checku32(dispaddr)) {
+	emit_loadi(as, ra, (int32_t)dispaddr);
+      } else {  /* Full-size 64 bit load. */
+	MCode *p = as->mcp;
+	*(uint64_t *)(p-8) = dispaddr;
+	p[-9] = (MCode)(XI_MOVri+(ra&7));
+	p[-10] = 0x48 + ((ra>>3)&1);
+	p -= 10;
+	as->mcp = p;
+      }
+      if (xo == XO_GROUP3b) emit_i8(as, i8);
+    }
+    emit_rmro(as, xo, rr, ra, 0);
+    emit_loadu64(as, ra, (uintptr_t)addr);
   } else
 #endif
   {
diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua
new file mode 100644
index 00000000..faddfe83
--- /dev/null
+++ b/test/tarantool-tests/fix-emit-rma.test.lua
@@ -0,0 +1,102 @@
+local tap = require('tap')
+local test = tap.test('fix-emit-rma'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+  ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
+})
+
+-- Need to test 2 cases of `emit_rma()` particulary on x64:
+--   * `IR_LDEXP` with `fld` instruction for loading constant
+--      number `TValue` by address.
+--   * `IR_OBAR` with the corresponding `test` instruction on
+--     `marked` field of `GCobj`.
+-- Also, test correctness.
+test:plan(4)
+
+local ffi = require('ffi')
+
+collectgarbage()
+-- Chomp memory in currently allocated GC space.
+collectgarbage('stop')
+
+for _ = 1, 8 do
+  ffi.new('char[?]', 256 * 1024 * 1024)
+end
+
+jit.opt.start('hotloop=1')
+
+-- Test `IR_LDEXP`.
+
+-- Reproducer here is a little tricky.
+-- We need to generate a bunch of traces as far we reference an
+-- IR field (`TValue`) address in `emit_rma()`. The amount of
+-- traces is empirical. Usually, assert fails on ~33d iteration,
+-- so let's use 100 just to be sure.
+local test_marker
+for _ = 1, 100 do
+  test_marker = loadstring([[
+    local test_marker
+    for i = 1, 4 do
+      -- Avoid fold optimization, use `i` as the second argument.
+      -- Need some constant differs from 1 or 0 as the first
+      -- argument.
+      test_marker = math.ldexp(1.2, i)
+    end
+    return test_marker
+  ]])()
+end
+
+-- If we here, it means no assertion failed during emitting.
+test:ok(true, 'IR_LDEXP emit_rma')
+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result')
+
+-- Test `IR_OBAR`.
+
+-- First, create a closed upvalue.
+do
+  local uv -- luacheck: no unused
+  -- `IR_OBAR` is used for object write barrier on upvalues.
+  _G.change_uv = function(newv)
+    uv = newv
+  end
+end
+
+-- We need a constant value on trace to be referenced far enough
+-- from dispatch table. So we need to create a new function
+-- prototype with a constant string.
+-- This string should be long enough to be allocated with direct
+-- alloc far away from dispatch.
+local DEFAULT_MMAP_THRESHOLD = 128 * 1024
+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
+local func_with_trace = loadstring([[
+  for _ = 1, 4 do
+    change_uv(']] .. str .. [[')
+  end
+]])
+func_with_trace()
+
+-- If we here, it means no assertion failed during emitting.
+test:ok(true, 'IR_OBAR emit_rma')
+
+-- Now check the correctness.
+
+-- Set GC state to GCpause.
+collectgarbage()
+
+-- We want to wait for the situation, when upvalue is black,
+-- the string is gray. Both conditions are satisfied, when the
+-- corresponding `change_uv()` function is marked, for example.
+-- We don't know on what exactly step our upvalue is marked as
+-- black and execution of trace become dangerous, so just check it
+-- at each step.
+-- Don't need to do the full GC cycle step by step.
+local old_steps_atomic = misc.getmetrics().gc_steps_atomic
+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do
+  collectgarbage('step')
+  func_with_trace()
+end
+
+-- If we here, it means no assertion failed during `gc_mark()`,
+-- due to wrong call to `lj_gc_barrieruv()` on trace.
+test:ok(true, 'IR_OBAR emit_rma check correctness')
+
+os.exit(test:check() and 0 or 1)
-- 
2.34.1