[Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma().

Fri Mar 24 16:05:43 MSK 2023

Hi, Maxim!
Thanks for the review!
I've updated the commit message and comments, force-pushed the branch.

On 23.03.23, Maxim Kokryashkin wrote:
> 
> Hi!
> Thanks for the patch!
>  
> > 
> >>From: Mike Pall <mike>
> >>
> >>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
> >>
> >>The accessing of memory address for some operation `emit_rma()` may be
> >>encoded in one of the following ways:
> >> a. If the offset of the accessing address from the dispatch table
> >I suggest paraphrasing it the following way:
> >| If the offset of the address being accessed from the dispatch table
> > 
> >Same for the similar sentences below.
> >>    (pinned to r14 that is not changed while trace execution) fits into
> >Typo: s/while/during
> >>    32-bit, then encode this as an access to 32-bit displacement
> >>    relative to r14.
> >> b. If the offset of the accessing address from the mcode (i.e. rip)
> >>    fits into 32-bit, then encode this as an access to 32-bit
> >>    displacement relative to rip (considering long mode specifics and
> >>    `RID_RIP` hack).
> >> c. If the address doesn't fit into 32-bit one and we use `mov` or
> >>    `movsd`, then encode 64-bit load from this address.
> >> d. Elsewhere, encode it as an access to 32-bit (the address should fit
> >>    into 32-bit one) displacement (the only option for non-GC64 mode).
> >>
> >>So, each instruction in GC64 mode differs from `mov` or `movsd` should
> >Typo: s/differs/that differs/
> >>be encoded via the last option. But if we got a 64-bit address with a
> >Typo: s/got/get/
> >>big enough offset it can't be encoded and the assertion in `ptr2addr()`
> >Typo: s/offset/offset,/
> >Typo: s/encoded/encoded,
> >>will fail.
> >Typo: s/will fail/fails/
> >>
> >>There are several cases, when `emit_rma()` is used with non `mov`
> >Typo: s/with non-`mov`/with a non-`mov`/
> >>instruction:
> >>* `IR_LDEXP` with `fld` instruction for loading constant
> >>   number `TValue` by address.
> >>* `IR_OBAR` with the corresponding `test` instruction on
> >>  `marked` field of `GCobj`.
> >>All these instructions require an additional register to store value by
> >>address. We can't truly allocate a register here due to possibility to
> >>break IR assembling which depends on specific register usage. So, we use
> >Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling,
> >>and restore r14 here for emitting.
> >>
> >>Also, this patch removes `movsd` from condition from the `x86Op` type
> >>check, as far as it never uses for the `emit_rma()` routine (see also
> >Typo: s/uses/used
> >>`emit_loadk64()` for details).
> >>
> >>Sergey Kaplun:
> >>* added the description and the test for the problem
> >>
> >>Part of tarantool/tarantool#8069
> >>---

Fixed the commit message; new commit message is the following:

x64/LJ_GC64: Fix emit_rma().

(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)

```
The accessing of memory address for some operation `emit_rma()` may be
encoded in one of the following ways:
 a. If the offset of the address being accessed from the dispatch table
    (pinned to r14 that is not changed during trace execution) fits into
    32-bit, then encode this as an access to 32-bit displacement
    relative to r14.
 b. If the offset of the address being accessed from the mcode (i.e.
    rip) fits into 32-bit, then encode this as an access to 32-bit
    displacement relative to rip (considering long mode specifics and
    `RID_RIP` hack).
 c. If the address doesn't fit into 32-bit one and we use `mov` or
    `movsd`, then encode 64-bit load from this address.
 d. Elsewhere, encode it as an access to 32-bit (the address should fit
    into 32-bit one) displacement (the only option for non-GC64 mode).

So, each instruction in GC64 mode that differs from `mov` or `movsd`
should be encoded via the last option. But if we get a 64-bit address
with a big enough offset, it can't be encoded, so the assertion in
`ptr2addr()` fails.

There are several cases, when `emit_rma()` is used with a non-`mov`
instruction:
* `IR_LDEXP` with `fld` instruction for loading constant
   number `TValue` by address.
* `IR_OBAR` with the corresponding `test` instruction on
  `marked` field of `GCobj`.
All these instructions require an additional register to store value by
address. We can't truly allocate a register here due to the possibility
to break IR assembling which depends on specific register usage. So, we
use and restore r14 here for emitting.

Also, this patch removes `movsd` from condition from the `x86Op` type
check, as far as it is never used for the `emit_rma()` routine (see also
`emit_loadk64()` for details).

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#8069
```

> >>
> >>Branch:  https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
> >>PR:  https://github.com/tarantool/tarantool/pull/8477
> >>Related issue:  https://github.com/tarantool/tarantool/issues/8069
> >>
> >>AFAICS, other places with `emit_rma()` usage are not related to the
> >>patch as far as they take an offset for the address of JIT constants
> >>stored in `jit_State`, so it always be near enough to dispatch.
> >>
> >>Side note: you may check test-correctness of the last check with GC by
> >>changing the corresponding condition check on `GC_WHITES` in asm_obar to
> >>CC_NZ (like it will be treated for incorrect check). Be carefull, member
> >>that instructions are emitted from bottom to top!
> >>
> >> src/lj_emit_x86.h | 24 ++++-
> >> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
> >> 2 files changed, 123 insertions(+), 3 deletions(-)
> >> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua
> >>
> >>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h
> >>index 6b58306b..b3dc4ea5 100644
> >>--- a/src/lj_emit_x86.h
> >>+++ b/src/lj_emit_x86.h
> >>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr)
> >>     emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr));
> >>   } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) {
> >>     emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr));
> >>- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) {
> >>- emit_rmro(as, xo, rr, rr, 0);
> >>- emit_loadu64(as, rr, (uintptr_t)addr);
> >>+ } else if (!checki32((intptr_t)addr)) {
> >>+ Reg ra = (rr & 15);
> >>+ if (xo != XO_MOV) {
> >>+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */
> >>+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch;
> >>+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0;
> >>+ ra = RID_DISPATCH;
> >>+ if (checku32(dispaddr)) {
> >>+ emit_loadi(as, ra, (int32_t)dispaddr);
> >>+ } else { /* Full-size 64 bit load. */
> >>+ MCode *p = as->mcp;
> >>+ *(uint64_t *)(p-8) = dispaddr;
> >>+ p[-9] = (MCode)(XI_MOVri+(ra&7));
> >>+ p[-10] = 0x48 + ((ra>>3)&1);
> >Why is it `0x48`?

IINM, this is  REX prefix for the instruction [1], Sergos should correct
me if I wrong.

| perl -E 'say sprintf "%08b", 0x48'
| 01001000
  ****WRXB
****: Fixed bit pattern.
W bit is set to mark that a 64-bit operand size is used.

This is the same approach as for `emit_loadu64()`, so I didn't left any comment.

> >>+ p -= 10;
> >>+ as->mcp = p;
> >>+ }
> >>+ if (xo == XO_GROUP3b) emit_i8(as, i8);
> >>+ }
> >>+ emit_rmro(as, xo, rr, ra, 0);
> >>+ emit_loadu64(as, ra, (uintptr_t)addr);
> >>   } else
> >> #endif
> >>   {
> >>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua
> >>new file mode 100644
> >>index 00000000..faddfe83
> >>--- /dev/null
> >>+++ b/test/tarantool-tests/fix-emit-rma.test.lua

<snipped>

> >>+-- Test `IR_LDEXP`.
> >>+
> >>+-- Reproducer here is a little tricky.
> >>+-- We need to generate a bunch of traces as far we reference an
> >>+-- IR field (`TValue`) address in `emit_rma()`. The amount of
> >>+-- traces is empirical. Usually, assert fails on ~33d iteration,
> >>+-- so let's use 100 just to be sure.
> >Is there any way to make it more deterministic?

I suppose no, due to undetermenistic memory allocation. So, just
use big enough number.

> >>+local test_marker
> >>+for _ = 1, 100 do
> >>+ test_marker = loadstring([[
> >>+ local test_marker
> >>+ for i = 1, 4 do
> >>+ -- Avoid fold optimization, use `i` as the second argument.
> >>+ -- Need some constant differs from 1 or 0 as the first
> >>+ -- argument.
> >>+ test_marker = math.ldexp(1.2, i)
> >>+ end
> >>+ return test_marker
> >>+ ]])()
> >>+end
> >>+
> >>+-- If we here, it means no assertion failed during emitting.
> >Typo: s/we/we are/

Fixed, thanks!

> >>+test:ok(true, 'IR_LDEXP emit_rma')
> >>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result')
> >>+
> >>+-- Test `IR_OBAR`.
> >>+
> >>+-- First, create a closed upvalue.
> >>+do
> >>+ local uv -- luacheck: no unused
> >>+ -- `IR_OBAR` is used for object write barrier on upvalues.
> >>+ _G.change_uv = function(newv)
> >>+ uv = newv
> >>+ end
> >>+end
> >>+
> >>+-- We need a constant value on trace to be referenced far enough
> >>+-- from dispatch table. So we need to create a new function
> >>+-- prototype with a constant string.
> >>+-- This string should be long enough to be allocated with direct
> >>+-- alloc far away from dispatch.
> >>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024
> >Why is that amount sufficient? Link to the source file would be enough.

Added the following comment.
```
@@ -64,7 +64,8 @@ end
 -- from dispatch table. So we need to create a new function
 -- prototype with a constant string.
 -- This string should be long enough to be allocated with direct

--- alloc far away from dispatch.
+-- alloc (not fitting in free chunks) far away from dispatch.
+-- See <src/lj_alloc.c> for details.
 local DEFAULT_MMAP_THRESHOLD = 128 * 1024
 local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
 local func_with_trace = loadstring([[
@@ -74,7 +75,7 @@ local func_with_trace = loadstring([[
 ]])
```

> >>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
> >>+local func_with_trace = loadstring([[
> >>+ for _ = 1, 4 do
> >>+ change_uv(']] .. str .. [[')
> >>+ end
> >>+]])
> >>+func_with_trace()
> >>+
> >>+-- If we here, it means no assertion failed during emitting.
> >Typo: s/we here/we are here/

Fixed, thanks!

> >>+test:ok(true, 'IR_OBAR emit_rma')
> >>+
> >>+-- Now check the correctness.
> >>+
> >>+-- Set GC state to GCpause.
> >>+collectgarbage()
> >>+
> >>+-- We want to wait for the situation, when upvalue is black,
> >>+-- the string is gray. Both conditions are satisfied, when the
> >>+-- corresponding `change_uv()` function is marked, for example.
> >>+-- We don't know on what exactly step our upvalue is marked as
> >Typo: s/exactly/exact/

Fixed, thanks!

> >>+-- black and execution of trace become dangerous, so just check it
> >>+-- at each step.
> >>+-- Don't need to do the full GC cycle step by step.
> >>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic
> >>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do
> >>+ collectgarbage('step')
> >>+ func_with_trace()
> >>+end
> >>+
> >>+-- If we here, it means no assertion failed during `gc_mark()`,
> >Typo: s/we here/we are here/

Fixed, thanks!

> >>+-- due to wrong call to `lj_gc_barrieruv()` on trace.
> >>+test:ok(true, 'IR_OBAR emit_rma check correctness')
> >>+
> >>+os.exit(test:check() and 0 or 1)
> >>--
> >>2.34.1
> >--
> >Best regards,
> >Maxim Kokryashkin
> > 

[1]: https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix

-- 
Best regards,
Sergey Kaplun