[Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma().

Fri Mar 24 17:42:17 MSK 2023

Hi!
Thanks for the fixes!
LGTM.
--
Best regards,
Maxim Kokryashkin
 
 
> 
>>Hi, Maxim!
>>Thanks for the review!
>>I've updated the commit message and comments, force-pushed the branch.
>>
>>On 23.03.23, Maxim Kokryashkin wrote:
>>>
>>> Hi!
>>> Thanks for the patch!
>>>  
>>> > 
>>> >>From: Mike Pall <mike>
>>> >>
>>> >>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
>>> >>
>>> >>The accessing of memory address for some operation `emit_rma()` may be
>>> >>encoded in one of the following ways:
>>> >> a. If the offset of the accessing address from the dispatch table
>>> >I suggest paraphrasing it the following way:
>>> >| If the offset of the address being accessed from the dispatch table
>>> > 
>>> >Same for the similar sentences below.
>>> >>    (pinned to r14 that is not changed while trace execution) fits into
>>> >Typo: s/while/during
>>> >>    32-bit, then encode this as an access to 32-bit displacement
>>> >>    relative to r14.
>>> >> b. If the offset of the accessing address from the mcode (i.e. rip)
>>> >>    fits into 32-bit, then encode this as an access to 32-bit
>>> >>    displacement relative to rip (considering long mode specifics and
>>> >>    `RID_RIP` hack).
>>> >> c. If the address doesn't fit into 32-bit one and we use `mov` or
>>> >>    `movsd`, then encode 64-bit load from this address.
>>> >> d. Elsewhere, encode it as an access to 32-bit (the address should fit
>>> >>    into 32-bit one) displacement (the only option for non-GC64 mode).
>>> >>
>>> >>So, each instruction in GC64 mode differs from `mov` or `movsd` should
>>> >Typo: s/differs/that differs/
>>> >>be encoded via the last option. But if we got a 64-bit address with a
>>> >Typo: s/got/get/
>>> >>big enough offset it can't be encoded and the assertion in `ptr2addr()`
>>> >Typo: s/offset/offset,/
>>> >Typo: s/encoded/encoded,
>>> >>will fail.
>>> >Typo: s/will fail/fails/
>>> >>
>>> >>There are several cases, when `emit_rma()` is used with non `mov`
>>> >Typo: s/with non-`mov`/with a non-`mov`/
>>> >>instruction:
>>> >>* `IR_LDEXP` with `fld` instruction for loading constant
>>> >>   number `TValue` by address.
>>> >>* `IR_OBAR` with the corresponding `test` instruction on
>>> >>  `marked` field of `GCobj`.
>>> >>All these instructions require an additional register to store value by
>>> >>address. We can't truly allocate a register here due to possibility to
>>> >>break IR assembling which depends on specific register usage. So, we use
>>> >Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling,
>>> >>and restore r14 here for emitting.
>>> >>
>>> >>Also, this patch removes `movsd` from condition from the `x86Op` type
>>> >>check, as far as it never uses for the `emit_rma()` routine (see also
>>> >Typo: s/uses/used
>>> >>`emit_loadk64()` for details).
>>> >>
>>> >>Sergey Kaplun:
>>> >>* added the description and the test for the problem
>>> >>
>>> >>Part of tarantool/tarantool#8069
>>> >>---
>>
>>Fixed the commit message; the new commit message is as follows:
>>
>>x64/LJ_GC64: Fix emit_rma().
>>
>>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
>>
>>```
>>The accessing of memory address for some operation `emit_rma()` may be
>>encoded in one of the following ways:
>> a. If the offset of the address being accessed from the dispatch table
>>    (pinned to r14 that is not changed during trace execution) fits into
>>    32-bit, then encode this as an access to 32-bit displacement
>>    relative to r14.
>> b. If the offset of the address being accessed from the mcode (i.e.
>>    rip) fits into 32-bit, then encode this as an access to 32-bit
>>    displacement relative to rip (considering long mode specifics and
>>    `RID_RIP` hack).
>> c. If the address doesn't fit into 32-bit one and we use `mov` or
>>    `movsd`, then encode 64-bit load from this address.
>> d. Elsewhere, encode it as an access to 32-bit (the address should fit
>>    into 32-bit one) displacement (the only option for non-GC64 mode).
>>
>>So, each instruction in GC64 mode that differs from `mov` or `movsd`
>>should be encoded via the last option. But if we get a 64-bit address
>>with a big enough offset, it can't be encoded, so the assertion in
>>`ptr2addr()` fails.
>>
>>There are several cases, when `emit_rma()` is used with a non-`mov`
>>instruction:
>>* `IR_LDEXP` with `fld` instruction for loading constant
>>   number `TValue` by address.
>>* `IR_OBAR` with the corresponding `test` instruction on
>>  `marked` field of `GCobj`.
>>All these instructions require an additional register to store the value
>>by address. We can't truly allocate a register here due to the
>>possibility of breaking IR assembling, which depends on specific register
>>usage. So, we use and restore r14 here for emitting.
>>
>>Also, this patch removes `movsd` from the condition in the `x86Op` type
>>check, since it is never used for the `emit_rma()` routine (see also
>>`emit_loadk64()` for details).
>>
>>Sergey Kaplun:
>>* added the description and the test for the problem
>>
>>Part of tarantool/tarantool#8069
>>```
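
For readers following the a-d list in the commit message, the selection order can be sketched in plain C. This is only an illustration under stated assumptions: `RmaEncoding`, `fits_i32()` and `choose_rma_encoding()` are hypothetical names, not LuaJIT API; the rip check is simplified to a single offset (the real code checks both `mcpofs` and `mctopofs`); and case (c) is shown as it behaves after the patch, where any opcode may take the 64-bit path.

```c
#include <stdint.h>

/* Hypothetical names for illustration only; not part of LuaJIT. */
typedef enum {
  ENC_DISPATCH_REL, /* (a) 32-bit displacement relative to r14 (dispatch). */
  ENC_RIP_REL,      /* (b) 32-bit displacement relative to rip (mcode). */
  ENC_LOAD_ADDR64,  /* (c) Load the full 64-bit address into a register. */
  ENC_ABS32         /* (d) Absolute 32-bit displacement. */
} RmaEncoding;

/* Does the value fit into a signed 32-bit integer? */
static int fits_i32(int64_t x)
{
  return x >= INT32_MIN && x <= INT32_MAX;
}

/*
** Pick the encoding in the same order as the commit message lists them.
** Assumes user-space addresses, so the differences fit into int64_t.
*/
static RmaEncoding choose_rma_encoding(uint64_t addr, uint64_t dispatch,
                                       uint64_t mcp)
{
  if (fits_i32((int64_t)(addr - dispatch)))
    return ENC_DISPATCH_REL;
  if (fits_i32((int64_t)(addr - mcp)))
    return ENC_RIP_REL;
  if (!fits_i32((int64_t)addr))
    return ENC_LOAD_ADDR64;
  return ENC_ABS32;
}
```

Before the patch, a non-`mov` instruction that reached the third branch fell through to the last one instead, which is where the `ptr2addr()` assertion fired.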
>>
>>> >>
>>> >>Branch:  https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
>>> >>PR:  https://github.com/tarantool/tarantool/pull/8477
>>> >>Related issue:  https://github.com/tarantool/tarantool/issues/8069
>>> >>
>>> >>AFAICS, other places with `emit_rma()` usage are not related to the
>>> >>patch, since they take an offset for the address of JIT constants
>>> >>stored in `jit_State`, so it is always near enough to dispatch.
>>> >>
>>> >>Side note: you may verify that the last test check (with GC) is
>>> >>effective by changing the corresponding condition check on `GC_WHITES`
>>> >>in `asm_obar` to CC_NZ (as it would be for an incorrect check). Be
>>> >>careful, remember that instructions are emitted from bottom to top!
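
The condition the side note refers to can be written out as a small predicate. This is a sketch, not the code `asm_obar` emits: the `SK_GC_*` constants and `needs_uv_barrier()` are hypothetical names, and the bit values are assumed to mirror LuaJIT's `marked` flags.

```c
#include <stdint.h>

/* Assumed bit layout of the `marked` field; names are hypothetical. */
#define SK_GC_WHITE0 0x01u
#define SK_GC_WHITE1 0x02u
#define SK_GC_BLACK  0x04u
#define SK_GC_WHITES (SK_GC_WHITE0 | SK_GC_WHITE1)

/*
** The barrier is needed only when the upvalue is black and the stored
** value is white. On trace, each half is a `test` on a `marked` byte
** followed by a jump that skips the barrier call if the bits are clear.
*/
static int needs_uv_barrier(uint8_t uv_marked, uint8_t val_marked)
{
  if (!(uv_marked & SK_GC_BLACK))
    return 0; /* Upvalue is not black: skip the barrier. */
  if (!(val_marked & SK_GC_WHITES))
    return 0; /* Value is not white: skip the barrier. */
  return 1;
}
```

With a black upvalue and a gray string (neither white nor black bit set), the predicate returns 0. Inverting the second check, as the side note suggests, would make the trace call the barrier in exactly that state.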
>>> >>
>>> >> src/lj_emit_x86.h | 24 ++++-
>>> >> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
>>> >> 2 files changed, 123 insertions(+), 3 deletions(-)
>>> >> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua
>>> >>
>>> >>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h
>>> >>index 6b58306b..b3dc4ea5 100644
>>> >>--- a/src/lj_emit_x86.h
>>> >>+++ b/src/lj_emit_x86.h
>>> >>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr)
>>> >>     emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr));
>>> >>   } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) {
>>> >>     emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr));
>>> >>- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) {
>>> >>- emit_rmro(as, xo, rr, rr, 0);
>>> >>- emit_loadu64(as, rr, (uintptr_t)addr);
>>> >>+ } else if (!checki32((intptr_t)addr)) {
>>> >>+ Reg ra = (rr & 15);
>>> >>+ if (xo != XO_MOV) {
>>> >>+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */
>>> >>+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch;
>>> >>+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0;
>>> >>+ ra = RID_DISPATCH;
>>> >>+ if (checku32(dispaddr)) {
>>> >>+ emit_loadi(as, ra, (int32_t)dispaddr);
>>> >>+ } else { /* Full-size 64 bit load. */
>>> >>+ MCode *p = as->mcp;
>>> >>+ *(uint64_t *)(p-8) = dispaddr;
>>> >>+ p[-9] = (MCode)(XI_MOVri+(ra&7));
>>> >>+ p[-10] = 0x48 + ((ra>>3)&1);
>>> >Why is it `0x48`?
>>
>>IINM, this is the REX prefix for the instruction [1]; Sergos should
>>correct me if I'm wrong.
>>
>>| perl -E 'say sprintf "%08b", 0x48'
>>| 01001000
>>  ****WRXB
>>****: Fixed bit pattern.
>>W bit is set to mark that a 64-bit operand size is used.
>>
>>This is the same approach as for `emit_loadu64()`, so I didn't leave any comment.
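
To make the byte layout concrete, here is a standalone sketch of the same backward write as the `p[-10]`, `p[-9]`, `p-8` lines in the patch. The function name `emit_mov_ri64_backward()` is hypothetical, and `0xB8` is assumed to be the value of `XI_MOVri` (the `mov r64, imm64` opcode base).

```c
#include <assert.h>
#include <stdint.h>

/*
** Write `mov reg, imm64` so that it ends just before `mcp`, and return
** the new (lower) code pointer. Machine code grows downwards here.
** Layout: REX prefix, opcode + low 3 register bits, 8 immediate bytes.
*/
static uint8_t *emit_mov_ri64_backward(uint8_t *mcp, unsigned reg,
                                       uint64_t imm)
{
  int i;
  assert(reg < 16);
  /* Immediate, little-endian, occupies mcp[-8] .. mcp[-1]. */
  for (i = 0; i < 8; i++)
    mcp[-8 + i] = (uint8_t)(imm >> (8 * i));
  /* Opcode base 0xB8 (assumed XI_MOVri) plus the low 3 bits of reg. */
  mcp[-9] = (uint8_t)(0xB8 + (reg & 7));
  /* REX = 0100WRXB: W=1 for a 64-bit operand, B = bit 3 of reg. */
  mcp[-10] = (uint8_t)(0x48 + ((reg >> 3) & 1));
  return mcp - 10;
}
```

For r14 (register number 14) this gives the prefix `0x49` and the opcode byte `0xBE`, followed by the eight immediate bytes.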
>>
>>> >>+ p -= 10;
>>> >>+ as->mcp = p;
>>> >>+ }
>>> >>+ if (xo == XO_GROUP3b) emit_i8(as, i8);
>>> >>+ }
>>> >>+ emit_rmro(as, xo, rr, ra, 0);
>>> >>+ emit_loadu64(as, ra, (uintptr_t)addr);
>>> >>   } else
>>> >> #endif
>>> >>   {
>>> >>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua
>>> >>new file mode 100644
>>> >>index 00000000..faddfe83
>>> >>--- /dev/null
>>> >>+++ b/test/tarantool-tests/fix-emit-rma.test.lua
>>
>><snipped>
>>
>>> >>+-- Test `IR_LDEXP`.
>>> >>+
>>> >>+-- Reproducer here is a little tricky.
>>> >>+-- We need to generate a bunch of traces as far we reference an
>>> >>+-- IR field (`TValue`) address in `emit_rma()`. The amount of
>>> >>+-- traces is empirical. Usually, assert fails on ~33d iteration,
>>> >>+-- so let's use 100 just to be sure.
>>> >Is there any way to make it more deterministic?
>>
>>I suppose not, due to non-deterministic memory allocation. So, just
>>use a big enough number.
>>
>>> >>+local test_marker
>>> >>+for _ = 1, 100 do
>>> >>+ test_marker = loadstring([[
>>> >>+ local test_marker
>>> >>+ for i = 1, 4 do
>>> >>+ -- Avoid fold optimization, use `i` as the second argument.
>>> >>+ -- Need some constant differs from 1 or 0 as the first
>>> >>+ -- argument.
>>> >>+ test_marker = math.ldexp(1.2, i)
>>> >>+ end
>>> >>+ return test_marker
>>> >>+ ]])()
>>> >>+end
>>> >>+
>>> >>+-- If we here, it means no assertion failed during emitting.
>>> >Typo: s/we/we are/
>>
>>Fixed, thanks!
>>
>>> >>+test:ok(true, 'IR_LDEXP emit_rma')
>>> >>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result')
>>> >>+
>>> >>+-- Test `IR_OBAR`.
>>> >>+
>>> >>+-- First, create a closed upvalue.
>>> >>+do
>>> >>+ local uv -- luacheck: no unused
>>> >>+ -- `IR_OBAR` is used for object write barrier on upvalues.
>>> >>+ _G.change_uv = function(newv)
>>> >>+ uv = newv
>>> >>+ end
>>> >>+end
>>> >>+
>>> >>+-- We need a constant value on trace to be referenced far enough
>>> >>+-- from dispatch table. So we need to create a new function
>>> >>+-- prototype with a constant string.
>>> >>+-- This string should be long enough to be allocated with direct
>>> >>+-- alloc far away from dispatch.
>>> >>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024
>>> >Why is that amount sufficient? Link to the source file would be enough.
>>
>>Added the following comment.
>>```
>>@@ -64,7 +64,8 @@ end
>> -- from dispatch table. So we need to create a new function
>> -- prototype with a constant string.
>> -- This string should be long enough to be allocated with direct
>>--- alloc far away from dispatch.
>>+-- alloc (not fitting in free chunks) far away from dispatch.
>>+-- See <src/lj_alloc.c> for details.
>> local DEFAULT_MMAP_THRESHOLD = 128 * 1024
>> local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
>> local func_with_trace = loadstring([[
>>@@ -74,7 +75,7 @@ local func_with_trace = loadstring([[
>> ]])
>>```
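
The rule the new comment alludes to can be sketched roughly as follows. This is a simplification for illustration, not the code from `src/lj_alloc.c`: `SKETCH_MMAP_THRESHOLD` and `would_use_direct_alloc()` are hypothetical names, and the free-chunk lookup is reduced to a single "largest free chunk" size.

```c
#include <stddef.h>

/* Same value as DEFAULT_MMAP_THRESHOLD in the test; name is hypothetical. */
#define SKETCH_MMAP_THRESHOLD ((size_t)128 * 1024)

/*
** A request gets its own direct mapping only if no free chunk can
** satisfy it and it is at least as large as the threshold.
*/
static int would_use_direct_alloc(size_t padded_size,
                                  size_t largest_free_chunk)
{
  if (padded_size <= largest_free_chunk)
    return 0; /* Served from an existing free chunk. */
  return padded_size >= SKETCH_MMAP_THRESHOLD;
}
```

A string of `DEFAULT_MMAP_THRESHOLD` characters plus its header exceeds the threshold, so it is mapped separately unless a large enough free chunk happens to exist.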
>>
>>> >>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
>>> >>+local func_with_trace = loadstring([[
>>> >>+ for _ = 1, 4 do
>>> >>+ change_uv(']] .. str .. [[')
>>> >>+ end
>>> >>+]])
>>> >>+func_with_trace()
>>> >>+
>>> >>+-- If we here, it means no assertion failed during emitting.
>>> >Typo: s/we here/we are here/
>>
>>Fixed, thanks!
>>
>>> >>+test:ok(true, 'IR_OBAR emit_rma')
>>> >>+
>>> >>+-- Now check the correctness.
>>> >>+
>>> >>+-- Set GC state to GCpause.
>>> >>+collectgarbage()
>>> >>+
>>> >>+-- We want to wait for the situation, when upvalue is black,
>>> >>+-- the string is gray. Both conditions are satisfied, when the
>>> >>+-- corresponding `change_uv()` function is marked, for example.
>>> >>+-- We don't know on what exactly step our upvalue is marked as
>>> >Typo: s/exactly/exact/
>>
>>Fixed, thanks!
>>
>>> >>+-- black and execution of trace become dangerous, so just check it
>>> >>+-- at each step.
>>> >>+-- Don't need to do the full GC cycle step by step.
>>> >>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic
>>> >>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do
>>> >>+ collectgarbage('step')
>>> >>+ func_with_trace()
>>> >>+end
>>> >>+
>>> >>+-- If we here, it means no assertion failed during `gc_mark()`,
>>> >Typo: s/we here/we are here/
>>
>>Fixed, thanks!
>>
>>> >>+-- due to wrong call to `lj_gc_barrieruv()` on trace.
>>> >>+test:ok(true, 'IR_OBAR emit_rma check correctness')
>>> >>+
>>> >>+os.exit(test:check() and 0 or 1)
>>> >>--
>>> >>2.34.1
>>> >--
>>> >Best regards,
>>> >Maxim Kokryashkin
>>> > 
>>
>>[1]:  https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix
>>
>>--
>>Best regards,
>>Sergey Kaplun
> 