Hi! Thanks for the fixes! LGTM. -- Best regards, Maxim Kokryashkin     >  >>Hi, Maxim! >>Thanks for the review! >>I've updated the commit message and comments, force-pushed the branch. >> >>On 23.03.23, Maxim Kokryashkin wrote: >>> >>> Hi! >>> Thanks for the patch! >>>   >>> >  >>> >>From: Mike Pall >>> >> >>> >>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3) >>> >> >>> >>The accessing of memory address for some operation `emit_rma()` may be >>> >>encoded in one of the following ways: >>> >> a. If the offset of the accessing address from the dispatch table >>> >I suggest paraphrasing it the following way: >>> >| If the offset of the address being accessed from the dispatch table >>> >  >>> >Same for the similar sentences below. >>> >>    (pinned to r14 that is not changed while trace execution) fits into >>> >Typo: s/while/during >>> >>    32-bit, then encode this as an access to 32-bit displacement >>> >>    relative to r14. >>> >> b. If the offset of the accessing address from the mcode (i.e. rip) >>> >>    fits into 32-bit, then encode this as an access to 32-bit >>> >>    displacement relative to rip (considering long mode specifics and >>> >>    `RID_RIP` hack). >>> >> c. If the address doesn't fit into 32-bit one and we use `mov` or >>> >>    `movsd`, then encode 64-bit load from this address. >>> >> d. Elsewhere, encode it as an access to 32-bit (the address should fit >>> >>    into 32-bit one) displacement (the only option for non-GC64 mode). >>> >> >>> >>So, each instruction in GC64 mode differs from `mov` or `movsd` should >>> >Typo: s/differs/that differs/ >>> >>be encoded via the last option. But if we got a 64-bit address with a >>> >Typo: s/got/get/ >>> >>big enough offset it can't be encoded and the assertion in `ptr2addr()` >>> >Typo: s/offset/offset,/ >>> >Typo: s/encoded/encoded, >>> >>will fail. >>> >Typo: s/will fail/fails/ >>> >> >>> >>There are several cases, when `emit_rma()` is used with non `mov` >>> >Typo: s/with non-`mov`/with a non-`mov`/ >>> >>instruction: >>> >>* `IR_LDEXP` with `fld` instruction for loading constant >>> >>   number `TValue` by address. >>> >>* `IR_OBAR` with the corresponding `test` instruction on >>> >>  `marked` field of `GCobj`. >>> >>All these instructions require an additional register to store value by >>> >>address. We can't truly allocate a register here due to possibility to >>> >>break IR assembling which depends on specific register usage. So, we use >>> >Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling, >>> >>and restore r14 here for emitting. >>> >> >>> >>Also, this patch removes `movsd` from condition from the `x86Op` type >>> >>check, as far as it never uses for the `emit_rma()` routine (see also >>> >Typo: s/uses/used >>> >>`emit_loadk64()` for details). >>> >> >>> >>Sergey Kaplun: >>> >>* added the description and the test for the problem >>> >> >>> >>Part of tarantool/tarantool#8069 >>> >>--- >> >>Fixed the commit message; new commit message is the following: >> >>x64/LJ_GC64: Fix emit_rma(). >> >>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3) >> >>``` >>The accessing of memory address for some operation `emit_rma()` may be >>encoded in one of the following ways: >> a. If the offset of the address being accessed from the dispatch table >>    (pinned to r14 that is not changed during trace execution) fits into >>    32-bit, then encode this as an access to 32-bit displacement >>    relative to r14. >> b. If the offset of the address being accessed from the mcode (i.e. >>    rip) fits into 32-bit, then encode this as an access to 32-bit >>    displacement relative to rip (considering long mode specifics and >>    `RID_RIP` hack). >> c. If the address doesn't fit into 32-bit one and we use `mov` or >>    `movsd`, then encode 64-bit load from this address. >> d. Elsewhere, encode it as an access to 32-bit (the address should fit >>    into 32-bit one) displacement (the only option for non-GC64 mode). >> >>So, each instruction in GC64 mode that differs from `mov` or `movsd` >>should be encoded via the last option. But if we get a 64-bit address >>with a big enough offset, it can't be encoded, so the assertion in >>`ptr2addr()` fails. >> >>There are several cases, when `emit_rma()` is used with a non-`mov` >>instruction: >>* `IR_LDEXP` with `fld` instruction for loading constant >>   number `TValue` by address. >>* `IR_OBAR` with the corresponding `test` instruction on >>  `marked` field of `GCobj`. >>All these instructions require an additional register to store value by >>address. We can't truly allocate a register here due to the possibility >>to break IR assembling which depends on specific register usage. So, we >>use and restore r14 here for emitting. >> >>Also, this patch removes `movsd` from condition from the `x86Op` type >>check, as far as it is never used for the `emit_rma()` routine (see also >>`emit_loadk64()` for details). >> >>Sergey Kaplun: >>* added the description and the test for the problem >> >>Part of tarantool/tarantool#8069 >>``` >> >>> >> >>> >>Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma >>> >>PR: https://github.com/tarantool/tarantool/pull/8477 >>> >>Related issue: https://github.com/tarantool/tarantool/issues/8069 >>> >> >>> >>AFAICS, other places with `emit_rma()` usage are not related to the >>> >>patch as far as they take an offset for the address of JIT constants >>> >>stored in `jit_State`, so it always be near enough to dispatch. >>> >> >>> >>Side note: you may check test-correctness of the last check with GC by >>> >>changing the corresponding condition check on `GC_WHITES` in asm_obar to >>> >>CC_NZ (like it will be treated for incorrect check). Be carefull, member >>> >>that instructions are emitted from bottom to top! >>> >> >>> >> src/lj_emit_x86.h | 24 ++++- >>> >> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++ >>> >> 2 files changed, 123 insertions(+), 3 deletions(-) >>> >> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua >>> >> >>> >>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h >>> >>index 6b58306b..b3dc4ea5 100644 >>> >>--- a/src/lj_emit_x86.h >>> >>+++ b/src/lj_emit_x86.h >>> >>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr) >>> >>     emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr)); >>> >>   } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) { >>> >>     emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr)); >>> >>- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) { >>> >>- emit_rmro(as, xo, rr, rr, 0); >>> >>- emit_loadu64(as, rr, (uintptr_t)addr); >>> >>+ } else if (!checki32((intptr_t)addr)) { >>> >>+ Reg ra = (rr & 15); >>> >>+ if (xo != XO_MOV) { >>> >>+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */ >>> >>+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch; >>> >>+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0; >>> >>+ ra = RID_DISPATCH; >>> >>+ if (checku32(dispaddr)) { >>> >>+ emit_loadi(as, ra, (int32_t)dispaddr); >>> >>+ } else { /* Full-size 64 bit load. */ >>> >>+ MCode *p = as->mcp; >>> >>+ *(uint64_t *)(p-8) = dispaddr; >>> >>+ p[-9] = (MCode)(XI_MOVri+(ra&7)); >>> >>+ p[-10] = 0x48 + ((ra>>3)&1); >>> >Why is it `0x48`? >> >>IINM, this is REX prefix for the instruction [1], Sergos should correct >>me if I wrong. >> >>| perl -E 'say sprintf "%08b", 0x48' >>| 01001000 >>  ****WRXB >>****: Fixed bit pattern. >>W bit is set to mark that a 64-bit operand size is used. >> >>This is the same approach as for `emit_loadu64()`, so I didn't left any comment. >> >>> >>+ p -= 10; >>> >>+ as->mcp = p; >>> >>+ } >>> >>+ if (xo == XO_GROUP3b) emit_i8(as, i8); >>> >>+ } >>> >>+ emit_rmro(as, xo, rr, ra, 0); >>> >>+ emit_loadu64(as, ra, (uintptr_t)addr); >>> >>   } else >>> >> #endif >>> >>   { >>> >>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua >>> >>new file mode 100644 >>> >>index 00000000..faddfe83 >>> >>--- /dev/null >>> >>+++ b/test/tarantool-tests/fix-emit-rma.test.lua >> >> >> >>> >>+-- Test `IR_LDEXP`. >>> >>+ >>> >>+-- Reproducer here is a little tricky. >>> >>+-- We need to generate a bunch of traces as far we reference an >>> >>+-- IR field (`TValue`) address in `emit_rma()`. The amount of >>> >>+-- traces is empirical. Usually, assert fails on ~33d iteration, >>> >>+-- so let's use 100 just to be sure. >>> >Is there any way to make it more deterministic? >> >>I suppose no, due to undetermenistic memory allocation. So, just >>use big enough number. >> >>> >>+local test_marker >>> >>+for _ = 1, 100 do >>> >>+ test_marker = loadstring([[ >>> >>+ local test_marker >>> >>+ for i = 1, 4 do >>> >>+ -- Avoid fold optimization, use `i` as the second argument. >>> >>+ -- Need some constant differs from 1 or 0 as the first >>> >>+ -- argument. >>> >>+ test_marker = math.ldexp(1.2, i) >>> >>+ end >>> >>+ return test_marker >>> >>+ ]])() >>> >>+end >>> >>+ >>> >>+-- If we here, it means no assertion failed during emitting. >>> >Typo: s/we/we are/ >> >>Fixed, thanks! >> >>> >>+test:ok(true, 'IR_LDEXP emit_rma') >>> >>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result') >>> >>+ >>> >>+-- Test `IR_OBAR`. >>> >>+ >>> >>+-- First, create a closed upvalue. >>> >>+do >>> >>+ local uv -- luacheck: no unused >>> >>+ -- `IR_OBAR` is used for object write barrier on upvalues. >>> >>+ _G.change_uv = function(newv) >>> >>+ uv = newv >>> >>+ end >>> >>+end >>> >>+ >>> >>+-- We need a constant value on trace to be referenced far enough >>> >>+-- from dispatch table. So we need to create a new function >>> >>+-- prototype with a constant string. >>> >>+-- This string should be long enough to be allocated with direct >>> >>+-- alloc far away from dispatch. >>> >>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024 >>> >Why is that amount sufficient? Link to the source file would be enough. >> >>Added the following comment. >>``` >>@@ -64,7 +64,8 @@ end >> -- from dispatch table. So we need to create a new function >> -- prototype with a constant string. >> -- This string should be long enough to be allocated with direct >>--- alloc far away from dispatch. >>+-- alloc (not fitting in free chunks) far away from dispatch. >>+-- See for details. >> local DEFAULT_MMAP_THRESHOLD = 128 * 1024 >> local str = string.rep('x', DEFAULT_MMAP_THRESHOLD) >> local func_with_trace = loadstring([[ >>@@ -74,7 +75,7 @@ local func_with_trace = loadstring([[ >> ]]) >>``` >> >>> >>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD) >>> >>+local func_with_trace = loadstring([[ >>> >>+ for _ = 1, 4 do >>> >>+ change_uv(']] .. str .. [[') >>> >>+ end >>> >>+]]) >>> >>+func_with_trace() >>> >>+ >>> >>+-- If we here, it means no assertion failed during emitting. >>> >Typo: s/we here/we are here/ >> >>Fixed, thanks! >> >>> >>+test:ok(true, 'IR_OBAR emit_rma') >>> >>+ >>> >>+-- Now check the correctness. >>> >>+ >>> >>+-- Set GC state to GCpause. >>> >>+collectgarbage() >>> >>+ >>> >>+-- We want to wait for the situation, when upvalue is black, >>> >>+-- the string is gray. Both conditions are satisfied, when the >>> >>+-- corresponding `change_uv()` function is marked, for example. >>> >>+-- We don't know on what exactly step our upvalue is marked as >>> >Typo: s/exactly/exact/ >> >>Fixed, thanks! >> >>> >>+-- black and execution of trace become dangerous, so just check it >>> >>+-- at each step. >>> >>+-- Don't need to do the full GC cycle step by step. >>> >>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic >>> >>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do >>> >>+ collectgarbage('step') >>> >>+ func_with_trace() >>> >>+end >>> >>+ >>> >>+-- If we here, it means no assertion failed during `gc_mark()`, >>> >Typo: s/we here/we are here/ >> >>Fixed, thanks! >> >>> >>+-- due to wrong call to `lj_gc_barrieruv()` on trace. >>> >>+test:ok(true, 'IR_OBAR emit_rma check correctness') >>> >>+ >>> >>+os.exit(test:check() and 0 or 1) >>> >>-- >>> >>2.34.1 >>> >-- >>> >Best regards, >>> >Maxim Kokryashkin >>> >  >> >>[1]: https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix >> >>-- >>Best regards, >>Sergey Kaplun >