<HTML><BODY><div>Hi!</div><div>Thanks for the fixes!</div><div>LGTM.</div><div data-signature-widget="container"><div data-signature-widget="content"><div>--<br>Best regards,</div><div>Maxim Kokryashkin</div></div></div><div> </div><div> </div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div> <blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div id=""><div class="js-helper js-readmsg-msg"><div><div id="style_16796633691024687906_BODY">Hi, Maxim!<br>Thanks for the review!<br>I've updated the commit message and comments, force-pushed the branch.<br><br>On 23.03.23, Maxim Kokryashkin wrote:<br>><br>> Hi!<br>> Thanks for the patch!<br>> <br>> > <br>> >>From: Mike Pall <mike><br>> >><br>> >>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)<br>> >><br>> >>The accessing of memory address for some operation `emit_rma()` may be<br>> >>encoded in one of the following ways:<br>> >> a. If the offset of the accessing address from the dispatch table<br>> >I suggest paraphrasing it the following way:<br>> >| If the offset of the address being accessed from the dispatch table<br>> > <br>> >Same for the similar sentences below.<br>> >> (pinned to r14 that is not changed while trace execution) fits into<br>> >Typo: s/while/during<br>> >> 32-bit, then encode this as an access to 32-bit displacement<br>> >> relative to r14.<br>> >> b. If the offset of the accessing address from the mcode (i.e. rip)<br>> >> fits into 32-bit, then encode this as an access to 32-bit<br>> >> displacement relative to rip (considering long mode specifics and<br>> >> `RID_RIP` hack).<br>> >> c. If the address doesn't fit into 32-bit one and we use `mov` or<br>> >> `movsd`, then encode 64-bit load from this address.<br>> >> d. 
Elsewhere, encode it as an access to 32-bit (the address should fit<br>> >> into 32-bit one) displacement (the only option for non-GC64 mode).<br>> >><br>> >>So, each instruction in GC64 mode differs from `mov` or `movsd` should<br>> >Typo: s/differs/that differs/<br>> >>be encoded via the last option. But if we got a 64-bit address with a<br>> >Typo: s/got/get/<br>> >>big enough offset it can't be encoded and the assertion in `ptr2addr()`<br>> >Typo: s/offset/offset,/<br>> >Typo: s/encoded/encoded,<br>> >>will fail.<br>> >Typo: s/will fail/fails/<br>> >><br>> >>There are several cases, when `emit_rma()` is used with non `mov`<br>> >Typo: s/with non-`mov`/with a non-`mov`/<br>> >>instruction:<br>> >>* `IR_LDEXP` with `fld` instruction for loading constant<br>> >> number `TValue` by address.<br>> >>* `IR_OBAR` with the corresponding `test` instruction on<br>> >> `marked` field of `GCobj`.<br>> >>All these instructions require an additional register to store value by<br>> >>address. We can't truly allocate a register here due to possibility to<br>> >>break IR assembling which depends on specific register usage. So, we use<br>> >Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling,<br>> >>and restore r14 here for emitting.<br>> >><br>> >>Also, this patch removes `movsd` from condition from the `x86Op` type<br>> >>check, as far as it never uses for the `emit_rma()` routine (see also<br>> >Typo: s/uses/used<br>> >>`emit_loadk64()` for details).<br>> >><br>> >>Sergey Kaplun:<br>> >>* added the description and the test for the problem<br>> >><br>> >>Part of tarantool/tarantool#8069<br>> >>---<br><br>Fixed the commit message; new commit message is the following:<br><br>x64/LJ_GC64: Fix emit_rma().<br><br>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)<br><br>```<br>The accessing of memory address for some operation `emit_rma()` may be<br>encoded in one of the following ways:<br> a. 
If the offset of the address being accessed from the dispatch table<br> (pinned to r14 that is not changed during trace execution) fits into<br> 32-bit, then encode this as an access to 32-bit displacement<br> relative to r14.<br> b. If the offset of the address being accessed from the mcode (i.e.<br> rip) fits into 32-bit, then encode this as an access to 32-bit<br> displacement relative to rip (considering long mode specifics and<br> `RID_RIP` hack).<br> c. If the address doesn't fit into 32-bit one and we use `mov` or<br> `movsd`, then encode 64-bit load from this address.<br> d. Elsewhere, encode it as an access to 32-bit (the address should fit<br> into 32-bit one) displacement (the only option for non-GC64 mode).<br><br>So, each instruction in GC64 mode that differs from `mov` or `movsd`<br>should be encoded via the last option. But if we get a 64-bit address<br>with a big enough offset, it can't be encoded, so the assertion in<br>`ptr2addr()` fails.<br><br>There are several cases, when `emit_rma()` is used with a non-`mov`<br>instruction:<br>* `IR_LDEXP` with `fld` instruction for loading constant<br> number `TValue` by address.<br>* `IR_OBAR` with the corresponding `test` instruction on<br> `marked` field of `GCobj`.<br>All these instructions require an additional register to store value by<br>address. We can't truly allocate a register here due to the possibility<br>to break IR assembling which depends on specific register usage. 
So, we<br>use and restore r14 here for emitting.<br><br>Also, this patch removes `movsd` from condition from the `x86Op` type<br>check, as far as it is never used for the `emit_rma()` routine (see also<br>`emit_loadk64()` for details).<br><br>Sergey Kaplun:<br>* added the description and the test for the problem<br><br>Part of tarantool/tarantool#8069<br>```<br><br>> >><br>> >>Branch: <a href="https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma" target="_blank">https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma</a><br>> >>PR: <a href="https://github.com/tarantool/tarantool/pull/8477" target="_blank">https://github.com/tarantool/tarantool/pull/8477</a><br>> >>Related issue: <a href="https://github.com/tarantool/tarantool/issues/8069" target="_blank">https://github.com/tarantool/tarantool/issues/8069</a><br>> >><br>> >>AFAICS, other places with `emit_rma()` usage are not related to the<br>> >>patch as far as they take an offset for the address of JIT constants<br>> >>stored in `jit_State`, so it always be near enough to dispatch.<br>> >><br>> >>Side note: you may check test-correctness of the last check with GC by<br>> >>changing the corresponding condition check on `GC_WHITES` in asm_obar to<br>> >>CC_NZ (like it will be treated for incorrect check). 
Be carefull, member<br>> >>that instructions are emitted from bottom to top!<br>> >><br>> >> src/lj_emit_x86.h | 24 ++++-<br>> >> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++<br>> >> 2 files changed, 123 insertions(+), 3 deletions(-)<br>> >> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua<br>> >><br>> >>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h<br>> >>index 6b58306b..b3dc4ea5 100644<br>> >>--- a/src/lj_emit_x86.h<br>> >>+++ b/src/lj_emit_x86.h<br>> >>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr)<br>> >> emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr));<br>> >> } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) {<br>> >> emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr));<br>> >>- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) {<br>> >>- emit_rmro(as, xo, rr, rr, 0);<br>> >>- emit_loadu64(as, rr, (uintptr_t)addr);<br>> >>+ } else if (!checki32((intptr_t)addr)) {<br>> >>+ Reg ra = (rr & 15);<br>> >>+ if (xo != XO_MOV) {<br>> >>+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */<br>> >>+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch;<br>> >>+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0;<br>> >>+ ra = RID_DISPATCH;<br>> >>+ if (checku32(dispaddr)) {<br>> >>+ emit_loadi(as, ra, (int32_t)dispaddr);<br>> >>+ } else { /* Full-size 64 bit load. 
*/<br>> >>+ MCode *p = as->mcp;<br>> >>+ *(uint64_t *)(p-8) = dispaddr;<br>> >>+ p[-9] = (MCode)(XI_MOVri+(ra&7));<br>> >>+ p[-10] = 0x48 + ((ra>>3)&1);<br>> >Why is it `0x48`?<br><br>IINM, this is the REX prefix for the instruction [1]; Sergos should correct<br>me if I'm wrong.<br><br>| perl -E 'say sprintf "%08b", 0x48'<br>| 01001000<br> ****WRXB<br>****: Fixed bit pattern.<br>The W bit is set to mark that a 64-bit operand size is used.<br><br>This is the same approach as in `emit_loadu64()`, so I didn't leave any comment.<br><br>> >>+ p -= 10;<br>> >>+ as->mcp = p;<br>> >>+ }<br>> >>+ if (xo == XO_GROUP3b) emit_i8(as, i8);<br>> >>+ }<br>> >>+ emit_rmro(as, xo, rr, ra, 0);<br>> >>+ emit_loadu64(as, ra, (uintptr_t)addr);<br>> >> } else<br>> >> #endif<br>> >> {<br>> >>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua<br>> >>new file mode 100644<br>> >>index 00000000..faddfe83<br>> >>--- /dev/null<br>> >>+++ b/test/tarantool-tests/fix-emit-rma.test.lua<br><br><snipped><br><br>> >>+-- Test `IR_LDEXP`.<br>> >>+<br>> >>+-- Reproducer here is a little tricky.<br>> >>+-- We need to generate a bunch of traces as far we reference an<br>> >>+-- IR field (`TValue`) address in `emit_rma()`. The amount of<br>> >>+-- traces is empirical. Usually, assert fails on ~33d iteration,<br>> >>+-- so let's use 100 just to be sure.<br>> >Is there any way to make it more deterministic?<br><br>I suppose not, due to non-deterministic memory allocation. 
So, just<br>use a big enough number.<br><br>> >>+local test_marker<br>> >>+for _ = 1, 100 do<br>> >>+ test_marker = loadstring([[<br>> >>+ local test_marker<br>> >>+ for i = 1, 4 do<br>> >>+ -- Avoid fold optimization, use `i` as the second argument.<br>> >>+ -- Need some constant differs from 1 or 0 as the first<br>> >>+ -- argument.<br>> >>+ test_marker = math.ldexp(1.2, i)<br>> >>+ end<br>> >>+ return test_marker<br>> >>+ ]])()<br>> >>+end<br>> >>+<br>> >>+-- If we here, it means no assertion failed during emitting.<br>> >Typo: s/we/we are/<br><br>Fixed, thanks!<br><br>> >>+test:ok(true, 'IR_LDEXP emit_rma')<br>> >>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result')<br>> >>+<br>> >>+-- Test `IR_OBAR`.<br>> >>+<br>> >>+-- First, create a closed upvalue.<br>> >>+do<br>> >>+ local uv -- luacheck: no unused<br>> >>+ -- `IR_OBAR` is used for object write barrier on upvalues.<br>> >>+ _G.change_uv = function(newv)<br>> >>+ uv = newv<br>> >>+ end<br>> >>+end<br>> >>+<br>> >>+-- We need a constant value on trace to be referenced far enough<br>> >>+-- from dispatch table. So we need to create a new function<br>> >>+-- prototype with a constant string.<br>> >>+-- This string should be long enough to be allocated with direct<br>> >>+-- alloc far away from dispatch.<br>> >>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024<br>> >Why is that amount sufficient? Link to the source file would be enough.<br><br>Added the following comment:<br>```<br>@@ -64,7 +64,8 @@ end<br> -- from dispatch table. 
So we need to create a new function<br> -- prototype with a constant string.<br> -- This string should be long enough to be allocated with direct<br>--- alloc far away from dispatch.<br>+-- alloc (not fitting in free chunks) far away from dispatch.<br>+-- See <src/lj_alloc.c> for details.<br> local DEFAULT_MMAP_THRESHOLD = 128 * 1024<br> local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)<br> local func_with_trace = loadstring([[<br>@@ -74,7 +75,7 @@ local func_with_trace = loadstring([[<br> ]])<br>```<br><br>> >>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)<br>> >>+local func_with_trace = loadstring([[<br>> >>+ for _ = 1, 4 do<br>> >>+ change_uv(']] .. str .. [[')<br>> >>+ end<br>> >>+]])<br>> >>+func_with_trace()<br>> >>+<br>> >>+-- If we here, it means no assertion failed during emitting.<br>> >Typo: s/we here/we are here/<br><br>Fixed, thanks!<br><br>> >>+test:ok(true, 'IR_OBAR emit_rma')<br>> >>+<br>> >>+-- Now check the correctness.<br>> >>+<br>> >>+-- Set GC state to GCpause.<br>> >>+collectgarbage()<br>> >>+<br>> >>+-- We want to wait for the situation, when upvalue is black,<br>> >>+-- the string is gray. 
Both conditions are satisfied, when the<br>> >>+-- corresponding `change_uv()` function is marked, for example.<br>> >>+-- We don't know on what exactly step our upvalue is marked as<br>> >Typo: s/exactly/exact/<br><br>Fixed, thanks!<br><br>> >>+-- black and execution of trace become dangerous, so just check it<br>> >>+-- at each step.<br>> >>+-- Don't need to do the full GC cycle step by step.<br>> >>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic<br>> >>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do<br>> >>+ collectgarbage('step')<br>> >>+ func_with_trace()<br>> >>+end<br>> >>+<br>> >>+-- If we here, it means no assertion failed during `gc_mark()`,<br>> >Typo: s/we here/we are here/<br><br>Fixed, thanks!<br><br>> >>+-- due to wrong call to `lj_gc_barrieruv()` on trace.<br>> >>+test:ok(true, 'IR_OBAR emit_rma check correctness')<br>> >>+<br>> >>+os.exit(test:check() and 0 or 1)<br>> >>--<br>> >>2.34.1<br>> >--<br>> >Best regards,<br>> >Maxim Kokryashkin<br>> > <br><br>[1]: <a href="https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix" target="_blank">https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix</a><br><br>--<br>Best regards,<br>Sergey Kaplun</div></div></div></div></blockquote><div> </div></div></blockquote></BODY></HTML>