<HTML><BODY><div>Hi!</div><div>Thanks for the patch!</div><div> </div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div> <blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div id=""><div class="js-helper js-readmsg-msg"><div><div id="style_16794738851112354322_BODY">From: Mike Pall <mike><br><br>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)<br><br>The accessing of memory address for some operation `emit_rma()` may be<br>encoded in one of the following ways:<br> a. If the offset of the accessing address from the dispatch table</div></div></div></div></blockquote><div>I suggest paraphrasing it the following way:<br>| If the offset of the address being accessed from the dispatch table<br> </div><div>Same for the similar sentences below.</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div> (pinned to r14 that is not changed while trace execution) fits into</div></div></div></div></blockquote><div>Typo: s/while/during</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div> 32-bit, then encode this as an access to 32-bit displacement<br> relative to r14.<br> b. If the offset of the accessing address from the mcode (i.e. rip)<br> fits into 32-bit, then encode this as an access to 32-bit<br> displacement relative to rip (considering long mode specifics and<br> `RID_RIP` hack).<br> c. If the address doesn't fit into 32-bit one and we use `mov` or<br> `movsd`, then encode 64-bit load from this address.<br> d. Elsewhere, encode it as an access to 32-bit (the address should fit<br> into 32-bit one) displacement (the only option for non-GC64 mode).<br><br>So, each instruction in GC64 mode differs from `mov` or `movsd` should</div></div></div></div></blockquote><div>Typo: s/differs/that differs/</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div>be encoded via the last option. But if we got a 64-bit address with a</div></div></div></div></blockquote><div>Typo: s/got/get/</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div>big enough offset it can't be encoded and the assertion in `ptr2addr()`</div></div></div></div></blockquote><div>Typo: s/offset/offset,/</div><div>Typo: s/encoded/encoded,</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div>will fail.</div></div></div></div></blockquote><div>Typo: s/will fail/fails/</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>There are several cases, when `emit_rma()` is used with non `mov`</div></div></div></div></blockquote><div>Typo: s/with non-`mov`/with a non-`mov`/</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div>instruction:<br>* `IR_LDEXP` with `fld` instruction for loading constant<br> number `TValue` by address.<br>* `IR_OBAR` with the corresponding `test` instruction on<br> `marked` field of `GCobj`.<br>All these instructions require an additional register to store value by<br>address. We can't truly allocate a register here due to possibility to<br>break IR assembling which depends on specific register usage. So, we use</div></div></div></div></blockquote><div>Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling,</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div>and restore r14 here for emitting.<br><br>Also, this patch removes `movsd` from condition from the `x86Op` type<br>check, as far as it never uses for the `emit_rma()` routine (see also</div></div></div></div></blockquote><div>Typo: s/uses/used</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div>`emit_loadk64()` for details).<br><br>Sergey Kaplun:<br>* added the description and the test for the problem<br><br>Part of tarantool/tarantool#8069<br>---<br><br>Branch: <a href="https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma" target="_blank">https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma</a><br>PR: <a href="https://github.com/tarantool/tarantool/pull/8477" target="_blank">https://github.com/tarantool/tarantool/pull/8477</a><br>Related issue: <a href="https://github.com/tarantool/tarantool/issues/8069" target="_blank">https://github.com/tarantool/tarantool/issues/8069</a><br><br>AFAICS, other places with `emit_rma()` usage are not related to the<br>patch as far as they take an offset for the address of JIT constants<br>stored in `jit_State`, so it always be near enough to dispatch.</div></div></div></div></blockquote><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>Side note: you may check test-correctness of the last check with GC by<br>changing the corresponding condition check on `GC_WHITES` in asm_obar to<br>CC_NZ (like it will be treated for incorrect check). Be carefull, member<br>that instructions are emitted from bottom to top!<br><br> src/lj_emit_x86.h | 24 ++++-<br> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++<br> 2 files changed, 123 insertions(+), 3 deletions(-)<br> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua<br><br>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h<br>index 6b58306b..b3dc4ea5 100644<br>--- a/src/lj_emit_x86.h<br>+++ b/src/lj_emit_x86.h<br>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr)<br> emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr));<br> } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) {<br> emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr));<br>- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) {<br>- emit_rmro(as, xo, rr, rr, 0);<br>- emit_loadu64(as, rr, (uintptr_t)addr);<br>+ } else if (!checki32((intptr_t)addr)) {<br>+ Reg ra = (rr & 15);<br>+ if (xo != XO_MOV) {<br>+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */<br>+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch;<br>+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0;<br>+ ra = RID_DISPATCH;<br>+ if (checku32(dispaddr)) {<br>+ emit_loadi(as, ra, (int32_t)dispaddr);<br>+ } else { /* Full-size 64 bit load. */<br>+ MCode *p = as->mcp;<br>+ *(uint64_t *)(p-8) = dispaddr;<br>+ p[-9] = (MCode)(XI_MOVri+(ra&7));<br>+ p[-10] = 0x48 + ((ra>>3)&1);</div></div></div></div></blockquote><div>Why is it `0x48`?</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div>+ p -= 10;<br>+ as->mcp = p;<br>+ }<br>+ if (xo == XO_GROUP3b) emit_i8(as, i8);<br>+ }<br>+ emit_rmro(as, xo, rr, ra, 0);<br>+ emit_loadu64(as, ra, (uintptr_t)addr);<br> } else<br> #endif<br> {<br>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua<br>new file mode 100644<br>index 00000000..faddfe83<br>--- /dev/null<br>+++ b/test/tarantool-tests/fix-emit-rma.test.lua<br>@@ -0,0 +1,102 @@<br>+local tap = require('tap')<br>+local test = tap.test('fix-emit-rma'):skipcond({<br>+ ['Test requires JIT enabled'] = not jit.status(),<br>+ ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),<br>+})<br>+<br>+-- Need to test 2 cases of `emit_rma()` particulary on x64:<br>+-- * `IR_LDEXP` with `fld` instruction for loading constant<br>+-- number `TValue` by address.<br>+-- * `IR_OBAR` with the corresponding `test` instruction on<br>+-- `marked` field of `GCobj`.<br>+-- Also, test correctness.<br>+test:plan(4)<br>+<br>+local ffi = require('ffi')<br>+<br>+collectgarbage()<br>+-- Chomp memory in currently allocated GC space.<br>+collectgarbage('stop')<br>+<br>+for _ = 1, 8 do<br>+ ffi.new('char[?]', 256 * 1024 * 1024)<br>+end<br>+<br>+jit.opt.start('hotloop=1')<br>+<br>+-- Test `IR_LDEXP`.<br>+<br>+-- Reproducer here is a little tricky.<br>+-- We need to generate a bunch of traces as far we reference an<br>+-- IR field (`TValue`) address in `emit_rma()`. The amount of<br>+-- traces is empirical. Usually, assert fails on ~33d iteration,<br>+-- so let's use 100 just to be sure.</div></div></div></div></blockquote><div>Is there any way to make it more deterministic?</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div>+local test_marker<br>+for _ = 1, 100 do<br>+ test_marker = loadstring([[<br>+ local test_marker<br>+ for i = 1, 4 do<br>+ -- Avoid fold optimization, use `i` as the second argument.<br>+ -- Need some constant differs from 1 or 0 as the first<br>+ -- argument.<br>+ test_marker = math.ldexp(1.2, i)<br>+ end<br>+ return test_marker<br>+ ]])()<br>+end<br>+<br>+-- If we here, it means no assertion failed during emitting.</div></div></div></div></blockquote><div>Typo: s/we/we are/</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div>+test:ok(true, 'IR_LDEXP emit_rma')<br>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result')<br>+<br>+-- Test `IR_OBAR`.<br>+<br>+-- First, create a closed upvalue.<br>+do<br>+ local uv -- luacheck: no unused<br>+ -- `IR_OBAR` is used for object write barrier on upvalues.<br>+ _G.change_uv = function(newv)<br>+ uv = newv<br>+ end<br>+end<br>+<br>+-- We need a constant value on trace to be referenced far enough<br>+-- from dispatch table. So we need to create a new function<br>+-- prototype with a constant string.<br>+-- This string should be long enough to be allocated with direct<br>+-- alloc far away from dispatch.<br>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024</div></div></div></div></blockquote><div>Why is that amount sufficient? Link to the source file would be enough.</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)<br>+local func_with_trace = loadstring([[<br>+ for _ = 1, 4 do<br>+ change_uv(']] .. str .. [[')<br>+ end<br>+]])<br>+func_with_trace()<br>+<br>+-- If we here, it means no assertion failed during emitting.</div></div></div></div></blockquote><div>Typo: s/we here/we are here/</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div>+test:ok(true, 'IR_OBAR emit_rma')<br>+<br>+-- Now check the correctness.<br>+<br>+-- Set GC state to GCpause.<br>+collectgarbage()<br>+<br>+-- We want to wait for the situation, when upvalue is black,<br>+-- the string is gray. Both conditions are satisfied, when the<br>+-- corresponding `change_uv()` function is marked, for example.<br>+-- We don't know on what exactly step our upvalue is marked as</div></div></div></div></blockquote><div>Typo: s/exactly/exact/</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div>+-- black and execution of trace become dangerous, so just check it<br>+-- at each step.<br>+-- Don't need to do the full GC cycle step by step.<br>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic<br>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do<br>+ collectgarbage('step')<br>+ func_with_trace()<br>+end<br>+<br>+-- If we here, it means no assertion failed during `gc_mark()`,</div></div></div></div></blockquote><div>Typo: s/we here/we are here/</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div>+-- due to wrong call to `lj_gc_barrieruv()` on trace.<br>+test:ok(true, 'IR_OBAR emit_rma check correctness')<br>+<br>+os.exit(test:check() and 0 or 1)<br>--<br>2.34.1</div></div></div></div></blockquote><div><div>--<br>Best regards,</div><div>Maxim Kokryashkin</div></div><div> </div></div></blockquote></BODY></HTML>