[Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma().

Thu Mar 23 12:36:49 MSK 2023

Hi!
Thanks for the patch!

> 
>>From: Mike Pall <mike>
>>
>>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
>>
>>The accessing of memory address for some operation `emit_rma()` may be
>>encoded in one of the following ways:
>> a. If the offset of the accessing address from the dispatch table
>I suggest paraphrasing it the following way:
>| If the offset of the address being accessed from the dispatch table
> 
>Same for the similar sentences below.
>>    (pinned to r14 that is not changed while trace execution) fits into
>Typo: s/while/during
>>    32-bit, then encode this as an access to 32-bit displacement
>>    relative to r14.
>> b. If the offset of the accessing address from the mcode (i.e. rip)
>>    fits into 32-bit, then encode this as an access to 32-bit
>>    displacement relative to rip (considering long mode specifics and
>>    `RID_RIP` hack).
>> c. If the address doesn't fit into 32-bit one and we use `mov` or
>>    `movsd`, then encode 64-bit load from this address.
>> d. Elsewhere, encode it as an access to 32-bit (the address should fit
>>    into 32-bit one) displacement (the only option for non-GC64 mode).
>>
>>So, each instruction in GC64 mode differs from `mov` or `movsd` should
>Typo: s/differs/that differs/
>>be encoded via the last option. But if we got a 64-bit address with a
>Typo: s/got/get/
>>big enough offset it can't be encoded and the assertion in `ptr2addr()`
>Typo: s/offset/offset,/
>Typo: s/encoded/encoded,
>>will fail.
>Typo: s/will fail/fails/
>>
>>There are several cases, when `emit_rma()` is used with non `mov`
>Typo: s/with non-`mov`/with a non-`mov`/
>>instruction:
>>* `IR_LDEXP` with `fld` instruction for loading constant
>>   number `TValue` by address.
>>* `IR_OBAR` with the corresponding `test` instruction on
>>  `marked` field of `GCobj`.
>>All these instructions require an additional register to store value by
>>address. We can't truly allocate a register here due to possibility to
>>break IR assembling which depends on specific register usage. So, we use
>Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling,
>>and restore r14 here for emitting.
>>
>>Also, this patch removes `movsd` from condition from the `x86Op` type
>>check, as far as it never uses for the `emit_rma()` routine (see also
>Typo: s/uses/used
>>`emit_loadk64()` for details).
>>
>>Sergey Kaplun:
>>* added the description and the test for the problem
>>
>>Part of tarantool/tarantool#8069
>>---
>>
>>Branch:  https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
>>PR:  https://github.com/tarantool/tarantool/pull/8477
>>Related issue:  https://github.com/tarantool/tarantool/issues/8069
>>
>>AFAICS, other places with `emit_rma()` usage are not related to the
>>patch as far as they take an offset for the address of JIT constants
>>stored in `jit_State`, so it always be near enough to dispatch.
>>
>>Side note: you may check test-correctness of the last check with GC by
>>changing the corresponding condition check on `GC_WHITES` in asm_obar to
>>CC_NZ (like it will be treated for incorrect check). Be carefull, member
>>that instructions are emitted from bottom to top!
>>
>> src/lj_emit_x86.h | 24 ++++-
>> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
>> 2 files changed, 123 insertions(+), 3 deletions(-)
>> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua
>>
>>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h
>>index 6b58306b..b3dc4ea5 100644
>>--- a/src/lj_emit_x86.h
>>+++ b/src/lj_emit_x86.h
>>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr)
>>     emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr));
>>   } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) {
>>     emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr));
>>- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) {
>>- emit_rmro(as, xo, rr, rr, 0);
>>- emit_loadu64(as, rr, (uintptr_t)addr);
>>+ } else if (!checki32((intptr_t)addr)) {
>>+ Reg ra = (rr & 15);
>>+ if (xo != XO_MOV) {
>>+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */
>>+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch;
>>+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0;
>>+ ra = RID_DISPATCH;
>>+ if (checku32(dispaddr)) {
>>+ emit_loadi(as, ra, (int32_t)dispaddr);
>>+ } else { /* Full-size 64 bit load. */
>>+ MCode *p = as->mcp;
>>+ *(uint64_t *)(p-8) = dispaddr;
>>+ p[-9] = (MCode)(XI_MOVri+(ra&7));
>>+ p[-10] = 0x48 + ((ra>>3)&1);
>Why is it `0x48`?
>>+ p -= 10;
>>+ as->mcp = p;
>>+ }
>>+ if (xo == XO_GROUP3b) emit_i8(as, i8);
>>+ }
>>+ emit_rmro(as, xo, rr, ra, 0);
>>+ emit_loadu64(as, ra, (uintptr_t)addr);
>>   } else
>> #endif
>>   {
>>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua
>>new file mode 100644
>>index 00000000..faddfe83
>>--- /dev/null
>>+++ b/test/tarantool-tests/fix-emit-rma.test.lua
>>@@ -0,0 +1,102 @@
>>+local tap = require('tap')
>>+local test = tap.test('fix-emit-rma'):skipcond({
>>+ ['Test requires JIT enabled'] = not jit.status(),
>>+ ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
>>+})
>>+
>>+-- Need to test 2 cases of `emit_rma()` particulary on x64:
>>+-- * `IR_LDEXP` with `fld` instruction for loading constant
>>+-- number `TValue` by address.
>>+-- * `IR_OBAR` with the corresponding `test` instruction on
>>+-- `marked` field of `GCobj`.
>>+-- Also, test correctness.
>>+test:plan(4)
>>+
>>+local ffi = require('ffi')
>>+
>>+collectgarbage()
>>+-- Chomp memory in currently allocated GC space.
>>+collectgarbage('stop')
>>+
>>+for _ = 1, 8 do
>>+ ffi.new('char[?]', 256 * 1024 * 1024)
>>+end
>>+
>>+jit.opt.start('hotloop=1')
>>+
>>+-- Test `IR_LDEXP`.
>>+
>>+-- Reproducer here is a little tricky.
>>+-- We need to generate a bunch of traces as far we reference an
>>+-- IR field (`TValue`) address in `emit_rma()`. The amount of
>>+-- traces is empirical. Usually, assert fails on ~33d iteration,
>>+-- so let's use 100 just to be sure.
>Is there any way to make it more deterministic?
>>+local test_marker
>>+for _ = 1, 100 do
>>+ test_marker = loadstring([[
>>+ local test_marker
>>+ for i = 1, 4 do
>>+ -- Avoid fold optimization, use `i` as the second argument.
>>+ -- Need some constant differs from 1 or 0 as the first
>>+ -- argument.
>>+ test_marker = math.ldexp(1.2, i)
>>+ end
>>+ return test_marker
>>+ ]])()
>>+end
>>+
>>+-- If we here, it means no assertion failed during emitting.
>Typo: s/we/we are/
>>+test:ok(true, 'IR_LDEXP emit_rma')
>>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result')
>>+
>>+-- Test `IR_OBAR`.
>>+
>>+-- First, create a closed upvalue.
>>+do
>>+ local uv -- luacheck: no unused
>>+ -- `IR_OBAR` is used for object write barrier on upvalues.
>>+ _G.change_uv = function(newv)
>>+ uv = newv
>>+ end
>>+end
>>+
>>+-- We need a constant value on trace to be referenced far enough
>>+-- from dispatch table. So we need to create a new function
>>+-- prototype with a constant string.
>>+-- This string should be long enough to be allocated with direct
>>+-- alloc far away from dispatch.
>>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024
>Why is that amount sufficient? Link to the source file would be enough.
>>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
>>+local func_with_trace = loadstring([[
>>+ for _ = 1, 4 do
>>+ change_uv(']] .. str .. [[')
>>+ end
>>+]])
>>+func_with_trace()
>>+
>>+-- If we here, it means no assertion failed during emitting.
>Typo: s/we here/we are here/
>>+test:ok(true, 'IR_OBAR emit_rma')
>>+
>>+-- Now check the correctness.
>>+
>>+-- Set GC state to GCpause.
>>+collectgarbage()
>>+
>>+-- We want to wait for the situation, when upvalue is black,
>>+-- the string is gray. Both conditions are satisfied, when the
>>+-- corresponding `change_uv()` function is marked, for example.
>>+-- We don't know on what exactly step our upvalue is marked as
>Typo: s/exactly/exact/
>>+-- black and execution of trace become dangerous, so just check it
>>+-- at each step.
>>+-- Don't need to do the full GC cycle step by step.
>>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic
>>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do
>>+ collectgarbage('step')
>>+ func_with_trace()
>>+end
>>+
>>+-- If we here, it means no assertion failed during `gc_mark()`,
>Typo: s/we here/we are here/
>>+-- due to wrong call to `lj_gc_barrieruv()` on trace.
>>+test:ok(true, 'IR_OBAR emit_rma check correctness')
>>+
>>+os.exit(test:check() and 0 or 1)
>>--
>>2.34.1
>--
>Best regards,
>Maxim Kokryashkin
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.tarantool.org/pipermail/tarantool-patches/attachments/20230323/15d1ad12/attachment.htm>