* [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma().
@ 2023-03-22 8:27 Sergey Kaplun via Tarantool-patches
2023-03-23 9:36 ` Maxim Kokryashkin via Tarantool-patches
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-03-22 8:27 UTC (permalink / raw)
To: Sergey Ostanevich, Maxim Kokryashkin; +Cc: tarantool-patches
From: Mike Pall <mike>
(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
The accessing of memory address for some operation `emit_rma()` may be
encoded in one of the following ways:
a. If the offset of the accessing address from the dispatch table
(pinned to r14 that is not changed while trace execution) fits into
32-bit, then encode this as an access to 32-bit displacement
relative to r14.
b. If the offset of the accessing address from the mcode (i.e. rip)
fits into 32-bit, then encode this as an access to 32-bit
displacement relative to rip (considering long mode specifics and
`RID_RIP` hack).
c. If the address doesn't fit into 32-bit one and we use `mov` or
`movsd`, then encode 64-bit load from this address.
d. Elsewhere, encode it as an access to 32-bit (the address should fit
into 32-bit one) displacement (the only option for non-GC64 mode).
So, each instruction in GC64 mode differs from `mov` or `movsd` should
be encoded via the last option. But if we got a 64-bit address with a
big enough offset it can't be encoded and the assertion in `ptr2addr()`
will fail.
There are several cases, when `emit_rma()` is used with non `mov`
instruction:
* `IR_LDEXP` with `fld` instruction for loading constant
number `TValue` by address.
* `IR_OBAR` with the corresponding `test` instruction on
`marked` field of `GCobj`.
All these instructions require an additional register to store value by
address. We can't truly allocate a register here due to possibility to
break IR assembling which depends on specific register usage. So, we use
and restore r14 here for emitting.
Also, this patch removes `movsd` from condition from the `x86Op` type
check, as far as it never uses for the `emit_rma()` routine (see also
`emit_loadk64()` for details).
Sergey Kaplun:
* added the description and the test for the problem
Part of tarantool/tarantool#8069
---
Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
PR: https://github.com/tarantool/tarantool/pull/8477
Related issue: https://github.com/tarantool/tarantool/issues/8069
AFAICS, other places with `emit_rma()` usage are not related to the
patch as far as they take an offset for the address of JIT constants
stored in `jit_State`, so it always be near enough to dispatch.
Side note: you may check test-correctness of the last check with GC by
changing the corresponding condition check on `GC_WHITES` in asm_obar to
CC_NZ (like it will be treated for incorrect check). Be carefull, member
that instructions are emitted from bottom to top!
src/lj_emit_x86.h | 24 ++++-
test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
2 files changed, 123 insertions(+), 3 deletions(-)
create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua
diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h
index 6b58306b..b3dc4ea5 100644
--- a/src/lj_emit_x86.h
+++ b/src/lj_emit_x86.h
@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr)
emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr));
} else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) {
emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr));
- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) {
- emit_rmro(as, xo, rr, rr, 0);
- emit_loadu64(as, rr, (uintptr_t)addr);
+ } else if (!checki32((intptr_t)addr)) {
+ Reg ra = (rr & 15);
+ if (xo != XO_MOV) {
+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */
+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch;
+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0;
+ ra = RID_DISPATCH;
+ if (checku32(dispaddr)) {
+ emit_loadi(as, ra, (int32_t)dispaddr);
+ } else { /* Full-size 64 bit load. */
+ MCode *p = as->mcp;
+ *(uint64_t *)(p-8) = dispaddr;
+ p[-9] = (MCode)(XI_MOVri+(ra&7));
+ p[-10] = 0x48 + ((ra>>3)&1);
+ p -= 10;
+ as->mcp = p;
+ }
+ if (xo == XO_GROUP3b) emit_i8(as, i8);
+ }
+ emit_rmro(as, xo, rr, ra, 0);
+ emit_loadu64(as, ra, (uintptr_t)addr);
} else
#endif
{
diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua
new file mode 100644
index 00000000..faddfe83
--- /dev/null
+++ b/test/tarantool-tests/fix-emit-rma.test.lua
@@ -0,0 +1,102 @@
+local tap = require('tap')
+local test = tap.test('fix-emit-rma'):skipcond({
+ ['Test requires JIT enabled'] = not jit.status(),
+ ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
+})
+
+-- Need to test 2 cases of `emit_rma()` particulary on x64:
+-- * `IR_LDEXP` with `fld` instruction for loading constant
+-- number `TValue` by address.
+-- * `IR_OBAR` with the corresponding `test` instruction on
+-- `marked` field of `GCobj`.
+-- Also, test correctness.
+test:plan(4)
+
+local ffi = require('ffi')
+
+collectgarbage()
+-- Chomp memory in currently allocated GC space.
+collectgarbage('stop')
+
+for _ = 1, 8 do
+ ffi.new('char[?]', 256 * 1024 * 1024)
+end
+
+jit.opt.start('hotloop=1')
+
+-- Test `IR_LDEXP`.
+
+-- Reproducer here is a little tricky.
+-- We need to generate a bunch of traces as far we reference an
+-- IR field (`TValue`) address in `emit_rma()`. The amount of
+-- traces is empirical. Usually, assert fails on ~33d iteration,
+-- so let's use 100 just to be sure.
+local test_marker
+for _ = 1, 100 do
+ test_marker = loadstring([[
+ local test_marker
+ for i = 1, 4 do
+ -- Avoid fold optimization, use `i` as the second argument.
+ -- Need some constant differs from 1 or 0 as the first
+ -- argument.
+ test_marker = math.ldexp(1.2, i)
+ end
+ return test_marker
+ ]])()
+end
+
+-- If we here, it means no assertion failed during emitting.
+test:ok(true, 'IR_LDEXP emit_rma')
+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result')
+
+-- Test `IR_OBAR`.
+
+-- First, create a closed upvalue.
+do
+ local uv -- luacheck: no unused
+ -- `IR_OBAR` is used for object write barrier on upvalues.
+ _G.change_uv = function(newv)
+ uv = newv
+ end
+end
+
+-- We need a constant value on trace to be referenced far enough
+-- from dispatch table. So we need to create a new function
+-- prototype with a constant string.
+-- This string should be long enough to be allocated with direct
+-- alloc far away from dispatch.
+local DEFAULT_MMAP_THRESHOLD = 128 * 1024
+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
+local func_with_trace = loadstring([[
+ for _ = 1, 4 do
+ change_uv(']] .. str .. [[')
+ end
+]])
+func_with_trace()
+
+-- If we here, it means no assertion failed during emitting.
+test:ok(true, 'IR_OBAR emit_rma')
+
+-- Now check the correctness.
+
+-- Set GC state to GCpause.
+collectgarbage()
+
+-- We want to wait for the situation, when upvalue is black,
+-- the string is gray. Both conditions are satisfied, when the
+-- corresponding `change_uv()` function is marked, for example.
+-- We don't know on what exactly step our upvalue is marked as
+-- black and execution of trace become dangerous, so just check it
+-- at each step.
+-- Don't need to do the full GC cycle step by step.
+local old_steps_atomic = misc.getmetrics().gc_steps_atomic
+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do
+ collectgarbage('step')
+ func_with_trace()
+end
+
+-- If we here, it means no assertion failed during `gc_mark()`,
+-- due to wrong call to `lj_gc_barrieruv()` on trace.
+test:ok(true, 'IR_OBAR emit_rma check correctness')
+
+os.exit(test:check() and 0 or 1)
--
2.34.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma().
2023-03-22 8:27 [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma() Sergey Kaplun via Tarantool-patches
@ 2023-03-23 9:36 ` Maxim Kokryashkin via Tarantool-patches
2023-03-24 13:05 ` Sergey Kaplun via Tarantool-patches
2023-04-03 9:02 ` Igor Munkin via Tarantool-patches
2023-05-25 6:16 ` Igor Munkin via Tarantool-patches
2 siblings, 1 reply; 6+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-03-23 9:36 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
[-- Attachment #1: Type: text/plain, Size: 8768 bytes --]
Hi!
Thanks for the patch!
>
>>From: Mike Pall <mike>
>>
>>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
>>
>>The accessing of memory address for some operation `emit_rma()` may be
>>encoded in one of the following ways:
>> a. If the offset of the accessing address from the dispatch table
>I suggest paraphrasing it the following way:
>| If the offset of the address being accessed from the dispatch table
>
>Same for the similar sentences below.
>> (pinned to r14 that is not changed while trace execution) fits into
>Typo: s/while/during
>> 32-bit, then encode this as an access to 32-bit displacement
>> relative to r14.
>> b. If the offset of the accessing address from the mcode (i.e. rip)
>> fits into 32-bit, then encode this as an access to 32-bit
>> displacement relative to rip (considering long mode specifics and
>> `RID_RIP` hack).
>> c. If the address doesn't fit into 32-bit one and we use `mov` or
>> `movsd`, then encode 64-bit load from this address.
>> d. Elsewhere, encode it as an access to 32-bit (the address should fit
>> into 32-bit one) displacement (the only option for non-GC64 mode).
>>
>>So, each instruction in GC64 mode differs from `mov` or `movsd` should
>Typo: s/differs/that differs/
>>be encoded via the last option. But if we got a 64-bit address with a
>Typo: s/got/get/
>>big enough offset it can't be encoded and the assertion in `ptr2addr()`
>Typo: s/offset/offset,/
>Typo: s/encoded/encoded,
>>will fail.
>Typo: s/will fail/fails/
>>
>>There are several cases, when `emit_rma()` is used with non `mov`
>Typo: s/with non-`mov`/with a non-`mov`/
>>instruction:
>>* `IR_LDEXP` with `fld` instruction for loading constant
>> number `TValue` by address.
>>* `IR_OBAR` with the corresponding `test` instruction on
>> `marked` field of `GCobj`.
>>All these instructions require an additional register to store value by
>>address. We can't truly allocate a register here due to possibility to
>>break IR assembling which depends on specific register usage. So, we use
>Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling,
>>and restore r14 here for emitting.
>>
>>Also, this patch removes `movsd` from condition from the `x86Op` type
>>check, as far as it never uses for the `emit_rma()` routine (see also
>Typo: s/uses/used
>>`emit_loadk64()` for details).
>>
>>Sergey Kaplun:
>>* added the description and the test for the problem
>>
>>Part of tarantool/tarantool#8069
>>---
>>
>>Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
>>PR: https://github.com/tarantool/tarantool/pull/8477
>>Related issue: https://github.com/tarantool/tarantool/issues/8069
>>
>>AFAICS, other places with `emit_rma()` usage are not related to the
>>patch as far as they take an offset for the address of JIT constants
>>stored in `jit_State`, so it always be near enough to dispatch.
>>
>>Side note: you may check test-correctness of the last check with GC by
>>changing the corresponding condition check on `GC_WHITES` in asm_obar to
>>CC_NZ (like it will be treated for incorrect check). Be carefull, member
>>that instructions are emitted from bottom to top!
>>
>> src/lj_emit_x86.h | 24 ++++-
>> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
>> 2 files changed, 123 insertions(+), 3 deletions(-)
>> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua
>>
>>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h
>>index 6b58306b..b3dc4ea5 100644
>>--- a/src/lj_emit_x86.h
>>+++ b/src/lj_emit_x86.h
>>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr)
>> emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr));
>> } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) {
>> emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr));
>>- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) {
>>- emit_rmro(as, xo, rr, rr, 0);
>>- emit_loadu64(as, rr, (uintptr_t)addr);
>>+ } else if (!checki32((intptr_t)addr)) {
>>+ Reg ra = (rr & 15);
>>+ if (xo != XO_MOV) {
>>+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */
>>+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch;
>>+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0;
>>+ ra = RID_DISPATCH;
>>+ if (checku32(dispaddr)) {
>>+ emit_loadi(as, ra, (int32_t)dispaddr);
>>+ } else { /* Full-size 64 bit load. */
>>+ MCode *p = as->mcp;
>>+ *(uint64_t *)(p-8) = dispaddr;
>>+ p[-9] = (MCode)(XI_MOVri+(ra&7));
>>+ p[-10] = 0x48 + ((ra>>3)&1);
>Why is it `0x48`?
>>+ p -= 10;
>>+ as->mcp = p;
>>+ }
>>+ if (xo == XO_GROUP3b) emit_i8(as, i8);
>>+ }
>>+ emit_rmro(as, xo, rr, ra, 0);
>>+ emit_loadu64(as, ra, (uintptr_t)addr);
>> } else
>> #endif
>> {
>>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua
>>new file mode 100644
>>index 00000000..faddfe83
>>--- /dev/null
>>+++ b/test/tarantool-tests/fix-emit-rma.test.lua
>>@@ -0,0 +1,102 @@
>>+local tap = require('tap')
>>+local test = tap.test('fix-emit-rma'):skipcond({
>>+ ['Test requires JIT enabled'] = not jit.status(),
>>+ ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
>>+})
>>+
>>+-- Need to test 2 cases of `emit_rma()` particulary on x64:
>>+-- * `IR_LDEXP` with `fld` instruction for loading constant
>>+-- number `TValue` by address.
>>+-- * `IR_OBAR` with the corresponding `test` instruction on
>>+-- `marked` field of `GCobj`.
>>+-- Also, test correctness.
>>+test:plan(4)
>>+
>>+local ffi = require('ffi')
>>+
>>+collectgarbage()
>>+-- Chomp memory in currently allocated GC space.
>>+collectgarbage('stop')
>>+
>>+for _ = 1, 8 do
>>+ ffi.new('char[?]', 256 * 1024 * 1024)
>>+end
>>+
>>+jit.opt.start('hotloop=1')
>>+
>>+-- Test `IR_LDEXP`.
>>+
>>+-- Reproducer here is a little tricky.
>>+-- We need to generate a bunch of traces as far we reference an
>>+-- IR field (`TValue`) address in `emit_rma()`. The amount of
>>+-- traces is empirical. Usually, assert fails on ~33d iteration,
>>+-- so let's use 100 just to be sure.
>Is there any way to make it more deterministic?
>>+local test_marker
>>+for _ = 1, 100 do
>>+ test_marker = loadstring([[
>>+ local test_marker
>>+ for i = 1, 4 do
>>+ -- Avoid fold optimization, use `i` as the second argument.
>>+ -- Need some constant differs from 1 or 0 as the first
>>+ -- argument.
>>+ test_marker = math.ldexp(1.2, i)
>>+ end
>>+ return test_marker
>>+ ]])()
>>+end
>>+
>>+-- If we here, it means no assertion failed during emitting.
>Typo: s/we/we are/
>>+test:ok(true, 'IR_LDEXP emit_rma')
>>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result')
>>+
>>+-- Test `IR_OBAR`.
>>+
>>+-- First, create a closed upvalue.
>>+do
>>+ local uv -- luacheck: no unused
>>+ -- `IR_OBAR` is used for object write barrier on upvalues.
>>+ _G.change_uv = function(newv)
>>+ uv = newv
>>+ end
>>+end
>>+
>>+-- We need a constant value on trace to be referenced far enough
>>+-- from dispatch table. So we need to create a new function
>>+-- prototype with a constant string.
>>+-- This string should be long enough to be allocated with direct
>>+-- alloc far away from dispatch.
>>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024
>Why is that amount sufficient? Link to the source file would be enough.
>>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
>>+local func_with_trace = loadstring([[
>>+ for _ = 1, 4 do
>>+ change_uv(']] .. str .. [[')
>>+ end
>>+]])
>>+func_with_trace()
>>+
>>+-- If we here, it means no assertion failed during emitting.
>Typo: s/we here/we are here/
>>+test:ok(true, 'IR_OBAR emit_rma')
>>+
>>+-- Now check the correctness.
>>+
>>+-- Set GC state to GCpause.
>>+collectgarbage()
>>+
>>+-- We want to wait for the situation, when upvalue is black,
>>+-- the string is gray. Both conditions are satisfied, when the
>>+-- corresponding `change_uv()` function is marked, for example.
>>+-- We don't know on what exactly step our upvalue is marked as
>Typo: s/exactly/exact/
>>+-- black and execution of trace become dangerous, so just check it
>>+-- at each step.
>>+-- Don't need to do the full GC cycle step by step.
>>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic
>>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do
>>+ collectgarbage('step')
>>+ func_with_trace()
>>+end
>>+
>>+-- If we here, it means no assertion failed during `gc_mark()`,
>Typo: s/we here/we are here/
>>+-- due to wrong call to `lj_gc_barrieruv()` on trace.
>>+test:ok(true, 'IR_OBAR emit_rma check correctness')
>>+
>>+os.exit(test:check() and 0 or 1)
>>--
>>2.34.1
>--
>Best regards,
>Maxim Kokryashkin
>
[-- Attachment #2: Type: text/html, Size: 13053 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma().
2023-03-23 9:36 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-03-24 13:05 ` Sergey Kaplun via Tarantool-patches
2023-03-24 14:42 ` Maxim Kokryashkin via Tarantool-patches
0 siblings, 1 reply; 6+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-03-24 13:05 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi, Maxim!
Thanks for the review!
I've updated the commit message and comments, force-pushed the branch.
On 23.03.23, Maxim Kokryashkin wrote:
>
> Hi!
> Thanks for the patch!
>
> >
> >>From: Mike Pall <mike>
> >>
> >>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
> >>
> >>The accessing of memory address for some operation `emit_rma()` may be
> >>encoded in one of the following ways:
> >> a. If the offset of the accessing address from the dispatch table
> >I suggest paraphrasing it the following way:
> >| If the offset of the address being accessed from the dispatch table
> >
> >Same for the similar sentences below.
> >> (pinned to r14 that is not changed while trace execution) fits into
> >Typo: s/while/during
> >> 32-bit, then encode this as an access to 32-bit displacement
> >> relative to r14.
> >> b. If the offset of the accessing address from the mcode (i.e. rip)
> >> fits into 32-bit, then encode this as an access to 32-bit
> >> displacement relative to rip (considering long mode specifics and
> >> `RID_RIP` hack).
> >> c. If the address doesn't fit into 32-bit one and we use `mov` or
> >> `movsd`, then encode 64-bit load from this address.
> >> d. Elsewhere, encode it as an access to 32-bit (the address should fit
> >> into 32-bit one) displacement (the only option for non-GC64 mode).
> >>
> >>So, each instruction in GC64 mode differs from `mov` or `movsd` should
> >Typo: s/differs/that differs/
> >>be encoded via the last option. But if we got a 64-bit address with a
> >Typo: s/got/get/
> >>big enough offset it can't be encoded and the assertion in `ptr2addr()`
> >Typo: s/offset/offset,/
> >Typo: s/encoded/encoded,
> >>will fail.
> >Typo: s/will fail/fails/
> >>
> >>There are several cases, when `emit_rma()` is used with non `mov`
> >Typo: s/with non-`mov`/with a non-`mov`/
> >>instruction:
> >>* `IR_LDEXP` with `fld` instruction for loading constant
> >> number `TValue` by address.
> >>* `IR_OBAR` with the corresponding `test` instruction on
> >> `marked` field of `GCobj`.
> >>All these instructions require an additional register to store value by
> >>address. We can't truly allocate a register here due to possibility to
> >>break IR assembling which depends on specific register usage. So, we use
> >Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling,
> >>and restore r14 here for emitting.
> >>
> >>Also, this patch removes `movsd` from condition from the `x86Op` type
> >>check, as far as it never uses for the `emit_rma()` routine (see also
> >Typo: s/uses/used
> >>`emit_loadk64()` for details).
> >>
> >>Sergey Kaplun:
> >>* added the description and the test for the problem
> >>
> >>Part of tarantool/tarantool#8069
> >>---
Fixed the commit message; new commit message is the following:
x64/LJ_GC64: Fix emit_rma().
(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
```
The accessing of memory address for some operation `emit_rma()` may be
encoded in one of the following ways:
a. If the offset of the address being accessed from the dispatch table
(pinned to r14 that is not changed during trace execution) fits into
32-bit, then encode this as an access to 32-bit displacement
relative to r14.
b. If the offset of the address being accessed from the mcode (i.e.
rip) fits into 32-bit, then encode this as an access to 32-bit
displacement relative to rip (considering long mode specifics and
`RID_RIP` hack).
c. If the address doesn't fit into 32-bit one and we use `mov` or
`movsd`, then encode 64-bit load from this address.
d. Elsewhere, encode it as an access to 32-bit (the address should fit
into 32-bit one) displacement (the only option for non-GC64 mode).
So, each instruction in GC64 mode that differs from `mov` or `movsd`
should be encoded via the last option. But if we get a 64-bit address
with a big enough offset, it can't be encoded, so the assertion in
`ptr2addr()` fails.
There are several cases, when `emit_rma()` is used with a non-`mov`
instruction:
* `IR_LDEXP` with `fld` instruction for loading constant
number `TValue` by address.
* `IR_OBAR` with the corresponding `test` instruction on
`marked` field of `GCobj`.
All these instructions require an additional register to store value by
address. We can't truly allocate a register here due to the possibility
to break IR assembling which depends on specific register usage. So, we
use and restore r14 here for emitting.
Also, this patch removes `movsd` from condition from the `x86Op` type
check, as far as it is never used for the `emit_rma()` routine (see also
`emit_loadk64()` for details).
Sergey Kaplun:
* added the description and the test for the problem
Part of tarantool/tarantool#8069
```
> >>
> >>Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
> >>PR: https://github.com/tarantool/tarantool/pull/8477
> >>Related issue: https://github.com/tarantool/tarantool/issues/8069
> >>
> >>AFAICS, other places with `emit_rma()` usage are not related to the
> >>patch as far as they take an offset for the address of JIT constants
> >>stored in `jit_State`, so it always be near enough to dispatch.
> >>
> >>Side note: you may check test-correctness of the last check with GC by
> >>changing the corresponding condition check on `GC_WHITES` in asm_obar to
> >>CC_NZ (like it will be treated for incorrect check). Be carefull, member
> >>that instructions are emitted from bottom to top!
> >>
> >> src/lj_emit_x86.h | 24 ++++-
> >> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
> >> 2 files changed, 123 insertions(+), 3 deletions(-)
> >> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua
> >>
> >>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h
> >>index 6b58306b..b3dc4ea5 100644
> >>--- a/src/lj_emit_x86.h
> >>+++ b/src/lj_emit_x86.h
> >>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr)
> >> emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr));
> >> } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) {
> >> emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr));
> >>- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) {
> >>- emit_rmro(as, xo, rr, rr, 0);
> >>- emit_loadu64(as, rr, (uintptr_t)addr);
> >>+ } else if (!checki32((intptr_t)addr)) {
> >>+ Reg ra = (rr & 15);
> >>+ if (xo != XO_MOV) {
> >>+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */
> >>+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch;
> >>+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0;
> >>+ ra = RID_DISPATCH;
> >>+ if (checku32(dispaddr)) {
> >>+ emit_loadi(as, ra, (int32_t)dispaddr);
> >>+ } else { /* Full-size 64 bit load. */
> >>+ MCode *p = as->mcp;
> >>+ *(uint64_t *)(p-8) = dispaddr;
> >>+ p[-9] = (MCode)(XI_MOVri+(ra&7));
> >>+ p[-10] = 0x48 + ((ra>>3)&1);
> >Why is it `0x48`?
IINM, this is REX prefix for the instruction [1], Sergos should correct
me if I wrong.
| perl -E 'say sprintf "%08b", 0x48'
| 01001000
****WRXB
****: Fixed bit pattern.
W bit is set to mark that a 64-bit operand size is used.
This is the same approach as for `emit_loadu64()`, so I didn't left any comment.
> >>+ p -= 10;
> >>+ as->mcp = p;
> >>+ }
> >>+ if (xo == XO_GROUP3b) emit_i8(as, i8);
> >>+ }
> >>+ emit_rmro(as, xo, rr, ra, 0);
> >>+ emit_loadu64(as, ra, (uintptr_t)addr);
> >> } else
> >> #endif
> >> {
> >>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua
> >>new file mode 100644
> >>index 00000000..faddfe83
> >>--- /dev/null
> >>+++ b/test/tarantool-tests/fix-emit-rma.test.lua
<snipped>
> >>+-- Test `IR_LDEXP`.
> >>+
> >>+-- Reproducer here is a little tricky.
> >>+-- We need to generate a bunch of traces as far we reference an
> >>+-- IR field (`TValue`) address in `emit_rma()`. The amount of
> >>+-- traces is empirical. Usually, assert fails on ~33d iteration,
> >>+-- so let's use 100 just to be sure.
> >Is there any way to make it more deterministic?
I suppose no, due to undetermenistic memory allocation. So, just
use big enough number.
> >>+local test_marker
> >>+for _ = 1, 100 do
> >>+ test_marker = loadstring([[
> >>+ local test_marker
> >>+ for i = 1, 4 do
> >>+ -- Avoid fold optimization, use `i` as the second argument.
> >>+ -- Need some constant differs from 1 or 0 as the first
> >>+ -- argument.
> >>+ test_marker = math.ldexp(1.2, i)
> >>+ end
> >>+ return test_marker
> >>+ ]])()
> >>+end
> >>+
> >>+-- If we here, it means no assertion failed during emitting.
> >Typo: s/we/we are/
Fixed, thanks!
> >>+test:ok(true, 'IR_LDEXP emit_rma')
> >>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result')
> >>+
> >>+-- Test `IR_OBAR`.
> >>+
> >>+-- First, create a closed upvalue.
> >>+do
> >>+ local uv -- luacheck: no unused
> >>+ -- `IR_OBAR` is used for object write barrier on upvalues.
> >>+ _G.change_uv = function(newv)
> >>+ uv = newv
> >>+ end
> >>+end
> >>+
> >>+-- We need a constant value on trace to be referenced far enough
> >>+-- from dispatch table. So we need to create a new function
> >>+-- prototype with a constant string.
> >>+-- This string should be long enough to be allocated with direct
> >>+-- alloc far away from dispatch.
> >>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024
> >Why is that amount sufficient? Link to the source file would be enough.
Added the following comment.
```
@@ -64,7 +64,8 @@ end
-- from dispatch table. So we need to create a new function
-- prototype with a constant string.
-- This string should be long enough to be allocated with direct
--- alloc far away from dispatch.
+-- alloc (not fitting in free chunks) far away from dispatch.
+-- See <src/lj_alloc.c> for details.
local DEFAULT_MMAP_THRESHOLD = 128 * 1024
local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
local func_with_trace = loadstring([[
@@ -74,7 +75,7 @@ local func_with_trace = loadstring([[
]])
```
> >>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
> >>+local func_with_trace = loadstring([[
> >>+ for _ = 1, 4 do
> >>+ change_uv(']] .. str .. [[')
> >>+ end
> >>+]])
> >>+func_with_trace()
> >>+
> >>+-- If we here, it means no assertion failed during emitting.
> >Typo: s/we here/we are here/
Fixed, thanks!
> >>+test:ok(true, 'IR_OBAR emit_rma')
> >>+
> >>+-- Now check the correctness.
> >>+
> >>+-- Set GC state to GCpause.
> >>+collectgarbage()
> >>+
> >>+-- We want to wait for the situation, when upvalue is black,
> >>+-- the string is gray. Both conditions are satisfied, when the
> >>+-- corresponding `change_uv()` function is marked, for example.
> >>+-- We don't know on what exactly step our upvalue is marked as
> >Typo: s/exactly/exact/
Fixed, thanks!
> >>+-- black and execution of trace become dangerous, so just check it
> >>+-- at each step.
> >>+-- Don't need to do the full GC cycle step by step.
> >>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic
> >>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do
> >>+ collectgarbage('step')
> >>+ func_with_trace()
> >>+end
> >>+
> >>+-- If we here, it means no assertion failed during `gc_mark()`,
> >Typo: s/we here/we are here/
Fixed, thanks!
> >>+-- due to wrong call to `lj_gc_barrieruv()` on trace.
> >>+test:ok(true, 'IR_OBAR emit_rma check correctness')
> >>+
> >>+os.exit(test:check() and 0 or 1)
> >>--
> >>2.34.1
> >--
> >Best regards,
> >Maxim Kokryashkin
> >
[1]: https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma().
2023-03-24 13:05 ` Sergey Kaplun via Tarantool-patches
@ 2023-03-24 14:42 ` Maxim Kokryashkin via Tarantool-patches
0 siblings, 0 replies; 6+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-03-24 14:42 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
[-- Attachment #1: Type: text/plain, Size: 12477 bytes --]
Hi!
Thanks for the fixes!
LGTM.
--
Best regards,
Maxim Kokryashkin
>
>>Hi, Maxim!
>>Thanks for the review!
>>I've updated the commit message and comments, force-pushed the branch.
>>
>>On 23.03.23, Maxim Kokryashkin wrote:
>>>
>>> Hi!
>>> Thanks for the patch!
>>>
>>> >
>>> >>From: Mike Pall <mike>
>>> >>
>>> >>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
>>> >>
>>> >>The accessing of memory address for some operation `emit_rma()` may be
>>> >>encoded in one of the following ways:
>>> >> a. If the offset of the accessing address from the dispatch table
>>> >I suggest paraphrasing it the following way:
>>> >| If the offset of the address being accessed from the dispatch table
>>> >
>>> >Same for the similar sentences below.
>>> >> (pinned to r14 that is not changed while trace execution) fits into
>>> >Typo: s/while/during
>>> >> 32-bit, then encode this as an access to 32-bit displacement
>>> >> relative to r14.
>>> >> b. If the offset of the accessing address from the mcode (i.e. rip)
>>> >> fits into 32-bit, then encode this as an access to 32-bit
>>> >> displacement relative to rip (considering long mode specifics and
>>> >> `RID_RIP` hack).
>>> >> c. If the address doesn't fit into 32-bit one and we use `mov` or
>>> >> `movsd`, then encode 64-bit load from this address.
>>> >> d. Elsewhere, encode it as an access to 32-bit (the address should fit
>>> >> into 32-bit one) displacement (the only option for non-GC64 mode).
>>> >>
>>> >>So, each instruction in GC64 mode differs from `mov` or `movsd` should
>>> >Typo: s/differs/that differs/
>>> >>be encoded via the last option. But if we got a 64-bit address with a
>>> >Typo: s/got/get/
>>> >>big enough offset it can't be encoded and the assertion in `ptr2addr()`
>>> >Typo: s/offset/offset,/
>>> >Typo: s/encoded/encoded,
>>> >>will fail.
>>> >Typo: s/will fail/fails/
>>> >>
>>> >>There are several cases, when `emit_rma()` is used with non `mov`
>>> >Typo: s/with non-`mov`/with a non-`mov`/
>>> >>instruction:
>>> >>* `IR_LDEXP` with `fld` instruction for loading constant
>>> >> number `TValue` by address.
>>> >>* `IR_OBAR` with the corresponding `test` instruction on
>>> >> `marked` field of `GCobj`.
>>> >>All these instructions require an additional register to store value by
>>> >>address. We can't truly allocate a register here due to possibility to
>>> >>break IR assembling which depends on specific register usage. So, we use
>>> >Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling,
>>> >>and restore r14 here for emitting.
>>> >>
>>> >>Also, this patch removes `movsd` from condition from the `x86Op` type
>>> >>check, as far as it never uses for the `emit_rma()` routine (see also
>>> >Typo: s/uses/used
>>> >>`emit_loadk64()` for details).
>>> >>
>>> >>Sergey Kaplun:
>>> >>* added the description and the test for the problem
>>> >>
>>> >>Part of tarantool/tarantool#8069
>>> >>---
>>
>>Fixed the commit message; new commit message is the following:
>>
>>x64/LJ_GC64: Fix emit_rma().
>>
>>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
>>
>>```
>>The accessing of memory address for some operation `emit_rma()` may be
>>encoded in one of the following ways:
>> a. If the offset of the address being accessed from the dispatch table
>> (pinned to r14 that is not changed during trace execution) fits into
>> 32-bit, then encode this as an access to 32-bit displacement
>> relative to r14.
>> b. If the offset of the address being accessed from the mcode (i.e.
>> rip) fits into 32-bit, then encode this as an access to 32-bit
>> displacement relative to rip (considering long mode specifics and
>> `RID_RIP` hack).
>> c. If the address doesn't fit into 32-bit one and we use `mov` or
>> `movsd`, then encode 64-bit load from this address.
>> d. Elsewhere, encode it as an access to 32-bit (the address should fit
>> into 32-bit one) displacement (the only option for non-GC64 mode).
>>
>>So, each instruction in GC64 mode that differs from `mov` or `movsd`
>>should be encoded via the last option. But if we get a 64-bit address
>>with a big enough offset, it can't be encoded, so the assertion in
>>`ptr2addr()` fails.
>>
>>There are several cases, when `emit_rma()` is used with a non-`mov`
>>instruction:
>>* `IR_LDEXP` with `fld` instruction for loading constant
>> number `TValue` by address.
>>* `IR_OBAR` with the corresponding `test` instruction on
>> `marked` field of `GCobj`.
>>All these instructions require an additional register to store value by
>>address. We can't truly allocate a register here due to the possibility
>>to break IR assembling which depends on specific register usage. So, we
>>use and restore r14 here for emitting.
>>
>>Also, this patch removes `movsd` from condition from the `x86Op` type
>>check, as far as it is never used for the `emit_rma()` routine (see also
>>`emit_loadk64()` for details).
>>
>>Sergey Kaplun:
>>* added the description and the test for the problem
>>
>>Part of tarantool/tarantool#8069
>>```
>>
>>> >>
>>> >>Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
>>> >>PR: https://github.com/tarantool/tarantool/pull/8477
>>> >>Related issue: https://github.com/tarantool/tarantool/issues/8069
>>> >>
>>> >>AFAICS, other places with `emit_rma()` usage are not related to the
>>> >>patch as far as they take an offset for the address of JIT constants
>>> >>stored in `jit_State`, so it always be near enough to dispatch.
>>> >>
>>> >>Side note: you may check test-correctness of the last check with GC by
>>> >>changing the corresponding condition check on `GC_WHITES` in asm_obar to
>>> >>CC_NZ (like it will be treated for incorrect check). Be carefull, member
>>> >>that instructions are emitted from bottom to top!
>>> >>
>>> >> src/lj_emit_x86.h | 24 ++++-
>>> >> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
>>> >> 2 files changed, 123 insertions(+), 3 deletions(-)
>>> >> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua
>>> >>
>>> >>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h
>>> >>index 6b58306b..b3dc4ea5 100644
>>> >>--- a/src/lj_emit_x86.h
>>> >>+++ b/src/lj_emit_x86.h
>>> >>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr)
>>> >> emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr));
>>> >> } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) {
>>> >> emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr));
>>> >>- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) {
>>> >>- emit_rmro(as, xo, rr, rr, 0);
>>> >>- emit_loadu64(as, rr, (uintptr_t)addr);
>>> >>+ } else if (!checki32((intptr_t)addr)) {
>>> >>+ Reg ra = (rr & 15);
>>> >>+ if (xo != XO_MOV) {
>>> >>+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */
>>> >>+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch;
>>> >>+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0;
>>> >>+ ra = RID_DISPATCH;
>>> >>+ if (checku32(dispaddr)) {
>>> >>+ emit_loadi(as, ra, (int32_t)dispaddr);
>>> >>+ } else { /* Full-size 64 bit load. */
>>> >>+ MCode *p = as->mcp;
>>> >>+ *(uint64_t *)(p-8) = dispaddr;
>>> >>+ p[-9] = (MCode)(XI_MOVri+(ra&7));
>>> >>+ p[-10] = 0x48 + ((ra>>3)&1);
>>> >Why is it `0x48`?
>>
>>IINM, this is REX prefix for the instruction [1], Sergos should correct
>>me if I wrong.
>>
>>| perl -E 'say sprintf "%08b", 0x48'
>>| 01001000
>> ****WRXB
>>****: Fixed bit pattern.
>>W bit is set to mark that a 64-bit operand size is used.
>>
>>This is the same approach as for `emit_loadu64()`, so I didn't left any comment.
>>
>>> >>+ p -= 10;
>>> >>+ as->mcp = p;
>>> >>+ }
>>> >>+ if (xo == XO_GROUP3b) emit_i8(as, i8);
>>> >>+ }
>>> >>+ emit_rmro(as, xo, rr, ra, 0);
>>> >>+ emit_loadu64(as, ra, (uintptr_t)addr);
>>> >> } else
>>> >> #endif
>>> >> {
>>> >>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua
>>> >>new file mode 100644
>>> >>index 00000000..faddfe83
>>> >>--- /dev/null
>>> >>+++ b/test/tarantool-tests/fix-emit-rma.test.lua
>>
>><snipped>
>>
>>> >>+-- Test `IR_LDEXP`.
>>> >>+
>>> >>+-- Reproducer here is a little tricky.
>>> >>+-- We need to generate a bunch of traces as far we reference an
>>> >>+-- IR field (`TValue`) address in `emit_rma()`. The amount of
>>> >>+-- traces is empirical. Usually, assert fails on ~33d iteration,
>>> >>+-- so let's use 100 just to be sure.
>>> >Is there any way to make it more deterministic?
>>
>>I suppose no, due to undetermenistic memory allocation. So, just
>>use big enough number.
>>
>>> >>+local test_marker
>>> >>+for _ = 1, 100 do
>>> >>+ test_marker = loadstring([[
>>> >>+ local test_marker
>>> >>+ for i = 1, 4 do
>>> >>+ -- Avoid fold optimization, use `i` as the second argument.
>>> >>+ -- Need some constant differs from 1 or 0 as the first
>>> >>+ -- argument.
>>> >>+ test_marker = math.ldexp(1.2, i)
>>> >>+ end
>>> >>+ return test_marker
>>> >>+ ]])()
>>> >>+end
>>> >>+
>>> >>+-- If we here, it means no assertion failed during emitting.
>>> >Typo: s/we/we are/
>>
>>Fixed, thanks!
>>
>>> >>+test:ok(true, 'IR_LDEXP emit_rma')
>>> >>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result')
>>> >>+
>>> >>+-- Test `IR_OBAR`.
>>> >>+
>>> >>+-- First, create a closed upvalue.
>>> >>+do
>>> >>+ local uv -- luacheck: no unused
>>> >>+ -- `IR_OBAR` is used for object write barrier on upvalues.
>>> >>+ _G.change_uv = function(newv)
>>> >>+ uv = newv
>>> >>+ end
>>> >>+end
>>> >>+
>>> >>+-- We need a constant value on trace to be referenced far enough
>>> >>+-- from dispatch table. So we need to create a new function
>>> >>+-- prototype with a constant string.
>>> >>+-- This string should be long enough to be allocated with direct
>>> >>+-- alloc far away from dispatch.
>>> >>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024
>>> >Why is that amount sufficient? Link to the source file would be enough.
>>
>>Added the following comment.
>>```
>>@@ -64,7 +64,8 @@ end
>> -- from dispatch table. So we need to create a new function
>> -- prototype with a constant string.
>> -- This string should be long enough to be allocated with direct
>>--- alloc far away from dispatch.
>>+-- alloc (not fitting in free chunks) far away from dispatch.
>>+-- See <src/lj_alloc.c> for details.
>> local DEFAULT_MMAP_THRESHOLD = 128 * 1024
>> local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
>> local func_with_trace = loadstring([[
>>@@ -74,7 +75,7 @@ local func_with_trace = loadstring([[
>> ]])
>>```
>>
>>> >>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
>>> >>+local func_with_trace = loadstring([[
>>> >>+ for _ = 1, 4 do
>>> >>+ change_uv(']] .. str .. [[')
>>> >>+ end
>>> >>+]])
>>> >>+func_with_trace()
>>> >>+
>>> >>+-- If we here, it means no assertion failed during emitting.
>>> >Typo: s/we here/we are here/
>>
>>Fixed, thanks!
>>
>>> >>+test:ok(true, 'IR_OBAR emit_rma')
>>> >>+
>>> >>+-- Now check the correctness.
>>> >>+
>>> >>+-- Set GC state to GCpause.
>>> >>+collectgarbage()
>>> >>+
>>> >>+-- We want to wait for the situation, when upvalue is black,
>>> >>+-- the string is gray. Both conditions are satisfied, when the
>>> >>+-- corresponding `change_uv()` function is marked, for example.
>>> >>+-- We don't know on what exactly step our upvalue is marked as
>>> >Typo: s/exactly/exact/
>>
>>Fixed, thanks!
>>
>>> >>+-- black and execution of trace become dangerous, so just check it
>>> >>+-- at each step.
>>> >>+-- Don't need to do the full GC cycle step by step.
>>> >>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic
>>> >>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do
>>> >>+ collectgarbage('step')
>>> >>+ func_with_trace()
>>> >>+end
>>> >>+
>>> >>+-- If we here, it means no assertion failed during `gc_mark()`,
>>> >Typo: s/we here/we are here/
>>
>>Fixed, thanks!
>>
>>> >>+-- due to wrong call to `lj_gc_barrieruv()` on trace.
>>> >>+test:ok(true, 'IR_OBAR emit_rma check correctness')
>>> >>+
>>> >>+os.exit(test:check() and 0 or 1)
>>> >>--
>>> >>2.34.1
>>> >--
>>> >Best regards,
>>> >Maxim Kokryashkin
>>> >
>>
>>[1]: https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix
>>
>>--
>>Best regards,
>>Sergey Kaplun
>
[-- Attachment #2: Type: text/html, Size: 16034 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma().
2023-03-22 8:27 [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma() Sergey Kaplun via Tarantool-patches
2023-03-23 9:36 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-04-03 9:02 ` Igor Munkin via Tarantool-patches
2023-05-25 6:16 ` Igor Munkin via Tarantool-patches
2 siblings, 0 replies; 6+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2023-04-03 9:02 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Sergey,
Thanks for the patch! LGTM, considering the fixes for Max comments.
On 22.03.23, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> (cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
>
> The accessing of memory address for some operation `emit_rma()` may be
> encoded in one of the following ways:
> a. If the offset of the accessing address from the dispatch table
> (pinned to r14 that is not changed while trace execution) fits into
> 32-bit, then encode this as an access to 32-bit displacement
> relative to r14.
> b. If the offset of the accessing address from the mcode (i.e. rip)
> fits into 32-bit, then encode this as an access to 32-bit
> displacement relative to rip (considering long mode specifics and
> `RID_RIP` hack).
> c. If the address doesn't fit into 32-bit one and we use `mov` or
> `movsd`, then encode 64-bit load from this address.
> d. Elsewhere, encode it as an access to 32-bit (the address should fit
> into 32-bit one) displacement (the only option for non-GC64 mode).
>
> So, each instruction in GC64 mode differs from `mov` or `movsd` should
> be encoded via the last option. But if we got a 64-bit address with a
> big enough offset it can't be encoded and the assertion in `ptr2addr()`
> will fail.
>
> There are several cases, when `emit_rma()` is used with non `mov`
> instruction:
> * `IR_LDEXP` with `fld` instruction for loading constant
> number `TValue` by address.
> * `IR_OBAR` with the corresponding `test` instruction on
> `marked` field of `GCobj`.
> All these instructions require an additional register to store value by
> address. We can't truly allocate a register here due to possibility to
> break IR assembling which depends on specific register usage. So, we use
> and restore r14 here for emitting.
>
> Also, this patch removes `movsd` from condition from the `x86Op` type
> check, as far as it never uses for the `emit_rma()` routine (see also
> `emit_loadk64()` for details).
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#8069
> ---
>
> Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
> PR: https://github.com/tarantool/tarantool/pull/8477
> Related issue: https://github.com/tarantool/tarantool/issues/8069
>
> AFAICS, other places with `emit_rma()` usage are not related to the
> patch as far as they take an offset for the address of JIT constants
> stored in `jit_State`, so it always be near enough to dispatch.
>
> Side note: you may check test-correctness of the last check with GC by
> changing the corresponding condition check on `GC_WHITES` in asm_obar to
> CC_NZ (like it will be treated for incorrect check). Be carefull, member
> that instructions are emitted from bottom to top!
>
> src/lj_emit_x86.h | 24 ++++-
> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
> 2 files changed, 123 insertions(+), 3 deletions(-)
> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua
>
<snipped>
> --
> 2.34.1
>
--
Best regards,
IM
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma().
2023-03-22 8:27 [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma() Sergey Kaplun via Tarantool-patches
2023-03-23 9:36 ` Maxim Kokryashkin via Tarantool-patches
2023-04-03 9:02 ` Igor Munkin via Tarantool-patches
@ 2023-05-25 6:16 ` Igor Munkin via Tarantool-patches
2 siblings, 0 replies; 6+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2023-05-25 6:16 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
Sergey,
I've checked the patchset into all long-term branches in
tarantool/luajit and bumped a new version in master, 2.11 and 2.10.
On 22.03.23, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
>
> (cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
>
> The accessing of memory address for some operation `emit_rma()` may be
> encoded in one of the following ways:
> a. If the offset of the accessing address from the dispatch table
> (pinned to r14 that is not changed while trace execution) fits into
> 32-bit, then encode this as an access to 32-bit displacement
> relative to r14.
> b. If the offset of the accessing address from the mcode (i.e. rip)
> fits into 32-bit, then encode this as an access to 32-bit
> displacement relative to rip (considering long mode specifics and
> `RID_RIP` hack).
> c. If the address doesn't fit into 32-bit one and we use `mov` or
> `movsd`, then encode 64-bit load from this address.
> d. Elsewhere, encode it as an access to 32-bit (the address should fit
> into 32-bit one) displacement (the only option for non-GC64 mode).
>
> So, each instruction in GC64 mode differs from `mov` or `movsd` should
> be encoded via the last option. But if we got a 64-bit address with a
> big enough offset it can't be encoded and the assertion in `ptr2addr()`
> will fail.
>
> There are several cases, when `emit_rma()` is used with non `mov`
> instruction:
> * `IR_LDEXP` with `fld` instruction for loading constant
> number `TValue` by address.
> * `IR_OBAR` with the corresponding `test` instruction on
> `marked` field of `GCobj`.
> All these instructions require an additional register to store value by
> address. We can't truly allocate a register here due to possibility to
> break IR assembling which depends on specific register usage. So, we use
> and restore r14 here for emitting.
>
> Also, this patch removes `movsd` from condition from the `x86Op` type
> check, as far as it never uses for the `emit_rma()` routine (see also
> `emit_loadk64()` for details).
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#8069
> ---
>
> Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
> PR: https://github.com/tarantool/tarantool/pull/8477
> Related issue: https://github.com/tarantool/tarantool/issues/8069
>
> AFAICS, other places with `emit_rma()` usage are not related to the
> patch as far as they take an offset for the address of JIT constants
> stored in `jit_State`, so it always be near enough to dispatch.
>
> Side note: you may check test-correctness of the last check with GC by
> changing the corresponding condition check on `GC_WHITES` in asm_obar to
> CC_NZ (like it will be treated for incorrect check). Be carefull, member
> that instructions are emitted from bottom to top!
>
> src/lj_emit_x86.h | 24 ++++-
> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
> 2 files changed, 123 insertions(+), 3 deletions(-)
> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua
>
<snipped>
> --
> 2.34.1
>
--
Best regards,
IM
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-05-25 6:19 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-22 8:27 [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma() Sergey Kaplun via Tarantool-patches
2023-03-23 9:36 ` Maxim Kokryashkin via Tarantool-patches
2023-03-24 13:05 ` Sergey Kaplun via Tarantool-patches
2023-03-24 14:42 ` Maxim Kokryashkin via Tarantool-patches
2023-04-03 9:02 ` Igor Munkin via Tarantool-patches
2023-05-25 6:16 ` Igor Munkin via Tarantool-patches
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox