Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma().
@ 2023-03-22  8:27 Sergey Kaplun via Tarantool-patches
  2023-03-23  9:36 ` Maxim Kokryashkin via Tarantool-patches
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-03-22  8:27 UTC (permalink / raw)
  To: Sergey Ostanevich, Maxim Kokryashkin; +Cc: tarantool-patches

From: Mike Pall <mike>

(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)

The accessing of memory address for some operation `emit_rma()` may be
encoded in one of the following ways:
 a. If the offset of the accessing address from the dispatch table
    (pinned to r14 that is not changed while trace execution) fits into
    32-bit, then encode this as an access to 32-bit displacement
    relative to r14.
 b. If the offset of the accessing address from the mcode (i.e. rip)
    fits into 32-bit, then encode this as an access to 32-bit
    displacement relative to rip (considering long mode specifics and
    `RID_RIP` hack).
 c. If the address doesn't fit into 32-bit one and we use `mov` or
    `movsd`, then encode 64-bit load from this address.
 d. Elsewhere, encode it as an access to 32-bit (the address should fit
    into 32-bit one) displacement (the only option for non-GC64 mode).

So, each instruction in GC64 mode differs from `mov` or `movsd` should
be encoded via the last option. But if we got a 64-bit address with a
big enough offset it can't be encoded and the assertion in `ptr2addr()`
will fail.

There are several cases, when `emit_rma()` is used with non `mov`
instruction:
* `IR_LDEXP` with `fld` instruction for loading constant
   number `TValue` by address.
* `IR_OBAR` with the corresponding `test` instruction on
  `marked` field of `GCobj`.
All these instructions require an additional register to store value by
address. We can't truly allocate a register here due to possibility to
break IR assembling which depends on specific register usage. So, we use
and restore r14 here for emitting.

Also, this patch removes `movsd` from condition from the `x86Op` type
check, as far as it never uses for the `emit_rma()` routine (see also
`emit_loadk64()` for details).

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#8069
---

Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
PR: https://github.com/tarantool/tarantool/pull/8477
Related issue: https://github.com/tarantool/tarantool/issues/8069

AFAICS, other places with `emit_rma()` usage are not related to the
patch as far as they take an offset for the address of JIT constants
stored in `jit_State`, so it always be near enough to dispatch.

Side note: you may check test-correctness of the last check with GC by
changing the corresponding condition check on `GC_WHITES` in asm_obar to
CC_NZ (like it will be treated for incorrect check). Be carefull, member
that instructions are emitted from bottom to top!

 src/lj_emit_x86.h                          |  24 ++++-
 test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
 2 files changed, 123 insertions(+), 3 deletions(-)
 create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua

diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h
index 6b58306b..b3dc4ea5 100644
--- a/src/lj_emit_x86.h
+++ b/src/lj_emit_x86.h
@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr)
     emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr));
   } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) {
     emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr));
-  } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) {
-    emit_rmro(as, xo, rr, rr, 0);
-    emit_loadu64(as, rr, (uintptr_t)addr);
+  } else if (!checki32((intptr_t)addr)) {
+    Reg ra = (rr & 15);
+    if (xo != XO_MOV) {
+      /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */
+      uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch;
+      uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0;
+      ra = RID_DISPATCH;
+      if (checku32(dispaddr)) {
+	emit_loadi(as, ra, (int32_t)dispaddr);
+      } else {  /* Full-size 64 bit load. */
+	MCode *p = as->mcp;
+	*(uint64_t *)(p-8) = dispaddr;
+	p[-9] = (MCode)(XI_MOVri+(ra&7));
+	p[-10] = 0x48 + ((ra>>3)&1);
+	p -= 10;
+	as->mcp = p;
+      }
+      if (xo == XO_GROUP3b) emit_i8(as, i8);
+    }
+    emit_rmro(as, xo, rr, ra, 0);
+    emit_loadu64(as, ra, (uintptr_t)addr);
   } else
 #endif
   {
diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua
new file mode 100644
index 00000000..faddfe83
--- /dev/null
+++ b/test/tarantool-tests/fix-emit-rma.test.lua
@@ -0,0 +1,102 @@
+local tap = require('tap')
+local test = tap.test('fix-emit-rma'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+  ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
+})
+
+-- Need to test 2 cases of `emit_rma()` particulary on x64:
+--   * `IR_LDEXP` with `fld` instruction for loading constant
+--      number `TValue` by address.
+--   * `IR_OBAR` with the corresponding `test` instruction on
+--     `marked` field of `GCobj`.
+-- Also, test correctness.
+test:plan(4)
+
+local ffi = require('ffi')
+
+collectgarbage()
+-- Chomp memory in currently allocated GC space.
+collectgarbage('stop')
+
+for _ = 1, 8 do
+  ffi.new('char[?]', 256 * 1024 * 1024)
+end
+
+jit.opt.start('hotloop=1')
+
+-- Test `IR_LDEXP`.
+
+-- Reproducer here is a little tricky.
+-- We need to generate a bunch of traces as far we reference an
+-- IR field (`TValue`) address in `emit_rma()`. The amount of
+-- traces is empirical. Usually, assert fails on ~33d iteration,
+-- so let's use 100 just to be sure.
+local test_marker
+for _ = 1, 100 do
+  test_marker = loadstring([[
+    local test_marker
+    for i = 1, 4 do
+      -- Avoid fold optimization, use `i` as the second argument.
+      -- Need some constant differs from 1 or 0 as the first
+      -- argument.
+      test_marker = math.ldexp(1.2, i)
+    end
+    return test_marker
+  ]])()
+end
+
+-- If we here, it means no assertion failed during emitting.
+test:ok(true, 'IR_LDEXP emit_rma')
+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result')
+
+-- Test `IR_OBAR`.
+
+-- First, create a closed upvalue.
+do
+  local uv -- luacheck: no unused
+  -- `IR_OBAR` is used for object write barrier on upvalues.
+  _G.change_uv = function(newv)
+    uv = newv
+  end
+end
+
+-- We need a constant value on trace to be referenced far enough
+-- from dispatch table. So we need to create a new function
+-- prototype with a constant string.
+-- This string should be long enough to be allocated with direct
+-- alloc far away from dispatch.
+local DEFAULT_MMAP_THRESHOLD = 128 * 1024
+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
+local func_with_trace = loadstring([[
+  for _ = 1, 4 do
+    change_uv(']] .. str .. [[')
+  end
+]])
+func_with_trace()
+
+-- If we here, it means no assertion failed during emitting.
+test:ok(true, 'IR_OBAR emit_rma')
+
+-- Now check the correctness.
+
+-- Set GC state to GCpause.
+collectgarbage()
+
+-- We want to wait for the situation, when upvalue is black,
+-- the string is gray. Both conditions are satisfied, when the
+-- corresponding `change_uv()` function is marked, for example.
+-- We don't know on what exactly step our upvalue is marked as
+-- black and execution of trace become dangerous, so just check it
+-- at each step.
+-- Don't need to do the full GC cycle step by step.
+local old_steps_atomic = misc.getmetrics().gc_steps_atomic
+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do
+  collectgarbage('step')
+  func_with_trace()
+end
+
+-- If we here, it means no assertion failed during `gc_mark()`,
+-- due to wrong call to `lj_gc_barrieruv()` on trace.
+test:ok(true, 'IR_OBAR emit_rma check correctness')
+
+os.exit(test:check() and 0 or 1)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Tarantool-patches]  [PATCH luajit] x64/LJ_GC64: Fix emit_rma().
  2023-03-22  8:27 [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma() Sergey Kaplun via Tarantool-patches
@ 2023-03-23  9:36 ` Maxim Kokryashkin via Tarantool-patches
  2023-03-24 13:05   ` Sergey Kaplun via Tarantool-patches
  2023-04-03  9:02 ` Igor Munkin via Tarantool-patches
  2023-05-25  6:16 ` Igor Munkin via Tarantool-patches
  2 siblings, 1 reply; 6+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-03-23  9:36 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 8768 bytes --]


Hi!
Thanks for the patch!
 
> 
>>From: Mike Pall <mike>
>>
>>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
>>
>>The accessing of memory address for some operation `emit_rma()` may be
>>encoded in one of the following ways:
>> a. If the offset of the accessing address from the dispatch table
>I suggest paraphrasing it the following way:
>| If the offset of the address being accessed from the dispatch table
> 
>Same for the similar sentences below.
>>    (pinned to r14 that is not changed while trace execution) fits into
>Typo: s/while/during
>>    32-bit, then encode this as an access to 32-bit displacement
>>    relative to r14.
>> b. If the offset of the accessing address from the mcode (i.e. rip)
>>    fits into 32-bit, then encode this as an access to 32-bit
>>    displacement relative to rip (considering long mode specifics and
>>    `RID_RIP` hack).
>> c. If the address doesn't fit into 32-bit one and we use `mov` or
>>    `movsd`, then encode 64-bit load from this address.
>> d. Elsewhere, encode it as an access to 32-bit (the address should fit
>>    into 32-bit one) displacement (the only option for non-GC64 mode).
>>
>>So, each instruction in GC64 mode differs from `mov` or `movsd` should
>Typo: s/differs/that differs/
>>be encoded via the last option. But if we got a 64-bit address with a
>Typo: s/got/get/
>>big enough offset it can't be encoded and the assertion in `ptr2addr()`
>Typo: s/offset/offset,/
>Typo: s/encoded/encoded,
>>will fail.
>Typo: s/will fail/fails/
>>
>>There are several cases, when `emit_rma()` is used with non `mov`
>Typo: s/with non-`mov`/with a non-`mov`/
>>instruction:
>>* `IR_LDEXP` with `fld` instruction for loading constant
>>   number `TValue` by address.
>>* `IR_OBAR` with the corresponding `test` instruction on
>>  `marked` field of `GCobj`.
>>All these instructions require an additional register to store value by
>>address. We can't truly allocate a register here due to possibility to
>>break IR assembling which depends on specific register usage. So, we use
>Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling,
>>and restore r14 here for emitting.
>>
>>Also, this patch removes `movsd` from condition from the `x86Op` type
>>check, as far as it never uses for the `emit_rma()` routine (see also
>Typo: s/uses/used
>>`emit_loadk64()` for details).
>>
>>Sergey Kaplun:
>>* added the description and the test for the problem
>>
>>Part of tarantool/tarantool#8069
>>---
>>
>>Branch:  https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
>>PR:  https://github.com/tarantool/tarantool/pull/8477
>>Related issue:  https://github.com/tarantool/tarantool/issues/8069
>>
>>AFAICS, other places with `emit_rma()` usage are not related to the
>>patch as far as they take an offset for the address of JIT constants
>>stored in `jit_State`, so it always be near enough to dispatch.
>>
>>Side note: you may check test-correctness of the last check with GC by
>>changing the corresponding condition check on `GC_WHITES` in asm_obar to
>>CC_NZ (like it will be treated for incorrect check). Be carefull, member
>>that instructions are emitted from bottom to top!
>>
>> src/lj_emit_x86.h | 24 ++++-
>> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
>> 2 files changed, 123 insertions(+), 3 deletions(-)
>> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua
>>
>>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h
>>index 6b58306b..b3dc4ea5 100644
>>--- a/src/lj_emit_x86.h
>>+++ b/src/lj_emit_x86.h
>>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr)
>>     emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr));
>>   } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) {
>>     emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr));
>>- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) {
>>- emit_rmro(as, xo, rr, rr, 0);
>>- emit_loadu64(as, rr, (uintptr_t)addr);
>>+ } else if (!checki32((intptr_t)addr)) {
>>+ Reg ra = (rr & 15);
>>+ if (xo != XO_MOV) {
>>+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */
>>+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch;
>>+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0;
>>+ ra = RID_DISPATCH;
>>+ if (checku32(dispaddr)) {
>>+ emit_loadi(as, ra, (int32_t)dispaddr);
>>+ } else { /* Full-size 64 bit load. */
>>+ MCode *p = as->mcp;
>>+ *(uint64_t *)(p-8) = dispaddr;
>>+ p[-9] = (MCode)(XI_MOVri+(ra&7));
>>+ p[-10] = 0x48 + ((ra>>3)&1);
>Why is it `0x48`?
>>+ p -= 10;
>>+ as->mcp = p;
>>+ }
>>+ if (xo == XO_GROUP3b) emit_i8(as, i8);
>>+ }
>>+ emit_rmro(as, xo, rr, ra, 0);
>>+ emit_loadu64(as, ra, (uintptr_t)addr);
>>   } else
>> #endif
>>   {
>>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua
>>new file mode 100644
>>index 00000000..faddfe83
>>--- /dev/null
>>+++ b/test/tarantool-tests/fix-emit-rma.test.lua
>>@@ -0,0 +1,102 @@
>>+local tap = require('tap')
>>+local test = tap.test('fix-emit-rma'):skipcond({
>>+ ['Test requires JIT enabled'] = not jit.status(),
>>+ ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'),
>>+})
>>+
>>+-- Need to test 2 cases of `emit_rma()` particulary on x64:
>>+-- * `IR_LDEXP` with `fld` instruction for loading constant
>>+-- number `TValue` by address.
>>+-- * `IR_OBAR` with the corresponding `test` instruction on
>>+-- `marked` field of `GCobj`.
>>+-- Also, test correctness.
>>+test:plan(4)
>>+
>>+local ffi = require('ffi')
>>+
>>+collectgarbage()
>>+-- Chomp memory in currently allocated GC space.
>>+collectgarbage('stop')
>>+
>>+for _ = 1, 8 do
>>+ ffi.new('char[?]', 256 * 1024 * 1024)
>>+end
>>+
>>+jit.opt.start('hotloop=1')
>>+
>>+-- Test `IR_LDEXP`.
>>+
>>+-- Reproducer here is a little tricky.
>>+-- We need to generate a bunch of traces as far we reference an
>>+-- IR field (`TValue`) address in `emit_rma()`. The amount of
>>+-- traces is empirical. Usually, assert fails on ~33d iteration,
>>+-- so let's use 100 just to be sure.
>Is there any way to make it more deterministic?
>>+local test_marker
>>+for _ = 1, 100 do
>>+ test_marker = loadstring([[
>>+ local test_marker
>>+ for i = 1, 4 do
>>+ -- Avoid fold optimization, use `i` as the second argument.
>>+ -- Need some constant differs from 1 or 0 as the first
>>+ -- argument.
>>+ test_marker = math.ldexp(1.2, i)
>>+ end
>>+ return test_marker
>>+ ]])()
>>+end
>>+
>>+-- If we here, it means no assertion failed during emitting.
>Typo: s/we/we are/
>>+test:ok(true, 'IR_LDEXP emit_rma')
>>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result')
>>+
>>+-- Test `IR_OBAR`.
>>+
>>+-- First, create a closed upvalue.
>>+do
>>+ local uv -- luacheck: no unused
>>+ -- `IR_OBAR` is used for object write barrier on upvalues.
>>+ _G.change_uv = function(newv)
>>+ uv = newv
>>+ end
>>+end
>>+
>>+-- We need a constant value on trace to be referenced far enough
>>+-- from dispatch table. So we need to create a new function
>>+-- prototype with a constant string.
>>+-- This string should be long enough to be allocated with direct
>>+-- alloc far away from dispatch.
>>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024
>Why is that amount sufficient? Link to the source file would be enough.
>>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
>>+local func_with_trace = loadstring([[
>>+ for _ = 1, 4 do
>>+ change_uv(']] .. str .. [[')
>>+ end
>>+]])
>>+func_with_trace()
>>+
>>+-- If we here, it means no assertion failed during emitting.
>Typo: s/we here/we are here/
>>+test:ok(true, 'IR_OBAR emit_rma')
>>+
>>+-- Now check the correctness.
>>+
>>+-- Set GC state to GCpause.
>>+collectgarbage()
>>+
>>+-- We want to wait for the situation, when upvalue is black,
>>+-- the string is gray. Both conditions are satisfied, when the
>>+-- corresponding `change_uv()` function is marked, for example.
>>+-- We don't know on what exactly step our upvalue is marked as
>Typo: s/exactly/exact/
>>+-- black and execution of trace become dangerous, so just check it
>>+-- at each step.
>>+-- Don't need to do the full GC cycle step by step.
>>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic
>>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do
>>+ collectgarbage('step')
>>+ func_with_trace()
>>+end
>>+
>>+-- If we here, it means no assertion failed during `gc_mark()`,
>Typo: s/we here/we are here/
>>+-- due to wrong call to `lj_gc_barrieruv()` on trace.
>>+test:ok(true, 'IR_OBAR emit_rma check correctness')
>>+
>>+os.exit(test:check() and 0 or 1)
>>--
>>2.34.1
>--
>Best regards,
>Maxim Kokryashkin
> 

[-- Attachment #2: Type: text/html, Size: 13053 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma().
  2023-03-23  9:36 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-03-24 13:05   ` Sergey Kaplun via Tarantool-patches
  2023-03-24 14:42     ` Maxim Kokryashkin via Tarantool-patches
  0 siblings, 1 reply; 6+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-03-24 13:05 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the review!
I've updated the commit message and comments, force-pushed the branch.

On 23.03.23, Maxim Kokryashkin wrote:
> 
> Hi!
> Thanks for the patch!
>  
> > 
> >>From: Mike Pall <mike>
> >>
> >>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
> >>
> >>The accessing of memory address for some operation `emit_rma()` may be
> >>encoded in one of the following ways:
> >> a. If the offset of the accessing address from the dispatch table
> >I suggest paraphrasing it the following way:
> >| If the offset of the address being accessed from the dispatch table
> > 
> >Same for the similar sentences below.
> >>    (pinned to r14 that is not changed while trace execution) fits into
> >Typo: s/while/during
> >>    32-bit, then encode this as an access to 32-bit displacement
> >>    relative to r14.
> >> b. If the offset of the accessing address from the mcode (i.e. rip)
> >>    fits into 32-bit, then encode this as an access to 32-bit
> >>    displacement relative to rip (considering long mode specifics and
> >>    `RID_RIP` hack).
> >> c. If the address doesn't fit into 32-bit one and we use `mov` or
> >>    `movsd`, then encode 64-bit load from this address.
> >> d. Elsewhere, encode it as an access to 32-bit (the address should fit
> >>    into 32-bit one) displacement (the only option for non-GC64 mode).
> >>
> >>So, each instruction in GC64 mode differs from `mov` or `movsd` should
> >Typo: s/differs/that differs/
> >>be encoded via the last option. But if we got a 64-bit address with a
> >Typo: s/got/get/
> >>big enough offset it can't be encoded and the assertion in `ptr2addr()`
> >Typo: s/offset/offset,/
> >Typo: s/encoded/encoded,
> >>will fail.
> >Typo: s/will fail/fails/
> >>
> >>There are several cases, when `emit_rma()` is used with non `mov`
> >Typo: s/with non-`mov`/with a non-`mov`/
> >>instruction:
> >>* `IR_LDEXP` with `fld` instruction for loading constant
> >>   number `TValue` by address.
> >>* `IR_OBAR` with the corresponding `test` instruction on
> >>  `marked` field of `GCobj`.
> >>All these instructions require an additional register to store value by
> >>address. We can't truly allocate a register here due to possibility to
> >>break IR assembling which depends on specific register usage. So, we use
> >Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling,
> >>and restore r14 here for emitting.
> >>
> >>Also, this patch removes `movsd` from condition from the `x86Op` type
> >>check, as far as it never uses for the `emit_rma()` routine (see also
> >Typo: s/uses/used
> >>`emit_loadk64()` for details).
> >>
> >>Sergey Kaplun:
> >>* added the description and the test for the problem
> >>
> >>Part of tarantool/tarantool#8069
> >>---

Fixed the commit message; new commit message is the following:

x64/LJ_GC64: Fix emit_rma().

(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)

```
The accessing of memory address for some operation `emit_rma()` may be
encoded in one of the following ways:
 a. If the offset of the address being accessed from the dispatch table
    (pinned to r14 that is not changed during trace execution) fits into
    32-bit, then encode this as an access to 32-bit displacement
    relative to r14.
 b. If the offset of the address being accessed from the mcode (i.e.
    rip) fits into 32-bit, then encode this as an access to 32-bit
    displacement relative to rip (considering long mode specifics and
    `RID_RIP` hack).
 c. If the address doesn't fit into 32-bit one and we use `mov` or
    `movsd`, then encode 64-bit load from this address.
 d. Elsewhere, encode it as an access to 32-bit (the address should fit
    into 32-bit one) displacement (the only option for non-GC64 mode).

So, each instruction in GC64 mode that differs from `mov` or `movsd`
should be encoded via the last option. But if we get a 64-bit address
with a big enough offset, it can't be encoded, so the assertion in
`ptr2addr()` fails.

There are several cases, when `emit_rma()` is used with a non-`mov`
instruction:
* `IR_LDEXP` with `fld` instruction for loading constant
   number `TValue` by address.
* `IR_OBAR` with the corresponding `test` instruction on
  `marked` field of `GCobj`.
All these instructions require an additional register to store value by
address. We can't truly allocate a register here due to the possibility
to break IR assembling which depends on specific register usage. So, we
use and restore r14 here for emitting.

Also, this patch removes `movsd` from condition from the `x86Op` type
check, as far as it is never used for the `emit_rma()` routine (see also
`emit_loadk64()` for details).

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#8069
```

> >>
> >>Branch:  https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
> >>PR:  https://github.com/tarantool/tarantool/pull/8477
> >>Related issue:  https://github.com/tarantool/tarantool/issues/8069
> >>
> >>AFAICS, other places with `emit_rma()` usage are not related to the
> >>patch as far as they take an offset for the address of JIT constants
> >>stored in `jit_State`, so it always be near enough to dispatch.
> >>
> >>Side note: you may check test-correctness of the last check with GC by
> >>changing the corresponding condition check on `GC_WHITES` in asm_obar to
> >>CC_NZ (like it will be treated for incorrect check). Be carefull, member
> >>that instructions are emitted from bottom to top!
> >>
> >> src/lj_emit_x86.h | 24 ++++-
> >> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
> >> 2 files changed, 123 insertions(+), 3 deletions(-)
> >> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua
> >>
> >>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h
> >>index 6b58306b..b3dc4ea5 100644
> >>--- a/src/lj_emit_x86.h
> >>+++ b/src/lj_emit_x86.h
> >>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr)
> >>     emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr));
> >>   } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) {
> >>     emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr));
> >>- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) {
> >>- emit_rmro(as, xo, rr, rr, 0);
> >>- emit_loadu64(as, rr, (uintptr_t)addr);
> >>+ } else if (!checki32((intptr_t)addr)) {
> >>+ Reg ra = (rr & 15);
> >>+ if (xo != XO_MOV) {
> >>+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */
> >>+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch;
> >>+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0;
> >>+ ra = RID_DISPATCH;
> >>+ if (checku32(dispaddr)) {
> >>+ emit_loadi(as, ra, (int32_t)dispaddr);
> >>+ } else { /* Full-size 64 bit load. */
> >>+ MCode *p = as->mcp;
> >>+ *(uint64_t *)(p-8) = dispaddr;
> >>+ p[-9] = (MCode)(XI_MOVri+(ra&7));
> >>+ p[-10] = 0x48 + ((ra>>3)&1);
> >Why is it `0x48`?

IINM, this is  REX prefix for the instruction [1], Sergos should correct
me if I wrong.

| perl -E 'say sprintf "%08b", 0x48'
| 01001000
  ****WRXB
****: Fixed bit pattern.
W bit is set to mark that a 64-bit operand size is used.

This is the same approach as for `emit_loadu64()`, so I didn't left any comment.

> >>+ p -= 10;
> >>+ as->mcp = p;
> >>+ }
> >>+ if (xo == XO_GROUP3b) emit_i8(as, i8);
> >>+ }
> >>+ emit_rmro(as, xo, rr, ra, 0);
> >>+ emit_loadu64(as, ra, (uintptr_t)addr);
> >>   } else
> >> #endif
> >>   {
> >>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua
> >>new file mode 100644
> >>index 00000000..faddfe83
> >>--- /dev/null
> >>+++ b/test/tarantool-tests/fix-emit-rma.test.lua

<snipped>

> >>+-- Test `IR_LDEXP`.
> >>+
> >>+-- Reproducer here is a little tricky.
> >>+-- We need to generate a bunch of traces as far we reference an
> >>+-- IR field (`TValue`) address in `emit_rma()`. The amount of
> >>+-- traces is empirical. Usually, assert fails on ~33d iteration,
> >>+-- so let's use 100 just to be sure.
> >Is there any way to make it more deterministic?

I suppose no, due to undetermenistic memory allocation. So, just
use big enough number.

> >>+local test_marker
> >>+for _ = 1, 100 do
> >>+ test_marker = loadstring([[
> >>+ local test_marker
> >>+ for i = 1, 4 do
> >>+ -- Avoid fold optimization, use `i` as the second argument.
> >>+ -- Need some constant differs from 1 or 0 as the first
> >>+ -- argument.
> >>+ test_marker = math.ldexp(1.2, i)
> >>+ end
> >>+ return test_marker
> >>+ ]])()
> >>+end
> >>+
> >>+-- If we here, it means no assertion failed during emitting.
> >Typo: s/we/we are/

Fixed, thanks!

> >>+test:ok(true, 'IR_LDEXP emit_rma')
> >>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result')
> >>+
> >>+-- Test `IR_OBAR`.
> >>+
> >>+-- First, create a closed upvalue.
> >>+do
> >>+ local uv -- luacheck: no unused
> >>+ -- `IR_OBAR` is used for object write barrier on upvalues.
> >>+ _G.change_uv = function(newv)
> >>+ uv = newv
> >>+ end
> >>+end
> >>+
> >>+-- We need a constant value on trace to be referenced far enough
> >>+-- from dispatch table. So we need to create a new function
> >>+-- prototype with a constant string.
> >>+-- This string should be long enough to be allocated with direct
> >>+-- alloc far away from dispatch.
> >>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024
> >Why is that amount sufficient? Link to the source file would be enough.

Added the following comment.
```
@@ -64,7 +64,8 @@ end
 -- from dispatch table. So we need to create a new function
 -- prototype with a constant string.
 -- This string should be long enough to be allocated with direct
--- alloc far away from dispatch.
+-- alloc (not fitting in free chunks) far away from dispatch.
+-- See <src/lj_alloc.c> for details.
 local DEFAULT_MMAP_THRESHOLD = 128 * 1024
 local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
 local func_with_trace = loadstring([[
@@ -74,7 +75,7 @@ local func_with_trace = loadstring([[
 ]])
```

> >>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
> >>+local func_with_trace = loadstring([[
> >>+ for _ = 1, 4 do
> >>+ change_uv(']] .. str .. [[')
> >>+ end
> >>+]])
> >>+func_with_trace()
> >>+
> >>+-- If we here, it means no assertion failed during emitting.
> >Typo: s/we here/we are here/

Fixed, thanks!

> >>+test:ok(true, 'IR_OBAR emit_rma')
> >>+
> >>+-- Now check the correctness.
> >>+
> >>+-- Set GC state to GCpause.
> >>+collectgarbage()
> >>+
> >>+-- We want to wait for the situation, when upvalue is black,
> >>+-- the string is gray. Both conditions are satisfied, when the
> >>+-- corresponding `change_uv()` function is marked, for example.
> >>+-- We don't know on what exactly step our upvalue is marked as
> >Typo: s/exactly/exact/

Fixed, thanks!

> >>+-- black and execution of trace become dangerous, so just check it
> >>+-- at each step.
> >>+-- Don't need to do the full GC cycle step by step.
> >>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic
> >>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do
> >>+ collectgarbage('step')
> >>+ func_with_trace()
> >>+end
> >>+
> >>+-- If we here, it means no assertion failed during `gc_mark()`,
> >Typo: s/we here/we are here/

Fixed, thanks!

> >>+-- due to wrong call to `lj_gc_barrieruv()` on trace.
> >>+test:ok(true, 'IR_OBAR emit_rma check correctness')
> >>+
> >>+os.exit(test:check() and 0 or 1)
> >>--
> >>2.34.1
> >--
> >Best regards,
> >Maxim Kokryashkin
> > 

[1]: https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Tarantool-patches]  [PATCH luajit] x64/LJ_GC64: Fix emit_rma().
  2023-03-24 13:05   ` Sergey Kaplun via Tarantool-patches
@ 2023-03-24 14:42     ` Maxim Kokryashkin via Tarantool-patches
  0 siblings, 0 replies; 6+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-03-24 14:42 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 12477 bytes --]


Hi!
Thanks for the fixes!
LGTM.
--
Best regards,
Maxim Kokryashkin
 
 
> 
>>Hi, Maxim!
>>Thanks for the review!
>>I've updated the commit message and comments, force-pushed the branch.
>>
>>On 23.03.23, Maxim Kokryashkin wrote:
>>>
>>> Hi!
>>> Thanks for the patch!
>>>  
>>> > 
>>> >>From: Mike Pall <mike>
>>> >>
>>> >>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
>>> >>
>>> >>The accessing of memory address for some operation `emit_rma()` may be
>>> >>encoded in one of the following ways:
>>> >> a. If the offset of the accessing address from the dispatch table
>>> >I suggest paraphrasing it the following way:
>>> >| If the offset of the address being accessed from the dispatch table
>>> > 
>>> >Same for the similar sentences below.
>>> >>    (pinned to r14 that is not changed while trace execution) fits into
>>> >Typo: s/while/during
>>> >>    32-bit, then encode this as an access to 32-bit displacement
>>> >>    relative to r14.
>>> >> b. If the offset of the accessing address from the mcode (i.e. rip)
>>> >>    fits into 32-bit, then encode this as an access to 32-bit
>>> >>    displacement relative to rip (considering long mode specifics and
>>> >>    `RID_RIP` hack).
>>> >> c. If the address doesn't fit into 32-bit one and we use `mov` or
>>> >>    `movsd`, then encode 64-bit load from this address.
>>> >> d. Elsewhere, encode it as an access to 32-bit (the address should fit
>>> >>    into 32-bit one) displacement (the only option for non-GC64 mode).
>>> >>
>>> >>So, each instruction in GC64 mode differs from `mov` or `movsd` should
>>> >Typo: s/differs/that differs/
>>> >>be encoded via the last option. But if we got a 64-bit address with a
>>> >Typo: s/got/get/
>>> >>big enough offset it can't be encoded and the assertion in `ptr2addr()`
>>> >Typo: s/offset/offset,/
>>> >Typo: s/encoded/encoded,
>>> >>will fail.
>>> >Typo: s/will fail/fails/
>>> >>
>>> >>There are several cases, when `emit_rma()` is used with non `mov`
>>> >Typo: s/with non-`mov`/with a non-`mov`/
>>> >>instruction:
>>> >>* `IR_LDEXP` with `fld` instruction for loading constant
>>> >>   number `TValue` by address.
>>> >>* `IR_OBAR` with the corresponding `test` instruction on
>>> >>  `marked` field of `GCobj`.
>>> >>All these instructions require an additional register to store value by
>>> >>address. We can't truly allocate a register here due to possibility to
>>> >>break IR assembling which depends on specific register usage. So, we use
>>> >Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling,
>>> >>and restore r14 here for emitting.
>>> >>
>>> >>Also, this patch removes `movsd` from condition from the `x86Op` type
>>> >>check, as far as it never uses for the `emit_rma()` routine (see also
>>> >Typo: s/uses/used
>>> >>`emit_loadk64()` for details).
>>> >>
>>> >>Sergey Kaplun:
>>> >>* added the description and the test for the problem
>>> >>
>>> >>Part of tarantool/tarantool#8069
>>> >>---
>>
>>Fixed the commit message; new commit message is the following:
>>
>>x64/LJ_GC64: Fix emit_rma().
>>
>>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
>>
>>```
>>The accessing of memory address for some operation `emit_rma()` may be
>>encoded in one of the following ways:
>> a. If the offset of the address being accessed from the dispatch table
>>    (pinned to r14 that is not changed during trace execution) fits into
>>    32-bit, then encode this as an access to 32-bit displacement
>>    relative to r14.
>> b. If the offset of the address being accessed from the mcode (i.e.
>>    rip) fits into 32-bit, then encode this as an access to 32-bit
>>    displacement relative to rip (considering long mode specifics and
>>    `RID_RIP` hack).
>> c. If the address doesn't fit into 32-bit one and we use `mov` or
>>    `movsd`, then encode 64-bit load from this address.
>> d. Elsewhere, encode it as an access to 32-bit (the address should fit
>>    into 32-bit one) displacement (the only option for non-GC64 mode).
>>
>>So, each instruction in GC64 mode that differs from `mov` or `movsd`
>>should be encoded via the last option. But if we get a 64-bit address
>>with a big enough offset, it can't be encoded, so the assertion in
>>`ptr2addr()` fails.
>>
>>There are several cases, when `emit_rma()` is used with a non-`mov`
>>instruction:
>>* `IR_LDEXP` with `fld` instruction for loading constant
>>   number `TValue` by address.
>>* `IR_OBAR` with the corresponding `test` instruction on
>>  `marked` field of `GCobj`.
>>All these instructions require an additional register to store value by
>>address. We can't truly allocate a register here due to the possibility
>>to break IR assembling which depends on specific register usage. So, we
>>use and restore r14 here for emitting.
>>
>>Also, this patch removes `movsd` from condition from the `x86Op` type
>>check, as far as it is never used for the `emit_rma()` routine (see also
>>`emit_loadk64()` for details).
>>
>>Sergey Kaplun:
>>* added the description and the test for the problem
>>
>>Part of tarantool/tarantool#8069
>>```
>>
>>> >>
>>> >>Branch:  https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
>>> >>PR:  https://github.com/tarantool/tarantool/pull/8477
>>> >>Related issue:  https://github.com/tarantool/tarantool/issues/8069
>>> >>
>>> >>AFAICS, other places with `emit_rma()` usage are not related to the
>>> >>patch as far as they take an offset for the address of JIT constants
>>> >>stored in `jit_State`, so it always be near enough to dispatch.
>>> >>
>>> >>Side note: you may check test-correctness of the last check with GC by
>>> >>changing the corresponding condition check on `GC_WHITES` in asm_obar to
>>> >>CC_NZ (like it will be treated for incorrect check). Be carefull, member
>>> >>that instructions are emitted from bottom to top!
>>> >>
>>> >> src/lj_emit_x86.h | 24 ++++-
>>> >> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
>>> >> 2 files changed, 123 insertions(+), 3 deletions(-)
>>> >> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua
>>> >>
>>> >>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h
>>> >>index 6b58306b..b3dc4ea5 100644
>>> >>--- a/src/lj_emit_x86.h
>>> >>+++ b/src/lj_emit_x86.h
>>> >>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr)
>>> >>     emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr));
>>> >>   } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) {
>>> >>     emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr));
>>> >>- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) {
>>> >>- emit_rmro(as, xo, rr, rr, 0);
>>> >>- emit_loadu64(as, rr, (uintptr_t)addr);
>>> >>+ } else if (!checki32((intptr_t)addr)) {
>>> >>+ Reg ra = (rr & 15);
>>> >>+ if (xo != XO_MOV) {
>>> >>+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */
>>> >>+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch;
>>> >>+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0;
>>> >>+ ra = RID_DISPATCH;
>>> >>+ if (checku32(dispaddr)) {
>>> >>+ emit_loadi(as, ra, (int32_t)dispaddr);
>>> >>+ } else { /* Full-size 64 bit load. */
>>> >>+ MCode *p = as->mcp;
>>> >>+ *(uint64_t *)(p-8) = dispaddr;
>>> >>+ p[-9] = (MCode)(XI_MOVri+(ra&7));
>>> >>+ p[-10] = 0x48 + ((ra>>3)&1);
>>> >Why is it `0x48`?
>>
>>IINM, this is REX prefix for the instruction [1], Sergos should correct
>>me if I wrong.
>>
>>| perl -E 'say sprintf "%08b", 0x48'
>>| 01001000
>>  ****WRXB
>>****: Fixed bit pattern.
>>W bit is set to mark that a 64-bit operand size is used.
>>
>>This is the same approach as for `emit_loadu64()`, so I didn't left any comment.
>>
>>> >>+ p -= 10;
>>> >>+ as->mcp = p;
>>> >>+ }
>>> >>+ if (xo == XO_GROUP3b) emit_i8(as, i8);
>>> >>+ }
>>> >>+ emit_rmro(as, xo, rr, ra, 0);
>>> >>+ emit_loadu64(as, ra, (uintptr_t)addr);
>>> >>   } else
>>> >> #endif
>>> >>   {
>>> >>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua
>>> >>new file mode 100644
>>> >>index 00000000..faddfe83
>>> >>--- /dev/null
>>> >>+++ b/test/tarantool-tests/fix-emit-rma.test.lua
>>
>><snipped>
>>
>>> >>+-- Test `IR_LDEXP`.
>>> >>+
>>> >>+-- Reproducer here is a little tricky.
>>> >>+-- We need to generate a bunch of traces as far we reference an
>>> >>+-- IR field (`TValue`) address in `emit_rma()`. The amount of
>>> >>+-- traces is empirical. Usually, assert fails on ~33d iteration,
>>> >>+-- so let's use 100 just to be sure.
>>> >Is there any way to make it more deterministic?
>>
>>I suppose no, due to undetermenistic memory allocation. So, just
>>use big enough number.
>>
>>> >>+local test_marker
>>> >>+for _ = 1, 100 do
>>> >>+ test_marker = loadstring([[
>>> >>+ local test_marker
>>> >>+ for i = 1, 4 do
>>> >>+ -- Avoid fold optimization, use `i` as the second argument.
>>> >>+ -- Need some constant differs from 1 or 0 as the first
>>> >>+ -- argument.
>>> >>+ test_marker = math.ldexp(1.2, i)
>>> >>+ end
>>> >>+ return test_marker
>>> >>+ ]])()
>>> >>+end
>>> >>+
>>> >>+-- If we here, it means no assertion failed during emitting.
>>> >Typo: s/we/we are/
>>
>>Fixed, thanks!
>>
>>> >>+test:ok(true, 'IR_LDEXP emit_rma')
>>> >>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result')
>>> >>+
>>> >>+-- Test `IR_OBAR`.
>>> >>+
>>> >>+-- First, create a closed upvalue.
>>> >>+do
>>> >>+ local uv -- luacheck: no unused
>>> >>+ -- `IR_OBAR` is used for object write barrier on upvalues.
>>> >>+ _G.change_uv = function(newv)
>>> >>+ uv = newv
>>> >>+ end
>>> >>+end
>>> >>+
>>> >>+-- We need a constant value on trace to be referenced far enough
>>> >>+-- from dispatch table. So we need to create a new function
>>> >>+-- prototype with a constant string.
>>> >>+-- This string should be long enough to be allocated with direct
>>> >>+-- alloc far away from dispatch.
>>> >>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024
>>> >Why is that amount sufficient? Link to the source file would be enough.
>>
>>Added the following comment.
>>```
>>@@ -64,7 +64,8 @@ end
>> -- from dispatch table. So we need to create a new function
>> -- prototype with a constant string.
>> -- This string should be long enough to be allocated with direct
>>--- alloc far away from dispatch.
>>+-- alloc (not fitting in free chunks) far away from dispatch.
>>+-- See <src/lj_alloc.c> for details.
>> local DEFAULT_MMAP_THRESHOLD = 128 * 1024
>> local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
>> local func_with_trace = loadstring([[
>>@@ -74,7 +75,7 @@ local func_with_trace = loadstring([[
>> ]])
>>```
>>
>>> >>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD)
>>> >>+local func_with_trace = loadstring([[
>>> >>+ for _ = 1, 4 do
>>> >>+ change_uv(']] .. str .. [[')
>>> >>+ end
>>> >>+]])
>>> >>+func_with_trace()
>>> >>+
>>> >>+-- If we here, it means no assertion failed during emitting.
>>> >Typo: s/we here/we are here/
>>
>>Fixed, thanks!
>>
>>> >>+test:ok(true, 'IR_OBAR emit_rma')
>>> >>+
>>> >>+-- Now check the correctness.
>>> >>+
>>> >>+-- Set GC state to GCpause.
>>> >>+collectgarbage()
>>> >>+
>>> >>+-- We want to wait for the situation, when upvalue is black,
>>> >>+-- the string is gray. Both conditions are satisfied, when the
>>> >>+-- corresponding `change_uv()` function is marked, for example.
>>> >>+-- We don't know on what exactly step our upvalue is marked as
>>> >Typo: s/exactly/exact/
>>
>>Fixed, thanks!
>>
>>> >>+-- black and execution of trace become dangerous, so just check it
>>> >>+-- at each step.
>>> >>+-- Don't need to do the full GC cycle step by step.
>>> >>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic
>>> >>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do
>>> >>+ collectgarbage('step')
>>> >>+ func_with_trace()
>>> >>+end
>>> >>+
>>> >>+-- If we here, it means no assertion failed during `gc_mark()`,
>>> >Typo: s/we here/we are here/
>>
>>Fixed, thanks!
>>
>>> >>+-- due to wrong call to `lj_gc_barrieruv()` on trace.
>>> >>+test:ok(true, 'IR_OBAR emit_rma check correctness')
>>> >>+
>>> >>+os.exit(test:check() and 0 or 1)
>>> >>--
>>> >>2.34.1
>>> >--
>>> >Best regards,
>>> >Maxim Kokryashkin
>>> > 
>>
>>[1]:  https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix
>>
>>--
>>Best regards,
>>Sergey Kaplun
> 

[-- Attachment #2: Type: text/html, Size: 16034 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma().
  2023-03-22  8:27 [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma() Sergey Kaplun via Tarantool-patches
  2023-03-23  9:36 ` Maxim Kokryashkin via Tarantool-patches
@ 2023-04-03  9:02 ` Igor Munkin via Tarantool-patches
  2023-05-25  6:16 ` Igor Munkin via Tarantool-patches
  2 siblings, 0 replies; 6+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2023-04-03  9:02 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

Thanks for the patch! LGTM, considering the fixes for Max comments.

On 22.03.23, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> (cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
> 
> The accessing of memory address for some operation `emit_rma()` may be
> encoded in one of the following ways:
>  a. If the offset of the accessing address from the dispatch table
>     (pinned to r14 that is not changed while trace execution) fits into
>     32-bit, then encode this as an access to 32-bit displacement
>     relative to r14.
>  b. If the offset of the accessing address from the mcode (i.e. rip)
>     fits into 32-bit, then encode this as an access to 32-bit
>     displacement relative to rip (considering long mode specifics and
>     `RID_RIP` hack).
>  c. If the address doesn't fit into 32-bit one and we use `mov` or
>     `movsd`, then encode 64-bit load from this address.
>  d. Elsewhere, encode it as an access to 32-bit (the address should fit
>     into 32-bit one) displacement (the only option for non-GC64 mode).
> 
> So, each instruction in GC64 mode differs from `mov` or `movsd` should
> be encoded via the last option. But if we got a 64-bit address with a
> big enough offset it can't be encoded and the assertion in `ptr2addr()`
> will fail.
> 
> There are several cases, when `emit_rma()` is used with non `mov`
> instruction:
> * `IR_LDEXP` with `fld` instruction for loading constant
>    number `TValue` by address.
> * `IR_OBAR` with the corresponding `test` instruction on
>   `marked` field of `GCobj`.
> All these instructions require an additional register to store value by
> address. We can't truly allocate a register here due to possibility to
> break IR assembling which depends on specific register usage. So, we use
> and restore r14 here for emitting.
> 
> Also, this patch removes `movsd` from condition from the `x86Op` type
> check, as far as it never uses for the `emit_rma()` routine (see also
> `emit_loadk64()` for details).
> 
> Sergey Kaplun:
> * added the description and the test for the problem
> 
> Part of tarantool/tarantool#8069
> ---
> 
> Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
> PR: https://github.com/tarantool/tarantool/pull/8477
> Related issue: https://github.com/tarantool/tarantool/issues/8069
> 
> AFAICS, other places with `emit_rma()` usage are not related to the
> patch as far as they take an offset for the address of JIT constants
> stored in `jit_State`, so it always be near enough to dispatch.
> 
> Side note: you may check test-correctness of the last check with GC by
> changing the corresponding condition check on `GC_WHITES` in asm_obar to
> CC_NZ (like it will be treated for incorrect check). Be carefull, member
> that instructions are emitted from bottom to top!
> 
>  src/lj_emit_x86.h                          |  24 ++++-
>  test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
>  2 files changed, 123 insertions(+), 3 deletions(-)
>  create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua
> 

<snipped>

> -- 
> 2.34.1
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma().
  2023-03-22  8:27 [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma() Sergey Kaplun via Tarantool-patches
  2023-03-23  9:36 ` Maxim Kokryashkin via Tarantool-patches
  2023-04-03  9:02 ` Igor Munkin via Tarantool-patches
@ 2023-05-25  6:16 ` Igor Munkin via Tarantool-patches
  2 siblings, 0 replies; 6+ messages in thread
From: Igor Munkin via Tarantool-patches @ 2023-05-25  6:16 UTC (permalink / raw)
  To: Sergey Kaplun; +Cc: tarantool-patches

Sergey,

I've checked the patchset into all long-term branches in
tarantool/luajit and bumped a new version in master, 2.11 and 2.10.

On 22.03.23, Sergey Kaplun via Tarantool-patches wrote:
> From: Mike Pall <mike>
> 
> (cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3)
> 
> The accessing of memory address for some operation `emit_rma()` may be
> encoded in one of the following ways:
>  a. If the offset of the accessing address from the dispatch table
>     (pinned to r14 that is not changed while trace execution) fits into
>     32-bit, then encode this as an access to 32-bit displacement
>     relative to r14.
>  b. If the offset of the accessing address from the mcode (i.e. rip)
>     fits into 32-bit, then encode this as an access to 32-bit
>     displacement relative to rip (considering long mode specifics and
>     `RID_RIP` hack).
>  c. If the address doesn't fit into 32-bit one and we use `mov` or
>     `movsd`, then encode 64-bit load from this address.
>  d. Elsewhere, encode it as an access to 32-bit (the address should fit
>     into 32-bit one) displacement (the only option for non-GC64 mode).
> 
> So, each instruction in GC64 mode differs from `mov` or `movsd` should
> be encoded via the last option. But if we got a 64-bit address with a
> big enough offset it can't be encoded and the assertion in `ptr2addr()`
> will fail.
> 
> There are several cases, when `emit_rma()` is used with non `mov`
> instruction:
> * `IR_LDEXP` with `fld` instruction for loading constant
>    number `TValue` by address.
> * `IR_OBAR` with the corresponding `test` instruction on
>   `marked` field of `GCobj`.
> All these instructions require an additional register to store value by
> address. We can't truly allocate a register here due to possibility to
> break IR assembling which depends on specific register usage. So, we use
> and restore r14 here for emitting.
> 
> Also, this patch removes `movsd` from condition from the `x86Op` type
> check, as far as it never uses for the `emit_rma()` routine (see also
> `emit_loadk64()` for details).
> 
> Sergey Kaplun:
> * added the description and the test for the problem
> 
> Part of tarantool/tarantool#8069
> ---
> 
> Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma
> PR: https://github.com/tarantool/tarantool/pull/8477
> Related issue: https://github.com/tarantool/tarantool/issues/8069
> 
> AFAICS, other places with `emit_rma()` usage are not related to the
> patch as far as they take an offset for the address of JIT constants
> stored in `jit_State`, so it always be near enough to dispatch.
> 
> Side note: you may check test-correctness of the last check with GC by
> changing the corresponding condition check on `GC_WHITES` in asm_obar to
> CC_NZ (like it will be treated for incorrect check). Be carefull, member
> that instructions are emitted from bottom to top!
> 
>  src/lj_emit_x86.h                          |  24 ++++-
>  test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++
>  2 files changed, 123 insertions(+), 3 deletions(-)
>  create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua
> 

<snipped>

> -- 
> 2.34.1
> 

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-05-25  6:19 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-22  8:27 [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma() Sergey Kaplun via Tarantool-patches
2023-03-23  9:36 ` Maxim Kokryashkin via Tarantool-patches
2023-03-24 13:05   ` Sergey Kaplun via Tarantool-patches
2023-03-24 14:42     ` Maxim Kokryashkin via Tarantool-patches
2023-04-03  9:02 ` Igor Munkin via Tarantool-patches
2023-05-25  6:16 ` Igor Munkin via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox