* [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma(). @ 2023-03-22 8:27 Sergey Kaplun via Tarantool-patches 2023-03-23 9:36 ` Maxim Kokryashkin via Tarantool-patches ` (2 more replies) 0 siblings, 3 replies; 6+ messages in thread From: Sergey Kaplun via Tarantool-patches @ 2023-03-22 8:27 UTC (permalink / raw) To: Sergey Ostanevich, Maxim Kokryashkin; +Cc: tarantool-patches From: Mike Pall <mike> (cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3) The accessing of memory address for some operation `emit_rma()` may be encoded in one of the following ways: a. If the offset of the accessing address from the dispatch table (pinned to r14 that is not changed while trace execution) fits into 32-bit, then encode this as an access to 32-bit displacement relative to r14. b. If the offset of the accessing address from the mcode (i.e. rip) fits into 32-bit, then encode this as an access to 32-bit displacement relative to rip (considering long mode specifics and `RID_RIP` hack). c. If the address doesn't fit into 32-bit one and we use `mov` or `movsd`, then encode 64-bit load from this address. d. Elsewhere, encode it as an access to 32-bit (the address should fit into 32-bit one) displacement (the only option for non-GC64 mode). So, each instruction in GC64 mode differs from `mov` or `movsd` should be encoded via the last option. But if we got a 64-bit address with a big enough offset it can't be encoded and the assertion in `ptr2addr()` will fail. There are several cases, when `emit_rma()` is used with non `mov` instruction: * `IR_LDEXP` with `fld` instruction for loading constant number `TValue` by address. * `IR_OBAR` with the corresponding `test` instruction on `marked` field of `GCobj`. All these instructions require an additional register to store value by address. We can't truly allocate a register here due to possibility to break IR assembling which depends on specific register usage. So, we use and restore r14 here for emitting. Also, this patch removes `movsd` from condition from the `x86Op` type check, as far as it never uses for the `emit_rma()` routine (see also `emit_loadk64()` for details). Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#8069 --- Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma PR: https://github.com/tarantool/tarantool/pull/8477 Related issue: https://github.com/tarantool/tarantool/issues/8069 AFAICS, other places with `emit_rma()` usage are not related to the patch as far as they take an offset for the address of JIT constants stored in `jit_State`, so it always be near enough to dispatch. Side note: you may check test-correctness of the last check with GC by changing the corresponding condition check on `GC_WHITES` in asm_obar to CC_NZ (like it will be treated for incorrect check). Be carefull, member that instructions are emitted from bottom to top! src/lj_emit_x86.h | 24 ++++- test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++ 2 files changed, 123 insertions(+), 3 deletions(-) create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h index 6b58306b..b3dc4ea5 100644 --- a/src/lj_emit_x86.h +++ b/src/lj_emit_x86.h @@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr) emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr)); } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) { emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr)); - } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) { - emit_rmro(as, xo, rr, rr, 0); - emit_loadu64(as, rr, (uintptr_t)addr); + } else if (!checki32((intptr_t)addr)) { + Reg ra = (rr & 15); + if (xo != XO_MOV) { + /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */ + uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch; + uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0; + ra = RID_DISPATCH; + if (checku32(dispaddr)) { + emit_loadi(as, ra, (int32_t)dispaddr); + } else { /* Full-size 64 bit load. */ + MCode *p = as->mcp; + *(uint64_t *)(p-8) = dispaddr; + p[-9] = (MCode)(XI_MOVri+(ra&7)); + p[-10] = 0x48 + ((ra>>3)&1); + p -= 10; + as->mcp = p; + } + if (xo == XO_GROUP3b) emit_i8(as, i8); + } + emit_rmro(as, xo, rr, ra, 0); + emit_loadu64(as, ra, (uintptr_t)addr); } else #endif { diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua new file mode 100644 index 00000000..faddfe83 --- /dev/null +++ b/test/tarantool-tests/fix-emit-rma.test.lua @@ -0,0 +1,102 @@ +local tap = require('tap') +local test = tap.test('fix-emit-rma'):skipcond({ + ['Test requires JIT enabled'] = not jit.status(), + ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'), +}) + +-- Need to test 2 cases of `emit_rma()` particulary on x64: +-- * `IR_LDEXP` with `fld` instruction for loading constant +-- number `TValue` by address. +-- * `IR_OBAR` with the corresponding `test` instruction on +-- `marked` field of `GCobj`. +-- Also, test correctness. +test:plan(4) + +local ffi = require('ffi') + +collectgarbage() +-- Chomp memory in currently allocated GC space. +collectgarbage('stop') + +for _ = 1, 8 do + ffi.new('char[?]', 256 * 1024 * 1024) +end + +jit.opt.start('hotloop=1') + +-- Test `IR_LDEXP`. + +-- Reproducer here is a little tricky. +-- We need to generate a bunch of traces as far we reference an +-- IR field (`TValue`) address in `emit_rma()`. The amount of +-- traces is empirical. Usually, assert fails on ~33d iteration, +-- so let's use 100 just to be sure. +local test_marker +for _ = 1, 100 do + test_marker = loadstring([[ + local test_marker + for i = 1, 4 do + -- Avoid fold optimization, use `i` as the second argument. + -- Need some constant differs from 1 or 0 as the first + -- argument. + test_marker = math.ldexp(1.2, i) + end + return test_marker + ]])() +end + +-- If we here, it means no assertion failed during emitting. +test:ok(true, 'IR_LDEXP emit_rma') +test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result') + +-- Test `IR_OBAR`. + +-- First, create a closed upvalue. +do + local uv -- luacheck: no unused + -- `IR_OBAR` is used for object write barrier on upvalues. + _G.change_uv = function(newv) + uv = newv + end +end + +-- We need a constant value on trace to be referenced far enough +-- from dispatch table. So we need to create a new function +-- prototype with a constant string. +-- This string should be long enough to be allocated with direct +-- alloc far away from dispatch. +local DEFAULT_MMAP_THRESHOLD = 128 * 1024 +local str = string.rep('x', DEFAULT_MMAP_THRESHOLD) +local func_with_trace = loadstring([[ + for _ = 1, 4 do + change_uv(']] .. str .. [[') + end +]]) +func_with_trace() + +-- If we here, it means no assertion failed during emitting. +test:ok(true, 'IR_OBAR emit_rma') + +-- Now check the correctness. + +-- Set GC state to GCpause. +collectgarbage() + +-- We want to wait for the situation, when upvalue is black, +-- the string is gray. Both conditions are satisfied, when the +-- corresponding `change_uv()` function is marked, for example. +-- We don't know on what exactly step our upvalue is marked as +-- black and execution of trace become dangerous, so just check it +-- at each step. +-- Don't need to do the full GC cycle step by step. +local old_steps_atomic = misc.getmetrics().gc_steps_atomic +while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do + collectgarbage('step') + func_with_trace() +end + +-- If we here, it means no assertion failed during `gc_mark()`, +-- due to wrong call to `lj_gc_barrieruv()` on trace. +test:ok(true, 'IR_OBAR emit_rma check correctness') + +os.exit(test:check() and 0 or 1) -- 2.34.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma(). 2023-03-22 8:27 [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma() Sergey Kaplun via Tarantool-patches @ 2023-03-23 9:36 ` Maxim Kokryashkin via Tarantool-patches 2023-03-24 13:05 ` Sergey Kaplun via Tarantool-patches 2023-04-03 9:02 ` Igor Munkin via Tarantool-patches 2023-05-25 6:16 ` Igor Munkin via Tarantool-patches 2 siblings, 1 reply; 6+ messages in thread From: Maxim Kokryashkin via Tarantool-patches @ 2023-03-23 9:36 UTC (permalink / raw) To: Sergey Kaplun; +Cc: tarantool-patches [-- Attachment #1: Type: text/plain, Size: 8768 bytes --] Hi! Thanks for the patch! > >>From: Mike Pall <mike> >> >>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3) >> >>The accessing of memory address for some operation `emit_rma()` may be >>encoded in one of the following ways: >> a. If the offset of the accessing address from the dispatch table >I suggest paraphrasing it the following way: >| If the offset of the address being accessed from the dispatch table > >Same for the similar sentences below. >> (pinned to r14 that is not changed while trace execution) fits into >Typo: s/while/during >> 32-bit, then encode this as an access to 32-bit displacement >> relative to r14. >> b. If the offset of the accessing address from the mcode (i.e. rip) >> fits into 32-bit, then encode this as an access to 32-bit >> displacement relative to rip (considering long mode specifics and >> `RID_RIP` hack). >> c. If the address doesn't fit into 32-bit one and we use `mov` or >> `movsd`, then encode 64-bit load from this address. >> d. Elsewhere, encode it as an access to 32-bit (the address should fit >> into 32-bit one) displacement (the only option for non-GC64 mode). >> >>So, each instruction in GC64 mode differs from `mov` or `movsd` should >Typo: s/differs/that differs/ >>be encoded via the last option. But if we got a 64-bit address with a >Typo: s/got/get/ >>big enough offset it can't be encoded and the assertion in `ptr2addr()` >Typo: s/offset/offset,/ >Typo: s/encoded/encoded, >>will fail. >Typo: s/will fail/fails/ >> >>There are several cases, when `emit_rma()` is used with non `mov` >Typo: s/with non-`mov`/with a non-`mov`/ >>instruction: >>* `IR_LDEXP` with `fld` instruction for loading constant >> number `TValue` by address. >>* `IR_OBAR` with the corresponding `test` instruction on >> `marked` field of `GCobj`. >>All these instructions require an additional register to store value by >>address. We can't truly allocate a register here due to possibility to >>break IR assembling which depends on specific register usage. So, we use >Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling, >>and restore r14 here for emitting. >> >>Also, this patch removes `movsd` from condition from the `x86Op` type >>check, as far as it never uses for the `emit_rma()` routine (see also >Typo: s/uses/used >>`emit_loadk64()` for details). >> >>Sergey Kaplun: >>* added the description and the test for the problem >> >>Part of tarantool/tarantool#8069 >>--- >> >>Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma >>PR: https://github.com/tarantool/tarantool/pull/8477 >>Related issue: https://github.com/tarantool/tarantool/issues/8069 >> >>AFAICS, other places with `emit_rma()` usage are not related to the >>patch as far as they take an offset for the address of JIT constants >>stored in `jit_State`, so it always be near enough to dispatch. >> >>Side note: you may check test-correctness of the last check with GC by >>changing the corresponding condition check on `GC_WHITES` in asm_obar to >>CC_NZ (like it will be treated for incorrect check). Be carefull, member >>that instructions are emitted from bottom to top! >> >> src/lj_emit_x86.h | 24 ++++- >> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++ >> 2 files changed, 123 insertions(+), 3 deletions(-) >> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua >> >>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h >>index 6b58306b..b3dc4ea5 100644 >>--- a/src/lj_emit_x86.h >>+++ b/src/lj_emit_x86.h >>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr) >> emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr)); >> } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) { >> emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr)); >>- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) { >>- emit_rmro(as, xo, rr, rr, 0); >>- emit_loadu64(as, rr, (uintptr_t)addr); >>+ } else if (!checki32((intptr_t)addr)) { >>+ Reg ra = (rr & 15); >>+ if (xo != XO_MOV) { >>+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */ >>+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch; >>+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0; >>+ ra = RID_DISPATCH; >>+ if (checku32(dispaddr)) { >>+ emit_loadi(as, ra, (int32_t)dispaddr); >>+ } else { /* Full-size 64 bit load. */ >>+ MCode *p = as->mcp; >>+ *(uint64_t *)(p-8) = dispaddr; >>+ p[-9] = (MCode)(XI_MOVri+(ra&7)); >>+ p[-10] = 0x48 + ((ra>>3)&1); >Why is it `0x48`? >>+ p -= 10; >>+ as->mcp = p; >>+ } >>+ if (xo == XO_GROUP3b) emit_i8(as, i8); >>+ } >>+ emit_rmro(as, xo, rr, ra, 0); >>+ emit_loadu64(as, ra, (uintptr_t)addr); >> } else >> #endif >> { >>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua >>new file mode 100644 >>index 00000000..faddfe83 >>--- /dev/null >>+++ b/test/tarantool-tests/fix-emit-rma.test.lua >>@@ -0,0 +1,102 @@ >>+local tap = require('tap') >>+local test = tap.test('fix-emit-rma'):skipcond({ >>+ ['Test requires JIT enabled'] = not jit.status(), >>+ ['Test requires GC64 mode enabled'] = not require('ffi').abi('gc64'), >>+}) >>+ >>+-- Need to test 2 cases of `emit_rma()` particulary on x64: >>+-- * `IR_LDEXP` with `fld` instruction for loading constant >>+-- number `TValue` by address. >>+-- * `IR_OBAR` with the corresponding `test` instruction on >>+-- `marked` field of `GCobj`. >>+-- Also, test correctness. >>+test:plan(4) >>+ >>+local ffi = require('ffi') >>+ >>+collectgarbage() >>+-- Chomp memory in currently allocated GC space. >>+collectgarbage('stop') >>+ >>+for _ = 1, 8 do >>+ ffi.new('char[?]', 256 * 1024 * 1024) >>+end >>+ >>+jit.opt.start('hotloop=1') >>+ >>+-- Test `IR_LDEXP`. >>+ >>+-- Reproducer here is a little tricky. >>+-- We need to generate a bunch of traces as far we reference an >>+-- IR field (`TValue`) address in `emit_rma()`. The amount of >>+-- traces is empirical. Usually, assert fails on ~33d iteration, >>+-- so let's use 100 just to be sure. >Is there any way to make it more deterministic? >>+local test_marker >>+for _ = 1, 100 do >>+ test_marker = loadstring([[ >>+ local test_marker >>+ for i = 1, 4 do >>+ -- Avoid fold optimization, use `i` as the second argument. >>+ -- Need some constant differs from 1 or 0 as the first >>+ -- argument. >>+ test_marker = math.ldexp(1.2, i) >>+ end >>+ return test_marker >>+ ]])() >>+end >>+ >>+-- If we here, it means no assertion failed during emitting. >Typo: s/we/we are/ >>+test:ok(true, 'IR_LDEXP emit_rma') >>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result') >>+ >>+-- Test `IR_OBAR`. >>+ >>+-- First, create a closed upvalue. >>+do >>+ local uv -- luacheck: no unused >>+ -- `IR_OBAR` is used for object write barrier on upvalues. >>+ _G.change_uv = function(newv) >>+ uv = newv >>+ end >>+end >>+ >>+-- We need a constant value on trace to be referenced far enough >>+-- from dispatch table. So we need to create a new function >>+-- prototype with a constant string. >>+-- This string should be long enough to be allocated with direct >>+-- alloc far away from dispatch. >>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024 >Why is that amount sufficient? Link to the source file would be enough. >>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD) >>+local func_with_trace = loadstring([[ >>+ for _ = 1, 4 do >>+ change_uv(']] .. str .. [[') >>+ end >>+]]) >>+func_with_trace() >>+ >>+-- If we here, it means no assertion failed during emitting. >Typo: s/we here/we are here/ >>+test:ok(true, 'IR_OBAR emit_rma') >>+ >>+-- Now check the correctness. >>+ >>+-- Set GC state to GCpause. >>+collectgarbage() >>+ >>+-- We want to wait for the situation, when upvalue is black, >>+-- the string is gray. Both conditions are satisfied, when the >>+-- corresponding `change_uv()` function is marked, for example. >>+-- We don't know on what exactly step our upvalue is marked as >Typo: s/exactly/exact/ >>+-- black and execution of trace become dangerous, so just check it >>+-- at each step. >>+-- Don't need to do the full GC cycle step by step. >>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic >>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do >>+ collectgarbage('step') >>+ func_with_trace() >>+end >>+ >>+-- If we here, it means no assertion failed during `gc_mark()`, >Typo: s/we here/we are here/ >>+-- due to wrong call to `lj_gc_barrieruv()` on trace. >>+test:ok(true, 'IR_OBAR emit_rma check correctness') >>+ >>+os.exit(test:check() and 0 or 1) >>-- >>2.34.1 >-- >Best regards, >Maxim Kokryashkin > [-- Attachment #2: Type: text/html, Size: 13053 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma(). 2023-03-23 9:36 ` Maxim Kokryashkin via Tarantool-patches @ 2023-03-24 13:05 ` Sergey Kaplun via Tarantool-patches 2023-03-24 14:42 ` Maxim Kokryashkin via Tarantool-patches 0 siblings, 1 reply; 6+ messages in thread From: Sergey Kaplun via Tarantool-patches @ 2023-03-24 13:05 UTC (permalink / raw) To: Maxim Kokryashkin; +Cc: tarantool-patches Hi, Maxim! Thanks for the review! I've updated the commit message and comments, force-pushed the branch. On 23.03.23, Maxim Kokryashkin wrote: > > Hi! > Thanks for the patch! > > > > >>From: Mike Pall <mike> > >> > >>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3) > >> > >>The accessing of memory address for some operation `emit_rma()` may be > >>encoded in one of the following ways: > >> a. If the offset of the accessing address from the dispatch table > >I suggest paraphrasing it the following way: > >| If the offset of the address being accessed from the dispatch table > > > >Same for the similar sentences below. > >> (pinned to r14 that is not changed while trace execution) fits into > >Typo: s/while/during > >> 32-bit, then encode this as an access to 32-bit displacement > >> relative to r14. > >> b. If the offset of the accessing address from the mcode (i.e. rip) > >> fits into 32-bit, then encode this as an access to 32-bit > >> displacement relative to rip (considering long mode specifics and > >> `RID_RIP` hack). > >> c. If the address doesn't fit into 32-bit one and we use `mov` or > >> `movsd`, then encode 64-bit load from this address. > >> d. Elsewhere, encode it as an access to 32-bit (the address should fit > >> into 32-bit one) displacement (the only option for non-GC64 mode). > >> > >>So, each instruction in GC64 mode differs from `mov` or `movsd` should > >Typo: s/differs/that differs/ > >>be encoded via the last option. But if we got a 64-bit address with a > >Typo: s/got/get/ > >>big enough offset it can't be encoded and the assertion in `ptr2addr()` > >Typo: s/offset/offset,/ > >Typo: s/encoded/encoded, > >>will fail. > >Typo: s/will fail/fails/ > >> > >>There are several cases, when `emit_rma()` is used with non `mov` > >Typo: s/with non-`mov`/with a non-`mov`/ > >>instruction: > >>* `IR_LDEXP` with `fld` instruction for loading constant > >> number `TValue` by address. > >>* `IR_OBAR` with the corresponding `test` instruction on > >> `marked` field of `GCobj`. > >>All these instructions require an additional register to store value by > >>address. We can't truly allocate a register here due to possibility to > >>break IR assembling which depends on specific register usage. So, we use > >Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling, > >>and restore r14 here for emitting. > >> > >>Also, this patch removes `movsd` from condition from the `x86Op` type > >>check, as far as it never uses for the `emit_rma()` routine (see also > >Typo: s/uses/used > >>`emit_loadk64()` for details). > >> > >>Sergey Kaplun: > >>* added the description and the test for the problem > >> > >>Part of tarantool/tarantool#8069 > >>--- Fixed the commit message; new commit message is the following: x64/LJ_GC64: Fix emit_rma(). (cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3) ``` The accessing of memory address for some operation `emit_rma()` may be encoded in one of the following ways: a. If the offset of the address being accessed from the dispatch table (pinned to r14 that is not changed during trace execution) fits into 32-bit, then encode this as an access to 32-bit displacement relative to r14. b. If the offset of the address being accessed from the mcode (i.e. rip) fits into 32-bit, then encode this as an access to 32-bit displacement relative to rip (considering long mode specifics and `RID_RIP` hack). c. If the address doesn't fit into 32-bit one and we use `mov` or `movsd`, then encode 64-bit load from this address. d. Elsewhere, encode it as an access to 32-bit (the address should fit into 32-bit one) displacement (the only option for non-GC64 mode). So, each instruction in GC64 mode that differs from `mov` or `movsd` should be encoded via the last option. But if we get a 64-bit address with a big enough offset, it can't be encoded, so the assertion in `ptr2addr()` fails. There are several cases, when `emit_rma()` is used with a non-`mov` instruction: * `IR_LDEXP` with `fld` instruction for loading constant number `TValue` by address. * `IR_OBAR` with the corresponding `test` instruction on `marked` field of `GCobj`. All these instructions require an additional register to store value by address. We can't truly allocate a register here due to the possibility to break IR assembling which depends on specific register usage. So, we use and restore r14 here for emitting. Also, this patch removes `movsd` from condition from the `x86Op` type check, as far as it is never used for the `emit_rma()` routine (see also `emit_loadk64()` for details). Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#8069 ``` > >> > >>Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma > >>PR: https://github.com/tarantool/tarantool/pull/8477 > >>Related issue: https://github.com/tarantool/tarantool/issues/8069 > >> > >>AFAICS, other places with `emit_rma()` usage are not related to the > >>patch as far as they take an offset for the address of JIT constants > >>stored in `jit_State`, so it always be near enough to dispatch. > >> > >>Side note: you may check test-correctness of the last check with GC by > >>changing the corresponding condition check on `GC_WHITES` in asm_obar to > >>CC_NZ (like it will be treated for incorrect check). Be carefull, member > >>that instructions are emitted from bottom to top! > >> > >> src/lj_emit_x86.h | 24 ++++- > >> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++ > >> 2 files changed, 123 insertions(+), 3 deletions(-) > >> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua > >> > >>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h > >>index 6b58306b..b3dc4ea5 100644 > >>--- a/src/lj_emit_x86.h > >>+++ b/src/lj_emit_x86.h > >>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr) > >> emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr)); > >> } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) { > >> emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr)); > >>- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) { > >>- emit_rmro(as, xo, rr, rr, 0); > >>- emit_loadu64(as, rr, (uintptr_t)addr); > >>+ } else if (!checki32((intptr_t)addr)) { > >>+ Reg ra = (rr & 15); > >>+ if (xo != XO_MOV) { > >>+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */ > >>+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch; > >>+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0; > >>+ ra = RID_DISPATCH; > >>+ if (checku32(dispaddr)) { > >>+ emit_loadi(as, ra, (int32_t)dispaddr); > >>+ } else { /* Full-size 64 bit load. */ > >>+ MCode *p = as->mcp; > >>+ *(uint64_t *)(p-8) = dispaddr; > >>+ p[-9] = (MCode)(XI_MOVri+(ra&7)); > >>+ p[-10] = 0x48 + ((ra>>3)&1); > >Why is it `0x48`? IINM, this is REX prefix for the instruction [1], Sergos should correct me if I wrong. | perl -E 'say sprintf "%08b", 0x48' | 01001000 ****WRXB ****: Fixed bit pattern. W bit is set to mark that a 64-bit operand size is used. This is the same approach as for `emit_loadu64()`, so I didn't left any comment. > >>+ p -= 10; > >>+ as->mcp = p; > >>+ } > >>+ if (xo == XO_GROUP3b) emit_i8(as, i8); > >>+ } > >>+ emit_rmro(as, xo, rr, ra, 0); > >>+ emit_loadu64(as, ra, (uintptr_t)addr); > >> } else > >> #endif > >> { > >>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua > >>new file mode 100644 > >>index 00000000..faddfe83 > >>--- /dev/null > >>+++ b/test/tarantool-tests/fix-emit-rma.test.lua <snipped> > >>+-- Test `IR_LDEXP`. > >>+ > >>+-- Reproducer here is a little tricky. > >>+-- We need to generate a bunch of traces as far we reference an > >>+-- IR field (`TValue`) address in `emit_rma()`. The amount of > >>+-- traces is empirical. Usually, assert fails on ~33d iteration, > >>+-- so let's use 100 just to be sure. > >Is there any way to make it more deterministic? I suppose no, due to undetermenistic memory allocation. So, just use big enough number. > >>+local test_marker > >>+for _ = 1, 100 do > >>+ test_marker = loadstring([[ > >>+ local test_marker > >>+ for i = 1, 4 do > >>+ -- Avoid fold optimization, use `i` as the second argument. > >>+ -- Need some constant differs from 1 or 0 as the first > >>+ -- argument. > >>+ test_marker = math.ldexp(1.2, i) > >>+ end > >>+ return test_marker > >>+ ]])() > >>+end > >>+ > >>+-- If we here, it means no assertion failed during emitting. > >Typo: s/we/we are/ Fixed, thanks! > >>+test:ok(true, 'IR_LDEXP emit_rma') > >>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result') > >>+ > >>+-- Test `IR_OBAR`. > >>+ > >>+-- First, create a closed upvalue. > >>+do > >>+ local uv -- luacheck: no unused > >>+ -- `IR_OBAR` is used for object write barrier on upvalues. > >>+ _G.change_uv = function(newv) > >>+ uv = newv > >>+ end > >>+end > >>+ > >>+-- We need a constant value on trace to be referenced far enough > >>+-- from dispatch table. So we need to create a new function > >>+-- prototype with a constant string. > >>+-- This string should be long enough to be allocated with direct > >>+-- alloc far away from dispatch. > >>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024 > >Why is that amount sufficient? Link to the source file would be enough. Added the following comment. ``` @@ -64,7 +64,8 @@ end -- from dispatch table. So we need to create a new function -- prototype with a constant string. -- This string should be long enough to be allocated with direct --- alloc far away from dispatch. +-- alloc (not fitting in free chunks) far away from dispatch. +-- See <src/lj_alloc.c> for details. local DEFAULT_MMAP_THRESHOLD = 128 * 1024 local str = string.rep('x', DEFAULT_MMAP_THRESHOLD) local func_with_trace = loadstring([[ @@ -74,7 +75,7 @@ local func_with_trace = loadstring([[ ]]) ``` > >>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD) > >>+local func_with_trace = loadstring([[ > >>+ for _ = 1, 4 do > >>+ change_uv(']] .. str .. [[') > >>+ end > >>+]]) > >>+func_with_trace() > >>+ > >>+-- If we here, it means no assertion failed during emitting. > >Typo: s/we here/we are here/ Fixed, thanks! > >>+test:ok(true, 'IR_OBAR emit_rma') > >>+ > >>+-- Now check the correctness. > >>+ > >>+-- Set GC state to GCpause. > >>+collectgarbage() > >>+ > >>+-- We want to wait for the situation, when upvalue is black, > >>+-- the string is gray. Both conditions are satisfied, when the > >>+-- corresponding `change_uv()` function is marked, for example. > >>+-- We don't know on what exactly step our upvalue is marked as > >Typo: s/exactly/exact/ Fixed, thanks! > >>+-- black and execution of trace become dangerous, so just check it > >>+-- at each step. > >>+-- Don't need to do the full GC cycle step by step. > >>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic > >>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do > >>+ collectgarbage('step') > >>+ func_with_trace() > >>+end > >>+ > >>+-- If we here, it means no assertion failed during `gc_mark()`, > >Typo: s/we here/we are here/ Fixed, thanks! > >>+-- due to wrong call to `lj_gc_barrieruv()` on trace. > >>+test:ok(true, 'IR_OBAR emit_rma check correctness') > >>+ > >>+os.exit(test:check() and 0 or 1) > >>-- > >>2.34.1 > >-- > >Best regards, > >Maxim Kokryashkin > > [1]: https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix -- Best regards, Sergey Kaplun ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma(). 2023-03-24 13:05 ` Sergey Kaplun via Tarantool-patches @ 2023-03-24 14:42 ` Maxim Kokryashkin via Tarantool-patches 0 siblings, 0 replies; 6+ messages in thread From: Maxim Kokryashkin via Tarantool-patches @ 2023-03-24 14:42 UTC (permalink / raw) To: Sergey Kaplun; +Cc: tarantool-patches [-- Attachment #1: Type: text/plain, Size: 12477 bytes --] Hi! Thanks for the fixes! LGTM. -- Best regards, Maxim Kokryashkin > >>Hi, Maxim! >>Thanks for the review! >>I've updated the commit message and comments, force-pushed the branch. >> >>On 23.03.23, Maxim Kokryashkin wrote: >>> >>> Hi! >>> Thanks for the patch! >>> >>> > >>> >>From: Mike Pall <mike> >>> >> >>> >>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3) >>> >> >>> >>The accessing of memory address for some operation `emit_rma()` may be >>> >>encoded in one of the following ways: >>> >> a. If the offset of the accessing address from the dispatch table >>> >I suggest paraphrasing it the following way: >>> >| If the offset of the address being accessed from the dispatch table >>> > >>> >Same for the similar sentences below. >>> >> (pinned to r14 that is not changed while trace execution) fits into >>> >Typo: s/while/during >>> >> 32-bit, then encode this as an access to 32-bit displacement >>> >> relative to r14. >>> >> b. If the offset of the accessing address from the mcode (i.e. rip) >>> >> fits into 32-bit, then encode this as an access to 32-bit >>> >> displacement relative to rip (considering long mode specifics and >>> >> `RID_RIP` hack). >>> >> c. If the address doesn't fit into 32-bit one and we use `mov` or >>> >> `movsd`, then encode 64-bit load from this address. >>> >> d. Elsewhere, encode it as an access to 32-bit (the address should fit >>> >> into 32-bit one) displacement (the only option for non-GC64 mode). >>> >> >>> >>So, each instruction in GC64 mode differs from `mov` or `movsd` should >>> >Typo: s/differs/that differs/ >>> >>be encoded via the last option. But if we got a 64-bit address with a >>> >Typo: s/got/get/ >>> >>big enough offset it can't be encoded and the assertion in `ptr2addr()` >>> >Typo: s/offset/offset,/ >>> >Typo: s/encoded/encoded, >>> >>will fail. >>> >Typo: s/will fail/fails/ >>> >> >>> >>There are several cases, when `emit_rma()` is used with non `mov` >>> >Typo: s/with non-`mov`/with a non-`mov`/ >>> >>instruction: >>> >>* `IR_LDEXP` with `fld` instruction for loading constant >>> >> number `TValue` by address. >>> >>* `IR_OBAR` with the corresponding `test` instruction on >>> >> `marked` field of `GCobj`. >>> >>All these instructions require an additional register to store value by >>> >>address. We can't truly allocate a register here due to possibility to >>> >>break IR assembling which depends on specific register usage. So, we use >>> >Typo: s/due to possibility to break IR assembling/due to the possibility of breaking IR assembling, >>> >>and restore r14 here for emitting. >>> >> >>> >>Also, this patch removes `movsd` from condition from the `x86Op` type >>> >>check, as far as it never uses for the `emit_rma()` routine (see also >>> >Typo: s/uses/used >>> >>`emit_loadk64()` for details). >>> >> >>> >>Sergey Kaplun: >>> >>* added the description and the test for the problem >>> >> >>> >>Part of tarantool/tarantool#8069 >>> >>--- >> >>Fixed the commit message; new commit message is the following: >> >>x64/LJ_GC64: Fix emit_rma(). >> >>(cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3) >> >>``` >>The accessing of memory address for some operation `emit_rma()` may be >>encoded in one of the following ways: >> a. If the offset of the address being accessed from the dispatch table >> (pinned to r14 that is not changed during trace execution) fits into >> 32-bit, then encode this as an access to 32-bit displacement >> relative to r14. >> b. If the offset of the address being accessed from the mcode (i.e. >> rip) fits into 32-bit, then encode this as an access to 32-bit >> displacement relative to rip (considering long mode specifics and >> `RID_RIP` hack). >> c. If the address doesn't fit into 32-bit one and we use `mov` or >> `movsd`, then encode 64-bit load from this address. >> d. Elsewhere, encode it as an access to 32-bit (the address should fit >> into 32-bit one) displacement (the only option for non-GC64 mode). >> >>So, each instruction in GC64 mode that differs from `mov` or `movsd` >>should be encoded via the last option. But if we get a 64-bit address >>with a big enough offset, it can't be encoded, so the assertion in >>`ptr2addr()` fails. >> >>There are several cases, when `emit_rma()` is used with a non-`mov` >>instruction: >>* `IR_LDEXP` with `fld` instruction for loading constant >> number `TValue` by address. >>* `IR_OBAR` with the corresponding `test` instruction on >> `marked` field of `GCobj`. >>All these instructions require an additional register to store value by >>address. We can't truly allocate a register here due to the possibility >>to break IR assembling which depends on specific register usage. So, we >>use and restore r14 here for emitting. >> >>Also, this patch removes `movsd` from condition from the `x86Op` type >>check, as far as it is never used for the `emit_rma()` routine (see also >>`emit_loadk64()` for details). >> >>Sergey Kaplun: >>* added the description and the test for the problem >> >>Part of tarantool/tarantool#8069 >>``` >> >>> >> >>> >>Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma >>> >>PR: https://github.com/tarantool/tarantool/pull/8477 >>> >>Related issue: https://github.com/tarantool/tarantool/issues/8069 >>> >> >>> >>AFAICS, other places with `emit_rma()` usage are not related to the >>> >>patch as far as they take an offset for the address of JIT constants >>> >>stored in `jit_State`, so it always be near enough to dispatch. >>> >> >>> >>Side note: you may check test-correctness of the last check with GC by >>> >>changing the corresponding condition check on `GC_WHITES` in asm_obar to >>> >>CC_NZ (like it will be treated for incorrect check). Be carefull, member >>> >>that instructions are emitted from bottom to top! >>> >> >>> >> src/lj_emit_x86.h | 24 ++++- >>> >> test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++ >>> >> 2 files changed, 123 insertions(+), 3 deletions(-) >>> >> create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua >>> >> >>> >>diff --git a/src/lj_emit_x86.h b/src/lj_emit_x86.h >>> >>index 6b58306b..b3dc4ea5 100644 >>> >>--- a/src/lj_emit_x86.h >>> >>+++ b/src/lj_emit_x86.h >>> >>@@ -345,9 +345,27 @@ static void emit_rma(ASMState *as, x86Op xo, Reg rr, const void *addr) >>> >> emit_rmro(as, xo, rr, RID_DISPATCH, (int32_t)dispofs(as, addr)); >>> >> } else if (checki32(mcpofs(as, addr)) && checki32(mctopofs(as, addr))) { >>> >> emit_rmro(as, xo, rr, RID_RIP, (int32_t)mcpofs(as, addr)); >>> >>- } else if (!checki32((intptr_t)addr) && (xo == XO_MOV || xo == XO_MOVSD)) { >>> >>- emit_rmro(as, xo, rr, rr, 0); >>> >>- emit_loadu64(as, rr, (uintptr_t)addr); >>> >>+ } else if (!checki32((intptr_t)addr)) { >>> >>+ Reg ra = (rr & 15); >>> >>+ if (xo != XO_MOV) { >>> >>+ /* We can't allocate a register here. Use and restore DISPATCH. Ugly. */ >>> >>+ uint64_t dispaddr = (uintptr_t)J2GG(as->J)->dispatch; >>> >>+ uint8_t i8 = xo == XO_GROUP3b ? *as->mcp++ : 0; >>> >>+ ra = RID_DISPATCH; >>> >>+ if (checku32(dispaddr)) { >>> >>+ emit_loadi(as, ra, (int32_t)dispaddr); >>> >>+ } else { /* Full-size 64 bit load. */ >>> >>+ MCode *p = as->mcp; >>> >>+ *(uint64_t *)(p-8) = dispaddr; >>> >>+ p[-9] = (MCode)(XI_MOVri+(ra&7)); >>> >>+ p[-10] = 0x48 + ((ra>>3)&1); >>> >Why is it `0x48`? >> >>IINM, this is REX prefix for the instruction [1], Sergos should correct >>me if I wrong. >> >>| perl -E 'say sprintf "%08b", 0x48' >>| 01001000 >> ****WRXB >>****: Fixed bit pattern. >>W bit is set to mark that a 64-bit operand size is used. >> >>This is the same approach as for `emit_loadu64()`, so I didn't left any comment. >> >>> >>+ p -= 10; >>> >>+ as->mcp = p; >>> >>+ } >>> >>+ if (xo == XO_GROUP3b) emit_i8(as, i8); >>> >>+ } >>> >>+ emit_rmro(as, xo, rr, ra, 0); >>> >>+ emit_loadu64(as, ra, (uintptr_t)addr); >>> >> } else >>> >> #endif >>> >> { >>> >>diff --git a/test/tarantool-tests/fix-emit-rma.test.lua b/test/tarantool-tests/fix-emit-rma.test.lua >>> >>new file mode 100644 >>> >>index 00000000..faddfe83 >>> >>--- /dev/null >>> >>+++ b/test/tarantool-tests/fix-emit-rma.test.lua >> >><snipped> >> >>> >>+-- Test `IR_LDEXP`. >>> >>+ >>> >>+-- Reproducer here is a little tricky. >>> >>+-- We need to generate a bunch of traces as far we reference an >>> >>+-- IR field (`TValue`) address in `emit_rma()`. The amount of >>> >>+-- traces is empirical. Usually, assert fails on ~33d iteration, >>> >>+-- so let's use 100 just to be sure. >>> >Is there any way to make it more deterministic? >> >>I suppose no, due to undetermenistic memory allocation. So, just >>use big enough number. >> >>> >>+local test_marker >>> >>+for _ = 1, 100 do >>> >>+ test_marker = loadstring([[ >>> >>+ local test_marker >>> >>+ for i = 1, 4 do >>> >>+ -- Avoid fold optimization, use `i` as the second argument. >>> >>+ -- Need some constant differs from 1 or 0 as the first >>> >>+ -- argument. >>> >>+ test_marker = math.ldexp(1.2, i) >>> >>+ end >>> >>+ return test_marker >>> >>+ ]])() >>> >>+end >>> >>+ >>> >>+-- If we here, it means no assertion failed during emitting. >>> >Typo: s/we/we are/ >> >>Fixed, thanks! >> >>> >>+test:ok(true, 'IR_LDEXP emit_rma') >>> >>+test:ok(test_marker == math.ldexp(1.2, 4), 'IR_LDEXP emit_rma check result') >>> >>+ >>> >>+-- Test `IR_OBAR`. >>> >>+ >>> >>+-- First, create a closed upvalue. >>> >>+do >>> >>+ local uv -- luacheck: no unused >>> >>+ -- `IR_OBAR` is used for object write barrier on upvalues. >>> >>+ _G.change_uv = function(newv) >>> >>+ uv = newv >>> >>+ end >>> >>+end >>> >>+ >>> >>+-- We need a constant value on trace to be referenced far enough >>> >>+-- from dispatch table. So we need to create a new function >>> >>+-- prototype with a constant string. >>> >>+-- This string should be long enough to be allocated with direct >>> >>+-- alloc far away from dispatch. >>> >>+local DEFAULT_MMAP_THRESHOLD = 128 * 1024 >>> >Why is that amount sufficient? Link to the source file would be enough. >> >>Added the following comment. >>``` >>@@ -64,7 +64,8 @@ end >> -- from dispatch table. So we need to create a new function >> -- prototype with a constant string. >> -- This string should be long enough to be allocated with direct >>--- alloc far away from dispatch. >>+-- alloc (not fitting in free chunks) far away from dispatch. >>+-- See <src/lj_alloc.c> for details. >> local DEFAULT_MMAP_THRESHOLD = 128 * 1024 >> local str = string.rep('x', DEFAULT_MMAP_THRESHOLD) >> local func_with_trace = loadstring([[ >>@@ -74,7 +75,7 @@ local func_with_trace = loadstring([[ >> ]]) >>``` >> >>> >>+local str = string.rep('x', DEFAULT_MMAP_THRESHOLD) >>> >>+local func_with_trace = loadstring([[ >>> >>+ for _ = 1, 4 do >>> >>+ change_uv(']] .. str .. [[') >>> >>+ end >>> >>+]]) >>> >>+func_with_trace() >>> >>+ >>> >>+-- If we here, it means no assertion failed during emitting. >>> >Typo: s/we here/we are here/ >> >>Fixed, thanks! >> >>> >>+test:ok(true, 'IR_OBAR emit_rma') >>> >>+ >>> >>+-- Now check the correctness. >>> >>+ >>> >>+-- Set GC state to GCpause. >>> >>+collectgarbage() >>> >>+ >>> >>+-- We want to wait for the situation, when upvalue is black, >>> >>+-- the string is gray. Both conditions are satisfied, when the >>> >>+-- corresponding `change_uv()` function is marked, for example. >>> >>+-- We don't know on what exactly step our upvalue is marked as >>> >Typo: s/exactly/exact/ >> >>Fixed, thanks! >> >>> >>+-- black and execution of trace become dangerous, so just check it >>> >>+-- at each step. >>> >>+-- Don't need to do the full GC cycle step by step. >>> >>+local old_steps_atomic = misc.getmetrics().gc_steps_atomic >>> >>+while (misc.getmetrics().gc_steps_atomic == old_steps_atomic) do >>> >>+ collectgarbage('step') >>> >>+ func_with_trace() >>> >>+end >>> >>+ >>> >>+-- If we here, it means no assertion failed during `gc_mark()`, >>> >Typo: s/we here/we are here/ >> >>Fixed, thanks! >> >>> >>+-- due to wrong call to `lj_gc_barrieruv()` on trace. >>> >>+test:ok(true, 'IR_OBAR emit_rma check correctness') >>> >>+ >>> >>+os.exit(test:check() and 0 or 1) >>> >>-- >>> >>2.34.1 >>> >-- >>> >Best regards, >>> >Maxim Kokryashkin >>> > >> >>[1]: https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix >> >>-- >>Best regards, >>Sergey Kaplun > [-- Attachment #2: Type: text/html, Size: 16034 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma(). 2023-03-22 8:27 [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma() Sergey Kaplun via Tarantool-patches 2023-03-23 9:36 ` Maxim Kokryashkin via Tarantool-patches @ 2023-04-03 9:02 ` Igor Munkin via Tarantool-patches 2023-05-25 6:16 ` Igor Munkin via Tarantool-patches 2 siblings, 0 replies; 6+ messages in thread From: Igor Munkin via Tarantool-patches @ 2023-04-03 9:02 UTC (permalink / raw) To: Sergey Kaplun; +Cc: tarantool-patches Sergey, Thanks for the patch! LGTM, considering the fixes for Max comments. On 22.03.23, Sergey Kaplun via Tarantool-patches wrote: > From: Mike Pall <mike> > > (cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3) > > The accessing of memory address for some operation `emit_rma()` may be > encoded in one of the following ways: > a. If the offset of the accessing address from the dispatch table > (pinned to r14 that is not changed while trace execution) fits into > 32-bit, then encode this as an access to 32-bit displacement > relative to r14. > b. If the offset of the accessing address from the mcode (i.e. rip) > fits into 32-bit, then encode this as an access to 32-bit > displacement relative to rip (considering long mode specifics and > `RID_RIP` hack). > c. If the address doesn't fit into 32-bit one and we use `mov` or > `movsd`, then encode 64-bit load from this address. > d. Elsewhere, encode it as an access to 32-bit (the address should fit > into 32-bit one) displacement (the only option for non-GC64 mode). > > So, each instruction in GC64 mode differs from `mov` or `movsd` should > be encoded via the last option. But if we got a 64-bit address with a > big enough offset it can't be encoded and the assertion in `ptr2addr()` > will fail. > > There are several cases, when `emit_rma()` is used with non `mov` > instruction: > * `IR_LDEXP` with `fld` instruction for loading constant > number `TValue` by address. > * `IR_OBAR` with the corresponding `test` instruction on > `marked` field of `GCobj`. > All these instructions require an additional register to store value by > address. We can't truly allocate a register here due to possibility to > break IR assembling which depends on specific register usage. So, we use > and restore r14 here for emitting. > > Also, this patch removes `movsd` from condition from the `x86Op` type > check, as far as it never uses for the `emit_rma()` routine (see also > `emit_loadk64()` for details). > > Sergey Kaplun: > * added the description and the test for the problem > > Part of tarantool/tarantool#8069 > --- > > Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma > PR: https://github.com/tarantool/tarantool/pull/8477 > Related issue: https://github.com/tarantool/tarantool/issues/8069 > > AFAICS, other places with `emit_rma()` usage are not related to the > patch as far as they take an offset for the address of JIT constants > stored in `jit_State`, so it always be near enough to dispatch. > > Side note: you may check test-correctness of the last check with GC by > changing the corresponding condition check on `GC_WHITES` in asm_obar to > CC_NZ (like it will be treated for incorrect check). Be carefull, member > that instructions are emitted from bottom to top! > > src/lj_emit_x86.h | 24 ++++- > test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++ > 2 files changed, 123 insertions(+), 3 deletions(-) > create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua > <snipped> > -- > 2.34.1 > -- Best regards, IM ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma(). 2023-03-22 8:27 [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma() Sergey Kaplun via Tarantool-patches 2023-03-23 9:36 ` Maxim Kokryashkin via Tarantool-patches 2023-04-03 9:02 ` Igor Munkin via Tarantool-patches @ 2023-05-25 6:16 ` Igor Munkin via Tarantool-patches 2 siblings, 0 replies; 6+ messages in thread From: Igor Munkin via Tarantool-patches @ 2023-05-25 6:16 UTC (permalink / raw) To: Sergey Kaplun; +Cc: tarantool-patches Sergey, I've checked the patchset into all long-term branches in tarantool/luajit and bumped a new version in master, 2.11 and 2.10. On 22.03.23, Sergey Kaplun via Tarantool-patches wrote: > From: Mike Pall <mike> > > (cherry picked from commit 7e662e4f87134f1e84f7bea80933e033c5bf53a3) > > The accessing of memory address for some operation `emit_rma()` may be > encoded in one of the following ways: > a. If the offset of the accessing address from the dispatch table > (pinned to r14 that is not changed while trace execution) fits into > 32-bit, then encode this as an access to 32-bit displacement > relative to r14. > b. If the offset of the accessing address from the mcode (i.e. rip) > fits into 32-bit, then encode this as an access to 32-bit > displacement relative to rip (considering long mode specifics and > `RID_RIP` hack). > c. If the address doesn't fit into 32-bit one and we use `mov` or > `movsd`, then encode 64-bit load from this address. > d. Elsewhere, encode it as an access to 32-bit (the address should fit > into 32-bit one) displacement (the only option for non-GC64 mode). > > So, each instruction in GC64 mode differs from `mov` or `movsd` should > be encoded via the last option. But if we got a 64-bit address with a > big enough offset it can't be encoded and the assertion in `ptr2addr()` > will fail. > > There are several cases, when `emit_rma()` is used with non `mov` > instruction: > * `IR_LDEXP` with `fld` instruction for loading constant > number `TValue` by address. > * `IR_OBAR` with the corresponding `test` instruction on > `marked` field of `GCobj`. > All these instructions require an additional register to store value by > address. We can't truly allocate a register here due to possibility to > break IR assembling which depends on specific register usage. So, we use > and restore r14 here for emitting. > > Also, this patch removes `movsd` from condition from the `x86Op` type > check, as far as it never uses for the `emit_rma()` routine (see also > `emit_loadk64()` for details). > > Sergey Kaplun: > * added the description and the test for the problem > > Part of tarantool/tarantool#8069 > --- > > Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-emit-rma > PR: https://github.com/tarantool/tarantool/pull/8477 > Related issue: https://github.com/tarantool/tarantool/issues/8069 > > AFAICS, other places with `emit_rma()` usage are not related to the > patch as far as they take an offset for the address of JIT constants > stored in `jit_State`, so it always be near enough to dispatch. > > Side note: you may check test-correctness of the last check with GC by > changing the corresponding condition check on `GC_WHITES` in asm_obar to > CC_NZ (like it will be treated for incorrect check). Be carefull, member > that instructions are emitted from bottom to top! > > src/lj_emit_x86.h | 24 ++++- > test/tarantool-tests/fix-emit-rma.test.lua | 102 +++++++++++++++++++++ > 2 files changed, 123 insertions(+), 3 deletions(-) > create mode 100644 test/tarantool-tests/fix-emit-rma.test.lua > <snipped> > -- > 2.34.1 > -- Best regards, IM ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-05-25 6:19 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-03-22 8:27 [Tarantool-patches] [PATCH luajit] x64/LJ_GC64: Fix emit_rma() Sergey Kaplun via Tarantool-patches 2023-03-23 9:36 ` Maxim Kokryashkin via Tarantool-patches 2023-03-24 13:05 ` Sergey Kaplun via Tarantool-patches 2023-03-24 14:42 ` Maxim Kokryashkin via Tarantool-patches 2023-04-03 9:02 ` Igor Munkin via Tarantool-patches 2023-05-25 6:16 ` Igor Munkin via Tarantool-patches
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox