* [Tarantool-patches] [PATCH luajit] x86/x64: Add more red zone checks to assembler backend.
@ 2025-01-16 13:35 Sergey Kaplun via Tarantool-patches
2025-01-16 14:55 ` Sergey Bronnikov via Tarantool-patches
0 siblings, 1 reply; 4+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-01-16 13:35 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
Thanks to Peter Cawley.
(cherry picked from commit d854d00ce94b274359e5181bed13e977420daf5c)
Assembling some instructions (like `IR_CONV int.num`, for example) with
many mcode to be emitted may overflow the `MCLIM_REDZONE` (64) at once
due to the huge mcode emitting.
For example `IR_CONV` in this test requires 66 bytes of the
machine code:
| cvttsd2si r15d, xmm5
| xorps xmm9, xmm9
| cvtsi2sd xmm9, r15d
| ucomisd xmm5, xmm9
| jnz 0x11edb00e5 ->37
| jpe 0x11edb00e5 ->37
| mov [rsp+0x80], r15d
| mov r15, [rsp+0xe8]
| movsd xmm9, [rsp+0xe0]
| movsd xmm5, [rsp+0xd8]
The reproducer needs sufficient register pressure as to immediately
spill the result of the instruction to the stack and then reload the
three registers used by the instruction, and to have chosen enough
registers with numbers >=8 (because shaving off a REX prefix [1] or two
would get 66 back down to <= `MCLIM_REDZONE`), and to be using lots of
spill slots (because memory offsets <= 0x7f are shorter to encode
compared to those >= 0x80. So, each reload instruction consumes 9 bytes.
This makes this reproducer unstable (regarding the register allocator
changes). Thus, only original test case is added as a regression test.
This patch adds the red zone overflow checks more often for the IRs with
many instructions to be emitted.
Sergey Kaplun:
* added the description and the test for the problem
[1]: https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix
Part of tarantool/tarantool#10709
---
Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1116-redzones-checks
Related issues:
* https://github.com/tarantool/tarantool/issues/10709
* https://github.com/LuaJIT/LuaJIT/issues/1116
src/lj_asm_x86.h | 8 +-
.../lj-1116-redzones-checks.test.lua | 118 ++++++++++++++++++
2 files changed, 124 insertions(+), 2 deletions(-)
create mode 100644 test/tarantool-tests/lj-1116-redzones-checks.test.lua
diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
index a96bc2e7..4ae721a4 100644
--- a/src/lj_asm_x86.h
+++ b/src/lj_asm_x86.h
@@ -794,6 +794,7 @@ static void asm_tointg(ASMState *as, IRIns *ir, Reg left)
emit_rr(as, XO_UCOMISD, left, tmp);
emit_rr(as, XO_CVTSI2SD, tmp, dest);
emit_rr(as, XO_XORPS, tmp, tmp); /* Avoid partial register stall. */
+ checkmclim(as);
emit_rr(as, XO_CVTTSD2SI, dest, left);
/* Can't fuse since left is needed twice. */
}
@@ -836,6 +837,7 @@ static void asm_conv(ASMState *as, IRIns *ir)
emit_rr(as, XO_SUBSD, dest, bias); /* Subtract 2^52+2^51 bias. */
emit_rr(as, XO_XORPS, dest, bias); /* Merge bias and integer. */
emit_rma(as, XO_MOVSD, bias, k);
+ checkmclim(as);
emit_mrm(as, XO_MOVD, dest, asm_fuseload(as, lref, RSET_GPR));
return;
} else { /* Integer to FP conversion. */
@@ -1151,6 +1153,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
asm_guardcc(as, CC_E);
else
emit_sjcc(as, CC_E, l_end);
+ checkmclim(as);
if (irt_isnum(kt)) {
if (isk) {
/* Assumes -0.0 is already canonicalized to +0.0. */
@@ -1210,7 +1213,6 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
#endif
}
emit_sfixup(as, l_loop);
- checkmclim(as);
#if LJ_GC64
if (!isk && irt_isaddr(kt)) {
emit_rr(as, XO_OR, tmp|REX_64, key);
@@ -1242,6 +1244,7 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
emit_rr(as, XO_ARITH(XOg_SUB), dest, tmp);
emit_shifti(as, XOg_ROL, tmp, HASH_ROT3);
emit_rr(as, XO_ARITH(XOg_XOR), dest, tmp);
+ checkmclim(as);
emit_shifti(as, XOg_ROL, dest, HASH_ROT2);
emit_rr(as, XO_ARITH(XOg_SUB), tmp, dest);
emit_shifti(as, XOg_ROL, dest, HASH_ROT1);
@@ -1259,7 +1262,6 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
} else {
emit_rr(as, XO_MOV, tmp, key);
#if LJ_GC64
- checkmclim(as);
emit_gri(as, XG_ARITHi(XOg_XOR), dest, irt_toitype(kt) << 15);
if ((as->flags & JIT_F_BMI2)) {
emit_i8(as, 32);
@@ -1530,6 +1532,7 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
if (irt_islightud(ir->t)) {
Reg dest = asm_load_lightud64(as, ir, 1);
if (ra_hasreg(dest)) {
+ checkmclim(as);
asm_fuseahuref(as, ir->op1, RSET_GPR);
emit_mrm(as, XO_MOV, dest|REX_64, RID_MRM);
}
@@ -1574,6 +1577,7 @@ static void asm_ahuvload(ASMState *as, IRIns *ir)
if (LJ_64 && irt_type(ir->t) >= IRT_NUM) {
lj_assertA(irt_isinteger(ir->t) || irt_isnum(ir->t),
"bad load type %d", irt_type(ir->t));
+ checkmclim(as);
#if LJ_GC64
emit_u32(as, LJ_TISNUM << 15);
#else
diff --git a/test/tarantool-tests/lj-1116-redzones-checks.test.lua b/test/tarantool-tests/lj-1116-redzones-checks.test.lua
new file mode 100644
index 00000000..70062ec9
--- /dev/null
+++ b/test/tarantool-tests/lj-1116-redzones-checks.test.lua
@@ -0,0 +1,118 @@
+local tap = require('tap')
+-- Test file to demonstrate mcode area overflow during recording a
+-- trace with the high FPR pressure.
+-- See also, https://github.com/LuaJIT/LuaJIT/issues/1116.
+--
+-- XXX: Test fails only with GC64 enabled before the commit.
+local test = tap.test('lj-1116-redzones-checks'):skipcond({
+ ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(1)
+
+jit.opt.start('hotloop=1')
+
+-- XXX: This test snippet was originally created by the fuzzer.
+-- See https://oss-fuzz.com/testcase-detail/5622965122170880.
+--
+-- Unfortunately, it's impossible to reduce the testcase further.
+-- Before the patch, assembling some instructions (like `IR_CONV
+-- int.num`, for example) with many mcode to be emitted may
+-- overflow the `MCLIM_REDZONE` (64) at once due to the huge
+-- mcode emitting.
+-- For example `IR_CONV` in this test requires 66 bytes of the
+-- machine code:
+-- | cvttsd2si r15d, xmm5
+-- | xorps xmm9, xmm9
+-- | cvtsi2sd xmm9, r15d
+-- | ucomisd xmm5, xmm9
+-- | jnz 0x11edb00e5 ->37
+-- | jpe 0x11edb00e5 ->37
+-- | mov [rsp+0x80], r15d
+-- | mov r15, [rsp+0xe8]
+-- | movsd xmm9, [rsp+0xe0]
+-- | movsd xmm5, [rsp+0xd8]
+--
+-- The reproducer needs sufficient register pressure as to
+-- immediately spill the result of the instruction to the stack
+-- and then reload the three registers used by the instruction,
+-- and to have chosen enough registers with numbers >=8 (because
+-- shaving off a REX prefix [1] or two would get 66 back down
+-- to <= `MCLIM_REDZONE`), and to be using lots of spill slots
+-- (because memory offsets <= 0x7f are shorter to encode compared
+-- to those >= 0x80. So, each reload instruction consumes 9 bytes.
+-- This makes this reproducer unstable (regarding the register
+-- allocator changes). So, lets use this as a regression test.
+--
+-- [1]: https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix
+
+_G.a = 0
+_G.b = 0
+_G.c = 0
+_G.d = 0
+_G.e = 0
+_G.f = 0
+_G.g = 0
+_G.h = 0
+-- Skip `i`.
+_G.j = 0
+_G.k = 0
+_G.l = 0
+_G.m = 0
+_G.n = 0
+_G.o = 0
+_G.p = 0
+_G.q = 0
+_G.r = 0
+_G.s = 0
+_G.t = 0
+_G.u = 0
+_G.v = 0
+_G.w = 0
+_G.x = 0
+_G.y = 0
+_G.z = 0
+
+-- XXX: Need here not 4, but 4.5 top border of the cycle to create
+-- FPR pressure.
+for i = 1, 4.5 do
+ _G.a = _G.a + 1
+ _G.b = _G.b + 1
+ _G.c = _G.c + 1
+ _G.d = _G.d + 1
+ for _ = i, 2 do
+ _G.e = _G.e + 1
+ end
+ -- Here we emit `IR_CONV int.num`. This loop is inlined.
+ -- Assertion failed after emitting the variant part of the
+ -- big loop.
+ for _ = 2, i do
+ _G.f = _G.f + 1
+ _G.g = _G.g + 1
+ _G.h = _G.h + 1
+ _G.j = _G.j + 1
+ _G.k = _G.k + 1
+ _G.l = _G.l + 1
+ end
+ _G.m = _G.m + 1
+ _G.n = _G.n + 1
+ _G.o = _G.o + 1
+ _G.p = _G.p + 1
+ _G.q = _G.q + 1
+ _G.r = _G.r + 1
+ _G.s = _G.s + 1
+ _G.t = _G.t + 1
+ _G.u = _G.u + 1
+ _G.v = _G.v + 1
+ for _ = i, 2.1 do
+ _G.aa = _G.a
+ _G.w = _G.w + 1
+ _G.x = _G.x + 1
+ _G.y = _G.y + 1
+ _G.z = _G.z + 1
+ end
+end
+
+test:ok(true, 'no mcode limit assertion failed during recording')
+
+test:done(true)
--
2.47.1
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit] x86/x64: Add more red zone checks to assembler backend.
2025-01-16 13:35 [Tarantool-patches] [PATCH luajit] x86/x64: Add more red zone checks to assembler backend Sergey Kaplun via Tarantool-patches
@ 2025-01-16 14:55 ` Sergey Bronnikov via Tarantool-patches
2025-01-17 8:31 ` Sergey Kaplun via Tarantool-patches
0 siblings, 1 reply; 4+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2025-01-16 14:55 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
[-- Attachment #1: Type: text/plain, Size: 2620 bytes --]
Hi, Sergey,
thanks for the patch! See comments below.
On 16.01.2025 16:35, Sergey Kaplun wrote:
<snipped>
> diff --git a/test/tarantool-tests/lj-1116-redzones-checks.test.lua b/test/tarantool-tests/lj-1116-redzones-checks.test.lua
> new file mode 100644
> index 00000000..70062ec9
> --- /dev/null
> +++ b/test/tarantool-tests/lj-1116-redzones-checks.test.lua
> @@ -0,0 +1,118 @@
> +local tap = require('tap')
> +-- Test file to demonstrate mcode area overflow during recording a
> +-- trace with the high FPR pressure.
> +-- See also,https://github.com/LuaJIT/LuaJIT/issues/1116.
> +--
> +-- XXX: Test fails only with GC64 enabled before the commit.
I would rephrase: XXX: Test fails with reverted fix and enabled GC64.
> +local test = tap.test('lj-1116-redzones-checks'):skipcond({
> + ['Test requires JIT enabled'] = not jit.status(),
> +})
> +
> +test:plan(1)
> +
> +jit.opt.start('hotloop=1')
> +
> +-- XXX: This test snippet was originally created by the fuzzer.
> +-- Seehttps://oss-fuzz.com/testcase-detail/5622965122170880.
> +--
> +-- Unfortunately, it's impossible to reduce the testcase further.
> +-- Before the patch, assembling some instructions (like `IR_CONV
> +-- int.num`, for example) with many mcode to be emitted may
> +-- overflow the `MCLIM_REDZONE` (64) at once due to the huge
> +-- mcode emitting.
> +-- For example `IR_CONV` in this test requires 66 bytes of the
> +-- machine code:
> +-- | cvttsd2si r15d, xmm5
> +-- | xorps xmm9, xmm9
> +-- | cvtsi2sd xmm9, r15d
> +-- | ucomisd xmm5, xmm9
> +-- | jnz 0x11edb00e5 ->37
> +-- | jpe 0x11edb00e5 ->37
> +-- | mov [rsp+0x80], r15d
> +-- | mov r15, [rsp+0xe8]
> +-- | movsd xmm9, [rsp+0xe0]
> +-- | movsd xmm5, [rsp+0xd8]
> +--
> +-- The reproducer needs sufficient register pressure as to
> +-- immediately spill the result of the instruction to the stack
> +-- and then reload the three registers used by the instruction,
> +-- and to have chosen enough registers with numbers >=8 (because
> +-- shaving off a REX prefix [1] or two would get 66 back down
> +-- to <= `MCLIM_REDZONE`), and to be using lots of spill slots
> +-- (because memory offsets <= 0x7f are shorter to encode compared
> +-- to those >= 0x80. So, each reload instruction consumes 9 bytes.
> +-- This makes this reproducer unstable (regarding the register
> +-- allocator changes). So, lets use this as a regression test.
> +--
> +-- [1]:https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix
> +
> +_G.a = 0
> +_G.b = 0
> +_G.c = 0
> +_G.d = 0
> +_G.e = 0
> +_G.f = 0
> +_G.g = 0
> +_G.h = 0
> +-- Skip `i`.
I didn't get it.
<snipped>
[-- Attachment #2: Type: text/html, Size: 3855 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit] x86/x64: Add more red zone checks to assembler backend.
2025-01-16 14:55 ` Sergey Bronnikov via Tarantool-patches
@ 2025-01-17 8:31 ` Sergey Kaplun via Tarantool-patches
2025-01-17 10:07 ` Sergey Bronnikov via Tarantool-patches
0 siblings, 1 reply; 4+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-01-17 8:31 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the review!
Fixed your comments below.
Branch is force-pushed.
On 16.01.25, Sergey Bronnikov wrote:
> Hi, Sergey,
>
> thanks for the patch! See comments below.
>
>
> On 16.01.2025 16:35, Sergey Kaplun wrote:
>
>
> <snipped>
>
> > diff --git a/test/tarantool-tests/lj-1116-redzones-checks.test.lua b/test/tarantool-tests/lj-1116-redzones-checks.test.lua
> > new file mode 100644
> > index 00000000..70062ec9
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-1116-redzones-checks.test.lua
> > @@ -0,0 +1,118 @@
> > +local tap = require('tap')
> > +-- Test file to demonstrate mcode area overflow during recording a
> > +-- trace with the high FPR pressure.
> > +-- See also,https://github.com/LuaJIT/LuaJIT/issues/1116.
> > +--
> > +-- XXX: Test fails only with GC64 enabled before the commit.
> I would rephrase: XXX: Test fails with reverted fix and enabled GC64.
Rephrased as you suggested. See the iterative patch below.
> > +local test = tap.test('lj-1116-redzones-checks'):skipcond({
> > + ['Test requires JIT enabled'] = not jit.status(),
> > +})
> > +
> > +test:plan(1)
> > +
<snipped>
> > +--
> > +-- [1]:https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix
> > +
> > +_G.a = 0
> > +_G.b = 0
> > +_G.c = 0
> > +_G.d = 0
> > +_G.e = 0
> > +_G.f = 0
> > +_G.g = 0
> > +_G.h = 0
> > +-- Skip `i`.
>
> I didn't get it.
Updated the commit:
===================================================================
diff --git a/test/tarantool-tests/lj-1116-redzones-checks.test.lua b/test/tarantool-tests/lj-1116-redzones-checks.test.lua
index 70062ec9..4f4f5870 100644
--- a/test/tarantool-tests/lj-1116-redzones-checks.test.lua
+++ b/test/tarantool-tests/lj-1116-redzones-checks.test.lua
@@ -3,7 +3,7 @@ local tap = require('tap')
-- trace with the high FPR pressure.
-- See also, https://github.com/LuaJIT/LuaJIT/issues/1116.
--
--- XXX: Test fails only with GC64 enabled before the commit.
+-- XXX: Test fails with reverted fix and enabled GC64 mode.
local test = tap.test('lj-1116-redzones-checks'):skipcond({
['Test requires JIT enabled'] = not jit.status(),
})
@@ -54,7 +54,7 @@ _G.e = 0
_G.f = 0
_G.g = 0
_G.h = 0
--- Skip `i`.
+-- Skip `i` -- it is used for the loop index.
_G.j = 0
_G.k = 0
_G.l = 0
===================================================================
>
>
> <snipped>
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-01-17 10:07 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-16 13:35 [Tarantool-patches] [PATCH luajit] x86/x64: Add more red zone checks to assembler backend Sergey Kaplun via Tarantool-patches
2025-01-16 14:55 ` Sergey Bronnikov via Tarantool-patches
2025-01-17 8:31 ` Sergey Kaplun via Tarantool-patches
2025-01-17 10:07 ` Sergey Bronnikov via Tarantool-patches
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox