Hi, Sergey,

Thanks for the patch! Please see my comments below.


On 16.01.2025 16:35, Sergey Kaplun wrote:


<snipped>

diff --git a/test/tarantool-tests/lj-1116-redzones-checks.test.lua b/test/tarantool-tests/lj-1116-redzones-checks.test.lua
new file mode 100644
index 00000000..70062ec9
--- /dev/null
+++ b/test/tarantool-tests/lj-1116-redzones-checks.test.lua
@@ -0,0 +1,118 @@
+local tap = require('tap')
+-- Test file to demonstrate mcode area overflow during recording a
+-- trace with the high FPR pressure.
+-- See also, https://github.com/LuaJIT/LuaJIT/issues/1116.
+--
+-- XXX: Test fails only with GC64 enabled before the commit.
I would rephrase: "XXX: The test fails only with GC64 enabled and the fix reverted."
+local test = tap.test('lj-1116-redzones-checks'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(1)
+
+jit.opt.start('hotloop=1')
+
+-- XXX: This test snippet was originally created by the fuzzer.
+-- See https://oss-fuzz.com/testcase-detail/5622965122170880.
+--
+-- Unfortunately, it's impossible to reduce the testcase further.
+-- Before the patch, assembling some instructions (like `IR_CONV
+-- int.num`, for example) with many mcode to be emitted may
+-- overflow the `MCLIM_REDZONE` (64) at once due to the huge
+-- mcode emitting.
+-- For example `IR_CONV` in this test requires 66 bytes of the
+-- machine code:
+-- |  cvttsd2si r15d, xmm5
+-- |  xorps xmm9, xmm9
+-- |  cvtsi2sd xmm9, r15d
+-- |  ucomisd xmm5, xmm9
+-- |  jnz 0x11edb00e5       ->37
+-- |  jpe 0x11edb00e5       ->37
+-- |  mov [rsp+0x80], r15d
+-- |  mov r15, [rsp+0xe8]
+-- |  movsd xmm9, [rsp+0xe0]
+-- |  movsd xmm5, [rsp+0xd8]
+--
+-- The reproducer needs sufficient register pressure as to
+-- immediately spill the result of the instruction to the stack
+-- and then reload the three registers used by the instruction,
+-- and to have chosen enough registers with numbers >=8 (because
+-- shaving off a REX prefix [1] or two would get 66 back down
+-- to <= `MCLIM_REDZONE`), and to be using lots of spill slots
+-- (because memory offsets <= 0x7f are shorter to encode compared
+-- to those >= 0x80. So, each reload instruction consumes 9 bytes.
+-- This makes this reproducer unstable (regarding the register
+-- allocator changes). So, lets use this as a regression test.
+--
+-- [1]: https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix
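Side note: the 66-byte figure in the comment above can be cross-checked by hand. The sketch below is not part of the patch. The per-instruction sizes are my own hand-encoding of the listing, assuming rel32 forms for both jumps and disp32 for every stack offset >= 0x80. I have not verified them against an actual mcode dump.

```lua
-- Hand-counted x86-64 encoding sizes for the IR_CONV listing above.
-- Assumptions: rel32 jumps, disp32 for offsets >= 0x80, and one REX
-- prefix wherever a register numbered >= 8 (or a 64-bit operand) is used.
local sizes = {
  {'cvttsd2si r15d, xmm5',   5},  -- F2 REX 0F 2C modrm
  {'xorps xmm9, xmm9',       4},  -- REX 0F 57 modrm
  {'cvtsi2sd xmm9, r15d',    5},  -- F2 REX 0F 2A modrm
  {'ucomisd xmm5, xmm9',     5},  -- 66 REX 0F 2E modrm
  {'jnz ->37',               6},  -- 0F 85 rel32
  {'jpe ->37',               6},  -- 0F 8A rel32
  {'mov [rsp+0x80], r15d',   8},  -- REX 89 modrm SIB disp32
  {'mov r15, [rsp+0xe8]',    8},  -- REX.W 8B modrm SIB disp32
  {'movsd xmm9, [rsp+0xe0]', 10}, -- F2 REX 0F 10 modrm SIB disp32
  {'movsd xmm5, [rsp+0xd8]', 9},  -- F2 0F 10 modrm SIB disp32
}

local MCLIM_REDZONE = 64 -- Value quoted in the test comment.

local total = 0
for _, insn in ipairs(sizes) do
  total = total + insn[2]
end

print(total, total > MCLIM_REDZONE)
```

By this count, the sequence exceeds `MCLIM_REDZONE` by only two bytes. That matches the remark that dropping a REX prefix or two, or using offsets that fit in disp8, would hide the overflow.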
+
+_G.a = 0
+_G.b = 0
+_G.c = 0
+_G.d = 0
+_G.e = 0
+_G.f = 0
+_G.g = 0
+_G.h = 0
+-- Skip `i`.

I didn't get this comment. Why is `i` skipped?

<snipped>