Hi, Sergey,

thanks for review! Please see my comments below.

Sergey

On 9/1/25 16:07, Sergey Kaplun via Tarantool-patches wrote:
Hi, Sergey!
Thanks for the patch!
Please consider my comments below.

On 27.08.25, Sergey Bronnikov wrote:
Thanks to Peter Cawley.

(cherry picked from commit d1a2fef8a8f53b0055ee041f7f63d83a27444ffa)

The builtin `pcall()` has two separate ways by which it can
grow the stack by one slot:

1. Resolving the `__call` metamethod of its first argument.
This is unrelated to this patch, so it can be omitted.

2. Growing the stack by one slot in LJ_FR2 mode.

The first case leads to a stack smash if `pcall()` is used as
`__call`. Setting a metatable with this metamethod will cause
an infinite loop which fills up the stack with `pcall`-frames
and then keeps going beyond the end of the stack until it segfaults.
This issue is not related to this patch.

Either of these points can cause an issue if `pcall()` is used as
`__newindex`.
Looks like the metamethods are not required for issue reproducing.

              The patch partially fixes aforementioned issues.
By how?

I've updated the commit message as the following:

    Stack overflow can cause a segmentation fault in vararg
    function on ARM64 and MIPS64 in LJ_FR2 mode. This happen
    because stack check in BC_IFUNCV is off by one on these
    platforms without the patch. The patch partially fixes
    aforementioned issue by bumping LJ_STACK_EXTRA by 1 to
    give a space to write the entire frame link and fixing
    a number of last free slot in the stack.


Sergey Bronnikov:
* added the description and the test for the problem

Part of tarantool/tarantool#11691
---
 src/lj_def.h                                  |  2 +-
 src/lj_dispatch.c                             |  2 +-
 src/vm_arm64.dasc                             |  1 +
 src/vm_mips64.dasc                            |  1 +
 ...048-fix-stack-checks-vararg-calls.test.lua | 56 +++++++++++++++++++
 5 files changed, 60 insertions(+), 2 deletions(-)
 create mode 100644 test/tarantool-tests/lj-1048-fix-stack-checks-vararg-calls.test.lua

diff --git a/src/lj_def.h b/src/lj_def.h
<snipped>

diff --git a/src/lj_dispatch.c b/src/lj_dispatch.c
index a44a5adf..431cb3c2 100644
--- a/src/lj_dispatch.c
+++ b/src/lj_dispatch.c
@@ -453,7 +453,7 @@ static int call_init(lua_State *L, GCfunc *fn)
     int numparams = pt->numparams;
     int gotparams = (int)(L->top - L->base);
     int need = pt->framesize;
-    if ((pt->flags & PROTO_VARARG)) need += 1+gotparams;
+    if ((pt->flags & PROTO_VARARG)) need += 1+LJ_FR2+gotparams;
I can't see the test related to this change. Not `prober_1()` nor
`prober_2()` lead to the assertion failure for x86_64 or aarch64 without
it.

Please check again. Both testcases trigger segfault on AArch64 (odroid).

cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug  -DLUA_USE_ASSERT=ON -DLUA_USE_APICHECK=ON

cmake --build build --parallel

LUA_PATH="/root/sergeyb/luajit/test/tarantool-tests/?.lua;/root/sergeyb/luajit/test/tarantool-tests/?/init.lua;/root/sergeyb/luajit/src/?.lua;/root/sergeyb/luajit/build/src/?.lua;;" gdb --args /root/sergeyb/luajit/build/src/luajit "-e" "dofile[[/root/sergeyb/luajit/test/luajit-test-init.lua]]" "/root/sergeyb/luajit/test/tarantool-tests/lj-1048-fix-stack-checks-vararg-calls.test.lua"

(gdb) bt

#0  0x00000055555c16f4 in lj_alloc_free (msp=0x7fb7d56010, ptr=0x7fb7d69088)
    at /root/sergeyb/luajit/src/lj_alloc.c:1405
#1  0x00000055555c1fe4 in lj_alloc_realloc (msp=0x7fb7d56010, ptr=0x7fb7d69088, nsize=1696)
    at /root/sergeyb/luajit/src/lj_alloc.c:1471
#2  0x00000055555c204c in lj_alloc_f (msp=0x7fb7d56010, ptr=0x7fb7d69088, osize=816, nsize=1696)
    at /root/sergeyb/luajit/src/lj_alloc.c:1486
#3  0x00000055555790e0 in lj_mem_realloc (L=0x7fb7d6d330, p=0x7fb7d69088, osz=816, nsz=1696)
    at /root/sergeyb/luajit/src/lj_gc.c:896
#4  0x000000555557e610 in resizestack (L=0x7fb7d6d330, n=204) at /root/sergeyb/luajit/src/lj_state.c:82
#5  0x000000555557e970 in lj_state_growstack (L=0x7fb7d6d330, need=48)
    at /root/sergeyb/luajit/src/lj_state.c:130
#6  0x00000055555fad68 in lj_vm_growstack_l () at buildvm_arm64.dasc:1263
#7  0x00000055555fb8d4 in lj_ff_coroutine_wrap_aux () at buildvm_arm64.dasc:1775
#8  0x000000555556a824 in lua_pcall (L=0x7fb7d56378, nargs=0, nresults=-1, errfunc=2)
    at /root/sergeyb/luajit/src/lj_api.c:1173
#9  0x000000555555d258 in docall (L=0x7fb7d56378, narg=0, clear=0) at /root/sergeyb/luajit/src/luajit.c:134
#10 0x000000555555db9c in handle_script (L=0x7fb7d56378, argx=0x7ffffff280)
    at /root/sergeyb/luajit/src/luajit.c:304
#11 0x000000555555ea54 in pmain (L=0x7fb7d56378) at /root/sergeyb/luajit/src/luajit.c:602
#12 0x00000055555fab90 in lj_BC_FUNCC () at buildvm_arm64.dasc:894
#13 0x000000555556ad90 in lua_cpcall (L=0x7fb7d56378, func=0x555555e898 <pmain>, ud=0x0)
    at /root/sergeyb/luajit/src/lj_api.c:1208
#14 0x000000555555ebb4 in main (argc=4, argv=0x7ffffff268) at /root/sergeyb/luajit/src/luajit.c:633

(gdb)

With commented out first testcase:

(gdb) bt
#0  0x00000055555c18fc in lj_alloc_free (msp=0x7fb7d56010, ptr=0x7fb7d69068)
    at /root/sergeyb/luajit/src/lj_alloc.c:1406
#1  0x00000055555c1fe4 in lj_alloc_realloc (msp=0x7fb7d56010, ptr=0x7fb7d69068, nsize=1696)
    at /root/sergeyb/luajit/src/lj_alloc.c:1471
#2  0x00000055555c204c in lj_alloc_f (msp=0x7fb7d56010, ptr=0x7fb7d69068, osize=816, nsize=1696)
    at /root/sergeyb/luajit/src/lj_alloc.c:1486
#3  0x00000055555790e0 in lj_mem_realloc (L=0x7fb7d6d2a0, p=0x7fb7d69068, osz=816, nsz=1696)
    at /root/sergeyb/luajit/src/lj_gc.c:896
#4  0x000000555557e610 in resizestack (L=0x7fb7d6d2a0, n=204) at /root/sergeyb/luajit/src/lj_state.c:82
#5  0x000000555557e970 in lj_state_growstack (L=0x7fb7d6d2a0, need=48)
    at /root/sergeyb/luajit/src/lj_state.c:130
#6  0x00000055555fad68 in lj_vm_growstack_l () at buildvm_arm64.dasc:1263
#7  0x00000055555fb8d4 in lj_ff_coroutine_wrap_aux () at buildvm_arm64.dasc:1775
#8  0x000000555556a824 in lua_pcall (L=0x7fb7d56378, nargs=0, nresults=-1, errfunc=2)
    at /root/sergeyb/luajit/src/lj_api.c:1173
#9  0x000000555555d258 in docall (L=0x7fb7d56378, narg=0, clear=0) at /root/sergeyb/luajit/src/luajit.c:134
#10 0x000000555555db9c in handle_script (L=0x7fb7d56378, argx=0x7ffffff280)
    at /root/sergeyb/luajit/src/luajit.c:304
#11 0x000000555555ea54 in pmain (L=0x7fb7d56378) at /root/sergeyb/luajit/src/luajit.c:602
#12 0x00000055555fab90 in lj_BC_FUNCC () at buildvm_arm64.dasc:894
#13 0x000000555556ad90 in lua_cpcall (L=0x7fb7d56378, func=0x555555e898 <pmain>, ud=0x0)
    at /root/sergeyb/luajit/src/lj_api.c:1208
#14 0x000000555555ebb4 in main (argc=4, argv=0x7ffffff268) at /root/sergeyb/luajit/src/luajit.c:633
(gdb)


     lj_state_checkstack(L, (MSize)need);
     numparams -= gotparams;
     return numparams >= 0 ? numparams : 0;
diff --git a/src/vm_arm64.dasc b/src/vm_arm64.dasc
index c5f0a7a7..cf8e575a 100644
--- a/src/vm_arm64.dasc
+++ b/src/vm_arm64.dasc
@@ -3779,6 +3779,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
     |   add TMP2, BASE, RC
     |   add LFUNC:CARG3, CARG3, TMP0, lsl #47
     |  add RA, RA, RC
+    |  sub CARG1, CARG1, #8
Please mention in the commit message why the original stack check was
incorrect (for aarch64 and mips64).

Also, mention why the x64 isn't affected:

x64:
| RA == BASE + (RD=NARGS+1)*8 + framesize * 8 +8 > maxstack
The last summand here is the `LJ_FR2` adjustment.

arm64|mips64 -- incorrect check:
| RA == BASE + (RD=NARGS)*8 + framesize * 8 >= maxstack

Added.

      
     |   add TMP0, RC, #16+FRAME_VARG
     |   str LFUNC:CARG3, [TMP2], #8	// Store (tagged) copy of LFUNC.
     |    ldr KBASE, [PC, #-4+PC2PROTO(k)]
diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc
index 44fba36c..7f49df5b 100644
--- a/src/vm_mips64.dasc
+++ b/src/vm_mips64.dasc
<snipped>

diff --git a/test/tarantool-tests/lj-1048-fix-stack-checks-vararg-calls.test.lua b/test/tarantool-tests/lj-1048-fix-stack-checks-vararg-calls.test.lua
new file mode 100644
index 00000000..e300d5c1
--- /dev/null
+++ b/test/tarantool-tests/lj-1048-fix-stack-checks-vararg-calls.test.lua
@@ -0,0 +1,56 @@
+local tap = require('tap')
+
+-- A test file to demonstrate a stack overflow in `pcall()` in
+-- some cases, see below testcase descriptions.
+-- See also https://github.com/LuaJIT/LuaJIT/issues/1048.
+local test = tap.test('lj-1048-fix-stack-checks-vararg-calls'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(2)
+
+-- The first testcase demonstrate a stack overflow in `pcall()`
+-- by recursive calling `pcall()`. The functions are vararg
+-- because stack check in BC_IFUNCV is off by one without the
Minor: by one for the arm64, mips64 architectures.

Updated (here and below):

--- a/test/tarantool-tests/lj-1048-fix-stack-checks-vararg-calls.test.lua
+++ b/test/tarantool-tests/lj-1048-fix-stack-checks-vararg-calls.test.lua
@@ -11,8 +11,8 @@ test:plan(2)
 
 -- The first testcase demonstrate a stack overflow in `pcall()`
 -- by recursive calling `pcall()`. The functions are vararg
--- because stack check in BC_IFUNCV is off by one without the
--- patch.
+-- because stack check in BC_IFUNCV is off by one on ARM64,
+-- MIPS64 without the patch.
 local function prober_1(...) -- luacheck: no unused
   pcall(pcall, pcall, pcall, pcall, pcall, pcall, pcall, pcall, pairs, {})
 end


+-- patch.
+local function prober_1(...) -- luacheck: no unused
+  pcall(pcall, pcall, pcall, pcall, pcall, pcall, pcall, pcall, pairs, {})
+end
Why do we want to use probber_1 here? Why is this different from the
second example? Only because of the metamethods?

If we want to keep it, please describe why we need at least 9 pcall-s.
As I got right, exactly this number of pcall's is needed to trigger a stack overflow.

Also, there is no need for `pairs()` here. Let's use another simpler fast
function (like `type()`).

(discussed in a private conversation)

Updated:

 local function prober_1(...) -- luacheck: no unused
-  pcall(pcall, pcall, pcall, pcall, pcall, pcall, pcall, pcall, pairs, {})
+  -- Any fast function can be used, but `type` is most convenient
+  -- here because it works fast and can be used with any data type.
+  pcall(pcall, pcall, pcall, pcall, pcall, pcall, pcall, pcall, type, 0)
 end
 
 local function looper_1(n, ...)

Also, please add a comment about fast function
usage, see the example below.

+
+local function looper_1(n, ...)
+  prober_1(...)
+  prober_1(nil, ...)
Why do we need `nil` here? I suppose this line is excess, see the
comment with the example below.

Right, removed:

 end
 
 local function looper_1(n, ...)
   prober_1(...)
-  prober_1(nil, ...)
   return looper_1(n + 1, n, ...)
 end
 


+  return looper_1(n + 1, n, ...)
+end
+
+pcall(coroutine.wrap(looper_1), 0)
+
+test:ok(true, 'no stack overflow with recursive pcall')
+
+-- The second testcase demonstrate a stack overflow in `pcall()`
+-- with using metamethods. A stack overflow is triggered when
+-- `pcall()` is used as `__call` metamethod, setting metatable
+-- will cause an infinite loop which fills up the stack with
+-- `pcall`-frames and then keeps going beyond the end of the
+-- stack until it segfaults.
This comment is unrelated to this test.

Updated and now it looks as the following:


-- The testcase demonstrate a stack overflow when `pcall()`
-- is used as `__newindex` metamethod. The function is vararg
-- because stack check in BC_IFUNCV is off by one on ARM64
-- and MIPS64 without the patch.


                               Also, a stack overflow can be
+-- triggered when `pcall()` is used as `__newindex` metamethod.
+-- The functions are vararg because stack check in BC_IFUNCV is
+-- off by one without the patch.
+
+local mt = setmetatable({}, { __newindex = pcall, __call = pairs })
+
+local function prober_2(...) -- luacheck: no unused
+  mt[mt] = mt
+end
+
+local function looper_2(n, ...)
+  prober_2(...)
+  prober_2(nil, ...)
+  return looper_2(n + 1, n, ...)
+end
+
+pcall(coroutine.wrap(looper_2), 0)
This can be simplified to the following:
| src/luajit -e '
| -- Do not use a Lua function as metamethod -- since it will check
| -- the stack on each invocation. Use simple `type()` built-in
| -- instead.
| local t = setmetatable({}, {__newindex = pcall, __call = type})
| local function prober(...)
|     -- Invokes `pcall(t, t, t)`.
|     t[t] = t
| end
| local function looper(n, ...)
|     prober(...)
|     return looper(n+1, n, ...)
| end
| pcall(coroutine.wrap(looper), 0)
| '

Updated (added a comment about FF and removed prober() with nil):

@@ -37,15 +38,18 @@ test:ok(true, 'no stack overflow with recursive pcall')
 -- The functions are vararg because stack check in BC_IFUNCV is
 -- off by one without the patch.
 
-local mt = setmetatable({}, { __newindex = pcall, __call = pairs })
+-- The `type()` function is more convenient here, it works fast
+-- and can be used with any data type. However, any fast function
+-- can be used instead.
+local t = setmetatable({}, { __newindex = pcall, __call = type })
 
 local function prober_2(...) -- luacheck: no unused
-  mt[mt] = mt
+  -- Invokes `pcall(t, t, t)`.
+  t[t] = t
 end
 
 local function looper_2(n, ...)
   prober_2(...)
-  prober_2(nil, ...)
   return looper_2(n + 1, n, ...)
 end


      
+
+test:ok(true, 'no stack overflow with using metamethod')
+
+test:done(true)
-- 
2.43.0