Hello, Sergey!

Thanks for the patch! LGTM with a minor below.

Sergey

On 6/10/25 13:28, Sergey Kaplun wrote:
From: Mike Pall <mike>

Reported by pwnhacker0x18. Fixed by Peter Cawley.

(cherry picked from commit 811c5322c8ab6bdbb6784cd43aa57041a1cc9360)

`lj_snap_restore()` restores the PC for the inner cframe, but not the
outer (before the protected call to the `trace_exit_cp()`). If the stack
overflow is observed during the further snapshot restoration, it doesn't
fix up the outer cframe's PC. After that, in the following error
rethrowing from the right C frame, in case of error handler set, the
stack overflow error may be raised again, and with an incorrect value of
the PC for that frame, it leads to the crash in the `debug_framepc()`.

This patch prevents it by inserting the special pseudo-valid value `L`.
Unfortunately, this leads to the uninitialized reads by the
`debug_framepc()` (by the address `L - 4`), if the error handler
observes the resulted PC. This will be fixed in the next patch.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#11278
---
 src/lj_trace.c                                |  4 +-
 .../lj-1196-partial-snap-restore.test.lua     | 51 +++++++++++++++++++
 2 files changed, 54 insertions(+), 1 deletion(-)
 create mode 100644 test/tarantool-tests/lj-1196-partial-snap-restore.test.lua

diff --git a/src/lj_trace.c b/src/lj_trace.c
index 0d1d233a..8a18d3cf 100644
--- a/src/lj_trace.c
+++ b/src/lj_trace.c
@@ -909,8 +909,10 @@ int LJ_FASTCALL lj_trace_exit(jit_State *J, void *exptr)
   exd.J = J;
   exd.exptr = exptr;
   errcode = lj_vm_cpcall(L, NULL, &exd, trace_exit_cp);
-  if (errcode)
+  if (errcode) {
+    setcframe_pc(cframe_raw(L->cframe), L);  /* Point to any valid memory. */
     return -errcode;  /* Return negated error code. */
+  }
 
   if (exitcode) copyTV(L, L->top++, &exiterr);  /* Anchor the error object. */
 
diff --git a/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
new file mode 100644
index 00000000..8ee8f673
--- /dev/null
+++ b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
@@ -0,0 +1,51 @@
+local tap = require('tap')
+
+-- Test file to demonstrate LuaJIT crash during snapshot restore
+-- in case of the stack overflow.
+-- See also: https://github.com/LuaJIT/LuaJIT/issues/1196.
+
+local test = tap.test('lj-1196-partial-snap-restore')
+
+test:plan(1)
+
+-- XXX: The reproducer below uses several stack slot offsets to
+-- make sure that stack overflow happens during the snapshot
+-- restoration and not the call to the stitched function and
+-- return its result. The actual stack size should be less than
+-- `LJ_STACK_MAXEX`, but with requested space it should be greater
+-- than `LJ_STACK_MAX`, see <src/lj_state.c> for the details.
+-- Before that, the `lj_snap_restore()` restores the `pc` for the
+-- inner cframe, but not the outer (before the protected call to
+-- the `trace_exit_cp()`). Thus, the further error rethrowing from
+-- the right C frame leads to the crash before the patch.
+
+-- XXX: Simplify the `jit.dump()` output.
+local tonumber = tonumber
+
+-- This function starts the first trace.
+local function recursive_f()
+  -- Function with the single result to cause the trace stitching.
+  tonumber('')
+  -- Prereserved stack space before the call.
+  -- luacheck: no unused
+  local _, _, _, _, _, _, _, _, _, _, _
+  -- Link from the stitched trace to the parent one.
+  recursive_f()
+  -- Additional stack required for the snapshot restoration.
/stack/stack space/?
+  -- luacheck: no unused
+  local _, _, _
+end
+
+-- Use coroutine wrap for the fixed stack size at the start.
+coroutine.wrap(function()
+  -- XXX: Special stack slot offset.
+  -- luacheck: no unused
+  local _, _, _, _, _, _, _, _, _, _
+  -- The error is observed only if we have the error handler set,
+  -- since we try to resize stack for its call.
+  xpcall(recursive_f, function() end)
+end)()
+
+test:ok(true, 'no crash during snapshot restoring')
+
+test:done(true)