Hello, Sergey!
Thanks for the patch! LGTM with a minor below.
Sergey
/stack/stack space/?From: Mike Pall <mike> Reported by pwnhacker0x18. Fixed by Peter Cawley. (cherry picked from commit 811c5322c8ab6bdbb6784cd43aa57041a1cc9360) `lj_snap_restore()` restores the PC for the inner cframe, but not the outer (before the protected call to the `trace_exit_cp()`). If the stack overflow is observed during the further snapshot restoration, it doesn't fix up the outer cframe's PC. After that, in the following error rethrowing from the right C frame, in case of error handler set, the stack overflow error may be raised again, and with an incorrect value of the PC for that frame, it leads to the crash in the `debug_framepc()`. This patch prevents it by inserting the special pseudo-valid value `L`. Unfortunately, this leads to the uninitialized reads by the `debug_framepc()` (by the address `L - 4`), if the error handler observes the resulted PC. This will be fixed in the next patch. Sergey Kaplun: * added the description and the test for the problem Part of tarantool/tarantool#11278 --- src/lj_trace.c | 4 +- .../lj-1196-partial-snap-restore.test.lua | 51 +++++++++++++++++++ 2 files changed, 54 insertions(+), 1 deletion(-) create mode 100644 test/tarantool-tests/lj-1196-partial-snap-restore.test.lua diff --git a/src/lj_trace.c b/src/lj_trace.c index 0d1d233a..8a18d3cf 100644 --- a/src/lj_trace.c +++ b/src/lj_trace.c @@ -909,8 +909,10 @@ int LJ_FASTCALL lj_trace_exit(jit_State *J, void *exptr) exd.J = J; exd.exptr = exptr; errcode = lj_vm_cpcall(L, NULL, &exd, trace_exit_cp); - if (errcode) + if (errcode) { + setcframe_pc(cframe_raw(L->cframe), L); /* Point to any valid memory. */ return -errcode; /* Return negated error code. */ + } if (exitcode) copyTV(L, L->top++, &exiterr); /* Anchor the error object. */ diff --git a/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua new file mode 100644 index 00000000..8ee8f673 --- /dev/null +++ b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua @@ -0,0 +1,51 @@ +local tap = require('tap') + +-- Test file to demonstrate LuaJIT crash during snapshot restore +-- in case of the stack overflow. +-- See also: https://github.com/LuaJIT/LuaJIT/issues/1196. + +local test = tap.test('lj-1196-partial-snap-restore') + +test:plan(1) + +-- XXX: The reproducer below uses several stack slot offsets to +-- make sure that stack overflow happens during the snapshot +-- restoration and not the call to the stitched function and +-- return its result. The actual stack size should be less than +-- `LJ_STACK_MAXEX`, but with requested space it should be greater +-- than `LJ_STACK_MAX`, see <src/lj_state.c> for the details. +-- Before that, the `lj_snap_restore()` restores the `pc` for the +-- inner cframe, but not the outer (before the protected call to +-- the `trace_exit_cp()`). Thus, the further error rethrowing from +-- the right C frame leads to the crash before the patch. + +-- XXX: Simplify the `jit.dump()` output. +local tonumber = tonumber + +-- This function starts the first trace. +local function recursive_f() + -- Function with the single result to cause the trace stitching. + tonumber('') + -- Prereserved stack space before the call. + -- luacheck: no unused + local _, _, _, _, _, _, _, _, _, _, _ + -- Link from the stitched trace to the parent one. + recursive_f() + -- Additional stack required for the snapshot restoration.
+ -- luacheck: no unused + local _, _, _ +end + +-- Use coroutine wrap for the fixed stack size at the start. +coroutine.wrap(function() + -- XXX: Special stack slot offset. + -- luacheck: no unused + local _, _, _, _, _, _, _, _, _, _ + -- The error is observed only if we have the error handler set, + -- since we try to resize stack for its call. + xpcall(recursive_f, function() end) +end)() + +test:ok(true, 'no crash during snapshot restoring') + +test:done(true)