* [Tarantool-patches] [PATCH luajit 0/2] Snapshot restore with stack overflow
@ 2025-06-10 10:28 Sergey Kaplun via Tarantool-patches
2025-06-10 10:28 ` [Tarantool-patches] [PATCH luajit 1/2] Handle partial snapshot restore due to " Sergey Kaplun via Tarantool-patches
2025-06-10 10:28 ` [Tarantool-patches] [PATCH luajit 2/2] Different fix for " Sergey Kaplun via Tarantool-patches
0 siblings, 2 replies; 8+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-06-10 10:28 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches
This patchset fixes the issue with stackoverflow during the snapshot
restoration. The first patch solves the problem in the incorrect way,
since it leads to the uninitalized read. The second patch fixes the
issue correctly.
Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1196-partial-snap-restore
Related issues:
* https://github.com/LuaJIT/LuaJIT/issues/1196
* https://github.com/tarantool/tarantool/issues/11278
Related ML link: https://www.freelists.org/post/luajit/Invalid-read-found-by-valgrind
Mike Pall (2):
Handle partial snapshot restore due to stack overflow.
Different fix for partial snapshot restore due to stack overflow.
src/lj_snap.c | 3 +-
.../lj-1196-partial-snap-restore.test.lua | 51 +++++++++++++++
...-1196-stack-overflow-snap-restore.test.lua | 65 +++++++++++++++++++
3 files changed, 118 insertions(+), 1 deletion(-)
create mode 100644 test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
create mode 100644 test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua
--
2.49.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Tarantool-patches] [PATCH luajit 1/2] Handle partial snapshot restore due to stack overflow.
2025-06-10 10:28 [Tarantool-patches] [PATCH luajit 0/2] Snapshot restore with stack overflow Sergey Kaplun via Tarantool-patches
@ 2025-06-10 10:28 ` Sergey Kaplun via Tarantool-patches
2025-06-10 15:29 ` Sergey Bronnikov via Tarantool-patches
2025-06-10 10:28 ` [Tarantool-patches] [PATCH luajit 2/2] Different fix for " Sergey Kaplun via Tarantool-patches
1 sibling, 1 reply; 8+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-06-10 10:28 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
Reported by pwnhacker0x18. Fixed by Peter Cawley.
(cherry picked from commit 811c5322c8ab6bdbb6784cd43aa57041a1cc9360)
`lj_snap_restore()` restores the PC for the inner cframe, but not the
outer (before the protected call to the `trace_exit_cp()`). If the stack
overflow is observed during the further snapshot restoration, it doesn't
fix up the outer cframe's PC. After that, in the following error
rethrowing from the right C frame, in case of error handler set, the
stack overflow error may be raised again, and with an incorrect value of
the PC for that frame, it leads to the crash in the `debug_framepc()`.
This patch prevents it by inserting the special pseudo-valid value `L`.
Unfortunately, this leads to the uninitialized reads by the
`debug_framepc()` (by the address `L - 4`), if the error handler
observes the resulted PC. This will be fixed in the next patch.
Sergey Kaplun:
* added the description and the test for the problem
Part of tarantool/tarantool#11278
---
src/lj_trace.c | 4 +-
.../lj-1196-partial-snap-restore.test.lua | 51 +++++++++++++++++++
2 files changed, 54 insertions(+), 1 deletion(-)
create mode 100644 test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
diff --git a/src/lj_trace.c b/src/lj_trace.c
index 0d1d233a..8a18d3cf 100644
--- a/src/lj_trace.c
+++ b/src/lj_trace.c
@@ -909,8 +909,10 @@ int LJ_FASTCALL lj_trace_exit(jit_State *J, void *exptr)
exd.J = J;
exd.exptr = exptr;
errcode = lj_vm_cpcall(L, NULL, &exd, trace_exit_cp);
- if (errcode)
+ if (errcode) {
+ setcframe_pc(cframe_raw(L->cframe), L); /* Point to any valid memory. */
return -errcode; /* Return negated error code. */
+ }
if (exitcode) copyTV(L, L->top++, &exiterr); /* Anchor the error object. */
diff --git a/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
new file mode 100644
index 00000000..8ee8f673
--- /dev/null
+++ b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
@@ -0,0 +1,51 @@
+local tap = require('tap')
+
+-- Test file to demonstrate LuaJIT crash during snapshot restore
+-- in case of the stack overflow.
+-- See also: https://github.com/LuaJIT/LuaJIT/issues/1196.
+
+local test = tap.test('lj-1196-partial-snap-restore')
+
+test:plan(1)
+
+-- XXX: The reproducer below uses several stack slot offsets to
+-- make sure that stack overflow happens during the snapshot
+-- restoration and not the call to the stitched function and
+-- return its result. The actual stack size should be less than
+-- `LJ_STACK_MAXEX`, but with requested space it should be greater
+-- than `LJ_STACK_MAX`, see <src/lj_state.c> for the details.
+-- Before that, the `lj_snap_restore()` restores the `pc` for the
+-- inner cframe, but not the outer (before the protected call to
+-- the `trace_exit_cp()`). Thus, the further error rethrowing from
+-- the right C frame leads to the crash before the patch.
+
+-- XXX: Simplify the `jit.dump()` output.
+local tonumber = tonumber
+
+-- This function starts the first trace.
+local function recursive_f()
+ -- Function with the single result to cause the trace stitching.
+ tonumber('')
+ -- Prereserved stack space before the call.
+ -- luacheck: no unused
+ local _, _, _, _, _, _, _, _, _, _, _
+ -- Link from the stitched trace to the parent one.
+ recursive_f()
+ -- Additional stack required for the snapshot restoration.
+ -- luacheck: no unused
+ local _, _, _
+end
+
+-- Use coroutine wrap for the fixed stack size at the start.
+coroutine.wrap(function()
+ -- XXX: Special stack slot offset.
+ -- luacheck: no unused
+ local _, _, _, _, _, _, _, _, _, _
+ -- The error is observed only if we have the error handler set,
+ -- since we try to resize stack for its call.
+ xpcall(recursive_f, function() end)
+end)()
+
+test:ok(true, 'no crash during snapshot restoring')
+
+test:done(true)
--
2.49.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Tarantool-patches] [PATCH luajit 2/2] Different fix for partial snapshot restore due to stack overflow.
2025-06-10 10:28 [Tarantool-patches] [PATCH luajit 0/2] Snapshot restore with stack overflow Sergey Kaplun via Tarantool-patches
2025-06-10 10:28 ` [Tarantool-patches] [PATCH luajit 1/2] Handle partial snapshot restore due to " Sergey Kaplun via Tarantool-patches
@ 2025-06-10 10:28 ` Sergey Kaplun via Tarantool-patches
2025-06-10 16:15 ` Sergey Bronnikov via Tarantool-patches
1 sibling, 1 reply; 8+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-06-10 10:28 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
Reported by Junlong Li. Fixed by Peter Cawley.
(cherry picked from commit 86e7123bb1782a5f200ba5e83b8c4f3fbad4f7bc)
This patch is a follow-up to the previous commit, which leads to a dirty
read of the pseudo-valid PC set for the cframe on snapshot restoration.
To avoid these dirty reads, this patch sets the PC to the outer frame
as well before possible error throwing.
Sergey Kaplun:
* added the description and the test for the problem
Part of tarantool/tarantool#11278
---
src/lj_snap.c | 3 +-
src/lj_trace.c | 4 +-
...-1196-stack-overflow-snap-restore.test.lua | 65 +++++++++++++++++++
3 files changed, 68 insertions(+), 4 deletions(-)
create mode 100644 test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua
diff --git a/src/lj_snap.c b/src/lj_snap.c
index 8d7bd868..4cfae579 100644
--- a/src/lj_snap.c
+++ b/src/lj_snap.c
@@ -955,7 +955,8 @@ const BCIns *lj_snap_restore(jit_State *J, void *exptr)
lua_State *L = J->L;
/* Set interpreter PC to the next PC to get correct error messages. */
- setcframe_pc(cframe_raw(L->cframe), pc+1);
+ setcframe_pc(L->cframe, pc+1);
+ setcframe_pc(cframe_raw(cframe_prev(L->cframe)), pc);
/* Make sure the stack is big enough for the slots from the snapshot. */
if (LJ_UNLIKELY(L->base + snap->topslot >= tvref(L->maxstack))) {
diff --git a/src/lj_trace.c b/src/lj_trace.c
index 8a18d3cf..0d1d233a 100644
--- a/src/lj_trace.c
+++ b/src/lj_trace.c
@@ -909,10 +909,8 @@ int LJ_FASTCALL lj_trace_exit(jit_State *J, void *exptr)
exd.J = J;
exd.exptr = exptr;
errcode = lj_vm_cpcall(L, NULL, &exd, trace_exit_cp);
- if (errcode) {
- setcframe_pc(cframe_raw(L->cframe), L); /* Point to any valid memory. */
+ if (errcode)
return -errcode; /* Return negated error code. */
- }
if (exitcode) copyTV(L, L->top++, &exiterr); /* Anchor the error object. */
diff --git a/test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua b/test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua
new file mode 100644
index 00000000..942d1f82
--- /dev/null
+++ b/test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua
@@ -0,0 +1,65 @@
+local tap = require('tap')
+
+-- Test file to demonstrate LuaJIT dirty reads after stack
+-- overflow during restoration from the snapshot.
+-- The test fails before the patch under Valgrind.
+--
+-- luacheck: push no max_comment_line_length
+--
+-- See also: https://github.com/LuaJIT/LuaJIT/issues/1196,
+-- https://www.freelists.org/post/luajit/Invalid-read-found-by-valgrind.
+--
+-- luacheck: pop
+
+local test = tap.test('lj-1196-stack-overflow-snap-restore')
+
+test:plan(4)
+
+-- XXX: This file has the same tests as the
+-- <test/LuaJIT-tests/lang/stackov.lua>, but without disabling the
+-- compilation for the given functions. Hence, the check here is
+-- less strict -- we just check that there are no dirty reads,
+-- uninitialized memory access, etc.
+
+local function recursive_f_noarg()
+ recursive_f_noarg()
+end
+
+local function recursive_one_arg(argument)
+ recursive_one_arg(argument)
+end
+
+local function recursive_f_vararg(...)
+ recursive_f_vararg(1, ...)
+end
+
+local function recursive_f_vararg_tail(...)
+ return recursive_f_vararg_tail(1, ...)
+end
+
+-- Use `coroutine.wrap()`, for independent stack sizes.
+-- The invalid read is done by the error handler
+-- `debug.traceback()`, since it observes the pseudo PC (`L`) and
+-- reads the memory by `L - 4` address before the patch.
+
+coroutine.wrap(function()
+ local status = xpcall(recursive_f_noarg, debug.traceback)
+ test:ok(not status, 'correct status, recursive no arguments')
+end)()
+
+coroutine.wrap(function()
+ local status = xpcall(recursive_one_arg, debug.traceback, 1)
+ test:ok(not status, 'correct status, recursive one argument')
+end)()
+
+coroutine.wrap(function()
+ local status = xpcall(recursive_f_vararg, debug.traceback, 1)
+ test:ok(not status, 'correct status, recursive vararg')
+end)()
+
+coroutine.wrap(function()
+ local status = xpcall(recursive_f_vararg_tail, debug.traceback, 1)
+ test:ok(not status, 'correct status, recursive vararg tail')
+end)()
+
+test:done(true)
--
2.49.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 1/2] Handle partial snapshot restore due to stack overflow.
2025-06-10 10:28 ` [Tarantool-patches] [PATCH luajit 1/2] Handle partial snapshot restore due to " Sergey Kaplun via Tarantool-patches
@ 2025-06-10 15:29 ` Sergey Bronnikov via Tarantool-patches
2025-06-10 16:03 ` Sergey Kaplun via Tarantool-patches
0 siblings, 1 reply; 8+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2025-06-10 15:29 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
[-- Attachment #1: Type: text/plain, Size: 4293 bytes --]
Hello, Sergey!
Thanks for the patch! LGTM with a minor below.
Sergey
On 6/10/25 13:28, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Reported by pwnhacker0x18. Fixed by Peter Cawley.
>
> (cherry picked from commit 811c5322c8ab6bdbb6784cd43aa57041a1cc9360)
>
> `lj_snap_restore()` restores the PC for the inner cframe, but not the
> outer (before the protected call to the `trace_exit_cp()`). If the stack
> overflow is observed during the further snapshot restoration, it doesn't
> fix up the outer cframe's PC. After that, in the following error
> rethrowing from the right C frame, in case of error handler set, the
> stack overflow error may be raised again, and with an incorrect value of
> the PC for that frame, it leads to the crash in the `debug_framepc()`.
>
> This patch prevents it by inserting the special pseudo-valid value `L`.
> Unfortunately, this leads to the uninitialized reads by the
> `debug_framepc()` (by the address `L - 4`), if the error handler
> observes the resulted PC. This will be fixed in the next patch.
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#11278
> ---
> src/lj_trace.c | 4 +-
> .../lj-1196-partial-snap-restore.test.lua | 51 +++++++++++++++++++
> 2 files changed, 54 insertions(+), 1 deletion(-)
> create mode 100644 test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
>
> diff --git a/src/lj_trace.c b/src/lj_trace.c
> index 0d1d233a..8a18d3cf 100644
> --- a/src/lj_trace.c
> +++ b/src/lj_trace.c
> @@ -909,8 +909,10 @@ int LJ_FASTCALL lj_trace_exit(jit_State *J, void *exptr)
> exd.J = J;
> exd.exptr = exptr;
> errcode = lj_vm_cpcall(L, NULL, &exd, trace_exit_cp);
> - if (errcode)
> + if (errcode) {
> + setcframe_pc(cframe_raw(L->cframe), L); /* Point to any valid memory. */
> return -errcode; /* Return negated error code. */
> + }
>
> if (exitcode) copyTV(L, L->top++, &exiterr); /* Anchor the error object. */
>
> diff --git a/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
> new file mode 100644
> index 00000000..8ee8f673
> --- /dev/null
> +++ b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
> @@ -0,0 +1,51 @@
> +local tap = require('tap')
> +
> +-- Test file to demonstrate LuaJIT crash during snapshot restore
> +-- in case of the stack overflow.
> +-- See also:https://github.com/LuaJIT/LuaJIT/issues/1196.
> +
> +local test = tap.test('lj-1196-partial-snap-restore')
> +
> +test:plan(1)
> +
> +-- XXX: The reproducer below uses several stack slot offsets to
> +-- make sure that stack overflow happens during the snapshot
> +-- restoration and not the call to the stitched function and
> +-- return its result. The actual stack size should be less than
> +-- `LJ_STACK_MAXEX`, but with requested space it should be greater
> +-- than `LJ_STACK_MAX`, see <src/lj_state.c> for the details.
> +-- Before that, the `lj_snap_restore()` restores the `pc` for the
> +-- inner cframe, but not the outer (before the protected call to
> +-- the `trace_exit_cp()`). Thus, the further error rethrowing from
> +-- the right C frame leads to the crash before the patch.
> +
> +-- XXX: Simplify the `jit.dump()` output.
> +local tonumber = tonumber
> +
> +-- This function starts the first trace.
> +local function recursive_f()
> + -- Function with the single result to cause the trace stitching.
> + tonumber('')
> + -- Prereserved stack space before the call.
> + -- luacheck: no unused
> + local _, _, _, _, _, _, _, _, _, _, _
> + -- Link from the stitched trace to the parent one.
> + recursive_f()
> + -- Additional stack required for the snapshot restoration.
/stack/stack space/?
> + -- luacheck: no unused
> + local _, _, _
> +end
> +
> +-- Use coroutine wrap for the fixed stack size at the start.
> +coroutine.wrap(function()
> + -- XXX: Special stack slot offset.
> + -- luacheck: no unused
> + local _, _, _, _, _, _, _, _, _, _
> + -- The error is observed only if we have the error handler set,
> + -- since we try to resize stack for its call.
> + xpcall(recursive_f, function() end)
> +end)()
> +
> +test:ok(true, 'no crash during snapshot restoring')
> +
> +test:done(true)
[-- Attachment #2: Type: text/html, Size: 4901 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 1/2] Handle partial snapshot restore due to stack overflow.
2025-06-10 15:29 ` Sergey Bronnikov via Tarantool-patches
@ 2025-06-10 16:03 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 8+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-06-10 16:03 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the review!
Fixed your comment and force-pushed the branch.
On 10.06.25, Sergey Bronnikov wrote:
> Hello, Sergey!
>
> Thanks for the patch! LGTM with a minor below.
>
> Sergey
>
> On 6/10/25 13:28, Sergey Kaplun wrote:
> > From: Mike Pall <mike>
> >
> > Reported by pwnhacker0x18. Fixed by Peter Cawley.
> >
> > (cherry picked from commit 811c5322c8ab6bdbb6784cd43aa57041a1cc9360)
> >
> > `lj_snap_restore()` restores the PC for the inner cframe, but not the
> > outer (before the protected call to the `trace_exit_cp()`). If the stack
> > overflow is observed during the further snapshot restoration, it doesn't
> > fix up the outer cframe's PC. After that, in the following error
> > rethrowing from the right C frame, in case of error handler set, the
> > stack overflow error may be raised again, and with an incorrect value of
> > the PC for that frame, it leads to the crash in the `debug_framepc()`.
> >
> > This patch prevents it by inserting the special pseudo-valid value `L`.
> > Unfortunately, this leads to the uninitialized reads by the
> > `debug_framepc()` (by the address `L - 4`), if the error handler
> > observes the resulted PC. This will be fixed in the next patch.
> >
> > Sergey Kaplun:
> > * added the description and the test for the problem
> >
> > Part of tarantool/tarantool#11278
> > ---
<snipped>
> > +-- This function starts the first trace.
> > +local function recursive_f()
> > + -- Function with the single result to cause the trace stitching.
> > + tonumber('')
> > + -- Prereserved stack space before the call.
> > + -- luacheck: no unused
> > + local _, _, _, _, _, _, _, _, _, _, _
> > + -- Link from the stitched trace to the parent one.
> > + recursive_f()
> > + -- Additional stack required for the snapshot restoration.
> /stack/stack space/?
Rephrased:
===================================================================
diff --git a/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
index 8ee8f673..4ab78d31 100644
--- a/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
+++ b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
@@ -31,7 +31,7 @@ local function recursive_f()
local _, _, _, _, _, _, _, _, _, _, _
-- Link from the stitched trace to the parent one.
recursive_f()
- -- Additional stack required for the snapshot restoration.
+ -- Additional stack space required for the snapshot restoration.
-- luacheck: no unused
local _, _, _
end
===================================================================
> > + -- luacheck: no unused
> > + local _, _, _
> > +end
> > +
> > +-- Use coroutine wrap for the fixed stack size at the start.
> > +coroutine.wrap(function()
> > + -- XXX: Special stack slot offset.
> > + -- luacheck: no unused
> > + local _, _, _, _, _, _, _, _, _, _
> > + -- The error is observed only if we have the error handler set,
> > + -- since we try to resize stack for its call.
> > + xpcall(recursive_f, function() end)
> > +end)()
> > +
> > +test:ok(true, 'no crash during snapshot restoring')
> > +
> > +test:done(true)
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 2/2] Different fix for partial snapshot restore due to stack overflow.
2025-06-10 10:28 ` [Tarantool-patches] [PATCH luajit 2/2] Different fix for " Sergey Kaplun via Tarantool-patches
@ 2025-06-10 16:15 ` Sergey Bronnikov via Tarantool-patches
2025-06-10 16:22 ` Sergey Kaplun via Tarantool-patches
0 siblings, 1 reply; 8+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2025-06-10 16:15 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
[-- Attachment #1: Type: text/plain, Size: 4879 bytes --]
Hello, Sergey!
Thanks for the patch! See a comment below.
Sergey
On 6/10/25 13:28, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Reported by Junlong Li. Fixed by Peter Cawley.
>
> (cherry picked from commit 86e7123bb1782a5f200ba5e83b8c4f3fbad4f7bc)
>
> This patch is a follow-up to the previous commit, which leads to a dirty
> read of the pseudo-valid PC set for the cframe on snapshot restoration.
> To avoid these dirty reads, this patch sets the PC to the outer frame
> as well before possible error throwing.
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#11278
> ---
> src/lj_snap.c | 3 +-
> src/lj_trace.c | 4 +-
> ...-1196-stack-overflow-snap-restore.test.lua | 65 +++++++++++++++++++
> 3 files changed, 68 insertions(+), 4 deletions(-)
> create mode 100644 test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua
>
> diff --git a/src/lj_snap.c b/src/lj_snap.c
> index 8d7bd868..4cfae579 100644
> --- a/src/lj_snap.c
> +++ b/src/lj_snap.c
> @@ -955,7 +955,8 @@ const BCIns *lj_snap_restore(jit_State *J, void *exptr)
> lua_State *L = J->L;
>
> /* Set interpreter PC to the next PC to get correct error messages. */
> - setcframe_pc(cframe_raw(L->cframe), pc+1);
> + setcframe_pc(L->cframe, pc+1);
> + setcframe_pc(cframe_raw(cframe_prev(L->cframe)), pc);
>
> /* Make sure the stack is big enough for the slots from the snapshot. */
> if (LJ_UNLIKELY(L->base + snap->topslot >= tvref(L->maxstack))) {
> diff --git a/src/lj_trace.c b/src/lj_trace.c
> index 8a18d3cf..0d1d233a 100644
> --- a/src/lj_trace.c
> +++ b/src/lj_trace.c
> @@ -909,10 +909,8 @@ int LJ_FASTCALL lj_trace_exit(jit_State *J, void *exptr)
> exd.J = J;
> exd.exptr = exptr;
> errcode = lj_vm_cpcall(L, NULL, &exd, trace_exit_cp);
> - if (errcode) {
> - setcframe_pc(cframe_raw(L->cframe), L); /* Point to any valid memory. */
> + if (errcode)
> return -errcode; /* Return negated error code. */
> - }
>
> if (exitcode) copyTV(L, L->top++, &exiterr); /* Anchor the error object. */
>
> diff --git a/test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua b/test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua
> new file mode 100644
> index 00000000..942d1f82
> --- /dev/null
> +++ b/test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua
> @@ -0,0 +1,65 @@
> +local tap = require('tap')
> +
> +-- Test file to demonstrate LuaJIT dirty reads after stack
> +-- overflow during restoration from the snapshot.
> +-- The test fails before the patch under Valgrind.
Please specify valgrind option that is required for reproducing the bug.
Cannot reproduce with command line below with reverted patch:
VALGRIND_OPTS="--leak-check=no --malloc-fill=0x00 --free-fill=0x00"
ctest -V -R test/tarantool-tests/lj-1196-partial-snap-restore.test.lua -V
> +--
> +-- luacheck: push no max_comment_line_length
> +--
> +-- See also:https://github.com/LuaJIT/LuaJIT/issues/1196,
> +--https://www.freelists.org/post/luajit/Invalid-read-found-by-valgrind.
> +--
> +-- luacheck: pop
> +
> +local test = tap.test('lj-1196-stack-overflow-snap-restore')
> +
> +test:plan(4)
> +
> +-- XXX: This file has the same tests as the
> +-- <test/LuaJIT-tests/lang/stackov.lua>, but without disabling the
> +-- compilation for the given functions. Hence, the check here is
> +-- less strict -- we just check that there are no dirty reads,
> +-- uninitialized memory access, etc.
> +
> +local function recursive_f_noarg()
> + recursive_f_noarg()
> +end
> +
> +local function recursive_one_arg(argument)
> + recursive_one_arg(argument)
> +end
> +
> +local function recursive_f_vararg(...)
> + recursive_f_vararg(1, ...)
> +end
> +
> +local function recursive_f_vararg_tail(...)
> + return recursive_f_vararg_tail(1, ...)
> +end
> +
> +-- Use `coroutine.wrap()`, for independent stack sizes.
> +-- The invalid read is done by the error handler
> +-- `debug.traceback()`, since it observes the pseudo PC (`L`) and
> +-- reads the memory by `L - 4` address before the patch.
> +
> +coroutine.wrap(function()
> + local status = xpcall(recursive_f_noarg, debug.traceback)
> +test:ok(not status, 'correct status, recursive no arguments')
> +end)()
> +
> +coroutine.wrap(function()
> + local status = xpcall(recursive_one_arg, debug.traceback, 1)
> +test:ok(not status, 'correct status, recursive one argument')
> +end)()
> +
> +coroutine.wrap(function()
> + local status = xpcall(recursive_f_vararg, debug.traceback, 1)
> +test:ok(not status, 'correct status, recursive vararg')
> +end)()
> +
> +coroutine.wrap(function()
> + local status = xpcall(recursive_f_vararg_tail, debug.traceback, 1)
> +test:ok(not status, 'correct status, recursive vararg tail')
> +end)()
> +
> +test:done(true)
[-- Attachment #2: Type: text/html, Size: 5865 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 2/2] Different fix for partial snapshot restore due to stack overflow.
2025-06-10 16:15 ` Sergey Bronnikov via Tarantool-patches
@ 2025-06-10 16:22 ` Sergey Kaplun via Tarantool-patches
2025-06-11 9:36 ` Sergey Bronnikov via Tarantool-patches
0 siblings, 1 reply; 8+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-06-10 16:22 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches
Hi, Sergey!
Thanks for the review!
Please, consider my answer below.
On 10.06.25, Sergey Bronnikov wrote:
> Hello, Sergey!
>
> Thanks for the patch! See a comment below.
>
> Sergey
>
> On 6/10/25 13:28, Sergey Kaplun wrote:
> > From: Mike Pall <mike>
> >
> > Reported by Junlong Li. Fixed by Peter Cawley.
> >
> > (cherry picked from commit 86e7123bb1782a5f200ba5e83b8c4f3fbad4f7bc)
> >
> > This patch is a follow-up to the previous commit, which leads to a dirty
> > read of the pseudo-valid PC set for the cframe on snapshot restoration.
> > To avoid these dirty reads, this patch sets the PC to the outer frame
> > as well before possible error throwing.
> >
> > Sergey Kaplun:
> > * added the description and the test for the problem
> >
> > Part of tarantool/tarantool#11278
> > ---
> > src/lj_snap.c | 3 +-
> > src/lj_trace.c | 4 +-
> > ...-1196-stack-overflow-snap-restore.test.lua | 65 +++++++++++++++++++
> > 3 files changed, 68 insertions(+), 4 deletions(-)
> > create mode 100644 test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua
> >
> > diff --git a/src/lj_snap.c b/src/lj_snap.c
> > index 8d7bd868..4cfae579 100644
> > --- a/src/lj_snap.c
> > +++ b/src/lj_snap.c
<snipped>
> > diff --git a/test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua b/test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua
> > new file mode 100644
> > index 00000000..942d1f82
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua
> > @@ -0,0 +1,65 @@
> > +local tap = require('tap')
> > +
> > +-- Test file to demonstrate LuaJIT dirty reads after stack
> > +-- overflow during restoration from the snapshot.
> > +-- The test fails before the patch under Valgrind.
>
> Please specify valgrind option that is required for reproducing the bug.
This just mean the default valgrind settings (without any flags).
>
> Cannot reproduce with command line below with reverted patch:
>
> VALGRIND_OPTS="--leak-check=no --malloc-fill=0x00 --free-fill=0x00"
> ctest -V -R test/tarantool-tests/lj-1196-partial-snap-restore.test.lua -V
This doesn't fail due to --leadk-check=no flag.
Also, I suppose this should fail under ASan.
>
> > +--
> > +-- luacheck: push no max_comment_line_length
> > +--
> > +-- See also:https://github.com/LuaJIT/LuaJIT/issues/1196,
> > +--https://www.freelists.org/post/luajit/Invalid-read-found-by-valgrind.
> > +--
> > +-- luacheck: pop
> > +
<snipped>
> > +test:done(true)
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit 2/2] Different fix for partial snapshot restore due to stack overflow.
2025-06-10 16:22 ` Sergey Kaplun via Tarantool-patches
@ 2025-06-11 9:36 ` Sergey Bronnikov via Tarantool-patches
0 siblings, 0 replies; 8+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2025-06-11 9:36 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
[-- Attachment #1: Type: text/plain, Size: 2963 bytes --]
Hi, Sergey,
LGTM
On 6/10/25 19:22, Sergey Kaplun wrote:
> Hi, Sergey!
> Thanks for the review!
> Please, consider my answer below.
>
> On 10.06.25, Sergey Bronnikov wrote:
>> Hello, Sergey!
>>
>> Thanks for the patch! See a comment below.
>>
>> Sergey
>>
>> On 6/10/25 13:28, Sergey Kaplun wrote:
>>> From: Mike Pall <mike>
>>>
>>> Reported by Junlong Li. Fixed by Peter Cawley.
>>>
>>> (cherry picked from commit 86e7123bb1782a5f200ba5e83b8c4f3fbad4f7bc)
>>>
>>> This patch is a follow-up to the previous commit, which leads to a dirty
>>> read of the pseudo-valid PC set for the cframe on snapshot restoration.
>>> To avoid these dirty reads, this patch sets the PC to the outer frame
>>> as well before possible error throwing.
>>>
>>> Sergey Kaplun:
>>> * added the description and the test for the problem
>>>
>>> Part of tarantool/tarantool#11278
>>> ---
>>> src/lj_snap.c | 3 +-
>>> src/lj_trace.c | 4 +-
>>> ...-1196-stack-overflow-snap-restore.test.lua | 65 +++++++++++++++++++
>>> 3 files changed, 68 insertions(+), 4 deletions(-)
>>> create mode 100644 test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua
>>>
>>> diff --git a/src/lj_snap.c b/src/lj_snap.c
>>> index 8d7bd868..4cfae579 100644
>>> --- a/src/lj_snap.c
>>> +++ b/src/lj_snap.c
> <snipped>
>
>>> diff --git a/test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua b/test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua
>>> new file mode 100644
>>> index 00000000..942d1f82
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua
>>> @@ -0,0 +1,65 @@
>>> +local tap = require('tap')
>>> +
>>> +-- Test file to demonstrate LuaJIT dirty reads after stack
>>> +-- overflow during restoration from the snapshot.
>>> +-- The test fails before the patch under Valgrind.
>> Please specify valgrind option that is required for reproducing the bug.
> This just mean the default valgrind settings (without any flags).
With this command-line the bug is reproduced:
cmake -S . -B build -DLUAJIT_USE_VALGRIND=ON -DLUAJIT_USE_ASAN=OFF
-DLUAJIT_USE_SYSMALLOC=ON -DLUAJIT_ENABLE_GC64=ON -DLUA_USE_APICHECK=ON
-DLUA_USE_ASSERT=ON -DCMAKE_BUILD_TYPE=Debug
cd build && ctest -V -R
test/tarantool-tests/lj-1196-stack-overflow-snap-restore.test.lua -V
>
>> Cannot reproduce with command line below with reverted patch:
>>
>> VALGRIND_OPTS="--leak-check=no --malloc-fill=0x00 --free-fill=0x00"
>> ctest -V -R test/tarantool-tests/lj-1196-partial-snap-restore.test.lua -V
> This doesn't fail due to --leadk-check=no flag.
> Also, I suppose this should fail under ASan.
>
>>> +--
>>> +-- luacheck: push no max_comment_line_length
>>> +--
>>> +-- Seealso:https://github.com/LuaJIT/LuaJIT/issues/1196,
>>> +--https://www.freelists.org/post/luajit/Invalid-read-found-by-valgrind.
>>> +--
>>> +-- luacheck: pop
>>> +
> <snipped>
>
>>> +test:done(true)
[-- Attachment #2: Type: text/html, Size: 4624 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-06-11 9:37 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-10 10:28 [Tarantool-patches] [PATCH luajit 0/2] Snapshot restore with stack overflow Sergey Kaplun via Tarantool-patches
2025-06-10 10:28 ` [Tarantool-patches] [PATCH luajit 1/2] Handle partial snapshot restore due to " Sergey Kaplun via Tarantool-patches
2025-06-10 15:29 ` Sergey Bronnikov via Tarantool-patches
2025-06-10 16:03 ` Sergey Kaplun via Tarantool-patches
2025-06-10 10:28 ` [Tarantool-patches] [PATCH luajit 2/2] Different fix for " Sergey Kaplun via Tarantool-patches
2025-06-10 16:15 ` Sergey Bronnikov via Tarantool-patches
2025-06-10 16:22 ` Sergey Kaplun via Tarantool-patches
2025-06-11 9:36 ` Sergey Bronnikov via Tarantool-patches
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox