* [Tarantool-patches] [PATCH luajit] Avoid out-of-range PC for stack overflow error from snapshot restore.
@ 2025-08-19 17:11 Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 4+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-08-19 17:11 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
Reported by Sergey Kaplun.
(cherry picked from commit e3fa3c48d8a4aadcf86429e9f7f6f1171914b15a)
In case when the saved PC in the snapshot is the first (0th index) PC in
the prototype like JFUNC*, the subtraction to determine the previous PC
in the `debug_framepc()` overflows and contains `NO_BCPOS` value. After
that, the pos is greater than sizebc. Hence, the code below may
interpret the bits in `pt->varinfo` like `bc_isret()` and assign an
invalid value to `pos` to be returned. Further, it may lead to the
assertion failure in the lj_debug_frameline().
This patch fixes it by pretending that this means the first non-header
bytecode in the prototype. Also, this patch removes the skipcond
introduced in the commit a74e5be07d54b4e98b85493de73317db520b3f71
("test: conditionally disable flaky lj-1196"). The new test isn't added
since the assertion failure depends on the specific memory address of
the `varinfo`, so it is too hard to create a stable reproducer.
Sergey Kaplun:
* added the description for the problem
Part of tarantool/tarantool#11691
---
Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1369-stackov-invalid-bc
Related issues:
* https://github.com/tarantool/tarantool/issues/11691
* https://github.com/LuaJIT/LuaJIT/issues/1369
* https://github.com/LuaJIT/LuaJIT/issues/1359
* https://github.com/LuaJIT/LuaJIT/issues/1196
src/lj_debug.c | 1 +
.../lj-1196-partial-snap-restore.test.lua | 10 +---------
2 files changed, 2 insertions(+), 9 deletions(-)
diff --git a/src/lj_debug.c b/src/lj_debug.c
index 76e48aca..bc057cf6 100644
--- a/src/lj_debug.c
+++ b/src/lj_debug.c
@@ -101,6 +101,7 @@ static BCPos debug_framepc(lua_State *L, GCfunc *fn, cTValue *nextframe)
pt = funcproto(fn);
pos = proto_bcpos(pt, ins) - 1;
#if LJ_HASJIT
+ if (pos == NO_BCPOS) return 1; /* Pretend it's the first bytecode. */
if (pos > pt->sizebc) { /* Undo the effects of lj_trace_exit for JLOOP. */
if (bc_isret(bc_op(ins[-1]))) {
GCtrace *T = (GCtrace *)((char *)(ins-1) - offsetof(GCtrace, startins));
diff --git a/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
index 5199ca00..a74f97bd 100644
--- a/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
+++ b/test/tarantool-tests/lj-1196-partial-snap-restore.test.lua
@@ -4,15 +4,7 @@ local tap = require('tap')
-- in case of the stack overflow.
-- See also: https://github.com/LuaJIT/LuaJIT/issues/1196.
-local test = tap.test('lj-1196-partial-snap-restore'):skipcond({
- -- Disable test for Tarantool to avoid failures, see also:
- -- https://github.com/LuaJIT/LuaJIT/issues/1369.
- ['Disabled for Tarantool due to lj-1369'] = _TARANTOOL,
- -- Also, it may fail on some non-arm64 runners stable after
- -- adding the skip condition above.
- ['Disabled for x86/x64 due to lj-1369'] = jit.arch ~= 'arm64',
-})
-
+local test = tap.test('lj-1196-partial-snap-restore')
test:plan(1)
-- XXX: The reproducer below uses several stack slot offsets to
--
2.50.1
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Tarantool-patches] [PATCH luajit] Avoid out-of-range PC for stack overflow error from snapshot restore.
@ 2025-06-12 9:12 Sergey Kaplun via Tarantool-patches
2025-06-25 14:47 ` Sergey Bronnikov via Tarantool-patches
2025-07-25 9:18 ` Sergey Kaplun via Tarantool-patches
0 siblings, 2 replies; 4+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-06-12 9:12 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches
From: Mike Pall <mike>
Reported by Sergey Kaplun.
(cherry picked from commit cd4af8ad80bb6430ad2e547f7af236268c9be7d9)
When restoring a snapshot because of exiting by the check of the stack,
the snapshot from the parent trace is used. For the correct error
message, the snapshot map uses the next bytecode after the stored one.
In case, when the PC in the snapshot is BC_RET* (the last bytecode in
the prototype), there is no next value to be referenced, so this
approach leads to the assertion failure in `lj_debug_framepc()`.
This patch prevents this behaviour by checking the original bytecode and
avoiding PC adjusting in the case of the last bytecode in the prototype.
Sergey Kaplun:
* added the description and the test for the problem
Part of tarantool/tarantool#11278
---
Related issues:
* https://github.com/LuaJIT/LuaJIT/issues/1359
* https://github.com/tarantool/tarantool/issues/11278
Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1359-bad-pc-on-snap-restore-stackov
src/lj_bc.h | 5 ++
src/lj_parse.c | 14 +----
src/lj_snap.c | 6 ++-
...59-bad-pc-on-snap-restore-stackov.test.lua | 51 +++++++++++++++++++
4 files changed, 61 insertions(+), 15 deletions(-)
create mode 100644 test/tarantool-tests/lj-1359-bad-pc-on-snap-restore-stackov.test.lua
diff --git a/src/lj_bc.h b/src/lj_bc.h
index 69a45f28..2196dbed 100644
--- a/src/lj_bc.h
+++ b/src/lj_bc.h
@@ -259,6 +259,11 @@ static LJ_AINLINE int bc_isret(BCOp op)
return (op == BC_RETM || op == BC_RET || op == BC_RET0 || op == BC_RET1);
}
+static LJ_AINLINE int bc_isret_or_tail(BCOp op)
+{
+ return (op == BC_CALLMT || op == BC_CALLT || bc_isret(op));
+}
+
LJ_DATA const uint16_t lj_bc_mode[];
LJ_DATA const uint16_t lj_bc_ofs[];
diff --git a/src/lj_parse.c b/src/lj_parse.c
index ec85ac9b..88108c01 100644
--- a/src/lj_parse.c
+++ b/src/lj_parse.c
@@ -1519,23 +1519,11 @@ static void fs_fixup_var(LexState *ls, GCproto *pt, uint8_t *p, size_t ofsvar)
#endif
-/* Check if bytecode op returns. */
-static int bcopisret(BCOp op)
-{
- switch (op) {
- case BC_CALLMT: case BC_CALLT:
- case BC_RETM: case BC_RET: case BC_RET0: case BC_RET1:
- return 1;
- default:
- return 0;
- }
-}
-
/* Fixup return instruction for prototype. */
static void fs_fixup_ret(FuncState *fs)
{
BCPos lastpc = fs->pc;
- if (lastpc <= fs->lasttarget || !bcopisret(bc_op(fs->bcbase[lastpc-1].ins))) {
+ if (lastpc <= fs->lasttarget || !bc_isret_or_tail(bc_op(fs->bcbase[lastpc-1].ins))) {
if ((fs->bl->flags & FSCOPE_UPVAL))
bcemit_AJ(fs, BC_UCLO, 0, 0);
bcemit_AD(fs, BC_RET0, 0, 1); /* Need final return. */
diff --git a/src/lj_snap.c b/src/lj_snap.c
index 4cfae579..3542a46c 100644
--- a/src/lj_snap.c
+++ b/src/lj_snap.c
@@ -954,8 +954,10 @@ const BCIns *lj_snap_restore(jit_State *J, void *exptr)
const BCIns *pc = snap_pc(&map[nent]);
lua_State *L = J->L;
- /* Set interpreter PC to the next PC to get correct error messages. */
- setcframe_pc(L->cframe, pc+1);
+ /* Set interpreter PC to the next PC to get correct error messages.
+ ** But not for returns or tail calls, since pc+1 may be out-of-range.
+ */
+ setcframe_pc(L->cframe, bc_isret_or_tail(bc_op(*pc)) ? pc : pc+1);
setcframe_pc(cframe_raw(cframe_prev(L->cframe)), pc);
/* Make sure the stack is big enough for the slots from the snapshot. */
diff --git a/test/tarantool-tests/lj-1359-bad-pc-on-snap-restore-stackov.test.lua b/test/tarantool-tests/lj-1359-bad-pc-on-snap-restore-stackov.test.lua
new file mode 100644
index 00000000..d09b5c42
--- /dev/null
+++ b/test/tarantool-tests/lj-1359-bad-pc-on-snap-restore-stackov.test.lua
@@ -0,0 +1,51 @@
+local tap = require('tap')
+
+-- The test file to demonstrate the LuaJIT misbehaviour during
+-- stack overflow on snapshot restoration for the last bytecode in
+-- the prototype.
+-- See also: https://github.com/LuaJIT/LuaJIT/issues/1359.
+
+local test = tap.test('lj-1359-bad-pc-on-snap-restore-stackov'):skipcond({
+ ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(2)
+
+jit.opt.start('hotloop=1')
+
+-- When restoring a snapshot because of exiting by the check of
+-- the stack, the snapshot from the parent trace is used. For the
+-- correct error message, the snapshot map uses the next bytecode
+-- after the stored one. In case, when the PC in the snapshot is
+-- BC_RET* (the last bytecode in the prototype), there is no next
+-- value to be referenced, so this approach leads to the assertion
+-- failure in lj_debug_framepc().
+
+local counter = 0
+-- The second trace started from the side exit by the counter.
+-- It ends by entering the first trace.
+local function func_side_trace()
+ if counter > 5 then
+ -- This RET0 BC is the first recorded for the second trace.
+ -- The stack check for the child trace uses the first snapshot
+ -- from the parent trace.
+ return
+ end
+ counter = counter + 1;
+end
+
+-- The trace with up-recursion for the function that causes stack
+-- overflow. It is recorded first and inlines func_side_trace.
+local function stackov_f()
+ local r = stackov_f(func_side_trace())
+ return r
+end
+
+local result, errmsg = pcall(stackov_f)
+
+-- There is no assertion failure if we are here.
+-- Just sanity checks.
+test:ok(not result, 'correct status')
+test:like(errmsg, 'stack overflow', 'correct error message')
+
+test:done(true)
--
2.49.0
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit] Avoid out-of-range PC for stack overflow error from snapshot restore.
2025-06-12 9:12 Sergey Kaplun via Tarantool-patches
@ 2025-06-25 14:47 ` Sergey Bronnikov via Tarantool-patches
2025-07-25 9:18 ` Sergey Kaplun via Tarantool-patches
1 sibling, 0 replies; 4+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2025-06-25 14:47 UTC (permalink / raw)
To: Sergey Kaplun; +Cc: tarantool-patches
[-- Attachment #1: Type: text/plain, Size: 5875 bytes --]
Hi, Sergey,
thanks for the patch! LGTM
Sergey
On 6/12/25 12:12, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Reported by Sergey Kaplun.
>
> (cherry picked from commit cd4af8ad80bb6430ad2e547f7af236268c9be7d9)
>
> When restoring a snapshot because of exiting by the check of the stack,
> the snapshot from the parent trace is used. For the correct error
> message, the snapshot map uses the next bytecode after the stored one.
> In case, when the PC in the snapshot is BC_RET* (the last bytecode in
> the prototype), there is no next value to be referenced, so this
> approach leads to the assertion failure in `lj_debug_framepc()`.
>
> This patch prevents this behaviour by checking the original bytecode and
> avoiding PC adjusting in the case of the last bytecode in the prototype.
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#11278
> ---
>
> Related issues:
> *https://github.com/LuaJIT/LuaJIT/issues/1359
> *https://github.com/tarantool/tarantool/issues/11278
> Branch:https://github.com/tarantool/luajit/tree/skaplun/lj-1359-bad-pc-on-snap-restore-stackov
>
> src/lj_bc.h | 5 ++
> src/lj_parse.c | 14 +----
> src/lj_snap.c | 6 ++-
> ...59-bad-pc-on-snap-restore-stackov.test.lua | 51 +++++++++++++++++++
> 4 files changed, 61 insertions(+), 15 deletions(-)
> create mode 100644 test/tarantool-tests/lj-1359-bad-pc-on-snap-restore-stackov.test.lua
>
> diff --git a/src/lj_bc.h b/src/lj_bc.h
> index 69a45f28..2196dbed 100644
> --- a/src/lj_bc.h
> +++ b/src/lj_bc.h
> @@ -259,6 +259,11 @@ static LJ_AINLINE int bc_isret(BCOp op)
> return (op == BC_RETM || op == BC_RET || op == BC_RET0 || op == BC_RET1);
> }
>
> +static LJ_AINLINE int bc_isret_or_tail(BCOp op)
> +{
> + return (op == BC_CALLMT || op == BC_CALLT || bc_isret(op));
> +}
> +
> LJ_DATA const uint16_t lj_bc_mode[];
> LJ_DATA const uint16_t lj_bc_ofs[];
>
> diff --git a/src/lj_parse.c b/src/lj_parse.c
> index ec85ac9b..88108c01 100644
> --- a/src/lj_parse.c
> +++ b/src/lj_parse.c
> @@ -1519,23 +1519,11 @@ static void fs_fixup_var(LexState *ls, GCproto *pt, uint8_t *p, size_t ofsvar)
>
> #endif
>
> -/* Check if bytecode op returns. */
> -static int bcopisret(BCOp op)
> -{
> - switch (op) {
> - case BC_CALLMT: case BC_CALLT:
> - case BC_RETM: case BC_RET: case BC_RET0: case BC_RET1:
> - return 1;
> - default:
> - return 0;
> - }
> -}
> -
> /* Fixup return instruction for prototype. */
> static void fs_fixup_ret(FuncState *fs)
> {
> BCPos lastpc = fs->pc;
> - if (lastpc <= fs->lasttarget || !bcopisret(bc_op(fs->bcbase[lastpc-1].ins))) {
> + if (lastpc <= fs->lasttarget || !bc_isret_or_tail(bc_op(fs->bcbase[lastpc-1].ins))) {
> if ((fs->bl->flags & FSCOPE_UPVAL))
> bcemit_AJ(fs, BC_UCLO, 0, 0);
> bcemit_AD(fs, BC_RET0, 0, 1); /* Need final return. */
> diff --git a/src/lj_snap.c b/src/lj_snap.c
> index 4cfae579..3542a46c 100644
> --- a/src/lj_snap.c
> +++ b/src/lj_snap.c
> @@ -954,8 +954,10 @@ const BCIns *lj_snap_restore(jit_State *J, void *exptr)
> const BCIns *pc = snap_pc(&map[nent]);
> lua_State *L = J->L;
>
> - /* Set interpreter PC to the next PC to get correct error messages. */
> - setcframe_pc(L->cframe, pc+1);
> + /* Set interpreter PC to the next PC to get correct error messages.
> + ** But not for returns or tail calls, since pc+1 may be out-of-range.
> + */
> + setcframe_pc(L->cframe, bc_isret_or_tail(bc_op(*pc)) ? pc : pc+1);
> setcframe_pc(cframe_raw(cframe_prev(L->cframe)), pc);
>
> /* Make sure the stack is big enough for the slots from the snapshot. */
> diff --git a/test/tarantool-tests/lj-1359-bad-pc-on-snap-restore-stackov.test.lua b/test/tarantool-tests/lj-1359-bad-pc-on-snap-restore-stackov.test.lua
> new file mode 100644
> index 00000000..d09b5c42
> --- /dev/null
> +++ b/test/tarantool-tests/lj-1359-bad-pc-on-snap-restore-stackov.test.lua
> @@ -0,0 +1,51 @@
> +local tap = require('tap')
> +
> +-- The test file to demonstrate the LuaJIT misbehaviour during
> +-- stack overflow on snapshot restoration for the last bytecode in
> +-- the prototype.
> +-- See also:https://github.com/LuaJIT/LuaJIT/issues/1359.
> +
> +local test = tap.test('lj-1359-bad-pc-on-snap-restore-stackov'):skipcond({
> + ['Test requires JIT enabled'] = not jit.status(),
> +})
> +
> +test:plan(2)
> +
> +jit.opt.start('hotloop=1')
> +
> +-- When restoring a snapshot because of exiting by the check of
> +-- the stack, the snapshot from the parent trace is used. For the
> +-- correct error message, the snapshot map uses the next bytecode
> +-- after the stored one. In case, when the PC in the snapshot is
> +-- BC_RET* (the last bytecode in the prototype), there is no next
> +-- value to be referenced, so this approach leads to the assertion
> +-- failure in lj_debug_framepc().
> +
> +local counter = 0
> +-- The second trace started from the side exit by the counter.
> +-- It ends by entering the first trace.
> +local function func_side_trace()
> + if counter > 5 then
> + -- This RET0 BC is the first recorded for the second trace.
> + -- The stack check for the child trace uses the first snapshot
> + -- from the parent trace.
> + return
> + end
> + counter = counter + 1;
> +end
> +
> +-- The trace with up-recursion for the function that causes stack
> +-- overflow. It is recorded first and inlines func_side_trace.
> +local function stackov_f()
> + local r = stackov_f(func_side_trace())
> + return r
> +end
> +
> +local result, errmsg = pcall(stackov_f)
> +
> +-- There is no assertion failure if we are here.
> +-- Just sanity checks.
> +test:ok(not result, 'correct status')
> +test:like(errmsg, 'stack overflow', 'correct error message')
> +
> +test:done(true)
[-- Attachment #2: Type: text/html, Size: 6574 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Tarantool-patches] [PATCH luajit] Avoid out-of-range PC for stack overflow error from snapshot restore.
2025-06-12 9:12 Sergey Kaplun via Tarantool-patches
2025-06-25 14:47 ` Sergey Bronnikov via Tarantool-patches
@ 2025-07-25 9:18 ` Sergey Kaplun via Tarantool-patches
1 sibling, 0 replies; 4+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2025-07-25 9:18 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches
I've applied the patch into all long-term branches in
tarantool/luajit and bumped a new version in Tarantool's master [1],
release/3.4 [2], release/3.3 [3], release/3.2 [4] and release/2.11 [5].
[1]: https://github.com/tarantool/tarantool/pull/11693
[2]: https://github.com/tarantool/tarantool/pull/11694
[3]: https://github.com/tarantool/tarantool/pull/11695
[4]: https://github.com/tarantool/tarantool/pull/11696
[5]: https://github.com/tarantool/tarantool/pull/11697
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-08-19 17:10 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-08-19 17:11 [Tarantool-patches] [PATCH luajit] Avoid out-of-range PC for stack overflow error from snapshot restore Sergey Kaplun via Tarantool-patches
-- strict thread matches above, loose matches on Subject: below --
2025-06-12 9:12 Sergey Kaplun via Tarantool-patches
2025-06-25 14:47 ` Sergey Bronnikov via Tarantool-patches
2025-07-25 9:18 ` Sergey Kaplun via Tarantool-patches
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox