[Tarantool-patches] [PATCH luajit] Avoid assertion in case of stack overflow from stitched trace.

Sergey Kaplun skaplun at tarantool.org
Thu Dec 14 15:20:53 MSK 2023


Hi, Maxim!
Thanks for the patch!
Please, consider my comments below.

On 14.12.23, Maxim Kokryashkin wrote:
> From: Mike Pall <mike>
> 
> Reported by Sergey Bronnikov. Fixed by Sergey Kaplun.
> 
> (cherry-picked from commit 1d75cd4d7be638babe6d4e47bf73ea05fc65d81c)
> 
> When we exit from a stitched trace due to the Lua stack overflow
> error, the Lua and JIT stacks are not synchronized, and they
> won't be as long as the mentioned error is raised. Because of

The reason is not the stack synchornization, but the determination of
the bytecode when the error is raised. In the case of stitched trace
(see the dumps below), the last slot in the snapshot map is the argument
for the function to stitch (in our case, ref to 42). Or for the second
trace, we have any other function where the recording is stopped but not
the function where it was started.
So determine bytecode position isn't trivial. And it is better just
return `NO_BCPOS` in such cases.

| Trace 1 start
| ---- TRACE IR
| ....          SNAP   #0   [ ---- ]
| 0001      fun SLOAD  [L ] lit: #0   lit: R
| 0002      tab FLOAD  [L ] ref: 0001 lit: func.env
| 0003      int FLOAD  [L ] ref: 0002 lit: tab.hmask
| 0004   >  int EQ     [C ] ref: 0003 ref: integer 63
| 0005      p32 FLOAD  [L ] ref: 0002 lit: tab.node
| 0006   >  p32 HREFK  [R ] ref: 0005 ref: string "math" @ 0x400061e8 KSLOT: @54
| 0007   >  tab HLOAD  [L ] ref: 0006
| 0008      int FLOAD  [L ] ref: 0007 lit: tab.hmask
| 0009   >  int EQ     [C ] ref: 0008 ref: integer 31
| 0010      p32 FLOAD  [L ] ref: 0007 lit: tab.node
| 0011   >  p32 HREFK  [R ] ref: 0010 ref: string "fmod" @ 0x40006c88 KSLOT: @18
| 0012   >  fun HLOAD  [L ] ref: 0011
| 0013   >  fun EQ     [C ] ref: 0012 ref: fast function #58
| ....          SNAP   #1   [ ---- {trace 0x1 @ 0x400271d8} contpc ftsz | {number 42} {number 42} ]
| 0014          NOP    [N ]
|
| Trace 2 start
| ---- TRACE IR
| ....          SNAP   #0   [ ---- ]
| 0001      fun SLOAD  [L ] lit: #0   lit: R
| 0002   >  fun EQ     [C ] ref: 0001 ref: Lua function @ 0x400157f0, 1 upvalues, "@../master/test/tarantool-tests/lj-913-so-stitched.test.lua":11
| ....          SNAP   #1   [ ftsz | ftsz | ]
| 0003          NOP    [N ]

> that, we get the incorrect bytecode instruction in
> `debug_framepc`. This patch fixes this behavior, so the
> `debug_framepc` now returns `NO_BCPOS` for this case.
> 
> Maxim Kokryashkin:
> * added the description and the test for the problem
> 
> Part of tarantool/tarantool#9145
> ---
> Branch: https://github.com/tarantool/luajit/tree/fckxorg/lj-913-avoid-assertion-stkov-from-stitched-trace
> PR: https://github.com/tarantool/tarantool/pull/9484
> Issues: https://github.com/tarantool/tarantool/issues/9145
> https://github.com/LuaJIT/LuaJIT/issues/913
> 
>  src/lj_debug.c                                |  9 +++++---
>  ...-913-stackoverflow-stitched-trace.test.lua | 23 +++++++++++++++++++
>  2 files changed, 29 insertions(+), 3 deletions(-)
>  create mode 100644 test/tarantool-tests/lj-913-stackoverflow-stitched-trace.test.lua
> 
> diff --git a/src/lj_debug.c b/src/lj_debug.c
> index 46c442c6..107f464c 100644
> --- a/src/lj_debug.c
> +++ b/src/lj_debug.c

<snipped>

> diff --git a/test/tarantool-tests/lj-913-stackoverflow-stitched-trace.test.lua b/test/tarantool-tests/lj-913-stackoverflow-stitched-trace.test.lua
> new file mode 100644
> index 00000000..3c12f0d9
> --- /dev/null
> +++ b/test/tarantool-tests/lj-913-stackoverflow-stitched-trace.test.lua
> @@ -0,0 +1,23 @@
> +local tap = require('tap')
> +-- Test to demonstrate the incorrect LuaJIT behavior when exiting
> +-- from a snapshot for stitched trace.
> +local test = tap.test('lj-913-stackoverflow-stitched-trace'):skipcond({
> +  ['Test requires JIT enabled'] = not jit.status(),
> +})
> +
> +test:plan(3)
> +
> +-- Recursion to cause stack overflow.
> +local function callee()
> +  -- `math.fmod()` is NYI, so trace will be stitched here.
> +  local _ = math.fmod(42, 42)

We don't need this `_` variable here. Just using `math.fmod()` is
enough.

> +  callee()
> +end
> +
> +local st, err = pcall(callee)
> +
> +test:ok(true, 'assertion is not triggered')
> +test:ok(not st, 'error happened')
> +test:like(err, 'stack overflow', 'stack overflow happened')
> +
> +test:done(true)
> --
> 2.43.0
> 

-- 
Best regards,
Sergey Kaplun


More information about the Tarantool-patches mailing list