Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH luajit] Throw any errors before stack changes in trace stitching.
@ 2024-02-16 11:11 Maxim Kokryashkin via Tarantool-patches
  2024-02-20  8:08 ` Sergey Bronnikov via Tarantool-patches
  2024-02-21 12:34 ` Sergey Kaplun via Tarantool-patches
  0 siblings, 2 replies; 5+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2024-02-16 11:11 UTC (permalink / raw)
  To: tarantool-patches, skaplun, sergeyb

From: Mike Pall <mike>

Thanks to doujiang24.

(cherry-picked from commit 3f9389edc6cdf3f78a6896d550c236860aed62b2)

The Lua stack is changed in the `lj_record_stop`, so if the trace
is aborted, for example, when the `maxsnap` is reached, and an
error is thrown from there, the execution continues with an
unbalanced stack.

Since the only error that is thrown from the `lj_record_stop`
with modified stack is `LJ_TRERR_SNAPOV`, this patch adds a
check for snapshot overflow to the `recff_stitch`.

Maxim Kokryashkin:
* added the description and the test for the problem

Part of tarantool/tarantool#9595
---
Branch: https://github.com/tarantool/luajit/tree/fckxorg/lj-pr-720-errors-before-stitch
PR: https://github.com/tarantool/tarantool/pull/9703
Issues:
https://github.com/tarantool/tarantool/issues/9595
https://github.com/LuaJIT/LuaJIT/pull/720

 src/lj_ffrecord.c                             |  4 ++++
 .../lj-pr-720-errors-before-stitch.test.lua   | 20 +++++++++++++++++++
 2 files changed, 24 insertions(+)
 create mode 100644 test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua

diff --git a/src/lj_ffrecord.c b/src/lj_ffrecord.c
index 1b2ddb72..e3ed80fb 100644
--- a/src/lj_ffrecord.c
+++ b/src/lj_ffrecord.c
@@ -107,6 +107,10 @@ static void recff_stitch(jit_State *J)
   const BCIns *pc = frame_pc(base-1);
   TValue *pframe = frame_prevl(base-1);

+  /* Check for this now. Throwing in lj_record_stop messes up the stack. */
+  if (J->cur.nsnap >= (MSize)J->param[JIT_P_maxsnap])
+    lj_trace_err(J, LJ_TRERR_SNAPOV);
+
   /* Move func + args up in Lua stack and insert continuation. */
   memmove(&base[1], &base[-1-LJ_FR2], sizeof(TValue)*nslot);
   setframe_ftsz(nframe, ((char *)nframe - (char *)pframe) + FRAME_CONT);
diff --git a/test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua b/test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua
new file mode 100644
index 00000000..53b7b5ea
--- /dev/null
+++ b/test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua
@@ -0,0 +1,20 @@
+local tap = require('tap')
+local test = tap.test('lj-pr-720-errors-before-stitch'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+test:plan(1)
+
+-- `math.modf` recording is NYI.
+local modf = require('math').modf
+jit.opt.start('hotloop=1', 'maxsnap=1')
+
+-- The loop has only two iterations: the first to detect its
+-- hotness and the second to record it. The snapshot limit is
+-- set to one and is certainly reached.
+for _ = 1, 2 do
+  -- Forcify stitch.
+  modf(1.2)
+end
+
+test:ok(true, 'stack is balanced')
+test:done(true)
--
2.43.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Throw any errors before stack changes in trace stitching.
  2024-02-16 11:11 [Tarantool-patches] [PATCH luajit] Throw any errors before stack changes in trace stitching Maxim Kokryashkin via Tarantool-patches
@ 2024-02-20  8:08 ` Sergey Bronnikov via Tarantool-patches
  2024-03-11 15:58   ` Sergey Bronnikov via Tarantool-patches
  2024-02-21 12:34 ` Sergey Kaplun via Tarantool-patches
  1 sibling, 1 reply; 5+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2024-02-20  8:08 UTC (permalink / raw)
  To: Maxim Kokryashkin, tarantool-patches, skaplun

Hello, Max,

thanks for the patch! see my comments

Sergey

On 2/16/24 14:11, Maxim Kokryashkin wrote:
> From: Mike Pall <mike>
>
> Thanks to doujiang24.
>
> (cherry-picked from commit 3f9389edc6cdf3f78a6896d550c236860aed62b2)
>
> The Lua stack is changed in the `lj_record_stop`, so if the trace
> is aborted, for example, when the `maxsnap` is reached, and an
> error is thrown from there, the execution continues with an
> unbalanced stack.
>
> Since the only error that is thrown from the `lj_record_stop`
> with modified stack is `LJ_TRERR_SNAPOV`, this patch adds a

As I see lj_record_stop [1] throws only LJ_TRERR_RETRY or

I got you wrong.


1. 
https://github.com/LuaJIT/LuaJIT/blob/0d313b243194a0b8d2399d8b549ca5a0ff234db5/src/lj_record.c#L299

> check for snapshot overflow to the `recff_stitch`.
>
> Maxim Kokryashkin:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#9595
> ---
> Branch: https://github.com/tarantool/luajit/tree/fckxorg/lj-pr-720-errors-before-stitch
> PR: https://github.com/tarantool/tarantool/pull/9703
> Issues:
> https://github.com/tarantool/tarantool/issues/9595
> https://github.com/LuaJIT/LuaJIT/pull/720
>
>   src/lj_ffrecord.c                             |  4 ++++
>   .../lj-pr-720-errors-before-stitch.test.lua   | 20 +++++++++++++++++++
>   2 files changed, 24 insertions(+)
>   create mode 100644 test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua
>
> diff --git a/src/lj_ffrecord.c b/src/lj_ffrecord.c
> index 1b2ddb72..e3ed80fb 100644
> --- a/src/lj_ffrecord.c
> +++ b/src/lj_ffrecord.c
> @@ -107,6 +107,10 @@ static void recff_stitch(jit_State *J)
>     const BCIns *pc = frame_pc(base-1);
>     TValue *pframe = frame_prevl(base-1);
>
> +  /* Check for this now. Throwing in lj_record_stop messes up the stack. */
> +  if (J->cur.nsnap >= (MSize)J->param[JIT_P_maxsnap])
> +    lj_trace_err(J, LJ_TRERR_SNAPOV);
> +
>     /* Move func + args up in Lua stack and insert continuation. */
>     memmove(&base[1], &base[-1-LJ_FR2], sizeof(TValue)*nslot);
>     setframe_ftsz(nframe, ((char *)nframe - (char *)pframe) + FRAME_CONT);
> diff --git a/test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua b/test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua
> new file mode 100644
> index 00000000..53b7b5ea
> --- /dev/null
> +++ b/test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua

You could omit "-pr" in the name, anyway

https://github.com/LuaJIT/LuaJIT/issues/720 redirects to 
https://github.com/LuaJIT/LuaJIT/pull/720.

> @@ -0,0 +1,20 @@
> +local tap = require('tap')
> +local test = tap.test('lj-pr-720-errors-before-stitch'):skipcond({
> +  ['Test requires JIT enabled'] = not jit.status(),
> +})
> +test:plan(1)
> +
> +-- `math.modf` recording is NYI.
> +local modf = require('math').modf
> +jit.opt.start('hotloop=1', 'maxsnap=1')
> +
> +-- The loop has only two iterations: the first to detect its
> +-- hotness and the second to record it. The snapshot limit is
> +-- set to one and is certainly reached.
> +for _ = 1, 2 do
> +  -- Forcify stitch.
> +  modf(1.2)
> +end
> +
> +test:ok(true, 'stack is balanced')
> +test:done(true)
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Throw any errors before stack changes in trace stitching.
  2024-02-16 11:11 [Tarantool-patches] [PATCH luajit] Throw any errors before stack changes in trace stitching Maxim Kokryashkin via Tarantool-patches
  2024-02-20  8:08 ` Sergey Bronnikov via Tarantool-patches
@ 2024-02-21 12:34 ` Sergey Kaplun via Tarantool-patches
  2024-02-22 14:03   ` Sergey Kaplun via Tarantool-patches
  1 sibling, 1 reply; 5+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2024-02-21 12:34 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the patch!
Please consider my comments below.

On 16.02.24, Maxim Kokryashkin wrote:
> From: Mike Pall <mike>
> 
> Thanks to doujiang24.
> 
> (cherry-picked from commit 3f9389edc6cdf3f78a6896d550c236860aed62b2)
> 
> The Lua stack is changed in the `lj_record_stop`, so if the trace

Actually, it is changed in the `recff_stitch()` -- the continuation frame
is inserted below a function with arguments.

> is aborted, for example, when the `maxsnap` is reached, and an

If there are any other situations, we must handle them too:

1) AFAICS, when LuaJIT is built with -DLUAJIT_ENABLE_TABLE_BUMP, the
following lines may lead to the error, and thereby the Lua stack
becomes unbalanced:

<src/lj_record.c:290> `lj_record_stop()`:
| if (J->retryrec)
|   lj_trace_err(J, LJ_TRERR_RETRY);

Build with the following flags:
| cmake -DCMAKE_C_FLAGS="-DLUAJIT_ENABLE_TABLE_BUMP" -DLUA_USE_APICHECK=ON \
|       -DLUA_USE_ASSERT=ON -DCMAKE_BUILD_TYPE=Debug \
|       -DLUAJIT_ENABLE_GC64=OFF . && make -j

The following script reproduces the issue (`maxsnap` configuration option
is gone):

| src/luajit -e "
| local modf = math.modf
| jit.opt.start('hotloop=1')
|
| local t1
| for i = 1, 2 do
|   t1 = {}
|   t1[i] = i
|   -- Forcify stitch.
|   modf(1.2)
| end
| "
| LuaJIT ASSERT src/lj_dispatch.c:502: lj_dispatch_call: unbalanced stack after hot instruction

2) Also, it may happen when the memory error is raised when we try to
reallocate the IR buffer for the next instruction in `lj_ir_nextins()`.

The reproducer for this issue is much more tricky.

My suggestions here are the following:
Resolve the special case in the scope of backporting this patch, and
report the 2 follow-ups with suggested solutions to the upstream.
As I can see, this issue should be fixed in the following way:
Add the `lj_vm_cpcall()` to the `record_stop()` call and rethrow
the error, if needed, after clenup.

Thoughts?

> error is thrown from there, the execution continues with an
> unbalanced stack.
> 
> Since the only error that is thrown from the `lj_record_stop`
> with modified stack is `LJ_TRERR_SNAPOV`, this patch adds a
> check for snapshot overflow to the `recff_stitch`.
> 
> Maxim Kokryashkin:
> * added the description and the test for the problem
> 
> Part of tarantool/tarantool#9595
> ---
> Branch: https://github.com/tarantool/luajit/tree/fckxorg/lj-pr-720-errors-before-stitch
> PR: https://github.com/tarantool/tarantool/pull/9703
> Issues:
> https://github.com/tarantool/tarantool/issues/9595
> https://github.com/LuaJIT/LuaJIT/pull/720
> 
>  src/lj_ffrecord.c                             |  4 ++++
>  .../lj-pr-720-errors-before-stitch.test.lua   | 20 +++++++++++++++++++
>  2 files changed, 24 insertions(+)
>  create mode 100644 test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua
> 
> diff --git a/src/lj_ffrecord.c b/src/lj_ffrecord.c
> index 1b2ddb72..e3ed80fb 100644
> --- a/src/lj_ffrecord.c
> +++ b/src/lj_ffrecord.c
> @@ -107,6 +107,10 @@ static void recff_stitch(jit_State *J)
>    const BCIns *pc = frame_pc(base-1);
>    TValue *pframe = frame_prevl(base-1);
> 
> +  /* Check for this now. Throwing in lj_record_stop messes up the stack. */
> +  if (J->cur.nsnap >= (MSize)J->param[JIT_P_maxsnap])
> +    lj_trace_err(J, LJ_TRERR_SNAPOV);
> +
>    /* Move func + args up in Lua stack and insert continuation. */
>    memmove(&base[1], &base[-1-LJ_FR2], sizeof(TValue)*nslot);
>    setframe_ftsz(nframe, ((char *)nframe - (char *)pframe) + FRAME_CONT);
> diff --git a/test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua b/test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua
> new file mode 100644
> index 00000000..53b7b5ea
> --- /dev/null
> +++ b/test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua

I suggest to ommit the "pr-" part of the prefix. Since PR are treated
the same way by Mike we don't need to mention it specifically. Also, as
Sergey mentioned, GitHub redirects users to the PR page, so we can avoid
specific mentioning.

Side note-reminder (top priority of mentioning at the top):
* lj-dddd stands for issues from LuaJIT/LuaJIT repository.
* gh-dddd stands for issues from tarantool/tarantool repository.
* or-dddd stands for issues from openresty/lua-resty-core or (since lack
  of them) openresty/luajit2 repository.

> @@ -0,0 +1,20 @@
> +local tap = require('tap')
> +local test = tap.test('lj-pr-720-errors-before-stitch'):skipcond({
> +  ['Test requires JIT enabled'] = not jit.status(),
> +})
> +test:plan(1)
> +
> +-- `math.modf` recording is NYI.
> +local modf = require('math').modf

There is no need to require math library since it is already loaded.

> +jit.opt.start('hotloop=1', 'maxsnap=1')
> +
> +-- The loop has only two iterations: the first to detect its
> +-- hotness and the second to record it. The snapshot limit is
> +-- set to one and is certainly reached.
> +for _ = 1, 2 do
> +  -- Forcify stitch.
> +  modf(1.2)
> +end
> +
> +test:ok(true, 'stack is balanced')
> +test:done(true)
> --
> 2.43.0
> 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Throw any errors before stack changes in trace stitching.
  2024-02-21 12:34 ` Sergey Kaplun via Tarantool-patches
@ 2024-02-22 14:03   ` Sergey Kaplun via Tarantool-patches
  0 siblings, 0 replies; 5+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2024-02-22 14:03 UTC (permalink / raw)
  To: Maxim Kokryashkin, tarantool-patches

Hi, again, folks!

On 21.02.24, Sergey Kaplun via Tarantool-patches wrote:
> Hi, Maxim!
> Thanks for the patch!
> Please consider my comments below.
> 
> On 16.02.24, Maxim Kokryashkin wrote:
> > From: Mike Pall <mike>
> > 
> > Thanks to doujiang24.
> > 
> > (cherry-picked from commit 3f9389edc6cdf3f78a6896d550c236860aed62b2)
> > 
> > The Lua stack is changed in the `lj_record_stop`, so if the trace
> 
> Actually, it is changed in the `recff_stitch()` -- the continuation frame
> is inserted below a function with arguments.
> 
> > is aborted, for example, when the `maxsnap` is reached, and an
> 
> If there are any other situations, we must handle them too:
> 
> 1) AFAICS, when LuaJIT is built with -DLUAJIT_ENABLE_TABLE_BUMP, the
> following lines may lead to the error, and thereby the Lua stack
> becomes unbalanced:
> 
> <src/lj_record.c:290> `lj_record_stop()`:
> | if (J->retryrec)
> |   lj_trace_err(J, LJ_TRERR_RETRY);
> 
> Build with the following flags:
> | cmake -DCMAKE_C_FLAGS="-DLUAJIT_ENABLE_TABLE_BUMP" -DLUA_USE_APICHECK=ON \
> |       -DLUA_USE_ASSERT=ON -DCMAKE_BUILD_TYPE=Debug \
> |       -DLUAJIT_ENABLE_GC64=OFF . && make -j
> 
> The following script reproduces the issue (`maxsnap` configuration option
> is gone):
> 
> | src/luajit -e "
> | local modf = math.modf
> | jit.opt.start('hotloop=1')
> |
> | local t1
> | for i = 1, 2 do
> |   t1 = {}
> |   t1[i] = i
> |   -- Forcify stitch.
> |   modf(1.2)
> | end
> | "
> | LuaJIT ASSERT src/lj_dispatch.c:502: lj_dispatch_call: unbalanced stack after hot instruction
> 
> 2) Also, it may happen when the memory error is raised when we try to
> reallocate the IR buffer for the next instruction in `lj_ir_nextins()`.
> 
> The reproducer for this issue is much more tricky.
> 
> My suggestions here are the following:
> Resolve the special case in the scope of backporting this patch, and
> report the 2 follow-ups with suggested solutions to the upstream.
> As I can see, this issue should be fixed in the following way:
> Add the `lj_vm_cpcall()` to the `record_stop()` call and rethrow
> the error, if needed, after clenup.
> 

Reported here:
https://github.com/LuaJIT/LuaJIT/issues/1166

<snipped>

> > 
> 
> -- 
> Best regards,
> Sergey Kaplun

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit] Throw any errors before stack changes in trace stitching.
  2024-02-20  8:08 ` Sergey Bronnikov via Tarantool-patches
@ 2024-03-11 15:58   ` Sergey Bronnikov via Tarantool-patches
  0 siblings, 0 replies; 5+ messages in thread
From: Sergey Bronnikov via Tarantool-patches @ 2024-03-11 15:58 UTC (permalink / raw)
  To: Maxim Kokryashkin, tarantool-patches, skaplun

Max, thanks for fixes! LGTM

On 2/20/24 11:08, Sergey Bronnikov via Tarantool-patches wrote:
> Hello, Max,
>
> thanks for the patch! see my comments
>
> Sergey
>
> On 2/16/24 14:11, Maxim Kokryashkin wrote:
>> From: Mike Pall <mike>
>>
>> Thanks to doujiang24.
>>
>> (cherry-picked from commit 3f9389edc6cdf3f78a6896d550c236860aed62b2)
>>
>> The Lua stack is changed in the `lj_record_stop`, so if the trace
>> is aborted, for example, when the `maxsnap` is reached, and an
>> error is thrown from there, the execution continues with an
>> unbalanced stack.
>>
>> Since the only error that is thrown from the `lj_record_stop`
>> with modified stack is `LJ_TRERR_SNAPOV`, this patch adds a
>
> As I see lj_record_stop [1] throws only LJ_TRERR_RETRY or
>
> I got you wrong.
>
>
> 1. 
> https://github.com/LuaJIT/LuaJIT/blob/0d313b243194a0b8d2399d8b549ca5a0ff234db5/src/lj_record.c#L299
>
>> check for snapshot overflow to the `recff_stitch`.
>>
>> Maxim Kokryashkin:
>> * added the description and the test for the problem
>>
>> Part of tarantool/tarantool#9595
>> ---
>> Branch: 
>> https://github.com/tarantool/luajit/tree/fckxorg/lj-pr-720-errors-before-stitch
>> PR: https://github.com/tarantool/tarantool/pull/9703
>> Issues:
>> https://github.com/tarantool/tarantool/issues/9595
>> https://github.com/LuaJIT/LuaJIT/pull/720
>>
>>   src/lj_ffrecord.c                             |  4 ++++
>>   .../lj-pr-720-errors-before-stitch.test.lua   | 20 +++++++++++++++++++
>>   2 files changed, 24 insertions(+)
>>   create mode 100644 
>> test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua
>>
>> diff --git a/src/lj_ffrecord.c b/src/lj_ffrecord.c
>> index 1b2ddb72..e3ed80fb 100644
>> --- a/src/lj_ffrecord.c
>> +++ b/src/lj_ffrecord.c
>> @@ -107,6 +107,10 @@ static void recff_stitch(jit_State *J)
>>     const BCIns *pc = frame_pc(base-1);
>>     TValue *pframe = frame_prevl(base-1);
>>
>> +  /* Check for this now. Throwing in lj_record_stop messes up the 
>> stack. */
>> +  if (J->cur.nsnap >= (MSize)J->param[JIT_P_maxsnap])
>> +    lj_trace_err(J, LJ_TRERR_SNAPOV);
>> +
>>     /* Move func + args up in Lua stack and insert continuation. */
>>     memmove(&base[1], &base[-1-LJ_FR2], sizeof(TValue)*nslot);
>>     setframe_ftsz(nframe, ((char *)nframe - (char *)pframe) + 
>> FRAME_CONT);
>> diff --git 
>> a/test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua 
>> b/test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua
>> new file mode 100644
>> index 00000000..53b7b5ea
>> --- /dev/null
>> +++ b/test/tarantool-tests/lj-pr-720-errors-before-stitch.test.lua
>
> You could omit "-pr" in the name, anyway
>
> https://github.com/LuaJIT/LuaJIT/issues/720 redirects to 
> https://github.com/LuaJIT/LuaJIT/pull/720.
>
>> @@ -0,0 +1,20 @@
>> +local tap = require('tap')
>> +local test = tap.test('lj-pr-720-errors-before-stitch'):skipcond({
>> +  ['Test requires JIT enabled'] = not jit.status(),
>> +})
>> +test:plan(1)
>> +
>> +-- `math.modf` recording is NYI.
>> +local modf = require('math').modf
>> +jit.opt.start('hotloop=1', 'maxsnap=1')
>> +
>> +-- The loop has only two iterations: the first to detect its
>> +-- hotness and the second to record it. The snapshot limit is
>> +-- set to one and is certainly reached.
>> +for _ = 1, 2 do
>> +  -- Forcify stitch.
>> +  modf(1.2)
>> +end
>> +
>> +test:ok(true, 'stack is balanced')
>> +test:done(true)
>> -- 
>> 2.43.0
>>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-03-11 15:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-16 11:11 [Tarantool-patches] [PATCH luajit] Throw any errors before stack changes in trace stitching Maxim Kokryashkin via Tarantool-patches
2024-02-20  8:08 ` Sergey Bronnikov via Tarantool-patches
2024-03-11 15:58   ` Sergey Bronnikov via Tarantool-patches
2024-02-21 12:34 ` Sergey Kaplun via Tarantool-patches
2024-02-22 14:03   ` Sergey Kaplun via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox