Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH luajit v2] sysprof: fix crash during FFUNC stream
@ 2023-06-07 12:25 Maxim Kokryashkin via Tarantool-patches
  2023-06-29  8:54 ` Sergey Kaplun via Tarantool-patches
  2023-07-09 12:20 ` Sergey Kaplun via Tarantool-patches
  0 siblings, 2 replies; 3+ messages in thread
From: Maxim Kokryashkin via Tarantool-patches @ 2023-06-07 12:25 UTC (permalink / raw)
  To: tarantool-patches, skaplun, sergeyb

Sometimes, the Lua stack can be inconsistent during
the FFUNC execution, which may lead to a sysprof
crash during the stack unwinding.

This patch replaces the `top_frame` property of `global_State`
with `lj_sysprof_topframe` structure, which contains `top_frame`
and `ffid` properties. `ffid` property makes sense only when the
LuaJIT VM state is set to `FFUNC`. That property is set to the
ffid of the fast function that VM is about to execute.
In the same time, `top_frame` property is not updated now, so
the top frame of the Lua stack can be streamed based on the ffid,
and the rest of the Lua stack can be streamed as usual.

Resolves tarantool/tarantool#8594
---
Changes in v2:
- Sysprof binary data is now dumped into `/dev/null` to avoid cluttering
of the test runner drive

Branch: https://github.com/tarantool/luajit/tree/fckxorg/gh-8594-sysprof-ffunc-crash
PR: https://github.com/tarantool/tarantool/pull/8737

 src/lj_obj.h                                  |  7 +++-
 src/lj_sysprof.c                              | 26 ++++++++++++---
 src/vm_x64.dasc                               | 21 ++++++++++--
 src/vm_x86.dasc                               | 22 ++++++++++---
 .../gh-8594-sysprof-ffunc-crash.test.lua      | 33 +++++++++++++++++++
 5 files changed, 96 insertions(+), 13 deletions(-)
 create mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua

diff --git a/src/lj_obj.h b/src/lj_obj.h
index 45507e0d..186433a3 100644
--- a/src/lj_obj.h
+++ b/src/lj_obj.h
@@ -598,6 +598,11 @@ enum {
   GCSmax
 };
 
+struct lj_sysprof_topframe {
+  TValue *top_frame; /* Top frame for sysprof. */
+  uint8_t ffid; /* FFID of the fast function VM is about to execute. */
+};
+
 typedef struct GCState {
   GCSize total;		/* Memory currently allocated. */
   GCSize threshold;	/* Memory threshold. */
@@ -675,7 +680,7 @@ typedef struct global_State {
   MRef ctype_state;	/* Pointer to C type state. */
   GCRef gcroot[GCROOT_MAX];  /* GC roots. */
 #ifdef LJ_HASSYSPROF
-  TValue *top_frame;	/* Top frame for sysprof. */
+  struct lj_sysprof_topframe top_frame_info;	/* Top frame info for sysprof. */
 #endif
 } global_State;
 
diff --git a/src/lj_sysprof.c b/src/lj_sysprof.c
index 2e9ed9b3..0a341e16 100644
--- a/src/lj_sysprof.c
+++ b/src/lj_sysprof.c
@@ -109,6 +109,12 @@ static void stream_epilogue(struct sysprof *sp)
   lj_wbuf_addbyte(&sp->out, LJP_EPILOGUE_BYTE);
 }
 
+static void stream_ffunc_impl(struct lj_wbuf *buf, uint8_t ffid)
+{
+  lj_wbuf_addbyte(buf, LJP_FRAME_FFUNC);
+  lj_wbuf_addu64(buf, ffid);
+}
+
 static void stream_lfunc(struct lj_wbuf *buf, const GCfunc *func)
 {
   lua_assert(isluafunc(func));
@@ -129,8 +135,7 @@ static void stream_cfunc(struct lj_wbuf *buf, const GCfunc *func)
 static void stream_ffunc(struct lj_wbuf *buf, const GCfunc *func)
 {
   lua_assert(isffunc(func));
-  lj_wbuf_addbyte(buf, LJP_FRAME_FFUNC);
-  lj_wbuf_addu64(buf, func->c.ffid);
+  stream_ffunc_impl(buf, func->c.ffid);
 }
 
 static void stream_frame_lua(struct lj_wbuf *buf, const cTValue *frame)
@@ -148,7 +153,7 @@ static void stream_frame_lua(struct lj_wbuf *buf, const cTValue *frame)
     lua_assert(0);
 }
 
-static void stream_backtrace_lua(struct sysprof *sp)
+static void stream_backtrace_lua(struct sysprof *sp, uint32_t vmstate)
 {
   global_State *g = sp->g;
   struct lj_wbuf *buf = &sp->out;
@@ -158,8 +163,19 @@ static void stream_backtrace_lua(struct sysprof *sp)
   lua_assert(g != NULL);
   L = gco2th(gcref(g->cur_L));
   lua_assert(L != NULL);
+  /*
+  ** Lua stack may be inconsistent during the execution of a
+  ** fast-function, so instead of updating the `top_frame` for
+  ** it, its `ffid` is set instead. The first frame on the
+  ** result stack is streamed manually, and the rest of the
+  ** stack is streamed based on the previous `top_frame` value.
+  */
+  if (vmstate == LJ_VMST_FFUNC) {
+    uint8_t ffid = g->top_frame_info.ffid;
+    stream_ffunc_impl(buf, ffid);
+  }
 
-  top_frame = g->top_frame - 1; //(1 + LJ_FR2)
+  top_frame = g->top_frame_info.top_frame - 1;
 
   bot = tvref(L->stack) + LJ_FR2;
   /* Traverse frames backwards */
@@ -234,7 +250,7 @@ static void stream_trace(struct sysprof *sp, uint32_t vmstate)
 static void stream_guest(struct sysprof *sp, uint32_t vmstate)
 {
   lj_wbuf_addbyte(&sp->out, (uint8_t)vmstate);
-  stream_backtrace_lua(sp);
+  stream_backtrace_lua(sp, vmstate);
   stream_backtrace_host(sp);
 }
 
diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
index 7b04b928..3a35b9f7 100644
--- a/src/vm_x64.dasc
+++ b/src/vm_x64.dasc
@@ -353,14 +353,29 @@
 |// it syncs with the BASE register only when the control is passed to
 |// user code. So we need to sync the BASE on each vmstate change to
 |// keep it consistent.
+|// The only execption are FFUNCs because sometimes even internal BASE
+|// stash is inconsistent for them. To address that issue, their ffid
+|// is stashed instead, so the corresponding frame can be streamed
+|// manually.
 |.macro set_vmstate_sync_base, st
 |.if LJ_HASSYSPROF
 |  set_vmstate INTERP  // Guard for non-atomic VM context restoration
-|  mov qword [DISPATCH+DISPATCH_GL(top_frame)], BASE
+|  mov qword [DISPATCH+DISPATCH_GL(top_frame_info.top_frame)], BASE
 |.endif
 |  set_vmstate st
 |.endmacro
 |
+|.macro set_vmstate_ffunc
+|.if LJ_HASSYSPROF
+|  set_vmstate INTERP
+|  mov TMPR, [BASE - 16]
+|  cleartp LFUNC:TMPR
+|  mov r10b, LFUNC:TMPR->ffid // r10b is the byte-sized part of TMPR
+|  mov byte [DISPATCH+DISPATCH_GL(top_frame_info.ffid)], r10b
+|.endif
+|  set_vmstate FFUNC
+|.endmacro
+|
 |// Uses TMPRd (r10d).
 |.macro save_vmstate
 |.if not WIN
@@ -376,7 +391,7 @@
 |  set_vmstate INTERP
 |  mov TMPR, SAVE_L
 |  mov TMPR, L:TMPR->base
-|  mov qword [DISPATCH+DISPATCH_GL(top_frame)], TMPR
+|  mov qword [DISPATCH+DISPATCH_GL(top_frame_info.top_frame)], TMPR
 |.endif
 |  mov TMPRd, SAVE_VMSTATE
 |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], TMPRd
@@ -1188,7 +1203,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |.macro .ffunc, name
   |->ff_ .. name:
-  |  set_vmstate_sync_base FFUNC
+  |  set_vmstate_ffunc
   |.endmacro
   |
   |.macro .ffunc_1, name
diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc
index bd1e940e..fabeec9f 100644
--- a/src/vm_x86.dasc
+++ b/src/vm_x86.dasc
@@ -451,14 +451,28 @@
 |// it syncs with the BASE register only when the control is passed to
 |// user code. So we need to sync the BASE on each vmstate change to
 |// keep it consistent.
+|// The only execption are FFUNCs because sometimes even internal BASE
+|// stash is inconsistent for them. To address that issue, their ffid
+|// is stashed instead, so the corresponding frame can be streamed
+|// manually.
 |.macro set_vmstate_sync_base, st
 |.if LJ_HASSYSPROF
 |  set_vmstate INTERP  // Guard for non-atomic VM context restoration
-|  mov dword [DISPATCH+DISPATCH_GL(top_frame)], BASE
+|  mov dword [DISPATCH+DISPATCH_GL(top_frame_info.top_frame)], BASE
 |.endif
 |  set_vmstate st
 |.endmacro
 |
+|.macro set_vmstate_ffunc
+|.if LJ_HASSYSPROF
+|  set_vmstate INTERP
+|  mov LFUNC:XCHGd, [BASE - 8]
+|  mov r11b, LFUNC:XCHGd->ffid // r11b is the byte-sized part of XCHGd
+|  mov byte [DISPATCH+DISPATCH_GL(top_frame_info.ffid)], r11b
+|.endif
+|  set_vmstate FFUNC
+|.endmacro
+|
 |// Uses spilled ecx on x86 or XCHGd (r11d) on x64.
 |.macro save_vmstate
 |.if not WIN
@@ -485,7 +499,7 @@
 |.if LJ_HASSYSPROF
 |  mov ecx, SAVE_L
 |  mov ecx, L:ecx->base
-|  mov dword [DISPATCH+DISPATCH_GL(top_frame)], ecx
+|  mov dword [DISPATCH+DISPATCH_GL(top_frame_info.top_frame)], ecx
 |.endif
 |  mov ecx, SAVE_VMSTATE
 |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], ecx
@@ -494,7 +508,7 @@
 |.if LJ_HASSYSPROF
 |  mov XCHGd, SAVE_L
 |  mov XCHGd, L:XCHGd->base
-|  mov dword [DISPATCH+DISPATCH_GL(top_frame)], XCHGd
+|  mov dword [DISPATCH+DISPATCH_GL(top_frame_info.top_frame)], XCHGd
 |.endif
 |  mov XCHGd, SAVE_VMSTATE
 |  mov dword [DISPATCH+DISPATCH_GL(vmstate)], XCHGd
@@ -1488,7 +1502,7 @@ static void build_subroutines(BuildCtx *ctx)
   |
   |.macro .ffunc, name
   |->ff_ .. name:
-  |  set_vmstate_sync_base FFUNC
+  |  set_vmstate_ffunc
   |.endmacro
   |
   |.macro .ffunc_1, name
diff --git a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
new file mode 100644
index 00000000..027eed74
--- /dev/null
+++ b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
@@ -0,0 +1,33 @@
+local tap = require('tap')
+local test = tap.test('gh-8594-sysprof-ffunc-crash'):skipcond({
+  ['Sysprof is implemented for x86_64 only'] = jit.arch ~= 'x86' and
+                                               jit.arch ~= 'x64',
+  ['Sysprof is implemented for Linux only'] = jit.os ~= "Linux",
+})
+
+test:plan(1)
+
+jit.off()
+-- XXX: Run JIT tuning functions in a safe frame to avoid errors
+-- thrown when LuaJIT is compiled with JIT engine disabled.
+pcall(jit.flush)
+
+local TMP_BINFILE = '/dev/null'
+
+local res, err = misc.sysprof.start{
+  mode = 'C',
+  interval = 1,
+  path = TMP_BINFILE,
+}
+assert(res, err)
+
+for i = 1, 1e6 do
+  tostring(i)
+end
+
+res, err = misc.sysprof.stop()
+assert(res, err)
+
+test:ok(true, 'sysprof finished successfully')
+
+os.exit(test:check() and 0 or 1)
-- 
2.40.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2] sysprof: fix crash during FFUNC stream
  2023-06-07 12:25 [Tarantool-patches] [PATCH luajit v2] sysprof: fix crash during FFUNC stream Maxim Kokryashkin via Tarantool-patches
@ 2023-06-29  8:54 ` Sergey Kaplun via Tarantool-patches
  2023-07-09 12:20 ` Sergey Kaplun via Tarantool-patches
  1 sibling, 0 replies; 3+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-06-29  8:54 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
Thanks for the patch!
Please consider my comments below.

On 07.06.23, Maxim Kokryashkin wrote:
> Sometimes, the Lua stack can be inconsistent during
> the FFUNC execution, which may lead to a sysprof
> crash during the stack unwinding.
> 
> This patch replaces the `top_frame` property of `global_State`
> with `lj_sysprof_topframe` structure, which contains `top_frame`
> and `ffid` properties. `ffid` property makes sense only when the
> LuaJIT VM state is set to `FFUNC`. That property is set to the
> ffid of the fast function that VM is about to execute.
> In the same time, `top_frame` property is not updated now, so
> the top frame of the Lua stack can be streamed based on the ffid,
> and the rest of the Lua stack can be streamed as usual.
> 
> Resolves tarantool/tarantool#8594
> ---
> Changes in v2:
> - Sysprof binary data is now dumped into `/dev/null` to avoid cluttering
> of the test runner drive
> 
> Branch: https://github.com/tarantool/luajit/tree/fckxorg/gh-8594-sysprof-ffunc-crash
> PR: https://github.com/tarantool/tarantool/pull/8737
> 
>  src/lj_obj.h                                  |  7 +++-
>  src/lj_sysprof.c                              | 26 ++++++++++++---
>  src/vm_x64.dasc                               | 21 ++++++++++--
>  src/vm_x86.dasc                               | 22 ++++++++++---
>  .../gh-8594-sysprof-ffunc-crash.test.lua      | 33 +++++++++++++++++++
>  5 files changed, 96 insertions(+), 13 deletions(-)
>  create mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
> 
> diff --git a/src/lj_obj.h b/src/lj_obj.h
> index 45507e0d..186433a3 100644
> --- a/src/lj_obj.h
> +++ b/src/lj_obj.h
> @@ -598,6 +598,11 @@ enum {
>    GCSmax
>  };
>  
> +struct lj_sysprof_topframe {
> +  TValue *top_frame; /* Top frame for sysprof. */
> +  uint8_t ffid; /* FFID of the fast function VM is about to execute. */
> +};

I concerned a bit that the structure isn't well alligned. Maybe we
should place ffid on the top, to make a "hole" in the structure, but it
will be 64-bit alligned.

> +
>  typedef struct GCState {
>    GCSize total;		/* Memory currently allocated. */
>    GCSize threshold;	/* Memory threshold. */
> @@ -675,7 +680,7 @@ typedef struct global_State {
>    MRef ctype_state;	/* Pointer to C type state. */
>    GCRef gcroot[GCROOT_MAX];  /* GC roots. */
>  #ifdef LJ_HASSYSPROF
> -  TValue *top_frame;	/* Top frame for sysprof. */
> +  struct lj_sysprof_topframe top_frame_info;	/* Top frame info for sysprof. */
>  #endif
>  } global_State;
>  
> diff --git a/src/lj_sysprof.c b/src/lj_sysprof.c
> index 2e9ed9b3..0a341e16 100644
> --- a/src/lj_sysprof.c
> +++ b/src/lj_sysprof.c

<snipped>

> diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
> index 7b04b928..3a35b9f7 100644
> --- a/src/vm_x64.dasc
> +++ b/src/vm_x64.dasc
> @@ -353,14 +353,29 @@
>  |// it syncs with the BASE register only when the control is passed to
>  |// user code. So we need to sync the BASE on each vmstate change to
>  |// keep it consistent.
> +|// The only execption are FFUNCs because sometimes even internal BASE

Typo: s/execption/exception/

> +|// stash is inconsistent for them. To address that issue, their ffid
> +|// is stashed instead, so the corresponding frame can be streamed
> +|// manually.

<snipped>

> +|.macro set_vmstate_ffunc
> +|.if LJ_HASSYSPROF
> +|  set_vmstate INTERP
> +|  mov TMPR, [BASE - 16]
> +|  cleartp LFUNC:TMPR

I suppose that this line is excess: we don't work with TMPR as LFUNC any
again after this chunk.

> +|  mov r10b, LFUNC:TMPR->ffid // r10b is the byte-sized part of TMPR

So, maybe its better to define a macro instead, like `TMPRb`.

> +|  mov byte [DISPATCH+DISPATCH_GL(top_frame_info.ffid)], r10b
> +|.endif
> +|  set_vmstate FFUNC
> +|.endmacro
> +|
>  |// Uses TMPRd (r10d).
>  |.macro save_vmstate
>  |.if not WIN
> @@ -376,7 +391,7 @@

<snipped>

> diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc
> index bd1e940e..fabeec9f 100644
> --- a/src/vm_x86.dasc
> +++ b/src/vm_x86.dasc
> @@ -451,14 +451,28 @@
>  |// it syncs with the BASE register only when the control is passed to
>  |// user code. So we need to sync the BASE on each vmstate change to
>  |// keep it consistent.
> +|// The only execption are FFUNCs because sometimes even internal BASE

Typo: s/execption/exception/

> +|// stash is inconsistent for them. To address that issue, their ffid
> +|// is stashed instead, so the corresponding frame can be streamed
> +|// manually.

<snipped>

>  |
> +|.macro set_vmstate_ffunc
> +|.if LJ_HASSYSPROF
> +|  set_vmstate INTERP
> +|  mov LFUNC:XCHGd, [BASE - 8]

What about the x86 arch -- XCHGd isn't defined for it, so I'm very
surprised that the VM is even built :)...
We should spill ECX here too, I suppose.

| >>> src/luajit -e 'print(jit.arch)'
| x86
| >>> cd test/tarantool-tests/
| >>> LUA_PATH="./?.lua;../../src/?.lua;;" ../../src/luajit gh-8594-sysprof-ffunc-crash.test.lua
| TAP version 13
| 1..1
| Segmentation fault

Build like the following:
| make -j CC="gcc -m32" CCDEBUG=" -g -ggdb3" CFLAGS=" -O0" XCFLAGS=" -DLUA_USE_APICHECK -DLUA_USE_ASSERT " -f Makefile.original

Side note: I'm really dissapointed that we still don't have some flags
to do it from cmake, so it will be available in the our exotic build
testing.

> +|  mov r11b, LFUNC:XCHGd->ffid // r11b is the byte-sized part of XCHGd

So, maybe its better to define a macro instead, like `XCHGb`.

> +|  mov byte [DISPATCH+DISPATCH_GL(top_frame_info.ffid)], r11b
> +|.endif
> +|  set_vmstate FFUNC
> +|.endmacro
> +|
>  |// Uses spilled ecx on x86 or XCHGd (r11d) on x64.
>  |.macro save_vmstate
>  |.if not WIN
> @@ -485,7 +499,7 @@

<snipped>

> diff --git a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
> new file mode 100644
> index 00000000..027eed74
> --- /dev/null
> +++ b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
> @@ -0,0 +1,33 @@
> +local tap = require('tap')
> +local test = tap.test('gh-8594-sysprof-ffunc-crash'):skipcond({
> +  ['Sysprof is implemented for x86_64 only'] = jit.arch ~= 'x86' and
> +                                               jit.arch ~= 'x64',
> +  ['Sysprof is implemented for Linux only'] = jit.os ~= "Linux",

Nit: Typo: s/"Linux"/'Linux'/

> +})

<snipped>

> -- 
> 2.40.1
> 

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Tarantool-patches] [PATCH luajit v2] sysprof: fix crash during FFUNC stream
  2023-06-07 12:25 [Tarantool-patches] [PATCH luajit v2] sysprof: fix crash during FFUNC stream Maxim Kokryashkin via Tarantool-patches
  2023-06-29  8:54 ` Sergey Kaplun via Tarantool-patches
@ 2023-07-09 12:20 ` Sergey Kaplun via Tarantool-patches
  1 sibling, 0 replies; 3+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2023-07-09 12:20 UTC (permalink / raw)
  To: Maxim Kokryashkin; +Cc: tarantool-patches

Hi, Maxim!
I see only changes related to the profiling interval. May you resend the
letter with the other fixes too (maybe you've forgot to commit them)?

-- 
Best regards,
Sergey Kaplun

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-07-09 12:24 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-07 12:25 [Tarantool-patches] [PATCH luajit v2] sysprof: fix crash during FFUNC stream Maxim Kokryashkin via Tarantool-patches
2023-06-29  8:54 ` Sergey Kaplun via Tarantool-patches
2023-07-09 12:20 ` Sergey Kaplun via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox