[Tarantool-patches] [PATCH luajit v2] sysprof: fix crash during FFUNC stream

Sergey Kaplun skaplun at tarantool.org
Thu Jun 29 11:54:55 MSK 2023


Hi, Maxim!
Thanks for the patch!
Please consider my comments below.

On 07.06.23, Maxim Kokryashkin wrote:
> Sometimes, the Lua stack can be inconsistent during
> the FFUNC execution, which may lead to a sysprof
> crash during the stack unwinding.
> 
> This patch replaces the `top_frame` property of `global_State`
> with `lj_sysprof_topframe` structure, which contains `top_frame`
> and `ffid` properties. `ffid` property makes sense only when the
> LuaJIT VM state is set to `FFUNC`. That property is set to the
> ffid of the fast function that VM is about to execute.
> In the same time, `top_frame` property is not updated now, so
> the top frame of the Lua stack can be streamed based on the ffid,
> and the rest of the Lua stack can be streamed as usual.
> 
> Resolves tarantool/tarantool#8594
> ---
> Changes in v2:
> - Sysprof binary data is now dumped into `/dev/null` to avoid cluttering
> of the test runner drive
> 
> Branch: https://github.com/tarantool/luajit/tree/fckxorg/gh-8594-sysprof-ffunc-crash
> PR: https://github.com/tarantool/tarantool/pull/8737
> 
>  src/lj_obj.h                                  |  7 +++-
>  src/lj_sysprof.c                              | 26 ++++++++++++---
>  src/vm_x64.dasc                               | 21 ++++++++++--
>  src/vm_x86.dasc                               | 22 ++++++++++---
>  .../gh-8594-sysprof-ffunc-crash.test.lua      | 33 +++++++++++++++++++
>  5 files changed, 96 insertions(+), 13 deletions(-)
>  create mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
> 
> diff --git a/src/lj_obj.h b/src/lj_obj.h
> index 45507e0d..186433a3 100644
> --- a/src/lj_obj.h
> +++ b/src/lj_obj.h
> @@ -598,6 +598,11 @@ enum {
>    GCSmax
>  };
>  
> +struct lj_sysprof_topframe {
> +  TValue *top_frame; /* Top frame for sysprof. */
> +  uint8_t ffid; /* FFID of the fast function VM is about to execute. */
> +};

I'm a bit concerned that the structure isn't well aligned. Maybe we
should place ffid at the top: that makes a "hole" in the structure, but it
will be 64-bit aligned.
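
A quick way to compare the two field orders is to print the offsets and
sizes the compiler actually picks. This is only a sketch: the struct
names and `print_layouts()` are made up for illustration, and `TValue`
is left as an opaque stand-in since only the pointer size matters here.
On common ABIs the compiler is expected to pad either order so that the
pointer stays naturally aligned, so the difference is only where the
padding ends up.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Opaque stand-in for LuaJIT's TValue (assumption: only used via pointer). */
typedef struct TValue TValue;

/* Field order as in the patch: pointer first, padding after ffid. */
struct topframe_ptr_first {
  TValue *top_frame;
  uint8_t ffid;
};

/* Alternative order: ffid first, padding "hole" before the pointer. */
struct topframe_ffid_first {
  uint8_t ffid;
  TValue *top_frame;
};

/* Print offset of each field and total size for both hypothetical layouts. */
void print_layouts(void)
{
  printf("ptr first:  top_frame@%zu ffid@%zu size=%zu\n",
         offsetof(struct topframe_ptr_first, top_frame),
         offsetof(struct topframe_ptr_first, ffid),
         sizeof(struct topframe_ptr_first));
  printf("ffid first: ffid@%zu top_frame@%zu size=%zu\n",
         offsetof(struct topframe_ffid_first, ffid),
         offsetof(struct topframe_ffid_first, top_frame),
         sizeof(struct topframe_ffid_first));
}
```

Calling `print_layouts()` from a small `main()` built with and without
`-m32` shows the layout for both targets.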

> +
>  typedef struct GCState {
>    GCSize total;		/* Memory currently allocated. */
>    GCSize threshold;	/* Memory threshold. */
> @@ -675,7 +680,7 @@ typedef struct global_State {
>    MRef ctype_state;	/* Pointer to C type state. */
>    GCRef gcroot[GCROOT_MAX];  /* GC roots. */
>  #ifdef LJ_HASSYSPROF
> -  TValue *top_frame;	/* Top frame for sysprof. */
> +  struct lj_sysprof_topframe top_frame_info;	/* Top frame info for sysprof. */
>  #endif
>  } global_State;
>  
> diff --git a/src/lj_sysprof.c b/src/lj_sysprof.c
> index 2e9ed9b3..0a341e16 100644
> --- a/src/lj_sysprof.c
> +++ b/src/lj_sysprof.c

<snipped>

> diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
> index 7b04b928..3a35b9f7 100644
> --- a/src/vm_x64.dasc
> +++ b/src/vm_x64.dasc
> @@ -353,14 +353,29 @@
>  |// it syncs with the BASE register only when the control is passed to
>  |// user code. So we need to sync the BASE on each vmstate change to
>  |// keep it consistent.
> +|// The only execption are FFUNCs because sometimes even internal BASE

Typo: s/execption/exception/

> +|// stash is inconsistent for them. To address that issue, their ffid
> +|// is stashed instead, so the corresponding frame can be streamed
> +|// manually.

<snipped>

> +|.macro set_vmstate_ffunc
> +|.if LJ_HASSYSPROF
> +|  set_vmstate INTERP
> +|  mov TMPR, [BASE - 16]
> +|  cleartp LFUNC:TMPR

I suppose that this line is redundant: we don't work with TMPR as LFUNC
anymore after this chunk.

> +|  mov r10b, LFUNC:TMPR->ffid // r10b is the byte-sized part of TMPR

So, maybe it's better to define a macro instead, like `TMPRb`.

> +|  mov byte [DISPATCH+DISPATCH_GL(top_frame_info.ffid)], r10b
> +|.endif
> +|  set_vmstate FFUNC
> +|.endmacro
> +|
>  |// Uses TMPRd (r10d).
>  |.macro save_vmstate
>  |.if not WIN
> @@ -376,7 +391,7 @@

<snipped>

> diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc
> index bd1e940e..fabeec9f 100644
> --- a/src/vm_x86.dasc
> +++ b/src/vm_x86.dasc
> @@ -451,14 +451,28 @@
>  |// it syncs with the BASE register only when the control is passed to
>  |// user code. So we need to sync the BASE on each vmstate change to
>  |// keep it consistent.
> +|// The only execption are FFUNCs because sometimes even internal BASE

Typo: s/execption/exception/

> +|// stash is inconsistent for them. To address that issue, their ffid
> +|// is stashed instead, so the corresponding frame can be streamed
> +|// manually.

<snipped>

>  |
> +|.macro set_vmstate_ffunc
> +|.if LJ_HASSYSPROF
> +|  set_vmstate INTERP
> +|  mov LFUNC:XCHGd, [BASE - 8]

What about the x86 arch -- XCHGd isn't defined for it, so I'm very
surprised that the VM is even built :)...
We should spill ECX here too, I suppose.

| >>> src/luajit -e 'print(jit.arch)'
| x86
| >>> cd test/tarantool-tests/
| >>> LUA_PATH="./?.lua;../../src/?.lua;;" ../../src/luajit gh-8594-sysprof-ffunc-crash.test.lua
| TAP version 13
| 1..1
| Segmentation fault

Built as follows:
| make -j CC="gcc -m32" CCDEBUG=" -g -ggdb3" CFLAGS=" -O0" XCFLAGS=" -DLUA_USE_APICHECK -DLUA_USE_ASSERT " -f Makefile.original

Side note: I'm really disappointed that we still don't have some flags
to do it from cmake, so that it would be available in our exotic build
testing.

> +|  mov r11b, LFUNC:XCHGd->ffid // r11b is the byte-sized part of XCHGd

So, maybe it's better to define a macro instead, like `XCHGb`.

> +|  mov byte [DISPATCH+DISPATCH_GL(top_frame_info.ffid)], r11b
> +|.endif
> +|  set_vmstate FFUNC
> +|.endmacro
> +|
>  |// Uses spilled ecx on x86 or XCHGd (r11d) on x64.
>  |.macro save_vmstate
>  |.if not WIN
> @@ -485,7 +499,7 @@

<snipped>

> diff --git a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
> new file mode 100644
> index 00000000..027eed74
> --- /dev/null
> +++ b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
> @@ -0,0 +1,33 @@
> +local tap = require('tap')
> +local test = tap.test('gh-8594-sysprof-ffunc-crash'):skipcond({
> +  ['Sysprof is implemented for x86_64 only'] = jit.arch ~= 'x86' and
> +                                               jit.arch ~= 'x64',
> +  ['Sysprof is implemented for Linux only'] = jit.os ~= "Linux",

Nit: Typo: s/"Linux"/'Linux'/

> +})

<snipped>

> -- 
> 2.40.1
> 

-- 
Best regards,
Sergey Kaplun

