<HTML><BODY><div>Hi!</div><div>Thanks for the review!</div><div> </div><div> </div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;">Четверг, 13 июля 2023, 11:12 +03:00 от Sergey Bronnikov <sergeyb@tarantool.org>:<br> <div id=""><div class="js-helper js-readmsg-msg"><div><div id="style_16892359651307612832_BODY">Hi, Max!<br><br>Thanks for the patch!<br><br><br>Test is passed with reverted patch, so it is useless.</div></div></div></div></blockquote><div>It is not useless. The issue is really hard to reproduce in a stable</div><div>way, because it is vital to land in the very small region of the</div><div>VM assembly code. I’ve tried to make it happen more often</div><div>by increasing the sampling frequency, however our CI is not</div><div>capable of handling intense cases like that. I’ve reduced</div><div>the frequency, so CI is ok now, but the issue is statistically less</div><div>likely to reproduce.</div><div> </div><div>I’ve added the comment about that to the test,</div><div>so there won’t be any questions regarding that in the future.</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br><br>As I get it right there is a non-zero possibility to catch inconsistency<br>in VM state in a future,<br><br>so I propose to enable sysprof in our fuzzing test [1] with a high<br>sampling interval and check that profiler<br><br>will work on generated Lua programs . What do you think?</div></div></div></div></blockquote><div>Sure, I welcome that kind of testing.</div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br><br>Also see my comments inline.<br><br><br>1.<br><a href="https://github.com/ligurio/tarantool/commit/b8bc3293da66ded41383faf5ea913a2554d987b8" target="_blank">https://github.com/ligurio/tarantool/commit/b8bc3293da66ded41383faf5ea913a2554d987b8</a><br><br><br>Sergey<br><br>On 7/10/23 15:28, Maxim Kokryashkin wrote:<br>> Sometimes, the Lua stack can be inconsistent during<br>> the FFUNC execution, which may lead to a sysprof<br>> crash during the stack unwinding.<br>><br>> This patch replaces the `top_frame` property of `global_State`<br>> with `lj_sysprof_topframe` structure, which contains `top_frame`<br>> and `ffid` properties. `ffid` property makes sense only when the<br>> LuaJIT VM state is set to `FFUNC`. That property is set to the<br>> ffid of the fast function that VM is about to execute.<br>> In the same time, `top_frame` property is not updated now, so<br>> the top frame of the Lua stack can be streamed based on the ffid,<br>> and the rest of the Lua stack can be streamed as usual.<br>><br>> Also, this patch fixes build with plain makefile, by adding<br>> the `LJ_HASSYSPROF` flag support to it.<br>><br>> Resolves tarantool/tarantool#8594<br>> ---<br>> Changes in v3:<br>> - Fixed comments as per review by Sergey<br>><br>> Branch: <a href="https://github.com/tarantool/luajit/tree/fckxorg/gh-8594-sysprof-ffunc-crash" target="_blank">https://github.com/tarantool/luajit/tree/fckxorg/gh-8594-sysprof-ffunc-crash</a><br>> PR: <a href="https://github.com/tarantool/tarantool/pull/8737" target="_blank">https://github.com/tarantool/tarantool/pull/8737</a><br>> src/Makefile.original | 3 ++<br>> src/lj_obj.h | 7 +++-<br>> src/lj_sysprof.c | 26 ++++++++++++---<br>> src/vm_x64.dasc | 22 +++++++++++--<br>> src/vm_x86.dasc | 31 ++++++++++++++---<br>> .../gh-8594-sysprof-ffunc-crash.test.lua | 33 +++++++++++++++++++<br>> 6 files changed, 109 insertions(+), 13 deletions(-)<br>> create mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua<br>><br>> diff --git a/src/Makefile.original b/src/Makefile.original<br>> index aedaaa73..e21a0e56 100644<br>> --- a/src/Makefile.original<br>> +++ b/src/Makefile.original<br>> @@ -441,6 +441,9 @@ ifneq (,$(findstring LJ_NO_UNWIND 1,$(TARGET_TESTARCH)))<br>> DASM_AFLAGS+= -D NO_UNWIND<br>> TARGET_ARCH+= -DLUAJIT_NO_UNWIND<br>> endif<br>> +ifneq (,$(findstring LJ_HASSYSPROF 1,$(TARGET_TESTARCH)))<br>> + DASM_AFLAGS+= -D LJ_HASSYSPROF<br>> +endif<br>> DASM_AFLAGS+= -D VER=$(subst LJ_ARCH_VERSION_,,$(filter LJ_ARCH_VERSION_%,$(subst LJ_ARCH_VERSION ,LJ_ARCH_VERSION_,$(TARGET_TESTARCH))))<br>> ifeq (Windows,$(TARGET_SYS))<br>> DASM_AFLAGS+= -D WIN<br>> diff --git a/src/lj_obj.h b/src/lj_obj.h<br>> index 45507e0d..e17316df 100644<br>> --- a/src/lj_obj.h<br>> +++ b/src/lj_obj.h<br>> @@ -598,6 +598,11 @@ enum {<br>> GCSmax<br>> };<br>><br>> +struct lj_sysprof_topframe {<br>> + uint8_t ffid; /* FFID of the fast function VM is about to execute. */<br>> + TValue *top_frame; /* Top frame for sysprof. */<br>> +};<br>> +<br>> typedef struct GCState {<br>> GCSize total; /* Memory currently allocated. */<br>> GCSize threshold; /* Memory threshold. */<br>> @@ -675,7 +680,7 @@ typedef struct global_State {<br>> MRef ctype_state; /* Pointer to C type state. */<br>> GCRef gcroot[GCROOT_MAX]; /* GC roots. */<br>> #ifdef LJ_HASSYSPROF<br>> - TValue *top_frame; /* Top frame for sysprof. */<br>> + struct lj_sysprof_topframe top_frame_info; /* Top frame info for sysprof. */<br>> #endif<br>> } global_State;<br>><br>> diff --git a/src/lj_sysprof.c b/src/lj_sysprof.c<br>> index 2e9ed9b3..0a341e16 100644<br>> --- a/src/lj_sysprof.c<br>> +++ b/src/lj_sysprof.c<br>> @@ -109,6 +109,12 @@ static void stream_epilogue(struct sysprof *sp)<br>> lj_wbuf_addbyte(&sp->out, LJP_EPILOGUE_BYTE);<br>> }<br>><br>> +static void stream_ffunc_impl(struct lj_wbuf *buf, uint8_t ffid)<br>> +{<br>> + lj_wbuf_addbyte(buf, LJP_FRAME_FFUNC);<br>> + lj_wbuf_addu64(buf, ffid);<br>> +}<br>> +<br>> static void stream_lfunc(struct lj_wbuf *buf, const GCfunc *func)<br>> {<br>> lua_assert(isluafunc(func));<br>> @@ -129,8 +135,7 @@ static void stream_cfunc(struct lj_wbuf *buf, const GCfunc *func)<br>> static void stream_ffunc(struct lj_wbuf *buf, const GCfunc *func)<br>> {<br>> lua_assert(isffunc(func));<br>> - lj_wbuf_addbyte(buf, LJP_FRAME_FFUNC);<br>> - lj_wbuf_addu64(buf, func->c.ffid);<br>> + stream_ffunc_impl(buf, func->c.ffid);<br>> }<br>><br>> static void stream_frame_lua(struct lj_wbuf *buf, const cTValue *frame)<br>> @@ -148,7 +153,7 @@ static void stream_frame_lua(struct lj_wbuf *buf, const cTValue *frame)<br>> lua_assert(0);<br>> }<br>><br>> -static void stream_backtrace_lua(struct sysprof *sp)<br>> +static void stream_backtrace_lua(struct sysprof *sp, uint32_t vmstate)<br>> {<br>> global_State *g = sp->g;<br>> struct lj_wbuf *buf = &sp->out;<br>> @@ -158,8 +163,19 @@ static void stream_backtrace_lua(struct sysprof *sp)<br>> lua_assert(g != NULL);<br>> L = gco2th(gcref(g->cur_L));<br>> lua_assert(L != NULL);<br>> + /*<br>> + ** Lua stack may be inconsistent during the execution of a<br>> + ** fast-function, so instead of updating the `top_frame` for<br>> + ** it, its `ffid` is set instead. The first frame on the<br>> + ** result stack is streamed manually, and the rest of the<br>> + ** stack is streamed based on the previous `top_frame` value.<br>> + */<br>> + if (vmstate == LJ_VMST_FFUNC) {<br>> + uint8_t ffid = g->top_frame_info.ffid;<br>> + stream_ffunc_impl(buf, ffid);<br>> + }<br>><br>> - top_frame = g->top_frame - 1; //(1 + LJ_FR2)<br>> + top_frame = g->top_frame_info.top_frame - 1;<br>><br>> bot = tvref(L->stack) + LJ_FR2;<br>> /* Traverse frames backwards */<br>> @@ -234,7 +250,7 @@ static void stream_trace(struct sysprof *sp, uint32_t vmstate)<br>> static void stream_guest(struct sysprof *sp, uint32_t vmstate)<br>> {<br>> lj_wbuf_addbyte(&sp->out, (uint8_t)vmstate);<br>> - stream_backtrace_lua(sp);<br>> + stream_backtrace_lua(sp, vmstate);<br>> stream_backtrace_host(sp);<br>> }<br>><br>> diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc<br>> index 7b04b928..2a0c3f03 100644<br>> --- a/src/vm_x64.dasc<br>> +++ b/src/vm_x64.dasc<br>> @@ -53,6 +53,7 @@<br>> |.define RDL, RCL<br>> |.define TMPR, r10<br>> |.define TMPRd, r10d<br>> +|.define TMPRb, r10b<br>> |.define ITYPE, r11<br>> |.define ITYPEd, r11d<br>> |<br>> @@ -353,14 +354,29 @@<br>> |// it syncs with the BASE register only when the control is passed to<br>> |// user code. So we need to sync the BASE on each vmstate change to<br>> |// keep it consistent.<br>> +|// The only exception are FFUNCs because sometimes even internal BASE<br>> +|// stash is inconsistent for them. To address that issue, their ffid<br>> +|// is stashed instead, so the corresponding frame can be streamed<br>> +|// manually.<br>> |.macro set_vmstate_sync_base, st<br>> |.if LJ_HASSYSPROF<br>> | set_vmstate INTERP // Guard for non-atomic VM context restoration<br>> -| mov qword [DISPATCH+DISPATCH_GL(top_frame)], BASE<br>> +| mov qword [DISPATCH+DISPATCH_GL(top_frame_info.top_frame)], BASE<br>> |.endif<br>> | set_vmstate st<br>> |.endmacro<br>> |<br>> +|.macro set_vmstate_ffunc<br>> +|.if LJ_HASSYSPROF<br>> +| set_vmstate INTERP<br>> +| mov TMPR, [BASE - 16]<br>> +| cleartp LFUNC:TMPR // Obtain plain address value. Equivalent of `gcval`.<br>> +| mov TMPRb, LFUNC:TMPR->ffid<br>> +| mov byte [DISPATCH+DISPATCH_GL(top_frame_info.ffid)], TMPRb<br>> +|.endif<br>> +| set_vmstate FFUNC<br>> +|.endmacro<br>> +|<br>> |// Uses TMPRd (r10d).<br>> |.macro save_vmstate<br>> |.if not WIN<br>> @@ -376,7 +392,7 @@<br>> | set_vmstate INTERP<br>> | mov TMPR, SAVE_L<br>> | mov TMPR, L:TMPR->base<br>> -| mov qword [DISPATCH+DISPATCH_GL(top_frame)], TMPR<br>> +| mov qword [DISPATCH+DISPATCH_GL(top_frame_info.top_frame)], TMPR<br>> |.endif<br>> | mov TMPRd, SAVE_VMSTATE<br>> | mov dword [DISPATCH+DISPATCH_GL(vmstate)], TMPRd<br>> @@ -1188,7 +1204,7 @@ static void build_subroutines(BuildCtx *ctx)<br>> |<br>> |.macro .ffunc, name<br>> |->ff_ .. name:<br>> - | set_vmstate_sync_base FFUNC<br>> + | set_vmstate_ffunc<br>> |.endmacro<br>> |<br>> |.macro .ffunc_1, name<br>> diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc<br>> index bd1e940e..ff388d58 100644<br>> --- a/src/vm_x86.dasc<br>> +++ b/src/vm_x86.dasc<br>> @@ -101,6 +101,7 @@<br>> |.define FCARG2, CARG2d<br>> |<br>> |.define XCHGd, r11d // TMP on x64, used for exchange.<br>> +|.define XCHGb, r11b // TMPRb on x64, used for exchange.<br>> |.endif<br>> |<br>> |// Type definitions. Some of these are only used for documentation.<br>> @@ -451,14 +452,36 @@<br>> |// it syncs with the BASE register only when the control is passed to<br>> |// user code. So we need to sync the BASE on each vmstate change to<br>> |// keep it consistent.<br>> +|// The only exception are FFUNCs because sometimes even internal BASE<br>> +|// stash is inconsistent for them. To address that issue, their ffid<br>> +|// is stashed instead, so the corresponding frame can be streamed<br>> +|// manually.<br>> |.macro set_vmstate_sync_base, st<br>> |.if LJ_HASSYSPROF<br>> | set_vmstate INTERP // Guard for non-atomic VM context restoration<br>> -| mov dword [DISPATCH+DISPATCH_GL(top_frame)], BASE<br>> +| mov dword [DISPATCH+DISPATCH_GL(top_frame_info.top_frame)], BASE<br>> |.endif<br>> | set_vmstate st<br>> |.endmacro<br>> |<br>> +|.macro set_vmstate_ffunc<br>> +|.if LJ_HASSYSPROF<br>> +| set_vmstate INTERP<br>> +|.if not X64<br>> +| mov SPILLECX, ecx<br>> +| mov LFUNC:ecx, [BASE - 4]<br>> +| mov cl, LFUNC:ecx->ffid<br>> +| mov byte [DISPATCH+DISPATCH_GL(top_frame_info.ffid)], cl<br>> +| mov ecx, SPILLECX<br>> +|.else // X64<br>> +| mov LFUNC:XCHGd, [BASE - 8]<br>> +| mov XCHGb, LFUNC:XCHGd->ffid<br>> +| mov byte [DISPATCH+DISPATCH_GL(top_frame_info.ffid)], XCHGb<br>> +|.endif // X64<br>> +|.endif // LJ_HASSYSPROF<br>> +| set_vmstate FFUNC<br>> +|.endmacro<br>> +|<br>> |// Uses spilled ecx on x86 or XCHGd (r11d) on x64.<br>> |.macro save_vmstate<br>> |.if not WIN<br>> @@ -485,7 +508,7 @@<br>> |.if LJ_HASSYSPROF<br>> | mov ecx, SAVE_L<br>> | mov ecx, L:ecx->base<br>> -| mov dword [DISPATCH+DISPATCH_GL(top_frame)], ecx<br>> +| mov dword [DISPATCH+DISPATCH_GL(top_frame_info.top_frame)], ecx<br>> |.endif<br>> | mov ecx, SAVE_VMSTATE<br>> | mov dword [DISPATCH+DISPATCH_GL(vmstate)], ecx<br>> @@ -494,7 +517,7 @@<br>> |.if LJ_HASSYSPROF<br>> | mov XCHGd, SAVE_L<br>> | mov XCHGd, L:XCHGd->base<br>> -| mov dword [DISPATCH+DISPATCH_GL(top_frame)], XCHGd<br>> +| mov dword [DISPATCH+DISPATCH_GL(top_frame_info.top_frame)], XCHGd<br>> |.endif<br>> | mov XCHGd, SAVE_VMSTATE<br>> | mov dword [DISPATCH+DISPATCH_GL(vmstate)], XCHGd<br>> @@ -1488,7 +1511,7 @@ static void build_subroutines(BuildCtx *ctx)<br>> |<br>> |.macro .ffunc, name<br>> |->ff_ .. name:<br>> - | set_vmstate_sync_base FFUNC<br>> + | set_vmstate_ffunc<br>> |.endmacro<br>> |<br>> |.macro .ffunc_1, name<br>> diff --git a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua<br>> new file mode 100644<br>> index 00000000..e5cdeb07<br>> --- /dev/null<br>> +++ b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua<br>> @@ -0,0 +1,33 @@<br>> +local tap = require('tap')<br>> +local test = tap.test('gh-8594-sysprof-ffunc-crash'):skipcond({<br>> + ['Sysprof is implemented for x86_64 only'] = jit.arch ~= 'x86' and<br>> + jit.arch ~= 'x64',<br>> + ['Sysprof is implemented for Linux only'] = jit.os ~= 'Linux',<br>> +})<br>> +<br>> +test:plan(1)<br>> +<br>> +jit.off()<br>> +-- XXX: Run JIT tuning functions in a safe frame to avoid errors<br>> +-- thrown when LuaJIT is compiled with JIT engine disabled.<br>> +pcall(jit.flush)<br>> +<br>> +local TMP_BINFILE = '/dev/null'<br>> +<br>> +local res, err = misc.sysprof.start{<br>> + mode = 'C',<br><br>In the reproducer from bug description "L" mode is used.<br><br>Why do you use "C" mode here?<br><br>> + interval = 3,<br>> + path = TMP_BINFILE,<br>> +}<br>> +assert(res, err)<br>> +<br>> +for i = 1, 1e5 do<br>> + tostring(i)<br><br>Please add a comment, that we use "tostring" here, because "tostring" is<br>a fast function.<br><br>(Sure it's may be obvious for you, but it will be more clear for those<br>who will read it.)</div></div></div></div></blockquote><div>Added.</div><div> </div><div>Here is the diff:</div><div>================================================</div><div><div><div>diff --git a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua</div><div>index e5cdeb07..a053e41c 100644</div><div>--- a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua</div><div>+++ b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua</div><div>@@ -14,6 +14,12 @@ pcall(jit.flush)</div><div> </div><div> local TMP_BINFILE = '/dev/null'</div><div> </div><div>+-- XXX: The best way to test the issue is to set the profile</div><div>+-- interval to be as short as possible. However, our CI is</div><div>+-- not capable of handling such intense testing, so it was a</div><div>+-- forced decision to reduce the sampling frequency. As a</div><div>+-- result, it is now less likely to reproduce the issue</div><div>+-- statistically, but the test case is still valid.</div><div> local res, err = misc.sysprof.start{</div><div> mode = 'C',</div><div> interval = 3,</div><div>@@ -22,6 +28,7 @@ local res, err = misc.sysprof.start{</div><div> assert(res, err)</div><div> </div><div> for i = 1, 1e5 do</div><div>+ -- XXX: `tostring` is FFUNC.</div><div> tostring(i)</div><div> end</div><div>================================================</div><div>Branch is force-pushed.</div><div> </div></div></div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> +end<br>> +<br>> +res, err = misc.sysprof.stop()<br>> +assert(res, err)<br>> +<br>> +test:ok(true, 'sysprof finished successfully')<br>> +<br>> +os.exit(test:check() and 0 or 1)</div></div></div></div></blockquote><div><div>--<br>Best regards,</div><div>Maxim Kokryashkin</div></div><div> </div></BODY></HTML>