Hi! Thanks for the review!   >  >>Hi, Maxim! >>Thanks for the fixes! >>Generally LGTM, but please consider my comments below. >> >>On 10.07.23, Maxim Kokryashkin wrote: >>> Sometimes, the Lua stack can be inconsistent during >>> the FFUNC execution, which may lead to a sysprof >>> crash during the stack unwinding. >>> >>> This patch replaces the `top_frame` property of `global_State` >>> with `lj_sysprof_topframe` structure, which contains `top_frame` >>> and `ffid` properties. `ffid` property makes sense only when the >>> LuaJIT VM state is set to `FFUNC`. That property is set to the >>> ffid of the fast function that VM is about to execute. >>> In the same time, `top_frame` property is not updated now, so >>> the top frame of the Lua stack can be streamed based on the ffid, >>> and the rest of the Lua stack can be streamed as usual. >>> >>> Also, this patch fixes build with plain makefile, by adding >> >>Nit: I suggest to rephrase it like "original Makefile" or >>Makefile.original. >> >>Feel free to ignore. >Fixed! The branch is force-pushed. >> >>> the `LJ_HASSYSPROF` flag support to it. >>> >>> Resolves tarantool/tarantool#8594 >>> --- >>> Changes in v3: >>> - Fixed comments as per review by Sergey >>> >>> Branch: https://github.com/tarantool/luajit/tree/fckxorg/gh-8594-sysprof-ffunc-crash >>> PR: https://github.com/tarantool/tarantool/pull/8737 >>> src/Makefile.original | 3 ++ >>> src/lj_obj.h | 7 +++- >>> src/lj_sysprof.c | 26 ++++++++++++--- >>> src/vm_x64.dasc | 22 +++++++++++-- >>> src/vm_x86.dasc | 31 ++++++++++++++--- >>> .../gh-8594-sysprof-ffunc-crash.test.lua | 33 +++++++++++++++++++ >>> 6 files changed, 109 insertions(+), 13 deletions(-) >>> create mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua >>> >>> diff --git a/src/Makefile.original b/src/Makefile.original >>> index aedaaa73..e21a0e56 100644 >>> --- a/src/Makefile.original >>> +++ b/src/Makefile.original >> >> >> >>> diff --git a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua >>> new file mode 100644 >>> index 00000000..e5cdeb07 >>> --- /dev/null >>> +++ b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua >>> @@ -0,0 +1,33 @@ >>> +local tap = require('tap') >>> +local test = tap.test('gh-8594-sysprof-ffunc-crash'):skipcond({ >>> + ['Sysprof is implemented for x86_64 only'] = jit.arch ~= 'x86' and >>> + jit.arch ~= 'x64', >>> + ['Sysprof is implemented for Linux only'] = jit.os ~= 'Linux', >>> +}) >>> + >>> +test:plan(1) >>> + >>> +jit.off() >>> +-- XXX: Run JIT tuning functions in a safe frame to avoid errors >>> +-- thrown when LuaJIT is compiled with JIT engine disabled. >>> +pcall(jit.flush) >>> + >>> +local TMP_BINFILE = '/dev/null' >>> + >>> +local res, err = misc.sysprof.start{ >>> + mode = 'C', >>> + interval = 3, >>> + path = TMP_BINFILE, >>> +} >>> +assert(res, err) >>> + >>> +for i = 1, 1e5 do >>> + tostring(i) >>> +end >> >>Within these (interval/iterations) changes I hardly can see assertion failure >>on master branch for my laptop (1/10 cases). >> >>| >>> git log -n1 --no-decorate --oneline >>| 8e46d601 test: fix flaky >>| >>> LUA_PATH="src/?.lua;test/tarantool-tests/?.lua;;" src/luajit test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua >>| TAP version 13 >>| 1..1 >>| ok - sysprof finished successfully >> >>I don't know how to resolve this problem with our CI at death's door... >>OTOH, we may consider that this value is enough for our CI, so I'll see >>the problem there. >>So, I'll agree with Sergey's opinion about this flaky test. >I’ve added a conditional parametrization for this test case, here is the diff: >========================================================== >diff --git a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua >index a053e41c..347bd087 100644 >--- a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua >+++ b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua >@@ -17,17 +17,30 @@ local TMP_BINFILE = '/dev/null' > -- XXX: The best way to test the issue is to set the profile > -- interval to be as short as possible. However, our CI is > -- not capable of handling such intense testing, so it was a >--- forced decision to reduce the sampling frequency. As a >+-- forced decision to reduce the sampling frequency for it. As a > -- result, it is now less likely to reproduce the issue > -- statistically, but the test case is still valid. >+ >+-- GitHub always sets[1] the `CI` environment variable to `true` >+-- for every step in a workflow. >+-- [1]: https://docs.github.com/en/actions/learn-github-actions/variables#default-environment-variables >+local CI = os.getenv('CI') == 'true' >+ >+-- Profile interval and number of iterations for CI are >+-- empirical. Non-CI profile interval is set to be as short >+-- as possible, so the issue is more likely to reproduce. >+-- Non-CI number of iterations is greater for the same reason. >+local PROFILE_INTERVAL = CI and 3 or 1 >+local N_ITERATIONS = CI and 1e5 or 1e6 >+ > local res, err = misc.sysprof.start{ >   mode = 'C', >-  interval = 3, >+  interval = PROFILE_INTERVAL, >   path = TMP_BINFILE, > } > assert(res, err) >  >-for i = 1, 1e5 do >+for i = 1, N_ITERATIONS do >   -- XXX: `tostring` is FFUNC. >   tostring(i) > end >========================================================== >The branch is force-pushed. >> >>> + >>> +res, err = misc.sysprof.stop() >>> +assert(res, err) >>> + >>> +test:ok(true, 'sysprof finished successfully') >>> + >>> +os.exit(test:check() and 0 or 1) >>> -- >>> 2.40.1 >>> >> >>-- >>Best regards, >>Sergey Kaplun >-- >Best regards, >Maxim Kokryashkin