[Tarantool-patches] [PATCH luajit v3] sysprof: fix crash during FFUNC stream

Maxim Kokryashkin m.kokryashkin at tarantool.org
Mon Jul 17 15:52:41 MSK 2023


Hi!
Thanks for the review!
 
> 
>>Hi, Maxim!
>>Thanks for the fixes!
>>Generally LGTM, but please consider my comments below.
>>
>>On 10.07.23, Maxim Kokryashkin wrote:
>>> Sometimes, the Lua stack can be inconsistent during
>>> the FFUNC execution, which may lead to a sysprof
>>> crash during the stack unwinding.
>>>
>>> This patch replaces the `top_frame` property of `global_State`
>>> with the `lj_sysprof_topframe` structure, which contains the
>>> `top_frame` and `ffid` properties. The `ffid` property is
>>> meaningful only when the LuaJIT VM state is set to `FFUNC`; it
>>> is set to the ffid of the fast function that the VM is about to
>>> execute. At the same time, the `top_frame` property is not
>>> updated in that case, so the top frame of the Lua stack can be
>>> streamed based on the ffid, and the rest of the Lua stack can be
>>> streamed as usual.
>>>
>>> Also, this patch fixes build with plain makefile, by adding
>>
>>Nit: I suggest rephrasing it as "original Makefile" or
>>Makefile.original.
>>
>>Feel free to ignore.
>Fixed! The branch is force-pushed.
>>
>>> the `LJ_HASSYSPROF` flag support to it.
>>>
>>> Resolves tarantool/tarantool#8594
>>> ---
>>> Changes in v3:
>>> - Fixed comments as per review by Sergey
>>>
>>> Branch:  https://github.com/tarantool/luajit/tree/fckxorg/gh-8594-sysprof-ffunc-crash
>>> PR:  https://github.com/tarantool/tarantool/pull/8737
>>> src/Makefile.original | 3 ++
>>> src/lj_obj.h | 7 +++-
>>> src/lj_sysprof.c | 26 ++++++++++++---
>>> src/vm_x64.dasc | 22 +++++++++++--
>>> src/vm_x86.dasc | 31 ++++++++++++++---
>>> .../gh-8594-sysprof-ffunc-crash.test.lua | 33 +++++++++++++++++++
>>> 6 files changed, 109 insertions(+), 13 deletions(-)
>>> create mode 100644 test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
>>>
>>> diff --git a/src/Makefile.original b/src/Makefile.original
>>> index aedaaa73..e21a0e56 100644
>>> --- a/src/Makefile.original
>>> +++ b/src/Makefile.original
>>
>><snipped>
>>
>>> diff --git a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
>>> new file mode 100644
>>> index 00000000..e5cdeb07
>>> --- /dev/null
>>> +++ b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
>>> @@ -0,0 +1,33 @@
>>> +local tap = require('tap')
>>> +local test = tap.test('gh-8594-sysprof-ffunc-crash'):skipcond({
>>> + ['Sysprof is implemented for x86_64 only'] = jit.arch ~= 'x86' and
>>> + jit.arch ~= 'x64',
>>> + ['Sysprof is implemented for Linux only'] = jit.os ~= 'Linux',
>>> +})
>>> +
>>> +test:plan(1)
>>> +
>>> +jit.off()
>>> +-- XXX: Run JIT tuning functions in a safe frame to avoid errors
>>> +-- thrown when LuaJIT is compiled with JIT engine disabled.
>>> +pcall(jit.flush)
>>> +
>>> +local TMP_BINFILE = '/dev/null'
>>> +
>>> +local res, err = misc.sysprof.start{
>>> + mode = 'C',
>>> + interval = 3,
>>> + path = TMP_BINFILE,
>>> +}
>>> +assert(res, err)
>>> +
>>> +for i = 1, 1e5 do
>>> + tostring(i)
>>> +end
>>
>>With these interval/iteration values, I can hardly reproduce the assertion
>>failure on the master branch on my laptop (about 1 in 10 runs).
>>
>>| >>> git log -n1 --no-decorate --oneline
>>| 8e46d601 test: fix flaky <unit-jit-parse.test.lua>
>>| >>> LUA_PATH="src/?.lua;test/tarantool-tests/?.lua;;" src/luajit test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
>>| TAP version 13
>>| 1..1
>>| ok - sysprof finished successfully
>>
>>I don't know how to resolve this problem while our CI is at death's door...
>>OTOH, we may consider this value sufficient for our CI, so I'll see
>>the problem there.
>>So, I agree with Sergey's opinion about this flaky test.
>I’ve added a conditional parametrization for this test case, here is the diff:
>==========================================================
>diff --git a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
>index a053e41c..347bd087 100644
>--- a/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
>+++ b/test/tarantool-tests/gh-8594-sysprof-ffunc-crash.test.lua
>@@ -17,17 +17,30 @@ local TMP_BINFILE = '/dev/null'
> -- XXX: The best way to test the issue is to set the profile
> -- interval to be as short as possible. However, our CI is
> -- not capable of handling such intense testing, so it was a
>--- forced decision to reduce the sampling frequency. As a
>+-- forced decision to reduce the sampling frequency for it. As a
> -- result, it is now less likely to reproduce the issue
> -- statistically, but the test case is still valid.
>+
>+-- GitHub always sets[1] the `CI` environment variable to `true`
>+-- for every step in a workflow.
>+-- [1]: https://docs.github.com/en/actions/learn-github-actions/variables#default-environment-variables
>+local CI = os.getenv('CI') == 'true'
>+
>+-- Profile interval and number of iterations for CI are
>+-- empirical. Non-CI profile interval is set to be as short
>+-- as possible, so the issue is more likely to reproduce.
>+-- Non-CI number of iterations is greater for the same reason.
>+local PROFILE_INTERVAL = CI and 3 or 1
>+local N_ITERATIONS = CI and 1e5 or 1e6
>+
> local res, err = misc.sysprof.start{
>   mode = 'C',
>-  interval = 3,
>+  interval = PROFILE_INTERVAL,
>   path = TMP_BINFILE,
> }
> assert(res, err)
> 
>-for i = 1, 1e5 do
>+for i = 1, N_ITERATIONS do
>   -- XXX: `tostring` is FFUNC.
>   tostring(i)
> end
>==========================================================
>The branch is force-pushed.
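
For readers who want to try the parametrization outside the test, here is a
minimal standalone sketch of the same selection logic. The helper name
`sysprof_test_params` and the returned table layout are hypothetical and not
part of the patch; only the `CI` variable check and the values 3/1 and
1e5/1e6 come from the diff above.

```lua
-- Hypothetical helper (not part of the patch): choose the sysprof
-- test parameters from the value of the `CI` environment variable.
local function sysprof_test_params(ci_value)
  -- The diff above treats only the exact string 'true' as "running in CI".
  local in_ci = ci_value == 'true'
  return {
    -- Longer interval and fewer iterations in CI, as in the diff.
    interval = in_ci and 3 or 1,
    iterations = in_ci and 1e5 or 1e6,
  }
end

local params = sysprof_test_params(os.getenv('CI'))
print(params.interval, params.iterations)
```

The resulting values would then be passed as `interval` to
`misc.sysprof.start` and used as the loop bound, as the test does.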
>>
>>> +
>>> +res, err = misc.sysprof.stop()
>>> +assert(res, err)
>>> +
>>> +test:ok(true, 'sysprof finished successfully')
>>> +
>>> +os.exit(test:check() and 0 or 1)
>>> --
>>> 2.40.1
>>>
>>
>>--
>>Best regards,
>>Sergey Kaplun
>--
>Best regards,
>Maxim Kokryashkin