[Tarantool-patches] [PATCH luajit] LJ_GC64: Always snapshot functions for non-base frames.

Wed Oct 18 17:59:36 MSK 2023

Hi, Max

On 10/17/23 17:56, Maxim Kokryashkin via Tarantool-patches wrote:
> Hi, Sergey!
> Thanks for the patch!
> Please consider my comments below.
>
> On Thu, Oct 12, 2023 at 04:06:25PM +0300, Sergey Bronnikov via Tarantool-patches wrote:
>> From: Sergey Bronnikov <sergeyb at tarantool.org>
>>
>> Reported by Arseny Vakhrushev.
>> Analysis and fix contributed by Peter Cawley.
>>
>> (cherry picked from commit ff1e72acead01df7d8ed0fbb31efd32f57953618)
>>
>> The problem is GC64-specific and could be reproduced with enabled
>> compiler options LUA_USE_ASSERT and LUA_USE_APICHECK.
> I would prefer to see a bug description in the commit message too.
>> Sergey Kaplun:
>>    * minimized reproducer made by fuzzing test
>>
>> Sergey Bronnikov:
>>    * added the description (see a comment in the test)
>>    * added tests for the problem: first one based on the original
>>      reproducer and second one based on a reproducer made by fuzzing test.
>>
>> Part of tarantool/tarantool#8825
> Should be tarantool/tarantool#9145

Fixed.

>> ---
>> Branch: https://github.com/tarantool/luajit/commits/ligurio/lj-611-always-snapshot-functions-for-non-base-frames
>> PR: https://github.com/tarantool/tarantool/pull/9254
>> LJ issue: https://github.com/LuaJIT/LuaJIT/issues/611
>>
>>   src/lj_record.c                               |  1 +
>>   src/lj_snap.c                                 |  9 +++-
>>   ...t-functions-for-non-base-frames-1.test.lua | 36 +++++++++++++++
>>   ...hot-functions-for-non-base-frames.test.lua | 45 +++++++++++++++++++
>>   4 files changed, 89 insertions(+), 2 deletions(-)
>>   create mode 100644 test/tarantool-tests/lj-611-always-snapshot-functions-for-non-base-frames-1.test.lua
>>   create mode 100644 test/tarantool-tests/lj-611-always-snapshot-functions-for-non-base-frames.test.lua
>>
>> diff --git a/src/lj_record.c b/src/lj_record.c
>> index 48a5481b..55785e23 100644
>> --- a/src/lj_record.c
>> +++ b/src/lj_record.c
>> @@ -211,6 +211,7 @@ static TRef getcurrf(jit_State *J)
>>   {
>>     if (J->base[-1-LJ_FR2])
>>       return J->base[-1-LJ_FR2];
>> +  /* Non-base frame functions ought to be loaded already. */
>>     lj_assertJ(J->baseslot == 1+LJ_FR2, "bad baseslot");
>>     return sloadt(J, -1-LJ_FR2, IRT_FUNC, IRSLOAD_READONLY);
>>   }
>> diff --git a/src/lj_snap.c b/src/lj_snap.c
>> index 6c5e5e53..06ae17eb 100644
>> --- a/src/lj_snap.c
>> +++ b/src/lj_snap.c
>> @@ -85,8 +85,13 @@ static MSize snapshot_slots(jit_State *J, SnapEntry *map, BCReg nslots)
>>         IRIns *ir = &J->cur.ir[ref];
>>         if ((LJ_FR2 || !(sn & (SNAP_CONT|SNAP_FRAME))) &&
>>   	  ir->o == IR_SLOAD && ir->op1 == s && ref > retf) {
>> -	/* No need to snapshot unmodified non-inherited slots. */
>> -	if (!(ir->op2 & IRSLOAD_INHERIT))
>> +	/*
>> +	** No need to snapshot unmodified non-inherited slots.
>> +	** But always snapshot the function below a frame in LJ_FR2 mode.
>> +	*/
>> +	if (!(ir->op2 & IRSLOAD_INHERIT) &&
>> +	    (!LJ_FR2 || s == 0 || s+1 == nslots ||
>> +	     !(J->slot[s+1] & (TREF_CONT|TREF_FRAME))))
>>   	  continue;
>>   	/* No need to restore readonly slots and unmodified non-parent slots. */
>>   	if (!(LJ_DUALNUM && (ir->op2 & IRSLOAD_CONVERT)) &&
>> diff --git a/test/tarantool-tests/lj-611-always-snapshot-functions-for-non-base-frames-1.test.lua b/test/tarantool-tests/lj-611-always-snapshot-functions-for-non-base-frames-1.test.lua
>> new file mode 100644
>> index 00000000..759c2862
>> --- /dev/null
>> +++ b/test/tarantool-tests/lj-611-always-snapshot-functions-for-non-base-frames-1.test.lua
> I believe we should give these tests better names that <name>-1.test.lua
> and <name>.test.lua. Maybe those names should include the original source of
> the reproducer (I mean, fuzzer, and the original issue).

Actually specifying source of the test in its name will say nothing to 
everyone who will use tests.

Let's use postfix "original" for the first testcase.

>
<snipped>
>> +
>> +test:plan(1)
>> +
>> +jit.opt.start('hotloop=1', 'hotexit=1')
>> +
>> +-- Test reproduces a bug "GC64: Function missing in snapshot for non-base
> Typo: s/Test/The test/
> Typo: s/a bug/the bug/
Fixed.
>> +-- frame" [1], and based on reproducer described in [2].
> Typo: s/based/is based/
> Typo: s/reproducer/the reproducer/

Fixed.

>> +--
>> +-- [1]: https://github.com/LuaJIT/LuaJIT/issues/611
>> +-- [2]: https://github.com/LuaJIT/LuaJIT/issues/611#issuecomment-679228156
>> +--
>> +-- Function `outer` is recorded to a trace and calls a builtin function that is
> Typo: s/builtin/built-in/

Fixed.

>> +-- not JIT-compilable and therefore triggers exit to interpreter, and then it
> Typo: s/to interpreter/to the interpreter

Fixed.

>> +-- resumes tracing just after the call returns - this is a trace stitching.
> Typo: s/is a/is/

Fixed.

>> +-- Then, within the call, we need the potential for a side trace. Finally, we need
>> +-- that side exit to be taken enough for the exit to be compiled into a trace.
> Typo: s/taken enough/taken enough times/

Fixed.

>> +-- The loop at the bottom has enough iterations to trigger JIT compilation, and
>> +-- enough more on top on trigger compilation of the not case. Compilation of
> These two lines are incomprehensible. Please parahphrase them.

Updated (and force-pushed):

  --
--- Function `outer` is recorded to a trace and calls a builtin function 
that is
--- not JIT-compilable and therefore triggers exit to interpreter, and 
then it
--- resumes tracing just after the call returns - this is a trace stitching.
--- Then, within the call, we need the potential for a side trace. 
Finally, we need
--- that side exit to be taken enough for the exit to be compiled into a 
trace.
--- The loop at the bottom has enough iterations to trigger JIT 
compilation, and
--- enough more on top on trigger compilation of the not case. 
Compilation of
--- this case hits the assertion failure.
+-- Function `outer` is recorded to a trace and calls a built-in
+-- function that is not JIT-compilable and therefore triggers
+-- exit to the interpreter, and then it resumes tracing just after
+-- the call returns - this is trace stitching. Then, within
+-- the call, we need the potential for a side trace. Finally,
+-- we need that side exit to be taken enough times for the exit
+-- to be compiled into a trace. This compilation hits
+-- the assertion failure.

  local inner
  for _ = 1, 3 do

>> +-- this case hits the assertion failure.
>> +
>> +local inner
>> +for _ = 1, 3 do
>> +  inner = function(_, i)
>> +    return i < 4
>> +  end
>> +end
>> +
>> +local function outer(i)
>> +  -- The function `string.gsub` is not JIT-compilable and triggers a trace
>> +  -- exit. For example, `string.gmatch` and `string.match` are suitable as
>> +  -- well.
>> +  -- See https://github.com/tarantool/tarantool/wiki/LuaJIT-Not-Yet-Implemented.
>> +  inner(string.gsub('', '', ''), i)
>> +end
>> +
>> +for i = 1, 4 do
>> +  outer(i)
>> +end
> I like this repro much more. It is easier to understand than the first one.
> What's the motivation behind having two tests for the same issue?
We had a repro produced by our fuzzing test and we agreed with Sergey K. 
to include this repro as a test too.
>> +
>> +test:ok(true, 'function is present in snapshot')
>> +test:done(true)
>> -- 
>> 2.34.1
>>