[Tarantool-patches] [PATCH luajit] Fix bytecode register allocation for comparisons.
Igor Munkin
imun at tarantool.org
Mon Aug 16 10:20:07 MSK 2021
Sergey,
Thanks for the explanation! Please consider the new comments below.
On 01.08.21, Sergey Kaplun wrote:
> Hi, Igor!
>
> Thanks for the review!
>
> On 01.08.21, Igor Munkin wrote:
> > Sergey,
> >
> > Thanks for the patch! Please consider the comments below. I didn't check
> > the test yet, since I don't get the JIT peculiarities from your commit
> > message and comments. Please provide a clearer description and I'll
> > proceed with the review of the test case then.
> >
> > On 19.07.21, Sergey Kaplun wrote:
> > > From: Mike Pall <mike>
> > >
> > > (cherry picked from commit 2f3f07882fb4ad9c64967d7088461b1ca0a25d3a)
> > >
> > > When LuaJIT is built with LJ_FR2 (GC64), information about frame takes
> > > two slots -- the first takes the TValue with the function to call, the
> > > second takes the additional frame information. The recording JIT
> >
> > Minor: The second slot is the framelink in LuaJIT terms.
>
> Yes, because it takes the additional frame information. How do you want
> to modify this line?
Just say that the second slot takes the framelink: this is lapidary.
>
> >
> > > machinery works pretty the same -- the function IR_KGC is loaded in the
> > > first slot, and the second is set to TREF_FRAME value. This value
> > > should be rewritten after return from a callee. It is done either by the
> > > return values or this slot is cleared (set to zero) manually with
> > > the next bytecode with RA dst mode with the assumption, that the dst RA
> > > takes the next slot after TREF_FRAME, i.e. an earlier instruction uses
> > > the smallest possible destination register (see `lj_record_ins()` for
> > > the details).
> >
> > The main point lies in the monstrous 5-line sentence. I've read several
> > times, but still don't get it. Could you please reword it as a less
> > complex sentence?
>
> The first option is to rewrite this slot with the return values from the
> function.
And this is not the case, right? I mean, this approach works fine even
without the patch, doesn't it?
>
> The second option is clearing slot (i.e. set to zero) manually, when
> there is no values to return. It is done by the next bytecode having RA
> dst mode. This obliges that the destination of RA takes the next slot
> after TREF_FRAME. For this an earlier instruction must use the smallest
> possible destination register (see `lj_record_ins()` for the details).
This is the case, got it, thanks! So, I guess it's enough to adjust the
commit message to be similar to the section above.
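To double-check my reading of the second option, here is a toy model in
plain Lua. It is only a sketch, not LuaJIT code: <record_dst>, <after_call>
and the table layout are hypothetical names of mine, and the clearing rule
is my assumption about what `lj_record_ins()` does for an RA dst
instruction under LJ_FR2. The starting state (callee in slot 0, framelink
in slot 1, maxslot equal to 1) is assumed as well.

```lua
-- Toy model of the recorder's slot bookkeeping; not LuaJIT code.
-- TREF_FRAME and the field names only mimic the ones in lj_record.c.
local TREF_FRAME = 'TREF_FRAME'

-- Store a result into slot `ra` the way an RA dst instruction is assumed
-- to do it under LJ_FR2: a write above maxslot clears only the slot right
-- below the destination.
local function record_dst(J, ra, ref)
  J.base[ra] = ref
  if ra >= J.maxslot then
    if ra > J.maxslot then J.base[ra - 1] = 0 end
    J.maxslot = ra + 1
  end
end

-- Assumed state after the callee has returned via RET0 and has been
-- loaded into slot 0 again: the stale framelink is still in slot 1.
local function after_call()
  return { base = { [0] = 'empty', [1] = TREF_FRAME }, maxslot = 1 }
end

-- The next destination is the slot right after the framelink:
-- the framelink slot gets cleared.
local good = after_call()
record_dst(good, 2, 'uv')
print(good.base[1]) --> 0

-- The next destination is one slot further: slot 2 is cleared instead
-- and the stale framelink survives in slot 1.
local bad = after_call()
record_dst(bad, 3, 'uv')
print(bad.base[1]) --> TREF_FRAME
```

If the model is right, only a write to the slot directly after the
framelink removes the stale TREF_FRAME, which is the rule the commit
message needs to spell out.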
>
> >
> > >
> > > Bytecode allocator swaps operands for ISGT and ISGE comparisons.
I believe this should be called "bytecode emitter" or just "frontend".
> > > When it happens, the aforementioned rule for registers allocations
> > > may be violated. When it happens, and this chunk is recording, the slot
> > > with TREF_FRAME is not rewritten (but the next empty slot after
> > > TREF_FRAME is) during bytecode recording. This leads to JIT slots
> > > inconsistency and assertion failure in `rec_check_slots()` during
> > > recording the next bytecode instruction.
> > >
> > > This patch fixes bytecode register allocation by changing the register
> > > allocation order in case of ISGT and ISGE bytecodes.
> >
> > It's better to use "virtual register" or even "VM register" to avoid
> > ambiguous plain "register" usage.
>
> Changed to VM register.
>
> >
> > >
> > > Sergey Kaplun:
> > > * added the description and the test for the problem
> > >
> > > Resolves tarantool/tarantool#6227
> >
> > Minor: Why #5629 is not mentioned?
>
> Added.
> Branch is updated and force-pushed.
>
> >
> > > ---
> > >
> > > Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
> > > Tarantool branch: https://github.com/tarantool/tarantool/tree/skaplun/gh-6227-fix-bytecode-allocator-for-comp
> > > Issue: https://github.com/tarantool/tarantool/issues/6227
> > >
> > > src/lj_parse.c | 7 +++-
> > > ...ytecode-allocator-for-comparisons.test.lua | 41 +++++++++++++++++++
> > > 2 files changed, 46 insertions(+), 2 deletions(-)
> > > create mode 100644 test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> > >
<snipped>
> > > diff --git a/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> > > new file mode 100644
> > > index 00000000..66f6885e
> > > --- /dev/null
> > > +++ b/test/tarantool-tests/gh-6227-bytecode-allocator-for-comparisons.test.lua
> > > @@ -0,0 +1,41 @@
> > > +local tap = require('tap')
> > > +local test = tap.test('gh-6227-bytecode-allocator-for-comparisons')
> > > +test:plan(1)
> > > +
> > > +-- Test file to demonstrate assertion failure during recording
> > > +-- wrong allocated bytecode for comparisons.
> > > +-- See also https://github.com/tarantool/tarantool/issues/6227.
> > > +
> > > +-- Need function with RET0 bytecode to avoid reset of
> > > +-- the first JIT slot with frame info. Also need no assignments
> > > +-- by the caller.
> > > +local function empty() end
> > > +
> > > +local uv = 0
> > > +
> > > +-- This function needs to reset register enumerating.
> > > +-- Also set `J->maxslot` to zero.
Please add the reason why J->maxslot is zero (it is initialized with
nargs in <rec_call_setup>).
> > > +-- The upvalue function to call is loaded to 0 slot.
> > > +local function bump_frame()
> > > + -- First call function with RET0 to set TREF_FRAME in the
> > > + -- last slot.
> > > + empty()
> > > + -- Test ISGE or ISGT bytecode. These bytecodes swap their
> > > + -- operands. Also, a constant is always loaded into the slot
> > > + -- smaller than upvalue. So, if upvalue loads before KSHORT,
> > > + -- then the difference between registers is more than 2 (2 is
> > > + -- needed for LJ_FR2) and TREF_FRAME slot is not rewriting by
> > > + -- the bytecode after call and return as expected. That leads
If the constant is loaded into a slot prior to the one with an upvalue,
then how can the upvalue be loaded *before* KSHORT? How does the
difference become more than 2? I don't get this math.
Furthermore, what stops you from using local variables?
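My best guess at what the comment tries to say is sketched below; please
confirm or correct it. This is a toy model in plain Lua, not the parser
code: <alloc_order> is a hypothetical helper, and both allocation orders
are my assumptions about the frontend behaviour before and after the
patch. The first free VM register is taken to be 2, since slot 0 holds
the callee and slot 1 holds the framelink.

```lua
-- Toy model of VM register allocation for `empty(1 > uv)`; not LuaJIT
-- code. Registers are handed out in the order the operands are placed.
local function alloc_order(first, second)
  local freereg = 2 -- Slot 0: callee, slot 1: framelink (LJ_FR2).
  local regs = {}
  for _, name in ipairs({ first, second }) do
    regs[name] = freereg
    freereg = freereg + 1
  end
  return regs
end

-- Assumed old order: the constant gets its register first, although
-- the UGET for `uv` precedes KSHORT in the instruction stream.
local old = alloc_order('kshort', 'uv')

-- Assumed new order: the upvalue gets its register first.
local new = alloc_order('uv', 'kshort')

print(old.uv, new.uv) --> 3	2
```

If this is the case, the comment should say that UGET is emitted first
but gets the higher register, so the first write after the call skips
the slot right after the framelink.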
> > > + -- to recording slots inconsistency and assertion failure at
> > > + -- `rec_check_slots()`.
> > > + empty(1>uv)
> > > +end
> > > +
> > > +jit.opt.start('hotloop=1')
It's worth mentioning that such JIT engine tuning allows compiling the
<empty> function first and the loop below only later. As a result, the
<empty> function is not inlined into the loop body, so the fix can be
checked.
> > > +
> > > +for _ = 1,3 do
Minor: Space is missing after the comma.
> > > + bump_frame()
> > > +end
> > > +
> > > +test:ok(true)
> > > +os.exit(test:check() and 0 or 1)
> > > --
> > > 2.31.0
> > >
> >
> > --
> > Best regards,
> > IM
>
> --
> Best regards,
> Sergey Kaplun
--
Best regards,
IM