Hi, Sergey!
Thanks for the fixes!
LGTM
--
Best regards,
Maxim Kokryashkin
 
 
Среда, 16 августа 2023, 18:22 +03:00 от Sergey Kaplun <skaplun@tarantool.org>:
 
Hi, Maxim!
Thanks for the review!
Please, see my answers below.

On 16.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> Please consider my comments below.
> On Wed, Aug 09, 2023 at 06:36:06PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall <mike>
> >
> > Contributed by James Cowgill.
> >
> > (cherry-picked from commit 99cdfbf6a1e8856f64908072ef10443a7eab14f2)
> >
> > The issue is observed for the following merged IRs:
> > | p64 HREF 0001 "a" ; or other keys
> > | > p64 EQ 0002 [0x4002d0c528] ; nilnode
> > Sometimes, when we need to rematerialize a constant during evicting of
> Typo: s/during evicting/during the eviction/

Fixed.

> > the register. So, the instruction related to constant rematerialization
> Sometimes happens what? The sentence looks kind of chopped.

The "when" is misleading here. Dropped it.

> > is placed in the delay branch slot, which suppose to contain the loads
> Typo: s/which suppose/which is supposed/

Fixed.

> > of trace exit number to the `$ra` register. The resulting assembly is
> Typo: s/number/numbers/ (because of `loads` being in the plural form)

Fixed.

> > the following (for example):
> > | beq ra, r1, 0x400abee9b0 ->exit
> > | lui r1, 65531 ; delay slot without setting of the `ra`
> > This leading to the assertion failure during trace exit in
> Typo: s/leading/leads/

Fixed.

> > `lj_trace_exit()`, since a trace number is incorrect.
> >
> > This patch moves the constant register allocations above the main
> > instruction emitting code in `asm_href()`.
> AFAICS, It is not just moved, the register allocation logic has changed too.
> Before the patch, there were a few cases of inplace emissions, which
> disappeared after the patch. I believe it is important to mention to, along
> with a more detailed description of the logic changes.

No, the logic is just the same, we just choose the register early.
Since we use now `cmp64` register everywhere, there is no need to use
duplicate code in if - else if - else chunks.

> >
> > Sergey Kaplun:
> > * added the description and the test for the problem
> >
> > Part of tarantool/tarantool#8825
> > ---
> > src/lj_asm_mips.h | 42 +++++---
> > ...-mips64-href-delay-slot-side-exit.test.lua | 101 ++++++++++++++++++
> > 2 files changed, 126 insertions(+), 17 deletions(-)
> > create mode 100644 test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> >
> > diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> > index c27d8413..23ffc3aa 100644
> > --- a/src/lj_asm_mips.h
> > +++ b/src/lj_asm_mips.h
> > @@ -859,6 +859,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> > Reg dest = ra_dest(as, ir, allow);
> > Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest));
> > Reg key = RID_NONE, type = RID_NONE, tmpnum = RID_NONE, tmp1 = RID_TMP, tmp2;
> > +#if LJ_64
> > + Reg cmp64 = RID_NONE;
> > +#endif
> > IRRef refkey = ir->op2;
> > IRIns *irkey = IR(refkey);
> > int isk = irref_isk(refkey);
> > @@ -901,6 +904,26 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> > #endif
> > tmp2 = ra_scratch(as, allow);
> > rset_clear(allow, tmp2);
> > +#if LJ_64
> > + if (LJ_SOFTFP || !irt_isnum(kt)) {
> > + /* Allocate cmp64 register used for 64-bit comparisons */
> > + if (LJ_SOFTFP && irt_isnum(kt)) {
> > + cmp64 = key;
> > + } else if (!isk && irt_isaddr(kt)) {
> > + cmp64 = tmp2;
> > + } else {
> > + int64_t k;
> > + if (isk && irt_isaddr(kt)) {
> > + k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
> > + } else {
> > + lua_assert(irt_ispri(kt) && !irt_isnil(kt));
> > + k = ~((int64_t)~irt_toitype(ir->t) << 47);
> > + }
> > + cmp64 = ra_allock(as, k, allow);
> > + rset_clear(allow, cmp64);
> > + }
> > + }
> > +#endif
> >
> > /* Key not found in chain: jump to exit (if merged) or load niltv. */
> > l_end = emit_label(as);
> > @@ -943,24 +966,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> > emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
> > emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
> > emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> > - } else if (LJ_SOFTFP && irt_isnum(kt)) {
> > - emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
> > - emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> > - } else if (irt_isaddr(kt)) {
> > - Reg refk = tmp2;
> > - if (isk) {
> > - int64_t k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
> > - refk = ra_allock(as, k, allow);
> > - rset_clear(allow, refk);
> > - }
> > - emit_branch(as, MIPSI_BEQ, tmp1, refk, l_end);
> > - emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
> > } else {
> > - Reg pri = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow);
> > - rset_clear(allow, pri);
> > - lua_assert(irt_ispri(kt) && !irt_isnil(kt));
> > - emit_branch(as, MIPSI_BEQ, tmp1, pri, l_end);
> > - emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
> > + emit_branch(as, MIPSI_BEQ, tmp1, cmp64, l_end);
> > + emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> > }
> > *l_loop = MIPSI_BNE | MIPSF_S(tmp1) | ((as->mcp-l_loop-1) & 0xffffu);
> > if (!isk && irt_isaddr(kt)) {
> > diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> > new file mode 100644
> > index 00000000..8c75e69c
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> > @@ -0,0 +1,101 @@
> > +local tap = require('tap')
> > +-- Test file to demonstrate the incorrect JIT behaviour for HREF
> > +-- IR compilation on mips64.
> > +-- See also https://github.com/LuaJIT/LuaJIT/pull/362.
> > +local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
> > + ['Test requires JIT enabled'] = not jit.status(),
> > +})
> > +
> > +test:plan(1)
> > +
> > +-- To reproduce the issue we need to compile a trace with
> > +-- `IR_HREF`, with a lookup of constant hash key GC value. To
> Typo: s/constant/a constant/

Fixed.

> > +-- prevent an `IR_HREFK` to be emitted instead, we need a table
> Typo: s/to be/from being/

Fixed.

> > +-- with a huge hash part. Delta of address between the start of
> Typo: s/Delta/The delta/

Fixed.

> > +-- the hash part of the table and the current node to lookup must
> > +-- be more than `(1024 * 64 - 1) * sizeof(Node)`.
> Typo: s/more/greater/

Fixed.

> > +-- See <src/lj_record.c>, for details.
> > +-- XXX: This constant is well suited to prevent test to be flaky,
> Typo: s/to be/from being/

Fixed.

> > +-- because the aforementioned delta is always large enough.
> > +-- Also, this constant avoids table rehashing, when inserting new
> > +-- keys.
> > +local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
> > +
> > +-- XXX: don't set `hotexit` to prevent compilation of trace after
> > +-- exiting the main test cycle.
> I suggest rehprasing it the following way:
> | The `hotexit` option is not set to prevent the compilation of traces
> | after the emission of the main test cycle.

Rephrased.

> > +jit.opt.start('hotloop=1')
> > +
> > +-- Don't use `table.new()`, here by intence -- this leads to the
> Typo: s/Don't use `table.new()`, here by intence/`table.new()` is not used here by intention/

Fixed.

> > +-- allocation failure for the mcode memory, so traces are not
> > +-- compiled.
> > +local filled_tab = {}
> > +-- Filling-up the table with GC values to minimize the amount of
> Typo: s/Filling-up/Fill up/

Fixed.

> > +-- hash collisions and increase delta between the start of the
> Typo: s/delta/the delta/

Fixed.

> > +-- hash part of the table and currently stored node.
> Typo: s/currently/the currently/

Fixed.

> > +for _ = 1, N_HASH_FIELDS do
> > + filled_tab[1LL] = 1
> > +end
> > +
> > +-- luacheck: no unused
> > +local tab_value_a
> > +local tab_value_b
> > +local tab_value_c
> > +local tab_value_d
> > +local tab_value_e
> > +local tab_value_f
> > +local tab_value_g
> > +local tab_value_h
> > +local tab_value_i
> > +
> > +-- The function for this trace has a bunch of the following IRs:
> > +-- p64 HREF 0001 "a" ; or other keys
> > +-- > p64 EQ 0002 [0x4002d0c528] ; nilnode
> > +-- Sometimes, when we need to rematerialize a constant during
> > +-- evicting of the register. So, the instruction related to
> Typo: s/evicting/the eviction/

Fixed.

> Again, sometimes happens what?

The "when" is misleading here. Dropped it.

> > +-- constant rematerialization is placed in the delay branch slot,
> > +-- which suppose to contain the loads of trace exit number to the
> Typo: s/which suppose/which is supposed/

Fixed.

> Typo: s/number/numbers/

Fixed.

> > +-- `$ra` register. This leading to the assertion failure during
> Typo: s/leading/leads/

Fixed.

> > +-- trace exit in `lj_trace_exit()`, since a trace number is
> > +-- incorrect. The amount of the side exit to check is empirical
> Typo: s/exit/exits/

Fixed.

> > +-- (even a little bit more, than necessary just in case).
> Typo: s/more/greater/

Fixed.

> > +local function href_const(tab)
> > + tab_value_a = tab.a
> > + tab_value_b = tab.b
> > + tab_value_c = tab.c
> > + tab_value_d = tab.d
> > + tab_value_e = tab.e
> > + tab_value_f = tab.f
> > + tab_value_g = tab.g
> > + tab_value_h = tab.h
> > + tab_value_i = tab.i
> > +end
> > +
> > +-- Compile main trace first.
> Typo: s/main/the main/

Fixed.

> > +href_const(filled_tab)
> > +href_const(filled_tab)
> > +
> > +-- Now brute-force side exits to check that they are compiled
> > +-- correct. Take side exits in the reverse order to take a new
> Typo: s/correct/correctly/
> Typo: s/the reverse/reverse/

Fixed.

<snipped>

See the iterative patch below:

===================================================================
diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
index 8c75e69c..b4ee9e2b 100644
--- a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
+++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
@@ -9,29 +9,29 @@ local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
 test:plan(1)
 
 -- To reproduce the issue we need to compile a trace with
--- `IR_HREF`, with a lookup of constant hash key GC value. To
--- prevent an `IR_HREFK` to be emitted instead, we need a table
--- with a huge hash part. Delta of address between the start of
--- the hash part of the table and the current node to lookup must
--- be more than `(1024 * 64 - 1) * sizeof(Node)`.
+-- `IR_HREF`, with a lookup of a constant hash key GC value. To
+-- prevent an `IR_HREFK` from being emitted instead, we need a
+-- table with a huge hash part. The delta of address between the
+-- start of the hash part of the table and the current node to
+-- lookup must be greater than `(1024 * 64 - 1) * sizeof(Node)`.
 -- See <src/lj_record.c>, for details.
--- XXX: This constant is well suited to prevent test to be flaky,
--- because the aforementioned delta is always large enough.
+-- XXX: This constant is well suited to prevent test from being
+-- flaky, because the aforementioned delta is always large enough.
 -- Also, this constant avoids table rehashing, when inserting new
 -- keys.
 local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
 
--- XXX: don't set `hotexit` to prevent compilation of trace after
--- exiting the main test cycle.
+-- XXX: The `hotexit` option is not set to prevent the compilation
+-- of traces after the emission of the main test cycle.
 jit.opt.start('hotloop=1')
 
--- Don't use `table.new()`, here by intence -- this leads to the
--- allocation failure for the mcode memory, so traces are not
+-- `table.new()` is not used here by intention -- this leads to
+-- the allocation failure for the mcode memory, so traces are not
 -- compiled.
 local filled_tab = {}
--- Filling-up the table with GC values to minimize the amount of
--- hash collisions and increase delta between the start of the
--- hash part of the table and currently stored node.
+-- Fill up the table with GC values to minimize the amount of hash
+-- collisions and increase the delta between the start of the hash
+-- part of the table and the currently stored node.
 for _ = 1, N_HASH_FIELDS do
   filled_tab[1LL] = 1
 end
@@ -50,14 +50,14 @@ local tab_value_i
 -- The function for this trace has a bunch of the following IRs:
 -- p64 HREF 0001 "a" ; or other keys
 -- > p64 EQ 0002 [0x4002d0c528] ; nilnode
--- Sometimes, when we need to rematerialize a constant during
--- evicting of the register. So, the instruction related to
+-- Sometimes, we need to rematerialize a constant during the
+-- eviction of the register. So, the instruction related to
 -- constant rematerialization is placed in the delay branch slot,
--- which suppose to contain the loads of trace exit number to the
--- `$ra` register. This leading to the assertion failure during
--- trace exit in `lj_trace_exit()`, since a trace number is
--- incorrect. The amount of the side exit to check is empirical
--- (even a little bit more, than necessary just in case).
+-- which is supposed to contain the load of the trace exit number
+-- to the `$ra` register. This leads to the assertion failure
+-- during trace exit in `lj_trace_exit()`, since a trace number is
+-- incorrect. The amount of the side exits to check is empirical
+-- (even a little bit greater, than necessary just in case).
 local function href_const(tab)
   tab_value_a = tab.a
   tab_value_b = tab.b
@@ -70,13 +70,13 @@ local function href_const(tab)
   tab_value_i = tab.i
 end
 
--- Compile main trace first.
+-- Compile the main trace first.
 href_const(filled_tab)
 href_const(filled_tab)
 
 -- Now brute-force side exits to check that they are compiled
--- correct. Take side exits in the reverse order to take a new
--- side exit each time.
+-- correctly. Take side exits in reverse order to take a new side
+-- exit each time.
 filled_tab.i = 'i'
 href_const(filled_tab)
 filled_tab.h = 'h'
===================================================================

> >

--
Best regards,
Sergey Kaplun