Hi, Sergey!
Thanks for the fixes!
LGTM
--
Best regards,
Maxim Kokryashkin

>Wednesday, 16 August 2023, 18:22 +03:00 from Sergey Kaplun:
>
>Hi, Maxim!
>Thanks for the review!
>Please, see my answers below.
>
>On 16.08.23, Maxim Kokryashkin wrote:
>> Hi, Sergey!
>> Thanks for the patch!
>> Please consider my comments below.
>> On Wed, Aug 09, 2023 at 06:36:06PM +0300, Sergey Kaplun via Tarantool-patches wrote:
>> > From: Mike Pall
>> >
>> > Contributed by James Cowgill.
>> >
>> > (cherry-picked from commit 99cdfbf6a1e8856f64908072ef10443a7eab14f2)
>> >
>> > The issue is observed for the following merged IRs:
>> > | p64 HREF 0001 "a" ; or other keys
>> > | > p64 EQ 0002 [0x4002d0c528] ; nilnode
>> > Sometimes, when we need to rematerialize a constant during evicting of
>> Typo: s/during evicting/during the eviction/
>
>Fixed.
>
>> > the register. So, the instruction related to constant rematerialization
>> Sometimes happens what? The sentence looks kind of chopped.
>
>The "when" is misleading here. Dropped it.
>
>> > is placed in the delay branch slot, which suppose to contain the loads
>> Typo: s/which suppose/which is supposed/
>
>Fixed.
>
>> > of trace exit number to the `$ra` register. The resulting assembly is
>> Typo: s/number/numbers/ (because of `loads` being in the plural form)
>
>Fixed.
>
>> > the following (for example):
>> > | beq ra, r1, 0x400abee9b0 ->exit
>> > | lui r1, 65531 ; delay slot without setting of the `ra`
>> > This leading to the assertion failure during trace exit in
>> Typo: s/leading/leads/
>
>Fixed.
>
>> > `lj_trace_exit()`, since a trace number is incorrect.
>> >
>> > This patch moves the constant register allocations above the main
>> > instruction emitting code in `asm_href()`.
>> AFAICS, it is not just moved, the register allocation logic has changed too.
>> Before the patch, there were a few cases of inplace emissions, which
>> disappeared after the patch.
>> I believe it is important to mention too, along
>> with a more detailed description of the logic changes.
>
>No, the logic is just the same, we just choose the register early.
>Since we use now `cmp64` register everywhere, there is no need to use
>duplicate code in if - else if - else chunks.
>
>> >
>> > Sergey Kaplun:
>> > * added the description and the test for the problem
>> >
>> > Part of tarantool/tarantool#8825
>> > ---
>> >  src/lj_asm_mips.h                             |  42 +++++---
>> >  ...-mips64-href-delay-slot-side-exit.test.lua | 101 ++++++++++++++++++
>> >  2 files changed, 126 insertions(+), 17 deletions(-)
>> >  create mode 100644 test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
>> >
>> > diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
>> > index c27d8413..23ffc3aa 100644
>> > --- a/src/lj_asm_mips.h
>> > +++ b/src/lj_asm_mips.h
>> > @@ -859,6 +859,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>> >    Reg dest = ra_dest(as, ir, allow);
>> >    Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest));
>> >    Reg key = RID_NONE, type = RID_NONE, tmpnum = RID_NONE, tmp1 = RID_TMP, tmp2;
>> > +#if LJ_64
>> > +  Reg cmp64 = RID_NONE;
>> > +#endif
>> >    IRRef refkey = ir->op2;
>> >    IRIns *irkey = IR(refkey);
>> >    int isk = irref_isk(refkey);
>> > @@ -901,6 +904,26 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>> >  #endif
>> >    tmp2 = ra_scratch(as, allow);
>> >    rset_clear(allow, tmp2);
>> > +#if LJ_64
>> > +  if (LJ_SOFTFP || !irt_isnum(kt)) {
>> > +    /* Allocate cmp64 register used for 64-bit comparisons */
>> > +    if (LJ_SOFTFP && irt_isnum(kt)) {
>> > +      cmp64 = key;
>> > +    } else if (!isk && irt_isaddr(kt)) {
>> > +      cmp64 = tmp2;
>> > +    } else {
>> > +      int64_t k;
>> > +      if (isk && irt_isaddr(kt)) {
>> > +        k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
>> > +      } else {
>> > +        lua_assert(irt_ispri(kt) && !irt_isnil(kt));
>> > +        k = ~((int64_t)~irt_toitype(ir->t) << 47);
>> > +      }
>> > +      cmp64 = ra_allock(as, k, allow);
>> > +      rset_clear(allow, cmp64);
>> > +    }
>> > +  }
>> > +#endif
>> >
>> >    /* Key not found in chain: jump to exit (if merged) or load niltv. */
>> >    l_end = emit_label(as);
>> > @@ -943,24 +966,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
>> >      emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
>> >      emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
>> >      emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
>> > -  } else if (LJ_SOFTFP && irt_isnum(kt)) {
>> > -    emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
>> > -    emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
>> > -  } else if (irt_isaddr(kt)) {
>> > -    Reg refk = tmp2;
>> > -    if (isk) {
>> > -      int64_t k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
>> > -      refk = ra_allock(as, k, allow);
>> > -      rset_clear(allow, refk);
>> > -    }
>> > -    emit_branch(as, MIPSI_BEQ, tmp1, refk, l_end);
>> > -    emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
>> >    } else {
>> > -    Reg pri = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow);
>> > -    rset_clear(allow, pri);
>> > -    lua_assert(irt_ispri(kt) && !irt_isnil(kt));
>> > -    emit_branch(as, MIPSI_BEQ, tmp1, pri, l_end);
>> > -    emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
>> > +    emit_branch(as, MIPSI_BEQ, tmp1, cmp64, l_end);
>> > +    emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
>> >    }
>> >    *l_loop = MIPSI_BNE | MIPSF_S(tmp1) | ((as->mcp-l_loop-1) & 0xffffu);
>> >    if (!isk && irt_isaddr(kt)) {
>> > diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
>> > new file mode 100644
>> > index 00000000..8c75e69c
>> > --- /dev/null
>> > +++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
>> > @@ -0,0 +1,101 @@
>> > +local tap = require('tap')
>> > +-- Test file to demonstrate the incorrect JIT behaviour for HREF
>> > +-- IR compilation on mips64.
>> > +-- See also https://github.com/LuaJIT/LuaJIT/pull/362 .
>> > +local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
>> > +  ['Test requires JIT enabled'] = not jit.status(),
>> > +})
>> > +
>> > +test:plan(1)
>> > +
>> > +-- To reproduce the issue we need to compile a trace with
>> > +-- `IR_HREF`, with a lookup of constant hash key GC value. To
>> Typo: s/constant/a constant/
>
>Fixed.
>
>> > +-- prevent an `IR_HREFK` to be emitted instead, we need a table
>> Typo: s/to be/from being/
>
>Fixed.
>
>> > +-- with a huge hash part. Delta of address between the start of
>> Typo: s/Delta/The delta/
>
>Fixed.
>
>> > +-- the hash part of the table and the current node to lookup must
>> > +-- be more than `(1024 * 64 - 1) * sizeof(Node)`.
>> Typo: s/more/greater/
>
>Fixed.
>
>> > +-- See , for details.
>> > +-- XXX: This constant is well suited to prevent test to be flaky,
>> Typo: s/to be/from being/
>
>Fixed.
>
>> > +-- because the aforementioned delta is always large enough.
>> > +-- Also, this constant avoids table rehashing, when inserting new
>> > +-- keys.
>> > +local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
>> > +
>> > +-- XXX: don't set `hotexit` to prevent compilation of trace after
>> > +-- exiting the main test cycle.
>> I suggest rephrasing it the following way:
>> | The `hotexit` option is not set to prevent the compilation of traces
>> | after the emission of the main test cycle.
>
>Rephrased.
>
>> > +jit.opt.start('hotloop=1')
>> > +
>> > +-- Don't use `table.new()`, here by intence -- this leads to the
>> Typo: s/Don't use `table.new()`, here by intence/`table.new()` is not used here by intention/
>
>Fixed.
>
>> > +-- allocation failure for the mcode memory, so traces are not
>> > +-- compiled.
>> > +local filled_tab = {}
>> > +-- Filling-up the table with GC values to minimize the amount of
>> Typo: s/Filling-up/Fill up/
>
>Fixed.
>
>> > +-- hash collisions and increase delta between the start of the
>> Typo: s/delta/the delta/
>
>Fixed.
>
>> > +-- hash part of the table and currently stored node.
>> Typo: s/currently/the currently/
>
>Fixed.
>
>> > +for _ = 1, N_HASH_FIELDS do
>> > +  filled_tab[1LL] = 1
>> > +end
>> > +
>> > +-- luacheck: no unused
>> > +local tab_value_a
>> > +local tab_value_b
>> > +local tab_value_c
>> > +local tab_value_d
>> > +local tab_value_e
>> > +local tab_value_f
>> > +local tab_value_g
>> > +local tab_value_h
>> > +local tab_value_i
>> > +
>> > +-- The function for this trace has a bunch of the following IRs:
>> > +-- p64 HREF 0001 "a" ; or other keys
>> > +-- > p64 EQ 0002 [0x4002d0c528] ; nilnode
>> > +-- Sometimes, when we need to rematerialize a constant during
>> > +-- evicting of the register. So, the instruction related to
>> Typo: s/evicting/the eviction/
>
>Fixed.
>
>> Again, sometimes happens what?
>
>The "when" is misleading here. Dropped it.
>
>> > +-- constant rematerialization is placed in the delay branch slot,
>> > +-- which suppose to contain the loads of trace exit number to the
>> Typo: s/which suppose/which is supposed/
>
>Fixed.
>
>> Typo: s/number/numbers/
>
>Fixed.
>
>> > +-- `$ra` register. This leading to the assertion failure during
>> Typo: s/leading/leads/
>
>Fixed.
>
>> > +-- trace exit in `lj_trace_exit()`, since a trace number is
>> > +-- incorrect. The amount of the side exit to check is empirical
>> Typo: s/exit/exits/
>
>Fixed.
>
>> > +-- (even a little bit more, than necessary just in case).
>> Typo: s/more/greater/
>
>Fixed.
>
>> > +local function href_const(tab)
>> > +  tab_value_a = tab.a
>> > +  tab_value_b = tab.b
>> > +  tab_value_c = tab.c
>> > +  tab_value_d = tab.d
>> > +  tab_value_e = tab.e
>> > +  tab_value_f = tab.f
>> > +  tab_value_g = tab.g
>> > +  tab_value_h = tab.h
>> > +  tab_value_i = tab.i
>> > +end
>> > +
>> > +-- Compile main trace first.
>> Typo: s/main/the main/
>
>Fixed.
>
>> > +href_const(filled_tab)
>> > +href_const(filled_tab)
>> > +
>> > +-- Now brute-force side exits to check that they are compiled
>> > +-- correct. Take side exits in the reverse order to take a new
>> Typo: s/correct/correctly/
>> Typo: s/the reverse/reverse/
>
>Fixed.
>
>See the iterative patch below:
>
>===================================================================
>diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
>index 8c75e69c..b4ee9e2b 100644
>--- a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
>+++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
>@@ -9,29 +9,29 @@ local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
> test:plan(1)
>
> -- To reproduce the issue we need to compile a trace with
>--- `IR_HREF`, with a lookup of constant hash key GC value. To
>--- prevent an `IR_HREFK` to be emitted instead, we need a table
>--- with a huge hash part. Delta of address between the start of
>--- the hash part of the table and the current node to lookup must
>--- be more than `(1024 * 64 - 1) * sizeof(Node)`.
>+-- `IR_HREF`, with a lookup of a constant hash key GC value. To
>+-- prevent an `IR_HREFK` from being emitted instead, we need a
>+-- table with a huge hash part. The delta of address between the
>+-- start of the hash part of the table and the current node to
>+-- lookup must be greater than `(1024 * 64 - 1) * sizeof(Node)`.
> -- See , for details.
>--- XXX: This constant is well suited to prevent test to be flaky,
>--- because the aforementioned delta is always large enough.
>+-- XXX: This constant is well suited to prevent test from being
>+-- flaky, because the aforementioned delta is always large enough.
> -- Also, this constant avoids table rehashing, when inserting new
> -- keys.
> local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
>
>--- XXX: don't set `hotexit` to prevent compilation of trace after
>--- exiting the main test cycle.
>+-- XXX: The `hotexit` option is not set to prevent the compilation
>+-- of traces after the emission of the main test cycle.
> jit.opt.start('hotloop=1')
>
>--- Don't use `table.new()`, here by intence -- this leads to the
>--- allocation failure for the mcode memory, so traces are not
>+-- `table.new()` is not used here by intention -- this leads to
>+-- the allocation failure for the mcode memory, so traces are not
> -- compiled.
> local filled_tab = {}
>--- Filling-up the table with GC values to minimize the amount of
>--- hash collisions and increase delta between the start of the
>--- hash part of the table and currently stored node.
>+-- Fill up the table with GC values to minimize the amount of hash
>+-- collisions and increase the delta between the start of the hash
>+-- part of the table and the currently stored node.
> for _ = 1, N_HASH_FIELDS do
>   filled_tab[1LL] = 1
> end
>@@ -50,14 +50,14 @@ local tab_value_i
> -- The function for this trace has a bunch of the following IRs:
> -- p64 HREF 0001 "a" ; or other keys
> -- > p64 EQ 0002 [0x4002d0c528] ; nilnode
>--- Sometimes, when we need to rematerialize a constant during
>--- evicting of the register. So, the instruction related to
>+-- Sometimes, we need to rematerialize a constant during the
>+-- eviction of the register. So, the instruction related to
> -- constant rematerialization is placed in the delay branch slot,
>--- which suppose to contain the loads of trace exit number to the
>--- `$ra` register. This leading to the assertion failure during
>--- trace exit in `lj_trace_exit()`, since a trace number is
>--- incorrect. The amount of the side exit to check is empirical
>--- (even a little bit more, than necessary just in case).
>+-- which is supposed to contain the load of the trace exit number
>+-- to the `$ra` register. This leads to the assertion failure
>+-- during trace exit in `lj_trace_exit()`, since a trace number is
>+-- incorrect. The amount of the side exits to check is empirical
>+-- (even a little bit greater, than necessary just in case).
> local function href_const(tab)
>   tab_value_a = tab.a
>   tab_value_b = tab.b
>@@ -70,13 +70,13 @@ local function href_const(tab)
>   tab_value_i = tab.i
> end
>
>--- Compile main trace first.
>+-- Compile the main trace first.
> href_const(filled_tab)
> href_const(filled_tab)
>
> -- Now brute-force side exits to check that they are compiled
>--- correct. Take side exits in the reverse order to take a new
>--- side exit each time.
>+-- correctly. Take side exits in reverse order to take a new side
>+-- exit each time.
> filled_tab.i = 'i'
> href_const(filled_tab)
> filled_tab.h = 'h'
>===================================================================
>
>> >
>
>--
>Best regards,
>Sergey Kaplun