<HTML><BODY><div>Hi, Sergey!</div><div>Thanks for the fixes!</div><div>LGTM</div><div data-signature-widget="container"><div data-signature-widget="content"><div>--<br>Best regards,</div><div>Maxim Kokryashkin</div></div></div><div> </div><div> </div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;">Wednesday, 16 August 2023, 18:22 +03:00 from Sergey Kaplun <skaplun@tarantool.org>:<br> <div id=""><div class="js-helper js-readmsg-msg"><div><div id="style_16921993240612571467_BODY">Hi, Maxim!<br>Thanks for the review!<br>Please, see my answers below.<br><br>On 16.08.23, Maxim Kokryashkin wrote:<br>> Hi, Sergey!<br>> Thanks for the patch!<br>> Please consider my comments below.<br>> On Wed, Aug 09, 2023 at 06:36:06PM +0300, Sergey Kaplun via Tarantool-patches wrote:<br>> > From: Mike Pall <mike><br>> ><br>> > Contributed by James Cowgill.<br>> ><br>> > (cherry-picked from commit 99cdfbf6a1e8856f64908072ef10443a7eab14f2)<br>> ><br>> > The issue is observed for the following merged IRs:<br>> > | p64 HREF 0001 "a" ; or other keys<br>> > | > p64 EQ 0002 [0x4002d0c528] ; nilnode<br>> > Sometimes, when we need to rematerialize a constant during evicting of<br>> Typo: s/during evicting/during the eviction/<br><br>Fixed.<br><br>> > the register. So, the instruction related to constant rematerialization<br>> Sometimes happens what? The sentence looks kind of chopped.<br><br>The "when" is misleading here. Dropped it.<br><br>> > is placed in the delay branch slot, which suppose to contain the loads<br>> Typo: s/which suppose/which is supposed/<br><br>Fixed.<br><br>> > of trace exit number to the `$ra` register. 
The resulting assembly is<br>> Typo: s/number/numbers/ (because of `loads` being in the plural form)<br><br>Fixed.<br><br>> > the following (for example):<br>> > | beq ra, r1, 0x400abee9b0 ->exit<br>> > | lui r1, 65531 ; delay slot without setting of the `ra`<br>> > This leading to the assertion failure during trace exit in<br>> Typo: s/leading/leads/<br><br>Fixed.<br><br>> > `lj_trace_exit()`, since a trace number is incorrect.<br>> ><br>> > This patch moves the constant register allocations above the main<br>> > instruction emitting code in `asm_href()`.<br>> AFAICS, it is not just moved, the register allocation logic has changed too.<br>> Before the patch, there were a few cases of inplace emissions, which<br>> disappeared after the patch. I believe it is important to mention too, along<br>> with a more detailed description of the logic changes.<br><br>No, the logic is the same; we just choose the register earlier.<br>Since we now use the `cmp64` register everywhere, there is no need to use<br>duplicate code in if - else if - else chunks.<br><br>> ><br>> > Sergey Kaplun:<br>> > * added the description and the test for the problem<br>> ><br>> > Part of tarantool/tarantool#8825<br>> > ---<br>> > src/lj_asm_mips.h | 42 +++++---<br>> > ...-mips64-href-delay-slot-side-exit.test.lua | 101 ++++++++++++++++++<br>> > 2 files changed, 126 insertions(+), 17 deletions(-)<br>> > create mode 100644 test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua<br>> ><br>> > diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h<br>> > index c27d8413..23ffc3aa 100644<br>> > --- a/src/lj_asm_mips.h<br>> > +++ b/src/lj_asm_mips.h<br>> > @@ -859,6 +859,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)<br>> > Reg dest = ra_dest(as, ir, allow);<br>> > Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest));<br>> > Reg key = RID_NONE, type = RID_NONE, tmpnum = RID_NONE, tmp1 = RID_TMP, tmp2;<br>> > +#if LJ_64<br>> > + Reg cmp64 = RID_NONE;<br>> > +#endif<br>> > IRRef 
refkey = ir->op2;<br>> > IRIns *irkey = IR(refkey);<br>> > int isk = irref_isk(refkey);<br>> > @@ -901,6 +904,26 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)<br>> > #endif<br>> > tmp2 = ra_scratch(as, allow);<br>> > rset_clear(allow, tmp2);<br>> > +#if LJ_64<br>> > + if (LJ_SOFTFP || !irt_isnum(kt)) {<br>> > + /* Allocate cmp64 register used for 64-bit comparisons */<br>> > + if (LJ_SOFTFP && irt_isnum(kt)) {<br>> > + cmp64 = key;<br>> > + } else if (!isk && irt_isaddr(kt)) {<br>> > + cmp64 = tmp2;<br>> > + } else {<br>> > + int64_t k;<br>> > + if (isk && irt_isaddr(kt)) {<br>> > + k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;<br>> > + } else {<br>> > + lua_assert(irt_ispri(kt) && !irt_isnil(kt));<br>> > + k = ~((int64_t)~irt_toitype(ir->t) << 47);<br>> > + }<br>> > + cmp64 = ra_allock(as, k, allow);<br>> > + rset_clear(allow, cmp64);<br>> > + }<br>> > + }<br>> > +#endif<br>> ><br>> > /* Key not found in chain: jump to exit (if merged) or load niltv. */<br>> > l_end = emit_label(as);<br>> > @@ -943,24 +966,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)<br>> > emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);<br>> > emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);<br>> > emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));<br>> > - } else if (LJ_SOFTFP && irt_isnum(kt)) {<br>> > - emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);<br>> > - emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));<br>> > - } else if (irt_isaddr(kt)) {<br>> > - Reg refk = tmp2;<br>> > - if (isk) {<br>> > - int64_t k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;<br>> > - refk = ra_allock(as, k, allow);<br>> > - rset_clear(allow, refk);<br>> > - }<br>> > - emit_branch(as, MIPSI_BEQ, tmp1, refk, l_end);<br>> > - emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));<br>> > } else {<br>> > - Reg pri = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow);<br>> > - rset_clear(allow, pri);<br>> > - 
lua_assert(irt_ispri(kt) && !irt_isnil(kt));<br>> > - emit_branch(as, MIPSI_BEQ, tmp1, pri, l_end);<br>> > - emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));<br>> > + emit_branch(as, MIPSI_BEQ, tmp1, cmp64, l_end);<br>> > + emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));<br>> > }<br>> > *l_loop = MIPSI_BNE | MIPSF_S(tmp1) | ((as->mcp-l_loop-1) & 0xffffu);<br>> > if (!isk && irt_isaddr(kt)) {<br>> > diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua<br>> > new file mode 100644<br>> > index 00000000..8c75e69c<br>> > --- /dev/null<br>> > +++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua<br>> > @@ -0,0 +1,101 @@<br>> > +local tap = require('tap')<br>> > +-- Test file to demonstrate the incorrect JIT behaviour for HREF<br>> > +-- IR compilation on mips64.<br>> > +-- See also <a href="https://github.com/LuaJIT/LuaJIT/pull/362" target="_blank">https://github.com/LuaJIT/LuaJIT/pull/362</a>.<br>> > +local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({<br>> > + ['Test requires JIT enabled'] = not jit.status(),<br>> > +})<br>> > +<br>> > +test:plan(1)<br>> > +<br>> > +-- To reproduce the issue we need to compile a trace with<br>> > +-- `IR_HREF`, with a lookup of constant hash key GC value. To<br>> Typo: s/constant/a constant/<br><br>Fixed.<br><br>> > +-- prevent an `IR_HREFK` to be emitted instead, we need a table<br>> Typo: s/to be/from being/<br><br>Fixed.<br><br>> > +-- with a huge hash part. 
Delta of address between the start of<br>> Typo: s/Delta/The delta/<br><br>Fixed.<br><br>> > +-- the hash part of the table and the current node to lookup must<br>> > +-- be more than `(1024 * 64 - 1) * sizeof(Node)`.<br>> Typo: s/more/greater/<br><br>Fixed.<br><br>> > +-- See <src/lj_record.c>, for details.<br>> > +-- XXX: This constant is well suited to prevent test to be flaky,<br>> Typo: s/to be/from being/<br><br>Fixed.<br><br>> > +-- because the aforementioned delta is always large enough.<br>> > +-- Also, this constant avoids table rehashing, when inserting new<br>> > +-- keys.<br>> > +local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15<br>> > +<br>> > +-- XXX: don't set `hotexit` to prevent compilation of trace after<br>> > +-- exiting the main test cycle.<br>> I suggest rephrasing it the following way:<br>> | The `hotexit` option is not set to prevent the compilation of traces<br>> | after the emission of the main test cycle.<br><br>Rephrased.<br><br>> > +jit.opt.start('hotloop=1')<br>> > +<br>> > +-- Don't use `table.new()`, here by intence -- this leads to the<br>> Typo: s/Don't use `table.new()`, here by intence/`table.new()` is not used here by intention/<br><br>Fixed.<br><br>> > +-- allocation failure for the mcode memory, so traces are not<br>> > +-- compiled.<br>> > +local filled_tab = {}<br>> > +-- Filling-up the table with GC values to minimize the amount of<br>> Typo: s/Filling-up/Fill up/<br><br>Fixed.<br><br>> > +-- hash collisions and increase delta between the start of the<br>> Typo: s/delta/the delta/<br><br>Fixed.<br><br>> > +-- hash part of the table and currently stored node.<br>> Typo: s/currently/the currently/<br><br>Fixed.<br><br>> > +for _ = 1, N_HASH_FIELDS do<br>> > + filled_tab[1LL] = 1<br>> > +end<br>> > +<br>> > +-- luacheck: no unused<br>> > +local tab_value_a<br>> > +local tab_value_b<br>> > +local tab_value_c<br>> > +local tab_value_d<br>> > +local tab_value_e<br>> > +local tab_value_f<br>> > +local tab_value_g<br>> > +local 
tab_value_h<br>> > +local tab_value_i<br>> > +<br>> > +-- The function for this trace has a bunch of the following IRs:<br>> > +-- p64 HREF 0001 "a" ; or other keys<br>> > +-- > p64 EQ 0002 [0x4002d0c528] ; nilnode<br>> > +-- Sometimes, when we need to rematerialize a constant during<br>> > +-- evicting of the register. So, the instruction related to<br>> Typo: s/evicting/the eviction/<br><br>Fixed.<br><br>> Again, sometimes happens what?<br><br>The "when" is misleading here. Dropped it.<br><br>> > +-- constant rematerialization is placed in the delay branch slot,<br>> > +-- which suppose to contain the loads of trace exit number to the<br>> Typo: s/which suppose/which is supposed/<br><br>Fixed.<br><br>> Typo: s/number/numbers/<br><br>Fixed.<br><br>> > +-- `$ra` register. This leading to the assertion failure during<br>> Typo: s/leading/leads/<br><br>Fixed.<br><br>> > +-- trace exit in `lj_trace_exit()`, since a trace number is<br>> > +-- incorrect. The amount of the side exit to check is empirical<br>> Typo: s/exit/exits/<br><br>Fixed.<br><br>> > +-- (even a little bit more, than necessary just in case).<br>> Typo: s/more/greater/<br><br>Fixed.<br><br>> > +local function href_const(tab)<br>> > + tab_value_a = tab.a<br>> > + tab_value_b = tab.b<br>> > + tab_value_c = tab.c<br>> > + tab_value_d = tab.d<br>> > + tab_value_e = tab.e<br>> > + tab_value_f = tab.f<br>> > + tab_value_g = tab.g<br>> > + tab_value_h = tab.h<br>> > + tab_value_i = tab.i<br>> > +end<br>> > +<br>> > +-- Compile main trace first.<br>> Typo: s/main/the main/<br><br>Fixed.<br><br>> > +href_const(filled_tab)<br>> > +href_const(filled_tab)<br>> > +<br>> > +-- Now brute-force side exits to check that they are compiled<br>> > +-- correct. 
Take side exits in the reverse order to take a new<br>> Typo: s/correct/correctly/<br>> Typo: s/the reverse/reverse/<br><br>Fixed.<br><br><snipped><br><br>See the iterative patch below:<br><br>===================================================================<br>diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua<br>index 8c75e69c..b4ee9e2b 100644<br>--- a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua<br>+++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua<br>@@ -9,29 +9,29 @@ local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({<br> test:plan(1)<br> <br> -- To reproduce the issue we need to compile a trace with<br>--- `IR_HREF`, with a lookup of constant hash key GC value. To<br>--- prevent an `IR_HREFK` to be emitted instead, we need a table<br>--- with a huge hash part. Delta of address between the start of<br>--- the hash part of the table and the current node to lookup must<br>--- be more than `(1024 * 64 - 1) * sizeof(Node)`.<br>+-- `IR_HREF`, with a lookup of a constant hash key GC value. To<br>+-- prevent an `IR_HREFK` from being emitted instead, we need a<br>+-- table with a huge hash part. 
The delta of address between the<br>+-- start of the hash part of the table and the current node to<br>+-- lookup must be greater than `(1024 * 64 - 1) * sizeof(Node)`.<br> -- See <src/lj_record.c>, for details.<br>--- XXX: This constant is well suited to prevent test to be flaky,<br>--- because the aforementioned delta is always large enough.<br>+-- XXX: This constant is well suited to prevent test from being<br>+-- flaky, because the aforementioned delta is always large enough.<br> -- Also, this constant avoids table rehashing, when inserting new<br> -- keys.<br> local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15<br> <br>--- XXX: don't set `hotexit` to prevent compilation of trace after<br>--- exiting the main test cycle.<br>+-- XXX: The `hotexit` option is not set to prevent the compilation<br>+-- of traces after the emission of the main test cycle.<br> jit.opt.start('hotloop=1')<br> <br>--- Don't use `table.new()`, here by intence -- this leads to the<br>--- allocation failure for the mcode memory, so traces are not<br>+-- `table.new()` is not used here by intention -- this leads to<br>+-- the allocation failure for the mcode memory, so traces are not<br> -- compiled.<br> local filled_tab = {}<br>--- Filling-up the table with GC values to minimize the amount of<br>--- hash collisions and increase delta between the start of the<br>--- hash part of the table and currently stored node.<br>+-- Fill up the table with GC values to minimize the amount of hash<br>+-- collisions and increase the delta between the start of the hash<br>+-- part of the table and the currently stored node.<br> for _ = 1, N_HASH_FIELDS do<br> filled_tab[1LL] = 1<br> end<br>@@ -50,14 +50,14 @@ local tab_value_i<br> -- The function for this trace has a bunch of the following IRs:<br> -- p64 HREF 0001 "a" ; or other keys<br> -- > p64 EQ 0002 [0x4002d0c528] ; nilnode<br>--- Sometimes, when we need to rematerialize a constant during<br>--- evicting of the register. 
So, the instruction related to<br>+-- Sometimes, we need to rematerialize a constant during the<br>+-- eviction of the register. So, the instruction related to<br> -- constant rematerialization is placed in the delay branch slot,<br>--- which suppose to contain the loads of trace exit number to the<br>--- `$ra` register. This leading to the assertion failure during<br>--- trace exit in `lj_trace_exit()`, since a trace number is<br>--- incorrect. The amount of the side exit to check is empirical<br>--- (even a little bit more, than necessary just in case).<br>+-- which is supposed to contain the load of the trace exit number<br>+-- to the `$ra` register. This leads to the assertion failure<br>+-- during trace exit in `lj_trace_exit()`, since a trace number is<br>+-- incorrect. The amount of the side exits to check is empirical<br>+-- (even a little bit greater, than necessary just in case).<br> local function href_const(tab)<br> tab_value_a = tab.a<br> tab_value_b = tab.b<br>@@ -70,13 +70,13 @@ local function href_const(tab)<br> tab_value_i = tab.i<br> end<br> <br>--- Compile main trace first.<br>+-- Compile the main trace first.<br> href_const(filled_tab)<br> href_const(filled_tab)<br> <br> -- Now brute-force side exits to check that they are compiled<br>--- correct. Take side exits in the reverse order to take a new<br>--- side exit each time.<br>+-- correctly. Take side exits in reverse order to take a new side<br>+-- exit each time.<br> filled_tab.i = 'i'<br> href_const(filled_tab)<br> filled_tab.h = 'h'<br>===================================================================<br><br>> ><br><br>--<br>Best regards,<br>Sergey Kaplun</div></div></div></div></blockquote><div> </div></BODY></HTML>