From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 16 Aug 2023 18:17:16 +0300
To: Maxim Kokryashkin
References: <37c2435a3529beb36c2e428f9c8e8b5c007c68e7.1691592488.git.skaplun@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Subject: Re: [Tarantool-patches] [PATCH
luajit 17/19] MIPS64: Fix register allocation in assembly of HREF.
X-BeenThere: tarantool-patches@dev.tarantool.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: Tarantool development patches
From: Sergey Kaplun via Tarantool-patches
Reply-To: Sergey Kaplun
Cc: tarantool-patches@dev.tarantool.org
Errors-To: tarantool-patches-bounces@dev.tarantool.org
Sender: "Tarantool-patches"

Hi, Maxim!
Thanks for the review!
Please, see my answers below.

On 16.08.23, Maxim Kokryashkin wrote:
> Hi, Sergey!
> Thanks for the patch!
> Please consider my comments below.
> On Wed, Aug 09, 2023 at 06:36:06PM +0300, Sergey Kaplun via Tarantool-patches wrote:
> > From: Mike Pall
> >
> > Contributed by James Cowgill.
> >
> > (cherry-picked from commit 99cdfbf6a1e8856f64908072ef10443a7eab14f2)
> >
> > The issue is observed for the following merged IRs:
> > | p64 HREF 0001 "a" ; or other keys
> > | > p64 EQ 0002 [0x4002d0c528] ; nilnode
> > Sometimes, when we need to rematerialize a constant during evicting of
> Typo: s/during evicting/during the eviction/

Fixed.

> > the register. So, the instruction related to constant rematerialization
> Sometimes happens what? The sentence looks kind of chopped.

The "when" is misleading here. Dropped it.

> > is placed in the delay branch slot, which suppose to contain the loads
> Typo: s/which suppose/which is supposed/

Fixed.

> > of trace exit number to the `$ra` register. The resulting assembly is
> Typo: s/number/numbers/ (because of `loads` being in the plural form)

Fixed.

> > the following (for example):
> > | beq ra, r1, 0x400abee9b0 ->exit
> > | lui r1, 65531 ; delay slot without setting of the `ra`
> > This leading to the assertion failure during trace exit in
> Typo: s/leading/leads/

Fixed.

> > `lj_trace_exit()`, since a trace number is incorrect.
> >
> > This patch moves the constant register allocations above the main
> > instruction emitting code in `asm_href()`.
> AFAICS, It is not just moved, the register allocation logic has changed too.
> Before the patch, there were a few cases of inplace emissions, which
> disappeared after the patch. I believe it is important to mention to, along
> with a more detailed description of the logic changes.

No, the logic is just the same, we just choose the register early.
Since we use now `cmp64` register everywhere, there is no need to use
duplicate code in if - else if - else chunks.

> >
> > Sergey Kaplun:
> > * added the description and the test for the problem
> >
> > Part of tarantool/tarantool#8825
> > ---
> >  src/lj_asm_mips.h | 42 +++++---
> >  ...-mips64-href-delay-slot-side-exit.test.lua | 101 ++++++++++++++++++
> >  2 files changed, 126 insertions(+), 17 deletions(-)
> >  create mode 100644 test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> >
> > diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h
> > index c27d8413..23ffc3aa 100644
> > --- a/src/lj_asm_mips.h
> > +++ b/src/lj_asm_mips.h
> > @@ -859,6 +859,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> >    Reg dest = ra_dest(as, ir, allow);
> >    Reg tab = ra_alloc1(as, ir->op1, rset_clear(allow, dest));
> >    Reg key = RID_NONE, type = RID_NONE, tmpnum = RID_NONE, tmp1 = RID_TMP, tmp2;
> > +#if LJ_64
> > +  Reg cmp64 = RID_NONE;
> > +#endif
> >    IRRef refkey = ir->op2;
> >    IRIns *irkey = IR(refkey);
> >    int isk = irref_isk(refkey);
> > @@ -901,6 +904,26 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> >  #endif
> >    tmp2 = ra_scratch(as, allow);
> >    rset_clear(allow, tmp2);
> > +#if LJ_64
> > +  if (LJ_SOFTFP || !irt_isnum(kt)) {
> > +    /* Allocate cmp64 register used for 64-bit comparisons */
> > +    if (LJ_SOFTFP && irt_isnum(kt)) {
> > +      cmp64 = key;
> > +    } else if (!isk && irt_isaddr(kt)) {
> > +      cmp64 = tmp2;
> > +    } else {
> > +      int64_t k;
> > +      if (isk && irt_isaddr(kt)) {
> > +        k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
> > +      } else {
> > +        lua_assert(irt_ispri(kt) && !irt_isnil(kt));
> > +        k = ~((int64_t)~irt_toitype(ir->t) << 47);
> > +      }
> > +      cmp64 = ra_allock(as, k, allow);
> > +      rset_clear(allow, cmp64);
> > +    }
> > +  }
> > +#endif
> >
> >    /* Key not found in chain: jump to exit (if merged) or load niltv. */
> >    l_end = emit_label(as);
> > @@ -943,24 +966,9 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge)
> >      emit_dta(as, MIPSI_DSRA32, tmp1, tmp1, 15);
> >      emit_tg(as, MIPSI_DMTC1, tmp1, tmpnum);
> >      emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> > -  } else if (LJ_SOFTFP && irt_isnum(kt)) {
> > -    emit_branch(as, MIPSI_BEQ, tmp1, key, l_end);
> > -    emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> > -  } else if (irt_isaddr(kt)) {
> > -    Reg refk = tmp2;
> > -    if (isk) {
> > -      int64_t k = ((int64_t)irt_toitype(irkey->t) << 47) | irkey[1].tv.u64;
> > -      refk = ra_allock(as, k, allow);
> > -      rset_clear(allow, refk);
> > -    }
> > -    emit_branch(as, MIPSI_BEQ, tmp1, refk, l_end);
> > -    emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
> >    } else {
> > -    Reg pri = ra_allock(as, ~((int64_t)~irt_toitype(ir->t) << 47), allow);
> > -    rset_clear(allow, pri);
> > -    lua_assert(irt_ispri(kt) && !irt_isnil(kt));
> > -    emit_branch(as, MIPSI_BEQ, tmp1, pri, l_end);
> > -    emit_tsi(as, MIPSI_LD, tmp1, dest, offsetof(Node, key));
> > +    emit_branch(as, MIPSI_BEQ, tmp1, cmp64, l_end);
> > +    emit_tsi(as, MIPSI_LD, tmp1, dest, (int32_t)offsetof(Node, key.u64));
> >    }
> >    *l_loop = MIPSI_BNE | MIPSF_S(tmp1) | ((as->mcp-l_loop-1) & 0xffffu);
> >    if (!isk && irt_isaddr(kt)) {
> > diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> > new file mode 100644
> > index 00000000..8c75e69c
> > --- /dev/null
> > +++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
> > @@ -0,0 +1,101 @@
> > +local tap = require('tap')
> > +-- Test file to demonstrate the incorrect JIT behaviour for HREF
> > +-- IR compilation on mips64.
> > +-- See also https://github.com/LuaJIT/LuaJIT/pull/362.
> > +local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
> > +  ['Test requires JIT enabled'] = not jit.status(),
> > +})
> > +
> > +test:plan(1)
> > +
> > +-- To reproduce the issue we need to compile a trace with
> > +-- `IR_HREF`, with a lookup of constant hash key GC value. To
> Typo: s/constant/a constant/

Fixed.

> > +-- prevent an `IR_HREFK` to be emitted instead, we need a table
> Typo: s/to be/from being/

Fixed.

> > +-- with a huge hash part. Delta of address between the start of
> Typo: s/Delta/The delta/

Fixed.

> > +-- the hash part of the table and the current node to lookup must
> > +-- be more than `(1024 * 64 - 1) * sizeof(Node)`.
> Typo: s/more/greater/

Fixed.

> > +-- See , for details.
> > +-- XXX: This constant is well suited to prevent test to be flaky,
> Typo: s/to be/from being/

Fixed.

> > +-- because the aforementioned delta is always large enough.
> > +-- Also, this constant avoids table rehashing, when inserting new
> > +-- keys.
> > +local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
> > +
> > +-- XXX: don't set `hotexit` to prevent compilation of trace after
> > +-- exiting the main test cycle.
> I suggest rehprasing it the following way:
> | The `hotexit` option is not set to prevent the compilation of traces
> | after the emission of the main test cycle.

Rephrased.

> > +jit.opt.start('hotloop=1')
> > +
> > +-- Don't use `table.new()`, here by intence -- this leads to the
> Typo: s/Don't use `table.new()`, here by intence/`table.new()` is not used here by intention/

Fixed.

> > +-- allocation failure for the mcode memory, so traces are not
> > +-- compiled.
> > +local filled_tab = {}
> > +-- Filling-up the table with GC values to minimize the amount of
> Typo: s/Filling-up/Fill up/

Fixed.

> > +-- hash collisions and increase delta between the start of the
> Typo: s/delta/the delta/

Fixed.

> > +-- hash part of the table and currently stored node.
> Typo: s/currently/the currently/

Fixed.

> > +for _ = 1, N_HASH_FIELDS do
> > +  filled_tab[1LL] = 1
> > +end
> > +
> > +-- luacheck: no unused
> > +local tab_value_a
> > +local tab_value_b
> > +local tab_value_c
> > +local tab_value_d
> > +local tab_value_e
> > +local tab_value_f
> > +local tab_value_g
> > +local tab_value_h
> > +local tab_value_i
> > +
> > +-- The function for this trace has a bunch of the following IRs:
> > +-- p64 HREF 0001 "a" ; or other keys
> > +-- > p64 EQ 0002 [0x4002d0c528] ; nilnode
> > +-- Sometimes, when we need to rematerialize a constant during
> > +-- evicting of the register. So, the instruction related to
> Typo: s/evicting/the eviction/

Fixed.

> Again, sometimes happens what?

The "when" is misleading here. Dropped it.

> > +-- constant rematerialization is placed in the delay branch slot,
> > +-- which suppose to contain the loads of trace exit number to the
> Typo: s/which suppose/which is supposed/

Fixed.

> Typo: s/number/numbers/

Fixed.

> > +-- `$ra` register. This leading to the assertion failure during
> Typo: s/leading/leads/

Fixed.

> > +-- trace exit in `lj_trace_exit()`, since a trace number is
> > +-- incorrect. The amount of the side exit to check is empirical
> Typo: s/exit/exits/

Fixed.

> > +-- (even a little bit more, than necessary just in case).
> Typo: s/more/greater/

Fixed.

> > +local function href_const(tab)
> > +  tab_value_a = tab.a
> > +  tab_value_b = tab.b
> > +  tab_value_c = tab.c
> > +  tab_value_d = tab.d
> > +  tab_value_e = tab.e
> > +  tab_value_f = tab.f
> > +  tab_value_g = tab.g
> > +  tab_value_h = tab.h
> > +  tab_value_i = tab.i
> > +end
> > +
> > +-- Compile main trace first.
> Typo: s/main/the main/

Fixed.

> > +href_const(filled_tab)
> > +href_const(filled_tab)
> > +
> > +-- Now brute-force side exits to check that they are compiled
> > +-- correct. Take side exits in the reverse order to take a new
> Typo: s/correct/correctly/
> Typo: s/the reverse/reverse/

Fixed. See the iterative patch below:

===================================================================
diff --git a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
index 8c75e69c..b4ee9e2b 100644
--- a/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
+++ b/test/tarantool-tests/lj-362-mips64-href-delay-slot-side-exit.test.lua
@@ -9,29 +9,29 @@ local test = tap.test('lj-362-mips64-href-delay-slot-side-exit'):skipcond({
 test:plan(1)
 
 -- To reproduce the issue we need to compile a trace with
--- `IR_HREF`, with a lookup of constant hash key GC value. To
--- prevent an `IR_HREFK` to be emitted instead, we need a table
--- with a huge hash part. Delta of address between the start of
--- the hash part of the table and the current node to lookup must
--- be more than `(1024 * 64 - 1) * sizeof(Node)`.
+-- `IR_HREF`, with a lookup of a constant hash key GC value. To
+-- prevent an `IR_HREFK` from being emitted instead, we need a
+-- table with a huge hash part. The delta of address between the
+-- start of the hash part of the table and the current node to
+-- lookup must be greater than `(1024 * 64 - 1) * sizeof(Node)`.
 -- See , for details.
--- XXX: This constant is well suited to prevent test to be flaky,
--- because the aforementioned delta is always large enough.
+-- XXX: This constant is well suited to prevent test from being
+-- flaky, because the aforementioned delta is always large enough.
 -- Also, this constant avoids table rehashing, when inserting new
 -- keys.
 local N_HASH_FIELDS = 2 ^ 16 + 2 ^ 15
 
--- XXX: don't set `hotexit` to prevent compilation of trace after
--- exiting the main test cycle.
+-- XXX: The `hotexit` option is not set to prevent the compilation
+-- of traces after the emission of the main test cycle.
 jit.opt.start('hotloop=1')
 
--- Don't use `table.new()`, here by intence -- this leads to the
--- allocation failure for the mcode memory, so traces are not
+-- `table.new()` is not used here by intention -- this leads to
+-- the allocation failure for the mcode memory, so traces are not
 -- compiled.
 local filled_tab = {}
--- Filling-up the table with GC values to minimize the amount of
--- hash collisions and increase delta between the start of the
--- hash part of the table and currently stored node.
+-- Fill up the table with GC values to minimize the amount of hash
+-- collisions and increase the delta between the start of the hash
+-- part of the table and the currently stored node.
 for _ = 1, N_HASH_FIELDS do
   filled_tab[1LL] = 1
 end
@@ -50,14 +50,14 @@ local tab_value_i
 -- The function for this trace has a bunch of the following IRs:
 -- p64 HREF 0001 "a" ; or other keys
 -- > p64 EQ 0002 [0x4002d0c528] ; nilnode
--- Sometimes, when we need to rematerialize a constant during
--- evicting of the register. So, the instruction related to
+-- Sometimes, we need to rematerialize a constant during the
+-- eviction of the register. So, the instruction related to
 -- constant rematerialization is placed in the delay branch slot,
--- which suppose to contain the loads of trace exit number to the
--- `$ra` register. This leading to the assertion failure during
--- trace exit in `lj_trace_exit()`, since a trace number is
--- incorrect. The amount of the side exit to check is empirical
--- (even a little bit more, than necessary just in case).
+-- which is supposed to contain the load of the trace exit number
+-- to the `$ra` register. This leads to the assertion failure
+-- during trace exit in `lj_trace_exit()`, since a trace number is
+-- incorrect. The amount of the side exits to check is empirical
+-- (even a little bit greater, than necessary just in case).
 local function href_const(tab)
   tab_value_a = tab.a
   tab_value_b = tab.b
@@ -70,13 +70,13 @@ local function href_const(tab)
   tab_value_i = tab.i
 end
 
--- Compile main trace first.
+-- Compile the main trace first.
 href_const(filled_tab)
 href_const(filled_tab)
 
 -- Now brute-force side exits to check that they are compiled
--- correct. Take side exits in the reverse order to take a new
--- side exit each time.
+-- correctly. Take side exits in reverse order to take a new side
+-- exit each time.
 filled_tab.i = 'i'
 href_const(filled_tab)
 filled_tab.h = 'h'
===================================================================

> >

-- 
Best regards,
Sergey Kaplun