From mboxrd@z Thu Jan 1 00:00:00 1970
To: Igor Munkin, Sergey Bronnikov
Date: Wed, 9 Aug 2023 18:36:08 +0300
Message-ID: <876736e650dcb70ce32a272f92cc3ba034a4dd3b.1691592488.git.skaplun@tarantool.org>
X-Mailer: git-send-email 2.41.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH luajit 19/19] MIPS: Add MIPS64 R6 port.
List-Id: Tarantool development patches
From: Sergey Kaplun via Tarantool-patches
Reply-To: Sergey Kaplun
Cc: tarantool-patches@dev.tarantool.org

From: Mike Pall

Contributed by Hua Zhang, YunQiang Su from Wave Computing,
and Radovan Birdic from RT-RK.
Sponsored by Wave Computing.

(cherry-picked from commit 94d0b53004a5fa368defa4307a17edcdb87fe727)

This patch adds support for MIPS Release 6 [1] for the 64-bit build.
This includes:
* Global `_map_def` value is set with . `MIPSR6` key specifies the
  corresponding instruction set support. Also, `MIPSR6` is defined in
  `DYNASM_FLAGS` (`DASM_AFLAGS`).
* New instructions are added within , they are used if the
  aforementioned key is set.
* Obsolete instructions (that are no longer in use in r6) are used in
  the opposite case (if `MIPSR6` isn't set).
* New opcode maps are added into .
* `map_arch` table in is refactored for more convenient usage. Now each
  arch key contains a table with the corresponding info about the
  supported architecture:
  - `e`: endianness; "le" or "be"
  - `b`: bit-width of the supported architecture; 32 or 64
  - `m`: machine specification (see `e_machine` in man elf)
  - `f`: processor-specific flags (see `e_flags` in man elf)
  - `p`: number that identifies the type of target machine [2] for
    Portable Executable format [3].
* New `LJ_TARGET_MIPSR6` define is set for MIPSR6 in .
* The corresponding "MIPS32R6", "MIPS64R6" CPU strings are added to the
* MIPSR6 instructions are added to the , some obsolete instructions are
  removed or defined only for the non-MIPSR6 build.
* All release-dependent instructions in are instrumented with
  `LJ_TARGET_MIPSR6` macro.
* `f20`, `f21`, `f22` FP registers are defined as `FTMP0`, `FTMP1`,
  `FTMP2` respectively in the VM.
* All release-dependent instructions in are instrumented with `MIPSR6`
  macro.
* `sfmin_max` macro now takes a third operand for the MIPSR6 build.
* Fix implicit fallthrough warning for `LJ_SOFTFP && !LJ_NEED_FP64`
  build in .

Note that 32-bit r6 targets are still unsupported, because supporting
them is difficult and most available r6 CPUs are 64-bit.

[1]: https://www.mips.com/products/architectures/mips64/
[2]: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#machine-types
[3]: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format

Sergey Kaplun:
* added the description for the feature

Part of tarantool/tarantool#8825
---
 cmake/SetDynASMFlags.cmake |   5 +
 dynasm/dasm_mips.h         |  13 +-
 dynasm/dasm_mips.lua       | 625 +++++++++++++++++++++++-------------
 dynasm/dynasm.lua          |   1 +
 src/Makefile.original      |   3 +
 src/jit/bcsave.lua         |  84 ++---
 src/jit/dis_mips.lua       | 293 +++++++++++++++--
 src/jit/dis_mips64r6.lua   |  17 +
 src/jit/dis_mips64r6el.lua |  17 +
 src/lj_arch.h              |  29 +-
 src/lj_asm.c               |   2 +-
 src/lj_asm_mips.h          | 114 ++++++-
 src/lj_emit_mips.h         |  15 +-
 src/lj_jit.h               |   8 +
 src/lj_target_mips.h       |  52 ++-
 src/vm_mips64.dasc         | 370 ++++++++++++++++++++--
 16 files changed, 1301 insertions(+), 347 deletions(-)
 create mode 100644 src/jit/dis_mips64r6.lua
 create mode 100644 src/jit/dis_mips64r6el.lua

diff --git a/cmake/SetDynASMFlags.cmake b/cmake/SetDynASMFlags.cmake
index 142d7e64..7eead6e9 100644
--- a/cmake/SetDynASMFlags.cmake
+++ b/cmake/SetDynASMFlags.cmake
@@ -64,6 +64,11 @@ elseif(LUAJIT_ARCH STREQUAL "mips")
   endif()
 endif()
 
+string(FIND "${TESTARCH}" "LJ_TARGET_MIPSR6" FOUND)
+if(NOT FOUND EQUAL -1)
+  AppendFlags(DYNASM_FLAGS -D MIPSR6)
+endif()
+
 string(FIND "${TESTARCH}" "LJ_LE 1" FOUND)
 if(NOT FOUND EQUAL -1)
   list(APPEND DYNASM_FLAGS -D ENDIAN_LE)
diff --git a/dynasm/dasm_mips.h b/dynasm/dasm_mips.h
index 71a835b2..7d06aa72 100644
--- a/dynasm/dasm_mips.h
+++ b/dynasm/dasm_mips.h
@@ -355,14 +355,15 @@ int dasm_encode(Dst_DECL, void *buffer)
 	  CK(n >= 0, UNDEF_PC);
 	  n =
*DASM_POS2PTR(D, n); if (ins & 2048) - n = n - (int)((char *)cp - base); - else n = (n + (int)(size_t)base) & 0x0fffffff; - patchrel: + else + n = n - (int)((char *)cp - base); + patchrel: { + unsigned int e = 16 + ((ins >> 12) & 15); CK((n & 3) == 0 && - ((n + ((ins & 2048) ? 0x00020000 : 0)) >> - ((ins & 2048) ? 18 : 28)) == 0, RANGE_REL); - cp[-1] |= ((n>>2) & ((ins & 2048) ? 0x0000ffff: 0x03ffffff)); + ((n + ((ins & 2048) ? 0 : (1<<(e+1)))) >> (e+2)) == 0, RANGE_REL); + cp[-1] |= ((n>>2) & ((1<= 20) D->globals[ins-10] = (void *)(base + n); diff --git a/dynasm/dasm_mips.lua b/dynasm/dasm_mips.lua index bd2a2b43..ccdc53cd 100644 --- a/dynasm/dasm_mips.lua +++ b/dynasm/dasm_mips.lua @@ -6,6 +6,7 @@ ------------------------------------------------------------------------------ local mips64 = mips64 +local mipsr6 = _map_def.MIPSR6 -- Module information: local _info = { @@ -238,7 +239,6 @@ local map_op = { bne_3 = "14000000STB", blez_2 = "18000000SB", bgtz_2 = "1c000000SB", - addi_3 = "20000000TSI", li_2 = "24000000TI", addiu_3 = "24000000TSI", slti_3 = "28000000TSI", @@ -248,40 +248,22 @@ local map_op = { ori_3 = "34000000TSU", xori_3 = "38000000TSU", lui_2 = "3c000000TU", - beqzl_2 = "50000000SB", - beql_3 = "50000000STB", - bnezl_2 = "54000000SB", - bnel_3 = "54000000STB", - blezl_2 = "58000000SB", - bgtzl_2 = "5c000000SB", - daddi_3 = mips64 and "60000000TSI", daddiu_3 = mips64 and "64000000TSI", ldl_2 = mips64 and "68000000TO", ldr_2 = mips64 and "6c000000TO", lb_2 = "80000000TO", lh_2 = "84000000TO", - lwl_2 = "88000000TO", lw_2 = "8c000000TO", lbu_2 = "90000000TO", lhu_2 = "94000000TO", - lwr_2 = "98000000TO", lwu_2 = mips64 and "9c000000TO", sb_2 = "a0000000TO", sh_2 = "a4000000TO", - swl_2 = "a8000000TO", sw_2 = "ac000000TO", - sdl_2 = mips64 and "b0000000TO", - sdr_2 = mips64 and "b1000000TO", - swr_2 = "b8000000TO", - cache_2 = "bc000000NO", - ll_2 = "c0000000TO", lwc1_2 = "c4000000HO", - pref_2 = "cc000000NO", ldc1_2 = "d4000000HO", ld_2 = mips64 and 
"dc000000TO", - sc_2 = "e0000000TO", swc1_2 = "e4000000HO", - scd_2 = mips64 and "f0000000TO", sdc1_2 = "f4000000HO", sd_2 = mips64 and "fc000000TO", @@ -289,10 +271,6 @@ local map_op = { nop_0 = "00000000", sll_3 = "00000000DTA", sextw_2 = "00000000DT", - movf_2 = "00000001DS", - movf_3 = "00000001DSC", - movt_2 = "00010001DS", - movt_3 = "00010001DSC", srl_3 = "00000002DTA", rotr_3 = "00200002DTA", sra_3 = "00000003DTA", @@ -301,31 +279,16 @@ local map_op = { rotrv_3 = "00000046DTS", drotrv_3 = mips64 and "00000056DTS", srav_3 = "00000007DTS", - jr_1 = "00000008S", jalr_1 = "0000f809S", jalr_2 = "00000009DS", - movz_3 = "0000000aDST", - movn_3 = "0000000bDST", syscall_0 = "0000000c", syscall_1 = "0000000cY", break_0 = "0000000d", break_1 = "0000000dY", sync_0 = "0000000f", - mfhi_1 = "00000010D", - mthi_1 = "00000011S", - mflo_1 = "00000012D", - mtlo_1 = "00000013S", dsllv_3 = mips64 and "00000014DTS", dsrlv_3 = mips64 and "00000016DTS", dsrav_3 = mips64 and "00000017DTS", - mult_2 = "00000018ST", - multu_2 = "00000019ST", - div_2 = "0000001aST", - divu_2 = "0000001bST", - dmult_2 = mips64 and "0000001cST", - dmultu_2 = mips64 and "0000001dST", - ddiv_2 = mips64 and "0000001eST", - ddivu_2 = mips64 and "0000001fST", add_3 = "00000020DST", move_2 = mips64 and "00000025DS" or "00000021DS", addu_3 = "00000021DST", @@ -369,32 +332,9 @@ local map_op = { bgez_2 = "04010000SB", bltzl_2 = "04020000SB", bgezl_2 = "04030000SB", - tgei_2 = "04080000SI", - tgeiu_2 = "04090000SI", - tlti_2 = "040a0000SI", - tltiu_2 = "040b0000SI", - teqi_2 = "040c0000SI", - tnei_2 = "040e0000SI", - bltzal_2 = "04100000SB", bal_1 = "04110000B", - bgezal_2 = "04110000SB", - bltzall_2 = "04120000SB", - bgezall_2 = "04130000SB", synci_1 = "041f0000O", - -- Opcode SPECIAL2. 
- madd_2 = "70000000ST", - maddu_2 = "70000001ST", - mul_3 = "70000002DST", - msub_2 = "70000004ST", - msubu_2 = "70000005ST", - clz_2 = "70000020DS=", - clo_2 = "70000021DS=", - dclz_2 = mips64 and "70000024DS=", - dclo_2 = mips64 and "70000025DS=", - sdbbp_0 = "7000003f", - sdbbp_1 = "7000003fY", - -- Opcode SPECIAL3. ext_4 = "7c000000TSAM", -- Note: last arg is msbd = size-1 dextm_4 = mips64 and "7c000001TSAM", -- Args: pos | size-1-32 @@ -445,15 +385,6 @@ local map_op = { ctc1_2 = "44c00000TG", mthc1_2 = "44e00000TG", - bc1f_1 = "45000000B", - bc1f_2 = "45000000CB", - bc1t_1 = "45010000B", - bc1t_2 = "45010000CB", - bc1fl_1 = "45020000B", - bc1fl_2 = "45020000CB", - bc1tl_1 = "45030000B", - bc1tl_2 = "45030000CB", - ["add.s_3"] = "46000000FGH", ["sub.s_3"] = "46000001FGH", ["mul.s_3"] = "46000002FGH", @@ -470,51 +401,11 @@ local map_op = { ["trunc.w.s_2"] = "4600000dFG", ["ceil.w.s_2"] = "4600000eFG", ["floor.w.s_2"] = "4600000fFG", - ["movf.s_2"] = "46000011FG", - ["movf.s_3"] = "46000011FGC", - ["movt.s_2"] = "46010011FG", - ["movt.s_3"] = "46010011FGC", - ["movz.s_3"] = "46000012FGT", - ["movn.s_3"] = "46000013FGT", ["recip.s_2"] = "46000015FG", ["rsqrt.s_2"] = "46000016FG", ["cvt.d.s_2"] = "46000021FG", ["cvt.w.s_2"] = "46000024FG", ["cvt.l.s_2"] = "46000025FG", - ["cvt.ps.s_3"] = "46000026FGH", - ["c.f.s_2"] = "46000030GH", - ["c.f.s_3"] = "46000030VGH", - ["c.un.s_2"] = "46000031GH", - ["c.un.s_3"] = "46000031VGH", - ["c.eq.s_2"] = "46000032GH", - ["c.eq.s_3"] = "46000032VGH", - ["c.ueq.s_2"] = "46000033GH", - ["c.ueq.s_3"] = "46000033VGH", - ["c.olt.s_2"] = "46000034GH", - ["c.olt.s_3"] = "46000034VGH", - ["c.ult.s_2"] = "46000035GH", - ["c.ult.s_3"] = "46000035VGH", - ["c.ole.s_2"] = "46000036GH", - ["c.ole.s_3"] = "46000036VGH", - ["c.ule.s_2"] = "46000037GH", - ["c.ule.s_3"] = "46000037VGH", - ["c.sf.s_2"] = "46000038GH", - ["c.sf.s_3"] = "46000038VGH", - ["c.ngle.s_2"] = "46000039GH", - ["c.ngle.s_3"] = "46000039VGH", - ["c.seq.s_2"] = "4600003aGH", 
- ["c.seq.s_3"] = "4600003aVGH", - ["c.ngl.s_2"] = "4600003bGH", - ["c.ngl.s_3"] = "4600003bVGH", - ["c.lt.s_2"] = "4600003cGH", - ["c.lt.s_3"] = "4600003cVGH", - ["c.nge.s_2"] = "4600003dGH", - ["c.nge.s_3"] = "4600003dVGH", - ["c.le.s_2"] = "4600003eGH", - ["c.le.s_3"] = "4600003eVGH", - ["c.ngt.s_2"] = "4600003fGH", - ["c.ngt.s_3"] = "4600003fVGH", - ["add.d_3"] = "46200000FGH", ["sub.d_3"] = "46200001FGH", ["mul.d_3"] = "46200002FGH", @@ -531,130 +422,410 @@ local map_op = { ["trunc.w.d_2"] = "4620000dFG", ["ceil.w.d_2"] = "4620000eFG", ["floor.w.d_2"] = "4620000fFG", - ["movf.d_2"] = "46200011FG", - ["movf.d_3"] = "46200011FGC", - ["movt.d_2"] = "46210011FG", - ["movt.d_3"] = "46210011FGC", - ["movz.d_3"] = "46200012FGT", - ["movn.d_3"] = "46200013FGT", ["recip.d_2"] = "46200015FG", ["rsqrt.d_2"] = "46200016FG", ["cvt.s.d_2"] = "46200020FG", ["cvt.w.d_2"] = "46200024FG", ["cvt.l.d_2"] = "46200025FG", - ["c.f.d_2"] = "46200030GH", - ["c.f.d_3"] = "46200030VGH", - ["c.un.d_2"] = "46200031GH", - ["c.un.d_3"] = "46200031VGH", - ["c.eq.d_2"] = "46200032GH", - ["c.eq.d_3"] = "46200032VGH", - ["c.ueq.d_2"] = "46200033GH", - ["c.ueq.d_3"] = "46200033VGH", - ["c.olt.d_2"] = "46200034GH", - ["c.olt.d_3"] = "46200034VGH", - ["c.ult.d_2"] = "46200035GH", - ["c.ult.d_3"] = "46200035VGH", - ["c.ole.d_2"] = "46200036GH", - ["c.ole.d_3"] = "46200036VGH", - ["c.ule.d_2"] = "46200037GH", - ["c.ule.d_3"] = "46200037VGH", - ["c.sf.d_2"] = "46200038GH", - ["c.sf.d_3"] = "46200038VGH", - ["c.ngle.d_2"] = "46200039GH", - ["c.ngle.d_3"] = "46200039VGH", - ["c.seq.d_2"] = "4620003aGH", - ["c.seq.d_3"] = "4620003aVGH", - ["c.ngl.d_2"] = "4620003bGH", - ["c.ngl.d_3"] = "4620003bVGH", - ["c.lt.d_2"] = "4620003cGH", - ["c.lt.d_3"] = "4620003cVGH", - ["c.nge.d_2"] = "4620003dGH", - ["c.nge.d_3"] = "4620003dVGH", - ["c.le.d_2"] = "4620003eGH", - ["c.le.d_3"] = "4620003eVGH", - ["c.ngt.d_2"] = "4620003fGH", - ["c.ngt.d_3"] = "4620003fVGH", - - ["add.ps_3"] = "46c00000FGH", - ["sub.ps_3"] = 
"46c00001FGH", - ["mul.ps_3"] = "46c00002FGH", - ["abs.ps_2"] = "46c00005FG", - ["mov.ps_2"] = "46c00006FG", - ["neg.ps_2"] = "46c00007FG", - ["movf.ps_2"] = "46c00011FG", - ["movf.ps_3"] = "46c00011FGC", - ["movt.ps_2"] = "46c10011FG", - ["movt.ps_3"] = "46c10011FGC", - ["movz.ps_3"] = "46c00012FGT", - ["movn.ps_3"] = "46c00013FGT", - ["cvt.s.pu_2"] = "46c00020FG", - ["cvt.s.pl_2"] = "46c00028FG", - ["pll.ps_3"] = "46c0002cFGH", - ["plu.ps_3"] = "46c0002dFGH", - ["pul.ps_3"] = "46c0002eFGH", - ["puu.ps_3"] = "46c0002fFGH", - ["c.f.ps_2"] = "46c00030GH", - ["c.f.ps_3"] = "46c00030VGH", - ["c.un.ps_2"] = "46c00031GH", - ["c.un.ps_3"] = "46c00031VGH", - ["c.eq.ps_2"] = "46c00032GH", - ["c.eq.ps_3"] = "46c00032VGH", - ["c.ueq.ps_2"] = "46c00033GH", - ["c.ueq.ps_3"] = "46c00033VGH", - ["c.olt.ps_2"] = "46c00034GH", - ["c.olt.ps_3"] = "46c00034VGH", - ["c.ult.ps_2"] = "46c00035GH", - ["c.ult.ps_3"] = "46c00035VGH", - ["c.ole.ps_2"] = "46c00036GH", - ["c.ole.ps_3"] = "46c00036VGH", - ["c.ule.ps_2"] = "46c00037GH", - ["c.ule.ps_3"] = "46c00037VGH", - ["c.sf.ps_2"] = "46c00038GH", - ["c.sf.ps_3"] = "46c00038VGH", - ["c.ngle.ps_2"] = "46c00039GH", - ["c.ngle.ps_3"] = "46c00039VGH", - ["c.seq.ps_2"] = "46c0003aGH", - ["c.seq.ps_3"] = "46c0003aVGH", - ["c.ngl.ps_2"] = "46c0003bGH", - ["c.ngl.ps_3"] = "46c0003bVGH", - ["c.lt.ps_2"] = "46c0003cGH", - ["c.lt.ps_3"] = "46c0003cVGH", - ["c.nge.ps_2"] = "46c0003dGH", - ["c.nge.ps_3"] = "46c0003dVGH", - ["c.le.ps_2"] = "46c0003eGH", - ["c.le.ps_3"] = "46c0003eVGH", - ["c.ngt.ps_2"] = "46c0003fGH", - ["c.ngt.ps_3"] = "46c0003fVGH", - ["cvt.s.w_2"] = "46800020FG", ["cvt.d.w_2"] = "46800021FG", - ["cvt.s.l_2"] = "46a00020FG", ["cvt.d.l_2"] = "46a00021FG", - - -- Opcode COP1X. 
- lwxc1_2 = "4c000000FX", - ldxc1_2 = "4c000001FX", - luxc1_2 = "4c000005FX", - swxc1_2 = "4c000008FX", - sdxc1_2 = "4c000009FX", - suxc1_2 = "4c00000dFX", - prefx_2 = "4c00000fMX", - ["alnv.ps_4"] = "4c00001eFGHS", - ["madd.s_4"] = "4c000020FRGH", - ["madd.d_4"] = "4c000021FRGH", - ["madd.ps_4"] = "4c000026FRGH", - ["msub.s_4"] = "4c000028FRGH", - ["msub.d_4"] = "4c000029FRGH", - ["msub.ps_4"] = "4c00002eFRGH", - ["nmadd.s_4"] = "4c000030FRGH", - ["nmadd.d_4"] = "4c000031FRGH", - ["nmadd.ps_4"] = "4c000036FRGH", - ["nmsub.s_4"] = "4c000038FRGH", - ["nmsub.d_4"] = "4c000039FRGH", - ["nmsub.ps_4"] = "4c00003eFRGH", } +if mipsr6 then -- Instructions added with MIPSR6. + + for k,v in pairs({ + + -- Add immediate to upper bits. + aui_3 = "3c000000TSI", + daui_3 = mips64 and "74000000TSI", + dahi_2 = mips64 and "04060000SI", + dati_2 = mips64 and "041e0000SI", + + -- TODO: addiupc, auipc, aluipc, lwpc, lwupc, ldpc. + + -- Compact branches. + blezalc_2 = "18000000TB", -- rt != 0. + bgezalc_2 = "18000000T=SB", -- rt != 0. + bgtzalc_2 = "1c000000TB", -- rt != 0. + bltzalc_2 = "1c000000T=SB", -- rt != 0. + + blezc_2 = "58000000TB", -- rt != 0. + bgezc_2 = "58000000T=SB", -- rt != 0. + bgec_3 = "58000000STB", -- rs != rt. + blec_3 = "58000000TSB", -- rt != rs. + + bgtzc_2 = "5c000000TB", -- rt != 0. + bltzc_2 = "5c000000T=SB", -- rt != 0. + bltc_3 = "5c000000STB", -- rs != rt. + bgtc_3 = "5c000000TSB", -- rt != rs. + + bgeuc_3 = "18000000STB", -- rs != rt. + bleuc_3 = "18000000TSB", -- rt != rs. + bltuc_3 = "1c000000STB", -- rs != rt. + bgtuc_3 = "1c000000TSB", -- rt != rs. + + beqzalc_2 = "20000000TB", -- rt != 0. + bnezalc_2 = "60000000TB", -- rt != 0. + beqc_3 = "20000000STB", -- rs < rt. + bnec_3 = "60000000STB", -- rs < rt. + bovc_3 = "20000000STB", -- rs >= rt. + bnvc_3 = "60000000STB", -- rs >= rt. + + beqzc_2 = "d8000000SK", -- rs != 0. + bnezc_2 = "f8000000SK", -- rs != 0. 
+ jic_2 = "d8000000TI", + jialc_2 = "f8000000TI", + bc_1 = "c8000000L", + balc_1 = "e8000000L", + + -- Opcode SPECIAL. + jr_1 = "00000009S", + sdbbp_0 = "0000000e", + sdbbp_1 = "0000000eY", + lsa_4 = "00000005DSTA", + dlsa_4 = mips64 and "00000015DSTA", + seleqz_3 = "00000035DST", + selnez_3 = "00000037DST", + clz_2 = "00000050DS", + clo_2 = "00000051DS", + dclz_2 = mips64 and "00000052DS", + dclo_2 = mips64 and "00000053DS", + mul_3 = "00000098DST", + muh_3 = "000000d8DST", + mulu_3 = "00000099DST", + muhu_3 = "000000d9DST", + div_3 = "0000009aDST", + mod_3 = "000000daDST", + divu_3 = "0000009bDST", + modu_3 = "000000dbDST", + dmul_3 = mips64 and "0000009cDST", + dmuh_3 = mips64 and "000000dcDST", + dmulu_3 = mips64 and "0000009dDST", + dmuhu_3 = mips64 and "000000ddDST", + ddiv_3 = mips64 and "0000009eDST", + dmod_3 = mips64 and "000000deDST", + ddivu_3 = mips64 and "0000009fDST", + dmodu_3 = mips64 and "000000dfDST", + + -- Opcode SPECIAL3. + align_4 = "7c000220DSTA", + dalign_4 = mips64 and "7c000224DSTA", + bitswap_2 = "7c000020DT", + dbitswap_2 = mips64 and "7c000024DT", + + -- Opcode COP1. 
+ bc1eqz_2 = "45200000HB", + bc1nez_2 = "45a00000HB", + + ["sel.s_3"] = "46000010FGH", + ["seleqz.s_3"] = "46000014FGH", + ["selnez.s_3"] = "46000017FGH", + ["maddf.s_3"] = "46000018FGH", + ["msubf.s_3"] = "46000019FGH", + ["rint.s_2"] = "4600001aFG", + ["class.s_2"] = "4600001bFG", + ["min.s_3"] = "4600001cFGH", + ["mina.s_3"] = "4600001dFGH", + ["max.s_3"] = "4600001eFGH", + ["maxa.s_3"] = "4600001fFGH", + ["cmp.af.s_3"] = "46800000FGH", + ["cmp.un.s_3"] = "46800001FGH", + ["cmp.or.s_3"] = "46800011FGH", + ["cmp.eq.s_3"] = "46800002FGH", + ["cmp.une.s_3"] = "46800012FGH", + ["cmp.ueq.s_3"] = "46800003FGH", + ["cmp.ne.s_3"] = "46800013FGH", + ["cmp.lt.s_3"] = "46800004FGH", + ["cmp.ult.s_3"] = "46800005FGH", + ["cmp.le.s_3"] = "46800006FGH", + ["cmp.ule.s_3"] = "46800007FGH", + ["cmp.saf.s_3"] = "46800008FGH", + ["cmp.sun.s_3"] = "46800009FGH", + ["cmp.sor.s_3"] = "46800019FGH", + ["cmp.seq.s_3"] = "4680000aFGH", + ["cmp.sune.s_3"] = "4680001aFGH", + ["cmp.sueq.s_3"] = "4680000bFGH", + ["cmp.sne.s_3"] = "4680001bFGH", + ["cmp.slt.s_3"] = "4680000cFGH", + ["cmp.sult.s_3"] = "4680000dFGH", + ["cmp.sle.s_3"] = "4680000eFGH", + ["cmp.sule.s_3"] = "4680000fFGH", + + ["sel.d_3"] = "46200010FGH", + ["seleqz.d_3"] = "46200014FGH", + ["selnez.d_3"] = "46200017FGH", + ["maddf.d_3"] = "46200018FGH", + ["msubf.d_3"] = "46200019FGH", + ["rint.d_2"] = "4620001aFG", + ["class.d_2"] = "4620001bFG", + ["min.d_3"] = "4620001cFGH", + ["mina.d_3"] = "4620001dFGH", + ["max.d_3"] = "4620001eFGH", + ["maxa.d_3"] = "4620001fFGH", + ["cmp.af.d_3"] = "46a00000FGH", + ["cmp.un.d_3"] = "46a00001FGH", + ["cmp.or.d_3"] = "46a00011FGH", + ["cmp.eq.d_3"] = "46a00002FGH", + ["cmp.une.d_3"] = "46a00012FGH", + ["cmp.ueq.d_3"] = "46a00003FGH", + ["cmp.ne.d_3"] = "46a00013FGH", + ["cmp.lt.d_3"] = "46a00004FGH", + ["cmp.ult.d_3"] = "46a00005FGH", + ["cmp.le.d_3"] = "46a00006FGH", + ["cmp.ule.d_3"] = "46a00007FGH", + ["cmp.saf.d_3"] = "46a00008FGH", + ["cmp.sun.d_3"] = "46a00009FGH", + ["cmp.sor.d_3"] 
= "46a00019FGH", + ["cmp.seq.d_3"] = "46a0000aFGH", + ["cmp.sune.d_3"] = "46a0001aFGH", + ["cmp.sueq.d_3"] = "46a0000bFGH", + ["cmp.sne.d_3"] = "46a0001bFGH", + ["cmp.slt.d_3"] = "46a0000cFGH", + ["cmp.sult.d_3"] = "46a0000dFGH", + ["cmp.sle.d_3"] = "46a0000eFGH", + ["cmp.sule.d_3"] = "46a0000fFGH", + + }) do map_op[k] = v end + +else -- Instructions removed by MIPSR6. + + for k,v in pairs({ + -- Traps, don't use. + addi_3 = "20000000TSI", + daddi_3 = mips64 and "60000000TSI", + + -- Branch on likely, don't use. + beqzl_2 = "50000000SB", + beql_3 = "50000000STB", + bnezl_2 = "54000000SB", + bnel_3 = "54000000STB", + blezl_2 = "58000000SB", + bgtzl_2 = "5c000000SB", + + lwl_2 = "88000000TO", + lwr_2 = "98000000TO", + swl_2 = "a8000000TO", + sdl_2 = mips64 and "b0000000TO", + sdr_2 = mips64 and "b1000000TO", + swr_2 = "b8000000TO", + cache_2 = "bc000000NO", + ll_2 = "c0000000TO", + pref_2 = "cc000000NO", + sc_2 = "e0000000TO", + scd_2 = mips64 and "f0000000TO", + + -- Opcode SPECIAL. + movf_2 = "00000001DS", + movf_3 = "00000001DSC", + movt_2 = "00010001DS", + movt_3 = "00010001DSC", + jr_1 = "00000008S", + movz_3 = "0000000aDST", + movn_3 = "0000000bDST", + mfhi_1 = "00000010D", + mthi_1 = "00000011S", + mflo_1 = "00000012D", + mtlo_1 = "00000013S", + mult_2 = "00000018ST", + multu_2 = "00000019ST", + div_3 = "0000001aST", + divu_3 = "0000001bST", + ddiv_3 = mips64 and "0000001eST", + ddivu_3 = mips64 and "0000001fST", + dmult_2 = mips64 and "0000001cST", + dmultu_2 = mips64 and "0000001dST", + + -- Opcode REGIMM. + tgei_2 = "04080000SI", + tgeiu_2 = "04090000SI", + tlti_2 = "040a0000SI", + tltiu_2 = "040b0000SI", + teqi_2 = "040c0000SI", + tnei_2 = "040e0000SI", + bltzal_2 = "04100000SB", + bgezal_2 = "04110000SB", + bltzall_2 = "04120000SB", + bgezall_2 = "04130000SB", + + -- Opcode SPECIAL2. 
+ madd_2 = "70000000ST", + maddu_2 = "70000001ST", + mul_3 = "70000002DST", + msub_2 = "70000004ST", + msubu_2 = "70000005ST", + clz_2 = "70000020D=TS", + clo_2 = "70000021D=TS", + dclz_2 = mips64 and "70000024D=TS", + dclo_2 = mips64 and "70000025D=TS", + sdbbp_0 = "7000003f", + sdbbp_1 = "7000003fY", + + -- Opcode COP1. + bc1f_1 = "45000000B", + bc1f_2 = "45000000CB", + bc1t_1 = "45010000B", + bc1t_2 = "45010000CB", + bc1fl_1 = "45020000B", + bc1fl_2 = "45020000CB", + bc1tl_1 = "45030000B", + bc1tl_2 = "45030000CB", + + ["movf.s_2"] = "46000011FG", + ["movf.s_3"] = "46000011FGC", + ["movt.s_2"] = "46010011FG", + ["movt.s_3"] = "46010011FGC", + ["movz.s_3"] = "46000012FGT", + ["movn.s_3"] = "46000013FGT", + ["cvt.ps.s_3"] = "46000026FGH", + ["c.f.s_2"] = "46000030GH", + ["c.f.s_3"] = "46000030VGH", + ["c.un.s_2"] = "46000031GH", + ["c.un.s_3"] = "46000031VGH", + ["c.eq.s_2"] = "46000032GH", + ["c.eq.s_3"] = "46000032VGH", + ["c.ueq.s_2"] = "46000033GH", + ["c.ueq.s_3"] = "46000033VGH", + ["c.olt.s_2"] = "46000034GH", + ["c.olt.s_3"] = "46000034VGH", + ["c.ult.s_2"] = "46000035GH", + ["c.ult.s_3"] = "46000035VGH", + ["c.ole.s_2"] = "46000036GH", + ["c.ole.s_3"] = "46000036VGH", + ["c.ule.s_2"] = "46000037GH", + ["c.ule.s_3"] = "46000037VGH", + ["c.sf.s_2"] = "46000038GH", + ["c.sf.s_3"] = "46000038VGH", + ["c.ngle.s_2"] = "46000039GH", + ["c.ngle.s_3"] = "46000039VGH", + ["c.seq.s_2"] = "4600003aGH", + ["c.seq.s_3"] = "4600003aVGH", + ["c.ngl.s_2"] = "4600003bGH", + ["c.ngl.s_3"] = "4600003bVGH", + ["c.lt.s_2"] = "4600003cGH", + ["c.lt.s_3"] = "4600003cVGH", + ["c.nge.s_2"] = "4600003dGH", + ["c.nge.s_3"] = "4600003dVGH", + ["c.le.s_2"] = "4600003eGH", + ["c.le.s_3"] = "4600003eVGH", + ["c.ngt.s_2"] = "4600003fGH", + ["c.ngt.s_3"] = "4600003fVGH", + ["movf.d_2"] = "46200011FG", + ["movf.d_3"] = "46200011FGC", + ["movt.d_2"] = "46210011FG", + ["movt.d_3"] = "46210011FGC", + ["movz.d_3"] = "46200012FGT", + ["movn.d_3"] = "46200013FGT", + ["c.f.d_2"] = "46200030GH", + 
["c.f.d_3"] = "46200030VGH", + ["c.un.d_2"] = "46200031GH", + ["c.un.d_3"] = "46200031VGH", + ["c.eq.d_2"] = "46200032GH", + ["c.eq.d_3"] = "46200032VGH", + ["c.ueq.d_2"] = "46200033GH", + ["c.ueq.d_3"] = "46200033VGH", + ["c.olt.d_2"] = "46200034GH", + ["c.olt.d_3"] = "46200034VGH", + ["c.ult.d_2"] = "46200035GH", + ["c.ult.d_3"] = "46200035VGH", + ["c.ole.d_2"] = "46200036GH", + ["c.ole.d_3"] = "46200036VGH", + ["c.ule.d_2"] = "46200037GH", + ["c.ule.d_3"] = "46200037VGH", + ["c.sf.d_2"] = "46200038GH", + ["c.sf.d_3"] = "46200038VGH", + ["c.ngle.d_2"] = "46200039GH", + ["c.ngle.d_3"] = "46200039VGH", + ["c.seq.d_2"] = "4620003aGH", + ["c.seq.d_3"] = "4620003aVGH", + ["c.ngl.d_2"] = "4620003bGH", + ["c.ngl.d_3"] = "4620003bVGH", + ["c.lt.d_2"] = "4620003cGH", + ["c.lt.d_3"] = "4620003cVGH", + ["c.nge.d_2"] = "4620003dGH", + ["c.nge.d_3"] = "4620003dVGH", + ["c.le.d_2"] = "4620003eGH", + ["c.le.d_3"] = "4620003eVGH", + ["c.ngt.d_2"] = "4620003fGH", + ["c.ngt.d_3"] = "4620003fVGH", + ["add.ps_3"] = "46c00000FGH", + ["sub.ps_3"] = "46c00001FGH", + ["mul.ps_3"] = "46c00002FGH", + ["abs.ps_2"] = "46c00005FG", + ["mov.ps_2"] = "46c00006FG", + ["neg.ps_2"] = "46c00007FG", + ["movf.ps_2"] = "46c00011FG", + ["movf.ps_3"] = "46c00011FGC", + ["movt.ps_2"] = "46c10011FG", + ["movt.ps_3"] = "46c10011FGC", + ["movz.ps_3"] = "46c00012FGT", + ["movn.ps_3"] = "46c00013FGT", + ["cvt.s.pu_2"] = "46c00020FG", + ["cvt.s.pl_2"] = "46c00028FG", + ["pll.ps_3"] = "46c0002cFGH", + ["plu.ps_3"] = "46c0002dFGH", + ["pul.ps_3"] = "46c0002eFGH", + ["puu.ps_3"] = "46c0002fFGH", + ["c.f.ps_2"] = "46c00030GH", + ["c.f.ps_3"] = "46c00030VGH", + ["c.un.ps_2"] = "46c00031GH", + ["c.un.ps_3"] = "46c00031VGH", + ["c.eq.ps_2"] = "46c00032GH", + ["c.eq.ps_3"] = "46c00032VGH", + ["c.ueq.ps_2"] = "46c00033GH", + ["c.ueq.ps_3"] = "46c00033VGH", + ["c.olt.ps_2"] = "46c00034GH", + ["c.olt.ps_3"] = "46c00034VGH", + ["c.ult.ps_2"] = "46c00035GH", + ["c.ult.ps_3"] = "46c00035VGH", + ["c.ole.ps_2"] = 
"46c00036GH", + ["c.ole.ps_3"] = "46c00036VGH", + ["c.ule.ps_2"] = "46c00037GH", + ["c.ule.ps_3"] = "46c00037VGH", + ["c.sf.ps_2"] = "46c00038GH", + ["c.sf.ps_3"] = "46c00038VGH", + ["c.ngle.ps_2"] = "46c00039GH", + ["c.ngle.ps_3"] = "46c00039VGH", + ["c.seq.ps_2"] = "46c0003aGH", + ["c.seq.ps_3"] = "46c0003aVGH", + ["c.ngl.ps_2"] = "46c0003bGH", + ["c.ngl.ps_3"] = "46c0003bVGH", + ["c.lt.ps_2"] = "46c0003cGH", + ["c.lt.ps_3"] = "46c0003cVGH", + ["c.nge.ps_2"] = "46c0003dGH", + ["c.nge.ps_3"] = "46c0003dVGH", + ["c.le.ps_2"] = "46c0003eGH", + ["c.le.ps_3"] = "46c0003eVGH", + ["c.ngt.ps_2"] = "46c0003fGH", + ["c.ngt.ps_3"] = "46c0003fVGH", + + -- Opcode COP1X. + lwxc1_2 = "4c000000FX", + ldxc1_2 = "4c000001FX", + luxc1_2 = "4c000005FX", + swxc1_2 = "4c000008FX", + sdxc1_2 = "4c000009FX", + suxc1_2 = "4c00000dFX", + prefx_2 = "4c00000fMX", + ["alnv.ps_4"] = "4c00001eFGHS", + ["madd.s_4"] = "4c000020FRGH", + ["madd.d_4"] = "4c000021FRGH", + ["madd.ps_4"] = "4c000026FRGH", + ["msub.s_4"] = "4c000028FRGH", + ["msub.d_4"] = "4c000029FRGH", + ["msub.ps_4"] = "4c00002eFRGH", + ["nmadd.s_4"] = "4c000030FRGH", + ["nmadd.d_4"] = "4c000031FRGH", + ["nmadd.ps_4"] = "4c000036FRGH", + ["nmsub.s_4"] = "4c000038FRGH", + ["nmsub.d_4"] = "4c000039FRGH", + ["nmsub.ps_4"] = "4c00003eFRGH", + + }) do map_op[k] = v end + +end + ------------------------------------------------------------------------------ local function parse_gpr(expr) @@ -808,9 +979,11 @@ map_op[".template__"] = function(params, template, nparams) op = op + parse_disp(params[n]); n = n + 1 elseif p == "X" then op = op + parse_index(params[n]); n = n + 1 - elseif p == "B" or p == "J" then + elseif p == "B" or p == "J" or p == "K" or p == "L" then local mode, m, s = parse_label(params[n], false) - if p == "B" then m = m + 2048 end + if p == "J" then m = m + 0xa800 + elseif p == "K" then m = m + 0x5000 + elseif p == "L" then m = m + 0xa000 end waction("REL_"..mode, m, s, 1) n = n + 1 elseif p == "A" then @@ -833,7 +1006,7 
@@ map_op[".template__"] = function(params, template, nparams) elseif p == "Z" then op = op + parse_imm(params[n], 10, 6, 0, false); n = n + 1 elseif p == "=" then - op = op + shl(band(op, 0xf800), 5) -- Copy D to T for clz, clo. + n = n - 1 -- Re-use previous parameter for next template char. else assert(false) end diff --git a/dynasm/dynasm.lua b/dynasm/dynasm.lua index 5ec21a79..46ebfca8 100644 --- a/dynasm/dynasm.lua +++ b/dynasm/dynasm.lua @@ -630,6 +630,7 @@ end -- Load architecture-specific module. local function loadarch(arch) if not match(arch, "^[%w_]+$") then return "bad arch name" end + _G._map_def = map_def local ok, m_arch = pcall(require, "dasm_"..arch) if not ok then return "cannot load module: "..m_arch end g_arch = m_arch diff --git a/src/Makefile.original b/src/Makefile.original index aedaaa73..22d36a27 100644 --- a/src/Makefile.original +++ b/src/Makefile.original @@ -455,6 +455,9 @@ ifeq (arm,$(TARGET_LJARCH)) DASM_AFLAGS+= -D IOS endif else +ifneq (,$(findstring LJ_TARGET_MIPSR6 ,$(TARGET_TESTARCH))) + DASM_AFLAGS+= -D MIPSR6 +endif ifeq (ppc,$(TARGET_LJARCH)) ifneq (,$(findstring LJ_ARCH_SQRT 1,$(TARGET_TESTARCH))) DASM_AFLAGS+= -D SQRT diff --git a/src/jit/bcsave.lua b/src/jit/bcsave.lua index 2553d97e..41081184 100644 --- a/src/jit/bcsave.lua +++ b/src/jit/bcsave.lua @@ -17,6 +17,10 @@ local bit = require("bit") -- Symbol name prefix for LuaJIT bytecode. 
local LJBC_PREFIX = "luaJIT_BC_" +local type, assert = type, assert +local format = string.format +local tremove, tconcat = table.remove, table.concat + ------------------------------------------------------------------------------ local function usage() @@ -63,8 +67,18 @@ local map_type = { } local map_arch = { - x86 = true, x64 = true, arm = true, arm64 = true, arm64be = true, - ppc = true, mips = true, mipsel = true, + x86 = { e = "le", b = 32, m = 3, p = 0x14c, }, + x64 = { e = "le", b = 64, m = 62, p = 0x8664, }, + arm = { e = "le", b = 32, m = 40, p = 0x1c0, }, + arm64 = { e = "le", b = 64, m = 183, p = 0xaa64, }, + arm64be = { e = "be", b = 64, m = 183, }, + ppc = { e = "be", b = 32, m = 20, }, + mips = { e = "be", b = 32, m = 8, f = 0x50001006, }, + mipsel = { e = "le", b = 32, m = 8, f = 0x50001006, }, + mips64 = { e = "be", b = 64, m = 8, f = 0x80000007, }, + mips64el = { e = "le", b = 64, m = 8, f = 0x80000007, }, + mips64r6 = { e = "be", b = 64, m = 8, f = 0xa0000407, }, + mips64r6el = { e = "le", b = 64, m = 8, f = 0xa0000407, }, } local map_os = { @@ -73,33 +87,33 @@ local map_os = { } local function checkarg(str, map, err) - str = string.lower(str) + str = str:lower() local s = check(map[str], "unknown ", err) - return s == true and str or s + return type(s) == "string" and s or str end local function detecttype(str) - local ext = string.match(string.lower(str), "%.(%a+)$") + local ext = str:lower():match("%.(%a+)$") return map_type[ext] or "raw" end local function checkmodname(str) - check(string.match(str, "^[%w_.%-]+$"), "bad module name") - return string.gsub(str, "[%.%-]", "_") + check(str:match("^[%w_.%-]+$"), "bad module name") + return str:gsub("[%.%-]", "_") end local function detectmodname(str) if type(str) == "string" then - local tail = string.match(str, "[^/\\]+$") + local tail = str:match("[^/\\]+$") if tail then str = tail end - local head = string.match(str, "^(.*)%.[^.]*$") + local head = str:match("^(.*)%.[^.]*$") if head then str = 
head end - str = string.match(str, "^[%w_.%-]+") + str = str:match("^[%w_.%-]+") else str = nil end check(str, "cannot derive module name, use -n name") - return string.gsub(str, "[%.%-]", "_") + return str:gsub("[%.%-]", "_") end ------------------------------------------------------------------------------ @@ -118,7 +132,7 @@ end local function bcsave_c(ctx, output, s) local fp = savefile(output, "w") if ctx.type == "c" then - fp:write(string.format([[ + fp:write(format([[ #ifdef _cplusplus extern "C" #endif @@ -128,7 +142,7 @@ __declspec(dllexport) const unsigned char %s%s[] = { ]], LJBC_PREFIX, ctx.modname)) else - fp:write(string.format([[ + fp:write(format([[ #define %s%s_SIZE %d static const unsigned char %s%s[] = { ]], LJBC_PREFIX, ctx.modname, #s, LJBC_PREFIX, ctx.modname)) @@ -138,13 +152,13 @@ static const unsigned char %s%s[] = { local b = tostring(string.byte(s, i)) m = m + #b + 1 if m > 78 then - fp:write(table.concat(t, ",", 1, n), ",\n") + fp:write(tconcat(t, ",", 1, n), ",\n") n, m = 0, #b + 1 end n = n + 1 t[n] = b end - bcsave_tail(fp, output, table.concat(t, ",", 1, n).."\n};\n") + bcsave_tail(fp, output, tconcat(t, ",", 1, n).."\n};\n") end local function bcsave_elfobj(ctx, output, s, ffi) @@ -199,12 +213,8 @@ typedef struct { } ELF64obj; ]] local symname = LJBC_PREFIX..ctx.modname - local is64, isbe = false, false - if ctx.arch == "x64" or ctx.arch == "arm64" or ctx.arch == "arm64be" then - is64 = true - elseif ctx.arch == "ppc" or ctx.arch == "mips" then - isbe = true - end + local ai = assert(map_arch[ctx.arch]) + local is64, isbe = ai.b == 64, ai.e == "be" -- Handle different host/target endianess. 
local function f32(x) return x end @@ -237,10 +247,8 @@ typedef struct { hdr.eendian = isbe and 2 or 1 hdr.eversion = 1 hdr.type = f16(1) - hdr.machine = f16(({ x86=3, x64=62, arm=40, arm64=183, arm64be=183, ppc=20, mips=8, mipsel=8 })[ctx.arch]) - if ctx.arch == "mips" or ctx.arch == "mipsel" then - hdr.flags = f32(0x50001006) - end + hdr.machine = f16(ai.m) + hdr.flags = f32(ai.f or 0) hdr.version = f32(1) hdr.shofs = fofs(ffi.offsetof(o, "sect")) hdr.ehsize = f16(ffi.sizeof(hdr)) @@ -336,12 +344,8 @@ typedef struct { } PEobj; ]] local symname = LJBC_PREFIX..ctx.modname - local is64 = false - if ctx.arch == "x86" then - symname = "_"..symname - elseif ctx.arch == "x64" then - is64 = true - end + local ai = assert(map_arch[ctx.arch]) + local is64 = ai.b == 64 local symexport = " /EXPORT:"..symname..",DATA " -- The file format is always little-endian. Swap if the host is big-endian. @@ -355,7 +359,7 @@ typedef struct { -- Create PE object and fill in header. local o = ffi.new("PEobj") local hdr = o.hdr - hdr.arch = f16(({ x86=0x14c, x64=0x8664, arm=0x1c0, ppc=0x1f2, mips=0x366, mipsel=0x366 })[ctx.arch]) + hdr.arch = f16(assert(ai.p)) hdr.nsects = f16(2) hdr.symtabofs = f32(ffi.offsetof(o, "sym0")) hdr.nsyms = f32(6) @@ -605,16 +609,16 @@ local function docmd(...) local n = 1 local list = false local ctx = { - strip = true, arch = jit.arch, os = string.lower(jit.os), + strip = true, arch = jit.arch, os = jit.os:lower(), type = false, modname = false, } while n <= #arg do local a = arg[n] - if type(a) == "string" and string.sub(a, 1, 1) == "-" and a ~= "-" then - table.remove(arg, n) + if type(a) == "string" and a:sub(1, 1) == "-" and a ~= "-" then + tremove(arg, n) if a == "--" then break end for m=2,#a do - local opt = string.sub(a, m, m) + local opt = a:sub(m, m) if opt == "l" then list = true elseif opt == "s" then @@ -627,13 +631,13 @@ local function docmd(...) 
if n ~= 1 then usage() end arg[1] = check(loadstring(arg[1])) elseif opt == "n" then - ctx.modname = checkmodname(table.remove(arg, n)) + ctx.modname = checkmodname(tremove(arg, n)) elseif opt == "t" then - ctx.type = checkarg(table.remove(arg, n), map_type, "file type") + ctx.type = checkarg(tremove(arg, n), map_type, "file type") elseif opt == "a" then - ctx.arch = checkarg(table.remove(arg, n), map_arch, "architecture") + ctx.arch = checkarg(tremove(arg, n), map_arch, "architecture") elseif opt == "o" then - ctx.os = checkarg(table.remove(arg, n), map_os, "OS name") + ctx.os = checkarg(tremove(arg, n), map_os, "OS name") else usage() end diff --git a/src/jit/dis_mips.lua b/src/jit/dis_mips.lua index a12b8e62..c003b984 100644 --- a/src/jit/dis_mips.lua +++ b/src/jit/dis_mips.lua @@ -19,13 +19,34 @@ local band, bor, tohex = bit.band, bit.bor, bit.tohex local lshift, rshift, arshift = bit.lshift, bit.rshift, bit.arshift ------------------------------------------------------------------------------ --- Primary and extended opcode maps +-- Extended opcode maps common to all MIPS releases ------------------------------------------------------------------------------ -local map_movci = { shift = 16, mask = 1, [0] = "movfDSC", "movtDSC", } local map_srl = { shift = 21, mask = 1, [0] = "srlDTA", "rotrDTA", } local map_srlv = { shift = 6, mask = 1, [0] = "srlvDTS", "rotrvDTS", } +local map_cop0 = { + shift = 25, mask = 1, + [0] = { + shift = 21, mask = 15, + [0] = "mfc0TDW", [4] = "mtc0TDW", + [10] = "rdpgprDT", + [11] = { shift = 5, mask = 1, [0] = "diT0", "eiT0", }, + [14] = "wrpgprDT", + }, { + shift = 0, mask = 63, + [1] = "tlbr", [2] = "tlbwi", [6] = "tlbwr", [8] = "tlbp", + [24] = "eret", [31] = "deret", + [32] = "wait", + }, +} + +------------------------------------------------------------------------------ +-- Primary and extended opcode maps for MIPS R1-R5 +------------------------------------------------------------------------------ + +local map_movci = { 
shift = 16, mask = 1, [0] = "movfDSC", "movtDSC", } + local map_special = { shift = 0, mask = 63, [0] = { shift = 0, mask = -1, [0] = "nop", _ = "sllDTA" }, @@ -87,22 +108,6 @@ local map_regimm = { false, false, false, "synciSO", } -local map_cop0 = { - shift = 25, mask = 1, - [0] = { - shift = 21, mask = 15, - [0] = "mfc0TDW", [4] = "mtc0TDW", - [10] = "rdpgprDT", - [11] = { shift = 5, mask = 1, [0] = "diT0", "eiT0", }, - [14] = "wrpgprDT", - }, { - shift = 0, mask = 63, - [1] = "tlbr", [2] = "tlbwi", [6] = "tlbwr", [8] = "tlbp", - [24] = "eret", [31] = "deret", - [32] = "wait", - }, -} - local map_cop1s = { shift = 0, mask = 63, [0] = "add.sFGH", "sub.sFGH", "mul.sFGH", "div.sFGH", @@ -233,6 +238,208 @@ local map_pri = { false, "sdc1HSO", "sdc2TSO", "sdTSO", } +------------------------------------------------------------------------------ +-- Primary and extended opcode maps for MIPS R6 +------------------------------------------------------------------------------ + +local map_mul_r6 = { shift = 6, mask = 3, [2] = "mulDST", [3] = "muhDST" } +local map_mulu_r6 = { shift = 6, mask = 3, [2] = "muluDST", [3] = "muhuDST" } +local map_div_r6 = { shift = 6, mask = 3, [2] = "divDST", [3] = "modDST" } +local map_divu_r6 = { shift = 6, mask = 3, [2] = "divuDST", [3] = "moduDST" } +local map_dmul_r6 = { shift = 6, mask = 3, [2] = "dmulDST", [3] = "dmuhDST" } +local map_dmulu_r6 = { shift = 6, mask = 3, [2] = "dmuluDST", [3] = "dmuhuDST" } +local map_ddiv_r6 = { shift = 6, mask = 3, [2] = "ddivDST", [3] = "dmodDST" } +local map_ddivu_r6 = { shift = 6, mask = 3, [2] = "ddivuDST", [3] = "dmoduDST" } + +local map_special_r6 = { + shift = 0, mask = 63, + [0] = { shift = 0, mask = -1, [0] = "nop", _ = "sllDTA" }, + false, map_srl, "sraDTA", + "sllvDTS", false, map_srlv, "sravDTS", + "jrS", "jalrD1S", false, false, + "syscallY", "breakY", false, "sync", + "clzDS", "cloDS", "dclzDS", "dcloDS", + "dsllvDST", "dlsaDSTA", "dsrlvDST", "dsravDST", + map_mul_r6, map_mulu_r6, map_div_r6, 
map_divu_r6, + map_dmul_r6, map_dmulu_r6, map_ddiv_r6, map_ddivu_r6, + "addDST", "addu|moveDST0", "subDST", "subu|neguDS0T", + "andDST", "or|moveDST0", "xorDST", "nor|notDST0", + false, false, "sltDST", "sltuDST", + "daddDST", "dadduDST", "dsubDST", "dsubuDST", + "tgeSTZ", "tgeuSTZ", "tltSTZ", "tltuSTZ", + "teqSTZ", "seleqzDST", "tneSTZ", "selnezDST", + "dsllDTA", false, "dsrlDTA", "dsraDTA", + "dsll32DTA", false, "dsrl32DTA", "dsra32DTA", +} + +local map_bshfl_r6 = { + shift = 9, mask = 3, + [1] = "alignDSTa", + _ = { + shift = 6, mask = 31, + [0] = "bitswapDT", + [2] = "wsbhDT", + [16] = "sebDT", + [24] = "sehDT", + } +} + +local map_dbshfl_r6 = { + shift = 9, mask = 3, + [1] = "dalignDSTa", + _ = { + shift = 6, mask = 31, + [0] = "dbitswapDT", + [2] = "dsbhDT", + [5] = "dshdDT", + } +} + +local map_special3_r6 = { + shift = 0, mask = 63, + [0] = "extTSAK", [1] = "dextmTSAP", [3] = "dextTSAK", + [4] = "insTSAL", [6] = "dinsuTSEQ", [7] = "dinsTSAL", + [32] = map_bshfl_r6, [36] = map_dbshfl_r6, [59] = "rdhwrTD", +} + +local map_regimm_r6 = { + shift = 16, mask = 31, + [0] = "bltzSB", [1] = "bgezSB", + [6] = "dahiSI", [30] = "datiSI", + [23] = "sigrieI", [31] = "synciSO", +} + +local map_pcrel_r6 = { + shift = 19, mask = 3, + [0] = "addiupcS2", "lwpcS2", "lwupcS2", { + shift = 18, mask = 1, + [0] = "ldpcS3", { shift = 16, mask = 3, [2] = "auipcSI", [3] = "aluipcSI" } + } +} + +local map_cop1s_r6 = { + shift = 0, mask = 63, + [0] = "add.sFGH", "sub.sFGH", "mul.sFGH", "div.sFGH", + "sqrt.sFG", "abs.sFG", "mov.sFG", "neg.sFG", + "round.l.sFG", "trunc.l.sFG", "ceil.l.sFG", "floor.l.sFG", + "round.w.sFG", "trunc.w.sFG", "ceil.w.sFG", "floor.w.sFG", + "sel.sFGH", false, false, false, + "seleqz.sFGH", "recip.sFG", "rsqrt.sFG", "selnez.sFGH", + "maddf.sFGH", "msubf.sFGH", "rint.sFG", "class.sFG", + "min.sFGH", "mina.sFGH", "max.sFGH", "maxa.sFGH", + false, "cvt.d.sFG", false, false, + "cvt.w.sFG", "cvt.l.sFG", +} + +local map_cop1d_r6 = { + shift = 0, mask = 63, + [0] = 
"add.dFGH", "sub.dFGH", "mul.dFGH", "div.dFGH", + "sqrt.dFG", "abs.dFG", "mov.dFG", "neg.dFG", + "round.l.dFG", "trunc.l.dFG", "ceil.l.dFG", "floor.l.dFG", + "round.w.dFG", "trunc.w.dFG", "ceil.w.dFG", "floor.w.dFG", + "sel.dFGH", false, false, false, + "seleqz.dFGH", "recip.dFG", "rsqrt.dFG", "selnez.dFGH", + "maddf.dFGH", "msubf.dFGH", "rint.dFG", "class.dFG", + "min.dFGH", "mina.dFGH", "max.dFGH", "maxa.dFGH", + "cvt.s.dFG", false, false, false, + "cvt.w.dFG", "cvt.l.dFG", +} + +local map_cop1w_r6 = { + shift = 0, mask = 63, + [0] = "cmp.af.sFGH", "cmp.un.sFGH", "cmp.eq.sFGH", "cmp.ueq.sFGH", + "cmp.lt.sFGH", "cmp.ult.sFGH", "cmp.le.sFGH", "cmp.ule.sFGH", + "cmp.saf.sFGH", "cmp.sun.sFGH", "cmp.seq.sFGH", "cmp.sueq.sFGH", + "cmp.slt.sFGH", "cmp.sult.sFGH", "cmp.sle.sFGH", "cmp.sule.sFGH", + false, "cmp.or.sFGH", "cmp.une.sFGH", "cmp.ne.sFGH", + false, false, false, false, + false, "cmp.sor.sFGH", "cmp.sune.sFGH", "cmp.sne.sFGH", + false, false, false, false, + "cvt.s.wFG", "cvt.d.wFG", +} + +local map_cop1l_r6 = { + shift = 0, mask = 63, + [0] = "cmp.af.dFGH", "cmp.un.dFGH", "cmp.eq.dFGH", "cmp.ueq.dFGH", + "cmp.lt.dFGH", "cmp.ult.dFGH", "cmp.le.dFGH", "cmp.ule.dFGH", + "cmp.saf.dFGH", "cmp.sun.dFGH", "cmp.seq.dFGH", "cmp.sueq.dFGH", + "cmp.slt.dFGH", "cmp.sult.dFGH", "cmp.sle.dFGH", "cmp.sule.dFGH", + false, "cmp.or.dFGH", "cmp.une.dFGH", "cmp.ne.dFGH", + false, false, false, false, + false, "cmp.sor.dFGH", "cmp.sune.dFGH", "cmp.sne.dFGH", + false, false, false, false, + "cvt.s.lFG", "cvt.d.lFG", +} + +local map_cop1_r6 = { + shift = 21, mask = 31, + [0] = "mfc1TG", "dmfc1TG", "cfc1TG", "mfhc1TG", + "mtc1TG", "dmtc1TG", "ctc1TG", "mthc1TG", + false, "bc1eqzHB", false, false, + false, "bc1nezHB", false, false, + map_cop1s_r6, map_cop1d_r6, false, false, + map_cop1w_r6, map_cop1l_r6, +} + +local function maprs_popTS(rs, rt) + if rt == 0 then return 0 elseif rs == 0 then return 1 + elseif rs == rt then return 2 else return 3 end +end + +local map_pop06_r6 = { + 
maprs = maprs_popTS, [0] = "blezSB", "blezalcTB", "bgezalcTB", "bgeucSTB" +} +local map_pop07_r6 = { + maprs = maprs_popTS, [0] = "bgtzSB", "bgtzalcTB", "bltzalcTB", "bltucSTB" +} +local map_pop26_r6 = { + maprs = maprs_popTS, "blezcTB", "bgezcTB", "bgecSTB" +} +local map_pop27_r6 = { + maprs = maprs_popTS, "bgtzcTB", "bltzcTB", "bltcSTB" +} + +local function maprs_popS(rs, rt) + if rs == 0 then return 0 else return 1 end +end + +local map_pop66_r6 = { + maprs = maprs_popS, [0] = "jicTI", "beqzcSb" +} +local map_pop76_r6 = { + maprs = maprs_popS, [0] = "jialcTI", "bnezcSb" +} + +local function maprs_popST(rs, rt) + if rs >= rt then return 0 elseif rs == 0 then return 1 else return 2 end +end + +local map_pop10_r6 = { + maprs = maprs_popST, [0] = "bovcSTB", "beqzalcTB", "beqcSTB" +} +local map_pop30_r6 = { + maprs = maprs_popST, [0] = "bnvcSTB", "bnezalcTB", "bnecSTB" +} + +local map_pri_r6 = { + [0] = map_special_r6, map_regimm_r6, "jJ", "jalJ", + "beq|beqz|bST00B", "bne|bnezST0B", map_pop06_r6, map_pop07_r6, + map_pop10_r6, "addiu|liTS0I", "sltiTSI", "sltiuTSI", + "andiTSU", "ori|liTS0U", "xoriTSU", "aui|luiTS0U", + map_cop0, map_cop1_r6, false, false, + false, false, map_pop26_r6, map_pop27_r6, + map_pop30_r6, "daddiuTSI", false, false, + false, "dauiTSI", false, map_special3_r6, + "lbTSO", "lhTSO", false, "lwTSO", + "lbuTSO", "lhuTSO", false, false, + "sbTSO", "shTSO", false, "swTSO", + false, false, false, false, + false, "lwc1HSO", "bc#", false, + false, "ldc1HSO", map_pop66_r6, "ldTSO", + false, "swc1HSO", "balc#", map_pcrel_r6, + false, "sdc1HSO", map_pop76_r6, "sdTSO", +} + ------------------------------------------------------------------------------ local map_gpr = { @@ -287,10 +494,14 @@ local function disass_ins(ctx) ctx.op = op ctx.rel = nil - local opat = map_pri[rshift(op, 26)] + local opat = ctx.map_pri[rshift(op, 26)] while type(opat) ~= "string" do if not opat then return unknown(ctx) end - opat = opat[band(rshift(op, opat.shift), opat.mask)] or 
opat._ + if opat.maprs then + opat = opat[opat.maprs(band(rshift(op,21),31), band(rshift(op,16),31))] + else + opat = opat[band(rshift(op, opat.shift), opat.mask)] or opat._ + end end local name, pat = match(opat, "^([a-z0-9_.]*)(.*)") local altname, pat2 = match(pat, "|([a-z0-9_.|]*)(.*)") @@ -314,6 +525,8 @@ local function disass_ins(ctx) x = "f"..band(rshift(op, 21), 31) elseif p == "A" then x = band(rshift(op, 6), 31) + elseif p == "a" then + x = band(rshift(op, 6), 7) elseif p == "E" then x = band(rshift(op, 6), 31) + 32 elseif p == "M" then @@ -333,6 +546,10 @@ local function disass_ins(ctx) x = band(rshift(op, 11), 31) - last + 33 elseif p == "I" then x = arshift(lshift(op, 16), 16) + elseif p == "2" then + x = arshift(lshift(op, 13), 11) + elseif p == "3" then + x = arshift(lshift(op, 14), 11) elseif p == "U" then x = band(op, 0xffff) elseif p == "O" then @@ -342,7 +559,15 @@ local function disass_ins(ctx) local index = map_gpr[band(rshift(op, 16), 31)] operands[#operands] = format("%s(%s)", index, last) elseif p == "B" then - x = ctx.addr + ctx.pos + arshift(lshift(op, 16), 16)*4 + 4 + x = ctx.addr + ctx.pos + arshift(lshift(op, 16), 14) + 4 + ctx.rel = x + x = format("0x%08x", x) + elseif p == "b" then + x = ctx.addr + ctx.pos + arshift(lshift(op, 11), 9) + 4 + ctx.rel = x + x = format("0x%08x", x) + elseif p == "#" then + x = ctx.addr + ctx.pos + arshift(lshift(op, 6), 4) + 4 ctx.rel = x x = format("0x%08x", x) elseif p == "J" then @@ -408,6 +633,7 @@ local function create(code, addr, out) ctx.disass = disass_block ctx.hexdump = 8 ctx.get = get_be + ctx.map_pri = map_pri return ctx end @@ -417,6 +643,19 @@ local function create_el(code, addr, out) return ctx end +local function create_r6(code, addr, out) + local ctx = create(code, addr, out) + ctx.map_pri = map_pri_r6 + return ctx +end + +local function create_r6_el(code, addr, out) + local ctx = create(code, addr, out) + ctx.get = get_le + ctx.map_pri = map_pri_r6 + return ctx +end + -- Simple API: 
disassemble code (a string) at address and output via out.
 local function disass(code, addr, out)
   create(code, addr, out):disass()
@@ -426,6 +665,14 @@ local function disass_el(code, addr, out)
   create_el(code, addr, out):disass()
 end
 
+local function disass_r6(code, addr, out)
+  create_r6(code, addr, out):disass()
+end
+
+local function disass_r6_el(code, addr, out)
+  create_r6_el(code, addr, out):disass()
+end
+
 -- Return register name for RID.
 local function regname(r)
   if r < 32 then return map_gpr[r] end
@@ -436,8 +683,12 @@ end
 return {
   create = create,
   create_el = create_el,
+  create_r6 = create_r6,
+  create_r6_el = create_r6_el,
   disass = disass,
   disass_el = disass_el,
+  disass_r6 = disass_r6,
+  disass_r6_el = disass_r6_el,
   regname = regname
 }
 
diff --git a/src/jit/dis_mips64r6.lua b/src/jit/dis_mips64r6.lua
new file mode 100644
index 00000000..023c05ab
--- /dev/null
+++ b/src/jit/dis_mips64r6.lua
@@ -0,0 +1,17 @@
+----------------------------------------------------------------------------
+-- LuaJIT MIPS64R6 disassembler wrapper module.
+--
+-- Copyright (C) 2005-2017 Mike Pall. All rights reserved.
+-- Released under the MIT license. See Copyright Notice in luajit.h
+----------------------------------------------------------------------------
+-- This module just exports the r6 big-endian functions from the
+-- MIPS disassembler module. All the interesting stuff is there.
+------------------------------------------------------------------------------
+
+local dis_mips = require((string.match(..., ".*%.") or "").."dis_mips")
+return {
+  create = dis_mips.create_r6,
+  disass = dis_mips.disass_r6,
+  regname = dis_mips.regname
+}
+
diff --git a/src/jit/dis_mips64r6el.lua b/src/jit/dis_mips64r6el.lua
new file mode 100644
index 00000000..f2988339
--- /dev/null
+++ b/src/jit/dis_mips64r6el.lua
@@ -0,0 +1,17 @@
+----------------------------------------------------------------------------
+-- LuaJIT MIPS64R6EL disassembler wrapper module.
+--
+-- Copyright (C) 2005-2017 Mike Pall. All rights reserved.
+-- Released under the MIT license. See Copyright Notice in luajit.h
+----------------------------------------------------------------------------
+-- This module just exports the r6 little-endian functions from the
+-- MIPS disassembler module. All the interesting stuff is there.
+------------------------------------------------------------------------------
+
+local dis_mips = require((string.match(..., ".*%.") or "").."dis_mips")
+return {
+  create = dis_mips.create_r6_el,
+  disass = dis_mips.disass_r6_el,
+  regname = dis_mips.regname
+}
+
diff --git a/src/lj_arch.h b/src/lj_arch.h
index 0351e046..cf31a291 100644
--- a/src/lj_arch.h
+++ b/src/lj_arch.h
@@ -342,18 +342,38 @@
 #elif LUAJIT_TARGET == LUAJIT_ARCH_MIPS32 || LUAJIT_TARGET == LUAJIT_ARCH_MIPS64
 
 #if defined(__MIPSEL__) || defined(__MIPSEL) || defined(_MIPSEL)
+#if __mips_isa_rev >= 6
+#define LJ_TARGET_MIPSR6 1
+#define LJ_TARGET_UNALIGNED 1
+#endif
 #if LUAJIT_TARGET == LUAJIT_ARCH_MIPS32
+#if LJ_TARGET_MIPSR6
+#define LJ_ARCH_NAME "mips32r6el"
+#else
 #define LJ_ARCH_NAME "mipsel"
+#endif
+#else
+#if LJ_TARGET_MIPSR6
+#define LJ_ARCH_NAME "mips64r6el"
 #else
 #define LJ_ARCH_NAME "mips64el"
 #endif
+#endif
 #define LJ_ARCH_ENDIAN LUAJIT_LE
 #else
 #if LUAJIT_TARGET == LUAJIT_ARCH_MIPS32
+#if LJ_TARGET_MIPSR6
+#define LJ_ARCH_NAME "mips32r6"
+#else
 #define LJ_ARCH_NAME "mips"
+#endif
+#else
+#if LJ_TARGET_MIPSR6
+#define LJ_ARCH_NAME "mips64r6"
 #else
 #define LJ_ARCH_NAME "mips64"
 #endif
+#endif
 #define LJ_ARCH_ENDIAN LUAJIT_BE
 #endif
 
@@ -390,7 +410,9 @@
 #define LJ_TARGET_UNIFYROT 2 /* Want only IR_BROR.
*/ #define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL -#if _MIPS_ARCH_MIPS32R2 || _MIPS_ARCH_MIPS64R2 +#if LJ_TARGET_MIPSR6 +#define LJ_ARCH_VERSION 60 +#elif _MIPS_ARCH_MIPS32R2 || _MIPS_ARCH_MIPS64R2 #define LJ_ARCH_VERSION 20 #else #define LJ_ARCH_VERSION 10 @@ -472,8 +494,13 @@ #if !((defined(_MIPS_SIM_ABI32) && _MIPS_SIM == _MIPS_SIM_ABI32) || (defined(_ABIO32) && _MIPS_SIM == _ABIO32)) #error "Only o32 ABI supported for MIPS32" #endif +#if LJ_TARGET_MIPSR6 +/* Not that useful, since most available r6 CPUs are 64 bit. */ +#error "No support for MIPS32R6" +#endif #elif LJ_TARGET_MIPS64 #if !((defined(_MIPS_SIM_ABI64) && _MIPS_SIM == _MIPS_SIM_ABI64) || (defined(_ABI64) && _MIPS_SIM == _ABI64)) +/* MIPS32ON64 aka n32 ABI support might be desirable, but difficult. */ #error "Only n64 ABI supported for MIPS64" #endif #endif diff --git a/src/lj_asm.c b/src/lj_asm.c index 25b96264..96b8c032 100644 --- a/src/lj_asm.c +++ b/src/lj_asm.c @@ -2159,8 +2159,8 @@ static void asm_setup_regsp(ASMState *as) ir->prev = REGSP_HINT(RID_FPRET); continue; } - /* fallthrough */ #endif + /* fallthrough */ case IR_CALLN: case IR_CALLXS: #if LJ_SOFTFP case IR_MIN: case IR_MAX: diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h index 23ffc3aa..4626507b 100644 --- a/src/lj_asm_mips.h +++ b/src/lj_asm_mips.h @@ -101,7 +101,12 @@ static void asm_guard(ASMState *as, MIPSIns mi, Reg rs, Reg rt) as->invmcp = NULL; as->loopinv = 1; as->mcp = p+1; +#if !LJ_TARGET_MIPSR6 mi = mi ^ ((mi>>28) == 1 ? 0x04000000u : 0x00010000u); /* Invert cond. */ +#else + mi = mi ^ ((mi>>28) == 1 ? 0x04000000u : + (mi>>28) == 4 ? 0x00800000u : 0x00010000u); /* Invert cond. */ +#endif target = p; /* Patch target later in asm_loop_fixup. */ } emit_ti(as, MIPSI_LI, RID_TMP, as->snapno); @@ -410,7 +415,11 @@ static void asm_callround(ASMState *as, IRIns *ir, IRCallID id) { /* The modified regs must match with the *.dasc implementation. 
*/ RegSet drop = RID2RSET(RID_R1)|RID2RSET(RID_R12)|RID2RSET(RID_FPRET)| - RID2RSET(RID_F2)|RID2RSET(RID_F4)|RID2RSET(REGARG_FIRSTFPR); + RID2RSET(RID_F2)|RID2RSET(RID_F4)|RID2RSET(REGARG_FIRSTFPR) +#if LJ_TARGET_MIPSR6 + |RID2RSET(RID_F21) +#endif + ; if (ra_hasreg(ir->r)) rset_clear(drop, ir->r); ra_evictset(as, drop); ra_destreg(as, ir, RID_FPRET); @@ -444,8 +453,13 @@ static void asm_tointg(ASMState *as, IRIns *ir, Reg left) { Reg tmp = ra_scratch(as, rset_exclude(RSET_FPR, left)); Reg dest = ra_dest(as, ir, RSET_GPR); +#if !LJ_TARGET_MIPSR6 asm_guard(as, MIPSI_BC1F, 0, 0); emit_fgh(as, MIPSI_C_EQ_D, 0, tmp, left); +#else + asm_guard(as, MIPSI_BC1EQZ, 0, (tmp&31)); + emit_fgh(as, MIPSI_CMP_EQ_D, tmp, tmp, left); +#endif emit_fg(as, MIPSI_CVT_D_W, tmp, tmp); emit_tg(as, MIPSI_MFC1, dest, tmp); emit_fg(as, MIPSI_CVT_W_D, tmp, left); @@ -599,8 +613,13 @@ static void asm_conv(ASMState *as, IRIns *ir) (void *)&as->J->k64[LJ_K64_M2P64], rset_exclude(RSET_GPR, dest)); emit_fg(as, MIPSI_TRUNC_L_D, tmp, left); /* Delay slot. */ - emit_branch(as, MIPSI_BC1T, 0, 0, l_end); - emit_fgh(as, MIPSI_C_OLT_D, 0, left, tmp); +#if !LJ_TARGET_MIPSR6 + emit_branch(as, MIPSI_BC1T, 0, 0, l_end); + emit_fgh(as, MIPSI_C_OLT_D, 0, left, tmp); +#else + emit_branch(as, MIPSI_BC1NEZ, 0, (left&31), l_end); + emit_fgh(as, MIPSI_CMP_LT_D, left, left, tmp); +#endif emit_lsptr(as, MIPSI_LDC1, (tmp & 31), (void *)&as->J->k64[LJ_K64_2P63], rset_exclude(RSET_GPR, dest)); @@ -611,8 +630,13 @@ static void asm_conv(ASMState *as, IRIns *ir) (void *)&as->J->k32[LJ_K32_M2P64], rset_exclude(RSET_GPR, dest)); emit_fg(as, MIPSI_TRUNC_L_S, tmp, left); /* Delay slot. 
*/ - emit_branch(as, MIPSI_BC1T, 0, 0, l_end); - emit_fgh(as, MIPSI_C_OLT_S, 0, left, tmp); +#if !LJ_TARGET_MIPSR6 + emit_branch(as, MIPSI_BC1T, 0, 0, l_end); + emit_fgh(as, MIPSI_C_OLT_S, 0, left, tmp); +#else + emit_branch(as, MIPSI_BC1NEZ, 0, (left&31), l_end); + emit_fgh(as, MIPSI_CMP_LT_S, left, left, tmp); +#endif emit_lsptr(as, MIPSI_LWC1, (tmp & 31), (void *)&as->J->k32[LJ_K32_2P63], rset_exclude(RSET_GPR, dest)); @@ -840,8 +864,12 @@ static void asm_aref(ASMState *as, IRIns *ir) } base = ra_alloc1(as, ir->op1, RSET_GPR); idx = ra_alloc1(as, ir->op2, rset_exclude(RSET_GPR, base)); +#if !LJ_TARGET_MIPSR6 emit_dst(as, MIPSI_AADDU, dest, RID_TMP, base); emit_dta(as, MIPSI_SLL, RID_TMP, idx, 3); +#else + emit_dst(as, MIPSI_ALSA | MIPSF_A(3-1), dest, idx, base); +#endif } /* Inlined hash lookup. Specialized for key type and for const keys. @@ -944,8 +972,13 @@ static void asm_href(ASMState *as, IRIns *ir, IROp merge) l_end = asm_exitstub_addr(as); } if (!LJ_SOFTFP && irt_isnum(kt)) { +#if !LJ_TARGET_MIPSR6 emit_branch(as, MIPSI_BC1T, 0, 0, l_end); emit_fgh(as, MIPSI_C_EQ_D, 0, tmpnum, key); +#else + emit_branch(as, MIPSI_BC1NEZ, 0, (tmpnum&31), l_end); + emit_fgh(as, MIPSI_CMP_EQ_D, tmpnum, tmpnum, key); +#endif *--as->mcp = MIPSI_NOP; /* Avoid NaN comparison overhead. */ emit_branch(as, MIPSI_BEQ, tmp1, RID_ZERO, l_next); emit_tsi(as, MIPSI_SLTIU, tmp1, tmp1, (int32_t)LJ_TISNUM); @@ -1196,7 +1229,9 @@ static MIPSIns asm_fxloadins(IRIns *ir) case IRT_I16: return MIPSI_LH; case IRT_U16: return MIPSI_LHU; case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_LDC1; + /* fallthrough */ case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_LWC1; + /* fallthrough */ default: return (LJ_64 && irt_is64(ir->t)) ? 
MIPSI_LD : MIPSI_LW; } } @@ -1207,7 +1242,9 @@ static MIPSIns asm_fxstoreins(IRIns *ir) case IRT_I8: case IRT_U8: return MIPSI_SB; case IRT_I16: case IRT_U16: return MIPSI_SH; case IRT_NUM: lua_assert(!LJ_SOFTFP32); if (!LJ_SOFTFP) return MIPSI_SDC1; + /* fallthrough */ case IRT_FLOAT: if (!LJ_SOFTFP) return MIPSI_SWC1; + /* fallthrough */ default: return (LJ_64 && irt_is64(ir->t)) ? MIPSI_SD : MIPSI_SW; } } @@ -1253,7 +1290,7 @@ static void asm_xload(ASMState *as, IRIns *ir) { Reg dest = ra_dest(as, ir, (!LJ_SOFTFP && irt_isfp(ir->t)) ? RSET_FPR : RSET_GPR); - lua_assert(!(ir->op2 & IRXLOAD_UNALIGNED)); + lua_assert(LJ_TARGET_UNALIGNED || !(ir->op2 & IRXLOAD_UNALIGNED)); asm_fusexref(as, asm_fxloadins(ir), dest, ir->op1, RSET_GPR, 0); } @@ -1545,7 +1582,7 @@ static void asm_cnew(ASMState *as, IRIns *ir) ofs -= 4; if (LJ_BE) ir++; else ir--; } #else - emit_tsi(as, MIPSI_SD, ra_alloc1(as, ir->op2, allow), + emit_tsi(as, sz == 8 ? MIPSI_SD : MIPSI_SW, ra_alloc1(as, ir->op2, allow), RID_RET, sizeof(GCcdata)); #endif lua_assert(sz == 4 || sz == 8); @@ -1678,6 +1715,7 @@ static void asm_add(ASMState *as, IRIns *ir) } else #endif { + /* TODO MIPSR6: Fuse ADD(BSHL(a,1-4),b) or ADD(ADD(a,a),b) to MIPSI_ALSA. */ Reg dest = ra_dest(as, ir, RSET_GPR); Reg right, left = ra_hintalloc(as, ir->op1, dest, RSET_GPR); if (irref_isk(ir->op2)) { @@ -1722,8 +1760,12 @@ static void asm_mul(ASMState *as, IRIns *ir) Reg right, left = ra_alloc2(as, ir, RSET_GPR); right = (left >> 8); left &= 255; if (LJ_64 && irt_is64(ir->t)) { +#if !LJ_TARGET_MIPSR6 emit_dst(as, MIPSI_MFLO, dest, 0, 0); emit_dst(as, MIPSI_DMULT, 0, left, right); +#else + emit_dst(as, MIPSI_DMUL, dest, left, right); +#endif } else { emit_dst(as, MIPSI_MUL, dest, left, right); } @@ -1806,6 +1848,7 @@ static void asm_abs(ASMState *as, IRIns *ir) static void asm_arithov(ASMState *as, IRIns *ir) { + /* TODO MIPSR6: bovc/bnvc. Caveat: no delay slot to load RID_TMP. 
*/ Reg right, left, tmp, dest = ra_dest(as, ir, RSET_GPR); lua_assert(!irt_is64(ir->t)); if (irref_isk(ir->op2)) { @@ -1850,9 +1893,14 @@ static void asm_mulov(ASMState *as, IRIns *ir) right), dest)); asm_guard(as, MIPSI_BNE, RID_TMP, tmp); emit_dta(as, MIPSI_SRA, RID_TMP, dest, 31); +#if !LJ_TARGET_MIPSR6 emit_dst(as, MIPSI_MFHI, tmp, 0, 0); emit_dst(as, MIPSI_MFLO, dest, 0, 0); emit_dst(as, MIPSI_MULT, 0, left, right); +#else + emit_dst(as, MIPSI_MUL, dest, left, right); + emit_dst(as, MIPSI_MUH, tmp, left, right); +#endif } #if LJ_32 && LJ_HASFFI @@ -2076,6 +2124,7 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax) Reg dest = ra_dest(as, ir, RSET_FPR); Reg right, left = ra_alloc2(as, ir, RSET_FPR); right = (left >> 8); left &= 255; +#if !LJ_TARGET_MIPSR6 if (dest == left) { emit_fg(as, MIPSI_MOVT_D, dest, right); } else { @@ -2083,19 +2132,37 @@ static void asm_min_max(ASMState *as, IRIns *ir, int ismax) if (dest != right) emit_fg(as, MIPSI_MOV_D, dest, right); } emit_fgh(as, MIPSI_C_OLT_D, 0, ismax ? left : right, ismax ? right : left); +#else + emit_fgh(as, ismax ? 
MIPSI_MAX_D : MIPSI_MIN_D, dest, left, right); +#endif #endif } else { Reg dest = ra_dest(as, ir, RSET_GPR); Reg right, left = ra_alloc2(as, ir, RSET_GPR); right = (left >> 8); left &= 255; - if (dest == left) { - emit_dst(as, MIPSI_MOVN, dest, right, RID_TMP); + if (left == right) { + if (dest != left) emit_move(as, dest, left); } else { - emit_dst(as, MIPSI_MOVZ, dest, left, RID_TMP); - if (dest != right) emit_move(as, dest, right); +#if !LJ_TARGET_MIPSR6 + if (dest == left) { + emit_dst(as, MIPSI_MOVN, dest, right, RID_TMP); + } else { + emit_dst(as, MIPSI_MOVZ, dest, left, RID_TMP); + if (dest != right) emit_move(as, dest, right); + } +#else + emit_dst(as, MIPSI_OR, dest, dest, RID_TMP); + if (dest != right) { + emit_dst(as, MIPSI_SELNEZ, RID_TMP, right, RID_TMP); + emit_dst(as, MIPSI_SELEQZ, dest, left, RID_TMP); + } else { + emit_dst(as, MIPSI_SELEQZ, RID_TMP, left, RID_TMP); + emit_dst(as, MIPSI_SELNEZ, dest, right, RID_TMP); + } +#endif + emit_dst(as, MIPSI_SLT, RID_TMP, + ismax ? left : right, ismax ? right : left); } - emit_dst(as, MIPSI_SLT, RID_TMP, - ismax ? left : right, ismax ? right : left); } } @@ -2179,10 +2246,18 @@ static void asm_comp(ASMState *as, IRIns *ir) #if LJ_SOFTFP asm_sfpcomp(as, ir); #else +#if !LJ_TARGET_MIPSR6 Reg right, left = ra_alloc2(as, ir, RSET_FPR); right = (left >> 8); left &= 255; asm_guard(as, (op&1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0); emit_fgh(as, MIPSI_C_OLT_D + ((op&3) ^ ((op>>2)&1)), 0, left, right); +#else + Reg tmp, right, left = ra_alloc2(as, ir, RSET_FPR); + right = (left >> 8); left &= 255; + tmp = ra_scratch(as, rset_exclude(rset_exclude(RSET_FPR, left), right)); + asm_guard(as, (op&1) ? 
MIPSI_BC1NEZ : MIPSI_BC1EQZ, 0, (tmp&31)); + emit_fgh(as, MIPSI_CMP_LT_D + ((op&3) ^ ((op>>2)&1)), tmp, left, right); +#endif #endif } else { Reg right, left = ra_alloc1(as, ir->op1, RSET_GPR); @@ -2218,9 +2293,13 @@ static void asm_equal(ASMState *as, IRIns *ir) if (!LJ_SOFTFP32 && irt_isnum(ir->t)) { #if LJ_SOFTFP asm_sfpcomp(as, ir); -#else +#elif !LJ_TARGET_MIPSR6 asm_guard(as, (ir->o & 1) ? MIPSI_BC1T : MIPSI_BC1F, 0, 0); emit_fgh(as, MIPSI_C_EQ_D, 0, left, right); +#else + Reg tmp = ra_scratch(as, rset_exclude(rset_exclude(RSET_FPR, left), right)); + asm_guard(as, (ir->o & 1) ? MIPSI_BC1NEZ : MIPSI_BC1EQZ, 0, (tmp&31)); + emit_fgh(as, MIPSI_CMP_EQ_D, tmp, left, right); #endif } else { asm_guard(as, (ir->o & 1) ? MIPSI_BEQ : MIPSI_BNE, left, right); @@ -2623,7 +2702,12 @@ void lj_asm_patchexit(jit_State *J, GCtrace *T, ExitNo exitno, MCode *target) if (((p[-1] ^ (px-p)) & 0xffffu) == 0 && ((p[-1] & 0xf0000000u) == MIPSI_BEQ || (p[-1] & 0xfc1e0000u) == MIPSI_BLTZ || - (p[-1] & 0xffe00000u) == MIPSI_BC1F)) { +#if !LJ_TARGET_MIPSR6 + (p[-1] & 0xffe00000u) == MIPSI_BC1F +#else + (p[-1] & 0xff600000u) == MIPSI_BC1EQZ +#endif + )) { ptrdiff_t delta = target - p; if (((delta + 0x8000) >> 16) == 0) { /* Patch in-range branch. */ patchbranch: diff --git a/src/lj_emit_mips.h b/src/lj_emit_mips.h index bb6593ae..313d030a 100644 --- a/src/lj_emit_mips.h +++ b/src/lj_emit_mips.h @@ -138,6 +138,7 @@ static void emit_loadu64(ASMState *as, Reg r, uint64_t u64) } else if (emit_kdelta1(as, r, (intptr_t)u64)) { return; } else { + /* TODO MIPSR6: Use DAHI & DATI. Caveat: sign-extension. */ if ((u64 & 0xffff)) { emit_tsi(as, MIPSI_ORI, r, r, u64 & 0xffff); } @@ -236,10 +237,22 @@ static void emit_jmp(ASMState *as, MCode *target) static void emit_call(ASMState *as, void *target, int needcfa) { MCode *p = as->mcp; - *--p = MIPSI_NOP; +#if LJ_TARGET_MIPSR6 + ptrdiff_t delta = (char *)target - (char *)p; + if ((((delta>>2) + 0x02000000) >> 26) == 0) { /* Try compact call first. 
*/ + *--p = MIPSI_BALC | (((uintptr_t)delta >>2) & 0x03ffffffu); + as->mcp = p; + return; + } +#endif + *--p = MIPSI_NOP; /* Delay slot. */ if ((((uintptr_t)target ^ (uintptr_t)p) >> 28) == 0) { +#if !LJ_TARGET_MIPSR6 *--p = (((uintptr_t)target & 1) ? MIPSI_JALX : MIPSI_JAL) | (((uintptr_t)target >>2) & 0x03ffffffu); +#else + *--p = MIPSI_JAL | (((uintptr_t)target >>2) & 0x03ffffffu); +#endif } else { /* Target out of range: need indirect call. */ *--p = MIPSI_JALR | MIPSF_S(RID_CFUNCADDR); needcfa = 1; diff --git a/src/lj_jit.h b/src/lj_jit.h index c06829ab..a8b6f9a7 100644 --- a/src/lj_jit.h +++ b/src/lj_jit.h @@ -51,10 +51,18 @@ /* Names for the CPU-specific flags. Must match the order above. */ #define JIT_F_CPU_FIRST JIT_F_MIPSXXR2 #if LJ_TARGET_MIPS32 +#if LJ_TARGET_MIPSR6 +#define JIT_F_CPUSTRING "\010MIPS32R6" +#else #define JIT_F_CPUSTRING "\010MIPS32R2" +#endif +#else +#if LJ_TARGET_MIPSR6 +#define JIT_F_CPUSTRING "\010MIPS64R6" #else #define JIT_F_CPUSTRING "\010MIPS64R2" #endif +#endif #else #define JIT_F_CPU_FIRST 0 #define JIT_F_CPUSTRING "" diff --git a/src/lj_target_mips.h b/src/lj_target_mips.h index 740687b3..84db6012 100644 --- a/src/lj_target_mips.h +++ b/src/lj_target_mips.h @@ -223,6 +223,8 @@ typedef enum MIPSIns { MIPSI_ADDIU = 0x24000000, MIPSI_SUB = 0x00000022, MIPSI_SUBU = 0x00000023, + +#if !LJ_TARGET_MIPSR6 MIPSI_MUL = 0x70000002, MIPSI_DIV = 0x0000001a, MIPSI_DIVU = 0x0000001b, @@ -232,6 +234,15 @@ typedef enum MIPSIns { MIPSI_MFHI = 0x00000010, MIPSI_MFLO = 0x00000012, MIPSI_MULT = 0x00000018, +#else + MIPSI_MUL = 0x00000098, + MIPSI_MUH = 0x000000d8, + MIPSI_DIV = 0x0000009a, + MIPSI_DIVU = 0x0000009b, + + MIPSI_SELEQZ = 0x00000035, + MIPSI_SELNEZ = 0x00000037, +#endif MIPSI_SLL = 0x00000000, MIPSI_SRL = 0x00000002, @@ -253,8 +264,13 @@ typedef enum MIPSIns { MIPSI_B = 0x10000000, MIPSI_J = 0x08000000, MIPSI_JAL = 0x0c000000, +#if !LJ_TARGET_MIPSR6 MIPSI_JALX = 0x74000000, MIPSI_JR = 0x00000008, +#else + MIPSI_JR = 0x00000009, + 
MIPSI_BALC = 0xe8000000, +#endif MIPSI_JALR = 0x0000f809, MIPSI_BEQ = 0x10000000, @@ -282,15 +298,23 @@ typedef enum MIPSIns { /* MIPS64 instructions. */ MIPSI_DADD = 0x0000002c, - MIPSI_DADDI = 0x60000000, MIPSI_DADDU = 0x0000002d, MIPSI_DADDIU = 0x64000000, MIPSI_DSUB = 0x0000002e, MIPSI_DSUBU = 0x0000002f, +#if !LJ_TARGET_MIPSR6 MIPSI_DDIV = 0x0000001e, MIPSI_DDIVU = 0x0000001f, MIPSI_DMULT = 0x0000001c, MIPSI_DMULTU = 0x0000001d, +#else + MIPSI_DDIV = 0x0000009e, + MIPSI_DMOD = 0x000000de, + MIPSI_DDIVU = 0x0000009f, + MIPSI_DMODU = 0x000000df, + MIPSI_DMUL = 0x0000009c, + MIPSI_DMUH = 0x000000dc, +#endif MIPSI_DSLL = 0x00000038, MIPSI_DSRL = 0x0000003a, @@ -308,6 +332,11 @@ typedef enum MIPSIns { MIPSI_ASUBU = LJ_32 ? MIPSI_SUBU : MIPSI_DSUBU, MIPSI_AL = LJ_32 ? MIPSI_LW : MIPSI_LD, MIPSI_AS = LJ_32 ? MIPSI_SW : MIPSI_SD, +#if LJ_TARGET_MIPSR6 + MIPSI_LSA = 0x00000005, + MIPSI_DLSA = 0x00000015, + MIPSI_ALSA = LJ_32 ? MIPSI_LSA : MIPSI_DLSA, +#endif /* Extract/insert instructions. */ MIPSI_DEXTM = 0x7c000001, @@ -317,18 +346,19 @@ typedef enum MIPSIns { MIPSI_DINSU = 0x7c000006, MIPSI_DINS = 0x7c000007, - MIPSI_RINT_D = 0x4620001a, - MIPSI_RINT_S = 0x4600001a, - MIPSI_RINT = 0x4400001a, MIPSI_FLOOR_D = 0x4620000b, - MIPSI_CEIL_D = 0x4620000a, - MIPSI_ROUND_D = 0x46200008, /* FP instructions. 
*/ MIPSI_MOV_S = 0x46000006, MIPSI_MOV_D = 0x46200006, +#if !LJ_TARGET_MIPSR6 MIPSI_MOVT_D = 0x46210011, MIPSI_MOVF_D = 0x46200011, +#else + MIPSI_MIN_D = 0x4620001C, + MIPSI_MAX_D = 0x4620001E, + MIPSI_SEL_D = 0x46200010, +#endif MIPSI_ABS_D = 0x46200005, MIPSI_NEG_D = 0x46200007, @@ -363,15 +393,23 @@ typedef enum MIPSIns { MIPSI_DMTC1 = 0x44a00000, MIPSI_DMFC1 = 0x44200000, +#if !LJ_TARGET_MIPSR6 MIPSI_BC1F = 0x45000000, MIPSI_BC1T = 0x45010000, - MIPSI_C_EQ_D = 0x46200032, MIPSI_C_OLT_S = 0x46000034, MIPSI_C_OLT_D = 0x46200034, MIPSI_C_ULT_D = 0x46200035, MIPSI_C_OLE_D = 0x46200036, MIPSI_C_ULE_D = 0x46200037, +#else + MIPSI_BC1EQZ = 0x45200000, + MIPSI_BC1NEZ = 0x45a00000, + MIPSI_CMP_EQ_D = 0x46a00002, + MIPSI_CMP_LT_S = 0x46800004, + MIPSI_CMP_LT_D = 0x46a00004, +#endif + } MIPSIns; #endif diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc index 9839b5ac..44fba36c 100644 --- a/src/vm_mips64.dasc +++ b/src/vm_mips64.dasc @@ -83,6 +83,10 @@ | |.define FRET1, f0 |.define FRET2, f2 +| +|.define FTMP0, f20 +|.define FTMP1, f21 +|.define FTMP2, f22 |.endif | |// Stack layout while in interpreter. Must match with lj_frame.h. @@ -310,10 +314,10 @@ |.endmacro | |// Assumes DISPATCH is relative to GL. -#define DISPATCH_GL(field) (GG_DISP2G + (int)offsetof(global_State, field)) -#define DISPATCH_J(field) (GG_DISP2J + (int)offsetof(jit_State, field)) -#define GG_DISP2GOT (GG_OFS(got) - GG_OFS(dispatch)) -#define DISPATCH_GOT(name) (GG_DISP2GOT + sizeof(void*)*LJ_GOT_##name) +#define DISPATCH_GL(field) (GG_DISP2G + (int)offsetof(global_State, field)) +#define DISPATCH_J(field) (GG_DISP2J + (int)offsetof(jit_State, field)) +#define GG_DISP2GOT (GG_OFS(got) - GG_OFS(dispatch)) +#define DISPATCH_GOT(name) (GG_DISP2GOT + sizeof(void*)*LJ_GOT_##name) | #define PC2PROTO(field) ((int)offsetof(GCproto, field)-(int)sizeof(GCproto)) | @@ -492,8 +496,15 @@ static void build_subroutines(BuildCtx *ctx) |7: // Less results wanted. 
| subu TMP0, RD, TMP2 | dsubu TMP0, BASE, TMP0 // Either keep top or shrink it. + |.if MIPSR6 + | selnez TMP0, TMP0, TMP2 // LUA_MULTRET+1 case? + | seleqz BASE, BASE, TMP2 + | b <3 + |. or BASE, BASE, TMP0 + |.else | b <3 |. movn BASE, TMP0, TMP2 // LUA_MULTRET+1 case? + |.endif | |8: // Corner case: need to grow stack for filling up results. | // This can happen if: @@ -1125,11 +1136,16 @@ static void build_subroutines(BuildCtx *ctx) |.endmacro | |// Inlined GC threshold check. Caveat: uses TMP0 and TMP1 and has delay slot! + |// MIPSR6: no delay slot, but a forbidden slot. |.macro ffgccheck | ld TMP0, DISPATCH_GL(gc.total)(DISPATCH) | ld TMP1, DISPATCH_GL(gc.threshold)(DISPATCH) | dsubu AT, TMP0, TMP1 + |.if MIPSR6 + | bgezalc AT, ->fff_gcstep + |.else | bgezal AT, ->fff_gcstep + |.endif |.endmacro | |//-- Base library: checks ----------------------------------------------- @@ -1157,7 +1173,13 @@ static void build_subroutines(BuildCtx *ctx) | sltu TMP1, TISNUM, TMP0 | not TMP2, TMP0 | li TMP3, ~LJ_TISNUM + |.if MIPSR6 + | selnez TMP2, TMP2, TMP1 + | seleqz TMP3, TMP3, TMP1 + | or TMP2, TMP2, TMP3 + |.else | movz TMP2, TMP3, TMP1 + |.endif | dsll TMP2, TMP2, 3 | daddu TMP2, CFUNC:RB, TMP2 | b ->fff_restv @@ -1169,7 +1191,11 @@ static void build_subroutines(BuildCtx *ctx) | gettp TMP2, CARG1 | daddiu TMP0, TMP2, -LJ_TTAB | daddiu TMP1, TMP2, -LJ_TUDATA + |.if MIPSR6 + | selnez TMP0, TMP1, TMP0 + |.else | movn TMP0, TMP1, TMP0 + |.endif | bnez TMP0, >6 |. cleartp TAB:CARG1 |1: // Field metatable must be at same offset for GCtab and GCudata! @@ -1208,7 +1234,13 @@ static void build_subroutines(BuildCtx *ctx) | |6: | sltiu AT, TMP2, LJ_TISNUM + |.if MIPSR6 + | selnez TMP0, TISNUM, AT + | seleqz AT, TMP2, AT + | or TMP2, TMP0, AT + |.else | movn TMP2, TISNUM, AT + |.endif | dsll TMP2, TMP2, 3 | dsubu TMP0, DISPATCH, TMP2 | b <2 @@ -1270,8 +1302,13 @@ static void build_subroutines(BuildCtx *ctx) | or TMP0, TMP0, TMP1 | bnez TMP0, ->fff_fallback |. 
sd BASE, L->base // Add frame since C call can throw. + |.if MIPSR6 + | sd PC, SAVE_PC // Redundant (but a defined value). + | ffgccheck + |.else | ffgccheck |. sd PC, SAVE_PC // Redundant (but a defined value). + |.endif | load_got lj_strfmt_number | move CARG1, L | call_intern lj_strfmt_number // (lua_State *L, cTValue *o) @@ -1441,8 +1478,15 @@ static void build_subroutines(BuildCtx *ctx) | addiu AT, TMP0, -LUA_YIELD | daddu CARG3, CARG2, TMP0 | daddiu TMP3, CARG2, 8 + |.if MIPSR6 + | seleqz CARG2, CARG2, AT + | selnez TMP3, TMP3, AT + | bgtz AT, ->fff_fallback // st > LUA_YIELD? + |. or CARG2, TMP3, CARG2 + |.else | bgtz AT, ->fff_fallback // st > LUA_YIELD? |. movn CARG2, TMP3, AT + |.endif | xor TMP2, TMP2, CARG3 | bnez TMP1, ->fff_fallback // cframe != 0? |. or AT, TMP2, TMP0 @@ -1754,7 +1798,7 @@ static void build_subroutines(BuildCtx *ctx) | b ->fff_res |. li RD, (2+1)*8 | - |.macro math_minmax, name, intins, fpins + |.macro math_minmax, name, intins, intinsc, fpins | .ffunc_1 name | daddu TMP3, BASE, NARGS8:RC | checkint CARG1, >5 @@ -1766,7 +1810,13 @@ static void build_subroutines(BuildCtx *ctx) |. sextw CARG1, CARG1 | lw CARG2, LO(TMP2) |. slt AT, CARG1, CARG2 + |.if MIPSR6 + | intins TMP1, CARG2, AT + | intinsc CARG1, CARG1, AT + | or CARG1, CARG1, TMP1 + |.else | intins CARG1, CARG2, AT + |.endif | daddiu TMP2, TMP2, 8 | zextw CARG1, CARG1 | b <1 @@ -1802,13 +1852,23 @@ static void build_subroutines(BuildCtx *ctx) |. nop |7: |.if FPU + |.if MIPSR6 + | fpins FRET1, FRET1, FARG1 + |.else | c.olt.d FRET1, FARG1 | fpins FRET1, FARG1 + |.endif |.else | bal ->vm_sfcmpolt |. nop + |.if MIPSR6 + | intins AT, CARG2, CRET1 + | intinsc CARG1, CARG1, CRET1 + | or CARG1, CARG1, AT + |.else | intins CARG1, CARG2, CRET1 |.endif + |.endif | b <6 |. 
daddiu TMP2, TMP2, 8 | @@ -1828,8 +1888,13 @@ static void build_subroutines(BuildCtx *ctx) | |.endmacro | - | math_minmax math_min, movz, movf.d - | math_minmax math_max, movn, movt.d + |.if MIPSR6 + | math_minmax math_min, seleqz, selnez, min.d + | math_minmax math_max, selnez, seleqz, max.d + |.else + | math_minmax math_min, movz, _, movf.d + | math_minmax math_max, movn, _, movt.d + |.endif | |//-- String library ----------------------------------------------------- | @@ -1854,7 +1919,9 @@ static void build_subroutines(BuildCtx *ctx) | |.ffunc string_char // Only handle the 1-arg case here. | ffgccheck + |.if not MIPSR6 |. nop + |.endif | ld CARG1, 0(BASE) | gettp TMP0, CARG1 | xori AT, NARGS8:RC, 8 // Exactly 1 argument. @@ -1884,7 +1951,9 @@ static void build_subroutines(BuildCtx *ctx) | |.ffunc string_sub | ffgccheck + |.if not MIPSR6 |. nop + |.endif | addiu AT, NARGS8:RC, -16 | ld TMP0, 0(BASE) | bltz AT, ->fff_fallback @@ -1907,8 +1976,30 @@ static void build_subroutines(BuildCtx *ctx) | addiu TMP0, CARG2, 1 | addu TMP1, CARG4, TMP0 | slt TMP3, CARG3, r0 + |.if MIPSR6 + | seleqz CARG4, CARG4, AT + | selnez TMP1, TMP1, AT + | or CARG4, TMP1, CARG4 // if (end < 0) end += len+1 + |.else | movn CARG4, TMP1, AT // if (end < 0) end += len+1 + |.endif | addu TMP1, CARG3, TMP0 + |.if MIPSR6 + | selnez TMP1, TMP1, TMP3 + | seleqz CARG3, CARG3, TMP3 + | or CARG3, TMP1, CARG3 // if (start < 0) start += len+1 + | li TMP2, 1 + | slt AT, CARG4, r0 + | slt TMP3, r0, CARG3 + | seleqz CARG4, CARG4, AT // if (end < 0) end = 0 + | selnez CARG3, CARG3, TMP3 + | seleqz TMP2, TMP2, TMP3 + | or CARG3, TMP2, CARG3 // if (start < 1) start = 1 + | slt AT, CARG2, CARG4 + | seleqz CARG4, CARG4, AT + | selnez CARG2, CARG2, AT + | or CARG4, CARG2, CARG4 // if (end > len) end = len + |.else | movn CARG3, TMP1, TMP3 // if (start < 0) start += len+1 | li TMP2, 1 | slt AT, CARG4, r0 @@ -1917,6 +2008,7 @@ static void build_subroutines(BuildCtx *ctx) | movz CARG3, TMP2, TMP3 // if (start < 
1) start = 1 | slt AT, CARG2, CARG4 | movn CARG4, CARG2, AT // if (end > len) end = len + |.endif | daddu CARG2, STR:CARG1, CARG3 | subu CARG3, CARG4, CARG3 // len = end - start | daddiu CARG2, CARG2, sizeof(GCstr)-1 @@ -1978,7 +2070,13 @@ static void build_subroutines(BuildCtx *ctx) | slt AT, CARG1, r0 | dsrlv CRET1, TMP0, CARG3 | dsubu TMP0, r0, CRET1 + |.if MIPSR6 + | selnez TMP0, TMP0, AT + | seleqz CRET1, CRET1, AT + | or CRET1, CRET1, TMP0 + |.else | movn CRET1, TMP0, AT + |.endif | jr ra |. zextw CRET1, CRET1 |1: @@ -2001,14 +2099,28 @@ static void build_subroutines(BuildCtx *ctx) | slt AT, CARG1, r0 | dsrlv CRET1, CRET2, TMP0 | dsubu CARG1, r0, CRET1 + |.if MIPSR6 + | seleqz CRET1, CRET1, AT + | selnez CARG1, CARG1, AT + | or CRET1, CRET1, CARG1 + |.else | movn CRET1, CARG1, AT + |.endif | li CARG1, 64 | subu TMP0, CARG1, TMP0 | dsllv CRET2, CRET2, TMP0 // Integer check. | sextw AT, CRET1 | xor AT, CRET1, AT // Range check. | jr ra + |.if MIPSR6 + | seleqz AT, AT, CRET2 + | selnez CRET2, CRET2, CRET2 + | jr ra + |. or CRET2, AT, CRET2 + |.else + | jr ra |. movz CRET2, AT, CRET2 + |.endif |1: | jr ra |. li CRET2, 1 @@ -2518,15 +2630,22 @@ static void build_subroutines(BuildCtx *ctx) | |// Hard-float round to integer. |// Modifies AT, TMP0, FRET1, FRET2, f4. Keeps all others incl. FARG1. + |// MIPSR6: Modifies FTMP1, too. |.macro vm_round_hf, func | lui TMP0, 0x4330 // Hiword of 2^52 (double). | dsll TMP0, TMP0, 32 | dmtc1 TMP0, f4 | abs.d FRET2, FARG1 // |x| | dmfc1 AT, FARG1 + |.if MIPSR6 + | cmp.lt.d FTMP1, FRET2, f4 + | add.d FRET1, FRET2, f4 // (|x| + 2^52) - 2^52 + | bc1eqz FTMP1, >1 // Truncate only if |x| < 2^52. + |.else | c.olt.d 0, FRET2, f4 | add.d FRET1, FRET2, f4 // (|x| + 2^52) - 2^52 | bc1f 0, >1 // Truncate only if |x| < 2^52. + |.endif |. 
sub.d FRET1, FRET1, f4 | slt AT, AT, r0 |.if "func" == "ceil" @@ -2537,16 +2656,38 @@ static void build_subroutines(BuildCtx *ctx) |.if "func" == "trunc" | dsll TMP0, TMP0, 32 | dmtc1 TMP0, f4 + |.if MIPSR6 + | cmp.lt.d FTMP1, FRET2, FRET1 // |x| < result? + | sub.d FRET2, FRET1, f4 + | sel.d FTMP1, FRET1, FRET2 // If yes, subtract +1. + | dmtc1 AT, FRET1 + | neg.d FRET2, FTMP1 + | jr ra + |. sel.d FRET1, FTMP1, FRET2 // Merge sign bit back in. + |.else | c.olt.d 0, FRET2, FRET1 // |x| < result? | sub.d FRET2, FRET1, f4 | movt.d FRET1, FRET2, 0 // If yes, subtract +1. | neg.d FRET2, FRET1 | jr ra |. movn.d FRET1, FRET2, AT // Merge sign bit back in. + |.endif |.else | neg.d FRET2, FRET1 | dsll TMP0, TMP0, 32 | dmtc1 TMP0, f4 + |.if MIPSR6 + | dmtc1 AT, FTMP1 + | sel.d FTMP1, FRET1, FRET2 + |.if "func" == "ceil" + | cmp.lt.d FRET1, FTMP1, FARG1 // x > result? + |.else + | cmp.lt.d FRET1, FARG1, FTMP1 // x < result? + |.endif + | sub.d FRET2, FTMP1, f4 // If yes, subtract +-1. + | jr ra + |. sel.d FRET1, FTMP1, FRET2 + |.else | movn.d FRET1, FRET2, AT // Merge sign bit back in. |.if "func" == "ceil" | c.olt.d 0, FRET1, FARG1 // x > result? @@ -2557,6 +2698,7 @@ static void build_subroutines(BuildCtx *ctx) | jr ra |. movt.d FRET1, FRET2, 0 |.endif + |.endif |1: | jr ra |. mov.d FRET1, FARG1 @@ -2701,7 +2843,7 @@ static void build_subroutines(BuildCtx *ctx) |. li CRET1, 0 |.endif | - |.macro sfmin_max, name, intins + |.macro sfmin_max, name, intins, intinsc |->vm_sf .. name: |.if JIT and not FPU | move TMP2, ra @@ -2710,13 +2852,25 @@ static void build_subroutines(BuildCtx *ctx) | move ra, TMP2 | move TMP0, CRET1 | move CRET1, CARG1 + |.if MIPSR6 + | intins CRET1, CRET1, TMP0 + | intinsc TMP0, CARG2, TMP0 + | jr ra + |. or CRET1, CRET1, TMP0 + |.else | jr ra |. 
intins CRET1, CARG2, TMP0 |.endif + |.endif |.endmacro | - | sfmin_max min, movz - | sfmin_max max, movn + |.if MIPSR6 + | sfmin_max min, selnez, seleqz + | sfmin_max max, seleqz, selnez + |.else + | sfmin_max min, movz, _ + | sfmin_max max, movn, _ + |.endif | |//----------------------------------------------------------------------- |//-- Miscellaneous functions -------------------------------------------- @@ -2885,7 +3039,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535) | slt AT, CARG1, CARG2 | addu TMP2, TMP2, TMP3 + |.if MIPSR6 + | movop TMP2, TMP2, AT + |.else | movop TMP2, r0, AT + |.endif |1: | daddu PC, PC, TMP2 | ins_next @@ -2903,16 +3061,28 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) |.endif |3: // RA and RD are both numbers. |.if FPU - | fcomp f20, f22 + |.if MIPSR6 + | fcomp FTMP0, FTMP0, FTMP2 + | addu TMP2, TMP2, TMP3 + | mfc1 TMP3, FTMP0 + | b <1 + |. fmovop TMP2, TMP2, TMP3 + |.else + | fcomp FTMP0, FTMP2 | addu TMP2, TMP2, TMP3 | b <1 |. fmovop TMP2, r0 + |.endif |.else | bal sfcomp |. addu TMP2, TMP2, TMP3 | b <1 + |.if MIPSR6 + |. movop TMP2, TMP2, CRET1 + |.else |. movop TMP2, r0, CRET1 |.endif + |.endif | |4: // RA is a number, RD is not a number. 
| bne CARG4, TISNUM, ->vmeta_comp @@ -2959,15 +3129,27 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) |.endif |.endmacro | + |.if MIPSR6 + if (op == BC_ISLT) { + | bc_comp FTMP0, FTMP2, CARG1, CARG2, selnez, selnez, cmp.lt.d, ->vm_sfcmpolt + } else if (op == BC_ISGE) { + | bc_comp FTMP0, FTMP2, CARG1, CARG2, seleqz, seleqz, cmp.lt.d, ->vm_sfcmpolt + } else if (op == BC_ISLE) { + | bc_comp FTMP2, FTMP0, CARG2, CARG1, seleqz, seleqz, cmp.ult.d, ->vm_sfcmpult + } else { + | bc_comp FTMP2, FTMP0, CARG2, CARG1, selnez, selnez, cmp.ult.d, ->vm_sfcmpult + } + |.else if (op == BC_ISLT) { - | bc_comp f20, f22, CARG1, CARG2, movz, movf, c.olt.d, ->vm_sfcmpolt + | bc_comp FTMP0, FTMP2, CARG1, CARG2, movz, movf, c.olt.d, ->vm_sfcmpolt } else if (op == BC_ISGE) { - | bc_comp f20, f22, CARG1, CARG2, movn, movt, c.olt.d, ->vm_sfcmpolt + | bc_comp FTMP0, FTMP2, CARG1, CARG2, movn, movt, c.olt.d, ->vm_sfcmpolt } else if (op == BC_ISLE) { - | bc_comp f22, f20, CARG2, CARG1, movn, movt, c.ult.d, ->vm_sfcmpult + | bc_comp FTMP2, FTMP0, CARG2, CARG1, movn, movt, c.ult.d, ->vm_sfcmpult } else { - | bc_comp f22, f20, CARG2, CARG1, movz, movf, c.ult.d, ->vm_sfcmpult + | bc_comp FTMP2, FTMP0, CARG2, CARG1, movz, movf, c.ult.d, ->vm_sfcmpult } + |.endif break; case BC_ISEQV: case BC_ISNEV: @@ -3013,7 +3195,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) |2: // Check if the tags are the same and it's a table or userdata. | xor AT, CARG3, CARG4 // Same type? | sltiu TMP0, CARG3, LJ_TISTABUD+1 // Table or userdata? 
+ |.if MIPSR6 + | seleqz TMP0, TMP0, AT + |.else | movn TMP0, r0, AT + |.endif if (vk) { | beqz TMP0, <1 } else { @@ -3063,11 +3249,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535) | xor TMP1, CARG1, CARG2 | addu TMP2, TMP2, TMP3 + |.if MIPSR6 + if (vk) { + | seleqz TMP2, TMP2, TMP1 + } else { + | selnez TMP2, TMP2, TMP1 + } + |.else if (vk) { | movn TMP2, r0, TMP1 } else { | movz TMP2, r0, TMP1 } + |.endif | daddu PC, PC, TMP2 | ins_next break; @@ -3094,6 +3288,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | bne CARG4, TISNUM, >6 |. addu TMP2, TMP2, TMP3 | xor AT, CARG1, CARG2 + |.if MIPSR6 + if (vk) { + | seleqz TMP2, TMP2, AT + |1: + | daddu PC, PC, TMP2 + |2: + } else { + | selnez TMP2, TMP2, AT + |1: + |2: + | daddu PC, PC, TMP2 + } + |.else if (vk) { | movn TMP2, r0, AT |1: @@ -3105,6 +3312,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) |2: | daddu PC, PC, TMP2 } + |.endif | ins_next | |3: // RA is not an integer. @@ -3117,30 +3325,49 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) |. addu TMP2, TMP2, TMP3 | sltu AT, CARG4, TISNUM |.if FPU - | ldc1 f20, 0(RA) - | ldc1 f22, 0(RD) + | ldc1 FTMP0, 0(RA) + | ldc1 FTMP2, 0(RD) |.endif | beqz AT, >5 |. nop |4: // RA and RD are both numbers. |.if FPU - | c.eq.d f20, f22 + |.if MIPSR6 + | cmp.eq.d FTMP0, FTMP0, FTMP2 + | dmfc1 TMP1, FTMP0 + | b <1 + if (vk) { + |. selnez TMP2, TMP2, TMP1 + } else { + |. seleqz TMP2, TMP2, TMP1 + } + |.else + | c.eq.d FTMP0, FTMP2 | b <1 if (vk) { |. movf TMP2, r0 } else { |. movt TMP2, r0 } + |.endif |.else | bal ->vm_sfcmpeq |. nop | b <1 + |.if MIPSR6 + if (vk) { + |. selnez TMP2, TMP2, CRET1 + } else { + |. seleqz TMP2, TMP2, CRET1 + } + |.else if (vk) { |. movz TMP2, r0, CRET1 } else { |. movn TMP2, r0, CRET1 } |.endif + |.endif | |5: // RA is a number, RD is not a number. 
|.if FFI @@ -3150,9 +3377,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) |.endif | // RA is a number, RD is an integer. Convert RD to a number. |.if FPU - |. lwc1 f22, LO(RD) + |. lwc1 FTMP2, LO(RD) | b <4 - |. cvt.d.w f22, f22 + |. cvt.d.w FTMP2, FTMP2 |.else |. sextw CARG2, CARG2 | bal ->vm_sfi2d_2 @@ -3170,10 +3397,10 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) |.endif | // RA is an integer, RD is a number. Convert RA to a number. |.if FPU - |. lwc1 f20, LO(RA) - | ldc1 f22, 0(RD) + |. lwc1 FTMP0, LO(RA) + | ldc1 FTMP2, 0(RD) | b <4 - | cvt.d.w f20, f20 + | cvt.d.w FTMP0, FTMP0 |.else |. sextw CARG1, CARG1 | bal ->vm_sfi2d_1 @@ -3216,11 +3443,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | decode_RD4b TMP2 | lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535) | addu TMP2, TMP2, TMP3 + |.if MIPSR6 + if (vk) { + | seleqz TMP2, TMP2, TMP0 + } else { + | selnez TMP2, TMP2, TMP0 + } + |.else if (vk) { | movn TMP2, r0, TMP0 } else { | movz TMP2, r0, TMP0 } + |.endif | daddu PC, PC, TMP2 | ins_next break; @@ -3239,11 +3474,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | decode_RD4b TMP2 | lui TMP3, (-(BCBIAS_J*4 >> 16) & 65535) | addu TMP2, TMP2, TMP3 + |.if MIPSR6 + if (op == BC_IST) { + | selnez TMP2, TMP2, TMP0; + } else { + | seleqz TMP2, TMP2, TMP0; + } + |.else if (op == BC_IST) { | movz TMP2, r0, TMP0 } else { | movn TMP2, r0, TMP0 } + |.endif | daddu PC, PC, TMP2 } else { | ld CRET1, 0(RD) @@ -3486,9 +3729,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | bltz TMP1, ->vmeta_arith |. daddu RA, BASE, RA |.elif "intins" == "mult" + |.if MIPSR6 + |. nop + | mul CRET1, CARG3, CARG4 + | muh TMP2, CARG3, CARG4 + |.else |. intins CARG3, CARG4 | mflo CRET1 | mfhi TMP2 + |.endif | sra TMP1, CRET1, 31 | bne TMP1, TMP2, ->vmeta_arith |. daddu RA, BASE, RA @@ -3511,16 +3760,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) |.endif | |5: // Check for two numbers. 
- | .FPU ldc1 f20, 0(RB) + | .FPU ldc1 FTMP0, 0(RB) | sltu AT, TMP0, TISNUM | sltu TMP0, TMP1, TISNUM - | .FPU ldc1 f22, 0(RC) + | .FPU ldc1 FTMP2, 0(RC) | and AT, AT, TMP0 | beqz AT, ->vmeta_arith |. daddu RA, BASE, RA | |.if FPU - | fpins FRET1, f20, f22 + | fpins FRET1, FTMP0, FTMP2 |.elif "fpcall" == "sfpmod" | sfpmod |.else @@ -3850,7 +4099,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | li TMP0, 0x801 | addiu AT, CARG2, -0x7ff | srl CARG3, RD, 14 + |.if MIPSR6 + | seleqz TMP0, TMP0, AT + | selnez CARG2, CARG2, AT + | or CARG2, CARG2, TMP0 + |.else | movz CARG2, TMP0, AT + |.endif | // (lua_State *L, int32_t asize, uint32_t hbits) | call_intern lj_tab_new |. move CARG1, L @@ -4131,7 +4386,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | daddu NODE:TMP2, NODE:TMP2, TMP1 // node = tab->node + (idx*32-idx*8) | settp STR:RC, TMP3 // Tagged key to look for. |.if FPU - | ldc1 f20, 0(RA) + | ldc1 FTMP0, 0(RA) |.else | ld CRET1, 0(RA) |.endif @@ -4147,7 +4402,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | andi AT, TMP3, LJ_GC_BLACK // isblack(table) | bnez AT, >7 |.if FPU - |. sdc1 f20, NODE:TMP2->val + |. sdc1 FTMP0, NODE:TMP2->val |.else |. sd CRET1, NODE:TMP2->val |.endif @@ -4188,7 +4443,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | ld BASE, L->base |.if FPU | b <3 // No 2nd write barrier needed. - |. sdc1 f20, 0(CRET1) + |. sdc1 FTMP0, 0(CRET1) |.else | ld CARG1, 0(RA) | b <3 // No 2nd write barrier needed. 
@@ -4531,7 +4786,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | ld CARG1, 0(RC) | sltu AT, RC, TMP3 | daddiu RC, RC, 8 + |.if MIPSR6 + | selnez CARG1, CARG1, AT + | seleqz AT, TISNIL, AT + | or CARG1, CARG1, AT + |.else | movz CARG1, TISNIL, AT + |.endif | sd CARG1, 0(RA) | sltu AT, RA, TMP2 | bnez AT, <1 @@ -4720,7 +4981,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | dext AT, CRET1, 31, 0 | slt CRET1, CARG2, CARG3 | slt TMP1, CARG3, CARG2 + |.if MIPSR6 + | selnez TMP1, TMP1, AT + | seleqz CRET1, CRET1, AT + | or CRET1, CRET1, TMP1 + |.else | movn CRET1, TMP1, AT + |.endif } else { | bne CARG3, TISNUM, >5 |. ld CARG2, FORL_STEP*8(RA) // STEP CARG2 - CARG4 type @@ -4736,20 +5003,34 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | slt CRET1, CRET1, CARG1 | slt AT, CARG2, r0 | slt TMP0, TMP0, r0 // ((y^a) & (y^b)) < 0: overflow. + |.if MIPSR6 + | selnez TMP1, TMP1, AT + | seleqz CRET1, CRET1, AT + | or CRET1, CRET1, TMP1 + |.else | movn CRET1, TMP1, AT + |.endif | or CRET1, CRET1, TMP0 | zextw CARG1, CARG1 | settp CARG1, TISNUM } |1: if (op == BC_FORI) { + |.if MIPSR6 + | selnez TMP2, TMP2, CRET1 + |.else | movz TMP2, r0, CRET1 + |.endif | daddu PC, PC, TMP2 } else if (op == BC_JFORI) { | daddu PC, PC, TMP2 | lhu RD, -4+OFS_RD(PC) } else if (op == BC_IFORL) { + |.if MIPSR6 + | seleqz TMP2, TMP2, CRET1 + |.else | movn TMP2, r0, CRET1 + |.endif | daddu PC, PC, TMP2 } if (vk) { @@ -4779,6 +5060,14 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | and AT, AT, TMP0 | beqz AT, ->vmeta_for |. slt TMP3, TMP3, r0 + |.if MIPSR6 + | dmtc1 TMP3, FTMP2 + | cmp.lt.d FTMP0, f0, f2 + | cmp.lt.d FTMP1, f2, f0 + | sel.d FTMP2, FTMP1, FTMP0 + | b <1 + |. dmfc1 CRET1, FTMP2 + |.else | c.ole.d 0, f0, f2 | c.ole.d 1, f2, f0 | li CRET1, 1 @@ -4786,12 +5075,25 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | movt AT, r0, 1 | b <1 |. 
movn CRET1, AT, TMP3 + |.endif } else { | ldc1 f0, FORL_IDX*8(RA) | ldc1 f4, FORL_STEP*8(RA) | ldc1 f2, FORL_STOP*8(RA) | ld TMP3, FORL_STEP*8(RA) | add.d f0, f0, f4 + |.if MIPSR6 + | slt TMP3, TMP3, r0 + | dmtc1 TMP3, FTMP2 + | cmp.lt.d FTMP0, f0, f2 + | cmp.lt.d FTMP1, f2, f0 + | sel.d FTMP2, FTMP1, FTMP0 + | dmfc1 CRET1, FTMP2 + if (op == BC_IFORL) { + | seleqz TMP2, TMP2, CRET1 + | daddu PC, PC, TMP2 + } + |.else | c.ole.d 0, f0, f2 | c.ole.d 1, f2, f0 | slt TMP3, TMP3, r0 @@ -4804,6 +5106,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | movn TMP2, r0, CRET1 | daddu PC, PC, TMP2 } + |.endif | sdc1 f0, FORL_IDX*8(RA) | ins_next1 | b <2 @@ -4979,8 +5282,17 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) | ld TMP0, 0(RA) | sltu AT, RA, RC // Less args than parameters? | move CARG1, TMP0 + |.if MIPSR6 + | selnez TMP0, TMP0, AT + | seleqz TMP3, TISNIL, AT + | or TMP0, TMP0, TMP3 + | seleqz TMP3, CARG1, AT + | selnez CARG1, TISNIL, AT + | or CARG1, CARG1, TMP3 + |.else | movz TMP0, TISNIL, AT // Clear missing parameters. | movn CARG1, TISNIL, AT // Clear old fixarg slot (help the GC). + |.endif | addiu TMP2, TMP2, -1 | sd TMP0, 16(TMP1) | daddiu TMP1, TMP1, 8 -- 2.41.0
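Note on the recurring pattern in the vm_mips64.dasc hunks above: MIPS R6 removes the conditional moves movn/movz, so each one is rewritten with selnez/seleqz, followed by an `or` when the destination has to keep its old value. The C model below is only a sketch of that equivalence, not code from the patch. The helper names (movn_r6, movz_r6, check_select_idiom) are hypothetical, and registers are modelled as plain 64-bit integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pre-R6 conditional moves: rd is overwritten only if the condition holds. */
static uint64_t movn(uint64_t rd, uint64_t rs, uint64_t rt) { return rt != 0 ? rs : rd; }
static uint64_t movz(uint64_t rd, uint64_t rs, uint64_t rt) { return rt == 0 ? rs : rd; }

/* R6 selects: the result is either rs or zero, the old rd is never kept. */
static uint64_t selnez(uint64_t rs, uint64_t rt) { return rt != 0 ? rs : 0; }
static uint64_t seleqz(uint64_t rs, uint64_t rt) { return rt == 0 ? rs : 0; }

/* Three-instruction replacement: selnez + seleqz + or (hypothetical names). */
static uint64_t movn_r6(uint64_t rd, uint64_t rs, uint64_t rt)
{
  return selnez(rs, rt) | seleqz(rd, rt);
}

static uint64_t movz_r6(uint64_t rd, uint64_t rs, uint64_t rt)
{
  return seleqz(rs, rt) | selnez(rd, rt);
}

/* Compare both forms over a few edge values; returns the number of cases. */
static int check_select_idiom(void)
{
  static const uint64_t vals[] = { 0, 1, 0x8000000000000000ull, UINT64_MAX };
  static const uint64_t conds[] = { 0, 1, UINT64_MAX };
  int n = 0;
  for (size_t d = 0; d < sizeof(vals)/sizeof(vals[0]); d++)
    for (size_t s = 0; s < sizeof(vals)/sizeof(vals[0]); s++)
      for (size_t t = 0; t < sizeof(conds)/sizeof(conds[0]); t++) {
        uint64_t rd = vals[d], rs = vals[s], rt = conds[t];
        assert(movn_r6(rd, rs, rt) == movn(rd, rs, rt));
        assert(movz_r6(rd, rs, rt) == movz(rd, rs, rt));
        /* With r0 as the source, a single select is enough:
        ** movn rd, r0, rt == seleqz rd, rd, rt
        ** movz rd, r0, rt == selnez rd, rd, rt
        */
        assert(movn(rd, 0, rt) == seleqz(rd, rt));
        assert(movz(rd, 0, rt) == selnez(rd, rt));
        n++;
      }
  return n;
}
```

Calling check_select_idiom() from a small test driver exercises every combination. The single-select special case corresponds to the branch-offset updates in the comparison bytecodes, where `movn TMP2, r0, AT` becomes `seleqz TMP2, TMP2, AT`.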