From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <6eb33fda-1c3b-7ce2-f7f5-955229429873@tarantool.org>
Date: Thu, 17 Aug 2023 17:53:23 +0300
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0
Content-Language: en-US
To: Sergey Kaplun, Igor Munkin
Cc: tarantool-patches@dev.tarantool.org
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Tarantool-patches] [PATCH luajit 05/19] PPC: Add soft-float support to interpreter.
List-Id: Tarantool development patches
From: Sergey Bronnikov via Tarantool-patches
Reply-To: Sergey Bronnikov
Errors-To: tarantool-patches-bounces@dev.tarantool.org
Sender: "Tarantool-patches"

Hi, Sergey

thanks for the patch! LGTM

On 8/9/23 18:35, Sergey Kaplun wrote:
> From: Mike Pall
>
> Contributed by Djordje Kovacevic and Stefan Pejic from RT-RK.com.
> Sponsored by Cisco Systems, Inc.
>
> (cherry-picked from commit 71b7bc88341945f13f3951e2bb5fd247b639ff7a)
>
> The software floating point library is used on machines which do not
> have hardware support for floating point [1]. This patch enables
> support for such machines in the VM for powerpc.
> This includes:
> * Any loads/storages of double values use load/storage through 32-bit
>   registers of `lo` and `hi` part of the TValue union.
> * Macro .FPU is added to skip instructions necessary only for
>   hard-float operations (load/store floating point registers from/on the
>   stack, when leave/enter VM, for example).
> * Now r25 named as `SAVE1` is used as saved temporary register (used in
>   different fast functions)
> * `sfi2d` macro is introduced to convert integer, that represents a
>   soft-float, to double. Receives destination and source registers, uses
>   `TMP0` and `TMP1`.
> * `sfpmod` macro is introduced for soft-float point `fmod` built-in.
> * `ins_arith` now receives the third parameter -- operation to use for
>   soft-float point.
> * `LJ_ARCH_HASFPU`, `LJ_ABI_SOFTFP` macros are introduced to mark that
>   there is defined `_SOFT_FLOAT` or `_SOFT_DOUBLE`.
`LJ_ARCH_NUMMODE` is > set to the `LJ_NUMMODE_DUAL`, when `LJ_ABI_SOFTFP` is true. > > Support of soft-float point for the JIT compiler will be added in the > next patch. > > [1]: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html > > Sergey Kaplun: > * added the description for the feature > > Part of tarantool/tarantool#8825 > --- > src/host/buildvm_asm.c | 2 +- > src/lj_arch.h | 29 +- > src/lj_ccall.c | 38 +- > src/lj_ccall.h | 4 +- > src/lj_ccallback.c | 30 +- > src/lj_frame.h | 2 +- > src/lj_ircall.h | 2 +- > src/vm_ppc.dasc | 1249 +++++++++++++++++++++++++++++++++------- > 8 files changed, 1101 insertions(+), 255 deletions(-) > > diff --git a/src/host/buildvm_asm.c b/src/host/buildvm_asm.c > index ffd14903..43595b31 100644 > --- a/src/host/buildvm_asm.c > +++ b/src/host/buildvm_asm.c > @@ -338,7 +338,7 @@ void emit_asm(BuildCtx *ctx) > #if !(LJ_TARGET_PS3 || LJ_TARGET_PSVITA) > fprintf(ctx->fp, "\t.section .note.GNU-stack,\"\"," ELFASM_PX "progbits\n"); > #endif > -#if LJ_TARGET_PPC && !LJ_TARGET_PS3 > +#if LJ_TARGET_PPC && !LJ_TARGET_PS3 && !LJ_ABI_SOFTFP > /* Hard-float ABI. 
*/ > fprintf(ctx->fp, "\t.gnu_attribute 4, 1\n"); > #endif > diff --git a/src/lj_arch.h b/src/lj_arch.h > index c39526ea..8bb8757d 100644 > --- a/src/lj_arch.h > +++ b/src/lj_arch.h > @@ -262,6 +262,29 @@ > #else > #define LJ_ARCH_BITS 32 > #define LJ_ARCH_NAME "ppc" > + > +#if !defined(LJ_ARCH_HASFPU) > +#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE) > +#define LJ_ARCH_HASFPU 0 > +#else > +#define LJ_ARCH_HASFPU 1 > +#endif > +#endif > + > +#if !defined(LJ_ABI_SOFTFP) > +#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE) > +#define LJ_ABI_SOFTFP 1 > +#else > +#define LJ_ABI_SOFTFP 0 > +#endif > +#endif > +#endif > + > +#if LJ_ABI_SOFTFP > +#define LJ_ARCH_NOJIT 1 /* NYI */ > +#define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL > +#else > +#define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL_SINGLE > #endif > > #define LJ_TARGET_PPC 1 > @@ -271,7 +294,6 @@ > #define LJ_TARGET_MASKSHIFT 0 > #define LJ_TARGET_MASKROT 1 > #define LJ_TARGET_UNIFYROT 1 /* Want only IR_BROL. */ > -#define LJ_ARCH_NUMMODE LJ_NUMMODE_DUAL_SINGLE > > #if LJ_TARGET_CONSOLE > #define LJ_ARCH_PPC32ON64 1 > @@ -431,16 +453,13 @@ > #error "No support for ILP32 model on ARM64" > #endif > #elif LJ_TARGET_PPC > -#if defined(_SOFT_FLOAT) || defined(_SOFT_DOUBLE) > -#error "No support for PowerPC CPUs without double-precision FPU" > -#endif > #if !LJ_ARCH_PPC64 && LJ_ARCH_ENDIAN == LUAJIT_LE > #error "No support for little-endian PPC32" > #endif > #if LJ_ARCH_PPC64 > #error "No support for PowerPC 64 bit mode (yet)" > #endif > -#ifdef __NO_FPRS__ > +#if defined(__NO_FPRS__) && !defined(_SOFT_FLOAT) > #error "No support for PPC/e500 anymore (use LuaJIT 2.0)" > #endif > #elif LJ_TARGET_MIPS32 > diff --git a/src/lj_ccall.c b/src/lj_ccall.c > index d39ff861..c1e12f56 100644 > --- a/src/lj_ccall.c > +++ b/src/lj_ccall.c > @@ -388,6 +388,24 @@ > #define CCALL_HANDLE_COMPLEXARG \ > /* Pass complex by value in 2 or 4 GPRs. */ > > +#define CCALL_HANDLE_GPR \ > + /* Try to pass argument in GPRs. 
*/ \ > + if (n > 1) { \ > + lua_assert(n == 2 || n == 4); /* int64_t or complex (float). */ \ > + if (ctype_isinteger(d->info) || ctype_isfp(d->info)) \ > + ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \ > + else if (ngpr + n > maxgpr) \ > + ngpr = maxgpr; /* Prevent reordering. */ \ > + } \ > + if (ngpr + n <= maxgpr) { \ > + dp = &cc->gpr[ngpr]; \ > + ngpr += n; \ > + goto done; \ > + } \ > + > +#if LJ_ABI_SOFTFP > +#define CCALL_HANDLE_REGARG CCALL_HANDLE_GPR > +#else > #define CCALL_HANDLE_REGARG \ > if (isfp) { /* Try to pass argument in FPRs. */ \ > if (nfpr + 1 <= CCALL_NARG_FPR) { \ > @@ -396,24 +414,16 @@ > d = ctype_get(cts, CTID_DOUBLE); /* FPRs always hold doubles. */ \ > goto done; \ > } \ > - } else { /* Try to pass argument in GPRs. */ \ > - if (n > 1) { \ > - lua_assert(n == 2 || n == 4); /* int64_t or complex (float). */ \ > - if (ctype_isinteger(d->info)) \ > - ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \ > - else if (ngpr + n > maxgpr) \ > - ngpr = maxgpr; /* Prevent reordering. */ \ > - } \ > - if (ngpr + n <= maxgpr) { \ > - dp = &cc->gpr[ngpr]; \ > - ngpr += n; \ > - goto done; \ > - } \ > + } else { \ > + CCALL_HANDLE_GPR \ > } > +#endif > > +#if !LJ_ABI_SOFTFP > #define CCALL_HANDLE_RET \ > if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \ > ctr = ctype_get(cts, CTID_DOUBLE); /* FPRs always hold doubles. */ > +#endif > > #elif LJ_TARGET_MIPS32 > /* -- MIPS o32 calling conventions ---------------------------------------- */ > @@ -1081,7 +1091,7 @@ static int ccall_set_args(lua_State *L, CTState *cts, CType *ct, > } > if (fid) lj_err_caller(L, LJ_ERR_FFI_NUMARG); /* Too few arguments. */ > > -#if LJ_TARGET_X64 || LJ_TARGET_PPC > +#if LJ_TARGET_X64 || (LJ_TARGET_PPC && !LJ_ABI_SOFTFP) > cc->nfpr = nfpr; /* Required for vararg functions. 
*/ > #endif > cc->nsp = nsp; > diff --git a/src/lj_ccall.h b/src/lj_ccall.h > index 59f66481..6efa48c7 100644 > --- a/src/lj_ccall.h > +++ b/src/lj_ccall.h > @@ -86,9 +86,9 @@ typedef union FPRArg { > #elif LJ_TARGET_PPC > > #define CCALL_NARG_GPR 8 > -#define CCALL_NARG_FPR 8 > +#define CCALL_NARG_FPR (LJ_ABI_SOFTFP ? 0 : 8) > #define CCALL_NRET_GPR 4 /* For complex double. */ > -#define CCALL_NRET_FPR 1 > +#define CCALL_NRET_FPR (LJ_ABI_SOFTFP ? 0 : 1) > #define CCALL_SPS_EXTRA 4 > #define CCALL_SPS_FREE 0 > > diff --git a/src/lj_ccallback.c b/src/lj_ccallback.c > index 224b6b94..c33190d7 100644 > --- a/src/lj_ccallback.c > +++ b/src/lj_ccallback.c > @@ -419,6 +419,23 @@ void lj_ccallback_mcode_free(CTState *cts) > > #elif LJ_TARGET_PPC > > +#define CALLBACK_HANDLE_GPR \ > + if (n > 1) { \ > + lua_assert(((LJ_ABI_SOFTFP && ctype_isnum(cta->info)) || /* double. */ \ > + ctype_isinteger(cta->info)) && n == 2); /* int64_t. */ \ > + ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \ > + } \ > + if (ngpr + n <= maxgpr) { \ > + sp = &cts->cb.gpr[ngpr]; \ > + ngpr += n; \ > + goto done; \ > + } > + > +#if LJ_ABI_SOFTFP > +#define CALLBACK_HANDLE_REGARG \ > + CALLBACK_HANDLE_GPR \ > + UNUSED(isfp); > +#else > #define CALLBACK_HANDLE_REGARG \ > if (isfp) { \ > if (nfpr + 1 <= CCALL_NARG_FPR) { \ > @@ -427,20 +444,15 @@ void lj_ccallback_mcode_free(CTState *cts) > goto done; \ > } \ > } else { /* Try to pass argument in GPRs. */ \ > - if (n > 1) { \ > - lua_assert(ctype_isinteger(cta->info) && n == 2); /* int64_t. */ \ > - ngpr = (ngpr + 1u) & ~1u; /* Align int64_t to regpair. */ \ > - } \ > - if (ngpr + n <= maxgpr) { \ > - sp = &cts->cb.gpr[ngpr]; \ > - ngpr += n; \ > - goto done; \ > - } \ > + CALLBACK_HANDLE_GPR \ > } > +#endif > > +#if !LJ_ABI_SOFTFP > #define CALLBACK_HANDLE_RET \ > if (ctype_isfp(ctr->info) && ctr->size == sizeof(float)) \ > *(double *)dp = *(float *)dp; /* FPRs always hold doubles. 
*/ > +#endif > > #elif LJ_TARGET_MIPS32 > > diff --git a/src/lj_frame.h b/src/lj_frame.h > index 2bdf3c48..5cb3d639 100644 > --- a/src/lj_frame.h > +++ b/src/lj_frame.h > @@ -226,7 +226,7 @@ enum { LJ_CONT_TAILCALL, LJ_CONT_FFI_CALLBACK }; /* Special continuations. */ > #define CFRAME_OFS_L 36 > #define CFRAME_OFS_PC 32 > #define CFRAME_OFS_MULTRES 28 > -#define CFRAME_SIZE 272 > +#define CFRAME_SIZE (LJ_ARCH_HASFPU ? 272 : 128) > #define CFRAME_SHIFT_MULTRES 3 > #endif > #elif LJ_TARGET_MIPS32 > diff --git a/src/lj_ircall.h b/src/lj_ircall.h > index c1ac29d1..bbad35b1 100644 > --- a/src/lj_ircall.h > +++ b/src/lj_ircall.h > @@ -291,7 +291,7 @@ LJ_DATA const CCallInfo lj_ir_callinfo[IRCALL__MAX+1]; > #define fp64_f2l __aeabi_f2lz > #define fp64_f2ul __aeabi_f2ulz > #endif > -#elif LJ_TARGET_MIPS > +#elif LJ_TARGET_MIPS || LJ_TARGET_PPC > #define softfp_add __adddf3 > #define softfp_sub __subdf3 > #define softfp_mul __muldf3 > diff --git a/src/vm_ppc.dasc b/src/vm_ppc.dasc > index 7ad8df37..980ad897 100644 > --- a/src/vm_ppc.dasc > +++ b/src/vm_ppc.dasc > @@ -103,6 +103,18 @@ > |// Fixed register assignments for the interpreter. > |// Don't use: r1 = sp, r2 and r13 = reserved (TOC, TLS or SDATA) > | > +|.macro .FPU, a, b > +|.if FPU > +| a, b > +|.endif > +|.endmacro > +| > +|.macro .FPU, a, b, c > +|.if FPU > +| a, b, c > +|.endif > +|.endmacro > +| > |// The following must be C callee-save (but BASE is often refetched). > |.define BASE, r14 // Base of current Lua stack frame. > |.define KBASE, r15 // Constants of current Lua function. > @@ -116,8 +128,10 @@ > |.define TISNUM, r22 > |.define TISNIL, r23 > |.define ZERO, r24 > +|.if FPU > |.define TOBIT, f30 // 2^52 + 2^51. > |.define TONUM, f31 // 2^52 + 2^51 + 2^31. > +|.endif > | > |// The following temporaries are not saved across C calls, except for RA. > |.define RA, r20 // Callee-save. > @@ -133,6 +147,7 @@ > | > |// Saved temporaries. 
> |.define SAVE0, r21 > +|.define SAVE1, r25 > | > |// Calling conventions. > |.define CARG1, r3 > @@ -141,8 +156,10 @@ > |.define CARG4, r6 // Overlaps TMP3. > |.define CARG5, r7 // Overlaps INS. > | > +|.if FPU > |.define FARG1, f1 > |.define FARG2, f2 > +|.endif > | > |.define CRET1, r3 > |.define CRET2, r4 > @@ -213,10 +230,16 @@ > |.endif > |.else > | > +|.if FPU > |.define SAVE_LR, 276(sp) > |.define CFRAME_SPACE, 272 // Delta for sp. > |// Back chain for sp: 272(sp) <-- sp entering interpreter > |.define SAVE_FPR_, 128 // .. 128+18*8: 64 bit FPR saves. > +|.else > +|.define SAVE_LR, 132(sp) > +|.define CFRAME_SPACE, 128 // Delta for sp. > +|// Back chain for sp: 128(sp) <-- sp entering interpreter > +|.endif > |.define SAVE_GPR_, 56 // .. 56+18*4: 32 bit GPR saves. > |.define SAVE_CR, 52(sp) // 32 bit CR save. > |.define SAVE_ERRF, 48(sp) // 32 bit C frame info. > @@ -226,16 +249,25 @@ > |.define SAVE_PC, 32(sp) > |.define SAVE_MULTRES, 28(sp) > |.define UNUSED1, 24(sp) > +|.if FPU > |.define TMPD_LO, 20(sp) > |.define TMPD_HI, 16(sp) > |.define TONUM_LO, 12(sp) > |.define TONUM_HI, 8(sp) > +|.else > +|.define SFSAVE_4, 20(sp) > +|.define SFSAVE_3, 16(sp) > +|.define SFSAVE_2, 12(sp) > +|.define SFSAVE_1, 8(sp) > +|.endif > |// Next frame lr: 4(sp) > |// Back chain for sp: 0(sp) <-- sp while in interpreter > | > +|.if FPU > |.define TMPD_BLO, 23(sp) > |.define TMPD, TMPD_HI > |.define TONUM_D, TONUM_HI > +|.endif > | > |.endif > | > @@ -245,7 +277,7 @@ > |.else > | stw r..reg, SAVE_GPR_+(reg-14)*4(sp) > |.endif > -| stfd f..reg, SAVE_FPR_+(reg-14)*8(sp) > +| .FPU stfd f..reg, SAVE_FPR_+(reg-14)*8(sp) > |.endmacro > |.macro rest_, reg > |.if GPR64 > @@ -253,7 +285,7 @@ > |.else > | lwz r..reg, SAVE_GPR_+(reg-14)*4(sp) > |.endif > -| lfd f..reg, SAVE_FPR_+(reg-14)*8(sp) > +| .FPU lfd f..reg, SAVE_FPR_+(reg-14)*8(sp) > |.endmacro > | > |.macro saveregs > @@ -323,6 +355,7 @@ > |// Trap for not-yet-implemented parts. 
> |.macro NYI; tw 4, sp, sp; .endmacro > | > +|.if FPU > |// int/FP conversions. > |.macro tonum_i, freg, reg > | xoris reg, reg, 0x8000 > @@ -346,6 +379,7 @@ > |.macro toint, reg, freg > | toint reg, freg, freg > |.endmacro > +|.endif > | > |//----------------------------------------------------------------------- > | > @@ -533,9 +567,19 @@ static void build_subroutines(BuildCtx *ctx) > | beq >2 > |1: > | addic. TMP1, TMP1, -8 > + |.if FPU > | lfd f0, 0(RA) > + |.else > + | lwz CARG1, 0(RA) > + | lwz CARG2, 4(RA) > + |.endif > | addi RA, RA, 8 > + |.if FPU > | stfd f0, 0(BASE) > + |.else > + | stw CARG1, 0(BASE) > + | stw CARG2, 4(BASE) > + |.endif > | addi BASE, BASE, 8 > | bney <1 > | > @@ -613,23 +657,23 @@ static void build_subroutines(BuildCtx *ctx) > | .toc ld TOCREG, SAVE_TOC > | li TISNUM, LJ_TISNUM // Setup type comparison constants. > | lp BASE, L->base > - | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float). > + | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float). > | lwz DISPATCH, L->glref // Setup pointer to dispatch table. > | li ZERO, 0 > - | stw TMP3, TMPD > + | .FPU stw TMP3, TMPD > | li TMP1, LJ_TFALSE > - | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float). > + | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float). > | li TISNIL, LJ_TNIL > | li_vmstate INTERP > - | lfs TOBIT, TMPD > + | .FPU lfs TOBIT, TMPD > | lwz PC, FRAME_PC(BASE) // Fetch PC of previous frame. > | la RA, -8(BASE) // Results start at BASE-8. > - | stw TMP3, TMPD > + | .FPU stw TMP3, TMPD > | addi DISPATCH, DISPATCH, GG_G2DISP > | stw TMP1, 0(RA) // Prepend false to error message. > | li RD, 16 // 2 results: false + error message. > | st_vmstate > - | lfs TONUM, TMPD > + | .FPU lfs TONUM, TMPD > | b ->vm_returnc > | > |//----------------------------------------------------------------------- > @@ -690,22 +734,22 @@ static void build_subroutines(BuildCtx *ctx) > | li TISNUM, LJ_TISNUM // Setup type comparison constants. 
> | lp TMP1, L->top > | lwz PC, FRAME_PC(BASE) > - | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float). > + | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float). > | stb CARG3, L->status > - | stw TMP3, TMPD > - | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float). > - | lfs TOBIT, TMPD > + | .FPU stw TMP3, TMPD > + | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float). > + | .FPU lfs TOBIT, TMPD > | sub RD, TMP1, BASE > - | stw TMP3, TMPD > - | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double) > + | .FPU stw TMP3, TMPD > + | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double) > | addi RD, RD, 8 > - | stw TMP0, TONUM_HI > + | .FPU stw TMP0, TONUM_HI > | li_vmstate INTERP > | li ZERO, 0 > | st_vmstate > | andix. TMP0, PC, FRAME_TYPE > | mr MULTRES, RD > - | lfs TONUM, TMPD > + | .FPU lfs TONUM, TMPD > | li TISNIL, LJ_TNIL > | beq ->BC_RET_Z > | b ->vm_return > @@ -739,19 +783,19 @@ static void build_subroutines(BuildCtx *ctx) > | lp TMP2, L->base // TMP2 = old base (used in vmeta_call). > | li TISNUM, LJ_TISNUM // Setup type comparison constants. > | lp TMP1, L->top > - | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float). > + | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float). > | add PC, PC, BASE > - | stw TMP3, TMPD > + | .FPU stw TMP3, TMPD > | li ZERO, 0 > - | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float). > - | lfs TOBIT, TMPD > + | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float). 
> + | .FPU lfs TOBIT, TMPD > | sub PC, PC, TMP2 // PC = frame delta + frame type > - | stw TMP3, TMPD > - | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double) > + | .FPU stw TMP3, TMPD > + | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double) > | sub NARGS8:RC, TMP1, BASE > - | stw TMP0, TONUM_HI > + | .FPU stw TMP0, TONUM_HI > | li_vmstate INTERP > - | lfs TONUM, TMPD > + | .FPU lfs TONUM, TMPD > | li TISNIL, LJ_TNIL > | st_vmstate > | > @@ -839,15 +883,30 @@ static void build_subroutines(BuildCtx *ctx) > | lwz INS, -4(PC) > | subi CARG2, RB, 16 > | decode_RB8 SAVE0, INS > + |.if FPU > | lfd f0, 0(RA) > + |.else > + | lwz TMP2, 0(RA) > + | lwz TMP3, 4(RA) > + |.endif > | add TMP1, BASE, SAVE0 > | stp BASE, L->base > | cmplw TMP1, CARG2 > | sub CARG3, CARG2, TMP1 > | decode_RA8 RA, INS > + |.if FPU > | stfd f0, 0(CARG2) > + |.else > + | stw TMP2, 0(CARG2) > + | stw TMP3, 4(CARG2) > + |.endif > | bney ->BC_CAT_Z > + |.if FPU > | stfdx f0, BASE, RA > + |.else > + | stwux TMP2, RA, BASE > + | stw TMP3, 4(RA) > + |.endif > | b ->cont_nop > | > |//-- Table indexing metamethods ----------------------------------------- > @@ -900,9 +959,19 @@ static void build_subroutines(BuildCtx *ctx) > | // Returns TValue * (finished) or NULL (metamethod). > | cmplwi CRET1, 0 > | beq >3 > + |.if FPU > | lfd f0, 0(CRET1) > + |.else > + | lwz TMP0, 0(CRET1) > + | lwz TMP1, 4(CRET1) > + |.endif > | ins_next1 > + |.if FPU > | stfdx f0, BASE, RA > + |.else > + | stwux TMP0, RA, BASE > + | stw TMP1, 4(RA) > + |.endif > | ins_next2 > | > |3: // Call __index metamethod. > @@ -920,7 +989,12 @@ static void build_subroutines(BuildCtx *ctx) > | // Returns cTValue * or NULL. 
> | cmplwi CRET1, 0 > | beq >1 > + |.if FPU > | lfd f14, 0(CRET1) > + |.else > + | lwz SAVE0, 0(CRET1) > + | lwz SAVE1, 4(CRET1) > + |.endif > | b ->BC_TGETR_Z > |1: > | stwx TISNIL, BASE, RA > @@ -975,11 +1049,21 @@ static void build_subroutines(BuildCtx *ctx) > | bl extern lj_meta_tset // (lua_State *L, TValue *o, TValue *k) > | // Returns TValue * (finished) or NULL (metamethod). > | cmplwi CRET1, 0 > + |.if FPU > | lfdx f0, BASE, RA > + |.else > + | lwzux TMP2, RA, BASE > + | lwz TMP3, 4(RA) > + |.endif > | beq >3 > | // NOBARRIER: lj_meta_tset ensures the table is not black. > | ins_next1 > + |.if FPU > | stfd f0, 0(CRET1) > + |.else > + | stw TMP2, 0(CRET1) > + | stw TMP3, 4(CRET1) > + |.endif > | ins_next2 > | > |3: // Call __newindex metamethod. > @@ -990,7 +1074,12 @@ static void build_subroutines(BuildCtx *ctx) > | add PC, TMP1, BASE > | lwz LFUNC:RB, FRAME_FUNC(BASE) // Guaranteed to be a function here. > | li NARGS8:RC, 24 // 3 args for func(t, k, v) > + |.if FPU > | stfd f0, 16(BASE) // Copy value to third argument. > + |.else > + | stw TMP2, 16(BASE) > + | stw TMP3, 20(BASE) > + |.endif > | b ->vm_call_dispatch_f > | > |->vmeta_tsetr: > @@ -999,7 +1088,12 @@ static void build_subroutines(BuildCtx *ctx) > | stw PC, SAVE_PC > | bl extern lj_tab_setinth // (lua_State *L, GCtab *t, int32_t key) > | // Returns TValue *. 
> + |.if FPU > | stfd f14, 0(CRET1) > + |.else > + | stw SAVE0, 0(CRET1) > + | stw SAVE1, 4(CRET1) > + |.endif > | b ->cont_nop > | > |//-- Comparison metamethods --------------------------------------------- > @@ -1038,9 +1132,19 @@ static void build_subroutines(BuildCtx *ctx) > | > |->cont_ra: // RA = resultptr > | lwz INS, -4(PC) > + |.if FPU > | lfd f0, 0(RA) > + |.else > + | lwz CARG1, 0(RA) > + | lwz CARG2, 4(RA) > + |.endif > | decode_RA8 TMP1, INS > + |.if FPU > | stfdx f0, BASE, TMP1 > + |.else > + | stwux CARG1, TMP1, BASE > + | stw CARG2, 4(TMP1) > + |.endif > | b ->cont_nop > | > |->cont_condt: // RA = resultptr > @@ -1246,22 +1350,32 @@ static void build_subroutines(BuildCtx *ctx) > |.macro .ffunc_n, name > |->ff_ .. name: > | cmplwi NARGS8:RC, 8 > - | lwz CARG3, 0(BASE) > + | lwz CARG1, 0(BASE) > + |.if FPU > | lfd FARG1, 0(BASE) > + |.else > + | lwz CARG2, 4(BASE) > + |.endif > | blt ->fff_fallback > - | checknum CARG3; bge ->fff_fallback > + | checknum CARG1; bge ->fff_fallback > |.endmacro > | > |.macro .ffunc_nn, name > |->ff_ .. name: > | cmplwi NARGS8:RC, 16 > - | lwz CARG3, 0(BASE) > + | lwz CARG1, 0(BASE) > + |.if FPU > | lfd FARG1, 0(BASE) > - | lwz CARG4, 8(BASE) > + | lwz CARG3, 8(BASE) > | lfd FARG2, 8(BASE) > + |.else > + | lwz CARG2, 4(BASE) > + | lwz CARG3, 8(BASE) > + | lwz CARG4, 12(BASE) > + |.endif > | blt ->fff_fallback > + | checknum CARG1; bge ->fff_fallback > | checknum CARG3; bge ->fff_fallback > - | checknum CARG4; bge ->fff_fallback > |.endmacro > | > |// Inlined GC threshold check. Caveat: uses TMP0 and TMP1. > @@ -1282,14 +1396,21 @@ static void build_subroutines(BuildCtx *ctx) > | bge cr1, ->fff_fallback > | stw CARG3, 0(RA) > | addi RD, NARGS8:RC, 8 // Compute (nresults+1)*8. > + | addi TMP1, BASE, 8 > + | add TMP2, RA, NARGS8:RC > | stw CARG1, 4(RA) > | beq ->fff_res // Done if exactly 1 argument. 
> - | li TMP1, 8 > - | subi RC, RC, 8 > |1: > - | cmplw TMP1, RC > - | lfdx f0, BASE, TMP1 > - | stfdx f0, RA, TMP1 > + | cmplw TMP1, TMP2 > + |.if FPU > + | lfd f0, 0(TMP1) > + | stfd f0, 0(TMP1) > + |.else > + | lwz CARG1, 0(TMP1) > + | lwz CARG2, 4(TMP1) > + | stw CARG1, -8(TMP1) > + | stw CARG2, -4(TMP1) > + |.endif > | addi TMP1, TMP1, 8 > | bney <1 > | b ->fff_res > @@ -1304,8 +1425,14 @@ static void build_subroutines(BuildCtx *ctx) > | orc TMP1, TMP2, TMP0 > | addi TMP1, TMP1, ~LJ_TISNUM+1 > | slwi TMP1, TMP1, 3 > + |.if FPU > | la TMP2, CFUNC:RB->upvalue > | lfdx FARG1, TMP2, TMP1 > + |.else > + | add TMP1, CFUNC:RB, TMP1 > + | lwz CARG1, CFUNC:TMP1->upvalue[0].u32.hi > + | lwz CARG2, CFUNC:TMP1->upvalue[0].u32.lo > + |.endif > | b ->fff_resn > | > |//-- Base library: getters and setters --------------------------------- > @@ -1383,7 +1510,12 @@ static void build_subroutines(BuildCtx *ctx) > | mr CARG1, L > | bl extern lj_tab_get // (lua_State *L, GCtab *t, cTValue *key) > | // Returns cTValue *. > + |.if FPU > | lfd FARG1, 0(CRET1) > + |.else > + | lwz CARG2, 4(CRET1) > + | lwz CARG1, 0(CRET1) // Caveat: CARG1 == CRET1. > + |.endif > | b ->fff_resn > | > |//-- Base library: conversions ------------------------------------------ > @@ -1392,7 +1524,11 @@ static void build_subroutines(BuildCtx *ctx) > | // Only handles the number case inline (without a base argument). > | cmplwi NARGS8:RC, 8 > | lwz CARG1, 0(BASE) > + |.if FPU > | lfd FARG1, 0(BASE) > + |.else > + | lwz CARG2, 4(BASE) > + |.endif > | bne ->fff_fallback // Exactly one argument. > | checknum CARG1; bgt ->fff_fallback > | b ->fff_resn > @@ -1443,12 +1579,23 @@ static void build_subroutines(BuildCtx *ctx) > | cmplwi CRET1, 0 > | li CARG3, LJ_TNIL > | beq ->fff_restv // End of traversal: return nil. > - | lfd f0, 8(BASE) // Copy key and value to results. > | la RA, -8(BASE) > + |.if FPU > + | lfd f0, 8(BASE) // Copy key and value to results. 
> | lfd f1, 16(BASE) > | stfd f0, 0(RA) > - | li RD, (2+1)*8 > | stfd f1, 8(RA) > + |.else > + | lwz CARG1, 8(BASE) > + | lwz CARG2, 12(BASE) > + | lwz CARG3, 16(BASE) > + | lwz CARG4, 20(BASE) > + | stw CARG1, 0(RA) > + | stw CARG2, 4(RA) > + | stw CARG3, 8(RA) > + | stw CARG4, 12(RA) > + |.endif > + | li RD, (2+1)*8 > | b ->fff_res > | > |.ffunc_1 pairs > @@ -1457,17 +1604,32 @@ static void build_subroutines(BuildCtx *ctx) > | bne ->fff_fallback > #if LJ_52 > | lwz TAB:TMP2, TAB:CARG1->metatable > + |.if FPU > | lfd f0, CFUNC:RB->upvalue[0] > + |.else > + | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi > + | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo > + |.endif > | cmplwi TAB:TMP2, 0 > | la RA, -8(BASE) > | bne ->fff_fallback > #else > + |.if FPU > | lfd f0, CFUNC:RB->upvalue[0] > + |.else > + | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi > + | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo > + |.endif > | la RA, -8(BASE) > #endif > | stw TISNIL, 8(BASE) > | li RD, (3+1)*8 > + |.if FPU > | stfd f0, 0(RA) > + |.else > + | stw TMP0, 0(RA) > + | stw TMP1, 4(RA) > + |.endif > | b ->fff_res > | > |.ffunc ipairs_aux > @@ -1513,14 +1675,24 @@ static void build_subroutines(BuildCtx *ctx) > | stfd FARG2, 0(RA) > |.endif > | ble >2 // Not in array part? > + |.if FPU > | lwzx TMP2, TMP1, TMP3 > | lfdx f0, TMP1, TMP3 > + |.else > + | lwzux TMP2, TMP1, TMP3 > + | lwz TMP3, 4(TMP1) > + |.endif > |1: > | checknil TMP2 > | li RD, (0+1)*8 > | beq ->fff_res // End of iteration, return 0 results. > | li RD, (2+1)*8 > + |.if FPU > | stfd f0, 8(RA) > + |.else > + | stw TMP2, 8(RA) > + | stw TMP3, 12(RA) > + |.endif > | b ->fff_res > |2: // Check for empty hash part first. Otherwise call C function. 
> | lwz TMP0, TAB:CARG1->hmask > @@ -1534,7 +1706,11 @@ static void build_subroutines(BuildCtx *ctx) > | li RD, (0+1)*8 > | beq ->fff_res > | lwz TMP2, 0(CRET1) > + |.if FPU > | lfd f0, 0(CRET1) > + |.else > + | lwz TMP3, 4(CRET1) > + |.endif > | b <1 > | > |.ffunc_1 ipairs > @@ -1543,12 +1719,22 @@ static void build_subroutines(BuildCtx *ctx) > | bne ->fff_fallback > #if LJ_52 > | lwz TAB:TMP2, TAB:CARG1->metatable > + |.if FPU > | lfd f0, CFUNC:RB->upvalue[0] > + |.else > + | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi > + | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo > + |.endif > | cmplwi TAB:TMP2, 0 > | la RA, -8(BASE) > | bne ->fff_fallback > #else > + |.if FPU > | lfd f0, CFUNC:RB->upvalue[0] > + |.else > + | lwz TMP0, CFUNC:RB->upvalue[0].u32.hi > + | lwz TMP1, CFUNC:RB->upvalue[0].u32.lo > + |.endif > | la RA, -8(BASE) > #endif > |.if DUALNUM > @@ -1558,7 +1744,12 @@ static void build_subroutines(BuildCtx *ctx) > |.endif > | stw ZERO, 12(BASE) > | li RD, (3+1)*8 > + |.if FPU > | stfd f0, 0(RA) > + |.else > + | stw TMP0, 0(RA) > + | stw TMP1, 4(RA) > + |.endif > | b ->fff_res > | > |//-- Base library: catch errors ---------------------------------------- > @@ -1577,19 +1768,32 @@ static void build_subroutines(BuildCtx *ctx) > | > |.ffunc xpcall > | cmplwi NARGS8:RC, 16 > - | lwz CARG4, 8(BASE) > + | lwz CARG3, 8(BASE) > + |.if FPU > | lfd FARG2, 8(BASE) > | lfd FARG1, 0(BASE) > + |.else > + | lwz CARG1, 0(BASE) > + | lwz CARG2, 4(BASE) > + | lwz CARG4, 12(BASE) > + |.endif > | blt ->fff_fallback > | lbz TMP1, DISPATCH_GL(hookmask)(DISPATCH) > | mr TMP2, BASE > - | checkfunc CARG4; bne ->fff_fallback // Traceback must be a function. > + | checkfunc CARG3; bne ->fff_fallback // Traceback must be a function. > | la BASE, 16(BASE) > | // Remember active hook before pcall. > | rlwinm TMP1, TMP1, 32-HOOK_ACTIVE_SHIFT, 31, 31 > + |.if FPU > | stfd FARG2, 0(TMP2) // Swap function and traceback. 
> - | subi NARGS8:RC, NARGS8:RC, 16 > | stfd FARG1, 8(TMP2) > + |.else > + | stw CARG3, 0(TMP2) > + | stw CARG4, 4(TMP2) > + | stw CARG1, 8(TMP2) > + | stw CARG2, 12(TMP2) > + |.endif > + | subi NARGS8:RC, NARGS8:RC, 16 > | addi PC, TMP1, 16+FRAME_PCALL > | b ->vm_call_dispatch > | > @@ -1632,9 +1836,21 @@ static void build_subroutines(BuildCtx *ctx) > | stp BASE, L->top > |2: // Move args to coroutine. > | cmpw TMP1, NARGS8:RC > + |.if FPU > | lfdx f0, BASE, TMP1 > + |.else > + | add CARG3, BASE, TMP1 > + | lwz TMP2, 0(CARG3) > + | lwz TMP3, 4(CARG3) > + |.endif > | beq >3 > + |.if FPU > | stfdx f0, CARG2, TMP1 > + |.else > + | add CARG3, CARG2, TMP1 > + | stw TMP2, 0(CARG3) > + | stw TMP3, 4(CARG3) > + |.endif > | addi TMP1, TMP1, 8 > | b <2 > |3: > @@ -1665,8 +1881,17 @@ static void build_subroutines(BuildCtx *ctx) > | stp TMP2, L:SAVE0->top // Clear coroutine stack. > |5: // Move results from coroutine. > | cmplw TMP1, TMP3 > + |.if FPU > | lfdx f0, TMP2, TMP1 > | stfdx f0, BASE, TMP1 > + |.else > + | add CARG3, TMP2, TMP1 > + | lwz CARG1, 0(CARG3) > + | lwz CARG2, 4(CARG3) > + | add CARG3, BASE, TMP1 > + | stw CARG1, 0(CARG3) > + | stw CARG2, 4(CARG3) > + |.endif > | addi TMP1, TMP1, 8 > | bne <5 > |6: > @@ -1691,12 +1916,22 @@ static void build_subroutines(BuildCtx *ctx) > | andix. TMP0, PC, FRAME_TYPE > | la TMP3, -8(TMP3) > | li TMP1, LJ_TFALSE > + |.if FPU > | lfd f0, 0(TMP3) > + |.else > + | lwz CARG1, 0(TMP3) > + | lwz CARG2, 4(TMP3) > + |.endif > | stp TMP3, L:SAVE0->top // Remove error from coroutine stack. > | li RD, (2+1)*8 > | stw TMP1, -8(BASE) // Prepend false to results. > | la RA, -8(BASE) > + |.if FPU > | stfd f0, 0(BASE) // Copy error message. > + |.else > + | stw CARG1, 0(BASE) // Copy error message. > + | stw CARG2, 4(BASE) > + |.endif > | b <7 > |.else > | mr CARG1, L > @@ -1875,7 +2110,12 @@ static void build_subroutines(BuildCtx *ctx) > | lus CARG1, 0x8000 // -(2^31). 
> | beqy ->fff_resi
> |5:
> + |.if FPU
> | lfd FARG1, 0(BASE)
> + |.else
> + | lwz CARG1, 0(BASE)
> + | lwz CARG2, 4(BASE)
> + |.endif
> | blex func
> | b ->fff_resn
> |.endmacro
> @@ -1899,10 +2139,14 @@ static void build_subroutines(BuildCtx *ctx)
> |
> |.ffunc math_log
> | cmplwi NARGS8:RC, 8
> - | lwz CARG3, 0(BASE)
> - | lfd FARG1, 0(BASE)
> + | lwz CARG1, 0(BASE)
> | bne ->fff_fallback // Need exactly 1 argument.
> - | checknum CARG3; bge ->fff_fallback
> + | checknum CARG1; bge ->fff_fallback
> + |.if FPU
> + | lfd FARG1, 0(BASE)
> + |.else
> + | lwz CARG2, 4(BASE)
> + |.endif
> | blex log
> | b ->fff_resn
> |
> @@ -1924,17 +2168,24 @@ static void build_subroutines(BuildCtx *ctx)
> |.if DUALNUM
> |.ffunc math_ldexp
> | cmplwi NARGS8:RC, 16
> - | lwz CARG3, 0(BASE)
> + | lwz TMP0, 0(BASE)
> + |.if FPU
> | lfd FARG1, 0(BASE)
> - | lwz CARG4, 8(BASE)
> + |.else
> + | lwz CARG1, 0(BASE)
> + | lwz CARG2, 4(BASE)
> + |.endif
> + | lwz TMP1, 8(BASE)
> |.if GPR64
> | lwz CARG2, 12(BASE)
> - |.else
> + |.elif FPU
> | lwz CARG1, 12(BASE)
> + |.else
> + | lwz CARG3, 12(BASE)
> |.endif
> | blt ->fff_fallback
> - | checknum CARG3; bge ->fff_fallback
> - | checknum CARG4; bne ->fff_fallback
> + | checknum TMP0; bge ->fff_fallback
> + | checknum TMP1; bne ->fff_fallback
> |.else
> |.ffunc_nn math_ldexp
> |.if GPR64
> @@ -1949,8 +2200,10 @@ static void build_subroutines(BuildCtx *ctx)
> |.ffunc_n math_frexp
> |.if GPR64
> | la CARG2, DISPATCH_GL(tmptv)(DISPATCH)
> - |.else
> + |.elif FPU
> | la CARG1, DISPATCH_GL(tmptv)(DISPATCH)
> + |.else
> + | la CARG3, DISPATCH_GL(tmptv)(DISPATCH)
> |.endif
> | lwz PC, FRAME_PC(BASE)
> | blex frexp
> @@ -1959,7 +2212,12 @@ static void build_subroutines(BuildCtx *ctx)
> |.if not DUALNUM
> | tonum_i FARG2, TMP1
> |.endif
> + |.if FPU
> | stfd FARG1, 0(RA)
> + |.else
> + | stw CRET1, 0(RA)
> + | stw CRET2, 4(RA)
> + |.endif
> | li RD, (2+1)*8
> |.if DUALNUM
> | stw TISNUM, 8(RA)
> @@ -1972,13 +2230,20 @@ static void
build_subroutines(BuildCtx *ctx)
> |.ffunc_n math_modf
> |.if GPR64
> | la CARG2, -8(BASE)
> - |.else
> + |.elif FPU
> | la CARG1, -8(BASE)
> + |.else
> + | la CARG3, -8(BASE)
> |.endif
> | lwz PC, FRAME_PC(BASE)
> | blex modf
> | la RA, -8(BASE)
> + |.if FPU
> | stfd FARG1, 0(BASE)
> + |.else
> + | stw CRET1, 0(BASE)
> + | stw CRET2, 4(BASE)
> + |.endif
> | li RD, (2+1)*8
> | b ->fff_res
> |
> @@ -1986,13 +2251,13 @@ static void build_subroutines(BuildCtx *ctx)
> |.if DUALNUM
> | .ffunc_1 name
> | checknum CARG3
> - | addi TMP1, BASE, 8
> - | add TMP2, BASE, NARGS8:RC
> + | addi SAVE0, BASE, 8
> + | add SAVE1, BASE, NARGS8:RC
> | bne >4
> |1: // Handle integers.
> - | lwz CARG4, 0(TMP1)
> - | cmplw cr1, TMP1, TMP2
> - | lwz CARG2, 4(TMP1)
> + | lwz CARG4, 0(SAVE0)
> + | cmplw cr1, SAVE0, SAVE1
> + | lwz CARG2, 4(SAVE0)
> | bge cr1, ->fff_resi
> | checknum CARG4
> | xoris TMP0, CARG1, 0x8000
> @@ -2009,36 +2274,76 @@ static void build_subroutines(BuildCtx *ctx)
> |.if GPR64
> | rldicl CARG1, CARG1, 0, 32
> |.endif
> - | addi TMP1, TMP1, 8
> + | addi SAVE0, SAVE0, 8
> | b <1
> |3:
> | bge ->fff_fallback
> | // Convert intermediate result to number and continue below.
> + |.if FPU
> | tonum_i FARG1, CARG1
> - | lfd FARG2, 0(TMP1)
> + | lfd FARG2, 0(SAVE0)
> + |.else
> + | mr CARG2, CARG1
> + | bl ->vm_sfi2d_1
> + | lwz CARG3, 0(SAVE0)
> + | lwz CARG4, 4(SAVE0)
> + |.endif
> | b >6
> |4:
> + |.if FPU
> | lfd FARG1, 0(BASE)
> + |.else
> + | lwz CARG1, 0(BASE)
> + | lwz CARG2, 4(BASE)
> + |.endif
> | bge ->fff_fallback
> |5: // Handle numbers.
> - | lwz CARG4, 0(TMP1)
> - | cmplw cr1, TMP1, TMP2
> - | lfd FARG2, 0(TMP1)
> + | lwz CARG3, 0(SAVE0)
> + | cmplw cr1, SAVE0, SAVE1
> + |.if FPU
> + | lfd FARG2, 0(SAVE0)
> + |.else
> + | lwz CARG4, 4(SAVE0)
> + |.endif
> | bge cr1, ->fff_resn
> - | checknum CARG4; bge >7
> + | checknum CARG3; bge >7
> |6:
> + | addi SAVE0, SAVE0, 8
> + |.if FPU
> | fsub f0, FARG1, FARG2
> - | addi TMP1, TMP1, 8
> |.if ismax
> | fsel FARG1, f0, FARG1, FARG2
> |.else
> | fsel FARG1, f0, FARG2, FARG1
> |.endif
> + |.else
> + | stw CARG1, SFSAVE_1
> + | stw CARG2, SFSAVE_2
> + | stw CARG3, SFSAVE_3
> + | stw CARG4, SFSAVE_4
> + | blex __ledf2
> + | cmpwi CRET1, 0
> + |.if ismax
> + | blt >8
> + |.else
> + | bge >8
> + |.endif
> + | lwz CARG1, SFSAVE_1
> + | lwz CARG2, SFSAVE_2
> + | b <5
> + |8:
> + | lwz CARG1, SFSAVE_3
> + | lwz CARG2, SFSAVE_4
> + |.endif
> | b <5
> |7: // Convert integer to number and continue above.
> - | lwz CARG2, 4(TMP1)
> + | lwz CARG3, 4(SAVE0)
> | bne ->fff_fallback
> - | tonum_i FARG2, CARG2
> + |.if FPU
> + | tonum_i FARG2, CARG3
> + |.else
> + | bl ->vm_sfi2d_2
> + |.endif
> | b <6
> |.else
> | .ffunc_n name
> @@ -2238,28 +2543,37 @@ static void build_subroutines(BuildCtx *ctx)
> |
> |.macro .ffunc_bit_op, name, ins
> | .ffunc_bit name
> - | addi TMP1, BASE, 8
> - | add TMP2, BASE, NARGS8:RC
> + | addi SAVE0, BASE, 8
> + | add SAVE1, BASE, NARGS8:RC
> |1:
> - | lwz CARG4, 0(TMP1)
> - | cmplw cr1, TMP1, TMP2
> + | lwz CARG4, 0(SAVE0)
> + | cmplw cr1, SAVE0, SAVE1
> |.if DUALNUM
> - | lwz CARG2, 4(TMP1)
> + | lwz CARG2, 4(SAVE0)
> |.else
> - | lfd FARG1, 0(TMP1)
> + | lfd FARG1, 0(SAVE0)
> |.endif
> | bgey cr1, ->fff_resi
> | checknum CARG4
> |.if DUALNUM
> + |.if FPU
> | bnel ->fff_bitop_fb
> |.else
> + | beq >3
> + | stw CARG1, SFSAVE_1
> + | bl ->fff_bitop_fb
> + | mr CARG2, CARG1
> + | lwz CARG1, SFSAVE_1
> + |3:
> + |.endif
> + |.else
> | fadd FARG1, FARG1, TOBIT
> | bge ->fff_fallback
> | stfd FARG1, TMPD
> | lwz CARG2, TMPD_LO
> |.endif
> | ins
CARG1, CARG1, CARG2
> - | addi TMP1, TMP1, 8
> + | addi SAVE0, SAVE0, 8
> | b <1
> |.endmacro
> |
> @@ -2281,7 +2595,14 @@ static void build_subroutines(BuildCtx *ctx)
> |.macro .ffunc_bit_sh, name, ins, shmod
> |.if DUALNUM
> | .ffunc_2 bit_..name
> + |.if FPU
> | checknum CARG3; bnel ->fff_tobit_fb
> + |.else
> + | checknum CARG3; beq >1
> + | bl ->fff_tobit_fb
> + | lwz CARG2, 12(BASE) // Conversion polluted CARG2.
> + |1:
> + |.endif
> | // Note: no inline conversion from number for 2nd argument!
> | checknum CARG4; bne ->fff_fallback
> |.else
> @@ -2318,27 +2639,77 @@ static void build_subroutines(BuildCtx *ctx)
> |->fff_resn:
> | lwz PC, FRAME_PC(BASE)
> | la RA, -8(BASE)
> + |.if FPU
> | stfd FARG1, -8(BASE)
> + |.else
> + | stw CARG1, -8(BASE)
> + | stw CARG2, -4(BASE)
> + |.endif
> | b ->fff_res1
> |
> |// Fallback FP number to bit conversion.
> |->fff_tobit_fb:
> |.if DUALNUM
> + |.if FPU
> | lfd FARG1, 0(BASE)
> | bgt ->fff_fallback
> | fadd FARG1, FARG1, TOBIT
> | stfd FARG1, TMPD
> | lwz CARG1, TMPD_LO
> | blr
> + |.else
> + | bgt ->fff_fallback
> + | mr CARG2, CARG1
> + | mr CARG1, CARG3
> + |// Modifies: CARG1, CARG2, TMP0, TMP1, TMP2.
> + |->vm_tobit:
> + | slwi TMP2, CARG1, 1
> + | addis TMP2, TMP2, 0x0020
> + | cmpwi TMP2, 0
> + | bge >2
> + | li TMP1, 0x3e0
> + | srawi TMP2, TMP2, 21
> + | not TMP1, TMP1
> + | sub. TMP2, TMP1, TMP2
> + | cmpwi cr7, CARG1, 0
> + | blt >1
> + | slwi TMP1, CARG1, 11
> + | srwi TMP0, CARG2, 21
> + | oris TMP1, TMP1, 0x8000
> + | or TMP1, TMP1, TMP0
> + | srw CARG1, TMP1, TMP2
> + | bclr 4, 28 // Return if cr7[lt] == 0, no hint.
> + | neg CARG1, CARG1
> + | blr
> + |1:
> + | addi TMP2, TMP2, 21
> + | srw TMP1, CARG2, TMP2
> + | slwi CARG2, CARG1, 12
> + | subfic TMP2, TMP2, 20
> + | slw TMP0, CARG2, TMP2
> + | or CARG1, TMP1, TMP0
> + | bclr 4, 28 // Return if cr7[lt] == 0, no hint.
> + | neg CARG1, CARG1
> + | blr
> + |2:
> + | li CARG1, 0
> + | blr
> + |.endif
> |.endif
> |->fff_bitop_fb:
> |.if DUALNUM
> - | lfd FARG1, 0(TMP1)
> + |.if FPU
> + | lfd FARG1, 0(SAVE0)
> | bgt ->fff_fallback
> | fadd FARG1, FARG1, TOBIT
> | stfd FARG1, TMPD
> | lwz CARG2, TMPD_LO
> | blr
> + |.else
> + | bgt ->fff_fallback
> + | mr CARG1, CARG4
> + | b ->vm_tobit
> + |.endif
> |.endif
> |
> |//-----------------------------------------------------------------------
> @@ -2531,10 +2902,21 @@ static void build_subroutines(BuildCtx *ctx)
> | decode_RA8 RC, INS // Call base.
> | beq >2
> |1: // Move results down.
> + |.if FPU
> | lfd f0, 0(RA)
> + |.else
> + | lwz CARG1, 0(RA)
> + | lwz CARG2, 4(RA)
> + |.endif
> | addic. TMP1, TMP1, -8
> | addi RA, RA, 8
> + |.if FPU
> | stfdx f0, BASE, RC
> + |.else
> + | add CARG3, BASE, RC
> + | stw CARG1, 0(CARG3)
> + | stw CARG2, 4(CARG3)
> + |.endif
> | addi RC, RC, 8
> | bne <1
> |2:
> @@ -2587,10 +2969,12 @@ static void build_subroutines(BuildCtx *ctx)
> |//-----------------------------------------------------------------------
> |
> |.macro savex_, a, b, c, d
> + |.if FPU
> | stfd f..a, 16+a*8(sp)
> | stfd f..b, 16+b*8(sp)
> | stfd f..c, 16+c*8(sp)
> | stfd f..d, 16+d*8(sp)
> + |.endif
> |.endmacro
> |
> |->vm_exit_handler:
> @@ -2662,16 +3046,16 @@ static void build_subroutines(BuildCtx *ctx)
> | lwz KBASE, PC2PROTO(k)(TMP1)
> | // Setup type comparison constants.
> | li TISNUM, LJ_TISNUM
> - | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> - | stw TMP3, TMPD
> + | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> + | .FPU stw TMP3, TMPD
> | li ZERO, 0
> - | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> - | lfs TOBIT, TMPD
> - | stw TMP3, TMPD
> - | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> + | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> + | .FPU lfs TOBIT, TMPD
> + | .FPU stw TMP3, TMPD
> + | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> | li TISNIL, LJ_TNIL
> - | stw TMP0, TONUM_HI
> - | lfs TONUM, TMPD
> + | .FPU stw TMP0, TONUM_HI
> + | .FPU lfs TONUM, TMPD
> | // Modified copy of ins_next which handles function header dispatch, too.
> | lwz INS, 0(PC)
> | addi PC, PC, 4
> @@ -2716,7 +3100,35 @@ static void build_subroutines(BuildCtx *ctx)
> |//-- Math helper functions ----------------------------------------------
> |//-----------------------------------------------------------------------
> |
> - |// NYI: Use internal implementations of floor, ceil, trunc.
> + |// NYI: Use internal implementations of floor, ceil, trunc, sfcmp.
> + |
> + |.macro sfi2d, AHI, ALO
> + |.if not FPU
> + | mr. AHI, ALO
> + | bclr 12, 2 // Handle zero first.
> + | srawi TMP0, ALO, 31
> + | xor TMP1, ALO, TMP0
> + | sub TMP1, TMP1, TMP0 // Absolute value in TMP1.
> + | cntlzw AHI, TMP1
> + | andix. TMP0, TMP0, 0x800 // Mask sign bit.
> + | slw TMP1, TMP1, AHI // Align mantissa left with leading 1.
> + | subfic AHI, AHI, 0x3ff+31-1 // Exponent -1 in AHI.
> + | slwi ALO, TMP1, 21
> + | or AHI, AHI, TMP0 // Sign | Exponent.
> + | srwi TMP1, TMP1, 11
> + | slwi AHI, AHI, 20 // Align left.
> + | add AHI, AHI, TMP1 // Add mantissa, increment exponent.
> + | blr
> + |.endif
> + |.endmacro
> + |
> + |// Input: CARG2. Output: CARG1, CARG2. Temporaries: TMP0, TMP1.
> + |->vm_sfi2d_1:
> + | sfi2d CARG1, CARG2
> + |
> + |// Input: CARG4. Output: CARG3, CARG4. Temporaries: TMP0, TMP1.
> + |->vm_sfi2d_2:
> + | sfi2d CARG3, CARG4
> |
> |->vm_modi:
> | divwo.
TMP0, CARG1, CARG2
> @@ -2784,21 +3196,21 @@ static void build_subroutines(BuildCtx *ctx)
> | addi DISPATCH, r12, GG_G2DISP
> | stw r11, CTSTATE->cb.slot
> | stw r3, CTSTATE->cb.gpr[0]
> - | stfd f1, CTSTATE->cb.fpr[0]
> + | .FPU stfd f1, CTSTATE->cb.fpr[0]
> | stw r4, CTSTATE->cb.gpr[1]
> - | stfd f2, CTSTATE->cb.fpr[1]
> + | .FPU stfd f2, CTSTATE->cb.fpr[1]
> | stw r5, CTSTATE->cb.gpr[2]
> - | stfd f3, CTSTATE->cb.fpr[2]
> + | .FPU stfd f3, CTSTATE->cb.fpr[2]
> | stw r6, CTSTATE->cb.gpr[3]
> - | stfd f4, CTSTATE->cb.fpr[3]
> + | .FPU stfd f4, CTSTATE->cb.fpr[3]
> | stw r7, CTSTATE->cb.gpr[4]
> - | stfd f5, CTSTATE->cb.fpr[4]
> + | .FPU stfd f5, CTSTATE->cb.fpr[4]
> | stw r8, CTSTATE->cb.gpr[5]
> - | stfd f6, CTSTATE->cb.fpr[5]
> + | .FPU stfd f6, CTSTATE->cb.fpr[5]
> | stw r9, CTSTATE->cb.gpr[6]
> - | stfd f7, CTSTATE->cb.fpr[6]
> + | .FPU stfd f7, CTSTATE->cb.fpr[6]
> | stw r10, CTSTATE->cb.gpr[7]
> - | stfd f8, CTSTATE->cb.fpr[7]
> + | .FPU stfd f8, CTSTATE->cb.fpr[7]
> | addi TMP0, sp, CFRAME_SPACE+8
> | stw TMP0, CTSTATE->cb.stack
> | mr CARG1, CTSTATE
> @@ -2809,21 +3221,21 @@ static void build_subroutines(BuildCtx *ctx)
> | lp BASE, L:CRET1->base
> | li TISNUM, LJ_TISNUM // Setup type comparison constants.
> | lp RC, L:CRET1->top
> - | lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> + | .FPU lus TMP3, 0x59c0 // TOBIT = 2^52 + 2^51 (float).
> | li ZERO, 0
> | mr L, CRET1
> - | stw TMP3, TMPD
> - | lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> + | .FPU stw TMP3, TMPD
> + | .FPU lus TMP0, 0x4338 // Hiword of 2^52 + 2^51 (double)
> | lwz LFUNC:RB, FRAME_FUNC(BASE)
> - | ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> - | stw TMP0, TONUM_HI
> + | .FPU ori TMP3, TMP3, 0x0004 // TONUM = 2^52 + 2^51 + 2^31 (float).
> + | .FPU stw TMP0, TONUM_HI
> | li TISNIL, LJ_TNIL
> | li_vmstate INTERP
> - | lfs TOBIT, TMPD
> - | stw TMP3, TMPD
> + | .FPU lfs TOBIT, TMPD
> + | .FPU stw TMP3, TMPD
> | sub RC, RC, BASE
> | st_vmstate
> - | lfs TONUM, TMPD
> + | .FPU lfs TONUM, TMPD
> | ins_callt
> |.endif
> |
> @@ -2837,7 +3249,7 @@ static void build_subroutines(BuildCtx *ctx)
> | mr CARG2, RA
> | bl extern lj_ccallback_leave // (CTState *cts, TValue *o)
> | lwz CRET1, CTSTATE->cb.gpr[0]
> - | lfd FARG1, CTSTATE->cb.fpr[0]
> + | .FPU lfd FARG1, CTSTATE->cb.fpr[0]
> | lwz CRET2, CTSTATE->cb.gpr[1]
> | b ->vm_leave_unw
> |.endif
> @@ -2871,14 +3283,14 @@ static void build_subroutines(BuildCtx *ctx)
> | bge <1
> |2:
> | bney cr1, >3
> - | lfd f1, CCSTATE->fpr[0]
> - | lfd f2, CCSTATE->fpr[1]
> - | lfd f3, CCSTATE->fpr[2]
> - | lfd f4, CCSTATE->fpr[3]
> - | lfd f5, CCSTATE->fpr[4]
> - | lfd f6, CCSTATE->fpr[5]
> - | lfd f7, CCSTATE->fpr[6]
> - | lfd f8, CCSTATE->fpr[7]
> + | .FPU lfd f1, CCSTATE->fpr[0]
> + | .FPU lfd f2, CCSTATE->fpr[1]
> + | .FPU lfd f3, CCSTATE->fpr[2]
> + | .FPU lfd f4, CCSTATE->fpr[3]
> + | .FPU lfd f5, CCSTATE->fpr[4]
> + | .FPU lfd f6, CCSTATE->fpr[5]
> + | .FPU lfd f7, CCSTATE->fpr[6]
> + | .FPU lfd f8, CCSTATE->fpr[7]
> |3:
> | lp TMP0, CCSTATE->func
> | lwz CARG2, CCSTATE->gpr[1]
> @@ -2895,7 +3307,7 @@ static void build_subroutines(BuildCtx *ctx)
> | lwz TMP2, -4(r14)
> | lwz TMP0, 4(r14)
> | stw CARG1, CCSTATE:TMP1->gpr[0]
> - | stfd FARG1, CCSTATE:TMP1->fpr[0]
> + | .FPU stfd FARG1, CCSTATE:TMP1->fpr[0]
> | stw CARG2, CCSTATE:TMP1->gpr[1]
> | mtlr TMP0
> | stw CARG3, CCSTATE:TMP1->gpr[2]
> @@ -2924,19 +3336,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> case BC_ISLT: case BC_ISGE: case BC_ISLE: case BC_ISGT:
> | // RA = src1*8, RD = src2*8, JMP with RD = target
> |.if DUALNUM
> - | lwzux TMP0, RA, BASE
> + | lwzux CARG1, RA, BASE
> | addi PC, PC, 4
> | lwz CARG2, 4(RA)
> - | lwzux TMP1, RD, BASE
> + | lwzux CARG3, RD, BASE
> | lwz TMP2, -4(PC)
> -
| checknum cr0, TMP0
> - | lwz CARG3, 4(RD)
> + | checknum cr0, CARG1
> + | lwz CARG4, 4(RD)
> | decode_RD4 TMP2, TMP2
> - | checknum cr1, TMP1
> - | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> + | checknum cr1, CARG3
> + | addis SAVE0, TMP2, -(BCBIAS_J*4 >> 16)
> | bne cr0, >7
> | bne cr1, >8
> - | cmpw CARG2, CARG3
> + | cmpw CARG2, CARG4
> if (op == BC_ISLT) {
> | bge >2
> } else if (op == BC_ISGE) {
> @@ -2947,28 +3359,41 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | ble >2
> }
> |1:
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> |2:
> | ins_next
> |
> |7: // RA is not an integer.
> | bgt cr0, ->vmeta_comp
> | // RA is a number.
> - | lfd f0, 0(RA)
> + | .FPU lfd f0, 0(RA)
> | bgt cr1, ->vmeta_comp
> | blt cr1, >4
> | // RA is a number, RD is an integer.
> - | tonum_i f1, CARG3
> + |.if FPU
> + | tonum_i f1, CARG4
> + |.else
> + | bl ->vm_sfi2d_2
> + |.endif
> | b >5
> |
> |8: // RA is an integer, RD is not an integer.
> | bgt cr1, ->vmeta_comp
> | // RA is an integer, RD is a number.
> + |.if FPU
> | tonum_i f0, CARG2
> + |.else
> + | bl ->vm_sfi2d_1
> + |.endif
> |4:
> - | lfd f1, 0(RD)
> + | .FPU lfd f1, 0(RD)
> |5:
> + |.if FPU
> | fcmpu cr0, f0, f1
> + |.else
> + | blex __ledf2
> + | cmpwi CRET1, 0
> + |.endif
> if (op == BC_ISLT) {
> | bge <2
> } else if (op == BC_ISGE) {
> @@ -3016,42 +3441,42 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> vk = op == BC_ISEQV;
> | // RA = src1*8, RD = src2*8, JMP with RD = target
> |.if DUALNUM
> - | lwzux TMP0, RA, BASE
> + | lwzux CARG1, RA, BASE
> | addi PC, PC, 4
> | lwz CARG2, 4(RA)
> - | lwzux TMP1, RD, BASE
> - | checknum cr0, TMP0
> - | lwz TMP2, -4(PC)
> - | checknum cr1, TMP1
> - | decode_RD4 TMP2, TMP2
> - | lwz CARG3, 4(RD)
> + | lwzux CARG3, RD, BASE
> + | checknum cr0, CARG1
> + | lwz SAVE0, -4(PC)
> + | checknum cr1, CARG3
> + | decode_RD4 SAVE0, SAVE0
> + | lwz CARG4, 4(RD)
> | cror 4*cr7+gt, 4*cr0+gt, 4*cr1+gt
> - | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> + | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
> if (vk) {
> | ble cr7, ->BC_ISEQN_Z
> } else {
> | ble cr7, ->BC_ISNEN_Z
> }
> |.else
> - | lwzux TMP0, RA, BASE
> - | lwz TMP2, 0(PC)
> + | lwzux CARG1, RA, BASE
> + | lwz SAVE0, 0(PC)
> | lfd f0, 0(RA)
> | addi PC, PC, 4
> - | lwzux TMP1, RD, BASE
> - | checknum cr0, TMP0
> - | decode_RD4 TMP2, TMP2
> + | lwzux CARG3, RD, BASE
> + | checknum cr0, CARG1
> + | decode_RD4 SAVE0, SAVE0
> | lfd f1, 0(RD)
> - | checknum cr1, TMP1
> - | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> + | checknum cr1, CARG3
> + | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
> | bge cr0, >5
> | bge cr1, >5
> | fcmpu cr0, f0, f1
> if (vk) {
> | bne >1
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> } else {
> | beq >1
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> }
> |1:
> | ins_next
> @@ -3059,36 +3484,36 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |5: // Either or both types are not numbers.
> |.if not DUALNUM
> | lwz CARG2, 4(RA)
> - | lwz CARG3, 4(RD)
> + | lwz CARG4, 4(RD)
> |.endif
> |.if FFI
> - | cmpwi cr7, TMP0, LJ_TCDATA
> - | cmpwi cr5, TMP1, LJ_TCDATA
> + | cmpwi cr7, CARG1, LJ_TCDATA
> + | cmpwi cr5, CARG3, LJ_TCDATA
> |.endif
> - | not TMP3, TMP0
> - | cmplw TMP0, TMP1
> - | cmplwi cr1, TMP3, ~LJ_TISPRI // Primitive?
> + | not TMP2, CARG1
> + | cmplw CARG1, CARG3
> + | cmplwi cr1, TMP2, ~LJ_TISPRI // Primitive?
> |.if FFI
> | cror 4*cr7+eq, 4*cr7+eq, 4*cr5+eq
> |.endif
> - | cmplwi cr6, TMP3, ~LJ_TISTABUD // Table or userdata?
> + | cmplwi cr6, TMP2, ~LJ_TISTABUD // Table or userdata?
> |.if FFI
> | beq cr7, ->vmeta_equal_cd
> |.endif
> - | cmplw cr5, CARG2, CARG3
> + | cmplw cr5, CARG2, CARG4
> | crandc 4*cr0+gt, 4*cr0+eq, 4*cr1+gt // 2: Same type and primitive.
> | crorc 4*cr0+lt, 4*cr5+eq, 4*cr0+eq // 1: Same tv or different type.
> | crand 4*cr0+eq, 4*cr0+eq, 4*cr5+eq // 0: Same type and same tv.
> - | mr SAVE0, PC
> + | mr SAVE1, PC
> | cror 4*cr0+eq, 4*cr0+eq, 4*cr0+gt // 0 or 2.
> | cror 4*cr0+lt, 4*cr0+lt, 4*cr0+gt // 1 or 2.
> if (vk) {
> | bne cr0, >6
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> |6:
> } else {
> | beq cr0, >6
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> |6:
> }
> |.if DUALNUM
> @@ -3103,6 +3528,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |
> | // Different tables or userdatas. Need to check __eq metamethod.
> | // Field metatable must be at same offset for GCtab and GCudata!
> + | mr CARG3, CARG4
> | lwz TAB:TMP2, TAB:CARG2->metatable
> | li CARG4, 1-vk // ne = 0 or 1.
> | cmplwi TAB:TMP2, 0
> @@ -3110,7 +3536,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lbz TMP2, TAB:TMP2->nomm
> | andix. TMP2, TMP2, 1< | bne <1 // Or 'no __eq' flag set?
> - | mr PC, SAVE0 // Restore old PC.
> + | mr PC, SAVE1 // Restore old PC.
> | b ->vmeta_equal // Handle __eq metamethod.
> break;
>
> @@ -3151,16 +3577,16 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> vk = op == BC_ISEQN;
> | // RA = src*8, RD = num_const*8, JMP with RD = target
> |.if DUALNUM
> - | lwzux TMP0, RA, BASE
> + | lwzux CARG1, RA, BASE
> | addi PC, PC, 4
> | lwz CARG2, 4(RA)
> - | lwzux TMP1, RD, KBASE
> - | checknum cr0, TMP0
> - | lwz TMP2, -4(PC)
> - | checknum cr1, TMP1
> - | decode_RD4 TMP2, TMP2
> - | lwz CARG3, 4(RD)
> - | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> + | lwzux CARG3, RD, KBASE
> + | checknum cr0, CARG1
> + | lwz SAVE0, -4(PC)
> + | checknum cr1, CARG3
> + | decode_RD4 SAVE0, SAVE0
> + | lwz CARG4, 4(RD)
> + | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
> if (vk) {
> |->BC_ISEQN_Z:
> } else {
> @@ -3168,7 +3594,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> }
> | bne cr0, >7
> | bne cr1, >8
> - | cmpw CARG2, CARG3
> + | cmpw CARG2, CARG4
> |4:
> |.else
> if (vk) {
> @@ -3176,20 +3602,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> } else {
> |->BC_ISNEN_Z: // Dummy label.
> }
> - | lwzx TMP0, BASE, RA
> + | lwzx CARG1, BASE, RA
> | addi PC, PC, 4
> | lfdx f0, BASE, RA
> - | lwz TMP2, -4(PC)
> + | lwz SAVE0, -4(PC)
> | lfdx f1, KBASE, RD
> - | decode_RD4 TMP2, TMP2
> - | checknum TMP0
> - | addis TMP2, TMP2, -(BCBIAS_J*4 >> 16)
> + | decode_RD4 SAVE0, SAVE0
> + | checknum CARG1
> + | addis SAVE0, SAVE0, -(BCBIAS_J*4 >> 16)
> | bge >3
> | fcmpu cr0, f0, f1
> |.endif
> if (vk) {
> | bne >1
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> |1:
> |.if not FFI
> |3:
> @@ -3200,13 +3626,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |.if not FFI
> |3:
> |.endif
> - | add PC, PC, TMP2
> + | add PC, PC, SAVE0
> |2:
> }
> | ins_next
> |.if FFI
> |3:
> - | cmpwi TMP0, LJ_TCDATA
> + | cmpwi CARG1, LJ_TCDATA
> | beq ->vmeta_equal_cd
> | b <1
> |.endif
> @@ -3214,18 +3640,31 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |7: // RA is not an integer.
> | bge cr0, <3
> | // RA is a number.
> - | lfd f0, 0(RA)
> + | .FPU lfd f0, 0(RA)
> | blt cr1, >1
> | // RA is a number, RD is an integer.
> - | tonum_i f1, CARG3
> + |.if FPU
> + | tonum_i f1, CARG4
> + |.else
> + | bl ->vm_sfi2d_2
> + |.endif
> | b >2
> |
> |8: // RA is an integer, RD is a number.
> + |.if FPU
> | tonum_i f0, CARG2
> + |.else
> + | bl ->vm_sfi2d_1
> + |.endif
> |1:
> - | lfd f1, 0(RD)
> + | .FPU lfd f1, 0(RD)
> |2:
> + |.if FPU
> | fcmpu cr0, f0, f1
> + |.else
> + | blex __ledf2
> + | cmpwi CRET1, 0
> + |.endif
> | b <4
> |.endif
> break;
> @@ -3280,7 +3719,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | add PC, PC, TMP2
> } else {
> | li TMP1, LJ_TFALSE
> + |.if FPU
> | lfdx f0, BASE, RD
> + |.else
> + | lwzux CARG1, RD, BASE
> + | lwz CARG2, 4(RD)
> + |.endif
> | cmplw TMP0, TMP1
> if (op == BC_ISTC) {
> | bge >1
> @@ -3289,7 +3733,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> }
> | addis PC, PC, -(BCBIAS_J*4 >> 16)
> | decode_RD4 TMP2, INS
> + |.if FPU
> | stfdx f0, BASE, RA
> + |.else
> + | stwux CARG1, RA, BASE
> + | stw CARG2, 4(RA)
> + |.endif
> | add PC, PC, TMP2
> |1:
> }
> @@ -3324,8 +3773,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> case BC_MOV:
> | // RA = dst*8, RD = src*8
> | ins_next1
> + |.if FPU
> | lfdx f0, BASE, RD
> | stfdx f0, BASE, RA
> + |.else
> + | lwzux TMP0, RD, BASE
> + | lwz TMP1, 4(RD)
> + | stwux TMP0, RA, BASE
> + | stw TMP1, 4(RA)
> + |.endif
> | ins_next2
> break;
> case BC_NOT:
> @@ -3427,44 +3883,65 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
> ||switch (vk) {
> ||case 0:
> - | lwzx TMP1, BASE, RB
> + | lwzx CARG1, BASE, RB
> | .if DUALNUM
> - | lwzx TMP2, KBASE, RC
> + | lwzx CARG3, KBASE, RC
> | .endif
> + | .if FPU
> | lfdx f14, BASE, RB
> | lfdx f15, KBASE, RC
> + | .else
> + | add TMP1, BASE, RB
> + | add TMP2, KBASE, RC
> + | lwz CARG2, 4(TMP1)
> + | lwz CARG4, 4(TMP2)
> + | .endif
> | .if DUALNUM
> - | checknum cr0,
TMP1
> - | checknum cr1, TMP2
> + | checknum cr0, CARG1
> + | checknum cr1, CARG3
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | bge ->vmeta_arith_vn
> | .else
> - | checknum TMP1; bge ->vmeta_arith_vn
> + | checknum CARG1; bge ->vmeta_arith_vn
> | .endif
> || break;
> ||case 1:
> - | lwzx TMP1, BASE, RB
> + | lwzx CARG1, BASE, RB
> | .if DUALNUM
> - | lwzx TMP2, KBASE, RC
> + | lwzx CARG3, KBASE, RC
> | .endif
> + | .if FPU
> | lfdx f15, BASE, RB
> | lfdx f14, KBASE, RC
> + | .else
> + | add TMP1, BASE, RB
> + | add TMP2, KBASE, RC
> + | lwz CARG2, 4(TMP1)
> + | lwz CARG4, 4(TMP2)
> + | .endif
> | .if DUALNUM
> - | checknum cr0, TMP1
> - | checknum cr1, TMP2
> + | checknum cr0, CARG1
> + | checknum cr1, CARG3
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | bge ->vmeta_arith_nv
> | .else
> - | checknum TMP1; bge ->vmeta_arith_nv
> + | checknum CARG1; bge ->vmeta_arith_nv
> | .endif
> || break;
> ||default:
> - | lwzx TMP1, BASE, RB
> - | lwzx TMP2, BASE, RC
> + | lwzx CARG1, BASE, RB
> + | lwzx CARG3, BASE, RC
> + | .if FPU
> | lfdx f14, BASE, RB
> | lfdx f15, BASE, RC
> - | checknum cr0, TMP1
> - | checknum cr1, TMP2
> + | .else
> + | add TMP1, BASE, RB
> + | add TMP2, BASE, RC
> + | lwz CARG2, 4(TMP1)
> + | lwz CARG4, 4(TMP2)
> + | .endif
> + | checknum cr0, CARG1
> + | checknum cr1, CARG3
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | bge ->vmeta_arith_vv
> || break;
> @@ -3498,48 +3975,78 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | fsub a, b, a // b - floor(b/c)*c
> |.endmacro
> |
> + |.macro sfpmod
> + |->BC_MODVN_Z:
> + | stw CARG1, SFSAVE_1
> + | stw CARG2, SFSAVE_2
> + | mr SAVE0, CARG3
> + | mr SAVE1, CARG4
> + | blex __divdf3
> + | blex floor
> + | mr CARG3, SAVE0
> + | mr CARG4, SAVE1
> + | blex __muldf3
> + | mr CARG3, CRET1
> + | mr CARG4, CRET2
> + | lwz CARG1, SFSAVE_1
> + | lwz CARG2, SFSAVE_2
> + | blex __subdf3
> + |.endmacro
> + |
> |.macro ins_arithfp, fpins
> | ins_arithpre
> |.if "fpins" == "fpmod_"
> | b ->BC_MODVN_Z // Avoid 3 copies.
It's slow anyway.
> - |.else
> + |.elif FPU
> | fpins f0, f14, f15
> | ins_next1
> | stfdx f0, BASE, RA
> | ins_next2
> + |.else
> + | blex __divdf3 // Only soft-float div uses this macro.
> + | ins_next1
> + | stwux CRET1, RA, BASE
> + | stw CRET2, 4(RA)
> + | ins_next2
> |.endif
> |.endmacro
> |
> - |.macro ins_arithdn, intins, fpins
> + |.macro ins_arithdn, intins, fpins, fpcall
> | // RA = dst*8, RB = src1*8, RC = src2*8 | num_const*8
> ||vk = ((int)op - BC_ADDVN) / (BC_ADDNV-BC_ADDVN);
> ||switch (vk) {
> ||case 0:
> - | lwzux TMP1, RB, BASE
> - | lwzux TMP2, RC, KBASE
> - | lwz CARG1, 4(RB)
> - | checknum cr0, TMP1
> - | lwz CARG2, 4(RC)
> + | lwzux CARG1, RB, BASE
> + | lwzux CARG3, RC, KBASE
> + | lwz CARG2, 4(RB)
> + | checknum cr0, CARG1
> + | lwz CARG4, 4(RC)
> + | checknum cr1, CARG3
> || break;
> ||case 1:
> - | lwzux TMP1, RB, BASE
> - | lwzux TMP2, RC, KBASE
> - | lwz CARG2, 4(RB)
> - | checknum cr0, TMP1
> - | lwz CARG1, 4(RC)
> + | lwzux CARG3, RB, BASE
> + | lwzux CARG1, RC, KBASE
> + | lwz CARG4, 4(RB)
> + | checknum cr0, CARG3
> + | lwz CARG2, 4(RC)
> + | checknum cr1, CARG1
> || break;
> ||default:
> - | lwzux TMP1, RB, BASE
> - | lwzux TMP2, RC, BASE
> - | lwz CARG1, 4(RB)
> - | checknum cr0, TMP1
> - | lwz CARG2, 4(RC)
> + | lwzux CARG1, RB, BASE
> + | lwzux CARG3, RC, BASE
> + | lwz CARG2, 4(RB)
> + | checknum cr0, CARG1
> + | lwz CARG4, 4(RC)
> + | checknum cr1, CARG3
> || break;
> ||}
> - | checknum cr1, TMP2
> | bne >5
> | bne cr1, >5
> - | intins CARG1, CARG1, CARG2
> + |.if "intins" == "intmod"
> + | mr CARG1, CARG2
> + | mr CARG2, CARG4
> + |.endif
> + | intins CARG1, CARG2, CARG4
> | bso >4
> |1:
> | ins_next1
> @@ -3551,29 +4058,40 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | checkov TMP0, <1 // Ignore unrelated overflow.
> | ins_arithfallback b
> |5: // FP variant.
> + |.if FPU
> ||if (vk == 1) {
> | lfd f15, 0(RB)
> - | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | lfd f14, 0(RC)
> ||} else {
> | lfd f14, 0(RB)
> - | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | lfd f15, 0(RC)
> ||}
> + |.endif
> + | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | ins_arithfallback bge
> |.if "fpins" == "fpmod_"
> | b ->BC_MODVN_Z // Avoid 3 copies. It's slow anyway.
> |.else
> + |.if FPU
> | fpins f0, f14, f15
> - | ins_next1
> | stfdx f0, BASE, RA
> + |.else
> + |.if "fpcall" == "sfpmod"
> + | sfpmod
> + |.else
> + | blex fpcall
> + |.endif
> + | stwux CRET1, RA, BASE
> + | stw CRET2, 4(RA)
> + |.endif
> + | ins_next1
> | b <2
> |.endif
> |.endmacro
> |
> - |.macro ins_arith, intins, fpins
> + |.macro ins_arith, intins, fpins, fpcall
> |.if DUALNUM
> - | ins_arithdn intins, fpins
> + | ins_arithdn intins, fpins, fpcall
> |.else
> | ins_arithfp fpins
> |.endif
> @@ -3588,9 +4106,9 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | addo. TMP0, TMP0, TMP3
> | add y, a, b
> |.endmacro
> - | ins_arith addo32., fadd
> + | ins_arith addo32., fadd, __adddf3
> |.else
> - | ins_arith addo., fadd
> + | ins_arith addo., fadd, __adddf3
> |.endif
> break;
> case BC_SUBVN: case BC_SUBNV: case BC_SUBVV:
> @@ -3602,36 +4120,48 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | subo. TMP0, TMP0, TMP3
> | sub y, a, b
> |.endmacro
> - | ins_arith subo32., fsub
> + | ins_arith subo32., fsub, __subdf3
> |.else
> - | ins_arith subo., fsub
> + | ins_arith subo., fsub, __subdf3
> |.endif
> break;
> case BC_MULVN: case BC_MULNV: case BC_MULVV:
> - | ins_arith mullwo., fmul
> + | ins_arith mullwo., fmul, __muldf3
> break;
> case BC_DIVVN: case BC_DIVNV: case BC_DIVVV:
> | ins_arithfp fdiv
> break;
> case BC_MODVN:
> - | ins_arith intmod, fpmod
> + | ins_arith intmod, fpmod, sfpmod
> break;
> case BC_MODNV: case BC_MODVV:
> - | ins_arith intmod, fpmod_
> + | ins_arith intmod, fpmod_, sfpmod
> break;
> case BC_POW:
> | // NYI: (partial) integer arithmetic.
> - | lwzx TMP1, BASE, RB
> + | lwzx CARG1, BASE, RB
> + | lwzx CARG3, BASE, RC
> + |.if FPU
> | lfdx FARG1, BASE, RB
> - | lwzx TMP2, BASE, RC
> | lfdx FARG2, BASE, RC
> - | checknum cr0, TMP1
> - | checknum cr1, TMP2
> + |.else
> + | add TMP1, BASE, RB
> + | add TMP2, BASE, RC
> + | lwz CARG2, 4(TMP1)
> + | lwz CARG4, 4(TMP2)
> + |.endif
> + | checknum cr0, CARG1
> + | checknum cr1, CARG3
> | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
> | bge ->vmeta_arith_vv
> | blex pow
> | ins_next1
> + |.if FPU
> | stfdx FARG1, BASE, RA
> + |.else
> + | stwux CARG1, RA, BASE
> + | stw CARG2, 4(RA)
> + |.endif
> | ins_next2
> break;
>
> @@ -3651,8 +4181,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lp BASE, L->base
> | bne ->vmeta_binop
> | ins_next1
> + |.if FPU
> | lfdx f0, BASE, SAVE0 // Copy result from RB to RA.
> | stfdx f0, BASE, RA
> + |.else
> + | lwzux TMP0, SAVE0, BASE
> + | lwz TMP1, 4(SAVE0)
> + | stwux TMP0, RA, BASE
> + | stw TMP1, 4(RA)
> + |.endif
> | ins_next2
> break;
>
> @@ -3715,8 +4252,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> case BC_KNUM:
> | // RA = dst*8, RD = num_const*8
> | ins_next1
> + |.if FPU
> | lfdx f0, KBASE, RD
> | stfdx f0, BASE, RA
> + |.else
> + | lwzux TMP0, RD, KBASE
> + | lwz TMP1, 4(RD)
> + | stwux TMP0, RA, BASE
> + | stw TMP1, 4(RA)
> + |.endif
> | ins_next2
> break;
> case BC_KPRI:
> @@ -3749,8 +4293,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwzx UPVAL:RB, LFUNC:RB, RD
> | ins_next1
> | lwz TMP1, UPVAL:RB->v
> + |.if FPU
> | lfd f0, 0(TMP1)
> | stfdx f0, BASE, RA
> + |.else
> + | lwz TMP2, 0(TMP1)
> + | lwz TMP3, 4(TMP1)
> + | stwux TMP2, RA, BASE
> + | stw TMP3, 4(RA)
> + |.endif
> | ins_next2
> break;
> case BC_USETV:
> @@ -3758,14 +4309,24 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwz LFUNC:RB, FRAME_FUNC(BASE)
> | srwi RA, RA, 1
> | addi RA, RA, offsetof(GCfuncL, uvptr)
> + |.if FPU
> | lfdux f0, RD, BASE
> + |.else
> + | lwzux CARG1, RD, BASE
>
+ | lwz CARG3, 4(RD)
> + |.endif
> | lwzx UPVAL:RB, LFUNC:RB, RA
> | lbz TMP3, UPVAL:RB->marked
> | lwz CARG2, UPVAL:RB->v
> | andix. TMP3, TMP3, LJ_GC_BLACK // isblack(uv)
> | lbz TMP0, UPVAL:RB->closed
> | lwz TMP2, 0(RD)
> + |.if FPU
> | stfd f0, 0(CARG2)
> + |.else
> + | stw CARG1, 0(CARG2)
> + | stw CARG3, 4(CARG2)
> + |.endif
> | cmplwi cr1, TMP0, 0
> | lwz TMP1, 4(RD)
> | cror 4*cr0+eq, 4*cr0+eq, 4*cr1+eq
> @@ -3821,11 +4382,21 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwz LFUNC:RB, FRAME_FUNC(BASE)
> | srwi RA, RA, 1
> | addi RA, RA, offsetof(GCfuncL, uvptr)
> + |.if FPU
> | lfdx f0, KBASE, RD
> + |.else
> + | lwzux TMP2, RD, KBASE
> + | lwz TMP3, 4(RD)
> + |.endif
> | lwzx UPVAL:RB, LFUNC:RB, RA
> | ins_next1
> | lwz TMP1, UPVAL:RB->v
> + |.if FPU
> | stfd f0, 0(TMP1)
> + |.else
> + | stw TMP2, 0(TMP1)
> + | stw TMP3, 4(TMP1)
> + |.endif
> | ins_next2
> break;
> case BC_USETP:
> @@ -3973,11 +4544,21 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> |.endif
> | ble ->vmeta_tgetv // Integer key and in array part?
> | lwzx TMP0, TMP1, TMP2
> + |.if FPU
> | lfdx f14, TMP1, TMP2
> + |.else
> + | lwzux SAVE0, TMP1, TMP2
> + | lwz SAVE1, 4(TMP1)
> + |.endif
> | checknil TMP0; beq >2
> |1:
> | ins_next1
> + |.if FPU
> | stfdx f14, BASE, RA
> + |.else
> + | stwux SAVE0, RA, BASE
> + | stw SAVE1, 4(RA)
> + |.endif
> | ins_next2
> |
> |2: // Check for __index if table value is nil.
> @@ -4053,12 +4634,22 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
> | lwz TMP1, TAB:RB->asize
> | lwz TMP2, TAB:RB->array
> | cmplw TMP0, TMP1; bge ->vmeta_tgetb
> + |.if FPU
> | lwzx TMP1, TMP2, RC
> | lfdx f0, TMP2, RC
> + |.else
> + | lwzux TMP1, TMP2, RC
> + | lwz TMP3, 4(TMP2)
> + |.endif
> | checknil TMP1; beq >5
> |1:
> | ins_next1
> + |.if FPU
> | stfdx f0, BASE, RA
> + |.else
> + | stwux TMP1, RA, BASE
> + | stw TMP3, 4(RA)
> + |.endif
> | ins_next2
> |
> |5: // Check for __index if table value is nil.
> @@ -4088,10 +4679,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | cmplw TMP0, CARG2 > | slwi TMP2, CARG2, 3 > | ble ->vmeta_tgetr // In array part? > + |.if FPU > | lfdx f14, TMP1, TMP2 > + |.else > + | lwzux SAVE0, TMP2, TMP1 > + | lwz SAVE1, 4(TMP2) > + |.endif > |->BC_TGETR_Z: > | ins_next1 > + |.if FPU > | stfdx f14, BASE, RA > + |.else > + | stwux SAVE0, RA, BASE > + | stw SAVE1, 4(RA) > + |.endif > | ins_next2 > break; > > @@ -4132,11 +4733,22 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | ble ->vmeta_tsetv // Integer key and in array part? > | lwzx TMP2, TMP1, TMP0 > | lbz TMP3, TAB:RB->marked > + |.if FPU > | lfdx f14, BASE, RA > + |.else > + | add SAVE1, BASE, RA > + | lwz SAVE0, 0(SAVE1) > + | lwz SAVE1, 4(SAVE1) > + |.endif > | checknil TMP2; beq >3 > |1: > | andix. TMP2, TMP3, LJ_GC_BLACK // isblack(table) > + |.if FPU > | stfdx f14, TMP1, TMP0 > + |.else > + | stwux SAVE0, TMP1, TMP0 > + | stw SAVE1, 4(TMP1) > + |.endif > | bne >7 > |2: > | ins_next > @@ -4177,7 +4789,13 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | lwz NODE:TMP2, TAB:RB->node > | stb ZERO, TAB:RB->nomm // Clear metamethod cache. > | and TMP1, TMP1, TMP0 // idx = str->hash & tab->hmask > + |.if FPU > | lfdx f14, BASE, RA > + |.else > + | add CARG2, BASE, RA > + | lwz SAVE0, 0(CARG2) > + | lwz SAVE1, 4(CARG2) > + |.endif > | slwi TMP0, TMP1, 5 > | slwi TMP1, TMP1, 3 > | sub TMP1, TMP0, TMP1 > @@ -4193,7 +4811,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | checknil CARG2; beq >4 // Key found, but nil value? > |2: > | andix. TMP0, TMP3, LJ_GC_BLACK // isblack(table) > + |.if FPU > | stfd f14, NODE:TMP2->val > + |.else > + | stw SAVE0, NODE:TMP2->val.u32.hi > + | stw SAVE1, NODE:TMP2->val.u32.lo > + |.endif > | bne >7 > |3: > | ins_next > @@ -4232,7 +4855,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | bl extern lj_tab_newkey // (lua_State *L, GCtab *t, TValue *k) > | // Returns TValue *. 
> | lp BASE, L->base > + |.if FPU > | stfd f14, 0(CRET1) > + |.else > + | stw SAVE0, 0(CRET1) > + | stw SAVE1, 4(CRET1) > + |.endif > | b <3 // No 2nd write barrier needed. > | > |7: // Possible table write barrier for the value. Skip valiswhite check. > @@ -4249,13 +4877,24 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | lwz TMP2, TAB:RB->array > | lbz TMP3, TAB:RB->marked > | cmplw TMP0, TMP1 > + |.if FPU > | lfdx f14, BASE, RA > + |.else > + | add CARG2, BASE, RA > + | lwz SAVE0, 0(CARG2) > + | lwz SAVE1, 4(CARG2) > + |.endif > | bge ->vmeta_tsetb > | lwzx TMP1, TMP2, RC > | checknil TMP1; beq >5 > |1: > | andix. TMP0, TMP3, LJ_GC_BLACK // isblack(table) > + |.if FPU > | stfdx f14, TMP2, RC > + |.else > + | stwux SAVE0, RC, TMP2 > + | stw SAVE1, 4(RC) > + |.endif > | bne >7 > |2: > | ins_next > @@ -4295,10 +4934,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > |2: > | cmplw TMP0, CARG3 > | slwi TMP2, CARG3, 3 > + |.if FPU > | lfdx f14, BASE, RA > + |.else > + | lwzux SAVE0, RA, BASE > + | lwz SAVE1, 4(RA) > + |.endif > | ble ->vmeta_tsetr // In array part? > | ins_next1 > + |.if FPU > | stfdx f14, TMP1, TMP2 > + |.else > + | stwux SAVE0, TMP1, TMP2 > + | stw SAVE1, 4(TMP1) > + |.endif > | ins_next2 > | > |7: // Possible table write barrier for the value. Skip valiswhite check. > @@ -4328,10 +4977,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | add TMP1, TMP1, TMP0 > | andix. TMP0, TMP3, LJ_GC_BLACK // isblack(table) > |3: // Copy result slots to table. 
> + |.if FPU > | lfd f0, 0(RA) > + |.else > + | lwz SAVE0, 0(RA) > + | lwz SAVE1, 4(RA) > + |.endif > | addi RA, RA, 8 > | cmpw cr1, RA, TMP2 > + |.if FPU > | stfd f0, 0(TMP1) > + |.else > + | stw SAVE0, 0(TMP1) > + | stw SAVE1, 4(TMP1) > + |.endif > | addi TMP1, TMP1, 8 > | blt cr1, <3 > | bne >7 > @@ -4398,9 +5057,20 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | beq cr1, >3 > |2: > | addi TMP3, TMP2, 8 > + |.if FPU > | lfdx f0, RA, TMP2 > + |.else > + | add CARG3, RA, TMP2 > + | lwz CARG1, 0(CARG3) > + | lwz CARG2, 4(CARG3) > + |.endif > | cmplw cr1, TMP3, NARGS8:RC > + |.if FPU > | stfdx f0, BASE, TMP2 > + |.else > + | stwux CARG1, TMP2, BASE > + | stw CARG2, 4(TMP2) > + |.endif > | mr TMP2, TMP3 > | bne cr1, <2 > |3: > @@ -4433,14 +5103,28 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | add BASE, BASE, RA > | lwz TMP1, -24(BASE) > | lwz LFUNC:RB, -20(BASE) > + |.if FPU > | lfd f1, -8(BASE) > | lfd f0, -16(BASE) > + |.else > + | lwz CARG1, -8(BASE) > + | lwz CARG2, -4(BASE) > + | lwz CARG3, -16(BASE) > + | lwz CARG4, -12(BASE) > + |.endif > | stw TMP1, 0(BASE) // Copy callable. > | stw LFUNC:RB, 4(BASE) > | checkfunc TMP1 > - | stfd f1, 16(BASE) // Copy control var. > | li NARGS8:RC, 16 // Iterators get 2 arguments. > + |.if FPU > + | stfd f1, 16(BASE) // Copy control var. > | stfdu f0, 8(BASE) // Copy state. > + |.else > + | stw CARG1, 16(BASE) // Copy control var. > + | stw CARG2, 20(BASE) > + | stwu CARG3, 8(BASE) // Copy state. > + | stw CARG4, 4(BASE) > + |.endif > | bne ->vmeta_call > | ins_call > break; > @@ -4461,7 +5145,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | slwi TMP3, RC, 3 > | bge >5 // Index points after array part? 
> | lwzx TMP2, TMP1, TMP3 > + |.if FPU > | lfdx f0, TMP1, TMP3 > + |.else > + | lwzux CARG1, TMP3, TMP1 > + | lwz CARG2, 4(TMP3) > + |.endif > | checknil TMP2 > | lwz INS, -4(PC) > | beq >4 > @@ -4473,7 +5162,12 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > |.endif > | addi RC, RC, 1 > | addis TMP3, PC, -(BCBIAS_J*4 >> 16) > + |.if FPU > | stfd f0, 8(RA) > + |.else > + | stw CARG1, 8(RA) > + | stw CARG2, 12(RA) > + |.endif > | decode_RD4 TMP1, INS > | stw RC, -4(RA) // Update control var. > | add PC, TMP1, TMP3 > @@ -4498,17 +5192,38 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | slwi RB, RC, 3 > | sub TMP3, TMP3, RB > | lwzx RB, TMP2, TMP3 > + |.if FPU > | lfdx f0, TMP2, TMP3 > + |.else > + | add CARG3, TMP2, TMP3 > + | lwz CARG1, 0(CARG3) > + | lwz CARG2, 4(CARG3) > + |.endif > | add NODE:TMP3, TMP2, TMP3 > | checknil RB > | lwz INS, -4(PC) > | beq >7 > + |.if FPU > | lfd f1, NODE:TMP3->key > + |.else > + | lwz CARG3, NODE:TMP3->key.u32.hi > + | lwz CARG4, NODE:TMP3->key.u32.lo > + |.endif > | addis TMP2, PC, -(BCBIAS_J*4 >> 16) > + |.if FPU > | stfd f0, 8(RA) > + |.else > + | stw CARG1, 8(RA) > + | stw CARG2, 12(RA) > + |.endif > | add RC, RC, TMP0 > | decode_RD4 TMP1, INS > + |.if FPU > | stfd f1, 0(RA) > + |.else > + | stw CARG3, 0(RA) > + | stw CARG4, 4(RA) > + |.endif > | addi RC, RC, 1 > | add PC, TMP1, TMP2 > | stw RC, -4(RA) // Update control var. > @@ -4574,9 +5289,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | subi TMP2, TMP2, 16 > | ble >2 // No vararg slots? > |1: // Copy vararg slots to destination slots. > + |.if FPU > | lfd f0, 0(RC) > + |.else > + | lwz CARG1, 0(RC) > + | lwz CARG2, 4(RC) > + |.endif > | addi RC, RC, 8 > + |.if FPU > | stfd f0, 0(RA) > + |.else > + | stw CARG1, 0(RA) > + | stw CARG2, 4(RA) > + |.endif > | cmplw RA, TMP2 > | cmplw cr1, RC, TMP3 > | bge >3 // All destination slots filled? 
> @@ -4599,9 +5324,19 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | addi MULTRES, TMP1, 8 > | bgt >7 > |6: > + |.if FPU > | lfd f0, 0(RC) > + |.else > + | lwz CARG1, 0(RC) > + | lwz CARG2, 4(RC) > + |.endif > | addi RC, RC, 8 > + |.if FPU > | stfd f0, 0(RA) > + |.else > + | stw CARG1, 0(RA) > + | stw CARG2, 4(RA) > + |.endif > | cmplw RC, TMP3 > | addi RA, RA, 8 > | blt <6 // More vararg slots? > @@ -4652,14 +5387,38 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | li TMP1, 0 > |2: > | addi TMP3, TMP1, 8 > + |.if FPU > | lfdx f0, RA, TMP1 > + |.else > + | add CARG3, RA, TMP1 > + | lwz CARG1, 0(CARG3) > + | lwz CARG2, 4(CARG3) > + |.endif > | cmpw TMP3, RC > + |.if FPU > | stfdx f0, TMP2, TMP1 > + |.else > + | add CARG3, TMP2, TMP1 > + | stw CARG1, 0(CARG3) > + | stw CARG2, 4(CARG3) > + |.endif > | beq >3 > | addi TMP1, TMP3, 8 > + |.if FPU > | lfdx f1, RA, TMP3 > + |.else > + | add CARG3, RA, TMP3 > + | lwz CARG1, 0(CARG3) > + | lwz CARG2, 4(CARG3) > + |.endif > | cmpw TMP1, RC > + |.if FPU > | stfdx f1, TMP2, TMP3 > + |.else > + | add CARG3, TMP2, TMP3 > + | stw CARG1, 0(CARG3) > + | stw CARG2, 4(CARG3) > + |.endif > | bne <2 > |3: > |5: > @@ -4701,8 +5460,15 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | subi TMP2, BASE, 8 > | decode_RB8 RB, INS > if (op == BC_RET1) { > + |.if FPU > | lfd f0, 0(RA) > | stfd f0, 0(TMP2) > + |.else > + | lwz CARG1, 0(RA) > + | lwz CARG2, 4(RA) > + | stw CARG1, 0(TMP2) > + | stw CARG2, 4(TMP2) > + |.endif > } > |5: > | cmplw RB, RD > @@ -4763,11 +5529,11 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > |4: > | stw CARG1, FORL_IDX*8+4(RA) > } else { > - | lwz TMP3, FORL_STEP*8(RA) > + | lwz SAVE0, FORL_STEP*8(RA) > | lwz CARG3, FORL_STEP*8+4(RA) > | lwz TMP2, FORL_STOP*8(RA) > | lwz CARG2, FORL_STOP*8+4(RA) > - | cmplw cr7, TMP3, TISNUM > + | cmplw cr7, SAVE0, TISNUM > | cmplw cr1, TMP2, TISNUM > | crand 4*cr0+eq, 4*cr0+eq, 4*cr7+eq > | crand 4*cr0+eq, 4*cr0+eq, 4*cr1+eq 
> @@ -4810,41 +5576,80 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > if (vk) { > |.if DUALNUM > |9: // FP loop. > + |.if FPU > | lfd f1, FORL_IDX*8(RA) > |.else > + | lwz CARG1, FORL_IDX*8(RA) > + | lwz CARG2, FORL_IDX*8+4(RA) > + |.endif > + |.else > | lfdux f1, RA, BASE > |.endif > + |.if FPU > | lfd f3, FORL_STEP*8(RA) > | lfd f2, FORL_STOP*8(RA) > - | lwz TMP3, FORL_STEP*8(RA) > | fadd f1, f1, f3 > | stfd f1, FORL_IDX*8(RA) > + |.else > + | lwz CARG3, FORL_STEP*8(RA) > + | lwz CARG4, FORL_STEP*8+4(RA) > + | mr SAVE1, RD > + | blex __adddf3 > + | mr RD, SAVE1 > + | stw CRET1, FORL_IDX*8(RA) > + | stw CRET2, FORL_IDX*8+4(RA) > + | lwz CARG3, FORL_STOP*8(RA) > + | lwz CARG4, FORL_STOP*8+4(RA) > + |.endif > + | lwz SAVE0, FORL_STEP*8(RA) > } else { > |.if DUALNUM > |9: // FP loop. > |.else > | lwzux TMP1, RA, BASE > - | lwz TMP3, FORL_STEP*8(RA) > + | lwz SAVE0, FORL_STEP*8(RA) > | lwz TMP2, FORL_STOP*8(RA) > | cmplw cr0, TMP1, TISNUM > - | cmplw cr7, TMP3, TISNUM > + | cmplw cr7, SAVE0, TISNUM > | cmplw cr1, TMP2, TISNUM > |.endif > + |.if FPU > | lfd f1, FORL_IDX*8(RA) > + |.else > + | lwz CARG1, FORL_IDX*8(RA) > + | lwz CARG2, FORL_IDX*8+4(RA) > + |.endif > | crand 4*cr0+lt, 4*cr0+lt, 4*cr7+lt > | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt > + |.if FPU > | lfd f2, FORL_STOP*8(RA) > + |.else > + | lwz CARG3, FORL_STOP*8(RA) > + | lwz CARG4, FORL_STOP*8+4(RA) > + |.endif > | bge ->vmeta_for > } > - | cmpwi cr6, TMP3, 0 > + | cmpwi cr6, SAVE0, 0 > if (op != BC_JFORL) { > | srwi RD, RD, 1 > } > + |.if FPU > | stfd f1, FORL_EXT*8(RA) > + |.else > + | stw CARG1, FORL_EXT*8(RA) > + | stw CARG2, FORL_EXT*8+4(RA) > + |.endif > if (op != BC_JFORL) { > | add RD, PC, RD > } > + |.if FPU > | fcmpu cr0, f1, f2 > + |.else > + | mr SAVE1, RD > + | blex __ledf2 > + | cmpwi CRET1, 0 > + | mr RD, SAVE1 > + |.endif > if (op == BC_JFORI) { > | addis PC, RD, -(BCBIAS_J*4 >> 16) > }