Date: Sun, 20 Aug 2023 12:48:18 +0300
To: Sergey Kaplun
References: <04224626635ddc4c5bb3341088dddd0d310f7e9f.1692089299.git.skaplun@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH luajit 2/5] Remove pow() splitting and cleanup backends.
From: Maxim Kokryashkin via Tarantool-patches
Reply-To: Maxim Kokryashkin
Cc: tarantool-patches@dev.tarantool.org

Hi, Sergey! Thanks for the fixes! LGTM now, see my responses below. On Thu, Aug 17, 2023 at 06:33:31PM +0300, Sergey Kaplun wrote: > Hi Maxim! > Thanks for the review! > Updated considering your comments. > > On 17.08.23, Maxim Kokryashkin wrote: > > Hi, Sergey! > > Thanks for the patch! > > Please consider my comments below. > > > > On Tue, Aug 15, 2023 at 12:36:28PM +0300, Sergey Kaplun wrote: > > > From: Mike Pall > > > > > > (cherry-picked from commit b2307c8ad817e350d65cc909a579ca2f77439682) > > > > > > The JIT engine tries to split b^c to exp2(c * log2(b)) with attempt to > > Typo: s/with attempt/with an attempt/ > > Fixed. > > > > rejoin them later for some backends. It adds a dependency on C99 > > > exp2() and log2(), which aren't part of some libm implementations. > > > Also, for some cases for IEEE754 we can see, that exp2(log2(x)) != x, > > > due to mathematical functions accuracy and double precision > > > restrictions. So, the values on the JIT slots and Lua stack are > > > inconsistent. > > > > There is a lot to it. There are changes in emission, fold optimizations, > > narrowing, etc. Maybe it is worth mentioning some key changes that > > happened as a result of that? That way, this changeset is easier to absorb. > > It's mentioned below, or I don't understand the idea. Well, I think my brain just short-circuited or something. Yep, everything is ok. > > > > > > > > > This patch removes splitting of pow operator, so IR_POW is emitting for > > Typo: s/removes/removes the/ > > Fixed. > > > > all cases (except power of 0.5 replaced with sqrt operation). 
> > Typo: s/except/except for the/ > > Typo: s/0.5/0.5, which is/ > > Typo: s/with sqrt/with the sqrt/ > > Fixed all. > > > > > > > Also this patch does some refactoring: > > > > > > * Functions `asm_pow()`, `asm_mod()`, `asm_ldexp()`, `asm_div()` > > > (replaced with `asm_fpdiv()` for CPU architectures) are moved to the > > Typo: s/to the/to/ > > Fixed. > > > > as far as their implementation is generic for all > > > architectures. > > > * Fusing of IR_HREF + IR_EQ/IR_NE moved to a `asm_fuseequal()`. > > Typo: s/moved/was moved/ > > Typo: s/to a/to/ > > Fixed all. > > > > * Since `lj_vm_exp2()` subroutine and `IRFPM_EXP2` are removed as no > > > longer used. > > I can't understand what this sentence means, please rephrase it. > > Removed "Since" as misleading. > > > > > > > > What about changes with `asm_cnew`? I think you should mention them too. > > Added. > > > > Sergey Kaplun: > > > * added the description and the test for the problem > > > > > > Part of tarantool/tarantool#8825 > > > --- > > > src/lj_arch.h | 3 - > > > src/lj_asm.c | 106 +++++++++++------- > > > src/lj_asm_arm.h | 10 +- > > > src/lj_asm_arm64.h | 39 +------ > > > src/lj_asm_mips.h | 38 +------ > > > src/lj_asm_ppc.h | 9 +- > > > src/lj_asm_x86.h | 37 +----- > > > src/lj_ir.h | 2 +- > > > src/lj_ircall.h | 1 - > > > src/lj_opt_fold.c | 18 ++- > > > src/lj_opt_narrow.c | 20 +--- > > > src/lj_opt_split.c | 21 ---- > > > src/lj_vm.h | 5 - > > > src/lj_vmmath.c | 8 -- > > > .../lj-9-pow-inconsistencies.test.lua | 63 +++++++++++ > > > 15 files changed, 158 insertions(+), 222 deletions(-) > > > create mode 100644 test/tarantool-tests/lj-9-pow-inconsistencies.test.lua > > > > > > diff --git a/src/lj_arch.h b/src/lj_arch.h > > > index cf31a291..3bdbe84e 100644 > > > --- a/src/lj_arch.h > > > +++ b/src/lj_arch.h > > > @@ -607,9 +607,6 @@ > > > #if defined(__ANDROID__) || defined(__symbian__) || LJ_TARGET_XBOX360 || LJ_TARGET_WINDOWS > > > #define LUAJIT_NO_LOG2 > > > #endif > > > -#if 
defined(__symbian__) || LJ_TARGET_WINDOWS > > > -#define LUAJIT_NO_EXP2 > > > -#endif > > > #if LJ_TARGET_CONSOLE || (LJ_TARGET_IOS && __IPHONE_OS_VERSION_MIN_REQUIRED >= __IPHONE_8_0) > > > #define LJ_NO_SYSTEM 1 > > > #endif > > > diff --git a/src/lj_asm.c b/src/lj_asm.c > > > index b352fd35..a6906b19 100644 > > > --- a/src/lj_asm.c > > > +++ b/src/lj_asm.c > > > @@ -1356,32 +1356,6 @@ static void asm_call(ASMState *as, IRIns *ir) > > > asm_gencall(as, ci, args); > > > } > > > > > > -#if !LJ_SOFTFP32 > > > -static void asm_fppow(ASMState *as, IRIns *ir, IRRef lref, IRRef rref) > > > -{ > > > - const CCallInfo *ci = &lj_ir_callinfo[IRCALL_pow]; > > > - IRRef args[2]; > > > - args[0] = lref; > > > - args[1] = rref; > > > - asm_setupresult(as, ir, ci); > > > - asm_gencall(as, ci, args); > > > -} > > > - > > > -static int asm_fpjoin_pow(ASMState *as, IRIns *ir) > > > -{ > > > - IRIns *irp = IR(ir->op1); > > > - if (irp == ir-1 && irp->o == IR_MUL && !ra_used(irp)) { > > > - IRIns *irpp = IR(irp->op1); > > > - if (irpp == ir-2 && irpp->o == IR_FPMATH && > > > - irpp->op2 == IRFPM_LOG2 && !ra_used(irpp)) { > > > - asm_fppow(as, ir, irpp->op1, irp->op2); > > > - return 1; > > > - } > > > - } > > > - return 0; > > > -} > > > -#endif > > > - > > > /* -- PHI and loop handling ----------------------------------------------- */ > > > > > > /* Break a PHI cycle by renaming to a free register (evict if needed). 
*/ > > > @@ -1652,6 +1626,62 @@ static void asm_loop(ASMState *as) > > > #error "Missing assembler for target CPU" > > > #endif > > > > > > +/* -- Common instruction helpers ------------------------------------------ */ > > > + > > > +#if !LJ_SOFTFP32 > > > +#if !LJ_TARGET_X86ORX64 > > > +#define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp) > > > +#define asm_fppowi(as, ir) asm_callid(as, ir, IRCALL_lj_vm_powi) > > > +#endif > > > + > > > +static void asm_pow(ASMState *as, IRIns *ir) > > > +{ > > > +#if LJ_64 && LJ_HASFFI > > > + if (!irt_isnum(ir->t)) > > > + asm_callid(as, ir, irt_isi64(ir->t) ? IRCALL_lj_carith_powi64 : > > > + IRCALL_lj_carith_powu64); > > > + else > > > +#endif > > > + if (irt_isnum(IR(ir->op2)->t)) > > > + asm_callid(as, ir, IRCALL_pow); > > > + else > > > + asm_fppowi(as, ir); > > > +} > > > + > > > +static void asm_div(ASMState *as, IRIns *ir) > > > +{ > > > +#if LJ_64 && LJ_HASFFI > > > + if (!irt_isnum(ir->t)) > > > + asm_callid(as, ir, irt_isi64(ir->t) ? IRCALL_lj_carith_divi64 : > > > + IRCALL_lj_carith_divu64); > > > + else > > > +#endif > > > + asm_fpdiv(as, ir); > > > +} > > > +#endif > > > + > > > +static void asm_mod(ASMState *as, IRIns *ir) > > > +{ > > > +#if LJ_64 && LJ_HASFFI > > > + if (!irt_isint(ir->t)) > > > + asm_callid(as, ir, irt_isi64(ir->t) ? IRCALL_lj_carith_modi64 : > > > + IRCALL_lj_carith_modu64); > > > + else > > > +#endif > > > + asm_callid(as, ir, IRCALL_lj_vm_modi); > > > +} > > > + > > > +static void asm_fuseequal(ASMState *as, IRIns *ir) > > > +{ > > > + /* Fuse HREF + EQ/NE. */ > > > + if ((ir-1)->o == IR_HREF && ir->op1 == as->curins-1) { > > > + as->curins--; > > > + asm_href(as, ir-1, (IROp)ir->o); > > > + } else { > > > + asm_equal(as, ir); > > > + } > > > +} > > > + > > > /* -- Instruction dispatch ------------------------------------------------ */ > > > > > > /* Assemble a single instruction. 
*/ > > > @@ -1674,14 +1704,7 @@ static void asm_ir(ASMState *as, IRIns *ir) > > > case IR_ABC: > > > asm_comp(as, ir); > > > break; > > > - case IR_EQ: case IR_NE: > > > - if ((ir-1)->o == IR_HREF && ir->op1 == as->curins-1) { > > > - as->curins--; > > > - asm_href(as, ir-1, (IROp)ir->o); > > > - } else { > > > - asm_equal(as, ir); > > > - } > > > - break; > > > + case IR_EQ: case IR_NE: asm_fuseequal(as, ir); break; > > > > > > case IR_RETF: asm_retf(as, ir); break; > > > > > > @@ -1750,7 +1773,13 @@ static void asm_ir(ASMState *as, IRIns *ir) > > > case IR_SNEW: case IR_XSNEW: asm_snew(as, ir); break; > > > case IR_TNEW: asm_tnew(as, ir); break; > > > case IR_TDUP: asm_tdup(as, ir); break; > > > - case IR_CNEW: case IR_CNEWI: asm_cnew(as, ir); break; > > > + case IR_CNEW: case IR_CNEWI: > > > +#if LJ_HASFFI > > > + asm_cnew(as, ir); > > > +#else > > > + lua_assert(0); > > > +#endif > > > + break; > > > > > > /* Buffer operations. */ > > > case IR_BUFHDR: asm_bufhdr(as, ir); break; > > > @@ -2215,6 +2244,10 @@ static void asm_setup_regsp(ASMState *as) > > > if (inloop) > > > as->modset |= RSET_SCRATCH; > > > #if LJ_TARGET_X86 > > > + if (irt_isnum(IR(ir->op2)->t)) { > > > + if (as->evenspill < 4) /* Leave room to call pow(). */ > > > + as->evenspill = 4; > > > + } > > > break; > > > #else > > > ir->prev = REGSP_HINT(RID_FPRET); > > > @@ -2240,9 +2273,6 @@ static void asm_setup_regsp(ASMState *as) > > > continue; > > > } > > > break; > > > - } else if (ir->op2 == IRFPM_EXP2 && !LJ_64) { > > > - if (as->evenspill < 4) /* Leave room to call pow(). 
*/ > > > - as->evenspill = 4; > > > } > > > #endif > > > if (inloop) > > > diff --git a/src/lj_asm_arm.h b/src/lj_asm_arm.h > > > index 2894e5c9..29a07c80 100644 > > > --- a/src/lj_asm_arm.h > > > +++ b/src/lj_asm_arm.h > > > @@ -1275,8 +1275,6 @@ static void asm_cnew(ASMState *as, IRIns *ir) > > > ra_allockreg(as, (int32_t)(sz+sizeof(GCcdata)), > > > ra_releasetmp(as, ASMREF_TMP1)); > > > } > > > -#else > > > -#define asm_cnew(as, ir) ((void)0) > > > #endif > > > > > > /* -- Write barriers ------------------------------------------------------ */ > > > @@ -1371,8 +1369,6 @@ static void asm_callround(ASMState *as, IRIns *ir, int id) > > > > > > static void asm_fpmath(ASMState *as, IRIns *ir) > > > { > > > - if (ir->op2 == IRFPM_EXP2 && asm_fpjoin_pow(as, ir)) > > > - return; > > > if (ir->op2 <= IRFPM_TRUNC) > > > asm_callround(as, ir, ir->op2); > > > else if (ir->op2 == IRFPM_SQRT) > > > @@ -1499,14 +1495,10 @@ static void asm_mul(ASMState *as, IRIns *ir) > > > #define asm_mulov(as, ir) asm_mul(as, ir) > > > > > > #if !LJ_SOFTFP > > > -#define asm_div(as, ir) asm_fparith(as, ir, ARMI_VDIV_D) > > > -#define asm_pow(as, ir) asm_callid(as, ir, IRCALL_lj_vm_powi) > > > +#define asm_fpdiv(as, ir) asm_fparith(as, ir, ARMI_VDIV_D) > > > #define asm_abs(as, ir) asm_fpunary(as, ir, ARMI_VABS_D) > > > -#define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp) > > > #endif > > > > > > -#define asm_mod(as, ir) asm_callid(as, ir, IRCALL_lj_vm_modi) > > > - > > > static void asm_neg(ASMState *as, IRIns *ir) > > > { > > > #if !LJ_SOFTFP > > > diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h > > > index aea251a9..c3d6889e 100644 > > > --- a/src/lj_asm_arm64.h > > > +++ b/src/lj_asm_arm64.h > > > @@ -1249,8 +1249,6 @@ static void asm_cnew(ASMState *as, IRIns *ir) > > > ra_allockreg(as, (int32_t)(sz+sizeof(GCcdata)), > > > ra_releasetmp(as, ASMREF_TMP1)); > > > } > > > -#else > > > -#define asm_cnew(as, ir) ((void)0) > > > #endif > > > > > > /* -- Write barriers 
------------------------------------------------------ */ > > > @@ -1327,8 +1325,6 @@ static void asm_fpmath(ASMState *as, IRIns *ir) > > > } else if (fpm <= IRFPM_TRUNC) { > > > asm_fpunary(as, ir, fpm == IRFPM_FLOOR ? A64I_FRINTMd : > > > fpm == IRFPM_CEIL ? A64I_FRINTPd : A64I_FRINTZd); > > > - } else if (fpm == IRFPM_EXP2 && asm_fpjoin_pow(as, ir)) { > > > - return; > > > } else { > > > asm_callid(as, ir, IRCALL_lj_vm_floor + fpm); > > > } > > > @@ -1435,45 +1431,12 @@ static void asm_mul(ASMState *as, IRIns *ir) > > > asm_intmul(as, ir); > > > } > > > > > > -static void asm_div(ASMState *as, IRIns *ir) > > > -{ > > > -#if LJ_HASFFI > > > - if (!irt_isnum(ir->t)) > > > - asm_callid(as, ir, irt_isi64(ir->t) ? IRCALL_lj_carith_divi64 : > > > - IRCALL_lj_carith_divu64); > > > - else > > > -#endif > > > - asm_fparith(as, ir, A64I_FDIVd); > > > -} > > > - > > > -static void asm_pow(ASMState *as, IRIns *ir) > > > -{ > > > -#if LJ_HASFFI > > > - if (!irt_isnum(ir->t)) > > > - asm_callid(as, ir, irt_isi64(ir->t) ? IRCALL_lj_carith_powi64 : > > > - IRCALL_lj_carith_powu64); > > > - else > > > -#endif > > > - asm_callid(as, ir, IRCALL_lj_vm_powi); > > > -} > > > - > > > #define asm_addov(as, ir) asm_add(as, ir) > > > #define asm_subov(as, ir) asm_sub(as, ir) > > > #define asm_mulov(as, ir) asm_mul(as, ir) > > > > > > +#define asm_fpdiv(as, ir) asm_fparith(as, ir, A64I_FDIVd) > > > #define asm_abs(as, ir) asm_fpunary(as, ir, A64I_FABS) > > > -#define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp) > > > - > > > -static void asm_mod(ASMState *as, IRIns *ir) > > > -{ > > > -#if LJ_HASFFI > > > - if (!irt_isint(ir->t)) > > > - asm_callid(as, ir, irt_isi64(ir->t) ? 
IRCALL_lj_carith_modi64 : > > > - IRCALL_lj_carith_modu64); > > > - else > > > -#endif > > > - asm_callid(as, ir, IRCALL_lj_vm_modi); > > > -} > > > > > > static void asm_neg(ASMState *as, IRIns *ir) > > > { > > > diff --git a/src/lj_asm_mips.h b/src/lj_asm_mips.h > > > index 4626507b..0f92959b 100644 > > > --- a/src/lj_asm_mips.h > > > +++ b/src/lj_asm_mips.h > > > @@ -1613,8 +1613,6 @@ static void asm_cnew(ASMState *as, IRIns *ir) > > > ra_allockreg(as, (int32_t)(sz+sizeof(GCcdata)), > > > ra_releasetmp(as, ASMREF_TMP1)); > > > } > > > -#else > > > -#define asm_cnew(as, ir) ((void)0) > > > #endif > > > > > > /* -- Write barriers ------------------------------------------------------ */ > > > @@ -1683,8 +1681,6 @@ static void asm_fpunary(ASMState *as, IRIns *ir, MIPSIns mi) > > > #if !LJ_SOFTFP32 > > > static void asm_fpmath(ASMState *as, IRIns *ir) > > > { > > > - if (ir->op2 == IRFPM_EXP2 && asm_fpjoin_pow(as, ir)) > > > - return; > > > #if !LJ_SOFTFP > > > if (ir->op2 <= IRFPM_TRUNC) > > > asm_callround(as, ir, IRCALL_lj_vm_floor + ir->op2); > > > @@ -1772,41 +1768,13 @@ static void asm_mul(ASMState *as, IRIns *ir) > > > } > > > } > > > > > > -static void asm_mod(ASMState *as, IRIns *ir) > > > -{ > > > -#if LJ_64 && LJ_HASFFI > > > - if (!irt_isint(ir->t)) > > > - asm_callid(as, ir, irt_isi64(ir->t) ? IRCALL_lj_carith_modi64 : > > > - IRCALL_lj_carith_modu64); > > > - else > > > -#endif > > > - asm_callid(as, ir, IRCALL_lj_vm_modi); > > > -} > > > - > > > #if !LJ_SOFTFP32 > > > -static void asm_pow(ASMState *as, IRIns *ir) > > > -{ > > > -#if LJ_64 && LJ_HASFFI > > > - if (!irt_isnum(ir->t)) > > > - asm_callid(as, ir, irt_isi64(ir->t) ? 
IRCALL_lj_carith_powi64 : > > > - IRCALL_lj_carith_powu64); > > > - else > > > -#endif > > > - asm_callid(as, ir, IRCALL_lj_vm_powi); > > > -} > > > - > > > -static void asm_div(ASMState *as, IRIns *ir) > > > +static void asm_fpdiv(ASMState *as, IRIns *ir) > > > { > > > -#if LJ_64 && LJ_HASFFI > > > - if (!irt_isnum(ir->t)) > > > - asm_callid(as, ir, irt_isi64(ir->t) ? IRCALL_lj_carith_divi64 : > > > - IRCALL_lj_carith_divu64); > > > - else > > > -#endif > > > #if !LJ_SOFTFP > > > asm_fparith(as, ir, MIPSI_DIV_D); > > > #else > > > - asm_callid(as, ir, IRCALL_softfp_div); > > > + asm_callid(as, ir, IRCALL_softfp_div); > > > #endif > > > } > > > #endif > > > @@ -1844,8 +1812,6 @@ static void asm_abs(ASMState *as, IRIns *ir) > > > } > > > #endif > > > > > > -#define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp) > > > - > > > static void asm_arithov(ASMState *as, IRIns *ir) > > > { > > > /* TODO MIPSR6: bovc/bnvc. Caveat: no delay slot to load RID_TMP. */ > > > diff --git a/src/lj_asm_ppc.h b/src/lj_asm_ppc.h > > > index 6aaed058..62a5c3e2 100644 > > > --- a/src/lj_asm_ppc.h > > > +++ b/src/lj_asm_ppc.h > > > @@ -1177,8 +1177,6 @@ static void asm_cnew(ASMState *as, IRIns *ir) > > > ra_allockreg(as, (int32_t)(sz+sizeof(GCcdata)), > > > ra_releasetmp(as, ASMREF_TMP1)); > > > } > > > -#else > > > -#define asm_cnew(as, ir) ((void)0) > > > #endif > > > > > > /* -- Write barriers ------------------------------------------------------ */ > > > @@ -1249,8 +1247,6 @@ static void asm_fpunary(ASMState *as, IRIns *ir, PPCIns pi) > > > > > > static void asm_fpmath(ASMState *as, IRIns *ir) > > > { > > > - if (ir->op2 == IRFPM_EXP2 && asm_fpjoin_pow(as, ir)) > > > - return; > > > if (ir->op2 == IRFPM_SQRT && (as->flags & JIT_F_SQRT)) > > > asm_fpunary(as, ir, PPCI_FSQRT); > > > else > > > @@ -1364,9 +1360,7 @@ static void asm_mul(ASMState *as, IRIns *ir) > > > } > > > } > > > > > > -#define asm_div(as, ir) asm_fparith(as, ir, PPCI_FDIV) > > > -#define asm_mod(as, ir) 
asm_callid(as, ir, IRCALL_lj_vm_modi) > > > -#define asm_pow(as, ir) asm_callid(as, ir, IRCALL_lj_vm_powi) > > > +#define asm_fpdiv(as, ir) asm_fparith(as, ir, PPCI_FDIV) > > > > > > static void asm_neg(ASMState *as, IRIns *ir) > > > { > > > @@ -1390,7 +1384,6 @@ static void asm_neg(ASMState *as, IRIns *ir) > > > } > > > > > > #define asm_abs(as, ir) asm_fpunary(as, ir, PPCI_FABS) > > > -#define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp) > > > > > > static void asm_arithov(ASMState *as, IRIns *ir, PPCIns pi) > > > { > > > diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h > > > index 63d332ca..5f5fe3cf 100644 > > > --- a/src/lj_asm_x86.h > > > +++ b/src/lj_asm_x86.h > > > @@ -1857,8 +1857,6 @@ static void asm_cnew(ASMState *as, IRIns *ir) > > > asm_gencall(as, ci, args); > > > emit_loadi(as, ra_releasetmp(as, ASMREF_TMP1), (int32_t)(sz+sizeof(GCcdata))); > > > } > > > -#else > > > -#define asm_cnew(as, ir) ((void)0) > > > #endif > > > > > > /* -- Write barriers ------------------------------------------------------ */ > > > @@ -1964,8 +1962,6 @@ static void asm_fpmath(ASMState *as, IRIns *ir) > > > fpm == IRFPM_CEIL ? lj_vm_ceil_sse : lj_vm_trunc_sse); > > > ra_left(as, RID_XMM0, ir->op1); > > > } > > > - } else if (fpm == IRFPM_EXP2 && asm_fpjoin_pow(as, ir)) { > > > - /* Rejoined to pow(). */ > > > } else { > > > asm_callid(as, ir, IRCALL_lj_vm_floor + fpm); > > > } > > > @@ -2000,17 +1996,6 @@ static void asm_fppowi(ASMState *as, IRIns *ir) > > > ra_left(as, RID_EAX, ir->op2); > > > } > > > > > > -static void asm_pow(ASMState *as, IRIns *ir) > > > -{ > > > -#if LJ_64 && LJ_HASFFI > > > - if (!irt_isnum(ir->t)) > > > - asm_callid(as, ir, irt_isi64(ir->t) ? 
IRCALL_lj_carith_powi64 : > > > - IRCALL_lj_carith_powu64); > > > - else > > > -#endif > > > - asm_fppowi(as, ir); > > > -} > > > - > > > static int asm_swapops(ASMState *as, IRIns *ir) > > > { > > > IRIns *irl = IR(ir->op1); > > > @@ -2208,27 +2193,7 @@ static void asm_mul(ASMState *as, IRIns *ir) > > > asm_intarith(as, ir, XOg_X_IMUL); > > > } > > > > > > -static void asm_div(ASMState *as, IRIns *ir) > > > -{ > > > -#if LJ_64 && LJ_HASFFI > > > - if (!irt_isnum(ir->t)) > > > - asm_callid(as, ir, irt_isi64(ir->t) ? IRCALL_lj_carith_divi64 : > > > - IRCALL_lj_carith_divu64); > > > - else > > > -#endif > > > - asm_fparith(as, ir, XO_DIVSD); > > > -} > > > - > > > -static void asm_mod(ASMState *as, IRIns *ir) > > > -{ > > > -#if LJ_64 && LJ_HASFFI > > > - if (!irt_isint(ir->t)) > > > - asm_callid(as, ir, irt_isi64(ir->t) ? IRCALL_lj_carith_modi64 : > > > - IRCALL_lj_carith_modu64); > > > - else > > > -#endif > > > - asm_callid(as, ir, IRCALL_lj_vm_modi); > > > -} > > > +#define asm_fpdiv(as, ir) asm_fparith(as, ir, XO_DIVSD) > > > > > > static void asm_neg_not(ASMState *as, IRIns *ir, x86Group3 xg) > > > { > > > diff --git a/src/lj_ir.h b/src/lj_ir.h > > > index e8bca275..43e55069 100644 > > > --- a/src/lj_ir.h > > > +++ b/src/lj_ir.h > > > @@ -177,7 +177,7 @@ LJ_STATIC_ASSERT((int)IR_XLOAD + IRDELTA_L2S == (int)IR_XSTORE); > > > /* FPMATH sub-functions. ORDER FPM. */ > > > #define IRFPMDEF(_) \ > > > _(FLOOR) _(CEIL) _(TRUNC) /* Must be first and in this order. 
*/ \ > > > - _(SQRT) _(EXP2) _(LOG) _(LOG2) \ > > > + _(SQRT) _(LOG) _(LOG2) \ > > > _(OTHER) > > > > > > typedef enum { > > > diff --git a/src/lj_ircall.h b/src/lj_ircall.h > > > index bbad35b1..af064a6f 100644 > > > --- a/src/lj_ircall.h > > > +++ b/src/lj_ircall.h > > > @@ -192,7 +192,6 @@ typedef struct CCallInfo { > > > _(FPMATH, lj_vm_ceil, 1, N, NUM, XA_FP) \ > > > _(FPMATH, lj_vm_trunc, 1, N, NUM, XA_FP) \ > > > _(FPMATH, sqrt, 1, N, NUM, XA_FP) \ > > > - _(ANY, lj_vm_exp2, 1, N, NUM, XA_FP) \ > > > _(ANY, log, 1, N, NUM, XA_FP) \ > > > _(ANY, lj_vm_log2, 1, N, NUM, XA_FP) \ > > > _(ANY, lj_vm_powi, 2, N, NUM, XA_FP) \ > > > diff --git a/src/lj_opt_fold.c b/src/lj_opt_fold.c > > > index 27e489af..cd803d87 100644 > > > --- a/src/lj_opt_fold.c > > > +++ b/src/lj_opt_fold.c > > > @@ -237,10 +237,11 @@ LJFOLDF(kfold_fpcall2) > > > } > > > > > > LJFOLD(POW KNUM KINT) > > > +LJFOLD(POW KNUM KNUM) > > > LJFOLDF(kfold_numpow) > > > { > > > lua_Number a = knumleft; > > > - lua_Number b = (lua_Number)fright->i; > > > + lua_Number b = fright->o == IR_KINT ? 
(lua_Number)fright->i : knumright; > > > lua_Number y = lj_vm_foldarith(a, b, IR_POW - IR_ADD); > > > return lj_ir_knum(J, y); > > > } > > > @@ -1077,7 +1078,7 @@ LJFOLDF(simplify_nummuldiv_negneg) > > > } > > > > > > LJFOLD(POW any KINT) > > > -LJFOLDF(simplify_numpow_xk) > > > +LJFOLDF(simplify_numpow_xkint) > > > { > > > int32_t k = fright->i; > > > TRef ref = fins->op1; > > > @@ -1106,13 +1107,22 @@ LJFOLDF(simplify_numpow_xk) > > > return ref; > > > } > > > > > > +LJFOLD(POW any KNUM) > > > +LJFOLDF(simplify_numpow_xknum) > > > +{ > > > + if (knumright == 0.5) /* x ^ 0.5 ==> sqrt(x) */ > > > + return emitir(IRTN(IR_FPMATH), fins->op1, IRFPM_SQRT); > > > + return NEXTFOLD; > > > +} > > > + > > > LJFOLD(POW KNUM any) > > > LJFOLDF(simplify_numpow_kx) > > > { > > > lua_Number n = knumleft; > > > - if (n == 2.0) { /* 2.0 ^ i ==> ldexp(1.0, tonum(i)) */ > > > - fins->o = IR_CONV; > > > + if (n == 2.0 && irt_isint(fright->t)) { /* 2.0 ^ i ==> ldexp(1.0, i) */ > > > #if LJ_TARGET_X86ORX64 > > > + /* Different IR_LDEXP calling convention on x86/x64 requires conversion. */ > > > + fins->o = IR_CONV; > > > fins->op1 = fins->op2; > > > fins->op2 = IRCONV_NUM_INT; > > > fins->op2 = (IRRef1)lj_opt_fold(J); > > > diff --git a/src/lj_opt_narrow.c b/src/lj_opt_narrow.c > > > index bb61f97b..4f285334 100644 > > > --- a/src/lj_opt_narrow.c > > > +++ b/src/lj_opt_narrow.c > > > @@ -593,10 +593,10 @@ TRef lj_opt_narrow_pow(jit_State *J, TRef rb, TRef rc, TValue *vb, TValue *vc) > > > /* Narrowing must be unconditional to preserve (-x)^i semantics. */ > > > if (tvisint(vc) || numisint(numV(vc))) { > > > int checkrange = 0; > > > - /* Split pow is faster for bigger exponents. But do this only for (+k)^i. */ > > > + /* pow() is faster for bigger exponents. But do this only for (+k)^i. 
*/ > > > if (tref_isk(rb) && (int32_t)ir_knum(IR(tref_ref(rb)))->u32.hi >= 0) { > > > int32_t k = numberVint(vc); > > > - if (!(k >= -65536 && k <= 65536)) goto split_pow; > > > + if (!(k >= -65536 && k <= 65536)) goto force_pow_num; > > > checkrange = 1; > > > } > > > if (!tref_isinteger(rc)) { > > > @@ -607,19 +607,11 @@ TRef lj_opt_narrow_pow(jit_State *J, TRef rb, TRef rc, TValue *vb, TValue *vc) > > > TRef tmp = emitir(IRTI(IR_ADD), rc, lj_ir_kint(J, 65536)); > > > emitir(IRTGI(IR_ULE), tmp, lj_ir_kint(J, 2*65536)); > > > } > > > - return emitir(IRTN(IR_POW), rb, rc); > > > + } else { > > > +force_pow_num: > > > + rc = lj_ir_tonum(J, rc); /* Want POW(num, num), not POW(num, int). */ > > > } > > > -split_pow: > > > - /* FOLD covers most cases, but some are easier to do here. */ > > > - if (tref_isk(rb) && tvispone(ir_knum(IR(tref_ref(rb))))) > > > - return rb; /* 1 ^ x ==> 1 */ > > > - rc = lj_ir_tonum(J, rc); > > > - if (tref_isk(rc) && ir_knum(IR(tref_ref(rc)))->n == 0.5) > > > - return emitir(IRTN(IR_FPMATH), rb, IRFPM_SQRT); /* x ^ 0.5 ==> sqrt(x) */ > > > - /* Split up b^c into exp2(c*log2(b)). Assembler may rejoin later. */ > > > - rb = emitir(IRTN(IR_FPMATH), rb, IRFPM_LOG2); > > > - rc = emitir(IRTN(IR_MUL), rb, rc); > > > - return emitir(IRTN(IR_FPMATH), rc, IRFPM_EXP2); > > > + return emitir(IRTN(IR_POW), rb, rc); > > > } > > > > > > /* -- Predictive narrowing of induction variables ------------------------- */ > > > diff --git a/src/lj_opt_split.c b/src/lj_opt_split.c > > > index 2fc36b8d..c10a85cb 100644 > > > --- a/src/lj_opt_split.c > > > +++ b/src/lj_opt_split.c > > > @@ -403,27 +403,6 @@ static void split_ir(jit_State *J) > > > hi = split_call_li(J, hisubst, oir, ir, IRCALL_lj_vm_powi); > > > break; > > > case IR_FPMATH: > > > - /* Try to rejoin pow from EXP2, MUL and LOG2. 
*/ > > > - if (nir->op2 == IRFPM_EXP2 && nir->op1 > J->loopref) { > > > - IRIns *irp = IR(nir->op1); > > > - if (irp->o == IR_CALLN && irp->op2 == IRCALL_softfp_mul) { > > > - IRIns *irm4 = IR(irp->op1); > > > - IRIns *irm3 = IR(irm4->op1); > > > - IRIns *irm12 = IR(irm3->op1); > > > - IRIns *irl1 = IR(irm12->op1); > > > - if (irm12->op1 > J->loopref && irl1->o == IR_CALLN && > > > - irl1->op2 == IRCALL_lj_vm_log2) { > > > - IRRef tmp = irl1->op1; /* Recycle first two args from LOG2. */ > > > - IRRef arg3 = irm3->op2, arg4 = irm4->op2; > > > - J->cur.nins--; > > > - tmp = split_emit(J, IRT(IR_CARG, IRT_NIL), tmp, arg3); > > > - tmp = split_emit(J, IRT(IR_CARG, IRT_NIL), tmp, arg4); > > > - ir->prev = tmp = split_emit(J, IRTI(IR_CALLN), tmp, IRCALL_pow); > > > - hi = split_emit(J, IRT(IR_HIOP, IRT_SOFTFP), tmp, tmp); > > > - break; > > > - } > > > - } > > > - } > > > hi = split_call_l(J, hisubst, oir, ir, IRCALL_lj_vm_floor + ir->op2); > > > break; > > > case IR_LDEXP: > > > diff --git a/src/lj_vm.h b/src/lj_vm.h > > > index 411caafa..abaa7c52 100644 > > > --- a/src/lj_vm.h > > > +++ b/src/lj_vm.h > > > @@ -95,11 +95,6 @@ LJ_ASMF double lj_vm_trunc(double); > > > LJ_ASMF double lj_vm_trunc_sf(double); > > > #endif > > > #endif > > > -#ifdef LUAJIT_NO_EXP2 > > > -LJ_ASMF double lj_vm_exp2(double); > > > -#else > > > -#define lj_vm_exp2 exp2 > > > -#endif > > > #if LJ_HASFFI > > > LJ_ASMF int lj_vm_errno(void); > > > #endif > > > diff --git a/src/lj_vmmath.c b/src/lj_vmmath.c > > > index ae4e0f15..9c0d3fde 100644 > > > --- a/src/lj_vmmath.c > > > +++ b/src/lj_vmmath.c > > > @@ -79,13 +79,6 @@ double lj_vm_log2(double a) > > > } > > > #endif > > > > > > -#ifdef LUAJIT_NO_EXP2 > > > -double lj_vm_exp2(double a) > > > -{ > > > - return exp(a * 0.6931471805599453); > > > -} > > > -#endif > > > - > > > #if !LJ_TARGET_X86ORX64 > > > /* Unsigned x^k. 
*/ > > > static double lj_vm_powui(double x, uint32_t k) > > > @@ -128,7 +121,6 @@ double lj_vm_foldfpm(double x, int fpm) > > > case IRFPM_CEIL: return lj_vm_ceil(x); > > > case IRFPM_TRUNC: return lj_vm_trunc(x); > > > case IRFPM_SQRT: return sqrt(x); > > > - case IRFPM_EXP2: return lj_vm_exp2(x); > > > case IRFPM_LOG: return log(x); > > > case IRFPM_LOG2: return lj_vm_log2(x); > > > default: lua_assert(0); > > > diff --git a/test/tarantool-tests/lj-9-pow-inconsistencies.test.lua b/test/tarantool-tests/lj-9-pow-inconsistencies.test.lua > > > new file mode 100644 > > > index 00000000..21b3a0d9 > > > --- /dev/null > > > +++ b/test/tarantool-tests/lj-9-pow-inconsistencies.test.lua > > > @@ -0,0 +1,63 @@ > > > +local tap = require('tap') > > > +-- Test to demonstrate the incorrect JIT behaviour when splitting > > > +-- IR_POW. > > > +-- See also https://github.com/LuaJIT/LuaJIT/issues/9. > > > +local test = tap.test('lj-9-pow-inconsistencies'):skipcond({ > > > + ['Test requires JIT enabled'] = not jit.status(), > > > +}) > > > + > > > +local nan = 0 / 0 > > > +local inf = math.huge > > > + > > > +-- Table with some corner cases to check: > > > +local INTERESTING_VALUES = { > > > + -- 0, -0, 1, -1 special cases with nan, inf, etc.. > > > + 0, -0, 1, -1, nan, inf, -inf, > > > + -- x ^ inf = 0 (inf), if |x| < 1 (|x| > 1). > > > + -- x ^ -inf = inf (0), if |x| < 1 (|x| > 1). > > > + 0.999999, 1.000001, -0.999999, -1.000001, > > > +} > > > +test:plan(1 + (#INTERESTING_VALUES) ^ 2) > > > > I suggest renaming it to `CORNER_CASES`, since `INTERESTING_VALUES` > > is not very formal. > > Renamed. > > > Also, please mention that not all of the possible pairs are faulty > > and most of them are left here for two reasons: > > 1. Improved readability. > > 2. More extensive and change-proof testing. > > Added the comment. > > > > > > + > > > +jit.opt.start('hotloop=1') > > > + > > > +-- The JIT engine tries to split b^c to exp2(c * log2(b)). 
> > > +-- For some cases for IEEE754 we can see, that > > > +-- (double)exp2((double)log2(x)) != x, due to mathematical > > > +-- functions accuracy and double precision restrictions. > > > +-- Just use some numbers to observe this misbehaviour. > > > +local res = {} > > > +local cnt = 1 > > > +while cnt < 4 do > > > + -- XXX: use local variable to prevent folding via parser. > > > + local b = -0.90000000001 > > > + res[cnt] = 1000 ^ b > > > + cnt = cnt + 1 > > > +end > > > > Is there a specific reason you decided to use while over for? > > Since I can't remember, I think no, so I replaced with `for`. > > > > + > > > +test:samevalues(res, 'consistent pow operator behaviour for corner case') > > > + > > > +-- Prevent JIT side effects for parent loops. > > > +jit.off() > > > +for i = 1, #INTERESTING_VALUES do > > > + for j = 1, #INTERESTING_VALUES do > > > + local b = INTERESTING_VALUES[i] > > > + local c = INTERESTING_VALUES[j] > > > + local results = {} > > > + local counter = 1 > > > + jit.on() > > > + while counter < 4 do > > > + results[counter] = b ^ c > > > + counter = counter + 1 > > > + end > > Same question about for and while. > > Fixed. > > > > + -- Prevent JIT side effects. > > > + jit.off() > > > + jit.flush() > > Also, I think we should move the part from jit.on() to jit.flush() into > > a separate function. > > I don't agree here -- we still use tons of variables from the cycles, > and I don't want to see any side-effects of the function call in > traces. Ok, that is not a big deal. 
> > See other changes in the iterative patch below: > > =================================================================== > diff --git a/test/tarantool-tests/lj-9-pow-inconsistencies.test.lua b/test/tarantool-tests/lj-9-pow-inconsistencies.test.lua > index 21b3a0d9..6abba07f 100644 > --- a/test/tarantool-tests/lj-9-pow-inconsistencies.test.lua > +++ b/test/tarantool-tests/lj-9-pow-inconsistencies.test.lua > @@ -10,14 +10,18 @@ local nan = 0 / 0 > local inf = math.huge > > -- Table with some corner cases to check: > -local INTERESTING_VALUES = { > +-- Not all of them fail on each CPU architecture, but bruteforce > +-- is better, than custom enumerated usage for two reasons: > +-- * Improved readability. > +-- * More extensive and change-proof testing. > +local CORNER_CASES = { > -- 0, -0, 1, -1 special cases with nan, inf, etc.. > 0, -0, 1, -1, nan, inf, -inf, > -- x ^ inf = 0 (inf), if |x| < 1 (|x| > 1). > -- x ^ -inf = inf (0), if |x| < 1 (|x| > 1). > 0.999999, 1.000001, -0.999999, -1.000001, > } > -test:plan(1 + (#INTERESTING_VALUES) ^ 2) > +test:plan(1 + (#CORNER_CASES) ^ 2) > > jit.opt.start('hotloop=1') > > @@ -27,28 +31,25 @@ jit.opt.start('hotloop=1') > -- functions accuracy and double precision restrictions. > -- Just use some numbers to observe this misbehaviour. > local res = {} > -local cnt = 1 > -while cnt < 4 do > +for i = 1, 4 do > -- XXX: use local variable to prevent folding via parser. > local b = -0.90000000001 > - res[cnt] = 1000 ^ b > - cnt = cnt + 1 > + res[i] = 1000 ^ b > end > > test:samevalues(res, 'consistent pow operator behaviour for corner case') > > -- Prevent JIT side effects for parent loops. 
> jit.off() > -for i = 1, #INTERESTING_VALUES do > - for j = 1, #INTERESTING_VALUES do > - local b = INTERESTING_VALUES[i] > - local c = INTERESTING_VALUES[j] > +for i = 1, #CORNER_CASES do > + for j = 1, #CORNER_CASES do > + local b = CORNER_CASES[i] > + local c = CORNER_CASES[j] > local results = {} > local counter = 1 > jit.on() > - while counter < 4 do > - results[counter] = b ^ c > - counter = counter + 1 > + for k = 1, 4 do > + results[k] = b ^ c > end > -- Prevent JIT side effects. > jit.off() > =================================================================== > > > > > -- > > > 2.41.0 > > > > > -- > Best regards, > Sergey Kaplun