Date: Sun, 20 Aug 2023 12:26:35 +0300
To: Sergey Kaplun
Subject: Re: [Tarantool-patches] [PATCH luajit 4/5] Fix pow() optimization inconsistencies.
From: Maxim Kokryashkin via Tarantool-patches
Reply-To: Maxim Kokryashkin
Cc: tarantool-patches@dev.tarantool.org
List-Id: Tarantool development patches

Hi, Sergey!
Thanks for the patch! Please consider my comments below.

On Tue, Aug 15, 2023 at 12:36:30PM +0300, Sergey Kaplun wrote:
> From: Mike Pall
>
> (cherry-picked from commit 9512d5c1aced61e13e7be2d3208ec7ae3516b458)
>
> This patch fixes different misbehaviour between JIT-compiled code and
Typo: s/misbehaviour/misbehaviours/
> the interpreter for power operator with the following ways:
Typo: s/with the/in the/
> * Drop folding optimizations for base ^ 0.5 => sqrt(base), as far as
>   pow(base, 0.5) isn't interchangeable and depends on the
>   implementation.
> * Drop folding optimizations for 2 ^ int_pow => ldexp(1.0, int_pow), to
>   avoid dependcy on the implementation.
> * Now `asm_pow()` always assemble a call to the `lj_vm_powi()` function,
Typo: s/assemble/assembles/
>   that is general now for all CPU architectures. Using this internal
>   function instead of toolchain-provided `pow()` guarantees consistency
Typo: s/of/of the/
>   between interpreter and JIT results. Also, it drops custom
Typo: s/drops/drops the/
>   implementation for the `vm_powi_sse()` on x86_64.
Typo: s/for the/for/
> * `math_extern2` macro in the VM may take the second argument, that is
>   used as the target function to call. The first argument is still the
>   name for `func_nnsse` macro.
> * Narrowing for power operation avoids range guard for non-constant base
>   IR. This leads to invalid result if value on trace is out of range.
Typo: s/to invalid/to an invalid/
>   Now it is done unconditionally.
>
> Be aware, that [220/502] lib/string/format/num.lua test [1] from
Typo: s/from/from the/
> LuaJIT-test suite fails after this commit.
>
> [1]: https://www.exploringbinary.com/incorrect-floating-point-to-decimal-conversions/
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#8825
> ---
>  src/lj_asm.c                                  |  7 +-
>  src/lj_asm_x86.h                              | 13 ---
>  src/lj_dispatch.h                             |  2 +-
>  src/lj_ircall.h                               |  2 +-
>  src/lj_opt_fold.c                             | 27 ------
>  src/lj_opt_narrow.c                           | 12 +--
>  src/lj_vm.h                                   |  7 +-
>  src/lj_vmmath.c                               | 82 +++++++++--------
>  src/vm_arm.dasc                               | 13 +--
>  src/vm_arm64.dasc                             | 11 ++-
>  src/vm_mips.dasc                              | 11 ++-
>  src/vm_mips64.dasc                            | 11 ++-
>  src/vm_ppc.dasc                               | 11 ++-
>  src/vm_x64.dasc                               | 44 ++-------
>  src/vm_x86.dasc                               | 46 ++--------
>  .../lj-684-pow-inconsistencies.test.lua       | 89 +++++++++++++++++++
>  .../lj-9-pow-inconsistencies.test.lua         |  2 +
>  17 files changed, 195 insertions(+), 195 deletions(-)
>  create mode 100644 test/tarantool-tests/lj-684-pow-inconsistencies.test.lua
>
> diff --git a/src/lj_asm.c b/src/lj_asm.c
> index d71fa8c8..65261d50 100644
> --- a/src/lj_asm.c
> +++ b/src/lj_asm.c
> @@ -1650,7 +1650,6 @@ static void asm_loop(ASMState *as)
>  #if !LJ_SOFTFP32
>  #if !LJ_TARGET_X86ORX64
>  #define asm_ldexp(as, ir) asm_callid(as, ir, IRCALL_ldexp)
> -#define asm_fppowi(as, ir) asm_callid(as, ir, IRCALL_lj_vm_powi)
>  #endif
>
>  static void asm_pow(ASMState *as, IRIns *ir)
> @@ -1661,10 +1660,8 @@ static void asm_pow(ASMState *as, IRIns *ir)
>      IRCALL_lj_carith_powu64);
>    else
>  #endif
> -  if (irt_isnum(IR(ir->op2)->t))
> -    asm_callid(as, ir, IRCALL_pow);
> -  else
> -    asm_fppowi(as, ir);
> +  asm_callid(as, ir, irt_isnum(IR(ir->op2)->t) ?
IRCALL_lj_vm_pow : > + IRCALL_lj_vm_powi); > } > > static void asm_div(ASMState *as, IRIns *ir) > diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h > index 74f2d853..2b810c8d 100644 > --- a/src/lj_asm_x86.h > +++ b/src/lj_asm_x86.h > @@ -2005,19 +2005,6 @@ static void asm_ldexp(ASMState *as, IRIns *ir) > asm_x87load(as, ir->op2); > } > > -static void asm_fppowi(ASMState *as, IRIns *ir) > -{ > - /* The modified regs must match with the *.dasc implementation. */ > - RegSet drop = RSET_RANGE(RID_XMM0, RID_XMM1+1)|RID2RSET(RID_EAX); > - if (ra_hasreg(ir->r)) > - rset_clear(drop, ir->r); /* Dest reg handled below. */ > - ra_evictset(as, drop); > - ra_destreg(as, ir, RID_XMM0); > - emit_call(as, lj_vm_powi_sse); > - ra_left(as, RID_XMM0, ir->op1); > - ra_left(as, RID_EAX, ir->op2); > -} > - > static int asm_swapops(ASMState *as, IRIns *ir) > { > IRIns *irl = IR(ir->op1); > diff --git a/src/lj_dispatch.h b/src/lj_dispatch.h > index b8bc2594..af870a75 100644 > --- a/src/lj_dispatch.h > +++ b/src/lj_dispatch.h > @@ -44,7 +44,7 @@ extern double __divdf3(double a, double b); > #define GOTDEF(_) \ > _(floor) _(ceil) _(trunc) _(log) _(log10) _(exp) _(sin) _(cos) _(tan) \ > _(asin) _(acos) _(atan) _(sinh) _(cosh) _(tanh) _(frexp) _(modf) _(atan2) \ > - _(pow) _(fmod) _(ldexp) _(lj_vm_modi) \ > + _(lj_vm_pow) _(fmod) _(ldexp) _(lj_vm_modi) \ > _(lj_dispatch_call) _(lj_dispatch_ins) _(lj_dispatch_stitch) \ > _(lj_dispatch_profile) _(lj_err_throw) \ > _(lj_ffh_coroutine_wrap_err) _(lj_func_closeuv) _(lj_func_newL_gc) \ > diff --git a/src/lj_ircall.h b/src/lj_ircall.h > index af064a6f..ac0888a0 100644 > --- a/src/lj_ircall.h > +++ b/src/lj_ircall.h > @@ -195,7 +195,7 @@ typedef struct CCallInfo { > _(ANY, log, 1, N, NUM, XA_FP) \ > _(ANY, lj_vm_log2, 1, N, NUM, XA_FP) \ > _(ANY, lj_vm_powi, 2, N, NUM, XA_FP) \ > - _(ANY, pow, 2, N, NUM, XA2_FP) \ > + _(ANY, lj_vm_pow, 2, N, NUM, XA2_FP) \ > _(ANY, atan2, 2, N, NUM, XA2_FP) \ > _(ANY, ldexp, 2, N, NUM, XA_FP) \ > _(SOFTFP, 
lj_vm_tobit, 1, N, INT, XA_FP32) \ > diff --git a/src/lj_opt_fold.c b/src/lj_opt_fold.c > index 0007107b..7d7cc9d1 100644 > --- a/src/lj_opt_fold.c > +++ b/src/lj_opt_fold.c > @@ -1114,33 +1114,6 @@ LJFOLDF(simplify_numpow_xkint) > return ref; > } > > -LJFOLD(POW any KNUM) > -LJFOLDF(simplify_numpow_xknum) > -{ > - if (knumright == 0.5) /* x ^ 0.5 ==> sqrt(x) */ > - return emitir(IRTN(IR_FPMATH), fins->op1, IRFPM_SQRT); > - return NEXTFOLD; > -} > - > -LJFOLD(POW KNUM any) > -LJFOLDF(simplify_numpow_kx) > -{ > - lua_Number n = knumleft; > - if (n == 2.0 && irt_isint(fright->t)) { /* 2.0 ^ i ==> ldexp(1.0, i) */ > -#if LJ_TARGET_X86ORX64 > - /* Different IR_LDEXP calling convention on x86/x64 requires conversion. */ > - fins->o = IR_CONV; > - fins->op1 = fins->op2; > - fins->op2 = IRCONV_NUM_INT; > - fins->op2 = (IRRef1)lj_opt_fold(J); > -#endif > - fins->op1 = (IRRef1)lj_ir_knum_one(J); > - fins->o = IR_LDEXP; > - return RETRYFOLD; > - } > - return NEXTFOLD; > -} > - > /* -- Simplify conversions ------------------------------------------------ */ > > LJFOLD(CONV CONV IRCONV_NUM_INT) /* _NUM */ > diff --git a/src/lj_opt_narrow.c b/src/lj_opt_narrow.c > index 2cfb775b..d6601f4c 100644 > --- a/src/lj_opt_narrow.c > +++ b/src/lj_opt_narrow.c > @@ -590,20 +590,14 @@ TRef lj_opt_narrow_pow(jit_State *J, TRef rb, TRef rc, TValue *vb, TValue *vc) > rb = conv_str_tonum(J, rb, vb); > rb = lj_ir_tonum(J, rb); /* Left arg is always treated as an FP number. */ > rc = conv_str_tonum(J, rc, vc); > - /* Narrowing must be unconditional to preserve (-x)^i semantics. */ > if (tvisint(vc) || numisint(numV(vc))) { > - int checkrange = 0; > - /* pow() is faster for bigger exponents. But do this only for (+k)^i. 
*/ > - if (tref_isk(rb) && (int32_t)ir_knum(IR(tref_ref(rb)))->u32.hi >= 0) { > - int32_t k = numberVint(vc); > - if (!(k >= -65536 && k <= 65536)) goto force_pow_num; > - checkrange = 1; > - } > + int32_t k = numberVint(vc); > + if (!(k >= -65536 && k <= 65536)) goto force_pow_num; > if (!tref_isinteger(rc)) { > /* Guarded conversion to integer! */ > rc = emitir(IRTGI(IR_CONV), rc, IRCONV_INT_NUM|IRCONV_CHECK); > } > - if (checkrange && !tref_isk(rc)) { /* Range guard: -65536 <= i <= 65536 */ > + if (!tref_isk(rc)) { /* Range guard: -65536 <= i <= 65536 */ > TRef tmp = emitir(IRTI(IR_ADD), rc, lj_ir_kint(J, 65536)); > emitir(IRTGI(IR_ULE), tmp, lj_ir_kint(J, 2*65536)); > } > diff --git a/src/lj_vm.h b/src/lj_vm.h > index abaa7c52..f6f28a08 100644 > --- a/src/lj_vm.h > +++ b/src/lj_vm.h > @@ -82,10 +82,6 @@ LJ_ASMF int32_t LJ_FASTCALL lj_vm_modi(int32_t, int32_t); > LJ_ASMF void lj_vm_floor_sse(void); > LJ_ASMF void lj_vm_ceil_sse(void); > LJ_ASMF void lj_vm_trunc_sse(void); > -LJ_ASMF void lj_vm_powi_sse(void); > -#define lj_vm_powi NULL > -#else > -LJ_ASMF double lj_vm_powi(double, int32_t); > #endif > #if LJ_TARGET_PPC || LJ_TARGET_ARM64 > #define lj_vm_trunc trunc > @@ -100,6 +96,9 @@ LJ_ASMF int lj_vm_errno(void); > #endif > #endif > > +LJ_ASMF double lj_vm_powi(double, int32_t); > +LJ_ASMF double lj_vm_pow(double, double); > + > /* Continuations for metamethods. */ > LJ_ASMF void lj_cont_cat(void); /* Continue with concatenation. */ > LJ_ASMF void lj_cont_ra(void); /* Store result in RA from instruction. 
*/ > diff --git a/src/lj_vmmath.c b/src/lj_vmmath.c > index 14e66687..539f955b 100644 > --- a/src/lj_vmmath.c > +++ b/src/lj_vmmath.c > @@ -30,11 +30,51 @@ LJ_FUNCA double lj_wrap_sinh(double x) { return sinh(x); } > LJ_FUNCA double lj_wrap_cosh(double x) { return cosh(x); } > LJ_FUNCA double lj_wrap_tanh(double x) { return tanh(x); } > LJ_FUNCA double lj_wrap_atan2(double x, double y) { return atan2(x, y); } > -LJ_FUNCA double lj_wrap_pow(double x, double y) { return pow(x, y); } > LJ_FUNCA double lj_wrap_fmod(double x, double y) { return fmod(x, y); } > #endif > > -/* -- Helper functions for generated machine code ------------------------- */ > +/* -- Helper functions ---------------------------------------------------- */ > + > +/* Unsigned x^k. */ > +static double lj_vm_powui(double x, uint32_t k) > +{ > + double y; > + lj_assertX(k != 0, "pow with zero exponent"); > + for (; (k & 1) == 0; k >>= 1) x *= x; > + y = x; > + if ((k >>= 1) != 0) { > + for (;;) { > + x *= x; > + if (k == 1) break; > + if (k & 1) y *= x; > + k >>= 1; > + } > + y *= x; > + } > + return y; > +} > + > +/* Signed x^k. 
*/ > +double lj_vm_powi(double x, int32_t k) > +{ > + if (k > 1) > + return lj_vm_powui(x, (uint32_t)k); > + else if (k == 1) > + return x; > + else if (k == 0) > + return 1.0; > + else > + return 1.0 / lj_vm_powui(x, (uint32_t)-k); > +} > + > +double lj_vm_pow(double x, double y) > +{ > + int32_t k = lj_num2int(y); > + if ((k >= -65536 && k <= 65536) && y == (double)k) > + return lj_vm_powi(x, k); > + else > + return pow(x, y); > +} > > double lj_vm_foldarith(double x, double y, int op) > { > @@ -44,7 +84,7 @@ double lj_vm_foldarith(double x, double y, int op) > case IR_MUL - IR_ADD: return x*y; break; > case IR_DIV - IR_ADD: return x/y; break; > case IR_MOD - IR_ADD: return x-lj_vm_floor(x/y)*y; break; > - case IR_POW - IR_ADD: return pow(x, y); break; > + case IR_POW - IR_ADD: return lj_vm_pow(x, y); break; > case IR_NEG - IR_ADD: return -x; break; > case IR_ABS - IR_ADD: return fabs(x); break; > #if LJ_HASJIT > @@ -56,6 +96,8 @@ double lj_vm_foldarith(double x, double y, int op) > } > } > > +/* -- Helper functions for generated machine code ------------------------- */ > + > #if (LJ_HASJIT && !(LJ_TARGET_ARM || LJ_TARGET_ARM64 || LJ_TARGET_PPC)) || LJ_TARGET_MIPS > int32_t LJ_FASTCALL lj_vm_modi(int32_t a, int32_t b) > { > @@ -80,40 +122,6 @@ double lj_vm_log2(double a) > } > #endif > > -#if !LJ_TARGET_X86ORX64 > -/* Unsigned x^k. */ > -static double lj_vm_powui(double x, uint32_t k) > -{ > - double y; > - lj_assertX(k != 0, "pow with zero exponent"); > - for (; (k & 1) == 0; k >>= 1) x *= x; > - y = x; > - if ((k >>= 1) != 0) { > - for (;;) { > - x *= x; > - if (k == 1) break; > - if (k & 1) y *= x; > - k >>= 1; > - } > - y *= x; > - } > - return y; > -} > - > -/* Signed x^k. 
*/ > -double lj_vm_powi(double x, int32_t k) > -{ > - if (k > 1) > - return lj_vm_powui(x, (uint32_t)k); > - else if (k == 1) > - return x; > - else if (k == 0) > - return 1.0; > - else > - return 1.0 / lj_vm_powui(x, (uint32_t)-k); > -} > -#endif > - > /* Computes fpm(x) for extended math functions. */ > double lj_vm_foldfpm(double x, int fpm) > { > diff --git a/src/vm_arm.dasc b/src/vm_arm.dasc > index 767d31f9..792f0363 100644 > --- a/src/vm_arm.dasc > +++ b/src/vm_arm.dasc > @@ -1485,11 +1485,11 @@ static void build_subroutines(BuildCtx *ctx) > |.endif > |.endmacro > | > - |.macro math_extern2, func > + |.macro math_extern2, name, func > |.if HFABI > - | .ffunc_dd math_ .. func > + | .ffunc_dd math_ .. name > |.else > - | .ffunc_nn math_ .. func > + | .ffunc_nn math_ .. name > |.endif > | .IOS mov RA, BASE > | bl extern func > @@ -1500,6 +1500,9 @@ static void build_subroutines(BuildCtx *ctx) > | b ->fff_restv > |.endif > |.endmacro > + |.macro math_extern2, func > + | math_extern2 func, func > + |.endmacro > | > |.if FPU > | .ffunc_d math_sqrt > @@ -1545,7 +1548,7 @@ static void build_subroutines(BuildCtx *ctx) > | math_extern sinh > | math_extern cosh > | math_extern tanh > - | math_extern2 pow > + | math_extern2 pow, lj_vm_pow > | math_extern2 atan2 > | math_extern2 fmod > | > @@ -3153,7 +3156,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > break; > case BC_POW: > | // NYI: (partial) integer arithmetic. > - | ins_arithfp extern, extern pow > + | ins_arithfp extern, extern lj_vm_pow > break; > > case BC_CAT: > diff --git a/src/vm_arm64.dasc b/src/vm_arm64.dasc > index de33bde4..fb267a76 100644 > --- a/src/vm_arm64.dasc > +++ b/src/vm_arm64.dasc > @@ -1391,11 +1391,14 @@ static void build_subroutines(BuildCtx *ctx) > | b ->fff_resn > |.endmacro > | > - |.macro math_extern2, func > - | .ffunc_nn math_ .. func > + |.macro math_extern2, name, func > + | .ffunc_nn math_ .. 
name > | bl extern func > | b ->fff_resn > |.endmacro > + |.macro math_extern2, func > + | math_extern2 func, func > + |.endmacro > | > |.ffunc_n math_sqrt > | fsqrt d0, d0 > @@ -1424,7 +1427,7 @@ static void build_subroutines(BuildCtx *ctx) > | math_extern sinh > | math_extern cosh > | math_extern tanh > - | math_extern2 pow > + | math_extern2 pow, lj_vm_pow > | math_extern2 atan2 > | math_extern2 fmod > | > @@ -2621,7 +2624,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | ins_arithload FARG1, FARG2 > | ins_arithfallback ins_arithcheck_num > |.if "fpins" == "fpow" > - | bl extern pow > + | bl extern lj_vm_pow > |.else > | fpins FARG1, FARG1, FARG2 > |.endif > diff --git a/src/vm_mips.dasc b/src/vm_mips.dasc > index 32caabf7..5664f503 100644 > --- a/src/vm_mips.dasc > +++ b/src/vm_mips.dasc > @@ -1631,14 +1631,17 @@ static void build_subroutines(BuildCtx *ctx) > |. nop > |.endmacro > | > - |.macro math_extern2, func > - | .ffunc_nn math_ .. func > + |.macro math_extern2, name, func > + | .ffunc_nn math_ .. name > |. load_got func > | call_extern > |. nop > | b ->fff_resn > |. nop > |.endmacro > + |.macro math_extern2, func > + | math_extern2 func, func > + |.endmacro > | > |// TODO: Return integer type if result is integer (own sf implementation). > |.macro math_round, func > @@ -1692,7 +1695,7 @@ static void build_subroutines(BuildCtx *ctx) > | math_extern sinh > | math_extern cosh > | math_extern tanh > - | math_extern2 pow > + | math_extern2 pow, lj_vm_pow > | math_extern2 atan2 > | math_extern2 fmod > | > @@ -3585,7 +3588,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | sltiu AT, SFARG1HI, LJ_TISNUM > | sltiu TMP0, SFARG2HI, LJ_TISNUM > | and AT, AT, TMP0 > - | load_got pow > + | load_got lj_vm_pow > | beqz AT, ->vmeta_arith > |. 
addu RA, BASE, RA > |.if FPU > diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc > index 44fba36c..249605d4 100644 > --- a/src/vm_mips64.dasc > +++ b/src/vm_mips64.dasc > @@ -1669,14 +1669,17 @@ static void build_subroutines(BuildCtx *ctx) > |. nop > |.endmacro > | > - |.macro math_extern2, func > - | .ffunc_nn math_ .. func > + |.macro math_extern2, name, func > + | .ffunc_nn math_ .. name > |. load_got func > | call_extern > |. nop > | b ->fff_resn > |. nop > |.endmacro > + |.macro math_extern2, func > + | math_extern2 func, func > + |.endmacro > | > |// TODO: Return integer type if result is integer (own sf implementation). > |.macro math_round, func > @@ -1730,7 +1733,7 @@ static void build_subroutines(BuildCtx *ctx) > | math_extern sinh > | math_extern cosh > | math_extern tanh > - | math_extern2 pow > + | math_extern2 pow, lj_vm_pow > | math_extern2 atan2 > | math_extern2 fmod > | > @@ -3823,7 +3826,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | sltiu TMP0, TMP0, LJ_TISNUM > | sltiu TMP1, TMP1, LJ_TISNUM > | and AT, TMP0, TMP1 > - | load_got pow > + | load_got lj_vm_pow > | beqz AT, ->vmeta_arith > |. daddu RA, BASE, RA > |.if FPU > diff --git a/src/vm_ppc.dasc b/src/vm_ppc.dasc > index 980ad897..94af63e6 100644 > --- a/src/vm_ppc.dasc > +++ b/src/vm_ppc.dasc > @@ -2032,11 +2032,14 @@ static void build_subroutines(BuildCtx *ctx) > | b ->fff_resn > |.endmacro > | > - |.macro math_extern2, func > - | .ffunc_nn math_ .. func > + |.macro math_extern2, name, func > + | .ffunc_nn math_ .. name > | blex func > | b ->fff_resn > |.endmacro > + |.macro math_extern2, func > + | math_extern2 func, func > + |.endmacro > | > |.macro math_round, func > | .ffunc_1 math_ .. 
func > @@ -2161,7 +2164,7 @@ static void build_subroutines(BuildCtx *ctx) > | math_extern sinh > | math_extern cosh > | math_extern tanh > - | math_extern2 pow > + | math_extern2 pow, lj_vm_pow > | math_extern2 atan2 > | math_extern2 fmod > | > @@ -4154,7 +4157,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | checknum cr1, CARG3 > | crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt > | bge ->vmeta_arith_vv > - | blex pow > + | blex lj_vm_pow > | ins_next1 > |.if FPU > | stfdx FARG1, BASE, RA > diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc > index 7b04b928..acbe8dc2 100644 > --- a/src/vm_x64.dasc > +++ b/src/vm_x64.dasc > @@ -1825,13 +1825,16 @@ static void build_subroutines(BuildCtx *ctx) > | jmp ->fff_resxmm0 > |.endmacro > | > - |.macro math_extern2, func > - | .ffunc_nn math_ .. func > + |.macro math_extern2, name, func > + | .ffunc_nn math_ .. name > | mov RB, BASE > | call extern func > | mov BASE, RB > | jmp ->fff_resxmm0 > |.endmacro > + |.macro math_extern2, func > + | math_extern2 func, func > + |.endmacro > | > | math_extern log10 > | math_extern exp > @@ -1844,7 +1847,7 @@ static void build_subroutines(BuildCtx *ctx) > | math_extern sinh > | math_extern cosh > | math_extern tanh > - | math_extern2 pow > + | math_extern2 pow, lj_vm_pow > | math_extern2 atan2 > | math_extern2 fmod > | > @@ -2649,41 +2652,6 @@ static void build_subroutines(BuildCtx *ctx) > | subsd xmm0, xmm1 > | ret > | > - |// Args in xmm0/eax. Ret in xmm0. xmm0-xmm1 and eax modified. > - |->vm_powi_sse: > - | cmp eax, 1; jle >6 // i<=1? > - | // Now 1 < (unsigned)i <= 0x80000000. > - |1: // Handle leading zeros. > - | test eax, 1; jnz >2 > - | mulsd xmm0, xmm0 > - | shr eax, 1 > - | jmp <1 > - |2: > - | shr eax, 1; jz >5 > - | movaps xmm1, xmm0 > - |3: // Handle trailing bits. 
> - | mulsd xmm0, xmm0 > - | shr eax, 1; jz >4 > - | jnc <3 > - | mulsd xmm1, xmm0 > - | jmp <3 > - |4: > - | mulsd xmm0, xmm1 > - |5: > - | ret > - |6: > - | je <5 // x^1 ==> x > - | jb >7 // x^0 ==> 1 > - | neg eax > - | call <1 > - | sseconst_1 xmm1, RD > - | divsd xmm1, xmm0 > - | movaps xmm0, xmm1 > - | ret > - |7: > - | sseconst_1 xmm0, RD > - | ret > - | > |//----------------------------------------------------------------------- > |//-- Miscellaneous functions -------------------------------------------- > |//----------------------------------------------------------------------- > diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc > index bd1e940e..bf30cce6 100644 > --- a/src/vm_x86.dasc > +++ b/src/vm_x86.dasc > @@ -2240,8 +2240,8 @@ static void build_subroutines(BuildCtx *ctx) > | jmp ->fff_resfp > |.endmacro > | > - |.macro math_extern2, func > - | .ffunc_nnsse math_ .. func > + |.macro math_extern2, name, func > + | .ffunc_nnsse math_ .. name > |.if not X64 > | movsd FPARG1, xmm0 > | movsd FPARG3, xmm1 > @@ -2251,6 +2251,9 @@ static void build_subroutines(BuildCtx *ctx) > | mov BASE, RB > | jmp ->fff_resfp > |.endmacro > + |.macro math_extern2, func > + | math_extern2 func, func > + |.endmacro > | > | math_extern log10 > | math_extern exp > @@ -2263,7 +2266,7 @@ static void build_subroutines(BuildCtx *ctx) > | math_extern sinh > | math_extern cosh > | math_extern tanh > - | math_extern2 pow > + | math_extern2 pow, lj_vm_pow > | math_extern2 atan2 > | math_extern2 fmod > | > @@ -3140,41 +3143,6 @@ static void build_subroutines(BuildCtx *ctx) > | subsd xmm0, xmm1 > | ret > | > - |// Args in xmm0/eax. Ret in xmm0. xmm0-xmm1 and eax modified. > - |->vm_powi_sse: > - | cmp eax, 1; jle >6 // i<=1? > - | // Now 1 < (unsigned)i <= 0x80000000. > - |1: // Handle leading zeros. > - | test eax, 1; jnz >2 > - | mulsd xmm0, xmm0 > - | shr eax, 1 > - | jmp <1 > - |2: > - | shr eax, 1; jz >5 > - | movaps xmm1, xmm0 > - |3: // Handle trailing bits. 
> - | mulsd xmm0, xmm0 > - | shr eax, 1; jz >4 > - | jnc <3 > - | mulsd xmm1, xmm0 > - | jmp <3 > - |4: > - | mulsd xmm0, xmm1 > - |5: > - | ret > - |6: > - | je <5 // x^1 ==> x > - | jb >7 // x^0 ==> 1 > - | neg eax > - | call <1 > - | sseconst_1 xmm1, RDa > - | divsd xmm1, xmm0 > - | movaps xmm0, xmm1 > - | ret > - |7: > - | sseconst_1 xmm0, RDa > - | ret > - | > |//----------------------------------------------------------------------- > |//-- Miscellaneous functions -------------------------------------------- > |//----------------------------------------------------------------------- > @@ -3976,7 +3944,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop) > | movsd FPARG1, xmm0 > | movsd FPARG3, xmm1 > |.endif > - | call extern pow > + | call extern lj_vm_pow > | movzx RA, PC_RA > | mov BASE, RB > |.if X64 > diff --git a/test/tarantool-tests/lj-684-pow-inconsistencies.test.lua b/test/tarantool-tests/lj-684-pow-inconsistencies.test.lua > new file mode 100644 > index 00000000..5129fc45 > --- /dev/null > +++ b/test/tarantool-tests/lj-684-pow-inconsistencies.test.lua > @@ -0,0 +1,89 @@ > +local tap = require('tap') > +-- Test to demonstrate the incorrect JIT behaviour for different > +-- power operation optimizations. > +-- See also: > +-- https://github.com/LuaJIT/LuaJIT/issues/684. > +local test = tap.test('lj-684-pow-inconsistencies'):skipcond({ > + ['Test requires JIT enabled'] = not jit.status(), > +}) > + > +local tostring = tostring > + > +test:plan(4) > + > +jit.opt.start('hotloop=1') > + > +-- XXX: Prevent hotcount side effects. > +jit.off() > +jit.flush() > + > +local res = {} > +-- -0 ^ 0.5 = 0. Test sign with `tostring()`. Typo: s/Test/Test the/ > +-- XXX: use local variable to prevent folding via parser. > +-- XXX: use stack slot out of trace to prevent constant folding. > +local minus_zero = -0 > +jit.on() > +for i = 1, 4 do > + res[i] = tostring(minus_zero ^ 0.5) > +end > + > +-- XXX: Prevent hotcount side effects. 
> +jit.off()
> +jit.flush()
> +
> +test:samevalues(res, ('consistent results for folding (-0) ^ 0.5'))
> +
> +jit.on()
> +-- -inf ^ 0.5 = inf.
> +res = {}
> +local minus_inf = -math.huge
> +jit.on()
> +for i = 1, 4 do
> +  res[i] = minus_inf ^ 0.5
> +end
> +
> +-- XXX: Prevent hotcount side effects.
> +jit.off()
> +jit.flush()
> +
> +test:samevalues(res, ('consistent results for folding (-inf) ^ 0.5'))
> +
> +-- 2921 ^ 0.5 = 0x1.b05ec632536fap+5.
We certainly need to add some explanation here about the precision,
because it is not obvious why these magic numbers should cause any
issues.
> +res = {}
> +-- XXX: use local variable to prevent folding via parser.
> +-- XXX: use stack slot out of trace to prevent constant folding.
> +local corner_case_05 = 2921
> +jit.on()
> +for i = 1, 4 do
> +  res[i] = corner_case_05 ^ 0.5
> +end
> +
> +-- XXX: Prevent hotcount side effects.
> +jit.off()
> +jit.flush()
> +
> +test:samevalues(res, ('consistent results for folding 2921 ^ 0.5'))
I believe it is possible to make a single function with different
parameters for all three cases above. Something like
`test_power(value, power, extra_map)`, so you can do
| res[i] = extra_map(value ^ power)
> +
> +-- Narrowing for non-constant base of power operation.
> +local function pow(base, power)
> +  return base ^ power
> +end
> +
> +jit.on()
> +
> +-- Compile function first.
> +pow(1, 2)
> +pow(1, 2)
> +
> +-- Need some value near 1, to avoid infinite result.
Typo: s/Need/We need/
Typo: s/avoid/avoid an/
> +local base = 1.0000000001
> +local power = 65536 * 3
> +local resulting_value = pow(base, power)
> +
> +-- XXX: Prevent hotcount side effects.
> +jit.off()
> +jit.flush()
> +
> +test:is(resulting_value, base ^ power, 'guard for narrowing of power operation')
> +
> +test:done(true)
> diff --git a/test/tarantool-tests/lj-9-pow-inconsistencies.test.lua b/test/tarantool-tests/lj-9-pow-inconsistencies.test.lua
> index 21b3a0d9..1f7f65c5 100644
> --- a/test/tarantool-tests/lj-9-pow-inconsistencies.test.lua
> +++ b/test/tarantool-tests/lj-9-pow-inconsistencies.test.lua
> @@ -16,6 +16,8 @@ local INTERESTING_VALUES = {
>    -- x ^ inf = 0 (inf), if |x| < 1 (|x| > 1).
>    -- x ^ -inf = inf (0), if |x| < 1 (|x| > 1).
>    0.999999, 1.000001, -0.999999, -1.000001,
> +  -- Test power of even numbers optimizations.
> +  2, -2, 0.5, -0.5,
>  }
>  test:plan(1 + (#INTERESTING_VALUES) ^ 2)
>
> --
> 2.41.0
>