[Tarantool-patches] [PATCH luajit 4/5] Fix pow() optimization inconsistencies.

Sun Aug 20 12:26:35 MSK 2023

Hi, Sergey!
Thanks for the patch!
Please consider my comments below.

On Tue, Aug 15, 2023 at 12:36:30PM +0300, Sergey Kaplun wrote:
> From: Mike Pall <mike>
> 
> (cherry-picked from commit 9512d5c1aced61e13e7be2d3208ec7ae3516b458)
> 
> This patch fixes different misbehaviour between JIT-compiled code and
Typo: s/misbehaviour/misbehaviours/
> the interpreter for power operator with the following ways:
Typo: s/with the/in the/
> * Drop folding optimizations for base ^ 0.5 => sqrt(base), as far as
>   pow(base, 0.5) isn't interchangeable and depends on the <math.h>
>   implementation.
> * Drop folding optimizations for 2 ^ int_pow => ldexp(1.0, int_pow), to
>   avoid dependcy on the <math.h> implementation.
Typo: s/dependcy/dependency/
> * Now `asm_pow()` always assemble a call to the `lj_vm_powi()` function,
Typo: s/assemble/assembles/
>   that is general now for all CPU architectures. Using this internal
>   function instead of toolchain-provided `pow()` guarantees consistency
Typo: s/of/of the/
>   between interpreter and JIT results. Also, it drops custom
Typo: s/drops/drops the/
>   implementation for the `vm_powi_sse()` on x86_64.
Typo: s/for the/for/
> * `math_extern2` macro in the VM may take the second argument, that is
>   used as the target function to call. The first argument is still the
>   name for `func_nnsse` macro.
> * Narrowing for power operation avoids range guard for non-constant base
>   IR. This leads to invalid result if value on trace is out of range.
Typo: s/to invalid/to an invalid/
>   Now it is done unconditionally.
> 
> Be aware, that [220/502] lib/string/format/num.lua test [1] from
Typo: s/from/from the/
> LuaJIT-test suite fails after this commit.
> 
> [1]: https://www.exploringbinary.com/incorrect-floating-point-to-decimal-conversions/
> 
> Sergey Kaplun:
> * added the description and the test for the problem
> 
> Part of tarantool/tarantool#8825
> ---
>  src/lj_asm.c                                  |  7 +-
>  src/lj_asm_x86.h                              | 13 ---
>  src/lj_dispatch.h                             |  2 +-
>  src/lj_ircall.h                               |  2 +-
>  src/lj_opt_fold.c                             | 27 ------
>  src/lj_opt_narrow.c                           | 12 +--
>  src/lj_vm.h                                   |  7 +-
>  src/lj_vmmath.c                               | 82 +++++++++--------
>  src/vm_arm.dasc                               | 13 +--
>  src/vm_arm64.dasc                             | 11 ++-
>  src/vm_mips.dasc                              | 11 ++-
>  src/vm_mips64.dasc                            | 11 ++-
>  src/vm_ppc.dasc                               | 11 ++-
>  src/vm_x64.dasc                               | 44 ++-------
>  src/vm_x86.dasc                               | 46 ++--------
>  .../lj-684-pow-inconsistencies.test.lua       | 89 +++++++++++++++++++
>  .../lj-9-pow-inconsistencies.test.lua         |  2 +
>  17 files changed, 195 insertions(+), 195 deletions(-)
>  create mode 100644 test/tarantool-tests/lj-684-pow-inconsistencies.test.lua
> 
> diff --git a/src/lj_asm.c b/src/lj_asm.c
> index d71fa8c8..65261d50 100644
> --- a/src/lj_asm.c
> +++ b/src/lj_asm.c
> @@ -1650,7 +1650,6 @@ static void asm_loop(ASMState *as)
>  #if !LJ_SOFTFP32
>  #if !LJ_TARGET_X86ORX64
>  #define asm_ldexp(as, ir)	asm_callid(as, ir, IRCALL_ldexp)
> -#define asm_fppowi(as, ir)	asm_callid(as, ir, IRCALL_lj_vm_powi)
>  #endif
>  
>  static void asm_pow(ASMState *as, IRIns *ir)
> @@ -1661,10 +1660,8 @@ static void asm_pow(ASMState *as, IRIns *ir)
>  					  IRCALL_lj_carith_powu64);
>    else
>  #endif
> -  if (irt_isnum(IR(ir->op2)->t))
> -    asm_callid(as, ir, IRCALL_pow);
> -  else
> -    asm_fppowi(as, ir);
> +  asm_callid(as, ir, irt_isnum(IR(ir->op2)->t) ? IRCALL_lj_vm_pow :
> +						 IRCALL_lj_vm_powi);
>  }
>  
>  static void asm_div(ASMState *as, IRIns *ir)
> diff --git a/src/lj_asm_x86.h b/src/lj_asm_x86.h
> index 74f2d853..2b810c8d 100644
> --- a/src/lj_asm_x86.h
> +++ b/src/lj_asm_x86.h
> @@ -2005,19 +2005,6 @@ static void asm_ldexp(ASMState *as, IRIns *ir)
>    asm_x87load(as, ir->op2);
>  }
>  
> -static void asm_fppowi(ASMState *as, IRIns *ir)
> -{
> -  /* The modified regs must match with the *.dasc implementation. */
> -  RegSet drop = RSET_RANGE(RID_XMM0, RID_XMM1+1)|RID2RSET(RID_EAX);
> -  if (ra_hasreg(ir->r))
> -    rset_clear(drop, ir->r);  /* Dest reg handled below. */
> -  ra_evictset(as, drop);
> -  ra_destreg(as, ir, RID_XMM0);
> -  emit_call(as, lj_vm_powi_sse);
> -  ra_left(as, RID_XMM0, ir->op1);
> -  ra_left(as, RID_EAX, ir->op2);
> -}
> -
>  static int asm_swapops(ASMState *as, IRIns *ir)
>  {
>    IRIns *irl = IR(ir->op1);
> diff --git a/src/lj_dispatch.h b/src/lj_dispatch.h
> index b8bc2594..af870a75 100644
> --- a/src/lj_dispatch.h
> +++ b/src/lj_dispatch.h
> @@ -44,7 +44,7 @@ extern double __divdf3(double a, double b);
>  #define GOTDEF(_) \
>    _(floor) _(ceil) _(trunc) _(log) _(log10) _(exp) _(sin) _(cos) _(tan) \
>    _(asin) _(acos) _(atan) _(sinh) _(cosh) _(tanh) _(frexp) _(modf) _(atan2) \
> -  _(pow) _(fmod) _(ldexp) _(lj_vm_modi) \
> +  _(lj_vm_pow) _(fmod) _(ldexp) _(lj_vm_modi) \
>    _(lj_dispatch_call) _(lj_dispatch_ins) _(lj_dispatch_stitch) \
>    _(lj_dispatch_profile) _(lj_err_throw) \
>    _(lj_ffh_coroutine_wrap_err) _(lj_func_closeuv) _(lj_func_newL_gc) \
> diff --git a/src/lj_ircall.h b/src/lj_ircall.h
> index af064a6f..ac0888a0 100644
> --- a/src/lj_ircall.h
> +++ b/src/lj_ircall.h
> @@ -195,7 +195,7 @@ typedef struct CCallInfo {
>    _(ANY,	log,			1,   N, NUM, XA_FP) \
>    _(ANY,	lj_vm_log2,		1,   N, NUM, XA_FP) \
>    _(ANY,	lj_vm_powi,		2,   N, NUM, XA_FP) \
> -  _(ANY,	pow,			2,   N, NUM, XA2_FP) \
> +  _(ANY,	lj_vm_pow,		2,   N, NUM, XA2_FP) \
>    _(ANY,	atan2,			2,   N, NUM, XA2_FP) \
>    _(ANY,	ldexp,			2,   N, NUM, XA_FP) \
>    _(SOFTFP,	lj_vm_tobit,		1,   N, INT, XA_FP32) \
> diff --git a/src/lj_opt_fold.c b/src/lj_opt_fold.c
> index 0007107b..7d7cc9d1 100644
> --- a/src/lj_opt_fold.c
> +++ b/src/lj_opt_fold.c
> @@ -1114,33 +1114,6 @@ LJFOLDF(simplify_numpow_xkint)
>    return ref;
>  }
>  
> -LJFOLD(POW any KNUM)
> -LJFOLDF(simplify_numpow_xknum)
> -{
> -  if (knumright == 0.5)  /* x ^ 0.5 ==> sqrt(x) */
> -    return emitir(IRTN(IR_FPMATH), fins->op1, IRFPM_SQRT);
> -  return NEXTFOLD;
> -}
> -
> -LJFOLD(POW KNUM any)
> -LJFOLDF(simplify_numpow_kx)
> -{
> -  lua_Number n = knumleft;
> -  if (n == 2.0 && irt_isint(fright->t)) {  /* 2.0 ^ i ==> ldexp(1.0, i) */
> -#if LJ_TARGET_X86ORX64
> -    /* Different IR_LDEXP calling convention on x86/x64 requires conversion. */
> -    fins->o = IR_CONV;
> -    fins->op1 = fins->op2;
> -    fins->op2 = IRCONV_NUM_INT;
> -    fins->op2 = (IRRef1)lj_opt_fold(J);
> -#endif
> -    fins->op1 = (IRRef1)lj_ir_knum_one(J);
> -    fins->o = IR_LDEXP;
> -    return RETRYFOLD;
> -  }
> -  return NEXTFOLD;
> -}
> -
>  /* -- Simplify conversions ------------------------------------------------ */
>  
>  LJFOLD(CONV CONV IRCONV_NUM_INT)  /* _NUM */
> diff --git a/src/lj_opt_narrow.c b/src/lj_opt_narrow.c
> index 2cfb775b..d6601f4c 100644
> --- a/src/lj_opt_narrow.c
> +++ b/src/lj_opt_narrow.c
> @@ -590,20 +590,14 @@ TRef lj_opt_narrow_pow(jit_State *J, TRef rb, TRef rc, TValue *vb, TValue *vc)
>    rb = conv_str_tonum(J, rb, vb);
>    rb = lj_ir_tonum(J, rb);  /* Left arg is always treated as an FP number. */
>    rc = conv_str_tonum(J, rc, vc);
> -  /* Narrowing must be unconditional to preserve (-x)^i semantics. */
>    if (tvisint(vc) || numisint(numV(vc))) {
> -    int checkrange = 0;
> -    /* pow() is faster for bigger exponents. But do this only for (+k)^i. */
> -    if (tref_isk(rb) && (int32_t)ir_knum(IR(tref_ref(rb)))->u32.hi >= 0) {
> -      int32_t k = numberVint(vc);
> -      if (!(k >= -65536 && k <= 65536)) goto force_pow_num;
> -      checkrange = 1;
> -    }
> +    int32_t k = numberVint(vc);
> +    if (!(k >= -65536 && k <= 65536)) goto force_pow_num;
>      if (!tref_isinteger(rc)) {
>        /* Guarded conversion to integer! */
>        rc = emitir(IRTGI(IR_CONV), rc, IRCONV_INT_NUM|IRCONV_CHECK);
>      }
> -    if (checkrange && !tref_isk(rc)) {  /* Range guard: -65536 <= i <= 65536 */
> +    if (!tref_isk(rc)) {  /* Range guard: -65536 <= i <= 65536 */
>        TRef tmp = emitir(IRTI(IR_ADD), rc, lj_ir_kint(J, 65536));
>        emitir(IRTGI(IR_ULE), tmp, lj_ir_kint(J, 2*65536));
>      }
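Side note on the range guard above: the ADD/ULE pair is the usual
bias-then-unsigned-compare idiom, which checks both bounds with a single
comparison. A small C sketch of the same arithmetic
(`powi_exponent_in_range` is a hypothetical name used only for illustration,
it is not part of the patch):

```c
#include <stdint.h>

/* Hypothetical stand-in for the IR sequence ADD + ULE emitted above.
** Returns 1 if -65536 <= i <= 65536, else 0. */
int powi_exponent_in_range(int32_t i)
{
  /* Unsigned wraparound maps any i < -65536 to a huge value, so one
  ** unsigned comparison covers both the lower and the upper bound. */
  uint32_t biased = (uint32_t)i + 65536u;
  return biased <= 2u * 65536u;
}
```

With this check, the exponent 65536 * 3 used in the new test falls outside
the range, so the guard fails and the trace exits instead of calling
`lj_vm_powi()`.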
> diff --git a/src/lj_vm.h b/src/lj_vm.h
> index abaa7c52..f6f28a08 100644
> --- a/src/lj_vm.h
> +++ b/src/lj_vm.h
> @@ -82,10 +82,6 @@ LJ_ASMF int32_t LJ_FASTCALL lj_vm_modi(int32_t, int32_t);
>  LJ_ASMF void lj_vm_floor_sse(void);
>  LJ_ASMF void lj_vm_ceil_sse(void);
>  LJ_ASMF void lj_vm_trunc_sse(void);
> -LJ_ASMF void lj_vm_powi_sse(void);
> -#define lj_vm_powi	NULL
> -#else
> -LJ_ASMF double lj_vm_powi(double, int32_t);
>  #endif
>  #if LJ_TARGET_PPC || LJ_TARGET_ARM64
>  #define lj_vm_trunc	trunc
> @@ -100,6 +96,9 @@ LJ_ASMF int lj_vm_errno(void);
>  #endif
>  #endif
>  
> +LJ_ASMF double lj_vm_powi(double, int32_t);
> +LJ_ASMF double lj_vm_pow(double, double);
> +
>  /* Continuations for metamethods. */
>  LJ_ASMF void lj_cont_cat(void);  /* Continue with concatenation. */
>  LJ_ASMF void lj_cont_ra(void);  /* Store result in RA from instruction. */
> diff --git a/src/lj_vmmath.c b/src/lj_vmmath.c
> index 14e66687..539f955b 100644
> --- a/src/lj_vmmath.c
> +++ b/src/lj_vmmath.c
> @@ -30,11 +30,51 @@ LJ_FUNCA double lj_wrap_sinh(double x) { return sinh(x); }
>  LJ_FUNCA double lj_wrap_cosh(double x) { return cosh(x); }
>  LJ_FUNCA double lj_wrap_tanh(double x) { return tanh(x); }
>  LJ_FUNCA double lj_wrap_atan2(double x, double y) { return atan2(x, y); }
> -LJ_FUNCA double lj_wrap_pow(double x, double y) { return pow(x, y); }
>  LJ_FUNCA double lj_wrap_fmod(double x, double y) { return fmod(x, y); }
>  #endif
>  
> -/* -- Helper functions for generated machine code ------------------------- */
> +/* -- Helper functions ---------------------------------------------------- */
> +
> +/* Unsigned x^k. */
> +static double lj_vm_powui(double x, uint32_t k)
> +{
> +  double y;
> +  lj_assertX(k != 0, "pow with zero exponent");
> +  for (; (k & 1) == 0; k >>= 1) x *= x;
> +  y = x;
> +  if ((k >>= 1) != 0) {
> +    for (;;) {
> +      x *= x;
> +      if (k == 1) break;
> +      if (k & 1) y *= x;
> +      k >>= 1;
> +    }
> +    y *= x;
> +  }
> +  return y;
> +}
> +
> +/* Signed x^k. */
> +double lj_vm_powi(double x, int32_t k)
> +{
> +  if (k > 1)
> +    return lj_vm_powui(x, (uint32_t)k);
> +  else if (k == 1)
> +    return x;
> +  else if (k == 0)
> +    return 1.0;
> +  else
> +    return 1.0 / lj_vm_powui(x, (uint32_t)-k);
> +}
> +
> +double lj_vm_pow(double x, double y)
> +{
> +  int32_t k = lj_num2int(y);
> +  if ((k >= -65536 && k <= 65536) && y == (double)k)
> +    return lj_vm_powi(x, k);
> +  else
> +    return pow(x, y);
> +}
>  
>  double lj_vm_foldarith(double x, double y, int op)
>  {
> @@ -44,7 +84,7 @@ double lj_vm_foldarith(double x, double y, int op)
>    case IR_MUL - IR_ADD: return x*y; break;
>    case IR_DIV - IR_ADD: return x/y; break;
>    case IR_MOD - IR_ADD: return x-lj_vm_floor(x/y)*y; break;
> -  case IR_POW - IR_ADD: return pow(x, y); break;
> +  case IR_POW - IR_ADD: return lj_vm_pow(x, y); break;
>    case IR_NEG - IR_ADD: return -x; break;
>    case IR_ABS - IR_ADD: return fabs(x); break;
>  #if LJ_HASJIT
> @@ -56,6 +96,8 @@ double lj_vm_foldarith(double x, double y, int op)
>    }
>  }
>  
> +/* -- Helper functions for generated machine code ------------------------- */
> +
>  #if (LJ_HASJIT && !(LJ_TARGET_ARM || LJ_TARGET_ARM64 || LJ_TARGET_PPC)) || LJ_TARGET_MIPS
>  int32_t LJ_FASTCALL lj_vm_modi(int32_t a, int32_t b)
>  {
> @@ -80,40 +122,6 @@ double lj_vm_log2(double a)
>  }
>  #endif
>  
> -#if !LJ_TARGET_X86ORX64
> -/* Unsigned x^k. */
> -static double lj_vm_powui(double x, uint32_t k)
> -{
> -  double y;
> -  lj_assertX(k != 0, "pow with zero exponent");
> -  for (; (k & 1) == 0; k >>= 1) x *= x;
> -  y = x;
> -  if ((k >>= 1) != 0) {
> -    for (;;) {
> -      x *= x;
> -      if (k == 1) break;
> -      if (k & 1) y *= x;
> -      k >>= 1;
> -    }
> -    y *= x;
> -  }
> -  return y;
> -}
> -
> -/* Signed x^k. */
> -double lj_vm_powi(double x, int32_t k)
> -{
> -  if (k > 1)
> -    return lj_vm_powui(x, (uint32_t)k);
> -  else if (k == 1)
> -    return x;
> -  else if (k == 0)
> -    return 1.0;
> -  else
> -    return 1.0 / lj_vm_powui(x, (uint32_t)-k);
> -}
> -#endif
> -
>  /* Computes fpm(x) for extended math functions. */
>  double lj_vm_foldfpm(double x, int fpm)
>  {
> diff --git a/src/vm_arm.dasc b/src/vm_arm.dasc
> index 767d31f9..792f0363 100644
> --- a/src/vm_arm.dasc
> +++ b/src/vm_arm.dasc
> @@ -1485,11 +1485,11 @@ static void build_subroutines(BuildCtx *ctx)
>    |.endif
>    |.endmacro
>    |
> -  |.macro math_extern2, func
> +  |.macro math_extern2, name, func
>    |.if HFABI
> -  |  .ffunc_dd math_ .. func
> +  |  .ffunc_dd math_ .. name
>    |.else
> -  |  .ffunc_nn math_ .. func
> +  |  .ffunc_nn math_ .. name
>    |.endif
>    |  .IOS mov RA, BASE
>    |  bl extern func
> @@ -1500,6 +1500,9 @@ static void build_subroutines(BuildCtx *ctx)
>    |  b ->fff_restv
>    |.endif
>    |.endmacro
> +  |.macro math_extern2, func
> +  |  math_extern2 func, func
> +  |.endmacro
>    |
>    |.if FPU
>    |  .ffunc_d math_sqrt
> @@ -1545,7 +1548,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  math_extern sinh
>    |  math_extern cosh
>    |  math_extern tanh
> -  |  math_extern2 pow
> +  |  math_extern2 pow, lj_vm_pow
>    |  math_extern2 atan2
>    |  math_extern2 fmod
>    |
> @@ -3153,7 +3156,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      break;
>    case BC_POW:
>      |  // NYI: (partial) integer arithmetic.
> -    |  ins_arithfp extern, extern pow
> +    |  ins_arithfp extern, extern lj_vm_pow
>      break;
>  
>    case BC_CAT:
> diff --git a/src/vm_arm64.dasc b/src/vm_arm64.dasc
> index de33bde4..fb267a76 100644
> --- a/src/vm_arm64.dasc
> +++ b/src/vm_arm64.dasc
> @@ -1391,11 +1391,14 @@ static void build_subroutines(BuildCtx *ctx)
>    |  b ->fff_resn
>    |.endmacro
>    |
> -  |.macro math_extern2, func
> -  |  .ffunc_nn math_ .. func
> +  |.macro math_extern2, name, func
> +  |  .ffunc_nn math_ .. name
>    |  bl extern func
>    |  b ->fff_resn
>    |.endmacro
> +  |.macro math_extern2, func
> +  |  math_extern2 func, func
> +  |.endmacro
>    |
>    |.ffunc_n math_sqrt
>    |  fsqrt d0, d0
> @@ -1424,7 +1427,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  math_extern sinh
>    |  math_extern cosh
>    |  math_extern tanh
> -  |  math_extern2 pow
> +  |  math_extern2 pow, lj_vm_pow
>    |  math_extern2 atan2
>    |  math_extern2 fmod
>    |
> @@ -2621,7 +2624,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  ins_arithload FARG1, FARG2
>      |  ins_arithfallback ins_arithcheck_num
>      |.if "fpins" == "fpow"
> -    |  bl extern pow
> +    |  bl extern lj_vm_pow
>      |.else
>      |  fpins FARG1, FARG1, FARG2
>      |.endif
> diff --git a/src/vm_mips.dasc b/src/vm_mips.dasc
> index 32caabf7..5664f503 100644
> --- a/src/vm_mips.dasc
> +++ b/src/vm_mips.dasc
> @@ -1631,14 +1631,17 @@ static void build_subroutines(BuildCtx *ctx)
>    |.  nop
>    |.endmacro
>    |
> -  |.macro math_extern2, func
> -  |  .ffunc_nn math_ .. func
> +  |.macro math_extern2, name, func
> +  |  .ffunc_nn math_ .. name
>    |.  load_got func
>    |  call_extern
>    |.  nop
>    |  b ->fff_resn
>    |.  nop
>    |.endmacro
> +  |.macro math_extern2, func
> +  |  math_extern2 func, func
> +  |.endmacro
>    |
>    |// TODO: Return integer type if result is integer (own sf implementation).
>    |.macro math_round, func
> @@ -1692,7 +1695,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  math_extern sinh
>    |  math_extern cosh
>    |  math_extern tanh
> -  |  math_extern2 pow
> +  |  math_extern2 pow, lj_vm_pow
>    |  math_extern2 atan2
>    |  math_extern2 fmod
>    |
> @@ -3585,7 +3588,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  sltiu AT, SFARG1HI, LJ_TISNUM
>      |  sltiu TMP0, SFARG2HI, LJ_TISNUM
>      |  and AT, AT, TMP0
> -    |  load_got pow
> +    |  load_got lj_vm_pow
>      |  beqz AT, ->vmeta_arith
>      |.  addu RA, BASE, RA
>      |.if FPU
> diff --git a/src/vm_mips64.dasc b/src/vm_mips64.dasc
> index 44fba36c..249605d4 100644
> --- a/src/vm_mips64.dasc
> +++ b/src/vm_mips64.dasc
> @@ -1669,14 +1669,17 @@ static void build_subroutines(BuildCtx *ctx)
>    |.  nop
>    |.endmacro
>    |
> -  |.macro math_extern2, func
> -  |  .ffunc_nn math_ .. func
> +  |.macro math_extern2, name, func
> +  |  .ffunc_nn math_ .. name
>    |.  load_got func
>    |  call_extern
>    |.  nop
>    |  b ->fff_resn
>    |.  nop
>    |.endmacro
> +  |.macro math_extern2, func
> +  |  math_extern2 func, func
> +  |.endmacro
>    |
>    |// TODO: Return integer type if result is integer (own sf implementation).
>    |.macro math_round, func
> @@ -1730,7 +1733,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  math_extern sinh
>    |  math_extern cosh
>    |  math_extern tanh
> -  |  math_extern2 pow
> +  |  math_extern2 pow, lj_vm_pow
>    |  math_extern2 atan2
>    |  math_extern2 fmod
>    |
> @@ -3823,7 +3826,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  sltiu TMP0, TMP0, LJ_TISNUM
>      |   sltiu TMP1, TMP1, LJ_TISNUM
>      |  and AT, TMP0, TMP1
> -    |  load_got pow
> +    |  load_got lj_vm_pow
>      |  beqz AT, ->vmeta_arith
>      |.  daddu RA, BASE, RA
>      |.if FPU
> diff --git a/src/vm_ppc.dasc b/src/vm_ppc.dasc
> index 980ad897..94af63e6 100644
> --- a/src/vm_ppc.dasc
> +++ b/src/vm_ppc.dasc
> @@ -2032,11 +2032,14 @@ static void build_subroutines(BuildCtx *ctx)
>    |  b ->fff_resn
>    |.endmacro
>    |
> -  |.macro math_extern2, func
> -  |  .ffunc_nn math_ .. func
> +  |.macro math_extern2, name, func
> +  |  .ffunc_nn math_ .. name
>    |  blex func
>    |  b ->fff_resn
>    |.endmacro
> +  |.macro math_extern2, func
> +  |  math_extern2 func, func
> +  |.endmacro
>    |
>    |.macro math_round, func
>    |  .ffunc_1 math_ .. func
> @@ -2161,7 +2164,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  math_extern sinh
>    |  math_extern cosh
>    |  math_extern tanh
> -  |  math_extern2 pow
> +  |  math_extern2 pow, lj_vm_pow
>    |  math_extern2 atan2
>    |  math_extern2 fmod
>    |
> @@ -4154,7 +4157,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  checknum cr1, CARG3
>      |  crand 4*cr0+lt, 4*cr0+lt, 4*cr1+lt
>      |  bge ->vmeta_arith_vv
> -    |  blex pow
> +    |  blex lj_vm_pow
>      |  ins_next1
>      |.if FPU
>      |  stfdx FARG1, BASE, RA
> diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
> index 7b04b928..acbe8dc2 100644
> --- a/src/vm_x64.dasc
> +++ b/src/vm_x64.dasc
> @@ -1825,13 +1825,16 @@ static void build_subroutines(BuildCtx *ctx)
>    |  jmp ->fff_resxmm0
>    |.endmacro
>    |
> -  |.macro math_extern2, func
> -  |  .ffunc_nn math_ .. func
> +  |.macro math_extern2, name, func
> +  |  .ffunc_nn math_ .. name
>    |  mov RB, BASE
>    |  call extern func
>    |  mov BASE, RB
>    |  jmp ->fff_resxmm0
>    |.endmacro
> +  |.macro math_extern2, func
> +  |  math_extern2 func, func
> +  |.endmacro
>    |
>    |  math_extern log10
>    |  math_extern exp
> @@ -1844,7 +1847,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  math_extern sinh
>    |  math_extern cosh
>    |  math_extern tanh
> -  |  math_extern2 pow
> +  |  math_extern2 pow, lj_vm_pow
>    |  math_extern2 atan2
>    |  math_extern2 fmod
>    |
> @@ -2649,41 +2652,6 @@ static void build_subroutines(BuildCtx *ctx)
>    |  subsd xmm0, xmm1
>    |  ret
>    |
> -  |// Args in xmm0/eax. Ret in xmm0. xmm0-xmm1 and eax modified.
> -  |->vm_powi_sse:
> -  |  cmp eax, 1; jle >6			// i<=1?
> -  |  // Now 1 < (unsigned)i <= 0x80000000.
> -  |1:  // Handle leading zeros.
> -  |  test eax, 1; jnz >2
> -  |  mulsd xmm0, xmm0
> -  |  shr eax, 1
> -  |  jmp <1
> -  |2:
> -  |  shr eax, 1; jz >5
> -  |  movaps xmm1, xmm0
> -  |3:  // Handle trailing bits.
> -  |  mulsd xmm0, xmm0
> -  |  shr eax, 1; jz >4
> -  |  jnc <3
> -  |  mulsd xmm1, xmm0
> -  |  jmp <3
> -  |4:
> -  |  mulsd xmm0, xmm1
> -  |5:
> -  |  ret
> -  |6:
> -  |  je <5				// x^1 ==> x
> -  |  jb >7				// x^0 ==> 1
> -  |  neg eax
> -  |  call <1
> -  |  sseconst_1 xmm1, RD
> -  |  divsd xmm1, xmm0
> -  |  movaps xmm0, xmm1
> -  |  ret
> -  |7:
> -  |  sseconst_1 xmm0, RD
> -  |  ret
> -  |
>    |//-----------------------------------------------------------------------
>    |//-- Miscellaneous functions --------------------------------------------
>    |//-----------------------------------------------------------------------
> diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc
> index bd1e940e..bf30cce6 100644
> --- a/src/vm_x86.dasc
> +++ b/src/vm_x86.dasc
> @@ -2240,8 +2240,8 @@ static void build_subroutines(BuildCtx *ctx)
>    |  jmp ->fff_resfp
>    |.endmacro
>    |
> -  |.macro math_extern2, func
> -  |  .ffunc_nnsse math_ .. func
> +  |.macro math_extern2, name, func
> +  |  .ffunc_nnsse math_ .. name
>    |.if not X64
>    |  movsd FPARG1, xmm0
>    |  movsd FPARG3, xmm1
> @@ -2251,6 +2251,9 @@ static void build_subroutines(BuildCtx *ctx)
>    |  mov BASE, RB
>    |  jmp ->fff_resfp
>    |.endmacro
> +  |.macro math_extern2, func
> +  |  math_extern2 func, func
> +  |.endmacro
>    |
>    |  math_extern log10
>    |  math_extern exp
> @@ -2263,7 +2266,7 @@ static void build_subroutines(BuildCtx *ctx)
>    |  math_extern sinh
>    |  math_extern cosh
>    |  math_extern tanh
> -  |  math_extern2 pow
> +  |  math_extern2 pow, lj_vm_pow
>    |  math_extern2 atan2
>    |  math_extern2 fmod
>    |
> @@ -3140,41 +3143,6 @@ static void build_subroutines(BuildCtx *ctx)
>    |  subsd xmm0, xmm1
>    |  ret
>    |
> -  |// Args in xmm0/eax. Ret in xmm0. xmm0-xmm1 and eax modified.
> -  |->vm_powi_sse:
> -  |  cmp eax, 1; jle >6			// i<=1?
> -  |  // Now 1 < (unsigned)i <= 0x80000000.
> -  |1:  // Handle leading zeros.
> -  |  test eax, 1; jnz >2
> -  |  mulsd xmm0, xmm0
> -  |  shr eax, 1
> -  |  jmp <1
> -  |2:
> -  |  shr eax, 1; jz >5
> -  |  movaps xmm1, xmm0
> -  |3:  // Handle trailing bits.
> -  |  mulsd xmm0, xmm0
> -  |  shr eax, 1; jz >4
> -  |  jnc <3
> -  |  mulsd xmm1, xmm0
> -  |  jmp <3
> -  |4:
> -  |  mulsd xmm0, xmm1
> -  |5:
> -  |  ret
> -  |6:
> -  |  je <5				// x^1 ==> x
> -  |  jb >7				// x^0 ==> 1
> -  |  neg eax
> -  |  call <1
> -  |  sseconst_1 xmm1, RDa
> -  |  divsd xmm1, xmm0
> -  |  movaps xmm0, xmm1
> -  |  ret
> -  |7:
> -  |  sseconst_1 xmm0, RDa
> -  |  ret
> -  |
>    |//-----------------------------------------------------------------------
>    |//-- Miscellaneous functions --------------------------------------------
>    |//-----------------------------------------------------------------------
> @@ -3976,7 +3944,7 @@ static void build_ins(BuildCtx *ctx, BCOp op, int defop)
>      |  movsd FPARG1, xmm0
>      |  movsd FPARG3, xmm1
>      |.endif
> -    |  call extern pow
> +    |  call extern lj_vm_pow
>      |  movzx RA, PC_RA
>      |  mov BASE, RB
>      |.if X64
> diff --git a/test/tarantool-tests/lj-684-pow-inconsistencies.test.lua b/test/tarantool-tests/lj-684-pow-inconsistencies.test.lua
> new file mode 100644
> index 00000000..5129fc45
> --- /dev/null
> +++ b/test/tarantool-tests/lj-684-pow-inconsistencies.test.lua
> @@ -0,0 +1,89 @@
> +local tap = require('tap')
> +-- Test to demonstrate the incorrect JIT behaviour for different
> +-- power operation optimizations.
> +-- See also:
> +-- https://github.com/LuaJIT/LuaJIT/issues/684.
> +local test = tap.test('lj-684-pow-inconsistencies'):skipcond({
> +  ['Test requires JIT enabled'] = not jit.status(),
> +})
> +
> +local tostring = tostring
> +
> +test:plan(4)
> +
> +jit.opt.start('hotloop=1')
> +
> +-- XXX: Prevent hotcount side effects.
> +jit.off()
> +jit.flush()
> +
> +local res = {}
> +-- -0 ^ 0.5 = 0. Test sign with `tostring()`.
Typo: s/Test/Test the/
> +-- XXX: use local variable to prevent folding via parser.
> +-- XXX: use stack slot out of trace to prevent constant folding.
> +local minus_zero = -0
> +jit.on()
> +for i = 1, 4 do
> +  res[i] = tostring(minus_zero ^ 0.5)
> +end
> +
> +-- XXX: Prevent hotcount side effects.
> +jit.off()
> +jit.flush()
> +
> +test:samevalues(res, ('consistent results for folding (-0) ^ 0.5'))
> +
> +jit.on()
> +-- -inf ^ 0.5 = inf.
> +res = {}
> +local minus_inf = -math.huge
> +jit.on()
> +for i = 1, 4 do
> +  res[i] = minus_inf ^ 0.5
> +end
> +
> +-- XXX: Prevent hotcount side effects.
> +jit.off()
> +jit.flush()
> +
> +test:samevalues(res, ('consistent results for folding (-inf) ^ 0.5'))
> +
> +-- 2921 ^ 0.5 = 0x1.b05ec632536fap+5.
We certainly need to add some explanation here about the precision, because
it is not obvious why these magic numbers should cause any issues.
> +res = {}
> +-- XXX: use local variable to prevent folding via parser.
> +-- XXX: use stack slot out of trace to prevent constant folding.
> +local corner_case_05 = 2921
> +jit.on()
> +for i = 1, 4 do
> +  res[i] = corner_case_05 ^ 0.5
> +end
> +
> +-- XXX: Prevent hotcount side effects.
> +jit.off()
> +jit.flush()
> +
> +test:samevalues(res, ('consistent results for folding 2921 ^ 0.5'))

I believe all three cases above can be covered by a single function with
different parameters.
Something like `test_power(value, power, extra_map)`, where `extra_map` is
`tostring` for the (-0) case and an identity function otherwise, so the loop
body becomes
| res[i] = extra_map(value ^ power)

> +
> +-- Narrowing for non-constant base of power operation.
> +local function pow(base, power)
> +  return base ^ power
> +end
> +
> +jit.on()
> +
> +-- Compile function first.
> +pow(1, 2)
> +pow(1, 2)
> +
> +-- Need some value near 1, to avoid infinite result.
Typo: s/Need/We need/
Typo: s/avoid/avoid an/
> +local base = 1.0000000001
> +local power = 65536 * 3
> +local resulting_value = pow(base, power)
> +
> +-- XXX: Prevent hotcount side effects.
> +jit.off()
> +jit.flush()
> +
> +test:is(resulting_value, base ^ power, 'guard for narrowing of power operation')
> +
> +test:done(true)
> diff --git a/test/tarantool-tests/lj-9-pow-inconsistencies.test.lua b/test/tarantool-tests/lj-9-pow-inconsistencies.test.lua
> index 21b3a0d9..1f7f65c5 100644
> --- a/test/tarantool-tests/lj-9-pow-inconsistencies.test.lua
> +++ b/test/tarantool-tests/lj-9-pow-inconsistencies.test.lua
> @@ -16,6 +16,8 @@ local INTERESTING_VALUES = {
>    -- x ^  inf = 0 (inf), if |x| < 1 (|x| > 1).
>    -- x ^ -inf = inf (0), if |x| < 1 (|x| > 1).
>    0.999999, 1.000001, -0.999999, -1.000001,
> +  -- Test power of even numbers optimizations.
> +  2, -2, 0.5, -0.5,
>  }
>  test:plan(1 + (#INTERESTING_VALUES) ^ 2)
>  
> -- 
> 2.41.0
>