* Re: [Tarantool-patches] [PATCH luajit v6] Fix math.min()/math.max() inconsistencies.
2022-12-19 9:52 [Tarantool-patches] [PATCH luajit v6] Fix math.min()/math.max() inconsistencies Maxim Kokryashkin via Tarantool-patches
@ 2022-12-23 15:42 ` Sergey Kaplun via Tarantool-patches
0 siblings, 0 replies; 2+ messages in thread
From: Sergey Kaplun via Tarantool-patches @ 2022-12-23 15:42 UTC (permalink / raw)
To: Maxim Kokryashkin; +Cc: tarantool-patches
Hi Maxim!
Thanks for the fixes!
Please consider my comments below.
Also, during a flash of inspiration and debugging test cases for this
particular patch I found the test case for `math.modf()`. So, we need
an additional patch with the test as a followup to the backported
commit [8].
The following code:
| LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local modf = math.modf
| local nan = 0/0
| local inf = math.huge
|
| local r1 = {nil, nil, nil, nil}
| local r2 = {nil, nil, nil, nil}
| local r3 = {nil, nil, nil, nil}
| local r4 = {nil, nil, nil, nil}
| print("MODF")
| for i = 1, 4 do
| r1[i], r2[i] = modf(inf)
| r3[i], r4[i] = modf(nan)
| end
| print(r1[1], r1[2], r1[3], r1[4])
| print(r2[1], r2[2], r2[3], r2[4])
| print("")
| print(r3[1], r3[2], r3[3], r3[4])
| print(r4[1], r4[2], r4[3], r4[4])
| '
Returns before the patch (on both arches):
| MODF
| inf inf inf inf
| 0 0 nan nan
|
| nan nan nan nan
| nan nan nan nan
And after patch (on both arches):
| MODF
| inf inf inf inf
| 0 0 0 0
|
| nan nan nan nan
| nan nan nan nan
On 19.12.22, Maxim Kokryashkin wrote:
> From: Mike Pall <mike>
>
> (cherry-picked from commit 03208c8162af9cc01ca76ee1676ca79e5abe9b60)
>
> `math.min()`/`math.max()` could produce different results.
> Previously, dirty values on the Lua stack could be
> treated as arguments to `math.min()`/`math.max()`.
> This patch adds check for the number of arguments provided to
> math.min/max, which fixes the issue.
>
> Also it adds the corresponding test case for
> the mentioned issue and does some refactoring:
OK, this is not a refactoring but adding a consistency between the JIT and the
VM behaviour.
> 1. fcc is changed for min/max functions in ARM
> assembly from LO/HI (lower/upper or unordered) to LE/PL
> (lower/upper, equal or unoredered).
>
> 2. Several fold optimizations for min/max were removed
> or modified.
See my comments with tests below.
>
> Resolves tarantool/tarantool#6163
> ---
>
> >IMHO, LO -> LT (N!=V Less than or unordered) do the same thing wo
> >changing the order.
> >
> >IINM, an ordered comparison checks if neither operand is NaN.
> >Conversely, an unordered comparison checks if either operand is a NaN.
> >
> >So, looks like an attempt to fix inconsistent behaviour for NaNs in
> >math.min/math.max on aarch64.
> >
> >Also, I found inconsistent behaviour on x86 (between LuaJIT|Lua):
> >
> >| # on upstream build
> >| $ ./luajit -Ohotloop=1 -e 'local res = {} for i = 1,4 do res[i] = math.max(0/0, math.huge) end for i = 1, #res do print(res[i]) end'
> >| inf
> >| inf
> >| inf
> >| inf
> >| $ lua -e 'local res = {} for i = 1,4 do res[i] = math.max(0/0, math.huge) end for i = 1, #res do print(res[i]) end'
> >| -nan
> >| -nan
> >| -nan
> >| -nan
> >
> >Can you please test some similar examples on aarch64/M1?
> I've tested those on M1 and here are the results:
> $ ./src/luajit -Ohotloop=1 -e 'local res = {} for i=1,4 do res[i]=math.max(0/0,math.huge) end for i =1, #res do print(res[i]) end'
> inf
> inf
> inf
> inf
>
> $ lua -e 'local res = {} for i=1,4 do res[i]=math.max(0/0,math.huge) end for i =1, #res do print(res[i]) end'
> nan
> nan
> nan
> nan
>
> >Also, AFAICS, some optimizations are the reason of inconsistent
> >behaviour for JIT-ed code (not the fold in this commit).
> >| # on our fork
> >| ./luajit -O0 -Ohotloop=1 -e 'local res = {} for i = 1,4 do res[i] = math.max(0/0, math.huge) end for i = 1, #res do print(res[i]) end'
> >| inf
> >| inf
> >| inf
> >| inf
> >| # on our fork
> >| ./luajit -Ohotloop=1 -e 'local res = {} for i = 1,4 do res[i] = math.max(0/0, math.huge) end for i = 1, #res do print(res[i]) end'
> >| inf
> >| inf
> >| inf
> >| nan
> >
> >BTW this commit doesn't fix the problem. Can you please bisect the
> >commit to backport?
> Works perfectly fine on our fork after this patch.
>
My, bad. PEBCAK:).
The aforementioned testcase really passes after the whole patch.
Side note: __Five minutes later__:
OK, nevermind, still inconsistent on upstream (in some other way):
| $ src/luajit -Ohotloop=1 -e '
| local inf = math.huge
| local nan = inf/inf
| local min = math.min
| local max = math.max
|
| print(nan, inf)
|
| local r = {}
| local r_assoc = {}
| print("MIN:")
| for i = 1, 4 do
| r[i] = min(nan, inf)
| r_assoc[i] = min(min(nan, inf), nan)
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
|
| print("MAX:")
| for i = 1, 4 do
| r[i] = max(nan, inf)
| r_assoc[i] = max(max(nan, inf), nan)
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| '
| nan inf
| MIN:
| inf inf inf nan
| nan nan nan nan
| MAX:
| inf inf nan nan
| nan nan nan nan
> >> diff --git a/src/lj_opt_fold.c b/src/lj_opt_fold.c
> >> index 276dc040..07a52a4d 100644
> >> --- a/src/lj_opt_fold.c
> >> +++ b/src/lj_opt_fold.c
> >> @@ -1774,8 +1774,6 @@ LJFOLDF(reassoc_intarith_k64)
> >> #endif
> >> }
> >>
> >> -LJFOLD(MIN MIN any)
> >> -LJFOLD(MAX MAX any)
> >> LJFOLD(BAND BAND any)
> >> LJFOLD(BOR BOR any)
> >> LJFOLDF(reassoc_dup)
> >> @@ -1785,6 +1783,15 @@ LJFOLDF(reassoc_dup)
> >> return NEXTFOLD;
> >> }
> >>
> >> +LJFOLD(MIN MIN any)
> >> +LJFOLD(MAX MAX any)
> >> +LJFOLDF(reassoc_dup_minmax)
> >> +{
> >> + if (fins->op2 == fleft->op2)
> >> + return LEFTFOLD; /* (a o b) o b ==> a o b */
> >> + return NEXTFOLD;
> >> +}
> >> +
I don't understand why this should be fixed by removing this
optimizations (this is about `min(min(a, b), b)` and `max(max(a, b), b)`
constructions).
Do you mean `fold_comm_dup_minmax()` part?
> >
> >Do you know why the opt `(a o b) o a ==> a o b;` is ommited now?
> >Are there any examples of incorrect behaviour? I suggest to check NaN
> >behaviour in this case.
> Added the test case for that one, but I failed to find any for the
> other.
>
> I've done test runs on all of the combinations of `{1, -1, 0, -0, 0/0,
> -math.huge, math.huge}` for all of the optimizations.
>
> I'll be glad to add another test case if you can think of any.
OK, I mark all spots with tests to be added as (T). Please, provide
verbose description for each of my comments in commit message and
comments in tests.
>
> src/lj_asm_arm.h | 6 +--
> src/lj_asm_arm64.h | 6 +--
> src/lj_opt_fold.c | 53 +++++++------------
> src/lj_vmmath.c | 4 +-
> src/vm_arm.dasc | 4 +-
> src/vm_arm64.dasc | 4 +-
> src/vm_x64.dasc | 2 +-
> src/vm_x86.dasc | 2 +-
> test/tarantool-tests/gh-6163-min-max.test.lua | 48 +++++++++++++++++
> 9 files changed, 81 insertions(+), 48 deletions(-)
> create mode 100644 test/tarantool-tests/gh-6163-min-max.test.lua
>
> diff --git a/src/lj_asm_arm.h b/src/lj_asm_arm.h
> index 8af19eb9..6ae6e2f2 100644
> --- a/src/lj_asm_arm.h
> +++ b/src/lj_asm_arm.h
> @@ -1663,8 +1663,8 @@ static void asm_min_max(ASMState *as, IRIns *ir, int cc, int fcc)
> asm_intmin_max(as, ir, cc);
> }
>
> -#define asm_min(as, ir) asm_min_max(as, ir, CC_GT, CC_HI)
> -#define asm_max(as, ir) asm_min_max(as, ir, CC_LT, CC_LO)
> +#define asm_min(as, ir) asm_min_max(as, ir, CC_GT, CC_PL)
> +#define asm_max(as, ir) asm_min_max(as, ir, CC_LT, CC_LE)
>
> /* -- Comparisons --------------------------------------------------------- */
>
> @@ -1856,7 +1856,7 @@ static void asm_hiop(ASMState *as, IRIns *ir)
> } else if ((ir-1)->o == IR_MIN || (ir-1)->o == IR_MAX) {
> as->curins--; /* Always skip the loword min/max. */
> if (uselo || usehi)
> - asm_sfpmin_max(as, ir-1, (ir-1)->o == IR_MIN ? CC_HI : CC_LO);
> + asm_sfpmin_max(as, ir-1, (ir-1)->o == IR_MIN ? CC_PL : CC_LE);
> return;
> #elif LJ_HASFFI
> } else if ((ir-1)->o == IR_CONV) {
> diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h
> index 4aeb51f3..fe197700 100644
> --- a/src/lj_asm_arm64.h
> +++ b/src/lj_asm_arm64.h
> @@ -1592,7 +1592,7 @@ static void asm_fpmin_max(ASMState *as, IRIns *ir, A64CC fcc)
> Reg dest = (ra_dest(as, ir, RSET_FPR) & 31);
> Reg right, left = ra_alloc2(as, ir, RSET_FPR);
> right = ((left >> 8) & 31); left &= 31;
> - emit_dnm(as, A64I_FCSELd | A64F_CC(fcc), dest, left, right);
> + emit_dnm(as, A64I_FCSELd | A64F_CC(fcc), dest, right, left);
> emit_nm(as, A64I_FCMPd, left, right);
> }
>
> @@ -1604,8 +1604,8 @@ static void asm_min_max(ASMState *as, IRIns *ir, A64CC cc, A64CC fcc)
> asm_intmin_max(as, ir, cc);
> }
>
> -#define asm_max(as, ir) asm_min_max(as, ir, CC_GT, CC_HI)
> -#define asm_min(as, ir) asm_min_max(as, ir, CC_LT, CC_LO)
> +#define asm_min(as, ir) asm_min_max(as, ir, CC_LT, CC_PL)
> +#define asm_max(as, ir) asm_min_max(as, ir, CC_GT, CC_LE)
>
> /* -- Comparisons --------------------------------------------------------- */
>
> diff --git a/src/lj_opt_fold.c b/src/lj_opt_fold.c
I test each piece of changes on x86/x64 and aarch64 and here are my
observations:
> index 49f74996..27e489af 100644
> --- a/src/lj_opt_fold.c
> +++ b/src/lj_opt_fold.c
> @@ -1797,8 +1797,6 @@ LJFOLDF(reassoc_intarith_k64)
> #endif
> }
>
> -LJFOLD(MIN MIN any)
> -LJFOLD(MAX MAX any)
> LJFOLD(BAND BAND any)
> LJFOLD(BOR BOR any)
> LJFOLDF(reassoc_dup)
> @@ -1808,6 +1806,15 @@ LJFOLDF(reassoc_dup)
> return NEXTFOLD;
> }
>
> +LJFOLD(MIN MIN any)
> +LJFOLD(MAX MAX any)
> +LJFOLDF(reassoc_dup_minmax)
> +{
> + if (fins->op2 == fleft->op2)
> + return LEFTFOLD; /* (a o b) o b ==> a o b */
> + return NEXTFOLD;
> +}
> +
Avoiding of `(a o b) o a ==> a o b` optimization "fixes" (if I get the
idea right) the cases like the following on arch64:
| min(min(x, nan), x)
where `x` is a finite number.
(T) For example the following snippet:
| LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local nan = 0/0
| local x = 1
| local min = math.min
|
| print(x, nan, inf)
| print("MIN:", x, nan)
| local r = {}
| local r_assoc = {}
| for k = 1, 4 do
| r[k] = min(x, nan)
| r_assoc[k] = min(min(x, nan), x)
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4], "\n")
| '
prints the following before the patch:
| 1 nan inf
| MIN: 1 nan
| nan nan nan nan
| 1 1 1 nan
and after patch:
| 1 nan inf
| MIN: 1 nan
| nan nan nan nan
| 1 1 1 1
For this code we get the following IRs:
| 0003 x8 > int SLOAD #4 T ; 1
| 0004 d3 > num SLOAD #3 T ; nan
| 0005 > fun EQ 0002 math.min
| 0006 d2 num CONV 0003 num.int ; 1 -> 1.0
| 0007 d1 num MIN 0006 0004 ; res_0007 = min(1.0, nan) = nan
In case after patch with removed fold optimization we get the additional IR
later:
| 0016 d0 num MIN 0007 0006 ; min(res_0007, 1.0)
with the following mcode:
| fcmp d2, d3 ; fcmp 1.0, nan
| fcsel d1, d2, d3, cc ; d1 == nan after this instruction (*)
| ...
| fcmp d1, d2 ; fcmp nan, 1.0
| fcsel d0, d1, d2, cc ; d0 == 1.0 after this instruction (*)
Before the patch only one `fcmp` `fcsel` pair is emitted, so `nan` value
is returned in the destination register.
(*) Why this value is chosen:
According to the `fcmp` documentation:
| The IEEE 754 standard specifies that the result of a comparison is
| precisely one of <, ==, > or unordered. If either or both of the
| operands are NaNs, they are unordered, and all three of (Operand1 <
| Operand2), (Operand1 == Operand2) and (Operand1 > Operand2) are false.
| This case results in the FPSCR flags being set to N=0, Z=0, C=1, and
| V=1.
`cc` (aka `CC_LO` == `CC_CC`) condition means that C flag is 0 [2], that
is false when we are comparing something with NaN.
So, according to the `fcsel` documentation [3]:
| If the condition passes, the first SIMD and FP source register value
| is taken, otherwise the second SIMD and FP source register value is
| taken.
So, the value of the second source register is always chosen.
(TT) When we change the order of arguments things become broken again:
| $ LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local inf = math.huge
| local nan = 0/0
|
| local x = 1
| local min = math.min
| local max = math.max
|
| print(x, nan, inf)
| print("MIN:", x, nan)
| local r = {}
| local r_assoc = {}
| for k = 1, 4 do
| r[k] = min(x, nan)
| r_assoc[k] = min(x, min(x, nan))
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4], "\n")
| '
Prints the following before the patch (on aarch64):
| 1 nan inf
| MIN: 1 nan
| nan nan nan nan
| nan nan nan nan
And after this particular chunk of the patch (reassoc_dup_minmax):
| 1 nan inf
| MIN: 1 nan
| nan nan nan nan
| nan nan nan 1
> LJFOLD(BXOR BXOR any)
> LJFOLDF(reassoc_bxor)
> {
> @@ -1846,23 +1853,12 @@ LJFOLDF(reassoc_shift)
> return NEXTFOLD;
> }
>
> -LJFOLD(MIN MIN KNUM)
> -LJFOLD(MAX MAX KNUM)
> LJFOLD(MIN MIN KINT)
> LJFOLD(MAX MAX KINT)
> LJFOLDF(reassoc_minmax_k)
> {
> IRIns *irk = IR(fleft->op2);
> - if (irk->o == IR_KNUM) {
> - lua_Number a = ir_knum(irk)->n;
> - lua_Number y = lj_vm_foldarith(a, knumright, fins->o - IR_ADD);
> - if (a == y) /* (x o k1) o k2 ==> x o k1, if (k1 o k2) == k1. */
> - return LEFTFOLD;
> - PHIBARRIER(fleft);
> - fins->op1 = fleft->op1;
> - fins->op2 = (IRRef1)lj_ir_knum(J, y);
> - return RETRYFOLD; /* (x o k1) o k2 ==> x o (k1 o k2) */
> - } else if (irk->o == IR_KINT) {
> + if (irk->o == IR_KINT) {
> int32_t a = irk->i;
> int32_t y = kfold_intop(a, fright->i, fins->o);
> if (a == y) /* (x o k1) o k2 ==> x o k1, if (k1 o k2) == k1. */
> @@ -1875,24 +1871,6 @@ LJFOLDF(reassoc_minmax_k)
> return NEXTFOLD;
> }
(TTT) This chunk fixes behaviour for constants reassociation.
Run the following chunk:
| LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local inf = math.huge
| local nan = 0/0
| local min = math.min
| local max = math.max
|
| local r = {}
| local r_assoc = {}
| print("MIN:")
| local x = 1.2
| for i = 1, 4 do
| r[i] = min(min(x, 0/0), 1.3)
| r_assoc[i] = min(min(x, 1.3), 0/0)
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
|
| print("MAX:")
| for i = 1, 4 do
| r[i] = max(max(x, 0/0), 1.1)
| r_assoc[i] = max(max(x, 1.1), 0/0)
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| '
Gives the following results before the particular (reassoc_minmax_k)
patch (on both arches):
| MIN:
| 1.3 1.3 1.3 1.2
| nan nan nan nan
| MAX:
| 1.1 1.1 1.2 1.2
| nan nan nan nan
And after patch (on both arches):
| MIN:
| 1.3 1.3 1.3 1.3
| nan nan nan nan
| MAX:
| 1.1 1.1 1.1 1.1
| nan nan nan nan
>
> -LJFOLD(MIN MAX any)
> -LJFOLD(MAX MIN any)
> -LJFOLDF(reassoc_minmax_left)
> -{
> - if (fins->op2 == fleft->op1 || fins->op2 == fleft->op2)
> - return RIGHTFOLD; /* (b o1 a) o2 b ==> b; (a o1 b) o2 b ==> b */
> - return NEXTFOLD;
> -}
This particular patch (reassoc_minmax_left) fixes the following test
case on aarch64. (TTTT)
| LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local min = math.min
| local max = math.max
| local nan = 0/0
|
| local r_assoc = {}
| local r_assoc2 = {}
| print("MIN - MAX:")
| for i = 1, 4 do
| r_assoc[i] = min(max(nan, 1), 1)
| r_assoc2[i] = min(max(1, nan), 1)
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| print(r_assoc2[1], r_assoc2[2], r_assoc2[3], r_assoc2[4], "\n")
|
| print("MAX - MIN:")
| for i = 1, 4 do
| r_assoc[i] = max(min(nan, 1), 1)
| r_assoc2[i] = max(min(1, nan), 1)
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| print(r_assoc2[1], r_assoc2[2], r_assoc2[3], r_assoc2[4], "\n")
| '
Returns before the patch:
| MIN - MAX:
| 1 1 1 nan
| 1 1 1 nan
|
| MAX - MIN:
| 1 1 nan nan
| 1 1 nan nan
And after the patch:
| MIN - MAX:
| 1 1 1 1
| 1 1 1 1
|
| MAX - MIN:
| 1 1 1 1
| 1 1 1 1
> -
> -LJFOLD(MIN any MAX)
> -LJFOLD(MAX any MIN)
> -LJFOLDF(reassoc_minmax_right)
> -{
> - if (fins->op1 == fright->op1 || fins->op1 == fright->op2)
> - return LEFTFOLD; /* a o2 (a o1 b) ==> a; a o2 (b o1 a) ==> a */
> - return NEXTFOLD;
> -}
> -
OK, let's try a similar test case here (TTTTT):
| LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local min = math.min
| local max = math.max
| local nan = 0/0
|
| local r_assoc = {}
| local r_assoc2 = {}
| print("MIN - MAX:")
| for i = 1, 4 do
| r_assoc[i] = min(1, max(nan, 1))
| r_assoc2[i] = min(1, max(1, nan))
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| print(r_assoc2[1], r_assoc2[2], r_assoc2[3], r_assoc2[4], "\n")
|
| print("MAX - MIN:")
| for i = 1, 4 do
| r_assoc[i] = max(1, min(nan, 1))
| r_assoc2[i] = max(1, min(1, nan))
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| print(r_assoc2[1], r_assoc2[2], r_assoc2[3], r_assoc2[4], "\n")
| '
Before the patch it leads to the assertion failure in
`rec_check_slots()` (both arches):
| MIN - MAX:
| luajit: src/lj_record.c:142: rec_check_slots: Assertion `lj_obj_equal(tv, &tvk)' failed.
| Aborted
After the patch we got inconsistent results on x86/x64 (same results in
the upstream):
| MIN - MAX:
| 1 1 1 1
| nan nan nan 1
|
| MAX - MIN:
| 1 1 1 1
| nan nan 1 1
But for aarch64 it is fine:
| MIN - MAX:
| 1 1 1 1
| nan nan nan nan
|
| MAX - MIN:
| 1 1 1 1
| nan nan nan nan
I suggest to add this testcase without checking values consistency, but
with FIXME: mark.
> /* -- Array bounds check elimination -------------------------------------- */
>
> /* Eliminate ABC across PHIs to handle t[i-1] forwarding case.
> @@ -2018,8 +1996,6 @@ LJFOLDF(comm_comp)
>
> LJFOLD(BAND any any)
> LJFOLD(BOR any any)
> -LJFOLD(MIN any any)
> -LJFOLD(MAX any any)
> LJFOLDF(comm_dup)
> {
> if (fins->op1 == fins->op2) /* x o x ==> x */
> @@ -2027,6 +2003,15 @@ LJFOLDF(comm_dup)
> return fold_comm_swap(J);
> }
>
> +LJFOLD(MIN any any)
> +LJFOLD(MAX any any)
> +LJFOLDF(comm_dup_minmax)
> +{
> + if (fins->op1 == fins->op2) /* x o x ==> x */
> + return LEFTFOLD;
> + return NEXTFOLD;
> +}
> +
No need to swap operands here to avoid side effects, so just continue.
This fixes (TT) on arm64.
We get the following mcode now:
| fcmp d2, d3 ; fcmp 1.0, nan
| fcsel d1, d3, d2, pl ; d1 == nan after this instruction
| ...
| fcmp d2, d1 ; fcmp 1.0, nan
| fcsel d0, d1, d2, pl ; d0 == nan after this instruction
`pl` (aka `CC_PL`) condition means that N flag is 0 [2], that
is true when we are comparing something with NaN. So, the value of the
first source register is taken.
After this particular chunk of the patch (comm_dup_minmax) plus
(reassoc_dup_minmax) (TT) output is the following on aarch64:
| 1 nan inf
| MIN: 1 nan
| nan nan nan nan
| nan nan nan nan
> LJFOLD(BXOR any any)
> LJFOLDF(comm_bxor)
> {
> diff --git a/src/lj_vmmath.c b/src/lj_vmmath.c
> index c04459bd..ae4e0f15 100644
> --- a/src/lj_vmmath.c
> +++ b/src/lj_vmmath.c
> @@ -49,8 +49,8 @@ double lj_vm_foldarith(double x, double y, int op)
> case IR_ABS - IR_ADD: return fabs(x); break;
> #if LJ_HASJIT
> case IR_LDEXP - IR_ADD: return ldexp(x, (int)y); break;
> - case IR_MIN - IR_ADD: return x > y ? y : x; break;
> - case IR_MAX - IR_ADD: return x < y ? y : x; break;
> + case IR_MIN - IR_ADD: return x < y ? x : y; break;
> + case IR_MAX - IR_ADD: return x > y ? x : y; break;
(TTTTTT) This piece of the patch fixes inconsistency in
`fold_kfold_numarith()` (on x86/x64):
| $ LUA_PATH="src/?.lua;;" src/luajit -jdump=i -Ohotloop=1 -e '
| local min = math.min
| local max = math.max
|
| local r_assoc = {}
| print("MIN:")
| for i = 1, 4 do
| r_assoc[i] = min(min(7.1, 0/0), 1.1)
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4], "\n")
|
| print("MAX:")
| for i = 1, 4 do
| r_assoc[i] = max(max(7.1, 0/0), 1.1)
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| '
| MIN:
| ---- TRACE 1 start (command line):7
| luajit: /home/burii/reviews/luajit/minmax/src/lj_record.c:142: rec_check_slots: Assertion `lj_obj_equal(tv, &tvk)' failed.
| Aborted
NB: use 0/0 constant here according to `fold_kfold_numarith()`
semantics.
`tv` is nan(0x8000000000000), `tvk` is 7.1.
If we look in disassembled code of `lj_vm_foldarith()` we can see the
following:
| /* In our example x == 7.1, y == nan */
| case IR_MIN - IR_ADD: return x > y ? y : x; break;
| case IR_MAX - IR_ADD: return x < y ? y : x; break;
| ; case IR_MIN
| <lj_vm_foldarith+337>: movsd xmm0,QWORD PTR [rsp+0x18] ; xmm0 <- 7.1
| <lj_vm_foldarith+343>: comisd xmm0,QWORD PTR [rsp+0x10] ; comisd 7.1, nan
| <lj_vm_foldarith+349>: jbe <lj_vm_foldarith+358> ; >= ?
| <lj_vm_foldarith+351>: mov rax,QWORD PTR [rsp+0x10] ; return nan
| <lj_vm_foldarith+356>: jmp <lj_vm_foldarith+398> ;
| <lj_vm_foldarith+358>: mov rax,QWORD PTR [rsp+0x18] ; else return 7.1
| <lj_vm_foldarith+363>: jmp <lj_vm_foldarith+398> ;
| ; case IR_MAX
| <lj_vm_foldarith+365>: movsd xmm0,QWORD PTR [rsp+0x10] ; xmm0 <- nan
| <lj_vm_foldarith+371>: comisd xmm0,QWORD PTR [rsp+0x18] ; comisd nan, 7.1
| <lj_vm_foldarith+377>: jbe <lj_vm_foldarith+386> ; >= ?
| <lj_vm_foldarith+379>: mov rax,QWORD PTR [rsp+0x10] ; return nan
| <lj_vm_foldarith+384>: jmp <lj_vm_foldarith+398> ;
| <lj_vm_foldarith+386>: mov rax,QWORD PTR [rsp+0x18] ; else return 7.1
| <lj_vm_foldarith+391>: jmp <lj_vm_foldarith+398> ;
According to `comisd` documentation [4] in case when the one operand is NaN,
the result is unordered and ZF,PF,CF := 111. This means that `jbe`
condition is true (CF=1 or ZF=1)[5], so we return 7.1 (the first
operand) for case `IR_MIN`.
Q: Why it is the problem?
For answer lets see the `lj_ff_math_min()` and `lj_ff_math_max()` in the VM:
For number values we got the following:
|7:
| sseop xmm0, xmm1
Where `sseop` is `minsd`/`maxsd` instruction correspondingly.
From the instruction reference guides [6][7]:
| If only one value is a NaN (SNaN or QNaN) for this instruction, the
| second source operand, either a NaN or a valid floating-point value, is
| written to the result.
After the patch we got the following disassembled code for
`lj_vm_foldarith()`:
| ; case IR_MIN
| <lj_vm_foldarith+337>: movsd xmm0,QWORD PTR [rsp+0x10] ; xmm0 <- nan
| <lj_vm_foldarith+343>: comisd xmm0,QWORD PTR [rsp+0x18] ; comisd nan, 7.1
| <lj_vm_foldarith+349>: jbe <lj_vm_foldarith+358> ; >= ?
| <lj_vm_foldarith+351>: mov rax,QWORD PTR [rsp+0x18] ; return 7.1
| <lj_vm_foldarith+356>: jmp <lj_vm_foldarith+398> ;
| <lj_vm_foldarith+358>: mov rax,QWORD PTR [rsp+0x10] ; else return nan
| <lj_vm_foldarith+363>: jmp <lj_vm_foldarith+398> ;
| ; case IR_MAX
| <lj_vm_foldarith+365>: movsd xmm0,QWORD PTR [rsp+0x18] ; xmm0 <- 7.1
| <lj_vm_foldarith+371>: comisd xmm0,QWORD PTR [rsp+0x10] ; comisd 7.1, nan
| <lj_vm_foldarith+377>: jbe <lj_vm_foldarith+386> ; >= ?
| <lj_vm_foldarith+379>: mov rax,QWORD PTR [rsp+0x18] ; return 7.1
| <lj_vm_foldarith+384>: jmp <lj_vm_foldarith+398> ;
| <lj_vm_foldarith+386>: mov rax,QWORD PTR [rsp+0x10] ; else return nan
| <lj_vm_foldarith+391>: jmp <lj_vm_foldarith+398> ;
So now we always return the second operand (nan for case `IR_MIN`).
Side note: IMHO, instead of this behavior, it the NaN source operand
should be returned. Also, IINM, it makes the behaviour consistent with
PUC-Rio Lua 5.1.
NB:
For aarch64 this changes the follwing assembly:
| 62630: ldr d1, [sp, #40]
| 62634: ldr d0, [sp, #32]
| 62638: fcmpe d1, d0
| 6263c: b.le 62648 <lj_vm_foldarith+0x150>
| 62640: ldr d0, [sp, #32]
| 62644: b 62674 <lj_vm_foldarith+0x17c>
| 62648: ldr d0, [sp, #40]
| 6264c: b 62674 <lj_vm_foldarith+0x17c>
| 62650: ldr d1, [sp, #40]
| 62654: ldr d0, [sp, #32]
| 62658: fcmpe d1, d0
| 6265c: b.pl 62668 <lj_vm_foldarith+0x170> // b.nfrst
| 62660: ldr d0, [sp, #32]
| 62664: b 62674 <lj_vm_foldarith+0x17c>
| 62668: ldr d0, [sp, #40]
| 6266c: b 62674 <lj_vm_foldarith+0x17c>
To this one:
| 62620: ldr d1, [sp, #40]
| 62624: ldr d0, [sp, #32]
| 62628: fcmpe d1, d0
| 6262c: b.pl 62638 <lj_vm_foldarith+0x150> // b.nfrst
| 62630: ldr d0, [sp, #40]
| 62634: b 62664 <lj_vm_foldarith+0x17c>
| 62638: ldr d0, [sp, #32]
| 6263c: b 62664 <lj_vm_foldarith+0x17c>
| 62640: ldr d1, [sp, #40]
| 62644: ldr d0, [sp, #32]
| 62648: fcmpe d1, d0
| 6264c: b.le 62658 <lj_vm_foldarith+0x170>
| 62650: ldr d0, [sp, #40]
| 62654: b 62664 <lj_vm_foldarith+0x17c>
| 62658: ldr d0, [sp, #32]
| 6265c: b 62664 <lj_vm_foldarith+0x17c>
So, we should provide the same changes for VM|JIT generated mcode (and
this is why CC_LO/CC_HI are replaced with CC_LE/CC_PL)
> #endif
> default: return x;
> }
> diff --git a/src/vm_arm.dasc b/src/vm_arm.dasc
> index a29292f1..89faa03e 100644
> --- a/src/vm_arm.dasc
> +++ b/src/vm_arm.dasc
> @@ -1718,8 +1718,8 @@ static void build_subroutines(BuildCtx *ctx)
> |.endif
> |.endmacro
> |
> - | math_minmax math_min, gt, hi
> - | math_minmax math_max, lt, lo
> + | math_minmax math_min, gt, pl
> + | math_minmax math_max, lt, le
> |
> |//-- String library -----------------------------------------------------
> |
> diff --git a/src/vm_arm64.dasc b/src/vm_arm64.dasc
> index f517a808..2c1bb4f8 100644
> --- a/src/vm_arm64.dasc
> +++ b/src/vm_arm64.dasc
> @@ -1494,8 +1494,8 @@ static void build_subroutines(BuildCtx *ctx)
> | b <6
> |.endmacro
> |
> - | math_minmax math_min, gt, hi
> - | math_minmax math_max, lt, lo
> + | math_minmax math_min, gt, pl
> + | math_minmax math_max, lt, le
> |
> |//-- String library -----------------------------------------------------
> |
> diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
> index 59f117ba..faeb5181 100644
> --- a/src/vm_x64.dasc
> +++ b/src/vm_x64.dasc
<snipped>
> diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc
> index f7ffe5d2..1c995d16 100644
> --- a/src/vm_x86.dasc
> +++ b/src/vm_x86.dasc
<snipped>
> diff --git a/test/tarantool-tests/gh-6163-min-max.test.lua b/test/tarantool-tests/gh-6163-min-max.test.lua
> new file mode 100644
> index 00000000..1da8a259
> --- /dev/null
> +++ b/test/tarantool-tests/gh-6163-min-max.test.lua
> @@ -0,0 +1,48 @@
> +local tap = require('tap')
> +local test = tap.test('gh-6163-jit-min-max')
> +test:plan(3)
> +--
> +-- gh-6163: math.min/math.max inconsistencies.
> +--
> +
> +local function is_consistent(res)
> + for i = 1, #res - 1 do
> + if res[i] ~= res[i + 1] then
> + return false
> + end
> + end
> + return true
> +end
Side note: may be it is better to name it `array_is_consistent()` as far
as it checks only array table part?
Also, we probably can use it in our test utils.
> +
> +-- This function creates dirty values on the Lua stack.
> +-- The latter of them is going to be treated as an
> +-- argument by the `math.min/math.max`.
> +-- The first two of them are going to be overwritten
> +-- by the math function itself.
> +local function filler()
> + return 1, 1, 1
> +end
> +
> +-- Success with no args.
Don't get the comment.
Do you mean:
| -- `math.min()` should raise an error when is called without
| -- arguments.
> +filler()
> +local r, _ = pcall(function() math.min() end)
> +test:ok(false == r, 'math.min fails with no args')
Why don't use `not r` here?
> +
> +filler()
> +r, _ = pcall(function() math.max() end)
> +test:ok(false == r, 'math.max fails with no args')
Why do we need the second test?
Please, add the coressponding comment.
> +
> +-- Incorrect fold optimization.
Don't get your comment. What do you mean?
> +jit.off()
> +jit.flush()
Why do we need to remove previous traces before start?
> +jit.opt.start('hotloop=1')
> +jit.on()
> +
> +local res = {}
> +for i = 1, 4 do
> + res[i] = math.min(math.min(0/0, math.huge), math.huge)
NB: IINM, still inconsistent if we use
| local nan = 0/0
| -- ...
| res[i] = math.min(math.min(nan, math.huge), math.huge)
I suppose, that we should comment this.
> +end
> +
> +test:ok(is_consistent(res), '(a o b) o a -> a o b')
> +
> +os.exit(test:check() and 0 or 1)
> --
> 2.38.1
>
[1]: https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FCMP
[2]: https://www.cs.princeton.edu/courses/archive/spr19/cos217/reading/ArmInstructionSetOverview.pdf
[3]: https://developer.arm.com/documentation/100069/0608/A64-Floating-point-Instructions/FCSEL
[4]: https://www.felixcloutier.com/x86/comisd
[5]: https://www.felixcloutier.com/x86/jcc
[6]: https://www.felixcloutier.com/x86/minsd
[7]: https://www.felixcloutier.com/x86/maxsd
[8]: https://github.com/tarantool/luajit/commit/9b6c0cd8eafdd2e5a8a7ac4b33f6e33b3d8a93b9
--
Best regards,
Sergey Kaplun
^ permalink raw reply [flat|nested] 2+ messages in thread