<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div>Hi!</div><div><br></div><div>I see, the `or x86_64` gives green light. </div><div><br></div><div>LGTM,</div><div>Sergos</div><div><br></div><div><blockquote type="cite"><div>On 6 Mar 2023, at 17:52, Maxim Kokryashkin <m.kokryashkin@tarantool.org> wrote:</div><br class="Apple-interchange-newline"><div>
<div><div>Hi!</div><div>The XXX ones are just some useful notes, the FIXME ones</div><div>are some issues with LuaJIT that remain unfixed in the mainline.</div><div> </div><div data-signature-widget="container"><div data-signature-widget="content"><div>--<br>Best regards,</div><div>Maxim Kokryashkin</div></div></div><div> </div><div> </div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div> <blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div id=""><div class="js-helper js-readmsg-msg"><div><div id="style_16778465432029938987_BODY">Hi!<br><br>Thanks for the patch!<br><br>Number of FIXME and a XXX in the test - does it mean we’d have problems in CI?<br><br>Regards,<br>Sergos<br> <div class="mail-quote-collapse">> On 8 Feb 2023, at 23:05, Maksim Kokryashkin <<a href="x-msg://8/compose?To=max.kokryashkin@gmail.com">max.kokryashkin@gmail.com</a>> wrote:<br>><br>> From: Mike Pall <mike><br>><br>> (cherry-picked from commit 03208c8162af9cc01ca76ee1676ca79e5abe9b60)<br>><br>> `math.min()`/`math.max()` could produce different results.<br>> Previously, dirty values on the Lua stack could be<br>> treated as arguments to `math.min()`/`math.max()`.<br>> This patch adds check for the number of arguments provided to<br>> math.min/max, which fixes the issue.<br>><br>> Also, several fold optimizations were modified or<br>> deleted due to inconsistencies between compiler<br>> and interpreter on NaN-values.<br>><br>> Changes in `fold_kfold_numarith()` also required<br>> to replace the `CC_LO/CC_HI` comparison modes with the `CC_LE/CC_PL`<br>> on aarch64 platforms. The issue is thoroughly described just before<br>> the corresponding test.<br>><br>> Changes for MIPS and PPC architectures are ommited, due to the<br>> absence of other patches the are dependent on.<br>><br>> Maxim Kokryashkin & Sergey Kaplun:<br>> * added the description and tests for the problem<br>><br>> Resolves tarantool/tarantool#6163<br>> ---<br>> Changes in v8:<br>> - Fixed comments as per review by Sergey<br>><br>> Branch: <a href="https://github.com/tarantool/luajit/tree/fckxorg/gh-6163-min-max" target="_blank">https://github.com/tarantool/luajit/tree/fckxorg/gh-6163-min-max</a><br>> PR: <a href="https://github.com/tarantool/tarantool/pull/7455" target="_blank">https://github.com/tarantool/tarantool/pull/7455</a><br>> Issue for x86 misbehavior: <a href="https://github.com/LuaJIT/LuaJIT/issues/957" target="_blank">https://github.com/LuaJIT/LuaJIT/issues/957</a><br>> src/lj_asm_arm.h | 6 +-<br>> src/lj_asm_arm64.h | 6 +-<br>> src/lj_opt_fold.c | 53 ++--<br>> src/lj_vmmath.c | 4 +-<br>> src/vm_arm.dasc | 4 +-<br>> src/vm_arm64.dasc | 4 +-<br>> src/vm_x64.dasc | 2 +-<br>> src/vm_x86.dasc | 2 +-<br>> test/tarantool-tests/gh-6163-min-max.test.lua | 263 ++++++++++++++++++<br>> 9 files changed, 296 insertions(+), 48 deletions(-)<br>> create mode 100644 test/tarantool-tests/gh-6163-min-max.test.lua<br>><br>> diff --git a/src/lj_asm_arm.h b/src/lj_asm_arm.h<br>> index 8af19eb9..6ae6e2f2 100644<br>> --- a/src/lj_asm_arm.h<br>> +++ b/src/lj_asm_arm.h<br>> @@ -1663,8 +1663,8 @@ static void asm_min_max(ASMState *as, IRIns *ir, int cc, int fcc)<br>> asm_intmin_max(as, ir, cc);<br>> }<br>><br>> -#define asm_min(as, ir) asm_min_max(as, ir, CC_GT, CC_HI)<br>> -#define asm_max(as, ir) asm_min_max(as, ir, CC_LT, CC_LO)<br>> +#define asm_min(as, ir) asm_min_max(as, ir, CC_GT, CC_PL)<br>> +#define asm_max(as, ir) asm_min_max(as, ir, CC_LT, CC_LE)<br>><br>> /* -- Comparisons --------------------------------------------------------- */<br>><br>> @@ -1856,7 +1856,7 @@ static void asm_hiop(ASMState *as, IRIns *ir)<br>> } else if ((ir-1)->o == IR_MIN || (ir-1)->o == IR_MAX) {<br>> as->curins--; /* Always skip the loword min/max. */<br>> if (uselo || usehi)<br>> - asm_sfpmin_max(as, ir-1, (ir-1)->o == IR_MIN ? CC_HI : CC_LO);<br>> + asm_sfpmin_max(as, ir-1, (ir-1)->o == IR_MIN ? CC_PL : CC_LE);<br>> return;<br>> #elif LJ_HASFFI<br>> } else if ((ir-1)->o == IR_CONV) {<br>> diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h<br>> index 4aeb51f3..fe197700 100644<br>> --- a/src/lj_asm_arm64.h<br>> +++ b/src/lj_asm_arm64.h<br>> @@ -1592,7 +1592,7 @@ static void asm_fpmin_max(ASMState *as, IRIns *ir, A64CC fcc)<br>> Reg dest = (ra_dest(as, ir, RSET_FPR) & 31);<br>> Reg right, left = ra_alloc2(as, ir, RSET_FPR);<br>> right = ((left >> 8) & 31); left &= 31;<br>> - emit_dnm(as, A64I_FCSELd | A64F_CC(fcc), dest, left, right);<br>> + emit_dnm(as, A64I_FCSELd | A64F_CC(fcc), dest, right, left);<br>> emit_nm(as, A64I_FCMPd, left, right);<br>> }<br>><br>> @@ -1604,8 +1604,8 @@ static void asm_min_max(ASMState *as, IRIns *ir, A64CC cc, A64CC fcc)<br>> asm_intmin_max(as, ir, cc);<br>> }<br>><br>> -#define asm_max(as, ir) asm_min_max(as, ir, CC_GT, CC_HI)<br>> -#define asm_min(as, ir) asm_min_max(as, ir, CC_LT, CC_LO)<br>> +#define asm_min(as, ir) asm_min_max(as, ir, CC_LT, CC_PL)<br>> +#define asm_max(as, ir) asm_min_max(as, ir, CC_GT, CC_LE)<br>><br>> /* -- Comparisons --------------------------------------------------------- */<br>><br>> diff --git a/src/lj_opt_fold.c b/src/lj_opt_fold.c<br>> index 49f74996..27e489af 100644<br>> --- a/src/lj_opt_fold.c<br>> +++ b/src/lj_opt_fold.c<br>> @@ -1797,8 +1797,6 @@ LJFOLDF(reassoc_intarith_k64)<br>> #endif<br>> }<br>><br>> -LJFOLD(MIN MIN any)<br>> -LJFOLD(MAX MAX any)<br>> LJFOLD(BAND BAND any)<br>> LJFOLD(BOR BOR any)<br>> LJFOLDF(reassoc_dup)<br>> @@ -1808,6 +1806,15 @@ LJFOLDF(reassoc_dup)<br>> return NEXTFOLD;<br>> }<br>><br>> +LJFOLD(MIN MIN any)<br>> +LJFOLD(MAX MAX any)<br>> +LJFOLDF(reassoc_dup_minmax)<br>> +{<br>> + if (fins->op2 == fleft->op2)<br>> + return LEFTFOLD; /* (a o b) o b ==> a o b */<br>> + return NEXTFOLD;<br>> +}<br>> +<br>> LJFOLD(BXOR BXOR any)<br>> LJFOLDF(reassoc_bxor)<br>> {<br>> @@ -1846,23 +1853,12 @@ LJFOLDF(reassoc_shift)<br>> return NEXTFOLD;<br>> }<br>><br>> -LJFOLD(MIN MIN KNUM)<br>> -LJFOLD(MAX MAX KNUM)<br>> LJFOLD(MIN MIN KINT)<br>> LJFOLD(MAX MAX KINT)<br>> LJFOLDF(reassoc_minmax_k)<br>> {<br>> IRIns *irk = IR(fleft->op2);<br>> - if (irk->o == IR_KNUM) {<br>> - lua_Number a = ir_knum(irk)->n;<br>> - lua_Number y = lj_vm_foldarith(a, knumright, fins->o - IR_ADD);<br>> - if (a == y) /* (x o k1) o k2 ==> x o k1, if (k1 o k2) == k1. */<br>> - return LEFTFOLD;<br>> - PHIBARRIER(fleft);<br>> - fins->op1 = fleft->op1;<br>> - fins->op2 = (IRRef1)lj_ir_knum(J, y);<br>> - return RETRYFOLD; /* (x o k1) o k2 ==> x o (k1 o k2) */<br>> - } else if (irk->o == IR_KINT) {<br>> + if (irk->o == IR_KINT) {<br>> int32_t a = irk->i;<br>> int32_t y = kfold_intop(a, fright->i, fins->o);<br>> if (a == y) /* (x o k1) o k2 ==> x o k1, if (k1 o k2) == k1. */<br>> @@ -1875,24 +1871,6 @@ LJFOLDF(reassoc_minmax_k)<br>> return NEXTFOLD;<br>> }<br>><br>> -LJFOLD(MIN MAX any)<br>> -LJFOLD(MAX MIN any)<br>> -LJFOLDF(reassoc_minmax_left)<br>> -{<br>> - if (fins->op2 == fleft->op1 || fins->op2 == fleft->op2)<br>> - return RIGHTFOLD; /* (b o1 a) o2 b ==> b; (a o1 b) o2 b ==> b */<br>> - return NEXTFOLD;<br>> -}<br>> -<br>> -LJFOLD(MIN any MAX)<br>> -LJFOLD(MAX any MIN)<br>> -LJFOLDF(reassoc_minmax_right)<br>> -{<br>> - if (fins->op1 == fright->op1 || fins->op1 == fright->op2)<br>> - return LEFTFOLD; /* a o2 (a o1 b) ==> a; a o2 (b o1 a) ==> a */<br>> - return NEXTFOLD;<br>> -}<br>> -<br>> /* -- Array bounds check elimination -------------------------------------- */<br>><br>> /* Eliminate ABC across PHIs to handle t[i-1] forwarding case.<br>> @@ -2018,8 +1996,6 @@ LJFOLDF(comm_comp)<br>><br>> LJFOLD(BAND any any)<br>> LJFOLD(BOR any any)<br>> -LJFOLD(MIN any any)<br>> -LJFOLD(MAX any any)<br>> LJFOLDF(comm_dup)<br>> {<br>> if (fins->op1 == fins->op2) /* x o x ==> x */<br>> @@ -2027,6 +2003,15 @@ LJFOLDF(comm_dup)<br>> return fold_comm_swap(J);<br>> }<br>><br>> +LJFOLD(MIN any any)<br>> +LJFOLD(MAX any any)<br>> +LJFOLDF(comm_dup_minmax)<br>> +{<br>> + if (fins->op1 == fins->op2) /* x o x ==> x */<br>> + return LEFTFOLD;<br>> + return NEXTFOLD;<br>> +}<br>> +<br>> LJFOLD(BXOR any any)<br>> LJFOLDF(comm_bxor)<br>> {<br>> diff --git a/src/lj_vmmath.c b/src/lj_vmmath.c<br>> index c04459bd..ae4e0f15 100644<br>> --- a/src/lj_vmmath.c<br>> +++ b/src/lj_vmmath.c<br>> @@ -49,8 +49,8 @@ double lj_vm_foldarith(double x, double y, int op)<br>> case IR_ABS - IR_ADD: return fabs(x); break;<br>> #if LJ_HASJIT<br>> case IR_LDEXP - IR_ADD: return ldexp(x, (int)y); break;<br>> - case IR_MIN - IR_ADD: return x > y ? y : x; break;<br>> - case IR_MAX - IR_ADD: return x < y ? y : x; break;<br>> + case IR_MIN - IR_ADD: return x < y ? x : y; break;<br>> + case IR_MAX - IR_ADD: return x > y ? x : y; break;<br>> #endif<br>> default: return x;<br>> }<br>> diff --git a/src/vm_arm.dasc b/src/vm_arm.dasc<br>> index a29292f1..89faa03e 100644<br>> --- a/src/vm_arm.dasc<br>> +++ b/src/vm_arm.dasc<br>> @@ -1718,8 +1718,8 @@ static void build_subroutines(BuildCtx *ctx)<br>> |.endif<br>> |.endmacro<br>> |<br>> - | math_minmax math_min, gt, hi<br>> - | math_minmax math_max, lt, lo<br>> + | math_minmax math_min, gt, pl<br>> + | math_minmax math_max, lt, le<br>> |<br>> |//-- String library -----------------------------------------------------<br>> |<br>> diff --git a/src/vm_arm64.dasc b/src/vm_arm64.dasc<br>> index f517a808..2c1bb4f8 100644<br>> --- a/src/vm_arm64.dasc<br>> +++ b/src/vm_arm64.dasc<br>> @@ -1494,8 +1494,8 @@ static void build_subroutines(BuildCtx *ctx)<br>> | b <6<br>> |.endmacro<br>> |<br>> - | math_minmax math_min, gt, hi<br>> - | math_minmax math_max, lt, lo<br>> + | math_minmax math_min, gt, pl<br>> + | math_minmax math_max, lt, le<br>> |<br>> |//-- String library -----------------------------------------------------<br>> |<br>> diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc<br>> index 59f117ba..faeb5181 100644<br>> --- a/src/vm_x64.dasc<br>> +++ b/src/vm_x64.dasc<br>> @@ -1896,7 +1896,7 @@ static void build_subroutines(BuildCtx *ctx)<br>> | jmp ->fff_res<br>> |<br>> |.macro math_minmax, name, cmovop, sseop<br>> - | .ffunc name<br>> + | .ffunc_1 name<br>> | mov RAd, 2<br>> |.if DUALNUM<br>> | mov RB, [BASE]<br>> diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc<br>> index f7ffe5d2..1c995d16 100644<br>> --- a/src/vm_x86.dasc<br>> +++ b/src/vm_x86.dasc<br>> @@ -2321,7 +2321,7 @@ static void build_subroutines(BuildCtx *ctx)<br>> | xorps xmm4, xmm4; jmp <1 // Return +-Inf and +-0.<br>> |<br>> |.macro math_minmax, name, cmovop, sseop<br>> - | .ffunc name<br>> + | .ffunc_1 name<br>> | mov RA, 2<br>> | cmp dword [BASE+4], LJ_TISNUM<br>> |.if DUALNUM<br>> diff --git a/test/tarantool-tests/gh-6163-min-max.test.lua b/test/tarantool-tests/gh-6163-min-max.test.lua<br>> new file mode 100644<br>> index 00000000..fb7a4058<br>> --- /dev/null<br>> +++ b/test/tarantool-tests/gh-6163-min-max.test.lua<br>> @@ -0,0 +1,263 @@<br>> +local tap = require('tap')<br>> +local test = tap.test('gh-6163-jit-min-max')<br>> +local x86_64 = jit.arch == 'x86' or jit.arch == 'x64'<br>> +test:plan(18)<br>> +--<br>> +-- gh-6163: math.min/math.max inconsistencies.<br>> +--<br>> +<br>> +local function isnan(x)<br>> + return x ~= x<br>> +end<br>> +<br>> +local function array_is_consistent(res)<br>> + for i = 1, #res - 1 do<br>> + if res[i] ~= res[i + 1] and not (isnan(res[i]) and isnan(res[i + 1])) then<br>> + return false<br>> + end<br>> + end<br>> + return true<br>> +end<br>> +<br>> +-- This function creates dirty values on the Lua stack.<br>> +-- The latter of them is going to be treated as an<br>> +-- argument by the `math.min/math.max`.<br>> +-- The first two of them are going to be overwritten<br>> +-- by the math function itself.<br>> +local function filler()<br>> + return 1, 1, 1<br>> +end<br>> +<br>> +local min = math.min<br>> +local max = math.max<br>> +<br>> +-- It would be enough to test all cases for the<br>> +-- `math.min()` or for the `math.max()` only, because the<br>> +-- problem was in the common code. However, we shouldn't<br>> +-- make such assumptions in the testing code.<br>> +<br>> +-- `math.min()/math.max()` should raise an error when<br>> +-- called with no arguments.<br>> +filler()<br>> +local r, _ = pcall(function() min() end)<br>> +test:ok(not r, 'math.min fails with no args')<br>> +<br>> +filler()<br>> +r, _ = pcall(function() max() end)<br>> +test:ok(false == r, 'math.max fails with no args')<br>> +<br>> +local nan = 0/0<br>> +local x = 1<br>> +<br>> +jit.opt.start('hotloop=1')<br>> +<br>> +-- XXX: Looping over the operations and their arguments breakes the<br>> +-- semantics of some optimization tests below. The cases are<br>> +-- copy-pasted to preserve optimization semantics.<br>> +<br>> +-- Without the `(a o b) o a ==> a o b` fold optimization for<br>> +-- `math.min()/math.max()` the following mcode is emitted on aarch64<br>> +-- for the `math.min(math.min(x, nan), x)` expression:<br>> +--<br>> +-- | fcmp d2, d3 ; fcmp 1.0, nan<br>> +-- | fcsel d1, d2, d3, cc ; d1 == nan after this instruction<br>> +-- | ...<br>> +-- | fcmp d1, d2 ; fcmp nan, 1.0<br>> +-- | fcsel d0, d1, d2, cc ; d0 == 1.0 after this instruction<br>> +--<br>> +-- According to the `fcmp` docs[1], if either of the operands is NaN,<br>> +-- then the operands are unordered. It results in the following state<br>> +-- of the flags register: N=0, Z=0, C=1, V=1<br>> +--<br>> +-- According to the `fcsel` docs[2], if the condition is met, then<br>> +-- the first register value is taken, otherwise -- the second.<br>> +-- In our case, the condition is cc, which means that the `C` flag<br>> +-- should be clear[3], which is false. Then, the second value is taken,<br>> +-- which is `NaN` for the first `fcmp`-`fcsel` pair, and `1.0` for<br>> +-- the second.<br>> +--<br>> +-- If that fold optimization is applied, then only the first `fcmp`-`fcsel`<br>> +-- pair is emitted, and the result is `NaN`, which is inconsistent with<br>> +-- the result of the non-optimized mcode.<br>> +--<br>> +-- [1]: <a href="https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FCMP" target="_blank">https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FCMP</a><br>> +-- [2]: <a href="https://developer.arm.com/documentation/100069/0608/A64-Floating-point-Instructions/FCSEL" target="_blank">https://developer.arm.com/documentation/100069/0608/A64-Floating-point-Instructions/FCSEL</a><br>> +-- [3]: <a href="https://developer.arm.com/documentation/dui0068/b/ARM-Instruction-Reference/Conditional-execution" target="_blank">https://developer.arm.com/documentation/dui0068/b/ARM-Instruction-Reference/Conditional-execution</a><br>> +<br>> +local result = {}<br>> +for k = 1, 4 do<br>> + result[k] = min(min(x, nan), x)<br>> +end<br>> +-- expected: 1 1 1 1<br>> +test:ok(array_is_consistent(result), 'math.min: reassoc_dup')<br>> +<br>> +result = {}<br>> +for k = 1, 4 do<br>> + result[k] = max(max(x, nan), x)<br>> +end<br>> +-- expected: 1 1 1 1<br>> +test:ok(array_is_consistent(result), 'math.max: reassoc_dup')<br>> +<br>> +-- If one gets the expression like `math.min(x, math.min(x, nan))`,<br>> +-- and the `comm_dup` optimization is applied, it results in the<br>> +-- same situation as explained above. With the `comm_dup_minmax`<br>> +-- there is no swap, hence, everything is consistent again:<br>> +--<br>> +-- | fcmp d2, d3 ; fcmp 1.0, nan<br>> +-- | fcsel d1, d3, d2, pl ; d1 == nan after this instruction<br>> +-- | ...<br>> +-- | fcmp d2, d1 ; fcmp 1.0, nan<br>> +-- | fcsel d0, d1, d2, pl ; d0 == nan after this instruction<br>> +-- `pl` (aka `CC_PL`) condition means that N flag is 0 [2], that<br>> +-- is true when we are comparing something with NaN. So, the value of the<br>> +-- first source register is taken<br>> +<br>> +result = {}<br>> +for k = 1, 4 do<br>> + result[k] = min(x, min(x, nan))<br>> +end<br>> +-- FIXME: results are still inconsistent for the x86/64 architecture.<br>> +-- expected: nan nan nan nan<br>> +test:ok(array_is_consistent(result) or x86_64, 'math.min: comm_dup_minmax')<br>> +<br>> +result = {}<br>> +for k = 1, 4 do<br>> + result[k] = max(x, max(x, nan))<br>> +end<br>> +-- FIXME: results are still inconsistent for the x86/64 architecture.<br>> +-- expected: nan nan nan nan<br>> +test:ok(array_is_consistent(result) or x86_64, 'math.max: comm_dup_minmax')<br>> +<br>> +-- The following optimization should be disabled:<br>> +-- (x o k1) o k2 ==> x o (k1 o k2)<br>> +<br>> +x = 1.2<br>> +result = {}<br>> +for k = 1, 4 do<br>> + result[k] = min(min(x, 0/0), 1.3)<br>> +end<br>> +-- expected: 1.3 1.3 1.3 1.3<br>> +test:ok(array_is_consistent(result), 'math.min: reassoc_minmax_k')<br>> +<br>> +result = {}<br>> +for k = 1, 4 do<br>> + result[k] = max(max(x, 0/0), 1.1)<br>> +end<br>> +-- expected: 1.1 1.1 1.1 1.1<br>> +test:ok(array_is_consistent(result), 'math.max: reassoc_minmax_k')<br>> +<br>> +result = {}<br>> +for k = 1, 4 do<br>> + result[k] = min(max(nan, 1), 1)<br>> +end<br>> +-- expected: 1 1 1 1<br>> +test:ok(array_is_consistent(result), 'min-max-case1: reassoc_minmax_left')<br>> +<br>> +result = {}<br>> +for k = 1, 4 do<br>> + result[k] = min(max(1, nan), 1)<br>> +end<br>> +-- expected: 1 1 1 1<br>> +test:ok(array_is_consistent(result), 'min-max-case2: reassoc_minmax_left')<br>> +<br>> +result = {}<br>> +for k = 1, 4 do<br>> + result[k] = max(min(nan, 1), 1)<br>> +end<br>> +-- expected: 1 1 1 1<br>> +test:ok(array_is_consistent(result), 'max-min-case1: reassoc_minmax_left')<br>> +<br>> +result = {}<br>> +for k = 1, 4 do<br>> + result[k] = max(min(1, nan), 1)<br>> +end<br>> +-- expected: 1 1 1 1<br>> +test:ok(array_is_consistent(result), 'max-min-case2: reassoc_minmax_left')<br>> +<br>> +result = {}<br>> +for k = 1, 4 do<br>> + result[k] = min(1, max(nan, 1))<br>> +end<br>> +-- expected: 1 1 1 1<br>> +test:ok(array_is_consistent(result), 'min-max-case1: reassoc_minmax_right')<br>> +<br>> +result = {}<br>> +for k = 1, 4 do<br>> + result[k] = min(1, max(1, nan))<br>> +end<br>> +-- FIXME: results are still inconsistent for the x86/64 architecture.<br>> +-- expected: nan nan nan nan<br>> +test:ok(array_is_consistent(result) or x86_64, 'min-max-case2: reassoc_minmax_right')<br>> +<br>> +result = {}<br>> +for k = 1, 4 do<br>> + result[k] = max(1, min(nan, 1))<br>> +end<br>> +-- expected: 1 1 1 1<br>> +test:ok(array_is_consistent(result), 'max-min-case1: reassoc_minmax_right')<br>> +<br>> +result = {}<br>> +for k = 1, 4 do<br>> + result[k] = max(1, min(1, nan))<br>> +end<br>> +-- FIXME: results are still inconsistent for the x86/64 architecture.<br>> +-- expected: nan nan nan nan<br>> +test:ok(array_is_consistent(result) or x86_64, 'max-min-case2: reassoc_minmax_right')<br>> +<br>> +-- XXX: If we look into the disassembled code of `lj_vm_foldarith()`<br>> +-- we can see the following:<br>> +--<br>> +-- | /* In our test x == 7.1, y == nan */<br>> +-- | case IR_MIN - IR_ADD: return x > y ? y : x; break;<br>> +--<br>> +-- | ; case IR_MIN<br>> +-- | <lj_vm_foldarith+337>: movsd xmm0,QWORD PTR [rsp+0x18] ; xmm0 <- 7.1<br>> +-- | <lj_vm_foldarith+343>: comisd xmm0,QWORD PTR [rsp+0x10] ; comisd 7.1, nan<br>> +-- | <lj_vm_foldarith+349>: jbe <lj_vm_foldarith+358> ; >= ?<br>> +-- | <lj_vm_foldarith+351>: mov rax,QWORD PTR [rsp+0x10] ; return nan<br>> +-- | <lj_vm_foldarith+356>: jmp <lj_vm_foldarith+398> ;<br>> +-- | <lj_vm_foldarith+358>: mov rax,QWORD PTR [rsp+0x18] ; else return 7.1<br>> +-- | <lj_vm_foldarith+363>: jmp <lj_vm_foldarith+398> ;<br>> +--<br>> +-- According to `comisd` documentation [4] in case when one of the operands<br>> +-- is NaN, the result is unordered and ZF,PF,CF := 111. This means that `jbe`<br>> +-- condition is true (CF=1 or ZF=1)[5], so we return 7.1 (the first<br>> +-- operand) for case `IR_MIN`.<br>> +--<br>> +-- However, in `lj_ff_math_min()` in the VM we see the following:<br>> +-- |7:<br>> +-- | sseop xmm0, xmm1<br>> +-- Where `sseop` is either `minsd` or `maxsd` instruction.<br>> +-- If only one of their args is a NaN, the second source operand,<br>> +-- either a NaN or a valid floating-point value, is<br>> +-- written to the result.<br>> +--<br>> +-- So the patch changes the `lj_vm_foldairth()` assembly in the following way:<br>> +-- | ; case IR_MIN<br>> +-- | <lj_vm_foldarith+337>: movsd xmm0,QWORD PTR [rsp+0x10] ; xmm0 <- nan<br>> +-- | <lj_vm_foldarith+343>: comisd xmm0,QWORD PTR [rsp+0x18] ; comisd nan, 7.1<br>> +-- | <lj_vm_foldarith+349>: jbe <lj_vm_foldarith+358> ; >= ?<br>> +-- | <lj_vm_foldarith+351>: mov rax,QWORD PTR [rsp+0x18] ; return 7.1<br>> +-- | <lj_vm_foldarith+356>: jmp <lj_vm_foldarith+398> ;<br>> +-- | <lj_vm_foldarith+358>: mov rax,QWORD PTR [rsp+0x10] ; else return nan<br>> +-- | <lj_vm_foldarith+363>: jmp <lj_vm_foldarith+398> ;<br>> +--<br>> +-- So now we always return the second operand.<br>> +--<br>> +-- XXX: The two tests below use the `0/0` constant instead of `nan`<br>> +-- variable is dictated by the `fold_kfold_numarith` semantics.<br>> +result = {}<br>> +for k = 1, 4 do<br>> + result[k] = min(min(7.1, 0/0), 1.1)<br>> +end<br>> +-- expected: 1.1 1.1 1.1 1.1<br>> +test:ok(array_is_consistent(result), 'min: fold_kfold_numarith')<br>> +<br>> +result = {}<br>> +for k = 1, 4 do<br>> + result[k] = max(max(7.1, 0/0), 1.1)<br>> +end<br>> +-- expected: 1.1 1.1 1.1 1.1<br>> +test:ok(array_is_consistent(result), 'max: fold_kfold_numarith')<br>> +<br>> +os.exit(test:check() and 0 or 1)<br>> --<br>> 2.37.1 (Apple Git-137.1)<br>></div></div></div></div></div></blockquote><div> </div></div></blockquote></div>
</div></blockquote></div><br></body></html>