From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 3C3BC258C8C; Mon, 6 Mar 2023 17:57:41 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 3C3BC258C8C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1678114661; bh=8arQmj3N4DoaQ/G1SKI+Euq7yEZUhX3P2+im9gIEZsY=; h=In-Reply-To:Date:References:To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=EWHnuxz7bzhK+KauWg8RN0s8fyikBZRezHhcPnYZy0Z34vomhMUtKSkRLtOK7xVOk XXWZo+AH1THFR+EgncGmz/XVWZu19r8KqlPxzM8jMLIF8ggoFalnzlN/d+xxqlJgfV lvijPztMNI9CoR0PhxOXiXu/AJjwkc9AnRb1wI9Q= Received: from smtp50.i.mail.ru (smtp50.i.mail.ru [95.163.41.92]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id DBFFB258C8C for ; Mon, 6 Mar 2023 17:57:39 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org DBFFB258C8C Received: by smtp50.i.mail.ru with esmtpa (envelope-from ) id 1pZCHS-001gzs-Ve; Mon, 06 Mar 2023 17:57:39 +0300 Content-Type: multipart/alternative; boundary="Apple-Mail=_D487FD51-E333-4823-913F-8DAE510AE418" Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.400.51.1.1\)) X-Priority: 3 (Normal) In-Reply-To: <1678114360.413595907@f173.i.mail.ru> Date: Mon, 6 Mar 2023 17:57:28 +0300 Message-Id: <568C401F-6971-4C01-8C49-128A659A3449@tarantool.org> References: <20230208200550.48291-1-max.kokryashkin@gmail.com> <874063BF-3019-4BD8-9C7A-17BC9C325FEE@tarantool.org> <1678114360.413595907@f173.i.mail.ru> To: Maxim Kokryashkin X-Mailer: Apple Mail (2.3731.400.51.1.1) X-Mailru-Src: smtp X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD9BCEC41593EBD8357C44FF2494EBC845BD7AEF656CFC8618A182A05F538085040291F44E981A00305305D27A6F488240B3DC9E70B9B8F7AC1E2063CF805F68D78 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D348CDF4129B2EA85604717B1060D0439B1C017238F6D7F2CF1D938A0E28A78A33079077C853F1E4C531D7E09C32AA3244C581FF189F498DDF0749DDBB0711E402CA995755A1445935E927AC6DF5659F194 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojo8yGo04U2e3Y6EQXu8PsBg== X-Mailru-Sender: 11C2EC085EDE56FA38FD4C59F7EFE407D9CD7B426A54D69FD9BFFDC7ADF33D6BD008FC6E9D2C1B1B19381EE24192DF5555834048F03EF5D4C9A814A92B2E3B1BA4250FC3964EA4964198E0F3ECE9B5443453F38A29522196 X-Mras: OK Subject: Re: [Tarantool-patches] [PATCH luajit v8] Fix math.min()/math.max() inconsistencies. X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: sergos via Tarantool-patches Reply-To: sergos Cc: Maksim Kokryashkin , tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" --Apple-Mail=_D487FD51-E333-4823-913F-8DAE510AE418 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 Hi! I see, the `or x86_64` gives green light.=20 LGTM, Sergos > On 6 Mar 2023, at 17:52, Maxim Kokryashkin = wrote: >=20 > Hi! > The XXX ones are just some useful notes, the FIXME ones > are some issues with LuaJIT that remain unfixed in the mainline. > =20 > -- > Best regards, > Maxim Kokryashkin > =20 > =20 > =20 > Hi! >=20 > Thanks for the patch! >=20 > Number of FIXME and a XXX in the test - does it mean we=E2=80=99d have = problems in CI? >=20 > Regards, > Sergos > =20 > > On 8 Feb 2023, at 23:05, Maksim Kokryashkin = > wrote: > > > > From: Mike Pall > > > > (cherry-picked from commit 03208c8162af9cc01ca76ee1676ca79e5abe9b60) > > > > `math.min()`/`math.max()` could produce different results. > > Previously, dirty values on the Lua stack could be > > treated as arguments to `math.min()`/`math.max()`. > > This patch adds check for the number of arguments provided to > > math.min/max, which fixes the issue. > > > > Also, several fold optimizations were modified or > > deleted due to inconsistencies between compiler > > and interpreter on NaN-values. > > > > Changes in `fold_kfold_numarith()` also required > > to replace the `CC_LO/CC_HI` comparison modes with the `CC_LE/CC_PL` > > on aarch64 platforms. The issue is thoroughly described just before > > the corresponding test. > > > > Changes for MIPS and PPC architectures are ommited, due to the > > absence of other patches the are dependent on. > > > > Maxim Kokryashkin & Sergey Kaplun: > > * added the description and tests for the problem > > > > Resolves tarantool/tarantool#6163 > > --- > > Changes in v8: > > - Fixed comments as per review by Sergey > > > > Branch: = https://github.com/tarantool/luajit/tree/fckxorg/gh-6163-min-max > > PR: https://github.com/tarantool/tarantool/pull/7455 > > Issue for x86 misbehavior: = https://github.com/LuaJIT/LuaJIT/issues/957 > > src/lj_asm_arm.h | 6 +- > > src/lj_asm_arm64.h | 6 +- > > src/lj_opt_fold.c | 53 ++-- > > src/lj_vmmath.c | 4 +- > > src/vm_arm.dasc | 4 +- > > src/vm_arm64.dasc | 4 +- > > src/vm_x64.dasc | 2 +- > > src/vm_x86.dasc | 2 +- > > test/tarantool-tests/gh-6163-min-max.test.lua | 263 = ++++++++++++++++++ > > 9 files changed, 296 insertions(+), 48 deletions(-) > > create mode 100644 test/tarantool-tests/gh-6163-min-max.test.lua > > > > diff --git a/src/lj_asm_arm.h b/src/lj_asm_arm.h > > index 8af19eb9..6ae6e2f2 100644 > > --- a/src/lj_asm_arm.h > > +++ b/src/lj_asm_arm.h > > @@ -1663,8 +1663,8 @@ static void asm_min_max(ASMState *as, IRIns = *ir, int cc, int fcc) > > asm_intmin_max(as, ir, cc); > > } > > > > -#define asm_min(as, ir) asm_min_max(as, ir, CC_GT, CC_HI) > > -#define asm_max(as, ir) asm_min_max(as, ir, CC_LT, CC_LO) > > +#define asm_min(as, ir) asm_min_max(as, ir, CC_GT, CC_PL) > > +#define asm_max(as, ir) asm_min_max(as, ir, CC_LT, CC_LE) > > > > /* -- Comparisons = --------------------------------------------------------- */ > > > > @@ -1856,7 +1856,7 @@ static void asm_hiop(ASMState *as, IRIns *ir) > > } else if ((ir-1)->o =3D=3D IR_MIN || (ir-1)->o =3D=3D IR_MAX) { > > as->curins--; /* Always skip the loword min/max. */ > > if (uselo || usehi) > > - asm_sfpmin_max(as, ir-1, (ir-1)->o =3D=3D IR_MIN ? CC_HI : CC_LO); > > + asm_sfpmin_max(as, ir-1, (ir-1)->o =3D=3D IR_MIN ? CC_PL : CC_LE); > > return; > > #elif LJ_HASFFI > > } else if ((ir-1)->o =3D=3D IR_CONV) { > > diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h > > index 4aeb51f3..fe197700 100644 > > --- a/src/lj_asm_arm64.h > > +++ b/src/lj_asm_arm64.h > > @@ -1592,7 +1592,7 @@ static void asm_fpmin_max(ASMState *as, IRIns = *ir, A64CC fcc) > > Reg dest =3D (ra_dest(as, ir, RSET_FPR) & 31); > > Reg right, left =3D ra_alloc2(as, ir, RSET_FPR); > > right =3D ((left >> 8) & 31); left &=3D 31; > > - emit_dnm(as, A64I_FCSELd | A64F_CC(fcc), dest, left, right); > > + emit_dnm(as, A64I_FCSELd | A64F_CC(fcc), dest, right, left); > > emit_nm(as, A64I_FCMPd, left, right); > > } > > > > @@ -1604,8 +1604,8 @@ static void asm_min_max(ASMState *as, IRIns = *ir, A64CC cc, A64CC fcc) > > asm_intmin_max(as, ir, cc); > > } > > > > -#define asm_max(as, ir) asm_min_max(as, ir, CC_GT, CC_HI) > > -#define asm_min(as, ir) asm_min_max(as, ir, CC_LT, CC_LO) > > +#define asm_min(as, ir) asm_min_max(as, ir, CC_LT, CC_PL) > > +#define asm_max(as, ir) asm_min_max(as, ir, CC_GT, CC_LE) > > > > /* -- Comparisons = --------------------------------------------------------- */ > > > > diff --git a/src/lj_opt_fold.c b/src/lj_opt_fold.c > > index 49f74996..27e489af 100644 > > --- a/src/lj_opt_fold.c > > +++ b/src/lj_opt_fold.c > > @@ -1797,8 +1797,6 @@ LJFOLDF(reassoc_intarith_k64) > > #endif > > } > > > > -LJFOLD(MIN MIN any) > > -LJFOLD(MAX MAX any) > > LJFOLD(BAND BAND any) > > LJFOLD(BOR BOR any) > > LJFOLDF(reassoc_dup) > > @@ -1808,6 +1806,15 @@ LJFOLDF(reassoc_dup) > > return NEXTFOLD; > > } > > > > +LJFOLD(MIN MIN any) > > +LJFOLD(MAX MAX any) > > +LJFOLDF(reassoc_dup_minmax) > > +{ > > + if (fins->op2 =3D=3D fleft->op2) > > + return LEFTFOLD; /* (a o b) o b =3D=3D> a o b */ > > + return NEXTFOLD; > > +} > > + > > LJFOLD(BXOR BXOR any) > > LJFOLDF(reassoc_bxor) > > { > > @@ -1846,23 +1853,12 @@ LJFOLDF(reassoc_shift) > > return NEXTFOLD; > > } > > > > -LJFOLD(MIN MIN KNUM) > > -LJFOLD(MAX MAX KNUM) > > LJFOLD(MIN MIN KINT) > > LJFOLD(MAX MAX KINT) > > LJFOLDF(reassoc_minmax_k) > > { > > IRIns *irk =3D IR(fleft->op2); > > - if (irk->o =3D=3D IR_KNUM) { > > - lua_Number a =3D ir_knum(irk)->n; > > - lua_Number y =3D lj_vm_foldarith(a, knumright, fins->o - IR_ADD); > > - if (a =3D=3D y) /* (x o k1) o k2 =3D=3D> x o k1, if (k1 o k2) =3D=3D= k1. */ > > - return LEFTFOLD; > > - PHIBARRIER(fleft); > > - fins->op1 =3D fleft->op1; > > - fins->op2 =3D (IRRef1)lj_ir_knum(J, y); > > - return RETRYFOLD; /* (x o k1) o k2 =3D=3D> x o (k1 o k2) */ > > - } else if (irk->o =3D=3D IR_KINT) { > > + if (irk->o =3D=3D IR_KINT) { > > int32_t a =3D irk->i; > > int32_t y =3D kfold_intop(a, fright->i, fins->o); > > if (a =3D=3D y) /* (x o k1) o k2 =3D=3D> x o k1, if (k1 o k2) =3D=3D = k1. */ > > @@ -1875,24 +1871,6 @@ LJFOLDF(reassoc_minmax_k) > > return NEXTFOLD; > > } > > > > -LJFOLD(MIN MAX any) > > -LJFOLD(MAX MIN any) > > -LJFOLDF(reassoc_minmax_left) > > -{ > > - if (fins->op2 =3D=3D fleft->op1 || fins->op2 =3D=3D fleft->op2) > > - return RIGHTFOLD; /* (b o1 a) o2 b =3D=3D> b; (a o1 b) o2 b =3D=3D> = b */ > > - return NEXTFOLD; > > -} > > - > > -LJFOLD(MIN any MAX) > > -LJFOLD(MAX any MIN) > > -LJFOLDF(reassoc_minmax_right) > > -{ > > - if (fins->op1 =3D=3D fright->op1 || fins->op1 =3D=3D fright->op2) > > - return LEFTFOLD; /* a o2 (a o1 b) =3D=3D> a; a o2 (b o1 a) =3D=3D> = a */ > > - return NEXTFOLD; > > -} > > - > > /* -- Array bounds check elimination = -------------------------------------- */ > > > > /* Eliminate ABC across PHIs to handle t[i-1] forwarding case. > > @@ -2018,8 +1996,6 @@ LJFOLDF(comm_comp) > > > > LJFOLD(BAND any any) > > LJFOLD(BOR any any) > > -LJFOLD(MIN any any) > > -LJFOLD(MAX any any) > > LJFOLDF(comm_dup) > > { > > if (fins->op1 =3D=3D fins->op2) /* x o x =3D=3D> x */ > > @@ -2027,6 +2003,15 @@ LJFOLDF(comm_dup) > > return fold_comm_swap(J); > > } > > > > +LJFOLD(MIN any any) > > +LJFOLD(MAX any any) > > +LJFOLDF(comm_dup_minmax) > > +{ > > + if (fins->op1 =3D=3D fins->op2) /* x o x =3D=3D> x */ > > + return LEFTFOLD; > > + return NEXTFOLD; > > +} > > + > > LJFOLD(BXOR any any) > > LJFOLDF(comm_bxor) > > { > > diff --git a/src/lj_vmmath.c b/src/lj_vmmath.c > > index c04459bd..ae4e0f15 100644 > > --- a/src/lj_vmmath.c > > +++ b/src/lj_vmmath.c > > @@ -49,8 +49,8 @@ double lj_vm_foldarith(double x, double y, int op) > > case IR_ABS - IR_ADD: return fabs(x); break; > > #if LJ_HASJIT > > case IR_LDEXP - IR_ADD: return ldexp(x, (int)y); break; > > - case IR_MIN - IR_ADD: return x > y ? y : x; break; > > - case IR_MAX - IR_ADD: return x < y ? y : x; break; > > + case IR_MIN - IR_ADD: return x < y ? x : y; break; > > + case IR_MAX - IR_ADD: return x > y ? x : y; break; > > #endif > > default: return x; > > } > > diff --git a/src/vm_arm.dasc b/src/vm_arm.dasc > > index a29292f1..89faa03e 100644 > > --- a/src/vm_arm.dasc > > +++ b/src/vm_arm.dasc > > @@ -1718,8 +1718,8 @@ static void build_subroutines(BuildCtx *ctx) > > |.endif > > |.endmacro > > | > > - | math_minmax math_min, gt, hi > > - | math_minmax math_max, lt, lo > > + | math_minmax math_min, gt, pl > > + | math_minmax math_max, lt, le > > | > > |//-- String library = ----------------------------------------------------- > > | > > diff --git a/src/vm_arm64.dasc b/src/vm_arm64.dasc > > index f517a808..2c1bb4f8 100644 > > --- a/src/vm_arm64.dasc > > +++ b/src/vm_arm64.dasc > > @@ -1494,8 +1494,8 @@ static void build_subroutines(BuildCtx *ctx) > > | b <6 > > |.endmacro > > | > > - | math_minmax math_min, gt, hi > > - | math_minmax math_max, lt, lo > > + | math_minmax math_min, gt, pl > > + | math_minmax math_max, lt, le > > | > > |//-- String library = ----------------------------------------------------- > > | > > diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc > > index 59f117ba..faeb5181 100644 > > --- a/src/vm_x64.dasc > > +++ b/src/vm_x64.dasc > > @@ -1896,7 +1896,7 @@ static void build_subroutines(BuildCtx *ctx) > > | jmp ->fff_res > > | > > |.macro math_minmax, name, cmovop, sseop > > - | .ffunc name > > + | .ffunc_1 name > > | mov RAd, 2 > > |.if DUALNUM > > | mov RB, [BASE] > > diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc > > index f7ffe5d2..1c995d16 100644 > > --- a/src/vm_x86.dasc > > +++ b/src/vm_x86.dasc > > @@ -2321,7 +2321,7 @@ static void build_subroutines(BuildCtx *ctx) > > | xorps xmm4, xmm4; jmp <1 // Return +-Inf and +-0. > > | > > |.macro math_minmax, name, cmovop, sseop > > - | .ffunc name > > + | .ffunc_1 name > > | mov RA, 2 > > | cmp dword [BASE+4], LJ_TISNUM > > |.if DUALNUM > > diff --git a/test/tarantool-tests/gh-6163-min-max.test.lua = b/test/tarantool-tests/gh-6163-min-max.test.lua > > new file mode 100644 > > index 00000000..fb7a4058 > > --- /dev/null > > +++ b/test/tarantool-tests/gh-6163-min-max.test.lua > > @@ -0,0 +1,263 @@ > > +local tap =3D require('tap') > > +local test =3D tap.test('gh-6163-jit-min-max') > > +local x86_64 =3D jit.arch =3D=3D 'x86' or jit.arch =3D=3D 'x64' > > +test:plan(18) > > +-- > > +-- gh-6163: math.min/math.max inconsistencies. > > +-- > > + > > +local function isnan(x) > > + return x ~=3D x > > +end > > + > > +local function array_is_consistent(res) > > + for i =3D 1, #res - 1 do > > + if res[i] ~=3D res[i + 1] and not (isnan(res[i]) and isnan(res[i + = 1])) then > > + return false > > + end > > + end > > + return true > > +end > > + > > +-- This function creates dirty values on the Lua stack. > > +-- The latter of them is going to be treated as an > > +-- argument by the `math.min/math.max`. > > +-- The first two of them are going to be overwritten > > +-- by the math function itself. > > +local function filler() > > + return 1, 1, 1 > > +end > > + > > +local min =3D math.min > > +local max =3D math.max > > + > > +-- It would be enough to test all cases for the > > +-- `math.min()` or for the `math.max()` only, because the > > +-- problem was in the common code. However, we shouldn't > > +-- make such assumptions in the testing code. > > + > > +-- `math.min()/math.max()` should raise an error when > > +-- called with no arguments. > > +filler() > > +local r, _ =3D pcall(function() min() end) > > +test:ok(not r, 'math.min fails with no args') > > + > > +filler() > > +r, _ =3D pcall(function() max() end) > > +test:ok(false =3D=3D r, 'math.max fails with no args') > > + > > +local nan =3D 0/0 > > +local x =3D 1 > > + > > +jit.opt.start('hotloop=3D1') > > + > > +-- XXX: Looping over the operations and their arguments breakes the > > +-- semantics of some optimization tests below. The cases are > > +-- copy-pasted to preserve optimization semantics. > > + > > +-- Without the `(a o b) o a =3D=3D> a o b` fold optimization for > > +-- `math.min()/math.max()` the following mcode is emitted on = aarch64 > > +-- for the `math.min(math.min(x, nan), x)` expression: > > +-- > > +-- | fcmp d2, d3 ; fcmp 1.0, nan > > +-- | fcsel d1, d2, d3, cc ; d1 =3D=3D nan after this instruction > > +-- | ... > > +-- | fcmp d1, d2 ; fcmp nan, 1.0 > > +-- | fcsel d0, d1, d2, cc ; d0 =3D=3D 1.0 after this instruction > > +-- > > +-- According to the `fcmp` docs[1], if either of the operands is = NaN, > > +-- then the operands are unordered. It results in the following = state > > +-- of the flags register: N=3D0, Z=3D0, C=3D1, V=3D1 > > +-- > > +-- According to the `fcsel` docs[2], if the condition is met, then > > +-- the first register value is taken, otherwise -- the second. > > +-- In our case, the condition is cc, which means that the `C` flag > > +-- should be clear[3], which is false. Then, the second value is = taken, > > +-- which is `NaN` for the first `fcmp`-`fcsel` pair, and `1.0` for > > +-- the second. > > +-- > > +-- If that fold optimization is applied, then only the first = `fcmp`-`fcsel` > > +-- pair is emitted, and the result is `NaN`, which is inconsistent = with > > +-- the result of the non-optimized mcode. > > +-- > > +-- [1]: = https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instr= uctions/FCMP > > +-- [2]: = https://developer.arm.com/documentation/100069/0608/A64-Floating-point-Ins= tructions/FCSEL > > +-- [3]: = https://developer.arm.com/documentation/dui0068/b/ARM-Instruction-Referenc= e/Conditional-execution > > + > > +local result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D min(min(x, nan), x) > > +end > > +-- expected: 1 1 1 1 > > +test:ok(array_is_consistent(result), 'math.min: reassoc_dup') > > + > > +result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D max(max(x, nan), x) > > +end > > +-- expected: 1 1 1 1 > > +test:ok(array_is_consistent(result), 'math.max: reassoc_dup') > > + > > +-- If one gets the expression like `math.min(x, math.min(x, nan))`, > > +-- and the `comm_dup` optimization is applied, it results in the > > +-- same situation as explained above. With the `comm_dup_minmax` > > +-- there is no swap, hence, everything is consistent again: > > +-- > > +-- | fcmp d2, d3 ; fcmp 1.0, nan > > +-- | fcsel d1, d3, d2, pl ; d1 =3D=3D nan after this instruction > > +-- | ... > > +-- | fcmp d2, d1 ; fcmp 1.0, nan > > +-- | fcsel d0, d1, d2, pl ; d0 =3D=3D nan after this instruction > > +-- `pl` (aka `CC_PL`) condition means that N flag is 0 [2], that > > +-- is true when we are comparing something with NaN. So, the value = of the > > +-- first source register is taken > > + > > +result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D min(x, min(x, nan)) > > +end > > +-- FIXME: results are still inconsistent for the x86/64 = architecture. > > +-- expected: nan nan nan nan > > +test:ok(array_is_consistent(result) or x86_64, 'math.min: = comm_dup_minmax') > > + > > +result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D max(x, max(x, nan)) > > +end > > +-- FIXME: results are still inconsistent for the x86/64 = architecture. > > +-- expected: nan nan nan nan > > +test:ok(array_is_consistent(result) or x86_64, 'math.max: = comm_dup_minmax') > > + > > +-- The following optimization should be disabled: > > +-- (x o k1) o k2 =3D=3D> x o (k1 o k2) > > + > > +x =3D 1.2 > > +result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D min(min(x, 0/0), 1.3) > > +end > > +-- expected: 1.3 1.3 1.3 1.3 > > +test:ok(array_is_consistent(result), 'math.min: reassoc_minmax_k') > > + > > +result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D max(max(x, 0/0), 1.1) > > +end > > +-- expected: 1.1 1.1 1.1 1.1 > > +test:ok(array_is_consistent(result), 'math.max: reassoc_minmax_k') > > + > > +result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D min(max(nan, 1), 1) > > +end > > +-- expected: 1 1 1 1 > > +test:ok(array_is_consistent(result), 'min-max-case1: = reassoc_minmax_left') > > + > > +result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D min(max(1, nan), 1) > > +end > > +-- expected: 1 1 1 1 > > +test:ok(array_is_consistent(result), 'min-max-case2: = reassoc_minmax_left') > > + > > +result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D max(min(nan, 1), 1) > > +end > > +-- expected: 1 1 1 1 > > +test:ok(array_is_consistent(result), 'max-min-case1: = reassoc_minmax_left') > > + > > +result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D max(min(1, nan), 1) > > +end > > +-- expected: 1 1 1 1 > > +test:ok(array_is_consistent(result), 'max-min-case2: = reassoc_minmax_left') > > + > > +result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D min(1, max(nan, 1)) > > +end > > +-- expected: 1 1 1 1 > > +test:ok(array_is_consistent(result), 'min-max-case1: = reassoc_minmax_right') > > + > > +result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D min(1, max(1, nan)) > > +end > > +-- FIXME: results are still inconsistent for the x86/64 = architecture. > > +-- expected: nan nan nan nan > > +test:ok(array_is_consistent(result) or x86_64, 'min-max-case2: = reassoc_minmax_right') > > + > > +result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D max(1, min(nan, 1)) > > +end > > +-- expected: 1 1 1 1 > > +test:ok(array_is_consistent(result), 'max-min-case1: = reassoc_minmax_right') > > + > > +result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D max(1, min(1, nan)) > > +end > > +-- FIXME: results are still inconsistent for the x86/64 = architecture. > > +-- expected: nan nan nan nan > > +test:ok(array_is_consistent(result) or x86_64, 'max-min-case2: = reassoc_minmax_right') > > + > > +-- XXX: If we look into the disassembled code of = `lj_vm_foldarith()` > > +-- we can see the following: > > +-- > > +-- | /* In our test x =3D=3D 7.1, y =3D=3D nan */ > > +-- | case IR_MIN - IR_ADD: return x > y ? y : x; break; > > +-- > > +-- | ; case IR_MIN > > +-- | : movsd xmm0,QWORD PTR [rsp+0x18] ; xmm0 = <- 7.1 > > +-- | : comisd xmm0,QWORD PTR [rsp+0x10] ; = comisd 7.1, nan > > +-- | : jbe ; >=3D ? > > +-- | : mov rax,QWORD PTR [rsp+0x10] ; return = nan > > +-- | : jmp ; > > +-- | : mov rax,QWORD PTR [rsp+0x18] ; else = return 7.1 > > +-- | : jmp ; > > +-- > > +-- According to `comisd` documentation [4] in case when one of the = operands > > +-- is NaN, the result is unordered and ZF,PF,CF :=3D 111. This = means that `jbe` > > +-- condition is true (CF=3D1 or ZF=3D1)[5], so we return 7.1 (the = first > > +-- operand) for case `IR_MIN`. > > +-- > > +-- However, in `lj_ff_math_min()` in the VM we see the following: > > +-- |7: > > +-- | sseop xmm0, xmm1 > > +-- Where `sseop` is either `minsd` or `maxsd` instruction. > > +-- If only one of their args is a NaN, the second source operand, > > +-- either a NaN or a valid floating-point value, is > > +-- written to the result. > > +-- > > +-- So the patch changes the `lj_vm_foldairth()` assembly in the = following way: > > +-- | ; case IR_MIN > > +-- | : movsd xmm0,QWORD PTR [rsp+0x10] ; xmm0 = <- nan > > +-- | : comisd xmm0,QWORD PTR [rsp+0x18] ; = comisd nan, 7.1 > > +-- | : jbe ; >=3D ? > > +-- | : mov rax,QWORD PTR [rsp+0x18] ; return = 7.1 > > +-- | : jmp ; > > +-- | : mov rax,QWORD PTR [rsp+0x10] ; else = return nan > > +-- | : jmp ; > > +-- > > +-- So now we always return the second operand. > > +-- > > +-- XXX: The two tests below use the `0/0` constant instead of `nan` > > +-- variable is dictated by the `fold_kfold_numarith` semantics. > > +result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D min(min(7.1, 0/0), 1.1) > > +end > > +-- expected: 1.1 1.1 1.1 1.1 > > +test:ok(array_is_consistent(result), 'min: fold_kfold_numarith') > > + > > +result =3D {} > > +for k =3D 1, 4 do > > + result[k] =3D max(max(7.1, 0/0), 1.1) > > +end > > +-- expected: 1.1 1.1 1.1 1.1 > > +test:ok(array_is_consistent(result), 'max: fold_kfold_numarith') > > + > > +os.exit(test:check() and 0 or 1) > > -- > > 2.37.1 (Apple Git-137.1) > > > =20 --Apple-Mail=_D487FD51-E333-4823-913F-8DAE510AE418 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8
Hi!

I see, the `or = x86_64` gives green = light. 

LGTM,
Sergos
On 6 Mar 2023, at 17:52, = Maxim Kokryashkin <m.kokryashkin@tarantool.org> wrote:

Hi!
The XXX ones are just some = useful notes, the FIXME ones
are some issues with = LuaJIT that remain unfixed in the mainline.
 
--
Best = regards,
Maxim = Kokryashkin
 
 
 
Hi!

Thanks for the = patch!

Number of FIXME and a XXX in the test - does it mean = we=E2=80=99d have problems in = CI?

Regards,
Sergos
 
> On 8 Feb 2023, at 23:05, Maksim = Kokryashkin <max.kokryashkin@= gmail.com> wrote:
>
> From: Mike Pall = <mike>
>
> (cherry-picked from commit = 03208c8162af9cc01ca76ee1676ca79e5abe9b60)
>
> = `math.min()`/`math.max()` could produce different results.
> = Previously, dirty values on the Lua stack could be
> treated as = arguments to `math.min()`/`math.max()`.
> This patch adds check = for the number of arguments provided to
> math.min/max, which = fixes the issue.
>
> Also, several fold optimizations were = modified or
> deleted due to inconsistencies between = compiler
> and interpreter on NaN-values.
>
> Changes = in `fold_kfold_numarith()` also required
> to replace the = `CC_LO/CC_HI` comparison modes with the `CC_LE/CC_PL`
> on aarch64 = platforms. The issue is thoroughly described just before
> the = corresponding test.
>
> Changes for MIPS and PPC = architectures are ommited, due to the
> absence of other patches = the are dependent on.
>
> Maxim Kokryashkin & Sergey = Kaplun:
> * added the description and tests for the = problem
>
> Resolves tarantool/tarantool#6163
> = ---
> Changes in v8:
> - Fixed comments as per review by = Sergey
>
> Branch: https://github.com/tarantool/luajit/tree/fckxorg/gh-6163= -min-max
> PR: https://github.com/tarantool/tarantool/pull/7455
= > Issue for x86 misbehavior: https://github.com/LuaJIT/LuaJIT/issues/957
> = src/lj_asm_arm.h | 6 +-
> src/lj_asm_arm64.h | 6 +-
> = src/lj_opt_fold.c | 53 ++--
> src/lj_vmmath.c | 4 +-
> = src/vm_arm.dasc | 4 +-
> src/vm_arm64.dasc | 4 +-
> = src/vm_x64.dasc | 2 +-
> src/vm_x86.dasc | 2 +-
> = test/tarantool-tests/gh-6163-min-max.test.lua | 263 = ++++++++++++++++++
> 9 files changed, 296 insertions(+), 48 = deletions(-)
> create mode 100644 = test/tarantool-tests/gh-6163-min-max.test.lua
>
> diff --git = a/src/lj_asm_arm.h b/src/lj_asm_arm.h
> index 8af19eb9..6ae6e2f2 = 100644
> --- a/src/lj_asm_arm.h
> +++ = b/src/lj_asm_arm.h
> @@ -1663,8 +1663,8 @@ static void = asm_min_max(ASMState *as, IRIns *ir, int cc, int fcc)
> = asm_intmin_max(as, ir, cc);
> }
>
> -#define = asm_min(as, ir) asm_min_max(as, ir, CC_GT, CC_HI)
> -#define = asm_max(as, ir) asm_min_max(as, ir, CC_LT, CC_LO)
> +#define = asm_min(as, ir) asm_min_max(as, ir, CC_GT, CC_PL)
> +#define = asm_max(as, ir) asm_min_max(as, ir, CC_LT, CC_LE)
>
> /* -- = Comparisons --------------------------------------------------------- = */
>
> @@ -1856,7 +1856,7 @@ static void asm_hiop(ASMState = *as, IRIns *ir)
> } else if ((ir-1)->o =3D=3D IR_MIN || = (ir-1)->o =3D=3D IR_MAX) {
> as->curins--; /* Always skip = the loword min/max. */
> if (uselo || usehi)
> - = asm_sfpmin_max(as, ir-1, (ir-1)->o =3D=3D IR_MIN ? CC_HI : = CC_LO);
> + asm_sfpmin_max(as, ir-1, (ir-1)->o =3D=3D IR_MIN ? = CC_PL : CC_LE);
> return;
> #elif LJ_HASFFI
> } else = if ((ir-1)->o =3D=3D IR_CONV) {
> diff --git = a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h
> index = 4aeb51f3..fe197700 100644
> --- a/src/lj_asm_arm64.h
> +++ = b/src/lj_asm_arm64.h
> @@ -1592,7 +1592,7 @@ static void = asm_fpmin_max(ASMState *as, IRIns *ir, A64CC fcc)
> Reg dest =3D = (ra_dest(as, ir, RSET_FPR) & 31);
> Reg right, left =3D = ra_alloc2(as, ir, RSET_FPR);
> right =3D ((left >> 8) & = 31); left &=3D 31;
> - emit_dnm(as, A64I_FCSELd | = A64F_CC(fcc), dest, left, right);
> + emit_dnm(as, A64I_FCSELd | = A64F_CC(fcc), dest, right, left);
> emit_nm(as, A64I_FCMPd, left, = right);
> }
>
> @@ -1604,8 +1604,8 @@ static void = asm_min_max(ASMState *as, IRIns *ir, A64CC cc, A64CC fcc)
> = asm_intmin_max(as, ir, cc);
> }
>
> -#define = asm_max(as, ir) asm_min_max(as, ir, CC_GT, CC_HI)
> -#define = asm_min(as, ir) asm_min_max(as, ir, CC_LT, CC_LO)
> +#define = asm_min(as, ir) asm_min_max(as, ir, CC_LT, CC_PL)
> +#define = asm_max(as, ir) asm_min_max(as, ir, CC_GT, CC_LE)
>
> /* -- = Comparisons --------------------------------------------------------- = */
>
> diff --git a/src/lj_opt_fold.c = b/src/lj_opt_fold.c
> index 49f74996..27e489af 100644
> --- = a/src/lj_opt_fold.c
> +++ b/src/lj_opt_fold.c
> @@ -1797,8 = +1797,6 @@ LJFOLDF(reassoc_intarith_k64)
> #endif
> = }
>
> -LJFOLD(MIN MIN any)
> -LJFOLD(MAX MAX = any)
> LJFOLD(BAND BAND any)
> LJFOLD(BOR BOR any)
> = LJFOLDF(reassoc_dup)
> @@ -1808,6 +1806,15 @@ = LJFOLDF(reassoc_dup)
> return NEXTFOLD;
> }
>
> = +LJFOLD(MIN MIN any)
> +LJFOLD(MAX MAX any)
> = +LJFOLDF(reassoc_dup_minmax)
> +{
> + if (fins->op2 =3D=3D = fleft->op2)
> + return LEFTFOLD; /* (a o b) o b =3D=3D> a o = b */
> + return NEXTFOLD;
> +}
> +
> LJFOLD(BXOR = BXOR any)
> LJFOLDF(reassoc_bxor)
> {
> @@ -1846,23 = +1853,12 @@ LJFOLDF(reassoc_shift)
> return NEXTFOLD;
> = }
>
> -LJFOLD(MIN MIN KNUM)
> -LJFOLD(MAX MAX = KNUM)
> LJFOLD(MIN MIN KINT)
> LJFOLD(MAX MAX KINT)
> = LJFOLDF(reassoc_minmax_k)
> {
> IRIns *irk =3D = IR(fleft->op2);
> - if (irk->o =3D=3D IR_KNUM) {
> - = lua_Number a =3D ir_knum(irk)->n;
> - lua_Number y =3D = lj_vm_foldarith(a, knumright, fins->o - IR_ADD);
> - if (a =3D=3D= y) /* (x o k1) o k2 =3D=3D> x o k1, if (k1 o k2) =3D=3D k1. = */
> - return LEFTFOLD;
> - PHIBARRIER(fleft);
> - = fins->op1 =3D fleft->op1;
> - fins->op2 =3D = (IRRef1)lj_ir_knum(J, y);
> - return RETRYFOLD; /* (x o k1) o k2 = =3D=3D> x o (k1 o k2) */
> - } else if (irk->o =3D=3D = IR_KINT) {
> + if (irk->o =3D=3D IR_KINT) {
> int32_t a =3D= irk->i;
> int32_t y =3D kfold_intop(a, fright->i, = fins->o);
> if (a =3D=3D y) /* (x o k1) o k2 =3D=3D> x o k1, = if (k1 o k2) =3D=3D k1. */
> @@ -1875,24 +1871,6 @@ = LJFOLDF(reassoc_minmax_k)
> return NEXTFOLD;
> = }
>
> -LJFOLD(MIN MAX any)
> -LJFOLD(MAX MIN = any)
> -LJFOLDF(reassoc_minmax_left)
> -{
> - if = (fins->op2 =3D=3D fleft->op1 || fins->op2 =3D=3D = fleft->op2)
> - return RIGHTFOLD; /* (b o1 a) o2 b =3D=3D> = b; (a o1 b) o2 b =3D=3D> b */
> - return NEXTFOLD;
> = -}
> -
> -LJFOLD(MIN any MAX)
> -LJFOLD(MAX any = MIN)
> -LJFOLDF(reassoc_minmax_right)
> -{
> - if = (fins->op1 =3D=3D fright->op1 || fins->op1 =3D=3D = fright->op2)
> - return LEFTFOLD; /* a o2 (a o1 b) =3D=3D> = a; a o2 (b o1 a) =3D=3D> a */
> - return NEXTFOLD;
> = -}
> -
> /* -- Array bounds check elimination = -------------------------------------- */
>
> /* Eliminate = ABC across PHIs to handle t[i-1] forwarding case.
> @@ -2018,8 = +1996,6 @@ LJFOLDF(comm_comp)
>
> LJFOLD(BAND any = any)
> LJFOLD(BOR any any)
> -LJFOLD(MIN any any)
> = -LJFOLD(MAX any any)
> LJFOLDF(comm_dup)
> {
> if = (fins->op1 =3D=3D fins->op2) /* x o x =3D=3D> x */
> @@ = -2027,6 +2003,15 @@ LJFOLDF(comm_dup)
> return = fold_comm_swap(J);
> }
>
> +LJFOLD(MIN any = any)
> +LJFOLD(MAX any any)
> = +LJFOLDF(comm_dup_minmax)
> +{
> + if (fins->op1 =3D=3D = fins->op2) /* x o x =3D=3D> x */
> + return = LEFTFOLD;
> + return NEXTFOLD;
> +}
> +
> = LJFOLD(BXOR any any)
> LJFOLDF(comm_bxor)
> {
> diff = --git a/src/lj_vmmath.c b/src/lj_vmmath.c
> index = c04459bd..ae4e0f15 100644
> --- a/src/lj_vmmath.c
> +++ = b/src/lj_vmmath.c
> @@ -49,8 +49,8 @@ double = lj_vm_foldarith(double x, double y, int op)
> case IR_ABS - = IR_ADD: return fabs(x); break;
> #if LJ_HASJIT
> case = IR_LDEXP - IR_ADD: return ldexp(x, (int)y); break;
> - case IR_MIN = - IR_ADD: return x > y ? y : x; break;
> - case IR_MAX - = IR_ADD: return x < y ? y : x; break;
> + case IR_MIN - IR_ADD: = return x < y ? x : y; break;
> + case IR_MAX - IR_ADD: return x = > y ? x : y; break;
> #endif
> default: return x;
> = }
> diff --git a/src/vm_arm.dasc b/src/vm_arm.dasc
> index = a29292f1..89faa03e 100644
> --- a/src/vm_arm.dasc
> +++ = b/src/vm_arm.dasc
> @@ -1718,8 +1718,8 @@ static void = build_subroutines(BuildCtx *ctx)
> |.endif
> = |.endmacro
> |
> - | math_minmax math_min, gt, hi
> - = | math_minmax math_max, lt, lo
> + | math_minmax math_min, gt, = pl
> + | math_minmax math_max, lt, le
> |
> |//-- = String library = -----------------------------------------------------
> |
> = diff --git a/src/vm_arm64.dasc b/src/vm_arm64.dasc
> index = f517a808..2c1bb4f8 100644
> --- a/src/vm_arm64.dasc
> +++ = b/src/vm_arm64.dasc
> @@ -1494,8 +1494,8 @@ static void = build_subroutines(BuildCtx *ctx)
> | b <6
> = |.endmacro
> |
> - | math_minmax math_min, gt, hi
> - = | math_minmax math_max, lt, lo
> + | math_minmax math_min, gt, = pl
> + | math_minmax math_max, lt, le
> |
> |//-- = String library = -----------------------------------------------------
> |
> = diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
> index = 59f117ba..faeb5181 100644
> --- a/src/vm_x64.dasc
> +++ = b/src/vm_x64.dasc
> @@ -1896,7 +1896,7 @@ static void = build_subroutines(BuildCtx *ctx)
> | jmp ->fff_res
> = |
> |.macro math_minmax, name, cmovop, sseop
> - | .ffunc = name
> + | .ffunc_1 name
> | mov RAd, 2
> |.if = DUALNUM
> | mov RB, [BASE]
> diff --git a/src/vm_x86.dasc = b/src/vm_x86.dasc
> index f7ffe5d2..1c995d16 100644
> --- = a/src/vm_x86.dasc
> +++ b/src/vm_x86.dasc
> @@ -2321,7 = +2321,7 @@ static void build_subroutines(BuildCtx *ctx)
> | xorps = xmm4, xmm4; jmp <1 // Return +-Inf and +-0.
> |
> |.macro = math_minmax, name, cmovop, sseop
> - | .ffunc name
> + | = .ffunc_1 name
> | mov RA, 2
> | cmp dword [BASE+4], = LJ_TISNUM
> |.if DUALNUM
> diff --git = a/test/tarantool-tests/gh-6163-min-max.test.lua = b/test/tarantool-tests/gh-6163-min-max.test.lua
> new file mode = 100644
> index 00000000..fb7a4058
> --- /dev/null
> = +++ b/test/tarantool-tests/gh-6163-min-max.test.lua
> @@ -0,0 = +1,263 @@
> +local tap =3D require('tap')
> +local test =3D = tap.test('gh-6163-jit-min-max')
> +local x86_64 =3D jit.arch =3D=3D = 'x86' or jit.arch =3D=3D 'x64'
> +test:plan(18)
> = +--
> +-- gh-6163: math.min/math.max inconsistencies.
> = +--
> +
> +local function isnan(x)
> + return x ~=3D = x
> +end
> +
> +local function = array_is_consistent(res)
> + for i =3D 1, #res - 1 do
> + if = res[i] ~=3D res[i + 1] and not (isnan(res[i]) and isnan(res[i + 1])) = then
> + return false
> + end
> + end
> + return = true
> +end
> +
> +-- This function creates dirty = values on the Lua stack.
> +-- The latter of them is going to be = treated as an
> +-- argument by the `math.min/math.max`.
> = +-- The first two of them are going to be overwritten
> +-- by the = math function itself.
> +local function filler()
> + return = 1, 1, 1
> +end
> +
> +local min =3D math.min
> = +local max =3D math.max
> +
> +-- It would be enough to test = all cases for the
> +-- `math.min()` or for the `math.max()` only, = because the
> +-- problem was in the common code. However, we = shouldn't
> +-- make such assumptions in the testing code.
> = +
> +-- `math.min()/math.max()` should raise an error when
> = +-- called with no arguments.
> +filler()
> +local r, _ =3D = pcall(function() min() end)
> +test:ok(not r, 'math.min fails with = no args')
> +
> +filler()
> +r, _ =3D pcall(function() = max() end)
> +test:ok(false =3D=3D r, 'math.max fails with no = args')
> +
> +local nan =3D 0/0
> +local x =3D = 1
> +
> +jit.opt.start('hotloop=3D1')
> +
> +-- = XXX: Looping over the operations and their arguments breakes the
> = +-- semantics of some optimization tests below. The cases are
> = +-- copy-pasted to preserve optimization semantics.
> +
> = +-- Without the `(a o b) o a =3D=3D> a o b` fold optimization = for
> +-- `math.min()/math.max()` the following mcode is emitted = on aarch64
> +-- for the `math.min(math.min(x, nan), x)` = expression:
> +--
> +-- | fcmp d2, d3 ; fcmp 1.0, = nan
> +-- | fcsel d1, d2, d3, cc ; d1 =3D=3D nan after this = instruction
> +-- | ...
> +-- | fcmp d1, d2 ; fcmp nan, = 1.0
> +-- | fcsel d0, d1, d2, cc ; d0 =3D=3D 1.0 after this = instruction
> +--
> +-- According to the `fcmp` docs[1], if = either of the operands is NaN,
> +-- then the operands are = unordered. It results in the following state
> +-- of the flags = register: N=3D0, Z=3D0, C=3D1, V=3D1
> +--
> +-- According = to the `fcsel` docs[2], if the condition is met, then
> +-- the = first register value is taken, otherwise -- the second.
> +-- In = our case, the condition is cc, which means that the `C` flag
> +-- = should be clear[3], which is false. Then, the second value is = taken,
> +-- which is `NaN` for the first `fcmp`-`fcsel` pair, and = `1.0` for
> +-- the second.
> +--
> +-- If that fold = optimization is applied, then only the first `fcmp`-`fcsel`
> +-- = pair is emitted, and the result is `NaN`, which is inconsistent = with
> +-- the result of the non-optimized mcode.
> = +--
> +-- [1]: https://developer.arm.com/documentation/dui0801/g/A64-Fl= oating-point-Instructions/FCMP
> +-- [2]: https://developer.arm.com/documentation/100069/0608/A64-= Floating-point-Instructions/FCSEL
> +-- [3]: https://developer.arm.com/documentation/dui0068/b/ARM-In= struction-Reference/Conditional-execution
> +
> +local = result =3D {}
> +for k =3D 1, 4 do
> + result[k] =3D = min(min(x, nan), x)
> +end
> +-- expected: 1 1 1 1
> = +test:ok(array_is_consistent(result), 'math.min: reassoc_dup')
> = +
> +result =3D {}
> +for k =3D 1, 4 do
> + result[k] = =3D max(max(x, nan), x)
> +end
> +-- expected: 1 1 1 = 1
> +test:ok(array_is_consistent(result), 'math.max: = reassoc_dup')
> +
> +-- If one gets the expression like = `math.min(x, math.min(x, nan))`,
> +-- and the `comm_dup` = optimization is applied, it results in the
> +-- same situation as = explained above. With the `comm_dup_minmax`
> +-- there is no = swap, hence, everything is consistent again:
> +--
> +-- | = fcmp d2, d3 ; fcmp 1.0, nan
> +-- | fcsel d1, d3, d2, pl ; d1 =3D=3D= nan after this instruction
> +-- | ...
> +-- | fcmp d2, d1 = ; fcmp 1.0, nan
> +-- | fcsel d0, d1, d2, pl ; d0 =3D=3D nan after = this instruction
> +-- `pl` (aka `CC_PL`) condition means that N = flag is 0 [2], that
> +-- is true when we are comparing something = with NaN. So, the value of the
> +-- first source register is = taken
> +
> +result =3D {}
> +for k =3D 1, 4 = do
> + result[k] =3D min(x, min(x, nan))
> +end
> +-- = FIXME: results are still inconsistent for the x86/64 = architecture.
> +-- expected: nan nan nan nan
> = +test:ok(array_is_consistent(result) or x86_64, 'math.min: = comm_dup_minmax')
> +
> +result =3D {}
> +for k =3D 1, = 4 do
> + result[k] =3D max(x, max(x, nan))
> +end
> = +-- FIXME: results are still inconsistent for the x86/64 = architecture.
> +-- expected: nan nan nan nan
> = +test:ok(array_is_consistent(result) or x86_64, 'math.max: = comm_dup_minmax')
> +
> +-- The following optimization = should be disabled:
> +-- (x o k1) o k2 =3D=3D> x o (k1 o = k2)
> +
> +x =3D 1.2
> +result =3D {}
> +for k =3D= 1, 4 do
> + result[k] =3D min(min(x, 0/0), 1.3)
> = +end
> +-- expected: 1.3 1.3 1.3 1.3
> = +test:ok(array_is_consistent(result), 'math.min: = reassoc_minmax_k')
> +
> +result =3D {}
> +for k =3D = 1, 4 do
> + result[k] =3D max(max(x, 0/0), 1.1)
> = +end
> +-- expected: 1.1 1.1 1.1 1.1
> = +test:ok(array_is_consistent(result), 'math.max: = reassoc_minmax_k')
> +
> +result =3D {}
> +for k =3D = 1, 4 do
> + result[k] =3D min(max(nan, 1), 1)
> +end
> = +-- expected: 1 1 1 1
> +test:ok(array_is_consistent(result), = 'min-max-case1: reassoc_minmax_left')
> +
> +result =3D = {}
> +for k =3D 1, 4 do
> + result[k] =3D min(max(1, nan), = 1)
> +end
> +-- expected: 1 1 1 1
> = +test:ok(array_is_consistent(result), 'min-max-case2: = reassoc_minmax_left')
> +
> +result =3D {}
> +for k =3D = 1, 4 do
> + result[k] =3D max(min(nan, 1), 1)
> +end
> = +-- expected: 1 1 1 1
> +test:ok(array_is_consistent(result), = 'max-min-case1: reassoc_minmax_left')
> +
> +result =3D = {}
> +for k =3D 1, 4 do
> + result[k] =3D max(min(1, nan), = 1)
> +end
> +-- expected: 1 1 1 1
> = +test:ok(array_is_consistent(result), 'max-min-case2: = reassoc_minmax_left')
> +
> +result =3D {}
> +for k =3D = 1, 4 do
> + result[k] =3D min(1, max(nan, 1))
> +end
> = +-- expected: 1 1 1 1
> +test:ok(array_is_consistent(result), = 'min-max-case1: reassoc_minmax_right')
> +
> +result =3D = {}
> +for k =3D 1, 4 do
> + result[k] =3D min(1, max(1, = nan))
> +end
> +-- FIXME: results are still inconsistent for = the x86/64 architecture.
> +-- expected: nan nan nan nan
> = +test:ok(array_is_consistent(result) or x86_64, 'min-max-case2: = reassoc_minmax_right')
> +
> +result =3D {}
> +for k =3D= 1, 4 do
> + result[k] =3D max(1, min(nan, 1))
> = +end
> +-- expected: 1 1 1 1
> = +test:ok(array_is_consistent(result), 'max-min-case1: = reassoc_minmax_right')
> +
> +result =3D {}
> +for k =3D= 1, 4 do
> + result[k] =3D max(1, min(1, nan))
> = +end
> +-- FIXME: results are still inconsistent for the x86/64 = architecture.
> +-- expected: nan nan nan nan
> = +test:ok(array_is_consistent(result) or x86_64, 'max-min-case2: = reassoc_minmax_right')
> +
> +-- XXX: If we look into the = disassembled code of `lj_vm_foldarith()`
> +-- we can see the = following:
> +--
> +-- | /* In our test x =3D=3D 7.1, y =3D=3D= nan */
> +-- | case IR_MIN - IR_ADD: return x > y ? y : x; = break;
> +--
> +-- | ; case IR_MIN
> +-- | = <lj_vm_foldarith+337>: movsd xmm0,QWORD PTR [rsp+0x18] ; xmm0 = <- 7.1
> +-- | <lj_vm_foldarith+343>: comisd xmm0,QWORD = PTR [rsp+0x10] ; comisd 7.1, nan
> +-- | = <lj_vm_foldarith+349>: jbe <lj_vm_foldarith+358> ; >=3D = ?
> +-- | <lj_vm_foldarith+351>: mov rax,QWORD PTR = [rsp+0x10] ; return nan
> +-- | <lj_vm_foldarith+356>: jmp = <lj_vm_foldarith+398> ;
> +-- | <lj_vm_foldarith+358>: = mov rax,QWORD PTR [rsp+0x18] ; else return 7.1
> +-- | = <lj_vm_foldarith+363>: jmp <lj_vm_foldarith+398> ;
> = +--
> +-- According to `comisd` documentation [4] in case when one = of the operands
> +-- is NaN, the result is unordered and ZF,PF,CF = :=3D 111. This means that `jbe`
> +-- condition is true (CF=3D1 or = ZF=3D1)[5], so we return 7.1 (the first
> +-- operand) for case = `IR_MIN`.
> +--
> +-- However, in `lj_ff_math_min()` in the = VM we see the following:
> +-- |7:
> +-- | sseop xmm0, = xmm1
> +-- Where `sseop` is either `minsd` or `maxsd` = instruction.
> +-- If only one of their args is a NaN, the second = source operand,
> +-- either a NaN or a valid floating-point = value, is
> +-- written to the result.
> +--
> +-- So = the patch changes the `lj_vm_foldairth()` assembly in the following = way:
> +-- | ; case IR_MIN
> +-- | = <lj_vm_foldarith+337>: movsd xmm0,QWORD PTR [rsp+0x10] ; xmm0 = <- nan
> +-- | <lj_vm_foldarith+343>: comisd xmm0,QWORD = PTR [rsp+0x18] ; comisd nan, 7.1
> +-- | = <lj_vm_foldarith+349>: jbe <lj_vm_foldarith+358> ; >=3D = ?
> +-- | <lj_vm_foldarith+351>: mov rax,QWORD PTR = [rsp+0x18] ; return 7.1
> +-- | <lj_vm_foldarith+356>: jmp = <lj_vm_foldarith+398> ;
> +-- | <lj_vm_foldarith+358>: = mov rax,QWORD PTR [rsp+0x10] ; else return nan
> +-- | = <lj_vm_foldarith+363>: jmp <lj_vm_foldarith+398> ;
> = +--
> +-- So now we always return the second operand.
> = +--
> +-- XXX: The two tests below use the `0/0` constant instead = of `nan`
> +-- variable is dictated by the `fold_kfold_numarith` = semantics.
> +result =3D {}
> +for k =3D 1, 4 do
> + = result[k] =3D min(min(7.1, 0/0), 1.1)
> +end
> +-- expected: = 1.1 1.1 1.1 1.1
> +test:ok(array_is_consistent(result), 'min: = fold_kfold_numarith')
> +
> +result =3D {}
> +for k =3D = 1, 4 do
> + result[k] =3D max(max(7.1, 0/0), 1.1)
> = +end
> +-- expected: 1.1 1.1 1.1 1.1
> = +test:ok(array_is_consistent(result), 'max: = fold_kfold_numarith')
> +
> +os.exit(test:check() and 0 or = 1)
> --
> 2.37.1 (Apple = Git-137.1)
>
 

= --Apple-Mail=_D487FD51-E333-4823-913F-8DAE510AE418--