Date: Fri, 23 Dec 2022 18:42:49 +0300
To: Maxim Kokryashkin
References: <20221219095228.126312-1-m.kokryashkin@tarantool.org>
In-Reply-To: <20221219095228.126312-1-m.kokryashkin@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH luajit v6] Fix math.min()/math.max() inconsistencies.
From: Sergey Kaplun via Tarantool-patches
Reply-To: Sergey Kaplun
Cc: tarantool-patches@dev.tarantool.org

Hi Maxim!

Thanks for the fixes!
Please consider my comments below.

Also, in a flash of inspiration while debugging test cases for this
particular patch, I found a test case for `math.modf()`. So, we need an
additional patch with the test as a follow-up to the backported commit [8].

The following code:
| LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local modf = math.modf
| local nan = 0/0
| local inf = math.huge
|
| local r1 = {nil, nil, nil, nil}
| local r2 = {nil, nil, nil, nil}
| local r3 = {nil, nil, nil, nil}
| local r4 = {nil, nil, nil, nil}
| print("MODF")
| for i = 1, 4 do
|   r1[i], r2[i] = modf(inf)
|   r3[i], r4[i] = modf(nan)
| end
| print(r1[1], r1[2], r1[3], r1[4])
| print(r2[1], r2[2], r2[3], r2[4])
| print("")
| print(r3[1], r3[2], r3[3], r3[4])
| print(r4[1], r4[2], r4[3], r4[4])
| '

returns the following before the patch (on both arches):
| MODF
| inf inf inf inf
| 0 0 nan nan
|
| nan nan nan nan
| nan nan nan nan

And after the patch (on both arches):
| MODF
| inf inf inf inf
| 0 0 0 0
|
| nan nan nan nan
| nan nan nan nan

On 19.12.22, Maxim Kokryashkin wrote:
> From: Mike Pall
>
> (cherry-picked from commit 03208c8162af9cc01ca76ee1676ca79e5abe9b60)
>
> `math.min()`/`math.max()` could produce different results.
> Previously, dirty values on the Lua stack could be
> treated as arguments to `math.min()`/`math.max()`.
> This patch adds check for the number of arguments provided to
> math.min/max, which fixes the issue.
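The commit message can be illustrated with a plain C sketch of the idea. This is not LuaJIT code: `minmax_checked()` and the `MINMAX_*` codes are hypothetical. The reduction reads exactly `nargs` slots and rejects an empty call. As a result, a value that an earlier call such as `filler()` left further up the buffer can never be picked up as an argument. In the patch itself, the check is presumably made in the VM fast-function assembly (`vm_*.dasc`), not in C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes, used only in this sketch. */
enum { MINMAX_OK = 0, MINMAX_NOARGS = 1 };

/*
 * Reduce args[0..nargs-1] with min (is_max == 0) or max (is_max != 0).
 * Only the first nargs slots are ever read, so stale values that an
 * earlier call left in the buffer cannot leak into the result.
 * A call with no arguments is rejected instead of reading a leftover.
 */
static int minmax_checked(const double *args, int nargs, int is_max,
                          double *out)
{
  assert(args != NULL && out != NULL);
  if (nargs < 1)
    return MINMAX_NOARGS;
  double acc = args[0];
  for (int i = 1; i < nargs; i++) {
    double y = args[i];
    /* Keep acc only if the ordered comparison holds. */
    acc = is_max ? (acc > y ? acc : y) : (acc < y ? acc : y);
  }
  *out = acc;
  return MINMAX_OK;
}
```

For example, with `{5.0, 3.0, 1.0}` in the buffer and `nargs == 2`, the minimum is 3.0 and the stale 1.0 is ignored. With `nargs == 0`, the call fails and `*out` is left untouched.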
>
> Also it adds the corresponding test case for
> the mentioned issue and does some refactoring:

OK, this is not a refactoring, but making the JIT and the VM behaviour
consistent.

> 1. fcc is changed for min/max functions in ARM
> assembly from LO/HI (lower/upper or unordered) to LE/PL
> (lower/upper, equal or unoredered).
>
> 2. Several fold optimizations for min/max were removed
> or modified.

See my comments with tests below.

>
> Resolves tarantool/tarantool#6163
> ---
>
> >IMHO, LO -> LT (N!=V Less than or unordered) do the same thing wo
> >changing the order.
> >
> >IINM, an ordered comparison checks if neither operand is NaN.
> >Conversely, an unordered comparison checks if either operand is a NaN.
> >
> >So, looks like an attempt to fix inconsistent behaviour for NaNs in
> >math.min/math.max on aarch64.
> >
> >Also, I found inconsistent behaviour on x86 (between LuaJIT|Lua):
> >
> >| # on upstream build
> >| $ ./luajit -Ohotloop=1 -e 'local res = {} for i = 1,4 do res[i] = math.max(0/0, math.huge) end for i = 1, #res do print(res[i]) end'
> >| inf
> >| inf
> >| inf
> >| inf
> >| $ lua -e 'local res = {} for i = 1,4 do res[i] = math.max(0/0, math.huge) end for i = 1, #res do print(res[i]) end'
> >| -nan
> >| -nan
> >| -nan
> >| -nan
> >
> >Can you please test some similar examples on aarch64/M1?
> I've tested those on M1 and here are the results:
> $ ./src/luajit -Ohotloop=1 -e 'local res = {} for i=1,4 do res[i]=math.max(0/0,math.huge) end for i =1, #res do print(res[i]) end'
> inf
> inf
> inf
> inf
>
> $ lua -e 'local res = {} for i=1,4 do res[i]=math.max(0/0,math.huge) end for i =1, #res do print(res[i]) end'
> nan
> nan
> nan
> nan
>
> >Also, AFAICS, some optimizations are the reason of inconsistent
> >behaviour for JIT-ed code (not the fold in this commit).
> >| # on our fork
> >| ./luajit -O0 -Ohotloop=1 -e 'local res = {} for i = 1,4 do res[i] = math.max(0/0, math.huge) end for i = 1, #res do print(res[i]) end'
> >| inf
> >| inf
> >| inf
> >| inf
> >| # on our fork
> >| ./luajit -Ohotloop=1 -e 'local res = {} for i = 1,4 do res[i] = math.max(0/0, math.huge) end for i = 1, #res do print(res[i]) end'
> >| inf
> >| inf
> >| inf
> >| nan
> >
> >BTW this commit doesn't fix the problem. Can you please bisect the
> >commit to backport?
> Works perfectly fine on our fork after this patch.
>

My bad. PEBCAK :). The aforementioned test case really passes after the
whole patch.

Side note:
__Five minutes later__: OK, never mind, it is still inconsistent on
upstream (in some other way):
| $ src/luajit -Ohotloop=1 -e '
| local inf = math.huge
| local nan = inf/inf
| local min = math.min
| local max = math.max
|
| print(nan, inf)
|
| local r = {}
| local r_assoc = {}
| print("MIN:")
| for i = 1, 4 do
|   r[i] = min(nan, inf)
|   r_assoc[i] = min(min(nan, inf), nan)
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
|
| print("MAX:")
| for i = 1, 4 do
|   r[i] = max(nan, inf)
|   r_assoc[i] = max(max(nan, inf), nan)
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| '
| nan inf
| MIN:
| inf inf inf nan
| nan nan nan nan
| MAX:
| inf inf nan nan
| nan nan nan nan

> >> diff --git a/src/lj_opt_fold.c b/src/lj_opt_fold.c
> >> index 276dc040..07a52a4d 100644
> >> --- a/src/lj_opt_fold.c
> >> +++ b/src/lj_opt_fold.c
> >> @@ -1774,8 +1774,6 @@ LJFOLDF(reassoc_intarith_k64)
> >>  #endif
> >>  }
> >>
> >> -LJFOLD(MIN MIN any)
> >> -LJFOLD(MAX MAX any)
> >>  LJFOLD(BAND BAND any)
> >>  LJFOLD(BOR BOR any)
> >>  LJFOLDF(reassoc_dup)
> >> @@ -1785,6 +1783,15 @@ LJFOLDF(reassoc_dup)
> >>    return NEXTFOLD;
> >>  }
> >>
> >> +LJFOLD(MIN MIN any)
> >> +LJFOLD(MAX MAX any)
> >> +LJFOLDF(reassoc_dup_minmax)
> >> +{
> >> +  if (fins->op2 == fleft->op2)
> >> +    return LEFTFOLD;  /* (a o b) o b ==> a o b */
> >> +  return NEXTFOLD;
> >> +}
> >> +

I don't understand why this should be fixed by removing these
optimizations (this is about `min(min(a, b), b)` and `max(max(a, b), b)`
constructions). Do you mean the `fold_comm_dup_minmax()` part?

> >
> >Do you know why the opt `(a o b) o a ==> a o b;` is ommited now?
> >Are there any examples of incorrect behaviour? I suggest to check NaN
> >behaviour in this case.
> Added the test case for that one, but I failed to find any for the
> other.
>
> I've done test runs on all of the combinations of `{1, -1, 0, -0, 0/0,
> -math.huge, math.huge}` for all of the optimizations.
>
> I'll be glad to add another test case if you can think of any.

OK, I mark all spots with tests to be added as (T). Please provide a
verbose description for each of my comments in the commit message, and
comments in the tests.

>
>  src/lj_asm_arm.h                              |  6 +--
>  src/lj_asm_arm64.h                            |  6 +--
>  src/lj_opt_fold.c                             | 53 +++++++------------
>  src/lj_vmmath.c                               |  4 +-
>  src/vm_arm.dasc                               |  4 +-
>  src/vm_arm64.dasc                             |  4 +-
>  src/vm_x64.dasc                               |  2 +-
>  src/vm_x86.dasc                               |  2 +-
>  test/tarantool-tests/gh-6163-min-max.test.lua | 48 +++++++++++++++++
>  9 files changed, 81 insertions(+), 48 deletions(-)
>  create mode 100644 test/tarantool-tests/gh-6163-min-max.test.lua
>
> diff --git a/src/lj_asm_arm.h b/src/lj_asm_arm.h
> index 8af19eb9..6ae6e2f2 100644
> --- a/src/lj_asm_arm.h
> +++ b/src/lj_asm_arm.h
> @@ -1663,8 +1663,8 @@ static void asm_min_max(ASMState *as, IRIns *ir, int cc, int fcc)
>      asm_intmin_max(as, ir, cc);
>  }
>
> -#define asm_min(as, ir)	asm_min_max(as, ir, CC_GT, CC_HI)
> -#define asm_max(as, ir)	asm_min_max(as, ir, CC_LT, CC_LO)
> +#define asm_min(as, ir)	asm_min_max(as, ir, CC_GT, CC_PL)
> +#define asm_max(as, ir)	asm_min_max(as, ir, CC_LT, CC_LE)
>
>  /* -- Comparisons --------------------------------------------------------- */
>
> @@ -1856,7 +1856,7 @@ static void asm_hiop(ASMState *as, IRIns *ir)
>    } else if ((ir-1)->o == IR_MIN || (ir-1)->o == IR_MAX) {
>      as->curins--;  /* Always skip the loword min/max. */
>      if (uselo || usehi)
> -      asm_sfpmin_max(as, ir-1, (ir-1)->o == IR_MIN ? CC_HI : CC_LO);
> +      asm_sfpmin_max(as, ir-1, (ir-1)->o == IR_MIN ? CC_PL : CC_LE);
>      return;
>  #elif LJ_HASFFI
>    } else if ((ir-1)->o == IR_CONV) {
> diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h
> index 4aeb51f3..fe197700 100644
> --- a/src/lj_asm_arm64.h
> +++ b/src/lj_asm_arm64.h
> @@ -1592,7 +1592,7 @@ static void asm_fpmin_max(ASMState *as, IRIns *ir, A64CC fcc)
>    Reg dest = (ra_dest(as, ir, RSET_FPR) & 31);
>    Reg right, left = ra_alloc2(as, ir, RSET_FPR);
>    right = ((left >> 8) & 31); left &= 31;
> -  emit_dnm(as, A64I_FCSELd | A64F_CC(fcc), dest, left, right);
> +  emit_dnm(as, A64I_FCSELd | A64F_CC(fcc), dest, right, left);
>    emit_nm(as, A64I_FCMPd, left, right);
>  }
>
> @@ -1604,8 +1604,8 @@ static void asm_min_max(ASMState *as, IRIns *ir, A64CC cc, A64CC fcc)
>      asm_intmin_max(as, ir, cc);
>  }
>
> -#define asm_max(as, ir)	asm_min_max(as, ir, CC_GT, CC_HI)
> -#define asm_min(as, ir)	asm_min_max(as, ir, CC_LT, CC_LO)
> +#define asm_min(as, ir)	asm_min_max(as, ir, CC_LT, CC_PL)
> +#define asm_max(as, ir)	asm_min_max(as, ir, CC_GT, CC_LE)
>
>  /* -- Comparisons --------------------------------------------------------- */
>
> diff --git a/src/lj_opt_fold.c b/src/lj_opt_fold.c

I tested each piece of the changes on x86/x64 and aarch64, and here are my
observations:

> index 49f74996..27e489af 100644
> --- a/src/lj_opt_fold.c
> +++ b/src/lj_opt_fold.c
> @@ -1797,8 +1797,6 @@ LJFOLDF(reassoc_intarith_k64)
>  #endif
>  }
>
> -LJFOLD(MIN MIN any)
> -LJFOLD(MAX MAX any)
>  LJFOLD(BAND BAND any)
>  LJFOLD(BOR BOR any)
>  LJFOLDF(reassoc_dup)
> @@ -1808,6 +1806,15 @@ LJFOLDF(reassoc_dup)
>    return NEXTFOLD;
>  }
>
> +LJFOLD(MIN MIN any)
> +LJFOLD(MAX MAX any)
> +LJFOLDF(reassoc_dup_minmax)
> +{
> +  if (fins->op2 == fleft->op2)
> +    return LEFTFOLD;  /* (a o b) o b ==> a o b */
> +  return NEXTFOLD;
> +}
> +

Avoiding the `(a o b) o a ==> a o b` optimization "fixes" (if I get the
idea right) cases like the following on aarch64:

| min(min(x, nan), x)

where `x` is a finite number. (T)

For example, the following snippet:
| LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local inf = math.huge
| local nan = 0/0
| local x = 1
| local min = math.min
|
| print(x, nan, inf)
| print("MIN:", x, nan)
| local r = {}
| local r_assoc = {}
| for k = 1, 4 do
|   r[k] = min(x, nan)
|   r_assoc[k] = min(min(x, nan), x)
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4], "\n")
| '

prints the following before the patch:
| 1 nan inf
| MIN: 1 nan
| nan nan nan nan
| 1 1 1 nan

and after the patch:
| 1 nan inf
| MIN: 1 nan
| nan nan nan nan
| 1 1 1 1

For this code we get the following IR:
| 0003 x8   >  int SLOAD  #4    T        ; 1
| 0004 d3   >  num SLOAD  #3    T        ; nan
| 0005      >  fun EQ     0002  math.min
| 0006 d2      num CONV   0003  num.int  ; 1 -> 1.0
| 0007 d1      num MIN    0006  0004     ; res_0007 = min(1.0, nan) = nan

After the patch, with the fold optimization removed, we get an additional
IR instruction later:
| 0016 d0      num MIN    0007  0006     ; min(res_0007, 1.0)

with the following mcode:
| fcmp d2, d3           ; fcmp 1.0, nan
| fcsel d1, d2, d3, cc  ; d1 == nan after this instruction (*)
| ...
| fcmp d1, d2           ; fcmp nan, 1.0
| fcsel d0, d1, d2, cc  ; d0 == 1.0 after this instruction (*)

Before the patch, only one `fcmp`/`fcsel` pair is emitted, so the `nan`
value is returned in the destination register.

(*) Why this value is chosen:
According to the `fcmp` documentation [1]:
| The IEEE 754 standard specifies that the result of a comparison is
| precisely one of <, ==, > or unordered. If either or both of the
| operands are NaNs, they are unordered, and all three of (Operand1 <
| Operand2), (Operand1 == Operand2) and (Operand1 > Operand2) are false.
| This case results in the FPSCR flags being set to N=0, Z=0, C=1, and
| V=1.

The `cc` (aka `CC_LO` == `CC_CC`) condition means that the C flag is 0 [2],
which is false when we are comparing something with NaN.
So, according to the `fcsel` documentation [3]:
| If the condition passes, the first SIMD and FP source register value
| is taken, otherwise the second SIMD and FP source register value is
| taken.

So, the value of the second source register is always chosen.

(TT) When we change the order of arguments, things become broken again:
| $ LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local inf = math.huge
| local nan = 0/0
|
| local x = 1
| local min = math.min
| local max = math.max
|
| print(x, nan, inf)
| print("MIN:", x, nan)
| local r = {}
| local r_assoc = {}
| for k = 1, 4 do
|   r[k] = min(x, nan)
|   r_assoc[k] = min(x, min(x, nan))
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4], "\n")
| '

It prints the following before the patch (on aarch64):
| 1 nan inf
| MIN: 1 nan
| nan nan nan nan
| nan nan nan nan

And after this particular chunk of the patch (reassoc_dup_minmax):
| 1 nan inf
| MIN: 1 nan
| nan nan nan nan
| nan nan nan 1

>  LJFOLD(BXOR BXOR any)
>  LJFOLDF(reassoc_bxor)
>  {
> @@ -1846,23 +1853,12 @@ LJFOLDF(reassoc_shift)
>    return NEXTFOLD;
>  }
>
> -LJFOLD(MIN MIN KNUM)
> -LJFOLD(MAX MAX KNUM)
>  LJFOLD(MIN MIN KINT)
>  LJFOLD(MAX MAX KINT)
>  LJFOLDF(reassoc_minmax_k)
>  {
>    IRIns *irk = IR(fleft->op2);
> -  if (irk->o == IR_KNUM) {
> -    lua_Number a = ir_knum(irk)->n;
> -    lua_Number y = lj_vm_foldarith(a, knumright, fins->o - IR_ADD);
> -    if (a == y)  /* (x o k1) o k2 ==> x o k1, if (k1 o k2) == k1. */
> -      return LEFTFOLD;
> -    PHIBARRIER(fleft);
> -    fins->op1 = fleft->op1;
> -    fins->op2 = (IRRef1)lj_ir_knum(J, y);
> -    return RETRYFOLD;  /* (x o k1) o k2 ==> x o (k1 o k2) */
> -  } else if (irk->o == IR_KINT) {
> +  if (irk->o == IR_KINT) {
>      int32_t a = irk->i;
>      int32_t y = kfold_intop(a, fright->i, fins->o);
>      if (a == y)  /* (x o k1) o k2 ==> x o k1, if (k1 o k2) == k1. */
> @@ -1875,24 +1871,6 @@ LJFOLDF(reassoc_minmax_k)
>    return NEXTFOLD;
>  }

(TTT) This chunk fixes the behaviour of constant reassociation.
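A few lines of C can show why the KNUM reassociation is unsound. This sketch assumes the VM rule for `minsd`/`maxsd`: the first operand is kept only if the ordered comparison holds, so an unordered pair yields the second operand. All helper names are made up. With `k1` = NaN, regrouping `(x o k1) o k2` into `x o (k1 o k2)` changes the result, while for ordinary numbers both groupings agree. Integers have no NaN, which is presumably why the KINT branch can stay.

```c
#include <assert.h>
#include <math.h>

/* Assumed VM rule: keep x only if the ordered comparison holds;
 * if the operands are unordered (a NaN is involved), y is returned. */
static double vm_min(double x, double y) { return x < y ? x : y; }
static double vm_max(double x, double y) { return x > y ? x : y; }

/* (x o k1) o k2: evaluation order as written in the Lua source. */
static double min_as_written(double x, double k1, double k2)
{
  return vm_min(vm_min(x, k1), k2);
}

static double max_as_written(double x, double k1, double k2)
{
  return vm_max(vm_max(x, k1), k2);
}

/* x o (k1 o k2): the shape the removed KNUM fold rewrote it into. */
static double min_reassociated(double x, double k1, double k2)
{
  return vm_min(x, vm_min(k1, k2));
}

static double max_reassociated(double x, double k1, double k2)
{
  return vm_max(x, vm_max(k1, k2));
}
```

With `x = 1.2`, `k1 = NaN`, `k2 = 1.3`, the written order gives 1.3 but the regrouped one gives 1.2. The max variant with `k2 = 1.1` gives 1.1 versus 1.2.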
Run the following chunk:
| LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local inf = math.huge
| local nan = 0/0
| local min = math.min
| local max = math.max
|
| local r = {}
| local r_assoc = {}
| print("MIN:")
| local x = 1.2
| for i = 1, 4 do
|   r[i] = min(min(x, 0/0), 1.3)
|   r_assoc[i] = min(min(x, 1.3), 0/0)
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
|
| print("MAX:")
| for i = 1, 4 do
|   r[i] = max(max(x, 0/0), 1.1)
|   r_assoc[i] = max(max(x, 1.1), 0/0)
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| '

It gives the following results before this particular (reassoc_minmax_k)
chunk of the patch (on both arches):
| MIN:
| 1.3 1.3 1.3 1.2
| nan nan nan nan
| MAX:
| 1.1 1.1 1.2 1.2
| nan nan nan nan

And after it (on both arches):
| MIN:
| 1.3 1.3 1.3 1.3
| nan nan nan nan
| MAX:
| 1.1 1.1 1.1 1.1
| nan nan nan nan

>
> -LJFOLD(MIN MAX any)
> -LJFOLD(MAX MIN any)
> -LJFOLDF(reassoc_minmax_left)
> -{
> -  if (fins->op2 == fleft->op1 || fins->op2 == fleft->op2)
> -    return RIGHTFOLD;  /* (b o1 a) o2 b ==> b; (a o1 b) o2 b ==> b */
> -  return NEXTFOLD;
> -}

This particular chunk of the patch (reassoc_minmax_left) fixes the
following test case on aarch64.
(TTTT)
| LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local min = math.min
| local max = math.max
| local nan = 0/0
|
| local r_assoc = {}
| local r_assoc2 = {}
| print("MIN - MAX:")
| for i = 1, 4 do
|   r_assoc[i] = min(max(nan, 1), 1)
|   r_assoc2[i] = min(max(1, nan), 1)
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| print(r_assoc2[1], r_assoc2[2], r_assoc2[3], r_assoc2[4], "\n")
|
| print("MAX - MIN:")
| for i = 1, 4 do
|   r_assoc[i] = max(min(nan, 1), 1)
|   r_assoc2[i] = max(min(1, nan), 1)
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| print(r_assoc2[1], r_assoc2[2], r_assoc2[3], r_assoc2[4], "\n")
| '

It returns the following before the patch:
| MIN - MAX:
| 1 1 1 nan
| 1 1 1 nan
|
| MAX - MIN:
| 1 1 nan nan
| 1 1 nan nan

And after the patch:
| MIN - MAX:
| 1 1 1 1
| 1 1 1 1
|
| MAX - MIN:
| 1 1 1 1
| 1 1 1 1

> -
> -LJFOLD(MIN any MAX)
> -LJFOLD(MAX any MIN)
> -LJFOLDF(reassoc_minmax_right)
> -{
> -  if (fins->op1 == fright->op1 || fins->op1 == fright->op2)
> -    return LEFTFOLD;  /* a o2 (a o1 b) ==> a; a o2 (b o1 a) ==> a */
> -  return NEXTFOLD;
> -}
> -

OK, let's try a similar test case here (TTTTT):
| LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local min = math.min
| local max = math.max
| local nan = 0/0
|
| local r_assoc = {}
| local r_assoc2 = {}
| print("MIN - MAX:")
| for i = 1, 4 do
|   r_assoc[i] = min(1, max(nan, 1))
|   r_assoc2[i] = min(1, max(1, nan))
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| print(r_assoc2[1], r_assoc2[2], r_assoc2[3], r_assoc2[4], "\n")
|
| print("MAX - MIN:")
| for i = 1, 4 do
|   r_assoc[i] = max(1, min(nan, 1))
|   r_assoc2[i] = max(1, min(1, nan))
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| print(r_assoc2[1], r_assoc2[2], r_assoc2[3], r_assoc2[4], "\n")
| '

Before the patch, it leads to an assertion failure in `rec_check_slots()`
(on both arches):
| MIN - MAX:
| luajit: src/lj_record.c:142: rec_check_slots: Assertion `lj_obj_equal(tv, &tvk)' failed.
| Aborted

After the patch, we get inconsistent results on x86/x64 (the same results
as on upstream):
| MIN - MAX:
| 1 1 1 1
| nan nan nan 1
|
| MAX - MIN:
| 1 1 1 1
| nan nan 1 1

But on aarch64 it is fine:
| MIN - MAX:
| 1 1 1 1
| nan nan nan nan
|
| MAX - MIN:
| 1 1 1 1
| nan nan nan nan

I suggest adding this test case without checking the consistency of the
values, but with a FIXME: mark.

>  /* -- Array bounds check elimination -------------------------------------- */
>
>  /* Eliminate ABC across PHIs to handle t[i-1] forwarding case.
> @@ -2018,8 +1996,6 @@ LJFOLDF(comm_comp)
>
>  LJFOLD(BAND any any)
>  LJFOLD(BOR any any)
> -LJFOLD(MIN any any)
> -LJFOLD(MAX any any)
>  LJFOLDF(comm_dup)
>  {
>    if (fins->op1 == fins->op2)  /* x o x ==> x */
> @@ -2027,6 +2003,15 @@ LJFOLDF(comm_dup)
>    return fold_comm_swap(J);
>  }
>
> +LJFOLD(MIN any any)
> +LJFOLD(MAX any any)
> +LJFOLDF(comm_dup_minmax)
> +{
> +  if (fins->op1 == fins->op2)  /* x o x ==> x */
> +    return LEFTFOLD;
> +  return NEXTFOLD;
> +}
> +

Operands are not swapped here, to avoid side effects, so the fold just
continues. This fixes (TT) on arm64. We get the following mcode now:
| fcmp d2, d3           ; fcmp 1.0, nan
| fcsel d1, d3, d2, pl  ; d1 == nan after this instruction
| ...
| fcmp d2, d1           ; fcmp 1.0, nan
| fcsel d0, d1, d2, pl  ; d0 == nan after this instruction

The `pl` (aka `CC_PL`) condition means that the N flag is 0 [2], which is
true when we are comparing something with NaN. So, the value of the first
source register is taken.
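The flag reasoning above can be checked with a small C model of `fcmp` followed by `fcsel`. This is a sketch, not LuaJIT code, and all names are hypothetical. The NZCV values follow the Arm FCMP description: 1000 for less, 0110 for equal, 0010 for greater and 0011 for unordered. The old functions use `lo`/`hi` with `fcsel dest, left, right`, as in the removed arm64 defines. The new ones use `pl`/`le` with the swapped `fcsel dest, right, left`.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* NZCV flags as set by FCMP a, b. */
typedef struct { bool n, z, c, v; } nzcv;

static nzcv fcmp(double a, double b)
{
  if (a < b)  return (nzcv){ true,  false, false, false };  /* 1000 */
  if (a == b) return (nzcv){ false, true,  true,  false };  /* 0110 */
  if (a > b)  return (nzcv){ false, false, true,  false };  /* 0010 */
  return (nzcv){ false, false, true, true };  /* 0011: unordered (NaN) */
}

/* Condition codes used by the old and the new code. */
static bool cond_lo(nzcv f) { return !f.c; }              /* C == 0 */
static bool cond_hi(nzcv f) { return f.c && !f.z; }       /* C == 1 && Z == 0 */
static bool cond_pl(nzcv f) { return !f.n; }              /* N == 0 */
static bool cond_le(nzcv f) { return f.z || f.n != f.v; } /* Z == 1 || N != V */

/* FCSEL: the first source if the condition passes, else the second. */
static double fcsel(bool cond, double first, double second)
{
  return cond ? first : second;
}

/* Old arm64: fcmp left, right; fcsel dest, left, right, lo|hi. */
static double old_min(double l, double r) { return fcsel(cond_lo(fcmp(l, r)), l, r); }
static double old_max(double l, double r) { return fcsel(cond_hi(fcmp(l, r)), l, r); }

/* New arm64: fcmp left, right; fcsel dest, right, left, pl|le. */
static double new_min(double l, double r) { return fcsel(cond_pl(fcmp(l, r)), r, l); }
static double new_max(double l, double r) { return fcsel(cond_le(fcmp(l, r)), r, l); }
```

In this model, the new `pl`/`le` pair returns the right operand whenever the comparison is unordered, for both min and max. That is the same rule as x86 `minsd`/`maxsd`, whereas the old `hi`-based max returned the left operand. The old `lo`-based min already returned the right operand on NaN. So, if the model is right, the visible change for min on arm64 comes from the operand order (no `fold_comm_swap()` any more) rather than from the condition code.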
After this particular chunk of the patch (comm_dup_minmax) plus
(reassoc_dup_minmax), the (TT) output on aarch64 is the following:
| 1 nan inf
| MIN: 1 nan
| nan nan nan nan
| nan nan nan nan

>  LJFOLD(BXOR any any)
>  LJFOLDF(comm_bxor)
>  {
> diff --git a/src/lj_vmmath.c b/src/lj_vmmath.c
> index c04459bd..ae4e0f15 100644
> --- a/src/lj_vmmath.c
> +++ b/src/lj_vmmath.c
> @@ -49,8 +49,8 @@ double lj_vm_foldarith(double x, double y, int op)
>    case IR_ABS - IR_ADD: return fabs(x); break;
>  #if LJ_HASJIT
>    case IR_LDEXP - IR_ADD: return ldexp(x, (int)y); break;
> -  case IR_MIN - IR_ADD: return x > y ? y : x; break;
> -  case IR_MAX - IR_ADD: return x < y ? y : x; break;
> +  case IR_MIN - IR_ADD: return x < y ? x : y; break;
> +  case IR_MAX - IR_ADD: return x > y ? x : y; break;

(TTTTTT) This piece of the patch fixes an inconsistency in
`fold_kfold_numarith()` (on x86/x64):
| $ LUA_PATH="src/?.lua;;" src/luajit -jdump=i -Ohotloop=1 -e '
| local min = math.min
| local max = math.max
|
| local r_assoc = {}
| print("MIN:")
| for i = 1, 4 do
|   r_assoc[i] = min(min(7.1, 0/0), 1.1)
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4], "\n")
|
| print("MAX:")
| for i = 1, 4 do
|   r_assoc[i] = max(max(7.1, 0/0), 1.1)
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| '
| MIN:
| ---- TRACE 1 start (command line):7
| luajit: /home/burii/reviews/luajit/minmax/src/lj_record.c:142: rec_check_slots: Assertion `lj_obj_equal(tv, &tvk)' failed.
| Aborted

NB: the 0/0 constant is used here because of the `fold_kfold_numarith()`
semantics.

`tv` is nan(0x8000000000000), `tvk` is 7.1.

If we look at the disassembled code of `lj_vm_foldarith()`, we can see the
following:
| /* In our example x == 7.1, y == nan */
| case IR_MIN - IR_ADD: return x > y ? y : x; break;
| case IR_MAX - IR_ADD: return x < y ? y : x; break;
|
| ; case IR_MIN
| : movsd  xmm0,QWORD PTR [rsp+0x18]  ; xmm0 <- 7.1
| : comisd xmm0,QWORD PTR [rsp+0x10]  ; comisd 7.1, nan
| : jbe                               ; >= ?
| : mov    rax,QWORD PTR [rsp+0x10]   ; return nan
| : jmp                               ;
| : mov    rax,QWORD PTR [rsp+0x18]   ; else return 7.1
| : jmp                               ;
| ; case IR_MAX
| : movsd  xmm0,QWORD PTR [rsp+0x10]  ; xmm0 <- nan
| : comisd xmm0,QWORD PTR [rsp+0x18]  ; comisd nan, 7.1
| : jbe                               ; >= ?
| : mov    rax,QWORD PTR [rsp+0x10]   ; return nan
| : jmp                               ;
| : mov    rax,QWORD PTR [rsp+0x18]   ; else return 7.1
| : jmp                               ;

According to the `comisd` documentation [4], when one of the operands is
NaN, the result is unordered and ZF,PF,CF := 111. This means that the `jbe`
condition is true (CF=1 or ZF=1) [5], so we return 7.1 (the first operand)
for the `IR_MIN` case.

Q: Why is it a problem?
To answer, let's look at `lj_ff_math_min()` and `lj_ff_math_max()` in the
VM. For number values we get the following:
|7:
|  sseop xmm0, xmm1

where `sseop` is the `minsd`/`maxsd` instruction correspondingly.
From the instruction reference guides [6][7]:
| If only one value is a NaN (SNaN or QNaN) for this instruction, the
| second source operand, either a NaN or a valid floating-point value, is
| written to the result.

After the patch, we get the following disassembled code for
`lj_vm_foldarith()`:
| ; case IR_MIN
| : movsd  xmm0,QWORD PTR [rsp+0x10]  ; xmm0 <- nan
| : comisd xmm0,QWORD PTR [rsp+0x18]  ; comisd nan, 7.1
| : jbe                               ; >= ?
| : mov    rax,QWORD PTR [rsp+0x18]   ; return 7.1
| : jmp                               ;
| : mov    rax,QWORD PTR [rsp+0x10]   ; else return nan
| : jmp                               ;
| ; case IR_MAX
| : movsd  xmm0,QWORD PTR [rsp+0x18]  ; xmm0 <- 7.1
| : comisd xmm0,QWORD PTR [rsp+0x10]  ; comisd 7.1, nan
| : jbe                               ; >= ?
| : mov    rax,QWORD PTR [rsp+0x18]   ; return 7.1
| : jmp                               ;
| : mov    rax,QWORD PTR [rsp+0x10]   ; else return nan
| : jmp                               ;

So now we always return the second operand (nan for the `IR_MIN` case).

Side note: IMHO, the NaN source operand should be returned instead of
this behaviour. Also, IINM, that would make the behaviour consistent with
PUC-Rio Lua 5.1.
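To compare the branches side by side, here is a C sketch. It contains the old and new `lj_vm_foldarith()` min/max bodies, copied from the diff, next to a C rendering of the documented `minsd`/`maxsd` rule. The function names are made up for the sketch. It shows that the new bodies return the second operand whenever the comparison is unordered, like the VM's `sseop`. The old bodies kept 7.1 for `(7.1, nan)`.

```c
#include <assert.h>
#include <math.h>

/* Old lj_vm_foldarith() bodies for IR_MIN / IR_MAX (removed lines). */
static double fold_min_old(double x, double y) { return x > y ? y : x; }
static double fold_max_old(double x, double y) { return x < y ? y : x; }

/* New bodies (added lines). */
static double fold_min_new(double x, double y) { return x < y ? x : y; }
static double fold_max_new(double x, double y) { return x > y ? x : y; }

/*
 * minsd/maxsd written in C: the first operand is kept only if the
 * ordered comparison holds. If a NaN is involved (or both values are
 * zeros of either sign), the second operand is written to the result.
 */
static double sse_minsd(double a, double b) { return a < b ? a : b; }
static double sse_maxsd(double a, double b) { return a > b ? a : b; }
```

The new bodies coincide with the `sse_*` helpers by construction, including the `0.0`/`-0.0` case. The old `fold_min_old(0.0, -0.0)` returns `+0.0`, while the VM rule gives `-0.0`.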
NB: on aarch64, this changes the following assembly:
| 62630: ldr   d1, [sp, #40]
| 62634: ldr   d0, [sp, #32]
| 62638: fcmpe d1, d0
| 6263c: b.le  62648
| 62640: ldr   d0, [sp, #32]
| 62644: b     62674
| 62648: ldr   d0, [sp, #40]
| 6264c: b     62674
| 62650: ldr   d1, [sp, #40]
| 62654: ldr   d0, [sp, #32]
| 62658: fcmpe d1, d0
| 6265c: b.pl  62668  // b.nfrst
| 62660: ldr   d0, [sp, #32]
| 62664: b     62674
| 62668: ldr   d0, [sp, #40]
| 6266c: b     62674

To this one:
| 62620: ldr   d1, [sp, #40]
| 62624: ldr   d0, [sp, #32]
| 62628: fcmpe d1, d0
| 6262c: b.pl  62638  // b.nfrst
| 62630: ldr   d0, [sp, #40]
| 62634: b     62664
| 62638: ldr   d0, [sp, #32]
| 6263c: b     62664
| 62640: ldr   d1, [sp, #40]
| 62644: ldr   d0, [sp, #32]
| 62648: fcmpe d1, d0
| 6264c: b.le  62658
| 62650: ldr   d0, [sp, #40]
| 62654: b     62664
| 62658: ldr   d0, [sp, #32]
| 6265c: b     62664

So, we should make the same changes in the VM|JIT-generated mcode (and this
is why CC_LO/CC_HI are replaced with CC_LE/CC_PL).

>  #endif
>    default: return x;
>    }
> diff --git a/src/vm_arm.dasc b/src/vm_arm.dasc
> index a29292f1..89faa03e 100644
> --- a/src/vm_arm.dasc
> +++ b/src/vm_arm.dasc
> @@ -1718,8 +1718,8 @@ static void build_subroutines(BuildCtx *ctx)
>    |.endif
>    |.endmacro
>    |
> -  |  math_minmax math_min, gt, hi
> -  |  math_minmax math_max, lt, lo
> +  |  math_minmax math_min, gt, pl
> +  |  math_minmax math_max, lt, le
>    |
>    |//-- String library -----------------------------------------------------
>    |
> diff --git a/src/vm_arm64.dasc b/src/vm_arm64.dasc
> index f517a808..2c1bb4f8 100644
> --- a/src/vm_arm64.dasc
> +++ b/src/vm_arm64.dasc
> @@ -1494,8 +1494,8 @@ static void build_subroutines(BuildCtx *ctx)
>    |  b <6
>    |.endmacro
>    |
> -  |  math_minmax math_min, gt, hi
> -  |  math_minmax math_max, lt, lo
> +  |  math_minmax math_min, gt, pl
> +  |  math_minmax math_max, lt, le
>    |
>    |//-- String library -----------------------------------------------------
>    |
> diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
> index 59f117ba..faeb5181 100644
> --- a/src/vm_x64.dasc
> +++ b/src/vm_x64.dasc
> diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc
> index f7ffe5d2..1c995d16 100644
> --- a/src/vm_x86.dasc
> +++ b/src/vm_x86.dasc
> diff --git a/test/tarantool-tests/gh-6163-min-max.test.lua b/test/tarantool-tests/gh-6163-min-max.test.lua
> new file mode 100644
> index 00000000..1da8a259
> --- /dev/null
> +++ b/test/tarantool-tests/gh-6163-min-max.test.lua
> @@ -0,0 +1,48 @@
> +local tap = require('tap')
> +local test = tap.test('gh-6163-jit-min-max')
> +test:plan(3)
> +--
> +-- gh-6163: math.min/math.max inconsistencies.
> +--
> +
> +local function is_consistent(res)
> +  for i = 1, #res - 1 do
> +    if res[i] ~= res[i + 1] then
> +      return false
> +    end
> +  end
> +  return true
> +end

Side note: maybe it is better to name it `array_is_consistent()`, since it
checks only the array part of the table? Also, we can probably use it in
our test utils.

> +
> +-- This function creates dirty values on the Lua stack.
> +-- The latter of them is going to be treated as an
> +-- argument by the `math.min/math.max`.
> +-- The first two of them are going to be overwritten
> +-- by the math function itself.
> +local function filler()
> +  return 1, 1, 1
> +end
> +
> +-- Success with no args.

I don't get the comment. Do you mean:
| -- `math.min()` should raise an error when it is called without
| -- arguments.

> +filler()
> +local r, _ = pcall(function() math.min() end)
> +test:ok(false == r, 'math.min fails with no args')

Why not use `not r` here?

> +
> +filler()
> +r, _ = pcall(function() math.max() end)
> +test:ok(false == r, 'math.max fails with no args')

Why do we need the second test? Please add the corresponding comment.

> +
> +-- Incorrect fold optimization.

I don't get your comment. What do you mean?

> +jit.off()
> +jit.flush()

Why do we need to remove previous traces before the start?
> +jit.opt.start('hotloop=1')
> +jit.on()
> +
> +local res = {}
> +for i = 1, 4 do
> +  res[i] = math.min(math.min(0/0, math.huge), math.huge)

NB: IINM, it is still inconsistent if we use
| local nan = 0/0
| -- ...
| res[i] = math.min(math.min(nan, math.huge), math.huge)

I suppose that we should comment on this.

> +end
> +
> +test:ok(is_consistent(res), '(a o b) o a -> a o b')
> +
> +os.exit(test:check() and 0 or 1)
> --
> 2.38.1
>

[1]: https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FCMP
[2]: https://www.cs.princeton.edu/courses/archive/spr19/cos217/reading/ArmInstructionSetOverview.pdf
[3]: https://developer.arm.com/documentation/100069/0608/A64-Floating-point-Instructions/FCSEL
[4]: https://www.felixcloutier.com/x86/comisd
[5]: https://www.felixcloutier.com/x86/jcc
[6]: https://www.felixcloutier.com/x86/minsd
[7]: https://www.felixcloutier.com/x86/maxsd
[8]: https://github.com/tarantool/luajit/commit/9b6c0cd8eafdd2e5a8a7ac4b33f6e33b3d8a93b9

--
Best regards,
Sergey Kaplun