From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tarantool-patches-bounces@dev.tarantool.org>
Received: from [87.239.111.99] (localhost [127.0.0.1])
	by dev.tarantool.org (Postfix) with ESMTP id EF0081DBD0F;
	Fri, 23 Dec 2022 18:46:11 +0300 (MSK)
DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org EF0081DBD0F
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev;
	t=1671810372; bh=FrD1TMj5ZBlmov0YZQ9PVIj0qXJiRKMKjBGLQVnFyYk=;
	h=Date:To:References:In-Reply-To:Subject:List-Id:List-Unsubscribe:
	 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc:
	 From;
	b=mttnvA2alpBF14R3yPZkUNMBTtUXuNyulSjWI2ES/wKLR9L3vmQPF9YXIiHXm/Os2
	 oB1hXIi0oT9i3xwUdY0rbBUbDLn9n3tW7x8XFRRbtBTQVSXE5jyXdDUGoxL870eqta
	 CKrJZyAd411U+oYFaRQGoMDiv81h0Vhwo8E3N+e8=
Received: from smtp63.i.mail.ru (smtp63.i.mail.ru [217.69.128.43])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id 2230D164E68
 for <tarantool-patches@dev.tarantool.org>;
 Fri, 23 Dec 2022 18:46:10 +0300 (MSK)
DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 2230D164E68
Received: by smtp63.i.mail.ru with esmtpa (envelope-from
 <skaplun@tarantool.org>)
 id 1p8kFN-0006VB-44; Fri, 23 Dec 2022 18:46:09 +0300
Date: Fri, 23 Dec 2022 18:42:49 +0300
To: Maxim Kokryashkin <max.kokryashkin@gmail.com>
Message-ID: <Y6XMebt03ffqqDPQ@root>
References: <20221219095228.126312-1-m.kokryashkin@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20221219095228.126312-1-m.kokryashkin@tarantool.org>
X-Mailru-Src: smtp
X-4EC0790: 10
X-7564579A: 646B95376F6C166E
X-77F55803: 4F1203BC0FB41BD90D1502B3BE350FE41C8ED8347A4B23AB2AE947383B9B5AA400894C459B0CD1B9430F390B3E79EB22ADB240CF923C21A2DDC4A2E43965A82B3F8282B6C5816065
X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7F0ABDA2F087648F5EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637013F392EFFCDE01C8638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D8CDD2D571BA03DA5EE0584F0043EF178D117882F4460429724CE54428C33FAD305F5C1EE8F4F765FC2EE5AD8F952D28FBA471835C12D1D9774AD6D5ED66289B52BA9C0B312567BB23117882F44604297287769387670735200AC5B80A05675ACD618001F51B5FD3F9D2E47CDBA5A96583BA9C0B312567BB2376E601842F6C81A19E625A9149C048EED76C6ED7039589DE287C8E22D4AE2A51D8FC6C240DEA7642DBF02ECDB25306B2B78CF848AE20165D0A6AB1C7CE11FEE367F1C1C3ABB44F3A6E0066C2D8992A16C4224003CC836476E2F48590F00D11D6E2021AF6380DFAD1A18204E546F3947CB11811A4A51E3B096D1867E19FE1407959CC434672EE6371089D37D7C0E48F6C8AA50765F79006376A91CFDE938F542CEFF80C71ABB335746BA297DBC24807EABDAD6C7F3747799A
X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34BC3EEE75EF3BACCFB32744BC125445D6D08D1F86D642C10537204735ADC63CF289E638ADE8E5AD281D7E09C32AA3244CB98949E9300B9B2A616F5005F7B7EE8E7101BF96129E4011FACE5A9C96DEB163
X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojXS8M16GLHMOb66mdvwZxVw==
X-Mailru-Sender: F16D9CAFEEA6770E7B6EAD4ADB3BCAF04356AC8E9499A40D61237F3320B0AEBE56D7F21BDCCCDD9FF2400F607609286E924004A7DEC283833C7120B22964430C52B393F8C72A41A84198E0F3ECE9B5443453F38A29522196
X-Mras: Ok
Subject: Re: [Tarantool-patches] [PATCH luajit v6] Fix math.min()/math.max()
 inconsistencies.
X-BeenThere: tarantool-patches@dev.tarantool.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: Tarantool development patches <tarantool-patches.dev.tarantool.org>
List-Unsubscribe: <https://lists.tarantool.org/mailman/options/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=unsubscribe>
List-Archive: <https://lists.tarantool.org/pipermail/tarantool-patches/>
List-Post: <mailto:tarantool-patches@dev.tarantool.org>
List-Help: <mailto:tarantool-patches-request@dev.tarantool.org?subject=help>
List-Subscribe: <https://lists.tarantool.org/mailman/listinfo/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=subscribe>
From: Sergey Kaplun via Tarantool-patches <tarantool-patches@dev.tarantool.org>
Reply-To: Sergey Kaplun <skaplun@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Errors-To: tarantool-patches-bounces@dev.tarantool.org
Sender: "Tarantool-patches" <tarantool-patches-bounces@dev.tarantool.org>

Hi Maxim!

Thanks for the fixes!

Please consider my comments below.

Also, during a flash of inspiration and debugging test cases for this
particular patch I found the test case for `math.modf()`. So, we need
an additional patch with the test as a followup to the backported
commit [8].

The following code:
| LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local modf = math.modf
| local nan = 0/0
| local inf = math.huge
|
| local r1 = {nil, nil, nil, nil}
| local r2 = {nil, nil, nil, nil}
| local r3 = {nil, nil, nil, nil}
| local r4 = {nil, nil, nil, nil}
| print("MODF")
| for i = 1, 4 do
|   r1[i], r2[i] = modf(inf)
|   r3[i], r4[i] = modf(nan)
| end
| print(r1[1],  r1[2],  r1[3],  r1[4])
| print(r2[1],  r2[2],  r2[3],  r2[4])
| print("")
| print(r3[1],  r3[2],  r3[3],  r3[4])
| print(r4[1],  r4[2],  r4[3],  r4[4])
| '

Returns before the patch (on both arches):

| MODF
| inf     inf     inf     inf
| 0       0       nan     nan
|
| nan     nan     nan     nan
| nan     nan     nan     nan

And after patch (on both arches):

| MODF
| inf     inf     inf     inf
| 0       0       0       0
|
| nan     nan     nan     nan
| nan     nan     nan     nan

On 19.12.22, Maxim Kokryashkin wrote:
> From: Mike Pall <mike>
> 
> (cherry-picked from commit 03208c8162af9cc01ca76ee1676ca79e5abe9b60)
> 
> `math.min()`/`math.max()` could produce different results.
> Previously, dirty values on the Lua stack could be
> treated as arguments to `math.min()`/`math.max()`.
> This patch adds check for the number of arguments provided to
> math.min/max, which fixes the issue.
> 
> Also it adds the corresponding test case for
> the mentioned issue and does some refactoring:

OK, this is not a refactoring but adding a consistency between the JIT and the
VM behaviour.

> 1. fcc is changed for min/max functions in ARM
> assembly from LO/HI (lower/upper or unordered) to LE/PL
> (lower/upper, equal or unoredered).
> 
> 2. Several fold optimizations for min/max were removed
> or modified.

See my comments with tests below.

> 
> Resolves tarantool/tarantool#6163
> ---
> 
> >IMHO, LO -> LT (N!=V Less than or unordered) do the same thing wo
> >changing the order.
> >
> >IINM, an ordered comparison checks if neither operand is NaN.
> >Conversely, an unordered comparison checks if either operand is a NaN.
> >
> >So, looks like an attempt to fix inconsistent behaviour for NaNs in
> >math.min/math.max on aarch64.
> >
> >Also, I found inconsistent behaviour on x86 (between LuaJIT|Lua):
> >
> >| # on upstream build
> >| $ ./luajit -Ohotloop=1 -e 'local res = {} for i = 1,4 do res[i] = math.max(0/0, math.huge) end for i = 1, #res do print(res[i]) end'
> >| inf
> >| inf
> >| inf
> >| inf
> >| $ lua -e 'local res = {} for i = 1,4 do res[i] = math.max(0/0, math.huge) end for i = 1, #res do print(res[i]) end'
> >| -nan
> >| -nan
> >| -nan
> >| -nan
> >
> >Can you please test some similar examples on aarch64/M1?
> I've tested those on M1 and here are the results:
> $ ./src/luajit -Ohotloop=1 -e 'local res = {} for i=1,4 do res[i]=math.max(0/0,math.huge) end for i =1, #res do print(res[i]) end'
> inf
> inf
> inf
> inf
> 
> $ lua -e 'local res = {} for i=1,4 do res[i]=math.max(0/0,math.huge) end for i =1, #res do print(res[i]) end'
> nan
> nan
> nan
> nan
> 
> >Also, AFAICS, some optimizations are the reason of inconsistent
> >behaviour for JIT-ed code (not the fold in this commit).
> >| # on our fork
> >| ./luajit -O0 -Ohotloop=1 -e 'local res = {} for i = 1,4 do res[i] = math.max(0/0, math.huge) end for i = 1, #res do print(res[i]) end'
> >| inf
> >| inf
> >| inf
> >| inf
> >| # on our fork
> >| ./luajit -Ohotloop=1 -e 'local res = {} for i = 1,4 do res[i] = math.max(0/0, math.huge) end for i = 1, #res do print(res[i]) end'
> >| inf
> >| inf
> >| inf
> >| nan
> >
> >BTW this commit doesn't fix the problem. Can you please bisect the
> >commit to backport?
> Works perfectly fine on our fork after this patch.
> 

My, bad. PEBCAK:).
The aforementioned testcase really passes after the whole patch.

Side note: __Five minutes later__:
OK, nevermind, still inconsistent on upstream (in some other way):

| $ src/luajit -Ohotloop=1 -e '
| local inf = math.huge
| local nan = inf/inf
| local min = math.min
| local max = math.max
|
| print(nan, inf)
|
| local r = {}
| local r_assoc = {}
| print("MIN:")
| for i = 1, 4 do
|   r[i] = min(nan, inf)
|   r_assoc[i] =  min(min(nan, inf), nan)
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
|
| print("MAX:")
| for i = 1, 4 do
|   r[i] = max(nan, inf)
|   r_assoc[i] =  max(max(nan, inf), nan)
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| '
| nan     inf
| MIN:
| inf     inf     inf     nan
| nan     nan     nan     nan
| MAX:
| inf     inf     nan     nan
| nan     nan     nan     nan

> >> diff --git a/src/lj_opt_fold.c b/src/lj_opt_fold.c
> >> index 276dc040..07a52a4d 100644
> >> --- a/src/lj_opt_fold.c
> >> +++ b/src/lj_opt_fold.c
> >> @@ -1774,8 +1774,6 @@ LJFOLDF(reassoc_intarith_k64)
> >> #endif
> >> }
> >>
> >> -LJFOLD(MIN MIN any)
> >> -LJFOLD(MAX MAX any)
> >> LJFOLD(BAND BAND any)
> >> LJFOLD(BOR BOR any)
> >> LJFOLDF(reassoc_dup)
> >> @@ -1785,6 +1783,15 @@ LJFOLDF(reassoc_dup)
> >> return NEXTFOLD;
> >> }
> >>
> >> +LJFOLD(MIN MIN any)
> >> +LJFOLD(MAX MAX any)
> >> +LJFOLDF(reassoc_dup_minmax)
> >> +{
> >> + if (fins->op2 == fleft->op2)
> >> + return LEFTFOLD; /* (a o b) o b ==> a o b */
> >> + return NEXTFOLD;
> >> +}
> >> +

I don't understand why this should be fixed by removing this
optimizations (this is about `min(min(a, b), b)` and `max(max(a, b), b)`
constructions).
Do you mean `fold_comm_dup_minmax()` part?

> >
> >Do you know why the opt `(a o b) o a ==> a o b;` is ommited now?
> >Are there any examples of incorrect behaviour? I suggest to check NaN
> >behaviour in this case.
> Added the test case for that one, but I failed to find any for the
> other.
> 
> I've done test runs on all of the combinations of `{1, -1, 0, -0, 0/0,
> -math.huge, math.huge}` for all of the optimizations.
> 
> I'll be glad to add another test case if you can think of any.

OK, I mark all spots with tests to be added as (T). Please, provide
verbose description for each of my comments in commit message and
comments in tests.

> 
>  src/lj_asm_arm.h                              |  6 +--
>  src/lj_asm_arm64.h                            |  6 +--
>  src/lj_opt_fold.c                             | 53 +++++++------------
>  src/lj_vmmath.c                               |  4 +-
>  src/vm_arm.dasc                               |  4 +-
>  src/vm_arm64.dasc                             |  4 +-
>  src/vm_x64.dasc                               |  2 +-
>  src/vm_x86.dasc                               |  2 +-
>  test/tarantool-tests/gh-6163-min-max.test.lua | 48 +++++++++++++++++
>  9 files changed, 81 insertions(+), 48 deletions(-)
>  create mode 100644 test/tarantool-tests/gh-6163-min-max.test.lua
> 
> diff --git a/src/lj_asm_arm.h b/src/lj_asm_arm.h
> index 8af19eb9..6ae6e2f2 100644
> --- a/src/lj_asm_arm.h
> +++ b/src/lj_asm_arm.h
> @@ -1663,8 +1663,8 @@ static void asm_min_max(ASMState *as, IRIns *ir, int cc, int fcc)
>      asm_intmin_max(as, ir, cc);
>  }
>  
> -#define asm_min(as, ir)		asm_min_max(as, ir, CC_GT, CC_HI)
> -#define asm_max(as, ir)		asm_min_max(as, ir, CC_LT, CC_LO)
> +#define asm_min(as, ir)		asm_min_max(as, ir, CC_GT, CC_PL)
> +#define asm_max(as, ir)		asm_min_max(as, ir, CC_LT, CC_LE)
>  
>  /* -- Comparisons --------------------------------------------------------- */
>  
> @@ -1856,7 +1856,7 @@ static void asm_hiop(ASMState *as, IRIns *ir)
>    } else if ((ir-1)->o == IR_MIN || (ir-1)->o == IR_MAX) {
>      as->curins--;  /* Always skip the loword min/max. */
>      if (uselo || usehi)
> -      asm_sfpmin_max(as, ir-1, (ir-1)->o == IR_MIN ? CC_HI : CC_LO);
> +      asm_sfpmin_max(as, ir-1, (ir-1)->o == IR_MIN ? CC_PL : CC_LE);
>      return;
>  #elif LJ_HASFFI
>    } else if ((ir-1)->o == IR_CONV) {
> diff --git a/src/lj_asm_arm64.h b/src/lj_asm_arm64.h
> index 4aeb51f3..fe197700 100644
> --- a/src/lj_asm_arm64.h
> +++ b/src/lj_asm_arm64.h
> @@ -1592,7 +1592,7 @@ static void asm_fpmin_max(ASMState *as, IRIns *ir, A64CC fcc)
>    Reg dest = (ra_dest(as, ir, RSET_FPR) & 31);
>    Reg right, left = ra_alloc2(as, ir, RSET_FPR);
>    right = ((left >> 8) & 31); left &= 31;
> -  emit_dnm(as, A64I_FCSELd | A64F_CC(fcc), dest, left, right);
> +  emit_dnm(as, A64I_FCSELd | A64F_CC(fcc), dest, right, left);
>    emit_nm(as, A64I_FCMPd, left, right);
>  }
>  
> @@ -1604,8 +1604,8 @@ static void asm_min_max(ASMState *as, IRIns *ir, A64CC cc, A64CC fcc)
>      asm_intmin_max(as, ir, cc);
>  }
>  
> -#define asm_max(as, ir)		asm_min_max(as, ir, CC_GT, CC_HI)
> -#define asm_min(as, ir)		asm_min_max(as, ir, CC_LT, CC_LO)
> +#define asm_min(as, ir)		asm_min_max(as, ir, CC_LT, CC_PL)
> +#define asm_max(as, ir)		asm_min_max(as, ir, CC_GT, CC_LE)
>  
>  /* -- Comparisons --------------------------------------------------------- */
>  
> diff --git a/src/lj_opt_fold.c b/src/lj_opt_fold.c

I test each piece of changes on x86/x64 and aarch64 and here are my
observations:

> index 49f74996..27e489af 100644
> --- a/src/lj_opt_fold.c
> +++ b/src/lj_opt_fold.c
> @@ -1797,8 +1797,6 @@ LJFOLDF(reassoc_intarith_k64)
>  #endif
>  }
>  
> -LJFOLD(MIN MIN any)
> -LJFOLD(MAX MAX any)
>  LJFOLD(BAND BAND any)
>  LJFOLD(BOR BOR any)
>  LJFOLDF(reassoc_dup)
> @@ -1808,6 +1806,15 @@ LJFOLDF(reassoc_dup)
>    return NEXTFOLD;
>  }
>  
> +LJFOLD(MIN MIN any)
> +LJFOLD(MAX MAX any)
> +LJFOLDF(reassoc_dup_minmax)
> +{
> +  if (fins->op2 == fleft->op2)
> +    return LEFTFOLD;  /* (a o b) o b ==> a o b */
> +  return NEXTFOLD;
> +}
> +

Avoiding of `(a o b) o a ==> a o b` optimization "fixes" (if I get the
idea right) the cases like the following on arch64:

| min(min(x, nan), x)
where `x` is a finite number.

(T) For example the following snippet:
| LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local nan = 0/0
| local x = 1
| local min = math.min
|
| print(x, nan, inf)
| print("MIN:", x, nan)
| local r = {}
| local r_assoc = {}
| for k = 1, 4 do
|   r[k] = min(x, nan)
|   r_assoc[k] =  min(min(x, nan), x)
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4], "\n")
| '

prints the following before the patch:
| 1       nan     inf
| MIN:    1       nan
| nan     nan     nan     nan
| 1       1       1       nan

and after patch:
| 1       nan     inf
| MIN:    1       nan
| nan     nan     nan     nan
| 1       1       1       1

For this code we get the following IRs:

| 0003 x8 >  int SLOAD  #4    T       ; 1
| 0004 d3 >  num SLOAD  #3    T       ; nan
| 0005    >  fun EQ     0002  math.min
| 0006 d2    num CONV   0003  num.int ; 1 -> 1.0
| 0007 d1    num MIN    0006  0004    ; res_0007 = min(1.0, nan) = nan


In case after patch with removed fold optimization we get the additional IR
later:

| 0016 d0    num MIN    0007  0006    ; min(res_0007, 1.0)

with the following mcode:
| fcmp  d2, d3         ; fcmp 1.0, nan
| fcsel d1, d2, d3, cc ; d1 == nan after this instruction (*)
| ...
| fcmp  d1, d2         ; fcmp nan, 1.0
| fcsel d0, d1, d2, cc ; d0 == 1.0 after this instruction (*)

Before the patch only one `fcmp` `fcsel` pair is emitted, so `nan` value
is returned in the destination register.

(*) Why this value is chosen:

According to the `fcmp` documentation:

| The IEEE 754 standard specifies that the result of a comparison is
| precisely one of <, ==, > or unordered. If either or both of the
| operands are NaNs, they are unordered, and all three of (Operand1 <
| Operand2), (Operand1 == Operand2) and (Operand1 > Operand2) are false.
| This case results in the FPSCR flags being set to N=0, Z=0, C=1, and
| V=1.

`cc` (aka `CC_LO` == `CC_CC`) condition means that C flag is 0 [2], that
is false when we are comparing something with NaN.

So, according to the `fcsel` documentation [3]:

| If the condition passes, the first SIMD and FP source register value
| is taken, otherwise the second SIMD and FP source register value is
| taken.

So, the value of the second source register is always chosen.

(TT) When we change the order of arguments things become broken again:

| $ LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local inf = math.huge
| local nan = 0/0
|
| local x = 1
| local min = math.min
| local max = math.max
|
| print(x, nan, inf)
| print("MIN:", x, nan)
| local r = {}
| local r_assoc = {}
| for k = 1, 4 do
|   r[k] = min(x, nan)
|   r_assoc[k] =  min(x, min(x, nan))
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4], "\n")
| '

Prints the following before the patch (on aarch64):

| 1       nan     inf
| MIN:    1       nan
| nan     nan     nan     nan
| nan     nan     nan     nan

And after this particular chunk of the patch (reassoc_dup_minmax):

| 1       nan     inf
| MIN:    1       nan
| nan     nan     nan     nan
| nan     nan     nan     1

>  LJFOLD(BXOR BXOR any)
>  LJFOLDF(reassoc_bxor)
>  {
> @@ -1846,23 +1853,12 @@ LJFOLDF(reassoc_shift)
>    return NEXTFOLD;
>  }
>  
> -LJFOLD(MIN MIN KNUM)
> -LJFOLD(MAX MAX KNUM)
>  LJFOLD(MIN MIN KINT)
>  LJFOLD(MAX MAX KINT)
>  LJFOLDF(reassoc_minmax_k)
>  {
>    IRIns *irk = IR(fleft->op2);
> -  if (irk->o == IR_KNUM) {
> -    lua_Number a = ir_knum(irk)->n;
> -    lua_Number y = lj_vm_foldarith(a, knumright, fins->o - IR_ADD);
> -    if (a == y)  /* (x o k1) o k2 ==> x o k1, if (k1 o k2) == k1. */
> -      return LEFTFOLD;
> -    PHIBARRIER(fleft);
> -    fins->op1 = fleft->op1;
> -    fins->op2 = (IRRef1)lj_ir_knum(J, y);
> -    return RETRYFOLD;  /* (x o k1) o k2 ==> x o (k1 o k2) */
> -  } else if (irk->o == IR_KINT) {
> +  if (irk->o == IR_KINT) {
>      int32_t a = irk->i;
>      int32_t y = kfold_intop(a, fright->i, fins->o);
>      if (a == y)  /* (x o k1) o k2 ==> x o k1, if (k1 o k2) == k1. */
> @@ -1875,24 +1871,6 @@ LJFOLDF(reassoc_minmax_k)
>    return NEXTFOLD;
>  }

(TTT) This chunk fixes behaviour for constants reassociation.

Run the following chunk:
| LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local inf = math.huge
| local nan = 0/0
| local min = math.min
| local max = math.max
|
| local r = {}
| local r_assoc = {}
| print("MIN:")
| local x = 1.2
| for i = 1, 4 do
|   r[i] =  min(min(x, 0/0), 1.3)
|   r_assoc[i] =  min(min(x, 1.3), 0/0)
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
|
| print("MAX:")
| for i = 1, 4 do
|   r[i] =  max(max(x, 0/0), 1.1)
|   r_assoc[i] =  max(max(x, 1.1), 0/0)
| end
| print(r[1], r[2], r[3], r[4])
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| '

Gives the following results before the particular (reassoc_minmax_k)
patch (on both arches):

| MIN:
| 1.3     1.3     1.3     1.2
| nan     nan     nan     nan
| MAX:
| 1.1     1.1     1.2     1.2
| nan     nan     nan     nan

And after patch (on both arches):
| MIN:
| 1.3     1.3     1.3     1.3
| nan     nan     nan     nan
| MAX:
| 1.1     1.1     1.1     1.1
| nan     nan     nan     nan

>  
> -LJFOLD(MIN MAX any)
> -LJFOLD(MAX MIN any)
> -LJFOLDF(reassoc_minmax_left)
> -{
> -  if (fins->op2 == fleft->op1 || fins->op2 == fleft->op2)
> -    return RIGHTFOLD;  /* (b o1 a) o2 b ==> b; (a o1 b) o2 b ==> b */
> -  return NEXTFOLD;
> -}

This particular patch (reassoc_minmax_left) fixes the following test
case on aarch64. (TTTT)

| LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local min = math.min
| local max = math.max
| local nan = 0/0
|
| local r_assoc = {}
| local r_assoc2 = {}
| print("MIN - MAX:")
| for i = 1, 4 do
|   r_assoc[i]  =  min(max(nan, 1), 1)
|   r_assoc2[i] =  min(max(1, nan), 1)
| end
| print(r_assoc[1],  r_assoc[2],  r_assoc[3],  r_assoc[4])
| print(r_assoc2[1], r_assoc2[2], r_assoc2[3], r_assoc2[4], "\n")
|
| print("MAX - MIN:")
| for i = 1, 4 do
|   r_assoc[i]  =  max(min(nan, 1), 1)
|   r_assoc2[i] =  max(min(1, nan), 1)
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| print(r_assoc2[1], r_assoc2[2], r_assoc2[3], r_assoc2[4], "\n")
| '

Returns before the patch:
| MIN - MAX:
| 1       1       1       nan
| 1       1       1       nan
|
| MAX - MIN:
| 1       1       nan     nan
| 1       1       nan     nan

And after the patch:

| MIN - MAX:
| 1       1       1       1
| 1       1       1       1
|
| MAX - MIN:
| 1       1       1       1
| 1       1       1       1

> -
> -LJFOLD(MIN any MAX)
> -LJFOLD(MAX any MIN)
> -LJFOLDF(reassoc_minmax_right)
> -{
> -  if (fins->op1 == fright->op1 || fins->op1 == fright->op2)
> -    return LEFTFOLD;  /* a o2 (a o1 b) ==> a; a o2 (b o1 a) ==> a */
> -  return NEXTFOLD;
> -}
> -

OK, let's try a similar test case here (TTTTT):

| LUA_PATH="src/?.lua;;" src/luajit -Ohotloop=1 -e '
| local min = math.min
| local max = math.max
| local nan = 0/0
|
| local r_assoc = {}
| local r_assoc2 = {}
| print("MIN - MAX:")
| for i = 1, 4 do
|   r_assoc[i]  =  min(1, max(nan, 1))
|   r_assoc2[i] =  min(1, max(1, nan))
| end
| print(r_assoc[1],  r_assoc[2],  r_assoc[3],  r_assoc[4])
| print(r_assoc2[1], r_assoc2[2], r_assoc2[3], r_assoc2[4], "\n")
|
| print("MAX - MIN:")
| for i = 1, 4 do
|   r_assoc[i]  =  max(1, min(nan, 1))
|   r_assoc2[i] =  max(1, min(1, nan))
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| print(r_assoc2[1], r_assoc2[2], r_assoc2[3], r_assoc2[4], "\n")
| '

Before the patch it leads to the assertion failure in
`rec_check_slots()` (both arches):
| MIN - MAX:
| luajit: src/lj_record.c:142: rec_check_slots: Assertion `lj_obj_equal(tv, &tvk)' failed.
| Aborted

After the patch we got inconsistent results on x86/x64 (same results in
the upstream):

| MIN - MAX:
| 1       1       1       1
| nan     nan     nan     1
|
| MAX - MIN:
| 1       1       1       1
| nan     nan     1       1

But for aarch64 it is fine:

| MIN - MAX:
| 1       1       1       1
| nan     nan     nan     nan
|
| MAX - MIN:
| 1       1       1       1
| nan     nan     nan     nan

I suggest to add this testcase without checking values consistency, but
with FIXME: mark.

>  /* -- Array bounds check elimination -------------------------------------- */
>  
>  /* Eliminate ABC across PHIs to handle t[i-1] forwarding case.
> @@ -2018,8 +1996,6 @@ LJFOLDF(comm_comp)
>  
>  LJFOLD(BAND any any)
>  LJFOLD(BOR any any)
> -LJFOLD(MIN any any)
> -LJFOLD(MAX any any)
>  LJFOLDF(comm_dup)
>  {
>    if (fins->op1 == fins->op2)  /* x o x ==> x */
> @@ -2027,6 +2003,15 @@ LJFOLDF(comm_dup)
>    return fold_comm_swap(J);
>  }
>  
> +LJFOLD(MIN any any)
> +LJFOLD(MAX any any)
> +LJFOLDF(comm_dup_minmax)
> +{
> +  if (fins->op1 == fins->op2)  /* x o x ==> x */
> +    return LEFTFOLD;
> +  return NEXTFOLD;
> +}
> +

No need to swap operands here to avoid side effects, so just continue.
This fixes (TT) on arm64.

We get the following mcode now:
| fcmp  d2, d3         ; fcmp 1.0, nan
| fcsel d1, d3, d2, pl ; d1 == nan after this instruction
| ...
| fcmp  d2, d1         ; fcmp 1.0, nan
| fcsel d0, d1, d2, pl ; d0 == nan after this instruction

`pl` (aka `CC_PL`) condition means that N flag is 0 [2], that
is true when we are comparing something with NaN. So, the value of the
first source register is taken.

After this particular chunk of the patch (comm_dup_minmax) plus
(reassoc_dup_minmax) (TT) output is the following on aarch64:

| 1       nan     inf
| MIN:    1       nan
| nan     nan     nan     nan
| nan     nan     nan     nan

>  LJFOLD(BXOR any any)
>  LJFOLDF(comm_bxor)
>  {
> diff --git a/src/lj_vmmath.c b/src/lj_vmmath.c
> index c04459bd..ae4e0f15 100644
> --- a/src/lj_vmmath.c
> +++ b/src/lj_vmmath.c
> @@ -49,8 +49,8 @@ double lj_vm_foldarith(double x, double y, int op)
>    case IR_ABS - IR_ADD: return fabs(x); break;
>  #if LJ_HASJIT
>    case IR_LDEXP - IR_ADD: return ldexp(x, (int)y); break;
> -  case IR_MIN - IR_ADD: return x > y ? y : x; break;
> -  case IR_MAX - IR_ADD: return x < y ? y : x; break;
> +  case IR_MIN - IR_ADD: return x < y ? x : y; break;
> +  case IR_MAX - IR_ADD: return x > y ? x : y; break;

(TTTTTT) This piece of the patch fixes inconsistency in
`fold_kfold_numarith()` (on x86/x64):

| $ LUA_PATH="src/?.lua;;" src/luajit -jdump=i -Ohotloop=1 -e '
| local min = math.min
| local max = math.max
|
| local r_assoc = {}
| print("MIN:")
| for i = 1, 4 do
|   r_assoc[i] =  min(min(7.1, 0/0), 1.1)
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4], "\n")
|
| print("MAX:")
| for i = 1, 4 do
|   r_assoc[i] =  max(max(7.1, 0/0), 1.1)
| end
| print(r_assoc[1], r_assoc[2], r_assoc[3], r_assoc[4])
| '
| MIN:
| ---- TRACE 1 start (command line):7
| luajit: /home/burii/reviews/luajit/minmax/src/lj_record.c:142: rec_check_slots: Assertion `lj_obj_equal(tv, &tvk)' failed.
| Aborted
NB: use 0/0 constant here according to `fold_kfold_numarith()`
semantics.

`tv` is nan(0x8000000000000), `tvk` is 7.1.

If we look in disassembled code of `lj_vm_foldarith()` we can see the
following:

| /* In our example x == 7.1, y == nan */
| case IR_MIN - IR_ADD: return x > y ? y : x; break;
| case IR_MAX - IR_ADD: return x < y ? y : x; break;

| ; case IR_MIN
| <lj_vm_foldarith+337>:  movsd  xmm0,QWORD PTR [rsp+0x18] ; xmm0 <- 7.1
| <lj_vm_foldarith+343>:  comisd xmm0,QWORD PTR [rsp+0x10] ; comisd 7.1, nan
| <lj_vm_foldarith+349>:  jbe    <lj_vm_foldarith+358>     ; >= ?
| <lj_vm_foldarith+351>:  mov    rax,QWORD PTR [rsp+0x10]  ; return nan
| <lj_vm_foldarith+356>:  jmp    <lj_vm_foldarith+398>     ;
| <lj_vm_foldarith+358>:  mov    rax,QWORD PTR [rsp+0x18]  ; else return 7.1
| <lj_vm_foldarith+363>:  jmp    <lj_vm_foldarith+398>     ;
| ; case IR_MAX
| <lj_vm_foldarith+365>:  movsd  xmm0,QWORD PTR [rsp+0x10] ; xmm0 <- nan
| <lj_vm_foldarith+371>:  comisd xmm0,QWORD PTR [rsp+0x18] ; comisd nan, 7.1
| <lj_vm_foldarith+377>:  jbe    <lj_vm_foldarith+386>     ; >= ?
| <lj_vm_foldarith+379>:  mov    rax,QWORD PTR [rsp+0x10]  ; return nan
| <lj_vm_foldarith+384>:  jmp    <lj_vm_foldarith+398>     ;
| <lj_vm_foldarith+386>:  mov    rax,QWORD PTR [rsp+0x18]  ; else return 7.1
| <lj_vm_foldarith+391>:  jmp    <lj_vm_foldarith+398>     ;

According to `comisd` documentation [4] in case when the one operand is NaN,
the result is unordered and ZF,PF,CF := 111. This means that `jbe`
condition is true (CF=1 or ZF=1)[5], so we return 7.1 (the first
operand) for case `IR_MIN`.

Q: Why it is the problem?
For answer lets see the `lj_ff_math_min()` and `lj_ff_math_max()` in the VM:

For number values we got the following:
|7:
|  sseop xmm0, xmm1
Where `sseop` is `minsd`/`maxsd` instruction correspondingly.
>From the instruction reference guides [6][7]:
| If only one value is a NaN (SNaN or QNaN) for this instruction, the
| second source operand, either a NaN or a valid floating-point value, is
| written to the result.

After the patch we got the following disassembled code for
`lj_vm_foldarith()`:

| ; case IR_MIN
| <lj_vm_foldarith+337>:  movsd  xmm0,QWORD PTR [rsp+0x10] ; xmm0 <- nan
| <lj_vm_foldarith+343>:  comisd xmm0,QWORD PTR [rsp+0x18] ; comisd nan, 7.1
| <lj_vm_foldarith+349>:  jbe    <lj_vm_foldarith+358>     ; >= ?
| <lj_vm_foldarith+351>:  mov    rax,QWORD PTR [rsp+0x18]  ; return 7.1
| <lj_vm_foldarith+356>:  jmp    <lj_vm_foldarith+398>     ;
| <lj_vm_foldarith+358>:  mov    rax,QWORD PTR [rsp+0x10]  ; else return nan
| <lj_vm_foldarith+363>:  jmp    <lj_vm_foldarith+398>     ;
| ; case IR_MAX
| <lj_vm_foldarith+365>:  movsd  xmm0,QWORD PTR [rsp+0x18] ; xmm0 <- 7.1
| <lj_vm_foldarith+371>:  comisd xmm0,QWORD PTR [rsp+0x10] ; comisd 7.1, nan
| <lj_vm_foldarith+377>:  jbe    <lj_vm_foldarith+386>     ; >= ?
| <lj_vm_foldarith+379>:  mov    rax,QWORD PTR [rsp+0x18]  ; return 7.1
| <lj_vm_foldarith+384>:  jmp    <lj_vm_foldarith+398>     ;
| <lj_vm_foldarith+386>:  mov    rax,QWORD PTR [rsp+0x10]  ; else return nan
| <lj_vm_foldarith+391>:  jmp    <lj_vm_foldarith+398>     ;

So now we always return the second operand (nan for case `IR_MIN`).

Side note: IMHO, instead of this behavior, it the NaN source operand
should be returned. Also, IINM, it makes the behaviour consistent with
PUC-Rio Lua 5.1.

NB:

For aarch64 this changes the follwing assembly:
| 62630: ldr     d1, [sp, #40]
| 62634: ldr     d0, [sp, #32]
| 62638: fcmpe   d1, d0
| 6263c: b.le    62648 <lj_vm_foldarith+0x150>
| 62640: ldr     d0, [sp, #32]
| 62644: b       62674 <lj_vm_foldarith+0x17c>
| 62648: ldr     d0, [sp, #40]
| 6264c: b       62674 <lj_vm_foldarith+0x17c>
| 62650: ldr     d1, [sp, #40]
| 62654: ldr     d0, [sp, #32]
| 62658: fcmpe   d1, d0
| 6265c: b.pl    62668 <lj_vm_foldarith+0x170>  // b.nfrst
| 62660: ldr     d0, [sp, #32]
| 62664: b       62674 <lj_vm_foldarith+0x17c>
| 62668: ldr     d0, [sp, #40]
| 6266c: b       62674 <lj_vm_foldarith+0x17c>

To this one:

| 62620: ldr     d1, [sp, #40]
| 62624: ldr     d0, [sp, #32]
| 62628: fcmpe   d1, d0
| 6262c: b.pl    62638 <lj_vm_foldarith+0x150>  // b.nfrst
| 62630: ldr     d0, [sp, #40]
| 62634: b       62664 <lj_vm_foldarith+0x17c>
| 62638: ldr     d0, [sp, #32]
| 6263c: b       62664 <lj_vm_foldarith+0x17c>
| 62640: ldr     d1, [sp, #40]
| 62644: ldr     d0, [sp, #32]
| 62648: fcmpe   d1, d0
| 6264c: b.le    62658 <lj_vm_foldarith+0x170>
| 62650: ldr     d0, [sp, #40]
| 62654: b       62664 <lj_vm_foldarith+0x17c>
| 62658: ldr     d0, [sp, #32]
| 6265c: b       62664 <lj_vm_foldarith+0x17c>

So, we should provide the same changes for VM|JIT generated mcode (and
this is why CC_LO/CC_HI are replaced with CC_LE/CC_PL)

>  #endif
>    default: return x;
>    }
> diff --git a/src/vm_arm.dasc b/src/vm_arm.dasc
> index a29292f1..89faa03e 100644
> --- a/src/vm_arm.dasc
> +++ b/src/vm_arm.dasc
> @@ -1718,8 +1718,8 @@ static void build_subroutines(BuildCtx *ctx)
>    |.endif
>    |.endmacro
>    |
> -  |  math_minmax math_min, gt, hi
> -  |  math_minmax math_max, lt, lo
> +  |  math_minmax math_min, gt, pl
> +  |  math_minmax math_max, lt, le
>    |
>    |//-- String library -----------------------------------------------------
>    |
> diff --git a/src/vm_arm64.dasc b/src/vm_arm64.dasc
> index f517a808..2c1bb4f8 100644
> --- a/src/vm_arm64.dasc
> +++ b/src/vm_arm64.dasc
> @@ -1494,8 +1494,8 @@ static void build_subroutines(BuildCtx *ctx)
>    |  b <6
>    |.endmacro
>    |
> -  |  math_minmax math_min, gt, hi
> -  |  math_minmax math_max, lt, lo
> +  |  math_minmax math_min, gt, pl
> +  |  math_minmax math_max, lt, le
>    |
>    |//-- String library -----------------------------------------------------
>    |
> diff --git a/src/vm_x64.dasc b/src/vm_x64.dasc
> index 59f117ba..faeb5181 100644
> --- a/src/vm_x64.dasc
> +++ b/src/vm_x64.dasc

<snipped>

> diff --git a/src/vm_x86.dasc b/src/vm_x86.dasc
> index f7ffe5d2..1c995d16 100644
> --- a/src/vm_x86.dasc
> +++ b/src/vm_x86.dasc

<snipped>

> diff --git a/test/tarantool-tests/gh-6163-min-max.test.lua b/test/tarantool-tests/gh-6163-min-max.test.lua
> new file mode 100644
> index 00000000..1da8a259
> --- /dev/null
> +++ b/test/tarantool-tests/gh-6163-min-max.test.lua
> @@ -0,0 +1,48 @@
> +local tap = require('tap')
> +local test = tap.test('gh-6163-jit-min-max')
> +test:plan(3)
> +--
> +-- gh-6163: math.min/math.max inconsistencies.
> +--
> +
> +local function is_consistent(res)
> +  for i = 1, #res - 1 do
> +    if res[i] ~= res[i + 1] then
> +      return false
> +    end
> +  end
> +  return true
> +end

Side note: may be it is better to name it `array_is_consistent()` as far
as it checks only array table part?
Also, we probably can use it in our test utils.

> +
> +-- This function creates dirty values on the Lua stack.
> +-- The latter of them is going to be treated as an
> +-- argument by the `math.min/math.max`.
> +-- The first two of them are going to be overwritten
> +-- by the math function itself.
> +local function filler()
> +  return 1, 1, 1
> +end
> +
> +-- Success with no args.

Don't get the comment.
Do you mean:
| -- `math.min()` should raise an error when is called without
| -- arguments.

> +filler()
> +local r, _ = pcall(function() math.min() end)
> +test:ok(false == r, 'math.min fails with no args')

Why don't use `not r` here?

> +
> +filler()
> +r, _ = pcall(function() math.max() end)
> +test:ok(false == r, 'math.max fails with no args')

Why do we need the second test?
Please, add the coressponding comment.

> +
> +-- Incorrect fold optimization.

Don't get your comment. What do you mean?

> +jit.off()
> +jit.flush()

Why do we need to remove previous traces before start?

> +jit.opt.start('hotloop=1')
> +jit.on()
> +
> +local res = {}
> +for i = 1, 4 do
> +  res[i] = math.min(math.min(0/0, math.huge), math.huge)

NB: IINM, still inconsistent if we use
| local nan = 0/0
| -- ...
|   res[i] = math.min(math.min(nan, math.huge), math.huge)

I suppose, that we should comment this.

> +end
> +
> +test:ok(is_consistent(res), '(a o b) o a -> a o b')
> +
> +os.exit(test:check() and 0 or 1)
> -- 
> 2.38.1
> 

[1]: https://developer.arm.com/documentation/dui0801/g/A64-Floating-point-Instructions/FCMP
[2]: https://www.cs.princeton.edu/courses/archive/spr19/cos217/reading/ArmInstructionSetOverview.pdf
[3]: https://developer.arm.com/documentation/100069/0608/A64-Floating-point-Instructions/FCSEL
[4]: https://www.felixcloutier.com/x86/comisd
[5]: https://www.felixcloutier.com/x86/jcc
[6]: https://www.felixcloutier.com/x86/minsd
[7]: https://www.felixcloutier.com/x86/maxsd
[8]: https://github.com/tarantool/luajit/commit/9b6c0cd8eafdd2e5a8a7ac4b33f6e33b3d8a93b9

-- 
Best regards,
Sergey Kaplun