From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tarantool-patches-bounces@dev.tarantool.org>
Received: from [87.239.111.99] (localhost [127.0.0.1])
	by dev.tarantool.org (Postfix) with ESMTP id 714D66F153;
	Thu, 15 Sep 2022 01:09:13 +0300 (MSK)
DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 714D66F153
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev;
	t=1663193353; bh=AYIsm+uUNIDBZ00YjFyS8Evp7gY+h1Wsobi/a6x57uQ=;
	h=Date:To:References:In-Reply-To:Subject:List-Id:List-Unsubscribe:
	 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc:
	 From;
	b=NBAw8OuJ2tr2ZfXMiwOB+nUfs9mYuyTbrv4j3k6/FVnMdzbLboXc89it6S3vmGnkF
	 lZdzgqmEvC5etB08adc0FG0kbid5HnFxUImiE0fJJkc9T65nM/pjS5f8ObFiH8ztVW
	 sJAAbK4L10nfGNjh8UUgzSS/DbkejhUlVIWwu91A=
Received: from smtp59.i.mail.ru (smtp59.i.mail.ru [217.69.128.39])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id A59276F153
 for <tarantool-patches@dev.tarantool.org>;
 Thu, 15 Sep 2022 01:09:11 +0300 (MSK)
DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org A59276F153
Received: by smtp59.i.mail.ru with esmtpa (envelope-from
 <skaplun@tarantool.org>)
 id 1oYaZC-0007Lm-NS; Thu, 15 Sep 2022 01:09:11 +0300
Date: Thu, 15 Sep 2022 01:06:29 +0300
To: Maksim Kokryashkin <max.kokryashkin@gmail.com>
Message-ID: <YyJQZUUa41uwJcL1@root>
References: <20220909121531.87545-1-max.kokryashkin@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20220909121531.87545-1-max.kokryashkin@gmail.com>
X-Mailru-Src: smtp
X-4EC0790: 10
X-7564579A: 646B95376F6C166E
X-77F55803: 4F1203BC0FB41BD9EBD81B01C038F8F396C53C77C8EAFD6814A2CCDCE7FE122F182A05F5380850409F140F79F0CA7235DE457B8B5B105C2D8AC7D8125E7AB6407749DFE48FE02DCE
X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE77603ADE015AF816DEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637575C0790A70C4B158638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D8A5E59DEE3F0501AAC2F7BA8D7C043367117882F4460429724CE54428C33FAD305F5C1EE8F4F765FC2EE5AD8F952D28FBA471835C12D1D9774AD6D5ED66289B52BA9C0B312567BB23117882F446042972877693876707352033AC447995A7AD18BDFBBEFFF4125B51D2E47CDBA5A96583BA9C0B312567BB2376E601842F6C81A19E625A9149C048EED76C6ED7039589DE03CEA74F0D118906D8FC6C240DEA7642DBF02ECDB25306B2B78CF848AE20165D0A6AB1C7CE11FEE367F1C1C3ABB44F3A6E0066C2D8992A16C4224003CC836476E2F48590F00D11D6E2021AF6380DFAD1A18204E546F3947CB11811A4A51E3B096D1867E19FE1407959CC434672EE6371089D37D7C0E48F6C8AA50765F7900637A7EFCB0EB5ACB161EFF80C71ABB335746BA297DBC24807EABDAD6C7F3747799A
X-C1DE0DAB: 9604B64F49C60606AD91A466A1DEF99B296C473AB1E142185AC9E3593CE4B31AB1881A6453793CE9274300E5CE05BD4401A9E91200F654B0C7A0BC55FA0FE5FC0FF29EA5888F38274760C267656DBD4F4AC49324C72D5A23B1881A6453793CE9C32612AADDFBE06133F7A9E5587C79A693EDB24507CE13387DFF0A840B692CF8
X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34ECB3E21D3CD9CB4F1AE4913169FB4120318AE4203924E30C002D3902B5F239B7AFC4D99A3A00AF461D7E09C32AA3244C9E84CE513BBA223285E0037D96239958FE8DA44ABE2443F7FACE5A9C96DEB163
X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojnfsvVopF4B/0SwZyCCiibQ==
X-Mailru-Sender: F16D9CAFEEA6770E7B6EAD4ADB3BCAF0501802350ECE6A9BC15C14B5750D5C5CF233904666D6F39AF2400F607609286E924004A7DEC283833C7120B22964430C52B393F8C72A41A84198E0F3ECE9B5443453F38A29522196
X-Mras: Ok
Subject: Re: [Tarantool-patches] [PATCH luajit] Fix narrowing of unary minus.
X-BeenThere: tarantool-patches@dev.tarantool.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: Tarantool development patches <tarantool-patches.dev.tarantool.org>
List-Unsubscribe: <https://lists.tarantool.org/mailman/options/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=unsubscribe>
List-Archive: <https://lists.tarantool.org/pipermail/tarantool-patches/>
List-Post: <mailto:tarantool-patches@dev.tarantool.org>
List-Help: <mailto:tarantool-patches-request@dev.tarantool.org?subject=help>
List-Subscribe: <https://lists.tarantool.org/mailman/listinfo/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=subscribe>
From: Sergey Kaplun via Tarantool-patches <tarantool-patches@dev.tarantool.org>
Reply-To: Sergey Kaplun <skaplun@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Errors-To: tarantool-patches-bounces@dev.tarantool.org
Sender: "Tarantool-patches" <tarantool-patches-bounces@dev.tarantool.org>

Hi, Maxim!

Thanks for the patch!

Please consider my comments below.

On 09.09.22, Maksim Kokryashkin wrote:
> From: Mike Pall <mike>
> 

Please, don't forget the common cherry-picked header:

| (cherry picked from commit 1e6e8aaa20626ac94cf907c69b0452f76e9f5fa5)

It is necessary to recognize the original commit without grepping of
LuaJIT source code.

> With DUALNUM mode disabled, unary minus narrowing can produce
> inconsistent results on zeros, returning both -0 and 0 on the
> same chunk of code.
> 
> This patch fixes the inconsistency by adding checks for zeros.

I suggest to reformulate these paragraphes like the following:

| LuaJIT narrowing optimization during BC_UNM recording may ignore
| information about sign of zero for integer types of IR. So far the
| resulting value on a trace is not the same as for the interpreter.
| However, in DUALNUM mode with using of TValues containing integers the
| -0 value is represented in the same way as the 0 value and there is no
| difference between them.
|
| This patch fixes the non-DUALNUM mode behaviour. When the zero value is
| identified during recording it should be cast to number so IR_CONV is
| emitted. Also, this patch adds assertion guard checking that value on
| which operation of unary minus is performed isn't zero.

Feel free to modify it as you wish.

> 

Don't be shy to mention your impact in this patch (description and
tests) like it is done in previous backported commits.

> Part of tarantool/tarantool#7230
> Resolves tarantool/tarantool#6976

Minor: Igor prefers to place the resolved issue first (as far as this is
the main issue to close). I prefer the opposite order. But we've agreed
with Igor's way, so let use it for consistency.

> ---
> Issue: https://github.com/tarantool/tarantool/issues/6976
> Branch: https://github.com/tarantool/luajit/tree/fckxorg/gh-6976-narrowing-of-unary-minus
> PR: https://github.com/tarantool/tarantool/pull/7662
> 
>  src/lj_opt_narrow.c                           |  9 +++--
>  .../gh-6976-narrowing-of-unary-minus.test.lua | 34 +++++++++++++++++++
>  2 files changed, 41 insertions(+), 2 deletions(-)
>  create mode 100644 test/tarantool-tests/gh-6976-narrowing-of-unary-minus.test.lua
> 
> diff --git a/src/lj_opt_narrow.c b/src/lj_opt_narrow.c
> index cd96ca4b..bb61f97b 100644
> --- a/src/lj_opt_narrow.c
> +++ b/src/lj_opt_narrow.c
> @@ -551,8 +551,13 @@ TRef lj_opt_narrow_unm(jit_State *J, TRef rc, TValue *vc)
>  {
>    rc = conv_str_tonum(J, rc, vc);
>    if (tref_isinteger(rc)) {
> -    if ((uint32_t)numberVint(vc) != 0x80000000u)
> -      return emitir(IRTGI(IR_SUBOV), lj_ir_kint(J, 0), rc);
> +    uint32_t k = (uint32_t)numberVint(vc);
> +    if ((LJ_DUALNUM || k != 0) && k != 0x80000000u) {
> +      TRef zero = lj_ir_kint(J, 0);
> +      if (!LJ_DUALNUM)
> +	emitir(IRTGI(IR_NE), rc, zero);
> +      return emitir(IRTGI(IR_SUBOV), zero, rc);
> +    }

Side note:

I wonder how much it is better to use this brancn in narrowing
optimization for non-DUALNUMBER mode, as far as we even add additional
assertion guard for zero check (IR_NE).

Assume we have the following code:
| local r = 0
| for i = 1, 1e9 do
|   local a = -i
|   r = r + a
| end
| print(r)

Before the patch we've had the following IRs for LOOP part:

| ------ LOOP ------------
| 0009 >  int SUBOV  +0    0006
| 0010    num CONV   0009  num.int
| 0011  + num ADD    0010  0005
| 0012  + int ADD    0006  +1
| 0013 >  int LE     0012  +1000000000
| 0014    int PHI    0006  0012
| 0015    num PHI    0005  0011

And after patch:

| ------ LOOP ------------
| 0010 >  int NE     0007  +0
| 0011 >  int SUBOV  +0    0007
| 0012    num CONV   0011  num.int
| 0013  + num ADD    0012  0006
| 0014  + int ADD    0007  +1
| 0015 >  int LE     0014  +1000000000
| 0016    int PHI    0007  0014
| 0017    num PHI    0006  0013

This leads to the following additional instructions:

| test ebp, ebp
| jz 0x561394990018 ->2

I run 10 loops like the aforementioned one with 21e8 iterations.

The average time for 5 runs (of 10 loops) before the patch: 25.763s
The average time for 5 runs (of 10 loops) after  the patch: 25.885s

And, actually, if we skip this optimization with the following patch
(I just skip the branch with SUBOV emitting for non-DUALNUMBER mode):

===================================================================
diff --git a/src/lj_opt_narrow.c b/src/lj_opt_narrow.c
index bb61f97b..aaa5b3cd 100644
--- a/src/lj_opt_narrow.c
+++ b/src/lj_opt_narrow.c
@@ -552,7 +552,7 @@ TRef lj_opt_narrow_unm(jit_State *J, TRef rc, TValue *vc)
   rc = conv_str_tonum(J, rc, vc);
   if (tref_isinteger(rc)) {
     uint32_t k = (uint32_t)numberVint(vc);
-    if ((LJ_DUALNUM || k != 0) && k != 0x80000000u) {
+    if ((LJ_DUALNUM) && k != 0x80000000u) {
       TRef zero = lj_ir_kint(J, 0);
       if (!LJ_DUALNUM)
         emitir(IRTGI(IR_NE), rc, zero);
===================================================================

The IRs for the loop part are the following:
| ------ LOOP ------------
| 0010    num CONV   0007  num.int
| 0011  + num SUB    0006  0010
| 0012  + int ADD    0007  +1
| 0013 >  int LE     0012  +1000000000
| 0014    int PHI    0007  0012
| 0015    num PHI    0006  0011

As you can see we have even 2 IRs instead of 4 (we skip assertion guard
and creation of negative number -- we just use substitution instead of
addition). I suppose for non-DUALNUM mode this is the __common__ case --
operations in doubles, not integers (it may be different for cdata
indexes -- I don't check it).

As far as we have only 3 instructions for this loop payload:

| xorps xmm6, xmm6
| cvtsi2sd xmm6, ebp
| subsd xmm7, xmm6

instead of old 8 instructions:
| test ebp, ebp
| jz 0x555995dc0018 ->2
| xor ebx, ebx
| sub ebx, ebp
| jo 0x555995dc0018 ->2
| xorps xmm6, xmm6
| cvtsi2sd xmm6, ebx
| addsd xmm7, xmm6

the results are slightly better: 25.368s

I understand, that the benchmark is kinda synthetic, but still...

P.S. For others payload the result can be the diffirent, but this code
looks like the most common, as I said before.

I mean constructions like the following:
| local a = -b;
| c = d + a

Also, it is converted to number anyway in HREF:

| 0021 >  int NE     0018  +0
| 0022 >  int SUBOV  +0    0018
| 0023    int FLOAD  0004  tab.asize
| 0024 >  int EQ     0023  +0
| 0025    num CONV   0022  num.int
| 0026    p32 HREF   0004  0025

with the following assembly:

| mov edi, [0x40162540]
| mov esi, [rsp+0x10]
|
| test ebp, ebp
| jz 0x558aff9c0018 ->2
| xor r15d, r15d
| sub r15d, ebp
| jo 0x558aff9c0018 ->2
|
| mov ebx, [rsi+0x18]
| test ebx, ebx
| jnz 0x558aff9c0018        ->2
|
| xorps xmm7, xmm7
| cvtsi2sd xmm7, r15d
|
| movq rdx, xmm7

for the following compiled code:
| local a = tab[-i] -- type(tab) == 'table'

Without NE, SUBOV emitting IRs are the following:
| 0020    num CONV   0017  num.int
| 0021    num NEG    0020  0003
| 0022    int FLOAD  0005  tab.asize
| 0023 >  int EQ     0022  +0
| 0024    p32 HREF   0005  0021

with the following assembly:

| mov edi, [0x40093540]
| mov esi, [rsp+0x10]
|
| xorps xmm7, xmm7
| cvtsi2sd xmm7, ebp
| movsd [rsp+0x8], xmm7
| movaps xmm6, xmm7
| xorps xmm6, [0x400936f0]
|
| mov ebx, [rsi+0x18]
| test ebx, ebx
| jnz 0x55c341180018        ->2
|
| movq rdx, xmm6

Average time when run 3 chunks like the following:
| local r = require"table.new"(1e8, 0)
| for i = 1, 1e8 do
|   local a = r[-i]
|   r[i] = a
| end
| print(r)

With NE and SUBOV: 7.769s
Without: 7.670s

Also, it is obviously more natural to use this cast in DUALNUMBER mode,
where integer TValues are usual.

P.P.S. The patch itself as a bugfix LGTM.

P.P.P.S. Thoughts? :)

>      rc = emitir(IRTN(IR_CONV), rc, IRCONV_NUM_INT);
>    }
>    return emitir(IRTN(IR_NEG), rc, lj_ir_ksimd(J, LJ_KSIMD_NEG));
> diff --git a/test/tarantool-tests/gh-6976-narrowing-of-unary-minus.test.lua b/test/tarantool-tests/gh-6976-narrowing-of-unary-minus.test.lua
> new file mode 100644
> index 00000000..caae8c34
> --- /dev/null
> +++ b/test/tarantool-tests/gh-6976-narrowing-of-unary-minus.test.lua

Side note: This 6976 issue has no description, but OK :)

> @@ -0,0 +1,34 @@
> +require("utils").skipcond(

Minor: Use single quotes instead of double quotes.

> +  jit.arch ~= "x86" and jit.arch ~= "x64",
> +  "DUALNUM mode must be disabled"
> +)

Not sure that we must skip other arches, see argumentation below.

> +
> +local tap = require("tap")
> +local test = tap.test("gh-6976-narrowing-of-unary-minus")
> +test:plan(1)
> +
> +jit.off()
> +jit.flush()
> +jit.opt.start('hotloop=1')
> +jit.on()

Don't see the reason to restart the JIT compiler at this point. Just set
the option is enough.

> +
> +local has_error = false
> +
> +-- The first two iterations are needed to write trace in a way
> +-- where `a` is treated like a non-zero integer value. After the
> +-- first two iterations 'a' becomes a zero integer, but it is still
> +-- processed by the same trace. Finally, roughly ten loop

These ten iterations are needed for side trace compilation.
You can set hotexit=1 to start compiling trace at the first trace exit.
But this is not necessary, see argumentation below.

> +-- iterations later, a new trace is formed, where `a` is treated
> +-- like a number.
> +for i=1,20 do
> +  local a = 1
> +  if i > 2 then
> +    a = 0
> +  end
> +  a = -a
> +  has_error = has_error or tostring(a):match("%-%d") == nil
> +end

Actually, I don't see the reason for such complicated test case.
We can just use the following (with better naming, etc.):

| local res = require'table.new'(3, 0)
|
| for _ = 1, 3 do
|   local a = 0
|   a = -a
|   table.insert(res, tostring(a))
| end

We use `table.new()` here to avoid trace exits due to table rehashing
(such trace exits return us to the interpreter and you see the -0 value
instead of expected 0, when test failure expected without the patch).

In your test case you actually want to verify, that behaviour for JIT
compiled code and interpreter is the same. So we can run the loop above
with JIT on/off and verify those tables' content is the same. From this
point of view, there is no different for the DUALNUMBER mode in this
test -- the results should be equal for the JIT and the interpreter (but
it will be nice manner to write the comment with expected content for
both modes).

> +
> +test:ok(has_error == false, "inconsistent unary minus result")

For now we check only _recording_ for 0.
But we should also check the added assertion guard. I suggest the test
like the following:

| local res2 = require"table.new"(3,0)
|
| for i = 2, 0, -1 do
|   table.insert(res2, tostring(-i))
| end

> +os.exit(test:check() and 0 or 1)
> +
> -- 
> 2.32.1 (Apple Git-133)
> 

-- 
Best regards,
Sergey Kaplun