From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 714D66F153; Thu, 15 Sep 2022 01:09:13 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 714D66F153 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1663193353; bh=AYIsm+uUNIDBZ00YjFyS8Evp7gY+h1Wsobi/a6x57uQ=; h=Date:To:References:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=NBAw8OuJ2tr2ZfXMiwOB+nUfs9mYuyTbrv4j3k6/FVnMdzbLboXc89it6S3vmGnkF lZdzgqmEvC5etB08adc0FG0kbid5HnFxUImiE0fJJkc9T65nM/pjS5f8ObFiH8ztVW sJAAbK4L10nfGNjh8UUgzSS/DbkejhUlVIWwu91A= Received: from smtp59.i.mail.ru (smtp59.i.mail.ru [217.69.128.39]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id A59276F153 for ; Thu, 15 Sep 2022 01:09:11 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org A59276F153 Received: by smtp59.i.mail.ru with esmtpa (envelope-from ) id 1oYaZC-0007Lm-NS; Thu, 15 Sep 2022 01:09:11 +0300 Date: Thu, 15 Sep 2022 01:06:29 +0300 To: Maksim Kokryashkin Message-ID: References: <20220909121531.87545-1-max.kokryashkin@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220909121531.87545-1-max.kokryashkin@gmail.com> X-Mailru-Src: smtp X-4EC0790: 10 X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD9EBD81B01C038F8F396C53C77C8EAFD6814A2CCDCE7FE122F182A05F5380850409F140F79F0CA7235DE457B8B5B105C2D8AC7D8125E7AB6407749DFE48FE02DCE X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE77603ADE015AF816DEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637575C0790A70C4B158638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D8A5E59DEE3F0501AAC2F7BA8D7C043367117882F4460429724CE54428C33FAD305F5C1EE8F4F765FC2EE5AD8F952D28FBA471835C12D1D9774AD6D5ED66289B52BA9C0B312567BB23117882F446042972877693876707352033AC447995A7AD18BDFBBEFFF4125B51D2E47CDBA5A96583BA9C0B312567BB2376E601842F6C81A19E625A9149C048EED76C6ED7039589DE03CEA74F0D118906D8FC6C240DEA7642DBF02ECDB25306B2B78CF848AE20165D0A6AB1C7CE11FEE367F1C1C3ABB44F3A6E0066C2D8992A16C4224003CC836476E2F48590F00D11D6E2021AF6380DFAD1A18204E546F3947CB11811A4A51E3B096D1867E19FE1407959CC434672EE6371089D37D7C0E48F6C8AA50765F7900637A7EFCB0EB5ACB161EFF80C71ABB335746BA297DBC24807EABDAD6C7F3747799A X-C1DE0DAB: 9604B64F49C60606AD91A466A1DEF99B296C473AB1E142185AC9E3593CE4B31AB1881A6453793CE9274300E5CE05BD4401A9E91200F654B0C7A0BC55FA0FE5FC0FF29EA5888F38274760C267656DBD4F4AC49324C72D5A23B1881A6453793CE9C32612AADDFBE06133F7A9E5587C79A693EDB24507CE13387DFF0A840B692CF8 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34ECB3E21D3CD9CB4F1AE4913169FB4120318AE4203924E30C002D3902B5F239B7AFC4D99A3A00AF461D7E09C32AA3244C9E84CE513BBA223285E0037D96239958FE8DA44ABE2443F7FACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojnfsvVopF4B/0SwZyCCiibQ== X-Mailru-Sender: F16D9CAFEEA6770E7B6EAD4ADB3BCAF0501802350ECE6A9BC15C14B5750D5C5CF233904666D6F39AF2400F607609286E924004A7DEC283833C7120B22964430C52B393F8C72A41A84198E0F3ECE9B5443453F38A29522196 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH luajit] Fix narrowing of unary minus. X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Sergey Kaplun via Tarantool-patches Reply-To: Sergey Kaplun Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Hi, Maxim! Thanks for the patch! Please consider my comments below. On 09.09.22, Maksim Kokryashkin wrote: > From: Mike Pall > Please, don't forget the common cherry-picked header: | (cherry picked from commit 1e6e8aaa20626ac94cf907c69b0452f76e9f5fa5) It is necessary to recognize the original commit without grepping of LuaJIT source code. > With DUALNUM mode disabled, unary minus narrowing can produce > inconsistent results on zeros, returning both -0 and 0 on the > same chunk of code. > > This patch fixes the inconsistency by adding checks for zeros. I suggest to reformulate these paragraphes like the following: | LuaJIT narrowing optimization during BC_UNM recording may ignore | information about sign of zero for integer types of IR. So far the | resulting value on a trace is not the same as for the interpreter. | However, in DUALNUM mode with using of TValues containing integers the | -0 value is represented in the same way as the 0 value and there is no | difference between them. | | This patch fixes the non-DUALNUM mode behaviour. When the zero value is | identified during recording it should be cast to number so IR_CONV is | emitted. Also, this patch adds assertion guard checking that value on | which operation of unary minus is performed isn't zero. Feel free to modify it as you wish. > Don't be shy to mention your impact in this patch (description and tests) like it is done in previous backported commits. > Part of tarantool/tarantool#7230 > Resolves tarantool/tarantool#6976 Minor: Igor prefers to place the resolved issue first (as far as this is the main issue to close). I prefer the opposite order. But we've agreed with Igor's way, so let use it for consistency. > --- > Issue: https://github.com/tarantool/tarantool/issues/6976 > Branch: https://github.com/tarantool/luajit/tree/fckxorg/gh-6976-narrowing-of-unary-minus > PR: https://github.com/tarantool/tarantool/pull/7662 > > src/lj_opt_narrow.c | 9 +++-- > .../gh-6976-narrowing-of-unary-minus.test.lua | 34 +++++++++++++++++++ > 2 files changed, 41 insertions(+), 2 deletions(-) > create mode 100644 test/tarantool-tests/gh-6976-narrowing-of-unary-minus.test.lua > > diff --git a/src/lj_opt_narrow.c b/src/lj_opt_narrow.c > index cd96ca4b..bb61f97b 100644 > --- a/src/lj_opt_narrow.c > +++ b/src/lj_opt_narrow.c > @@ -551,8 +551,13 @@ TRef lj_opt_narrow_unm(jit_State *J, TRef rc, TValue *vc) > { > rc = conv_str_tonum(J, rc, vc); > if (tref_isinteger(rc)) { > - if ((uint32_t)numberVint(vc) != 0x80000000u) > - return emitir(IRTGI(IR_SUBOV), lj_ir_kint(J, 0), rc); > + uint32_t k = (uint32_t)numberVint(vc); > + if ((LJ_DUALNUM || k != 0) && k != 0x80000000u) { > + TRef zero = lj_ir_kint(J, 0); > + if (!LJ_DUALNUM) > + emitir(IRTGI(IR_NE), rc, zero); > + return emitir(IRTGI(IR_SUBOV), zero, rc); > + } Side note: I wonder how much it is better to use this brancn in narrowing optimization for non-DUALNUMBER mode, as far as we even add additional assertion guard for zero check (IR_NE). Assume we have the following code: | local r = 0 | for i = 1, 1e9 do | local a = -i | r = r + a | end | print(r) Before the patch we've had the following IRs for LOOP part: | ------ LOOP ------------ | 0009 > int SUBOV +0 0006 | 0010 num CONV 0009 num.int | 0011 + num ADD 0010 0005 | 0012 + int ADD 0006 +1 | 0013 > int LE 0012 +1000000000 | 0014 int PHI 0006 0012 | 0015 num PHI 0005 0011 And after patch: | ------ LOOP ------------ | 0010 > int NE 0007 +0 | 0011 > int SUBOV +0 0007 | 0012 num CONV 0011 num.int | 0013 + num ADD 0012 0006 | 0014 + int ADD 0007 +1 | 0015 > int LE 0014 +1000000000 | 0016 int PHI 0007 0014 | 0017 num PHI 0006 0013 This leads to the following additional instructions: | test ebp, ebp | jz 0x561394990018 ->2 I run 10 loops like the aforementioned one with 21e8 iterations. The average time for 5 runs (of 10 loops) before the patch: 25.763s The average time for 5 runs (of 10 loops) after the patch: 25.885s And, actually, if we skip this optimization with the following patch (I just skip the branch with SUBOV emitting for non-DUALNUMBER mode): =================================================================== diff --git a/src/lj_opt_narrow.c b/src/lj_opt_narrow.c index bb61f97b..aaa5b3cd 100644 --- a/src/lj_opt_narrow.c +++ b/src/lj_opt_narrow.c @@ -552,7 +552,7 @@ TRef lj_opt_narrow_unm(jit_State *J, TRef rc, TValue *vc) rc = conv_str_tonum(J, rc, vc); if (tref_isinteger(rc)) { uint32_t k = (uint32_t)numberVint(vc); - if ((LJ_DUALNUM || k != 0) && k != 0x80000000u) { + if ((LJ_DUALNUM) && k != 0x80000000u) { TRef zero = lj_ir_kint(J, 0); if (!LJ_DUALNUM) emitir(IRTGI(IR_NE), rc, zero); =================================================================== The IRs for the loop part are the following: | ------ LOOP ------------ | 0010 num CONV 0007 num.int | 0011 + num SUB 0006 0010 | 0012 + int ADD 0007 +1 | 0013 > int LE 0012 +1000000000 | 0014 int PHI 0007 0012 | 0015 num PHI 0006 0011 As you can see we have even 2 IRs instead of 4 (we skip assertion guard and creation of negative number -- we just use substitution instead of addition). I suppose for non-DUALNUM mode this is the __common__ case -- operations in doubles, not integers (it may be different for cdata indexes -- I don't check it). As far as we have only 3 instructions for this loop payload: | xorps xmm6, xmm6 | cvtsi2sd xmm6, ebp | subsd xmm7, xmm6 instead of old 8 instructions: | test ebp, ebp | jz 0x555995dc0018 ->2 | xor ebx, ebx | sub ebx, ebp | jo 0x555995dc0018 ->2 | xorps xmm6, xmm6 | cvtsi2sd xmm6, ebx | addsd xmm7, xmm6 the results are slightly better: 25.368s I understand, that the benchmark is kinda synthetic, but still... P.S. For others payload the result can be the diffirent, but this code looks like the most common, as I said before. I mean constructions like the following: | local a = -b; | c = d + a Also, it is converted to number anyway in HREF: | 0021 > int NE 0018 +0 | 0022 > int SUBOV +0 0018 | 0023 int FLOAD 0004 tab.asize | 0024 > int EQ 0023 +0 | 0025 num CONV 0022 num.int | 0026 p32 HREF 0004 0025 with the following assembly: | mov edi, [0x40162540] | mov esi, [rsp+0x10] | | test ebp, ebp | jz 0x558aff9c0018 ->2 | xor r15d, r15d | sub r15d, ebp | jo 0x558aff9c0018 ->2 | | mov ebx, [rsi+0x18] | test ebx, ebx | jnz 0x558aff9c0018 ->2 | | xorps xmm7, xmm7 | cvtsi2sd xmm7, r15d | | movq rdx, xmm7 for the following compiled code: | local a = tab[-i] -- type(tab) == 'table' Without NE, SUBOV emitting IRs are the following: | 0020 num CONV 0017 num.int | 0021 num NEG 0020 0003 | 0022 int FLOAD 0005 tab.asize | 0023 > int EQ 0022 +0 | 0024 p32 HREF 0005 0021 with the following assembly: | mov edi, [0x40093540] | mov esi, [rsp+0x10] | | xorps xmm7, xmm7 | cvtsi2sd xmm7, ebp | movsd [rsp+0x8], xmm7 | movaps xmm6, xmm7 | xorps xmm6, [0x400936f0] | | mov ebx, [rsi+0x18] | test ebx, ebx | jnz 0x55c341180018 ->2 | | movq rdx, xmm6 Average time when run 3 chunks like the following: | local r = require"table.new"(1e8, 0) | for i = 1, 1e8 do | local a = r[-i] | r[i] = a | end | print(r) With NE and SUBOV: 7.769s Without: 7.670s Also, it is obviously more natural to use this cast in DUALNUMBER mode, where integer TValues are usual. P.P.S. The patch itself as a bugfix LGTM. P.P.P.S. Thoughts? :) > rc = emitir(IRTN(IR_CONV), rc, IRCONV_NUM_INT); > } > return emitir(IRTN(IR_NEG), rc, lj_ir_ksimd(J, LJ_KSIMD_NEG)); > diff --git a/test/tarantool-tests/gh-6976-narrowing-of-unary-minus.test.lua b/test/tarantool-tests/gh-6976-narrowing-of-unary-minus.test.lua > new file mode 100644 > index 00000000..caae8c34 > --- /dev/null > +++ b/test/tarantool-tests/gh-6976-narrowing-of-unary-minus.test.lua Side note: This 6976 issue has no description, but OK :) > @@ -0,0 +1,34 @@ > +require("utils").skipcond( Minor: Use single quotes instead of double quotes. > + jit.arch ~= "x86" and jit.arch ~= "x64", > + "DUALNUM mode must be disabled" > +) Not sure that we must skip other arches, see argumentation below. > + > +local tap = require("tap") > +local test = tap.test("gh-6976-narrowing-of-unary-minus") > +test:plan(1) > + > +jit.off() > +jit.flush() > +jit.opt.start('hotloop=1') > +jit.on() Don't see the reason to restart the JIT compiler at this point. Just set the option is enough. > + > +local has_error = false > + > +-- The first two iterations are needed to write trace in a way > +-- where `a` is treated like a non-zero integer value. After the > +-- first two iterations 'a' becomes a zero integer, but it is still > +-- processed by the same trace. Finally, roughly ten loop These ten iterations are needed for side trace compilation. You can set hotexit=1 to start compiling trace at the first trace exit. But this is not necessary, see argumentation below. > +-- iterations later, a new trace is formed, where `a` is treated > +-- like a number. > +for i=1,20 do > + local a = 1 > + if i > 2 then > + a = 0 > + end > + a = -a > + has_error = has_error or tostring(a):match("%-%d") == nil > +end Actually, I don't see the reason for such complicated test case. We can just use the following (with better naming, etc.): | local res = require'table.new'(3, 0) | | for _ = 1, 3 do | local a = 0 | a = -a | table.insert(res, tostring(a)) | end We use `table.new()` here to avoid trace exits due to table rehashing (such trace exits return us to the interpreter and you see the -0 value instead of expected 0, when test failure expected without the patch). In your test case you actually want to verify, that behaviour for JIT compiled code and interpreter is the same. So we can run the loop above with JIT on/off and verify those tables' content is the same. From this point of view, there is no different for the DUALNUMBER mode in this test -- the results should be equal for the JIT and the interpreter (but it will be nice manner to write the comment with expected content for both modes). > + > +test:ok(has_error == false, "inconsistent unary minus result") For now we check only _recording_ for 0. But we should also check the added assertion guard. I suggest the test like the following: | local res2 = require"table.new"(3,0) | | for i = 2, 0, -1 do | table.insert(res2, tostring(-i)) | end > +os.exit(test:check() and 0 or 1) > + > -- > 2.32.1 (Apple Git-133) > -- Best regards, Sergey Kaplun