From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id C20BE70353; Thu, 28 Oct 2021 15:37:57 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org C20BE70353 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1635424677; bh=JwaIgZCsfeV1Nh+X61YjTv7MMpOmoJfYYIvQIM++OtQ=; h=Date:In-Reply-To:To:References:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=ENzxx9zOjYwRUdr916u/S6IsaUG7jtdVokfroe0pwWkNSPmh2LheURArkqYNtAOEZ 7Qj5b8+brdKwXLomn4gpDhQb3QWvadstAp04i3wzrjwT1/8XlC94sL4S0/UJoqH0L5 GPHwo6ISTWpyBPIjDl09FYJ4wlRJPzOWx1zhs8sY= Received: from smtp49.i.mail.ru (smtp49.i.mail.ru [94.100.177.109]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 4AD6770353 for ; Thu, 28 Oct 2021 15:37:56 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 4AD6770353 Received: by smtp49.i.mail.ru with esmtpa (envelope-from ) id 1mg4fL-0003lZ-FD; Thu, 28 Oct 2021 15:37:55 +0300 Message-Id: Content-Type: multipart/alternative; boundary="Apple-Mail=_CF2045DA-3A3D-41FA-9551-018C4A9A1514" Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\)) Date: Thu, 28 Oct 2021 15:37:54 +0300 In-Reply-To: To: Sergey Kaplun References: <20211018185341.32155-1-skaplun@tarantool.org> X-Mailer: Apple Mail (2.3654.120.0.1.13) X-4EC0790: 10 X-7564579A: EEAE043A70213CC8 X-77F55803: 4F1203BC0FB41BD9E6B4260954843F6F3E786F33575CCF95BE72D31FADAB1DEE00894C459B0CD1B950737ADEA30541F05D5B7E0235A76D136FCFF714E76D4FD224DB72CEC37D2004 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7F9D05773942AAE9CEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637BD378188104BC8BE8638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D867C3B75097426CD695F95DD3273173EA117882F4460429724CE54428C33FAD305F5C1EE8F4F765FCAA867293B0326636D2E47CDBA5A96583BD4B6F7A4D31EC0BC014FD901B82EE079FA2833FD35BB23D27C277FBC8AE2E8BF1175FABE1C0F9B6A471835C12D1D977C4224003CC8364762BB6847A3DEAEFB0F43C7A68FF6260569E8FC8737B5C2249EC8D19AE6D49635B68655334FD4449CB9ECD01F8117BC8BEAAAE862A0553A39223F8577A6DFFEA7C565C1E6824D8037B43847C11F186F3C59DAA53EE0834AAEE X-C1DE0DAB: 0D63561A33F958A59C40F6AC7555B0B612F6AA3644210E1A713A8C9204E0BC5AD59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA75C29D03FC76C37677410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D3453741B480A6503C39F577A53B8BF18AFB2D2C15CB90F09571B3BA9EBE2D94FD937CA99D8F54C74861D7E09C32AA3244C3AFAE197B4C0443F95582A86AB24ABE9725D5B54B2FE4575FACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojdMRfVmNkPDgtsKYfX4le/w== X-Mailru-Sender: 3B9A0136629DC912F4AABCEFC589C81EB8F3E6DBCB4283C5FBE0E89A648FEB0A0901A01E0DC3C3E3AD07DD1419AC565FA614486B47F28B67C5E079CCF3B0523AED31B7EB2E253A9E112434F685709FCF0DA7A0AF5A3A8387 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH luajit] Fix FOLD rule for strength reduction of widening. X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: sergos via Tarantool-patches Reply-To: sergos Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" --Apple-Mail=_CF2045DA-3A3D-41FA-9551-018C4A9A1514 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 Hi!=20 The description looks good! Although, I can=E2=80=99t get the point how this > (negative offset from the stack > pointer may appear positive and result is undefined memory access).=20 can=E2=80=99t be applied to a constant? Hence, we=E2=80=99d need a = check? Although, I=E2=80=99m trying to get into Mike=E2=80=99s business again. = So it LGTM as a backport. Regards, Sergos > On 28 Oct 2021, at 14:36, Sergey Kaplun wrote: >=20 > Hi, Sergos! >=20 > Thanks for the review! >=20 > On 27.10.21, sergos wrote: >> Hi! >>=20 >> Thanks for the patch, see my comments below. >>=20 >> Regards, >> Sergos >>=20 >>> On 18 Oct 2021, at 21:53, Sergey Kaplun = wrote: >>>=20 >>> From: Mike Pall >>>=20 >>> Reported by Matthew Burk. >>>=20 >>> (cherry picked from commit 9f0caad0e43f97a4613850b3874b851cb1bc301d) >>>=20 >>> The simplify_conv_sext optimization is used for reduction of = widening. >>=20 >> Whether it is part of narrowing itself?=20 >=20 > Not exactly, it is fold optimization that can be used during = narrowing. >=20 > lj_opt_narrow_cindex -> lj_opt_fold (in emitir) -> fold_narrow_convert = -> > lj_opt_narrow_convert -> narrow_conv_emit -> lj_opt_fold -> > fold_simplify_conv_sext >=20 >>=20 >>> cdata indexing narrow optimization uses it for narrowing of a C = array >>=20 >> The sentence is started with lowercase, making one to track back the = start >> of the sentence. =E2=80=9CA narrow optimization for cdata=E2=80=A6=E2=80= =9D in conjunction of >> first sentence above should help. >>=20 >>=20 >>> index. The optimization eliminates sign extension for corresponding >> the >>=20 >>> integer value. However, this conversion cannot be omitted for non >>> constant values (for example loading stack slots) as far as their = sign >>> extension may change. The emitted machine code may be incorrect = without >>=20 >> I believe you meant =E2=80=98with=E2=80=99 the conversion.=20 >>=20 >>> aforementioned conversion (for example mov instruction instead = movsxd is >>> used on x86 architecture). As a result the value in a destination >>=20 >> The example is too much. Just =E2=80=9Cnegative offset from the stack = pointer >> may appear positive and result in undefined memory access=E2=80=9D >=20 > Fixed all your comments, see the updated commit message below. > Branch is force pushed. > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > Fix FOLD rule for strength reduction of widening. >=20 > Reported by Matthew Burk. >=20 > (cherry picked from commit 9f0caad0e43f97a4613850b3874b851cb1bc301d) >=20 > The simplify_conv_sext optimization is used for reduction of widening. > An indexing narrow optimization for cdata uses it for narrowing of a C > array index. The optimization eliminates sign extension for the > corresponding integer value. However, this conversion cannot be = omitted > for non constant values (for example loading stack slots) as far as > their sign extension may change. The emitted machine code may be > incorrect with aforementioned conversion (negative offset from the = stack > pointer may appear positive and result is undefined memory access). As = a > result the value in a destination register during trace execution is > invalid. >=20 > This patch allows this optimization only for constant integer values. >=20 > Sergey Kaplun: > * added the description and the test for the problem >=20 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>=20 >>> register during trace execution is invalid. >>>=20 >>> This patch allows this optimization only for constant integer = values. >>=20 >> Should it check if integer - even a constant one - is positive?=20 >=20 > No, why? In fold rule are declared both types: INT64 and UINT64. >=20 >>=20 >>>=20 >>> Sergey Kaplun: >>> * added the description and the test for the problem >>> --- >>>=20 >>> Tarantool branch: = https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket-fix-fold-s= implify-conv-sext >>> Branch: = https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fix-fold-simp= lify-conv-sext >>>=20 >>> src/lj_opt_fold.c | 2 +- >>> .../lj-fix-fold-simplify-conv-sext.test.lua | 35 = +++++++++++++++++++ >>> 2 files changed, 36 insertions(+), 1 deletion(-) >>> create mode 100644 = test/tarantool-tests/lj-fix-fold-simplify-conv-sext.test.lua >>>=20 >>> diff --git a/src/lj_opt_fold.c b/src/lj_opt_fold.c >>> index 3c508062..276dc040 100644 >>> --- a/src/lj_opt_fold.c >>> +++ b/src/lj_opt_fold.c >>> @@ -1227,7 +1227,7 @@ LJFOLDF(simplify_conv_sext) >>> if (ref =3D=3D J->scev.idx) { >>> IRRef lo =3D J->scev.dir ? J->scev.start : J->scev.stop; >>> lua_assert(irt_isint(J->scev.t)); >>> - if (lo && IR(lo)->i + ofs >=3D 0) { >>> + if (lo && IR(lo)->o =3D=3D IR_KINT && IR(lo)->i + ofs >=3D 0) { >>> ok_reduce: >>> #if LJ_TARGET_X64 >>> /* Eliminate widening. All 32 bit ops do an implicit = zero-extension. */ >>> diff --git = a/test/tarantool-tests/lj-fix-fold-simplify-conv-sext.test.lua = b/test/tarantool-tests/lj-fix-fold-simplify-conv-sext.test.lua >>> new file mode 100644 >>> index 00000000..bd3738c5 >>> --- /dev/null >>> +++ b/test/tarantool-tests/lj-fix-fold-simplify-conv-sext.test.lua >>> @@ -0,0 +1,35 @@ >>> +local tap =3D require('tap') >>> +local ffi =3D require('ffi') >>> + >>> +local test =3D tap.test('lj-fix-fold-simplify-conv-sext') >>> + >>> +local NSAMPLES =3D 4 >>> +local NTEST =3D NSAMPLES * 2 + 1 >=20 > Also fixed typo here: s/+/-/ >=20 >>> +test:plan(NTEST) >>> + >>> +local samples =3D ffi.new('int [?]', NSAMPLES) >>> + >>> +-- Prepare data. >>> +for i =3D 0, NSAMPLES - 1 do samples[i] =3D i end >>> + >>> +local expected =3D {3, 2, 1, 0, 3, 2, 1} >>> + >>> +local START =3D 3 >>> +local STOP =3D -START >>> + >>> +local results =3D {} >>> +jit.opt.start('hotloop=3D1') >>> +for i =3D START, STOP, -1 do >>> + -- While recording cdata indexing the fold CONV SEXT >>> + -- optimization eliminate sign extension for the corresponding >>> + -- non constant value (i.e. stack slot). As a result the read >>> + -- out of bounds was occurring. >>> + results[#results + 1] =3D samples[i % NSAMPLES] >>> +end >>> + >>> +for i =3D 1, NTEST do >>> + test:ok(results[i] =3D=3D expected[i], 'correct cdata indexing') >>> +end >>> + >>> +os.exit(test:check() and 0 or 1) >>> + >>> --=20 >>> 2.31.0 >>>=20 >>=20 >=20 > --=20 > Best regards, > Sergey Kaplun --Apple-Mail=_CF2045DA-3A3D-41FA-9551-018C4A9A1514 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8 Hi! 

The description looks good!

Although, I can=E2=80=99t get the point = how this

(negative offset from the stack
> = pointer may appear positive and result is undefined memory = access). 

can=E2=80=99t be = applied to a constant? Hence, we=E2=80=99d need a check?

Although, I=E2=80=99m trying to get into Mike=E2=80=99= s business again. So it LGTM as a backport.

Regards,
Sergos


On 28 Oct 2021, at 14:36, Sergey Kaplun = <skaplun@tarantool.org> wrote:

Hi, Sergos!

Thanks for the review!

On 27.10.21, sergos wrote:
Hi!

Thanks= for the patch, see my comments below.

Regards,
Sergos

On 18 Oct 2021, at = 21:53, Sergey Kaplun <skaplun@tarantool.org> wrote:

From: Mike Pall <mike>

Reported by Matthew Burk.

(cherry = picked from commit 9f0caad0e43f97a4613850b3874b851cb1bc301d)

The simplify_conv_sext optimization is used = for reduction of widening.

Whether it is part of narrowing itself? 

Not exactly, it is fold optimization that can be used during = narrowing.

lj_opt_narrow_cindex -> lj_opt_fold (in emitir) -> = fold_narrow_convert ->
lj_opt_narrow_convert -> narrow_conv_emit -> = lj_opt_fold ->
fold_simplify_conv_sext


cdata indexing narrow optimization uses it for = narrowing of a C array

The = sentence is started with lowercase, making one to track back the = start
of the sentence. =E2=80=9CA narrow optimization for = cdata=E2=80=A6=E2=80=9D in conjunction of
first sentence = above should help.


index. The optimization eliminates sign = extension for corresponding
     the
integer = value. However, this conversion cannot be omitted for non
constant values (for example loading stack slots) as far as = their sign
extension may change. The emitted machine code = may be incorrect without

I = believe you meant =E2=80=98with=E2=80=99 the conversion. 

aforementioned = conversion (for example mov instruction instead movsxd is
used on x86 architecture). As a result the value in a = destination

The example is too = much. Just =E2=80=9Cnegative offset from the stack pointer
may appear positive and result in undefined memory = access=E2=80=9D

Fixed all your comments, see the updated commit message = below.
Branch is = force pushed.
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
Fix FOLD rule = for strength reduction of widening.

Reported by Matthew Burk.

(cherry picked from commit = 9f0caad0e43f97a4613850b3874b851cb1bc301d)

The simplify_conv_sext optimization is used for reduction of = widening.
An indexing = narrow optimization for cdata uses it for narrowing of a C
array index. = The optimization eliminates sign extension for the
corresponding = integer value. However, this conversion cannot be omitted
for non = constant values (for example loading stack slots) as far as
their sign = extension may change. The emitted machine code may be
incorrect = with aforementioned conversion (negative offset from the stack
pointer may = appear positive and result is undefined memory access). As a
result the = value in a destination register during trace execution is
invalid.

This patch allows this optimization only for constant integer = values.

Sergey = Kaplun:
* added the = description and the test for the problem

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

register during trace = execution is invalid.

This patch allows = this optimization only for constant integer values.

Should it check if integer - even = a constant one - is positive? 

No, why? In fold rule are declared both types: INT64 and = UINT64.



Sergey = Kaplun:
* added the description and the test for the = problem
---

Tarantool branch: = https://github.com/tarantool/tarantool/tree/skaplun/gh-noticket= -fix-fold-simplify-conv-sext
Branch: https://github.com/tarantool/luajit/tree/skaplun/gh-noticket-fi= x-fold-simplify-conv-sext

src/lj_opt_fold.c =             &n= bsp;           &nbs= p;   |  2 +-
.../lj-fix-fold-simplify-conv-sext.test.lua   | 35 = +++++++++++++++++++
2 files changed, 36 insertions(+), 1 = deletion(-)
create mode 100644 = test/tarantool-tests/lj-fix-fold-simplify-conv-sext.test.lua

diff --git a/src/lj_opt_fold.c = b/src/lj_opt_fold.c
index 3c508062..276dc040 100644
--- a/src/lj_opt_fold.c
+++ = b/src/lj_opt_fold.c
@@ -1227,7 +1227,7 @@ = LJFOLDF(simplify_conv_sext)
 if (ref =3D=3D = J->scev.idx) {
   IRRef lo =3D = J->scev.dir ? J->scev.start : J->scev.stop;
   lua_assert(irt_isint(J->scev.t));
-    if (lo && IR(lo)->i + ofs = >=3D 0) {
+    if (lo && = IR(lo)->o =3D=3D IR_KINT && IR(lo)->i + ofs >=3D 0) = {
   ok_reduce:
#if = LJ_TARGET_X64
     /* Eliminate = widening. All 32 bit ops do an implicit zero-extension. */
diff --git = a/test/tarantool-tests/lj-fix-fold-simplify-conv-sext.test.lua = b/test/tarantool-tests/lj-fix-fold-simplify-conv-sext.test.lua
new file mode 100644
index = 00000000..bd3738c5
--- /dev/null
+++ = b/test/tarantool-tests/lj-fix-fold-simplify-conv-sext.test.lua
@@ -0,0 +1,35 @@
+local tap =3D = require('tap')
+local ffi =3D require('ffi')
+
+local test =3D = tap.test('lj-fix-fold-simplify-conv-sext')
+
+local NSAMPLES =3D 4
+local NTEST =3D NSAMPLES = * 2 + 1

Also fixed typo here: s/+/-/

+test:plan(NTEST)
+
+local = samples =3D ffi.new('int [?]', NSAMPLES)
+
+--= Prepare data.
+for i =3D 0, NSAMPLES - 1 do samples[i] =3D = i end
+
+local expected =3D {3, 2, 1, 0, 3, = 2, 1}
+
+local START =3D 3
+local STOP =3D -START
+
+local = results =3D {}
+jit.opt.start('hotloop=3D1')
+for i =3D START, STOP, -1 do
+  -- While = recording cdata indexing the fold CONV SEXT
+  -- = optimization eliminate sign extension for the corresponding
+  -- non constant value (i.e. stack slot). As a result = the read
+  -- out of bounds was occurring.
+  results[#results + 1] =3D samples[i % NSAMPLES]
+end
+
+for i =3D 1, NTEST do
+  test:ok(results[i] =3D=3D expected[i], 'correct cdata = indexing')
+end
+
+os.exit(test:check() and 0 or 1)
+
-- 
2.31.0



-- Best = regards,
Sergey = Kaplun

= --Apple-Mail=_CF2045DA-3A3D-41FA-9551-018C4A9A1514--