Date: Mon, 8 Sep 2025 13:40:48 +0300
From: Sergey Bronnikov via Tarantool-patches
Reply-To: Sergey Bronnikov
To: Sergey Kaplun
Cc: tarantool-patches@dev.tarantool.org
References: <20250827091711.13681-1-skaplun@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH luajit] ARM64: Fix LDP/STP fusion (again).

LGTM

On 9/8/25 12:48, Sergey Kaplun wrote:
Hi, Sergey!
Thanks for the review!
Fixed your comment and force-pushed the branch.

On 08.09.25, Sergey Bronnikov wrote:
Hi, Sergey,

thanks for the patch! LGTM with two minor comments

Sergey

<snipped>

On 8/27/25 12:17, Sergey Kaplun wrote:
From: Mike Pall <mike>

Reported and analyzed by Zhongwei Yao. Fix by Peter Cawley.

(cherry picked from commit b8c6ccd50c61b7a2df5123ddc5a85ac7d089542b)

Assume we have stores/loads from the pointer with offset +488 and -16.
The lower bits of the offset are the same as for the offset (488 + 8).
This leads to the incorrect fusion of these instructions:
| str   x20, [x21, 488]
| stur  x20, [x21, -16]
to the following instruction:
| stp   x20, x20, [x21, 488]
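
A minimal sketch of the arithmetic behind this collision, for illustration only. The helper names `low7_of_shifted` and `low9` are hypothetical and do not appear in `src/lj_emit_arm64.h`. The sketch only shows that the offset -16 becomes indistinguishable from 488 + 8 = 496 once it is truncated to a narrow field, which is the kind of aliasing described above.

```c
#include <stdint.h>

/* Hypothetical helper: the low 7 bits of (ofs >> 2), the quantity the
** test comment in this patch refers to. The cast to uint32_t keeps the
** shift well-defined for negative offsets; the low 7 bits are the same
** as those produced by an arithmetic shift. */
uint32_t low7_of_shifted(int32_t ofs)
{
  return ((uint32_t)ofs >> 2) & 0x7fu;
}

/* Hypothetical helper: the offset truncated to a 9-bit field, the width
** of the signed unscaled immediate used by STUR/LDUR. */
uint32_t low9(int32_t ofs)
{
  return (uint32_t)ofs & 0x1ffu;
}
```

Under these assumptions, `low7_of_shifted(-16)` and `low7_of_shifted(496)` both give 0x7c and `low9(-16)` gives 496, while `low7_of_shifted(488)` gives 0x7a. A comparison made only on such truncated fields cannot tell -16 apart from the slot that really is adjacent to 488.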

This patch prevents this fusion by more accurate offset comparison.
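
One way to picture a more accurate comparison is to decide on the full signed offsets before any field is encoded. The sketch below is an assumption about the general shape of such a check, not the code from the upstream commit (the `lj_emit_arm64.h` hunk is snipped in this message). `can_pair` and its parameters are hypothetical names.

```c
#include <stdint.h>

/* Returns 1 if two accesses of (1 << sc) bytes from the same base
** register may be merged into one pair instruction, 0 otherwise.
** The decision uses the full signed offsets, not truncated fields. */
int can_pair(int64_t ofs_prev, int64_t ofs_cur, int sc)
{
  int64_t size = (int64_t)1 << sc;
  int64_t lo = ofs_prev < ofs_cur ? ofs_prev : ofs_cur;
  int64_t hi = ofs_prev < ofs_cur ? ofs_cur : ofs_prev;
  if (hi - lo != size) return 0;  /* The slots must be adjacent. */
  if (lo % size != 0) return 0;   /* Pair offsets are scaled by the size. */
  /* LDP/STP take a signed 7-bit immediate scaled by the access size. */
  return lo >= -64 * size && lo <= 63 * size;
}
```

With 8-byte accesses (`sc == 3`), `can_pair(488, -16, 3)` returns 0 because the offsets are 504 bytes apart, and `can_pair(488, 496, 3)` returns 1.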

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#11691
---

Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1075-arm64-incorrect-ldp-stp-fusion
Related issues:
* https://github.com/tarantool/tarantool/issues/11691
* https://github.com/LuaJIT/LuaJIT/issues/1075

  src/lj_emit_arm64.h                           |  17 ++-
  ...75-arm64-incorrect-ldp-stp-fusion.test.lua | 129 ++++++++++++++++++
  2 files changed, 142 insertions(+), 4 deletions(-)
  create mode 100644 test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua

diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
index 5c1bc372..9dd92c40 100644
--- a/src/lj_emit_arm64.h
+++ b/src/lj_emit_arm64.h
<snipped>

diff --git a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
new file mode 100644
index 00000000..c84c3b23
--- /dev/null
+++ b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
@@ -0,0 +1,129 @@
<snipped>

+
+jit.opt.start('hotloop=2')
>> Why 2? It deserves a comment, because we usually use hotloop=1.
> It's a copy-paste mistake from the aarch64 machine; fixed to
> `hotloop=1`, thanks:

Thanks!

===================================================================
diff --git a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
index c84c3b23..393a1aa7 100644
--- a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
+++ b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
@@ -40,7 +40,7 @@ local function init_buf()
   end
 end
 
-jit.opt.start('hotloop=2')
+jit.opt.start('hotloop=1')
 
 -- Assume we have stores/loads from the pointer with offset
 -- +488 and -16. The lower 7 bits of the offset (-16) >> 2 are
===================================================================

<snipped>

+
+-- Another reproducer that is based on the snapshot restoring.
+-- Its advantage is avoiding FFI usage.
+
+-- Snapshot slots are restored in the reversed order.
+-- The recording order is the following (from the bottom of the
+-- trace to the top):
+-- - 0th  (ofs == -16) -- `f64()` replaced the `tail64()` on the
+--                         stack,
+-- - 63rd (ofs == 488) -- 1,
+-- - 64th (ofs == 496) -- 2.
+-- At recording, the instructions for the 0th and 63rd slots are
+-- merged like the following:
+-- | str   x3, [x19, #496]
+-- | stp   x2, x1, [x19, #488]
+-- The first store is dominated by the stp, so the restored value
+-- is incorrect.
+
+-- Function with 63 slots on the stack.
+local function f63()
>> Minor: Hardcoding the number of slots in the function name looks odd.
> It is mentioned above why exactly this number of slots is required.
> It shouldn't be touched.

The question was about hard-coding a number in a function name, not about
using exactly this number of slots. OK, I won't insist, as I said in my comment.

>> The same goes for tail63. Bumping the number of slots will
>> require renaming two functions.
>>
>> Feel free to ignore.
> Ignoring.

+  -- 61 unused slots to avoid extra stores in between.
+  -- luacheck: no unused
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _
+  return 1, 2
+end
+
<snipped>

+test:done(true)

    
--------------pbQFTZ2Fy05Y0Tjp0a04U5nn--