From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tarantool-patches-bounces@dev.tarantool.org>
Received: from [87.239.111.99] (localhost [127.0.0.1])
	by dev.tarantool.org (Postfix) with ESMTP id 39DD114BEB3C;
	Mon,  8 Sep 2025 12:26:38 +0300 (MSK)
DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 39DD114BEB3C
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev;
	t=1757323598; bh=UZopt/PBNuL7G8AV+bDSNu7oBq2JVeWuu49K8LApBSg=;
	h=Date:To:Cc:References:In-Reply-To:Subject:List-Id:
	 List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe:
	 From:Reply-To:From;
	b=mTEXg39FWdMYgMMm2UCjpOOkseJwfHZWcPQC+2RXwd65ndya4HskjLcjofeZvd0Ps
	 qgLaXCrgxVcS1pjL6KtipI+qRXoWc3A6ginCPdDxtLFchoYK2DTaNZRi/UbntISJnX
	 dOhEkN23GQF58EbPUcPONd/ryEPrQ7hNVEs6vtAo=
Received: from send60.i.mail.ru (send60.i.mail.ru [89.221.237.155])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id 1D45E14BEB39
 for <tarantool-patches@dev.tarantool.org>;
 Mon,  8 Sep 2025 12:26:36 +0300 (MSK)
DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 1D45E14BEB39
Received: by exim-smtp-c584fb9f-swbmr with esmtpa (envelope-from
 <sergeyb@tarantool.org>)
 id 1uvY8w-00000000A7c-3xnb; Mon, 08 Sep 2025 12:26:35 +0300
Content-Type: multipart/alternative;
 boundary="------------HDQ9Urc93a05OqE96EYWmiKF"
Message-ID: <fa085e96-8056-444f-9134-5287db011005@tarantool.org>
Date: Mon, 8 Sep 2025 12:26:34 +0300
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Content-Language: en-US
To: Sergey Kaplun <skaplun@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
References: <20250827091711.13681-1-skaplun@tarantool.org>
 <a5f88d0d-9554-4412-a1e1-e4b6fef84b72@tarantool.org>
In-Reply-To: <a5f88d0d-9554-4412-a1e1-e4b6fef84b72@tarantool.org>
X-Mailru-Src: smtp
X-4EC0790: 10
X-7564579A: B8F34718100C35BD
X-77F55803: 4F1203BC0FB41BD9C019336662AA214148A42EA06D9F577460AA5A9CD72BAEDB182A05F5380850406FFD02B53701E1F93DE06ABAFEAF6705B42FF669472A446F7F9209188B4A307C02FAF9720996C769
X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7544B1CCE26E01C74EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637AC83A81C8FD4AD23D82A6BABE6F325AC2E85FA5F3EDFCBAA7353EFBB553375661343631359B0F9D98B418AE3F32EADEF1B37F0C1F2F1B265B82EB9BFAE038CFA389733CBF5DBD5E913377AFFFEAFD269176DF2183F8FC7C0A3E989B1926288338941B15DA834481FCF19DD082D7633A0EF3E4896CB9E6436389733CBF5DBD5E9D5E8D9A59859A8B6E5E764EB5D94DBD4CC7F00164DA146DA6F5DAA56C3B73B237318B6A418E8EAB8D32BA5DBAC0009BE9E8FC8737B5C2249B0E9FD5D4288160E3AA81AA40904B5D9CF19DD082D7633A0C84D3B47A649675F3AA81AA40904B5D98AA50765F7900637F52694881A387976EC76A7562686271ED91E3A1F190DE8FD2E808ACE2090B5E14AD6D5ED66289B5259CC434672EE63711DD303D21008E298D5E8D9A59859A8B6B372FE9A2E580EFC725E5C173C3A84C3C8F21CEC4765490D35872C767BF85DA2F004C90652538430E4A6367B16DE6309
X-C1DE0DAB: 0D63561A33F958A52E2DFCCCB7A155BB5002B1117B3ED69661F49FDB7AD391B8361FAC1196A180DE823CB91A9FED034534781492E4B8EEAD2F8D89FC5850081EBDAD6C7F3747799A
X-C8649E89: 1C3962B70DF3F0ADBF74143AD284FC7177DD89D51EBB7742424CF958EAFF5D571004E42C50DC4CA955A7F0CF078B5EC49A30900B95165D340CB2836B82369449E986666F837A62AF052B9C89AB29B2CDF092F8DF99CDBCE00F1C32091BABD4FE1D7E09C32AA3244C92E8AE1D4D71176177DD89D51EBB7742B7CA81A03A17AA2AEA455F16B58544A2E30DDF7C44BCB90DA5AE236DF995FB59978A700BF655EAEEED6A17656DB59BCAD427812AF56FC65B
X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu53w8ahmwBjZKM/YPHZyZHvz5uv+WouB9+ObcCpyrx6l7KImUglyhkEat/+ysWwi0gdhEs0JGjl6ggRWTy1haxBpVdbIX1nthFXMZebaIdHP2ghjoIc/363UZI6Kf1ptIMVdVMtzNxwZu5F0bH1wu/I9E=
X-Mailru-Sender: 811C44EDE0507D1F797560C68D020EBD5EFE59E75E032F95DF237C39DF130474344EF998EC98B43BD45B5A94579F6493645D15D82EE4B272BD6E4642A116CA93524AA66B5ACBE6721EF430B9A63E2A504198E0F3ECE9B5443453F38A29522196
X-Mras: Ok
Subject: Re: [Tarantool-patches] [PATCH luajit] ARM64: Fix LDP/STP fusion
 (again).
X-BeenThere: tarantool-patches@dev.tarantool.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: Tarantool development patches <tarantool-patches.dev.tarantool.org>
List-Unsubscribe: <https://lists.tarantool.org/mailman/options/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=unsubscribe>
List-Archive: <https://lists.tarantool.org/pipermail/tarantool-patches/>
List-Post: <mailto:tarantool-patches@dev.tarantool.org>
List-Help: <mailto:tarantool-patches-request@dev.tarantool.org?subject=help>
List-Subscribe: <https://lists.tarantool.org/mailman/listinfo/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=subscribe>
From: Sergey Bronnikov via Tarantool-patches
 <tarantool-patches@dev.tarantool.org>
Reply-To: Sergey Bronnikov <sergeyb@tarantool.org>
Errors-To: tarantool-patches-bounces@dev.tarantool.org
Sender: "Tarantool-patches" <tarantool-patches-bounces@dev.tarantool.org>

This is a multi-part message in MIME format.
--------------HDQ9Urc93a05OqE96EYWmiKF
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Hi, Sergey,

Thanks for the patch! LGTM with two minor comments below.

Sergey

On 9/8/25 11:54, Sergey Bronnikov wrote:
>
> Hi, Sergey,
>
> The test added with initial fix 
> (test/tarantool-tests/lj-1057-arm64-stp-fusing-across-tbar.test.lua)
>
> segfaults with proposed patch.
>
Please disregard this: it seems there was a misconfiguration or a
"dirty" build on the machine.
>
> CMake configuration: cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug 
> -DLUA_USE_ASSERT=ON -DLUA_USE_APICHECK=ON
>
> Arch: ARM64.
>
> Sergey
>
> On 8/27/25 12:17, Sergey Kaplun wrote:
>> From: Mike Pall <mike>
>>
>> Reported and analyzed by Zhongwei Yao. Fix by Peter Cawley.
>>
>> (cherry picked from commit b8c6ccd50c61b7a2df5123ddc5a85ac7d089542b)
>>
>> Assume we have stores/loads from the pointer with offset +488 and -16.
>> The lower bits of the offset are the same as for the offset (488 + 8).
>> This leads to the incorrect fusion of these instructions:
>> | str   x20, [x21, 488]
>> | stur  x20, [x21, -16]
>> to the following instruction:
>> | stp   x20, x20, [x21, 488]
>>
>> This patch prevents this fusion by more accurate offset comparison.
>>
>> Sergey Kaplun:
>> * added the description and the test for the problem
>>
>> Part of tarantool/tarantool#11691
>> ---
>>
>> Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1075-arm64-incorrect-ldp-stp-fusion
>> Related issues:
>> * https://github.com/tarantool/tarantool/issues/11691
>> * https://github.com/LuaJIT/LuaJIT/issues/1075
>>
>>   src/lj_emit_arm64.h                           |  17 ++-
>>   ...75-arm64-incorrect-ldp-stp-fusion.test.lua | 129 ++++++++++++++++++
>>   2 files changed, 142 insertions(+), 4 deletions(-)
>>   create mode 100644 test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
>>
>> diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
>> index 5c1bc372..9dd92c40 100644
>> --- a/src/lj_emit_arm64.h
>> +++ b/src/lj_emit_arm64.h
>> @@ -121,6 +121,17 @@ static int emit_checkofs(A64Ins ai, int64_t ofs)
>>     }
>>   }
>>   
>> +static LJ_AINLINE uint32_t emit_lso_pair_candidate(A64Ins ai, int ofs, int sc)
>> +{
>> +  if (ofs >= 0) {
>> +    return ai | A64F_U12(ofs>>sc);  /* Subsequent lj_ror checks ofs. */
>> +  } else if (ofs >= -256) {
>> +    return (ai^A64I_LS_U) | A64F_S9(ofs & 0x1ff);
>> +  } else {
>> +    return A64F_D(31);  /* Will mismatch prev. */
>> +  }
>> +}
>> +
>>   static void emit_lso(ASMState *as, A64Ins ai, Reg rd, Reg rn, int64_t ofs)
>>   {
>>     int ot = emit_checkofs(ai, ofs), sc = (ai >> 30) & 3;
>> @@ -132,11 +143,9 @@ static void emit_lso(ASMState *as, A64Ins ai, Reg rd, Reg rn, int64_t ofs)
>>       uint32_t prev = *as->mcp & ~A64F_D(31);
>>       int ofsm = ofs - (1<<sc), ofsp = ofs + (1<<sc);
>>       A64Ins aip;
>> -    if (prev == (ai | A64F_N(rn) | A64F_U12(ofsm>>sc)) ||
>> -	prev == ((ai^A64I_LS_U) | A64F_N(rn) | A64F_S9(ofsm&0x1ff))) {
>> +    if (prev == emit_lso_pair_candidate(ai | A64F_N(rn), ofsm, sc)) {
>>         aip = (A64F_A(rd) | A64F_D(*as->mcp & 31));
>> -    } else if (prev == (ai | A64F_N(rn) | A64F_U12(ofsp>>sc)) ||
>> -	       prev == ((ai^A64I_LS_U) | A64F_N(rn) | A64F_S9(ofsp&0x1ff))) {
>> +    } else if (prev == emit_lso_pair_candidate(ai | A64F_N(rn), ofsp, sc)) {
>>         aip = (A64F_D(rd) | A64F_A(*as->mcp & 31));
>>         ofsm = ofs;
>>       } else {
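
For reference, the aliasing can be reproduced outside of LuaJIT with a
small model. This is only a sketch: the F_*/I_* macros and the function
names are hypothetical stand-ins for the A64F_*/A64I_* definitions, and
the field positions (Rn at bit 5, imm12 at bit 10, imm9 at bit 12,
bit 24 selecting the unsigned-offset form) are my assumption about the
encoding rather than something copied from the sources.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the LuaJIT encoding macros (assumed layout). */
#define F_N(r)    ((uint32_t)(r) << 5)   /* Base register field. */
#define F_U12(x)  ((uint32_t)(x) << 10)  /* Scaled unsigned offset. */
#define F_S9(x)   ((uint32_t)(x) << 12)  /* Unscaled signed offset. */
#define I_LS_U    0x01000000u            /* Unsigned-offset form bit. */
#define I_STRX    0xf9000000u            /* 64-bit store, scaled form. */

/* Pre-patch check: the neighbour offset is masked to 9 bits without
** looking at its sign or range. */
static int old_match(uint32_t prev, uint32_t ai, int ofs, int sc)
{
  return prev == (ai | F_U12(ofs >> sc)) ||
         prev == ((ai ^ I_LS_U) | F_S9(ofs & 0x1ff));
}

/* Post-patch candidate, mirroring emit_lso_pair_candidate(). */
static uint32_t new_candidate(uint32_t ai, int ofs, int sc)
{
  if (ofs >= 0)
    return ai | F_U12(ofs >> sc);
  else if (ofs >= -256)
    return (ai ^ I_LS_U) | F_S9(ofs & 0x1ff);
  else
    return 31;  /* Rd bits are masked out of prev, so this never matches. */
}

/* Model of the reported case: prev is `stur x20, [x21, #-16]` and the
** instruction being emitted is `str x20, [x21, #488]`, so ofsp == 496. */
void model_case(int *old_hit, int *new_hit)
{
  const int sc = 3, ofsp = 488 + (1 << sc);
  uint32_t ai = I_STRX | F_N(21);
  uint32_t prev = ((I_STRX ^ I_LS_U) | F_N(21) | F_S9(-16 & 0x1ff) | 20u)
                  & ~31u;
  *old_hit = old_match(prev, ai, ofsp, sc);
  *new_hit = (prev == new_candidate(ai, ofsp, sc));
}
```

With these inputs the pre-patch check matches and the patched candidate
does not, because 496 & 0x1ff and -16 & 0x1ff are both 0x1f0.
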
>> diff --git a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
>> new file mode 100644
>> index 00000000..c84c3b23
>> --- /dev/null
>> +++ b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
>> @@ -0,0 +1,129 @@
>> +local tap = require('tap')
>> +local ffi = require('ffi')
>> +
>> +-- This test demonstrates LuaJIT's incorrect emitting of LDP/STP
>> +-- instruction fused from LDR/STR with negative offset and
>> +-- positive offset with the same lower bits on arm64.
>> +-- See also https://github.com/LuaJIT/LuaJIT/pull/1075.
>> +local test = tap.test('lj-1075-arm64-incorrect-ldp-stp-fusion'):skipcond({
>> +  ['Test requires JIT enabled'] = not jit.status(),
>> +})
>> +
>> +test:plan(6)
>> +
>> +-- Amount of iterations to compile and run the invariant part of
>> +-- the trace.
>> +local N_ITERATIONS = 4
>> +
>> +local EXPECTED = 42
>> +
>> +-- 4 slots of redzone for int64_t load/store.
>> +local REDZONE = 4
>> +local MASK_IMM7 = 0x7f
>> +local BUFLEN = (MASK_IMM7 + REDZONE) * 4
>> +local buf = ffi.new('unsigned char [' .. BUFLEN .. ']', 0)
>> +
>> +local function clear_buf()
>> +  ffi.fill(buf, ffi.sizeof(buf), 0)
>> +end
>> +
>> +-- Initialize the buffer with simple values.
>> +local function init_buf()
>> +  -- Limit to fill the buffer. 0 in the top part helps
>> +  -- to detect the issue.
>> +  local LIMIT = BUFLEN - 12
>> +  for i = 0, LIMIT - 1  do
>> +    buf[i] = i
>> +  end
>> +  for i = LIMIT, BUFLEN - 1  do
>> +    buf[i] = 0
>> +  end
>> +end
>> +
>> +jit.opt.start('hotloop=2')

Why 2? This deserves a comment, because we usually use hotloop=1.


>> +
>> +-- Assume we have stores/loads from the pointer with offset
>> +-- +488 and -16. The lower 7 bits of the offset (-16) >> 2 are
>> +-- 1111100. These bits are the same as for the offset (488 + 8).
>> +-- Thus, before the patch, these two instructions:
>> +-- | str   x20, [x21, #488]
>> +-- | stur  x20, [x21, #-16]
>> +-- are incorrectly fused to the:
>> +-- | stp   x20, x20, [x21, #488]
>> +
>> +-- Test stores.
>> +
>> +local start = ffi.cast('unsigned char *', buf)
>> +-- Use constants to allow optimization to take place.
>> +local base_ptr = start + 16
>> +for _ = 1, N_ITERATIONS do
>> +  -- Save the result only for the last iteration.
>> +  clear_buf()
>> +  -- These 2 accesses become `base_ptr + 488` and `base_ptr + 496`
>> +  -- on the trace before the patch.
>> +  ffi.cast('uint64_t *', base_ptr + 488)[0] = EXPECTED
>> +  ffi.cast('uint64_t *', base_ptr - 16)[0] = EXPECTED
>> +end
>> +
>> +test:is(buf[488 + 16], EXPECTED, 'correct store top value')
>> +test:is(buf[0], EXPECTED, 'correct store bottom value')
>> +
>> +-- Test loads.
>> +
>> +init_buf()
>> +
>> +local top, bottom
>> +for _ = 1, N_ITERATIONS do
>> +  -- These 2 accesses become `base_ptr + 488` and `base_ptr + 496`
>> +  -- on the trace before the patch.
>> +  top = ffi.cast('uint64_t *', base_ptr + 488)[0]
>> +  bottom = ffi.cast('uint64_t *', base_ptr - 16)[0]
>> +end
>> +
>> +test:is(top, 0xfffefdfcfbfaf9f8ULL, 'correct load top value')
>> +test:is(bottom, 0x706050403020100ULL, 'correct load bottom value')
>> +
>> +-- Another reproducer that is based on the snapshot restoring.
>> +-- Its advantage is avoiding FFI usage.
>> +
>> +-- Snapshot slots are restored in the reversed order.
>> +-- The recording order is the following (from the bottom of the
>> +-- trace to the top):
>> +-- - 0th  (ofs == -16) -- `f64()` replaced the `tail64()` on the
>> +--                         stack,
>> +-- - 63rd (ofs == 488) -- 1,
>> +-- - 64th (ofs == 496) -- 2.
>> +-- At recording, the instructions for the 0th and 63rd slots are
>> +-- merged like the following:
>> +-- | str   x3, [x19, #496]
>> +-- | stp   x2, x1, [x19, #488]
>> +-- The first store is dominated by the stp, so the restored value
>> +-- is incorrect.
>> +
>> +-- Function with 63 slots on the stack.
>> +local function f63()

Minor: hardcoding the number of slots in the function name looks odd;
the same goes for tail63. Bumping the number of slots will require
renaming both functions.

Feel free to ignore.

>> +  -- 61 unused slots to avoid extra stores in between.
>> +  -- luacheck: no unused
>> +  local _, _, _, _, _, _, _, _, _, _
>> +  local _, _, _, _, _, _, _, _, _, _
>> +  local _, _, _, _, _, _, _, _, _, _
>> +  local _, _, _, _, _, _, _, _, _, _
>> +  local _, _, _, _, _, _, _, _, _, _
>> +  local _, _, _, _, _, _, _, _, _, _
>> +  local _
>> +  return 1, 2
>> +end
>> +
>> +local function tail63()
>> +  return f63()
>> +end
>> +
>> +-- Record the trace.
>> +tail63()
>> +tail63()
>> +-- Run the trace.
>> +local one, two = tail63()
>> +test:is(one, 1, 'correct 1st value on stack')
>> +test:is(two, 2, 'correct 2nd value on stack')
>> +
>> +test:done(true)
--------------HDQ9Urc93a05OqE96EYWmiKF
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit

<!DOCTYPE html>
<html data-lt-installed="true">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body style="padding-bottom: 1px;">
    <p>Hi, Sergey,</p>
    <p>Thanks for the patch! LGTM with two minor comments below.</p>
    <p>Sergey<br>
    </p>
    <div class="moz-cite-prefix">On 9/8/25 11:54, Sergey Bronnikov
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:a5f88d0d-9554-4412-a1e1-e4b6fef84b72@tarantool.org">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <p>Hi, Sergey,</p>
      <p>The test added with initial fix
        (test/tarantool-tests/lj-1057-arm64-stp-fusing-across-tbar.test.lua)</p>
      <p>segfaults with proposed patch.</p>
    </blockquote>
    Please disregard this: it seems there was a misconfiguration or a
    "dirty" build on the machine.<br>
    <blockquote type="cite"
      cite="mid:a5f88d0d-9554-4412-a1e1-e4b6fef84b72@tarantool.org">
      <p>CMake configuration: cmake -S . -B build
        -DCMAKE_BUILD_TYPE=Debug -DLUA_USE_ASSERT=ON
        -DLUA_USE_APICHECK=ON</p>
      <p>Arch: ARM64.</p>
      <p>Sergey<br>
      </p>
      <div class="moz-cite-prefix">On 8/27/25 12:17, Sergey Kaplun
        wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:20250827091711.13681-1-skaplun@tarantool.org">
        <pre wrap="" class="moz-quote-pre">From: Mike Pall &lt;mike&gt;

Reported and analyzed by Zhongwei Yao. Fix by Peter Cawley.

(cherry picked from commit b8c6ccd50c61b7a2df5123ddc5a85ac7d089542b)

Assume we have stores/loads from the pointer with offset +488 and -16.
The lower bits of the offset are the same as for the offset (488 + 8).
This leads to the incorrect fusion of these instructions:
| str   x20, [x21, 488]
| stur  x20, [x21, -16]
to the following instruction:
| stp   x20, x20, [x21, 488]

This patch prevents this fusion by more accurate offset comparison.

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#11691
---

Branch: <a class="moz-txt-link-freetext"
href="https://github.com/tarantool/luajit/tree/skaplun/lj-1075-arm64-incorrect-ldp-stp-fusion"
        moz-do-not-send="true">https://github.com/tarantool/luajit/tree/skaplun/lj-1075-arm64-incorrect-ldp-stp-fusion</a>
Related issues:
* <a class="moz-txt-link-freetext"
        href="https://github.com/tarantool/tarantool/issues/11691"
        moz-do-not-send="true">https://github.com/tarantool/tarantool/issues/11691</a>
* <a class="moz-txt-link-freetext"
        href="https://github.com/LuaJIT/LuaJIT/issues/1075"
        moz-do-not-send="true">https://github.com/LuaJIT/LuaJIT/issues/1075</a>

 src/lj_emit_arm64.h                           |  17 ++-
 ...75-arm64-incorrect-ldp-stp-fusion.test.lua | 129 ++++++++++++++++++
 2 files changed, 142 insertions(+), 4 deletions(-)
 create mode 100644 test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua

diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
index 5c1bc372..9dd92c40 100644
--- a/src/lj_emit_arm64.h
+++ b/src/lj_emit_arm64.h
@@ -121,6 +121,17 @@ static int emit_checkofs(A64Ins ai, int64_t ofs)
   }
 }
 
+static LJ_AINLINE uint32_t emit_lso_pair_candidate(A64Ins ai, int ofs, int sc)
+{
+  if (ofs &gt;= 0) {
+    return ai | A64F_U12(ofs&gt;&gt;sc);  /* Subsequent lj_ror checks ofs. */
+  } else if (ofs &gt;= -256) {
+    return (ai^A64I_LS_U) | A64F_S9(ofs &amp; 0x1ff);
+  } else {
+    return A64F_D(31);  /* Will mismatch prev. */
+  }
+}
+
 static void emit_lso(ASMState *as, A64Ins ai, Reg rd, Reg rn, int64_t ofs)
 {
   int ot = emit_checkofs(ai, ofs), sc = (ai &gt;&gt; 30) &amp; 3;
@@ -132,11 +143,9 @@ static void emit_lso(ASMState *as, A64Ins ai, Reg rd, Reg rn, int64_t ofs)
     uint32_t prev = *as-&gt;mcp &amp; ~A64F_D(31);
     int ofsm = ofs - (1&lt;&lt;sc), ofsp = ofs + (1&lt;&lt;sc);
     A64Ins aip;
-    if (prev == (ai | A64F_N(rn) | A64F_U12(ofsm&gt;&gt;sc)) ||
-	prev == ((ai^A64I_LS_U) | A64F_N(rn) | A64F_S9(ofsm&amp;0x1ff))) {
+    if (prev == emit_lso_pair_candidate(ai | A64F_N(rn), ofsm, sc)) {
       aip = (A64F_A(rd) | A64F_D(*as-&gt;mcp &amp; 31));
-    } else if (prev == (ai | A64F_N(rn) | A64F_U12(ofsp&gt;&gt;sc)) ||
-	       prev == ((ai^A64I_LS_U) | A64F_N(rn) | A64F_S9(ofsp&amp;0x1ff))) {
+    } else if (prev == emit_lso_pair_candidate(ai | A64F_N(rn), ofsp, sc)) {
       aip = (A64F_D(rd) | A64F_A(*as-&gt;mcp &amp; 31));
       ofsm = ofs;
     } else {
diff --git a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
new file mode 100644
index 00000000..c84c3b23
--- /dev/null
+++ b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
@@ -0,0 +1,129 @@
+local tap = require('tap')
+local ffi = require('ffi')
+
+-- This test demonstrates LuaJIT's incorrect emitting of LDP/STP
+-- instruction fused from LDR/STR with negative offset and
+-- positive offset with the same lower bits on arm64.
+-- See also <a class="moz-txt-link-freetext"
        href="https://github.com/LuaJIT/LuaJIT/pull/1075"
        moz-do-not-send="true">https://github.com/LuaJIT/LuaJIT/pull/1075</a>.
+local test = tap.test('lj-1075-arm64-incorrect-ldp-stp-fusion'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(6)
+
+-- Amount of iterations to compile and run the invariant part of
+-- the trace.
+local N_ITERATIONS = 4
+
+local EXPECTED = 42
+
+-- 4 slots of redzone for int64_t load/store.
+local REDZONE = 4
+local MASK_IMM7 = 0x7f
+local BUFLEN = (MASK_IMM7 + REDZONE) * 4
+local buf = ffi.new('unsigned char [' .. BUFLEN .. ']', 0)
+
+local function clear_buf()
+  ffi.fill(buf, ffi.sizeof(buf), 0)
+end
+
+-- Initialize the buffer with simple values.
+local function init_buf()
+  -- Limit to fill the buffer. 0 in the top part helps
+  -- to detect the issue.
+  local LIMIT = BUFLEN - 12
+  for i = 0, LIMIT - 1  do
+    buf[i] = i
+  end
+  for i = LIMIT, BUFLEN - 1  do
+    buf[i] = 0
+  end
+end
+
+jit.opt.start('hotloop=2')</pre>
      </blockquote>
    </blockquote>
    <p>Why 2? This deserves a comment, because we usually use hotloop=1.</p>
    <p><br>
    </p>
    <blockquote type="cite"
      cite="mid:a5f88d0d-9554-4412-a1e1-e4b6fef84b72@tarantool.org">
      <blockquote type="cite"
        cite="mid:20250827091711.13681-1-skaplun@tarantool.org">
        <pre wrap="" class="moz-quote-pre">
+
+-- Assume we have stores/loads from the pointer with offset
+-- +488 and -16. The lower 7 bits of the offset (-16) &gt;&gt; 2 are
+-- 1111100. These bits are the same as for the offset (488 + 8).
+-- Thus, before the patch, these two instructions:
+-- | str   x20, [x21, #488]
+-- | stur  x20, [x21, #-16]
+-- are incorrectly fused to the:
+-- | stp   x20, x20, [x21, #488]
+
+-- Test stores.
+
+local start = ffi.cast('unsigned char *', buf)
+-- Use constants to allow optimization to take place.
+local base_ptr = start + 16
+for _ = 1, N_ITERATIONS do
+  -- Save the result only for the last iteration.
+  clear_buf()
+  -- These 2 accesses become `base_ptr + 488` and `base_ptr + 496`
+  -- on the trace before the patch.
+  ffi.cast('uint64_t *', base_ptr + 488)[0] = EXPECTED
+  ffi.cast('uint64_t *', base_ptr - 16)[0] = EXPECTED
+end
+
+test:is(buf[488 + 16], EXPECTED, 'correct store top value')
+test:is(buf[0], EXPECTED, 'correct store bottom value')
+
+-- Test loads.
+
+init_buf()
+
+local top, bottom
+for _ = 1, N_ITERATIONS do
+  -- These 2 accesses become `base_ptr + 488` and `base_ptr + 496`
+  -- on the trace before the patch.
+  top = ffi.cast('uint64_t *', base_ptr + 488)[0]
+  bottom = ffi.cast('uint64_t *', base_ptr - 16)[0]
+end
+
+test:is(top, 0xfffefdfcfbfaf9f8ULL, 'correct load top value')
+test:is(bottom, 0x706050403020100ULL, 'correct load bottom value')
+
+-- Another reproducer that is based on the snapshot restoring.
+-- Its advantage is avoiding FFI usage.
+
+-- Snapshot slots are restored in the reversed order.
+-- The recording order is the following (from the bottom of the
+-- trace to the top):
+-- - 0th  (ofs == -16) -- `f64()` replaced the `tail64()` on the
+--                         stack,
+-- - 63rd (ofs == 488) -- 1,
+-- - 64th (ofs == 496) -- 2.
+-- At recording, the instructions for the 0th and 63rd slots are
+-- merged like the following:
+-- | str   x3, [x19, #496]
+-- | stp   x2, x1, [x19, #488]
+-- The first store is dominated by the stp, so the restored value
+-- is incorrect.
+
+-- Function with 63 slots on the stack.
+local function f63()</pre>
      </blockquote>
    </blockquote>
    <p>Minor: hardcoding the number of slots in the function name looks
      odd; the same goes for tail63. Bumping the number of slots will
      require renaming both functions.</p>
    <p>Feel free to ignore.</p>
    <blockquote type="cite"
      cite="mid:a5f88d0d-9554-4412-a1e1-e4b6fef84b72@tarantool.org">
      <blockquote type="cite"
        cite="mid:20250827091711.13681-1-skaplun@tarantool.org">
        <pre wrap="" class="moz-quote-pre">
+  -- 61 unused slots to avoid extra stores in between.
+  -- luacheck: no unused
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _, _, _, _, _, _, _, _, _, _
+  local _
+  return 1, 2
+end
+
+local function tail63()
+  return f63()
+end
+
+-- Record the trace.
+tail63()
+tail63()
+-- Run the trace.
+local one, two = tail63()
+test:is(one, 1, 'correct 1st value on stack')
+test:is(two, 2, 'correct 2nd value on stack')
+
+test:done(true)
</pre>
      </blockquote>
    </blockquote>
  </body>
</html>

--------------HDQ9Urc93a05OqE96EYWmiKF--