From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 07F2943ED44; Fri, 27 Jun 2025 16:20:55 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 07F2943ED44 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1751030455; bh=DepP62XxKPQWULU4sDBLb9A4q+hbgZaawp5h7XOkGlU=; h=Date:To:Cc:References:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=Za1n5SljDKUQY3UGLf3h83ZSOD1SV8myeqNfSMU58G6zRgypII9TLss0Ti/oUWrQM /K8jHKcyi602cGLzINSmoY0GF56PCrjvwzKbvJs5R/pZPvjtnYtlqakRkqObazsJou AbIC1uC6LfM8nvWyp2SQoB5XOVvQdjPtd7dYX2fQ= Received: from send197.i.mail.ru (send197.i.mail.ru [95.163.59.36]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 710CB43ED44 for ; Fri, 27 Jun 2025 16:20:54 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 710CB43ED44 Received: by exim-smtp-68f89ddb46-frp5g with esmtpa (envelope-from ) id 1uV90f-00000000OgH-1jbs; Fri, 27 Jun 2025 16:20:53 +0300 Content-Type: multipart/alternative; boundary="------------QXfKlDETmf9V63xhtQdOKXlz" Message-ID: <63170032-dc7a-47e4-ad84-9627a02070e0@tarantool.org> Date: Fri, 27 Jun 2025 16:20:53 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Content-Language: en-US To: Sergey Kaplun Cc: tarantool-patches@dev.tarantool.org References: <20250626151224.27925-1-skaplun@tarantool.org> In-Reply-To: <20250626151224.27925-1-skaplun@tarantool.org> X-Mailru-Src: smtp X-4EC0790: 10 X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD9D919194CF4FC66048688A9712638A5F39E38945EEBFA69D600894C459B0CD1B945F33ADEFC24AEF305B18F8CB88C7B49FAFA878F1B09FE104344415C3F23760A83F5067A33008722 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE782A779A89F7D69B2C2099A533E45F2D0395957E7521B51C2CFCAF695D4D8E9FCEA1F7E6F0F101C6759CC434672EE6371C2A783ECEC0211ADC4224003CC836476D5A39DEEDB180909611E41BBFE2FEB2B424DB17C60FEAEA7D833C6231A376FF56FE1CFDE2FDF41D7104924E4FEEE618B9FA2833FD35BB23D9E625A9149C048EE33AC447995A7AD182CC0D3CB04F14752D2E47CDBA5A96583BD4B6F7A4D31EC0BC014FD901B82EE079FA2833FD35BB23D27C277FBC8AE2E8B3A703B70628EAD7BA471835C12D1D977C4224003CC836476EB9C4185024447017B076A6E789B0E975F5C1EE8F4F765FC8DBB596EC94336063AA81AA40904B5D9CF19DD082D7633A0C84D3B47A649675F3AA81AA40904B5D98AA50765F79006375F9A1B9E53A3C84ED81D268191BDAD3D3666184CF4C3C14F3FC91FA280E0CE3D1A620F70A64A45A98AA50765F79006372E808ACE2090B5E1725E5C173C3A84C3C5EA940A35A165FF2DBA43225CD8A89FB26E97DCB74E6252A91E23F1B6B78B78B5C8C57E37DE458BEDA766A37F9254B7 X-C1DE0DAB: 0D63561A33F958A511B1085CC9F812755002B1117B3ED696FFC23A00AF3BE49CB91D2EB2DEE3878C823CB91A9FED034534781492E4B8EEAD14747542773C033FBDAD6C7F3747799A X-C8649E89: 1C3962B70DF3F0ADBF74143AD284FC7177DD89D51EBB7742424CF958EAFF5D571004E42C50DC4CA955A7F0CF078B5EC49A30900B95165D34A503FBFE8BE8FC49683E98D55E8861D354C2F77F0455A5FC59DA5E79DFEAEC8F896A99D78115033A1D7E09C32AA3244C57913CFD28D31C1077DD89D51EBB7742B7BFC3F03634F3CEEA455F16B58544A2E30DDF7C44BCB90DA5AE236DF995FB59978A700BF655EAEEED6A17656DB59BCAD427812AF56FC65B X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu53w8ahmwBjZKM/YPHZyZHvz5uv+WouB9+ObcCpyrx6l7KImUglyhkEat/+ysWwi0gdhEs0JGjl6ggRWTy1haxBpVdbIX1nthFXMZebaIdHP2ghjoIc/363UZI6Kf1ptIMVV45pMz5J5bNWeIXINQm/Ns= X-Mailru-Sender: 520A125C2F17F0B1E52FEF5D219D614045F33ADEFC24AEF305B18F8CB88C7B49FC7F893B576EE0560152A3D17938EB451EB5A0BCEC6A560B3DDE9B364B0DF289BE2DA36745F2EEB5CEBA01FB949A1F1EEAB4BC95F72C04283CDA0F3B3F5B9367 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH luajit] ARM64: Fix LDP/STP fusing for unaligned accesses. X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Sergey Bronnikov via Tarantool-patches Reply-To: Sergey Bronnikov Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" This is a multi-part message in MIME format. --------------QXfKlDETmf9V63xhtQdOKXlz Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Hi, Sergey, thanks for the patch! LGTM with a minor two comments below. Sergey On 6/26/25 18:12, Sergey Kaplun wrote: > From: Mike Pall > > Thanks to Peter Cawley. > > (cherry picked from commit 0fa2f1cbcf023ad0549f1428809e506fa2c78552) > > The arm64 emitting of load/store operation works incorrectly in the > case when at least one offset of load/store to be fused into ldp/stp is > misaligned. In this case this misaligning is ignored, and instructions > are fused, which leads to loading/storing from/to at least one incorrect > address. > > For example, the following instructions: > | stur w0, [x1, #17] > | stur w0, [x1, #21] > > May be fused to the following: > | stp w0, w0, [x1, #16] > > This patch prevents fusion in this case by testing the alignment with > the help of bitwise ROR by the alignment value. In case of misaligned > offset, the value overflows the 7-bit length mask in the check. > > The negative immediate (7-bit width including sign bit) is limited by > the corresponding addition of `64 << sc` (it is harmless in the case of > positive values). > > Sergey Kaplun: > * added the description and the test for the problem > > Part of tarantool/tarantool#11278 > --- > > Related issues: > *https://github.com/LuaJIT/LuaJIT/issues/1056 > *https://github.com/tarantool/tarantool/issues/11278 > Branch:https://github.com/tarantool/luajit/tree/skaplun/lj-1056-arm64-ldp-sdp-misaligned-fusing > > src/lj_emit_arm64.h | 2 +- > ...6-arm64-ldp-sdp-misaligned-fusing.test.lua | 98 +++++++++++++++++++ > 2 files changed, 99 insertions(+), 1 deletion(-) > create mode 100644 test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua > > diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h > index 30cd3505..5c1bc372 100644 > --- a/src/lj_emit_arm64.h > +++ b/src/lj_emit_arm64.h > @@ -142,7 +142,7 @@ static void emit_lso(ASMState *as, A64Ins ai, Reg rd, Reg rn, int64_t ofs) > } else { > goto nopair; > } > - if (ofsm >= (int)((unsigned int)-64< + if (lj_ror((unsigned int)ofsm + (64u< *as->mcp = aip | A64F_N(rn) | (((ofsm >> sc) & 0x7f) << 15) | > (ai ^ ((ai == A64I_LDRx || ai == A64I_STRx) ? 0x50000000 : 0x90000000)); > return; > diff --git a/test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua b/test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua > new file mode 100644 > index 00000000..5d03097e > --- /dev/null > +++ b/test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua > @@ -0,0 +1,98 @@ > +local tap = require('tap') > +local ffi = require('ffi') > + > +-- This test demonstrates LuaJIT's incorrect emitting of LDP/STP > +-- instructions from LDUR/STUR instructions with misaligned offset > +-- on arm64. > +-- See alsohttps://github.com/LuaJIT/LuaJIT/issue/1056. s/issue/issues/ > +local test = tap.test('lj-1056-arm64-ldp-sdp-misaligned-fusing'):skipcond({ > + ['Test requires JIT enabled'] = not jit.status(), > +}) > + > +test:plan(6) > + > +-- Amount of iterations to compile and run the invariant part of > +-- the trace. > +local N_ITERATIONS = 4 > + > +local EXPECTED = 42 > + > +-- 9 bytes to make unaligned 4-byte access like buf + 5. > +local BUFLEN = 9 > +local buf = ffi.new('unsigned char [' .. BUFLEN .. ']', 0) > + > +local function clear_buf() > + ffi.fill(buf, ffi.sizeof(buf), 0) > +end > + > +-- Initialize the buffer with simple values. > +local function init_buf() > + for i = 0, BUFLEN - 1 do > + buf[i] = i > + end > +end > + > +local function test_buf_content(expected_bytes, msg) > + local got_bytes = {} > + assert(#expected_bytes == BUFLEN, 'mismatched size of buffer and table') > + for i = 0, BUFLEN - 1 do > + got_bytes[i + 1] = buf[i] > + end > +test:is_deeply(got_bytes, expected_bytes, msg) > +end > + > +jit.opt.start('hotloop=1') > + > +-- Test stores. > + > +for _ = 1, N_ITERATIONS do > + local ptr = ffi.cast('unsigned char *', buf) > + -- These 2 accesses become ptr + 0 and ptr + 4 on the trace > + -- before the patch. > + ffi.cast('int32_t *', ptr + 1)[0] = EXPECTED > + ffi.cast('int32_t *', ptr + 5)[0] = EXPECTED > +end > +test_buf_content({0, EXPECTED, 0, 0, 0, EXPECTED, 0, 0, 0}, > + 'pair of misaligned stores') > + > +clear_buf() > + > +for _ = 1, N_ITERATIONS do > + local ptr = ffi.cast('unsigned char *', buf) > + -- The next access becomes ptr + 4 on the trace before the > + -- patch. > + ffi.cast('int32_t *', ptr + 5)[0] = EXPECTED > + ffi.cast('int32_t *', ptr)[0] = EXPECTED > +end > +test_buf_content({EXPECTED, 0, 0, 0, 0, EXPECTED, 0, 0, 0}, > + 'aligned / misaligned stores') > + > +-- Test loads. > + > +local resl, resr = 0, 0 > + > +init_buf() > + > +for _ = 1, N_ITERATIONS do > + local ptr = ffi.cast('unsigned char *', buf) > + -- These 2 accesses become ptr + 0 and ptr + 4 on the trace > + -- before the patch. > + resl = ffi.cast('int32_t *', ptr + 1)[0] > + resr = ffi.cast('int32_t *', ptr + 5)[0] > +end > + > +test:is(resl, 0x4030201, 'pair of misaligned loads, left') > +test:is(resr, 0x8070605, 'pair of misaligned loads, right') What does mean these magic numbers? Please add a comment or use a variable with self-explained name. Here and below. > + > +for _ = 1, N_ITERATIONS do > + local ptr = ffi.cast('unsigned char *', buf) > + -- The next access becomes ptr + 4 on the trace before the > + -- patch. > + resr = ffi.cast('int32_t *', ptr + 5)[0] > + resl = ffi.cast('int32_t *', ptr)[0] > +end > + > +test:is(resl, 0x3020100, 'aligned / misaligned load, aligned') > +test:is(resr, 0x8070605, 'aligned / misaligned load, misaligned') > + > +test:done(true) --------------QXfKlDETmf9V63xhtQdOKXlz Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 7bit

Hi, Sergey,

thanks for the patch! LGTM with a minor two comments below.

Sergey

On 6/26/25 18:12, Sergey Kaplun wrote:
From: Mike Pall <mike>

Thanks to Peter Cawley.

(cherry picked from commit 0fa2f1cbcf023ad0549f1428809e506fa2c78552)

The arm64 emitting of load/store operation works incorrectly in the
case when at least one offset of load/store to be fused into ldp/stp is
misaligned. In this case this misaligning is ignored, and instructions
are fused, which leads to loading/storing from/to at least one incorrect
address.

For example, the following instructions:
| stur  w0, [x1, #17]
| stur  w0, [x1, #21]

May be fused to the following:
| stp   w0, w0, [x1, #16]

This patch prevents fusion in this case by testing the alignment with
the help of bitwise ROR by the alignment value. In case of misaligned
offset, the value overflows the 7-bit length mask in the check.

The negative immediate (7-bit width including sign bit) is limited by
the corresponding addition of `64 << sc` (it is harmless in the case of
positive values).

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#11278
---

Related issues:
* https://github.com/LuaJIT/LuaJIT/issues/1056
* https://github.com/tarantool/tarantool/issues/11278
Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1056-arm64-ldp-sdp-misaligned-fusing

 src/lj_emit_arm64.h                           |  2 +-
 ...6-arm64-ldp-sdp-misaligned-fusing.test.lua | 98 +++++++++++++++++++
 2 files changed, 99 insertions(+), 1 deletion(-)
 create mode 100644 test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua

diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
index 30cd3505..5c1bc372 100644
--- a/src/lj_emit_arm64.h
+++ b/src/lj_emit_arm64.h
@@ -142,7 +142,7 @@ static void emit_lso(ASMState *as, A64Ins ai, Reg rd, Reg rn, int64_t ofs)
     } else {
       goto nopair;
     }
-    if (ofsm >= (int)((unsigned int)-64<<sc) && ofsm <= (63<<sc)) {
+    if (lj_ror((unsigned int)ofsm + (64u<<sc), sc) <= 127u) {
       *as->mcp = aip | A64F_N(rn) | (((ofsm >> sc) & 0x7f) << 15) |
 	(ai ^ ((ai == A64I_LDRx || ai == A64I_STRx) ? 0x50000000 : 0x90000000));
       return;
diff --git a/test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua b/test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua
new file mode 100644
index 00000000..5d03097e
--- /dev/null
+++ b/test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua
@@ -0,0 +1,98 @@
+local tap = require('tap')
+local ffi = require('ffi')
+
+-- This test demonstrates LuaJIT's incorrect emitting of LDP/STP
+-- instructions from LDUR/STUR instructions with misaligned offset
+-- on arm64.
+-- See also https://github.com/LuaJIT/LuaJIT/issue/1056.
s/issue/issues/
+local test = tap.test('lj-1056-arm64-ldp-sdp-misaligned-fusing'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(6)
+
+-- Amount of iterations to compile and run the invariant part of
+-- the trace.
+local N_ITERATIONS = 4
+
+local EXPECTED = 42
+
+-- 9 bytes to make unaligned 4-byte access like buf + 5.
+local BUFLEN = 9
+local buf = ffi.new('unsigned char [' .. BUFLEN .. ']', 0)
+
+local function clear_buf()
+  ffi.fill(buf, ffi.sizeof(buf), 0)
+end
+
+-- Initialize the buffer with simple values.
+local function init_buf()
+  for i = 0, BUFLEN - 1 do
+    buf[i] = i
+  end
+end
+
+local function test_buf_content(expected_bytes, msg)
+  local got_bytes = {}
+  assert(#expected_bytes == BUFLEN, 'mismatched size of buffer and table')
+  for i = 0, BUFLEN - 1 do
+    got_bytes[i + 1] = buf[i]
+  end
+  test:is_deeply(got_bytes, expected_bytes, msg)
+end
+
+jit.opt.start('hotloop=1')
+
+-- Test stores.
+
+for _ = 1, N_ITERATIONS do
+  local ptr = ffi.cast('unsigned char *', buf)
+  -- These 2 accesses become ptr + 0 and ptr + 4 on the trace
+  -- before the patch.
+  ffi.cast('int32_t *', ptr + 1)[0] = EXPECTED
+  ffi.cast('int32_t *', ptr + 5)[0] = EXPECTED
+end
+test_buf_content({0, EXPECTED, 0, 0, 0, EXPECTED, 0, 0, 0},
+                 'pair of misaligned stores')
+
+clear_buf()
+
+for _ = 1, N_ITERATIONS do
+  local ptr = ffi.cast('unsigned char *', buf)
+  -- The next access becomes ptr + 4 on the trace before the
+  -- patch.
+  ffi.cast('int32_t *', ptr + 5)[0] = EXPECTED
+  ffi.cast('int32_t *', ptr)[0] = EXPECTED
+end
+test_buf_content({EXPECTED, 0, 0, 0, 0, EXPECTED, 0, 0, 0},
+                 'aligned / misaligned stores')
+
+-- Test loads.
+
+local resl, resr = 0, 0
+
+init_buf()
+
+for _ = 1, N_ITERATIONS do
+  local ptr = ffi.cast('unsigned char *', buf)
+  -- These 2 accesses become ptr + 0 and ptr + 4 on the trace
+  -- before the patch.
+  resl = ffi.cast('int32_t *', ptr + 1)[0]
+  resr = ffi.cast('int32_t *', ptr + 5)[0]
+end
+
+test:is(resl, 0x4030201, 'pair of misaligned loads, left')
+test:is(resr, 0x8070605, 'pair of misaligned loads, right')

What does mean these magic numbers? Please add a comment or

use a variable with self-explained name. Here and below.

+
+for _ = 1, N_ITERATIONS do
+  local ptr = ffi.cast('unsigned char *', buf)
+  -- The next access becomes ptr + 4 on the trace before the
+  -- patch.
+  resr = ffi.cast('int32_t *', ptr + 5)[0]
+  resl = ffi.cast('int32_t *', ptr)[0]
+end
+
+test:is(resl, 0x3020100, 'aligned / misaligned load, aligned')
+test:is(resr, 0x8070605, 'aligned / misaligned load, misaligned')
+
+test:done(true)
--------------QXfKlDETmf9V63xhtQdOKXlz--