From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id EE77642080C; Wed, 25 Jun 2025 18:35:36 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org EE77642080C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1750865737; bh=a+vtA7Q4Hk7TCXUvnx3CnLjmfXjLtLhtOs9ER7kDov0=; h=Date:To:Cc:References:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=dRQWjk6Z7gER0NkwAQuinFDqUQ9xPRO5L0dCyPpkb6prnxRylDM4DD9y9C5Lyc8Ar lPgLvPWj70bpB7dLZLZeMnoCjXYLcfq905jfzYjnh85pTXLE5ezcPHcqnxme3wnnDI REgckT1x50NSRO4C9aQ62E19xjC1CundVOAHY1G8= Received: from send240.i.mail.ru (send240.i.mail.ru [95.163.59.79]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 41BE942080C for ; Wed, 25 Jun 2025 18:35:35 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 41BE942080C Received: by exim-smtp-68f89ddb46-mbccl with esmtpa (envelope-from ) id 1uUS9u-000000004gB-1JOF; Wed, 25 Jun 2025 18:35:34 +0300 Content-Type: multipart/alternative; boundary="------------xKn3DQQosH5BtkAI9cL7zQbg" Message-ID: Date: Wed, 25 Jun 2025 18:35:34 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Content-Language: en-US To: Sergey Kaplun Cc: tarantool-patches@dev.tarantool.org References: <20250611160142.19383-1-skaplun@tarantool.org> In-Reply-To: <20250611160142.19383-1-skaplun@tarantool.org> X-Mailru-Src: smtp X-4EC0790: 10 X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD9D919194CF4FC66048688A9712638A5F39E38945EEBFA69D600894C459B0CD1B9C281C0DEA4259934D4FF92D56319F197AABB38A774DF055BD7D2C03B1B21DDB0224C9A2F61B97637 X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7495A032B936E882FEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637FE9EFE935CD7C6AE8638F802B75D45FF914D58D5BE9E6BC1A93B80C6DEB9DEE97C6FB206A91F05B26026C025FE5B37FF2E070BE324C7D3C48ABBD783D1BE7FA9F6B57BC7E64490618DEB871D839B73339E8FC8737B5C224936DA1BED736F9328CC7F00164DA146DAFE8445B8C89999729449624AB7ADAF37F6B57BC7E64490611E7FA7ABCAF51C92176DF2183F8FC7C0C26CFBAC0749D213D2E47CDBA5A96583BA9C0B312567BB2376E601842F6C81A19E625A9149C048EE9647ADFADE5905B19735652A29929C6C4AD6D5ED66289B523666184CF4C3C14F6136E347CC761E07725E5C173C3A84C30AA277257C6A5E3D76E601842F6C81A1F004C906525384303E02D724532EE2C3F43C7A68FF6260569E8FC8737B5C2249EC8D19AE6D49635B68655334FD4449CB9ECD01F8117BC8BEAAAE862A0553A39223F8577A6DFFEA7CB59C7783CC88FA9643847C11F186F3C59DAA53EE0834AAEE X-C1DE0DAB: 0D63561A33F958A592851113AA70A23E5002B1117B3ED6966D68A9006BABB95DED71F038FC046993823CB91A9FED034534781492E4B8EEAD09F854029C6BD0DABDAD6C7F3747799A X-C8649E89: 1C3962B70DF3F0ADE00A9FD3E00BEEDF3FED46C3ACD6F73ED3581295AF09D3DF87807E0823442EA2ED31085941D9CD0AF7F820E7B07EA4CFA0CA213D107CF0594B568C1FC7333BB2A286E7A30B53D02A20946FBA8A0BAB26486D6DC23143AA0EDA027526E6D8A3D2C5FE96CAEF457261F3A395D432DDF310730F61FD13ACF833111DC66A97D0BFE2913E6812662D5F2AB9AF64DB4688768036DF5FE9C0001AF333F2C28C22F508233FCF178C6DD14203 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu53w8ahmwBjZKM/YPHZyZHvz5uv+WouB9+ObcCpyrx6l7KImUglyhkEat/+ysWwi0gdhEs0JGjl6ggRWTy1haxBpVdbIX1nthFXMZebaIdHP2ghjoIc/363UZI6Kf1ptIMVVWXk7QTiVzHQS3vWiBdSCA= X-Mailru-Sender: 520A125C2F17F0B1E52FEF5D219D6140C281C0DEA4259934D4FF92D56319F19720592513087CE6E10152A3D17938EB451EB5A0BCEC6A560B3DDE9B364B0DF289BE2DA36745F2EEB5CEBA01FB949A1F1EEAB4BC95F72C04283CDA0F3B3F5B9367 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH luajit] ARM64: Fix LDP code generation. X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Sergey Bronnikov via Tarantool-patches Reply-To: Sergey Bronnikov Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" This is a multi-part message in MIME format. --------------xKn3DQQosH5BtkAI9cL7zQbg Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Hi, Sergey, thanks for the patch! LGTM Sergey On 6/11/25 19:01, Sergey Kaplun wrote: > From: Mike Pall > > Thanks to Zhongwei Yao. > > (cherry picked from commit 9493acc1a28f15b0ac4453e716f33436186c7acd) > > When fusing two LDR (STR) instructions to the single LDP (STP) > instruction, the arm64 emitter shifts the offset value to encode the > immediate. In the case when the offset is negative, the resulting field > value exceeds the 7-bit length of the immediate, see [1]. This results > in the invalid instruction decoding. > > This patch fixes this by masking the value with the 7-bit-width mask > `0x7f`. > > Sergey Kaplun: > * added the description and the test for the problem > > [1]:https://developer.arm.com/documentation/ddi0602/2025-03/Base-Instructions/LDP--Load-pair-of-registers- > > Part of tarantool/tarantool#11278 > --- > > Related issues: > *https://github.com/LuaJIT/LuaJIT/pull/1028 > *https://github.com/tarantool/tarantool/issues/11278 > Branch:https://github.com/tarantool/luajit/tree/skaplun/lj-1028-ldr-fusion-to-ldp-negative-offset > > src/lj_emit_arm64.h | 2 +- > ...ldr-fusion-to-ldp-negative-offset.test.lua | 45 +++++++++++++++++++ > 2 files changed, 46 insertions(+), 1 deletion(-) > create mode 100644 test/tarantool-tests/lj-1028-ldr-fusion-to-ldp-negative-offset.test.lua > > diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h > index e1a9d3e4..30cd3505 100644 > --- a/src/lj_emit_arm64.h > +++ b/src/lj_emit_arm64.h > @@ -143,7 +143,7 @@ static void emit_lso(ASMState *as, A64Ins ai, Reg rd, Reg rn, int64_t ofs) > goto nopair; > } > if (ofsm >= (int)((unsigned int)-64< - *as->mcp = aip | A64F_N(rn) | ((ofsm >> sc) << 15) | > + *as->mcp = aip | A64F_N(rn) | (((ofsm >> sc) & 0x7f) << 15) | > (ai ^ ((ai == A64I_LDRx || ai == A64I_STRx) ? 0x50000000 : 0x90000000)); > return; > } > diff --git a/test/tarantool-tests/lj-1028-ldr-fusion-to-ldp-negative-offset.test.lua b/test/tarantool-tests/lj-1028-ldr-fusion-to-ldp-negative-offset.test.lua > new file mode 100644 > index 00000000..1ba28449 > --- /dev/null > +++ b/test/tarantool-tests/lj-1028-ldr-fusion-to-ldp-negative-offset.test.lua > @@ -0,0 +1,45 @@ > +local tap = require('tap') > +local ffi = require('ffi') > + > +-- This test demonstrates LuaJIT's incorrect emitting of LDP > +-- instruction with negative offset fused from LDR on arm64. > +-- See alsohttps://github.com/LuaJIT/LuaJIT/pull/1028. > +local test = tap.test('lj-1028-ldr-fusion-to-ldp-negative-offset'):skipcond({ > + ['Test requires JIT enabled'] = not jit.status(), > +}) > + > +test:plan(1) > + > +-- Amount of iterations to compile and start the trace. > +local N_ITERATIONS = 4 > + > +ffi.cdef[[ > + typedef struct data { > + int32_t m1; > + int32_t m2; > + } data; > +]] > + > +local data_arr = ffi.new('data[' .. N_ITERATIONS .. ']') > + > +local const_data_ptr = ffi.typeof('const data *') > +local data = ffi.cast(const_data_ptr, data_arr) > + > +local results = {} > + > +jit.opt.start('hotloop=1') > + > +for i = 1, N_ITERATIONS do > + -- Pair loading from the negative offset generates an invalid > + -- instruction on AArch64 before this patch. > + local field = data[i - 1] > + local m1 = field.m1 > + local m2 = field.m2 > + > + -- Use loaded values to avoid DCE. > + results[i] = m1 + m2 > +end > + > +test:samevalues(results, 'no invalid instruction') > + > +test:done(true) --------------xKn3DQQosH5BtkAI9cL7zQbg Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 7bit

Hi, Sergey,

thanks for the patch! LGTM

Sergey

On 6/11/25 19:01, Sergey Kaplun wrote:
From: Mike Pall <mike>

Thanks to Zhongwei Yao.

(cherry picked from commit 9493acc1a28f15b0ac4453e716f33436186c7acd)

When fusing two LDR (STR) instructions to the single LDP (STP)
instruction, the arm64 emitter shifts the offset value to encode the
immediate. In the case when the offset is negative, the resulting field
value exceeds the 7-bit length of the immediate, see [1]. This results
in the invalid instruction decoding.

This patch fixes this by masking the value with the 7-bit-width mask
`0x7f`.

Sergey Kaplun:
* added the description and the test for the problem

[1]: https://developer.arm.com/documentation/ddi0602/2025-03/Base-Instructions/LDP--Load-pair-of-registers-

Part of tarantool/tarantool#11278
---

Related issues:
* https://github.com/LuaJIT/LuaJIT/pull/1028
* https://github.com/tarantool/tarantool/issues/11278
Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1028-ldr-fusion-to-ldp-negative-offset

 src/lj_emit_arm64.h                           |  2 +-
 ...ldr-fusion-to-ldp-negative-offset.test.lua | 45 +++++++++++++++++++
 2 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 test/tarantool-tests/lj-1028-ldr-fusion-to-ldp-negative-offset.test.lua

diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
index e1a9d3e4..30cd3505 100644
--- a/src/lj_emit_arm64.h
+++ b/src/lj_emit_arm64.h
@@ -143,7 +143,7 @@ static void emit_lso(ASMState *as, A64Ins ai, Reg rd, Reg rn, int64_t ofs)
       goto nopair;
     }
     if (ofsm >= (int)((unsigned int)-64<<sc) && ofsm <= (63<<sc)) {
-      *as->mcp = aip | A64F_N(rn) | ((ofsm >> sc) << 15) |
+      *as->mcp = aip | A64F_N(rn) | (((ofsm >> sc) & 0x7f) << 15) |
 	(ai ^ ((ai == A64I_LDRx || ai == A64I_STRx) ? 0x50000000 : 0x90000000));
       return;
     }
diff --git a/test/tarantool-tests/lj-1028-ldr-fusion-to-ldp-negative-offset.test.lua b/test/tarantool-tests/lj-1028-ldr-fusion-to-ldp-negative-offset.test.lua
new file mode 100644
index 00000000..1ba28449
--- /dev/null
+++ b/test/tarantool-tests/lj-1028-ldr-fusion-to-ldp-negative-offset.test.lua
@@ -0,0 +1,45 @@
+local tap = require('tap')
+local ffi = require('ffi')
+
+-- This test demonstrates LuaJIT's incorrect emitting of LDP
+-- instruction with negative offset fused from LDR on arm64.
+-- See also https://github.com/LuaJIT/LuaJIT/pull/1028.
+local test = tap.test('lj-1028-ldr-fusion-to-ldp-negative-offset'):skipcond({
+  ['Test requires JIT enabled'] = not jit.status(),
+})
+
+test:plan(1)
+
+-- Amount of iterations to compile and start the trace.
+local N_ITERATIONS = 4
+
+ffi.cdef[[
+  typedef struct data {
+    int32_t m1;
+    int32_t m2;
+  } data;
+]]
+
+local data_arr = ffi.new('data[' .. N_ITERATIONS .. ']')
+
+local const_data_ptr = ffi.typeof('const data *')
+local data = ffi.cast(const_data_ptr, data_arr)
+
+local results = {}
+
+jit.opt.start('hotloop=1')
+
+for i = 1, N_ITERATIONS do
+  -- Pair loading from the negative offset generates an invalid
+  -- instruction on AArch64 before this patch.
+  local field = data[i - 1]
+  local m1 = field.m1
+  local m2 = field.m2
+
+  -- Use loaded values to avoid DCE.
+  results[i] = m1 + m2
+end
+
+test:samevalues(results, 'no invalid instruction')
+
+test:done(true)
--------------xKn3DQQosH5BtkAI9cL7zQbg--