From: Sergey Bronnikov via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Sergey Kaplun <skaplun@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH luajit] ARM64: Fix LDP/STP fusing for unaligned accesses.
Date: Fri, 27 Jun 2025 16:20:53 +0300
Message-ID: <63170032-dc7a-47e4-ad84-9627a02070e0@tarantool.org>
In-Reply-To: <20250626151224.27925-1-skaplun@tarantool.org>

Hi, Sergey,

thanks for the patch! LGTM with two minor comments below.

Sergey

On 6/26/25 18:12, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Thanks to Peter Cawley.
>
> (cherry picked from commit 0fa2f1cbcf023ad0549f1428809e506fa2c78552)
>
> The arm64 emitting of load/store operation works incorrectly in the
> case when at least one offset of load/store to be fused into ldp/stp is
> misaligned. In this case this misaligning is ignored, and instructions
> are fused, which leads to loading/storing from/to at least one incorrect
> address.
>
> For example, the following instructions:
> | stur w0, [x1, #17]
> | stur w0, [x1, #21]
>
> May be fused to the following:
> | stp w0, w0, [x1, #16]
>
> This patch prevents fusion in this case by testing the alignment with
> the help of bitwise ROR by the alignment value. In case of misaligned
> offset, the value overflows the 7-bit length mask in the check.
>
> The negative immediate (7-bit width including sign bit) is limited by
> the corresponding addition of `64 << sc` (it is harmless in the case of
> positive values).
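To make the description above easier to follow, here is a minimal standalone sketch of the old and the new condition. It is an illustration only, not the code from lj_emit_arm64.h: the names ror32, can_pair_old and can_pair_new are hypothetical, ror32 stands in for lj_ror under the assumption that it acts as a 32-bit rotate on an unsigned int, and sc is assumed to be the log2 of the access size (2 for 32-bit accesses).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for lj_ror, assumed to be a 32-bit rotate right. */
static uint32_t ror32(uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Old condition: a pure range check, so the low (misaligned) bits of the
 * offset are never looked at. */
static int can_pair_old(int32_t ofsm, unsigned sc)
{
  assert(sc <= 3);  /* Assumption: scale is log2 of the access size. */
  return ofsm >= (int32_t)((uint32_t)-64 << sc) && ofsm <= (63 << sc);
}

/* New condition: bias the offset by 64 << sc so the valid range starts at
 * zero, then rotate right by sc. Any nonzero low bit ends up in the top
 * bits of the word and pushes the result above the 7-bit maximum of 127. */
static int can_pair_new(int32_t ofsm, unsigned sc)
{
  assert(sc <= 3);  /* Assumption: scale is log2 of the access size. */
  return ror32((uint32_t)ofsm + (64u << sc), sc) <= 127u;
}
```

With sc = 2 and the lower offset 17 from the example, can_pair_old(17, 2) accepts the pair, and the encoded immediate (17 >> 2) & 0x7f is 4, which is how the stp ends up at #16. can_pair_new(17, 2) rejects it, while aligned offsets from -256 to 252 are still accepted.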
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#11278
> ---
>
> Related issues:
> * https://github.com/LuaJIT/LuaJIT/issues/1056
> * https://github.com/tarantool/tarantool/issues/11278
> Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1056-arm64-ldp-sdp-misaligned-fusing
>
>  src/lj_emit_arm64.h                           |  2 +-
>  ...6-arm64-ldp-sdp-misaligned-fusing.test.lua | 98 +++++++++++++++++++
>  2 files changed, 99 insertions(+), 1 deletion(-)
>  create mode 100644 test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua
>
> diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
> index 30cd3505..5c1bc372 100644
> --- a/src/lj_emit_arm64.h
> +++ b/src/lj_emit_arm64.h
> @@ -142,7 +142,7 @@ static void emit_lso(ASMState *as, A64Ins ai, Reg rd, Reg rn, int64_t ofs)
>        } else {
>          goto nopair;
>        }
> -      if (ofsm >= (int)((unsigned int)-64<<sc) && ofsm <= (63<<sc)) {
> +      if (lj_ror((unsigned int)ofsm + (64u<<sc), sc) <= 127u) {
>          *as->mcp = aip | A64F_N(rn) | (((ofsm >> sc) & 0x7f) << 15) |
>            (ai ^ ((ai == A64I_LDRx || ai == A64I_STRx) ? 0x50000000 : 0x90000000));
>          return;
> diff --git a/test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua b/test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua
> new file mode 100644
> index 00000000..5d03097e
> --- /dev/null
> +++ b/test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua
> @@ -0,0 +1,98 @@
> +local tap = require('tap')
> +local ffi = require('ffi')
> +
> +-- This test demonstrates LuaJIT's incorrect emitting of LDP/STP
> +-- instructions from LDUR/STUR instructions with misaligned offset
> +-- on arm64.
> +-- See also https://github.com/LuaJIT/LuaJIT/issue/1056.

s/issue/issues/

> +local test = tap.test('lj-1056-arm64-ldp-sdp-misaligned-fusing'):skipcond({
> +  ['Test requires JIT enabled'] = not jit.status(),
> +})
> +
> +test:plan(6)
> +
> +-- Amount of iterations to compile and run the invariant part of
> +-- the trace.
> +local N_ITERATIONS = 4
> +
> +local EXPECTED = 42
> +
> +-- 9 bytes to make unaligned 4-byte access like buf + 5.
> +local BUFLEN = 9
> +local buf = ffi.new('unsigned char [' .. BUFLEN .. ']', 0)
> +
> +local function clear_buf()
> +  ffi.fill(buf, ffi.sizeof(buf), 0)
> +end
> +
> +-- Initialize the buffer with simple values.
> +local function init_buf()
> +  for i = 0, BUFLEN - 1 do
> +    buf[i] = i
> +  end
> +end
> +
> +local function test_buf_content(expected_bytes, msg)
> +  local got_bytes = {}
> +  assert(#expected_bytes == BUFLEN, 'mismatched size of buffer and table')
> +  for i = 0, BUFLEN - 1 do
> +    got_bytes[i + 1] = buf[i]
> +  end
> +  test:is_deeply(got_bytes, expected_bytes, msg)
> +end
> +
> +jit.opt.start('hotloop=1')
> +
> +-- Test stores.
> +
> +for _ = 1, N_ITERATIONS do
> +  local ptr = ffi.cast('unsigned char *', buf)
> +  -- These 2 accesses become ptr + 0 and ptr + 4 on the trace
> +  -- before the patch.
> +  ffi.cast('int32_t *', ptr + 1)[0] = EXPECTED
> +  ffi.cast('int32_t *', ptr + 5)[0] = EXPECTED
> +end
> +test_buf_content({0, EXPECTED, 0, 0, 0, EXPECTED, 0, 0, 0},
> +                 'pair of misaligned stores')
> +
> +clear_buf()
> +
> +for _ = 1, N_ITERATIONS do
> +  local ptr = ffi.cast('unsigned char *', buf)
> +  -- The next access becomes ptr + 4 on the trace before the
> +  -- patch.
> +  ffi.cast('int32_t *', ptr + 5)[0] = EXPECTED
> +  ffi.cast('int32_t *', ptr)[0] = EXPECTED
> +end
> +test_buf_content({EXPECTED, 0, 0, 0, 0, EXPECTED, 0, 0, 0},
> +                 'aligned / misaligned stores')
> +
> +-- Test loads.
> +
> +local resl, resr = 0, 0
> +
> +init_buf()
> +
> +for _ = 1, N_ITERATIONS do
> +  local ptr = ffi.cast('unsigned char *', buf)
> +  -- These 2 accesses become ptr + 0 and ptr + 4 on the trace
> +  -- before the patch.
> +  resl = ffi.cast('int32_t *', ptr + 1)[0]
> +  resr = ffi.cast('int32_t *', ptr + 5)[0]
> +end
> +
> +test:is(resl, 0x4030201, 'pair of misaligned loads, left')
> +test:is(resr, 0x8070605, 'pair of misaligned loads, right')

What do these magic numbers mean? Please add a comment or use a
variable with a self-explanatory name. Here and below.

> +
> +for _ = 1, N_ITERATIONS do
> +  local ptr = ffi.cast('unsigned char *', buf)
> +  -- The next access becomes ptr + 4 on the trace before the
> +  -- patch.
> +  resr = ffi.cast('int32_t *', ptr + 5)[0]
> +  resl = ffi.cast('int32_t *', ptr)[0]
> +end
> +
> +test:is(resl, 0x3020100, 'aligned / misaligned load, aligned')
> +test:is(resr, 0x8070605, 'aligned / misaligned load, misaligned')
> +
> +test:done(true)
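For reference, the expected load values appear to be the little-endian reading of four consecutive buffer bytes after init_buf() has set buf[i] = i. The sketch below reproduces that arithmetic in plain C. It assumes a little-endian target, which the test does not state explicitly, and the helper name expected_le32_at is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUFLEN 9  /* Same buffer length as in the test. */

/* Value a little-endian 4-byte load at byte offset `ofs` would return
 * when the buffer holds buf[i] == i (the state left by init_buf()). */
static uint32_t expected_le32_at(size_t ofs)
{
  unsigned char buf[BUFLEN];
  uint32_t v = 0;
  size_t i;
  assert(ofs + 4 <= BUFLEN);
  for (i = 0; i < BUFLEN; i++)
    buf[i] = (unsigned char)i;
  /* Byte 0 of the access is the least significant byte of the result. */
  for (i = 0; i < 4; i++)
    v |= (uint32_t)buf[ofs + i] << (8 * i);
  return v;
}
```

Under that assumption, offsets 1, 5 and 0 give 0x4030201, 0x8070605 and 0x3020100, the three constants used in the assertions. A one-line comment to that effect next to the constants would be enough.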