Tarantool development patches archive
From: Sergey Bronnikov via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Sergey Kaplun <skaplun@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH luajit] ARM64: Fix LDP/STP fusing for unaligned accesses.
Date: Fri, 27 Jun 2025 16:20:53 +0300
Message-ID: <63170032-dc7a-47e4-ad84-9627a02070e0@tarantool.org>
In-Reply-To: <20250626151224.27925-1-skaplun@tarantool.org>

Hi, Sergey,

thanks for the patch! LGTM with two minor comments below.

Sergey

On 6/26/25 18:12, Sergey Kaplun wrote:
> From: Mike Pall <mike>
>
> Thanks to Peter Cawley.
>
> (cherry picked from commit 0fa2f1cbcf023ad0549f1428809e506fa2c78552)
>
> The arm64 emitting of load/store operation works incorrectly in the
> case when at least one offset of load/store to be fused into ldp/stp is
> misaligned. In this case this misaligning is ignored, and instructions
> are fused, which leads to loading/storing from/to at least one incorrect
> address.
>
> For example, the following instructions:
> | stur  w0, [x1, #17]
> | stur  w0, [x1, #21]
>
> May be fused to the following:
> | stp   w0, w0, [x1, #16]
>
> This patch prevents fusion in this case by testing the alignment with
> the help of bitwise ROR by the alignment value. In case of misaligned
> offset, the value overflows the 7-bit length mask in the check.
>
> The negative immediate (7-bit width including sign bit) is limited by
> the corresponding addition of `64 << sc` (it is harmless in the case of
> positive values).
>
> Sergey Kaplun:
> * added the description and the test for the problem
>
> Part of tarantool/tarantool#11278
> ---
>
> Related issues:
> * https://github.com/LuaJIT/LuaJIT/issues/1056
> * https://github.com/tarantool/tarantool/issues/11278
> Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1056-arm64-ldp-sdp-misaligned-fusing
>
>   src/lj_emit_arm64.h                           |  2 +-
>   ...6-arm64-ldp-sdp-misaligned-fusing.test.lua | 98 +++++++++++++++++++
>   2 files changed, 99 insertions(+), 1 deletion(-)
>   create mode 100644 test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua
>
> diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
> index 30cd3505..5c1bc372 100644
> --- a/src/lj_emit_arm64.h
> +++ b/src/lj_emit_arm64.h
> @@ -142,7 +142,7 @@ static void emit_lso(ASMState *as, A64Ins ai, Reg rd, Reg rn, int64_t ofs)
>       } else {
>         goto nopair;
>       }
> -    if (ofsm >= (int)((unsigned int)-64<<sc) && ofsm <= (63<<sc)) {
> +    if (lj_ror((unsigned int)ofsm + (64u<<sc), sc) <= 127u) {
>         *as->mcp = aip | A64F_N(rn) | (((ofsm >> sc) & 0x7f) << 15) |
>   	(ai ^ ((ai == A64I_LDRx || ai == A64I_STRx) ? 0x50000000 : 0x90000000));
>         return;
> diff --git a/test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua b/test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua
> new file mode 100644
> index 00000000..5d03097e
> --- /dev/null
> +++ b/test/tarantool-tests/lj-1056-arm64-ldp-sdp-misaligned-fusing.test.lua
> @@ -0,0 +1,98 @@
> +local tap = require('tap')
> +local ffi = require('ffi')
> +
> +-- This test demonstrates LuaJIT's incorrect emitting of LDP/STP
> +-- instructions from LDUR/STUR instructions with misaligned offset
> +-- on arm64.
> +-- See also https://github.com/LuaJIT/LuaJIT/issue/1056.
s/issue/issues/
> +local test = tap.test('lj-1056-arm64-ldp-sdp-misaligned-fusing'):skipcond({
> +  ['Test requires JIT enabled'] = not jit.status(),
> +})
> +
> +test:plan(6)
> +
> +-- Amount of iterations to compile and run the invariant part of
> +-- the trace.
> +local N_ITERATIONS = 4
> +
> +local EXPECTED = 42
> +
> +-- 9 bytes to make unaligned 4-byte access like buf + 5.
> +local BUFLEN = 9
> +local buf = ffi.new('unsigned char [' .. BUFLEN .. ']', 0)
> +
> +local function clear_buf()
> +  ffi.fill(buf, ffi.sizeof(buf), 0)
> +end
> +
> +-- Initialize the buffer with simple values.
> +local function init_buf()
> +  for i = 0, BUFLEN - 1 do
> +    buf[i] = i
> +  end
> +end
> +
> +local function test_buf_content(expected_bytes, msg)
> +  local got_bytes = {}
> +  assert(#expected_bytes == BUFLEN, 'mismatched size of buffer and table')
> +  for i = 0, BUFLEN - 1 do
> +    got_bytes[i + 1] = buf[i]
> +  end
> +  test:is_deeply(got_bytes, expected_bytes, msg)
> +end
> +
> +jit.opt.start('hotloop=1')
> +
> +-- Test stores.
> +
> +for _ = 1, N_ITERATIONS do
> +  local ptr = ffi.cast('unsigned char *', buf)
> +  -- These 2 accesses become ptr + 0 and ptr + 4 on the trace
> +  -- before the patch.
> +  ffi.cast('int32_t *', ptr + 1)[0] = EXPECTED
> +  ffi.cast('int32_t *', ptr + 5)[0] = EXPECTED
> +end
> +test_buf_content({0, EXPECTED, 0, 0, 0, EXPECTED, 0, 0, 0},
> +                 'pair of misaligned stores')
> +
> +clear_buf()
> +
> +for _ = 1, N_ITERATIONS do
> +  local ptr = ffi.cast('unsigned char *', buf)
> +  -- The next access becomes ptr + 4 on the trace before the
> +  -- patch.
> +  ffi.cast('int32_t *', ptr + 5)[0] = EXPECTED
> +  ffi.cast('int32_t *', ptr)[0] = EXPECTED
> +end
> +test_buf_content({EXPECTED, 0, 0, 0, 0, EXPECTED, 0, 0, 0},
> +                 'aligned / misaligned stores')
> +
> +-- Test loads.
> +
> +local resl, resr = 0, 0
> +
> +init_buf()
> +
> +for _ = 1, N_ITERATIONS do
> +  local ptr = ffi.cast('unsigned char *', buf)
> +  -- These 2 accesses become ptr + 0 and ptr + 4 on the trace
> +  -- before the patch.
> +  resl = ffi.cast('int32_t *', ptr + 1)[0]
> +  resr = ffi.cast('int32_t *', ptr + 5)[0]
> +end
> +
> +test:is(resl, 0x4030201, 'pair of misaligned loads, left')
> +test:is(resr, 0x8070605, 'pair of misaligned loads, right')

What do these magic numbers mean? Please add a comment or use a
variable with a self-explanatory name. Here and below.

> +
> +for _ = 1, N_ITERATIONS do
> +  local ptr = ffi.cast('unsigned char *', buf)
> +  -- The next access becomes ptr + 4 on the trace before the
> +  -- patch.
> +  resr = ffi.cast('int32_t *', ptr + 5)[0]
> +  resl = ffi.cast('int32_t *', ptr)[0]
> +end
> +
> +test:is(resl, 0x3020100, 'aligned / misaligned load, aligned')
> +test:is(resr, 0x8070605, 'aligned / misaligned load, misaligned')
> +
> +test:done(true)
