[Tarantool-patches] [PATCH luajit] ARM64: Fix LDP/STP fusion (again).
Sergey Bronnikov
sergeyb at tarantool.org
Mon Sep 8 13:40:48 MSK 2025
LGTM
On 9/8/25 12:48, Sergey Kaplun wrote:
> Hi, Sergey!
> Thanks for the review!
> Fixed your comment and force-pushed the branch.
>
> On 08.09.25, Sergey Bronnikov wrote:
>> Hi, Sergey,
>>
>> thanks for the patch! LGTM with two minor comments
>>
>> Sergey
>>
> <snipped>
>
>>> On 8/27/25 12:17, Sergey Kaplun wrote:
>>>> From: Mike Pall <mike>
>>>>
>>>> Reported and analyzed by Zhongwei Yao. Fix by Peter Cawley.
>>>>
>>>> (cherry picked from commit b8c6ccd50c61b7a2df5123ddc5a85ac7d089542b)
>>>>
>>>> Assume we have stores/loads from the same pointer with offsets +488
>>>> and -16. The lower bits of the offset -16 are the same as those of
>>>> the offset 496 (488 + 8). This leads to the incorrect fusion of these
>>>> instructions:
>>>> | str x20, [x21, 488]
>>>> | stur x20, [x21, -16]
>>>> to the following instruction:
>>>> | stp x20, x20, [x21, 488]
>>>>
>>>> This patch prevents this fusion by more accurate offset comparison.
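A rough illustration of why the low bits collide, as a C sketch. The helper names (`low9`, `adjacent_masked`, `adjacent_exact`) are hypothetical and are not taken from `lj_emit_arm64.h`; the 9-bit mask is an assumption that matches the signed 9-bit immediate of `stur`:

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low 9 bits of a byte offset, the way
** a 9-bit signed immediate field stores it. */
static uint32_t low9(int32_t ofs)
{
  return (uint32_t)ofs & 0x1ffu;
}

/* Hypothetical loose check: compares only the masked bits, so -16
** (0x1f0 after masking) looks identical to 496. */
static int adjacent_masked(int32_t other_ofs, int32_t ofs, int32_t size)
{
  return low9(other_ofs) == low9(ofs + size);
}

/* Hypothetical strict check: compares the full offsets, so -16 and 496
** are told apart. */
static int adjacent_exact(int32_t other_ofs, int32_t ofs, int32_t size)
{
  return other_ofs == ofs + size;
}
```

With these helpers, `adjacent_masked(-16, 488, 8)` is true while `adjacent_exact(-16, 488, 8)` is false; the genuinely adjacent pair 488/496 passes both checks.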
>>>>
>>>> Sergey Kaplun:
>>>> * added the description and the test for the problem
>>>>
>>>> Part of tarantool/tarantool#11691
>>>> ---
>>>>
>>>> Branch: https://github.com/tarantool/luajit/tree/skaplun/lj-1075-arm64-incorrect-ldp-stp-fusion
>>>> Related issues:
>>>> * https://github.com/tarantool/tarantool/issues/11691
>>>> * https://github.com/LuaJIT/LuaJIT/issues/1075
>>>>
>>>> src/lj_emit_arm64.h | 17 ++-
>>>> ...75-arm64-incorrect-ldp-stp-fusion.test.lua | 129 ++++++++++++++++++
>>>> 2 files changed, 142 insertions(+), 4 deletions(-)
>>>> create mode 100644 test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
>>>>
>>>> diff --git a/src/lj_emit_arm64.h b/src/lj_emit_arm64.h
>>>> index 5c1bc372..9dd92c40 100644
>>>> --- a/src/lj_emit_arm64.h
>>>> +++ b/src/lj_emit_arm64.h
> <snipped>
>
>>>> diff --git a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
>>>> new file mode 100644
>>>> index 00000000..c84c3b23
>>>> --- /dev/null
>>>> +++ b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
>>>> @@ -0,0 +1,129 @@
> <snipped>
>
>>>> +
>>>> +jit.opt.start('hotloop=2')
>> Why 2? It deserves a comment, because usually we use 1 hotloop.
> It's a copy-paste mistake carried over from the aarch64 machine; fixed to
> `hotloop=1`, thanks:
Thanks!
>
> ===================================================================
> diff --git a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
> index c84c3b23..393a1aa7 100644
> --- a/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
> +++ b/test/tarantool-tests/lj-1075-arm64-incorrect-ldp-stp-fusion.test.lua
> @@ -40,7 +40,7 @@ local function init_buf()
> end
> end
>
> -jit.opt.start('hotloop=2')
> +jit.opt.start('hotloop=1')
>
> -- Assume we have stores/loads from the pointer with offset
> -- +488 and -16. The lower 7 bits of the offset (-16) >> 2 are
> ===================================================================
>
> <snipped>
>
>>>> +
>>>> +-- Another reproducer that is based on the snapshot restoring.
>>>> +-- Its advantage is avoiding FFI usage.
>>>> +
>>>> +-- Snapshot slots are restored in the reversed order.
>>>> +-- The recording order is the following (from the bottom of the
>>>> +-- trace to the top):
>>>> +-- - 0th (ofs == -16) -- `f64()` replaced the `tail64()` on the
>>>> +-- stack,
>>>> +-- - 63rd (ofs == 488) -- 1,
>>>> +-- - 64th (ofs == 496) -- 2.
>>>> +-- At recording, the instructions for the 0th and 63rd slots are
>>>> +-- merged like the following:
>>>> +-- | str x3, [x19, #496]
>>>> +-- | stp x2, x1, [x19, #488]
>>>> +-- The first store is dominated by the stp, so the restored value
>>>> +-- is incorrect.
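The slot offsets listed in the comment above fit `ofs = 8 * (slot - 2)`. A small C sketch under that assumption (the formula is inferred from the three values quoted here, and `slot_ofs` / `slots_adjacent` are hypothetical names, not LuaJIT functions):

```c
#include <stdint.h>

/* Assumed mapping from a snapshot slot number to its byte offset from the
** base register; it fits the quoted values: 0 -> -16, 63 -> 488, 64 -> 496. */
static int32_t slot_ofs(int32_t slot)
{
  return 8 * (slot - 2);
}

/* Two 8-byte stores to the same base can share one pair instruction only
** if their full offsets differ by exactly 8. */
static int slots_adjacent(int32_t a, int32_t b)
{
  int32_t d = slot_ofs(a) - slot_ofs(b);
  return d == 8 || d == -8;
}
```

Under this mapping `slots_adjacent(63, 64)` holds and `slots_adjacent(0, 63)` does not, so only the 63rd and 64th slots are legitimate candidates for a single `stp`.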
>>>> +
>>>> +-- Function with 63 slots on the stack.
>>>> +local function f63()
>> Minor: Hardcoding the number of slots into the function name looks odd.
> It is mentioned above why exactly this amount of slots is required.
> It shouldn't be touched.
The question was about hard-coding a number in a function name, not about
using exactly this number of slots. OK, I won't insist, as I said in my
comment.
>> The same for tail63. Bumping the number of slots will
>> require renaming two functions.
>>
>> Feel free to ignore.
> Ignoring.
>
>>>> + -- 61 unused slots to avoid extra stores in between.
>>>> + -- luacheck: no unused
>>>> + local _, _, _, _, _, _, _, _, _, _
>>>> + local _, _, _, _, _, _, _, _, _, _
>>>> + local _, _, _, _, _, _, _, _, _, _
>>>> + local _, _, _, _, _, _, _, _, _, _
>>>> + local _, _, _, _, _, _, _, _, _, _
>>>> + local _, _, _, _, _, _, _, _, _, _
>>>> + local _
>>>> + return 1, 2
>>>> +end
>>>> +
> <snipped>
>
>>>> +test:done(true)